By Bruce Goldman
Illustration by Greg Mably
Genes rule. But they’re not quite the dictators some have claimed and others have feared.
In the past decade or two, researchers have learned that genes themselves are governed by a benign bureaucracy of regulatory loops that curb some genes while stimulating others. Now they have discovered a new class of molecules that may play a major role in that regulation, from determining what cells want to be when they grow up to keeping them on task once they do.
This means that tailoring medicine to suit an individual requires looking beyond the genes. Or more precisely, looking between the genes, because that’s where the newest of new players originate. These molecules, long linear strands called lincRNA, are the latest surprise to arise from the vast spaces along chromosomes that separate one gene from another — spaces once considered so bereft of purpose they were disparaged as “junk DNA.”
And while some pioneering physicians have begun incorporating patients’ genetic information into their treatment plans, they can’t yet factor in molecules like lincRNAs that regulate those genes. At this early stage of discovery, they wouldn’t know what to test for. But with evidence building that at least a couple of lincRNAs have been implicated in cancer, it’s realistic to expect that testing people for these gene-regulating molecules will become part of medical practice. Knowing the genes alone will take us only so far along the road to personalized medicine.
Like royalty on a throne, genes never leave the nucleus, which lies just about smack-dab in the cell’s center. Yet the molecules that make up the working class of the cell, the proteins, are stitched together in the cell’s watery outer provinces, or cytoplasm. The genome delivers its edicts to the cytoplasm via messengers made of RNA, a substance seen previously as a passive “wax impression” of DNA, but looking more like an Olympic gymnast every day.
Until recently, RNA’s main claim to fame was largely as a bit player in the extravaganza that is gene activation. When the DNA double helix is inactive, its two strands are zipped up and spooled around specialized packaging proteins called histones. “Reading” a gene’s instructions requires unzipping the two strands at the site of the gene.
Somewhere around the beginning of life billions of years ago, cells evolved bulky molecular machines (each of them an assembly including numerous large proteins) that can do this very well. These transcription machines can unpack and unpeel the DNA temporarily from its associated histone husk. They can part its two strands at key points. They can then hover above a strand near the start of a gene, barrel down the exposed DNA and crank out copies of its protein-coding instructions. The copies are made of RNA, which is chemically similar to DNA (both are chains of constituent chemical links called nucleotides) but is more travel-ready and short-lived.
Fresh “messenger RNA” molecules float out of the nucleus into the cell’s far reaches. There in the cytoplasm they are fed into still other gigantic molecular machines, called ribosomes, which read their strings of nucleotides as consecutive three-nucleotide chunks. The proteins they specify are produced according to a code whereby each three-letter RNA “word” indicates which of some 20 different chemical building blocks should next be added to a growing protein molecule.
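For readers who think in code, the three-letter reading scheme can be sketched in a few lines of Python. This is a toy illustration, not a model of a real ribosome: the table below holds only a handful of the 64 entries in the standard genetic code, and the function name `translate` and the sample strand are invented for the example.

```python
# Partial codon table for illustration only; the full standard code has 64 entries.
CODON_TABLE = {
    "AUG": "Met",  # also the usual "start" word
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "AAA": "Lys",
    "UAA": None, "UAG": None, "UGA": None,  # "stop" words: no building block
}

def translate(mrna):
    """Read an mRNA string three letters at a time and list the building blocks."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon not in CODON_TABLE:
            raise KeyError(f"codon {codon} is not in this partial table")
        amino_acid = CODON_TABLE[codon]
        if amino_acid is None:  # stop word: the chain is finished
            break
        protein.append(amino_acid)
    return protein

# Hypothetical strand: four coding words, a stop word, then one ignored word.
print(translate("AUGUUUGGCAAAUAAGCU"))
```

Reading stops at the first stop word, so anything after it in the strand never makes it into the protein.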
But not all genes in all cells get copied all the time. Different kinds of cells, and the same cells at different stages of their lives, are different because they make different proteins. Otherwise, we’d all be blobs of undifferentiated tissue. Just how do these differences in protein production come about?
The hulking transcription machines in the nucleus that zip and unzip DNA are exactly the same from one cell to the next. So are the ribosomes in the cytoplasm that decipher the genetic code to manufacture proteins. So those giant complexes can’t determine all by themselves which genes get read within a given cell at a given time.
Still other huge protein complexes sporting Jurassic Park–ish names such as Trithorax and Polycomb affix small chemical tags to, or remove them from, the DNA or histones in the vicinity of genes. The tags serve as long-lasting “read” or “skip” signals to the gene-reading machinery. But those lumbering juggernauts, Trithorax and Polycomb, are exactly the same in every cell, too. So who tells them where along the genome to slap those “read” and “skip” tags? Who guides them to the appropriate spots in the first place?
Cue the music. Enter lincRNA molecules, discovered by two researchers who looked where no one else was looking and found what no one else had thought would be there.
A cell can’t make a protein without making RNA first. Thus, a quick-and-dirty way to see which genes in a cell or tissue are in active use as protein templates is to use a gene-expression chip: a microarray pioneered by Stanford School of Medicine biochemistry professor Pat Brown, PhD, and biochemistry and genetics professor Ronald Davis, PhD, in the mid-1990s. This device represents the amounts of RNA made from each gene on the chip as a separate pixel displayed on a computer screen — the more RNA made from that gene, the brighter the pixel — making it easy to analyze aggregate patterns of gene expression: that is, which ones are actively getting read, and which just sitting there, at any given time.
In 1999, shortly after the Human Genome Project pulled out its first plum from the genomic pudding — the full sequencing of chromosome 22 — Michael Snyder, PhD, at that time a Yale University geneticist, used the newly published sequence data to design a high-resolution gene-expression chip he called a tiling array. It combed the entirety of chromosome 22 for small snippets of RNA emanating not just from its known or likely protein-coding portions (“genes,” that is), but from anywhere along its entire length. Snyder’s custom-built tiling array would, in principle, allow the detection of RNA molecules made not only where you’d expect to find them being made — at, near or overlapping all the places where a protein-coding sequence had already been identified — but from anyplace along chromosome 22, including vast mysterious stretches between one gene and the next.
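The logic of looking "anyplace" along a chromosome can be sketched in Python. What follows is only an illustration under stated assumptions, not Snyder's actual analysis: the tile coordinates, signal values, gene interval and threshold are all made up, and the function name `intergenic_hits` is hypothetical.

```python
def intergenic_hits(tiles, genes, threshold):
    """Return (start, end) for tiles whose RNA signal is at or above threshold
    and that do not overlap any annotated gene interval."""
    hits = []
    for start, end, signal in tiles:
        if signal < threshold:
            continue  # no detectable RNA from this stretch
        # Half-open intervals [start, end) overlap when each begins before the other ends.
        overlaps_gene = any(start < g_end and g_start < end for g_start, g_end in genes)
        if not overlaps_gene:
            hits.append((start, end))
    return hits

# Made-up coordinates and signals, not real chromosome 22 data.
tiles = [(0, 100, 0.1), (100, 200, 5.2), (200, 300, 4.8), (300, 400, 0.2), (400, 500, 3.9)]
genes = [(90, 210)]  # one annotated protein-coding gene
print(intergenic_hits(tiles, genes, threshold=1.0))
```

In this toy data two bright tiles sit on or beside the known gene, which is where RNA is expected. The third bright tile lies far from any annotation, and that is the kind of signal a tiling array is built to catch.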
This was ambitious and, some thought, a waste of time and money. As the Human Genome Project unfolded, it began to look as though not much more than 1 percent of the genome consisted of recipes for viable proteins. The other 99 percent appeared to have no function, save for small sections near genes that served as landing strips and homing beacons for the molecular machines that read or mark up DNA. One high-profile Harvard biologist referred to the overwhelmingly large non-coding stretches as “junk DNA.” The name stuck.
“Many of us never really believed that,” says Snyder, who last year moved to Stanford to become professor and chair of the medical school’s genetics department.
Snyder told one of his graduate students, John Rinn, to take a close look at chromosome 22. Applying Snyder’s tiling array to the just-sequenced chromosome, Rinn found that RNA was getting made at all kinds of sites along the DNA that bore no resemblance to protein-coding genes. Some of these RNA molecules were very small, consisting of tens of nucleotides. But lots of them were thousands of nucleotides long, as lengthy as those that do code for a protein. These RNA molecules weren’t doing that, as could be determined by their nonsensical sequences — for example, they tended to contain too many three-nucleotide signals that, in effect, stop the protein-making machinery in its tracks. Yet they featured many of the same “gene-like” elements (for instance, regulatory nucleotide sequences that invite gene-reading machinery to have a sit) that protein-coding RNA molecules did.
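The "too many stop signals" test can be sketched in Python as well. This is a rough illustration of the idea, not the method Rinn used: it simply finds the longest stretch of three-letter words uninterrupted by a stop word, checking all three possible reading frames. The function name and the sample sequences are invented for the example.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def longest_stop_free_run(rna):
    """Longest run of consecutive non-stop codons across the three reading frames."""
    best = 0
    for frame in range(3):
        run = 0
        for i in range(frame, len(rna) - 2, 3):
            if rna[i:i + 3] in STOP_CODONS:
                run = 0  # a stop word ends any would-be protein here
            else:
                run += 1
                best = max(best, run)
    return best

# Hypothetical sequences: one reads cleanly, the other is riddled with stops.
print(longest_stop_free_run("AUGGCUGCUGCUUAA"))
print(longest_stop_free_run("UAAUAAUAA"))
```

An RNA thousands of nucleotides long whose longest stop-free run is only a few dozen words is a poor candidate for a protein recipe, whatever else it may be doing.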
“There were as many genes making RNA but not proteins as there were protein-coding genes,” Rinn recalls. “It was a Eureka moment.”
Of course, that Eureka moment electrified only the people who thought this was interesting — a small minority. “People felt that something had to be wrong,” says Snyder. Mightn’t all this unexpected non-protein-producing RNA, asked the skeptics, simply be the results of a trigger-happy molecular gene-reading machine?
That didn’t seem likely to Rinn and Snyder. A surprising number of the newly discovered longish RNA molecules were being produced in the middle of nowhere, so to speak: in those vast intervening DNA stretches that separate one gene from the next.
Still other evidence pointed to the potential significance of these long, intergenic non-coding RNA molecules, subsequently dubbed lincRNAs. Over evolutionary time, mutations creep into DNA and pile up, so in any stretch of DNA sequence that’s not absolutely essential, you’ll see evidence of evolutionary drift. Any stretch that is “conserved” — maintained largely unchanged over billions of years of evolution — is presumed to be important: For whatever reason, the organism just couldn’t do without it. Interestingly, something like 5 percent of diverse organisms’ genomic sequences shows strong signs of evolutionary conservation, as opposed to the mere 1 percent accounted for by protein-coding genes.
If most of the conserved stretches of our genome lie in its non-coding regions, then that means they’re somehow important for the survival of the organism. They obviously have some kind of function, Rinn reasoned.
Show me, the skeptics said.
In 2004, Howard Chang, MD, PhD, was a newly minted assistant professor of dermatology at Stanford. Rinn showed up that year to be Chang’s first-ever postdoctoral researcher. He arrived, in his own words, “with a chip on my shoulder” owing to others’ doubts about lincRNA’s significance. But together he and Chang discovered that these underappreciated molecules have the power to control what a cell becomes when it grows up.
Chang, now an associate professor, naturally thinks about skin a lot. “If you look at your own skin, you can see it’s not the same everywhere,” he says. “You have long hairs growing from your scalp, but not from your palms or soles. How do skin cells, which are constantly dividing, dying and being replaced by new ones, know where they’re located in the body and act accordingly? There must be some sort of address code that tells them where they are. You can find the same kind of thing happening in any other tissue — heart, lung, brain, fat, bone, blood vessels.”
When Rinn showed up in 2004, Chang was studying fibroblasts, cells abounding under our skin that secrete factors determining skin cells’ local character — such as which will be hairy and which smooth. Generation upon generation of cultured fibroblasts from different parts of the body retain their ability to program skin cells according to the fibroblasts’ site of origin.
The two researchers showed that in cultured fibroblasts from different places in the body, somewhat different sets of old-fashioned, protein-coding genes were turned on or off in four key genomic “hot spots.” In one of these clusters, individual genes’ activity levels varied according to whether a fibroblast belonged closer to the head or to the foot; and in another cluster, whether the fibroblast belonged in or near the body’s core versus its surface. This was the address code Chang had been looking for.
Further investigation using tiling arrays revealed that numerous lincRNAs were being produced in or near these hot spots (not, mind you, from the protein-coding genes themselves) in relative amounts that corresponded to which part of the body the fibroblasts were from. Not only that, but some of these lincRNA molecules seemed to be doing something that shut down banks of genes within the various clusters. One lincRNA in particular was present in detectable amounts only in fibroblasts from the body’s lower half, particularly at extremities such as foot or foreskin. Lower-body fibroblasts in which Chang and Rinn experimentally blocked this lincRNA’s activity started acting like fibroblasts from the body’s upper half, and their “address code” genes took on an “upper-body fibroblast” expression pattern.
The duo dubbed the newly identified lincRNA “HOTAIR,” in an act of humility befitting the smirks they expected to see on their peers’ faces once they went public with this finding. But the smirks, if ever there were any, have faded.
Chang has subsequently implicated HOTAIR in cancer. Biopsied breast-tumor samples in which high levels of HOTAIR are detected are more likely to metastasize, making HOTAIR a potential clinical marker of cancer’s aggressiveness — and possibly a target for drug therapy, as forcing up HOTAIR levels in breast-tumor cells increases the chances of metastasis. A group at Massachusetts General Hospital has independently found inordinate HOTAIR levels in a particular subtype of kidney cancer. This August, Rinn published a study in Cell showing the centrality of another lincRNA to one of the body’s critical, natural anti-tumor defense pathways.
Probing HOTAIR’s modus operandi, Chang, Rinn and their colleagues showed in a 2007 Cell study that HOTAIR binds to Polycomb, the massive gene-suppressor complex. This year, in experiments published in Science, Chang demonstrated that HOTAIR can simultaneously bind to not only Polycomb but also other big gene-regulating protein complexes. It seems that HOTAIR chauffeurs the entire entourage to as many as 800 different genetic locations it thinks they should see. The visited genes are then silenced, presumably due to the bound regulatory complexes’ combined efforts.
In 2008 Rinn, now in his own lab as an assistant professor at Harvard Medical School, got a rough count of all the places in the genome from which long RNA molecules, coding or not, are being produced. After discounting for the ones associated with protein-coding genes, the team found at least 1,500 different lincRNA-production sites. A similar study the next year boosted that estimate to over 3,300, and Rinn says he thinks there may be far more. And, he adds, there’s no reason to assume this army of lincRNA molecules will prove to be any less diverse regarding what they do in living cells than the proteins specified by their cousins, the messenger RNAs.
These RNA chains are not the first molecules found to control genes. For more than four decades researchers have known about proteins called transcription factors that perch on certain stretches of DNA, causing quiet genes to go active, or active genes to become more so or less so. About 1,500 of the 25,000 genes in the human genome encode transcription factors, according to Snyder, who is cataloging them.
In fact, lincRNA isn’t even the first type of RNA to be implicated in the control of gene expression. One recent example is a class of very short RNA strands, called microRNA, that can temporarily shut down or reduce targeted proteins’ production. The discovery of a closely related gene-silencing mechanism, RNA interference, won a Nobel Prize in 2006 for Andrew Fire, PhD, professor of pathology and of genetics at Stanford, and Craig Mello, PhD, professor of molecular medicine at the University of Massachusetts.
HOTAIR exerts long-lasting effects that can lock in a cell’s gene-activity patterns for a lifetime. But maybe other lincRNAs with less-long-lasting effects are generated in reaction to fluctuating conditions in the microenvironment of a cell — such as its nutrition status, the arrival of hormones from the bloodstream, stress signals from adjacent cells, infection or other short-term cues.
Because lincRNAs require one less time-consuming production step than proteins do, they could be ideal for quickly and broadly shifting a cell’s behavior in reaction to variations in its microenvironment. In that case, testing tissues’ levels of various lincRNAs may someday tell clinicians a good deal about the state of a patient’s health and recent physiological history. Many disease-associated genetic variations are turning out to be, in fact, in spots where lincRNAs are made. Sensing lincRNAs’ commercial potential, companies are holding private discussions with both Chang and Rinn.
Nor is it just cancer that is showing links to lincRNA. “In collaboration with many colleagues at Stanford and elsewhere, we have started to discover lincRNAs that are associated with many different diseases,” Chang says. “We’re creating animal models to address whether altering lincRNA function can provide therapeutic benefits. And we’ve started to search for drugs that can do that.”
There’s a message here for medical research and development. Yes, genes rule. But not in a vacuum.