G. P. Manjunath and Farhat Habib

ENCODE – The first flag on a new frontier  



[Image: ENCODE Nature cover]

For the data junkies amongst us, Christmas came early this year! Five years after its first set of publications, the Encyclopedia of DNA Elements (ENCODE) consortium has released a comprehensive analysis of the human epigenome. The project, which kicked off in 2003, released data sets from its pilot phase in mid-2007. In June of that year, a paper detailing the analysis of functional elements in 1% of the human genome was published in Nature, with 36 other papers in Genome Research. Just over five years later, the ENCODE project has reported a similar analysis of the entire human genome in 30 publications across three journals.

Scientifically, these publications mark a significant departure from the gene-centric nature of biology. For over a century now, our view of the genetic basis of inheritance has been dominated by a rather simplistic central dogma of life. The notion of genes as “the” functional unit of life dominated our thinking to such an extent that the large part of the genome that does not code for proteins was labeled, not so gently, as “junk DNA”. In an article published in the September issue of Science, Elizabeth Pennisi called the ENCODE papers a eulogy for junk DNA [1]. She is not too far off the mark: according to ENCODE, about 80% of the human genome performs one function or another, often one that is not mediated by an ability to code for proteins.

It is increasingly apparent that the information required for an organism to function is coded not only into the sequence of the genome but also into its very structure. In a paper published in the October 2011 issue of Molecular Cell, Mark Umbarger and colleagues reported the construction of a three-dimensional map of a bacterial genome [2]. In this first-of-its-kind study, they demonstrated how the macroscopic properties of the genome could provide insights into the roles of sequence elements and fundamental DNA processes in defining such properties. A similar study of the larger and more complex Drosophila genome by Tom Sexton and colleagues, published in Cell [3] earlier this year, showed that the same paradigm can be applied to more complex organisms. We are, perhaps, now in a position where such experimental methodology can be applied to study macroscopic properties of the genome in the context of specific cellular processes.

Recent publications by the ENCODE consortium have added to this narrative by defining the roles not only of regulatory sequences in modulating gene expression, and thereby phenotypes, but also of non-coding RNAs and covalent modifications of the DNA bases themselves. Together, these studies demonstrate the complexity of regulating genomic functions in general and gene expression in particular. They represent a paradigm shift away from a protein-centric view of cellular physiology. In doing so, they have led to the development of new tools that will, perhaps, help unravel several other mysteries of life.

The ENCODE project represents the pinnacle of collaborative science in a way few biological studies have managed so far. It is the successful application of a model for executing scientific projects that, with the exception of the Human Genome Project (HGP), has so far only been seen in experimental physics. Among projects of its kind, it is second in scale only to the HGP. ENCODE starts off where the HGP ended, yet its funding is about an order of magnitude lower than that of the HGP. In a classic case of necessity fostering invention, advances in sequencing technology have enabled experiments that were unthinkable a decade ago, and at a price that would make the HGP drool!

Biology is turning into a data-centric science like never before. While the individual researcher toiling away through the night is still an integral part of doing biology, it is obvious that certain questions in biology can only be answered by collaborative research. This has meant a change not only in the mindset of the participants but also in the manner in which data is collected and shared between them. Establishing clear structures with open communication channels is essential for the success of such a complex endeavor. The numbers are staggering by themselves: 442 individual scientists, 10 major institutes, and 1600 experiments on 147 cell types, employing 235 antibodies and multiple assays, resulting in almost 15 TB of data. The University of California, Santa Cruz is the chosen data coordination center, where all the data is collected, archived, and integrated with the UCSC Genome Browser, a widely used tool for visualizing genomic information.

Projects of this nature necessitate changes in the mindset of the funding agencies themselves. Existing models of evaluating performance fail, and new benchmarks need to be defined. The perspective of participating groups shifts from the conventional goal of publishing papers to contributing data to a common pool. Chances of a fundamental breakthrough are slimmer in such exploratory projects than in projects that are hypothesis driven. The contribution of individual groups must therefore be measured not only in terms of new insights but also in terms of how they enable science in general.

As Ewan Birney, lead ENCODE analyst and associate director of the European Molecular Biology Laboratory’s (EMBL) European Bioinformatics Institute, says, the exact impact of ENCODE science will not be realized until years after the original data sets are assembled [4]. The impact of such exploratory projects goes far beyond the original publications. Lessons learned from ENCODE will come in handy for conducting other ‘big data’ projects such as 1000 Genomes, modENCODE (model organism ENCODE), and a host of others. modENCODE is particularly exciting, as it permits biological validation of the computational and experimental findings of the ENCODE project. With ENCODE itself entering its third phase, in which elements of the human genome are to be mapped out in greater detail across a larger number of cell types, the times ahead promise to be truly exciting!

In all the excitement, however, it is easy to overlook the potential pitfalls. As McGeorge Bundy, the American national security advisor responsible for escalating the Vietnam War, realized at great cost to himself and the country he served, "There is no safety in unlimited technological hubris". A healthy dose of skepticism, together with an appreciation of the limitations of ENCODE, is therefore more than just a philosophical necessity. Unreasonable expectations risk alienating the population at large that funds these projects. At the same time, generating data is only the first step in creating knowledge. These publications, therefore, represent not the end but the beginning of a journey.

Is the money invested in such large consortia justified? The answer to that question may never be clearly known. Years from now, scientists will be using these datasets to guide their research. It is possible that one of them will come up with a fundamental insight that justifies the large investment of money and effort that has gone into ENCODE. Then again, the payoff may never be apparent. What is clear, though, is that ENCODE has questioned not only our preconceived notions of the genome but also the way we practice our science. That is the unmistakable legacy of ENCODE.

1. Pennisi, E. Genomics. ENCODE project writes eulogy for junk DNA. Science 337, 1159, 1161 (2012).

2. Umbarger, M. A. et al. The three-dimensional architecture of a bacterial genome and its alteration by genetic perturbation. Mol. Cell 44, 252–264 (2011).

3. Sexton, T. et al. Three-dimensional folding and functional organization principles of the Drosophila genome. Cell 148, 458–472 (2012).

4. Birney, E. The making of ENCODE: Lessons for big-data projects. Nature 489, 49–51 (2012).

5. Image credits: Nature cover, September 2012.

About the Authors: G. P. Manjunath* and Farhat Habib are scientists at the Center of Excellence in Epigenetics, IISER Pune. Questions and comments can be sent to *manjunathgp@iiserpune.ac.in

I wonder when India can have its own "Encode" !
