What is the GRCC?
The Genome Regulatory Code Consortium (GRCC) is a global consortium of researchers who are committed to working together to decipher how the sequence of our genomes encodes regulatory activity. We coalesced around the idea that we could solve the regulatory code using a “Big Data” approach fueled by characterizing the regulatory activity of synthetic DNA sequences in cells.

What is the genome regulatory code?
Our bodies are made of many cells, and different cell types can do different things because they express different genes. For instance, our red blood cells carry oxygen around our body, and to do this they must express the hemoglobin gene, which encodes the oxygen-carrying hemoglobin protein. The genome regulatory code (also called the gene regulatory code, or cis-regulatory code) is the molecular program that cells use to turn on genes. It is a critical part of the “recipe” that enables a fertilized egg to develop from a single cell to a fully formed person, and it underlies much of what makes each individual unique.

Regulatory DNA is “read” by proteins that can recognize and bind specific DNA sequences. Where they bind, these “transcription factor” proteins can change the expression of neighboring genes. While we have a good conceptual understanding of how gene regulation works, predicting it has proven to be a major challenge because of the huge number of moving parts in the system. Much of the work to date has focused on measuring what the different regulatory proteins are doing on our genomes. While this has taught us a lot, it is now clear that genome regulation is too complex to learn from our genomes alone!

Why solve the regulatory code?
The regulatory code is critical in nearly all diseases. Genetic studies have shown that most common complex inherited diseases, like heart disease, autoimmunity, and mental illness, are largely diseases of genome regulation. They originate from a constellation of differences in our genome sequences that, together, predispose one to disease. The vast majority of these genetic differences are in the regulatory genome and are thought to change the expression of genes. Similarly, about half of rare diseases are thought to result from mutations that change the regulatory genome.

Until we solve the regulatory code, the mechanisms underlying many diseases will remain opaque. We cannot yet predict which genes are altered by these disease-associated mutations, how the genes are altered, what cell types are affected, and under what conditions. This limits our ability to understand these diseases and to develop new treatments.

Why now?
We believe that the time is right to decipher the genome regulatory code. Recent developments in synthetic biology, functional genomics, and machine learning have enabled new approaches to the problem. Synthetic biology and, in particular, synthetic genomics have matured to the point where DNA can now be synthesized at unprecedented scales and lengths, enabling us to create new DNA sequences that can better query the regulatory code. New experiment types allow us to measure gene regulation at single-molecule resolution and at high throughput. Stem cell technology has matured to the point where we can now perform these assays on synthetic DNA in stem cell-derived models that replicate disease-relevant cell types. Finally, the AI revolution that is impacting many other aspects of our lives is also reaching genomics. Innovations in GPU hardware, model design, and data scale are driving ever-increasing model performance. Combining these innovations in experiments and computation has the potential to revolutionize our understanding of genome regulation!

