Novel Computational Tool Models RNA-Binding Specificity, Provides Better Understanding of Gene Expression Regulation

Chaolin Zhang, PhD, associate professor of systems biology

A new study by researchers in Dr. Chaolin Zhang’s lab at Columbia’s Department of Systems Biology details a novel computational method that precisely and accurately models how RNA-binding proteins (RBPs) recognize specific sites in their target RNA transcripts. The researchers’ findings include the identification of entirely new motifs (RNA sequence patterns). Their research on complex RNA regulation contributes to our understanding of the molecular basis of diseases and other conditions and, down the road, could aid in the development of targeted therapies.

The study, led by Dr. Zhang, associate professor of systems biology, with senior co-authors Suying Bao, PhD, and Huijuan Feng, PhD, appears today in Molecular Cell.

RNAs have traditionally been considered mere “messengers” that transfer genetic information from DNA to the proteins that ultimately carry out cellular functions. However, it is now increasingly appreciated that RNA can be tightly regulated to control gene expression and diversify protein products. RNA-binding proteins (RBPs) are at the center of such regulation, with important roles in many cellular processes, including cell function, transport, and location. Gaining mechanistic insights into the binding specificity of RBPs on a genome-wide scale helps advance our knowledge of gene regulation.

“RNA-binding proteins are crucial for gene expression,” says Dr. Feng, coauthor of the study and post-doctoral research scientist in the Zhang lab. “RNA is heavily regulated, and when this regulation goes wrong, instabilities or disease could occur.”  

Study coauthors Suying Bao (left) with Huijuan Feng, pictured at the 2018 department retreat and poster session.

For example, RNAs need to be properly spliced. Splicing is a molecular process that removes noncoding segments and stitches coding segments together. Aberrant RNA splicing has been implicated in a wide range of diseases, from neurological disorders to cancer. Excitingly, targeted approaches have been developed to correct splicing errors, as demonstrated by SPINRAZA, a new antisense oligonucleotide drug approved by the FDA two years ago to treat spinal muscular atrophy (SMA), a devastating pediatric motor neuron disease.

The key to gaining deeper knowledge of RNA regulation and potentially targeting RNA for new therapeutic options is to be able to identify sites with regulatory roles and determine how they can be recognized by RBPs. In this study, the researchers demonstrate a much more precise computational method for defining binding specificities for RBPs. 

While extensive efforts have been made to computationally model the specificity of RBPs and predict their binding sites, the results have not been very precise. A major challenge is that RBPs are quite flexible in the sequence patterns they interact with, so their numerous binding sites carry only limited information.

The researchers previously developed a strategy to map protein-RNA interaction sites at single-nucleotide resolution using the UV crosslinking and immunoprecipitation (CLIP) technique followed by deep sequencing. Widely used in the scientific field, CLIP is a biochemical assay that enables the analysis of protein interactions with RNA on a genome-wide scale.

“We are able to identify the exact RNA nucleotide that is bound and crosslinked to the RBP,” notes Dr. Zhang, “and importantly, such crosslinking frequently occurs in specific positions in the sequence motif recognized by the RBP.”

The researchers’ new computational method, called mCross, goes a step further by using these crosslink sites as landmarks for each binding site. This approach dramatically narrows the search space when one looks for the common sequence patterns recognized by the RBPs.
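
To make the idea of a narrowed search space concrete, here is a minimal Python sketch. It is not the mCross algorithm itself, which the article does not describe in detail. It only illustrates, on made-up toy sequences, how restricting a k-mer count to a small window around each crosslink position shrinks the number of candidate positions. The function names, the window size, the toy sequences, and the choice of UGCAUG as the planted motif are all assumptions made for illustration.

```python
from collections import Counter


def windows_around_crosslinks(sites, flank=4):
    """Return the subsequence within `flank` nt of each crosslink position.

    `sites` is a list of (sequence, crosslink_index) pairs (hypothetical input format).
    """
    out = []
    for seq, xl in sites:
        start = max(0, xl - flank)
        end = min(len(seq), xl + flank + 1)
        out.append(seq[start:end])
    return out


def count_kmers(seqs, k=6):
    """Count every k-mer occurrence across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


# Toy binding sites: a planted UGCAUG motif in repetitive background sequence,
# with the crosslink position placed inside the motif.
toy_sites = [
    ("A" * 12 + "UGCAUG" + "A" * 12, 14),
    ("C" * 8 + "UGCAUG" + "C" * 16, 10),
    ("G" * 16 + "UGCAUG" + "G" * 8, 18),
]

full_counts = count_kmers([seq for seq, _ in toy_sites])
window_counts = count_kmers(windows_around_crosslinks(toy_sites))

print("k-mer positions searched, full sequences:", sum(full_counts.values()))
print("k-mer positions searched, crosslink windows:", sum(window_counts.values()))
print("top k-mer in crosslink windows:", window_counts.most_common(1)[0])
```

In this toy case the background k-mers dominate the counts over the full sequences, while the planted motif is the most frequent k-mer once the search is anchored at the crosslink positions.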

The researchers applied mCross to over a hundred RBPs for which enhanced CLIP (eCLIP) data have been generated by the ENCODE consortium.

“The intrinsic flexibility of RBPs in recognizing their RNA regulatory sequences imposes a big challenge in accurate characterization and predictive modeling of this process, even when a large number of binding footprints are already mapped by CLIP,” explains Dr. Bao, a post-doctoral research scientist in the Zhang lab. “MCross serves as an important tool to characterize the binding specificity of RBPs and the fundamental step towards understanding gene expression regulation and cell type specificity.”

Based on motifs identified by the mCross method, Dr. Zhang and his team discovered a new motif, or RNA sequence pattern, that plays an important role in the regulation of alternative splicing. Alternative splicing is a highly complex process of generating multiple transcripts and protein variants by joining different combinations of coding segments. The researchers discovered that SRSF1, the prototypical SR protein, which is involved in many aspects of cellular function, recognizes clusters of GGA half sites in addition to its canonical GGAGGA motif. As a result, a majority of SRSF1 targets were missed in previous analyses.
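
The difference between the canonical motif and a cluster of half sites can be illustrated with a short Python sketch. This is only a toy scan and not the researchers' model. The thresholds `min_sites` and `max_span`, the function names, and the example sequences are hypothetical choices, since the article does not specify how a cluster is defined.

```python
import re


def half_site_positions(seq, half_site="GGA"):
    """Start positions of every occurrence of the half site (overlaps allowed)."""
    pattern = f"(?={re.escape(half_site)})"  # lookahead so overlapping hits are found
    return [m.start() for m in re.finditer(pattern, seq)]


def has_half_site_cluster(seq, min_sites=3, max_span=20, half_site="GGA"):
    """True if at least `min_sites` half sites fit within `max_span` nucleotides.

    Both thresholds are illustrative assumptions, not values from the study.
    """
    pos = half_site_positions(seq, half_site)
    for i in range(len(pos) - min_sites + 1):
        span = pos[i + min_sites - 1] + len(half_site) - pos[i]
        if span <= max_span:
            return True
    return False


def has_canonical_motif(seq, motif="GGAGGA"):
    """True if the contiguous canonical motif is present."""
    return motif in seq


canonical_site = "UUGGAGGAUU"          # contains the contiguous GGAGGA motif
clustered_site = "UUGGAUCGGAUUCGGAUU"  # three spaced GGA half sites, no GGAGGA

for name, seq in [("canonical", canonical_site), ("clustered", clustered_site)]:
    print(name, has_canonical_motif(seq), has_half_site_cluster(seq))
```

A scan that looks only for the contiguous GGAGGA motif would skip the second toy sequence, which is the kind of target the article says earlier analyses missed.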

“The mere fact that we found a new mode of RNA binding for a protein extensively investigated for over three decades suggests how little we know about the RNA world!”  notes Dr. Zhang.

Additionally, Dr. Zhang and his team developed a searchable, interactive website, named mCrossbase, to serve the research community. It provides the RBP binding specificity data already defined by their method. In future work, the team plans to investigate how this work can facilitate the identification of mutations that affect functional protein-RNA interactions in the context of human disease.

The full list of authors on the paper, “Modeling RNA-binding Protein Specificity in vivo by Precisely Registering Protein-RNA Crosslink Sites,” includes: Huijuan Feng, Suying Bao, Mohammad Alinoor Rahman, Sebastien M. Weyn-Vanhentenryck, Aziz Khan, Justin Wong, Ankeeta Shah, Elise D. Flynn, Adrian R. Krainer, and Chaolin Zhang.

The study was supported by grants from the National Institutes of Health (NIH) and a Columbia Precision Medicine Research Fellowship.

The Zhang lab at Columbia concentrates on the study of the nervous system and its underlying molecular mechanisms. The group focuses on the function of post-transcriptional gene regulation, in particular a level of molecular regulation called alternative RNA splicing, in the nervous system.