Genetics

News

Yufeng Shen, PhD, found his passion for science in childhood, and he developed a fascination with both math and physics as his education progressed. In an earlier generation, he would have needed to choose between divergent paths. Instead, he chased his calling within an important emerging discipline.

Yufeng Shen, PhD

Dr. Shen was awarded tenure and promoted to the rank of associate professor in Columbia University's Departments of Systems Biology and Biomedical Informatics (DBMI) last summer. Using new computational methods, he addresses long-standing questions that affect human health. Specifically, his research has focused on discovering novel genetic variants that cause human diseases.

His current work focuses on developing new computational methods to interpret genome data, identifying genetic causes of human diseases by integrating multiple types of genomic data, and modeling immune cell populations. That research has led to important findings, including his work on the Deep Genetic Connection between Cancer and Developmental Disorders, published in Human Mutation.

Using innovative sequencing techniques from published studies of cancer and developmental disorders, Dr. Shen and his students identified a significant number of genes implicated in both diseases.

“This project allows us to use the larger cancer data to inform analysis in genetic variations of developmental disorders, and to find new risk genes and new risk variants,” he said. “It also provides a new perspective on how to optimize care for kids with developmental disorders. There is probably two to three times more risk of developing cancer for kids with developmental disorders than otherwise healthy kids.”

Congenital diaphragmatic hernia (CDH) is a severe birth defect. In babies born with CDH, the diaphragm does not develop properly, and some or all of the abdominal organs are pushed into the chest. The displacement of these critical organs can have a significant impact on how the lungs develop and grow.

Yufeng Shen, PhD, associate professor of systems biology

A recent study, led by Yufeng Shen, PhD, and Wendy Chung, MD, PhD, and their labs at Columbia University Irving Medical Center, investigated the genetic risk factors linked to CDH and analyzed data from whole genome sequencing and exome sequencing to identify novel mutations. The study also uncovered a link between CDH and additional developmental disorders.

“Many babies with this birth defect also have lung hypoplasia or pulmonary hypertension and babies have difficulty breathing. Even with advanced care available, the mortality rate is still about 20 percent,” says Dr. Shen, associate professor of systems biology at Columbia, with a joint appointment in the Department of Biomedical Informatics.

“One hypothesis is that the lung condition is not necessarily caused by the physical compression on the developing lungs in the chest,” explains Dr. Shen. “It can be caused by the same genetic defect that causes CDH. Finding those genes is absolutely necessary to improve care and develop effective treatment in the long run.”

Scientists have been aiming to identify new risk genes in CDH and other developmental disorders, in the hope that improved genetic diagnosis will enable more tailored or long-term care for patients born with this defect and reveal potential targets for intervention down the road.

Dr. Tuuli Lappalainen Science Study

The illustration above shows, using an example of four genes, how knowing how variable genes are in the normal population helps to find candidate disease genes in a patient. Above, top: Tuuli Lappalainen, PhD; bottom: Pejman Mohammadi, PhD.

For individuals with rare diseases, getting a diagnosis is often a long and complicated odyssey. Over the past few years, this has been greatly improved by genome sequencing that can pinpoint the mutation that breaks a gene and leads to a severe disease. However, this approach is still unsuccessful in the majority of patients, largely because of our inability to read the genome to identify all mutations that disrupt gene function.

In a new study published on October 10 in Science, researchers from New York Genome Center, Columbia University, and Scripps Research Institute propose a solution to this problem. By building a new computational method for analyzing genomes together with transcriptome data from RNA-sequencing, they can now identify genes where genetic variants disrupt gene expression in patients and improve the diagnosis of rare genetic disease.

The new method introduced in this study, Analysis of Expression Variation or ANEVA, first takes allele-specific expression data from a large reference sample of healthy individuals to understand how much genetic regulatory variation each gene harbors in the normal population. Then, using the ANEVA Dosage Outlier Test, researchers can analyze the transcriptome of any individual, such as a patient, to find a handful of genes where he or she carries a genetic variant with an unusually large effect compared to what healthy individuals have. By applying this test to a cohort of muscular dystrophy and myopathy patients, the researchers demonstrated the performance of their method and diagnosed additional patients where previous methods of genome and RNA analysis had failed to find the broken genes.
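The two-step idea described above (first learn how much each gene varies in healthy people, then flag genes where one individual deviates unusually) can be illustrated with a minimal Python sketch. This is not the published ANEVA or ANEVA Dosage Outlier Test statistics: it substitutes a plain z-score on log2 allelic ratios and assumes balanced expression of the two alleles is the expected baseline. The gene names, read counts, threshold, and function names are all hypothetical.

```python
import math
from statistics import pstdev

def log2_allelic_ratio(ref_count, alt_count, pseudocount=1.0):
    """Log2 ratio of reads from a gene's two alleles (pseudocount avoids log of zero)."""
    return math.log2((alt_count + pseudocount) / (ref_count + pseudocount))

def reference_spread(reference_counts):
    """Per-gene spread of allelic ratios across healthy individuals.

    reference_counts maps gene -> list of (ref_count, alt_count) pairs.
    """
    return {
        gene: pstdev([log2_allelic_ratio(r, a) for r, a in pairs])
        for gene, pairs in reference_counts.items()
    }

def dosage_outliers(patient_counts, spread, z_threshold=3.0):
    """Genes whose allelic imbalance in one individual is extreme relative to the reference."""
    flagged = []
    for gene, (r, a) in patient_counts.items():
        sd = spread.get(gene)
        if not sd:  # no reference data, or zero spread: cannot score this gene
            continue
        z = log2_allelic_ratio(r, a) / sd  # assumes a balanced (zero) expected ratio
        if abs(z) >= z_threshold:
            flagged.append((gene, round(z, 2)))
    return sorted(flagged, key=lambda item: -abs(item[1]))

# Hypothetical toy data: GENE_A is tightly regulated in healthy people, GENE_B is not.
reference = {
    "GENE_A": [(50, 52), (48, 50), (51, 49), (50, 50)],
    "GENE_B": [(30, 70), (70, 30), (50, 50), (40, 60)],
}
patient = {"GENE_A": (90, 10), "GENE_B": (65, 35)}

outliers = dosage_outliers(patient, reference_spread(reference))
print([gene for gene, _ in outliers])
```

In this toy example only the tightly regulated gene is flagged, because the same kind of imbalance is unremarkable in a gene that already varies widely among healthy individuals.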

Tuuli Lamport Research Award

Tuuli Lappalainen, PhD, was honored with the Lamport Research faculty award at the 2019 Commencement ceremony. Dr. Lappalainen is pictured here with Columbia University Trustee Andrew Barth (left) and Dean Lee Goldman of Columbia University Irving Medical Center. (Courtesy of CUIMC Communications)

Tuuli Lappalainen, PhD, assistant professor of systems biology at Columbia University and core faculty member at the New York Genome Center (NYGC), has received the Harold and Golden Lamport Research award, presented on May 22 at the Vagelos College of Physicians and Surgeons Commencement Ceremony.

The Lamport Research award is an annual prize given to junior faculty members who show promise in basic science or clinical science research. This year it recognizes Dr. Lappalainen’s ongoing research in functional genetic variation in human populations, and her work in elucidating the cellular mechanisms linked to genetic risk for various diseases and traits. Dr. Lappalainen and her lab combine computational analysis of high-throughput sequencing data, human population genetics approaches, and experimental work.

Her group at NYGC and Columbia is highly collaborative and has made important contributions to several international research consortia in human genomics, including the Genotype-Tissue Expression (GTEx) Project and the TOPMed Consortium.

Dr. Lappalainen joined the faculty at Columbia University in 2014 as part of the Department of Systems Biology and NYGC. In 2018, she received the annual Leena Peltonen Prize for Excellence in Human Genetics, which was presented to her in Milan, Italy, at the 52nd European Society of Human Genetics meeting. 

Yufeng Shen Episcore

The epigenomic profile of RBFOX2, a haploinsufficient gene recently identified as a risk gene of congenital heart disease. Each small box represents a 100 bp region around transcription start sites (TSSs), and the shade of the color reflects the strength of the histone mark signal in tissues under normal conditions. RBFOX2 has a large expansion of active histone marks (H3K4me3 and H3K9ac), especially in heart and epithelial tissues (purple and gray rows), and a tissue-specific suppression mark (H3K27me3) in blood samples. (Credit: Shen lab)

The genetics of developmental disorders, such as congenital heart disease and autism, are highly complex. There are roughly 500 to 1,000 risk genes that can lead to each of these diseases, and to date, only a few dozen have been identified. Scientists have ramped up efforts to develop computational approaches to address challenges in accurately identifying genetic risk factors in ongoing genetic studies, and the availability of such tools would greatly assist researchers in gaining a deeper understanding of the root causes of these diseases.

Focusing on haploinsufficiency, a key biological mechanism of genetic risk in developmental disorders, Yufeng Shen, PhD, and his lab have developed a novel computational method that enables researchers to find new risk genes in these diseases. Their key idea is that the expression of haploinsufficient genes must be precisely regulated during normal development, and such regulation can be manifested in distinct patterns of genomic regulatory elements. Using data from the NIH Roadmap Epigenomics Project, they showed there is a strong correlation between certain histone marks and known haploinsufficient genes. Then, based on supervised machine learning algorithms, they developed a new method, which they call Episcore, to predict haploinsufficiency from epigenomic data representing a broad range of tissue and cell types. Finally, they demonstrated the utility of Episcore in identifying novel risk variants in studies of congenital heart disease and intellectual disability.
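The general approach described above, training a supervised model on genes with known haploinsufficiency status and then scoring other genes from their epigenomic features, can be sketched in a few lines of Python. The text does not say which algorithm or features Episcore uses, so this sketch makes loud assumptions: logistic regression stands in for the real model, and the two features (breadth of an active mark and tissue specificity of a repressive mark, both scaled to 0 to 1), the training values, and all names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent (stand-in for the real model)."""
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def score(gene_features, weights, bias):
    """Predicted probability that a gene is haploinsufficient."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, gene_features)) + bias)

# Hypothetical training set: [active-mark breadth, repressive-mark tissue specificity]
train_x = [[0.90, 0.80], [0.80, 0.70], [0.85, 0.90],   # known haploinsufficient genes
           [0.20, 0.10], [0.30, 0.20], [0.10, 0.30]]   # genes known to tolerate one copy
train_y = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(train_x, train_y)
candidate_high = score([0.85, 0.80], weights, bias)  # resembles the known risk genes
candidate_low = score([0.20, 0.20], weights, bias)   # resembles the tolerant genes
print(candidate_high > candidate_low)
```

A ranking produced this way would only be as good as the labeled genes and the features behind it, which is why the choice of training set and epigenomic marks matters more than the particular classifier.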

Tuuli Lappalainen (top photo) and Stephane Castel co-led the new study. The hypothesis of the study is illustrated here with an example in which an individual is heterozygous for both a regulatory variant and a pathogenic coding variant. The two possible haplotype configurations would result in either decreased penetrance of the coding variant, if it was on the lower-expressed haplotype, or increased penetrance of the coding variant, if it was on the higher-expressed haplotype. (Composite image courtesy of NYGC)

Researchers at the New York Genome Center (NYGC) and Columbia University's Department of Systems Biology have uncovered a molecular mechanism behind one of biology’s long-standing mysteries: why individuals carrying identical gene mutations for a disease end up having varying severity or symptoms of the disease. In this widely acknowledged but not well understood phenomenon, called variable penetrance, the severity of the effect of disease-causing variants differs among individuals who carry them. 

Reporting in the Aug. 20 issue of Nature Genetics, the researchers provide evidence for modified penetrance, in which genetic variants that regulate gene activity modify the disease risk caused by protein-coding gene variants. The study links modified penetrance to specific diseases at the genome-wide level, which has exciting implications for future prediction of the severity of serious diseases such as cancer and autism spectrum disorder.

NYGC Core Faculty Member and Systems Biology Assistant Professor Tuuli Lappalainen, PhD, led the study alongside post-doctoral research fellow Dr. Stephane Castel.
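The haplotype logic in the figure caption for this study can be made concrete with a small Python sketch. It assumes, purely for illustration, that the share of a gene's transcripts carrying the pathogenic coding allele is a reasonable proxy for how strongly the variant is expressed; the expression values, the 0.5 cutoff, and the function names are hypothetical and are not taken from the paper.

```python
def pathogenic_transcript_fraction(expr_high, expr_low, coding_on_high):
    """Fraction of a gene's transcripts that carry the pathogenic coding allele.

    expr_high / expr_low: expression of the higher- and lower-expressed haplotype.
    coding_on_high: True if the coding variant sits on the higher-expressed haplotype.
    """
    if expr_low > expr_high:
        raise ValueError("expr_high must be at least expr_low")
    total = expr_high + expr_low
    if total <= 0:
        raise ValueError("total expression must be positive")
    carrier = expr_high if coding_on_high else expr_low
    return carrier / total

def expected_effect(fraction):
    """Direction of the modification, following the figure caption."""
    if fraction > 0.5:
        return "increased penetrance"
    if fraction < 0.5:
        return "decreased penetrance"
    return "no modification"

# Same two variants, two possible haplotype configurations (hypothetical 70:30 expression).
on_high = pathogenic_transcript_fraction(70, 30, coding_on_high=True)
on_low = pathogenic_transcript_fraction(70, 30, coding_on_high=False)
print(expected_effect(on_high), "/", expected_effect(on_low))
```

The point of the example is that two people with identical variants can differ only in how those variants are arranged across the two copies of the gene, and that arrangement alone changes how much of the damaged transcript is produced.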

Organoids bladder cancer

Organoids created from the bladder cancers of patients mimic the characteristics of each patient’s tumor and may be used in the future to identify the best treatment for each patient. Images: Michael Shen

Columbia University Irving Medical Center (CUIMC) and NewYork-Presbyterian researchers have created patient-specific bladder cancer organoids that mimic many of the characteristics of actual tumors. As reported by CUIMC, the use of organoids, tiny 3-D spheres derived from a patient’s own tumor, may be useful in the future to guide treatment of patients.

The study was published April 5 in the online edition of Cell.

TRACE cell recorders Wang Lab

Columbia University Medical Center reports on a new study in Science led by Harris Wang, assistant professor of systems biology. Wang and collaborators, who include researchers at the Department of Pathology & Cell Biology, have converted a natural bacterial immune system into a microscopic data recorder, an innovative framework that can lead to advances in biological applications utilizing bacterial cells for everything from disease diagnosis to environmental monitoring.

Integrating data sources

Clinical and molecular data are currently stored in many different databases using different semantics and different formats. A new project called DeepLink aims to develop a framework that would make it possible to compare and analyze data across platforms not originally intended to intersect. (Image courtesy of Nicholas Tatonetti.)

Medical doctors and basic biological scientists tend to speak about human health in different languages. Whereas doctors in the clinic focus on phenomena such as symptoms, drug effects, and treatment outcomes, basic scientists often concentrate on activity at the molecular and cellular levels such as genetic alterations, gene expression changes, or protein profiles. Although these various layers are all related physiologically, there is no standard terminology or framework for storing and organizing the different kinds of data that describe them, making it difficult for scientists to systematically integrate and analyze data across different biological scales. Being able to do so, many investigators now believe, could provide a more efficient and comprehensive way to understand and fight disease.

A new project recently launched by Nicholas Tatonetti (Assistant Professor in the Columbia University Departments of Systems Biology and Biomedical Informatics), along with co-principal investigators Chunhua Weng (Department of Biomedical Informatics) and Michel Dumontier (Stanford University), aims to bridge this divide. With the support of a $1.1 million grant from the National Center for Advancing Translational Sciences (NCATS), the scientists have begun to develop a tool they call DeepLink, a data translator that will integrate health-related findings at multiple scales.

As Dr. Tatonetti explains, “We want to close what we call the interoperability gap, a fundamental difference in the language and semantics used to describe the models and knowledge between the clinical and molecular domains. Our goal is to develop a scalable electronic architecture for integrating the enormous multiscale knowledge that is now available.”

Cell Types in Autism

By inventing a new computational pipeline called DAMAGES, Chaolin Zhang and Yufeng Shen showed that brain cell types on the left of the plot are more prone to have rare autism risk mutations than cell types on the right. Narrowing the focus to these types of cells also helped to identify a molecular signature of the disorder that involves haploinsufficiency. Figure: Human Mutation.

Autism, a spectrum of neurodevelopmental disorders typically identified during early childhood, is widely thought to be the result of genetic alterations that change how the growing brain is wired. Nevertheless, despite a substantial effort in the field of autism genetics, the specific alterations that place one child at greater risk than another remain elusive. Although the list of alterations associated with autism is growing, it has been difficult to conclusively distinguish those that truly increase disease risk from those that are merely coincident with it. One troubling reason for this is that research so far seems to indicate that specific genetic abnormalities associated with autism risk are extremely rare, with many being found only in single patients. This has made it hard to reproduce findings conclusively.

In a paper recently published in the journal Human Mutation, Department of Systems Biology faculty members Chaolin Zhang and Yufeng Shen describe a method and some new findings that could help to more precisely identify rare autism-driving alterations. A new analytical pipeline they call DAMAGES (Disease Associated Mutation Analysis using Gene Expression Signatures) uses a unique approach to identifying autism risk genes, looking at differences in gene expression among different cell types in the brain in order to focus more specifically on mechanisms that are likely to be relevant for autism. Using this approach, they identified a pronounced molecular signature that is shared by disease risk genes due to haploinsufficiency, a type of genetic alteration that causes a dramatic drop in the expression of a particular protein.

Yufeng Shen
Yufeng Shen's lab is interested in developing better computational methods for identifying rare genetic variants that increase disease risk.

On the surface, birth defects and cancer might not seem to have much in common. For some time, however, scientists have observed increased cancer risk among patients with certain developmental syndromes. One well-known example is seen in children with Noonan syndrome, who have an eightfold increased risk of developing leukemia. Recently, researchers studying the genetics of autism also observed mutations in PTEN, an important tumor suppressor gene. Although such findings have been largely isolated and anecdotal, they raise the tantalizing question of whether cancer and developmental disorders might be fundamentally linked.

According to a paper recently published in the journal Human Mutation, many of these similarities might not be just coincidental, but the result of shared genetic mutations. The study, led by Yufeng Shen, an Assistant Professor in the Columbia University Departments of Systems Biology and Biomedical Informatics, together with Wendy Chung, Kennedy Family Associate Professor of Pediatrics at Columbia University Medical Center, found that cancer-driving genes also make up more than a third of the risk genes for developmental disorders. Moreover, many of these genes appear to function through similar modes of action. The scientists suggest that this could make tumors “natural laboratories” for pinpointing and predicting the damaging effects of rare genetic alterations that cause developmental disorders.

“In comparison with cancer, there are relatively few patients with developmental disorders,” Shen explains, “For geneticists, this makes it hard to identify the risk genes solely based on statistical evidence of mutations from these patients. This study indicates that we should be able to use what we learn from cancer genetics — where much more data are available — to help in the interpretation of genetic data in developmental disorders.”

Clonal evolution in GBM tumors
The researchers' model of tumor evolution indicates that different clonal lineages branch from a common ancestral cell and then diversify, independently causing aggressive tumor behavior at different stages of disease.

Glioblastoma multiforme (GBM) is the most common and most aggressive type of primary brain tumor in adults. Existing treatments against the disease are very limited in their effectiveness, meaning that in most patients tumors recur within a year. Once GBM returns, no beneficial therapeutics currently exist and prognosis is generally very poor.

To better understand how GBM evades treatment, an international team led by Antonio Iavarone and Raul Rabadan at the Columbia University Center for Topology of Cancer Evolution and Heterogeneity has been studying how the cellular composition of GBM tumors changes over the course of therapy. In a paper just published online by Nature Genetics, they provide the first sketch of the main routes of GBM tumor evolution during treatment, showing that different cellular clones within a tumor become dominant within specific tumor states. The study uncovers important general principles of tumor evolution, novel genetic markers of disease progression, and new potential therapeutic targets.

Chimpanzee

By using statistical methods to compare genomic data across species, such as chimpanzees and humans, the Przeworski Lab is gaining insights into the origins of genetic variation and adaptation. (Photo: Common chimpanzee at the Leipzig Zoo. Thomas Lersch, Wikimedia Commons.)

Launched approximately 100 years ago, population genetics is a subfield within evolutionary biology that seeks to explain how processes such as mutation, natural selection, and random genetic drift lead to genetic variation within and between species. Population genetics was originally born from the convergence of Mendelian genetics and biostatistics, but with the recent availability of genome sequencing data and high-performance computing technologies, it has bloomed into a mature computational science that is providing increasingly high-resolution models of the processes that drive evolution.

Molly Przeworski, a professor in the Columbia University Departments of Biological Sciences and Systems Biology, majored in mathematics at Princeton before beginning her PhD in evolutionary biology at the University of Chicago in the mid-1990s. While there, she realized that the availability of increasingly large data sets was changing population genetics, and has since been interested in using statistical approaches to investigate questions such as how genetic variation drives adaptation and why mutation rate and recombination rate differ among species. In the following interview, she describes how population genetics is itself evolving, as well as some of her laboratory’s contributions to the field.

Yaniv Erlich
Yaniv Erlich. Photo: Jared Leeds.

A new article published online in Nature Genetics reports that short tandem repeats, a class of genetic alterations in which short motifs of nucleotide base pairs occur multiple times in a row, play a role in modulating gene expression. Leading the study was Yaniv Erlich, an assistant professor in the Columbia University Department of Computer Science and core member of the New York Genome Center who recently joined the Center for Computational Biology and Bioinformatics.

As an article in Columbia Engineering explains, the findings reveal a new class of genome regulation.

The Department of Systems Biology and Center for Computational Biology and Bioinformatics are pleased to announce that three Columbia University faculty members have recently joined our community. Kam Leong, the Samuel Y. Sheng Professor of Biomedical Engineering at Columbia University, is now an interdisciplinary faculty member in the Department of Systems Biology. In addition, Yaniv Erlich and Guy Sella are now members of the Center for Computational Biology and Bioinformatics (C2B2). Their addition to the Department and to C2B2 will bring new expertise that will benefit our research and education activities, incorporating perspectives from fields such as nanotechnology, bioinformatics, and evolutionary genomics.

Topology of cancer

The Columbia University Center for Topology of Cancer Evolution and Heterogeneity will combine mathematical approaches from topological data analysis with new single-cell experimental technologies to study cellular diversity in solid tumors. Image courtesy of Raul Rabadan.

The National Cancer Institute’s Physical Sciences in Oncology program has announced the creation of a new center for research and education based at Columbia University. The Center for Topology of Cancer Evolution and Heterogeneity will develop and utilize innovative mathematical and experimental techniques to explore how genetic diversity emerges in the cells that make up solid tumors. In this way it will address a key challenge facing cancer research in the age of precision medicine — how to identify the clonal variants within a tumor that are responsible for its growth, spread, and resistance to therapy. Ultimately, the strategies the Center develops could be used to identify more effective biomarkers of disease and new therapeutic strategies.

Comorbidity between Mendelian disease and cancer
Researchers in the Rabadan Lab have found that comorbidity between Mendelian diseases and cancer may result from shared genetic factors.

Genetic diseases can arise in a variety of ways. Mendelian disorders, for example, occur when specific mutations in single genes — called germline mutations — are inherited from either of one’s two parents. Well-known examples of Mendelian diseases include cystic fibrosis, sickle cell disease, and Duchenne muscular dystrophy. Other genetic diseases, including cancer, result from somatic mutations, which occur in individual cells during a person’s lifetime. Because the genetic origins of Mendelian diseases and cancer are so different, they are typically understood to be distinct phenomena. However, scientists in the Columbia University Department of Systems Biology have found evidence that there might be interesting genetic connections between them. 

In a paper just published in Nature Communications, postdoctoral research scientist Rachel Melamed and colleagues in the laboratory of Associate Professor Raul Rabadan report on a new method that uses knowledge about Mendelian diseases to suggest mutations involved in cancer. The study takes advantage of an enormous collection of electronic health records representing over 110 million patients, a substantial percentage of US residents. The authors show that clinical co-occurrence of Mendelian diseases and cancer, known as comorbidity, can be tied to genetic changes that play roles in both diseases. The paper also identifies several specific relationships between Mendelian diseases and the cancers melanoma and glioblastoma.

ALK-negative ALCL mutation map
A map of mutations observed in ALK-negative anaplastic large cell lymphoma. (Credit: Dr. Rabadan)

The following article is reposted with permission from the Columbia University Medical Center Newsroom. Find the original here.

The first-ever systematic study of the genomes of patients with ALK-negative anaplastic large cell lymphoma (ALCL), a particularly aggressive form of non-Hodgkin’s lymphoma (NHL), shows that many cases of the disease are driven by alterations in the JAK/STAT3 cell signaling pathway. The study also demonstrates, in mice implanted with human-derived ALCL tumors, that the disease can be inhibited by compounds that target this pathway, raising hopes that more effective treatments might soon be developed. The study, led by researchers at Columbia University Medical Center (CUMC) and Weill Cornell Medical College, was published today in the online edition of Cancer Cell.

Autism Spectrum Disorders Genetic Network

Network of autism-associated genes. (Credit: Dennis Vitkup)

The following article is reposted with permission from the Columbia University Medical Center Newsroom. Find the original here.

People with autism have a wide range of symptoms, with no two people sharing the exact type and severity of behaviors. Now a large-scale analysis of hundreds of patients and nearly 1,000 genes has started to uncover how diversity among traits can be traced to differences in patients’ genetic mutations. The study, conducted by researchers at Columbia University Medical Center, was published Dec. 22 in the journal Nature Neuroscience.

Autism researchers have identified hundreds of genes that, when mutated, likely increase the risk of developing autism spectrum disorder (ASD). Much of the variability among people with ASD is thought to stem from the diversity of underlying genetic changes, including the specific genes mutated and the severity of the mutation.

“If we can understand how different mutations lead to different features of ASD, we may be able to use patients’ genetic profiles to develop accurate diagnostic and prognostic tools and perhaps personalize treatment,” said senior author Dennis Vitkup, PhD, associate professor of systems biology and biomedical informatics at Columbia University’s College of Physicians & Surgeons.

Sequence of genomic alterations in CLL
A graph representing the sequence of genomic alterations in chronic lymphocytic leukemia (CLL). Each node represents a mutation, with arrows indicating temporal relationships between them. The size of the nodes indicates the number of patients in the study who exhibited the alteration, while the thickness of the lines shows how often the temporal relationships between nodes were seen. The method the researchers used enabled them to identify multiple, distinct evolutionary patterns in CLL.

As biologists have gained a better understanding of cancer, it has become clear that tumors are often driven not by a single mutation, but by a series of genetic changes that correspond to particular stages of cancer progression. In this sense, a tumor is constantly evolving, with different groups of cells that harbor distinctive mutations multiplying at different rates, depending on their fitness for particular disease states. As the search for more effective cancer diagnostics and therapies continues, one key question is how to disentangle the order in which mutations occur in order to understand how tumors change over time. Being able to predict how a tumor will behave based on signs seen early in the course of disease could enable the development of new diagnostics that could better inform treatment planning.

In a paper just published in the journal eLife, a team of investigators led by Department of Systems Biology Associate Professor Raul Rabadan reports on a new computational strategy for addressing this challenge. Their framework, called tumor evolutionary directed graphs (TEDG), considers next-generation sequencing data from tumor samples from a large number of patients. Using TEDG to analyze cancer cells in patients with chronic lymphocytic leukemia (CLL), they were able to develop a model of how the disease’s mutational landscape changes from its initial onset to its late stages. Their findings suggest that CLL may not be just the result of a single evolutionary path, but can evolve in alternative ways.
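The graph described in the figure caption for this study (nodes sized by how many patients carry a mutation, arrows weighted by how often one mutation precedes another) can be illustrated with a short Python sketch. This only shows the aggregation step and is not the published TEDG method, which works from sequencing data; the input format, patient identifiers, and mutation labels here are hypothetical placeholders.

```python
from collections import Counter

def build_ordering_graph(patient_orderings):
    """Aggregate per-patient mutation orderings into one directed graph.

    patient_orderings maps patient id -> list of (earlier, later) mutation pairs.
    Returns (node_counts, edge_counts): patients per mutation, patients per ordering.
    """
    node_counts, edge_counts = Counter(), Counter()
    for pairs in patient_orderings.values():
        unique_pairs = set(pairs)  # count each ordering once per patient
        node_counts.update({m for pair in unique_pairs for m in pair})
        edge_counts.update(unique_pairs)
    return node_counts, edge_counts

def successors(edge_counts, mutation):
    """Mutations observed after the given one, most frequent first."""
    later = [(b, n) for (a, b), n in edge_counts.items() if a == mutation]
    return sorted(later, key=lambda item: (-item[1], item[0]))

# Hypothetical orderings inferred for three patients.
orderings = {
    "patient_1": [("MUT_A", "MUT_B"), ("MUT_B", "MUT_C")],
    "patient_2": [("MUT_A", "MUT_B")],
    "patient_3": [("MUT_A", "MUT_D")],
}

nodes, edges = build_ordering_graph(orderings)
print(nodes["MUT_A"], edges[("MUT_A", "MUT_B")], successors(edges, "MUT_A"))
```

In this toy graph, the early mutation is followed by two different later mutations in different patients, which is the kind of branching that would suggest more than one evolutionary path.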