Interdisciplinary Research


Integrating data sources

Clinical and molecular data are currently stored in many different databases using different semantics and different formats. A new project called DeepLink aims to develop a framework that would make it possible to compare and analyze data across platforms not originally intended to intersect. (Image courtesy of Nicholas Tatonetti.)

Medical doctors and basic biological scientists tend to speak about human health in different languages. Whereas doctors in the clinic focus on phenomena such as symptoms, drug effects, and treatment outcomes, basic scientists often concentrate on activity at the molecular and cellular levels such as genetic alterations, gene expression changes, or protein profiles. Although these various layers are all related physiologically, there is no standard terminology or framework for storing and organizing the different kinds of data that describe them, making it difficult for scientists to systematically integrate and analyze data across different biological scales. Being able to do so, many investigators now believe, could provide a more efficient and comprehensive way to understand and fight disease.

A new project recently launched by Nicholas Tatonetti (Assistant Professor in the Columbia University Departments of Systems Biology and Biomedical Informatics), along with co-principal investigators Chunhua Weng (Department of Biomedical Informatics) and Michel Dumontier (Stanford University), aims to bridge this divide. With the support of a $1.1 million grant from the National Center for Advancing Translational Sciences (NCATS), the scientists have begun to develop a tool they call DeepLink, a data translator that will integrate health-related findings at multiple scales.

As Dr. Tatonetti explains, “We want to close what we call the interoperability gap, a fundamental difference in the language and semantics used to describe the models and knowledge between the clinical and molecular domains. Our goal is to develop a scalable electronic architecture for integrating the enormous multiscale knowledge that is now available.”

Peter Sims & Jinzhou Yuan

Assistant Professor Peter Sims and postdoctoral research scientist Jinzhou Yuan displaying their platform for automated single cell RNA-Seq. Photo: Lynn Saville.

RNA sequencing (RNA-Seq) has become a workhorse technology for research in systems biology. Unlike genome sequencing, which reveals a sample’s DNA blueprint, RNA-Seq catalogs the constantly changing transcriptome; that is, it itemizes and quantifies the complete set of messenger RNA transcripts that are present in cells at a specific time and under specific conditions. In this way, RNA-Seq makes it possible to investigate how the information encoded in the genome is functionally transformed into observable traits, and provides valuable data for defining and comparing different biological states.

Conventional RNA-Seq generates an average summary of mRNA abundance across all of the cells in a sample. Recent research, however, has created a demand for higher resolution technologies capable of generating mRNA profiles at the level of single cells. In cancer biology, for example, there is an increasingly acute awareness that gene expression in the cells that make up malignant tumors is highly heterogeneous. This suggests that in order to understand how the cells work together to drive a tumor’s cancerous behavior, scientists need better methods for characterizing the entire ecology of cells of which it is made. Being able to quantify differences in gene expression cell by cell could be one valuable way to explore such complex environments and understand how they sustain malignancy.
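A toy calculation shows how bulk averaging can hide heterogeneity. The sketch below uses made-up expression values for a single gene in ten hypothetical cells, half expressing it strongly and half barely at all. It is an illustration only, not data from any study. The bulk-style mean lands at a level that no individual cell actually shows.

```python
from statistics import mean

# Hypothetical expression values (arbitrary units) for one gene in ten cells:
# five cells barely express it, five express it strongly.
cells = [0.1, 0.2, 0.0, 0.1, 0.1, 9.8, 10.2, 9.9, 10.1, 10.0]

# What an averaged, bulk-style measurement would report.
bulk_estimate = mean(cells)

# Single-cell view: split cells with a simple threshold (an arbitrary choice here).
THRESHOLD = 5.0
high = [x for x in cells if x >= THRESHOLD]
low = [x for x in cells if x < THRESHOLD]

print(f"bulk mean: {bulk_estimate:.2f}")
print(f"high subpopulation: n={len(high)}, mean={mean(high):.2f}")
print(f"low subpopulation:  n={len(low)}, mean={mean(low):.2f}")
```

Every cell sits more than 4 units away from the bulk mean, so the average describes none of them.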

Although several single cell RNA-Seq technologies have been unveiled in the past two years, they are expensive to operate and are not optimized to produce data on the scale that is required for systems biology research, particularly in tissue specimens with limited numbers of cells. In a new paper just published in the journal Scientific Reports, however, researchers in the laboratory of Department of Systems Biology Assistant Professor Peter Sims describe a novel approach that offers several important advantages over existing methods.

The new, automated platform builds on previous innovations in the Sims Lab to offer a cheap, efficient, and reliable way to simultaneously measure gene expression in thousands of individual cells from a single tissue sample. Using custom-designed microwell plates, microfluidics, temperature control systems, and software, the technology captures, tags, and generates a readout of the complete transcriptome in each cell, providing robust data that can then be analyzed to distinguish functional diversity among the cells in the sample. Already, the technology is playing a key role in several research projects being conducted in the Department of Systems Biology and promises to become even more powerful as the field of single cell genomics continues to evolve.
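The article does not spell out the platform's tagging chemistry. As a hedged sketch, assume (as in many microwell and bead-based protocols) that each sequenced read carries a cell barcode and a unique molecular identifier (UMI). Under that assumption, the step that turns tagged reads into per-cell expression profiles might look like the code below. The read tuples, barcodes, and the `count_matrix` function are hypothetical. Real pipelines also correct sequencing errors in barcodes and UMIs.

```python
from collections import defaultdict

# Hypothetical aligned reads: (cell_barcode, umi, gene).
reads = [
    ("AAAC", "GGTT", "ACTB"),
    ("AAAC", "GGTT", "ACTB"),   # PCR duplicate: same cell, same UMI, same gene
    ("AAAC", "CCAA", "ACTB"),
    ("AAAC", "TTGG", "CD3E"),
    ("GGTC", "ACGT", "ACTB"),
    ("GGTC", "TTAA", "MS4A1"),
]

def count_matrix(reads):
    """Collapse reads to unique molecules, then count molecules per cell and gene."""
    molecules = defaultdict(set)  # (cell, gene) -> distinct UMIs seen
    for cell, umi, gene in reads:
        molecules[(cell, gene)].add(umi)
    counts = defaultdict(dict)    # cell -> {gene: number of molecules}
    for (cell, gene), umis in molecules.items():
        counts[cell][gene] = len(umis)
    return counts

matrix = count_matrix(reads)
for cell in sorted(matrix):
    print(cell, dict(sorted(matrix[cell].items())))
```

The key design choice is to count distinct UMIs rather than raw reads. This keeps PCR duplicates, such as the first two reads above, from inflating a cell's counts.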


In a recent paper published in Molecular Systems Biology, Kam Leong describes a two-compartment microfluidic device that consists of a chamber within which is embedded a "microbial swarmbot" that is isolated by a permeable hydrogel shell. In collaboration with Lingchong You (Duke University), Leong used the device to regulate the dynamics of a population of bacteria containing a genetically engineered switch that reacts to population size. The scale bar in panel 1 represents a length of 250 micrometers.

With a restless curiosity, Kam Leong always seems to be on the lookout for new problems to solve. A versatile biomedical engineer originally trained in chemical engineering, he has developed an impressive array of innovative nanotechnologies that have opened up new opportunities in biomedical research and drug delivery. 

The most widely known of his designs resulted from his work as a postdoc in the laboratory of MIT’s Robert Langer. While there, Leong played a critical role in the development of Gliadel, a controlled-release therapy that uses biodegradable polymer particles to deliver an anticancer drug to a brain tumor site following surgery. Since then his name has appeared on more than 70 patents covering a wide range of inventions, from microfluidics technologies and scaffolds for growing organic tissues to nanoscale fluorescent probes and a method that uses nanoparticles instead of viruses for the oral delivery of gene therapies. These achievements have earned him widespread respect within the engineering community, as evidenced by his 2013 election to both the National Academy of Engineering and the National Academy of Inventors.

Dr. Leong joined Columbia University in 2014. Although his primary affiliation is with the Department of Biomedical Engineering, he was also attracted by the chance to assume an interdisciplinary faculty appointment in the Department of Systems Biology. Since his arrival he has been developing collaborations with several Systems Biology faculty members as well as other scientists at Columbia University Medical Center, and plans are underway for his lab to move into the Lasker Biomedical Research Building to better facilitate interactions with systems biology and clinical investigators. In the following interview, Leong describes why opportunities to interact with scientists in other disciplines are so important to his work, and how the kinds of technologies he has developed could be relevant for systems biology research, as well as for improving treatment of human diseases.

Master regulators of tumor homeostasis

In this rendering, master regulators of tumor homeostasis (white) integrate upstream genetic and epigenetic events (yellow) and regulate downstream genes (purple) responsible for implementing cancer programs such as proliferation and migration. CaST aims to develop systematic methods for identifying drugs capable of disrupting master regulator activity.

The Columbia University Department of Systems Biology has been named one of four inaugural centers in the National Cancer Institute’s (NCI) new Cancer Systems Biology Consortium. This five-year grant will support the creation of the Center for Cancer Systems Therapeutics (CaST), a collaborative research center that will investigate the general principles and functional mechanisms that enable malignant tumors to grow, evade treatment, induce disease progression, and develop drug resistance. Using this knowledge, the Center aims to identify new cancer treatments that target master regulators of tumor homeostasis.

CaST will build on previous accomplishments in the Department of Systems Biology and its Center for Multiscale Analysis of Genomic and Cellular Networks (MAGNet), which developed several key systems biology methods for characterizing the complex molecular machinery underlying cancer. At the same time, however, the new center constitutes a step forward, as it aims to move beyond a static understanding of cancer biology toward a time-dependent framework that can account for the dynamic, ever-changing nature of the disease. This more nuanced understanding could eventually enable scientists to better predict how individual tumors will change over time and in response to treatment.

Andrew Anzalone and Sakellarios Zairis

MD/PhD students Andrew Anzalone and Sakellarios Zairis combined approaches based in chemical biology, synthetic biology, and computational biology to develop a new method for protein engineering.

The ribosome is a reliable machine in the cell, precisely translating the nucleotide code carried by messenger RNAs (mRNAs) into the polypeptide chains that form proteins. But although the ribosome typically reads this code with uncanny accuracy, translation has some unusual quirks. One is a phenomenon called -1 programmed ribosomal frameshifting (-1 PRF). In -1 PRF, the ribosome reaches a specific site on an mRNA, slips back by one nucleotide, and continues reading from there. This hiccup bumps translation “out of frame,” creating a different sequence of three-nucleotide-long codons downstream of the slip. In essence, -1 PRF gives a single gene the unexpected ability to code for two distinct proteins.
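To make the frame shift concrete, here is a minimal sketch. It splits an mRNA into codons, optionally steps back one nucleotide at a chosen codon boundary, and translates the result with the standard genetic code. The sequence, the shift position, and the function names are invented for illustration. Real -1 PRF sites depend on a slippery sequence and downstream RNA structure, which this toy model ignores.

```python
BASES = "UCAG"
# Standard genetic code with codons enumerated in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def codons_with_shift(mrna, shift_at=None):
    """Split an mRNA into codons from its first base. If shift_at is given, the
    reading position steps back one nucleotide there, so all later codons are
    read in the -1 frame."""
    if shift_at is not None and (shift_at <= 0 or shift_at % 3 != 0):
        raise ValueError("shift_at must be a positive codon boundary")
    codons, i, shifted = [], 0, False
    while i + 3 <= len(mrna):
        if i == shift_at and not shifted:
            i -= 1          # the -1 slip
            shifted = True
        codons.append(mrna[i:i + 3])
        i += 3
    return codons

def translate(codons):
    """Translate codons into a one-letter protein sequence, stopping at the first stop codon."""
    protein = []
    for codon in codons:
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

MRNA = "AUGGCUUUAAACGGCUGCAAUGAUAA"  # made-up sequence, not a real -1 PRF site
for label, shift in (("0 frame", None), ("-1 shift", 12)):
    codons = codons_with_shift(MRNA, shift)
    print(f"{label:9}", " ".join(codons), "->", translate(codons))
```

Both products share the residues encoded before the slip (MALN). After the slip, the -1 frame reads CGG, CUG, and CAA, then hits a UGA stop. The result is a shorter protein with a different C-terminal end.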

Recently Andrew Anzalone, an MD/PhD student in the laboratory of Virginia Cornish, set out to explore whether he could take advantage of -1 PRF to engineer cells capable of producing alternate proteins. Together with Sakellarios Zairis, another MD/PhD student in the Columbia University Department of Systems Biology, he developed a pipeline for identifying RNA motifs capable of producing this effect, as well as a method for rationally designing -1 PRF “switches.” These switches, made up of carefully tuned strands of RNA bound to ligand-sensing aptamers, can react to the presence of a specific small molecule and reliably modulate the ratio of two distinct proteins produced from a single mRNA. The technology, they anticipate, could offer a variety of exciting new applications for synthetic biology. A paper describing their approach and findings has been published in Nature Methods.

Staphylococcus epidermidis
Interactions between human cells and the bacteria that inhabit our bodies can affect health. Here, Staphylococcus epidermidis binds to nasal epithelial cells. (Image courtesy of Sheetal Trivedi and Sean Sullivan.)

Launched in 2014 by investigators in the Mailman School of Public Health, the CUMC Microbiome Working Group brings together basic, clinical, and population scientists interested in understanding how the human microbiome (the ecosystems of bacteria that inhabit and interact with our tissues and organs) affects our health. Computational biologists in the Department of Systems Biology have become increasingly involved in this interdepartmental community, contributing expertise in analytical approaches for making sense of the large data sets that microbiome studies generate.

Economic Markets and Biological Markets

Much as countries make and trade goods, microbial cells within bacterial communities exchange metabolites to promote cell growth. This parallel suggests that microbial communities could be studied from the perspective of economics.

An article in the Wall Street Journal reports on a recent collaboration involving Columbia University Department of Systems Biology Assistant Professor Harris Wang and Claremont Graduate University economist Joshua Tasoff that identified some intriguing similarities between economic markets and the exchange of resources among microbes within bacterial communities. 

In an unusual marriage, biology and economics appear to be a match made in heaven.

Four years ago, two former roommates reunited at a friend’s wedding had time to catch up. The first, an economist, asked: “What are you working on?” The second, a biologist, answered: “How microbial communities interact. It’s kind of like in economics.”

And that’s when the intellectual sparks began to fly.

Peter Sims, Sagi Shapira, and Harris Wang

Assistant Professors Peter Sims, Sagi Shapira, and Harris Wang recently moved into a new Department of Systems Biology laboratory space designed to facilitate the development of new technologies for biological and biomedical research. Photo: Lynn Saville.

The Columbia University Department of Systems Biology has opened a new experimental research hub focused on biotechnology development. Occupying one and a half floors in the Mary Woodard Lasker Biomedical Research Building at Columbia University Medical Center, the facility will promote the design and implementation of new experimental methods for the study and engineering of biological systems. It will also enable a substantial expansion of Columbia’s next-generation genome sequencing capabilities.

The first occupants of the new facility are the laboratories of Department of Systems Biology Assistant Professors Sagi Shapira, Peter Sims, and Harris Wang, along with the Genome Sequencing and Analysis Center of the JP Sulzberger Columbia Genome Center. The community is slated to grow, as currently unoccupied space will soon accommodate additional Columbia University faculty labs that are also developing new biotechnologies.

Breast cancer cells

A histological slide of cancerous breast tissue. The pink "riverways" are normal connective tissue while areas stained blue are cancer cells. (Source: National Cancer Institute)

Investigators at Columbia University Medical Center and the Icahn School of Medicine at Mount Sinai have discovered a molecular signaling mechanism that drives a specific type of highly aggressive breast cancer. As reported in a paper in Genes & Development, a team led by Jose Silva and Andrea Califano determined that the gene STAT3 is a master regulator of breast tumors lacking hormone receptors but testing positive for human epidermal growth factor receptor 2 (HR-/HER2+). The researchers also characterized a pathway including IL-6, JAK2, STAT3, and S100A8/9, genes already known to play important roles in the immune response, as essential for the survival of HR-/HER2+ cancer cells. Additional tests showed that disrupting this pathway severely limits the ability of these cells to survive.

These findings are particularly exciting because the pathway the researchers identified contains multiple targets for which FDA-approved drugs already exist. The paper reports that when these drugs were tested in disease models, the cancer cells showed a dramatic response, suggesting promising strategies for the treatment of the HR-/HER2+ cancer subtype. A clinical trial is now underway to investigate the effects of these approaches in humans.


PhenoGraph, a new algorithm developed in Dana Pe'er's laboratory, proved capable of accurately identifying AML stem cells, reducing high-dimensional single cell mass cytometry data to an interpretable two-dimensional graph. Image courtesy of Dana Pe'er.

A key problem that has emerged from recent cancer research has been how to deal with the enormous heterogeneity found among the millions of cells that make up an individual tumor. Scientists now know that not all tumor cells are the same, even within an individual, and that these cells diversify into subpopulations, each of which has unique properties, or phenotypes. Of particular interest are cancer stem cells, which are typically resistant to existing cancer therapies and lead to relapse and recurrence of cancer following treatment. Finding better ways to distinguish cancer stem cells from other subpopulations of cancer cells and to characterize them has therefore become an important goal. Once these cells are identified, their vulnerabilities could be studied with the aim of developing better, longer-lasting cancer therapies.

In a paper just published online in Cell, investigators in the laboratories of Columbia University’s Dana Pe’er and Stanford University’s Garry Nolan describe a new method that takes an important step toward addressing this challenge. As Dr. Pe’er explains, “Biology has come to a point where we suddenly realize there are many more cell types than we ever imagined possible. In this paper, we have created an algorithm that can very robustly identify such subpopulations in a completely automatic and unsupervised way, based purely on high-dimensional single-cell data. This new method makes it possible to discover many new cell subpopulations that we have never seen before.”
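PhenoGraph, the algorithm named in the figure caption, is generally described as working in three steps. It builds a k-nearest-neighbor graph over cells, reweights each edge by how many neighbors the two cells share (Jaccard similarity), and then partitions the graph with the Louvain community-detection method. The sketch below implements the first two steps on hypothetical two-dimensional points. For brevity it swaps Louvain for simple connected components, so it illustrates the graph construction and is not a reimplementation of PhenoGraph. The function names are our own.

```python
import math

def knn_sets(points, k):
    """For each point, the set of indices of its k nearest neighbors (itself excluded)."""
    neighbors = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in ranked[:k]})
    return neighbors

def jaccard_graph(neighbors):
    """Weight each kNN edge (i, j) by |N(i) & N(j)| / |N(i) | N(j)|; drop zero-weight edges."""
    weights = {}
    for i, n_i in enumerate(neighbors):
        for j in n_i:
            n_j = neighbors[j]
            w = len(n_i & n_j) / len(n_i | n_j)
            if w > 0:
                weights[(min(i, j), max(i, j))] = w
    return weights

def connected_components(n, edges):
    """Stand-in for Louvain: label each node by its connected component (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    mapping, labels = {}, []
    for i in range(n):
        labels.append(mapping.setdefault(find(i), len(mapping)))
    return labels

# Hypothetical 2-D "cells": two well-separated groups of four.
cells = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
graph = jaccard_graph(knn_sets(cells, k=3))
labels = connected_components(len(cells), graph)
print(labels)
```

Weighting by shared neighbors, rather than by raw distance, makes an edge strong only when two cells sit in the same dense neighborhood. This is what lets community detection separate subpopulations in high-dimensional data.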

Topology of cancer

The Columbia University Center for Topology of Cancer Evolution and Heterogeneity will combine mathematical approaches from topological data analysis with new single-cell experimental technologies to study cellular diversity in solid tumors. Image courtesy of Raul Rabadan.

The National Cancer Institute’s Physical Sciences in Oncology program has announced the creation of a new center for research and education based at Columbia University. The Center for Topology of Cancer Evolution and Heterogeneity will develop and utilize innovative mathematical and experimental techniques to explore how genetic diversity emerges in the cells that make up solid tumors. In this way it will address a key challenge facing cancer research in the age of precision medicine — how to identify the clonal variants within a tumor that are responsible for its growth, spread, and resistance to therapy. Ultimately, the strategies the Center develops could be used to identify more effective biomarkers of disease and new therapeutic strategies.

Gut bacteria

Photo by David Gregory and Debbie Marshall, Wellcome Images. 

Recent deep sequencing studies are providing an increasingly detailed picture of the genetic composition of the human microbiome, the diverse collection of bacterial species that inhabit the gut. At the same time, however, little is known about the dynamics of these colonies, particularly why certain microbial strains outcompete others in the same environment. In a new paper published in the journal Molecular Systems Biology, Department of Systems Biology Assistant Professor Harris Wang, in collaboration with Georg Gerber and researchers at Harvard University, reports on the development of the first method for using functional metagenomics to identify genes within commensal bacterial genomes that give them an evolutionary fitness advantage.
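The paper's statistics are not described here. A common way to score fitness in this kind of experiment is to sequence a library of cloned genomic fragments before and after selection and compare each clone's relative abundance. The sketch below computes a log2 enrichment with a pseudocount. The clone names and read counts are hypothetical, and this is a generic scoring scheme, not the method used in the study.

```python
import math

def log2_enrichment(before, after, pseudocount=1.0):
    """Log2 change in each clone's relative abundance between two time points.
    The pseudocount keeps clones that drop to zero reads from producing log(0)."""
    clones = sorted(before.keys() | after.keys())
    n = len(clones)
    total_before = sum(before.values()) + pseudocount * n
    total_after = sum(after.values()) + pseudocount * n
    return {
        c: math.log2(((after.get(c, 0) + pseudocount) / total_after)
                     / ((before.get(c, 0) + pseudocount) / total_before))
        for c in clones
    }

# Hypothetical read counts for three library fragments before and after gut colonization.
before = {"frag_A": 100, "frag_B": 100, "frag_C": 100}
after = {"frag_A": 400, "frag_B": 100, "frag_C": 10}
scores = log2_enrichment(before, after)
for clone, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{clone}: {score:+.2f}")
```

Positive scores mark fragments whose carriers became more common relative to the rest of the library. These are the candidates for genes that confer a fitness advantage.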

Expanding the landscape of breast cancer drivers

In comparison with a previous study (Stephens et al., 2012, shown in gray), a new computational approach that focuses on somatic copy number alterations increased the number of known drivers in breast tumors to a median of five for each tumor. The findings could raise the likelihood of finding actionable targets in individual patients with breast cancer.

For many years, researchers have known that somatic copy number alterations (SCNAs), which are insertions, deletions, duplications, and transpositions of sections of DNA that are not inherited but occur after birth, play important roles in causing many types of cancer. Indeed, most recurrent drivers of epithelial tumors are copy number alterations, with some found in up to 40% of patients with specific tumor types. However, because SCNAs occur when entire sections of chromosomes become damaged, biologists have had difficulty developing effective methods for distinguishing genes within SCNAs that actually drive cancer from genes that merely lie near a driver but do not themselves cause disease.

Helios nearly doubled the number of high-confidence predictions of breast cancer drivers.

In a new paper published in Cell, researchers in the laboratories of Dana Pe’er (Columbia University Departments of Systems Biology and Biological Sciences) and Jose Silva (Icahn School of Medicine at Mount Sinai) report on a new computational algorithm that promises to dramatically improve researchers’ ability to identify cancer-driving genes within potentially large SCNAs. Using the algorithm, called Helios, to analyze a combination of genomic data and information generated by functional RNAi screens, the team predicted several dozen new SCNA drivers of breast cancer. In follow-up in vitro experiments, they tested 12 of these predictions, 10 of which were validated in the laboratory. Their findings nearly double the number of known breast cancer drivers, opening many new opportunities for personalized treatment of breast cancer. Their methodology is general and could also be used to locate disease-causing SCNAs in other cancer types.

Leading this effort was Felix Sanchez-Garcia, a recent PhD graduate from the Pe’er Lab and a first author on the paper. The story of how this breakthrough came about illuminates how the interdisciplinary research and education that take place at the Department of Systems Biology can address important challenges facing biological and biomedical research.

Comparing human and mouse prostate cancer networks

Computational synergy analysis depicting FOXM1 and CENPF regulons from the human (left) and mouse (right) interactomes showing shared and nonshared targets. Red corresponds to overexpressed targets and blue to underexpressed targets.

Two genes work together to drive the most lethal forms of prostate cancer, according to new research by investigators in the Columbia University Department of Systems Biology. These findings could lead to a diagnostic test for identifying tumors likely to become aggressive and to the development of novel combination therapies for the disease.

The two genes—FOXM1 and CENPF—had been previously implicated in cancer, but none of the prior studies suggested that they might work synergistically to cause the most aggressive form of prostate cancer. The study was published today in the online issue of Cancer Cell.
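The figure caption describes comparing FOXM1 and CENPF regulons across the human and mouse interactomes in terms of shared and nonshared targets. One generic way to ask whether such an overlap exceeds chance is a hypergeometric test. The sketch below assumes the targets have already been mapped to a common set of gene IDs. The gene IDs, regulon sizes, and 1,000-gene universe are placeholders, and this is not necessarily the statistic used in the study.

```python
from math import comb

def overlap_pvalue(universe_size, set_a, set_b):
    """Hypergeometric upper tail: probability of at least the observed overlap
    if set_b were drawn at random from a universe of universe_size genes."""
    k = len(set_a & set_b)
    n_a, n_b = len(set_a), len(set_b)
    tail = sum(
        comb(n_a, i) * comb(universe_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return k, tail / comb(universe_size, n_b)

# Placeholder gene IDs; these are not the published FOXM1 or CENPF targets.
human_regulon = {f"gene{i}" for i in range(50)}
mouse_regulon = {f"gene{i}" for i in range(30, 70)}
shared, p = overlap_pvalue(1000, human_regulon, mouse_regulon)
print(f"shared targets: {shared}, expected by chance: {50 * 40 / 1000:.1f}, p = {p:.2e}")
```

An overlap of 20 targets where about 2 would be expected by chance yields a very small p-value. A result like that suggests the regulatory program is conserved rather than coincidental.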

Distribution of marker expression across development

A new algorithm called Wanderlust uses single-cell measurements to detect how marker expression changes across development.

In a new paper published in the journal Cell, a team of researchers led by Dana Pe’er at Columbia University and Garry Nolan at Stanford University describes a powerful new method for mapping cellular development at the single cell level. By combining emerging technologies for studying single cells with a new, advanced computational algorithm, they have designed a novel approach for mapping development and created the most comprehensive map ever made of human B cell development. Their approach will greatly improve researchers’ ability to investigate development in cells of all types, make it possible to identify rare aberrations in development that lead to disease, and ultimately help to guide the next generation of research in regenerative medicine.

Pointing out why being able to generate these maps is an important advance, Dr. Pe’er, an associate professor in the Columbia University Department of Systems Biology and Department of Biological Sciences, explains, “There are so many diseases that result from malfunctions in the molecular programs that control the development of our cell repertoire and so many rare, yet important, regulatory cell types that we have yet to discover. We can only truly understand what goes wrong in these diseases if we have a complete map of the progression in normal development. Such maps will also act as a compass for regenerative medicine, because it’s very difficult to grow something if you don’t know how it develops in nature. For the first time, our method makes it possible to build a high-resolution map, at the single cell level, that can guide these kinds of research.”
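Wanderlust is described as ordering cells along a developmental trajectory using single-cell measurements. Its core idea, as we understand it, is to connect each cell to its nearest neighbors. Shortest-path distance from a user-chosen early cell then serves as a measure of developmental progress. The sketch below implements only that core, on ten hypothetical cells scattered along a curve. The published algorithm adds refinements (such as waypoints and averaging over many randomized graphs) that are omitted here, and the function names are our own.

```python
import heapq
import math

def knn_graph(points, k):
    """Undirected k-nearest-neighbor graph: node -> {neighbor: Euclidean distance}."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        for d, j in nearest:
            graph[i][j] = d
            graph[j][i] = d  # symmetrize so the graph is undirected
    return graph

def pseudotime(graph, start):
    """Dijkstra shortest-path distance from the start cell to every reachable cell."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            candidate = d + w
            if candidate < dist.get(v, math.inf):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist

# Hypothetical cells on a curve, listed in scrambled order; true_t is their hidden position.
true_t = [3, 7, 0, 9, 5, 1, 8, 2, 6, 4]
cells = [(t, t * t / 10) for t in true_t]
times = pseudotime(knn_graph(cells, k=2), start=true_t.index(0))
recovered = [true_t[i] for i in sorted(times, key=times.get)]
print(recovered)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The graph links only nearby cells, so path length follows the curve instead of cutting across it. This is why shortest paths, rather than straight-line distances, are used to order cells.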

Department of Systems Biology Symposium

On Thursday, October 17, more than 200 attendees filled the Hammer Health Sciences Center auditorium to celebrate the creation of the new Columbia University Department of Systems Biology. The event featured a keynote address by pioneering systems and synthetic biologist James Collins, as well as talks from more than a dozen Department faculty members and other collaborating investigators that spotlighted the wide range of research in computational and systems biology being pursued at Columbia.


viSNE reveals the progression of cancer in a sample of cells taken from a patient with acute myeloid leukemia. Cells are colored according to intensity of expression of the indicated cell markers, enabling the comparison of expression patterns before and after relapse. For example, Flt3 is expressed primarily in the diagnosis sample, while CD34 emerges in the relapse sample.

Researchers in the Columbia Initiative in Systems Biology have developed a computational method that enables scientists to visualize and interpret high-dimensional data produced by single-cell measurement technologies such as mass cytometry. The method, called viSNE (visual interactive Stochastic Neighbor Embedding), has just been published in the online edition of Nature Biotechnology. It has particular relevance to cancer research and therapeutics. As Columbia University Medical Center reports:

Researchers now understand that cancer within an individual can harbor subpopulations of cells with different molecular characteristics. Groups of cells may behave differently from one another, including in how they respond to treatment. The ability to study single cells, as well as to identify and characterize subpopulations of cancerous cells within an individual, could lead to more precise methods of diagnosis and treatment.

“Our method not only will allow scientists to explore the heterogeneity of cancer cells and to characterize drug-resistant cancer cells, but also will allow physicians to track tumor progression, identify drug-resistant cancer cells, and detect minute quantities of cancer cells that increase the risk of relapse,” said co-senior author Dana Pe’er, associate professor of biological sciences and systems biology at Columbia.

Barry Honig

When Columbia University founded the Center for Multiscale Analysis of Genomic and Cellular Networks (MAGNet) in 2005, one of its goals was to integrate the methods of structural biology with those of systems biology. Considering protein structure within the context of computational models of cellular networks, researchers hoped, would not only improve the predictive value of their models by giving another layer of evidence, but also lead to new types of predictions that could not be made using other methods.

In a new paper published in the journal Nature, Barry Honig, Andrea Califano, and other members of the Columbia Initiative in Systems Biology, including first authors Qiangfeng Cliff Zhang and Donald Petrey, report that this goal has now been realized. For the first time, the researchers have shown that information about protein structure can be used to make predictions about protein-protein interactions on a genome-wide scale. Their approach capitalizes on innovative techniques in computational structural biology that the Honig lab has developed over the last 15 years, culminating in the development of a new algorithm called Predicting Protein-Protein Interactions (PrePPI). In this interview, Honig describes the evolution of this new approach, and what it could mean for the future of systems biology.
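This summary does not describe how PrePPI works internally. As a generic illustration of how an extra layer of evidence, such as a structure-based score, can sharpen a prediction, the sketch below combines likelihood ratios using Bayes' rule in odds form. The prior and the likelihood ratios are made-up numbers. Treating the evidence sources as independent (a naive Bayes assumption) is our simplification and does not describe the published method.

```python
def posterior_probability(prior, likelihood_ratios):
    """Bayes' rule in odds form, assuming independent evidence sources:
    posterior odds = prior odds * product of likelihood ratios."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)

# Made-up numbers: a low prior that two proteins interact, a likelihood ratio of 50
# from a hypothetical structure-based score, and 4 from hypothetical co-expression data.
prior = 0.001
print(f"prior only:           {prior:.4f}")
print(f"structure only:       {posterior_probability(prior, [50]):.4f}")
print(f"structure + coexpr.:  {posterior_probability(prior, [50, 4]):.4f}")
```

In odds form, each independent line of evidence contributes a single multiplicative factor. This makes it simple to add a new source such as structural modeling on top of existing ones.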