News

Harris Wang

Harris Wang has been named a recipient of the prestigious Presidential Early Career Award for Scientists and Engineers (PECASE). Dr. Wang is among 102 researchers recognized today by President Barack Obama as the newest recipients of this honor.

The PECASE is considered the United States’ highest award for young scientists and engineers, conferred annually at the White House on the recommendation of participating federal agencies. The award celebrates young researchers at the beginning of their independent research careers who show exceptional promise to lead at the frontiers of twenty-first-century science and technology.

Integrating data sources

Clinical and molecular data are currently stored in many different databases using different semantics and different formats. A new project called DeepLink aims to develop a framework that would make it possible to compare and analyze data across platforms not originally intended to intersect. (Image courtesy of Nicholas Tatonetti.)

Medical doctors and basic biological scientists tend to speak about human health in different languages. Whereas doctors in the clinic focus on phenomena such as symptoms, drug effects, and treatment outcomes, basic scientists often concentrate on activity at the molecular and cellular levels such as genetic alterations, gene expression changes, or protein profiles. Although these various layers are all related physiologically, there is no standard terminology or framework for storing and organizing the different kinds of data that describe them, making it difficult for scientists to systematically integrate and analyze data across different biological scales. Being able to do so, many investigators now believe, could provide a more efficient and comprehensive way to understand and fight disease.

A new project recently launched by Nicholas Tatonetti (Assistant Professor in the Columbia University Departments of Systems Biology and Biomedical Informatics), along with co-principal investigators Chunhua Weng (Department of Biomedical Informatics) and Michel Dumontier (Stanford University), aims to bridge this divide. With the support of a $1.1 million grant from the National Center for Advancing Translational Sciences (NCATS), the scientists have begun to develop a tool they call DeepLink, a data translator that will integrate health-related findings at multiple scales.

As Dr. Tatonetti explains, “We want to close what we call the interoperability gap, a fundamental difference in the language and semantics used to describe the models and knowledge between the clinical and molecular domains. Our goal is to develop a scalable electronic architecture for integrating the enormous multiscale knowledge that is now available.”

Regulators of mesenchymal GBM subtype

An example of tumor oncotecture. Transcription factors involved in the activation of the mesenchymal glioblastoma subtype are shown in purple. Together, they comprise a tightly knit tumor checkpoint, controlling 74% of the genes in the mesenchymal signature of high-grade glioma. CEBP (both β and δ subunits) and STAT3 regulate the other three transcription factors in the tumor checkpoint, synergistically regulating the state of mesenchymal GBM cells. (Image: Nature Reviews Cancer)

In a detailed Perspective article published in Nature Reviews Cancer, Department of Systems Biology chair Andrea Califano and research scientist Mariano Alvarez (DarwinHealth) summarize more than a decade of work to propose the existence of a universal, tumor-independent “oncotecture” that consistently defines cancer at the molecular level. Their findings, they argue, indicate that identifying and targeting highly conserved, essential proteins called master regulators, rather than the widely diverse genetic and epigenetic alterations that initiate cancer and have been the focus of much cancer research, could offer an effective way to classify and treat disease.

As coverage of the paper in The Economist reports:

ONE of the most important medical insights of recent decades is that cancers are triggered by genetic mutations. Cashing that insight in clinically, to improve treatments, has, however, been hard. A recent study of 2,600 patients at the M.D. Anderson Cancer Centre in Houston, Texas, showed that genetic analysis permitted only 6.4% of those suffering to be paired with a drug aimed specifically at the mutation deemed responsible. The reason is that there are only a few common cancer-triggering mutations, and drugs to deal with them. Other triggering mutations are numerous, but rare—so rare that no treatment is known nor, given the economics of drug discovery, is one likely to be sought. 

Facts such as these have led many cancer biologists to question how useful the gene-led approach to understanding and treating cancer actually is. And some have gone further than mere questioning. One such is Andrea Califano of Columbia University, in New York. He observes that, regardless of the triggering mutation, the pattern of gene expression—and associated protein activity—that sustains a tumour is, for a given type of cancer, almost identical from patient to patient. That insight provides the starting-point for a different approach to looking for targets for drug development. In principle, it should be simpler to interfere with the small number of proteins that direct a cancer cell’s behaviour than with the myriad ways in which that cancer can be triggered in the first place. (Read full article.)

Department of Systems Biology bioengineer Harris Wang describes the goals of the Human Genome Project - Write (HGP-write), an international initiative to develop new technologies for synthesizing very large genomes from scratch. 

In June 2016, a consortium of synthetic biologists, industry leaders, ethicists, and others published a proposal in Science calling for a coordinated effort to synthesize large genomes, including a complete human genome in cell lines. The organizers of the project, called GP-write (for work in model organisms and plants) or sometimes HGP-write (for work in human cell lines), envision it as a successor to the Human Genome Project (retroactively termed HGP-read), which 25 years ago promoted rapid advances in DNA sequencing technology. As the ability to read the genome became more efficient and less expensive, it in turn enabled a revolution in how we study biology and attempt to improve human health. Now, by coordinating the development of new technologies for writing DNA on a whole-genome scale, GP-write aims to have a similarly transformative impact.

Among the paper’s authors were Virginia Cornish and Harris Wang, two members of the Columbia University Department of Systems Biology whose contributions to the field of engineering biology have in part made the idea of writing large-scale DNA sequences imaginable. We spoke with them to learn more about what GP-write hopes to accomplish, its potential benefits, and how the effort is evolving.

PrePPI inputs
PrePPI predicts the likelihood that two proteins A and B are capable of interacting based on their similarities to other proteins that are known to interact. This requires integrating structural data (green) as well as other kinds of information (blue), such as evidence of protein co-activity in other species as well as involvement in similar cellular functions. PrePPI now offers a searchable database of unprecedented scope, constituting a virtual interactome of all proteins in human cells. (Image courtesy of eLife.) 

The molecular machinery within every living cell includes enormous numbers of components functioning at many different levels. Features like genome sequence, gene expression, proteomic profiles, and chromatin state are all critical in this complex system, but studying a single level is often not enough to explain why cells behave the way they do. For this reason, systems biology strives to integrate different types of data, developing holistic models that more comprehensively describe networks of interactions that give rise to biological traits. 

Although the concept of an interaction network can seem abstract, at its foundation each interaction is a physical event that takes place when two proteins encounter one another, bind, and cause a change that affects a cell’s activity. In order for this to take place, however, they need to have compatible shapes and physical properties. Being able to predict the entire universe of possible pairwise protein-protein interactions could therefore be immensely valuable to systems biology, as it could both offer a framework for interpreting the feasibility of interactions proposed by other methods and potentially reveal unique features of networks that other approaches might miss. 

In a 2012 paper in Nature, scientists in the laboratory of Barry Honig first presented a landmark algorithm and database they call PrePPI (Predicting Protein-Protein Interactions). At the time, PrePPI used a novel computational strategy that deployed concepts from structural biology to predict approximately 300,000 protein-protein interactions, a dramatic increase in the number of available interactions when compared with experimentally generated resources.

Since then, the Honig Lab has been working hard to improve PrePPI’s scope and usefulness. In a paper recently published in eLife, they now report on some impressive developments. With enhancements to their algorithm and the incorporation of several new types of data into its analysis, the PrePPI database now contains more than 1.35 million predictions of protein-protein interactions, covering about 85% of the entire human proteome. This makes it the largest resource of its kind. In parallel with these improvements, the investigators have also begun to apply PrePPI in new ways, using the information it contains to provide new kinds of insights into the organization and function of protein interaction networks.