Electronic Health Records


Integrating data sources

Clinical and molecular data are currently stored in many different databases using different semantics and different formats. A new project called DeepLink aims to develop a framework that would make it possible to compare and analyze data across platforms not originally intended to intersect. (Image courtesy of Nicholas Tatonetti.)

Medical doctors and basic biological scientists tend to speak about human health in different languages. Whereas doctors in the clinic focus on phenomena such as symptoms, drug effects, and treatment outcomes, basic scientists often concentrate on activity at the molecular and cellular levels such as genetic alterations, gene expression changes, or protein profiles. Although these various layers are all related physiologically, there is no standard terminology or framework for storing and organizing the different kinds of data that describe them, making it difficult for scientists to systematically integrate and analyze data across different biological scales. Being able to do so, many investigators now believe, could provide a more efficient and comprehensive way to understand and fight disease.

A new project recently launched by Nicholas Tatonetti (Assistant Professor in the Columbia University Departments of Systems Biology and Biomedical Informatics), along with co-principal investigators Chunhua Weng (Department of Biomedical Informatics) and Michel Dumontier (Stanford University), aims to bridge this divide. With the support of a $1.1 million grant from the National Center for Advancing Translational Sciences (NCATS), the scientists have begun to develop a tool they call DeepLink, a data translator that will integrate health-related findings at multiple scales.

As Dr. Tatonetti explains, “We want to close what we call the interoperability gap, a fundamental difference in the language and semantics used to describe the models and knowledge between the clinical and molecular domains. Our goal is to develop a scalable electronic architecture for integrating the enormous multiscale knowledge that is now available.”

Nicholas Tatonetti
Nicholas Tatonetti is an assistant professor in the Department of Biomedical Informatics and Department of Systems Biology.

A team of Columbia University Medical Center (CUMC) scientists led by Nicholas Tatonetti has identified several drug combinations that may lead to a potentially fatal type of heart arrhythmia known as torsades de pointes (TdP). The key to the discovery was a new bioinformatics pipeline called DIPULSE (Drug Interaction Prediction Using Latent Signals and EHRs), which builds on previous methods Tatonetti developed for identifying drug-drug interactions (DDIs) in observational data sets. The results are reported in a new paper in the journal Drug Safety and are covered in a detailed multimedia feature published by the Chicago Tribune.

The algorithm mined data contained in the US FDA Adverse Event Reporting System (FAERS) to identify latent signals of DDIs that cause QT interval prolongation, a disturbance in the electrical cycle that coordinates the heartbeat. It then validated these predictions by looking for their signatures in electrocardiogram results contained in a large collection of electronic health records at Columbia. Interestingly, the drugs the investigators identified do not cause the condition on their own, but only when taken in specific combinations.
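To make the idea of a combination-only signal concrete, here is a minimal sketch of one simple check: compare how often an event is mentioned in reports listing both drugs with how often it appears in reports listing only one of them. This is a plain reporting-proportion comparison, not the DIPULSE method, which is described here only as mining latent signals. The report structure, the drug and event names, and the function name are all hypothetical, and the toy reports are made up.

```python
from typing import Iterable, NamedTuple


class Report(NamedTuple):
    """One hypothetical adverse event report: drugs listed and events mentioned."""
    drugs: frozenset
    events: frozenset


def event_proportions(reports: Iterable[Report], drug_a: str, drug_b: str, event: str) -> dict:
    """Return the fraction of reports mentioning `event` in three disjoint groups:
    reports listing both drugs, only drug_a, and only drug_b."""
    counts = {"both": [0, 0], "a_only": [0, 0], "b_only": [0, 0]}  # [with event, total]
    for r in reports:
        has_a, has_b = drug_a in r.drugs, drug_b in r.drugs
        if has_a and has_b:
            key = "both"
        elif has_a:
            key = "a_only"
        elif has_b:
            key = "b_only"
        else:
            continue  # report mentions neither drug
        counts[key][1] += 1
        if event in r.events:
            counts[key][0] += 1
    # Groups with no reports get None rather than a misleading 0.0
    return {k: (hit / total if total else None) for k, (hit, total) in counts.items()}


# Toy, made-up reports purely for illustration (not real FAERS data)
reports = [
    Report(frozenset({"drug_x", "drug_y"}), frozenset({"qt_prolongation"})),
    Report(frozenset({"drug_x", "drug_y"}), frozenset({"qt_prolongation"})),
    Report(frozenset({"drug_x", "drug_y"}), frozenset({"nausea"})),
    Report(frozenset({"drug_x"}), frozenset({"nausea"})),
    Report(frozenset({"drug_x"}), frozenset({"headache"})),
    Report(frozenset({"drug_y"}), frozenset({"rash"})),
    Report(frozenset({"drug_z"}), frozenset({"qt_prolongation"})),
]
props = event_proportions(reports, "drug_x", "drug_y", "qt_prolongation")
print(props)
```

A much higher proportion in the "both" group than in either single-drug group is the kind of pattern that would merit follow-up. On its own it is not evidence of causation, which is why the study went on to check its predictions against electrocardiogram results in health records.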

Previously, no reliable methods existed for identifying these kinds of combinations. Although the findings are preliminary, the retrospective confirmation of many of DIPULSE’s predictions in actual patient data suggests its effectiveness, and the investigators plan to test them experimentally in the near future.

The Department of Systems Biology and Center for Computational Biology and Bioinformatics are pleased to announce that three Columbia University faculty members have recently joined our community. Kam Leong, the Samuel Y. Sheng Professor of Biomedical Engineering at Columbia University, is now an interdisciplinary faculty member in the Department of Systems Biology. In addition, Yaniv Erlich and Guy Sella are now members of the Center for Computational Biology and Bioinformatics (C2B2). Their addition to the Department and to C2B2 will bring new expertise that will benefit our research and education activities, incorporating perspectives from fields such as nanotechnology, bioinformatics, and evolutionary genomics.

Monthly disease risk

Columbia scientists used electronic records of 1.7 million New York City patients to map the statistical relationship between birth month and disease incidence. Image courtesy of Nicholas Tatonetti.

Columbia University Medical Center reports on a new study in the Journal of the American Medical Informatics Association led by Nicholas Tatonetti, an assistant professor in the Department of Systems Biology.

Comorbidity between Mendelian disease and cancer
Researchers in the Rabadan Lab have found that comorbidity between Mendelian diseases and cancer may result from shared genetic factors.

Genetic diseases can arise in a variety of ways. Mendelian disorders, for example, occur when specific mutations in single genes — called germline mutations — are inherited from either of one’s two parents. Well-known examples of Mendelian diseases include cystic fibrosis, sickle cell disease, and Duchenne muscular dystrophy. Other genetic diseases, including cancer, result from somatic mutations, which occur in individual cells during a person’s lifetime. Because the genetic origins of Mendelian diseases and cancer are so different, they are typically understood to be distinct phenomena. However, scientists in the Columbia University Department of Systems Biology have found evidence that there might be interesting genetic connections between them. 

In a paper just published in Nature Communications, postdoctoral research scientist Rachel Melamed and colleagues in the laboratory of Associate Professor Raul Rabadan report on a new method that uses knowledge about Mendelian diseases to suggest mutations involved in cancer. The study takes advantage of an enormous collection of electronic health records representing over 110 million patients, a substantial percentage of US residents. The authors show that clinical co-occurrence of Mendelian diseases and cancer, known as comorbidity, can be tied to genetic changes that play roles in both diseases. The paper also identifies several specific relationships between Mendelian diseases and the cancers melanoma and glioblastoma.

Some factors in the exposome

The exposome incorporates factors such as the environment we inhabit, the food we eat, and the drugs we take.

Although genomics has dramatically improved our understanding of the molecular origins of certain human genetic diseases, our health is also influenced by exposures to our surrounding environment. Molecules found in food, air and water pollution, and prescription drugs, for example, interact with genetic, molecular, and physiologic features within our bodies in highly personalized ways. The nature of these relationships is important in determining who is immune to such exposures and who becomes sick because of them.

In the past, methods for studying this interface have been limited because of the complexity of the problem. After all, how could we possibly cross-reference a lifetime’s worth of exposures with individual genetic profiles in any kind of meaningful way? Recently, however, an explosion in the generation of quantitative data related to the environment, health, and genetics, along with new computational methods based in machine learning and bioinformatics, has made this landscape ripe for exploration.

At this year’s South by Southwest Interactive Festival in Austin, Texas, Department of Systems Biology Assistant Professor Nicholas Tatonetti and his collaborator Chirag Patel (Harvard Medical School) discussed the remarkable new opportunities that “big data” approaches offer for investigating this landscape. Driving Tatonetti and Patel’s approach is a concept called the exposome. First proposed by Christopher Wild (University of Leeds) in 2005, an exposome represents all of the environmental exposures a person has experienced during his or her life that could play a role in the onset of chronic diseases. Tatonetti and Patel’s presentation highlighted how investigation of the exposome has become tractable, as well as the important roles that individuals can play in supporting this effort.

In the following interview, Dr. Tatonetti discusses some of the approaches his team is using to explore the exposome, and how the project has evolved out of his previous research.

Autism Spectrum Disorders Genetic Network

Network of autism-associated genes. (Credit: Dennis Vitkup)

The following article is reposted with permission from the Columbia University Medical Center Newsroom. Find the original here.

People with autism have a wide range of symptoms, with no two people sharing the exact type and severity of behaviors. Now a large-scale analysis of hundreds of patients and nearly 1000 genes has started to uncover how diversity among traits can be traced to differences in patients’ genetic mutations. The study, conducted by researchers at Columbia University Medical Center, was published Dec. 22 in the journal Nature Neuroscience.

Autism researchers have identified hundreds of genes that, when mutated, likely increase the risk of developing autism spectrum disorder (ASD). Much of the variability among people with ASD is thought to stem from the diversity of underlying genetic changes, including the specific genes mutated and the severity of the mutation.

“If we can understand how different mutations lead to different features of ASD, we may be able to use patients’ genetic profiles to develop accurate diagnostic and prognostic tools and perhaps personalize treatment,” said senior author Dennis Vitkup, PhD, associate professor of systems biology and biomedical informatics at Columbia University’s College of Physicians & Surgeons.

Searches for hyperglycemia-related terms

Percentage of users in each of the three user groups searching for hyperglycemia-related terms, computed per week over 12 months of search log data. Background refers to the fraction of all searchers who search for hyperglycemia-related symptoms or terminology independent of the presence of the drugs in the users’ search histories.

Although the US Food and Drug Administration and other agencies collect and analyze reports on adverse drug effects, alerts for single drugs and drug-drug interactions are often delayed due to the time it takes to accumulate evidence. Columbia University Department of Systems Biology faculty member Nicholas Tatonetti, in collaboration with investigators at Stanford University and Microsoft Research, hypothesized that Internet users can provide early clues of adverse drug events as they seek information on the web concerning symptoms they are experiencing. A new paper explains their results.
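The per-week percentages described in the figure caption above could be computed roughly as follows. This is a minimal sketch under stated assumptions: the log layout (user, group, week, query), the group labels, and the search terms are hypothetical stand-ins, not the study's actual data format or term list.

```python
from collections import defaultdict


def weekly_search_percentages(log, terms):
    """For each (group, week), return the percentage of distinct users in that
    group and week who issued at least one query containing any of `terms`.

    `log` is an iterable of (user_id, group, week, query) tuples; this layout
    is an assumption for illustration, not the study's actual log format.
    """
    active = defaultdict(set)    # (group, week) -> users seen that week
    matched = defaultdict(set)   # (group, week) -> users with a matching query
    terms = [t.lower() for t in terms]
    for user_id, group, week, query in log:
        key = (group, week)
        active[key].add(user_id)
        if any(t in query.lower() for t in terms):
            matched[key].add(user_id)
    return {key: 100.0 * len(matched[key]) / len(users) for key, users in active.items()}


# Toy, made-up search log purely for illustration
log = [
    ("u1", "both_drugs", 1, "high blood sugar symptoms"),
    ("u2", "both_drugs", 1, "weather tomorrow"),
    ("u3", "one_drug", 1, "movie times"),
    ("u3", "one_drug", 1, "blurry vision causes"),
    ("u4", "one_drug", 1, "recipes"),
    ("u1", "both_drugs", 2, "news"),
]
terms = ["high blood sugar", "blurry vision"]  # illustrative terms only
percentages = weekly_search_percentages(log, terms)
print(percentages)
```

Counting distinct users rather than individual queries keeps a single heavy searcher from inflating a group's percentage. The same function applied to all users, regardless of drug history, would give a background rate like the one the caption describes.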