Electronic Health Records

News

Nicholas Tatonetti, PhD

Nicholas Tatonetti, PhD, solves problems. He has always enjoyed it, and as the informatics community has discovered, he is both creative and proficient in his methods.

Dr. Tatonetti, who was recently awarded tenure and promoted to the rank of Associate Professor in the Columbia Department of Biomedical Informatics (DBMI) and Department of Systems Biology, focuses on the use of advanced data science methods, including artificial intelligence and machine learning, to investigate medication safety. Using emerging resources, such as electronic health records (EHR) and genomics databases, his lab is working to identify for whom drugs will be safe and effective and for whom they will not.

His path to Columbia wasn’t a traditional one, but that fits his work. Since joining in 2012, Dr. Tatonetti has used non-traditional thinking to benefit both health and healthcare.

Using both data mining of medical records and prospective lab experiments, Dr. Tatonetti created a methodology for both finding and validating adverse drug reactions and drug-drug interactions. During a two-year collaboration with Pulitzer Prize-winning journalist Sam Roe of the Chicago Tribune, Dr. Tatonetti discovered that the drugs ceftriaxone and lansoprazole, when taken together, induce an arrhythmia in the heart.

The data mining identified adverse effects, while the lab experiments established causality. Dr. Tatonetti wasn’t specifically looking for a negative reaction of those particular drugs; he had no reason to suspect them.

“We are able to find things that nobody expects to happen because the world of hypotheses we consider is basically everything,” he said. “We consider every possible combination, a type of analysis that would be impossible without a huge data set and significant computational power.”
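
To give a sense of the scale behind "every possible combination," the short sketch below counts unordered drug pairs. The drug list and the catalog size of 1,000 are illustrative assumptions, not figures from Dr. Tatonetti's analysis.

```python
from itertools import combinations
from math import comb

# Hypothetical drug list, for illustration only.
drugs = ["ceftriaxone", "lansoprazole", "paroxetine", "pravastatin"]

# Every unordered pair of drugs is one candidate hypothesis to test.
pairs = list(combinations(drugs, 2))

# The number of pairs grows quadratically with the size of the catalog.
# 1,000 is an assumed catalog size, not a number from the study.
pairs_for_1000_drugs = comb(1000, 2)
```

Because the pair count grows roughly with the square of the number of drugs, even a modest catalog produces far more hypotheses than could be reviewed by hand.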

Phyllis Thangaraj, MD/PhD student (Tatonetti lab)

Aspiring physician-scientists from Columbia's Vagelos College of Physicians and Surgeons presented their research posters at the 14th annual MD-PhD Student Research Symposium on April 25. Their research delved into a range of topics, including Alzheimer’s disease, stroke, and stem cells. The event included a guest lecture by an alumna about her own career path as a physician-scientist, and culminated in the poster session judged by MD-PhD alumni who currently work at the University. Department of Systems Biology’s Phyllis Thangaraj, an MD/PhD student in the Nicholas Tatonetti lab, was named one of five poster winners at the event.

She presented work on applying machine learning methods to phenotype acute ischemic stroke patients in the electronic health records. In cohort research studies, it is essential to identify a large number of subjects in an accurate and efficient manner, but often this requires time-consuming manual review of patient charts. 

“We applied machine learning methods to data within a patient’s electronic health records to develop a high-throughput way to define research cohorts,” explains Thangaraj. “Our test case is in acute ischemic stroke. We extracted clues within a person’s medical record that required minimal data processing to classify those who have had a stroke. In a separate cohort, the UK Biobank, we were able to use our model to identify patients with self-reported stroke but no mention in their medical data with 65-fold better precision than random selection of patients.” Although stroke was the test case in this particular work, she explained that their workflow could be applied to identify patients for cohorts of other diseases, particularly when the dataset has missing data. 
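
A "fold better precision than random selection" figure can be read as the ratio of a model's precision to the prevalence of the condition, since random selection yields, on average, a precision equal to the prevalence. The sketch below illustrates that arithmetic only. The function names and all numbers are made up for illustration and are not taken from Thangaraj's study.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of flagged patients who truly have the condition."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no patients were flagged")
    return true_positives / flagged


def fold_improvement(model_precision: float, prevalence: float) -> float:
    """Ratio of model precision to the precision of random selection.

    Picking patients at random yields, on average, a precision equal
    to the prevalence of the condition in the cohort.
    """
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return model_precision / prevalence


# Made-up numbers, chosen only to keep the arithmetic easy to follow.
example_precision = precision(true_positives=30, false_positives=70)
example_fold = fold_improvement(example_precision, prevalence=0.02)
```

With these toy inputs the model is right about 30 of every 100 flagged patients, while random selection would be right about 2 of every 100, so the model is roughly 15 times more precise.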

Tatonetti Heritability Image

Each subgraph in this image is a family reconstructed from EHR data: Each node represents an individual and the colors represent different health conditions. (Figure: Nicholas Tatonetti, PhD, Columbia University Vagelos College of Physicians and Surgeons).

Acne is highly heritable, passed down through families via genes, but anxiety appears more strongly linked to environmental causes, according to a new study that analyzed data from millions of electronic health records to estimate the heritability of hundreds of different traits and conditions. 

As reported by the Columbia Newsroom, the findings, published in Cell by researchers at Columbia University Irving Medical Center and NewYork-Presbyterian, could streamline efforts to understand and mitigate disease risk, especially for diseases with no known disease-associated genes.

“Knowledge of a condition’s heritability (how much the condition’s variability can be attributed to genes) is essential for understanding the biological causes of the disease and for precision medicine,” says study co-leader Nicholas Tatonetti, PhD, the Herbert Irving Assistant Professor of Biomedical Informatics at Columbia University Vagelos College of Physicians and Surgeons and an assistant professor of systems biology. “It is clinically useful for estimating disease risk, customizing treatment, and tailoring patient care.”

Nicholas P. Tatonetti, PhD, has recently been named director of clinical informatics at the Institute for Genomic Medicine (IGM) at Columbia University Medical Center. In this new role, he is charged with planning, organizing, directing and evaluating all clinical informatics efforts across the Institute. In particular, he will focus on the integration of electronic health record data for use in genetics and genomics studies.

Dr. Tatonetti, who is Herbert Irving Assistant Professor of Biomedical Informatics with an interdisciplinary appointment in the Department of Systems Biology, specializes in advancing the application of data science in biology and health science. Researchers in his lab integrate their medical observations with systems and chemical biology models to not only explain drug effects, but also further understanding of basic biology and human disease. They also focus on integration of high-throughput data capture technologies, such as next-generation genome and transcriptome sequencing, metabolomics, and proteomics, with the electronic medical record to study the complex interplay between genetics, environment, and disease.

At the Institute for Genomic Medicine, researchers are focused on innovative approaches to genomic medicine. Their multi-tiered approach utilizes large-scale genomic sequencing and analysis, paired with functional biology, to advance the diagnosis, characterization, and treatment of genetic diseases. IGM is playing a critical role in Columbia’s overall Precision Medicine Initiative, a major University-wide effort to provide medical diagnosis, prevention and treatment based on an individual’s variation in genes, environment, and lifestyle.

Dr. Tatonetti, who joined Columbia in 2012, is also affiliated with the Center for Computational Biology and Bioinformatics, the Department of Medicine, the Department of Biomedical Informatics, and the Center for Cancer Systems Therapeutics.

Integrating data sources

Clinical and molecular data are currently stored in many different databases using different semantics and different formats. A new project called DeepLink aims to develop a framework that would make it possible to compare and analyze data across platforms not originally intended to intersect. (Image courtesy of Nicholas Tatonetti.)

Medical doctors and basic biological scientists tend to speak about human health in different languages. Whereas doctors in the clinic focus on phenomena such as symptoms, drug effects, and treatment outcomes, basic scientists often concentrate on activity at the molecular and cellular levels such as genetic alterations, gene expression changes, or protein profiles. Although these various layers are all related physiologically, there is no standard terminology or framework for storing and organizing the different kinds of data that describe them, making it difficult for scientists to systematically integrate and analyze data across different biological scales. Being able to do so, many investigators now believe, could provide a more efficient and comprehensive way to understand and fight disease.

A new project recently launched by Nicholas Tatonetti (Assistant Professor in the Columbia University Departments of Systems Biology and Biomedical Informatics), along with co-principal investigators Chunhua Weng (Department of Biomedical Informatics) and Michel Dumontier (Stanford University), aims to bridge this divide. With the support of a $1.1 million grant from the National Center for Advancing Translational Sciences (NCATS), the scientists have begun to develop a tool they call DeepLink, a data translator that will integrate health-related findings at multiple scales.

As Dr. Tatonetti explains, “We want to close what we call the interoperability gap, a fundamental difference in the language and semantics used to describe the models and knowledge between the clinical and molecular domains. Our goal is to develop a scalable electronic architecture for integrating the enormous multiscale knowledge that is now available.”

Nicholas Tatonetti is an assistant professor in the Department of Biomedical Informatics and Department of Systems Biology.

A team of Columbia University Medical Center (CUMC) scientists led by Nicholas Tatonetti has identified several drug combinations that may lead to a potentially fatal type of heart arrhythmia known as torsades de pointes (TdP). The key to the discovery was a new bioinformatics pipeline called DIPULSE (Drug Interaction Prediction Using Latent Signals and EHRs), which builds on previous methods Tatonetti developed for identifying drug-drug interactions (DDIs) in observational data sets. The results are reported in a new paper in the journal Drug Safety and are covered in a detailed multimedia feature published by the Chicago Tribune.

The algorithm mined data contained in the US FDA Adverse Event Reporting System (FAERS) to identify latent signals of DDIs that cause QT interval prolongation, a disturbance in the electrical cycle that coordinates the heartbeat. It then validated these predictions by looking for their signatures in electrocardiogram results contained in a large collection of electronic health records at Columbia. Interestingly, the drugs the investigators identified do not cause the condition on their own, but only when taken in specific combinations.
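
The article does not describe DIPULSE's internals, so the sketch below is not DIPULSE. It shows a much simpler, well-known disproportionality measure, the proportional reporting ratio (PRR), applied to a drug pair in a set of spontaneous reports. The report structure, the placeholder drug names, and the toy data are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# Assumed report shape: (drugs listed on the report, events listed on it).
Report = Tuple[Set[str], Set[str]]


def pair_prr(reports: Iterable[Report], drug_a: str, drug_b: str, event: str) -> float:
    """Proportional reporting ratio for an event on reports listing both drugs.

    Compares the share of reports mentioning the event among reports that
    list both drugs with the share among all other reports.
    """
    exposed_event = exposed_total = other_event = other_total = 0
    for drugs, events in reports:
        has_event = event in events
        if drug_a in drugs and drug_b in drugs:
            exposed_total += 1
            exposed_event += has_event
        else:
            other_total += 1
            other_event += has_event
    if exposed_total == 0 or other_event == 0:
        raise ValueError("not enough reports to compute a ratio")
    return (exposed_event / exposed_total) / (other_event / other_total)


# Toy reports with placeholder drug names; not real FAERS data.
toy_reports = [
    ({"drugA", "drugB"}, {"QT prolongation"}),
    ({"drugA", "drugB"}, {"QT prolongation"}),
    ({"drugA", "drugB"}, {"nausea"}),
    ({"drugA"}, {"nausea"}),
    ({"drugB"}, {"QT prolongation"}),
    ({"drugC"}, {"headache"}),
    ({"drugA"}, {"rash"}),
    ({"drugC"}, {"nausea"}),
]
toy_prr = pair_prr(toy_reports, "drugA", "drugB", "QT prolongation")
```

A ratio well above 1 only flags a pair for follow-up. As the article notes, the investigators went on to check their predictions against electrocardiogram results in patient records.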

Previously, no reliable methods existed for identifying these kinds of combinations. Although the findings are preliminary, the retrospective confirmation of many of DIPULSE’s predictions in actual patient data suggests its effectiveness, and the investigators plan to test them experimentally in the near future.

The Department of Systems Biology and Center for Computational Biology and Bioinformatics are pleased to announce that three Columbia University faculty members have recently joined our community. Kam Leong, the Samuel Y. Sheng Professor of Biomedical Engineering at Columbia University, is now an interdisciplinary faculty member in the Department of Systems Biology. In addition, Yaniv Erlich and Guy Sella are now members of the Center for Computational Biology and Bioinformatics (C2B2). Their addition to the Department and to C2B2 will bring new expertise that will benefit our research and education activities, incorporating perspectives from fields such as nanotechnology, bioinformatics, and evolutionary genomics.

Monthly disease risk

Columbia scientists used electronic records of 1.7 million New York City patients to map the statistical relationship between birth month and disease incidence. Image courtesy of Nicholas Tatonetti.

Columbia University Medical Center reports on a new study in the Journal of the American Medical Informatics Association led by Nicholas Tatonetti, also an assistant professor in the Department of Systems Biology.

Columbia University scientists have developed a computational method to investigate the relationship between birth month and disease risk. The researchers used this algorithm to examine New York City medical databases and found 55 diseases that correlated with the season of birth. Overall, the study indicated people born in May had the lowest disease risk, and those born in October the highest. The study was published this week in the Journal of the American Medical Informatics Association.
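
The article does not spell out the study's algorithm. As a rough illustration of the underlying idea, the sketch below compares the disease rate among patients born in each month with the overall rate. The function name, the input format, and the toy data are assumptions, and a real analysis would also need significance testing and correction for multiple comparisons.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def monthly_rate_ratios(patients: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Map each birth month (1-12) to its disease rate relative to the overall rate.

    Each patient is given as (birth_month, has_disease).
    """
    born = Counter()
    cases = Counter()
    for month, has_disease in patients:
        born[month] += 1
        cases[month] += has_disease
    total_cases = sum(cases.values())
    if total_cases == 0:
        raise ValueError("no cases in the data")
    overall = total_cases / sum(born.values())
    return {m: (cases[m] / born[m]) / overall for m in sorted(born)}


# Toy cohort: 10 patients born in January (1 case), 10 born in July (3 cases).
toy_patients = (
    [(1, True)] + [(1, False)] * 9
    + [(7, True)] * 3 + [(7, False)] * 7
)
toy_ratios = monthly_rate_ratios(toy_patients)
```

A ratio below 1 means patients born in that month have the condition less often than the cohort as a whole, and a ratio above 1 means more often.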

“This data could help scientists uncover new disease risk factors,” said study senior author Nicholas Tatonetti, PhD, an assistant professor of biomedical informatics at Columbia University Medical Center (CUMC) and Columbia’s Data Science Institute. The researchers plan to replicate their study with data from several other locations in the U.S. and abroad to see how results vary with the change of seasons and environmental factors in those places. By identifying what’s causing disease disparities by birth month, the researchers hope to figure out how they might close the gap.

Comorbidity between Mendelian disease and cancer
Researchers in the Rabadan Lab have found that comorbidity between Mendelian diseases and cancer may result from shared genetic factors.

Genetic diseases can arise in a variety of ways. Mendelian disorders, for example, occur when specific mutations in single genes — called germline mutations — are inherited from either of one’s two parents. Well-known examples of Mendelian diseases include cystic fibrosis, sickle cell disease, and Duchenne muscular dystrophy. Other genetic diseases, including cancer, result from somatic mutations, which occur in individual cells during a person’s lifetime. Because the genetic origins of Mendelian diseases and cancer are so different, they are typically understood to be distinct phenomena. However, scientists in the Columbia University Department of Systems Biology have found evidence that there might be interesting genetic connections between them. 

In a paper just published in Nature Communications, postdoctoral research scientist Rachel Melamed and colleagues in the laboratory of Associate Professor Raul Rabadan report on a new method that uses knowledge about Mendelian diseases to suggest mutations involved in cancer. The study takes advantage of an enormous collection of electronic health records representing over 110 million patients, a substantial percentage of US residents. The authors show that clinical co-occurrence of Mendelian diseases and cancer, known as comorbidity, can be tied to genetic changes that play roles in both diseases. The paper also identifies several specific relationships between Mendelian diseases and the cancers melanoma and glioblastoma.

Some factors in the exposome

The exposome incorporates factors such as the environment we inhabit, the food we eat, and the drugs we take.

Although genomics has dramatically improved our understanding of the molecular origins of certain human genetic diseases, our health is also influenced by exposures to our surrounding environment. Molecules found in food, air and water pollution, and prescription drugs, for example, interact with genetic, molecular, and physiologic features within our bodies in highly personalized ways. The nature of these relationships is important in determining who is immune to such exposures and who becomes sick because of them.

In the past, methods for studying this interface have been limited because of the complexity of the problem. After all, how could we possibly cross-reference a lifetime’s worth of exposures with individual genetic profiles in any kind of meaningful way? Recently, however, an explosion in the generation of quantitative data related to the environment, health, and genetics, along with new computational methods based in machine learning and bioinformatics, has made this landscape ripe for exploration.

At this year’s South by Southwest Interactive Festival in Austin, Texas, Department of Systems Biology Assistant Professor Nicholas Tatonetti and his collaborator Chirag Patel (Harvard Medical School) discussed the remarkable new opportunities that “big data” approaches offer for investigating this landscape. Driving Tatonetti and Patel’s approach is a concept called the exposome. First proposed by Christopher Wild (University of Leeds) in 2005, an exposome represents all of the environmental exposures a person has experienced during his or her life that could play a role in the onset of chronic diseases. Tatonetti and Patel’s presentation highlighted how investigation of the exposome has become tractable, as well as the important roles that individuals can play in supporting this effort.

In the following interview, Dr. Tatonetti discusses some of the approaches his team is using to explore the exposome, and how the project has evolved out of his previous research.

Autism Spectrum Disorders Genetic Network

Network of autism-associated genes. (Credit: Dennis Vitkup)

The following article is reposted with permission from the Columbia University Medical Center Newsroom. Find the original here.

People with autism have a wide range of symptoms, with no two people sharing the exact type and severity of behaviors. Now a large-scale analysis of hundreds of patients and nearly 1000 genes has started to uncover how diversity among traits can be traced to differences in patients’ genetic mutations. The study, conducted by researchers at Columbia University Medical Center, was published Dec. 22 in the journal Nature Neuroscience.

Autism researchers have identified hundreds of genes that, when mutated, likely increase the risk of developing autism spectrum disorder (ASD). Much of the variability among people with ASD is thought to stem from the diversity of underlying genetic changes, including the specific genes mutated and the severity of the mutation.

“If we can understand how different mutations lead to different features of ASD, we may be able to use patients’ genetic profiles to develop accurate diagnostic and prognostic tools and perhaps personalize treatment,” said senior author Dennis Vitkup, PhD, associate professor of systems biology and biomedical informatics at Columbia University’s College of Physicians & Surgeons.

Searches for hyperglycemia-related terms

Percentage of users in each of the three user groups searching for hyperglycemia-related terms, computed per week over 12 months of search log data. Background refers to the fraction of all searchers who search for hyperglycemia-related symptoms or terminology independent of the presence of the drugs in the users’ search histories.

Although the US Food and Drug Administration and other agencies collect and analyze reports on adverse drug effects, alerts for single drugs and drug-drug interactions are often delayed due to the time it takes to accumulate evidence. Columbia University Department of Systems Biology faculty member Nicholas Tatonetti, in collaboration with investigators at Stanford University and Microsoft Research, hypothesized that Internet users can provide early clues of adverse drug events as they seek information on the web concerning symptoms they are experiencing. A new paper explains their results.

As a test, Tatonetti and colleagues asked whether it would be possible to detect evidence of an interaction between the antidepressant paroxetine and the anti-cholesterol drug pravastatin by analyzing web search logs from 2010. As a postdoc at Stanford, Tatonetti and colleagues had used a data mining algorithm to analyze FDA adverse event reporting records, and retroactively found this combination to be associated with hyperglycemia (high blood sugar) in some patients. In this new project, the researchers analyzed the search logs of millions of Internet users from a period before the above association was identified to see how often they entered search terms related to hyperglycemia and to one or both medications under investigation. (Participants in this study opted in by voluntarily installing a web browser extension that tracked their activity anonymously.)