New Machine Learning Method Predicts Damaging Missense Variants

The premise of genomic medicine is that a person’s genomic characterization can be used to improve medical diagnosis, prognosis, and treatment. Each person, however, has millions of genetic variants, the vast majority of which have negligible impact on their health. How to determine which variants are relevant to a particular condition is a central issue in genomic medicine.

The issue is most pressing for missense variants, which alter a single amino acid in a protein. Only about 20–30 percent of these variants have a functional impact, so it is highly uncertain whether any given missense variant changes protein function and thereby contributes to a health condition. As a result, most missense variants found in clinical genetic testing are classified as variants of uncertain significance (VUS).

Yufeng Shen, PhD, an associate professor in the Department of Systems Biology and the Department of Biomedical Informatics, and his group have developed a new method for predicting which missense variants are potentially damaging. The method, called gMVP (graphical model for predicting Missense Variant Pathogenicity), uses a graph attention model, a recent machine learning technique, to capture the information relevant to that prediction. Their paper, “Predicting Functional Effect of Missense Variants Using Graph Attention Neural Networks,” was published in Nature Machine Intelligence on November 15th, 2022.

Andrea Califano will receive $6,909,000 over seven years from the National Cancer Institute for “Predicting Cancer Cell Response to Endogenous and Exogenous Perturbations at the Single Cell Level.” The aims of the project are to create the first generation of genome- and proteome-wide network models that can effectively predict the probabilistic, dynamic response of mammalian cells to small molecule and genetic perturbations, as well as their ability to plastically reprogram across the relatively small number of molecularly distinct states detected in a specific human malignancy.

Chaolin Zhang will receive an R56 award for $568,846 from NIH/NHGRI for “Mapping proximal and distal splicing-regulatory elements.” The aims of the project are to develop a high-throughput platform technology for screening splicing-regulatory elements, in order to facilitate annotation of noncoding regions in the human genome and drug discovery.


Andrea Califano will receive $4,893,902 over five years from the National Cancer Institute for “Elucidating and Targeting tumor dependencies and drug resistance determinants at the single cell level.” The aims of the project are to elucidate Master Regulators representing non-oncogene dependencies of molecularly distinct malignant subpopulations coexisting in the same tumor, in order to support novel combination and sequential therapy approaches.