Nature Machine Intelligence Publication: “Predicting Functional Effect of Missense Variants Using Graph Attention Neural Networks” (Yufeng Shen, PhD)

New Machine Learning Method Predicts Damaging Missense Variants

The premise of genomic medicine is that a person’s genomic characterization can be used to improve medical diagnosis, prognosis, and treatment. Each person, however, has millions of genetic variants, the vast majority of which have negligible impact on their health. How to determine which variants are relevant to a particular condition is a central issue in genomic medicine.

The issue is most pressing for missense variants, each of which changes a single amino acid in a protein. Only about 20–30 percent of these variants have a functional impact, so for any individual missense variant it is highly uncertain whether it changes protein function and thereby contributes to a health condition. As a result, most missense variants found in clinical genetic testing are classified as VUS (variants of uncertain significance).

Yufeng Shen, PhD, an associate professor in the Department of Systems Biology and the Department of Biomedical Informatics, and his group have developed a new method for predicting which missense variants are likely to be damaging. The method, called gMVP (graphical model for predicting Missense Variant Pathogenicity), uses a graph attention model, one of the latest machine learning techniques, to capture the information relevant to that prediction. Their paper, “Predicting Functional Effect of Missense Variants Using Graph Attention Neural Networks,” was published in Nature Machine Intelligence on November 15, 2022.

The new method uses the coevolution of pairs of amino acid positions in a protein to determine whether the two positions are functionally correlated. This makes it possible to pool information across functionally correlated positions even when they are far apart in the protein sequence. To evaluate the method, Shen and his group used several independent data sets representing different applications, including clinical genetic testing and new disease gene discovery. In all of these tests, gMVP substantially outperformed the other methods it was compared against.

“Predicting the effects of missense variants,” says Shen, “has been well studied. We usually avoid well-studied problems. But this problem is far from being solved and, more important, even small improvements can lead to real changes in genomic medicine. Anytime we can confidently change the interpretation of a VUS to a damaging variant for a patient, it has the potential to lead to more effective treatment.”
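To make the idea of pooling information across coevolving positions more concrete, the following Python sketch shows a single, simplified attention step for one protein position. It is not gMVP's implementation: the function names, the per-position feature vectors, the coevolution scores, and the weights `w_feat` and `w_coev` are all hypothetical. Each other position's attention score adds a feature-similarity term (a dot product) to its coevolution strength with the target, so a position that is distant in sequence but strongly coevolving can dominate the pooled representation. The paper describes how gMVP actually builds its graph and attention layers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(target, features, coevolution, w_feat=1.0, w_coev=1.0):
    """Toy single-head attention step for one protein position (hypothetical, not gMVP).

    features:    dict position -> feature vector (hypothetical per-position features)
    coevolution: dict position -> coevolution strength with `target` (hypothetical scores)
    Returns (attention weight per other position, pooled feature vector).
    """
    query = features[target]
    others = [p for p in features if p != target]
    # Score = feature similarity (dot product) + coevolution bias for the pair.
    scores = [
        w_feat * sum(q * k for q, k in zip(query, features[p]))
        + w_coev * coevolution.get(p, 0.0)
        for p in others
    ]
    attn = softmax(scores)
    # Pooled representation: attention-weighted average of the other positions' features.
    pooled = [
        sum(a * features[p][d] for a, p in zip(attn, others))
        for d in range(len(query))
    ]
    return dict(zip(others, attn)), pooled


# Hypothetical example: position 10 is the target. Position 11 is its sequence
# neighbor with weak coevolution; position 85 is distant but strongly coevolving.
features = {
    10: [1.0, 0.0],
    11: [0.5, 0.5],
    85: [0.5, -0.5],
    200: [0.5, 0.2],
}
coevolution = {11: 0.1, 85: 2.0, 200: 0.0}

weights, pooled = attention_pool(10, features, coevolution)
for pos, w in sorted(weights.items()):
    print(f"position {pos}: attention weight {w:.3f}")
print("pooled features:", [round(x, 3) for x in pooled])
```

In this toy example, position 85 is far from position 10 in sequence but receives most of the attention weight because of its strong coevolution score, while the sequence-adjacent position 11 receives comparatively little.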

“We’ve also had a lot of fun,” adds Shen, “learning the latest machine learning techniques and trying out different modeling methods.”

Read the full article on the Nature Machine Intelligence website.