Probability Models of Genomic Sequences Give Protein 3D, Mutation Effects and Design
Attributes of living systems are constrained in evolution, and the DNA sequence record we see today is the result of millions of evolutionary experiments. An alternative to the analysis of conserved attributes ('characters') is the analysis of the functional interactions ('couplings') that cause conservation. For proteins, the evolutionary sequence record can be exploited to provide exquisitely accurate information about 3D structures and functional sites. Recent progress is based on cheap sequencing as an experimental technology and on global probability models under the maximum entropy principle as a key theoretical tool. I will describe how these advances are used to predict 3D interactions, complexes, and protein plasticity accurately, and to design proteins for synthetic biology and therapeutics, and how they extrapolate to the study of the effects of human genetic variation. I will also describe recent methodological advances and challenges. There is now a major opportunity to link genomic information to phenotype and to apply this to concrete engineering and health problems, such as disease likelihood and the emergence of drug resistance. My lab will concentrate on the development of algorithms that address the challenge of inferring causality in biological information.
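To make the idea of 'couplings' concrete, here is a minimal sketch of one common form of global maximum-entropy model, a pairwise model in which the log-probability of a sequence is a sum of per-site terms h_i and pairwise coupling terms J_ij, up to a normalising constant. This is an illustration under stated assumptions, not necessarily the models used in the talk: the 4-letter alphabet, the function names `log_score` and `mutation_effect`, and all parameter values are hypothetical, and real parameters would be inferred from a multiple sequence alignment.

```python
ALPHABET = "ACDE"  # toy alphabet (assumption); protein models use 20 amino acids plus a gap

def log_score(seq, h, J):
    """Unnormalised log-probability: site terms h_i plus pair couplings J_ij (i < j)."""
    idx = [ALPHABET.index(a) for a in seq]
    site = sum(h[i][a] for i, a in enumerate(idx))
    pair = sum(J[i][j][idx[i]][idx[j]]
               for i in range(len(idx)) for j in range(i + 1, len(idx)))
    return site + pair

def mutation_effect(seq, pos, new_aa, h, J):
    """Log-odds of mutant versus wild type; the normalising constant cancels."""
    mutant = seq[:pos] + new_aa + seq[pos + 1:]
    return log_score(mutant, h, J) - log_score(seq, h, J)

# Hypothetical parameters for a length-3 sequence (not fitted to any data).
L, q = 3, len(ALPHABET)
h = [[0.0] * q for _ in range(L)]
J = [[[[0.0] * q for _ in range(q)] for _ in range(L)] for _ in range(L)]
J[0][2][0][1] = 2.0  # positions 0 and 2 favour the residue pair (A, C)

wild_type = "ADC"
print(mutation_effect(wild_type, 2, "E", h, J))  # -2.0: breaks the favoured pair
print(mutation_effect(wild_type, 1, "C", h, J))  # 0.0: position 1 is uncoupled here
```

In this toy setting the effect of a substitution depends on the residues at other positions through J, which is what distinguishes a coupling-based model from a per-site conservation score.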