Handling Informative Missingness of Data in Genetic Association Studies

Cameron Palmer
Itsik Pe'er Lab

Irving Cancer Research Center
1130 St. Nicholas Avenue
Room 816
New York, NY 10032

Some proportion of the predictor data collected for genetic association studies will be sporadically missing, and as more sources of data are integrated into association models, more complex patterns of missingness will arise. Genotype imputation, the current solution for handling missing genotypes and untyped variants in GWAS datasets, generates probabilistic genotypes. Analogously, variants called from sequencing are probabilistic values that are thresholded to generate either calls or missing values. The patterns of missingness in variant datasets are frequently correlated with the values that are missing, so the missingness is informative and cannot justifiably be ignored. The statistical methods currently employed in association studies must be adapted, in a scalable way, to handle missingness more rigorously; this requires a fundamental restructuring of most statistical tools.
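As a rough illustration of the thresholding step described in the abstract, the sketch below converts per-sample genotype probabilities into hard calls or missing values. It is not the speaker's method: the function names, the 0.9 threshold, and the probability triples are all hypothetical, chosen only to show how a call can become missing precisely when the underlying genotype is ambiguous.

```python
from typing import Optional, Sequence


def call_genotype(probs: Sequence[float], threshold: float = 0.9) -> Optional[int]:
    """Return the most probable genotype (0, 1, or 2 alternate alleles),
    or None (missing) if no genotype reaches the threshold.

    The 0.9 default is an assumed value for illustration only.
    """
    best = max(range(len(probs)), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


def expected_dosage(probs: Sequence[float]) -> float:
    """Expected alternate-allele count, which keeps the uncertainty
    instead of discarding it."""
    return sum(g * p for g, p in enumerate(probs))


# Hypothetical probabilities P(genotype = 0, 1, 2) for five samples.
samples = [
    (0.98, 0.02, 0.00),
    (0.05, 0.93, 0.02),
    (0.40, 0.55, 0.05),  # ambiguous: becomes missing after thresholding
    (0.01, 0.04, 0.95),
    (0.10, 0.60, 0.30),  # ambiguous: becomes missing after thresholding
]

calls = [call_genotype(p) for p in samples]
dosages = [expected_dosage(p) for p in samples]

for p, c, d in zip(samples, calls, dosages):
    print(p, "call:", c, "dosage:", round(d, 2))
```

In this made-up data, both samples that end up missing are ones whose most probable genotype is heterozygous, so dropping them would not remove observations at random. Carrying the probabilities forward, for example as expected dosages, is one way to avoid discarding that information.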


Event Series: Department of Systems Biology Student Seminar