Machine Learning Based Somatic Mutation Calling in Whole Genome Sequencing Data from Formalin-Fixed, Paraffin-Embedded (FFPE) Tumor Samples
With advances in massively parallel sequencing (i.e., next-generation sequencing) technology, whole genome sequencing has become routine in cancer research. In several studies, genome sequencing of fresh frozen (FF) tumor tissue or cell lines has demonstrated its utility in identifying somatic mutations critical for tumor development and progression. However, genome sequencing has rarely been performed on formalin-fixed, paraffin-embedded (FFPE) samples, even though FFPE is the standard tumor sample preparation method in clinical pathology. DNA (or RNA) in FFPE samples is known to be often damaged during the fixation process and to degrade further at room temperature, the usual FFPE storage condition. The poor quality of FFPE tumor specimens is suspected to be a major source of false positive somatic mutation calls in FFPE tumor genome sequencing. In this study, we introduce a simple analytic solution that uses a machine-learning approach to reduce false positives in somatic mutation calls from FFPE samples.