Genetic Data Analysis



Andrea Califano

The migration away from “one-size-fits-all” medicine, particularly in the areas of cancer detection and treatment, holds great promise for patients and the field of precision medicine. Demand and jobs are increasing for researchers, clinicians and professionals who are at home collecting, analyzing and using more and newer forms of data, according to a recent feature in Science magazine, which spotlights Dr. Andrea Califano, founding chair of the Department of Systems Biology.

In the field of oncology, innovations continue to grow rapidly in precision, or targeted, medicine, as clinicians seek better treatments for specific kinds of cancer rather than take a blanket approach via the traditional trifecta of radiation, chemotherapy, and surgery. To do so, they must test patients, note mutations, and identify biomarkers to determine what treatments could work best with the fewest side effects.

Scientific breakthroughs, in these areas and more, have led to greater understanding of genes and their functions and have created new opportunities for precision medicine—and for those with technical, research, and clinical skills eager to work in this ever-expanding field. Special consideration will be given to those job applicants who can perform big data analysis and multidisciplinary research. However, new jobs will also emerge in previously unseen areas, such as business, translational medicine, and genetic counseling.

New and powerful tools have aided the precision medicine movement. The Human Genome Project, the first complete mapping of human genes, published its preliminary results in 2001. The project’s numerous benefits include knowing the location of the approximately 20,500 genes identified in the body and gaining a clearer understanding of how genes are organized and operate.


NORI (Non-coding RNA Identification) is a computational tool that identifies lncRNAs using next generation sequencing.








A tool for detecting copy number variants (CNVs) in whole-genome data, based on both depth of coverage and mate-pair information.
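The depth-of-coverage half of that idea can be illustrated with a minimal sketch. This is not the tool's own method: the function name `call_cnv_windows`, the fixed window size, and the gain/loss ratio thresholds are all assumptions made for illustration, and mate-pair evidence is not modeled at all.

```python
from statistics import median

def call_cnv_windows(depths, window=4, gain=1.5, loss=0.5):
    """Label fixed-size windows as 'gain', 'loss', or 'normal' by depth ratio.

    depths: per-position read depth (hypothetical input format).
    window, gain, loss: illustrative parameters, not taken from any tool.
    """
    # Mean depth of each non-overlapping window; a trailing partial window is dropped.
    means = [
        sum(depths[i:i + window]) / window
        for i in range(0, len(depths) - window + 1, window)
    ]
    baseline = median(means)  # assumes most of the region is copy-neutral
    if baseline == 0:
        raise ValueError("baseline depth is zero; cannot compute ratios")

    calls = []
    for idx, mean_depth in enumerate(means):
        ratio = mean_depth / baseline
        if ratio >= gain:
            label = "gain"
        elif ratio <= loss:
            label = "loss"
        else:
            label = "normal"
        calls.append((idx * window, (idx + 1) * window, label))
    return calls

# Toy depth profile: one duplicated window and one deleted window.
depths = [30, 31, 29, 30, 60, 62, 58, 61, 30, 29, 31, 30,
          10, 9, 11, 10, 30, 30, 30, 30]
for start, end, label in call_cnv_windows(depths):
    print(start, end, label)
```

Real callers typically also correct for GC content and mappability and merge adjacent windows into segments; the sketch leaves those steps out.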


Xplorigin is a software tool for deciphering population ancestry of different regions along an individual's genome. The tool is based on a generalized hidden Markov model, trained on data from the International HapMap Project.
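To give a feel for how a hidden Markov model can assign ancestry along a genome, here is a minimal sketch using a plain (not generalized) HMM decoded with the Viterbi algorithm. It is not Xplorigin's implementation: the function `viterbi_ancestry`, the population labels, the allele frequencies, and the `stay` probability are hypothetical, and a real tool would estimate such parameters from reference panels such as HapMap.

```python
import math

def viterbi_ancestry(alleles, freqs, stay=0.9):
    """Most likely ancestry label per SNP for one haplotype.

    alleles: list of observed alleles coded 0/1.
    freqs:   dict of population -> frequency of allele 1 at each SNP
             (hypothetical values, each strictly between 0 and 1).
    stay:    assumed probability of keeping the same ancestry between
             adjacent SNPs; at least two populations are required.
    """
    pops = list(freqs)
    switch = (1.0 - stay) / (len(pops) - 1)

    def emit(pop, i):
        p = freqs[pop][i]
        return math.log(p if alleles[i] == 1 else 1.0 - p)

    def trans(prev, cur):
        return math.log(stay if prev == cur else switch)

    # Uniform prior over populations at the first SNP.
    score = {p: math.log(1.0 / len(pops)) + emit(p, 0) for p in pops}
    back = []
    for i in range(1, len(alleles)):
        new_score, pointers = {}, {}
        for cur in pops:
            best_prev = max(pops, key=lambda prev: score[prev] + trans(prev, cur))
            new_score[cur] = score[best_prev] + trans(best_prev, cur) + emit(cur, i)
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)

    # Trace the best path back from the highest-scoring final state.
    state = max(pops, key=lambda p: score[p])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Toy data: allele 1 is common in pop1 and rare in pop2 at every SNP.
freqs = {"pop1": [0.9] * 8, "pop2": [0.1] * 8}
alleles = [1, 1, 1, 1, 0, 0, 0, 0]
print(viterbi_ancestry(alleles, freqs))
```

Working in log space avoids numerical underflow on long chromosomes, which is why probabilities are summed rather than multiplied.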


SIXPAC (Search for Interactions is Probably Approximately Complete) is an efficient, scalable search algorithm that finds synergy between pairs of physically unlinked SNPs (genome-wide) in large case-control datasets.



OPERA is a tool for power estimation and design of whole-genome resequencing projects aimed at rare variant associations.


MutaGeneSys uses genome-wide genotype data to estimate individual disease susceptibility. It integrates three data sources: the International HapMap Project, whole-genome marker correlation data, and the Online Mendelian Inheritance in Man (OMIM) database.


A tool for selecting individuals for sequencing by total information potential, based on GERMLINE output.


The human leukocyte antigen (HLA) genes play a major role in adaptive immune response and are used to differentiate self cells from non-self cells. HLA genes are hyper-variable, with nearly every locus containing over a dozen alleles. This variation plays an important role in autoimmune disease and organ transplantation. HLA typing by serological methods is time-consuming and expensive. This computational method can be used to infer per-locus HLA types using shared segments that are identical by descent (IBD), inferred from genotype data.


HATS is a tool for calling the amplified alleles and constructing the amplified haplotype within called tumor amplicons.


HADiT (Haplotype Amplification Distortion in Tumors) is a tool for computing and visualizing allelic distortion in tumor SNP data. It implements the amplification distortion test (ADT).


GERMLINE is an algorithm for discovering long shared segments of Identity by Descent (IBD) between pairs of (unrelated) individuals in a large population.
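The notion of a long shared segment between two individuals can be made concrete with a toy example. The sketch below is not the GERMLINE algorithm, which is designed to scale to large populations; it simply scans two already-phased haplotypes and reports runs of identical alleles at least `min_len` markers long. The function name, the string encoding of haplotypes, and the threshold are assumptions for illustration.

```python
def shared_segments(hap_a, hap_b, min_len=5):
    """Return (start, end) index pairs, end exclusive, of identical runs.

    hap_a, hap_b: equal-length sequences of alleles for two phased haplotypes.
    min_len:      illustrative minimum run length, counted in markers.
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same markers")

    segments = []
    start = None  # index where the current matching run began, if any
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a == b:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the haplotype.
    if start is not None and len(hap_a) - start >= min_len:
        segments.append((start, len(hap_a)))
    return segments

hap_a = "0110100111010"
hap_b = "0110100011011"
print(shared_segments(hap_a, hap_b))             # runs of 5 or more markers
print(shared_segments(hap_a, hap_b, min_len=4))  # looser threshold
```

A pairwise scan like this costs time quadratic in the number of individuals, and it tolerates neither genotyping errors nor phasing errors, which is part of what dedicated IBD tools are built to handle.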


DASH (DASH Associates Shared Haplotypes) is a tool for detecting association to clusters of IBD segments detected by GERMLINE. It builds upon pairwise IBD shared segments to infer clusters of IBD individuals.
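One simple way to picture going from pairwise IBD links to clusters of individuals is to treat each shared segment at a locus as an edge and take connected components. The sketch below does exactly that with a union-find structure; it is a stand-in for illustration only and does not reproduce DASH's clustering or its association test. The function name `ibd_clusters` and the sample identifiers are hypothetical.

```python
def ibd_clusters(pairs):
    """Group individuals into clusters from pairwise IBD links at one locus.

    pairs: iterable of (individual_a, individual_b) tuples, one per shared
           segment overlapping the locus (hypothetical input format).
    """
    parent = {}

    def find(x):
        # Register unseen individuals, then walk to the root with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for individual in parent:
        clusters.setdefault(find(individual), set()).add(individual)
    return sorted(sorted(members) for members in clusters.values())

pairs = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
print(ibd_clusters(pairs))
```

Connected components can chain together individuals who share no segment directly, so a stricter criterion (for example, requiring dense sharing within a cluster) may be preferable in practice.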