Genome Sequencing and Analysis
Frequently Asked Questions
Choosing a library prep and sequencing package
What is the difference between STRPOLYA, RIBOZERO, and CLONTECH?
STRPOLYA libraries are created with a poly-A pulldown for mRNA; the libraries are made from all the transcripts that are pulled down. RIBOZERO libraries are created by depleting ribosomal RNA; the libraries are made from all the remaining transcripts. CLONTECH is a low-input method for samples with less than 100 ng of starting material.
STRPOLYA and RIBOZERO both create strand-specific libraries, meaning the strand of the original transcript is preserved and antisense reads should be rare in your data. The CLONTECH protocol does not have strand specificity.
How do I choose a method?
If your experimental goal is coding gene expression profiling or differential expression and you have good quality total RNA samples (RIN>8 by Agilent Bioanalyzer or TapeStation), you should choose one of the STRPOLYA packages. The kit is the TruSeq Stranded mRNA Library Prep Kit (Illumina).
If your samples have a RIN<8 (including FFPE samples), the poly-A tails may be degraded and the STRPOLYA method may not work. We recommend choosing a RIBOZERO package. Even degraded ribosomal RNA should still be removed (though perhaps not as efficiently as with higher RIN samples), and your library will be made with the remaining transcripts. The kit is the TruSeq Stranded Total RNA Library Prep Kit Gold (Illumina).
If you would like to profile non-coding RNA, you should choose a RIBOZERO package regardless of the RIN. The kit is designed to remove rRNA and mitochondrial rRNA from human/mouse/rat samples. If you have a different species, please inquire at genome@columbia.edu. The kit is the TruSeq Stranded Total RNA Library Prep Kit Gold (Illumina). Please note that small RNA/microRNA species are generally size-selected out of the library.
CLONTECH utilizes the SMART-Seq v4 Ultra Low Input RNA Kit for Sequencing (Takara) to reverse transcribe and amplify small amounts of RNA. After creating cDNA, the library is prepared using Nextera XT (Illumina). Choose this method only if your starting input is lower than 100 ng. This prep does not retain strand specificity.
It is difficult to compare samples across library prep methods. You should choose one library prep method for all samples you are planning to analyze together. For example, if you have a few samples with a lower RIN, all samples should be prepared with RIBOZERO instead of some with RIBOZERO and some with STRPOLYA.
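The guidance above can be summarized as a small decision helper. The Python sketch below is only an illustration: the function names are hypothetical, and two points are assumptions not stated above, namely that low input takes precedence over the other criteria and that a RIN of exactly 8 is treated conservatively as RIBOZERO.

```python
from typing import Iterable, Tuple

LOW_INPUT_NG = 100.0  # CLONTECH threshold quoted in this FAQ
GOOD_RIN = 8.0        # RIN cutoff quoted in this FAQ


def suggest_library_prep(rin: float, input_ng: float,
                         want_noncoding: bool = False) -> str:
    """Suggest a prep for one sample (hypothetical helper, not an official tool)."""
    if input_ng < LOW_INPUT_NG:
        # Assumption: low input overrides the RIN and non-coding criteria.
        return "CLONTECH"
    if want_noncoding:
        # Non-coding RNA profiling: RIBOZERO regardless of RIN.
        return "RIBOZERO"
    if rin > GOOD_RIN:
        return "STRPOLYA"
    # RIN < 8. Assumption: RIN == 8 is also routed here to be conservative.
    return "RIBOZERO"


def suggest_for_project(samples: Iterable[Tuple[float, float]],
                        want_noncoding: bool = False) -> str:
    """Suggest ONE prep for all (rin, input_ng) samples analyzed together."""
    per_sample = {suggest_library_prep(rin, ng, want_noncoding)
                  for rin, ng in samples}
    if len(per_sample) == 1:
        return per_sample.pop()
    if per_sample == {"STRPOLYA", "RIBOZERO"}:
        # A few lower-RIN samples: prepare everything with RIBOZERO.
        return "RIBOZERO"
    # The FAQ does not cover mixing low-input and standard-input samples.
    raise ValueError("Mixed input amounts: please discuss the project first")
```

For example, `suggest_for_project([(9.1, 500), (6.5, 500)])` returns `"RIBOZERO"`, because one of the two samples has a RIN below 8 and all samples should share a single method.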
What depth do I need?
All packages are paired-end. We will use ‘reads’ to refer to pairs of reads here.
For coding gene expression profiling in human/mouse/rat (with high quality RNA), the 1P20M package is sufficient. Deeper sequencing (40M-80M reads) is recommended, as budget allows, if your genes of interest are rare or if you are interested in detecting structural variations, alternative splicing patterns, or even point mutations.
Because many non-coding RNA species are found along with mRNA, we recommend deeper sequencing for RIBOZERO samples. Though we offer a 20M reads RIBOZERO package, it is only recommended for those with a very strong bioinformatics background.
How should I design my RNA-seq experiment?
Can I collect samples in batches?
It is best if you can collect all your samples at once, but it is common practice that researchers collect/extract RNAs at different times using the same protocol.
Can I compare data across batches?
Please choose the same package repeatedly in order to compare different submissions. In addition, please let us know at the time of submission if you’d like to compare the current project with a previous project to ensure that we use the same tools for data analysis. You can re-submit older samples along with newer samples to determine whether there is significant batch effect from library prep and sequencing.
We also include an RNA spike-in control in each sample, which can be used for normalization.
Do I need biological or technical replicates? If so, how many?
Yes, biological replicates are strongly recommended. We suggest at least 3 biological replicates for each treatment/group. We do not make technical replicate libraries from one sample.
Submitting your samples
What are the submission requirements for my samples?
Please see Submitting Your Samples for information.
Do I need to do a Bioanalyzer/TapeStation report for RNA-Seq samples?
Yes. We require that the Bioanalyzer/TapeStation report be no more than a week old at the time of submission. If samples have undergone a freeze/thaw cycle or have been shipped since the report was generated, we strongly recommend rerunning the sample. We cannot provide a guarantee for data quality if the above conditions are not met.
A Bioanalyzer/TapeStation report is not required for gDNA samples, but is required for cDNA samples.
How do I Bioanalyze/TapeStation my samples?
Researchers can send samples to the Molecular Pathology Core. If you dilute your samples, please indicate this in your Sample Form.
If you would like to purchase your own reagents, you may use our Bioanalyzer or TapeStation as a self-service instrument. Please contact genome@columbia.edu to set up an appointment. This is recommended for advanced users only, as we do not offer training on this instrument.
External users can submit their samples directly to Molecular Pathology. If no report is submitted, we will submit the samples on your behalf and charge an additional per sample fee.
Custom projects/self-prepared libraries
What should I do if my request is not RNA-seq/exome sequencing/whole genome sequencing?
If you are interested in ChIP-seq, panel sequencing, or other applications for which we don’t have a full service, we recommend preparing your own libraries with Illumina platform-compatible protocols, and we will be happy to sequence them for you. Please see below for self-prepared library requirements.
What should I do with self-prepared libraries?
You will need to quantify self-prepared libraries correctly, as this is critical for the success of a sequencing run. We recommend using qPCR, or calculating the molarity from the average fragment length (Bioanalyzer) and a fluorometric concentration reading (Qubit or PicoGreen).
We require a pooled library of at least 10 nM in 10 μl of water. We will load the samples based on your quantification. We require all index information at the time of submission (name of index and sequence) and a Bioanalyzer report of your pooled library. Please complete the Self-Prepared Library Sample Form and send it to genome@columbia.edu.
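To illustrate the molarity calculation from a fluorometric concentration and an average fragment length, here is a minimal Python sketch. It assumes the commonly used approximation of 660 g/mol per base pair of double-stranded DNA, which is not stated in this FAQ; the function names are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per bp of dsDNA (common approximation, an assumption)
MIN_POOL_NM = 10.0   # minimum pooled library concentration quoted in this FAQ


def library_molarity_nm(conc_ng_per_ul: float, avg_fragment_bp: float) -> float:
    """Convert a mass concentration (ng/ul) to molarity (nM).

    nM = (ng/ul * 1e6) / (660 g/mol/bp * average fragment length in bp)
    """
    if conc_ng_per_ul < 0 or avg_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment length > 0")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * avg_fragment_bp)


def meets_minimum(conc_ng_per_ul: float, avg_fragment_bp: float) -> bool:
    """True if the pool reaches the 10 nM minimum."""
    return library_molarity_nm(conc_ng_per_ul, avg_fragment_bp) >= MIN_POOL_NM
```

Under this approximation, a pool at 2 ng/μl with a 400 bp average fragment length works out to about 7.6 nM, which is below the 10 nM minimum, while 4 ng/μl at the same length is about 15.2 nM.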
Self-Service
What do you mean by self-service?
If you are interested in ChIP-seq, panel sequencing, or other applications for which we don’t have a full service package, we recommend preparing your own libraries with Illumina platform-compatible protocols, and we will be happy to help you get them sequenced on our smaller benchtop sequencers.
We host three NextSeq instruments and a MiSeq and offer 24/7 access to these instruments. Once a researcher is trained, she/he is able to use them whenever the instruments are available.
We do not offer bioinformatics support on self-prepared libraries.
NextSeq Self-Service
The NextSeq has two read depths, 130 million reads (Mid Output, MO) and 400 million reads (High Output, HO). There are three different read lengths: 75 cycles (HO only), 150 cycles, and 300 cycles. All kits can be run as either single- or paired-end (though there is a significant quality drop halfway through if you attempt to do 300-cycles single-end).
The Genome Center stocks the 75-cycle and 150-cycle kits, as well as a limited number of 300-cycle kits. We require you to purchase your kits through us; please see our Pricing page for pricing information. Purchasing your own kit and using our instruments is not permitted. The price is all inclusive (kit, service contract, instrument usage). Please contact us at genome@columbia.edu for training.
MiSeq Self-Service
The MiSeq supports a wider range of sequencing depths and read lengths, so we do not stock these kits. You are required to purchase them from Illumina yourself. The service fee for the MiSeq is $199.
What if I need many NextSeq runs? Can I run on the NovaSeq?
Our standard sequencing run on the NovaSeq is a 2x100bp S4 run. We can sell individual lanes on these S4 runs if the researcher prepares and pools their own libraries into one pool. The depth is 2-2.5B reads per lane; see our Pricing page for pricing information. Please note that custom primers are not compatible with this option unless you are purchasing a full run.
If you need high depth sequencing of a different read configuration (i.e. NOT 2x100bp), please contact us at genome@columbia.edu to discuss your project.
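As a rough way to compare these options, the Python sketch below estimates how many NextSeq runs a pool would need and whether it would fit in a single NovaSeq S4 lane, using the nominal read counts quoted above. The function names are hypothetical, and the sketch assumes reads are split evenly across samples, which real pools only approximate.

```python
NEXTSEQ_READS = {"MO": 130_000_000, "HO": 400_000_000}  # nominal reads per run
NOVASEQ_S4_LANE_MIN = 2_000_000_000  # lower end of the 2-2.5B reads per lane


def total_reads_needed(n_samples: int, reads_per_sample: int) -> int:
    """Total reads for the pool, assuming an even split across samples."""
    return n_samples * reads_per_sample


def nextseq_runs_needed(n_samples: int, reads_per_sample: int,
                        output: str = "HO") -> int:
    """Number of NextSeq runs (rounded up) at the given output level."""
    total = total_reads_needed(n_samples, reads_per_sample)
    capacity = NEXTSEQ_READS[output]
    return -(-total // capacity)  # integer ceiling division


def fits_in_one_novaseq_lane(n_samples: int, reads_per_sample: int) -> bool:
    """Conservative check against the lower end of the per-lane range."""
    return total_reads_needed(n_samples, reads_per_sample) <= NOVASEQ_S4_LANE_MIN
```

For instance, 48 samples at 40M reads each need 1.92B reads in total, which is 5 High Output NextSeq runs but fits in one NovaSeq S4 lane under these assumptions. Keep in mind that the NovaSeq lane option is a fixed 2x100bp configuration.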
Post-sequencing questions
What is your turnaround time?
We aim for 3-4 weeks, but it can sometimes be shorter or longer depending on what else we have in the lab. If you need a project delivered by a certain date for grants or publications, please contact us and we can try to accommodate you to the best of our abilities. Please let us know this at the time of submission so we can plan properly.
What are your deliverables?
Our RNA-Seq pipeline performs sequencing QC, alignment to the genome, and read counting to obtain transcriptome data. Your data release will include the raw sequencing data, RNA expression data on both the gene and the transcript level, and optional differential expression analysis upon request. We also provide md5 files to check the integrity of your data after downloading, as well as expression h5 files which can be processed in R for your own computational needs.
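One way to check the integrity of downloaded files against the provided md5 files is sketched below in Python. It assumes the md5 files follow the usual `md5sum` layout (one `<hex digest>  <file name>` entry per line, with the data files in the same directory); that layout and the function names are assumptions, not a description of our actual deliverables.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5_file(md5_path) -> dict:
    """Return {file name: True/False} for each entry in an md5sum-style file.

    Assumes each non-empty line looks like '<hex digest>  <file name>'
    and that the listed files sit next to the md5 file.
    """
    md5_path = Path(md5_path)
    results = {}
    for line in md5_path.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # '*' marks binary mode in md5sum output
        actual = md5_of_file(md5_path.parent / name)
        results[name] = (actual == expected.lower())
    return results
```

Calling `verify_md5_file` on a downloaded md5 file (the path is whatever you saved it as) returns a dictionary in which any `False` value flags a file that should be downloaded again.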
How do you release the data? How long will you store our data? Can we get our samples back?
We will give you a web link to download the data. You are also welcome to bring over a hard drive and we can transfer the data to the drive.
Data will be stored on our server for 1 month before it is deleted permanently, unless otherwise noted for very large projects.
Remaining samples will be stored for 1 week before being discarded. Your data release email will include a note indicating that you can pick up any remaining sample material. We will store libraries for 6 months, after which time they will be discarded without warning.