October 27, 2015

Graduate Course Focuses on Foundations of Deep Sequencing

A new team-taught course covers both the experimental and analytical basics of next-generation sequencing. Assistant Professor Chaolin Zhang led the discussion in a recent class. Photo: Lynn Saville.

As the cost of next-generation sequencing has fallen, it has become a ubiquitous and indispensable tool for research across the biomedical sciences. DNA and RNA sequencing — along with other technologies for profiling phenomena such as de novo mutations, protein-nucleic acid interactions, chromatin accessibility, ribosome activity, and microRNA abundance — now make it possible to observe multiple layers of cellular function on a genome-wide scale.

Regardless of a biologist’s chosen area of investigation, such methods have made it possible to explore many exciting new kinds of problems. At the same time, however, they have also dramatically transformed the expertise that young scientists need to develop in order to participate in cutting-edge biological research. Bringing students up to speed with the pace of change in next-generation sequencing has posed a particular challenge for educators.

Now, a new multidisciplinary, graduate-level course organized by the Columbia University Department of Systems Biology is enabling young investigators to begin incorporating these powerful new tools into their studies and future research. Designed by assistant professors Yufeng Shen, Peter Sims, and Chaolin Zhang, the course covers both the experimental principles of next-generation sequencing and key statistical methods for analyzing the enormous datasets that such technologies produce. In this way, it gives students a strong grounding in principles that are critical for more advanced graduate courses as well as the ability to begin applying deep sequencing technologies to investigate the questions they are interested in pursuing.

As Dr. Sims explains, “Whether you are a graduate student in systems biology, biochemistry, or microbiology, the chance that you are going to be doing next-generation sequencing is pretty high. At the same time, it’s completely not taught at the undergraduate level. There is no textbook nor is there any time in a typical undergraduate biology curriculum to get into this in any kind of detail. Even at top-tier universities students come into graduate school without having any experience with it, and often they’re expected to jump right into this kind of research. We decided that this was a problem we had to fix.”

The course, simply titled Deep Sequencing, is designed to provide the basics in both the experimental and analytical dimensions of the discipline. Among the topics covered are the history and development of modern sequencing technologies, an introduction to foundational statistics and algorithms, laboratory and analysis techniques for whole genome and exome sequencing and their applications in medical genetics, and methods related to RNA-seq and the study of transcriptional and post-transcriptional regulation. The course also includes a focus on cancer genomics and insights into new third- and fourth-generation sequencing technologies such as single-cell sequencing and analysis.

Teaching duties are shared among the three faculty members, with each professor covering topics that are closest to his own field of expertise. “No one faculty member could teach this whole course and do a good job,” Sims points out. “If you want to understand it properly, deep sequencing is a big topic and it’s really necessary to have all three instructors involved.”

A physical chemist by training whose current research focuses on the development of new sequencing technologies, Dr. Sims is teaching class sessions on the experimental principles of deep sequencing. Covering the computational dimensions of the technology are Dr. Shen, an expert in computational genomics who uses genome and exome sequencing to hunt for human disease genes, and Dr. Zhang, who uses both experimental and computational approaches to study the function of RNA-binding proteins.

As Zhang explains, “What makes this course unique is how it combines several important aspects of deep sequencing. The three instructors have very different backgrounds, but each of us relies heavily on deep sequencing in our research and has first-hand experience of the most recent developments. In this way our expertise is very complementary.”

Over the course of the semester, students attend lectures, lead discussions of key scientific articles, and participate in a workshop focusing on the analysis of RNA-seq data. The workshop also helps students with diverse backgrounds develop basic skills in computational data analysis.

In addition, students have organized into teams that are independently pursuing semester-long projects. This assignment involves designing a study and analyzing deep sequencing data to address a real biological question, a task that benefits from the enormous amount of such data now publicly available on the Internet.

Peter Sims, Yufeng Shen, and Chaolin Zhang worked together to design the course.

As Shen explains, “We ask students to find a question that has never been systematically investigated before. We believe the best training is to solve a real world problem using the arsenal of experimental and computational knowledge that they take away from the lectures.”

“We think this is a very exciting opportunity to make some interesting discoveries,” Sims adds. “If the students can formulate an interesting question, there is probably a dataset available for them to answer it.”

In addition to providing practice in articulating scientific problems and applying the analytical skills they are learning, group-based collaboration gives students valuable experience in building and working within interdisciplinary teams. Important to the success of the class is that it includes students pursuing degrees in the biological sciences as well as others specializing in computational fields like computer science or electrical engineering. These interests are proving to be highly synergistic, and in a sense the course is a microcosm of the interdisciplinary nature of systems biology itself.

The response to the course has been extremely positive. Its allotted for-credit slots filled quickly, and an additional 12 to 15 students have been auditing it. Interestingly, the students come not just from the Department of Systems Biology but also from a wide range of other departments, both at Columbia University Medical Center and at Columbia’s Morningside Heights campus. The impressive turnout suggests that the course is addressing an important educational need that exists in many different graduate programs. And considering how fundamental next-generation sequencing has become to biological investigation today, it could grow to become an important component of graduate education across the entire university.

Chris Williams