January 29, 2016

New Course Covers Fundamentals of High-Performance Computing

Students participating in a new course gain experience using the Department of Systems Biology's computing cluster, a Top500 supercomputer dedicated to biological research.

As more and more biological research moves to a “big data” model, the ability to use high-performance computing platforms for analysis is rapidly becoming an essential skill. To prepare students to work with these tools more successfully, the Columbia University Department of Systems Biology recently partnered with the Mailman School of Public Health to launch a new graduate-level class that provides a strong grounding in the fundamental concepts behind the technology.

Developed by Rebecca Yohannes, director of high-performance computing at the Mailman School, the 1.5-credit course addresses both the practical challenges users face in programming high-performance computing clusters and the theoretical questions that they raise. Over seven weeks, students attend lectures, complete practical exercises using the Department of Systems Biology’s high-performance computing cluster, and attend talks by visitors from Amazon, Google, and Isilon (a producer of clustered storage). The course assumes no prior knowledge of supercomputing and is intended to help budding scientists from other disciplines quickly become comfortable using it for data analysis, modeling, and other research functions.

Hugh Ediet, lead engineer for the Department of Systems Biology’s Information Technology group (DSBIT), has also been closely engaged with the development of the course. He manages the operation of the Department’s high-performance computing cluster and sees the development of the course as a great opportunity for Systems Biology students, and other young investigators around the university, to become comfortable working in a computing environment that is new to many of them when they arrive at Columbia.

“The interesting thing about being at Columbia University Medical Center is that we have people who are brilliant doctors, biologists, chemists, physicists, and statisticians, but they’re not necessarily fully reaping the benefits of the opportunities HPC affords,” Ediet explains. “There are unique considerations that come into play performing an analysis on a cluster made up of thousands of computers, as opposed to working on a single machine. Our goal with the class is to help students hit the ground running with our HPC system.”

Topics covered include parallel computing theory, as well as examples of the design, analysis, and implementation of high-performance computing applications across a variety of scientific disciplines. Students learn about high-performance computing system architecture and the basics of evaluating computing performance. The course also provides an introduction to programming in C and Python, as well as guidance in using common software packages and libraries for scientific research.

"Across the sciences," Yohannes says, "making sense of the large amounts of data out there requires high-performance computing, so these are critical skills. We're excited that the Mailman School was able to work with the Department of Systems Biology to design the first introductory course in supercomputing at Columbia, and the only supercomputing course among our peer institutions of public health."

In addition to Yohannes and Ediet, Daniel Bauer, a PhD candidate in the Department of Computer Science, and Rob Lane, manager of Columbia Research Computing Services, are participating in the development and teaching of the course.

Offered this winter, Fundamentals of High-Performance Computing has been enthusiastically received, with even more student demand than had been anticipated. Enrolled students have come from the Mailman School, the Department of Systems Biology, and other departments across Columbia University Medical Center. This strong interest is indicative of a growing awareness that high-performance computing platforms now make it possible to ask exciting new kinds of questions in research across the biological and biomedical sciences.

Chris Williams