SOFTWARE USERS: To access a browsable list of software and datasets created at the Columbia University Department of Systems Biology, go here.

An important part of systems biology research is to develop improved methods for analyzing large amounts of biological data. Researchers in the Columbia University Department of Systems Biology and its Center for Computational Biology and Bioinformatics have developed a rich, diverse collection of algorithms for analyzing biological data as well as databases of reliable biological information. Our computational tools include methods for inferring regulatory networks, for predicting protein structure and interactions, for analyzing gene sequence and expression data, and for studying genetics and evolution, among other applications.

geWorkbench

Under the auspices of the Center for Multiscale Analysis of Genetic and Cellular Networks (MAGNet), we have developed an integrated, user-friendly platform called geWorkbench, which makes the tools and datasets produced by the center available to researchers anywhere. This interoperable, grid-enabled, state-of-the-art bioinformatics platform allows them to be:

  • integrated with a variety of other existing bioinformatics modules for the analysis, visualization, and management of multiple data modalities
  • assembled into complex bioinformatics workflows and biomedical applications using a simple yet powerful visual front-end and a scripting language.

In addition to MAGNet tools, geWorkbench provides access to a rich collection of components that support the analysis and visualization of many genomic data types (e.g., gene expression, sequence, structure, and gene networks). Some of these components were developed de novo, while others wrap popular third-party software such as Cytoscape, the MultiExperiment Viewer (MeV), and GenePattern. More than 70 geWorkbench modules are available, including:

  • parsers for most common genomic data file formats
  • gene expression analysis algorithms for supervised and unsupervised learning
  • sequence homology, pattern discovery, and promoter region prediction
  • gene interaction network inference and visualization
  • 3-D protein modeling
  • gene ontology enrichment analysis
  • and many others
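
To give a sense of what one of these analyses computes, the sketch below shows a common formulation of gene ontology enrichment: a one-sided hypergeometric test that asks whether a gene list contains more genes annotated to a term than chance would predict. It is an illustration only and is not taken from geWorkbench; the function name, parameter names, and example counts are all hypothetical.

```python
from math import comb


def enrichment_p_value(population_size, annotated_in_population,
                       sample_size, annotated_in_sample):
    """Upper-tail hypergeometric probability P(X >= annotated_in_sample).

    Hypothetical helper for illustration, not part of geWorkbench.
    population_size:         number of genes in the background set
    annotated_in_population: background genes annotated to the term
    sample_size:             number of genes in the list of interest
    annotated_in_sample:     genes in the list annotated to the term
    """
    if not 0 <= annotated_in_population <= population_size:
        raise ValueError("annotated_in_population must be within the population")
    if not 0 <= sample_size <= population_size:
        raise ValueError("sample_size must be within the population")

    # The overlap cannot exceed either the list size or the annotated count.
    max_overlap = min(annotated_in_population, sample_size)
    first_k = max(annotated_in_sample, 0)

    # Count the ways to draw a list with at least the observed overlap.
    favorable = sum(
        comb(annotated_in_population, k)
        * comb(population_size - annotated_in_population, sample_size - k)
        for k in range(first_k, max_overlap + 1)
    )
    return favorable / comb(population_size, sample_size)


if __name__ == "__main__":
    # Made-up counts: 40 of 1,000 background genes carry the term,
    # and 8 of the 50 genes in the list of interest carry it.
    print(enrichment_p_value(1000, 40, 50, 8))
```

A small p-value suggests the term is over-represented in the list. In practice many terms are tested at once, so the resulting p-values are normally corrected for multiple testing before interpretation.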

geWorkbench has also been supported through the cancer Biomedical Informatics Grid (caBIG) initiative and leverages many caBIG technologies. For example, computationally intensive analyses are wrapped and deployed as grid services using caGrid, the grid middleware layer of caBIG. Similarly, caGrid and the cancer Bioinformatics Infrastructure Objects (caBIO) programmatic interface are used to provide seamless access to remote data and annotation sources. For instance, geWorkbench offers an integrated interface to the caArray genomic data repository and allows the retrieval of gene, pathway, and disease information from the Cancer Genome Anatomy Project (CGAP), the NCI-Nature Pathway Interaction Database, and the Cancer Gene Index (CGI), to name just a few.

Open source development and user support

geWorkbench is an open source, Java-based platform, and contributions from members of the community are welcome and encouraged. The source code repository and the latest production releases are available through the project's gForge page. Additional technical documentation, including code samples, can be found in the "Developers" section of the geWorkbench web site. Finally, support for both developers and end users is offered through the Molecular Analysis Tools Knowledge Center, which provides access to documentation, FAQs, knowledge base entries, and user forums for community and expert support.