Systems Management Resources

We are compiling a list of the hardware, software, and databases available at the CGC. As a start, we describe below the specialized genomic hardware and software recently purchased from Paracel as part of our AMDeC-sponsored bioinformatics facility for New York State affiliated institutions.

Paracel GeneMatcher2 (GM2). The successor to the GeneMatcher-Plus, the instrument used by Celera to help assemble and analyze the Drosophila and human genomes. Designed specifically for Celera, it has 9,216 parallel processors, each of which performs the basic suite of sequence alignment and sequence comparison algorithms at speeds and efficiencies unmatched by conventional computers. For example, the GM2 performs 90 billion Smith-Waterman cell updates per second. The GM2 will allow scientists to complete in one hour searches that required days on our fastest general-purpose computers.
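To make the "cell updates per second" figure concrete, here is a minimal Python sketch of the Smith-Waterman recurrence that counts one cell update for each entry of the dynamic-programming matrix it fills. The scoring values (match, mismatch, and a linear gap penalty) and the function names are illustrative assumptions; this is not Paracel's implementation, which runs on dedicated parallel hardware. The second function simply divides the number of cells by the 90 billion cells per second quoted above to give a rough lower bound on search time.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best local alignment score, number of cell updates).

    Scoring parameters are illustrative assumptions, with a simple
    linear gap penalty rather than an affine one.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    cells = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # One "cell update": the best of restarting at zero,
            # extending the diagonal, or opening a gap in either sequence.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
            cells += 1
        prev = curr
    return best, cells


def seconds_at_rate(query_len, db_residues, cells_per_second=90e9):
    """Idealized time to fill query_len x db_residues cells at a given rate."""
    return query_len * db_residues / cells_per_second


if __name__ == "__main__":
    print(smith_waterman("ACGT", "ACGT"))  # (8, 16)
    # A 1,000-residue query against a hypothetical 9-million-residue database.
    print(seconds_at_rate(1_000, 9_000_000))
```

The cell count is always the product of the two sequence lengths, which is why throughput for this algorithm is usually quoted in cell updates per second rather than in sequences per second.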

Paracel BlastMachine (BM). Implements BLAST – the Basic Local Alignment Search Tool developed for speed and simplicity by the National Center for Biotechnology Information (NCBI) – which is the sequence search algorithm most commonly used by genomic researchers. The BM system is a turnkey software and hardware solution running a Paracel-optimized version of the NCBI BLAST algorithm on a pre-packaged Linux farm, for large-scale sequence similarity analysis. Paracel has rewritten portions of the NCBI BLAST algorithm to improve speed and to accommodate longer query sequence lengths and larger databases.

CAP4 Sequence Assembly Software: CAP4 is a powerful assembly engine that generates accurate contigs and consensus sequences in genomic as well as EST projects. This scalable, high-throughput turnkey software was designed for use with the Paracel GM2 and BM and for integration with other popular bioinformatics software. Reference for the related CAP3 program: X. Huang and A. Madan, CAP3: A DNA Sequence Assembly Program, Genome Research 9, 868-877 (1999).

Paracel EST Clustering and Assembly Package: Provides a complete system for processing EST sequences into maximally assembled transcript clusters. With rigorous and sensitive algorithms, as well as intelligent use of associated contextual data, sequences pass through a multi-step process that includes clean-up, comprehensive pairwise comparisons, clustering, assembly, and consensus generation.
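The first three stages of such a pipeline (clean-up, pairwise comparison, clustering) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Paracel package: clean-up is reduced to upper-casing, stripping trailing A's as a crude poly-A removal, and dropping short reads; pairwise comparison is replaced by counting shared k-mers; and clustering is single-linkage via union-find. All function names and thresholds are hypothetical, and the assembly and consensus stages are omitted.

```python
from itertools import combinations


def clean(seq, min_len=8):
    """Toy clean-up: upper-case, strip trailing A's, drop short reads."""
    seq = seq.upper().rstrip("A")
    return seq if len(seq) >= min_len else None


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similar(a, b, k=6, min_shared=2):
    """Stand-in for a real pairwise comparison: enough shared k-mers."""
    return len(kmers(a, k) & kmers(b, k)) >= min_shared


def cluster(seqs, k=6, min_shared=2):
    """Single-linkage clustering of sequence indices using union-find."""
    parent = list(range(len(seqs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(seqs)), 2):
        if similar(seqs[i], seqs[j], k, min_shared):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    reads = ["ACGTACGGTCATTG", "acggtcattgccaa", "TTTTGGGGCCCCTTTT", "acgta"]
    cleaned = [s for s in map(clean, reads) if s]
    print(cluster(cleaned))  # [[0, 1], [2]]
```

The all-against-all comparison loop grows quadratically with the number of reads, which is the step that benefits most from specialized hardware in a production setting.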