The MAPGAPS program version 1.0.1: July 27, 2009.

Written by: Andrew F. Neuwald
The Institute for Genome Sciences and the Department of Biochemistry & Molecular Biology,
University of Maryland School of Medicine,
801 West Baltimore St., BioPark II, Room 617,
Baltimore, MD 21201

Uncompress mapgaps1_0_1.tar.gz by typing "gunzip mapgaps1_0_1.tar.gz".
Untar the file by typing "tar xvf mapgaps1_0_1.tar".

QUICK START: run the 'run_example' script in the example directory to see how MAPGAPS works.

DETAILS: The mapgaps1_0_1/bin directory contains the following MAPGAPS procedures (compiled for 64-bit Linux x86_64):

fa2cma: Procedure for converting a fasta-formatted multiple sequence alignment into cma format, the format required to create an array of multiple sequence alignments, which is the type of input file required by the run_map procedure. Each cma file follows the convention used by PSI-BLAST, where the first sequence serves as a master sequence against which the remaining sequences are aligned. Thus each amino acid residue in the first sequence corresponds to a column in the alignment, whereas each deletion ('-') in the first sequence corresponds to an insertion in the alignment. For these reasons, the first sequence is typically a consensus sequence.

press: A simple routine for removing extraneous newline characters from a fasta file; fa2cma input files must not contain them. Command line syntax: press < infile > outfile

run_map: The MAP procedure for creating a multiple-profile alignment from an array of cma-formatted multiple sequence alignments (infile.cma) and a corresponding cma-formatted template alignment (infile.tpl). The template alignment consists of multiple consensus sequences: a consensus sequence of the template itself (the first sequence in the alignment) followed by other consensus sequences, one for each of the alignments in the array.
The order of the consensus sequences in the template must be the same as in the array, and the first sequence in each of these alignments must be identical to the corresponding consensus sequence in the template. The MAP procedure outputs a multiple-profile alignment file (.mpa), which is required by the GAPS procedure. The run_map procedure can also generate an array of 'excluded' profiles from an optional input array of excluded profile alignments (infile.xpa); this requires the command line option -exclude. The output excluded profiles and corresponding query sequences are placed into the files infile.xup and infile.xpq, respectively.

run_gaps: The GAPS procedure for searching either a small set or an entire database of protein sequences for matches against an input multiple-profile alignment (defined by the .mpa + .tpl files), which serves as the query. It can also search a database using only a template alignment, in which case it will create a *.mpa file that can be used for a subsequent search.

cma2fa: Procedure for converting a cma-formatted multiple sequence alignment into fasta format. This is useful for converting the output alignment created by the run_gaps program into standard fasta format.

cma2rtf: Routine for converting a cma file into a rich text format file, which can be viewed using MS Word.

run_convert: Routine for mapping a family template file onto a superfamily template file.

The script "run_example" (in the example directory) demonstrates the use of all of these procedures (except for the press routine).
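
ILLUSTRATION: The Python sketch below is not part of MAPGAPS. It only illustrates two ideas described above: joining wrapped sequence lines in a fasta file (one plausible reading of what press does) and the master-sequence convention, in which residues of the first sequence define alignment columns and '-' characters in the first sequence mark insertions. The function names unwrap_fasta and split_by_master and the tiny example alignment are hypothetical, and the real press and fa2cma programs may behave differently.

```python
def unwrap_fasta(text):
    """Join wrapped sequence lines so that each record becomes
    (header, single_sequence_string). This is an assumed reading of
    'removing extraneous newline characters', not the press source."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_by_master(records):
    """Classify alignment positions using the first (master) sequence:
    a residue in the master marks an aligned column, while a '-' in the
    master marks an insertion relative to it."""
    master = records[0][1]
    match_cols = [i for i, ch in enumerate(master) if ch != "-"]
    insert_cols = [i for i, ch in enumerate(master) if ch == "-"]
    return match_cols, insert_cols


# Hypothetical two-sequence alignment with wrapped sequence lines.
example = """>consensus
MK-L
VT
>seq1
MKAL
-T
"""

records = unwrap_fasta(example)
match_cols, insert_cols = split_by_master(records)
print(records)
print("aligned columns:", match_cols)
print("insertion positions:", insert_cols)
```

In this toy input, position 2 (counting from 0) is an insertion because the master sequence has '-' there, while seq1 has a deletion at position 4 relative to the master.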