
OMSSA Command line options

 

Input

OMSSA can take one of the following input formats:

  • a single dta file using the "-f" option
  • a set of dta files merged into a single file, separated by blank lines.  Use the "-fb" option
  • a set of dta files merged into a single file, separated by xml-like tags that allow tracking of spectra by file name and by number.  Use the "-fx" option.  A Perl script to do the merge is included in the distribution.  It is called dta_merge_OMSSA.pl and is described in the README file. For example, "perl dta_merge_OMSSA.pl -i yeastdtas -n yeast -s 10000" takes all of the dta files in the yeastdtas directory and batches them into sets of 10000, each set into a file starting with "yeast".
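The blank-line-separated layout expected by "-fb" can be sketched in a few lines of Python. This is only an illustration of the file layout, not the dta_merge_OMSSA.pl script from the distribution; the function name and the "*.dta" file pattern are assumptions made for the example.

```python
from pathlib import Path


def merge_dtas_blank_separated(dta_dir, out_file):
    """Concatenate every .dta file in dta_dir into out_file.

    Spectra are separated by exactly one blank line. Returns the number
    of spectra written. (Hypothetical helper, not part of OMSSA.)
    """
    blocks = []
    for path in sorted(Path(dta_dir).glob("*.dta")):
        # Drop empty lines inside a file so that blank lines only ever
        # mark the boundary between two spectra.
        lines = [ln.strip() for ln in path.read_text().splitlines() if ln.strip()]
        if lines:
            blocks.append("\n".join(lines))
    Path(out_file).write_text("\n\n".join(blocks) + "\n")
    return len(blocks)
```

Write the merged file outside the directory being scanned so that a later run does not pick it up as a spectrum; the result can then be passed to OMSSA with the "-fb" option.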

 

Command Line Options

Sequence library

-d <String> Blast sequence library to search.  Do not include .p* filename suffixes.
-pc <Integer> The number of pseudocounts to add to each precursor mass bin.

Input format and filename

-f <String> single dta file to search
-fx <String> multiple xml-encapsulated dta files to search
-fb <String> multiple dta files separated by blank lines to search
-fm <String> mgf formatted file
-fp <String> pkl formatted file
-hs <Integer> the minimum number of m/z values a spectrum must have to be searched
-fxml <String> omssa xml search request file (contains search parameters and spectra. overrides command line)
-pm <String> search parameter input in xml format (contains search parameters but no spectra. overrides command line except for name of file containing spectra)

Output results

-o <String> filename for text asn.1 formatted search results
-ob <String> filename for binary asn.1 formatted search results
-ox <String> filename for xml formatted search results
-oc <String> filename for comma separated value (excel .csv) formatted search results

The following option includes the search parameters and search spectra in the output results. This is necessary for viewing results in programs that require a spectrum:

-w include spectra and search params in search results

To turn off informational messages (but not error messages), use:

-ni don't print informational messages
 

Mass type and tolerance

-to <Real> product ion mass tolerance in Da
-te <Real> precursor ion mass tolerance in Da
-tez <Integer> scaling of precursor mass tolerance with charge (0 = none, 1 = linear)

A precursor ion is the ion before fragmentation and the product ions are the ions generated after fragmentation. These values are specified in Daltons +/- the measured value, e.g. a value of 2.0 means +/- 2.0 Daltons of the measured value.

The tez value allows you to specify how the mass tolerance scales with the charge of the precursor. For example, you may search a precursor assuming that it has a charge state of 2+ or 3+. If you set tez to 1, then the mass tolerance for the 2+ charge state will be 2 times the precursor mass tolerance, and for the 3+ charge state it will be 3 times the precursor mass tolerance. If you set tez to 0, the mass tolerance will always be equal to the precursor mass tolerance, irrespective of charge state.
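The scaling rule can be written out as a small calculation. The sketch below only restates the arithmetic described above; the function names are made up for the example and are not part of OMSSA.

```python
def precursor_tolerance(te, charge, tez):
    """Effective precursor tolerance in Da for one charge state.

    te     -- value given to -te (Da)
    charge -- precursor charge state being searched
    tez    -- value given to -tez (0 = none, 1 = linear)
    """
    if tez == 0:
        return te
    if tez == 1:
        return te * charge
    raise ValueError("tez must be 0 (none) or 1 (linear)")


def mass_window(measured, tolerance):
    """Lower and upper bounds: measured value +/- tolerance."""
    return (measured - tolerance, measured + tolerance)


# With -te 2.0 and -tez 1, a 3+ precursor is searched with +/- 6.0 Da.
print(precursor_tolerance(2.0, 3, 1))
print(mass_window(1000.0, precursor_tolerance(2.0, 2, 0)))
```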

-tom <Integer> product ion search type, with 0 = monoisotopic, 1 = average, 2 = monoisotopic N15, 3 = exact.
-tem <Integer> precursor ion search type, with 0 = monoisotopic, 1 = average, 2 = monoisotopic N15, 3 = exact.

Monoisotopic searching searches spectral peaks that correspond to peptides consisting entirely of carbon-12. Average mass searching searches on the average natural isotopic mass of peptides. Exact mass searches on the most abundant isotopic peak for a given mass range.

-tex <Double> threshold in Da above which the mass of a neutron should be added in an exact mass search.

Preprocessing

Preprocessing is the process of eliminating noise from a spectrum. Normally, you do not need to adjust these options as OMSSA automatically adjusts its preprocessing for best results.

-cl <Real> low intensity cutoff as a fraction of max peak
-ch <Real> high intensity cutoff as a fraction of max peak
-ci <Real> intensity cutoff increment as a fraction of max peak
-w1 <Integer> single charge window in Da
-w2 <Integer> double charge window in Da
-h1 <Integer> number of peaks allowed in single charge window
-h2 <Integer> number of peaks allowed in double charge window
-cp <Integer> eliminate charge reduced precursors in spectra (0=no, 1=yes). Typically turned on for ETD spectra.

Charge Handling

These options control how precursor and product ion charges are determined.  Presently, OMSSA estimates which precursors are 1+.  All other precursors are searched with charge from the minimum to maximum precursor charge specified.

-zl <Integer> minimum precursor charge to search when not 1+
-zh <Integer> maximum precursor charge to search when not 1+
-zt <Integer> minimum precursor charge to start considering multiply charged products
-z1 <Double> the fraction of peaks below the precursor used to determine if the spectrum is charge +1
-zc <Integer> should charge +1 be determined algorithmically (1=yes)
-zcc <Integer> how should precursor charges be determined? (1=believe the input file,2=use the specified range)
-zoh <Integer> set the maximum product charge to search

Enzyme specification

Additional enzymes can be added upon request.

-v <Integer> number of missed cleavages allowed
-e <Integer> id number of enzyme to use (trypsin is the default)
-el print a list of enzymes and their corresponding id number
-no <Integer> minimum size of peptides for no-enzyme and semi-tryptic searches
-nox <Integer> maximum size of peptides for no-enzyme and semi-tryptic searches

Ions to search

OMSSA searches two ion series, both of which can be specified.  Normally one of the ion series specified is a forward ion series and the other is a reverse ion series.

-il print a list of ions and their corresponding id number
-i comma delimited list of id numbers of ions to search
-sp <Integer> number of product ions to search
-sb1 <Integer> should first forward (e.g. b1) product ions be searched (1 = no, 0 = yes)
-sct <Integer> should c terminus ions (e.g. y1) be searched (1 = no, 0 = yes)

Taxonomy

By default, OMSSA searches without limiting by taxonomy.  By specifying an NCBI taxonomy id, you can limit your search to a particular organism.  The taxonomy id can be found by searching the NCBI taxonomy browser: enter the scientific name of the organism of interest in the search box, click the correct search result, and then click the scientific name in the taxonomy browser to get the numeric taxonomy id.

-x comma delimited list of NCBI taxonomy ids to search (0 = all.  This is the default)

Search speed up parameters

These are options that can speed up the search.  The first two can result in decreased sensitivity.

-hm <Integer> the minimum number of m/z matches a sequence library peptide must have for the hit to the peptide to be recorded
-ht <Integer> number of m/z values corresponding to the most intense peaks that must include one match to the theoretical peptide
-nt <Integer> number of search threads, useful for multicore processors. 0 means use all of the cores

Results

-hl <Integer> maximum number of hits retained for one spectrum
-he <Double> the maximum e-value allowed in the hit list

Post translational modifications

To specify modifications, first type in "omssacl -ml" to see a list of modifications available and their corresponding id number.  Then when running the search, specify the id numbers of the modifications you wish to apply, e.g. "omssacl -mf 5 -mv 1,8 ...". Multiple PTMs can be specified by placing commas between the numbers without any spaces.  The list of allowed post translational modifications will be expanded over time.

-mf  comma delimited list of id numbers for fixed modifications
-mv  comma delimited list of id numbers for variable modifications
-ml  print a list of modifications and their corresponding id number

To add your own user defined modifications, edit the usermod0-29 entries in the mods.xml file. If it is a common modification, please contact NCBI so that it can be added to the standard list.

To reduce the combinatorial expansion that results when specifying multiple variable modifications, you can put an upper bound on the number of mass ladders generated per peptide using the -mm option.  The ladders are generated in order from the fewest modifications to the most modifications.

-mm <Integer> the maximum number of mass ladders to generate per database peptide

There is an upper bound on the number of combinations of variable mods that can be applied to a peptide from the sequence library. The hard upper bound is 1024, which effectively limits the number of variable modification sites per peptide for an exhaustive search to 10. If you set this number too low, you will miss highly modified peptides. If you set it too high, it will make the e-values less significant by searching for too many possible modifications.

To give an example of what this means, assume that the limit is set to 11, that the theoretical peptide is STYY, and that you've selected phosphorylation of S, T, and Y as variable mods. The combinations that OMSSA will test are:

STYY
----
0000
1000
0100
0010
0001
1100
1010
1001
0110
0101
0011

where 0 represents no modification and 1 represents a modification at the site indicated by the column the digit is in. As you can see, OMSSA tries the combinations with the fewest variable modifications first and then adds modifications until the upper bound is reached.
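The enumeration order in the STYY table can be reproduced with a short Python sketch. This is an illustration of the ordering described above, not OMSSA's actual implementation; in particular, the order within a group of patterns that have the same number of modifications is an assumption chosen to match the table.

```python
from itertools import combinations, islice


def modification_patterns(n_sites, max_ladders):
    """List 0/1 site patterns, fewest modifications first, capped at max_ladders."""

    def all_patterns():
        # n_mods = 0 gives the unmodified peptide, then 1 site, 2 sites, ...
        for n_mods in range(n_sites + 1):
            for sites in combinations(range(n_sites), n_mods):
                yield "".join("1" if i in sites else "0" for i in range(n_sites))

    return list(islice(all_patterns(), max_ladders))


# STYY has 4 candidate sites; with a limit of 11 only the patterns
# with at most two modifications are generated.
for pattern in modification_patterns(4, 11):
    print(pattern)

# An exhaustive search over 10 sites needs 2**10 = 1024 patterns,
# which is where the hard upper bound of 1024 comes from.
print(2 ** 10)
```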

OMSSA treats cleavage of the initial methionine in each protein record as a variable modification by default. To turn off this behavior use the command line option

-mnm n-term methionine should not be cleaved

Iterative searching

-is <Double> evalue threshold to include a sequence in the iterative search, 0 = all
-ir <Double> evalue threshold to replace a hit, 0 = only if better
-ii <Double> evalue threshold to iteratively search a spectrum again, 0 = always

-foms <String> read in search result in .oms format (binary asn.1).
-fomx <String> read in search result in .omx format (xml).

Iterative searching is the ability to re-search search results in hopes of increasing the number of spectra identified. To accomplish this, an iterative search may change search parameters, such as using a no-enzyme search, or restrict the sequence search library to sequences already hit.

Sequence library restriction

Restricting the size of the sequence library can improve scores. To restrict the sequences searched to those found in a previous search, set the "is" parameter to an e-value. All sequences from the previous search that were hit with an e-value better than the threshold will be searched -- any other sequences are not searched. Here is an example search sequence:

1. omssacl -d nr -fb mydata.dta -w -ob search1output.oms

2. omssacl -d nr -foms search1output.oms -w -ob search2output.oms -is 0.01

In this search sequence, the first search is against the nr sequence library. The second search is against the sequences in the subset of nr that was hit in the first search with an e-value < 0.01.
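When the two steps are scripted, it can help to build the command lines programmatically so that the sequence library and file names stay consistent between iterations. The following is a minimal sketch that only assembles and prints the two commands shown above; the helper names are hypothetical, and nothing is executed.

```python
import shlex


def first_search(library, spectra_file, out_oms):
    """Command for the initial search; -w keeps spectra in the output."""
    return ["omssacl", "-d", library, "-fb", spectra_file, "-w", "-ob", out_oms]


def restricted_search(library, previous_oms, out_oms, evalue_threshold):
    """Command for a follow-up search restricted by -is to earlier hits."""
    return [
        "omssacl", "-d", library,
        "-foms", previous_oms,
        "-w", "-ob", out_oms,
        "-is", str(evalue_threshold),
    ]


commands = [
    first_search("nr", "mydata.dta", "search1output.oms"),
    restricted_search("nr", "search1output.oms", "search2output.oms", 0.01),
]
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be handed to subprocess.run as-is if you want the script to launch the searches itself.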

Spectrum Restriction

If a spectrum was hit by a sequence in a previous search with a good e-value, you may not want to search this spectrum in any subsequent searches. By using the "ii" parameter, you can set an e-value threshold below which a spectrum will not be re-searched.

Hit Replacement Restriction

If a new hit to a spectrum is better than a previous hit, by default the new hit replaces the old hit. If you wish to replace hits only above a certain e-value threshold, use the "ir" parameter.

Notes

In iterative searches, the sequence search library must be identical in all ways for each iteration of the search. This can be accomplished by always using the same sequence library file and not updating it in any way between searches. Additionally, iterative searches subsequent to the first must use as input the .omx or .oms file format that includes the spectra ("-w -ox search1.omx" or "-w -ob search1.oms").

Output

OMSSA can create output in XML and ASN.1 using the "-ox", "-o" and "-ob" options. 

The output for a set of spectra is called a response.  A response contains a set of hitsets, each hitset corresponding to a single input spectrum.  Each hitset contains a set of hits to a particular peptide sequence.  Each of these hits has an E-value probability score, a charge, and sequence.  Each hit has a set of "PepHits" that correspond to peptides found in the sequence library that have the same sequence as the hit.  Each PepHit has a protein sequence identifier (the NCBI gi, or general identifier), a start position in the protein sequence, a stop position, and a text annotation for the sequence (called a defline).
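The nesting described above can be pictured as a small set of Python data classes. This is a sketch of the hierarchy only: the class and field names are illustrative assumptions and do not correspond to the element names in the actual XML or ASN.1 output, and the spectrum_number field is added here just to label each hitset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PepHit:
    gi: int          # protein sequence identifier (NCBI gi)
    start: int       # start position in the protein sequence
    stop: int        # stop position in the protein sequence
    defline: str     # text annotation for the sequence


@dataclass
class Hit:
    evalue: float
    charge: int
    sequence: str
    pephits: List[PepHit] = field(default_factory=list)


@dataclass
class HitSet:
    spectrum_number: int                      # hypothetical label for the input spectrum
    hits: List[Hit] = field(default_factory=list)


@dataclass
class Response:
    hitsets: List[HitSet] = field(default_factory=list)


def best_hits(response, max_evalue):
    """Yield (spectrum_number, hit) for the lowest E-value hit of each hitset."""
    for hitset in response.hitsets:
        if not hitset.hits:
            continue
        best = min(hitset.hits, key=lambda h: h.evalue)
        if best.evalue <= max_evalue:
            yield hitset.spectrum_number, best
```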

OMSSA XML output can be converted to other text output formats in several different ways. XSLT scripts, which can be easy to use, can be found in the contrib/xslt folder along with a README.txt file. Additionally, a sample Perl parser for the xml output is included in contrib/perl, also with a README.txt file.  To use the sample perl parser, type "perl readOMSSA.pl test.xml" where test.xml is a file containing xml output from OMSSA.

 

Sample command line

Assuming that you have downloaded and installed the BLAST nr database of non-redundant proteins, you can run a sample search using the dta file MSHHWGYGK.dta included in the download.  This dta file contains a spectrum for a peptide from bovine carbonic anhydrase II.  The command line is

omssacl -d nr -f MSHHWGYGK.dta -ox MSHHWGYGK.xml

The XML output will be found in MSHHWGYGK.xml.  In this file you should see a significant hit to carbonic anhydrase II with an E-value much less than 0.1.

 

Performance characteristics

OMSSA is designed to perform best with a large number of input spectra.  There is a fixed startup cost, so searches with many spectra will be faster per spectrum than searches with few spectra.  The number of spectra OMSSA can handle at one time depends on the configuration of your computer, but a typical computer should be able to handle at least several hundred at once.  The maximum number of spectra that it can handle efficiently depends on the size of the spectra, the size of the sequence library, and the amount of memory on the computer. OMSSA can take advantage of multiple processor or multiple core computers, as long as one OMSSA process is run per core.

 

User defined modifications

Modifications are defined in the mods.xml and usermods.xml files. The usermods.xml file is for user defined modifications -- do not modify the mods.xml file. The usermods.xml file contains 10 user defined modifications. If you wish to use one of these modifications, you will need to edit this file. The fields that you should modify are MSModType, MSModSpec_name, MSModSpec_monomass, MSModSpec_residues_E, and, optionally, those beneath MSModSpec_neutralloss. Edit no other fields. Here is an unedited user modification entry:

<MSModSpec>
<MSModSpec_mod>
<MSMod value="usermod1">119</MSMod>
</MSModSpec_mod>
<MSModSpec_type>
<MSModType value="modaa">0</MSModType>
</MSModSpec_type>
<MSModSpec_name>User modification 1</MSModSpec_name>
<MSModSpec_monomass>0</MSModSpec_monomass>
<MSModSpec_averagemass>0</MSModSpec_averagemass>
<MSModSpec_n15mass>0</MSModSpec_n15mass>
<MSModSpec_residues>
<MSModSpec_residues_E>X</MSModSpec_residues_E>
</MSModSpec_residues>
</MSModSpec>

First, you need to specify the type of the modification. Types are:

0. modaa -- modification at particular amino acids
1. modn -- at the N terminus of a protein
2. modnaa -- at the N terminus of a protein at particular amino acids
3. modc -- at the C terminus of a protein
4. modcaa -- at the C terminus of a protein at particular amino acids
5. modnp -- at the N terminus of a peptide
6. modnpaa -- at the N terminus of a peptide at particular amino acids
7. modcp -- at the C terminus of a peptide
8. modcpaa -- at the C terminus of a peptide at particular amino acids

As an example, if you are specifying a modification that modifies K and R at the C terminus of peptides, put <MSModType value="modcpaa">8</MSModType>.

Second, edit the name of modification, e.g. <MSModSpec_name>My Modification</MSModSpec_name>.

Third, specify the monoisotopic mass of the modification in Daltons. So if the modification is 167.03 Daltons, put down <MSModSpec_monomass>167.03</MSModSpec_monomass>. Presently, OMSSA ignores the average and n15 mass of the modifications.

Fourth, specify the residues (if any) that are affected by the modification, e.g.

<MSModSpec_residues>
<MSModSpec_residues_E>K</MSModSpec_residues_E>
<MSModSpec_residues_E>R</MSModSpec_residues_E>
</MSModSpec_residues>

If there are no residues affected, delete everything from <MSModSpec_residues> to </MSModSpec_residues>, inclusive.

If this modification has a second mass loss after the precursor mass is measured, enter the mass that is lost, e.g.

<MSModSpec_neutralloss>
<MSMassSet>
<MSMassSet_monomass>97.976896</MSMassSet_monomass>
<MSMassSet_averagemass>97.9952</MSMassSet_averagemass>
<MSMassSet_n15mass>0</MSMassSet_n15mass>
</MSMassSet>
</MSModSpec_neutralloss>

Do not modify the <MSMod> value or content in any way.
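The editing steps above can also be applied with a script instead of by hand. The sketch below uses Python's standard xml.etree.ElementTree module on a copy of the unedited entry shown earlier. It assumes the entry has no XML namespace, as in the snippet above; the real usermods.xml file may differ, and the function name is made up for the example.

```python
import xml.etree.ElementTree as ET

# Type names in the order listed above; the list index is the numeric code.
MOD_TYPES = ["modaa", "modn", "modnaa", "modc", "modcaa",
             "modnp", "modnpaa", "modcp", "modcpaa"]

UNEDITED_ENTRY = """<MSModSpec>
<MSModSpec_mod><MSMod value="usermod1">119</MSMod></MSModSpec_mod>
<MSModSpec_type><MSModType value="modaa">0</MSModType></MSModSpec_type>
<MSModSpec_name>User modification 1</MSModSpec_name>
<MSModSpec_monomass>0</MSModSpec_monomass>
<MSModSpec_averagemass>0</MSModSpec_averagemass>
<MSModSpec_n15mass>0</MSModSpec_n15mass>
<MSModSpec_residues><MSModSpec_residues_E>X</MSModSpec_residues_E></MSModSpec_residues>
</MSModSpec>"""


def edit_user_mod(entry, mod_type, name, monomass, residues):
    """Apply the four edits to one MSModSpec element in place."""
    # 1. Modification type: attribute and numeric code must agree.
    type_elem = entry.find("MSModSpec_type/MSModType")
    type_elem.set("value", mod_type)
    type_elem.text = str(MOD_TYPES.index(mod_type))
    # 2. Name and 3. monoisotopic mass in Daltons.
    entry.find("MSModSpec_name").text = name
    entry.find("MSModSpec_monomass").text = str(monomass)
    # 4. Residues: replace the list, or drop the whole block if empty.
    residues_elem = entry.find("MSModSpec_residues")
    if residues:
        for child in list(residues_elem):
            residues_elem.remove(child)
        for aa in residues:
            ET.SubElement(residues_elem, "MSModSpec_residues_E").text = aa
    else:
        entry.remove(residues_elem)
    # The MSMod element (id 119 here) is deliberately left untouched.
    return entry


entry = ET.fromstring(UNEDITED_ENTRY)
edit_user_mod(entry, "modcpaa", "My Modification", 167.03, ["K", "R"])
print(ET.tostring(entry, encoding="unicode"))
```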

 

 
