seqmatchall
The larger the specified word size, the faster the comparison will proceed. However, regions whose stretches of identity are shorter than the word size will be missed. You should therefore choose the largest word size that is still small enough to find the regions of similarity you are interested in, so that the comparison completes within a reasonable time.
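To make the trade-off concrete, here is a minimal Python sketch of word-based exact matching between two sequences. It is an illustration only and is not how seqmatchall itself is implemented; the function name `exact_matches` and the toy sequences are hypothetical. It indexes every word of the chosen size in one sequence, looks up the words of the other, and extends each hit into a maximal region of identity.

```python
from collections import defaultdict


def exact_matches(a, b, wordsize):
    """Return maximal exact matches as (start_a, start_b, length), 0-based.

    Illustrative sketch only: assumes plain strings and forward strand.
    """
    if wordsize < 2:
        raise ValueError("wordsize must be 2 or more")

    # Index every word of length `wordsize` in sequence a by start position.
    index = defaultdict(list)
    for i in range(len(a) - wordsize + 1):
        index[a[i:i + wordsize]].append(i)

    matches = []
    for j in range(len(b) - wordsize + 1):
        for i in index.get(b[j:j + wordsize], ()):
            # Skip hits that only continue a match starting one base earlier.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the hit to the right while the sequences stay identical.
            length = wordsize
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            matches.append((i, j, length))
    return sorted(matches)


# Two toy sequences sharing a 7-base region and a 4-base region.
seq_a = "TTACGTGCAAGGCT"
seq_b = "CCACGTGCATTGGCT"
for size in (4, 6, 8):
    print(size, exact_matches(seq_a, seq_b, size))
```

In this toy case a word size of 4 reports both shared regions, a word size of 6 reports only the 7-base region, and a word size of 8 reports nothing, which mirrors the behaviour described above.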
Here is an example using an increased word size to avoid accidental matches:
    % seqmatchall
    All-against-all comparison of a set of sequences
    Input sequence set: tembl:eclac*
    Word size [4]: 15
    Output alignment [eclac.seqmatchall]:
    Standard (Mandatory) qualifiers:
      [-sequence]          seqset     Sequence set USA
       -wordsize           integer    Word size
      [-outfile]           align      Output alignment file name

    Additional (Optional) qualifiers: (none)
    Advanced (Unprompted) qualifiers: (none)
    Associated qualifiers:

       "-sequence" associated qualifiers
       -sbegin1            integer    Start of each sequence to be used
       -send1              integer    End of each sequence to be used
       -sreverse1          boolean    Reverse (if DNA)
       -sask1              boolean    Ask for begin/end/reverse
       -snucleotide1       boolean    Sequence is nucleotide
       -sprotein1          boolean    Sequence is protein
       -slower1            boolean    Make lower case
       -supper1            boolean    Make upper case
       -sformat1           string     Input sequence format
       -sdbname1           string     Database name
       -sid1               string     Entryname
       -ufo1               string     UFO features
       -fformat1           string     Features format
       -fopenfile1         string     Features file name

       "-outfile" associated qualifiers
       -aformat2           string     Alignment format
       -aextension2        string     File name extension
       -adirectory2        string     Output directory
       -aname2             string     Base file name
       -awidth2            integer    Alignment width
       -aaccshow2          boolean    Show accession number in the header
       -adesshow2          boolean    Show description in the header
       -ausashow2          boolean    Show the full USA in the alignment
       -aglobal2           boolean    Show the full sequence in alignment

    General qualifiers:
       -auto               boolean    Turn off prompts
       -stdout             boolean    Write standard output
       -filter             boolean    Read standard input, write standard output
       -options            boolean    Prompt for standard and additional values
       -debug              boolean    Write debug output to program.dbg
       -verbose            boolean    Report some/full command line options
       -help               boolean    Report command line options. More
                                      information on associated and general
                                      qualifiers can be found with -help -verbose
       -warning            boolean    Report warnings
       -error              boolean    Report errors
       -fatal              boolean    Report fatal errors
       -die                boolean    Report deaths
Standard (Mandatory) qualifiers | Description | Allowed values | Default |
---|---|---|---|
[-sequence] (Parameter 1) | Sequence set USA | Readable set of sequences | Required |
-wordsize | Word size | Integer 2 or more | 4 |
[-outfile] (Parameter 2) | Output alignment file name | Alignment output file | |
Additional (Optional) qualifiers | | | |
(none) | | | |
Advanced (Unprompted) qualifiers | | | |
(none) | | | |
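For scripted use, the qualifiers listed above can be given on the command line instead of answering the prompts. The Python sketch below assembles such a command using only qualifiers that appear in the help text (`-sequence`, `-wordsize`, `-outfile` and `-auto`). The helper name `build_seqmatchall_command` is hypothetical, and actually running the command assumes that EMBOSS is installed and that the named database is configured.

```python
import shlex
from typing import List


def build_seqmatchall_command(sequence: str, outfile: str,
                              wordsize: int = 4) -> List[str]:
    """Build an argument list for a non-interactive seqmatchall run.

    Hypothetical helper: it only assembles the arguments and checks the
    documented constraint that the word size is an integer of 2 or more.
    """
    if wordsize < 2:
        raise ValueError("wordsize must be an integer of 2 or more")
    return [
        "seqmatchall",
        "-sequence", sequence,       # sequence set USA
        "-wordsize", str(wordsize),  # default is 4
        "-outfile", outfile,         # alignment output file
        "-auto",                     # turn off prompts
    ]


cmd = build_seqmatchall_command("tembl:eclac*", "eclac.seqmatchall", wordsize=15)
# Show the command as it could be typed in a shell.
print(shlex.join(cmd))
# To run it (requires EMBOSS on the PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than as a single shell string, keeps the `*` in the USA from being expanded by the shell.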
The sequences must be either all protein or all nucleic acid.
    ########################################
    # Program: seqmatchall
    # Rundate: Fri Jul 15 2005 12:00:00
    # Align_format: match
    # Report_file: eclac.seqmatchall
    ########################################

    #=======================================
    #
    # Aligned_sequences: 2
    # 1: ECLAC
    # 2: ECLACA
    #=======================================

    1832 ECLAC + 5646..7477 ECLACA + 1..1832

    #=======================================
    #
    # Aligned_sequences: 2
    # 1: ECLAC
    # 2: ECLACI
    #=======================================

    1113 ECLAC + 49..1161 ECLACI + 1..1113

    #=======================================
    #
    # Aligned_sequences: 2
    # 1: ECLAC
    # 2: ECLACY
    #=======================================

    1500 ECLAC + 4305..5804 ECLACY + 1..1500

    #=======================================
    #
    # Aligned_sequences: 2
    # 1: ECLAC
    # 2: ECLACZ
    #=======================================

    3078 ECLAC + 1287..4364 ECLACZ + 1..3078

    #=======================================
    #
    # Aligned_sequences: 2
    # 1: ECLACA
    # 2: ECLACY
    #=======================================

    159 ECLACA + 1..159 ECLACY + 1342..1500

    #=======================================
    #
    # Aligned_sequences: 2
    # 1: ECLACY
    # 2: ECLACZ
    #=======================================

    60 ECLACY + 1..60 ECLACZ + 3019..3078

    #---------------------------------------
    #---------------------------------------
ECLAC (the complete E. coli lac operon) matches ECLACI, ECLACZ, ECLACY and ECLACA (the individual genes), and there is a short overlap between ECLACY and each of its flanking genes, ECLACZ and ECLACA.
The output is a list of regions of identity in pairs of sequences, each consisting of one line with 7 columns of data separated by TABs or space characters.
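The columns of data consist of the length of the matched region, followed by the name, strand and start..end positions of the region in the first sequence, and then the same three items for the second sequence.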
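As a rough sketch of reading such a report in a script, the Python below skips comment lines beginning with `#` and splits each remaining line into seven whitespace-separated fields. The field meanings (length, then name, strand and `start..end` range for each sequence) are inferred from the example output above, and the `Match` type and `parse_match_report` function are hypothetical names, not part of EMBOSS.

```python
import re
from typing import List, NamedTuple


class Match(NamedTuple):
    length: int
    name1: str
    strand1: str
    start1: int
    end1: int
    name2: str
    strand2: str
    start2: int
    end2: int


# A range column such as "5646..7477".
_RANGE = re.compile(r"^(\d+)\.\.(\d+)$")


def parse_match_report(text: str) -> List[Match]:
    """Parse match lines, ignoring blank lines and '#' comment lines."""
    matches = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 7:
            raise ValueError(f"expected 7 columns, got {len(fields)}: {line!r}")
        length, name1, strand1, range1, name2, strand2, range2 = fields
        m1, m2 = _RANGE.match(range1), _RANGE.match(range2)
        if m1 is None or m2 is None:
            raise ValueError(f"unrecognised range in line: {line!r}")
        matches.append(Match(int(length),
                             name1, strand1, int(m1.group(1)), int(m1.group(2)),
                             name2, strand2, int(m2.group(1)), int(m2.group(2))))
    return matches


sample = """\
# Aligned_sequences: 2
# 1: ECLAC
# 2: ECLACA
1832 ECLAC + 5646..7477 ECLACA + 1..1832
"""
for match in parse_match_report(sample):
    print(match.name1, match.name2, match.length)
```

Raising an error on lines that do not have seven fields is a deliberate choice here, so that a report written in a different alignment format is not silently misread.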
Program name | Description |
---|---|
matcher | Finds the best local alignments between two sequences |
supermatcher | Match large sequences against one or more other sequences |
water | Smith-Waterman local alignment |
wordmatch | Finds all exact matches of a given size between 2 sequences |
polydot will give a graphical view of the same matches.