einverted
It finds inverted repeats that include a proportion of mismatches and gaps (bulges in the stem loop).
einverted uses dynamic programming and thus is guaranteed to find the optimal alignment, but is slower than, for example, a self-by-self BLAST. It can find multiple inverted repeats in a sequence.
einverted does not report overlapping matches.
The original "inverted" program was written to annotate the nematode genome. Excluding overlapping repeats saved problems with simple repeat sequences in this genome.
% einverted tembl:hsts1
Finds DNA inverted repeats
Gap penalty [12]:
Minimum score threshold [50]:
Match score [3]:
Mismatch score [-4]:
Output file [hsts1.inv]:
   Standard (Mandatory) qualifiers:
  [-sequence]          sequence   Sequence USA
   -gap                integer    Gap penalty
   -threshold          integer    Minimum score threshold
   -match              integer    Match score
   -mismatch           integer    Mismatch score
  [-outfile]           outfile    Output file name

   Additional (Optional) qualifiers:
   -maxrepeat          integer    Maximum separation between the start of
                                  repeat and the end of the inverted repeat
                                  (the default is 4000 bases).

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of the sequence to be used
   -send1              integer    End of the sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -odirectory2        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report deaths
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Standard (Mandatory) qualifiers | | | |
[-sequence] (Parameter 1) | Sequence USA | Readable sequence | Required |
-gap | Gap penalty | Any integer value | 12 |
-threshold | Minimum score threshold | Any integer value | 50 |
-match | Match score | Any integer value | 3 |
-mismatch | Mismatch score | Any integer value | -4 |
[-outfile] (Parameter 2) | Output file name | Output file | <sequence>.einverted |
Additional (Optional) qualifiers | | | |
-maxrepeat | Maximum separation between the start of repeat and the end of the inverted repeat (the default is 4000 bases). | Any integer value | 4000 |
Advanced (Unprompted) qualifiers | | | |
(none) | | | |
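The qualifiers listed above can also be supplied on the command line instead of being answered at the prompts. The following is a minimal sketch of how a script might assemble such a call; the helper names `build_einverted_command` and `run_einverted` are hypothetical, and it assumes that an EMBOSS installation provides `einverted` on the PATH and that values are passed as separate arguments after each qualifier.

```python
import subprocess


def build_einverted_command(sequence, outfile, gap=12, threshold=50,
                            match=3, mismatch=-4, maxrepeat=4000):
    """Return an argument list for a non-interactive einverted run.

    The defaults mirror the documented default values; the function
    itself is an illustrative helper, not part of EMBOSS.
    """
    return [
        "einverted",
        "-sequence", str(sequence),
        "-gap", str(gap),
        "-threshold", str(threshold),
        "-match", str(match),
        "-mismatch", str(mismatch),
        "-maxrepeat", str(maxrepeat),
        "-outfile", str(outfile),
        "-auto",  # general qualifier: turn off prompts
    ]


def run_einverted(sequence, outfile, **scores):
    """Run einverted (assumed to be installed) and raise if it fails."""
    cmd = build_einverted_command(sequence, outfile, **scores)
    subprocess.run(cmd, check=True)


# Show the command for the example session without executing it.
print(" ".join(build_einverted_command("tembl:hsts1", "hsts1.inv")))
```

Building the argument list separately from running it makes it easy to inspect or log the exact command before einverted is invoked.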
ID HSTS1 standard; DNA; HUM; 18596 BP. XX AC D00596; XX SV D00596.1 XX DT 17-JUL-1991 (Rel. 28, Created) DT 27-OCT-1998 (Rel. 57, Last updated, Version 2) XX DE Homo sapiens gene for thymidylate synthase, exons 1, 2, 3, 4, 5, 6, 7, DE complete cds. XX KW thymidylate syntase. XX OS Homo sapiens (human) OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; OC Eutheria; Primates; Catarrhini; Hominidae; Homo. XX RN [1] RP 1-18596 RX MEDLINE; 91056070. RA Kaneda S., Nalbantoglu J., Takeishi K., Shimizu K., Gotoh O., Seno T., RA Ayusawa D.; RT "Structural and Functional Analysis of the Human Thymidylate Synthase RT Gene"; RL J. Biol. Chem. 265:20277-20284(1990). XX DR SWISS-PROT; P04818; TYSY_HUMAN. XX CC These data kindly submitted in computer readable form by: CC Sumiko Kaneda CC National Institute of Genetics CC 1111 Yata CC Mishima 411 CC Japan CC Phone: +81-559-72-2732 CC Fax: +81-559-71-3651 XX FH Key Location/Qualifiers FH FT source 1..18596 FT /chromosome="18" FT /db_xref="taxon:9606" FT /sequenced_mol="DNA" FT /organism="Homo sapiens" FT /clone="lambdaHTS-1 and lambdaHTS-3" FT /map="18p11.32" FT repeat_unit 1..148 FT /note="Alu sequence" FT repeat_unit 202..477 [Part of this file has been deleted for brevity] ttttgttttt agcttcagcg agaacccaga cctttcccaa agctcaggat tcttcgaaaa 15660 gttgagaaaa ttgatgactt caaagctgaa gactttcaga ttgaagggta caatccgcat 15720 ccaactatta aaatggaaat ggctgtttag ggtgctttca aaggagctcg aaggatattg 15780 tcagtcttta ggggttgggc tggatgccga ggtaaaagtt ctttttgctc taaaagaaaa 15840 aggaactagg tcaaaaatct gtccgtgacc tatcagttat taatttttaa ggatgttgcc 15900 actggcaaat gtaactgtgc cagttctttc cataataaaa ggctttgagt taactcactg 15960 agggtatctg acaatgctga ggttatgaac aaagtgagga gaatgaaatg tatgtgctct 16020 tagcaaaaac atgtatgtgc atttcaatcc cacgtactta taaagaaggt tggtgaattt 16080 cacaagctat ttttggaata tttttagaat attttaagaa tttcacaagc tattccctca 16140 aatctgaggg agctgagtaa caccatcgat catgatgtag agtgtggtta tgaactttaa 16200 agttatagtt gttttatatg 
ttgctataat aaagaagtgt tctgcattcg tccacgcttt 16260 gttcattctg tactgccact tatctgctca gttccttcct aaaatagatt aaagaactct 16320 ccttaagtaa acatgtgctg tattctggtt tggatgctac ttaaaagagt atattttaga 16380 aataatagtg aatatatttt gccctatttt tctcatttta actgcatctt atcctcaaaa 16440 tataatgacc atttaggata gagttttttt tttttttttt taaactttta taaccttaaa 16500 gggttatttt aaaataatct atggactacc attttgccct cattagcttc agcatggtgt 16560 gacttctcta ataatatgct tagattaagc aaggaaaaga tgcaaaacca cttcggggtt 16620 aatcagtgaa atatttttcc cttcgttgca taccagatac ccccggtgtt gcacgactat 16680 ttttattctg ctaatttatg acaagtgtta aacagaacaa ggaattattc caacaagtta 16740 tgcaacatgt tgcttatttt caaattacag tttaatgtct aggtgccagc ccttgatata 16800 gctatttttg taagaacatc ctcctggact ttgggttagt taaatctaaa cttatttaag 16860 gattaagtag gataacgtgc attgatttgc taaaagaatc aagtaataat tacttagctg 16920 attcctgagg gtggtatgac ttctagctga actcatcttg atcggtagga ttttttaaat 16980 ccatttttgt aaaactattt ccaagaaatt ttaagccctt tcacttcaga aagaaaaaag 17040 ttgttggggc tgagcactta attttcttga gcaggaagga gtttcttcca aacttcacca 17100 tctggagact ggtgtttctt tacagattcc tccttcattt ctgttgagta gccgggatcc 17160 tatcaaagac caaaaaaatg agtcctgtta acaaccacct ggaacaaaaa cagattttat 17220 gcatttatgc tgctccaaga aatgctttta cgtctaagcc agaggcaatt aattaatttt 17280 tttttttttg acatggagtc actgtccgtt gcccaggctg cagtgcagtg gcgcaatctt 17340 ggctcactgc aacctccacc tcccaggttc aagtgattct cctgcctcag cctcccatgt 17400 agctgggatc acaggcacct gccaccatgc ccggctaatt ttttgtattt tttgtagaga 17460 cagggtttca ccatgttggc caggctggtc tcaaacacct gacctcaaat gatccacctg 17520 cctcagcctc ccaaagtgtt gggattacag gcgtaagcca ccatgcccag ccctgaatta 17580 atatttttaa aataagtttg gagactgttg gaaataatag ggcagaggaa catattttac 17640 tggctacttg ccagagttag ttaactcatc aaactctttg ataatagttt gacctctgtt 17700 ggtgaaaatg agccatgatc tcttgaacat gatcagaata aatgccccag ccacacaatt 17760 gtagtccaaa ctttttaggt cactaacttg ctagatggtg ccaggttttt ttgcacaagg 17820 agtgcaaatg ttaagatctc cactagtgag gaaaggctag tattacagaa gccttgtcag 17880 
aggcaattga acctccaagc cctggccctc aggcctgagg attttgatac agacaaactg 17940 aagaaccgtt tgttagtgga tattgcaaac aaacaggagt caaagcttgg tgctccacag 18000 tctagttcac gagacaggcg tggcagtggc tggcagcatc tcttctcaca ggggccctca 18060 ggcacagctt accttgggag gcatgtagga agcccgctgg atcatcacgg gatacttgaa 18120 atgctcatgc aggtggtcaa catactcaca caccctagga ggagggaatc agatcggggc 18180 aatgatgcct gaagtcagat tattcacgtg gtgctaactt aaagcagaag gagcgagtac 18240 cactcaattg acagtgttgg ccaaggctta gctgtgttac catgcgtttc taggcaagtc 18300 cctaaacctc tgtgcctcag gtccttttct tctaaaatat agcaatgtga ggtggggact 18360 ttgatgacat gaacacacga agtccctctg agaggttttg tggtgccctt taaaagggat 18420 caattcagac tctgtaaata tccagaatta tttgggttcc tctggtcaaa agtcagatga 18480 atagattaaa atcaccacat tttgtgatct atttttcaag aagcgtttgt attttttcat 18540 atggctgcag cagctgccag gggcttgggg tttttttggc aggtagggtt gggagg 18596 // |
Score 236: 108/130 ( 83%) matches, 0 gaps 13 gctacgcgagaggctgaggcagcagaattacttgaacccaggaggcggaggttgcagtgagccgagatcgcgccactgcactccagcctgggtgagagagcgagactctgtctcaaaaaaaaaaaaaaaa 142 ||||| | ||||||||||||| |||||| |||||||| |||||| |||||||||||||||| ||||| ||| || ||||||||||||| || ||||| ||||| | | |||||||||||||||| 328 cgatgaaccctccgactccgtcttcttaacgaacttggaccctccgtctccaacgtcactcggttctaggctggtaacatgaggtcggacccgctgtctcgttctgacagggtttttttttttttttttt 199 Score 228: 108/132 ( 81%) matches, 0 gaps 11 cagctacgcgagaggctgaggcagcagaattacttgaacccaggaggcggaggttgcagtgagccgagatcgcgccactgcactccagcctgggtgagagagcgagactctgtctcaaaaaaaaaaaaaaaa 142 ||||| | ||||||||||||| |||||| |||||||| |||||| |||||||||||||||| ||||| ||| || ||||||||||||| || ||||| ||||| | | |||||||||||||||| 330 accgatgaaccctccgactccgtcttcttaacgaacttggaccctccgtctccaacgtcactcggttctaggctggtaacatgaggtcggacccgctgtctcgttctgacagggtttttttttttttttttt 199 Score 226: 110/136 ( 80%) matches, 0 gaps 7 gtcccagctacgcgagaggctgaggcagcagaattacttgaacccaggaggcggaggttgcagtgagccgagatcgcgccactgcactccagcctgggtgagagagcgagactctgtctcaaaaaaaaaaaaaaaa 142 || ||||| | ||||||||||||| |||||| |||||||| |||||| |||||||||||||||| ||||| ||| || ||||||||||||| || ||||| ||||| | | |||||||||||||||| 334 caccaccgatgaaccctccgactccgtcttcttaacgaacttggaccctccgtctccaacgtcactcggttctaggctggtaacatgaggtcggacccgctgtctcgttctgacagggtttttttttttttttttt 199 Score 221: 111/139 ( 79%) matches, 0 gaps 4 gtagtcccagctacgcgagaggctgaggcagcagaattacttgaacccaggaggcggaggttgcagtgagccgagatcgcgccactgcactccagcctgggtgagagagcgagactctgtctcaaaaaaaaaaaaaaaa 142 | || ||||| | ||||||||||||| |||||| |||||||| |||||| |||||||||||||||| ||||| ||| || ||||||||||||| || ||||| ||||| | | |||||||||||||||| 337 ccgcaccaccgatgaaccctccgactccgtcttcttaacgaacttggaccctccgtctccaacgtcactcggttctaggctggtaacatgaggtcggacccgctgtctcgttctgacagggtttttttttttttttttt 199 Score 216: 112/142 ( 78%) matches, 0 gaps 1 
cctgtagtcccagctacgcgagaggctgaggcagcagaattacttgaacccaggaggcggaggttgcagtgagccgagatcgcgccactgcactccagcctgggtgagagagcgagactctgtctcaaaaaaaaaaaaaaaa 142 | | || ||||| | ||||||||||||| |||||| |||||||| |||||| |||||||||||||||| ||||| ||| || ||||||||||||| || ||||| ||||| | | |||||||||||||||| 340 gacccgcaccaccgatgaaccctccgactccgtcttcttaacgaacttggaccctccgtctccaacgtcactcggttctaggctggtaacatgaggtcggacccgctgtctcgttctgacagggtttttttttttttttttt 199 Score 199: 54/170 ( 31%) matches, 1 gaps 1 -cctgtagtcccagctacgcgagaggctgaggcagcagaattacttgaacccaggaggcggaggttgcagtgagccgagatcgcgccactgcactccagcctgggtgagagagcgagactctgtctcaaaaaaaaaaaaaaaagaccgccagggctcaaacaaaaaacctc 170 ||| | | | | | | | | | || || | | | | | || | | | | || | | |||||||||||||||| | | | || || 342 tcgacccgcaccaccgatgaaccctccgactccgtcttcttaacgaacttggaccctccgtctccaacgtcactcggttctaggctggtaacatgaggtcggacccgctgtctcgttctgacagggttttttttttttttttttttttttttttctggcggtcccgaaaag 172 Score 175: 56/171 ( 32%) matches, 1 gaps 1 -cctgtagtcccagctacgcgagaggctgaggcagcagaattacttgaacccaggaggcggaggttgcagtgagccgagatcgcgccactgcactccagcctgggtgagagagcgagactctgtctcaaaaaaaaaaaaaaaagaccgccagggctcaaacaaaaaacctcg 171 | | | | | || | | |||||| | | || || | | | | | | || | | | | | | ||||||||||||||| | | | | | | 344 aatcgacccgcaccaccgatgaaccctccgactccgtcttcttaacgaacttggaccctccgtctccaacgtcactcggttctaggctggtaacatgaggtcggacccgctgtctcgttctgacagggttttttttttttttttttttttttttttctggcggtcccgaaaa 173 Score 172: 100/120 ( 83%) matches, 4 gaps 345 tttttgtacttttagtagagacgggggtttcaccatgttgtccaggctggtcttgaactcctgacctcaggtgatccacccgcctcggccccccaaagtactaggattacaggcgtgagccacc 468 ||||||| | ||||||||||| ||||| || |||||||| ||||||||||| || |||| |||||||||||| | | ||| ||| ||||||| | ||||||||||||||||||||| 3177 aaaaacacaataatcatctctgaccccacagcggtacaaccggtccgaccagggtttaaggattggagtccacta----gactgagtcggagggtttcgccaccctaatgtccgcactcggtgg 3058 Score 164: 128/174 ( 73%) matches, 3 gaps 12128 
agaggatttttttttttttttttttttttgagacagagttttgctctgttgcccaggctggaatgcaacggcgtgatcttggctcactgtaacctctgcctcc-tgggttcgagtgattctcctgcctcagcctc-caagtagctgggattaca-gcatgtgccaccatgcctggct 12301 |||| || || || || || ||||| | ||| || | |||||||||||||| |||| ||||| ||||||||| || |||||| | |||| ||| ||||||| | |||| ||| |||| ||||| ||| | ||| |||||| | || ||||||| ||||||| 12749 tctcaaaacaacaacaacaacaaaaatttataagtcccagagtgagacaacgggtcctacctcacgttaccgcactagtactgagtgatgtcggagtttaaggcacccaagtttactaggagagcggaaccggagacctcaacaaccttaatgtccacactcggtggtgtggaccga 12573 Score 80: 44/51 ( 86%) matches, 2 gaps 12246 ctcctgcctcag-cctccaagtagctgggattaca-gcatgtgccaccatgcc 12296 |||||| ||||| | ||||| |||||||||||| ||||| |||||||| || 13938 gaggacagagtcagaaggtttcacgaccctaatgtccgtactcggtggtatgg 13886 Score 130: 74/94 ( 78%) matches, 1 gaps 13890 tggtggctcatgcctgtaatcccagcactttggaagactgagacaggagcaattgcttgaggtctggagttcaataccagcctgggcaacataac 13984 |||||||| ||||||||||| |||| ||||| | || ||||| |||| | | ||||| ||| |||||| | | |||||||| |||||| || 14822 accaccgaatacggacattaaggtcttgaaagcctccgactccgtccaccat-gtgaacttcagtcctcaaactctagtcggaccggttgtactg 14729 |
einverted and palindrome can report different inverted repeats for the same sequence. This is not due to a problem with either program. It is simply because some of the shortest repeats that palindrome finds with its default parameter values score below einverted's default cutoff. Decrease the 'Minimum score threshold' to see them.
For example, when palindrome is run with 'em:hsfau1', it finds the repeat:
    64 aaaactaaggc 74
       |||||||||||
    98 ttttgattccg 88
einverted will not report this because its score is 33 (11 bases scoring 3 each, no mismatches or gaps), which is below the default score cutoff of 50.
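The arithmetic behind that score can be sketched in a few lines. This is only an illustration under stated assumptions, not einverted's implementation: it assumes every complementary pair adds the match score, every other pair adds the (negative) mismatch score, and every gap position subtracts the full gap penalty. The function names are hypothetical. Under the same assumptions, the counts in the sample header "Score 236: 108/130 ( 83%) matches, 0 gaps" give 108 × 3 − 22 × 4 = 236.

```python
# Base-pairing partners used to decide whether two aligned bases match.
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def stem_score(upper, lower, match=3, mismatch=-4, gap=12):
    """Score two aligned stem arms as they are displayed in the report.

    `upper` is the first arm and `lower` is the arm printed beneath it;
    '-' marks a gap in either arm. Assumes a simple per-position model.
    """
    if len(upper) != len(lower):
        raise ValueError("aligned arms must have the same length")
    score = 0
    for a, b in zip(upper.lower(), lower.lower()):
        if a == "-" or b == "-":
            score -= gap       # gap penalty is subtracted
        elif COMPLEMENT.get(a) == b:
            score += match     # complementary pair
        else:
            score += mismatch  # mismatch score is already negative
    return score


def score_from_counts(matches, mismatches, gaps,
                      match=3, mismatch=-4, gap=12):
    """Same model, starting from the counts in a 'Score' header line."""
    return matches * match + mismatches * mismatch - gaps * gap


score = stem_score("aaaactaaggc", "ttttgattccg")
print(score)                          # 33
print(score >= 50)                    # False: below the default threshold
print(score_from_counts(108, 22, 0))  # 236
```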
If einverted is run as:
% einverted em:hsfau1 -threshold 33
then it will find it:
Score 33: 11/11 (100%) matches, 0 gaps
      64 aaaactaaggc 74
         |||||||||||
      98 ttttgattccg 88
Anything can be considered to be a repeat if you set the score threshold low enough!
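When experimenting with different thresholds it can help to tabulate the hits. The sketch below pulls the numbers out of "Score" header lines like the ones shown in the output; the regular expression is inferred from those sample lines only, and `parse_score_lines` is a hypothetical helper rather than anything shipped with EMBOSS.

```python
import re

# Pattern inferred from sample headers such as
# "Score 172: 100/120 ( 83%) matches, 4 gaps".
SCORE_LINE = re.compile(
    r"Score\s+(?P<score>-?\d+):\s+(?P<matches>\d+)/(?P<length>\d+)\s+"
    r"\(\s*(?P<percent>\d+)%\)\s+matches,\s+(?P<gaps>\d+)\s+gaps"
)


def parse_score_lines(text):
    """Return one dict of integers per 'Score ...' header found in text."""
    return [
        {name: int(value) for name, value in m.groupdict().items()}
        for m in SCORE_LINE.finditer(text)
    ]


report = """Score 33: 11/11 (100%) matches, 0 gaps
Score 172: 100/120 ( 83%) matches, 4 gaps"""

for hit in parse_score_lines(report):
    print(hit["score"], hit["matches"], hit["length"], hit["gaps"])
```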
Program name | Description |
---|---|
equicktandem | Finds tandem repeats |
etandem | Looks for tandem repeats in a nucleotide sequence |
palindrome | Looks for inverted repeats in a nucleotide sequence |
This application was modified for inclusion in EMBOSS by
Peter Rice (pmr © ebi.ac.uk)
Informatics Division, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK