helixturnhelix
The helix-turn-helix motif was originally identified as the DNA-binding domain of phage repressors. One alpha-helix lies in the wide groove of DNA; the other lies at an angle across DNA.
% helixturnhelix
Report nucleic acid binding motifs
Input sequence(s): tsw:laci_ecoli
Output report [laci_ecoli.hth]:
Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Sequence database USA
  [-outfile]           report     Output report file name

Additional (Optional) qualifiers:
   -mean               float      Mean value
   -sd                 float      Standard Deviation value
   -minsd              float      Minimum SD
   -eightyseven        boolean    Use the old (1987) weight data

Advanced (Unprompted) qualifiers: (none)

Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -rformat2           string     Report format
   -rname2             string     Base file name
   -rextension2        string     File name extension
   -rdirectory2        string     Output directory
   -raccshow2          boolean    Show accession number in the report
   -rdesshow2          boolean    Show description in the report
   -rscoreshow2        boolean    Show the score in the report
   -rusashow2          boolean    Show the full USA in the report

General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report deaths
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Standard (Mandatory) qualifiers | | | |
[-sequence] (Parameter 1) | Sequence database USA | Readable sequence(s) | Required |
[-outfile] (Parameter 2) | Output report file name | Report output file | |
Additional (Optional) qualifiers | | | |
-mean | Mean value | Number from 1.000 to 10000.000 | 238.71 |
-sd | Standard Deviation value | Number from 1.000 to 10000.000 | 293.61 |
-minsd | Minimum SD | Number from 0.000 to 100.000 | 2.5 |
-eightyseven | Use the old (1987) weight data | Boolean value Yes/No | No |
Advanced (Unprompted) qualifiers | | | |
(none) | | | |
ID   LACI_ECOLI     STANDARD;      PRT;   360 AA.
AC   P03023; P71309; Q47338; O09196;
DT   21-JUL-1986 (Rel. 01, Created)
DT   01-NOV-1997 (Rel. 35, Last sequence update)
DT   15-DEC-1998 (Rel. 37, Last annotation update)
DE   LACTOSE OPERON REPRESSOR.
GN   LACI.
OS   Escherichia coli.
OC   Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae;
OC   Escherichia.
RN   [1]
RP   SEQUENCE FROM N.A.
RX   MEDLINE; 78246991.
RA   FARABAUGH P.J.;
RT   "Sequence of the lacI gene.";
RL   Nature 274:765-769(1978).
RN   [2]
RP   SEQUENCE FROM N.A.
RC   STRAIN=K12 / MG1655;
RX   MEDLINE; 97426617.
RA   BLATTNER F.R., PLUNKETT G. III, BLOCH C.A., PERNA N.T., BURLAND V.,
RA   RILEY M., COLLADO-VIDES J., GLASNER F.D., RODE C.K., MAYHEW G.F.,
RA   GREGOR J., DAVIS N.W., KIRKPATRICK H.A., GOEDEN M.A., ROSE D.J.,
RA   MAU B., SHAO Y.;
RT   "The complete genome sequence of Escherichia coli K-12.";
RL   Science 277:1453-1474(1997).
RN   [3]
RP   SEQUENCE FROM N.A.
RC   STRAIN=K12 / MG1655;
RA   DUNCAN M., ALLEN E., ARAUJO R., APARICIO A.M., CHUNG E., DAVIS K.,
RA   FEDERSPIEL N., HYMAN R., KALMAN S., KOMP C., KURDI O., LEW H.,
RA   LIN D., NAMATH A., OEFNER P., ROBERTS D., SCHRAMM S., DAVIS R.W.;
RL   Submitted (NOV-1996) to the EMBL/GenBank/DDBJ databases.
RN   [4]
RP   SEQUENCE FROM N.A.
RA   CHEN J., MATTHEWS K.K.S.M.;
RL   Submitted (MAY-1991) to the EMBL/GenBank/DDBJ databases.
RN   [5]
RP   SEQUENCE FROM N.A.
RA   MARSH S.;
RL   Submitted (JAN-1997) to the EMBL/GenBank/DDBJ databases.
RN   [6]
RP   SEQUENCE OF 1-147; 159-230 AND 233-360.
RX   MEDLINE; 76091932.
RA   BEYREUTHER K., ADLER K., FANNING E., MURRAY C., KLEMM A., GEISLER N.;
RT   "Amino-acid sequence of lac repressor from Escherichia coli.
RT   Isolation, sequence analysis and sequence assembly of tryptic
RT   peptides and cyanogen-bromide fragments.";
RL   Eur. J. Biochem. 59:491-509(1975).
RN   [7]

  [Part of this file has been deleted for brevity]

CC   between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC   the European Bioinformatics Institute. There are no restrictions on its
CC   use by non-profit institutions as long as its content is in no way
CC   modified and this statement is not removed. Usage by and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license@isb-sib.ch).
CC   --------------------------------------------------------------------------
DR   EMBL; V00294; CAA23569.1; -.
DR   EMBL; J01636; AAA24052.1; -.
DR   EMBL; AE000141; AAC73448.1; -.
DR   EMBL; U73857; AAB18069.1; ALT_INIT.
DR   EMBL; X58469; CAA41383.1; -.
DR   EMBL; U86347; AAB47270.1; ALT_INIT.
DR   EMBL; U72488; AAB36549.1; -.
DR   EMBL; U78872; AAB37348.1; -.
DR   EMBL; U78873; AAB37351.1; -.
DR   EMBL; U78874; AAB37354.1; -.
DR   PIR; A03558; RPECL.
DR   PIR; S02540; S02540.
DR   PDB; 1LCC; 31-JAN-94.
DR   PDB; 1LCD; 31-JAN-94.
DR   PDB; 1LTP; 31-OCT-93.
DR   PDB; 1TLF; 31-JUL-95.
DR   PDB; 1LBG; 11-JUL-96.
DR   PDB; 1LBH; 11-JUL-96.
DR   PDB; 1LBI; 11-JUL-96.
DR   PDB; 1LQC; 12-FEB-97.
DR   ECO2DBASE; H039.0; 6TH EDITION.
DR   ECOGENE; EG10525; LACI.
DR   PFAM; PF00356; lacI; 1.
DR   PFAM; PF00532; Peripla_BP_like; 1.
DR   PROSITE; PS00356; HTH_LACI_FAMILY; 1.
KW   Transcription regulation; DNA-binding; Repressor; 3D-structure.
FT   DNA_BIND      6     25       H-T-H MOTIF.
FT   MUTAGEN      17     17       Y->H: BROADENING OF SPECIFICITY.
FT   MUTAGEN      22     22       R->N: RECOGNIZE AN OPERATOR VARIANT.
FT   VARIANT     282    282       Y -> D (IN T41 MUTANT).
FT   CONFLICT    286    286       S -> L (IN AAA24052, REF. 2, 4 AND 5).
FT   HELIX         6     13
FT   TURN         14     14
FT   HELIX        17     24
FT   HELIX        32     44
FT   TURN         49     50
SQ   SEQUENCE   360 AA;  38564 MW;  4CA5A1D6 CRC32;
     MKPVTLYDVA EYAGVSYQTV SRVVNQASHV SAKTREKVEA AMAELNYIPN RVAQQLAGKQ
     SLLIGVATSS LALHAPSQIV AAIKSRADQL GASVVVSMVE RSGVEACKAA VHNLLAQRVS
     GLIINYPLDD QDAIAVEAAC TNVPALFLDV SDQTPINSII FSHEDGTRLG VEHLVALGHQ
     QIALLAGPLS SVSARLRLAG WHKYLTRNQI QPIAEREGDW SAMSGFQQTM QMLNEGIVPT
     AMLVANDQMA LGAMRAITES GLRVGADISV VGYDDTEDSS CYIPPSTTIK QDFRLLGQTS
     VDRLLQLSQG QAVKGNQLLP VSLVKRKTTL APNTQTASPR ALADSLMQLA RQVSRLESGQ
//
The output is a standard EMBOSS report file.
The results can be output in one of several styles by using the command-line qualifier -rformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: embl, genbank, gff, pir, swiss, trace, listfile, dbmotif, diffseq, excel, feattable, motif, regions, seqtable, simple, srs, table, tagseq
See: http://emboss.sf.net/docs/themes/ReportFormats.html for further information on report formats.
By default helixturnhelix writes a 'motif' report file.
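As a rough illustration of choosing a report format, the sketch below assembles a helixturnhelix command line in Python and rejects format names that are not in the list above. The helper name `build_command` is hypothetical; the qualifiers (-sequence, -outfile, -rformat, -auto) are the ones documented on this page. The sketch only builds the argument list and does not run the program.

```python
# Report format names as listed in this document.
REPORT_FORMATS = {
    "embl", "genbank", "gff", "pir", "swiss", "trace", "listfile",
    "dbmotif", "diffseq", "excel", "feattable", "motif", "regions",
    "seqtable", "simple", "srs", "table", "tagseq",
}


def build_command(sequence, outfile, rformat="motif"):
    """Return an argument list for helixturnhelix (hypothetical helper)."""
    if rformat not in REPORT_FORMATS:
        raise ValueError(f"unknown report format: {rformat}")
    return [
        "helixturnhelix",
        "-sequence", sequence,
        "-outfile", outfile,
        "-rformat", rformat,
        "-auto",  # turn off prompts (see the general qualifiers)
    ]


cmd = build_command("tsw:laci_ecoli", "laci_ecoli.hth", rformat="gff")
print(" ".join(cmd))
```

If EMBOSS is installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`.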
########################################
# Program: helixturnhelix
# Rundate: Fri Jul 15 2005 12:00:00
# Report_format: motif
# Report_file: laci_ecoli.hth
########################################

#=======================================
#
# Sequence: LACI_ECOLI     from: 1   to: 360
# HitCount: 1
#
# Hits above +2.50 SD (972.73)
#
#=======================================

Maximum_score_at at "*"

(1) Score 2160.000 length 22 at residues 4->25
 *
 Sequence: VTLYDVAEYAGVSYQTVSRVVN
           |                    |
           4                    25
 Standard_deviations: 6.54

#---------------------------------------
#---------------------------------------
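The threshold and Standard_deviations figures in this report are consistent with a simple conversion between raw scores and SD units using the default -mean, -sd and -minsd values. The sketch below is a check of that arithmetic, assuming the relationship is (score - mean) / sd; it is not taken from the program source, and the function names are illustrative.

```python
MEAN = 238.71    # default value of -mean
SD = 293.61      # default value of -sd
MIN_SD = 2.5     # default value of -minsd


def sd_score(raw_score, mean=MEAN, sd=SD):
    """Express a raw score in standard deviations above the mean (assumed formula)."""
    return (raw_score - mean) / sd


def raw_threshold(min_sd=MIN_SD, mean=MEAN, sd=SD):
    """Raw score corresponding to a minimum SD cut-off (assumed formula)."""
    return mean + min_sd * sd


# The reported hit scored 2160.000; the report gives 6.54 SD.
print(f"{sd_score(2160.0):.2f}")   # 6.54
# The report header gives the cut-off as 972.73 for +2.50 SD.
print(raw_threshold())
```

Under this assumption, raising -minsd raises the raw score a window must reach before it is reported.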
EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.
To see the available EMBOSS data files, run:
% embossdata -showall
To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:
% embossdata -fetch -file Exxx.dat
Users can provide their own data files in their own directories. Project specific files can be put in the current directory, or for tidier directory listings in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory, or again in a subdirectory called ".embossdata".
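The directories are searched in the following order: the current directory, ".embossdata" under the current directory, the user's home directory, then ".embossdata" under the home directory.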
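The lookup can be sketched in Python as follows. This is an illustration only, assuming the user directories are tried in the order in which they are described above (current directory, its ".embossdata" subdirectory, home directory, its ".embossdata" subdirectory); the function name `find_data_file` is hypothetical and the fallback to the EMBOSS distribution data directory is left out.

```python
from pathlib import Path


def find_data_file(name, cwd=None, home=None):
    """Return the first user-supplied copy of a data file, or None.

    Hypothetical helper: mirrors the search order assumed in the text,
    not the EMBOSS library's own implementation.
    """
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()
    candidates = [cwd, cwd / ".embossdata", home, home / ".embossdata"]
    for directory in candidates:
        path = directory / name
        if path.is_file():
            return path
    return None  # caller would fall back to the standard data directory


print(find_data_file("Ehth.dat"))
```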
The data files used by helixturnhelix are stored in the standard EMBOSS data directory. They are Ehth.dat (the default data) and Ehth87.dat (the old 1987 data, selected with -eightyseven).
The old (1987) data has a motif length of 20 residues, whilst the default data (Ehth.dat) has a motif length of 22 residues.
With care these can be replaced to suit your data sets. Files placed in the current directory, the user's home directory, or a ".embossdata" subdirectory of either are used in preference to the files in the EMBOSS distribution data directory.
# Amino acid counts for 91 Helix-turn-helix (presumed) protein motifs
# from Dodd IB and Egan JB (1990) Nucl. Acids. Res. 18:5019-5026.
# Sample: 91 aligned sequences
#
# R  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22  Total  Exp
# - -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --  -----  ---
A    2  1  3 14 10 12 75  6 15  9  1  1  4  3  8 15  4  4  4 11  0 10    212  995
C    0  0  1  1  0  0  0  0  0  3  3  1  1  0  0  0  0  0  0  1  0  3     14  106
D    0  1  0  1 14  0  0 14  1  0  5  0  1  2  0  0  0  0  1  1  0  2     43  556
E    4  5  0 11 26  0  0 16  9  3  3  0  3 12 13  0  0  2  0  1 13  6    127  669
F    4  0  4  0  0  4  0  1  0 10  0  0  0  0  1  0  0  1  1  1 22  0     49  358
G    9  7  1  4  0  0  8  0  0  0 50  0  6  0  7  1  0  3  1  1  0  4    102  761
H    4  3  1  1  2  0  0  3  2  0  5  0  3  3  0  2  0  2  4  5  0  2     42  225
I   10  0 13  3  2 15  0  4  9  4  0 17  0  2  0  1 31  1  4  8 16  1    141  583
K    4  4  6 11 12  1  1 14 11  0  5  2  2  7  2  1  0  5  8  4  5 15    120  516
L   16  1 17  0  1 35  0  3 12 31  0 22  0  2  1  1 22  1  1 12 20  0    198  954
M    7  0  2  1  1  1  0  0  5  7  1 10  0  0  2  0  2  0  0  2  0  1     42  275
N    0  8  0  1  0  0  0  2  1  1 14  0  8  1  4  2  0  4  9  0  0 11     66  383
P    1  6  0  1  0  0  0  0  0  0  0  0  3 13  7  0  0  0  0  0  0  3     34  403
Q    2  1 21  9 11  0  0  9  8  0  0  2  1 17  7 12  0  3 12  5  3  9    132  437
R    9 10 14  9  5  0  1 16 10  0  1  0  1 17  8  7  0 17 28  3  0 16    172  609
S    2 17  0  8  4  1  6  1  2  2  3  0 37  1 25  5  0 29  3  0  1  5    152  552
T    6 24  3 12  1  5  0  2  2  4  0  5 20  4  3 39  0  4  1  0  4  3    142  512
V    7  3  1  1  2 16  0  0  2 12  0 29  0  5  3  3 32  0  7  8  7  0    138  724
W    2  0  0  0  0  0  0  0  0  1  0  1  0  0  0  0  0  0  2 21  0  0     27  105
Y    2  0  4  3  0  1  0  0  2  4  0  1  1  2  0  2  0 15  5  7  0  0     49  267
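When editing a copy of this file it can help to confirm that each row still adds up. The sketch below is a minimal reader for the layout shown above (residue letter, 22 per-position counts, then the Total and Exp columns), using three rows copied from the table as sample input. The function name `parse_counts` is illustrative; the Exp column is read but not interpreted, and this is not the parser used by EMBOSS itself.

```python
SAMPLE = """\
# R 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 Total Exp
A 2 1 3 14 10 12 75 6 15 9 1 1 4 3 8 15 4 4 4 11 0 10 212 995
C 0 0 1 1 0 0 0 0 0 3 3 1 1 0 0 0 0 0 0 1 0 3 14 106
W 2 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 2 21 0 0 27 105
"""


def parse_counts(text, motif_length=22):
    """Map residue letter -> (per-position counts, total, exp)."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        residue = fields[0]
        numbers = [int(f) for f in fields[1:]]
        counts = numbers[:motif_length]
        total = numbers[motif_length]
        exp = numbers[motif_length + 1]
        table[residue] = (counts, total, exp)
    return table


table = parse_counts(SAMPLE)
for residue, (counts, total, exp) in table.items():
    # Each Total should equal the sum of the per-position counts.
    assert sum(counts) == total, residue

print(table["A"][0][6])   # count of A at motif position 7: 75
```

For the 1987 data, which has a motif length of 20, the same reader would be called with `motif_length=20`, assuming that file follows the same column layout.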
Program name | Description |
---|---|
antigenic | Finds antigenic sites in proteins |
digest | Protein proteolytic enzyme or reagent cleavage digest |
epestfind | Finds PEST motifs as potential proteolytic cleavage sites |
fuzzpro | Protein pattern search |
fuzztran | Protein pattern search after translation |
garnier | Predicts protein secondary structure |
hmoment | Hydrophobic moment calculation |
oddcomp | Find protein sequence regions with a biased composition |
patmatdb | Search a protein sequence with a motif |
patmatmotifs | Search a PROSITE motif database with a protein sequence |
pepcoil | Predicts coiled coil regions |
pepnet | Displays proteins as a helical net |
pepwheel | Shows protein sequences as helices |
preg | Regular expression search of a protein sequence |
pscan | Scans proteins using PRINTS |
sigcleave | Reports protein signal cleavage sites |
tmap | Displays membrane spanning regions |
Original program "HELIXTURNHELIX" (EGCG 1990) by
Peter Rice (pmr © ebi.ac.uk)
Informatics Division, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK