restrict
By default, only one of any group of isoschizomers (enzymes that have the same recognition site and cut positions) is reported; this behaviour can be turned off by setting the qualifier '-limit' to false. The reported enzyme from any one group of isoschizomers (the prototype) is specified in the REBASE database, and the information is held in the data file 'embossre.equ'. You may edit this file to set your own preferred prototype if you wish.
% restrict
Finds restriction enzyme cleavage sites
Input sequence(s): tembl:hsfau
Minimum recognition site length [4]:
Comma separated enzyme list [all]:
Output report [hsfau.restrict]:
Example 2
This gives the lengths of the restriction fragments produced by cutting with all of the specified enzymes.
% restrict -fragments
Finds restriction enzyme cleavage sites
Input sequence(s): tembl:hsfau
Minimum recognition site length [4]:
Comma separated enzyme list [all]:
Output report [hsfau.restrict]:
Example 3
This gives the lengths of the restriction fragments produced by cutting with each of the specified enzymes on its own, one enzyme at a time.
% restrict -solofragment
Finds restriction enzyme cleavage sites
Input sequence(s): tembl:hsfau
Minimum recognition site length [4]:
Comma separated enzyme list [all]:
Output report [hsfau.restrict]:
Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Sequence database USA
   -sitelen            integer    This sets the minimum length of the restriction enzyme recognition site. Any enzymes with sites shorter than this will be ignored.
   -enzymes            string     The name 'all' reads in all enzyme names from the REBASE database. You can specify enzymes by giving their names with commas between them, such as: 'HincII,hinfI,ppiI,hindiii'. The case of the names is not important. You can specify a file of enzyme names to read in by giving the name of the file holding the enzyme names with a '@' character in front of it, for example, '@enz.list'. Blank lines and lines starting with a hash character or '!' are ignored and all other lines are concatenated together with a comma character ',' and then treated as the list of enzymes to search for. An example of a file of enzyme names is:
                                  ! my enzymes
                                  HincII, ppiII
                                  ! other enzymes
                                  hindiii
                                  HinfI
                                  PpiI
  [-outfile]           report     Output report file name

Additional (Optional) qualifiers: (none)

Advanced (Unprompted) qualifiers:
   -datafile           datafile   Alternative RE data file
   -min                integer    This sets the minimum number of cuts for any restriction enzyme that will be considered. Any enzymes that cut fewer times than this will be ignored.
   -max                integer    This sets the maximum number of cuts for any restriction enzyme that will be considered. Any enzymes that cut more times than this will be ignored.
   -solofragment       boolean    This gives the fragment lengths of the forward sense strand produced by complete restriction by each restriction enzyme on its own. Results are added to the tail section of the report.
   -single             boolean    If this is set then this forces the values of the '-min' and '-max' qualifiers to both be 1. Any other value you may have set them to will be ignored.
   -[no]blunt          boolean    This allows those enzymes which cut at the same position on the forward and reverse strands to be considered.
   -[no]sticky         boolean    This allows those enzymes which cut at different positions on the forward and reverse strands, leaving an overhang, to be considered.
   -[no]ambiguity      boolean    This allows those enzymes which have one or more 'N' ambiguity codes in their pattern to be considered.
   -plasmid            boolean    If this is set then this allows searches for restriction enzyme recognition site and cut positions that span the end of the sequence to be considered.
   -[no]commercial     boolean    If this is set, then only those enzymes with a commercial supplier will be searched for. This qualifier is ignored if you have specified an explicit list of enzymes to search for, rather than searching through 'all' the enzymes in the REBASE database. It is assumed that, if you are asking for an explicit enzyme, then you probably know where to get it from and so all enzyme names that you have asked to be searched for, and which cut, will be reported whether or not they have a commercial supplier.
   -[no]limit          boolean    This limits the reporting of enzymes to just one enzyme from each group of isoschizomers. The enzyme chosen to represent an isoschizomer group is the prototype indicated in the data file 'embossre.equ', which is created by the program 'rebaseextract'. If you prefer different prototypes to be used, make a copy of embossre.equ in your home directory and edit it. If this value is set to be false then all of the input enzymes will be reported. You might like to set this to false if you are supplying an explicit set of enzymes rather than searching 'all' of them.
   -alphabetic         boolean    Sort output alphabetically
   -fragments          boolean    This gives the fragment lengths of the forward sense strand produced by complete restriction using all of the input enzymes together. Results are added to the tail section of the report.
   -name               boolean    Show sequence name

Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -rformat2           string     Report format
   -rname2             string     Base file name
   -rextension2        string     File name extension
   -rdirectory2        string     Output directory
   -raccshow2          boolean    Show accession number in the report
   -rdesshow2          boolean    Show description in the report
   -rscoreshow2        boolean    Show the score in the report
   -rusashow2          boolean    Show the full USA in the report

General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More information on associated and general qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report deaths
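The '@file' rules given for '-enzymes' above can be illustrated with a short Python sketch. This is not the EMBOSS implementation: the function name `parse_enzyme_file` is hypothetical, and splitting the comma-joined text back into trimmed names is an assumption about how the resulting list is consumed.

```python
def parse_enzyme_file(text: str) -> list[str]:
    """Turn the contents of an enzyme-name file into a list of enzyme names."""
    kept = []
    for line in text.splitlines():
        # Blank lines and lines starting with '#' or '!' are ignored.
        if not line.strip() or line.startswith(("#", "!")):
            continue
        kept.append(line.strip())
    # All other lines are concatenated together with a comma ...
    joined = ",".join(kept)
    # ... and treated as a comma-separated list (assumption: names are trimmed).
    return [name.strip() for name in joined.split(",") if name.strip()]


# The example file from the qualifier description above.
example = """! my enzymes
HincII, ppiII
! other enzymes
hindiii
HinfI
PpiI
"""

print(parse_enzyme_file(example))
# ['HincII', 'ppiII', 'hindiii', 'HinfI', 'PpiI']
```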
Standard (Mandatory) qualifiers | Description | Allowed values | Default |
---|---|---|---|
[-sequence] (Parameter 1) | Sequence database USA | Readable sequence(s) | Required |
-sitelen | This sets the minimum length of the restriction enzyme recognition site. Any enzymes with sites shorter than this will be ignored. | Integer from 2 to 20 | 4 |
-enzymes | The name 'all' reads in all enzyme names from the REBASE database. You can specify enzymes by giving their names with commas between them, such as: 'HincII,hinfI,ppiI,hindiii'. The case of the names is not important. You can specify a file of enzyme names to read in by giving the name of the file holding the enzyme names with a '@' character in front of it, for example, '@enz.list'. Blank lines and lines starting with a hash character or '!' are ignored and all other lines are concatenated together with a comma character ',' and then treated as the list of enzymes to search for. An example of a file of enzyme names is: ! my enzymes HincII, ppiII ! other enzymes hindiii HinfI PpiI | Any string is accepted | all |
[-outfile] (Parameter 2) | Output report file name | Report output file | |
Additional (Optional) qualifiers | Description | Allowed values | Default |
(none) | | | |
Advanced (Unprompted) qualifiers | Description | Allowed values | Default |
-datafile | Alternative RE data file | Data file | File in the data file path |
-min | This sets the minimum number of cuts for any restriction enzyme that will be considered. Any enzymes that cut fewer times than this will be ignored. | Integer from 1 to 1000 | 1 |
-max | This sets the maximum number of cuts for any restriction enzyme that will be considered. Any enzymes that cut more times than this will be ignored. | Integer up to 2000000000 | 2000000000 |
-solofragment | This gives the fragment lengths of the forward sense strand produced by complete restriction by each restriction enzyme on its own. Results are added to the tail section of the report. | Boolean value Yes/No | No |
-single | If this is set then this forces the values of the '-min' and '-max' qualifiers to both be 1. Any other value you may have set them to will be ignored. | Boolean value Yes/No | No |
-[no]blunt | This allows those enzymes which cut at the same position on the forward and reverse strands to be considered. | Boolean value Yes/No | Yes |
-[no]sticky | This allows those enzymes which cut at different positions on the forward and reverse strands, leaving an overhang, to be considered. | Boolean value Yes/No | Yes |
-[no]ambiguity | This allows those enzymes which have one or more 'N' ambiguity codes in their pattern to be considered. | Boolean value Yes/No | Yes |
-plasmid | If this is set then this allows searches for restriction enzyme recognition site and cut positions that span the end of the sequence to be considered. | Boolean value Yes/No | No |
-[no]commercial | If this is set, then only those enzymes with a commercial supplier will be searched for. This qualifier is ignored if you have specified an explicit list of enzymes to search for, rather than searching through 'all' the enzymes in the REBASE database. It is assumed that, if you are asking for an explicit enzyme, then you probably know where to get it from and so all enzyme names that you have asked to be searched for, and which cut, will be reported whether or not they have a commercial supplier. | Boolean value Yes/No | Yes |
-[no]limit | This limits the reporting of enzymes to just one enzyme from each group of isoschizomers. The enzyme chosen to represent an isoschizomer group is the prototype indicated in the data file 'embossre.equ', which is created by the program 'rebaseextract'. If you prefer different prototypes to be used, make a copy of embossre.equ in your home directory and edit it. If this value is set to be false then all of the input enzymes will be reported. You might like to set this to false if you are supplying an explicit set of enzymes rather than searching 'all' of them. | Boolean value Yes/No | Yes |
-alphabetic | Sort output alphabetically | Boolean value Yes/No | No |
-fragments | This gives the fragment lengths of the forward sense strand produced by complete restriction using all of the input enzymes together. Results are added to the tail section of the report. | Boolean value Yes/No | No |
-name | Show sequence name | Boolean value Yes/No | No |
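How '-min', '-max' and '-single' interact can be sketched in Python. This is an illustration only, assuming that each reported hit counts as one cut for its enzyme; the function name `filter_by_cut_count` and the (enzyme, start) tuple layout are hypothetical, not part of EMBOSS.

```python
from collections import Counter


def filter_by_cut_count(hits, min_cuts=1, max_cuts=2_000_000_000, single=False):
    """Keep only hits whose enzyme cuts between min_cuts and max_cuts times."""
    if single:
        # -single forces both limits to 1, overriding any other values.
        min_cuts = max_cuts = 1
    # Assumption: one hit equals one cut for that enzyme.
    counts = Counter(enzyme for enzyme, _ in hits)
    return [(enzyme, start) for enzyme, start in hits
            if min_cuts <= counts[enzyme] <= max_cuts]


# A few (enzyme, start) pairs taken from the hsfau example report.
hits = [("TaqI", 11), ("TaqI", 94), ("HindII", 405), ("AclI", 408), ("MaeII", 409)]

print(filter_by_cut_count(hits, single=True))
# [('HindII', 405), ('AclI', 408), ('MaeII', 409)]
```

With the default limits (1 and 2000000000) every hit is kept, which matches the defaults listed in the table above.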
ID   HSFAU      standard; RNA; HUM; 518 BP.
XX
AC   X65923;
XX
SV   X65923.1
XX
DT   13-MAY-1992 (Rel. 31, Created)
DT   23-SEP-1993 (Rel. 37, Last updated, Version 10)
XX
DE   H.sapiens fau mRNA
XX
KW   fau gene.
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
XX
RN   [1]
RP   1-518
RA   Michiels L.M.R.;
RT   ;
RL   Submitted (29-APR-1992) to the EMBL/GenBank/DDBJ databases.
RL   L.M.R. Michiels, University of Antwerp, Dept of Biochemistry,
RL   Universiteisplein 1, 2610 Wilrijk, BELGIUM
XX
RN   [2]
RP   1-518
RX   MEDLINE; 93368957.
RA   Michiels L., Van der Rauwelaert E., Van Hasselt F., Kas K., Merregaert J.;
RT   " fau cDNA encodes a ubiquitin-like-S30 fusion protein and is expressed as
RT   an antisense sequences in the Finkel-Biskis-Reilly murine sarcoma virus";
RL   Oncogene 8:2537-2546(1993).
XX
DR   SWISS-PROT; P35544; UBIM_HUMAN.
DR   SWISS-PROT; Q05472; RS30_HUMAN.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..518
FT                   /chromosome="11q"
FT                   /db_xref="taxon:9606"
FT                   /organism="Homo sapiens"
FT                   /tissue_type="placenta"
FT                   /clone_lib="cDNA"
FT                   /clone="pUIA 631"
FT                   /map="13"
FT   misc_feature    57..278
FT                   /note="ubiquitin like part"
FT   CDS             57..458
FT                   /db_xref="SWISS-PROT:P35544"
FT                   /db_xref="SWISS-PROT:Q05472"
FT                   /gene="fau"
FT                   /protein_id="CAA46716.1"
FT                   /translation="MQLFVRAQELHTFEVTGQETVAQIKAHVASLEGIAPEDQVVLLAG
FT                   APLEDEATLGQCGVEALTTLEVAGRMLGGKVHGSLARAGKVRGQTPKVAKQEKKKKKTG
FT                   RAKRRMQYNRRFVNVVPTFGKKKGPNANS"
FT   misc_feature    98..102
FT                   /note="nucleolar localization signal"
FT   misc_feature    279..458
FT                   /note="S30 part"
FT   polyA_signal    484..489
FT   polyA_site      509
XX
SQ   Sequence 518 BP; 125 A; 139 C; 148 G; 106 T; 0 other;
     ttcctctttc tcgactccat cttcgcggta gctgggaccg ccgttcagtc gccaatatgc        60
     agctctttgt ccgcgcccag gagctacaca ccttcgaggt gaccggccag gaaacggtcg       120
     cccagatcaa ggctcatgta gcctcactgg agggcattgc cccggaagat caagtcgtgc       180
     tcctggcagg cgcgcccctg gaggatgagg ccactctggg ccagtgcggg gtggaggccc       240
     tgactaccct ggaagtagca ggccgcatgc ttggaggtaa agttcatggt tccctggccc       300
     gtgctggaaa agtgagaggt cagactccta aggtggccaa acaggagaag aagaagaaga       360
     agacaggtcg ggctaagcgg cggatgcagt acaaccggcg ctttgtcaac gttgtgccca       420
     cctttggcaa gaagaagggc cccaatgcca actcttaagt cttttgtaat tctggctttc       480
     tctaataaaa aagccactta gttcagtcaa aaaaaaaa                               518
//
The output is a standard EMBOSS report file.
The results can be output in one of several styles by using the command-line qualifier -rformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: embl, genbank, gff, pir, swiss, trace, listfile, dbmotif, diffseq, excel, feattable, motif, regions, seqtable, simple, srs, table, tagseq
See: http://emboss.sf.net/docs/themes/ReportFormats.html for further information on report formats.
By default restrict writes a 'table' report file.
########################################
# Program: restrict
# Rundate: Fri Jul 15 2005 12:00:00
# Report_format: table
# Report_file: hsfau.restrict
########################################

#=======================================
#
# Sequence: HSFAU     from: 1   to: 518
# HitCount: 54
#
# Minimum cuts per enzyme: 1
# Maximum cuts per enzyme: 2000000000
# Minimum length of recognition site: 4
# Blunt ends allowed
# Sticky ends allowed
# DNA is linear
# Ambiguities allowed
#
#=======================================

Start   End Enzyme_name Restriction_site 5prime 3prime 5primerev 3primerev
   11    14 TaqI        TCGA                 11     13         .         .
   28    25 AciI        CCGC                 25     27         .         .
   36    31 BseYI       CCCAGC               31     35         .         .
   38    41 AciI        CCGC                 38     40         .         .
   44    40 BceAI       ACGGC                25     27         .         .
   71    74 AciI        CCGC                 71     73         .         .
   71    81 BsiYI       CCNNNNNNNGG          77     74         .         .
   73    76 HhaI        GCGC                 75     73         .         .
   73    76 Hin6I       GCGC                 73     75         .         .
   77    81 BssKI       CCNGG                76     81         .         .
   77    81 EcoRII      CCWGG                76     81         .         .
   94    97 TaqI        TCGA                 94     96         .         .
  103   106 HpaII       CCGG                103    105         .         .
  105   108 HaeIII      GGCC                106    106         .         .
  107   117 BsiYI       CCNNNNNNNGG         113    110         .         .
  107   111 BssKI       CCNGG               106    111         .         .
  107   111 EcoRII      CCWGG               106    111         .         .
  122   132 BsiYI       CCNNNNNNNGG         128    125         .         .
  125   135 Hin4I       GAYNNNNNVTC         116    111       148       143
  146   150 BsrI        ACTGG               151    149         .         .
  161   165 BssKI       CCNGG               160    165         .         .
  162   165 HpaII       CCGG                162    164         .         .
  182   186 BssKI       CCNGG               181    186         .         .
  182   186 EcoRII      CCWGG               181    186         .         .
  190   193 HhaI        GCGC                192    190         .         .
  190   193 Hin6I       GCGC                190    192         .         .
  192   195 Hin6I       GCGC                192    194         .         .
  192   195 HhaI        GCGC                194    192         .         .
  197   201 BssKI       CCNGG               196    201         .         .
  197   201 EcoRII      CCWGG               196    201         .         .
  209   212 HaeIII      GGCC                210    210         .         .
  219   222 HaeIII      GGCC                220    220         .         .
  221   231 BsiYI       CCNNNNNNNGG         227    224         .         .
  225   221 BsrI        ACTGG               221    219         .         .
  229   226 AciI        CCGC                226    228         .         .
  236   239 HaeIII      GGCC                237    237         .         .
  248   252 BssKI       CCNGG               247    252         .         .
  248   252 EcoRII      CCWGG               247    252         .         .
  261   264 HaeIII      GGCC                262    262         .         .
  263   266 AciI        CCGC                263    265         .         .
  293   297 BssKI       CCNGG               292    297         .         .
  293   297 EcoRII      CCWGG               292    297         .         .
  296   299 HaeIII      GGCC                297    297         .         .
  335   338 HaeIII      GGCC                336    336         .         .
  380   377 AciI        CCGC                377    379         .         .
  383   380 AciI        CCGC                380    382         .         .
  395   398 HpaII       CCGG                395    397         .         .
  398   401 HhaI        GCGC                400    398         .         .
  398   401 Hin6I       GCGC                398    400         .         .
  405   410 HindII      GTYRAC              407    407         .         .
  408   413 AclI        AACGTT              409    411         .         .
  409   412 MaeII       ACGT                409    411         .         .
  417   427 BsiYI       CCNNNNNNNGG         423    420         .         .
  438   441 HaeIII      GGCC                439    439         .         .

#---------------------------------------
#---------------------------------------
########################################
# Program: restrict
# Rundate: Fri Jul 15 2005 12:00:00
# Report_format: table
# Report_file: hsfau.restrict
########################################

#=======================================
#
# Sequence: HSFAU     from: 1   to: 518
# HitCount: 54
#
# Minimum cuts per enzyme: 1
# Maximum cuts per enzyme: 2000000000
# Minimum length of recognition site: 4
# Blunt ends allowed
# Sticky ends allowed
# DNA is linear
# Ambiguities allowed
#
#=======================================

Start   End Enzyme_name Restriction_site 5prime 3prime 5primerev 3primerev
   11    14 TaqI        TCGA                 11     13         .         .
   28    25 AciI        CCGC                 25     27         .         .
   36    31 BseYI       CCCAGC               31     35         .         .
   38    41 AciI        CCGC                 38     40         .         .
   44    40 BceAI       ACGGC                25     27         .         .
   71    74 AciI        CCGC                 71     73         .         .
   71    81 BsiYI       CCNNNNNNNGG          77     74         .         .
   73    76 HhaI        GCGC                 75     73         .         .
   73    76 Hin6I       GCGC                 73     75         .         .
   77    81 BssKI       CCNGG                76     81         .         .
   77    81 EcoRII      CCWGG                76     81         .         .
   94    97 TaqI        TCGA                 94     96         .         .
  103   106 HpaII       CCGG                103    105         .         .
  105   108 HaeIII      GGCC                106    106         .         .
  107   117 BsiYI       CCNNNNNNNGG         113    110         .         .
  107   111 BssKI       CCNGG               106    111         .         .
  107   111 EcoRII      CCWGG               106    111         .         .
  122   132 BsiYI       CCNNNNNNNGG         128    125         .         .
  125   135 Hin4I       GAYNNNNNVTC         116    111       148       143
  146   150 BsrI        ACTGG               151    149         .         .
  161   165 BssKI       CCNGG               160    165         .         .
  162   165 HpaII       CCGG                162    164         .         .
  182   186 BssKI       CCNGG               181    186         .         .
  182   186 EcoRII      CCWGG               181    186         .         .
  190   193 HhaI        GCGC                192    190         .         .
  190   193 Hin6I       GCGC                190    192         .         .
  192   195 Hin6I       GCGC                192    194         .         .

[Part of this file has been deleted for brevity]

#---------------------------------------
#
# Fragment lengths:
# 79
# 41
# 39
# 33
# 29
# 20
# 19
# 17
# 16
# 15
# 15
# 14
# 14
# 14
# 12
# 11
# 10
# 10
# 10
# 9
# 9
# 9
# 7
# 7
# 7
# 6
# 5
# 5
# 3
# 3
# 3
# 3
# 3
# 2
# 2
# 2
# 2
# 2
# 2
# 2
# 2
# 1
# 1
# 1
# 1
# 1
#
#---------------------------------------
########################################
# Program: restrict
# Rundate: Fri Jul 15 2005 12:00:00
# Report_format: table
# Report_file: hsfau.restrict
########################################

#=======================================
#
# Sequence: HSFAU     from: 1   to: 518
# HitCount: 54
#
# Minimum cuts per enzyme: 1
# Maximum cuts per enzyme: 2000000000
# Minimum length of recognition site: 4
# Blunt ends allowed
# Sticky ends allowed
# DNA is linear
# Ambiguities allowed
#
#=======================================

Start   End Enzyme_name Restriction_site 5prime 3prime 5primerev 3primerev
   11    14 TaqI        TCGA                 11     13         .         .
   28    25 AciI        CCGC                 25     27         .         .
   36    31 BseYI       CCCAGC               31     35         .         .
   38    41 AciI        CCGC                 38     40         .         .
   44    40 BceAI       ACGGC                25     27         .         .
   71    74 AciI        CCGC                 71     73         .         .
   71    81 BsiYI       CCNNNNNNNGG          77     74         .         .
   73    76 HhaI        GCGC                 75     73         .         .
   73    76 Hin6I       GCGC                 73     75         .         .
   77    81 BssKI       CCNGG                76     81         .         .
   77    81 EcoRII      CCWGG                76     81         .         .
   94    97 TaqI        TCGA                 94     96         .         .
  103   106 HpaII       CCGG                103    105         .         .
  105   108 HaeIII      GGCC                106    106         .         .
  107   117 BsiYI       CCNNNNNNNGG         113    110         .         .
  107   111 BssKI       CCNGG               106    111         .         .
  107   111 EcoRII      CCWGG               106    111         .         .
  122   132 BsiYI       CCNNNNNNNGG         128    125         .         .
  125   135 Hin4I       GAYNNNNNVTC         116    111       148       143
  146   150 BsrI        ACTGG               151    149         .         .
  161   165 BssKI       CCNGG               160    165         .         .
  162   165 HpaII       CCGG                162    164         .         .
  182   186 BssKI       CCNGG               181    186         .         .
  182   186 EcoRII      CCWGG               181    186         .         .
  190   193 HhaI        GCGC                192    190         .         .
  190   193 Hin6I       GCGC                190    192         .         .
  192   195 Hin6I       GCGC                192    194         .         .

[Part of this file has been deleted for brevity]

# [CCNNNNNNNGG]
# 15 36 77 95 99 196
#
# BsrI:
# [ACTGG]
# 70 151 297
#
# BssKI:
# [CCNGG]
# 15 21 30 45 51 54
# 76 226
#
# EcoRII:
# [CCWGG]
# 15 30 45 51 75 76
# 226
#
# HaeIII:
# [GGCC]
# 10 17 25 35 39 79
# 103 104 106
#
# HhaI:
# [GCGC]
# 2 75 117 118 206
#
# Hin4I:
# [GAYNNNNNVTC]
# 32 116 370
#
# Hin6I:
# [GCGC]
# 2 73 117 120 206
#
# HindII:
# [GTYRAC]
# 111 407
#
# HpaII:
# [CCGG]
# 59 103 123 233
#
# MaeII:
# [ACGT]
# 109 409
#
# TaqI:
# [TCGA]
# 11 83 424
#
#---------------------------------------
The output from restrict is a simple text report. The base number, restriction enzyme name, recognition site and cut positions are shown. Note that cuts are always to the right of the residue shown and that 5' cuts are referred to by their associated 3' number sequence.
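For downstream scripting, one row of the default 'table' report can be read into a small record. The Python sketch below assumes whitespace-separated columns in the order shown in the example report header (Start, End, Enzyme_name, Restriction_site, 5prime, 3prime, 5primerev, 3primerev) and assumes that '.' marks an empty cut position; the names `Hit` and `parse_hit` are hypothetical.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    start: int
    end: int
    enzyme: str
    site: str
    five_prime: Optional[int]
    three_prime: Optional[int]
    five_prime_rev: Optional[int]
    three_prime_rev: Optional[int]


def _position(field: str) -> Optional[int]:
    # Assumption: '.' in the report means "no cut position in this column".
    return None if field == "." else int(field)


def parse_hit(line: str) -> Hit:
    """Parse one data row of the 'table' report into a Hit record."""
    start, end, enzyme, site, *cuts = line.split()
    if len(cuts) != 4:
        raise ValueError(f"expected 8 columns, got {4 + len(cuts)}: {line!r}")
    return Hit(int(start), int(end), enzyme, site, *map(_position, cuts))


print(parse_hit("125 135 Hin4I GAYNNNNNVTC 116 111 148 143"))
```

Header lines, comment lines starting with '#', and blank lines would need to be skipped before calling `parse_hit`.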
The program reports enzymes that cut at two or four sites. The program also reports isoschizomers and enzymes having the same recognition sequence but different cut sites.
When the "-fragments" or "-solofragment" qualifier is given, the sizes of the fragments produced by all of the specified enzymes cutting together, or by each enzyme cutting individually, are given in the 'tail' section at the end of the report file.
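The fragment sizes in the tail section can be reproduced from the cut positions. The Python sketch below is an illustration rather than the program's own code: it assumes a linear sequence (no '-plasmid'), takes the values in the 5prime (and, where present, 5primerev) columns as forward-strand cut positions, and uses the hypothetical function name `fragment_lengths`. Under those assumptions it gives the same per-enzyme figures as the Example 3 output for TaqI and HpaII.

```python
def fragment_lengths(seq_length, cut_positions):
    """Return sorted fragment lengths for a linear sequence cut after each position."""
    # Ignore duplicate cuts and anything outside the sequence.
    cuts = sorted({c for c in cut_positions if 0 < c < seq_length})
    bounds = [0, *cuts, seq_length]
    # Each fragment spans from one boundary to the next.
    return sorted(right - left for left, right in zip(bounds, bounds[1:]))


# hsfau is 518 bp; cut positions are the 5prime values from the example report.
print(fragment_lengths(518, [11, 94]))         # TaqI:  [11, 83, 424]
print(fragment_lengths(518, [103, 162, 395]))  # HpaII: [59, 103, 123, 233]
```

For '-fragments', the same function would be called once with the cut positions of all enzymes pooled together; the example report lists those lengths largest first, whereas this sketch sorts smallest first.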
EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.
To see the available EMBOSS data files, run:
% embossdata -showall
To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:
% embossdata -fetch -file Exxx.dat
Users can provide their own data files in their own directories. Project specific files can be put in the current directory, or for tidier directory listings in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory, or again in a subdirectory called ".embossdata".
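The directories are searched in the following order: the current directory, the ".embossdata" subdirectory of the current directory, the user's home directory, and the ".embossdata" subdirectory of the home directory.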
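A lookup over user-supplied data directories can be sketched in Python as follows. This is only an illustration: it assumes the order is current directory, its ".embossdata" subdirectory, the home directory, then its ".embossdata" subdirectory, and it does not model the fall-back to the standard EMBOSS data directory. The function name `find_data_file` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_data_file(name: str, cwd: Path, home: Path) -> Optional[Path]:
    """Return the first user-level copy of a data file, or None if there is none."""
    # Assumed search order; the first directory containing the file wins.
    candidates = [cwd, cwd / ".embossdata", home, home / ".embossdata"]
    for directory in candidates:
        path = directory / name
        if path.is_file():
            return path
    return None


# Prints a path if a personal copy of embossre.equ exists, otherwise None.
print(find_data_file("embossre.equ", Path.cwd(), Path.home()))
```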
The EMBOSS REBASE restriction enzyme data files are stored in the directory 'data/REBASE/*' under the EMBOSS installation directory.
These files must first be set up using the program 'rebaseextract'. Running 'rebaseextract' may be the job of your system manager.
The data files are stored in the REBASE directory of the standard EMBOSS data directory. The names are:
The reported enzyme from any one group of isoschizomers (the prototype) is specified in the REBASE database and the information is held in the data file 'embossre.equ'. You may edit this file to set your own preferred prototype, if you wish.
The format of the file "embossre.equ" is
Enzyme-name Prototype-name
i.e. two columns of enzyme names separated by a space. The first name of the pair of enzymes is the name that is not preferred and the second is the preferred (prototype) name.
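The two-column layout lends itself to a simple lookup table. The Python sketch below is illustrative only: the function names are hypothetical, the sample lines are made up for the example rather than copied from a real 'embossre.equ', and the case-insensitive lookup is an assumption.

```python
def parse_equ(text: str) -> dict[str, str]:
    """Map each non-preferred enzyme name (lower-cased) to its prototype name."""
    prototypes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # assumption: skip blank or malformed lines
        enzyme, prototype = fields
        prototypes[enzyme.lower()] = prototype
    return prototypes


def to_prototype(enzyme: str, prototypes: dict[str, str]) -> str:
    """Return the preferred name for an enzyme, or the name itself if unlisted."""
    return prototypes.get(enzyme.lower(), enzyme)


# Illustrative content: "Enzyme-name Prototype-name" per line.
sample = "MspI HpaII\nHapII HpaII\n"
table = parse_equ(sample)

print(to_prototype("MspI", table))  # HpaII
print(to_prototype("TaqI", table))  # TaqI (not listed, so unchanged)
```

Editing a personal copy of the file amounts to changing the second column of the relevant lines, so that a different member of the group is treated as the prototype.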
The data files must have been created before running this program. This is done by running the rebaseextract program with the "withrefm" and "prot" files from a REBASE release. You may have to ask your system manager to do this.
Program name | Description |
---|---|
recoder | Remove restriction sites but maintain same translation |
redata | Search REBASE for enzyme name, references, suppliers etc |
remap | Display sequence with restriction sites, translation etc |
restover | Find restriction enzymes producing specific overhang |
showseq | Display a sequence with features, translation etc |
silent | Silent mutation restriction enzyme scan |