splitter

 

Function

Split a sequence into (overlapping) smaller sequences

Description

This simple editing program allows you to split a long sequence into smaller, optionally overlapping, subsequences.

There is rarely any need to split sequences into smaller sub-sequences in EMBOSS, but memory usage can become restrictive when dealing with very large sequences. In that case, memory usage may be reduced by running the analysis separately on each split sub-sequence.

If you are splitting a large sequence so that a non-EMBOSS program can analyse the smaller sub-sequences, it may also be useful to write each sub-sequence to a separate file rather than rely on the default EMBOSS behaviour of concatenating them into one file.

To write the output sequences to separate files, use the command-line switch '-ossingle'.
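If the sub-sequences have already been written to one concatenated file, they can also be separated afterwards. The following Python fragment is a minimal sketch of that idea, assuming the output is in FASTA format. The function name write_records_separately and the file naming (lower-cased record ID plus '.fasta') are hypothetical choices made for this illustration and are not necessarily what '-ossingle' produces.

```python
from pathlib import Path


def write_records_separately(fasta_text, out_dir):
    """Write each FASTA record in fasta_text to its own file in out_dir.

    Returns the list of paths written, in record order. The file naming
    (lower-cased record ID + '.fasta') is an assumption of this sketch.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    handle = None
    try:
        for line in fasta_text.splitlines():
            if line.startswith(">"):
                # A header line starts a new record: close the previous file.
                if handle is not None:
                    handle.close()
                fields = line[1:].split()
                # Fall back to a numbered name if the header has no ID.
                record_id = fields[0] if fields else f"record{len(written) + 1}"
                path = out_dir / f"{record_id.lower()}.fasta"
                handle = path.open("w")
                written.append(path)
            if handle is not None:
                # Lines before the first header are ignored.
                handle.write(line + "\n")
    finally:
        if handle is not None:
            handle.close()
    return written
```

Called as write_records_separately(Path("ap000504.split").read_text(), "pieces"), it would create one file per sub-sequence in a directory named "pieces" (a hypothetical name).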

Usage

Here is a sample session with splitter.

Split a sequence into sub-sequences of 10,000 bases (the default size) with no overlap between the sub-sequences:


% splitter tembl:AP000504 ap000504.split 
Split a sequence into (overlapping) smaller sequences

Example 2

Split a sequence into sub-sequences of 50,000 bases with an overlap of 3,000 bases on each sub-sequence:


% splitter tembl:AP000504 ap000504.split -size=50000 -over=3000 
Split a sequence into (overlapping) smaller sequences

Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Sequence database USA
  [-outseq]            seqoutall  Output sequence(s) USA

   Additional (Optional) qualifiers:
   -size               integer    Size to split at
   -overlap            integer    Overlap between split sequences

   Advanced (Unprompted) qualifiers:
   -addoverlap         boolean    Add overlap to size

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1             integer    Start of each sequence to be used
   -send1               integer    End of each sequence to be used
   -sreverse1           boolean    Reverse (if DNA)
   -sask1               boolean    Ask for begin/end/reverse
   -snucleotide1        boolean    Sequence is nucleotide
   -sprotein1           boolean    Sequence is protein
   -slower1             boolean    Make lower case
   -supper1             boolean    Make upper case
   -sformat1            string     Input sequence format
   -sdbname1            string     Database name
   -sid1                string     Entryname
   -ufo1                string     UFO features
   -fformat1            string     Features format
   -fopenfile1          string     Features file name

   "-outseq" associated qualifiers
   -osformat2           string     Output seq format
   -osextension2        string     File name extension
   -osname2             string     Base file name
   -osdirectory2        string     Output directory
   -osdbname2           string     Database name to add
   -ossingle2           boolean    Separate file for each entry
   -oufo2               string     UFO features
   -offormat2           string     Features format
   -ofname2             string     Features file name
   -ofdirectory2        string     Output directory

   General qualifiers:
   -auto                boolean    Turn off prompts
   -stdout              boolean    Write standard output
   -filter              boolean    Read standard input, write standard output
   -options             boolean    Prompt for standard and additional values
   -debug               boolean    Write debug output to program.dbg
   -verbose             boolean    Report some/full command line options
   -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning             boolean    Report warnings
   -error               boolean    Report errors
   -fatal               boolean    Report fatal errors
   -die                 boolean    Report deaths


   Qualifier           Description                       Allowed values          Default

   Standard (Mandatory) qualifiers:
   [-sequence]         Sequence database USA             Readable sequence(s)    Required
   (Parameter 1)
   [-outseq]           Output sequence(s) USA            Writeable sequence(s)   <sequence>.format
   (Parameter 2)

   Additional (Optional) qualifiers:
   -size               Size to split at                  Integer 1 or more       10000
   -overlap            Overlap between split sequences   Integer 0 or more       0

   Advanced (Unprompted) qualifiers:
   -addoverlap         Add overlap to size               Boolean value Yes/No    No
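
The way -size, -overlap and -addoverlap interact can be illustrated with a short Python sketch. This is not the program's source code. The function name split_ranges is hypothetical, and the stepping rule is an assumption about how the qualifiers combine: without -addoverlap each sub-sequence is 'size' long and starts 'size - overlap' positions after the previous one; with -addoverlap each sub-sequence is 'size + overlap' long and starts 'size' positions after the previous one. Positions are 1-based and inclusive, as in the output sequence names.

```python
def split_ranges(length, size=10000, overlap=0, addoverlap=False):
    """Return 1-based inclusive (start, end) pairs covering a sequence.

    Assumed stepping rule (an illustration, not taken from the EMBOSS source):
      addoverlap=False: window = size,           step = size - overlap
      addoverlap=True:  window = size + overlap, step = size
    """
    if size < 1:
        raise ValueError("size must be 1 or more")
    if overlap < 0:
        raise ValueError("overlap must be 0 or more")

    if addoverlap:
        window, step = size + overlap, size
    else:
        window, step = size, size - overlap
    if step < 1:
        raise ValueError("overlap must be smaller than size")

    if length <= 0:
        return []

    ranges = []
    start = 1
    while True:
        # The final sub-sequence is truncated at the end of the sequence.
        end = min(start + window - 1, length)
        ranges.append((start, end))
        if end >= length:
            break
        start += step
    return ranges


# No overlap: a 73308-base sequence split at 50000.
print(split_ranges(73308, size=50000))
# [(1, 50000), (50001, 73308)]

# Overlap of 3000 on a 100000-base sequence, under the assumed rule.
print(split_ranges(100000, size=50000, overlap=3000))
# [(1, 50000), (47001, 97000), (94001, 100000)]
```

Under this assumed rule an overlap equal to or larger than the size would never advance, so the sketch rejects it; how splitter itself treats that case is not stated here.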

Input File Format

splitter reads one or more sequence USAs.

Input files for usage example

'tembl:AP000504' is a sequence entry in the example nucleic acid database 'tembl'.

Database entry: tembl:AP000504

ID   AP000504   standard; DNA; HUM; 100000 BP.
XX
AC   AP000504; BA000025;
XX
SV   AP000504.1
XX
DT   28-SEP-1999 (Rel. 61, Created)
DT   22-AUG-2001 (Rel. 68, Last updated, Version 3)
XX
DE   Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region, section
DE   3/20.
XX
KW   .
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
XX
RN   [1]
RP   1-100000
RA   Hirakawa M., Yamaguchi H., Imai K., Shimada J.;
RT   ;
RL   Submitted (21-SEP-1999) to the EMBL/GenBank/DDBJ databases.
RL   Mika Hirakawa, Japan Science and Technology Corporation (JST), Advanced
RL   Databases Department; 5-3, Yonbancho, Chiyoda-ku, Tokyo 102-0081, Japan
RL   (E-mail:mika@tokyo.jst.go.jp, URL:http://www-alis.tokyo.jst.go.jp/,
RL   Tel:81-3-5214-8491, Fax:81-3-5214-8470)
XX
RN   [2]
RA   Shiina S., Tamiya G., Oka A., Inoko H.;
RT   "Homo sapiens 2,229,817bp genomic DNA of 6p21.3 HLA class I region";
RL   Unpublished.
XX
DR   SWISS-PROT; O00299; CLI1_HUMAN.
DR   SWISS-PROT; O43196; MSH5_HUMAN.
DR   SWISS-PROT; O95445; APOM_HUMAN.
DR   SWISS-PROT; O95865; DDH2_HUMAN.
DR   SWISS-PROT; O95867; NG24_HUMAN.
DR   SWISS-PROT; P13862; KC2B_HUMAN.
XX
CC   This sequence is conducted by Tokai University as a JST sequencing
CC   Team.
CC   Principal Investigator: Hidetoshi Inoko Ph.D
CC   Phone:+81-463-93-1121, Fax:+81-463-94-8884,
CC   The sequence is submitted by Human Genome Sequencing in ALIS
CC   project of JST
CC   Japan Science and Technology Corporation (JST)
CC   5-3, Yonbancyo, Chiyoda-ku, Tokyo, 102-0081 Japan
CC   For further infomation about this sequences, please visit our
CC   sequence archive Web site (http://www-alis.tokyo.jst.go.jp/HGS/top.


  [Part of this file has been deleted for brevity]

     gggtggatca tgaggtcaag agatcgagac tatcctggct aacatgatga aaccccgtct     97080
     ctactaaaaa tacaaaaaat tagctgggca tggtggcggg cacctgtagt cccagctact     97140
     cgggaggctg agtcaggaga atggtgtgaa cccaggagac ggagcttgca gtgagctgag     97200
     gtcgcaccac tgcactccag cctgggtgat agagcgagac tctgtctcaa aaaaaaaaaa     97260
     aaaaaaaaaa aaaacaaaaa ttagccgggt gtggtggcag gcaacttaat cccagctact     97320
     tgggaggcag aggcaggaga atcgtttgaa cctgggaggc ggaggttgaa gagaatagaa     97380
     gctctgctgg tccagagaag gattgggcca gggctctggg agaccaggga gaaagagggc     97440
     acatgtggtc cctgttgact gtgagggtgg gaatctgagg aaggctttgg ctcattgccc     97500
     cttgggtttg tccacagcca tccttcccct gcggagtatg tcgaggtgct ccaggagcta     97560
     cagcggctgg agagtcgcct ccagcccttc ttgcagcgct actacgaggt tctgggtgct     97620
     gctgccacca cggactacaa taacaatgtg agccctttga tggccctgcc ctttctcctc     97680
     agccccagta ctcccaaaac agaacaggct gaaatacaga taactctttc cctccctgga     97740
     aaaacattgc aacagggcca ggtgcagtgg ctcacgcctg taatcccagc actttgggag     97800
     gccaaggtgg gcggatcatc tgagatcggg agtttgagac cagcctggcc aacatggtgc     97860
     aaccccatct ctactgaaaa tataaacatt agctggatgt agtggtgcac acctgtaatc     97920
     ccagctactc aggaggctga ggcaggagaa tcgctagaac tcgggaggag ggggttgcag     97980
     tgagccgaga ttgcactact gcactctagc ctgggtgaca gagcgagact gtctcaaaaa     98040
     acaaaacaaa acaaaaaaac acacattgca acaaaacaat ttctctctaa acctgtaagt     98100
     gattttgtcc tcccttacag agaaggtgat aatctttgct gtaagcactg tcctcgtatc     98160
     gtaccccttg tgcccctgaa tgaatttaga aaatgtaaag tacaggagat cagtatatga     98220
     tgacttactg attcatagta gtgttttaat aggatgttcc ttatgtgaat aagatataat     98280
     ttatttgcaa agatttggtc tacatgtaaa cttccaagga tataactgaa agttttggag     98340
     gacatggtat tctcagtagg cattattgct tttattagtg agatggactc cagcttgata     98400
     ttttctgcct ttttgtgttt ggctggttgt gcgcagcacg agggccggga ggaggatcag     98460
     cggttgatca acttggtagg ggagagcctg cgactgctgg gcaacacctt tgttgcactg     98520
     tctgacctgc gctgcaatct ggcctgcacg cccccacgac acctgcatgt ggtccggcct     98580
     atgtctcact acaccacccc catggtgctc cagcaggcag ccattcccat acaggtgggt     98640
     tagggggagt ctggcctgag ggagagtgag gggtgttgat agagtgaccc agggtagcta     98700
     ctgggcctga aggaggttag gaaaggagga gactggaaac atggtgatga aggctggaga     98760
     tactttagag gtttatcatg aggttttctt ggttaggctc ttgtattttt ctcacatctg     98820
     cctgtccatc tgtctttttc agatcaatgt gggaaccact gtgaccatga caggaaatgg     98880
     gactcggccc cccccaactc ccaatgcaga ggcacctccc cctggtcctg ggcaggcctc     98940
     atccgtggct ccgtcttcta ccaatgtcga gtcctcagct gagggggctc ccccgccagg     99000
     tccagctccc ccgccagcca ccagccaccc gagggtcatc cggatttccc accagagtgt     99060
     ggaacccgtg gtcatgatgc acatgaacat tcaaggtgag aatagttgct ggcgagaaga     99120
     gcaggatcag catgatgagg gaggttcatg ctgaggtgtg agggaacagg gtggggaagg     99180
     gagaggcaca tgctggtggt ggtagcctgg ggaccagagc agaagcttaa gtagacagat     99240
     gtggggggtg tgggggttgg tttgtctttg gaggtgtgtt tgtgtggtga agggagtacc     99300
     tctccctgtt tagatggagg gaaaggcagg ctttctgatt gggggattat gggcctgaag     99360
     tatgcctgat ctcagaagga tatagttagg ccttggccct acctacctca gggccactgt     99420
     ctctgtctcc ctgcccagat tctggcacac agcctggtgg tgttccgagt gctcccactg     99480
     gccccctggg accccctggt catggccaaa ccctgggtaa gagtgagggc atcagggcag     99540
     gctgagctct gggtagagaa agggaagggc tgagtgggtg ggttgaaggg gtccaggttc     99600
     aaggttacat cagacccgcc ccccaggctc caccctcatc cagctgccct ccctgccccc     99660
     tgagttcatg cacgccgtcg cccaccagat cactcatcag gccatggtgg cagctgttgc     99720
     ctccgcggcc gcaggtaatg acctggaagg ggaggcttgg gaggtagggc acagtccatg     99780
     gtggcagctg gctggcaagg gcctggccct cagccctctt cggtctgtct cttctgccac     99840
     ccacaggaca gcaggtgcca ggcttcccaa cagctccaac ccgggtggtg attgcccggc     99900
     ccactcctcc acaggctcgg ccttcccatc ctggagggcc cccagtctct gggacactgg     99960
     tgagcaaggg tcggggagtt ctagtgcgta acagtctagg                          100000
//

Output File Format

Output files for usage example

File: ap000504.split

>AP000504_1-10000 Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region, section 3/20.
gaccaatctcactgtgaggaggcagtcaaagggaataatggaagagaggaagaggatttt
ctcagtggcagtcatggcgtctgggatgaaggagtagtttccagaaaggaggcgttgttt
gcttatctccagacctatttgagggaggcaagcaaagggaacggtcttgtagctcaattt
tttcaccccattttaagaatgagacaatagaagcaagagagattatttgacttgcccaag
ctcacacaggcagttaatggaaagctagagcaagaaccaaattttcagactcttagtcta
attctctttttattctacatataatataaagatacttgtctgaaagcacagcctgagaaa
gataaatggctgaggaaagtagacatctgtctggaattgaggattttggtcaaaataatg
gtattaatagaactagtaacactaatgccttaatatctaattaggatagtacactcctgt
tcttattgtaaacctaggaaagttatagaagtgccttatggatcataataagggtcactg
aggcagtgccttttggtttggtgataaaaggctttaacttaatggggagaattccaacaa
taaaaccctgtccaaaaagtgtcaccactcctcaggggaggccctcatccctagacatga
cttaagcagaggcttcccaataagctgcaggttattaaagggtagggagcaggagagatc
ttggggggacaggtcatagggcatgaggagcacaaaggtttaggatgacataaggcagag
gggagatctgtgatgatgaaggtagagttgggggaaagaatgggacaccggaacagggag
ttaggcaaagcaaaaggaaggagataccaaaatccacacttggcaaaaatatgatttcag
gtcttttaggctctctgtgctcctgggaggctgtgggggaggaaagaaaaggctatcatt
ctttacatctcagtccttctacctctgtctgacactccctctcacccaattctagccccc
tggaatattccatatattagtccttccccattttccctctatcctttaccaagtccttac
caagctttcccagaaatcgagtcatattctcatcctgtttggcactcgtaacaacagact
ggggattgatctcatccagaacttggaaggagaacagagatcaaatgagttaaaggatct
ttgtctttgactaagagaaaacccatagccctcctcttcctacccctctccttctcaaaa
acatttcctccctaggagtagggagtgctctgcacagtgggaacacaggtagaagttgag
atttagaaaagtagttaagagtggtgggatggtgagagggaagtgggatgttctggatgt
tgtcactaggctgtaaacccctggagaacagacatgactgatttgcccagggctgaatct
gaagcacctgaaacattgtaaatacgtcatatatatttgtggccaggcacagtggctcat
gcctataatccctgccctttgggaggccaaggcaggcagatcactggaggccaggagctc
aagacaagcctagccaacgtggtgaaaccctgcctctactaaaaatataaaaattagcca
ggcgtgatggcagattcttgtaatcccagctactcgggagactgaggcaggagaattgct
tgaatccgggagacggaggttgcagtgagccaagatggcaccactacacttccagcctga
gtgacggagcaagacactgtctcaaaaaagaacaaccaaacaaaccaaaaaacagcctca
caaatatttgttaaataatgaaatgaattcataaaaacaaaagagggagcctctgtgaag
caactgtaaaatatattgagtcagtgctatagtttggatgtgatttgtccctgccaaata
tcgtgttgaaatttaatccccagtgtgatagtgttgtgaggtagggcctagcaggaggtg
tgtgggtgatgggagtggatcgctcatgaacagattaatgcccttcctggagtgtgttgg
tgggtatgagtgagaggttctcactctattagttcctgagagagctggttgtcaaaaaga
gcctggcatctccctcccccttgcttcttctctgccatgtgacctctacacaccctgcct
tcccttcttccatgagttgaagcagtctgaggctctcaccagtgaagatgcccaattttg
agctttccaaccatccagaaccataagccaaataaaactttttttttttttttaacaaat
tactcagagtcaggtatttccttacagcaacacaaaatatgctagacagtgaggtgagtt
aatgtaagtaaaacatggctgggcgtggtgactcacacctgtagtcccagcactttagga
ggccaaggtgggcggatcacaaggtcaggagtttgagaccaccctggccaacatggtgaa
acaccgtctgtgctaaaaacacacacaaaaaactagctgggtgtggtggcacacgcctgt
agtcccagctactcgggaggttgagtcaggagaattgcttgaacccaggaggtggaggct
gcagtgagccaagattgcgccactgcacttgagcctgggtaacagagcaagactctgtct
agaaaaaaaaaatatgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtg
taacacatctgcaatcccagagagcagaggaattcatggttccatccccacctctctgga
gaagcttgaggctctcgtggtctggggcatctggcatgaagtggatagtggagtcactag
tatcatagtaggcaatgcccaagtatcctgaattccacagcacacacagatggatctgtc
cagcaaggaagaaaggaaatcactattagaatcactcataagtgtagggtttaccatgtc


  [Part of this file has been deleted for brevity]

aaatacaggccgggcacagtggctcacgcctgtaatcccagcactttgggaggccgaggc
gggtggatcatgaggtcaagagatcgagactatcctggctaacatgatgaaaccccgtct
ctactaaaaatacaaaaaattagctgggcatggtggcgggcacctgtagtcccagctact
cgggaggctgagtcaggagaatggtgtgaacccaggagacggagcttgcagtgagctgag
gtcgcaccactgcactccagcctgggtgatagagcgagactctgtctcaaaaaaaaaaaa
aaaaaaaaaaaaaacaaaaattagccgggtgtggtggcaggcaacttaatcccagctact
tgggaggcagaggcaggagaatcgtttgaacctgggaggcggaggttgaagagaatagaa
gctctgctggtccagagaaggattgggccagggctctgggagaccagggagaaagagggc
acatgtggtccctgttgactgtgagggtgggaatctgaggaaggctttggctcattgccc
cttgggtttgtccacagccatccttcccctgcggagtatgtcgaggtgctccaggagcta
cagcggctggagagtcgcctccagcccttcttgcagcgctactacgaggttctgggtgct
gctgccaccacggactacaataacaatgtgagccctttgatggccctgccctttctcctc
agccccagtactcccaaaacagaacaggctgaaatacagataactctttccctccctgga
aaaacattgcaacagggccaggtgcagtggctcacgcctgtaatcccagcactttgggag
gccaaggtgggcggatcatctgagatcgggagtttgagaccagcctggccaacatggtgc
aaccccatctctactgaaaatataaacattagctggatgtagtggtgcacacctgtaatc
ccagctactcaggaggctgaggcaggagaatcgctagaactcgggaggagggggttgcag
tgagccgagattgcactactgcactctagcctgggtgacagagcgagactgtctcaaaaa
acaaaacaaaacaaaaaaacacacattgcaacaaaacaatttctctctaaacctgtaagt
gattttgtcctcccttacagagaaggtgataatctttgctgtaagcactgtcctcgtatc
gtaccccttgtgcccctgaatgaatttagaaaatgtaaagtacaggagatcagtatatga
tgacttactgattcatagtagtgttttaataggatgttccttatgtgaataagatataat
ttatttgcaaagatttggtctacatgtaaacttccaaggatataactgaaagttttggag
gacatggtattctcagtaggcattattgcttttattagtgagatggactccagcttgata
ttttctgcctttttgtgtttggctggttgtgcgcagcacgagggccgggaggaggatcag
cggttgatcaacttggtaggggagagcctgcgactgctgggcaacacctttgttgcactg
tctgacctgcgctgcaatctggcctgcacgcccccacgacacctgcatgtggtccggcct
atgtctcactacaccacccccatggtgctccagcaggcagccattcccatacaggtgggt
tagggggagtctggcctgagggagagtgaggggtgttgatagagtgacccagggtagcta
ctgggcctgaaggaggttaggaaaggaggagactggaaacatggtgatgaaggctggaga
tactttagaggtttatcatgaggttttcttggttaggctcttgtatttttctcacatctg
cctgtccatctgtctttttcagatcaatgtgggaaccactgtgaccatgacaggaaatgg
gactcggccccccccaactcccaatgcagaggcacctccccctggtcctgggcaggcctc
atccgtggctccgtcttctaccaatgtcgagtcctcagctgagggggctcccccgccagg
tccagctcccccgccagccaccagccacccgagggtcatccggatttcccaccagagtgt
ggaacccgtggtcatgatgcacatgaacattcaaggtgagaatagttgctggcgagaaga
gcaggatcagcatgatgagggaggttcatgctgaggtgtgagggaacagggtggggaagg
gagaggcacatgctggtggtggtagcctggggaccagagcagaagcttaagtagacagat
gtggggggtgtgggggttggtttgtctttggaggtgtgtttgtgtggtgaagggagtacc
tctccctgtttagatggagggaaaggcaggctttctgattgggggattatgggcctgaag
tatgcctgatctcagaaggatatagttaggccttggccctacctacctcagggccactgt
ctctgtctccctgcccagattctggcacacagcctggtggtgttccgagtgctcccactg
gccccctgggaccccctggtcatggccaaaccctgggtaagagtgagggcatcagggcag
gctgagctctgggtagagaaagggaagggctgagtgggtgggttgaaggggtccaggttc
aaggttacatcagacccgccccccaggctccaccctcatccagctgccctccctgccccc
tgagttcatgcacgccgtcgcccaccagatcactcatcaggccatggtggcagctgttgc
ctccgcggccgcaggtaatgacctggaaggggaggcttgggaggtagggcacagtccatg
gtggcagctggctggcaagggcctggccctcagccctcttcggtctgtctcttctgccac
ccacaggacagcaggtgccaggcttcccaacagctccaacccgggtggtgattgcccggc
ccactcctccacaggctcggccttcccatcctggagggcccccagtctctgggacactgg
tgagcaagggtcggggagttctagtgcgtaacagtctagg

Output files for usage example 2

File: ap000504.split

>AP000504_1-50000 Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region, section 3/20.
gaccaatctcactgtgaggaggcagtcaaagggaataatggaagagaggaagaggatttt
ctcagtggcagtcatggcgtctgggatgaaggagtagtttccagaaaggaggcgttgttt
gcttatctccagacctatttgagggaggcaagcaaagggaacggtcttgtagctcaattt
tttcaccccattttaagaatgagacaatagaagcaagagagattatttgacttgcccaag
ctcacacaggcagttaatggaaagctagagcaagaaccaaattttcagactcttagtcta
attctctttttattctacatataatataaagatacttgtctgaaagcacagcctgagaaa
gataaatggctgaggaaagtagacatctgtctggaattgaggattttggtcaaaataatg
gtattaatagaactagtaacactaatgccttaatatctaattaggatagtacactcctgt
tcttattgtaaacctaggaaagttatagaagtgccttatggatcataataagggtcactg
aggcagtgccttttggtttggtgataaaaggctttaacttaatggggagaattccaacaa
taaaaccctgtccaaaaagtgtcaccactcctcaggggaggccctcatccctagacatga
cttaagcagaggcttcccaataagctgcaggttattaaagggtagggagcaggagagatc
ttggggggacaggtcatagggcatgaggagcacaaaggtttaggatgacataaggcagag
gggagatctgtgatgatgaaggtagagttgggggaaagaatgggacaccggaacagggag
ttaggcaaagcaaaaggaaggagataccaaaatccacacttggcaaaaatatgatttcag
gtcttttaggctctctgtgctcctgggaggctgtgggggaggaaagaaaaggctatcatt
ctttacatctcagtccttctacctctgtctgacactccctctcacccaattctagccccc
tggaatattccatatattagtccttccccattttccctctatcctttaccaagtccttac
caagctttcccagaaatcgagtcatattctcatcctgtttggcactcgtaacaacagact
ggggattgatctcatccagaacttggaaggagaacagagatcaaatgagttaaaggatct
ttgtctttgactaagagaaaacccatagccctcctcttcctacccctctccttctcaaaa
acatttcctccctaggagtagggagtgctctgcacagtgggaacacaggtagaagttgag
atttagaaaagtagttaagagtggtgggatggtgagagggaagtgggatgttctggatgt
tgtcactaggctgtaaacccctggagaacagacatgactgatttgcccagggctgaatct
gaagcacctgaaacattgtaaatacgtcatatatatttgtggccaggcacagtggctcat
gcctataatccctgccctttgggaggccaaggcaggcagatcactggaggccaggagctc
aagacaagcctagccaacgtggtgaaaccctgcctctactaaaaatataaaaattagcca
ggcgtgatggcagattcttgtaatcccagctactcgggagactgaggcaggagaattgct
tgaatccgggagacggaggttgcagtgagccaagatggcaccactacacttccagcctga
gtgacggagcaagacactgtctcaaaaaagaacaaccaaacaaaccaaaaaacagcctca
caaatatttgttaaataatgaaatgaattcataaaaacaaaagagggagcctctgtgaag
caactgtaaaatatattgagtcagtgctatagtttggatgtgatttgtccctgccaaata
tcgtgttgaaatttaatccccagtgtgatagtgttgtgaggtagggcctagcaggaggtg
tgtgggtgatgggagtggatcgctcatgaacagattaatgcccttcctggagtgtgttgg
tgggtatgagtgagaggttctcactctattagttcctgagagagctggttgtcaaaaaga
gcctggcatctccctcccccttgcttcttctctgccatgtgacctctacacaccctgcct
tcccttcttccatgagttgaagcagtctgaggctctcaccagtgaagatgcccaattttg
agctttccaaccatccagaaccataagccaaataaaactttttttttttttttaacaaat
tactcagagtcaggtatttccttacagcaacacaaaatatgctagacagtgaggtgagtt
aatgtaagtaaaacatggctgggcgtggtgactcacacctgtagtcccagcactttagga
ggccaaggtgggcggatcacaaggtcaggagtttgagaccaccctggccaacatggtgaa
acaccgtctgtgctaaaaacacacacaaaaaactagctgggtgtggtggcacacgcctgt
agtcccagctactcgggaggttgagtcaggagaattgcttgaacccaggaggtggaggct
gcagtgagccaagattgcgccactgcacttgagcctgggtaacagagcaagactctgtct
agaaaaaaaaaatatgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtg
taacacatctgcaatcccagagagcagaggaattcatggttccatccccacctctctgga
gaagcttgaggctctcgtggtctggggcatctggcatgaagtggatagtggagtcactag
tatcatagtaggcaatgcccaagtatcctgaattccacagcacacacagatggatctgtc
cagcaaggaagaaaggaaatcactattagaatcactcataagtgtagggtttaccatgtc


  [Part of this file has been deleted for brevity]

gaaaccctgtctctactaaaaaatacaggccgggcacagtggctcacgcctgtaatccca
gcactttgggaggccgaggcgggtggatcatgaggtcaagagatcgagactatcctggct
aacatgatgaaaccccgtctctactaaaaatacaaaaaattagctgggcatggtggcggg
cacctgtagtcccagctactcgggaggctgagtcaggagaatggtgtgaacccaggagac
ggagcttgcagtgagctgaggtcgcaccactgcactccagcctgggtgatagagcgagac
tctgtctcaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaattagccgggtgtggtggcag
gcaacttaatcccagctacttgggaggcagaggcaggagaatcgtttgaacctgggaggc
ggaggttgaagagaatagaagctctgctggtccagagaaggattgggccagggctctggg
agaccagggagaaagagggcacatgtggtccctgttgactgtgagggtgggaatctgagg
aaggctttggctcattgccccttgggtttgtccacagccatccttcccctgcggagtatg
tcgaggtgctccaggagctacagcggctggagagtcgcctccagcccttcttgcagcgct
actacgaggttctgggtgctgctgccaccacggactacaataacaatgtgagccctttga
tggccctgccctttctcctcagccccagtactcccaaaacagaacaggctgaaatacaga
taactctttccctccctggaaaaacattgcaacagggccaggtgcagtggctcacgcctg
taatcccagcactttgggaggccaaggtgggcggatcatctgagatcgggagtttgagac
cagcctggccaacatggtgcaaccccatctctactgaaaatataaacattagctggatgt
agtggtgcacacctgtaatcccagctactcaggaggctgaggcaggagaatcgctagaac
tcgggaggagggggttgcagtgagccgagattgcactactgcactctagcctgggtgaca
gagcgagactgtctcaaaaaacaaaacaaaacaaaaaaacacacattgcaacaaaacaat
ttctctctaaacctgtaagtgattttgtcctcccttacagagaaggtgataatctttgct
gtaagcactgtcctcgtatcgtaccccttgtgcccctgaatgaatttagaaaatgtaaag
tacaggagatcagtatatgatgacttactgattcatagtagtgttttaataggatgttcc
ttatgtgaataagatataatttatttgcaaagatttggtctacatgtaaacttccaagga
tataactgaaagttttggaggacatggtattctcagtaggcattattgcttttattagtg
agatggactccagcttgatattttctgcctttttgtgtttggctggttgtgcgcagcacg
agggccgggaggaggatcagcggttgatcaacttggtaggggagagcctgcgactgctgg
gcaacacctttgttgcactgtctgacctgcgctgcaatctggcctgcacgcccccacgac
acctgcatgtggtccggcctatgtctcactacaccacccccatggtgctccagcaggcag
ccattcccatacaggtgggttagggggagtctggcctgagggagagtgaggggtgttgat
agagtgacccagggtagctactgggcctgaaggaggttaggaaaggaggagactggaaac
atggtgatgaaggctggagatactttagaggtttatcatgaggttttcttggttaggctc
ttgtatttttctcacatctgcctgtccatctgtctttttcagatcaatgtgggaaccact
gtgaccatgacaggaaatgggactcggccccccccaactcccaatgcagaggcacctccc
cctggtcctgggcaggcctcatccgtggctccgtcttctaccaatgtcgagtcctcagct
gagggggctcccccgccaggtccagctcccccgccagccaccagccacccgagggtcatc
cggatttcccaccagagtgtggaacccgtggtcatgatgcacatgaacattcaaggtgag
aatagttgctggcgagaagagcaggatcagcatgatgagggaggttcatgctgaggtgtg
agggaacagggtggggaagggagaggcacatgctggtggtggtagcctggggaccagagc
agaagcttaagtagacagatgtggggggtgtgggggttggtttgtctttggaggtgtgtt
tgtgtggtgaagggagtacctctccctgtttagatggagggaaaggcaggctttctgatt
gggggattatgggcctgaagtatgcctgatctcagaaggatatagttaggccttggccct
acctacctcagggccactgtctctgtctccctgcccagattctggcacacagcctggtgg
tgttccgagtgctcccactggccccctgggaccccctggtcatggccaaaccctgggtaa
gagtgagggcatcagggcaggctgagctctgggtagagaaagggaagggctgagtgggtg
ggttgaaggggtccaggttcaaggttacatcagacccgccccccaggctccaccctcatc
cagctgccctccctgccccctgagttcatgcacgccgtcgcccaccagatcactcatcag
gccatggtggcagctgttgcctccgcggccgcaggtaatgacctggaaggggaggcttgg
gaggtagggcacagtccatggtggcagctggctggcaagggcctggccctcagccctctt
cggtctgtctcttctgccacccacaggacagcaggtgccaggcttcccaacagctccaac
ccgggtggtgattgcccggcccactcctccacaggctcggccttcccatcctggagggcc
cccagtctctgggacactggtgagcaagggtcggggagttctagtgcgtaacagtctagg

Each sub-sequence keeps the name of the original sequence with '_start-end' appended, where 'start' and 'end' are the start and end positions of the sub-sequence. For example, a sequence named HSHBB that is split at a size of 50000 with no overlap yields sub-sequences named HSHBB_1-50000 and HSHBB_50001-73308.
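
Downstream scripts often need to map a sub-sequence back to its position in the original sequence. The following Python sketch builds and parses names of the '_start-end' form described above. The function names subsequence_name and parse_subsequence_name are hypothetical helpers written for this illustration and are not part of EMBOSS.

```python
import re

# Original name, underscore, start, hyphen, end (anchored at the end of the
# string so that underscores inside the original name are preserved).
_NAME_PATTERN = re.compile(r"^(?P<name>.+)_(?P<start>\d+)-(?P<end>\d+)$")


def subsequence_name(name, start, end):
    """Build a sub-sequence name of the form 'name_start-end'."""
    return f"{name}_{start}-{end}"


def parse_subsequence_name(sub_name):
    """Split 'name_start-end' into (name, start, end) with integer positions."""
    match = _NAME_PATTERN.match(sub_name)
    if match is None:
        raise ValueError(f"not a 'name_start-end' identifier: {sub_name!r}")
    return match["name"], int(match["start"]), int(match["end"])


print(subsequence_name("HSHBB", 1, 50000))
# HSHBB_1-50000
print(parse_subsequence_name("HSHBB_50001-73308"))
# ('HSHBB', 50001, 73308)
```

Adding 'start - 1' to a position reported within a sub-sequence gives the corresponding position in the original sequence.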

Data files

None.

Notes

There is rarely any need to split sequences into smaller sub-sequences in EMBOSS, but memory usage can become restrictive when dealing with very large sequences.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

   Program name   Description
   biosed         Replace or delete sequence sections
   codcopy        Reads and writes a codon usage table
   cutseq         Removes a specified section from a sequence
   degapseq       Removes gap characters from sequences
   descseq        Alter the name or description of a sequence
   entret         Reads and writes (returns) flatfile entries
   extractfeat    Extract features from a sequence
   extractseq     Extract regions from a sequence
   listor         Write a list file of the logical OR of two sets of sequences
   maskfeat       Mask off features of a sequence
   maskseq        Mask off regions of a sequence
   newseq         Type in a short new sequence
   noreturn       Removes carriage return from ASCII files
   notseq         Exclude a set of sequences and write out the remaining ones
   nthseq         Writes one sequence from a multiple set of sequences
   pasteseq       Insert one sequence into another
   revseq         Reverse and complement a sequence
   seqret         Reads and writes (returns) sequences
   seqretsplit    Reads and writes (returns) sequences in individual files
   skipseq        Reads and writes (returns) sequences, skipping first few
   trimest        Trim poly-A tails off EST sequences
   trimseq        Trim ambiguous bits off the ends of sequences
   union          Reads sequence fragments and builds one sequence
   vectorstrip    Strips out DNA between a pair of vector sequences
   yank           Reads a sequence range, appends the full USA to a list file

Author(s)

Gary Williams (gwilliam © rfcgr.mrc.ac.uk)
MRC Rosalind Franklin Centre for Genomics Research, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK

History

Completed 22 March 1999

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None.