EMBOSS: diffseq manual

diffseq

Function

Find differences between nearly identical sequences

Description

diffseq takes two overlapping, nearly identical sequences and reports the differences between them, together with any features that overlap with these regions. GFF files of the differences in each sequence are also produced.

diffseq finds the region of overlap of the input sequences and then reports differences within this region, like a local alignment.

The start and end positions of the overlap are reported.

diffseq should be useful when looking for SNPs, differences between strains of an organism, and anything else that requires the differences between sequences to be highlighted.

The sequences can be very long. The program first finds all matching words of size 10 (by default) in the two sequences. It then reduces these to a minimal set of non-overlapping matches: it sorts the matches by size (largest first) and, for each match in turn, removes any smaller matches that overlap it. The result is a set of the longest ungapped alignments between the two sequences that do not overlap one another. The mismatched regions between these matches are reported.

It should be possible to find differences between sequences that are megabases long.
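The matching strategy described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the EMBOSS source: the names `maximal_matches`, `select_matches` and `differences` are hypothetical, each shared word is extended into a maximal ungapped run along its diagonal, and the greedy filter also discards matches that would cross an already-kept match (an assumption made so that the kept matches can be walked in order). Like the default mode, it only reports mismatches between the first and last kept matches, and it favours clarity over speed.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class Match(NamedTuple):
    a_start: int  # 0-based start in sequence a
    b_start: int  # 0-based start in sequence b
    length: int   # number of identical, ungapped positions


def maximal_matches(a: str, b: str, wordsize: int = 10) -> List[Match]:
    """Extend every shared word of length `wordsize` into a maximal ungapped match."""
    index = defaultdict(list)  # word -> start positions in b
    for j in range(len(b) - wordsize + 1):
        index[b[j:j + wordsize]].append(j)
    matches = []
    for i in range(len(a) - wordsize + 1):
        for j in index.get(a[i:i + wordsize], []):
            # Start only from the leftmost word of each run on this diagonal.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            n = wordsize
            while i + n < len(a) and j + n < len(b) and a[i + n] == b[j + n]:
                n += 1
            matches.append(Match(i, j, n))
    return matches


def select_matches(matches: List[Match]) -> List[Match]:
    """Keep the longest matches first, dropping any that overlap (or cross) a kept one."""
    kept: List[Match] = []
    for m in sorted(matches, key=lambda x: x.length, reverse=True):
        clash = False
        for k in kept:
            overlap_a = m.a_start < k.a_start + k.length and k.a_start < m.a_start + m.length
            overlap_b = m.b_start < k.b_start + k.length and k.b_start < m.b_start + m.length
            crosses = (m.a_start < k.a_start) != (m.b_start < k.b_start)  # assumption
            if overlap_a or overlap_b or crosses:
                clash = True
                break
        if not clash:
            kept.append(m)
    return sorted(kept)  # order by position in a


def differences(a: str, b: str, wordsize: int = 10) -> List[Tuple[int, str, int, str]]:
    """Return (a_pos, a_text, b_pos, b_text) for each gap between kept matches, 1-based."""
    kept = select_matches(maximal_matches(a, b, wordsize))
    result = []
    for left, right in zip(kept, kept[1:]):
        a_from, a_to = left.a_start + left.length, right.a_start
        b_from, b_to = left.b_start + left.length, right.b_start
        if a_to > a_from or b_to > b_from:
            result.append((a_from + 1, a[a_from:a_to], b_from + 1, b[b_from:b_to]))
    return result
```

A deletion shows up as a region whose text is empty on one side, which corresponds to the "Length: 0" entries in the example report.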

Usage

Here is a sample session with diffseq:


% diffseq tembl:ap000504 tembl:af129756 
Find differences between nearly identical sequences
Word size [10]: 
Output report [ap000504.diffseq]: 
Output features [AP000504.diffgff]: 
Second output features [AF129756.diffgff]:
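
To run diffseq from a script, the prompts can be turned off with -auto and every value supplied as a qualifier. The following Python sketch builds such a command line from the qualifiers documented in the command line arguments section; the helper names `diffseq_command` and `run_diffseq` are hypothetical, and actually running the program assumes the EMBOSS binaries are installed and on PATH.

```python
import subprocess
from typing import List, Optional


def diffseq_command(asequence: str, bsequence: str, outfile: str,
                    wordsize: int = 10, globaldifferences: bool = False,
                    aoutfeat: Optional[str] = None,
                    boutfeat: Optional[str] = None) -> List[str]:
    """Build a non-interactive diffseq command line (hypothetical helper)."""
    if wordsize < 2:
        raise ValueError("wordsize must be 2 or more")
    cmd = ["diffseq", "-asequence", asequence, "-bsequence", bsequence,
           "-wordsize", str(wordsize), "-outfile", outfile, "-auto"]
    if aoutfeat:
        cmd += ["-aoutfeat", aoutfeat]
    if boutfeat:
        cmd += ["-boutfeat", boutfeat]
    if globaldifferences:
        cmd.append("-globaldifferences")  # boolean qualifier: present means true
    return cmd


def run_diffseq(*args, **kwargs) -> None:
    """Run diffseq; raises CalledProcessError if the program fails."""
    subprocess.run(diffseq_command(*args, **kwargs), check=True)
```

For example, `run_diffseq("tembl:ap000504", "tembl:af129756", "ap000504.diffseq")` would reproduce the session above without prompting, leaving the feature files at their default names.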


Command line arguments

   Standard (Mandatory) qualifiers:
  [-asequence]         sequence   Sequence USA
  [-bsequence]         sequence   Sequence USA
   -wordsize           integer    The similar regions between the two
                                  sequences are found by creating a hash table
                                  of 'wordsize'd subsequences. 10 is a
                                  reasonable default. Making this value larger
                                  (20?) may speed up the program slightly,
                                  but will mean that any two differences
                                  within 'wordsize' of each other will be
                                  grouped as a single region of difference.
                                  This value may be made smaller (4?) to
                                  improve the resolution of nearby
                                  differences, but the program will go much
                                  slower.
  [-outfile]           report     Output report file name
  [-aoutfeat]          featout    File for output of first sequence's features
  [-boutfeat]          featout    File for output of second sequence's
                                  features

   Additional (Optional) qualifiers:
   -globaldifferences  boolean    Normally this program will find regions of
                                  identity that are the length of the
                                  specified word-size or greater and will then
                                  report the regions of difference between
                                  these matching regions. This works well and
                                  is what most people want if they are working
                                  with long overlapping nucleic acid
                                  sequences. You are usually not interested in
                                  the non-overlapping ends of these
                                  sequences. If you have protein sequences or
                                  short RNA sequences however, you will be
                                  interested in differences at the very ends.
                                  If this option is set to true, then the
                                  differences at the ends will also be
                                  reported.

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-asequence" associated qualifiers
   -sbegin1            integer    Start of the sequence to be used
   -send1              integer    End of the sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-bsequence" associated qualifiers
   -sbegin2            integer    Start of the sequence to be used
   -send2              integer    End of the sequence to be used
   -sreverse2          boolean    Reverse (if DNA)
   -sask2              boolean    Ask for begin/end/reverse
   -snucleotide2       boolean    Sequence is nucleotide
   -sprotein2          boolean    Sequence is protein
   -slower2            boolean    Make lower case
   -supper2            boolean    Make upper case
   -sformat2           string     Input sequence format
   -sdbname2           string     Database name
   -sid2               string     Entryname
   -ufo2               string     UFO features
   -fformat2           string     Features format
   -fopenfile2         string     Features file name

   "-outfile" associated qualifiers
   -rformat3           string     Report format
   -rname3             string     Base file name
   -rextension3        string     File name extension
   -rdirectory3        string     Output directory
   -raccshow3          boolean    Show accession number in the report
   -rdesshow3          boolean    Show description in the report
   -rscoreshow3        boolean    Show the score in the report
   -rusashow3          boolean    Show the full USA in the report

   "-aoutfeat" associated qualifiers
   -offormat4          string     Output feature format
   -ofopenfile4        string     Features file name
   -ofextension4       string     File name extension
   -ofdirectory4       string     Output directory
   -ofname4            string     Base file name
   -ofsingle4          boolean    Separate file for each entry

   "-boutfeat" associated qualifiers
   -offormat5          string     Output feature format
   -ofopenfile5        string     Features file name
   -ofextension5       string     File name extension
   -ofdirectory5       string     Output directory
   -ofname5            string     Base file name
   -ofsingle5          boolean    Separate file for each entry

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report deaths

Standard (Mandatory) qualifiers		Allowed values	Default
[-asequence] (Parameter 1)	Sequence USA	Readable sequence	Required
[-bsequence] (Parameter 2)	Sequence USA	Readable sequence	Required
-wordsize	The similar regions between the two sequences are found by creating a hash table of subsequences of length 'wordsize'. 10 is a reasonable default. Making this value larger (e.g. 20) may speed up the program slightly, but any two differences within 'wordsize' of each other will then be grouped as a single region of difference. Making it smaller (e.g. 4) improves the resolution of nearby differences, but the program will run much more slowly.	Integer 2 or more	10
[-outfile] (Parameter 3)	Output report file name	Report output file
[-aoutfeat] (Parameter 4)	File for output of first sequence's features	Writeable feature table	$(asequence.name).diffgff
[-boutfeat] (Parameter 5)	File for output of second sequence's features	Writeable feature table	$(bsequence.name).diffgff
Additional (Optional) qualifiers		Allowed values	Default
-globaldifferences	Normally this program finds regions of identity that are at least as long as the specified word size and then reports the regions of difference between these matching regions. This works well, and is what most people want, when working with long overlapping nucleic acid sequences, where the non-overlapping ends are usually of no interest. If you have protein sequences or short RNA sequences, however, you will be interested in differences at the very ends. If this option is set to true, the differences at the ends will also be reported.	Boolean value Yes/No	No
Advanced (Unprompted) qualifiers		Allowed values	Default
(none)
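
The effect of -wordsize on how nearby differences are grouped, and of -globaldifferences on the ends, can be illustrated with a deliberately simplified model. The sketch below is an assumption, not diffseq's algorithm: it only handles two equal-length sequences that differ by substitutions, treats a position as matched when it lies inside some window of 'wordsize' identical characters at the same offset in both sequences, and drops regions touching either end unless `globaldifferences` is true. The function name `difference_regions` is hypothetical.

```python
from typing import List, Optional, Tuple


def difference_regions(a: str, b: str, wordsize: int,
                       globaldifferences: bool = False) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) regions not covered by any identical word."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length sequences")
    n = len(a)
    covered = [False] * n
    for i in range(n - wordsize + 1):
        if a[i:i + wordsize] == b[i:i + wordsize]:
            for k in range(i, i + wordsize):
                covered[k] = True
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, is_covered in enumerate(covered):
        if not is_covered and start is None:
            start = i                          # a region of difference begins
        elif is_covered and start is not None:
            regions.append((start + 1, i))     # region ends just before position i
            start = None
    if start is not None:
        regions.append((start + 1, n))
    if not globaldifferences:
        # Without -globaldifferences, unmatched ends are not reported.
        regions = [r for r in regions if r[0] > 1 and r[1] < n]
    return regions
```

For example, two substitutions at positions 16 and 20 of a 40-base sequence come back as the single region (16, 20) with a word size of 10, but as (16, 16) and (20, 20) with a word size of 3.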

Input file format

This program reads in two nucleic acid sequence USAs or two protein sequence USAs.

Input files for usage example

'tembl:ap000504' is a sequence entry in the example nucleic acid database 'tembl'.

Database entry: tembl:ap000504

ID   AP000504   standard; DNA; HUM; 100000 BP.
XX
AC   AP000504; BA000025;
XX
SV   AP000504.1
XX
DT   28-SEP-1999 (Rel. 61, Created)
DT   22-AUG-2001 (Rel. 68, Last updated, Version 3)
XX
DE   Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region, section
DE   3/20.
XX
KW   .
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
XX
RN   [1]
RP   1-100000
RA   Hirakawa M., Yamaguchi H., Imai K., Shimada J.;
RT   ;
RL   Submitted (21-SEP-1999) to the EMBL/GenBank/DDBJ databases.
RL   Mika Hirakawa, Japan Science and Technology Corporation (JST), Advanced
RL   Databases Department; 5-3, Yonbancho, Chiyoda-ku, Tokyo 102-0081, Japan
RL   (E-mail:mika@tokyo.jst.go.jp, URL:http://www-alis.tokyo.jst.go.jp/,
RL   Tel:81-3-5214-8491, Fax:81-3-5214-8470)
XX
RN   [2]
RA   Shiina S., Tamiya G., Oka A., Inoko H.;
RT   "Homo sapiens 2,229,817bp genomic DNA of 6p21.3 HLA class I region";
RL   Unpublished.
XX
DR   SWISS-PROT; O00299; CLI1_HUMAN.
DR   SWISS-PROT; O43196; MSH5_HUMAN.
DR   SWISS-PROT; O95445; APOM_HUMAN.
DR   SWISS-PROT; O95865; DDH2_HUMAN.
DR   SWISS-PROT; O95867; NG24_HUMAN.
DR   SWISS-PROT; P13862; KC2B_HUMAN.
XX
CC   This sequence is conducted by Tokai University as a JST sequencing
CC   Team.
CC   Principal Investigator: Hidetoshi Inoko Ph.D
CC   Phone:+81-463-93-1121, Fax:+81-463-94-8884,
CC   The sequence is submitted by Human Genome Sequencing in ALIS
CC   project of JST
CC   Japan Science and Technology Corporation (JST)
CC   5-3, Yonbancyo, Chiyoda-ku, Tokyo, 102-0081 Japan
CC   For further infomation about this sequences, please visit our
CC   sequence archive Web site (http://www-alis.tokyo.jst.go.jp/HGS/top.


  [Part of this file has been deleted for brevity]

     gggtggatca tgaggtcaag agatcgagac tatcctggct aacatgatga aaccccgtct     97080
     ctactaaaaa tacaaaaaat tagctgggca tggtggcggg cacctgtagt cccagctact     97140
     cgggaggctg agtcaggaga atggtgtgaa cccaggagac ggagcttgca gtgagctgag     97200
     gtcgcaccac tgcactccag cctgggtgat agagcgagac tctgtctcaa aaaaaaaaaa     97260
     aaaaaaaaaa aaaacaaaaa ttagccgggt gtggtggcag gcaacttaat cccagctact     97320
     tgggaggcag aggcaggaga atcgtttgaa cctgggaggc ggaggttgaa gagaatagaa     97380
     gctctgctgg tccagagaag gattgggcca gggctctggg agaccaggga gaaagagggc     97440
     acatgtggtc cctgttgact gtgagggtgg gaatctgagg aaggctttgg ctcattgccc     97500
     cttgggtttg tccacagcca tccttcccct gcggagtatg tcgaggtgct ccaggagcta     97560
     cagcggctgg agagtcgcct ccagcccttc ttgcagcgct actacgaggt tctgggtgct     97620
     gctgccacca cggactacaa taacaatgtg agccctttga tggccctgcc ctttctcctc     97680
     agccccagta ctcccaaaac agaacaggct gaaatacaga taactctttc cctccctgga     97740
     aaaacattgc aacagggcca ggtgcagtgg ctcacgcctg taatcccagc actttgggag     97800
     gccaaggtgg gcggatcatc tgagatcggg agtttgagac cagcctggcc aacatggtgc     97860
     aaccccatct ctactgaaaa tataaacatt agctggatgt agtggtgcac acctgtaatc     97920
     ccagctactc aggaggctga ggcaggagaa tcgctagaac tcgggaggag ggggttgcag     97980
     tgagccgaga ttgcactact gcactctagc ctgggtgaca gagcgagact gtctcaaaaa     98040
     acaaaacaaa acaaaaaaac acacattgca acaaaacaat ttctctctaa acctgtaagt     98100
     gattttgtcc tcccttacag agaaggtgat aatctttgct gtaagcactg tcctcgtatc     98160
     gtaccccttg tgcccctgaa tgaatttaga aaatgtaaag tacaggagat cagtatatga     98220
     tgacttactg attcatagta gtgttttaat aggatgttcc ttatgtgaat aagatataat     98280
     ttatttgcaa agatttggtc tacatgtaaa cttccaagga tataactgaa agttttggag     98340
     gacatggtat tctcagtagg cattattgct tttattagtg agatggactc cagcttgata     98400
     ttttctgcct ttttgtgttt ggctggttgt gcgcagcacg agggccggga ggaggatcag     98460
     cggttgatca acttggtagg ggagagcctg cgactgctgg gcaacacctt tgttgcactg     98520
     tctgacctgc gctgcaatct ggcctgcacg cccccacgac acctgcatgt ggtccggcct     98580
     atgtctcact acaccacccc catggtgctc cagcaggcag ccattcccat acaggtgggt     98640
     tagggggagt ctggcctgag ggagagtgag gggtgttgat agagtgaccc agggtagcta     98700
     ctgggcctga aggaggttag gaaaggagga gactggaaac atggtgatga aggctggaga     98760
     tactttagag gtttatcatg aggttttctt ggttaggctc ttgtattttt ctcacatctg     98820
     cctgtccatc tgtctttttc agatcaatgt gggaaccact gtgaccatga caggaaatgg     98880
     gactcggccc cccccaactc ccaatgcaga ggcacctccc cctggtcctg ggcaggcctc     98940
     atccgtggct ccgtcttcta ccaatgtcga gtcctcagct gagggggctc ccccgccagg     99000
     tccagctccc ccgccagcca ccagccaccc gagggtcatc cggatttccc accagagtgt     99060
     ggaacccgtg gtcatgatgc acatgaacat tcaaggtgag aatagttgct ggcgagaaga     99120
     gcaggatcag catgatgagg gaggttcatg ctgaggtgtg agggaacagg gtggggaagg     99180
     gagaggcaca tgctggtggt ggtagcctgg ggaccagagc agaagcttaa gtagacagat     99240
     gtggggggtg tgggggttgg tttgtctttg gaggtgtgtt tgtgtggtga agggagtacc     99300
     tctccctgtt tagatggagg gaaaggcagg ctttctgatt gggggattat gggcctgaag     99360
     tatgcctgat ctcagaagga tatagttagg ccttggccct acctacctca gggccactgt     99420
     ctctgtctcc ctgcccagat tctggcacac agcctggtgg tgttccgagt gctcccactg     99480
     gccccctggg accccctggt catggccaaa ccctgggtaa gagtgagggc atcagggcag     99540
     gctgagctct gggtagagaa agggaagggc tgagtgggtg ggttgaaggg gtccaggttc     99600
     aaggttacat cagacccgcc ccccaggctc caccctcatc cagctgccct ccctgccccc     99660
     tgagttcatg cacgccgtcg cccaccagat cactcatcag gccatggtgg cagctgttgc     99720
     ctccgcggcc gcaggtaatg acctggaagg ggaggcttgg gaggtagggc acagtccatg     99780
     gtggcagctg gctggcaagg gcctggccct cagccctctt cggtctgtct cttctgccac     99840
     ccacaggaca gcaggtgcca ggcttcccaa cagctccaac ccgggtggtg attgcccggc     99900
     ccactcctcc acaggctcgg ccttcccatc ctggagggcc cccagtctct gggacactgg     99960
     tgagcaaggg tcggggagtt ctagtgcgta acagtctagg                          100000
//

Database entry: tembl:af129756

ID   AF129756   standard; DNA; HUM; 184666 BP.
XX
AC   AF129756;
XX
SV   AF129756.1
XX
DT   12-MAR-1999 (Rel. 59, Created)
DT   29-OCT-1999 (Rel. 61, Last updated, Version 2)
XX
DE   Homo sapiens MSH55 gene, partial cds; and CLIC1, DDAH, G6b, G6c, G5b, G6d,
DE   G6e, G6f, BAT5, G5b, CSK2B, BAT4, G4, Apo M, BAT3, BAT2, AIF-1, 1C7, LST-1,
DE   LTB, TNF, and LTA genes, complete cds.
XX
KW   .
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
XX
RN   [1]
RP   1-184666
RA   Rowen L., Madan A., Qin S., Shaffer T., James R., Ratcliffe A., Abbasi N.,
RA   Dickhoff R., Loretz C., Madan A., Dors M., Young J., Lasky S., Hood L.;
RT   "Sequence of the human major histocompatibility complex class III region";
RL   Unpublished.
XX
RN   [2]
RP   1-184666
RA   Rowen L.;
RT   ;
RL   Submitted (22-FEB-1999) to the EMBL/GenBank/DDBJ databases.
RL   Department of Molecular Biotechnology, Box 357730 University of Washington,
RL   Seattle, WA 98195, USA
XX
RN   [3]
RP   1-184666
RA   Rowen L.;
RT   ;
RL   Submitted (28-OCT-1999) to the EMBL/GenBank/DDBJ databases.
RL   Multimegabase Sequencing Center, University of Washington, PO Box 357730,
RL   Seattle, WA 98195, USA
XX
DR   EPD; EP11158; HS_TNFA.
DR   EPD; EP11159; HS_TNFB.
DR   SPTREMBL; O00452; O00452.
DR   SPTREMBL; O14931; O14931.
DR   SPTREMBL; O95866; O95866.
DR   SPTREMBL; O95868; O95868.
DR   SPTREMBL; O95869; O95869.
DR   SPTREMBL; O95870; O95870.


  [Part of this file has been deleted for brevity]

     aaaccagttt accaccactc ctaacactaa acttaaatct gactctaaat gtaagtccaa    181740
     tctgagccac aagcctaaag ttgaacttta tcctgcttta tgaattattc atccattcct    181800
     ccatttagtg agtatctgcg tgcctaacac atgctgggca ttgtcctaag gcaggaggga    181860
     catggaggca aagggatcag agaaggtacc agcacctgtg gagcttgtat tccagtgagg    181920
     ccagacggaa aagaaagaaa ctgaagaaga aattggtact atgagaaaat aagacaggct    181980
     gatgttgtaa gagtggcagg gagctacttt taaatacagt agtcagcaaa atcctctttg    182040
     agtgtttggg tggcactgga gctgagaccc aaatgacaaa aaatagtgac caggtaaaag    182100
     tttgggagca aagcatttca ggtaaaggga gcagctactg caaaggctgg aaggcggaac    182160
     caagctgggg gtgttgacga caaacagaag gccagtgtgg ctggagcaga gagagagact    182220
     gggaggcggg tgggagatga ggtcagagag gagggcaggg gccaggtcat gcagggccat    182280
     gcaagaaggg taaagcctct agatttcatc cagccacagg aagcctttaa aggtcgtcag    182340
     agtgtgtggt gcgtgcgtgt gtgtgtgtgt gtgtgtgtgt gttgcagggg agagaggggg    182400
     agggagagag agagagagag agagaagagg gaggtgagca gaggtgattg gatttttttt    182460
     tcttttgaca tggtgtcttg ctctgtggcc taggctggag tgcagtggca ccatcatagc    182520
     ccactgcaac ctcaaaacca tgggctcaag tcatccttcc acctcagctt cccaagtatc    182580
     taggactaca ggtgtgtgcc actgtgcctg gctaatttta aaaaatattt taaaattttt    182640
     gttgagacag ggtctatgct gctcaggctg gtctcgaact cctggtttca agtgatctgc    182700
     ccatcttggc ctcccaaagt ttttttttgt tagtttgaga ggcggtttcg ctcgttgccc    182760
     aggctggagt gcaatgactg atctcatctc actgcaacct ctgcctcctg ggttcaagcg    182820
     attctcctgc ttcagcctcc caagtagctg ggattacagg tgcatgccac cattcccggc    182880
     taattttttg tatttagtag agatggggtt tcaccatgtt agtcaggctg atctcaaact    182940
     cctgacctca ggtgatccgc ctgcctcagc ctcccaaagt tttgggatta caggtgtgag    183000
     ccaccatgct gggccagcct cccaaagttt tgggattaca ggcatgagtc accacactgg    183060
     ccctggattt tttttctttc ttttttttgg agacggagtc tcactctgtt gcccaggctg    183120
     gagtgcaatg gcgtaatctc agctcactgc aacctctgct gcccgggttc aaacgattct    183180
     cctgtcttag cctcctgagt agctgggatt ataggtgcat gccaccatgc ctggctaatt    183240
     tttgtacttt tagtagagaa agtacaccat cttggccagg ctggtctcga actcctgacc    183300
     tcaggtgatc cacttgcgtc ggcctcccaa agtgctggga ttacaggcgt gagacaccgc    183360
     acccagcctt tttttttttt tttcttttaa gacagaatcg ctctgtcacc caggctggag    183420
     tgcagtggca caatctcggc tcactgcaac ctctgcctcc caggtttaag caatccacct    183480
     atgtcagtct cccaagtagc tgggattata ggtgcatgtc accatgcctg gctaattttt    183540
     gtacttttag tatagaaagt acaccatgtt ggccaggctg gtcttgaact cctgacctca    183600
     agtgatccgc ctgcctcagc ctcccgaagt gctggaatta cagacatgtg ccactgcacc    183660
     cggcctggtt ttttttttct aagagatgga gtctcacttt tctgcccagg ttggagtgca    183720
     atggcaccat catagctcac tgcagccttc aactcttggc ctcaggcaat ccttgcacct    183780
     tagcctcgca gtgttgggat tacaggcatg agccactgag ccttgcctgg actttttttt    183840
     ttttttgaga tggcgtctcg ctctgttgcc caggttggag tgctacggca tgatcttggc    183900
     tcactgcaac ttccacctcc caggttcaag cgattctctt gcctcggccc cccgagtagc    183960
     tgggattaca ggcatgcgcc accgtgcctg gctaattttg gtatttttag tagagatagg    184020
     gtttcatcat gttgggcagg ctggtcttga actcctgacc tcgtgatcca cccacctcgg    184080
     cctcccaaag tgctgggatt ataggcatag ccaacgcgcc cagcctggac ttgtttttaa    184140
     aagatcactg tggctcctgt gtttaggctg gctggtagga gacaggtggc agtggcattg    184200
     atggtgaaga gaaaatagtg gcagccatgg agatggagag aagtagacaa gtttgggata    184260
     tattatacat tccaggggta gaaacaacag gactagatga tggattgatg ggtgggagat    184320
     gtagatactg ggagagaagc aggattctga tggatggaaa aactaaaaaa ttctattttg    184380
     ggtgtggtaa gtctaagtct attagacatg caagtagaga tgtcactggg cagatacaca    184440
     tctggatttc aggggcaagg tccaagctag agaaagaaac ctgggcatgg tcagcatgag    184500
     gatggtgttt aaagccatgg aacttatctt gtgcatccct ataagacccc tttgaggcac    184560
     ttgtttcccc tcacaatgga tgcagtgcat cttccattct gaattccaga ggcaacaacc    184620
     tcctgctcct agaagctaaa ctctccagac ttagtcttct gaattc                   184666
//

Output file format

The output is a standard EMBOSS report file.

The results can be output in one of several styles by using the command-line qualifier -rformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: embl, genbank, gff, pir, swiss, trace, listfile, dbmotif, diffseq, excel, feattable, motif, regions, seqtable, simple, srs, table, tagseq.

See: http://emboss.sf.net/docs/themes/ReportFormats.html for further information on report formats.

By default diffseq writes a 'diffseq' report file.

Output files for usage example

File: ap000504.diffseq

########################################
# Program: diffseq
# Rundate: Fri Jul 15 2005 12:00:00
# Report_format: diffseq
# Report_file: ap000504.diffseq
# Additional_files: 2
# 1: AP000504.diffgff (Feature file for first sequence)
# 2: AF129756.diffgff (Feature file for second sequence)
########################################

#=======================================
#
# Sequence: AP000504     from: 1   to: 100000
# HitCount: 119
#
# Compare: AF129756     from: 1   to: 184666
# 
# AP000504 overlap starts at 1
# AF129756 overlap starts at 6036
# 
# (AP000504) start end length sequence
# (AF129756) start end length sequence
# 
#
#=======================================


AP000504 847-847 Length: 1
Sequence: a
Sequence: t
AF129756 6882-6882 Length: 1

AP000504 1795-1795 Length: 1
Sequence: g
Sequence: a
AF129756 7830-7830 Length: 1

AP000504 2273-2273 Length: 1
Sequence: t
Sequence: 
Feature: repeat_region 7920-8351 rpt_family='MSTB'
AF129756 8307 Length: 0

AP000504 2466-2466 Length: 1
Sequence: g
Sequence: a
Feature: repeat_region 8391-8686 rpt_family='AluSg'
AF129756 8500-8500 Length: 1

AP000504 2655-2658 Length: 4


  [Part of this file has been deleted for brevity]

Sequence: t
Sequence: c
AF129756 99280-99280 Length: 1

AP000504 93696-93696 Length: 1
Sequence: t
Sequence: g
AF129756 99726-99726 Length: 1

AP000504 93860-93860 Length: 1
Sequence: t
Sequence: g
AF129756 99890-99890 Length: 1

AP000504 95451-95451 Length: 1
Sequence: c
Sequence: t
AF129756 101481-101481 Length: 1

AP000504 96650-96650 Length: 1
Sequence: c
Sequence: t
AF129756 102680-102680 Length: 1

AP000504 97273-97274 Length: 2
Sequence: aa
Sequence: 
Feature: repeat_region 103299-103402 rpt_family='AluSq'
AF129756 103302 Length: 0

AP000504 97716-97716 Length: 1
Sequence: a
Sequence: g
AF129756 103744-103744 Length: 1

AP000504 97827-97827 Length: 1
Sequence: c
Sequence: t
Feature: repeat_region 103784-104083 rpt_family='AluSx'
AF129756 103855-103855 Length: 1

#---------------------------------------
#
# Overlap_end: 100000 in AP000504
# Overlap_end: 106028 in AF129756
# 
# SNP_count: 86
# Transitions: 58
# Transversions: 28
#
#---------------------------------------
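
The difference blocks of a 'diffseq' report are regular enough to post-process. The following Python sketch parses blocks laid out as in the example above and recounts SNPs, transitions (a<->g, c<->t) and transversions. The names are hypothetical, Feature lines are ignored, and it assumes that a SNP is a block with a single a, c, g or t on each side; this may not match diffseq's own counting rules exactly.

```python
import re
from typing import Dict, List, NamedTuple

# e.g. "AP000504 847-847 Length: 1" or "AF129756 8307 Length: 0"
HEADER = re.compile(r"^(\S+) (\d+)(?:-(\d+))? Length: (\d+)\s*$")


class Difference(NamedTuple):
    a_name: str
    a_start: int
    a_end: int
    a_seq: str
    b_name: str
    b_start: int
    b_end: int  # equals b_start when the region has length 0
    b_seq: str


def parse_diffseq_report(text: str) -> List[Difference]:
    """Parse the blank-line-separated difference blocks of a 'diffseq' report."""
    records = []
    for block in re.split(r"\n\s*\n", text):
        lines = [ln for ln in block.splitlines() if ln.strip() and not ln.startswith("#")]
        headers = [m for m in (HEADER.match(ln) for ln in lines) if m]
        seqs = [ln[len("Sequence:"):].strip() for ln in lines if ln.startswith("Sequence:")]
        if len(headers) != 2 or len(seqs) != 2:
            continue  # header comments, trailer, or a truncated entry
        ha, hb = headers
        records.append(Difference(
            ha.group(1), int(ha.group(2)), int(ha.group(3) or ha.group(2)), seqs[0],
            hb.group(1), int(hb.group(2)), int(hb.group(3) or hb.group(2)), seqs[1]))
    return records


def count_snps(records: List[Difference]) -> Dict[str, int]:
    """Recount SNPs, transitions and transversions from parsed records."""
    counts = {"SNP_count": 0, "Transitions": 0, "Transversions": 0}
    for r in records:
        pair = {r.a_seq.lower(), r.b_seq.lower()}
        if len(r.a_seq) == 1 and len(r.b_seq) == 1 and len(pair) == 2 and pair <= set("acgt"):
            counts["SNP_count"] += 1
            # Purine<->purine or pyrimidine<->pyrimidine is a transition.
            if pair in ({"a", "g"}, {"c", "t"}):
                counts["Transitions"] += 1
            else:
                counts["Transversions"] += 1
    return counts
```

Applied to the first four blocks of the example report, this finds one transversion (a/t at 847) and two transitions (g/a at 1795 and 2466); the single-base deletion at 2273 is not counted as a SNP.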

File: AF129756.diffgff

##gff-version 2.0
##date 2005-07-15
##Type DNA AF129756
AF129756	diffseq	conflict	6882	6882	1.000	+	.	Sequence "AF129756.1" ; note "SNP in AP000504" ; replace "a"
AF129756	diffseq	conflict	7830	7830	1.000	+	.	Sequence "AF129756.2" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	8500	8500	1.000	+	.	Sequence "AF129756.3" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	10945	10962	1.000	+	.	Sequence "AF129756.4" ; note "Insertion of 18 bases in AF129756" ; replace ""
AF129756	diffseq	conflict	10999	11001	1.000	+	.	Sequence "AF129756.5" ; note "AP000504" ; replace "aaa"
AF129756	diffseq	conflict	12915	12915	1.000	+	.	Sequence "AF129756.6" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	15139	15139	1.000	+	.	Sequence "AF129756.7" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	17192	17192	1.000	+	.	Sequence "AF129756.8" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	19761	19761	1.000	+	.	Sequence "AF129756.9" ; note "SNP in AP000504" ; replace "a"
AF129756	diffseq	conflict	20291	20291	1.000	+	.	Sequence "AF129756.10" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	20462	20462	1.000	+	.	Sequence "AF129756.11" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	25686	25686	1.000	+	.	Sequence "AF129756.12" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	26192	26192	1.000	+	.	Sequence "AF129756.13" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	27227	27227	1.000	+	.	Sequence "AF129756.14" ; note "Insertion of 1 bases in AF129756" ; replace ""
AF129756	diffseq	conflict	27837	27837	1.000	+	.	Sequence "AF129756.15" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	29328	29328	1.000	+	.	Sequence "AF129756.16" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	29458	29458	1.000	+	.	Sequence "AF129756.17" ; note "SNP in AP000504" ; replace "a"
AF129756	diffseq	conflict	29629	29629	1.000	+	.	Sequence "AF129756.18" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	29646	29646	1.000	+	.	Sequence "AF129756.19" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	30838	30838	1.000	+	.	Sequence "AF129756.20" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	31349	31349	1.000	+	.	Sequence "AF129756.21" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	31901	31901	1.000	+	.	Sequence "AF129756.22" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	36682	36682	1.000	+	.	Sequence "AF129756.23" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	38225	38226	1.000	+	.	Sequence "AF129756.24" ; note "Insertion of 2 bases in AF129756" ; replace ""
AF129756	diffseq	conflict	38379	38379	1.000	+	.	Sequence "AF129756.25" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	38537	38537	1.000	+	.	Sequence "AF129756.26" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	39114	39114	1.000	+	.	Sequence "AF129756.27" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	39816	39816	1.000	+	.	Sequence "AF129756.28" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	40807	40807	1.000	+	.	Sequence "AF129756.29" ; note "SNP in AP000504" ; replace "a"
AF129756	diffseq	conflict	40977	40977	1.000	+	.	Sequence "AF129756.30" ; note "Insertion of 1 bases in AF129756" ; replace ""
AF129756	diffseq	conflict	41204	41204	1.000	+	.	Sequence "AF129756.31" ; note "SNP in AP000504" ; replace "a"
AF129756	diffseq	conflict	42548	42548	1.000	+	.	Sequence "AF129756.32" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	45315	45315	1.000	+	.	Sequence "AF129756.33" ; note "Insertion of 1 bases in AF129756" ; replace ""
AF129756	diffseq	conflict	48382	48382	1.000	+	.	Sequence "AF129756.34" ; note "SNP in AP000504" ; replace "a"
AF129756	diffseq	conflict	50635	50635	1.000	+	.	Sequence "AF129756.35" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	50809	50809	1.000	+	.	Sequence "AF129756.36" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	51286	51286	1.000	+	.	Sequence "AF129756.37" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	51645	51645	1.000	+	.	Sequence "AF129756.38" ; note "Insertion of 1 bases in AF129756" ; replace ""
AF129756	diffseq	conflict	52388	52388	1.000	+	.	Sequence "AF129756.39" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	52646	52646	1.000	+	.	Sequence "AF129756.40" ; note "SNP in AP000504" ; replace "a"
AF129756	diffseq	conflict	53596	53596	1.000	+	.	Sequence "AF129756.41" ; note "SNP in AP000504" ; replace "a"
AF129756	diffseq	conflict	53621	53621	1.000	+	.	Sequence "AF129756.42" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	54883	54883	1.000	+	.	Sequence "AF129756.43" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	55377	55377	1.000	+	.	Sequence "AF129756.44" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	55571	55571	1.000	+	.	Sequence "AF129756.45" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	55611	55611	1.000	+	.	Sequence "AF129756.46" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	55655	55661	1.000	+	.	Sequence "AF129756.47" ; note "Insertion of 7 bases in AF129756" ; replace ""


  [Part of this file has been deleted for brevity]

AF129756	diffseq	conflict	66604	66604	1.000	+	.	Sequence "AF129756.55" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	69445	69445	1.000	+	.	Sequence "AF129756.56" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	70182	70183	1.000	+	.	Sequence "AF129756.57" ; note "AP000504" ; replace "ta"
AF129756	diffseq	conflict	70195	70195	1.000	+	.	Sequence "AF129756.58" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	71102	71102	1.000	+	.	Sequence "AF129756.59" ; note "Insertion of 1 bases in AF129756" ; replace ""
AF129756	diffseq	conflict	73566	73566	1.000	+	.	Sequence "AF129756.60" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	73758	73758	1.000	+	.	Sequence "AF129756.61" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	74597	74597	1.000	+	.	Sequence "AF129756.62" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	76175	76176	1.000	+	.	Sequence "AF129756.63" ; note "Insertion of 2 bases in AF129756" ; replace ""
AF129756	diffseq	conflict	76463	76463	1.000	+	.	Sequence "AF129756.64" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	76710	76710	1.000	+	.	Sequence "AF129756.65" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	77331	77331	1.000	+	.	Sequence "AF129756.66" ; note "SNP in AP000504" ; replace "a"
AF129756	diffseq	conflict	77597	77597	1.000	+	.	Sequence "AF129756.67" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	78092	78092	1.000	+	.	Sequence "AF129756.68" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	79671	79671	1.000	+	.	Sequence "AF129756.69" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	80042	80042	1.000	+	.	Sequence "AF129756.70" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	80115	80115	1.000	+	.	Sequence "AF129756.71" ; note "SNP in AP000504" ; replace "a"
AF129756	diffseq	conflict	81882	81882	1.000	+	.	Sequence "AF129756.72" ; note "AP000504" ; replace "tttggaat"
AF129756	diffseq	conflict	82132	82132	1.000	+	.	Sequence "AF129756.73" ; note "SNP in AP000504" ; replace "a"
AF129756	diffseq	conflict	83649	83649	1.000	+	.	Sequence "AF129756.74" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	84290	84290	1.000	+	.	Sequence "AF129756.75" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	86465	86465	1.000	+	.	Sequence "AF129756.76" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	86842	86844	1.000	+	.	Sequence "AF129756.77" ; note "Insertion of 3 bases in AF129756" ; replace ""
AF129756	diffseq	conflict	87014	87014	1.000	+	.	Sequence "AF129756.78" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	87102	87102	1.000	+	.	Sequence "AF129756.79" ; note "Insertion of 1 bases in AF129756" ; replace ""
AF129756	diffseq	conflict	87605	87605	1.000	+	.	Sequence "AF129756.80" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	87893	87893	1.000	+	.	Sequence "AF129756.81" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	88359	88359	1.000	+	.	Sequence "AF129756.82" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	88635	88635	1.000	+	.	Sequence "AF129756.83" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	88750	88750	1.000	+	.	Sequence "AF129756.84" ; note "SNP in AP000504" ; replace "a"
AF129756	diffseq	conflict	88822	88826	1.000	+	.	Sequence "AF129756.85" ; note "Insertion of 5 bases in AF129756" ; replace ""
AF129756	diffseq	conflict	89118	89118	1.000	+	.	Sequence "AF129756.86" ; note "SNP in AP000504" ; replace "a"
AF129756	diffseq	conflict	89738	89738	1.000	+	.	Sequence "AF129756.87" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	91271	91271	1.000	+	.	Sequence "AF129756.88" ; note "Insertion of 1 bases in AF129756" ; replace ""
AF129756	diffseq	conflict	92311	92311	1.000	+	.	Sequence "AF129756.89" ; note "SNP in AP000504" ; replace "g"
AF129756	diffseq	conflict	92345	92345	1.000	+	.	Sequence "AF129756.90" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	93979	93979	1.000	+	.	Sequence "AF129756.91" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	94959	94959	1.000	+	.	Sequence "AF129756.92" ; note "SNP in AP000504" ; replace "a"
AF129756	diffseq	conflict	95246	95246	1.000	+	.	Sequence "AF129756.93" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	95809	95810	1.000	+	.	Sequence "AF129756.94" ; note "AP000504" ; replace "aat"
AF129756	diffseq	conflict	96756	96756	1.000	+	.	Sequence "AF129756.95" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	97713	97713	1.000	+	.	Sequence "AF129756.96" ; note "AP000504" ; replace "tgtgtgtgtgtgtgtgt"
AF129756	diffseq	conflict	97827	97827	1.000	+	.	Sequence "AF129756.97" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	98195	98195	1.000	+	.	Sequence "AF129756.98" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	99280	99280	1.000	+	.	Sequence "AF129756.99" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	99726	99726	1.000	+	.	Sequence "AF129756.100" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	99890	99890	1.000	+	.	Sequence "AF129756.101" ; note "SNP in AP000504" ; replace "t"
AF129756	diffseq	conflict	101481	101481	1.000	+	.	Sequence "AF129756.102" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	102680	102680	1.000	+	.	Sequence "AF129756.103" ; note "SNP in AP000504" ; replace "c"
AF129756	diffseq	conflict	103744	103744	1.000	+	.	Sequence "AF129756.104" ; note "SNP in AP000504" ; replace "a"
AF129756	diffseq	conflict	103855	103855	1.000	+	.	Sequence "AF129756.105" ; note "SNP in AP000504" ; replace "c"

File: AP000504.diffgff

##gff-version 2.0
##date 2005-07-15
##Type DNA AP000504
AP000504	diffseq	conflict	847	847	1.000	+	.	Sequence "AP000504.1" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	1795	1795	1.000	+	.	Sequence "AP000504.2" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	2273	2273	1.000	+	.	Sequence "AP000504.3" ; note "Insertion of 1 bases in AP000504" ; replace ""
AP000504	diffseq	conflict	2466	2466	1.000	+	.	Sequence "AP000504.4" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	2655	2658	1.000	+	.	Sequence "AP000504.5" ; note "Insertion of 4 bases in AP000504" ; replace ""
AP000504	diffseq	conflict	4951	4953	1.000	+	.	Sequence "AP000504.6" ; note "AF129756" ; replace "tat"
AP000504	diffseq	conflict	6600	6600	1.000	+	.	Sequence "AP000504.7" ; note "Insertion of 1 bases in AP000504" ; replace ""
AP000504	diffseq	conflict	6868	6868	1.000	+	.	Sequence "AP000504.8" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	8218	8221	1.000	+	.	Sequence "AP000504.9" ; note "Insertion of 4 bases in AP000504" ; replace ""
AP000504	diffseq	conflict	9096	9096	1.000	+	.	Sequence "AP000504.10" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	11149	11149	1.000	+	.	Sequence "AP000504.11" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	13718	13718	1.000	+	.	Sequence "AP000504.12" ; note "SNP in AF129756" ; replace "c"
AP000504	diffseq	conflict	14248	14248	1.000	+	.	Sequence "AP000504.13" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	14419	14419	1.000	+	.	Sequence "AP000504.14" ; note "SNP in AF129756" ; replace "c"
AP000504	diffseq	conflict	19643	19643	1.000	+	.	Sequence "AP000504.15" ; note "SNP in AF129756" ; replace "c"
AP000504	diffseq	conflict	20149	20149	1.000	+	.	Sequence "AP000504.16" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	21316	21319	1.000	+	.	Sequence "AP000504.17" ; note "Insertion of 4 bases in AP000504" ; replace ""
AP000504	diffseq	conflict	21797	21797	1.000	+	.	Sequence "AP000504.18" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	23288	23288	1.000	+	.	Sequence "AP000504.19" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	23418	23418	1.000	+	.	Sequence "AP000504.20" ; note "SNP in AF129756" ; replace "c"
AP000504	diffseq	conflict	23589	23589	1.000	+	.	Sequence "AP000504.21" ; note "SNP in AF129756" ; replace "c"
AP000504	diffseq	conflict	23606	23606	1.000	+	.	Sequence "AP000504.22" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	24798	24798	1.000	+	.	Sequence "AP000504.23" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	25309	25309	1.000	+	.	Sequence "AP000504.24" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	25861	25861	1.000	+	.	Sequence "AP000504.25" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	28039	28040	1.000	+	.	Sequence "AP000504.26" ; note "Insertion of 2 bases in AP000504" ; replace ""
AP000504	diffseq	conflict	30644	30644	1.000	+	.	Sequence "AP000504.27" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	32339	32339	1.000	+	.	Sequence "AP000504.28" ; note "SNP in AF129756" ; replace "g"
AP000504	diffseq	conflict	32497	32497	1.000	+	.	Sequence "AP000504.29" ; note "SNP in AF129756" ; replace "c"
AP000504	diffseq	conflict	33074	33074	1.000	+	.	Sequence "AP000504.30" ; note "SNP in AF129756" ; replace "c"
AP000504	diffseq	conflict	33776	33776	1.000	+	.	Sequence "AP000504.31" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	34767	34767	1.000	+	.	Sequence "AP000504.32" ; note "SNP in AF129756" ; replace "c"
AP000504	diffseq	conflict	35163	35163	1.000	+	.	Sequence "AP000504.33" ; note "SNP in AF129756" ; replace "g"
AP000504	diffseq	conflict	36507	36507	1.000	+	.	Sequence "AP000504.34" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	37760	37762	1.000	+	.	Sequence "AP000504.35" ; note "Insertion of 3 bases in AP000504" ; replace ""
AP000504	diffseq	conflict	38680	38683	1.000	+	.	Sequence "AP000504.36" ; note "Insertion of 4 bases in AP000504" ; replace ""
AP000504	diffseq	conflict	42347	42347	1.000	+	.	Sequence "AP000504.37" ; note "SNP in AF129756" ; replace "g"
AP000504	diffseq	conflict	42637	42638	1.000	+	.	Sequence "AP000504.38" ; note "Insertion of 2 bases in AP000504" ; replace ""
AP000504	diffseq	conflict	44602	44602	1.000	+	.	Sequence "AP000504.39" ; note "SNP in AF129756" ; replace "c"
AP000504	diffseq	conflict	44776	44776	1.000	+	.	Sequence "AP000504.40" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	45253	45253	1.000	+	.	Sequence "AP000504.41" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	46354	46354	1.000	+	.	Sequence "AP000504.42" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	46612	46612	1.000	+	.	Sequence "AP000504.43" ; note "SNP in AF129756" ; replace "g"
AP000504	diffseq	conflict	47562	47562	1.000	+	.	Sequence "AP000504.44" ; note "SNP in AF129756" ; replace "g"
AP000504	diffseq	conflict	47587	47587	1.000	+	.	Sequence "AP000504.45" ; note "SNP in AF129756" ; replace "c"
AP000504	diffseq	conflict	48849	48849	1.000	+	.	Sequence "AP000504.46" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	49343	49343	1.000	+	.	Sequence "AP000504.47" ; note "SNP in AF129756" ; replace "a"


  [Part of this file has been deleted for brevity]

AP000504	diffseq	conflict	58685	58685	1.000	+	.	Sequence "AP000504.55" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	60558	60558	1.000	+	.	Sequence "AP000504.56" ; note "SNP in AF129756" ; replace "c"
AP000504	diffseq	conflict	61209	61209	1.000	+	.	Sequence "AP000504.57" ; note "Insertion of 1 bases in AP000504" ; replace ""
AP000504	diffseq	conflict	62958	62959	1.000	+	.	Sequence "AP000504.58" ; note "Insertion of 2 bases in AP000504" ; replace ""
AP000504	diffseq	conflict	63402	63402	1.000	+	.	Sequence "AP000504.59" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	64139	64140	1.000	+	.	Sequence "AP000504.60" ; note "AF129756" ; replace "at"
AP000504	diffseq	conflict	64152	64152	1.000	+	.	Sequence "AP000504.61" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	65317	65317	1.000	+	.	Sequence "AP000504.62" ; note "Insertion of 1 bases in AP000504" ; replace ""
AP000504	diffseq	conflict	67523	67523	1.000	+	.	Sequence "AP000504.63" ; note "SNP in AF129756" ; replace "c"
AP000504	diffseq	conflict	67715	67715	1.000	+	.	Sequence "AP000504.64" ; note "SNP in AF129756" ; replace "c"
AP000504	diffseq	conflict	68554	68554	1.000	+	.	Sequence "AP000504.65" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	69285	69285	1.000	+	.	Sequence "AP000504.66" ; note "Insertion of 1 bases in AP000504" ; replace ""
AP000504	diffseq	conflict	70419	70419	1.000	+	.	Sequence "AP000504.67" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	70666	70666	1.000	+	.	Sequence "AP000504.68" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	71287	71287	1.000	+	.	Sequence "AP000504.69" ; note "SNP in AF129756" ; replace "c"
AP000504	diffseq	conflict	71553	71553	1.000	+	.	Sequence "AP000504.70" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	72048	72048	1.000	+	.	Sequence "AP000504.71" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	73627	73627	1.000	+	.	Sequence "AP000504.72" ; note "SNP in AF129756" ; replace "g"
AP000504	diffseq	conflict	73998	73998	1.000	+	.	Sequence "AP000504.73" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	74071	74071	1.000	+	.	Sequence "AP000504.74" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	75838	75845	1.000	+	.	Sequence "AP000504.75" ; note "AF129756" ; replace "g"
AP000504	diffseq	conflict	76095	76095	1.000	+	.	Sequence "AP000504.76" ; note "SNP in AF129756" ; replace "g"
AP000504	diffseq	conflict	77612	77612	1.000	+	.	Sequence "AP000504.77" ; note "SNP in AF129756" ; replace "c"
AP000504	diffseq	conflict	78253	78253	1.000	+	.	Sequence "AP000504.78" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	80428	80428	1.000	+	.	Sequence "AP000504.79" ; note "SNP in AF129756" ; replace "c"
AP000504	diffseq	conflict	80974	80974	1.000	+	.	Sequence "AP000504.80" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	81564	81564	1.000	+	.	Sequence "AP000504.81" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	81852	81852	1.000	+	.	Sequence "AP000504.82" ; note "SNP in AF129756" ; replace "c"
AP000504	diffseq	conflict	82318	82318	1.000	+	.	Sequence "AP000504.83" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	82594	82594	1.000	+	.	Sequence "AP000504.84" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	82709	82709	1.000	+	.	Sequence "AP000504.85" ; note "SNP in AF129756" ; replace "g"
AP000504	diffseq	conflict	83072	83072	1.000	+	.	Sequence "AP000504.86" ; note "SNP in AF129756" ; replace "g"
AP000504	diffseq	conflict	83692	83692	1.000	+	.	Sequence "AP000504.87" ; note "SNP in AF129756" ; replace "g"
AP000504	diffseq	conflict	86264	86264	1.000	+	.	Sequence "AP000504.88" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	86298	86298	1.000	+	.	Sequence "AP000504.89" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	87932	87932	1.000	+	.	Sequence "AP000504.90" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	88912	88912	1.000	+	.	Sequence "AP000504.91" ; note "SNP in AF129756" ; replace "g"
AP000504	diffseq	conflict	89199	89199	1.000	+	.	Sequence "AP000504.92" ; note "SNP in AF129756" ; replace "g"
AP000504	diffseq	conflict	89762	89764	1.000	+	.	Sequence "AP000504.93" ; note "AF129756" ; replace "ca"
AP000504	diffseq	conflict	90710	90710	1.000	+	.	Sequence "AP000504.94" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	91667	91683	1.000	+	.	Sequence "AP000504.95" ; note "AF129756" ; replace "g"
AP000504	diffseq	conflict	91797	91797	1.000	+	.	Sequence "AP000504.96" ; note "SNP in AF129756" ; replace "c"
AP000504	diffseq	conflict	92165	92165	1.000	+	.	Sequence "AP000504.97" ; note "SNP in AF129756" ; replace "a"
AP000504	diffseq	conflict	93250	93250	1.000	+	.	Sequence "AP000504.98" ; note "SNP in AF129756" ; replace "c"
AP000504	diffseq	conflict	93696	93696	1.000	+	.	Sequence "AP000504.99" ; note "SNP in AF129756" ; replace "g"
AP000504	diffseq	conflict	93860	93860	1.000	+	.	Sequence "AP000504.100" ; note "SNP in AF129756" ; replace "g"
AP000504	diffseq	conflict	95451	95451	1.000	+	.	Sequence "AP000504.101" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	96650	96650	1.000	+	.	Sequence "AP000504.102" ; note "SNP in AF129756" ; replace "t"
AP000504	diffseq	conflict	97273	97274	1.000	+	.	Sequence "AP000504.103" ; note "Insertion of 2 bases in AP000504" ; replace ""
AP000504	diffseq	conflict	97716	97716	1.000	+	.	Sequence "AP000504.104" ; note "SNP in AF129756" ; replace "g"
AP000504	diffseq	conflict	97827	97827	1.000	+	.	Sequence "AP000504.105" ; note "SNP in AF129756" ; replace "t"
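
The diffgff files above are plain GFF 2.0. Each line has nine tab-separated columns, and the details of each difference are in the attribute column as `key "value"` pairs separated by `;`. The following is a minimal Python sketch for reading such a file back in. It assumes only the columns and attribute names visible in this example (`note`, `replace`). `DiffRecord`, `parse_diffgff_line` and `read_diffgff` are hypothetical names and are not part of EMBOSS.

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class DiffRecord:
    seqname: str
    start: int
    end: int
    note: str
    replace: str

# Matches the `key "value"` pairs in the GFF 2.0 attribute (9th) column
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')

def parse_diffgff_line(line: str) -> DiffRecord:
    """Parse one tab-separated GFF 2.0 line as written in the example above."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        raise ValueError(f"expected 9 GFF columns, got {len(fields)}")
    attrs = dict(ATTR_RE.findall(fields[8]))
    return DiffRecord(
        seqname=fields[0],
        start=int(fields[3]),
        end=int(fields[4]),
        note=attrs.get("note", ""),
        replace=attrs.get("replace", ""),  # empty for insertions
    )

def read_diffgff(path: str) -> List[DiffRecord]:
    records = []
    with open(path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue  # skip ##gff-version, ##date, ##Type headers and blank lines
            records.append(parse_diffgff_line(line))
    return records
```

For example, `sum(r.note.startswith("SNP") for r in read_diffgff("AP000504.diffgff"))` would count the records labelled as SNPs.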

The first line is the title giving the names of the sequences used.

The next two non-blank lines state the positions in each sequence where the detected overlap between them starts.

There then follows a set of reports of the mismatches between the sequences.
Each report consists of 4 or more lines.

The first line has the name of the first sequence followed by the start and end positions of the mismatched region in that sequence, followed by the length of the mismatched region. If the mismatched region is of zero length in this sequence, then only the position of the last matching base before the mismatch is given.
If a feature of the first sequence overlaps this mismatch region, one or more lines starting with 'Feature:' come next, giving the type, position and tag fields of the feature.
Next is a line starting 'Sequence:' giving the sequence of the mismatch in the first sequence.

This is followed by the equivalent information for the second sequence in the reverse order: the 'Sequence:' line, any 'Feature:' lines, and then the line giving the position of the mismatch in the second sequence.

At the end of the report are two non-blank lines giving the positions in each sequence where the detected overlap between them ends.

The last three lines of the report give the counts of SNPs. A SNP is defined as a change of one nucleotide to one other nucleotide. Insertions, deletions and multi-base changes are not counted.

If the input sequences are nucleic acid, the counts of transitions (pyrimidine to pyrimidine or purine to purine) and transversions (pyrimidine to purine or purine to pyrimidine) are also given.
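
The SNP rule and the transition/transversion split described above can be sketched as follows. This sketch illustrates the definitions only and is not diffseq's own code. The function names are hypothetical. Skipping ambiguity codes such as N is an assumption, because the text does not say how diffseq treats them.

```python
from typing import Iterable, Optional, Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")

def classify_change(base_a: str, base_b: str) -> Optional[str]:
    """Return 'transition', 'transversion', or None if this is not a countable SNP.

    Only a change of exactly one nucleotide to one different nucleotide counts.
    Insertions, deletions and multi-base changes return None.
    """
    a, b = base_a.upper(), base_b.upper()
    if len(a) != 1 or len(b) != 1 or a == b:
        return None
    if (a in PURINES and b in PURINES) or (a in PYRIMIDINES and b in PYRIMIDINES):
        return "transition"
    if (a in PURINES and b in PYRIMIDINES) or (a in PYRIMIDINES and b in PURINES):
        return "transversion"
    return None  # assumption: ambiguity codes such as N are not classified

def count_snps(pairs: Iterable[Tuple[str, str]]) -> dict:
    """Tally SNPs, transitions and transversions over (first, second) mismatch pairs."""
    counts = {"snps": 0, "transitions": 0, "transversions": 0}
    for a, b in pairs:
        kind = classify_change(a, b)
        if kind is None:
            continue
        counts["snps"] += 1
        counts[kind + "s"] += 1
    return counts
```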

It should be noted that not all features are reported.

The 'source' feature found in all EMBL/GenBank feature table entries is not reported. It covers the whole sequence, so it overlaps every difference found in that sequence and is therefore uninformative and irritating. It has been removed from the output report.

The translation information of CDS features is often extremely long and does not add useful information to the report. It has therefore been removed from the output report.
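
The two filtering rules above can be illustrated with a small sketch. The `Feature` record and the `features_to_report` function are hypothetical. Positions are assumed to be 1-based and inclusive. The sketch keeps the features that overlap a mismatch region, skips `source`, and removes the `translation` tag from CDS features.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List

@dataclass
class Feature:
    type: str
    start: int  # 1-based, inclusive (assumption)
    end: int    # 1-based, inclusive (assumption)
    tags: Dict[str, str] = field(default_factory=dict)

def features_to_report(features: List[Feature], mm_start: int, mm_end: int) -> List[Feature]:
    """Features overlapping [mm_start, mm_end], without 'source' and CDS translations."""
    kept = []
    for f in features:
        if f.type == "source":
            continue  # spans the whole entry, so it would overlap every difference
        if f.end < mm_start or f.start > mm_end:
            continue  # no overlap with the mismatch region
        tags = dict(f.tags)  # copy so the input feature is left unchanged
        if f.type == "CDS":
            tags.pop("translation", None)  # long and uninformative in the report
        kept.append(replace(f, tags=tags))
    return kept
```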

Data files

None

Notes

If you run out of memory, use a larger word size.

Using a larger word size increases the minimum distance that two mismatches must be apart to be reported as separate events. For example, with a word size of 50, two single-base differences that are within 50 bases of each other are reported as one mismatch.
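
The effect of the word size on merging can be illustrated with a simplified model. This model assumes the two sequences align without gaps, so the only anchors are runs of identical bases. A run shorter than the word size cannot contain an exact word match, so the differences on each side of the run are reported as one region. `merge_differences` is a hypothetical name. The real program works from word matches, not from a list of known difference positions.

```python
from typing import Iterable, List, Tuple

def merge_differences(diff_positions: Iterable[int], wordsize: int) -> List[Tuple[int, int]]:
    """Group 1-based difference positions into reported mismatch regions.

    Simplified model: no gaps, so the anchors between differences are runs of
    identical bases. A run shorter than `wordsize` holds no exact word match,
    so the differences on either side of it are merged into one region.
    """
    regions: List[Tuple[int, int]] = []
    for pos in sorted(diff_positions):
        if regions:
            start, end = regions[-1]
            matching_run = pos - end - 1  # identical bases between the two differences
            if matching_run < wordsize:
                regions[-1] = (start, pos)  # too short to anchor: extend current region
                continue
        regions.append((pos, pos))
    return regions
```

With this model, `merge_differences([100, 130, 400], 10)` gives three separate regions. `merge_differences([100, 130, 400], 50)` joins the first two into `(100, 130)`.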

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

Author(s)

Gary Williams (gwilliam © rfcgr.mrc.ac.uk)
MRC Rosalind Franklin Centre for Genomics Research Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK

History

Written 15th Aug 2000 - Gary Williams.

18th Aug 2000 - Added writing out GFF files of the mismatched regions

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None
