polydot

 

Function

Displays all-against-all dotplots of a set of sequences

Description

A dotplot is a graphical representation of the regions of similarity between two sequences.

The two sequences are placed on the axes of a rectangular image and (subject to threshold conditions) wherever there is a similarity between the sequences a dot is placed on the image.

Where the two sequences have substantial regions of similarity, many dots align to form diagonal lines. It is therefore possible to see at a glance where there are local regions of similarity.

polydot compares every pair of sequences in a set of sequences. For each pair it draws a dotplot, marking each position where a word (tuple) of the specified length has an exact match in both sequences. Optionally, it also reports all identical matches to feature files.
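
The word-matching idea described above can be illustrated with a short Python sketch. This is not polydot's own code: the function names `word_matches` and `all_against_all` are hypothetical, and the choice to compare each unordered pair once is an assumption made for illustration only.

```python
from itertools import combinations


def word_matches(seq_a, seq_b, wordsize=6):
    """Return (i, j) start positions where a word of length `wordsize`
    is identical in seq_a (at i) and seq_b (at j)."""
    if wordsize < 2:
        raise ValueError("wordsize must be 2 or more")
    # Index every word of seq_b by the positions where it starts.
    index = {}
    for j in range(len(seq_b) - wordsize + 1):
        index.setdefault(seq_b[j:j + wordsize], []).append(j)
    # Look up each word of seq_a in that index; each hit is one dot.
    hits = []
    for i in range(len(seq_a) - wordsize + 1):
        for j in index.get(seq_a[i:i + wordsize], []):
            hits.append((i, j))
    return hits


def all_against_all(seqs, wordsize=6):
    """Map each unordered pair of sequence names to its word matches.
    (Assumption: one comparison per unordered pair.)"""
    return {
        (name_a, name_b): word_matches(seqs[name_a], seqs[name_b], wordsize)
        for name_a, name_b in combinations(seqs, 2)
    }


# Consecutive matches share a diagonal (i - j is constant), which is
# why similar regions appear as diagonal lines in a dotplot.
print(word_matches("ABCDEFGH", "XXCDEFGHYY", wordsize=3))
# [(2, 2), (3, 3), (4, 4), (5, 5)]
```

A smaller word size produces more dots (and more chance matches); a larger one keeps only longer exact matches.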

Usage

Here is a sample session with polydot:


% polydot globins.fasta -gtitle="Polydot of globins.fasta" -graph cps 
Displays all-against-all dotplots of a set of sequences
Word size [6]: 

Created polydot.ps


Command line arguments

   Standard (Mandatory) qualifiers (* if not always prompted):
  [-sequences]         seqset     File containing a sequence alignment
   -wordsize           integer    Word size
*  -outfeat            featout    Output features UFO
   -graph              graph      Graph type

   Additional (Optional) qualifiers:
   -[no]boxit          boolean    Draw a box around each dotplot
   -dumpfeat           toggle     Dump all matches as feature files

   Advanced (Unprompted) qualifiers:
   -gap                integer    This specifies the size of the gap that is
                                  used to separate the individual dotplots in
                                  the display. The size is measured in
                                  residues, as displayed in the output.

   Associated qualifiers:

   "-sequences" associated qualifiers
   -sbegin1             integer    Start of each sequence to be used
   -send1               integer    End of each sequence to be used
   -sreverse1           boolean    Reverse (if DNA)
   -sask1               boolean    Ask for begin/end/reverse
   -snucleotide1        boolean    Sequence is nucleotide
   -sprotein1           boolean    Sequence is protein
   -slower1             boolean    Make lower case
   -supper1             boolean    Make upper case
   -sformat1            string     Input sequence format
   -sdbname1            string     Database name
   -sid1                string     Entryname
   -ufo1                string     UFO features
   -fformat1            string     Features format
   -fopenfile1          string     Features file name

   "-outfeat" associated qualifiers
   -offormat            string     Output feature format
   -ofopenfile          string     Features file name
   -ofextension         string     File name extension
   -ofdirectory         string     Output directory
   -ofname              string     Base file name
   -ofsingle            boolean    Separate file for each entry

   "-graph" associated qualifiers
   -gprompt             boolean    Graph prompting
   -gtitle              string     Graph title
   -gsubtitle           string     Graph subtitle
   -gxtitle             string     Graph x axis title
   -gytitle             string     Graph y axis title
   -goutfile            string     Output file for non interactive displays
   -gdirectory          string     Output directory

   General qualifiers:
   -auto                boolean    Turn off prompts
   -stdout              boolean    Write standard output
   -filter              boolean    Read standard input, write standard output
   -options             boolean    Prompt for standard and additional values
   -debug               boolean    Write debug output to program.dbg
   -verbose             boolean    Report some/full command line options
   -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning             boolean    Report warnings
   -error               boolean    Report errors
   -fatal               boolean    Report fatal errors
   -die                 boolean    Report deaths
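
For scripted use, the qualifiers listed above can be assembled into an argument list. The following Python sketch only builds the command; the helper name `polydot_command` is hypothetical, and it assumes that polydot is installed and on the PATH if the command is actually run. It uses only qualifiers documented above, in the same form as the sample session.

```python
import shlex


def polydot_command(seqfile, wordsize=6, graph="cps", gtitle=None, boxit=True):
    """Build an argument list for a non-interactive polydot run."""
    if wordsize < 2:
        raise ValueError("wordsize must be 2 or more")
    args = ["polydot", seqfile, "-wordsize", str(wordsize), "-graph", graph]
    if gtitle is not None:
        args.append(f"-gtitle={gtitle}")
    if not boxit:
        args.append("-noboxit")  # -[no]boxit defaults to drawing the box
    args.append("-auto")         # turn off prompts
    return args


cmd = polydot_command("globins.fasta", gtitle="Polydot of globins.fasta")
print(shlex.join(cmd))  # shell-quoted form, for logging or copy and paste

# To run it (requires an EMBOSS installation):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with titles that contain spaces.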


Standard (Mandatory) qualifiers:

   [-sequences] (Parameter 1)
      Description:     File containing a sequence alignment
      Allowed values:  Readable set of sequences
      Default:         Required

   -wordsize
      Description:     Word size
      Allowed values:  Integer 2 or more
      Default:         6

   -outfeat
      Description:     Output features UFO
      Allowed values:  Writeable feature table
      Default:         unknown.gff

   -graph
      Description:     Graph type
      Allowed values:  EMBOSS has a list of known devices, including
                       postscript, ps, hpgl, hp7470, hp7580, meta,
                       colourps, cps, xwindows, x11, tektronics, tekt,
                       tek4107t, tek, none, null, text, data, xterm,
                       png, xml
      Default:         EMBOSS_GRAPHICS value, or x11

Additional (Optional) qualifiers:

   -[no]boxit
      Description:     Draw a box around each dotplot
      Allowed values:  Boolean value Yes/No
      Default:         Yes

   -dumpfeat
      Description:     Dump all matches as feature files
      Allowed values:  Toggle value Yes/No
      Default:         No

Advanced (Unprompted) qualifiers:

   -gap
      Description:     This specifies the size of the gap that is used
                       to separate the individual dotplots in the
                       display. The size is measured in residues, as
                       displayed in the output.
      Allowed values:  Integer 0 or more
      Default:         10
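
To make the effect of -gap concrete, here is a minimal Python sketch of one way the individual dotplots could be positioned along an axis. The layout rule (each sequence starts after the previous one plus the gap) is an assumption for illustration, not taken from the polydot source, and the function name `axis_offsets` and the sequence lengths are hypothetical.

```python
def axis_offsets(lengths, gap=10):
    """Return the start coordinate (in residues) of each sequence along
    one axis, assuming sequences are laid end to end with `gap`
    residues of empty space between neighbours."""
    if gap < 0:
        raise ValueError("gap must be 0 or more")
    offsets = []
    position = 0
    for length in lengths:
        offsets.append(position)
        position += length + gap  # leave the gap before the next plot
    return offsets


# Three hypothetical sequences of 100, 80 and 120 residues.
print(axis_offsets([100, 80, 120], gap=10))  # [0, 110, 200]
```

Under this assumed layout, a gap of 0 would make adjacent dotplots touch, and a larger gap spreads them apart at the cost of less drawing space per plot.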

Input file format

polydot reads in a set of nucleic or protein sequences.

The sequences may or may not be aligned.

Input files for usage example

File: globins.fasta

>HBB_HUMAN Sw:Hbb_Human => HBB_HUMAN
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKV
KAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGK
EFTPPVQAAYQKVVAGVANALAHKYH
>HBB_HORSE Sw:Hbb_Horse => HBB_HORSE
VQLSGEEKAAVLALWDKVNEEEVGGEALGRLLVVYPWTQRFFDSFGDLSNPGAVMGNPKV
KAHGKKVLHSFGEGVHHLDNLKGTFAALSELHCDKLHVDPENFRLLGNVLVVVLARHFGK
DFTPELQASYQKVVAGVANALAHKYH
>HBA_HUMAN Sw:Hba_Human => HBA_HUMAN
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK
KVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPA
VHASLDKFLASVSTVLTSKYR
>HBA_HORSE Sw:Hba_Horse => HBA_HORSE
VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHFDLSHGSAQVKAHGK
KVGDALTLAVGHLDDLPGALSNLSDLHAHKLRVDPVNFKLLSHCLLSTLAVHLPNDFTPA
VHASLDKFLSSVSTVLTSKYR
>MYG_PHYCA Sw:Myg_Phyca => MYG_PHYCA
VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASED
LKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHP
GDFGADAQGAMNKALELFRKDIAAKYKELGYQG
>GLB5_PETMA Sw:Glb5_Petma => GLB5_PETMA
PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTT
ADQLKKSADVRWHAERIINAVNDAVASMDDTEKMSMKLRDLSGKHAKSFQVDPQYFKVLA
AVIADTVAAGDAGFEKLMSMICILLRSAY
>LGB2_LUPLU Sw:Lgb2_Luplu => LGB2_LUPLU
GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGTSEVPQNNPEL
QAHAGKVFKLVYEAAIQLQVTGVVVTDATLKNLGSVHVSKGVADAHFPVVKEAILKTIKE
VVGAKWSEELNSAWTIAYDELAIVIKKEMNDAA
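
For readers who want to inspect such a file outside EMBOSS, a FASTA-style file like the one above can be read with a few lines of Python. This is a simplified sketch, not EMBOSS's own reader: the function name `parse_fasta` is hypothetical, and it handles only plain FASTA input, using the first word after `>` as the sequence name.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the name is the first word after '>'.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            # Sequence line: belongs to the most recent header.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


example = ">SEQ1 first record\nABCD\nEFGH\n>SEQ2\nIJKL\n"
print(parse_fasta(example))  # {'SEQ1': 'ABCDEFGH', 'SEQ2': 'IJKL'}
```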

Output file format

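A graphical image is drawn on the graphics device specified with -graph. For file-based devices such as cps, the image is written to a file (polydot.ps in the usage example). If -dumpfeat is set, all matches are also written to feature files.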

Output files for usage example

Graphics File: polydot.ps

[polydot results]

Data files

None.

Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

0 if successful.

Known bugs

None.

See also

Program name   Description
dotmatcher     Displays a thresholded dotplot of two sequences
dotpath        Non-overlapping wordmatch dotplot of two sequences
dottup         Displays a wordmatch dotplot of two sequences

Author(s)

Ian Longden (il © sanger.ac.uk)
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

History

Completed 2nd June 1999.

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None.