prettyplot

 

Function

Displays aligned sequences, with colouring and boxing

Description

prettyplot reads in a set of aligned DNA or protein sequences. It displays them graphically, with conserved regions highlighted in various ways.

Usage

Here is a sample session with prettyplot:


% prettyplot -resbreak=10 -boxcol -consensus -plurality=3 
Displays aligned sequences, with colouring and boxing
Input sequence set: globins.msf
Graph type [x11]: cps

Created prettyplot.ps


Example 2


% prettyplot globins.msf -plurality=3 -docolour 
Displays aligned sequences, with colouring and boxing
Graph type [x11]: cps

Created prettyplot.ps
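
The sessions above can also be scripted. The following Python sketch assembles the argument list for a non-interactive run. build_prettyplot_argv is a hypothetical helper, not part of EMBOSS. It assumes that -auto suppresses the prompts (it is described as "Turn off prompts" under the general qualifiers) and that a boolean qualifier written as -[no]name is switched off by passing -noname.

```python
def build_prettyplot_argv(alignment, graph="cps", **qualifiers):
    """Build an argument list for a non-interactive prettyplot run.

    True becomes -name, False becomes -noname, and any other value
    becomes -name=value, the form used in the sample sessions.
    """
    argv = ["prettyplot", alignment, "-graph", graph, "-auto"]
    for name, value in qualifiers.items():
        if value is True:
            argv.append(f"-{name}")
        elif value is False:
            argv.append(f"-no{name}")  # assumed negation form
        else:
            argv.append(f"-{name}={value}")
    return argv


# Rebuild the qualifiers of the first sample session.
argv = build_prettyplot_argv(
    "globins.msf", resbreak=10, boxcol=True, consensus=True, plurality=3
)
print(" ".join(argv))
```

The helper only builds the list and does not run anything. On a system where EMBOSS is installed, the list could be handed to subprocess.run.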


Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequences]         seqset     File containing a sequence alignment
   -graph              graph      Graph type

   Additional (Optional) qualifiers:
   -matrixfile         matrix     This is the scoring matrix file used when
                                  comparing sequences. By default it is the
                                  file 'EBLOSUM62' (for proteins) or the file
                                  'EDNAFULL' (for nucleic sequences). These
                                  files are found in the 'data' directory of
                                  the EMBOSS installation.
   -residuesperline    integer    The number of residues to be displayed on
                                  each line
   -resbreak           integer    Residues before a space
   -[no]ccolours       boolean    Colour residues by their consensus value.
   -cidentity          string     Colour to display identical residues (RED)
   -csimilarity        string     Colour to display similar residues (GREEN)
   -cother             string     Colour to display other residues (BLACK)
   -docolour           boolean    Colour residues by table oily, amide etc.
   -[no]title          boolean    Display the title
   -shade              string     Set to BPLW for normal shading
                                  so for pair = 1.5,1.0,0.5 and shade = BPLW
                                  Residues score Colour
                                  1.5 or over....... BLACK (B)
                                  1.0 to 1.5 ....... BROWN (P)
                                  0.5 to 1.0 ....... WHEAT (L)
                                  under 0.5 ....... WHITE (W)
                                  The only four letters allowed are BPLW, in
                                  any order.
   -pair               string     Values to represent identical similar
                                  related
   -identity           integer    Only match those which are identical in all
                                  sequences.
   -[no]box            boolean    Display prettyboxes
   -boxcol             boolean    Colour the background in the boxes
   -boxcolval          string     Colour to be used for background. (GREY)
   -[no]name           boolean    Display the sequence names
   -maxnamelen         integer    Margin size for the sequence name.
   -[no]number         boolean    Display the residue number
   -[no]listoptions    boolean    Display the date and options used
   -plurality          float      Plurality check value (totweight/2)
   -consensus          boolean    Display the consensus
   -[no]collision      boolean    Allow collisions in calculating consensus
   -alternative        integer    Use alternative collisions routine
                                  0) Normal collision check. (default)
                                  1) checks identical scores with the max
                                  score found. So if any other residue matches
                                  the identical score then a collision has
                                  occurred.
                                  2) If another residue has a greater than or
                                  equal to matching score and these do not
                                  match then a collision has occurred.
                                  3) Checks all those not in the current
                                  consensus. If any of these give a top score
                                  for matching or identical scores then a
                                  collision has occurred.
   -showscore          integer    Print residue scores
   -portrait           boolean    Set page to Portrait

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-sequences" associated qualifiers
   -sbegin1             integer    Start of each sequence to be used
   -send1               integer    End of each sequence to be used
   -sreverse1           boolean    Reverse (if DNA)
   -sask1               boolean    Ask for begin/end/reverse
   -snucleotide1        boolean    Sequence is nucleotide
   -sprotein1           boolean    Sequence is protein
   -slower1             boolean    Make lower case
   -supper1             boolean    Make upper case
   -sformat1            string     Input sequence format
   -sdbname1            string     Database name
   -sid1                string     Entryname
   -ufo1                string     UFO features
   -fformat1            string     Features format
   -fopenfile1          string     Features file name

   "-graph" associated qualifiers
   -gprompt             boolean    Graph prompting
   -gtitle              string     Graph title
   -gsubtitle           string     Graph subtitle
   -gxtitle             string     Graph x axis title
   -gytitle             string     Graph y axis title
   -goutfile            string     Output file for non interactive displays
   -gdirectory          string     Output directory

   General qualifiers:
   -auto                boolean    Turn off prompts
   -stdout              boolean    Write standard output
   -filter              boolean    Read standard input, write standard output
   -options             boolean    Prompt for standard and additional values
   -debug               boolean    Write debug output to program.dbg
   -verbose             boolean    Report some/full command line options
   -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning             boolean    Report warnings
   -error               boolean    Report errors
   -fatal               boolean    Report fatal errors
   -die                 boolean    Report deaths
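
As an illustration of how -residuesperline and -resbreak interact, the following Python sketch lays out a single sequence as plain text. It is only a model of the described behaviour, not the program's drawing code. layout_residues is a hypothetical name, and the sketch assumes that an unset resbreak equals residues_per_line, so that no breaks appear.

```python
def layout_residues(sequence, residues_per_line=50, resbreak=None):
    """Split a sequence into display lines with a space every resbreak residues."""
    if resbreak is None:
        resbreak = residues_per_line  # assumed default: no breaks within a line
    lines = []
    for start in range(0, len(sequence), residues_per_line):
        chunk = sequence[start:start + residues_per_line]
        # Cut the line into groups of resbreak residues, joined by one space.
        groups = [chunk[i:i + resbreak] for i in range(0, len(chunk), resbreak)]
        lines.append(" ".join(groups))
    return lines


# First 50 columns of HBB_HUMAN from globins.msf, broken every 10 residues.
row = "~~~~~~~~VHLTPEEKSAVTALWGKVN.VDEVGGEALGR.LLVVYPWTQR"
for line in layout_residues(row, residues_per_line=50, resbreak=10):
    print(line)
```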


Standard (Mandatory) qualifiers

[-sequences] (Parameter 1)
    Description:    File containing a sequence alignment
    Allowed values: Readable set of sequences
    Default:        Required

-graph
    Description:    Graph type
    Allowed values: EMBOSS has a list of known devices, including postscript,
                    ps, hpgl, hp7470, hp7580, meta, colourps, cps, xwindows,
                    x11, tektronics, tekt, tek4107t, tek, none, null, text,
                    data, xterm, png, xml
    Default:        EMBOSS_GRAPHICS value, or x11

Additional (Optional) qualifiers

-matrixfile
    Description:    This is the scoring matrix file used when comparing
                    sequences. By default it is the file 'EBLOSUM62' (for
                    proteins) or the file 'EDNAFULL' (for nucleic sequences).
                    These files are found in the 'data' directory of the
                    EMBOSS installation.
    Allowed values: Comparison matrix file in EMBOSS data path
    Default:        EBLOSUM62 for protein, EDNAFULL for DNA

-residuesperline
    Description:    The number of residues to be displayed on each line
    Allowed values: Any integer value
    Default:        50

-resbreak
    Description:    Residues before a space
    Allowed values: Integer 1 or more
    Default:        Same as -residuesperline to give no breaks

-[no]ccolours
    Description:    Colour residues by their consensus value.
    Allowed values: Boolean value Yes/No
    Default:        Yes

-cidentity
    Description:    Colour to display identical residues (RED)
    Allowed values: Any string is accepted
    Default:        RED

-csimilarity
    Description:    Colour to display similar residues (GREEN)
    Allowed values: Any string is accepted
    Default:        GREEN

-cother
    Description:    Colour to display other residues (BLACK)
    Allowed values: Any string is accepted
    Default:        BLACK

-docolour
    Description:    Colour residues by table oily, amide etc.
    Allowed values: Boolean value Yes/No
    Default:        No

-[no]title
    Description:    Display the title
    Allowed values: Boolean value Yes/No
    Default:        Yes

-shade
    Description:    Set to BPLW for normal shading, so for
                    pair = 1.5,1.0,0.5 and shade = BPLW:
                      Residues score       Colour
                      1.5 or over ........ BLACK (B)
                      1.0 to 1.5 ......... BROWN (P)
                      0.5 to 1.0 ......... WHEAT (L)
                      under 0.5 .......... WHITE (W)
                    The only four letters allowed are BPLW, in any order.
    Allowed values: A string up to 4 characters, matching regular expression
                    /^([BPLW]{4})?$/
    Default:        An empty string is accepted

-pair
    Description:    Values to represent identical similar related
    Allowed values: Any string is accepted
    Default:        1.5,1.0,0.5

-identity
    Description:    Only match those which are identical in all sequences.
    Allowed values: Integer 0 or more
    Default:        0

-[no]box
    Description:    Display prettyboxes
    Allowed values: Boolean value Yes/No
    Default:        Yes

-boxcol
    Description:    Colour the background in the boxes
    Allowed values: Boolean value Yes/No
    Default:        No

-boxcolval
    Description:    Colour to be used for background. (GREY)
    Allowed values: Any string is accepted
    Default:        GREY

-[no]name
    Description:    Display the sequence names
    Allowed values: Boolean value Yes/No
    Default:        Yes

-maxnamelen
    Description:    Margin size for the sequence name.
    Allowed values: Any integer value
    Default:        10

-[no]number
    Description:    Display the residue number
    Allowed values: Boolean value Yes/No
    Default:        Yes

-[no]listoptions
    Description:    Display the date and options used
    Allowed values: Boolean value Yes/No
    Default:        Yes

-plurality
    Description:    Plurality check value (totweight/2)
    Allowed values: Any numeric value
    Default:        Half the total sequence weighting

-consensus
    Description:    Display the consensus
    Allowed values: Boolean value Yes/No
    Default:        No

-[no]collision
    Description:    Allow collisions in calculating consensus
    Allowed values: Boolean value Yes/No
    Default:        Yes

-alternative
    Description:    Use alternative collisions routine
                    0) Normal collision check. (default)
                    1) checks identical scores with the max score found. So
                       if any other residue matches the identical score then
                       a collision has occurred.
                    2) If another residue has a greater than or equal to
                       matching score and these do not match then a
                       collision has occurred.
                    3) Checks all those not in the current consensus. If any
                       of these give a top score for matching or identical
                       scores then a collision has occurred.
    Allowed values: Integer from 0 to 3
    Default:        0

-showscore
    Description:    Print residue scores
    Allowed values: Any integer value
    Default:        -1

-portrait
    Description:    Set page to Portrait
    Allowed values: Boolean value Yes/No
    Default:        No

Advanced (Unprompted) qualifiers

(none)
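
The relationship between -pair and -shade can be sketched in Python as follows. This is an illustrative reading of the score bands given in the -shade description, not the program's own code. shade_letter is a hypothetical name, and treating each band's lower bound as inclusive (a score of exactly 1.0 falls in the second band) is an assumption.

```python
import re

# Pattern quoted in the allowed values for -shade.
SHADE_PATTERN = re.compile(r"^([BPLW]{4})?$")

# Colour names as listed in the -shade description.
SHADE_COLOURS = {"B": "BLACK", "P": "BROWN", "L": "WHEAT", "W": "WHITE"}


def shade_letter(score, pair=(1.5, 1.0, 0.5), shade="BPLW"):
    """Return the shade letter for a residue score.

    The three pair values are the lower bounds of the first three bands;
    scores below the last bound get the fourth letter.
    """
    if not shade or not SHADE_PATTERN.match(shade):
        raise ValueError("shade must be four letters drawn from BPLW")
    for threshold, letter in zip(pair, shade):
        if score >= threshold:  # assumed: lower bound is inclusive
            return letter
    return shade[3]


for score in (1.7, 1.2, 0.7, 0.1):
    letter = shade_letter(score)
    print(score, letter, SHADE_COLOURS[letter])
```

Because the letters may be given in any order, passing shade="WLPB" in this sketch would reverse the scheme, so the highest-scoring band is drawn in white.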

Input file format

prettyplot reads a set of aligned sequences specified by any valid Uniform Sequence Address (USA).

Input files for usage example

File: globins.msf

!!AA_MULTIPLE_ALIGNMENT 1.0

  ../data/globins.msf MSF:  164 Type: P 25/06/01 CompCheck: 4278 ..

  Name: HBB_HUMAN Len: 164  Check: 6914 Weight: 0.14
  Name: HBB_HORSE Len: 164  Check: 6007 Weight: 0.15
  Name: HBA_HUMAN Len: 164  Check: 3921 Weight: 0.15
  Name: HBA_HORSE Len: 164  Check: 4770 Weight: 0.19
  Name: MYG_PHYCA Len: 164  Check: 7930 Weight: 0.23
  Name: GLB5_PETMA Len: 164  Check: 1857 Weight: 0.21
  Name: LGB2_LUPLU Len: 164  Check: 2879 Weight: 0.10

//

           1                                               50
HBB_HUMAN  ~~~~~~~~VHLTPEEKSAVTALWGKVN.VDEVGGEALGR.LLVVYPWTQR
HBB_HORSE  ~~~~~~~~VQLSGEEKAAVLALWDKVN.EEEVGGEALGR.LLVVYPWTQR
HBA_HUMAN  ~~~~~~~~~~~~~~VLSPADKTNVKAA.WGKVGAHAGEYGAEALERMFLS
HBA_HORSE  ~~~~~~~~~~~~~~VLSAADKTNVKAA.WSKVGGHAGEYGAEALERMFLG
MYG_PHYCA  ~~~~~~~VLSEGEWQLVLHVWAKVEAD.VAGHGQDILIR.LFKSHPETLE
GLB5_PETMA PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQE
LGB2_LUPLU ~~~~~~~~GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKD

           51                                             100
HBB_HUMAN  FFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSE
HBB_HORSE  FFDSFGDLSNPGAVMGNPKVKAHGKKVLHSFGEGVHHLDNLKGTFAALSE
HBA_HUMAN  FPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSD
HBA_HORSE  FPTTKTYFPHFDLSHGSAQVKAHGKKVGDALTLAVGHLDDLPGALSNLSD
MYG_PHYCA  KFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQ
GLB5_PETMA FFPKFKGLTTADQLKKSADVRWHAERIINAVNDAVASMDDTEKMSMKLRD
LGB2_LUPLU LFSFLKGTSEVPQNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKN

           101                                            150
HBB_HUMAN  LHCDKLH..VDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVA
HBB_HORSE  LHCDKLH..VDPENFRLLGNVLVVVLARHFGKDFTPELQASYQKVVAGVA
HBA_HUMAN  LHAHKLR..VDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVS
HBA_HORSE  LHAHKLR..VDPVNFKLLSHCLLSTLAVHLPNDFTPAVHASLDKFLSSVS
MYG_PHYCA  SHATKHK..IPIKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFR
GLB5_PETMA LSGKHAK..SFQVDPQYFKVLAAVIADTVAAGDAGFEKLMSMICILLRSA
LGB2_LUPLU LGSVHVSKGVADAHFPVVKEAILKTIKEVVGAKWSEELNSAWTIAYDELA

           151        164
HBB_HUMAN  NALAHKYH~~~~~~
HBB_HORSE  NALAHKYH~~~~~~
HBA_HUMAN  TVLTSKYR~~~~~~
HBA_HORSE  TVLTSKYR~~~~~~
MYG_PHYCA  KDIAAKYKELGYQG
GLB5_PETMA Y~~~~~~~~~~~~~
LGB2_LUPLU IVIKKEMNDAA~~~
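
The default for -plurality is described as half the total sequence weighting. As a rough worked example, the Python sketch below sums the Weight fields from the header of globins.msf and halves the result. Whether prettyplot takes its total weight from exactly these fields is an assumption here, and msf_weights is a hypothetical helper.

```python
import re

# Name lines copied from the header of globins.msf shown above.
MSF_HEADER = """\
  Name: HBB_HUMAN Len: 164  Check: 6914 Weight: 0.14
  Name: HBB_HORSE Len: 164  Check: 6007 Weight: 0.15
  Name: HBA_HUMAN Len: 164  Check: 3921 Weight: 0.15
  Name: HBA_HORSE Len: 164  Check: 4770 Weight: 0.19
  Name: MYG_PHYCA Len: 164  Check: 7930 Weight: 0.23
  Name: GLB5_PETMA Len: 164  Check: 1857 Weight: 0.21
  Name: LGB2_LUPLU Len: 164  Check: 2879 Weight: 0.10
"""


def msf_weights(text):
    """Map each sequence name to the Weight value on its Name line."""
    weights = {}
    for line in text.splitlines():
        match = re.match(r"\s*Name:\s*(\S+).*?Weight:\s*([0-9.]+)", line)
        if match:
            weights[match.group(1)] = float(match.group(2))
    return weights


weights = msf_weights(MSF_HEADER)
total = sum(weights.values())
default_plurality = total / 2  # "totweight/2" in the qualifier description
print(f"total weight {total:.2f}, default plurality {default_plurality:.3f}")
```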

Output file format

An image of the alignment is displayed on the specified graphics device.

Output files for usage example

Graphics File: prettyplot.ps

[prettyplot results]

Output files for usage example 2

Graphics File: prettyplot.ps

[prettyplot results]

Data files

Prettyplot uses a comparison matrix file to calculate similarity to the consensus.

For protein sequences, EBLOSUM62 is used as the substitution matrix. For nucleotide sequences, EDNAFULL is used.

EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.

To see the available EMBOSS data files, run:

% embossdata -showall

To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:


% embossdata -fetch -file Exxx.dat

Users can provide their own data files in their own directories. Project specific files can be put in the current directory, or for tidier directory listings in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory, or again in a subdirectory called ".embossdata".

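The directories are searched in the following order:

    . (your current directory)
    .embossdata (under your current directory)
    ~/ (your home directory)
    ~/.embossdata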
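
The lookup of user-supplied data files can be sketched in Python. The sketch assumes the locations are tried in the order in which the text above introduces them: the current directory, its ".embossdata" subdirectory, the home directory, then its ".embossdata" subdirectory. find_data_file is a hypothetical helper. The fallback to the standard EMBOSS data directory is not modelled.

```python
from pathlib import Path


def find_data_file(name, cwd, home):
    """Return the first existing user copy of a data file, or None."""
    candidates = [
        Path(cwd) / name,
        Path(cwd) / ".embossdata" / name,
        Path(home) / name,
        Path(home) / ".embossdata" / name,
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None  # no user copy found in any of the four locations


# Look for a user-supplied copy of the default protein matrix.
print(find_data_file("EBLOSUM62", Path.cwd(), Path.home()))
```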

Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It exits with status 0 unless an error is reported.

Known bugs

Portrait mode does not cover the whole page! This is a "feature" in plplot.

See also

Program name  Description
abiview       Reads ABI file and display the trace
cirdna        Draws circular maps of DNA constructs
emma          Multiple alignment program - interface to ClustalW program
infoalign     Information on a multiple sequence alignment
lindna        Draws linear maps of DNA constructs
pepnet        Displays proteins as a helical net
pepwheel      Shows protein sequences as helices
plotcon       Plot quality of conservation of a sequence alignment
prettyseq     Output sequence with translated ranges
remap         Display sequence with restriction sites, translation etc
seealso       Finds programs sharing group names
showalign     Displays a multiple sequence alignment
showdb        Displays information on the currently available databases
showfeat      Show features of a sequence
showseq       Display a sequence with features, translation etc
sixpack       Display a DNA sequence with 6-frame translation and ORFs
textsearch    Search sequence documentation. Slow, use SRS and Entrez!
tranalign     Align nucleic coding regions given the aligned proteins

Author(s)

Ian Longden (il © sanger.ac.uk)
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

Many features were first implemented in the EGCG program "prettyplot" by Peter Rice (pmr © ebi.ac.uk)
Informatics Division, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

The original suggestions for the PrettyPlot program were from Denis Duboule and Sigfried Labeit at EMBL. Gert Vriend added the star marking. Rita Grandori suggested the -NOCOLLISION option.

History

Completed 5th May 1999.

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None