EMBOSS: geecee manual

geecee

Function

Calculates fractional GC content of nucleic acid sequences

Description

This calculates the fraction of G+C bases of the input nucleic acid sequence(s).

It reads in nucleic acid sequences, sums the number of 'G' and 'C' bases and writes out the result as the fraction (in the interval 0.0 to 1.0) of the length of the whole sequence.

Usage

Here is a sample session with geecee


% geecee tembl:hhtetra 
Calculates fractional GC content of nucleic acid sequences
Output file [hhtetra.geecee]:

Go to the input files for this example
Go to the output files for this example

Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Sequence database USA
  [-outfile]           outfile    Output file name

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1             integer    Start of each sequence to be used
   -send1               integer    End of each sequence to be used
   -sreverse1           boolean    Reverse (if DNA)
   -sask1               boolean    Ask for begin/end/reverse
   -snucleotide1        boolean    Sequence is nucleotide
   -sprotein1           boolean    Sequence is protein
   -slower1             boolean    Make lower case
   -supper1             boolean    Make upper case
   -sformat1            string     Input sequence format
   -sdbname1            string     Database name
   -sid1                string     Entryname
   -ufo1                string     UFO features
   -fformat1            string     Features format
   -fopenfile1          string     Features file name

   "-outfile" associated qualifiers
   -odirectory2         string     Output directory

   General qualifiers:
   -auto                boolean    Turn off prompts
   -stdout              boolean    Write standard output
   -filter              boolean    Read standard input, write standard output
   -options             boolean    Prompt for standard and additional values
   -debug               boolean    Write debug output to program.dbg
   -verbose             boolean    Report some/full command line options
   -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning             boolean    Report warnings
   -error               boolean    Report errors
   -fatal               boolean    Report fatal errors
   -die                 boolean    Report deaths

Standard (Mandatory) qualifiers		Allowed values	Default
[-sequence] (Parameter 1)	Sequence database USA	Readable sequence(s)	Required
[-outfile] (Parameter 2)	Output file name	Output file	<sequence>.geecee
Additional (Optional) qualifiers		Allowed values	Default
(none)
Advanced (Unprompted) qualifiers		Allowed values	Default
(none)

Input file format

geecee reads any nucleic acid sequence USAs.

Input files for usage example

'tembl:hhtetra' is a sequence entry in the example nucleic acid database 'tembl'

Database entry: tembl:hhtetra

ID   HHTETRA    standard; DNA; VRL; 1272 BP.
XX
AC   L46634; L46689;
XX
SV   L46634.1
XX
DT   06-NOV-1995 (Rel. 45, Created)
DT   04-MAR-2000 (Rel. 63, Last updated, Version 3)
XX
DE   Human herpesvirus 7 (clone ED132'1.2) telomeric repeat region.
XX
KW   telomeric repeat.
XX
OS   Human herpesvirus 7
OC   Viruses; dsDNA viruses, no RNA stage; Herpesviridae; Betaherpesvirinae.
XX
RN   [1]
RP   1-1272
RX   MEDLINE; 96079055.
RA   Secchiero P., Nicholas J., Deng H., Xiaopeng T., van Loon N., Ruvolo V.R.,
RA   Berneman Z.N., Reitz M.S. Jr., Dewhurst S.;
RT   "Identification of human telomeric repeat motifs at the genome termini of
RT   human herpesvirus 7: structural analysis and heterogeneity";
RL   J. Virol. 69(12):8041-8045(1995).
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..1272
FT                   /db_xref="taxon:10372"
FT                   /organism="Human herpesvirus 7"
FT                   /strain="JI"
FT                   /clone="ED132'1.2"
FT   repeat_region   207..928
FT                   /note="long and complex repeat region composed of various
FT                   direct repeats, including TAACCC (TRS), degenerate copies
FT                   of TRS motifs and a 14-bp repeat, TAGGGCTGCGGCCC"
FT   misc_signal     938..998
FT                   /note="pac2 motif"
FT   misc_feature    1009
FT                   /note="right genome terminus (...ACA)"
XX
SQ   Sequence 1272 BP; 346 A; 455 C; 222 G; 249 T; 0 other;
     aagcttaaac tgaggtcaca cacgacttta attacggcaa cgcaacagct gtaagctgca        60
     ggaaagatac gatcgtaagc aaatgtagtc ctacaatcaa gcgaggttgt agacgttacc       120
     tacaatgaac tacacctcta agcataacct gtcgggcaca gtgagacacg cagccgtaaa       180
     ttcaaaactc aacccaaacc gaagtctaag tctcacccta atcgtaacag taaccctaca       240
     actctaatcc tagtccgtaa ccgtaacccc aatcctagcc cttagcccta accctagccc       300
     taaccctagc tctaacctta gctctaactc tgaccctagg cctaacccta agcctaaccc       360
     taaccgtagc tctaagttta accctaaccc taaccctaac catgaccctg accctaaccc       420
     tagggctgcg gccctaaccc tagccctaac cctaacccta atcctaatcc tagccctaac       480
     cctagggctg cggccctaac cctagcccta accctaaccc taaccctagg gctgcggccc       540
     taaccctaac cctagggctg cggcccgaac cctaacccta accctaaccc taaccctagg       600
     gctgcggccc taaccctaac cctagggctg cggccctaac cctaacccta gggctgcggc       660
     ccgaacccta accctaaccc taaccctagg gctgcggccc taaccctaac cctagggctg       720
     cggccctaac cctaacccta actctagggc tgcggcccta accctaaccc taaccctaac       780
     cctagggctg cggcccgaac cctagcccta accctaaccc tgaccctgac cctaacccta       840
     accctaaccc taaccctaac cctaacccta accctaaccc taaccctaac cctaacccta       900
     accctaaccc taaccctaac cctaaccccg cccccactgg cagccaatgt cttgtaatgc       960
     cttcaaggca ctttttctgc gagccgcgcg cagcactcag tgaaaaacaa gtttgtgcac      1020
     gagaaagacg ctgccaaacc gcagctgcag catgaaggct gagtgcacaa ttttggcttt      1080
     agtcccataa aggcgcggct tcccgtagag tagaaaaccg cagcgcggcg cacagagcga      1140
     aggcagcggc tttcagactg tttgccaagc gcagtctgca tcttaccaat gatgatcgca      1200
     agcaagaaaa atgttctttc ttagcatatg cgtggttaat cctgttgtgg tcatcactaa      1260
     gttttcaagc tt                                                          1272
//

Output file format

Output files for usage example

File: hhtetra.geecee

#Sequence   GC content
HHTETRA       0.53

The first non-blank line is the title line. Subsequent lines consist of two columns of data.

The first column is the name of the sequence.
The second column is the percentage G+C content of the sequence.

Data files

None.

Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

0 on successful completion.

Known bugs

None.

Program name	Description
cpgplot	Plot CpG rich areas
cpgreport	Reports all CpG rich regions
newcpgreport	Report CpG rich areas
newcpgseek	Reports CpG rich regions

Author(s)

Richard Bruskiewich (r.bruskiewich@cgiar.org)
while he was at:
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

History

Completed 18th June 1999.

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None /BODY>

Function

Description

Usage

Command line arguments

Input file format

Input files for usage example

Database entry: tembl:hhtetra

Output file format

Output files for usage example

File: hhtetra.geecee

Data files

Notes

References

Warnings

Diagnostic Error Messages

Exit status

Known bugs

See also

Author(s)

History

Target users

Comments