GATA: a graphic alignment tool for comparative sequence analysis


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2005-12

AUTHORS

David A Nix, Michael B Eisen

ABSTRACT

BACKGROUND: Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dot plot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.

RESULTS: To address some of these issues, we created a stand alone, platform independent, graphic alignment tool for comparative sequence analysis (GATA http://gata.sourceforge.net/). GATA uses the NCBI-BLASTN program and extensive post-processing to identify all small sub-alignments above a low cut-off score. These are graphed as two shaded boxes, one for each sequence, connected by a line using the coordinate system of their parent sequence. Shading and colour are used to indicate score and orientation. A variety of options exist for querying, modifying and retrieving conserved sequence elements. Extensive gene annotation can be added to both sequences using a standardized General Feature Format (GFF) file.

CONCLUSIONS: GATA uses the NCBI-BLASTN program in conjunction with post-processing to exhaustively align two DNA sequences. It provides researchers with a fine-grained alignment and visualization tool aptly suited for non-coding, 0-200 kb, pairwise, sequence analysis. It functions independent of sequence feature ordering or orientation, and readily visualizes both large and small sequence inversions, duplications, and segment shuffling. Since the alignment is visual and does not contain gaps, gene annotation can be added to both sequences to create a thoroughly descriptive picture of DNA conservation that is well suited for comparative sequence analysis.
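
The representation described in the abstract (ungapped sub-alignments kept above a cut-off score, each drawn as two boxes whose shading reflects score and whose colour reflects orientation) can be pictured with a minimal sketch. Nothing below is taken from GATA itself: the `SubAlignment` class, its field names, the cut-off value, and the linear score-to-shade mapping are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubAlignment:
    """One ungapped local match between two sequences (hypothetical model)."""
    start_a: int        # start coordinate on sequence A
    end_a: int          # end coordinate on sequence A
    start_b: int        # start coordinate on sequence B
    end_b: int          # end coordinate on sequence B
    score: float        # alignment score reported for this match
    same_strand: bool   # False means the match is inverted on sequence B


def above_cutoff(hits, cutoff):
    """Keep only sub-alignments whose score reaches the cut-off."""
    return [h for h in hits if h.score >= cutoff]


def shade(score, lo, hi):
    """Map a score linearly onto 0.0 (lightest) to 1.0 (darkest), clamped."""
    if hi <= lo:
        return 1.0
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


def describe(hit, lo, hi):
    """Return the two boxes plus the drawing attributes for one hit."""
    return {
        "box_a": (hit.start_a, hit.end_a),
        "box_b": (hit.start_b, hit.end_b),
        "shade": shade(hit.score, lo, hi),
        "orientation": "forward" if hit.same_strand else "inverted",
    }


# Made-up coordinates and scores, purely for demonstration.
hits = [
    SubAlignment(100, 130, 4200, 4230, 42.0, True),
    SubAlignment(500, 520, 900, 920, 18.0, False),
    SubAlignment(700, 760, 300, 360, 90.0, False),
]
kept = above_cutoff(hits, cutoff=20.0)
drawn = [describe(h, lo=20.0, hi=90.0) for h in kept]
```

Because each box keeps the coordinates of its own parent sequence, the second and third hits can sit in a different order on sequence B than on sequence A, which is the non-collinear case the abstract says dynamic programming aligners handle poorly.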

PAGES

9

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/1471-2105-6-9

DOI

http://dx.doi.org/10.1186/1471-2105-6-9

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1009316807

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/15655071



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Algorithms", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Amino Acid Sequence", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Animals", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Base Sequence", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Binding Sites", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Chromosome Mapping", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Computational Biology", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Computer Graphics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Conserved Sequence", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "DNA", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Database Management Systems", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Databases, Genetic", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Databases, Nucleic Acid", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Databases, Protein", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Drosophila", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Gene Deletion", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Gene Duplication", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genome", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genomics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Molecular Sequence Data", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Mutation", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Programming Languages", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Proteins", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Repetitive Sequences, Nucleic Acid", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Alignment", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Analysis", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Analysis, DNA", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Lawrence Berkeley National Laboratory", 
          "id": "https://www.grid.ac/institutes/grid.184769.5", 
          "name": [
            "Department of Molecular and Cell Biology, University of California, 94720, Berkeley, CA, USA", 
            "Department of Genome Science, Life Science Division, Lawrence Berkeley National Laboratory, 94720, Berkeley, CA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Nix", 
        "givenName": "David A", 
        "id": "sg:person.0701707747.50", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0701707747.50"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Lawrence Berkeley National Laboratory", 
          "id": "https://www.grid.ac/institutes/grid.184769.5", 
          "name": [
            "Department of Molecular and Cell Biology, University of California, 94720, Berkeley, CA, USA", 
            "Department of Genome Science, Life Science Division, Lawrence Berkeley National Laboratory, 94720, Berkeley, CA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Eisen", 
        "givenName": "Michael B", 
        "id": "sg:person.012502544064.15", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012502544064.15"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1093/molbev/msg140", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1001597742"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1006/dbio.2002.0619", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1005467732"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btg1005", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1008039161"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/gb-2002-3-12-research0083", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010099513", 
          "https://doi.org/10.1186/gb-2002-3-12-research0083"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btg459", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1011955235"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0022-2836(05)80360-2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013618994"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/16.10.944", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1014452632"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.1960404", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017077573"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/35000615", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1020681013", 
          "https://doi.org/10.1038/35000615"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1111/j.1574-6968.1999.tb13575.x", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1020936444"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/0022-2836(70)90057-4", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1021169618"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1046/j.1525-142x.2001.01043.x", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023309430"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/0022-2836(81)90087-5", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1024589839"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0959-437x(02)00345-3", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1029658361"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btg406", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1031076710"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/0378-1119(95)00714-8", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1032266910"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.2067704", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1038796702"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/12.6.507", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1039841649"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.2289704", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1041222434"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.1957004", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1050355176"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0959-437x(02)00355-6", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1053600825"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://app.dimensions.ai/details/publication/pub.1083205397", 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2005-12", 
    "datePublishedReg": "2005-12-01", 
    "description": "BACKGROUND: Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dot plot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.\nRESULTS: To address some of these issues, we created a stand alone, platform independent, graphic alignment tool for comparative sequence analysis (GATA http://gata.sourceforge.net/). GATA uses the NCBI-BLASTN program and extensive post-processing to identify all small sub-alignments above a low cut-off score. These are graphed as two shaded boxes, one for each sequence, connected by a line using the coordinate system of their parent sequence. Shading and colour are used to indicate score and orientation. A variety of options exist for querying, modifying and retrieving conserved sequence elements. Extensive gene annotation can be added to both sequences using a standardized General Feature Format (GFF) file.\nCONCLUSIONS: GATA uses the NCBI-BLASTN program in conjunction with post-processing to exhaustively align two DNA sequences. It provides researchers with a fine-grained alignment and visualization tool aptly suited for non-coding, 0-200 kb, pairwise, sequence analysis. It functions independent of sequence feature ordering or orientation, and readily visualizes both large and small sequence inversions, duplications, and segment shuffling. Since the alignment is visual and does not contain gaps, gene annotation can be added to both sequences to create a thoroughly descriptive picture of DNA conservation that is well suited for comparative sequence analysis.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1186/1471-2105-6-9", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1023786", 
        "issn": [
          "1471-2105"
        ], 
        "name": "BMC Bioinformatics", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "6"
      }
    ], 
    "name": "GATA: a graphic alignment tool for comparative sequence analysis", 
    "pagination": "9", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "2c61bf64fd70e2792c489bf8210b2c29ffc856284270598fcfae12cc8d0cb9cb"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "15655071"
        ]
      }, 
      {
        "name": "nlm_unique_id", 
        "type": "PropertyValue", 
        "value": [
          "100965194"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/1471-2105-6-9"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1009316807"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/1471-2105-6-9", 
      "https://app.dimensions.ai/details/publication/pub.1009316807"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-11T09:56", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000347_0000000347/records_89804_00000000.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://link.springer.com/10.1186%2F1471-2105-6-9"
  }
]
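
Since JSON-LD is plain JSON, a record like the one above can be read with a standard JSON parser. The following is a minimal sketch, not an official SciGraph client: the `summarize` helper and the shape of its return value are assumptions, and the embedded text is a heavily trimmed excerpt that deliberately repeats one citation id to exercise the de-duplication step.

```python
import json

# Trimmed excerpt; field names follow the JSON-LD record, values are a subset.
record_text = """
[
  {
    "id": "sg:pub.10.1186/1471-2105-6-9",
    "name": "GATA: a graphic alignment tool for comparative sequence analysis",
    "author": [
      {"familyName": "Nix", "givenName": "David A", "type": "Person"},
      {"familyName": "Eisen", "givenName": "Michael B", "type": "Person"}
    ],
    "citation": [
      {"id": "sg:pub.10.1038/35000615", "type": "CreativeWork"},
      {"id": "sg:pub.10.1038/35000615", "type": "CreativeWork"},
      {"id": "https://doi.org/10.1093/molbev/msg140", "type": "CreativeWork"}
    ],
    "productId": [
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["15655071"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/1471-2105-6-9"]}
    ]
  }
]
"""


def summarize(text):
    """Pull title, author names, identifiers, and unique citation ids."""
    record = json.loads(text)[0]  # the document is a one-element array
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record["author"]]
    # Each productId entry stores its value as a one-element list.
    ids = {p["name"]: p["value"][0] for p in record["productId"]}
    # dict.fromkeys keeps first-seen order while dropping repeated ids.
    citations = list(dict.fromkeys(c["id"] for c in record["citation"]))
    return {
        "title": record["name"],
        "authors": authors,
        "ids": ids,
        "citations": citations,
    }


summary = summarize(record_text)
```

This treats the record as ordinary JSON and ignores the `@context`; a full JSON-LD processor would be needed to expand the short property names into their schema.org URIs.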
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-6-9'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-6-9'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-6-9'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-6-9'
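
The four curl commands differ only in their `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The helper names `build_request` and `fetch` and the short format keys are assumptions introduced here; the URL and media types are copied from the commands above. Whether the endpoint still responds is not something this sketch checks.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/1471-2105-6-9"

# Media types taken from the curl commands; the dictionary keys are arbitrary.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Return an unsent Request asking for the given serialization."""
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building a request does not touch the network; only fetch() does.
req = build_request("turtle")
```

Calling `fetch("n-triples")` would then return one triple per line, which is what makes that format convenient for batch processing.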


 

This table displays all metadata directly associated to this object as RDF triples.

255 TRIPLES      21 PREDICATES      79 URIs      49 LITERALS      37 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/1471-2105-6-9 schema:about N0f36778c201f40a299bb6ec2dc7988d2
2 N119952985c304a16a80e0fd1417770f4
3 N1de99fa5dd9d4b6abd6eb417426a4208
4 N23a65d8ea41d4300836fd8ff89eaa90b
5 N2a550faa7a7a4518a7248516c059e205
6 N32b11928a32d433d923e61227fdc34a2
7 N33729787ebcf425f8ceef340d9f7b9af
8 N48f0f5f570384714a8ae4ffa3bb393c8
9 N5fa598fa3ae14d338d52071c8c14e9d3
10 N61835e5682cf47f4b2aba2c3e148c92d
11 N665d1d2c8eab4492a4dd339454385684
12 N6b04490cdba0454d805d80b57d789352
13 N73020856fa394516b2c09c163c366792
14 N75f3a5c39029410da107b56bef2311cf
15 N7a8a60e1a28844eea5bb1b66e20d6eda
16 N7ed82e4e242c4e2ea6782bb9f10d0479
17 N8c79b29727f84fbc809af2f04875530c
18 Na2f97868cc744d588cba909c4b05a304
19 Na6419a2e4aa1492cb68d302cc016dd9f
20 Nab1e0b5dd4f54730a80d7f801dde8389
21 Nba48207ceaad4f97b0a9ee23fae584d4
22 Nca47d7caf90f4e2b98926a57da84bbe7
23 Nce1da0cac3e342819bee6ffa175cdc4c
24 Nd0c1096d981e408f9fd2a4ac029e1a0a
25 Ne5a7dc1c8c794b1eaf16f9a9c43fcef7
26 Ne82d86dd92d44331b4ec4b6d0ceac2c1
27 Necdc9a9b1180489e82a018b73580224d
28 Nfc273c239ca94088866910c69c9870ab
29 anzsrc-for:06
30 anzsrc-for:0604
31 schema:author N74676d6d385c4b75afea871d738d2d91
32 schema:citation sg:pub.10.1038/35000615
33 sg:pub.10.1186/gb-2002-3-12-research0083
34 https://app.dimensions.ai/details/publication/pub.1083205397
35 https://doi.org/10.1006/dbio.2002.0619
36 https://doi.org/10.1016/0022-2836(70)90057-4
37 https://doi.org/10.1016/0022-2836(81)90087-5
38 https://doi.org/10.1016/0378-1119(95)00714-8
39 https://doi.org/10.1016/s0022-2836(05)80360-2
40 https://doi.org/10.1016/s0959-437x(02)00345-3
41 https://doi.org/10.1016/s0959-437x(02)00355-6
42 https://doi.org/10.1046/j.1525-142x.2001.01043.x
43 https://doi.org/10.1093/bioinformatics/12.6.507
44 https://doi.org/10.1093/bioinformatics/16.10.944
45 https://doi.org/10.1093/bioinformatics/btg1005
46 https://doi.org/10.1093/bioinformatics/btg406
47 https://doi.org/10.1093/bioinformatics/btg459
48 https://doi.org/10.1093/molbev/msg140
49 https://doi.org/10.1101/gr.1957004
50 https://doi.org/10.1101/gr.1960404
51 https://doi.org/10.1101/gr.2067704
52 https://doi.org/10.1101/gr.2289704
53 https://doi.org/10.1111/j.1574-6968.1999.tb13575.x
54 schema:datePublished 2005-12
55 schema:datePublishedReg 2005-12-01
56 schema:description BACKGROUND: Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dot plot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments. RESULTS: To address some of these issues, we created a stand alone, platform independent, graphic alignment tool for comparative sequence analysis (GATA http://gata.sourceforge.net/). GATA uses the NCBI-BLASTN program and extensive post-processing to identify all small sub-alignments above a low cut-off score. These are graphed as two shaded boxes, one for each sequence, connected by a line using the coordinate system of their parent sequence. Shading and colour are used to indicate score and orientation. A variety of options exist for querying, modifying and retrieving conserved sequence elements. Extensive gene annotation can be added to both sequences using a standardized General Feature Format (GFF) file. CONCLUSIONS: GATA uses the NCBI-BLASTN program in conjunction with post-processing to exhaustively align two DNA sequences. It provides researchers with a fine-grained alignment and visualization tool aptly suited for non-coding, 0-200 kb, pairwise, sequence analysis. It functions independent of sequence feature ordering or orientation, and readily visualizes both large and small sequence inversions, duplications, and segment shuffling. Since the alignment is visual and does not contain gaps, gene annotation can be added to both sequences to create a thoroughly descriptive picture of DNA conservation that is well suited for comparative sequence analysis.
57 schema:genre research_article
58 schema:inLanguage en
59 schema:isAccessibleForFree true
60 schema:isPartOf N9a93d5be2c784ceb971d2b5762900b57
61 Ncd2babbbb1c94b4ca7e7ee8a1cacc7b5
62 sg:journal.1023786
63 schema:name GATA: a graphic alignment tool for comparative sequence analysis
64 schema:pagination 9
65 schema:productId N3dbfbeb7e6aa4300913ca6dcc1b52944
66 N4326061c9c7c47148054438ef56ed242
67 N7c02b8d523d545e38d29d514b4f5b84c
68 Na9ccf75ea11b4f70831691317db3f2b1
69 Nb9f8e2a2101241449d3ea615aa8308d3
70 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009316807
71 https://doi.org/10.1186/1471-2105-6-9
72 schema:sdDatePublished 2019-04-11T09:56
73 schema:sdLicense https://scigraph.springernature.com/explorer/license/
74 schema:sdPublisher N26da6d8aae524a47b8b677714a020b8f
75 schema:url https://link.springer.com/10.1186%2F1471-2105-6-9
76 sgo:license sg:explorer/license/
77 sgo:sdDataset articles
78 rdf:type schema:ScholarlyArticle
79 N0f36778c201f40a299bb6ec2dc7988d2 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
80 schema:name DNA
81 rdf:type schema:DefinedTerm
82 N119952985c304a16a80e0fd1417770f4 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
83 schema:name Chromosome Mapping
84 rdf:type schema:DefinedTerm
85 N1de99fa5dd9d4b6abd6eb417426a4208 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
86 schema:name Databases, Nucleic Acid
87 rdf:type schema:DefinedTerm
88 N23a65d8ea41d4300836fd8ff89eaa90b schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
89 schema:name Gene Duplication
90 rdf:type schema:DefinedTerm
91 N26da6d8aae524a47b8b677714a020b8f schema:name Springer Nature - SN SciGraph project
92 rdf:type schema:Organization
93 N2a550faa7a7a4518a7248516c059e205 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
94 schema:name Amino Acid Sequence
95 rdf:type schema:DefinedTerm
96 N32b11928a32d433d923e61227fdc34a2 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
97 schema:name Mutation
98 rdf:type schema:DefinedTerm
99 N33729787ebcf425f8ceef340d9f7b9af schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
100 schema:name Drosophila
101 rdf:type schema:DefinedTerm
102 N3dbfbeb7e6aa4300913ca6dcc1b52944 schema:name pubmed_id
103 schema:value 15655071
104 rdf:type schema:PropertyValue
105 N4326061c9c7c47148054438ef56ed242 schema:name dimensions_id
106 schema:value pub.1009316807
107 rdf:type schema:PropertyValue
108 N48f0f5f570384714a8ae4ffa3bb393c8 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
109 schema:name Gene Deletion
110 rdf:type schema:DefinedTerm
111 N4dd57065aac243369a188903afa20a1c rdf:first sg:person.012502544064.15
112 rdf:rest rdf:nil
113 N5fa598fa3ae14d338d52071c8c14e9d3 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
114 schema:name Programming Languages
115 rdf:type schema:DefinedTerm
116 N61835e5682cf47f4b2aba2c3e148c92d schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
117 schema:name Computer Graphics
118 rdf:type schema:DefinedTerm
119 N665d1d2c8eab4492a4dd339454385684 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
120 schema:name Base Sequence
121 rdf:type schema:DefinedTerm
122 N6b04490cdba0454d805d80b57d789352 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
123 schema:name Sequence Analysis, DNA
124 rdf:type schema:DefinedTerm
125 N73020856fa394516b2c09c163c366792 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
126 schema:name Conserved Sequence
127 rdf:type schema:DefinedTerm
128 N74676d6d385c4b75afea871d738d2d91 rdf:first sg:person.0701707747.50
129 rdf:rest N4dd57065aac243369a188903afa20a1c
130 N75f3a5c39029410da107b56bef2311cf schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
131 schema:name Binding Sites
132 rdf:type schema:DefinedTerm
133 N7a8a60e1a28844eea5bb1b66e20d6eda schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
134 schema:name Computational Biology
135 rdf:type schema:DefinedTerm
136 N7c02b8d523d545e38d29d514b4f5b84c schema:name readcube_id
137 schema:value 2c61bf64fd70e2792c489bf8210b2c29ffc856284270598fcfae12cc8d0cb9cb
138 rdf:type schema:PropertyValue
139 N7ed82e4e242c4e2ea6782bb9f10d0479 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
140 schema:name Molecular Sequence Data
141 rdf:type schema:DefinedTerm
142 N8c79b29727f84fbc809af2f04875530c schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
143 schema:name Software
144 rdf:type schema:DefinedTerm
145 N9a93d5be2c784ceb971d2b5762900b57 schema:issueNumber 1
146 rdf:type schema:PublicationIssue
147 Na2f97868cc744d588cba909c4b05a304 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
148 schema:name Sequence Alignment
149 rdf:type schema:DefinedTerm
150 Na6419a2e4aa1492cb68d302cc016dd9f schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
151 schema:name Database Management Systems
152 rdf:type schema:DefinedTerm
153 Na9ccf75ea11b4f70831691317db3f2b1 schema:name doi
154 schema:value 10.1186/1471-2105-6-9
155 rdf:type schema:PropertyValue
156 Nab1e0b5dd4f54730a80d7f801dde8389 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
157 schema:name Animals
158 rdf:type schema:DefinedTerm
159 Nb9f8e2a2101241449d3ea615aa8308d3 schema:name nlm_unique_id
160 schema:value 100965194
161 rdf:type schema:PropertyValue
162 Nba48207ceaad4f97b0a9ee23fae584d4 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
163 schema:name Repetitive Sequences, Nucleic Acid
164 rdf:type schema:DefinedTerm
165 Nca47d7caf90f4e2b98926a57da84bbe7 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
166 schema:name Algorithms
167 rdf:type schema:DefinedTerm
168 Ncd2babbbb1c94b4ca7e7ee8a1cacc7b5 schema:volumeNumber 6
169 rdf:type schema:PublicationVolume
170 Nce1da0cac3e342819bee6ffa175cdc4c schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
171 schema:name Proteins
172 rdf:type schema:DefinedTerm
173 Nd0c1096d981e408f9fd2a4ac029e1a0a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
174 schema:name Genomics
175 rdf:type schema:DefinedTerm
176 Ne5a7dc1c8c794b1eaf16f9a9c43fcef7 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
177 schema:name Sequence Analysis
178 rdf:type schema:DefinedTerm
179 Ne82d86dd92d44331b4ec4b6d0ceac2c1 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
180 schema:name Genome
181 rdf:type schema:DefinedTerm
182 Necdc9a9b1180489e82a018b73580224d schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
183 schema:name Databases, Protein
184 rdf:type schema:DefinedTerm
185 Nfc273c239ca94088866910c69c9870ab schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
186 schema:name Databases, Genetic
187 rdf:type schema:DefinedTerm
188 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
189 schema:name Biological Sciences
190 rdf:type schema:DefinedTerm
191 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
192 schema:name Genetics
193 rdf:type schema:DefinedTerm
194 sg:journal.1023786 schema:issn 1471-2105
195 schema:name BMC Bioinformatics
196 rdf:type schema:Periodical
197 sg:person.012502544064.15 schema:affiliation https://www.grid.ac/institutes/grid.184769.5
198 schema:familyName Eisen
199 schema:givenName Michael B
200 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012502544064.15
201 rdf:type schema:Person
202 sg:person.0701707747.50 schema:affiliation https://www.grid.ac/institutes/grid.184769.5
203 schema:familyName Nix
204 schema:givenName David A
205 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0701707747.50
206 rdf:type schema:Person
207 sg:pub.10.1038/35000615 schema:sameAs https://app.dimensions.ai/details/publication/pub.1020681013
208 https://doi.org/10.1038/35000615
209 rdf:type schema:CreativeWork
210 sg:pub.10.1186/gb-2002-3-12-research0083 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010099513
211 https://doi.org/10.1186/gb-2002-3-12-research0083
212 rdf:type schema:CreativeWork
213 https://app.dimensions.ai/details/publication/pub.1083205397 rdf:type schema:CreativeWork
214 https://doi.org/10.1006/dbio.2002.0619 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005467732
215 rdf:type schema:CreativeWork
216 https://doi.org/10.1016/0022-2836(70)90057-4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021169618
217 rdf:type schema:CreativeWork
218 https://doi.org/10.1016/0022-2836(81)90087-5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024589839
219 rdf:type schema:CreativeWork
220 https://doi.org/10.1016/0378-1119(95)00714-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032266910
221 rdf:type schema:CreativeWork
222 https://doi.org/10.1016/s0022-2836(05)80360-2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013618994
223 rdf:type schema:CreativeWork
224 https://doi.org/10.1016/s0959-437x(02)00345-3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029658361
225 rdf:type schema:CreativeWork
226 https://doi.org/10.1016/s0959-437x(02)00355-6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1053600825
227 rdf:type schema:CreativeWork
228 https://doi.org/10.1046/j.1525-142x.2001.01043.x schema:sameAs https://app.dimensions.ai/details/publication/pub.1023309430
229 rdf:type schema:CreativeWork
230 https://doi.org/10.1093/bioinformatics/12.6.507 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039841649
231 rdf:type schema:CreativeWork
232 https://doi.org/10.1093/bioinformatics/16.10.944 schema:sameAs https://app.dimensions.ai/details/publication/pub.1014452632
233 rdf:type schema:CreativeWork
234 https://doi.org/10.1093/bioinformatics/btg1005 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008039161
235 rdf:type schema:CreativeWork
236 https://doi.org/10.1093/bioinformatics/btg406 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031076710
237 rdf:type schema:CreativeWork
238 https://doi.org/10.1093/bioinformatics/btg459 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011955235
239 rdf:type schema:CreativeWork
240 https://doi.org/10.1093/molbev/msg140 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001597742
241 rdf:type schema:CreativeWork
242 https://doi.org/10.1101/gr.1957004 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050355176
243 rdf:type schema:CreativeWork
244 https://doi.org/10.1101/gr.1960404 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017077573
245 rdf:type schema:CreativeWork
246 https://doi.org/10.1101/gr.2067704 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038796702
247 rdf:type schema:CreativeWork
248 https://doi.org/10.1101/gr.2289704 schema:sameAs https://app.dimensions.ai/details/publication/pub.1041222434
249 rdf:type schema:CreativeWork
250 https://doi.org/10.1111/j.1574-6968.1999.tb13575.x schema:sameAs https://app.dimensions.ai/details/publication/pub.1020936444
251 rdf:type schema:CreativeWork
252 https://www.grid.ac/institutes/grid.184769.5 schema:alternateName Lawrence Berkeley National Laboratory
253 schema:name Department of Genome Science, Life Science Division, Lawrence Berkeley National Laboratory, 94720, Berkeley, CA, USA
254 Department of Molecular and Cell Biology, University of California, 94720, Berkeley, CA, USA
255 rdf:type schema:Organization
 



