Improving pan-genome annotation using whole genome multiple alignment


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2011-06-30

AUTHORS

Samuel V Angiuoli, Julie C Dunning Hotopp, Steven L Salzberg, Hervé Tettelin

ABSTRACT

Background: Rapid annotation and comparisons of genomes from multiple isolates (pan-genomes) is becoming commonplace due to advances in sequencing technology. Genome annotations can contain inconsistencies and errors that hinder comparative analysis even within a single species. Tools are needed to compare and improve annotation quality across sets of closely related genomes.

Results: We introduce a new tool, Mugsy-Annotator, that identifies orthologs and evaluates annotation quality in prokaryotic genomes using whole genome multiple alignment. Mugsy-Annotator identifies anomalies in annotated gene structures, including inconsistently located translation initiation sites and disrupted genes due to draft genome sequencing or pseudogenes. An evaluation of species pan-genomes using the tool indicates that such anomalies are common, especially at translation initiation sites. Mugsy-Annotator reports alternate annotations that improve consistency and are candidates for further review.

Conclusions: Whole genome multiple alignment can be used to efficiently identify orthologs and annotation problem areas in a bacterial pan-genome. Comparisons of annotated gene structures within a species may show more variation than is actually present in the genome, indicating errors in genome annotation. Our new tool Mugsy-Annotator assists re-annotation efforts by highlighting edits that improve annotation consistency.

PAGES

272

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/1471-2105-12-272

DOI

http://dx.doi.org/10.1186/1471-2105-12-272

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1033419151

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/21718539


JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Bacteria", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Chromosome Mapping", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genome, Bacterial", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Molecular Sequence Annotation", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Alignment", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Institute for Genome Sciences (IGS), University of Maryland Baltimore, 21201, Baltimore, Maryland, USA", 
          "id": "http://www.grid.ac/institutes/grid.411024.2", 
          "name": [
            "Center for Bioinformatics and Computational Biology, University of Maryland, 20742, College Park, MD, USA", 
            "Institute for Genome Sciences (IGS), University of Maryland Baltimore, 21201, Baltimore, Maryland, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Angiuoli", 
        "givenName": "Samuel V", 
        "id": "sg:person.01343322453.53", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01343322453.53"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Institute for Genome Sciences (IGS), University of Maryland Baltimore, 21201, Baltimore, Maryland, USA", 
          "id": "http://www.grid.ac/institutes/grid.411024.2", 
          "name": [
            "Institute for Genome Sciences (IGS), University of Maryland Baltimore, 21201, Baltimore, Maryland, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Dunning Hotopp", 
        "givenName": "Julie C", 
        "id": "sg:person.01322263734.83", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01322263734.83"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Center for Bioinformatics and Computational Biology, University of Maryland, 20742, College Park, MD, USA", 
          "id": "http://www.grid.ac/institutes/grid.164295.d", 
          "name": [
            "Center for Bioinformatics and Computational Biology, University of Maryland, 20742, College Park, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Salzberg", 
        "givenName": "Steven L", 
        "id": "sg:person.01223441713.02", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223441713.02"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Institute for Genome Sciences (IGS), University of Maryland Baltimore, 21201, Baltimore, Maryland, USA", 
          "id": "http://www.grid.ac/institutes/grid.411024.2", 
          "name": [
            "Institute for Genome Sciences (IGS), University of Maryland Baltimore, 21201, Baltimore, Maryland, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Tettelin", 
        "givenName": "Herv\u00e9", 
        "id": "sg:person.01065620174.76", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01065620174.76"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/nrmicro2462", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1052475852", 
          "https://doi.org/10.1038/nrmicro2462"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/gb-2009-10-10-r110", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1020721403", 
          "https://doi.org/10.1186/gb-2009-10-10-r110"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmeth.1457", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1014321989", 
          "https://doi.org/10.1038/nmeth.1457"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-11-119", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1026423599", 
          "https://doi.org/10.1186/1471-2105-11-119"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-11-131", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1026489420", 
          "https://doi.org/10.1186/1471-2105-11-131"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2164-9-335", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1005672418", 
          "https://doi.org/10.1186/1471-2164-9-335"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2011-06-30", 
    "datePublishedReg": "2011-06-30", 
    "description": "BackgroundRapid annotation and comparisons of genomes from multiple isolates (pan-genomes) is becoming commonplace due to advances in sequencing technology. Genome annotations can contain inconsistencies and errors that hinder comparative analysis even within a single species. Tools are needed to compare and improve annotation quality across sets of closely related genomes.ResultsWe introduce a new tool, Mugsy-Annotator, that identifies orthologs and evaluates annotation quality in prokaryotic genomes using whole genome multiple alignment. Mugsy-Annotator identifies anomalies in annotated gene structures, including inconsistently located translation initiation sites and disrupted genes due to draft genome sequencing or pseudogenes. An evaluation of species pan-genomes using the tool indicates that such anomalies are common, especially at translation initiation sites. Mugsy-Annotator reports alternate annotations that improve consistency and are candidates for further review.ConclusionsWhole genome multiple alignment can be used to efficiently identify orthologs and annotation problem areas in a bacterial pan-genome. Comparisons of annotated gene structures within a species may show more variation than is actually present in the genome, indicating errors in genome annotation. Our new tool Mugsy-Annotator assists re-annotation efforts by highlighting edits that improve annotation consistency.", 
    "genre": "article", 
    "id": "sg:pub.10.1186/1471-2105-12-272", 
    "isAccessibleForFree": true, 
    "isFundedItemOf": [
      {
        "id": "sg:grant.2519905", 
        "type": "MonetaryGrant"
      }, 
      {
        "id": "sg:grant.2529425", 
        "type": "MonetaryGrant"
      }, 
      {
        "id": "sg:grant.2529453", 
        "type": "MonetaryGrant"
      }
    ], 
    "isPartOf": [
      {
        "id": "sg:journal.1023786", 
        "issn": [
          "1471-2105"
        ], 
        "name": "BMC Bioinformatics", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "12"
      }
    ], 
    "keywords": [
      "whole-genome multiple alignments", 
      "translation initiation site", 
      "gene structure", 
      "genome annotation", 
      "multiple alignment", 
      "initiation site", 
      "draft genome sequencing", 
      "comparison of genomes", 
      "prokaryotic genomes", 
      "single species", 
      "genome", 
      "genome sequencing", 
      "annotation quality", 
      "multiple isolates", 
      "annotation", 
      "more variation", 
      "species", 
      "orthologs", 
      "pseudogenes", 
      "genes", 
      "new tool", 
      "sequencing", 
      "sites", 
      "comparative analysis", 
      "annotation consistency", 
      "isolates", 
      "alignment", 
      "structure", 
      "variation", 
      "identifies", 
      "tool", 
      "advances", 
      "identifies anomalies", 
      "edits", 
      "candidates", 
      "analysis", 
      "comparison", 
      "ResultsWe", 
      "review", 
      "efforts", 
      "area", 
      "set", 
      "quality", 
      "technology", 
      "anomalies", 
      "inconsistencies", 
      "such anomalies", 
      "consistency", 
      "evaluation", 
      "further review", 
      "error", 
      "problem areas"
    ], 
    "name": "Improving pan-genome annotation using whole genome multiple alignment", 
    "pagination": "272", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1033419151"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/1471-2105-12-272"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "21718539"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/1471-2105-12-272", 
      "https://app.dimensions.ai/details/publication/pub.1033419151"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-09-02T15:54", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220902/entities/gbq_results/article/article_536.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1186/1471-2105-12-272"
  }
]
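Because the record is ordinary JSON, its fields can be read with any JSON parser. The following is a minimal sketch, assuming a trimmed copy of the record above (only the fields it uses are kept); the `summarize` helper and the shape of its return value are illustrative choices, not part of SciGraph.

```python
import json

# Trimmed copy of the record above; only the fields read below are kept.
RECORD_JSON = r"""
[
  {
    "name": "Improving pan-genome annotation using whole genome multiple alignment",
    "datePublished": "2011-06-30",
    "author": [
      {"familyName": "Angiuoli", "givenName": "Samuel V"},
      {"familyName": "Dunning Hotopp", "givenName": "Julie C"},
      {"familyName": "Salzberg", "givenName": "Steven L"},
      {"familyName": "Tettelin", "givenName": "Herv\u00e9"}
    ],
    "productId": [
      {"name": "dimensions_id", "value": ["pub.1033419151"]},
      {"name": "doi", "value": ["10.1186/1471-2105-12-272"]},
      {"name": "pubmed_id", "value": ["21718539"]}
    ]
  }
]
"""


def summarize(record):
    """Collect title, date, author names and identifiers from one record.

    Hypothetical helper: the output layout is an assumption for illustration.
    """
    authors = [f"{a['givenName']} {a['familyName']}" for a in record["author"]]
    # Each productId entry carries a name and a one-element list of values.
    ids = {p["name"]: p["value"][0] for p in record["productId"]}
    return {
        "title": record["name"],
        "date": record["datePublished"],
        "authors": authors,
        "ids": ids,
    }


# The top-level JSON value is a list holding a single record.
record = json.loads(RECORD_JSON)[0]
summary = summarize(record)
print(summary["authors"])
print(summary["ids"]["doi"])
```

The author array in the JSON-LD is already in publication order, so no sorting is needed.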
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-272'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-272'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-272'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-272'
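The four curl commands differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. This is a minimal sketch under stated assumptions: the short format labels (`json-ld`, `nt`, `turtle`, `xml`) and the function names are hypothetical, and `fetch` is defined but not called here, so nothing is assumed about what the server returns.

```python
from urllib.request import Request, urlopen

# Record URL taken from the curl commands above.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/1471-2105-12-272"

# Media types from the curl commands; the dictionary keys are our own labels.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request whose Accept header asks for the given format."""
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text.

    Performs a network call; errors from urlopen propagate to the caller.
    """
    with urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Inspect a request without sending it.
request = build_request("turtle")
print(request.full_url, request.get_header("Accept"))
```

Calling `fetch("nt")` would correspond to the N-Triples curl command above.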


 

This table displays all metadata directly associated with this object as RDF triples. A row that omits the subject or predicate repeats the one from the row above.

187 TRIPLES      21 PREDICATES      88 URIs      74 LITERALS      12 BLANK NODES

Row Subject Predicate Object
1 sg:pub.10.1186/1471-2105-12-272 schema:about N64801545cb2b4b9a9e968510e7277432
2 N6d6b756ba0854f68aa81b6398ae9375e
3 N8f87faafede3431daa25f2019af60069
4 Ndbe128483821402eb8f0c05037a59196
5 Nf0cc45bb0b324c24a998f748c39edcd3
6 anzsrc-for:06
7 anzsrc-for:0604
8 schema:author Ne7c252a795e949e0840eacdbaa51318d
9 schema:citation sg:pub.10.1038/nmeth.1457
10 sg:pub.10.1038/nrmicro2462
11 sg:pub.10.1186/1471-2105-11-119
12 sg:pub.10.1186/1471-2105-11-131
13 sg:pub.10.1186/1471-2164-9-335
14 sg:pub.10.1186/gb-2009-10-10-r110
15 schema:datePublished 2011-06-30
16 schema:datePublishedReg 2011-06-30
17 schema:description BackgroundRapid annotation and comparisons of genomes from multiple isolates (pan-genomes) is becoming commonplace due to advances in sequencing technology. Genome annotations can contain inconsistencies and errors that hinder comparative analysis even within a single species. Tools are needed to compare and improve annotation quality across sets of closely related genomes.ResultsWe introduce a new tool, Mugsy-Annotator, that identifies orthologs and evaluates annotation quality in prokaryotic genomes using whole genome multiple alignment. Mugsy-Annotator identifies anomalies in annotated gene structures, including inconsistently located translation initiation sites and disrupted genes due to draft genome sequencing or pseudogenes. An evaluation of species pan-genomes using the tool indicates that such anomalies are common, especially at translation initiation sites. Mugsy-Annotator reports alternate annotations that improve consistency and are candidates for further review.ConclusionsWhole genome multiple alignment can be used to efficiently identify orthologs and annotation problem areas in a bacterial pan-genome. Comparisons of annotated gene structures within a species may show more variation than is actually present in the genome, indicating errors in genome annotation. Our new tool Mugsy-Annotator assists re-annotation efforts by highlighting edits that improve annotation consistency.
18 schema:genre article
19 schema:isAccessibleForFree true
20 schema:isPartOf N46ee74af2987412a8bf9ff35eb90cff5
21 Nadc7ea5f9df14baeb75610b4026ef49d
22 sg:journal.1023786
23 schema:keywords ResultsWe
24 advances
25 alignment
26 analysis
27 annotation
28 annotation consistency
29 annotation quality
30 anomalies
31 area
32 candidates
33 comparative analysis
34 comparison
35 comparison of genomes
36 consistency
37 draft genome sequencing
38 edits
39 efforts
40 error
41 evaluation
42 further review
43 gene structure
44 genes
45 genome
46 genome annotation
47 genome sequencing
48 identifies
49 identifies anomalies
50 inconsistencies
51 initiation site
52 isolates
53 more variation
54 multiple alignment
55 multiple isolates
56 new tool
57 orthologs
58 problem areas
59 prokaryotic genomes
60 pseudogenes
61 quality
62 review
63 sequencing
64 set
65 single species
66 sites
67 species
68 structure
69 such anomalies
70 technology
71 tool
72 translation initiation site
73 variation
74 whole-genome multiple alignments
75 schema:name Improving pan-genome annotation using whole genome multiple alignment
76 schema:pagination 272
77 schema:productId N97dccc3b932e40648f11bd886e195fbb
78 Nc6b46743409a4491bceb5f6f14aef626
79 Nd1c37bdade3243b9bc41739c6678380f
80 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033419151
81 https://doi.org/10.1186/1471-2105-12-272
82 schema:sdDatePublished 2022-09-02T15:54
83 schema:sdLicense https://scigraph.springernature.com/explorer/license/
84 schema:sdPublisher N58dab88af517429cb872564eafb7a724
85 schema:url https://doi.org/10.1186/1471-2105-12-272
86 sgo:license sg:explorer/license/
87 sgo:sdDataset articles
88 rdf:type schema:ScholarlyArticle
89 N46ee74af2987412a8bf9ff35eb90cff5 schema:volumeNumber 12
90 rdf:type schema:PublicationVolume
91 N58dab88af517429cb872564eafb7a724 schema:name Springer Nature - SN SciGraph project
92 rdf:type schema:Organization
93 N64801545cb2b4b9a9e968510e7277432 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
94 schema:name Chromosome Mapping
95 rdf:type schema:DefinedTerm
96 N6d6b756ba0854f68aa81b6398ae9375e schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
97 schema:name Genome, Bacterial
98 rdf:type schema:DefinedTerm
99 N7615dbfd1d754ccbbae46dfb5be7b519 rdf:first sg:person.01065620174.76
100 rdf:rest rdf:nil
101 N8f87faafede3431daa25f2019af60069 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
102 schema:name Molecular Sequence Annotation
103 rdf:type schema:DefinedTerm
104 N97dccc3b932e40648f11bd886e195fbb schema:name doi
105 schema:value 10.1186/1471-2105-12-272
106 rdf:type schema:PropertyValue
107 N9a8e8052ccf041ab9e3146dc708e45d2 rdf:first sg:person.01223441713.02
108 rdf:rest N7615dbfd1d754ccbbae46dfb5be7b519
109 Nadc7ea5f9df14baeb75610b4026ef49d schema:issueNumber 1
110 rdf:type schema:PublicationIssue
111 Nc6b46743409a4491bceb5f6f14aef626 schema:name pubmed_id
112 schema:value 21718539
113 rdf:type schema:PropertyValue
114 Nd1c37bdade3243b9bc41739c6678380f schema:name dimensions_id
115 schema:value pub.1033419151
116 rdf:type schema:PropertyValue
117 Ndbe128483821402eb8f0c05037a59196 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
118 schema:name Sequence Alignment
119 rdf:type schema:DefinedTerm
120 Ne7c252a795e949e0840eacdbaa51318d rdf:first sg:person.01343322453.53
121 rdf:rest Nfbba48e0931d4bf4acd0fac955c5535b
122 Nf0cc45bb0b324c24a998f748c39edcd3 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
123 schema:name Bacteria
124 rdf:type schema:DefinedTerm
125 Nfbba48e0931d4bf4acd0fac955c5535b rdf:first sg:person.01322263734.83
126 rdf:rest N9a8e8052ccf041ab9e3146dc708e45d2
127 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
128 schema:name Biological Sciences
129 rdf:type schema:DefinedTerm
130 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
131 schema:name Genetics
132 rdf:type schema:DefinedTerm
133 sg:grant.2519905 http://pending.schema.org/fundedItem sg:pub.10.1186/1471-2105-12-272
134 rdf:type schema:MonetaryGrant
135 sg:grant.2529425 http://pending.schema.org/fundedItem sg:pub.10.1186/1471-2105-12-272
136 rdf:type schema:MonetaryGrant
137 sg:grant.2529453 http://pending.schema.org/fundedItem sg:pub.10.1186/1471-2105-12-272
138 rdf:type schema:MonetaryGrant
139 sg:journal.1023786 schema:issn 1471-2105
140 schema:name BMC Bioinformatics
141 schema:publisher Springer Nature
142 rdf:type schema:Periodical
143 sg:person.01065620174.76 schema:affiliation grid-institutes:grid.411024.2
144 schema:familyName Tettelin
145 schema:givenName Hervé
146 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01065620174.76
147 rdf:type schema:Person
148 sg:person.01223441713.02 schema:affiliation grid-institutes:grid.164295.d
149 schema:familyName Salzberg
150 schema:givenName Steven L
151 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223441713.02
152 rdf:type schema:Person
153 sg:person.01322263734.83 schema:affiliation grid-institutes:grid.411024.2
154 schema:familyName Dunning Hotopp
155 schema:givenName Julie C
156 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01322263734.83
157 rdf:type schema:Person
158 sg:person.01343322453.53 schema:affiliation grid-institutes:grid.411024.2
159 schema:familyName Angiuoli
160 schema:givenName Samuel V
161 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01343322453.53
162 rdf:type schema:Person
163 sg:pub.10.1038/nmeth.1457 schema:sameAs https://app.dimensions.ai/details/publication/pub.1014321989
164 https://doi.org/10.1038/nmeth.1457
165 rdf:type schema:CreativeWork
166 sg:pub.10.1038/nrmicro2462 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052475852
167 https://doi.org/10.1038/nrmicro2462
168 rdf:type schema:CreativeWork
169 sg:pub.10.1186/1471-2105-11-119 schema:sameAs https://app.dimensions.ai/details/publication/pub.1026423599
170 https://doi.org/10.1186/1471-2105-11-119
171 rdf:type schema:CreativeWork
172 sg:pub.10.1186/1471-2105-11-131 schema:sameAs https://app.dimensions.ai/details/publication/pub.1026489420
173 https://doi.org/10.1186/1471-2105-11-131
174 rdf:type schema:CreativeWork
175 sg:pub.10.1186/1471-2164-9-335 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005672418
176 https://doi.org/10.1186/1471-2164-9-335
177 rdf:type schema:CreativeWork
178 sg:pub.10.1186/gb-2009-10-10-r110 schema:sameAs https://app.dimensions.ai/details/publication/pub.1020721403
179 https://doi.org/10.1186/gb-2009-10-10-r110
180 rdf:type schema:CreativeWork
181 grid-institutes:grid.164295.d schema:alternateName Center for Bioinformatics and Computational Biology, University of Maryland, 20742, College Park, MD, USA
182 schema:name Center for Bioinformatics and Computational Biology, University of Maryland, 20742, College Park, MD, USA
183 rdf:type schema:Organization
184 grid-institutes:grid.411024.2 schema:alternateName Institute for Genome Sciences (IGS), University of Maryland Baltimore, 21201, Baltimore, Maryland, USA
185 schema:name Center for Bioinformatics and Computational Biology, University of Maryland, 20742, College Park, MD, USA
186 Institute for Genome Sciences (IGS), University of Maryland Baltimore, 21201, Baltimore, Maryland, USA
187 rdf:type schema:Organization
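In the triples above, author order is not stored as an array. The `schema:author` object is a blank node that heads an RDF list, and each list node points to one person (`rdf:first`) and to the next node (`rdf:rest`) until `rdf:nil`. The sketch below walks that chain using rows copied from the table; the tuple layout and the `walk_rdf_list` helper are assumptions made for illustration.

```python
RDF_NIL = "rdf:nil"

# (subject, predicate, object) rows copied from the table above.
TRIPLES = [
    ("sg:pub.10.1186/1471-2105-12-272", "schema:author", "Ne7c252a795e949e0840eacdbaa51318d"),
    ("N7615dbfd1d754ccbbae46dfb5be7b519", "rdf:first", "sg:person.01065620174.76"),
    ("N7615dbfd1d754ccbbae46dfb5be7b519", "rdf:rest", "rdf:nil"),
    ("N9a8e8052ccf041ab9e3146dc708e45d2", "rdf:first", "sg:person.01223441713.02"),
    ("N9a8e8052ccf041ab9e3146dc708e45d2", "rdf:rest", "N7615dbfd1d754ccbbae46dfb5be7b519"),
    ("Ne7c252a795e949e0840eacdbaa51318d", "rdf:first", "sg:person.01343322453.53"),
    ("Ne7c252a795e949e0840eacdbaa51318d", "rdf:rest", "Nfbba48e0931d4bf4acd0fac955c5535b"),
    ("Nfbba48e0931d4bf4acd0fac955c5535b", "rdf:first", "sg:person.01322263734.83"),
    ("Nfbba48e0931d4bf4acd0fac955c5535b", "rdf:rest", "N9a8e8052ccf041ab9e3146dc708e45d2"),
    ("sg:person.01065620174.76", "schema:familyName", "Tettelin"),
    ("sg:person.01223441713.02", "schema:familyName", "Salzberg"),
    ("sg:person.01322263734.83", "schema:familyName", "Dunning Hotopp"),
    ("sg:person.01343322453.53", "schema:familyName", "Angiuoli"),
]


def walk_rdf_list(head, triples):
    """Return the items of the RDF list that starts at blank node `head`."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:
            # Guard against malformed data that loops back on itself.
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


# Find the list head, walk it, then map each person to a family name.
head = next(o for s, p, o in TRIPLES if p == "schema:author")
family_name = {s: o for s, p, o in TRIPLES if p == "schema:familyName"}
authors = [family_name[person] for person in walk_rdf_list(head, TRIPLES)]
print(authors)
```

The recovered order matches the AUTHORS field of the record, which the row order of the table alone does not show, since rows are sorted by subject identifier.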
 



