GENCODE: producing a reference annotation for ENCODE


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2006-08-07

AUTHORS

Jennifer Harrow, France Denoeud, Adam Frankish, Alexandre Reymond, Chao-Kung Chen, Jacqueline Chrast, Julien Lagarde, James GR Gilbert, Roy Storey, David Swarbreck, Colette Rossier, Catherine Ucla, Tim Hubbard, Stylianos E Antonarakis, Roderic Guigo

ABSTRACT

Background: The GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium and a refinement of the annotation based on these experimental results.

Results: The GENCODE gene features are divided into eight different categories, of which only the first two (known and novel coding sequence) are confidently predicted to be protein-coding genes. 5' rapid amplification of cDNA ends (RACE) and RT-PCR were used to experimentally verify the initial annotation. Of the 420 coding loci tested, 229 RACE products have been sequenced. They supported 5' extensions of 30 loci and new splice variants in 50 loci. In addition, 46 loci without evidence for a coding sequence were validated, consisting of 31 novel and 15 putative transcripts. We assessed the comprehensiveness of the GENCODE annotation by attempting to validate all the predicted exon boundaries outside the GENCODE annotation. Out of 1,215 tested in a subset of the ENCODE regions, 14 novel exon pairs were validated, only two of them in intergenic regions.

Conclusion: In total, 487 loci, of which 434 are coding, have been annotated as part of the GENCODE reference set available from the UCSC browser. Comparison of GENCODE annotation with RefSeq and ENSEMBL shows that only 40% of GENCODE exons are contained within the two sets, which is a reflection of the high number of alternative splice forms with unique exons annotated. Over 50% of coding loci have been experimentally verified by 5' RACE for EGASP, and the GENCODE collaboration is continuing to refine its annotation of 1% of the human genome with the aid of experimental validation.

PAGES

s4

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/gb-2006-7-s1-s4

DOI

http://dx.doi.org/10.1186/gb-2006-7-s1-s4

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1003630706

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/16925838



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Chromosome Mapping", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Computational Biology", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Expressed Sequence Tags", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genes", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genome, Human", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genomics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Humans", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Proteins", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Pseudogenes", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "RNA, Messenger", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Reference Standards", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Analysis, DNA", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Analysis, RNA", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Wellcome Trust Sanger Institute, Wellcome Trust Campus, CB10 1SA, Hinxton, Cambridge, UK", 
          "id": "http://www.grid.ac/institutes/grid.10306.34", 
          "name": [
            "Wellcome Trust Sanger Institute, Wellcome Trust Campus, CB10 1SA, Hinxton, Cambridge, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Harrow", 
        "givenName": "Jennifer", 
        "id": "sg:person.013015267207.78", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013015267207.78"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Grup de Recerca en Informatica Biomedica, Institut Municipal d'Informatica Medica-Universitat Pompeu Fabra, Pg. Maritim de la Barceloneta, 08003, Barcelona, Catalonia, Spain", 
          "id": "http://www.grid.ac/institutes/grid.5612.0", 
          "name": [
            "Grup de Recerca en Informatica Biomedica, Institut Municipal d'Informatica Medica-Universitat Pompeu Fabra, Pg. Maritim de la Barceloneta, 08003, Barcelona, Catalonia, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Denoeud", 
        "givenName": "France", 
        "id": "sg:person.0772061470.31", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0772061470.31"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Wellcome Trust Sanger Institute, Wellcome Trust Campus, CB10 1SA, Hinxton, Cambridge, UK", 
          "id": "http://www.grid.ac/institutes/grid.10306.34", 
          "name": [
            "Wellcome Trust Sanger Institute, Wellcome Trust Campus, CB10 1SA, Hinxton, Cambridge, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Frankish", 
        "givenName": "Adam", 
        "id": "sg:person.0736671011.81", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0736671011.81"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland", 
          "id": "http://www.grid.ac/institutes/grid.9851.5", 
          "name": [
            "Department of Genetic Medicine and Development, University of Geneva Medical School and University Hospitals of Geneva, Geneva, Switzerland", 
            "Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Reymond", 
        "givenName": "Alexandre", 
        "id": "sg:person.01251740635.19", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01251740635.19"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Wellcome Trust Sanger Institute, Wellcome Trust Campus, CB10 1SA, Hinxton, Cambridge, UK", 
          "id": "http://www.grid.ac/institutes/grid.10306.34", 
          "name": [
            "Wellcome Trust Sanger Institute, Wellcome Trust Campus, CB10 1SA, Hinxton, Cambridge, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Chen", 
        "givenName": "Chao-Kung", 
        "id": "sg:person.01150516171.95", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01150516171.95"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland", 
          "id": "http://www.grid.ac/institutes/grid.9851.5", 
          "name": [
            "Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Chrast", 
        "givenName": "Jacqueline", 
        "id": "sg:person.0601046127.18", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0601046127.18"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Genetic Medicine and Development, University of Geneva Medical School and University Hospitals of Geneva, Geneva, Switzerland", 
          "id": "http://www.grid.ac/institutes/grid.150338.c", 
          "name": [
            "Department of Genetic Medicine and Development, University of Geneva Medical School and University Hospitals of Geneva, Geneva, Switzerland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Lagarde", 
        "givenName": "Julien", 
        "id": "sg:person.01022622516.05", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01022622516.05"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Wellcome Trust Sanger Institute, Wellcome Trust Campus, CB10 1SA, Hinxton, Cambridge, UK", 
          "id": "http://www.grid.ac/institutes/grid.10306.34", 
          "name": [
            "Wellcome Trust Sanger Institute, Wellcome Trust Campus, CB10 1SA, Hinxton, Cambridge, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Gilbert", 
        "givenName": "James GR", 
        "id": "sg:person.07416746422.35", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07416746422.35"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Wellcome Trust Sanger Institute, Wellcome Trust Campus, CB10 1SA, Hinxton, Cambridge, UK", 
          "id": "http://www.grid.ac/institutes/grid.10306.34", 
          "name": [
            "Wellcome Trust Sanger Institute, Wellcome Trust Campus, CB10 1SA, Hinxton, Cambridge, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Storey", 
        "givenName": "Roy", 
        "id": "sg:person.0664613663.50", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0664613663.50"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Wellcome Trust Sanger Institute, Wellcome Trust Campus, CB10 1SA, Hinxton, Cambridge, UK", 
          "id": "http://www.grid.ac/institutes/grid.10306.34", 
          "name": [
            "Wellcome Trust Sanger Institute, Wellcome Trust Campus, CB10 1SA, Hinxton, Cambridge, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Swarbreck", 
        "givenName": "David", 
        "id": "sg:person.01333722575.09", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01333722575.09"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Genetic Medicine and Development, University of Geneva Medical School and University Hospitals of Geneva, Geneva, Switzerland", 
          "id": "http://www.grid.ac/institutes/grid.150338.c", 
          "name": [
            "Department of Genetic Medicine and Development, University of Geneva Medical School and University Hospitals of Geneva, Geneva, Switzerland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Rossier", 
        "givenName": "Colette", 
        "id": "sg:person.0754631237.56", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0754631237.56"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Genetic Medicine and Development, University of Geneva Medical School and University Hospitals of Geneva, Geneva, Switzerland", 
          "id": "http://www.grid.ac/institutes/grid.150338.c", 
          "name": [
            "Department of Genetic Medicine and Development, University of Geneva Medical School and University Hospitals of Geneva, Geneva, Switzerland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Ucla", 
        "givenName": "Catherine", 
        "id": "sg:person.01124164272.05", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01124164272.05"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Grup de Recerca en Informatica Biomedica, Institut Municipal d'Informatica Medica-Universitat Pompeu Fabra, Pg. Maritim de la Barceloneta, 08003, Barcelona, Catalonia, Spain", 
          "id": "http://www.grid.ac/institutes/grid.5612.0", 
          "name": [
            "Grup de Recerca en Informatica Biomedica, Institut Municipal d'Informatica Medica-Universitat Pompeu Fabra, Pg. Maritim de la Barceloneta, 08003, Barcelona, Catalonia, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Hubbard", 
        "givenName": "Tim", 
        "id": "sg:person.01036742770.11", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01036742770.11"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Genetic Medicine and Development, University of Geneva Medical School and University Hospitals of Geneva, Geneva, Switzerland", 
          "id": "http://www.grid.ac/institutes/grid.150338.c", 
          "name": [
            "Department of Genetic Medicine and Development, University of Geneva Medical School and University Hospitals of Geneva, Geneva, Switzerland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Antonarakis", 
        "givenName": "Stylianos E", 
        "id": "sg:person.014566311317.54", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014566311317.54"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Centre de Regulacio Genomica, Pg. Maritim de la Barceloneta, 08003, Barcelona, Catalonia, Spain", 
          "id": "http://www.grid.ac/institutes/grid.11478.3b", 
          "name": [
            "Centre de Regulacio Genomica, Pg. Maritim de la Barceloneta, 08003, Barcelona, Catalonia, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Guigo", 
        "givenName": "Roderic", 
        "id": "sg:person.01347214467.09", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01347214467.09"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/414865a", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1030604127", 
          "https://doi.org/10.1038/414865a"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmeth733", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1038506370", 
          "https://doi.org/10.1038/nmeth733"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nature03001", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013534924", 
          "https://doi.org/10.1038/nature03001"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/gb-2006-7-s1-s2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1048969371", 
          "https://doi.org/10.1186/gb-2006-7-s1-s2"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s00335-002-4002-5", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013569495", 
          "https://doi.org/10.1007/s00335-002-4002-5"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2006-08-07", 
    "datePublishedReg": "2006-08-07", 
    "description": "BackgroundThe GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium and a refinement of the annotation based on these experimental results.ResultsThe GENCODE gene features are divided into eight different categories of which only the first two (known and novel coding sequence) are confidently predicted to be protein-coding genes. 5' rapid amplification of cDNA ends (RACE) and RT-PCR were used to experimentally verify the initial annotation. Of the 420 coding loci tested, 229 RACE products have been sequenced. They supported 5' extensions of 30 loci and new splice variants in 50 loci. In addition, 46 loci without evidence for a coding sequence were validated, consisting of 31 novel and 15 putative transcripts. We assessed the comprehensiveness of the GENCODE annotation by attempting to validate all the predicted exon boundaries outside the GENCODE annotation. Out of 1,215 tested in a subset of the ENCODE regions, 14 novel exon pairs were validated, only two of them in intergenic regions.ConclusionIn total, 487 loci, of which 434 are coding, have been annotated as part of the GENCODE reference set available from the UCSC browser. Comparison of GENCODE annotation with RefSeq and ENSEMBL show only 40% of GENCODE exons are contained within the two sets, which is a reflection of the high number of alternative splice forms with unique exons annotated. Over 50% of coding loci have been experimentally verified by 5' RACE for EGASP and the GENCODE collaboration is continuing to refine its annotation of 1% human genome with the aid of experimental validation.", 
    "genre": "article", 
    "id": "sg:pub.10.1186/gb-2006-7-s1-s4", 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1023439", 
        "issn": [
          "1474-760X", 
          "1465-6906"
        ], 
        "name": "Genome Biology", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "Suppl 1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "7"
      }
    ], 
    "keywords": [
      "protein-coding genes", 
      "GENCODE annotation", 
      "GENCODE consortium", 
      "ENCODE regions", 
      "alternative splice forms", 
      "new splice variant", 
      "putative transcripts", 
      "intergenic region", 
      "exon pairs", 
      "UCSC browser", 
      "human genome", 
      "rapid amplification", 
      "cDNA ends", 
      "exon boundaries", 
      "splice forms", 
      "unique exons", 
      "gene features", 
      "loci", 
      "splice variants", 
      "initial annotation", 
      "reference annotations", 
      "annotation", 
      "exons", 
      "genes", 
      "RT-PCR", 
      "Ensembl", 
      "genome", 
      "RefSeq", 
      "Encodes", 
      "transcripts", 
      "higher number", 
      "consortium", 
      "region", 
      "sequence", 
      "experimental validation", 
      "amplification", 
      "variants", 
      "manual annotation", 
      "pairs", 
      "subset", 
      "evidence", 
      "browser", 
      "addition", 
      "products", 
      "number", 
      "form", 
      "combination", 
      "novel", 
      "validation", 
      "part", 
      "end", 
      "features", 
      "total", 
      "results", 
      "comparison", 
      "set", 
      "different categories", 
      "extension", 
      "categories", 
      "boundaries", 
      "reference", 
      "experimental results", 
      "refinement", 
      "aid", 
      "comprehensiveness", 
      "reflection", 
      "collaboration", 
      "team"
    ], 
    "name": "GENCODE: producing a reference annotation for ENCODE", 
    "pagination": "s4", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1003630706"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/gb-2006-7-s1-s4"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "16925838"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/gb-2006-7-s1-s4", 
      "https://app.dimensions.ai/details/publication/pub.1003630706"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-12-01T06:25", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20221201/entities/gbq_results/article/article_411.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1186/gb-2006-7-s1-s4"
  }
]
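
As a minimal sketch of how a record of this shape might be consumed, the Python below pulls the author names and the identifiers out of a JSON-LD array like the one above. The `RECORD_JSON` literal is a trimmed, hand-copied excerpt (two of the fifteen authors and the three `productId` entries), and the helper names `author_names` and `product_ids` are illustrative assumptions, not part of any SciGraph client.

```python
import json

# Trimmed excerpt of the record above; only the fields used below are kept.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Harrow", "givenName": "Jennifer", "type": "Person"},
      {"familyName": "Denoeud", "givenName": "France", "type": "Person"}
    ],
    "name": "GENCODE: producing a reference annotation for ENCODE",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1003630706"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/gb-2006-7-s1-s4"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["16925838"]}
    ],
    "type": "ScholarlyArticle"
  }
]
"""


def author_names(article):
    """Return 'givenName familyName' strings in the order the record lists them."""
    return [f'{a["givenName"]} {a["familyName"]}' for a in article.get("author", [])]


def product_ids(article):
    """Map each productId name (doi, pubmed_id, ...) to its first value."""
    return {
        p["name"]: p["value"][0]
        for p in article.get("productId", [])
        if p.get("value")
    }


# The top level is a JSON array holding a single article object.
article = json.loads(RECORD_JSON)[0]
names = author_names(article)
ids = product_ids(article)
```

Because JSON-LD is plain JSON, the standard `json` module is enough here; a dedicated JSON-LD processor would only be needed to expand the `@context` into full IRIs.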
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/gb-2006-7-s1-s4'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/gb-2006-7-s1-s4'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/gb-2006-7-s1-s4'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/gb-2006-7-s1-s4'
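
The four curl commands above differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. This is an illustration under assumptions: the short format keys, `build_request`, and `fetch` are hypothetical names chosen here, the response is assumed to be UTF-8 text, and the sketch does not check whether the endpoint is still being served.

```python
from urllib.request import Request, urlopen

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/gb-2006-7-s1-s4"

# Media types copied from the four curl commands above.
# The dictionary keys are illustrative labels, not SciGraph parameters.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Return a urllib Request that asks for `fmt` via the Accept header."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"unknown format: {fmt!r}")
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the body as text (assumes UTF-8)."""
    with urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch("turtle")` would issue the equivalent of the third curl command; building the request separately from sending it keeps the header logic checkable without network access.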


 

This table displays all metadata directly associated with this object as RDF triples.

312 TRIPLES      21 PREDICATES      111 URIs      98 LITERALS      20 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/gb-2006-7-s1-s4 schema:about N3d5f93475a4b4bd0a21e3d4f44fb4a82
2 N43155248ee014ce3911bbec56f28fed7
3 N53addfb90693406cb30bd60c538de892
4 N5527ae1972314842a48ec15956b62df8
5 N724b1e8e23b5476f80cb5b0d22522a49
6 N78895132b53444dc9c1532aaaf666bf6
7 N9374c04a0c5645529ae80b8a02963249
8 N9af9faf4e8354c88bcfb7c70a8ac39f9
9 Nc95e7aa52aa548c2916d35820f5f4574
10 Nce297c9c195f47c581192c2e8a51023a
11 Ne0edf36465014ba19accea0f4cd0318d
12 Nf806b6a7c6074e12b8dfff6f700e1f21
13 Nfea1bc7d88734e83953b6dc0746a9633
14 anzsrc-for:06
15 anzsrc-for:0604
16 schema:author N1f7624c40bfe48fa96e6080dcdde6934
17 schema:citation sg:pub.10.1007/s00335-002-4002-5
18 sg:pub.10.1038/414865a
19 sg:pub.10.1038/nature03001
20 sg:pub.10.1038/nmeth733
21 sg:pub.10.1186/gb-2006-7-s1-s2
22 schema:datePublished 2006-08-07
23 schema:datePublishedReg 2006-08-07
24 schema:description BackgroundThe GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium and a refinement of the annotation based on these experimental results.ResultsThe GENCODE gene features are divided into eight different categories of which only the first two (known and novel coding sequence) are confidently predicted to be protein-coding genes. 5' rapid amplification of cDNA ends (RACE) and RT-PCR were used to experimentally verify the initial annotation. Of the 420 coding loci tested, 229 RACE products have been sequenced. They supported 5' extensions of 30 loci and new splice variants in 50 loci. In addition, 46 loci without evidence for a coding sequence were validated, consisting of 31 novel and 15 putative transcripts. We assessed the comprehensiveness of the GENCODE annotation by attempting to validate all the predicted exon boundaries outside the GENCODE annotation. Out of 1,215 tested in a subset of the ENCODE regions, 14 novel exon pairs were validated, only two of them in intergenic regions.ConclusionIn total, 487 loci, of which 434 are coding, have been annotated as part of the GENCODE reference set available from the UCSC browser. Comparison of GENCODE annotation with RefSeq and ENSEMBL show only 40% of GENCODE exons are contained within the two sets, which is a reflection of the high number of alternative splice forms with unique exons annotated. Over 50% of coding loci have been experimentally verified by 5' RACE for EGASP and the GENCODE collaboration is continuing to refine its annotation of 1% human genome with the aid of experimental validation.
25 schema:genre article
26 schema:isAccessibleForFree true
27 schema:isPartOf Nc7709dd5de244c5caf279a688879e318
28 Ne8ed0f865fb149f697da36af8cd55eb5
29 sg:journal.1023439
30 schema:keywords ENCODE regions
31 Encodes
32 Ensembl
33 GENCODE annotation
34 GENCODE consortium
35 RT-PCR
36 RefSeq
37 UCSC browser
38 addition
39 aid
40 alternative splice forms
41 amplification
42 annotation
43 boundaries
44 browser
45 cDNA ends
46 categories
47 collaboration
48 combination
49 comparison
50 comprehensiveness
51 consortium
52 different categories
53 end
54 evidence
55 exon boundaries
56 exon pairs
57 exons
58 experimental results
59 experimental validation
60 extension
61 features
62 form
63 gene features
64 genes
65 genome
66 higher number
67 human genome
68 initial annotation
69 intergenic region
70 loci
71 manual annotation
72 new splice variant
73 novel
74 number
75 pairs
76 part
77 products
78 protein-coding genes
79 putative transcripts
80 rapid amplification
81 reference
82 reference annotations
83 refinement
84 reflection
85 region
86 results
87 sequence
88 set
89 splice forms
90 splice variants
91 subset
92 team
93 total
94 transcripts
95 unique exons
96 validation
97 variants
98 schema:name GENCODE: producing a reference annotation for ENCODE
99 schema:pagination s4
100 schema:productId N146c79f311344295869a715e41780969
101 Na1866f7f5d5c40ee85f8659c35504630
102 Ne46817dc06234d308985ef3520f0df6f
103 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003630706
104 https://doi.org/10.1186/gb-2006-7-s1-s4
105 schema:sdDatePublished 2022-12-01T06:25
106 schema:sdLicense https://scigraph.springernature.com/explorer/license/
107 schema:sdPublisher N58a3ddd954074131b936fb1aca55263a
108 schema:url https://doi.org/10.1186/gb-2006-7-s1-s4
109 sgo:license sg:explorer/license/
110 sgo:sdDataset articles
111 rdf:type schema:ScholarlyArticle
112 N146c79f311344295869a715e41780969 schema:name dimensions_id
113 schema:value pub.1003630706
114 rdf:type schema:PropertyValue
115 N1afb1355e60e48ea966bb8a145c32060 rdf:first sg:person.01150516171.95
116 rdf:rest N982d32c9ca6d4f9e803e87bf2d909a58
117 N1f7624c40bfe48fa96e6080dcdde6934 rdf:first sg:person.013015267207.78
118 rdf:rest Ncb75d47bbd5e4ef2bba790b75b15d298
119 N27744b1f44cf4448991f99cdb6dfe51d rdf:first sg:person.0754631237.56
120 rdf:rest Na9b5dfc58b7a4379ac384d7051270d6e
121 N2fc7ac2383eb439ba41b794811369c02 rdf:first sg:person.01333722575.09
122 rdf:rest N27744b1f44cf4448991f99cdb6dfe51d
123 N3d5f93475a4b4bd0a21e3d4f44fb4a82 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
124 schema:name Reference Standards
125 rdf:type schema:DefinedTerm
126 N43155248ee014ce3911bbec56f28fed7 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
127 schema:name Humans
128 rdf:type schema:DefinedTerm
129 N53addfb90693406cb30bd60c538de892 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
130 schema:name Expressed Sequence Tags
131 rdf:type schema:DefinedTerm
132 N5527ae1972314842a48ec15956b62df8 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
133 schema:name RNA, Messenger
134 rdf:type schema:DefinedTerm
135 N5697f5c51ec04da186e8fb9f4e12a10d rdf:first sg:person.01251740635.19
136 rdf:rest N1afb1355e60e48ea966bb8a145c32060
137 N58a3ddd954074131b936fb1aca55263a schema:name Springer Nature - SN SciGraph project
138 rdf:type schema:Organization
139 N724b1e8e23b5476f80cb5b0d22522a49 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
140 schema:name Sequence Analysis, DNA
141 rdf:type schema:DefinedTerm
142 N78895132b53444dc9c1532aaaf666bf6 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
143 schema:name Genome, Human
144 rdf:type schema:DefinedTerm
145 N7cc1251c42e54b35a4cfa68ddd554bca rdf:first sg:person.014566311317.54
146 rdf:rest Nca885c3397cd43ebac5c117452a2363d
147 N89c455a6648546d7b185bf2b32d4c8a2 rdf:first sg:person.0736671011.81
148 rdf:rest N5697f5c51ec04da186e8fb9f4e12a10d
149 N9374c04a0c5645529ae80b8a02963249 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
150 schema:name Pseudogenes
151 rdf:type schema:DefinedTerm
152 N982d32c9ca6d4f9e803e87bf2d909a58 rdf:first sg:person.0601046127.18
153 rdf:rest Nb073934d3f59473ebbbb1e946820d203
154 N9af9faf4e8354c88bcfb7c70a8ac39f9 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
155 schema:name Computational Biology
156 rdf:type schema:DefinedTerm
157 Na1866f7f5d5c40ee85f8659c35504630 schema:name doi
158 schema:value 10.1186/gb-2006-7-s1-s4
159 rdf:type schema:PropertyValue
160 Na9b5dfc58b7a4379ac384d7051270d6e rdf:first sg:person.01124164272.05
161 rdf:rest Nd13605e25c6647e192d4c5c132f479b8
162 Nb073934d3f59473ebbbb1e946820d203 rdf:first sg:person.01022622516.05
163 rdf:rest Nb8780d3ee405472fbe9b8f58bc57c9ac
164 Nb8780d3ee405472fbe9b8f58bc57c9ac rdf:first sg:person.07416746422.35
165 rdf:rest Ndd9df20f4bca458ba5a66b4549d65d3e
166 Nc7709dd5de244c5caf279a688879e318 schema:issueNumber Suppl 1
167 rdf:type schema:PublicationIssue
168 Nc95e7aa52aa548c2916d35820f5f4574 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
169 schema:name Chromosome Mapping
170 rdf:type schema:DefinedTerm
171 Nca885c3397cd43ebac5c117452a2363d rdf:first sg:person.01347214467.09
172 rdf:rest rdf:nil
173 Ncb75d47bbd5e4ef2bba790b75b15d298 rdf:first sg:person.0772061470.31
174 rdf:rest N89c455a6648546d7b185bf2b32d4c8a2
175 Nce297c9c195f47c581192c2e8a51023a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
176 schema:name Genomics
177 rdf:type schema:DefinedTerm
178 Nd13605e25c6647e192d4c5c132f479b8 rdf:first sg:person.01036742770.11
179 rdf:rest N7cc1251c42e54b35a4cfa68ddd554bca
180 Ndd9df20f4bca458ba5a66b4549d65d3e rdf:first sg:person.0664613663.50
181 rdf:rest N2fc7ac2383eb439ba41b794811369c02
182 Ne0edf36465014ba19accea0f4cd0318d schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
183 schema:name Sequence Analysis, RNA
184 rdf:type schema:DefinedTerm
185 Ne46817dc06234d308985ef3520f0df6f schema:name pubmed_id
186 schema:value 16925838
187 rdf:type schema:PropertyValue
188 Ne8ed0f865fb149f697da36af8cd55eb5 schema:volumeNumber 7
189 rdf:type schema:PublicationVolume
190 Nf806b6a7c6074e12b8dfff6f700e1f21 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
191 schema:name Proteins
192 rdf:type schema:DefinedTerm
193 Nfea1bc7d88734e83953b6dc0746a9633 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
194 schema:name Genes
195 rdf:type schema:DefinedTerm
196 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
197 schema:name Biological Sciences
198 rdf:type schema:DefinedTerm
199 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
200 schema:name Genetics
201 rdf:type schema:DefinedTerm
202 sg:journal.1023439 schema:issn 1465-6906
203 1474-760X
204 schema:name Genome Biology
205 schema:publisher Springer Nature
206 rdf:type schema:Periodical
207 sg:person.01022622516.05 schema:affiliation grid-institutes:grid.150338.c
208 schema:familyName Lagarde
209 schema:givenName Julien
210 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01022622516.05
211 rdf:type schema:Person
212 sg:person.01036742770.11 schema:affiliation grid-institutes:grid.5612.0
213 schema:familyName Hubbard
214 schema:givenName Tim
215 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01036742770.11
216 rdf:type schema:Person
217 sg:person.01124164272.05 schema:affiliation grid-institutes:grid.150338.c
218 schema:familyName Ucla
219 schema:givenName Catherine
220 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01124164272.05
221 rdf:type schema:Person
222 sg:person.01150516171.95 schema:affiliation grid-institutes:grid.10306.34
223 schema:familyName Chen
224 schema:givenName Chao-Kung
225 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01150516171.95
226 rdf:type schema:Person
227 sg:person.01251740635.19 schema:affiliation grid-institutes:grid.9851.5
228 schema:familyName Reymond
229 schema:givenName Alexandre
230 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01251740635.19
231 rdf:type schema:Person
232 sg:person.013015267207.78 schema:affiliation grid-institutes:grid.10306.34
233 schema:familyName Harrow
234 schema:givenName Jennifer
235 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013015267207.78
236 rdf:type schema:Person
237 sg:person.01333722575.09 schema:affiliation grid-institutes:grid.10306.34
238 schema:familyName Swarbreck
239 schema:givenName David
240 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01333722575.09
241 rdf:type schema:Person
242 sg:person.01347214467.09 schema:affiliation grid-institutes:grid.11478.3b
243 schema:familyName Guigo
244 schema:givenName Roderic
245 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01347214467.09
246 rdf:type schema:Person
247 sg:person.014566311317.54 schema:affiliation grid-institutes:grid.150338.c
248 schema:familyName Antonarakis
249 schema:givenName Stylianos E
250 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014566311317.54
251 rdf:type schema:Person
252 sg:person.0601046127.18 schema:affiliation grid-institutes:grid.9851.5
253 schema:familyName Chrast
254 schema:givenName Jacqueline
255 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0601046127.18
256 rdf:type schema:Person
257 sg:person.0664613663.50 schema:affiliation grid-institutes:grid.10306.34
258 schema:familyName Storey
259 schema:givenName Roy
260 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0664613663.50
261 rdf:type schema:Person
262 sg:person.0736671011.81 schema:affiliation grid-institutes:grid.10306.34
263 schema:familyName Frankish
264 schema:givenName Adam
265 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0736671011.81
266 rdf:type schema:Person
267 sg:person.07416746422.35 schema:affiliation grid-institutes:grid.10306.34
268 schema:familyName Gilbert
269 schema:givenName James GR
270 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07416746422.35
271 rdf:type schema:Person
272 sg:person.0754631237.56 schema:affiliation grid-institutes:grid.150338.c
273 schema:familyName Rossier
274 schema:givenName Colette
275 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0754631237.56
276 rdf:type schema:Person
277 sg:person.0772061470.31 schema:affiliation grid-institutes:grid.5612.0
278 schema:familyName Denoeud
279 schema:givenName France
280 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0772061470.31
281 rdf:type schema:Person
282 sg:pub.10.1007/s00335-002-4002-5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013569495
283 https://doi.org/10.1007/s00335-002-4002-5
284 rdf:type schema:CreativeWork
285 sg:pub.10.1038/414865a schema:sameAs https://app.dimensions.ai/details/publication/pub.1030604127
286 https://doi.org/10.1038/414865a
287 rdf:type schema:CreativeWork
288 sg:pub.10.1038/nature03001 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013534924
289 https://doi.org/10.1038/nature03001
290 rdf:type schema:CreativeWork
291 sg:pub.10.1038/nmeth733 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038506370
292 https://doi.org/10.1038/nmeth733
293 rdf:type schema:CreativeWork
294 sg:pub.10.1186/gb-2006-7-s1-s2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048969371
295 https://doi.org/10.1186/gb-2006-7-s1-s2
296 rdf:type schema:CreativeWork
297 grid-institutes:grid.10306.34 schema:alternateName Wellcome Trust Sanger Institute, Wellcome Trust Campus, CB10 1SA, Hinxton, Cambridge, UK
298 schema:name Wellcome Trust Sanger Institute, Wellcome Trust Campus, CB10 1SA, Hinxton, Cambridge, UK
299 rdf:type schema:Organization
300 grid-institutes:grid.11478.3b schema:alternateName Centre de Regulacio Genomica, Pg. Maritim de la Barceloneta, 08003, Barcelona, Catalonia, Spain
301 schema:name Centre de Regulacio Genomica, Pg. Maritim de la Barceloneta, 08003, Barcelona, Catalonia, Spain
302 rdf:type schema:Organization
303 grid-institutes:grid.150338.c schema:alternateName Department of Genetic Medicine and Development, University of Geneva Medical School and University Hospitals of Geneva, Geneva, Switzerland
304 schema:name Department of Genetic Medicine and Development, University of Geneva Medical School and University Hospitals of Geneva, Geneva, Switzerland
305 rdf:type schema:Organization
306 grid-institutes:grid.5612.0 schema:alternateName Grup de Recerca en Informatica Biomedica, Institut Municipal d'Informatica Medica-Universitat Pompeu Fabra, Pg. Maritim de la Barceloneta, 08003, Barcelona, Catalonia, Spain
307 schema:name Grup de Recerca en Informatica Biomedica, Institut Municipal d'Informatica Medica-Universitat Pompeu Fabra, Pg. Maritim de la Barceloneta, 08003, Barcelona, Catalonia, Spain
308 rdf:type schema:Organization
309 grid-institutes:grid.9851.5 schema:alternateName Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
310 schema:name Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
311 Department of Genetic Medicine and Development, University of Geneva Medical School and University Hospitals of Geneva, Geneva, Switzerland
312 rdf:type schema:Organization
 



