CpGcluster: a distance-based algorithm for CpG-island detection


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2006-12

AUTHORS

Michael Hackenberg, Christopher Previti, Pedro Luis Luque-Escamilla, Pedro Carpena, José Martínez-Aroza, José L Oliver

ABSTRACT

BACKGROUND: Despite their involvement in the regulation of gene expression and their importance as genomic markers for promoter prediction, no objective standard exists for defining CpG islands (CGIs), since all current approaches rely on a large parameter space formed by thresholds of length, CpG fraction and G+C content.

RESULTS: Given the higher frequency of CpG dinucleotides at CGIs compared with bulk DNA, the distance distributions between neighboring CpGs should differ for bulk and island CpGs. A new algorithm (CpGcluster) is presented that is based on the physical distance between neighboring CpGs on the chromosome and predicts clusters of CpGs directly, without depending on the subjective criteria mentioned above. By assigning a p-value to each of these clusters, the most statistically significant ones can be predicted as CGIs. CpGcluster was benchmarked against five other CGI finders using a test sequence set assembled from an experimental CGI library. CpGcluster reached the highest overall accuracy values while showing the lowest rate of false-positive predictions. Since a minimum-length threshold is not required, CpGcluster can find short but fully functional CGIs usually missed by other algorithms. The CGIs predicted by CpGcluster show the lowest degree of overlap with Alu retrotransposons and, simultaneously, the highest overlap with vertebrate Phylogenetic Conserved Elements (PhastCons). CpGcluster's CGIs overlapping the Transcription Start Site (TSS) show the highest statistical significance compared with islands in other genome locations, which qualifies CpGcluster as a valuable tool for discriminating functional CGIs from the remaining islands in the bulk genome.

CONCLUSION: CpGcluster uses only integer arithmetic, which makes it a fast and computationally efficient algorithm able to predict statistically significant clusters of CpG dinucleotides. Another outstanding feature is that all predicted CGIs start and end with a CpG dinucleotide, which should be appropriate for a genomic feature whose functionality is based precisely on CpG dinucleotides. In contrast to previous algorithms, the only search parameter in CpGcluster is the distance between two consecutive CpGs. Therefore, none of the main statistical properties of CpG islands (G+C content, CpG fraction, or length threshold) is needed as a search parameter, which may account for the high specificity and low overlap with spurious Alu elements observed for CpGcluster predictions.
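The distance-based idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that neighbouring CpGs are merged whenever their start-to-start distance is at most a user-chosen `max_dist`, and that a cluster needs at least two CpGs. It also assumes a simple null model in which every position starts a CpG independently with a genome-wide probability `p_cpg`. The abstract does not state how CpGcluster picks the distance threshold or which distribution its p-values come from, so `max_dist`, `p_cpg`, the helper names and the binomial tail are all hypothetical. That binomial upper tail is equivalent to a negative-binomial waiting-time probability. The clustering step uses only integer comparisons. The p-value helper uses floating-point log-gamma for convenience, so the sketch is not integer-only end to end.

```python
import math


def cpg_positions(seq: str) -> list[int]:
    """Return the 0-based start of every CpG dinucleotide (a C followed by a G)."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


def cluster_cpgs(positions: list[int], max_dist: int) -> list[tuple[int, int, int]]:
    """Merge runs of neighbouring CpGs whose start-to-start distance is <= max_dist.

    Returns (start, end, n_cpg) tuples with `end` inclusive, so each cluster
    begins on the C of its first CpG and ends on the G of its last CpG.
    Requiring at least two CpGs per cluster is an assumption of this sketch.
    Only integer comparisons are used here.
    """
    clusters: list[tuple[int, int, int]] = []
    if not positions:
        return clusters
    first = prev = positions[0]
    n = 1
    for pos in positions[1:]:
        if pos - prev <= max_dist:
            n += 1  # still inside the current cluster
        else:
            if n >= 2:
                clusters.append((first, prev + 1, n))
            first, n = pos, 1  # start a new candidate cluster
        prev = pos
    if n >= 2:
        clusters.append((first, prev + 1, n))
    return clusters


def cluster_pvalue(length: int, n_cpg: int, p_cpg: float) -> float:
    """Upper tail P(X >= n_cpg) for X ~ Binomial(length, p_cpg), summed in log space.

    Hypothetical null model: each position starts a CpG independently with
    probability p_cpg (0 < p_cpg < 1). The tail equals the negative-binomial
    probability that the n_cpg-th CpG occurs within `length` trials.
    """
    log_p, log_q = math.log(p_cpg), math.log1p(-p_cpg)
    log_len_fact = math.lgamma(length + 1)
    total = 0.0
    for i in range(n_cpg, length + 1):
        total += math.exp(log_len_fact - math.lgamma(i + 1) - math.lgamma(length - i + 1)
                          + i * log_p + (length - i) * log_q)
    return min(total, 1.0)


if __name__ == "__main__":
    seq = "AATTCGCGCGAATTAATTAATTCGAA"
    for start, end, n in cluster_cpgs(cpg_positions(seq), max_dist=3):
        # p_cpg = 0.01 is a hypothetical genome-wide CpG frequency
        print(seq[start:end + 1], n, cluster_pvalue(end - start + 1, n, 0.01))
```

With `max_dist=3`, the three adjacent CpGs in `CGCGCG` form one cluster. The isolated CpG near the end of the sequence does not form a cluster. As the abstract notes for CpGcluster predictions, every reported cluster starts and ends on a CpG.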

PAGES

446

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/1471-2105-7-446

DOI

http://dx.doi.org/10.1186/1471-2105-7-446

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1023845443

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/17038168



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Algorithms", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Animals", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "CpG Islands", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genome", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Humans", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Mice", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of Granada", 
          "id": "https://www.grid.ac/institutes/grid.4489.1", 
          "name": [
            "Dpto. de Gen\u00e9tica, Facultad de Ciencias, Universidad de Granada, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Hackenberg", 
        "givenName": "Michael", 
        "id": "sg:person.01062552671.42", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01062552671.42"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "German Cancer Research Center", 
          "id": "https://www.grid.ac/institutes/grid.7497.d", 
          "name": [
            "Dpto. de Gen\u00e9tica, Facultad de Ciencias, Universidad de Granada, Spain", 
            "Dept. of Molecular Biophysics, German Cancer Research Center, Heidelberg, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Previti", 
        "givenName": "Christopher", 
        "id": "sg:person.01015743145.65", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01015743145.65"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Ja\u00e9n", 
          "id": "https://www.grid.ac/institutes/grid.21507.31", 
          "name": [
            "Dpto. de Ingenier\u00eda Mec\u00e1nica y Minera, Universidad de Ja\u00e9n, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Luque-Escamilla", 
        "givenName": "Pedro Luis", 
        "id": "sg:person.0745764413.01", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0745764413.01"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Malaga", 
          "id": "https://www.grid.ac/institutes/grid.10215.37", 
          "name": [
            "Dpto de F\u00edsica Aplicada II, Universidad de M\u00e1laga, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Carpena", 
        "givenName": "Pedro", 
        "id": "sg:person.01364440457.27", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01364440457.27"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Granada", 
          "id": "https://www.grid.ac/institutes/grid.4489.1", 
          "name": [
            "Dpto. de Matem\u00e1tica Aplicada, Facultad de Ciencias, Universidad de Granada, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Mart\u00ednez-Aroza", 
        "givenName": "Jos\u00e9", 
        "id": "sg:person.01014077613.93", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01014077613.93"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Granada", 
          "id": "https://www.grid.ac/institutes/grid.4489.1", 
          "name": [
            "Dpto. de Gen\u00e9tica, Facultad de Ciencias, Universidad de Granada, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Oliver", 
        "givenName": "Jos\u00e9 L", 
        "id": "sg:person.0643703414.30", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0643703414.30"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/ng1789", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1001417778", 
          "https://doi.org/10.1038/ng1789"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/hmg/10.7.687", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1005780142"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/11.3.647", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1006667160"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.87.12.4692", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1008531607"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pgen.0020017", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1012581342"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/0092-8674(90)90015-7", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1015429561"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/gkj129", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1016972616"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/321209a0", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1022096390", 
          "https://doi.org/10.1038/321209a0"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/0022-2836(87)90689-9", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1022754921"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/bth059", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023436066"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.052410099", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023462148"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s00018-003-3088-6", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023588237", 
          "https://doi.org/10.1007/s00018-003-3088-6"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.0510310103", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1026928995"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/79189", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1027828828", 
          "https://doi.org/10.1038/79189"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gad.947102", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1030570608"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nrc1507", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1033930047", 
          "https://doi.org/10.1038/nrc1507"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1006/geno.1996.0298", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1034621866"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.3430605", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1038482717"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/18.4.631", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1040587807"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1128/mcb.19.11.7327", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1040986154"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/10.23.7865", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1042383251"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0097-8485(02)00010-4", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1042530571"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.90.24.11995", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1045216410"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/ng886", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1046260941", 
          "https://doi.org/10.1038/ng886"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/0888-7543(92)90024-m", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1046950562"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.3715005", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1048048079"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0378-1119(01)00672-2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1048620165"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1006/dbio.2001.0560", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1048685870"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/gki582", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1049695730"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1103/physreve.71.061925", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1060733059"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://app.dimensions.ai/details/publication/pub.1074801105", 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://app.dimensions.ai/details/publication/pub.1075343242", 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/oxfordjournals.molbev.a040370", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1080006093"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1002/9780471650126.dob1121", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1090239983"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2006-12", 
    "datePublishedReg": "2006-12-01", 
    "description": "BACKGROUND: Despite their involvement in the regulation of gene expression and their importance as genomic markers for promoter prediction, no objective standard exists for defining CpG islands (CGIs), since all current approaches rely on a large parameter space formed by the thresholds of length, CpG fraction and G+C content.\nRESULTS: Given the higher frequency of CpG dinucleotides at CGIs, as compared to bulk DNA, the distance distributions between neighboring CpGs should differ for bulk and island CpGs. A new algorithm (CpGcluster) is presented, based on the physical distance between neighboring CpGs on the chromosome and able to predict directly clusters of CpGs, while not depending on the subjective criteria mentioned above. By assigning a p-value to each of these clusters, the most statistically significant ones can be predicted as CGIs. CpGcluster was benchmarked against five other CGI finders by using a test sequence set assembled from an experimental CGI library. CpGcluster reached the highest overall accuracy values, while showing the lowest rate of false-positive predictions. Since a minimum-length threshold is not required, CpGcluster can find short but fully functional CGIs usually missed by other algorithms. The CGIs predicted by CpGcluster present the lowest degree of overlap with Alu retrotransposons and, simultaneously, the highest overlap with vertebrate Phylogenetic Conserved Elements (PhastCons). CpGcluster's CGIs overlapping with the Transcription Start Site (TSS) show the highest statistical significance, as compared to the islands in other genome locations, thus qualifying CpGcluster as a valuable tool in discriminating functional CGIs from the remaining islands in the bulk genome.\nCONCLUSION: CpGcluster uses only integer arithmetic, thus being a fast and computationally efficient algorithm able to predict statistically significant clusters of CpG dinucleotides. Another outstanding feature is that all predicted CGIs start and end with a CpG dinucleotide, which should be appropriate for a genomic feature whose functionality is based precisely on CpG dinucleotides. The only search parameter in CpGcluster is the distance between two consecutive CpGs, in contrast to previous algorithms. Therefore, none of the main statistical properties of CpG islands (neither G+C content, CpG fraction nor length threshold) are needed as search parameters, which may lead to the high specificity and low overlap with spurious Alu elements observed for CpGcluster predictions.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1186/1471-2105-7-446", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1023786", 
        "issn": [
          "1471-2105"
        ], 
        "name": "BMC Bioinformatics", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "7"
      }
    ], 
    "name": "CpGcluster: a distance-based algorithm for CpG-island detection", 
    "pagination": "446", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "719a5c4077ea0b376c5048f640b44f48effc271278a17287bb5097ba7bfa9073"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "17038168"
        ]
      }, 
      {
        "name": "nlm_unique_id", 
        "type": "PropertyValue", 
        "value": [
          "100965194"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/1471-2105-7-446"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1023845443"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/1471-2105-7-446", 
      "https://app.dimensions.ai/details/publication/pub.1023845443"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-11T09:31", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000346_0000000346/records_99803_00000001.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://link.springer.com/10.1186%2F1471-2105-7-446"
  }
]
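JSON-LD is ordinary JSON on the wire, so a record like the one above can be read with Python's standard `json` module. A full JSON-LD processor would additionally expand the `@context` terms, but this sketch skips that step. The sketch below works on a trimmed excerpt of the record. It pulls out the title, the ordered author names, the DOI and PubMed identifiers, and the cited works. `dict.fromkeys` keeps the first occurrence of each citation `id` in case a record lists the same work more than once. The `summarize` helper and its output keys are illustrative names, not part of SciGraph.

```python
import json

# Trimmed excerpt of the SciGraph record above (most fields dropped for brevity).
RECORD = """[{
  "name": "CpGcluster: a distance-based algorithm for CpG-island detection",
  "author": [
    {"givenName": "Michael", "familyName": "Hackenberg", "type": "Person"},
    {"givenName": "Christopher", "familyName": "Previti", "type": "Person"}
  ],
  "citation": [
    {"id": "sg:pub.10.1038/ng1789", "type": "CreativeWork"},
    {"id": "sg:pub.10.1038/ng1789", "type": "CreativeWork"},
    {"id": "https://doi.org/10.1093/hmg/10.7.687", "type": "CreativeWork"}
  ],
  "productId": [
    {"name": "pubmed_id", "type": "PropertyValue", "value": ["17038168"]},
    {"name": "doi", "type": "PropertyValue", "value": ["10.1186/1471-2105-7-446"]}
  ]
}]"""


def summarize(jsonld_text: str) -> dict:
    """Extract a few bibliographic fields from a SciGraph JSON-LD array."""
    record = json.loads(jsonld_text)[0]  # the record is a one-element array
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record["author"]]
    # productId is a list of name/value pairs; each value is a one-element list
    ids = {p["name"]: p["value"][0] for p in record["productId"]}
    # keep citation ids in order, dropping repeated entries
    citations = list(dict.fromkeys(c["id"] for c in record["citation"]))
    return {
        "title": record["name"],
        "authors": authors,
        "doi": ids.get("doi"),
        "pubmed": ids.get("pubmed_id"),
        "citations": citations,
    }


if __name__ == "__main__":
    print(summarize(RECORD))
```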
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-7-446'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-7-446'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-7-446'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-7-446'
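The four curl commands above rely on HTTP content negotiation. They all request the same URL, and the `Accept` header selects the RDF serialization. The same requests can be sketched with Python's standard `urllib.request`. The `FORMATS` keys and the `build_request`/`fetch` helpers are hypothetical names, and this page does not guarantee that the SciGraph endpoint still serves the record. For that reason the request is built separately from sending it. `build_request` can be inspected offline, and only `fetch` touches the network.

```python
import urllib.request

SCIGRAPH_URL = "https://scigraph.springernature.com/pub.10.1186/1471-2105-7-446"

# Accept header values copied from the curl commands above.
FORMATS = {
    "jsonld": "application/ld+json",
    "ntriples": "application/n-triples",
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = SCIGRAPH_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request asking for one RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": FORMATS[fmt]})


def fetch(fmt: str, timeout: float = 30.0) -> str:
    """Send the request and return the decoded body (requires network access)."""
    with urllib.request.urlopen(build_request(fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    req = build_request("turtle")
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```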


 

This table displays all metadata directly associated with this object as RDF triples.

244 TRIPLES      21 PREDICATES      69 URIs      27 LITERALS      15 BLANK NODES
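In the table, the ordered author list is stored as an RDF collection of blank nodes. `schema:author` points to the first node. Each node's `rdf:first` holds one author, and its `rdf:rest` points to the next node, until `rdf:nil` ends the list. The following Python sketch walks that chain using identifiers copied from the table. The plain dictionaries are a hand-built stand-in for a real RDF parser, and the helper name is hypothetical.

```python
# Head of the list, from the schema:author triple of the article.
HEAD = "N7673762db89b4574822500868e43eb11"

# blank node -> (rdf:first, rdf:rest), copied from the triples table
AUTHOR_LIST = {
    "N7673762db89b4574822500868e43eb11": ("sg:person.01062552671.42", "N9aef60054d7d48218dd749bd1bbf11fe"),
    "N9aef60054d7d48218dd749bd1bbf11fe": ("sg:person.01015743145.65", "Nb3f615438a2d4ca4bf02031e31008295"),
    "Nb3f615438a2d4ca4bf02031e31008295": ("sg:person.0745764413.01", "Nbfcc8cf9d98848d8bc133d34b4a0e3e0"),
    "Nbfcc8cf9d98848d8bc133d34b4a0e3e0": ("sg:person.01364440457.27", "N09db213071d8411492993571e01b88a6"),
    "N09db213071d8411492993571e01b88a6": ("sg:person.01014077613.93", "Nb21fe09ddb3a4e8c9fff31fde9927fc5"),
    "Nb21fe09ddb3a4e8c9fff31fde9927fc5": ("sg:person.0643703414.30", "rdf:nil"),
}

# person -> schema:familyName, copied from the table
FAMILY_NAME = {
    "sg:person.01062552671.42": "Hackenberg",
    "sg:person.01015743145.65": "Previti",
    "sg:person.0745764413.01": "Luque-Escamilla",
    "sg:person.01364440457.27": "Carpena",
    "sg:person.01014077613.93": "Martínez-Aroza",
    "sg:person.0643703414.30": "Oliver",
}


def walk_rdf_list(head: str, cells: dict[str, tuple[str, str]]) -> list[str]:
    """Follow rdf:first/rdf:rest links from `head` until rdf:nil."""
    items, node, seen = [], head, set()
    while node != "rdf:nil":
        if node in seen:  # guard against malformed, cyclic lists
            raise ValueError(f"cycle in RDF list at {node}")
        seen.add(node)
        first, node = cells[node]
        items.append(first)
    return items


if __name__ == "__main__":
    print([FAMILY_NAME[p] for p in walk_rdf_list(HEAD, AUTHOR_LIST)])
```

The recovered order matches the author order given in the article info at the top of this record.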

Subject Predicate Object
1 sg:pub.10.1186/1471-2105-7-446 schema:about N3291e1776684419ca2023ef896e2cbd8
2 N754c867bde83463d9c94db2e504ca69c
3 N8dbb53104bda4686b5f05b1a2bd72c74
4 Nb8df1565ba0a472a92ed31e2a40e1c7c
5 Nbd997a90b2704e4a9a9af39bb505140f
6 Nd47fe8c3480c4f9fa952e1220bec531f
7 anzsrc-for:06
8 anzsrc-for:0604
9 schema:author N7673762db89b4574822500868e43eb11
10 schema:citation sg:pub.10.1007/s00018-003-3088-6
11 sg:pub.10.1038/321209a0
12 sg:pub.10.1038/79189
13 sg:pub.10.1038/ng1789
14 sg:pub.10.1038/ng886
15 sg:pub.10.1038/nrc1507
16 https://app.dimensions.ai/details/publication/pub.1074801105
17 https://app.dimensions.ai/details/publication/pub.1075343242
18 https://doi.org/10.1002/9780471650126.dob1121
19 https://doi.org/10.1006/dbio.2001.0560
20 https://doi.org/10.1006/geno.1996.0298
21 https://doi.org/10.1016/0022-2836(87)90689-9
22 https://doi.org/10.1016/0092-8674(90)90015-7
23 https://doi.org/10.1016/0888-7543(92)90024-m
24 https://doi.org/10.1016/s0097-8485(02)00010-4
25 https://doi.org/10.1016/s0378-1119(01)00672-2
26 https://doi.org/10.1073/pnas.0510310103
27 https://doi.org/10.1073/pnas.052410099
28 https://doi.org/10.1073/pnas.87.12.4692
29 https://doi.org/10.1073/pnas.90.24.11995
30 https://doi.org/10.1093/bioinformatics/18.4.631
31 https://doi.org/10.1093/bioinformatics/bth059
32 https://doi.org/10.1093/hmg/10.7.687
33 https://doi.org/10.1093/nar/10.23.7865
34 https://doi.org/10.1093/nar/11.3.647
35 https://doi.org/10.1093/nar/gki582
36 https://doi.org/10.1093/nar/gkj129
37 https://doi.org/10.1093/oxfordjournals.molbev.a040370
38 https://doi.org/10.1101/gad.947102
39 https://doi.org/10.1101/gr.3430605
40 https://doi.org/10.1101/gr.3715005
41 https://doi.org/10.1103/physreve.71.061925
42 https://doi.org/10.1128/mcb.19.11.7327
43 https://doi.org/10.1371/journal.pgen.0020017
44 schema:datePublished 2006-12
45 schema:datePublishedReg 2006-12-01
46 schema:description BACKGROUND: Despite their involvement in the regulation of gene expression and their importance as genomic markers for promoter prediction, no objective standard exists for defining CpG islands (CGIs), since all current approaches rely on a large parameter space formed by the thresholds of length, CpG fraction and G+C content. RESULTS: Given the higher frequency of CpG dinucleotides at CGIs, as compared to bulk DNA, the distance distributions between neighboring CpGs should differ for bulk and island CpGs. A new algorithm (CpGcluster) is presented, based on the physical distance between neighboring CpGs on the chromosome and able to predict directly clusters of CpGs, while not depending on the subjective criteria mentioned above. By assigning a p-value to each of these clusters, the most statistically significant ones can be predicted as CGIs. CpGcluster was benchmarked against five other CGI finders by using a test sequence set assembled from an experimental CGI library. CpGcluster reached the highest overall accuracy values, while showing the lowest rate of false-positive predictions. Since a minimum-length threshold is not required, CpGcluster can find short but fully functional CGIs usually missed by other algorithms. The CGIs predicted by CpGcluster present the lowest degree of overlap with Alu retrotransposons and, simultaneously, the highest overlap with vertebrate Phylogenetic Conserved Elements (PhastCons). CpGcluster's CGIs overlapping with the Transcription Start Site (TSS) show the highest statistical significance, as compared to the islands in other genome locations, thus qualifying CpGcluster as a valuable tool in discriminating functional CGIs from the remaining islands in the bulk genome. CONCLUSION: CpGcluster uses only integer arithmetic, thus being a fast and computationally efficient algorithm able to predict statistically significant clusters of CpG dinucleotides. Another outstanding feature is that all predicted CGIs start and end with a CpG dinucleotide, which should be appropriate for a genomic feature whose functionality is based precisely on CpG dinucleotides. The only search parameter in CpGcluster is the distance between two consecutive CpGs, in contrast to previous algorithms. Therefore, none of the main statistical properties of CpG islands (neither G+C content, CpG fraction nor length threshold) are needed as search parameters, which may lead to the high specificity and low overlap with spurious Alu elements observed for CpGcluster predictions.
47 schema:genre research_article
48 schema:inLanguage en
49 schema:isAccessibleForFree true
50 schema:isPartOf N24dc09873af047a69ec768a23d8dfaf5
51 N8f8027fc36ed4b59903d76dab33a61cf
52 sg:journal.1023786
53 schema:name CpGcluster: a distance-based algorithm for CpG-island detection
54 schema:pagination 446
55 schema:productId N2f5bfc9abaec429a9fd8cab5140cb544
56 N513f78d1c2e94ea8872868a7a02b689a
57 N9eb6533a0b6e4b13914f08e1516d9dbc
58 Ndd807cf3a4f846e5bfe69007a4ad5b91
59 Ne8126fd5f297447694506a6fb3fe5a05
60 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023845443
61 https://doi.org/10.1186/1471-2105-7-446
62 schema:sdDatePublished 2019-04-11T09:31
63 schema:sdLicense https://scigraph.springernature.com/explorer/license/
64 schema:sdPublisher N1ee7539d3a534d4283bb24ebbcbb0186
65 schema:url https://link.springer.com/10.1186%2F1471-2105-7-446
66 sgo:license sg:explorer/license/
67 sgo:sdDataset articles
68 rdf:type schema:ScholarlyArticle
69 N09db213071d8411492993571e01b88a6 rdf:first sg:person.01014077613.93
70 rdf:rest Nb21fe09ddb3a4e8c9fff31fde9927fc5
71 N1ee7539d3a534d4283bb24ebbcbb0186 schema:name Springer Nature - SN SciGraph project
72 rdf:type schema:Organization
73 N24dc09873af047a69ec768a23d8dfaf5 schema:issueNumber 1
74 rdf:type schema:PublicationIssue
75 N2f5bfc9abaec429a9fd8cab5140cb544 schema:name nlm_unique_id
76 schema:value 100965194
77 rdf:type schema:PropertyValue
78 N3291e1776684419ca2023ef896e2cbd8 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
79 schema:name Humans
80 rdf:type schema:DefinedTerm
81 N513f78d1c2e94ea8872868a7a02b689a schema:name pubmed_id
82 schema:value 17038168
83 rdf:type schema:PropertyValue
84 N754c867bde83463d9c94db2e504ca69c schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
85 schema:name Mice
86 rdf:type schema:DefinedTerm
87 N7673762db89b4574822500868e43eb11 rdf:first sg:person.01062552671.42
88 rdf:rest N9aef60054d7d48218dd749bd1bbf11fe
89 N8dbb53104bda4686b5f05b1a2bd72c74 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
90 schema:name Animals
91 rdf:type schema:DefinedTerm
92 N8f8027fc36ed4b59903d76dab33a61cf schema:volumeNumber 7
93 rdf:type schema:PublicationVolume
94 N9aef60054d7d48218dd749bd1bbf11fe rdf:first sg:person.01015743145.65
95 rdf:rest Nb3f615438a2d4ca4bf02031e31008295
96 N9eb6533a0b6e4b13914f08e1516d9dbc schema:name dimensions_id
97 schema:value pub.1023845443
98 rdf:type schema:PropertyValue
99 Nb21fe09ddb3a4e8c9fff31fde9927fc5 rdf:first sg:person.0643703414.30
100 rdf:rest rdf:nil
101 Nb3f615438a2d4ca4bf02031e31008295 rdf:first sg:person.0745764413.01
102 rdf:rest Nbfcc8cf9d98848d8bc133d34b4a0e3e0
103 Nb8df1565ba0a472a92ed31e2a40e1c7c schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
104 schema:name Genome
105 rdf:type schema:DefinedTerm
106 Nbd997a90b2704e4a9a9af39bb505140f schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
107 schema:name CpG Islands
108 rdf:type schema:DefinedTerm
109 Nbfcc8cf9d98848d8bc133d34b4a0e3e0 rdf:first sg:person.01364440457.27
110 rdf:rest N09db213071d8411492993571e01b88a6
111 Nd47fe8c3480c4f9fa952e1220bec531f schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
112 schema:name Algorithms
113 rdf:type schema:DefinedTerm
114 Ndd807cf3a4f846e5bfe69007a4ad5b91 schema:name readcube_id
115 schema:value 719a5c4077ea0b376c5048f640b44f48effc271278a17287bb5097ba7bfa9073
116 rdf:type schema:PropertyValue
117 Ne8126fd5f297447694506a6fb3fe5a05 schema:name doi
118 schema:value 10.1186/1471-2105-7-446
119 rdf:type schema:PropertyValue
120 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
121 schema:name Biological Sciences
122 rdf:type schema:DefinedTerm
123 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
124 schema:name Genetics
125 rdf:type schema:DefinedTerm
126 sg:journal.1023786 schema:issn 1471-2105
127 schema:name BMC Bioinformatics
128 rdf:type schema:Periodical
129 sg:person.01014077613.93 schema:affiliation https://www.grid.ac/institutes/grid.4489.1
130 schema:familyName Martínez-Aroza
131 schema:givenName José
132 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01014077613.93
133 rdf:type schema:Person
134 sg:person.01015743145.65 schema:affiliation https://www.grid.ac/institutes/grid.7497.d
135 schema:familyName Previti
136 schema:givenName Christopher
137 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01015743145.65
138 rdf:type schema:Person
139 sg:person.01062552671.42 schema:affiliation https://www.grid.ac/institutes/grid.4489.1
140 schema:familyName Hackenberg
141 schema:givenName Michael
142 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01062552671.42
143 rdf:type schema:Person
144 sg:person.01364440457.27 schema:affiliation https://www.grid.ac/institutes/grid.10215.37
145 schema:familyName Carpena
146 schema:givenName Pedro
147 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01364440457.27
148 rdf:type schema:Person
149 sg:person.0643703414.30 schema:affiliation https://www.grid.ac/institutes/grid.4489.1
150 schema:familyName Oliver
151 schema:givenName José L
152 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0643703414.30
153 rdf:type schema:Person
154 sg:person.0745764413.01 schema:affiliation https://www.grid.ac/institutes/grid.21507.31
155 schema:familyName Luque-Escamilla
156 schema:givenName Pedro Luis
157 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0745764413.01
158 rdf:type schema:Person
159 sg:pub.10.1007/s00018-003-3088-6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023588237
160 https://doi.org/10.1007/s00018-003-3088-6
161 rdf:type schema:CreativeWork
162 sg:pub.10.1038/321209a0 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022096390
163 https://doi.org/10.1038/321209a0
164 rdf:type schema:CreativeWork
165 sg:pub.10.1038/79189 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027828828
166 https://doi.org/10.1038/79189
167 rdf:type schema:CreativeWork
168 sg:pub.10.1038/ng1789 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001417778
169 https://doi.org/10.1038/ng1789
170 rdf:type schema:CreativeWork
171 sg:pub.10.1038/ng886 schema:sameAs https://app.dimensions.ai/details/publication/pub.1046260941
172 https://doi.org/10.1038/ng886
173 rdf:type schema:CreativeWork
174 sg:pub.10.1038/nrc1507 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033930047
175 https://doi.org/10.1038/nrc1507
176 rdf:type schema:CreativeWork
177 https://app.dimensions.ai/details/publication/pub.1074801105 schema:CreativeWork
178 https://app.dimensions.ai/details/publication/pub.1075343242 schema:CreativeWork
179 https://doi.org/10.1002/9780471650126.dob1121 schema:sameAs https://app.dimensions.ai/details/publication/pub.1090239983
180 rdf:type schema:CreativeWork
181 https://doi.org/10.1006/dbio.2001.0560 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048685870
182 rdf:type schema:CreativeWork
183 https://doi.org/10.1006/geno.1996.0298 schema:sameAs https://app.dimensions.ai/details/publication/pub.1034621866
184 rdf:type schema:CreativeWork
185 https://doi.org/10.1016/0022-2836(87)90689-9 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022754921
186 rdf:type schema:CreativeWork
187 https://doi.org/10.1016/0092-8674(90)90015-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015429561
188 rdf:type schema:CreativeWork
189 https://doi.org/10.1016/0888-7543(92)90024-m schema:sameAs https://app.dimensions.ai/details/publication/pub.1046950562
190 rdf:type schema:CreativeWork
191 https://doi.org/10.1016/s0097-8485(02)00010-4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042530571
192 rdf:type schema:CreativeWork
193 https://doi.org/10.1016/s0378-1119(01)00672-2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048620165
194 rdf:type schema:CreativeWork
195 https://doi.org/10.1073/pnas.0510310103 schema:sameAs https://app.dimensions.ai/details/publication/pub.1026928995
196 rdf:type schema:CreativeWork
197 https://doi.org/10.1073/pnas.052410099 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023462148
198 rdf:type schema:CreativeWork
199 https://doi.org/10.1073/pnas.87.12.4692 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008531607
200 rdf:type schema:CreativeWork
201 https://doi.org/10.1073/pnas.90.24.11995 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045216410
202 rdf:type schema:CreativeWork
203 https://doi.org/10.1093/bioinformatics/18.4.631 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040587807
204 rdf:type schema:CreativeWork
205 https://doi.org/10.1093/bioinformatics/bth059 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023436066
206 rdf:type schema:CreativeWork
207 https://doi.org/10.1093/hmg/10.7.687 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005780142
208 rdf:type schema:CreativeWork
209 https://doi.org/10.1093/nar/10.23.7865 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042383251
210 rdf:type schema:CreativeWork
211 https://doi.org/10.1093/nar/11.3.647 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006667160
212 rdf:type schema:CreativeWork
213 https://doi.org/10.1093/nar/gki582 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049695730
214 rdf:type schema:CreativeWork
215 https://doi.org/10.1093/nar/gkj129 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016972616
216 rdf:type schema:CreativeWork
217 https://doi.org/10.1093/oxfordjournals.molbev.a040370 schema:sameAs https://app.dimensions.ai/details/publication/pub.1080006093
218 rdf:type schema:CreativeWork
219 https://doi.org/10.1101/gad.947102 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030570608
220 rdf:type schema:CreativeWork
221 https://doi.org/10.1101/gr.3430605 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038482717
222 rdf:type schema:CreativeWork
223 https://doi.org/10.1101/gr.3715005 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048048079
224 rdf:type schema:CreativeWork
225 https://doi.org/10.1103/physreve.71.061925 schema:sameAs https://app.dimensions.ai/details/publication/pub.1060733059
226 rdf:type schema:CreativeWork
227 https://doi.org/10.1128/mcb.19.11.7327 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040986154
228 rdf:type schema:CreativeWork
229 https://doi.org/10.1371/journal.pgen.0020017 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012581342
230 rdf:type schema:CreativeWork
231 https://www.grid.ac/institutes/grid.10215.37 schema:alternateName University of Malaga
232 schema:name Dpto de Física Aplicada II, Universidad de Málaga, Spain
233 rdf:type schema:Organization
234 https://www.grid.ac/institutes/grid.21507.31 schema:alternateName University of Jaén
235 schema:name Dpto. de Ingeniería Mecánica y Minera, Universidad de Jaén, Spain
236 rdf:type schema:Organization
237 https://www.grid.ac/institutes/grid.4489.1 schema:alternateName University of Granada
238 schema:name Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, Spain
239 Dpto. de Matemática Aplicada, Facultad de Ciencias, Universidad de Granada, Spain
240 rdf:type schema:Organization
241 https://www.grid.ac/institutes/grid.7497.d schema:alternateName German Cancer Research Center
242 schema:name Dept. of Molecular Biophysics, German Cancer Research Center, Heidelberg, Germany
243 Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, Spain
244 rdf:type schema:Organization