Efficient counting of k-mers in DNA sequences using a bloom filter


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2011-12

AUTHORS

Páll Melsted, Jonathan K Pritchard

ABSTRACT

BACKGROUND: Counting k-mers (substrings of length k in DNA sequence data) is an essential component of many methods in bioinformatics, including for genome and transcriptome assembly, for metagenomic sequencing, and for error correction of sequence reads. Although simple in principle, counting k-mers in large modern sequence data sets can easily overwhelm the memory capacity of standard computers. In current data sets, a large fraction (often more than 50%) of the storage capacity may be spent on storing k-mers that contain sequencing errors and which are typically observed only a single time in the data. These singleton k-mers are uninformative for many algorithms without some kind of error correction.

RESULTS: We present a new method that identifies all the k-mers that occur more than once in a DNA sequence data set. Our method does this using a Bloom filter, a probabilistic data structure that stores all the observed k-mers implicitly in memory with greatly reduced memory requirements. We then make a second sweep through the data to provide exact counts of all nonunique k-mers. For example data sets, we report up to 50% savings in memory usage compared to current software, with modest costs in computational speed. This approach may reduce memory requirements for any algorithm that starts by counting k-mers in sequence data with errors.

CONCLUSIONS: A reference implementation for this methodology, BFCounter, is written in C++ and is GPL licensed. It is available for free download at http://pritch.bsd.uchicago.edu/bfcounter.html.
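
The two-sweep idea in the abstract can be illustrated with a small Python sketch. This is not BFCounter's code (the reference implementation is C++); the `BloomFilter` class, its bit and hash counts, and the function names are assumptions chosen for illustration. The sketch treats k-mers as plain strings, ignores reverse complements, and expects `reads` to be a list so that it can be traversed twice.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter backed by a bytearray (sizes are illustrative only)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # One salted BLAKE2b digest per hash function gives one bit position each.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(read, k):
    """Yield every substring of length k in the read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_repeated_kmers(reads, k):
    """Return exact counts for k-mers that occur more than once in reads."""
    bloom = BloomFilter()
    candidates = {}
    # First sweep: a k-mer the filter already reports has probably been seen
    # before, so it becomes a candidate; otherwise it is only recorded in the filter.
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in bloom:
                candidates[kmer] = 0
            else:
                bloom.add(kmer)
    # Second sweep: exact counts, but only for the candidates.
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in candidates:
                candidates[kmer] += 1
    # Bloom filter false positives show up as candidates with count 1; drop them.
    return {kmer: n for kmer, n in candidates.items() if n > 1}
```

Because every k-mer seen twice is guaranteed to be a candidate by its second occurrence, and false positives are removed after the second sweep, the returned counts are exact even though the filter itself is probabilistic. The memory saving comes from singleton k-mers occupying only a few bits in the filter instead of a full hash-table entry.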

PAGES

333

References to SciGraph publications

  • 2010-11. Quake: quality-aware detection and correction of sequencing errors in GENOME BIOLOGY
  • 2005. Non-blocking Hashtables with Open Addressing in DISTRIBUTED COMPUTING
  • 2010-01. The sequence and de novo assembly of the giant panda genome in NATURE
  • 2010-10-28. A map of human genome variation from population-scale sequencing in NATURE
  • 2007-10. Biosequence Similarity Search on the Mercury System in JOURNAL OF SIGNAL PROCESSING SYSTEMS
  • 2002-08-23. Counting Distinct Elements in a Data Stream in RANDOMIZATION AND APPROXIMATION TECHNIQUES IN COMPUTER SCIENCE

Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1186/1471-2105-12-333

    DOI

    http://dx.doi.org/10.1186/1471-2105-12-333

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1041214578

    PUBMED

    https://www.ncbi.nlm.nih.gov/pubmed/21831268



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Genetics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Biological Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Algorithms", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Computational Biology", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Computers", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "HapMap Project", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Humans", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Probability", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Sequence Analysis, DNA", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Software", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "University of Chicago", 
              "id": "https://www.grid.ac/institutes/grid.170205.1", 
              "name": [
                "Department of Human Genetics, The University of Chicago, 60637, Chicago, IL, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Melsted", 
            "givenName": "P\u00e1ll", 
            "id": "sg:person.01014100275.50", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01014100275.50"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Chicago", 
              "id": "https://www.grid.ac/institutes/grid.170205.1", 
              "name": [
                "Department of Human Genetics, The University of Chicago, 60637, Chicago, IL, USA", 
                "Howard Hughes Medical Institute, The University of Chicago, 60637, Chicago, IL, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Pritchard", 
            "givenName": "Jonathan K", 
            "id": "sg:person.01104516710.33", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01104516710.33"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.1101/gr.115402.110", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1000102598"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1073/pnas.1017351108", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1004253849"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/362686.362692", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1007357969"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1080/15427951.2004.10129096", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1008038469"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/11561927_10", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1009241384", 
              "https://doi.org/10.1007/11561927_10"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1073/pnas.171285098", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1010138766"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature09534", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1010608717", 
              "https://doi.org/10.1038/nature09534"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1101/gr.089532.108", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1011404279"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/3-540-45726-7_1", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1027531354", 
              "https://doi.org/10.1007/3-540-45726-7_1"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1371/journal.pone.0003376", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1028160249"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s11265-007-0087-0", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1030884755", 
              "https://doi.org/10.1007/s11265-007-0087-0"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/bioinformatics/btq697", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1031150767"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/bioinformatics/btr011", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1032486702"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1101/gr.7337908", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1035219026"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1101/gr.097261.109", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1036245120"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2010-11-11-r116", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1042567408", 
              "https://doi.org/10.1186/gb-2010-11-11-r116"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/bioinformatics/btq230", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1042907877"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature08696", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1044030989", 
              "https://doi.org/10.1038/nature08696"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1101/gr.074492.107", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1051720574"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1089/cmb.2009.0062", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1059245812"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/90.851975", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1061247530"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2011-12", 
        "datePublishedReg": "2011-12-01", 
        "description": "BACKGROUND: Counting k-mers (substrings of length k in DNA sequence data) is an essential component of many methods in bioinformatics, including for genome and transcriptome assembly, for metagenomic sequencing, and for error correction of sequence reads. Although simple in principle, counting k-mers in large modern sequence data sets can easily overwhelm the memory capacity of standard computers. In current data sets, a large fraction-often more than 50%-of the storage capacity may be spent on storing k-mers that contain sequencing errors and which are typically observed only a single time in the data. These singleton k-mers are uninformative for many algorithms without some kind of error correction.\nRESULTS: We present a new method that identifies all the k-mers that occur more than once in a DNA sequence data set. Our method does this using a Bloom filter, a probabilistic data structure that stores all the observed k-mers implicitly in memory with greatly reduced memory requirements. We then make a second sweep through the data to provide exact counts of all nonunique k-mers. For example data sets, we report up to 50% savings in memory usage compared to current software, with modest costs in computational speed. This approach may reduce memory requirements for any algorithm that starts by counting k-mers in sequence data with errors.\nCONCLUSIONS: A reference implementation for this methodology, BFCounter, is written in C++ and is GPL licensed. It is available for free download at http://pritch.bsd.uchicago.edu/bfcounter.html.", 
        "genre": "research_article", 
        "id": "sg:pub.10.1186/1471-2105-12-333", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isFundedItemOf": [
          {
            "id": "sg:grant.2551835", 
            "type": "MonetaryGrant"
          }
        ], 
        "isPartOf": [
          {
            "id": "sg:journal.1023786", 
            "issn": [
              "1471-2105"
            ], 
            "name": "BMC Bioinformatics", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "12"
          }
        ], 
        "name": "Efficient counting of k-mers in DNA sequences using a bloom filter", 
        "pagination": "333", 
        "productId": [
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "c4553f5454c53422b8e9d129255abb7949239b76b4ea84c3f2e89ef59f487bb9"
            ]
          }, 
          {
            "name": "pubmed_id", 
            "type": "PropertyValue", 
            "value": [
              "21831268"
            ]
          }, 
          {
            "name": "nlm_unique_id", 
            "type": "PropertyValue", 
            "value": [
              "100965194"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1186/1471-2105-12-333"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1041214578"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1186/1471-2105-12-333", 
          "https://app.dimensions.ai/details/publication/pub.1041214578"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2019-04-11T10:20", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000348_0000000348/records_54334_00000000.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://link.springer.com/10.1186%2F1471-2105-12-333"
      }
    ]
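
The record above is a JSON array holding a single object, so it can be read with any JSON parser. The following Python sketch pulls out a few fields from a trimmed excerpt of the record; the `summarize` function and the shape of its return value are assumptions made for illustration, while the key names (`name`, `author`, `productId`) are taken from the record itself.

```python
import json

# Trimmed excerpt of the record shown above (raw string keeps the \u escape intact).
RECORD_EXCERPT = r"""
[
  {
    "name": "Efficient counting of k-mers in DNA sequences using a bloom filter",
    "datePublished": "2011-12",
    "author": [
      {"familyName": "Melsted", "givenName": "P\u00e1ll", "type": "Person"},
      {"familyName": "Pritchard", "givenName": "Jonathan K", "type": "Person"}
    ],
    "productId": [
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["21831268"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/1471-2105-12-333"]}
    ]
  }
]
"""


def summarize(record_json):
    """Extract title, authors, and identifiers from a SciGraph-style record."""
    record = json.loads(record_json)[0]  # the top level is a one-element array
    authors = [f"{a['givenName']} {a['familyName']}" for a in record["author"]]
    # Each productId entry is a name plus a list of values; keep the first value.
    ids = {p["name"]: p["value"][0] for p in record["productId"]}
    return {
        "title": record["name"],
        "published": record["datePublished"],
        "authors": authors,
        "doi": ids.get("doi"),
        "pubmed_id": ids.get("pubmed_id"),
    }


if __name__ == "__main__":
    print(summarize(RECORD_EXCERPT))
```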
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-333'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-333'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-333'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-333'
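
The four curl commands differ only in the `Accept` header, so the same request can be sketched in Python with the standard library. The helper names `build_request` and `fetch` and the short format keys are assumptions for illustration; the URL and media types are the ones listed above. Whether the endpoint still responds has not been verified here.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/"

# Media types taken from the curl commands above; the short keys are hypothetical.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id, fmt="json-ld"):
    """Build a GET request that asks for the record in the given RDF format."""
    return urllib.request.Request(
        BASE_URL + record_id, headers={"Accept": ACCEPT_TYPES[fmt]}
    )


def fetch(record_id, fmt="json-ld", timeout=30):
    """Send the request and return the response body as text (needs network)."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(fetch("pub.10.1186/1471-2105-12-333", "turtle"))
```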


     

    This table displays all metadata directly associated with this object as RDF triples.

    179 TRIPLES      21 PREDICATES      58 URIs      29 LITERALS      17 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1186/1471-2105-12-333 schema:about N2917c61ff10745ee95ad434cbf5bdc29
    2 N3275111fab0f42a2a26f74590684f38b
    3 N39aa26c4fb9b41e394d44834499977fc
    4 N44b9c4ad115648858064b62f9d27b02a
    5 N45896750104c45b49d887f5e64d73b15
    6 N83c6076498564ff48e5b49de95a63a3d
    7 Na18c5ff1a88c45d68a1b5b44aa2d0a8c
    8 Ncfc31928c3bb43ddbf410fa9739c5292
    9 anzsrc-for:06
    10 anzsrc-for:0604
    11 schema:author N3a5a0cdb0b114605aecb42a25232d181
    12 schema:citation sg:pub.10.1007/11561927_10
    13 sg:pub.10.1007/3-540-45726-7_1
    14 sg:pub.10.1007/s11265-007-0087-0
    15 sg:pub.10.1038/nature08696
    16 sg:pub.10.1038/nature09534
    17 sg:pub.10.1186/gb-2010-11-11-r116
    18 https://doi.org/10.1073/pnas.1017351108
    19 https://doi.org/10.1073/pnas.171285098
    20 https://doi.org/10.1080/15427951.2004.10129096
    21 https://doi.org/10.1089/cmb.2009.0062
    22 https://doi.org/10.1093/bioinformatics/btq230
    23 https://doi.org/10.1093/bioinformatics/btq697
    24 https://doi.org/10.1093/bioinformatics/btr011
    25 https://doi.org/10.1101/gr.074492.107
    26 https://doi.org/10.1101/gr.089532.108
    27 https://doi.org/10.1101/gr.097261.109
    28 https://doi.org/10.1101/gr.115402.110
    29 https://doi.org/10.1101/gr.7337908
    30 https://doi.org/10.1109/90.851975
    31 https://doi.org/10.1145/362686.362692
    32 https://doi.org/10.1371/journal.pone.0003376
    33 schema:datePublished 2011-12
    34 schema:datePublishedReg 2011-12-01
    35 schema:description BACKGROUND: Counting k-mers (substrings of length k in DNA sequence data) is an essential component of many methods in bioinformatics, including for genome and transcriptome assembly, for metagenomic sequencing, and for error correction of sequence reads. Although simple in principle, counting k-mers in large modern sequence data sets can easily overwhelm the memory capacity of standard computers. In current data sets, a large fraction-often more than 50%-of the storage capacity may be spent on storing k-mers that contain sequencing errors and which are typically observed only a single time in the data. These singleton k-mers are uninformative for many algorithms without some kind of error correction. RESULTS: We present a new method that identifies all the k-mers that occur more than once in a DNA sequence data set. Our method does this using a Bloom filter, a probabilistic data structure that stores all the observed k-mers implicitly in memory with greatly reduced memory requirements. We then make a second sweep through the data to provide exact counts of all nonunique k-mers. For example data sets, we report up to 50% savings in memory usage compared to current software, with modest costs in computational speed. This approach may reduce memory requirements for any algorithm that starts by counting k-mers in sequence data with errors. CONCLUSIONS: A reference implementation for this methodology, BFCounter, is written in C++ and is GPL licensed. It is available for free download at http://pritch.bsd.uchicago.edu/bfcounter.html.
    36 schema:genre research_article
    37 schema:inLanguage en
    38 schema:isAccessibleForFree true
    39 schema:isPartOf N0f11f53a4d644daa8e379402a5d80b2f
    40 Na21701fde68b4ffb968461928c452e8d
    41 sg:journal.1023786
    42 schema:name Efficient counting of k-mers in DNA sequences using a bloom filter
    43 schema:pagination 333
    44 schema:productId N0bc95f8b1854445e9df63666e0eae03e
    45 N0c5482fd90aa4097ba3c6674404633b5
    46 N2167bdc2212c42cf87daeade1c89703d
    47 N6c8419ec70184effb6299bd8fe3bd414
    48 Ne4cc535a23c24d5db4348e3882d2a986
    49 schema:sameAs https://app.dimensions.ai/details/publication/pub.1041214578
    50 https://doi.org/10.1186/1471-2105-12-333
    51 schema:sdDatePublished 2019-04-11T10:20
    52 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    53 schema:sdPublisher N0f940fcd8b7c48f6bbf909aabbf41a53
    54 schema:url https://link.springer.com/10.1186%2F1471-2105-12-333
    55 sgo:license sg:explorer/license/
    56 sgo:sdDataset articles
    57 rdf:type schema:ScholarlyArticle
    58 N0bc95f8b1854445e9df63666e0eae03e schema:name dimensions_id
    59 schema:value pub.1041214578
    60 rdf:type schema:PropertyValue
    61 N0c5482fd90aa4097ba3c6674404633b5 schema:name pubmed_id
    62 schema:value 21831268
    63 rdf:type schema:PropertyValue
    64 N0f11f53a4d644daa8e379402a5d80b2f schema:volumeNumber 12
    65 rdf:type schema:PublicationVolume
    66 N0f940fcd8b7c48f6bbf909aabbf41a53 schema:name Springer Nature - SN SciGraph project
    67 rdf:type schema:Organization
    68 N1e46814cb084475bb552bc67819493a9 rdf:first sg:person.01104516710.33
    69 rdf:rest rdf:nil
    70 N2167bdc2212c42cf87daeade1c89703d schema:name readcube_id
    71 schema:value c4553f5454c53422b8e9d129255abb7949239b76b4ea84c3f2e89ef59f487bb9
    72 rdf:type schema:PropertyValue
    73 N2917c61ff10745ee95ad434cbf5bdc29 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    74 schema:name Probability
    75 rdf:type schema:DefinedTerm
    76 N3275111fab0f42a2a26f74590684f38b schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    77 schema:name Sequence Analysis, DNA
    78 rdf:type schema:DefinedTerm
    79 N39aa26c4fb9b41e394d44834499977fc schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    80 schema:name Humans
    81 rdf:type schema:DefinedTerm
    82 N3a5a0cdb0b114605aecb42a25232d181 rdf:first sg:person.01014100275.50
    83 rdf:rest N1e46814cb084475bb552bc67819493a9
    84 N44b9c4ad115648858064b62f9d27b02a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    85 schema:name HapMap Project
    86 rdf:type schema:DefinedTerm
    87 N45896750104c45b49d887f5e64d73b15 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    88 schema:name Computational Biology
    89 rdf:type schema:DefinedTerm
    90 N6c8419ec70184effb6299bd8fe3bd414 schema:name nlm_unique_id
    91 schema:value 100965194
    92 rdf:type schema:PropertyValue
    93 N83c6076498564ff48e5b49de95a63a3d schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    94 schema:name Software
    95 rdf:type schema:DefinedTerm
    96 Na18c5ff1a88c45d68a1b5b44aa2d0a8c schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    97 schema:name Algorithms
    98 rdf:type schema:DefinedTerm
    99 Na21701fde68b4ffb968461928c452e8d schema:issueNumber 1
    100 rdf:type schema:PublicationIssue
    101 Ncfc31928c3bb43ddbf410fa9739c5292 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    102 schema:name Computers
    103 rdf:type schema:DefinedTerm
    104 Ne4cc535a23c24d5db4348e3882d2a986 schema:name doi
    105 schema:value 10.1186/1471-2105-12-333
    106 rdf:type schema:PropertyValue
    107 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
    108 schema:name Biological Sciences
    109 rdf:type schema:DefinedTerm
    110 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
    111 schema:name Genetics
    112 rdf:type schema:DefinedTerm
    113 sg:grant.2551835 http://pending.schema.org/fundedItem sg:pub.10.1186/1471-2105-12-333
    114 rdf:type schema:MonetaryGrant
    115 sg:journal.1023786 schema:issn 1471-2105
    116 schema:name BMC Bioinformatics
    117 rdf:type schema:Periodical
    118 sg:person.01014100275.50 schema:affiliation https://www.grid.ac/institutes/grid.170205.1
    119 schema:familyName Melsted
    120 schema:givenName Páll
    121 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01014100275.50
    122 rdf:type schema:Person
    123 sg:person.01104516710.33 schema:affiliation https://www.grid.ac/institutes/grid.170205.1
    124 schema:familyName Pritchard
    125 schema:givenName Jonathan K
    126 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01104516710.33
    127 rdf:type schema:Person
    128 sg:pub.10.1007/11561927_10 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009241384
    129 https://doi.org/10.1007/11561927_10
    130 rdf:type schema:CreativeWork
    131 sg:pub.10.1007/3-540-45726-7_1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027531354
    132 https://doi.org/10.1007/3-540-45726-7_1
    133 rdf:type schema:CreativeWork
    134 sg:pub.10.1007/s11265-007-0087-0 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030884755
    135 https://doi.org/10.1007/s11265-007-0087-0
    136 rdf:type schema:CreativeWork
    137 sg:pub.10.1038/nature08696 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044030989
    138 https://doi.org/10.1038/nature08696
    139 rdf:type schema:CreativeWork
    140 sg:pub.10.1038/nature09534 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010608717
    141 https://doi.org/10.1038/nature09534
    142 rdf:type schema:CreativeWork
    143 sg:pub.10.1186/gb-2010-11-11-r116 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042567408
    144 https://doi.org/10.1186/gb-2010-11-11-r116
    145 rdf:type schema:CreativeWork
    146 https://doi.org/10.1073/pnas.1017351108 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004253849
    147 rdf:type schema:CreativeWork
    148 https://doi.org/10.1073/pnas.171285098 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010138766
    149 rdf:type schema:CreativeWork
    150 https://doi.org/10.1080/15427951.2004.10129096 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008038469
    151 rdf:type schema:CreativeWork
    152 https://doi.org/10.1089/cmb.2009.0062 schema:sameAs https://app.dimensions.ai/details/publication/pub.1059245812
    153 rdf:type schema:CreativeWork
    154 https://doi.org/10.1093/bioinformatics/btq230 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042907877
    155 rdf:type schema:CreativeWork
    156 https://doi.org/10.1093/bioinformatics/btq697 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031150767
    157 rdf:type schema:CreativeWork
    158 https://doi.org/10.1093/bioinformatics/btr011 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032486702
    159 rdf:type schema:CreativeWork
    160 https://doi.org/10.1101/gr.074492.107 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051720574
    161 rdf:type schema:CreativeWork
    162 https://doi.org/10.1101/gr.089532.108 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011404279
    163 rdf:type schema:CreativeWork
    164 https://doi.org/10.1101/gr.097261.109 schema:sameAs https://app.dimensions.ai/details/publication/pub.1036245120
    165 rdf:type schema:CreativeWork
    166 https://doi.org/10.1101/gr.115402.110 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000102598
    167 rdf:type schema:CreativeWork
    168 https://doi.org/10.1101/gr.7337908 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035219026
    169 rdf:type schema:CreativeWork
    170 https://doi.org/10.1109/90.851975 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061247530
    171 rdf:type schema:CreativeWork
    172 https://doi.org/10.1145/362686.362692 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007357969
    173 rdf:type schema:CreativeWork
    174 https://doi.org/10.1371/journal.pone.0003376 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028160249
    175 rdf:type schema:CreativeWork
    176 https://www.grid.ac/institutes/grid.170205.1 schema:alternateName University of Chicago
    177 schema:name Department of Human Genetics, The University of Chicago, 60637, Chicago, IL, USA
    178 Howard Hughes Medical Institute, The University of Chicago, 60637, Chicago, IL, USA
    179 rdf:type schema:Organization
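
In the table above, author order is encoded as an RDF list: `schema:author` points at a blank node (row 11), and each list node carries an `rdf:first` item and an `rdf:rest` link until `rdf:nil` (rows 68-69 and 82-83). A minimal Python sketch of walking that chain follows; the `NODES` dictionary and the `walk_list` function are illustrative assumptions, with node identifiers copied from those rows.

```python
RDF_NIL = "rdf:nil"

# List nodes copied from rows 68-69 and 82-83 of the triple table.
NODES = {
    "N3a5a0cdb0b114605aecb42a25232d181": {
        "rdf:first": "sg:person.01014100275.50",
        "rdf:rest": "N1e46814cb084475bb552bc67819493a9",
    },
    "N1e46814cb084475bb552bc67819493a9": {
        "rdf:first": "sg:person.01104516710.33",
        "rdf:rest": RDF_NIL,
    },
}


def walk_list(head, nodes):
    """Follow rdf:rest links from head and collect each rdf:first item in order."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(nodes[node]["rdf:first"])
        node = nodes[node]["rdf:rest"]
    return items


if __name__ == "__main__":
    # The head node is the object of schema:author in row 11.
    print(walk_list("N3a5a0cdb0b114605aecb42a25232d181", NODES))
```

The result lists Melsted's person identifier before Pritchard's, matching the author order in the article info.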
     



