Repetitive DNA and next-generation sequencing: computational challenges and solutions


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2011-11-29

AUTHORS

Todd J. Treangen, Steven L. Salzberg

ABSTRACT

Key Points

  • New high-throughput sequencing technologies have spurred explosive growth in the use of sequencing to discover mutations and structural variants in the human genome and in the number of projects to sequence and assemble new genomes.
  • Highly efficient algorithms have been developed to align next-generation sequences to genomes, and these algorithms use a variety of strategies to place repetitive reads.
  • Ambiguous mapping of sequences that are derived from repetitive regions makes it difficult to identify true polymorphisms and to reconstruct transcripts.
  • Short read lengths combined with mapping ambiguities lead to false reports of single-nucleotide polymorphisms, inserts, deletions and other sequence variants.
  • When assembling a genome de novo, repetitive sequences can lead to erroneous rearrangements, deletions, collapsed repeats and other assembly errors.
  • Long-range linking information from paired-end reads can overcome some of the difficulties in short-read assembly.

PAGES

36-46

References to SciGraph publications

  • 2008-05-30. Stem cell transcriptome profiling via massive-scale mRNA sequencing in NATURE METHODS
  • 2008-05-30. Mapping and quantifying mammalian transcriptomes by RNA-Seq in NATURE METHODS
  • 2009-03-04. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome in GENOME BIOLOGY
  • 2009-09-08. ChIP–seq: advantages and challenges of a maturing technology in NATURE REVIEWS GENETICS
  • 2009-08-30. Personalized copy number and segmental duplication maps using next-generation sequencing in NATURE GENETICS
  • 2011-06-20. Sniper: improved SNP discovery by multiply mapping deep sequenced reads in GENOME BIOLOGY
  • 2009-05-27. The 1001 Genomes Project for Arabidopsis thaliana in GENOME BIOLOGY
  • 2011-03-01. Genome structural variation discovery and genotyping in NATURE REVIEWS GENETICS
  • 2011-03-31. A vertebrate case study of the quality of assemblies derived from next-generation sequences in GENOME BIOLOGY
  • 2008-03-14. Genome assembly forensics: finding the elusive mis-assembly in GENOME BIOLOGY
  • 2011-07-27. An Archaeopteryx-like theropod from China and the origin of Avialae in NATURE
  • 2011-01-01. Integrative genomics viewer in NATURE BIOTECHNOLOGY
  • 2009-10-15. Computational methods for discovering structural variation with next-generation sequencing in NATURE METHODS
  • 2010-10-10. De novo assembly and analysis of RNA-seq data in NATURE METHODS
  • 2011-08-11. TopHat-Fusion: an algorithm for discovery of novel fusion transcripts in GENOME BIOLOGY
  • 2011-07-10. Genome sequence and analysis of the tuber crop potato in NATURE
  • 2010-04-01. State of the art de novo assembly of human genomes from massively parallel sequencing data in HUMAN GENOMICS
  • 2002-05. Alu repeats and human genomic diversity in NATURE REVIEWS GENETICS
  • 2011-03-17. The struggle for life of the genome's selfish architects in BIOLOGY DIRECT
  • 2010-10-27. A map of human genome variation from population-scale sequencing in NATURE
  • 2009-02-11. High tandem repeat content in the genome of the short-lived annual fish Nothobranchius furzeri: a new vertebrate model for aging research in GENOME BIOLOGY
  • 2010-09-17. Advances in understanding cancer genomes through second-generation sequencing in NATURE REVIEWS GENETICS
  • 2010-10-21. FusionSeq: a modular framework for finding gene fusions by analyzing paired-end RNA-sequencing data in GENOME BIOLOGY
  • 2011-05-25. rnaSeqMap: a Bioconductor package for RNA sequencing data exploration in BMC BIOINFORMATICS
  • 2010-05-02. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation in NATURE BIOTECHNOLOGY
  • 2011-05-15. Full-length transcriptome assembly from RNA-Seq data without a reference genome in NATURE BIOTECHNOLOGY
  • 2000-12. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana in NATURE
  • 2011-05-27. Computational methods for transcriptome annotation and quantification using RNA-seq in NATURE METHODS
  • 2010-11-21. Limitations of next-generation genome sequence assembly in NATURE METHODS
  • 2011-04-13. Assessing the benefits of using mate-pairs to resolve repeats in de novo short-read prokaryotic assemblies in BMC BIOINFORMATICS
  • 2011-04-10. A framework for variation discovery and genotyping using next-generation DNA sequencing data in NATURE GENETICS
Journal

    TITLE

    Nature Reviews Genetics

    ISSUE

    1

    VOLUME

    13

    Related Patents

  • Triazole-Based Reader Molecules And Methods For Synthesizing And Use Thereof
  • Accurate And Fast Mapping Of Reads To Genome
  • Devices With A Fluid Transport Nanochannel Intersected By A Fluid Sensing Nanochannel And Related Methods
  • Nanofluidic Devices For The Rapid Mapping Of Whole Genomes And Related Systems And Methods Of Analysis
  • Methods For Storing Digital Data As, And For Transforming Digital Data Into, Synthetic Dna
  • Transposable Elements, Tdp-43, And Neurodegenerative Disorders
  • Sequence Assembly
  • Modulating The Cellular Stress Response
  • Methods For Maintaining The Integrity And Identification Of A Nucleic Acid Template In A Multiplex Sequencing Reaction
  • Devices And Systems With Fluidic Nanofunnels For Processing Single Molecules
  • Biosynthetic Systems Producing Fungal Indole Alkaloids
  • Nanofluidic Devices With Integrated Components For The Controlled Capture, Trapping, And Transport Of Macromolecules And Related Methods Of Analysis
  • Method For Generating A Masked Reference Sequence Of The Y Chromosome
  • Accurate And Fast Mapping Of Targeted Sequencing Reads
  • Fluidic Devices With Nanoscale Manifolds For Molecular Transport, Related Systems And Methods Of Analysis
  • Method For Determining Copy Number Variations In Sex Chromosomes
  • Devices With Fluidic Nanofunnels, Associated Methods, Fabrication And Analysis Systems
Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1038/nrg3117

    DOI

    http://dx.doi.org/10.1038/nrg3117

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1044335233

    PUBMED

    https://www.ncbi.nlm.nih.gov/pubmed/22124482



    JSON-LD is the canonical representation for SciGraph data.

    TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground or Google SDTT.

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Biological Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Genetics", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Algorithms", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Animals", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Computational Biology", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "DNA", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Genome", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Humans", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Molecular Sequence Data", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Plants", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "RNA", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Repetitive Sequences, Nucleic Acid", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Reproducibility of Results", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Sequence Alignment", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Sequence Analysis, DNA", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Sequence Analysis, RNA", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Software", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "McKusick\u2013Nathans Institute for Genetic Medicine, Johns Hopkins University School of Medicine, 21205, Baltimore, Maryland, USA", 
              "id": "http://www.grid.ac/institutes/grid.21107.35", 
              "name": [
                "McKusick\u2013Nathans Institute for Genetic Medicine, Johns Hopkins University School of Medicine, 21205, Baltimore, Maryland, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Treangen", 
            "givenName": "Todd J.", 
            "id": "sg:person.01261673253.41", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01261673253.41"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 21205, Baltimore, Maryland, USA", 
              "id": "http://www.grid.ac/institutes/grid.21107.35", 
              "name": [
                "McKusick\u2013Nathans Institute for Genetic Medicine, Johns Hopkins University School of Medicine, 21205, Baltimore, Maryland, USA", 
                "Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 21205, Baltimore, Maryland, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Salzberg", 
            "givenName": "Steven L.", 
            "id": "sg:person.01223441713.02", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223441713.02"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1038/ng.437", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1035989827", 
              "https://doi.org/10.1038/ng.437"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature10158", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1002525491", 
              "https://doi.org/10.1038/nature10158"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2009-10-3-r25", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1049583368", 
              "https://doi.org/10.1186/gb-2009-10-3-r25"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt.1883", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1015803168", 
              "https://doi.org/10.1038/nbt.1883"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt.1754", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1019307928", 
              "https://doi.org/10.1038/nbt.1754"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2011-12-8-r72", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1006381787", 
              "https://doi.org/10.1186/gb-2011-12-8-r72"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature10288", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1025276362", 
              "https://doi.org/10.1038/nature10288"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2011-12-6-r55", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1018738511", 
              "https://doi.org/10.1186/gb-2011-12-6-r55"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nrg2641", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1006115199", 
              "https://doi.org/10.1038/nrg2641"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2008-9-3-r55", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1003390143", 
              "https://doi.org/10.1186/gb-2008-9-3-r55"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.1527", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1015617800", 
              "https://doi.org/10.1038/nmeth.1527"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2009-10-2-r16", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1032910724", 
              "https://doi.org/10.1186/gb-2009-10-2-r16"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt.1621", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1031035095", 
              "https://doi.org/10.1038/nbt.1621"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nrg798", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1025444771", 
              "https://doi.org/10.1038/nrg798"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nrg2841", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1017741162", 
              "https://doi.org/10.1038/nrg2841"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature09534", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1010608717", 
              "https://doi.org/10.1038/nature09534"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2011-12-3-r31", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1002683835", 
              "https://doi.org/10.1186/gb-2011-12-3-r31"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-12-200", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1040958434", 
              "https://doi.org/10.1186/1471-2105-12-200"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/35048692", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1044298669", 
              "https://doi.org/10.1038/35048692"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.1226", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1045381177", 
              "https://doi.org/10.1038/nmeth.1226"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-12-95", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1020547896", 
              "https://doi.org/10.1186/1471-2105-12-95"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/ng.806", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1010244476", 
              "https://doi.org/10.1038/ng.806"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1479-7364-4-4-271", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1028588465", 
              "https://doi.org/10.1186/1479-7364-4-4-271"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2009-10-5-107", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1043335129", 
              "https://doi.org/10.1186/gb-2009-10-5-107"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2010-11-10-r104", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1002592427", 
              "https://doi.org/10.1186/gb-2010-11-10-r104"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.1223", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1048586936", 
              "https://doi.org/10.1038/nmeth.1223"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.1374", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1019637093", 
              "https://doi.org/10.1038/nmeth.1374"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.1517", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1032102367", 
              "https://doi.org/10.1038/nmeth.1517"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nrg2958", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1004346662", 
              "https://doi.org/10.1038/nrg2958"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1745-6150-6-19", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1000128333", 
              "https://doi.org/10.1186/1745-6150-6-19"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.1613", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1009798986", 
              "https://doi.org/10.1038/nmeth.1613"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2011-11-29", 
        "datePublishedReg": "2011-11-29", 
        "description": "Key PointsNew high-throughput sequencing technologies have spurred explosive growth in the use of sequencing to discover mutations and structural variants in the human genome and in the number of projects to sequence and assemble new genomes.Highly efficient algorithms have been developed to align next-generation sequences to genomes, and these algorithms use a variety of strategies to place repetitive reads.Ambiguous mapping of sequences that are derived from repetitive regions makes it difficult to identify true polymorphisms and to reconstruct transcripts.Short read lengths combined with mapping ambiguities lead to false reports of single-nucleotide polymorphisms, inserts, deletions and other sequence variants.When assembling a genome de novo, repetitive sequences can lead to erroneous rearrangements, deletions, collapsed repeats and other assembly errors.Long-range linking information from paired-end reads can overcome some of the difficulties in short-read assembly.", 
        "genre": "article", 
        "id": "sg:pub.10.1038/nrg3117", 
        "isAccessibleForFree": true, 
        "isFundedItemOf": [
          {
            "id": "sg:grant.2519905", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.2529453", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.2529425", 
            "type": "MonetaryGrant"
          }
        ], 
        "isPartOf": [
          {
            "id": "sg:journal.1023607", 
            "issn": [
              "1471-0056", 
              "1471-0064"
            ], 
            "name": "Nature Reviews Genetics", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "13"
          }
        ], 
        "keywords": [
          "high-throughput sequencing technology", 
          "paired-end reads", 
          "short read assemblies", 
          "short read lengths", 
          "next-generation sequences", 
          "repetitive DNA", 
          "new genomes", 
          "human genome", 
          "single nucleotide polymorphisms", 
          "next-generation sequencing", 
          "repetitive sequences", 
          "sequencing technologies", 
          "true polymorphism", 
          "use of sequencing", 
          "repetitive regions", 
          "genome", 
          "structural variants", 
          "sequence variants", 
          "read length", 
          "repetitive reads", 
          "sequencing", 
          "sequence", 
          "deletion", 
          "reads", 
          "polymorphism", 
          "repeats", 
          "transcripts", 
          "variants", 
          "DNA", 
          "ambiguous mapping", 
          "mutations", 
          "novo", 
          "variety of strategies", 
          "assembly", 
          "assembly errors", 
          "rearrangement", 
          "growth", 
          "inserts", 
          "computational challenges", 
          "region", 
          "mapping", 
          "variety", 
          "length", 
          "number", 
          "strategies", 
          "lead", 
          "information", 
          "report", 
          "explosive growth", 
          "use", 
          "challenges", 
          "technology", 
          "project", 
          "number of projects", 
          "difficulties", 
          "solution", 
          "efficient algorithm", 
          "algorithm", 
          "error", 
          "false reports"
        ], 
        "name": "Repetitive DNA and next-generation sequencing: computational challenges and solutions", 
        "pagination": "36-46", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1044335233"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1038/nrg3117"
            ]
          }, 
          {
            "name": "pubmed_id", 
            "type": "PropertyValue", 
            "value": [
              "22124482"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1038/nrg3117", 
          "https://app.dimensions.ai/details/publication/pub.1044335233"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-10-01T06:37", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20221001/entities/gbq_results/article/article_541.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1038/nrg3117"
      }
    ]
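Because the record is plain JSON, it can be read with any JSON parser. The following is a minimal sketch, assuming a trimmed copy of the record above is available as a string; the helper name `product_ids` and the `EXCERPT` constant are illustrative and not part of SciGraph.

```python
import json

# Trimmed excerpt of the record shown above (only the fields used here).
EXCERPT = '''
[
  {
    "id": "sg:pub.10.1038/nrg3117",
    "name": "Repetitive DNA and next-generation sequencing: computational challenges and solutions",
    "pagination": "36-46",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1044335233"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1038/nrg3117"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["22124482"]}
    ]
  }
]
'''


def product_ids(record: dict) -> dict:
    """Map each productId name to its first value (hypothetical helper)."""
    return {
        entry["name"]: entry["value"][0]
        for entry in record.get("productId", [])
        if entry.get("value")
    }


records = json.loads(EXCERPT)  # the top level is a list holding one record
ids = product_ids(records[0])
print(ids["doi"], ids["pubmed_id"])
```

The sketch treats the document as ordinary JSON and does not expand the `@context`; a dedicated JSON-LD processor would be needed for that.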
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1038/nrg3117'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1038/nrg3117'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1038/nrg3117'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1038/nrg3117'
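The four curl commands differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The short format labels (`json-ld`, `nt`, `turtle`, `xml`) and the function names are assumptions made for this example; only the URL and media types come from the commands above.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1038/nrg3117"

# Media types taken from the curl commands; the keys are illustrative labels.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Return a request whose Accept header asks for the given RDF format."""
    if fmt not in ACCEPT:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(ACCEPT)}")
    return urllib.request.Request(url, headers={"Accept": ACCEPT[fmt]})


def fetch(fmt: str, url: str = RECORD_URL, timeout: float = 30.0) -> bytes:
    """Download the record in the requested format (performs a network call)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read()
```

Calling `fetch("turtle")` would issue the same request as the Turtle curl command; whether the service still answers at this address is not something the sketch can guarantee.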


     

    This table displays all metadata directly associated with this object as RDF triples.

    320 TRIPLES      21 PREDICATES      131 URIs      92 LITERALS      22 BLANK NODES
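Summary counts of this kind can be derived from an N-Triples export. The sketch below is a rough illustration only: how the page itself tallies its figures is not stated, so counting distinct terms (and counting predicates among the URIs) is an assumption, the regular expression covers only simple statements, and `SAMPLE` is a hand-written fragment rather than real output.

```python
import re

# Simplified pattern for one N-Triples statement; not a full grammar.
TRIPLE = re.compile(
    r'^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+'
    r'(<[^>]*>|_:\S+|".*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)\s*\.$'
)

# Hand-written illustrative fragment (assumed shape, not fetched data).
SAMPLE = '''
<http://scigraph.springernature.com/pub.10.1038/nrg3117> <http://schema.org/pagination> "36-46" .
<http://scigraph.springernature.com/pub.10.1038/nrg3117> <http://schema.org/productId> _:b1 .
_:b1 <http://schema.org/name> "doi" .
_:b1 <http://schema.org/value> "10.1038/nrg3117" .
_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/PropertyValue> .
'''


def summarize(ntriples: str) -> dict:
    """Count triples and distinct predicates, URIs, literals and blank nodes."""
    triples = 0
    predicates, uris, literals, blank_nodes = set(), set(), set(), set()
    for raw in ntriples.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"not a recognizable triple: {line!r}")
        subject, predicate, obj = match.groups()
        triples += 1
        predicates.add(predicate)
        for term in (subject, predicate, obj):
            if term.startswith("<"):
                uris.add(term)
            elif term.startswith("_:"):
                blank_nodes.add(term)
            else:
                literals.add(term)
    return {
        "triples": triples,
        "predicates": len(predicates),
        "uris": len(uris),
        "literals": len(literals),
        "blank_nodes": len(blank_nodes),
    }


stats = summarize(SAMPLE)
print(stats)
```

For real exports, an RDF library with a complete N-Triples parser would be a safer choice than a regular expression.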

    Row Subject Predicate Object
    1 sg:pub.10.1038/nrg3117 schema:about N034e23f9d8a948eaaf19f0b178f176a0
    2 N151a51f269c447a7927e9dc69947c77c
    3 N1e6bdc679a9946bd8185fc8dc195359d
    4 N35bb5894e5ee41c68a391feb9a0fa720
    5 N55e940eb1ba545158be67889457a3dfe
    6 N560b06266a4c4a4c8e079855483072b5
    7 N5ce9dd5236514929930c43bd653314e6
    8 N81b08f519c4b428ca8bd73decd1c4c2f
    9 N9650b677f6c64c58957f4d15c03a6273
    10 N9a7f62b09052471bb63236912e06b712
    11 Nb50508f8b81f4850848c0729e55354ae
    12 Nb9b312445ca043b6b05ed1aa014920a9
    13 Nd918bef78d804d8ab004ce8ab3314470
    14 Ne5cb374f43f44038a61310f1c0dbce79
    15 Nf75d6b21251c4240a1804801c7137b25
    16 anzsrc-for:06
    17 anzsrc-for:0604
    18 schema:author N26d4c56ac86f45d3a7f34b48404cf34d
    19 schema:citation sg:pub.10.1038/35048692
    20 sg:pub.10.1038/nature09534
    21 sg:pub.10.1038/nature10158
    22 sg:pub.10.1038/nature10288
    23 sg:pub.10.1038/nbt.1621
    24 sg:pub.10.1038/nbt.1754
    25 sg:pub.10.1038/nbt.1883
    26 sg:pub.10.1038/ng.437
    27 sg:pub.10.1038/ng.806
    28 sg:pub.10.1038/nmeth.1223
    29 sg:pub.10.1038/nmeth.1226
    30 sg:pub.10.1038/nmeth.1374
    31 sg:pub.10.1038/nmeth.1517
    32 sg:pub.10.1038/nmeth.1527
    33 sg:pub.10.1038/nmeth.1613
    34 sg:pub.10.1038/nrg2641
    35 sg:pub.10.1038/nrg2841
    36 sg:pub.10.1038/nrg2958
    37 sg:pub.10.1038/nrg798
    38 sg:pub.10.1186/1471-2105-12-200
    39 sg:pub.10.1186/1471-2105-12-95
    40 sg:pub.10.1186/1479-7364-4-4-271
    41 sg:pub.10.1186/1745-6150-6-19
    42 sg:pub.10.1186/gb-2008-9-3-r55
    43 sg:pub.10.1186/gb-2009-10-2-r16
    44 sg:pub.10.1186/gb-2009-10-3-r25
    45 sg:pub.10.1186/gb-2009-10-5-107
    46 sg:pub.10.1186/gb-2010-11-10-r104
    47 sg:pub.10.1186/gb-2011-12-3-r31
    48 sg:pub.10.1186/gb-2011-12-6-r55
    49 sg:pub.10.1186/gb-2011-12-8-r72
    50 schema:datePublished 2011-11-29
    51 schema:datePublishedReg 2011-11-29
    52 schema:description Key PointsNew high-throughput sequencing technologies have spurred explosive growth in the use of sequencing to discover mutations and structural variants in the human genome and in the number of projects to sequence and assemble new genomes.Highly efficient algorithms have been developed to align next-generation sequences to genomes, and these algorithms use a variety of strategies to place repetitive reads.Ambiguous mapping of sequences that are derived from repetitive regions makes it difficult to identify true polymorphisms and to reconstruct transcripts.Short read lengths combined with mapping ambiguities lead to false reports of single-nucleotide polymorphisms, inserts, deletions and other sequence variants.When assembling a genome de novo, repetitive sequences can lead to erroneous rearrangements, deletions, collapsed repeats and other assembly errors.Long-range linking information from paired-end reads can overcome some of the difficulties in short-read assembly.
    53 schema:genre article
    54 schema:isAccessibleForFree true
    55 schema:isPartOf N7f28efa4e48a4ab387aafc1eb16e5553
    56 Nad0b07618ef947b3966a64bd51a41454
    57 sg:journal.1023607
    58 schema:keywords DNA
    59 algorithm
    60 ambiguous mapping
    61 assembly
    62 assembly errors
    63 challenges
    64 computational challenges
    65 deletion
    66 difficulties
    67 efficient algorithm
    68 error
    69 explosive growth
    70 false reports
    71 genome
    72 growth
    73 high-throughput sequencing technology
    74 human genome
    75 information
    76 inserts
    77 lead
    78 length
    79 mapping
    80 mutations
    81 new genomes
    82 next-generation sequences
    83 next-generation sequencing
    84 novo
    85 number
    86 number of projects
    87 paired-end reads
    88 polymorphism
    89 project
    90 read length
    91 reads
    92 rearrangement
    93 region
    94 repeats
    95 repetitive DNA
    96 repetitive reads
    97 repetitive regions
    98 repetitive sequences
    99 report
    100 sequence
    101 sequence variants
    102 sequencing
    103 sequencing technologies
    104 short read assemblies
    105 short read lengths
    106 single nucleotide polymorphisms
    107 solution
    108 strategies
    109 structural variants
    110 technology
    111 transcripts
    112 true polymorphism
    113 use
    114 use of sequencing
    115 variants
    116 variety
    117 variety of strategies
    118 schema:name Repetitive DNA and next-generation sequencing: computational challenges and solutions
    119 schema:pagination 36-46
    120 schema:productId N48c7a4e1ea9b4c2fb9268e7bee2f633d
    121 N4f3a2955ab4547f28cb604ce01a6ddea
    122 N5c339e71d7894a48b11890ed807b8fed
    123 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044335233
    124 https://doi.org/10.1038/nrg3117
    125 schema:sdDatePublished 2022-10-01T06:37
    126 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    127 schema:sdPublisher N87bd604a13ec4fb78d6a880ff4de868e
    128 schema:url https://doi.org/10.1038/nrg3117
    129 sgo:license sg:explorer/license/
    130 sgo:sdDataset articles
    131 rdf:type schema:ScholarlyArticle
    132 N034e23f9d8a948eaaf19f0b178f176a0 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    133 schema:name Animals
    134 rdf:type schema:DefinedTerm
    135 N151a51f269c447a7927e9dc69947c77c schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    136 schema:name Genome
    137 rdf:type schema:DefinedTerm
    138 N1e6bdc679a9946bd8185fc8dc195359d schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    139 schema:name Sequence Analysis, DNA
    140 rdf:type schema:DefinedTerm
    141 N26d4c56ac86f45d3a7f34b48404cf34d rdf:first sg:person.01261673253.41
    142 rdf:rest Nd5a1ddfd41c84a9a8768821ff3735103
    143 N35bb5894e5ee41c68a391feb9a0fa720 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    144 schema:name Plants
    145 rdf:type schema:DefinedTerm
    146 N48c7a4e1ea9b4c2fb9268e7bee2f633d schema:name doi
    147 schema:value 10.1038/nrg3117
    148 rdf:type schema:PropertyValue
    149 N4f3a2955ab4547f28cb604ce01a6ddea schema:name dimensions_id
    150 schema:value pub.1044335233
    151 rdf:type schema:PropertyValue
    152 N55e940eb1ba545158be67889457a3dfe schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    153 schema:name Molecular Sequence Data
    154 rdf:type schema:DefinedTerm
    155 N560b06266a4c4a4c8e079855483072b5 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    156 schema:name Repetitive Sequences, Nucleic Acid
    157 rdf:type schema:DefinedTerm
    158 N5c339e71d7894a48b11890ed807b8fed schema:name pubmed_id
    159 schema:value 22124482
    160 rdf:type schema:PropertyValue
    161 N5ce9dd5236514929930c43bd653314e6 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    162 schema:name Sequence Alignment
    163 rdf:type schema:DefinedTerm
    164 N7f28efa4e48a4ab387aafc1eb16e5553 schema:issueNumber 1
    165 rdf:type schema:PublicationIssue
    166 N81b08f519c4b428ca8bd73decd1c4c2f schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    167 schema:name Sequence Analysis, RNA
    168 rdf:type schema:DefinedTerm
    169 N87bd604a13ec4fb78d6a880ff4de868e schema:name Springer Nature - SN SciGraph project
    170 rdf:type schema:Organization
    171 N9650b677f6c64c58957f4d15c03a6273 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    172 schema:name Algorithms
    173 rdf:type schema:DefinedTerm
    174 N9a7f62b09052471bb63236912e06b712 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    175 schema:name RNA
    176 rdf:type schema:DefinedTerm
    177 Nad0b07618ef947b3966a64bd51a41454 schema:volumeNumber 13
    178 rdf:type schema:PublicationVolume
    179 Nb50508f8b81f4850848c0729e55354ae schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    180 schema:name Computational Biology
    181 rdf:type schema:DefinedTerm
    182 Nb9b312445ca043b6b05ed1aa014920a9 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    183 schema:name Humans
    184 rdf:type schema:DefinedTerm
    185 Nd5a1ddfd41c84a9a8768821ff3735103 rdf:first sg:person.01223441713.02
    186 rdf:rest rdf:nil
    187 Nd918bef78d804d8ab004ce8ab3314470 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    188 schema:name DNA
    189 rdf:type schema:DefinedTerm
    190 Ne5cb374f43f44038a61310f1c0dbce79 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    191 schema:name Software
    192 rdf:type schema:DefinedTerm
    193 Nf75d6b21251c4240a1804801c7137b25 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    194 schema:name Reproducibility of Results
    195 rdf:type schema:DefinedTerm
    196 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
    197 schema:name Biological Sciences
    198 rdf:type schema:DefinedTerm
    199 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
    200 schema:name Genetics
    201 rdf:type schema:DefinedTerm
    202 sg:grant.2519905 http://pending.schema.org/fundedItem sg:pub.10.1038/nrg3117
    203 rdf:type schema:MonetaryGrant
    204 sg:grant.2529425 http://pending.schema.org/fundedItem sg:pub.10.1038/nrg3117
    205 rdf:type schema:MonetaryGrant
    206 sg:grant.2529453 http://pending.schema.org/fundedItem sg:pub.10.1038/nrg3117
    207 rdf:type schema:MonetaryGrant
    208 sg:journal.1023607 schema:issn 1471-0056
    209 1471-0064
    210 schema:name Nature Reviews Genetics
    211 schema:publisher Springer Nature
    212 rdf:type schema:Periodical
    213 sg:person.01223441713.02 schema:affiliation grid-institutes:grid.21107.35
    214 schema:familyName Salzberg
    215 schema:givenName Steven L.
    216 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223441713.02
    217 rdf:type schema:Person
    218 sg:person.01261673253.41 schema:affiliation grid-institutes:grid.21107.35
    219 schema:familyName Treangen
    220 schema:givenName Todd J.
    221 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01261673253.41
    222 rdf:type schema:Person
    223 sg:pub.10.1038/35048692 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044298669
    224 https://doi.org/10.1038/35048692
    225 rdf:type schema:CreativeWork
    226 sg:pub.10.1038/nature09534 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010608717
    227 https://doi.org/10.1038/nature09534
    228 rdf:type schema:CreativeWork
    229 sg:pub.10.1038/nature10158 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002525491
    230 https://doi.org/10.1038/nature10158
    231 rdf:type schema:CreativeWork
    232 sg:pub.10.1038/nature10288 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025276362
    233 https://doi.org/10.1038/nature10288
    234 rdf:type schema:CreativeWork
    235 sg:pub.10.1038/nbt.1621 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031035095
    236 https://doi.org/10.1038/nbt.1621
    237 rdf:type schema:CreativeWork
    238 sg:pub.10.1038/nbt.1754 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019307928
    239 https://doi.org/10.1038/nbt.1754
    240 rdf:type schema:CreativeWork
    241 sg:pub.10.1038/nbt.1883 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015803168
    242 https://doi.org/10.1038/nbt.1883
    243 rdf:type schema:CreativeWork
    244 sg:pub.10.1038/ng.437 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035989827
    245 https://doi.org/10.1038/ng.437
    246 rdf:type schema:CreativeWork
    247 sg:pub.10.1038/ng.806 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010244476
    248 https://doi.org/10.1038/ng.806
    249 rdf:type schema:CreativeWork
    250 sg:pub.10.1038/nmeth.1223 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048586936
    251 https://doi.org/10.1038/nmeth.1223
    252 rdf:type schema:CreativeWork
    253 sg:pub.10.1038/nmeth.1226 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045381177
    254 https://doi.org/10.1038/nmeth.1226
    255 rdf:type schema:CreativeWork
    256 sg:pub.10.1038/nmeth.1374 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019637093
    257 https://doi.org/10.1038/nmeth.1374
    258 rdf:type schema:CreativeWork
    259 sg:pub.10.1038/nmeth.1517 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032102367
    260 https://doi.org/10.1038/nmeth.1517
    261 rdf:type schema:CreativeWork
    262 sg:pub.10.1038/nmeth.1527 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015617800
    263 https://doi.org/10.1038/nmeth.1527
    264 rdf:type schema:CreativeWork
    265 sg:pub.10.1038/nmeth.1613 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009798986
    266 https://doi.org/10.1038/nmeth.1613
    267 rdf:type schema:CreativeWork
    268 sg:pub.10.1038/nrg2641 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006115199
    269 https://doi.org/10.1038/nrg2641
    270 rdf:type schema:CreativeWork
    271 sg:pub.10.1038/nrg2841 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017741162
    272 https://doi.org/10.1038/nrg2841
    273 rdf:type schema:CreativeWork
    274 sg:pub.10.1038/nrg2958 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004346662
    275 https://doi.org/10.1038/nrg2958
    276 rdf:type schema:CreativeWork
    277 sg:pub.10.1038/nrg798 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025444771
    278 https://doi.org/10.1038/nrg798
    279 rdf:type schema:CreativeWork
    280 sg:pub.10.1186/1471-2105-12-200 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040958434
    281 https://doi.org/10.1186/1471-2105-12-200
    282 rdf:type schema:CreativeWork
    283 sg:pub.10.1186/1471-2105-12-95 schema:sameAs https://app.dimensions.ai/details/publication/pub.1020547896
    284 https://doi.org/10.1186/1471-2105-12-95
    285 rdf:type schema:CreativeWork
    286 sg:pub.10.1186/1479-7364-4-4-271 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028588465
    287 https://doi.org/10.1186/1479-7364-4-4-271
    288 rdf:type schema:CreativeWork
    289 sg:pub.10.1186/1745-6150-6-19 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000128333
    290 https://doi.org/10.1186/1745-6150-6-19
    291 rdf:type schema:CreativeWork
    292 sg:pub.10.1186/gb-2008-9-3-r55 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003390143
    293 https://doi.org/10.1186/gb-2008-9-3-r55
    294 rdf:type schema:CreativeWork
    295 sg:pub.10.1186/gb-2009-10-2-r16 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032910724
    296 https://doi.org/10.1186/gb-2009-10-2-r16
    297 rdf:type schema:CreativeWork
    298 sg:pub.10.1186/gb-2009-10-3-r25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049583368
    299 https://doi.org/10.1186/gb-2009-10-3-r25
    300 rdf:type schema:CreativeWork
    301 sg:pub.10.1186/gb-2009-10-5-107 schema:sameAs https://app.dimensions.ai/details/publication/pub.1043335129
    302 https://doi.org/10.1186/gb-2009-10-5-107
    303 rdf:type schema:CreativeWork
    304 sg:pub.10.1186/gb-2010-11-10-r104 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002592427
    305 https://doi.org/10.1186/gb-2010-11-10-r104
    306 rdf:type schema:CreativeWork
    307 sg:pub.10.1186/gb-2011-12-3-r31 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002683835
    308 https://doi.org/10.1186/gb-2011-12-3-r31
    309 rdf:type schema:CreativeWork
    310 sg:pub.10.1186/gb-2011-12-6-r55 schema:sameAs https://app.dimensions.ai/details/publication/pub.1018738511
    311 https://doi.org/10.1186/gb-2011-12-6-r55
    312 rdf:type schema:CreativeWork
    313 sg:pub.10.1186/gb-2011-12-8-r72 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006381787
    314 https://doi.org/10.1186/gb-2011-12-8-r72
    315 rdf:type schema:CreativeWork
    316 grid-institutes:grid.21107.35 schema:alternateName Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 21205, Baltimore, Maryland, USA
    317 McKusick–Nathans Institute for Genetic Medicine, Johns Hopkins University School of Medicine, 21205, Baltimore, Maryland, USA
    318 schema:name Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 21205, Baltimore, Maryland, USA
    319 McKusick–Nathans Institute for Genetic Medicine, Johns Hopkins University School of Medicine, 21205, Baltimore, Maryland, USA
    320 rdf:type schema:Organization
     



