Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence


Ontology type: schema:ScholarlyArticle
Open Access: True


Article Info

DATE

2003-04

AUTHORS

Joseph Cheung, Xavier Estivill, Razi Khaja, Jeffrey R MacDonald, Ken Lau, Lap-Chee Tsui, Stephen W Scherer

ABSTRACT

BACKGROUND: Previous studies have suggested that recent segmental duplications, which are often involved in chromosome rearrangements underlying genomic disease, account for some 5% of the human genome. We have developed rapid computational heuristics based on BLAST analysis to detect segmental duplications, as well as regions containing potential sequence misassignments in the human genome assemblies.

RESULTS: Our analysis of the June 2002 public human genome assembly revealed that 107.4 of 3,043.1 megabases (Mb) (3.53%) of sequence contained segmental duplications, each with size equal or more than 5 kb and 90% identity. We have also detected that 38.9 Mb (1.28%) of sequence within this assembly is likely to be involved in sequence misassignment errors. Furthermore, we have identified a significant subset (199,965 of 2,327,473 or 8.6%) of single-nucleotide polymorphisms (SNPs) in the public databases that are not true SNPs but are potential paralogous sequence variants.

CONCLUSION: Using two distinct computational approaches, we have identified most of the sequences in the human genome that have undergone recent segmental duplications. Near-identical segmental duplications present a major challenge to the completion of the human genome sequence. Potential sequence misassignments detected in this study would require additional efforts to resolve.
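
The percentages quoted in the RESULTS paragraph can be re-derived from the raw figures given there. The following is a minimal arithmetic check in Python; the constant and function names are illustrative and are not taken from the article.

```python
# Figures quoted in the abstract (names are illustrative, not the authors').
GENOME_MB = 3043.1         # size of the June 2002 assembly, in megabases
DUPLICATED_MB = 107.4      # sequence contained in segmental duplications
MISASSIGNED_MB = 38.9      # sequence likely involved in misassignment errors
SNPS_TOTAL = 2_327_473     # SNPs examined in the public databases
SNPS_PARALOGOUS = 199_965  # SNPs flagged as potential paralogous variants


def percent(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


print(f"duplicated:  {percent(DUPLICATED_MB, GENOME_MB):.2f}%")
print(f"misassigned: {percent(MISASSIGNED_MB, GENOME_MB):.2f}%")
print(f"paralogous:  {percent(SNPS_PARALOGOUS, SNPS_TOTAL):.1f}%")
```

Rounded to the precision used in the abstract, the three ratios agree with the stated 3.53%, 1.28% and 8.6%.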

PAGES

r25

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/gb-2003-4-4-r25

DOI

http://dx.doi.org/10.1186/gb-2003-4-4-r25

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1025892956

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/12702206



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Artifacts", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Base Sequence", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Chromosomes, Human", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Computational Biology", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Gene Duplication", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genetic Diseases, Inborn", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genetic Variation", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genome, Human", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Humans", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Polymorphism, Single Nucleotide", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Analysis, DNA", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Hospital for Sick Children", 
          "id": "https://www.grid.ac/institutes/grid.42327.30", 
          "name": [
            "Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Cheung", 
        "givenName": "Joseph", 
        "id": "sg:person.01346013070.79", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01346013070.79"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Pompeu Fabra University", 
          "id": "https://www.grid.ac/institutes/grid.5612.0", 
          "name": [
            "Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, Canada", 
            "Genes and Disease Program, Genomic Regulation Center, and Facultat Ciencies de la Salut i de la Vida, Universitat Pompeu Fabra, E-08003, Barcelona, Catalonia, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Estivill", 
        "givenName": "Xavier", 
        "id": "sg:person.0675214555.67", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0675214555.67"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Hospital for Sick Children", 
          "id": "https://www.grid.ac/institutes/grid.42327.30", 
          "name": [
            "Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Khaja", 
        "givenName": "Razi", 
        "id": "sg:person.0702001650.57", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0702001650.57"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Hospital for Sick Children", 
          "id": "https://www.grid.ac/institutes/grid.42327.30", 
          "name": [
            "Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "MacDonald", 
        "givenName": "Jeffrey R", 
        "id": "sg:person.01005010277.13", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01005010277.13"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Hospital for Sick Children", 
          "id": "https://www.grid.ac/institutes/grid.42327.30", 
          "name": [
            "Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Lau", 
        "givenName": "Ken", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Hong Kong", 
          "id": "https://www.grid.ac/institutes/grid.194645.b", 
          "name": [
            "Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, Canada", 
            "Department of Molecular and Medical Genetics, University of Toronto, 555 University Avenue, M5G 1X8, Toronto, ON, Canada", 
            "The University of Hong Kong, Pokfulam Road, Hong Kong"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Tsui", 
        "givenName": "Lap-Chee", 
        "id": "sg:person.01243057047.05", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01243057047.05"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Toronto", 
          "id": "https://www.grid.ac/institutes/grid.17063.33", 
          "name": [
            "Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, Canada", 
            "Department of Molecular and Medical Genetics, University of Toronto, 555 University Avenue, M5G 1X8, Toronto, ON, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Scherer", 
        "givenName": "Stephen W", 
        "id": "sg:person.011066475117.55", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011066475117.55"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1016/s0168-9525(02)02592-1", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1000179704"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/18.2.335", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1001279450"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1002/gcc.10111", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1001434096"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1126/science.1058040", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1001517867"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0092-8674(01)00447-0", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1003113849"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/hmg/11.17.1987", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1003966518"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1126/science.1072047", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1006710139"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nrg705", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1008302356", 
          "https://doi.org/10.1038/nrg705"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.403602", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1012195556"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0022-2836(05)80360-2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013618994"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0960-9822(01)00490-0", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1015447031"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/ng0396-288", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017805829", 
          "https://doi.org/10.1038/ng0396-288"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/35097067", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1018632260", 
          "https://doi.org/10.1038/35097067"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/35093500", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1029953644", 
          "https://doi.org/10.1038/35093500"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/ng753", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1031273704", 
          "https://doi.org/10.1038/ng753"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1006/geno.2000.6312", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1036764135"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/ng757", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1040986528", 
          "https://doi.org/10.1038/ng757"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.187101", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1042408677"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/35057062", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1042854081", 
          "https://doi.org/10.1038/35057062"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0168-9525(98)01555-8", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1044342085"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.152171299", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1046195397"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/hmg/9.4.489", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1049132213"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1126/science.290.5494.1151", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1050543187"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1086/319506", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1058622732"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1086/341610", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1058639071"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1002/9780471650126.dob0273.pub2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1089871629"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1002/0471684228.egp03898", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1089901767"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2003-04", 
    "datePublishedReg": "2003-04-01", 
    "description": "BACKGROUND: Previous studies have suggested that recent segmental duplications, which are often involved in chromosome rearrangements underlying genomic disease, account for some 5% of the human genome. We have developed rapid computational heuristics based on BLAST analysis to detect segmental duplications, as well as regions containing potential sequence misassignments in the human genome assemblies.\nRESULTS: Our analysis of the June 2002 public human genome assembly revealed that 107.4 of 3,043.1 megabases (Mb) (3.53%) of sequence contained segmental duplications, each with size equal or more than 5 kb and 90% identity. We have also detected that 38.9 Mb (1.28%) of sequence within this assembly is likely to be involved in sequence misassignment errors. Furthermore, we have identified a significant subset (199,965 of 2,327,473 or 8.6%) of single-nucleotide polymorphisms (SNPs) in the public databases that are not true SNPs but are potential paralogous sequence variants.\nCONCLUSION: Using two distinct computational approaches, we have identified most of the sequences in the human genome that have undergone recent segmental duplications. Near-identical segmental duplications present a major challenge to the completion of the human genome sequence. Potential sequence misassignments detected in this study would require additional efforts to resolve.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1186/gb-2003-4-4-r25", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1023439", 
        "issn": [
          "1474-760X", 
          "1465-6906"
        ], 
        "name": "Genome Biology", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "4", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "4"
      }
    ], 
    "name": "Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence", 
    "pagination": "r25", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "7b5b10a15174f02750b22560d30b4eb07ae075baf38e7f6b661a34f5e5467970"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "12702206"
        ]
      }, 
      {
        "name": "nlm_unique_id", 
        "type": "PropertyValue", 
        "value": [
          "100960660"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/gb-2003-4-4-r25"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1025892956"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/gb-2003-4-4-r25", 
      "https://app.dimensions.ai/details/publication/pub.1025892956"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-11T00:16", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8695_00000513.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1186%2Fgb-2003-4-4-r25"
  }
]
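
Because the record is plain JSON, it can be read with any JSON parser. The sketch below, in Python, loads a trimmed copy of the record above (only a few fields and two of the seven authors are kept for brevity) and pulls out the author names and identifiers. The helper names `author_names` and `product_ids` are hypothetical and are not part of any SciGraph tooling.

```python
import json

# Trimmed copy of the record shown above; most fields are omitted.
RECORD = """
[
  {
    "id": "sg:pub.10.1186/gb-2003-4-4-r25",
    "datePublished": "2003-04",
    "pagination": "r25",
    "author": [
      {"familyName": "Cheung", "givenName": "Joseph", "type": "Person"},
      {"familyName": "Estivill", "givenName": "Xavier", "type": "Person"}
    ],
    "productId": [
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["12702206"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/gb-2003-4-4-r25"]}
    ]
  }
]
"""


def author_names(article: dict) -> list[str]:
    """Return 'Given Family' strings in the order the record lists them."""
    return [f"{a['givenName']} {a['familyName']}" for a in article.get("author", [])]


def product_ids(article: dict) -> dict[str, str]:
    """Map each productId name (doi, pubmed_id, ...) to its first value."""
    return {p["name"]: p["value"][0] for p in article.get("productId", [])}


# The top-level JSON value is a one-element array holding the article.
article = json.loads(RECORD)[0]
print(author_names(article))
print(product_ids(article))
```

This treats the record as ordinary JSON and ignores the `@context`; a full JSON-LD processor would be needed to expand the short property names into IRIs.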
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/gb-2003-4-4-r25'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/gb-2003-4-4-r25'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/gb-2003-4-4-r25'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/gb-2003-4-4-r25'
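
The four curl commands differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. The short format keys (`json-ld`, `nt`, `turtle`, `xml`) and the `build_request` helper are illustrative choices; the MIME types and URL are those used in the commands above. The request is only constructed here, not sent, since the service may no longer respond as it did when this page was captured.

```python
from urllib.request import Request

BASE_URL = "https://scigraph.springernature.com/"

# Format name (hypothetical short key) -> MIME type from the curl examples.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id: str, fmt: str) -> Request:
    """Build a GET request asking for the record in the given RDF format."""
    if fmt not in ACCEPT:
        raise ValueError(f"unknown format: {fmt!r}")
    return Request(BASE_URL + record_id, headers={"Accept": ACCEPT[fmt]})


req = build_request("pub.10.1186/gb-2003-4-4-r25", "turtle")
print(req.full_url)
print(req.get_header("Accept"))
```

To actually fetch the data, the request object can be passed to `urllib.request.urlopen`.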


 

This table displays all metadata directly associated with this object as RDF triples.

255 TRIPLES      21 PREDICATES      67 URIs      32 LITERALS      20 BLANK NODES
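
Summary counts of this kind can be computed from an N-Triples export. The Python sketch below is a rough illustration only: the five sample triples are hand-written from values in this record, the expansion of the `sg:` and `schema:` prefixes to full IRIs is an assumption, the regular expression handles only simple one-line statements, and the page's own counting rules (for example, what it counts as a URI) are not documented here.

```python
import re

# Hand-written sample; the full IRIs are assumed prefix expansions.
SAMPLE = """\
<http://scigraph.springernature.com/pub.10.1186/gb-2003-4-4-r25> <http://schema.org/pagination> "r25" .
<http://scigraph.springernature.com/pub.10.1186/gb-2003-4-4-r25> <http://schema.org/genre> "research_article" .
<http://scigraph.springernature.com/pub.10.1186/gb-2003-4-4-r25> <http://schema.org/isPartOf> _:b0 .
_:b0 <http://schema.org/volumeNumber> "4" .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/PublicationVolume> .
"""

# subject, predicate, object, then the terminating full stop.
TRIPLE = re.compile(r"^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$")


def summarize(ntriples: str) -> dict[str, int]:
    """Count triples and distinct predicates, IRIs, literals and blank nodes."""
    triples = 0
    predicates, iris, literals, blanks = set(), set(), set(), set()
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"not a simple N-Triples statement: {line!r}")
        subject, predicate, obj = match.groups()
        triples += 1
        predicates.add(predicate)
        for term in (subject, obj):
            if term.startswith("_:"):
                blanks.add(term)
            elif term.startswith("<"):
                iris.add(term)
            else:
                literals.add(term)
    return {
        "triples": triples,
        "predicates": len(predicates),
        "iris": len(iris),
        "literals": len(literals),
        "blank_nodes": len(blanks),
    }


print(summarize(SAMPLE))
```

Here IRIs are counted only in subject and object position, so the figures will not necessarily match the page's totals for a full export.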

Subject Predicate Object
1 sg:pub.10.1186/gb-2003-4-4-r25 schema:about N484f94410270436fbc89c17547405843
2 N8520c8124aca405fb1c11daede181613
3 N952257a9878e43a5a864be27d9a566ae
4 Nb5c40e8a983a4b9b95d356cc250f4dce
5 Nbdc44885d03843acb3d62d879f1c610a
6 Nc430b0b371d44872ad019c5d8ae5ba89
7 Nc71982834cc14861b0d35395457e5ed9
8 Nca7a90fe0d8c41d9bcae50b72680ffe4
9 Nee65fa5205b0451a938176f113d9fac8
10 Nf0453497b3484884bfb5cf238461f9c7
11 Nfdd318e1e03c49b2b5633ad9a120779d
12 anzsrc-for:06
13 anzsrc-for:0604
14 schema:author Nf0ed25f73571486a8c4ef480242ce37b
15 schema:citation sg:pub.10.1038/35057062
16 sg:pub.10.1038/35093500
17 sg:pub.10.1038/35097067
18 sg:pub.10.1038/ng0396-288
19 sg:pub.10.1038/ng753
20 sg:pub.10.1038/ng757
21 sg:pub.10.1038/nrg705
22 https://doi.org/10.1002/0471684228.egp03898
23 https://doi.org/10.1002/9780471650126.dob0273.pub2
24 https://doi.org/10.1002/gcc.10111
25 https://doi.org/10.1006/geno.2000.6312
26 https://doi.org/10.1016/s0022-2836(05)80360-2
27 https://doi.org/10.1016/s0092-8674(01)00447-0
28 https://doi.org/10.1016/s0168-9525(02)02592-1
29 https://doi.org/10.1016/s0168-9525(98)01555-8
30 https://doi.org/10.1016/s0960-9822(01)00490-0
31 https://doi.org/10.1073/pnas.152171299
32 https://doi.org/10.1086/319506
33 https://doi.org/10.1086/341610
34 https://doi.org/10.1093/bioinformatics/18.2.335
35 https://doi.org/10.1093/hmg/11.17.1987
36 https://doi.org/10.1093/hmg/9.4.489
37 https://doi.org/10.1101/gr.187101
38 https://doi.org/10.1101/gr.403602
39 https://doi.org/10.1126/science.1058040
40 https://doi.org/10.1126/science.1072047
41 https://doi.org/10.1126/science.290.5494.1151
42 schema:datePublished 2003-04
43 schema:datePublishedReg 2003-04-01
44 schema:description BACKGROUND: Previous studies have suggested that recent segmental duplications, which are often involved in chromosome rearrangements underlying genomic disease, account for some 5% of the human genome. We have developed rapid computational heuristics based on BLAST analysis to detect segmental duplications, as well as regions containing potential sequence misassignments in the human genome assemblies. RESULTS: Our analysis of the June 2002 public human genome assembly revealed that 107.4 of 3,043.1 megabases (Mb) (3.53%) of sequence contained segmental duplications, each with size equal or more than 5 kb and 90% identity. We have also detected that 38.9 Mb (1.28%) of sequence within this assembly is likely to be involved in sequence misassignment errors. Furthermore, we have identified a significant subset (199,965 of 2,327,473 or 8.6%) of single-nucleotide polymorphisms (SNPs) in the public databases that are not true SNPs but are potential paralogous sequence variants. CONCLUSION: Using two distinct computational approaches, we have identified most of the sequences in the human genome that have undergone recent segmental duplications. Near-identical segmental duplications present a major challenge to the completion of the human genome sequence. Potential sequence misassignments detected in this study would require additional efforts to resolve.
45 schema:genre research_article
46 schema:inLanguage en
47 schema:isAccessibleForFree true
48 schema:isPartOf N6b524f58f9a54eab8ddc380b2300a658
49 Nf2b3c0904e53429b89100d4a3c9b9b12
50 sg:journal.1023439
51 schema:name Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence
52 schema:pagination r25
53 schema:productId N1e308f6956c84f5aa1abbfc802a61461
54 N98b61e20ae0947fc88b8fca01153a8fd
55 Nac69e00a38374390a9c5c8976971c46c
56 Nd68067e27a3e488b850705b8bfc5de60
57 Ne4a8697bc91642438654b75d13d4dbcc
58 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025892956
59 https://doi.org/10.1186/gb-2003-4-4-r25
60 schema:sdDatePublished 2019-04-11T00:16
61 schema:sdLicense https://scigraph.springernature.com/explorer/license/
62 schema:sdPublisher Ned8390d67ba740768cce4949dcb31fec
63 schema:url http://link.springer.com/10.1186%2Fgb-2003-4-4-r25
64 sgo:license sg:explorer/license/
65 sgo:sdDataset articles
66 rdf:type schema:ScholarlyArticle
67 N1e308f6956c84f5aa1abbfc802a61461 schema:name pubmed_id
68 schema:value 12702206
69 rdf:type schema:PropertyValue
70 N235e9d5825734c3c893c42356595665e rdf:first N9c4308e96427401d82170e16afd4378e
71 rdf:rest N52ee2de9682643c38037710ebb345e56
72 N41d564b85296466a9b2b720d44ecff2c rdf:first sg:person.0675214555.67
73 rdf:rest Ned257fd74c5b48eb8613fe73e4ad0381
74 N484f94410270436fbc89c17547405843 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
75 schema:name Gene Duplication
76 rdf:type schema:DefinedTerm
77 N52ee2de9682643c38037710ebb345e56 rdf:first sg:person.01243057047.05
78 rdf:rest N5d401a7d8feb49369ada4e67f075b2f2
79 N5d401a7d8feb49369ada4e67f075b2f2 rdf:first sg:person.011066475117.55
80 rdf:rest rdf:nil
81 N6b524f58f9a54eab8ddc380b2300a658 schema:volumeNumber 4
82 rdf:type schema:PublicationVolume
83 N8520c8124aca405fb1c11daede181613 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
84 schema:name Artifacts
85 rdf:type schema:DefinedTerm
86 N952257a9878e43a5a864be27d9a566ae schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
87 schema:name Genome, Human
88 rdf:type schema:DefinedTerm
89 N98b61e20ae0947fc88b8fca01153a8fd schema:name readcube_id
90 schema:value 7b5b10a15174f02750b22560d30b4eb07ae075baf38e7f6b661a34f5e5467970
91 rdf:type schema:PropertyValue
92 N9c4308e96427401d82170e16afd4378e schema:affiliation https://www.grid.ac/institutes/grid.42327.30
93 schema:familyName Lau
94 schema:givenName Ken
95 rdf:type schema:Person
96 Nac69e00a38374390a9c5c8976971c46c schema:name doi
97 schema:value 10.1186/gb-2003-4-4-r25
98 rdf:type schema:PropertyValue
99 Nb5c40e8a983a4b9b95d356cc250f4dce schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
100 schema:name Base Sequence
101 rdf:type schema:DefinedTerm
102 Nbdc44885d03843acb3d62d879f1c610a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
103 schema:name Genetic Variation
104 rdf:type schema:DefinedTerm
105 Nc430b0b371d44872ad019c5d8ae5ba89 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
106 schema:name Sequence Analysis, DNA
107 rdf:type schema:DefinedTerm
108 Nc71982834cc14861b0d35395457e5ed9 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
109 schema:name Computational Biology
110 rdf:type schema:DefinedTerm
111 Nca7a90fe0d8c41d9bcae50b72680ffe4 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
112 schema:name Genetic Diseases, Inborn
113 rdf:type schema:DefinedTerm
114 Nd08604c4a2ae463d87ef0610b54735d1 rdf:first sg:person.01005010277.13
115 rdf:rest N235e9d5825734c3c893c42356595665e
116 Nd68067e27a3e488b850705b8bfc5de60 schema:name dimensions_id
117 schema:value pub.1025892956
118 rdf:type schema:PropertyValue
119 Ne4a8697bc91642438654b75d13d4dbcc schema:name nlm_unique_id
120 schema:value 100960660
121 rdf:type schema:PropertyValue
122 Ned257fd74c5b48eb8613fe73e4ad0381 rdf:first sg:person.0702001650.57
123 rdf:rest Nd08604c4a2ae463d87ef0610b54735d1
124 Ned8390d67ba740768cce4949dcb31fec schema:name Springer Nature - SN SciGraph project
125 rdf:type schema:Organization
126 Nee65fa5205b0451a938176f113d9fac8 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
127 schema:name Humans
128 rdf:type schema:DefinedTerm
129 Nf0453497b3484884bfb5cf238461f9c7 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
130 schema:name Polymorphism, Single Nucleotide
131 rdf:type schema:DefinedTerm
132 Nf0ed25f73571486a8c4ef480242ce37b rdf:first sg:person.01346013070.79
133 rdf:rest N41d564b85296466a9b2b720d44ecff2c
134 Nf2b3c0904e53429b89100d4a3c9b9b12 schema:issueNumber 4
135 rdf:type schema:PublicationIssue
136 Nfdd318e1e03c49b2b5633ad9a120779d schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
137 schema:name Chromosomes, Human
138 rdf:type schema:DefinedTerm
139 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
140 schema:name Biological Sciences
141 rdf:type schema:DefinedTerm
142 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
143 schema:name Genetics
144 rdf:type schema:DefinedTerm
145 sg:journal.1023439 schema:issn 1465-6906
146 1474-760X
147 schema:name Genome Biology
148 rdf:type schema:Periodical
149 sg:person.01005010277.13 schema:affiliation https://www.grid.ac/institutes/grid.42327.30
150 schema:familyName MacDonald
151 schema:givenName Jeffrey R
152 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01005010277.13
153 rdf:type schema:Person
154 sg:person.011066475117.55 schema:affiliation https://www.grid.ac/institutes/grid.17063.33
155 schema:familyName Scherer
156 schema:givenName Stephen W
157 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011066475117.55
158 rdf:type schema:Person
159 sg:person.01243057047.05 schema:affiliation https://www.grid.ac/institutes/grid.194645.b
160 schema:familyName Tsui
161 schema:givenName Lap-Chee
162 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01243057047.05
163 rdf:type schema:Person
164 sg:person.01346013070.79 schema:affiliation https://www.grid.ac/institutes/grid.42327.30
165 schema:familyName Cheung
166 schema:givenName Joseph
167 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01346013070.79
168 rdf:type schema:Person
169 sg:person.0675214555.67 schema:affiliation https://www.grid.ac/institutes/grid.5612.0
170 schema:familyName Estivill
171 schema:givenName Xavier
172 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0675214555.67
173 rdf:type schema:Person
174 sg:person.0702001650.57 schema:affiliation https://www.grid.ac/institutes/grid.42327.30
175 schema:familyName Khaja
176 schema:givenName Razi
177 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0702001650.57
178 rdf:type schema:Person
179 sg:pub.10.1038/35057062 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042854081
180 https://doi.org/10.1038/35057062
181 rdf:type schema:CreativeWork
182 sg:pub.10.1038/35093500 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029953644
183 https://doi.org/10.1038/35093500
184 rdf:type schema:CreativeWork
185 sg:pub.10.1038/35097067 schema:sameAs https://app.dimensions.ai/details/publication/pub.1018632260
186 https://doi.org/10.1038/35097067
187 rdf:type schema:CreativeWork
188 sg:pub.10.1038/ng0396-288 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017805829
189 https://doi.org/10.1038/ng0396-288
190 rdf:type schema:CreativeWork
191 sg:pub.10.1038/ng753 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031273704
192 https://doi.org/10.1038/ng753
193 rdf:type schema:CreativeWork
194 sg:pub.10.1038/ng757 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040986528
195 https://doi.org/10.1038/ng757
196 rdf:type schema:CreativeWork
197 sg:pub.10.1038/nrg705 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008302356
198 https://doi.org/10.1038/nrg705
199 rdf:type schema:CreativeWork
200 https://doi.org/10.1002/0471684228.egp03898 schema:sameAs https://app.dimensions.ai/details/publication/pub.1089901767
201 rdf:type schema:CreativeWork
202 https://doi.org/10.1002/9780471650126.dob0273.pub2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1089871629
203 rdf:type schema:CreativeWork
204 https://doi.org/10.1002/gcc.10111 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001434096
205 rdf:type schema:CreativeWork
206 https://doi.org/10.1006/geno.2000.6312 schema:sameAs https://app.dimensions.ai/details/publication/pub.1036764135
207 rdf:type schema:CreativeWork
208 https://doi.org/10.1016/s0022-2836(05)80360-2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013618994
209 rdf:type schema:CreativeWork
210 https://doi.org/10.1016/s0092-8674(01)00447-0 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003113849
211 rdf:type schema:CreativeWork
212 https://doi.org/10.1016/s0168-9525(02)02592-1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000179704
213 rdf:type schema:CreativeWork
214 https://doi.org/10.1016/s0168-9525(98)01555-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044342085
215 rdf:type schema:CreativeWork
216 https://doi.org/10.1016/s0960-9822(01)00490-0 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015447031
217 rdf:type schema:CreativeWork
218 https://doi.org/10.1073/pnas.152171299 schema:sameAs https://app.dimensions.ai/details/publication/pub.1046195397
219 rdf:type schema:CreativeWork
220 https://doi.org/10.1086/319506 schema:sameAs https://app.dimensions.ai/details/publication/pub.1058622732
221 rdf:type schema:CreativeWork
222 https://doi.org/10.1086/341610 schema:sameAs https://app.dimensions.ai/details/publication/pub.1058639071
223 rdf:type schema:CreativeWork
224 https://doi.org/10.1093/bioinformatics/18.2.335 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001279450
225 rdf:type schema:CreativeWork
226 https://doi.org/10.1093/hmg/11.17.1987 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003966518
227 rdf:type schema:CreativeWork
228 https://doi.org/10.1093/hmg/9.4.489 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049132213
229 rdf:type schema:CreativeWork
230 https://doi.org/10.1101/gr.187101 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042408677
231 rdf:type schema:CreativeWork
232 https://doi.org/10.1101/gr.403602 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012195556
233 rdf:type schema:CreativeWork
234 https://doi.org/10.1126/science.1058040 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001517867
235 rdf:type schema:CreativeWork
236 https://doi.org/10.1126/science.1072047 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006710139
237 rdf:type schema:CreativeWork
238 https://doi.org/10.1126/science.290.5494.1151 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050543187
239 rdf:type schema:CreativeWork
240 https://www.grid.ac/institutes/grid.17063.33 schema:alternateName University of Toronto
241 schema:name Department of Molecular and Medical Genetics, University of Toronto, 555 University Avenue, M5G 1X8, Toronto, ON, Canada
242 Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, Canada
243 rdf:type schema:Organization
244 https://www.grid.ac/institutes/grid.194645.b schema:alternateName University of Hong Kong
245 schema:name Department of Molecular and Medical Genetics, University of Toronto, 555 University Avenue, M5G 1X8, Toronto, ON, Canada
246 Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, Canada
247 The University of Hong Kong, Pokfulam Road, Hong Kong
248 rdf:type schema:Organization
249 https://www.grid.ac/institutes/grid.42327.30 schema:alternateName Hospital for Sick Children
250 schema:name Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, Canada
251 rdf:type schema:Organization
252 https://www.grid.ac/institutes/grid.5612.0 schema:alternateName Pompeu Fabra University
253 schema:name Genes and Disease Program, Genomic Regulation Center, and Facultat Ciencies de la Salut i de la Vida, Universitat Pompeu Fabra, E-08003, Barcelona, Catalonia, Spain
254 Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, Canada
255 rdf:type schema:Organization
 



