MoGUL: Detecting Common Insertions and Deletions in a Population


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2010

AUTHORS

Seunghak Lee , Eric Xing , Michael Brudno

ABSTRACT

While the discovery of structural variants in the human population is ongoing, most methods for this task assume that the genome is sequenced to high coverage (e.g. 40x), and use the combined power of the many sequenced reads and mate pairs to identify the variants. In contrast, the 1000 Genomes Project hopes to sequence hundreds of human genotypes, but at low coverage (4-6x), and most of the current methods are unable to discover insertion/deletion and structural variants from this data. In order to identify indels from multiple low-coverage individuals we have developed the MoGUL (Mixture of Genotypes Variant Locator) framework, which identifies potential locations with indels by examining mate pairs generated from all sequenced individuals simultaneously, uses a Bayesian network with appropriate priors to explicitly model each individual as homozygous or heterozygous for each locus, and computes the expected Minor Allele Frequency (MAF) for all predicted variants. We have used MoGUL to identify variants in 1000 Genomes data, as well as in simulated genotypes, and show good accuracy at predicting indels, especially for MAF > 0.06 and indel size > 20 base pairs.

PAGES

357-368

Book

TITLE

Research in Computational Molecular Biology

ISBN

978-3-642-12682-6
978-3-642-12683-3

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-642-12683-3_23

DOI

http://dx.doi.org/10.1007/978-3-642-12683-3_23

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1009500714



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "School of Computer Science, Carnegie Mellon University, USA", 
          "id": "http://www.grid.ac/institutes/grid.147455.6", 
          "name": [
            "Department of Computer Science, University of Toronto, Canada", 
            "School of Computer Science, Carnegie Mellon University, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Lee", 
        "givenName": "Seunghak", 
        "id": "sg:person.0652142515.58", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0652142515.58"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "School of Computer Science, Carnegie Mellon University, USA", 
          "id": "http://www.grid.ac/institutes/grid.147455.6", 
          "name": [
            "School of Computer Science, Carnegie Mellon University, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Xing", 
        "givenName": "Eric", 
        "id": "sg:person.01253676062.48", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01253676062.48"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Banting and Best Dept. of Medical Research, University of Toronto, Canada", 
          "id": "http://www.grid.ac/institutes/grid.17063.33", 
          "name": [
            "Department of Computer Science, University of Toronto, Canada", 
            "Banting and Best Dept. of Medical Research, University of Toronto, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brudno", 
        "givenName": "Michael", 
        "id": "sg:person.01253563237.25", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01253563237.25"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2010", 
    "datePublishedReg": "2010-01-01", 
    "description": "While the discovery of structural variants in the human population is ongoing, most methods for this task assume that the genome is sequenced to high coverage (e.g. 40x), and use the combined power of the many sequenced reads and mate pairs to identify the variants. In contrast, the 1000 Genomes Project hopes to sequence hundreds of human genotypes, but at low coverage (4-6x), and most of the current methods are unable to discover insertion/deletion and structural variants from this data.In order to identify indels from multiple low-coverage individuals we have developed the MoGUL (Mixture of Genotypes Variant Locator) framework, which identifies potential locations with indels by examining mate pairs generated from all sequenced individuals simultaneously, uses a Bayesian network with appropriate priors to explicitly model each individual as homozygous or heterozygous for each locus, and computes the expected Minor Allele Frequency (MAF) for all predicted variants. We have used MoGUL to identify variants in 1000 Genomes data, as well as in simulated genotypes, and show good accuracy at predicting indels, especially for MAF > 0.06 and indel size > 20 base pairs.", 
    "editor": [
      {
        "familyName": "Berger", 
        "givenName": "Bonnie", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-642-12683-3_23", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-642-12682-6", 
        "978-3-642-12683-3"
      ], 
      "name": "Research in Computational Molecular Biology", 
      "type": "Book"
    }, 
    "keywords": [
      "minor allele frequency", 
      "structural variants", 
      "sequenced reads", 
      "Genome Project", 
      "insertions/deletions", 
      "genome data", 
      "indels", 
      "mate pairs", 
      "base pairs", 
      "human populations", 
      "human genotypes", 
      "indel size", 
      "common insertion", 
      "deletion", 
      "allele frequencies", 
      "variants", 
      "genotypes", 
      "individuals", 
      "population", 
      "genome", 
      "loci", 
      "combined power", 
      "reads", 
      "discovery", 
      "pairs", 
      "data", 
      "Bayesian networks", 
      "appropriate priors", 
      "insertion", 
      "most methods", 
      "contrast", 
      "hundreds", 
      "current methods", 
      "high coverage", 
      "coverage", 
      "frequency", 
      "method", 
      "location", 
      "good accuracy", 
      "size", 
      "low coverage", 
      "potential locations", 
      "priors", 
      "task", 
      "framework", 
      "network", 
      "accuracy", 
      "project", 
      "order", 
      "power", 
      "moguls", 
      "MOGUL framework", 
      "multiple low-coverage individuals", 
      "low-coverage individuals"
    ], 
    "name": "MoGUL: Detecting Common Insertions and Deletions in a Population", 
    "pagination": "357-368", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1009500714"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-642-12683-3_23"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-642-12683-3_23", 
      "https://app.dimensions.ai/details/publication/pub.1009500714"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2021-11-01T18:45", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20211101/entities/gbq_results/chapter/chapter_107.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-642-12683-3_23"
  }
]
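A record of this shape can be read with any JSON parser, since JSON-LD is plain JSON. The sketch below is illustrative only: `RECORD_JSON` is a hand-trimmed copy of the record above, and `summarize` is a hypothetical helper name, not part of any SciGraph tooling.

```python
import json

# Hand-trimmed copy of the record above; only the fields used below are kept.
RECORD_JSON = """
[
  {
    "name": "MoGUL: Detecting Common Insertions and Deletions in a Population",
    "author": [
      {"familyName": "Lee", "givenName": "Seunghak"},
      {"familyName": "Xing", "givenName": "Eric"},
      {"familyName": "Brudno", "givenName": "Michael"}
    ],
    "productId": [
      {"name": "dimensions_id", "value": ["pub.1009500714"]},
      {"name": "doi", "value": ["10.1007/978-3-642-12683-3_23"]}
    ]
  }
]
"""


def summarize(record_json):
    """Return title, author names, and DOI from a SciGraph-style JSON-LD string."""
    # The top level is a one-element list wrapping the record.
    record = json.loads(record_json)[0]
    authors = [f"{a['givenName']} {a['familyName']}" for a in record["author"]]
    # productId is a list of name/value pairs; each value is itself a list.
    ids = {p["name"]: p["value"][0] for p in record["productId"]}
    return {"title": record["name"], "authors": authors, "doi": ids.get("doi")}


summary = summarize(RECORD_JSON)
print(summary["title"])
print(", ".join(summary["authors"]))
print(summary["doi"])
```

Turning `productId` into a dictionary keyed by `name` avoids relying on the order of the identifier entries.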
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data that is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-12683-3_23'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-12683-3_23'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-12683-3_23'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-12683-3_23'
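The four curl commands above differ only in the `Accept` header, so the same content negotiation can be sketched with Python's standard library. This is a minimal sketch, not an official client: the names `MEDIA_TYPES`, `build_request`, and `fetch` are made up for illustration, and it assumes the endpoint still answers as the page describes, which has not been checked here.

```python
import urllib.request

# URL and media types are copied from the curl commands above.
BASE_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-642-12683-3_23"

MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt):
    """Build a GET request asking for the given serialization via the Accept header."""
    return urllib.request.Request(BASE_URL, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt, timeout=30):
    """Download the record in the requested format (performs a network call)."""
    with urllib.request.urlopen(build_request(fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Inspect the request without touching the network.
request = build_request("turtle")
print(request.full_url)
print(request.get_header("Accept"))
```

Calling `fetch("turtle")` would issue the actual request; it is left out of the example so the block runs offline.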


 

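This table displays all metadata directly associated with this object as RDF triples. Each row starts with its row number; a row that omits the subject or predicate inherits it from the nearest row above.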

133 TRIPLES      23 PREDICATES      80 URIs      73 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-642-12683-3_23 schema:about anzsrc-for:06
2 anzsrc-for:0604
3 schema:author N0869f94c98144f96bc11257674546f14
4 schema:datePublished 2010
5 schema:datePublishedReg 2010-01-01
6 schema:description While the discovery of structural variants in the human population is ongoing, most methods for this task assume that the genome is sequenced to high coverage (e.g. 40x), and use the combined power of the many sequenced reads and mate pairs to identify the variants. In contrast, the 1000 Genomes Project hopes to sequence hundreds of human genotypes, but at low coverage (4-6x), and most of the current methods are unable to discover insertion/deletion and structural variants from this data.In order to identify indels from multiple low-coverage individuals we have developed the MoGUL (Mixture of Genotypes Variant Locator) framework, which identifies potential locations with indels by examining mate pairs generated from all sequenced individuals simultaneously, uses a Bayesian network with appropriate priors to explicitly model each individual as homozygous or heterozygous for each locus, and computes the expected Minor Allele Frequency (MAF) for all predicted variants. We have used MoGUL to identify variants in 1000 Genomes data, as well as in simulated genotypes, and show good accuracy at predicting indels, especially for MAF > 0.06 and indel size > 20 base pairs.
7 schema:editor N87002d038a924e46960b225de2677176
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree true
11 schema:isPartOf N6288bbbff6914a7d9a3a67c371ea5c33
12 schema:keywords Bayesian networks
13 Genome Project
14 MOGUL framework
15 accuracy
16 allele frequencies
17 appropriate priors
18 base pairs
19 combined power
20 common insertion
21 contrast
22 coverage
23 current methods
24 data
25 deletion
26 discovery
27 framework
28 frequency
29 genome
30 genome data
31 genotypes
32 good accuracy
33 high coverage
34 human genotypes
35 human populations
36 hundreds
37 indel size
38 indels
39 individuals
40 insertion
41 insertions/deletions
42 location
43 loci
44 low coverage
45 low-coverage individuals
46 mate pairs
47 method
48 minor allele frequency
49 moguls
50 most methods
51 multiple low-coverage individuals
52 network
53 order
54 pairs
55 population
56 potential locations
57 power
58 priors
59 project
60 reads
61 sequenced reads
62 size
63 structural variants
64 task
65 variants
66 schema:name MoGUL: Detecting Common Insertions and Deletions in a Population
67 schema:pagination 357-368
68 schema:productId N3d83975a3bdd45168e01be1becccdb61
69 Nde320125be6c4c14b93e2b25804e1db2
70 schema:publisher N3e79d649012946df8130097d13120b2a
71 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009500714
72 https://doi.org/10.1007/978-3-642-12683-3_23
73 schema:sdDatePublished 2021-11-01T18:45
74 schema:sdLicense https://scigraph.springernature.com/explorer/license/
75 schema:sdPublisher Ndaafbfa06e4642e098b8494ed5f4638c
76 schema:url https://doi.org/10.1007/978-3-642-12683-3_23
77 sgo:license sg:explorer/license/
78 sgo:sdDataset chapters
79 rdf:type schema:Chapter
80 N0869f94c98144f96bc11257674546f14 rdf:first sg:person.0652142515.58
81 rdf:rest N9b3720b35d0c4e8cb1799174c9239fe2
82 N3d83975a3bdd45168e01be1becccdb61 schema:name dimensions_id
83 schema:value pub.1009500714
84 rdf:type schema:PropertyValue
85 N3e79d649012946df8130097d13120b2a schema:name Springer Nature
86 rdf:type schema:Organisation
87 N6288bbbff6914a7d9a3a67c371ea5c33 schema:isbn 978-3-642-12682-6
88 978-3-642-12683-3
89 schema:name Research in Computational Molecular Biology
90 rdf:type schema:Book
91 N87002d038a924e46960b225de2677176 rdf:first Ne8c380c38b37427b849a6c0c0ebd2745
92 rdf:rest rdf:nil
93 N9b3720b35d0c4e8cb1799174c9239fe2 rdf:first sg:person.01253676062.48
94 rdf:rest Ne8327525cd2344f1a2364ce370c3862c
95 Ndaafbfa06e4642e098b8494ed5f4638c schema:name Springer Nature - SN SciGraph project
96 rdf:type schema:Organization
97 Nde320125be6c4c14b93e2b25804e1db2 schema:name doi
98 schema:value 10.1007/978-3-642-12683-3_23
99 rdf:type schema:PropertyValue
100 Ne8327525cd2344f1a2364ce370c3862c rdf:first sg:person.01253563237.25
101 rdf:rest rdf:nil
102 Ne8c380c38b37427b849a6c0c0ebd2745 schema:familyName Berger
103 schema:givenName Bonnie
104 rdf:type schema:Person
105 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
106 schema:name Biological Sciences
107 rdf:type schema:DefinedTerm
108 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
109 schema:name Genetics
110 rdf:type schema:DefinedTerm
111 sg:person.01253563237.25 schema:affiliation grid-institutes:grid.17063.33
112 schema:familyName Brudno
113 schema:givenName Michael
114 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01253563237.25
115 rdf:type schema:Person
116 sg:person.01253676062.48 schema:affiliation grid-institutes:grid.147455.6
117 schema:familyName Xing
118 schema:givenName Eric
119 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01253676062.48
120 rdf:type schema:Person
121 sg:person.0652142515.58 schema:affiliation grid-institutes:grid.147455.6
122 schema:familyName Lee
123 schema:givenName Seunghak
124 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0652142515.58
125 rdf:type schema:Person
126 grid-institutes:grid.147455.6 schema:alternateName School of Computer Science, Carnegie Mellon University, USA
127 schema:name Department of Computer Science, University of Toronto, Canada
128 School of Computer Science, Carnegie Mellon University, USA
129 rdf:type schema:Organization
130 grid-institutes:grid.17063.33 schema:alternateName Banting and Best Dept. of Medical Research, University of Toronto, Canada
131 schema:name Banting and Best Dept. of Medical Research, University of Toronto, Canada
132 Department of Computer Science, University of Toronto, Canada
133 rdf:type schema:Organization
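In the table above, the `schema:author` value is a blank node that heads an RDF list: each list node carries one author in `rdf:first` and points to the next node through `rdf:rest`, ending at `rdf:nil`. The sketch below walks that chain to recover the author order. The triples are copied from table rows 80-81, 93-94, and 100-101; the dictionary layout and the helper name `walk_rdf_list` are assumptions made for illustration.

```python
RDF_NIL = "rdf:nil"

# (subject, predicate) -> object, copied from table rows 80-81, 93-94, 100-101.
LIST_TRIPLES = {
    ("N0869f94c98144f96bc11257674546f14", "rdf:first"): "sg:person.0652142515.58",
    ("N0869f94c98144f96bc11257674546f14", "rdf:rest"): "N9b3720b35d0c4e8cb1799174c9239fe2",
    ("N9b3720b35d0c4e8cb1799174c9239fe2", "rdf:first"): "sg:person.01253676062.48",
    ("N9b3720b35d0c4e8cb1799174c9239fe2", "rdf:rest"): "Ne8327525cd2344f1a2364ce370c3862c",
    ("Ne8327525cd2344f1a2364ce370c3862c", "rdf:first"): "sg:person.01253563237.25",
    ("Ne8327525cd2344f1a2364ce370c3862c", "rdf:rest"): RDF_NIL,
}

# Family names from table rows 112, 117, and 122.
FAMILY_NAMES = {
    "sg:person.0652142515.58": "Lee",
    "sg:person.01253676062.48": "Xing",
    "sg:person.01253563237.25": "Brudno",
}


def walk_rdf_list(head, triples):
    """Follow rdf:first / rdf:rest links from head until rdf:nil; return items in order."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            # Guard against malformed data that loops back on itself.
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(triples[(node, "rdf:first")])
        node = triples[(node, "rdf:rest")]
    return items


# Row 3 of the table gives the head of the author list.
author_ids = walk_rdf_list("N0869f94c98144f96bc11257674546f14", LIST_TRIPLES)
author_names = [FAMILY_NAMES[i] for i in author_ids]
print(author_names)
```

The list encoding is what preserves author order in RDF, which otherwise treats a set of triples as unordered.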
 



