MoGUL: Detecting Common Insertions and Deletions in a Population


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2010

AUTHORS

Seunghak Lee, Eric Xing, Michael Brudno

ABSTRACT

While the discovery of structural variants in the human population is ongoing, most methods for this task assume that the genome is sequenced to high coverage (e.g. 40x), and use the combined power of the many sequenced reads and mate pairs to identify the variants. In contrast, the 1000 Genomes Project hopes to sequence hundreds of human genotypes, but at low coverage (4-6x), and most current methods are unable to discover insertions/deletions and structural variants from these data. In order to identify indels from multiple low-coverage individuals, we have developed the MoGUL (Mixture of Genotypes Variant Locator) framework, which identifies potential locations with indels by examining mate pairs generated from all sequenced individuals simultaneously, uses a Bayesian network with appropriate priors to explicitly model each individual as homozygous or heterozygous for each locus, and computes the expected Minor Allele Frequency (MAF) for all predicted variants. We have used MoGUL to identify variants in 1000 Genomes data, as well as in simulated genotypes, and show good accuracy at predicting indels, especially for MAF > 0.06 and indel size > 20 base pairs.
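The abstract names the ingredients (mate-pair evidence pooled across individuals, a per-individual homozygous/heterozygous model with priors, an expected MAF) but not the equations. As a rough illustration only, and not MoGUL's actual model, the sketch below assumes a Hardy-Weinberg prior on genotypes, treats each mate pair spanning a candidate locus as independently supporting the indel allele with probability g/2 (blurred by a hypothetical error rate `err`), and pools all individuals to re-estimate the shared allele frequency with a simple EM loop. The `(k, n)` inputs (supporting and spanning pair counts per individual), the counts themselves, and all function names are hypothetical.

```python
from math import comb


def hwe_prior(f):
    """Hardy-Weinberg prior over genotypes g = 0, 1, 2 copies of the indel allele."""
    return [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]


def genotype_posteriors(k, n, f, err=0.01):
    """Posterior P(g | k of n spanning mate pairs support the indel) for one individual.

    err is an assumed rate at which a mate pair supports the wrong allele.
    """
    prior = hwe_prior(f)
    joint = []
    for g in (0, 1, 2):
        # Chance that one spanning pair supports the indel, given genotype g.
        p = err + (1 - 2 * err) * g / 2
        likelihood = comb(n, k) * p ** k * (1 - p) ** (n - k)
        joint.append(prior[g] * likelihood)
    total = sum(joint)
    return [x / total for x in joint]


def expected_alt_frequency(posteriors):
    """Expected indel-allele frequency: expected copies over all individuals / 2N."""
    return sum(p[1] + 2 * p[2] for p in posteriors) / (2 * len(posteriors))


def expected_maf(posteriors):
    """Minor side of the expected allele frequency."""
    alt = expected_alt_frequency(posteriors)
    return min(alt, 1 - alt)


def estimate_frequency(counts, f=0.1, iterations=50, err=0.01):
    """EM-style refinement of the shared allele frequency f across individuals."""
    for _ in range(iterations):
        posteriors = [genotype_posteriors(k, n, f, err) for k, n in counts]
        f = expected_alt_frequency(posteriors)
    return f


# Hypothetical (supporting, spanning) mate-pair counts for five low-coverage individuals.
counts = [(0, 3), (2, 4), (4, 4), (0, 5), (1, 2)]
f_hat = estimate_frequency(counts)
posteriors = [genotype_posteriors(k, n, f_hat) for k, n in counts]
print(f"estimated MAF: {expected_maf(posteriors):.3f}")
```

With only a few spanning pairs per individual at 4-6x, each single-sample posterior stays diffuse; in this sketch it is the shared frequency `f`, estimated from everyone at once, that sharpens the per-individual genotype calls.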

PAGES

357-368

Book

TITLE

Research in Computational Molecular Biology

ISBN

978-3-642-12682-6
978-3-642-12683-3

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-642-12683-3_23

DOI

http://dx.doi.org/10.1007/978-3-642-12683-3_23

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1009500714



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "School of Computer Science, Carnegie Mellon University, USA", 
          "id": "http://www.grid.ac/institutes/grid.147455.6", 
          "name": [
            "Department of Computer Science, University of Toronto, Canada", 
            "School of Computer Science, Carnegie Mellon University, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Lee", 
        "givenName": "Seunghak", 
        "id": "sg:person.0652142515.58", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0652142515.58"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "School of Computer Science, Carnegie Mellon University, USA", 
          "id": "http://www.grid.ac/institutes/grid.147455.6", 
          "name": [
            "School of Computer Science, Carnegie Mellon University, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Xing", 
        "givenName": "Eric", 
        "id": "sg:person.01253676062.48", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01253676062.48"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Banting and Best Dept. of Medical Research, University of Toronto, Canada", 
          "id": "http://www.grid.ac/institutes/grid.17063.33", 
          "name": [
            "Department of Computer Science, University of Toronto, Canada", 
            "Banting and Best Dept. of Medical Research, University of Toronto, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brudno", 
        "givenName": "Michael", 
        "id": "sg:person.01253563237.25", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01253563237.25"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2010", 
    "datePublishedReg": "2010-01-01", 
    "description": "While the discovery of structural variants in the human population is ongoing, most methods for this task assume that the genome is sequenced to high coverage (e.g. 40x), and use the combined power of the many sequenced reads and mate pairs to identify the variants. In contrast, the 1000 Genomes Project hopes to sequence hundreds of human genotypes, but at low coverage (4-6x), and most of the current methods are unable to discover insertion/deletion and structural variants from this data.In order to identify indels from multiple low-coverage individuals we have developed the MoGUL (Mixture of Genotypes Variant Locator) framework, which identifies potential locations with indels by examining mate pairs generated from all sequenced individuals simultaneously, uses a Bayesian network with appropriate priors to explicitly model each individual as homozygous or heterozygous for each locus, and computes the expected Minor Allele Frequency (MAF) for all predicted variants. We have used MoGUL to identify variants in 1000 Genomes data, as well as in simulated genotypes, and show good accuracy at predicting indels, especially for MAF > 0.06 and indel size > 20 base pairs.", 
    "editor": [
      {
        "familyName": "Berger", 
        "givenName": "Bonnie", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-642-12683-3_23", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-642-12682-6", 
        "978-3-642-12683-3"
      ], 
      "name": "Research in Computational Molecular Biology", 
      "type": "Book"
    }, 
    "keywords": [
      "minor allele frequency", 
      "structural variants", 
      "sequenced reads", 
      "Genome Project", 
      "insertions/deletions", 
      "genome data", 
      "indels", 
      "mate pairs", 
      "base pairs", 
      "human populations", 
      "human genotypes", 
      "indel size", 
      "common insertion", 
      "deletion", 
      "allele frequencies", 
      "variants", 
      "genotypes", 
      "individuals", 
      "population", 
      "genome", 
      "loci", 
      "combined power", 
      "reads", 
      "discovery", 
      "pairs", 
      "data", 
      "Bayesian networks", 
      "appropriate priors", 
      "insertion", 
      "most methods", 
      "contrast", 
      "hundreds", 
      "current methods", 
      "high coverage", 
      "coverage", 
      "frequency", 
      "method", 
      "location", 
      "good accuracy", 
      "size", 
      "low coverage", 
      "potential locations", 
      "priors", 
      "task", 
      "framework", 
      "network", 
      "accuracy", 
      "project", 
      "order", 
      "power", 
      "moguls", 
      "MOGUL framework", 
      "multiple low-coverage individuals", 
      "low-coverage individuals"
    ], 
    "name": "MoGUL: Detecting Common Insertions and Deletions in a Population", 
    "pagination": "357-368", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1009500714"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-642-12683-3_23"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-642-12683-3_23", 
      "https://app.dimensions.ai/details/publication/pub.1009500714"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-01-01T19:19", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220101/entities/gbq_results/chapter/chapter_331.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-642-12683-3_23"
  }
]
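Because the record above is compact JSON-LD whose keys are plain names such as `author`, `productId` and `isPartOf`, an ordinary JSON parser is enough to pull out citation fields without a JSON-LD processor, assuming the keys keep the shape shown here. The two helpers below are hypothetical, not part of any SciGraph tooling, and the citation layout is an arbitrary choice.

```python
import json


def summarize_record(jsonld_text):
    """Extract citation fields from a SciGraph chapter record serialized as JSON-LD."""
    record = json.loads(jsonld_text)[0]  # the record is a one-element top-level array
    # The "author" array keeps the order in which the authors are listed.
    authors = [f"{a['givenName']} {a['familyName']}" for a in record["author"]]
    # productId holds several identifiers; pick the one named "doi".
    doi = next(p["value"][0] for p in record["productId"] if p["name"] == "doi")
    return {
        "title": record["name"],
        "authors": authors,
        "book": record["isPartOf"]["name"],
        "pages": record["pagination"],
        "year": record["datePublished"],
        "doi": doi,
    }


def format_citation(s):
    """Render one plain-text citation line from summarize_record's output."""
    return (f"{', '.join(s['authors'])} ({s['year']}). {s['title']}. "
            f"In: {s['book']}, pp. {s['pages']}. doi:{s['doi']}")
```

Passing the full JSON-LD text above to `summarize_record` and then to `format_citation` should give a single line with the three authors in order, the chapter title, the book, pages 357-368 and the DOI.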
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data that is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-12683-3_23'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-12683-3_23'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-12683-3_23'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-12683-3_23'
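The curl commands above rely on HTTP content negotiation: the same URL returns a different serialization depending on the `Accept` header. The same requests can be made with Python's standard library. This is a sketch that assumes the SciGraph endpoint still answers at that URL with these media types, which this 2022 snapshot cannot guarantee; the format keys in `MEDIA_TYPES` and the helper names are illustrative labels.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-642-12683-3_23"

# Accept headers taken from the curl commands above.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request asking the server for the record in format fmt."""
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Download the record as text; raises urllib.error.URLError on network failure."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch("turtle")` performs the network request; `build_request` is kept separate so the headers can be inspected without touching the network.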


 

This table displays all metadata directly associated with this object as RDF triples.

133 TRIPLES      23 PREDICATES      80 URIs      73 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-642-12683-3_23 schema:about anzsrc-for:06
2 anzsrc-for:0604
3 schema:author N7817ff76183b4359bbd90084e8cd3612
4 schema:datePublished 2010
5 schema:datePublishedReg 2010-01-01
6 schema:description While the discovery of structural variants in the human population is ongoing, most methods for this task assume that the genome is sequenced to high coverage (e.g. 40x), and use the combined power of the many sequenced reads and mate pairs to identify the variants. In contrast, the 1000 Genomes Project hopes to sequence hundreds of human genotypes, but at low coverage (4-6x), and most of the current methods are unable to discover insertion/deletion and structural variants from this data.In order to identify indels from multiple low-coverage individuals we have developed the MoGUL (Mixture of Genotypes Variant Locator) framework, which identifies potential locations with indels by examining mate pairs generated from all sequenced individuals simultaneously, uses a Bayesian network with appropriate priors to explicitly model each individual as homozygous or heterozygous for each locus, and computes the expected Minor Allele Frequency (MAF) for all predicted variants. We have used MoGUL to identify variants in 1000 Genomes data, as well as in simulated genotypes, and show good accuracy at predicting indels, especially for MAF > 0.06 and indel size > 20 base pairs.
7 schema:editor N40d751cdac094590bbdbc0e79a3aa142
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree true
11 schema:isPartOf Na1c82ed519bf424e856567411365c4aa
12 schema:keywords Bayesian networks
13 Genome Project
14 MOGUL framework
15 accuracy
16 allele frequencies
17 appropriate priors
18 base pairs
19 combined power
20 common insertion
21 contrast
22 coverage
23 current methods
24 data
25 deletion
26 discovery
27 framework
28 frequency
29 genome
30 genome data
31 genotypes
32 good accuracy
33 high coverage
34 human genotypes
35 human populations
36 hundreds
37 indel size
38 indels
39 individuals
40 insertion
41 insertions/deletions
42 location
43 loci
44 low coverage
45 low-coverage individuals
46 mate pairs
47 method
48 minor allele frequency
49 moguls
50 most methods
51 multiple low-coverage individuals
52 network
53 order
54 pairs
55 population
56 potential locations
57 power
58 priors
59 project
60 reads
61 sequenced reads
62 size
63 structural variants
64 task
65 variants
66 schema:name MoGUL: Detecting Common Insertions and Deletions in a Population
67 schema:pagination 357-368
68 schema:productId N0e452e7bf89743b29ff5e965a71099f4
69 N3a07579a6a9b4785b8c6e3fefc549572
70 schema:publisher N75c453a97d45492dabf5a3afccb1551d
71 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009500714
72 https://doi.org/10.1007/978-3-642-12683-3_23
73 schema:sdDatePublished 2022-01-01T19:19
74 schema:sdLicense https://scigraph.springernature.com/explorer/license/
75 schema:sdPublisher N6e218a8e9f12412fbde9a5bda0b911ed
76 schema:url https://doi.org/10.1007/978-3-642-12683-3_23
77 sgo:license sg:explorer/license/
78 sgo:sdDataset chapters
79 rdf:type schema:Chapter
80 N0727ba55a5294b0a8a064a45302b33e9 rdf:first sg:person.01253676062.48
81 rdf:rest Nb16e70a9ffb6462dba4622ae2eeee4fb
82 N0e452e7bf89743b29ff5e965a71099f4 schema:name dimensions_id
83 schema:value pub.1009500714
84 rdf:type schema:PropertyValue
85 N3a07579a6a9b4785b8c6e3fefc549572 schema:name doi
86 schema:value 10.1007/978-3-642-12683-3_23
87 rdf:type schema:PropertyValue
88 N40d751cdac094590bbdbc0e79a3aa142 rdf:first N56f2479d2e0545ec8d452c0e68e07244
89 rdf:rest rdf:nil
90 N56f2479d2e0545ec8d452c0e68e07244 schema:familyName Berger
91 schema:givenName Bonnie
92 rdf:type schema:Person
93 N6e218a8e9f12412fbde9a5bda0b911ed schema:name Springer Nature - SN SciGraph project
94 rdf:type schema:Organization
95 N75c453a97d45492dabf5a3afccb1551d schema:name Springer Nature
96 rdf:type schema:Organisation
97 N7817ff76183b4359bbd90084e8cd3612 rdf:first sg:person.0652142515.58
98 rdf:rest N0727ba55a5294b0a8a064a45302b33e9
99 Na1c82ed519bf424e856567411365c4aa schema:isbn 978-3-642-12682-6
100 978-3-642-12683-3
101 schema:name Research in Computational Molecular Biology
102 rdf:type schema:Book
103 Nb16e70a9ffb6462dba4622ae2eeee4fb rdf:first sg:person.01253563237.25
104 rdf:rest rdf:nil
105 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
106 schema:name Biological Sciences
107 rdf:type schema:DefinedTerm
108 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
109 schema:name Genetics
110 rdf:type schema:DefinedTerm
111 sg:person.01253563237.25 schema:affiliation grid-institutes:grid.17063.33
112 schema:familyName Brudno
113 schema:givenName Michael
114 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01253563237.25
115 rdf:type schema:Person
116 sg:person.01253676062.48 schema:affiliation grid-institutes:grid.147455.6
117 schema:familyName Xing
118 schema:givenName Eric
119 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01253676062.48
120 rdf:type schema:Person
121 sg:person.0652142515.58 schema:affiliation grid-institutes:grid.147455.6
122 schema:familyName Lee
123 schema:givenName Seunghak
124 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0652142515.58
125 rdf:type schema:Person
126 grid-institutes:grid.147455.6 schema:alternateName School of Computer Science, Carnegie Mellon University, USA
127 schema:name Department of Computer Science, University of Toronto, Canada
128 School of Computer Science, Carnegie Mellon University, USA
129 rdf:type schema:Organization
130 grid-institutes:grid.17063.33 schema:alternateName Banting and Best Dept. of Medical Research, University of Toronto, Canada
131 schema:name Banting and Best Dept. of Medical Research, University of Toronto, Canada
132 Department of Computer Science, University of Toronto, Canada
133 rdf:type schema:Organization
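Author order is not stored on the person nodes themselves. As the rows above show, `schema:author` points at a blank node that starts an RDF collection: each node's `rdf:first` is one author and its `rdf:rest` is the next node, ending at `rdf:nil` (the `schema:editor` list has the same shape with a single entry). The sketch below copies the author-list rows from the table into a dictionary keyed by (subject, predicate) and walks the chain. That keying is an illustrative shortcut that only works for single-valued predicates such as `rdf:first` and `rdf:rest`, not multi-valued ones like `schema:keywords`; the names `walk_rdf_list`, `TRIPLES` and `FAMILY_NAMES` are hypothetical.

```python
RDF_NIL = "rdf:nil"
PUB = "sg:pub.10.1007/978-3-642-12683-3_23"

# (subject, predicate) -> object, copied from the author-list rows of the table above.
TRIPLES = {
    (PUB, "schema:author"): "N7817ff76183b4359bbd90084e8cd3612",
    ("N7817ff76183b4359bbd90084e8cd3612", "rdf:first"): "sg:person.0652142515.58",
    ("N7817ff76183b4359bbd90084e8cd3612", "rdf:rest"): "N0727ba55a5294b0a8a064a45302b33e9",
    ("N0727ba55a5294b0a8a064a45302b33e9", "rdf:first"): "sg:person.01253676062.48",
    ("N0727ba55a5294b0a8a064a45302b33e9", "rdf:rest"): "Nb16e70a9ffb6462dba4622ae2eeee4fb",
    ("Nb16e70a9ffb6462dba4622ae2eeee4fb", "rdf:first"): "sg:person.01253563237.25",
    ("Nb16e70a9ffb6462dba4622ae2eeee4fb", "rdf:rest"): RDF_NIL,
}

# schema:familyName of each person node, from the same table.
FAMILY_NAMES = {
    "sg:person.0652142515.58": "Lee",
    "sg:person.01253676062.48": "Xing",
    "sg:person.01253563237.25": "Brudno",
}


def walk_rdf_list(triples, head):
    """Follow rdf:first/rdf:rest links from head until rdf:nil, returning the items."""
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle at {node}")
        seen.add(node)
        items.append(triples[(node, "rdf:first")])
        node = triples[(node, "rdf:rest")]
    return items


head = TRIPLES[(PUB, "schema:author")]
print([FAMILY_NAMES[p] for p in walk_rdf_list(TRIPLES, head)])  # ['Lee', 'Xing', 'Brudno']
```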
 



