Joint SNP and CNV calling in 1000 Genomes sequence data


Ontology type: schema:MonetaryGrant     


Grant Info

YEARS

2009-2013

FUNDING AMOUNT

7083002 USD

ABSTRACT

DESCRIPTION: (provided by applicant): The 1000 Genomes Project is developing data resources and analytical methods required for the next stage of human genetics research: (a) discovering millions of novel polymorphisms with frequencies 0.5%-10%, which can then be tested for association to disease via imputation and direct genotyping in patients, and (b) bringing together genome centers, technology companies and population geneticists in a collaborative framework to develop data formats, analytical methods and standards for sensitive, accurate genome-wide resequencing for rare variants. The Project's goals are aggressive as detection of rare variants requires unprecedented accuracy, multiplied by the inclusion of multiple rapidly-evolving next generation sequencing platforms. Key tasks for Data Processing include (a) defining the biases and error processes characteristic of each sequencing platform, (b) determining how to use properly calibrated data to discover and genotype variants (SNP and structural), including making use of population genetics and prior array data for each sample, and (c) making it easy for users to browse the resulting data, and integrate it in statistical genetic analysis of disease samples. As members of the 1000 Genomes Project Analysis Group, we propose three Aims. First, to develop, implement and apply methodology to convert raw intensity data from each platform into accurate four-base probabilities, refining and calibrating the underlying base-call probabilities, and increasing accuracy. Second, to develop and implement an integrated approach to SNP and CNV detection that utilizes these probabilities, combines information across multiple samples, and exploits existing information from genotyping arrays, increasing sensitivity and accuracy for both SNPs and structural variants. 
Third, to develop user-friendly software for browsing and applying 1000 Genomes Project data in disease research, making Project data on sequence variation and linkage disequilibrium accessible and easily usable to the wider genetics community. We have assembled an experienced and skilled team of statistical and population genetic analysts and software engineers, with a track record of contributions to the SNP Consortium, HapMap project, and disease association studies. If funded, we will develop improved methods for interpreting raw next generation sequencing data, and software tools that speed the application of data from the Project to the genetics community. PUBLIC HEALTH RELEVANCE: The data for the 1000 Genomes project will provide the underpinnings for the execution and interpretation of all complex human disease genetic research that follows. As this constitutes the most prodigious investment in human variation resource generation to date, it is imperative that the data from this project is processed and analyzed as accurately as possible as the raw data is of such a scale that it cannot be maintained permanently. In addition, the methods developed in this proposal will be directly applied beyond the 1000 Genomes project to medical sequencing efforts to unlock the genetics of complex disease.

URL

http://projectreporter.nih.gov/project_info_description.cfm?aid=7932960

JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/2201", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/2206", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "type": "DefinedTerm"
      }
    ], 
    "amount": {
      "currency": "USD", 
      "type": "MonetaryAmount", 
      "value": "7083002"
    }, 
    "description": "DESCRIPTION: (provided by applicant): The 1000 Genomes Project is developing data resources and analytical methods required for the next stage of human genetics research: (a) discovering millions of novel polymorphisms with frequencies 0.5%-10%, which can then be tested for association to disease via imputation and direct genotyping in patients, and (b) bringing together genome centers, technology companies and population geneticists in a collaborative framework to develop data formats, analytical methods and standards for sensitive, accurate genome-wide resequencing for rare variants. The Project's goals are aggressive as detection of rare variants requires unprecedented accuracy, multiplied by the inclusion of multiple rapidly-evolving next generation sequencing platforms. Key tasks for Data Processing include (a) defining the biases and error processes characteristic of each sequencing platform, (b) determining how to use properly calibrated data to discover and genotype variants (SNP and structural), including making use of population genetics and prior array data for each sample, and (c) making it easy for users to browse the resulting data, and integrate it in statistical genetic analysis of disease samples. As members of the 1000 Genomes Project Analysis Group, we propose three Aims. First, to develop, implement and apply methodology to convert raw intensity data from each platform into accurate four-base probabilities, refining and calibrating the underlying base-call probabilities, and increasing accuracy. Second, to develop and implement an integrated approach to SNP and CNV detection that utilizes these probabilities, combines information across multiple samples, and exploits existing information from genotyping arrays, increasing sensitivity and accuracy for both SNPs and structural variants. 
Third, to develop user-friendly software for browsing and applying 1000 Genomes Project data in disease research, making Project data on sequence variation and linkage disequilibrium accessible and easily usable to the wider genetics community. We have assembled an experienced and skilled team of statistical and population genetic analysts and software engineers, with a track record of contributions to the SNP Consortium, HapMap project, and disease association studies. If funded, we will develop improved methods for interpreting raw next generation sequencing data, and software tools that speed the application of data from the Project to the genetics community. PUBLIC HEALTH RELEVANCE: The data for the 1000 Genomes project will provide the underpinnings for the execution and interpretation of all complex human disease genetic research that follows. As this constitutes the most prodigious investment in human variation resource generation to date, it is Imperative that the data from this project is processed and analyzed as accurately as possible as the raw data is of such a scale that it cannot be maintained permanently. In addition the methods developed in this proposal will be directly applied beyond the 1000 Genomes project to medical sequencing efforts to unlock the genetics of complex disease.", 
    "endDate": "2013-06-30T00:00:00Z", 
    "funder": {
      "id": "https://www.grid.ac/institutes/grid.280128.1", 
      "type": "Organization"
    }, 
    "id": "sg:grant.2691278", 
    "identifier": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "2691278"
        ]
      }, 
      {
        "name": "nih_id", 
        "type": "PropertyValue", 
        "value": [
          "U01HG005208"
        ]
      }
    ], 
    "inLanguage": [
      "en"
    ], 
    "keywords": [
      "complex human diseases", 
      "sequencing platforms", 
      "public health relevance", 
      "proposal", 
      "raw data", 
      "refining", 
      "Genomes Project", 
      "array", 
      "bias", 
      "use", 
      "addition", 
      "millions", 
      "Genomes Project data", 
      "disease samples", 
      "human genetic research", 
      "imputation", 
      "project data", 
      "genetics", 
      "data resources", 
      "project", 
      "information", 
      "METHODS", 
      "detection", 
      "error processes characteristic", 
      "variants", 
      "contribution", 
      "four-base probabilities", 
      "inclusion", 
      "members", 
      "medical sequencing efforts", 
      "HapMap project", 
      "application", 
      "data", 
      "analytical method", 
      "underpinnings", 
      "population geneticists", 
      "Genomes sequence data", 
      "prior array data", 
      "patients", 
      "Data Processing", 
      "frequencies 0.5%-10%", 
      "users", 
      "software tools", 
      "unprecedented accuracy", 
      "next generation sequencing platforms", 
      "SNP Consortium", 
      "probability", 
      "Joint SNP", 
      "SNPs", 
      "novel polymorphisms", 
      "track record", 
      "integrated approach", 
      "interpretation", 
      "skilled team", 
      "linkage disequilibrium", 
      "accuracy", 
      "population", 
      "multiple samples", 
      "raw next generation sequencing data", 
      "data formats", 
      "technology companies", 
      "platform", 
      "statistical genetic analysis", 
      "description", 
      "standard", 
      "association", 
      "applicants", 
      "prodigious investment", 
      "next stage", 
      "structural variants", 
      "genetics community", 
      "raw intensity data", 
      "date", 
      "genetic research", 
      "underlying base-call probabilities", 
      "genetic analysts", 
      "genome centers", 
      "CNV", 
      "accurate genome-wide resequencing", 
      "disease research", 
      "samples", 
      "complex diseases", 
      "execution", 
      "project goals", 
      "Genomes Project Analysis Group", 
      "scale", 
      "direct genotyping", 
      "sequence variation", 
      "human variation resource generation", 
      "rare variants", 
      "sensitivity", 
      "AIMS", 
      "population genetics", 
      "methodology", 
      "wider genetics community", 
      "CNV detection", 
      "disease association studies", 
      "key task", 
      "user-friendly software", 
      "software engineers", 
      "collaborative framework"
    ], 
    "name": "Joint SNP and CNV calling in 1000 Genomes sequence data", 
    "recipient": [
      {
        "id": "https://www.grid.ac/institutes/grid.66859.34", 
        "type": "Organization"
      }, 
      {
        "affiliation": {
          "id": "https://www.grid.ac/institutes/grid.66859.34", 
          "name": "BROAD INSTITUTE, INC.", 
          "type": "Organization"
        }, 
        "familyName": "DALY", 
        "givenName": "MARK J", 
        "id": "sg:person.011517303117.07", 
        "type": "Person"
      }, 
      {
        "member": "sg:person.011517303117.07", 
        "roleName": "PI", 
        "type": "Role"
      }
    ], 
    "sameAs": [
      "https://app.dimensions.ai/details/grant/grant.2691278"
    ], 
    "sdDataset": "grants", 
    "sdDatePublished": "2019-03-07T12:08", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com.uberresearch.data.processor/core_data/20181219_192338/projects/base/nih_projects_20.xml.gz", 
    "startDate": "2009-09-16T00:00:00Z", 
    "type": "MonetaryGrant", 
    "url": "http://projectreporter.nih.gov/project_info_description.cfm?aid=7932960"
  }
]
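
Because JSON-LD is plain JSON, the record above can be read with an ordinary JSON parser. The following is a minimal sketch, not part of the SciGraph page: it assumes a trimmed copy of the record (only the fields used are reproduced), and `summarize_grant` is a hypothetical helper name chosen for illustration.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
RECORD = json.loads("""
[
  {
    "amount": {"currency": "USD", "type": "MonetaryAmount", "value": "7083002"},
    "endDate": "2013-06-30T00:00:00Z",
    "id": "sg:grant.2691278",
    "identifier": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["2691278"]},
      {"name": "nih_id", "type": "PropertyValue", "value": ["U01HG005208"]}
    ],
    "name": "Joint SNP and CNV calling in 1000 Genomes sequence data",
    "recipient": [
      {"id": "https://www.grid.ac/institutes/grid.66859.34", "type": "Organization"},
      {
        "affiliation": {
          "id": "https://www.grid.ac/institutes/grid.66859.34",
          "name": "BROAD INSTITUTE, INC.",
          "type": "Organization"
        },
        "familyName": "DALY",
        "givenName": "MARK J",
        "id": "sg:person.011517303117.07",
        "type": "Person"
      },
      {"member": "sg:person.011517303117.07", "roleName": "PI", "type": "Role"}
    ],
    "startDate": "2009-09-16T00:00:00Z",
    "type": "MonetaryGrant"
  }
]
""")


def summarize_grant(record):
    """Collect a few headline fields from one grant object (hypothetical helper)."""
    recipients = record["recipient"]
    people = {r["id"]: r for r in recipients if r.get("type") == "Person"}
    roles = [r for r in recipients if r.get("type") == "Role"]
    # Resolve each Role's "member" reference to the matching Person entry.
    investigators = [
        (role["roleName"],
         people[role["member"]]["givenName"],
         people[role["member"]]["familyName"])
        for role in roles
        if role.get("member") in people
    ]
    # Each identifier carries its value as a one-element list.
    identifiers = {i["name"]: i["value"][0] for i in record["identifier"]}
    return {
        "id": record["id"],
        "amount": int(record["amount"]["value"]),
        "currency": record["amount"]["currency"],
        "years": (record["startDate"][:4], record["endDate"][:4]),
        "identifiers": identifiers,
        "investigators": investigators,
    }


summary = summarize_grant(RECORD[0])
print(summary)
```

Note that the amount is stored as a string in the record, so the sketch converts it to an integer before use.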
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/grant.2691278'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/grant.2691278'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/grant.2691278'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/grant.2691278'
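
The four curl commands above differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. This is an illustrative sketch only: the names `ACCEPT_TYPES`, `build_request`, and `fetch` are hypothetical, and it assumes the endpoint still answers these requests, which may no longer be the case. `fetch` performs a network call and is therefore defined but not invoked here.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/grant.2691278"

# Media types copied from the curl commands above.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=BASE_URL):
    """Return a Request asking for `url` in the given serialization."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"unknown format: {fmt!r}")
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=BASE_URL, timeout=30):
    """Fetch the record as text. Performs a network call."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request does not touch the network.
req = build_request("turtle")
print(req.full_url, req.get_header("Accept"))
```

Keeping request construction separate from the network call makes the header logic easy to check offline.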


 

This table displays all metadata directly associated with this object as RDF triples.

148 TRIPLES      19 PREDICATES      124 URIs      115 LITERALS      5 BLANK NODES

Subject Predicate Object
1 sg:grant.2691278 schema:about anzsrc-for:2201
2 anzsrc-for:2206
3 schema:amount Ncd73f74d5a4a4b3c933da4893122a709
4 schema:description DESCRIPTION: (provided by applicant): The 1000 Genomes Project is developing data resources and analytical methods required for the next stage of human genetics research: (a) discovering millions of novel polymorphisms with frequencies 0.5%-10%, which can then be tested for association to disease via imputation and direct genotyping in patients, and (b) bringing together genome centers, technology companies and population geneticists in a collaborative framework to develop data formats, analytical methods and standards for sensitive, accurate genome-wide resequencing for rare variants. The Project's goals are aggressive as detection of rare variants requires unprecedented accuracy, multiplied by the inclusion of multiple rapidly-evolving next generation sequencing platforms. Key tasks for Data Processing include (a) defining the biases and error processes characteristic of each sequencing platform, (b) determining how to use properly calibrated data to discover and genotype variants (SNP and structural), including making use of population genetics and prior array data for each sample, and (c) making it easy for users to browse the resulting data, and integrate it in statistical genetic analysis of disease samples. As members of the 1000 Genomes Project Analysis Group, we propose three Aims. First, to develop, implement and apply methodology to convert raw intensity data from each platform into accurate four-base probabilities, refining and calibrating the underlying base-call probabilities, and increasing accuracy. Second, to develop and implement an integrated approach to SNP and CNV detection that utilizes these probabilities, combines information across multiple samples, and exploits existing information from genotyping arrays, increasing sensitivity and accuracy for both SNPs and structural variants. 
Third, to develop user-friendly software for browsing and applying 1000 Genomes Project data in disease research, making Project data on sequence variation and linkage disequilibrium accessible and easily usable to the wider genetics community. We have assembled an experienced and skilled team of statistical and population genetic analysts and software engineers, with a track record of contributions to the SNP Consortium, HapMap project, and disease association studies. If funded, we will develop improved methods for interpreting raw next generation sequencing data, and software tools that speed the application of data from the Project to the genetics community. PUBLIC HEALTH RELEVANCE: The data for the 1000 Genomes project will provide the underpinnings for the execution and interpretation of all complex human disease genetic research that follows. As this constitutes the most prodigious investment in human variation resource generation to date, it is Imperative that the data from this project is processed and analyzed as accurately as possible as the raw data is of such a scale that it cannot be maintained permanently. In addition the methods developed in this proposal will be directly applied beyond the 1000 Genomes project to medical sequencing efforts to unlock the genetics of complex disease.
5 schema:endDate 2013-06-30T00:00:00Z
6 schema:funder https://www.grid.ac/institutes/grid.280128.1
7 schema:identifier N8925266662784081a2d9869c521c4b0d
8 Nb42968f31c2b419e83fdb8405d4d63ed
9 schema:inLanguage en
10 schema:keywords AIMS
11 CNV
12 CNV detection
13 Data Processing
14 Genomes Project
15 Genomes Project Analysis Group
16 Genomes Project data
17 Genomes sequence data
18 HapMap project
19 Joint SNP
20 METHODS
21 SNP Consortium
22 SNPs
23 accuracy
24 accurate genome-wide resequencing
25 addition
26 analytical method
27 applicants
28 application
29 array
30 association
31 bias
32 collaborative framework
33 complex diseases
34 complex human diseases
35 contribution
36 data
37 data formats
38 data resources
39 date
40 description
41 detection
42 direct genotyping
43 disease association studies
44 disease research
45 disease samples
46 error processes characteristic
47 execution
48 four-base probabilities
49 frequencies 0.5%-10%
50 genetic analysts
51 genetic research
52 genetics
53 genetics community
54 genome centers
55 human genetic research
56 human variation resource generation
57 imputation
58 inclusion
59 information
60 integrated approach
61 interpretation
62 key task
63 linkage disequilibrium
64 medical sequencing efforts
65 members
66 methodology
67 millions
68 multiple samples
69 next generation sequencing platforms
70 next stage
71 novel polymorphisms
72 patients
73 platform
74 population
75 population geneticists
76 population genetics
77 prior array data
78 probability
79 prodigious investment
80 project
81 project data
82 project goals
83 proposal
84 public health relevance
85 rare variants
86 raw data
87 raw intensity data
88 raw next generation sequencing data
89 refining
90 samples
91 scale
92 sensitivity
93 sequence variation
94 sequencing platforms
95 skilled team
96 software engineers
97 software tools
98 standard
99 statistical genetic analysis
100 structural variants
101 technology companies
102 track record
103 underlying base-call probabilities
104 underpinnings
105 unprecedented accuracy
106 use
107 user-friendly software
108 users
109 variants
110 wider genetics community
111 schema:name Joint SNP and CNV calling in 1000 Genomes sequence data
112 schema:recipient N2f5b8a19d8cd4f0a9f08bb233ad10627
113 sg:person.011517303117.07
114 https://www.grid.ac/institutes/grid.66859.34
115 schema:sameAs https://app.dimensions.ai/details/grant/grant.2691278
116 schema:sdDatePublished 2019-03-07T12:08
117 schema:sdLicense https://scigraph.springernature.com/explorer/license/
118 schema:sdPublisher N26a13dc122e04d83b713664f16013a2c
119 schema:startDate 2009-09-16T00:00:00Z
120 schema:url http://projectreporter.nih.gov/project_info_description.cfm?aid=7932960
121 sgo:license sg:explorer/license/
122 sgo:sdDataset grants
123 rdf:type schema:MonetaryGrant
124 N26a13dc122e04d83b713664f16013a2c schema:name Springer Nature - SN SciGraph project
125 rdf:type schema:Organization
126 N2f5b8a19d8cd4f0a9f08bb233ad10627 schema:member sg:person.011517303117.07
127 schema:roleName PI
128 rdf:type schema:Role
129 N8925266662784081a2d9869c521c4b0d schema:name dimensions_id
130 schema:value 2691278
131 rdf:type schema:PropertyValue
132 Nb42968f31c2b419e83fdb8405d4d63ed schema:name nih_id
133 schema:value U01HG005208
134 rdf:type schema:PropertyValue
135 Ncd73f74d5a4a4b3c933da4893122a709 schema:currency USD
136 schema:value 7083002
137 rdf:type schema:MonetaryAmount
138 anzsrc-for:2201 schema:inDefinedTermSet anzsrc-for:
139 rdf:type schema:DefinedTerm
140 anzsrc-for:2206 schema:inDefinedTermSet anzsrc-for:
141 rdf:type schema:DefinedTerm
142 sg:person.011517303117.07 schema:affiliation https://www.grid.ac/institutes/grid.66859.34
143 schema:familyName DALY
144 schema:givenName MARK J
145 rdf:type schema:Person
146 https://www.grid.ac/institutes/grid.280128.1 rdf:type schema:Organization
147 https://www.grid.ac/institutes/grid.66859.34 schema:name BROAD INSTITUTE, INC.
148 rdf:type schema:Organization
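
In the table above, a row that omits the subject or predicate appears to reuse the value from the nearest row above it, which is consistent with the JSON-LD record listing two `about` entries for the grant. The sketch below expands a few rows into full triples under that assumption; `ROWS` is a hand-copied sample, and `expand_rows` is a hypothetical helper, not a SciGraph tool.

```python
# A few rows from the table above as (subject, predicate, object).
# None marks a cell that the table leaves blank.
ROWS = [
    ("sg:grant.2691278", "schema:about", "anzsrc-for:2201"),
    (None, None, "anzsrc-for:2206"),
    (None, "schema:amount", "Ncd73f74d5a4a4b3c933da4893122a709"),
    (None, "schema:endDate", "2013-06-30T00:00:00Z"),
    ("Ncd73f74d5a4a4b3c933da4893122a709", "schema:currency", "USD"),
    (None, "schema:value", "7083002"),
    (None, "rdf:type", "schema:MonetaryAmount"),
]


def expand_rows(rows):
    """Fill blank subject/predicate cells from the preceding row."""
    triples = []
    subject = predicate = None
    for s, p, o in rows:
        if s is not None:
            subject = s
        if p is not None:
            predicate = p
        if subject is None or predicate is None:
            raise ValueError("first row must give both subject and predicate")
        triples.append((subject, predicate, o))
    return triples


triples = expand_rows(ROWS)

# Follow the blank node from the grant to its monetary amount.
amount_node = next(o for s, p, o in triples
                   if s == "sg:grant.2691278" and p == "schema:amount")
amount = {p: o for s, p, o in triples if s == amount_node}
print(amount["schema:value"], amount["schema:currency"])
```

Identifiers beginning with `N` followed by a hexadecimal string are blank nodes: they have no meaning outside this record and serve only to link the grant to nested values such as the amount.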
 



