Joint SNP and CNV calling in 1000 Genomes sequence data


Ontology type: schema:MonetaryGrant     


Grant Info

YEARS

2009-2013

FUNDING AMOUNT

7083002.0 USD

ABSTRACT

DESCRIPTION: (provided by applicant): The 1000 Genomes Project is developing data resources and analytical methods required for the next stage of human genetics research: (a) discovering millions of novel polymorphisms with frequencies 0.5%-10%, which can then be tested for association to disease via imputation and direct genotyping in patients, and (b) bringing together genome centers, technology companies and population geneticists in a collaborative framework to develop data formats, analytical methods and standards for sensitive, accurate genome-wide resequencing for rare variants. The Project's goals are aggressive as detection of rare variants requires unprecedented accuracy, multiplied by the inclusion of multiple rapidly-evolving next generation sequencing platforms. Key tasks for Data Processing include (a) defining the biases and error processes characteristic of each sequencing platform, (b) determining how to use properly calibrated data to discover and genotype variants (SNP and structural), including making use of population genetics and prior array data for each sample, and (c) making it easy for users to browse the resulting data, and integrate it in statistical genetic analysis of disease samples. As members of the 1000 Genomes Project Analysis Group, we propose three Aims. First, to develop, implement and apply methodology to convert raw intensity data from each platform into accurate four-base probabilities, refining and calibrating the underlying base-call probabilities, and increasing accuracy. Second, to develop and implement an integrated approach to SNP and CNV detection that utilizes these probabilities, combines information across multiple samples, and exploits existing information from genotyping arrays, increasing sensitivity and accuracy for both SNPs and structural variants. Third, to develop user-friendly software for browsing and applying 1000 Genomes Project data in disease research, making Project data on sequence variation and linkage disequilibrium accessible and easily usable to the wider genetics community. We have assembled an experienced and skilled team of statistical and population genetic analysts and software engineers, with a track record of contributions to the SNP Consortium, HapMap project, and disease association studies. If funded, we will develop improved methods for interpreting raw next generation sequencing data, and software tools that speed the application of data from the Project to the genetics community. PUBLIC HEALTH RELEVANCE: The data for the 1000 Genomes project will provide the underpinnings for the execution and interpretation of all complex human disease genetic research that follows. As this constitutes the most prodigious investment in human variation resource generation to date, it is imperative that the data from this project is processed and analyzed as accurately as possible as the raw data is of such a scale that it cannot be maintained permanently. In addition, the methods developed in this proposal will be directly applied beyond the 1000 Genomes project to medical sequencing efforts to unlock the genetics of complex disease.

URL

http://projectreporter.nih.gov/project_info_description.cfm?aid=7932960

JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/31", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/42", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "type": "DefinedTerm"
      }
    ], 
    "amount": {
      "currency": "USD", 
      "type": "MonetaryAmount", 
      "value": 7083002.0
    }, 
    "description": "DESCRIPTION: (provided by applicant): The 1000 Genomes Project is developing data resources and analytical methods required for the next stage of human genetics research: (a) discovering millions of novel polymorphisms with frequencies 0.5%-10%, which can then be tested for association to disease via imputation and direct genotyping in patients, and (b) bringing together genome centers, technology companies and population geneticists in a collaborative framework to develop data formats, analytical methods and standards for sensitive, accurate genome-wide resequencing for rare variants. The Project's goals are aggressive as detection of rare variants requires unprecedented accuracy, multiplied by the inclusion of multiple rapidly-evolving next generation sequencing platforms. Key tasks for Data Processing include (a) defining the biases and error processes characteristic of each sequencing platform, (b) determining how to use properly calibrated data to discover and genotype variants (SNP and structural), including making use of population genetics and prior array data for each sample, and (c) making it easy for users to browse the resulting data, and integrate it in statistical genetic analysis of disease samples. As members of the 1000 Genomes Project Analysis Group, we propose three Aims. First, to develop, implement and apply methodology to convert raw intensity data from each platform into accurate four-base probabilities, refining and calibrating the underlying base-call probabilities, and increasing accuracy. Second, to develop and implement an integrated approach to SNP and CNV detection that utilizes these probabilities, combines information across multiple samples, and exploits existing information from genotyping arrays, increasing sensitivity and accuracy for both SNPs and structural variants. \nThird, to develop user-friendly software for browsing and applying 1000 Genomes Project data in disease research, making Project data on sequence variation and linkage disequilibrium accessible and easily usable to the wider genetics community. We have assembled an experienced and skilled team of statistical and population genetic analysts and software engineers, with a track record of contributions to the SNP Consortium, HapMap project, and disease association studies. If funded, we will develop improved methods for interpreting raw next generation sequencing data, and software tools that speed the application of data from the Project to the genetics community. PUBLIC HEALTH RELEVANCE: The data for the 1000 Genomes project will provide the underpinnings for the execution and interpretation of all complex human disease genetic research that follows. As this constitutes the most prodigious investment in human variation resource generation to date, it is Imperative that the data from this project is processed and analyzed as accurately as possible as the raw data is of such a scale that it cannot be maintained permanently. In addition the methods developed in this proposal will be directly applied beyond the 1000 Genomes project to medical sequencing efforts to unlock the genetics of complex disease.", 
    "endDate": "2013-06-30", 
    "funder": {
      "id": "http://www.grid.ac/institutes/grid.280128.1", 
      "type": "Organization"
    }, 
    "id": "sg:grant.2691278", 
    "identifier": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "grant.2691278"
        ]
      }, 
      {
        "name": "nih_id", 
        "type": "PropertyValue", 
        "value": [
          "U01HG005208"
        ]
      }
    ], 
    "keywords": [
      "Genome Project", 
      "sequencing platforms", 
      "genetics community", 
      "genetic research", 
      "genome sequence data", 
      "statistical genetic analysis", 
      "next-generation sequencing platforms", 
      "human genetics research", 
      "Genomes Project data", 
      "generation sequencing platforms", 
      "disease association studies", 
      "rare variants", 
      "population genetics", 
      "sequencing efforts", 
      "population geneticists", 
      "SNP Consortium", 
      "sequence data", 
      "sequence variation", 
      "genetic analysis", 
      "HapMap project", 
      "association studies", 
      "structural variants", 
      "Genome Center", 
      "complex diseases", 
      "novel polymorphisms", 
      "SNPs", 
      "CNV detection", 
      "genetics", 
      "direct genotyping", 
      "array data", 
      "disease samples", 
      "disease research", 
      "variants", 
      "project data", 
      "user-friendly software", 
      "software engineers", 
      "raw intensity data", 
      "data format", 
      "software tools", 
      "multiple samples", 
      "collaborative framework", 
      "data resources", 
      "geneticists", 
      "application of data", 
      "data processing", 
      "CNVs", 
      "disequilibrium", 
      "raw data", 
      "polymorphism", 
      "technology companies", 
      "community", 
      "key task", 
      "genotyping", 
      "platform", 
      "next generation", 
      "members", 
      "consortium", 
      "error process", 
      "accuracy", 
      "project goals", 
      "next stage", 
      "generation", 
      "variation", 
      "information", 
      "project", 
      "users", 
      "execution", 
      "software", 
      "stage", 
      "skilled team", 
      "underpinnings", 
      "data", 
      "task", 
      "detection", 
      "format", 
      "date", 
      "goal", 
      "millions", 
      "unprecedented accuracy", 
      "addition", 
      "framework", 
      "resource generation", 
      "analysts", 
      "processing", 
      "improved method", 
      "disease", 
      "method", 
      "association", 
      "analysis", 
      "analysis group", 
      "array", 
      "engineers", 
      "samples", 
      "resources", 
      "proposal", 
      "research", 
      "probability", 
      "tool", 
      "imputation", 
      "process", 
      "analytical method", 
      "applications", 
      "companies", 
      "intensity data", 
      "methodology", 
      "study", 
      "efforts", 
      "track record", 
      "sensitivity", 
      "team", 
      "contribution", 
      "records", 
      "standards", 
      "biases", 
      "group", 
      "approach", 
      "frequency", 
      "scale", 
      "use", 
      "inclusion", 
      "interpretation", 
      "aim", 
      "investment", 
      "refining", 
      "center", 
      "patients"
    ], 
    "name": "Joint SNP and CNV calling in 1000 Genomes sequence data", 
    "recipient": [
      {
        "id": "http://www.grid.ac/institutes/grid.66859.34", 
        "type": "Organization"
      }, 
      {
        "affiliation": {
          "id": "http://www.grid.ac/institutes/None", 
          "name": "BROAD INSTITUTE, INC.", 
          "type": "Organization"
        }, 
        "familyName": "DALY", 
        "givenName": "MARK JOSEPH", 
        "id": "sg:person.011517303117.07", 
        "type": "Person"
      }, 
      {
        "member": "sg:person.011517303117.07", 
        "roleName": "PI", 
        "type": "Role"
      }
    ], 
    "sameAs": [
      "https://app.dimensions.ai/details/grant/grant.2691278"
    ], 
    "sdDataset": "grants", 
    "sdDatePublished": "2022-11-24T21:21", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20221124/entities/gbq_results/grant/grant_24.jsonl", 
    "startDate": "2009-09-16", 
    "type": "MonetaryGrant", 
    "url": "http://projectreporter.nih.gov/project_info_description.cfm?aid=7932960"
  }
]
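As a rough illustration of how the record above might be consumed, the following Python sketch parses a trimmed copy of the JSON-LD and pulls out the identifiers, the award amount, and the principal investigator. The helper name `summarize_grant` and the keys of the returned dictionary are hypothetical choices made for this example, not part of SciGraph; the field names read from the record are the ones shown above.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
RECORD_JSON = """
[
  {
    "amount": {"currency": "USD", "type": "MonetaryAmount", "value": 7083002.0},
    "endDate": "2013-06-30",
    "id": "sg:grant.2691278",
    "identifier": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["grant.2691278"]},
      {"name": "nih_id", "type": "PropertyValue", "value": ["U01HG005208"]}
    ],
    "name": "Joint SNP and CNV calling in 1000 Genomes sequence data",
    "recipient": [
      {"id": "http://www.grid.ac/institutes/grid.66859.34", "type": "Organization"},
      {"familyName": "DALY", "givenName": "MARK JOSEPH",
       "id": "sg:person.011517303117.07", "type": "Person"},
      {"member": "sg:person.011517303117.07", "roleName": "PI", "type": "Role"}
    ],
    "startDate": "2009-09-16",
    "type": "MonetaryGrant"
  }
]
"""


def summarize_grant(record):
    """Flatten one grant record into a small dict (hypothetical helper)."""
    # Each identifier entry carries a name and a one-element list of values.
    identifiers = {
        item["name"]: item["value"][0]
        for item in record.get("identifier", [])
    }
    recipients = record.get("recipient", [])
    # Index Person entries by id so Role entries can be resolved to names.
    people = {r["id"]: r for r in recipients if r.get("type") == "Person"}
    pi_ids = [
        r["member"] for r in recipients
        if r.get("type") == "Role" and r.get("roleName") == "PI"
    ]
    pi_names = [
        f"{people[pid]['givenName']} {people[pid]['familyName']}"
        for pid in pi_ids if pid in people
    ]
    return {
        "title": record["name"],
        "nih_id": identifiers.get("nih_id"),
        "dimensions_id": identifiers.get("dimensions_id"),
        "amount": record["amount"]["value"],
        "currency": record["amount"]["currency"],
        "start": record["startDate"],
        "end": record["endDate"],
        "principal_investigators": pi_names,
    }


# The top-level JSON value is a one-element array holding the grant.
grant = json.loads(RECORD_JSON)[0]
summary = summarize_grant(grant)
print(summary)
```

The PI is resolved in two steps because the record lists the person and the role as separate entries in "recipient", linked through the person id.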
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/grant.2691278'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/grant.2691278'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/grant.2691278'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/grant.2691278'
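The same content negotiation can be sketched in Python with the standard library. This is a minimal sketch, not an official client: the short format labels, `build_request`, and `fetch` are names invented for this example, the UTF-8 decoding is an assumption, and whether the endpoint still responds is not something this page confirms. Only the URL and the four Accept values are taken from the curl commands above.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/"

# Accept header values copied from the curl commands above;
# the dictionary keys are hypothetical short labels.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id, fmt="json-ld"):
    """Build (but do not send) a GET request for one record in one format."""
    return urllib.request.Request(
        BASE_URL + record_id,
        headers={"Accept": ACCEPT_TYPES[fmt]},
    )


def fetch(record_id, fmt="json-ld", timeout=30):
    """Send the request and return the body as text (assumes UTF-8)."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request needs no network access, so it is safe to inspect.
req = build_request("grant.2691278", "turtle")
print(req.full_url, req.get_header("Accept"))
```

Calling `fetch("grant.2691278", "nt")` would perform the actual download; it is left out of the example run because it depends on the service being reachable.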


 

This table displays all metadata directly associated with this object as RDF triples.

173 TRIPLES      18 PREDICATES      148 URIs      139 LITERALS      5 BLANK NODES

Subject Predicate Object
1 sg:grant.2691278 schema:about anzsrc-for:31
2 anzsrc-for:42
3 schema:amount N18430140e1434a49ab22eec335ebea29
4 schema:description DESCRIPTION: (provided by applicant): The 1000 Genomes Project is developing data resources and analytical methods required for the next stage of human genetics research: (a) discovering millions of novel polymorphisms with frequencies 0.5%-10%, which can then be tested for association to disease via imputation and direct genotyping in patients, and (b) bringing together genome centers, technology companies and population geneticists in a collaborative framework to develop data formats, analytical methods and standards for sensitive, accurate genome-wide resequencing for rare variants. The Project's goals are aggressive as detection of rare variants requires unprecedented accuracy, multiplied by the inclusion of multiple rapidly-evolving next generation sequencing platforms. Key tasks for Data Processing include (a) defining the biases and error processes characteristic of each sequencing platform, (b) determining how to use properly calibrated data to discover and genotype variants (SNP and structural), including making use of population genetics and prior array data for each sample, and (c) making it easy for users to browse the resulting data, and integrate it in statistical genetic analysis of disease samples. As members of the 1000 Genomes Project Analysis Group, we propose three Aims. First, to develop, implement and apply methodology to convert raw intensity data from each platform into accurate four-base probabilities, refining and calibrating the underlying base-call probabilities, and increasing accuracy. Second, to develop and implement an integrated approach to SNP and CNV detection that utilizes these probabilities, combines information across multiple samples, and exploits existing information from genotyping arrays, increasing sensitivity and accuracy for both SNPs and structural variants. 
Third, to develop user-friendly software for browsing and applying 1000 Genomes Project data in disease research, making Project data on sequence variation and linkage disequilibrium accessible and easily usable to the wider genetics community. We have assembled an experienced and skilled team of statistical and population genetic analysts and software engineers, with a track record of contributions to the SNP Consortium, HapMap project, and disease association studies. If funded, we will develop improved methods for interpreting raw next generation sequencing data, and software tools that speed the application of data from the Project to the genetics community. PUBLIC HEALTH RELEVANCE: The data for the 1000 Genomes project will provide the underpinnings for the execution and interpretation of all complex human disease genetic research that follows. As this constitutes the most prodigious investment in human variation resource generation to date, it is Imperative that the data from this project is processed and analyzed as accurately as possible as the raw data is of such a scale that it cannot be maintained permanently. In addition the methods developed in this proposal will be directly applied beyond the 1000 Genomes project to medical sequencing efforts to unlock the genetics of complex disease.
5 schema:endDate 2013-06-30
6 schema:funder grid-institutes:grid.280128.1
7 schema:identifier N23d185d207a444b1b74e13b1b3f44be8
8 Neac94d15845f494f942adf1419204c85
9 schema:keywords CNV detection
10 CNVs
11 Genome Center
12 Genome Project
13 Genomes Project data
14 HapMap project
15 SNP Consortium
16 SNPs
17 accuracy
18 addition
19 aim
20 analysis
21 analysis group
22 analysts
23 analytical method
24 application of data
25 applications
26 approach
27 array
28 array data
29 association
30 association studies
31 biases
32 center
33 collaborative framework
34 community
35 companies
36 complex diseases
37 consortium
38 contribution
39 data
40 data format
41 data processing
42 data resources
43 date
44 detection
45 direct genotyping
46 disease
47 disease association studies
48 disease research
49 disease samples
50 disequilibrium
51 efforts
52 engineers
53 error process
54 execution
55 format
56 framework
57 frequency
58 generation
59 generation sequencing platforms
60 genetic analysis
61 genetic research
62 geneticists
63 genetics
64 genetics community
65 genome sequence data
66 genotyping
67 goal
68 group
69 human genetics research
70 improved method
71 imputation
72 inclusion
73 information
74 intensity data
75 interpretation
76 investment
77 key task
78 members
79 method
80 methodology
81 millions
82 multiple samples
83 next generation
84 next stage
85 next-generation sequencing platforms
86 novel polymorphisms
87 patients
88 platform
89 polymorphism
90 population geneticists
91 population genetics
92 probability
93 process
94 processing
95 project
96 project data
97 project goals
98 proposal
99 rare variants
100 raw data
101 raw intensity data
102 records
103 refining
104 research
105 resource generation
106 resources
107 samples
108 scale
109 sensitivity
110 sequence data
111 sequence variation
112 sequencing efforts
113 sequencing platforms
114 skilled team
115 software
116 software engineers
117 software tools
118 stage
119 standards
120 statistical genetic analysis
121 structural variants
122 study
123 task
124 team
125 technology companies
126 tool
127 track record
128 underpinnings
129 unprecedented accuracy
130 use
131 user-friendly software
132 users
133 variants
134 variation
135 schema:name Joint SNP and CNV calling in 1000 Genomes sequence data
136 schema:recipient N22de52a5c8044d8ab39de191a55c9fc0
137 sg:person.011517303117.07
138 grid-institutes:grid.66859.34
139 schema:sameAs https://app.dimensions.ai/details/grant/grant.2691278
140 schema:sdDatePublished 2022-11-24T21:21
141 schema:sdLicense https://scigraph.springernature.com/explorer/license/
142 schema:sdPublisher N28c2e7400ed34582a03f77d58c47bb19
143 schema:startDate 2009-09-16
144 schema:url http://projectreporter.nih.gov/project_info_description.cfm?aid=7932960
145 sgo:license sg:explorer/license/
146 sgo:sdDataset grants
147 rdf:type schema:MonetaryGrant
148 N18430140e1434a49ab22eec335ebea29 schema:currency USD
149 schema:value 7083002.0
150 rdf:type schema:MonetaryAmount
151 N22de52a5c8044d8ab39de191a55c9fc0 schema:member sg:person.011517303117.07
152 schema:roleName PI
153 rdf:type schema:Role
154 N23d185d207a444b1b74e13b1b3f44be8 schema:name nih_id
155 schema:value U01HG005208
156 rdf:type schema:PropertyValue
157 N28c2e7400ed34582a03f77d58c47bb19 schema:name Springer Nature - SN SciGraph project
158 rdf:type schema:Organization
159 Neac94d15845f494f942adf1419204c85 schema:name dimensions_id
160 schema:value grant.2691278
161 rdf:type schema:PropertyValue
162 anzsrc-for:31 schema:inDefinedTermSet anzsrc-for:
163 rdf:type schema:DefinedTerm
164 anzsrc-for:42 schema:inDefinedTermSet anzsrc-for:
165 rdf:type schema:DefinedTerm
166 sg:person.011517303117.07 schema:affiliation grid-institutes:None
167 schema:familyName DALY
168 schema:givenName MARK JOSEPH
169 rdf:type schema:Person
170 grid-institutes:None schema:name BROAD INSTITUTE, INC.
171 rdf:type schema:Organization
172 grid-institutes:grid.280128.1 schema:Organization
173 grid-institutes:grid.66859.34 schema:Organization
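Summary counts like those shown above the table could be derived from the N-Triples serialization, since it holds one triple per line. The sketch below runs on a small hand-written sample, not on the real export: the `example.org` subject IRI, the blank node label, and the untyped literals are illustrative assumptions, and counting distinct terms is only one possible definition, so the numbers are not expected to match the 173/18/148/139/5 figures reported for the full record.

```python
from collections import Counter

# Hand-written sample in N-Triples syntax; IRIs and literal forms are
# illustrative assumptions, not copied from the real export.
SAMPLE_NT = """\
<http://example.org/sg/grant.2691278> <http://schema.org/name> "Joint SNP and CNV calling in 1000 Genomes sequence data" .
<http://example.org/sg/grant.2691278> <http://schema.org/startDate> "2009-09-16" .
<http://example.org/sg/grant.2691278> <http://schema.org/endDate> "2013-06-30" .
<http://example.org/sg/grant.2691278> <http://schema.org/amount> _:amount .
_:amount <http://schema.org/currency> "USD" .
_:amount <http://schema.org/value> "7083002.0" .
"""


def summarize_ntriples(text):
    """Count triples and distinct predicates, URIs, literals, blank nodes."""
    triples = 0
    predicates = Counter()
    uris, literals, blank_nodes = set(), set(), set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subject and predicate never contain whitespace in N-Triples,
        # so two splits isolate them; the remainder is the object plus " ."
        subject, predicate, rest = line.split(None, 2)
        obj = rest.rstrip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()  # drop the statement terminator
        triples += 1
        predicates[predicate] += 1
        for term in (subject, predicate, obj):
            if term.startswith("<"):
                uris.add(term)
            elif term.startswith("_:"):
                blank_nodes.add(term)
            else:
                literals.add(term)
    return {
        "triples": triples,
        "predicates": len(predicates),
        "uris": len(uris),
        "literals": len(literals),
        "blank_nodes": len(blank_nodes),
    }


stats = summarize_ntriples(SAMPLE_NT)
print(stats)
```

For anything beyond a quick tally, a dedicated RDF parser would be a safer choice than line splitting, since it also validates escapes, datatypes, and language tags.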
 



