Alignment Software for Second-Generation Sequencing


Ontology type: schema:MonetaryGrant     


Grant Info

YEARS

2011-2015

FUNDING AMOUNT

2155143 USD

ABSTRACT

DESCRIPTION (provided by applicant): The latest generation of DNA sequencing technology has spurred a tremendous increase in the use of sequencing to answer fundamental questions in biology and medicine. Whole-genome sequencing is being used to study cancer, to study common disease-causing variants in the human genome, and to create a better picture of human diversity. Sequencing of messenger RNA through the protocol known as RNA-seq has led to an explosion of projects to characterize the transcriptome of many cell types in many species. These sequencing-based studies generate enormous amounts of data, which in turn require sophisticated, efficient computational tools to align the DNA sequence back to a reference genome and to help interpret the results. Our group has developed a suite of software tools for alignment of DNA and RNA to a reference genome. These include Bowtie, a very fast short-read alignment program; TopHat, an alignment program that aligns spliced transcripts (mRNA) across introns; and Cufflinks, a program that assembles complete transcripts, including alternative splice variants, from the alignments that TopHat produces. Our tools have been designed to handle very large next-generation sequence data sets, reducing alignment times that took multiple CPU-days with previous tools to just minutes. They also have relatively modest memory requirements, allowing them to be run on a desktop computer. For these and other reasons, these programs have become the preferred tools for numerous research groups; the Bowtie program alone has already attracted a very large user base, with over 20,000 downloads since its initial release in 2008. In this proposal, we ask for support to maintain these open-source software programs, adapt them to continuously changing DNA sequencing technology, and add new features designed to improve the alignments and to assist investigators with their analyses.

URL

http://projectreporter.nih.gov/project_info_description.cfm?aid=8911220

Related SciGraph Publications

  • 2015-12. Epiviz: a view inside the design of an integrated visual analysis software for genomics in BMC BIOINFORMATICS
  • 2015-04. HISAT: a fast spliced aligner with low memory requirements in NATURE METHODS
  • 2015-03. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads in NATURE BIOTECHNOLOGY
  • 2014-09. Epiviz: interactive visual analytics for functional genomics data in NATURE METHODS
  • 2013-04. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions in GENOME BIOLOGY
  • 2012-12. Gene expression anti-profiles as a basis for accurate universal cancer signatures in BMC BIOINFORMATICS
  • 2012-07. Butterfly genome reveals promiscuous exchange of mimicry adaptations among species in NATURE
  • 2012-04. Fast gapped-read alignment with Bowtie 2 in NATURE METHODS
  • 2012-03-01. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks in NATURE PROTOCOLS
  • 2012-01. Targeted RNA sequencing reveals the deep complexity of the human transcriptome in NATURE BIOTECHNOLOGY
  • 2012-01. Repetitive DNA and next-generation sequencing: computational challenges and solutions in NATURE REVIEWS GENETICS
  • 2011-12. Detection of lineage-specific evolutionary changes among primate species in BMC BIOINFORMATICS
  • 2011-12. Improving pan-genome annotation using whole genome multiple alignment in BMC BIOINFORMATICS
  • 2011-09. Effective detection of rare variants in pooled DNA samples using Cross-pool tailcurve analysis in GENOME BIOLOGY
  • 2011-09. Improving RNA-Seq expression estimates by correcting for fragment bias in GENOME BIOLOGY
  • 2011-08. TopHat-Fusion: an algorithm for discovery of novel fusion transcripts in GENOME BIOLOGY
  • 2011-08. Increased methylation variation in epigenetic domains across cancer types in NATURE GENETICS
  • 2011-05. PhymmBL expanded: confidence scores, custom databases, parallelization and more in NATURE METHODS
  • 2011-05. Complete Columbian mammoth mitogenome suggests interbreeding with woolly mammoths in GENOME BIOLOGY
  • 2010-12. Clustering metagenomic sequences with interpolated Markov models in BMC BIOINFORMATICS
  • 2010-11. Quake: quality-aware detection and correction of sequencing errors in GENOME BIOLOGY
  • 2010-05. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation in NATURE BIOTECHNOLOGY

JSON-LD is the canonical representation for SciGraph data.

    TIP: You can open this SciGraph record in an external JSON-LD service such as the JSON-LD Playground or Google SDTT.

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/2206", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "type": "DefinedTerm"
          }
        ], 
        "amount": {
          "currency": "USD", 
          "type": "MonetaryAmount", 
          "value": "2155143"
        }, 
        "description": "DESCRIPTION (provided by applicant): The latest generation of DNA sequencing technology has spurred a tremendous increase in the use of sequencing to answer fundamental questions in biology and medicine. Whole-genome sequencing is being used to study cancer, to study common disease-causing variants in the human genome, and to create a better picture of human diversity. Sequencing of messenger RNA through the protocol known as RNA-seq has led to an explosion of projects to characterize the transcriptome of many cell types in many species. These sequencing-based studies generate enormous amounts of data, which in turn require sophisticated, efficient computational tools to align the DNA sequence back to a reference genome and to help interpret the results. Our group has developed a suite of software tools for alignment of DNA and RNA to a reference genome. These include Bowtie, a very fast short-read alignment program; TopHat, an alignment program that aligns spliced transcripts (mRNA) across introns; and Cufflinks, a program that assembles complete transcripts, including alternative splice variants, from the alignments that TopHat produces. Our tools have been designed to handle very large next-generation sequence data sets, reducing alignment times that took multiple CPU-days with previous tools to just minutes. They also have relatively modest memory requirements, allowing them to be run on a desktop computer. For these and other reasons, these programs have become the preferred tools for numerous research groups; the Bowtie program alone has already attracted a very large user base, with over 20,000 downloads since its initial release in 2008. In this proposal, we ask for support to maintain these open-source software programs, adapt them to continuously changing DNA sequencing technology, and add new features designed to improve the alignments and to assist investigators with their analyses.", 
        "endDate": "2015-04-30T00:00:00Z", 
        "funder": {
          "id": "https://www.grid.ac/institutes/grid.280128.1", 
          "type": "Organization"
        }, 
        "id": "sg:grant.2529425", 
        "identifier": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "2529425"
            ]
          }, 
          {
            "name": "nih_id", 
            "type": "PropertyValue", 
            "value": [
              "R01HG006102"
            ]
          }
        ], 
        "inLanguage": [
          "en"
        ], 
        "keywords": [
          "bowtie", 
          "desktop computers", 
          "analysis", 
          "medicine", 
          "reference genome", 
          "software tools", 
          "alignment time", 
          "initial release", 
          "DNA sequences", 
          "results", 
          "variants", 
          "numerous research groups", 
          "investigators", 
          "support", 
          "project", 
          "complete transcript", 
          "modest memory requirements", 
          "Cufflinks", 
          "TopHat", 
          "groups", 
          "suite", 
          "open-source software program", 
          "use", 
          "minutes", 
          "mRNA", 
          "many species", 
          "description", 
          "multiple CPU-days", 
          "DNA", 
          "applicants", 
          "alignment programs", 
          "Bowtie program", 
          "previous tools", 
          "transcriptome", 
          "biology", 
          "large next-generation sequence data sets", 
          "efficient computational tools", 
          "alignment", 
          "alternative splice variants", 
          "new features", 
          "messenger RNA", 
          "program", 
          "latest generation", 
          "second-generation sequencing", 
          "protocol", 
          "whole-genome sequencing", 
          "data", 
          "common disease", 
          "human genome", 
          "DNA sequencing technology", 
          "human diversity", 
          "enormous amount", 
          "alignment software", 
          "tool", 
          "fundamental questions", 
          "sequencing", 
          "RNA-seq", 
          "download", 
          "explosion", 
          "many cell types", 
          "cancer", 
          "transcripts", 
          "introns", 
          "large user base", 
          "other reasons", 
          "RNA", 
          "tremendous increase", 
          "short-read alignment program", 
          "better picture", 
          "proposal", 
          "turn", 
          "study", 
          "preferred tool"
        ], 
        "name": "Alignment Software for Second-Generation Sequencing", 
        "recipient": [
          {
            "id": "https://www.grid.ac/institutes/grid.21107.35", 
            "type": "Organization"
          }, 
          {
            "affiliation": {
              "id": "https://www.grid.ac/institutes/grid.21107.35", 
              "name": "JOHNS HOPKINS UNIVERSITY", 
              "type": "Organization"
            }, 
            "familyName": "SALZBERG", 
            "givenName": "STEVEN L", 
            "id": "sg:person.01223441713.02", 
            "type": "Person"
          }, 
          {
            "member": "sg:person.01223441713.02", 
            "roleName": "PI", 
            "type": "Role"
          }
        ], 
        "sameAs": [
          "https://app.dimensions.ai/details/grant/grant.2529425"
        ], 
        "sdDataset": "grants", 
        "sdDatePublished": "2021-01-19T02:46", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com.uberresearch.data.processor/core_data/20181219_192338/projects/base/nih_projects_10.xml.gz", 
        "startDate": "2011-07-06T00:00:00Z", 
        "type": "MonetaryGrant", 
        "url": "http://projectreporter.nih.gov/project_info_description.cfm?aid=8911220"
      }
    ]
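The record above can be read with any JSON parser. The following is a minimal sketch, assuming only the field layout visible in this record; the `summarize_grant` and `parse_date` helpers are hypothetical names introduced for illustration, and the embedded JSON is a trimmed copy of the record that keeps only the fields the sketch reads. It resolves the `Role` entry with `roleName` "PI" to the matching `Person` entry in `recipient`.

```python
import json
from datetime import datetime

# Trimmed copy of the record above: only the fields this sketch reads.
RECORD_JSON = """
[
  {
    "amount": {"currency": "USD", "type": "MonetaryAmount", "value": "2155143"},
    "endDate": "2015-04-30T00:00:00Z",
    "identifier": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["2529425"]},
      {"name": "nih_id", "type": "PropertyValue", "value": ["R01HG006102"]}
    ],
    "name": "Alignment Software for Second-Generation Sequencing",
    "recipient": [
      {"id": "https://www.grid.ac/institutes/grid.21107.35", "type": "Organization"},
      {"affiliation": {"id": "https://www.grid.ac/institutes/grid.21107.35",
                       "name": "JOHNS HOPKINS UNIVERSITY", "type": "Organization"},
       "familyName": "SALZBERG", "givenName": "STEVEN L",
       "id": "sg:person.01223441713.02", "type": "Person"},
      {"member": "sg:person.01223441713.02", "roleName": "PI", "type": "Role"}
    ],
    "startDate": "2011-07-06T00:00:00Z",
    "type": "MonetaryGrant"
  }
]
"""


def parse_date(value):
    """Parse timestamps of the form used in the record, e.g. 2011-07-06T00:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").date()


def summarize_grant(record):
    """Collect a few headline fields from one grant record (hypothetical helper)."""
    # Each identifier entry carries a name and a one-element list of values.
    identifiers = {item["name"]: item["value"][0] for item in record["identifier"]}

    recipients = record["recipient"]
    # Index Person entries by id so Role entries can be resolved to names.
    people = {r["id"]: r for r in recipients if r.get("type") == "Person"}
    pi_ids = [r["member"] for r in recipients
              if r.get("type") == "Role" and r.get("roleName") == "PI"]
    investigators = [
        f"{people[pid]['givenName']} {people[pid]['familyName']}"
        for pid in pi_ids if pid in people
    ]

    return {
        "title": record["name"],
        "nih_id": identifiers.get("nih_id"),
        "amount": int(record["amount"]["value"]),
        "currency": record["amount"]["currency"],
        "start": parse_date(record["startDate"]),
        "end": parse_date(record["endDate"]),
        "principal_investigators": investigators,
    }


if __name__ == "__main__":
    summary = summarize_grant(json.loads(RECORD_JSON)[0])
    for key, value in summary.items():
        print(f"{key}: {value}")
```

Note that the amount is stored as a string in the record, so the sketch converts it to an integer before use.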
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/grant.2529425'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/grant.2529425'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/grant.2529425'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/grant.2529425'
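The four curl commands differ only in the `Accept` header, so the same content negotiation can be scripted. The following is a minimal sketch using Python's standard `urllib.request`; the short format labels and the `build_request` and `fetch` helpers are hypothetical names introduced here, and the sketch assumes the endpoint shown above is still reachable, which may not be the case.

```python
from urllib.request import Request, urlopen

# URL taken from the curl examples above.
BASE_URL = "https://scigraph.springernature.com/grant.2529425"

# Hypothetical short labels mapped to the media types used in the curl examples.
ACCEPT_HEADERS = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=BASE_URL):
    """Build a GET request that asks for the given serialization."""
    return Request(url, headers={"Accept": ACCEPT_HEADERS[fmt]})


def fetch(fmt, url=BASE_URL, timeout=30):
    """Send the request and return the response body as text.

    This performs a network call and will raise if the service is unavailable.
    """
    with urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only show what would be sent; call fetch("turtle") to contact the server.
    for fmt in ACCEPT_HEADERS:
        request = build_request(fmt)
        print(fmt, "->", request.get_header("Accept"))
```

Keeping request construction separate from the network call makes it possible to check the headers without contacting the server.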


     

    This table displays all metadata directly associated with this object as RDF triples.

    117 TRIPLES      19 PREDICATES      95 URIs      87 LITERALS      5 BLANK NODES

    Subject Predicate Object
    1 sg:grant.2529425 schema:about anzsrc-for:2206
    2 schema:amount Nd21dfca2694242c78bb51cd8cf3150c7
    3 schema:description DESCRIPTION (provided by applicant): The latest generation of DNA sequencing technology has spurred a tremendous increase in the use of sequencing to answer fundamental questions in biology and medicine. Whole-genome sequencing is being used to study cancer, to study common disease-causing variants in the human genome, and to create a better picture of human diversity. Sequencing of messenger RNA through the protocol known as RNA-seq has led to an explosion of projects to characterize the transcriptome of many cell types in many species. These sequencing-based studies generate enormous amounts of data, which in turn require sophisticated, efficient computational tools to align the DNA sequence back to a reference genome and to help interpret the results. Our group has developed a suite of software tools for alignment of DNA and RNA to a reference genome. These include Bowtie, a very fast short-read alignment program; TopHat, an alignment program that aligns spliced transcripts (mRNA) across introns; and Cufflinks, a program that assembles complete transcripts, including alternative splice variants, from the alignments that TopHat produces. Our tools have been designed to handle very large next-generation sequence data sets, reducing alignment times that took multiple CPU-days with previous tools to just minutes. They also have relatively modest memory requirements, allowing them to be run on a desktop computer. For these and other reasons, these programs have become the preferred tools for numerous research groups; the Bowtie program alone has already attracted a very large user base, with over 20,000 downloads since its initial release in 2008. In this proposal, we ask for support to maintain these open-source software programs, adapt them to continuously changing DNA sequencing technology, and add new features designed to improve the alignments and to assist investigators with their analyses.
    4 schema:endDate 2015-04-30T00:00:00Z
    5 schema:funder https://www.grid.ac/institutes/grid.280128.1
    6 schema:identifier N7c8ead351ae94121bf714d0ec7d0a8a2
    7 Nce3fca477b9040cbaf44ca0e50d09d36
    8 schema:inLanguage en
    9 schema:keywords Bowtie program
    10 Cufflinks
    11 DNA
    12 DNA sequences
    13 DNA sequencing technology
    14 RNA
    15 RNA-seq
    16 TopHat
    17 alignment
    18 alignment programs
    19 alignment software
    20 alignment time
    21 alternative splice variants
    22 analysis
    23 applicants
    24 better picture
    25 biology
    26 bowtie
    27 cancer
    28 common disease
    29 complete transcript
    30 data
    31 description
    32 desktop computers
    33 download
    34 efficient computational tools
    35 enormous amount
    36 explosion
    37 fundamental questions
    38 groups
    39 human diversity
    40 human genome
    41 initial release
    42 introns
    43 investigators
    44 large next-generation sequence data sets
    45 large user base
    46 latest generation
    47 mRNA
    48 many cell types
    49 many species
    50 medicine
    51 messenger RNA
    52 minutes
    53 modest memory requirements
    54 multiple CPU-days
    55 new features
    56 numerous research groups
    57 open-source software program
    58 other reasons
    59 preferred tool
    60 previous tools
    61 program
    62 project
    63 proposal
    64 protocol
    65 reference genome
    66 results
    67 second-generation sequencing
    68 sequencing
    69 short-read alignment program
    70 software tools
    71 study
    72 suite
    73 support
    74 tool
    75 transcriptome
    76 transcripts
    77 tremendous increase
    78 turn
    79 use
    80 variants
    81 whole-genome sequencing
    82 schema:name Alignment Software for Second-Generation Sequencing
    83 schema:recipient Nc7893ba33c1547c383d1a4d69a98a169
    84 sg:person.01223441713.02
    85 https://www.grid.ac/institutes/grid.21107.35
    86 schema:sameAs https://app.dimensions.ai/details/grant/grant.2529425
    87 schema:sdDatePublished 2021-01-19T02:46
    88 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    89 schema:sdPublisher N97595ebefbd84eadb9baef4e3999889e
    90 schema:startDate 2011-07-06T00:00:00Z
    91 schema:url http://projectreporter.nih.gov/project_info_description.cfm?aid=8911220
    92 sgo:license sg:explorer/license/
    93 sgo:sdDataset grants
    94 rdf:type schema:MonetaryGrant
    95 N7c8ead351ae94121bf714d0ec7d0a8a2 schema:name dimensions_id
    96 schema:value 2529425
    97 rdf:type schema:PropertyValue
    98 N97595ebefbd84eadb9baef4e3999889e schema:name Springer Nature - SN SciGraph project
    99 rdf:type schema:Organization
    100 Nc7893ba33c1547c383d1a4d69a98a169 schema:member sg:person.01223441713.02
    101 schema:roleName PI
    102 rdf:type schema:Role
    103 Nce3fca477b9040cbaf44ca0e50d09d36 schema:name nih_id
    104 schema:value R01HG006102
    105 rdf:type schema:PropertyValue
    106 Nd21dfca2694242c78bb51cd8cf3150c7 schema:currency USD
    107 schema:value 2155143
    108 rdf:type schema:MonetaryAmount
    109 anzsrc-for:2206 schema:inDefinedTermSet anzsrc-for:
    110 rdf:type schema:DefinedTerm
    111 sg:person.01223441713.02 schema:affiliation https://www.grid.ac/institutes/grid.21107.35
    112 schema:familyName SALZBERG
    113 schema:givenName STEVEN L
    114 rdf:type schema:Person
    115 https://www.grid.ac/institutes/grid.21107.35 schema:name JOHNS HOPKINS UNIVERSITY
    116 rdf:type schema:Organization
    117 https://www.grid.ac/institutes/grid.280128.1 rdf:type schema:Organization
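In the table above, a row that repeats the subject or predicate of the previous row leaves that cell empty. The following is a minimal sketch of how such abbreviated rows can be expanded into full triples; the `expand_rows` helper is a hypothetical name, empty cells are represented as `None` by assumption, and the sample rows are rows 1, 2, 6, 7, 95 and 96 of the table.

```python
def expand_rows(rows):
    """Fill omitted subject and predicate cells from the preceding row."""
    triples = []
    subject = predicate = None
    for row_subject, row_predicate, obj in rows:
        # A non-empty cell starts a new subject or predicate group.
        if row_subject is not None:
            subject = row_subject
        if row_predicate is not None:
            predicate = row_predicate
        if subject is None or predicate is None:
            raise ValueError("the first row must name a subject and a predicate")
        triples.append((subject, predicate, obj))
    return triples


# Sample rows copied from the table; None marks a cell the table leaves empty.
ROWS = [
    ("sg:grant.2529425", "schema:about", "anzsrc-for:2206"),
    (None, "schema:amount", "Nd21dfca2694242c78bb51cd8cf3150c7"),
    (None, "schema:identifier", "N7c8ead351ae94121bf714d0ec7d0a8a2"),
    (None, None, "Nce3fca477b9040cbaf44ca0e50d09d36"),
    ("N7c8ead351ae94121bf714d0ec7d0a8a2", "schema:name", "dimensions_id"),
    (None, "schema:value", "2529425"),
]

if __name__ == "__main__":
    for triple in expand_rows(ROWS):
        print(" ".join(triple))
```

Identifiers beginning with `N` followed by a hexadecimal string are blank nodes, so they are only meaningful within this record.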
     



