RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2011-08-04

AUTHORS

Bo Li, Colin N Dewey

ABSTRACT

Background: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments.

Results: We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene.

Conclusions: RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.

PAGES

323

References to SciGraph publications

  • 2010-10-27. Differential expression analysis for sequence count data in GENOME BIOLOGY
  • 2006-09-01. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements in NATURE BIOTECHNOLOGY
  • 2010-05-02. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation in NATURE BIOTECHNOLOGY
  • 2011-05-15. Full-length transcriptome assembly from RNA-Seq data without a reference genome in NATURE BIOTECHNOLOGY
  • 2010. Estimation of Alternative Splicing isoform Frequencies from RNA-Seq Data in ALGORITHMS IN BIOINFORMATICS
  • 2008-11-27. Alternative isoform regulation in human tissue transcriptomes in NATURE
  • 2010-11-07. Analysis and design of RNA sequencing experiments for identifying isoform regulation in NATURE METHODS
  • 2010-05-11. Modeling non-uniformity in short-read rates in RNA-Seq data in GENOME BIOLOGY
  • 2011-03-16. Improving RNA-Seq expression estimates by correcting for fragment bias in GENOME BIOLOGY
  • 2008-05-30. Mapping and quantifying mammalian transcriptomes by RNA-Seq in NATURE METHODS
  • 2010-05-02. Ab initio reconstruction of cell type–specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs in NATURE BIOTECHNOLOGY
  • 2009-01. RNA-Seq: a revolutionary tool for transcriptomics in NATURE REVIEWS GENETICS
  • 2010-02-18. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments in BMC BIOINFORMATICS
  • 2009-03-04. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome in GENOME BIOLOGY
  • 2010-10-10. De novo assembly and analysis of RNA-seq data in NATURE METHODS

Journal

    TITLE

    BMC Bioinformatics

    ISSUE

    1

    VOLUME

    12

    Related Patents

  • Personalized Cancer Vaccines And Adoptive Immune Cell Therapies
  • Tandem Duplicator Phenotype (Tdp) As A Distinct Genomic Configuration In Cancer And Use Thereof
  • Systemic Autoimmune Diseases Diagnostic And Prognostic Method
  • Method Of Predicting Survival Rates For Cancer Patients
  • Nucleic Acid Copy Number Determination Based On Fragment Estimates
  • Lilrb3 Antibody Molecules And Uses Thereof
  • Aav Delivery Of Nucleobase Editors
  • Use Of Gdf15 For Treating Cardiometabolic Syndrome And Other Conditions
  • Use Of Alpha-V-Integrin (Cd51) Inhibitors For The Treatment Of Cardiac Fibrosis
  • High Efficiency Base Editors Comprising Gam
  • Treatment And Prevention Of Disease Mediated By Wwp2
  • Methods And Systems For Visualizing Gene Expression Data
  • In Vitro Method For The Diagnosis And/Or Prognosis Of Calcific Aortic Valve Disease Or Subclinical Aortic-Valve Calcification
  • Interleukin-4–Induced Gene 1 (Il4i1) As A Biomarker And Uses Thereof
  • Transcript Determination Method
  • Nucleobase Editors Comprising Nucleic Acid Programmable Dna Binding Proteins
  • Transcription Factors To Improve Resistance To Environmental Stress In Plants
  • Transcript Determination Method
  • Personalized Cancer Vaccines And Adoptive Immune Cell Therapies
  • Compounds For Inducing Anti-Tumor Immunity And Methods Thereof
  • Interleukin-4-Induced Gene 1 (Il4i1) And Respective Metabolites As Biomarkers For Cancer
  • Compounds For Inducing Anti-Tumor Immunity And Methods Thereof
  • Gdf3 As Biomarker And Biotarget In Post-Ischemic Cardiac Remodeling
  • Gene Networks That Mediate Remyelination Of The Human Brain
  • Selection Of Cancer Mutations For Generation Of A Personalized Cancer Vaccine
  • Methods And Compositions For Modulating Immune Responses
  • Compounds And Methods For The Production Of Suckerin And Uses Thereof
  • Ketogenic Diet And Ketone Supplementation For Cancer Therapy
  • Methods Of Treatment Of Cancer With Reduced Ubb Expression
  • Switchable Cas9 Nucleases And Uses Thereof
  • Methods Of Treatment Of Cancer Associated With Centrosome Amplification
  • Interleukin-4-Induced Gene 1 (Il4i1) And Respective Metabolites As Biomarkers For Cancer
  • Nucleic Acid Copy Number Determination Based On Fragment Estimates
  • Immunotherapy Targeting Tumor Neoantigenic Peptides
  • Respiratory And Sweat Gland Ionocytes
  • Method Of Predicting Survival Rates For Cancer Patients
  • Compositions And Methods For Inducing Intestinal Stem Cell Regeneration
  • Method For Determining Cellular Composition Of A Tumor
  • Method For Determining Cellular Composition Of A Tumor
  • Interleukin-4-Induced Gene 1 (Il4i1) As A Biomarker And Uses Thereof
  • Human Liver Chimeric Non-Human Animal With Deficient P450 Oxidoreductase And Methods Of Using Same
  • Surface Markers For The Isolation Of Myogenic Stem/Progenitor Cells
  • Modulation Of T Cell Cytotoxicity And Related Therapy
  • Methods And Compositions For Controlling Cardiac Fibrosis And Remodeling
  • Wheat Male-Sterility Gene Wms And Its Anther-Specific Expression Promoter And Uses Thereof
  • Hla Single Allele Lines
  • Compositions And Methods For Personalized Neoplasia Vaccines
  • Glycopeptides For Inducing An Immune Response And Methods Of Use
  • Induction Of Arterial-Type Of Hemogenic Endothelium (Ahe) And Enhancement Of T Cell Production From Pscs Through Overexpression Of Ets Factors Or Modulating Mapk/Erk Signalling Pathways
  • Methods And Products For Quantifying Rna Transcript Variants
  • Biosynthetic Genes And Polypeptides

Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1186/1471-2105-12-323

    DOI

    http://dx.doi.org/10.1186/1471-2105-12-323

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1021902674

    PUBMED

    https://www.ncbi.nlm.nih.gov/pubmed/21816040



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Biological Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Genetics", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Animals", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Computer Simulation", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Gene Expression Profiling", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Humans", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Mice", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Protein Isoforms", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "RNA", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Sequence Analysis, RNA", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Software", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA", 
              "id": "http://www.grid.ac/institutes/grid.14003.36", 
              "name": [
                "Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Li", 
            "givenName": "Bo", 
            "id": "sg:person.01303526064.50", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01303526064.50"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA", 
              "id": "http://www.grid.ac/institutes/grid.14003.36", 
              "name": [
                "Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA", 
                "Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Dewey", 
            "givenName": "Colin N", 
            "id": "sg:person.01221075436.94", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01221075436.94"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1186/gb-2009-10-3-r25", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1049583368", 
              "https://doi.org/10.1186/gb-2009-10-3-r25"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt.1621", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1031035095", 
              "https://doi.org/10.1038/nbt.1621"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt.1633", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1025339324", 
              "https://doi.org/10.1038/nbt.1633"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2010-11-5-r50", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1043554856", 
              "https://doi.org/10.1186/gb-2010-11-5-r50"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature07509", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1029002744", 
              "https://doi.org/10.1038/nature07509"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-642-15294-8_17", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1040318157", 
              "https://doi.org/10.1007/978-3-642-15294-8_17"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-11-94", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1053091615", 
              "https://doi.org/10.1186/1471-2105-11-94"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.1517", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1032102367", 
              "https://doi.org/10.1038/nmeth.1517"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2010-11-10-r106", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1031289083", 
              "https://doi.org/10.1186/gb-2010-11-10-r106"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt1239", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037875102", 
              "https://doi.org/10.1038/nbt1239"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.1528", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1047715940", 
              "https://doi.org/10.1038/nmeth.1528"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2011-12-3-r22", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1009466747", 
              "https://doi.org/10.1186/gb-2011-12-3-r22"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nrg2484", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1030687647", 
              "https://doi.org/10.1038/nrg2484"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.1226", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1045381177", 
              "https://doi.org/10.1038/nmeth.1226"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt.1883", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1015803168", 
              "https://doi.org/10.1038/nbt.1883"
            ], 
            "type": "CreativeWork"
          }
        ],
        "datePublished": "2011-08-04",
        "datePublishedReg": "2011-08-04",
        "description": "BackgroundRNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments.ResultsWe present RSEM, an user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene.ConclusionsRSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. \nIn addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.",
        "genre": "article", 
        "id": "sg:pub.10.1186/1471-2105-12-323", 
        "isAccessibleForFree": true, 
        "isFundedItemOf": [
          {
            "id": "sg:grant.2529387", 
            "type": "MonetaryGrant"
          }
        ], 
        "isPartOf": [
          {
            "id": "sg:journal.1023786", 
            "issn": [
              "1471-2105"
            ], 
            "name": "BMC Bioinformatics", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "12"
          }
        ], 
        "keywords": [
          "RNA-seq data", 
          "isoforms", 
          "relative frequency", 
          "multiple genes", 
          "genes", 
          "RNA-seq", 
          "transcript quantification", 
          "same gene", 
          "cDNA fragments", 
          "data", 
          "splice forms", 
          "transcript abundance", 
          "quantification", 
          "number", 
          "interval", 
          "absence", 
          "credibility intervals", 
          "ability", 
          "de novo transcriptome assembly", 
          "novo transcriptome assembly", 
          "contrast", 
          "single gene", 
          "transcripts", 
          "reference genome", 
          "accurate transcript quantification", 
          "quantification method", 
          "large number", 
          "short single-end reads", 
          "guidance", 
          "quantification experiments", 
          "transcriptome assembly", 
          "significant issue", 
          "tool", 
          "combination", 
          "frequency", 
          "use", 
          "genome", 
          "estimates", 
          "addition", 
          "handling", 
          "RNA-seq experiments", 
          "length", 
          "end", 
          "fragments", 
          "paired-end RNA-seq data", 
          "hand", 
          "paired-end reads", 
          "challenges", 
          "issues", 
          "number of reads", 
          "abundance estimates", 
          "files", 
          "novo transcriptome assemblers", 
          "single-end reads", 
          "quantifying transcript abundances", 
          "form", 
          "abundance", 
          "reads", 
          "design", 
          "transcriptome assemblers", 
          "method", 
          "user-friendly software tool", 
          "experiments", 
          "RSEM", 
          "assembly", 
          "terms", 
          "advantages", 
          "valuable guidance", 
          "key challenges", 
          "package", 
          "software", 
          "species", 
          "data sets", 
          "user-friendly software package", 
          "existence", 
          "software package", 
          "comparable performance", 
          "performance", 
          "assemblers", 
          "set", 
          "software tools", 
          "cost-efficient design", 
          "real data sets"
        ], 
        "name": "RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome", 
        "pagination": "323", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1021902674"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1186/1471-2105-12-323"
            ]
          }, 
          {
            "name": "pubmed_id", 
            "type": "PropertyValue", 
            "value": [
              "21816040"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1186/1471-2105-12-323", 
          "https://app.dimensions.ai/details/publication/pub.1021902674"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-08-04T16:59", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20220804/entities/gbq_results/article/article_537.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1186/1471-2105-12-323"
      }
    ]
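Because JSON-LD is plain JSON, the record above can be read with any JSON parser. The following is a minimal sketch, not part of the SciGraph tooling: it embeds a trimmed excerpt of the record (only the fields it uses) and flattens it into a citation-style summary. The names `RECORD_JSON` and `summarize` are hypothetical and chosen for illustration.

```python
import json

# Trimmed excerpt of the record shown above; only the fields used below are kept.
RECORD_JSON = """
[
  {
    "name": "RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome",
    "datePublished": "2011-08-04",
    "author": [
      {"familyName": "Li", "givenName": "Bo", "type": "Person"},
      {"familyName": "Dewey", "givenName": "Colin N", "type": "Person"}
    ],
    "isPartOf": [
      {"id": "sg:journal.1023786", "name": "BMC Bioinformatics", "type": "Periodical"},
      {"issueNumber": "1", "type": "PublicationIssue"},
      {"type": "PublicationVolume", "volumeNumber": "12"}
    ],
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1021902674"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/1471-2105-12-323"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["21816040"]}
    ]
  }
]
"""


def summarize(record_json):
    """Flatten one SciGraph article record into a small summary dict."""
    # The document is a one-element list wrapping the article object.
    record = json.loads(record_json)[0]
    # productId is a list of name/value pairs; index it by name.
    ids = {p["name"]: p["value"][0] for p in record["productId"]}
    # isPartOf mixes journal, issue, and volume; index it by type.
    parts = {p["type"]: p for p in record["isPartOf"]}
    return {
        "title": record["name"],
        "authors": [f'{a["givenName"]} {a["familyName"]}' for a in record["author"]],
        "date": record["datePublished"],
        "journal": parts["Periodical"]["name"],
        "volume": parts["PublicationVolume"]["volumeNumber"],
        "issue": parts["PublicationIssue"]["issueNumber"],
        "doi": ids["doi"],
        "pubmed_id": ids["pubmed_id"],
    }


summary = summarize(RECORD_JSON)
```

Indexing `isPartOf` and `productId` by their `type` and `name` fields avoids relying on the order of the list entries, which the record does not appear to guarantee.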
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-323'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-323'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-323'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-323'
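
The four curl calls above differ only in the `Accept` header, which is ordinary HTTP content negotiation. As a rough Python equivalent using only the standard library, the sketch below builds the same requests. The helper names `build_request` and `fetch`, the short format keys, and the timeout are assumptions made for illustration; whether the endpoint still answers is not something this page confirms.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/"

# Media types taken from the curl examples above; the short keys are hypothetical.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id, fmt="json-ld"):
    """Build (but do not send) a request asking for one RDF serialization."""
    if fmt not in ACCEPT:
        raise ValueError(f"unknown format: {fmt!r}")
    return urllib.request.Request(
        BASE_URL + record_id,
        headers={"Accept": ACCEPT[fmt]},
    )


def fetch(record_id, fmt="json-ld", timeout=30):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request needs no network access; fetch() does.
req = build_request("pub.10.1186/1471-2105-12-323", "turtle")
```

Calling `fetch("pub.10.1186/1471-2105-12-323", "json-ld")` would correspond to the first curl command, provided the service is reachable.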


     

    This table displays all metadata directly associated to this object as RDF triples.

    250 TRIPLES      21 PREDICATES      132 URIs      109 LITERALS      16 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1186/1471-2105-12-323 schema:about N101519cc656a4273a333e0c46e9be197
    2 N4effd47bdfa141f198d6d4833f39a3a8
    3 N696a5341d61143d8b4a342f559de84d5
    4 N6d3182afb59242829e1db40f2c1c236a
    5 N74a227a0ba6f4c8c97c934ab76c34dd1
    6 N74f214eb687641ee8d1d2f39b013b5b1
    7 Nabd1f61e67794faf843bb33c6efd083b
    8 Naf6e34f1c401419385cde8dad3626314
    9 Ndfc3e6d4883144ceb779f2cac8be1d4a
    10 anzsrc-for:06
    11 anzsrc-for:0604
    12 schema:author Na6e5aea18d31422eb8dc040db0ff29fa
    13 schema:citation sg:pub.10.1007/978-3-642-15294-8_17
    14 sg:pub.10.1038/nature07509
    15 sg:pub.10.1038/nbt.1621
    16 sg:pub.10.1038/nbt.1633
    17 sg:pub.10.1038/nbt.1883
    18 sg:pub.10.1038/nbt1239
    19 sg:pub.10.1038/nmeth.1226
    20 sg:pub.10.1038/nmeth.1517
    21 sg:pub.10.1038/nmeth.1528
    22 sg:pub.10.1038/nrg2484
    23 sg:pub.10.1186/1471-2105-11-94
    24 sg:pub.10.1186/gb-2009-10-3-r25
    25 sg:pub.10.1186/gb-2010-11-10-r106
    26 sg:pub.10.1186/gb-2010-11-5-r50
    27 sg:pub.10.1186/gb-2011-12-3-r22
    28 schema:datePublished 2011-08-04
    29 schema:datePublishedReg 2011-08-04
    30 schema:description BackgroundRNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments.ResultsWe present RSEM, an user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene.ConclusionsRSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.
    31 schema:genre article
    32 schema:isAccessibleForFree true
    33 schema:isPartOf N96a594a539554e37b0d3e103a906d38b
    34 N99faebcfc8ab42918bd9241c42021906
    35 sg:journal.1023786
    36 schema:keywords RNA-seq
    37 RNA-seq data
    38 RNA-seq experiments
    39 RSEM
    40 ability
    41 absence
    42 abundance
    43 abundance estimates
    44 accurate transcript quantification
    45 addition
    46 advantages
    47 assemblers
    48 assembly
    49 cDNA fragments
    50 challenges
    51 combination
    52 comparable performance
    53 contrast
    54 cost-efficient design
    55 credibility intervals
    56 data
    57 data sets
    58 de novo transcriptome assembly
    59 design
    60 end
    61 estimates
    62 existence
    63 experiments
    64 files
    65 form
    66 fragments
    67 frequency
    68 genes
    69 genome
    70 guidance
    71 hand
    72 handling
    73 interval
    74 isoforms
    75 issues
    76 key challenges
    77 large number
    78 length
    79 method
    80 multiple genes
    81 novo transcriptome assemblers
    82 novo transcriptome assembly
    83 number
    84 number of reads
    85 package
    86 paired-end RNA-seq data
    87 paired-end reads
    88 performance
    89 quantification
    90 quantification experiments
    91 quantification method
    92 quantifying transcript abundances
    93 reads
    94 real data sets
    95 reference genome
    96 relative frequency
    97 same gene
    98 set
    99 short single-end reads
    100 significant issue
    101 single gene
    102 single-end reads
    103 software
    104 software package
    105 software tools
    106 species
    107 splice forms
    108 terms
    109 tool
    110 transcript abundance
    111 transcript quantification
    112 transcriptome assemblers
    113 transcriptome assembly
    114 transcripts
    115 use
    116 user-friendly software package
    117 user-friendly software tool
    118 valuable guidance
    119 schema:name RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
    120 schema:pagination 323
    121 schema:productId N677ac838b6f14052b8c640ae2b63adb7
    122 N96eed8174660410bb8a2fb7d5b2e166e
    123 Nc6873b71cbec4c9091787815cdd5e528
    124 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021902674
    125 https://doi.org/10.1186/1471-2105-12-323
    126 schema:sdDatePublished 2022-08-04T16:59
    127 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    128 schema:sdPublisher N3fb4959b63f849a198736fe0462a06a1
    129 schema:url https://doi.org/10.1186/1471-2105-12-323
    130 sgo:license sg:explorer/license/
    131 sgo:sdDataset articles
    132 rdf:type schema:ScholarlyArticle
    133 N101519cc656a4273a333e0c46e9be197 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    134 schema:name Mice
    135 rdf:type schema:DefinedTerm
    136 N3fb4959b63f849a198736fe0462a06a1 schema:name Springer Nature - SN SciGraph project
    137 rdf:type schema:Organization
    138 N4effd47bdfa141f198d6d4833f39a3a8 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    139 schema:name RNA
    140 rdf:type schema:DefinedTerm
    141 N677ac838b6f14052b8c640ae2b63adb7 schema:name pubmed_id
    142 schema:value 21816040
    143 rdf:type schema:PropertyValue
    144 N696a5341d61143d8b4a342f559de84d5 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    145 schema:name Computer Simulation
    146 rdf:type schema:DefinedTerm
    147 N6d3182afb59242829e1db40f2c1c236a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    148 schema:name Animals
    149 rdf:type schema:DefinedTerm
    150 N74a227a0ba6f4c8c97c934ab76c34dd1 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    151 schema:name Gene Expression Profiling
    152 rdf:type schema:DefinedTerm
    153 N74f214eb687641ee8d1d2f39b013b5b1 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    154 schema:name Humans
    155 rdf:type schema:DefinedTerm
    156 N96a594a539554e37b0d3e103a906d38b schema:volumeNumber 12
    157 rdf:type schema:PublicationVolume
    158 N96eed8174660410bb8a2fb7d5b2e166e schema:name doi
    159 schema:value 10.1186/1471-2105-12-323
    160 rdf:type schema:PropertyValue
    161 N99faebcfc8ab42918bd9241c42021906 schema:issueNumber 1
    162 rdf:type schema:PublicationIssue
    163 Na6e5aea18d31422eb8dc040db0ff29fa rdf:first sg:person.01303526064.50
    164 rdf:rest Nb1ef43d7fed9454fa853a00e8387b418
    165 Nabd1f61e67794faf843bb33c6efd083b schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    166 schema:name Software
    167 rdf:type schema:DefinedTerm
    168 Naf6e34f1c401419385cde8dad3626314 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    169 schema:name Sequence Analysis, RNA
    170 rdf:type schema:DefinedTerm
    171 Nb1ef43d7fed9454fa853a00e8387b418 rdf:first sg:person.01221075436.94
    172 rdf:rest rdf:nil
    173 Nc6873b71cbec4c9091787815cdd5e528 schema:name dimensions_id
    174 schema:value pub.1021902674
    175 rdf:type schema:PropertyValue
    176 Ndfc3e6d4883144ceb779f2cac8be1d4a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    177 schema:name Protein Isoforms
    178 rdf:type schema:DefinedTerm
    179 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
    180 schema:name Biological Sciences
    181 rdf:type schema:DefinedTerm
    182 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
    183 schema:name Genetics
    184 rdf:type schema:DefinedTerm
    185 sg:grant.2529387 http://pending.schema.org/fundedItem sg:pub.10.1186/1471-2105-12-323
    186 rdf:type schema:MonetaryGrant
    187 sg:journal.1023786 schema:issn 1471-2105
    188 schema:name BMC Bioinformatics
    189 schema:publisher Springer Nature
    190 rdf:type schema:Periodical
    191 sg:person.01221075436.94 schema:affiliation grid-institutes:grid.14003.36
    192 schema:familyName Dewey
    193 schema:givenName Colin N
    194 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01221075436.94
    195 rdf:type schema:Person
    196 sg:person.01303526064.50 schema:affiliation grid-institutes:grid.14003.36
    197 schema:familyName Li
    198 schema:givenName Bo
    199 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01303526064.50
    200 rdf:type schema:Person
    201 sg:pub.10.1007/978-3-642-15294-8_17 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040318157
    202 https://doi.org/10.1007/978-3-642-15294-8_17
    203 rdf:type schema:CreativeWork
    204 sg:pub.10.1038/nature07509 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029002744
    205 https://doi.org/10.1038/nature07509
    206 rdf:type schema:CreativeWork
    207 sg:pub.10.1038/nbt.1621 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031035095
    208 https://doi.org/10.1038/nbt.1621
    209 rdf:type schema:CreativeWork
    210 sg:pub.10.1038/nbt.1633 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025339324
    211 https://doi.org/10.1038/nbt.1633
    212 rdf:type schema:CreativeWork
    213 sg:pub.10.1038/nbt.1883 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015803168
    214 https://doi.org/10.1038/nbt.1883
    215 rdf:type schema:CreativeWork
    216 sg:pub.10.1038/nbt1239 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037875102
    217 https://doi.org/10.1038/nbt1239
    218 rdf:type schema:CreativeWork
    219 sg:pub.10.1038/nmeth.1226 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045381177
    220 https://doi.org/10.1038/nmeth.1226
    221 rdf:type schema:CreativeWork
    222 sg:pub.10.1038/nmeth.1517 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032102367
    223 https://doi.org/10.1038/nmeth.1517
    224 rdf:type schema:CreativeWork
    225 sg:pub.10.1038/nmeth.1528 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047715940
    226 https://doi.org/10.1038/nmeth.1528
    227 rdf:type schema:CreativeWork
    228 sg:pub.10.1038/nrg2484 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030687647
    229 https://doi.org/10.1038/nrg2484
    230 rdf:type schema:CreativeWork
    231 sg:pub.10.1186/1471-2105-11-94 schema:sameAs https://app.dimensions.ai/details/publication/pub.1053091615
    232 https://doi.org/10.1186/1471-2105-11-94
    233 rdf:type schema:CreativeWork
    234 sg:pub.10.1186/gb-2009-10-3-r25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049583368
    235 https://doi.org/10.1186/gb-2009-10-3-r25
    236 rdf:type schema:CreativeWork
    237 sg:pub.10.1186/gb-2010-11-10-r106 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031289083
    238 https://doi.org/10.1186/gb-2010-11-10-r106
    239 rdf:type schema:CreativeWork
    240 sg:pub.10.1186/gb-2010-11-5-r50 schema:sameAs https://app.dimensions.ai/details/publication/pub.1043554856
    241 https://doi.org/10.1186/gb-2010-11-5-r50
    242 rdf:type schema:CreativeWork
    243 sg:pub.10.1186/gb-2011-12-3-r22 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009466747
    244 https://doi.org/10.1186/gb-2011-12-3-r22
    245 rdf:type schema:CreativeWork
    246 grid-institutes:grid.14003.36 schema:alternateName Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
    247 Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
    248 schema:name Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
    249 Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
    250 rdf:type schema:Organization
     



