EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2015-12

AUTHORS

Soohyun Lee, Chae Hwa Seo, Burak Han Alver, Sanghyuk Lee, Peter J. Park

ABSTRACT

BACKGROUND: RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost.

RESULTS: We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods.

CONCLUSIONS: EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar.
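The abstract's two core ideas, grouping reads by the set of transcripts they map to and fitting a joint Poisson model by maximum likelihood, can be illustrated with a toy sketch. This is not EMSAR's implementation: the function names, the segment tuples `(transcript_set, effective_length, read_count)`, the rate model `length * sum(theta)`, and the use of a plain EM iteration are all assumptions made for illustration only.

```python
from collections import Counter


def group_reads(read_hits):
    """Count reads per set of transcripts they map to (unmapped reads are skipped)."""
    return Counter(frozenset(hits) for hits in read_hits if hits)


def poisson_em(segments, n_iter=200):
    """Toy maximum likelihood fit by EM.

    segments: list of (transcript_set, effective_length, read_count).
    Assumed model: read_count ~ Poisson(effective_length * sum of theta[t] for t in the set).
    Assumes every transcript occurs in at least one segment of positive length.
    """
    transcripts = sorted({t for ts, _, _ in segments for t in ts})
    theta = {t: 1.0 for t in transcripts}
    # Total segment length over which each transcript can produce reads.
    total_len = {t: sum(length for ts, length, _ in segments if t in ts)
                 for t in transcripts}
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        for ts, _, count in segments:
            denom = sum(theta[t] for t in ts)
            if denom == 0:
                continue
            # E-step: split the segment's reads in proportion to current rates.
            for t in ts:
                expected[t] += count * theta[t] / denom
        # M-step: reads attributed to a transcript divided by its usable length.
        theta = {t: expected[t] / total_len[t] for t in transcripts}
    return theta


# Hypothetical input: each read lists the transcripts it maps to.
groups = group_reads([["A"], ["A", "B"], ["B", "A"], ["B"]])

# Hypothetical segments: A-only, shared by A and B, B-only.
theta = poisson_em([({"A"}, 100, 100), ({"A", "B"}, 100, 300), ({"B"}, 100, 200)])
```

In this made-up example the shared segment's reads are divided between the two transcripts according to the rates implied by their unique segments, which is the intuition behind using multi-mapped reads rather than discarding them.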

PAGES

278

References to SciGraph publications

  • 2001-02. Initial sequencing and analysis of the human genome in NATURE
  • 2010-10. Differential expression analysis for sequence count data in GENOME BIOLOGY
  • 2013-01. Streaming fragment assignment for real-time analysis of sequencing experiments in NATURE METHODS
  • 2014-02. voom: precision weights unlock linear model analysis tools for RNA-seq read counts in GENOME BIOLOGY
  • 2011-12. Estimation of alternative splicing isoform frequencies from RNA-Seq data in ALGORITHMS FOR MOLECULAR BIOLOGY
  • 2014-05. Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms in NATURE BIOTECHNOLOGY
  • 2011-12. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome in BMC BIOINFORMATICS
  • 2006-12. Large scale real-time PCR validation on gene expression measurements from two commercial long-oligonucleotide microarrays in BMC GENOMICS
  • 2008-11. Alternative isoform regulation in human tissue transcriptomes in NATURE
  • 2014-12. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 in GENOME BIOLOGY
  • 2008-07. Mapping and quantifying mammalian transcriptomes by RNA-Seq in NATURE METHODS
  • 2010-05. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation in NATURE BIOTECHNOLOGY
  • 2006-09. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements in NATURE BIOTECHNOLOGY
  • 2014-09. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium in NATURE BIOTECHNOLOGY
  • 2010-01. Genome sequence of the palaeopolyploid soybean in NATURE
  • 2010-09. Comprehensive comparative analysis of strand-specific RNA sequencing methods in NATURE METHODS

Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1186/s12859-015-0704-z

    DOI

    http://dx.doi.org/10.1186/s12859-015-0704-z

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1018974680

    PUBMED

    https://www.ncbi.nlm.nih.gov/pubmed/26335049


    JSON-LD is the canonical representation for SciGraph data.

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Genetics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Biological Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Base Sequence", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Gene Expression Profiling", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Genome", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Protein Isoforms", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "RNA", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Sequence Analysis, RNA", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Transcriptome", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Harvard University", 
              "id": "https://www.grid.ac/institutes/grid.38142.3c", 
              "name": [
                "Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Lee", 
            "givenName": "Soohyun", 
            "id": "sg:person.01366221727.01", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01366221727.01"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "DNA Link", 
              "id": "https://www.grid.ac/institutes/grid.410904.8", 
              "name": [
                "Emerging Technology Center, DNA link, Seoul, South Korea"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Seo", 
            "givenName": "Chae Hwa", 
            "id": "sg:person.01106323055.56", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01106323055.56"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Harvard University", 
              "id": "https://www.grid.ac/institutes/grid.38142.3c", 
              "name": [
                "Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Alver", 
            "givenName": "Burak Han", 
            "id": "sg:person.01213225540.53", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01213225540.53"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Ewha Womans University", 
              "id": "https://www.grid.ac/institutes/grid.255649.9", 
              "name": [
                "Emerging Technology Center, DNA link, Seoul, South Korea", 
                "Ewha Womans University, Seoul, Korea"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Lee", 
            "givenName": "Sanghyuk", 
            "id": "sg:person.01317076376.19", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01317076376.19"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Brigham and Women's Hospital", 
              "id": "https://www.grid.ac/institutes/grid.62560.37", 
              "name": [
                "Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA", 
                "Informatics Program, Boston Children\u2019s Hospital and Division of Genetics, Brigham and Women\u2019s Hospital, Boston, MA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Park", 
            "givenName": "Peter J.", 
            "id": "sg:person.01024612701.33", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01024612701.33"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.1089/cmb.2010.0259", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1001451255"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2164-7-59", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1009952286", 
              "https://doi.org/10.1186/1471-2164-7-59"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt.2862", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1011219673", 
              "https://doi.org/10.1038/nbt.2862"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/bib/bbs046", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1013843285"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s13059-014-0550-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1015222646", 
              "https://doi.org/10.1186/s13059-014-0550-8"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.2251", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1016190409", 
              "https://doi.org/10.1038/nmeth.2251"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature08670", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1017534919", 
              "https://doi.org/10.1038/nature08670"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1002/cne.902840310", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1017641066"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.1491", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1019899367", 
              "https://doi.org/10.1038/nmeth.1491"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-12-323", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1021902674", 
              "https://doi.org/10.1186/1471-2105-12-323"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/bioinformatics/btp616", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1023247882"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt.2957", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1027683701", 
              "https://doi.org/10.1038/nbt.2957"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/nar/gkq1015", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1028292927"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature07509", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1029002744", 
              "https://doi.org/10.1038/nature07509"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt.1621", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1031035095", 
              "https://doi.org/10.1038/nbt.1621"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2010-11-10-r106", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1031289083", 
              "https://doi.org/10.1186/gb-2010-11-10-r106"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/b978-0-12-385118-5.00005-0", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1034400626"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1748-7188-6-9", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1035358650", 
              "https://doi.org/10.1186/1748-7188-6-9"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/nar/gkp596", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037666378"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt1239", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037875102", 
              "https://doi.org/10.1038/nbt1239"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/nar/gks666", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1039501927"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1126/science.1160342", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1042163407"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/35057062", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1042854081", 
              "https://doi.org/10.1038/35057062"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/bioinformatics/btp113", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1044688303"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2014-15-2-r29", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1045312009", 
              "https://doi.org/10.1186/gb-2014-15-2-r29"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.1226", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1045381177", 
              "https://doi.org/10.1038/nmeth.1226"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1101/002832", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1085103532"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2015-12", 
        "datePublishedReg": "2015-12-01", 
        "description": "BACKGROUND: RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost.\nRESULTS: We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods.\nCONCLUSIONS: EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar.", 
        "genre": "research_article", 
        "id": "sg:pub.10.1186/s12859-015-0704-z", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isFundedItemOf": [
          {
            "id": "sg:grant.7491432", 
            "type": "MonetaryGrant"
          }
        ], 
        "isPartOf": [
          {
            "id": "sg:journal.1023786", 
            "issn": [
              "1471-2105"
            ], 
            "name": "BMC Bioinformatics", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "16"
          }
        ], 
        "name": "EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering", 
        "pagination": "278", 
        "productId": [
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "5a684f79c9b294085aba992c749bfc1e09e464b7fc534522638febefb61f66fe"
            ]
          }, 
          {
            "name": "pubmed_id", 
            "type": "PropertyValue", 
            "value": [
              "26335049"
            ]
          }, 
          {
            "name": "nlm_unique_id", 
            "type": "PropertyValue", 
            "value": [
              "100965194"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1186/s12859-015-0704-z"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1018974680"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1186/s12859-015-0704-z", 
          "https://app.dimensions.ai/details/publication/pub.1018974680"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2019-04-10T19:57", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8681_00000512.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "http://link.springer.com/10.1186%2Fs12859-015-0704-z"
      }
    ]
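    A record shaped like the JSON-LD above can be read with a standard JSON parser. The sketch below is a minimal example, assuming the record is a one-element array with `name`, `author`, and `citation` keys as shown; the embedded `RECORD_JSON` string is a heavily abbreviated stand-in for the full record, and `summarize` is a hypothetical helper name.

```python
import json

# Abbreviated stand-in for the full record; only a few fields are kept.
RECORD_JSON = """
[
  {
    "id": "sg:pub.10.1186/s12859-015-0704-z",
    "name": "EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering",
    "author": [
      {"familyName": "Lee", "givenName": "Soohyun"},
      {"familyName": "Park", "givenName": "Peter J."}
    ],
    "citation": [
      {"id": "sg:pub.10.1038/nbt.2862"},
      {"id": "sg:pub.10.1186/s13059-014-0550-8"},
      {"id": "sg:pub.10.1186/s13059-014-0550-8"}
    ]
  }
]
"""


def summarize(record_json):
    """Return the title, author names, and unique citation ids of a record."""
    record = json.loads(record_json)[0]
    authors = [f'{a["givenName"]} {a["familyName"]}'
               for a in record.get("author", [])]
    # dict.fromkeys keeps first-seen order while dropping repeated ids.
    citations = list(dict.fromkeys(c["id"] for c in record.get("citation", [])))
    return {"name": record["name"], "authors": authors, "citations": citations}


summary = summarize(RECORD_JSON)
```

    Treating the record as plain JSON is enough for simple lookups like these; a JSON-LD processor would only be needed to expand the `@context` into full RDF terms.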
     

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s12859-015-0704-z'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s12859-015-0704-z'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s12859-015-0704-z'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s12859-015-0704-z'
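
    The same content negotiation can be sketched in Python with the standard library. The URL and the four media types are taken from the curl commands above; the short format keys, the function names `build_request` and `fetch`, and the timeout are assumptions for illustration, and the endpoint may no longer respond as described.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/pub.10.1186/s12859-015-0704-z"

# Short keys (hypothetical) mapped to the Accept values used in the curl examples.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=BASE_URL):
    """Build a GET request asking the server for the given RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt, url=BASE_URL, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building a request does not touch the network; fetch("turtle") would.
request = build_request("turtle")
```

    Separating request construction from the network call keeps the header logic easy to check offline.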


     

    This table displays all metadata directly associated to this object as RDF triples.

    234 TRIPLES      21 PREDICATES      63 URIs      28 LITERALS      16 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1186/s12859-015-0704-z schema:about N06032d0b09e74032bcf99d3e7a743575
    2 N106e5e83e4234e0e974dbf1e3d477093
    3 N1a3361a717464dfcb11d64ab1bcae560
    4 N3054621176e14f9494854c89b4a332e6
    5 N4faa8341cc2744c782e096830f4466e6
    6 Nb253644f07ef4909bd6742c5fe16138e
    7 Nc04f973c3ff34007a223dc1c05ea2ec3
    8 anzsrc-for:06
    9 anzsrc-for:0604
    10 schema:author N67eaab9a200d48f7bac3f704767f0b5c
    11 schema:citation sg:pub.10.1038/35057062
    12 sg:pub.10.1038/nature07509
    13 sg:pub.10.1038/nature08670
    14 sg:pub.10.1038/nbt.1621
    15 sg:pub.10.1038/nbt.2862
    16 sg:pub.10.1038/nbt.2957
    17 sg:pub.10.1038/nbt1239
    18 sg:pub.10.1038/nmeth.1226
    19 sg:pub.10.1038/nmeth.1491
    20 sg:pub.10.1038/nmeth.2251
    21 sg:pub.10.1186/1471-2105-12-323
    22 sg:pub.10.1186/1471-2164-7-59
    23 sg:pub.10.1186/1748-7188-6-9
    24 sg:pub.10.1186/gb-2010-11-10-r106
    25 sg:pub.10.1186/gb-2014-15-2-r29
    26 sg:pub.10.1186/s13059-014-0550-8
    27 https://doi.org/10.1002/cne.902840310
    28 https://doi.org/10.1016/b978-0-12-385118-5.00005-0
    29 https://doi.org/10.1089/cmb.2010.0259
    30 https://doi.org/10.1093/bib/bbs046
    31 https://doi.org/10.1093/bioinformatics/btp113
    32 https://doi.org/10.1093/bioinformatics/btp616
    33 https://doi.org/10.1093/nar/gkp596
    34 https://doi.org/10.1093/nar/gkq1015
    35 https://doi.org/10.1093/nar/gks666
    36 https://doi.org/10.1101/002832
    37 https://doi.org/10.1126/science.1160342
    38 schema:datePublished 2015-12
    39 schema:datePublishedReg 2015-12-01
    40 schema:description BACKGROUND: RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost. RESULTS: We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods. CONCLUSIONS: EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar.
    41 schema:genre research_article
    42 schema:inLanguage en
    43 schema:isAccessibleForFree true
    44 schema:isPartOf N874c67f319764d9e93835ecc90131c0c
    45 Na6ff08c2783d40e089b148973756de34
    46 sg:journal.1023786
    47 schema:name EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering
    48 schema:pagination 278
    49 schema:productId N660d6facc33d42b7be7e7bf4ad55ca4f
    50 N6ab04bc17ad04c85834f6fc375ba5476
    51 Nb261fa523c4e42b4908e14fb2b224281
    52 Nce1872959b8f48e89ba4488dcb9c90cf
    53 Ndba9c29a79884a58a116ca34076541ea
    54 schema:sameAs https://app.dimensions.ai/details/publication/pub.1018974680
    55 https://doi.org/10.1186/s12859-015-0704-z
    56 schema:sdDatePublished 2019-04-10T19:57
    57 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    58 schema:sdPublisher N10e14d5057d94edd88f2b0adbd0c60eb
    59 schema:url http://link.springer.com/10.1186%2Fs12859-015-0704-z
    60 sgo:license sg:explorer/license/
    61 sgo:sdDataset articles
    62 rdf:type schema:ScholarlyArticle
    63 N06032d0b09e74032bcf99d3e7a743575 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    64 schema:name Protein Isoforms
    65 rdf:type schema:DefinedTerm
    66 N106e5e83e4234e0e974dbf1e3d477093 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    67 schema:name Gene Expression Profiling
    68 rdf:type schema:DefinedTerm
    69 N10e14d5057d94edd88f2b0adbd0c60eb schema:name Springer Nature - SN SciGraph project
    70 rdf:type schema:Organization
    71 N1a3361a717464dfcb11d64ab1bcae560 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    72 schema:name Genome
    73 rdf:type schema:DefinedTerm
    74 N3054621176e14f9494854c89b4a332e6 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    75 schema:name Base Sequence
    76 rdf:type schema:DefinedTerm
    77 N3b2de012ceb142a3befb087b6022799e rdf:first sg:person.01317076376.19
    78 rdf:rest Nab084bb39aaa4932885642baf2966865
    79 N4faa8341cc2744c782e096830f4466e6 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    80 schema:name RNA
    81 rdf:type schema:DefinedTerm
    82 N53a8c6ef75164cabb599a7523ca5d1ff rdf:first sg:person.01106323055.56
    83 rdf:rest Nffe07378d0484725b624d1b2fcfafcb0
    84 N660d6facc33d42b7be7e7bf4ad55ca4f schema:name doi
    85 schema:value 10.1186/s12859-015-0704-z
    86 rdf:type schema:PropertyValue
    87 N67eaab9a200d48f7bac3f704767f0b5c rdf:first sg:person.01366221727.01
    88 rdf:rest N53a8c6ef75164cabb599a7523ca5d1ff
    89 N6ab04bc17ad04c85834f6fc375ba5476 schema:name dimensions_id
    90 schema:value pub.1018974680
    91 rdf:type schema:PropertyValue
    92 N874c67f319764d9e93835ecc90131c0c schema:issueNumber 1
    93 rdf:type schema:PublicationIssue
    94 Na6ff08c2783d40e089b148973756de34 schema:volumeNumber 16
    95 rdf:type schema:PublicationVolume
    96 Nab084bb39aaa4932885642baf2966865 rdf:first sg:person.01024612701.33
    97 rdf:rest rdf:nil
    98 Nb253644f07ef4909bd6742c5fe16138e schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    99 schema:name Sequence Analysis, RNA
    100 rdf:type schema:DefinedTerm
    101 Nb261fa523c4e42b4908e14fb2b224281 schema:name nlm_unique_id
    102 schema:value 100965194
    103 rdf:type schema:PropertyValue
    104 Nc04f973c3ff34007a223dc1c05ea2ec3 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    105 schema:name Transcriptome
    106 rdf:type schema:DefinedTerm
    107 Nce1872959b8f48e89ba4488dcb9c90cf schema:name readcube_id
    108 schema:value 5a684f79c9b294085aba992c749bfc1e09e464b7fc534522638febefb61f66fe
    109 rdf:type schema:PropertyValue
    110 Ndba9c29a79884a58a116ca34076541ea schema:name pubmed_id
    111 schema:value 26335049
    112 rdf:type schema:PropertyValue
    113 Nffe07378d0484725b624d1b2fcfafcb0 rdf:first sg:person.01213225540.53
    114 rdf:rest N3b2de012ceb142a3befb087b6022799e
    115 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
    116 schema:name Biological Sciences
    117 rdf:type schema:DefinedTerm
    118 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
    119 schema:name Genetics
    120 rdf:type schema:DefinedTerm
    121 sg:grant.7491432 http://pending.schema.org/fundedItem sg:pub.10.1186/s12859-015-0704-z
    122 rdf:type schema:MonetaryGrant
    123 sg:journal.1023786 schema:issn 1471-2105
    124 schema:name BMC Bioinformatics
    125 rdf:type schema:Periodical
    126 sg:person.01024612701.33 schema:affiliation https://www.grid.ac/institutes/grid.62560.37
    127 schema:familyName Park
    128 schema:givenName Peter J.
    129 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01024612701.33
    130 rdf:type schema:Person
    131 sg:person.01106323055.56 schema:affiliation https://www.grid.ac/institutes/grid.410904.8
    132 schema:familyName Seo
    133 schema:givenName Chae Hwa
    134 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01106323055.56
    135 rdf:type schema:Person
    136 sg:person.01213225540.53 schema:affiliation https://www.grid.ac/institutes/grid.38142.3c
    137 schema:familyName Alver
    138 schema:givenName Burak Han
    139 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01213225540.53
    140 rdf:type schema:Person
    141 sg:person.01317076376.19 schema:affiliation https://www.grid.ac/institutes/grid.255649.9
    142 schema:familyName Lee
    143 schema:givenName Sanghyuk
    144 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01317076376.19
    145 rdf:type schema:Person
    146 sg:person.01366221727.01 schema:affiliation https://www.grid.ac/institutes/grid.38142.3c
    147 schema:familyName Lee
    148 schema:givenName Soohyun
    149 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01366221727.01
    150 rdf:type schema:Person
    151 sg:pub.10.1038/35057062 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042854081
    152 https://doi.org/10.1038/35057062
    153 rdf:type schema:CreativeWork
    154 sg:pub.10.1038/nature07509 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029002744
    155 https://doi.org/10.1038/nature07509
    156 rdf:type schema:CreativeWork
    157 sg:pub.10.1038/nature08670 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017534919
    158 https://doi.org/10.1038/nature08670
    159 rdf:type schema:CreativeWork
    160 sg:pub.10.1038/nbt.1621 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031035095
    161 https://doi.org/10.1038/nbt.1621
    162 rdf:type schema:CreativeWork
    163 sg:pub.10.1038/nbt.2862 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011219673
    164 https://doi.org/10.1038/nbt.2862
    165 rdf:type schema:CreativeWork
    166 sg:pub.10.1038/nbt.2957 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027683701
    167 https://doi.org/10.1038/nbt.2957
    168 rdf:type schema:CreativeWork
    169 sg:pub.10.1038/nbt1239 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037875102
    170 https://doi.org/10.1038/nbt1239
    171 rdf:type schema:CreativeWork
    172 sg:pub.10.1038/nmeth.1226 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045381177
    173 https://doi.org/10.1038/nmeth.1226
    174 rdf:type schema:CreativeWork
    175 sg:pub.10.1038/nmeth.1491 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019899367
    176 https://doi.org/10.1038/nmeth.1491
    177 rdf:type schema:CreativeWork
    178 sg:pub.10.1038/nmeth.2251 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016190409
    179 https://doi.org/10.1038/nmeth.2251
    180 rdf:type schema:CreativeWork
    181 sg:pub.10.1186/1471-2105-12-323 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021902674
    182 https://doi.org/10.1186/1471-2105-12-323
    183 rdf:type schema:CreativeWork
    184 sg:pub.10.1186/1471-2164-7-59 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009952286
    185 https://doi.org/10.1186/1471-2164-7-59
    186 rdf:type schema:CreativeWork
    187 sg:pub.10.1186/1748-7188-6-9 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035358650
    188 https://doi.org/10.1186/1748-7188-6-9
    189 rdf:type schema:CreativeWork
    190 sg:pub.10.1186/gb-2010-11-10-r106 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031289083
    191 https://doi.org/10.1186/gb-2010-11-10-r106
    192 rdf:type schema:CreativeWork
    193 sg:pub.10.1186/gb-2014-15-2-r29 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045312009
    194 https://doi.org/10.1186/gb-2014-15-2-r29
    195 rdf:type schema:CreativeWork
    196 sg:pub.10.1186/s13059-014-0550-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015222646
    197 https://doi.org/10.1186/s13059-014-0550-8
    198 rdf:type schema:CreativeWork
    199 https://doi.org/10.1002/cne.902840310 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017641066
    200 rdf:type schema:CreativeWork
    201 https://doi.org/10.1016/b978-0-12-385118-5.00005-0 schema:sameAs https://app.dimensions.ai/details/publication/pub.1034400626
    202 rdf:type schema:CreativeWork
    203 https://doi.org/10.1089/cmb.2010.0259 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001451255
    204 rdf:type schema:CreativeWork
    205 https://doi.org/10.1093/bib/bbs046 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013843285
    206 rdf:type schema:CreativeWork
    207 https://doi.org/10.1093/bioinformatics/btp113 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044688303
    208 rdf:type schema:CreativeWork
    209 https://doi.org/10.1093/bioinformatics/btp616 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023247882
    210 rdf:type schema:CreativeWork
    211 https://doi.org/10.1093/nar/gkp596 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037666378
    212 rdf:type schema:CreativeWork
    213 https://doi.org/10.1093/nar/gkq1015 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028292927
    214 rdf:type schema:CreativeWork
    215 https://doi.org/10.1093/nar/gks666 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039501927
    216 rdf:type schema:CreativeWork
    217 https://doi.org/10.1101/002832 schema:sameAs https://app.dimensions.ai/details/publication/pub.1085103532
    218 rdf:type schema:CreativeWork
    219 https://doi.org/10.1126/science.1160342 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042163407
    220 rdf:type schema:CreativeWork
    221 https://www.grid.ac/institutes/grid.255649.9 schema:alternateName Ewha Womans University
    222 schema:name Emerging Technology Center, DNA link, Seoul, South Korea
    223 Ewha Womans University, Seoul, Korea
    224 rdf:type schema:Organization
    225 https://www.grid.ac/institutes/grid.38142.3c schema:alternateName Harvard University
    226 schema:name Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
    227 rdf:type schema:Organization
    228 https://www.grid.ac/institutes/grid.410904.8 schema:alternateName DNA Link
    229 schema:name Emerging Technology Center, DNA link, Seoul, South Korea
    230 rdf:type schema:Organization
    231 https://www.grid.ac/institutes/grid.62560.37 schema:alternateName Brigham and Women's Hospital
    232 schema:name Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
    233 Informatics Program, Boston Children’s Hospital and Division of Genetics, Brigham and Women’s Hospital, Boston, MA, USA
    234 rdf:type schema:Organization
     



