EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2015-12

AUTHORS

Soohyun Lee, Chae Hwa Seo, Burak Han Alver, Sanghyuk Lee, Peter J. Park

ABSTRACT

BACKGROUND: RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost.

RESULTS: We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods.

CONCLUSIONS: EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar.
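The grouping and estimation idea described in the abstract can be illustrated with a small Python sketch. This is not EMSAR's implementation: the function names `group_reads` and `poisson_em`, the per-group segment lengths, and the use of a plain EM update are assumptions made for illustration only. The sketch counts reads per set of transcripts they map to, then fits per-transcript rates under the assumption that each group count is Poisson with mean equal to the segment length times the summed rates of the transcripts in the group.

```python
from collections import Counter


def group_reads(read_hits):
    """Count reads per set of transcripts they map to (unmapped reads are skipped)."""
    return Counter(frozenset(hits) for hits in read_hits if hits)


def poisson_em(counts, seg_len, n_iter=200):
    """Fit per-transcript rates by EM under an assumed additive Poisson model.

    counts  : {frozenset of transcript ids: number of reads in that group}
    seg_len : {frozenset of transcript ids: positive length of the shared segment}
              (hypothetical input; must contain every key of `counts`)
    Model   : counts[s] ~ Poisson(seg_len[s] * sum(theta[t] for t in s))
    """
    transcripts = sorted({t for seg in seg_len for t in seg})
    theta = {t: 1.0 for t in transcripts}
    # Total segment length in which each transcript participates.
    cover = {t: sum(l for seg, l in seg_len.items() if t in seg) for t in transcripts}
    for _ in range(n_iter):
        expected = dict.fromkeys(transcripts, 0.0)
        for seg, n in counts.items():
            total = sum(theta[t] for t in seg)
            if total == 0:
                continue
            # E-step: split the group's reads in proportion to current rates.
            for t in seg:
                expected[t] += n * theta[t] / total
        # M-step: expected reads per unit of covered length.
        theta = {t: expected[t] / cover[t] for t in transcripts}
    return theta
```

For example, calling `group_reads` on a list of per-read transcript hits and passing the result to `poisson_em`, together with a length for every group, returns one rate per transcript. Reads shared between transcripts contribute to each of them in proportion to the current estimates rather than being discarded.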

PAGES

278

References to SciGraph publications

  • 2001-02. Initial sequencing and analysis of the human genome in NATURE
  • 2010-10. Differential expression analysis for sequence count data in GENOME BIOLOGY
  • 2013-01. Streaming fragment assignment for real-time analysis of sequencing experiments in NATURE METHODS
  • 2014-02. voom: precision weights unlock linear model analysis tools for RNA-seq read counts in GENOME BIOLOGY
  • 2011-12. Estimation of alternative splicing isoform frequencies from RNA-Seq data in ALGORITHMS FOR MOLECULAR BIOLOGY
  • 2014-05. Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms in NATURE BIOTECHNOLOGY
  • 2011-12. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome in BMC BIOINFORMATICS
  • 2006-12. Large scale real-time PCR validation on gene expression measurements from two commercial long-oligonucleotide microarrays in BMC GENOMICS
  • 2008-11. Alternative isoform regulation in human tissue transcriptomes in NATURE
  • 2014-12. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 in GENOME BIOLOGY
  • 2008-07. Mapping and quantifying mammalian transcriptomes by RNA-Seq in NATURE METHODS
  • 2010-05. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation in NATURE BIOTECHNOLOGY
  • 2006-09. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements in NATURE BIOTECHNOLOGY
  • 2014-09. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium in NATURE BIOTECHNOLOGY
  • 2010-01. Genome sequence of the palaeopolyploid soybean in NATURE
  • 2010-09. Comprehensive comparative analysis of strand-specific RNA sequencing methods in NATURE METHODS
Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1186/s12859-015-0704-z

    DOI

    http://dx.doi.org/10.1186/s12859-015-0704-z

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1018974680

    PUBMED

    https://www.ncbi.nlm.nih.gov/pubmed/26335049



    JSON-LD is the canonical representation for SciGraph data.

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Genetics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Biological Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Base Sequence", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Gene Expression Profiling", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Genome", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Protein Isoforms", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "RNA", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Sequence Analysis, RNA", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Transcriptome", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Harvard University", 
              "id": "https://www.grid.ac/institutes/grid.38142.3c", 
              "name": [
                "Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Lee", 
            "givenName": "Soohyun", 
            "id": "sg:person.01366221727.01", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01366221727.01"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "DNA Link", 
              "id": "https://www.grid.ac/institutes/grid.410904.8", 
              "name": [
                "Emerging Technology Center, DNA link, Seoul, South Korea"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Seo", 
            "givenName": "Chae Hwa", 
            "id": "sg:person.01106323055.56", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01106323055.56"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Harvard University", 
              "id": "https://www.grid.ac/institutes/grid.38142.3c", 
              "name": [
                "Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Alver", 
            "givenName": "Burak Han", 
            "id": "sg:person.01213225540.53", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01213225540.53"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Ewha Womans University", 
              "id": "https://www.grid.ac/institutes/grid.255649.9", 
              "name": [
                "Emerging Technology Center, DNA link, Seoul, South Korea", 
                "Ewha Womans University, Seoul, Korea"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Lee", 
            "givenName": "Sanghyuk", 
            "id": "sg:person.01317076376.19", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01317076376.19"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Brigham and Women's Hospital", 
              "id": "https://www.grid.ac/institutes/grid.62560.37", 
              "name": [
                "Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA", 
                "Informatics Program, Boston Children\u2019s Hospital and Division of Genetics, Brigham and Women\u2019s Hospital, Boston, MA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Park", 
            "givenName": "Peter J.", 
            "id": "sg:person.01024612701.33", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01024612701.33"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.1089/cmb.2010.0259", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1001451255"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2164-7-59", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1009952286", 
              "https://doi.org/10.1186/1471-2164-7-59"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt.2862", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1011219673", 
              "https://doi.org/10.1038/nbt.2862"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/bib/bbs046", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1013843285"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s13059-014-0550-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1015222646", 
              "https://doi.org/10.1186/s13059-014-0550-8"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s13059-014-0550-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1015222646", 
              "https://doi.org/10.1186/s13059-014-0550-8"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.2251", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1016190409", 
              "https://doi.org/10.1038/nmeth.2251"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature08670", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1017534919", 
              "https://doi.org/10.1038/nature08670"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature08670", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1017534919", 
              "https://doi.org/10.1038/nature08670"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1002/cne.902840310", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1017641066"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.1491", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1019899367", 
              "https://doi.org/10.1038/nmeth.1491"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.1491", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1019899367", 
              "https://doi.org/10.1038/nmeth.1491"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-12-323", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1021902674", 
              "https://doi.org/10.1186/1471-2105-12-323"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/bioinformatics/btp616", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1023247882"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt.2957", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1027683701", 
              "https://doi.org/10.1038/nbt.2957"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/nar/gkq1015", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1028292927"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nature07509", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1029002744", 
              "https://doi.org/10.1038/nature07509"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt.1621", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1031035095", 
              "https://doi.org/10.1038/nbt.1621"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2010-11-10-r106", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1031289083", 
              "https://doi.org/10.1186/gb-2010-11-10-r106"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/b978-0-12-385118-5.00005-0", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1034400626"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1748-7188-6-9", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1035358650", 
              "https://doi.org/10.1186/1748-7188-6-9"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/nar/gkp596", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037666378"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt1239", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037875102", 
              "https://doi.org/10.1038/nbt1239"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt1239", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037875102", 
              "https://doi.org/10.1038/nbt1239"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/nar/gks666", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1039501927"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1126/science.1160342", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1042163407"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/35057062", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1042854081", 
              "https://doi.org/10.1038/35057062"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/35057062", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1042854081", 
              "https://doi.org/10.1038/35057062"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/bioinformatics/btp113", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1044688303"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2014-15-2-r29", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1045312009", 
              "https://doi.org/10.1186/gb-2014-15-2-r29"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.1226", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1045381177", 
              "https://doi.org/10.1038/nmeth.1226"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1101/002832", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1085103532"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1101/002832", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1085103532"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1101/002832", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1085103532"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2015-12", 
        "datePublishedReg": "2015-12-01", 
        "description": "BACKGROUND: RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost.\nRESULTS: We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods.\nCONCLUSIONS: EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar.", 
        "genre": "research_article", 
        "id": "sg:pub.10.1186/s12859-015-0704-z", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isFundedItemOf": [
          {
            "id": "sg:grant.7491432", 
            "type": "MonetaryGrant"
          }
        ], 
        "isPartOf": [
          {
            "id": "sg:journal.1023786", 
            "issn": [
              "1471-2105"
            ], 
            "name": "BMC Bioinformatics", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "16"
          }
        ], 
        "name": "EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering", 
        "pagination": "278", 
        "productId": [
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "5a684f79c9b294085aba992c749bfc1e09e464b7fc534522638febefb61f66fe"
            ]
          }, 
          {
            "name": "pubmed_id", 
            "type": "PropertyValue", 
            "value": [
              "26335049"
            ]
          }, 
          {
            "name": "nlm_unique_id", 
            "type": "PropertyValue", 
            "value": [
              "100965194"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1186/s12859-015-0704-z"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1018974680"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1186/s12859-015-0704-z", 
          "https://app.dimensions.ai/details/publication/pub.1018974680"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2019-04-10T19:57", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8681_00000512.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "http://link.springer.com/10.1186%2Fs12859-015-0704-z"
      }
    ]
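A record in this shape can be read with any JSON parser. The following Python sketch is a minimal illustration, assuming a trimmed excerpt of the record held in a string. The helper name `summarize` and the excerpt itself are hypothetical, and only the `id`, `author` and `citation` keys shown in the record are used. It collects author names and drops repeated citation ids, since an id can appear more than once in the `citation` array.

```python
import json

# Trimmed, illustrative excerpt in the same shape as the full record.
record_text = """
[
  {
    "id": "sg:pub.10.1186/s12859-015-0704-z",
    "author": [
      {"familyName": "Lee", "givenName": "Soohyun", "type": "Person"},
      {"familyName": "Park", "givenName": "Peter J.", "type": "Person"}
    ],
    "citation": [
      {"id": "sg:pub.10.1038/nmeth.1491", "type": "CreativeWork"},
      {"id": "sg:pub.10.1038/nmeth.1491", "type": "CreativeWork"},
      {"id": "https://doi.org/10.1101/002832", "type": "CreativeWork"}
    ]
  }
]
"""


def summarize(text):
    """Return the record id, author names, and unique citation ids in original order."""
    record = json.loads(text)[0]  # the JSON-LD document is a one-element array
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]
    # dict.fromkeys keeps the first occurrence of each id and preserves order.
    citations = list(dict.fromkeys(c["id"] for c in record.get("citation", [])))
    return {"id": record["id"], "authors": authors, "citations": citations}
```

Calling `summarize(record_text)` returns a plain dictionary, which is often easier to work with than the nested JSON-LD structure when only a few fields are needed.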
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s12859-015-0704-z'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s12859-015-0704-z'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s12859-015-0704-z'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s12859-015-0704-z'
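The same content negotiation can be sketched in Python with the standard library. This is a minimal sketch, not an official client: the helper names `build_request` and `fetch` and the short format keys are assumptions, and whether the endpoint still answers these requests is not verified here. The URL and the `Accept` media types are the ones used in the curl commands above.

```python
from urllib.request import Request, urlopen

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/s12859-015-0704-z"

# Media types taken from the curl examples; the short keys are hypothetical labels.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request whose Accept header selects the RDF serialization."""
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Keeping request construction separate from the network call makes the header logic easy to check offline. `fetch("turtle")` would then correspond to the Turtle curl command.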


     

    This table displays all metadata directly associated with this object as RDF triples.

    234 TRIPLES      21 PREDICATES      63 URIs      28 LITERALS      16 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1186/s12859-015-0704-z schema:about N02d1254e4f48471b9e1b43607e52b16e
    2 N07a445266fda46d0a1882b426115b7c0
    3 N227a1b8295c54c4ea0f4758e61a60cbb
    4 N332bb1218e1a495ea48bc6c2db0bc5f1
    5 N86c04d0dd6e34337ad3a4490b914f22c
    6 Nc1073158b7f14df996a95d3a3aeed0b1
    7 Nd62c25bb5fc14b2f91cb36853f0a24bc
    8 anzsrc-for:06
    9 anzsrc-for:0604
    10 schema:author Nd674358c69cd405a918f65b1a6508bc8
    11 schema:citation sg:pub.10.1038/35057062
    12 sg:pub.10.1038/nature07509
    13 sg:pub.10.1038/nature08670
    14 sg:pub.10.1038/nbt.1621
    15 sg:pub.10.1038/nbt.2862
    16 sg:pub.10.1038/nbt.2957
    17 sg:pub.10.1038/nbt1239
    18 sg:pub.10.1038/nmeth.1226
    19 sg:pub.10.1038/nmeth.1491
    20 sg:pub.10.1038/nmeth.2251
    21 sg:pub.10.1186/1471-2105-12-323
    22 sg:pub.10.1186/1471-2164-7-59
    23 sg:pub.10.1186/1748-7188-6-9
    24 sg:pub.10.1186/gb-2010-11-10-r106
    25 sg:pub.10.1186/gb-2014-15-2-r29
    26 sg:pub.10.1186/s13059-014-0550-8
    27 https://doi.org/10.1002/cne.902840310
    28 https://doi.org/10.1016/b978-0-12-385118-5.00005-0
    29 https://doi.org/10.1089/cmb.2010.0259
    30 https://doi.org/10.1093/bib/bbs046
    31 https://doi.org/10.1093/bioinformatics/btp113
    32 https://doi.org/10.1093/bioinformatics/btp616
    33 https://doi.org/10.1093/nar/gkp596
    34 https://doi.org/10.1093/nar/gkq1015
    35 https://doi.org/10.1093/nar/gks666
    36 https://doi.org/10.1101/002832
    37 https://doi.org/10.1126/science.1160342
    38 schema:datePublished 2015-12
    39 schema:datePublishedReg 2015-12-01
    40 schema:description BACKGROUND: RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost. RESULTS: We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods. CONCLUSIONS: EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar.
    41 schema:genre research_article
    42 schema:inLanguage en
    43 schema:isAccessibleForFree true
    44 schema:isPartOf N5227c4b97b7b47b1aafadff6148b7a11
    45 Nf9cff765383549a3a0525e4eae2ea00a
    46 sg:journal.1023786
    47 schema:name EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering
    48 schema:pagination 278
    49 schema:productId N4c64a472e196401ebb4811e7ea935b43
    50 N6dc76cc461be41e2b56a6ca5d7dedcd5
    51 N9c1ce5e8f410485b95bbbdd5c43284e3
    52 Nb8134564cb664615abbdab29e33835c7
    53 Ndcc65bdce36a43469be8d6d965b1d170
    54 schema:sameAs https://app.dimensions.ai/details/publication/pub.1018974680
    55 https://doi.org/10.1186/s12859-015-0704-z
    56 schema:sdDatePublished 2019-04-10T19:57
    57 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    58 schema:sdPublisher Ncbea7cdf11d943588f49dff23417e850
    59 schema:url http://link.springer.com/10.1186%2Fs12859-015-0704-z
    60 sgo:license sg:explorer/license/
    61 sgo:sdDataset articles
    62 rdf:type schema:ScholarlyArticle
    63 N02d1254e4f48471b9e1b43607e52b16e schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    64 schema:name Sequence Analysis, RNA
    65 rdf:type schema:DefinedTerm
    66 N07a445266fda46d0a1882b426115b7c0 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    67 schema:name Protein Isoforms
    68 rdf:type schema:DefinedTerm
    69 N0b4279ec9c274878ad4f697cabc7a322 rdf:first sg:person.01213225540.53
    70 rdf:rest N9661c8838b9342d18786411d395be656
    71 N227a1b8295c54c4ea0f4758e61a60cbb schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    72 schema:name Genome
    73 rdf:type schema:DefinedTerm
    74 N332bb1218e1a495ea48bc6c2db0bc5f1 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    75 schema:name RNA
    76 rdf:type schema:DefinedTerm
    77 N3505f711428c4954828aa3e57af3a35c rdf:first sg:person.01106323055.56
    78 rdf:rest N0b4279ec9c274878ad4f697cabc7a322
    79 N4c64a472e196401ebb4811e7ea935b43 schema:name readcube_id
    80 schema:value 5a684f79c9b294085aba992c749bfc1e09e464b7fc534522638febefb61f66fe
    81 rdf:type schema:PropertyValue
    82 N5227c4b97b7b47b1aafadff6148b7a11 schema:issueNumber 1
    83 rdf:type schema:PublicationIssue
    84 N6dc76cc461be41e2b56a6ca5d7dedcd5 schema:name dimensions_id
    85 schema:value pub.1018974680
    86 rdf:type schema:PropertyValue
    87 N75a2bf55ee064117abb0250111fa03af rdf:first sg:person.01024612701.33
    88 rdf:rest rdf:nil
    89 N86c04d0dd6e34337ad3a4490b914f22c schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    90 schema:name Base Sequence
    91 rdf:type schema:DefinedTerm
    92 N9661c8838b9342d18786411d395be656 rdf:first sg:person.01317076376.19
    93 rdf:rest N75a2bf55ee064117abb0250111fa03af
    94 N9c1ce5e8f410485b95bbbdd5c43284e3 schema:name pubmed_id
    95 schema:value 26335049
    96 rdf:type schema:PropertyValue
    97 Nb8134564cb664615abbdab29e33835c7 schema:name nlm_unique_id
    98 schema:value 100965194
    99 rdf:type schema:PropertyValue
    100 Nc1073158b7f14df996a95d3a3aeed0b1 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    101 schema:name Gene Expression Profiling
    102 rdf:type schema:DefinedTerm
    103 Ncbea7cdf11d943588f49dff23417e850 schema:name Springer Nature - SN SciGraph project
    104 rdf:type schema:Organization
    105 Nd62c25bb5fc14b2f91cb36853f0a24bc schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    106 schema:name Transcriptome
    107 rdf:type schema:DefinedTerm
    108 Nd674358c69cd405a918f65b1a6508bc8 rdf:first sg:person.01366221727.01
    109 rdf:rest N3505f711428c4954828aa3e57af3a35c
    110 Ndcc65bdce36a43469be8d6d965b1d170 schema:name doi
    111 schema:value 10.1186/s12859-015-0704-z
    112 rdf:type schema:PropertyValue
    113 Nf9cff765383549a3a0525e4eae2ea00a schema:volumeNumber 16
    114 rdf:type schema:PublicationVolume
    115 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
    116 schema:name Biological Sciences
    117 rdf:type schema:DefinedTerm
    118 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
    119 schema:name Genetics
    120 rdf:type schema:DefinedTerm
    121 sg:grant.7491432 http://pending.schema.org/fundedItem sg:pub.10.1186/s12859-015-0704-z
    122 rdf:type schema:MonetaryGrant
    123 sg:journal.1023786 schema:issn 1471-2105
    124 schema:name BMC Bioinformatics
    125 rdf:type schema:Periodical
    126 sg:person.01024612701.33 schema:affiliation https://www.grid.ac/institutes/grid.62560.37
    127 schema:familyName Park
    128 schema:givenName Peter J.
    129 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01024612701.33
    130 rdf:type schema:Person
    131 sg:person.01106323055.56 schema:affiliation https://www.grid.ac/institutes/grid.410904.8
    132 schema:familyName Seo
    133 schema:givenName Chae Hwa
    134 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01106323055.56
    135 rdf:type schema:Person
    136 sg:person.01213225540.53 schema:affiliation https://www.grid.ac/institutes/grid.38142.3c
    137 schema:familyName Alver
    138 schema:givenName Burak Han
    139 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01213225540.53
    140 rdf:type schema:Person
    141 sg:person.01317076376.19 schema:affiliation https://www.grid.ac/institutes/grid.255649.9
    142 schema:familyName Lee
    143 schema:givenName Sanghyuk
    144 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01317076376.19
    145 rdf:type schema:Person
    146 sg:person.01366221727.01 schema:affiliation https://www.grid.ac/institutes/grid.38142.3c
    147 schema:familyName Lee
    148 schema:givenName Soohyun
    149 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01366221727.01
    150 rdf:type schema:Person
    151 sg:pub.10.1038/35057062 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042854081
    152 https://doi.org/10.1038/35057062
    153 rdf:type schema:CreativeWork
    154 sg:pub.10.1038/nature07509 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029002744
    155 https://doi.org/10.1038/nature07509
    156 rdf:type schema:CreativeWork
    157 sg:pub.10.1038/nature08670 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017534919
    158 https://doi.org/10.1038/nature08670
    159 rdf:type schema:CreativeWork
    160 sg:pub.10.1038/nbt.1621 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031035095
    161 https://doi.org/10.1038/nbt.1621
    162 rdf:type schema:CreativeWork
    163 sg:pub.10.1038/nbt.2862 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011219673
    164 https://doi.org/10.1038/nbt.2862
    165 rdf:type schema:CreativeWork
    166 sg:pub.10.1038/nbt.2957 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027683701
    167 https://doi.org/10.1038/nbt.2957
    168 rdf:type schema:CreativeWork
    169 sg:pub.10.1038/nbt1239 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037875102
    170 https://doi.org/10.1038/nbt1239
    171 rdf:type schema:CreativeWork
    172 sg:pub.10.1038/nmeth.1226 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045381177
    173 https://doi.org/10.1038/nmeth.1226
    174 rdf:type schema:CreativeWork
    175 sg:pub.10.1038/nmeth.1491 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019899367
    176 https://doi.org/10.1038/nmeth.1491
    177 rdf:type schema:CreativeWork
    178 sg:pub.10.1038/nmeth.2251 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016190409
    179 https://doi.org/10.1038/nmeth.2251
    180 rdf:type schema:CreativeWork
    181 sg:pub.10.1186/1471-2105-12-323 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021902674
    182 https://doi.org/10.1186/1471-2105-12-323
    183 rdf:type schema:CreativeWork
    184 sg:pub.10.1186/1471-2164-7-59 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009952286
    185 https://doi.org/10.1186/1471-2164-7-59
    186 rdf:type schema:CreativeWork
    187 sg:pub.10.1186/1748-7188-6-9 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035358650
    188 https://doi.org/10.1186/1748-7188-6-9
    189 rdf:type schema:CreativeWork
    190 sg:pub.10.1186/gb-2010-11-10-r106 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031289083
    191 https://doi.org/10.1186/gb-2010-11-10-r106
    192 rdf:type schema:CreativeWork
    193 sg:pub.10.1186/gb-2014-15-2-r29 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045312009
    194 https://doi.org/10.1186/gb-2014-15-2-r29
    195 rdf:type schema:CreativeWork
    196 sg:pub.10.1186/s13059-014-0550-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015222646
    197 https://doi.org/10.1186/s13059-014-0550-8
    198 rdf:type schema:CreativeWork
    199 https://doi.org/10.1002/cne.902840310 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017641066
    200 rdf:type schema:CreativeWork
    201 https://doi.org/10.1016/b978-0-12-385118-5.00005-0 schema:sameAs https://app.dimensions.ai/details/publication/pub.1034400626
    202 rdf:type schema:CreativeWork
    203 https://doi.org/10.1089/cmb.2010.0259 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001451255
    204 rdf:type schema:CreativeWork
    205 https://doi.org/10.1093/bib/bbs046 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013843285
    206 rdf:type schema:CreativeWork
    207 https://doi.org/10.1093/bioinformatics/btp113 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044688303
    208 rdf:type schema:CreativeWork
    209 https://doi.org/10.1093/bioinformatics/btp616 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023247882
    210 rdf:type schema:CreativeWork
    211 https://doi.org/10.1093/nar/gkp596 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037666378
    212 rdf:type schema:CreativeWork
    213 https://doi.org/10.1093/nar/gkq1015 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028292927
    214 rdf:type schema:CreativeWork
    215 https://doi.org/10.1093/nar/gks666 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039501927
    216 rdf:type schema:CreativeWork
    217 https://doi.org/10.1101/002832 schema:sameAs https://app.dimensions.ai/details/publication/pub.1085103532
    218 rdf:type schema:CreativeWork
    219 https://doi.org/10.1126/science.1160342 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042163407
    220 rdf:type schema:CreativeWork
    221 https://www.grid.ac/institutes/grid.255649.9 schema:alternateName Ewha Womans University
    222 schema:name Emerging Technology Center, DNA link, Seoul, South Korea
    223 Ewha Womans University, Seoul, Korea
    224 rdf:type schema:Organization
    225 https://www.grid.ac/institutes/grid.38142.3c schema:alternateName Harvard University
    226 schema:name Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
    227 rdf:type schema:Organization
    228 https://www.grid.ac/institutes/grid.410904.8 schema:alternateName DNA Link
    229 schema:name Emerging Technology Center, DNA link, Seoul, South Korea
    230 rdf:type schema:Organization
    231 https://www.grid.ac/institutes/grid.62560.37 schema:alternateName Brigham and Women's Hospital
    232 schema:name Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
    233 Informatics Program, Boston Children’s Hospital and Division of Genetics, Brigham and Women’s Hospital, Boston, MA, USA
    234 rdf:type schema:Organization
     



