Strategies for aggregating gene expression data: The collapseRows R function


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2011-12

AUTHORS

Jeremy A Miller, Chaochao Cai, Peter Langfelder, Daniel H Geschwind, Sunil M Kurian, Daniel R Salomon, Steve Horvath

ABSTRACT

BACKGROUND: Genomic and other high dimensional analyses often require one to summarize multiple related variables by a single representative. This task is also variously referred to as collapsing, combining, reducing, or aggregating variables. Examples include summarizing several probe measurements corresponding to a single gene, representing the expression profiles of a co-expression module by a single expression profile, and aggregating cell-type marker information to de-convolute expression data. Several standard statistical summary techniques can be used, but network methods also provide useful alternative methods to find representatives. Currently few collapsing functions are developed and widely applied.

RESULTS: We introduce the R function collapseRows that implements several collapsing methods and evaluate its performance in three applications. First, we study a crucial step of the meta-analysis of microarray data: the merging of independent gene expression data sets, which may have been measured on different platforms. Toward this end, we collapse multiple microarray probes for a single gene and then merge the data by gene identifier. We find that choosing the probe with the highest average expression leads to best between-study consistency. Second, we study methods for summarizing the gene expression profiles of a co-expression module. Several gene co-expression network analysis applications show that the optimal collapsing strategy depends on the analysis goal. Third, we study aggregating the information of cell type marker genes when the aim is to predict the abundance of cell types in a tissue sample based on gene expression data ("expression deconvolution"). We apply different collapsing methods to predict cell type abundances in peripheral human blood and in mixtures of blood cell lines. Interestingly, the most accurate prediction method involves choosing the most highly connected "hub" marker gene. Finally, to facilitate biological interpretation of collapsed gene lists, we introduce the function userListEnrichment, which assesses the enrichment of gene lists for known brain and blood cell type markers, and for other published biological pathways.

CONCLUSIONS: The R function collapseRows implements several standard and network-based collapsing methods. In various genomic applications we provide evidence that both types of methods are robust and biologically relevant tools.
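
The probe-selection strategy described in the abstract (keeping, for each gene, the probe with the highest average expression) can be illustrated with a minimal sketch. This is not the collapseRows implementation, which is an R function; it is a Python illustration of the idea only. The function name collapse_max_mean, the probe and gene identifiers, and the expression values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_max_mean(expression, probe_to_gene):
    """Keep, for each gene, the probe with the highest mean expression.

    expression:    dict mapping probe ID -> list of values (one per sample)
    probe_to_gene: dict mapping probe ID -> gene identifier
    Returns a dict mapping gene -> (selected probe ID, its values).
    """
    # Group the probes that have expression data by the gene they measure.
    probes_by_gene = defaultdict(list)
    for probe, gene in probe_to_gene.items():
        if probe in expression:
            probes_by_gene[gene].append(probe)

    # Pick one representative probe per gene.
    collapsed = {}
    for gene, probes in probes_by_gene.items():
        best = max(probes, key=lambda p: mean(expression[p]))
        collapsed[gene] = (best, expression[best])
    return collapsed


# Hypothetical toy data: three probes, two genes, three samples.
expression = {
    "probe_a": [5.0, 6.0, 7.0],
    "probe_b": [8.0, 9.0, 10.0],
    "probe_c": [2.0, 2.5, 3.0],
}
probe_to_gene = {"probe_a": "GENE1", "probe_b": "GENE1", "probe_c": "GENE2"}

collapsed = collapse_max_mean(expression, probe_to_gene)
for gene, (probe, values) in sorted(collapsed.items()):
    print(gene, probe, values)
```

After collapsing, each data set has one row per gene, so two studies measured on different platforms could be merged on the gene identifier, which is the use case the abstract describes.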

PAGES

322

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/1471-2105-12-322

DOI

http://dx.doi.org/10.1186/1471-2105-12-322

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1013163514

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/21816037



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0104", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Statistics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Mathematical Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Animals", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Blood", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Brain", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Gene Expression Profiling", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Gene Expression Regulation", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Humans", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Meta-Analysis as Topic", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Mice", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Models, Statistical", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Oligonucleotide Array Sequence Analysis", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of California Los Angeles", 
          "id": "https://www.grid.ac/institutes/grid.19006.3e", 
          "name": [
            "Interdepartmental Program for Neuroscience, UCLA, Los Angeles, California, USA", 
            "Human Genetics Department, UCLA, Los Angeles, California, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Miller", 
        "givenName": "Jeremy A", 
        "id": "sg:person.01016772265.00", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01016772265.00"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of California Los Angeles", 
          "id": "https://www.grid.ac/institutes/grid.19006.3e", 
          "name": [
            "Human Genetics Department, UCLA, Los Angeles, California, USA", 
            "Biostatistics Department, UCLA, Los Angeles, California, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Cai", 
        "givenName": "Chaochao", 
        "id": "sg:person.0641354601.20", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0641354601.20"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of California Los Angeles", 
          "id": "https://www.grid.ac/institutes/grid.19006.3e", 
          "name": [
            "Human Genetics Department, UCLA, Los Angeles, California, USA", 
            "Biostatistics Department, UCLA, Los Angeles, California, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Langfelder", 
        "givenName": "Peter", 
        "id": "sg:person.01021573403.17", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01021573403.17"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of California Los Angeles", 
          "id": "https://www.grid.ac/institutes/grid.19006.3e", 
          "name": [
            "Human Genetics Department, UCLA, Los Angeles, California, USA", 
            "Neurology Department, UCLA, Los Angeles, California, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Geschwind", 
        "givenName": "Daniel H", 
        "id": "sg:person.011730155577.61", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011730155577.61"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Scripps Research Institute", 
          "id": "https://www.grid.ac/institutes/grid.214007.0", 
          "name": [
            "Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, California, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Kurian", 
        "givenName": "Sunil M", 
        "id": "sg:person.0773631510.35", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0773631510.35"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Scripps Research Institute", 
          "id": "https://www.grid.ac/institutes/grid.214007.0", 
          "name": [
            "Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, California, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Salomon", 
        "givenName": "Daniel R", 
        "id": "sg:person.01003333710.11", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01003333710.11"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of California Los Angeles", 
          "id": "https://www.grid.ac/institutes/grid.19006.3e", 
          "name": [
            "Human Genetics Department, UCLA, Los Angeles, California, USA", 
            "Biostatistics Department, UCLA, Los Angeles, California, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Horvath", 
        "givenName": "Steve", 
        "id": "sg:person.015714446737.06", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015714446737.06"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1093/bioinformatics/btm563", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1000359686"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pone.0006098", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1002734685"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1158/0008-5472.can-10-2465", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1002868184"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/gb-2001-2-11-software0002", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1005374082", 
          "https://doi.org/10.1186/gb-2001-2-11-software0002"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.0605938103", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1005680522"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1755-8794-4-5", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1009194632", 
          "https://doi.org/10.1186/1755-8794-4-5"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.0914257107", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1012227460"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btq097", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013323767"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.2536479100", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017393030"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-9-559", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1020312314", 
          "https://doi.org/10.1186/1471-2105-9-559"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.2202/1544-6115.1128", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1020363278"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pone.0013358", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1022462667"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/ng776", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1024019002", 
          "https://doi.org/10.1038/ng776"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pcbi.1000117", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1024039965"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/ng2119", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1026979537", 
          "https://doi.org/10.1038/ng2119"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.0706128104", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1028608848"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmeth1107-879", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1029609982", 
          "https://doi.org/10.1038/nmeth1107-879"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/bti587", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1030583574"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pgen.1000873", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1030852640"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/gb-2009-10-11-r127", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1031152360", 
          "https://doi.org/10.1186/gb-2009-10-11-r127"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.1832361100", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1031729073"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1752-0509-2-16", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1037049388", 
          "https://doi.org/10.1186/1752-0509-2-16"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/gni179", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1040526112"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2164-10-405", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1042478264", 
          "https://doi.org/10.1186/1471-2164-10-405"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1158/0008-5472.can-09-2183", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1044382312"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nn.2207", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1047969922", 
          "https://doi.org/10.1038/nn.2207"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2164-11-294", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1049811676", 
          "https://doi.org/10.1186/1471-2164-11-294"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btl163", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1052109782"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1752-0509-1-54", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1052159443", 
          "https://doi.org/10.1186/1752-0509-1-54"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2011-12", 
    "datePublishedReg": "2011-12-01", 
    "description": "BACKGROUND: Genomic and other high dimensional analyses often require one to summarize multiple related variables by a single representative. This task is also variously referred to as collapsing, combining, reducing, or aggregating variables. Examples include summarizing several probe measurements corresponding to a single gene, representing the expression profiles of a co-expression module by a single expression profile, and aggregating cell-type marker information to de-convolute expression data. Several standard statistical summary techniques can be used, but network methods also provide useful alternative methods to find representatives. Currently few collapsing functions are developed and widely applied.\nRESULTS: We introduce the R function collapseRows that implements several collapsing methods and evaluate its performance in three applications. First, we study a crucial step of the meta-analysis of microarray data: the merging of independent gene expression data sets, which may have been measured on different platforms. Toward this end, we collapse multiple microarray probes for a single gene and then merge the data by gene identifier. We find that choosing the probe with the highest average expression leads to best between-study consistency. Second, we study methods for summarizing the gene expression profiles of a co-expression module. Several gene co-expression network analysis applications show that the optimal collapsing strategy depends on the analysis goal. Third, we study aggregating the information of cell type marker genes when the aim is to predict the abundance of cell types in a tissue sample based on gene expression data (\"expression deconvolution\"). We apply different collapsing methods to predict cell type abundances in peripheral human blood and in mixtures of blood cell lines. Interestingly, the most accurate prediction method involves choosing the most highly connected \"hub\" marker gene. 
Finally, to facilitate biological interpretation of collapsed gene lists, we introduce the function userListEnrichment, which assesses the enrichment of gene lists for known brain and blood cell type markers, and for other published biological pathways.\nCONCLUSIONS: The R function collapseRows implements several standard and network-based collapsing methods. In various genomic applications we provide evidence that both types of methods are robust and biologically relevant tools.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1186/1471-2105-12-322", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isFundedItemOf": [
      {
        "id": "sg:grant.2447961", 
        "type": "MonetaryGrant"
      }, 
      {
        "id": "sg:grant.2440173", 
        "type": "MonetaryGrant"
      }, 
      {
        "id": "sg:grant.2695773", 
        "type": "MonetaryGrant"
      }, 
      {
        "id": "sg:grant.2436772", 
        "type": "MonetaryGrant"
      }, 
      {
        "id": "sg:grant.2359598", 
        "type": "MonetaryGrant"
      }
    ], 
    "isPartOf": [
      {
        "id": "sg:journal.1023786", 
        "issn": [
          "1471-2105"
        ], 
        "name": "BMC Bioinformatics", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "12"
      }
    ], 
    "name": "Strategies for aggregating gene expression data: The collapseRows R function", 
    "pagination": "322", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "3eb4eca4dd852fde0022efac621e8a64769c38643840759a0724efc42d043d94"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "21816037"
        ]
      }, 
      {
        "name": "nlm_unique_id", 
        "type": "PropertyValue", 
        "value": [
          "100965194"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/1471-2105-12-322"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1013163514"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/1471-2105-12-322", 
      "https://app.dimensions.ai/details/publication/pub.1013163514"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-10T13:13", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8659_00000504.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1186/1471-2105-12-322"
  }
]
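
A record shaped like the JSON-LD above can be read with ordinary JSON tooling. The following is a minimal sketch, assuming the record is available as text; the embedded excerpt is trimmed by hand to two authors and two citations, and the helper names author_names and cited_dois are hypothetical.

```python
import json

# Hand-trimmed excerpt with the same shape as the record above.
record_text = """
[
  {
    "id": "sg:pub.10.1186/1471-2105-12-322",
    "author": [
      {"familyName": "Miller", "givenName": "Jeremy A", "type": "Person"},
      {"familyName": "Horvath", "givenName": "Steve", "type": "Person"}
    ],
    "citation": [
      {"id": "https://doi.org/10.1093/bioinformatics/btm563",
       "type": "CreativeWork"},
      {"id": "sg:pub.10.1186/1471-2105-9-559",
       "sameAs": [
         "https://app.dimensions.ai/details/publication/pub.1020312314",
         "https://doi.org/10.1186/1471-2105-9-559"
       ],
       "type": "CreativeWork"}
    ]
  }
]
"""

DOI_PREFIX = "https://doi.org/"


def author_names(record):
    """Return 'givenName familyName' strings in the order listed."""
    return [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]


def cited_dois(record):
    """Return the DOI of each citation, read from 'id' or 'sameAs'."""
    dois = []
    for entry in record.get("citation", []):
        # The DOI URL appears either as the id or among the sameAs links.
        candidates = [entry.get("id", "")] + entry.get("sameAs", [])
        for url in candidates:
            if url.startswith(DOI_PREFIX):
                dois.append(url[len(DOI_PREFIX):])
                break
    return dois


record = json.loads(record_text)[0]  # the top level is a one-element list
print(author_names(record))
print(cited_dois(record))
```

Citations with an sg:pub identifier carry their DOI only in sameAs, which is why the sketch checks both fields.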
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-322'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-322'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-322'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-322'
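
The four curl commands above differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. This is an illustrative sketch, not an official client: the names ACCEPT_BY_FORMAT, build_request, and fetch are hypothetical, and fetch is defined but not called, so the example makes no network request and assumes nothing about whether the endpoint is currently reachable.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/1471-2105-12-322"

# Media types taken from the curl commands above.
ACCEPT_BY_FORMAT = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request asking for the given serialization."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_BY_FORMAT[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Inspect the request without sending it.
request = build_request("turtle")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Calling fetch("json-ld") would correspond to the first curl command; the returned text could then be passed to json.loads.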


 

This table displays all metadata directly associated with this object as RDF triples.

265 TRIPLES      21 PREDICATES      68 URIs      31 LITERALS      19 BLANK NODES
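
In the table, the schema:author value is a blank node that heads an RDF list: each list node has an rdf:first (the author) and an rdf:rest (the next node), ending at rdf:nil. A minimal sketch of walking that chain to recover author order follows. The triples are copied by hand from the table rows; the tuple representation and the function name walk_rdf_list are assumptions for illustration, not part of any RDF library.

```python
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"

# (subject, predicate, object) tuples copied from the table.
triples = [
    ("N5321e8c4f85e49188f40d9adbfbd38be", RDF_FIRST, "sg:person.0641354601.20"),
    ("N5321e8c4f85e49188f40d9adbfbd38be", RDF_REST, "Nda5b3b18082d4fe19d20f834ff327fa6"),
    ("N98db0b8a46354be4855cadbbf7950a5b", RDF_FIRST, "sg:person.011730155577.61"),
    ("N98db0b8a46354be4855cadbbf7950a5b", RDF_REST, "Ne611d3c9bebd4b0c9a6696f80042804c"),
    ("Nb7dc89af2d0241fbaa21aea6ac5e04e0", RDF_FIRST, "sg:person.01016772265.00"),
    ("Nb7dc89af2d0241fbaa21aea6ac5e04e0", RDF_REST, "N5321e8c4f85e49188f40d9adbfbd38be"),
    ("Nc553779ccf2941b381b73fa6d6e1f13e", RDF_FIRST, "sg:person.015714446737.06"),
    ("Nc553779ccf2941b381b73fa6d6e1f13e", RDF_REST, RDF_NIL),
    ("Nda5b3b18082d4fe19d20f834ff327fa6", RDF_FIRST, "sg:person.01021573403.17"),
    ("Nda5b3b18082d4fe19d20f834ff327fa6", RDF_REST, "N98db0b8a46354be4855cadbbf7950a5b"),
    ("Ne611d3c9bebd4b0c9a6696f80042804c", RDF_FIRST, "sg:person.0773631510.35"),
    ("Ne611d3c9bebd4b0c9a6696f80042804c", RDF_REST, "Nfffcd77be58b45a7a96c86303c964766"),
    ("Nfffcd77be58b45a7a96c86303c964766", RDF_FIRST, "sg:person.01003333710.11"),
    ("Nfffcd77be58b45a7a96c86303c964766", RDF_REST, "Nc553779ccf2941b381b73fa6d6e1f13e"),
]


def walk_rdf_list(head, triples):
    """Follow rdf:first / rdf:rest from head until rdf:nil."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


# The head node is the object of schema:author in the table.
authors_in_order = walk_rdf_list("Nb7dc89af2d0241fbaa21aea6ac5e04e0", triples)
print(authors_in_order)
```

The table sorts rows by subject, so the list nodes appear out of order there; only the rdf:rest links carry the author sequence.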

Subject Predicate Object
1 sg:pub.10.1186/1471-2105-12-322 schema:about N12916ab798f84282858a50f7b6f710bb
2 N2a6df54260cf4085875df49acafed057
3 N41b74e27cef342059e22afe79ddd9b7f
4 N4f591c352e574bf096d8b2074b18e865
5 N5d0276eba6b34f6c830b6735059c4f58
6 N611d8364f68a4e9c8080fc1ecc46a980
7 N74e29237e33a482dbaf19e61b81a51f1
8 N96ec67c38c0444b9aa96128e5604beab
9 Nc1f02e9452064463852f568c1bd4b490
10 Nf026cba7d0104c65b3e5f844be093667
11 anzsrc-for:01
12 anzsrc-for:0104
13 schema:author Nb7dc89af2d0241fbaa21aea6ac5e04e0
14 schema:citation sg:pub.10.1038/ng2119
15 sg:pub.10.1038/ng776
16 sg:pub.10.1038/nmeth1107-879
17 sg:pub.10.1038/nn.2207
18 sg:pub.10.1186/1471-2105-9-559
19 sg:pub.10.1186/1471-2164-10-405
20 sg:pub.10.1186/1471-2164-11-294
21 sg:pub.10.1186/1752-0509-1-54
22 sg:pub.10.1186/1752-0509-2-16
23 sg:pub.10.1186/1755-8794-4-5
24 sg:pub.10.1186/gb-2001-2-11-software0002
25 sg:pub.10.1186/gb-2009-10-11-r127
26 https://doi.org/10.1073/pnas.0605938103
27 https://doi.org/10.1073/pnas.0706128104
28 https://doi.org/10.1073/pnas.0914257107
29 https://doi.org/10.1073/pnas.1832361100
30 https://doi.org/10.1073/pnas.2536479100
31 https://doi.org/10.1093/bioinformatics/bti587
32 https://doi.org/10.1093/bioinformatics/btl163
33 https://doi.org/10.1093/bioinformatics/btm563
34 https://doi.org/10.1093/bioinformatics/btq097
35 https://doi.org/10.1093/nar/gni179
36 https://doi.org/10.1158/0008-5472.can-09-2183
37 https://doi.org/10.1158/0008-5472.can-10-2465
38 https://doi.org/10.1371/journal.pcbi.1000117
39 https://doi.org/10.1371/journal.pgen.1000873
40 https://doi.org/10.1371/journal.pone.0006098
41 https://doi.org/10.1371/journal.pone.0013358
42 https://doi.org/10.2202/1544-6115.1128
43 schema:datePublished 2011-12
44 schema:datePublishedReg 2011-12-01
45 schema:description BACKGROUND: Genomic and other high dimensional analyses often require one to summarize multiple related variables by a single representative. This task is also variously referred to as collapsing, combining, reducing, or aggregating variables. Examples include summarizing several probe measurements corresponding to a single gene, representing the expression profiles of a co-expression module by a single expression profile, and aggregating cell-type marker information to de-convolute expression data. Several standard statistical summary techniques can be used, but network methods also provide useful alternative methods to find representatives. Currently few collapsing functions are developed and widely applied. RESULTS: We introduce the R function collapseRows that implements several collapsing methods and evaluate its performance in three applications. First, we study a crucial step of the meta-analysis of microarray data: the merging of independent gene expression data sets, which may have been measured on different platforms. Toward this end, we collapse multiple microarray probes for a single gene and then merge the data by gene identifier. We find that choosing the probe with the highest average expression leads to best between-study consistency. Second, we study methods for summarizing the gene expression profiles of a co-expression module. Several gene co-expression network analysis applications show that the optimal collapsing strategy depends on the analysis goal. Third, we study aggregating the information of cell type marker genes when the aim is to predict the abundance of cell types in a tissue sample based on gene expression data ("expression deconvolution"). We apply different collapsing methods to predict cell type abundances in peripheral human blood and in mixtures of blood cell lines. Interestingly, the most accurate prediction method involves choosing the most highly connected "hub" marker gene. Finally, to facilitate biological interpretation of collapsed gene lists, we introduce the function userListEnrichment, which assesses the enrichment of gene lists for known brain and blood cell type markers, and for other published biological pathways. CONCLUSIONS: The R function collapseRows implements several standard and network-based collapsing methods. In various genomic applications we provide evidence that both types of methods are robust and biologically relevant tools.
46 schema:genre research_article
47 schema:inLanguage en
48 schema:isAccessibleForFree true
49 schema:isPartOf N02c241214f0d4f3e816388700435fa08
50 N1ecb025be31d4e3790a37ac709d19341
51 sg:journal.1023786
52 schema:name Strategies for aggregating gene expression data: The collapseRows R function
53 schema:pagination 322
54 schema:productId N865eedf07cc848ba90e3e97a9c3844eb
55 Nb885b55ee71a4afaa59d2c412a4714a9
56 Nc8ddab91c18e42ffb4b0f898cc3b637c
57 Nce24d77bb6cc4044b7d50f907face05f
58 Nf39d6d27521e44048e168beea7855ecb
59 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013163514
60 https://doi.org/10.1186/1471-2105-12-322
61 schema:sdDatePublished 2019-04-10T13:13
62 schema:sdLicense https://scigraph.springernature.com/explorer/license/
63 schema:sdPublisher N30a936fd6b6549b89d71a86d273efcc4
64 schema:url http://link.springer.com/10.1186/1471-2105-12-322
65 sgo:license sg:explorer/license/
66 sgo:sdDataset articles
67 rdf:type schema:ScholarlyArticle
68 N02c241214f0d4f3e816388700435fa08 schema:issueNumber 1
69 rdf:type schema:PublicationIssue
70 N12916ab798f84282858a50f7b6f710bb schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
71 schema:name Meta-Analysis as Topic
72 rdf:type schema:DefinedTerm
73 N1ecb025be31d4e3790a37ac709d19341 schema:volumeNumber 12
74 rdf:type schema:PublicationVolume
75 N2a6df54260cf4085875df49acafed057 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
76 schema:name Humans
77 rdf:type schema:DefinedTerm
78 N30a936fd6b6549b89d71a86d273efcc4 schema:name Springer Nature - SN SciGraph project
79 rdf:type schema:Organization
80 N41b74e27cef342059e22afe79ddd9b7f schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
81 schema:name Gene Expression Profiling
82 rdf:type schema:DefinedTerm
83 N4f591c352e574bf096d8b2074b18e865 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
84 schema:name Gene Expression Regulation
85 rdf:type schema:DefinedTerm
86 N5321e8c4f85e49188f40d9adbfbd38be rdf:first sg:person.0641354601.20
87 rdf:rest Nda5b3b18082d4fe19d20f834ff327fa6
88 N5d0276eba6b34f6c830b6735059c4f58 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
89 schema:name Brain
90 rdf:type schema:DefinedTerm
91 N611d8364f68a4e9c8080fc1ecc46a980 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
92 schema:name Mice
93 rdf:type schema:DefinedTerm
94 N74e29237e33a482dbaf19e61b81a51f1 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
95 schema:name Blood
96 rdf:type schema:DefinedTerm
97 N865eedf07cc848ba90e3e97a9c3844eb schema:name nlm_unique_id
98 schema:value 100965194
99 rdf:type schema:PropertyValue
100 N96ec67c38c0444b9aa96128e5604beab schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
101 schema:name Animals
102 rdf:type schema:DefinedTerm
103 N98db0b8a46354be4855cadbbf7950a5b rdf:first sg:person.011730155577.61
104 rdf:rest Ne611d3c9bebd4b0c9a6696f80042804c
105 Nb7dc89af2d0241fbaa21aea6ac5e04e0 rdf:first sg:person.01016772265.00
106 rdf:rest N5321e8c4f85e49188f40d9adbfbd38be
107 Nb885b55ee71a4afaa59d2c412a4714a9 schema:name readcube_id
108 schema:value 3eb4eca4dd852fde0022efac621e8a64769c38643840759a0724efc42d043d94
109 rdf:type schema:PropertyValue
110 Nc1f02e9452064463852f568c1bd4b490 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
111 schema:name Models, Statistical
112 rdf:type schema:DefinedTerm
113 Nc553779ccf2941b381b73fa6d6e1f13e rdf:first sg:person.015714446737.06
114 rdf:rest rdf:nil
115 Nc8ddab91c18e42ffb4b0f898cc3b637c schema:name doi
116 schema:value 10.1186/1471-2105-12-322
117 rdf:type schema:PropertyValue
118 Nce24d77bb6cc4044b7d50f907face05f schema:name pubmed_id
119 schema:value 21816037
120 rdf:type schema:PropertyValue
121 Nda5b3b18082d4fe19d20f834ff327fa6 rdf:first sg:person.01021573403.17
122 rdf:rest N98db0b8a46354be4855cadbbf7950a5b
123 Ne611d3c9bebd4b0c9a6696f80042804c rdf:first sg:person.0773631510.35
124 rdf:rest Nfffcd77be58b45a7a96c86303c964766
125 Nf026cba7d0104c65b3e5f844be093667 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
126 schema:name Oligonucleotide Array Sequence Analysis
127 rdf:type schema:DefinedTerm
128 Nf39d6d27521e44048e168beea7855ecb schema:name dimensions_id
129 schema:value pub.1013163514
130 rdf:type schema:PropertyValue
131 Nfffcd77be58b45a7a96c86303c964766 rdf:first sg:person.01003333710.11
132 rdf:rest Nc553779ccf2941b381b73fa6d6e1f13e
133 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
134 schema:name Mathematical Sciences
135 rdf:type schema:DefinedTerm
136 anzsrc-for:0104 schema:inDefinedTermSet anzsrc-for:
137 schema:name Statistics
138 rdf:type schema:DefinedTerm
139 sg:grant.2359598 http://pending.schema.org/fundedItem sg:pub.10.1186/1471-2105-12-322
140 rdf:type schema:MonetaryGrant
141 sg:grant.2436772 http://pending.schema.org/fundedItem sg:pub.10.1186/1471-2105-12-322
142 rdf:type schema:MonetaryGrant
143 sg:grant.2440173 http://pending.schema.org/fundedItem sg:pub.10.1186/1471-2105-12-322
144 rdf:type schema:MonetaryGrant
145 sg:grant.2447961 http://pending.schema.org/fundedItem sg:pub.10.1186/1471-2105-12-322
146 rdf:type schema:MonetaryGrant
147 sg:grant.2695773 http://pending.schema.org/fundedItem sg:pub.10.1186/1471-2105-12-322
148 rdf:type schema:MonetaryGrant
149 sg:journal.1023786 schema:issn 1471-2105
150 schema:name BMC Bioinformatics
151 rdf:type schema:Periodical
152 sg:person.01003333710.11 schema:affiliation https://www.grid.ac/institutes/grid.214007.0
153 schema:familyName Salomon
154 schema:givenName Daniel R
155 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01003333710.11
156 rdf:type schema:Person
157 sg:person.01016772265.00 schema:affiliation https://www.grid.ac/institutes/grid.19006.3e
158 schema:familyName Miller
159 schema:givenName Jeremy A
160 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01016772265.00
161 rdf:type schema:Person
162 sg:person.01021573403.17 schema:affiliation https://www.grid.ac/institutes/grid.19006.3e
163 schema:familyName Langfelder
164 schema:givenName Peter
165 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01021573403.17
166 rdf:type schema:Person
167 sg:person.011730155577.61 schema:affiliation https://www.grid.ac/institutes/grid.19006.3e
168 schema:familyName Geschwind
169 schema:givenName Daniel H
170 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011730155577.61
171 rdf:type schema:Person
172 sg:person.015714446737.06 schema:affiliation https://www.grid.ac/institutes/grid.19006.3e
173 schema:familyName Horvath
174 schema:givenName Steve
175 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015714446737.06
176 rdf:type schema:Person
177 sg:person.0641354601.20 schema:affiliation https://www.grid.ac/institutes/grid.19006.3e
178 schema:familyName Cai
179 schema:givenName Chaochao
180 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0641354601.20
181 rdf:type schema:Person
182 sg:person.0773631510.35 schema:affiliation https://www.grid.ac/institutes/grid.214007.0
183 schema:familyName Kurian
184 schema:givenName Sunil M
185 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0773631510.35
186 rdf:type schema:Person
187 sg:pub.10.1038/ng2119 schema:sameAs https://app.dimensions.ai/details/publication/pub.1026979537
188 https://doi.org/10.1038/ng2119
189 rdf:type schema:CreativeWork
190 sg:pub.10.1038/ng776 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024019002
191 https://doi.org/10.1038/ng776
192 rdf:type schema:CreativeWork
193 sg:pub.10.1038/nmeth1107-879 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029609982
194 https://doi.org/10.1038/nmeth1107-879
195 rdf:type schema:CreativeWork
196 sg:pub.10.1038/nn.2207 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047969922
197 https://doi.org/10.1038/nn.2207
198 rdf:type schema:CreativeWork
199 sg:pub.10.1186/1471-2105-9-559 schema:sameAs https://app.dimensions.ai/details/publication/pub.1020312314
200 https://doi.org/10.1186/1471-2105-9-559
201 rdf:type schema:CreativeWork
202 sg:pub.10.1186/1471-2164-10-405 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042478264
203 https://doi.org/10.1186/1471-2164-10-405
204 rdf:type schema:CreativeWork
205 sg:pub.10.1186/1471-2164-11-294 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049811676
206 https://doi.org/10.1186/1471-2164-11-294
207 rdf:type schema:CreativeWork
208 sg:pub.10.1186/1752-0509-1-54 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052159443
209 https://doi.org/10.1186/1752-0509-1-54
210 rdf:type schema:CreativeWork
211 sg:pub.10.1186/1752-0509-2-16 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037049388
212 https://doi.org/10.1186/1752-0509-2-16
213 rdf:type schema:CreativeWork
214 sg:pub.10.1186/1755-8794-4-5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009194632
215 https://doi.org/10.1186/1755-8794-4-5
216 rdf:type schema:CreativeWork
217 sg:pub.10.1186/gb-2001-2-11-software0002 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005374082
218 https://doi.org/10.1186/gb-2001-2-11-software0002
219 rdf:type schema:CreativeWork
220 sg:pub.10.1186/gb-2009-10-11-r127 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031152360
221 https://doi.org/10.1186/gb-2009-10-11-r127
222 rdf:type schema:CreativeWork
223 https://doi.org/10.1073/pnas.0605938103 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005680522
224 rdf:type schema:CreativeWork
225 https://doi.org/10.1073/pnas.0706128104 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028608848
226 rdf:type schema:CreativeWork
227 https://doi.org/10.1073/pnas.0914257107 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012227460
228 rdf:type schema:CreativeWork
229 https://doi.org/10.1073/pnas.1832361100 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031729073
230 rdf:type schema:CreativeWork
231 https://doi.org/10.1073/pnas.2536479100 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017393030
232 rdf:type schema:CreativeWork
233 https://doi.org/10.1093/bioinformatics/bti587 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030583574
234 rdf:type schema:CreativeWork
235 https://doi.org/10.1093/bioinformatics/btl163 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052109782
236 rdf:type schema:CreativeWork
237 https://doi.org/10.1093/bioinformatics/btm563 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000359686
238 rdf:type schema:CreativeWork
239 https://doi.org/10.1093/bioinformatics/btq097 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013323767
240 rdf:type schema:CreativeWork
241 https://doi.org/10.1093/nar/gni179 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040526112
242 rdf:type schema:CreativeWork
243 https://doi.org/10.1158/0008-5472.can-09-2183 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044382312
244 rdf:type schema:CreativeWork
245 https://doi.org/10.1158/0008-5472.can-10-2465 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002868184
246 rdf:type schema:CreativeWork
247 https://doi.org/10.1371/journal.pcbi.1000117 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024039965
248 rdf:type schema:CreativeWork
249 https://doi.org/10.1371/journal.pgen.1000873 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030852640
250 rdf:type schema:CreativeWork
251 https://doi.org/10.1371/journal.pone.0006098 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002734685
252 rdf:type schema:CreativeWork
253 https://doi.org/10.1371/journal.pone.0013358 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022462667
254 rdf:type schema:CreativeWork
255 https://doi.org/10.2202/1544-6115.1128 schema:sameAs https://app.dimensions.ai/details/publication/pub.1020363278
256 rdf:type schema:CreativeWork
257 https://www.grid.ac/institutes/grid.19006.3e schema:alternateName University of California Los Angeles
258 schema:name Biostatistics Department, UCLA, Los Angeles, California, USA
259 Human Genetics Department, UCLA, Los Angeles, California, USA
260 Interdepartmental Program for Neuroscience, UCLA, Los Angeles, California, USA
261 Neurology Department, UCLA, Los Angeles, California, USA
262 rdf:type schema:Organization
263 https://www.grid.ac/institutes/grid.214007.0 schema:alternateName Scripps Research Institute
264 schema:name Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, California, USA
265 rdf:type schema:Organization
 



