Computational analysis of microarray data


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2001-06

AUTHORS

John Quackenbush

ABSTRACT

Key Points The completion of the sequencing of a large number of prokaryotic and eukaryotic genomes presents several challenges and opportunities, including the functional classification of predicted genes. Microarray analysis promises to contribute to the functional annotation of genomes and has already provided a wealth of genome-wide expression data. Much attention has been focused on experimental protocols for microarray studies, but the strategies for data analysis have a profound (and perhaps underappreciated) effect on the interpretation of the results. Expression data from each experiment must first be normalized to account for systematic experimental variation, including unequal dye incorporation and detection efficiencies. For comparison between experiments, data is often first filtered to select a subset or to exclude genes for which there is much missing data. A distance metric must then be chosen, which determines how we measure similarity between gene-expression patterns. Genes and experiments can then be grouped using various computational methods. Each step can influence how the expression data are grouped. Clustering algorithms, which are the most widely used approaches to analysing gene expression, can be classified as hierarchical or non-hierarchical (self-organizing maps (SOMs), k-means clustering and principal component analysis), agglomerative (hierarchical) or divisive (k-means, SOMs), and supervised (support vector machine) or non-supervised (hierarchical and k-means clustering, SOMs). A synthetic data set with well-defined relationships between genes is used to show the differences between some of these methods. The choice of data analysis strategy should be influenced by the purpose of the microarray experiment, and the user's knowledge of the biology of the system under investigation.
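
As a rough illustration of the first two steps the abstract lists (normalization, then choosing a distance metric), here is a minimal Python sketch. The total-intensity rescaling and the Pearson-based distance are assumptions picked for brevity, not necessarily the methods the article recommends, and the function names and sample values are hypothetical.

```python
import math


def log_ratios(red, green):
    """Rescale the red channel so both channels have the same total
    intensity, then return log2(red / green) for each gene.

    This is one simple normalization choice (an assumption here).
    """
    scale = sum(green) / sum(red)
    return [math.log2((r * scale) / g) for r, g in zip(red, green)]


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles.

    0 means perfectly correlated, 2 means perfectly anti-correlated.
    Profiles with zero variance would divide by zero and are not handled.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)


# Hypothetical profiles for three genes across three experiments.
gene_a = [1.0, 2.0, 3.0]
gene_b = [2.0, 4.0, 6.0]   # same shape as gene_a, larger amplitude
gene_c = [3.0, 2.0, 1.0]   # mirror image of gene_a

d_ab = pearson_distance(gene_a, gene_b)
d_ac = pearson_distance(gene_a, gene_c)
```

With this metric gene_a and gene_b count as near neighbours despite their different amplitudes, which is the kind of consequence of the metric choice that the abstract says influences the final grouping.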

PAGES

418-427

Identifiers

URI

http://scigraph.springernature.com/pub.10.1038/35076576

DOI

http://dx.doi.org/10.1038/35076576

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1027465054

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/11389458



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Algorithms", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Computational Biology", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "DNA Probes", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Data Collection", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Gene Expression Profiling", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Oligonucleotide Array Sequence Analysis", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "The Institute for Genomic Research, 9,712 Medical Center Drive, 20850, Rockville, Maryland, USA", 
          "id": "http://www.grid.ac/institutes/grid.469946.0", 
          "name": [
            "The Institute for Genomic Research, 9,712 Medical Center Drive, 20850, Rockville, Maryland, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Quackenbush", 
        "givenName": "John", 
        "id": "sg:person.01306176727.55", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01306176727.55"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/1670", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1024602905", 
          "https://doi.org/10.1038/1670"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-97610-0", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1033174751", 
          "https://doi.org/10.1007/978-3-642-97610-0"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/ng0895-369", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1027303331", 
          "https://doi.org/10.1038/ng0895-369"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nbt1296-1675", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1005458398", 
          "https://doi.org/10.1038/nbt1296-1675"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/10343", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1009819816", 
          "https://doi.org/10.1038/10343"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2001-06", 
    "datePublishedReg": "2001-06-01", 
    "description": "Key Points The completion of the sequencing of a large number of prokaryotic and eukaryotic genomes presents several challenges and opportunities, including the functional classification of predicted genes.  Microarray analysis promises to contribute to the functional annotation of genomes and has already provided a wealth of genome-wide expression data.  Much attention has been focused on experimental protocols for microarray studies, but the strategies for data analysis have a profound (and perhaps underappreciated) effect on the interpretation of the results.  Expression data from each experiment must first be normalized to account for systematic experimental variation, including unequal dye incorporation and detection efficiencies.  For comparison between experiments, data is often first filtered to select a subset or to exclude genes for which there is much missing data. A distance metric must then be chosen, which determines how we measure similarity between gene-expression patterns. Genes and experiments can then be grouped using various computational methods. Each step can influence how the expression data are grouped.  Clustering algorithms, which are the most widely used approaches to analysing gene expression, can be classified as hierarchical or non-hierarchical (self-organizing maps (SOMs), k-means clustering and principal component analysis), agglomerative (hierarchical) or divisive (k-means, SOMs), and supervised (support vector machine) or non-supervised (hierarchical and k-means clustering, SOMs).  A synthetic data set with well-defined relationships between genes is used to show the differences between some of these methods.  The choice of data analysis strategy should be influenced by the purpose of the microarray experiment, and the user's knowledge of the biology of the system under investigation.", 
    "genre": "article", 
    "id": "sg:pub.10.1038/35076576", 
    "isAccessibleForFree": false, 
    "isPartOf": [
      {
        "id": "sg:journal.1023607", 
        "issn": [
          "1471-0056", 
          "1471-0064"
        ], 
        "name": "Nature Reviews Genetics", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "6", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "2"
      }
    ], 
    "keywords": [
      "expression data", 
      "genome-wide expression data", 
      "gene expression patterns", 
      "eukaryotic genomes", 
      "functional annotation", 
      "gene expression", 
      "microarray analysis", 
      "microarray experiments", 
      "microarray studies", 
      "microarray data", 
      "genes", 
      "functional classification", 
      "genome", 
      "computational analysis", 
      "systematic experimental variation", 
      "profound effect", 
      "computational methods", 
      "biology", 
      "sequencing", 
      "data analysis strategies", 
      "experimental variation", 
      "annotation", 
      "user knowledge", 
      "expression", 
      "large number", 
      "distance metric", 
      "analysis strategy", 
      "synthetic data", 
      "similarity", 
      "experimental protocol", 
      "analysis", 
      "experiments", 
      "variation", 
      "patterns", 
      "data analysis", 
      "key", 
      "strategies", 
      "knowledge", 
      "subset", 
      "data", 
      "incorporation", 
      "algorithm", 
      "dye incorporation", 
      "metrics", 
      "step", 
      "classification", 
      "number", 
      "detection efficiency", 
      "wealth", 
      "relationship", 
      "protocol", 
      "effect", 
      "study", 
      "differences", 
      "method", 
      "challenges", 
      "system", 
      "comparison", 
      "results", 
      "opportunities", 
      "efficiency", 
      "investigation", 
      "approach", 
      "completion", 
      "attention", 
      "purpose", 
      "interpretation", 
      "choice"
    ], 
    "name": "Computational analysis of microarray data", 
    "pagination": "418-427", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1027465054"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1038/35076576"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "11389458"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1038/35076576", 
      "https://app.dimensions.ai/details/publication/pub.1027465054"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-11-24T20:49", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20221124/entities/gbq_results/article/article_314.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1038/35076576"
  }
]
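
A record like the one above can be read with any JSON parser. The following Python sketch works on a trimmed copy of the record (only a few of its fields are reproduced) and pulls out the title, authors, and identifiers. The `summarize` helper is hypothetical and is not part of any SciGraph tooling.

```python
import json

# Trimmed copy of the JSON-LD record shown above (subset of fields only).
record_json = """
[
  {
    "id": "sg:pub.10.1038/35076576",
    "name": "Computational analysis of microarray data",
    "datePublished": "2001-06",
    "pagination": "418-427",
    "author": [
      {"familyName": "Quackenbush", "givenName": "John", "type": "Person"}
    ],
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1027465054"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1038/35076576"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["11389458"]}
    ]
  }
]
"""


def summarize(record):
    """Collect a few human-readable fields from one record dictionary."""
    # Each productId entry stores its value as a one-element list.
    identifiers = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    authors = [
        f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])
    ]
    return {
        "title": record["name"],
        "authors": authors,
        "published": record.get("datePublished"),
        "doi": identifiers.get("doi"),
        "pubmed_id": identifiers.get("pubmed_id"),
    }


# The top level of the document is a list holding a single record.
summary = summarize(json.loads(record_json)[0])
```

This treats the record as plain JSON and ignores the `@context`; resolving terms against the context would need a JSON-LD processor.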
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular linked data format that is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1038/35076576'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1038/35076576'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1038/35076576'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1038/35076576'
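
The four curl calls differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. The short format keys, `build_request`, and `fetch` are hypothetical names introduced for illustration. The sketch does not check whether the endpoint still responds.

```python
from urllib.request import Request, urlopen

RECORD_URL = "https://scigraph.springernature.com/pub.10.1038/35076576"

# Media types taken from the curl examples; the short keys are
# illustrative labels, not part of the service.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Return a Request whose Accept header asks for the given format."""
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text.

    Requires network access; errors are left to propagate to the caller.
    """
    with urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building a request does not touch the network.
turtle_request = build_request("turtle")
```

Calling `fetch("turtle")` would then be the equivalent of the third curl command, assuming the server honours the header.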


 

This table displays all metadata directly associated with this object as RDF triples.

173 TRIPLES      21 PREDICATES      105 URIs      92 LITERALS      13 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1038/35076576 schema:about N29500ba389694f97a9e09acf64bf89bf
2 N62e0fcdb13f74e17aed484ea5129cbc0
3 N78bd0af884ff4f2ca230a68908553736
4 Nb2e8b6ac213c4a22b0655a5d870cfb77
5 Nd269207e1b60488c8124508290e32d3b
6 Ne711332b3b4f4e1faea5ddd2f2b22b65
7 anzsrc-for:06
8 anzsrc-for:0604
9 schema:author N3dea1944732a417d8d4baa47e7c66f72
10 schema:citation sg:pub.10.1007/978-3-642-97610-0
11 sg:pub.10.1038/10343
12 sg:pub.10.1038/1670
13 sg:pub.10.1038/nbt1296-1675
14 sg:pub.10.1038/ng0895-369
15 schema:datePublished 2001-06
16 schema:datePublishedReg 2001-06-01
17 schema:description Key Points The completion of the sequencing of a large number of prokaryotic and eukaryotic genomes presents several challenges and opportunities, including the functional classification of predicted genes. Microarray analysis promises to contribute to the functional annotation of genomes and has already provided a wealth of genome-wide expression data. Much attention has been focused on experimental protocols for microarray studies, but the strategies for data analysis have a profound (and perhaps underappreciated) effect on the interpretation of the results. Expression data from each experiment must first be normalized to account for systematic experimental variation, including unequal dye incorporation and detection efficiencies. For comparison between experiments, data is often first filtered to select a subset or to exclude genes for which there is much missing data. A distance metric must then be chosen, which determines how we measure similarity between gene-expression patterns. Genes and experiments can then be grouped using various computational methods. Each step can influence how the expression data are grouped. Clustering algorithms, which are the most widely used approaches to analysing gene expression, can be classified as hierarchical or non-hierarchical (self-organizing maps (SOMs), k-means clustering and principal component analysis), agglomerative (hierarchical) or divisive (k-means, SOMs), and supervised (support vector machine) or non-supervised (hierarchical and k-means clustering, SOMs). A synthetic data set with well-defined relationships between genes is used to show the differences between some of these methods. The choice of data analysis strategy should be influenced by the purpose of the microarray experiment, and the user's knowledge of the biology of the system under investigation.
18 schema:genre article
19 schema:isAccessibleForFree false
20 schema:isPartOf N955aa5fdb44c413cad272b992a7f018d
21 Nf83568779c964452b95653aa2f7d1383
22 sg:journal.1023607
23 schema:keywords algorithm
24 analysis
25 analysis strategy
26 annotation
27 approach
28 attention
29 biology
30 challenges
31 choice
32 classification
33 comparison
34 completion
35 computational analysis
36 computational methods
37 data
38 data analysis
39 data analysis strategies
40 detection efficiency
41 differences
42 distance metric
43 dye incorporation
44 effect
45 efficiency
46 eukaryotic genomes
47 experimental protocol
48 experimental variation
49 experiments
50 expression
51 expression data
52 functional annotation
53 functional classification
54 gene expression
55 gene expression patterns
56 genes
57 genome
58 genome-wide expression data
59 incorporation
60 interpretation
61 investigation
62 key
63 knowledge
64 large number
65 method
66 metrics
67 microarray analysis
68 microarray data
69 microarray experiments
70 microarray studies
71 number
72 opportunities
73 patterns
74 profound effect
75 protocol
76 purpose
77 relationship
78 results
79 sequencing
80 similarity
81 step
82 strategies
83 study
84 subset
85 synthetic data
86 system
87 systematic experimental variation
88 user knowledge
89 variation
90 wealth
91 schema:name Computational analysis of microarray data
92 schema:pagination 418-427
93 schema:productId N0c14e03e9a1a44759d69ed0d59b3b664
94 Nbca85d48f77343d99bb97c1b5a0870df
95 Nd8c9aacce0f44641903416de07df8cce
96 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027465054
97 https://doi.org/10.1038/35076576
98 schema:sdDatePublished 2022-11-24T20:49
99 schema:sdLicense https://scigraph.springernature.com/explorer/license/
100 schema:sdPublisher N5583cf0fea554906be0e5e4c84aed0e5
101 schema:url https://doi.org/10.1038/35076576
102 sgo:license sg:explorer/license/
103 sgo:sdDataset articles
104 rdf:type schema:ScholarlyArticle
105 N0c14e03e9a1a44759d69ed0d59b3b664 schema:name dimensions_id
106 schema:value pub.1027465054
107 rdf:type schema:PropertyValue
108 N29500ba389694f97a9e09acf64bf89bf schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
109 schema:name Data Collection
110 rdf:type schema:DefinedTerm
111 N3dea1944732a417d8d4baa47e7c66f72 rdf:first sg:person.01306176727.55
112 rdf:rest rdf:nil
113 N5583cf0fea554906be0e5e4c84aed0e5 schema:name Springer Nature - SN SciGraph project
114 rdf:type schema:Organization
115 N62e0fcdb13f74e17aed484ea5129cbc0 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
116 schema:name Gene Expression Profiling
117 rdf:type schema:DefinedTerm
118 N78bd0af884ff4f2ca230a68908553736 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
119 schema:name DNA Probes
120 rdf:type schema:DefinedTerm
121 N955aa5fdb44c413cad272b992a7f018d schema:issueNumber 6
122 rdf:type schema:PublicationIssue
123 Nb2e8b6ac213c4a22b0655a5d870cfb77 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
124 schema:name Oligonucleotide Array Sequence Analysis
125 rdf:type schema:DefinedTerm
126 Nbca85d48f77343d99bb97c1b5a0870df schema:name doi
127 schema:value 10.1038/35076576
128 rdf:type schema:PropertyValue
129 Nd269207e1b60488c8124508290e32d3b schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
130 schema:name Algorithms
131 rdf:type schema:DefinedTerm
132 Nd8c9aacce0f44641903416de07df8cce schema:name pubmed_id
133 schema:value 11389458
134 rdf:type schema:PropertyValue
135 Ne711332b3b4f4e1faea5ddd2f2b22b65 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
136 schema:name Computational Biology
137 rdf:type schema:DefinedTerm
138 Nf83568779c964452b95653aa2f7d1383 schema:volumeNumber 2
139 rdf:type schema:PublicationVolume
140 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
141 schema:name Biological Sciences
142 rdf:type schema:DefinedTerm
143 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
144 schema:name Genetics
145 rdf:type schema:DefinedTerm
146 sg:journal.1023607 schema:issn 1471-0056
147 1471-0064
148 schema:name Nature Reviews Genetics
149 schema:publisher Springer Nature
150 rdf:type schema:Periodical
151 sg:person.01306176727.55 schema:affiliation grid-institutes:grid.469946.0
152 schema:familyName Quackenbush
153 schema:givenName John
154 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01306176727.55
155 rdf:type schema:Person
156 sg:pub.10.1007/978-3-642-97610-0 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033174751
157 https://doi.org/10.1007/978-3-642-97610-0
158 rdf:type schema:CreativeWork
159 sg:pub.10.1038/10343 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009819816
160 https://doi.org/10.1038/10343
161 rdf:type schema:CreativeWork
162 sg:pub.10.1038/1670 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024602905
163 https://doi.org/10.1038/1670
164 rdf:type schema:CreativeWork
165 sg:pub.10.1038/nbt1296-1675 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005458398
166 https://doi.org/10.1038/nbt1296-1675
167 rdf:type schema:CreativeWork
168 sg:pub.10.1038/ng0895-369 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027303331
169 https://doi.org/10.1038/ng0895-369
170 rdf:type schema:CreativeWork
171 grid-institutes:grid.469946.0 schema:alternateName The Institute for Genomic Research, 9,712 Medical Center Drive, 20850, Rockville, Maryland, USA
172 schema:name The Institute for Genomic Research, 9,712 Medical Center Drive, 20850, Rockville, Maryland, USA
173 rdf:type schema:Organization
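
The table above leaves the subject, or the subject and predicate, blank when they repeat from the previous row. A minimal Python sketch of how such rows can be expanded back into full triples follows. The rows are a small hand-picked sample from the table, with `None` marking an omitted cell, and `fill_forward` is a hypothetical helper.

```python
def fill_forward(rows):
    """Expand (subject, predicate, object) rows in which a repeated
    subject or predicate was left out (None) by carrying forward the
    most recently seen value."""
    subject = predicate = None
    triples = []
    for s, p, o in rows:
        subject = s if s is not None else subject
        predicate = p if p is not None else predicate
        triples.append((subject, predicate, o))
    return triples


# Sample rows copied from the table; None stands for a blank cell.
rows = [
    ("sg:pub.10.1038/35076576", "schema:about", "anzsrc-for:06"),
    (None, None, "anzsrc-for:0604"),
    (None, "schema:datePublished", "2001-06"),
    (None, "schema:genre", "article"),
    ("sg:journal.1023607", "schema:issn", "1471-0056"),
    (None, None, "1471-0064"),
    (None, "schema:name", "Nature Reviews Genetics"),
]

triples = fill_forward(rows)
predicates = {p for _, p, _ in triples}
```

After expansion every row is a complete triple, so counts such as the number of distinct predicates can be computed directly.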
 



