Computational analysis of microarray data


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2001-06

AUTHORS

John Quackenbush

ABSTRACT

Key Points: The completion of the sequencing of a large number of prokaryotic and eukaryotic genomes presents several challenges and opportunities, including the functional classification of predicted genes. Microarray analysis promises to contribute to the functional annotation of genomes and has already provided a wealth of genome-wide expression data. Much attention has been focused on experimental protocols for microarray studies, but the strategies for data analysis have a profound (and perhaps underappreciated) effect on the interpretation of the results. Expression data from each experiment must first be normalized to account for systematic experimental variation, including unequal dye incorporation and detection efficiencies. For comparison between experiments, data is often first filtered to select a subset or to exclude genes for which there is much missing data. A distance metric must then be chosen, which determines how we measure similarity between gene-expression patterns. Genes and experiments can then be grouped using various computational methods. Each step can influence how the expression data are grouped. Clustering algorithms, which are the most widely used approaches to analysing gene expression, can be classified as hierarchical or non-hierarchical (self-organizing maps (SOMs), k-means clustering and principal component analysis), agglomerative (hierarchical) or divisive (k-means, SOMs), and supervised (support vector machine) or non-supervised (hierarchical and k-means clustering, SOMs). A synthetic data set with well-defined relationships between genes is used to show the differences between some of these methods. The choice of data analysis strategy should be influenced by the purpose of the microarray experiment, and the user's knowledge of the biology of the system under investigation.
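
The sequence of steps named in the abstract (normalize, choose a distance metric, then group genes) can be illustrated with a small sketch. It is not taken from the article: median centring of log2 ratios, Pearson correlation distance and average-linkage agglomerative clustering are assumed here as one plausible choice at each step, and the four synthetic gene profiles are invented for illustration.

```python
import math
from statistics import median


def log_ratios(red, green):
    """Per-gene log2(red/green) for one two-colour hybridization."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def median_center(values):
    """Assumed global normalization: shift log ratios so their median is zero."""
    m = median(values)
    return [v - m for v in values]


def pearson_distance(x, y):
    """1 - Pearson correlation; undefined (division by zero) for flat profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[name] for name in profiles]

    def dist(c1, c2):
        # Average pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(pearson_distance(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        candidates = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(candidates, key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical synthetic profiles: two rising genes and two falling genes.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 4.0, 2.2],
}
clusters = average_linkage(profiles, 2)
```

Swapping `pearson_distance` for another metric, or changing the linkage rule inside `dist`, can change which genes end up together, which is the sensitivity to analysis choices that the abstract points out.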

PAGES

418-427

Identifiers

URI

http://scigraph.springernature.com/pub.10.1038/35076576

DOI

http://dx.doi.org/10.1038/35076576

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1027465054

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/11389458



JSON-LD is the canonical representation for SciGraph data.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Algorithms", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Computational Biology", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "DNA Probes", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Data Collection", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Gene Expression Profiling", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Oligonucleotide Array Sequence Analysis", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "The Institute for Genomic Research, 9,712 Medical Center Drive, 20850, Rockville, Maryland, USA", 
          "id": "http://www.grid.ac/institutes/grid.469946.0", 
          "name": [
            "The Institute for Genomic Research, 9,712 Medical Center Drive, 20850, Rockville, Maryland, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Quackenbush", 
        "givenName": "John", 
        "id": "sg:person.01306176727.55", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01306176727.55"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/1670", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1024602905", 
          "https://doi.org/10.1038/1670"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/10343", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1009819816", 
          "https://doi.org/10.1038/10343"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nbt1296-1675", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1005458398", 
          "https://doi.org/10.1038/nbt1296-1675"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-97610-0", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1033174751", 
          "https://doi.org/10.1007/978-3-642-97610-0"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/ng0895-369", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1027303331", 
          "https://doi.org/10.1038/ng0895-369"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2001-06", 
    "datePublishedReg": "2001-06-01", 
    "description": "Key Points The completion of the sequencing of a large number of prokaryotic and eukaryotic genomes presents several challenges and opportunities, including the functional classification of predicted genes.  Microarray analysis promises to contribute to the functional annotation of genomes and has already provided a wealth of genome-wide expression data.  Much attention has been focused on experimental protocols for microarray studies, but the strategies for data analysis have a profound (and perhaps underappreciated) effect on the interpretation of the results.  Expression data from each experiment must first be normalized to account for systematic experimental variation, including unequal dye incorporation and detection efficiencies.  For comparison between experiments, data is often first filtered to select a subset or to exclude genes for which there is much missing data. A distance metric must then be chosen, which determines how we measure similarity between gene-expression patterns. Genes and experiments can then be grouped using various computational methods. Each step can influence how the expression data are grouped.  Clustering algorithms, which are the most widely used approaches to analysing gene expression, can be classified as hierarchical or non-hierarchical (self-organizing maps (SOMs), k-means clustering and principal component analysis), agglomerative (hierarchical) or divisive (k-means, SOMs), and supervised (support vector machine) or non-supervised (hierarchical and k-means clustering, SOMs).  A synthetic data set with well-defined relationships between genes is used to show the differences between some of these methods.  The choice of data analysis strategy should be influenced by the purpose of the microarray experiment, and the user's knowledge of the biology of the system under investigation.", 
    "genre": "article", 
    "id": "sg:pub.10.1038/35076576", 
    "inLanguage": "en", 
    "isAccessibleForFree": false, 
    "isPartOf": [
      {
        "id": "sg:journal.1023607", 
        "issn": [
          "1471-0056", 
          "1471-0064"
        ], 
        "name": "Nature Reviews Genetics", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "6", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "2"
      }
    ], 
    "keywords": [
      "systematic experimental variation", 
      "expression data", 
      "genome-wide expression data", 
      "computational methods", 
      "gene-expression patterns", 
      "synthetic data", 
      "microarray experiments", 
      "data analysis strategies", 
      "microarray data", 
      "eukaryotic genomes", 
      "functional annotation", 
      "distance metric", 
      "gene expression", 
      "microarray analysis", 
      "genes", 
      "microarray studies", 
      "data analysis", 
      "functional classification", 
      "genome", 
      "analysis strategy", 
      "large number", 
      "computational analysis", 
      "algorithm", 
      "profound effect", 
      "metrics", 
      "sequencing", 
      "biology", 
      "approach", 
      "annotation", 
      "experimental variation", 
      "expression", 
      "system", 
      "analysis", 
      "data", 
      "similarity", 
      "experiments", 
      "subset", 
      "choice", 
      "efficiency", 
      "strategies", 
      "number", 
      "step", 
      "experimental protocol", 
      "interpretation", 
      "variation", 
      "patterns", 
      "results", 
      "classification", 
      "knowledge", 
      "comparison", 
      "incorporation", 
      "key", 
      "purpose", 
      "wealth", 
      "attention", 
      "dye incorporation", 
      "relationship", 
      "challenges", 
      "study", 
      "differences", 
      "effect", 
      "user knowledge", 
      "opportunities", 
      "completion", 
      "protocol", 
      "investigation", 
      "detection efficiency", 
      "method", 
      "unequal dye incorporation"
    ], 
    "name": "Computational analysis of microarray data", 
    "pagination": "418-427", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1027465054"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1038/35076576"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "11389458"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1038/35076576", 
      "https://app.dimensions.ai/details/publication/pub.1027465054"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2021-12-01T19:12", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20211201/entities/gbq_results/article/article_326.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1038/35076576"
  }
]
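
As a minimal sketch of how a JSON-LD record like the one above might be consumed, the Python below parses a shortened copy of the record with the standard `json` module and pulls out a few citation fields. The excerpt keeps only some keys from the record, and the `summarize` helper and its output keys are hypothetical names chosen for illustration.

```python
import json

# Shortened excerpt of the record above (most keys omitted for brevity).
record_json = """
[
  {
    "id": "sg:pub.10.1038/35076576",
    "name": "Computational analysis of microarray data",
    "datePublished": "2001-06",
    "pagination": "418-427",
    "author": [
      {"familyName": "Quackenbush", "givenName": "John", "type": "Person"}
    ],
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1027465054"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1038/35076576"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["11389458"]}
    ],
    "isPartOf": [
      {"id": "sg:journal.1023607", "name": "Nature Reviews Genetics", "type": "Periodical"},
      {"issueNumber": "6", "type": "PublicationIssue"},
      {"type": "PublicationVolume", "volumeNumber": "2"}
    ]
  }
]
"""


def summarize(record):
    """Collect basic citation fields from one SciGraph-style record (hypothetical helper)."""
    # Index identifiers by their "name" and container parts by their "type".
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    parts = {p["type"]: p for p in record.get("isPartOf", [])}
    return {
        "title": record["name"],
        "authors": [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])],
        "journal": parts.get("Periodical", {}).get("name"),
        "volume": parts.get("PublicationVolume", {}).get("volumeNumber"),
        "issue": parts.get("PublicationIssue", {}).get("issueNumber"),
        "pages": record.get("pagination"),
        "doi": ids.get("doi"),
        "pubmed_id": ids.get("pubmed_id"),
    }


# The top-level JSON value is a list holding a single record.
summary = summarize(json.loads(record_json)[0])
```

The sketch treats the record as plain JSON; resolving the `@context` with a dedicated JSON-LD processor would be needed for full linked-data semantics.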
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1038/35076576'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1038/35076576'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1038/35076576'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1038/35076576'
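
The same content negotiation can be sketched in Python with the standard library. The URL and the four `Accept` values are the ones from the curl commands above; the short format labels, the `build_request` and `fetch` helpers and the 30-second timeout are assumptions made for illustration, and whether the endpoint still answers is not something this record confirms.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1038/35076576"

# Media types copied from the curl examples; the dictionary keys are made-up labels.
ACCEPT = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Prepare (but do not send) a GET request asking for one RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch("turtle")` would then correspond to the third curl command; keeping `build_request` separate makes the header logic checkable without touching the network.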


 

This table displays all metadata directly associated with this object as RDF triples.

175 TRIPLES      22 PREDICATES      107 URIs      94 LITERALS      13 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1038/35076576 schema:about N0842b87bc43744718fce05d362965354
2 N0cc4f2d08b194542a6c3292831b3314f
3 N52b69a4be0a84cf78a4a8250d025d5a0
4 N83c2aadae48447c8b4436b776d784306
5 Nf39eda4f8c6243c09e73097987cd100d
6 Nf6106208da5a4c6ca933e10c40a301e5
7 anzsrc-for:06
8 anzsrc-for:0604
9 schema:author N364689e4e7094dac8ec76ac666f9dd3d
10 schema:citation sg:pub.10.1007/978-3-642-97610-0
11 sg:pub.10.1038/10343
12 sg:pub.10.1038/1670
13 sg:pub.10.1038/nbt1296-1675
14 sg:pub.10.1038/ng0895-369
15 schema:datePublished 2001-06
16 schema:datePublishedReg 2001-06-01
17 schema:description Key Points The completion of the sequencing of a large number of prokaryotic and eukaryotic genomes presents several challenges and opportunities, including the functional classification of predicted genes. Microarray analysis promises to contribute to the functional annotation of genomes and has already provided a wealth of genome-wide expression data. Much attention has been focused on experimental protocols for microarray studies, but the strategies for data analysis have a profound (and perhaps underappreciated) effect on the interpretation of the results. Expression data from each experiment must first be normalized to account for systematic experimental variation, including unequal dye incorporation and detection efficiencies. For comparison between experiments, data is often first filtered to select a subset or to exclude genes for which there is much missing data. A distance metric must then be chosen, which determines how we measure similarity between gene-expression patterns. Genes and experiments can then be grouped using various computational methods. Each step can influence how the expression data are grouped. Clustering algorithms, which are the most widely used approaches to analysing gene expression, can be classified as hierarchical or non-hierarchical (self-organizing maps (SOMs), k-means clustering and principal component analysis), agglomerative (hierarchical) or divisive (k-means, SOMs), and supervised (support vector machine) or non-supervised (hierarchical and k-means clustering, SOMs). A synthetic data set with well-defined relationships between genes is used to show the differences between some of these methods. The choice of data analysis strategy should be influenced by the purpose of the microarray experiment, and the user's knowledge of the biology of the system under investigation.
18 schema:genre article
19 schema:inLanguage en
20 schema:isAccessibleForFree false
21 schema:isPartOf N2c8c099ea5e146f2975377301d7ed5d6
22 N4b045fd7a7fa4bf18bdc91c876a1d1d6
23 sg:journal.1023607
24 schema:keywords algorithm
25 analysis
26 analysis strategy
27 annotation
28 approach
29 attention
30 biology
31 challenges
32 choice
33 classification
34 comparison
35 completion
36 computational analysis
37 computational methods
38 data
39 data analysis
40 data analysis strategies
41 detection efficiency
42 differences
43 distance metric
44 dye incorporation
45 effect
46 efficiency
47 eukaryotic genomes
48 experimental protocol
49 experimental variation
50 experiments
51 expression
52 expression data
53 functional annotation
54 functional classification
55 gene expression
56 gene-expression patterns
57 genes
58 genome
59 genome-wide expression data
60 incorporation
61 interpretation
62 investigation
63 key
64 knowledge
65 large number
66 method
67 metrics
68 microarray analysis
69 microarray data
70 microarray experiments
71 microarray studies
72 number
73 opportunities
74 patterns
75 profound effect
76 protocol
77 purpose
78 relationship
79 results
80 sequencing
81 similarity
82 step
83 strategies
84 study
85 subset
86 synthetic data
87 system
88 systematic experimental variation
89 unequal dye incorporation
90 user knowledge
91 variation
92 wealth
93 schema:name Computational analysis of microarray data
94 schema:pagination 418-427
95 schema:productId N148d8e05fc9141b18c723a1e56d13680
96 N23f6a1f575e84d91a0e8a8875214d993
97 N5519707375654fca92b9f038442b86b9
98 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027465054
99 https://doi.org/10.1038/35076576
100 schema:sdDatePublished 2021-12-01T19:12
101 schema:sdLicense https://scigraph.springernature.com/explorer/license/
102 schema:sdPublisher N365e79f008514c7ea037d8fd29790eef
103 schema:url https://doi.org/10.1038/35076576
104 sgo:license sg:explorer/license/
105 sgo:sdDataset articles
106 rdf:type schema:ScholarlyArticle
107 N0842b87bc43744718fce05d362965354 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
108 schema:name Computational Biology
109 rdf:type schema:DefinedTerm
110 N0cc4f2d08b194542a6c3292831b3314f schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
111 schema:name Data Collection
112 rdf:type schema:DefinedTerm
113 N148d8e05fc9141b18c723a1e56d13680 schema:name dimensions_id
114 schema:value pub.1027465054
115 rdf:type schema:PropertyValue
116 N23f6a1f575e84d91a0e8a8875214d993 schema:name pubmed_id
117 schema:value 11389458
118 rdf:type schema:PropertyValue
119 N2c8c099ea5e146f2975377301d7ed5d6 schema:issueNumber 6
120 rdf:type schema:PublicationIssue
121 N364689e4e7094dac8ec76ac666f9dd3d rdf:first sg:person.01306176727.55
122 rdf:rest rdf:nil
123 N365e79f008514c7ea037d8fd29790eef schema:name Springer Nature - SN SciGraph project
124 rdf:type schema:Organization
125 N4b045fd7a7fa4bf18bdc91c876a1d1d6 schema:volumeNumber 2
126 rdf:type schema:PublicationVolume
127 N52b69a4be0a84cf78a4a8250d025d5a0 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
128 schema:name DNA Probes
129 rdf:type schema:DefinedTerm
130 N5519707375654fca92b9f038442b86b9 schema:name doi
131 schema:value 10.1038/35076576
132 rdf:type schema:PropertyValue
133 N83c2aadae48447c8b4436b776d784306 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
134 schema:name Algorithms
135 rdf:type schema:DefinedTerm
136 Nf39eda4f8c6243c09e73097987cd100d schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
137 schema:name Gene Expression Profiling
138 rdf:type schema:DefinedTerm
139 Nf6106208da5a4c6ca933e10c40a301e5 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
140 schema:name Oligonucleotide Array Sequence Analysis
141 rdf:type schema:DefinedTerm
142 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
143 schema:name Biological Sciences
144 rdf:type schema:DefinedTerm
145 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
146 schema:name Genetics
147 rdf:type schema:DefinedTerm
148 sg:journal.1023607 schema:issn 1471-0056
149 1471-0064
150 schema:name Nature Reviews Genetics
151 schema:publisher Springer Nature
152 rdf:type schema:Periodical
153 sg:person.01306176727.55 schema:affiliation grid-institutes:grid.469946.0
154 schema:familyName Quackenbush
155 schema:givenName John
156 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01306176727.55
157 rdf:type schema:Person
158 sg:pub.10.1007/978-3-642-97610-0 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033174751
159 https://doi.org/10.1007/978-3-642-97610-0
160 rdf:type schema:CreativeWork
161 sg:pub.10.1038/10343 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009819816
162 https://doi.org/10.1038/10343
163 rdf:type schema:CreativeWork
164 sg:pub.10.1038/1670 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024602905
165 https://doi.org/10.1038/1670
166 rdf:type schema:CreativeWork
167 sg:pub.10.1038/nbt1296-1675 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005458398
168 https://doi.org/10.1038/nbt1296-1675
169 rdf:type schema:CreativeWork
170 sg:pub.10.1038/ng0895-369 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027303331
171 https://doi.org/10.1038/ng0895-369
172 rdf:type schema:CreativeWork
173 grid-institutes:grid.469946.0 schema:alternateName The Institute for Genomic Research, 9,712 Medical Center Drive, 20850, Rockville, Maryland, USA
174 schema:name The Institute for Genomic Research, 9,712 Medical Center Drive, 20850, Rockville, Maryland, USA
175 rdf:type schema:Organization
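
Summary counts like those in the header of this table (triples, predicates, URIs, literals, blank nodes) could be computed from an N-Triples export along the following lines. This is only a sketch: the record does not say how SciGraph derives its figures, so counting distinct terms here is an assumption, the regular expression covers only simple N-Triples statements, and the five sample statements use assumed full IRIs and an invented blank-node label.

```python
import re

# Simplified pattern for one N-Triples statement: subject, predicate, object, final dot.
TRIPLE = re.compile(r'''
    ^\s*
    (?P<s><[^>]*>|_:\S+)\s+
    (?P<p><[^>]*>)\s+
    (?P<o><[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)
    \s*\.\s*$
''', re.VERBOSE)


def count_terms(ntriples):
    """Count statements and distinct predicates, IRIs, literals and blank nodes."""
    triples = 0
    predicates, uris, literals, blanks = set(), set(), set(), set()
    for line in ntriples.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        m = TRIPLE.match(line)
        if m is None:
            raise ValueError(f"not an N-Triples statement: {line!r}")
        triples += 1
        predicates.add(m["p"])
        for term in m.group("s", "p", "o"):
            if term.startswith("<"):
                uris.add(term)
            elif term.startswith("_:"):
                blanks.add(term)
            else:
                literals.add(term)
    return {
        "triples": triples,
        "predicates": len(predicates),
        "uris": len(uris),
        "literals": len(literals),
        "blank_nodes": len(blanks),
    }


# Illustrative sample only; IRIs and the blank-node label _:b0 are assumptions.
sample = """
<http://scigraph.springernature.com/pub.10.1038/35076576> <http://schema.org/name> "Computational analysis of microarray data" .
<http://scigraph.springernature.com/pub.10.1038/35076576> <http://schema.org/pagination> "418-427" .
<http://scigraph.springernature.com/pub.10.1038/35076576> <http://schema.org/productId> _:b0 .
_:b0 <http://schema.org/name> "doi" .
_:b0 <http://schema.org/value> "10.1038/35076576" .
"""
counts = count_terms(sample)
```

For real data a dedicated RDF parser would be a safer choice than a regular expression, since it handles escapes and edge cases that this pattern ignores.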
 



