Semi-Supervised Learning on a Budget: Scaling Up to Large Datasets


Ontology type: schema:Chapter     


Chapter Info

DATE

2013

AUTHORS

Sandra Ebert, Mario Fritz, Bernt Schiele

ABSTRACT

Internet data sources provide us with large image datasets which are mostly without any explicit labeling. This setting is ideal for semi-supervised learning which seeks to exploit labeled data as well as a large pool of unlabeled data points to improve learning and classification. While we have made considerable progress on the theory and algorithms, we have seen limited success to translate such progress to the large scale datasets which these methods are inspired by. We investigate the computational complexity of popular graph-based semi-supervised learning algorithms together with different possible speed-ups. Our findings lead to a new algorithm that scales up to 40 times larger datasets in comparison to previous approaches and even increases the classification performance. Our method is based on the key insights that by employing a density-based measure unlabeled data points can be selected similar to an active learning scheme. This leads to a compact graph resulting in an improved performance up to 11.6% at reduced computational costs.

PAGES

232-245

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-642-37331-2_18

DOI

http://dx.doi.org/10.1007/978-3-642-37331-2_18

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1004721797



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0802", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Computation Theory and Mathematics", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Max Planck Institute for Informatics, Saarbrucken, Germany", 
          "id": "http://www.grid.ac/institutes/grid.419528.3", 
          "name": [
            "Max Planck Institute for Informatics, Saarbrucken, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Ebert", 
        "givenName": "Sandra", 
        "id": "sg:person.011333635343.49", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011333635343.49"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Max Planck Institute for Informatics, Saarbrucken, Germany", 
          "id": "http://www.grid.ac/institutes/grid.419528.3", 
          "name": [
            "Max Planck Institute for Informatics, Saarbrucken, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Fritz", 
        "givenName": "Mario", 
        "id": "sg:person.013361072755.17", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013361072755.17"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Max Planck Institute for Informatics, Saarbrucken, Germany", 
          "id": "http://www.grid.ac/institutes/grid.419528.3", 
          "name": [
            "Max Planck Institute for Informatics, Saarbrucken, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Schiele", 
        "givenName": "Bernt", 
        "id": "sg:person.01174260421.90", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01174260421.90"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2013", 
    "datePublishedReg": "2013-01-01", 
    "description": "Internet data sources provide us with large image datasets which are mostly without any explicit labeling. This setting is ideal for semi-supervised learning which seeks to exploit labeled data as well as a large pool of unlabeled data points to improve learning and classification. While we have made considerable progress on the theory and algorithms, we have seen limited success to translate such progress to the large scale datasets which these methods are inspired by. We investigate the computational complexity of popular graph-based semi-supervised learning algorithms together with different possible speed-ups. Our findings lead to a new algorithm that scales up to 40 times larger datasets in comparison to previous approaches and even increases the classification performance. Our method is based on the key insights that by employing a density-based measure unlabeled data points can be selected similar to an active learning scheme. This leads to a compact graph resulting in an improved performance up to 11.6% at reduced computational costs.", 
    "editor": [
      {
        "familyName": "Lee", 
        "givenName": "Kyoung Mu", 
        "type": "Person"
      }, 
      {
        "familyName": "Matsushita", 
        "givenName": "Yasuyuki", 
        "type": "Person"
      }, 
      {
        "familyName": "Rehg", 
        "givenName": "James M.", 
        "type": "Person"
      }, 
      {
        "familyName": "Hu", 
        "givenName": "Zhanyi", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-642-37331-2_18", 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-642-37330-5", 
        "978-3-642-37331-2"
      ], 
      "name": "Computer Vision \u2013 ACCV 2012", 
      "type": "Book"
    }, 
    "keywords": [
      "unlabeled data points", 
      "large datasets", 
      "graph-based semi-supervised learning algorithms", 
      "semi-supervised learning algorithm", 
      "large image datasets", 
      "times larger datasets", 
      "large-scale datasets", 
      "active learning scheme", 
      "Internet data sources", 
      "semi-supervised learning", 
      "image datasets", 
      "scale datasets", 
      "learning algorithm", 
      "learning scheme", 
      "computational complexity", 
      "data points", 
      "classification performance", 
      "computational cost", 
      "previous approaches", 
      "new algorithm", 
      "dataset", 
      "data sources", 
      "algorithm", 
      "compact graphs", 
      "explicit labeling", 
      "learning", 
      "improved performance", 
      "key insights", 
      "large pool", 
      "graph", 
      "performance", 
      "complexity", 
      "classification", 
      "scheme", 
      "method", 
      "cost", 
      "such progress", 
      "point", 
      "data", 
      "progress", 
      "budget", 
      "considerable progress", 
      "success", 
      "setting", 
      "labeling", 
      "source", 
      "comparison", 
      "insights", 
      "theory", 
      "pool", 
      "findings", 
      "approach"
    ], 
    "name": "Semi-Supervised Learning on a Budget: Scaling Up to Large Datasets", 
    "pagination": "232-245", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1004721797"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-642-37331-2_18"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-642-37331-2_18", 
      "https://app.dimensions.ai/details/publication/pub.1004721797"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-12-01T06:51", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20221201/entities/gbq_results/chapter/chapter_322.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-642-37331-2_18"
  }
]
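
As a rough illustration of how a record like the one above might be consumed, the following Python sketch parses a trimmed copy of the JSON-LD and pulls out the title, authors, page range, and DOI. The `RECORD` string keeps only the fields used here, and the `summarize` helper and the key names of its result are hypothetical names chosen for this example; they are not part of SciGraph.

```python
import json

# Trimmed copy of the JSON-LD record; only the fields used below are kept.
RECORD = """
[
  {
    "name": "Semi-Supervised Learning on a Budget: Scaling Up to Large Datasets",
    "author": [
      {"familyName": "Ebert", "givenName": "Sandra"},
      {"familyName": "Fritz", "givenName": "Mario"},
      {"familyName": "Schiele", "givenName": "Bernt"}
    ],
    "pagination": "232-245",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1004721797"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-3-642-37331-2_18"]}
    ]
  }
]
"""


def summarize(record_json):
    """Return a small dict with title, authors, pages, and DOI (hypothetical helper)."""
    # The record is a JSON array holding a single publication object.
    entry = json.loads(record_json)[0]
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in entry["author"]]
    # productId is a list of name/value pairs; each value is itself a list.
    ids = {p["name"]: p["value"][0] for p in entry["productId"]}
    return {
        "title": entry["name"],
        "authors": authors,
        "pages": entry["pagination"],
        "doi": ids.get("doi"),
    }


summary = summarize(RECORD)
print(summary["authors"])
print(summary["doi"])
```

Because JSON-LD is plain JSON, the standard `json` module is enough for this kind of field extraction; a dedicated JSON-LD or RDF library would only be needed to resolve the `@context` or to work with the data as triples.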
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-37331-2_18'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-37331-2_18'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-37331-2_18'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-37331-2_18'
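
The four curl calls above differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The short format names (`json-ld`, `nt`, `turtle`, `xml`) and the `build_request` and `fetch` helpers are assumptions made for this example, and whether the SciGraph endpoint still answers these requests is not verified here.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-642-37331-2_18"

# Media types taken from the curl examples; the short keys are hypothetical labels.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request asking for the given RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Perform the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network; only fetch() does.
request = build_request("turtle")
print(request.get_header("Accept"))
```

Calling `fetch("turtle")` would then be the rough equivalent of the Turtle curl command, assuming the service is reachable.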


 

This table displays all metadata directly associated with this object as RDF triples.

144 TRIPLES      22 PREDICATES      78 URIs      70 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-642-37331-2_18 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 anzsrc-for:0802
4 schema:author Ndc94d29a053342d29a2c7bf3c456599d
5 schema:datePublished 2013
6 schema:datePublishedReg 2013-01-01
7 schema:description Internet data sources provide us with large image datasets which are mostly without any explicit labeling. This setting is ideal for semi-supervised learning which seeks to exploit labeled data as well as a large pool of unlabeled data points to improve learning and classification. While we have made considerable progress on the theory and algorithms, we have seen limited success to translate such progress to the large scale datasets which these methods are inspired by. We investigate the computational complexity of popular graph-based semi-supervised learning algorithms together with different possible speed-ups. Our findings lead to a new algorithm that scales up to 40 times larger datasets in comparison to previous approaches and even increases the classification performance. Our method is based on the key insights that by employing a density-based measure unlabeled data points can be selected similar to an active learning scheme. This leads to a compact graph resulting in an improved performance up to 11.6% at reduced computational costs.
8 schema:editor Nfb62f351652b4e41aaea8849a4868bfe
9 schema:genre chapter
10 schema:isAccessibleForFree false
11 schema:isPartOf N10d1a13374494baf9307de2d965c09cf
12 schema:keywords Internet data sources
13 active learning scheme
14 algorithm
15 approach
16 budget
17 classification
18 classification performance
19 compact graphs
20 comparison
21 complexity
22 computational complexity
23 computational cost
24 considerable progress
25 cost
26 data
27 data points
28 data sources
29 dataset
30 explicit labeling
31 findings
32 graph
33 graph-based semi-supervised learning algorithms
34 image datasets
35 improved performance
36 insights
37 key insights
38 labeling
39 large datasets
40 large image datasets
41 large pool
42 large-scale datasets
43 learning
44 learning algorithm
45 learning scheme
46 method
47 new algorithm
48 performance
49 point
50 pool
51 previous approaches
52 progress
53 scale datasets
54 scheme
55 semi-supervised learning
56 semi-supervised learning algorithm
57 setting
58 source
59 success
60 such progress
61 theory
62 times larger datasets
63 unlabeled data points
64 schema:name Semi-Supervised Learning on a Budget: Scaling Up to Large Datasets
65 schema:pagination 232-245
66 schema:productId N9f1e51a1d3da4dae87aeba7b4c593f5d
67 Nf1d5596152ba49b6af74d3a0448f655f
68 schema:publisher N8e5ace4972f14ac0b6a0859be05116e7
69 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004721797
70 https://doi.org/10.1007/978-3-642-37331-2_18
71 schema:sdDatePublished 2022-12-01T06:51
72 schema:sdLicense https://scigraph.springernature.com/explorer/license/
73 schema:sdPublisher Nafd30f38c77d4daa8c099b13fa559351
74 schema:url https://doi.org/10.1007/978-3-642-37331-2_18
75 sgo:license sg:explorer/license/
76 sgo:sdDataset chapters
77 rdf:type schema:Chapter
78 N10d1a13374494baf9307de2d965c09cf schema:isbn 978-3-642-37330-5
79 978-3-642-37331-2
80 schema:name Computer Vision – ACCV 2012
81 rdf:type schema:Book
82 N21e42cc745df443d975ad6e6decd847b schema:familyName Rehg
83 schema:givenName James M.
84 rdf:type schema:Person
85 N34f82e5ab3c1444da20d54a72c773de4 schema:familyName Lee
86 schema:givenName Kyoung Mu
87 rdf:type schema:Person
88 N6b25042b997f42c8bd21f2846d77ef10 schema:familyName Matsushita
89 schema:givenName Yasuyuki
90 rdf:type schema:Person
91 N8d1b190dddcf4bd7bb72c61e37faac50 rdf:first N21e42cc745df443d975ad6e6decd847b
92 rdf:rest Ne2f024974de74e2a8356654af5829010
93 N8e5ace4972f14ac0b6a0859be05116e7 schema:name Springer Nature
94 rdf:type schema:Organisation
95 N9f1e51a1d3da4dae87aeba7b4c593f5d schema:name doi
96 schema:value 10.1007/978-3-642-37331-2_18
97 rdf:type schema:PropertyValue
98 Nafd30f38c77d4daa8c099b13fa559351 schema:name Springer Nature - SN SciGraph project
99 rdf:type schema:Organization
100 Nc5b592fd32e640448f4b2bba53bdf29a rdf:first sg:person.013361072755.17
101 rdf:rest Ndab4816958094c0bb35ea659d93fa26d
102 Ndab4816958094c0bb35ea659d93fa26d rdf:first sg:person.01174260421.90
103 rdf:rest rdf:nil
104 Ndc94d29a053342d29a2c7bf3c456599d rdf:first sg:person.011333635343.49
105 rdf:rest Nc5b592fd32e640448f4b2bba53bdf29a
106 Ne1480ef050f14d1e9f5a449fa09a736d rdf:first N6b25042b997f42c8bd21f2846d77ef10
107 rdf:rest N8d1b190dddcf4bd7bb72c61e37faac50
108 Ne2f024974de74e2a8356654af5829010 rdf:first Nef9c20b80d5d4129a0660fdac7991fa7
109 rdf:rest rdf:nil
110 Nef9c20b80d5d4129a0660fdac7991fa7 schema:familyName Hu
111 schema:givenName Zhanyi
112 rdf:type schema:Person
113 Nf1d5596152ba49b6af74d3a0448f655f schema:name dimensions_id
114 schema:value pub.1004721797
115 rdf:type schema:PropertyValue
116 Nfb62f351652b4e41aaea8849a4868bfe rdf:first N34f82e5ab3c1444da20d54a72c773de4
117 rdf:rest Ne1480ef050f14d1e9f5a449fa09a736d
118 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
119 schema:name Information and Computing Sciences
120 rdf:type schema:DefinedTerm
121 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
122 schema:name Artificial Intelligence and Image Processing
123 rdf:type schema:DefinedTerm
124 anzsrc-for:0802 schema:inDefinedTermSet anzsrc-for:
125 schema:name Computation Theory and Mathematics
126 rdf:type schema:DefinedTerm
127 sg:person.011333635343.49 schema:affiliation grid-institutes:grid.419528.3
128 schema:familyName Ebert
129 schema:givenName Sandra
130 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011333635343.49
131 rdf:type schema:Person
132 sg:person.01174260421.90 schema:affiliation grid-institutes:grid.419528.3
133 schema:familyName Schiele
134 schema:givenName Bernt
135 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01174260421.90
136 rdf:type schema:Person
137 sg:person.013361072755.17 schema:affiliation grid-institutes:grid.419528.3
138 schema:familyName Fritz
139 schema:givenName Mario
140 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013361072755.17
141 rdf:type schema:Person
142 grid-institutes:grid.419528.3 schema:alternateName Max Planck Institute for Informatics, Saarbrucken, Germany
143 schema:name Max Planck Institute for Informatics, Saarbrucken, Germany
144 rdf:type schema:Organization
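
In the table above, the ordered author and editor lists are stored as RDF collections: each blank node carries an `rdf:first` item and an `rdf:rest` pointer to the next node, and the chain ends at `rdf:nil`. The Python sketch below walks the author chain to recover the author order. The node identifiers and family names are copied from the table, while the dictionary layout and the `walk_rdf_list` helper are hypothetical choices for this illustration.

```python
RDF_NIL = "rdf:nil"

# node id -> (rdf:first, rdf:rest), copied from the schema:author chain in the table.
LIST_NODES = {
    "Ndc94d29a053342d29a2c7bf3c456599d": (
        "sg:person.011333635343.49",
        "Nc5b592fd32e640448f4b2bba53bdf29a",
    ),
    "Nc5b592fd32e640448f4b2bba53bdf29a": (
        "sg:person.013361072755.17",
        "Ndab4816958094c0bb35ea659d93fa26d",
    ),
    "Ndab4816958094c0bb35ea659d93fa26d": (
        "sg:person.01174260421.90",
        RDF_NIL,
    ),
}

# schema:familyName values for the three person nodes.
FAMILY_NAMES = {
    "sg:person.011333635343.49": "Ebert",
    "sg:person.013361072755.17": "Fritz",
    "sg:person.01174260421.90": "Schiele",
}


def walk_rdf_list(head, nodes):
    """Follow rdf:rest pointers from head and collect the rdf:first items in order."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            # Guard against malformed data that loops back on itself.
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


# The head node is the object of the schema:author triple.
author_ids = walk_rdf_list("Ndc94d29a053342d29a2c7bf3c456599d", LIST_NODES)
author_names = [FAMILY_NAMES[person] for person in author_ids]
print(author_names)
```

The same traversal would apply to the `schema:editor` chain, starting from its own head node.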
 



