Skyline Queries over Incomplete Data - Error Models for Focused Crowd-Sourcing


Ontology type: schema:Chapter     


Chapter Info

DATE

2013

AUTHORS

Christoph Lofi, Kinda El Maarry, Wolf-Tilo Balke

ABSTRACT

Skyline queries are a well-known technique for explorative retrieval, multi-objective optimization problems, and personalization tasks in databases. They are widely acclaimed for their intuitive query formulation mechanisms. However, when operating on incomplete datasets, skyline query processing is severely hampered and often has to resort to error-prone heuristics. Unfortunately, incomplete datasets are a frequent phenomenon due to widespread use of automated information extraction and aggregation. In this paper, we evaluate and compare various established heuristics for adapting skylines to incomplete datasets, focusing specifically on the error they impose on the skyline result. Building upon these results, we argue for improving the skyline result quality by employing crowd-enabled databases. This allows dynamic outsourcing of some database operators to human workers, therefore enabling the elicitation of missing values during runtime. Unfortunately, each crowd-sourcing operation will result in monetary and query runtime costs. Therefore, our main contribution is introducing a sophisticated error model, allowing us to specifically concentrate on those tuples that are highly likely to be error-prone, while relying on established heuristics for safer tuples. This technique of focused crowd-sourcing allows us to strike a perfect balance between costs and result’s quality.
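To make the abstract's notion of a heuristic for incomplete data concrete, here is a minimal Python sketch. It is an illustration only and not the chapter's error model or any specific heuristic the authors evaluate: it assumes larger values are better, represents a missing value as None, and compares two tuples only on the dimensions both have. The names dominates, skyline, and sample are made up for this example.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]  # None marks a missing attribute value


def dominates(a: Row, b: Row) -> bool:
    """True if a dominates b on the dimensions where BOTH values are known.

    Assumption for this sketch: larger is better on every dimension.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return False  # nothing to compare, so no dominance is claimed
    no_worse = all(x >= y for x, y in shared)
    strictly_better = any(x > y for x, y in shared)
    return no_worse and strictly_better


def skyline(rows: List[Row]) -> List[Row]:
    """Naive quadratic skyline: keep rows that no other row dominates."""
    return [r for r in rows if not any(dominates(o, r) for o in rows if o is not r)]


# Hypothetical two-attribute data; (4, None) is missing its second value.
sample = [(5, 1), (1, 5), (4, None), (0, 0), (None, 0)]
result = skyline(sample)
```

On this sample the result is [(5, 1)]. The tuple (1, 5) is dropped because (4, None) beats it on the only dimension they share, even though (4, None) is itself dominated by (5, 1) and (5, 1) does not dominate (1, 5). Dominance restricted to shared dimensions is not transitive, which is one way such a heuristic can distort a skyline result.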

PAGES

298-312

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-642-41924-9_25

DOI

http://dx.doi.org/10.1007/978-3-642-41924-9_25

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1038373132



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "National Institute of Informatics, 101-8430, Tokyo, Japan", 
          "id": "http://www.grid.ac/institutes/grid.250343.3", 
          "name": [
            "National Institute of Informatics, 101-8430, Tokyo, Japan"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Lofi", 
        "givenName": "Christoph", 
        "id": "sg:person.011355173745.44", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011355173745.44"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Institut f\u00fcr Informationssysteme, Technische Universit\u00e4t Braunschweig, 38106, Braunschweig, Germany", 
          "id": "http://www.grid.ac/institutes/grid.6738.a", 
          "name": [
            "Institut f\u00fcr Informationssysteme, Technische Universit\u00e4t Braunschweig, 38106, Braunschweig, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "El Maarry", 
        "givenName": "Kinda", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Institut f\u00fcr Informationssysteme, Technische Universit\u00e4t Braunschweig, 38106, Braunschweig, Germany", 
          "id": "http://www.grid.ac/institutes/grid.6738.a", 
          "name": [
            "Institut f\u00fcr Informationssysteme, Technische Universit\u00e4t Braunschweig, 38106, Braunschweig, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Balke", 
        "givenName": "Wolf-Tilo", 
        "id": "sg:person.014313642615.12", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014313642615.12"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2013", 
    "datePublishedReg": "2013-01-01", 
    "description": "Skyline queries are a well-known technique for explorative retrieval, multi-objective optimization problems, and personalization tasks in databases. They are widely acclaimed for their intuitive query formulation mechanisms. However, when operating on incomplete datasets, skyline query processing is severely hampered and often has to resort to error-prone heuristics. Unfortunately, incomplete datasets are a frequent phenomenon due to widespread use of automated information extraction and aggregation. In this paper, we evaluate and compare various established heuristics for adapting skylines to incomplete datasets, focusing specifically on the error they impose on the skyline result. Building upon these results, we argue for improving the skyline result quality by employing crowd-enabled databases. This allows dynamic outsourcing of some database operators to human workers, therefore enabling the elicitation of missing values during runtime. Unfortunately, each crowd-sourcing operation will result in monetary and query runtime costs. Therefore, our main contribution is introducing a sophisticated error model, allowing us to specifically concentrate on those tuples that are highly likely to be error-prone, while relying on established heuristics for safer tuples. This technique of focused crowd-sourcing allows us to strike a perfect balance between costs and result\u2019s quality.", 
    "editor": [
      {
        "familyName": "Ng", 
        "givenName": "Wilfred", 
        "type": "Person"
      }, 
      {
        "familyName": "Storey", 
        "givenName": "Veda C.", 
        "type": "Person"
      }, 
      {
        "familyName": "Trujillo", 
        "givenName": "Juan C.", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-642-41924-9_25", 
    "inLanguage": "en", 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-642-41923-2", 
        "978-3-642-41924-9"
      ], 
      "name": "Conceptual Modeling", 
      "type": "Book"
    }, 
    "keywords": [
      "skyline queries", 
      "incomplete datasets", 
      "skyline query processing", 
      "error-prone heuristics", 
      "multi-objective optimization problem", 
      "query processing", 
      "skyline results", 
      "personalization tasks", 
      "Crowd Sourcing", 
      "database operators", 
      "information extraction", 
      "human workers", 
      "runtime cost", 
      "result quality", 
      "dynamic outsourcing", 
      "main contribution", 
      "optimization problem", 
      "heuristics", 
      "queries", 
      "dataset", 
      "formulation mechanism", 
      "tuples", 
      "error model", 
      "runtime", 
      "database", 
      "skyline", 
      "retrieval", 
      "perfect balance", 
      "outsourcing", 
      "cost", 
      "task", 
      "widespread use", 
      "technique", 
      "processing", 
      "elicitation", 
      "quality", 
      "operators", 
      "model", 
      "extraction", 
      "operation", 
      "error", 
      "results", 
      "use", 
      "aggregation", 
      "contribution", 
      "mechanism", 
      "balance", 
      "frequent phenomenon", 
      "values", 
      "workers", 
      "phenomenon", 
      "paper", 
      "problem"
    ], 
    "name": "Skyline Queries over Incomplete Data - Error Models for Focused Crowd-Sourcing", 
    "pagination": "298-312", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1038373132"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-642-41924-9_25"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-642-41924-9_25", 
      "https://app.dimensions.ai/details/publication/pub.1038373132"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-06-01T22:33", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220601/entities/gbq_results/chapter/chapter_376.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-642-41924-9_25"
  }
]
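Because the record is plain JSON, it can be read with any JSON parser. The sketch below is a minimal, hedged example that uses only the Python standard library and a hand-trimmed excerpt of the record above. The summarize helper and the shape of its return value are assumptions for illustration and not part of any SciGraph tooling.

```python
import json

# Hand-trimmed excerpt of the record above; only the fields used here are kept.
record_json = """
[
  {
    "name": "Skyline Queries over Incomplete Data - Error Models for Focused Crowd-Sourcing",
    "pagination": "298-312",
    "author": [
      {"familyName": "Lofi", "givenName": "Christoph"},
      {"familyName": "El Maarry", "givenName": "Kinda"},
      {"familyName": "Balke", "givenName": "Wolf-Tilo"}
    ],
    "productId": [
      {"name": "dimensions_id", "value": ["pub.1038373132"]},
      {"name": "doi", "value": ["10.1007/978-3-642-41924-9_25"]}
    ]
  }
]
"""


def summarize(record: dict) -> dict:
    """Collect title, author names, DOI, and page range from one record."""
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]
    # productId is a list of name/value pairs; each value is itself a list.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    return {
        "title": record["name"],
        "authors": authors,
        "doi": ids.get("doi"),
        "pages": record.get("pagination"),
    }


# The top level of the document is a list holding a single record.
summary = summarize(json.loads(record_json)[0])
```

The same function should work on the full record, since it only reads keys that appear there and ignores the rest.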
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-41924-9_25'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-41924-9_25'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-41924-9_25'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-41924-9_25'
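The four curl commands differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. The short format labels, the MEDIA_TYPES mapping, and the build_request helper are names chosen for this example. Whether the endpoint still answers is not verified here, so the sketch only builds the request and leaves the network call as a comment.

```python
from urllib.request import Request

BASE_URL = "https://scigraph.springernature.com/"

# Format label (chosen for this sketch) -> media type from the curl examples above.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id: str, fmt: str = "json-ld") -> Request:
    """Build a GET request asking for one RDF serialization of a record."""
    return Request(BASE_URL + record_id, headers={"Accept": MEDIA_TYPES[fmt]})


req = build_request("pub.10.1007/978-3-642-41924-9_25", "turtle")

# To actually download (needs network access, and the service must be reachable):
#   from urllib.request import urlopen
#   body = urlopen(req).read().decode("utf-8")
```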


 

This table displays all metadata directly associated with this object as RDF triples.

139 TRIPLES      23 PREDICATES      79 URIs      72 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-642-41924-9_25 schema:about anzsrc-for:08
2 anzsrc-for:0806
3 schema:author N74e2936376cf4d36a6af32481468c445
4 schema:datePublished 2013
5 schema:datePublishedReg 2013-01-01
6 schema:description Skyline queries are a well-known technique for explorative retrieval, multi-objective optimization problems, and personalization tasks in databases. They are widely acclaimed for their intuitive query formulation mechanisms. However, when operating on incomplete datasets, skyline query processing is severely hampered and often has to resort to error-prone heuristics. Unfortunately, incomplete datasets are a frequent phenomenon due to widespread use of automated information extraction and aggregation. In this paper, we evaluate and compare various established heuristics for adapting skylines to incomplete datasets, focusing specifically on the error they impose on the skyline result. Building upon these results, we argue for improving the skyline result quality by employing crowd-enabled databases. This allows dynamic outsourcing of some database operators to human workers, therefore enabling the elicitation of missing values during runtime. Unfortunately, each crowd-sourcing operation will result in monetary and query runtime costs. Therefore, our main contribution is introducing a sophisticated error model, allowing us to specifically concentrate on those tuples that are highly likely to be error-prone, while relying on established heuristics for safer tuples. This technique of focused crowd-sourcing allows us to strike a perfect balance between costs and result’s quality.
7 schema:editor N0d1624dd4a5648be8e934007290e2f1a
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree false
11 schema:isPartOf Nefaae73c0c404a50ba54c9b910bed3ab
12 schema:keywords Crowd Sourcing
13 aggregation
14 balance
15 contribution
16 cost
17 database
18 database operators
19 dataset
20 dynamic outsourcing
21 elicitation
22 error
23 error model
24 error-prone heuristics
25 extraction
26 formulation mechanism
27 frequent phenomenon
28 heuristics
29 human workers
30 incomplete datasets
31 information extraction
32 main contribution
33 mechanism
34 model
35 multi-objective optimization problem
36 operation
37 operators
38 optimization problem
39 outsourcing
40 paper
41 perfect balance
42 personalization tasks
43 phenomenon
44 problem
45 processing
46 quality
47 queries
48 query processing
49 result quality
50 results
51 retrieval
52 runtime
53 runtime cost
54 skyline
55 skyline queries
56 skyline query processing
57 skyline results
58 task
59 technique
60 tuples
61 use
62 values
63 widespread use
64 workers
65 schema:name Skyline Queries over Incomplete Data - Error Models for Focused Crowd-Sourcing
66 schema:pagination 298-312
67 schema:productId N7bb228ade7d64f138f55eabaff6a0489
68 Ne0eb0b4787a84a2c9c58b04c62e5da7c
69 schema:publisher Nd44b1ccf724749cf90588a011b472c01
70 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038373132
71 https://doi.org/10.1007/978-3-642-41924-9_25
72 schema:sdDatePublished 2022-06-01T22:33
73 schema:sdLicense https://scigraph.springernature.com/explorer/license/
74 schema:sdPublisher Nc71d66bd86a64d2c893c7dc92b2351f0
75 schema:url https://doi.org/10.1007/978-3-642-41924-9_25
76 sgo:license sg:explorer/license/
77 sgo:sdDataset chapters
78 rdf:type schema:Chapter
79 N0d1624dd4a5648be8e934007290e2f1a rdf:first N753cf24450aa4c1bba642ffdd6db0702
80 rdf:rest Nb7a0a69929974395be11e5503d834f3b
81 N231b7ee963274a6688c8766ac53b8dde schema:familyName Storey
82 schema:givenName Veda C.
83 rdf:type schema:Person
84 N28771d89a4f64bb0ad8daf12b14b5b95 rdf:first N73d6681fa60840c99cdb2a0442db4e73
85 rdf:rest Na9604a427e274bfeaa9c9a8bda79846e
86 N34807b469e9640bcbb4d371e08712ade schema:familyName Trujillo
87 schema:givenName Juan C.
88 rdf:type schema:Person
89 N73d6681fa60840c99cdb2a0442db4e73 schema:affiliation grid-institutes:grid.6738.a
90 schema:familyName El Maarry
91 schema:givenName Kinda
92 rdf:type schema:Person
93 N74e2936376cf4d36a6af32481468c445 rdf:first sg:person.011355173745.44
94 rdf:rest N28771d89a4f64bb0ad8daf12b14b5b95
95 N753cf24450aa4c1bba642ffdd6db0702 schema:familyName Ng
96 schema:givenName Wilfred
97 rdf:type schema:Person
98 N7bb228ade7d64f138f55eabaff6a0489 schema:name dimensions_id
99 schema:value pub.1038373132
100 rdf:type schema:PropertyValue
101 Na9604a427e274bfeaa9c9a8bda79846e rdf:first sg:person.014313642615.12
102 rdf:rest rdf:nil
103 Nab23c1162be54049821481932f8d9ea1 rdf:first N34807b469e9640bcbb4d371e08712ade
104 rdf:rest rdf:nil
105 Nb7a0a69929974395be11e5503d834f3b rdf:first N231b7ee963274a6688c8766ac53b8dde
106 rdf:rest Nab23c1162be54049821481932f8d9ea1
107 Nc71d66bd86a64d2c893c7dc92b2351f0 schema:name Springer Nature - SN SciGraph project
108 rdf:type schema:Organization
109 Nd44b1ccf724749cf90588a011b472c01 schema:name Springer Nature
110 rdf:type schema:Organisation
111 Ne0eb0b4787a84a2c9c58b04c62e5da7c schema:name doi
112 schema:value 10.1007/978-3-642-41924-9_25
113 rdf:type schema:PropertyValue
114 Nefaae73c0c404a50ba54c9b910bed3ab schema:isbn 978-3-642-41923-2
115 978-3-642-41924-9
116 schema:name Conceptual Modeling
117 rdf:type schema:Book
118 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
119 schema:name Information and Computing Sciences
120 rdf:type schema:DefinedTerm
121 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
122 schema:name Information Systems
123 rdf:type schema:DefinedTerm
124 sg:person.011355173745.44 schema:affiliation grid-institutes:grid.250343.3
125 schema:familyName Lofi
126 schema:givenName Christoph
127 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011355173745.44
128 rdf:type schema:Person
129 sg:person.014313642615.12 schema:affiliation grid-institutes:grid.6738.a
130 schema:familyName Balke
131 schema:givenName Wolf-Tilo
132 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014313642615.12
133 rdf:type schema:Person
134 grid-institutes:grid.250343.3 schema:alternateName National Institute of Informatics, 101-8430, Tokyo, Japan
135 schema:name National Institute of Informatics, 101-8430, Tokyo, Japan
136 rdf:type schema:Organization
137 grid-institutes:grid.6738.a schema:alternateName Institut für Informationssysteme, Technische Universität Braunschweig, 38106, Braunschweig, Germany
138 schema:name Institut für Informationssysteme, Technische Universität Braunschweig, 38106, Braunschweig, Germany
139 rdf:type schema:Organization
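In the table, the author order is not stored as a plain array. It is an RDF list: the schema:author object is a blank node whose rdf:first points at an author and whose rdf:rest points at the next list node, ending at rdf:nil. The sketch below copies the relevant triples from rows 3, 84-85, 89-90, 93-94, 101-102, 125, and 130 of the table and walks that chain. The walk_list helper and the tuple representation are assumptions for illustration, not SciGraph code.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Triples copied from the table above (author list nodes and family names only).
triples: List[Triple] = [
    ("sg:pub.10.1007/978-3-642-41924-9_25", "schema:author", "N74e2936376cf4d36a6af32481468c445"),
    ("N74e2936376cf4d36a6af32481468c445", "rdf:first", "sg:person.011355173745.44"),
    ("N74e2936376cf4d36a6af32481468c445", "rdf:rest", "N28771d89a4f64bb0ad8daf12b14b5b95"),
    ("N28771d89a4f64bb0ad8daf12b14b5b95", "rdf:first", "N73d6681fa60840c99cdb2a0442db4e73"),
    ("N28771d89a4f64bb0ad8daf12b14b5b95", "rdf:rest", "Na9604a427e274bfeaa9c9a8bda79846e"),
    ("Na9604a427e274bfeaa9c9a8bda79846e", "rdf:first", "sg:person.014313642615.12"),
    ("Na9604a427e274bfeaa9c9a8bda79846e", "rdf:rest", "rdf:nil"),
    ("sg:person.011355173745.44", "schema:familyName", "Lofi"),
    ("N73d6681fa60840c99cdb2a0442db4e73", "schema:familyName", "El Maarry"),
    ("sg:person.014313642615.12", "schema:familyName", "Balke"),
]


def walk_list(head: str, data: List[Triple]) -> List[str]:
    """Follow rdf:first / rdf:rest from head until rdf:nil, in list order."""
    first: Dict[str, str] = {s: o for s, p, o in data if p == "rdf:first"}
    rest: Dict[str, str] = {s: o for s, p, o in data if p == "rdf:rest"}
    items: List[str] = []
    node = head
    while node != "rdf:nil":
        items.append(first[node])
        node = rest[node]
    return items


# Find the head of the author list, then map each member to its family name.
head = next(o for s, p, o in triples if p == "schema:author")
family = {s: o for s, p, o in triples if p == "schema:familyName"}
ordered_authors = [family[node] for node in walk_list(head, triples)]
```

The walk yields Lofi, El Maarry, Balke, which matches the author order in the JSON-LD view of the same record.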
 



