A Majority of Wrongs Doesn’t Make It Right - On Crowdsourcing Quality for Skewed Domain Tasks


Ontology type: schema:Chapter     


Chapter Info

DATE

2015-12-25

AUTHORS

Kinda El Maarry , Ulrich Güntzer , Wolf-Tilo Balke

ABSTRACT

Today, crowdsourcing has emerged as a promising paradigm for annotating, structuring, and managing Web data. Still, as long as the problem of the crowd workers’ trustworthiness in terms of result quality is not essentially solved, all these efforts remain doubtful. Therefore, in this paper we look at today’s dominant quality assurance techniques and investigate how they cope with Web data, i.e. typical long-tail distributions, making it easy for strategic spammers to guess the prevalent answers and thus to go undetected. We provide a thorough theoretical analysis, quantifying the success of different methods on such skewed domains by means of test theory and show their individual weaknesses. Exploiting our case study analysis, we propose a simple privacy-preserving, task-agnostic model to improve test reliability, while actually decreasing overhead costs for quality assurance. Finally, we show the stability of our method for even higher numbers of spammers in controlled crowdsourcing experiments.

PAGES

293-308

Book

TITLE

Web Information Systems Engineering – WISE 2015

ISBN

978-3-319-26189-8
978-3-319-26190-4

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-319-26190-4_20

DOI

http://dx.doi.org/10.1007/978-3-319-26190-4_20

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1004063282



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "IFIS, TU Braunschweig, Brunswick, Germany", 
          "id": "http://www.grid.ac/institutes/grid.6738.a", 
          "name": [
            "IFIS, TU Braunschweig, Brunswick, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "El Maarry", 
        "givenName": "Kinda", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Inst. F. Informatik, Universit\u00e4t T\u00fcbingen, T\u00fcbingen, Germany", 
          "id": "http://www.grid.ac/institutes/grid.10392.39", 
          "name": [
            "Inst. F. Informatik, Universit\u00e4t T\u00fcbingen, T\u00fcbingen, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "G\u00fcntzer", 
        "givenName": "Ulrich", 
        "id": "sg:person.013324511711.75", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013324511711.75"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "IFIS, TU Braunschweig, Brunswick, Germany", 
          "id": "http://www.grid.ac/institutes/grid.6738.a", 
          "name": [
            "IFIS, TU Braunschweig, Brunswick, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Balke", 
        "givenName": "Wolf-Tilo", 
        "id": "sg:person.014313642615.12", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014313642615.12"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2015-12-25", 
    "datePublishedReg": "2015-12-25", 
    "description": "Today, crowdsourcing has emerged as a promising paradigm for annotating, structuring, and managing Web data. Still, as long as the problem of the crowd workers\u2019 trustworthiness in terms of result quality is not essentially solved, all these efforts remain doubtful. Therefore, in this paper we look at today\u2019s dominant quality assurance techniques and investigate how they cope with Web data, i.e. typical long-tail distributions, making it easy for strategic spammers to guess the prevalent answers and thus to go undetected. We provide a thorough theoretical analysis, quantifying the success of different methods on such skewed domains by means of test theory and show their individual weaknesses. Exploiting our case study analysis, we propose a simple privacy-preserving, task-agnostic model to improve test reliability, while actually decreasing overhead costs for quality assurance. Finally, we show the stability of our method for even higher numbers of spammers in controlled crowdsourcing experiments.", 
    "editor": [
      {
        "familyName": "Wang", 
        "givenName": "Jianyong", 
        "type": "Person"
      }, 
      {
        "familyName": "Cellary", 
        "givenName": "Wojciech", 
        "type": "Person"
      }, 
      {
        "familyName": "Wang", 
        "givenName": "Dingding", 
        "type": "Person"
      }, 
      {
        "familyName": "Wang", 
        "givenName": "Hua", 
        "type": "Person"
      }, 
      {
        "familyName": "Chen", 
        "givenName": "Shu-Ching", 
        "type": "Person"
      }, 
      {
        "familyName": "Li", 
        "givenName": "Tao", 
        "type": "Person"
      }, 
      {
        "familyName": "Zhang", 
        "givenName": "Yanchun", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-319-26190-4_20", 
    "inLanguage": "en", 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-319-26189-8", 
        "978-3-319-26190-4"
      ], 
      "name": "Web Information Systems Engineering \u2013 WISE 2015", 
      "type": "Book"
    }, 
    "keywords": [
      "web data", 
      "quality assurance techniques", 
      "domain tasks", 
      "promising paradigm", 
      "result quality", 
      "long-tail distribution", 
      "thorough theoretical analysis", 
      "assurance techniques", 
      "crowd workers", 
      "overhead costs", 
      "spammers", 
      "individual weaknesses", 
      "theoretical analysis", 
      "quality assurance", 
      "trustworthiness", 
      "task", 
      "paradigm", 
      "quality", 
      "assurance", 
      "method", 
      "cost", 
      "reliability", 
      "different methods", 
      "data", 
      "domain", 
      "technique", 
      "today", 
      "structuring", 
      "case study analysis", 
      "answers", 
      "model", 
      "efforts", 
      "experiments", 
      "weakness", 
      "terms", 
      "number", 
      "success", 
      "analysis", 
      "study analysis", 
      "means", 
      "test theory", 
      "higher number", 
      "theory", 
      "distribution", 
      "workers", 
      "wrong", 
      "test reliability", 
      "stability", 
      "majority", 
      "paper", 
      "problem", 
      "today\u2019s dominant quality assurance techniques", 
      "\u2019s dominant quality assurance techniques", 
      "typical long-tail distributions", 
      "strategic spammers", 
      "prevalent answers", 
      "such skewed domains", 
      "skewed domains", 
      "task-agnostic model", 
      "Majority of Wrongs", 
      "Crowdsourcing Quality", 
      "Skewed Domain Tasks"
    ], 
    "name": "A Majority of Wrongs Doesn\u2019t Make It Right - On Crowdsourcing Quality for Skewed Domain Tasks", 
    "pagination": "293-308", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1004063282"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-319-26190-4_20"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-319-26190-4_20", 
      "https://app.dimensions.ai/details/publication/pub.1004063282"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-01-01T19:11", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220101/entities/gbq_results/chapter/chapter_190.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-319-26190-4_20"
  }
]
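
Because JSON-LD is plain JSON, the record above can be read with any JSON parser. The following is a minimal sketch in Python, assuming a trimmed-down copy of the record that keeps only the "author" and "productId" fields; the helper names author_names and product_ids are illustrative and not part of SciGraph.

```python
import json

# Trimmed-down copy of the record above: only the fields used below are kept.
# A raw string is used so that the \u00fc escape reaches the JSON parser intact.
RECORD_JSON = r"""
[
  {
    "author": [
      {"familyName": "El Maarry", "givenName": "Kinda", "type": "Person"},
      {"familyName": "G\u00fcntzer", "givenName": "Ulrich", "type": "Person"},
      {"familyName": "Balke", "givenName": "Wolf-Tilo", "type": "Person"}
    ],
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1004063282"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-3-319-26190-4_20"]}
    ]
  }
]
"""


def author_names(record):
    """Return 'givenName familyName' for each author, in record order."""
    return [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]


def product_ids(record):
    """Map each productId name (e.g. 'doi') to its first listed value."""
    return {
        p["name"]: p["value"][0]
        for p in record.get("productId", [])
        if p.get("value")
    }


# The top level of the document is a JSON array holding a single record.
record = json.loads(RECORD_JSON)[0]
print(author_names(record))
print(product_ids(record)["doi"])
```

The same two helpers should work on the full record, since they only read keys that appear in it.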
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-26190-4_20'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-26190-4_20'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-26190-4_20'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-26190-4_20'
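
The four curl calls differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. The format keys ("json-ld", "n-triples", "turtle", "rdf-xml") and the function names are assumptions made for this example; the URL and media types are taken from the commands above. Whether the endpoint still responds is not something this sketch can guarantee, so the network call is kept in a separate function that is not invoked here.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/"

# Media types copied from the curl commands above; the short keys are illustrative.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(record_id, fmt):
    """Prepare a GET request whose Accept header selects the RDF serialization."""
    return urllib.request.Request(
        BASE_URL + record_id,
        headers={"Accept": ACCEPT_TYPES[fmt]},
    )


def fetch(record_id, fmt, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Build the request without sending it, to show what would go over the wire.
request = build_request("pub.10.1007/978-3-319-26190-4_20", "turtle")
print(request.full_url)
print(request.get_header("Accept"))
# To actually download: fetch("pub.10.1007/978-3-319-26190-4_20", "turtle")
```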


 

This table displays all metadata directly associated with this object as RDF triples.

168 TRIPLES      23 PREDICATES      87 URIs      80 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-319-26190-4_20 schema:about anzsrc-for:08
2 anzsrc-for:0806
3 schema:author N51088287287147a0bf28a714c263085e
4 schema:datePublished 2015-12-25
5 schema:datePublishedReg 2015-12-25
6 schema:description Today, crowdsourcing has emerged as a promising paradigm for annotating, structuring, and managing Web data. Still, as long as the problem of the crowd workers’ trustworthiness in terms of result quality is not essentially solved, all these efforts remain doubtful. Therefore, in this paper we look at today’s dominant quality assurance techniques and investigate how they cope with Web data, i.e. typical long-tail distributions, making it easy for strategic spammers to guess the prevalent answers and thus to go undetected. We provide a thorough theoretical analysis, quantifying the success of different methods on such skewed domains by means of test theory and show their individual weaknesses. Exploiting our case study analysis, we propose a simple privacy-preserving, task-agnostic model to improve test reliability, while actually decreasing overhead costs for quality assurance. Finally, we show the stability of our method for even higher numbers of spammers in controlled crowdsourcing experiments.
7 schema:editor N2ea8378212e842118122d4cf3b7f18e2
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree false
11 schema:isPartOf N8e595871c21e48fa9d808fedfe51da21
12 schema:keywords Crowdsourcing Quality
13 Majority of Wrongs
14 Skewed Domain Tasks
15 analysis
16 answers
17 assurance
18 assurance techniques
19 case study analysis
20 cost
21 crowd workers
22 data
23 different methods
24 distribution
25 domain
26 domain tasks
27 efforts
28 experiments
29 higher number
30 individual weaknesses
31 long-tail distribution
32 majority
33 means
34 method
35 model
36 number
37 overhead costs
38 paper
39 paradigm
40 prevalent answers
41 problem
42 promising paradigm
43 quality
44 quality assurance
45 quality assurance techniques
46 reliability
47 result quality
48 skewed domains
49 spammers
50 stability
51 strategic spammers
52 structuring
53 study analysis
54 success
55 such skewed domains
56 task
57 task-agnostic model
58 technique
59 terms
60 test reliability
61 test theory
62 theoretical analysis
63 theory
64 thorough theoretical analysis
65 today
66 today’s dominant quality assurance techniques
67 trustworthiness
68 typical long-tail distributions
69 weakness
70 web data
71 workers
72 wrong
73 ’s dominant quality assurance techniques
74 schema:name A Majority of Wrongs Doesn’t Make It Right - On Crowdsourcing Quality for Skewed Domain Tasks
75 schema:pagination 293-308
76 schema:productId N433baf2c35be498d9a3af0aa27ccbe5e
77 Nba9c6ef7ad8b434f9f342bff8864e640
78 schema:publisher N072db2b0690c441390ef031064d158b9
79 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004063282
80 https://doi.org/10.1007/978-3-319-26190-4_20
81 schema:sdDatePublished 2022-01-01T19:11
82 schema:sdLicense https://scigraph.springernature.com/explorer/license/
83 schema:sdPublisher N0f8c526a6f6342ad8dde4bb6925b40b3
84 schema:url https://doi.org/10.1007/978-3-319-26190-4_20
85 sgo:license sg:explorer/license/
86 sgo:sdDataset chapters
87 rdf:type schema:Chapter
88 N072db2b0690c441390ef031064d158b9 schema:name Springer Nature
89 rdf:type schema:Organisation
90 N0b00f7047b80471895fddfd42a162deb schema:familyName Chen
91 schema:givenName Shu-Ching
92 rdf:type schema:Person
93 N0f73a07db1ed4f9e90e5be186a90e93c schema:familyName Wang
94 schema:givenName Hua
95 rdf:type schema:Person
96 N0f8c526a6f6342ad8dde4bb6925b40b3 schema:name Springer Nature - SN SciGraph project
97 rdf:type schema:Organization
98 N15bcb689442a4025be1d22957f767593 rdf:first N88af6763e70c4bfd8bf3ce9719299598
99 rdf:rest N4fbcb962ca5741e1bf2d478e94f3c284
100 N243d5cfc377f4bc9ae797b30f5edea29 rdf:first Ned21584f34e34bf28474fe043ab00507
101 rdf:rest Nfe780b3924014a2797ab8e88a276f3cf
102 N2ea8378212e842118122d4cf3b7f18e2 rdf:first Nb94a85b179ac46b5abc0628a30616f6d
103 rdf:rest N4531d6aa6e6e4f958ca92551c0f08b47
104 N3849083db70b4bd99d6f12249e4ef922 rdf:first N0b00f7047b80471895fddfd42a162deb
105 rdf:rest N243d5cfc377f4bc9ae797b30f5edea29
106 N3bdbca3996824c9cb0baa3b017f21bc9 schema:familyName Zhang
107 schema:givenName Yanchun
108 rdf:type schema:Person
109 N41e0a4d080514f49877124e4318d2ad9 schema:affiliation grid-institutes:grid.6738.a
110 schema:familyName El Maarry
111 schema:givenName Kinda
112 rdf:type schema:Person
113 N433baf2c35be498d9a3af0aa27ccbe5e schema:name doi
114 schema:value 10.1007/978-3-319-26190-4_20
115 rdf:type schema:PropertyValue
116 N4531d6aa6e6e4f958ca92551c0f08b47 rdf:first Ned38630412c14ee4a99a1119b66e348f
117 rdf:rest N15bcb689442a4025be1d22957f767593
118 N4fbcb962ca5741e1bf2d478e94f3c284 rdf:first N0f73a07db1ed4f9e90e5be186a90e93c
119 rdf:rest N3849083db70b4bd99d6f12249e4ef922
120 N51088287287147a0bf28a714c263085e rdf:first N41e0a4d080514f49877124e4318d2ad9
121 rdf:rest N73d35a57c1214341a7aeda9c62caedc7
122 N73d35a57c1214341a7aeda9c62caedc7 rdf:first sg:person.013324511711.75
123 rdf:rest Nea620abee6134bb682a9598b51fe6e02
124 N88af6763e70c4bfd8bf3ce9719299598 schema:familyName Wang
125 schema:givenName Dingding
126 rdf:type schema:Person
127 N8e595871c21e48fa9d808fedfe51da21 schema:isbn 978-3-319-26189-8
128 978-3-319-26190-4
129 schema:name Web Information Systems Engineering – WISE 2015
130 rdf:type schema:Book
131 Nb94a85b179ac46b5abc0628a30616f6d schema:familyName Wang
132 schema:givenName Jianyong
133 rdf:type schema:Person
134 Nba9c6ef7ad8b434f9f342bff8864e640 schema:name dimensions_id
135 schema:value pub.1004063282
136 rdf:type schema:PropertyValue
137 Nea620abee6134bb682a9598b51fe6e02 rdf:first sg:person.014313642615.12
138 rdf:rest rdf:nil
139 Ned21584f34e34bf28474fe043ab00507 schema:familyName Li
140 schema:givenName Tao
141 rdf:type schema:Person
142 Ned38630412c14ee4a99a1119b66e348f schema:familyName Cellary
143 schema:givenName Wojciech
144 rdf:type schema:Person
145 Nfe780b3924014a2797ab8e88a276f3cf rdf:first N3bdbca3996824c9cb0baa3b017f21bc9
146 rdf:rest rdf:nil
147 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
148 schema:name Information and Computing Sciences
149 rdf:type schema:DefinedTerm
150 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
151 schema:name Information Systems
152 rdf:type schema:DefinedTerm
153 sg:person.013324511711.75 schema:affiliation grid-institutes:grid.10392.39
154 schema:familyName Güntzer
155 schema:givenName Ulrich
156 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013324511711.75
157 rdf:type schema:Person
158 sg:person.014313642615.12 schema:affiliation grid-institutes:grid.6738.a
159 schema:familyName Balke
160 schema:givenName Wolf-Tilo
161 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014313642615.12
162 rdf:type schema:Person
163 grid-institutes:grid.10392.39 schema:alternateName Inst. F. Informatik, Universität Tübingen, Tübingen, Germany
164 schema:name Inst. F. Informatik, Universität Tübingen, Tübingen, Germany
165 rdf:type schema:Organization
166 grid-institutes:grid.6738.a schema:alternateName IFIS, TU Braunschweig, Brunswick, Germany
167 schema:name IFIS, TU Braunschweig, Brunswick, Germany
168 rdf:type schema:Organization
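
In the table above, author and editor order is encoded as RDF collections: each list node carries an rdf:first (the item) and an rdf:rest (the next node), and the chain ends at rdf:nil. The following Python sketch recovers the author order by walking that chain. It assumes the relevant rows have been copied by hand into a dictionary keyed by (subject, predicate); the function name walk_list is illustrative, and a real application would more likely load the triples with an RDF library.

```python
RDF_NIL = "rdf:nil"

# (subject, predicate) -> object, copied from the author-list rows of the table above.
TRIPLES = {
    ("N51088287287147a0bf28a714c263085e", "rdf:first"): "N41e0a4d080514f49877124e4318d2ad9",
    ("N51088287287147a0bf28a714c263085e", "rdf:rest"): "N73d35a57c1214341a7aeda9c62caedc7",
    ("N73d35a57c1214341a7aeda9c62caedc7", "rdf:first"): "sg:person.013324511711.75",
    ("N73d35a57c1214341a7aeda9c62caedc7", "rdf:rest"): "Nea620abee6134bb682a9598b51fe6e02",
    ("Nea620abee6134bb682a9598b51fe6e02", "rdf:first"): "sg:person.014313642615.12",
    ("Nea620abee6134bb682a9598b51fe6e02", "rdf:rest"): RDF_NIL,
    ("N41e0a4d080514f49877124e4318d2ad9", "schema:familyName"): "El Maarry",
    ("sg:person.013324511711.75", "schema:familyName"): "Güntzer",
    ("sg:person.014313642615.12", "schema:familyName"): "Balke",
}


def walk_list(head, triples):
    """Follow rdf:first / rdf:rest from head until rdf:nil; return items in order."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            # Guard against malformed data that loops back on itself.
            raise ValueError(f"cycle detected at list node {node}")
        seen.add(node)
        items.append(triples[(node, "rdf:first")])
        node = triples[(node, "rdf:rest")]
    return items


# The schema:author row of the chapter points at the head of the author list.
authors = walk_list("N51088287287147a0bf28a714c263085e", TRIPLES)
family_names = [TRIPLES[(a, "schema:familyName")] for a in authors]
print(family_names)
```

The editor list can be walked the same way, starting from the node given in the schema:editor row.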
 



