Information Extraction Meets Crowdsourcing: A Promising Couple


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2012-05-23

AUTHORS

Christoph Lofi, Joachim Selke, Wolf-Tilo Balke

ABSTRACT

Recent years brought tremendous advancements in the area of automated information extraction. But still, problem scenarios remain where even state-of-the-art algorithms do not provide a satisfying solution. In these cases, another aspiring recent trend can be exploited to achieve the required extraction quality: explicit crowdsourcing of human intelligence tasks. In this paper, we discuss the synergies between information extraction and crowdsourcing. In particular, we methodically identify and classify the challenges and fallacies that arise when combining both approaches. Furthermore, we argue that for harnessing the full potential of either approach, true hybrid techniques must be considered. To demonstrate this point, we showcase such a hybrid technique, which tightly interweaves information extraction with crowdsourcing and machine learning to vastly surpass the abilities of either technique.

PAGES

109-120

References to SciGraph publications

  • 1982-01. The Psychology of Preferences in SCIENTIFIC AMERICAN
  • 2010-10-05. Advances in Collaborative Filtering in RECOMMENDER SYSTEMS HANDBOOK
  • 2004-08. A tutorial on support vector regression in STATISTICS AND COMPUTING

Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/s13222-012-0092-8

    DOI

    http://dx.doi.org/10.1007/s13222-012-0092-8

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1028950695



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Artificial Intelligence and Image Processing", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Institut f\u00fcr Informationssysteme, Technische Universit\u00e4t Braunschweig, Braunschweig, Germany", 
              "id": "http://www.grid.ac/institutes/grid.6738.a", 
              "name": [
                "Institut f\u00fcr Informationssysteme, Technische Universit\u00e4t Braunschweig, Braunschweig, Germany"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Lofi", 
            "givenName": "Christoph", 
            "id": "sg:person.011355173745.44", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011355173745.44"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Institut f\u00fcr Informationssysteme, Technische Universit\u00e4t Braunschweig, Braunschweig, Germany", 
              "id": "http://www.grid.ac/institutes/grid.6738.a", 
              "name": [
                "Institut f\u00fcr Informationssysteme, Technische Universit\u00e4t Braunschweig, Braunschweig, Germany"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Selke", 
            "givenName": "Joachim", 
            "id": "sg:person.012152554345.21", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012152554345.21"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Institut f\u00fcr Informationssysteme, Technische Universit\u00e4t Braunschweig, Braunschweig, Germany", 
              "id": "http://www.grid.ac/institutes/grid.6738.a", 
              "name": [
                "Institut f\u00fcr Informationssysteme, Technische Universit\u00e4t Braunschweig, Braunschweig, Germany"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Balke", 
            "givenName": "Wolf-Tilo", 
            "id": "sg:person.014313642615.12", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014313642615.12"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1023/b:stco.0000035301.49549.88", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1000991887", 
              "https://doi.org/10.1023/b:stco.0000035301.49549.88"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/scientificamerican0182-160", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1056482656", 
              "https://doi.org/10.1038/scientificamerican0182-160"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-0-387-85820-3_5", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1002058343", 
              "https://doi.org/10.1007/978-0-387-85820-3_5"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2012-05-23", 
        "datePublishedReg": "2012-05-23", 
        "description": "Recent years brought tremendous advancements in the area of automated information extraction. But still, problem scenarios remain where even state-of-the-art algorithms do not provide a satisfying solution. In these cases, another aspiring recent trend can be exploited to achieve the required extraction quality: explicit crowdsourcing of human intelligence tasks. In this paper, we discuss the synergies between information extraction and crowdsourcing. In particular, we methodically identify and classify the challenges and fallacies that arise when combining both approaches. Furthermore, we argue that for harnessing the full potential of either approach, true hybrid techniques must be considered. To demonstrate this point, we showcase such a hybrid technique, which tightly interweaves information extraction with crowdsourcing and machine learning to vastly surpass the abilities of either technique.", 
        "genre": "article", 
        "id": "sg:pub.10.1007/s13222-012-0092-8", 
        "inLanguage": "en", 
        "isAccessibleForFree": false, 
        "isPartOf": [
          {
            "id": "sg:journal.1136415", 
            "issn": [
              "1618-2162", 
              "1610-1995"
            ], 
            "name": "Datenbank-Spektrum", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "2", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "12"
          }
        ], 
        "keywords": [
          "information extraction", 
          "human intelligence tasks", 
          "hybrid technique", 
          "art algorithms", 
          "intelligence tasks", 
          "problem scenarios", 
          "extraction quality", 
          "satisfying solution", 
          "promising couple", 
          "tremendous advancement", 
          "full potential", 
          "Crowdsourcing", 
          "extraction", 
          "recent years", 
          "algorithm", 
          "machine", 
          "recent trends", 
          "task", 
          "technique", 
          "scenarios", 
          "advancement", 
          "challenges", 
          "solution", 
          "quality", 
          "point", 
          "synergy", 
          "state", 
          "ability", 
          "area", 
          "trends", 
          "cases", 
          "potential", 
          "couples", 
          "years", 
          "fallacy", 
          "approach", 
          "paper"
        ], 
        "name": "Information Extraction Meets Crowdsourcing: A Promising Couple", 
        "pagination": "109-120", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1028950695"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/s13222-012-0092-8"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1007/s13222-012-0092-8", 
          "https://app.dimensions.ai/details/publication/pub.1028950695"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-06-01T22:12", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20220601/entities/gbq_results/article/article_585.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1007/s13222-012-0092-8"
      }
    ]
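
As a rough illustration of how a consumer might read a record like the one above, the sketch below parses a trimmed copy of the JSON-LD array and pulls out a few bibliographic fields. The helper name `summarize_record` and the `SAMPLE` constant are hypothetical and not part of SciGraph; the field names (`name`, `author`, `productId`, `pagination`, `datePublished`) are taken from the record itself.

```python
import json

# Trimmed copy of the record above; only the fields this sketch reads are kept.
SAMPLE = """
[
  {
    "author": [
      {"familyName": "Lofi", "givenName": "Christoph", "type": "Person"},
      {"familyName": "Selke", "givenName": "Joachim", "type": "Person"},
      {"familyName": "Balke", "givenName": "Wolf-Tilo", "type": "Person"}
    ],
    "datePublished": "2012-05-23",
    "name": "Information Extraction Meets Crowdsourcing: A Promising Couple",
    "pagination": "109-120",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1028950695"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/s13222-012-0092-8"]}
    ],
    "type": "ScholarlyArticle"
  }
]
"""


def summarize_record(text):
    """Hypothetical helper: extract a few fields from a SciGraph-style JSON-LD array."""
    records = json.loads(text)
    # The payload is a JSON array; pick the entry typed as a scholarly article.
    article = next(r for r in records if r.get("type") == "ScholarlyArticle")
    # Each productId entry carries a name and a single-element list of values.
    ids = {p["name"]: p["value"][0] for p in article.get("productId", [])}
    return {
        "title": article["name"],
        "authors": [f'{a["givenName"]} {a["familyName"]}' for a in article["author"]],
        "date": article["datePublished"],
        "pages": article["pagination"],
        "doi": ids.get("doi"),
    }


summary = summarize_record(SAMPLE)
print(summary["title"])
print(", ".join(summary["authors"]))
```

Treating the record as plain JSON is enough for simple lookups like this; a full JSON-LD processor would only be needed to expand the terms against the `@context`.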
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s13222-012-0092-8'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s13222-012-0092-8'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s13222-012-0092-8'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s13222-012-0092-8'
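
The four curl commands above differ only in the `Accept` header, so the same content negotiation might be scripted as follows. This is a minimal sketch using Python's standard `urllib.request`; the helper name `build_request` and the short format labels are assumptions made for illustration, while the URL and media types are copied from the commands above. The request is only prepared here, not sent.

```python
import urllib.request

BASE = "https://scigraph.springernature.com/pub.10.1007/s13222-012-0092-8"

# Media types copied from the curl commands; the dictionary keys are
# labels chosen for this sketch.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=BASE):
    """Hypothetical helper: prepare (but do not send) a GET request for one RDF serialization."""
    if fmt not in ACCEPT:
        raise ValueError(f"unknown format: {fmt!r}")
    return urllib.request.Request(url, headers={"Accept": ACCEPT[fmt]})


req = build_request("turtle")
print(req.get_method(), req.full_url, req.get_header("Accept"))

# Sending the request needs network access and assumes the service is reachable:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     body = resp.read().decode("utf-8")
```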


     

    This table displays all metadata directly associated with this object as RDF triples.

    121 TRIPLES      22 PREDICATES      65 URIs      54 LITERALS      6 BLANK NODES

    Row Subject Predicate Object (a row that omits the subject or predicate repeats the one from the row above)
    1 sg:pub.10.1007/s13222-012-0092-8 schema:about anzsrc-for:08
    2 anzsrc-for:0801
    3 schema:author Nfb35eb4af42240b889bd8ab7443f9252
    4 schema:citation sg:pub.10.1007/978-0-387-85820-3_5
    5 sg:pub.10.1023/b:stco.0000035301.49549.88
    6 sg:pub.10.1038/scientificamerican0182-160
    7 schema:datePublished 2012-05-23
    8 schema:datePublishedReg 2012-05-23
    9 schema:description Recent years brought tremendous advancements in the area of automated information extraction. But still, problem scenarios remain where even state-of-the-art algorithms do not provide a satisfying solution. In these cases, another aspiring recent trend can be exploited to achieve the required extraction quality: explicit crowdsourcing of human intelligence tasks. In this paper, we discuss the synergies between information extraction and crowdsourcing. In particular, we methodically identify and classify the challenges and fallacies that arise when combining both approaches. Furthermore, we argue that for harnessing the full potential of either approach, true hybrid techniques must be considered. To demonstrate this point, we showcase such a hybrid technique, which tightly interweaves information extraction with crowdsourcing and machine learning to vastly surpass the abilities of either technique.
    10 schema:genre article
    11 schema:inLanguage en
    12 schema:isAccessibleForFree false
    13 schema:isPartOf N3b02a430515646e3be9581337cbabd17
    14 N6ed6a416ba2c47038d154226544142cc
    15 sg:journal.1136415
    16 schema:keywords Crowdsourcing
    17 ability
    18 advancement
    19 algorithm
    20 approach
    21 area
    22 art algorithms
    23 cases
    24 challenges
    25 couples
    26 extraction
    27 extraction quality
    28 fallacy
    29 full potential
    30 human intelligence tasks
    31 hybrid technique
    32 information extraction
    33 intelligence tasks
    34 machine
    35 paper
    36 point
    37 potential
    38 problem scenarios
    39 promising couple
    40 quality
    41 recent trends
    42 recent years
    43 satisfying solution
    44 scenarios
    45 solution
    46 state
    47 synergy
    48 task
    49 technique
    50 tremendous advancement
    51 trends
    52 years
    53 schema:name Information Extraction Meets Crowdsourcing: A Promising Couple
    54 schema:pagination 109-120
    55 schema:productId N426d810331754b1b9f580b3e7e723191
    56 N6c56494452054ea49b2c83cf73dcdac4
    57 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028950695
    58 https://doi.org/10.1007/s13222-012-0092-8
    59 schema:sdDatePublished 2022-06-01T22:12
    60 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    61 schema:sdPublisher N7e7bb9f17d6c4476a8e720947fad5688
    62 schema:url https://doi.org/10.1007/s13222-012-0092-8
    63 sgo:license sg:explorer/license/
    64 sgo:sdDataset articles
    65 rdf:type schema:ScholarlyArticle
    66 N3b02a430515646e3be9581337cbabd17 schema:volumeNumber 12
    67 rdf:type schema:PublicationVolume
    68 N426d810331754b1b9f580b3e7e723191 schema:name dimensions_id
    69 schema:value pub.1028950695
    70 rdf:type schema:PropertyValue
    71 N6c56494452054ea49b2c83cf73dcdac4 schema:name doi
    72 schema:value 10.1007/s13222-012-0092-8
    73 rdf:type schema:PropertyValue
    74 N6ed6a416ba2c47038d154226544142cc schema:issueNumber 2
    75 rdf:type schema:PublicationIssue
    76 N715ebebb280a47fe9e777b44783ffd28 rdf:first sg:person.014313642615.12
    77 rdf:rest rdf:nil
    78 N7e7bb9f17d6c4476a8e720947fad5688 schema:name Springer Nature - SN SciGraph project
    79 rdf:type schema:Organization
    80 Nb8999eb73d9d49df9ca241ae064b899e rdf:first sg:person.012152554345.21
    81 rdf:rest N715ebebb280a47fe9e777b44783ffd28
    82 Nfb35eb4af42240b889bd8ab7443f9252 rdf:first sg:person.011355173745.44
    83 rdf:rest Nb8999eb73d9d49df9ca241ae064b899e
    84 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    85 schema:name Information and Computing Sciences
    86 rdf:type schema:DefinedTerm
    87 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
    88 schema:name Artificial Intelligence and Image Processing
    89 rdf:type schema:DefinedTerm
    90 sg:journal.1136415 schema:issn 1610-1995
    91 1618-2162
    92 schema:name Datenbank-Spektrum
    93 schema:publisher Springer Nature
    94 rdf:type schema:Periodical
    95 sg:person.011355173745.44 schema:affiliation grid-institutes:grid.6738.a
    96 schema:familyName Lofi
    97 schema:givenName Christoph
    98 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011355173745.44
    99 rdf:type schema:Person
    100 sg:person.012152554345.21 schema:affiliation grid-institutes:grid.6738.a
    101 schema:familyName Selke
    102 schema:givenName Joachim
    103 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012152554345.21
    104 rdf:type schema:Person
    105 sg:person.014313642615.12 schema:affiliation grid-institutes:grid.6738.a
    106 schema:familyName Balke
    107 schema:givenName Wolf-Tilo
    108 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014313642615.12
    109 rdf:type schema:Person
    110 sg:pub.10.1007/978-0-387-85820-3_5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002058343
    111 https://doi.org/10.1007/978-0-387-85820-3_5
    112 rdf:type schema:CreativeWork
    113 sg:pub.10.1023/b:stco.0000035301.49549.88 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000991887
    114 https://doi.org/10.1023/b:stco.0000035301.49549.88
    115 rdf:type schema:CreativeWork
    116 sg:pub.10.1038/scientificamerican0182-160 schema:sameAs https://app.dimensions.ai/details/publication/pub.1056482656
    117 https://doi.org/10.1038/scientificamerican0182-160
    118 rdf:type schema:CreativeWork
    119 grid-institutes:grid.6738.a schema:alternateName Institut für Informationssysteme, Technische Universität Braunschweig, Braunschweig, Germany
    120 schema:name Institut für Informationssysteme, Technische Universität Braunschweig, Braunschweig, Germany
    121 rdf:type schema:Organization
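
In the table above, the author order is encoded as an RDF collection: `schema:author` points to a blank node, and each node in the chain carries an `rdf:first` (the item) and an `rdf:rest` (the next node, or `rdf:nil` at the end). The sketch below walks that chain to recover the ordered author list. The triples are copied from rows 3, 76-77 and 80-83; the function name `read_rdf_list` is hypothetical, and prefixed names are kept as plain strings rather than expanded to full IRIs.

```python
RDF_NIL = "rdf:nil"

# Triples copied from rows 3, 76-77 and 80-83 of the table above.
TRIPLES = [
    ("sg:pub.10.1007/s13222-012-0092-8", "schema:author", "Nfb35eb4af42240b889bd8ab7443f9252"),
    ("N715ebebb280a47fe9e777b44783ffd28", "rdf:first", "sg:person.014313642615.12"),
    ("N715ebebb280a47fe9e777b44783ffd28", "rdf:rest", "rdf:nil"),
    ("Nb8999eb73d9d49df9ca241ae064b899e", "rdf:first", "sg:person.012152554345.21"),
    ("Nb8999eb73d9d49df9ca241ae064b899e", "rdf:rest", "N715ebebb280a47fe9e777b44783ffd28"),
    ("Nfb35eb4af42240b889bd8ab7443f9252", "rdf:first", "sg:person.011355173745.44"),
    ("Nfb35eb4af42240b889bd8ab7443f9252", "rdf:rest", "Nb8999eb73d9d49df9ca241ae064b899e"),
]


def read_rdf_list(triples, head):
    """Hypothetical helper: follow rdf:first / rdf:rest links from `head` until rdf:nil."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        # Guard against malformed data: cycles or nodes without an rdf:first.
        if node in seen or node not in first:
            raise ValueError(f"malformed RDF list at node {node!r}")
        seen.add(node)
        items.append(first[node])
        node = rest.get(node, RDF_NIL)
    return items


# Find the head of the author list, then walk it.
head = next(o for s, p, o in TRIPLES if p == "schema:author")
authors = read_rdf_list(TRIPLES, head)
print(authors)
```

The recovered order (Lofi, Selke, Balke by person identifier) matches the `author` array in the JSON-LD record, which is why the list is modeled as a collection rather than as three unordered `schema:author` triples.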
     



