Developing an innovative entity extraction method for unstructured data


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2017-12

AUTHORS

Waleed Zaghloul, Silvana Trimi

ABSTRACT

The main goal of this study is to build high-precision extractors for entities such as Person and Organization as a good initial seed that can be used for training and learning in machine-learning systems, for the same categories, other categories, and across domains, languages, and applications. The improvement of entities extraction precision also increases the relationships extraction precision, which is particularly important in certain domains (such as intelligence systems, social networking, genetic studies, healthcare, etc.). These increases in precision improve the end users’ experience quality in using the extraction system because it lowers the time that users spend for training the system and correcting outputs, focusing more on analyzing the information extracted to make better data-driven decisions.

PAGES

3

References to SciGraph publications

  • 2013. An Approach for Extracting and Disambiguating Arabic Persons’ Names Using Clustered Dictionaries and Scored Patterns in NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS
  • 2014-06. Fine-grained Dutch named entity recognition in LANGUAGE RESOURCES AND EVALUATION
  • 2015-12. The age of quality innovation in INTERNATIONAL JOURNAL OF QUALITY INNOVATION

Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1186/s40887-017-0012-y

    DOI

    http://dx.doi.org/10.1186/s40887-017-0012-y

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1085560223



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Artificial Intelligence and Image Processing", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "name": [
                "Valera Intelligent Systems, Fairfax, VA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Zaghloul", 
            "givenName": "Waleed", 
            "id": "sg:person.012756621434.09", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012756621434.09"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Nebraska\u2013Lincoln", 
              "id": "https://www.grid.ac/institutes/grid.24434.35", 
              "name": [
                "Department of Supply Chain Management and Analytics, College of Business Administration, University of Nebraska, Lincoln, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Trimi", 
            "givenName": "Silvana", 
            "id": "sg:person.010664037513.94", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010664037513.94"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/978-3-642-38824-8_17", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1000269880", 
              "https://doi.org/10.1007/978-3-642-38824-8_17"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/2090176.2090178", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1021470883"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/2339530.2339742", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1024019118"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10579-013-9255-y", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1026617485", 
              "https://doi.org/10.1007/s10579-013-9255-y"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/2500873", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1030277687"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3115/1072228.1072282", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1033000915"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1108/02635570910957669", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1043379749"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s40887-015-0002-x", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1044165590", 
              "https://doi.org/10.1186/s40887-015-0002-x"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1098/rsta.2000.0587", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1049160798"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/2345396.2345427", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1050240211"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1075/li.30.1.03nad", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1058235992"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/tcbb.2010.51", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1061540806"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1504/ijbra.2010.032121", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1067439393"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3115/1572306.1572317", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1099140221"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3115/1596374.1596399", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1099140430"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3115/1073012.1073067", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1099239575"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3115/1073083.1073163", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1099239649"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.2307/41703503", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1107655029"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2017-12", 
        "datePublishedReg": "2017-12-01", 
        "description": "The main goal of this study is to build high-precision extractors for entities such as Person and Organization as a good initial seed that can be used for training and learning in machine-learning systems, for the same categories, other categories, and across domains, languages, and applications. The improvement of entities extraction precision also increases the relationships extraction precision, which is particularly important in certain domains (such as intelligence systems, social networking, genetic studies, healthcare, etc.). These increases in precision improve the end users\u2019 experience quality in using the extraction system because it lowers the time that users spend for training the system and correcting outputs, focusing more on analyzing the information extracted to make better data-driven decisions.", 
        "genre": "research_article", 
        "id": "sg:pub.10.1186/s40887-017-0012-y", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isPartOf": [
          {
            "id": "sg:journal.1136001", 
            "issn": [
              "2363-7021"
            ], 
            "name": "International Journal of Quality Innovation", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "3"
          }
        ], 
        "name": "Developing an innovative entity extraction method for unstructured data", 
        "pagination": "3", 
        "productId": [
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "24fc498d0a4be4b2b8114c30296b381f43a8162cca2110e4cd455205ef66bacb"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1186/s40887-017-0012-y"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1085560223"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1186/s40887-017-0012-y", 
          "https://app.dimensions.ai/details/publication/pub.1085560223"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2019-04-11T10:00", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000347_0000000347/records_89816_00000003.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://link.springer.com/10.1186%2Fs40887-017-0012-y"
      }
    ]
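
As a rough illustration of how a record like this might be consumed, the Python sketch below parses an abbreviated copy of it with the standard `json` module and pulls out the title, the author names, and the cited identifiers. The `RECORD` string is a hand-trimmed excerpt (most fields omitted, one citation entry deliberately repeated to show de-duplication), and the helper name `summarize` is hypothetical; neither is part of any SciGraph tooling.

```python
import json

# Abbreviated, hand-trimmed excerpt of the JSON-LD record (assumption:
# only the fields used below are kept; one citation is repeated on purpose).
RECORD = """
[
  {
    "author": [
      {"familyName": "Zaghloul", "givenName": "Waleed", "type": "Person"},
      {"familyName": "Trimi", "givenName": "Silvana", "type": "Person"}
    ],
    "citation": [
      {"id": "sg:pub.10.1186/s40887-015-0002-x", "type": "CreativeWork"},
      {"id": "sg:pub.10.1186/s40887-015-0002-x", "type": "CreativeWork"},
      {"id": "https://doi.org/10.2307/41703503", "type": "CreativeWork"}
    ],
    "name": "Developing an innovative entity extraction method for unstructured data",
    "type": "ScholarlyArticle"
  }
]
"""


def summarize(record_text):
    """Return the title, author names, and de-duplicated citation ids."""
    article = json.loads(record_text)[0]  # the record is a one-element list
    authors = [f"{a['givenName']} {a['familyName']}" for a in article["author"]]
    # dict.fromkeys keeps first-seen order while dropping repeated ids.
    citations = list(dict.fromkeys(c["id"] for c in article["citation"]))
    return {"title": article["name"], "authors": authors, "citations": citations}


summary = summarize(RECORD)
print(summary["title"])
print(summary["authors"])
print(summary["citations"])
```

Because JSON-LD is plain JSON, no RDF library is needed for this kind of field lookup; a dedicated JSON-LD processor would only become necessary if the `@context` had to be expanded.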
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s40887-017-0012-y'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s40887-017-0012-y'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s40887-017-0012-y'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s40887-017-0012-y'
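
The four curl commands differ only in their `Accept` header, so the same content negotiation can be sketched in Python with the standard library. This is a minimal sketch, not an official client: the names `MEDIA_TYPES`, `build_request`, and `fetch` are hypothetical, the media types and URL are copied from the commands above, and whether the endpoint still responds is not verified here.

```python
import urllib.request

BASE = "https://scigraph.springernature.com/"

# Media types taken from the curl commands; the short keys are made up.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id, fmt="json-ld"):
    """Build (but do not send) a GET request for one RDF serialization."""
    return urllib.request.Request(
        BASE + record_id, headers={"Accept": MEDIA_TYPES[fmt]}
    )


def fetch(record_id, fmt="json-ld", timeout=30):
    """Send the request and return the response body as text."""
    request = build_request(record_id, fmt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request touches no network; calling fetch(...) would.
req = build_request("pub.10.1186/s40887-017-0012-y", "turtle")
print(req.full_url)
print(req.get_header("Accept"))
```

Keeping request construction separate from sending makes the header logic easy to check offline; `fetch` is the only part that depends on the service being reachable.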


     

    This table displays all metadata directly associated with this object as RDF triples.

    126 TRIPLES      21 PREDICATES      45 URIs      19 LITERALS      7 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1186/s40887-017-0012-y schema:about anzsrc-for:08
    2 anzsrc-for:0801
    3 schema:author Nb9572b767e0247a7a94d61b2fd24ce0e
    4 schema:citation sg:pub.10.1007/978-3-642-38824-8_17
    5 sg:pub.10.1007/s10579-013-9255-y
    6 sg:pub.10.1186/s40887-015-0002-x
    7 https://doi.org/10.1075/li.30.1.03nad
    8 https://doi.org/10.1098/rsta.2000.0587
    9 https://doi.org/10.1108/02635570910957669
    10 https://doi.org/10.1109/tcbb.2010.51
    11 https://doi.org/10.1145/2090176.2090178
    12 https://doi.org/10.1145/2339530.2339742
    13 https://doi.org/10.1145/2345396.2345427
    14 https://doi.org/10.1145/2500873
    15 https://doi.org/10.1504/ijbra.2010.032121
    16 https://doi.org/10.2307/41703503
    17 https://doi.org/10.3115/1072228.1072282
    18 https://doi.org/10.3115/1073012.1073067
    19 https://doi.org/10.3115/1073083.1073163
    20 https://doi.org/10.3115/1572306.1572317
    21 https://doi.org/10.3115/1596374.1596399
    22 schema:datePublished 2017-12
    23 schema:datePublishedReg 2017-12-01
    24 schema:description The main goal of this study is to build high-precision extractors for entities such as Person and Organization as a good initial seed that can be used for training and learning in machine-learning systems, for the same categories, other categories, and across domains, languages, and applications. The improvement of entities extraction precision also increases the relationships extraction precision, which is particularly important in certain domains (such as intelligence systems, social networking, genetic studies, healthcare, etc.). These increases in precision improve the end users’ experience quality in using the extraction system because it lowers the time that users spend for training the system and correcting outputs, focusing more on analyzing the information extracted to make better data-driven decisions.
    25 schema:genre research_article
    26 schema:inLanguage en
    27 schema:isAccessibleForFree true
    28 schema:isPartOf Ncafc9a6d30464e969d4a2aeb914f23f4
    29 Nf2a79f76f74d44eaa07b93a03542d28d
    30 sg:journal.1136001
    31 schema:name Developing an innovative entity extraction method for unstructured data
    32 schema:pagination 3
    33 schema:productId N0c6987d275654a0fa9ab139fc0b27993
    34 N35e5281a7b254c9db7df5ad2330b2689
    35 Nfc106df3bc2a41f6ac7a50c7c8f44beb
    36 schema:sameAs https://app.dimensions.ai/details/publication/pub.1085560223
    37 https://doi.org/10.1186/s40887-017-0012-y
    38 schema:sdDatePublished 2019-04-11T10:00
    39 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    40 schema:sdPublisher N2c02e4cc4ed64f53a8f80523e7256bc2
    41 schema:url https://link.springer.com/10.1186%2Fs40887-017-0012-y
    42 sgo:license sg:explorer/license/
    43 sgo:sdDataset articles
    44 rdf:type schema:ScholarlyArticle
    45 N0c6987d275654a0fa9ab139fc0b27993 schema:name dimensions_id
    46 schema:value pub.1085560223
    47 rdf:type schema:PropertyValue
    48 N2c02e4cc4ed64f53a8f80523e7256bc2 schema:name Springer Nature - SN SciGraph project
    49 rdf:type schema:Organization
    50 N35e5281a7b254c9db7df5ad2330b2689 schema:name doi
    51 schema:value 10.1186/s40887-017-0012-y
    52 rdf:type schema:PropertyValue
    53 N50776f9933a94a8b8320a5aa16b00233 schema:name Valera Intelligent Systems, Fairfax, VA, USA
    54 rdf:type schema:Organization
    55 N5369d3e296aa416c919e9a614b3afe46 rdf:first sg:person.010664037513.94
    56 rdf:rest rdf:nil
    57 Nb9572b767e0247a7a94d61b2fd24ce0e rdf:first sg:person.012756621434.09
    58 rdf:rest N5369d3e296aa416c919e9a614b3afe46
    59 Ncafc9a6d30464e969d4a2aeb914f23f4 schema:issueNumber 1
    60 rdf:type schema:PublicationIssue
    61 Nf2a79f76f74d44eaa07b93a03542d28d schema:volumeNumber 3
    62 rdf:type schema:PublicationVolume
    63 Nfc106df3bc2a41f6ac7a50c7c8f44beb schema:name readcube_id
    64 schema:value 24fc498d0a4be4b2b8114c30296b381f43a8162cca2110e4cd455205ef66bacb
    65 rdf:type schema:PropertyValue
    66 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    67 schema:name Information and Computing Sciences
    68 rdf:type schema:DefinedTerm
    69 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
    70 schema:name Artificial Intelligence and Image Processing
    71 rdf:type schema:DefinedTerm
    72 sg:journal.1136001 schema:issn 2363-7021
    73 schema:name International Journal of Quality Innovation
    74 rdf:type schema:Periodical
    75 sg:person.010664037513.94 schema:affiliation https://www.grid.ac/institutes/grid.24434.35
    76 schema:familyName Trimi
    77 schema:givenName Silvana
    78 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010664037513.94
    79 rdf:type schema:Person
    80 sg:person.012756621434.09 schema:affiliation N50776f9933a94a8b8320a5aa16b00233
    81 schema:familyName Zaghloul
    82 schema:givenName Waleed
    83 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012756621434.09
    84 rdf:type schema:Person
    85 sg:pub.10.1007/978-3-642-38824-8_17 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000269880
    86 https://doi.org/10.1007/978-3-642-38824-8_17
    87 rdf:type schema:CreativeWork
    88 sg:pub.10.1007/s10579-013-9255-y schema:sameAs https://app.dimensions.ai/details/publication/pub.1026617485
    89 https://doi.org/10.1007/s10579-013-9255-y
    90 rdf:type schema:CreativeWork
    91 sg:pub.10.1186/s40887-015-0002-x schema:sameAs https://app.dimensions.ai/details/publication/pub.1044165590
    92 https://doi.org/10.1186/s40887-015-0002-x
    93 rdf:type schema:CreativeWork
    94 https://doi.org/10.1075/li.30.1.03nad schema:sameAs https://app.dimensions.ai/details/publication/pub.1058235992
    95 rdf:type schema:CreativeWork
    96 https://doi.org/10.1098/rsta.2000.0587 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049160798
    97 rdf:type schema:CreativeWork
    98 https://doi.org/10.1108/02635570910957669 schema:sameAs https://app.dimensions.ai/details/publication/pub.1043379749
    99 rdf:type schema:CreativeWork
    100 https://doi.org/10.1109/tcbb.2010.51 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061540806
    101 rdf:type schema:CreativeWork
    102 https://doi.org/10.1145/2090176.2090178 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021470883
    103 rdf:type schema:CreativeWork
    104 https://doi.org/10.1145/2339530.2339742 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024019118
    105 rdf:type schema:CreativeWork
    106 https://doi.org/10.1145/2345396.2345427 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050240211
    107 rdf:type schema:CreativeWork
    108 https://doi.org/10.1145/2500873 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030277687
    109 rdf:type schema:CreativeWork
    110 https://doi.org/10.1504/ijbra.2010.032121 schema:sameAs https://app.dimensions.ai/details/publication/pub.1067439393
    111 rdf:type schema:CreativeWork
    112 https://doi.org/10.2307/41703503 schema:sameAs https://app.dimensions.ai/details/publication/pub.1107655029
    113 rdf:type schema:CreativeWork
    114 https://doi.org/10.3115/1072228.1072282 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033000915
    115 rdf:type schema:CreativeWork
    116 https://doi.org/10.3115/1073012.1073067 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099239575
    117 rdf:type schema:CreativeWork
    118 https://doi.org/10.3115/1073083.1073163 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099239649
    119 rdf:type schema:CreativeWork
    120 https://doi.org/10.3115/1572306.1572317 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099140221
    121 rdf:type schema:CreativeWork
    122 https://doi.org/10.3115/1596374.1596399 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099140430
    123 rdf:type schema:CreativeWork
    124 https://www.grid.ac/institutes/grid.24434.35 schema:alternateName University of Nebraska–Lincoln
    125 schema:name Department of Supply Chain Management and Analytics, College of Business Administration, University of Nebraska, Lincoln, USA
    126 rdf:type schema:Organization
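
In the table, the `schema:author` value (row 3) is a blank node that heads an RDF collection: rows 55 to 58 chain the two authors together with `rdf:first` and `rdf:rest` until `rdf:nil` is reached. The Python sketch below walks that chain to recover the author order. The triples are copied from those rows; the function name `read_list` and the tuple representation are assumptions made for illustration, not how any particular RDF library stores data.

```python
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"

# Rows 55-58 of the table, as (subject, predicate, object) tuples.
TRIPLES = [
    ("N5369d3e296aa416c919e9a614b3afe46", RDF_FIRST, "sg:person.010664037513.94"),
    ("N5369d3e296aa416c919e9a614b3afe46", RDF_REST, RDF_NIL),
    ("Nb9572b767e0247a7a94d61b2fd24ce0e", RDF_FIRST, "sg:person.012756621434.09"),
    ("Nb9572b767e0247a7a94d61b2fd24ce0e", RDF_REST, "N5369d3e296aa416c919e9a614b3afe46"),
]


def read_list(head, triples):
    """Follow rdf:first / rdf:rest links from `head` until rdf:nil."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    items = []
    node = head
    while node != RDF_NIL:
        items.append(first[node])  # the element stored at this list cell
        node = rest[node]          # move on to the next cell
    return items


# The head node is the object of schema:author in row 3.
authors = read_list("Nb9572b767e0247a7a94d61b2fd24ce0e", TRIPLES)
print(authors)
```

The linked-list encoding is what lets RDF, which is otherwise an unordered set of triples, preserve the order in which the authors are listed.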
     



