Layout-Aware Semi-automatic Information Extraction for Pharmaceutical Documents


Ontology type: schema:Chapter     


Chapter Info

DATE

2017-10-24

AUTHORS

Simon Harmata , Katharina Hofer-Schmitz , Phuong-Ha Nguyen , Christoph Quix , Bujar Bakiu

ABSTRACT

Pharmaceutical companies and regulatory authorities are also affected by the current digitalization process and transform their paper-based, document-oriented communication to a structured, digital information exchange. The documents exchanged so far contain a huge amount of information that needs to be transformed into a structured format to enable a more efficient communication in the future. In such a setting, it is important that the information extracted from documents is very accurate as the information is used in a legal, regulatory process and also for the identification of unknown adverse effects of medicinal products that might be a threat to patients’ health. In this paper, we present our layout-aware semi-automatic information extraction system LASIE that combines techniques from rule-based information extraction, flexible data management, and semantic information management in a user-centered design. We applied the system in a case study with an industrial partner and achieved very satisfying results.

PAGES

71-85

References to SciGraph publications

  • 2007. GeRoMe: A Generic Role Based Metamodel for Model Management in JOURNAL ON DATA SEMANTICS VIII
  • 2015. Identification of Adverse Drug Events in Chinese Clinical Narrative Text in UBIQUITOUS COMPUTING APPLICATION AND WIRELESS SENSOR
Book

    TITLE

    Data Integration in the Life Sciences

    ISBN

    978-3-319-69750-5
    978-3-319-69751-2

    Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/978-3-319-69751-2_8

    DOI

    http://dx.doi.org/10.1007/978-3-319-69751-2_8

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1092370073


    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information Systems", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Fraunhofer Institute for Applied Information Technology", 
              "id": "https://www.grid.ac/institutes/grid.469870.4", 
              "name": [
                "Fraunhofer Institute for Applied Information Technology FIT Schloss Birlinghoven, 53754, Sankt Augustin, Germany"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Harmata", 
            "givenName": "Simon", 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Fraunhofer Institute for Applied Information Technology", 
              "id": "https://www.grid.ac/institutes/grid.469870.4", 
              "name": [
                "Fraunhofer Institute for Applied Information Technology FIT Schloss Birlinghoven, 53754, Sankt Augustin, Germany"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Hofer-Schmitz", 
            "givenName": "Katharina", 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Fraunhofer Institute for Applied Information Technology", 
              "id": "https://www.grid.ac/institutes/grid.469870.4", 
              "name": [
                "Fraunhofer Institute for Applied Information Technology FIT Schloss Birlinghoven, 53754, Sankt Augustin, Germany"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Nguyen", 
            "givenName": "Phuong-Ha", 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Fraunhofer Institute for Applied Information Technology", 
              "id": "https://www.grid.ac/institutes/grid.469870.4", 
              "name": [
                "Fraunhofer Institute for Applied Information Technology FIT Schloss Birlinghoven, 53754, Sankt Augustin, Germany"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Quix", 
            "givenName": "Christoph", 
            "id": "sg:person.014024640471.57", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014024640471.57"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Fraunhofer Institute for Applied Information Technology", 
              "id": "https://www.grid.ac/institutes/grid.469870.4", 
              "name": [
                "Fraunhofer Institute for Applied Information Technology FIT Schloss Birlinghoven, 53754, Sankt Augustin, Germany"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Bakiu", 
            "givenName": "Bujar", 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://app.dimensions.ai/details/publication/pub.1002999644", 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-540-70664-9_4", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1016410614", 
              "https://doi.org/10.1007/978-3-540-70664-9_4"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/1998076.1998079", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1025944524"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1371/journal.pone.0134208", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1036130009"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/j.jbi.2010.03.011", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1038791843"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/j.jbi.2005.11.004", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1047241910"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-94-017-9618-7_62", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1050713431", 
              "https://doi.org/10.1007/978-94-017-9618-7_62"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1561/1900000003", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1068001353"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/miv.1989.40513", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1086362784"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/icdar.2013.292", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1094984511"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2017-10-24", 
        "datePublishedReg": "2017-10-24", 
        "description": "Pharmaceutical companies and regulatory authorities are also affected by the current digitalization process and transform their paper-based, document-oriented communication to a structured, digital information exchange. The documents exchanged so far contain a huge amount of information that needs to be transformed into a structured format to enable a more efficient communication in the future. In such a setting, it is important that the information extracted from documents is very accurate as the information is used in a legal, regulatory process and also for the identification of unknown adverse effects of medicinal products that might be a threat to patients\u2019 health. In this paper, we present our layout-aware semi-automatic information extraction system LASIE that combines techniques from rule-based information extraction, flexible data management, and semantic information management in a user-centered design. We applied the system in a case study with an industrial partner and achieved very satisfying results.", 
        "editor": [
          {
            "familyName": "Da Silveira", 
            "givenName": "Marcos", 
            "type": "Person"
          }, 
          {
            "familyName": "Pruski", 
            "givenName": "C\u00e9dric", 
            "type": "Person"
          }, 
          {
            "familyName": "Schneider", 
            "givenName": "Reinhard", 
            "type": "Person"
          }
        ], 
        "genre": "chapter", 
        "id": "sg:pub.10.1007/978-3-319-69751-2_8", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": false, 
        "isPartOf": {
          "isbn": [
            "978-3-319-69750-5", 
            "978-3-319-69751-2"
          ], 
          "name": "Data Integration in the Life Sciences", 
          "type": "Book"
        }, 
        "name": "Layout-Aware Semi-automatic Information Extraction for Pharmaceutical Documents", 
        "pagination": "71-85", 
        "productId": [
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/978-3-319-69751-2_8"
            ]
          }, 
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "3083632dc05f5293794cfff35b62a6082d2308ac31e1beb10e2d7c7677277fdb"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1092370073"
            ]
          }
        ], 
        "publisher": {
          "location": "Cham", 
          "name": "Springer International Publishing", 
          "type": "Organisation"
        }, 
        "sameAs": [
          "https://doi.org/10.1007/978-3-319-69751-2_8", 
          "https://app.dimensions.ai/details/publication/pub.1092370073"
        ], 
        "sdDataset": "chapters", 
        "sdDatePublished": "2019-04-16T04:59", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000325_0000000325/records_100778_00000000.jsonl", 
        "type": "Chapter", 
        "url": "https://link.springer.com/10.1007%2F978-3-319-69751-2_8"
      }
    ]
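
Because JSON-LD is plain JSON, the record above can be read with any JSON parser. The following is a minimal Python sketch, assuming an abridged excerpt of the record with only two of the five authors; `RECORD_JSON` and `summarize` are illustrative names and not part of SciGraph.

```python
import json

# Abridged excerpt of the record above; only the fields used below are kept.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Harmata", "givenName": "Simon", "type": "Person"},
      {"familyName": "Quix", "givenName": "Christoph",
       "id": "sg:person.014024640471.57", "type": "Person"}
    ],
    "name": "Layout-Aware Semi-automatic Information Extraction for Pharmaceutical Documents",
    "pagination": "71-85",
    "productId": [
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1007/978-3-319-69751-2_8"]},
      {"name": "dimensions_id", "type": "PropertyValue",
       "value": ["pub.1092370073"]}
    ]
  }
]
"""


def summarize(record_json: str) -> dict:
    """Return title, author names and product identifiers of the first record."""
    record = json.loads(record_json)[0]  # the payload is a one-element list
    authors = [
        f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])
    ]
    # productId is a list of name/value pairs; each value is itself a list.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    return {"title": record["name"], "authors": authors, "ids": ids}


summary = summarize(RECORD_JSON)
print(summary["ids"]["doi"])
```

This treats the record as ordinary JSON and ignores the `@context`; a full JSON-LD processor would be needed to expand the keys into IRIs.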
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-69751-2_8'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-69751-2_8'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-69751-2_8'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-69751-2_8'
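
The four curl commands differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The helper names `build_request` and `fetch` and the short format keys are assumptions made for this example; whether the endpoint still responds is not verified here.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-319-69751-2_8"

# Media types copied from the curl commands above.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Build a GET request that asks the server for one RDF serialization."""
    if fmt not in ACCEPT:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(ACCEPT)}")
    return urllib.request.Request(url, headers={"Accept": ACCEPT[fmt]})


def fetch(fmt: str, url: str = RECORD_URL, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text (needs network)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network; fetch() does.
request = build_request("turtle")
print(request.get_header("Accept"))
```

Calling `fetch("nt")` would be the counterpart of the N-Triples curl command.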


     

    This table displays all metadata directly associated to this object as RDF triples.

    130 TRIPLES      23 PREDICATES      36 URIs      19 LITERALS      8 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/978-3-319-69751-2_8 schema:about anzsrc-for:08
    2 anzsrc-for:0806
    3 schema:author N0b3810dee6db41abadb30a6125e8c957
    4 schema:citation sg:pub.10.1007/978-3-540-70664-9_4
    5 sg:pub.10.1007/978-94-017-9618-7_62
    6 https://app.dimensions.ai/details/publication/pub.1002999644
    7 https://doi.org/10.1016/j.jbi.2005.11.004
    8 https://doi.org/10.1016/j.jbi.2010.03.011
    9 https://doi.org/10.1109/icdar.2013.292
    10 https://doi.org/10.1109/miv.1989.40513
    11 https://doi.org/10.1145/1998076.1998079
    12 https://doi.org/10.1371/journal.pone.0134208
    13 https://doi.org/10.1561/1900000003
    14 schema:datePublished 2017-10-24
    15 schema:datePublishedReg 2017-10-24
    16 schema:description Pharmaceutical companies and regulatory authorities are also affected by the current digitalization process and transform their paper-based, document-oriented communication to a structured, digital information exchange. The documents exchanged so far contain a huge amount of information that needs to be transformed into a structured format to enable a more efficient communication in the future. In such a setting, it is important that the information extracted from documents is very accurate as the information is used in a legal, regulatory process and also for the identification of unknown adverse effects of medicinal products that might be a threat to patients’ health. In this paper, we present our layout-aware semi-automatic information extraction system LASIE that combines techniques from rule-based information extraction, flexible data management, and semantic information management in a user-centered design. We applied the system in a case study with an industrial partner and achieved very satisfying results.
    17 schema:editor N90db951a0646418ca7b967c725b3363d
    18 schema:genre chapter
    19 schema:inLanguage en
    20 schema:isAccessibleForFree false
    21 schema:isPartOf N98b2438029af461090898156ff0cfd82
    22 schema:name Layout-Aware Semi-automatic Information Extraction for Pharmaceutical Documents
    23 schema:pagination 71-85
    24 schema:productId N57bb4239d6aa40538e2e69b77622da34
    25 Nb569f28e039e4c33889efe6043686b4f
    26 Nc2f7a594a17e4369b59906820b733e35
    27 schema:publisher N036889371e424894b43b17d8ffcc4265
    28 schema:sameAs https://app.dimensions.ai/details/publication/pub.1092370073
    29 https://doi.org/10.1007/978-3-319-69751-2_8
    30 schema:sdDatePublished 2019-04-16T04:59
    31 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    32 schema:sdPublisher Nbd9a275c3cea4f9e8458e68d36024cff
    33 schema:url https://link.springer.com/10.1007%2F978-3-319-69751-2_8
    34 sgo:license sg:explorer/license/
    35 sgo:sdDataset chapters
    36 rdf:type schema:Chapter
    37 N036889371e424894b43b17d8ffcc4265 schema:location Cham
    38 schema:name Springer International Publishing
    39 rdf:type schema:Organisation
    40 N050f914c77ff4f94a4ea5093547ec1a4 rdf:first Nb79a07c18fee4f669f9ef6f356eda021
    41 rdf:rest rdf:nil
    42 N0b3810dee6db41abadb30a6125e8c957 rdf:first Nbcd7428aecb9426090e5529b4d48863a
    43 rdf:rest Ne010939482c64619ad51cae82749b030
    44 N0bba8b3e783b4a438257793f9474d375 schema:familyName Pruski
    45 schema:givenName Cédric
    46 rdf:type schema:Person
    47 N1886401266a7499090db73df5eba5276 rdf:first N0bba8b3e783b4a438257793f9474d375
    48 rdf:rest N050f914c77ff4f94a4ea5093547ec1a4
    49 N24156573214842b5a2aca48a88eb1333 schema:familyName Da Silveira
    50 schema:givenName Marcos
    51 rdf:type schema:Person
    52 N24e37369958749dcb617f56ca53e3f0d schema:affiliation https://www.grid.ac/institutes/grid.469870.4
    53 schema:familyName Bakiu
    54 schema:givenName Bujar
    55 rdf:type schema:Person
    56 N3835ead6ab684498a7c36b723f7298ce schema:affiliation https://www.grid.ac/institutes/grid.469870.4
    57 schema:familyName Hofer-Schmitz
    58 schema:givenName Katharina
    59 rdf:type schema:Person
    60 N57bb4239d6aa40538e2e69b77622da34 schema:name doi
    61 schema:value 10.1007/978-3-319-69751-2_8
    62 rdf:type schema:PropertyValue
    63 N7e427285aff84bddaa0ac46c67620bf1 schema:affiliation https://www.grid.ac/institutes/grid.469870.4
    64 schema:familyName Nguyen
    65 schema:givenName Phuong-Ha
    66 rdf:type schema:Person
    67 N90db951a0646418ca7b967c725b3363d rdf:first N24156573214842b5a2aca48a88eb1333
    68 rdf:rest N1886401266a7499090db73df5eba5276
    69 N98b2438029af461090898156ff0cfd82 schema:isbn 978-3-319-69750-5
    70 978-3-319-69751-2
    71 schema:name Data Integration in the Life Sciences
    72 rdf:type schema:Book
    73 Nb569f28e039e4c33889efe6043686b4f schema:name readcube_id
    74 schema:value 3083632dc05f5293794cfff35b62a6082d2308ac31e1beb10e2d7c7677277fdb
    75 rdf:type schema:PropertyValue
    76 Nb7073a13fb3245c4a8a6339bdb56a057 rdf:first sg:person.014024640471.57
    77 rdf:rest Nc5af4bb08e5c4efd8ce7f9231ef4ffb5
    78 Nb79a07c18fee4f669f9ef6f356eda021 schema:familyName Schneider
    79 schema:givenName Reinhard
    80 rdf:type schema:Person
    81 Nbcd7428aecb9426090e5529b4d48863a schema:affiliation https://www.grid.ac/institutes/grid.469870.4
    82 schema:familyName Harmata
    83 schema:givenName Simon
    84 rdf:type schema:Person
    85 Nbd9a275c3cea4f9e8458e68d36024cff schema:name Springer Nature - SN SciGraph project
    86 rdf:type schema:Organization
    87 Nc2f7a594a17e4369b59906820b733e35 schema:name dimensions_id
    88 schema:value pub.1092370073
    89 rdf:type schema:PropertyValue
    90 Nc5af4bb08e5c4efd8ce7f9231ef4ffb5 rdf:first N24e37369958749dcb617f56ca53e3f0d
    91 rdf:rest rdf:nil
    92 Ne010939482c64619ad51cae82749b030 rdf:first N3835ead6ab684498a7c36b723f7298ce
    93 rdf:rest Nebad40c45efd4140b7a85519ad5f7a0e
    94 Nebad40c45efd4140b7a85519ad5f7a0e rdf:first N7e427285aff84bddaa0ac46c67620bf1
    95 rdf:rest Nb7073a13fb3245c4a8a6339bdb56a057
    96 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    97 schema:name Information and Computing Sciences
    98 rdf:type schema:DefinedTerm
    99 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
    100 schema:name Information Systems
    101 rdf:type schema:DefinedTerm
    102 sg:person.014024640471.57 schema:affiliation https://www.grid.ac/institutes/grid.469870.4
    103 schema:familyName Quix
    104 schema:givenName Christoph
    105 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014024640471.57
    106 rdf:type schema:Person
    107 sg:pub.10.1007/978-3-540-70664-9_4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016410614
    108 https://doi.org/10.1007/978-3-540-70664-9_4
    109 rdf:type schema:CreativeWork
    110 sg:pub.10.1007/978-94-017-9618-7_62 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050713431
    111 https://doi.org/10.1007/978-94-017-9618-7_62
    112 rdf:type schema:CreativeWork
    113 https://app.dimensions.ai/details/publication/pub.1002999644 rdf:type schema:CreativeWork
    114 https://doi.org/10.1016/j.jbi.2005.11.004 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047241910
    115 rdf:type schema:CreativeWork
    116 https://doi.org/10.1016/j.jbi.2010.03.011 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038791843
    117 rdf:type schema:CreativeWork
    118 https://doi.org/10.1109/icdar.2013.292 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094984511
    119 rdf:type schema:CreativeWork
    120 https://doi.org/10.1109/miv.1989.40513 schema:sameAs https://app.dimensions.ai/details/publication/pub.1086362784
    121 rdf:type schema:CreativeWork
    122 https://doi.org/10.1145/1998076.1998079 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025944524
    123 rdf:type schema:CreativeWork
    124 https://doi.org/10.1371/journal.pone.0134208 schema:sameAs https://app.dimensions.ai/details/publication/pub.1036130009
    125 rdf:type schema:CreativeWork
    126 https://doi.org/10.1561/1900000003 schema:sameAs https://app.dimensions.ai/details/publication/pub.1068001353
    127 rdf:type schema:CreativeWork
    128 https://www.grid.ac/institutes/grid.469870.4 schema:alternateName Fraunhofer Institute for Applied Information Technology
    129 schema:name Fraunhofer Institute for Applied Information Technology FIT Schloss Birlinghoven, 53754, Sankt Augustin, Germany
    130 rdf:type schema:Organization
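
In the table above, a row that omits the subject (or the subject and predicate) reuses the ones from the preceding row, and ordered values such as the editors are stored as `rdf:first`/`rdf:rest` linked lists ending in `rdf:nil`. The sketch below expands a few rows copied from the table into full triples and walks the editor list. The row-classification heuristic (a predicate is any token starting with `schema:`, `rdf:` or `sgo:`) and the function names are assumptions for illustration, not part of SciGraph.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Assumption: every predicate in the table uses one of these prefixes.
PREDICATE_PREFIXES = ("schema:", "rdf:", "sgo:")

# Editor-related rows copied from the table above (row counters included).
ROWS = [
    "67 N90db951a0646418ca7b967c725b3363d rdf:first N24156573214842b5a2aca48a88eb1333",
    "68 rdf:rest N1886401266a7499090db73df5eba5276",
    "47 N1886401266a7499090db73df5eba5276 rdf:first N0bba8b3e783b4a438257793f9474d375",
    "48 rdf:rest N050f914c77ff4f94a4ea5093547ec1a4",
    "40 N050f914c77ff4f94a4ea5093547ec1a4 rdf:first Nb79a07c18fee4f669f9ef6f356eda021",
    "41 rdf:rest rdf:nil",
    "49 N24156573214842b5a2aca48a88eb1333 schema:familyName Da Silveira",
    "50 schema:givenName Marcos",
    "44 N0bba8b3e783b4a438257793f9474d375 schema:familyName Pruski",
    "45 schema:givenName Cédric",
    "78 Nb79a07c18fee4f669f9ef6f356eda021 schema:familyName Schneider",
    "79 schema:givenName Reinhard",
]


def expand_rows(rows: List[str]) -> List[Triple]:
    """Carry the last seen subject and predicate forward into shortened rows."""
    triples: List[Triple] = []
    subject: Optional[str] = None
    predicate: Optional[str] = None
    for row in rows:
        parts = row.split()
        if parts and parts[0].isdigit():
            parts = parts[1:]  # drop the row counter
        if not parts:
            continue
        if parts[0].startswith(PREDICATE_PREFIXES):
            # "predicate object": subject is inherited
            predicate, obj = parts[0], " ".join(parts[1:])
        elif len(parts) >= 3 and parts[1].startswith(PREDICATE_PREFIXES):
            # "subject predicate object": a new subject starts here
            subject, predicate, obj = parts[0], parts[1], " ".join(parts[2:])
        else:
            # "object" only: subject and predicate are inherited
            obj = " ".join(parts)
        if subject is None or predicate is None:
            raise ValueError(f"row has nothing to inherit from: {row!r}")
        triples.append((subject, predicate, obj))
    return triples


def walk_list(triples: List[Triple], head: str) -> List[str]:
    """Follow rdf:first/rdf:rest from head until rdf:nil, keeping list order."""
    first: Dict[str, str] = {s: o for s, p, o in triples if p == "rdf:first"}
    rest: Dict[str, str] = {s: o for s, p, o in triples if p == "rdf:rest"}
    items: List[str] = []
    node = head
    while node != "rdf:nil":
        items.append(first[node])
        node = rest[node]
    return items


triples = expand_rows(ROWS)
family = {s: o for s, p, o in triples if p == "schema:familyName"}
# N90db... is the object of schema:editor in row 17 of the table.
editors = [family[n] for n in walk_list(triples, "N90db951a0646418ca7b967c725b3363d")]
print(editors)
```

The same walk, started from the object of `schema:author` in row 3, would recover the author order, provided the corresponding rows are included.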
     



