Generating training documents


Ontology type: sgo:Patent     


Patent Info

DATE

2015-10-20T00:00

AUTHORS

Vinay Deolalikar, Hernan Laffitte

ABSTRACT

A method of generating training documents for training a classifying device comprises, with a processor, sampling from a distribution of words in a number of original documents, and creating a number of pseudo-documents from the distribution of words, the pseudo-documents comprising a similar distribution of words as the original documents. A device for classifying textual documents comprises a processor; and a memory communicatively coupled to the processor, the memory comprising a sampling module to, when executed by the processor, determine the distribution of words in a number of original documents, a pseudo-document creation module to, when executed by the processor, create a number of pseudo-documents from the distribution of words, the pseudo-documents comprising a similar distribution of words as the original documents, and a training module to, when executed by the processor, train the device to classify textual documents based on the pseudo-documents.
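The idea in the abstract, sampling words from the distribution observed in the original documents to build pseudo-documents, can be illustrated with a minimal sketch. The abstract does not specify the sampling scheme, so everything below is an assumption made for illustration: whitespace tokenization, a single pooled unigram distribution, sampling with replacement, and the hypothetical function names `word_distribution` and `make_pseudo_documents`. It is not the patented implementation.

```python
import random
from collections import Counter


def word_distribution(documents):
    """Count how often each word occurs across all original documents."""
    counts = Counter()
    for doc in documents:
        # Assumption: lowercase, whitespace-separated tokens.
        counts.update(doc.lower().split())
    return counts


def make_pseudo_documents(documents, n_docs, doc_length, seed=0):
    """Build pseudo-documents by drawing words weighted by observed frequency."""
    counts = word_distribution(documents)
    words = list(counts)
    weights = [counts[w] for w in words]
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        " ".join(rng.choices(words, weights=weights, k=doc_length))
        for _ in range(n_docs)
    ]


originals = ["the cat sat on the mat", "the dog sat on the log"]
pseudo = make_pseudo_documents(originals, n_docs=3, doc_length=6)
for doc in pseudo:
    print(doc)
```

Because frequent words are drawn more often, long pseudo-documents approach the word distribution of the originals, which is the property the abstract describes. A classifier would then be trained on `pseudo` in addition to, or instead of, `originals`.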

Related SciGraph Publications

  • 1996-08. Bagging predictors in MACHINE LEARNING

JSON-LD is the canonical representation for SciGraph data.

    TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground, Google SDTT

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/3468", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "name": "Vinay Deolalikar", 
            "type": "Person"
          }, 
          {
            "name": "Hernan Laffitte", 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/bf00058655", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1002929950", 
              "https://doi.org/10.1007/bf00058655"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/505282.505283", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1023316280"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1613/jair.953", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1105579550"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2015-10-20T00:00", 
        "description": "A method of generating training documents for training a classifying device comprises, with a processor, sampling from a distribution of words in a number of original documents, and creating a number of pseudo-documents from the distribution of words, the pseudo-documents comprising a similar distribution of words as the original documents. A device for classifying textual documents comprises a processor; and a memory communicatively coupled to the processor, the memory comprising a sampling module to, when executed by the processor, determine the distribution of words in a number of original documents, a pseudo-document creation module to, when executed by the processor, create a number of pseudo-documents from the distribution of words, the pseudo-documents comprising a similar distribution of words as the original documents, and a training module to, when executed by the processor, train the device to classify textual documents based on the pseudo-documents.",
        "id": "sg:patent.US-9165258-B2",
        "keywords": [
          "method",
          "classifying",
          "processor",
          "sampling",
          "distribution",
          "document",
          "similar distribution",
          "Equipment and Supply",
          "memory",
          "training module",
          "train"
        ],
        "name": "Generating training documents",
        "recipient": [
          {
            "id": "https://www.grid.ac/institutes/grid.418547.b",
            "type": "Organization"
          }
        ],
        "sameAs": [
          "https://app.dimensions.ai/details/patent/US-9165258-B2"
        ],
        "sdDataset": "patents",
        "sdDatePublished": "2019-04-18T10:16",
        "sdLicense": "https://scigraph.springernature.com/explorer/license/",
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project",
          "type": "Organization"
        },
        "sdSource": "s3://com-uberresearch-data-patents-target-20190320-rc/data/sn-export/402f166718b70575fb5d4ffe01f064d1/0000100128-0000352499/json_export_01090.jsonl",
        "type": "Patent"
      }
    ]
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/patent.US-9165258-B2'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/patent.US-9165258-B2'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/patent.US-9165258-B2'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/patent.US-9165258-B2'
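The four curl commands differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The short format labels used as dictionary keys and the helper names `build_request` and `fetch` are hypothetical, chosen for this example; the URL and media types are the ones shown in the commands above. Whether the endpoint still answers is not something this sketch can guarantee.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/patent.US-9165258-B2"

# Media types copied from the curl commands; the keys are illustrative labels.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Create a request whose Accept header selects the serialization."""
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=10):
    """Send the request and return the response body as text (needs network)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request does not touch the network; calling fetch() does.
req = build_request("turtle")
print(req.full_url, req.get_header("Accept"))
```

Separating `build_request` from `fetch` keeps the header logic easy to check offline; `fetch("json-ld")` would then be the equivalent of the first curl command.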


     

    This table displays all metadata directly associated to this object as RDF triples.

    47 TRIPLES      15 PREDICATES      28 URIs      19 LITERALS      2 BLANK NODES

    Subject Predicate Object
    1 sg:patent.US-9165258-B2 schema:about anzsrc-for:3468
    2 schema:author N2e7e77264f084f7fa8bcc445c6c54fde
    3 schema:citation sg:pub.10.1007/bf00058655
    4 https://doi.org/10.1145/505282.505283
    5 https://doi.org/10.1613/jair.953
    6 schema:datePublished 2015-10-20T00:00
    7 schema:description <p id="p-0001" num="0000">A method of generating training documents for training a classifying device comprises, with a processor, sampling from a distribution of words in a number of original documents, and creating a number of pseudo-documents from the distribution of words, the pseudo-documents comprising a similar distribution of words as the original documents. A device for classifying textual documents comprises a processor; and a memory communicatively coupled to the processor, the memory comprising a sampling module to, when executed by the processor, determine the distribution of words in a number of original documents, a pseudo-document creation module to, when executed by the processor, create a number of pseudo-documents from the distribution of words, the pseudo-documents comprising a similar distribution of words as the original documents, and a training module to, when executed by the processor, train the device to classify textual documents based on the pseudo-documents.</p>
    8 schema:keywords Equipment and Supply
    9 classifying
    10 distribution
    11 document
    12 memory
    13 method
    14 processor
    15 sampling
    16 similar distribution
    17 train
    18 training module
    19 schema:name Generating training documents
    20 schema:recipient https://www.grid.ac/institutes/grid.418547.b
    21 schema:sameAs https://app.dimensions.ai/details/patent/US-9165258-B2
    22 schema:sdDatePublished 2019-04-18T10:16
    23 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    24 schema:sdPublisher Neaf778487ea644f291f9ab5345152518
    25 sgo:license sg:explorer/license/
    26 sgo:sdDataset patents
    27 rdf:type sgo:Patent
    28 N243a77ec28d94da4a7c1179e3c2f5496 schema:name Vinay Deolalikar
    29 rdf:type schema:Person
    30 N2e7e77264f084f7fa8bcc445c6c54fde rdf:first N243a77ec28d94da4a7c1179e3c2f5496
    31 rdf:rest N971bf65bbac64fc7b1b580b7077c4d47
    32 N971bf65bbac64fc7b1b580b7077c4d47 rdf:first Nb2ca9c72bfa3418ca52b0934631e323b
    33 rdf:rest rdf:nil
    34 Nb2ca9c72bfa3418ca52b0934631e323b schema:name Hernan Laffitte
    35 rdf:type schema:Person
    36 Neaf778487ea644f291f9ab5345152518 schema:name Springer Nature - SN SciGraph project
    37 rdf:type schema:Organization
    38 anzsrc-for:3468 schema:inDefinedTermSet anzsrc-for:
    39 rdf:type schema:DefinedTerm
    40 sg:pub.10.1007/bf00058655 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002929950
    41 https://doi.org/10.1007/bf00058655
    42 rdf:type schema:CreativeWork
    43 https://doi.org/10.1145/505282.505283 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023316280
    44 rdf:type schema:CreativeWork
    45 https://doi.org/10.1613/jair.953 schema:sameAs https://app.dimensions.ai/details/publication/pub.1105579550
    46 rdf:type schema:CreativeWork
    47 https://www.grid.ac/institutes/grid.418547.b rdf:type schema:Organization
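In the table, the author order is kept as an RDF collection: `schema:author` points to a blank node, and each node carries an `rdf:first` item and an `rdf:rest` link until `rdf:nil` (rows 2 and 28 to 35). The sketch below walks that chain to recover the ordered author names. Representing the triples as plain Python tuples, and the helper names `objects` and `walk_list`, are assumptions for illustration; a real application would more likely use an RDF library.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"
PATENT = "sg:patent.US-9165258-B2"

# Author-related rows of the table, written as (subject, predicate, object).
triples = [
    (PATENT, "schema:author", "N2e7e77264f084f7fa8bcc445c6c54fde"),
    ("N243a77ec28d94da4a7c1179e3c2f5496", "schema:name", "Vinay Deolalikar"),
    ("N2e7e77264f084f7fa8bcc445c6c54fde", RDF_FIRST, "N243a77ec28d94da4a7c1179e3c2f5496"),
    ("N2e7e77264f084f7fa8bcc445c6c54fde", RDF_REST, "N971bf65bbac64fc7b1b580b7077c4d47"),
    ("N971bf65bbac64fc7b1b580b7077c4d47", RDF_FIRST, "Nb2ca9c72bfa3418ca52b0934631e323b"),
    ("N971bf65bbac64fc7b1b580b7077c4d47", RDF_REST, RDF_NIL),
    ("Nb2ca9c72bfa3418ca52b0934631e323b", "schema:name", "Hernan Laffitte"),
]


def objects(triples, subject, predicate):
    """Return every object matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def walk_list(triples, head):
    """Follow rdf:first / rdf:rest links from head until rdf:nil."""
    items = []
    node = head
    while node != RDF_NIL:
        items.append(objects(triples, node, RDF_FIRST)[0])
        node = objects(triples, node, RDF_REST)[0]
    return items


head = objects(triples, PATENT, "schema:author")[0]
authors = [objects(triples, n, "schema:name")[0] for n in walk_list(triples, head)]
print(authors)  # ['Vinay Deolalikar', 'Hernan Laffitte']
```

The linked-list encoding is why the table reports extra blank-node rows for only two authors: plain repeated `schema:author` triples would not preserve order.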
     



