Nonparametric Distribution Analysis for Text Mining


Ontology type: schema:Chapter     


Chapter Info

DATE

2009-07-31

AUTHORS

Alexandros Karatzoglou , Ingo Feinerer , Kurt Hornik

ABSTRACT

A number of new algorithms for nonparametric distribution analysis based on Maximum Mean Discrepancy (MMD) measures have recently been introduced. These algorithms operate in a reproducing kernel Hilbert space and can be used for nonparametric two-sample tests. Coupled with recent advances in string kernels, these methods extend the scope of kernel-based methods to text mining. We review these kernel-based two-sample tests with a focus on text mining, propose novel applications, and present an efficient implementation in the kernlab package. We also present an efficient, integrated environment for applying modern machine learning methods to complex text mining problems through the combined use of the tm (text mining) and kernlab (kernel-based learning) R packages.
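To make the idea of a kernel two-sample test on text concrete, here is a minimal Python sketch. It is an independent illustration, not the chapter's kernlab implementation. It assumes a normalized p-spectrum string kernel (documents are compared by counts of shared length-p substrings) and the standard unbiased estimator of the squared MMD between samples x_1..x_m and y_1..y_n: MMD^2_u = 1/(m(m-1)) sum_{i!=j} k(x_i, x_j) + 1/(n(n-1)) sum_{i!=j} k(y_i, y_j) - 2/(mn) sum_{i,j} k(x_i, y_j). The p-value comes from a permutation test, which is an assumption here rather than the null approximation used in the chapter. All function names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations
from math import sqrt


def spectrum_features(text: str, p: int = 3) -> Counter:
    """Count every contiguous substring of length p (the p-spectrum)."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(a: str, b: str, p: int = 3) -> float:
    """Normalized p-spectrum kernel: cosine similarity of substring counts."""
    fa, fb = spectrum_features(a, p), spectrum_features(b, p)
    dot = sum(count * fb[s] for s, count in fa.items())
    norm = sqrt(sum(v * v for v in fa.values()) * sum(v * v for v in fb.values()))
    return dot / norm if norm else 0.0


def mmd2_unbiased(xs, ys, kernel) -> float:
    """Unbiased estimate of the squared MMD between samples xs and ys."""
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("each sample needs at least two items")
    # combinations() visits each unordered pair once, so double it
    # to get the sum over all ordered pairs i != j.
    kxx = 2 * sum(kernel(a, b) for a, b in combinations(xs, 2)) / (m * (m - 1))
    kyy = 2 * sum(kernel(a, b) for a, b in combinations(ys, 2)) / (n * (n - 1))
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (m * n)
    return kxx + kyy - 2 * kxy


def mmd_permutation_test(xs, ys, kernel, n_perm=200, seed=0):
    """Return (MMD^2 statistic, permutation p-value)."""
    rng = random.Random(seed)
    observed = mmd2_unbiased(xs, ys, kernel)
    pooled = list(xs) + list(ys)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling under the null hypothesis
        if mmd2_unbiased(pooled[:len(xs)], pooled[len(xs):], kernel) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    topic_a = ["the cat sat on the mat", "a cat lay on the rug", "the cat sat on a mat"]
    topic_b = ["stock prices fell today", "bond yields rose again", "markets rallied on the news"]
    stat, p_value = mmd_permutation_test(topic_a, topic_b, spectrum_kernel, n_perm=199)
    print(f"MMD^2 = {stat:.4f}, permutation p-value = {p_value:.3f}")
```

Because the kernel is normalized, every document has unit self-similarity, so long and short documents contribute on the same scale. With only three documents per group there are just 20 distinct splits, and both the observed split and its mirror image score at least as high as the observed statistic, so the permutation p-value cannot fall much below 0.1; meaningful tests need larger samples.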

PAGES

295-305

References to SciGraph publications

  • 2006. Authorship Attribution of Texts: A Review. In: General Theory of Information Transfer and Combinatorics
  • 2002. Learning to Classify Text Using Support Vector Machines
  • 1994-04. Authorship Attribution. In: Language Resources and Evaluation
Book

    TITLE

    Advances in Data Analysis, Data Handling and Business Intelligence

    ISBN

    978-3-642-01043-9
    978-3-642-01044-6

    Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/978-3-642-01044-6_27

    DOI

    http://dx.doi.org/10.1007/978-3-642-01044-6_27

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1007202881



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Artificial Intelligence and Image Processing", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "name": [
                "INSA de Rouen, LITIS, Rouen, France"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Karatzoglou", 
            "givenName": "Alexandros", 
            "id": "sg:person.07537723735.78", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07537723735.78"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "name": [
                "INSA de Rouen, LITIS, Rouen, France"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Feinerer", 
            "givenName": "Ingo", 
            "id": "sg:person.01117376470.61", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01117376470.61"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "name": [
                "INSA de Rouen, LITIS, Rouen, France"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Hornik", 
            "givenName": "Kurt", 
            "id": "sg:person.01355621653.94", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01355621653.94"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/11889342_20", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1007252641", 
              "https://doi.org/10.1007/11889342_20"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/s1570-8667(03)00065-0", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1013212920"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/1143844.1143961", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1028564053"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-1-4615-0907-3", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037183810", 
              "https://doi.org/10.1007/978-1-4615-0907-3"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1080/09332480.2003.10554843", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1039953475"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/bf01830689", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1050011531", 
              "https://doi.org/10.1007/bf01830689"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.18637/jss.v011.i09", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1068672171"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.18637/jss.v025.i05", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1068672367"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2009-07-31", 
        "datePublishedReg": "2009-07-31", 
        "description": "A number of new algorithms for nonparametric distribution analysis based on Maximum Mean Discrepancy measures have been recently introduced. These novel algorithms operate in Hilbert space and can be used for nonparametric two-sample tests. Coupled with recent advances in string kernels, these methods extend the scope of kernel-based methods in the area of text mining. We review these kernel-based two-sample tests focusing on text mining where we will propose novel applications and present an efficient implementation in the kernlab package. We also present an efficient and integrated environment for applying modern machine learning methods to complex text mining problems through the combined use of the tm (for text mining) and the kernlab (for kernel-based learning) R packages.", 
        "editor": [
          {
            "familyName": "Fink", 
            "givenName": "Andreas", 
            "type": "Person"
          }, 
          {
            "familyName": "Lausen", 
            "givenName": "Berthold", 
            "type": "Person"
          }, 
          {
            "familyName": "Seidel", 
            "givenName": "Wilfried", 
            "type": "Person"
          }, 
          {
            "familyName": "Ultsch", 
            "givenName": "Alfred", 
            "type": "Person"
          }
        ], 
        "genre": "chapter", 
        "id": "sg:pub.10.1007/978-3-642-01044-6_27", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": false, 
        "isPartOf": {
          "isbn": [
            "978-3-642-01043-9", 
            "978-3-642-01044-6"
          ], 
          "name": "Advances in Data Analysis, Data Handling and Business Intelligence", 
          "type": "Book"
        }, 
        "name": "Nonparametric Distribution Analysis for Text Mining", 
        "pagination": "295-305", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1007202881"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/978-3-642-01044-6_27"
            ]
          }, 
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "224ed2ff2a0ed07510bcaca1efab3b9645b1059a7d1a5b78bd9190416a4dddb8"
            ]
          }
        ], 
        "publisher": {
          "location": "Berlin, Heidelberg", 
          "name": "Springer Berlin Heidelberg", 
          "type": "Organisation"
        }, 
        "sameAs": [
          "https://doi.org/10.1007/978-3-642-01044-6_27", 
          "https://app.dimensions.ai/details/publication/pub.1007202881"
        ], 
        "sdDataset": "chapters", 
        "sdDatePublished": "2019-04-16T07:27", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000355_0000000355/records_53016_00000000.jsonl", 
        "type": "Chapter", 
        "url": "https://link.springer.com/10.1007%2F978-3-642-01044-6_27"
      }
    ]
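The JSON-LD record above can be consumed with any JSON parser. A minimal sketch using only the Python standard library: RECORD is a trimmed copy of the record above (only the fields used here are kept), and summarize is a hypothetical helper. Note that productId is a list of name/value objects rather than a map, so it is indexed by name first.

```python
import json

# Trimmed copy of the SciGraph record above; only the fields used below are kept.
RECORD = """
[
  {
    "name": "Nonparametric Distribution Analysis for Text Mining",
    "datePublished": "2009-07-31",
    "pagination": "295-305",
    "author": [
      {"familyName": "Karatzoglou", "givenName": "Alexandros", "type": "Person"},
      {"familyName": "Feinerer", "givenName": "Ingo", "type": "Person"},
      {"familyName": "Hornik", "givenName": "Kurt", "type": "Person"}
    ],
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1007202881"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-3-642-01044-6_27"]}
    ]
  }
]
"""


def summarize(jsonld_text: str) -> dict:
    """Pull basic citation fields out of a SciGraph-style JSON-LD array."""
    record = json.loads(jsonld_text)[0]  # the array holds a single Chapter node
    # Each productId entry carries its value as a one-element list.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]
    return {
        "title": record["name"],
        "year": record["datePublished"][:4],
        "pages": record.get("pagination"),
        "authors": authors,
        "doi": ids.get("doi"),
    }


if __name__ == "__main__":
    print(summarize(RECORD))
```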
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-01044-6_27'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-01044-6_27'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-01044-6_27'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-01044-6_27'
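The curl commands above rely on HTTP content negotiation: the same URL returns a different RDF serialization depending on the Accept header. A rough Python equivalent, assuming the endpoint still serves these media types (not verified here); FORMATS, build_request and fetch are hypothetical helpers, and only fetch touches the network.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-642-01044-6_27"

# Media types taken from the curl examples above.
FORMATS = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Build a GET request whose Accept header selects the serialization."""
    return urllib.request.Request(url, headers={"Accept": FORMATS[fmt]})


def fetch(fmt: str, timeout: float = 30.0) -> str:
    """Download the record in the requested format (requires network access)."""
    with urllib.request.urlopen(build_request(fmt), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


if __name__ == "__main__":
    print(fetch("turtle")[:300])
```

Separating request construction from the download keeps the header logic testable offline.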


     

    This table displays all metadata directly associated with this object as RDF triples.

    124 triples, 23 predicates, 34 URIs, 19 literals, 8 blank nodes

    Subject Predicate Object
    1 sg:pub.10.1007/978-3-642-01044-6_27 schema:about anzsrc-for:08
    2 anzsrc-for:0801
    3 schema:author N66af6a2c27904e8ea958d9f087ca5859
    4 schema:citation sg:pub.10.1007/11889342_20
    5 sg:pub.10.1007/978-1-4615-0907-3
    6 sg:pub.10.1007/bf01830689
    7 https://doi.org/10.1016/s1570-8667(03)00065-0
    8 https://doi.org/10.1080/09332480.2003.10554843
    9 https://doi.org/10.1145/1143844.1143961
    10 https://doi.org/10.18637/jss.v011.i09
    11 https://doi.org/10.18637/jss.v025.i05
    12 schema:datePublished 2009-07-31
    13 schema:datePublishedReg 2009-07-31
    14 schema:description A number of new algorithms for nonparametric distribution analysis based on Maximum Mean Discrepancy measures have been recently introduced. These novel algorithms operate in Hilbert space and can be used for nonparametric two-sample tests. Coupled with recent advances in string kernels, these methods extend the scope of kernel-based methods in the area of text mining. We review these kernel-based two-sample tests focusing on text mining where we will propose novel applications and present an efficient implementation in the kernlab package. We also present an efficient and integrated environment for applying modern machine learning methods to complex text mining problems through the combined use of the tm (for text mining) and the kernlab (for kernel-based learning) R packages.
    15 schema:editor Nd93609a668984f81ae541c862116c05f
    16 schema:genre chapter
    17 schema:inLanguage en
    18 schema:isAccessibleForFree false
    19 schema:isPartOf N4fd9572e0ec74b9294a236fb867dcb59
    20 schema:name Nonparametric Distribution Analysis for Text Mining
    21 schema:pagination 295-305
    22 schema:productId N1cc5b090e26d475a94f8669b3d04757a
    23 N1e2cca7e34c940748fb4237ee3ccef9a
    24 N74207f8a08224db89af976657a8875ff
    25 schema:publisher N60bd368bd57944c1b9f6d7d9c9702270
    26 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007202881
    27 https://doi.org/10.1007/978-3-642-01044-6_27
    28 schema:sdDatePublished 2019-04-16T07:27
    29 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    30 schema:sdPublisher Nfe9b3ea6d5b04dd7a06e739026282764
    31 schema:url https://link.springer.com/10.1007%2F978-3-642-01044-6_27
    32 sgo:license sg:explorer/license/
    33 sgo:sdDataset chapters
    34 rdf:type schema:Chapter
    35 N1cc5b090e26d475a94f8669b3d04757a schema:name readcube_id
    36 schema:value 224ed2ff2a0ed07510bcaca1efab3b9645b1059a7d1a5b78bd9190416a4dddb8
    37 rdf:type schema:PropertyValue
    38 N1e2cca7e34c940748fb4237ee3ccef9a schema:name dimensions_id
    39 schema:value pub.1007202881
    40 rdf:type schema:PropertyValue
    41 N29bdd3c5f34c427887cec965aa74e0bf schema:familyName Seidel
    42 schema:givenName Wilfried
    43 rdf:type schema:Person
    44 N3289c45d78174bc3a8b5fb18f0bc288f schema:name INSA de Rouen, LITIS, Rouen, France
    45 rdf:type schema:Organization
    46 N4dff6ceccf664ddcb2797d873415be12 rdf:first Nd7759420d5fd45da9906b466edb919c4
    47 rdf:rest rdf:nil
    48 N4fd9572e0ec74b9294a236fb867dcb59 schema:isbn 978-3-642-01043-9
    49 978-3-642-01044-6
    50 schema:name Advances in Data Analysis, Data Handling and Business Intelligence
    51 rdf:type schema:Book
    52 N5debfb2186dd490e87bd1d6bd3346aff schema:familyName Fink
    53 schema:givenName Andreas
    54 rdf:type schema:Person
    55 N60bd368bd57944c1b9f6d7d9c9702270 schema:location Berlin, Heidelberg
    56 schema:name Springer Berlin Heidelberg
    57 rdf:type schema:Organisation
    58 N66af6a2c27904e8ea958d9f087ca5859 rdf:first sg:person.07537723735.78
    59 rdf:rest Nba9462927f484207a887be33cfe5fb78
    60 N679d2b1c8fa54666aee7cbb4652e55da rdf:first N8a4bce4698b94bd1bb26778bd62f4501
    61 rdf:rest N88eb8f08c4a94abd95cf9f2a159f68cd
    62 N74207f8a08224db89af976657a8875ff schema:name doi
    63 schema:value 10.1007/978-3-642-01044-6_27
    64 rdf:type schema:PropertyValue
    65 N7f61b133e5db4071afaa5bcea2b29592 schema:name INSA de Rouen, LITIS, Rouen, France
    66 rdf:type schema:Organization
    67 N88eb8f08c4a94abd95cf9f2a159f68cd rdf:first N29bdd3c5f34c427887cec965aa74e0bf
    68 rdf:rest N4dff6ceccf664ddcb2797d873415be12
    69 N8a4bce4698b94bd1bb26778bd62f4501 schema:familyName Lausen
    70 schema:givenName Berthold
    71 rdf:type schema:Person
    72 Nba9462927f484207a887be33cfe5fb78 rdf:first sg:person.01117376470.61
    73 rdf:rest Nd5d14aa074e849539994b350b1a4c86f
    74 Ncf818a9383fd4fda8e8d6e261e0b57d3 schema:name INSA de Rouen, LITIS, Rouen, France
    75 rdf:type schema:Organization
    76 Nd5d14aa074e849539994b350b1a4c86f rdf:first sg:person.01355621653.94
    77 rdf:rest rdf:nil
    78 Nd7759420d5fd45da9906b466edb919c4 schema:familyName Ultsch
    79 schema:givenName Alfred
    80 rdf:type schema:Person
    81 Nd93609a668984f81ae541c862116c05f rdf:first N5debfb2186dd490e87bd1d6bd3346aff
    82 rdf:rest N679d2b1c8fa54666aee7cbb4652e55da
    83 Nfe9b3ea6d5b04dd7a06e739026282764 schema:name Springer Nature - SN SciGraph project
    84 rdf:type schema:Organization
    85 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    86 schema:name Information and Computing Sciences
    87 rdf:type schema:DefinedTerm
    88 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
    89 schema:name Artificial Intelligence and Image Processing
    90 rdf:type schema:DefinedTerm
    91 sg:person.01117376470.61 schema:affiliation N3289c45d78174bc3a8b5fb18f0bc288f
    92 schema:familyName Feinerer
    93 schema:givenName Ingo
    94 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01117376470.61
    95 rdf:type schema:Person
    96 sg:person.01355621653.94 schema:affiliation N7f61b133e5db4071afaa5bcea2b29592
    97 schema:familyName Hornik
    98 schema:givenName Kurt
    99 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01355621653.94
    100 rdf:type schema:Person
    101 sg:person.07537723735.78 schema:affiliation Ncf818a9383fd4fda8e8d6e261e0b57d3
    102 schema:familyName Karatzoglou
    103 schema:givenName Alexandros
    104 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07537723735.78
    105 rdf:type schema:Person
    106 sg:pub.10.1007/11889342_20 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007252641
    107 https://doi.org/10.1007/11889342_20
    108 rdf:type schema:CreativeWork
    109 sg:pub.10.1007/978-1-4615-0907-3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037183810
    110 https://doi.org/10.1007/978-1-4615-0907-3
    111 rdf:type schema:CreativeWork
    112 sg:pub.10.1007/bf01830689 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050011531
    113 https://doi.org/10.1007/bf01830689
    114 rdf:type schema:CreativeWork
    115 https://doi.org/10.1016/s1570-8667(03)00065-0 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013212920
    116 rdf:type schema:CreativeWork
    117 https://doi.org/10.1080/09332480.2003.10554843 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039953475
    118 rdf:type schema:CreativeWork
    119 https://doi.org/10.1145/1143844.1143961 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028564053
    120 rdf:type schema:CreativeWork
    121 https://doi.org/10.18637/jss.v011.i09 schema:sameAs https://app.dimensions.ai/details/publication/pub.1068672171
    122 rdf:type schema:CreativeWork
    123 https://doi.org/10.18637/jss.v025.i05 schema:sameAs https://app.dimensions.ai/details/publication/pub.1068672367
    124 rdf:type schema:CreativeWork
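In the table, ordered lists such as the author list are stored as RDF collections: the blank node that is the object of schema:author (row 3) holds the first item under rdf:first and points to the next node under rdf:rest, and the chain ends at rdf:nil. The Python sketch below rebuilds the author order by walking that chain. The dictionaries are copied from the author rows of the table; walk_rdf_list is a hypothetical helper, not part of any RDF library.

```python
# (node, predicate) -> object, copied from rows 58-59, 72-73 and 76-77.
TRIPLES = {
    ("N66af6a2c27904e8ea958d9f087ca5859", "rdf:first"): "sg:person.07537723735.78",
    ("N66af6a2c27904e8ea958d9f087ca5859", "rdf:rest"): "Nba9462927f484207a887be33cfe5fb78",
    ("Nba9462927f484207a887be33cfe5fb78", "rdf:first"): "sg:person.01117376470.61",
    ("Nba9462927f484207a887be33cfe5fb78", "rdf:rest"): "Nd5d14aa074e849539994b350b1a4c86f",
    ("Nd5d14aa074e849539994b350b1a4c86f", "rdf:first"): "sg:person.01355621653.94",
    ("Nd5d14aa074e849539994b350b1a4c86f", "rdf:rest"): "rdf:nil",
}

# schema:familyName values for each person node (rows 92, 97 and 102).
FAMILY_NAME = {
    "sg:person.07537723735.78": "Karatzoglou",
    "sg:person.01117376470.61": "Feinerer",
    "sg:person.01355621653.94": "Hornik",
}


def walk_rdf_list(head: str, triples: dict) -> list:
    """Follow rdf:first/rdf:rest links from head until rdf:nil."""
    items, node, seen = [], head, set()
    while node != "rdf:nil":
        if node in seen:  # guard against malformed, cyclic lists
            raise ValueError(f"cycle at {node}")
        seen.add(node)
        items.append(triples[(node, "rdf:first")])
        node = triples[(node, "rdf:rest")]
    return items


if __name__ == "__main__":
    people = walk_rdf_list("N66af6a2c27904e8ea958d9f087ca5859", TRIPLES)
    print([FAMILY_NAME[p] for p in people])
```

The same walk applied to the editor list (starting at Nd93609a668984f81ae541c862116c05f, row 15) would recover Fink, Lausen, Seidel and Ultsch in order.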
     



