Finding Scientific Topics Revisited


Ontology type: schema:Chapter     


Chapter Info

DATE

2014

AUTHORS

Martin Ponweiser, Bettina Grün, Kurt Hornik

ABSTRACT

The publication of statistical results based on the use of computational tools requires that the data as well as the code are provided in order to allow to reproduce and verify the results with reasonable effort. However, this only allows to rerun the exact same analysis. While this is helpful to understand and retrace the steps of the analysis which led to the published results, it constitutes only a limited proof of reproducibility. In fact for “true” reproducibility one might require that the essentially same results are obtained in an independent analysis. To check for this “true” reproducibility of results of a text mining application we replicate a study where a latent Dirichlet allocation model was fitted to the document-term matrix derived for the abstracts of the papers published in the Proceedings of the National Academy of Sciences from 1991 to 2001. Comparing the results we assess (1) how well the corpus and the document-term matrix can be reconstructed, (2) if the same model would be selected and (3) if the analysis of the fitted model leads to the same main conclusions and insights. Our study indicates that the results from this study are robust with respect to slightly different preprocessing steps and the use of a different software to fit the model.

PAGES

93-100

Book

TITLE

Advances in Latent Variables

ISBN

978-3-319-02966-5
978-3-319-02967-2

From Grant

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/10104_2014_11

DOI

http://dx.doi.org/10.1007/10104_2014_11

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1007422857



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0802", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Computation Theory and Mathematics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Vienna University of Economics and Business", 
          "id": "https://www.grid.ac/institutes/grid.15788.33", 
          "name": [
            "Department of Finance, Accounting and Statistics, Institute for Statistics and Mathematics, WU (Wirtschaftsuniversit\u00e4t Wien), Welthandelsplatz 1, 1020\u00a0Wien, Austria"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Ponweiser", 
        "givenName": "Martin", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Johannes Kepler University of Linz", 
          "id": "https://www.grid.ac/institutes/grid.9970.7", 
          "name": [
            "Department of Applied Statistics, Johannes Kepler University Linz, Altenbergerstra\u00dfe 69, 4040\u00a0Linz, Austria"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Gr\u00fcn", 
        "givenName": "Bettina", 
        "id": "sg:person.0762354775.48", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0762354775.48"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Vienna University of Economics and Business", 
          "id": "https://www.grid.ac/institutes/grid.15788.33", 
          "name": [
            "Department of Finance, Accounting and Statistics, Institute for Statistics and Mathematics, WU (Wirtschaftsuniversit\u00e4t Wien), Welthandelsplatz 1, 1020\u00a0Wien, Austria"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Hornik", 
        "givenName": "Kurt", 
        "id": "sg:person.01355621653.94", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01355621653.94"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1002/jae.1083", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1012068005"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/biostatistics/kxq033", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1022840044"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.0307752101", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1026144033"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bib/bbq084", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1042666586"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.18637/jss.v025.i05", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1068672367"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.18637/jss.v040.i13", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1068672607"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2014", 
    "datePublishedReg": "2014-01-01", 
    "description": "The publication of statistical results based on the use of computational tools requires that the data as well as the code are provided in order to allow to reproduce and verify the results with reasonable effort. However, this only allows to rerun the exact same analysis. While this is helpful to understand and retrace the steps of the analysis which led to the published results, it constitutes only a limited proof of reproducibility. In fact for \u201ctrue\u201d reproducibility one might require that the essentially same results are obtained in an independent analysis. To check for this \u201ctrue\u201d reproducibility of results of a text mining application we replicate a study where a latent Dirichlet allocation model was fitted to the document-term matrix derived for the abstracts of the papers published in the Proceedings of the National Academy of Sciences from 1991 to 2001. Comparing the results we assess (1) how well the corpus and the document-term matrix can be reconstructed, (2) if the same model would be selected and (3) if the analysis of the fitted model leads to the same main conclusions and insights. Our study indicates that the results from this study are robust with respect to slightly different preprocessing steps and the use of a different software to fit the model.", 
    "editor": [
      {
        "familyName": "Carpita", 
        "givenName": "Maurizio", 
        "type": "Person"
      }, 
      {
        "familyName": "Brentari", 
        "givenName": "Eugenio", 
        "type": "Person"
      }, 
      {
        "familyName": "Qannari", 
        "givenName": "El Mostafa", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/10104_2014_11", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isFundedItemOf": [
      {
        "id": "sg:grant.6195058", 
        "type": "MonetaryGrant"
      }
    ], 
    "isPartOf": {
      "isbn": [
        "978-3-319-02966-5", 
        "978-3-319-02967-2"
      ], 
      "name": "Advances in Latent Variables", 
      "type": "Book"
    }, 
    "name": "Finding Scientific Topics Revisited", 
    "pagination": "93-100", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/10104_2014_11"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "235b6251f8054c20d4e088d94aeec0680cbf23add3d6e03d3246473aa2e6a1cc"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1007422857"
        ]
      }
    ], 
    "publisher": {
      "location": "Cham", 
      "name": "Springer International Publishing", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/10104_2014_11", 
      "https://app.dimensions.ai/details/publication/pub.1007422857"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-15T18:09", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8681_00000247.jsonl", 
    "type": "Chapter", 
    "url": "http://link.springer.com/10.1007/10104_2014_11"
  }
]
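As a rough illustration of how a record shaped like the one above might be consumed, the sketch below parses a trimmed excerpt with Python's standard `json` module and pulls out the author names and the cited DOIs. The field names (`author`, `citation`, `familyName`, `givenName`, `id`) come from the record itself; the helper names `author_names` and `cited_dois` and the shortened excerpt are assumptions made for this example, not part of SciGraph.

```python
import json

# Trimmed excerpt of the record above; only the fields used below are kept.
# A raw string keeps the \u00fc escape intact so that json.loads decodes it.
RECORD_JSON = r"""
[
  {
    "author": [
      {"familyName": "Ponweiser", "givenName": "Martin", "type": "Person"},
      {"familyName": "Gr\u00fcn", "givenName": "Bettina", "type": "Person"},
      {"familyName": "Hornik", "givenName": "Kurt", "type": "Person"}
    ],
    "citation": [
      {"id": "https://doi.org/10.1002/jae.1083", "type": "CreativeWork"},
      {"id": "https://doi.org/10.1073/pnas.0307752101", "type": "CreativeWork"}
    ],
    "name": "Finding Scientific Topics Revisited",
    "pagination": "93-100"
  }
]
"""

DOI_PREFIX = "https://doi.org/"


def author_names(record):
    """Return 'given family' strings in the order the record lists them."""
    return [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]


def cited_dois(record):
    """Return bare DOIs for citations whose id is a doi.org URL."""
    return [
        c["id"][len(DOI_PREFIX):]
        for c in record.get("citation", [])
        if c["id"].startswith(DOI_PREFIX)
    ]


# The top level of the document is a JSON array holding a single record.
record = json.loads(RECORD_JSON)[0]
print(record["name"], record["pagination"])
print(author_names(record))
print(cited_dois(record))
```

Because JSON-LD is plain JSON, no RDF library is needed for this kind of field access; a dedicated JSON-LD processor would only matter if the `@context` had to be expanded.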
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/10104_2014_11'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/10104_2014_11'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/10104_2014_11'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/10104_2014_11'
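The four curl commands above differ only in their `Accept` header, so the same content negotiation could be sketched in Python with the standard library. The URL and media types are copied from the commands; the names `ACCEPT_TYPES`, `build_request` and `fetch`, the 30-second timeout and the UTF-8 decoding are assumptions made for this example. No request is sent unless `fetch` is called.

```python
import urllib.request

# Record URL as used in the curl commands above.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/10104_2014_11"

# Media types taken from the curl commands; the short keys are hypothetical labels.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request asking for the given serialization."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the body as text (UTF-8 is an assumption)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Inspect the request that would be sent for the Turtle serialization.
request = build_request("turtle")
print(request.full_url, request.get_header("Accept"))
```

Calling `fetch("json-ld")` would then correspond to the first curl command, provided the endpoint is reachable.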


 

This table displays all metadata directly associated with this object as RDF triples.

111 TRIPLES      23 PREDICATES      33 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/10104_2014_11 schema:about anzsrc-for:08
2 anzsrc-for:0802
3 schema:author N1a3f8a5acd204311a7e6e7689488576a
4 schema:citation https://doi.org/10.1002/jae.1083
5 https://doi.org/10.1073/pnas.0307752101
6 https://doi.org/10.1093/bib/bbq084
7 https://doi.org/10.1093/biostatistics/kxq033
8 https://doi.org/10.18637/jss.v025.i05
9 https://doi.org/10.18637/jss.v040.i13
10 schema:datePublished 2014
11 schema:datePublishedReg 2014-01-01
12 schema:description The publication of statistical results based on the use of computational tools requires that the data as well as the code are provided in order to allow to reproduce and verify the results with reasonable effort. However, this only allows to rerun the exact same analysis. While this is helpful to understand and retrace the steps of the analysis which led to the published results, it constitutes only a limited proof of reproducibility. In fact for “true” reproducibility one might require that the essentially same results are obtained in an independent analysis. To check for this “true” reproducibility of results of a text mining application we replicate a study where a latent Dirichlet allocation model was fitted to the document-term matrix derived for the abstracts of the papers published in the Proceedings of the National Academy of Sciences from 1991 to 2001. Comparing the results we assess (1) how well the corpus and the document-term matrix can be reconstructed, (2) if the same model would be selected and (3) if the analysis of the fitted model leads to the same main conclusions and insights. Our study indicates that the results from this study are robust with respect to slightly different preprocessing steps and the use of a different software to fit the model.
13 schema:editor N61bfb053e3d84ddcabd6051bf5ae1df6
14 schema:genre chapter
15 schema:inLanguage en
16 schema:isAccessibleForFree false
17 schema:isPartOf N2837dbf832574e44a3487e2d5f4b3ed7
18 schema:name Finding Scientific Topics Revisited
19 schema:pagination 93-100
20 schema:productId N2567434c1b734cc4a6ba2faa34067959
21 N2e01e833b46c4425acf54a5c7e9fbe85
22 Nf866209bba914955a2550737cfd063d8
23 schema:publisher N23dc4aedec424d0da5364cb0f593745a
24 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007422857
25 https://doi.org/10.1007/10104_2014_11
26 schema:sdDatePublished 2019-04-15T18:09
27 schema:sdLicense https://scigraph.springernature.com/explorer/license/
28 schema:sdPublisher Nb388b863538045c68f0e55b5fb6e7f63
29 schema:url http://link.springer.com/10.1007/10104_2014_11
30 sgo:license sg:explorer/license/
31 sgo:sdDataset chapters
32 rdf:type schema:Chapter
33 N0a4c7c2558ad42f9a01e151d91ae815f schema:familyName Carpita
34 schema:givenName Maurizio
35 rdf:type schema:Person
36 N0bf23af63a3243c0bf21b4a40946a41d rdf:first N374887d2824343db9b41f30b871dd412
37 rdf:rest N973415c424dd44e2890aa5567278f77e
38 N1a3f8a5acd204311a7e6e7689488576a rdf:first Nbb2182d433d74e7fb43a53508eb87f29
39 rdf:rest N6d5eaea05ec045e9a2d5b65b00ea13b3
40 N23dc4aedec424d0da5364cb0f593745a schema:location Cham
41 schema:name Springer International Publishing
42 rdf:type schema:Organisation
43 N2567434c1b734cc4a6ba2faa34067959 schema:name doi
44 schema:value 10.1007/10104_2014_11
45 rdf:type schema:PropertyValue
46 N2837dbf832574e44a3487e2d5f4b3ed7 schema:isbn 978-3-319-02966-5
47 978-3-319-02967-2
48 schema:name Advances in Latent Variables
49 rdf:type schema:Book
50 N2e01e833b46c4425acf54a5c7e9fbe85 schema:name readcube_id
51 schema:value 235b6251f8054c20d4e088d94aeec0680cbf23add3d6e03d3246473aa2e6a1cc
52 rdf:type schema:PropertyValue
53 N374887d2824343db9b41f30b871dd412 schema:familyName Brentari
54 schema:givenName Eugenio
55 rdf:type schema:Person
56 N61bfb053e3d84ddcabd6051bf5ae1df6 rdf:first N0a4c7c2558ad42f9a01e151d91ae815f
57 rdf:rest N0bf23af63a3243c0bf21b4a40946a41d
58 N6d5eaea05ec045e9a2d5b65b00ea13b3 rdf:first sg:person.0762354775.48
59 rdf:rest N85f0513efba0420084268fcd16dbefb7
60 N85f0513efba0420084268fcd16dbefb7 rdf:first sg:person.01355621653.94
61 rdf:rest rdf:nil
62 N973415c424dd44e2890aa5567278f77e rdf:first Needeb068b61e42f6ae5b93e196544c49
63 rdf:rest rdf:nil
64 Nb388b863538045c68f0e55b5fb6e7f63 schema:name Springer Nature - SN SciGraph project
65 rdf:type schema:Organization
66 Nbb2182d433d74e7fb43a53508eb87f29 schema:affiliation https://www.grid.ac/institutes/grid.15788.33
67 schema:familyName Ponweiser
68 schema:givenName Martin
69 rdf:type schema:Person
70 Needeb068b61e42f6ae5b93e196544c49 schema:familyName Qannari
71 schema:givenName El Mostafa
72 rdf:type schema:Person
73 Nf866209bba914955a2550737cfd063d8 schema:name dimensions_id
74 schema:value pub.1007422857
75 rdf:type schema:PropertyValue
76 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
77 schema:name Information and Computing Sciences
78 rdf:type schema:DefinedTerm
79 anzsrc-for:0802 schema:inDefinedTermSet anzsrc-for:
80 schema:name Computation Theory and Mathematics
81 rdf:type schema:DefinedTerm
82 sg:grant.6195058 http://pending.schema.org/fundedItem sg:pub.10.1007/10104_2014_11
83 rdf:type schema:MonetaryGrant
84 sg:person.01355621653.94 schema:affiliation https://www.grid.ac/institutes/grid.15788.33
85 schema:familyName Hornik
86 schema:givenName Kurt
87 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01355621653.94
88 rdf:type schema:Person
89 sg:person.0762354775.48 schema:affiliation https://www.grid.ac/institutes/grid.9970.7
90 schema:familyName Grün
91 schema:givenName Bettina
92 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0762354775.48
93 rdf:type schema:Person
94 https://doi.org/10.1002/jae.1083 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012068005
95 rdf:type schema:CreativeWork
96 https://doi.org/10.1073/pnas.0307752101 schema:sameAs https://app.dimensions.ai/details/publication/pub.1026144033
97 rdf:type schema:CreativeWork
98 https://doi.org/10.1093/bib/bbq084 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042666586
99 rdf:type schema:CreativeWork
100 https://doi.org/10.1093/biostatistics/kxq033 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022840044
101 rdf:type schema:CreativeWork
102 https://doi.org/10.18637/jss.v025.i05 schema:sameAs https://app.dimensions.ai/details/publication/pub.1068672367
103 rdf:type schema:CreativeWork
104 https://doi.org/10.18637/jss.v040.i13 schema:sameAs https://app.dimensions.ai/details/publication/pub.1068672607
105 rdf:type schema:CreativeWork
106 https://www.grid.ac/institutes/grid.15788.33 schema:alternateName Vienna University of Economics and Business
107 schema:name Department of Finance, Accounting and Statistics, Institute for Statistics and Mathematics, WU (Wirtschaftsuniversität Wien), Welthandelsplatz 1, 1020 Wien, Austria
108 rdf:type schema:Organization
109 https://www.grid.ac/institutes/grid.9970.7 schema:alternateName Johannes Kepler University of Linz
110 schema:name Department of Applied Statistics, Johannes Kepler University Linz, Altenbergerstraße 69, 4040 Linz, Austria
111 rdf:type schema:Organization
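In the table above, the ordered author list is stored as an RDF collection: `schema:author` points at a blank node, and each node carries one `rdf:first` (the item) and one `rdf:rest` (the next node, or `rdf:nil` at the end). The following sketch walks that chain using triples copied from rows 38, 39, 58 to 61, 67, 85 and 90. The helper names `index_by_subject` and `walk_list` are hypothetical, and the index keeps only one object per subject and predicate, which is a simplification that happens to be sufficient for this subset.

```python
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"

# (subject, predicate, object) triples copied from the table above.
TRIPLES = [
    ("N1a3f8a5acd204311a7e6e7689488576a", RDF_FIRST, "Nbb2182d433d74e7fb43a53508eb87f29"),
    ("N1a3f8a5acd204311a7e6e7689488576a", RDF_REST, "N6d5eaea05ec045e9a2d5b65b00ea13b3"),
    ("N6d5eaea05ec045e9a2d5b65b00ea13b3", RDF_FIRST, "sg:person.0762354775.48"),
    ("N6d5eaea05ec045e9a2d5b65b00ea13b3", RDF_REST, "N85f0513efba0420084268fcd16dbefb7"),
    ("N85f0513efba0420084268fcd16dbefb7", RDF_FIRST, "sg:person.01355621653.94"),
    ("N85f0513efba0420084268fcd16dbefb7", RDF_REST, RDF_NIL),
    ("Nbb2182d433d74e7fb43a53508eb87f29", "schema:familyName", "Ponweiser"),
    ("sg:person.0762354775.48", "schema:familyName", "Grün"),
    ("sg:person.01355621653.94", "schema:familyName", "Hornik"),
]


def index_by_subject(triples):
    """Map subject -> {predicate: object}; assumes one object per pair."""
    index = {}
    for subject, predicate, obj in triples:
        index.setdefault(subject, {})[predicate] = obj
    return index


def walk_list(index, head):
    """Follow rdf:rest from head until rdf:nil, collecting each rdf:first item."""
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle in RDF list at {node}")
        seen.add(node)
        items.append(index[node][RDF_FIRST])
        node = index[node][RDF_REST]
    return items


index = index_by_subject(TRIPLES)
# Head node is the object of schema:author in row 3 of the table.
authors = walk_list(index, "N1a3f8a5acd204311a7e6e7689488576a")
family_names = [index[a]["schema:familyName"] for a in authors]
print(family_names)
```

The same traversal would apply to the editor list, whose head is the object of `schema:editor` in row 13.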
 



