2003-10
AUTHORS Jon Kleinberg
ABSTRACT A fundamental problem in text data mining is to extract meaningful structure from document streams that arrive continuously over time. E-mail and news articles are two natural examples of such streams, each characterized by topics that appear, grow in intensity for a period of time, and then fade away. The published literature in a particular research field can be seen to exhibit similar phenomena over a much longer time scale. Underlying much of the text mining work in this area is the following intuitive premise: that the appearance of a topic in a document stream is signaled by a “burst of activity,” with certain features rising sharply in frequency as the topic emerges. The goal of the present work is to develop a formal approach for modeling such “bursts,” in such a way that they can be robustly and efficiently identified, and can provide an organizational framework for analyzing the underlying content. The approach is based on modeling the stream using an infinite-state automaton, in which bursts appear naturally as state transitions; it can be viewed as drawing an analogy with models from queueing theory for bursty network traffic. The resulting algorithms are highly efficient, and yield a nested representation of the set of bursts that imposes a hierarchical structure on the overall stream. Experiments with e-mail and research paper archives suggest that the resulting structures have a natural meaning in terms of the content that gave rise to them.
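The idea of bursts appearing as state transitions can be illustrated with a much smaller model than the one the abstract describes. The Python sketch below is a two-state simplification, not the infinite-state construction of the paper: it labels each gap between consecutive arrivals as "baseline" (0) or "burst" (1) by minimizing a total cost with dynamic programming. The exponential gap cost, the rate multiplier `s`, the entry penalty `gamma * ln(n)`, and the function name `detect_bursts` are all assumptions chosen for illustration.

```python
import math


def detect_bursts(gaps, s=2.0, gamma=1.0):
    """Label each inter-arrival gap 0 (baseline) or 1 (burst).

    Illustrative two-state sketch with assumed costs:
    - a gap x in a state with rate r costs r*x - ln(r), the negative
      log-density of an exponential distribution;
    - moving from state 0 up to state 1 costs gamma * ln(n);
    - moving back down is free.
    Gaps are assumed to be positive numbers.
    """
    n = len(gaps)
    if n == 0:
        return []
    base_rate = n / sum(gaps)
    rates = [base_rate, base_rate * s]
    up_cost = gamma * math.log(n)

    def emit(state, x):
        return rates[state] * x - math.log(rates[state])

    inf = float("inf")
    cost = [0.0, inf]  # the sequence starts in the baseline state
    back = []          # back[t][j]: best predecessor of state j at step t
    for x in gaps:
        new_cost, ptr = [], []
        for j in (0, 1):
            best, best_prev = inf, 0
            for i in (0, 1):
                c = cost[i] + (up_cost if j > i else 0.0)
                if c < best:
                    best, best_prev = c, i
            new_cost.append(best + emit(j, x))
            ptr.append(best_prev)
        cost = new_cost
        back.append(ptr)

    # Trace the cheapest state sequence from the last gap to the first.
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for ptr in reversed(back[1:]):
        state = ptr[state]
        states.append(state)
    states.reverse()
    return states


if __name__ == "__main__":
    # Ten short gaps surrounded by long ones.
    print(detect_bursts([10] * 5 + [1] * 10 + [10] * 5))
```

The entry penalty is what keeps the labeling from flipping on a single short gap: a run of short gaps must save more cost than the penalty before it is reported as a burst.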
PAGES 373-397
http://scigraph.springernature.com/pub.10.1023/a:1024940629314
DOI http://dx.doi.org/10.1023/a:1024940629314
DIMENSIONS https://app.dimensions.ai/details/publication/pub.1042400043
JSON-LD is the canonical representation for SciGraph data.
[
{
"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
"about": [
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Information and Computing Sciences",
"type": "DefinedTerm"
},
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Artificial Intelligence and Image Processing",
"type": "DefinedTerm"
},
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Information Systems",
"type": "DefinedTerm"
}
],
"author": [
{
"affiliation": {
"alternateName": "Department of Computer Science, Cornell University, 14853, Ithaca, NY, USA",
"id": "http://www.grid.ac/institutes/grid.5386.8",
"name": [
"Department of Computer Science, Cornell University, 14853, Ithaca, NY, USA"
],
"type": "Organization"
},
"familyName": "Kleinberg",
"givenName": "Jon",
"id": "sg:person.011522233557.04",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011522233557.04"
],
"type": "Person"
}
],
"citation": [
{
"id": "sg:pub.10.1023/a:1007506220214",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1051234706",
"https://doi.org/10.1023/a:1007506220214"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1023/a:1007469218079",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1030131329",
"https://doi.org/10.1023/a:1007469218079"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/3-540-45465-9_59",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1002330524",
"https://doi.org/10.1007/3-540-45465-9_59"
],
"type": "CreativeWork"
}
],
"datePublished": "2003-10",
"datePublishedReg": "2003-10-01",
"description": "A fundamental problem in text data mining is to extract meaningful structure from document streams that arrive continuously over time. E-mail and news articles are two natural examples of such streams, each characterized by topics that appear, grow in intensity for a period of time, and then fade away. The published literature in a particular research field can be seen to exhibit similar phenomena over a much longer time scale. Underlying much of the text mining work in this area is the following intuitive premise\u2014that the appearance of a topic in a document stream is signaled by a \u201cburst of activity,\u201d with certain features rising sharply in frequency as the topic emerges.The goal of the present work is to develop a formal approach for modeling such \u201cbursts,\u201d in such a way that they can be robustly and efficiently identified, and can provide an organizational framework for analyzing the underlying content. The approach is based on modeling the stream using an infinite-state automaton, in which bursts appear naturally as state transitions; it can be viewed as drawing an analogy with models from queueing theory for bursty network traffic. The resulting algorithms are highly efficient, and yield a nested representation of the set of bursts that imposes a hierarchical structure on the overall stream. Experiments with e-mail and research paper archives suggest that the resulting structures have a natural meaning in terms of the content that gave rise to them.",
"genre": "article",
"id": "sg:pub.10.1023/a:1024940629314",
"inLanguage": "en",
"isAccessibleForFree": false,
"isPartOf": [
{
"id": "sg:journal.1041853",
"issn": [
"1384-5810",
"1573-756X"
],
"name": "Data Mining and Knowledge Discovery",
"publisher": "Springer Nature",
"type": "Periodical"
},
{
"issueNumber": "4",
"type": "PublicationIssue"
},
{
"type": "PublicationVolume",
"volumeNumber": "7"
}
],
"keywords": [
"document streams",
"text data mining",
"text mining work",
"bursty network traffic",
"hierarchical structure",
"data mining",
"network traffic",
"underlying content",
"set of bursts",
"formal approach",
"infinite-state automata",
"paper archives",
"particular research field",
"meaningful structures",
"such streams",
"fundamental problem",
"news articles",
"research field",
"mining works",
"overall stream",
"streams",
"state transitions",
"mail",
"mining",
"algorithm",
"traffic",
"organizational framework",
"topic",
"automata",
"natural meaning",
"bursty",
"certain features",
"framework",
"representation",
"period of time",
"set",
"work",
"archives",
"features",
"goal",
"time",
"way",
"example",
"model",
"premise",
"experiments",
"structure",
"terms",
"natural examples",
"bursts of activity",
"field",
"content",
"article",
"area",
"meaning",
"literature",
"theory",
"analogy",
"scale",
"appearance",
"bursts",
"present work",
"time scales",
"rise",
"phenomenon",
"long time scales",
"activity",
"frequency",
"transition",
"intensity",
"period",
"similar phenomenon",
"approach",
"problem"
],
"name": "Bursty and Hierarchical Structure in Streams",
"pagination": "373-397",
"productId": [
{
"name": "dimensions_id",
"type": "PropertyValue",
"value": [
"pub.1042400043"
]
},
{
"name": "doi",
"type": "PropertyValue",
"value": [
"10.1023/a:1024940629314"
]
}
],
"sameAs": [
"https://doi.org/10.1023/a:1024940629314",
"https://app.dimensions.ai/details/publication/pub.1042400043"
],
"sdDataset": "articles",
"sdDatePublished": "2022-05-10T09:54",
"sdLicense": "https://scigraph.springernature.com/explorer/license/",
"sdPublisher": {
"name": "Springer Nature - SN SciGraph project",
"type": "Organization"
},
"sdSource": "s3://com-springernature-scigraph/baseset/20220509/entities/gbq_results/article/article_363.jsonl",
"type": "ScholarlyArticle",
"url": "https://doi.org/10.1023/a:1024940629314"
}
]
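As a rough illustration of how the fields of this record fit together, the Python sketch below assembles a one-line citation from a trimmed copy of the JSON-LD. The helper name `format_citation` and the citation layout are assumptions made for this example; they are not part of SciGraph.

```python
import json

# Trimmed copy of the record; only the fields used below are kept.
RECORD = json.loads("""
[
  {
    "author": [{"familyName": "Kleinberg", "givenName": "Jon", "type": "Person"}],
    "datePublished": "2003-10",
    "isPartOf": [
      {"name": "Data Mining and Knowledge Discovery", "type": "Periodical"},
      {"issueNumber": "4", "type": "PublicationIssue"},
      {"type": "PublicationVolume", "volumeNumber": "7"}
    ],
    "name": "Bursty and Hierarchical Structure in Streams",
    "pagination": "373-397"
  }
]
""")


def format_citation(record):
    """Build a one-line citation (hypothetical layout) from an article record."""
    # isPartOf mixes journal, issue, and volume nodes; index them by "type".
    parts = {part["type"]: part for part in record["isPartOf"]}
    authors = ", ".join(
        f'{a["familyName"]}, {a["givenName"]}' for a in record["author"]
    )
    year = record["datePublished"][:4]
    journal = parts["Periodical"]["name"]
    volume = parts["PublicationVolume"]["volumeNumber"]
    issue = parts["PublicationIssue"]["issueNumber"]
    return (
        f'{authors}. {record["name"]}. '
        f'{journal} {volume}({issue}), {record["pagination"]} ({year}).'
    )


if __name__ == "__main__":
    print(format_citation(RECORD[0]))
```

Indexing `isPartOf` by `type` avoids relying on the order of the journal, issue, and volume entries, which the record does not guarantee.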
JSON-LD is a popular format for linked data which is fully compatible with JSON.
curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1023/a:1024940629314'
N-Triples is a line-based linked data format ideal for batch operations.
curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1023/a:1024940629314'
Turtle is a human-readable linked data format.
curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1023/a:1024940629314'
RDF/XML is a standard XML format for linked data.
curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1023/a:1024940629314'
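The four curl commands differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The media types and the URL are taken from the commands above; the short format labels, the function names `build_request` and `fetch`, and the assumption that the endpoint still answers are hypothetical.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1023/a:1024940629314"

# Media types copied from the curl commands; the short labels are assumed.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Return a GET request that asks for `fmt` through the Accept header."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=10):
    """Send the request and return the body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds the requests; call fetch() to contact the server.
    for fmt in ACCEPT_TYPES:
        print(fmt, "->", build_request(fmt).get_header("Accept"))
```

Keeping request construction separate from `fetch` lets the header logic be checked without touching the network.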
This table displays all metadata directly associated to this object as RDF triples.
148 TRIPLES
22 PREDICATES
103 URIs
91 LITERALS
6 BLANK NODES
# | Subject | Predicate | Object |
---|---|---|---|
1 | sg:pub.10.1023/a:1024940629314 | schema:about | anzsrc-for:08 |
2 | ″ | ″ | anzsrc-for:0801 |
3 | ″ | ″ | anzsrc-for:0806 |
4 | ″ | schema:author | N99d4aae70520401598cbf5dadf0357e9 |
5 | ″ | schema:citation | sg:pub.10.1007/3-540-45465-9_59 |
6 | ″ | ″ | sg:pub.10.1023/a:1007469218079 |
7 | ″ | ″ | sg:pub.10.1023/a:1007506220214 |
8 | ″ | schema:datePublished | 2003-10 |
9 | ″ | schema:datePublishedReg | 2003-10-01 |
10 | ″ | schema:description | A fundamental problem in text data mining is to extract meaningful structure from document streams that arrive continuously over time. E-mail and news articles are two natural examples of such streams, each characterized by topics that appear, grow in intensity for a period of time, and then fade away. The published literature in a particular research field can be seen to exhibit similar phenomena over a much longer time scale. Underlying much of the text mining work in this area is the following intuitive premise—that the appearance of a topic in a document stream is signaled by a “burst of activity,” with certain features rising sharply in frequency as the topic emerges.The goal of the present work is to develop a formal approach for modeling such “bursts,” in such a way that they can be robustly and efficiently identified, and can provide an organizational framework for analyzing the underlying content. The approach is based on modeling the stream using an infinite-state automaton, in which bursts appear naturally as state transitions; it can be viewed as drawing an analogy with models from queueing theory for bursty network traffic. The resulting algorithms are highly efficient, and yield a nested representation of the set of bursts that imposes a hierarchical structure on the overall stream. Experiments with e-mail and research paper archives suggest that the resulting structures have a natural meaning in terms of the content that gave rise to them. |
11 | ″ | schema:genre | article |
12 | ″ | schema:inLanguage | en |
13 | ″ | schema:isAccessibleForFree | false |
14 | ″ | schema:isPartOf | N236b283b34b9499ebb21578950e38f4a |
15 | ″ | ″ | N9c1d5a4e99cf4e4d8b12ab18ccc404d0 |
16 | ″ | ″ | sg:journal.1041853 |
17 | ″ | schema:keywords | activity |
18 | ″ | ″ | algorithm |
19 | ″ | ″ | analogy |
20 | ″ | ″ | appearance |
21 | ″ | ″ | approach |
22 | ″ | ″ | archives |
23 | ″ | ″ | area |
24 | ″ | ″ | article |
25 | ″ | ″ | automata |
26 | ″ | ″ | bursts |
27 | ″ | ″ | bursts of activity |
28 | ″ | ″ | bursty |
29 | ″ | ″ | bursty network traffic |
30 | ″ | ″ | certain features |
31 | ″ | ″ | content |
32 | ″ | ″ | data mining |
33 | ″ | ″ | document streams |
34 | ″ | ″ | example |
35 | ″ | ″ | experiments |
36 | ″ | ″ | features |
37 | ″ | ″ | field |
38 | ″ | ″ | formal approach |
39 | ″ | ″ | framework |
40 | ″ | ″ | frequency |
41 | ″ | ″ | fundamental problem |
42 | ″ | ″ | goal |
43 | ″ | ″ | hierarchical structure |
44 | ″ | ″ | infinite-state automata |
45 | ″ | ″ | intensity |
46 | ″ | ″ | literature |
47 | ″ | ″ | long time scales |
48 | ″ | ″ | mail |
49 | ″ | ″ | meaning |
50 | ″ | ″ | meaningful structures |
51 | ″ | ″ | mining |
52 | ″ | ″ | mining works |
53 | ″ | ″ | model |
54 | ″ | ″ | natural examples |
55 | ″ | ″ | natural meaning |
56 | ″ | ″ | network traffic |
57 | ″ | ″ | news articles |
58 | ″ | ″ | organizational framework |
59 | ″ | ″ | overall stream |
60 | ″ | ″ | paper archives |
61 | ″ | ″ | particular research field |
62 | ″ | ″ | period |
63 | ″ | ″ | period of time |
64 | ″ | ″ | phenomenon |
65 | ″ | ″ | premise |
66 | ″ | ″ | present work |
67 | ″ | ″ | problem |
68 | ″ | ″ | representation |
69 | ″ | ″ | research field |
70 | ″ | ″ | rise |
71 | ″ | ″ | scale |
72 | ″ | ″ | set |
73 | ″ | ″ | set of bursts |
74 | ″ | ″ | similar phenomenon |
75 | ″ | ″ | state transitions |
76 | ″ | ″ | streams |
77 | ″ | ″ | structure |
78 | ″ | ″ | such streams |
79 | ″ | ″ | terms |
80 | ″ | ″ | text data mining |
81 | ″ | ″ | text mining work |
82 | ″ | ″ | theory |
83 | ″ | ″ | time |
84 | ″ | ″ | time scales |
85 | ″ | ″ | topic |
86 | ″ | ″ | traffic |
87 | ″ | ″ | transition |
88 | ″ | ″ | underlying content |
89 | ″ | ″ | way |
90 | ″ | ″ | work |
91 | ″ | schema:name | Bursty and Hierarchical Structure in Streams |
92 | ″ | schema:pagination | 373-397 |
93 | ″ | schema:productId | N03d06a441409401aa6552a2c832cbdc5 |
94 | ″ | ″ | N7673a7f244ab44648d69fa44c9e57332 |
95 | ″ | schema:sameAs | https://app.dimensions.ai/details/publication/pub.1042400043 |
96 | ″ | ″ | https://doi.org/10.1023/a:1024940629314 |
97 | ″ | schema:sdDatePublished | 2022-05-10T09:54 |
98 | ″ | schema:sdLicense | https://scigraph.springernature.com/explorer/license/ |
99 | ″ | schema:sdPublisher | N59ec8993a2b64d1da5318faa32541846 |
100 | ″ | schema:url | https://doi.org/10.1023/a:1024940629314 |
101 | ″ | sgo:license | sg:explorer/license/ |
102 | ″ | sgo:sdDataset | articles |
103 | ″ | rdf:type | schema:ScholarlyArticle |
104 | N03d06a441409401aa6552a2c832cbdc5 | schema:name | dimensions_id |
105 | ″ | schema:value | pub.1042400043 |
106 | ″ | rdf:type | schema:PropertyValue |
107 | N236b283b34b9499ebb21578950e38f4a | schema:issueNumber | 4 |
108 | ″ | rdf:type | schema:PublicationIssue |
109 | N59ec8993a2b64d1da5318faa32541846 | schema:name | Springer Nature - SN SciGraph project |
110 | ″ | rdf:type | schema:Organization |
111 | N7673a7f244ab44648d69fa44c9e57332 | schema:name | doi |
112 | ″ | schema:value | 10.1023/a:1024940629314 |
113 | ″ | rdf:type | schema:PropertyValue |
114 | N99d4aae70520401598cbf5dadf0357e9 | rdf:first | sg:person.011522233557.04 |
115 | ″ | rdf:rest | rdf:nil |
116 | N9c1d5a4e99cf4e4d8b12ab18ccc404d0 | schema:volumeNumber | 7 |
117 | ″ | rdf:type | schema:PublicationVolume |
118 | anzsrc-for:08 | schema:inDefinedTermSet | anzsrc-for: |
119 | ″ | schema:name | Information and Computing Sciences |
120 | ″ | rdf:type | schema:DefinedTerm |
121 | anzsrc-for:0801 | schema:inDefinedTermSet | anzsrc-for: |
122 | ″ | schema:name | Artificial Intelligence and Image Processing |
123 | ″ | rdf:type | schema:DefinedTerm |
124 | anzsrc-for:0806 | schema:inDefinedTermSet | anzsrc-for: |
125 | ″ | schema:name | Information Systems |
126 | ″ | rdf:type | schema:DefinedTerm |
127 | sg:journal.1041853 | schema:issn | 1384-5810 |
128 | ″ | ″ | 1573-756X |
129 | ″ | schema:name | Data Mining and Knowledge Discovery |
130 | ″ | schema:publisher | Springer Nature |
131 | ″ | rdf:type | schema:Periodical |
132 | sg:person.011522233557.04 | schema:affiliation | grid-institutes:grid.5386.8 |
133 | ″ | schema:familyName | Kleinberg |
134 | ″ | schema:givenName | Jon |
135 | ″ | schema:sameAs | https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011522233557.04 |
136 | ″ | rdf:type | schema:Person |
137 | sg:pub.10.1007/3-540-45465-9_59 | schema:sameAs | https://app.dimensions.ai/details/publication/pub.1002330524 |
138 | ″ | ″ | https://doi.org/10.1007/3-540-45465-9_59 |
139 | ″ | rdf:type | schema:CreativeWork |
140 | sg:pub.10.1023/a:1007469218079 | schema:sameAs | https://app.dimensions.ai/details/publication/pub.1030131329 |
141 | ″ | ″ | https://doi.org/10.1023/a:1007469218079 |
142 | ″ | rdf:type | schema:CreativeWork |
143 | sg:pub.10.1023/a:1007506220214 | schema:sameAs | https://app.dimensions.ai/details/publication/pub.1051234706 |
144 | ″ | ″ | https://doi.org/10.1023/a:1007506220214 |
145 | ″ | rdf:type | schema:CreativeWork |
146 | grid-institutes:grid.5386.8 | schema:alternateName | Department of Computer Science, Cornell University, 14853, Ithaca, NY, USA |
147 | ″ | schema:name | Department of Computer Science, Cornell University, 14853, Ithaca, NY, USA |
148 | ″ | rdf:type | schema:Organization |