Hancock: A Language for Analyzing Transactional Data Streams


Ontology type: schema:Chapter     


Chapter Info

DATE

2016

AUTHORS

Corinna Cortes , Kathleen Fisher , Daryl Pregibon , Anne Rogers , Frederick Smith

ABSTRACT

Massive transaction streams present a number of opportunities for data mining techniques. Transactions might represent calls on a telephone network, commercial credit card purchases, stock market trades, or HTTP requests to a web server. While historically such data have been collected for billing or security purposes, they are now being used to discover how the transactors, e.g., credit-card numbers or IP addresses, use the associated services. For over six years, we have computed evolving profiles (called signatures) of the transactors in several large data streams. The signature for each transactor captures the salient features of his or her transactions through time. Programs for processing signatures must be highly optimized because of the size of the data stream (several gigabytes per day) and the number of signatures to maintain (hundreds of millions). Originally, we wrote such programs directly in C, but because signature programs often sacrificed readability for performance, they were difficult to verify and maintain. Hancock is a domain-specific language created to express computationally efficient signature programs cleanly. In this chapter, we describe the obstacles to computing signatures from massive streams and explain how Hancock addresses these problems. For expository purposes, we present Hancock using a running example from the telecommunications industry; however, the language itself is general and applies equally well to other data sources.

PAGES

387-408

Book

TITLE

Data Stream Management

ISBN

978-3-540-28607-3
978-3-540-28608-0

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-540-28608-0_19

DOI

http://dx.doi.org/10.1007/978-3-540-28608-0_19

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1003243464



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "name": [
            "Google Research"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Cortes", 
        "givenName": "Corinna", 
        "id": "sg:person.010042472421.96", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010042472421.96"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Tufts University", 
          "id": "https://www.grid.ac/institutes/grid.429997.8", 
          "name": [
            "Computer Science Department, Tufts University"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Fisher", 
        "givenName": "Kathleen", 
        "id": "sg:person.014245330041.32", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014245330041.32"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "name": [
            "Google Research"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Pregibon", 
        "givenName": "Daryl", 
        "id": "sg:person.0576176455.14", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0576176455.14"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Chicago", 
          "id": "https://www.grid.ac/institutes/grid.170205.1", 
          "name": [
            "University of Chicago"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Rogers", 
        "givenName": "Anne", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "name": [
            "The Mathworks"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Smith", 
        "givenName": "Frederick", 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1023/a:1009700419189", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1001263423", 
          "https://doi.org/10.1023/a:1009700419189"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf01807697", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013740393", 
          "https://doi.org/10.1007/bf01807697"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/973097.973100", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017709267"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/245882.245905", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1033377931"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/347090.347094", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1045460397"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/90.779199", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061247461"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tse.1987.232894", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061788065"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2016", 
    "datePublishedReg": "2016-01-01", 
    "description": "Massive transaction streams present a number of opportunities for data mining techniques. Transactions might represent calls on a telephone network, commercial credit card purchases, stock market trades, or HTTP requests to a web server. While historically such data have been collected for billing or security purposes, they are now being used to discover how the transactors, e.g., credit-card numbers or IP addresses, use the associated services. For over six years, we have computed evolving profiles (called signatures) of the transactors in several large data streams. The signature for each transactor captures the salient features of his or her transactions through time. Programs for processing signatures must be highly optimized because of the size of the data stream (several gigabytes per day) and the number of signatures to maintain (hundreds of millions). Originally, we wrote such programs directly in C, but because signature programs often sacrificed readability for performance, they were difficult to verify and maintain. Hancock is a domain-specific language created to express computationally efficient signature programs cleanly. In this chapter, we describe the obstacles to computing signatures from massive streams and explain how Hancock addresses these problems. For expository purposes, we present Hancock using a running example from the telecommunications industry; however, the language itself is general and applies equally well to other data sources.", 
    "editor": [
      {
        "familyName": "Garofalakis", 
        "givenName": "Minos", 
        "type": "Person"
      }, 
      {
        "familyName": "Gehrke", 
        "givenName": "Johannes", 
        "type": "Person"
      }, 
      {
        "familyName": "Rastogi", 
        "givenName": "Rajeev", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-540-28608-0_19", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-540-28607-3", 
        "978-3-540-28608-0"
      ], 
      "name": "Data Stream Management", 
      "type": "Book"
    }, 
    "name": "Hancock: A Language for Analyzing Transactional Data Streams", 
    "pagination": "387-408", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-540-28608-0_19"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "0f7333377fbf3656df2ef5749bb3326e5835fc8d5fdd1e0ffbdd88f1e463e2b2"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1003243464"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-540-28608-0_19", 
      "https://app.dimensions.ai/details/publication/pub.1003243464"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-15T21:55", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8693_00000245.jsonl", 
    "type": "Chapter", 
    "url": "http://link.springer.com/10.1007/978-3-540-28608-0_19"
  }
]
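
As a minimal sketch of how a record like the one above can be consumed, the following Python reads an abbreviated copy of the JSON-LD and pulls out the title, author names, DOI and page range. The embedded string keeps only the fields used here, and the helper name `summarize` is an assumption made for illustration; it is not part of SciGraph.

```python
import json

# Abbreviated copy of the record above; only the fields used below are kept.
RECORD_JSON = """
[
  {
    "name": "Hancock: A Language for Analyzing Transactional Data Streams",
    "pagination": "387-408",
    "author": [
      {"familyName": "Cortes", "givenName": "Corinna"},
      {"familyName": "Fisher", "givenName": "Kathleen"},
      {"familyName": "Pregibon", "givenName": "Daryl"},
      {"familyName": "Rogers", "givenName": "Anne"},
      {"familyName": "Smith", "givenName": "Frederick"}
    ],
    "productId": [
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1007/978-3-540-28608-0_19"]},
      {"name": "dimensions_id", "type": "PropertyValue",
       "value": ["pub.1003243464"]}
    ]
  }
]
"""


def summarize(record_json):
    """Return title, authors, DOI and pages from a SciGraph-style record.

    Hypothetical helper: assumes the top level is a one-element list, as in
    the record shown above.
    """
    record = json.loads(record_json)[0]
    authors = [f"{a['givenName']} {a['familyName']}" for a in record["author"]]
    # productId is a list of name/value pairs; index it by name.
    ids = {p["name"]: p["value"][0] for p in record["productId"]}
    return {
        "title": record["name"],
        "authors": authors,
        "doi": ids.get("doi"),
        "pages": record["pagination"],
    }


summary = summarize(RECORD_JSON)
print(summary["title"])
print(", ".join(summary["authors"]))
print(summary["doi"], summary["pages"])
```

Indexing `productId` by its `name` field avoids relying on the order of the identifiers, which the record does not guarantee.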
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-28608-0_19'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-28608-0_19'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-28608-0_19'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-28608-0_19'
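
The four curl commands differ only in the `Accept` header, which is ordinary HTTP content negotiation. A minimal Python sketch of the same idea, using only the standard library, is shown here. It builds the request objects without sending them, so it makes no claim about whether the endpoint currently responds; the names `MEDIA_TYPES` and `build_request` are assumptions introduced for illustration.

```python
from urllib.request import Request

# URL taken from the curl commands above.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-540-28608-0_19"

# Short format labels (hypothetical) mapped to the media types listed above.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request asking for one serialization."""
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


for fmt in MEDIA_TYPES:
    req = build_request(fmt)
    print(req.get_method(), req.full_url, "Accept:", req.get_header("Accept"))

# To actually fetch a record, pass the request to urllib.request.urlopen(req)
# and read the response body; that step is left out so the sketch runs offline.
```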


 

This table displays all metadata directly associated with this object as RDF triples.

133 TRIPLES      23 PREDICATES      34 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-540-28608-0_19 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author N2e5afb79635846eb8adf9f017d70646b
4 schema:citation sg:pub.10.1007/bf01807697
5 sg:pub.10.1023/a:1009700419189
6 https://doi.org/10.1109/90.779199
7 https://doi.org/10.1109/tse.1987.232894
8 https://doi.org/10.1145/245882.245905
9 https://doi.org/10.1145/347090.347094
10 https://doi.org/10.1145/973097.973100
11 schema:datePublished 2016
12 schema:datePublishedReg 2016-01-01
13 schema:description Massive transaction streams present a number of opportunities for data mining techniques. Transactions might represent calls on a telephone network, commercial credit card purchases, stock market trades, or HTTP requests to a web server. While historically such data have been collected for billing or security purposes, they are now being used to discover how the transactors, e.g., credit-card numbers or IP addresses, use the associated services. For over six years, we have computed evolving profiles (called signatures) of the transactors in several large data streams. The signature for each transactor captures the salient features of his or her transactions through time. Programs for processing signatures must be highly optimized because of the size of the data stream (several gigabytes per day) and the number of signatures to maintain (hundreds of millions). Originally, we wrote such programs directly in C, but because signature programs often sacrificed readability for performance, they were difficult to verify and maintain. Hancock is a domain-specific language created to express computationally efficient signature programs cleanly. In this chapter, we describe the obstacles to computing signatures from massive streams and explain how Hancock addresses these problems. For expository purposes, we present Hancock using a running example from the telecommunications industry; however, the language itself is general and applies equally well to other data sources.
14 schema:editor N146478715c384d4cbe2a46a8542099a7
15 schema:genre chapter
16 schema:inLanguage en
17 schema:isAccessibleForFree false
18 schema:isPartOf N2b52b34f3e0e428d91bb220f6f08a84f
19 schema:name Hancock: A Language for Analyzing Transactional Data Streams
20 schema:pagination 387-408
21 schema:productId N262f5525fb5548d9994007e290ef51fa
22 N33ee5f254f874836b3048de7f31afd0b
23 Nfb6fb04f87704cd6af4fe519396d9ad9
24 schema:publisher Ncd4e65cc38d04b4082e66778e7d1e7e1
25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003243464
26 https://doi.org/10.1007/978-3-540-28608-0_19
27 schema:sdDatePublished 2019-04-15T21:55
28 schema:sdLicense https://scigraph.springernature.com/explorer/license/
29 schema:sdPublisher N864ca77f41284008aca350f217335679
30 schema:url http://link.springer.com/10.1007/978-3-540-28608-0_19
31 sgo:license sg:explorer/license/
32 sgo:sdDataset chapters
33 rdf:type schema:Chapter
34 N146478715c384d4cbe2a46a8542099a7 rdf:first Neb7540ff80a148e4a1c00a5fa35b212c
35 rdf:rest Nfd3fcbe7afa34f29be64e9a04ded0bf0
36 N1df98a26dc35407bb98e7dced871fc08 schema:affiliation N6e7059b60cff4f7084ec715ac38b092f
37 schema:familyName Smith
38 schema:givenName Frederick
39 rdf:type schema:Person
40 N24e81506e924497c8f8b3638a651e5f9 rdf:first sg:person.014245330041.32
41 rdf:rest N3656136074104bbc92ab1daad324d880
42 N262f5525fb5548d9994007e290ef51fa schema:name readcube_id
43 schema:value 0f7333377fbf3656df2ef5749bb3326e5835fc8d5fdd1e0ffbdd88f1e463e2b2
44 rdf:type schema:PropertyValue
45 N2b52b34f3e0e428d91bb220f6f08a84f schema:isbn 978-3-540-28607-3
46 978-3-540-28608-0
47 schema:name Data Stream Management
48 rdf:type schema:Book
49 N2e5afb79635846eb8adf9f017d70646b rdf:first sg:person.010042472421.96
50 rdf:rest N24e81506e924497c8f8b3638a651e5f9
51 N2fa133aeda0b4459abe0a38f33404b0a rdf:first N1df98a26dc35407bb98e7dced871fc08
52 rdf:rest rdf:nil
53 N33ee5f254f874836b3048de7f31afd0b schema:name dimensions_id
54 schema:value pub.1003243464
55 rdf:type schema:PropertyValue
56 N3656136074104bbc92ab1daad324d880 rdf:first sg:person.0576176455.14
57 rdf:rest Nbfbb2aa2b8a6448db05b9f99c17c04d3
58 N549a9dd6b3b64ab18d9ada690c756857 rdf:first N65b71cde1c88459e82323c8b9fdd703f
59 rdf:rest rdf:nil
60 N5cdd45f95aea465691f903ce6af686bc schema:name Google Research
61 rdf:type schema:Organization
62 N65b71cde1c88459e82323c8b9fdd703f schema:familyName Rastogi
63 schema:givenName Rajeev
64 rdf:type schema:Person
65 N6e7059b60cff4f7084ec715ac38b092f schema:name The Mathworks
66 rdf:type schema:Organization
67 N8374a1741ce9457f92343dffb202783f schema:affiliation https://www.grid.ac/institutes/grid.170205.1
68 schema:familyName Rogers
69 schema:givenName Anne
70 rdf:type schema:Person
71 N864ca77f41284008aca350f217335679 schema:name Springer Nature - SN SciGraph project
72 rdf:type schema:Organization
73 Nbfbb2aa2b8a6448db05b9f99c17c04d3 rdf:first N8374a1741ce9457f92343dffb202783f
74 rdf:rest N2fa133aeda0b4459abe0a38f33404b0a
75 Ncd3ce06148564e49ab35c217719a7e38 schema:familyName Gehrke
76 schema:givenName Johannes
77 rdf:type schema:Person
78 Ncd4e65cc38d04b4082e66778e7d1e7e1 schema:location Berlin, Heidelberg
79 schema:name Springer Berlin Heidelberg
80 rdf:type schema:Organisation
81 Neb7540ff80a148e4a1c00a5fa35b212c schema:familyName Garofalakis
82 schema:givenName Minos
83 rdf:type schema:Person
84 Nebc3884ee05146c68149b762ef184a2e schema:name Google Research
85 rdf:type schema:Organization
86 Nfb6fb04f87704cd6af4fe519396d9ad9 schema:name doi
87 schema:value 10.1007/978-3-540-28608-0_19
88 rdf:type schema:PropertyValue
89 Nfd3fcbe7afa34f29be64e9a04ded0bf0 rdf:first Ncd3ce06148564e49ab35c217719a7e38
90 rdf:rest N549a9dd6b3b64ab18d9ada690c756857
91 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
92 schema:name Information and Computing Sciences
93 rdf:type schema:DefinedTerm
94 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
95 schema:name Artificial Intelligence and Image Processing
96 rdf:type schema:DefinedTerm
97 sg:person.010042472421.96 schema:affiliation N5cdd45f95aea465691f903ce6af686bc
98 schema:familyName Cortes
99 schema:givenName Corinna
100 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010042472421.96
101 rdf:type schema:Person
102 sg:person.014245330041.32 schema:affiliation https://www.grid.ac/institutes/grid.429997.8
103 schema:familyName Fisher
104 schema:givenName Kathleen
105 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014245330041.32
106 rdf:type schema:Person
107 sg:person.0576176455.14 schema:affiliation Nebc3884ee05146c68149b762ef184a2e
108 schema:familyName Pregibon
109 schema:givenName Daryl
110 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0576176455.14
111 rdf:type schema:Person
112 sg:pub.10.1007/bf01807697 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013740393
113 https://doi.org/10.1007/bf01807697
114 rdf:type schema:CreativeWork
115 sg:pub.10.1023/a:1009700419189 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001263423
116 https://doi.org/10.1023/a:1009700419189
117 rdf:type schema:CreativeWork
118 https://doi.org/10.1109/90.779199 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061247461
119 rdf:type schema:CreativeWork
120 https://doi.org/10.1109/tse.1987.232894 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061788065
121 rdf:type schema:CreativeWork
122 https://doi.org/10.1145/245882.245905 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033377931
123 rdf:type schema:CreativeWork
124 https://doi.org/10.1145/347090.347094 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045460397
125 rdf:type schema:CreativeWork
126 https://doi.org/10.1145/973097.973100 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017709267
127 rdf:type schema:CreativeWork
128 https://www.grid.ac/institutes/grid.170205.1 schema:alternateName University of Chicago
129 schema:name University of Chicago
130 rdf:type schema:Organization
131 https://www.grid.ac/institutes/grid.429997.8 schema:alternateName Tufts University
132 schema:name Computer Science Department, Tufts University
133 rdf:type schema:Organization
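
In the table above, the author order is not stored directly: `schema:author` points at a blank node, and each `rdf:first` / `rdf:rest` pair links one author to the next node until `rdf:nil`. The following Python sketch walks that chain using node identifiers and family names copied from the table rows. The dictionary layout and the function name `walk_rdf_list` are assumptions for illustration, not a SciGraph or RDF library API.

```python
RDF_NIL = "rdf:nil"

# Head of the author list, from the schema:author row of the table.
AUTHOR_HEAD = "N2e5afb79635846eb8adf9f017d70646b"

# node -> (rdf:first, rdf:rest), copied from the table above.
LIST_NODES = {
    "N2e5afb79635846eb8adf9f017d70646b":
        ("sg:person.010042472421.96", "N24e81506e924497c8f8b3638a651e5f9"),
    "N24e81506e924497c8f8b3638a651e5f9":
        ("sg:person.014245330041.32", "N3656136074104bbc92ab1daad324d880"),
    "N3656136074104bbc92ab1daad324d880":
        ("sg:person.0576176455.14", "Nbfbb2aa2b8a6448db05b9f99c17c04d3"),
    "Nbfbb2aa2b8a6448db05b9f99c17c04d3":
        ("N8374a1741ce9457f92343dffb202783f", "N2fa133aeda0b4459abe0a38f33404b0a"),
    "N2fa133aeda0b4459abe0a38f33404b0a":
        ("N1df98a26dc35407bb98e7dced871fc08", RDF_NIL),
}

# schema:familyName for each person node, also copied from the table.
FAMILY_NAME = {
    "sg:person.010042472421.96": "Cortes",
    "sg:person.014245330041.32": "Fisher",
    "sg:person.0576176455.14": "Pregibon",
    "N8374a1741ce9457f92343dffb202783f": "Rogers",
    "N1df98a26dc35407bb98e7dced871fc08": "Smith",
}


def walk_rdf_list(head, nodes):
    """Follow rdf:rest links from head, collecting each rdf:first item."""
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


ordered_authors = [FAMILY_NAME[p] for p in walk_rdf_list(AUTHOR_HEAD, LIST_NODES)]
print(ordered_authors)
```

The recovered order matches the author order in the JSON-LD record, which is the point of encoding the authors as an RDF list rather than as an unordered set of triples.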
 



