COPYRIGHT YEAR

2017

AUTHORS

Sweta Singh

TITLE

Benchmarking Spark Machine Learning Using BigBench

ABSTRACT

Databases such as dashDB are adding High Speed Connectors for Spark to efficiently extract large volumes of data. This allows the extracted data to be combined with other unstructured data sources and Machine Learning (ML) to be performed on top of it. Machine Learning is a key ingredient for such use cases. To assess the performance of the data connectors and machine learning frameworks, we sought benchmarks that can scale datasets to very large volumes and apply Machine Learning algorithms. After exploring several options, we found BigBench to be a good fit. In this paper, we describe our experience of using BigBench, with a special focus on its 5 Machine Learning queries and their default implementation in Spark. We discuss how the effectiveness of BigBench for benchmarking Machine Learning could be improved by avoiding bias and including real-time analytics. We also think that there is scope for improving the coverage of Machine Learning by adding more use cases, such as Collaborative Filtering. Lastly, we share some interesting visualizations of 4 ML queries using SPSS Modeler and our experiments with different Clustering and Classification algorithms.

24 TRIPLES      24 PREDICATES      21 URIs      13 LITERALS

Subject: book-chapters:1d54bd923fc6256d5160b910299b80ac (the subject of every row below)

Row Predicate Object
1 sg:abstract (the abstract text given under ABSTRACT above)
2 sg:abstractRights OpenAccess
3 sg:bibliographyRights Restricted
4 sg:bodyHtmlRights Restricted
5 sg:bodyPdfRights Restricted
6 sg:copyrightHolder Springer International Publishing AG
7 sg:copyrightYear 2017
8 sg:ddsId Chap4
9 sg:doi 10.1007/978-3-319-54334-5_4
10 sg:esmRights OpenAccess
11 sg:hasBook books:16ddc579b55c2cb180ebddf17557a850
12 sg:hasBookEdition book-editions:c9c4729707dc50a9b210c474bd86aef3
13 sg:hasContribution contributions:ca6a8035904b6296c33e17ddc8fd0d9d
14 sg:language En
15 sg:license http://scigraph.springernature.com/explorer/license/
16 sg:metadataRights OpenAccess
17 sg:pageFirst 45
18 sg:pageLast 60
19 sg:scigraphId 1d54bd923fc6256d5160b910299b80ac
20 sg:title Benchmarking Spark Machine Learning Using BigBench
21 sg:webpage https://link.springer.com/10.1007/978-3-319-54334-5_4
22 rdf:type sg:BookChapter
23 rdfs:label BookChapter: Benchmarking Spark Machine Learning Using BigBench
24 owl:sameAs http://lod.springer.com/data/bookchapter/978-3-319-54334-5_4

HOW TO GET THIS DATA PROGRAMMATICALLY

JSON-LD is a popular JSON format for linked data.

curl -H 'Accept: application/ld+json' 'http://scigraph.springernature.com/things/book-chapters/1d54bd923fc6256d5160b910299b80ac'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'http://scigraph.springernature.com/things/book-chapters/1d54bd923fc6256d5160b910299b80ac'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'http://scigraph.springernature.com/things/book-chapters/1d54bd923fc6256d5160b910299b80ac'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'http://scigraph.springernature.com/things/book-chapters/1d54bd923fc6256d5160b910299b80ac'
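
The four requests above differ only in the Accept header, so they could be wrapped in one small shell helper. The sketch below is illustrative only: the function names accept_header and fetch_chapter, the variable SCIGRAPH_BASE, and the short format names (json-ld, nt, turtle, xml) are assumptions made for this example, not part of the SciGraph service. The MIME types and the base URL are the ones shown in the curl commands above.

```shell
# Map a short format name (hypothetical, chosen for this example)
# to the Accept header value used in the curl commands above.
accept_header() {
    case "$1" in
        json-ld) echo 'application/ld+json' ;;
        nt)      echo 'application/n-triples' ;;
        turtle)  echo 'text/turtle' ;;
        xml)     echo 'application/rdf+xml' ;;
        *)       echo "unknown format: $1" >&2; return 1 ;;
    esac
}

# Base URL taken from the curl commands above; the variable name is an assumption.
SCIGRAPH_BASE='http://scigraph.springernature.com/things/book-chapters'

# Hypothetical helper: fetch one book-chapter record in the requested format.
# Usage: fetch_chapter <scigraph-id> <format>
fetch_chapter() {
    id="$1"
    fmt="$2"
    # Stop early if the format name is not one of the four listed above.
    accept=$(accept_header "$fmt") || return 1
    curl -sS -H "Accept: $accept" "$SCIGRAPH_BASE/$id"
}
```

With these definitions loaded, calling fetch_chapter 1d54bd923fc6256d5160b910299b80ac turtle should issue the same request as the Turtle command above. Whether the service still answers at this address is not something the sketch checks.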





