Spark has emerged as the big data platform of choice for data scientists. The real power and value of Apache Spark lie in its being a single platform for executing data science tasks. What sets Spark apart is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualization, allowing data scientists to tackle the complexities that come with raw, unstructured data sets.
This hands-on, practical resource will help you dive in and become comfortable and confident working with Spark for data science. We will walk you through various techniques for handling both simple and complex data science tasks with Spark, and offer solutions to difficult data science problems using Spark's data science libraries. Through simple yet efficient recipes, the book will help you extract meaningful information at every step, showing you not only how to implement algorithms but also how to optimize your work.
Chapter 1. Big Data Analytics with Spark
Chapter 2. Tricky Statistics with Spark
Chapter 3. Data Analysis with Spark
Chapter 4. Clustering, Classification, and Regression
Chapter 5. Working with Spark MLlib
Chapter 6. NLP with Spark
Chapter 7. Working with Sparkling Water – H2O
Chapter 8. Data Visualization with Spark
Chapter 9. Deep Learning on Spark
Chapter 10. Working with SparkR