Apache Spark Streaming with Scala Training
Introduction
Scala Programming in Depth Review
Syntax and structure
Flow control and functions
Spark Internals
Resilient Distributed Datasets (RDD)
From Spark script to execution graph to cluster
Overview of Spark Streaming
Streaming architecture
Intervals in streaming
Fault tolerance
Preparing the Development Environment
Installing and configuring Apache Spark
Installing and configuring the Scala IDE
Installing and configuring JDK
Spark Streaming Beginner to Advanced
Working with key/value RDDs
Filtering RDDs
Improving Spark scripts with regular expressions
Sharing data on a cluster
Working with network data sets
Implementing BFS algorithms
Creating Spark driver scripts
Tracking in real time with scripts
Writing continuous applications
Streaming linear regression
Using Spark Machine Learning Library
Spark and Clusters
Bundling dependencies and Spark scripts using the SBT tool
Using Amazon EMR to illustrate clusters
Optimizing by partitioning RDDs
Using Spark logs
Integration in Spark Streaming
Integrating Apache Kafka and working with Kafka topics
Integrating Apache Flume and working with pull-based/push-based Flume configurations
Writing a custom receiver class
Integrating Cassandra and exposing data as real-time services
In Production
Packaging an application and running it with spark-submit
Troubleshooting, tuning, and debugging Spark jobs and clusters
Summary and Conclusion