Speaker: Hari Shreedharan
Data Day Texas 2015
Apache Spark has emerged over the past year as the imminent successor to Hadoop MapReduce. Spark can process data in memory at very high speed, while still be able to spill to disk if required. Spark’s powerful, yet flexible API allows users to write complex applications very easily without worrying about the internal workings and how the data gets processed on the cluster.
Spark comes with an extremely powerful Streaming API to process data as it is ingested. Spark Streaming integrates with popular data ingest systems like Apache Flume, Apache Kafka, Amazon Kinesis etc. allowing users to process data as it comes in.
In this talk, Hari will discuss the basics of Spark Streaming, its API and its integration with Flume, Kafka and Kinesis. Hari will also discuss a real-world example of a Spark Streaming application, and how code can be shared between a Spark application and a Spark Streaming application. Each stage of the application execution will be presented, which can help understand practices while writing such an application. Hari will finally discuss how to write a custom application and a custom receiver to receive data from other systems.