1. Spark Streaming
Much easier than Storm
Replaces Storm spouts/bolts with Akka Actors
Better API (time is part of the API) and better integration
Hadoop 2.3/Spark 0.9.1
2. Sbt setup
Create a separate sbt project; sbt run
Includes the jars and sets the class path
− Batch and Streaming,
http://spark.apache.org/docs/latest/quick-start.html
− Create a project directory
− Add dependencies; sbt is essentially a Scala-ized Maven
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.3.0"
scalaVersion := "2.10.3"
Manage the sbt/scala versions locally
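A minimal build.sbt along these lines should work (versions match the Hadoop 2.3/Spark 0.9.1 setup above; the streaming, Kafka and Twitter artifacts are assumptions based on the demos below, adjust as needed):

name := "spark-streaming-demo"

scalaVersion := "2.10.3"

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"              % "0.9.1",
  "org.apache.spark"  %% "spark-streaming"         % "0.9.1",
  "org.apache.spark"  %% "spark-streaming-kafka"   % "0.9.1",
  "org.apache.spark"  %% "spark-streaming-twitter" % "0.9.1",
  "org.apache.hadoop" %  "hadoop-client"           % "2.3.0"
)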
3. Maven setup
Run the demo using maven/eclipse
Easier; use Maven Central to find jars/artifacts
Add the external libs to the local Maven repo, then run mvn package in the Spark source distro
Eclipse: add Scala Nature, Maven project
4. Demo
Connect to the Twitter stream and process it
− Test the Twitter4j connection with Java first; print out a Twitter stream
Batch mode ends with sc.stop(); real-time streaming blocks on stream.awaitTermination().
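A rough sketch of the streaming side of the demo (TwitterUtils comes from the spark-streaming-twitter module; the twitter4j OAuth keys are assumed to be supplied via system properties and are not shown here):

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

object TwitterDemo {
  def main(args: Array[String]) {
    // local[2]: one core for the receiver, one for processing
    val ssc = new StreamingContext("local[2]", "TwitterDemo", Seconds(1))
    // None => twitter4j reads the OAuth credentials from system properties
    val tweets = TwitterUtils.createStream(ssc, None)
    // pull the hashtags out of each tweet and print a sample per batch
    val hashTags = tweets.flatMap(_.getText.split(" ")).filter(_.startsWith("#"))
    hashTags.print()
    ssc.start()
    ssc.awaitTermination()
  }
}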
DStream/Scala lazy evaluation
− Create a Stream using #::, the lazy analogue of the List cons operator, e.g. (#iphone,1) #:: (#android,3) #:: (#apple,10). Unlike a List, head and tail behave differently: the head is a val (already evaluated), the tail is lazy.
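The same idea in plain Scala, with a lazy Stream built via #:: (the hashtag counts are made-up values):

// Stream cons: the head is a val (already evaluated), the tail is lazy
val tags = ("#iphone", 1) #:: ("#android", 3) #:: ("#apple", 10) #:: Stream.empty

tags.head           // ("#iphone", 1), evaluated immediately
tags.take(2).toList // forces only the first two elements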
5. Spark Streams
StreamingContext.start() starts the scheduler
− JobScheduler.scala: starts the JobGenerator and runs the generated jobs in a thread pool
− JobGenerator.scala: starts the event actor and the checkpoint writer
Storage:
− The DStream's receiver appends incoming data to a BlockGenerator
− BlockGenerator.scala: Spark's BlockGenerator runs 2 threads (a timer that seals records into blocks, and a block-push thread). On termination it waits for the block-push thread to join.
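A toy sketch of that two-thread pattern (not Spark's actual code): one timer thread seals appended records into blocks, a push thread drains completed blocks, and stop() joins the push thread so nothing in flight is lost.

import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}
import scala.collection.mutable.ArrayBuffer

class ToyBlockGenerator(pushBlock: Seq[String] => Unit) {
  private val blocks = new ArrayBlockingQueue[Seq[String]](10)
  private var buffer = new ArrayBuffer[String]
  @volatile private var stopped = false // stops the timer thread
  @volatile private var done = false    // set once the last block is sealed

  def append(record: String): Unit = synchronized { buffer += record }

  // timer thread: every 200 ms turn the current buffer into a block
  private val timer = new Thread {
    override def run() {
      while (!stopped) {
        Thread.sleep(200)
        val block = ToyBlockGenerator.this.synchronized {
          val b = buffer; buffer = new ArrayBuffer[String]; b
        }
        if (block.nonEmpty) blocks.put(block)
      }
    }
  }

  // push thread: hand completed blocks to pushBlock until drained
  private val pusher = new Thread {
    override def run() {
      while (!done || !blocks.isEmpty) {
        val block = blocks.poll(100, TimeUnit.MILLISECONDS)
        if (block != null) pushBlock(block)
      }
    }
  }

  def start() { timer.start(); pusher.start() }

  def stop() {
    stopped = true
    timer.join()  // let the last block get sealed
    done = true
    pusher.join() // wait for the block-push thread to finish, as Spark does
  }
}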
6. Kafka Streaming Demo
KafkaUtils/Consumer connection
I0Itec zkclient connection lib (the ZooKeeper client used by the Kafka consumer)
Need to add more features/testing for fault handling
Read the source to see how to fill out the params
Start ZooKeeper, start a producer, define a topic, etc.
Send data from the producer
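For reference, the params in the createStream call used below, as I read the KafkaUtils.createStream(ssc, zkQuorum, groupId, topics) signature in Spark 0.9.x:

import org.apache.spark.streaming.kafka.KafkaUtils

val kafkaMessages = KafkaUtils.createStream(
  stream,                // the StreamingContext
  "localhost:2181",      // zkQuorum: ZooKeeper host:port
  "1",                   // groupId: the consumer group id
  Map("testtopic" -> 1)) // topics: topic name -> number of consumer threads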
8. Producer/Executor
Match the broker-id in the server conf file with
groupID in the consumer call
val kafkaInputs = (1 to 5).map { _ =>
  KafkaUtils.createStream(stream, "localhost:2181", "1", Map("testtopic" -> 1))
}
9. Producer
Use awaitTermination() to get an infinite loop so you can see what you type into the producer; start with 1 executor
val stream = new StreamingContext("local[2]", "TestObject", Seconds(1))
// one Kafka input stream to start with
val kafkaMessages = KafkaUtils.createStream(stream, "localhost:2181", "1", Map("testtopic" -> 1))
// create 5 executors (5 parallel Kafka input streams)
val kafkaInputs = (1 to 5).map { _ =>
  KafkaUtils.createStream(stream, "localhost:2181", "1", Map("testtopic" -> 1))
}
kafkaMessages.print()
stream.start()
stream.awaitTermination()
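To actually consume from the 5 parallel inputs, you would union them before printing; a small sketch using StreamingContext.union:

// merge the 5 parallel Kafka streams into a single DStream and print each batch
val allMessages = stream.union(kafkaInputs)
allMessages.print()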