Since 2014, Typesafe has been actively contributing to the Apache Spark project, and has become a certified development support partner of Databricks, the company started by the creators of Spark. Typesafe and Mesosphere have forged a partnership in which Typesafe is the official commercial support provider of Spark on Apache Mesos, along with Mesosphere’s Datacenter Operating Systems (DCOS).
In this webinar with Iulian Dragos, Spark team lead at Typesafe Inc., we reveal how Typesafe supports running Spark in various deployment modes, along with the improvements we made to Spark to help integrate backpressure signals into the underlying technologies, making it a better fit for Reactive Streams. He also show you the functionalities at work, and how to make it simple to deploy to Spark on Mesos with Typesafe.
We will introduce:
Various deployment modes for Spark: Standalone, Spark on Mesos, and Spark with Mesosphere DCOS
Overview of Mesos and how it relates to Mesosphere DCOS
Deeper look at how Spark runs on Mesos
How to manage coarse-grained and fine-grained scheduling modes on Mesos
What to know about a client vs. cluster deployment
A demo running Spark on Mesos
TeamStation AI System Report LATAM IT Salaries 2024
How to deploy Apache Spark to Mesos/DCOS
1. How to deploy Apache Spark
to Mesos/DCOS
with Iulian Dragoș
2. Agenda
• Intro Apache Spark
• Apache Mesos
• Why Spark on Mesos
• A look under the hood
2
3.
4. Spark - lightning-fast cluster computing
• next generation Big Data solution
• analytics and data processing
• up to 100x faster than Hadoop MapReduce
• built with Scala and Akka
• Apache top-level project
4
5. Spark
• It’s a next generation compute-engine
• Does not replace the whole Hadoop ecosystem
• just MapReduce
• Integrates/works with HDFS, Hive, Hbase, etc.
5
6. Spark API
• Scala distributed collections
• also available from Python and Java
• interactive shell and job submission
• streaming and batch modes
• flourishing ecosystem (SparkSQL, MLLib, GraphX)
6
10. Why Apache Mesos?
• General (a “distributed kernel”)
• Efficient resource management
• Proven technology (in production at Apple and Twitter)
• Typesafe & Mesosphere maintain the Spark/Mesos framework
10
“Program against your datacenter as a single pool of resources”
11. Frameworks running on Mesos
• HDFS
• Cassandra
• ElasticSearch
• Yarn (Myriad)
• Marathon, etc.
• and of course, Spark
11
12. Resource scheduling with Mesos
• 2-level scheduling
• Mesos offers resources to frameworks
• Frameworks accept or reject offers
• Offers include
• CPU cores, memory, ports, disk
12
22. Dynamic allocation
• Mesos support was added in Spark 1.5
• adds and removes executors based on load
• when executors are idle, kills them
• when tasks queue up in the scheduler, adds executors
• needs external-shuffle-service to be running on each node
22
23. Client vs Cluster mode
• Where does the driver process run?
• client-mode: on the machine that submits the job
• cluster-mode: on a machine in the cluster
23
26. Closing words on Spark Streaming
• Spark 1.5 improves resiliency by adding back-pressure inside
Spark Streaming
• slow-down receivers dynamically, based on load
• Spark 1.6 will add the ability to connect to Reactive Streams
• propagate back-pressure outside of Spark
26
27. Key points
• Spark is a next-generation compute engine for Big Data
• Mesos is a next-generation cluster manager
• better utilization of cluster resources across organization
• Spark on Mesos is commercially supported by Typesafe
• Typesafe&Mesosphere are the maintainers of Spark/Mesos
27
28. EXPERT SUPPORT
Why Contact Typesafe for Your Apache Spark Project?
Ignite your Spark project with 24/7 production SLA,
unlimited expert support and on-site training:
• Full application lifecycle support for Spark Core,
Spark SQL & Spark Streaming
• Deployment to Standalone, EC2, Mesos clusters
• Expert support from dedicated Spark team
• Optional 10-day “getting started” services
package
Typesafe is a partner with Databricks, Mesosphere
and IBM.
Learn more about on-site trainingCONTACT US