The How and Why of Fast Data Analytics with Apache Spark

Are you tired of struggling with your existing data analytics applications? When MapReduce first emerged it was a great boon to the big data world, but modern big data processing demands have outgrown this framework. That’s where Apache Spark steps in, boasting speeds 10-100x faster than Hadoop and setting the world record in large-scale sorting. Spark’s general abstraction means it can expand beyond simple batch processing, making it capable of such things as blazing-fast iterative algorithms and exactly-once streaming semantics. This, combined with its interactive shell, makes it a powerful tool for everybody, from data tinkerers to data scientists to data developers.

The How and Why of Fast Data
Analytics with Apache Spark
with Justin Pihony  
@JustinPihony

Today’s agenda:
▪ Concerns
▪ Why Spark?
▪ Spark basics
▪ Common pitfalls
▪ We can help!

Concerns
▪ Am I too small?
▪ Will switching be too costly?
▪ Can I utilize my current infrastructure?
▪ Will I be able to find developers?
▪ Are there enough resources available?

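Why Spark?

Tiny Code (Spark, Scala)

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("wordcount")
        val sc = new SparkContext(conf)
        sc.textFile(args(0))          // read the input as lines of text
          .flatMap(_.split(" "))      // split each line into words
          .map(word => (word, 1))     // pair every word with a count of 1
          .reduceByKey(_ + _)         // sum the counts per word on the cluster
          .saveAsTextFile(args(1))    // write the (word, count) pairs to the output path
        sc.stop()
      }
    }

Big Code (Hadoop MapReduce, Java)

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class WordCount {

      // Emits (word, 1) for every token in each input line.
      public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          String line = value.toString();
          StringTokenizer tokenizer = new StringTokenizer(line);
          while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
          }
        }
      }

      // Sums the counts emitted for each word.
      public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          context.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCount.class);  // lets the cluster locate the job's classes
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
      }
    }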

Why Spark?
▪ Readability
▪ Expressiveness
▪ Fast
▪ Testability
▪ Interactive
▪ Fault Tolerant
▪ Unify Big Data

“Spark will kill MapReduce,
but save Hadoop.”
- http://insidebigdata.com/2015/12/08/big-data-industry-predictions-2016/

Big Data Unified API
▪ Spark Core
▪ Spark SQL
▪ Spark Streaming
▪ MLlib (machine learning)
▪ GraphX (graph)
▪ DataFrames

Spark Mechanics
▪ Driver (holds the Spark Context)
▪ Worker, Worker, Worker

Spark Context
▪ Task creator
▪ Scheduler
▪ Data locality
▪ Fault tolerance

RDD
▪ Resilient Distributed Dataset
▪ Transformations
- map
- filter
- …
▪ Actions
- collect
- count
- reduce
- …
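
The split between transformations and actions listed above can be sketched in a few lines of Scala. This is a minimal illustration, not code from the talk: the object name RddBasics, the helper run, and the local[*] master setting are assumptions chosen so the example can be tried on a single machine with Spark on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddBasics {
  // Returns (number of even values in 1..10, sum of their squares).
  def run(sc: SparkContext): (Long, Int) = {
    val numbers = sc.parallelize(1 to 10)      // build an RDD from a local collection

    // Transformations: each returns a new RDD and only describes the work.
    val evens   = numbers.filter(_ % 2 == 0)
    val squares = evens.map(n => n * n)

    // Actions: each runs a job and returns a plain value to the driver.
    val howMany = squares.count()
    val total   = squares.reduce(_ + _)
    (howMany, total)
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for single-machine experiments.
    val conf = new SparkConf().setAppName("rdd-basics").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try println(run(sc)) finally sc.stop()
  }
}
```

Nothing is computed when filter and map are called; the work happens when count and reduce ask for a result, which is why transformations can be chained cheaply.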

Common Pitfalls
▪ Functional
▪ Out of memory
▪ Debugging
▪ …
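
The slide does not spell out what the "Functional" pitfall covers. One common reading, assumed here, is that code written with side effects in mind does not behave as expected once the function is shipped to executors. The sketch below is hypothetical (the object name ClosurePitfall, the helper safeTotal, and the local[*] master are not from the talk) and contrasts that pattern with letting an action return the result.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ClosurePitfall {
  // Sums 1..100 without relying on driver-side mutable state.
  def safeTotal(sc: SparkContext): Int = {
    val numbers = sc.parallelize(1 to 100)

    // Risky pattern, left as a comment on purpose:
    //   var total = 0
    //   numbers.foreach(n => total += n)
    // The function passed to foreach is serialized and sent to executors,
    // so each task updates its own copy of `total`; the driver's variable
    // is not reliably updated.

    // Safer: let an action combine the values and hand the result back.
    numbers.reduce(_ + _)
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for single-machine experiments.
    val conf = new SparkConf().setAppName("closure-pitfall").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try println(safeTotal(sc)) finally sc.stop()
  }
}
```

The same idea helps with the "Out of memory" bullet: returning one aggregated value keeps the data on the cluster instead of pulling every element back to the driver.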

Concerns
▪ Am I too small?
▪ Will switching from MapReduce be too costly?
▪ Can I utilize my current infrastructure?
▪ Will I be able to find developers?
▪ Are there enough resources available?

EXPERT SUPPORT
Why Contact Typesafe for Your Apache Spark Project?
Ignite your Spark project with 24/7 production SLA, unlimited expert support and on-site training:
• Full application lifecycle support for Spark Core, Spark SQL & Spark Streaming
• Deployment to Standalone, EC2, Mesos clusters
• Expert support from dedicated Spark team
• Optional 10-day “getting started” services package
Typesafe is a partner with Databricks, Mesosphere and IBM.
Learn more about on-site training
CONTACT US

More from Legacy Typesafe (now Lightbend)

It is widely understood that our software needs to become reactive; we need to consider responsiveness, maintainability, elasticity and scalability from the outset. Not all systems need to implement all these to the same degree, as specific project requirements will determine where effort is most wisely spent. But, in the vast majority of cases, the need to go reactive will demand that we design our applications differently. In this presentation Dr. Roland Kuhn will explore several architecture elements that are commonly found in reactive systems, like the circuit breaker, various replication techniques, and flow control protocols. These patterns are language agnostic and also independent of the abundant choice of reactive programming frameworks and libraries. They are well-specified starting points for exploring the design space of a concrete problem: thinking is strictly required! This webinar is based off of Dr. Kuhn’s session, Reactive Design Sessions, presented at WJAX and Code Mesh.

Reactive Design Patterns

Legacy Typesafe (now Lightbend)

When you need to react quickly to competitive threats, but your existing architecture is anything but nimble, what do you do? In this presentation, you will hear the story of how Walmart Canada revitalized its aging architecture with a microservices model built for speed and performance - that efficiently leveraged its JVM infrastructure - to achieve major e-commerce success in just 12 months: Conversions up 20% Mobile orders up 98% No downtime during Black Friday or Boxing Day This webinar is based off Kevin Webber’s highly successful Gartner session, Lessons Learned: Revitalizing Walmart's Aging Architecture For Web Scale, and will include added content.

Revitalizing Aging Architectures with Microservices

Legacy Typesafe (now Lightbend)

Application development has come a long way. From client-server, to desktop, to web based applications served by monolithic application servers, the need to serve billions of users and hundreds of devices have become crucial to today's business. Typesafe Reactive Platform helps you to modernize your applications by transforming the most critical parts into microservice-style architectures which support extremely high workloads and allow you to serve millions of end-users.

Typesafe Reactive Platform: Monitoring 1.0, Commercial features and more

Legacy Typesafe (now Lightbend)

Technologies Referenced: Akka, Typesafe Reactive Platform Technical Level: Introductory Audience: Senior Developers, Architects Presenter: Konrad Malawski, Akka Software Engineer, Typesafe, Inc. Akka is a runtime framework for building resilient, distributed applications in Java or Scala. In this webinar, Konrad Malawski discusses the roadmap and features of the upcoming Akka 2.4.0 and reveals three upcoming enhancements that enterprises will receive in the latest certified, tested build of Typesafe Reactive Platform. Akka Split Brain Resolver (SBR) Akka SBR provides advanced recovery scenarios in Akka Clusters, improving on the safety of Akka’s automatic resolution to avoid cascading partitioning. Akka Support for Docker and NAT Run Akka Clusters in Docker containers or NAT with complete hostname and port visibility on Java 6+ and Akka 2.3.11+ Akka Long-Term Support Receive Akka 2.4 support for Java 6, Java 7, and Scala 2.10

Akka 2.4 plus new commercial features in Typesafe Reactive Platform

Legacy Typesafe (now Lightbend)

Part 3: What you should know about Resiliency, Errors vs Failures, Isolation (and Containment), Delegation and Replication in Reactive systems In the final webinar with live Q/A in the Reactive Revealed series, we explore the way that Reactive systems maintain resiliency with an infrastructural approach designed to welcome failure often and recover gracefully. Presented by Reactive Manifesto co-author, Akka creator and CTO at Typesafe, Inc., Jonas Bonér explores what you should know about: What you should know about maintaining resiliency with monolithic systems compared to distributed systems How Reactive systems handle errors and prevents catastrophic failures with isolation and containment, delegation and replication How isolation (and containment) of error state and behavior works to block the ripple effect of cascading failures How delegation of failure management and replication lets Reactive systems continue running in the face of failures using a different error handling context, on a different thread or thread pool, in a different process, or on a different network node or computing center Previous Part 1 - Asynchronous I/O, Back-pressure and the Message-driven vs. Event-driven approach in Reactive systems | presented by Konrad Malawski Part 2 - Elasticity, Scalability and Location Transparency in Reactive Systems | presented by Viktor Klang

Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Del...

Legacy Typesafe (now Lightbend)

Akka is a runtime framework for building resilient, distributed applications in Java or Scala. In this webinar, Konrad Malawski discusses the roadmap and features of the upcoming Akka 2.4.0 and reveals three upcoming enhancements that enterprises will receive in the latest certified, tested build of Typesafe Reactive Platform. Akka Split Brain Resolver (SBR) Akka SBR provides advanced recovery scenarios in Akka Clusters, improving on the safety of Akka’s automatic resolution to avoid cascading partitioning. Akka Support for Docker and NAT Run Akka Clusters in Docker containers or NAT with complete hostname and port visibility on Java 6+ and Akka 2.3.11+ Akka Long-Term Support Receive Akka 2.4 support for Java 6, Java 7, and Scala 2.10

Akka 2.4 plus commercial features in Typesafe Reactive Platform

Legacy Typesafe (now Lightbend)

Part 2: What you should know about Elasticity, Scalability and Location Transparency in Reactive systems In the second of three webinars with live Q/A, we look into how organizations with Reactive systems are able to adaptively scale in an elastic, infrastructure-efficient way, and the role that location transparency plays in distributed Reactive systems. Reactive Streams contributor and deputy CTO at Typesafe, Inc., Viktor Klang reviews what you should know about: How Reactive systems enable near-linear scalability in order to increase performance proportionally to the allocation of resources, avoiding the constraints of bottlenecks or synchronization points within the system How elasticity builds upon scalability in Reactive systems to automatically adjust the throughput of varying demand when resources are added or removed proportionally and dynamically at runtime. The role of location transparency in distributed computing (in systems running on a single node or on a cluster) and how of decoupling runtime instances from their references can embrace network constraints like partial failure, network splits, dropped messages and more. In the third and final webinar in the series with Jonas Bonér, we go over resiliency, failures vs errors, isolation (and containment), delegation and replication in Reactive systems.

Reactive Revealed Part 2: Scalability, Elasticity and Location Transparency i...

Legacy Typesafe (now Lightbend)

In this webinar slideshow, Typesafe Deputy CTO Viktor Klang looks into the world of microservices to see how these architectures emerge from the constraints of reality. We'll review the problems imposed by reality, and show how they can not only be solved, but how the constraints free us from misconceptions that are otherwise very easy to acquire. We will also explore how distributed systems are at the heart of microservices-based architectures and how communication shapes the structure, behavior and development of the software.

Microservices 101: Exploiting Reality's Constraints with Technology

Legacy Typesafe (now Lightbend)

Reactive Streams 1.0.0 is now live, and so are our implementations in Akka Streams 1.0 and Slick 3.0. Reactive Streams is an engineering collaboration between heavy hitters in the area of streaming data on the JVM. With the Reactive Streams Special Interest Group, we set out to standardize a common ground for achieving statically-typed, high-performance, low latency, asynchronous streams of data with built-in non-blocking back pressure—with the goal of creating a vibrant ecosystem of interoperating implementations, and with a vision of one day making it into a future version of Java. Akka (recent winner of “Most Innovative Open Source Tech in 2015”) is a toolkit for building message-driven applications. With Akka Streams 1.0, Akka has incorporated a graphical DSL for composing data streams, an execution model that decouples the stream’s staged computation—it’s “blueprint”—from its execution (allowing for actor-based, single-threaded and fully distributed and clustered execution), type safe stream composition, an implementation of the Reactive Streaming specification that enables back-pressure, and more than 20 predefined stream “processing stages” that provide common streaming transformations that developers can tap into (for splitting streams, transforming streams, merging streams, and more). Slick is a relational database query and access library for Scala that enables loose-coupling, minimal configuration requirements and abstraction of the complexities of connecting with relational databases. With Slick 3.0, Slick now supports the Reactive Streams API for providing asynchronous stream processing with non-blocking back-pressure. Slick 3.0 also allows elegant mapping across multiple data types, static verification and type inference for embedded SQL statements, compile-time error discovery, and JDBC support for interoperability with all existing drivers.

A Deeper Look Into Reactive Streams with Akka Streams 1.0 and Slick 3.0

Legacy Typesafe (now Lightbend)

When you need to react quickly to competitive threats or new line-of-business demands, but your existing architecture is anything but nimble, what do you do? Is it time to start over completely with a new enterprise architecture, or can you augment your existing systems to become more resilient and responsive? This slideshow features Michael Facemire, Principal Analyst at Forrester Research, and Kevin Webber, Enterprise Advocate at Typesafe, Inc., in a discussion about how to leverage a Reactive architectural model to ensure your back-end infrastructure isn’t the limiting factor for your business success.

Modernizing Your Aging Architecture: What Enterprise Architects Need To Know ...

Legacy Typesafe (now Lightbend)

In this presentation, Akka Team Lead and author Roland Kuhn presents the freshly released final specification for Reactive Streams on the JVM. This work was done in collaboration with engineers representing Netflix, Red Hat, Pivotal, Oracle, Typesafe and others to define a standard for passing streams of data between threads in an asynchronous and non-blocking fashion. This is a common need in Reactive systems, which must handle streams of "live" data whose volume is not predetermined. The most prominent issue facing the industry today is that resource consumption needs to be controlled such that a fast data source does not overwhelm the stream destination. Asynchrony is needed in order to enable the parallel use of computing resources, on collaborating network hosts or multiple CPU cores within a single machine. Here we'll review the mechanisms employed by Reactive Streams, discuss the applicability of this technology to a variety of problems encountered in day-to-day work on the JVM, and give an overview of the tooling ecosystem that is emerging around this young standard.

Reactive Streams 1.0.0 and Why You Should Care (webinar)

Legacy Typesafe (now Lightbend)

FOR THE FULL VIDEO, RECORDING & PRESENTATION: https://typesafe.com/blog/going-reactive-in-java-with-typesafe-reactive-platform -- In this presentation by Jamie Allen, we do a deep dive into the Typesafe Reactive Platform from the Java developer’s perspective, to learn how Typesafe supports the entire Reactive application development lifecycle. Reactive application development is becoming mainstream and is considered a mission-critical need for future growth. This new wave of business applications is message-driven, elastic, resilient and responsive by nature, designed to scale elastically and maintain responsiveness even during large failures. With the Typesafe Reactive Platform (RP), including Play Framework and Akka, Java developers can start to use tools designed for building distributed systems that deliver highly responsive user experiences. Regardless of whether you code in Java or Scala, Typesafe RP provides a resilient and message-driven application stack that scales effortlessly on multicore and cloud computing architectures.

Going Reactive in Java with Typesafe Reactive Platform

Legacy Typesafe (now Lightbend)

Why Play Framework is fast

Legacy Typesafe (now Lightbend)

Back in the summer of 2014, we published the results of a survey on Java 8, which gave us a lot of the information we were looking for, but also contained a small golden nugget of data that we didn’t expect: out of more than 3000 developers surveyed, a shocking 17% reported using Apache Spark in production. So we did another survey with 2100+ respondents, drilling down into what developers, data scientists, executives and organizations are looking forward to with Apache Spark. You can download the full version of the report for the whole story, but here is a sneak peek at our findings. The full version is at: http://typesafe.com/blog/apache-spark-preparing-for-the-next-wave-of-reactive-big-data

[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data

Legacy Typesafe (now Lightbend)
