Spark Technology Center
IBM Apache Spark
The start of something big in data and design.
J. White Bear
Spark Technology Center
IBM Spark
IBM Investment in Computing
Linux, 1999
13,000,000 lines of code.
500+ Server Solutions
Ushered in Computer Science
System 360, 1964
10,000,000 lines of code.
54 Peripheral Solutions
Ushered in Information Science
Apache Spark, 2015
400,000 lines of code.
20 Data & Analytics Solutions
Ushered in Data Science
About Me
Education
• University of Michigan- Computer Science
• Databases, Machine Learning/Computational Biology,
Cryptography
• University of California San Francisco-
• Multi-objective Optimization/Computational
Biology/Bioinformatics
• McGill University
• Machine Learning/ Multi-objective Optimization for Path
Planning/ Cryptography
Industry
• IBM (6 months)
• Amazon
• TeraGrid
• Pfizer
• Research at UC Berkeley, Purdue University, and every
university I ever attended. 
Fun Facts (?)
I love research for its own sake. I like robots,
helping to cure diseases, advocating for
social change and reform, and breaking
encryption. I also love all activities involving the
ocean, and I usually hate taking pictures.
Outline
Introduction
• Brief overview of the state and direction of robotics
What is SLAM?
• Definition of SLAM
• Key Challenges
Why SLAM on IoT/Spark?
• Benefits
• Current Approaches
The Framework
• The Approach
• Framework and Architecture
The Results
• Challenges / Recommendations
Next Steps
Demo with Gazebo
Questions and Answers
Introduction: Robotics Today
FIRST Robotics World Championship
NASA Glenn Research Center in Cleveland sponsored
Tri-C's team.
Tartan Racing’s Boss, the robotic SUV that won the 2007
DARPA Urban Challenge
South Korean team KAIST wins the DARPA Robotics
Challenge
Amazon Drones
Introduction: Robotics Tomorrow
Navigate stores, museums and other indoor locations, with
directions overlaid onto your surroundings. Google Tango
Nanorobots wade through blood to deliver drugs
Space/underground/underwater rescue and
exploration. Places humans can’t go.
SLaM and ML on automated wheelchair
What is SLAM?
Simultaneous Localization and Mapping (SLAM)
• Formal Definition
• Given a series of sensor observations over discrete time steps, the SLAM problem is to compute an estimate
of the agent's location and a map of the environment. All quantities are usually probabilistic, so the objective is to
compute a probability distribution over the location and the map, given the observations.
•The computational problem of constructing or updating a map of an unknown environment while
simultaneously keeping track of an agent's location within it.
•SLAM algorithms use heuristics, machine learning, and probabilistic models to make this problem
tractable.
•GPS cannot account for unknown barriers, precision navigation, moving objects, or areas with satellite
interference, including weather phenomena.
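The formula shown on the original slide did not survive extraction. As a sketch only, one common way to write the probabilistic objective described in the formal definition uses the following assumed notation (none of these symbols appear in the slide text): $o_{1:t}$ for the sensor observations up to time step $t$, $x_t$ for the agent's location, and $m_t$ for the map.

```latex
P(m_t, x_t \mid o_{1:t})
```

Variants that also condition on control inputs are common; this form uses observations only, matching the wording of the definition.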
What are some of the key challenges in SLAM?
•Computer vision: correctly identifying the images observed.
•Moving objects: non-static environments, such as those containing other vehicles or
pedestrians, continue to present research challenges (collision detection).
•Data association: the problem of ascertaining which parts of one image
correspond to which parts of another image, where differences are due to movement
of the camera, the elapse of time, and/or movement of objects in the photos.
•Loop closure: the problem of recognizing a previously visited location and updating
the states accordingly.
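Data association can be illustrated in a much simpler setting than full images. The following is a minimal sketch, assuming point landmarks in 2D and a nearest-neighbor rule with a distance gate; the function name `associate`, the `gate` parameter, and the sample coordinates are hypothetical and are not taken from the talk.

```python
import math

def associate(observations, landmarks, gate=0.5):
    """Match each observed 2D point to the nearest known landmark.

    Returns one entry per observation: the index of the matched landmark,
    or None when no landmark lies within `gate` (a candidate new landmark).
    """
    matches = []
    for ox, oy in observations:
        best_index, best_dist = None, gate
        for i, (lx, ly) in enumerate(landmarks):
            dist = math.hypot(ox - lx, oy - ly)
            if dist <= best_dist:
                best_index, best_dist = i, dist
        matches.append(best_index)
    return matches

# Hypothetical map and a new set of observations.
landmarks = [(0.0, 0.0), (2.0, 1.0), (5.0, 5.0)]
observations = [(0.1, -0.1), (2.2, 0.9), (9.0, 9.0)]
print(associate(observations, landmarks))  # [0, 1, None]
```

The last observation is far from every known landmark, so it is left unmatched. Real systems typically weight the distance by the estimated uncertainty rather than using a fixed gate.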
Why SLAM on IoT?
SLAM in IoT
• "[SLAM] is one of the fundamental challenges of robotics . . . [but it] seems that almost all the
current approaches can not perform consistent maps for large areas, mainly due to the increase of
the computational cost and due to the uncertainties that become prohibitive when the scenario
becomes larger."[12] Generally, complete 3D SLAM solutions are highly computationally intensive as
they use complex real-time particle filters, sub-mapping strategies or hierarchical combination of
metric topological representations, etc. (Wiki)
• Computational costs become prohibitive on embedded systems, especially smaller robotic
modules. The data becomes large, and the calculations and corrections over time and space
become much more important. Specifically, the cost of SLAM grows faster than linearly with the number of
landmarks found (quadratically per update for an EKF-based approach).
• The state uncertainty increases with time and space, and must be bounded by some form of
machine learning to predict and use accurate corrections in the algorithm.
• Additional sensors, rapid movements, and processing of visual input add further computational
burdens.
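To make the growth with landmark count concrete, here is a small sketch of how the state vector and covariance matrix scale in an EKF-style formulation. It assumes a planar robot with a three-component pose (x, y, heading) and two coordinates per landmark; the function name and these dimensions are illustrative assumptions, not figures from the talk.

```python
def ekf_slam_sizes(num_landmarks, pose_dim=3, landmark_dim=2):
    """Return (state length, covariance entries) for a planar EKF-SLAM state."""
    state_len = pose_dim + landmark_dim * num_landmarks
    # The covariance is a dense square matrix over the whole state.
    return state_len, state_len * state_len

for n in (10, 100, 1000):
    state_len, cov_entries = ekf_slam_sizes(n)
    print(n, state_len, cov_entries)
```

Under these assumptions, going from 100 to 1000 landmarks multiplies the covariance storage by roughly 100, which is the kind of growth that becomes a problem on small embedded boards.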
The Benefits
•Seamless integration and scaling, allowing users to easily improve
the heuristics of the algorithm without losing any of the
performance expectations of an embedded system.
•Applications including smart cities, lawn mowing, dog walking, kitchen
appliances, or even communication inside the human body,
creating a truly unique interaction between humans and robotics
•Large-scale evaluation of performance metrics for all IoT systems
(Big Data)
•Monitoring and control of sensors based on stored data (e.g.,
reducing sensor usage to conserve power)
Current Approaches
• Robot Operating System (ROS): a collection of software frameworks for robot software development
• Provides operating system-like functionality on a heterogeneous computer cluster.
• Hardware abstraction, low-level device control, implementation of commonly used functionality,
message-passing between processes, and package management.
• No true real-time analytics! Despite the importance of reactivity and low latency in robot control,
ROS is not a real-time OS.
• Difficult to scale in IoT! Adding a heterogeneous swarm or integrating interactions requires significant
planning.
• There is a need! "Are there any plans to build Kalman filtering and system identification into this
framework?" https://github.com/sryza/spark-timeseries/issues/19
• We need a framework that can do this! Enter Apache Kafka and Spark Streaming!
The Framework
The Approach
•Extended Kalman Filter (matrix-based update/estimation)
•Nonlinear version of the Kalman filter, which linearizes about
an estimate of the current mean and covariance. De facto
standard in the theory of nonlinear state estimation, e.g.,
navigation systems and GPS. (Wiki)
•TurtleBot (standard robotics research bot)
•Gazebo Simulator (3D simulator with sensor input and
feedback)
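The linearization step is easier to see in one dimension, where the matrices collapse to scalars. The following is a minimal sketch, not the implementation from the talk: it assumes a robot moving along a line, a range measurement to a single landmark off that line, and hypothetical noise values `q` and `r`.

```python
import math

def ekf_step(x, p, u, z, landmark, q=0.01, r=0.04):
    """One predict/update cycle of a scalar extended Kalman filter.

    x, p     : position estimate and its variance
    u        : commanded displacement along the line
    z        : measured range to the landmark
    landmark : (lx, ly) position of the landmark, with ly != 0
    q, r     : assumed process and measurement noise variances
    """
    lx, ly = landmark
    # Predict: apply the motion model and inflate the variance.
    x_pred = x + u
    p_pred = p + q
    # Linearize the nonlinear range measurement around the prediction.
    expected = math.hypot(lx - x_pred, ly)
    h = (x_pred - lx) / expected  # derivative of the range with respect to x
    # Update: blend prediction and measurement using the Kalman gain.
    s = h * p_pred * h + r
    k = p_pred * h / s
    x_new = x_pred + k * (z - expected)
    p_new = (1.0 - k * h) * p_pred
    return x_new, p_new

landmark = (10.0, 3.0)               # hypothetical landmark position
true_x, x_est, p_est = 0.0, 0.5, 1.0  # the estimate starts off by 0.5
for _ in range(5):
    true_x += 1.0
    z = math.hypot(landmark[0] - true_x, landmark[1])  # noise-free range
    x_est, p_est = ekf_step(x_est, p_est, 1.0, z, landmark)
print(round(x_est, 2), round(p_est, 3))
```

In the full SLAM setting the scalars `h`, `s`, and `k` become a Jacobian matrix, an innovation covariance, and a gain matrix, which is where the matrix operations listed later in the deck come from.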
Our cluster: IBM SoftLayer
cluster with 3 Nodes.
Node 1: Management Node
• Apache Kafka (multithreaded producers are each assigned a sensor)
• Simulator/Sensor Data
• Mapping Agent
Node 2: Hadoop/Spark
• Spark Streaming Consumer / Apache Kafka Producer to Simulator
• Spark Streaming
• Spark ML/Analytics
Node 3: Hadoop/Spark
• Spark Streaming Consumer / Apache Kafka Producer to Simulator
• Spark Streaming
• Spark ML/Analytics
[Diagram: Apache Kafka → Spark Streaming → Spark ML/Analytics and Computation → Apache Kafka]
Simulated TurtleBot
• Odometry, pose and orientation data
for every movement.
• Laser scan data every 30ms with
over 1200 data points per read!
• One robot and not even all the
sensors!
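The laser figures above translate into a sustained data rate that is worth working out. This is back-of-the-envelope arithmetic only, treating "every 30ms" and "over 1200 data points" as exact values; the constant names are illustrative.

```python
SCAN_PERIOD_S = 0.030    # one laser scan every 30 ms
POINTS_PER_SCAN = 1200   # lower bound quoted on the slide

scans_per_second = 1.0 / SCAN_PERIOD_S
points_per_second = POINTS_PER_SCAN * scans_per_second
print(round(scans_per_second, 1), round(points_per_second))
```

That is roughly 33 scans and at least 40,000 range readings per second from a single sensor on a single robot, before odometry and pose messages are counted.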
A high-performing plug-and-play cloud for
smart robotics, drones, and intelligent
systems that allows easily tunable
interactions for scientists and industry in
any environment!
•EKF is calculated primarily using matrix operations!
•Distributed raw sensor data using Apache Kafka. Number of sensors
limited only by Kafka cluster!
•Improved performance using RDDs and Spark ML for computationally
intensive tasks!
•Fast/optimized learning and analytics!
•Real-time sensor messaging!
•Easy sensor integration and scaling!
•Retention of data over time for improved optimizations and accuracy!
The Framework: Apache Kafka
Kafka Integration
•Multithreaded producers for easy scaling and hardware timing
•Apache Kafka Java API backed by a thread pool to handle concurrency
•Allows shared instances of the producer
•Large-scale sensor distributions can be partitioned for easier analysis and significantly
decreased latency
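The pattern in these bullets (one worker thread per sensor, a single shared producer, records partitioned by sensor) can be sketched without a running broker. The example below is an illustration only: `InMemoryProducer` is a hypothetical stand-in for a real Kafka producer, and the CRC32-based partition choice is an assumption made for the sketch (Kafka's own default partitioner hashes keys differently).

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

NUM_PARTITIONS = 4  # assumed partition count for the illustration

class InMemoryProducer:
    """Stand-in for a shared Kafka producer: records go to in-memory partitions."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)
        self._lock = Lock()

    def send(self, key, value):
        # Same key -> same partition, so one sensor's readings stay ordered.
        partition = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        with self._lock:
            self.partitions[partition].append((key, value))
        return partition

def publish_sensor(producer, sensor_id, readings):
    """Task run by one worker thread: publish all readings of one sensor."""
    return {producer.send(sensor_id, reading) for reading in readings}

producer = InMemoryProducer(NUM_PARTITIONS)
sensors = {"odometry": [1, 2, 3], "laser": [10, 20, 30], "imu": [7, 8, 9]}

# One task per sensor, all sharing the same producer instance.
with ThreadPoolExecutor(max_workers=len(sensors)) as pool:
    futures = {name: pool.submit(publish_sensor, producer, name, data)
               for name, data in sensors.items()}
    used = {name: future.result() for name, future in futures.items()}

for name, partitions in used.items():
    print(name, "-> partition", sorted(partitions))
```

Keying by sensor keeps each sensor's stream ordered within its partition while different sensors can be consumed in parallel, which is one way to read the "partitioned for easier analysis" point.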
The Framework: Spark Streaming
Spark Streaming Integration
Apache Spark Streaming
•Replaces the Kafka consumer
•Adheres to fault tolerance policies, incl. WAL (write-ahead logs to HDFS)
•KafkaUtils.createDirectStream: direct approach without receivers in the new version, better
access to low-level Kafka metadata
•Microbatch processing and better integration into Spark, incl. online learning
Apache Kafka Consumer
•Producer feeds directly to Spark Streaming
•Not necessarily thread safe (Java API)
•Auto-commit feature, partition replication, integration with ZooKeeper. Finely tuned metadata
access and storage by topic and partition
•Buffered batches, developing streaming analytics capabilities
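Micro-batching, as mentioned in the comparison above, means that incoming records are grouped into fixed time windows and each window is processed as one small batch. The sketch below imitates only that grouping in plain Python; it does not use Spark, and the function name, the millisecond timestamps, and the 60 ms interval are assumptions chosen for illustration.

```python
from collections import defaultdict

def micro_batches(events, batch_interval_ms):
    """Group (timestamp_ms, payload) events into fixed-length batches.

    Returns a list of (batch_start_ms, [payloads]) sorted by start time.
    """
    buckets = defaultdict(list)
    for timestamp_ms, payload in events:
        buckets[timestamp_ms // batch_interval_ms].append(payload)
    return [(index * batch_interval_ms, buckets[index]) for index in sorted(buckets)]

# Hypothetical laser scans arriving every 30 ms, batched every 60 ms.
events = [(0, "scan-0"), (30, "scan-1"), (60, "scan-2"),
          (90, "scan-3"), (120, "scan-4"), (150, "scan-5")]
for start, payloads in micro_batches(events, batch_interval_ms=60):
    print(start, payloads)
```

Choosing the batch interval as a multiple of the sensor period, as here, gives every batch the same number of scans.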
The Framework: Spark ML, RANSAC
Spark ML with RANSAC
•RANSAC (random sample consensus)
• One of many iterative methods for estimating the parameters of a mathematical model from a set of
observed data that contains outliers.
• Default methodology for determining whether a series of landmarks forms a wall or structure
•Ideal for consumption with high-throughput batches in Spark Streaming!
•Integrated as an online learning algorithm (this framework), running as a back-end iterative process in
Spark Streaming/Spark!
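The wall-detection use of RANSAC can be sketched as line fitting on 2D points. This is a plain-Python illustration under stated assumptions, not the Spark integration described above: the function name, the tolerance, the iteration count, the fixed seed, and the sample points are all hypothetical.

```python
import math
import random

def fit_line_ransac(points, iterations=200, tolerance=0.1, seed=0):
    """Fit a 2D line a*x + b*y + c = 0 to points that contain outliers.

    Repeatedly samples two points, builds the line through them, and keeps
    the candidate with the most points within `tolerance` of the line.
    Returns ((a, b, c), inliers) with (a, b) normalized to unit length.
    """
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        a, b = y1 - y2, x2 - x1          # normal vector of the candidate line
        norm = math.hypot(a, b)
        if norm == 0.0:
            continue                     # identical points define no line
        a, b = a / norm, b / norm
        c = -(a * x1 + b * y1)
        inliers = [(x, y) for x, y in points
                   if abs(a * x + b * y + c) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers

# Hypothetical scan: 20 points along a wall (y = 2x + 1) plus 5 stray readings.
wall = [(0.5 * i, 2.0 * (0.5 * i) + 1.0) for i in range(20)]
clutter = [(1.0, 10.0), (3.0, -4.0), (5.0, 0.0), (7.0, 30.0), (9.0, 2.0)]
line, inliers = fit_line_ransac(wall + clutter)
print(len(inliers), "of", len(wall) + len(clutter), "points lie on the wall")
```

Each iteration is independent of the others, which is what makes the method a natural fit for batch-parallel execution.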
The Results
Key Challenges
•Network Latency
•Embedded vs Framework
•Matrix computations and updates to large matrices
•Jacobian (derivatives), inversion, transposition, multiplication,
addition/subtraction, Gaussian
•Covariance/Estimation computations
•Coordinating movement with computation
•Spark ML to correctly interpret visual landmark data, minimizing errors
Challenges
•~4KLOC (Java != verbose )
•Java lambda documentation
•Kafka topics from Spark Streaming consumer
•Real-life latency depends on the type of connection and creates
additional noise
•Matrix computation
•Defining heuristics
•The communicator to the simulator needs a solid class
Measuring Network Latency in artificially throttled IO simulators. Timing was kept static to
measure real delays in the messages over the cluster and between the simulator against file IO.
[Figure: PERF1 (with simulator) vs. PERF2 (file IO); panels at 10 and 200 iterations]
Measuring landmark acquisition and CPU time, embedded vs. framework, at 500 iterations.
Framework completed 500 iterations
with expected exponential growth
Embedded failed to complete
500 iterations (reached ~300)
Measuring landmark acquisition and cpu time Embedded vs Framework for complete map.
Both installations were run until the number of landmarks/maps were roughly equivalent and
iterations marked.
Iterations: ~100, Time ~2 min
Iterations: ~100, Time ~30-40 s
Forthcoming Benchmarks.
Iterations: ~100 Iterations: ~20
• Apache Kafka latency to brokers
• RANSAC convergence of Spark Streaming batches
• Spark Streaming batch processing throughput in relation to
processing time
Performance Tuning and Optimization
•Sparse and distributed matrices in Spark ML
•Optimize matrix computations (EKF)
•Separate threads for Apache Kafka producers
•Spark Streaming batches timed to sensor input cycles to avoid heavy loads
and misaligned updates (this could also be tuned using device profiles).
•Slower movement/reduced data points to synchronize calculations with
movement and discovery
•For rapid movement or larger RDDs, create new RDDs and matrices for
updates using existing heuristics; updates can sometimes create bottlenecks
•Standard Spark performance tuning: CPU core maximization and executors
•*Scheduled feature extraction to minimize accumulated error in long runs
•*New parameters/ large skew from ground truth should trigger updates
Next Steps
• Expanded stochastic analysis beyond gradient descent
• Kalman Filter and Extended Kalman Filter
• Improving accuracy and precision with an end to end pipeline that allows
customization/optimization
• Path Planning algorithms to improve search and search times
• Incorporate swarms/particles
• A complete robotics library, or even an extension to handle robotics, computer vision, or any
of the AI/machine learning problems specific to robotics, is publishable and opens the
door to a whole new group of scientists.
• Further scaling and optimization with robotic swarms and rapid/increased volume sensor
data
Conclusion
IBM IoT Cloud Open Platform for Industries
IBM Bluemix IoT Zone
IBM IoT Ecosystem
More to come….!!!
Demo (Simulation)
Q & A
Contact Information:
J. White Bear (jwhiteb@us.ibm.com)
IBM Spark Technology Center
425 Market St San Francisco, CA
Special thanks to IBM and the IBM Spark team at the Spark Technology Center for your input,
for taking time to discuss, and for allowing me time to work on this project.
Sampada Basakar
Vijay Bommireddipalli
Fred Reiss
Luciano Resende

Mais conteĂşdo relacionado

Mais procurados

Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkDataWorks Summit/Hadoop Summit
 
Spark Summit EU talk by Zoltan Zvara
Spark Summit EU talk by Zoltan ZvaraSpark Summit EU talk by Zoltan Zvara
Spark Summit EU talk by Zoltan ZvaraSpark Summit
 
Interactive Visualization of Streaming Data Powered by Spark
Interactive Visualization of Streaming Data Powered by SparkInteractive Visualization of Streaming Data Powered by Spark
Interactive Visualization of Streaming Data Powered by SparkSpark Summit
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamDataWorks Summit/Hadoop Summit
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksData Con LA
 
Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid DataWorks Summit
 
Meeting Performance Goals in multi-tenant Hadoop Clusters
Meeting Performance Goals in multi-tenant Hadoop ClustersMeeting Performance Goals in multi-tenant Hadoop Clusters
Meeting Performance Goals in multi-tenant Hadoop ClustersDataWorks Summit/Hadoop Summit
 
Hadoop summit 2010, HONU
Hadoop summit 2010, HONUHadoop summit 2010, HONU
Hadoop summit 2010, HONUJerome Boulon
 
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleData Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleSriram Krishnan
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)Spark Summit
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceDataWorks Summit/Hadoop Summit
 
Integrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query EnginesIntegrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query EnginesDataWorks Summit
 
Opal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific ApplicationsOpal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific ApplicationsSriram Krishnan
 
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Spark Summit
 
Lambda Architecture with Spark
Lambda Architecture with SparkLambda Architecture with Spark
Lambda Architecture with SparkKnoldus Inc.
 
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaBridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaDataWorks Summit/Hadoop Summit
 

Mais procurados (20)

Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache Spark
 
Spark Summit EU talk by Zoltan Zvara
Spark Summit EU talk by Zoltan ZvaraSpark Summit EU talk by Zoltan Zvara
Spark Summit EU talk by Zoltan Zvara
 
Interactive Visualization of Streaming Data Powered by Spark
Interactive Visualization of Streaming Data Powered by SparkInteractive Visualization of Streaming Data Powered by Spark
Interactive Visualization of Streaming Data Powered by Spark
 
Active Learning for Fraud Prevention
Active Learning for Fraud PreventionActive Learning for Fraud Prevention
Active Learning for Fraud Prevention
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache Beam
 
ebay
ebayebay
ebay
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
 
Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid
 
What's new in SQL on Hadoop and Beyond
What's new in SQL on Hadoop and BeyondWhat's new in SQL on Hadoop and Beyond
What's new in SQL on Hadoop and Beyond
 
Meeting Performance Goals in multi-tenant Hadoop Clusters
Meeting Performance Goals in multi-tenant Hadoop ClustersMeeting Performance Goals in multi-tenant Hadoop Clusters
Meeting Performance Goals in multi-tenant Hadoop Clusters
 
Hadoop summit 2010, HONU
Hadoop summit 2010, HONUHadoop summit 2010, HONU
Hadoop summit 2010, HONU
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleData Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open Source
 
Integrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query EnginesIntegrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query Engines
 
Opal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific ApplicationsOpal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific Applications
 
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
 
Lambda Architecture with Spark
Lambda Architecture with SparkLambda Architecture with Spark
Lambda Architecture with Spark
 
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaBridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
 

Destaque

Building a Smarter Home with Apache NiFi and Spark
Building a Smarter Home with Apache NiFi and SparkBuilding a Smarter Home with Apache NiFi and Spark
Building a Smarter Home with Apache NiFi and SparkDataWorks Summit/Hadoop Summit
 
7 Predictive Analytics, Spark , Streaming use cases
7 Predictive Analytics, Spark , Streaming use cases7 Predictive Analytics, Spark , Streaming use cases
7 Predictive Analytics, Spark , Streaming use casesDataWorks Summit/Hadoop Summit
 
Budapest Big Data Meetup Nov 26 2015
Budapest Big Data Meetup Nov 26 2015Budapest Big Data Meetup Nov 26 2015
Budapest Big Data Meetup Nov 26 2015Chris Fregly
 
Topfoison product catalog
Topfoison product catalogTopfoison product catalog
Topfoison product catalogLynapple1022
 
Devops Spark Streaming
Devops Spark StreamingDevops Spark Streaming
Devops Spark StreamingMarilyn Waldman
 
Scala training workshop 02
Scala training workshop 02Scala training workshop 02
Scala training workshop 02Nguyen Tuan
 
IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)
IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)
IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)Spark Summit
 
What’s New in the Berkeley Data Analytics Stack
What’s New in the Berkeley Data Analytics StackWhat’s New in the Berkeley Data Analytics Stack
What’s New in the Berkeley Data Analytics StackTuri, Inc.
 
Remote temperature monitor (DHT11)
Remote temperature monitor (DHT11)Remote temperature monitor (DHT11)
Remote temperature monitor (DHT11)Parshwadeep Lahane
 
What's New in Spark 2?
What's New in Spark 2?What's New in Spark 2?
What's New in Spark 2?Eyal Ben Ivri
 
Advanced Analytics and Recommendations with Apache Spark - Spark Maryland/DC ...
Advanced Analytics and Recommendations with Apache Spark - Spark Maryland/DC ...Advanced Analytics and Recommendations with Apache Spark - Spark Maryland/DC ...
Advanced Analytics and Recommendations with Apache Spark - Spark Maryland/DC ...Chris Fregly
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark Hortonworks
 
Electronic governance steps in the right direction?
Electronic governance   steps in the right direction?Electronic governance   steps in the right direction?
Electronic governance steps in the right direction?Bozhidar Bozhanov
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache SparkJen Aman
 
Hortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks
 
IoT Analytics from Edge to Cloud - using IBM Informix
IoT Analytics from Edge to Cloud - using IBM InformixIoT Analytics from Edge to Cloud - using IBM Informix
IoT Analytics from Edge to Cloud - using IBM InformixPradeep Muthalpuredathe
 
How Spark Enables the Internet of Things- Paula Ta-Shma
How Spark Enables the Internet of Things- Paula Ta-ShmaHow Spark Enables the Internet of Things- Paula Ta-Shma
How Spark Enables the Internet of Things- Paula Ta-ShmaSpark Summit
 

Destaque (20)

Rob Bearden Keynote Hadoop Summit San Jose
Rob Bearden Keynote Hadoop Summit San JoseRob Bearden Keynote Hadoop Summit San Jose
Rob Bearden Keynote Hadoop Summit San Jose
 
Building a Smarter Home with Apache NiFi and Spark
Building a Smarter Home with Apache NiFi and SparkBuilding a Smarter Home with Apache NiFi and Spark
Building a Smarter Home with Apache NiFi and Spark
 
7 Predictive Analytics, Spark , Streaming use cases
7 Predictive Analytics, Spark , Streaming use cases7 Predictive Analytics, Spark , Streaming use cases
7 Predictive Analytics, Spark , Streaming use cases
 
Budapest Big Data Meetup Nov 26 2015
Budapest Big Data Meetup Nov 26 2015Budapest Big Data Meetup Nov 26 2015
Budapest Big Data Meetup Nov 26 2015
 
Topfoison product catalog
Topfoison product catalogTopfoison product catalog
Topfoison product catalog
 
Sparkstreaming
SparkstreamingSparkstreaming
Sparkstreaming
 
Devops Spark Streaming
Devops Spark StreamingDevops Spark Streaming
Devops Spark Streaming
 
Scala training workshop 02
Scala training workshop 02Scala training workshop 02
Scala training workshop 02
 
IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)
IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)
IndexedRDD: Efficeint Fine-Grained Updates for RDD's-(Ankur Dave, UC Berkeley)
 
What’s New in the Berkeley Data Analytics Stack
What’s New in the Berkeley Data Analytics StackWhat’s New in the Berkeley Data Analytics Stack
What’s New in the Berkeley Data Analytics Stack
 
Remote temperature monitor (DHT11)
Remote temperature monitor (DHT11)Remote temperature monitor (DHT11)
Remote temperature monitor (DHT11)
 
October 2014 HUG : Hive On Spark
October 2014 HUG : Hive On SparkOctober 2014 HUG : Hive On Spark
October 2014 HUG : Hive On Spark
 
What's New in Spark 2?
What's New in Spark 2?What's New in Spark 2?
What's New in Spark 2?
 
Advanced Analytics and Recommendations with Apache Spark - Spark Maryland/DC ...
Advanced Analytics and Recommendations with Apache Spark - Spark Maryland/DC ...Advanced Analytics and Recommendations with Apache Spark - Spark Maryland/DC ...
Advanced Analytics and Recommendations with Apache Spark - Spark Maryland/DC ...
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark
 
Electronic governance steps in the right direction?
Electronic governance   steps in the right direction?Electronic governance   steps in the right direction?
Electronic governance steps in the right direction?
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
 
Hortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical Applications
 
IoT Analytics from Edge to Cloud - using IBM Informix
IoT Analytics from Edge to Cloud - using IBM InformixIoT Analytics from Edge to Cloud - using IBM Informix
IoT Analytics from Edge to Cloud - using IBM Informix
 
How Spark Enables the Internet of Things- Paula Ta-Shma
How Spark Enables the Internet of Things- Paula Ta-ShmaHow Spark Enables the Internet of Things- Paula Ta-Shma
How Spark Enables the Internet of Things- Paula Ta-Shma
 

Semelhante a Spark Technology Center IBM

IoT and the Autonomous Vehicle in the Clouds: Simultaneous Localization and M...
IoT and the Autonomous Vehicle in the Clouds: Simultaneous Localization and M...IoT and the Autonomous Vehicle in the Clouds: Simultaneous Localization and M...
IoT and the Autonomous Vehicle in the Clouds: Simultaneous Localization and M...Spark Summit
 
IBM Strategy for Spark
IBM Strategy for SparkIBM Strategy for Spark
IBM Strategy for SparkMark Kerzner
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...confluent
 
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud
Kai Wähner
 
Spark Technology Center IBM

  • 1. Spark Technology Center IBM Apache Spark The start of something big in data and design. J. White Bear Spark Technology Center
  • 2. IBM Spark: IBM Investment in Computing
      • Linux, 1999: 13,000,000 lines of code. 500+ Server Solutions. Ushered in Computer Science.
      • System 360, 1964: 10,000,000 lines of code. 54 Peripheral Solutions. Ushered in Information Science.
      • Apache Spark, 2015: 400,000 lines of code. 20 Data & Analytics Solutions. Ushered in Data Science.
  • 3. About Me
      Education
      • University of Michigan: Computer Science
          • Databases, Machine Learning/Computational Biology, Cryptography
      • University of California San Francisco
          • Multi-objective Optimization/Computational Biology/Bioinformatics
      • McGill University
          • Machine Learning/Multi-objective Optimization for Path Planning/Cryptography
      Industry
      • IBM (6 months)
      • Amazon
      • TeraGrid
      • Pfizer
      • Research at UC Berkeley, Purdue University, and every university I ever attended.
      Fun Facts (?)
      I love research for its own sake. I like robots, helping to cure diseases, advocating for social change and reform, and breaking encryption. I also like all activities involving the ocean, and I usually hate taking pictures.
  • 4. Outline
      • Introduction
          • Brief overview of the state and direction of robotics
      • What is SLAM?
          • Definition of SLAM
          • Key Challenges
      • Why SLAM on IoT/Spark?
          • Benefits
          • Current Approaches
      • The Framework
          • The Approach
          • Framework and Architecture
      • The Results
          • Challenges / Recommendations
      • Next Steps
      • Demo with Gazebo
      • Questions and Answers
  • 5. Introduction: Robotics Today
      • FIRST Robotics World Championship: NASA Glenn Research Center in Cleveland sponsored Tri-C's team.
      • Tartan Racing’s Boss, the robotic SUV that won the 2007 DARPA Urban Challenge
      • South Korean team KAIST wins the DARPA Robotics Challenge
      • Amazon Drones
  • 6. Introduction: Robotics Tomorrow
      • Google Tango: navigate stores, museums and other indoor locations, with directions overlaid onto your surroundings.
      • Nanorobots wade through blood to deliver drugs
      • Space/underground/underwater rescue and exploration. Places humans can’t go.
      • SLAM and ML on an automated wheelchair
  • 7. What is SLAM?
      Simultaneous Localization and Mapping (SLAM)
      • Formal definition
          • Given a series of sensor observations over discrete time steps, the SLAM problem is to compute an estimate of the agent's location and a map of the environment. All quantities are usually probabilistic, so the objective is to compute a posterior distribution over the location and the map given the observations.
      • The computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent's location within it.
      • SLAM algorithms use various heuristics to make this problem tractable, drawing on machine learning and probabilistic models.
      • GPS cannot account for unknown barriers, precision navigation, moving objects, or any areas with satellite interference, including weather phenomena.
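The formula that accompanied the formal definition on slide 7 did not survive extraction. One common way to write the objective, given here as an assumption rather than as the slide's exact expression, is the joint posterior over the map and the agent's location given every observation so far. The symbols are illustrative: $o_{1:t}$ are the sensor observations up to time step $t$, $x_t$ is the agent's location, and $m_t$ is the map.

```latex
P\left(m_t,\, x_t \mid o_{1:t}\right)
```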
  • 8. What is SLAM?
      What are some of the key challenges in SLAM?
      • Computer vision: correctly identifying the images observed.
      • Moving objects: non-static environments, such as those containing other vehicles or pedestrians, continue to present research challenges (collision detection).
      • Data association: the problem of ascertaining which parts of one image correspond to which parts of another image, where differences are due to movement of the camera, the elapse of time, and/or movement of objects in the photos.
      • Loop closure: the problem of recognizing a previously visited location and updating the states accordingly.
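To make the data association challenge concrete, here is a minimal sketch that uses 2D point landmarks instead of image regions, which is a simplification of what the slide describes. It matches each observation to the nearest known landmark and rejects matches beyond a distance gate. The function name `associate`, the gate value, and the coordinates are all hypothetical.

```python
import math


def associate(observations, landmarks, gate=0.5):
    """Match each observed (x, y) point to the nearest known landmark.

    Returns one landmark index per observation, or None when no landmark
    lies within `gate` (a candidate new landmark). Nearest-neighbour
    gating is only one of several possible association strategies.
    """
    matches = []
    for ox, oy in observations:
        best_index, best_dist = None, gate
        for i, (lx, ly) in enumerate(landmarks):
            d = math.hypot(ox - lx, oy - ly)
            if d <= best_dist:
                best_index, best_dist = i, d
        matches.append(best_index)
    return matches


# Hypothetical map and observations, in arbitrary units.
landmarks = [(0.0, 0.0), (2.0, 0.0), (2.0, 3.0)]
observations = [(0.1, -0.1), (1.9, 3.2), (5.0, 5.0)]
print(associate(observations, landmarks))  # prints [0, 2, None]
```

Re-observing landmark 0 after a long trajectory is the simplest form of the loop closure case: the match tells the estimator that it has returned to a known place.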
  • 10. Why SLAM on IoT?
      SLAM in IoT
      • "[SLAM] is one of the fundamental challenges of robotics . . . [but it] seems that almost all the current approaches can not perform consistent maps for large areas, mainly due to the increase of the computational cost and due to the uncertainties that become prohibitive when the scenario becomes larger."[12] Generally, complete 3D SLAM solutions are highly computationally intensive as they use complex real-time particle filters, sub-mapping strategies or hierarchical combination of metric topological representations, etc. (Wiki)
      • Computational costs become prohibitive on embedded systems, especially smaller robotic modules. The data becomes large, and the calculations and corrections over time and space become much more important. Specifically, the cost of SLAM grows steeply with the number of landmarks found (at least quadratically for an EKF, whose covariance matrix has an entry for every pair of state variables).
      • The state uncertainty increases with time and space, and must be bounded by some form of machine learning to predict and use accurate corrections in the algorithm.
      • Additional sensors, rapid movements, and processing of visual input add further computational burden…
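As a rough illustration of why the cost climbs with the number of landmarks, the sketch below counts the entries of the covariance matrix in an EKF-style formulation. It assumes a planar robot pose of 3 values (x, y, heading) and 2 values per landmark. These dimensions are assumptions for illustration, not figures from the talk.

```python
def ekf_slam_sizes(num_landmarks, pose_dim=3, landmark_dim=2):
    """Return (state dimension, covariance entries) for an EKF-style SLAM.

    Assumed layout: one robot pose followed by every landmark position,
    so the covariance matrix is state_dim x state_dim.
    """
    state_dim = pose_dim + landmark_dim * num_landmarks
    return state_dim, state_dim * state_dim


for n in (10, 100, 1000):
    state_dim, cov_entries = ekf_slam_sizes(n)
    print(n, "landmarks ->", state_dim, "state values,", cov_entries, "covariance entries")
```

Under these assumptions, going from 100 to 1000 landmarks multiplies the covariance storage by roughly 100, which is the kind of growth that overwhelms a small embedded board.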
  • 11. Why SLAM on IoT?
      The Benefits
      • Seamless integration and scaling, allowing users to easily improve the heuristics of the algorithm without losing any of the performance expectations of an embedded system.
      • Applications include smart cities, lawn mowing, dog walking, kitchen appliances, or even communication inside the human body, creating a truly unique interaction between humans and robotics.
      • Large-scale evaluation of performance metrics for all IoT systems (Big Data)
      • Monitoring and control of sensors based on stored data (e.g. reducing sensor usage to conserve power)
  • 12. Why SLAM on IoT?
      Current Approaches
      • Robot Operating System (ROS): a collection of software frameworks for robot software development
          • Provides operating-system-like functionality on a heterogeneous computer cluster.
          • Hardware abstraction, low-level device control, implementation of commonly used functionality, message-passing between processes, and package management.
      • No true real-time analytics! Despite the importance of reactivity and low latency in robot control, ROS is not a real-time OS.
      • Difficult to scale in IoT! Adding a heterogeneous swarm or integrating interactions requires significant planning.
      • There is a need! Are there any plans to build Kalman filtering and system identification into this framework? https://github.com/sryza/spark-timeseries/issues/19
      • We need a framework that can do this! Enter Apache Kafka and Spark Streaming!
  • 13. The Framework
      The Approach
      • Extended Kalman Filter (matrix-based update/estimation)
          • Nonlinear version of the Kalman filter, which linearizes about an estimate of the current mean and covariance. It is the de facto standard in the theory of nonlinear state estimation, e.g. navigation systems and GPS. (Wiki)
      • TurtleBot (standard robotics research bot)
      • Gazebo Simulator (3D simulator with sensor input and feedback)
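The linearization step of the Extended Kalman Filter can be sketched on a deliberately small problem. The example below is not the talk's implementation: it assumes a robot moving along a line, a range measurement to a single beacon at a known position, and scalar noise variances. All of these, together with the names `h`, `h_jacobian` and `ekf_step`, are hypothetical choices made to keep the sketch free of matrix libraries.

```python
import math

BEACON_X, BEACON_Y = 10.0, 2.0  # hypothetical beacon position


def h(x):
    """Nonlinear measurement model: range from position x to the beacon."""
    return math.hypot(x - BEACON_X, BEACON_Y)


def h_jacobian(x):
    """Derivative of h with respect to x (the 1x1 Jacobian)."""
    return (x - BEACON_X) / h(x)


def ekf_step(x, p, u, z, q=0.01, r=0.04):
    """One EKF predict/update cycle for a scalar state.

    x, p: current estimate and its variance
    u:    commanded displacement (motion model x' = x + u)
    z:    measured range to the beacon
    q, r: assumed process and measurement noise variances
    """
    # Predict: the motion model is linear here, so its Jacobian is 1.
    x_pred = x + u
    p_pred = p + q
    # Update: linearize the measurement model about the prediction.
    H = h_jacobian(x_pred)
    innovation = z - h(x_pred)
    s = H * p_pred * H + r          # innovation variance
    k = p_pred * H / s              # Kalman gain
    x_new = x_pred + k * innovation
    p_new = (1.0 - k * H) * p_pred
    return x_new, p_new


# The robot truly starts at 0 and moves 1 unit per step; the filter
# starts with a wrong guess. Measurements are noise-free for repeatability.
x_true, x_est, p_est = 0.0, 0.5, 1.0
for _ in range(5):
    x_true += 1.0
    x_est, p_est = ekf_step(x_est, p_est, u=1.0, z=h(x_true))
print("true:", x_true, "estimate:", round(x_est, 3), "variance:", round(p_est, 4))
```

In the full SLAM setting the scalars become matrices (state vector, covariance, Jacobians), which is where the distributed matrix operations discussed later come in.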
  • 16. The Framework
      Our cluster: IBM SoftLayer cluster with 3 nodes.
  • 17. The Framework
      IBM SoftLayer cluster with 3 nodes.
      • Node 1: Management Node
          • Apache Kafka (multithreaded producers are each assigned a sensor)
          • Simulator/Sensor Data
          • Mapping Agent
      • Node 2: Hadoop/Spark
          • Spark Streaming Consumer/Apache Kafka Producer to Simulator
          • Spark Streaming
          • Spark ML/Analytics
      • Node 3: Hadoop/Spark
          • Spark Streaming Consumer/Apache Kafka Producer to Simulator
          • Spark Streaming
          • Spark ML/Analytics
  • 18. The Framework
      Diagram: Apache Kafka, Spark Streaming, Spark ML/Analytics and Computation, Apache Kafka, Simulated TurtleBot
      • Odometry, pose and orientation data for every movement.
      • Laser scan data every 30 ms with over 1200 data points per read!
      • One robot and not even all the sensors!
      A high-performing plug-and-play cloud for smart robotics, drones and intelligent systems that allows easily tunable interactions for scientists and industry in any environment!
  • 19. The Framework
      A high-performing plug-and-play cloud for smart robotics, drones and intelligent systems that allows easily tunable interactions for scientists and industry in any environment!
      • EKF is calculated primarily using matrix operations!
      • Distributed raw sensor data using Apache Kafka. Number of sensors limited only by the Kafka cluster!
      • Improved performance using RDDs and Spark ML for computationally intensive tasks!
      • Fast/optimized learning and analytics!
      • Real-time sensor messaging!
      • Easy sensor integration and scaling!
      • Retention of data over time for improved optimizations and accuracy!
  • 20. The Framework: Apache Kafka (Kafka Integration)
      • Multithreaded producers for easy scaling and hardware timing
      • Apache Kafka Java API backed by a thread pool to handle concurrency
      • Allows shared instances of the producer
      • Large-scale sensor distributions can be partitioned for easier analysis and significantly decreased latency
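The producer pattern on this slide (a thread pool, one worker per sensor, one shared producer, keyed partitioning) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library: `InMemoryProducer`, `publish_sensor`, `run`, the sensor names and the partition count are all hypothetical, and the class is an in-memory stand-in, not the Kafka client API. CRC32 stands in for whatever key hash a real client would use.

```python
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 3  # hypothetical partition count


class InMemoryProducer:
    """In-memory stand-in for a shared, thread-safe producer (NOT the Kafka API)."""

    def __init__(self, num_partitions):
        self._lock = threading.Lock()
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        # Keyed partitioning: a stable hash keeps each sensor on one partition.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        with self._lock:
            self.partitions[index].append((key, value))
        return index


def publish_sensor(producer, sensor_id, readings):
    """One worker per sensor: publish that sensor's readings in order."""
    for reading in readings:
        producer.send(sensor_id, reading)
    return len(readings)


def run(sensors):
    """Publish all sensors concurrently through a single shared producer."""
    producer = InMemoryProducer(NUM_PARTITIONS)
    with ThreadPoolExecutor(max_workers=len(sensors)) as pool:
        futures = [
            pool.submit(publish_sensor, producer, sensor_id, readings)
            for sensor_id, readings in sensors.items()
        ]
        sent = sum(future.result() for future in futures)
    return producer, sent


# Hypothetical sensor feeds, loosely modelled on the simulated TurtleBot.
sensors = {
    "odometry": [f"odom-{i}" for i in range(50)],
    "laser_scan": [f"scan-{i}" for i in range(50)],
    "pose": [f"pose-{i}" for i in range(50)],
}
producer, sent = run(sensors)
```

Because each sensor is keyed to a single partition and written by a single worker, its readings stay in order within that partition, while different sensors can still be processed in parallel.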
  • 21. The Framework: Apache Kafka
  • 22. The Framework: Spark Streaming (Spark Streaming Integration)
      Apache Spark Streaming and the Apache Kafka consumer:
      • Replaces the Kafka consumer
      • Producer feeds directly to Spark Streaming
      • Adheres to fault tolerance policies, including WAL (write-ahead logs to HDFS)
      • Not necessarily thread safe (Java API)
      • KafkaUtils.createDirectStream: direct approach without receivers in the new version, with better access to low-level Kafka metadata
      • Auto-commit feature, partition replication, integration with ZooKeeper
      • Finely tuned metadata access and storage by topic and partition
      • Micro-batch processing and better integration into Spark, including online learning
      • Buffered batches, developing streaming analytics capabilities
  • 23. The Framework: Spark Streaming
  • 24. The Framework: Spark ML, RANSAC (Spark ML with RANSAC)
      • RANSAC
          • One of many iterative methods for estimating the parameters of a mathematical model from a set of observed data that contains outliers.
          • Default methodology for determining whether a series of landmarks forms a wall or structure
      • Ideal for consumption with high-throughput batches in Spark Streaming!
      • Integrated as an online learning algorithm (in this framework) as a back-end iterative process in Spark Streaming/Spark!
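As a rough illustration of the wall-detection use of RANSAC described above, here is a minimal single-machine sketch that fits one 2D line to landmark points. It is not the framework's Spark implementation: the function names, the tolerance, the iteration count and the sample points are all assumptions chosen for the example.

```python
import math
import random


def fit_line(p, q):
    """Return (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1 through p and q."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    if norm == 0:
        return None  # identical points define no line
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)


def ransac_line(points, iterations=200, tolerance=0.05, seed=0):
    """Keep the candidate line supported by the largest set of inliers."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)  # minimal sample: two points define a line
        line = fit_line(p, q)
        if line is None:
            continue
        a, b, c = line
        # With a unit normal, |a*x + b*y + c| is the point-to-line distance.
        inliers = [pt for pt in points if abs(a * pt[0] + b * pt[1] + c) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = line, inliers
    return best_line, best_inliers


# Hypothetical landmarks: a slightly noisy wall along y = 2 plus stray readings.
wall = [(0.1 * i, 2.0 + 0.01 * ((i % 3) - 1)) for i in range(30)]
outliers = [(0.5, 0.2), (1.0, 3.5), (2.2, 0.9), (1.7, 4.1), (2.9, 0.1)]
line, inliers = ransac_line(wall + outliers)
```

The inlier set returned is the candidate "wall"; a streaming version would presumably rerun or refine this per batch, which is the iterative back-end role the slide describes.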
  • 26. The Results: Key Challenges
      • Network latency
      • Embedded vs. framework
      • Matrix computations and updates to large matrices
          • Jacobian (derivatives), inversion, transposition, multiplication, addition/subtraction, Gaussian
      • Covariance/estimation computations
      • Coordinating movement with computation
      • Getting Spark ML to correctly interpret visual landmark data, minimizing errors
  • 27. The Results: Challenges
      • ~4 KLOC (Java != verbose)
      • Java lambda documentation
      • Kafka topics from the Spark Streaming consumer
      • Real-life latency depends on the type of connection and creates additional noise
      • Matrix computation
      • Defining heuristics
      • Communicator to the simulator: needs a solid class
  • 28. The Results: Measuring network latency in artificially throttled IO simulators. Timing was kept static to measure real delays in the messages over the cluster and between the simulator against file IO.
      [Figure: PERF1 (with simulator) vs. PERF2 (file IO); panels at 10 and 200 iterations]
  • 29. The Results: Measuring landmark acquisition and CPU time, embedded vs. framework, at 500 iterations.
  • 30. The Results: Measuring landmark acquisition and CPU time, embedded vs. framework, at 500 iterations.
      • Framework completed 500 iterations with the expected exponential growth
      • Embedded failed to complete 500 iterations (reached up to ~300)
  • 31. The Results: Measuring landmark acquisition and CPU time, embedded vs. framework, for a complete map. Both installations were run until the number of landmarks/maps was roughly equivalent, and the iterations were marked.
      [Figure panels: Iterations: ~100, Time: ~2 min; Iterations: ~100, Time: ~30-40 s]
  • 32. The Results: Forthcoming Benchmarks
      [Figure panels: Iterations: ~100; Iterations: ~20]
      • Apache Kafka latency to brokers
      • RANSAC convergence of Spark Streaming batches
      • Spark Streaming batch processing throughput in relation to processing time
  • 33. The Results: Performance Tuning and Optimization
      • Sparse and distributed matrices in Spark ML
      • Optimize matrix computations (EKF)
      • Separate threads for Apache Kafka producers
      • Spark Streaming batches timed to sensor input cycles, to avoid heavy loads and misaligned updates (this could also be tuned using device profiles)
      • Slower movement/reduced data points to synchronize calculations with movement and discovery
      • Rapid movement produces larger RDDs; create new RDDs and matrices for updates using existing heuristics, since in-place updates can sometimes create bottlenecks
      • Standard Spark performance tuning: CPU core maximization and executors
      • *Scheduled feature extraction to minimize accumulated error in long runs
      • *New parameters or a large skew from ground truth should trigger updates
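The point about timing batches to sensor input cycles can be illustrated with a small standard-library sketch. It only groups timestamped readings into fixed intervals and does not use Spark; the 30 ms period is the laser scan rate quoted earlier in the deck, while `micro_batches`, `SCANS_PER_BATCH` and the event payloads are hypothetical.

```python
from collections import defaultdict

SENSOR_PERIOD_MS = 30  # laser scan period quoted in the slides
SCANS_PER_BATCH = 10   # hypothetical tuning knob


def micro_batches(events, interval_ms):
    """Group (timestamp_ms, payload) events into fixed, aligned intervals."""
    batches = defaultdict(list)
    for timestamp_ms, payload in events:
        batches[timestamp_ms // interval_ms].append(payload)
    return [batches[key] for key in sorted(batches)]


events = [(i * SENSOR_PERIOD_MS, f"scan-{i}") for i in range(25)]

# Interval chosen as a whole multiple of the sensor period: even batches.
aligned = micro_batches(events, SENSOR_PERIOD_MS * SCANS_PER_BATCH)
aligned_sizes = [len(batch) for batch in aligned]

# Interval that ignores the sensor period: batch sizes drift from one to the next.
misaligned = micro_batches(events, 100)
misaligned_sizes = [len(batch) for batch in misaligned]
```

With an aligned interval every full batch carries the same number of scans, so per-batch matrix updates see a steady load; with a 100 ms interval the batch sizes alternate, which is the kind of misalignment the slide warns about.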
  • 34. Next Steps
      • Expanded stochastic analysis beyond gradient descent
      • Kalman Filter and Extended Kalman Filter
      • Improving accuracy and precision with an end-to-end pipeline that allows customization/optimization
      • Path planning algorithms to improve search and search times
      • Incorporate swarms/particles
      • A complete robotics library, or even an extension to handle robotics, computer vision or any of the AI/machine learning problems specific to robotics, is publishable and opens the door to a whole new group of scientists.
      • Further scaling and optimization with robotic swarms and rapid/increased-volume sensor data
  • 35. Conclusion
      • IBM IoT Cloud Open Platform for Industries
      • IBM Bluemix IoT Zone
      • IBM IoT Ecosystem
      More to come!
  • 37. Q & A
      Contact information: J. White Bear (jwhiteb@us.ibm.com), IBM Spark Technology Center, 425 Market St, San Francisco, CA
      Special thanks to IBM and the IBM Spark team at the Spark Technology Center for your input, for taking time to discuss, and for allowing me time to work on this project: Sampada Basakar, Vijay Bommireddipalli, Fred Reiss, Luciano Resende

Editor's Notes

  1. http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/system360/breakthroughs/ http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/linux/breakthroughs/
  2. http://ici.radio-canada.ca/regions/ontario/2016/06/14/016-robotique-chercheurs-sudbury-robots-drones.shtml
  3. Increase in error over time and readjusting for it. Identifying landmarks.
  4. Not so simple after all. Actually it’s very computationally challenging which is why we decided to move things to the cloud.
  5. ROS is great, but can you really fine-tune your parameters and ML algorithms? Is it easily portable and integrated with the next generation of robots that are going to need real-time processing and fast analytics spanning robots and sensors over time?
  6. Unlike its linear counterpart, the extended Kalman filter in general is not an optimal estimator (of course it is optimal if the measurement and the state transition model are both linear, as in that case the extended Kalman filter is identical to the regular one). In addition, if the initial estimate of the state is wrong, or if the process is modeled incorrectly, the filter may quickly diverge, owing to its linearization. Another problem with the extended Kalman filter is that the estimated covariance matrix tends to underestimate the true covariance matrix and therefore risks becoming inconsistent in the statistical sense without the addition of "stabilising noise". Having stated this, the extended Kalman filter can give reasonable performance, and is arguably the de facto standard in navigation systems and GPS.
  7. The graph is simple, but what this looks like in code is quite different. Updating the main H matrix alone is the largest computation in both size and CPU usage. It holds all the state and landmark data and must be updated based on all the corresponding matrices. The code for this is large, but it doesn’t have to be: building this alone as a library would cut more than 1,000 lines of code. Bayesian inference and estimating a joint probability distribution over the variables for each timeframe.
  8. Standard architecture; add IBM Ambari, etc.
  9. This is clearly a problem that announces itself in the big data space
  10. Kafka streaming
  11. The takeaway is that IBM is already in the IoT space and preparing for the next generation of smart cities. Our continued open source innovation is a part of that.