Cloud Event Processing
  Analyze ∙ Sense ∙ Respond

       CloudConnect
           March 8, 2011
Welcome
    •   High Velocity Big Data
    •   What is Complex Event Processing?
    •   Analyzing Time Series with SAX
    •   What is Map/Reduce?
    •   Correlating with Historical Data
    •   Using the Cloud
    •   Questions
CLOUD
EVENT
PROCESSING
Data Growth*
[Chart: data growth by category (Categories 1 to 4), y-axis 0 to 18]

*It would appear that things will actually get worse, not better
High Velocity Big Data
    • What is Big Data?
             – You’ve got Big Data issues when you can’t turn the
               data into information fast enough to act on:
                •   Earthquake
                •   Brownout
                •   Market Crash
                •   Terrorist Event
             – You’ve got Big Data when you have to consider its
               actual physicality
    • What is High Velocity Big Data?
             – Big Data In Flight…
                • You don’t get to store it before you analyze it
What is Complex Event Processing?
    • Complex Event Processing (CEP) delivers high-
      speed processing of many events across all the
      layers of an organization, identifying only the
      most meaningful events within the event
      cloud, analyzing their impact, and taking
      subsequent action in real time.
             – From Wikipedia


What? What is CEP?
    • Domain Specific Language
             – Makes it easier to deal with events
    • Continuous Query
             – Select symbol, side, price from tradeStream
    • Time/Length Windows
             – Select symbol, side, avg(price) from
               tradeStream.win:time(10 minutes) group by symbol, side
    • Pattern Matching
             – select a.* from pattern [every a=FIXNewOrderSingle ->
               (timer:interval(30 seconds) and not
               FIXNewOrderSingle(a.Side!=Side and a.OrderQty =
               OrderQty and a.Symbol = Symbol))]
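To make the time-window query concrete, here is a minimal single-threaded Python approximation of `select symbol, side, avg(price) from tradeStream.win:time(10 minutes) group by symbol, side`. It is not how any particular CEP engine implements windows. It assumes trades arrive in timestamp order and uses each event's own timestamp as the clock; `Trade` and `TimeWindowAverage` are hypothetical names.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Trade:
    timestamp: float  # seconds; assumed non-decreasing across the stream
    symbol: str
    side: str
    price: float


class TimeWindowAverage:
    """Sliding time window with avg(price) grouped by (symbol, side)."""

    def __init__(self, window_seconds: float = 600.0):
        self.window = window_seconds
        self.events = deque()           # trades still inside the window, oldest first
        self.sums = defaultdict(float)  # (symbol, side) -> sum of prices in window
        self.counts = defaultdict(int)  # (symbol, side) -> number of trades in window

    def on_event(self, trade: Trade) -> dict:
        # Evict trades that have fallen out of the window relative to this event.
        while self.events and self.events[0].timestamp <= trade.timestamp - self.window:
            old = self.events.popleft()
            key = (old.symbol, old.side)
            self.sums[key] -= old.price
            self.counts[key] -= 1
            if self.counts[key] == 0:
                del self.sums[key], self.counts[key]
        # Add the new trade to its group.
        key = (trade.symbol, trade.side)
        self.events.append(trade)
        self.sums[key] += trade.price
        self.counts[key] += 1
        # Current average per group, like the continuous query's output.
        return {k: self.sums[k] / self.counts[k] for k in self.counts}
```

Each call returns the current average price per (symbol, side). Keeping running sums means each event costs O(1) amortized work instead of rescanning the window, which is the kind of bookkeeping a CEP engine does for you behind the query language.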
Wouldn’t It Be Cool
    • Select * from everything where itsInteresting
      = toMe in last 10 minutes;

    • Select * from everything where earthQuake >
      .8;

    • Select * from everything where
      terroristsWillStrike > .9;
CEP – Current Benefits*
    • Really Fast!
    • Low Latency!
    • Provides a ‘ready made’ framework to build
      real-time pattern matching applications
    • Think at a higher level
             – Productivity

                          *your mileage may vary, widely
CEP – Current Limitations
    • Memory Bound
             – If you have a lot of events and windows, you risk
               running out of memory on a single machine
    • Compute Bound
             – To ensure high throughput and low latency, most
               CEP engines are actually doing simplistic things
                • e.g. Filtering events
    • Black Box
             – What’s going on in there?
Checkpoint
    • Ok, so by using Complex Event Processing
             – You can analyze data in flight
             – But
                • You’re constrained by:
                   – Available compute
                   – Memory

    • Because there’s still too much data to process
      on one machine…
The Problem With Time Series
    • Dimensionality
             – How can I recognize something?
    • Distance Measures
             – How do I find similar occurrences?
    • Time
             – By the time I process the data, the information
               has little value…


Symbolic Aggregate Approximation
 •    SAX reduces numerical data to a short string, or SAX word.
 •    Thousands of data points of numerical, continuous data
      become ‘ABCEDEFGH’
 •    The SAX approximation of the data fits in main memory, yet
      retains features of interest
 •    Creating SAX words from historical and streaming data allows
      us to perform all kinds of magic…

[Figure: SAX Encoding. A time series (x-axis 0 to 120) discretized
 into the symbols a, b and c, giving the SAX word baabccbc]

SAX Advantages:
 • Patterns identified and described using SAX actually look like
   the underlying data
 • Other algorithms sometimes don’t actually describe the underlying
   patterns, or take way too much work to be useful in real time
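The encoding in the figure takes three steps: normalize, average into segments (PAA), then map each segment to a letter. Below is a minimal Python sketch of that standard SAX recipe, using z-normalization and breakpoints that cut the standard normal distribution into equiprobable regions. The function names are hypothetical, and to stay short it assumes the series length is a multiple of the word length.

```python
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale to zero mean and unit standard deviation (a flat series maps to zeros)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width frame."""
    n = len(series)
    if n % segments:
        raise ValueError("series length must be a multiple of segments")
    width = n // segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(segments)]


def breakpoints(alphabet_size):
    """Cut points splitting N(0, 1) into alphabet_size equally likely regions."""
    dist = NormalDist()
    return [dist.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def sax_word(series, segments, alphabet_size=3):
    """Encode a numeric series as a SAX word of length `segments`."""
    cuts = breakpoints(alphabet_size)
    letters = "abcdefghijklmnopqrstuvwxyz"[:alphabet_size]
    word = []
    for value in paa(znormalize(series), segments):
        # Symbol index = number of breakpoints at or below the PAA value.
        idx = sum(1 for c in cuts if value >= c)
        word.append(letters[idx])
    return "".join(word)
```

For example, `sax_word([0, 0, 5, 5, 10, 10], segments=3)` gives `"abc"`: the low, middle and high thirds of the series fall into the low, middle and high regions of the normal curve. Because the breakpoints make every letter roughly equally likely, the alphabet carries as much information per symbol as possible.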
SAX – 5 Use Cases
    • Indexing
             – Given a time series, find similar time series in the database
    • Clustering
             – Find natural groupings in the time series
    • Classification
             – Automagically sort patterns found in time series into
               categories
    • Summarization
             – Condense verbose data into meaningful information
    • Anomaly Detection
             – Find surprising, interesting, or unexpected behavior
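Indexing and anomaly detection both start from a lookup of words already seen. The sketch below is a hypothetical in-memory index, assuming SAX words have already been computed for windows of each series: an exact-match lookup returns candidate locations, and a word seen fewer than a threshold number of times is flagged as surprising. Treating rarity as an anomaly score is a deliberate simplification, not a complete anomaly detector.

```python
from collections import defaultdict


class SaxIndex:
    """Hypothetical index mapping a SAX word to where it occurred."""

    def __init__(self):
        self._postings = defaultdict(list)  # word -> [(series_id, offset), ...]

    def add(self, series_id, offset, word):
        """Record that `word` describes the window at `offset` in `series_id`."""
        self._postings[word].append((series_id, offset))

    def lookup(self, word):
        """Exact-match candidates; a real system would verify them on raw data."""
        return list(self._postings.get(word, []))

    def surprising(self, word, min_count=1):
        """Flag a word seen fewer than `min_count` times before."""
        return len(self._postings.get(word, [])) < min_count
```

Using `.get` on the `defaultdict` keeps lookups from inserting empty entries for words that were never indexed.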
Why SAX is Cool
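    • Lower Bounding
             – The distance between two SAX words never exceeds the
               Euclidean distance between the original series, so
               similarity searches produce no false dismissals
    • Dimensionality Reduction
             – Previously intractable problems become possible in
               real time
    • The patterns identified and described using SAX actually
      look like the underlying data
    • Other algorithms sometimes don’t describe underlying
      patterns, or take way too much work to be useful in real time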
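The lower-bounding property comes from the SAX distance function, usually called MINDIST. A minimal Python sketch, assuming both words were built from z-normalized series of the same length `n` with the same alphabet size and Gaussian breakpoints. Identical or adjacent symbols contribute zero, because their underlying values could be arbitrarily close.

```python
import math
from statistics import NormalDist


def breakpoints(alphabet_size):
    """Cut points splitting N(0, 1) into alphabet_size equally likely regions."""
    dist = NormalDist()
    return [dist.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def symbol_distance(a, b, cuts):
    """Smallest possible gap between values that encode to letters a and b."""
    r, c = ord(a) - ord("a"), ord(b) - ord("a")
    if abs(r - c) <= 1:
        return 0.0
    return cuts[max(r, c) - 1] - cuts[min(r, c)]


def mindist(word_q, word_c, n, alphabet_size):
    """Lower bound on the Euclidean distance between the two length-n
    z-normalized series that produced word_q and word_c."""
    if len(word_q) != len(word_c):
        raise ValueError("words must have the same length")
    cuts = breakpoints(alphabet_size)
    w = len(word_q)
    total = sum(symbol_distance(a, b, cuts) ** 2 for a, b in zip(word_q, word_c))
    return math.sqrt(n / w) * math.sqrt(total)
```

Because `mindist` can only underestimate the true distance, any candidate whose MINDIST already exceeds the best match found so far can be discarded without touching the raw data.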
A Day’s Worth of IBM




Normalized & PAA Applied




And Finally, SAX
[Chart: the normalized series mapped to symbols A to G, giving the SAX word EDDCCBC]
Checkpoint
    • We’ve reduced dimensionality
    • We know where we are
             – The current pattern is AABASDGF
    • We’re calculating it in ‘real-time’*
             – Using Complex Event Processing
    • But
             – There’s still too much data to process on one
               machine…
    • How can we process more data in the same
      amount of time?
                        *I much prefer the term event-driven
What is Map/Reduce?
    • Framework for processing ginormous datasets using a large number
      of computers (nodes) in a cluster.

    • "Map"
      Master node takes the input, chops it up into smaller sub-
      problems, and distributes those to worker nodes. The worker node
      processes that smaller problem, and passes the answer back to its
      master node.

    • "Reduce"
      Takes the answers to all the sub-problems and combines them in a
      way to get the output - the answer to the problem it was originally
      trying to solve.
             – From Wikipedia
What? What is Map/Reduce?
    • WordCount Example (classic)
             – Map scans text for words and emits {word,1}
             – Combine collapses values with the same key on the
               same node: {word,1,1,1} -> {word,3}
             – Shuffle/Sort merges results from different nodes
               and sorts them by key
                • {node A,"NoSQL",50} {node B,"Oracle",50} {node B,"NoSQL",50}
                    – becomes
                • {node A,"NoSQL",50} {node B,"NoSQL",50} {node B,"Oracle",50}
             – Reduce
                • Outputs {"NoSQL",100} {"Oracle",50}
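The WordCount phases can be simulated in a single Python process to make the data flow concrete. This is a toy sketch, not Hadoop’s API: each string in `splits` stands in for one node’s input, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import chain


def map_phase(text):
    """Emit (word, 1) for every word in one node's input split."""
    return [(word, 1) for word in text.split()]


def combine(pairs):
    """Collapse pairs on the same node: {word,1,1,1} -> {word,3}."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())


def shuffle(node_outputs):
    """Merge partial counts from all nodes, grouped and sorted by key."""
    grouped = defaultdict(list)
    for word, n in chain.from_iterable(node_outputs):
        grouped[word].append(n)
    return sorted(grouped.items())


def reduce_phase(grouped):
    """Sum the partial counts for each key."""
    return {word: sum(counts) for word, counts in grouped}


def word_count(splits):
    node_outputs = [combine(map_phase(split)) for split in splits]
    return reduce_phase(shuffle(node_outputs))
```

For example, `word_count(["NoSQL NoSQL Oracle", "NoSQL"])` returns `{"NoSQL": 3, "Oracle": 1}`. The combine step matters in a real cluster because it shrinks what has to cross the network during the shuffle.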

SAX and Map/Reduce
    • SAX is an ‘embarrassingly parallel’ problem
    • Using parallel processing allows SAX words to
      be computed more quickly
    • Using Streaming Map/Reduce provides results
      even faster, increasing the value of data even
      more
             – Partition by symbol and sort by timestamp
             – Calculate SAX words for each symbol, in parallel
    • CEP Time Windows to the Rescue!
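Partitioning by symbol and sorting by timestamp is what makes the work embarrassingly parallel: each partition can be encoded on its own. The Python sketch below assumes ticks arrive as hypothetical `(symbol, timestamp, price)` tuples and uses a thread pool only to show the shape of the computation. Because of CPython’s global interpreter lock, real CPU parallelism would need processes or separate cluster nodes. It carries its own compact SAX encoder so that it stands alone.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import NormalDist, mean, pstdev


def sax_word(values, segments, alphabet_size=4):
    """Compact SAX: z-normalize, PAA, map to letters via N(0,1) breakpoints.
    Assumes len(values) is a multiple of segments."""
    mu, sigma = mean(values), pstdev(values)
    z = [(v - mu) / sigma if sigma else 0.0 for v in values]
    width = len(z) // segments
    cuts = [NormalDist().inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]
    word = ""
    for i in range(segments):
        avg = mean(z[i * width:(i + 1) * width])
        word += "abcdefghijklmnopqrstuvwxyz"[sum(avg >= c for c in cuts)]
    return word


def partition_by_symbol(ticks):
    """'Map' step: key each (symbol, timestamp, price) tick by its symbol."""
    partitions = defaultdict(list)
    for symbol, ts, price in ticks:
        partitions[symbol].append((ts, price))
    return partitions


def sax_for_partition(item, segments):
    """Sort one symbol's ticks by timestamp, then encode its prices."""
    symbol, rows = item
    rows.sort()
    return symbol, sax_word([price for _, price in rows], segments)


def parallel_sax(ticks, segments, workers=4):
    partitions = partition_by_symbol(ticks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda item: sax_for_partition(item, segments),
                             partitions.items()))
```

Since no partition depends on another, swapping the thread pool for a process pool or a cluster of workers changes only where `sax_for_partition` runs, not its logic.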
Checkpoint
    • CEP is great, but I still have to tell it what I’m
      looking for, right?
    • SAX can help us reduce dimensionality; what
      else can it do for us?
    • How do I relate Streaming Data to Historical
      Data?
    • How do I do this while the Information still has
      value?
High Velocity Big Data Pattern
[Diagram: two flows]
    Events -> Map (x3) -> Events -> Reduce    (Historical)
    Events -> OnRamp -> Events -> Map (x3) -> SAX -> Reduce -> Context
So What Do We Need?
    •   Complex Event Processing
    •   The Algorithm (SAX)
    •   Processing Model – Streaming Map/Reduce
    •   Context – The Historical Aspect
    •   What Do We Call This?



What is DarkStar?
             – Platform as a Service (PaaS)
                • Provides Distributed
                    –   Complex Event Processing
                    –   Streaming Map/Reduce
                    –   Messaging
                    –   Web Services
                    –   Monitoring/Management
             – Applications are built on top, or inside
                • SAX runs inside of DarkStar
                    – SAX is not a component of DarkStar, but an add-in library
             – And deployed in a cluster
                • Virtualized Resources
DarkStar
    • What patterns are occurring in my data, right
      now?
             – CEP-based streaming Map/Reduce
               • Use a cluster of machines
    • When did this pattern happen before?
             – Database with embedded Map/Reduce
               • No need to move data outside the database for
                 processing

The Cloud
    • Elastic Resource
             – Grows/Shrinks according to demand
    • Virtualization
             – Efficient utilization of compute
    • The Previously Unthinkable
             – Is now possible, if not already commonplace
    • Peering can provide access to Big Pipes and
      Secure Data
Thank You!
    • Questions?

    • Contact Me
             – Colin Clark
             – @EventCloudPro
             – cpclark@cloudeventprocessing.com


UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 

Cloud connect 03 08-2011

  • 1. Cloud Event Processing: Analyze ∙ Sense ∙ Respond
      CloudConnect, March 8, 2011
  • 2. Welcome
      • High Velocity Big Data
      • What is Complex Event Processing?
      • Analyzing Time Series with SAX
      • What is Map/Reduce?
      • Correlating with Historical Data
      • Using the Cloud
      • Questions
      CLOUD EVENT PROCESSING
  • 3. Data Growth*
      [Bar chart: Category 1 to Category 4, y-axis 0 to 18]
      *It would appear that things will actually get worse, not better
  • 4. High Velocity Big Data
      • What is Big Data?
          – You’ve got Big Data issues when you can’t turn the data into information fast enough to act on:
              • Earthquake
              • Brownout
              • Market Crash
              • Terrorist Event
          – You’ve got Big Data when you have to consider its actual physicality
      • What is High Velocity Big Data?
          – Big Data in flight…
              • You don’t get to store it before you analyze it
  • 5. What is Complex Event Processing?
      • Complex Event Processing (CEP) delivers high-speed processing of many events across all the layers of an organization, identifying only the most meaningful events within the event cloud, analyzing their impact, and taking subsequent action in real time.
          – From Wikipedia
  • 6. What? What is CEP?
      • Domain Specific Language
          – Makes it easier to deal with events
      • Continuous Query
          – Select symbol, side, price from tradeStream
      • Time/Length Windows
          – Select symbol, side, avg(price) from tradeStream.win:time(10 minutes) group by symbol, side
      • Pattern Matching
          – select a.* from pattern [every a=FIXNewOrderSingle -> (timer:interval(30 seconds) and not FIXNewOrderSingle(a.Side != Side and a.OrderQty = OrderQty and a.Symbol = Symbol))]
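To make the two windowed/pattern queries on this slide concrete, here is a minimal hand-rolled sketch of what a CEP engine does for them, assuming each event is an `Order` with hypothetical fields `ts`, `symbol`, `side`, `qty`, `price`. `TimeWindowAvg` approximates the `win:time(10 minutes) ... group by symbol, side` average, and `UnmatchedOrderDetector` approximates the `every a -> (timer:interval(30 seconds) and not ...)` pattern by reporting orders that saw no opposite-side order for the same symbol and quantity before their timer fired. Neither class is part of any CEP product's API.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    ts: float      # event time in seconds (hypothetical field names)
    symbol: str
    side: str      # "BUY" or "SELL"
    qty: int
    price: float


class TimeWindowAvg:
    """Average price per (symbol, side) over a sliding time window."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events = deque()            # events inside the window, oldest first
        self.sums = defaultdict(float)   # running price sum per (symbol, side)
        self.counts = defaultdict(int)   # running event count per (symbol, side)

    def on_event(self, o: Order) -> dict:
        self.events.append(o)
        self.sums[(o.symbol, o.side)] += o.price
        self.counts[(o.symbol, o.side)] += 1
        # Expire events that have fallen out of the window.
        while self.events and self.events[0].ts <= o.ts - self.window:
            old = self.events.popleft()
            key = (old.symbol, old.side)
            self.sums[key] -= old.price
            self.counts[key] -= 1
            if self.counts[key] == 0:
                del self.sums[key], self.counts[key]
        return {k: self.sums[k] / self.counts[k] for k in self.counts}


class UnmatchedOrderDetector:
    """Reports each order not followed, within `timeout` seconds, by an
    opposite-side order for the same symbol and quantity."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.pending = []                # orders whose timer has not fired yet

    def advance(self, now: float) -> list:
        """Fire timers up to `now`; return the orders that went unmatched."""
        fired = [a for a in self.pending if a.ts + self.timeout <= now]
        self.pending = [a for a in self.pending if a.ts + self.timeout > now]
        return fired

    def on_event(self, o: Order) -> list:
        alerts = self.advance(o.ts)
        # The new order cancels every pending order it matches ("and not ...").
        self.pending = [a for a in self.pending
                        if not (a.symbol == o.symbol and a.qty == o.qty and a.side != o.side)]
        self.pending.append(o)           # "every a": each order starts its own timer
        return alerts
```

The point of the DSL is that each of these classes collapses into a single declarative line; the sketch only shows the bookkeeping (expiry queues, per-group aggregates, pending timers) the engine takes care of for you.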
  • 7. Wouldn’t It Be Cool
      • Select * from everything where itsInteresting = toMe in last 10 minutes;
      • Select * from everything where earthQuake > .8;
      • Select * from everything where terroristsWillStrike > .9;
  • 8. CEP – Current Benefits*
      • Really Fast!
      • Low Latency!
      • Provides a ‘ready-made’ framework to build real-time pattern matching applications
      • Think at a higher level
          – Productivity
      *Your mileage may vary, widely
  • 9. CEP – Current Limitations
      • Memory Bound
          – If you have a lot of events and windows, you risk running out of memory on a single machine
      • Compute Bound
          – To ensure high throughput and low latency, most CEP engines actually do simplistic things
              • e.g. filtering events
      • Black Box
          – What’s going on in there?
  • 10. Checkpoint
      • OK, so by using Complex Event Processing
          – You can analyze data in flight
          – But
              • You’re constrained by:
                  – Available compute
                  – Memory
              • Because there’s still too much data to process on one machine…
  • 11. The Problem With Time Series
      • Dimensionality
          – How can I recognize something?
      • Distance Measures
          – How do I find similar occurrences?
      • Time
          – By the time I process the data, the information has little value…
  • 12. Symbolic Aggregate Approximation (SAX) Encoding
      • SAX reduces numerical data to a short string, or SAX word.
      • Thousands of data points of numerical, continuous data become ‘ABCEDEFGH’
      • The SAX approximation of the data fits in main memory, yet retains features of interest
      • Creating SAX words from historical and streaming data allows us to perform all kinds of magic…
      [Figure: a time series (x-axis 0 to 120) discretized into the symbols a, b, c, giving the SAX word baabccbc]
      • SAX Advantages:
          – Patterns identified and described using SAX actually look like the underlying data
          – Other algorithms sometimes don’t actually describe the underlying patterns, or take way too much work to be useful in real time
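A minimal sketch of the standard SAX pipeline described on this slide (z-normalize, reduce with Piecewise Aggregate Approximation, then map each frame mean to a letter using cut points that split the standard normal distribution into equiprobable regions). The function names are illustrative, the alphabet is lowercase `a`, `b`, `c`, …, and for brevity the series length must be a multiple of the number of segments; this is not the talk's own implementation.

```python
from bisect import bisect_left
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale to mean 0 and standard deviation 1 (a flat series maps to all zeros)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each of `segments` equal frames.
    For brevity this sketch requires len(series) to be a multiple of `segments`."""
    if len(series) % segments != 0:
        raise ValueError("series length must be a multiple of segments")
    size = len(series) // segments
    return [mean(series[i * size:(i + 1) * size]) for i in range(segments)]


def breakpoints(alphabet_size):
    """Cut points that split N(0, 1) into `alphabet_size` equiprobable regions."""
    normal = NormalDist()
    return [normal.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def sax(series, segments, alphabet_size):
    """Convert a numeric series into a SAX word of length `segments`."""
    cuts = breakpoints(alphabet_size)
    # bisect_left counts the cut points strictly below each frame mean: 0 -> 'a', 1 -> 'b', ...
    return "".join(chr(ord("a") + bisect_left(cuts, v))
                   for v in paa(znormalize(series), segments))
```

For example, `sax([1, 1, 1, 1, 5, 5, 5, 5, 9, 9, 9, 9], 3, 3)` returns `"abc"`: twelve numbers become a three-letter word that still shows the rising shape.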
  • 13. SAX – 5 Use Cases
      • Indexing
          – Given a time series, find similar time series in the database
      • Clustering
          – Find natural groupings in the time series
      • Classification
          – Automagically sort patterns found in time series into categories
      • Summarization
          – Condense verbose data into meaningful information
      • Anomaly Detection
          – Find surprising, interesting, or unexpected behavior
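Once data is reduced to SAX words, the anomaly-detection use case can be approached by simply counting words. The sketch below is one deliberately simple, frequency-based approach, not the method from the talk: a hypothetical `RareWordDetector` flags a word as surprising when it has so far accounted for less than `min_fraction` of all words seen. The threshold value is an assumption you would tune.

```python
from collections import Counter


class RareWordDetector:
    """Flags SAX words that have been rare so far (frequency below `min_fraction`)."""

    def __init__(self, min_fraction: float):
        self.min_fraction = min_fraction
        self.counts = Counter()   # how often each SAX word has been observed
        self.total = 0            # total number of words observed

    def observe(self, word: str) -> bool:
        # Judge the word against history *before* counting it.
        surprising = self.total > 0 and self.counts[word] / self.total < self.min_fraction
        self.counts[word] += 1
        self.total += 1
        return surprising
```

Because a SAX word is a short string, the whole frequency table for a stream typically stays small enough to keep in memory, which is what makes this kind of counting practical on data in flight.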
  • 14. Why SAX is Cool
      • Lower Bounding
          – Distances computed between SAX words never exceed the true distances between the underlying series, so the patterns identified and described using SAX actually look like the underlying data
      • Dimensionality Reduction
          – Previously intractable problems become possible in real time
      • Other algorithms
          – Sometimes don’t describe underlying patterns
          – Take way too much work to be useful in real time
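The lower-bounding property comes from the MINDIST measure defined in the SAX literature: two symbols that are equal or adjacent contribute zero, and otherwise the gap between the relevant cut points counts. The sketch below assumes both words were built from z-normalized series of length `n` using equal-width PAA frames and the same alphabet size; under those assumptions MINDIST never exceeds the Euclidean distance between the original normalized series, so searching on words never wrongly discards a true match.

```python
from math import sqrt
from statistics import NormalDist


def breakpoints(alphabet_size):
    normal = NormalDist()
    return [normal.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def symbol_dist(r, c, cuts):
    """Distance between two SAX symbols: zero if equal or adjacent, else the gap between cut points."""
    i, j = ord(r) - ord("a"), ord(c) - ord("a")
    if abs(i - j) <= 1:
        return 0.0
    return cuts[max(i, j) - 1] - cuts[min(i, j)]


def mindist(word1, word2, n, alphabet_size):
    """Lower bound on the Euclidean distance between the two z-normalized
    length-n series that produced word1 and word2."""
    if len(word1) != len(word2):
        raise ValueError("words must have the same length")
    cuts = breakpoints(alphabet_size)
    total = sum(symbol_dist(a, b, cuts) ** 2 for a, b in zip(word1, word2))
    return sqrt(n / len(word1)) * sqrt(total)
```

For the rising and falling twelve-point series `[1]*4 + [5]*4 + [9]*4` and its reverse (words `"abc"` and `"cba"`), MINDIST is about 2.44 while the true distance between the normalized series is about 6.93, consistent with the bound.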
  • 15. A Day’s Worth of IBM
  • 16. Normalized & PAA Applied
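For reference, the normalization and PAA step shown on this slide is usually written as follows, assuming a series $C = c_1, \dots, c_n$ with mean $\mu$ and standard deviation $\sigma$, reduced to $w$ frames where $w$ divides $n$. Each frame value is the mean of $n/w$ consecutive normalized points.

```latex
c'_j = \frac{c_j - \mu}{\sigma}, \qquad
\bar{c}_i = \frac{w}{n} \sum_{j = \frac{n}{w}(i-1) + 1}^{\frac{n}{w} i} c'_j, \qquad i = 1, \dots, w
```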
  • 17. And Finally, SAX
      [Figure: the series discretized into symbols A to G, producing the SAX word EDDCCBC]
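The final step maps each PAA value $\bar{c}_i$ to a letter using cut points $\beta_k$ that divide the standard normal distribution into $a$ equiprobable regions, where $\Phi^{-1}$ is the inverse standard normal CDF and $\alpha_k$ is the $k$-th letter of the alphabet. For the seven-letter alphabet A to G on this slide, that gives cut points of about −1.07, −0.57, −0.18, 0.18, 0.57 and 1.07.

```latex
\beta_0 = -\infty, \qquad
\beta_k = \Phi^{-1}\!\left(\frac{k}{a}\right) \ (k = 1, \dots, a-1), \qquad
\beta_a = +\infty

\hat{c}_i = \alpha_k \iff \beta_{k-1} < \bar{c}_i \le \beta_k
```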
  • 18. Checkpoint
      • We’ve reduced dimensionality
      • We know where we are
          – The current pattern is AABASDGF
      • We’re calculating it in ‘real-time’*
          – Using Complex Event Processing
      • But
          – There’s still too much data to process on one machine…
      • How can we process more data in the same amount of time?
      *I much prefer the term event-driven
  • 19. What is Map/Reduce?
      • Framework for processing ginormous datasets using a large number of computers (nodes) in a cluster.
      • "Map": The master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. Each worker node processes its smaller problem and passes the answer back to its master node.
      • "Reduce": Takes the answers to all the sub-problems and combines them to produce the output, the answer to the problem it was originally trying to solve.
          – From Wikipedia
  • 20. What? What is Map/Reduce?
      • WordCount Example (classic)
          – Map scans text for words and emits {word, 1}
          – Combine collapses the values for the same key on the same node: {word, 1, 1, 1} -> {word, 3}
          – Shuffle/Sort merges results from different nodes
              • {node A, “NoSQL”, 50} {node B, “Oracle”, 50} {node B, “NoSQL”, 50}
              – becomes
              • {node A, “NoSQL”, 50} {node B, “NoSQL”, 50} {node B, “Oracle”, 50}
          – Reduce
              • Outputs {“NoSQL”, 100} {“Oracle”, 50}
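The WordCount walkthrough above can be sketched in a single process, with one function per phase and a dictionary standing in for the cluster (keys are node names, values are each node's text). The function names are illustrative; a real framework would run `map_phase` and `combine` on each node and move data over the network during the shuffle.

```python
import re
from collections import Counter, defaultdict


def map_phase(text):
    """Map: scan text for words and emit (word, 1) for each one."""
    return [(word, 1) for word in re.findall(r"[A-Za-z]+", text)]


def combine(pairs):
    """Combine: collapse values for the same key on one node, {word,1,1,1} -> {word,3}."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def shuffle(node_outputs):
    """Shuffle/Sort: group every node's partial counts by key, sorted by key."""
    groups = defaultdict(list)
    for pairs in node_outputs.values():
        for word, count in pairs:
            groups[word].append(count)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: sum the partial counts for each key."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(documents_by_node):
    node_outputs = {node: combine(map_phase(text)) for node, text in documents_by_node.items()}
    return reduce_phase(shuffle(node_outputs))
```

Feeding the slide's partial counts straight into the last two phases, `reduce_phase(shuffle({"A": [("NoSQL", 50)], "B": [("Oracle", 50), ("NoSQL", 50)]}))` yields `{"NoSQL": 100, "Oracle": 50}`.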
  • 21. SAX and Map/Reduce
      • SAX is an ‘embarrassingly parallel’ problem
      • Using parallel processing allows SAX words to be computed more quickly
      • Using Streaming Map/Reduce provides results even faster, increasing the value of the data even more
          – Partition by symbol and sort by timestamp
          – Calculate SAX words for each symbol, in parallel
      • CEP Time Windows to the Rescue!
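A small batch sketch of the partition-then-compute idea on this slide: ticks are keyed by symbol and sorted by timestamp (the "map" side), cut into tumbling time windows as a stand-in for CEP time windows, and each symbol's SAX words are computed as an independent task. It assumes ticks are `(timestamp, symbol, price)` tuples, a three-letter alphabet and two-letter words; `ThreadPoolExecutor` only illustrates the per-partition independence (CPython threads do not speed up this CPU-bound work), whereas a streaming Map/Reduce platform would place partitions on separate nodes.

```python
from bisect import bisect_left
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import NormalDist, mean, pstdev

ALPHABET = "abc"                                               # assumed alphabet size of 3
CUTS = [NormalDist().inv_cdf(i / len(ALPHABET)) for i in range(1, len(ALPHABET))]
SEGMENTS = 2                                                   # assumed SAX word length


def sax_word(values):
    """Compact SAX: z-normalize, average into SEGMENTS frames, map frames to letters."""
    mu, sigma = mean(values), pstdev(values)
    z = [0.0] * len(values) if sigma == 0 else [(v - mu) / sigma for v in values]
    n = len(z)
    frames = [mean(z[i * n // SEGMENTS:(i + 1) * n // SEGMENTS]) for i in range(SEGMENTS)]
    return "".join(ALPHABET[bisect_left(CUTS, f)] for f in frames)


def partition(ticks):
    """Map step: key each (timestamp, symbol, price) tick by symbol, then sort by timestamp."""
    parts = defaultdict(list)
    for ts, symbol, price in ticks:
        parts[symbol].append((ts, price))
    return {symbol: sorted(rows) for symbol, rows in parts.items()}


def windowed_words(rows, window):
    """Tumbling time windows over one symbol's ticks; one SAX word per window."""
    buckets = defaultdict(list)
    for ts, price in rows:
        buckets[int(ts // window)].append(price)
    # Windows with fewer ticks than SEGMENTS cannot fill every frame, so skip them.
    return [sax_word(prices) for _, prices in sorted(buckets.items())
            if len(prices) >= SEGMENTS]


def sax_by_symbol(ticks, window):
    """Compute each symbol's SAX words independently, one task per partition."""
    parts = partition(ticks)
    with ThreadPoolExecutor() as pool:
        words = pool.map(lambda s: windowed_words(parts[s], window), parts)
        return dict(zip(parts, words))
```

Because no partition reads another partition's data, adding symbols adds tasks rather than work per task, which is what makes the problem "embarrassingly parallel".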
  • 22. Checkpoint
      • CEP is great, but I still have to tell it what I’m looking for, right?
      • SAX can help us reduce dimensionality; what else can it do for us?
      • How do I relate streaming data to historical data?
      • How do I do this while the information still has value?
  • 23. High Velocity Big Data Pattern
      [Diagram: Events pass through an OnRamp into Map and Reduce stages that produce SAX words, combined with Historical data (via Map/Reduce) as Context]
  • 24. So What Do We Need?
      • Complex Event Processing
      • The Algorithm (SAX)
      • Processing Model
          – Streaming Map/Reduce
      • Context
          – The Historical Aspect
      • What Do We Call This?
  • 25. What is DarkStar?
      – Platform as a Service (PaaS)
      • Provides distributed:
          – Complex Event Processing
          – Streaming Map/Reduce
          – Messaging
          – Web Services
          – Monitoring/Management
          – Applications are built on top of it, or inside it
      • SAX runs inside of DarkStar
          – SAX is not a component of DarkStar, but an add-in library
          – And is deployed in a cluster
      • Virtualized Resources
  • 26. DarkStar
      • What patterns are occurring in my data, right now?
          – CEP-based streaming Map/Reduce
              • Use a cluster of machines
      • When did this pattern happen before?
          – Database with embedded Map/Reduce
              • No need to move data outside the database for processing
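The "When did this pattern happen before?" question becomes a lookup once historical data is stored as SAX words. The talk answers it with a database that embeds Map/Reduce; the sketch below is only an in-memory stand-in, a hypothetical `PatternIndex` mapping each word to the timestamps where it occurred. `candidates` returns historical words whose symbols are all equal or adjacent to the query's, which are exactly the words with a MINDIST of zero and therefore the ones a lower-bounding search cannot rule out.

```python
from collections import defaultdict


class PatternIndex:
    """Hypothetical in-memory index from SAX word to the times it occurred."""

    def __init__(self):
        self._occurrences = defaultdict(list)

    def add(self, timestamp, word):
        self._occurrences[word].append(timestamp)

    def when(self, word):
        """Exact matches: every time this SAX word was seen before."""
        return list(self._occurrences.get(word, []))

    def candidates(self, word):
        """Words within one alphabet step at every position (MINDIST of zero)."""
        return {w: list(ts) for w, ts in self._occurrences.items()
                if len(w) == len(word)
                and all(abs(ord(a) - ord(b)) <= 1 for a, b in zip(w, word))}
```

Usage: after indexing historical words with `add`, call `when(current_word)` as each new SAX word arrives from the streaming side, and fall back to `candidates` followed by an exact distance check on the raw data when near-matches matter.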
  • 27. The Cloud
      • Elastic Resource
          – Grows/shrinks according to demand
      • Virtualization
          – Efficient utilization of compute
      • The Previously Unthinkable
          – Is now possible, if not already commonplace
      • Peering can provide access to Big Pipes and Secure Data
  • 28. Thank You!
      • Questions?
      • Contact Me
          – Colin Clark
          – @EventCloudPro
          – cpclark@cloudeventprocessing.com