Brian O’Neill                           Taylor Goetz
Lead Architect, Health Market Science   Development Lead, Health Market Science
boneill@healthmarketscience.com         ptgoetz@healthmarketscience.com
@boneill42                              @ptgoetz
Agenda
   Use Case
     What is CEP? Why? What for?
   Storm
     Background
     Cluster configuration
   Examples / Demo
   Future: Trident
Our Products




   Master Data Management
     Good, bad doctors?
   Prescriber eligibility and remediation.
Cassandra to the Rescue
   [Diagram: 1000s of Feeds, C*, Masterfile; annotated Δt]
   Big Data for us == Variety of Data
But…
   Search unstructured data
   Real-time Analytics / Reporting
   Transactional Processing
     Changes reflected immediately.
     Wide-row Indexes
What might that look like?
   [Diagram: C*, wide-row index, RDBMS; caption: I’m happy]
   Provide for Polyglot Persistence
What we did wrong… (part 1)
   Could not react to transactional changes
   Needed extra logic to track what changed
   Took too long
What we did wrong… (part 2)
   [Diagram: C*, TRIGGERS, Wide Row Indexing]
   Good Intentions
     Take the onus off the clients
   Bad Result
     Guaranteeing execution
     Write Overhead
What is Complex Event Processing?
   Event processing is a method of tracking and
    analyzing (processing) streams of information
    (data) about things that happen (events),
    and deriving a conclusion from them.
   Complex event processing, or CEP, is event
    processing that combines data from multiple
    sources to infer events or patterns that
    suggest more complicated circumstances.

                 http://en.wikipedia.org/wiki/Complex_event_processing
Imagine…
   Treating CRUD operations as events in a system.

   Then suddenly,

        CEP = (ETL or Analytics)
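
To make the idea concrete, here is a minimal plain-Java sketch, not the presenters' implementation. It assumes each create, update, or delete is published as an event object, and that ETL or analytics logic is simply a handler subscribed to that stream. The names `CrudEvent` and `CrudEventStream` are hypothetical and do not come from the deck.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical event type: one CRUD operation against an entity.
class CrudEvent {
    enum Op { CREATE, UPDATE, DELETE }

    final Op op;
    final String entityType;
    final String key;

    CrudEvent(Op op, String entityType, String key) {
        this.op = op;
        this.entityType = entityType;
        this.key = key;
    }
}

// Hypothetical in-memory stream: every published event is pushed to all handlers.
class CrudEventStream {
    private final List<Consumer<CrudEvent>> handlers = new ArrayList<>();

    void subscribe(Consumer<CrudEvent> handler) {
        handlers.add(handler);
    }

    void publish(CrudEvent event) {
        for (Consumer<CrudEvent> handler : handlers) {
            handler.accept(event);
        }
    }

    // Demo: an "analytics" handler keeps a running count per operation type.
    static Map<CrudEvent.Op, Integer> demo() {
        Map<CrudEvent.Op, Integer> counts = new HashMap<>();
        CrudEventStream stream = new CrudEventStream();
        stream.subscribe(e -> counts.merge(e.op, 1, Integer::sum));
        stream.publish(new CrudEvent(CrudEvent.Op.CREATE, "prescriber", "p1"));
        stream.publish(new CrudEvent(CrudEvent.Op.UPDATE, "prescriber", "p1"));
        stream.publish(new CrudEvent(CrudEvent.Op.UPDATE, "prescriber", "p2"));
        return counts;
    }
}
```

An indexing or ETL step would be one more `subscribe` call on the same stream, which is the sense in which the event stream can stand in for both ETL and analytics.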
What Storm is to us…
   [Diagram: CRUD Op, ETL, SoR, RDBMS, Dimensional Counts, Enrichment, Fuzzy Index]
   A High Throughput Data Processing Pipeline
Storm Overview
   Open-Sourced by Twitter in 2011
   Distributed Realtime Computation System
   Fault Tolerant
   Highly Scalable
   Guaranteed Processing
   Operates on one or more streams of data (i.e. CEP)
Anatomy of a Storm Cluster

   Nimbus
     Master Node
   ZooKeeper
     Cluster Coordination
   Supervisors
     Worker Nodes
Storm Components
   Spouts
     Stream Sources
   Bolts
     Unit of Computation
   Topologies
     Combination of n Spouts and m Bolts
     Defines the overall “Computation”
Storm Spouts
   Represents a source (stream) of data
     Queues (JMS, Kafka, Kestrel, etc.)
     Twitter Firehose
     Sensor Data
   Emits “Tuples” (Events) based on source
     Primary Storm data structure
     Set of Key-Value pairs
Storm Bolts
   Receive Tuples from Spouts or other Bolts
   Operate on, or React to Data
     Functions/Filters/Joins/Aggregations
     Database writes/lookups
   Optionally emit additional Tuples
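
The spout and bolt roles can be illustrated without a cluster. The following is a plain-Java sketch that deliberately does not use the Storm API: `MiniPipeline`, `SentenceSpout`, `splitBolt`, and `CountBolt` are hypothetical stand-ins, and everything runs in a single thread, whereas Storm distributes these components across worker nodes.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

// Plain-Java stand-ins for the spout/bolt roles; NOT the Storm API.
class MiniPipeline {
    // "Spout": a source that hands out one tuple (here, a sentence) per call,
    // or null once the source is drained.
    static class SentenceSpout {
        private final Queue<String> source;

        SentenceSpout(String... sentences) {
            source = new ArrayDeque<>(Arrays.asList(sentences));
        }

        String nextTuple() {
            return source.poll();
        }
    }

    // "Bolt" 1: a function that emits one additional tuple per word.
    static void splitBolt(String sentence, Consumer<String> emit) {
        for (String word : sentence.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                emit.accept(word);
            }
        }
    }

    // "Bolt" 2: an aggregation that keeps a running count per word.
    static class CountBolt {
        final Map<String, Integer> counts = new HashMap<>();

        void execute(String word) {
            counts.merge(word, 1, Integer::sum);
        }
    }

    // Wire spout -> split -> count and run until the spout is drained.
    static Map<String, Integer> run(String... sentences) {
        SentenceSpout spout = new SentenceSpout(sentences);
        CountBolt counter = new CountBolt();
        String tuple;
        while ((tuple = spout.nextTuple()) != null) {
            splitBolt(tuple, counter::execute);
        }
        return counter.counts;
    }
}
```

Calling `MiniPipeline.run("the cow jumped", "the moon")` counts "the" twice and every other word once.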
Storm Topologies
   Data flow between spouts and bolts
   Routing of Tuples between spouts/bolts
     Stream “Groupings”
   Parallelism of Components
   Long-Lived
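
A stream grouping decides which parallel task of the next component receives each tuple. As a rough sketch only (Storm's internal routing may differ), a fields-style grouping can be pictured as hashing the grouping field so that equal values always reach the same task, while a shuffle-style grouping picks any task. `GroupingSketch` and its methods are hypothetical names.

```java
import java.util.Random;

// Sketch of two routing strategies; Storm's internal implementation may differ.
class GroupingSketch {
    // Fields-style grouping: the same key always maps to the same task index.
    static int fieldsGrouping(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    // Shuffle-style grouping: any task, chosen at random.
    static int shuffleGrouping(Random rng, int numTasks) {
        return rng.nextInt(numTasks);
    }
}
```

The fields-style rule is what makes a per-key count correct under parallelism: every occurrence of a given word reaches the one task that holds its counter.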
Storm and Cassandra
   Use Cases:
     Write Storm Tuple data to C*
      ○ Computation Results
      ○ Pre-computed indices


     Read data from C* and emit Storm Tuples
      ○ Dynamic Lookups




                      http://github.com/hmsonline/storm-cassandra
Storm Cassandra Bolt Types
   [Diagram: CassandraBolt, CassandraLookupBolt, C*]
   CassandraBolt
     Writes data to Cassandra
     Available in batching and non-batching variants
   CassandraLookupBolt
     Reads data from Cassandra
Storm-Cassandra Project
   Provides generic Bolts for writing/reading Storm Tuples to/from C*
   [Diagram: Tuple, Tuple Mapper, Rows; Tuples, Columns Mapper, Columns; C*]
Storm-Cassandra Project
   TupleMapper Interface
     Tells the CassandraBolt how to write a tuple to an arbitrary data model


   Given a Storm Tuple:
     Map to Column Family
     Map to Row Key
     Map to Columns
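
The three mapping steps can be sketched as an interface. This is a simplified, hypothetical shape and not the storm-cassandra API: `SimpleTupleMapper` and `KeyFieldTupleMapper` are invented names, a `Map<String, Object>` stands in for a Storm Tuple, and all column names and values are treated as strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified mapper; a Map stands in for a Storm Tuple.
interface SimpleTupleMapper {
    String mapToColumnFamily(Map<String, Object> tuple);

    String mapToRowKey(Map<String, Object> tuple);

    Map<String, String> mapToColumns(Map<String, Object> tuple);
}

// Example: one named field is the row key, every other field becomes a column.
class KeyFieldTupleMapper implements SimpleTupleMapper {
    private final String columnFamily;
    private final String rowKeyField;

    KeyFieldTupleMapper(String columnFamily, String rowKeyField) {
        this.columnFamily = columnFamily;
        this.rowKeyField = rowKeyField;
    }

    public String mapToColumnFamily(Map<String, Object> tuple) {
        return columnFamily;
    }

    public String mapToRowKey(Map<String, Object> tuple) {
        return String.valueOf(tuple.get(rowKeyField));
    }

    public Map<String, String> mapToColumns(Map<String, Object> tuple) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : tuple.entrySet()) {
            if (!field.getKey().equals(rowKeyField)) {
                columns.put(field.getKey(), String.valueOf(field.getValue()));
            }
        }
        return columns;
    }
}
```

Keeping the data model behind a mapper like this is what lets a generic write bolt stay unaware of how any particular column family is laid out.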


Storm-Cassandra Project
   ColumnsMapper Interface
     Tells the CassandraLookupBolt how to transform a C* row into a Storm Tuple


   Given a C* Row Key and list of Columns:
     Return a list of Storm Tuples
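
The read direction can be sketched the same way. Again this is a hypothetical, simplified shape rather than the storm-cassandra API: `SimpleColumnsMapper` and `TuplePerColumnMapper` are invented names, and a `List<Object>` of values stands in for a Storm Tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified mapper; Lists of values stand in for Storm Tuples.
interface SimpleColumnsMapper {
    List<List<Object>> mapToValues(String rowKey, Map<String, String> columns);
}

// Example: emit one (rowKey, columnName, columnValue) tuple per column.
class TuplePerColumnMapper implements SimpleColumnsMapper {
    public List<List<Object>> mapToValues(String rowKey, Map<String, String> columns) {
        List<List<Object>> tuples = new ArrayList<>();
        for (Map.Entry<String, String> column : columns.entrySet()) {
            tuples.add(Arrays.<Object>asList(rowKey, column.getKey(), column.getValue()));
        }
        return tuples;
    }
}
```

Returning a list rather than a single tuple is what allows one wide row to fan out into many downstream tuples.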




Storm-Cassandra Project
   Current State:
     Version 0.4.0-WIP
     Uses Astyanax Client
     Several out-of-the-box *Mapper Implementations:
      ○ Basic Key-Value Columns
      ○ Value-less Columns
      ○ Counter Columns
      ○ Lookup by row key
      ○ Lookup by range query
     Initial pass at Trident support
     Initial pass at Composite Column Support

Storm-Cassandra Project
   Future Plans:
     Switch to CQL (???)
     Full Trident Support




Word Count Demo




          http://github.com/hmsonline/storm-cassandra
DRPC
Reach Demo
Trident
   Provides a higher-level abstraction for stream processing
     Constructs for state management and batching
   Adds additional primitives that abstract away common topological patterns
   Deprecates transactional topologies
   Distributed with Storm
Sample Trident Operations
   Partition Local
     Functions      ( execute(x) → x + y )
     Filters        ( isKeep(x) → 0,x )
     PartitionAggregate
      ○ Combiner    ( pairwise combining )
      ○ Reducer     ( iterative accumulation )
      ○ Aggregator ( byoa )
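
The difference between pairwise combining and iterative accumulation can be shown in plain Java. This sketch does not use the Trident interfaces: `AggregationSketch`, `Combiner`, and `Reducer` are hypothetical names chosen for illustration, and the example aggregates an in-memory list rather than a partitioned stream.

```java
import java.util.List;

// Plain-Java stand-ins for the two aggregation styles; NOT the Trident interfaces.
class AggregationSketch {
    // Combiner style: turn each input into a partial value, then merge partials pairwise.
    interface Combiner<I, T> {
        T init(I input);

        T combine(T left, T right);

        T zero();
    }

    // Reducer style: fold each input into a running accumulator.
    interface Reducer<I, T> {
        T init();

        T reduce(T current, I input);
    }

    static <I, T> T aggregate(Combiner<I, T> combiner, List<I> inputs) {
        T result = combiner.zero();
        for (I input : inputs) {
            result = combiner.combine(result, combiner.init(input));
        }
        return result;
    }

    static <I, T> T aggregate(Reducer<I, T> reducer, List<I> inputs) {
        T result = reducer.init();
        for (I input : inputs) {
            result = reducer.reduce(result, input);
        }
        return result;
    }

    // Counting expressed both ways; both give the same total.
    static final Combiner<String, Long> COUNT_COMBINER = new Combiner<String, Long>() {
        public Long init(String input) { return 1L; }
        public Long combine(Long left, Long right) { return left + right; }
        public Long zero() { return 0L; }
    };

    static final Reducer<String, Long> COUNT_REDUCER = new Reducer<String, Long>() {
        public Long init() { return 0L; }
        public Long reduce(Long current, String input) { return current + 1; }
    };
}
```

Because a combiner's partial values can be merged in any grouping, partial results can be computed per partition before being sent over the network; a reducer has to see the inputs one at a time.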
A sample topology
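// Word count as a Trident topology; `spout` and `serverLocations` are defined elsewhere.
TridentTopology topology = new TridentTopology();
TridentState wordCounts =
    topology.newStream("spout1", spout)
        // Split each sentence into one tuple per word.
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        // Route equal words to the same partition.
        .groupBy(new Fields("word"))
        // Keep a running count per word in Memcached, using opaque state.
        .persistentAggregate(
            MemcachedState.opaque(serverLocations),
            new Count(),
            new Fields("count"))
        .parallelismHint(6);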

                     https://github.com/nathanmarz/storm/wiki/Trident-state
Trident State
Sequenced writes by batch/transaction id.
   Spouts
     Transactional
      ○ Batch contents never change
     Opaque
      ○ Batch contents can change
   State
     Transactional
      ○ Store tx_id with counts to maintain sequencing of writes.
     Opaque
      ○ Store previous value in order to overwrite the current value
        when contents of a batch change.
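
The two update rules can be written out for a single counter. This is a sketch of the bookkeeping only, not the Trident State API: `StateSketch`, `TransactionalValue`, and `OpaqueValue` are hypothetical names, and the sketch assumes batches are applied in txid order with -1 meaning "no batch applied yet".

```java
// Sketch of the two update rules for a single counter; not the Trident State API.
class StateSketch {
    // Transactional: the value plus the txid of the last batch applied.
    static class TransactionalValue {
        long count;
        long txid = -1; // -1 means no batch applied yet (assumption of this sketch)

        void apply(long batchTxid, long batchCount) {
            if (batchTxid == txid) {
                return; // replay of an identical batch: already applied, skip
            }
            count += batchCount;
            txid = batchTxid;
        }
    }

    // Opaque: additionally keeps the value from before the last batch.
    static class OpaqueValue {
        long count;
        long prev;
        long txid = -1;

        void apply(long batchTxid, long batchCount) {
            if (batchTxid == txid) {
                // Same txid again: contents may differ, so recompute from the previous value.
                count = prev + batchCount;
            } else {
                prev = count;
                count += batchCount;
                txid = batchTxid;
            }
        }
    }
}
```

The skip in `TransactionalValue` is only safe because a transactional spout replays exactly the same batch; `OpaqueValue` pays for one extra stored value so that a replay with different contents still overwrites correctly.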
Shameless Shoutouts
   HMS (https://github.com/hmsonline/)
     storm-cassandra
     storm-elastic-search
     storm-jdbi (coming soon)


   ptgoetz (https://github.com/ptgoetz)
     storm-jms
     storm-signals

C*ollege Credit: CEP Distributed Processing on Cassandra with Storm

  • 1. Brian O’Neill Taylor Goetz Lead Architect, Health Market Science Development Lead, Health Market Science boneill@healthmarketscience.com ptgoetz@healthmarketscience.com @boneill42 @ptgoetz
  • 2. Agenda  Use Case  What is CEP? Why? What for?  Storm  Background  Cluster configuration  Examples / Demo  Future : Trident
  • 3. Our Products  Master Data Management  Good, bad doctors?  Prescriber eligibility and remediation.
  • 4. Cassandra to the Rescue 1000’s of Feeds C* Masterfile Big Data for us == Variety of Data Δt
  • 5. But…  Search unstructured data  Real-time Analytics / Reporting  Transactional Processing  Changes reflected immediately.  Wide-row Indexes
  • 6. What might that look like? wide-row index C* I’m happy RDBMS Provide for Polyglot Persistence
  • 7. What we did wrong… (part 1)  Could not react to transactional changes  Needed extra logic to track what changed  Took too long
  • 8. What we did wrong… (part 2) Wide Row …. C* Indexing TRIGGERS  Good Intentions  Take the onus off the clients  Bad Result  Guaranteeing execution  Write Overhead
  • 9. What is Complex Event Processing?  Event processing is a method of tracking and analyzing (processing) streams of information (data) about things that happen (events),[1] and deriving a conclusion from them.  Complex event processing, or CEP, is event processing that combines data from multiple sources[2] to infer events or patterns that suggest more complicated circumstances. http://en.wikipedia.org/wiki/Complex_event_processing
  • 10. Imagine…  Treating CRUD operations as events in a system.  Then Suddenly, CEP = (ETL or Analytics)
  • 11. What Storm is to us…  A High Throughput Data Processing Pipeline [Diagram labels: Crud Op; ETL; Dimensional; Enrichment; Counts; Fuzzy; SoR; RDBMS; Index]
  • 12.
  • 13. Storm Overview  Open-Sourced by Twitter in 2011  Distributed Realtime Computation System  Fault Tolerant  Highly Scalable  Guaranteed Processing  Operates on one or more streams of data (i.e. CEP)
  • 14. Anatomy of a Storm Cluster  Nimbus  Master Node  Zookeeper  Cluster Coordination  Supervisors  Worker Nodes
  • 15. Storm Components  Spouts  Stream Sources  Bolts  Unit of Computation  Topologies  Combination of n Spouts and n Bolts  Defines the overall “Computation”
  • 16. Storm Spouts  Represents a source (stream) of data  Queues (JMS, Kafka, Kestrel, etc.)  Twitter Firehose  Sensor Data  Emits “Tuples” (Events) based on source  Primary Storm data structure  Set of Key-Value pairs
  • 17. Storm Bolts  Receive Tuples from Spouts or other Bolts  Operate on, or React to Data  Functions/Filters/Joins/Aggregations  Database writes/lookups  Optionally emit additional Tuples
  • 18. Storm Topologies  Data flow between spouts and bolts  Routing of Tuples between spouts/bolts  Stream “Groupings”  Parallelism of Components  Long-Lived
  • 19. Storm and Cassandra  Use Cases:  Write Storm Tuple data to C* ○ Computation Results ○ Pre-computed indices  Read data from C* and emit Storm Tuples ○ Dynamic Lookups http://github.com/hmsonline/storm-cassandra
  • 20. Storm Cassandra Bolt Types [Diagram labels: CassandraBolt; CassandraLookupBolt; C*]  CassandraBolt: writes data to Cassandra; available in Batching and Non-Batching  CassandraLookupBolt: reads data from Cassandra http://github.com/hmsonline/storm-cassandra
  • 21. Storm-Cassandra Project  Provides generic Bolts for writing/reading Storm Tuples to/from C* [Diagram labels: Tuple; Tuple Mapper; Rows; Tuples; Columns Mapper; Columns; C*] http://github.com/hmsonline/storm-cassandra
  • 22. Storm-Cassandra Project  TupleMapper Interface  Tells the CassandraBolt how to write a tuple to an arbitrary data model  Given a Storm Tuple:  Map to Column Family  Map to Row Key  Map to Columns http://github.com/hmsonline/storm-cassandra
  • 23. Storm-Cassandra Project  ColumnsMapper Interface  Tells the CassandraLookupBolt how to transform a C* row into a Storm Tuple  Given a C* Row Key and list of Columns:  Return a list of Storm Tuples http://github.com/hmsonline/storm-cassandra
  • 24. Storm-Cassandra Project  Current State:  Version 0.4.0-WIP  Uses Astyanax Client  Several out-of-the-box *Mapper Implementations: ○ Basic Key-Value Columns ○ Value-less Columns ○ Counter Columns ○ Lookup by row key ○ Lookup by range query  Initial pass at Trident support  Initial pass at Composite Column Support http://github.com/hmsonline/storm-cassandra
  • 25. Storm-Cassandra Project  Future Plans:  Switch to CQL (???)  Full Trident Support http://github.com/hmsonline/storm-cassandra
  • 26. Word Count Demo http://github.com/hmsonline/storm-cassandra
  • 27. DRPC
  • 29.
  • 30. Trident  Provides a higher-level abstraction for stream processing  Constructs for state management and batching  Adds additional primitives that abstract away common topological patterns  Deprecates transactional topologies  Distributed with Storm
  • 31. Sample Trident Operations  Partition Local  Functions ( execute(x) → x + y )  Filters ( isKeep(x) → 0,x )  PartitionAggregate ○ Combiner ( pairwise combining ) ○ Reducer ( iterative accumulation ) ○ Aggregator ( byoa )
  • 32. A sample topology
        TridentTopology topology = new TridentTopology();
        TridentState wordCounts =
            topology.newStream("spout1", spout)
                .each(new Fields("sentence"), new Split(), new Fields("word"))
                .groupBy(new Fields("word"))
                .persistentAggregate(
                    MemcachedState.opaque(serverLocations),
                    new Count(),
                    new Fields("count"))
                .parallelismHint(6);
        https://github.com/nathanmarz/storm/wiki/Trident-state
  • 33. Trident State  Sequenced writes by batch/transaction id.  Spouts: Transactional (batch contents never change); Opaque (batch contents can change).  State: Transactional (store tx_id with counts to maintain sequencing of writes); Opaque (store previous value in order to overwrite the current value when contents of a batch change).
  • 34. Shameless Shoutouts  HMS (https://github.com/hmsonline/)  storm-cassandra  storm-elastic-search  storm-jdbi (coming soon)  ptgoetz (https://github.com/ptgoetz)  storm-jms  storm-signals
  • 35. Brian O’Neill Taylor Goetz Lead Architect, Health Market Science Development Lead, Health Market Science boneill@healthmarketscience.com ptgoetz@healthmarketscience.com @boneill42 @ptgoetz
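
The transactional and opaque state rules from slide 33 can be sketched in plain Java. This is a minimal illustration, not Trident's API: the class names `TransactionalCount` and `OpaqueCount` are hypothetical, and in-memory fields stand in for whatever backing store (Cassandra, Memcached) a real state implementation would write to. It assumes batches are applied in transaction-id order, as Trident's sequenced writes imply.

```java
import java.util.Objects;

// Hypothetical sketch of the two state update rules; not part of Storm or Trident.
public class TridentStateSketch {

    /** Transactional state: the count is stored together with the id of the last batch applied. */
    static final class TransactionalCount {
        Long txId;   // null until the first batch is applied
        long count;

        void apply(long batchTxId, long delta) {
            if (Objects.equals(txId, batchTxId)) {
                return; // replay of the same batch (contents never change): already counted
            }
            count += delta;
            txId = batchTxId;
        }
    }

    /** Opaque state: the previous value is kept so a replayed batch can overwrite the current one. */
    static final class OpaqueCount {
        Long txId;   // null until the first batch is applied
        long prev;   // value before the batch identified by txId
        long curr;   // value after the batch identified by txId

        void apply(long batchTxId, long delta) {
            if (Objects.equals(txId, batchTxId)) {
                // Same tx id, but the batch contents may differ: recompute from prev.
                curr = prev + delta;
            } else {
                prev = curr;
                curr = curr + delta;
                txId = batchTxId;
            }
        }
    }

    public static void main(String[] args) {
        TransactionalCount t = new TransactionalCount();
        t.apply(1, 3);
        t.apply(1, 3); // replay of batch 1, ignored
        t.apply(2, 2);
        System.out.println("transactional count = " + t.count); // prints 5

        OpaqueCount o = new OpaqueCount();
        o.apply(1, 3);
        o.apply(2, 2); // prev = 3, curr = 5
        o.apply(2, 4); // batch 2 replayed with different contents: curr = 3 + 4
        System.out.println("opaque count = " + o.curr); // prints 7
    }
}
```

The transactional variant can simply skip a replayed batch because a transactional spout guarantees the batch is identical. The opaque variant cannot make that assumption, which is why it pays for an extra stored value per key.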