Seattle Monthly Hadoop / Scalability / NoSQL Meetup

    DeWayne Flippi, GigaSpaces

B. Todd Burruss, Expedia (Cassandra)
Agenda
•   Lightning talks / community announcements
•   Main session
•   Bier @ Feierabend - 422 Yale Ave North
•   Hashtags #Seattle #Hadoop
GigaSpaces:
• DeWayne's talk will cover the joining of a real-
  time service/data fabric with NOSQL big data
  to create a complete linearly scalable solution
  supporting analytics, complex event
  processing, and reporting in both real-time
  and batch domains.
Expedia (Cassandra):
• Todd's session: Expedia needs the ability
  to search by price in a fast and efficient
  manner. Prices are complex objects
  containing base rate, taxes, fees, etc
  which means a calculation is required to
  determine the customer price. This
  makes searching by price difficult. What
  to do?
Building An Elastic Real Time NoSQL Platform
      Creating a platform for unlimited elastic
         computation power and storage
Motivation
• Complete elastic solution stack
• Applications that need massive “strategic” storage
  (disk-based NoSQL) and a real time (“tactical”)
  component
• Horizontally and vertically scalable
• Highly available
• Self healing
• Fault tolerant: suitable for commodity h/w strategy
• Simplified management and monitoring, vs
  conventional, multi-product solutions

® Copyright 2011 Gigaspaces
Ltd. All Rights Reserved
What Is Real-Time?
• In this context, “real time” means “really fast”.
• Reads as low as 5 μs; a fully replicated write
  typically completes in under 1 ms.




  Source: http://blog.gigaspaces.com/2010/12/06/possible-impossibility-the-race-to-zero-latency/

Two Layer Approach
• Advantage: Minimal “impedance mismatch” between layers.
    – Both NoSQL cluster technologies, with similar advantages
• Grid layer serves as an in memory cache for interactive
  requests.
• Grid layer serves as a real time computation fabric for CEP, and
  limited (to allocated memory) real time map/reduce capability.

[Diagram labels: Raw Event Stream (x3), Real Time Events (x2), Reporting Engine, In Memory Compute Cluster (Raw And Derived Events), NoSQL Cluster, SCALE (x2)]
Two Layer Approach (continued)
• Grid layer doing CEP can act as a filter, as
  many raw events get converted to
  semantic/business events, reducing
  meaningless data verbosity
• Grid layer provides scalable messaging
• NoSQL layer provides unlimited cheap storage
  on commodity hardware
• NoSQL layer provides virtually unlimited scale
  processing power
Basics Of In Memory Data Grid Technology
 • An In Memory Data Grid (IMDG) is a data store
 • Grid just means “cluster”
 • Data can be partitioned across cluster nodes
 • Processing power near data storage
 • Distributed hash table
 • Application optimized data model denormalization
 • Nodes are typically configured with one or more
   replicas (sound familiar yet?)
 • Not a “cache”: a system of record, but can be used as a
   cache, or both
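
As a rough illustration of the partitioning and replica bullets above, the sketch below maps a key to a primary node and one backup using a stable hash. The node names, the `placement` helper, and the "next node in the list" backup rule are all assumptions made for the example; the slides do not describe how any particular grid product routes keys.

```python
import hashlib
from typing import List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a stable hash.

    Python's built-in hash() is salted per process, so a digest is used
    to keep the mapping identical on every node.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def placement(key: str, nodes: List[str], replicas: int = 1) -> List[str]:
    """Return the primary node followed by `replicas` backup nodes.

    Assumption for illustration: backups are simply the next nodes in
    the list, wrapping around at the end.
    """
    primary = partition_for(key, len(nodes))
    return [nodes[(primary + i) % len(nodes)] for i in range(replicas + 1)]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
owners = placement("hotel:12345", nodes, replicas=1)
print(owners)  # primary first, then its backup
```

Because the hash is deterministic, any client can compute the owner of a key locally, which is what lets code be sent to the node that already holds the data.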

Advanced Capabilities
•    Business logic (code) co-resident with data shards
•    Scalable messaging
•    Dynamic code execution across cluster
•    Multi-language support
•    Object-oriented
•    Document-oriented/schema free
•    Multi-level indexing
•    SQL Queries
•    Full ACID transaction support
•    Elastic scaling (automatic and manual)
•    Write-behind persistence
Features: IMDG vs NoSQL
Data Grid only:
• Low Latency
• Service remoting
• Transactional
• Messaging
• Complex Event Processing

Shared by Data Grid and Disk Based NoSQL:
• Horizontally Scalable
• Code co-location
• Parallel Execution
• Fault Tolerant
• Cloud enabled
• Highly Available
• Elastic
• Platform Independent
• Flexible Schema

Disk Based NoSQL only:
• Eventual/Tunable Consistency
• Unlimited scale
• Hadoop tools
Vive La Difference
• The IMDG complements a NoSQL store:
      – Can serve as a short term request cache (side cache or
        inline)
      – Can serve as a cache for MR results
      – Enables event driven architectures / CEP
      – In memory map/reduce
      – Very fast writes, regardless of NoSQL store
      – Transactional layer: can essentially turn “eventual”
        consistency into pure transactional persistency
        without a performance hit
      – Highly available and independently scalable

A Complete Scalable Application Platform

[Diagram labels: Raw Event Stream (x3), Real Time Events (x2), Reporting Engine, In Memory Compute Cluster (Raw And Derived Events), NoSQL Cluster, SCALE (x2)]
Key Implementation Issues
• Grid must support reliable asynchronous persistence
      – If not reliable: in-flight data is at risk. Ideally tunable to
        accommodate differing risk tolerance.
      – If not asynchronous: too slow
      – If not persistent: obviously nothing gets sent to disk

• To do more than a distributed cache, grid must support
  code and data partitioning
      – Ideally, code is collocated in memory with data partition
      – Needed to support CEP, application, and service remoting
        capabilities

Key Implementation Issues (continued)
• Grid ideally supports FIFO entry ordering
     – Key to using grid as a queue
     – Key to scaling messaging without an additional tier
     – Combined with co-located business logic, operates at memory
       speeds

• Write speed on the NoSQL layer
     – Grid is, in effect, queuing entries to the NoSQL layer
     – If the NoSQL layer cannot keep up, in memory grid backs up
     – This behavior is an asset, unless an unanticipated, sustained
       flood occurs.
     – The faster the write speed the better
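
The queuing behavior described above can be sketched as a bounded FIFO write-behind buffer. This is a minimal single-process illustration, not any grid product's implementation: `WriteBehindBuffer`, `store_write`, and `max_pending` are hypothetical names, the pending queue lives only in local memory (so it is not the "reliable" persistence the slides call for), and there is no error handling if the backing store fails.

```python
import queue
import threading


class WriteBehindBuffer:
    """Accept writes at memory speed and drain them to a slower store.

    `store_write` is any callable taking (key, value); here it stands in
    for a write to the disk-based NoSQL layer.
    """

    def __init__(self, store_write, max_pending=1000):
        self._store_write = store_write
        # A bounded queue: when the store cannot keep up, writers block,
        # which is the "grid backs up" behavior from the slide.
        self._pending = queue.Queue(maxsize=max_pending)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, key, value, timeout=None):
        """Enqueue a write; blocks (or raises queue.Full) when the buffer is full."""
        self._pending.put((key, value), timeout=timeout)

    def _drain(self):
        # A single worker preserves FIFO order of writes to the store.
        while True:
            item = self._pending.get()
            try:
                if item is None:  # sentinel: stop draining
                    return
                self._store_write(*item)
            finally:
                self._pending.task_done()

    def flush(self):
        """Block until every queued write has reached the store."""
        self._pending.join()

    def close(self):
        self._pending.put(None)
        self._worker.join()


backing_store = {}  # stands in for the NoSQL layer
buf = WriteBehindBuffer(backing_store.__setitem__, max_pending=2)
for i in range(5):
    buf.write(f"k{i}", i)
buf.flush()
buf.close()
print(backing_store)
```

With `max_pending=2`, the loop above is throttled to the store's pace once two writes are outstanding; a larger bound absorbs longer bursts at the cost of more in-flight data at risk.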


Use Case 1 – Event Cloud
• Complex event processing
        Collect events in real time    Transform into decision factors
        • Interactions                 • Good customer
        • Orders                       • Pays 3-6 days early
        • Bills                        • Decreasing usage
        • Payments                     • Missed payment
        • Activations                  • Unusual bill
        • …                            • App usage


• Original events, possibly scrubbed or annotated, are passed
  through
• Business logic derived “synthetic events” constructed from
  raw event stream. Possible rule engine integration (e.g.
  Drools).
• Derived events and analytics passed on to NoSQL layer
• Other events forwarded to external listeners, systems
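
A small sketch of the "raw events in, decision factors out" idea follows. The `Payment` and `DerivedEvent` types, the field names, and the factor labels are assumptions made for illustration; the two rules are loosely modeled on the "Pays 3-6 days early" and "Missed payment" examples in the list above, and a real deployment might delegate them to a rule engine instead.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Payment:
    """A raw event (hypothetical shape)."""
    customer_id: str
    due: date
    paid: Optional[date]  # None means no payment was received


@dataclass(frozen=True)
class DerivedEvent:
    """A synthetic, business-level event derived from raw events."""
    customer_id: str
    factor: str


def derive(events: Iterable[Payment]) -> Iterator[DerivedEvent]:
    """Turn raw payment events into decision factors."""
    for ev in events:
        if ev.paid is None:
            yield DerivedEvent(ev.customer_id, "missed_payment")
            continue
        days_early = (ev.due - ev.paid).days
        if 3 <= days_early <= 6:
            yield DerivedEvent(ev.customer_id, "pays_3_6_days_early")
        # Any other payment produces no derived event, so the derived
        # stream is smaller than the raw stream.


raw = [
    Payment("c1", date(2011, 11, 10), date(2011, 11, 6)),   # 4 days early
    Payment("c2", date(2011, 11, 10), None),                # missed
    Payment("c3", date(2011, 11, 10), date(2011, 11, 10)),  # on time
]
derived = list(derive(raw))
print(derived)
```

Three raw events yield two derived events here, which is the filtering effect the two-layer slides attribute to CEP in the grid layer.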
Use Case 2 – Time Bounded
• Time Bounded – suited to operations with daily business cycle (e.g.
  trading)
• Current day (or other time period that will fit in memory) held in
  memory, along with related application state, caching etc…
• Still streaming operations to underlying NoSQL platform, or hold for
  end of day flush if back end can’t write fast enough.
• Supports application hosting, messaging, and complex event
  processing.
• External clients are aware of “current day” store, vs archival.
• Large scale reports/analytics run in background on NoSQL archive.




Use Case 3 - LRU
• Grid holds a subset of NoSQL store, and
  supports an LRU caching model.
• In line or side-cache.
• Appropriate only in cases where, like any
  cache, usage pattern does not generate many
  cache misses.
• Still supports CEP, messaging, and
  computation scaling (provided grid product
  supports it).
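
The LRU model above can be sketched with an ordered dictionary. This is a read-through (inline) variant under stated assumptions: `LruCache`, `load`, and `capacity` are names invented for the example, and `load` stands in for a read from the NoSQL store.

```python
from collections import OrderedDict


class LruCache:
    """Hold a bounded subset of a backing store, evicting the least
    recently used entry when capacity is exceeded."""

    def __init__(self, load, capacity):
        self._load = load            # callable: key -> value from the store
        self._capacity = capacity
        self._entries = OrderedDict()
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            # Hit: mark the entry as most recently used.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Miss: read through to the backing store and cache the result.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return value


store = {"a": 1, "b": 2, "c": 3}  # stands in for the NoSQL store
cache = LruCache(store.__getitem__, capacity=2)
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key)
print(cache.misses)
```

In this access pattern four of the five reads miss, because the working set (three keys) is larger than the cache (two entries); that is the situation the slide warns against.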

Wishlist
• This platform concept is still at an early stage
• For Gigaspaces, integrations already exist for Cassandra
  and MongoDB.
• Customers are currently implementing solutions
• Stuff I’d like to see:
      – Unified management and scaling. Shared infrastructure.
      – Grid/NoSQL aware Hive façade that can run MR jobs on
        both. Perhaps integration with other Hadoop tools
      – Deeper integration. To further optimize write
        speed/capacity, and perhaps offload some in-memory
        aspects of underlying NoSQL platform to minimize
        duplication and possibly optimize elasticity.

Conclusion
• Two shared nothing “NoSQL” architectures
  complementing each other
• Fully elastic/scalable
• Ultra high performance/low latency combined
  with unlimited scale.
• Full application stack
• Highly reliable and self-healing
• Scalable complex event handling
• Multi-language
• Simple. Two products.
DataStax is the company behind Apache Cassandra. Besides
contributing the majority of the code for the open source
project, DataStax also provides products and services for
Apache Cassandra.
• DataStax Community – the 100% free way to get started with Apache
  Cassandra (free management software and packaging!)
• DataStax Enterprise – Hadoop Analytics and Support!

Download + Docs at http://www.datastax.com/dev
Expedia Hotel Price Cache

       B. Todd Burruss
Who am I
•   B. Todd Burruss – Sr Architect, Expedia
•   Worked with Cassandra for nearly 2 years
•   Committer on Hector (Java Client)
•   General testing on Cassandra, working with
    community, but not committer
Expedia’s Motivation
We need the ability to search by price in a fast
and efficient manner. Prices are complex
objects containing base rate, taxes, fees, etc
which means a calculation is required to
determine the customer price. This makes
searching by price difficult. What to do?
What to Do?
• Precalculate Total Price! Let’s look at Hotels
• Hotel prices vary based on Date, Length of
  Stay (LOS), Number Adult Travelers (AT)
• Customers book in advance so must have
  prices fairly far into the future (1 year)
• Approximately 140,000 hotels in our inventory
• Support 1-14 LOS and 1-4 AT
• Over 2 billion prices!
Example of Hotel Pricing
• A customer’s family of 4 wants to stay 7 nights
  at the Hilton in Maui, checkin on 12/1/2011,
  checkout on 12/8/2011
• Each night could be a different rate because of
  day of week, conference in the area, holiday,
  etc.
• So must sum the rate, taxes and fees for each
  night to get the total room price
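
The per-night summation can be sketched as below. The `NightlyPrice` type and every amount in the example are invented for illustration; the slides give no actual rates, taxes, or fees.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable


@dataclass(frozen=True)
class NightlyPrice:
    """One night's price components (hypothetical shape)."""
    base_rate: Decimal
    taxes: Decimal
    fees: Decimal

    def total(self) -> Decimal:
        return self.base_rate + self.taxes + self.fees


def stay_total(nights: Iterable[NightlyPrice]) -> Decimal:
    """Sum rate, taxes and fees over every night of the stay."""
    return sum((night.total() for night in nights), Decimal("0"))


# Made-up numbers: five weekday nights and two pricier weekend nights.
weekday = NightlyPrice(Decimal("200.00"), Decimal("25.00"), Decimal("10.00"))
weekend = NightlyPrice(Decimal("250.00"), Decimal("31.25"), Decimal("10.00"))
nights = [weekday] * 5 + [weekend] * 2

print(stay_total(nights))
```

Doing this calculation at query time for every candidate hotel is what makes searching by price slow, which motivates precalculating the total.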
Use Case : Median Price
• Ex: What is the median hotel price in Seattle
  for each day between 11/1 and 11/30?
• 200 * 30 = 6,000 prices returned from
  Cassandra – median calculated on client.
• Idea is customer searches city and date
  range, then narrows search to smaller area
  and dates
• Prices are volatile, so want close to real-time
  updates
Enter Cassandra : Expectations
• Cassandra can handle large amounts of data
  nicely : billions of price objects
• Cassandra is very fast (read and write). Can
  handle the volatile prices
• Cluster expands easily – our dataset is growing
• Easy to set up, administer and use
• Operational costs are good
• Support is available
Solution : Data Model
• 1 ColumnFamily : Prices
• Row key : date + LOS + AT
• Column name : Hotel ID – 140,000 columns
  (integer comparator)
• Column value : precalculated hotel price for date
  + LOS + AT
• 365 * 14 * 4 = 20,440 row keys
• 20,440 * 140,000 = 2,861,600,000 price objects
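
The row-key arithmetic above can be checked by enumerating the keys. The string encoding used by `row_key` (date, LOS and AT joined with colons) is an assumption; the slides say only that the key is "date + LOS + AT".

```python
from datetime import date, timedelta
from itertools import product


def row_key(checkin: date, los: int, adults: int) -> str:
    """Build a row key from check-in date, length of stay and adult count.

    The "YYYYMMDD:LOS:AT" layout is hypothetical.
    """
    return f"{checkin:%Y%m%d}:{los}:{adults}"


def all_row_keys(start: date, days: int = 365, max_los: int = 14, max_adults: int = 4):
    """Yield one key per (check-in day, LOS, AT) combination."""
    for offset, los, adults in product(
        range(days), range(1, max_los + 1), range(1, max_adults + 1)
    ):
        yield row_key(start + timedelta(days=offset), los, adults)


keys = list(all_row_keys(date(2011, 11, 1)))
hotels = 140_000  # column count per row, from the slide
print(len(keys), len(keys) * hotels)
```

Each row holds one column per hotel, so the number of stored prices is the number of row keys multiplied by the number of hotels.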
Solution : Retrieving Prices
• Generate keys for each checkin day, LOS, AT
  combination wanted
• Query Cassandra using the generated keys,
  using specific column names (hotel IDs)
• For family example, one key, one column =
  12/1/2011 + 7 + 4 = total price for Hilton hotel
• For median example, 30 keys, 200 columns
  per key. Client receives 30 result rows, then
  calculates median for each row
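
The median retrieval can be sketched end to end with an in-memory stand-in for the cluster. Nothing here calls Cassandra or Hector: `fetch_prices` is a hypothetical helper that mimics a multi-row read restricted to named columns, the key layout is assumed, and the prices are invented.

```python
from datetime import date, timedelta
from statistics import median


def keys_for_range(first_checkin, days, los, adults):
    """One row key per check-in day (assumed "YYYYMMDD:LOS:AT" layout)."""
    keys = []
    for offset in range(days):
        day = first_checkin + timedelta(days=offset)
        keys.append(f"{day:%Y%m%d}:{los}:{adults}")
    return keys


def fetch_prices(table, row_keys, hotel_ids):
    """Stand-in for a multi-row read of specific columns.

    `table` is a dict of {row_key: {hotel_id: price}}.
    """
    wanted = set(hotel_ids)
    return {
        key: {h: p for h, p in table.get(key, {}).items() if h in wanted}
        for key in row_keys
    }


def median_by_day(rows):
    """Compute the median on the client, one result per returned row."""
    return {key: median(cols.values()) for key, cols in rows.items() if cols}


# Invented data: two days, up to three hotels.
table = {
    "20111101:1:2": {101: 120.0, 102: 95.0, 103: 180.0},
    "20111102:1:2": {101: 130.0, 102: 99.0},
}
keys = keys_for_range(date(2011, 11, 1), 2, los=1, adults=2)
medians = median_by_day(fetch_prices(table, keys, [101, 102, 103]))
print(medians)
```

Scaling the same call to 30 days and 200 hotel IDs gives the 30 rows of 200 columns (6,000 prices) described in the median use case.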
Testing Scenario
• Found 19 boxes, 16gb old RAM, 1 old 4 core CPU
• 18th and 19th boxes are clients + Cassandra
  servers (don’t do this in prod)
• Can never find enough hardware :)
• 2 Keyspaces on cluster
• Reduce dataset to 90 days, up to 7 day LOS, up to
  4 AT, 70k hotels – removes disk I/O from test and
  leaves some RAM for caching
• We believe our hot data will be in RAM
• Query 30 days and 200 hotels : 6000 price objects
Results : Page 1
Default Memory and Column Index Settings
• ~50ms : -Xmn400m, 64k index, 8gb, no row
  cache, no key cache : pretty good
• ~800ms : -Xmn400m, 64k index, 8gb, no key
  cache, 600 row cache (Serializing) – copying to
  heap accounts for slowness
Results : Page 2
Change Index to 1k Column Pages:
• ~45ms : -Xmn400m, 1k index, 8gb, no row
  cache, no key cache
• ~29ms : -Xmn400m, 1k index, 8gb, 600 row
  cache (ConcurrentLinkedHash)
1k index saves a little, but data is all in RAM.
Bigger savings when hitting disk
Results : Page 3
Tune Memory
• ~45ms : -Xmn200m, 1k index, 8gb, no row cache,
   no key cache
• ~29ms : -Xmn200m, 1k index, 8gb, 600 row cache
   (ConcurrentLinkedHash)
Increasing Old Gen will not help because all data fits
in RAM, based on reported JVM usage. Reducing
New Gen moves from less frequent long pauses to
more frequent short pauses. No help.
Take Away
• Test is worst case scenario. Completely random
  usage pattern – which is rarely (if ever) the case
  in production. Causes cache churn if cache is too
  small
• Wide rows are not always bad. Accessing columns
  sequentially or by range is very good (e.g. time
  series data)
• Serializing cache has trade off between serialized
  objects and copying to/from off heap storage
References
• Query plan description by Aaron Morton :
  http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
• Disk sizing by B. Todd Burruss (me):
  http://btoddb-cass-storage.blogspot.com/
http://expediajobs.com/
•   Cassandra
•   Hadoop
•   Machine Learning
•   Grid Computing
•   Java

First Operational Technology (OT) High Performance Messaging Patterns for Ent...First Operational Technology (OT) High Performance Messaging Patterns for Ent...
First Operational Technology (OT) High Performance Messaging Patterns for Ent...
 
Balancing Replication and Partitioning in a Distributed Java Database
Balancing Replication and Partitioning in a Distributed Java DatabaseBalancing Replication and Partitioning in a Distributed Java Database
Balancing Replication and Partitioning in a Distributed Java Database
 
Application architecture for cloud
Application architecture for cloudApplication architecture for cloud
Application architecture for cloud
 
Architecture Best Practices on Windows Azure
Architecture Best Practices on Windows AzureArchitecture Best Practices on Windows Azure
Architecture Best Practices on Windows Azure
 
Introduction to AWS: keynote
Introduction to AWS: keynoteIntroduction to AWS: keynote
Introduction to AWS: keynote
 
Netflix Cloud Platform Building Blocks
Netflix Cloud Platform Building BlocksNetflix Cloud Platform Building Blocks
Netflix Cloud Platform Building Blocks
 
Engineered Systems: Oracle’s Vision for the Future
Engineered Systems: Oracle’s Vision for the FutureEngineered Systems: Oracle’s Vision for the Future
Engineered Systems: Oracle’s Vision for the Future
 
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear ScalabilityBeyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
 
SAP Sybase Event Streaming Processing
SAP Sybase Event Streaming ProcessingSAP Sybase Event Streaming Processing
SAP Sybase Event Streaming Processing
 

More from clive boulton

Camlistore reprise at Google NYC
Camlistore reprise at Google NYCCamlistore reprise at Google NYC
Camlistore reprise at Google NYCclive boulton
 
Seattle Scalability meetup intro slides, Jan 22, 2014
Seattle Scalability meetup intro slides, Jan 22, 2014Seattle Scalability meetup intro slides, Jan 22, 2014
Seattle Scalability meetup intro slides, Jan 22, 2014clive boulton
 
Seattle Scalability meetup intro slides - Dec 4, 2013 - Scaling SQL + Scaling...
Seattle Scalability meetup intro slides - Dec 4, 2013 - Scaling SQL + Scaling...Seattle Scalability meetup intro slides - Dec 4, 2013 - Scaling SQL + Scaling...
Seattle Scalability meetup intro slides - Dec 4, 2013 - Scaling SQL + Scaling...clive boulton
 
Seattle scalability meetup intro slides 23 oct 2013
Seattle scalability meetup intro slides 23 oct 2013Seattle scalability meetup intro slides 23 oct 2013
Seattle scalability meetup intro slides 23 oct 2013clive boulton
 
Seattle scalability meetup intro slides 24 july 2013
Seattle scalability meetup intro slides 24 july 2013Seattle scalability meetup intro slides 24 july 2013
Seattle scalability meetup intro slides 24 july 2013clive boulton
 
Seattle Scalability Meetup intro pptx - June 26
Seattle Scalability Meetup intro pptx - June 26Seattle Scalability Meetup intro pptx - June 26
Seattle Scalability Meetup intro pptx - June 26clive boulton
 
Seattle scalability meetup intro ppt May 22
Seattle scalability meetup intro ppt May 22Seattle scalability meetup intro ppt May 22
Seattle scalability meetup intro ppt May 22clive boulton
 
Patent Trollls gonna kill VRM?
Patent Trollls gonna kill VRM?Patent Trollls gonna kill VRM?
Patent Trollls gonna kill VRM?clive boulton
 
Seattle scalability meetup March 27,2013 intro slides
Seattle scalability meetup March 27,2013 intro slidesSeattle scalability meetup March 27,2013 intro slides
Seattle scalability meetup March 27,2013 intro slidesclive boulton
 
Seattle scalability meetup intro
Seattle scalability meetup introSeattle scalability meetup intro
Seattle scalability meetup introclive boulton
 
Seattle Scalability Meetup | Accumulo and WhitePages
Seattle Scalability Meetup | Accumulo and WhitePagesSeattle Scalability Meetup | Accumulo and WhitePages
Seattle Scalability Meetup | Accumulo and WhitePagesclive boulton
 
Seattle Scalability - Sept Meetup
Seattle Scalability - Sept MeetupSeattle Scalability - Sept Meetup
Seattle Scalability - Sept Meetupclive boulton
 
Seattle montly hadoop nosql scalability meetup
Seattle montly hadoop nosql scalability meetupSeattle montly hadoop nosql scalability meetup
Seattle montly hadoop nosql scalability meetupclive boulton
 
Leapfrogging with legacy
Leapfrogging with legacyLeapfrogging with legacy
Leapfrogging with legacyclive boulton
 
Whole Chain Traceability, pulling a Kobayashi Maru.
Whole Chain Traceability, pulling a Kobayashi Maru. Whole Chain Traceability, pulling a Kobayashi Maru.
Whole Chain Traceability, pulling a Kobayashi Maru. clive boulton
 
Whole Chain Traceability Consortium
Whole Chain Traceability ConsortiumWhole Chain Traceability Consortium
Whole Chain Traceability Consortiumclive boulton
 

More from clive boulton (20)

Camlistore reprise at Google NYC
Camlistore reprise at Google NYCCamlistore reprise at Google NYC
Camlistore reprise at Google NYC
 
Riak TS
Riak TSRiak TS
Riak TS
 
Ignitepii2014
Ignitepii2014Ignitepii2014
Ignitepii2014
 
Personal databank
Personal databankPersonal databank
Personal databank
 
Seattle Scalability meetup intro slides, Jan 22, 2014
Seattle Scalability meetup intro slides, Jan 22, 2014Seattle Scalability meetup intro slides, Jan 22, 2014
Seattle Scalability meetup intro slides, Jan 22, 2014
 
Seattle Scalability meetup intro slides - Dec 4, 2013 - Scaling SQL + Scaling...
Seattle Scalability meetup intro slides - Dec 4, 2013 - Scaling SQL + Scaling...Seattle Scalability meetup intro slides - Dec 4, 2013 - Scaling SQL + Scaling...
Seattle Scalability meetup intro slides - Dec 4, 2013 - Scaling SQL + Scaling...
 
Seattle scalability meetup intro slides 23 oct 2013
Seattle scalability meetup intro slides 23 oct 2013Seattle scalability meetup intro slides 23 oct 2013
Seattle scalability meetup intro slides 23 oct 2013
 
Seattle scalability meetup intro slides 24 july 2013
Seattle scalability meetup intro slides 24 july 2013Seattle scalability meetup intro slides 24 july 2013
Seattle scalability meetup intro slides 24 july 2013
 
Seattle Scalability Meetup intro pptx - June 26
Seattle Scalability Meetup intro pptx - June 26Seattle Scalability Meetup intro pptx - June 26
Seattle Scalability Meetup intro pptx - June 26
 
Seattle scalability meetup intro ppt May 22
Seattle scalability meetup intro ppt May 22Seattle scalability meetup intro ppt May 22
Seattle scalability meetup intro ppt May 22
 
Patent Trollls gonna kill VRM?
Patent Trollls gonna kill VRM?Patent Trollls gonna kill VRM?
Patent Trollls gonna kill VRM?
 
Seattle scalability meetup March 27,2013 intro slides
Seattle scalability meetup March 27,2013 intro slidesSeattle scalability meetup March 27,2013 intro slides
Seattle scalability meetup March 27,2013 intro slides
 
Seattle scalability meetup intro
Seattle scalability meetup introSeattle scalability meetup intro
Seattle scalability meetup intro
 
Seattle Scalability Meetup | Accumulo and WhitePages
Seattle Scalability Meetup | Accumulo and WhitePagesSeattle Scalability Meetup | Accumulo and WhitePages
Seattle Scalability Meetup | Accumulo and WhitePages
 
Seattle Scalability - Sept Meetup
Seattle Scalability - Sept MeetupSeattle Scalability - Sept Meetup
Seattle Scalability - Sept Meetup
 
Seattle montly hadoop nosql scalability meetup
Seattle montly hadoop nosql scalability meetupSeattle montly hadoop nosql scalability meetup
Seattle montly hadoop nosql scalability meetup
 
Leapfrogging with legacy
Leapfrogging with legacyLeapfrogging with legacy
Leapfrogging with legacy
 
Whole Chain Traceability, pulling a Kobayashi Maru.
Whole Chain Traceability, pulling a Kobayashi Maru. Whole Chain Traceability, pulling a Kobayashi Maru.
Whole Chain Traceability, pulling a Kobayashi Maru.
 
Whole Chain Traceability Consortium
Whole Chain Traceability ConsortiumWhole Chain Traceability Consortium
Whole Chain Traceability Consortium
 
Wspm
WspmWspm
Wspm
 

Recently uploaded

Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 

Seattle Scalability - GigaSpaces / Cassandra

  • 1. Seattle Monthly Hadoop / Scalability / NoSQL Meetup • DeWayne Flippi, GigaSpaces • B. Todd Burruss, Expedia (Cassandra)
  • 2. Agenda • Lightning talks / community announcements • Main session • Bier @ Feierabend - 422 Yale Ave North • Hashtags #Seattle #Hadoop
  • 3. GigaSpaces: • DeWayne's talk will cover the joining of a real-time service/data fabric with NoSQL big data to create a complete linearly scalable solution supporting analytics, complex event processing, and reporting in both real-time and batch domains.
  • 4. Expedia (Cassandra): • Todd's session: Expedia needs the ability to search by price in a fast and efficient manner. Prices are complex objects containing base rate, taxes, fees, etc., which means a calculation is required to determine the customer price. This makes searching by price difficult. What to do?
  • 5.
  • 6. Building An Elastic Real Time NoSQL Platform Creating a platform for unlimited elastic computation power and storage
  • 7. Motivation • Complete elastic solution stack • Applications that need massive “strategic” storage (disk-based NoSQL) and a real time (“tactical”) component • Horizontally and vertically scalable • Highly available • Self healing • Fault tolerant: suitable for commodity h/w strategy • Simplified management and monitoring, vs conventional, multi-product solutions ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 8. What Is Real-Time? • In this context, means “really fast”. • Reads as low as 5 μs and typically under 1 ms for a fully replicated write. Source: http://blog.gigaspaces.com/2010/12/06/possible-impossibility-the-race-to-zero-latency/
  • 9. Two Layer Approach • Advantage: Minimal “impedance mismatch” between layers. – Both NoSQL cluster technologies, with similar advantages • Grid layer serves as an in memory cache for interactive requests. • Grid layer serves as a real time computation fabric for CEP, and limited (to allocated memory) real time map/reduce capability. [Diagram labels: Raw Event Stream (three inputs); Real Time Events; Reporting Engine; In Memory Compute Cluster; Raw And Derived Events; NoSQL Cluster; SCALE]
  • 10. Two Layer Approach (continued) • Grid layer doing CEP can act as a filter, as many raw events get converted to semantic/business events, reducing meaningless data verbosity • Grid layer provides scalable messaging • NoSQL layer provides unlimited cheap storage on commodity hardware • NoSQL layer provides virtually unlimited scale processing power
  • 11. Basics Of In Memory Data Grid Technology • An In Memory Data Grid (IMDG) is a data store • Grid just means “cluster” • Data can be partitioned across cluster nodes • Processing power near data storage • Distributed hash table • Application optimized data model denormalization • Nodes are typically configured with one or more replicas (sound familiar yet?) • Not a “cache”: a system of record, but can be used as a cache, or both
  • 12. Advanced Capabilities • Business logic (code) co-resident with data shards • Scalable messaging • Dynamic code execution across cluster • Multi-language support • Object-oriented • Document-oriented/schema free • Multi-level indexing • SQL Queries • Full ACID transaction support • Elastic scaling (automatic and manual) • Write-behind persistence
  • 13. Features: IMDG vs NoSQL [Diagram headings: Data Grid; Disk Based NoSQL. Diagram labels: Low Latency; Eventual/Tunable Consistency; Horizontally Scalable; Code co-location; Service remoting; Parallel Execution; Unlimited scale; Fault Tolerant; Cloud enabled; Hadoop tools; Transactional; Highly Available; Elastic; Messaging; Platform Independent; Complex Event Processing; Flexible Schema]
  • 14. Vive La Difference • The IMDG complements a NoSQL store: – Can serve as a short term request cache (side cache or inline) – Can serve as a cache for MR results – Enables event driven architectures / CEP – In memory map/reduce – Very fast writes, regardless of NoSQL store – Transactional layer: can essentially turn “eventual” consistency into pure transactional persistency without a performance hit – Highly available and independently scalable
  • 15. A Complete Scalable Application Platform [Diagram labels: Raw Event Stream (three inputs); Real Time Events; Reporting Engine; In Memory Compute Cluster; Raw And Derived Events; NoSQL Cluster; SCALE]
  • 16. Key Implementation Issues • Grid must support reliable asynchronous persistence – If not reliable: in-flight data is at risk. Ideally tunable to accommodate differing risk tolerance. – If not asynchronous: too slow – If not persistent: obviously nothing gets sent to disk • To do more than a distributed cache, grid must support code and data partitioning – Ideally, code is collocated in memory with data partition – Needed to support CEP, application, and service remoting capabilities
  • 17. Key Implementation Issues • Grid ideally supports FIFO entry ordering – Key to using grid as a queue – Key to scaling messaging without an additional tier – Combined with co-located business logic, operates at memory speeds • Write speed on the NoSQL layer – Grid is, in effect, queuing entries to the NoSQL layer – If the NoSQL layer cannot keep up, the in memory grid backs up – This behavior is an asset, unless an unanticipated, sustained flood occurs. – The faster the write speed the better
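The write-behind and FIFO points on slides 16 and 17 can be illustrated with a minimal sketch. It is not GigaSpaces code: `WriteBehindGrid`, `max_pending`, and the plain `dict` standing in for the NoSQL layer are all hypothetical names chosen for illustration. Writes land in memory first, are queued in FIFO order, and a background thread drains them to the store; when the store cannot keep up, the bounded queue fills and writers block, which is the "grid backs up" behavior described above.

```python
import queue
import threading


class WriteBehindGrid:
    """Hypothetical in-memory grid with asynchronous, FIFO write-behind."""

    def __init__(self, store, max_pending=1000):
        self.memory = {}                 # in-memory copy, updated immediately
        self.store = store               # stand-in for the NoSQL layer (a dict here)
        self.pending = queue.Queue(maxsize=max_pending)  # FIFO buffer of unpersisted writes
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def write(self, key, value):
        """Write to memory, then queue for persistence (blocks if the queue is full)."""
        self.memory[key] = value
        self.pending.put((key, value))

    def _drain(self):
        """Background loop: persist queued entries in arrival order."""
        while True:
            key, value = self.pending.get()
            self.store[key] = value
            self.pending.task_done()

    def flush(self):
        """Wait until every queued write has reached the store."""
        self.pending.join()
```

A real grid would also replicate the pending queue so that in-flight entries survive a node failure; this sketch leaves that out.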
  • 18. Use Case 1 – Event Cloud • Complex event processing • Collect events in real time: Interactions, Orders, Bills, Payments, Activations, … • Transform into decision factors: Good customer, Pays 3-6 days early, Decreasing usage, Missed payment, Unusual bill, App usage • Original events, possibly scrubbed or annotated, are passed through • Business logic derived “synthetic events” constructed from raw event stream. Possible rule engine integration (e.g. Drools). • Derived events and analytics passed on to NoSQL layer • Other events forwarded to external listeners, systems
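As a rough illustration of deriving "synthetic events" from a raw event stream, the sketch below turns raw payment events into two of the decision factors named on the slide. The `Payment` type, the event names, and the rule that any payment after the due date counts as missed are assumptions made for this example, not rules taken from the talk.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Payment:
    """Hypothetical raw event: one customer payment against a due date."""
    customer: str
    due: date
    paid: date


def derive_events(payments):
    """Yield (customer, synthetic_event) pairs derived from raw payments."""
    for p in payments:
        days_early = (p.due - p.paid).days
        if 3 <= days_early <= 6:
            yield (p.customer, "pays_3_6_days_early")
        elif days_early < 0:
            # Assumption for this sketch: paying after the due date is a missed payment.
            yield (p.customer, "missed_payment")
        # Payments that match no rule produce no synthetic event.
```

In the architecture described, the raw payments would still be passed through, and only the derived pairs would be added to the stream going to the NoSQL layer.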
  • 19. Use Case 2 – Time Bounded • Time Bounded – suited to operations with daily business cycle (e.g. trading) • Current day (or other time period that will fit in memory) held in memory, along with related application state, caching etc… • Still streaming operations to underlying NoSQL platform, or hold for end of day flush if back end can’t write fast enough. • Supports application hosting, messaging, and complex event processing. • External clients are aware of “current day” store, vs archival. • Large scale reports/analytics run in background on NoSQL archive.
  • 20. Use Case 3 - LRU • Grid holds a subset of NoSQL store, and supports an LRU caching model. • In line or side-cache. • Appropriate only in cases where, like any cache, usage pattern does not generate many cache misses. • Still supports CEP, messaging, and computation scaling (provided grid product supports it).
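The LRU model on slide 20 can be sketched as a small read-through cache in front of a backing store. `LruSideCache` and its `hits`/`misses` counters are hypothetical names, and a plain `dict` stands in for the NoSQL store; the point is only to show how least-recently-used eviction interacts with the miss rate the slide warns about.

```python
from collections import OrderedDict


class LruSideCache:
    """Hypothetical LRU cache holding a bounded subset of a backing store."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store   # stand-in for the NoSQL store
        self.capacity = capacity
        self.entries = OrderedDict()         # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)    # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = self.backing_store[key]      # cache miss: read from the store
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value
```

With a capacity of 2, reading `a, b, a, c, b` gives one hit and four misses: a working set only slightly larger than the cache already causes most reads to fall through to the store.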
  • 21. Wishlist • This platform concept is still at an early stage • For Gigaspaces, integrations already exist for Cassandra and MongoDB. • Customers are currently implementing solutions • Stuff I’d like to see: – Unified management and scaling. Shared infrastructure. – Grid/NoSQL aware Hive façade that can run MR jobs on both. Perhaps other Hadoop tools integration – Deeper integration. To further optimize write speed/capacity, and perhaps offload some in-memory aspects of underlying NoSQL platform to minimize duplication and possibly optimize elasticity.
  • 22. Conclusion • Two shared nothing “NoSQL” architectures complementing each other • Fully elastic/scalable • Ultra high performance/low latency combined with unlimited scale. • Full application stack • Highly reliable and self-healing • Scalable complex event handling • Multi-language • Simple. Two products.
  • 23.
  • 24. DataStax is the company behind Apache Cassandra. Besides contributing the majority of the code for the open source project, DataStax also provides products and services for Apache Cassandra • DataStax Community – the 100% free way to get started with Apache Cassandra (free management software and packaging!) • DataStax Enterprise – Hadoop Analytics and Support! • Download + Docs at http://www.datastax.com/dev
  • 25. Expedia Hotel Price Cache B. Todd Burruss
  • 26. Who am I • B. Todd Burruss – Sr Architect, Expedia • Worked with Cassandra for nearly 2 years • Committer on Hector (Java Client) • General testing on Cassandra, working with community, but not committer
  • 27. Expedia’s Motivation We need the ability to search by price in a fast and efficient manner. Prices are complex objects containing base rate, taxes, fees, etc., which means a calculation is required to determine the customer price. This makes searching by price difficult. What to do?
  • 28. What to Do? • Precalculate Total Price! Let’s look at Hotels • Hotel prices vary based on Date, Length of Stay (LOS), Number Adult Travelers (AT) • Customers book in advance so must have prices fairly far into the future (1 year) • Approximately 140,000 hotels in our inventory • Support 1-14 LOS and 1-4 AT • Over 2 billion prices!
  • 29. Example of Hotel Pricing • A customer’s family of 4 wants to stay 7 nights at the Hilton in Maui, checkin on 12/1/2011, checkout on 12/8/2011 • Each night could be a different rate because of day of week, conference in the area, holiday, etc. • So must sum the rate, taxes and fees for each night to get the total room price
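The per-night summation described on slide 29 can be sketched as follows. The `nightly` mapping (date to a `(rate, taxes, fees)` tuple in integer cents) is an assumed representation chosen for the example; the slides do not show how Expedia stores the components.

```python
from datetime import date, timedelta


def total_price(nightly, checkin, nights):
    """Sum rate, taxes and fees over each night of the stay.

    `nightly` maps a date to a (rate, taxes, fees) tuple in cents
    (a hypothetical layout for this sketch).
    """
    total = 0
    for offset in range(nights):
        night = checkin + timedelta(days=offset)
        rate, taxes, fees = nightly[night]
        total += rate + taxes + fees
    return total
```

For the 7-night stay starting 12/1/2011, this reads the seven entries for 12/1 through 12/7; the checkout date itself is not charged.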
  • 30. Use Case : Median Price • Ex: What is the median hotel price in Seattle for each day between 11/1 and 11/30? • 200 * 30 = 6,000 prices returned from Cassandra – median calculated on client. • Idea is customer searches city and date range, then narrows search to smaller area and dates • Prices are volatile, so want close to real-time updates
  • 31. Enter Cassandra : Expectations • Cassandra can handle large amounts of data nicely : billions of price objects • Cassandra is very fast (read and write.) Can handle the volatile prices • Cluster expands easily – our dataset is growing • Easy to setup, administer and use • Operational costs are good • Support is available
  • 32. Solution : Data Model • 1 ColumnFamily : Prices • Row key : date + LOS + AT • Column name : Hotel ID – 140,000 columns (integer comparator) • Column value : precalculated hotel price for date + LOS + AT • 365 * 14 * 4 = 20,440 row keys • 20,440 * 140,000 = 2,861,600,000 price objects
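The sizing arithmetic on slide 32 can be checked with a few lines. The counts come straight from the slide; the `row_key` string encoding (ISO date, LOS and AT joined by colons) is an assumption for illustration, since the slides only say the key is date + LOS + AT.

```python
from datetime import date

DAYS = 365          # one year of check-in dates
MAX_LOS = 14        # length of stay: 1 to 14 nights
MAX_AT = 4          # adult travelers: 1 to 4
HOTELS = 140_000    # one column per hotel ID in each row

ROW_KEYS = DAYS * MAX_LOS * MAX_AT       # 20,440 row keys
PRICE_OBJECTS = ROW_KEYS * HOTELS        # 2,861,600,000 price objects


def row_key(checkin, los, adults):
    """Build a row key from check-in date, LOS and AT (hypothetical encoding)."""
    return f"{checkin.isoformat()}:{los}:{adults}"
```

For the family example, `row_key(date(2011, 12, 1), 7, 4)` names the single row to read, and the hotel ID selects the single column within it.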
  • 33. Solution : Retrieving Prices • Generate keys for each checkin day, LOS, AT combination wanted • Query Cassandra using the generated keys, using specific column names (hotel IDs) • For family example, one key, one column = 12/1/2011 + 7 + 4 = total price for hilton hotel • For median example, 30 keys, 200 columns per key. Client receives 30 result rows, then calculates median for each row
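The median example can be sketched end to end with an in-memory stand-in for the column family. Nothing here calls Cassandra or Hector: `prices` is a nested `dict` keyed by `(checkin, los, adults)` tuples, and both function names are hypothetical. The slides do not state which LOS and AT the median query uses, so those are left as parameters.

```python
from datetime import date, timedelta
from statistics import median


def generate_keys(first_checkin, days, los, adults):
    """One row key per check-in day for a fixed LOS and AT."""
    return [(first_checkin + timedelta(days=d), los, adults) for d in range(days)]


def median_per_day(prices, keys, hotel_ids):
    """Fetch the requested columns for each key and compute the median client-side.

    `prices` maps row key -> {hotel_id: price}; it stands in for the
    Prices column family in this sketch.
    """
    result = {}
    for key in keys:
        row = prices.get(key, {})
        values = [row[h] for h in hotel_ids if h in row]
        if values:                      # skip rows with no matching hotels
            result[key[0]] = median(values)
    return result
```

With 30 keys and 200 hotel IDs, the client handles 30 rows of 200 values each, matching the 6,000 prices mentioned for the Seattle example.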
  • 34. Testing Scenario • Found 19 boxes, 16gb old RAM, 1 old 4 core CPU • 18th and 19th boxes are clients + Cassandra servers (don’t do this in prod) • Can never find enough hardware :) • 2 Keyspaces on cluster • Reduce dataset to 90 days, up to 7 day LOS, up to 4 AT, 70k hotels – removes disk I/O from test and leaves some RAM for caching • We believe our hot data will be in RAM • Query 30 days and 200 hotels : 6000 price objects
  • 35. Results : Page 1 Default Memory and Column Index Settings • ~50ms : -Xmn400m, 64k index, 8gb, no row cache, no key cache : pretty good • ~800ms : -Xmn400m, 64k index, 8gb, no key cache, 600 row cache (Serializing) – copying to heap accounts for slowness
  • 36. Results : Page 2 Change Index to 1k Column Pages: • ~45ms : -Xmn400m, 1k index, 8gb, no row cache, no key cache • ~29ms : -Xmn400m, 1k index, 8gb, 600 row cache (ConcurrentLinkedHash) 1k index saves a little, but data is all in RAM. Bigger savings when hitting disk
  • 37. Results : Page 3 Tune Memory • ~45ms : -Xmn200m, 1k index, 8gb, no row cache, no key cache • ~29ms : -Xmn200m, 1k index, 8gb, 600 row cache (ConcurrentLinkedHash) Increasing Old Gen will not help because all data fits in RAM, based on reported JVM usage. Reducing New Gen moves from less frequent long pauses to more frequent short pauses. No help.
  • 38. Take Away • Test is worst case scenario. Completely random usage pattern – which is rarely (if ever) the case in production. Causes cache churn if cache is too small • Wide rows are not always bad. Accessing columns sequentially or by range is very good (e.g. time series data) • Serializing cache has a trade-off between serialized objects and copying to/from off heap storage
  • 39. References • Query plan description by Aaron Morton : http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ • Disk sizing by B. Todd Burruss (me): http://btoddb-cass-storage.blogspot.com/
  • 40. http://expediajobs.com/ • Cassandra • Hadoop • Machine Learning • Grid Computing • Java