ODC
Beyond The Data Grid: Coherence, Normalisation, Joins
and Linear Scalability

Ben Stopford : RBS
The Story…
The internet era has moved us away from traditional database architecture, now a quarter of a century old.

Industry and academia have responded with a variety of solutions that leverage distribution, use of a simpler contract and RAM storage.

We introduce ODC, a NoSQL store with a unique mechanism for efficiently managing normalised data.

We show how we adapt the concept of a Snowflake Schema to aid the application of replication and partitioning and avoid problems with distributed joins.

Finally we introduce the ‘Connected Replication’ pattern as a mechanism for making the star schema practical for in-memory architectures.

The result is a highly scalable, in-memory data store that can support both millisecond queries and high bandwidth exports over a normalised object model.
Database Architecture is Old

Most modern databases still follow a 1970s architecture (for example IBM’s System R).
“Because RDBMSs can be beaten by more than
     an order of magnitude on the standard OLTP
     benchmark, then there is no market where they
     are competitive. As such, they should be
     considered as legacy technology more than a
     quarter of a century in age, for which a complete
     redesign and re-architecting is the appropriate
     next step.”


       Michael Stonebraker (Creator of Ingres and Postgres)
What steps have we taken
      to improve the
   performance of this
  original architecture?
Improving Database Performance (1)
Shared Disk Architecture

Improving Database Performance (2)
Shared Nothing Architecture

Improving Database Performance (3)
In-Memory Databases

Improving Database Performance (4)
Distributed In-Memory (Shared Nothing)

Improving Database Performance (5)
Distributed Caching
These approaches are converging
Regular Database: Oracle, Sybase, MySql
Shared Nothing: Teradata, Vertica, NoSQL…
Shared Nothing (memory): VoltDB, H-Store
Distributed Caching: Coherence, Gemfire, Gigaspaces

ODC sits where shared nothing (memory) meets distributed caching.
So how can we make a data store go even faster?

•  Distributed architecture
•  Drop ACID: simplify the contract
•  Drop disk
(1) Distribution for Scalability: The Shared Nothing Architecture

•  Originated in 1990 (Gamma DB) but popularised by Teradata / BigTable / NoSQL
•  Massive storage potential
•  Massive scalability of processing
•  Commodity hardware
•  Limited by cross-partition joins

Each node is an autonomous processing unit for a data subset.
(2) Simplifying the Contract

•  For many users ACID is overkill.
•  Implementing ACID in a distributed architecture has a significant effect on performance.
•  NoSQL movement: CouchDB, MongoDB, 10gen, Basho, CouchOne, Cloudant, Cloudera, GoGrid, InfiniteGraph, Membase, Riptano, Scality…
Databases have huge operational overheads

Taken from “OLTP Through the Looking Glass, and What We Found There”, Harizopoulos et al.
(3) Memory is 100x faster than disk

(Chart: latency on a log scale from ms down to ps, comparing reading 1MB from disk/network vs 1MB from main memory, and a cross-continental round trip vs a cross-network round trip vs a main memory reference vs L2 and L1 cache references.)

* An L1 ref is about 2 clock cycles or 0.7ns. This is the time it takes light to travel 20cm.
Avoid all that overhead

RAM means:
•  No IO
•  Single threaded ⇒ no locking / latching
•  Rapid aggregation etc.
•  Query plans become less important
We were keen to leverage these three factors in building the ODC:

•  Distribution
•  Simplify the contract
•  Memory only
What is the ODC?

A highly distributed, in-memory, normalised data store designed for scalable data access and processing.
The Concept

Originating from Scott Marcar’s concept of a central brain within the bank:

“The copying of data lies at the root of many of the bank’s problems. By supplying a single real-time view that all systems can interface with we remove the need for reconciliation and promote the concept of truly shared services” - Scott Marcar (Head of Risk and Finance Technology)
This is quite a tricky problem:

•  High bandwidth access to lots of data
•  Low latency access to small amounts of data
•  Scalability to lots of users
ODC Data Grid: Highly Distributed Physical Architecture

In-memory storage with lots of parallel processing (Oracle Coherence).

Messaging (topic based) as a system of record (persistence).
The Layers

Access Layer: Java clients via the API
Query Layer
Data Layer: Transactions, MTMs, Cashflows
Persistence Layer
But unlike most caches the ODC is
            Normalised
Three Tools of Distributed Data Architecture

•  Indexing
•  Partitioning
•  Replication
For speed, replication is best

Wherever you go the data will be there.

But your storage is limited by the memory on a node.
For scalability, partitioning is best

Keys Aa-Ap | Keys Fs-Fz | Keys Xa-Yd

Scalable storage, bandwidth and processing.
Traditional Distributed Caching Approach

Big denormalised objects (Trader, Party, Trade) are spread across a distributed cache, partitioned by key (Aa-Ap, Fs-Fz, Xa-Yd…).
But we believe a data
store needs to be more
than this: it needs to be
      normalised!
So why is that?
Surely denormalisation
 is going to be faster?
Denormalisation means replicating
   parts of your object model
…and that means managing
consistency over lots of copies
… as parts of the object graph will be copied multiple times

Periphery objects (e.g. Party) that are denormalised onto core objects will be duplicated multiple times across the data grid.
…and all the duplication means you
  run out of space really quickly
Space issues are exacerbated further when data is versioned

Version 1: Trade, Trader, Party
Version 2: Trade, Trader, Party
Version 3: Trade, Trader, Party
Version 4: Trade, Trader, Party
…and you need
versioning to do MVCC
And reconstituting a previous time
  slice becomes very difficult.

Why Normalisation?

•  Easy to change data (no distributed locks / transactions)
•  Better use of memory
•  Facilitates versioning
•  And MVCC / bi-temporal
OK, OK, let’s normalise our data then. What does that mean?
We decompose our
domain model and
 hold each object
    separately
This means the object graph will be
  split across multiple machines.

Binding them back together involves a
“distributed join” => Lots of network hops

It’s going to be slow…
Whereas in the denormalised model the join is already done
Hence denormalisation is FAST! (for reads)

So what we want is the advantages of a normalised store at the speed of a denormalised one!

This is what the ODC is all about!
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can’t we hold it all together?
It’s all about the keys

We can collocate data with common keys, but if keys crosscut, the only way to collocate is to replicate.
We tackle this problem with a hybrid model:

Dimensions (Trader, Party) => denormalised
Facts (Trade) => normalised
We adapt the concept of a Snowflake
             Schema.
Taking the concept of Facts and
          Dimensions
Everything starts from a Core Fact
          (Trades for us)
Facts are big, dimensions are small.
Facts have one key.
Dimensions have many (crosscutting) keys.
Looking at the data (record counts, 0 to 150,000,000):

Facts => big, common keys: Valuation Legs, Valuations, Transaction Mapping, Cashflow Mapping, Party Alias, Transaction, Cashflows, Legs

Dimensions => small, crosscutting keys: Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books
We remember we are a grid. We should avoid the distributed join.

… so we only want to ‘join’ data that is in the same process. Coherence’s KeyAssociation gives us this: Trades and MTMs share a common key.
So we prescribe different physical storage for Facts and Dimensions

Dimensions (Trader, Party) => replicated
Facts (Trade) => partitioned (key association ensures joins are in process)
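Coherence’s KeyAssociation interface (its getAssociatedKey() method) is what makes this collocation work: partition routing hashes the associated key rather than the entry’s own key. A minimal, Coherence-free sketch of the idea follows; the class names (TradeKey, MtmKey) and the partition count are illustrative, not the ODC’s real model.

```java
import java.util.Objects;

// Sketch of key affinity: an MTM's key declares the trade it belongs to,
// and the router hashes that *associated* key, so a Trade and its MTMs
// always land in the same partition.
public class KeyAffinity {

    static final int PARTITIONS = 13;   // arbitrary partition count

    interface Associated {
        Object associatedKey();         // the key partitioning is based on
    }

    static class TradeKey implements Associated {
        final String tradeId;
        TradeKey(String tradeId) { this.tradeId = tradeId; }
        public Object associatedKey() { return tradeId; }
    }

    static class MtmKey implements Associated {
        final String mtmId;
        final String tradeId;
        MtmKey(String mtmId, String tradeId) {
            this.mtmId = mtmId;
            this.tradeId = tradeId;
        }
        // Route by the owning trade, not by this MTM's own id.
        public Object associatedKey() { return tradeId; }
    }

    static int partitionOf(Associated key) {
        return Math.floorMod(Objects.hashCode(key.associatedKey()), PARTITIONS);
    }

    public static void main(String[] args) {
        TradeKey trade = new TradeKey("T42");
        MtmKey mtm = new MtmKey("M1", "T42");
        // Both resolve to the same partition, so a Trade-to-MTM join
        // never has to leave the process.
        System.out.println(partitionOf(trade) == partitionOf(mtm)); // true
    }
}
```

Any fact type that can express its owner’s key this way joins in-process; facts that cannot share a partitioning key are exactly the limitation noted later in the deck.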
Facts are held distributed, Dimensions are replicated

Facts => big => distribute: Valuation Legs, Valuations, Transaction Mapping, Cashflow Mapping, Party Alias, Transaction, Cashflows, Legs

Dimensions => small => replicate: Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books
- Facts are partitioned across the data layer
- Dimensions are replicated across the query layer

Query Layer: replicated dimensions (Trader, Party)
Data Layer: partitioned fact storage (Transactions, MTMs, Cashflows)
Key Point

We use a variant on a Snowflake Schema to partition big stuff that has the same key, and replicate small stuff that has crosscutting keys.
So how does this help us to run queries without distributed joins?

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’

This query involves:
•  Joins between Dimensions: to evaluate the where clause
•  Joins between Facts: Transaction joins to MTM
•  Joins between all facts and dimensions needed to construct the return result
Stage 1: Focus on the where clause: Where Cost Centre = ‘CC1’

Stage 1: Get the right keys to query the Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’

LBs[] = getLedgerBooksFor(CC1)
SBs[] = getSourceBooksFor(LBs[])

So we have all the bottom-level dimensions needed to query the facts (Transactions, MTMs, Cashflows: partitioned).
Stage 2: Cluster Join to get Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’

Get all Transactions and MTMs (cluster-side join) for the passed Source Books.

Stage 2: Join the facts together efficiently as we know they are collocated.
Stage 3: Augment raw Facts with relevant Dimensions

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’

Populate the raw facts (Transactions) with dimension data before returning to the client.

Stage 3: Bind the relevant dimensions to the result.
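The three stages can be sketched end to end. This is an illustrative model, not the ODC’s code: plain maps stand in for the replicated dimension caches (mirroring the getLedgerBooksFor / getSourceBooksFor calls in the slides), and a local scan stands in for the cluster-side fact join.

```java
import java.util.*;
import java.util.stream.*;

// Sketch of the three query stages over toy data. All names and values
// here are invented for illustration.
public class SnowflakeQuery {

    // Replicated dimensions: every query node holds these maps locally.
    static final Map<String, List<String>> ledgerBooksByCostCentre =
            Map.of("CC1", List.of("LB1", "LB2"));
    static final Map<String, List<String>> sourceBooksByLedgerBook =
            Map.of("LB1", List.of("SB1"), "LB2", List.of("SB2"));

    // Partitioned facts: transaction id -> source book it belongs to.
    static final Map<String, String> transactionSourceBook =
            Map.of("TX1", "SB1", "TX2", "SB9", "TX3", "SB2");

    static List<String> run(String costCentre) {
        // Stage 1: walk the dimensions locally to get bottom-level keys.
        Set<String> sourceBooks = ledgerBooksByCostCentre
                .getOrDefault(costCentre, List.of()).stream()
                .flatMap(lb -> sourceBooksByLedgerBook
                        .getOrDefault(lb, List.of()).stream())
                .collect(Collectors.toSet());
        // Stage 2: cluster-side fact scan using those keys (MTMs would be
        // joined in-process here thanks to key affinity).
        // Stage 3: augment each raw fact with dimension data before returning.
        return transactionSourceBook.entrySet().stream()
                .filter(e -> sourceBooks.contains(e.getValue()))
                .map(e -> e.getKey() + "/" + e.getValue() + "/" + costCentre)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run("CC1")); // [TX1/SB1/CC1, TX3/SB2/CC1]
    }
}
```

Note that only Stage 2 touches the partitioned layer; Stages 1 and 3 run entirely against local replicas, which is why no distributed join is needed.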
Bringing it together:

The Java client queries replicated Dimensions and partitioned Facts through the API.

We never have to do a distributed join!
Coherence Voodoo: Joining Distributed Facts across the Cluster

Related Trades and MTMs (Facts) are collocated on the same machine with Key Affinity; an Aggregator performs the join.

Direct backing map access must be used due to threading issues in Coherence. See http://www.benstopford.com/2009/11/20/how-to-perform-efficient-cross-cache-joins-in-coherence/
So we are normalised.

And we can join without extra network hops.
We get to do this…


…and this…


…and this…
…without the problems of this…
…or this..
..all at the speed of this… well almost!
But there is a fly in the ointment…
I lied earlier. These aren’t all Facts.

Facts: Valuation Legs, Valuations, Transaction Mapping, Cashflow Mapping, Party Alias, Transaction, Cashflows, Legs

Parties: this is a dimension
•  It has a different key to the Facts.
•  And it’s BIG

Dimensions: Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books

(record counts 0 to 125,000,000)
We can’t replicate really big stuff… we’ll run out of space
=> Big Dimensions are a problem.

Fortunately we found a simple solution!
We noticed that whilst there are lots
of these big dimensions, we didn’t
actually use a lot of them. They are
not all “connected”.
If there are no Trades for Barclays in
the data store then a Trade Query will
never need the Barclays Counterparty
Looking at all the Dimension data, some are quite large

(Record counts, 0 to 5,000,000: Party Alias, Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books)
But Connected Dimension data is tiny by comparison

(The same caches: Party Alias, Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books, now with far smaller counts)
So we only replicate
‘Connected’ or ‘Used’
     dimensions
As data is written to the data store we keep our ‘Connected Caches’ up to date

Processing Layer: dimension caches (replicated)
Data Layer: partitioned fact storage (Transactions, MTMs, Cashflows)

As new Facts are added, the relevant Dimensions that they reference are moved to the processing-layer caches.
Coherence Voodoo: ‘Connected Replication’

The replicated layer is updated by recursing through the arcs on the domain model when facts change.
Saving a trade causes all its 1st-level references to be triggered

Save Trade → Cache Store → Trigger
Data Layer (all normalised, partitioned): the Trade references Party Alias, Source Book and Ccy
Query Layer: connected dimension caches
This updates the connected caches (Party Alias, Source Book and Ccy are copied to the query layer).
The process recurses through the object graph: Trade → Party Alias, Source Book, Ccy; Party Alias → Party; Source Book → Ledger Book.
‘Connected Replication’

A simple pattern which recurses through the foreign keys in the domain model, ensuring only ‘Connected’ dimensions are replicated.
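The recursion above can be sketched in a few lines. This is a toy model under stated assumptions: the entity names and foreign-key arcs are illustrative stand-ins for the real domain model, and adding to a set stands in for promoting a dimension to the replicated caches.

```java
import java.util.*;

// Minimal sketch of the Connected Replication pattern: when a fact is
// written, recurse through its foreign-key arcs and promote each newly
// reachable dimension to the replicated 'connected' caches.
public class ConnectedReplication {

    // Foreign-key arcs: entity -> entities it references (illustrative).
    static final Map<String, List<String>> arcs = Map.of(
            "Trade", List.of("PartyAlias", "SourceBook", "Ccy"),
            "PartyAlias", List.of("Party"),
            "SourceBook", List.of("LedgerBook"));

    // Dimensions currently promoted to the replicated layer.
    static final Set<String> connected = new TreeSet<>();

    static Set<String> onWrite(String entity) {
        for (String dim : arcs.getOrDefault(entity, List.of())) {
            if (connected.add(dim)) {   // recurse only on first sighting
                onWrite(dim);           // the trigger cascades down the graph
            }
        }
        return connected;
    }

    public static void main(String[] args) {
        // Saving a Trade pulls in its 1st-level references, then recurses.
        System.out.println(onWrite("Trade"));
        // [Ccy, LedgerBook, Party, PartyAlias, SourceBook]
    }
}
```

The “first sighting” check is what keeps the pattern cheap: a dimension already in the connected set is never re-replicated, so steady-state writes touch only genuinely new references.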
Limitations of this approach

•  Data set size. Size of connected dimensions limits
   scalability.
•  Joins are only supported between “Facts” that can
   share a partitioning key (But any dimension join can be
   supported)
Performance is very sensitive to serialisation costs: avoid them with POF

Deserialise just one field (e.g. an Integer ID) from the binary value rather than the whole object stream.
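The point can be illustrated with a toy length-prefixed format. This is not real POF (which Coherence exposes through PofExtractor and friends), just the same positional idea: because each field is addressable in the byte stream, we can skip straight to field N without materialising the rest of the object.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Toy positional binary format: each field is written as [length][bytes].
// Extracting one field then only has to skip, not deserialise, the others.
public class PartialDeserialise {

    static byte[] write(String... fields) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            for (String f : fields) {
                byte[] b = f.getBytes(StandardCharsets.UTF_8);
                out.writeInt(b.length);
                out.write(b);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Extract just field `index`, skipping earlier fields cheaply.
    static String extract(byte[] blob, int index) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob));
            for (int i = 0; i < index; i++) in.skipBytes(in.readInt());
            byte[] b = new byte[in.readInt()];
            in.readFully(b);
            return new String(b, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] trade = write("T42", "Barclays", "GBP", "10000000");
        System.out.println(extract(trade, 2)); // prints GBP
    }
}
```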
Other cool stuff (very briefly)

Everything is Java: Java clients, a Java schema API, and Java ‘Stored Procedures’ and ‘Triggers’.
Messaging as a System of Record

ODC provides a realtime view over any part of the dataset, as messaging is used as the system of record.

Messaging provides a more scalable system of record than a database would.

(Access Layer → Processing Layer → Data Layer: Transactions, MTMs, Cashflows → Persistence Layer)
Being event based changes the programming model.

The system provides both real-time and query-based views on the data. The two are linked using versioning.

Replication to DR, DB, fact aggregation.
API – Queries utilise a fluent interface
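The deck does not show the real query API, so here is a hypothetical fluent builder in the same spirit: each call returns the builder, so conditions chain naturally before the query is shipped to the grid. All names below are invented for illustration.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical fluent query builder: chained calls assemble the query
// description that would be sent to the grid for execution.
public class FluentQuery {
    private final List<String> selects = new ArrayList<>();
    private final Map<String, String> wheres = new LinkedHashMap<>();

    static FluentQuery select(String... fields) {
        FluentQuery q = new FluentQuery();
        q.selects.addAll(Arrays.asList(fields));
        return q;
    }

    FluentQuery where(String dimension, String value) {
        wheres.put(dimension, value);
        return this;    // returning 'this' is what makes the chain fluent
    }

    String build() {
        return "Select " + String.join(", ", selects)
             + " Where " + wheres.entrySet().stream()
                 .map(e -> e.getKey() + " = '" + e.getValue() + "'")
                 .collect(Collectors.joining(" And "));
    }

    public static void main(String[] args) {
        String q = FluentQuery.select("Transaction", "MTM")
                              .where("CostCentre", "CC1")
                              .build();
        System.out.println(q);
        // Select Transaction, MTM Where CostCentre = 'CC1'
    }
}
```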
Performance

Query with more than twenty join conditions:
2GB per min / 250Mb/s
3ms latency (per client)
Conclusion


Data warehousing, OLTP and Distributed
caching fields are all converging on in-
memory architectures to get away from
disk induced latencies.
Conclusion

Shared nothing architectures are always
subject to the distributed join problem if
they are to retain a degree of normalisation.
Conclusion

We present a novel mechanism for
avoiding the distributed join problem by
using a Star Schema to define whether
data should be replicated or partitioned.





Conclusion

We make the pattern applicable to ‘real’
data models by only replicating objects
that are actually used: the Connected
Replication pattern.
The End
•  Further details online http://www.benstopford.com
   (linked from my Qcon bio)
•  A big thanks to the team in both India and the UK who
   built this thing.
•  Questions?

Test-Oriented Languages: Is it time for a new era?Test-Oriented Languages: Is it time for a new era?
Test-Oriented Languages: Is it time for a new era?
 
Data Grids with Oracle Coherence
Data Grids with Oracle CoherenceData Grids with Oracle Coherence
Data Grids with Oracle Coherence
 
Ideas for Distributing Skills Across a Continental Divide
Ideas for Distributing Skills Across a Continental DivideIdeas for Distributing Skills Across a Continental Divide
Ideas for Distributing Skills Across a Continental Divide
 
Refactoring tested code - has mocking gone wrong?
Refactoring tested code - has mocking gone wrong?Refactoring tested code - has mocking gone wrong?
Refactoring tested code - has mocking gone wrong?
 
Network latency - measurement and improvement
Network latency - measurement and improvementNetwork latency - measurement and improvement
Network latency - measurement and improvement
 
Big Data & the Enterprise
Big Data & the EnterpriseBig Data & the Enterprise
Big Data & the Enterprise
 
Big iron 2 (published)
Big iron 2 (published)Big iron 2 (published)
Big iron 2 (published)
 

Semelhante a Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability

Advanced databases ben stopford
Advanced databases   ben stopfordAdvanced databases   ben stopford
Advanced databases ben stopfordBen Stopford
 
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...Ben Stopford
 
Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12mark madsen
 
Database Revolution - Exploratory Webcast
Database Revolution - Exploratory WebcastDatabase Revolution - Exploratory Webcast
Database Revolution - Exploratory WebcastInside Analysis
 
Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Ben Stopford
 
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDBBig Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDBBigDataCloud
 
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,..."Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...lisapaglia
 
Millions quotes per second in pure java
Millions quotes per second in pure javaMillions quotes per second in pure java
Millions quotes per second in pure javaRoman Elizarov
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesshnkr_rmchndrn
 
SQL and NoSQL in SQL Server
SQL and NoSQL in SQL ServerSQL and NoSQL in SQL Server
SQL and NoSQL in SQL ServerMichael Rys
 
Learning from google megastore (Part-1)
Learning from google megastore (Part-1)Learning from google megastore (Part-1)
Learning from google megastore (Part-1)Schubert Zhang
 
Caching principles-solutions
Caching principles-solutionsCaching principles-solutions
Caching principles-solutionspmanvi
 
Cache and consistency in nosql
Cache and consistency in nosqlCache and consistency in nosql
Cache and consistency in nosqlJoão Gabriel Lima
 
CodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the CloudCodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the CloudRightScale
 
MySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesMySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesBernd Ocklin
 
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...Felix Gessert
 
Performance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ ApplicationsPerformance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ ApplicationsMichael Kopp
 

Semelhante a Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability (20)

Advanced databases ben stopford
Advanced databases   ben stopfordAdvanced databases   ben stopford
Advanced databases ben stopford
 
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
 
Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12
 
Database Revolution - Exploratory Webcast
Database Revolution - Exploratory WebcastDatabase Revolution - Exploratory Webcast
Database Revolution - Exploratory Webcast
 
Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012
 
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDBBig Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
 
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,..."Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
 
Millions quotes per second in pure java
Millions quotes per second in pure javaMillions quotes per second in pure java
Millions quotes per second in pure java
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
 
Wolfgang Lehner Technische Universitat Dresden
Wolfgang Lehner Technische Universitat DresdenWolfgang Lehner Technische Universitat Dresden
Wolfgang Lehner Technische Universitat Dresden
 
SQL and NoSQL in SQL Server
SQL and NoSQL in SQL ServerSQL and NoSQL in SQL Server
SQL and NoSQL in SQL Server
 
Learning from google megastore (Part-1)
Learning from google megastore (Part-1)Learning from google megastore (Part-1)
Learning from google megastore (Part-1)
 
Caching principles-solutions
Caching principles-solutionsCaching principles-solutions
Caching principles-solutions
 
Cache and consistency in nosql
Cache and consistency in nosqlCache and consistency in nosql
Cache and consistency in nosql
 
Clustering van IT-componenten
Clustering van IT-componentenClustering van IT-componenten
Clustering van IT-componenten
 
CodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the CloudCodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the Cloud
 
MySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesMySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion Queries
 
Azure and cloud design patterns
Azure and cloud design patternsAzure and cloud design patterns
Azure and cloud design patterns
 
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
 
Performance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ ApplicationsPerformance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ Applications
 

Mais de Ben Stopford

10 Principals for Effective Event-Driven Microservices with Apache Kafka
10 Principals for Effective Event-Driven Microservices with Apache Kafka10 Principals for Effective Event-Driven Microservices with Apache Kafka
10 Principals for Effective Event-Driven Microservices with Apache KafkaBen Stopford
 
10 Principals for Effective Event Driven Microservices
10 Principals for Effective Event Driven Microservices10 Principals for Effective Event Driven Microservices
10 Principals for Effective Event Driven MicroservicesBen Stopford
 
The Future of Streaming: Global Apps, Event Stores and Serverless
The Future of Streaming: Global Apps, Event Stores and ServerlessThe Future of Streaming: Global Apps, Event Stores and Serverless
The Future of Streaming: Global Apps, Event Stores and ServerlessBen Stopford
 
A Global Source of Truth for the Microservices Generation
A Global Source of Truth for the Microservices GenerationA Global Source of Truth for the Microservices Generation
A Global Source of Truth for the Microservices GenerationBen Stopford
 
Building Event Driven Services with Kafka Streams
Building Event Driven Services with Kafka StreamsBuilding Event Driven Services with Kafka Streams
Building Event Driven Services with Kafka StreamsBen Stopford
 
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017  - The Data Dichotomy- Rethinking Data and Services with StreamsNDC London 2017  - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with StreamsBen Stopford
 
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...Ben Stopford
 
Building Event Driven Services with Stateful Streams
Building Event Driven Services with Stateful StreamsBuilding Event Driven Services with Stateful Streams
Building Event Driven Services with Stateful StreamsBen Stopford
 
Devoxx London 2017 - Rethinking Services With Stateful Streams
Devoxx London 2017 - Rethinking Services With Stateful StreamsDevoxx London 2017 - Rethinking Services With Stateful Streams
Devoxx London 2017 - Rethinking Services With Stateful StreamsBen Stopford
 
Event Driven Services Part 2: Building Event-Driven Services with Apache Kafka
Event Driven Services Part 2:  Building Event-Driven Services with Apache KafkaEvent Driven Services Part 2:  Building Event-Driven Services with Apache Kafka
Event Driven Services Part 2: Building Event-Driven Services with Apache KafkaBen Stopford
 
Event Driven Services Part 1: The Data Dichotomy
Event Driven Services Part 1: The Data Dichotomy Event Driven Services Part 1: The Data Dichotomy
Event Driven Services Part 1: The Data Dichotomy Ben Stopford
 
Event Driven Services Part 3: Putting the Micro into Microservices with State...
Event Driven Services Part 3: Putting the Micro into Microservices with State...Event Driven Services Part 3: Putting the Micro into Microservices with State...
Event Driven Services Part 3: Putting the Micro into Microservices with State...Ben Stopford
 
Strata Software Architecture NY: The Data Dichotomy
Strata Software Architecture NY: The Data DichotomyStrata Software Architecture NY: The Data Dichotomy
Strata Software Architecture NY: The Data DichotomyBen Stopford
 
A little bit of clojure
A little bit of clojureA little bit of clojure
A little bit of clojureBen Stopford
 

Mais de Ben Stopford (14)

10 Principals for Effective Event-Driven Microservices with Apache Kafka
10 Principals for Effective Event-Driven Microservices with Apache Kafka10 Principals for Effective Event-Driven Microservices with Apache Kafka
10 Principals for Effective Event-Driven Microservices with Apache Kafka
 
10 Principals for Effective Event Driven Microservices
10 Principals for Effective Event Driven Microservices10 Principals for Effective Event Driven Microservices
10 Principals for Effective Event Driven Microservices
 
The Future of Streaming: Global Apps, Event Stores and Serverless
The Future of Streaming: Global Apps, Event Stores and ServerlessThe Future of Streaming: Global Apps, Event Stores and Serverless
The Future of Streaming: Global Apps, Event Stores and Serverless
 
A Global Source of Truth for the Microservices Generation
A Global Source of Truth for the Microservices GenerationA Global Source of Truth for the Microservices Generation
A Global Source of Truth for the Microservices Generation
 
Building Event Driven Services with Kafka Streams
Building Event Driven Services with Kafka StreamsBuilding Event Driven Services with Kafka Streams
Building Event Driven Services with Kafka Streams
 
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017  - The Data Dichotomy- Rethinking Data and Services with StreamsNDC London 2017  - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams
 
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
 
Building Event Driven Services with Stateful Streams
Building Event Driven Services with Stateful StreamsBuilding Event Driven Services with Stateful Streams
Building Event Driven Services with Stateful Streams
 
Devoxx London 2017 - Rethinking Services With Stateful Streams
Devoxx London 2017 - Rethinking Services With Stateful StreamsDevoxx London 2017 - Rethinking Services With Stateful Streams
Devoxx London 2017 - Rethinking Services With Stateful Streams
 
Event Driven Services Part 2: Building Event-Driven Services with Apache Kafka
Event Driven Services Part 2:  Building Event-Driven Services with Apache KafkaEvent Driven Services Part 2:  Building Event-Driven Services with Apache Kafka
Event Driven Services Part 2: Building Event-Driven Services with Apache Kafka
 
Event Driven Services Part 1: The Data Dichotomy
Event Driven Services Part 1: The Data Dichotomy Event Driven Services Part 1: The Data Dichotomy
Event Driven Services Part 1: The Data Dichotomy
 
Event Driven Services Part 3: Putting the Micro into Microservices with State...
Event Driven Services Part 3: Putting the Micro into Microservices with State...Event Driven Services Part 3: Putting the Micro into Microservices with State...
Event Driven Services Part 3: Putting the Micro into Microservices with State...
 
Strata Software Architecture NY: The Data Dichotomy
Strata Software Architecture NY: The Data DichotomyStrata Software Architecture NY: The Data Dichotomy
Strata Software Architecture NY: The Data Dichotomy
 
A little bit of clojure
A little bit of clojureA little bit of clojure
A little bit of clojure
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 

Último (20)

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability

  • 1. ODC Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability Ben Stopford : RBS
  • 2. The Story… The internet era has moved us away from traditional database architecture, now a quarter of a century old. Industry and academia have responded with a variety of solutions that leverage distribution, use of a simpler contract and RAM storage. We introduce ODC, a NoSQL store with a unique mechanism for efficiently managing normalised data. The result is a highly scalable, in-memory data store that can support both millisecond queries and high bandwidth exports over a normalised object model. We show how we adapt the concept of a Snowflake Schema to aid the application of replication and partitioning and avoid problems with distributed joins. Finally we introduce the 'Connected Replication' pattern as a mechanism for making the star schema practical for in-memory architectures.
  • 3. Database Architecture is Old Most modern databases still follow a 1970s architecture (for example IBM’s System R)
  • 4. “Because RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark, then there is no market where they are competitive. As such, they should be considered as legacy technology more than a quarter of a century in age, for which a complete redesign and re-architecting is the appropriate next step.” Michael Stonebraker (Creator of Ingres and Postgres)
  • 5. What steps have we taken to improve the performance of this original architecture?
  • 6. Improving Database Performance (1): Shared Disk Architecture
  • 7. Improving Database Performance (2): Shared Nothing Architecture
  • 8. Improving Database Performance (3): In-Memory Databases
  • 9. Improving Database Performance (4): Distributed In-Memory (Shared Nothing)
  • 10. Improving Database Performance (5): Distributed Caching
  • 11. These approaches are converging: Regular Database (Oracle, Sybase, MySql); Shared Nothing (Teradata, Vertica, NoSQL…); Shared Nothing in memory (VoltDB, H-Store); Distributed Caching (Coherence, Gemfire, Gigaspaces). ODC sits at the convergence of these.
  • 12. So how can we make a data store go even faster? Distributed architecture. Drop ACID: simplify the contract. Drop disk.
  • 13. (1) Distribution for Scalability: The Shared Nothing Architecture •  Originated in 1990 (Gamma DB) but popularised by Teradata / BigTable / NoSQL •  Massive storage potential •  Massive scalability of processing •  Commodity hardware •  Each node is an autonomous processing unit for a data subset •  Limited by cross-partition joins
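The shared-nothing idea above can be sketched in a few lines: each node owns a disjoint subset of the keys, a stable hash routes every key to its owner, and capacity grows with the node count. This is a minimal illustrative sketch, assuming hypothetical `Node`/`Cluster` classes, not the Coherence or ODC API.

```python
import hashlib

class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}  # this node's autonomous data subset

class Cluster:
    def __init__(self, nodes):
        self.nodes = nodes

    def _owner(self, key):
        # Stable hash so every client routes a given key to the same node.
        digest = hashlib.md5(str(key).encode()).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._owner(key).store[key] = value

    def get(self, key):
        return self._owner(key).store.get(key)

cluster = Cluster([Node("A"), Node("B"), Node("C")])
cluster.put("trade-42", {"notional": 1000000})
print(cluster.get("trade-42"))
```

Note how the limitation on the slide falls out of the design: a query touching keys owned by different nodes (a cross-partition join) can no longer be answered by a single autonomous unit.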
  • 14. (2) Simplifying the Contract •  For many users ACID is overkill. •  Implementing ACID in a distributed architecture has a significant effect on performance. •  NoSQL Movement: CouchDB, MongoDB, 10gen, Basho, CouchOne, Cloudant, Cloudera, GoGrid, InfiniteGraph, Membase, Riptano, Scality…
  • 15. Databases have huge operational overheads Taken from “OLTP Through the Looking Glass, and What We Found There” Harizopoulos et al
  • 16. (3) Memory is 100x faster than disk [chart: access times spanning picoseconds to milliseconds: L1/L2 cache refs, main-memory refs, 1MB reads from main memory and from disk/network, cross-network and cross-continental round trips] * An L1 ref is about 2 clock cycles or 0.7ns. This is the time it takes light to travel 20cm.
  • 17. Avoid all that overhead RAM means: •  No IO •  Single Threaded ⇒ No locking / latching •  Rapid aggregation etc •  Query plans become less important
  • 18. We were keen to leverage these three factors in building the ODC: Distribution; Simplify the Contract; Memory Only.
  • 19. What is the ODC? A highly distributed, in-memory, normalised data store designed for scalable data access and processing.
  • 20. The Concept Originating from Scott Marcar’s concept of a central brain within the bank: “The copying of data lies at the root of many of the bank’s problems. By supplying a single real-time view that all systems can interface with we remove the need for reconciliation and promote the concept of truly shared services” - Scott Marcar (Head of Risk and Finance Technology)
• 21. This is quite a tricky problem: high bandwidth access to lots of data, low latency access to small amounts of data, and scalability to lots of users.
• 22. ODC Data Grid: Highly Distributed Physical Architecture. In-memory storage and lots of parallel processing, built on Oracle Coherence, with messaging (topic based) as the system of record (persistence).
• 23. The Layers: Access Layer (Java client API), Query Layer, Data Layer (Transactions, MTMs, Cashflows), Persistence Layer.
  • 24. But unlike most caches the ODC is Normalised
  • 25. Three Tools of Distributed Data Architecture Indexing Partitioning Replication
• 26. For speed, replication is best: wherever you go, the data will be there. But your storage is limited by the memory on a single node.
• 27. For scalability, partitioning is best: keys are spread across nodes (Aa-Ap, Fs-Fz, Xa-Yd), giving scalable storage, bandwidth and processing.
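As an illustration of the partitioning side, here is a minimal Java sketch of hash-routing keys across a fixed set of nodes. The class and method names are invented for the demo, not ODC or Coherence API:

```java
import java.util.*;

// Hypothetical sketch: spreading keys across partitions by hashing,
// as a partitioned cache does. All names here are illustrative.
public class PartitionDemo {
    // Route a key to one of nodeCount partitions.
    public static int partitionFor(String key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        // Keys spread across three nodes: storage, bandwidth and
        // processing all scale with the number of nodes.
        Map<Integer, List<String>> nodes = new HashMap<>();
        for (String key : List.of("Aa", "Ap", "Fs", "Fz", "Xa", "Yd")) {
            nodes.computeIfAbsent(partitionFor(key, 3), n -> new ArrayList<>()).add(key);
        }
        System.out.println(nodes);
    }
}
```

The same hash is computed by every client, so any node can locate any key without a central directory.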
• 28. Traditional Distributed Caching Approach: big denormalised objects (a Trade with its Trader and Party folded in) are spread across a distributed cache (keys Aa-Ap, Fs-Fz, Xa-Yd).
  • 29. But we believe a data store needs to be more than this: it needs to be normalised!
  • 30. So why is that? Surely denormalisation is going to be faster?
  • 31. Denormalisation means replicating parts of your object model
  • 32. …and that means managing consistency over lots of copies
• 33. … as parts of the object graph will be copied multiple times. Periphery objects (Trader, Party) that are denormalised onto core objects (Trade) will be duplicated multiple times across the data grid.
  • 34. …and all the duplication means you run out of space really quickly
• 35. Space issues are exacerbated further when data is versioned: each version (1, 2, 3, 4…) of a Trade duplicates its Trader and Party again. …and you need versioning to do MVCC
• 36. And reconstituting a previous time slice becomes very difficult, with Trades, Traders and Parties scattered across versions.
  • 37. Why Normalisation? Easy to change data (no distributed locks / transactions) Better use of memory. Facilitates Versioning And MVCC/Bi-temporal
  • 38. OK, OK, lets normalise our data then. What does that mean?
  • 39. We decompose our domain model and hold each object separately
• 40. This means the object graph will be split across multiple machines: Trades, Traders and Parties end up on different nodes.
• 41. Binding them back together involves a “distributed join” => lots of network hops between the machines holding the Trades, Traders and Parties.
  • 42. It’s going to be slow…
• 43. Whereas in the denormalised model the join is already done
• 44. Hence Denormalisation is FAST! (for reads)
• 45. So what we want is the advantages of a normalised store at the speed of a denormalised one! This is what the ODC is all about!
• 46. Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can’t we hold it all together?
  • 47. It’s all about the keys
• 48. We can collocate data with common keys, but when keys crosscut, the only way to collocate is to replicate.
• 49. We tackle this problem with a hybrid model: Trader and Party are held denormalised, while Trade is normalised.
  • 50. We adapt the concept of a Snowflake Schema.
  • 51. Taking the concept of Facts and Dimensions
  • 52. Everything starts from a Core Fact (Trades for us)
  • 53. Facts are Big, dimensions are small
• 55. Dimensions have many (crosscutting) keys
• 56. Looking at the data (chart of row counts, scale 0 to 150,000,000). Facts (big, with common keys): Valuation Legs, Valuations, Transaction Mapping, Cashflow Mapping, Party Alias, Transactions, Cashflows, Legs, Parties. Dimensions (small, with crosscutting keys): Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books.
  • 57. We remember we are a grid. We should avoid the distributed join.
• 58. … so we only want to ‘join’ data that is in the same process. Coherence’s KeyAssociation gives us this: Trades and their MTMs share a common key.
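Coherence’s KeyAssociation is a real interface (`com.tangosol.net.cache.KeyAssociation`), but the sketch below declares its own stand-in so it runs without the Coherence jar; the key class and partitioning function are illustrative only:

```java
// Sketch of Coherence-style key association. The real interface lives in
// com.tangosol.net.cache; a stand-in is declared here so the example is
// self-contained.
public class KeyAssociationDemo {
    interface KeyAssociation { Object getAssociatedKey(); }

    // An MTM has its own id but associates with its parent trade's id, so a
    // trade and its MTMs hash to the same partition and join in-process.
    static final class MtmKey implements KeyAssociation {
        final String mtmId;
        final String tradeId;
        MtmKey(String mtmId, String tradeId) { this.mtmId = mtmId; this.tradeId = tradeId; }
        public Object getAssociatedKey() { return tradeId; }
    }

    // The grid partitions on the associated key rather than the raw key.
    public static int partitionOf(Object key, int partitions) {
        Object routed = (key instanceof KeyAssociation)
                ? ((KeyAssociation) key).getAssociatedKey() : key;
        return Math.floorMod(routed.hashCode(), partitions);
    }

    public static void main(String[] args) {
        MtmKey mtm = new MtmKey("MTM-9", "TRADE-1");
        // The MTM lands in the same partition as its trade.
        System.out.println(partitionOf(mtm, 13) == partitionOf("TRADE-1", 13));
    }
}
```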
• 59. So we prescribe different physical storage for Facts and Dimensions: Trader and Party are replicated; Trade is partitioned (key association ensures joins are in-process).
• 60. Facts are held distributed, Dimensions are replicated (chart of row counts, scale 0 to 150,000,000). Facts (big => distribute): Valuation Legs, Valuations, Transaction Mapping, Cashflow Mapping, Party Alias, Transactions, Cashflows, Legs, Parties. Dimensions (small => replicate): Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books.
• 61. Facts are partitioned across the Data Layer; Dimensions are replicated across the Query Layer. (Query Layer: Trader, Party, Trade. Data Layer: Transactions, MTMs, Cashflows in partitioned fact storage.)
• 62. Key Point: we use a variant on a Snowflake Schema to partition the big stuff that shares a common key, and to replicate the small stuff whose keys crosscut.
• 63. So how does this help us to run queries without distributed joins? Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’ This query involves: •  Joins between Dimensions: to evaluate the where clause •  Joins between Facts: Transaction joins to MTM •  Joins between all facts and dimensions needed to construct the return result
• 64. Stage 1: Focus on the where clause: Where Cost Centre = ‘CC1’
• 65. Stage 1: Get the right keys to query the Facts. Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’ LBs[] = getLedgerBooksFor(CC1); SBs[] = getSourceBooksFor(LBs[]). Now we have all the bottom-level dimensions needed to query the facts (Transactions, MTMs, Cashflows: partitioned).
• 66. Stage 2: Cluster-side join to get Facts. Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’ LBs[] = getLedgerBooksFor(CC1); SBs[] = getSourceBooksFor(LBs[]). Get all Transactions and MTMs (cluster-side join) for the passed Source Books from the partitioned fact caches.
  • 67. Stage 2: Join the facts together efficiently as we know they are collocated
• 68. Stage 3: Augment raw Facts with relevant Dimensions. Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’ Populate the raw facts (Transactions) with dimension data before returning to the client.
  • 69. Stage 3: Bind relevant dimensions to the result
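The three stages above can be sketched with plain maps standing in for the replicated dimension caches and partitioned fact caches. The data is invented and the helper names follow the slides; this is a shape sketch, not the ODC implementation:

```java
import java.util.*;
import java.util.stream.*;

// Hedged sketch of the three query stages. Plain maps stand in for the
// replicated dimension caches and the partitioned fact caches.
public class SnowflakeQueryDemo {
    // Replicated dimensions: cost centre -> ledger books -> source books.
    static final Map<String, List<String>> LEDGER_BY_CC =
            Map.of("CC1", List.of("LB1", "LB2"));
    static final Map<String, List<String>> SOURCE_BY_LB =
            Map.of("LB1", List.of("SB1"), "LB2", List.of("SB2"));
    // Partitioned facts: source book -> transaction ids.
    static final Map<String, List<String>> TXNS_BY_SB =
            Map.of("SB1", List.of("T1"), "SB2", List.of("T2", "T3"));

    public static List<String> query(String costCentre) {
        // Stage 1: resolve the where clause against replicated dimensions, in-process.
        List<String> lbs = LEDGER_BY_CC.getOrDefault(costCentre, List.of());
        List<String> sbs = lbs.stream()
                .flatMap(lb -> SOURCE_BY_LB.getOrDefault(lb, List.of()).stream())
                .collect(Collectors.toList());
        // Stage 2: fetch the facts for those keys (a collocated cluster-side join in the ODC).
        List<String> txns = sbs.stream()
                .flatMap(sb -> TXNS_BY_SB.getOrDefault(sb, List.of()).stream())
                .collect(Collectors.toList());
        // Stage 3: augment the raw facts with dimension data before returning (elided here).
        return txns;
    }

    public static void main(String[] args) {
        System.out.println(query("CC1")); // transactions reachable from CC1
    }
}
```

Note that no stage ships key sets between machines: stage 1 runs against local replicas, and stage 2 joins data that key association has already collocated.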
• 70. Bringing it together: replicated Dimensions, partitioned Facts, and a Java client API. We never have to do a distributed join!
• 71. Coherence Voodoo: Joining Distributed Facts across the Cluster. Related Trades and MTMs (Facts) are collocated on the same machine with key affinity and joined there with an aggregator. Direct backing map access must be used due to threading issues in Coherence. http://www.benstopford.com/2009/11/20/how-to-perform-efficient-cross-cache-joins-in-coherence/
  • 72. So we are normalised And we can join without extra network hops
  • 73. We get to do this… Trader Party Trade Trade Trader Party
  • 74. …and this… Trader Party Version 1 Trade Trader Party Version 2 Trade Trader Party Version 3 Trade Trader Party Version 4 Trade
  • 75. ..and this.. Trade Trader Party Party Trader Trade Party Trader Trade Party
  • 78. ..all at the speed of this… well almost!
  • 80. But there is a fly in the ointment…
• 81. I lied earlier. These aren’t all Facts: Party Alias and Parties are really dimensions. •  They have a different key to the Facts. •  And they’re BIG. (Chart of row counts, scale 0 to 125,000,000: Valuation Legs, Valuations, Transaction Mapping, Cashflow Mapping, Party Alias, Transactions, Cashflows, Legs, Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books.)
• 82. We can’t replicate really big stuff… we’ll run out of space => Big Dimensions are a problem.
  • 83. Fortunately we found a simple solution!
  • 84. We noticed that whilst there are lots of these big dimensions, we didn’t actually use a lot of them. They are not all “connected”.
  • 85. If there are no Trades for Barclays in the data store then a Trade Query will never need the Barclays Counterparty
• 86. Looking at all the Dimension data, some dimensions are quite large. (Chart of row counts, scale 0 to 5,000,000: Party Alias, Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books.)
• 87. But the Connected Dimension data is tiny by comparison. (Same chart, same scale: the connected subset of each dimension barely registers.)
  • 88. So we only replicate ‘Connected’ or ‘Used’ dimensions
• 89. As data is written to the data store we keep our ‘Connected Caches’ up to date. As new Facts are added to the partitioned fact storage in the Data Layer (Transactions, MTMs, Cashflows), the Dimensions they reference are moved to the replicated dimension caches in the Processing Layer.
  • 91. The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
• 92. Saving a trade causes all its first-level references to be triggered. The Trade is stored in the partitioned Data Layer (all normalised), and the store triggers replication of its Party Alias, Source Book and Ccy to the Query Layer’s connected dimension caches.
• 93. This updates the connected caches: Party Alias, Source Book and Ccy now sit in the Query Layer, with the Trade held normalised in the Data Layer.
• 94. The process recurses through the object graph: the Party Alias pulls in its Party, and the Source Book pulls in its Ledger Book.
• 95. ‘Connected Replication’: a simple pattern which recurses through the foreign keys in the domain model, ensuring only ‘Connected’ dimensions are replicated.
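A hedged sketch of the Connected Replication recursion, with the domain model reduced to a map of foreign-key arcs. The entity names follow the slides; the data structure and method names are invented for illustration:

```java
import java.util.*;

// Sketch of the Connected Replication pattern: when a fact is saved, recurse
// through its foreign keys and collect every reachable dimension, so only
// 'connected' dimensions are replicated.
public class ConnectedReplicationDemo {
    // Foreign-key arcs of the domain model: entity -> entities it references.
    static final Map<String, List<String>> REFERENCES = Map.of(
            "Trade", List.of("Party Alias", "Source Book", "Ccy"),
            "Party Alias", List.of("Party"),
            "Source Book", List.of("Ledger Book"));

    public static Set<String> connectedDimensions(String fact) {
        Set<String> connected = new LinkedHashSet<>();
        Deque<String> toVisit = new ArrayDeque<>(REFERENCES.getOrDefault(fact, List.of()));
        while (!toVisit.isEmpty()) {
            String dim = toVisit.pop();
            if (connected.add(dim)) { // first sighting: replicate it, then recurse
                toVisit.addAll(REFERENCES.getOrDefault(dim, List.of()));
            }
        }
        return connected;
    }

    public static void main(String[] args) {
        // Saving a Trade triggers replication of its transitive dimension closure.
        System.out.println(connectedDimensions("Trade"));
    }
}
```

The visited-set check also terminates the recursion cleanly if the domain model contains reference cycles.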
  • 96. Limitations of this approach •  Data set size. Size of connected dimensions limits scalability. •  Joins are only supported between “Facts” that can share a partitioning key (But any dimension join can be supported)
• 97. Performance is very sensitive to serialisation costs: avoid them with POF, which can deserialise just one field (for example an integer ID) from the binary value rather than the whole object stream.
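POF itself is Coherence-specific, so the sketch below mimics the idea with a hand-rolled layout (an int id followed by a payload) to show why extracting one field from a binary value is cheap; the format is invented for the demo:

```java
import java.io.*;

// Sketch of partial deserialisation: pull one field out of a binary value
// without materialising the whole object. The [int id][UTF payload] layout
// is a stand-in for a POF-encoded value.
public class PartialDeserialiseDemo {
    public static byte[] serialise(int id, String payload) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(id);      // the field we will want later
            out.writeUTF(payload); // the rest of the (potentially large) object
            return bos.toByteArray();
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    // Read just the id: the rest of the stream is never touched.
    public static int extractId(byte[] binary) {
        try {
            return new DataInputStream(new ByteArrayInputStream(binary)).readInt();
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    public static void main(String[] args) {
        byte[] value = serialise(42, "a large trade object we never fully deserialise");
        System.out.println(extractId(value)); // prints 42
    }
}
```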
  • 98. Other cool stuff (very briefly)
• 99. Everything is Java: a Java client API, a Java schema, and Java ‘Stored Procedures’ and ‘Triggers’.
• 100. Messaging as a System of Record. The ODC provides a real-time view over any part of the dataset, as messaging is used as the system of record. Messaging provides a more scalable system of record than a database would. (Layers: Access Layer with Java client API, Processing Layer, Data Layer with Transactions, MTMs and Cashflows, Persistence Layer.)
  • 101. Being event based changes the programming model. The system provides both real time and query based views on the data. The two are linked using versioning Replication to DR, DB, fact aggregation
  • 102. API – Queries utilise a fluent interface
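The slides do not show the actual API, so this is a purely hypothetical sketch of what a fluent query builder in this spirit might look like; every method name here is invented, not the ODC API:

```java
import java.util.*;

// Hypothetical fluent query builder. Method names (select/from/whereEquals)
// are illustrative only.
public class FluentQueryDemo {
    private final List<String> selects = new ArrayList<>();
    private String from;
    private final Map<String, String> predicates = new LinkedHashMap<>();

    public static FluentQueryDemo select(String... fields) {
        FluentQueryDemo q = new FluentQueryDemo();
        q.selects.addAll(Arrays.asList(fields));
        return q;
    }
    public FluentQueryDemo from(String entity) { this.from = entity; return this; }
    public FluentQueryDemo whereEquals(String field, String value) {
        predicates.put(field, value);
        return this;
    }
    // Render the chained calls as the query text used throughout the slides.
    public String toQueryString() {
        StringBuilder sb = new StringBuilder("Select ")
                .append(String.join(", ", selects))
                .append(" From ").append(from);
        if (!predicates.isEmpty()) {
            List<String> parts = new ArrayList<>();
            predicates.forEach((f, v) -> parts.add(f + " = '" + v + "'"));
            sb.append(" Where ").append(String.join(" And ", parts));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String q = select("Transaction", "MTM")
                .from("Transaction")
                .whereEquals("Cost Centre", "CC1")
                .toQueryString();
        System.out.println(q);
    }
}
```

The appeal of a fluent interface here is that query construction is checked by the Java compiler rather than parsed from strings at runtime.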
• 103. Performance. A query with more than twenty join conditions: 2GB per min / 250Mb/s throughput, 3ms latency (per client).
  • 104. Conclusion Data warehousing, OLTP and Distributed caching fields are all converging on in- memory architectures to get away from disk induced latencies.
  • 105. Conclusion Shared nothing architectures are always subject to the distributed join problem if they are to retain a degree of normalisation.
• 106. Conclusion We present a novel mechanism for avoiding the distributed join problem by using a Star Schema to define whether data should be replicated or partitioned.
  • 107. Conclusion We make the pattern applicable to ‘real’ data models by only replicating objects that are actually used: the Connected Replication pattern.
  • 108. The End •  Further details online http://www.benstopford.com (linked from my Qcon bio) •  A big thanks to the team in both India and the UK who built this thing. •  Questions?

Editor's Notes

  1. This avoids distributed transactions that would be needed when updating dimensions in a denormalised model. It also reduces memory consumption which can become prohibitive in MVCC systems. This then facilitates a storage topology that eradicates cross-machine joins (joins between data sets held on different machines which require that entire key sets are shipped to perform the join operation).
  2. Big data sets are held distributed and only joined on the grid to collocated objects. Small data sets are held in replicated caches so they can be joined in process (only ‘active’ data is held)