basho
Core Concepts
Introduction to Riak
AKQA
24th July 2013
Friday, 26 July 13
WHO AM I?
Joel Jacobson
Technical Evangelist
Basho Technologies
@joeljacobson
Distributed computing is
HARD.
PROBLEMS?
• Concurrency and latency at scale
• Data consistency
• Uptime/failover
• Multi-tenancy
• SLAs
WHAT IS RIAK?
• Key-Value store + extras
• Distributed and horizontally scalable
• Fault-tolerant
• Highly available
• Built for the web
INSPIRED BY AMAZON DYNAMO
• Amazon published a white paper describing the database system behind its shopping cart
• Masterless, peer-coordinated replication
• Dynamo-inspired data stores: Riak, Cassandra, Voldemort, etc.
• Consistent hashing - no sharding :-)
• Eventually consistent
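To make the consistent hashing bullet concrete, here is a minimal sketch of how a key can be mapped onto a ring of equal-sized partitions. It is a simplification, not Riak's actual code: the node names, the 64-partition ring, the plain `bucket/key` string encoding and the round-robin ownership are all assumptions chosen for illustration.

```python
import hashlib

RING_SIZE = 2 ** 160      # size of the SHA-1 output space
NUM_PARTITIONS = 64       # assumed number of partitions for this sketch
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical node names


def partition_for(bucket: str, key: str) -> int:
    """Hash bucket/key onto the ring and return the index of its partition."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (RING_SIZE // NUM_PARTITIONS)


def owner_of(partition: int, nodes=NODES) -> str:
    """Deal partitions out round-robin so every node owns an equal share."""
    return nodes[partition % len(nodes)]


if __name__ == "__main__":
    p = partition_for("users", "joel")
    print(f"users/joel -> partition {p}, owned by {owner_of(p)}")
```

Because a key's partition depends only on its hash, any node can work out where a key lives without asking a master. Adding a node changes which partitions each node owns, not which partition a key falls in.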
RIAK KEY-VALUE STORE
• Simple operations - GET, PUT, DELETE
• Value is opaque, with metadata
• Extras, e.g.
• Secondary Indexes (2i)
• MapReduce
• Full text search
HORIZONTALLY SCALABLE
• Near linear scalability
• Query load and data are spread evenly
• Add more nodes and get more:
• ops/second
• storage capacity
• compute power (for Map/Reduce)
FAULT TOLERANT
• All nodes participate equally - no single point of failure (SPOF)
• All data is replicated
• Clusters self-heal via handoff and Active Anti-Entropy
• Cluster transparently survives...
• node failure
• network partitions
• Built on Erlang/OTP (designed for fault tolerance)
HIGHLY AVAILABLE
• Any node can serve client requests
• Fallbacks are used when nodes are down
• Always accepts read and write requests
• Per-request quorums
QUORUMS - N/R/W
• Tunable down to bucket level
• n_val = 3 by default
• r = w = 2 by default (a quorum of n_val = 3)
• w = 1 - quicker response; reads could be inconsistent in the short term
• w = all - slower response, increased data consistency
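The trade-off between these settings comes down to simple arithmetic: a read is guaranteed to touch at least one replica that took the latest write when r + w > n. The sketch below illustrates that rule only. The function names are hypothetical, and the guarantee assumes all primary replicas are reachable (fallbacks weaken it).

```python
def resolve(value, n: int) -> int:
    """Turn a symbolic quorum value ('one', 'quorum', 'all') into a replica count."""
    symbolic = {"one": 1, "quorum": n // 2 + 1, "all": n}
    return symbolic.get(value, value)


def read_overlaps_write(n: int, r, w) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return resolve(r, n) + resolve(w, n) > n


if __name__ == "__main__":
    for r, w in [("quorum", "quorum"), (1, 1), (1, "all")]:
        print(f"n=3 r={r} w={w} overlap={read_overlaps_write(3, r, w)}")
```

With the defaults (n = 3, r = w = 2) the sets overlap. With r = w = 1 they may not, which is where the short-term inconsistency mentioned above comes from.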
CAP THEOREM
• C = Consistency
• A = Availability
• P = Partition Tolerance
• The CAP theorem states that a distributed shared-data system can support at most 2 of these 3 properties
[Figure: clients and DB nodes separated by a network/data partition]
THE RING
REPLICATION
• Replicated to 3 nodes by default (n_val = 3, which is configurable)
DISASTER SCENARIO
• Node fails
• Request goes to fallback
• Node comes back
• Handoff - data returned to recovered node
• Normal operations resume automatically
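The sequence above can be sketched as a toy model of fallback writes and handoff. This is an illustration under simplifying assumptions, not Riak's implementation: the `Node` class, the `put` and `handoff` functions and the node names are all hypothetical.

```python
class Node:
    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.data = {}    # keys this node owns as a primary
        self.hinted = {}  # owner name -> {key: value} held on that owner's behalf


def put(key, value, primaries, fallbacks):
    """Write to every primary; a down primary is covered by the next fallback."""
    spare = iter(fallbacks)
    for primary in primaries:
        if primary.up:
            primary.data[key] = value
        else:
            stand_in = next(spare)
            stand_in.hinted.setdefault(primary.name, {})[key] = value


def handoff(fallback: Node, recovered: Node) -> None:
    """Return the data held for a recovered node, then drop the local copy."""
    recovered.data.update(fallback.hinted.pop(recovered.name, {}))


if __name__ == "__main__":
    a, b, c, d = (Node(n) for n in "abcd")
    b.up = False                      # node fails
    put("k", "v", [a, b, c], [d])     # request goes to fallback d
    b.up = True                       # node comes back
    handoff(d, b)                     # data returned to recovered node
    print(b.data, d.hinted)
```

The write still reaches three replicas while `b` is down, which is why the cluster keeps accepting requests during the failure.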
ACTIVE ANTI-ENTROPY
• Automatically repair inconsistencies in data
• Introduced in Riak 1.3.0; uses Merkle trees to compare data in partitions and periodically ensure consistency
• Active Anti-Entropy runs as a background process
• Can also be configured as a manual process
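The Merkle tree comparison can be sketched in a few lines. This is a generic hash-tree example, not Riak's on-disk format: the segment count (assumed to be a power of two), the use of SHA-256 and all function names are assumptions for illustration.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hashes(store: dict, segments: int = 4) -> list:
    """Group keys into a fixed number of segments and hash each segment's contents."""
    groups = [[] for _ in range(segments)]
    for key in sorted(store):
        idx = h(key.encode())[0] % segments
        groups[idx].append(key.encode() + b"=" + store[key].encode())
    return [h(b"\n".join(group)) for group in groups]


def root(leaves: list) -> bytes:
    """Hash pairs of nodes upwards until a single root hash remains."""
    level = leaves
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def differing_segments(a: dict, b: dict) -> list:
    """Compare roots first; only look at the leaves when the roots disagree."""
    la, lb = leaf_hashes(a), leaf_hashes(b)
    if root(la) == root(lb):
        return []
    return [i for i, (x, y) in enumerate(zip(la, lb)) if x != y]


if __name__ == "__main__":
    replica1 = {"k1": "v1", "k2": "v2"}
    replica2 = {"k1": "v1", "k2": "stale"}
    print(differing_segments(replica1, replica2))
```

Two replicas that agree exchange a single root hash. Only when the roots differ do they need to narrow down which segment holds the divergent keys.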
CONFLICT RESOLUTION
• Network partitions and concurrent actors modifying the same data cause data divergence
• Riak provides two ways to manage this, configurable per bucket:
• Last Write Wins - the most recent write is kept; suitable only for some use cases
• Vector Clocks - retain “sibling” copies of data for merging
VECTOR CLOCKS
• Every node has an ID
• Send last-seen vector clock in every “put” request
• Can be viewed as a ‘commit history’, e.g. as in Git
• Lets you decide how to resolve conflicts
SIBLING CREATION
[Figure: sibling creation on the ring in two steps; copies of object v1 carry the vector clocks [{a,3}] and [{a,2},{b,1}]]
• Siblings can be created by:
• Simultaneous writes (based on same object version)
• Network partitions
• Writes to existing key without submitting vector clock
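The two clocks shown on this slide, [{a,3}] and [{a,2},{b,1}], make a handy worked example. The sketch below uses a plain dict per clock and hypothetical function names. It shows the general vector clock comparison rule rather than Riak's internal representation.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: dict, b: dict) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "siblings"  # neither history contains the other


def merge(a: dict, b: dict) -> dict:
    """Clock for a value that resolves both siblings: per-actor maximum."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


if __name__ == "__main__":
    first = {"a": 3}
    second = {"a": 2, "b": 1}
    print(compare(first, second))
    print(merge(first, second))
```

Neither clock descends from the other here (the first has seen more of actor a, the second has seen actor b), so both values are kept as siblings until a client writes back a resolved value.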
STORAGE BACKENDS
• Bitcask
• LevelDB
• Memory
• Multi
BITCASK
• A fast, append-only key-value store
• In-memory key lookup table (key_dir); data on disk
• Closed files are immutable
• Merging cleans up old data
• Developed by Basho Technologies
• Suitable for bounded data, e.g. reference data
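A toy version of the append-only log plus in-memory key_dir shows why lookups are fast and why merging is needed. It is only a sketch: the class name is hypothetical, an in-memory buffer stands in for the data file, and the real record format (checksums, timestamps, multiple files) is left out.

```python
import io


class TinyBitcask:
    def __init__(self):
        self.log = io.BytesIO()  # stands in for the active data file on disk
        self.key_dir = {}        # key -> (offset, length) of the newest value

    def put(self, key: str, value: bytes) -> None:
        """Append the value to the log and point the key at the new location."""
        offset = self.log.seek(0, io.SEEK_END)
        self.log.write(value)
        self.key_dir[key] = (offset, len(value))

    def get(self, key: str) -> bytes:
        """One dictionary lookup, then one positioned read."""
        offset, length = self.key_dir[key]
        self.log.seek(offset)
        return self.log.read(length)

    def merge(self) -> None:
        """Copy only live values into a fresh log, dropping superseded ones."""
        live = {key: self.get(key) for key in self.key_dir}
        self.log, self.key_dir = io.BytesIO(), {}
        for key, value in live.items():
            self.put(key, value)


if __name__ == "__main__":
    db = TinyBitcask()
    db.put("a", b"1")
    db.put("a", b"22")  # the old value stays in the log until merge
    print(db.get("a"), len(db.log.getvalue()))
    db.merge()
    print(db.get("a"), len(db.log.getvalue()))
```

Every key lives in memory in the key_dir, which is why this design suits bounded key sets.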
LEVELDB
• Key-Value storage developed by Google
• Append-only for very large data sets
• Multiple levels of SSTable-like data structures
• Allows for more advanced querying (2i)
• It includes compression (Snappy algorithm)
• Suitable for unbounded data or advanced querying
MEMORY
• Data is never persisted to disk
• Typically used for “test” databases (unit tests, etc.)
• Definable memory limits per vnode
• Configurable object expiry
• Useful for highly transient data
MULTI
• Configure multiple storage engines for different types of data
• Configure the “default” storage engine
• Choose storage engine on per bucket basis
• No reason not to use it
CLIENT APIS
• Riak supports two main client types:
• REST-based HTTP interface
• Easy to use from the command line and simple scripts
• Useful with an intermediate caching layer, e.g. Varnish
• Protocol Buffers
• Optimized binary encoding standard developed by Google
• More performant than HTTP interface
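As a rough illustration of the HTTP interface, the sketch below builds (but does not send) GET, PUT and DELETE requests for a single object. The host and port assume a local node on the default HTTP port, and the bucket, key and helper names are hypothetical. Check the paths, parameters and headers against the API documentation for your Riak version.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "http://127.0.0.1:8098"  # assumed local node, default HTTP port


def object_url(bucket: str, key: str, **params) -> str:
    """URL of one object, with optional per-request quorum parameters."""
    url = f"{BASE}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"
    return f"{url}?{urlencode(params)}" if params else url


def get_request(bucket: str, key: str, **params) -> Request:
    return Request(object_url(bucket, key, **params), method="GET")


def put_request(bucket, key, body: bytes, content_type, vclock=None, **params) -> Request:
    headers = {"Content-Type": content_type}
    if vclock is not None:
        headers["X-Riak-Vclock"] = vclock  # last-seen vector clock from a prior read
    return Request(object_url(bucket, key, **params), data=body, headers=headers, method="PUT")


def delete_request(bucket: str, key: str) -> Request:
    return Request(object_url(bucket, key), method="DELETE")


if __name__ == "__main__":
    req = put_request("users", "joel", b'{"name": "Joel"}', "application/json", w=2)
    print(req.get_method(), req.full_url)
```

To run a request against a live node, pass the result to `urllib.request.urlopen`.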
CLIENT LIBRARIES
• Client libraries supported by Basho:
• Community supported languages and frameworks:
• C/C++, Clojure, Common Lisp, Dart, Django, Go, Grails, Griffon, Groovy,
Erlang, Haskell, Java, .NET, Node.js, OCaml, Perl, PHP, Play, Python, Racket,
Ruby, Scala, Smalltalk
• Using Riak as datastore for all back-end systems supporting
Angry Birds
• Game-state storage, ID/Login, Payments, Push notifications,
analytics, advertisements
• 9 clusters in use with over 100 nodes
• 263 million active monthly users
• Spine2 project - storing patient data (80 million+)
• 500 complex messages per second
• 20,000 integrated end points
• 0 data loss
• 99.9% availability SLA
• Push to talk application
• Billions of requests daily
• > 50 dedicated servers
• Everything stored in Riak
• https://github.com/mranney/node_riak
MULTI DATACENTER REPLICATION (MDC)
• Replicates data between clusters in different data centers and tolerates the higher latencies between them
• Two synchronization modes that can be used together: real-time and full sync
• Set up as uni-directional or bi-directional replication
• Can be used for global load balancing, business continuity and backups
RIAK-CS
• Built on top of Riak and supports MDC
• S3 compatible object storage
• Supports multi-tenancy
• Per-tenant usage data and statistics on network I/O
• Supports objects of arbitrary content type, up to 5 TB
• Often used to build private cloud storage
PLAY AROUND WITH RIAK?
• https://github.com/joeljacobson/riak-dev-cluster
• https://github.com/joeljacobson/vagrant-riak-cluster
THANK YOU
joel@basho.com
basho
How to Troubleshoot Apps for the Modern Connected Worker
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Introduction to Riak - Joel Jacobson

  • 1. basho
      Core Concepts
      Introduction to Riak
      AKQA
      24th July 2013
      Friday, 26 July 13
  • 2. WHO AM I?
      Joel Jacobson
      Technical Evangelist
      Basho Technologies
      @joeljacobson
  • 4. PROBLEMS?
      • Concurrency and latency at scale
      • Data consistency
      • Uptime/failover
      • Multi-tenancy
      • SLAs
  • 5. WHAT IS RIAK?
      • Key-Value store + extras
      • Distributed and horizontally scalable
      • Fault-tolerant
      • Highly available
      • Built for the web
  • 6. INSPIRED BY AMAZON DYNAMO
      • White paper released to describe a database system used for Amazon's shopping cart
      • Masterless, peer-coordinated replication
      • Dynamo-inspired data stores: Riak, Cassandra, Voldemort, etc.
      • Consistent hashing - no sharding :-)
      • Eventually consistent
  • 7. RIAK KEY-VALUE STORE
      • Simple operations - GET, PUT, DELETE
      • Value is opaque, with metadata
      • Extras, e.g.
          • Secondary Indexes (2i)
          • MapReduce
          • Full text search
  • 8. HORIZONTALLY SCALABLE
      • Near linear scalability
      • Query load and data are spread evenly
      • Add more nodes and get more:
          • ops/second
          • storage capacity
          • compute power (for Map/Reduce)
  • 9. FAULT TOLERANT
      • All nodes participate equally - no single point of failure (SPOF)
      • All data is replicated
      • Clusters self heal - Handoff, Active Anti-Entropy
      • Cluster transparently survives...
          • node failure
          • network partitions
      • Built on Erlang/OTP (designed for FT)
  • 10. HIGHLY AVAILABLE
      • Any node can serve client requests
      • Fallbacks are used when nodes are down
      • Always accepts read and write requests
      • Per-request quorums
  • 11. QUORUMS - N/R/W
      • Tunable down to bucket level
      • n_val = 3 by default
      • w / r = 2 by default
      • w = 1 - Quicker response time; reads could be inconsistent in the short term
      • w = all - Slower response, increased data consistency
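The w = 1 versus w = all trade-off on the quorums slide can be made concrete with the usual overlap rule: a read is only certain to touch a replica that acknowledged the latest write when r + w > n_val. The sketch below is a hypothetical helper (`quorums_overlap` is not a Riak function), and it ignores fallbacks and failures, which can weaken the guarantee in practice.

```python
def quorums_overlap(n_val: int, r: int, w: int) -> bool:
    """Return True if every read set of size r must share at least one
    replica with every write set of size w among n_val replicas."""
    if not (1 <= r <= n_val and 1 <= w <= n_val):
        raise ValueError("r and w must be between 1 and n_val")
    return r + w > n_val


if __name__ == "__main__":
    n_val = 3
    # (r, w) pairs: the defaults, a fast write, and w = all
    for r, w in [(2, 2), (2, 1), (1, 3)]:
        print(f"n_val={n_val} r={r} w={w} overlap={quorums_overlap(n_val, r, w)}")
```

With the defaults from the slide (n_val = 3, r = w = 2) the sets overlap; dropping to w = 1 while keeping r = 2 removes that overlap, which is the short-term inconsistency the slide mentions.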
  • 12. CAP THEOREM
      • C = Consistency
      • A = Availability
      • P = Partition Tolerance
      • The CAP theorem states that a distributed shared-data system can support at most 2 of these 3 properties
      [Diagram: clients and DB nodes separated by a network/data partition]
  • 14. REPLICATION
      • Replicated to 3 nodes by default (n_val = 3, which is configurable)
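A minimal sketch of how consistent hashing and n_val = 3 replication can fit together, assuming a SHA-1 ring split into 64 equal partitions and a round-robin assignment of partitions to nodes. The partition count, the function names, and the round-robin ownership are assumptions made for illustration; Riak's real claim logic is more involved.

```python
import hashlib

RING_SIZE = 2 ** 160      # size of the SHA-1 output space
NUM_PARTITIONS = 64       # assumed partition count for this sketch


def partition_for(bucket: str, key: str) -> int:
    """Map bucket/key to the index of the partition that contains its hash."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (RING_SIZE // NUM_PARTITIONS)


def preference_list(bucket: str, key: str, n_val: int = 3) -> list:
    """Return n_val consecutive partitions, walking around the ring."""
    first = partition_for(bucket, key)
    return [(first + i) % NUM_PARTITIONS for i in range(n_val)]


def owner(partition: int, nodes: list) -> str:
    """Assign partitions to nodes round-robin (a simplification)."""
    return nodes[partition % len(nodes)]


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]   # hypothetical node names
    for p in preference_list("users", "user_id"):
        print(p, owner(p, nodes))
```

Because the replicas land on consecutive partitions and ownership alternates between nodes, the three copies end up on three different nodes in this toy layout.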
  • 15. DISASTER SCENARIO
      • Node fails
      • Request goes to fallback
      • Node comes back
      • Handoff - data returned to recovered node
      • Normal operations resume automatically
  • 16. DISASTER SCENARIO
      • Node fails
      • Request goes to fallback
      • Node comes back
      • Handoff - data returned to recovered node
      • Normal operations resume automatically
      hash(“user_id”)
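The disaster scenario steps can be sketched as a toy simulation. Everything here (the `Node` class, `put`, `handoff`) is hypothetical and only mirrors the sequence on the slide: a write meant for a failed node is parked on a fallback, then handed back when the node recovers.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}     # key -> value owned by this node
        self.hints = {}    # intended owner name -> {key: value} held on its behalf


def put(key, value, primaries, fallbacks):
    """Write to each primary; if one is down, park the write on a fallback."""
    spare = iter(fallbacks)
    for node in primaries:
        if node.up:
            node.data[key] = value
        else:
            fallback = next(spare)
            fallback.hints.setdefault(node.name, {})[key] = value


def handoff(fallback, recovered):
    """Return parked writes to the recovered node and clear them from the fallback."""
    parked = fallback.hints.pop(recovered.name, {})
    recovered.data.update(parked)
    return len(parked)


if __name__ == "__main__":
    a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
    b.up = False                                   # node fails
    put("user_id", "v1", [a, b, c], [d])           # request goes to fallback d
    b.up = True                                    # node comes back
    print(handoff(d, b), b.data)                   # handoff returns the data
```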
  • 17. ACTIVE ANTI-ENTROPY
      • Automatically repairs inconsistencies in data
      • Active Anti-Entropy was new in 1.3.0 and uses Merkle trees to compare data in partitions and periodically ensure consistency
      • Active Anti-Entropy runs as a background process
      • Can also be configured as a manual process
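To illustrate why hash trees make the comparison cheap, here is a simplified two-level sketch: keys are grouped into a fixed number of segments, each segment gets a hash, and a root hash covers all segments. Two replicas compare roots first and only look inside segments whose hashes differ. The segment count, SHA-256, and all function names are assumptions for the example; a real Merkle tree has more levels and Riak's implementation differs.

```python
import hashlib

LEAVES = 16  # assumed number of leaf segments per tree


def _h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _segment(key: str) -> int:
    """Pick the leaf segment for a key from its hash, so both replicas agree."""
    return int(_h(key.encode("utf-8")), 16) % LEAVES


def build_tree(store: dict) -> dict:
    """Build a two-level hash tree: one hash per segment plus a root hash."""
    segments = [[] for _ in range(LEAVES)]
    for key in sorted(store):
        segments[_segment(key)].append(f"{key}={store[key]}")
    leaves = [_h("\n".join(items).encode("utf-8")) for items in segments]
    root = _h("".join(leaves).encode("utf-8"))
    return {"root": root, "leaves": leaves}


def keys_to_repair(store_a: dict, store_b: dict) -> set:
    """Compare roots first; only descend into segments whose hashes differ."""
    tree_a, tree_b = build_tree(store_a), build_tree(store_b)
    if tree_a["root"] == tree_b["root"]:
        return set()
    suspect = {i for i in range(LEAVES)
               if tree_a["leaves"][i] != tree_b["leaves"][i]}
    candidates = set(store_a) | set(store_b)
    return {k for k in candidates
            if _segment(k) in suspect and store_a.get(k) != store_b.get(k)}
```

When the replicas agree, a single root comparison is enough; when they diverge, only the keys in mismatching segments need to be examined.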
  • 18. CONFLICT RESOLUTION
      • Network partitions and concurrent actors modifying the same data cause data divergence
      • Riak provides two ways to manage this, which can be set at the bucket level:
          • Last Write Wins - an approach used for some use cases
          • Vector Clocks - Retain “sibling” copies of data for merging
  • 19. VECTOR CLOCKS
      • Every node has an ID
      • Send last-seen vector clock in every “put” request
      • Can be viewed as a ‘commit history’, e.g. Git
      • Lets you decide conflicts
  • 20. SIBLING CREATION
      [Diagram: copies of Object v1 with vector clocks [{a,3}] and [{a,2},{b,1}], shown in two steps, 1) and 2)]
      • Siblings can be created by:
          • Simultaneous writes (based on same object version)
          • Network partitions
          • Writes to existing key without submitting vector clock
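The two clocks on the sibling slide, [{a,3}] and [{a,2},{b,1}], can be compared with a few lines of code. This is a sketch of the general vector clock technique using plain dicts; the function names are chosen for the example and are not Riak's API.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen everything clock b has (a is equal to or newer than b)."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify two clocks as equal, ordered, or concurrent."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a_newer"
    if descends(b, a):
        return "b_newer"
    return "concurrent"   # neither descends the other: keep both values as siblings


def increment(clock: dict, actor: str) -> dict:
    """Return a copy of the clock with the actor's counter bumped by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def merge(a: dict, b: dict) -> dict:
    """Clock for a value that resolves siblings: the per-actor maximum."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in set(a) | set(b)}


if __name__ == "__main__":
    left, right = {"a": 3}, {"a": 2, "b": 1}
    print(compare(left, right))                  # neither clock descends the other
    resolved = increment(merge(left, right), "a")
    print(compare(resolved, left), compare(resolved, right))
```

A write that carries the merged clock descends both siblings, which is why a client that reads, merges, and writes back with the last-seen clock collapses them into one value.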
  • 21. STORAGE BACKENDS
      • Bitcask
      • LevelDB
      • Memory
      • Multi
  • 22. BITCASK
      • A fast, append-only key-value store
      • In-memory key lookup table (key_dir), data on disk
      • Closed files are immutable
      • Merging cleans up old data
      • Developed by Basho Technologies
      • Suitable for bounded data, e.g. reference data
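The Bitcask bullets (append-only writes, an in-memory key lookup table, merging to clean up old data) can be sketched with a toy class. `TinyBitcask` is hypothetical and keeps its "file" in an in-memory buffer so the example is self-contained; the real on-disk format also stores headers and checksums that are omitted here.

```python
import io


class TinyBitcask:
    """Append-only log plus an in-memory key directory (toy illustration)."""

    def __init__(self):
        self.log = io.BytesIO()   # stands in for the active data file on disk
        self.key_dir = {}         # key -> (offset, length) of the latest value

    def put(self, key: str, value: bytes) -> None:
        self.log.seek(0, io.SEEK_END)        # always append, never overwrite
        offset = self.log.tell()
        self.log.write(value)
        self.key_dir[key] = (offset, len(value))

    def get(self, key: str) -> bytes:
        offset, length = self.key_dir[key]   # one lookup, then one seek and read
        self.log.seek(offset)
        return self.log.read(length)

    def merge(self) -> None:
        """Rewrite only live values into a fresh log, dropping superseded ones."""
        live = {key: self.get(key) for key in self.key_dir}
        self.log, self.key_dir = io.BytesIO(), {}
        for key, value in live.items():
            self.put(key, value)
```

Because every key must fit in the in-memory table, this layout suits bounded key sets, which matches the slide's "suitable for bounded data" note.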
  • 23. LEVELDB
      • Key-Value storage developed by Google
      • Append-only for very large data sets
      • Multiple levels of SSTable-like data structures
      • Allows for more advanced querying (2i)
      • It includes compression (Snappy algorithm)
      • Suitable for unbounded data or advanced querying
  • 24. MEMORY
      • Data is never persisted to disk
      • Typically used for “test” databases (unit tests, etc.)
      • Definable memory limits per vnode
      • Configurable object expiry
      • Useful for highly transient data
  • 25. MULTI
      • Configure multiple storage engines for different types of data
      • Configure the “default” storage engine
      • Choose storage engine on a per-bucket basis
      • No reason not to use it
  • 26. CLIENT APIS
      • Riak supports two main client types:
          • REST based HTTP Interface
              • Easy to use from command line and simple scripts
              • Useful if using intermediate caching layer, e.g. Varnish
          • Protocol Buffers
              • Optimized binary encoding standard developed by Google
              • More performant than HTTP interface
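As a sketch of what the HTTP interface looks like from a simple script, the helpers below build (but do not send) PUT and GET requests using Python's standard library. The node address, bucket, and key are assumptions for the example, and the helper names are hypothetical; the `/buckets/<bucket>/keys/<key>` path and the `r`/`w` query parameters follow Riak's HTTP API, but check the documentation for your version before relying on them.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://127.0.0.1:8098"   # assumed address of a local Riak node


def object_url(bucket: str, key: str, **quorum) -> str:
    """Build the object URL, with optional per-request quorum parameters."""
    path = f"/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"
    query = urlencode(quorum)
    return BASE_URL + path + (f"?{query}" if query else "")


def build_put(bucket: str, key: str, body: bytes,
              content_type: str = "application/json", **quorum) -> Request:
    """Prepare a PUT that stores body under bucket/key."""
    return Request(object_url(bucket, key, **quorum), data=body, method="PUT",
                   headers={"Content-Type": content_type})


def build_get(bucket: str, key: str, **quorum) -> Request:
    """Prepare a GET that fetches the object stored under bucket/key."""
    return Request(object_url(bucket, key, **quorum), method="GET")


if __name__ == "__main__":
    req = build_put("users", "joel", b'{"name": "Joel"}', w=1)
    print(req.get_method(), req.full_url)
    print(build_get("users", "joel", r=2).full_url)
```

Passing either request to `urllib.request.urlopen` would send it to the node; that step is left out so the example runs without a cluster.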
  • 27. CLIENT LIBRARIES
      • Client libraries supported by Basho:
      • Community supported languages and frameworks:
          • C/C++, Clojure, Common Lisp, Dart, Django, Go, Grails, Griffon, Groovy, Erlang, Haskell, Java, .NET, Node.js, OCaml, Perl, PHP, Play, Python, Racket, Ruby, Scala, Smalltalk
  • 28.
      • Using Riak as datastore for all back-end systems supporting Angry Birds
      • Game-state storage, ID/Login, Payments, Push notifications, analytics, advertisements
      • 9 clusters in use with over 100 nodes
      • 263 million active monthly users
  • 29.
      • Spine2 project - storing patient data (80 million+)
      • 500 complex messages per second
      • 20,000 integrated end points
      • 0 data loss
      • 99.9% availability SLA
  • 30.
      • Push to talk application
      • Billions of requests daily
      • > 50 dedicated servers
      • Everything stored in Riak
      • https://github.com/mranney/node_riak
  • 31. MULTI DATACENTER REPLICATION (MDC)
      • Allows data to be replicated between clusters in different data centers. Can handle larger latencies.
      • Two synchronization modes that can be used together: real-time and full sync
      • Set up as uni-directional or bi-directional replication
      • Can be used for global load-balancing, business continuity and back-ups
  • 32. RIAK-CS
      • Built on top of Riak and supports MDC
      • S3 compatible object storage
      • Supports multi-tenancy
      • Per-tenant usage data and statistics on network I/O
      • Supports objects of arbitrary content type up to 5TB
      • Often used to build private cloud storage
  • 33. PLAY AROUND WITH RIAK?
      • https://github.com/joeljacobson/riak-dev-cluster
      • https://github.com/joeljacobson/vagrant-riak-cluster