9. 1 million Zombilepsy victims are loaded into a Riak KV cluster.
Each record is identified by zip code as an index value, supporting either a
Term-Based Inverted Index or Riak's Secondary Indexes. Zombies
are located via search or by interacting with the map.
bit.ly/zombie-riak
13. Masterless
Riak has a masterless architecture in which every node in a cluster is
capable of serving read and write requests.
Requests are routed to nodes using standard load balancing.
20. tweet me @mjbrender
Matt Brender
@mjbrender
github.com/basho-labs
Spend Time
getting to know us
github.com/basho
Editor's Notes
I have the pleasure of working for Basho, creators and maintainers of Riak
In a nutshell, Riak is a distributed, Dynamo-inspired database. It falls into the NoSQL category and includes full-text search.
I like to play with infrastructure
An important differentiation at this conference :)
What’s great though, is that I know I’m not alone => There’s a whole mission to achieve that!
but I’m not always good at it (I don’t really know what I’m doing). I don’t build any two environments the same way, and that’s a pain when attempting to explain how something works (or how it failed to work)
That has me appreciative of Vagrant
=> Packages up the config mgmt and virtualization in one easy config file
I’ve implemented exactly zero of what I’m talking about. What I do offer is the good fortune of speaking to people who build these systems, basically non-stop. There is a lot to learn from just listening.
I’ve spoken to hundreds of developers from companies of every shape and size. I’ve argued with ops engineers, and I’ve listened to data scientists. I’ve read eight years of posts, going back to Amazon’s 2007 Dynamo paper, which Riak was actually designed after.
Wraps up all my concerns with installation, packages up the config mgmt and virtualization in one easy config file
These tools are packaged up in a clean config file thanks to the folks at Hashicorp
Create and configure lightweight, reproducible, and portable development environments.
I like to vagrant up my Riaks. With this single command, I can spin up reproducible, easily shared environments across our community, and I love that.
With just a little Ruby code (here’s part of a multi-node system) and a call to your favorite provisioners, you can connect to your production deployment system right from your laptop without any other calls, in code or to people:
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.network :forwarded_port, guest: 8087, host: 37017
  config.vm.network :forwarded_port, guest: 8098, host: 37018
  config.vm.network :forwarded_port, guest: 8080, host: 8080
  config.vm.provision 'shell', path: 'provision.sh'
  config.vm.provision 'shell', path: 'zombie.sh'

  config.vm.provider :virtualbox do |vb, override|
    override.vm.box_url = "http://files.vagrantup.com/precise64.box"
    vb.customize ["modifyvm", :id, "--memory", "2048"]
  end

  config.vm.provider 'vmware_fusion' do |vm, override|
    override.vm.box_url = "http://files.vagrantup.com/precise64_vmware_fusion.box"
    vm.vmx['memsize'] = '2048'
  end
end
Here’s a more complicated example that produces one or more VMs and configures them via Chef.
# -*- mode: ruby -*-
# vi: set ft=ruby :

CENTOS = {
  box: "opscode-centos-6.4",
  url: "https://opscode-vm.s3.amazonaws.com/vagrant/opscode_centos-6.4_provisionerless.box"
}

UBUNTU = {
  box: "opscode-ubuntu-12.04",
  url: "https://opscode-vm.s3.amazonaws.com/vagrant/opscode_ubuntu-12.04_provisionerless.box"
}

NODES = ENV["NUM_NODES"].nil? ? 3 : ENV["NUM_NODES"].to_i
OS = UBUNTU
BASE_IP = "33.33.33"
IP_INCREMENT = 10

Vagrant.configure("2") do |cluster|
  # Ensure latest version of Chef is installed.
  cluster.omnibus.chef_version = :latest

  # Utilize the Berkshelf plugin to resolve cookbook dependencies.
  cluster.berkshelf.enabled = true

  (1..NODES).each do |index|
    last_octet = index * IP_INCREMENT

    cluster.vm.define "riak#{index}".to_sym do |config|
      # Configure the VM and operating system.
      config.vm.box = OS[:box]
      config.vm.box_url = OS[:url]
      config.vm.provider(:virtualbox) { |v| v.customize ["modifyvm", :id, "--memory", 1024] }

      # Setup the network and additional file shares.
      if index == 1
        [8098, 8087, 8069].each do |port|
          config.vm.network :forwarded_port, guest: port, host: port
        end
      end
      config.vm.hostname = "riak#{index}"
      config.vm.network :private_network, ip: "#{BASE_IP}.#{last_octet}"

      # Provision using Chef.
      config.vm.provision :chef_solo do |chef|
        chef.roles_path = "roles"
        if config.vm.box =~ /ubuntu/
          chef.add_recipe "apt"
        else
          chef.add_recipe "yum"
          chef.add_recipe "yum::epel"
        end
        chef.add_role "base"
        chef.add_role "riak"
        chef.json = {
          "riak" => {
            "args" => {
              "+S" => 1,
              "-name" => "riak@33.33.33.#{last_octet}"
            },
            "config" => {
              "riak_control" => {
                "enabled" => (index == 1 ? true : false)
              }
            }
          }
        }
      end
    end
  end
end
Like Zombie Riak: the application repo is accompanied by a simple nginx.conf file so you can scale based on demand.
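I haven't reproduced the repo's actual nginx.conf here, but a minimal sketch of the idea, with upstream names and addresses purely illustrative (the IPs echo the 33.33.33.x private network from the Vagrantfile above), might look like this:

```nginx
# Round-robin load balancing across the Riak HTTP interfaces (port 8098).
# Add or remove server lines as demand scales.
upstream riak_http {
    server 33.33.33.10:8098;
    server 33.33.33.20:8098;
    server 33.33.33.30:8098;
}

server {
    listen 8080;
    location / {
        proxy_pass http://riak_http;
    }
}
```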
Banana is a fork of Kibana (which visualizes Elasticsearch) that plays nicely with the Solr API, and it's what my colleague is getting up and running with Riak.
What’s interesting is that Banana is designed for Apache Solr’s API set, which is not an exact mapping to how Riak KV exposes Solr.
Puppet template
Made a little shim to list out the cores and answer other questions
Banana queries Solr interfaces
It wants access to one Solr endpoint
It also expects access to the admin interface
We don’t expose that
Nothing I’ve mentioned yet touches on distributed systems, however. For that part to be interesting, you have to learn a little more about Riak KV.
Data is spread across N nodes thanks to the magic of consistent hashing, which lets us spread data out remarkably evenly. It does, however, mean some nodes hold a given piece of data and some don’t. Every node knows about the keys held within the whole cluster, and it knows deterministically which node is the primary data owner and which are the secondary owner(s). It passes the request on, waits for a response, and passes the response back (an ACK or the GET value).
But you can retrieve data from any node. So even when I talk to this node and the “owners” are the 3 red systems, the server will act as a coordinator for me.
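The deterministic owner lookup described above can be sketched in a few lines of Ruby. This is an illustrative toy, not Riak's actual implementation: the ring size, partition count, and node names are all assumptions, but the shape (hash the key onto a ring, map ring position to a partition, map the partition to a node) is the core idea.

```ruby
require 'digest'

# Toy consistent-hashing ring: the SHA-1 key space is split into
# partitions, and partitions are claimed by nodes in round-robin order.
# All constants here are illustrative.
RING_SIZE = 2**160
NUM_PARTITIONS = 64
NODES = %w[node1 node2 node3]

# Hash a bucket/key pair onto the ring.
def key_to_ring_position(bucket, key)
  Digest::SHA1.hexdigest("#{bucket}/#{key}").to_i(16)
end

# Because the mapping is pure arithmetic over shared constants, every
# node can compute the same owner for any key without asking a master.
def partition_owner(position)
  partition = position / (RING_SIZE / NUM_PARTITIONS)
  NODES[partition % NODES.size]
end

# Any node can coordinate: it computes the owner and forwards the request.
pos = key_to_ring_position("zombies", "10001")
puts partition_owner(pos)
```

Every node running this same arithmetic gets the same answer, which is why any node in a masterless cluster can act as the coordinator.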
Bookings took the same algorithm by which Riak knows which node owns which set of data and implemented that logic right in their proxy servers.
This yields the same deterministic answer of which node owns which hash range, and it has resulted in a significant reduction in bandwidth consumption across their cluster.
I’m exploring whether this can be done with Nginx since it seems possible.
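Nginx's upstream hash module does have a consistent (ketama) mode, so one starting point for that exploration might look like the fragment below. To be clear, this only pins each key to a stable backend on the proxy side; it does not replicate Riak's actual ring logic, and the key choice and addresses are assumptions for illustration:

```nginx
# Consistently hash requests to backends so a given key keeps hitting
# the same Riak node, approximating a deterministic owner lookup in the
# proxy tier. $request_uri is an illustrative key choice.
upstream riak_nodes {
    hash $request_uri consistent;
    server 33.33.33.10:8098;
    server 33.33.33.20:8098;
    server 33.33.33.30:8098;
}
```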
And I have the good fortune to listen in on a ton of conversations. Everyone I’ve met uses proxy servers to augment data flow in an optimized way, and Nginx is a beloved tool in the toolset we recommend in Professional Services at Basho.
Our database at Basho, Riak, is used by many companies to store everything from session data to log aggregation. In these conversations, I always pivot to asking about their architecture: the how, the why, and what could be better.