Relational databases have long been considered the one true way to persist enterprise data. But today, NoSQL databases are emerging as a viable alternative for many applications. They can simplify the persistence of complex data models and offer significantly better scalability, and performance. But NoSQL databases are very different than the ACID/SQL/JDBC/JPA world that we have become accustomed to. In this presentation, you will learn about our experience implementing a use case from POJOs in Action using popular NoSQL databases: Redis, MongoDB, and Cassandra. We will compare and contrast each database’s data model and Java API. You will learn about the benefits and drawbacks of using NoSQL.
Polygot persistence for Java Developers - August 2011 / @Oakjug
1. Polyglot persistence for Java
developers - moving out of the
relational comfort zone
Chris Richardson
Author of POJOs in Action
Founder of CloudFoundry.com
chris@chrisrichardson.net
@crichardson
2. Overall presentation goal
The joy and pain of
building Java
applications that
use NoSQL
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 2
3. About Chris
• Grew up in England and live in Oakland,
CA
• Over 25+ years of software development
experience including 14+ years of Java
• Speaker at JavaOne, SpringOne,
PhillyETE, Devoxx, etc.
• Organize the Oakland JUG and the
Groovy Grails meetup
http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 3
4. Agenda
o Why NoSQL?
o Overview of NoSQL databases
o Introduction to Spring Data
o Case study: POJOs in Action & NoSQL
8/19/11
Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 4
5. Relational databases are great
o SQL = Rich, declarative query language
o Database enforces referential integrity
o ACID semantics
o Well understood by developers
o Well supported by frameworks and tools, e.g. Spring
JDBC, Hibernate, JPA
o Well understood by operations
n Configuration
n Care and feeding
n Backups
n Tuning
n Failure and recovery
n Performance characteristics
o But….
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 5
6. Problem: Complex object graphs
o Object/relational
impedance
mismatch
o Complicated to
map rich domain
model to relational
schema
o Performance issues
n Many rows in many
tables
n Many joins
7. Problem: Semi-structured data
o Relational schema doesn’t easily handle
semi-structured data:
n Varying attributes
n Custom attributes on a customer record
o Common solution = Name/value table
n Poor performance
n E.g. Finding specific attributes for customers
satisfying some criteria = multi-way outer
JOIN
n Lack of constraints
o Another solution = Serialize as blob
n Fewer joins
n BUT can’t be queried
8. Problem: Schema evolution
o For example:
n Add attributes to an object è add
columns to table
o Schema changes =
n Holding locks for a long time è
application downtime
n $$
9. Problem: Scaling
o Scaling reads:
n Master/slave
n But beware of consistency issues
o Scaling writes
n Extremely difficult/impossible/expensive
n Vertical scaling is limited and requires $$
n Horizontal scaling is limited/requires $$
10. Solution: Buy high end technology
http://upload.wikimedia.org/wikipedia/commons/e/e5/Rising_Sun_Yacht.JPG
11. Solution: Hire more developers
o Application-level sharding
o Build your own middleware
o …
http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_4_series/madone_4_5
12. Solution: Use NewSQL
o Led by Stonebraker
n Current databases are designed for 1970s
hardware and for both OLTP and data
warehouses
n http://www.slideshare.net/VoltDB/sql-
myths-webinar
o NewSQL
n Next generation SQL databases, e.g. VoltDB
n Leverage multi-core, commodity hardware
n In-memory
n Horizontally scalable
n Transparently shardable
n ACID
13. NoSQL databases are emerging…
Each one offers
some combination
of:
o Higher performance
o Higher scalability
o Richer data-model
o Schema-less
In return for:
o Limited transactions
o Relaxed consistency
o Unconstrained data
o …
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 13
14. … but there are few commonalities
o Everyone and their dog has written
one
o Different data models
n Key-value “Same sorry state as the database
market in the 1970s before SQL was
n Column invented”
http://queue.acm.org/detail.cfm?
n Document id=1961297
n Graph
o Different APIs
o No JDBC, Hibernate, JPA (generally)
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 14
15. Future = multi-paradigm data storage
for enterprise applications
IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 15
16. Agenda
o Why NoSQL?
o Overview of NoSQL databases
o Introduction to Spring Data
o Case study: POJOs in Action & NoSQL
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 16
17. Redis
o Advanced key-value store
n Values can be binary strings, Lists, Sets,
Sorted Sets, Hashes, …
n Data-type specific operations
o Very fast
n ~100K operations/second on entry-level
hardware
n In-memory operations K1 V1
o Persistent K2 V2
n Periodic snapshots of memory OR K3 V2
append commands to log file
o Transactions within a single server
n Atomic execution of batched commands
n Optimistic locking
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 17
18. Redis CLI Sorted set member = value + score
redis> zadd mysortedset 5.0 a
(integer) 1
redis> zadd mysortedset 10.0 b
(integer) 1
redis> zadd mysortedset 1.0 c
(integer) 1
redis> zrange mysortedset 0 1
1) "c"
2) "a"
redis> zrangebyscore mysortedset 1 6
1) "c"
2) "a"
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 18
19. Scaling Redis
o Master/slave replication
n Tree of Redis servers
n Non-persistent master can replicate to a
persistent slave
n Use slaves for read-only queries
o Sharding
n Client-side only – consistent hashing based
on key
n Server-side sharding – coming one day
o Run multiple servers per physical host
n Server is single threaded => Leverage
multiple CPUs
n 32 bit more efficient than 64 bit
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 19
20. Downsides of Redis
o Low-level API compared to SQL
o Single threaded:
n Multiple cores è multiple Redis servers
o Master/slave failover is manual
o Partitioning is done by the client
o Dataset has to fit in memory
21. Redis use cases
o Drop-in replacement for Memcached
n Session state
n Cache of data retrieved from SOR
o Replica of SOR for queries needing high-
performance
o Miscellaneous yet important
n Counting using INCR command, e.g. hit counts
n Most recent N items - LPUSH and LTRIM
n Randomly selecting an item – SRANDMEMBER
n Queuing – Lists with LPOP, RPUSH, ….
n High score tables – Sorted sets and ZINCRBY
n …
o Notable users: github, guardian.co.uk, ….
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 21
22. Cassandra
o An Apache open-source project
o Developed by Facebook for inbox search
o Column-oriented database/Extensible row store
n The data model will hurt your brain
n Row = map or map of maps
o Fast writes = append to a log
o Extremely scalable
n Transparent and dynamic clustering
n Rack and datacenter aware data replication
o Tunable read/write consistency per operation
n Writes: any, one replica, quorum of replicas, …, all
n Read: one, quorum, …, all
o CQL = “SQL”-like DDL and DML
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 22
23. Cassandra data model
My Column family (within a key space)
Keys Columns
a colA: value1 colB: value2 colC: value3
b colA: value colD: value colE: value
A column has a
timestamp to
o 4-D map: keySpace x key x columnFamily x column è
value
o Arbitrary number of columns
o Column names are dynamic; can contain data
o Columns for a row are stored on disk in order
determined by comparator
o One CF row = one DDD aggregate
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 23
24. Cassandra data model – insert/update
My Column family (within a key space)
Keys Columns
a colA: value1 colB: value2 colC: value3 Transaction =
updates to a
row within a
b colA: value colD: value colE: value ColumnFamily
Insert(key=a, columName=colZ, value=foo) Idempotent
Keys Columns
a colA: value1 colB: value2 colC: value3 colZ: foo
b colA: value colD: value colE: value
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 24
25. Cassandra query example – slice
Key Columns
s
colA: colB: colC: colZ:
a
value1 value2 value3 foo
colA: colD: colE:
b
value value value
slice(key=a, startColumn=colA, endColumnName=colC)
Key Columns You can also do a
s
rangeSlice which
colA: colB:
a
value1 value2 returns a range of keys
– less efficient
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 25
26. Super Column Families – one more
dimension
My Column family (within a key space)
Keys Super columns
ScA ScB
a
colA: value1 colB: value2 colC: value3
b
colA: value colD: value colE: value
Insert(key=a, superColumn=scB, columName=colZ, value=foo)
keySpace x key x columnFamily x superColumn x column -> value
Keys Super columns
ScA ScB
a
colA: value1 colB: value2 colC:colZ: foo
value3
b
colA: value colD: value colE: value
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 26
27. Getting data with super slice
My Column family (within a key space)
Keys Super columns
ScA ScB
a
colA: value1 colB: value2 colC: value3
b
colA: value colD: value colE: value
superSlice(key=a, startColumn=scB, endColumnName=scC)
Keys Super columns
ScB
a
colC: value3
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 27
28. Cassandra CLI
$ bin/cassandra-cli -h localhost
Connected to: "Test Cluster" on localhost/9160
Welcome to cassandra CLI.
[default@unknown] use Keyspace1;
Authenticated to keyspace: Keyspace1
[default@Keyspace1] list restaurantDetails;
Using default limit of 100
-------------------
RowKey: 1
=> (super_column=attributes,
(column=json, value={"id":
1,"name":"Ajanta","menuItems"....
[default@Keyspace1] get restaurantDetails['1']
['attributes’];
=> (column=json, value={"id":
1,"name":"Ajanta","menuItems"....
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 28
29. Scaling Cassandra
• Client connects to any node
• Dynamically add/remove nodes
Keys = [D, A]
Node 1 • Reads/Writes specify how many nodes
• Configurable # of replicas
Token = A • adjacent nodes
• rack and data center aware
replicates replicates
Node 4 Node 2
Keys = [A, B]
Token = D Token = B
replicates
Keys = [C, D] replicates Replicates to
Node 3
Token = C
Keys = [B, C]
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 29
30. Downsides of Cassandra
o Learning curve
o Still maturing, currently v0.8.4
o Limited queries, i.e. KV lookup
o Transactions limited to a column
family row
o Lacks an easy to use API
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 30
31. Cassandra use cases
o Use cases
• Big data
• Multiple Data Center distributed database
• Persistent cache
• (Write intensive) Logging
• High-availability (writes)
o Who is using it
n Digg, Facebook, Twitter, Reddit, Rackspace
n Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX
n The largest production cluster has over 100
TB of data in over 150 machines. –
Casssandra web site
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 31
32. MongoDB
o Document-oriented database
n JSON-style documents: Lists, Maps, primitives
n Documents organized into collections (~table)
n Schema-less
o Rich query language for dynamic queries
o Asynchronous, configurable writes:
n No wait
n Wait for replication
n Wait for write to disk
o Very fast
o Highly scalable and available:
n Replica sets (generalized master/slave)
n Sharding
n Transparent to client
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 32
33. Data Model = Binary JSON documents
{
"name" : "Sahn Maru", One document
"type" : ”Korean",
"serviceArea" : [ =
"94619",
"94618" one DDD aggregate
],
"openingHours" : [
{ DBObject o = new BasicDBObject();
"dayOfWeek" : "Wednesday", o.put("name", ”Sahn Maru");
"open" : 1730,
"close" : 2230 DBObject mi = new BasicDBObject();
} mi.put("name", "Daeji Bulgogi");
], …
"_id" : ObjectId("4bddc2f49d1505567c6220a0") List<DBObject> mis = Collections.singletonList(mi);
}
o.put("menuItems", mis);
o Sequence of bytes on disk = fast I/O
n No joins/seeks
n In-place updates when possible è no index updates
o Transaction = update of single document
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 33
35. MongoDB query by example
{
serviceArea:"94619", Find a
openingHours: {
$elemMatch : { restaurant
"dayOfWeek" : "Monday",
"open": {$lte: 1800}, that serves
}
"close": {$gte: 1800}
the 94619 zip
}
}
code and is
open at 6pm
DBCursor cursor = collection.find(qbeObject);
while (cursor.hasNext()) { on a Monday
DBObject o = cursor.next();
…
}
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 35
36. Scaling MongoDB
Shard 1 Shard 2
Mongod Mongod
(replica) (replica)
Mongod Mongod
(master) Mongod (master) Mongod
(replica) (replica)
Config
Server
mongod
A shard consists of a
mongos replica set =
generalization of
master slave
mongod
mongod Collections spread
over multiple
client shards
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 36
37. Mongo Downsides
o Server has a global write lock
n Single writer OR multiple readers
è Long running queries blocks writers
o Great that writes are not synchronous
n BUT perhaps an asynchronous response
would be better than a synchronous
getLastError()
Interesting story: http://www.slideshare.net/eonnen/from-100s-to-100s-of-millions
38. MongoDB use cases
o Use cases
n High volume writes
n Complex data
n Semi-structured data
o Who is using it?
n Shutterfly, Foursquare
n Bit.ly Intuit
n SourceForge, NY Times
n GILT Groupe, Evite,
n SugarCRM
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 38
39. Other NoSQL databases
Type Examples
Extensible columns/Column- Hbase
oriented SimpleDB
Graph Neo4j
Key-value Membase
Document CouchDb
http://nosql-database.org/ lists 122+ NoSQL databases
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 39
40. Picking a database
Application requirement Solution
Complex transactions/ACID Relational database
Scaling NoSQL
Social data Graph database
Multiple datacenters Cassandra
Highly-available writes Cassandra
Flexible data Document store
High write volumes Mongo, Cassandra
Super fast cache Redis
Adhoc queries Relational or Mongo
…
http://highscalability.com/blog/2011/6/20/35-use-cases-for-choosing-your-next-nosql-database.html
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 40
41. Proceed with caution
o Don’t commit to a
NoSQL DB until you
have done a
significant POC
o Encapsulate your data
access code so you
can switch
o Hope that one day
you won’t need ACID
42. Agenda
o Why NoSQL?
o Overview of NoSQL databases
o Introduction to Spring Data
o Case study: POJOs in Action & NoSQL
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 42
43. NoSQL Java APIs
Database Libraries
Redis Jedis, JRedis, JDBC-Redis, RJC
Cassandra Raw Thrift if you are a masochist
Hector, …
MongoDB MongoDB provides a Java driver
Some are not so easy to use
Stylistic differences
Boilerplate code
…
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 43
44. Spring Data Project Goals
Bring classic Spring value propositions to a wide
range of NoSQL databases
è
n Productivity
n Programming model consistency: E.g.
<NoSQL>Template classes
n “Portability”
http://www.springsource.org/spring-data
Slide 44
45. Spring Data sub-projects
§ Commons: Polyglot persistence
§ Key-Value: Redis, Riak
§ Document: MongoDB, CouchDB
§ Graph: Neo4j
§ GORM for NoSQL
§ Various milestone releases
§ Redis 1.0.0.M4 (July 20th, 2011)
§ Document 1.0.0.M2 (April 9, 2011)
§ Graph - Neo4j Support 1.0.0 (April 19, 2011)
§ …
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 45
47. Richer mapping
Annotations define mapping:
@Document, @Id, @Indexed,
@PersistanceConstructor,
@Document
@CompoundIndex, @DBRef,
public class Person {
@GeoSpatialIndexed, @Value
@Id
private ObjectId id; Map fields instead of properties
private String firstname; è no getters or setters required
@Indexed Non-default constructor
private String lastname;
Index generation
@PersistenceConstructor
public Person(String firstname, String lastname) {
this.firstname = firstname;
this.lastname = lastname;
}
….
}
Slide 47
48. Generic Mongo Repositories
interface PersonRepository extends MongoRepository<Person, ObjectId> {
List<Person> findByLastname(String lastName);
}
<bean>
<mongo:repositories
base-package="net.chrisrichardson.mongodb.example.mongorepository"
mongo-template-ref="mongoTemplate" />
</beans>
Person p = new Person("John", "Doe");
personRepository.save(p);
Person p2 = personRepository.findOne(p.getId());
List<Person> johnDoes = personRepository.findByLastname("Doe");
assertEquals(1, johnDoes.size());
Slide 48
49. Support for the QueryDSL project
Generated from Type-safe
domain model class composable queries
QPerson person = QPerson.person;
Predicate predicate =
person.homeAddress.street1.eq("1 High Street")
.and(person.firstname.eq("John"))
List<Person> people = personRepository.findAll(predicate);
assertEquals(1, people.size());
assertPersonEquals(p, people.get(0));
Slide 49
50. Cross-store/polyglot persistence
Person person = new Person(…);
@Entity entityManager.persist(person);
public class Person {
// In Database Person p2 = entityManager.find(…)
@Id private Long id;
private String firstname;
private String lastname;
// In MongoDB
@RelatedDocument private Address address;
{ "_id" : ObjectId(”….."),
"_entity_id" : NumberLong(1),
"_entity_class" : "net.. Person",
"_entity_field_name" : "address",
"zip" : "94611", "street1" : "1 High Street", …}
Slide 50
51. Agenda
o Why NoSQL?
o Overview of NoSQL databases
o Introduction to Spring Data
o Case study: POJOs in Action &
NoSQL
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 51
52. Food to Go – placing a takeout
order
o Customer enters delivery address and delivery time
o System displays available restaurants = restaurants
that serve the zip code of the delivery address AND
are open at the delivery time
class Restaurant { class TimeRange {
long id; long id;
String name; int dayOfWeek;
Set<String> serviceArea; int openingTime;
Set<TimeRange> openingHours;
int closingTime;
List<MenuItem> menuItems;
}
}
class MenuItem {
String name;
double price;
}
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 52
54. Finding available restaurants on
monday, 7.30pm for 94619 zip
select r.* Straightforward
from restaurant r three-way join
inner join restaurant_time_range tr
on r.id =tr.restaurant_id
inner join restaurant_zipcode sa
on r.id = sa.restaurant_id
Where ’94619’ = sa.zip_code
and tr.day_of_week=’monday’
and tr.openingtime <= 1930
and 1930 <=tr.closingtime
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 54
55. Redis - Persisting restaurants is
“easy”
rest:1:details [ name: “Ajanta”, … ]
Multiple KV value
rest:1:serviceArea [ “94619”, “94611”, …]
pairs
rest:1:openingHours [10, 11]
timerange:10 [“dayOfWeek”: “Monday”, ..]
timerange:11 [“dayOfWeek”: “Tuesday”, ..]
Single KV hash
OR
rest:1 [ name: “Ajanta”,
“serviceArea:0” : “94611”, “serviceArea:1” : “94619”,
“menuItem:0:name”, “Chicken Vindaloo”,
…]
OR
Single KV String
rest:1 { .. A BIG STRING/BYTE ARRAY, E.G. JSON }
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 55
56. BUT…
o … we can only retrieve them via primary key
è We need to implement indexes
è Queries instead of data model drives
NoSQL database design
o But how can a key-value store support a
query that has
?
n A 3-way join
n Multiple =
n > and <
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 56
57. Simplification #1: Denormalization
Restaurant_id Day_of_week Open_time Close_time Zip_code
1 Monday 1130 1430 94707
1 Monday 1130 1430 94619
1 Monday 1730 2130 94707
1 Monday 1730 2130 94619
2 Monday 0700 1430 94619
…
SELECT restaurant_id, open_time
FROM time_range_zip_code
WHERE day_of_week = ‘Monday’ Simpler query:
AND zip_code = 94619 § No joins
§ Two = and two <
AND 1815 < close_time
AND open_time < 1815
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 57
58. Simplification #2: Application filtering
SELECT restaurant_id, open_time
FROM time_range_zip_code
WHERE day_of_week = ‘Monday’ Even simple query
AND zip_code = 94619 • No joins
AND 1815 < close_time • Two = and one <
AND open_time < 1815
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 58
59. Simplification #3: Eliminate multiple
=’s with concatenation
Restaurant_id Zip_dow Open_time Close_time
1 94707:Monday 1130 1430
1 94619:Monday 1130 1430
1 94707:Monday 1730 2130
1 94619:Monday 1730 2130
2 94619:Monday 0700 1430
…
SELECT …
FROM time_range_zip_code
WHERE zip_code_day_of_week = ‘94619:Monday’
AND 1815 < close_time
key
range
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 59
60. Sorted sets support range queries
Key Sorted Set [ Entry:Score, …]
94707:Monday [1130_1:1430, 1730_1:2130]
94619:Monday [0700_2:1430, 1130_1:1430, 1730_1:2130]
zipCode:dayOfWeek Member: OpeningTime_RestaurantId
Score: ClosingTime
ZRANGEBYSCORE 94619:Monday 1815 2359
è
{1730_1}
1730 is before 1815 è Ajanta is open
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 60
61. What did I just do to query the data?
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 61
62. What did I just do to query the data?
o Wrote code to maintain an index
o Reduced performance due to extra
writes
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 62
63. RedisTemplate-based code
@Repository
public class AvailableRestaurantRepositoryRedisImpl implements AvailableRestaurantRepository {
@Autowired private final StringRedisTemplate redisTemplate;
private BoundZSetOperations<String, String> closingTimes(int dayOfWeek, String zipCode) {
return redisTemplate.boundZSetOps(AvailableRestaurantKeys.closingTimesKey(dayOfWeek, zipCode));
}
public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) {
String zipCode = deliveryAddress.getZip();
int timeOfDay = timeOfDay(deliveryTime);
int dayOfWeek = dayOfWeek(deliveryTime);
Set<String> closingTrs = closingTimes(dayOfWeek, zipCode).rangeByScore(timeOfDay, 2359);
Set<String> restaurantIds = new HashSet<String>();
String paddedTimeOfDay = FormattingUtil.format4(timeOfDay);
for (String trId : closingTrs) {
if (trId.substring(0, 4).compareTo(paddedTimeOfDay) <= 0)
restaurantIds.add(StringUtils.substringAfterLast(trId, "_"));
}
Collection<String> jsonForRestaurants =
redisTemplate.opsForValue().multiGet(AvailableRestaurantKeys.timeRangeRestaurantInfoKeys(restaurantIds ));
List<AvailableRestaurant> restaurants = new ArrayList<AvailableRestaurant>();
for (String json : jsonForRestaurants) {
restaurants.add(AvailableRestaurant.fromJson(json));
}
return restaurants;
}
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 63
64. Redis – Spring configuration
@Configuration
public class RedisConfiguration extends AbstractDatabaseConfig {
@Bean
public RedisConnectionFactory jedisConnectionFactory() {
JedisConnectionFactory factory = new JedisConnectionFactory();
factory.setHostName(databaseHostName);
factory.setPort(6379);
factory.setUsePool(true);
JedisPoolConfig poolConfig = new JedisPoolConfig();
poolConfig.setMaxActive(1000);
factory.setPoolConfig(poolConfig);
return factory;
}
@Bean
public StringRedisTemplate stringRedisTemplate(RedisConnectionFactory factory) {
StringRedisTemplate template = new StringRedisTemplate();
template.setConnectionFactory(factory);
return template;
}
}
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 64
65. Cassandra: Easy to store
restaurants
Column Family: RestaurantDetails
Keys Columns
1 name: Ajanta type: Indian …
name: Montclair
2 type: Breakfast …
Egg Shop
OR
Column Family: RestaurantDetails
Keys Columns
1 details: { JSON DOCUMENT }
2 details: { JSON DOCUMENT }
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 65
66. Querying using Cassandra
o Similar challenges to using Redis
o Limited querying options
n Row key – exact or range
n Column name – exact or range
o Use composite/concatenated keys
n Prefix - equality match
n Suffix - can be range scan
o No joins è denormalize
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 66
67. Cassandra: Find restaurants that close after the delivery
time and then filter
Keys Super Columns
1430 1430 2130
94619:Mon
1130_1: JSON FOR 1730_1: JSON FOR
0700_2: JSON FOR EGG
AJANTA AJANTA
SuperSlice
key= 94619:Mon
SliceStart = 1815
SliceEnd = 2359
Keys Super Columns
2130
94619:Mon
1730_1: JSON FOR
AJANTA
18:15 is after 17:30 => {Ajanta}
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 67
68. Cassandra/Hector code
import me.prettyprint.hector.api.Cluster;
public class CassandraHelper {
@Autowired private final Cluster cluster;
public <T> List<T> getSuperSlice(String keyspace, String columnFamily,
String key, String sliceStart, String sliceEnd,
SuperSliceResultMapper<T> resultMapper) {
SuperSliceQuery<String, String, String, String> q =
HFactory.createSuperSliceQuery(HFactory.createKeyspace(keyspace, cluster),
StringSerializer.get(), StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
q.setColumnFamily(columnFamily);
q.setKey(key);
q.setRange(sliceStart, sliceEnd, false, 10000);
QueryResult<SuperSlice<String, String, String>> qr = q.execute();
SuperColumnRowProcessor<T> rowProcessor = new SuperColumnRowProcessor<T>(resultMapper);
for (HSuperColumn<String, String, String> superColumn : qr.get().getSuperColumns()) {
List<HColumn<String, String>> columns = superColumn.getColumns();
rowProcessor.processRow(key, superColumn.getName(), columns);
}
return rowProcessor.getResult();
}
}
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 68
69. MongoDB = easy to store
{
"_id": "1234"
"name": "Ajanta",
"serviceArea": ["94619", "99999"],
"openingHours": [
{
"dayOfWeek": 1,
"open": 1130,
"close": 1430
},
{
"dayOfWeek": 2,
"open": 1130,
"close": 1430
},
…
]
}
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 69
70. MongoDB = easy to query
{
"serviceArea": "94619",
"openingHours": {
"$elemMatch": {
"open": { "$lte": 1815},
"dayOfWeek": 4,
"close": { $gte": 1815}
}
}
db.availableRestaurants.ensureIndex({serviceArea: 1})
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 70
71. MongoTemplate-based code
@Repository
public class AvailableRestaurantRepositoryMongoDbImpl
implements AvailableRestaurantRepository {
@Autowired private final MongoTemplate mongoTemplate;
@Autowired @Override
public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress,
Date deliveryTime) {
int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);
int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime);
Query query = new Query(where("serviceArea").is(deliveryAddress.getZip())
.and("openingHours”).elemMatch(where("dayOfWeek").is(dayOfWeek)
.and("openingTime").lte(timeOfDay)
.and("closingTime").gte(timeOfDay)));
return mongoTemplate.find(AVAILABLE_RESTAURANTS_COLLECTION, query,
AvailableRestaurant.class);
}
mongoTemplate.ensureIndex(“availableRestaurants”,
new Index().on("serviceArea", Order.ASCENDING));
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 71
72. MongoDB – Spring Configuration
@Configuration
public class MongoConfig extends AbstractDatabaseConfig {
private @Value("#{mongoDbProperties.databaseName}")
String mongoDbDatabase;
public @Bean MongoFactoryBean mongo() {
MongoFactoryBean factory = new MongoFactoryBean();
factory.setHost(databaseHostName);
MongoOptions options = new MongoOptions();
options.connectionsPerHost = 500;
factory.setMongoOptions(options);
return factory;
}
public @Bean
MongoTemplate mongoTemplate(Mongo mongo) throws Exception {
MongoTemplate mongoTemplate = new MongoTemplate(mongo, mongoDbDatabase);
mongoTemplate.setWriteConcern(WriteConcern.SAFE);
mongoTemplate.setWriteResultChecking(WriteResultChecking.EXCEPTION);
return mongoTemplate;
}
}
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 72
73. Summary
o Relational databases are great but
n Object/relational impedance mismatch
n Relational schema is rigid
n Extremely difficult/impossible to scale writes
n Performance can be suboptimal
o Each NoSQL databases can solve some
combination of those problems BUT
n Limited transactions
n One day needing ACID è major rewrite
n Query-driven, denormalized database design
n …
è
o Carefully pick the NoSQL DB for your application
o Consider a polyglot persistence architecture
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 74
74. Thank you!
My contact info:
chris@chrisrichardson.net
@crichardson
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 75