The document discusses scalable PHP web applications using Apache Cassandra. It provides an overview of Cassandra, how it can be used with PHP via Thrift and PhpCassa, and a case study of how a flash deals website was built using Cassandra to handle filtering and querying of millions of deals. Code examples are shown of how the application retrieves deals data from Cassandra based on attributes and location. The presentation concludes with resources for learning more about Cassandra and a call for questions.
2. @akira28
About me
• Co-founder at Yameveo
• 9+ years developing in PHP
• 2+ years experience with Apache Cassandra
• Zend Framework Certified Engineer
Yameveo
Founded in 2012 in Barcelona, Yameveo is a young,
dynamic and international company specialised in
e-commerce and web application development
www.yameveo.com
@Yameveo
What we will talk about
• Apache Cassandra
• Data Modeling
• Cassandra & PHP
• Case study
Apache Cassandra
Apache Cassandra is a massively scalable open source
NoSQL database. Cassandra is perfect for managing
large amounts of structured, semi-structured, and
unstructured data across multiple data centers and the
cloud. Cassandra delivers continuous availability, linear
scalability, and operational simplicity across many
commodity servers with no single point of failure, along
with a powerful dynamic data model designed for
maximum flexibility and fast response times.
Apache Cassandra documentation
Why Cassandra
• Open Source (enterprise distribution also available)
• Linearly scalable
• Fault-tolerant
• Fully distributed
• Highly performant
• Flexible data model
CAP Theorem
A distributed system can guarantee only two of:
1. Consistency
all nodes see the same data at the same time
2. Availability
the guarantee that every request receives a
response about whether it was successful or
failed
3. Partition Tolerance
the system continues to operate despite
message loss or failure of part of the system
Architecture
• Ring
• All nodes are identical (no master); each owns a unique token
• Intra-ring communication via “Gossip” protocol
• Tokens range from 0 to 2^127 - 1 (with the RandomPartitioner)
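The ring lookup can be illustrated with a minimal sketch. It assumes tokens are written as 32-character lowercase hex strings and uses a plain MD5 of the row key as a stand-in for the partitioner; the function names, node names and token values are hypothetical, and this is not how Cassandra itself is implemented. A node owns every token up to and including its own, and the ring wraps around to the first node.

```php
<?php
// Hypothetical sketch: find which node on a token ring owns a row key.
// Tokens are fixed-length lowercase hex strings, so strcmp() orders them numerically.

/** Derive a 128-bit token from a row key (plain MD5, a stand-in for the partitioner). */
function tokenForKey(string $key): string
{
    return md5($key);
}

/**
 * Return the name of the node that owns $token.
 * $ring is a list of ['token' => 32 hex chars, 'node' => name] entries.
 */
function ownerForToken(string $token, array $ring): string
{
    if ($ring === []) {
        throw new InvalidArgumentException('The ring needs at least one node.');
    }

    // Walk the ring in token order.
    usort($ring, function (array $a, array $b): int {
        return strcmp($a['token'], $b['token']);
    });

    // The owner is the first node whose token is >= the key's token.
    foreach ($ring as $entry) {
        if (strcmp($token, $entry['token']) <= 0) {
            return $entry['node'];
        }
    }

    // Past the highest token: wrap around to the first node.
    return $ring[0]['node'];
}

// Three made-up nodes spread over the token space.
$ring = [
    ['token' => str_repeat('8', 32), 'node' => 'node-b'],
    ['token' => str_repeat('4', 32), 'node' => 'node-a'],
    ['token' => str_repeat('c', 32), 'node' => 'node-c'],
];

echo ownerForToken(tokenForKey('deal:42'), $ring), PHP_EOL;
```

Because every node can run the same lookup, any node can route a request to the owner, which is why the nodes can all be identical.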
Data Modeling Problems
• Neither join nor subquery support
• Limited support for aggregation
• Ordering is done per-partition
• Ordering is specified at table creation time
Data Modeling
Best Practices
• Don’t think of a relational table
• Model column families around query patterns
• De-normalize and duplicate for read performance
• Storing values in column names is perfectly OK
• Leverage wide rows for ordering, grouping, and
filtering
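As an illustration of "storing values in column names", here is a minimal in-memory sketch of one wide row. A plain PHP array stands in for the row, and ksort() stands in for the column comparator. The composite "price:dealId" column name, the price in cents and all function names are assumptions made for this example, not part of the original application.

```php
<?php
// Hypothetical model of a single wide row: column name => column value.
// The value lives in the column NAME, so the stored value stays empty.

/** Build a sortable column name from a non-negative price in cents and a deal id. */
function columnName(int $priceCents, string $dealId): string
{
    // Zero-padding makes string order match numeric order.
    return sprintf('%010d:%s', $priceCents, $dealId);
}

/** Add a deal to the row and keep the columns sorted by name. */
function addDeal(array $row, int $priceCents, string $dealId): array
{
    $row[columnName($priceCents, $dealId)] = '';
    ksort($row, SORT_STRING);
    return $row;
}

/** Return the ids of deals priced within [$minCents, $maxCents], cheapest first. */
function dealsInPriceRange(array $row, int $minCents, int $maxCents): array
{
    $ids = [];
    foreach (array_keys($row) as $name) {
        [$price, $dealId] = explode(':', (string) $name, 2);
        if ((int) $price >= $minCents && (int) $price <= $maxCents) {
            $ids[] = $dealId;
        }
    }
    return $ids;
}

// One row holding every deal of a (made-up) category.
$row = [];
$row = addDeal($row, 4999, 'deal-1');
$row = addDeal($row, 1999, 'deal-7');
$row = addDeal($row, 2999, 'deal-3');

print_r(dealsInPriceRange($row, 2000, 5000));
```

The row is already ordered by price when it is read, so grouping, ordering and range filtering need no sorting at query time. That is the trade made by choosing the column name up front.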
Apache Thrift
Thrift is an interface definition language and binary
communication protocol that is used to define and create
services for numerous languages. It is used as a remote
procedure call (RPC) framework and was developed at
Facebook for "scalable cross-language services
development"
Wikipedia
PhpCassa
• Open Source
• Uses the Thrift protocol
• Compatible with Cassandra 0.7 through 1.2
• Optional C extension for improved performance
https://github.com/thobbs/phpcassa
composer.json:
"require": { "thobbs/phpcassa": "v1.1.0" }
Examples
Opening connections

use phpcassa\Connection\ConnectionPool;
use phpcassa\ColumnFamily;
use phpcassa\SuperColumnFamily;

$pool = new ConnectionPool('Keyspace1');

Creating column family objects

$users = new ColumnFamily($pool, 'Standard1');
$super = new SuperColumnFamily($pool, 'Super1');

Inserting

$users->insert('key', array('column1' => 'value1', 'column2' => 'value2'));

Querying

$users->get('key'); // returns an array
$users->multiget(array('key1', 'key2')); // returns an array of arrays

Removing

$users->remove('key1'); // removes whole row
$users->remove('key1', 'column1'); // removes 'column1'
Requirement
• The client wanted a new way to navigate the
website: deal attributes
• Millions of deals (hundreds added and hundreds
expiring every day)
• Dozens of stores and categories
• Performance is key!
How We Solved It
• New deals arrive every day, so queries are keyed
on date and attributes
• Leverage Cassandra wide rows to create
indexes
• Use Cassandra multiget whenever possible
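The approach can be sketched in memory as follows. Two PHP arrays stand in for column families, and a small multiget() helper imitates fetching several rows in one call. The row key layout "date:attribute:value", the sample data and every function name are hypothetical. The sketch only shows the shape of the idea: one wide index row per date and attribute value, intersected and then resolved with a single multiget.

```php
<?php
// In-memory stand-ins for two column families (hypothetical names and data).

// Index rows: one wide row per date + attribute value; column names are deal ids.
$dealsByAttribute = [
    '2014-05-01:store:acme'      => ['deal-1' => '', 'deal-2' => '', 'deal-3' => ''],
    '2014-05-01:category:travel' => ['deal-2' => '', 'deal-3' => '', 'deal-4' => ''],
];

// Deal rows: one row per deal, keyed by deal id.
$deals = [
    'deal-1' => ['title' => 'Headphones'],
    'deal-2' => ['title' => 'Weekend in Rome'],
    'deal-3' => ['title' => 'City break'],
    'deal-4' => ['title' => 'Cruise'],
];

/** Imitates a multiget: fetch several rows by key at once; missing keys are omitted. */
function multiget(array $columnFamily, array $keys): array
{
    return array_intersect_key($columnFamily, array_flip($keys));
}

/** Find the deals of a given day that match ALL the given attributes. */
function findDeals(array $index, array $deals, string $date, array $attributes): array
{
    if ($attributes === []) {
        return [];
    }

    // One index row key per requested attribute.
    $indexKeys = [];
    foreach ($attributes as $name => $value) {
        $indexKeys[] = implode(':', [$date, $name, $value]);
    }

    // First multiget: read all index rows together.
    $indexRows = array_values(multiget($index, $indexKeys));
    if (count($indexRows) < count($indexKeys)) {
        return []; // at least one attribute value has no deals that day
    }

    // Keep only the deal ids present in every index row.
    $matching = $indexRows[0];
    foreach (array_slice($indexRows, 1) as $indexRow) {
        $matching = array_intersect_key($matching, $indexRow);
    }

    // Second multiget: load the matching deals themselves.
    return multiget($deals, array_keys($matching));
}

print_r(findDeals($dealsByAttribute, $deals, '2014-05-01', ['store' => 'acme', 'category' => 'travel']));
```

However many attributes are combined, the lookup stays at two round trips: one for the index rows and one for the deals.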