2. Overview
• NoSQL
• Brief History of Cassandra
• Architecture
• Terminology
• Cassandra Query Language
• Basic CRUD Operations using CQL (Possibly in MULE)
• References, For Further Reading/Implementation pt. 2
3. NoSQL
• Originally referred to "non-SQL" or "non-relational" databases.
• Sometimes also read as "Not only SQL", to emphasize that such
systems may support SQL-like query languages.
• Driven by the growing needs of Web 2.0 companies such as
Facebook, Google, and Amazon, which handle very large volumes
of data (big data or real-time data) and need faster responses
for users (for example, by using caches or small data sets).
• Suited to data that is not easily modelled in a
traditional relational database.
4. An Example Use Case of NoSQL
Let’s create a new social engagement (dating) site
where users can create posts and add pictures, videos,
and music to them. Other users can comment on the
posts and give points (likes, thumbs up, thumbs down)
to rate them. The landing page (Home) will have a
feed of posts that users can share and interact with.
5. How we will map it using SQL
How do we display a Post by a certain user using SQL?
9. Brief History of Cassandra
• Cassandra was developed at Facebook for inbox search
(Messaging).
• It was open-sourced by Facebook in July 2008.
• Cassandra was accepted into Apache Incubator in March 2009.
• It became an Apache top-level project in February 2010.
• The name “Cassandra” comes from Greek mythology: Cassandra was a
gifted prophet who could see the future, but no one believed
her. It is said that one of the reasons behind the name was
that NoSQL was not seen as a “believable” solution to present
and future data needs.
10. Features of Cassandra
• Highly Scalable - add more nodes to a cluster, or add another cluster, to accommodate more customers/clients and data.
• Masterless Design - all nodes are the same, which provides operational simplicity and easy scale-out.
• “Always-on” / Continuous Availability - offers redundancy of both data and node function, has no single point of failure, and stays available for business-critical applications that cannot afford a failure.
• Linear-scale performance - throughput increases roughly linearly with the number of nodes in the cluster.
• Flexible Data Storage - supports structured (RDBMS-like) and semi-structured data storage (column name-value or key-value, Table x Row x Column).
• Data Replication - data is replicated to multiple nodes according to the keyspace's replication factor. Nodes exchange cluster state using the Gossip Protocol, which is also used to identify whether a node in a cluster is alive.
• Active “everywhere” design - all nodes may be written to and read from.
• Strong data protection - a commit log design protects against data loss, and built-in security with backup/restore keeps data protected and safe.
• Cassandra Query Language - the primary language for communicating with the Cassandra database.
13. Terminologies
• In Cassandra, a keyspace is a container for your application
data. It is similar to a schema in Oracle or PostgreSQL, or to a
database in other RDBMSs.
• Column Family / Table: a container for rows of columns. The
column is the most basic unit in the Cassandra data model, and
each column consists of a name, a value, and a timestamp, plus
an optional Time To Live (TTL).
• By ignoring the timestamp of a column, you can represent the
column as a name-value pair.
• You can also configure a Column Family with a default TTL.
• Within a partition, Cassandra always stores rows sorted by the
clustering columns of the Primary Key.
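The timestamp and TTL stored with each column can be inspected from CQL. The following is a minimal sketch, assuming the users.user_profile table defined later in this deck; the user_sessions table and its columns are hypothetical names used only for illustration.

```cql
-- WRITETIME returns the write timestamp (microseconds since the epoch)
-- stored alongside a non-primary-key column's value.
SELECT firstName, WRITETIME(firstName)
FROM users.user_profile
WHERE userId = 1;

-- Hypothetical table with a table-wide default TTL of one hour:
-- rows written without an explicit TTL expire 3600 seconds after the write.
CREATE TABLE users.user_sessions (
  userId int,
  session_id text,
  created_at timestamp,
  PRIMARY KEY (userId, session_id)
) WITH default_time_to_live = 3600;
```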
16. Cassandra Query Language
• The basic way to interact with Cassandra is through the
CQL shell (cqlsh).
• You can administer roles and clients (users) via the CQL
shell; node-level operations are handled by the separate
nodetool utility.
• CQL3 borrowed many SQL features, such as ORDER BY and
filtering, but it still has no JOINs or subqueries.
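As a rough sketch of the role administration mentioned above, the statements below could be run from the CQL shell. They assume that authentication and authorization are enabled on the cluster and that the users keyspace exists; the role name and password are placeholders, not values from this deck.

```cql
-- Hypothetical role name and password.
CREATE ROLE report_reader WITH PASSWORD = 'change_me' AND LOGIN = true;

-- Allow the role to read every table in the users keyspace.
GRANT SELECT ON KEYSPACE users TO report_reader;

-- Show the roles visible to the current user.
LIST ROLES;
```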
17. Create a Keyspace
-- 'SimpleStrategy' is for a single data center only;
-- use 'NetworkTopologyStrategy' for multiple data centers.
-- 'replication_factor' is the number of copies of each row across nodes.
CREATE KEYSPACE users
WITH replication = {
  'class' : 'SimpleStrategy',
  'replication_factor' : 1
};
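For comparison, a keyspace that uses 'NetworkTopologyStrategy' sets a replication factor per data center instead of a single 'replication_factor'. This is a sketch only: the keyspace name users_multi_dc and the data center names 'dc1' and 'dc2' are assumptions.

```cql
-- 'dc1' and 'dc2' are placeholder data center names; they must match
-- the names the cluster itself reports for its data centers.
CREATE KEYSPACE users_multi_dc
WITH replication = {
  'class' : 'NetworkTopologyStrategy',
  'dc1' : 3,
  'dc2' : 2
};
```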
18. Create a Column Family (Table)
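-- TABLE and COLUMNFAMILY are interchangeable keywords in CQL.
CREATE TABLE users.user_profile (
  userId int,
  checked_at timestamp,
  departmentId int,
  firstName text,
  lastName text,
  address text,
  password text,  -- set by the UPDATE examples
  PRIMARY KEY (userId, checked_at)
) WITH CLUSTERING ORDER BY (checked_at ASC);
* Compound Primary Key: userId is the partition key and checked_at is the clustering column.
* Only clustering columns can sort results (ORDER BY), and only when the partition key is restricted in the WHERE clause.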
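To illustrate the sorting note, here is a minimal sketch against the users.user_profile table from this slide. checked_at is the clustering column of the compound primary key, so it can be used for ordering and range restrictions once userId is fixed. The date range is an arbitrary example.

```cql
-- Latest check-in first for user 1 (reverses the table's ASC clustering order).
SELECT * FROM users.user_profile
WHERE userId = 1
ORDER BY checked_at DESC
LIMIT 1;

-- Range restriction on the clustering column within one partition.
SELECT * FROM users.user_profile
WHERE userId = 1
  AND checked_at >= '2016-06-21T00:00+1300'
  AND checked_at <  '2016-06-22T00:00+1300';
```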
19. Inserting Data
INSERT INTO users.user_profile (userId,checked_at,departmentId, lastName, firstName, address)
VALUES (1,'2016-06-21T09:10+1300', 108, 'Dela Cruz', 'Juan','Manila');
INSERT INTO users.user_profile (userId,checked_at,departmentId, lastName, firstName, address)
VALUES (2, '2016-06-21T09:11+1300', 109, 'Tambling', 'Ben','Manila');
INSERT INTO users.user_profile (userId,checked_at,departmentId, lastName, firstName, address)
VALUES (3, '2016-06-21T09:12+1300', 110, 'Badiday', 'Inday','Manila');
INSERT INTO users.user_profile (userId,checked_at,departmentId, lastName, firstName, address)
VALUES (4, '2016-06-21T09:13+1300' ,111, 'Ayala', 'Joey','Manila');
INSERT INTO users.user_profile (userId,checked_at,departmentId, lastName, firstName, address)
VALUES (3, '2016-06-21T09:12+1300', 109, 'Badiday', 'Inday','Manila') IF NOT EXISTS;
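The last statement differs from the others because a plain INSERT in Cassandra is an upsert: it overwrites the columns of any existing row with the same primary key. A short sketch of the contrast, reusing the row for userId 3 inserted above (the departmentId values are arbitrary):

```cql
-- Plain INSERT: overwrites departmentId on the existing row
-- (userId 3, checked_at 09:12) without raising an error.
INSERT INTO users.user_profile (userId, checked_at, departmentId)
VALUES (3, '2016-06-21T09:12+1300', 112);

-- Lightweight transaction: written only when no row with this primary
-- key exists. cqlsh reports the outcome in an [applied] column.
INSERT INTO users.user_profile (userId, checked_at, departmentId)
VALUES (3, '2016-06-21T09:12+1300', 113) IF NOT EXISTS;
```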
20. Selecting Data
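-- All rows for one user (one partition):
SELECT * FROM users.user_profile WHERE userId = 1;
-- Rows for several users. ORDER BY accepts only clustering columns
-- (here checked_at), so sorting by departmentId is done client-side:
SELECT * FROM users.user_profile WHERE userId IN (1, 2, 3);
-- departmentId is not part of the primary key, so filtering on it
-- needs ALLOW FILTERING (or a secondary index):
SELECT * FROM users.user_profile WHERE userId = 1
AND departmentId = 110 ALLOW FILTERING;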
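The last query filters on departmentId, which is not part of the primary key. One way to support such a filter is a secondary index, sketched below; the index name is hypothetical, and whether an index suits a given workload is a separate design decision.

```cql
-- Hypothetical index name. With the index in place, departmentId can
-- appear in WHERE without ALLOW FILTERING.
CREATE INDEX user_profile_department_idx
ON users.user_profile (departmentId);

SELECT * FROM users.user_profile WHERE departmentId = 110;
```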
21. Updating Data
UPDATE users.user_profile SET password='luxerey'
WHERE userid=1 AND checked_at='2016-06-21T09:14+1300';
* You can set a time to live (TTL, in seconds) on individual columns
(useful for sessions, auth keys).
UPDATE users.user_profile USING TTL 100 SET password='luxerey'
WHERE userid=1 AND checked_at='2016-06-21T09:14+1300';
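To check how long a value written with a TTL has left, CQL provides the TTL() function. A minimal sketch, assuming the table has a password column as the UPDATE statements above imply:

```cql
-- TTL(password) is the number of seconds left before the value expires;
-- it is null when no TTL was set on the column.
SELECT password, TTL(password)
FROM users.user_profile
WHERE userid = 1 AND checked_at = '2016-06-21T09:14+1300';
```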
22. Deleting Data (Row and Columns)
* You can delete a specific column:
DELETE password FROM users.user_profile WHERE userid = 1 AND
checked_at='2016-06-21T09:14+1300';
* Or you can delete a whole row:
DELETE FROM users.user_profile WHERE userid=1 AND
checked_at='2016-06-21T09:14+1300';
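Two further variants, sketched against the same table: restricting only the partition key removes every row for that user, and IF EXISTS makes the delete conditional. The key values are taken from the insert examples.

```cql
-- Delete every row in the partition for user 1 (all checked_at values).
DELETE FROM users.user_profile WHERE userid = 1;

-- Conditional delete: applied only if the row exists.
DELETE FROM users.user_profile
WHERE userid = 2 AND checked_at = '2016-06-21T09:11+1300'
IF EXISTS;
```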