SlideShare uma empresa Scribd logo
1 de 85
Baixar para ler offline
Apache Cassandra and Datastax Enterprise 
Peter Halliday Sr. Software Engineer 
peter@datastax.com 
@phalliday
A Look at Apache Cassandra…
How Many Have Heard of… 
©2013 DataStax Confidential. Do not distribute without consent. 
3 
• DataStax 
• Apache Cassandra
What is Apache Cassandra? 
Apache Cassandra™ is a massively scalable NoSQL database. 
Cassandra is designed to handle big data workloads across 
multiple data centers with no single point of failure, providing 
enterprises with continuous availability without compromising 
performance.
Customer Industries 
• Financial 
• eCommerce 
• Marketing 
• Recommendation Engines 
• Health Sciences 
• Fraud Detection 
• Sensor Data 
• Online Games 
©2013 DataStax Confidential. Do not distribute without consent. 
5
Why use Cassandra? 
100,000 
txns/sec 
200,000 
txns/sec 
400,000 
txns/sec 
• Masterless architecture. 
• Continuous availability. 
• Multi-data center and cloud availability zone support. 
• Flexible data model. 
• Linear scale performance. 
• Operationally simple. 
• CQL – SQL-like language.
History of Cassandra 
• Amazon Dynamo partitioning and replication 
• Log-structured ColumnFamily data model similar to Bigtable's 
©2013 DataStax Confidential. Do not distribute without consent. 
7
Cassandra Terminology 
• Node (Vnode) 
• Cluster 
• Datacenter 
• Partition 
• Gossip 
• Snitch 
• Replication Factor 
• Consistency 
©2013 DataStax Confidential. Do not distribute without consent. 
8
Cassandra Node 
©2013 DataStax Confidential. Do not distribute without consent. 
9 
A computer server running Cassandra
Cassandra Cluster 
A group of Cassandra nodes working together 
©2013 DataStax Confidential. Do not distribute without consent. 
10
Cassandra Datacenter 
DC1 DC2 
©2013 DataStax Confidential. Do not distribute without consent. 
11
Logical Datacenter 
DC1 DC2 DC3 
©2013 DataStax Confidential. Do not distribute without consent. 
12
Physical Datacenter 
©2013 DataStax Confidential. Do not distribute without consent. 
13
Cloud-based Datacenters 
©2013 DataStax Confidential. Do not distribute without consent. 
14
Cassandra Cluster 
Data is evenly distributed around the nodes in a cluster 
©2013 DataStax Confidential. Do not distribute without consent. 
15
Overview of Data Partitioning in Cassandra 
There are two basic data partitioning strategies: 
1. Random partitioning – this is the default and 
recommended strategy. Partitions data as evenly as 
possible across all nodes using a hash of every column 
family row key 
2. Ordered partitioning – stores column family row keys in 
sorted order across the nodes in a database cluster
Data Distribution and Partitioning 
• Each node “owns” a set of tokens 
• A node’s token range is manually configured or randomly 
assigned when the node joins the cluster 
• A partition key is defined for each table 
• The partitioner applies a hash function to convert a 
partition key to a token. This determines which 
node(s) store that piece of data. 
• By default, Cassandra users the Murmur3Partitioner 
©2013 DataStax Confidential. Do not distribute without consent. 
17
What is a Hash Algorithm? 
• A function used to map data of arbitrary size to data 
of fixed size 
• Slight differences in input produce large differences in 
output 
Input MurmurHash 
1 8213365047359667313 
2 5293579765126103566 
3 -155496620801056360 
4 -663977588974966463 
5 958005880272148645 
The Murmur3Partitioner uses the MurmurHash function. This hashing function 
creates a 64-bit hash value of the row key. The possible range of hash values 
is from -263 to +263. 
©2013 DataStax Confidential. Do not distribute without consent. 
18
Data Distribution 
©2013 DataStax Confidential. Do not distribute without consent. 
-9223372036854775808 
5534023222112865484 
19 
-5534023222112865485 
1844674407370955161 -1844674407370955162 
Each node is 
configured 
with an initial 
token. 
The initial 
token 
determines the 
token range 
owned by the 
node.
5534023222112865484 
20 
-9223372036854775808 
-5534023222112865485 
-1844674407370955162 
1844674407370955161 
Data Distribution 
This node owns the token 
range: 
1844674407370955162 
To 
5534023222112865484
Overview of Replication in Cassandra 
• Replication is controlled by what is called the replication 
factor. A replication factor of 1 means there is only one 
copy of a row in a cluster. A replication factor of 2 means 
there are two copies of a row stored in a cluster 
• Replication is controlled at the keyspace level in 
Cassandra 
Original row 
Copy of row
Cassandra Replication Strategies 
• Simple Strategy 
• Network Topology Strategy:
Simple Topology – Single Datacenter 
Second Replica 
RF=2 
First Replica 
{ 'class' : 'SimpleStrategy', 'replication_factor' : 2 }; 
23
Simple Topology – Single Datacenter 
Third Replica 
Second Replica 
RF=3 
First Replica 
{ 'class' : 'SimpleStrategy', 'replication_factor' : 3 }; 
24
Network Topology – Multiple Datacenters 
©2013 DataStax Confidential. Do not distribute without consent. 
25 
DC1 
RF=2 
DC2 
RF=3 
CREATE KEYSPACE Test 
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', ’DC1' : 2, ’DC2' : 3};
Network Topology – Rack Awareness 
How is the location of each node 
©2013 DataStax Confidential. Do not distribute without consent. 
(Rack, Datacenter) known? 
26 
DC1 
RACK1 
DC1 
RACK1 
DC1 
RACK1 
DC1 
RACK2 
DC1 
RACK2 
Second Replica 
DC1 
RACK2 
RF=2 
First Replica
Snitches 
• A snitch determines which data centers and racks are 
written to and read from. 
• Snitches inform Cassandra about the network 
topology so that requests are routed efficiently and 
allows Cassandra to distribute replicas by grouping 
machines into data centers and racks. 
• Cassandra does its best not to have more than one 
replica on the same rack. 
©2013 DataStax Confidential. Do not distribute without consent. 
27
Snitch Types 
SimpleSnitch 
• Single-data center deployments (or single-zone in public clouds) 
RackInferringSnitch 
• Determines the location of nodes by rack and data center 
corresponding to the IP addresses 
PropertyFileSnitch 
• User-defined description of the network details 
• cassandra-topology.properties file 
GossipingPropertyFileSnitch 
• Defines a local node's data center and rack 
• Uses gossip for propagating this information to other nodes 
• cassandra-rackdc.properties 
Amazon Snitches 
• EC2Snitch 
• EC2MultiRegionSnitch 
©2013 DataStax Confidential. Do not distribute without consent. 
28
Gossip = Internode Communications 
• Gossip is a peer-to-peer communication protocol in 
which nodes periodically (every second) exchange 
information about themselves and about other nodes 
they know about. 
• Cassandra uses gossip to discover location and state 
information about the other nodes participating in a 
Cassandra cluster 
©2013 DataStax Confidential. Do not distribute without consent. 
29
Load Balancing 
• Each node handles client requests, but the balancing 
policy is configurable 
• Round Robin 
• DC-Aware Round Robin 
• Token-Aware 
©2013 DataStax Confidential. Do not distribute without consent. 
30
Round Robin 
CLIENT 
local Remote 
©2013 DataStax Confidential. Do not distribute without consent. 
31
DC Aware Round Robin 
CLIENT 
local Remote 
©2013 DataStax Confidential. Do not distribute without consent. 
32
DC Aware Round Robin 
CLIENT 
local Remote 
The client attempts to contact nodes in the local datacenter. 
©2013 DataStax Confidential. Do not distribute without consent. 
33
DC Aware Round Robin 
CLIENT 
local Remote 
Remote nodes are used when local nodes cannot be reached. 
©2013 DataStax Confidential. Do not distribute without consent. 
34
Token Aware 
CLIENT 
local Remote 
©2013 DataStax Confidential. Do not distribute without consent. 
35
Vnodes (Virtual Nodes) 
Instead of each node owning a single token range, Vnodes divide 
each node into many ranges (256). 
Vnodes simplify many tasks in Cassandra: 
• You no longer have to calculate and assign tokens to each node. 
• Rebalancing a cluster is no longer necessary when adding or 
removing nodes. 
• Rebuilding a dead node is faster because it involves every other 
node in the cluster. 
• Improves the use of heterogeneous machines in a cluster. You 
can assign a proportional number of vnodes to smaller and larger 
machines. 
©2013 DataStax Confidential. Do not distribute without consent. 
36
Reading and Writing to Cassandra Nodes 
• Cassandra has a ‘location independence’ architecture, 
which allows any user to connect to any node in any data 
center and read/write the data they need 
• All writes being partitioned and replicated for them 
automatically throughout the cluster
Writing Data 
RF=3 transaction 
©2013 DataStax Confidential. Do not distribute without consent. 
The client sends a mutation 
(insert/update/delete) to a 
node in the cluster. 
That node serves as the 
coordinator for this 
38
Writing Data 
©2013 DataStax Confidential. Do not distribute without consent. 
The coordinator 
forwards the update 
to all replicas. 
39 
RF=3
Writing Data 
©2013 DataStax Confidential. Do not distribute without consent. 
40 
The replicas 
acknowledge that 
data was written. 
RF=3
Writing Data 
©2013 DataStax Confidential. Do not distribute without consent. 
41 
And the coordinator 
sends a successful 
response to the client. 
RF=3
What if a node is down? 
RF=3 Write Consistency (WC) 
©2013 DataStax Confidential. Do not distribute without consent. 
Only two nodes respond. 
The client gets to choose if 
the write was successful. 
42
Tunable Data Consistency 
• Choose between strong and eventual consistency (one 
to all responding) depending on the need 
• Can be done on a per-operation basis, and for both 
reads and writes 
• Handles multi-data center operations 
Writes 
• Any 
• One 
• Quorum 
• Local_Quorum 
• Each_Quorum 
• All 
Reads 
• One 
• Quorum 
• Local_Quorum 
• Each_Quorum 
• All
Consistency 
• ANY 
Returns data from any of the replica. 
• QUORUM 
Returns the most recent data from the majority of replicas. 
• LOCAL QUORUM 
Returns the most recent data from the majority of local replicas. 
• ALL 
Returns the most recent data from all replicas. 
©2013 DataStax Confidential. Do not distribute without consent. 
44
Quorum means > 50% 
©2013 DataStax Confidential. Do not distribute without consent. 
45 
WC = QUORUM 
Will this write succeed? 
YES!! 
A majority of replicas 
received the mutation. 
RF=3
What if two nodes are down? 
©2013 DataStax Confidential. Do not distribute without consent. 
WC = QUORUM 
Will this write succeed? 
46 
NO 
Failed to write a 
majority of replicas. 
RF=3
Multi DC Writes 
©2013 DataStax Confidential. Do not distribute without consent. 
47 
DC1 
RF=3 
The coordinator forwards the 
mutation to local replicas and a 
remote coordinator. 
DC2 
RF=3
Multi DC Writes 
©2013 DataStax Confidential. Do not distribute without consent. 
48 
DC1 
RF=3 
The remote coordinator 
forwards the mutation to 
replicas in the remote DC 
DC2 
RF=3
Multi DC Writes 
©2013 DataStax Confidential. Do not distribute without consent. 
All replicas acknowledge the write. 
49 
DC1 
RF=3 
DC2 
RF=3
Hinted Handoff 
©2013 DataStax Confidential. Do not distribute without consent. 
50 
HINT 
The write is replayed 
when the target node 
comes online
Reading Data 
©2013 DataStax Confidential. Do not distribute without consent. 
The client sends a query to 
a node in the cluster. 
That node serves as the 
coordinator. 
52 
RF=3
Reading Data 
©2013 DataStax Confidential. Do not distribute without consent. 
53 
The coordinator 
forwards the query 
to all replicas. 
RF=3
Reading Data 
©2013 DataStax Confidential. Do not distribute without consent. 
54 
The replicas 
respond with data. 
RF=3
Reading Data 
©2013 DataStax Confidential. Do not distribute without consent. 
55 
And the coordinator 
returns the data to the 
client. 
RF=3
What if the nodes disagree? 
WRITE 
©2013 DataStax Confidential. Do not distribute without consent. 
56 
Data was written with 
QUORUM when one 
node was down. 
The write was 
successful, but that 
node missed the 
update. 
RF=3
What if the nodes disagree? 
READ 
©2013 DataStax Confidential. Do not distribute without consent. 
57 
Now the node is back 
online, and it responds 
to a read request. 
It has older data than 
the other replicas. 
RF=3
What if the nodes disagree? 
NEWEST 
©2013 DataStax Confidential. Do not distribute without consent. 
58 
The coordinator resolves the 
discrepancy and sends the 
newest data to the client. 
READ REPAIR 
The coordinator also notifies 
the “out of date” node that it 
has old data. 
The “out of date” node 
receives updated data from 
another replica. 
RF=3
Rapid Read Protection 
©2013 DataStax Confidential. Do not distribute without consent. 
59 
During a read, does 
the coordinator really 
forward the query to all 
replicas? 
That seems 
RF=3 unnecessary!
Rapid Read Protection 
©2013 DataStax Confidential. Do not distribute without consent. 
NO 
Cassandra performs only as 
many requests as necessary to 
meet the requested Consistency 
Level. 
Cassandra routes requests to the 
most-responsive replicas. 
60 
RF=3
Rapid Read Protection 
©2013 DataStax Confidential. Do not distribute without consent. 
61 
If a replica doesn’t 
respond quickly, 
Cassandra will try 
another node. 
This is known as an 
“eager retry” RF=3
Writes (what happens within each node) 
• Data is first written to a commit log for durability. Your data is safe 
in Cassandra 
• Then written to a memtable in memory 
• Once the memtable becomes full, it is flushed to an SSTable 
(sorted strings table) 
• Writes are atomic at the row level; all columns are written or 
updated, or none are. RDBMS-styled transactions are not 
supported 
INSERT INTO… 
Commit log memtable 
SSTable 
Cassandra is known for being the fastest database in the industry where 
write operations are concerned.
SSTables 
©2013 DataStax Confidential. Do not distribute without consent. 
63 
Commit Log 
Mem table 
Cassandra writes to a commit log 
and to memory. When the 
memtable is full, data is flushed to 
disk, creating an SSTable. 
Disk
Compaction 
©2013 DataStax Confidential. Do not distribute without consent. 
64 
Commit Log 
Mem table 
SSTables are cleaned up using 
compaction. Compaction creates a 
new combined SSTable and 
deletes the old ones. 
Disk
Reads (what happens within each node) 
• Depending on the frequency of inserts and updates, a record will 
exist in multiple places. Each place must be read to retrieve the 
entire record. 
• Data is read from the memtable in memory. 
• Multiple SSTables may also be read. 
• Bloom filters prevent excessive reading of SSTables. 
SELECT * FROM… 
memtable 
SSTable 
Bloom Filter 
SSTable SSTable SSTable
DataStax Distinctives
What is DataStax Enterprise? 
©2014 DataStax Confidential. Do not distribute without consent. 
Strong Data Protection 
In-Memory OLTP/Analytics 
Point-and-Click/Automated Mgmt 
Certified Production Cassandra 
Multi-Workload/Use Case Capable 
Integrated OLTP, Analytics, Search 
Dev. IDE& 
Drivers 
Security Analytics Search Visual 
Monitoring 
Management 
Services In-Memory 
Professional 
Services 
Support& 
Training
Cassandra Clients – API & Native Driver 
• CQL (Cassandra Query Language) is the primary API 
• Clients that use the native driver also have access to various policies that 
enable the client to intelligently route requests as required. 
• DataStax drivers: Java, Python, C#, C++, Ruby, Node.js (much more to 
come and in the community) 
• This includes: 
• Load Balancing 
• Data Centre Aware 
• Latency Aware 
• Token Aware 
• Reconnection policies 
• Retry policies 
• Downgrading Consistency 
• Plus others.. 
• http://www.datastax.com/download/clientdrivers 
©2014 DataStax Confidential. Do not distribute without consent. 
68
69 
Enterprise Security
Security in Cassandra 
BENEFITS FEATURES 
Internal Authentication 
Manages login IDs and 
passwords inside the 
database 
+ Ensures only authorized 
users can access a 
database system using 
internal validation 
+ Simple to implement and 
easy to understand 
+ No learning curve from 
the relational world 
Object Permission 
Management 
controls who has access to 
what and who can do what 
in the database 
+ Provides granular based 
control over who can 
add/change/delete/read 
data 
+ Uses familiar GRANT/ 
REVOKE from relational 
systems 
+ No learning curve 
Client to Node 
Encryption 
protects data in flight to 
and from a database 
cluster 
+ Ensures data cannot 
be captured/stolen in 
route to a server 
+ Data is safe both in 
flight from/to a 
database and on the 
database; complete 
coverage is ensured
Advanced Security in DataStax Enterprise 
BENEFITS FEATURES 
External Authentication 
uses external security 
software systems to control 
security 
+ Only authorized users 
have access to a database 
system using external 
validation 
+ Uses most trusted external 
security systems 
(Kerberos, LDAP, AD), 
mainstays in government 
and finance 
+ Single sign on to all data 
domains 
Transparent Data 
Encryption 
encrypts data at rest 
+ Protects sensitive data 
at rest from theft and 
from being read at the 
file system level 
+ No changes needed at 
application level 
Data Auditing 
provides trail of who did and 
looked at what/when 
+ Supplies admins with an 
audit trail of all accesses 
and changes 
+ Granular control to audit 
only what’s needed 
+ Uses log4j interface to 
ensure performance and 
efficient audit operations
72 
OpsCenter
DataStax OpsCenter 
• Visual, browser-based user 
interface negates need to install 
client software 
• Administration tasks carried out in 
point-and-click fashion 
• Allows for visual rebalance of data 
across a cluster when new nodes 
are added 
• Contains proactive alerts that warn 
of impending issues. 
• Built-in external notification 
abilities 
• Visually perform and schedule 
backup operations
DataStax OpsCenter 
A new, 10-node Cassandra (or Hadoop) cluster with OpsCenter A new, 10-node DSE cluster with OpsCenter running on AruWnnSin gin i n3 3m mininuutteess…… 
1 2 3 Done
75 
Integrated Search
Enterprise Search 
• Built-in enterprise search on Cassandra data via Solr integration 
• Facets, Filtering, Geospatial search, Text Analysis, etc. 
• Near real-time search operations 
• Search queries from CQL and REST/Solr 
• Solr shortcomings: 
• No bottleneck. Client can read/write to any Solr node. 
• Search index partitioning and replication for scalability and availability. 
• Multi-DC support 
• Data durability (Solr lacks write-ahead log, data can be lost) 
76 
Cassandra 
Replication 
Customer 
Facing 
Search 
Nodes
77 
Integrated Batch Analytics
Batch Analytics - Hadoop 
• Integrated Hadoop 1.0.4 
• CFS (Cassandra File System) , no HDFS 
• No Single Point of failure 
• No Hadoop complexity – every node is built the same 
• Hive / Pig / Sqoop / Mahout 
©2014 DataStax Confidential. Do not distribute without consent. 
Cassandra 
Replication 
78 
Customer 
Facing 
Hadoop 
Nodes
79 
External Batch Analytics
External Batch Analytics - BYOH 
Bring Your Own Hadoop 
Hive 
Request 
External Hadoop 
Resource 
Manager 
• Analytics from external 
Hadoop distribution 
• Hadoop 2.x support 
• Certified with Cloudera, 
Hortonworks 
Cassandra 
Nodes
Integrated Near Real-Time Analytics 
81
Real-Time Analytics - Spark 
• Tight integration with Cassandra 
• Real-time Streaming 
• Distributed Processing 
• “In-memory Map/Reduce”, multi-thread, best for iterations 
• GraphX, MLLib, SparkSQL, Shark (Hive SQL like) 
• DataStax / Databricks partnership 
• 10x – 100x speed of MapReduce 
©2014 DataStax Confidential. Do not distribute without consent. 
Cassandra 
Replication 
82 
Customer 
Facing 
Spark 
Nodes
Spark Solr 
Cassandra Cluster – Nodes Ring – Column Family Storage 
High Performance – Alway Available – Massive Scalability 
Company Confidential 
DataStax Enterprise Products 
© 2014 DataStax, All Rights Reserved. 
Hadoop 
Offline 
Application 
DataStax Cassandra Enterprise External Hadoop Distribution 
Cloudera, Hortonworks 
OpsCenter 
Hadoop 
Monitoring 
Operations 
Operational 
Application 
Real Time 
Search 
Real Time 
Analytics 
Batch 
Analytics 
SGBDR 
Analytics 
Transformations 
83
How to test DataStax Cassandra ? 
• CCM, A development tool for creating local Cassandra clusters 
• http://www.datastax.com/dev/blog/ccm-a-development-tool-for-creating- 
local-cassandra-clusters 
• https://github.com/pcmanus/ccm 
• Using Vagrant for Local Cassandra Development (VM) 
• http://www.cantoni.org/2014/08/26/vagrant-cassandra 
• https://github.com/bcantoni/vagrant-cassandra 
©2013 DataStax Confidential. Do not distribute without consent. 
84 
• Sandbox 
• http://www.datastax.com/download#dl-sandbox 
• Cloud : Amazon AWS, Google Cloud, Microsoft Azure 
• DataStax Code Samples 
• https://github.com/DataStaxCodeSamples/
Find out more 
• DataStax: http://www.datastax.com 
• Getting Started: http://www.datastax.com/documentation/gettingstarted/index.html 
• Training: http://www.datatstax.com/training 
• Downloads: http://www.datastax.com/download 
• Documentation: http://www.datastax.com/docs 
• Developer Blog: http://www.datastax.com/dev/blog 
• Community Site: http://planetcassandra.org 
• Webinars: http://planetcassandra.org/Learn/CassandraCommunityWebinars 
• Summit Talks: http://planetcassandra.org/Learn/CassandraSummit 
©2014 DataStax Confidential. Do not distribute without consent. 
85
Q&A 
Thank 
You !

Mais conteúdo relacionado

Mais procurados

Conhecendo Apache Cassandra @Movile
Conhecendo Apache Cassandra  @MovileConhecendo Apache Cassandra  @Movile
Conhecendo Apache Cassandra @MovileEiti Kimura
 
OpenShift Virtualization- Technical Overview.pdf
OpenShift Virtualization- Technical Overview.pdfOpenShift Virtualization- Technical Overview.pdf
OpenShift Virtualization- Technical Overview.pdfssuser1490e8
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache CassandraDataStax
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)DataStax Academy
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
 
Modeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQLModeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQLScyllaDB
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesFlink Forward
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
Introduction to Apache Cassandra
Introduction to Apache Cassandra Introduction to Apache Cassandra
Introduction to Apache Cassandra Knoldus Inc.
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka StreamsGuozhang Wang
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Jean-Paul Azar
 
Streams, Tables, and Time in KSQL
Streams, Tables, and Time in KSQLStreams, Tables, and Time in KSQL
Streams, Tables, and Time in KSQLconfluent
 
Cassandra sharding and consistency (lightning talk)
Cassandra sharding and consistency (lightning talk)Cassandra sharding and consistency (lightning talk)
Cassandra sharding and consistency (lightning talk)Federico Razzoli
 
Introduction to Kafka connect
Introduction to Kafka connectIntroduction to Kafka connect
Introduction to Kafka connectKnoldus Inc.
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraDataStax
 
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and BeyondScylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and BeyondScyllaDB
 
Bucket your partitions wisely - Cassandra summit 2016
Bucket your partitions wisely - Cassandra summit 2016Bucket your partitions wisely - Cassandra summit 2016
Bucket your partitions wisely - Cassandra summit 2016Markus Höfer
 

Mais procurados (20)

Conhecendo Apache Cassandra @Movile
Conhecendo Apache Cassandra  @MovileConhecendo Apache Cassandra  @Movile
Conhecendo Apache Cassandra @Movile
 
OpenShift Virtualization- Technical Overview.pdf
OpenShift Virtualization- Technical Overview.pdfOpenShift Virtualization- Technical Overview.pdf
OpenShift Virtualization- Technical Overview.pdf
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
Modeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQLModeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQL
 
KSQL Intro
KSQL IntroKSQL Intro
KSQL Intro
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Introduction to Apache Cassandra
Introduction to Apache Cassandra Introduction to Apache Cassandra
Introduction to Apache Cassandra
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
 
Streams, Tables, and Time in KSQL
Streams, Tables, and Time in KSQLStreams, Tables, and Time in KSQL
Streams, Tables, and Time in KSQL
 
Cassandra sharding and consistency (lightning talk)
Cassandra sharding and consistency (lightning talk)Cassandra sharding and consistency (lightning talk)
Cassandra sharding and consistency (lightning talk)
 
Introduction to Kafka connect
Introduction to Kafka connectIntroduction to Kafka connect
Introduction to Kafka connect
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache Cassandra
 
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and BeyondScylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
 
Bucket your partitions wisely - Cassandra summit 2016
Bucket your partitions wisely - Cassandra summit 2016Bucket your partitions wisely - Cassandra summit 2016
Bucket your partitions wisely - Cassandra summit 2016
 

Destaque

Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Jay Patel
 
Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraAdrian Cockcroft
 
Cql – cassandra query language
Cql – cassandra query languageCql – cassandra query language
Cql – cassandra query languageCourtney Robinson
 
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...DataStax
 
Solr & Cassandra: Searching Cassandra with DataStax Enterprise
Solr & Cassandra: Searching Cassandra with DataStax EnterpriseSolr & Cassandra: Searching Cassandra with DataStax Enterprise
Solr & Cassandra: Searching Cassandra with DataStax EnterpriseDataStax Academy
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandraNguyen Quang
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache CassandraRobert Stupp
 

Destaque (7)

Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012
 
Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global Cassandra
 
Cql – cassandra query language
Cql – cassandra query languageCql – cassandra query language
Cql – cassandra query language
 
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
 
Solr & Cassandra: Searching Cassandra with DataStax Enterprise
Solr & Cassandra: Searching Cassandra with DataStax EnterpriseSolr & Cassandra: Searching Cassandra with DataStax Enterprise
Solr & Cassandra: Searching Cassandra with DataStax Enterprise
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandra
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache Cassandra
 

Semelhante a Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at WildHacks NU

Patrick Guillebert – IT-Tage 2015 – Cassandra NoSQL - Architektur und Anwendu...
Patrick Guillebert – IT-Tage 2015 – Cassandra NoSQL - Architektur und Anwendu...Patrick Guillebert – IT-Tage 2015 – Cassandra NoSQL - Architektur und Anwendu...
Patrick Guillebert – IT-Tage 2015 – Cassandra NoSQL - Architektur und Anwendu...Informatik Aktuell
 
Deconstructing Apache Cassandra
Deconstructing Apache CassandraDeconstructing Apache Cassandra
Deconstructing Apache CassandraAlex Thompson
 
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...jaxLondonConference
 
Apache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsApache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsJulien Anguenot
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...DataStax
 
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Johnny Miller
 
Apache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda MoranApache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda MoranData Con LA
 
Cassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentialsCassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentialsJulien Anguenot
 
D108636GC10_les01.pptx
D108636GC10_les01.pptxD108636GC10_les01.pptx
D108636GC10_les01.pptxSuresh569521
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideMohammed Fazuluddin
 
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014Johnny Miller
 
MySQL NDB Cluster 101
MySQL NDB Cluster 101MySQL NDB Cluster 101
MySQL NDB Cluster 101Bernd Ocklin
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinChristian Johannsen
 
Aruman Cassandra database
Aruman Cassandra databaseAruman Cassandra database
Aruman Cassandra databaseUmesh Dande
 
Relational cloud, A Database-as-a-Service for the Cloud
Relational cloud, A Database-as-a-Service for the CloudRelational cloud, A Database-as-a-Service for the Cloud
Relational cloud, A Database-as-a-Service for the CloudHossein Riasati
 
Pythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra ClusterPythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra ClusterDataStax Academy
 

Semelhante a Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at WildHacks NU (20)

DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014
 
Patrick Guillebert – IT-Tage 2015 – Cassandra NoSQL - Architektur und Anwendu...
Patrick Guillebert – IT-Tage 2015 – Cassandra NoSQL - Architektur und Anwendu...Patrick Guillebert – IT-Tage 2015 – Cassandra NoSQL - Architektur und Anwendu...
Patrick Guillebert – IT-Tage 2015 – Cassandra NoSQL - Architektur und Anwendu...
 
Deconstructing Apache Cassandra
Deconstructing Apache CassandraDeconstructing Apache Cassandra
Deconstructing Apache Cassandra
 
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
 
Apache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsApache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentials
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
 
Devops kc
Devops kcDevops kc
Devops kc
 
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
 
Apache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda MoranApache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda Moran
 
BigData Developers MeetUp
BigData Developers MeetUpBigData Developers MeetUp
BigData Developers MeetUp
 
Cassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentialsCassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentials
 
D108636GC10_les01.pptx
D108636GC10_les01.pptxD108636GC10_les01.pptx
D108636GC10_les01.pptx
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction Guide
 
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
 
MySQL NDB Cluster 101
MySQL NDB Cluster 101MySQL NDB Cluster 101
MySQL NDB Cluster 101
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
 
Aruman Cassandra database
Aruman Cassandra databaseAruman Cassandra database
Aruman Cassandra database
 
Cassandra
CassandraCassandra
Cassandra
 
Relational cloud, A Database-as-a-Service for the Cloud
Relational cloud, A Database-as-a-Service for the CloudRelational cloud, A Database-as-a-Service for the Cloud
Relational cloud, A Database-as-a-Service for the Cloud
 
Pythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra ClusterPythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra Cluster
 

Mais de DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 

Mais de DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 

Último

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 

Último (20)

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 

Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at WildHacks NU

  • 1. Apache Cassandra and Datastax Enterprise Peter Halliday Sr. Software Engineer peter@datastax.com @phalliday
  • 2. A Look at Apache Cassandra…
  • 3. How Many Have Heard of… ©2013 DataStax Confidential. Do not distribute without consent. 3 • DataStax • Apache Cassandra
  • 4. What is Apache Cassandra? Apache Cassandra™ is a massively scalable NoSQL database. Cassandra is designed to handle big data workloads across multiple data centers with no single point of failure, providing enterprises with continuous availability without compromising performance.
  • 5. Customer Industries • Financial • eCommerce • Marketing • Recommendation Engines • Health Sciences • Fraud Detection • Sensor Data • Online Games ©2013 DataStax Confidential. Do not distribute without consent. 5
  • 6. Why use Cassandra? 100,000 txns/sec 200,000 txns/sec 400,000 txns/sec • Masterless architecture. • Continuous availability. • Multi-data center and cloud availability zone support. • Flexible data model. • Linear scale performance. • Operationally simple. • CQL – SQL-like language.
  • 7. History of Cassandra • Amazon Dynamo partitioning and replication • Log-structured ColumnFamily data model similar to Bigtable's ©2013 DataStax Confidential. Do not distribute without consent. 7
  • 8. Cassandra Terminology • Node (Vnode) • Cluster • Datacenter • Partition • Gossip • Snitch • Replication Factor • Consistency ©2013 DataStax Confidential. Do not distribute without consent. 8
  • 9. Cassandra Node ©2013 DataStax Confidential. Do not distribute without consent. 9 A computer server running Cassandra
  • 10. Cassandra Cluster A group of Cassandra nodes working together ©2013 DataStax Confidential. Do not distribute without consent. 10
  • 11. Cassandra Datacenter DC1 DC2 ©2013 DataStax Confidential. Do not distribute without consent. 11
  • 12. Logical Datacenter DC1 DC2 DC3 ©2013 DataStax Confidential. Do not distribute without consent. 12
  • 13. Physical Datacenter ©2013 DataStax Confidential. Do not distribute without consent. 13
  • 14. Cloud-based Datacenters ©2013 DataStax Confidential. Do not distribute without consent. 14
  • 15. Cassandra Cluster Data is evenly distributed around the nodes in a cluster ©2013 DataStax Confidential. Do not distribute without consent. 15
  • 16. Overview of Data Partitioning in Cassandra There are two basic data partitioning strategies: 1. Random partitioning – this is the default and recommended strategy. Partitions data as evenly as possible across all nodes using a hash of every column family row key 2. Ordered partitioning – stores column family row keys in sorted order across the nodes in a database cluster
  • 17. Data Distribution and Partitioning • Each node “owns” a set of tokens • A node’s token range is manually configured or randomly assigned when the node joins the cluster • A partition key is defined for each table • The partitioner applies a hash function to convert a partition key to a token. This determines which node(s) store that piece of data. • By default, Cassandra users the Murmur3Partitioner ©2013 DataStax Confidential. Do not distribute without consent. 17
  • 18. What is a Hash Algorithm? • A function used to map data of arbitrary size to data of fixed size • Slight differences in input produce large differences in output Input MurmurHash 1 8213365047359667313 2 5293579765126103566 3 -155496620801056360 4 -663977588974966463 5 958005880272148645 The Murmur3Partitioner uses the MurmurHash function. This hashing function creates a 64-bit hash value of the row key. The possible range of hash values is from -263 to +263. ©2013 DataStax Confidential. Do not distribute without consent. 18
  • 19. Data Distribution ©2013 DataStax Confidential. Do not distribute without consent. -9223372036854775808 5534023222112865484 19 -5534023222112865485 1844674407370955161 -1844674407370955162 Each node is configured with an initial token. The initial token determines the token range owned by the node.
  • 20. 5534023222112865484 20 -9223372036854775808 -5534023222112865485 -1844674407370955162 1844674407370955161 Data Distribution This node owns the token range: 1844674407370955162 To 5534023222112865484
  • 21. Overview of Replication in Cassandra • Replication is controlled by what is called the replication factor. A replication factor of 1 means there is only one copy of a row in a cluster. A replication factor of 2 means there are two copies of a row stored in a cluster • Replication is controlled at the keyspace level in Cassandra Original row Copy of row
  • 22. Cassandra Replication Strategies • Simple Strategy • Network Topology Strategy:
  • 23. Simple Topology – Single Datacenter Second Replica RF=2 First Replica { 'class' : 'SimpleStrategy', 'replication_factor' : 2 }; 23
  • 24. Simple Topology – Single Datacenter Third Replica Second Replica RF=3 First Replica { 'class' : 'SimpleStrategy', 'replication_factor' : 3 }; 24
  • 25. Network Topology – Multiple Datacenters ©2013 DataStax Confidential. Do not distribute without consent. 25 DC1 RF=2 DC2 RF=3 CREATE KEYSPACE Test WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', ’DC1' : 2, ’DC2' : 3};
  • 26. Network Topology – Rack Awareness How is the location of each node ©2013 DataStax Confidential. Do not distribute without consent. (Rack, Datacenter) known? 26 DC1 RACK1 DC1 RACK1 DC1 RACK1 DC1 RACK2 DC1 RACK2 Second Replica DC1 RACK2 RF=2 First Replica
  • 27. Snitches • A snitch determines which data centers and racks are written to and read from. • Snitches inform Cassandra about the network topology so that requests are routed efficiently and allows Cassandra to distribute replicas by grouping machines into data centers and racks. • Cassandra does its best not to have more than one replica on the same rack. ©2013 DataStax Confidential. Do not distribute without consent. 27
  • 28. Snitch Types SimpleSnitch • Single-data center deployments (or single-zone in public clouds) RackInferringSnitch • Determines the location of nodes by rack and data center corresponding to the IP addresses PropertyFileSnitch • User-defined description of the network details • cassandra-topology.properties file GossipingPropertyFileSnitch • Defines a local node's data center and rack • Uses gossip for propagating this information to other nodes • cassandra-rackdc.properties Amazon Snitches • EC2Snitch • EC2MultiRegionSnitch ©2013 DataStax Confidential. Do not distribute without consent. 28
  • 29. Gossip = Internode Communications • Gossip is a peer-to-peer communication protocol in which nodes periodically (every second) exchange information about themselves and about other nodes they know about. • Cassandra uses gossip to discover location and state information about the other nodes participating in a Cassandra cluster ©2013 DataStax Confidential. Do not distribute without consent. 29
  • 30. Load Balancing • Each node handles client requests, but the balancing policy is configurable • Round Robin • DC-Aware Round Robin • Token-Aware ©2013 DataStax Confidential. Do not distribute without consent. 30
  • 31. Round Robin CLIENT local Remote ©2013 DataStax Confidential. Do not distribute without consent. 31
  • 32. DC Aware Round Robin CLIENT local Remote ©2013 DataStax Confidential. Do not distribute without consent. 32
  • 33. DC Aware Round Robin CLIENT local Remote The client attempts to contact nodes in the local datacenter. ©2013 DataStax Confidential. Do not distribute without consent. 33
  • 34. DC Aware Round Robin CLIENT local Remote Remote nodes are used when local nodes cannot be reached. ©2013 DataStax Confidential. Do not distribute without consent. 34
  • 35. Token Aware CLIENT local Remote ©2013 DataStax Confidential. Do not distribute without consent. 35
  • 36. Vnodes (Virtual Nodes) Instead of each node owning a single token range, Vnodes divide each node into many ranges (256). Vnodes simplify many tasks in Cassandra: • You no longer have to calculate and assign tokens to each node. • Rebalancing a cluster is no longer necessary when adding or removing nodes. • Rebuilding a dead node is faster because it involves every other node in the cluster. • Improves the use of heterogeneous machines in a cluster. You can assign a proportional number of vnodes to smaller and larger machines. ©2013 DataStax Confidential. Do not distribute without consent. 36
  • 37. Reading and Writing to Cassandra Nodes • Cassandra has a ‘location independence’ architecture, which allows any user to connect to any node in any data center and read/write the data they need • All writes being partitioned and replicated for them automatically throughout the cluster
  • 38. Writing Data RF=3 transaction ©2013 DataStax Confidential. Do not distribute without consent. The client sends a mutation (insert/update/delete) to a node in the cluster. That node serves as the coordinator for this 38
  • 39. Writing Data ©2013 DataStax Confidential. Do not distribute without consent. The coordinator forwards the update to all replicas. 39 RF=3
  • 40. Writing Data ©2013 DataStax Confidential. Do not distribute without consent. 40 The replicas acknowledge that data was written. RF=3
  • 41. Writing Data ©2013 DataStax Confidential. Do not distribute without consent. 41 And the coordinator sends a successful response to the client. RF=3
  • 42. What if a node is down? RF=3 Write Consistency (WC) ©2013 DataStax Confidential. Do not distribute without consent. Only two nodes respond. The client gets to choose if the write was successful. 42
  • 43. Tunable Data Consistency • Choose between strong and eventual consistency (one to all responding) depending on the need • Can be done on a per-operation basis, and for both reads and writes • Handles multi-data center operations Writes • Any • One • Quorum • Local_Quorum • Each_Quorum • All Reads • One • Quorum • Local_Quorum • Each_Quorum • All
  • 44. Consistency • ANY Returns data from any of the replica. • QUORUM Returns the most recent data from the majority of replicas. • LOCAL QUORUM Returns the most recent data from the majority of local replicas. • ALL Returns the most recent data from all replicas. ©2013 DataStax Confidential. Do not distribute without consent. 44
  • 45. Quorum means > 50% ©2013 DataStax Confidential. Do not distribute without consent. 45 WC = QUORUM Will this write succeed? YES!! A majority of replicas received the mutation. RF=3
  • 46. What if two nodes are down? ©2013 DataStax Confidential. Do not distribute without consent. WC = QUORUM Will this write succeed? 46 NO Failed to write a majority of replicas. RF=3
  • 47. Multi DC Writes ©2013 DataStax Confidential. Do not distribute without consent. 47 DC1 RF=3 The coordinator forwards the mutation to local replicas and a remote coordinator. DC2 RF=3
  • 48. Multi DC Writes ©2013 DataStax Confidential. Do not distribute without consent. 48 DC1 RF=3 The remote coordinator forwards the mutation to replicas in the remote DC DC2 RF=3
  • 49. Multi DC Writes ©2013 DataStax Confidential. Do not distribute without consent. All replicas acknowledge the write. 49 DC1 RF=3 DC2 RF=3
  • 50. Hinted Handoff ©2013 DataStax Confidential. Do not distribute without consent. 50 HINT The write is replayed when the target node comes online
  • 51. Reading Data ©2013 DataStax Confidential. Do not distribute without consent. The client sends a query to a node in the cluster. That node serves as the coordinator. 52 RF=3
  • 52. Reading Data ©2013 DataStax Confidential. Do not distribute without consent. 53 The coordinator forwards the query to all replicas. RF=3
  • 53. Reading Data ©2013 DataStax Confidential. Do not distribute without consent. 54 The replicas respond with data. RF=3
  • 54. Reading Data ©2013 DataStax Confidential. Do not distribute without consent. 55 And the coordinator returns the data to the client. RF=3
  • 55. What if the nodes disagree? WRITE ©2013 DataStax Confidential. Do not distribute without consent. 56 Data was written with QUORUM when one node was down. The write was successful, but that node missed the update. RF=3
  • 56. What if the nodes disagree? READ ©2013 DataStax Confidential. Do not distribute without consent. 57 Now the node is back online, and it responds to a read request. It has older data than the other replicas. RF=3
  • 57. What if the nodes disagree? NEWEST ©2013 DataStax Confidential. Do not distribute without consent. 58 The coordinator resolves the discrepancy and sends the newest data to the client. READ REPAIR The coordinator also notifies the “out of date” node that it has old data. The “out of date” node receives updated data from another replica. RF=3
  • 58. Rapid Read Protection ©2013 DataStax Confidential. Do not distribute without consent. 59 During a read, does the coordinator really forward the query to all replicas? That seems RF=3 unnecessary!
  • 59. Rapid Read Protection ©2013 DataStax Confidential. Do not distribute without consent. NO Cassandra performs only as many requests as necessary to meet the requested Consistency Level. Cassandra routes requests to the most-responsive replicas. 60 RF=3
  • 60. Rapid Read Protection ©2013 DataStax Confidential. Do not distribute without consent. 61 If a replica doesn’t respond quickly, Cassandra will try another node. This is known as an “eager retry” RF=3
  • 61. Writes (what happens within each node) • Data is first written to a commit log for durability. Your data is safe in Cassandra • Then written to a memtable in memory • Once the memtable becomes full, it is flushed to an SSTable (sorted strings table) • Writes are atomic at the row level; all columns are written or updated, or none are. RDBMS-styled transactions are not supported INSERT INTO… Commit log memtable SSTable Cassandra is known for being the fastest database in the industry where write operations are concerned.
  • 62. SSTables ©2013 DataStax Confidential. Do not distribute without consent. 63 Commit Log Mem table Cassandra writes to a commit log and to memory. When the memtable is full, data is flushed to disk, creating an SSTable. Disk
  • 63. Compaction ©2013 DataStax Confidential. Do not distribute without consent. 64 Commit Log Mem table SSTables are cleaned up using compaction. Compaction creates a new combined SSTable and deletes the old ones. Disk
  • 64. Reads (what happens within each node) • Depending on the frequency of inserts and updates, a record will exist in multiple places. Each place must be read to retrieve the entire record. • Data is read from the memtable in memory. • Multiple SSTables may also be read. • Bloom filters prevent excessive reading of SSTables. SELECT * FROM… memtable SSTable Bloom Filter SSTable SSTable SSTable
  • 66. What is DataStax Enterprise? ©2014 DataStax Confidential. Do not distribute without consent. Strong Data Protection In-Memory OLTP/Analytics Point-and-Click/Automated Mgmt Certified Production Cassandra Multi-Workload/Use Case Capable Integrated OLTP, Analytics, Search Dev. IDE& Drivers Security Analytics Search Visual Monitoring Management Services In-Memory Professional Services Support& Training
  • 67. Cassandra Clients – API & Native Driver • CQL (Cassandra Query Language) is the primary API • Clients that use the native driver also have access to various policies that enable the client to intelligently route requests as required. • DataStax drivers: Java, Python, C#, C++, Ruby, Node.js (much more to come and in the community) • This includes: • Load Balancing • Data Centre Aware • Latency Aware • Token Aware • Reconnection policies • Retry policies • Downgrading Consistency • Plus others.. • http://www.datastax.com/download/clientdrivers ©2014 DataStax Confidential. Do not distribute without consent. 68
  • 69. Security in Cassandra BENEFITS FEATURES Internal Authentication Manages login IDs and passwords inside the database + Ensures only authorized users can access a database system using internal validation + Simple to implement and easy to understand + No learning curve from the relational world Object Permission Management controls who has access to what and who can do what in the database + Provides granular based control over who can add/change/delete/read data + Uses familiar GRANT/ REVOKE from relational systems + No learning curve Client to Node Encryption protects data in flight to and from a database cluster + Ensures data cannot be captured/stolen in route to a server + Data is safe both in flight from/to a database and on the database; complete coverage is ensured
  • 70. Advanced Security in DataStax Enterprise BENEFITS FEATURES External Authentication uses external security software systems to control security + Only authorized users have access to a database system using external validation + Uses most trusted external security systems (Kerberos, LDAP, AD), mainstays in government and finance + Single sign on to all data domains Transparent Data Encryption encrypts data at rest + Protects sensitive data at rest from theft and from being read at the file system level + No changes needed at application level Data Auditing provides trail of who did and looked at what/when + Supplies admins with an audit trail of all accesses and changes + Granular control to audit only what’s needed + Uses log4j interface to ensure performance and efficient audit operations
  • 72. DataStax OpsCenter • Visual, browser-based user interface negates need to install client software • Administration tasks carried out in point-and-click fashion • Allows for visual rebalance of data across a cluster when new nodes are added • Contains proactive alerts that warn of impending issues. • Built-in external notification abilities • Visually perform and schedule backup operations
  • 73. DataStax OpsCenter A new, 10-node Cassandra (or Hadoop) cluster with OpsCenter A new, 10-node DSE cluster with OpsCenter running on AruWnnSin gin i n3 3m mininuutteess…… 1 2 3 Done
  • 75. Enterprise Search • Built-in enterprise search on Cassandra data via Solr integration • Facets, Filtering, Geospatial search, Text Analysis, etc. • Near real-time search operations • Search queries from CQL and REST/Solr • Solr shortcomings: • No bottleneck. Client can read/write to any Solr node. • Search index partitioning and replication for scalability and availability. • Multi-DC support • Data durability (Solr lacks write-ahead log, data can be lost) 76 Cassandra Replication Customer Facing Search Nodes
  • 76. 77 Integrated Batch Analytics
  • 77. Batch Analytics - Hadoop • Integrated Hadoop 1.0.4 • CFS (Cassandra File System) , no HDFS • No Single Point of failure • No Hadoop complexity – every node is built the same • Hive / Pig / Sqoop / Mahout ©2014 DataStax Confidential. Do not distribute without consent. Cassandra Replication 78 Customer Facing Hadoop Nodes
  • 78. 79 External Batch Analytics
  • 79. External Batch Analytics - BYOH Bring Your Own Hadoop Hive Request External Hadoop Resource Manager • Analytics from external Hadoop distribution • Hadoop 2.x support • Certified with Cloudera, Hortonworks Cassandra Nodes
  • 81. Real-Time Analytics - Spark • Tight integration with Cassandra • Real-time Streaming • Distributed Processing • “In-memory Map/Reduce”, multi-thread, best for iterations • GraphX, MLLib, SparkSQL, Shark (Hive SQL like) • DataStax / Databricks partnership • 10x – 100x speed of MapReduce ©2014 DataStax Confidential. Do not distribute without consent. Cassandra Replication 82 Customer Facing Spark Nodes
  • 82. Spark Solr Cassandra Cluster – Nodes Ring – Column Family Storage High Performance – Alway Available – Massive Scalability Company Confidential DataStax Enterprise Products © 2014 DataStax, All Rights Reserved. Hadoop Offline Application DataStax Cassandra Enterprise External Hadoop Distribution Cloudera, Hortonworks OpsCenter Hadoop Monitoring Operations Operational Application Real Time Search Real Time Analytics Batch Analytics SGBDR Analytics Transformations 83
  • 83. How to test DataStax Cassandra ? • CCM, A development tool for creating local Cassandra clusters • http://www.datastax.com/dev/blog/ccm-a-development-tool-for-creating- local-cassandra-clusters • https://github.com/pcmanus/ccm • Using Vagrant for Local Cassandra Development (VM) • http://www.cantoni.org/2014/08/26/vagrant-cassandra • https://github.com/bcantoni/vagrant-cassandra ©2013 DataStax Confidential. Do not distribute without consent. 84 • Sandbox • http://www.datastax.com/download#dl-sandbox • Cloud : Amazon AWS, Google Cloud, Microsoft Azure • DataStax Code Samples • https://github.com/DataStaxCodeSamples/
  • 84. Find out more • DataStax: http://www.datastax.com • Getting Started: http://www.datastax.com/documentation/gettingstarted/index.html • Training: http://www.datatstax.com/training • Downloads: http://www.datastax.com/download • Documentation: http://www.datastax.com/docs • Developer Blog: http://www.datastax.com/dev/blog • Community Site: http://planetcassandra.org • Webinars: http://planetcassandra.org/Learn/CassandraCommunityWebinars • Summit Talks: http://planetcassandra.org/Learn/CassandraSummit ©2014 DataStax Confidential. Do not distribute without consent. 85