SlideShare uma empresa Scribd logo
1 de 34
Baixar para ler offline
Cassandra     for
Python Developers

     Tyler Hobbs
History

    Open-sourced by Facebook   2008

    Apache Incubator           2009

    Top-level Apache project   2010

    DataStax founded           2010
Strengths

    Scalable
    – 2x Nodes == 2x Performance

    Reliable (Available)
    – Replication that works
    – Multi-DC support
    – No single point of failure
Strengths

    Fast
    – 10-30k writes/sec, 1-10k reads/sec

    Analytics
    – Integrated Hadoop support
Weaknesses

    No ACID transactions
    – Don't need these as often as you'd think

    Limited support for ad-hoc queries
    – You'll give these up anyway when sharding an RDBMS


    Generally complements another system
    – Not intended to be one-size-fits-all
Clustering

    Every node plays the same role
    – No masters, slaves, or special nodes
Clustering

             0

      50          10




      40          20

             30
Clustering
                       Key: “www.google.com”
             0

      50          10




      40          20

             30
Clustering
                       Key: “www.google.com”
             0
                       md5(“www.google.com”)
      50          10

                                14

      40          20

             30
Clustering
                       Key: “www.google.com”
             0
                       md5(“www.google.com”)
      50          10

                                14

      40          20

             30
Clustering
                       Key: “www.google.com”
             0
                       md5(“www.google.com”)
      50          10

                                14

      40          20

             30
Clustering
                          Key: “www.google.com”
             0
                          md5(“www.google.com”)
      50           10

                                     14

      40           20

             30
                  Replication Factor = 3
Clustering
   Client can talk to any node
Data Model

    Keyspace
    – A collection of Column Families
    – Controls replication settings

    Column Family
    – Kinda resembles a table
ColumnFamilies

    Static
    – Object data

    Dynamic
    – Pre-calculated query results
Static Column Families
                   Users
   zznate    password: *    name: Nate


   driftx    password: *   name: Brandon


   thobbs    password: *    name: Tyler


   jbellis   password: *   name: Jonathan   site: riptano.com
Dynamic Column Families
                     Following
zznate    driftx:   thobbs:


driftx


thobbs    zznate:


jbellis   driftx:   mdennis:   pcmanus   thobbs:   xedin:   zznate
Dynamic Column Families

    Timeline of tweets by a user

    Timeline of tweets by all of the people a
    user is following

    List of comments sorted by score

    List of friends grouped by state
Pycassa


    Python client library for Cassandra

    Open Source (MIT License)
    – www.github.com/pycassa/pycassa

    Users
    – Reddit
    – ~10k github downloads of every version
Installing Pycassa

    easy_install pycassa
    – or pip
Basic Layout

    pycassa.pool
    – Connection pooling

    pycassa.columnfamily
    – Primary module for the data API

    pycassa.system_manager
    – Schema management
The Data API

    RPC-based API

    Rows are like a sorted list of (name,value)
    tuples
    – Like a dict, but sorted by the names
    – OrderedDicts are used to preserve sorting
Inserting Data
>>> from pycassa.pool import ConnectionPool
>>> from pycassa.columnfamily import ColumnFamily
>>>
>>> pool = ConnectionPool(“MyKeyspace”)
>>> cf = ColumnFamily(pool, “MyCF”)
>>>
>>> cf.insert(“key”, {“col_name”: “col_value”})
>>> cf.get(“key”)
{“col_name”: “col_value”}
Inserting Data
>>> columns = {“aaa”: 1, “ccc”: 3}
>>> cf.insert(“key”, columns)
>>> cf.get(“key”)
{“aaa”: 1, “ccc”: 3}
>>>
>>> # Updates are the same as inserts
>>> cf.insert(“key”, {“aaa”: 42})
>>> cf.get(“key”)
{“aaa”: 42, “ccc”: 3}
>>>
>>> # We can insert anywhere in the row
>>> cf.insert(“key”, {“bbb”: 2, “ddd”: 4})
>>> cf.get(“key”)
{“aaa”: 42, “bbb”: 2, “ccc”: 3, “ddd”: 4}
Fetching Data
>>> cf.get(“key”)
{“aaa”: 42, “bbb”: 2, “ccc”: 3, “ddd”: 4}
>>>
>>> # Get a set of columns by name
>>> cf.get(“key”, columns=[“bbb”, “ddd”])
{“bbb”: 2, “ddd”: 4}
Fetching Data
>>> # Get a slice of columns
>>> cf.get(“key”, column_start=”bbb”,
...               column_finish=”ccc”)
{“bbb”: 2, “ccc”: 3}
>>>
>>> # Slice from “ccc” to the end
>>> cf.get(“key”, column_start=”ccc”)
{“ccc”: 3, “ddd”: 4}
>>>
>>> # Slice from “bbb” to the beginning
>>> cf.get(“key”, column_start=”bbb”,
...               column_reversed=True)
{“bbb”: 2, “aaa”: 42}
Fetching Data
>>> # Get the first two columns in the row
>>> cf.get(“key”, column_count=2)
{“aaa”: 42, “bbb”: 2}
>>>
>>> # Get the last two columns in the row
>>> cf.get(“key”, column_reversed=True,
...               column_count=2)
{“ddd”: 4, “ccc”: 3}
Fetching Multiple Rows
>>> columns = {“col”: “val”}
>>> cf.batch_insert({“k1”: columns,
...                  “k2”: columns,
...                  “k3”: columns})
>>>
>>> # Get multiple rows by name
>>> cf.multiget([“k1”,“k2”])
{“k1”: {”col”: “val”},
 “k2”: {“col”: “val”}}


>>> # You can get slices of each row, too
>>> cf.multiget([“k1”,“k2”], column_start=”bbb”) …
Fetching a Range of Rows
>>> # Get a generator over all of the rows
>>> for key, columns in cf.get_range():
...     print key, columns
“k1” {”col”: “val”}
“k2” {“col”: “val”}
“k3” {“col”: “val”}


>>> # You can get slices of each row
>>> cf.get_range(column_start=”bbb”) …
Fetching Rows by Secondary Index
>>> from pycassa.index import *
>>>
>>> # Build up our index clause to match
>>> exp = create_index_expression(“name”, “Joe”)
>>> clause = create_index_clause([exp])
>>> matches = users.get_indexed_slices(clause)
>>>
>>> # results is a generator over matching rows
>>> for key, columns in matches:
...     print key, columns
“13” {”name”: “Joe”, “nick”: “thatguy2”}
“257” {“name”: “Joe”, “nick”: “flowers”}
“98” {“name”: “Joe”, “nick”: “fr0d0”}
Deleting Data
>>>   # Delete a whole row
>>>   cf.remove(“key1”)
>>>
>>>   # Or selectively delete columns
>>>   cf.remove(“key2”, columns=[“name”, “date”])
Connection Management

    pycassa.pool.ConnectionPool
    – Takes a list of servers
        • Can be any set of nodes in your cluster
    – pool_size, max_retries, timeout
    – Automatically retries operations against other nodes
        • Writes are idempotent!
    – Individual node failures are transparent
    – Thread safe
Async Options

    eventlet
    – Just need to monkeypatch socket and threading

    Twisted
    – Use Telephus instead of Pycassa
    – www.github.com/driftx/telephus
    – Less friendly, documented, etc
Tyler Hobbs
        @tylhobbs
tyler@datastax.com

Mais conteúdo relacionado

Mais procurados

Python in the database
Python in the databasePython in the database
Python in the databasepybcn
 
Caching and tuning fun for high scalability
Caching and tuning fun for high scalabilityCaching and tuning fun for high scalability
Caching and tuning fun for high scalabilityWim Godden
 
Apache Cassandra Lesson: Data Modelling and CQL3
Apache Cassandra Lesson: Data Modelling and CQL3Apache Cassandra Lesson: Data Modelling and CQL3
Apache Cassandra Lesson: Data Modelling and CQL3Markus Klems
 
MongoDB Database Replication
MongoDB Database ReplicationMongoDB Database Replication
MongoDB Database ReplicationMehdi Valikhani
 
Cassandra 3.0 Awesomeness
Cassandra 3.0 AwesomenessCassandra 3.0 Awesomeness
Cassandra 3.0 AwesomenessJon Haddad
 
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...DataStax
 
Replication MongoDB Days 2013
Replication MongoDB Days 2013Replication MongoDB Days 2013
Replication MongoDB Days 2013Randall Hunt
 
C*ollege Credit: Creating Your First App in Java with Cassandra
C*ollege Credit: Creating Your First App in Java with CassandraC*ollege Credit: Creating Your First App in Java with Cassandra
C*ollege Credit: Creating Your First App in Java with CassandraDataStax
 
Cassandra EU - Data model on fire
Cassandra EU - Data model on fireCassandra EU - Data model on fire
Cassandra EU - Data model on firePatrick McFadin
 
Cassandra 3.0 - JSON at scale - StampedeCon 2015
Cassandra 3.0 - JSON at scale - StampedeCon 2015Cassandra 3.0 - JSON at scale - StampedeCon 2015
Cassandra 3.0 - JSON at scale - StampedeCon 2015StampedeCon
 
Beyond PHP - it's not (just) about the code
Beyond PHP - it's not (just) about the codeBeyond PHP - it's not (just) about the code
Beyond PHP - it's not (just) about the codeWim Godden
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
glance replicator
glance replicatorglance replicator
glance replicatoririx_jp
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
DataStax: An Introduction to DataStax Enterprise Search
DataStax: An Introduction to DataStax Enterprise SearchDataStax: An Introduction to DataStax Enterprise Search
DataStax: An Introduction to DataStax Enterprise SearchDataStax Academy
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersSematext Group, Inc.
 

Mais procurados (20)

Python in the database
Python in the databasePython in the database
Python in the database
 
Caching and tuning fun for high scalability
Caching and tuning fun for high scalabilityCaching and tuning fun for high scalability
Caching and tuning fun for high scalability
 
Apache Cassandra Lesson: Data Modelling and CQL3
Apache Cassandra Lesson: Data Modelling and CQL3Apache Cassandra Lesson: Data Modelling and CQL3
Apache Cassandra Lesson: Data Modelling and CQL3
 
How to Use JSON in MySQL Wrong
How to Use JSON in MySQL WrongHow to Use JSON in MySQL Wrong
How to Use JSON in MySQL Wrong
 
Top Node.js Metrics to Watch
Top Node.js Metrics to WatchTop Node.js Metrics to Watch
Top Node.js Metrics to Watch
 
MongoDB Database Replication
MongoDB Database ReplicationMongoDB Database Replication
MongoDB Database Replication
 
Cassandra 3.0 Awesomeness
Cassandra 3.0 AwesomenessCassandra 3.0 Awesomeness
Cassandra 3.0 Awesomeness
 
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
 
Replication MongoDB Days 2013
Replication MongoDB Days 2013Replication MongoDB Days 2013
Replication MongoDB Days 2013
 
C*ollege Credit: Creating Your First App in Java with Cassandra
C*ollege Credit: Creating Your First App in Java with CassandraC*ollege Credit: Creating Your First App in Java with Cassandra
C*ollege Credit: Creating Your First App in Java with Cassandra
 
Load Data Fast!
Load Data Fast!Load Data Fast!
Load Data Fast!
 
Cassandra EU - Data model on fire
Cassandra EU - Data model on fireCassandra EU - Data model on fire
Cassandra EU - Data model on fire
 
Cassandra 3.0 - JSON at scale - StampedeCon 2015
Cassandra 3.0 - JSON at scale - StampedeCon 2015Cassandra 3.0 - JSON at scale - StampedeCon 2015
Cassandra 3.0 - JSON at scale - StampedeCon 2015
 
Beyond PHP - it's not (just) about the code
Beyond PHP - it's not (just) about the codeBeyond PHP - it's not (just) about the code
Beyond PHP - it's not (just) about the code
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Cassandra 2.2 & 3.0
Cassandra 2.2 & 3.0Cassandra 2.2 & 3.0
Cassandra 2.2 & 3.0
 
glance replicator
glance replicatorglance replicator
glance replicator
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
DataStax: An Introduction to DataStax Enterprise Search
DataStax: An Introduction to DataStax Enterprise SearchDataStax: An Introduction to DataStax Enterprise Search
DataStax: An Introduction to DataStax Enterprise Search
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 

Semelhante a Cassandra for Python Developers

Cassandra for Ruby/Rails Devs
Cassandra for Ruby/Rails DevsCassandra for Ruby/Rails Devs
Cassandra for Ruby/Rails DevsTyler Hobbs
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynotejbellis
 
Importing Data into Neo4j quickly and easily - StackOverflow
Importing Data into Neo4j quickly and easily - StackOverflowImporting Data into Neo4j quickly and easily - StackOverflow
Importing Data into Neo4j quickly and easily - StackOverflowNeo4j
 
ETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk LoadingETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk Loadingalex_araujo
 
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRick Copeland
 
Graph Connect: Importing data quickly and easily
Graph Connect: Importing data quickly and easilyGraph Connect: Importing data quickly and easily
Graph Connect: Importing data quickly and easilyMark Needham
 
GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger
GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael HungerGraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger
GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael HungerNeo4j
 
The Ring programming language version 1.6 book - Part 46 of 189
The Ring programming language version 1.6 book - Part 46 of 189The Ring programming language version 1.6 book - Part 46 of 189
The Ring programming language version 1.6 book - Part 46 of 189Mahmoud Samir Fayed
 
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...MongoDB
 
The Ring programming language version 1.7 book - Part 48 of 196
The Ring programming language version 1.7 book - Part 48 of 196The Ring programming language version 1.7 book - Part 48 of 196
The Ring programming language version 1.7 book - Part 48 of 196Mahmoud Samir Fayed
 
The Ring programming language version 1.5 book - Part 8 of 31
The Ring programming language version 1.5 book - Part 8 of 31The Ring programming language version 1.5 book - Part 8 of 31
The Ring programming language version 1.5 book - Part 8 of 31Mahmoud Samir Fayed
 
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTOClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTOAltinity Ltd
 
Spark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 FuriousSpark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 FuriousJen Aman
 
Spark and Cassandra 2 Fast 2 Furious
Spark and Cassandra 2 Fast 2 FuriousSpark and Cassandra 2 Fast 2 Furious
Spark and Cassandra 2 Fast 2 FuriousRussell Spitzer
 
The Ring programming language version 1.2 book - Part 32 of 84
The Ring programming language version 1.2 book - Part 32 of 84The Ring programming language version 1.2 book - Part 32 of 84
The Ring programming language version 1.2 book - Part 32 of 84Mahmoud Samir Fayed
 
Store and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and CassandraStore and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and CassandraDeependra Ariyadewa
 
The Ring programming language version 1.9 book - Part 53 of 210
The Ring programming language version 1.9 book - Part 53 of 210The Ring programming language version 1.9 book - Part 53 of 210
The Ring programming language version 1.9 book - Part 53 of 210Mahmoud Samir Fayed
 
The Ring programming language version 1.9 book - Part 36 of 210
The Ring programming language version 1.9 book - Part 36 of 210The Ring programming language version 1.9 book - Part 36 of 210
The Ring programming language version 1.9 book - Part 36 of 210Mahmoud Samir Fayed
 
Back to Basics Webinar 2 - Your First MongoDB Application
Back to  Basics Webinar 2 - Your First MongoDB ApplicationBack to  Basics Webinar 2 - Your First MongoDB Application
Back to Basics Webinar 2 - Your First MongoDB ApplicationJoe Drumgoole
 

Semelhante a Cassandra for Python Developers (20)

Cassandra for Ruby/Rails Devs
Cassandra for Ruby/Rails DevsCassandra for Ruby/Rails Devs
Cassandra for Ruby/Rails Devs
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynote
 
Importing Data into Neo4j quickly and easily - StackOverflow
Importing Data into Neo4j quickly and easily - StackOverflowImporting Data into Neo4j quickly and easily - StackOverflow
Importing Data into Neo4j quickly and easily - StackOverflow
 
ETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk LoadingETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk Loading
 
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
 
Graph Connect: Importing data quickly and easily
Graph Connect: Importing data quickly and easilyGraph Connect: Importing data quickly and easily
Graph Connect: Importing data quickly and easily
 
GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger
GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael HungerGraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger
GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger
 
The Ring programming language version 1.6 book - Part 46 of 189
The Ring programming language version 1.6 book - Part 46 of 189The Ring programming language version 1.6 book - Part 46 of 189
The Ring programming language version 1.6 book - Part 46 of 189
 
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
 
The Ring programming language version 1.7 book - Part 48 of 196
The Ring programming language version 1.7 book - Part 48 of 196The Ring programming language version 1.7 book - Part 48 of 196
The Ring programming language version 1.7 book - Part 48 of 196
 
The Ring programming language version 1.5 book - Part 8 of 31
The Ring programming language version 1.5 book - Part 8 of 31The Ring programming language version 1.5 book - Part 8 of 31
The Ring programming language version 1.5 book - Part 8 of 31
 
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTOClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
 
Python database access
Python database accessPython database access
Python database access
 
Spark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 FuriousSpark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 Furious
 
Spark and Cassandra 2 Fast 2 Furious
Spark and Cassandra 2 Fast 2 FuriousSpark and Cassandra 2 Fast 2 Furious
Spark and Cassandra 2 Fast 2 Furious
 
The Ring programming language version 1.2 book - Part 32 of 84
The Ring programming language version 1.2 book - Part 32 of 84The Ring programming language version 1.2 book - Part 32 of 84
The Ring programming language version 1.2 book - Part 32 of 84
 
Store and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and CassandraStore and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and Cassandra
 
The Ring programming language version 1.9 book - Part 53 of 210
The Ring programming language version 1.9 book - Part 53 of 210The Ring programming language version 1.9 book - Part 53 of 210
The Ring programming language version 1.9 book - Part 53 of 210
 
The Ring programming language version 1.9 book - Part 36 of 210
The Ring programming language version 1.9 book - Part 36 of 210The Ring programming language version 1.9 book - Part 36 of 210
The Ring programming language version 1.9 book - Part 36 of 210
 
Back to Basics Webinar 2 - Your First MongoDB Application
Back to  Basics Webinar 2 - Your First MongoDB ApplicationBack to  Basics Webinar 2 - Your First MongoDB Application
Back to Basics Webinar 2 - Your First MongoDB Application
 

Último

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 

Último (20)

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 

Cassandra for Python Developers

  • 1. Cassandra for Python Developers Tyler Hobbs
  • 2. History  Open-sourced by Facebook 2008  Apache Incubator 2009  Top-level Apache project 2010  DataStax founded 2010
  • 3. Strengths  Scalable – 2x Nodes == 2x Performance  Reliable (Available) – Replication that works – Multi-DC support – No single point of failure
  • 4. Strengths  Fast – 10-30k writes/sec, 1-10k reads/sec  Analytics – Integrated Hadoop support
  • 5. Weaknesses  No ACID transactions – Don't need these as often as you'd think  Limited support for ad-hoc queries – You'll give these up anyway when sharding an RDBMS  Generally complements another system – Not intended to be one-size-fits-all
  • 6. Clustering  Every node plays the same role – No masters, slaves, or special nodes
  • 7. Clustering 0 50 10 40 20 30
  • 8. Clustering Key: “www.google.com” 0 50 10 40 20 30
  • 9. Clustering Key: “www.google.com” 0 md5(“www.google.com”) 50 10 14 40 20 30
  • 10. Clustering Key: “www.google.com” 0 md5(“www.google.com”) 50 10 14 40 20 30
  • 11. Clustering Key: “www.google.com” 0 md5(“www.google.com”) 50 10 14 40 20 30
  • 12. Clustering Key: “www.google.com” 0 md5(“www.google.com”) 50 10 14 40 20 30 Replication Factor = 3
  • 13. Clustering  Client can talk to any node
  • 14. Data Model  Keyspace – A collection of Column Families – Controls replication settings  Column Family – Kinda resembles a table
  • 15. ColumnFamilies  Static – Object data  Dynamic – Pre-calculated query results
  • 16. Static Column Families Users zznate password: * name: Nate driftx password: * name: Brandon thobbs password: * name: Tyler jbellis password: * name: Jonathan site: riptano.com
  • 17. Dynamic Column Families Following zznate driftx: thobbs: driftx thobbs zznate: jbellis driftx: mdennis: pcmanus thobbs: xedin: zznate
  • 18. Dynamic Column Families  Timeline of tweets by a user  Timeline of tweets by all of the people a user is following  List of comments sorted by score  List of friends grouped by state
  • 19. Pycassa  Python client library for Cassandra  Open Source (MIT License) – www.github.com/pycassa/pycassa  Users – Reddit – ~10k github downloads of every version
  • 20. Installing Pycassa  easy_install pycassa – or pip
  • 21. Basic Layout  pycassa.pool – Connection pooling  pycassa.columnfamily – Primary module for the data API  pycassa.system_manager – Schema management
  • 22. The Data API  RPC-based API  Rows are like a sorted list of (name,value) tuples – Like a dict, but sorted by the names – OrderedDicts are used to preserve sorting
  • 23. Inserting Data >>> from pycassa.pool import ConnectionPool >>> from pycassa.columnfamily import ColumnFamily >>> >>> pool = ConnectionPool(“MyKeyspace”) >>> cf = ColumnFamily(pool, “MyCF”) >>> >>> cf.insert(“key”, {“col_name”: “col_value”}) >>> cf.get(“key”) {“col_name”: “col_value”}
  • 24. Inserting Data >>> columns = {“aaa”: 1, “ccc”: 3} >>> cf.insert(“key”, columns) >>> cf.get(“key”) {“aaa”: 1, “ccc”: 3} >>> >>> # Updates are the same as inserts >>> cf.insert(“key”, {“aaa”: 42}) >>> cf.get(“key”) {“aaa”: 42, “ccc”: 3} >>> >>> # We can insert anywhere in the row >>> cf.insert(“key”, {“bbb”: 2, “ddd”: 4}) >>> cf.get(“key”) {“aaa”: 42, “bbb”: 2, “ccc”: 3, “ddd”: 4}
  • 25. Fetching Data >>> cf.get(“key”) {“aaa”: 42, “bbb”: 2, “ccc”: 3, “ddd”: 4} >>> >>> # Get a set of columns by name >>> cf.get(“key”, columns=[“bbb”, “ddd”]) {“bbb”: 2, “ddd”: 4}
  • 26. Fetching Data >>> # Get a slice of columns >>> cf.get(“key”, column_start=”bbb”, ... column_finish=”ccc”) {“bbb”: 2, “ccc”: 3} >>> >>> # Slice from “ccc” to the end >>> cf.get(“key”, column_start=”ccc”) {“ccc”: 3, “ddd”: 4} >>> >>> # Slice from “bbb” to the beginning >>> cf.get(“key”, column_start=”bbb”, ... column_reversed=True) {“bbb”: 2, “aaa”: 42}
  • 27. Fetching Data >>> # Get the first two columns in the row >>> cf.get(“key”, column_count=2) {“aaa”: 42, “bbb”: 2} >>> >>> # Get the last two columns in the row >>> cf.get(“key”, column_reversed=True, ... column_count=2) {“ddd”: 4, “ccc”: 3}
  • 28. Fetching Multiple Rows >>> columns = {“col”: “val”} >>> cf.batch_insert({“k1”: columns, ... “k2”: columns, ... “k3”: columns}) >>> >>> # Get multiple rows by name >>> cf.multiget([“k1”,“k2”]) {“k1”: {”col”: “val”}, “k2”: {“col”: “val”}} >>> # You can get slices of each row, too >>> cf.multiget([“k1”,“k2”], column_start=”bbb”) …
  • 29. Fetching a Range of Rows >>> # Get a generator over all of the rows >>> for key, columns in cf.get_range(): ... print key, columns “k1” {”col”: “val”} “k2” {“col”: “val”} “k3” {“col”: “val”} >>> # You can get slices of each row >>> cf.get_range(column_start=”bbb”) …
  • 30. Fetching Rows by Secondary Index >>> from pycassa.index import * >>> >>> # Build up our index clause to match >>> exp = create_index_expression(“name”, “Joe”) >>> clause = create_index_clause([exp]) >>> matches = users.get_indexed_slices(clause) >>> >>> # results is a generator over matching rows >>> for key, columns in matches: ... print key, columns “13” {”name”: “Joe”, “nick”: “thatguy2”} “257” {“name”: “Joe”, “nick”: “flowers”} “98” {“name”: “Joe”, “nick”: “fr0d0”}
  • 31. Deleting Data >>> # Delete a whole row >>> cf.remove(“key1”) >>> >>> # Or selectively delete columns >>> cf.remove(“key2”, columns=[“name”, “date”])
  • 32. Connection Management  pycassa.pool.ConnectionPool – Takes a list of servers • Can be any set of nodes in your cluster – pool_size, max_retries, timeout – Automatically retries operations against other nodes • Writes are idempotent! – Individual node failures are transparent – Thread safe
  • 33. Async Options  eventlet – Just need to monkeypatch socket and threading  Twisted – Use Telephus instead of Pycassa – www.github.com/driftx/telephus – Less friendly, documented, etc
  • 34. Tyler Hobbs @tylhobbs tyler@datastax.com