Building a Cassandra based
 application from scratch
        Patrick McFadin
    Cassandra Summit 2012
         #cassandra12
This is me
• Chief Architect at Hobsons
  – Hobsons is an education services company. More
    here: www.hobsons.com
• Cassandra user since 0.7
• Follow me here: @PatrickMcFadin
Goals
•   Take a new concept
•   What’s the data model?!?!
•   Some sample code
•   You get homework! (If you want)
Here’s the plan
•   Conceptualize a new application
•   Identify the entity tables
•   Identify query tables
•   Code. Rinse. Repeat.
•   Deploy
•   …
•   Profit!
           * I’ll be using the term Tables which is equivalent to Column Families
www.killrvideos.com
Start with a concept: Video Sharing Website
[Slide mockup of a video page: video title, user name, description, recommended videos, player (cat drawing, "Meow"), ads by Google, rating, tags (Foo, Bar), "Upload New!" button, comments]
*Cat drawing by goodrob13 on Flickr
Break down the features
•   Post a video*
•   View a video
•   Add a comment
•   Rate a video
•   Tag a video


     * Not talking about transcoding! Check out zencoder.com, it’s pretty sweet.
Create Entity Tables

  Basic storage unit
Users
      Username (row key) | password | FirstName | LastName




•   Similar to a RDBMS table. Fairly fixed columns
•   Username is unique
•   Use secondary indexes on firstname and lastname for lookup
•   Adding columns with Cassandra is super easy




                              CREATE TABLE users (
                                username varchar PRIMARY KEY,
                                firstname varchar,
                                lastname varchar,
                                password varchar
                              );
Users: The set code
static void setUser(User user, Keyspace keyspace) {

    // Create a mutator that allows you to talk to Cassandra
    Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

    try {

       // Use the mutator to insert data into our table
       mutator.addInsertion(user.getUsername(), "users",
          HFactory.createStringColumn("firstname", user.getFirstname()));
       mutator.addInsertion(user.getUsername(), "users",
          HFactory.createStringColumn("lastname", user.getLastname()));
       mutator.addInsertion(user.getUsername(), "users",
          HFactory.createStringColumn("password", user.getPassword()));

       // Once the mutator is ready, execute on cassandra
       mutator.execute();

    } catch (HectorException he) {
       he.printStackTrace();
    }
}




                                                            You can implement the get…
Videos
    VideoId <UUID> (row key) | VideoName | UserName | Description | Tags



•   Use a UUID as a row key for uniqueness
•   Allows for same video names
•   Tags should be stored in some sort of delimited format
•   Index on username may not be the best plan


                        CREATE TABLE videos (
                          videoid uuid PRIMARY KEY,
                          videoname varchar,
                          username varchar,
                          description varchar,
                          tags varchar
                        );
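A minimal sketch of the "delimited format" idea for tags. TagCodec is a hypothetical helper (not part of Hector or the original examples), and the comma delimiter is an assumption chosen to match the split(",") call in the get code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of Hector or the original slides.
class TagCodec {
    // Assumption: a comma separates tags inside the single "tags" column.
    static final String DELIMITER = ",";

    // Join tags into the one string that would be stored in the "tags" column.
    static String encode(List<String> tags) {
        for (String tag : tags) {
            if (tag.contains(DELIMITER)) {
                throw new IllegalArgumentException("tag must not contain the delimiter: " + tag);
            }
        }
        return String.join(DELIMITER, tags);
    }

    // Split the stored string back into individual tags.
    static List<String> decode(String stored) {
        if (stored == null || stored.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(stored.split(DELIMITER));
    }
}
```

Rejecting tags that contain the delimiter keeps encode and decode symmetric; an escaping scheme would be another option.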
Videos: The get code
static Video getVideoByUUID(UUID videoId, Keyspace keyspace){

    Video video = new Video();

    //Create a slice query. We'll be getting specific column names
    SliceQuery<UUID, String, String> sliceQuery =
       HFactory.createSliceQuery(keyspace, uuidSerializer, stringSerializer, stringSerializer);

    sliceQuery.setColumnFamily("videos");
    sliceQuery.setKey(videoId);

    sliceQuery.setColumnNames("videoname","username","description","tags");

    // Execute the query and get the list of columns
    ColumnSlice<String,String> result = sliceQuery.execute().get();

    // Get each column by name and add them to our video object
    video.setVideoName(result.getColumnByName("videoname").getValue());
    video.setUsername(result.getColumnByName("username").getValue());
    video.setDescription(result.getColumnByName("description").getValue());
    video.setTags(result.getColumnByName("tags").getValue().split(","));

    return video;
}


                                                                  You can implement the set…
Comments
     VideoId <UUID> (row key) | Username:<timestamp> | .. | Username:<timestamp>
                                (columns in time order)
•   Videos have many comments
•   Use Composite Columns to store user and time
•   Value of each column is the text of the comment
•   Order is as inserted
•   Use getSlice() to pull some or all of the comments

                                         CREATE TABLE comments (
                                           videoid uuid PRIMARY KEY,
                                           comment varchar
                                         );
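To make the wide-row idea concrete, here is an in-memory sketch that stands in for one comments row. It does not talk to Cassandra: CommentRow and its methods are hypothetical, a TreeMap plays the role of columns sorted by name, and the timestamp is placed first in the composite name (an assumption) so that a range over column names is a range over time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// In-memory stand-in for one wide row of the comments table (illustration only).
class CommentRow {
    // Column name -> comment text, kept sorted by column name.
    private final TreeMap<String, String> columns = new TreeMap<>();

    // Assumption: non-negative timestamps, zero-padded so string order equals numeric order.
    private static String columnName(long timestampMillis, String username) {
        return String.format("%020d:%s", timestampMillis, username);
    }

    // One new column per comment; the value is the comment text.
    void addComment(long timestampMillis, String username, String text) {
        columns.put(columnName(timestampMillis, username), text);
    }

    // Comment texts with fromMillis <= timestamp < toMillis, oldest first.
    List<String> slice(long fromMillis, long toMillis) {
        String start = String.format("%020d:", fromMillis);
        String end = String.format("%020d:", toMillis);
        return new ArrayList<>(columns.subMap(start, true, end, false).values());
    }
}
```

The slice method mirrors what getSlice() is used for on the slide: pass a narrow range for "some" of the comments or a wide one for all of them.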
Rating a video
             VideoId <UUID> (row key) | rating_count <counter> | rating_total <counter>




• Use counter for single call update
• rating_count is how many ratings were given
• rating_total is the sum of all ratings
• Ex: rating_count = 5, rating_total = 23, avg rating = 23/5 = 4.6


                                  CREATE TABLE video_rating (
                                    videoid uuid PRIMARY KEY,
                                    rating_count counter,
                                    rating_total counter);*

                                  * Only valid in CQL 3+
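The average-rating arithmetic can be sketched in plain Java. VideoRating is a hypothetical in-memory stand-in for one row of the table above; plain long fields are used here instead of real Cassandra counters.

```java
// In-memory stand-in for one row of video_rating (illustration only).
class VideoRating {
    private long ratingCount; // how many ratings were given
    private long ratingTotal; // sum of all ratings

    // Record one rating by incrementing both counters.
    void rate(int stars) {
        ratingCount += 1;
        ratingTotal += stars;
    }

    // Average rating, or 0.0 when nobody has rated yet.
    double average() {
        return ratingCount == 0 ? 0.0 : (double) ratingTotal / ratingCount;
    }
}
```

Five ratings of 5, 5, 5, 4 and 4 give rating_count = 5 and rating_total = 23, so the average works out to 23/5 = 4.6, as in the example above. The average itself is never stored; it is derived at read time.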
Video Event
                       start_<timestamp>   stop_<timestamp>    start_<timestamp>
    VideoId:Username
                                           video_<timestamp>


                          Time Order
•     Track viewing events
•     Combine Video ID and Username for a unique row
•     Stop time can be used to pick up where they left off
•     Great for usage analytics later




                                    CREATE TABLE video_event (
                                      videoid_username varchar PRIMARY KEY,
                                      event varchar
                                    );
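A rough sketch of the "pick up where they left off" idea, using an in-memory map rather than Cassandra. Everything here is an assumption made for illustration: the VideoEventRow class, keying events by wall-clock time in milliseconds, and storing the playback position (in seconds) with each stop event.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for one video_event row (illustration only).
class VideoEventRow {
    // Event time in millis -> event description, kept in time order.
    private final TreeMap<Long, String> events = new TreeMap<>();

    // Combine video ID and username into the unique row key.
    static String rowKey(String videoId, String username) {
        return videoId + ":" + username;
    }

    void start(long eventMillis) {
        events.put(eventMillis, "start");
    }

    // Assumption: a stop event records how far into the video the viewer got.
    void stop(long eventMillis, long positionSeconds) {
        events.put(eventMillis, "stop:" + positionSeconds);
    }

    // Position of the most recent stop event, or 0 if the viewer never stopped.
    long resumePosition() {
        for (Map.Entry<Long, String> e : events.descendingMap().entrySet()) {
            if (e.getValue().startsWith("stop:")) {
                return Long.parseLong(e.getValue().substring("stop:".length()));
            }
        }
        return 0L;
    }
}
```

Walking the events newest-first means only the latest stop matters, while the full history stays available for analytics.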
Create Query Tables

Indexes to support fast lookups
Lookup Video By Username
       Username (row key) | VideoId:<timestamp> | .. | VideoId:<timestamp>




•   Username is unique
•   One column for each new video uploaded
•   Column slice for time span. From x to y
•   VideoId is added at the same time a Video record is added




                               CREATE TABLE username_video_index (
                                 username varchar PRIMARY KEY,
                                 videoid_timestamp varchar
                               );
Videos by Tag
         tag (row key) | VideoId | .. | VideoId




•   Tag is unique regardless of video
•   Great for “List videos with X tag”
•   Tags have to be updated in Video and Tag at the same time
•   Index integrity is maintained in app logic



                                   CREATE TABLE tag_index (
                                     tag varchar PRIMARY KEY,
                                     videoid varchar
                                   );
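"Index integrity is maintained in app logic" could look something like the sketch below. It uses two in-memory maps as stand-ins for the videos and tag_index tables; TagIndexSketch and its method names are hypothetical, and a real application would issue the corresponding writes to both Cassandra tables.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// In-memory stand-in for the videos and tag_index tables (illustration only).
class TagIndexSketch {
    private final Map<UUID, Set<String>> videoTags = new HashMap<>(); // stands in for videos.tags
    private final Map<String, Set<UUID>> tagIndex = new HashMap<>();  // stands in for tag_index

    // Replace a video's tags and update the index in the same call.
    void setTags(UUID videoId, Set<String> newTags) {
        Set<String> oldTags = videoTags.getOrDefault(videoId, Collections.emptySet());
        // Remove index entries for tags the video no longer has.
        for (String tag : oldTags) {
            Set<UUID> ids = tagIndex.get(tag);
            if (ids != null && !newTags.contains(tag)) {
                ids.remove(videoId);
                if (ids.isEmpty()) {
                    tagIndex.remove(tag);
                }
            }
        }
        // Add index entries for every current tag.
        for (String tag : newTags) {
            tagIndex.computeIfAbsent(tag, t -> new LinkedHashSet<>()).add(videoId);
        }
        videoTags.put(videoId, new LinkedHashSet<>(newTags));
    }

    // "List videos with X tag".
    Set<UUID> videosWithTag(String tag) {
        return tagIndex.getOrDefault(tag, Collections.emptySet());
    }
}
```

Funnelling every tag change through one method is what keeps the two tables from drifting apart.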
Deployment strategies
• Measure your risk
  – Replication factor?
  – Multi-datacenter?
  – Cost?
• Performance
  – Today != tomorrow. Scale when needed
  – Have an expansion plan ready
Wrap up
• Similar data model process to RDBMS… to
  start
• Query -> Index table
• Don’t be afraid to write in multiple tables at
  once
• Bonus points: Hadoop and Solr!
Go play!
•   Go to: http://github.com/pmcfadin
•   Look for projects with cassandra12
•   Clone or fork my examples
•   Implement stubbed methods
•   Send me your solutions: pmcfadin@gmail.com
•   Follow me for updates: @PatrickMcFadin
Thank You!


Connect with me at @PatrickMcFadin
            Or LinkedIn
   Conference tag #cassandra12


Editor's Notes

  1. I’ll be using the term Tables instead of Column Family during this presentation.
  2. Comp Columns. Two different types
  3. Example shows use of a counter