3. “Writing is one of the most effective tools
available to develop a student's critical thinking.”
Why A Writing Space?
4. • Efficient Administration Of Writing Assignments
• Scalable Classrooms (500+)
• Workflow Optimization / Automation
• Integrated Access to Assessment Tools
o Grammar Checking
o Auto-Scoring
o Plagiarism Detection (Source Check)
• Grading Rubrics
• Online Editing and Document Upload
• Peer Review
• Group Projects
The Business Needs
5. • Highly "Internet" Scalable
• Global Presence
• Continuous Availability (Fault Tolerance)
• Broad OS And Browser Support
• Mobile Device Support - "Mobile First"
• Low Cost (Systems, Maintenance, Integration)
• Write Once, Integrate “Anywhere”
• Gain Experience With Modern NoSQL Technologies
• REST Service-Based Architecture
• Model UI
The Technical Goals
9. • Highly Scalable
• Easy Multi-Data Center Support
• Performance
• Distributed Ring Configuration (Master-less)
• Dynamic Schema, “Schema-less”
• Slice Queries
What We Like
10. • Eventual / Tunable Consistency
• Key-Name-Value Data Store (Column Based)
• Data Modeling Based On Core Queries
• Rows In A Column Family (CF) Are Typically Spread Across Servers
• However, All Columns For A Single Row Live On One Server
• RDBMS Mindset
• No Ad Hoc Queries
What Challenged Us
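The key-name-value model and slice queries mentioned above can be pictured with a small in-memory sketch. This is only an illustration of the idea, not Cassandra's API or the Writing Space schema: the `WideRow` and `ColumnFamily` classes, the row key, and the column names are all hypothetical.

```python
from bisect import bisect_left, bisect_right, insort


class WideRow:
    """One row whose columns are kept sorted by column name."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def put(self, name, value):
        # Insert a new column in sorted position, or overwrite an existing one.
        if name not in self._values:
            insort(self._names, name)
        self._values[name] = value

    def slice(self, start, finish):
        """Return (name, value) pairs with start <= name <= finish."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


class ColumnFamily:
    """Maps a row key to a WideRow: key -> (column name -> value)."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, column_name, value):
        self._rows.setdefault(row_key, WideRow()).put(column_name, value)

    def slice(self, row_key, start, finish):
        row = self._rows.get(row_key)
        return row.slice(start, finish) if row else []


# Hypothetical data: one row per essay, one column per revision.
drafts = ColumnFamily()
drafts.put("essay-1", "rev-001", "first draft")
drafts.put("essay-1", "rev-003", "final draft")
drafts.put("essay-1", "rev-002", "second draft")

early_revisions = drafts.slice("essay-1", "rev-001", "rev-002")
```

Because every column of a row sits together and stays sorted by name, a slice is a cheap range read within a single row. That is why the data model is designed around the core queries up front rather than around normalized tables.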
11. What Is Consistency?
• Write Consistency: Number Of Replicas That Must Acknowledge A Write
• Read Consistency: Number Of Replicas That Must Respond To A Read
• Replication Factor: Number Of Copies Of Each Row Stored In The Cluster
• Quorum Consistency Level (Read And Write):
o One Option When Specifying Read/Write Consistency
o Quorum = (Replication_Factor / 2) + 1, Rounded Down
o Ensures Strong Consistency When Both Reads And Writes Use Quorum
o While Maintaining High Availability
• With 4 Servers, Writing Space Uses:
o Replication Factor = 3
o Read and Write Quorum Consistency
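The quorum arithmetic above can be worked through for the replication factor of 3 quoted on this slide. The sketch below is plain Python, not a Cassandra driver call, and the function names are hypothetical. It relies on the general rule that reads see the latest write when read replicas plus write replicas exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set overlaps every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas of a row that can be down while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


rf = 3                       # replication factor from the slide
q = quorum(rf)               # replicas needed for a quorum read or write
strong = is_strongly_consistent(q, q, rf)
spare = tolerated_replica_failures(rf)
print(q, strong, spare)      # prints: 2 True 1
```

With a replication factor of 3, a quorum is 2 replicas, so quorum reads and writes always overlap on at least one replica, and one replica of any row can be unavailable without failing requests.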
12. Typical RDBMS Features Not Available (Yet):
• Referential Integrity Constraints / Foreign Keys
• Commit / Rollback
• Stored Procedures
• Joins
• Views
• Triggers
• Functions
• Security Privileges
• Rules
• Partitioned Table Definitions
What's Not In Cassandra...
17. The Hardware
• Many Inexpensive Servers (Actually 4 + 1)
• Our Configuration:
o Processor: Xeon E5630, 2.53 GHz, 4 Cores
o Memory: 96 GB
o OS / Binaries Storage: Two Mirrored Spinning Disks
o DB Storage: Three Striped 480 GB Solid State Drives (Providing 1.3 TB Local DB Storage)
• Peer to Peer Ring
• Hot Swappable - Fault Tolerant
• "What's Your Insurance Company?"
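As a rough back-of-the-envelope check, the storage figures above can be combined with the replication factor of 3 from the consistency slide. The sketch assumes 4 ring nodes with identical drives. It ignores compaction headroom, file system overhead, and snapshots, so the real usable space would be lower.

```python
DRIVES_PER_NODE = 3        # striped SSDs per server (from the slide)
DRIVE_SIZE_GB = 480        # decimal gigabytes per SSD (from the slide)
RING_NODES = 4             # servers in the ring (assumed: the "4" of "4 + 1")
REPLICATION_FACTOR = 3     # from the consistency slide

# Striping adds the drive capacities together on each node.
raw_per_node_gb = DRIVES_PER_NODE * DRIVE_SIZE_GB

# The same capacity expressed in binary units (TiB).
raw_per_node_tib = raw_per_node_gb * 10**9 / 2**40

# Every row is stored REPLICATION_FACTOR times across the ring.
cluster_raw_gb = RING_NODES * raw_per_node_gb
unique_data_gb = cluster_raw_gb / REPLICATION_FACTOR

print(f"per node: {raw_per_node_gb} GB ({raw_per_node_tib:.2f} TiB)")
print(f"cluster raw: {cluster_raw_gb} GB, unique data: {unique_data_gb:.0f} GB")
```

Three striped 480 GB drives give 1,440 GB per node, which is about 1.31 TiB and agrees with the quoted 1.3 TB if that figure is in binary units (an assumption). Under the same assumptions, the 4-node ring holds at most about 1,920 GB of unique data before overhead.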
18. Why DataStax Cassandra?
• A Certified, Production Ready Version Of Cassandra
• 24/7 World Class Support
• Integration With Hadoop
• Integration With Solr
• OpsCenter (Multi-Data Center Management Tool)
19. • Doc Store and UI
• Load: 3x Anticipated Load
• Total Time Of Run: 1.75 hours
• Max Document Size: 10k (25k, 50k and 75k DS)
Results
Average Response Time: < 300ms
Maximum Running Vusers: 684
Total Throughput (bytes): 7,176,727,121
Average Throughput (bytes/sec): 1,993,535
Total Hits: 342,833
Average Hits per Second: 95
DB Server CPU: < 0.3%
Performance
20. • Document Store only
• Load: 100x Anticipated Load
• Total Time Of Run: 1 hour
• Document Size: 25k, 50k and 75k
Results
Average Response Time: < 100ms
Maximum Running Vusers: 2,200
Total Throughput (bytes): 2,291,522,553
Average Throughput (bytes/sec): 565,808
Total Hits: 834,640
Average Hits per Second: 206
DB Server CPU: < 1%
Performance
22. Cloud Decision Points
• Cost Savings
• Continuous Availability
• Performance / Dynamic (Elastic) Scalability
• Global Distribution Of Access Points
• Redundancy
• Disaster Recovery
• Resiliency To Node / Connectivity Losses A Must
23. • Think About Reporting Up Front
• Data Analytics – Hadoop and Solr Are Heavy Duty
• More Expensive Hardware?
• Different RAID Configuration (Not Striping)
• Get Training – Especially About Schema Design
What Would We Do Differently?
24. Consider The Human Element...
• Mind Shift For RDBMS Folks
• Need To “Let Go” Of The Idea That Data Must Be Normalized
• Experience Of Operations Team
• Netflix - 4 People Managing 800+ Nodes
Global Enterprise
• Global Presence
• Disaster Recovery
• Internet Scale
Final Thoughts...