C* Summit 2013: Crossing the Chasm - SQL to NoSQL by Isaac Rieksts
1. © Health Market Science 2013, All Rights Reserved
Isaac Rieksts
Software Developer
@IsaacRieksts, irieksts@gmail.com
CROSSING THE CHASM
SQL to NOSQL
#Cassandra13
2. Our Mission
§ Deliver the most current information on the U.S. healthcare provider universe using integrated solutions in order for customers to:
› Prevent fraud, waste and abuse across the healthcare system
› Comply with evolving state and federal regulations
› Improve market opportunity for non-retail drugs and devices
3. The Business
Master Data Solutions: Health Care Provider & Facilities
Variety/Velocity
• >2000 sources
• 6 Million unique HCPs
• 10+ years history
Data Challenges
• Constant change in real-world data
• Conflicting & partial info
• Frequent changes to source structure
• Authoritative sources vs. crowdsource
• Predicting source quality
Medical Claims Data: Medical Procedures & Diagnosis
Volume/Velocity
• ~1B claims annually
• +5B records annually
• 5+ years history
Data Challenges
• Sources have incomplete capture
• Overlapping source data
• Statistical projections & biases
• Social media type relationships
Business Solutions
• Batch (CompleteView, Expense Manager, CompleteSpend)
• Transactional (PRS/PE)
• Big Data
• Relational DB & Analytics (Claims)
4. Master Data Management
Visualization: Dashboard / Reports
Structured Storage: Relational, Indexing
Flexible Storage: NoSQL, Graph(s)
Interfacing: Web Services
Distributed Processing: Standardize, Validate, Match, Consolidate, Analytics
Data Sources: Government, Web, Customer
I’m happy
User Interface
5. Consolidation
First Name: John
Middle Name: David
Last Name: Smith
First Name: Mike
Middle Name: Steve
Last Name: Smith
First Name: Mike
Middle Name: David
Last Name: Smith
6. Legacy System
§ Relational DB
§ JBoss
§ JBoss MQ
§ 1 week to process a record through the system
7. Our Solutions
Business Needs: Finance & Legal, Business Systems, Compliance, Sales & Marketing
Solutions: Compliance; Data Assessment, Integration, & Outsourcing; Enrichment Services
Provider Data
Market Intelligence
HMS
Authoritative Sources: PDC, Federal, State, Medical Claims, Web Derived
Advanced Technology: Storm, HMS MDM
8. Data Model
§ Think of full entity
§ Build entity as you go
§ Get full view upon fetch
§ Choose PK carefully
9. Cassandra-Indexing
§ Fast wide-row alternate key (AK) index for Cassandra
§ Two-row pull process
› Fetch the primary keys (PKs) matching the AK
› Use each PK to fetch your data
https://github.com/hmsonline/cassandra-indexing
10. Cassandra-Indexing
§ Key: Col1:Col2
§ Index: Col2:Col1
https://github.com/hmsonline/cassandra-indexing
11. Cassandra-Indexing Example
§ Key: <First Name>:<Last Name>
§ Index: <Last Name>:<First Name>
§ Data
› John:Smith
› Steve:Smith
› David:Jones
§ Index fetch “Smith” => John:Smith, Steve:Smith
§ Index fetch “Jones” => David:Jones
https://github.com/hmsonline/cassandra-indexing
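The two-step lookup in the example above can be sketched with plain Java collections. This is only an illustration under stated assumptions: the class `AlternateKeyIndexSketch` and its methods are hypothetical in-memory stand-ins, not the cassandra-indexing API, and a sorted `TreeMap` plays the role of the index wide row.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for illustration; NOT the cassandra-indexing API.
class AlternateKeyIndexSketch {
    // Primary storage: row key "<First Name>:<Last Name>" -> row data.
    private final Map<String, Map<String, String>> rows = new LinkedHashMap<>();
    // Index "wide row": sorted entries "<Last Name>:<First Name>" -> primary key.
    private final TreeMap<String, String> index = new TreeMap<>();

    void put(String firstName, String lastName, Map<String, String> data) {
        String primaryKey = firstName + ":" + lastName;
        rows.put(primaryKey, data);
        index.put(lastName + ":" + firstName, primaryKey);
    }

    // Step 1: fetch the PKs whose index entry starts with the alternate key (AK).
    List<String> fetchPrimaryKeys(String lastName) {
        String from = lastName + ":"; // inclusive lower bound
        String to = lastName + ";";   // exclusive upper bound: ';' sorts right after ':'
        return new ArrayList<>(index.subMap(from, true, to, false).values());
    }

    // Step 2: use each PK to fetch the data.
    Map<String, Map<String, String>> fetchByLastName(String lastName) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (String primaryKey : fetchPrimaryKeys(lastName)) {
            result.put(primaryKey, rows.get(primaryKey));
        }
        return result;
    }

    public static void main(String[] args) {
        AlternateKeyIndexSketch store = new AlternateKeyIndexSketch();
        store.put("John", "Smith", Collections.singletonMap("lastName", "Smith"));
        store.put("Steve", "Smith", Collections.singletonMap("lastName", "Smith"));
        store.put("David", "Jones", Collections.singletonMap("lastName", "Jones"));
        System.out.println(store.fetchPrimaryKeys("Smith")); // [John:Smith, Steve:Smith]
        System.out.println(store.fetchPrimaryKeys("Jones")); // [David:Jones]
    }
}
```

Because the index entries are sorted, all keys sharing a last name sit next to each other, so step 1 is a single range read rather than a scan.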
12. System Phase 1
13. System Phase 2
14. System Phase 3
15. Oracle Advanced Queue
§ Integrate relational DB and JMS
§ Near-real-time processing of data
› Table trigger
§ Bulk exports
› Keep only what you need on the queue
16. Oracle Advanced Queue (cont)
§ Distributed processing
› Write to Cassandra as of queue time
› Write only ids and query back for data
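The "write only ids and query back for data" option can be sketched as follows. Everything here is a hypothetical in-memory stand-in (plain maps and a `BlockingQueue`); no Oracle AQ, JMS, or Cassandra API is used, and the class and method names are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical in-memory stand-ins; no Oracle AQ, JMS, or Cassandra API is used here.
class IdOnlyQueueSketch {
    private final Map<String, String> relationalTable = new HashMap<>();     // stands in for the relational DB
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(); // stands in for the queue
    private final Map<String, String> cassandra = new HashMap<>();           // stands in for C*

    // Plays the role of the table trigger: every change enqueues only the record id.
    void upsert(String id, String record) {
        relationalTable.put(id, record);
        queue.add(id);
    }

    // Consumer: take each id off the queue, query back for the current data, write it to C*.
    int drain() {
        int processed = 0;
        String id;
        while ((id = queue.poll()) != null) {
            cassandra.put(id, relationalTable.get(id));
            processed++;
        }
        return processed;
    }

    String readFromCassandra(String id) {
        return cassandra.get(id);
    }
}
```

In this sketch the messages stay small and the consumer always copies the latest state of the record. The other option on the slide, writing the data as of queue time, would instead put the record itself on the queue.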
17. Unit testing
§ Module level
› In memory mock
› Map<String, Map<String, Map<String, Map<String, String>>>>
› Map<Keyspace, Map<Column Family, Map<Column, Map<Row Key, Value>>>>
§ Integration
› Embedded Cassandra super class
› Schema migration
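A minimal sketch of the module-level in-memory mock, assuming the nesting order listed above (keyspace, column family, column, row key, value). The class name `InMemoryCassandraMock` and its `put`/`get` methods are hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical module-level mock; nesting follows the order listed on the slide.
class InMemoryCassandraMock {
    // keyspace -> column family -> column -> row key -> value
    private final Map<String, Map<String, Map<String, Map<String, String>>>> store = new HashMap<>();

    void put(String keyspace, String columnFamily, String column, String rowKey, String value) {
        store.computeIfAbsent(keyspace, k -> new HashMap<>())
             .computeIfAbsent(columnFamily, k -> new HashMap<>())
             .computeIfAbsent(column, k -> new HashMap<>())
             .put(rowKey, value);
    }

    // Returns null when any level of the path is missing.
    String get(String keyspace, String columnFamily, String column, String rowKey) {
        Map<String, Map<String, Map<String, String>>> families = store.get(keyspace);
        if (families == null) return null;
        Map<String, Map<String, String>> columns = families.get(columnFamily);
        if (columns == null) return null;
        Map<String, String> valuesByRowKey = columns.get(column);
        return valuesByRowKey == null ? null : valuesByRowKey.get(rowKey);
    }
}
```

A unit test can hand this mock to the module under test in place of a real connection, leaving the embedded Cassandra instance for the integration level.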
18. QA
§ Fail fast and early
§ SoapUI and Maven
19. Organization Design
§ Project Manager
§ Business Analyst
§ Quality Assurance
§ Software Developer
§ Development Operations
20. Devops
§ Virtual Hardware (VMware)
§ Puppet
› Puppet Master
› Jenkins
§ Promote using config
› Same script run in DEV as in Prod
21. Real-time System
[Architecture diagram. Labeled components: Kafka queue(s) and offset, C*, Elastic Search (ES1, ES2), REST API; additional labels A, B, C]
22. Storm
• Guaranteed once semantics
• Well-designed processing abstraction
• Beats BYODP
• Momentum
23. Storm and Cassandra
§ Use Cases:
› Write Storm Tuple data to C*
§ Computation Results
§ Pre-computed indices
› Read data from C* and emit Storm Tuples
§ Dynamic Lookups
http://github.com/hmsonline/storm-cassandra
24. Storm-Cassandra Project
§ ColumnsMapper Interface
› Tells the CassandraLookupBolt how to transform a C* row into a
Storm Tuple
§ Given a C* Row Key and list of Columns:
› Return a list of Storm Tuples
http://github.com/hmsonline/storm-cassandra
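The mapping contract described above (row key plus columns in, list of tuples out) can be sketched without any Storm or Cassandra dependency. The interface `RowToTuplesMapper` below is a hypothetical stand-in, not the storm-cassandra `ColumnsMapper` signature, and a "tuple" is assumed to be a plain list of values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; NOT the storm-cassandra ColumnsMapper signature.
class ColumnsMapperSketch {
    // Given a row key and its columns, return a list of tuples (each tuple is a list of values).
    interface RowToTuplesMapper {
        List<List<Object>> mapToValues(String rowKey, Map<String, String> columns);
    }

    // One possible mapping: emit one tuple per column as (rowKey, columnName, columnValue).
    static final RowToTuplesMapper ONE_TUPLE_PER_COLUMN = (rowKey, columns) -> {
        List<List<Object>> tuples = new ArrayList<>();
        for (Map.Entry<String, String> column : columns.entrySet()) {
            tuples.add(Arrays.<Object>asList(rowKey, column.getKey(), column.getValue()));
        }
        return tuples;
    };
}
```

A lookup bolt would call such a mapper once per fetched row and emit each returned tuple. Other mappings, such as one tuple per row, fit the same contract.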
25. Vision
Engine
Unstructured Data
• Unpredictable schema/layout
• Expand data storage structure dynamically
• Fuzzy Search
Graph Database
• Traversing relationships
• Building connections
• Real-time relationship changes
Structured Data
• Traditional database
• Predictable, logical structure
• Faceted Search
Distributed Processing
• Scalability
• Performance
• Processing power
• Virtual grow/shrink
Data
26. Summary
§ Cassandra-Indexing
§ Oracle Advanced Queue
§ Storm-Cassandra
27. THE SCIENCE OF BETTER RESULTS
www.healthmarketscience.com
2700 Horizon Drive • King of Prussia, PA 19406 • 800.593.4467 • info@healthmarketscience.com
Questions?