© Health Market Science 2013, All Rights Reserved
Isaac Rieksts
Software Developer
@IsaacRieksts, irieksts@gmail.com
CROSSING THE CHASM
SQL to NOSQL
#Cassandra13
Our Mission
§ Deliver the most current information on the U.S. healthcare provider universe through integrated solutions, so that customers can:
›  Prevent fraud, waste, and abuse across the healthcare system
›  Comply with evolving state and federal regulations
›  Improve market opportunity for non-retail drugs and devices
The Business
Master Data Solutions: Health Care Providers & Facilities
Variety/Velocity
•  >2,000 sources
•  6 million unique HCPs
•  10+ years of history
Data Challenges
•  Constant change in real-world data
•  Conflicting and partial information
•  Frequent changes to source structure
•  Authoritative sources vs. crowdsourced data
•  Predicting source quality
Medical Claims Data: Medical Procedures & Diagnosis
Volume/Velocity
•  ~1B claims annually
•  5B+ records annually
•  5+ years of history
Data Challenges
•  Sources have incomplete capture
•  Overlapping source data
•  Statistical projections & biases
•  Social-media-type relationships
Business Solutions
•  Batch (CompleteView, Expense Manager, CompleteSpend)
•  Transactional (PRS/PE)
•  Big Data: Relational DB & Analytics (Claims)
Master Data Management
[Architecture diagram. Components shown:]
§ Data Sources: Government, Web, Customer
§ Distributed Processing: Standardize, Validate, Match, Consolidate, Analytics
§ Structured Storage: Relational, Indexing
§ Flexible Storage: NoSQL, Graph(s)
§ Interfacing: Web Services
§ Visualization: Dashboard / Reports
§ User Interface
Consolidation
First Name: John
Middle Name: David
Last Name: Smith
First Name: Mike
Middle Name: Steve
Last Name: Smith
First Name: Mike
Middle Name: David
Last Name: Smith
Legacy System
§ Relational DB
§ JBoss
§ JBoss MQ
§ 1 week to process a record through the system
Our Solutions
[Diagram. Layers shown:]
§ Business Needs: Finance & Legal, Business Systems, Compliance, Sales & Marketing
§ Solutions: Compliance; Data Assessment, Integration, & Outsourcing; Enrichment Services; Provider Data; Market Intelligence
§ HMS Authoritative Sources: PDC, Federal, State, Medical Claims, Web Derived
§ Advanced Technology: Storm, HMS MDM
Data Model
§ Think in terms of the full entity
§ Build the entity as you go
§ Get the full view on fetch
§ Choose the primary key (PK) carefully
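One way to read these bullets is as a write-as-you-go entity row: each write contributes columns under the same primary key, and a single fetch returns the merged view. The following is a minimal Python sketch, assuming a plain dict stands in for a column family. The `EntityStore` class, the `npi:...` key format, and the column names are hypothetical and do not come from the talk.

```python
from collections import defaultdict


class EntityStore:
    """Toy stand-in for a column family: row key -> {column name: value}."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def upsert(self, pk, columns):
        # Each write adds or overwrites columns on the same row,
        # so the entity is built up incrementally as data arrives.
        self._rows[pk].update(columns)

    def fetch(self, pk):
        # One lookup by primary key returns the full entity view.
        return dict(self._rows.get(pk, {}))


store = EntityStore()
store.upsert("npi:1234567890", {"first_name": "John", "last_name": "Smith"})
store.upsert("npi:1234567890", {"middle_name": "David"})
entity = store.fetch("npi:1234567890")
```

Because every read goes through the primary key, the choice of key decides which lookups are cheap, which is presumably why the slide stresses choosing it carefully.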
Cassandra-Indexing
§ Fast wide-row alternate key (AK) index for Cassandra
§ Two-row pull process
›  Fetch the PKs matching the AK
›  Use the PKs to fetch your data
https://github.com/hmsonline/cassandra-indexing
Cassandra-Indexing
§ Key: Col1:Col2
§ Index: Col2:Col1
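The key-to-index mapping on this slide amounts to swapping the two components of a composite key. A minimal Python sketch, assuming `:` is the separator as shown on the slide; the `index_key` function name is illustrative and is not part of the cassandra-indexing project.

```python
def index_key(row_key, sep=":"):
    """Turn a composite row key 'Col1:Col2' into the index key 'Col2:Col1'."""
    col1, col2 = row_key.split(sep, 1)
    return sep.join((col2, col1))
```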
Cassandra-Indexing Example
§ Key: <First Name>:<Last Name>
§ Index: <Last Name>:<First Name>
§ Data
›  John:Smith
›  Steve:Smith
›  David:Jones
§ Index fetch “Smith” => John:Smith, Steve:Smith
§ Index fetch “Jones” => David:Jones
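The example above, combined with the two-step pull (index first, then primary key), can be sketched in Python. Plain dicts stand in for the data and index column families. The `put` and `fetch_by_last_name` functions and the stored row layout are hypothetical names for illustration, not the API of the cassandra-indexing project.

```python
from collections import defaultdict

data = {}                  # row key "<First Name>:<Last Name>" -> row data
index = defaultdict(list)  # one wide index row per last name


def put(first, last):
    pk = f"{first}:{last}"
    data[pk] = {"first_name": first, "last_name": last}
    # The index entry is "<Last Name>:<First Name>", stored with its PK.
    entry = (f"{last}:{first}", pk)
    if entry not in index[last]:
        index[last].append(entry)


def fetch_by_last_name(last):
    # Step 1: fetch the PKs that match the alternate key.
    pks = [pk for _, pk in index.get(last, [])]
    # Step 2: use each PK to fetch the data.
    return {pk: data[pk] for pk in pks}


put("John", "Smith")
put("Steve", "Smith")
put("David", "Jones")

smiths = fetch_by_last_name("Smith")
joneses = fetch_by_last_name("Jones")
```

Here `smiths` holds the rows for `John:Smith` and `Steve:Smith`, and `joneses` holds the row for `David:Jones`, matching the index fetches on the slide.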
System Phase 1
System Phase 2
System Phase 3
Oracle Advanced Queue
§ Integrate relational DB and JMS
§ Near-real-time processing of data
›  Table trigger
§ Bulk exports
›  Keep only what you need on the queue
Oracle Advanced Queue (cont.)
§ Distributed processing
›  Write to Cassandra as of queue time
›  Write only ids and query back for data
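The "write only ids and query back for data" option can be sketched as follows. This is a Python illustration under stated assumptions: `queue.Queue` stands in for Oracle Advanced Queue, dicts stand in for the relational table and for Cassandra, and `on_row_change` plays the role of the table trigger. All names are hypothetical.

```python
import queue

relational_table = {}         # stand-in for the source table: id -> row
cassandra_rows = {}           # stand-in for the Cassandra target: id -> row
change_queue = queue.Queue()  # stand-in for the advanced queue


def on_row_change(row_id, row):
    """Plays the role of the table trigger: store the row, enqueue only its id."""
    relational_table[row_id] = row
    change_queue.put(row_id)  # keep only what is needed on the queue


def drain():
    """Consumer: dequeue ids, query back for current data, write to Cassandra."""
    while not change_queue.empty():
        row_id = change_queue.get()
        cassandra_rows[row_id] = dict(relational_table[row_id])


on_row_change(1, {"last_name": "Smith"})
on_row_change(1, {"last_name": "Smyth"})  # second change before the consumer runs
on_row_change(2, {"last_name": "Jones"})
drain()
```

With ids only on the queue, the consumer sees the row as it is when it queries back, so both messages for id 1 result in the latest value being written. Putting a snapshot of the row on the queue instead would be one way to capture the data as of queue time, at the cost of larger messages.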
Unit testing
§ Module level
›  In-memory mock
›  Map<String, Map<String, Map<String, Map<String, String>>>>
›  Map<Keyspace, Map<Column Family, Map<Column, Map<Row Key, Value>>>>
§ Integration
›  Embedded Cassandra super class
›  Schema migration
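The nested-map mock can be sketched in Python with nested dictionaries. The nesting order below follows the slide (keyspace, column family, column, row key, value). The `InMemoryStore` class, its `put`/`get` methods, and the sample keyspace and column family names are hypothetical, not the team's actual test code.

```python
from collections import defaultdict


def _nested():
    # keyspace -> column family -> column -> row key -> value
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


class InMemoryStore:
    """In-memory mock that module-level tests can use instead of a live cluster."""

    def __init__(self):
        self._data = _nested()

    def put(self, keyspace, column_family, column, row_key, value):
        self._data[keyspace][column_family][column][row_key] = value

    def get(self, keyspace, column_family, column, row_key):
        # Use .get at every level so that reads never create empty entries.
        return (self._data.get(keyspace, {})
                .get(column_family, {})
                .get(column, {})
                .get(row_key))


store = InMemoryStore()
store.put("hms", "providers", "last_name", "John:Smith", "Smith")
value = store.get("hms", "providers", "last_name", "John:Smith")
missing = store.get("hms", "providers", "last_name", "No:Body")
```

A mock like this keeps module-level tests fast and free of external dependencies, leaving the embedded Cassandra instance for the integration tests.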
QA
§ Fail fast and early
§ SoapUI and Maven
Organization Design
§ Project Manager
§ Business Analyst
§ Quality Assurance
§ Software Developer
§ Development Operations
DevOps
§ Virtual hardware (VMware)
§ Puppet
›  Puppet Master
›  Jenkins
§ Promote using config
›  Same script runs in Dev as in Prod
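Promoting by config means the deployment logic is identical across environments and only the configuration values change. A minimal Python sketch under that assumption; the environment names, hostnames, settings, and the `render_deploy_plan` function are hypothetical and do not represent the team's Puppet or Jenkins setup.

```python
# Hypothetical per-environment settings; only this data differs between stages.
ENVIRONMENTS = {
    "dev": {"cassandra_hosts": ["cass-dev-1"], "replication_factor": 1},
    "prod": {
        "cassandra_hosts": ["cass-prod-1", "cass-prod-2", "cass-prod-3"],
        "replication_factor": 3,
    },
}


def render_deploy_plan(env):
    """Same steps for every environment; only the config values differ."""
    cfg = ENVIRONMENTS[env]
    return [
        f"connect {','.join(cfg['cassandra_hosts'])}",
        f"set replication_factor={cfg['replication_factor']}",
        "run schema migration",
    ]
```

Calling `render_deploy_plan("dev")` and `render_deploy_plan("prod")` yields the same sequence of steps with different values, so a script proven in Dev is the one that runs in Prod.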
Real-time System
[Architecture diagram. Components shown: REST API; Kafka queue(s) with offsets; processing stages A, B, and C; Cassandra (C*); Elasticsearch (ES1, ES2).]
Storm
•  Guaranteed message processing (at-least-once; exactly-once semantics with Trident)
•  Well-designed processing abstraction
•  Beats BYODP
•  Momentum
Storm and Cassandra
§ Use Cases:
›  Write Storm Tuple data to C*
§  Computation Results
§  Pre-computed indices
›  Read data from C* and emit Storm Tuples
§  Dynamic Lookups
http://github.com/hmsonline/storm-cassandra
Storm-Cassandra Project
§ ColumnsMapper Interface
›  Tells the CassandraLookupBolt how to transform a C* row into a Storm Tuple
§ Given a C* Row Key and list of Columns:
›  Return a list of Storm Tuples
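The mapper contract described above (row key plus columns in, list of tuples out) can be illustrated in Python. The storm-cassandra project itself is written in Java; the function below is a hypothetical analogue for illustration only and does not reproduce the project's interface or method names.

```python
from typing import Dict, List, Tuple


def map_to_tuples(row_key: str, columns: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Emit one (row_key, column_name, value) tuple per column, sorted by name."""
    return [(row_key, name, value) for name, value in sorted(columns.items())]


tuples = map_to_tuples("John:Smith", {"last_name": "Smith", "first_name": "John"})
```

Emitting one tuple per column is only one possible mapping; a mapper could equally emit a single tuple per row, depending on what the downstream bolts expect.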
Vision
[Diagram center labels: Data, Engine]
Unstructured Data
•  Unpredictable schema/layout
•  Expand data storage structure dynamically
•  Fuzzy search
Graph Database
•  Traversing relationships
•  Building connections
•  Real-time relationship changes
Structured Data
•  Traditional database
•  Predictable, logical structure
•  Faceted search
Distributed Processing
•  Scalability
•  Performance
•  Processing power
•  Virtual grow/shrink
Summary
§ Cassandra-Indexing
§ Oracle Advanced Queue
§ Storm-Cassandra
THE SCIENCE OF
BETTER RESULTS
www.healthmarketscience.com
2700 Horizon Drive • King of Prussia, PA 19406 • 800.593.4467 • info@healthmarketscience.com
Questions?
C* Summit 2013: Crossing the Chasm - SQL to NoSQL by Isaac Rieksts

  • 1. © Health Market Science 2013, All Rights Reserved Isaac Rieksts Software Developer @IsaacRieksts, irieksts@gmail.com CROSSING THE CHASM SQL to NOSQL #Cassandra13
  • 2. Our Mission
      § Deliver the most current information on the U.S. healthcare provider universe using integrated solutions in order for customers to:
        › Prevent fraud, waste and abuse across the healthcare system
        › Comply with evolving state and federal regulations
        › Improve market opportunity for non-retail drugs and devices
  • 3. The Business
      Master Data Solutions: Health Care Provider & Facilities
        Variety/Velocity
          › >2000 sources
          › 6 million unique HCPs
          › 10+ years history
        Data Challenges
          › Constant change in real world data
          › Conflicting & partial info
          › Frequent changes to source structure
          › Authoritative sources vs. crowdsource
          › Predicting source quality
      Medical Claims Data: Medical Procedures & Diagnosis
        Volume/Velocity
          › ~1B claims annually
          › +5B records annually
          › 5+ years history
        Data Challenges
          › Sources have incomplete capture
          › Overlapping source data
          › Statistical projections & biases
          › Social media type relationships
      Business Solutions
        › Batch (CompleteView, Expense Manager, CompleteSpend)
        › Transactional (PRS/PE)
        › Big Data Relational DB & Analytics (Claims)
  • 4. Master Data Management (architecture diagram; labels as shown)
      § Visualization: Dashboard / Reports
      § Structured Storage: Relational, Indexing
      § Flexible Storage: NoSQL, Graph(s)
      § Interfacing: Web Services
      § Distributed Processing: Standardize, Validate, Match, Consolidate
      § Analytics
      § Data Sources: Government, Web, Customer
      § User Interface
      § I’m happy
  • 5. Consolidation
      § First Name: John, Middle Name: David, Last Name: Smith
      § First Name: Mike, Middle Name: Steve, Last Name: Smith
      § First Name: Mike, Middle Name: David, Last Name: Smith
  • 6. Legacy System
      § Relational DB
      § JBoss
      § JBoss MQ
      § 1 week to process a record through the system
  • 7. Our Solutions (diagram; labels as shown)
      § Business Needs: Finance & Legal, Business Systems, Compliance, Sales & Marketing
      § Solutions: Compliance; Data Assessment, Integration, & Outsourcing; Enrichment Services; Provider Data; Market Intelligence
      § HMS Authoritative Sources: PDC, Federal, State, Medical Claims, Web Derived
      § Advanced Technology: Storm, HMS MDM
  • 8. Data Model
      § Think of full entity
      § Build entity as you go
      § Get full view upon fetch
      § Choose PK carefully
  • 9. Cassandra-Indexing
      § Fast wide row alternate key for Cassandra
      § Two row pull process
        › Fetch PKs matching AK
        › Use PK to fetch your data
      https://github.com/hmsonline/cassandra-indexing
  • 10. Cassandra-Indexing
      § Key: Col1:Col2
      § Index: Col2:Col1
      https://github.com/hmsonline/cassandra-indexing
  • 11. Cassandra-Indexing Example
      § Key: <First Name>:<Last Name>
      § Index: <Last Name>:<First Name>
      § Data
        › John:Smith
        › Steve:Smith
        › David:Jones
      § Index fetch “Smith” => John:Smith, Steve:Smith
      § Index fetch “Jones” => David:Jones
      https://github.com/hmsonline/cassandra-indexing
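
The two-step pull on slides 9 to 11 can be sketched in plain Java. This is only an in-memory illustration, assuming sorted maps stand in for a Cassandra wide row; the class and method names (`WideRowIndexSketch`, `fetchPks`, `fetchByLastName`) are hypothetical and are not the cassandra-indexing project's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for a wide-row alternate-key index (hypothetical, not the cassandra-indexing API). */
final class WideRowIndexSketch {
    // Data rows keyed by the primary key "<First Name>:<Last Name>".
    private final Map<String, String> dataByPk = new TreeMap<>();
    // Sorted index entries "<Last Name>:<First Name>" -> primary key, like columns in one wide row.
    private final TreeMap<String, String> index = new TreeMap<>();

    void put(String firstName, String lastName, String value) {
        String pk = firstName + ":" + lastName;
        dataByPk.put(pk, value);
        index.put(lastName + ":" + firstName, pk);
    }

    /** Step 1: slice the sorted index for all entries that start with "<lastName>:". */
    List<String> fetchPks(String lastName) {
        String start = lastName + ":";
        String end = lastName + ";"; // ';' is the character immediately after ':'
        return new ArrayList<>(index.subMap(start, end).values());
    }

    /** Step 2: use each primary key to fetch the actual data. */
    List<String> fetchByLastName(String lastName) {
        List<String> rows = new ArrayList<>();
        for (String pk : fetchPks(lastName)) {
            rows.add(dataByPk.get(pk));
        }
        return rows;
    }

    public static void main(String[] args) {
        WideRowIndexSketch idx = new WideRowIndexSketch();
        idx.put("John", "Smith", "record for John Smith");
        idx.put("Steve", "Smith", "record for Steve Smith");
        idx.put("David", "Jones", "record for David Jones");
        System.out.println(idx.fetchPks("Smith"));
        System.out.println(idx.fetchByLastName("Jones"));
    }
}
```

Because the index entries are kept sorted by `<Last Name>:<First Name>`, a lookup by last name is a contiguous slice rather than a scan, which is the property the slides attribute to the wide-row layout.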
  • 12. System Phase 1 (diagram)
  • 13. System Phase 2 (diagram)
  • 14. System Phase 3 (diagram)
  • 15. Oracle Advanced Queue
      § Integrate relational DB and JMS
      § Near-real-time processing of data
        › Table trigger
      § Bulk exports
        › Keep only what you need on the queue
  • 16. Oracle Advanced Queue (cont)
      § Distributed processing
        › Write to Cassandra as of queue time
        › Write only ids and query back for data
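
The second option on slide 16, putting only ids on the queue and querying back for the data, might look like the sketch below. It is a toy model under stated assumptions: an `ArrayDeque` stands in for the Oracle Advanced Queue / JMS destination, and plain maps stand in for the relational table and for Cassandra. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy model of "write only ids and query back for data"; every component is an in-memory stand-in. */
final class IdOnlyQueueSketch {
    private final Map<Long, String> sourceTable = new HashMap<>(); // stand-in for the relational table
    private final Queue<Long> queue = new ArrayDeque<>();           // stand-in for the queue
    private final Map<Long, String> sink = new HashMap<>();         // stand-in for Cassandra

    /** Plays the role of the table trigger: the change is stored and only the id is enqueued. */
    void upsert(long id, String record) {
        sourceTable.put(id, record);
        queue.add(id);
    }

    /** Consumer: for each queued id, query back the current record and write it to the sink. */
    int drain() {
        int processed = 0;
        Long id;
        while ((id = queue.poll()) != null) {
            String current = sourceTable.get(id);
            if (current != null) {
                sink.put(id, current);
            }
            processed++;
        }
        return processed;
    }

    String sinkValue(long id) {
        return sink.get(id);
    }

    public static void main(String[] args) {
        IdOnlyQueueSketch sketch = new IdOnlyQueueSketch();
        sketch.upsert(1L, "version 1");
        sketch.upsert(1L, "version 2");
        System.out.println(sketch.drain() + " messages processed");
        System.out.println(sketch.sinkValue(1L));
    }
}
```

The trade-off illustrated here is that messages stay small and consumers always see the record as it is at processing time, whereas the other option on the slide writes the data as it was when it was queued.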
  • 17. Unit testing
      § Module level
        › In memory mock
        › Map<String, Map<String, Map<String, Map<String, String>>>>
        › Map<Keyspace, Map<Column Family, Map<Column, Map<Row Key, Value>>>>
      § Integration
        › Embedded Cassandra super class
        › Schema migration
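
A module-level in-memory mock built on the nested map from slide 17 could be sketched as follows. The nesting order (keyspace, column family, column, row key) follows the slide; the class name and the `write`/`read` methods are assumptions for illustration, not the speaker's actual test code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory mock: Map<Keyspace, Map<Column Family, Map<Column, Map<Row Key, Value>>>>. */
final class InMemoryStoreMock {
    private final Map<String, Map<String, Map<String, Map<String, String>>>> store = new HashMap<>();

    /** Creates the intermediate maps on demand, then stores the value. */
    void write(String keyspace, String columnFamily, String column, String rowKey, String value) {
        store.computeIfAbsent(keyspace, k -> new HashMap<>())
             .computeIfAbsent(columnFamily, cf -> new HashMap<>())
             .computeIfAbsent(column, c -> new HashMap<>())
             .put(rowKey, value);
    }

    /** Returns the stored value, or null if any level of the path is missing. */
    String read(String keyspace, String columnFamily, String column, String rowKey) {
        Map<String, Map<String, Map<String, String>>> families = store.get(keyspace);
        if (families == null) {
            return null;
        }
        Map<String, Map<String, String>> columns = families.get(columnFamily);
        if (columns == null) {
            return null;
        }
        Map<String, String> rows = columns.get(column);
        return rows == null ? null : rows.get(rowKey);
    }

    public static void main(String[] args) {
        InMemoryStoreMock mock = new InMemoryStoreMock();
        mock.write("providers", "names", "last_name", "John:Smith", "Smith");
        System.out.println(mock.read("providers", "names", "last_name", "John:Smith"));
        System.out.println(mock.read("providers", "names", "last_name", "missing"));
    }
}
```

A mock like this lets module-level tests run without a cluster, leaving the embedded Cassandra instance for the integration tests listed on the same slide.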
  • 18. QA
      § Fail fast and early
      § SoapUI and Maven
  • 19. Organization Design
      § Project Manager
      § Business Analyst
      § Quality Assurance
      § Software Developer
      § Development Operations
  • 20. Devops
      § Virtual Hardware (VMware)
      § Puppet
        › Puppet Master
        › Jenkins
      § Promote using config
        › Same script run in DEV as in Prod
  • 21. Real-time System (diagram; labels: Kafka Queue(s), Offset, C*, A, B, C, ES1, ES2, Kafka, Elastic Search, REST API)
  • 22. Storm
      § Guaranteed once semantics
      § Well-designed processing abstraction
      § Beats BYODP
      § Momentum
  • 23. Storm and Cassandra
      § Use Cases:
        › Write Storm Tuple data to C*
          § Computation Results
          § Pre-computed indices
        › Read data from C* and emit Storm Tuples
          § Dynamic Lookups
      http://github.com/hmsonline/storm-cassandra
  • 24. Storm-Cassandra Project
      § ColumnsMapper Interface
        › Tells the CassandraLookupBolt how to transform a C* row into a Storm Tuple
      § Given a C* Row Key and list of Columns:
        › Return a list of Storm Tuples
      http://github.com/hmsonline/storm-cassandra
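
The mapping described on slide 24 (row key plus columns in, list of tuples out) can be illustrated without any Storm dependency. The interface below, `RowToTuples`, is a hypothetical stand-in written for this sketch; the real ColumnsMapper interface in storm-cassandra may have a different signature, and tuples are modeled here as plain `List<Object>` values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Dependency-free illustration of a row-to-tuples mapper; not the storm-cassandra API. */
final class ColumnsMapperSketch {
    /** Hypothetical mapper shape: one C* row (key + columns) becomes zero or more tuples. */
    interface RowToTuples {
        List<List<Object>> mapToTuples(String rowKey, Map<String, String> columns);
    }

    /** Example mapping: emit one (rowKey, columnName, columnValue) tuple per column. */
    static final RowToTuples ONE_TUPLE_PER_COLUMN = (rowKey, columns) -> {
        List<List<Object>> tuples = new ArrayList<>();
        for (Map.Entry<String, String> column : columns.entrySet()) {
            tuples.add(Arrays.<Object>asList(rowKey, column.getKey(), column.getValue()));
        }
        return tuples;
    };

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("first_name", "John");
        columns.put("last_name", "Smith");
        System.out.println(ONE_TUPLE_PER_COLUMN.mapToTuples("John:Smith", columns));
    }
}
```

Keeping the row-to-tuple translation behind a small interface means the lookup bolt itself stays generic, which appears to be the design intent the slide describes.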
  • 25. Vision (quadrant diagram; center labels: Engine, Data)
      § Unstructured Data
        › Unpredictable schema/layout
        › Expand data storage structure dynamically
        › Fuzzy Search
      § Graph Database
        › Traversing relationships
        › Building connections
        › Real time relationship changes
      § Structured Data
        › Traditional data base
        › Predictable, logical structure
        › Faceted Search
      § Distributed Processing
        › Scalability
        › Performance
        › Processing power
        › Virtual grow/shrink
  • 26. Summary
      § Cassandra-Indexing
      § Oracle Advanced Queue
      § Storm-Cassandra
  • 27. Questions?
      THE SCIENCE OF BETTER RESULTS
      www.healthmarketscience.com
      2700 Horizon Drive • King of Prussia, PA 19406 • 800.593.4467 • info@healthmarketscience.com