page 1 |
Evolving a 1st Generation HBase
Deployment to 2nd and Beyond
Doug Meil
Chief Software Architect
HBase Committer
HBaseCon2013
page 2 |
Company
Background
page 3 |
The Explorys Value Based Care Big Network
Comprehensive view of care including all venues of delivery, representative of all major diseases, treatments, and demographics
14 integrated delivery networks with over 200 hospitals and 100,000 providers
$46 billion in care delivered annually by our network members
24 million truly unique patients
page 4 |
The Explorys Platform
Clinical
EMRs, claims, labs, registries, reported outcomes
Operational
Provider org charts, practices, locations, departments, physical assets, and care workflow
Financial
Private / payer claims, billing, patient accounting systems
PCP | Specialist | Hospital | Post-acute | Long-term | Home | Mobile
Full view of the continuum of care & cost
Secure | Cost Effective | Ready Now
Start with Data Completeness
• Aggregation
• Patient matching
• Curation & attribution
• Data governance
• Profiling
• Risk analytics
• Prediction
Insight
page 5 |
Why HBase?
page 6 |
HBase at Explorys
Transactional Store | General Store
page 7 |
High Level HBase Usage Overview
[Diagram: Source 1, Source 2, Source 3, and Source 4 feed Extract & Load; M/R jobs perform “Late Binding” Transformation & Standardization and produce Generated Results / Indexes; consumers are the Explorys Apps (Explore, Measure, Registry, Engage), Power Search, Patient Chart, and Queries]
HBase access paths numbered in the diagram: 1 Loads (Puts) | 2 Read (Scan) | 3 Bulk-Load | 4 Multi-Get | 5 Impala
page 8 |
Functional
Examples
page 9 |
Measure Calculations
NQF 0575 Example (Simple Example, Condensed)
Initial Population
Patients aged >= 17 and <= 74 before the start of the measurement period
Denominator
2 encounters (non-acute and outpatient) and an active diagnosis of diabetes
Or
Active meds indicative of diabetes
All within 2 years or during the measurement end-date
Exclusions
Things like an active diagnosis of gestational diabetes will exclude the patient from the denominator
Numerator
Most recent HbA1c test < 8%
Measures Generated in MapReduce
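The condensed NQF 0575 rules can be read as a small decision procedure. The sketch below is only an illustration of that reading, not the production MapReduce job: the `Patient` fields, the status strings, and the 730-day lookback that stands in for "within 2 years" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Tuple


@dataclass
class Patient:
    # All field names are hypothetical; a real clinical record is far richer.
    birth_date: date
    outpatient_encounters: List[date] = field(default_factory=list)
    active_diabetes_dx: bool = False
    active_diabetes_meds: bool = False
    active_gestational_diabetes_dx: bool = False
    hba1c_results: List[Tuple[date, float]] = field(default_factory=list)


def age_on(birth: date, day: date) -> int:
    """Age in whole years on the given day."""
    had_birthday = (day.month, day.day) >= (birth.month, birth.day)
    return day.year - birth.year - (0 if had_birthday else 1)


def evaluate(patient: Patient, period_start: date, period_end: date) -> str:
    # Initial population: aged 17 to 74 at the start of the measurement period.
    if not 17 <= age_on(patient.birth_date, period_start) <= 74:
        return "not_in_population"

    # Assumed lookback: about two years (730 days) ending at the period end date.
    window_start = period_end - timedelta(days=730)
    encounters = [d for d in patient.outpatient_encounters
                  if window_start <= d <= period_end]

    # Denominator: (2 encounters AND diabetes diagnosis) OR diabetes medication.
    in_denominator = (
        (len(encounters) >= 2 and patient.active_diabetes_dx)
        or patient.active_diabetes_meds
    )
    if not in_denominator:
        return "not_in_denominator"

    # Exclusion: active gestational diabetes removes the patient.
    if patient.active_gestational_diabetes_dx:
        return "excluded"

    # Numerator: the most recent HbA1c result is below 8%.
    results = [r for r in patient.hba1c_results if r[0] <= period_end]
    if results:
        _, latest_value = max(results, key=lambda r: r[0])
        if latest_value < 8.0:
            return "numerator"
    return "denominator_only"
```

Even this simplified version needs several passes over a patient's history, which hints at why the measures are computed in batch rather than at query time.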
page 10 |
Measure Generated Data
Measure Results Generated to HBase
Results by
• Measure
• Attributed Provider
• Patient
• Reporting Window
• … generated to HBase
Lots of Generated Data
Hundreds of Measures Generate Hundreds of Millions of Measure Results Per Day
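The slides do not show how these results are keyed. One plausible layout, sketched here purely as an assumption, puts the four dimensions into the row key in a fixed order so that all results for a measure and provider sit next to each other and can be read with a prefix scan. The `result_key` and `prefix_scan` helpers, the `|` delimiter, and the sample identifiers are hypothetical; a sorted Python list stands in for HBase's lexicographic row ordering.

```python
import bisect
from typing import List


def result_key(measure_id: str, provider_id: str,
               patient_id: str, window: str) -> bytes:
    """Build a composite key: measure | provider | patient | reporting window."""
    return "|".join([measure_id, provider_id, patient_id, window]).encode("utf-8")


def prefix_scan(sorted_keys: List[bytes], prefix: bytes) -> List[bytes]:
    """Return every key starting with prefix, as a range scan over sorted keys would."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so nothing later can match
        matches.append(key)
    return matches


keys = sorted([
    result_key("NQF0575", "prov-17", "pat-001", "2013Q1"),
    result_key("NQF0575", "prov-17", "pat-002", "2013Q1"),
    result_key("NQF0575", "prov-170", "pat-003", "2013Q1"),
    result_key("NQF0059", "prov-17", "pat-001", "2013Q1"),
])

# Ending the prefix with the delimiter keeps "prov-17" from matching "prov-170".
for_provider = prefix_scan(keys, b"NQF0575|prov-17|")
```

The order of the key parts decides which questions are cheap: this ordering favors "all patients for a provider on a measure" and would make "all measures for a patient" a full scan.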
page 11 |
PowerSearch
Heart Failure Functional Example
• No evidence of Myocardial Infarction
• THEN a prescription for Angiotensin-converting enzyme (ACE) inhibitor agent
• THEN Myocardial Infarction within one year
C. Diff. Infection Functional Example
• Ambulatory Encounter
• THEN an Inpatient Encounter
• THEN evidence of C. Diff. infection within 10 days
• THEN an Ambulatory Encounter within 30 days
Summary
NoSQL works well as the backend implementation for these kinds of “queries” because satisfying them takes complex logic.
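To make the "THEN ... within N days" pattern concrete, here is a minimal sketch of an ordered-sequence matcher over one patient's events. It is not the PowerSearch implementation; the event type names, the `Step` tuple, and the `matches` function are assumptions for illustration, and negative conditions such as "no evidence of Myocardial Infarction" are left out.

```python
from datetime import date
from typing import List, Optional, Tuple

Event = Tuple[date, str]           # (day, event type)
Step = Tuple[str, Optional[int]]   # (event type, max days after previous step)


def matches(events: List[Event], steps: List[Step],
            start: int = 0, prev_day: Optional[date] = None) -> bool:
    """True if the steps occur in order in events, which must be sorted by day."""
    if not steps:
        return True
    kind, within_days = steps[0]
    for i in range(start, len(events)):
        day, event_type = events[i]
        if event_type != kind:
            continue
        too_late = (
            prev_day is not None
            and within_days is not None
            and (day - prev_day).days > within_days
        )
        if too_late:
            continue
        # Try this event for the current step; backtrack if the rest fails.
        if matches(events, steps[1:], i + 1, day):
            return True
    return False


# The C. Diff. example: each step is bounded relative to the step before it.
c_diff_steps: List[Step] = [
    ("ambulatory", None),
    ("inpatient", None),
    ("c_diff", 10),
    ("ambulatory", 30),
]

history: List[Event] = sorted([
    (date(2013, 1, 5), "ambulatory"),
    (date(2013, 2, 1), "inpatient"),
    (date(2013, 2, 6), "c_diff"),
    (date(2013, 2, 20), "ambulatory"),
])

found = matches(history, c_diff_steps)
```

Logic like this is awkward to phrase as a single SQL statement, which is the point the summary makes about running these "queries" as code against a NoSQL store.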
page 12 |
Technical
Details
page 13 |
Cluster Information
Distro
CDH4.2.1
Hadoop Knobs
• HDFS short-circuit local reads on
• HDFS drop-behind reads and read-ahead on
• Snappy for MR temp files
• Read-ahead for MR temp files
• MR heartbeat on task finish
page 14 |
Cluster Information
HBase Knobs
• We pre-split our tables
• We use KeyPrefixRegionSplitPolicy
• Snappy CF compression
• HLog compression on
• Region size still 2-3 GB (we’ve tested bigger, but staying here for now)
HBase Knobs Under Consideration
• HBase checksumming - currently off, but will probably turn on
• FAST_DIFF encoding - currently not in use, but will probably use for lookup tables
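Pre-splitting means choosing region boundaries before any data is loaded. The sketch below shows one common way to pick them, assuming row keys begin with a fixed-length, evenly distributed byte prefix (for example a hash); that assumption, along with the `split_points` and `region_for` helpers, is not from the slides. A fixed-length prefix also pairs naturally with KeyPrefixRegionSplitPolicy, which keeps rows sharing the same key prefix in one region when regions split later.

```python
import bisect
from typing import List


def split_points(num_regions: int, prefix_len: int = 2) -> List[bytes]:
    """Evenly spaced split keys over the space of prefix_len-byte prefixes."""
    space = 256 ** prefix_len
    if not 1 <= num_regions <= space:
        raise ValueError("num_regions must be between 1 and 256 ** prefix_len")
    # N regions need N - 1 boundaries; the first region starts at the empty key.
    return [
        (i * space // num_regions).to_bytes(prefix_len, "big")
        for i in range(1, num_regions)
    ]


def region_for(row_key: bytes, splits: List[bytes]) -> int:
    """Index of the region holding row_key; region start keys are inclusive."""
    return bisect.bisect_right(splits, row_key)


splits = split_points(4, prefix_len=1)   # three boundaries for four regions
first = region_for(b"\x3f-some-row", splits)
second = region_for(b"\x40-some-row", splits)
```

If the key prefix is not evenly distributed, evenly spaced boundaries will produce hot regions, so split points are better derived from a sample of real keys in that case.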
page 15 |
What Have We Changed?
Compression (HDFS and HBase)
LZO → Snappy
HBase Key Redesign
• Our initial HBase RowKeys were too beefy and too Stringy.
  • Refactored to be tighter.
• Column names were a bit too descriptive initially.
• Changes related to the new KeyPrefixRegionSplitPolicy.
HBase Table Management
We have a layer of metadata around our MR jobs and apps and re-create our tables from time to time, which makes schema changes easier.
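As a rough illustration of what "too beefy and too Stringy" can cost, the sketch below compares a verbose string row key with a fixed-width binary one. The key layout (patient id, type code, timestamp) and both helper functions are hypothetical and are not the actual Explorys schema.

```python
import struct

# Big-endian unsigned fields: 8-byte id, 2-byte type code, 8-byte timestamp.
_PACKED = struct.Struct(">QHQ")


def stringy_key(patient_id: int, event_type: str, ts_millis: int) -> bytes:
    """A verbose, human-readable key of the kind a first design often uses."""
    return f"patient:{patient_id:012d}|type:{event_type}|ts:{ts_millis}".encode("utf-8")


def packed_key(patient_id: int, type_code: int, ts_millis: int) -> bytes:
    """The same information as 18 fixed-width bytes."""
    return _PACKED.pack(patient_id, type_code, ts_millis)


def unpack_key(key: bytes):
    """Recover (patient_id, type_code, ts_millis) from a packed key."""
    return _PACKED.unpack(key)


verbose = stringy_key(123456, "encounter", 1370000000000)
tight = packed_key(123456, 7, 1370000000000)
saved_per_row = len(verbose) - len(tight)
```

Big-endian unsigned fields sort byte-wise in the same order as their numeric values, so range scans still work on the packed form. Because HBase stores the row key and the column name alongside every cell, shorter keys and shorter column names reduce storage across the whole table.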
page 16 |
What Have We Changed?
HBase Loading
• Index tables loaded with bulk-loading
• Experimented with WAL off and deferred log flushing, but bulk-loading is better.
HBase Gets
• When we started, multi-Get didn’t even exist in HBase!
• This feature was very much appreciated; our DAO layer was modified to accept batch requests.
  • Minimizing RPCs makes a difference.
SQL?
Impala against HBase for internal data investigation
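The effect of a batch-aware DAO on RPC count can be sketched without a cluster. `CountingStore` below is a hypothetical in-memory stand-in for an HBase table that simply counts calls; the method names and the batch size of 100 are assumptions, not the real client API.

```python
from typing import Dict, Iterator, List, Optional


class CountingStore:
    """In-memory stand-in for a table; each method call models one RPC."""

    def __init__(self, rows: Dict[bytes, bytes]) -> None:
        self.rows = rows
        self.rpc_calls = 0

    def get(self, key: bytes) -> Optional[bytes]:
        self.rpc_calls += 1
        return self.rows.get(key)

    def multi_get(self, keys: List[bytes]) -> List[Optional[bytes]]:
        self.rpc_calls += 1
        return [self.rows.get(k) for k in keys]


def chunked(items: List[bytes], size: int) -> Iterator[List[bytes]]:
    """Yield consecutive slices of at most size items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_one_by_one(store: CountingStore, keys: List[bytes]) -> List[Optional[bytes]]:
    return [store.get(k) for k in keys]


def fetch_batched(store: CountingStore, keys: List[bytes],
                  batch_size: int = 100) -> List[Optional[bytes]]:
    results: List[Optional[bytes]] = []
    for batch in chunked(keys, batch_size):
        results.extend(store.multi_get(batch))
    return results


rows = {f"row-{i:04d}".encode(): f"value-{i}".encode() for i in range(250)}
wanted = sorted(rows)

single = CountingStore(rows)
fetch_one_by_one(single, wanted)     # one call per key

batched = CountingStore(rows)
fetch_batched(batched, wanted)       # one call per batch of up to 100 keys
```

With a real HBase client the batched gets are grouped by region server, so the actual number of round trips depends on how the requested rows are spread across the cluster.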
page 17 |
Things We’ve Built
Data Browsers
• We’ve built our own data browser for data inspection, and continue to add to it.
• This isn’t going away any time soon and is highly used.
• Also kind of necessary if you store complex objects in HBase.
HBase Filters
• We have some.
• Didn’t initially, but they have proven quite useful.
page 18 |
Questions?
Doug Meil
Chief Software Architect
Doug.Meil@explorys.com
www.explorys.com
Thank You!

Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second Generation and Beyond

  • 1. Evolving a 1st Generation HBase Deployment to 2nd and Beyond. Doug Meil, Chief Software Architect, HBase Committer. HBaseCon 2013.
  • 3. The Explorys Value Based Care Big Network
        • Comprehensive view of care, including all venues of delivery, representative of all major diseases, treatments, and demographics
        • 14 integrated delivery networks with over 200 hospitals and 100,000 providers
        • $46 billion in care delivered annually by our network members
        • 24 million truly unique patients
  • 4. The Explorys Platform
        Clinical: EMRs, claims, labs, registries, reported outcomes
        Operational: Providers org charts, practices, locations, departments, physical assets, and care workflow
        Financial: Private / payer claims, billing, patient accounting systems
        Continuum of care: PCP, Specialist, Hospital, Post acute, Long term, Home, Mobile
        Full view of the continuum of care & cost
        Secure | Cost Effective | Ready Now
        Start with Data Completeness
          • Aggregation
          • Patient matching
          • Curation & attribution
          • Data governance
          • Profiling
          • Risk analytics
          • Prediction
        Insight
  • 5. Why HBase?
  • 6. HBase at Explorys: Transactional Store, General Store
  • 7. High Level HBase Usage Overview (diagram)
        Legend: 1 = Loads (Puts), 2 = Read (Scan), 3 = Bulk-Load, 4 = Multi-Get, 5 = Impala
        Diagram labels: Source 1 to Source 4; Extract & Load; M/R “Late Binding” Transformation & Standardization; Generated Results / Indexes; MultiGet; Queries; Power Search; Patient Chart; Explorys Apps (Explore, Measure, Registry, Engage)
  • 9. Measure Calculations: NQF 0575 Example (Simple Example, Condensed)
        Initial Population: Patients aged >= 17 and <= 74 before the start of the measurement period
        Denominator: 2 encounters (non-acute and outpatient) and an active diagnosis of diabetes, or active meds indicative of diabetes; all within 2 years or during the measurement end-date
        Exclusions: Things like an active diagnosis of gestational diabetes will exclude the patient from the denominator
        Numerator: Most recent HbA1c test < 8%
        Measures Generated in MapReduce
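As a rough illustration of the condensed measure logic on slide 9, the sketch below classifies a single patient. It is not the Explorys implementation, which the slides say runs in MapReduce. The `Patient` record, its field names, the result labels, and the 730-day approximation of "within 2 years" are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Tuple


@dataclass
class Patient:
    """Hypothetical, heavily simplified patient record (not from the slides)."""
    birth_date: date
    qualifying_encounters: List[date] = field(default_factory=list)  # non-acute, outpatient
    has_active_diabetes_dx: bool = False
    has_diabetes_meds: bool = False
    has_gestational_diabetes: bool = False
    hba1c_results: List[Tuple[date, float]] = field(default_factory=list)  # (test date, percent)


def age_on(birth: date, day: date) -> int:
    """Age in whole years on the given day."""
    years = day.year - birth.year
    if (day.month, day.day) < (birth.month, birth.day):
        years -= 1
    return years


def evaluate(p: Patient, period_start: date, period_end: date) -> str:
    # Initial population: age 17..74 at the start of the measurement period.
    if not 17 <= age_on(p.birth_date, period_start) <= 74:
        return "not_in_population"

    # Assumption: "within 2 years" is approximated as 730 days before period end.
    window_start = period_end - timedelta(days=730)
    encounters = [d for d in p.qualifying_encounters if window_start <= d <= period_end]

    # Denominator: (2 encounters AND active diagnosis) OR diabetes medication.
    in_denominator = (len(encounters) >= 2 and p.has_active_diabetes_dx) or p.has_diabetes_meds
    if not in_denominator:
        return "not_in_denominator"

    # Exclusion: gestational diabetes removes the patient from the denominator.
    if p.has_gestational_diabetes:
        return "excluded"

    # Numerator: the most recent HbA1c result is below 8%.
    tests = [t for t in p.hba1c_results if t[0] <= period_end]
    if tests and max(tests, key=lambda t: t[0])[1] < 8.0:
        return "numerator"
    return "denominator_only"
```

In a MapReduce setting, a function like `evaluate` would plausibly run once per patient and per measure, with each result written out as a row; that pairing is an inference from the slides, not something they spell out.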
  • 10. Measure Generated Data
        Measure Results Generated to HBase
          • Results by Measure, Attributed Provider, Patient, Reporting Window, … generated to HBase
        Lots of Generated Data
          • Hundreds of measures generate hundreds of millions of measure results per day
  • 11. PowerSearch
        Heart Failure Functional Example
          • No evidence of Myocardial Infarction
          • THEN a prescription for Angiotensin-converting enzyme (ACE) inhibitor agent
          • THEN Myocardial Infarction within one year
        C. Diff. Infection Functional Example
          • Ambulatory Encounter
          • THEN an Inpatient Encounter
          • THEN evidence of C. Diff. infection within 10 days
          • THEN an Ambulatory Encounter within 30 days
        Summary
          • NoSQL works well as the backend implementation for these kinds of “queries” because it takes complex logic to satisfy this result.
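The "A THEN B within N days" pattern on slide 11 can be sketched as an ordered event match with backtracking. This is only a minimal illustration over an in-memory event list. The `Event` and `Step` types, the event kind strings, and the reading of THEN as "on or after the previous step" are assumptions, and negated steps such as "No evidence of Myocardial Infarction" are not handled.

```python
from datetime import date
from typing import NamedTuple, Optional, Sequence


class Event(NamedTuple):
    kind: str   # hypothetical event type label, e.g. "inpatient"
    day: date


class Step(NamedTuple):
    kind: str
    within_days: Optional[int]  # max days after the previous step; None means unbounded


def matches(events: Sequence[Event], steps: Sequence[Step]) -> bool:
    """Return True if the events contain the steps in order, honoring the day limits."""
    ordered = sorted(events, key=lambda e: e.day)

    def search(step_idx: int, start: int, prev_day: Optional[date]) -> bool:
        if step_idx == len(steps):
            return True  # every step has been satisfied
        step = steps[step_idx]
        for i in range(start, len(ordered)):
            ev = ordered[i]
            if ev.kind != step.kind:
                continue
            if prev_day is not None and step.within_days is not None:
                if (ev.day - prev_day).days > step.within_days:
                    break  # events are sorted, so later ones are even further away
            # Try this candidate; backtrack to a later one if the rest fails.
            if search(step_idx + 1, i + 1, ev.day):
                return True
        return False

    return search(0, 0, None)


# The C. Diff. example from the slide, expressed as steps.
CDIFF_PATTERN = [
    Step("ambulatory", None),
    Step("inpatient", None),
    Step("cdiff", 10),
    Step("ambulatory", 30),
]
```

Backtracking matters here because the first inpatient encounter in a history is not always the one that precedes the infection; a greedy match on the earliest candidate would miss valid sequences.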
  • 13. Cluster Information
        Distro: CDH4.2.1
        Hadoop Knobs
          • HDFS local read shortcut on
          • HDFS drop-behind reads and read-ahead on
          • Snappy for MR temp files
          • Read-ahead for MR temp files
          • MR heartbeat on task finish
  • 14. Cluster Information
        HBase Knobs
          • We pre-split our tables
          • We use KeyPrefixRegionSplitPolicy
          • Snappy CF compression
          • HLog compression on
          • Region size still 2-3 GB (we’ve tested bigger, but are staying here for now)
        HBase Knobs Under Consideration
          • HBase checksumming: currently off, but will probably turn on
          • FAST_DIFF encoding: currently not in use, but will probably use for lookup tables
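To make the pre-splitting and prefix-based split policy on slide 14 more concrete, here is a small model in plain Python. It does not call HBase. It assumes row keys start with a fixed-length binary prefix that is spread evenly over the key space, which the slides do not state, and every function name is made up for this sketch. The truncation step mirrors the general idea of KeyPrefixRegionSplitPolicy: a split point is cut back to the prefix so that rows sharing a prefix stay in one region.

```python
from bisect import bisect_right
from typing import List


def presplit_keys(num_regions: int, prefix_len: int = 2) -> List[bytes]:
    """Evenly spaced split keys over a prefix_len-byte prefix space (assumed uniform)."""
    space = 256 ** prefix_len
    return [
        (space * i // num_regions).to_bytes(prefix_len, "big")
        for i in range(1, num_regions)
    ]


def prefix_aligned_split(candidate: bytes, prefix_len: int) -> bytes:
    """Truncate a candidate split point to the prefix, so no prefix is split in two."""
    return candidate[:prefix_len]


def region_for(split_keys: List[bytes], row_key: bytes) -> int:
    """Index of the region holding row_key; region i starts at split_keys[i - 1]."""
    return bisect_right(split_keys, row_key)
```

With prefix-length split keys, any two row keys that share a prefix compare the same way against every split key, so they always land in the same region.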
  • 15. What Have We Changed?
        Compression (HDFS and HBase)
          • LZO → Snappy
        HBase Key Redesign
          • Our initial HBase RowKeys were too beefy and too Stringy; refactored to be tighter.
          • Column names were a bit too descriptive initially.
          • Changes related to the new KeyPrefixRegionSplitPolicy.
        HBase Table Management
          • We have a layer of metadata around our MR jobs and apps and re-create our tables from time to time, which makes schema changes easier.
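The slides do not show the old or new key layouts, so the following is only a hypothetical illustration of what "too beefy and too Stringy" versus "tighter" can mean. The field names (measure, provider, patient) and their widths are assumptions chosen for the example, not the Explorys schema.

```python
import struct


def stringy_key(measure_id: int, provider_id: int, patient_id: int) -> bytes:
    """A verbose, human-readable key: long, and numbers sort as text."""
    return f"measure={measure_id}|provider={provider_id}|patient={patient_id}".encode("utf-8")


def packed_key(measure_id: int, provider_id: int, patient_id: int) -> bytes:
    """A fixed-width binary key: 2 + 4 + 8 bytes, big-endian and unsigned,
    so byte-wise ordering matches numeric ordering field by field."""
    return struct.pack(">HIQ", measure_id, provider_id, patient_id)


def unpack_key(key: bytes) -> tuple:
    """Recover (measure_id, provider_id, patient_id) from a packed key."""
    return struct.unpack(">HIQ", key)
```

Because HBase stores the row key with every cell, shaving bytes off the key (and off column names) reduces storage and I/O across the whole table. The packed form also avoids the text-sort surprise where patient 10 sorts before patient 9.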
  • 16. What Have We Changed?
        HBase Loading
          • Index tables loaded with bulk-loading
          • Experimented with WAL off and deferred log flushing, but bulk-loading is better.
        HBase Gets
          • When we started, multi-Get didn’t even exist in HBase!
          • This feature was very much appreciated; our DAO layer was modified to accept batch requests. Minimizing RPCs makes a difference.
        SQL?
          • Impala against HBase for internal data investigation
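The point about batch requests on slide 16 can be shown with a toy DAO. `FakeTable` is an in-memory stand-in that only counts calls, and `BatchingDao` and its methods are hypothetical names. A real HBase client groups a multi-Get by region server, so one batch may still cost more than one RPC.

```python
from typing import Dict, Iterator, List, Optional


class FakeTable:
    """In-memory stand-in for a table client; counts simulated round trips."""

    def __init__(self, rows: Dict[bytes, bytes]) -> None:
        self.rows = rows
        self.rpc_count = 0

    def get(self, key: bytes) -> Optional[bytes]:
        self.rpc_count += 1  # one round trip per single Get
        return self.rows.get(key)

    def multi_get(self, keys: List[bytes]) -> List[Optional[bytes]]:
        self.rpc_count += 1  # simplification: one round trip per batch
        return [self.rows.get(k) for k in keys]


def chunked(keys: List[bytes], size: int) -> Iterator[List[bytes]]:
    """Yield consecutive slices of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


class BatchingDao:
    """Hypothetical DAO that accepts many keys and issues batched reads."""

    def __init__(self, table: FakeTable, batch_size: int = 100) -> None:
        self.table = table
        self.batch_size = batch_size

    def fetch_many(self, keys: List[bytes]) -> Dict[bytes, bytes]:
        found: Dict[bytes, bytes] = {}
        for batch in chunked(keys, self.batch_size):
            for key, value in zip(batch, self.table.multi_get(batch)):
                if value is not None:  # skip rows that do not exist
                    found[key] = value
        return found
```

For 250 keys and a batch size of 100, this model makes 3 simulated round trips instead of 250, which is the kind of saving the slide refers to.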
  • 17. Things We’ve Built
        Data Browsers
          • We’ve built our own data browser for data inspection, and continue to add to it.
          • This isn’t going away any time soon and is highly used.
          • Also kind of necessary if you store complex objects in HBase.
        HBase Filters
          • We have some.
          • Didn’t initially, but they have proven quite useful.
  • 18. Questions? Thank You! Doug Meil, Chief Software Architect, Doug.Meil@explorys.com, www.explorys.com