EndoMine System
Jewish General Hospital

by David Lauzon
and Anton Zakharov
Big Data Montreal #9
February 5th 2013         1 / 18
Presentation

•   Our Objectives
•   Requirements and context
•   Project scope
•   Hadoop Solution
    –   Big Data Solution Overview
    –   Hive Table Schema
    –   Compression Performance
    –   Data Architecture in Hadoop
    –   Hadoop/Impala Prototype Demo
• Oracle Solution
• Hadoop vs Oracle comparison
• What are expensive queries?

                                       2 / 18
Our Objectives


• Lead a final-year capstone project in an
  industrial context
  – Requirements elicitation
  – Implement a « proof-of-concept » prototype


• Experiment with big data technologies
  – Compare with RDBMS



                                                 3 / 18
Requirements and context

• Department of Medical Diagnostic
  (medical test results DB, e.g. blood, urine, ...)
   – Dr. Shaun Eintracht
      • « ad hoc » Query
      • ETL Query
   – Dr. Elizabeth Mac Namara
      • « business intelligence » requirements
      • Realtime Dashboard

• Department of Endocrinology
   – Dr. Mark Trifiro
      • Data mining

                                                      4 / 18
Project scope


• First iteration = improve ad-hoc queries
  – Slow analytical queries and ETL (MS Access)
  – Risk of « crashing » production DB
  – Some queries impossible to process




                                                  5 / 18
Production DB (Oracle)




                         6 / 18
Solutions


• Solution 1: Hadoop + Impala

• Solution 2: Tune the existing Oracle RDBMS




                                                7 / 18
Big Data Solution Overview




                             8 / 18
Hive Table Schema




                    9 / 18
Compression Performance

[Bar chart: Impala, Hive, and Oracle compared across storage formats: Oracle FS, Text File, Sequence File, SeqFile + Gzip, SeqFile + Snappy. Y-axis from 0 to 250; units not shown.]

                                                                    10 / 18
Data Architecture in Hadoop

• All big tables are pre-joined
   – With specimen (1)
   – Without specimen (2)
• Partitioned using two schemes
   – Year-month (3)
   – Year and Test (4)
• 4 different versions of the same data:
   –   stay_order_results_yearmonth
   –   stay_order_results_year_and_test
   –   stay_order_results_specimen_yearmonth
   –   stay_order_results_specimen_year_and_test
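
A minimal sketch of how the two partitioning schemes could map a result row to a Hive-style partition directory. The column names (`result_date`, `test_code`), the function names, and the `key=value` directory layout are assumptions for illustration; the slides do not show the actual schema.

```python
from datetime import date
from typing import List


def partition_path(scheme: str, result_date: date, test_code: str) -> str:
    """Return a hypothetical Hive-style partition directory for one result row."""
    if scheme == "yearmonth":
        # One partition per calendar month.
        return f"yearmonth={result_date:%Y-%m}"
    if scheme == "year_and_test":
        # One partition per (year, test) pair.
        return f"year={result_date.year}/test={test_code}"
    raise ValueError(f"unknown scheme: {scheme}")


def partitions_for_months(start: date, end: date) -> List[str]:
    """List the year-month partitions a date-range query would have to read."""
    paths = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        paths.append(f"yearmonth={year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return paths


print(partition_path("yearmonth", date(2013, 2, 5), "GLU"))
print(partition_path("year_and_test", date(2013, 2, 5), "GLU"))
print(partitions_for_months(date(2012, 11, 15), date(2013, 1, 3)))
```

Under these assumptions, a query filtered on a date range only needs the matching year-month directories, while a query about one test across a whole year is likely better served by the year-and-test layout. That is presumably why both versions are kept.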


                                                   11 / 18
Hadoop Prototype Demo




                        12 / 18
Oracle Solution


• Same tables as source DB
  – A big pre-joined table is not a good solution
• Techniques explored:
  – Partitioning
     • Partitions automatically created
  – Compression
     • Inefficient for joins
  – Clustering
  – Join multiple partitioned tables


                                                    13 / 18
Oracle Solution (continued)


• Avoid too many indexes on the big tables:
  – Takes a lot of memory
  – Slow to create
  – May not be used if a query touches more than 5% of the
    rows
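
The 5% figure is a rule of thumb. As a rough sketch, and not a model of Oracle's actual cost-based optimizer, the decision can be pictured as a simple selectivity check. The function name and the `threshold` parameter are illustrative assumptions.

```python
def index_likely_useful(matching_rows: int, total_rows: int,
                        threshold: float = 0.05) -> bool:
    """Rule-of-thumb check: an index tends to pay off only for selective queries."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    selectivity = matching_rows / total_rows  # fraction of the table the query reads
    return selectivity <= threshold


# A query returning 0.1% of the rows versus one returning 20% of them.
print(index_likely_useful(1_000, 1_000_000))
print(index_likely_useful(200_000, 1_000_000))
```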




                                                  14 / 18
Comparison: Hadoop Solution


• Pros
  – Crunches massive amounts of data
  – Scalability
  – Free software
• Cons
  – Needs a better UI and tuning
  – Maintenance cost
  – Requires ETL time to merge data into one table
  – Big joins should be avoided

                                                    15 / 18
Comparison: Oracle Solution


• Pros
  – Just need to create a slave DB (just?)
  – Faster random lookups
  – Easier to find expertise
• Cons
  – Scales only up to a certain point
  – Synchronisation with master DB:
        • Rebuilding indexes would take hours


                                                16 / 18
What are expensive queries?


• If possible, avoid these constructs on
  large result sets
  – SELECT DISTINCT
  – ORDER BY
  – GROUP BY
  – JOIN big table with another big table
     • JOIN big table with multiple small tables should be OK
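
One way to see why these constructs are costly is to look at a query plan. The sketch below uses SQLite, through Python's standard `sqlite3` module, purely as a stand-in, because neither Oracle nor Impala is assumed to be available. The table and column names are hypothetical, and the plan wording is SQLite's, not Oracle's or Impala's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (id INTEGER PRIMARY KEY, test_code TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO results (test_code, value) VALUES (?, ?)",
    [("GLU", 5.1), ("TSH", 2.3), ("GLU", 6.0)],
)


def plan(sql: str) -> list:
    """Return the human-readable steps of SQLite's query plan for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


for sql in (
    "SELECT * FROM results WHERE id = 2",
    "SELECT DISTINCT test_code FROM results",
    "SELECT * FROM results ORDER BY value",
    "SELECT test_code, COUNT(*) FROM results GROUP BY test_code",
):
    print(sql)
    for step in plan(sql):
        print("   ", step)
```

On unindexed columns, the last three plans typically show an extra temporary sorting or de-duplication step, and that work grows with the size of the result set. The primary-key lookup needs no such step.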




                                                            17 / 18
Conclusion


• Recommendation: use a “classic” RDBMS
  – The database fits on a single node
  – Existing expertise in-house
  – Acceptable performance with appropriate
    tuning
  – Stop using MS Access
• Disadvantage: limited scalability



                                              18 / 18

BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case


Editor's Notes

  1. Choose Shaun: smaller scale, immediate need, lets us test the technology
  2. Database containing the test analysis data for patient specimens, with the results. Running analytical queries on the production database is very slow and can interfere with normal operation with [...]
  3. WILL NOT COVER: requirements elicitation
  4. 25% faster with Snappy compression (5.5x compression). Impala is 80% faster than Oracle