SlideShare uma empresa Scribd logo
1 de 20
Big Data Schema Design

               Deepak
Overview
•   Schema design is vital for performance.
•   Keywords : Non-relational, NOSQL, Distributed
•   Underlying File system : GFS, HDFS
•   Examples : Hadoop, GFS, Hbase, Big Tables etc
•   Example implementations : Facebook, Wallmart
    etc.
When to use
• Typically with systems having >=100’s of
  millions/billions rows
• Records of the order of 100’s or 1000’s of
  TB’s
• No advanced Query Language needed
• Typed columns or other RDBMS features not
  needed
Hadoop Architecture
Hadoop Ecosystem
HBase Architecture
Overview
• HBase runs on top of HDFS
• HDFS was chosen because of its fault tolerance,
  check summing, failover properties
• Java Native client or REST API
• Manager manages cluster, Region Servers
  manages data
HBase Data Model
• Table: design-time namespace, has many rows.
• Row: atomic key/value container, with one row
  key
• Column Family: divide columns into physical files
• Column: a key in the k/v container inside a row
• Timestamp: long milliseconds, sorted descending
• Value: a time-versioned value in the k/v container
Distribution
More distribution
Thoughts on the logical view
• Unit of scalability is Region.
• The rows are not tied to a server. They maybe
  moved around for load balancing.
• Add nodes so that we do not have too many
  regions per node
• Too many regions per node will work against
  distribution
Column Family
• Each Column Family represents a Physical storage
  unit ( A Directory)
• Data that are queried together should be stored
  together.
• Features such as compression can be enabled per
  Column Family
Bloom Filter
• Generated automatically when an HFile is
  flushed to disk
• Available in primary memory
• Contains Row keys
• CK can be stored as part of RK, but that
  might overload the memory.
• Can filter based on what is stored.
Physical View
Key Cardinality
Tall vs Fat Tables
• Fat tables with large amounts of data in each
  column.
• Tall tables with large amounts of rows.
• Tall is good for search or scans
• Fat is good for fetches or gets
• Rows don’t split
• Atomicity is only at row level, having compound
  keys, atomicity is not guaranteed
Key Design
• Sequential keys : Example timestamp as key
• With Sequential keys you keep hot spotting on a
  region.
• Salting to distribute the records
• Field promotion
• Random keys
Key Design Performance
Summary
• Think twice before you decide on NOSQL
  technologies
• Avoid hotspots
• Store values at appropriate places
• Choose the right keys
• Store inferences into RDBMS if necessary
Visit us:

   Facebook: http://www.facebook.com/QBurst
        Twitter: http://twitter.com/qburst
 Google+: https://plus.google.com/+qburst/posts
LinkedIn: http://www.linkedin.com/company/qburst
YouTube: http://www.youtube.com/QBurstVideos


                www.qburst.com

Mais conteúdo relacionado

Mais procurados

Supercharge your RDBMS with Elasticsearch
Supercharge your RDBMS with ElasticsearchSupercharge your RDBMS with Elasticsearch
Supercharge your RDBMS with ElasticsearchArthur Gimpel
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon
 
Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018Josh Carlisle
 
Indexing with solr search server and hadoop framework
Indexing with solr search server and hadoop frameworkIndexing with solr search server and hadoop framework
Indexing with solr search server and hadoop frameworkkeval dalasaniya
 
Hadoop Infrastructure (Oct. 3rd, 2012)
Hadoop Infrastructure (Oct. 3rd, 2012)Hadoop Infrastructure (Oct. 3rd, 2012)
Hadoop Infrastructure (Oct. 3rd, 2012)John Dougherty
 
Road to cloud-iaas
Road to cloud-iaasRoad to cloud-iaas
Road to cloud-iaasHatem Al Sum
 
MySQL Storage Engines
MySQL Storage EnginesMySQL Storage Engines
MySQL Storage EnginesKarthik .P.R
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersNiko Neugebauer
 
Share point 2013 on azure
Share point 2013 on azureShare point 2013 on azure
Share point 2013 on azurePrabath Fonseka
 
Using flash on the server side
Using flash on the server sideUsing flash on the server side
Using flash on the server sideHoward Marks
 
Hive big-data meetup
Hive big-data meetupHive big-data meetup
Hive big-data meetupRemus Rusanu
 
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711Dave Anselmi
 
Short introduction to Redis
Short introduction to RedisShort introduction to Redis
Short introduction to RedisJimmyZoger
 
Building your data warehouse with Redshift
Building your data warehouse with RedshiftBuilding your data warehouse with Redshift
Building your data warehouse with RedshiftAmazon Web Services
 
Barcamp Macau 2014 - Introduction to AWS
Barcamp Macau 2014 - Introduction to AWSBarcamp Macau 2014 - Introduction to AWS
Barcamp Macau 2014 - Introduction to AWSWong Hoi Sing Edison
 
Keynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseKeynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseHBaseCon
 
CosmosDb for beginners
CosmosDb for beginnersCosmosDb for beginners
CosmosDb for beginnersPhil Pursglove
 

Mais procurados (20)

Supercharge your RDBMS with Elasticsearch
Supercharge your RDBMS with ElasticsearchSupercharge your RDBMS with Elasticsearch
Supercharge your RDBMS with Elasticsearch
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
 
Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018
 
Indexing with solr search server and hadoop framework
Indexing with solr search server and hadoop frameworkIndexing with solr search server and hadoop framework
Indexing with solr search server and hadoop framework
 
HBase
HBaseHBase
HBase
 
Hadoop Infrastructure (Oct. 3rd, 2012)
Hadoop Infrastructure (Oct. 3rd, 2012)Hadoop Infrastructure (Oct. 3rd, 2012)
Hadoop Infrastructure (Oct. 3rd, 2012)
 
Road to cloud-iaas
Road to cloud-iaasRoad to cloud-iaas
Road to cloud-iaas
 
MySQL Storage Engines
MySQL Storage EnginesMySQL Storage Engines
MySQL Storage Engines
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & Developers
 
Share point 2013 on azure
Share point 2013 on azureShare point 2013 on azure
Share point 2013 on azure
 
Using flash on the server side
Using flash on the server sideUsing flash on the server side
Using flash on the server side
 
Postgres Open
Postgres OpenPostgres Open
Postgres Open
 
Hive big-data meetup
Hive big-data meetupHive big-data meetup
Hive big-data meetup
 
Storage for VDI
Storage for VDIStorage for VDI
Storage for VDI
 
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
 
Short introduction to Redis
Short introduction to RedisShort introduction to Redis
Short introduction to Redis
 
Building your data warehouse with Redshift
Building your data warehouse with RedshiftBuilding your data warehouse with Redshift
Building your data warehouse with Redshift
 
Barcamp Macau 2014 - Introduction to AWS
Barcamp Macau 2014 - Introduction to AWSBarcamp Macau 2014 - Introduction to AWS
Barcamp Macau 2014 - Introduction to AWS
 
Keynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseKeynote: The Future of Apache HBase
Keynote: The Future of Apache HBase
 
CosmosDb for beginners
CosmosDb for beginnersCosmosDb for beginners
CosmosDb for beginners
 

Semelhante a Schema Design

HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practicelarsgeorge
 
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012larsgeorge
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxRahul Borate
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxRahul Borate
 
NoSql - mayank singh
NoSql - mayank singhNoSql - mayank singh
NoSql - mayank singhMayank Singh
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Comparative study of modern databases
Comparative study of modern databasesComparative study of modern databases
Comparative study of modern databasesAnirban Konar
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars GeorgeJAX London
 
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"Inhacking
 
Cassandra an overview
Cassandra an overviewCassandra an overview
Cassandra an overviewPritamKathar
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars GeorgeJAX London
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics PlatformN Masahiro
 
Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012Chris Huang
 

Semelhante a Schema Design (20)

HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practice
 
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
 
NoSql - mayank singh
NoSql - mayank singhNoSql - mayank singh
NoSql - mayank singh
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Apache HBase Workshop
Apache HBase WorkshopApache HBase Workshop
Apache HBase Workshop
 
Apache hive
Apache hiveApache hive
Apache hive
 
Comparative study of modern databases
Comparative study of modern databasesComparative study of modern databases
Comparative study of modern databases
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars George
 
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
 
Cassandra an overview
Cassandra an overviewCassandra an overview
Cassandra an overview
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars George
 
L6.sp17.pptx
L6.sp17.pptxL6.sp17.pptx
L6.sp17.pptx
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012
 
NoSql
NoSqlNoSql
NoSql
 
Database Technologies
Database TechnologiesDatabase Technologies
Database Technologies
 

Mais de QBurst

Frontend Optimization - Tips for Improving the Performance of Single Page App...
Frontend Optimization - Tips for Improving the Performance of Single Page App...Frontend Optimization - Tips for Improving the Performance of Single Page App...
Frontend Optimization - Tips for Improving the Performance of Single Page App...QBurst
 
Best Practices for Building Cloud-Native Apps
Best Practices for Building Cloud-Native AppsBest Practices for Building Cloud-Native Apps
Best Practices for Building Cloud-Native AppsQBurst
 
Project Tracking Application
Project Tracking ApplicationProject Tracking Application
Project Tracking ApplicationQBurst
 
DevOps Transformation: Learnings and Best Practices
DevOps Transformation: Learnings and Best PracticesDevOps Transformation: Learnings and Best Practices
DevOps Transformation: Learnings and Best PracticesQBurst
 
Cloud Migration Strategy and Best Practices
Cloud Migration Strategy and Best PracticesCloud Migration Strategy and Best Practices
Cloud Migration Strategy and Best PracticesQBurst
 
Implementing AMP on WP Blog
Implementing AMP on WP Blog Implementing AMP on WP Blog
Implementing AMP on WP Blog QBurst
 
HTTPS Impact on SEO
HTTPS Impact on SEOHTTPS Impact on SEO
HTTPS Impact on SEOQBurst
 
How to Secure Your WordPress Site
How to Secure Your WordPress SiteHow to Secure Your WordPress Site
How to Secure Your WordPress SiteQBurst
 
QBurst Big Data Expertise - Infographic
 QBurst Big Data Expertise - Infographic  QBurst Big Data Expertise - Infographic
QBurst Big Data Expertise - Infographic QBurst
 

Mais de QBurst (9)

Frontend Optimization - Tips for Improving the Performance of Single Page App...
Frontend Optimization - Tips for Improving the Performance of Single Page App...Frontend Optimization - Tips for Improving the Performance of Single Page App...
Frontend Optimization - Tips for Improving the Performance of Single Page App...
 
Best Practices for Building Cloud-Native Apps
Best Practices for Building Cloud-Native AppsBest Practices for Building Cloud-Native Apps
Best Practices for Building Cloud-Native Apps
 
Project Tracking Application
Project Tracking ApplicationProject Tracking Application
Project Tracking Application
 
DevOps Transformation: Learnings and Best Practices
DevOps Transformation: Learnings and Best PracticesDevOps Transformation: Learnings and Best Practices
DevOps Transformation: Learnings and Best Practices
 
Cloud Migration Strategy and Best Practices
Cloud Migration Strategy and Best PracticesCloud Migration Strategy and Best Practices
Cloud Migration Strategy and Best Practices
 
Implementing AMP on WP Blog
Implementing AMP on WP Blog Implementing AMP on WP Blog
Implementing AMP on WP Blog
 
HTTPS Impact on SEO
HTTPS Impact on SEOHTTPS Impact on SEO
HTTPS Impact on SEO
 
How to Secure Your WordPress Site
How to Secure Your WordPress SiteHow to Secure Your WordPress Site
How to Secure Your WordPress Site
 
QBurst Big Data Expertise - Infographic
 QBurst Big Data Expertise - Infographic  QBurst Big Data Expertise - Infographic
QBurst Big Data Expertise - Infographic
 

Último

MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?Olivia Kresic
 
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCRashishs7044
 
Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.Anamaria Contreras
 
Market Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 EditionMarket Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 EditionMintel Group
 
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607dollysharma2066
 
Memorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQMMemorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQMVoces Mineras
 
Ms Motilal Padampat Sugar Mills vs. State of Uttar Pradesh & Ors. - A Milesto...
Ms Motilal Padampat Sugar Mills vs. State of Uttar Pradesh & Ors. - A Milesto...Ms Motilal Padampat Sugar Mills vs. State of Uttar Pradesh & Ors. - A Milesto...
Ms Motilal Padampat Sugar Mills vs. State of Uttar Pradesh & Ors. - A Milesto...ShrutiBose4
 
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCRashishs7044
 
Kenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby AfricaKenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby Africaictsugar
 
Marketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent ChirchirMarketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent Chirchirictsugar
 
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdfNewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdfKhaled Al Awadi
 
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort ServiceCall US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Servicecallgirls2057
 
Islamabad Escorts | Call 03070433345 | Escort Service in Islamabad
Islamabad Escorts | Call 03070433345 | Escort Service in IslamabadIslamabad Escorts | Call 03070433345 | Escort Service in Islamabad
Islamabad Escorts | Call 03070433345 | Escort Service in IslamabadAyesha Khan
 
India Consumer 2024 Redacted Sample Report
India Consumer 2024 Redacted Sample ReportIndia Consumer 2024 Redacted Sample Report
India Consumer 2024 Redacted Sample ReportMintel Group
 
Investment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy CheruiyotInvestment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy Cheruiyotictsugar
 
Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Kirill Klimov
 
Case study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detailCase study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detailAriel592675
 
Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Seta Wicaksana
 

Último (20)

MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?
 
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
 
Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.
 
Market Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 EditionMarket Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 Edition
 
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
 
Memorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQMMemorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQM
 
Ms Motilal Padampat Sugar Mills vs. State of Uttar Pradesh & Ors. - A Milesto...
Ms Motilal Padampat Sugar Mills vs. State of Uttar Pradesh & Ors. - A Milesto...Ms Motilal Padampat Sugar Mills vs. State of Uttar Pradesh & Ors. - A Milesto...
Ms Motilal Padampat Sugar Mills vs. State of Uttar Pradesh & Ors. - A Milesto...
 
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
 
Kenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby AfricaKenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby Africa
 
Japan IT Week 2024 Brochure by 47Billion (English)
Japan IT Week 2024 Brochure by 47Billion (English)Japan IT Week 2024 Brochure by 47Billion (English)
Japan IT Week 2024 Brochure by 47Billion (English)
 
Marketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent ChirchirMarketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent Chirchir
 
No-1 Call Girls In Goa 93193 VIP 73153 Escort service In North Goa Panaji, Ca...
No-1 Call Girls In Goa 93193 VIP 73153 Escort service In North Goa Panaji, Ca...No-1 Call Girls In Goa 93193 VIP 73153 Escort service In North Goa Panaji, Ca...
No-1 Call Girls In Goa 93193 VIP 73153 Escort service In North Goa Panaji, Ca...
 
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdfNewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
 
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort ServiceCall US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
 
Islamabad Escorts | Call 03070433345 | Escort Service in Islamabad
Islamabad Escorts | Call 03070433345 | Escort Service in IslamabadIslamabad Escorts | Call 03070433345 | Escort Service in Islamabad
Islamabad Escorts | Call 03070433345 | Escort Service in Islamabad
 
India Consumer 2024 Redacted Sample Report
India Consumer 2024 Redacted Sample ReportIndia Consumer 2024 Redacted Sample Report
India Consumer 2024 Redacted Sample Report
 
Investment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy CheruiyotInvestment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy Cheruiyot
 
Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024
 
Case study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detailCase study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detail
 
Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...
 

Schema Design

  • 1. Big Data Schema Design Deepak
  • 2. Overview • Schema design is vital for performance. • Keywords : Non-relational, NOSQL, Distributed • Underlying File system : GFS, HDFS • Examples : Hadoop, GFS, Hbase, Big Tables etc • Example implementations : Facebook, Wallmart etc.
  • 3. When to use • Typically with systems having >=100’s of millions/billions rows • Records of the order of 100’s or 1000’s of TB’s • No advanced Query Language needed • Typed columns or other RDBMS features not needed
  • 7. Overview • HBase runs on top of HDFS • HDFS was chosen because of its fault tolerance, check summing, failover properties • Java Native client or REST API • Manager manages cluster, Region Servers manages data
  • 8. HBase Data Model • Table: design-time namespace, has many rows. • Row: atomic key/value container, with one row key • Column Family: divide columns into physical files • Column: a key in the k/v container inside a row • Timestamp: long milliseconds, sorted descending • Value: a time-versioned value in the k/v container
  • 11. Thoughts on the logical view • Unit of scalability is Region. • The rows are not tied to a server. They maybe moved around for load balancing. • Add nodes so that we do not have too many regions per node • Too many regions per node will work against distribution
  • 12. Column Family • Each Column Family represents a Physical storage unit ( A Directory) • Data that are queried together should be stored together. • Features such as compression can be enabled per Column Family
  • 13. Bloom Filter • Generated automatically when an HFile is flushed to disk • Available in primary memory • Contains Row keys • CK can be stored as part of RK, but that might overload the memory. • Can filter based on what is stored.
  • 16. Tall vs Fat Tables • Fat tables with large amounts of data in each column. • Tall tables with large amounts of rows. • Tall is good for search or scans • Fat is good for fetches or gets • Rows don’t split • Atomicity is only at row level, having compound keys, atomicity is not guaranteed
  • 17. Key Design • Sequential keys : Example timestamp as key • With Sequential keys you keep hot spotting on a region. • Salting to distribute the records • Field promotion • Random keys
  • 19. Summary • Think twice before you decide on NOSQL technologies • Avoid hotspots • Store values at appropriate places • Choose the right keys • Store inferences into RDBMS if necessary
  • 20. Visit us: Facebook: http://www.facebook.com/QBurst Twitter: http://twitter.com/qburst Google+: https://plus.google.com/+qburst/posts LinkedIn: http://www.linkedin.com/company/qburst YouTube: http://www.youtube.com/QBurstVideos www.qburst.com

Notas do Editor

  1. Activity 1   - Study Make a conscious effort to improve attention to detail everywhere. Wherever you go, look for things to recall later. When you're shopping look for three things to study. Take 15 to 20 seconds to study each object. After returning home, write down specific things about the objects. Make notes of the size, the shape, the color.   Activity 2     - Recollection           People tend to get careless about the things in which they are familiar. Complacency especially during routine actions does not exercise the mind. Make a point to look for details and notice things as often as possible. Have you noticed the number of steps you need to climb from the ground to reach 3rd floor and 4th floor  at QBurst.63,85)