
Sub-second SQL on Hadoop at Scale

We will talk about two challenging real-world SQL on Hadoop use cases: #1 Highly Parallel Workload Over Massive Data, and #2 Sub-second SQL for Online Reporting. The challenge is to meet very strict performance requirements over hundreds of billions of rows. We will introduce how we solved these challenges using Hive on Tez, Hive LLAP and Phoenix, with real-life performance numbers.


  1. Sub-second SQL on Hadoop at Scale
     Yifeng Jiang, Solutions Engineer, Hortonworks
     2015/11/23
     © Hortonworks Inc. 2011 – 2015. All Rights Reserved
  2. About Me: 蒋燚峰 (Yifeng Jiang)
     • Solutions Engineer, Hortonworks
     • Apache HBase book author
     • I like hiking
     • Twitter: @uprush
  3. SQL on Hadoop Solutions
     • Hive on Tez
       • The de facto standard of SQL on Hadoop for interactive SQL
       • One tool for all big data SQL use cases: ETL, reporting, BI, analytics, etc.
     • Hive LLAP
       • Makes Hive even faster
  4. SQL on Hadoop Solutions – Cont.
     • Phoenix
       • High-performance relational database layer over HBase for low-latency applications
     • Spark SQL
       • Spark's module for working with structured data
     • Kylin
       • Extreme OLAP engine for big data, open sourced by eBay
       • Not supported by Hortonworks
  5. Agenda: Two Real-life SQL on Hadoop Use Cases
     • Use Case #1: Highly Parallel Workload Over Massive Data
     • Use Case #2: Sub-second SQL on Hadoop at Scale
  6. Highly Parallel Workload Over Massive Data
  7. Use Case #1: Batch Reporting
     Massive dataset
     • 13 months, 450B+ rows of data
     • Adding 1.3B rows of data per day
     Highly parallel workload
     • 100K reports per day
     • 15K reports per hour
  8. Key Hive Optimizations
     Hive on Tez selected. Four Hive on Tez optimization points:
     • Partitioning
     • Data loading
     • Query execution
     • Parallel tuning
  9. Partitioning
     Maximize the number of partitions
     • Basic and the most important point for performance
     • Only read relevant data (see the pruning sketch below)
     Keep the total number under a couple thousand partitions
     • Hive seems to handle this many partitions per query very well

     CREATE TABLE access_logs (
       host string,
       path string,
       referrer string,
       …
     )
     PARTITIONED BY (
       site int,
       ymd date
     )
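     A query that filters on the partition columns reads only the matching partitions; a minimal sketch against the access_logs table above (the filter values are illustrative):

     -- Only the (site=1, ymd='2015-11-01') partition is scanned;
     -- all other partitions are pruned before execution.
     SELECT path, count(*) AS hits
     FROM access_logs
     WHERE site = 1 AND ymd = '2015-11-01'
     GROUP BY path;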
  10. Data Loading
      Load data into Hive tables stored as ORC
      Three main ORC parameters
      • File system block size: 256MB
      • Stripe size: 64MB
      • Compression: ZLIB
        • ZLIB is highly optimized in new Hive versions
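      A minimal sketch of how these three parameters can be applied, assuming a simple table layout (the table and columns are hypothetical; the property names follow the Hive/ORC documentation):

      -- Default file system block size for new ORC files: 256MB
      SET hive.exec.orc.default.block.size=268435456;

      CREATE TABLE orc_sales (
        item string,
        price double
      )
      PARTITIONED BY (country string)
      STORED AS ORC
      TBLPROPERTIES (
        "orc.compress" = "ZLIB",         -- ZLIB compression
        "orc.stripe.size" = "67108864"   -- 64MB stripes
      );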
  11. Data Loading – Cont.
      Make sure ORC files are big enough
      • Between 1 and 10 HDFS blocks if possible
      • Avoid having lots of reducers that each write to all partitions
      • Enable optimized sort dynamic partitioning (see the sketch below)
      • Or use the DISTRIBUTE BY clause
      • We chose DISTRIBUTE BY for fine-grained control

      INSERT INTO TABLE orc_sales PARTITION (country)
      SELECT … FROM daily_sales
      DISTRIBUTE BY country, gender;
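      The first option above, the sorted dynamic-partition optimization, is a single setting; a minimal sketch (the property exists in Hive 0.13+):

      -- Sort rows by the dynamic partition key before the writers, so each
      -- reducer writes one partition at a time instead of a file per partition.
      SET hive.optimize.sort.dynamic.partition=true;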
  12. Query Execution
      A query execution is essentially put together from:
      • Client execution [ ~0s if done correctly ]
      • Optimization [HiveServer2] [ ~0.1s ]
      • HCatalog lookups [HCatalog, Metastore] [ very fast in Hive 0.14 ]
      • Application Master creation [ 4-5s ]
      • Container allocation [ 3-5s ]
      • Query execution
      [Diagram: a test client holding N connections to each of two HiveServer2 instances, which share a Metastore (and its DB) and submit Tez AMs and Tez containers to YARN/HDFS]
  13. Query Execution – Cont.
      Connection setup has high overhead
      • Open one connection and execute a large number of queries
      • Standard connection pooling
      Distribute queries to 2 Hive Servers
      • HiveServer2 becomes a bottleneck at roughly 8-15 queries/s
      • Deploy multiple Hive Servers through Ambari
      • A fix that parallelizes query compilation is available in later versions
  14. Query Execution – Cont.
      Re-use Tez sessions, pre-warming (see the sketch below)
      • Reinitializing a Tez session takes 5+ seconds
      • Turn on Tez session re-use
      • Tez sessions can be pre-initialized with pre-warm (with some drawbacks)
      • With pre-warm, full speed is reached practically instantaneously
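      A sketch of the relevant settings (property names are from the Hive documentation; the values are illustrative, and these are normally set in hive-site.xml for HiveServer2 rather than per session):

      SET hive.server2.tez.initialize.default.sessions=true;  -- pool of re-usable Tez sessions
      SET hive.server2.tez.sessions.per.default.queue=4;      -- sessions kept per YARN queue
      SET hive.prewarm.enabled=true;                          -- pre-warm Tez containers
      SET hive.prewarm.numcontainers=10;                      -- how many containers to pre-warm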
  15. Query Execution – Cont.
      Re-use Tez containers (see the sketch below)
      • Re-creating containers takes 3 seconds
      • Enable container re-use and keep idle containers for a small period of time
      • Key is to reach 100% utilization without wasting resources
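      A sketch of the Tez container re-use settings (property names from the Tez documentation; the timeout values are illustrative assumptions):

      SET tez.am.container.reuse.enabled=true;
      -- Hold idle containers briefly before releasing them back to YARN:
      SET tez.am.container.idle.release-timeout-min.millis=10000;
      SET tez.am.container.idle.release-timeout-max.millis=20000;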
  16. Tuning for Parallel Execution
      The most important point for many real-world scenarios
      • In most query-tuning scenarios this is ignored at first
      • Single queries often benefit from additional resources, but this can reduce throughput
      Tez memory settings are key for parallelization (see the sketch below)
      • With optimized Tez memory settings, we achieved 90+% CPU utilization in the cluster
      [Chart: queries per second and cluster (memory) utilization vs. query concurrency]
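      The deck does not list the exact values, so the following is only an illustrative sketch of the kind of Tez memory settings involved; smaller containers leave room for more concurrent tasks:

      SET hive.tez.container.size=4096;      -- MB per Tez container (illustrative)
      SET hive.tez.java.opts=-Xmx3276m;      -- heap at roughly 80% of the container size
      SET tez.runtime.io.sort.mb=1024;       -- per-task sort buffer, sized to fit the heap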
  17. Hive vs. Impala Performance Benchmark
      Hive performance
      • Most SQLs respond within 20s
      • Max 70s for big result sets
      Impala performance
      • Many SQLs took 30s to 90s
      • Big-result-set SQLs took more than 10 minutes
      • Notable performance degradation during parallel execution
      [Chart: number of queries by response time; details in the benchmark blog]
  18. Sub-second SQL on Hadoop at Scale
  19. Sub-second SQL Use Case: Online Reporting
      • Interactive online reporting
      • Queries are relatively simple
      • Massive dataset
      • Low-latency requirement
        • Sub-second response for most queries
        • Up to several seconds for big queries
      • Highly parallel

      SELECT account, yyyymmdd, sum(total_imps), sum(total_click), ...
      FROM table_x
      WHERE yyyymmdd >= xxx AND yyyymmdd < xxx
        AND account = xxx ...
      GROUP BY account, yyyymmdd, ...;
      [Chart: query volume distribution]
  20. Which One to Go With?
      • Apache Kylin
      • Hive LLAP
  21. Hive Performance Recap
      Hive is fast: interactive response
      • Tez execution engine (replacing MapReduce)
      • ORC columnar file format
      • Cost-based optimizer (CBO)
      • Vectorized SQL engine
      [Timeline: Hive 0.10, batch processing, to Hive 0.14, human interactive (5 seconds): a 100-150x query speedup]
  22. Hive LLAP
      LLAP = Live Long And Process
      • LLAP processes run on multiple nodes, accelerating Tez tasks
      [Diagram: a Hive query fanning out to LLAP daemons running on multiple nodes over HDFS]
  23. LLAP Query Execution
      • Number of concurrent queries throttled by the Hive Server
      • The Hive Server compiles and optimizes queries
      • Each query is coordinated independently by a Tez AM
      • Hive operators are used for processing
      • Tez runtime components are used for data transfer
      • Hive decides where query fragments run (LLAP, container, AM)
      [Diagram: clients connect to the HiveServer (query/AM controller); in the YARN cluster, per-query Tez AMs drive work on llapd daemons and plain containers]
  24. Hive LLAP: Key Benefits
      Performance benefits
      • Reduced starting time
      • Columnar data cache
      • A long-lived process is easy to optimize
        • JIT, concurrent I/O, etc.
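      For reference, a hedged sketch of how queries are directed at LLAP in later Hive 2.x releases (these properties postdate parts of this deck, so treat them as an assumption):

      SET hive.execution.engine=tez;
      SET hive.execution.mode=llap;      -- run query fragments inside the LLAP daemons
      SET hive.llap.execution.mode=all;  -- let any operator run in LLAP when possible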
  25. Tez vs. LLAP
      • LLAP is 7 times faster for very small queries
      • 2 times faster for heavy queries
      • 1.5 times faster for large result sizes
      [Chart: average and max latency, Tez vs. LLAP, across the benchmark queries (Day20, Day200, DMA20, DMA200, Landing)]
  26. LLAP Scalability
      • LLAP scales to 30 q/s on our cluster
      • An additional Hive Server is needed after 20 threads
      • The timeline server needs to be disabled
      • Query latency is impacted around 48 threads and 25 q/s
      [Charts: throughput (q/s) and max latency per query as client threads scale from 5 to 96 across 1-3 Hive Servers]
  27. LLAP vs. Phoenix: Average Latency
      • All averages under 2s response time
      • Fastest average latency with Phoenix at 15 q/s
      [Chart: average latency per query for LLAP (5 threads; 20 threads, 3 HS at 15 q/s) and Phoenix (256 regions, 4 threads; 1024 regions, 10 threads; both at 15 q/s)]
  28. LLAP vs. Phoenix: Max Latency
      • Result size and scan size impact latency
      • Fastest is Phoenix with 1024 regions
      • The bottleneck on LLAP seems to be transferring the result through the Hive Server to the client
      • Patch in work: HIVE-12049
      [Chart: max latency per query for the same LLAP and Phoenix configurations]
  29. Phoenix Details
      • Phoenix scales at least up to 40 q/s with 3 clients on our cluster
      • 40% higher throughput at 256 regions
      • 3x faster big-scan query at 1024 regions
      [Charts: throughput and max latency per query for 256 vs. 1024 regions at 4 and 10 threads, 1 and 3 clients]
  30. Tuning Points for Phoenix
      • Skip scan
      • Merge sort patch for the client
      • Splitting the table using a hash
      • HBase & Phoenix configurations
  31. Phoenix Skip Scan
      • Using skip scan improved query performance by an order of magnitude
        • 5-10 times faster
      • Allows Phoenix to skip unneeded sub-keys
        • E.g., skip from day to day

      SELECT * FROM T
      WHERE ((KEY1 >= 'a' AND KEY1 <= 'b') OR (KEY1 > 'c' AND KEY1 <= 'e'))
        AND KEY2 IN (1, 2)

      Ref: http://phoenix-hbase.blogspot.jp/2013/05/demystifying-skip-scan-in-phoenix.html
  32. Phoenix Merge Sort Patch
      • The Phoenix bottleneck for large result sets was the client-side merge
      • Patch: PHOENIX-2126
      • 6 times faster for the biggest-result query by fixing the slow merge sort
      [Chart: max latency per query, unpatched vs. patched Phoenix]
  33. Phoenix Splitting Table
      • Salted tables
        • Automatically salted on the row key
      • Manual split-point definition (usage sketch below)
        • Minimize result set size (client-side merge)
        • Increase parallelization (server-side aggregation)
        • Create an original salt based on the GROUP BY keys
        • Divide the table into N regions to maximize CPU usage

      Salted table:
      CREATE TABLE my_table (a_key VARCHAR PRIMARY KEY, a_col VARCHAR)
      SALT_BUCKETS = 20;

      Manual split points:
      CREATE TABLE my_table (
        salt CHAR(2) NOT NULL,
        user_id INTEGER NOT NULL,
        ymd INTEGER NOT NULL,
        clicks BIGINT,
        CONSTRAINT pk PRIMARY KEY (salt, user_id, ymd))
      SPLIT ON ('01', '02', … , 'ff');
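      A hedged usage sketch for the manually split table above; the salt derivation (two hex digits of user_id % 256) is an illustrative assumption, not from the deck:

      -- Writer computes the salt from the grouping key: 1001 % 256 = 233 = 0xe9
      UPSERT INTO my_table (salt, user_id, ymd, clicks)
      VALUES ('e9', 1001, 20151123, 42);

      -- Reads supply the same salt, so the scan stays within one region and
      -- rows for one user arrive pre-grouped (smaller client-side merge)
      SELECT ymd, SUM(clicks)
      FROM my_table
      WHERE salt = 'e9' AND user_id = 1001
        AND ymd >= 20151101 AND ymd < 20151201
      GROUP BY ymd;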
  34. Phoenix Configurations
      • Increase cache use
        • -XX:MaxDirectMemorySize=30720m
        • hbase.bucketcache.size = 20480 (MB)
      • Remove bottlenecks
        • hbase.regionserver.handler.count = 240
        • phoenix.query.queueSize = 100000
        • phoenix.query.threadPoolSize = 2048
      • Prevent auto-split
        • hbase.hregion.max.filesize = 100GB
  35. Sub-second SQL Use Case Summary
      • Hive LLAP/Tez
        • Tez is proven and scalable in heavy batch tasks
        • LLAP reduces Hive latency significantly
        • LLAP is under active development
      • Phoenix [ winner for this particular use case ]
        • Sub-second queries are possible today
        • Simple SQL plays to Phoenix's strengths
        • PHOENIX-2126 fixes the client bottleneck for large queries
  36. 36. Page 36 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Summary Page 36 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
  37. SQL on Hadoop at Scale
      Hive is the de facto standard for SQL on Hadoop
      • Hive on Tez for batch and interactive SQL
      • Best solution for general big data SQL use cases: ETL, reporting, BI, analytics, etc.
      • LLAP makes Hive even faster
      • Hive on Tez and LLAP proved their performance at scale
      Other SQL on Hadoop options
      • Phoenix, Spark SQL, Kylin
      • Each is a great tool for a particular use case
  38. Thank You
      Tweet: #hadooproadshow
