
TDC2016SP - BigData Track

Google Cloud BigQuery, a new way to do analytics and data warehousing

  1. Google Cloud BigQuery
  2. +Michel Pereira, @michelpereira, Cloud Solutions Engineer, Google Cloud
  3. Enterprise Google Cloud is The Datacenter as a Computer
  4. Photo credit: Matt Chan
  5. Google Research Publications
  6. Open Source Implementations: Bigtable, Flume, Dremel
  7. Managed Cloud Versions: Bigtable → Bigtable, Flume → Dataflow, Dremel → BigQuery
  8. Google BigQuery
  9. 02 Count some stuff
  10. Words in Shakespeare: SELECT count(word) FROM publicdata:samples.shakespeare
  11. Query complete (1.3s elapsed, 1.27 MB processed)
  12. Wikipedia hits over 1 hour: SELECT sum(requests) as total FROM [fh-bigquery:wikipedia.pagecounts_20150212_01]
  13. Query complete (1.1s elapsed, 46.7 MB processed)
  14. Several years of Wikipedia data: SELECT SUM(requests) AS total FROM TABLE_QUERY([fh-bigquery:wikipedia], 'REGEXP_MATCH(table_id, r"pagecounts_2015[0-9]{2}$")')
  15. Query complete (6.7s elapsed, 460 GB processed)
  16. How about a RegExp: SELECT SUM(requests) AS total FROM TABLE_QUERY([fh-bigquery:wikipedia], 'REGEXP_MATCH(table_id, r"pagecounts_2015[0-9]{2}$")') WHERE (REGEXP_MATCH(title, '.*[dD]inosaur.*'))
  17. Query complete (21.0s elapsed, 3.31 TB processed)
  18. 03 How did it do that? o_O
  19. Qualities of a good RDBMS
  20. Qualities of a good RDBMS: inserts & locking, indexing, cache, query planning
  21. Qualities of a good RDBMS: inserts & locking, indexing, cache, query planning
  22. Storing data - Break it, Compress it, Spread it [diagram: Table, Columns, Disks]
  23. Reading data: Life of a BigQuery: SELECT sum(requests) as sum FROM (SELECT requests, title FROM [fh-bigquery:wikipedia.pagecounts_201501] WHERE (REGEXP_MATCH(title, '[Jj]en.+')))
  24. Life of a BigQuery - Nodes [diagram: Mixer, Leaf, Storage levels; nodes marked M and L]
  25. Life of a BigQuery - Nodes [diagram: Root, Mixer, Mixer, Leaf, Storage levels; nodes marked M and L]
  26. Life of a BigQuery - Execution Plan [diagram: Query; Root, Mixer, Mixer, Leaf, Storage levels; nodes marked M and L]
  27. Life of a BigQuery - Map [diagram: Root, Mixer, Mixer, Leaf, Storage levels; label: SELECT requests, title]
  28. Life of a BigQuery - Filter, Group and Count [diagram: Root, Mixer, Mixer, Leaf, Storage levels; labels: 5.4 Bil; SELECT requests, title; WHERE (REGEXP_MATCH(title, '[Jj]en.+'))]
  29. Life of a BigQuery - Group and Count [diagram: Root, Mixer, Mixer, Leaf, Storage levels; labels: 5.4 Bil; SELECT requests, title; 5.8 Mil; WHERE (REGEXP_MATCH(title, '[Jj]en.+')); SELECT sum(requests)]
  30. Life of a BigQuery - Aggregate at the Root [diagram: Root, Mixer, Mixer, Leaf, Storage levels; labels: 5.4 Bil; SELECT requests, title; 5.8 Mil; WHERE (REGEXP_MATCH(title, '[Jj]en.+')); SELECT sum(requests); SELECT sum(requests)]
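
The "Life of a BigQuery" slides (23 to 30) describe a query that fans out across a tree of leaf, mixer, and root nodes. The following is a minimal Python sketch of that idea, not BigQuery's actual implementation: the shard data, the function names (`leaf_scan`, `mixer_combine`, `run_query`), and the `fan_out` parameter are all hypothetical. It also assumes that `REGEXP_MATCH` succeeds when the pattern matches anywhere in the title, which is modeled here with `re.search`.

```python
import re
from typing import Iterable, List, Tuple

Row = Tuple[str, int]  # (title, requests): the only two columns the query reads

# Same pattern as the WHERE clause on slide 23.
# Assumption: partial-match semantics, hence re.search below.
PATTERN = re.compile(r"[Jj]en.+")


def leaf_scan(shard: Iterable[Row]) -> int:
    """Leaf: scan one storage shard, filter on title, return a partial sum."""
    return sum(requests for title, requests in shard if PATTERN.search(title))


def mixer_combine(partials: Iterable[int]) -> int:
    """Mixer (and root): add up the partial sums received from the level below."""
    return sum(partials)


def run_query(shards: List[List[Row]], fan_out: int = 2) -> int:
    """Run the leaf -> mixer -> root pipeline over hypothetical shards."""
    # Map + filter: each leaf works on its own shard independently.
    leaf_results = [leaf_scan(shard) for shard in shards]
    # Each mixer combines the results of up to `fan_out` leaves.
    mixer_results = [
        mixer_combine(leaf_results[i:i + fan_out])
        for i in range(0, len(leaf_results), fan_out)
    ]
    # The root aggregates the mixer results into the final answer.
    return mixer_combine(mixer_results)


# Hypothetical data: four storage shards of (title, requests) rows.
shards = [
    [("Jennifer", 10), ("Dinosaur", 5)],
    [("jenga", 3), ("Jen", 7)],  # "Jen" fails: ".+" needs a character after "en"
    [("Big_Jenkins", 2)],
    [("Shakespeare", 100)],
]

print(run_query(shards))  # prints 15
```

Because addition is associative, the tree returns the same total as a single sequential scan, while each leaf only has to filter its own shard and each level passes just one number per node upward.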
