TDC2016SP - Big Data Track

SQL in the BigData Era

Published in: Education

  1. Globalcode – Open4education. Big Data – SQL In The Big Data Era. Rafael Aguiar, Data Science Engineer @InLocoMedia
  2. Agenda: Context; Definition of Big Data; A map of the ecosystem; Apache Hive; Apache Hue; Where to start
  3. A mobile ad network based on high-precision location (1-3 m). Terabytes of compressed data per month. How do we understand visit patterns? How do we recommend better ads?
  4. Big Data: “Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.” McKinsey (2011)
  5. Ecosystem
  6. Apache Hive: The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. It provides: (1) tools to enable easy access to data via SQL, thus enabling data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis; (2) a mechanism to impose structure on a variety of data formats; (3) query execution via Apache Tez, Apache Spark, or MapReduce.
  7. Apache Hive
  8. Apache Hive: When should you use Hive? You already know SQL and want to start processing large datasets without a headache. You need to run a job quickly and do not have time to write clean, optimized code.
  9. Apache Hive

       -- Table backed by CSV text files
       CREATE TABLE tdc_participants (
         name STRING,
         age INT,
         skills ARRAY<STRING>,
         likes_beer BOOLEAN,
         home_town STRING
       )
       ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
       WITH SERDEPROPERTIES (
         "separatorChar" = ",",
         "quoteChar" = "'",
         "escapeChar" = ""
       )
       STORED AS TEXTFILE;

       -- Beer-loving participants with the big-data skill, counted per home town
       SELECT home_town, count(*)
       FROM tdc_participants
       WHERE array_contains(skills, "big-data")
         AND likes_beer = TRUE
       GROUP BY home_town;
  10. Apache Hive

       -- Register the Esri spatial UDFs
       CREATE TEMPORARY FUNCTION st_linestring AS "com.esri.hadoop.hive.ST_LineString";
       CREATE TEMPORARY FUNCTION st_setsrid AS "com.esri.hadoop.hive.ST_SetSRID";
       CREATE TEMPORARY FUNCTION st_geodesiclengthwgs84 AS "com.esri.hadoop.hive.ST_GeodesicLengthWGS84";

       CREATE TABLE location (id STRING, lat DOUBLE, lng DOUBLE, epoch BIGINT) {...};

       SET hivevar:PLACE_OF_INTEREST = named_struct("lat", 1.0, "lng", 1.0);
       SET hivevar:MAX_DISTANCE = 10;
       SET hivevar:SPATIAL_REF_ID = 4326;

       -- Count distinct ids located within MAX_DISTANCE of the place of interest
       SELECT count(DISTINCT id)
       FROM location
       WHERE location.lat IS NOT NULL
         AND location.lng IS NOT NULL
         AND st_geodesiclengthwgs84(
               st_setsrid(
                 st_linestring(
                   ${hivevar:PLACE_OF_INTEREST}.lng, ${hivevar:PLACE_OF_INTEREST}.lat,
                   location.lng, location.lat),
                 ${hivevar:SPATIAL_REF_ID})) < ${hivevar:MAX_DISTANCE};
  11. Apache Hue: http://demo.gethue.com/
  12. Where to start: https://hive.apache.org/, http://gethue.com/, Programming Hive by Edward Capriolo, https://github.com/Prokopp/the-free-hive-book
  13. Rafael Aguiar, rafael@inlocomedia.com, @rafadaguiar, #TDCHive
  14. Thank you!
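
To make the aggregation on slide 9 concrete without a Hive cluster, here is a minimal plain-Python sketch of the same logic: keep participants who list a given skill and like beer, then count them per home town. It is not Hive code and not the speaker's implementation. The sample rows are made up, and the "|" separator inside the skills field is an assumption, since the slide does not show how the array column is stored in the CSV.

```python
import csv
import io
from collections import Counter

# Made-up sample rows for illustration only.
# Assumption: skills are packed into one CSV field, separated by "|".
SAMPLE_CSV = """name,age,skills,likes_beer,home_town
Ana,31,big-data|sql,true,Recife
Bruno,27,java,true,Recife
Carla,35,big-data|python,true,Sao Paulo
Diego,40,big-data,false,Recife
Eva,29,big-data,true,Recife
"""


def count_by_home_town(csv_text, skill="big-data"):
    """Mirror of the slide's SELECT: filter on skill and beer, group by town."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        skills = row["skills"].split("|")                  # array_contains(skills, ...)
        likes_beer = row["likes_beer"].lower() == "true"   # likes_beer = TRUE
        if skill in skills and likes_beer:
            counts[row["home_town"]] += 1                  # GROUP BY home_town, count(*)
    return dict(counts)


RESULT = count_by_home_town(SAMPLE_CSV)
print(RESULT)
```

The point of the slide is that Hive lets you express the same filter and grouping declaratively and run it over files far too large for a single-process loop like this one.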

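The proximity query on slide 10 counts distinct ids whose position lies within a maximum distance of a place of interest. The sketch below reproduces that idea in plain Python so the filter can be tried locally. Note the swapped-in technique: it uses the haversine formula on a sphere with a mean Earth radius, not the WGS84 geodesic length computed by the Esri UDF, so results may differ slightly. The function names and sample rows are hypothetical, and treating the threshold as meters is an assumption.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_distinct_ids_near(rows, place, max_distance_m):
    """Count distinct ids with at least one non-null position closer than the limit."""
    ids = set()
    for row in rows:
        if row["lat"] is None or row["lng"] is None:
            continue  # same role as the IS NOT NULL checks in the query
        distance = haversine_m(place["lat"], place["lng"], row["lat"], row["lng"])
        if distance < max_distance_m:
            ids.add(row["id"])
    return len(ids)


# Made-up sample data for illustration only.
PLACE = {"lat": 1.0, "lng": 1.0}
ROWS = [
    {"id": "a", "lat": 1.0, "lng": 1.0, "epoch": 1},      # at the place itself
    {"id": "a", "lat": 1.00001, "lng": 1.0, "epoch": 2},  # about 1 m away, same id
    {"id": "b", "lat": 1.00005, "lng": 1.0, "epoch": 3},  # about 5.6 m away
    {"id": "c", "lat": 1.001, "lng": 1.0, "epoch": 4},    # about 111 m away
    {"id": "d", "lat": None, "lng": 1.0, "epoch": 5},     # missing latitude, skipped
]
NEARBY = count_distinct_ids_near(ROWS, PLACE, 10)
print(NEARBY)
```

For thresholds of a few meters the spherical approximation is usually close enough to illustrate the filter; a production job would keep the geodesic UDF shown on the slide.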