Boston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop User Group Meetup, July 7, 2015
Kamil Bajda-Pawlikowski
Matt Fuller

Who Are We? Teradata Center for Hadoop
•  History of Teradata Center for Hadoop
–  Formerly Hadapt, founded in July 2010 by Borgman, Bajda-Pawlikowski, and Abadi
–  Pioneered the SQL-on-Hadoop market
–  Based on work by the database research group in the Yale Computer Science Department
–  Hybrid of Hadoop scalability and DBMS performance
•  Today
–  Acquired by Teradata in July 2014 and renamed Teradata Center for Hadoop
–  30 developers with deep Hadoop and database expertise
–  Headquarters in Boston, MA
–  Contributors to the open source Presto project

Talk Agenda
•  What is Presto?
•  What is Teradata doing?
•  Can I see a demo?
•  How can I contribute?

What is Presto?
•  100% open source distributed ANSI SQL engine for Big Data
–  Modern code base
–  Proven scalability
–  Optimized for low-latency, interactive querying
•  Cross-platform query capability, not only SQL on Hadoop
•  Distributed under the Apache License, now supported by Teradata
•  Used by a community of well-known, well-respected technology companies

History of Presto
•  Fall 2008: Facebook open sources Hive
•  Fall 2012: 4 developers start Presto development
•  Spring 2013: Presto rolled out within Facebook
•  Fall 2013: Facebook open sources Presto
•  Fall 2014: 88 releases, 41 contributors, 3,943 commits
•  Spring 2015: 98 releases, 65 contributors, 4,587 commits
•  Teradata joins the Presto community and offers support
Timeline image courtesy of Facebook

Presto Architecture
[Diagram: a Client sends queries to the Coordinator, whose Parser/analyzer, Planner, and Scheduler rely on the pluggable Metadata API and Data location API; Workers read data through the pluggable Data stream API.]
Source: https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
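The diagram separates planning on the Coordinator from execution on the Workers, with the Scheduler consulting the Data location API to decide where work runs. The Python sketch below is a toy model of that single step, not Presto's scheduler. It assumes each split reports the hosts that store its data, that host names match worker names, and a hypothetical per-worker limit (`max_splits_per_worker`); `Split` and `schedule_splits` are invented names for illustration.

```python
"""Toy model of a coordinator scheduling splits onto workers.

Hypothetical illustration: Split, schedule_splits, and max_splits_per_worker
are invented names for this sketch, not Presto's actual classes or settings.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """A unit of work (e.g. one file block) plus the hosts that store it,
    as a data location API might report them."""
    split_id: str
    preferred_hosts: tuple = ()


def schedule_splits(splits, workers, max_splits_per_worker=2):
    """Assign each split to a worker, preferring data-local workers.

    A split goes to the least-loaded preferred host that is under the
    per-worker limit; otherwise it falls back to the least-loaded worker
    overall (ties go to the worker listed first).
    """
    assignment = {worker: [] for worker in workers}

    def load(worker):
        return len(assignment[worker])

    for split in splits:
        local = [w for w in workers
                 if w in split.preferred_hosts and load(w) < max_splits_per_worker]
        target = min(local or workers, key=load)
        assignment[target].append(split.split_id)
    return assignment


workers = ["worker-1", "worker-2", "worker-3"]
splits = [
    Split("s1", ("worker-1",)),
    Split("s2", ("worker-1",)),
    Split("s3", ("worker-1",)),  # worker-1 is at its limit by now, so this one moves
    Split("s4"),                 # no locality information at all
]
plan = schedule_splits(splits, workers)
```

Here `plan` keeps `s1` and `s2` on `worker-1`, where their data lives, and spreads the remaining splits over the idle workers. A real scheduler also weighs network topology, running tasks, and memory; the sketch only shows why a data location API is useful to it.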

Presto Extensibility – connectors
[Diagram: the Coordinator (Parser/analyzer, Planner, Scheduler) and the Workers reach data sources only through the Metadata API, Data location API, and Data stream API, each implemented per connector: Hive, Cassandra, Kafka, MySQL, …]
Source: https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
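Because these three APIs are the only contact points between the engine and a data source, the same engine code can run against any backend that implements them. That is what makes cross-platform queries possible. A minimal Python sketch of the idea follows. `Connector`, `InMemoryConnector`, and `scan` are hypothetical names, and the in-memory tables stand in for real systems such as Hive or MySQL. The real Presto connector SPI is a Java interface with a different shape.

```python
"""Toy model of pluggable connectors behind one query engine.

Hypothetical sketch: Connector, InMemoryConnector, and scan are invented
names. The three methods only mirror the Metadata, Data location, and Data
stream roles on the slide; the real Presto connector SPI is a Java API.
"""
from abc import ABC, abstractmethod


class Connector(ABC):
    @abstractmethod
    def table_columns(self, table):
        """Metadata API role: return the column names of a table."""

    @abstractmethod
    def splits(self, table):
        """Data location API role: return the units of work for a table."""

    @abstractmethod
    def read(self, table, split):
        """Data stream API role: yield the rows of one split as dicts."""


class InMemoryConnector(Connector):
    """Stand-in for any backend: tables map to (columns, rows), and each
    split is a range of row positions."""

    def __init__(self, tables, rows_per_split=2):
        self._tables = tables
        self._rows_per_split = rows_per_split

    def table_columns(self, table):
        columns, _ = self._tables[table]
        return list(columns)

    def splits(self, table):
        _, rows = self._tables[table]
        step = self._rows_per_split
        return [(start, min(start + step, len(rows)))
                for start in range(0, len(rows), step)]

    def read(self, table, split):
        columns, rows = self._tables[table]
        start, end = split
        for row in rows[start:end]:
            yield dict(zip(columns, row))


def scan(catalogs, qualified_table, columns):
    """Engine side: resolve 'catalog.table', validate columns through the
    metadata role, then stream every split and project the columns."""
    catalog, table = qualified_table.split(".", 1)
    connector = catalogs[catalog]
    missing = set(columns) - set(connector.table_columns(table))
    if missing:
        raise ValueError(f"unknown columns {sorted(missing)} in {qualified_table}")
    for split in connector.splits(table):
        for row in connector.read(table, split):
            yield {c: row[c] for c in columns}


# Two independent "catalogs" queried through the same engine code.
catalogs = {
    "crm": InMemoryConnector({"users": (["id", "name"], [(1, "ana"), (2, "bo"), (3, "cy")])}),
    "events": InMemoryConnector({"clicks": (["user_id", "url"], [(1, "/a"), (3, "/b"), (1, "/c")])}),
}
names = {r["id"]: r["name"] for r in scan(catalogs, "crm.users", ["id", "name"])}
joined = [(names[r["user_id"]], r["url"])
          for r in scan(catalogs, "events.clicks", ["user_id", "url"])]
```

The last lines join rows from two different catalogs in engine code alone. Neither connector knows about the other, which loosely mirrors how a single Presto query can combine tables served by different connectors.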

Presto = Performance
•  Data stays in memory during execution and is pipelined across nodes MPP-style
•  Vectorized columnar processing
•  Presto is written in highly tuned Java
–  Efficient in-memory data structures
–  Very careful coding of inner loops
–  Bytecode generation
•  Optimized ORC reader
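The first two bullets describe data that stays in memory and flows between operators in columnar batches. The Python sketch below illustrates both ideas under simplifying assumptions: a "page" is a dict mapping each column name to a list of values, and each operator is a generator, so one page travels through the filter and into the aggregation before the next page is read. All function names are hypothetical, and the sketch leaves out what makes Presto fast in practice (tuned Java data structures, bytecode generation, distribution across nodes).

```python
"""Toy pipeline of columnar pages, in the spirit of the slide.

Hypothetical sketch: a "page" is a dict mapping column name to a list of
values (one batch of rows stored column by column). Operators are Python
generators, so each page flows through the whole pipeline before the next
page is produced, instead of materializing intermediate tables.
"""


def scan_pages(columns, rows_per_page):
    """Source operator: cut column lists into fixed-size pages."""
    n = len(next(iter(columns.values())))
    for start in range(0, n, rows_per_page):
        yield {name: values[start:start + rows_per_page]
               for name, values in columns.items()}


def filter_pages(pages, column, predicate):
    """Evaluate the predicate over one column of each page, then keep the
    selected positions in every column; fully filtered pages are dropped."""
    for page in pages:
        selected = [i for i, v in enumerate(page[column]) if predicate(v)]
        if selected:
            yield {name: [values[i] for i in selected]
                   for name, values in page.items()}


def sum_by_key(pages, key, value):
    """Sink operator: hash aggregation that consumes pages as they arrive."""
    totals = {}
    for page in pages:
        for k, v in zip(page[key], page[value]):
            totals[k] = totals.get(k, 0) + v
    return totals


orders = {
    "region": ["east", "west", "east", "west", "east"],
    "amount": [10, 5, 7, 20, 3],
}
pipeline = filter_pages(scan_pages(orders, rows_per_page=2), "amount", lambda a: a > 4)
result = sum_by_key(pipeline, "region", "amount")
```

With these inputs the filter drops the last page (amount 3) entirely, and `result` holds the per-region totals of the remaining rows. Working on one column of a page at a time is what "vectorized columnar processing" refers to; a production engine does this over compact primitive arrays rather than Python lists.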

Presto in Production
•  Facebook
–  Multiple production clusters (100s of nodes total), including a 300 PB Hadoop data warehouse
–  1000s of internal daily active users
–  Millions of queries each month
–  Multiple PBs scanned every day
–  Trillions of rows a day
•  Netflix
–  Production cluster of over 200 nodes on EC2
–  Over 15 PB in S3 (Parquet format)
–  Over 300 users and 2.5K queries daily

What is Teradata Doing?
•  100% open source contributions to Presto to increase adoption in the enterprise
•  A multi-year roadmap commitment to phased enhancements of the open source code
•  The first-ever commercial support offering for Presto
Teradata Certified Presto: www.teradata.com/presto

Why is Teradata Contributing to Presto?
•  Hadoop Distro Agnostic
•  Modern Code Base
–  Presto is well-designed open source software with a proper database architecture
•  Strong, Like-Minded Community
•  Push-down processing across multiple data platforms
•  Leverage Teradata expertise to make SQL for Hadoop viable

Demo Time!

Teradata Contributions to Presto
•  Phase 1, Implement (June 8, 2015)
–  Installer
–  Documentation
–  Monitoring & Support Tools
•  Phase 2, Integrate (Q4 2015)
–  Management Tool Integration
–  YARN Integration
–  ODBC / JDBC Drivers
•  Phase 3, Proliferate (2016)
–  BI Certification
–  Security
–  Connectors
•  Ongoing: Commercial Support and Expanding ANSI SQL Coverage

Teradata’s Contributions
•  Easier installation and management via the Presto-Admin tool
–  www.github.com/prestodb/presto-admin
–  Packaging Presto as an RPM
•  Testing framework for Presto
–  www.github.com/prestodb/tempto
–  Added a large number of tests
•  Improvements to the JDBC driver
–  To be open sourced on www.github.com/prestodb soon!
•  Various SQL improvements

Teradata’s Contribution Product Roadmap
•  YARN Integration
•  Ambari Integration
•  ODBC & JDBC Drivers that actually work
•  Security – Authentication & Authorization
•  Continued SQL Improvements
•  BI tool certifications – e.g. Tableau
•  More Connectors – e.g. HBase
•  Open source our Docker-based development environment
•  Open our Continuous Integration platform to the community

How can I contribute?
www.github.com/facebook/presto
www.github.com/prestodb
Certified Distro: www.teradata.com/presto
Website: www.prestodb.io
Presto Users Group: www.groups.google.com/group/presto-users
Facebook Page: www.facebook.com/prestodb
Twitter: #prestodb

Presto 101t certified by Teradata
Available for Download
–  Presto 101t Server, CLI, JDBC
–  Presto-Admin 0.1
–  Documentation
–  HDP w/ Presto VM Sandbox
–  CDH w/ Presto VM Sandbox
www.teradata.com/presto