Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)


Teradata has been hard at work on Presto, and we want to share what we have done so far and our roadmap going forward. From presto-admin, a tool for installing and administering Presto, to YARN/Ambari support, to fully certified JDBC and ODBC drivers, we are committed to making Presto the best, most enterprise-ready SQL-on-Hadoop solution out there.


Hello, Enterprise! Meet Presto
Teradata Contributions to Presto
10/6/15
Christina Wallin

Who are we?
• Teradata Center for Hadoop
• Formerly Hadapt, the first SQL-on-Hadoop company (founded in 2010)
• Offices in Boston and Warsaw, some remote employees in CA and CT
• Around 20 employees working on Presto
• Contributors to the open source project Presto!

What is Presto?
• 100% open source distributed ANSI SQL engine for Big Data
– Modern architecture and implementation
– Proven scalability and performance
– Optimized for low-latency, interactive querying
• Cross-platform query capability, not only SQL on Hadoop
• Distributed under the Apache license, now supported by Teradata
• Used by a community of well-known, well-respected technology companies

Presto Architecture
[Diagram: Client; Coordinator (Parser/analyzer, Planner, Scheduler); three Workers]
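
As a rough illustration of the coordinator/worker split named in the architecture slide, the following is a toy sketch only. It assumes a simple count query, and every name in it (`plan_splits`, `worker_count`, `coordinator_count`) is hypothetical; it is not Presto code and says nothing about Presto's actual planner or scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_splits(rows, num_workers):
    """Coordinator side (toy): divide the input into one split per worker."""
    return [rows[i::num_workers] for i in range(num_workers)]


def worker_count(split, predicate):
    """Worker side (toy): scan one split and return a partial count."""
    return sum(1 for row in split if predicate(row))


def coordinator_count(rows, predicate, num_workers=3):
    """Hand each split to a worker in parallel, then merge the partial counts."""
    splits = plan_splits(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(lambda s: worker_count(s, predicate), splits))
    return sum(partials)


if __name__ == "__main__":
    data = list(range(100))
    # Count the even numbers using three simulated workers.
    print(coordinator_count(data, lambda x: x % 2 == 0))
```

The point of the sketch is only the division of labor: one process plans and merges, several processes scan in parallel.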

Presto Pluggable Data Sources Capabilities
[Diagram: Presto with push-down to Hadoop systems (HDFS, Kafka), push-down to other databases, and push-down to NoSQL databases]

Teradata Contributions to Presto
Phase 1: Implement (June 8, 2015)
• Installer
• Documentation
• Monitoring & Support Tools
Phase 2: Integrate (Q4 2015)
• Management Tool Integration
• YARN Integration
• ODBC Driver
Phase 3: Proliferate (2016)
• JDBC Driver
• BI Certification
• Security
• Connectors
Also on the roadmap:
• Commercial Support
• Expanding ANSI SQL Coverage

Easy Installation and Administration

presto-admin: a tool to manage and install Presto
• presto-admin can:
– Install and uninstall Presto
– Deploy configuration files across the cluster
– Start/stop/restart Presto servers
– Show you the status of the cluster
– Add and remove connectors
– Upgrade Presto to a different version
– Collect logs, query info, system info for support
• Additionally, we added an RPM for Presto
• https://github.com/prestodb/presto-admin
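
To make the workflow concrete, here is a minimal sketch that assembles the presto-admin command lines without running them. The subcommands are the ones that appear in the deck's demo notes (`server install`, `server start`, `server status`, `configuration show`, `connector add`, `server restart`); the helper names `presto_admin` and `demo_plan` are hypothetical, and the RPM file name is simply the one used in those notes.

```python
import shlex

# Path to the tool as it is invoked in the demo notes.
PRESTO_ADMIN = "./presto-admin"


def presto_admin(*args):
    """Build the argument vector for one presto-admin invocation."""
    return [PRESTO_ADMIN, *args]


def demo_plan(rpm_path, connector):
    """Return the demo's command sequence, in order, as argument vectors."""
    return [
        presto_admin("server", "install", rpm_path),
        presto_admin("server", "start"),
        presto_admin("server", "status"),
        presto_admin("configuration", "show"),
        presto_admin("connector", "add", connector),
        presto_admin("server", "restart"),
        presto_admin("server", "status"),
    ]


if __name__ == "__main__":
    # Print the commands only; nothing is executed against a cluster.
    for argv in demo_plan("presto-server-rpm.rpm", "hive"):
        print(shlex.join(argv))
```

Keeping each command as an argument vector means the same plan could later be handed to `subprocess.run` on a machine where presto-admin is installed.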

Hadoop Ecosystem Integration

Ambari Integration (Work In Progress)
• http://github.com/prestodb/ambari-presto-service

Resource Allocation with YARN
• Slated for Q4 2015
• Allow Presto to run its services within YARN containers so that YARN knows about the memory and CPU allocated to Presto
– Using Apache Slider
– The allocation is fixed and upfront
– Supports HDP and CDH Hadoop versions
• YARN CGroups integration
• http://github.com/prestodb/presto-yarn
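
The accounting idea behind a fixed, upfront allocation can be sketched with simple arithmetic. This is an illustration under stated assumptions, not presto-yarn or Slider code: `NodeResources` and `remaining_for_other_apps` are hypothetical names, and the node sizes in the example are made-up numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeResources:
    """Memory and CPU on one node (hypothetical model, not a YARN type)."""
    memory_mb: int
    vcores: int


def remaining_for_other_apps(node, presto):
    """Subtract Presto's fixed reservation from the node's capacity.

    Because the reservation is fixed and upfront, whatever is left over is
    what other YARN applications could be given on this node.
    """
    if presto.memory_mb > node.memory_mb or presto.vcores > node.vcores:
        raise ValueError("Presto allocation exceeds node capacity")
    return NodeResources(
        memory_mb=node.memory_mb - presto.memory_mb,
        vcores=node.vcores - presto.vcores,
    )


if __name__ == "__main__":
    node = NodeResources(memory_mb=65536, vcores=16)    # made-up node size
    presto = NodeResources(memory_mb=32768, vcores=8)   # made-up reservation
    print(remaining_for_other_apps(node, presto))
```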

Enterprise Database Features

Unleashing Presto on Business Intelligence Tools
• Improved ODBC driver: Q4 2015
• Improved JDBC driver: Q1 2016
• Certification against Tableau, Qlik, etc.: mid-2016

Expanded ANSI SQL Support
• Current contributions
– DECIMAL type (WIP)
– Additional smaller things: new functions, bug fixes, TIMESTAMP support for Parquet
• Future goal: support TPC-H and TPC-DS unmodified!
– Additional subquery and join support
– EXISTS, EXCEPT, INTERSECT
– Various other odds and ends
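
For readers less familiar with the three constructs named above, the sketch below shows what EXISTS, EXCEPT, and INTERSECT compute. It runs against SQLite through Python's standard `sqlite3` module purely as a stand-in, since SQLite already supports all three; it is not Presto, and the `orders` and `refunds` tables and their contents are invented for the example.

```python
import sqlite3


def run_examples():
    """Run one query per construct on a tiny in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (customer TEXT);
        CREATE TABLE refunds (customer TEXT);
        INSERT INTO orders VALUES ('ann'), ('bob'), ('cat');
        INSERT INTO refunds VALUES ('bob'), ('dan');
        """
    )
    queries = {
        # Customers with an order for whom a matching refund row exists.
        "exists": """
            SELECT customer FROM orders o
            WHERE EXISTS (SELECT 1 FROM refunds r WHERE r.customer = o.customer)
            ORDER BY customer
        """,
        # Customers with an order but no refund.
        "except": """
            SELECT customer FROM orders
            EXCEPT
            SELECT customer FROM refunds
            ORDER BY customer
        """,
        # Customers appearing in both tables.
        "intersect": """
            SELECT customer FROM orders
            INTERSECT
            SELECT customer FROM refunds
            ORDER BY customer
        """,
    }
    results = {
        name: [row[0] for row in conn.execute(sql)]
        for name, sql in queries.items()
    }
    conn.close()
    return results


if __name__ == "__main__":
    for name, rows in run_examples().items():
        print(name, rows)
```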

How can I give Presto a try?
• https://github.com/facebook/presto
• https://github.com/prestodb/presto-admin
• Certified distro: http://www.teradata.com/presto/
– VM images with Presto pre-installed are also available for download



Editor's Notes

• Interactive performance of the execution engine
• Code generation for operators (similar to Impala)
• Data is pipelined MPP-style
• Runs at Facebook scale
• Capable of querying non-HDFS data stores as well
Presto-YARN integration objective: resource allocation meant for long-running services. In addition, where Presto and Hadoop share the same hardware (or cluster), YARN integration provides a unified way of accounting for and monitoring cluster utilization. The goal is to be transparent to YARN about how much RAM and CPU is allocated to Presto, so that less is available to other YARN applications (MapReduce, Tez, etc.).

The allocation is fixed and upfront: no dynamic changes to resource allocation are supported for Phase 2. To reconfigure memory or CPU settings, a restart is necessary.

YARN has introduced support for CPU sharing via CGroups. Currently, CGroups is used only for limiting CPU usage, so we will leverage it to limit Presto's CPU usage. (Slider also has some CPU resource sharing support.)

Apache Slider is a YARN application that deploys existing distributed applications on YARN, monitors them, and makes them larger or smaller as desired. Slider's objective is to make it easy for existing distributed applications, like Presto, to be deployed on a YARN cluster without changes and with little or no custom code.
Demo of presto-admin:
• Untar presto-admin and install
• ./presto-admin server install presto-server-rpm.rpm
• ./presto-admin server start
• Pause briefly so that the coordinator finds the workers
• ./presto-admin server status
• ./presto-admin configuration show
• cat hive.properties
• mv hive.properties /opt/prestoadmin/connectors
• ./presto-admin connector add hive
• ./presto-admin server restart
• Wait, then ./presto-admin server status
Presto CLI:
• ./presto --server localhost:8080 --catalog hive --schema default
• show tables;
• create table lineitem as select * from tpch.1gb.lineitem;
• select count(*) from lineitem;
