An introduction to self-service data with Dremio. Dremio reimagines analytics for modern data. Created by veterans of open source and big data technologies, Dremio is a fundamentally new approach that dramatically simplifies and accelerates time to insight. Dremio empowers business users to curate precisely the data they need, from any data source, then accelerate analytical processing for BI tools, machine learning, data science, and SQL clients. Dremio starts to deliver value in minutes, and learns from your data and queries, making your data engineers, analysts, and data scientists more productive.
3. Tomer Shiran
Founder & CEO
Previously VP Product, MapR
Previously Microsoft; IBM
Jacques Nadeau
Founder & CTO
Recognized SQL & NoSQL expert
Founder of Apache Arrow
Kelly Stirman
CMO
Previously VP Strategy, MongoDB
Previously Field CTO, MarkLogic
Ajay Singh
Head of Field Engineering
Previously Technical Alliances, Hortonworks
Previously Alliances, MarkLogic
Ron Avnur
VP Engineering
Previously VP Product, MongoDB
Previously VP Engineering, MarkLogic
Collin Weitzman
VP Customer Success
Previously Sales Executive, Mesosphere
Previously Sales Executive, MapR
4. Dremio team members by area:
Columnar memory: Creators of Apache Arrow
Columnar storage: Creator of Apache Parquet
ETL: Tech lead of Twitter analytics data pipeline
UI: UI lead for Apple (iCloud Photos, iTunes U)
UX: UX lead for Splunk
World-Class Technical Team
Top Silicon Valley VCs
7. The demands for data are growing rapidly
Increasing demands:
Reporting
New products
Forecasting
Threat detection
BI
Machine learning
Segmenting
Fraud prevention
9.–11. Today you engineer data flows and reshaping
Data Staging → Data Warehouse → Cubes, BI Extracts & Aggregation Tables → SQL
Data Staging:
• Custom ETL
• Fragile transforms
• Slow moving
Data Warehouse:
• $$$
• High overhead
• Proprietary lock-in
Cubes, BI Extracts & Aggregation Tables:
• Data sprawl
• Governance issues
• Slow to update
13. There's a better way:
✓ Works with any data source
✓ Works with any BI tool
✓ No ETL, no data warehouse, no cubes
✓ Makes data self-service, collaborative
✓ Makes Big Data feel small
✓ Open source
16. Four key areas excite customers
1. BI on Modern Data: use any BI tool with Elasticsearch, MongoDB, S3, HDFS, plus joins to relational data (sketched below)
2. Autonomous Data Acceleration: make PB-scale queries fast, without cubes, aggregation tables, or ETL
3. Data Lineage: improve governance with a full view of access patterns, data flows, data reshaping, and sharing
4. Self-Service Data: empower IT and analysts to discover, curate, accelerate, and share data
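To make area 1 concrete, here is a minimal sketch of a cross-source join issued through Dremio from Python. It assumes the Dremio ODBC driver is installed and a DSN named "Dremio" is configured; the dataset paths (an Elasticsearch index and a Postgres table exposed as Dremio sources) are hypothetical.

```python
# A sketch of a join across a non-relational and a relational source,
# issued as one SQL statement through Dremio. All source and dataset
# names below are hypothetical.
import pyodbc

conn = pyodbc.connect("DSN=Dremio", autocommit=True)
cursor = conn.cursor()

# Dremio exposes every connected source through one SQL namespace, so the
# join reads like an ordinary two-table query.
cursor.execute(
    """
    SELECT c.region, COUNT(*) AS click_count
    FROM elastic.web.clicks AS e          -- Elasticsearch index
    JOIN postgres.crm.customers AS c      -- relational table
      ON e.customer_id = c.customer_id
    GROUP BY c.region
    """
)
for region, clicks in cursor.fetchall():
    print(region, clicks)
conn.close()
```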
BI assumes a single relational database, but…
Data in non-relational technologies
Data fragmented across many systems
Massive scale and velocity
Data is the business, and…
Era of impatient smartphone natives
Rise of self-service BI
Accelerating time to market
Because of the complexity of modern data and increasing demands for data, IT gets crushed in the middle:
Slow or non-responsive IT
“Shadow Analytics”
Data governance risk
Elusive data engineers
Immature software
Competing strategic initiatives
Here’s the problem everyone is trying to solve today.
You have consumers of data with their favorite tools: BI products like Tableau, Power BI, and Qlik, as well as data science tools like Python, R, Spark, and SQL.
Then you have all your data, in a mix of relational, NoSQL, Hadoop, and cloud like S3.
So how are you going to get the data to the people asking for it?
Here’s how everyone tries to solve it:
First you move the data out of the operational systems into a staging area, that might be Hadoop, or one of the cloud file systems like S3 or Azure Blob Store.
You write a bunch of ETL scripts to move the data. These are expensive to write and maintain, and they’re fragile – when the sources change, the scripts have to change too.
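To make that fragility concrete, here is a minimal sketch of the kind of hand-written extract script this stage accumulates. All database, table, and column names are hypothetical, with sqlite3 and a local path standing in for the operational source and the S3/HDFS staging area.

```python
# A fragile, hand-written extract script (hypothetical names throughout).
# sqlite3 stands in for the operational database; a local path stands in
# for the S3/HDFS staging area.
import csv
import sqlite3

SOURCE_DB = "orders.db"               # hypothetical operational database
STAGING_PATH = "staging/orders.csv"   # stand-in for an S3/HDFS staging path

# Hard-coding the column list is exactly what makes scripts like this
# fragile: when the source schema changes, the extract breaks or silently
# drops data until someone rewrites it.
COLUMNS = ["order_id", "customer_id", "amount", "created_at"]

def extract_and_stage():
    conn = sqlite3.connect(SOURCE_DB)
    try:
        rows = conn.execute(f"SELECT {', '.join(COLUMNS)} FROM orders")
        with open(STAGING_PATH, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(COLUMNS)   # header for downstream loads
            writer.writerows(rows)     # full re-extract on every run
    finally:
        conn.close()

if __name__ == "__main__":
    extract_and_stage()
```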
Then you move the data into a data warehouse. This could be Redshift, Teradata, Vertica, or other products. These are all proprietary, and they take DBA experts to make them work. And to move the data here you write another set of scripts.
But what we see with many customers is that the performance here isn’t sufficient for their needs, and so …
You build cubes and aggregation tables to get the performance your users are asking for. And to do this you build another set of scripts.
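As an illustration, here is a minimal sketch of one such aggregation script, again with hypothetical table names and sqlite3 standing in for the warehouse. Every new dashboard question tends to spawn another rollup like this, and each one goes stale until the script is re-run.

```python
# A sketch of the "yet another set of scripts" layer: precomputing an
# aggregation table so dashboards stay fast. sqlite3 stands in for the
# warehouse (Redshift, Teradata, ...); table names are hypothetical.
import sqlite3

warehouse = sqlite3.connect("warehouse.db")
with warehouse:
    warehouse.execute("DROP TABLE IF EXISTS daily_sales_agg")
    # Rebuild the rollup from scratch; until the next run, the
    # aggregates drift out of date.
    warehouse.execute(
        """
        CREATE TABLE daily_sales_agg AS
        SELECT DATE(created_at) AS sale_date,
               customer_id,
               SUM(amount)  AS total_amount,
               COUNT(*)     AS order_count
        FROM orders
        GROUP BY DATE(created_at), customer_id
        """
    )
warehouse.close()
```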
In the end you’re left with something like this picture. You may have more layers, the technologies may be different, but you’re probably living with something like this. And nobody likes this – it’s expensive, the data movement is slow, it’s hard to change.
But worst of all, you’re left with a dynamic where every time a consumer of the data wants a new piece of data:
They open a ticket with IT
IT begins an engineering project to build another set of pipelines, over several weeks or months
And so we started Dremio to say, hey, we think there’s a better way to do this.
And when we got started, we asked ourselves: what would we need to do to make this better? We came up with these requirements.
Works with any source: relational, non-relational, third-party apps. Five years ago nobody was using Hadoop, S3, or MongoDB, and five years from now there will be new products. You need a solution that is future-proof.
Works with any BI tool. In every company multiple tools are in use, and each department has its favorite. We need to work with all of them.
No ETL, data warehouse, or cubes. It would need to give you a genuinely good alternative to all of these.
Makes data self-service, collaborative. Probably most important of all, we need to change the dynamic between the business and IT. We need to make it so business users can get the data they want, in the shape they want it, without waiting on IT.
Makes Big Data feel small. It needs to make billions of rows feel like a spreadsheet on your desktop.
Open source. It’s 2017, so we think this has to be open source.
And that’s Dremio. It sits between all the places you’re creating or capturing data, and all the tools you use to access data. At a high level, that’s how Dremio works. We’ll get into how it works a little later.
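For a feel of what "sitting between" looks like from a client, here is a minimal sketch of submitting SQL to Dremio over its REST interface. The endpoint paths and payload shapes (login on /apiv2/login, query submission on /api/v3/sql, the "_dremio" token prefix) are written from memory of Dremio's documentation and should be verified for your version; the host and credentials are placeholders.

```python
# Submitting SQL to Dremio over HTTP (endpoints from memory of Dremio's
# REST API; verify against the docs for your version).
import json
import urllib.request

BASE = "http://localhost:9047"   # Dremio coordinator's default HTTP port

def post(path, payload, token=""):
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    if token:
        # Dremio expects the session token prefixed with "_dremio"
        req.add_header("Authorization", "_dremio" + token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Placeholder credentials; returns a session token.
token = post("/apiv2/login", {"userName": "admin", "password": "secret"})["token"]
# Submits the query and returns a job id to poll for results.
job = post("/api/v3/sql", {"sql": "SELECT 1"}, token)
print(job)
```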
To go one level deeper, Dremio is a distributed process that you run on anywhere from one to 1,000+ servers. You can run it on dedicated infrastructure, like you see on the left, or in your Hadoop cluster, provisioned and managed via YARN.
OK, enough with the pictures, let’s get into the demo.
But one quick question for you.
Dremio is a big product, and there are lots of things we could show you, but it would be great to get a little guidance on how to spend our time.
When we show Dremio to customers, they tend to get excited by four key areas: