TechEvent Databricks on Azure

Trivadis
BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA
HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
Databricks on Azure
Spark on the Azure Cloud
Marco Amhof
Aron Weller
The Components and the history
Available Big Data Solutions on Azure
From CONTROL to EASE OF USE (reduced administration)
BIG DATA STORAGE
• Azure Data Lake Store
• Azure Storage
BIG DATA ANALYTICS
• IaaS Clusters: any Hadoop technology, any distribution (Azure Marketplace: HDP | CDH | MapR)
• Managed Clusters: workload optimized, managed clusters (Azure HDInsight)
• Big Data as-a-service: data engineering in a job-as-a-service model (Azure Data Lake Analytics)
• Azure Databricks: frictionless & optimized Spark clusters
Databricks Platform
What is Apache Spark
Apache Spark is a fast and general engine for large-scale data processing
• Created by UC Berkeley AMPLab
• The hot trend in Big Data!
• Based on 2007 Microsoft Dryad paper
• Written in Scala, supports Java, Python, SQL and R
• Can run programs up to 100x faster than Hadoop MapReduce in memory, or up to 10x faster on disk
• Runs everywhere – runs on Hadoop, Mesos, standalone or in the cloud
• Databricks is the main contributor and also the main commercial vendor of Spark
Apache Spark
• In-memory computing capabilities
deliver speed
• General execution model supports
wide variety of use cases
• Ease of development by native APIs
• Java
• Scala
• Python
• R
Spark Use Cases
• Ad hoc, exploratory, interactive
analytics
• Real-time + Batch Analytics
• Real-Time Machine Learning
• Real-Time Graph Processing
• Approximate, Time-bound queries
• …
• Databricks provides a performance-optimized Spark platform!
(Up to 5 times faster than open source Spark)
Alternative Approaches – Motivation
Data Sharing in Map Reduce …
[Figure: iterative jobs (iter. 1, iter. 2, ...) read their input from HDFS and write each result back to HDFS; interactive queries (query 1, query 2, query 3) each re-read the input from HDFS to produce result 1, result 2, result 3.]
Alternative Approaches – Motivation
What would be more efficient … this is what Spark offers!
[Figure: the input is loaded once (one-time processing) into distributed memory; query 1, query 2, query 3, ... then run against the in-memory data.]
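The difference between the two data-sharing patterns can be illustrated with a toy model in plain Python. This is only a sketch, not Spark code: `SlowStore`, `run_queries_reloading` and `run_queries_cached` are hypothetical names, and the read counter merely stands in for HDFS reads. In Spark itself, the comparable step would be calling `cache()` on a DataFrame or RDD before running several queries against it.

```python
from typing import Callable, List


class SlowStore:
    """Hypothetical stand-in for HDFS: counts how often the input is read."""

    def __init__(self, records: List[int]) -> None:
        self._records = records
        self.reads = 0

    def read(self) -> List[int]:
        self.reads += 1
        return list(self._records)


Query = Callable[[List[int]], int]


def run_queries_reloading(store: SlowStore, queries: List[Query]) -> List[int]:
    """MapReduce-style sharing: every query re-reads the input from storage."""
    return [query(store.read()) for query in queries]


def run_queries_cached(store: SlowStore, queries: List[Query]) -> List[int]:
    """Spark-style sharing: read once, keep the data in memory, reuse it."""
    in_memory = store.read()  # one-time processing
    return [query(in_memory) for query in queries]


queries: List[Query] = [sum, max, len]

reloading_store = SlowStore([3, 1, 4, 1, 5])
cached_store = SlowStore([3, 1, 4, 1, 5])

# Same results either way; only the number of storage reads differs.
print(run_queries_reloading(reloading_store, queries), reloading_store.reads)
print(run_queries_cached(cached_store, queries), cached_store.reads)
```

Both variants return the same answers; the cached variant touches storage once instead of once per query, which is the effect the slide attributes to distributed memory.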
Spark: A brief history
Apache Spark
Spark Unifies:
• Batch Processing
• Interactive SQL
• Real-time processing
• Machine Learning
• Deep Learning
• Graph Processing
A unified, open source, parallel data processing framework for Big Data Analytics
Spark Core Engine
• Spark SQL: interactive queries
• Spark Structured Streaming / Spark Streaming: stream processing
• Spark MLlib: machine learning
• GraphX: graph computation
• Schedulers: YARN, Mesos, Standalone Scheduler
Spark vs. specialized Systems in Hadoop Ecosystem
Spark and the Hadoop Ecosystem
Source: Spark in Action, Manning Press
Databricks Spark offers very high performance
Benchmarks have shown Databricks to often have better performance than alternatives
Databricks - Company Overview
• Founded in late 2013
• By the creators of Apache Spark,
original team from UC Berkeley
AMPLab
• Largest code contributor to
Apache Spark
• Provides support
• Provides certifications
• Main Product:
The Unified Analytics Platform
Databricks from Architecture View
Azure Databricks
• Azure Databricks is a first-party service on Azure.
Azure Databricks is integrated seamlessly with Azure
services:
• Azure Portal: The service can be launched directly from the Azure
Portal
• Azure Storage Services: Directly access data in Azure
• Azure Active Directory: For user authentication
• Azure SQL DW and Azure Cosmos DB: Enables you to
combine structured and unstructured data for analytics
• Apache Kafka for HDInsight: Enables you to use Kafka
as a streaming data source or sink
• Azure Billing: You get a single bill from Azure
• Azure Power BI: For rich data visualization
[Figure: Azure Databricks platform on Microsoft Azure]
• Collaborative Workspace (enhance productivity): data engineer, data scientist, business analyst
• Deploy Production Jobs & Workflows: multi-stage pipelines, job scheduler, notification & logs
• Optimized Databricks Runtime Engine: Apache Spark, Databricks I/O, serverless
• Sources: cloud storage, data warehouses, Hadoop storage, IoT / streaming data
• Outputs: REST APIs, machine learning models, BI tools, data exports, data warehouses
• Built on a secure & trusted cloud
Azure Databricks Core Components
Azure
Databricks
Business / custom apps
(Structured)
Logs, files and media
(unstructured)
Azure Storage
Polybase
Azure SQL Data Warehouse
Data Factory
Data Factory
Azure Databricks
(Spark)
Analytical dashboards
(PowerBI)
Ingest | Store | Prep & Train | Model & Serve | Intelligence
Modern Data Analytics Landscape
Azure Analysis Services
On Prem, Cloud Apps & Data
Extending Data Warehouses
Analytics on Big Data using Data Science
Realtime Analytics
Provisioning Azure Databricks
• Azure Databricks is provisioned directly from the Azure Portal like any other Azure service.
– The Azure Portal is a unified portal to provision and administer Azure Databricks as well as other Azure services.
• Any Azure user with the appropriate subscription and authorization can provision the Azure Databricks service.
– There is no need for a separate Databricks account.
Spark Machine Learning Overview
Offers a set of parallelized machine learning algorithms
Supports Model Selection (hyperparameter tuning) using Cross
Validation and Train-Validation Split.
Supports Java, Scala or Python apps using the DataFrame-based API.
Benefits include:
– A uniform API across ML algorithms and across multiple
languages
– Optimizations through Tungsten and Catalyst
Spark MLlib comes pre-installed on Azure Databricks
Supported 3rd-party libraries include H2O Sparkling Water, scikit-learn, XGBoost and more
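Model selection by cross-validation, as mentioned above, can be sketched in plain Python. This is not the MLlib API (in PySpark the corresponding classes are `CrossValidator` and `TrainValidationSplit` in `pyspark.ml.tuning`); the helper names, the one-parameter model and the data below are assumptions chosen only to keep the example small.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (feature x, label y)


def k_folds(data: Sequence[Point], k: int) -> List[Tuple[List[Point], List[Point]]]:
    """Build k (train, validation) pairs; fold i validates on every k-th point."""
    folds = []
    for i in range(k):
        validation = [p for j, p in enumerate(data) if j % k == i]
        train = [p for j, p in enumerate(data) if j % k != i]
        folds.append((train, validation))
    return folds


def fit_slope(train: Sequence[Point], penalty: float) -> float:
    """Ridge-style least squares for y = w * x; `penalty` is the hyperparameter."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + penalty)


def cv_error(data: Sequence[Point], k: int, penalty: float) -> float:
    """Mean squared validation error, averaged over the k folds."""
    errors = []
    for train, validation in k_folds(data, k):
        w = fit_slope(train, penalty)
        errors.append(mean((y - w * x) ** 2 for x, y in validation))
    return mean(errors)


def select_penalty(data: Sequence[Point], k: int, grid: Sequence[float]) -> float:
    """Pick the hyperparameter value with the lowest cross-validated error."""
    return min(grid, key=lambda penalty: cv_error(data, k, penalty))


# Made-up, noiseless data following y = 2x, so no penalty should win.
data = [(float(x), 2.0 * x) for x in range(1, 9)]
print(select_penalty(data, k=4, grid=[0.0, 1.0, 10.0]))
```

A train-validation split is the same idea with a single fold, which is cheaper but gives a noisier error estimate.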
MMLSpark
Microsoft Machine Learning Library for Apache Spark (MMLSpark) lets you
easily create scalable machine learning models for large datasets.
It includes integration of SparkML pipelines with the Microsoft Cognitive
Toolkit and OpenCV, enabling you to:
• Ingress and pre-process image data
• Featurize images and text using pre-trained deep learning models
• Train and score classification and regression models using implicit
featurization
Visualization inside Azure Databricks
Azure Databricks supports a number of visualization plots out of the box!
• All notebooks, regardless of their language, support Databricks visualizations.
• When you run the notebook, the visualizations are rendered in place inside the notebook.
• The visualizations are written in HTML.
• You can change the plot type just by picking from the selection.
Advanced Visualization using Power BI
Enables powerful visualization of data in Spark with Power BI
Power BI is a business analytics tool
that provides data visualization,
reports and dashboards throughout an
organization
Power BI Desktop can connect to
Azure Databricks clusters to query
data using the JDBC/ODBC server that
runs on the driver node.
Name Presentation28 9/26/2018
Benefits
Secure Collaboration
Azure Databricks enables secure collaboration between colleagues
Fine Grained Permissions
AAD-based User Authentication
Security
Databricks fully integrated with the Azure Platform
Active Directory
– Impersonation Information
– Network Security
– Storage Security
– etc.
No crossing network borders for connecting to sources
Encrypted end to end communication
Ease of use
Databricks fully integrated with the Azure Platform
One entry point, the Azure Portal
Simplified access to your data
Web browser-based notebook application
– Developers' playground
– Analysts environment
Pay only while you use
One billing point, Azure Platform
Data Access
Databricks fully integrated with the Azure Platform
Azure Storage
– Blob Storage
– Data Lake Store
Azure Databases
– Azure SQL Data Warehouse (relational)
– Cosmos DB (NoSQL)
Azure Event Hub
etc.
Performance
Databricks fully integrated with the Azure Platform
Direct access to the data
– Files
– Databases
– Virtual machines
Direct access to the event hub
Databricks optimized Spark
Azure performance
Demo
Problems solved
using Databricks on Azure
Complex Architecture
Using Spark on Azure is hard.
It needs a complex architecture to get and use the data
without losing the tight security of Azure.
Active Directory
Simplified Architecture with Databricks
Azure Databricks fully integrates with the Azure Platform.
• Connect to the data directly with tight security.
• Easy to maintain.
Active Directory
Use Cases
Computer Science
Powerhouse as base platform for data science
Efficient machine learning libraries included
– simply import the MLlib libraries and use them
Use what you know best
– use R, Python, Scala, Java or even SQL for your analysis
Power to fit
– use clusters to speed up training of your models
Marketing
Get more information in the 360 degree customer view
Gain deeper insights in social media
– aggregated, filtered and weighted
See trends faster
– a wider field of information sources
Spot potential customers you would otherwise miss
– using graph technology to identify them in a large pool of potential customers
Industry
Optimize your productivity, efficiency and machine maintenance
Leverage sensor data using data science
– predictive maintenance on machines
– potential for higher throughput on machines
Access to a wider range of information sources
– identify new markets based on recent customer segments
– faster time to market
Q&A
Mixing Languages in Notebooks
Normally a notebook is associated with a specific language.
However, with Azure Databricks notebooks, you can mix multiple languages in
the same notebook.
This is done using the language magic command:
• %python Allows you to execute Python code in a notebook
• %sql Allows you to execute SQL code in a notebook
• %r Allows you to execute R code in a notebook
• %scala Allows you to execute Scala code in a notebook
• %sh Allows you to execute shell code in a notebook
• %fs Allows you to use Databricks Utilities (dbutils) filesystem commands
• %md Allows you to include rendered Markdown
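How a notebook could route a cell based on its leading magic command can be sketched in a few lines of Python. This is a hypothetical illustration only: `MAGICS` and `split_cell` are made-up names, and the slides do not describe how Databricks implements the dispatch internally.

```python
from typing import Tuple

# Magic commands from the list above, mapped to a short cell-type label.
MAGICS = {
    "%python": "python",
    "%sql": "sql",
    "%r": "r",
    "%scala": "scala",
    "%sh": "shell",
    "%fs": "dbutils filesystem",
    "%md": "markdown",
}


def split_cell(cell: str, default_language: str = "python") -> Tuple[str, str]:
    """Return (cell type, body) for a notebook cell.

    A cell whose first token is a known magic command is routed to that
    cell type; any other cell falls back to the notebook's default language.
    """
    parts = cell.lstrip().split(None, 1)
    first_token = parts[0] if parts else ""
    if first_token in MAGICS:
        body = parts[1] if len(parts) > 1 else ""
        return MAGICS[first_token], body
    return default_language, cell


print(split_cell("%sql\nSELECT 1"))  # routed to SQL
print(split_cell("x = 1"))           # falls back to the default language
```

Only the first token of the cell is inspected, so a `%sql` that appears later inside a Python string is left alone.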
Spark Structured Streaming Comparison
Spark - Benefits
Performance
Using in-memory computing, Spark is considerably
faster than Hadoop (100x in some tests).
Can be used for batch and real-time data processing.
Developer Productivity
Easy-to-use APIs for processing large datasets.
Includes 100+ operators for transforming data.
Ecosystem
Spark has built-in support for many data sources,
rich ecosystem of ISV applications and a large dev
community.
Available on multiple public clouds (AWS, Google
and Azure) and in multiple on-premises distributions
Unified Engine
Integrated framework includes higher-level libraries
for interactive SQL queries, Stream Analytics, ML and
graph processing.
A single application can combine all types of
processing.
METHOD AND SYSTEM FOR PREDICTING OPTIMAL LOAD FOR WHICH THE YIELD IS MAXIMUM ...METHOD AND SYSTEM FOR PREDICTING OPTIMAL LOAD FOR WHICH THE YIELD IS MAXIMUM ...
METHOD AND SYSTEM FOR PREDICTING OPTIMAL LOAD FOR WHICH THE YIELD IS MAXIMUM ...Prity Khastgir IPR Strategic India Patent Attorney Amplify Innovation
25 visualizações9 slides
[2023] Putting the R! in R&D.pdf por
[2023] Putting the R! in R&D.pdf[2023] Putting the R! in R&D.pdf
[2023] Putting the R! in R&D.pdfEleanor McHugh
38 visualizações127 slides
Tunable Laser (1).pptx por
Tunable Laser (1).pptxTunable Laser (1).pptx
Tunable Laser (1).pptxHajira Mahmood
21 visualizações37 slides
The Research Portal of Catalonia: Growing more (information) & more (services) por
The Research Portal of Catalonia: Growing more (information) & more (services)The Research Portal of Catalonia: Growing more (information) & more (services)
The Research Portal of Catalonia: Growing more (information) & more (services)CSUC - Consorci de Serveis Universitaris de Catalunya
66 visualizações25 slides

Último(20)

The Importance of Cybersecurity for Digital Transformation por NUS-ISS
The Importance of Cybersecurity for Digital TransformationThe Importance of Cybersecurity for Digital Transformation
The Importance of Cybersecurity for Digital Transformation
NUS-ISS25 visualizações
Java Platform Approach 1.0 - Picnic Meetup por Rick Ossendrijver
Java Platform Approach 1.0 - Picnic MeetupJava Platform Approach 1.0 - Picnic Meetup
Java Platform Approach 1.0 - Picnic Meetup
Rick Ossendrijver25 visualizações
[2023] Putting the R! in R&D.pdf por Eleanor McHugh
[2023] Putting the R! in R&D.pdf[2023] Putting the R! in R&D.pdf
[2023] Putting the R! in R&D.pdf
Eleanor McHugh38 visualizações
Tunable Laser (1).pptx por Hajira Mahmood
Tunable Laser (1).pptxTunable Laser (1).pptx
Tunable Laser (1).pptx
Hajira Mahmood21 visualizações
Attacking IoT Devices from a Web Perspective - Linux Day por Simone Onofri
Attacking IoT Devices from a Web Perspective - Linux Day Attacking IoT Devices from a Web Perspective - Linux Day
Attacking IoT Devices from a Web Perspective - Linux Day
Simone Onofri15 visualizações
Web Dev - 1 PPT.pdf por gdsczhcet
Web Dev - 1 PPT.pdfWeb Dev - 1 PPT.pdf
Web Dev - 1 PPT.pdf
gdsczhcet52 visualizações
Business Analyst Series 2023 - Week 3 Session 5 por DianaGray10
Business Analyst Series 2023 -  Week 3 Session 5Business Analyst Series 2023 -  Week 3 Session 5
Business Analyst Series 2023 - Week 3 Session 5
DianaGray10165 visualizações
AMAZON PRODUCT RESEARCH.pdf por JerikkLaureta
AMAZON PRODUCT RESEARCH.pdfAMAZON PRODUCT RESEARCH.pdf
AMAZON PRODUCT RESEARCH.pdf
JerikkLaureta14 visualizações
Digital Product-Centric Enterprise and Enterprise Architecture - Tan Eng Tsze por NUS-ISS
Digital Product-Centric Enterprise and Enterprise Architecture - Tan Eng TszeDigital Product-Centric Enterprise and Enterprise Architecture - Tan Eng Tsze
Digital Product-Centric Enterprise and Enterprise Architecture - Tan Eng Tsze
NUS-ISS19 visualizações
Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu... por NUS-ISS
Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu...Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu...
Architecting CX Measurement Frameworks and Ensuring CX Metrics are fit for Pu...
NUS-ISS32 visualizações
Igniting Next Level Productivity with AI-Infused Data Integration Workflows por Safe Software
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Safe Software91 visualizações
Future of Learning - Yap Aye Wee.pdf por NUS-ISS
Future of Learning - Yap Aye Wee.pdfFuture of Learning - Yap Aye Wee.pdf
Future of Learning - Yap Aye Wee.pdf
NUS-ISS38 visualizações
SAP Automation Using Bar Code and FIORI.pdf por Virendra Rai, PMP
SAP Automation Using Bar Code and FIORI.pdfSAP Automation Using Bar Code and FIORI.pdf
SAP Automation Using Bar Code and FIORI.pdf
Virendra Rai, PMP19 visualizações
Spesifikasi Lengkap ASUS Vivobook Go 14 por Dot Semarang
Spesifikasi Lengkap ASUS Vivobook Go 14Spesifikasi Lengkap ASUS Vivobook Go 14
Spesifikasi Lengkap ASUS Vivobook Go 14
Dot Semarang35 visualizações
Perth MeetUp November 2023 por Michael Price
Perth MeetUp November 2023 Perth MeetUp November 2023
Perth MeetUp November 2023
Michael Price12 visualizações
Empathic Computing: Delivering the Potential of the Metaverse por Mark Billinghurst
Empathic Computing: Delivering  the Potential of the MetaverseEmpathic Computing: Delivering  the Potential of the Metaverse
Empathic Computing: Delivering the Potential of the Metaverse
Mark Billinghurst449 visualizações
handbook for web 3 adoption.pdf por Liveplex
handbook for web 3 adoption.pdfhandbook for web 3 adoption.pdf
handbook for web 3 adoption.pdf
Liveplex19 visualizações

TechEvent Databricks on Azure

  • 1. BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH Databricks on Azure Spark on the Azure Cloud Marco Amhof Aron Weller
  • 2. The Components and the history
  • 3. CONTROL EASE OF USE Azure Data Lake Analytics Azure Data Lake Store Azure Storage Any Hadoop technology, any distribution Workload optimized, managed clusters Data Engineering in a Job-as-a-service model Azure Marketplace HDP | CDH | MapR Azure Data Lake Analytics IaaS Clusters Managed Clusters Big Data as-a-service Azure HDInsight Frictionless & Optimized Spark clusters Azure Databricks BIG DATA STORAGE BIG DATA ANALYTICS Reduced Administration Available Big Data Solutions on Azure
  • 5. What is Apache Spark Apache Spark is a fast and general engine for large-scale data processing • Created by UC Berkeley AMPLab • The hot trend in Big Data! • Based on the 2007 Microsoft Dryad paper • Written in Scala, supports Java, Python, SQL and R • Can run programs over 100x faster than Hadoop MapReduce in memory, or more than 10x faster on disk • Runs everywhere – runs on Hadoop, Mesos, standalone or in the cloud • Databricks is the main contributor and also the main commercial vendor of Spark
  • 6. Apache Spark • In-memory computing capabilities deliver speed • General execution model supports wide variety of use cases • Ease of development by native APIs • Java • Scala • Python • R Spark Use Cases • Ad hoc, exploratory, interactive analytics • Real-time + Batch Analytics • Real-Time Machine Learning • Real-Time Graph Processing • Approximate, Time-bound queries • … Databricks provides a performance-optimized Spark platform! (Up to 5 times faster than open source Spark)
  • 7. Alternative Approaches – Motivation Data Sharing in Map Reduce … iter. 1 iter. 2 . . . Input HDFS read HDFS write HDFS read HDFS write Input query 1 query 2 query 3 result 1 result 2 result 3 . . . HDFS read
  • 8. iter. 1 iter. 2 . . . Input Alternative Approaches – Motivation What would be more efficient … this is what Spark offers! Distributed memory Input query 1 query 2 query 3 . . . one-time processing
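The contrast drawn on slides 7 and 8 can be sketched in plain Python. This is only an illustration of the data-sharing idea, not Spark or Hadoop code: `CountingStorage`, `run_with_reload` and `run_with_cache` are hypothetical names, and the "storage" is an in-process list that counts how often it is read in full.

```python
from typing import Callable, Dict, Iterable, List

Query = Callable[[List[int]], int]


class CountingStorage:
    """Hypothetical stand-in for a distributed file system; counts full reads."""

    def __init__(self, records: Iterable[int]) -> None:
        self._records = list(records)
        self.reads = 0

    def read(self) -> List[int]:
        self.reads += 1
        return list(self._records)


def run_with_reload(storage: CountingStorage, queries: Dict[str, Query]) -> Dict[str, int]:
    """MapReduce-style sharing: every query reads the input from storage again."""
    return {name: query(storage.read()) for name, query in queries.items()}


def run_with_cache(storage: CountingStorage, queries: Dict[str, Query]) -> Dict[str, int]:
    """In-memory sharing: read the input once, then reuse it for every query."""
    cached = storage.read()
    return {name: query(cached) for name, query in queries.items()}


queries: Dict[str, Query] = {
    "total": sum,
    "maximum": max,
    "even_count": lambda rows: sum(1 for r in rows if r % 2 == 0),
}

reload_storage = CountingStorage(range(1, 11))
cached_storage = CountingStorage(range(1, 11))

reload_results = run_with_reload(reload_storage, queries)
cached_results = run_with_cache(cached_storage, queries)

# Same answers either way; only the number of storage reads differs.
print(reload_results == cached_results)
print(reload_storage.reads, cached_storage.reads)
```

Both strategies return the same results, but the reload variant reads the input once per query while the cached variant reads it once in total. That saved I/O is the point of the "one-time processing" label on the slide.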
  • 9. Spark: A brief history
  • 10. Apache Spark Spark Unifies: • Batch Processing • Interactive SQL • Real-time processing • Machine Learning • Deep Learning • Graph Processing A unified, open source, parallel, data processing framework for Big Data Analytics Spark Core Engine Spark SQL Interactive Queries Spark Structured Streaming Stream processing Spark MLlib Machine Learning YARN Mesos Standalone Scheduler Spark MLlib Machine Learning Spark Streaming Stream processing GraphX Graph Computation
  • 11. Spark vs. specialized Systems in Hadoop Ecosystem
  • 12. Spark and the Hadoop Ecosystem Source: Spark in Action, Manning Press
  • 13. Databricks Spark offers very high performance Benchmarks have shown Databricks to often have better performance than alternatives
  • 14. Databricks - Company Overview • Founded in late 2013 • By the creators of Apache Spark, original team from UC Berkeley AMPLab • Largest code contributor to Apache Spark • Provides support • Provides certifications • Main Product: The Unified Analytics Platform
  • 16. Azure Databricks • Azure Databricks is a first-party service on Azure. Azure Databricks is integrated seamlessly with Azure services: • Azure Portal: Service can be launched directly from Azure Portal • Azure Storage Services: Directly access data in Azure • Azure Active Directory: For user authentication • Azure SQL DW and Azure Cosmos DB: Enables you to combine structured and unstructured data for analytics • Apache Kafka for HDInsight: Enables you to use Kafka as a streaming data source or sink • Azure Billing: You get a single bill from Azure • Azure Power BI: For rich data visualization Microsoft Azure
  • 17. Optimized Databricks Runtime Engine DATABRICKS I/O SERVERLESS Collaborative Workspace Cloud storage Data warehouses Hadoop storage IoT / streaming data REST APIs Machine learning models BI tools Data exports Data warehouses Azure Databricks Enhance Productivity Deploy Production Jobs & Workflows APACHE SPARK MULTI-STAGE PIPELINES DATA ENGINEER JOB SCHEDULER NOTIFICATION & LOGS DATA SCIENTIST BUSINESS ANALYST Build on secure & trusted cloud Azure Databricks
  • 18. Azure Databricks Core Components Azure Databricks
  • 19. Business / custom apps (Structured) Logs, files and media (unstructured) Azure Storage Polybase Azure SQL Data Warehouse Data Factory Data Factory Azure Databricks (Spark) Analytical dashboards (Power BI) Model & Serve Prep & Train Store Ingest Intelligence Modern Data Analytics Landscape Azure Analysis Services On Prem, Cloud Apps & Data
  • 21. Analytics on Big Data using Data Science
  • 23. Provisioning Azure Databricks • Azure Databricks is provisioned directly from the Azure Portal like any other Azure service • With Azure Databricks, the Azure Portal offers a unified portal to provision and administer Azure Databricks as well as other Azure services. • Any Azure user with the appropriate subscription and authorization can provision the Azure Databricks service. • There is no need for a separate Databricks account
  • 24. Spark Machine Learning Overview Offers a set of parallelized machine learning algorithms Supports Model Selection (hyperparameter tuning) using Cross Validation and Train-Validation Split. Supports Java, Scala or Python apps using DataFrame-based API. Benefits include: – A uniform API across ML algorithms and across multiple languages – Optimizations through Tungsten and Catalyst Spark MLlib comes pre-installed on Azure Databricks 3rd Party libraries supported include: H2O Sparkling Water, scikit-learn, XGBoost and more
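To make the two model-selection strategies on slide 24 concrete, here is a minimal plain-Python sketch of what cross validation and a train-validation split do conceptually. It is not Spark MLlib code and does not use its API: `train_validation_split`, `k_fold_splits`, `select_best` and the toy `evaluate` function are hypothetical helpers, and the "model" is a line whose slope plays the role of the hyperparameter being tuned.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (training row indices, validation row indices)


def train_validation_split(n_rows: int, train_ratio: float = 0.75, seed: int = 42) -> Split:
    """One random split: each candidate is trained and evaluated exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = int(n_rows * train_ratio)
    return indices[:cut], indices[cut:]


def k_fold_splits(n_rows: int, num_folds: int = 3, seed: int = 42) -> List[Split]:
    """k splits: every row lands in the validation set exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::num_folds] for i in range(num_folds)]
    splits = []
    for held_out in range(num_folds):
        train = [idx for i, fold in enumerate(folds) if i != held_out for idx in fold]
        splits.append((train, folds[held_out]))
    return splits


def select_best(param_grid: Sequence[float], splits: Sequence[Split],
                evaluate: Callable[[float, List[int], List[int]], float]) -> float:
    """Return the candidate with the highest mean validation score."""
    def mean_score(param: float) -> float:
        return mean(evaluate(param, train, validation) for train, validation in splits)
    return max(param_grid, key=mean_score)


# Toy data (an assumption for illustration): y = 2x + 1 without noise.
xs = [float(i) for i in range(12)]
ys = [2.0 * x + 1.0 for x in xs]


def evaluate(slope: float, train: List[int], validation: List[int]) -> float:
    """Fit the intercept on the training rows, score by negative MSE on validation."""
    intercept = mean(ys[i] - slope * xs[i] for i in train)
    errors = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in validation]
    return -mean(errors)


grid = [1.0, 2.0, 3.0]
cv_splits = k_fold_splits(len(xs), num_folds=3)
best_cv = select_best(grid, cv_splits, evaluate)
best_tvs = select_best(grid, [train_validation_split(len(xs))], evaluate)
print(best_cv, best_tvs)
```

Cross validation trains every candidate once per fold, so it costs roughly k times as much as a single train-validation split but uses every row for validation. The sketch shows only that trade-off; a real pipeline would delegate both strategies to the library.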
  • 25. MMLSpark Microsoft Machine Learning Library for Apache Spark (MMLSpark) lets you easily create scalable machine learning models for large datasets. It includes integration of SparkML pipelines with the Microsoft Cognitive Toolkit and OpenCV, enabling you to: • Ingress and pre-process image data • Featurize images and text using pre-trained deep learning models • Train and score classification and regression models using implicit featurization
  • 26. Visualization inside Azure Databricks Azure Databricks supports a number of visualization plots out of the box! • All notebooks, regardless of their language, support Databricks visualizations. • When you run the notebook, the visualizations are rendered in place inside the notebook • The visualizations are written in HTML. • You can change the plot type just by picking from the selection
  • 27. Advanced Visualization using Power BI Enables powerful visualization of data in Spark with Power BI Power BI is a business analytics tool that provides data visualizations, reports and dashboards throughout an organization Power BI Desktop can connect to Azure Databricks clusters to query data using the JDBC/ODBC server that runs on the driver node.
  • 29. Secure Collaboration Azure Databricks enables secure collaboration between colleagues Fine Grained Permissions AAD-based User Authentication
  • 30. Security Databricks fully integrated with the Azure Platform Active Directory – Impersonation Information – Network Security – Storage Security – etc. No crossing network borders for connecting to sources Encrypted end to end communication
  • 31. Ease of use Databricks fully integrated with the Azure Platform One entry point, the Azure Portal Simplified access to your data Web browser-based notebook application – Developers playground – Analysts environment Pay only while you use One billing point, Azure Platform
  • 32. Data Access Databricks fully integrated with the Azure Platform Azure Storage – Blob Storage – Data Lake Store Azure Databases – SQL Server Data Warehouse (Relational) – Cosmos DB (NoSQL) Azure Event Hub etc.
  • 33. Performance Databricks fully integrated with the Azure Platform Direct access to the data – Files – Databases – Virtual machines Direct access to the event hub Databricks optimized Spark Azure performance
  • 35. 9/26/2018 Problems solved using Databricks on Azure
  • 36. Complex Architecture Using Spark on Azure is hard. It needs a complex architecture to get and use the data without losing the tight security of Azure. Active Directory
  • 37. Simplified Architecture with Databricks Azure Databricks fully integrates with the Azure Platform. • Connect to the data directly with tight security. • Easy to maintain. Active Directory
  • 39. Computer Science Powerhouse as base platform for data science Efficient machine learning libraries included – simply import the MLlib libraries and use them Use what you know best – use R, Python, Scala, Java or even SQL for your analysis Power to fit – use clusters to speed up training of your models
  • 40. Marketing Get more information in the 360 degree customer view Gain deeper insights in social media – aggregated, filtered and weighted See trends faster – a wider field of information sources Spot potential customers you would otherwise miss – using graph technology to identify them in a large pool of potential customers
  • 41. Industry Optimize your productivity, efficiency and machine maintenance Leverage sensor data using data science – predictive maintenance on machines – potential for higher throughput on machines Access to a wider range of information sources – identify new markets based on recent customer segments – faster time to market
  • 44. Mixing Languages in Notebooks Normally a notebook is associated with a specific language. However, with Azure Databricks notebooks, you can mix multiple languages in the same notebook. This is done using the language magic command: • %python Allows you to execute Python code in a notebook • %sql Allows you to execute SQL code in a notebook • %r Allows you to execute R code in a notebook • %scala Allows you to execute Scala code in a notebook • %sh Allows you to execute shell code in your notebook. • %fs Allows you to use Databricks Utilities - dbutils filesystem commands. • %md To include rendered markdown
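The routing idea behind the magic commands on slide 44 can be sketched as a small dispatcher. This is a hypothetical illustration, not how Databricks implements notebooks: `MAGICS` and `split_cell` are made-up names, and the table name `events` and the path `/mnt/data` are placeholders.

```python
from typing import Dict, List, Tuple

# Magic commands listed on the slide, mapped to the kind of cell they mark.
MAGICS: Dict[str, str] = {
    "%python": "python",
    "%sql": "sql",
    "%r": "r",
    "%scala": "scala",
    "%sh": "shell",
    "%fs": "dbutils filesystem",
    "%md": "markdown",
}


def split_cell(cell: str, default_language: str = "python") -> Tuple[str, str]:
    """Return (language, body) for one notebook cell.

    A cell that starts with a known magic command is routed to that language;
    any other cell falls back to the notebook's default language.
    """
    stripped = cell.lstrip()
    if not stripped:
        return default_language, ""
    first_token = stripped.split(None, 1)[0]
    if first_token in MAGICS:
        return MAGICS[first_token], stripped[len(first_token):].strip()
    return default_language, cell


cells: List[str] = [
    "%sql\nSELECT COUNT(*) FROM events",
    "%md\n# Results",
    "print('no magic command, so the notebook default applies')",
    "%fs ls /mnt/data",
]

routed = [split_cell(cell) for cell in cells]
for language, body in routed:
    print(f"{language}: {body}")
```

In an actual notebook the magic command goes on the first line of the cell, and cells without one run in the notebook's default language.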
  • 46. Spark - Benefits Performance Using in-memory computing, Spark is considerably faster than Hadoop (100x in some tests). Can be used for batch and real-time data processing. Developer Productivity Easy-to-use APIs for processing large datasets. Includes 100+ operators for transforming. Ecosystem Spark has built-in support for many data sources, rich ecosystem of ISV applications and a large dev community. Available on multiple public clouds (AWS, Google and Azure) and multiple on-premises distributors Unified Engine Integrated framework includes higher-level libraries for interactive SQL queries, Stream Analytics, ML and graph processing. A single application can combine all types of processing.