SQLSaturday #230 Rheinland
Sascha Dittmann
Software Architect & Developer, Ernst & Young GmbH
www.sascha-dittmann.de
Georg Urban
Snr. Technology Solution Professional | Data Platform
georg.urban@microsoft.com
13.07.2013
HIVE IN A NUTSHELL
Hadoop & Business Intelligence
• Hadoop is great for storing & processing *large* amounts of data
• (but) Map/Reduce jobs are fairly low-level
• (most) BI tools rely on relational or multidimensional data sources and on declarative languages such as SQL, MDX, or DAX
?
The Hive Project
• Hive was started at Facebook (2008)
• Goal: empower business users to query Hadoop clusters with standard tools & SQL
• Famous paper at the VLDB conference 2009
• In 2009, 700 TB of data already "lived" in Hive at Facebook: 5,000 queries a day from over 100 users
Hive is a "data warehouse" for Hadoop!
(a system for managing data structures, built on top of Hadoop)
http://www.vldb.org/pvldb/2/vldb09-938.pdf
Hive architecture
• Query language: HiveQL (subset of SQL)
• Uses Map/Reduce for execution
• Rule-based optimizer
[Architecture diagram: Command Line Interface, Web Interface, Thrift Server (JDBC, ODBC), Driver (Compiler, Optimizer, Executor), Metastore]
Performance is an issue:
the Hortonworks Stinger initiative aims for "human-time use cases"
Hive concepts
• Well known: databases | tables | rows & columns
• Table = (file or) directory, e.g. twitter_feeds -> /user/hive/warehouse/twitter_feeds
• Storage: ORC (Optimized Row Columnar), Textfile, RCFile (Record Columnar File), etc.
• Primitive types: integer, float, string, date, boolean
• …plus arrays, maps, user-defined types
• Partitions = subdirectories
• Indexes = data subsets or bitmaps
• HiveQL: SELECT…FROM…WHERE (incl. joins, aggregates, UNION ALL, subqueries)
• Can embed M/R scripts
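The "table = directory, partition = subdirectory" idea can be sketched in a few lines of Python. This is only an illustration of the directory layout, not Hive code: the function name and the partition columns `dt` and `lang` are hypothetical, and the warehouse root is the one from the twitter_feeds example above.

```python
def hive_partition_path(table, partition, warehouse="/user/hive/warehouse"):
    """Return the directory that would hold one partition of a Hive table.

    `partition` is a list of (column, value) pairs in partition-column order.
    Each pair becomes one `column=value` subdirectory below the table directory.
    """
    parts = [f"{column}={value}" for column, value in partition]
    return "/".join([warehouse, table] + parts)


# Unpartitioned table: just the table directory.
table_dir = hive_partition_path("twitter_feeds", [])
# Hypothetical partition columns dt and lang: one subdirectory per column.
partition_dir = hive_partition_path(
    "twitter_feeds", [("dt", "2013-07-13"), ("lang", "de")]
)
print(table_dir)
print(partition_dir)
```

A query that filters on a partition column only has to read the matching subdirectories, which is why partitioning is a cheap first step for pruning data.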
HiveQL Example
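-- Define a table over space-delimited W3C log files
CREATE TABLE logdata (
  logdate string,
  logtime string,
  …
  time_taken int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';

-- Move the files from the HDFS path into the table, replacing its current contents
LOAD DATA INPATH '/w3c/input/data' OVERWRITE INTO TABLE logdata;

-- Preview the first 200 rows
SELECT logdate, logtime, time_taken FROM logdata LIMIT 200;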
Working with Hive
BIG DATA & BI
HDInsight in Action: Managing Big Data
Enriching Big Data in PowerPivot
Big Data Mashup with Power View = Insights
BI on HDInsight
Real World Big Data
• Yahoo!: 180 PB of raw (polystructured) data on more than 40,000 computers*
• Biggest Hadoop cluster: 4,500 nodes (2x4 CPUs, 4x1 TB disks, 16 GB RAM)
• Page impressions:
  • Cube with 207 measures | 24 dimensions | 247 attributes
  • Desktop clients (MS & Tableau): < 6 s ad hoc query time
http://wiki.apache.org/hadoop/PoweredBy
Real Life Example (Sensor Data Analytics)
[Architecture diagram with three stages: Longterm Storage & Preprocessing | Integration, Analysis & Persistence | Publishing & Collaboration]
[Components shown: XML, structured & unstructured files; preprocessing (e.g. C24, …); HDInsight with Hive; Integration Services; Database Service; ERP & other DBs; Analysis Services; analytical & DM tools (R, SPSS, MS Data Mining, …); SharePoint; self-service analytical applications; Power View; Excel & PowerPivot; browsers; 3rd-party and mobile clients; more Microsoft tools (Reporting Services, PerformancePoint, …)]
SQL SERVER PARALLEL DATA WAREHOUSE
Parallel Data Warehouse Concepts
PDW V1 (reference): the basic full rack
• Control Node, Mgmt. Node, Landing Zone, Backup Node, Compute Nodes, Storage
• Infiniband & Ethernet, Fiber Channel
• 160 cores on 10 compute nodes
• 1.28 TB of RAM on compute
• Up to 30 TB of tempdb
• Up to 150 TB of user data
• Estimated total HW list price: $1MM
PDW V2: 10x faster & 50% lower capital cost
• Control Node, Compute & Storage
• Infiniband & Ethernet
• 70% more disk I/O bandwidth
• 128 cores on 8 compute nodes
• 2 TB of RAM on compute
• Up to 168 TB of tempdb
• Up to 1 PB of user data
• Estimated total HW list price: $500K
SQL Server 2012 Parallel Data Warehouse
• Up- & downscale
• 2-56 compute nodes
• Uniform, standardized nodes: 256 GB
• 1-6 racks
• Compute & storage nodes are VMs
• Simple management
• Hardware abstraction
• Different workloads
• ColumnStore v2 storage
• High compression
• Updatable for incremental loads
Development Goals
FDR
Infiniband
Direct attached SAS
Hardware Architecture
Start small & grow
Dynamic Scale Up
• Start small with a warehouse of a few terabytes
• Add capacity up to 5 petabytes
• Grow in increments of 2-3 compute nodes plus storage
• No downtime
[Scale graphic: enterprise warehouse from 0 TB to 5 PB]
Rack 1 (Control Node, Failover Node, Compute Nodes 1-8, JBOD 1-4, Infiniband & Ethernet)
• Base Unit (6U): redundant Infiniband, redundant Ethernet, Mgmt & Control (active), rack failover node (passive)
• Base Unit (7U): 2 HP 1U servers (16 cores each, 32 total), 5U JBOD, 1 TB drives, user data capacity 75 TB
• 3 x Scale Unit (7U): 2 HP 1U servers (16 cores each, 32 total), 5U JBOD, 1 TB drives, user data capacity 75 TB
• Customer Space (8U): ETL servers, backup servers, passive unit (additional spares)
• Raw capacity: ¼ rack 15 TB, ½ rack 30 TB, full rack 60 TB
Rack 2 (Failover Node, Compute Nodes 9-16, JBOD 5-8, Infiniband & Ethernet)
• Extension Base Unit (5U): redundant Infiniband, redundant Ethernet, rack failover node (passive)
• Extension Base Unit (7U): 2 HP 1U servers (16 cores each, 32 total), 5U JBOD, 1 TB drives, user data capacity 75 TB
• 3 x Scale Unit (7U): 2 HP 1U servers (16 cores each, 32 total), 5U JBOD, 1 TB drives, user data capacity 75 TB
• Customer Space (9U): ETL servers, backup servers, passive unit (additional spares)
• Raw capacity: 1¼ racks 75.5 TB, 1½ racks 90.6 TB, 2 racks 120.8 TB
Rack 3 (Failover Node, Compute Nodes 17-24, JBOD 9-12, Infiniband & Ethernet)
• Extension Base Unit (5U): redundant Infiniband, redundant Ethernet, rack failover node (passive)
• Extension Base Unit (7U): 2 HP 1U servers (16 cores each, 32 total), 5U JBOD, 1 TB drives, user data capacity 75 TB
• 3 x Scale Unit (7U): 2 HP 1U servers (16 cores each, 32 total), 5U JBOD, 1 TB drives, user data capacity 75 TB
• Customer Space (9U): ETL servers, backup servers, passive unit (additional spares)
• Raw capacity: 3 racks 181.2 TB
HP Configuration
• 2-56 compute nodes
• 1-7 racks
• 1, 2, or 3 TB drives
• 15.1-1268.4 TB raw
• 53-6342 TB user data
• Up to 7 spare nodes available across the entire appliance
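The raw-capacity figures for the HP configuration follow a simple pattern: each unit of two compute nodes plus one JBOD contributes 15.1 TB raw with 1 TB drives, and larger drives scale that linearly. The following Python sketch reproduces the numbers quoted on the slides under that assumption. The function and constant names are hypothetical, and the user-data figures are deliberately left out because they depend on a compression assumption the slides do not spell out.

```python
RAW_TENTHS_TB_PER_UNIT = 151  # 15.1 TB raw per unit with 1 TB drives, kept in tenths of a TB
NODES_PER_UNIT = 2            # one base or scale unit = 2 compute nodes + 1 JBOD


def raw_capacity_tb(compute_nodes, drive_tb=1):
    """Raw capacity in TB for an HP-style appliance, derived from the slide figures."""
    if compute_nodes % NODES_PER_UNIT or not 2 <= compute_nodes <= 56:
        raise ValueError("expected an even number of compute nodes between 2 and 56")
    if drive_tb not in (1, 2, 3):
        raise ValueError("expected 1, 2, or 3 TB drives")
    units = compute_nodes // NODES_PER_UNIT
    # Integer arithmetic in tenths of a TB avoids floating-point drift.
    return RAW_TENTHS_TB_PER_UNIT * units * drive_tb / 10


for nodes, drives in [(2, 1), (10, 1), (16, 1), (24, 1), (56, 3)]:
    print(f"{nodes:2d} compute nodes, {drives} TB drives: {raw_capacity_tb(nodes, drives)} TB raw")
```

The smallest and largest results match the "15.1-1268.4 TB raw" range, and the values for 10, 16, and 24 nodes match the 1¼-rack, 2-rack, and 3-rack labels.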
• VMs for different workloads (e.g. HDInsight zone)
• Storage Spaces manage the physical disks on the JBOD(s)
• 33 logical mirrored drives (66 drives & 4 hot spares)
• Cluster Shared Volumes (CSV) allow all nodes to access the LUNs on the JBOD
• One cluster across the whole appliance
• VMs are automatically migrated on failure
[Diagram: Hosts 0-3 and a shared JBOD; VMs: CTL, MAD, FAB, AD, VMM, Compute 1, Compute 2]
Agility Due to Virtualization
* 3 nodes per JBOD in Dell configuration
xVelocity Columnstore as primary Storage
[Diagram: table columns C1 to C6 stored as separate column segments (T.C1 to T.C4)]
Better IO & Caching
• Columns are stored independently
• Early segment elimination
• Aggressive read-ahead
Memory Optimization
• New Memory Broker
• Segments are loaded when needed
• …and stay as long as possible
Batch Mode
• Max. parallelism
• Approx. 1,000 values per kernel
• CPU time is reduced by a factor of 7 to 40
[Diagram: SELECT Region, SUM(Sales) … touches only T.C2 and T.C4; a batch object holds the column vectors and a bitmap of qualified rows]
In Memory Columnstore Index
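The batch-mode idea (column vectors plus a bitmap of qualified rows, processed roughly 1,000 values at a time) can be illustrated with a small Python sketch. It is a conceptual model only, not how SQL Server implements batch mode: the function name, the sample data, and the filter predicate are all assumptions, and plain Python will not show the CPU savings quoted on the slide.

```python
from collections import defaultdict

BATCH_SIZE = 1000  # roughly 1,000 values per batch, as on the slide


def sum_sales_by_region(region, sales, qualifies, batch_size=BATCH_SIZE):
    """Model of SELECT Region, SUM(Sales) ... WHERE <predicate>, evaluated batch by batch."""
    totals = defaultdict(int)
    for start in range(0, len(region), batch_size):
        # A batch holds a slice of each needed column vector ...
        region_vec = region[start:start + batch_size]
        sales_vec = sales[start:start + batch_size]
        # ... plus a bitmap marking which rows of the batch qualify.
        bitmap = [qualifies(value) for value in sales_vec]
        for r, s, keep in zip(region_vec, sales_vec, bitmap):
            if keep:
                totals[r] += s
    return dict(totals)


# Hypothetical data: 3,000 rows, i.e. three batches; only two columns are read.
region = ["North", "South"] * 1500
sales = list(range(3000))
result = sum_sales_by_region(region, sales, lambda s: s >= 2000)
print(result)
```

Only the two columns the query references are touched, which mirrors the point that a columnstore reads T.C2 and T.C4 and leaves the other columns on disk.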
Compression Rates in the Demo
[Bar chart, sizes in GB: raw data 82 | page compression 34.2 | backup compression 22.9 | columnstore index (disk) 11.2 | columnstore index (memory) 4]
Compression ratio for *this* data: approx. 20
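The ratio of about 20 can be checked from the chart values. The short Python sketch below assumes the five numbers map to the five chart labels in the order they appear (raw data, page compression, backup compression, columnstore index on disk, columnstore index in memory); the variable names are hypothetical.

```python
# Sizes in GB as read from the demo chart (assumed label order).
sizes_gb = {
    "raw data": 82,
    "page compression": 34.2,
    "backup compression": 22.9,
    "columnstore index (disk)": 11.2,
    "columnstore index (memory)": 4,
}

raw = sizes_gb["raw data"]
# Compression ratio = raw size divided by compressed size, rounded to one decimal.
ratios = {name: round(raw / size, 1) for name, size in sizes_gb.items()}

for name, ratio in ratios.items():
    print(f"{name:28s} {ratio:5.1f}x")
```

Under that assumption the in-memory columnstore index comes out at 20.5x, which matches the "approx. 20" on the slide. As the slide stresses, the ratio is specific to the demo data.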
Columnstore: The next Generation
• Columnstore becomes the primary data structure (clustered index)
• No need for a base table
• Allows updates & deletes (temporary row store)
• Easy data management
• Improvements:
  • Supports all (reasonable) data types
  • Supports more query operators
  • Statistics on partitioned tables
PDW v2 & SQL Server 2014
Microsoft BI Stack Connectivity
Targeted Driver SNAC11 SNAC10 .NET (sqlclient)
OLEDB ODBC OLEDB ODBC
PDW Tool 32 bit 64 bit 32 bit 64 bit 32 bit 64 bit 32 bit 64 bit 32 bit 64 bit
SSRS/Reporting Services
SSRS - SS2012 (report builder and SSDT) X X X X X X
SSRS - SS2008 R2 (report builder) X n/a X n/a X n/a
SSAS/Analysis Services
SSAS – SS2012 X X X X X X X X X X
SSAS – SS2008 R2 X X X X X X
Linked Server (DQ)
DQ – SS2008 X X X X
SSIS/Integration Services
SSIS - SS2012 X X X X X X X X X X
SSIS - SS2008 R2 X X X X X X
PowerPivot for Excel n/a n/a X X n/a n/a X X
Power View
SS2012 w/ and w/o direct query X X X X X X
MS BI Direct Query
Excel X X X X n/a n/a
Direct Query n/a n/a X X n/a n/a
Access n/a n/a X X n/a n/a
Master Data Services
SS2012 X X X X X X
SS2008 X X X X
Quality Services
SS2012 X X X X X X
SS2008 X X X X
Monitoring
• Built-in monitoring via GUI or Dynamic Management Views (DMVs)
• System Center Management Packs for PDW
Simple Resource Management
• Pre-built resource classes in PDW
• Resource class =
  • PDW concurrency slots in use
  • Memory utilization
  • Priority
• DBA controls how requests are mapped to resource classes
• PDW honors the resource class at run-time
Resource classes (concurrency slots in use | memory on V1 HW | memory on V2 HW | priority):
• 1 slot | ~200 MB | ~400 MB | Medium
• 3 slots | ~600 MB | ~1.2 GB | Medium
• 7 slots | ~1.4 GB | ~2.8 GB | High
• 21 slots | ~4.2 GB | ~8.4 GB | High
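The memory figures scale linearly with the number of concurrency slots: about 200 MB per slot on V1 hardware and about 400 MB per slot on V2. A minimal Python sketch of that relationship follows. The per-slot values are inferred from the one-slot class on the slide, and the names are hypothetical rather than PDW identifiers.

```python
# Approximate memory per concurrency slot, inferred from the one-slot class.
MB_PER_SLOT = {"V1": 200, "V2": 400}


def class_memory_mb(slots, hardware):
    """Approximate memory grant in MB for a resource class using `slots` slots."""
    return slots * MB_PER_SLOT[hardware]


for slots in (1, 3, 7, 21):
    v1 = class_memory_mb(slots, "V1")
    v2 = class_memory_mb(slots, "V2")
    print(f"{slots:2d} slots: V1 ~{v1} MB, V2 ~{v2} MB")
```

The trade-off is visible in the numbers: a request in the 21-slot class gets 21 times the memory of a one-slot request, but it also occupies 21 slots that other requests cannot use while it runs.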
Improve T-SQL Parity
T-SQL additions to increase compatibility with:
• SQL Server Data Tools
• Microsoft BI tools
• 3rd-party tools, like Tableau
Dedicated PDW tools are deprecated
Catalog SP examples: sp_tables_rowset;2, sp_catalogs_rowset, sp_executesql
Built-in function examples: db_id, db_name, object_id
General T-SQL improvement examples: cross/outer apply, sp_prepare, sp_execute
Configuration functions: @@LANGUAGE, @@SPID
SET options: SET ROWCOUNT, SET FMTONLY
POLYBASE
Hadoop / Big Data Integration: Microsoft
• T-SQL query engine for RDBMS & Hadoop
• Cost-based optimizer decides between:
  • moving HDFS data into RDBMS storage, or
  • rendering operators as Map/Reduce jobs
• HDFS Bridge for parallelized data transport
[Diagram labels: Regular T-SQL, Results, PDW V2, HDFS Data Nodes]
• External tables are mapped to HDFS files
• Fields in the file are defined as columns in the PDW external table
• File characteristics are also provided during definition
• This works for HDInsight, Hortonworks HDP & Cloudera
-- Map a pipe-delimited HDFS file to a PDW external table
CREATE EXTERNAL TABLE ClickEvent
(
  url varchar(50),
  event_date date,
  user_IP varchar(50)
)
WITH
(
  LOCATION = 'hdfs://MyHadoop:5000/clickstream/click.txt',
  FORMAT_OPTIONS (FIELD_TERMINATOR = '|')
);
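What the external table definition expresses, "fields in the file become typed columns", can be sketched in Python. This is an illustration of the mapping only, not PolyBase code: the sample line, the ISO date format, and the names `ClickEvent` and `parse_click` are assumptions.

```python
from datetime import date
from typing import NamedTuple


class ClickEvent(NamedTuple):
    """Typed view of one line of the file, mirroring the external table columns."""
    url: str
    event_date: date
    user_ip: str


def parse_click(line, field_terminator="|"):
    """Split one delimited line into the three declared columns."""
    url, event_date, user_ip = line.rstrip("\n").split(field_terminator)
    # Assumes dates are stored as YYYY-MM-DD in the file.
    return ClickEvent(url, date.fromisoformat(event_date), user_ip)


row = parse_click("http://example.com/home|2013-07-13|10.0.0.1\n")
print(row)
```

The file itself stays untouched in HDFS; the schema is applied only when the data is read, which is why the field terminator has to be part of the table definition.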
External Tables
Hadoop Integration
Polybase: Creating an external Table
• Familiar tooling: SSMS & Data Tools
Polybase: A really simple Query Example
• Here: the external data movement is ExternalRoundRobinMove
• Parallel HDFS readers run on every data node (e.g. 10 nodes with 8 threads each)
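As a rough illustration of the two ideas on this slide, the Python sketch below computes the number of parallel readers for the example figures and spreads incoming rows round-robin across nodes. It shows the generic round-robin technique only. How ExternalRoundRobinMove works internally is not described in the slides, and all names here are hypothetical.

```python
def round_robin(rows, node_count):
    """Distribute rows across nodes in turn: row i goes to node i % node_count."""
    buckets = [[] for _ in range(node_count)]
    for i, row in enumerate(rows):
        buckets[i % node_count].append(row)
    return buckets


NODES = 10
THREADS_PER_NODE = 8
parallel_readers = NODES * THREADS_PER_NODE  # readers working at the same time

buckets = round_robin(range(25), NODES)
bucket_sizes = [len(bucket) for bucket in buckets]
print(parallel_readers)
print(bucket_sizes)
```

Round-robin placement needs no knowledge of the data, so it keeps the nodes evenly loaded when the external rows carry no usable distribution key.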
BI & Big Data Solution with SQL PDW v2
Single Query; Structured and Unstructured
• Query and join Hadoop tables with relational tables
• Use standard SQL language
• Existing SQL skillset
• No IT intervention
• Save time and costs
• Analyze all data types
[Diagram: SQL against SQL Server 2012 PDW, powered by PolyBase, spanning the database and HDFS (Hadoop)]
PolyBase: Breakthrough in Data Processing
Resources
• SQL Server CAT-Blog
http://sqlcat.com
• bwin Case Study
http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000001470
• Microsoft Big Data Site
http://www.microsoft.com/en-us/sqlserver/solutions-technologies/business-intelligence/big-data.aspx
• Introduction to Hadoop on Windows Azure
http://channel9.msdn.com/Events/windowsazure/learn/Introduction-to-Hadoop-on-Windows-Azure
• SQL Server Team Blog
http://blogs.technet.com/b/dataplatforminsider
• Microsoft YouTube Big Data Channel
http://www.youtube.com/playlist?list=PLD471EE01A293CC34
• TechEd Sessions
http://channel9.msdn.com/Events/TechEd
• Microsoft Connect (Product Feedback)
http://connect.microsoft.com
Many thanks to the volunteers!
Big prize draw!
• At the end of the event (approx. 6:00 pm)
• Win lots of prizes!
• So: visit our sponsors!
Our "You Rock!" sponsors
Many thanks to all our sponsors!
Gold
Silver
Bronze
Media sponsors:
Hands-on event: PASS Camp 2013!

More Related Content

What's hot

What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016James Serra
 
Azure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeAzure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeRick van den Bosch
 
Azure Data services
Azure Data servicesAzure Data services
Azure Data servicesRajesh Kolla
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Michael Rys
 
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...Insight Technology, Inc.
 
Azure for Data Platform
Azure for Data PlatformAzure for Data Platform
Azure for Data PlatformMariano Kovo
 
Accelerating Business Intelligence Solutions with Microsoft Azure pass
Accelerating Business Intelligence Solutions with Microsoft Azure   passAccelerating Business Intelligence Solutions with Microsoft Azure   pass
Accelerating Business Intelligence Solutions with Microsoft Azure passJason Strate
 
Deep Dive DMG (september update)
Deep Dive DMG (september update)Deep Dive DMG (september update)
Deep Dive DMG (september update)Jean-Pierre Riehl
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overviewJames Serra
 
Qubole - Big data in cloud
Qubole - Big data in cloudQubole - Big data in cloud
Qubole - Big data in cloudDmitry Tolpeko
 
The Future of Postgres Sharding / Bruce Momjian (PostgreSQL)
The Future of Postgres Sharding / Bruce Momjian (PostgreSQL)The Future of Postgres Sharding / Bruce Momjian (PostgreSQL)
The Future of Postgres Sharding / Bruce Momjian (PostgreSQL)Ontico
 
Gab document db scaling database
Gab   document db scaling databaseGab   document db scaling database
Gab document db scaling databaseMUG Perú
 
Azure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake AnalyticsAzure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake AnalyticsWaqas Idrees
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAlex Tumanoff
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseJames Serra
 

What's hot (20)

Snowflake Datawarehouse Architecturing
Snowflake Datawarehouse ArchitecturingSnowflake Datawarehouse Architecturing
Snowflake Datawarehouse Architecturing
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
 
Azure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeAzure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data Lake
 
Azure Data services
Azure Data servicesAzure Data services
Azure Data services
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...
 
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...
 
Azure SQL DWH
Azure SQL DWHAzure SQL DWH
Azure SQL DWH
 
Azure for Data Platform
Azure for Data PlatformAzure for Data Platform
Azure for Data Platform
 
Accelerating Business Intelligence Solutions with Microsoft Azure pass
Accelerating Business Intelligence Solutions with Microsoft Azure   passAccelerating Business Intelligence Solutions with Microsoft Azure   pass
Accelerating Business Intelligence Solutions with Microsoft Azure pass
 
Deep Dive DMG (september update)
Deep Dive DMG (september update)Deep Dive DMG (september update)
Deep Dive DMG (september update)
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Qubole - Big data in cloud
Qubole - Big data in cloudQubole - Big data in cloud
Qubole - Big data in cloud
 
The Future of Postgres Sharding / Bruce Momjian (PostgreSQL)
The Future of Postgres Sharding / Bruce Momjian (PostgreSQL)The Future of Postgres Sharding / Bruce Momjian (PostgreSQL)
The Future of Postgres Sharding / Bruce Momjian (PostgreSQL)
 
Gab document db scaling database
Gab   document db scaling databaseGab   document db scaling database
Gab document db scaling database
 
Azure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake AnalyticsAzure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake Analytics
 
Azure SQL Data Warehouse
Azure SQL Data Warehouse Azure SQL Data Warehouse
Azure SQL Data Warehouse
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene Polonichko
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 
Horizon for Big Data
Horizon for Big DataHorizon for Big Data
Horizon for Big Data
 

Viewers also liked

SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)Sascha Dittmann
 
Leveraging the Power of Solr with Spark
Leveraging the Power of Solr with SparkLeveraging the Power of Solr with Spark
Leveraging the Power of Solr with SparkQAware GmbH
 
Wer gewinnt das SQL-Rennen auf der Hadoop-Strecke?
Wer gewinnt das SQL-Rennen auf der Hadoop-Strecke?Wer gewinnt das SQL-Rennen auf der Hadoop-Strecke?
Wer gewinnt das SQL-Rennen auf der Hadoop-Strecke?inovex GmbH
 
Hadoop 2.0 - The Next Level
Hadoop 2.0 - The Next LevelHadoop 2.0 - The Next Level
Hadoop 2.0 - The Next LevelSascha Dittmann
 
Real-Time-Analytics mit Spark und Cassandra
Real-Time-Analytics mit Spark und CassandraReal-Time-Analytics mit Spark und Cassandra
Real-Time-Analytics mit Spark und CassandraThomas Mann
 
Crawl the entire web in 10 minutes...and just 100€
Crawl the entire web  in 10 minutes...and just 100€Crawl the entire web  in 10 minutes...and just 100€
Crawl the entire web in 10 minutes...and just 100€Danny Linden
 

Viewers also liked (6)

SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
 
Leveraging the Power of Solr with Spark
Leveraging the Power of Solr with SparkLeveraging the Power of Solr with Spark
Leveraging the Power of Solr with Spark
 
Wer gewinnt das SQL-Rennen auf der Hadoop-Strecke?
Wer gewinnt das SQL-Rennen auf der Hadoop-Strecke?Wer gewinnt das SQL-Rennen auf der Hadoop-Strecke?
Wer gewinnt das SQL-Rennen auf der Hadoop-Strecke?
 
Hadoop 2.0 - The Next Level
Hadoop 2.0 - The Next LevelHadoop 2.0 - The Next Level
Hadoop 2.0 - The Next Level
 
Real-Time-Analytics mit Spark und Cassandra
Real-Time-Analytics mit Spark und CassandraReal-Time-Analytics mit Spark und Cassandra
Real-Time-Analytics mit Spark und Cassandra
 
Crawl the entire web in 10 minutes...and just 100€
Crawl the entire web  in 10 minutes...and just 100€Crawl the entire web  in 10 minutes...and just 100€
Crawl the entire web in 10 minutes...and just 100€
 

Similar to SQLSaturday #230 - Introduction to Microsoft Big Data (Part 2)

HPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big DataHPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big DataHPC DAY
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceHortonworks
 
Experience sql server on l inux and docker
Experience sql server on l inux and dockerExperience sql server on l inux and docker
Experience sql server on l inux and dockerBob Ward
 
Managing and Deploying High Performance Computing Clusters using Windows HPC ...
Managing and Deploying High Performance Computing Clusters using Windows HPC ...Managing and Deploying High Performance Computing Clusters using Windows HPC ...
Managing and Deploying High Performance Computing Clusters using Windows HPC ...Saptak Sen
 
Db2 analytics accelerator on ibm integrated analytics system technical over...
Db2 analytics accelerator on ibm integrated analytics system   technical over...Db2 analytics accelerator on ibm integrated analytics system   technical over...
Db2 analytics accelerator on ibm integrated analytics system technical over...Daniel Martin
 
EclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionEclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionCloudera, Inc.
 
Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster   sql bits 2017 editionSql server 2016 it just runs faster   sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 editionBob Ward
 
Brk2051 sql server on linux and docker
Brk2051 sql server on linux and dockerBrk2051 sql server on linux and docker
Brk2051 sql server on linux and dockerBob Ward
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analyticskgshukla
 
Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Ashish Narasimham
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?DataWorks Summit
 
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Rajit Saha
 
Inroduction to Big Data
Inroduction to Big DataInroduction to Big Data
Inroduction to Big DataOmnia Safaan
 
Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810Boni Bruno
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding HadoopAhmed Ossama
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSStéphane Fréchette
 

Similar to SQLSaturday #230 - Introduction to Microsoft Big Data (Part 2) (20)

HPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big DataHPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big Data
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
 
Experience sql server on l inux and docker
Experience sql server on l inux and dockerExperience sql server on l inux and docker
Experience sql server on l inux and docker
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
 
Managing and Deploying High Performance Computing Clusters using Windows HPC ...
Managing and Deploying High Performance Computing Clusters using Windows HPC ...Managing and Deploying High Performance Computing Clusters using Windows HPC ...
Managing and Deploying High Performance Computing Clusters using Windows HPC ...
 
Db2 analytics accelerator on ibm integrated analytics system technical over...
Db2 analytics accelerator on ibm integrated analytics system   technical over...Db2 analytics accelerator on ibm integrated analytics system   technical over...
Db2 analytics accelerator on ibm integrated analytics system technical over...
 
11540800.ppt
11540800.ppt11540800.ppt
11540800.ppt
 
EclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionEclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An Introduction
 
Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster   sql bits 2017 editionSql server 2016 it just runs faster   sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 edition
 
Brk2051 sql server on linux and docker
Brk2051 sql server on linux and dockerBrk2051 sql server on linux and docker
Brk2051 sql server on linux and docker
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
 
Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30
 
optimizing_ceph_flash
optimizing_ceph_flashoptimizing_ceph_flash
optimizing_ceph_flash
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
 
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
 
Inroduction to Big Data
Inroduction to Big DataInroduction to Big Data
Inroduction to Big Data
 
Mysql8for blr usercamp
Mysql8for blr usercampMysql8for blr usercamp
Mysql8for blr usercamp
 
Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APS
 

More from Sascha Dittmann

Hochskalierbare, relationale Datenbanken in Microsoft Azure
Hochskalierbare, relationale Datenbanken in Microsoft AzureHochskalierbare, relationale Datenbanken in Microsoft Azure
Hochskalierbare, relationale Datenbanken in Microsoft AzureSascha Dittmann
 
Microsoft R - Data Science at Scale
Microsoft R - Data Science at ScaleMicrosoft R - Data Science at Scale
Microsoft R - Data Science at ScaleSascha Dittmann
 
SQL Server vs. Azure DocumentDB – Ein Battle zwischen XML und JSON
SQL Server vs. Azure DocumentDB – Ein Battle zwischen XML und JSONSQL Server vs. Azure DocumentDB – Ein Battle zwischen XML und JSON
SQL Server vs. Azure DocumentDB – Ein Battle zwischen XML und JSONSascha Dittmann
 
dotnet Cologne 2015 - Azure Service Fabric
dotnet Cologne 2015 - Azure Service Fabric dotnet Cologne 2015 - Azure Service Fabric
dotnet Cologne 2015 - Azure Service Fabric Sascha Dittmann
 
SQL Saturday #313 Rheinland - MapReduce in der Praxis
SQL Saturday #313 Rheinland - MapReduce in der PraxisSQL Saturday #313 Rheinland - MapReduce in der Praxis
SQL Saturday #313 Rheinland - MapReduce in der PraxisSascha Dittmann
 
Microsoft HDInsight Podcast #001 - Was ist HDInsight
Microsoft HDInsight Podcast #001 - Was ist HDInsightMicrosoft HDInsight Podcast #001 - Was ist HDInsight
Microsoft HDInsight Podcast #001 - Was ist HDInsightSascha Dittmann
 
dotnet Cologne 2013 - Windows Azure Mobile Services
dotnet Cologne 2013 - Windows Azure Mobile Servicesdotnet Cologne 2013 - Windows Azure Mobile Services
dotnet Cologne 2013 - Windows Azure Mobile ServicesSascha Dittmann
 
dotnet Cologne 2013 - Microsoft HD Insight für .NET Entwickler
dotnet Cologne 2013 - Microsoft HD Insight für .NET Entwicklerdotnet Cologne 2013 - Microsoft HD Insight für .NET Entwickler
dotnet Cologne 2013 - Microsoft HD Insight für .NET EntwicklerSascha Dittmann
 
Developer Open Space 2012 - Cloud Computing Workshop
Developer Open Space 2012 - Cloud Computing WorkshopDeveloper Open Space 2012 - Cloud Computing Workshop
Developer Open Space 2012 - Cloud Computing WorkshopSascha Dittmann
 
PASS Camp 2012 - Big Data mit Microsoft (Teil 1)
PASS Camp 2012 - Big Data mit Microsoft (Teil 1)PASS Camp 2012 - Big Data mit Microsoft (Teil 1)
PASS Camp 2012 - Big Data mit Microsoft (Teil 1)Sascha Dittmann
 
CloudOps Summit 2012 - 3 Wege in die Cloud
CloudOps Summit 2012 - 3 Wege in die CloudCloudOps Summit 2012 - 3 Wege in die Cloud
CloudOps Summit 2012 - 3 Wege in die CloudSascha Dittmann
 
.NET Usergroup Rhein-Neckar: Big Data in der Cloud - Apache Hadoop-based Serv...
.NET Usergroup Rhein-Neckar: Big Data in der Cloud - Apache Hadoop-based Serv....NET Usergroup Rhein-Neckar: Big Data in der Cloud - Apache Hadoop-based Serv...
.NET Usergroup Rhein-Neckar: Big Data in der Cloud - Apache Hadoop-based Serv...Sascha Dittmann
 
NoSQL mit RavenDB und Azure
NoSQL mit RavenDB und AzureNoSQL mit RavenDB und Azure
NoSQL mit RavenDB und AzureSascha Dittmann
 
Windows Azure für Entwickler V1
Windows Azure für Entwickler V1Windows Azure für Entwickler V1
Windows Azure für Entwickler V1Sascha Dittmann
 

SQLSaturday #230 - Introduction to Microsoft Big Data (Part 2)

  • 1. SQLSaturday #230 Rheinland Sascha Dittmann Softwarearchitekt & Entwickler – Ernst & Young GmbH www.sascha-dittmann.de Georg Urban Snr. Technology Solution Professional | Data Platform georg.urban@microsoft.com 13.07.2013
  • 2.
  • 3. HIVE IN A NUTSHELL
  • 4. Hadoop & Business Intelligence  Hadoop is great for storing & processing *large* amounts of data  (but) Map/Reduce jobs are kind of low level  (most) BI tools rely on relational or multidimensional data sources and declarative languages like SQL | MDX | DAX ?
  • 5. The Hive Project  Hive was started at Facebook (2008)  Goal: empower business users to query Hadoop clusters with standard tools & SQL  Famous paper at the VLDB conference 2009; in 2009, 700 TB of data already „lived“ in Hive at Facebook, with 5,000 queries a day from over 100 users Hive is a „Data Warehouse“ for Hadoop! (a system for managing data structures, built on top of Hadoop) http://www.vldb.org/pvldb/2/vldb09-938.pdf
  • 6. Hive architecture  Query Language: HiveQL (subset of SQL)  Uses Map/Reduce for execution  Rule-based optimizer Driver (Compiler, Optimizer, Executor) Command Line Interface Web Interface Thrift Server Metastore JDBC ODBC Performance is an issue: the Hortonworks Stinger initiative aims for „human-time use cases“
  • 7. Hive concepts  Well known: Databases | Tables | Rows & Columns  Table = (file or) directory e.g. twitter_feeds -> /user/hive/warehouse/twitter_feeds  Storage: ORC (Optimized Row Columnar), Textfile, RCFile (Record Columnar File), etc.  Primitive Types: integer, float, string, date, boolean  …plus arrays, maps, user defined types  Partitions = subdirectories  Indexes = data subsets or bitmaps  HiveQL: SELECT…FROM…WHERE (incl. Joins, Aggregates, Union All, Subqueries)  Can embed M/R scripts
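The "Table = directory" and "Partitions = subdirectories" points on the Hive concepts slide can be illustrated with a minimal Python sketch. The warehouse path and the `twitter_feeds` table come from the slide; the partition column `day`, the helper names, and the sample dates are hypothetical. The sketch only mimics the `key=value` directory naming and shows why a filter on a partition column lets a query skip whole directories.

```python
from pathlib import PurePosixPath

# Warehouse root as shown on the slide.
WAREHOUSE = PurePosixPath("/user/hive/warehouse")


def partition_dir(table, **partition_values):
    """Build the directory for one partition of a table.

    Each partition column adds one `key=value` subdirectory level.
    """
    path = WAREHOUSE / table
    for key, value in partition_values.items():
        path = path / f"{key}={value}"
    return str(path)


def prune(partition_dirs, key, wanted):
    """Keep only the directories whose partition value matches the filter.

    Directories that do not match are never opened, which is the point of
    partitioning.
    """
    segment = f"{key}={wanted}"
    return [d for d in partition_dirs if segment in d.split("/")]


# Hypothetical partition column `day` for the slide's twitter_feeds table.
dirs = [
    partition_dir("twitter_feeds", day="2013-07-12"),
    partition_dir("twitter_feeds", day="2013-07-13"),
]
print(prune(dirs, "day", "2013-07-13"))
```

A query that filters on `day` would only need to read the single matching directory; all other partitions stay untouched.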
  • 8. HiveQL Example CREATE TABLE logdata( logdate string, logtime string, … time_taken int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '; LOAD DATA INPATH '/w3c/input/data' OVERWRITE INTO TABLE logdata; SELECT logdate, logtime, time_taken FROM logdata LIMIT 200;
  • 11.
  • 12. HDInsight in Action: Managing Big Data
  • 13. Enriching Big Data in PowerPivot
  • 14. Big Data Mashup with Power View = Insights
  • 16. Real World Big Data  Yahoo! 180 PB raw data in > 40,000 computers (polystructured)*  Biggest Hadoop cluster: 4,500 nodes (2x4 CPUs, 4x1 TB disks, 16 GB RAM)  Page Impressions:  Cube with 207 Measures | 24 Dimensions | 247 Attributes  Desktop Clients (MS & Tableau): < 6s ad hoc query time http://wiki.apache.org/hadoop/PoweredBy
  • 17. Real Life Example (Sensor Data Analytics) XML, structured & unstructured files Preprocessing (e.g. C24, …) Hive Hive Integration Services Database Service Integration Services ERP & other DBs HDInsight Browsers Excel & PowerPivot 3rd Party Mobile Clients Analytical & DM Tools (R, SPSS, MS Data Mining, …) Long-term Storage & Preprocessing Power View More Microsoft… (Reporting Services, Performance Point, …) Integration, Analysis & Persistence Publishing & Collaboration Integration Services SharePoint Self Service analytical Applications Analysis Services
  • 18. SQL SERVER PARALLEL DATA WAREHOUSE
  • 20. V1 Reference PDW V2 The Basic Full Rack 10X Faster & 50% Lower Capital Cost Control Node Mgmt. Node Landing Zone Backup Node Estimated Total HW List Price: $1MM$ Estimated Total HW List Price: $500K$ Infiniband & Ethernet Fiber Channel 70% more disk I/O bandwidth Infiniband & Ethernet • 128 cores on 8 compute nodes • 2TB of RAM on compute • Up to 168 TB of tempdb • Up to 1PB of user data • 160 cores on 10 compute nodes • 1.28 TB of RAM on compute • Up to 30 TB of tempdb • Up to 150 TB of user data ComputeNode Storage Compute & Storage Control Node
  • 21. SQL Server 2012 Parallel Data Warehouse  Up- & downscale  2-56 Compute Nodes  Unique standardized Nodes: 256 GB  1-6 Racks  Compute & Storage Nodes are VMs  Simple Management  Hardware Abstraction  Different Workloads  ColumnStore v2 storage  High compressions  Updatable for incremental loads Development Goals FDR Infiniband Direct attached SAS Hardware Architecture
  • 22. Start small & grow Dynamic Scale Up Start small with a warehouse of a few terabytes Add capacity up to 5 petabytes Increments by 2-3 Compute Nodes plus Storage 0 TB 5 PB Enterprise Warehouse PB No Downtime
  • 23. Infiniband Infiniband Ethernet Ethernet Control Node Failover Node JBOD 1 Compute Node 1 Compute Node 2 JBOD 2 Compute Node 3 Compute Node 4 JBOD 3 Compute Node 5 Compute Node 6 JBOD 4 Compute Node 7 Compute Node 8 Customer Use Base Unit (6U): • Redundant Infiniband • Redundant Ethernet • Mgmt & Control (Active) • Rack Failover Node (Passive) Base Unit (7U): • 2 HP 1U Servers • (16 Cores/Ea. Total: 32) • JBOD 5U • 1TB Drives • User Data Capacity: 75TB Scale Unit (7U): • 2 HP 1U Servers • (16 Cores/Ea. Total: 32) • JBOD 5U • 1TB Drives • User Data Capacity: 75TB ¼Rack 15TB (Raw) 1/2Rack 30TB(Raw) Customer Space (8U) • ETL Servers • Backup Servers • Passive Unit (Additional spares) Scale Unit (7U): • 2 HP 1U Servers • (16 Cores/Ea. Total: 32) • JBOD 5U • 1TB Drives • User Data Capacity: 75TB Scale Unit (7U): • 2 HP 1U Servers • (16 Cores/Ea. Total: 32) • JBOD 5U • 1TB Drives • User Data Capacity: 75TB FullRack 60TB(Raw) Infiniband Infiniband Ethernet Ethernet Failover Node JBOD 5 Compute Node 9 Compute Node 10 JBOD 6 Compute Node 11 Compute Node 12 JBOD 7 Compute Node 13 Compute Node 14 JBOD 8 Compute Node 15 Compute Node 16 Customer Use Extension Base Unit (5U): • Redundant Infiniband • Redundant Ethernet • Rack Failover Node (Passive) Extension Base Unit (7U): • 2 HP 1U Servers • (16 Cores/Ea. Total: 32) • JBOD 5U • 1TB Drives • User Data Capacity: 75TB Scale Unit (7U): • 2 HP 1U Servers • (16 Cores/Ea. Total: 32) • JBOD 5U • 1TB Drives • User Data Capacity: 75TB 1¼Rack 75.5TB (Raw) Customer Space (9U) • ETL Servers • Backup Servers • Passive Unit (Additional spares) Scale Unit (7U): • 2 HP 1U Servers • (16 Cores/Ea. Total: 32) • JBOD 5U • 1TB Drives • User Data Capacity: 75TB Scale Unit (7U): • 2 HP 1U Servers • (16 Cores/Ea. 
Total: 32) • JBOD 5U • 1TB Drives • User Data Capacity: 75TB 3Rack 181.2TB(Raw) 11/2Rack 90.6TB(Raw) 2Rack 120.8TB(Raw) Infiniband Infiniband Ethernet Ethernet Failover Node JBOD 9 Compute Node 17 Compute Node 18 JBOD 10 Compute Node 19 Compute Node 20 JBOD 11 Compute Node 21 Compute Node 22 JBOD 12 Compute Node 23 Compute Node 24 Customer Use Extension Base Unit (5U): • Redundant Infiniband • Redundant Ethernet • Rack Failover Node (Passive) Extension Base Unit (7U): • 2 HP 1U Servers • (16 Cores/Ea. Total: 32) • JBOD 5U • 1TB Drives • User Data Capacity: 75TB Scale Unit (7U): • 2 HP 1U Servers • (16 Cores/Ea. Total: 32) • JBOD 5U • 1TB Drives • User Data Capacity: 75TB Customer Space (9U) • ETL Servers • Backup Servers • Passive Unit (Additional spares) Scale Unit (7U): • 2 HP 1U Servers • (16 Cores/Ea. Total: 32) • JBOD 5U • 1TB Drives • User Data Capacity: 75TB Scale Unit (7U): • 2 HP 1U Servers • (16 Cores/Ea. Total: 32) • JBOD 5U • 1TB Drives • User Data Capacity: 75TB • 2 – 56 compute nodes • 1 – 7 racks • 1, 2, or 3 TB drives • 15.1 – 1268.4 TB raw • 53 – 6342 TB User data • Up to 7 spare nodes available across the entire appliance HP Configuration
  • 24.  VMs for different workloads (e.g. HDInsight zone)  Storage Spaces manage  physical disks on JBOD(s)  33 logical mirrored drives (66 drives & 4 hot spares)  Clustered Shared Volumes (CSV) allows all nodes to access the LUNs on the JBOD  One cluster across the whole appliance  VMs are automatically migrated on failure Host 1 Host 0 Host 2 Host 3 JBOD CTL MAD FAB AD VMM Compute 1 Compute 2 Host 2 Compute 1 Agility Due to Virtualization * 3 nodes per JBOD in Dell Configuration
  • 25. xVelocity Columnstore as primary Storage Better IO & Caching  columns stored independently  early segment elimination  aggressive read-ahead Memory optimization  new Memory Broker  segments are loaded when needed  …and stay as long as possible Batch Mode  max. parallelism  approx. 1,000 values per kernel  CPU time is reduced by a factor of 7 to 40 SELECT Region, SUM(Sales) … Bitmap of qualified rows Column vectors Batch object [Diagram: columns C1 to C6 stored as independent segments; batches built from column vectors T.C1 to T.C4]
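The "early segment elimination" bullet on the columnstore slide can be sketched in a few lines of Python. This is an illustration of the general idea only, not PDW's implementation: each column is cut into segments, every segment carries its minimum and maximum value, and a scan skips any segment whose value range cannot satisfy the predicate. The function names, the dictionary layout, and the tiny segment size are assumptions made for the example.

```python
def build_segments(column, segment_size):
    """Split one column into segments and record min/max metadata per segment."""
    segments = []
    for start in range(0, len(column), segment_size):
        chunk = column[start:start + segment_size]
        segments.append({"min": min(chunk), "max": max(chunk), "values": chunk})
    return segments


def scan_greater_than(segments, threshold):
    """Return all values > threshold and the number of segments actually read."""
    hits = []
    segments_read = 0
    for seg in segments:
        if seg["max"] <= threshold:
            # The metadata proves that no value qualifies: skip the segment.
            continue
        segments_read += 1
        hits.extend(v for v in seg["values"] if v > threshold)
    return hits, segments_read


# Hypothetical column with 12 values, stored in segments of 4 values each.
sales = list(range(1, 13))
segments = build_segments(sales, segment_size=4)
hits, segments_read = scan_greater_than(segments, threshold=8)
print(hits, segments_read)
```

In this toy data set, two of the three segments are eliminated from their metadata alone, so only one segment is read.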
  • 27. Compression Rates in the Demo (in GB): Raw data 82 | Page Compression 34.2 | Backup Compression 22.9 | Columnstore Index (Disk) 11.2 | Columnstore Index (Memory) 4. Compression ratio for *this* data: approx. 20
  • 28. Columnstore: The next Generation  Columnstore becomes primary data structure (clustered index)  No need for base table  Allows Updates & Deletes (temporary row store)  Easy data management  Improvements:  Supports all (reasonable) data types  Supports more query operators  Statistics on partitioned tables PDW v2 & SQL Server 2014
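The ratio of roughly 20 quoted on the compression slide can be checked with simple arithmetic. The sketch below assumes that the five chart values (82, 34.2, 22.9, 11.2 and 4 GB) belong to the five chart labels in the order in which they appear on the slide; the dictionary keys and the function name are chosen for this example.

```python
# Sizes in GB, in the order the values and labels appear on the slide.
sizes_gb = {
    "raw data": 82,
    "page compression": 34.2,
    "backup compression": 22.9,
    "columnstore index (disk)": 11.2,
    "columnstore index (memory)": 4,
}


def compression_ratios(sizes, baseline="raw data"):
    """Ratio of the baseline size to each compressed size, rounded to 1 decimal."""
    base = sizes[baseline]
    return {
        name: round(base / size, 1)
        for name, size in sizes.items()
        if name != baseline
    }


for name, ratio in compression_ratios(sizes_gb).items():
    print(f"{name}: {ratio}x")
```

Under that assumption, the in-memory columnstore figure works out to 82 / 4 = 20.5, which matches the "approx. 20" stated on the slide.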
  • 29. Microsoft BI Stack Connectivity Targeted Driver SNAC11 SNAC10 .NET (sqlclient) OLEDB ODBC OLEDB ODBC PDW Tool 32 bit 64 bit 32 bit 64 bit 32 bit 64 bit 32 bit 64 bit 32 bit 64 bit SSRS/Reporting Services SSRS - SS2012 (report builder and SSDT) X X X X X X SSRS - SS2008 R2 (report builder) X n/a X n/a X n/a SSAS/Analysis Services SSAS – SS2012 X X X X X X X X X X SSAS – SS2008 R2 X X X X X X Linked Server (DQ) DQ – SS2008 X X X X SSIS/Integration Services SSIS - SS2012 X X X X X X X X X X SSIS - SS2008 R2 X X X X X X PowerPivot for Excel n/a n/a X X n/a n/a X X Power View SS2012 w/ and w/o direct query X X X X X X MS BI Direct Query Excel X X X X n/a n/a Direct Query n/a n/a X X n/a n/a Access n/a n/a X X n/a n/a Master Data Services SS2012 X X X X X X SS2008 X X X X Quality Services SS2012 X X X X X X SS2008 X X X X
  • 30. Monitoring  Built-in monitoring via GUI or Dynamic Management Views (DMVs)  System Center Management Packs for PDW
  • 31. Simple Resource Management  Pre-built resource classes in PDW  Resource class =  PDW concurrency slots in use  Memory utilization  Priority  DBA controls how requests are mapped to resource classes.  PDW honors resource class at run-time PDW Concurrency slots in use: 3 Memory: V1 HW – 600MB, V2 HW – 1.2GB Priority: Medium PDW Concurrency slots in use: 21 Memory: V1 HW ~4.2GB, V2 HW ~8.4GB Priority: High PDW Concurrency slots in use: 1 Memory: V1 HW ~200MB, V2 HW ~400MB Priority: Medium PDW Concurrency slots in use: 7 Memory: V1 HW – 1.4GB, V2 HW ~2.8GB Priority: High
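The four resource classes on the slide follow a simple pattern: the memory figures are consistent with about 200 MB per concurrency slot on V1 hardware and about 400 MB per slot on V2 hardware. The sketch below reproduces the slide's numbers from that observation. The per-slot constants are inferred from the slide rather than taken from product documentation, and the function name is hypothetical.

```python
# Approximate memory per concurrency slot in MB, inferred from the slide's figures.
MB_PER_SLOT = {"V1": 200, "V2": 400}


def memory_mb(slots, hardware):
    """Approximate memory grant in MB for a resource class using `slots` slots."""
    return slots * MB_PER_SLOT[hardware]


# Slot counts of the four resource classes shown on the slide.
for slots in (1, 3, 7, 21):
    print(slots, memory_mb(slots, "V1"), memory_mb(slots, "V2"))
```

A request mapped to a larger resource class therefore gets more memory, but it also occupies more of the appliance's concurrency slots while it runs.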
  • 32. Improve T-SQL Parity T-SQL additions to increase compatibility:  SQL Server Data Tools  Microsoft BI Tools  3rd Party Tools, like Tableau Dedicated PDW tools are deprecated Catalog SPs examples: sp_tables_rowset;2 sp_catalogs_rowset sp_executesql Built-in function examples: db_id db_name object_id General T-SQL improvements examples: cross/outer apply sp_prepare sp_execute Configuration Functions: • @@LANGUAGE • @@SPID SET options: • SET ROWCOUNT • SET FMTONLY
  • 34. Hadoop / Big Data-Integration: Microsoft  T-SQL query engine for RDBMS & Hadoop  Cost-based optimizer decides on:  Moving HDFS data into RDBMS storage  Rendering operators in Map/Reduce-Jobs or  HDFS-Bridge for parallelized Data Transport & HDFS Data Nodes Regular T-SQL Results PDW V2
  • 35. External Tables Hadoop Integration External Tables are mapped to HDFS files Fields in the file are defined as columns in the PDW External table File characteristics are also provided during definition This works for HDInsight, Hortonworks HDP & Cloudera CREATE EXTERNAL TABLE ClickEvent ( url varchar(50), event_date date, user_IP varchar(50) ) WITH ( LOCATION = 'hdfs://MyHadoop:5000/clickstream/click.txt', FORMAT_OPTIONS ( FIELD_TERMINATOR = '|' ) );
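What the external table definition describes, namely mapping the fields of a `|`-delimited HDFS text file onto typed columns, can be sketched in Python. The column names `url`, `event_date` and `user_IP` and the field terminator come from the slide; the sample line, the ISO date format, and the function name are assumptions made for this illustration, which shows the mapping idea and not how PolyBase reads files internally.

```python
from datetime import date


def parse_click_event(line, field_terminator="|"):
    """Map one delimited text line onto the three columns of ClickEvent."""
    url, event_date, user_ip = line.rstrip("\n").split(field_terminator)
    return {
        "url": url,
        # Assumption: dates are stored as YYYY-MM-DD in the file.
        "event_date": date.fromisoformat(event_date),
        "user_IP": user_ip,
    }


# Hypothetical line as it might appear in the clickstream file.
row = parse_click_event("http://example.com/start|2013-07-13|10.0.0.1\n")
print(row)
```

A line with a different number of fields raises a `ValueError` in this sketch, which mirrors the fact that the file layout has to match the declared columns.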
  • 36. Polybase: Creating an external Table  familiar tooling: SSMS & Data Tools
  • 37. Polybase: A really simple Query Example  Here: external data movement is ExternalRoundRobinMove  Parallel HDFS readers will run on every data node (e.g. 10 nodes with 8 threads each)
  • 38. BI & Big Data Solution with SQL PDW v2 &
  • 39. Single Query; Structured and Unstructured Query and join Hadoop tables with Relational Tables Use Standard SQL language Existing SQL Skillset No IT Intervention Save Time and Costs Database HDFS (Hadoop) SQL Server 2012 PDW Powered by PolyBase SQL Analyze all Data Types PolyBase: Breakthrough in Data Processing
  • 40.
  • 41. Resources  SQL Server CAT-Blog http://sqlcat.com  bwin Case Study http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000001470  Microsoft Big Data Site http://www.microsoft.com/en-us/sqlserver/solutions-technologies/business-intelligence/big-data.aspx  Introduction to Hadoop on Windows Azure http://channel9.msdn.com/Events/windowsazure/learn/Introduction-to-Hadoop-on-Windows-Azure  SQL Server Team Blog http://blogs.technet.com/b/dataplatforminsider  Microsoft YouTube Big Data Channel http://www.youtube.com/playlist?list=PLD471EE01A293CC34  TechEd Sessions http://channel9.msdn.com/Events/TechEd  Microsoft Connect (Product Feedback) http://connect.microsoft.com
  • 42.
  • 43. Many thanks to the volunteers!
  • 44. Big raffle!  At the end of the event (approx. 6:00 p.m.)  Win lots of prizes!  So: visit our sponsors!
  • 45. Our „You Rock!“ sponsors
  • 46. Many thanks to all our sponsors! Gold Silver Bronze
  • 48. Hands-on event: PASS Camp 2013!