Big Data with Microsoft?
How HDInsight, SQL Server 2014
and Excel work together
Olivia Klose, Technical Evangelist
Georg Urban, Sr. Technology Solution Professional
Microsoft Deutschland GmbH
The Large Hadron Collider produces 15 PB/year*

* Source: http://public.web.cern.ch/public/en/lhc/Computing-en.html
But what if I don't
own a Large Hadron
Collider…
• Large-scale plants
• Vehicle fleets
• Smart grids
• Green energy
• Stock exchanges
• Host logs
• Computer centers
• Web farms
• Twitter
• Facebook
• Google Analytics
• …
XML – but…
• poly-structured
• varying
• no explicit schema
• lots of hex BLOBs

40,000 attributes and growing

"Here is my data"

</meldungText><antwort>False</antwort><wert>na</wert></meldung><steuergeraet
sgbdVariante="SMG_60"><steuergeraeteFunktion zeitstempel="2013-04-30T09:00:37.9926171-04:00
endDate="2013-04-30T09:00:38.1158609-04:00" jobName="STATUS_FAHRZEUGTESTER"><datensatz
satzNr="1"><result name="JOB_STATUS">OKAY</result><result name="_TEL_ANTWORT">80 F1 18
70 02 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 82 6B 00 6D 6B 39 CD 14 00 14 00 00 0E 00 15 00
00 19 00 0C 00 12 00 15 85 57 71 88 81 C0 7D 73 C2 08 01 05 02 F7 00 FF FF 01 73 00 00 02 A8 00 C2
01 E0 00 00 00 00 00 00 3D 01 00 00 00 01 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01
01 E1 02 05 01 F8 03 4F FF AD 04</result><result name="_TEL_AUFTRAG">83 18 F1 30 02
01</result><result name="STAT_KL15_ROH">0</result><result
name="STAT_KLR_EIN_ROH">0</result><result name="STAT_WAKE_UP_ROH">1</result><result
name="STAT_ISTGANG_TEXT">Neutral</result>
<sgFunktion zeitstempel=“2013-04-30T10:33:37.0834084+02:00" endDate="2013-0430T10:33:37.9310504+02:00" jobName="_FLM_LESEN_BOSCH"><datensatz satzNr="1"><result
name="FLM_DATEN_1">00 00 00 03 02 08 C6 56 46 4C 4D 39 00 16 4B B2 00 00 00 32 00 00 06 99 00
00 65 00 00 18 6E 00 00 00 73 00 00 00 20 00 00 00 73 00 00 00 00 00 00 10 69 00 00 0F 53 00 00 00
00 00 00 0A 00 00 79 6D 00 00 B7 34 00 00 D3 9E 4A 4C 41 52 00 00 00 00 00 00 00 00 00 00 00 00 0
00 00 00 00 00 2C 00 00 00 00 00 00 1A 5C 00 15 4B CA 00 00 44 08 00 00 2D 39 00 00 1E 45 00 00 2
00 00 1E EB 00 00 0C 65 00 00 04 47 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 27 00 00 01 1E 00
02 AB 00 00 07 71 00 00 13 D7 00 00 36 48 00 15 91 AD 00 00 3F 97 00 00 19 C1 00 00 07 F9 00 00 02
00 00 00 BD 00 00 00 20 00 16 1C 42 00 00 18 B1 00 00 09 40 00 00 08 9F 00 00 04 3A 00 00 01 3E 00
8C D7 00 00 61 A3 00 00 37 9D 00 00 1E 78 00 00 14 96 00 00 0A 71 00 00 05 49 00 00 02 B1 00 00 0
00 00 00 1D 00 00 00 09 00 00 00 05 00 00 00 00 00 00 00 00 00 00 23 BB 00 00 2F 84 00 00 14 EF 00
09 40 00 00 04 71 00 00 03 34 00 00 02 12 00 00 01 AC 00 00 01 59 00 00 0B C4 00 00 00 06 00 00 00
00 00 00 19 00 00 00 01 00 00 00 00 00 00 00 04 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 00 00
00 03 00 00 00 00 00 00 00 00 52 4F 54 48 00 00 00 00 00 00 00 07 00 00 00 00 00 00 00 01 00 00 00
00 00 00 01 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 00 56 30 00 00 00 03 00 11 00 01 01 06 00
00 00 00 00 00 00 00 01 00 00 00 0E 00 05 00 1A 00 12 00 00 00 26 00 00 00 00 00 0B 00 00 00 01 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 44 44 00 43 00 16 00 08 00
00 04 00 02 00 00 00 02 00 11 00 20 00 1A 00 0A 00 15 00 0F 00 1B 00 13 00 08 00 08 00 00 00 00 00
00 0E 00 08 00 04 00 02 00 01 00 00 00 6D 00 03 00 02 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 0A 00 21 00 15 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0B 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 18 05 1F 00 00 00 00 00 00 00 00 00 1F 00 03 00 02 00 00 00 00 00 00 00
00 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 62 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 2E 00 00 1B 00 19 00 18 00 0D 00 00 00 00 00 00 00 01 00 01 00 02 00 00 06 00 01 E6
00 12 00 03 00 02 00 07 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 02 01 BA 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 24 00</result><result name="FLM_DATEN_2">08 00 0
00 00 00 00 00 00 00 0C 00 80 1B 00 45 10 00 A6 0D 00 51 16 00 59 44 00 00 EB 00 00 CA 00 00 49 00
17 00 10 00 0C 00 05 00 04 00 06 00 02 00 01 00 00 00 00 12 00 00 3A 00 00 26 00 00 13 00 0D 00 08
09 00 04 00 0A 00 00 00 00 00 00 00 00 03 00 00 0D 00 00 0B 00 00 07 00 02 00 05 00 01 00 01 00 04
00 00 00 00 00 00 04 00 07 00 08 00 06 00 04 00 …
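The excerpt above mixes XML structure with hex-encoded payloads inside `<result>` elements. A minimal Python sketch, assuming a small, well-formed fragment modeled on the sample (the `datensatz` and `result` element names and the `name` attribute are taken from it; the shortened values and the helper names `looks_like_hex` and `parse_record` are hypothetical), shows how such a record could be flattened into name/value pairs, with hex dumps decoded into bytes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment modeled on the sample above.
# Real records are much longer and are not always well-formed.
SAMPLE = """
<datensatz satzNr="1">
  <result name="JOB_STATUS">OKAY</result>
  <result name="_TEL_AUFTRAG">83 18 F1 30 02 01</result>
  <result name="STAT_WAKE_UP_ROH">1</result>
  <result name="STAT_ISTGANG_TEXT">Neutral</result>
</datensatz>
"""

HEX_DIGITS = set("0123456789ABCDEFabcdef")


def looks_like_hex(text: str) -> bool:
    """Heuristic: every space-separated token is exactly two hex digits."""
    tokens = text.split()
    return bool(tokens) and all(
        len(t) == 2 and set(t) <= HEX_DIGITS for t in tokens
    )


def parse_record(xml_text: str) -> dict:
    """Flatten <result name=...> children into a name -> value dict.

    Values that look like hex dumps become bytes; all others stay strings.
    """
    root = ET.fromstring(xml_text.strip())
    record = {}
    for result in root.iter("result"):
        value = (result.text or "").strip()
        record[result.get("name")] = (
            bytes.fromhex(value) if looks_like_hex(value) else value
        )
    return record


if __name__ == "__main__":
    for name, value in parse_record(SAMPLE).items():
        print(name, value)
```

The hex check is only a heuristic: a short value such as `00` is ambiguous, which is one reason a fixed schema is hard to derive from data like this.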





• small data subsets are stored
• most data stays in the file system (original XML files)
• only about 3 years of history are stored at the moment
• heavily denormalized data (e.g. Entity-Attribute-Value tables)

• TCO & performance limits (queries are slow; pivoting is expensive)
• cover the whole 15-year life cycle (incl. production data)
• more data sources: social media (motortalk)
• lower TCO for storage & flexible analysis
• …impossible with "classical" RDBMS
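An Entity-Attribute-Value table stores one row per (entity, attribute, value), so an analytic query first has to pivot rows back into columns. A minimal Python sketch, using hypothetical vehicle IDs and a hypothetical `pivot` helper, illustrates that step. In an RDBMS the same work becomes a PIVOT or one join per attribute, which grows expensive as the attribute count approaches the tens of thousands.

```python
from collections import defaultdict

# Hypothetical EAV rows: (entity_id, attribute, value).
EAV_ROWS = [
    ("vehicle-1", "JOB_STATUS", "OKAY"),
    ("vehicle-1", "STAT_ISTGANG_TEXT", "Neutral"),
    ("vehicle-2", "JOB_STATUS", "OKAY"),
    ("vehicle-2", "STAT_WAKE_UP_ROH", "1"),
]


def pivot(rows):
    """Turn EAV rows into one wide row per entity.

    Returns (columns, table): the sorted union of attribute names and a
    mapping entity -> {attribute: value}; missing attributes are None.
    """
    wide = defaultdict(dict)
    columns = set()
    for entity, attribute, value in rows:
        wide[entity][attribute] = value
        columns.add(attribute)
    ordered = sorted(columns)
    table = {
        entity: {col: attrs.get(col) for col in ordered}
        for entity, attrs in wide.items()
    }
    return ordered, table


if __name__ == "__main__":
    cols, table = pivot(EAV_ROWS)
    print(cols)
    for entity, row in table.items():
        print(entity, row)
```

The mostly empty wide rows this produces are typical of sparse sensor data, which is part of why the EAV layout was chosen in the first place.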
"Big data" is high-volume, high-velocity
and high-variety information assets that
demand cost-effective, innovative forms of
information processing for enhanced
insight and decision making.

Source:
The Importance of 'Big Data': A Definition, Mark Beyer, Douglas Laney, G00235055
Modular Hardware Architecture

• ColumnStore v2 storage
• Hadoop Regions
Tight integration of "nonstructured" data

FDR InfiniBand

Ultra-high compression

Direct-attached SAS

Scale Unit
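ColumnStore storage compresses well partly because values of one column are stored together. SQL Server's columnstore format is described as combining several encodings, including dictionary and run-length encoding. The minimal Python sketch below covers run-length encoding only, applied to a hypothetical low-cardinality column, with hypothetical helper names `rle_encode` and `rle_decode`. It is not the actual columnstore format.

```python
from itertools import groupby


def rle_encode(column):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, length in pairs for _ in range(length)]


if __name__ == "__main__":
    # Hypothetical gear-state column: long runs of repeated values.
    column = ["Neutral"] * 5 + ["D"] * 3 + ["Neutral"] * 2
    encoded = rle_encode(column)
    print(encoded)
    assert rle_decode(encoded) == column
```

Longer runs mean fewer pairs, so sorting or clustering a column before encoding usually improves the ratio.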
Parallel Data Warehouse Screenshots
PDW in SQL Server Data Tools

A familiar development environment
…just counting rows

Scanning 10 billion rows…

…does not take…

…that long!
…a reporting query

…won't take…

And even complex queries…

…much longer!
Data Distribution
Data is distributed evenly
over all data nodes…
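Even distribution across data nodes is typically achieved by hashing a distribution column and taking the result modulo the node count. A minimal Python sketch follows, assuming a hypothetical `customer-…` key and using CRC32 as a stand-in hash; PDW's actual hash function is not shown here. It demonstrates that each key maps deterministically to one node and that a high-cardinality key should spread rows across all nodes. The helper names `node_for` and `distribute` are hypothetical.

```python
import zlib
from collections import Counter


def node_for(key: str, num_nodes: int) -> int:
    """Map a distribution-key value to a node index (CRC32, illustrative)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(keys, num_nodes: int) -> Counter:
    """Count how many rows land on each node."""
    return Counter(node_for(k, num_nodes) for k in keys)


if __name__ == "__main__":
    # Hypothetical high-cardinality key: 100,000 distinct IDs on 8 nodes.
    counts = distribute((f"customer-{i}" for i in range(100_000)), num_nodes=8)
    for node in sorted(counts):
        print(node, counts[node])
```

A low-cardinality or skewed key, such as a gear state with three values, would instead pile rows onto a few nodes. That is why the choice of distribution column matters.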
Azure UX

Azure SDK

HDInsight

*
Hive

Templeton

RDP

*
Pig

HCatalog

Ambari

Map Reduce

*
Azure Blobs

*

= good to know!

HDFS

Sqoop
Oozie
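MapReduce, listed in the HDInsight stack above, splits a job into a map phase that emits key/value pairs, a shuffle that groups them by key, and a reduce phase that aggregates each group. The minimal single-process Python sketch below counts hashtags in a few hypothetical tweets, in the spirit of the Twitter demo, to show the three phases. On a cluster each phase runs in parallel across nodes, and Hive and Pig compile higher-level queries into such jobs. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(tweet: str):
    """Emit (hashtag, 1) for every hashtag in a tweet."""
    for word in tweet.split():
        if word.startswith("#") and len(word) > 1:
            yield word.lower(), 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts for one hashtag."""
    return key, sum(values)


def hashtag_counts(tweets):
    pairs = chain.from_iterable(map_phase(t) for t in tweets)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical tweets for illustration only.
    tweets = [
        "Trying #HDInsight with #Hive today",
        "#hive queries over tweets",
        "Excel mash-up of #HDInsight results",
    ]
    print(hashtag_counts(tweets))
```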
Analyze

Demo environment
Extract

Azure Blob
Storage
…

Twitter

Hive Tables
StreamInsight

SQL Azure

Real-Time
Dashboard

Mash Up &
Visualise
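In the demo architecture, StreamInsight sits alongside SQL Azure and the real-time dashboard, next to the Hive-based batch path. StreamInsight queries are written in LINQ and are usually built around windows such as tumbling windows. The minimal Python sketch below computes a tumbling-window count over hypothetical `(timestamp, hashtag)` events with a 60-second window. It only illustrates the windowing idea and is not StreamInsight's API. The `tumbling_counts` name is hypothetical.

```python
from collections import Counter, defaultdict


def tumbling_counts(events, window_seconds: int = 60):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (unix_timestamp, key) pairs.
    Returns {window_start: Counter({key: count})}.
    """
    windows = defaultdict(Counter)
    for ts, key in events:
        window_start = int(ts) - int(ts) % window_seconds
        windows[window_start][key] += 1
    return dict(windows)


if __name__ == "__main__":
    # Hypothetical events: seconds since start, hashtag.
    events = [(0, "#hive"), (30, "#hive"), (59, "#excel"), (60, "#hive")]
    for start, counts in sorted(tumbling_counts(events).items()):
        print(start, dict(counts))
```

Each event falls into exactly one window, so a dashboard can refresh once per window with a complete count.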
Solution Components

HDInsight

Virtual Machine

Twitter

Excel
Big Data Twitter Demo
Azure Management Portal
Big Data Twitter Demo – Dashboard
Big Data Twitter Demo – SQL Azure
Big Data Twitter Demo
Azure Blob Storage
Big Data Twitter Demo – Hive
Big Data Twitter Demo
Mash Up in Excel
Polybase

Regular T-SQL

Results

• T-SQL query engine for RDBMS & Hadoop
• Cost-based optimizer decides on:
  • translating operators into MapReduce jobs, or
  • moving HDFS data into RDBMS storage

PDW

• HDFS Bridge for parallelized data transport

HDFS Data Nodes

&
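The slide states that Polybase's cost-based optimizer chooses between pushing operators down as MapReduce jobs and moving HDFS data into PDW. The toy Python sketch below makes that kind of choice with an entirely hypothetical cost model. It weighs network transfer against a fixed job start-up overhead plus a per-byte processing cost, and none of its parameters come from Polybase. It shows why selective filters on large data favour pushdown, while small or unselective scans favour import. `ScanEstimate` and `choose_plan` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ScanEstimate:
    hdfs_bytes: float   # size of the external HDFS data
    selectivity: float  # fraction of data surviving the filter (0..1)


# Hypothetical cost parameters, chosen only for illustration.
NETWORK_COST_PER_BYTE = 1.0
MAPREDUCE_STARTUP_COST = 5e8
MAPREDUCE_COST_PER_BYTE = 0.2


def import_cost(est: ScanEstimate) -> float:
    """Move all HDFS data into PDW, then filter there."""
    return est.hdfs_bytes * NETWORK_COST_PER_BYTE


def pushdown_cost(est: ScanEstimate) -> float:
    """Filter in a MapReduce job, then move only the surviving data."""
    return (MAPREDUCE_STARTUP_COST
            + est.hdfs_bytes * MAPREDUCE_COST_PER_BYTE
            + est.hdfs_bytes * est.selectivity * NETWORK_COST_PER_BYTE)


def choose_plan(est: ScanEstimate) -> str:
    """Pick the cheaper plan under the toy model above."""
    return "pushdown" if pushdown_cost(est) < import_cost(est) else "import"


if __name__ == "__main__":
    print(choose_plan(ScanEstimate(hdfs_bytes=1e12, selectivity=0.01)))
    print(choose_plan(ScanEstimate(hdfs_bytes=1e8, selectivity=0.01)))
```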
T-SQL for Polybase

A distributed query.

Definition of an external table.
Modern Data Warehousing
Parallel Data Warehouse

HDInsight

Polybase

&
Big Data Enterprise Architecture

&
What's next…
Twitter Big Data Sourcecode: http://twitterbigdata.codeplex.com/
Twitter Big Data Setup: http://aka.ms/bigdatatwitter
Azure Trial: http://aka.ms/azurenow
HDInsight: www.windowsazure.com/en-us/documentation/services/hdinsight/
Hortonworks for Windows: http://hortonworks.com/products/hdp-windows/
PDW and Polybase: http://microsoft.com/pdw

Microsoft Big Data: http://microsoft.com/bigdata
Deutsche SQL Server Konferenz 2014: http://www.sqlkonferenz.de
“Big data is like teen sex.
Everybody is talking about it,
everyone thinks everyone else is doing it,
so everyone claims they are doing it.”
Dan Ariely, professor and director of the Center for Advanced Hindsight at Duke University