Sascha Dittmann
Blog: http://www.sascha-dittmann.de
Twitter: @SaschaDittmann
Microsoft HDInsight for .NET Developers
Big Data Analysis with JavaScript and C#
Large Hadron Collider (CERN, Switzerland)
http://public.web.cern.ch/public/en/lhc/Computing-en.html
The LHC particle accelerator produces 15 PB of measurement data per year*
Where Does Big Data Come From?
70% of U.S. smartphone owners regularly shop online via their devices.
44% of users (350M people) access Facebook via mobile devices.
50% of millennials use mobile devices to research products.
60% of U.S. mobile data will be audio and video streaming by 2014.
Mobility
2/3 of the world's mobile data traffic will be video by 2016.
33% of BI will be consumed via handheld devices by 2013.
Gaming consoles are now used an average of 1.5 hrs/wk to connect to the Internet.
80% growth of unstructured data is predicted over the next five years.
1.8 zettabytes of digital data were in use worldwide in 2011, up 30% from 2010.
1 in 4 Facebook users add their location to posts (2B/month).
500M Tweets are hosted on Twitter each day.
38% of people recommend a brand they “like” or follow on a social network.
Brands get 100M Facebook “likes” per day.
Big Data, Social, Mobility, Cloud
Big Data Scenarios
Web app optimization
Smart meter monitoring
Equipment monitoring
Advertising analysis
Life sciences research
Fraud detection
Healthcare outcomes
Weather forecasting
Natural resource exploration
Social network analysis
Churn analysis
Traffic flow optimization
IT infrastructure optimization
Legal discovery
Big Data is sexy
http://hbr.org/
Apache Hadoop Ecosystem
MapReduce (Job Scheduling/Execution System)
HDFS (Hadoop Distributed File System)
HBase (Column DB)
Pig (Data Flow)
Hive (Warehouse and Data Access)
Oozie (Workflow)
Sqoop
Traditional BI Tools
HBase / Cassandra (Columnar NoSQL Databases)
Avro (Serialization)
Zookeeper (Coordination)
Apache Mahout
Cascading (programming model)
Hadoop = MapReduce + HDFS
Flume
Microsoft HDInsight
MapReduce (Job Scheduling/Execution System)
HDFS (Hadoop Distributed File System)
HBase (Column DB)
Pig (Data Flow)
Hive (Warehouse and Data Access)
Oozie (Workflow)
Sqoop
Traditional BI Tools
HBase / Cassandra (Columnar NoSQL Databases)
Avro (Serialization)
Zookeeper (Coordination)
Apache Mahout
Cascading (programming model)
Hadoop = MapReduce + HDFS
Flume
Windows
System Center
Active Directory
Visual Studio
Hadoop Distributed File System (HDFS)
Boot process
Fault tolerance
User request
Hadoop Distributed File System (HDFS)
• Portable Operating System Interface (POSIX)
• Replication across multiple data nodes
js> #ls /user/Sascha/input/ncdc
Found 9 items
drwxr-xr-x - Sascha supergroup 0 2013-04-24 13:09 /user/Sascha/input/ncdc/all
drwxr-xr-x - Sascha supergroup 0 2013-04-24 13:01 /user/Sascha/input/ncdc/all2
drwxr-xr-x - Sascha supergroup 0 2013-04-23 13:06 /user/Sascha/input/ncdc/metadata
drwxr-xr-x - Sascha supergroup 0 2013-04-23 13:06 /user/Sascha/input/ncdc/micro
drwxr-xr-x - Sascha supergroup 0 2013-04-23 13:06 /user/Sascha/input/ncdc/micro-tab
-rw-r--r-- 3 Sascha supergroup 529 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt
-rw-r--r-- 3 Sascha supergroup 168 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt.gz
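In the listing above, the second column is the replication factor: it reads 3 for the two files and - for directories, which are not replicated themselves. A minimal sketch of how one such line could be split into named fields follows. The helper name parseLsLine and the field names are hypothetical and chosen for illustration only.

```javascript
// Hypothetical helper: split one line of the listing above into named fields.
function parseLsLine(line) {
  const parts = line.trim().split(/\s+/);
  if (parts.length < 8) {
    throw new Error("unexpected listing line: " + line);
  }
  const [permissions, replication, owner, group, size, date, time] = parts;
  return {
    isDirectory: permissions.startsWith("d"),
    permissions,
    // Directories show "-" in the replication column.
    replication: replication === "-" ? null : Number(replication),
    owner,
    group,
    size: Number(size),
    modified: date + " " + time,
    // Everything after the timestamp is the path (re-joined with single spaces).
    path: parts.slice(7).join(" "),
  };
}

const fileEntry = parseLsLine(
  "-rw-r--r-- 3 Sascha supergroup 529 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt"
);
console.log(fileEntry.replication, fileEntry.size, fileEntry.path);
```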
HDInsight Dashboard Demo
Map/Reduce Using Measurement Data as an Example
0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
Fields: year, air temperature, measurement quality
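The three fields named on the slide sit at fixed positions in each record. The sketch below reads them with plain substring calls. The offsets are read off the sample records above. The missing-value marker 9999, the set of accepted quality codes, and the unit (tenths of a degree Celsius) are assumptions taken from the commonly used NCDC weather example, not from the slides.

```javascript
// Assumption: a temperature of +9999 marks a missing reading.
const MISSING = 9999;

// Extract year, air temperature and quality code from one fixed-width record.
function parseNcdcRecord(line) {
  return {
    year: line.substring(15, 19),
    // Signed integer such as "+0022" or "-0011" (assumed: tenths of a degree Celsius).
    airTemperature: parseInt(line.substring(87, 92), 10),
    quality: line.charAt(92),
  };
}

// Assumption: quality codes 0, 1, 4, 5 and 9 denote usable readings.
function isUsable(record) {
  return record.airTemperature !== MISSING && /^[01459]$/.test(record.quality);
}

const sampleRecord =
  "0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999";
const parsed = parseNcdcRecord(sampleRecord);
console.log(parsed.year, parsed.airTemperature, parsed.quality, isUsable(parsed));
```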
Map/Reduce
Each DataNode: Map, Sort, Shuffle. Then: Reduce.
Input records (shortened on the slide):
0067011990999991950051507004+68750
0043011990999991950051512004+68750
0043011990999991950051518004+68750
0043012650999991949032412004+62300
0043012650999991949032418004+62300
Map output: 1949,0 1950,22 1950,55 1952,-11 1950,33
After sort and shuffle: 1949,0 1950,[22,33,55] 1952,-11
Reduce output: 1949,0 1950,55 1952,-11
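The numbers on this slide describe a maximum-temperature-per-year job. The following sketch replays that flow locally on the slide's values: group the map output by year, then keep the largest value per group. It is a single-process illustration, not how a cluster runs the job. The function names are made up for this example, and sorting the values inside each group is done only to mirror the slide.

```javascript
// Map output as shown on the slide: [year, temperature] pairs.
const mapOutput = [
  ["1949", 0],
  ["1950", 22],
  ["1950", 55],
  ["1952", -11],
  ["1950", 33],
];

// Sort/shuffle step: collect all values per key and order the keys.
function shuffle(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return [...groups.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    // Value order is sorted here only to match the slide.
    .map(([key, values]) => [key, values.sort((x, y) => x - y)]);
}

// Reduce step: keep the maximum temperature for each year.
function reduceMax(groupedPairs) {
  return groupedPairs.map(([key, values]) => [key, Math.max(...values)]);
}

const groupedByYear = shuffle(mapOutput);
const maxPerYear = reduceMax(groupedByYear);
console.log(JSON.stringify(groupedByYear));
console.log(JSON.stringify(maxPerYear));
```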
Map/Reduce with a Combine Method
Each DataNode: Map, Combine, Sort, Shuffle. Then: Reduce.
Input records (shortened on the slide):
0067011990999991950051507004+68750
0043011990999991950051512004+68750
0043011990999991950051518004+68750
0043012650999991949032412004+62300
0043012650999991949032418004+62300
Map output: 1949,0 1950,22 1950,55 1952,-11 1950,33
Combine output: 1949,0 1950,55 1952,-11 1950,33
After sort and shuffle: 1949,0 1950,[33,55] 1952,-11
Reduce output: 1949,0 1950,55 1952,-11
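A combiner pre-aggregates each node's map output before it is sent over the network. The sketch below shows the effect on the slide's values. How the five pairs are spread over the three DataNodes is not recoverable from the slide, so the split used here is a hypothetical one that happens to reproduce the slide's combine output. The same max-per-key function can serve as combiner and reducer because taking a maximum is associative and commutative.

```javascript
// Hypothetical split of the slide's map output across three DataNodes.
const mapOutputPerNode = [
  [["1949", 0], ["1950", 22], ["1950", 55]],
  [["1952", -11]],
  [["1950", 33]],
];

// Keep the maximum value per key; used as both combiner and reducer.
function maxPerKey(pairs) {
  const best = new Map();
  for (const [key, value] of pairs) {
    best.set(key, best.has(key) ? Math.max(best.get(key), value) : value);
  }
  return [...best.entries()];
}

// Combine runs locally on each node, so fewer pairs have to be shuffled.
const combinedPerNode = mapOutputPerNode.map(maxPerKey);
const shuffledPairs = combinedPerNode.flat();
const reducedPairs = maxPerKey(shuffledPairs).sort(([a], [b]) => a.localeCompare(b));

console.log(shuffledPairs.length, JSON.stringify(shuffledPairs));
console.log(JSON.stringify(reducedPairs));
```

With this split, four pairs cross the network instead of five, and the final result is unchanged.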
Map/Reduce Using Measurement Data as an Example
Counting Words with JavaScript (Map)
Counting Words with JavaScript (Reduce)
Map/Reduce with JavaScript
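The code from the word-count slides is not part of this transcript. As a rough sketch of what such a job script could look like, the example below defines a map and a reduce function and runs them through a tiny local driver. The signatures map(key, value, context) and reduce(key, values, context), together with context.write, values.hasNext() and values.next(), are an assumption about the job format. The driver runLocally is hypothetical and exists only so the example can run without a cluster.

```javascript
// Map: emit (word, 1) for every word in a line of text.
function map(key, value, context) {
  for (const word of value.toLowerCase().split(/[^\p{L}]+/u)) {
    if (word !== "") context.write(word, 1);
  }
}

// Reduce: sum up all counts emitted for one word.
function reduce(key, values, context) {
  let sum = 0;
  while (values.hasNext()) {
    sum += Number(values.next());
  }
  context.write(key, sum);
}

// Hypothetical local driver standing in for the cluster: map, group, reduce.
function runLocally(lines) {
  const grouped = new Map();
  lines.forEach((line, index) =>
    map(index, line, {
      write: (k, v) => grouped.set(k, [...(grouped.get(k) || []), v]),
    })
  );
  const counts = {};
  for (const [word, ones] of grouped) {
    let pos = 0;
    const iterator = { hasNext: () => pos < ones.length, next: () => ones[pos++] };
    reduce(word, iterator, { write: (k, v) => { counts[k] = v; } });
  }
  return counts;
}

const wordCounts = runLocally(["Big Data ist sexy", "Big Data mit Hadoop"]);
console.log(wordCounts);
```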
Refining with Pig Latin
pig.from("/user/Sascha/input/texte")
   .mapReduce("/user/…/WordCount.js", "Woerter, Anzahl:long")
   .orderBy("Anzahl DESC")
   .take(15)
   .to("/user/Sascha/output/Top15Woerter")
Pig Latin
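The last steps of the pipeline above sort the word counts in descending order and keep the first 15 rows. As a local illustration of just those two steps, the sketch below does the same on an in-memory array. The sample rows are invented for this example, and topN is a hypothetical helper, not part of the Pig API.

```javascript
// Invented sample rows using the column names from the pipeline above.
const rows = [
  { Woerter: "hadoop", Anzahl: 4 },
  { Woerter: "data", Anzahl: 9 },
  { Woerter: "hive", Anzahl: 2 },
  { Woerter: "big", Anzahl: 7 },
];

// Equivalent of orderBy("Anzahl DESC") followed by take(n).
function topN(input, n) {
  return [...input].sort((a, b) => b.Anzahl - a.Anzahl).slice(0, n);
}

const topTwo = topN(rows, 2);
console.log(topTwo.map((row) => row.Woerter + ":" + row.Anzahl).join(", "));
```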
Counting Words with C# (Map - Classic)
Counting Words with C# (Reduce - Classic)
Map/Reduce with C#
.NET Job Submission Framework (Map)
.NET Job Submission Framework (Reduce)
Creating an External Hive Table
CREATE EXTERNAL TABLE twitter_raw
(
tweet_json STRING
)
COMMENT 'Twitter Sample Data'
ROW FORMAT DELIMITED LINES TERMINATED
BY '10'
STORED AS TEXTFILE
LOCATION '/example/twitterdata';
Twitter JSON
{
"possibly_sensitive_editable":true,
"place":null,
"text":"Pre - #ConvCloud chat insights. \" #Cloud Security, are we missing the point?\" from @christianve http://t.co/Smo0CPvb #HP #cloudsource",
"id_str":"223418953114984448",
"favorited":false,
"possibly_sensitive":false,
"created_at":"Thu Jul 12 14:10:04 +0000 2012",
"retweeted":false,
"retweet_count":0,
"user":{
"is_translator":false,
"profile_use_background_image":true,
"profile_image_url_https":"https://si0.twimg.com/profile_images/640456324/Paul_Calento_normal.jpg",
"id_str":"103006513",
"profile_text_color":"333333",
"statuses_count":5984,
"following":null,
"followers_count":744,
"default_profile_image":false,
"profile_link_color":"FF3300",
}, …..
}
Parsing JSON in Hive
FROM twitter_raw
INSERT OVERWRITE TABLE twitter_temp
SELECT get_json_object(tweet_json, '$.created_at'),
substr(get_json_object(tweet_json, '$.created_at'),9,2),
substr(get_json_object(tweet_json, '$.created_at'),12,8),
get_json_object(tweet_json, '$.in_reply_to_user_id_str'),
get_json_object(tweet_json, '$.text'),
get_json_object(tweet_json, '$.contributors'),
get_json_object(tweet_json, '$.retweeted'),
get_json_object(tweet_json, '$.truncated'),
get_json_object(tweet_json, '$.favorited'),
cast(get_json_object(tweet_json, '$.retweet_count') as int),
/* … */
get_json_object(tweet_json, '$.user.profile_image_url_https'),
cast(get_json_object(tweet_json, '$.user.followers_count') as int),
get_json_object(tweet_json, '$.user.location'),
get_json_object(tweet_json, '$.user.time_zone'),
get_json_object(tweet_json, '$.user.created_at');
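To make the extraction steps in the query above easier to follow, here is a small local sketch of the two building blocks: looking up a value by a path such as $.user.followers_count, and cutting a substring with a 1-based start position as Hive's substr does. Both functions are simplified stand-ins written for this example, not Hive's implementation. The stand-in handles only plain dotted paths and returns JavaScript values, whereas Hive returns strings. The tweet is an abbreviated version of the sample above.

```javascript
// Abbreviated tweet using fields from the sample above.
const tweetJson = JSON.stringify({
  created_at: "Thu Jul 12 14:10:04 +0000 2012",
  retweet_count: 0,
  user: { followers_count: 744 },
});

// Simplified stand-in for get_json_object: supports "$.a.b" style paths only.
function getJsonObject(json, path) {
  let node = JSON.parse(json);
  for (const key of path.replace(/^\$\./, "").split(".")) {
    if (node === null || typeof node !== "object" || !(key in node)) return null;
    node = node[key];
  }
  return node;
}

// Stand-in for Hive's substr(s, start, length) with a 1-based start.
function substr(s, start, length) {
  return s.substring(start - 1, start - 1 + length);
}

const createdAt = getJsonObject(tweetJson, "$.created_at");
const dayOfMonth = substr(createdAt, 9, 2);
const timeOfDay = substr(createdAt, 12, 8);
const followers = getJsonObject(tweetJson, "$.user.followers_count");
console.log(dayOfMonth, timeOfDay, followers);
```

For the sample timestamp, positions 9 to 10 hold the day of the month and positions 12 to 19 hold the time of day, which is what the two substr calls in the query pick out.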
Hive
RDBMS vs. Hadoop
                  RDBMS                   Hadoop
Volume            Gigabytes               Petabytes
Processing        Ad hoc and batch        Batch
Updates           Many reads and writes   Write once, read many times
Schema            Static schema           Dynamic schema
Data integrity    High                    Low
Scaling behavior  Non-linear              Linear
Polybase / SQL Server PDW
Questions?

dotnet Cologne 2013 - Microsoft HDInsight for .NET Developers

  • 1. Sascha Dittmann Blog: http://www.sascha-dittmann.de Twitter: @SaschaDittmann Microsoft HDInsight for .NET Developers: Big Data Analyses with JavaScript and C#
  • 2. Large Hadron Collider (CERN, Switzerland) http://public.web.cern.ch/public/en/lhc/Computing-en.html The LHC particle accelerator produces 15 PB of measurement data per year*
  • 3. Where Big Data comes from. 70% of U.S. smartphone owners regularly shop online via their devices. 44% of users (350M people) access Facebook via mobile devices. 50% of millennials use mobile devices to research products. 60% of U.S. mobile data will be audio and video streaming by 2014. 2/3 of the world's mobile data traffic will be video by 2016. 33% of BI will be consumed via handheld devices by 2013. Gaming consoles are now used an average of 1.5 hrs/wk to connect to the Internet. 80% growth of unstructured data is predicted over the next five years. 1.8 zettabytes of digital data were in use worldwide in 2011, up 30% from 2010. 1 in 4 Facebook users add their location to posts (2B/month). 500M Tweets are hosted on Twitter each day. 38% of people recommend a brand they “like” or follow on a social network. Brands get 100M Facebook “likes” per day. Big Data: Social, Mobility, Cloud.
  • 4. Big Data scenarios: Web app optimization, Smart meter monitoring, Equipment monitoring, Advertising analysis, Life sciences research, Fraud detection, Healthcare outcomes, Weather forecasting, Natural resource exploration, Social network analysis, Churn analysis, Traffic flow optimization, IT infrastructure optimization, Legal discovery
  • 5. Big Data is sexy http://hbr.org/
  • 6. Apache Hadoop Ecosystem: MapReduce (Job Scheduling/Execution System), HDFS (Hadoop Distributed File System), HBase (Column DB), Pig (Data Flow), Hive (Warehouse and Data Access), Oozie (Workflow), Sqoop, Traditional BI Tools, HBase / Cassandra (Columnar NoSQL Databases), Avro (Serialization), Zookeeper (Coordination), Apache Mahout, Cascading (programming model), Flume. Hadoop = MapReduce + HDFS
  • 7. Microsoft HDInsight: MapReduce (Job Scheduling/Execution System), HDFS (Hadoop Distributed File System), HBase (Column DB), Pig (Data Flow), Hive (Warehouse and Data Access), Oozie (Workflow), Sqoop, Traditional BI Tools, HBase / Cassandra (Columnar NoSQL Databases), Avro (Serialization), Zookeeper (Coordination), Apache Mahout, Cascading (programming model), Flume. Hadoop = MapReduce + HDFS. Integration with Windows, System Center, Active Directory, Visual Studio
  • 8. Hadoop Distributed File System (HDFS): boot process, fault tolerance, user request
  • 9. Hadoop Distributed File System (HDFS): boot process, fault tolerance, user request
  • 11. Hadoop Distributed File System (HDFS)
       • Portable Operating System Interface (POSIX)
       • Replication across multiple data nodes
       js> #ls /user/Sascha/input/ncdc
       Found 9 items
       drwxr-xr-x - Sascha supergroup 0 2013-04-24 13:09 /user/Sascha/input/ncdc/all
       drwxr-xr-x - Sascha supergroup 0 2013-04-24 13:01 /user/Sascha/input/ncdc/all2
       drwxr-xr-x - Sascha supergroup 0 2013-04-23 13:06 /user/Sascha/input/ncdc/metadata
       drwxr-xr-x - Sascha supergroup 0 2013-04-23 13:06 /user/Sascha/input/ncdc/micro
       drwxr-xr-x - Sascha supergroup 0 2013-04-23 13:06 /user/Sascha/input/ncdc/micro-tab
       -rw-r--r-- 3 Sascha supergroup 529 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt
       -rw-r--r-- 3 Sascha supergroup 168 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt.gz
  • 13. Map/Reduce using measurement data as an example. Highlighted fields: year, air temperature
       0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
       0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
       0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
       0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
       0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
  • 14. Map/Reduce using measurement data as an example. Highlighted field: measurement quality
       0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
       0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
       0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
       0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
       0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
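The three fields highlighted on slides 13 and 14 can be pulled out of a record with fixed character offsets. The following is a minimal sketch, assuming the usual fixed-width NCDC layout (year at offsets 15 to 18, signed air temperature in tenths of a degree Celsius at 87 to 91, quality code at 92). The offsets, the `parseRecord` and `isValid` names, and the accepted quality codes are assumptions and do not appear on the slides.

```javascript
// Sketch only: the field offsets are assumed from the fixed-width NCDC layout.
function parseRecord(line) {
    var year = line.substring(15, 19);                      // four-digit year
    var temperature = parseInt(line.substring(87, 92), 10); // signed, tenths of a degree Celsius
    var quality = line.charAt(92);                          // single-character quality code
    return { year: year, temperature: temperature, quality: quality };
}

// Hypothetical filter: treats 9999 as "reading missing" and accepts
// only a small set of quality codes (both are assumptions).
function isValid(record) {
    return record.temperature !== 9999 && /[01459]/.test(record.quality);
}

// Second sample record from the slide.
var sample = "0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999";
var record = parseRecord(sample);
console.log(record.year, record.temperature, record.quality); // prints "1950 22 1"
```

A mapper built on this would emit the year as key and the temperature as value for every record that passes `isValid`.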
  • 16. Map/Reduce with a combine method. Each of the three DataNodes runs Map, Combine, Sort, and Shuffle; a single Reduce step follows.
       Input records (shortened):
         0067011990999991950051507004+68750
         0043011990999991950051512004+68750
         0043011990999991950051518004+68750
         0043012650999991949032412004+62300
         0043012650999991949032418004+62300
       Map output:      1949,0  1950,22  1950,55  1952,-11  1950,33
       After Combine:   1949,0  1950,55  1952,-11  1950,33
       After Shuffle:   1949,0  1950,[33,55]  1952,-11
       Reduce output:   1949,0  1950,55  1952,-11
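The data flow on slide 16 can be replayed in plain JavaScript. This is a minimal sketch that assumes the job computes the maximum temperature per year, which is what the slide's numbers suggest. The split of the map output across three nodes and the function names `combine`, `shuffle`, and `reduce` are hypothetical, and the code runs locally rather than on Hadoop.

```javascript
// Hypothetical distribution of the slide's map output across three data nodes.
var mapOutputs = [
    [["1949", 0], ["1950", 22], ["1950", 55]],
    [["1952", -11]],
    [["1950", 33]]
];

// Combine: keep only the local maximum per year on each node,
// so fewer pairs have to travel over the network.
function combine(pairs) {
    var best = {};
    pairs.forEach(function (pair) {
        var year = pair[0], temp = pair[1];
        if (!(year in best) || temp > best[year]) { best[year] = temp; }
    });
    return Object.keys(best).map(function (year) { return [year, best[year]]; });
}

// Shuffle: group the combined values from all nodes by year.
function shuffle(nodeOutputs) {
    var groups = {};
    nodeOutputs.forEach(function (pairs) {
        pairs.forEach(function (pair) {
            (groups[pair[0]] = groups[pair[0]] || []).push(pair[1]);
        });
    });
    return groups;
}

// Reduce: the global maximum per year.
function reduce(groups) {
    var result = {};
    Object.keys(groups).forEach(function (year) {
        result[year] = Math.max.apply(null, groups[year]);
    });
    return result;
}

var combined = mapOutputs.map(combine);
var maxima = reduce(shuffle(combined));
console.log(JSON.stringify(maxima)); // prints {"1949":0,"1950":55,"1952":-11}
```

Using the same maximum function for combine and reduce works because taking a maximum of maxima gives the overall maximum; an average, for example, could not be combined this way.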
  • 17. Map/Reduce using measurement data as an example
  • 18. Counting words with JavaScript (Map)
  • 19. Counting words with JavaScript (Reduce)
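The code from slides 18 and 19 is not part of the transcript. As a rough sketch of what a JavaScript word count can look like, the functions below take `(key, value, context)` and `(key, values, context)` arguments. That shape, the `context.write` call, the `hasNext`/`next` iterator, and the `runLocally` driver are assumptions made for illustration and are not taken from the slides. The driver only simulates the framework in memory.

```javascript
// Mapper: emit (word, 1) for every word in one line of text.
var map = function (key, value, context) {
    var words = value.toLowerCase().split(/[^a-zäöüß]+/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i], 1);
        }
    }
};

// Reducer: add up all counts that were emitted for one word.
var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

// Hypothetical in-memory driver that stands in for the cluster.
function runLocally(lines) {
    var groups = Object.create(null);
    lines.forEach(function (line, index) {
        map(index, line, {
            write: function (k, v) { (groups[k] = groups[k] || []).push(v); }
        });
    });
    var counts = {};
    Object.keys(groups).forEach(function (word) {
        var position = 0, values = groups[word];
        var iterator = {
            hasNext: function () { return position < values.length; },
            next: function () { return values[position++]; }
        };
        reduce(word, iterator, { write: function (k, v) { counts[k] = v; } });
    });
    return counts;
}

var counts = runLocally(["Big Data ist sexy", "big data mit Hadoop"]);
console.log(counts.big, counts.data, counts.hadoop); // prints "2 2 1"
```

Lower-casing before splitting makes "Big" and "big" count as the same word; whether the original script did this is unknown.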
  • 21. Refining with Pig Latin
       pig.from("/user/Sascha/input/texte")
          .mapReduce("/user/…/WordCount.js", "Woerter, Anzahl:long")
          .orderBy("Anzahl DESC")
          .take(15)
          .to("/user/Sascha/output/Top15Woerter")
  • 23. Counting words with C# (Map - Classic)
  • 24. Counting words with C# (Reduce - Classic)
  • 26. .NET Job Submission Framework (Map)
  • 27. .NET Job Submission Framework (Reduce)
  • 28. Creating an external Hive table
       CREATE EXTERNAL TABLE twitter_raw (
         tweet_json STRING
       )
       COMMENT 'Twitter Sample Data'
       ROW FORMAT DELIMITED LINES TERMINATED BY '10'
       STORED AS TEXTFILE
       LOCATION '/example/twitterdata';
  • 29. Twitter JSON
       {
         "possibly_sensitive_editable":true,
         "place":null,
         "text":"Pre - #ConvCloud chat insights. \" #Cloud Security, are we missing the point?\" from @christianve http://t.co/Smo0CPvb #HP #cloudsource",
         "id_str":"223418953114984448",
         "favorited":false,
         "possibly_sensitive":false,
         "created_at":"Thu Jul 12 14:10:04 +0000 2012",
         "retweeted":false,
         "retweet_count":0,
         "user":{
           "is_translator":false,
           "profile_use_background_image":true,
           "profile_image_url_https":"https://si0.twimg.com/profile_images/640456324/Paul_Calento_normal.jpg",
           "id_str":"103006513",
           "profile_text_color":"333333",
           "statuses_count":5984,
           "following":null,
           "followers_count":744,
           "default_profile_image":false,
           "profile_link_color":"FF3300"
         },
         …..
       }
  • 30. Interpreting JSON in Hive
       FROM twitter_raw
       INSERT OVERWRITE TABLE twitter_temp
       SELECT
         get_json_object(tweet_json, '$.created_at'),
         substr(get_json_object(tweet_json, '$.created_at'),9,2),
         substr(get_json_object(tweet_json, '$.created_at'),12,8),
         get_json_object(tweet_json, '$.in_reply_to_user_id_str'),
         get_json_object(tweet_json, '$.text'),
         get_json_object(tweet_json, '$.contributors'),
         get_json_object(tweet_json, '$.retweeted'),
         get_json_object(tweet_json, '$.truncated'),
         get_json_object(tweet_json, '$.favorited'),
         cast(get_json_object(tweet_json, '$.retweet_count') as int),
         /* … */
         get_json_object(tweet_json, '$.user.profile_image_url_https'),
         cast(get_json_object(tweet_json, '$.user.followers_count') as int),
         get_json_object(tweet_json, '$.user.location'),
         get_json_object(tweet_json, '$.user.time_zone'),
         get_json_object(tweet_json, '$.user.created_at');
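To see what the path and substring expressions on slide 30 select, the same lookups can be replayed in JavaScript against the values from slide 29. This is an illustrative sketch only: `getJsonObject` and `hiveSubstr` are hypothetical stand-ins that handle simple `$.a.b` paths, and unlike Hive's `get_json_object` they return JavaScript values rather than strings.

```javascript
// Hypothetical stand-in for get_json_object, limited to simple "$.a.b" paths.
function getJsonObject(json, path) {
    var value = JSON.parse(json);
    var keys = path.replace(/^\$\.?/, "").split(".");
    for (var i = 0; i < keys.length; i++) {
        if (value === null || typeof value !== "object" || !(keys[i] in value)) {
            return null; // path does not exist in this document
        }
        value = value[keys[i]];
    }
    return value;
}

// Hive's substr counts from 1, JavaScript's substring from 0.
function hiveSubstr(text, start, length) {
    return text.substring(start - 1, start - 1 + length);
}

// Reduced tweet using field values from slide 29.
var tweet = JSON.stringify({
    created_at: "Thu Jul 12 14:10:04 +0000 2012",
    retweet_count: 0,
    user: { followers_count: 744 }
});

var createdAt = getJsonObject(tweet, "$.created_at");
console.log(hiveSubstr(createdAt, 9, 2));                     // prints "12" (day of month)
console.log(hiveSubstr(createdAt, 12, 8));                    // prints "14:10:04" (time of day)
console.log(getJsonObject(tweet, "$.user.followers_count"));  // prints 744
console.log(getJsonObject(tweet, "$.user.location"));         // prints null (field absent)
```

The fixed positions 9 and 12 only work because Twitter's `created_at` string has a fixed-width layout with a three-letter weekday and month.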
  • 31. Hive
  • 32. RDBMS vs. Hadoop
                          RDBMS                   Hadoop
       Data volume        Gigabytes               Petabytes
       Processing         Ad hoc and batch        Batch
       Updates            Many reads and writes   Write once, read many times
       Schema             Static schema           Dynamic schema
       Data integrity     High                    Low
       Scaling behavior   Non-linear              Linear
  • 33. PolyBase / SQL Server PDW