Apache Hadoop-based Services for Windows Azure

Sascha Dittmann
Software Developer / Solution Architect
Twitter: @SaschaDittmann
Blog:    http://www.sascha-dittmann.de
Apache Hadoop & Co
[Diagram of the Hadoop ecosystem; surviving labels: Zookeeper, Pig]
Hadoop Distributed File System
[Diagram: cluster startup]
Hadoop Distributed File System
[Diagram: failure of the NameNode (failover)]
Hadoop Distributed File System
[Diagram: user request, step ① followed by three parallel steps ②]
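The user-request diagram survives only as step markers. Assuming it depicts the standard HDFS read path (① the client asks the NameNode where the blocks of a file live, ② it then reads those blocks directly from several DataNodes), the following toy model illustrates the idea. The classes `NameNode` and `DataNode`, the helper `read_file`, and all file and block names are hypothetical; this is not the HDFS client API.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    name: str
    blocks: dict = field(default_factory=dict)  # block id -> bytes


@dataclass
class NameNode:
    # path -> ordered list of (block id, names of DataNodes holding a replica)
    block_map: dict = field(default_factory=dict)

    def get_block_locations(self, path):
        # Step 1: metadata lookup only; file contents never pass through the NameNode.
        return self.block_map[path]


def read_file(namenode, live_datanodes, path):
    """Reassemble a file block by block (hypothetical helper, not an HDFS API)."""
    data = b""
    for block_id, replica_nodes in namenode.get_block_locations(path):
        # Step 2: read the block directly from the first live DataNode that has it.
        for node_name in replica_nodes:
            node = live_datanodes.get(node_name)
            if node is not None and block_id in node.blocks:
                data += node.blocks[block_id]
                break
        else:
            raise IOError(f"no live replica for block {block_id}")
    return data


dn1 = DataNode("dn1", {"blk_1": b"Hello, "})
dn2 = DataNode("dn2", {"blk_1": b"Hello, ", "blk_2": b"HDFS"})
dn3 = DataNode("dn3", {"blk_2": b"HDFS"})
nn = NameNode({"/user/demo/greeting.txt": [("blk_1", ["dn1", "dn2"]),
                                           ("blk_2", ["dn2", "dn3"])]})

live = {n.name: n for n in (dn1, dn2, dn3)}
print(read_file(nn, live, "/user/demo/greeting.txt"))  # b'Hello, HDFS'
del live["dn1"]  # dn1 fails; blk_1 is still served by dn2
print(read_file(nn, live, "/user/demo/greeting.txt"))  # b'Hello, HDFS'
```

Because file contents flow directly from the DataNodes to the client, the NameNode only serves metadata; the second call shows that a read still succeeds when one replica holder is gone.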
Hadoop Distributed File System
 • Modeled on the Portable Operating System Interface (POSIX): hierarchical paths, owners, groups, permission bits (not fully POSIX-compliant)
 • Replication across multiple DataNodes
js> #ls input/ncdc
Found 9 items
drwxr-xr-x - Sascha   supergroup   0 2012-04-24 13:01 /user/Sascha/input/ncdc/_distcp_logs_g0dedn
drwxr-xr-x - Sascha   supergroup   0 2012-04-24 12:04 /user/Sascha/input/ncdc/_distcp_logs_ofj0u6
drwxr-xr-x - Sascha   supergroup   0 2012-04-24 13:09 /user/Sascha/input/ncdc/all
drwxr-xr-x - Sascha   supergroup   0 2012-04-24 13:01 /user/Sascha/input/ncdc/all2
drwxr-xr-x - Sascha   supergroup   0 2012-04-23 13:06 /user/Sascha/input/ncdc/metadata
drwxr-xr-x - Sascha   supergroup   0 2012-04-23 13:06 /user/Sascha/input/ncdc/micro
drwxr-xr-x - Sascha   supergroup   0 2012-04-23 13:06 /user/Sascha/input/ncdc/micro-tab
-rw-r--r-- 3 Sascha   supergroup   529 2012-04-23 13:06 /user/Sascha/input/ncdc/sample.txt
-rw-r--r-- 3 Sascha   supergroup   168 2012-04-23 13:06 /user/Sascha/input/ncdc/sample.txt.gz
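The listing follows the column layout of `hadoop fs -ls`: permissions, replication factor (`-` for directories), owner, group, size in bytes, modification date and time, path. The sketch below parses such lines and multiplies each file's size by its replication factor to estimate the raw bytes it occupies across all DataNodes. `parse_ls_line`, `HdfsEntry`, and `raw_bytes_stored` are illustrative names, and the estimate ignores block checksums and other storage overhead.

```python
from typing import NamedTuple, Optional


class HdfsEntry(NamedTuple):
    permissions: str
    replication: Optional[int]  # None for directories ("-" in the listing)
    owner: str
    group: str
    size: int
    path: str

    @property
    def is_dir(self):
        return self.permissions.startswith("d")


def parse_ls_line(line):
    # Assumed column order: permissions, replication, owner, group, size, date, time, path
    perms, repl, owner, group, size, _date, _time, path = line.split(maxsplit=7)
    return HdfsEntry(perms, None if repl == "-" else int(repl),
                     owner, group, int(size), path)


def raw_bytes_stored(entry):
    """Logical size times replication factor (ignores checksum metadata)."""
    return 0 if entry.is_dir else entry.size * entry.replication


listing = """\
drwxr-xr-x - Sascha supergroup 0 2012-04-23 13:06 /user/Sascha/input/ncdc/metadata
-rw-r--r-- 3 Sascha supergroup 529 2012-04-23 13:06 /user/Sascha/input/ncdc/sample.txt
-rw-r--r-- 3 Sascha supergroup 168 2012-04-23 13:06 /user/Sascha/input/ncdc/sample.txt.gz"""

for entry in map(parse_ls_line, listing.splitlines()):
    print(entry.path, entry.replication, raw_bytes_stored(entry))
```

For the two sample files, a replication factor of 3 turns 529 and 168 logical bytes into 1587 and 504 bytes of raw cluster storage.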
Map/Reduce
Input records (NCDC weather data on three DataNodes):
  0067011990999991950051507004+68750
  0043011990999991950051512004+68750
  0043011990999991950051518004+68750
  0043012650999991949032412004+62300
  0043012650999991949032418004+62300
Map (year, temperature):
  1949,0   1952,-11   1950,22   1950,55   1950,33
Sort / Shuffle:
  1949,0   1950,[22,33,55]   1952,-11
Reduce (maximum per year):
  1949,0   1950,55   1952,-11
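The flow above computes the maximum temperature per year. Here is a minimal in-memory sketch, assuming the map phase has already produced the `(year, temperature)` pairs shown on the slide; `shuffle` and `reduce_max` are hypothetical helpers that mimic the framework phases, not Hadoop APIs. The `year_of` helper assumes the fixed-width NCDC record format, in which the year sits at character offsets 15 to 18, which matches the record prefixes on the slide.

```python
from collections import defaultdict


def year_of(record):
    # Assumption: fixed-width NCDC format, year at character offsets 15-18.
    return int(record[15:19])


def shuffle(pairs):
    """Group map output by key and order by key, like the sort/shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Hadoop sorts by key only; values are sorted here just to match the slide.
    return {key: sorted(values) for key, values in sorted(groups.items())}


def reduce_max(groups):
    # The reducer keeps the highest temperature per year.
    return {year: max(temps) for year, temps in groups.items()}


# Map output as shown on the slide: (year, temperature)
map_output = [(1949, 0), (1952, -11), (1950, 22), (1950, 55), (1950, 33)]

groups = shuffle(map_output)
print(groups)              # {1949: [0], 1950: [22, 33, 55], 1952: [-11]}
print(reduce_max(groups))  # {1949: 0, 1950: 55, 1952: -11}
print(year_of("0043012650999991949032412004+62300"))  # 1949
```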
Map/Reduce
Input records (NCDC weather data on three DataNodes):
  0067011990999991950051507004+68750
  0043011990999991950051512004+68750
  0043011990999991950051518004+68750
  0043012650999991949032412004+62300
  0043012650999991949032418004+62300
Map (year, temperature), two groups of output:
  1949,0   1950,22   1950,55   |   1952,-11   1950,33
Combine (local maximum per year):
  1949,0   1950,55             |   1952,-11   1950,33
Sort / Shuffle:
  1949,0   1950,[33,55]   1952,-11
Reduce (maximum per year):
  1949,0   1950,55   1952,-11
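A combiner runs on the output of each map task before the shuffle, so fewer records cross the network. The sketch below assumes the two value groups on the slide are the outputs of two separate map tasks, applies the maximum function locally as a combiner (`local_max`, a hypothetical name), and compares what reaches the reducer with and without it.

```python
from collections import defaultdict


def local_max(pairs):
    """Combiner: within one map task, keep only the maximum value per key."""
    best = {}
    for key, value in pairs:
        best[key] = value if key not in best else max(best[key], value)
    return list(best.items())


def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Values sorted only to mirror the slide; Hadoop sorts keys, not values.
    return {key: sorted(values) for key, values in sorted(groups.items())}


# Map output split into the two groups shown on the slide (one list per map task).
map_tasks = [[(1949, 0), (1950, 22), (1950, 55)],
             [(1952, -11), (1950, 33)]]

without_combiner = shuffle(p for task in map_tasks for p in task)
with_combiner = shuffle(p for task in map_tasks for p in local_max(task))

print(without_combiner)  # {1949: [0], 1950: [22, 33, 55], 1952: [-11]}
print(with_combiner)     # {1949: [0], 1950: [33, 55], 1952: [-11]}
print({k: max(v) for k, v in with_combiner.items()})  # {1949: 0, 1950: 55, 1952: -11}
```

Reusing the reduce logic as a combiner is only safe because the maximum is associative and commutative. Hadoop may invoke a combiner zero, one, or several times, so a function such as the mean could not be reused this way without changes.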
RDBMS vs. Map/Reduce
                          RDBMS                          Map/Reduce
Data volume               Gigabytes                      Petabytes
Access                    Interactive and batch          Batch
Reads / writes            Many reads and writes          Write once, read many times
Data structure            Static schema                  Dynamic schema
Data integrity            High                           Low
Scaling behavior          Non-linear                     Linear
Apache Hadoop & Co
[Diagram of the Hadoop ecosystem; surviving labels: Zookeeper, Pig]
Demos
 • Hadoop Dashboard
 • Interactive Console
 • Remote Desktop
 • Using Windows Azure (WA) Storage
 • Map/Reduce via JavaScript
 • C# Streaming
 • Power Pivot
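The "C# Streaming" demo relies on Hadoop Streaming, where mapper and reducer are ordinary programs that read lines from stdin and write `key<TAB>value` lines to stdout, and the reducer receives its input sorted by key. The following sketch shows that contract in Python rather than C#; the `year,temperature` input format and the `map`/`reduce` command-line arguments are assumptions made for this example.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Turn 'year,temperature' lines (the slides' notation) into 'year<TAB>temperature'."""
    for line in lines:
        line = line.strip()
        if line:
            year, temp = line.split(",")
            yield f"{year}\t{temp}"


def reducer(lines):
    """Input arrives sorted by key; emit the maximum temperature per year."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for year, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{year}\t{max(int(temp) for _, temp in group)}"


STAGES = {"map": mapper, "reduce": reducer}

# Hypothetical CLI for this sketch: run with "map" or "reduce" as the only argument.
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in STAGES:
    for out in STAGES[sys.argv[1]](sys.stdin):
        print(out)
```

Locally, the pipeline can be approximated with `cat pairs.txt | python mr.py map | sort | python mr.py reduce`, where `sort` stands in for the framework's shuffle; `pairs.txt` and `mr.py` are placeholder names.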
Cloud Bloggers

The blogs of the German cloud computing community

Link: http://cloudbloggers.de


.NET Usergroup Rhein-Neckar: Big Data in the Cloud - Apache Hadoop-based Services for Windows Azure
