Data Tuesday – January 10, 2012
Pierre Lagarde (DPE) – pierlag@microsoft.com
Benjamin Guinebertière (DPE) – www.benjguin.com
Microsoft Distribution of Hadoop [MDH]

• Code name: Isotope
• Leveraging the Hadoop data-driven
  community
  –   On-premises and cloud
  –   Windows Server integration [AD – Secure HDFS]
  –   Connection with SQL Server / Excel
  –   Developer Framework [JavaScript, .NET, F#, …]
  –   Hadoop as a Service through Azure [eMDH]
Structural Overview

[Diagram: ISOTOPE (Azure and Enterprise), layers from top to bottom]
• Java - JavaScript, Streaming OM, HiveQL, PigLatin, .NET/C#/F#, (T)SQL
• NOSQL, OCEAN OF DATA [unstructured, semi-structured, structured], ETL
• HDFS
• A SEAMLESS OCEAN OF INFORMATION PROCESSING AND ANALYTICS
• EIS / ERP, RDBMS, File System, OData [RSS], Azure Storage
Creating a cluster on demand
Map/Reduce - Java
Map/Reduce - C#
Map/Reduce - JavaScript
Demo - JavaScript

[Diagram labels: Azure Storage, distcp, HDFS, Sort/filter, JavaScript M/R, HDFS File, Graph.bar(data), Excel ODBC Hive Connector, Reporting SQLServer]

JavaScript M/R:
    from("books")
    .mapReduce("file.js", "word, count:long")
    .orderBy("count DESC")
    .take(10)
    .to("top10")
•   from("books")
    .mapReduce("bin/WordCountLong.js", "word, count:long")
    .orderBy("count DESC")
    .take(10)
    .to("demo-top10")
• #get top10
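
The query above counts words in "books", sorts them by count in descending order, and keeps the ten most frequent. As a rough illustration only, the same steps can be sketched in plain JavaScript with no cluster involved. The function names (`mapWords`, `reduceCounts`, `top`) and the sample lines are hypothetical. This is neither the content of bin/WordCountLong.js nor the Isotope console API, which the slides do not show.

```javascript
// Hypothetical, cluster-free sketch of the word-count pipeline.
// None of these names come from the Isotope console.

// Map step: emit a [word, 1] pair for every word in a line of text.
function mapWords(line) {
  return line
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter((word) => word.length > 0)
    .map((word) => [word, 1]);
}

// Reduce step: sum the emitted counts per word.
function reduceCounts(pairs) {
  const counts = new Map();
  for (const [word, n] of pairs) {
    counts.set(word, (counts.get(word) || 0) + n);
  }
  return counts;
}

// Equivalent of orderBy("count DESC") followed by take(n).
function top(counts, n) {
  return [...counts.entries()]
    .map(([word, count]) => ({ word, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, n);
}

// Sample input standing in for the "books" data set (made up for this sketch).
const books = [
  "the quick brown fox",
  "the lazy dog and the fox",
];

const top10 = top(reduceCounts(books.flatMap(mapWords)), 10);
console.log(top10);
```

On a real cluster the map and reduce steps run in parallel across HDFS blocks. The sketch only mirrors the logical order of the chained calls.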

Editor's notes

  1. Isotope is the all-up effort around Microsoft and Hadoop. It includes several components:
     • A full distribution of Apache Hadoop that runs on standard Windows hardware.
     • A full version of Apache Hadoop that runs on the Azure cloud.
     • Connectors from Hadoop (any Hadoop, not just Microsoft’s) to Microsoft’s key products – SQL, Excel, PDW, etc.
     • JScript shell for live scripting of Hadoop from the browser.
     • Admin, monitoring, and authoring tools to make Microsoft Hadoop best-in-class.