Building Big Data Solutions in
the Microsoft Platform
Jesus Rodriguez
Tellago, Inc, Tellago Studios
Big Data?
About Me…
•   Hackerpreneur
•   Co-Founder Tellago, Tellago Studios, Inc.
•   Microsoft Architect Advisor
•   Microsoft MVP
•   Oracle ACE
•   Speaker, Author
•   http://weblogs.asp.net/gsusx
•   http://jrodthoughts.com
•   http://moesion.com
Agenda
• Big Data Overview
• MS HDInsight
   –   Map Reduce
   –   HDFS
   –   Hive
   –   Pig
   –   Sqoop
• HDInsight Service
• The Hadoop Ecosystem
• The Future….
Big Data?
•   A bunch of data?
•   An industry?
•   An expertise?
•   A trend?
•   A cliché?
A Clue?
• 2008: Google processes 20 PB a day
• 2009: Facebook has 2.5 PB user
  data + 15 TB/day
• 2009: eBay has 6.5 PB user data +
  50 TB/day
• 2011: Yahoo! has 180-200 PB of data
• 2012: Facebook ingests 500 TB/day
We Love Data!
But...
Processing Large Amounts of
   Data is Complicated....
Successful Big Data = Scalable
 Computing + Large Storage
A Trivial Model
Not So Fast....
Parallel Data Computing is
              Complicated
So Is Large Data Storage
Enter the World of Hadoop...
Hadoop Design Principles
•   System Shall Manage and Heal Itself
•   Performance Shall Scale Linearly
•   Compute Shall Move to Data
•   Simple Core, Modular and Extensible
Hadoop History
•   2002-2004: Doug Cutting and Mike Cafarella started working on Nutch
•   2003-2004: Google publishes GFS and MapReduce papers
•   2004: Cutting adds DFS & MapReduce support to Nutch
•   2006: Yahoo! hires Cutting, Hadoop spins out of Nutch
•   2007: NY Times converts 4TB of archives over 100 EC2s
•   2008: Web-scale deployments at Y!, Facebook, Last.fm
•   April 2008: Yahoo does fastest sort of a TB, 3.5mins over 910 nodes
•   May 2009:
     – Yahoo does fastest sort of a TB, 62secs over 1460 nodes
     – Yahoo sorts a PB in 16.25hours over 3658 nodes
•   June 2009, Oct 2009: Hadoop Summit, Hadoop World
•   September 2009: Doug Cutting joins Cloudera
Hadoop Ecosystem
                            ETL Tools        BI Reporting      RDBMS
ZooKeeper (Coordination)




                          Pig (Data Flow)    Hive (SQL)         Sqoop




                                                                             Avro (Serialization)
                          MapReduce (Job Scheduling/Execution System)

                          HBase (key-value store)   (Streaming/Pipes APIs)


                                              HDFS
                                 (Hadoop Distributed File System)
Microsoft & Hadoop
HDInsight
HDFS
HDFS Is…
• A distributed file system
• Redundant storage
• Designed to reliably store data using
  commodity hardware
• Designed to expect hardware failures
• Intended for large files
• Designed for batch inserts
• The Hadoop Distributed File System
HDFS at a Glance
  Block Size = 64MB
 Replication Factor = 3




Cost/GB is a few ¢/month
      vs $/month
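The block size and replication factor above determine how a file is split and how much raw disk it consumes. A minimal sketch of that arithmetic, assuming the slide's values of 64 MB and 3; the function name and the 1 GB example file are hypothetical:

```python
import math

BLOCK_SIZE_MB = 64       # block size quoted on the slide
REPLICATION_FACTOR = 3   # replication factor quoted on the slide


def hdfs_footprint(file_size_mb):
    """Return (block_count, raw_storage_mb) for a single file.

    The file is split into fixed-size blocks (the last block may be
    partially filled), and every block is stored REPLICATION_FACTOR times.
    """
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    raw_storage_mb = file_size_mb * REPLICATION_FACTOR
    return blocks, raw_storage_mb


# Hypothetical 1 GB file: 16 blocks, 3 GB of raw disk across the cluster.
blocks, raw_mb = hdfs_footprint(1024)
print(blocks, raw_mb)
```

The threefold raw storage is the price of tolerating hardware failures on commodity machines.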
HDInsight
HDFS
Demo
Map Reduce
Map Reduce Is…
• A programming model for expressing
  distributed computations at a massive
  scale
• An execution framework for organizing
  and performing such computations
• An open-source implementation called
  Hadoop
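The programming model can be illustrated with the classic word count. This is a single-process sketch in plain Python, not Hadoop's API: the function names are hypothetical, and a real cluster would run the map and reduce steps on many machines with the framework handling the shuffle between them.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data", "big storage"]))
```

Because each map call sees one line and each reduce call sees one key, both steps can be spread across nodes without coordination between them.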
Map Reduce At a Glance
HDInsight
Map Reduce Demo
Hive
Hive Is…
• A system for managing and querying structured data
  built on top of Hadoop
   – Map-Reduce for execution
   – HDFS for storage
   – Metadata on raw files

• Key Building Principles:
   – SQL as a familiar data warehousing tool
   – Extensibility – Types, Functions, Formats, Scripts
   – Scalability and Performance
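The "metadata on raw files" idea can be sketched as schema-on-read: the files stay as plain delimited text, and a separately stored schema is applied only when a query runs. The following is a plain-Python illustration of that idea, not Hive itself; the schema, sample rows, and function names are all hypothetical.

```python
# Hypothetical table metadata: column names and types, kept apart from the data.
SCHEMA = [("user", str), ("bytes_sent", int)]

# Hypothetical raw file contents: tab-delimited text with no embedded schema.
RAW_LINES = ["alice\t120", "bob\t300", "alice\t80"]


def read_rows(lines, schema):
    """Apply the schema to each raw line at read time."""
    for line in lines:
        fields = line.split("\t")
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


def total_bytes_by_user(lines, schema):
    """Rough equivalent of: SELECT user, SUM(bytes_sent) ... GROUP BY user."""
    totals = {}
    for row in read_rows(lines, schema):
        totals[row["user"]] = totals.get(row["user"], 0) + row["bytes_sent"]
    return totals


print(total_bytes_by_user(RAW_LINES, SCHEMA))
```

Changing the schema here requires no rewrite of the stored lines, which is the property that lets a SQL layer sit on top of files already in HDFS.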
Hive Architecture
HDInsight
Hacking with Hive
Pig
Pig Is…
Apache Pig is a platform for analyzing large data sets that consists of a
  high-level language (PigLatin) for expressing data analysis programs,
  coupled with infrastructure for evaluating these programs.

•   Ease of programming

•   Optimization opportunities

•   Extensibility

•   Built upon Hadoop
Pig Architecture
  Grunt (Interactive shell)                       PigServer (Java API)

                                 Parser (PigLatin → LogicalPlan)


                              Optimizer (LogicalPlan → LogicalPlan)
Pig Context
                Compiler (LogicalPlan → PhysicalPlan → MapReducePlan)

                                        ExecutionEngine

                                  Hadoop
HDInsight
Rocking Data Processing
        with Pig
Sqoop
Sqoop Is…
• Easy import of data from many
  databases to HDFS
• Generates code for use in MapReduce
  applications
• Integrates with Hive
Sqoop Architecture
HDInsight
Bulk Data Loading Using
Sqoop
HDInsight Service
HDInsight Service Architecture
HDInsight
HDInsight Service
   Overview
Hadoop Considerations
Super Crowded Ecosystem
The Hadoop Ecosystem
Hadoop is not a silver bullet...
Some Challenges
• Hadoop doesn’t power big data applications
   – Not a transactional datastore. Slosh back and forth via ETL
• Processing latency
   – Non-incremental, must re-slurp entire dataset every pass
• Ad-Hoc queries
   – Bare metal interface, data import
• Graphs
   – Only a handful of graph problems amenable to MR
Beyond Hadoop
• Percolator (incremental processing)
  http://research.google.com/pubs/pub36726.html
• Dremel (ad-hoc analysis queries)
  http://research.google.com/pubs/pub36632.html
• Pregel (big graphs)
  http://dl.acm.org/citation.cfm?id=1807184
In the Meantime...
Takeaways
• Hadoop provides the foundation of big
  data solutions
• Computing and storage are the
  fundamental components of Hadoop
• HDInsight Server and Service are
  Microsoft’s distributions of Hadoop
• HDInsight is just one component of
  Microsoft’s BI strategy
Thanks
 jesus.rodriguez@tellago.com
 http://www.tellagostudios.com
     http://jrodthoughts.com
http://twitter.com/#!/jrodthoughts
  http://weblogs.asp.net/gsusx

 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 

Big Data in the Microsoft Platform

  • 1.
  • 2. Building Big Data Solutions in the Microsoft Platform Jesus Rodriguez Tellago, Inc, Tellago Studios
  • 4. About Me… • Hackerpreneur • Co-Founder Tellago, Tellago Studios, Inc. • Microsoft Architect Advisor • Microsoft MVP • Oracle ACE • Speaker, Author • http://weblogs.asp.net/gsusx • http://jrodthoughts.com • http://moesion.com
  • 5. Agenda • Big Data Overview • MS HDInsight – Map Reduce – HDFS – Hive – Pig – Sqoop • HDInsight Service • The Hadoop Ecosystem • The Future….
  • 6. Big Data? • A bunch of data? • An industry? • An expertise? • A trend? • A cliché?
  • 7. A Clue? • 2008: Google processes 20 PB a day • 2009: Facebook has 2.5 PB user data + 15 TB/day • 2009: eBay has 6.5 PB user data + 50 TB/day • 2011: Yahoo! has 180-200 PB of data • 2012: Facebook ingests 500 TB/day
  • 10. Processing Large Amounts of Data is Complicated....
  • 11. Successful Big Data = Scalable Computing + Large Storage
  • 14. Parallel Data Computing is Complicated
  • 15. So Is Large Data Storage
  • 16. Enter the World of Hadoop...
  • 17. Hadoop Design Principles • System Shall Manage and Heal Itself • Performance Shall Scale Linearly • Compute Shall Move to Data • Simple Core, Modular and Extensible
  • 18. Hadoop History • 2002-2004: Doug Cutting and Mike Cafarella started working on Nutch • 2003-2004: Google publishes GFS and MapReduce papers • 2004: Cutting adds DFS & MapReduce support to Nutch • 2006: Yahoo! hires Cutting, Hadoop spins out of Nutch • 2007: NY Times converts 4TB of archives over 100 EC2s • 2008: Web-scale deployments at Y!, Facebook, Last.fm • April 2008: Yahoo does fastest sort of a TB, 3.5mins over 910 nodes • May 2009: – Yahoo does fastest sort of a TB, 62secs over 1460 nodes – Yahoo sorts a PB in 16.25hours over 3658 nodes • June 2009, Oct 2009: Hadoop Summit, Hadoop World • September 2009: Doug Cutting joins Cloudera
  • 19. Hadoop Ecosystem • ETL Tools • BI Reporting • RDBMS • ZooKeeper (Coordination) • Pig (Data Flow) • Hive (SQL) • Sqoop • Avro (Serialization) • MapReduce (Job Scheduling/Execution System) • HBase (key-value store) • (Streaming/Pipes APIs) • HDFS (Hadoop Distributed File System)
  • 22. HDFS
  • 23. HDFS Is… • A distributed file system • Redundant storage • Designed to reliably store data using commodity hardware • Designed to expect hardware failures • Intended for large files • Designed for batch inserts • The Hadoop Distributed File System
  • 24. HDFS at a Glance Block Size = 64MB Replication Factor = 3 Cost/GB is a few ¢/month vs $/month
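The two figures on this slide are enough for a rough storage estimate. The sketch below is only illustrative: it assumes the 64 MB block size and replication factor of 3 quoted above, and the function name `hdfs_footprint` and the 1 GB example file are made up for the example.

```python
import math

BLOCK_SIZE_MB = 64       # block size quoted on the slide
REPLICATION_FACTOR = 3   # replication factor quoted on the slide


def hdfs_footprint(file_size_mb):
    """Estimate block count and raw storage for one file (illustrative only)."""
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    return {
        "blocks": blocks,
        # Every block is stored REPLICATION_FACTOR times across the cluster.
        "block_replicas": blocks * REPLICATION_FACTOR,
        # Raw disk use is roughly the logical size times the replication factor.
        "raw_storage_mb": file_size_mb * REPLICATION_FACTOR,
    }


if __name__ == "__main__":
    # Hypothetical 1 GB file.
    print(hdfs_footprint(1024))
```

Under these assumptions a 1 GB file becomes 16 blocks, 48 block replicas, and about 3 GB of raw disk, which is why the low cost per GB matters.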
  • 27. Map Reduce Is… • A programming model for expressing distributed computations at a massive scale • An execution framework for organizing and performing such computations • An open-source implementation called Hadoop
  • 28. Map Reduce At a Glance
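To make the programming model concrete, here is a minimal single-process sketch of a word count in the map, shuffle, and reduce style. It is not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real cluster would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (the framework does this)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "big storage"]))
```

The developer writes only the map and reduce functions; distribution, grouping, and retries are the execution framework's job.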
  • 30. Hive
  • 31. Hive Is… • A system for managing and querying structured data built on top of Hadoop – Map-Reduce for execution – HDFS for storage – Metadata on raw files • Key Building Principles: – SQL as a familiar data warehousing tool – Extensibility – Types, Functions, Formats, Scripts – Scalability and Performance
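The "metadata on raw files" idea can be sketched as schema-on-read: the file stays as plain delimited text, and column names and types are applied only when it is queried. The example below is a toy in plain Python, not Hive itself (Hive compiles queries into MapReduce jobs); the data, the schema, and the function names are all hypothetical.

```python
import csv
import io

# Hypothetical raw file contents, as they might sit unmodified in HDFS.
RAW_FILE = "1,alice,30\n2,bob,25\n3,carol,41\n"

# Hypothetical table metadata: column names and types kept apart from the data.
SCHEMA = [("id", int), ("name", str), ("age", int)]


def read_table(raw, schema):
    """Apply the schema to raw delimited text at read time."""
    for fields in csv.reader(io.StringIO(raw)):
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


def select_names_older_than(rows, min_age):
    """Roughly: SELECT name FROM people WHERE age > min_age."""
    return [row["name"] for row in rows if row["age"] > min_age]


if __name__ == "__main__":
    print(select_names_older_than(read_table(RAW_FILE, SCHEMA), 28))
```

Because the schema lives outside the data, the same raw file can be reinterpreted later without rewriting it.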
  • 34. Pig
  • 35. Pig Is… Apache Pig is a platform for analyzing large data sets that consists of a high-level language (PigLatin) for expressing data analysis programs, coupled with infrastructure for evaluating these programs. • Ease of programming • Optimization opportunities • Extensibility • Built upon Hadoop
  • 36. Pig Architecture • Grunt (Interactive shell) • PigServer (Java API) • Parser (PigLatin → LogicalPlan) • Optimizer (LogicalPlan → LogicalPlan) • Pig Context • Compiler (LogicalPlan → PhysicalPlan → MapReducePlan) • ExecutionEngine • Hadoop
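The optimizer stage in this architecture takes a logical plan and returns another logical plan. A toy version of that idea is sketched below, assuming a plan is just a list of (operator, function) steps. The operator names echo Pig Latin's FILTER and FOREACH, but the representation, the `optimize` rewrite rule, and the `execute` function are inventions for illustration, not Pig's actual internals.

```python
def optimize(plan):
    """Toy logical-plan rewrite: merge adjacent FILTER steps into one."""
    optimized = []
    for op, fn in plan:
        if op == "FILTER" and optimized and optimized[-1][0] == "FILTER":
            prev = optimized.pop()[1]
            # Bind both predicates as defaults so each merged step keeps its own.
            optimized.append(("FILTER", lambda r, p=prev, f=fn: p(r) and f(r)))
        else:
            optimized.append((op, fn))
    return optimized


def execute(plan, rows):
    """Run the plan step by step over an in-memory list of rows."""
    for op, fn in plan:
        if op == "FILTER":
            rows = [r for r in rows if fn(r)]
        elif op == "FOREACH":
            rows = [fn(r) for r in rows]
        else:
            raise ValueError(f"unknown operator: {op}")
    return rows


if __name__ == "__main__":
    plan = [
        ("FILTER", lambda r: r > 1),
        ("FILTER", lambda r: r < 5),
        ("FOREACH", lambda r: r * 10),
    ]
    print(execute(optimize(plan), [1, 2, 3, 5]))
```

The rewritten plan gives the same result as the original with fewer passes over the data, which is the kind of optimization opportunity a high-level dataflow language makes possible.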
  • 38. Sqoop
  • 39. Sqoop Is… • Easy import of data from many databases to HDFS • Generates code for use in MapReduce applications • Integrates with Hive
  • 48. Hadoop is not a silver bullet...
  • 49. Some Challenges • Hadoop doesn’t power big data applications – Not a transactional datastore. Slosh back and forth via ETL • Processing latency – Non-incremental, must re-slurp entire dataset every pass • Ad-Hoc queries – Bare metal interface, data import • Graphs – Only a handful of graph problems amenable to MR
  • 50. Beyond Hadoop • Percolator (incremental processing) http://research.google.com/pubs/pub36726.html • Dremel (ad-hoc analysis queries) http://research.google.com/pubs/pub36632.html • Pregel (Big graphs) http://dl.acm.org/citation.cfm?id=1807184
  • 52. Takeaways • Hadoop provides the foundation of big data solutions • Computing and storage are the fundamental components of Hadoop • HDInsight Server and Service are Microsoft’s distributions of Hadoop • HDInsight is just one component of Microsoft’s BI strategy
  • 53. Thanks jesus.rodriguez@tellago.com http://www.tellagostudios.com http://jrodthoughts.com http://twitter.com/#!/jrodthoughts http://weblogs.asp.net/gsusx