SlideShare uma empresa Scribd logo
1 de 17
Session Objectives And Takeaways
 Understanding HDInsight cluster types in Azure
 HBase as a Hadoop storage option in Hadoop
 Understanding data processing options in Hadoop ecosystem
using Storm and Spark.
• HDInsight is the Microsoft implementation of Hadoop ecosystem components in the
cloud.
• Azure HDInsight deploys and provisions Apache Hadoop clusters in the cloud, providing a
software framework designed to manage, analyze, and report on big data with high
reliability and availability.
• HDInsight is available on Windows and Linux
• HDInsight on Linux: A Hadoop cluster on Ubuntu
• HDInsight on Windows: A Hadoop cluster on Win Server 2012 R2
What is HDInsight
• HDInsight provides cluster Types & Configurations for:
• Hadoop (HDFS)
• HBase
• Storm
• Spark (Preview)
• Skip maintaining and purchasing hardware
• HDInsight has powerful programming extensions for languages including C#, Java,
and .NET. Use your programming language of choice on Hadoop to create, configure,
submit, and monitor Hadoop jobs.
HDInsight clusters on Azure
• Apache HBase is an open-source, NoSQL database that is built on Hadoop and modeled
after Google BigTable.
• HBase provides random access and strong consistency for large amounts of unstructured
and semistructured data in a schemaless database organized by column families
• Data is stored in the rows of a table, and data within a row is grouped by column family.
• The open-source code scales linearly to handle petabytes of data on thousands of nodes.
It can rely on data redundancy, batch processing, and other features that are provided by
distributed applications in the Hadoop ecosystem.
What is HBase
Order No Customer Name Customer Phone Company Name Company
Address
12012015 Mostafa 101-232-2345 Microsoft Redmond, WA
Customer Company
Order No Customer Name Customer Phone Company Name Company Address
12012015 Mostafa 101-232-2345 Microsoft Redmond, WA
• HBase Commands:
• create  Equivalent to Create table in T-SQL
• get  Equivalent to Select statements in T-SQL
• put  Equivalent to Update, Insert statement in T-SQL
• scan  Equivalent to Select (no where condition) in T-SQL
• HBase shell is your query tool to execute in CRUD commands to a HBase cluster.
• Data can also be managed using the HBase C# API, which provides a client library on top
of the HBase REST API.
• An HBase database can also be queried by using Hive.
What is HBase
• Apache Hive is a data warehouse system for Hadoop, which enables data summarization,
querying, and analysis of data by using HiveQL (a query language similar to SQL).
• Hive understands how to work with structured and semi-structured data, such as text files
where the fields are delimited by specific characters.
• Hive also supports custom serializer/deserializers (SerDe) for complex or irregularly
structured data.
• Hive can also be extended through user-defined functions (UDF).
• A UDF allows you to implement functionality or logic that isn't easily modeled in HiveQL.
What is Hive
• Apache Storm is a distributed, fault-tolerant, open-source computation system that allows
you to process data in real-time with Hadoop.
• Apache Storm on HDInsight allows you to create distributed, real-time analytics solutions
in the Azure environment by using Apache Hadoop.
• Storm solutions can also provide guaranteed processing of data, with the ability to replay
data that was not successfully processed the first time.
• Ability to write Storm components in C#, JAVA and Python.
• Azure Scale up or Scale down without an impact for running Storm topologies.
• Ease of provision and use in Azure
• Visual Studio project templates for Storm apps
What is Apache Storm
• Apache Storm apps are submitted as Topologies.
• A topology is a graph of computation that processes streams
• Stream: An unbound collection of tuples. Streams are produced by spouts and bolts, and
they are consumed by bolts.
• Tuple: A named list of dynamically typed values.
• Spout: Consumes data from a data source and emits one or more streams.
• Bolt: Consumes streams, performs processing on tuples, and may emit streams. Bolts are
also responsible for writing data to external storage, such as a queue, HDInsight, HBase, a
blob, or other data store.
• Nimbus: JobTracker in Hadoop that distribute jobs, monitoring failures.
Apache Storm Components
• Apache Spark™ is a fast and general engine for large-scale data processing.
• Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on
disk.
• Write applications quickly in Java, Scala, Python, R.
• Combine SQL, streaming, and complex analytics.
• Spark's in-memory computation capabilities
make it a good choice for iterative algorithms in
ML and graph computations.
• Spark is also compatible with Azure Blob storage (WASB) so your existing data stored in
Azure can easily be processed via Spark.
• Easy to provision on Azure as HDInsight Spark cluster.
What is Apache Spark
Session Objectives And Takeaways
 Understanding HDInsight cluster types in Azure
 HBase as a Hadoop storage option in Hadoop
 Understanding data processing options in Hadoop ecosystem
using Storm and Spark.
www.MostafaElzoghbi.com

Mais conteúdo relacionado

Mais procurados

Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Cloudera, Inc.
 
PASS Summit - SQL Server 2017 Deep Dive
PASS Summit - SQL Server 2017 Deep DivePASS Summit - SQL Server 2017 Deep Dive
PASS Summit - SQL Server 2017 Deep DiveTravis Wright
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overviewvhrocca
 
Webinar - Introduction to Azure Data Lake
Webinar - Introduction to Azure Data LakeWebinar - Introduction to Azure Data Lake
Webinar - Introduction to Azure Data LakeJosh Lane
 
Simplifying And Accelerating Data Access for Python With Dremio and Apache Arrow
Simplifying And Accelerating Data Access for Python With Dremio and Apache ArrowSimplifying And Accelerating Data Access for Python With Dremio and Apache Arrow
Simplifying And Accelerating Data Access for Python With Dremio and Apache ArrowPyData
 
Microsoft Azure Data Warehouse Overview
Microsoft Azure Data Warehouse OverviewMicrosoft Azure Data Warehouse Overview
Microsoft Azure Data Warehouse OverviewJustin Munsters
 
Hadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceHadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceNeev Technologies
 
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...DataWorks Summit
 
Tomer Shiran, MapR_Hadoop&SQL
Tomer Shiran, MapR_Hadoop&SQLTomer Shiran, MapR_Hadoop&SQL
Tomer Shiran, MapR_Hadoop&SQLThe Hive
 
Hadoop in the Cloud: Common Architectural Patterns
Hadoop in the Cloud: Common Architectural PatternsHadoop in the Cloud: Common Architectural Patterns
Hadoop in the Cloud: Common Architectural PatternsDataWorks Summit
 
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseModern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseEric Bragas
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksData Con LA
 
Cloud Computing and the Microsoft Developer - A Down-to-Earth Analysis
Cloud Computing and the Microsoft Developer - A Down-to-Earth AnalysisCloud Computing and the Microsoft Developer - A Down-to-Earth Analysis
Cloud Computing and the Microsoft Developer - A Down-to-Earth AnalysisAndrew Brust
 
Ravi Namboori 's Open stack framework introduction
Ravi Namboori 's Open stack framework introductionRavi Namboori 's Open stack framework introduction
Ravi Namboori 's Open stack framework introductionRavi namboori
 

Mais procurados (19)

Introduction to Dremio
Introduction to DremioIntroduction to Dremio
Introduction to Dremio
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
Azure HDInsight
Azure HDInsightAzure HDInsight
Azure HDInsight
 
PASS Summit - SQL Server 2017 Deep Dive
PASS Summit - SQL Server 2017 Deep DivePASS Summit - SQL Server 2017 Deep Dive
PASS Summit - SQL Server 2017 Deep Dive
 
Azure data lake sql konf 2016
Azure data lake   sql konf 2016Azure data lake   sql konf 2016
Azure data lake sql konf 2016
 
What database
What databaseWhat database
What database
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overview
 
Webinar - Introduction to Azure Data Lake
Webinar - Introduction to Azure Data LakeWebinar - Introduction to Azure Data Lake
Webinar - Introduction to Azure Data Lake
 
Simplifying And Accelerating Data Access for Python With Dremio and Apache Arrow
Simplifying And Accelerating Data Access for Python With Dremio and Apache ArrowSimplifying And Accelerating Data Access for Python With Dremio and Apache Arrow
Simplifying And Accelerating Data Access for Python With Dremio and Apache Arrow
 
Microsoft Azure Data Warehouse Overview
Microsoft Azure Data Warehouse OverviewMicrosoft Azure Data Warehouse Overview
Microsoft Azure Data Warehouse Overview
 
Hadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceHadoop Ecosystem at a Glance
Hadoop Ecosystem at a Glance
 
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...
 
Tomer Shiran, MapR_Hadoop&SQL
Tomer Shiran, MapR_Hadoop&SQLTomer Shiran, MapR_Hadoop&SQL
Tomer Shiran, MapR_Hadoop&SQL
 
Hadoop in the Cloud: Common Architectural Patterns
Hadoop in the Cloud: Common Architectural PatternsHadoop in the Cloud: Common Architectural Patterns
Hadoop in the Cloud: Common Architectural Patterns
 
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseModern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
 
Cloud Computing and the Microsoft Developer - A Down-to-Earth Analysis
Cloud Computing and the Microsoft Developer - A Down-to-Earth AnalysisCloud Computing and the Microsoft Developer - A Down-to-Earth Analysis
Cloud Computing and the Microsoft Developer - A Down-to-Earth Analysis
 
Ravi Namboori 's Open stack framework introduction
Ravi Namboori 's Open stack framework introductionRavi Namboori 's Open stack framework introduction
Ravi Namboori 's Open stack framework introduction
 

Destaque

Build intelligent solutions using ms azure
Build intelligent solutions using ms azureBuild intelligent solutions using ms azure
Build intelligent solutions using ms azureMostafa
 
PnP in building office add ins - public
PnP in building office add ins - publicPnP in building office add ins - public
PnP in building office add ins - publicMostafa
 
Mistakes that kill startups
Mistakes that kill startupsMistakes that kill startups
Mistakes that kill startupsMostafa
 
TypeScript Jump Start
TypeScript Jump StartTypeScript Jump Start
TypeScript Jump StartMostafa
 
Data science essentials in azure ml
Data science essentials in azure mlData science essentials in azure ml
Data science essentials in azure mlMostafa
 
Patterns and Practices in Building Office Add-ins
Patterns and Practices in Building Office Add-insPatterns and Practices in Building Office Add-ins
Patterns and Practices in Building Office Add-insMostafa
 
Build intelligent solutions using Azure
Build intelligent solutions using AzureBuild intelligent solutions using Azure
Build intelligent solutions using AzureMostafa
 
Building Big data solutions in Azure
Building Big data solutions in AzureBuilding Big data solutions in Azure
Building Big data solutions in AzureMostafa
 
Extending Product Outreach with Outlook Connectors
Extending Product Outreach with Outlook ConnectorsExtending Product Outreach with Outlook Connectors
Extending Product Outreach with Outlook ConnectorsMostafa
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)Sascha Dittmann
 
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...Mike Martin
 
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightEnterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightPaco Nathan
 
Microsoft NYC 14
Microsoft NYC 14Microsoft NYC 14
Microsoft NYC 14SwitchPitch
 
Azure api app métricas com application insights
Azure api app métricas com application insightsAzure api app métricas com application insights
Azure api app métricas com application insightsNicolas Takashi
 
Big data streaming with Apache Spark on Azure
Big data streaming with Apache Spark on AzureBig data streaming with Apache Spark on Azure
Big data streaming with Apache Spark on AzureWillem Meints
 
Fraud Detection using Hadoop
Fraud Detection using HadoopFraud Detection using Hadoop
Fraud Detection using Hadoophadooparchbook
 
Go Serverless with Azure Functions
Go Serverless with Azure FunctionsGo Serverless with Azure Functions
Go Serverless with Azure FunctionsJim O'Neil
 

Destaque (20)

Build intelligent solutions using ms azure
Build intelligent solutions using ms azureBuild intelligent solutions using ms azure
Build intelligent solutions using ms azure
 
PnP in building office add ins - public
PnP in building office add ins - publicPnP in building office add ins - public
PnP in building office add ins - public
 
Mistakes that kill startups
Mistakes that kill startupsMistakes that kill startups
Mistakes that kill startups
 
TypeScript Jump Start
TypeScript Jump StartTypeScript Jump Start
TypeScript Jump Start
 
Data science essentials in azure ml
Data science essentials in azure mlData science essentials in azure ml
Data science essentials in azure ml
 
Patterns and Practices in Building Office Add-ins
Patterns and Practices in Building Office Add-insPatterns and Practices in Building Office Add-ins
Patterns and Practices in Building Office Add-ins
 
Build intelligent solutions using Azure
Build intelligent solutions using AzureBuild intelligent solutions using Azure
Build intelligent solutions using Azure
 
Building Big data solutions in Azure
Building Big data solutions in AzureBuilding Big data solutions in Azure
Building Big data solutions in Azure
 
Extending Product Outreach with Outlook Connectors
Extending Product Outreach with Outlook ConnectorsExtending Product Outreach with Outlook Connectors
Extending Product Outreach with Outlook Connectors
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
 
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
 
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightEnterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
 
Microsoft NYC 14
Microsoft NYC 14Microsoft NYC 14
Microsoft NYC 14
 
Azure api app métricas com application insights
Azure api app métricas com application insightsAzure api app métricas com application insights
Azure api app métricas com application insights
 
Big data streaming with Apache Spark on Azure
Big data streaming with Apache Spark on AzureBig data streaming with Apache Spark on Azure
Big data streaming with Apache Spark on Azure
 
Fraud Detection using Hadoop
Fraud Detection using HadoopFraud Detection using Hadoop
Fraud Detection using Hadoop
 
Azure IOT
Azure IOTAzure IOT
Azure IOT
 
Go Serverless with Azure Functions
Go Serverless with Azure FunctionsGo Serverless with Azure Functions
Go Serverless with Azure Functions
 
Software scope
Software scopeSoftware scope
Software scope
 
Azure HDInsight
Azure HDInsightAzure HDInsight
Azure HDInsight
 

Semelhante a Big data solutions in azure

Big data talking stories in Healthcare
Big data talking stories in Healthcare Big data talking stories in Healthcare
Big data talking stories in Healthcare Mostafa
 
Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...
Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...
Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...Dataconomy Media
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction葵慶 李
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopHortonworks
 
Using Machine Learning with HDInsight
Using Machine Learning with HDInsightUsing Machine Learning with HDInsight
Using Machine Learning with HDInsightEng Teong Cheah
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configurationprabakaranbrick
 
HBase introduction in azure
HBase introduction in azureHBase introduction in azure
HBase introduction in azureMostafa
 
5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for AnalyticsJen Stirrup
 
An Introduction-to-Hive and its Applications and Implementations.pptx
An Introduction-to-Hive and its Applications and Implementations.pptxAn Introduction-to-Hive and its Applications and Implementations.pptx
An Introduction-to-Hive and its Applications and Implementations.pptxiaeronlineexm
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceChris Nauroth
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventTrivadis
 

Semelhante a Big data solutions in azure (20)

Getting started big data
Getting started big dataGetting started big data
Getting started big data
 
Big data talking stories in Healthcare
Big data talking stories in Healthcare Big data talking stories in Healthcare
Big data talking stories in Healthcare
 
Hive
HiveHive
Hive
 
Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...
Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...
Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...
 
BIGDATA ppts
BIGDATA pptsBIGDATA ppts
BIGDATA ppts
 
Apache Hadoop Hive
Apache Hadoop HiveApache Hadoop Hive
Apache Hadoop Hive
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
SQL Server 2012 and Big Data
SQL Server 2012 and Big DataSQL Server 2012 and Big Data
SQL Server 2012 and Big Data
 
Hadoop Introduction
Hadoop IntroductionHadoop Introduction
Hadoop Introduction
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe Workshop
 
hive.pptx
hive.pptxhive.pptx
hive.pptx
 
Using Machine Learning with HDInsight
Using Machine Learning with HDInsightUsing Machine Learning with HDInsight
Using Machine Learning with HDInsight
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configuration
 
1. Apache HIVE
1. Apache HIVE1. Apache HIVE
1. Apache HIVE
 
HBase introduction in azure
HBase introduction in azureHBase introduction in azure
HBase introduction in azure
 
Cloudera Hadoop Distribution
Cloudera Hadoop DistributionCloudera Hadoop Distribution
Cloudera Hadoop Distribution
 
5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics
 
An Introduction-to-Hive and its Applications and Implementations.pptx
An Introduction-to-Hive and its Applications and Implementations.pptxAn Introduction-to-Hive and its Applications and Implementations.pptx
An Introduction-to-Hive and its Applications and Implementations.pptx
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduce
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
 

Mais de Mostafa

The role of intelligent sensors in the cloud public
The role of intelligent sensors in the cloud publicThe role of intelligent sensors in the cloud public
The role of intelligent sensors in the cloud publicMostafa
 
Skill up in machine learning using Azure ML
Skill up in machine learning using Azure MLSkill up in machine learning using Azure ML
Skill up in machine learning using Azure MLMostafa
 
Building predictive models in Azure Machine Learning
Building predictive models in Azure Machine LearningBuilding predictive models in Azure Machine Learning
Building predictive models in Azure Machine LearningMostafa
 
Architecting big data solutions in the cloud
Architecting big data solutions in the cloudArchitecting big data solutions in the cloud
Architecting big data solutions in the cloudMostafa
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark Mostafa
 
Machine Learning Classifiers
Machine Learning ClassifiersMachine Learning Classifiers
Machine Learning ClassifiersMostafa
 
Azure Machine Learning
Azure Machine LearningAzure Machine Learning
Azure Machine LearningMostafa
 
Introducing Power BI Embedded
Introducing Power BI EmbeddedIntroducing Power BI Embedded
Introducing Power BI EmbeddedMostafa
 
Build Interactive Analytics using Power BI
Build Interactive Analytics using Power BIBuild Interactive Analytics using Power BI
Build Interactive Analytics using Power BIMostafa
 
How to migrate Console Apps as a cloud service
How to migrate Console Apps as a cloud serviceHow to migrate Console Apps as a cloud service
How to migrate Console Apps as a cloud serviceMostafa
 
Get your site microsoft edge ready
Get your site microsoft edge readyGet your site microsoft edge ready
Get your site microsoft edge readyMostafa
 
Developing cross platform mobile apps using Apache Cordova
Developing cross platform mobile apps using Apache CordovaDeveloping cross platform mobile apps using Apache Cordova
Developing cross platform mobile apps using Apache CordovaMostafa
 
Identity and o365 on Azure
Identity and o365 on AzureIdentity and o365 on Azure
Identity and o365 on AzureMostafa
 
Azure Data platform
Azure Data platformAzure Data platform
Azure Data platformMostafa
 
Building IoT solutions using Windows 10 IoT Core & Azure
Building IoT solutions using Windows 10 IoT Core & AzureBuilding IoT solutions using Windows 10 IoT Core & Azure
Building IoT solutions using Windows 10 IoT Core & AzureMostafa
 

Mais de Mostafa (16)

The role of intelligent sensors in the cloud public
The role of intelligent sensors in the cloud publicThe role of intelligent sensors in the cloud public
The role of intelligent sensors in the cloud public
 
Skill up in machine learning using Azure ML
Skill up in machine learning using Azure MLSkill up in machine learning using Azure ML
Skill up in machine learning using Azure ML
 
Building predictive models in Azure Machine Learning
Building predictive models in Azure Machine LearningBuilding predictive models in Azure Machine Learning
Building predictive models in Azure Machine Learning
 
Architecting big data solutions in the cloud
Architecting big data solutions in the cloudArchitecting big data solutions in the cloud
Architecting big data solutions in the cloud
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
Machine Learning Classifiers
Machine Learning ClassifiersMachine Learning Classifiers
Machine Learning Classifiers
 
Azure Machine Learning
Azure Machine LearningAzure Machine Learning
Azure Machine Learning
 
Introducing Power BI Embedded
Introducing Power BI EmbeddedIntroducing Power BI Embedded
Introducing Power BI Embedded
 
Build Interactive Analytics using Power BI
Build Interactive Analytics using Power BIBuild Interactive Analytics using Power BI
Build Interactive Analytics using Power BI
 
How to migrate Console Apps as a cloud service
How to migrate Console Apps as a cloud serviceHow to migrate Console Apps as a cloud service
How to migrate Console Apps as a cloud service
 
eRecall
eRecalleRecall
eRecall
 
Get your site microsoft edge ready
Get your site microsoft edge readyGet your site microsoft edge ready
Get your site microsoft edge ready
 
Developing cross platform mobile apps using Apache Cordova
Developing cross platform mobile apps using Apache CordovaDeveloping cross platform mobile apps using Apache Cordova
Developing cross platform mobile apps using Apache Cordova
 
Identity and o365 on Azure
Identity and o365 on AzureIdentity and o365 on Azure
Identity and o365 on Azure
 
Azure Data platform
Azure Data platformAzure Data platform
Azure Data platform
 
Building IoT solutions using Windows 10 IoT Core & Azure
Building IoT solutions using Windows 10 IoT Core & AzureBuilding IoT solutions using Windows 10 IoT Core & Azure
Building IoT solutions using Windows 10 IoT Core & Azure
 

Último

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Último (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Big data solutions in azure

  • 1.
  • 2. Session Objectives And Takeaways  Understanding HDInsight cluster types in Azure  HBase as a Hadoop storage option in Hadoop  Understanding data processing options in Hadoop ecosystem using Storm and Spark.
  • 3. • HDInsight is the Microsoft implementation of Hadoop ecosystem components in the cloud. • Azure HDInsight deploys and provisions Apache Hadoop clusters in the cloud, providing a software framework designed to manage, analyze, and report on big data with high reliability and availability. • HDInsight is available on Windows and Linux • HDInsight on Linux: A Hadoop cluster on Ubuntu • HDInsight on Windows: A Hadoop cluster on Win Server 2012 R2 What is HDInsight
  • 4. • HDInsight provides cluster Types & Configurations for: • Hadoop (HDFS) • HBase • Storm • Spark (Preview) • Skip maintaining and purchasing hardware • HDInsight has powerful programming extensions for languages including C#, Java, and .NET. Use your programming language of choice on Hadoop to create, configure, submit, and monitor Hadoop jobs. HDInsight clusters on Azure
  • 5.
  • 6. • Apache HBase is an open-source, NoSQL database that is built on Hadoop and modeled after Google BigTable. • HBase provides random access and strong consistency for large amounts of unstructured and semistructured data in a schemaless database organized by column families • Data is stored in the rows of a table, and data within a row is grouped by column family. • The open-source code scales linearly to handle petabytes of data on thousands of nodes. It can rely on data redundancy, batch processing, and other features that are provided by distributed applications in the Hadoop ecosystem. What is HBase
  • 7. Order No Customer Name Customer Phone Company Name Company Address 12012015 Mostafa 101-232-2345 Microsoft Redmond, WA Customer Company Order No Customer Name Customer Phone Company Name Company Address 12012015 Mostafa 101-232-2345 Microsoft Redmond, WA
  • 8. • HBase Commands: • create  Equivalent to Create table in T-SQL • get  Equivalent to Select statements in T-SQL • put  Equivalent to Update, Insert statement in T-SQL • scan  Equivalent to Select (no where condition) in T-SQL • HBase shell is your query tool to execute in CRUD commands to a HBase cluster. • Data can also be managed using the HBase C# API, which provides a client library on top of the HBase REST API. • An HBase database can also be queried by using Hive. What is HBase
  • 9. • Apache Hive is a data warehouse system for Hadoop, which enables data summarization, querying, and analysis of data by using HiveQL (a query language similar to SQL). • Hive understands how to work with structured and semi-structured data, such as text files where the fields are delimited by specific characters. • Hive also supports custom serializer/deserializers (SerDe) for complex or irregularly structured data. • Hive can also be extended through user-defined functions (UDF). • A UDF allows you to implement functionality or logic that isn't easily modeled in HiveQL. What is Hive
  • 10.
  • 11. • Apache Storm is a distributed, fault-tolerant, open-source computation system that allows you to process data in real-time with Hadoop. • Apache Storm on HDInsight allows you to create distributed, real-time analytics solutions in the Azure environment by using Apache Hadoop. • Storm solutions can also provide guaranteed processing of data, with the ability to replay data that was not successfully processed the first time. • Ability to write Storm components in C#, JAVA and Python. • Azure Scale up or Scale down without an impact for running Storm topologies. • Ease of provision and use in Azure • Visual Studio project templates for Storm apps What is Apache Storm
  • 12. • Apache Storm apps are submitted as Topologies. • A topology is a graph of computation that processes streams • Stream: An unbound collection of tuples. Streams are produced by spouts and bolts, and they are consumed by bolts. • Tuple: A named list of dynamically typed values. • Spout: Consumes data from a data source and emits one or more streams. • Bolt: Consumes streams, performs processing on tuples, and may emit streams. Bolts are also responsible for writing data to external storage, such as a queue, HDInsight, HBase, a blob, or other data store. • Nimbus: JobTracker in Hadoop that distribute jobs, monitoring failures. Apache Storm Components
  • 13.
  • 14. • Apache Spark™ is a fast and general engine for large-scale data processing. • Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. • Write applications quickly in Java, Scala, Python, R. • Combine SQL, streaming, and complex analytics. • Spark's in-memory computation capabilities make it a good choice for iterative algorithms in ML and graph computations. • Spark is also compatible with Azure Blob storage (WASB) so your existing data stored in Azure can easily be processed via Spark. • Easy to provision on Azure as HDInsight Spark cluster. What is Apache Spark
  • 15.
  • 16. Session Objectives And Takeaways  Understanding HDInsight cluster types in Azure  HBase as a Hadoop storage option in Hadoop  Understanding data processing options in Hadoop ecosystem using Storm and Spark.

Notas do Editor

  1. The session covers how to get started to build big data solutions in Azure. Azure provides different Hadoop clusters for Hadoop ecosystem. The session covers the basic understanding of HDInsight clusters including: Apache Hadoop, HBase, Storm and Spark. The session covers how to integrate with HDInsight in .NET using different Hadoop integration frameworks and libraries. The session is a jump start for engineers and DBAs with RDBMS experience who are looking for a jump start working and developing Hadoop solutions. The session is a demo driven and will cover the basics of Hadoop open source products.
  2. The session covers how to get started to build big data solutions in Azure. Azure provides different Hadoop clusters for Hadoop ecosystem. The session covers the basic understanding of HDInsight clusters including: Apache Hadoop, HBase, Storm and Spark. The session covers how to integrate with HDInsight in .NET using different Hadoop integration frameworks and libraries. The session is a jump start for engineers and DBAs with RDBMS experience who are looking for a jump start working and developing Hadoop solutions. The session is a demo driven and will cover the basics of Hadoop open source products.
  3. Ref: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-introduction/
  4. Ref: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hbase-overview/
  5. Ref: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hbase-overview/
  6. Apache Storm in HDInsight https://azure.microsoft.com/en-us/documentation/articles/hdinsight-storm-overview/
  7. Apache Storm in HDInsight https://azure.microsoft.com/en-us/documentation/articles/hdinsight-storm-overview/
  8. Demo: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-storm-develop-csharp-visual-studio-topology/
  9. Ref: http://spark.apache.org/
  10. Demo: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-ipython-notebook-machine-learning/ Apache Spark notepads https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-jupyter-spark-sql/
  11. https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-overview/
  12. HD Insight main documentation: https://azure.microsoft.com/en-us/documentation/services/hdinsight/