Visualising Big Data
Big Data Visualisation with Hadoop, Hive and
Excel 2013
Sponsors
Explore Everything PASS Has to Offer
Free SQL Server and BI Web Events

Free 1-day Training Events

Regional Event

This is Community

Business Analytics Training

Local User Groups Around the World

Session Recordings

PASS Newsletter

Free Online Technical Training

About me
• Director-At-Large (Elect), PASS Board, from Jan 2014
• SQL Server MVP
• Blogger, data strategist, public speaker, technologist
• Joint owner of Copper Blue Consulting Ltd

Agenda
• Overview of Big Data Technologies
• Data Visualisation with Office 365 and Power BI
• Hive
• Visualising Big Data with Microsoft

Big Data.
HDInsight Ecosystem
• ODBC
• Distributed Processing (Map Reduce)
• Distributed Storage (HDFS)
• (Azure Data Marketplace)
• Windows Azure Storage

What is Hadoop?

“Flexible and Available
Architecture for Large Scale
computation and data
processing on a network of
highly available commodity
hardware.”
Hadoop’s Lineage

* Resource: Kerberos Konference (Yahoo) – 2010
Data Visualisation Background
We have the tools. All we’ve
got to
do is imagine what could be.
We can reinvent the present;
we can transform the world
around us.
Jason Silva

Almost 50% of your brain is dedicated to visual processing. (David van Essen)

Researchers found that colour visuals increase the willingness to read by 80%.

About 70% of your sensory receptors are in your eyes.

Why is Data Visualisation Important?

“It’s clearly a budget. It has a lot of numbers in it.” (George W Bush)

“I could never figure out where the decimal point went.” (Lord Randolph Churchill)

The Unknown Unknowns
“That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know.” (Donald Rumsfeld)

What is the purpose of Hive?
Hive is a solution to a business problem:
How do you analyse large amounts of data?

Data Scientists want to study data
Communicate with the data

Businesses want to reap benefits of data
Results that make sense of the data

What is the purpose of Hive?
Hive is a data warehousing system for Hadoop
To meet the needs of businesses, data scientists, analysts and BI professionals

Data, Summarized
Fit a structure onto data

Data, Analyzed
Analysis of large datasets stored in Hadoop file systems
SQL-like language called HiveQL
Custom mappers and reducers when HiveQL isn’t enough

Agenda
Hive solves the business problem of analysing large amounts of data

• What is the purpose of Hive?
• Why Hive?
• A history of Hive
• What are Hive’s constituents?

Why Hive?
Can’t Hadoop be used to solve these problems?
Why is there a need for Hive?

Writing MR jobs in Java can be difficult
You don’t know it’s wrong until it’s fallen over!

Joining Large Datasets can be difficult
Learning Curve

Hive History

What can Hive offer you?
Hive can help with a range of business problems:

• Log Processing
• Predictive Modelling
• Hypothesis testing
• And Business Intelligence

Hive is not a replacement for SQL
So don’t throw out your SQL Server instances!

• Hive is for processing large data sets that may span hundreds, or even thousands, of machines
• Hive has a high overhead for starting a job: it translates queries to MapReduce, which takes time
• Unlike SQL Server, Hive does not cache data
• Hive performance tuning is mainly Hadoop performance tuning
• The query language looks similar, but the architectures are different, for different purposes

Agenda
Hive solves the business problem of analysing large amounts of data

• What is the purpose of Hive?
• Why Hive?
• A history of Hive
• What are Hive’s constituents?
  • Hive as a SQL-like Language Query Tool
  • Hive as a Translation Tool
  • Hive as a Structuring Tool

HiveQL
HiveQL is a SQL-like language
It outputs naturally occurring groups for further analysis

Easy Data Summarization
Large datasets, summarized
Fit a structure onto data

Analysis of large datasets stored in Hadoop file systems
SQL-like language called HiveQL
Custom mappers and reducers when HiveQL isn’t enough

HiveQL Queries like SQL Queries?
Similarities in Syntax and Features
Similar features

SELECT
FROM
WHERE
GROUP BY / HAVING
Table Aliases
Computed Columns

Aggregate Functions
Nested Select
CASE
LIKE / RLIKE
JOIN
ORDER BY / SORT BY
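
As a rough illustration of the shared syntax, the query below uses only features listed on these slides (SELECT, FROM, WHERE, GROUP BY / HAVING, a table alias, computed columns, an aggregate, CASE, LIKE, ORDER BY). It is a sketch, not Hive itself: Python's built-in sqlite3 module stands in as the engine so the query can be run anywhere, and the weblogs table and its rows are hypothetical.

```python
import sqlite3

# A query written with only the features the slides list as common to
# HiveQL and SQL. The table name and columns are made up for illustration.
QUERY = """
SELECT l.page,
       COUNT(*) AS hits,
       SUM(CASE WHEN l.status >= 500 THEN 1 ELSE 0 END) AS errors
FROM weblogs l
WHERE l.page LIKE '/%'
GROUP BY l.page
HAVING COUNT(*) > 1
ORDER BY hits DESC
"""


def run_demo():
    """Run QUERY against an in-memory SQLite table (a stand-in, not Hive)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weblogs (page TEXT, status INTEGER)")
    conn.executemany(
        "INSERT INTO weblogs VALUES (?, ?)",
        [("/home", 200), ("/home", 500), ("/about", 200), ("/home", 200)],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

RLIKE and SORT BY are left out because SQLite has no equivalent; they are Hive-side additions rather than shared syntax.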

How does Hive work?
Hive as a Translation Tool

Hive compiles each HiveQL query into one or more Map Reduce jobs
Multi-stage queries become jobs that are chained together
The compiled jobs are then executed on the Hadoop cluster
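
To make the translation idea concrete, here is a minimal sketch of how a query such as SELECT page, COUNT(*) ... GROUP BY page could be expressed as a map step, a shuffle step and a reduce step. It is plain single-process Python, not Hive's actual compiler output; the rows and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical log rows: (page, status). Not taken from the slides.
ROWS = [
    ("/home", 200),
    ("/home", 500),
    ("/about", 200),
    ("/home", 200),
]


def map_phase(rows):
    """Emit (page, 1) per row, as a mapper for COUNT(*) ... GROUP BY page might."""
    for page, _status in rows:
        yield page, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's sort-and-shuffle."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, as a reducer computing COUNT(*) might."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_page(rows):
    """Chain the three steps, mirroring one Map Reduce job."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_page(ROWS))
```

A query with a join or a second aggregation would need further jobs of the same shape, each reading the previous job's output, which is the chaining the slide refers to.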
How does Hive work?
Hive as a structuring Tool
Creates a schema around the data
Tables stored in Directories

Hive Tables
Rows and columns, like SQL tables

Hive Metastore
Namespace with a set of tables
Holds table definitions
Physical Layout
Column Types
Partition Information
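
The metastore idea can be sketched as a small lookup structure: a namespace holding table definitions, each recording a physical location, column types and partition keys. This is an illustrative model only, not the real metastore API. The table, columns and warehouse path are assumptions; the key=value directory naming follows Hive's usual partition layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TableDefinition:
    """What this toy metastore holds for one table."""
    location: str                      # physical layout: the table's directory
    columns: Dict[str, str]            # column name -> column type
    partition_keys: List[str] = field(default_factory=list)

    def partition_path(self, **values: str) -> str:
        """Build the directory for one partition, e.g. .../year=2013/month=11."""
        parts = [f"{key}={values[key]}" for key in self.partition_keys]
        return "/".join([self.location, *parts])


# Namespace ("default") with a set of tables. All names are hypothetical.
METASTORE = {
    "default": {
        "weblogs": TableDefinition(
            location="/user/hive/warehouse/weblogs",
            columns={"page": "string", "status": "int"},
            partition_keys=["year", "month"],
        )
    }
}

if __name__ == "__main__":
    table = METASTORE["default"]["weblogs"]
    print(table.partition_path(year="2013", month="11"))
```

The point of the sketch is that the data files stay where they are; the schema is a description laid over directories, looked up at query time.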

Hive and SQL Data Types

Hive                   SQL
Tinyint                Tinyint
SmallInt               Smallint
Int                    Int
BigInt                 BigInt
Boolean                Bit (set as NOT NULL)
Float (4-byte)         Real
Double (8-byte)        Float
Decimal (BigDecimal)   Decimal

Hive and SQL Data Types (continued)

Hive                   SQL
String                 Char, varchar, nvarchar, ntext, text
Binary                 Binary, image
Timestamp              Datetime2 (not SQL Server's timestamp, which is a deprecated synonym for RowVersion and holds no date or time)
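
A type mapping like this is easy to hold as a lookup table when generating a SQL Server staging table for Hive output. The sketch below is one possible mapping under stated assumptions, not an official conversion: it pairs Hive's 4-byte float with real and 8-byte double with float, widens tinyint to smallint because SQL Server's tinyint is unsigned, and picks nvarchar(max), varbinary(max) and datetime2 as illustrative targets. The helper name is hypothetical.

```python
# Hive type -> SQL Server type. The choices are assumptions for illustration.
HIVE_TO_SQL_SERVER = {
    "tinyint": "smallint",      # Hive tinyint is signed; SQL Server tinyint is 0 to 255
    "smallint": "smallint",
    "int": "int",
    "bigint": "bigint",
    "boolean": "bit",
    "float": "real",            # both 4-byte floating point
    "double": "float",          # both 8-byte floating point
    "decimal": "decimal",
    "string": "nvarchar(max)",
    "binary": "varbinary(max)",
    "timestamp": "datetime2",
}


def to_sql_server_columns(hive_columns):
    """Render (name, hive_type) pairs as a SQL Server column list."""
    lines = []
    for name, hive_type in hive_columns:
        sql_type = HIVE_TO_SQL_SERVER[hive_type.lower()]
        lines.append(f"[{name}] {sql_type}")
    return ",\n".join(lines)


if __name__ == "__main__":
    print(to_sql_server_columns([("page", "string"), ("hits", "bigint")]))
```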
Hive Mathematical Operations
Primitive Types
• Plus
• Negative
• Addition
• Subtraction
• Multiplication
• Division
• Modulus

Complex Types
• Arrays
• Maps
• Structs
• Union

Visualising Big Data

Self-Service

Insights
Actions
Different Tools for Different Jobs

Power View
• Power View is an interactive, ad hoc query and visualization experience
• It is for business question ‘mystery’ solving
• Highly visual design experience

Power Map
• Power Map is a new 3D visualization add-in for Excel, helping you to analyse geographical and temporal data
• Mapping
• Exploring
• Interacting

Data where you want it

Data you want about ‘where’

Data you want to share

Your data… Fresh.

Demo

What did we learn from the demo?

JOIN US for our second annual event to get the best learning for
analyzing, managing, and sharing business information and
insights through the Microsoft Data Platform of technologies.
Don’t be shy… questions?
Thank you for listening
Sponsors

Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013
