Social Media Analytics using Azure Technologies
Koray Kocabaş
#sqlsatistanbul
What do we need?
Just a quick blog post, update on LinkedIn, or a tweet on Twitter is all we need.
Session Evaluations
Evaluate sessions and get a chance for the raffle:
http://spoke.at/sqlsat451
About Me...
Koray Kocabaş
Data Platform (SQL Server) MVP
Yemeksepeti Business Intelligence
Bahcesehir University Instructor
@koraykocabas
https://tr.linkedin.com/in/koraykocabas
Blog: http://www.misjournal.com
E-Mail: koraykocabas@outlook.com
The Data Deluge
What kinds of solutions use Big Data?
• Clickstream analysis to find buying patterns
• Sentiment analysis for text data
• Fraud detection; forensic analysis
• Machine learning
• Healthcare research
• Predictive Maintenance
Just dream it. Data is everywhere!
Twitter launched in 2006
Monthly active users: ~316 million (August), ~320 million (October)
80% of users are on mobile!
Tweets per second: ~6,000
Tweets per day: ~500 million
Tweets per year: ~200 billion
Twitter generates a lot of data (12 TB per day)
90% of buyers trust peer recommendations
55% of Twitter users are female
The average Twitter user has 27 followers
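As a rough sanity check on the volume figures above, the per-second rate can be scaled up to a day and a year. This is only a sketch: it assumes a constant average of 6,000 tweets per second, and the variable names are illustrative.

```python
# Scale an assumed constant average of 6,000 tweets/second (the slide's figure).
TWEETS_PER_SECOND = 6_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

tweets_per_day = TWEETS_PER_SECOND * SECONDS_PER_DAY
tweets_per_year = tweets_per_day * 365

print(f"per day:  {tweets_per_day:,}")   # 518,400,000, roughly 500 million
print(f"per year: {tweets_per_year:,}")  # 189,216,000,000, roughly 200 billion
```

The results land close to the rounded "~500 million per day" and "~200 billion per year" figures quoted on the slide.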
Why is it so popular?
Event based data
Unstructured data
Detail event information
Streaming
Who is the influencer
TweetTracker
TweetArchivist
Radian6
Sysomos
TweetDeck
Hootsuite
Twitter Problems Dashboards For Tweets
PROBLEMS...
1. Collect Twitter Data & Get Simple Information
2. Data Enrichment
3. Store Semi-Structured Data
4. Analyze Semi-Structured Data
5. Visualize Meaningful Results
Collect Twitter Data & Get Simple Information
Real-Time Analytics
Ingest millions of events per second
Process data from connected devices/apps
Detect patterns and anomalies in streaming data
Transform, augment, correlate, temporal operations
No hardware (PaaS offering)
Up and running in a few clicks (and within minutes)
No performance tuning
Efficiently pay only for usage
Not paying for idle resources
Low startup costs
Scale from small to large when required
Only SQL queries needed (versus thousands of lines of code in other solutions, such as Apache Storm)
Stream Analytics Query Language Functions
DML Statements
• SELECT
• FROM
• WHERE
• GROUP BY
• HAVING
• CASE
• JOIN
• UNION
Windowing Extensions
• Tumbling Window
• Hopping Window
• Sliding Window
• Duration
Aggregate Functions
• SUM
• COUNT
• AVG
• MIN
• MAX
Scaling Functions
• WITH
• PARTITION BY
Date and Time Functions
• DATENAME
• DATEPART
• DAY
• MONTH
• YEAR
• DATETIMEFROMPARTS
• DATEDIFF
• DATEADD
String Functions
• LEN
• CONCAT
• CHARINDEX
• SUBSTRING
Statistical Functions
• VAR
• VARP
• STDEV
Tumbling Windows
The count of tweets every 10 seconds
[Figure: timeline from 0 to 30 seconds; window counts 4, 4, 5]
SELECT Topic, Count(*) AS Count
FROM sqlsaturdaystream TIMESTAMP BY CreatedAt
GROUP BY Topic, TumblingWindow(second,10)
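To make the tumbling-window grouping concrete, here is a minimal Python sketch of the same idea, not the Stream Analytics engine itself. The event list, the function name `tumbling_counts`, and the choice to label each window by its start time with a half-open interval [start, start + size) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def tumbling_counts(events: Iterable[Tuple[float, str]], size: int = 10) -> Counter:
    """Count events per (window_start, topic) in fixed, non-overlapping windows."""
    counts: Counter = Counter()
    for ts, topic in events:
        # Each event falls into exactly one window: [start, start + size).
        window_start = int(ts // size) * size
        counts[(window_start, topic)] += 1
    return counts


# Hypothetical (seconds since start, topic) events.
events = [
    (1, "sql"), (3, "sql"), (4, "azure"), (9, "sql"),
    (12, "sql"), (15, "azure"), (19, "azure"),
    (21, "sql"), (25, "sql"),
]
tumbling = tumbling_counts(events, size=10)
for (start, topic), n in sorted(tumbling.items()):
    print(f"window {start:>2}-{start + 10:<2} {topic:<5} {n}")
```

Because the windows do not overlap, every event is counted exactly once, which is the property that distinguishes tumbling windows from hopping and sliding windows.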
Hopping Windows
Every 5 seconds, give me the count of tweets over the last 10 seconds, by topic
SELECT Topic, Count(*) AS Count
FROM sqlsaturdaystream TIMESTAMP BY CreatedAt
GROUP BY Topic, HoppingWindow(second,10,5)
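The hopping window differs from the tumbling one in that windows overlap, so one event can be counted more than once. The sketch below illustrates this in plain Python under stated assumptions: windows are aligned to time 0, labelled by their start, and treated as half-open [start, start + size); the function name `hopping_counts` and the sample events are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def hopping_counts(
    events: Iterable[Tuple[float, str]], size: int = 10, hop: int = 5
) -> Counter:
    """Count events per (window_start, topic) for overlapping windows."""
    counts: Counter = Counter()
    for ts, topic in events:
        # Latest hop-aligned window start at or before the event time.
        start = int(ts // hop) * hop
        # Walk back through every earlier window that still covers ts.
        # Assumption: no window starts before time 0.
        while start > ts - size and start >= 0:
            counts[(start, topic)] += 1
            start -= hop
    return counts


# Hypothetical (seconds since start, topic) events.
events = [(1, "sql"), (3, "sql"), (7, "sql"), (12, "sql"), (14, "azure")]
hopping = hopping_counts(events, size=10, hop=5)
for (start, topic), n in sorted(hopping.items()):
    print(f"window {start:>2}-{start + 10:<2} {topic:<5} {n}")
```

With a 10-second size and a 5-second hop, an event at second 7 lands in both the 0-10 and the 5-15 windows, which is why the counts across windows add up to more than the number of events.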
Sliding Windows
Report topics with more than 8 tweets within the last 5 seconds
SELECT Topic, Count(*) AS Count
FROM sqlsaturdaystream TIMESTAMP BY CreatedAt
GROUP BY Topic, SlidingWindow(second,5)
HAVING Count(*)>8
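A sliding window with a HAVING threshold can be sketched as a per-topic queue of recent timestamps. This is a simplification rather than the service's exact semantics: it evaluates the window only when an event arrives, and it treats the window as (ts - size, ts]. The function name `sliding_alerts` and the sample data are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple


def sliding_alerts(
    events: Iterable[Tuple[float, str]], size: float = 5, threshold: int = 8
) -> List[Tuple[float, str, int]]:
    """Return (time, topic, count) whenever a topic exceeds the threshold."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    alerts: List[Tuple[float, str, int]] = []
    for ts, topic in sorted(events):
        window = recent[topic]
        window.append(ts)
        # Drop timestamps that have slid out of the last `size` seconds.
        while window and window[0] <= ts - size:
            window.popleft()
        if len(window) > threshold:
            alerts.append((ts, topic, len(window)))
    return alerts


# Hypothetical burst: ten "sql" tweets half a second apart, one "azure" tweet.
events = [(i * 0.5, "sql") for i in range(10)] + [(2.0, "azure")]
alerts = sliding_alerts(events, size=5, threshold=8)
print(alerts)
```

Only the "sql" topic crosses the threshold here, once the ninth tweet arrives inside the same 5-second span.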
Stream Analytics
Event Hub
Data Enrichment
Data
• Local storage: upload data from PC…
• Cloud storage: Azure Storage, Azure Table, Hive, etc.
Azure Machine Learning
• ML Studio (Web IDE)
• Workspace: Experiments, Datasets, Trained models, Notebooks, Access settings
• ML Web Services (REST API Services)
• Azure ML Gallery (community)
• Azure Marketplace (Applications store)
Consumers
• Excel
• Business Apps
Business problem -> Modeling -> Deployment -> Business value
Data, Model, API
Manage API
https://sites.google.com/site/miningtwitter/questions/sentiment/sentiment
http://www.slideshare.net/ajayohri/twitter-analysis-by-kaify-rais
Sentiment140 (formerly known as "Twitter Sentiment")
allows you to discover the sentiment of a brand, product,
or topic on Twitter.
SQL Server 2016
CTP 3.1
Revolution R Open 3.2.2 for Revolution R Enterprise
Revolution R Enterprise 7.5.0
Revolution R Enterprise is able to deliver speeds 42 times faster than competing technology from SAS.
Microsoft announced on January 23, 2015 that they had reached an agreement to purchase Revolution Analytics for an as yet undisclosed amount.
The Klout Score is a number between 1 and 100 that represents your influence.
Klout collects and normalizes more than 12 billion signals a day
Hive data warehouse of more than 1 trillion rows
Klout was acquired by Lithium Technologies for $200 million
Store Semi-Structured Data
Analyze Semi-Structured Data
Developed by Facebook; later adopted by Apache as an open source project.
A data warehouse infrastructure built on top of Hadoop that provides data summarization, query, and analysis
Integrates Hadoop with BI and visualization tools
Provides a SQL-like language called HiveQL to query data
Supports indexes and partitioning
UPDATE is often said to be unsupported (that isn't correct: since Hive 0.14 it works on transactional tables)
Hive provides users, groups, and roles, but it is not designed for high security.
Console (hive>), script, ODBC/JDBC, SQuirreL, HUE, Web Interface, etc.
Most popular Business Intelligence Tools support Hive
Data Types
Primitive Data Types: int, bigint, float, double, boolean, decimal, string, timestamp, date etc.
Complex Data Types: arrays, maps, structs
ARRAY<string>: workplace: istanbul, ankara
STRUCT<sex:string,age:int> : Female,25
MAP<string,int>: SOLR:92
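For readers more familiar with general-purpose languages, the three complex types map naturally onto a list, a record, and a dictionary. The Python sketch below is only an analogy, not Hive code; the `Person` and `Row` class names and the `person` and `scores` field names are hypothetical, while the sample values come from the slide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Person:
    """Analogue of STRUCT<sex:string,age:int>."""
    sex: str
    age: int


@dataclass
class Row:
    """Hypothetical table row holding one value of each complex type."""
    workplace: List[str]    # ARRAY<string>
    person: Person          # STRUCT<sex:string,age:int>
    scores: Dict[str, int]  # MAP<string,int>


row = Row(
    workplace=["istanbul", "ankara"],
    person=Person(sex="Female", age=25),
    scores={"SOLR": 92},
)

# HiveQL reads these as workplace[0], person.age and scores['SOLR'].
print(row.workplace[0], row.person.age, row.scores["SOLR"])
```

The access patterns line up closely: arrays are indexed by position, struct fields by dot notation, and maps by key.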
Hive | RDBMS
SQL interface | SQL interface
Focus on analytics | May focus on online or analytics
No transactions | Transactions usually supported
Partition adds, no random inserts | Random insert and update supported
Distributed processing via map/reduce | Distributed processing varies by vendor (if available)
Scales to hundreds of nodes | Seldom scales beyond 20 nodes
Built for commodity hardware | Often built on proprietary hardware (especially when scaling out)
Low cost per petabyte | What's a petabyte? :) (note: Are you sure?)
http://hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
Originally developed at Yahoo! (Huge contributions from Hortonworks, Twitter)
A platform for analyzing large data sets, built around a high-level language for expressing data analysis programs
Processes large semi-structured data sets using Hadoop MapReduce
Write complex MapReduce jobs using a simple script language (Pig Latin)
Pig provides a set of aggregation functions (AVG, COUNT, SUM, MAX, MIN, etc.)
Developers can write user-defined functions (UDFs)
Console (grunt), script, Java, HUE (Hadoop User Experience by Cloudera)
Easy to use and efficient
Data Types
Simple Data Types: int, float, double, chararray (UTF-8), bytearray
Complex Data Types: map (Key,Value), Tuple, Bag (list of tuples)
Commands
Loading: LOAD, STORE, DUMP
Filtering: FILTER, FOREACH, DISTINCT
Grouping: JOIN, GROUP, COGROUP, CROSS
Ordering: ORDER, LIMIT
Merging & Split: UNION, SPLIT
SQL SCRIPT and the equivalent PIG SCRIPT
SQL: SELECT * FROM TABLE
Pig: A = LOAD 'DATA' USING PigStorage('\t') AS (col1:int, col2:int, col3:int);
SQL: SELECT col1+col2, col3 FROM TABLE
Pig: B = FOREACH A GENERATE col1+col2, col3;
SQL: SELECT col1+col2, col3 FROM TABLE WHERE col3>10
Pig: C = FILTER B BY col3>10;
SQL: SELECT col1, col2, sum(col3) FROM X GROUP BY col1, col2
Pig: D = GROUP A BY (col1,col2);
Pig: E = FOREACH D GENERATE FLATTEN(group), SUM(A.col3);
SQL: ... HAVING sum(col3) > 5
Pig: F = FILTER E BY $2>5;
SQL: ... ORDER BY col1
Pig: G = ORDER F BY $0;
SQL: SELECT DISTINCT col1 FROM TABLE
Pig: I = FOREACH A GENERATE col1;
Pig: J = DISTINCT I;
SQL: SELECT col1, COUNT(DISTINCT col2) FROM TABLE GROUP BY col1
Pig: K = GROUP A BY col1;
Pig: L = FOREACH K {M = DISTINCT A.col2; GENERATE FLATTEN(group), COUNT(M);}
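The group, aggregate, filter, and order steps (relations D through G in the Pig script) form a small dataflow pipeline in which each relation feeds the next. The Python sketch below mirrors that flow step by step as an illustration only; the input rows are hypothetical, and it runs in memory rather than as MapReduce jobs.

```python
from collections import defaultdict

# Hypothetical rows of relation A: (col1, col2, col3).
rows = [(1, 1, 4), (1, 1, 3), (1, 2, 2), (2, 1, 9)]

# D = GROUP A BY (col1, col2);
# E = FOREACH D GENERATE FLATTEN(group), SUM(A.col3);
sums = defaultdict(int)
for col1, col2, col3 in rows:
    sums[(col1, col2)] += col3
e = [(c1, c2, total) for (c1, c2), total in sums.items()]

# F = FILTER E BY $2 > 5;  ($2 is the third field, the sum)
f = [t for t in e if t[2] > 5]

# G = ORDER F BY $0;  ($0 is the first field, col1)
g = sorted(f, key=lambda t: t[0])

print(g)
```

Where SQL expresses the whole query as one declarative statement, Pig Latin names each intermediate relation, much as the intermediate lists `e`, `f`, and `g` do here.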
Ohhh Finally Demo Time!
Visualize Meaningful Results
Big Data Analytics, Implementing Big Data Analysis, Big Data Analytics with HDInsight, Big Data
and Business Analytics Immersion, Getting Started with Microsoft Azure Machine Learning
Real World Big Data in Azure, Big Data on Amazon Web Services, Reporting with MongoDB,
Cloud Business Intelligence, HDInsight Deep Dive: Storm HBase and Hive, Data Science &
Hadoop Workflows at Scale With Scalding, SQL on Hadoop - Analyzing Big Data with Hive
Introduction to Big Data Analytics, Machine Learning with Big Data, Big Data Analytics for
Healthcare, Data Science at Scale, The Data Scientist's Toolbox, R Programming
Master Big Data and Hadoop Step by Step, Hadoop Essentials, Hadoop Starter Kit, Data Analytics
using Hadoop eco system, Big Data: How Data Analytics Is Transforming the World, Applied Data
Science with R, Hadoop Enterprise Integration
Data Science and Analytics in Context, Introduction to Big Data with Spark, Data Science and
Machine Learning Essentials, Machine Learning for Data Science and Analytics, Statistical
Thinking for Data Science and Analytics
Social media analytics using Azure Technologies

  • 1. Social Media Analytics using Azure Technologies Koray Kocabaş
  • 3. #sqlsatistanbul What do we need ? Just a quick blog post, update on LinkedIn, or a tweet on Twitter is all we need.
  • 4. #sqlsatistanbul Session Evaluations Evaluate sessions and get a chance for the raffle: http://spoke.at/sqlsat451
  • 5. #sqlsatistanbul About Me... Koray Kocabaş Data Platform (SQL Server) MVP Yemeksepeti Business Intelligence Bahcesehir University Instructor @koraykocabas https://tr.linkedin.com/in/koraykocabas Blog: http://www.misjournal.com E-Mail: koraykocabas@outlook.com
  • 7. #sqlsatistanbul What kind of solutions using Big Data • Clickstream analysis to find buying patterns • Sentiment analysis for text data • Fraud detection; forensic analysis • Machine learning • Healthcare research • Predictive Maintenance Just dream it. Data is everywhere!
  • 8.
  • 9. Twitter launched in 2006 Active users per month ~316 Millions (August) ~320 Millions (October) %80 of users is Mobile! Tweets per second 6.000 Tweets per day ~500 Million Tweets per year ~200 Billion Twitter generate a lot of data (12 TB per day) 90 % of buyers trust peer recommendations 55 % of Twitter users are females The average Twitter user has 27 Followers
  • 10. Why it is so Popular?
  • 11.
  • 12.
  • 13. Event based data Unstructured data Detail event information Streaming Who is the influencer TweetTracker TweetArchivist Radian6 Sysomos Tweet Deck Hootsuite Twitter Problems Dashboards For Tweets
  • 15. #sqlsatistanbul 1. Collect Twitter Data & Get Simple Information 2. Data Enrichment 3. Store Semi - Structured Data 4. Analyze Semi - Structured Data 5. Visualize Meaningful Results
  • 17. #sqlsatistanbul Collect Twitter Data & Get Simple Information
  • 19. #sqlsatistanbul Real-Time Analytics Intake millions of events per second Process data from connected devices/apps Detect patterns and anomalies in streaming data Transform, augment, correlate, temporal operations No hardware (PaaS offering) Up and running in a few clicks (and within minutes) No performance tuning Efficiently pay only for usage Not paying for idle resources Low startup costs Scale from small to large when required Only SQL queries needed (Thousand lines of code in other solutions, such as Apache Storm)
  • 20. #sqlsatistanbul Stream Analytics Query Language Functions DML Statements • SELECT • FROM • WHERE • GROUP BY • HAVING • CASE • JOIN • UNION Windowing Extensions • Tumbling Window • Hopping Window • Sliding Window • Duration Aggregate Functions • SUM • COUNT • AVG • MIN • MAX Scaling Functions • WITH • PARTITION BY Date and Time Functions • DATENAME • DATEPART • DAY • MONTH • YEAR • DATETIMEFROMPARTS • DATEDIFF • DATADD String Functions • LEN • CONCAT • CHARINDEX • SUBSTRING Statistical Functions • VAR • VARP • STDEV
  • 21. 0 5 10 15 20 25 30
  • 22. 0 5 10 15 20 25 30 4 4 5 The count of tweets every 10 secondsTumbling Windows SELECT Topic, Count(*) AS Count FROM sqlsaturdaystream TIMESTAMP BY CreatedAt GROUP BY Topic, TumblingWindow(second,10)
  • 23. Hopping Windows: every 5 seconds, give me the count of tweets over the last 10 seconds by topic. Query: SELECT Topic, COUNT(*) AS Count FROM sqlsaturdaystream TIMESTAMP BY CreatedAt GROUP BY Topic, HoppingWindow(second, 10, 5)
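A hopping window overlaps with its neighbours, so one event can be counted several times. The following Python sketch illustrates that behaviour; `hopping_counts` and the sample events are hypothetical, and the sketch assumes windows end on multiples of the hop and cover (end - size, end].

```python
from collections import Counter

def hopping_counts(events, size, hop):
    """Count events per (window_end, topic) in overlapping windows.

    Assumption: windows end at multiples of `hop` and cover (end - size, end].
    """
    counts = Counter()
    for ts, topic in events:
        # First window end at or after the event (integer ceiling to a hop boundary).
        end = -(-ts // hop) * hop
        # The event belongs to every window whose start lies before it.
        while end - size < ts:
            counts[(end, topic)] += 1
            end += hop
    return dict(counts)

sample = [(1, "sql"), (4, "sql"), (7, "sql"), (12, "sql")]
result = hopping_counts(sample, size=10, hop=5)
for (window_end, topic), n in sorted(result.items()):
    print(window_end, topic, n)
```

With a size of 10 and a hop of 5, each event is counted in two windows, so the counts sum to twice the number of events.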
  • 24. Sliding Windows: report topics whose tweet count exceeds a threshold of 8 within a 5-second window. Query: SELECT Topic, COUNT(*) AS Count FROM sqlsaturdaystream TIMESTAMP BY CreatedAt GROUP BY Topic, SlidingWindow(second, 5) HAVING COUNT(*) > 8
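The threshold check over a sliding window can be sketched in Python as follows. This is a simplified illustration, not the Stream Analytics implementation: the hypothetical `sliding_alerts` function only evaluates the window when an event arrives, assumes events are sorted by timestamp, and treats the window as (ts - size, ts].

```python
from collections import defaultdict, deque

def sliding_alerts(events, size, threshold):
    """Return (timestamp, topic, count) whenever a topic's count in the
    last `size` seconds exceeds `threshold`.

    Assumptions: events are sorted by timestamp, and the window is only
    evaluated when a new event arrives.
    """
    recent = defaultdict(deque)  # topic -> timestamps still inside the window
    alerts = []
    for ts, topic in events:
        window = recent[topic]
        window.append(ts)
        # Drop timestamps that have slid out of (ts - size, ts].
        while window[0] <= ts - size:
            window.popleft()
        if len(window) > threshold:
            alerts.append((ts, topic, len(window)))
    return alerts

sample = [(1, "sql"), (2, "sql"), (3, "sql"), (4, "azure"), (5, "sql"), (9, "sql")]
alerts = sliding_alerts(sample, size=5, threshold=3)
print(alerts)
```

The small threshold of 3 is chosen only to keep the sample data short; the slide's query uses 8.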
  • 27. Azure Machine Learning. Data: local storage (upload data from PC…), cloud storage (Azure Storage, Azure Table, Hive, etc.). Consumers: Excel, business apps. Stages: business problem, modeling, business value, deployment. Components: Azure Marketplace (applications store), Azure ML Gallery (community), ML Web Services (REST API services), ML Studio (web IDE). Workspace: experiments, datasets, trained models, notebooks, access settings. Data, Model, API, Manage API.
  • 30. SQL Server 2016 CTP 3.1; Revolution R Open 3.2.2 for Revolution R Enterprise; Revolution R Enterprise 7.5.0. Revolution R Enterprise is able to deliver speeds 42 times faster than competing technology from SAS. Microsoft announced on January 23, 2015 that it had reached an agreement to purchase Revolution Analytics for an as yet undisclosed amount.
  • 31. The Klout Score is a number between 1 and 100 that represents your influence. Klout collects and normalizes more than 12 billion signals a day and runs a Hive data warehouse of more than 1 trillion rows. Klout was acquired by Lithium Technologies for $200 million.
  • 32. Store Semi-Structured Data; Analyze Semi-Structured Data
  • 34. Developed by Facebook and later adopted by Apache as an open source project. A data warehouse infrastructure built on top of Hadoop for data summarization, query and analysis. Provides integration between Hadoop and BI and visualization tools. Provides an SQL-like language called HiveQL to query data. Supports CREATE INDEX and partitioning. UPDATE is not supported (this is no longer correct). Hive provides users, groups and roles, but it is not designed for high security. Interfaces: console (hive>), script, ODBC/JDBC, SQuirreL, HUE, web interface, etc. Most popular business intelligence tools support Hive.
  • 35. Data Types. Primitive Data Types: int, bigint, float, double, boolean, decimal, string, timestamp, date, etc. Complex Data Types: arrays, maps, structs. Examples: ARRAY<string> (workplace: istanbul, ankara); STRUCT<sex:string,age:int> (Female, 25); MAP<string,int> (SOLR: 92). Hive vs. RDBMS:
    Hive: SQL interface. RDBMS: SQL interface.
    Hive: Focus on analytics. RDBMS: May focus on online or analytics.
    Hive: No transactions. RDBMS: Transactions usually supported.
    Hive: Partition adds, no random inserts. RDBMS: Random insert and update supported.
    Hive: Distributed processing via map/reduce. RDBMS: Distributed processing varies by vendor (if available).
    Hive: Scales to hundreds of nodes. RDBMS: Seldom scales beyond 20 nodes.
    Hive: Built for commodity hardware. RDBMS: Often built on proprietary hardware (especially when scaling out).
    Hive: Low cost per petabyte. RDBMS: What's a petabyte? :) (note: Are you sure?)
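As a rough analogy for readers who do not know Hive, the three complex types can be pictured as a list, a record and a dictionary. The Python sketch below mirrors the slide's example values; the class names and the field names `person` and `scores` are hypothetical, and only `workplace`, `sex` and `age` come from the slide.

```python
from dataclasses import dataclass

@dataclass
class Person:
    """Stands in for STRUCT<sex:string,age:int>."""
    sex: str
    age: int

@dataclass
class Row:
    workplace: list[str]    # ARRAY<string>
    person: Person          # STRUCT<sex:string,age:int> (hypothetical column name)
    scores: dict[str, int]  # MAP<string,int> (hypothetical column name)

row = Row(
    workplace=["istanbul", "ankara"],
    person=Person(sex="Female", age=25),
    scores={"SOLR": 92},
)

# Elements are addressed by position, by field name, or by key.
print(row.workplace[0], row.person.age, row.scores["SOLR"])
```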
  • 38. Originally developed at Yahoo! (with huge contributions from Hortonworks and Twitter). A platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs. Processes large semi-structured data sets using Hadoop MapReduce. Write complex MapReduce jobs using a simple script language (Pig Latin). Pig provides a set of aggregation functions (AVG, COUNT, SUM, MAX, MIN, etc.). Developers can write UDFs. Interfaces: console (grunt), script, Java, HUE (Hadoop User Experience by Cloudera). Easy to use and efficient.
  • 39. Data Types. Simple Data Types: int, float, double, chararray (UTF-8), bytearray. Complex Data Types: map (key, value), tuple, bag (list of tuples). Commands. Loading: LOAD, STORE, DUMP. Filtering: FILTER, FOREACH, DISTINCT. Grouping: JOIN, GROUP, COGROUP, CROSS. Ordering: ORDER, LIMIT. Merging & Split: UNION, SPLIT. SQL script vs. Pig script:
    SQL: SELECT * FROM TABLE
    Pig: A = LOAD 'DATA' USING PigStorage('\t') AS (col1:int, col2:int, col3:int);
    SQL: SELECT col1+col2, col3 FROM TABLE
    Pig: B = FOREACH A GENERATE col1+col2, col3;
    SQL: SELECT col1+col2, col3 FROM TABLE WHERE col3>10
    Pig: C = FILTER B BY col3>10;
    SQL: SELECT col1, col2, sum(col3) FROM X GROUP BY col1, col2
    Pig: D = GROUP A BY (col1,col2); E = FOREACH D GENERATE FLATTEN(group), SUM(A.col3);
    SQL: ... HAVING sum(col3) > 5
    Pig: F = FILTER E BY $2>5;
    SQL: ... ORDER BY col1
    Pig: G = ORDER F BY $0;
    SQL: SELECT DISTINCT col1 FROM TABLE
    Pig: I = FOREACH A GENERATE col1; J = DISTINCT I;
    SQL: SELECT col1, COUNT(DISTINCT col2) FROM TABLE GROUP BY col1
    Pig: K = GROUP A BY col1; L = FOREACH K {M = DISTINCT A.col2; GENERATE FLATTEN(group), COUNT(M);}
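For readers who know neither SQL nor Pig Latin well, the same step-by-step dataflow can be sketched in plain Python. This is only an illustration of what each relation contains, using a small hypothetical data set of (col1, col2, col3) rows; it says nothing about how Pig executes the script on Hadoop.

```python
from collections import defaultdict

# Hypothetical sample rows: (col1, col2, col3)
a = [(1, 2, 20), (1, 2, 30), (1, 9, 2), (3, 4, 5), (5, 6, 4), (5, 6, 3)]

# B = FOREACH A GENERATE col1+col2, col3;
b = [(col1 + col2, col3) for col1, col2, col3 in a]

# C = FILTER B BY col3>10;
c = [row for row in b if row[1] > 10]

# D = GROUP A BY (col1,col2); E = FOREACH D GENERATE FLATTEN(group), SUM(A.col3);
totals = defaultdict(int)
for col1, col2, col3 in a:
    totals[(col1, col2)] += col3
e = [(col1, col2, total) for (col1, col2), total in totals.items()]

# F = FILTER E BY $2>5;  G = ORDER F BY $0;
f = [row for row in e if row[2] > 5]
g = sorted(f, key=lambda row: row[0])

# I = FOREACH A GENERATE col1; J = DISTINCT I;
j = sorted({col1 for col1, _, _ in a})

# K = GROUP A BY col1; L = FOREACH K {M = DISTINCT A.col2; GENERATE FLATTEN(group), COUNT(M);}
distinct_col2 = defaultdict(set)
for col1, col2, _ in a:
    distinct_col2[col1].add(col2)
distinct_counts = {col1: len(values) for col1, values in distinct_col2.items()}

print(c, g, j, distinct_counts, sep="\n")
```

Each Pig relation (A, B, C, ...) corresponds to one intermediate Python value, which is the main stylistic difference from SQL: Pig names every step of the pipeline instead of nesting clauses in a single statement.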
  • 43. Big Data Analytics, Implementing Big Data Analysis, Big Data Analytics with HDInsight, Big Data and Business Analytics Immersion, Getting Started with Microsoft Azure Machine Learning; Real World Big Data in Azure, Big Data on Amazon Web Services, Reporting with MongoDB, Cloud Business Intelligence, HDInsight Deep Dive: Storm HBase and Hive, Data Science & Hadoop Workflows at Scale With Scalding, SQL on Hadoop - Analyzing Big Data with Hive; Introduction to Big Data Analytics, Machine Learning with Big Data, Big Data Analytics for Healthcare, Data Science at Scale, The Data Scientist's Toolbox, R Programming; Master Big Data and Hadoop Step by Step, Hadoop Essentials, Hadoop Starter Kit, Data Analytics using Hadoop eco system, Big Data: How Data Analytics Is Transforming the World, Applied Data Science with R, Hadoop Enterprise Integration; Data Science and Analytics in Context, Introduction to Big Data with Spark, Data Science and Machine Learning Essentials, Machine Learning for Data Science and Analytics, Statistical Thinking for Data Science and Analytics