Analyzing Data From MovieLens
Presented by
Qijie Pan
Junwei Guan
Cory Matthew Hayward
Popular Movies
Problem 1
"Popular" Movie?
• What is "POPULAR"?
Problem 1.1
"Popular" Movie?
The most ratings?
The highest rating scores?
The highest rating scores + the most ratings!
The top 10 highest-rated films among the top 50 most-rated movies
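The combined criterion above can be sketched in a few lines of Python. This is only an illustration, not the presenters' code: the function name `popular_movies`, its parameters, and the toy data are assumptions, and "highest rated" is taken to mean the highest mean score.

```python
from collections import defaultdict

def popular_movies(ratings, most_rated=50, top=10):
    """Return the `top` best-scored titles among the `most_rated` most-rated ones.

    `ratings` is an iterable of (title, score) pairs.
    """
    scores = defaultdict(list)
    for title, score in ratings:
        scores[title].append(score)
    # Step 1: keep only the movies with the most ratings.
    by_count = sorted(scores, key=lambda t: len(scores[t]), reverse=True)[:most_rated]
    # Step 2: rank the survivors by their mean score.
    by_mean = sorted(by_count, key=lambda t: sum(scores[t]) / len(scores[t]), reverse=True)
    return by_mean[:top]

# Hypothetical toy data, not MovieLens values.
toy = [("A", 5), ("A", 4), ("A", 5), ("B", 3), ("B", 3), ("B", 4), ("C", 5)]
print(popular_movies(toy, most_rated=2, top=1))  # ['A']
```

Movie "C" has a perfect score but only one rating, so the count filter drops it before the ranking step. That is the reason for combining the two criteria.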
Problem 1.2
Which group to please?
The young or the old?
Problem 2
Investigating the histograms
Filtered by "size()>100"
Problem 3
Correlation: Men versus Women
All samples:
Filtered to movies rated more than 10 times:
Filtered to movies rated more than 20 times:
Filtered to movies rated more than 30 times:
Filtered to movies rated more than 40 times:
Filtered to movies rated more than 50 times:
Filtered to movies rated more than 60 times:
Filtered to movies rated more than 70 times:
Filtered to movies rated more than 80 times:
Filtered to movies rated more than 90 times:
Filtered to movies rated more than 100 times:
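One way to compute such a correlation is sketched below. The slides do not show the code, so several things here are assumptions: the Pearson coefficient is used, the threshold is applied to the total number of ratings per movie, and the names `pearson`, `gender_correlation`, and `min_ratings` are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists, or None if undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return None
    return cov / (sx * sy)

def gender_correlation(ratings, min_ratings=0):
    """Correlate per-movie mean ratings of men and women.

    `ratings` is an iterable of (movie_id, gender, score) with gender "M" or "F".
    Only movies with more than `min_ratings` ratings in total are kept.
    """
    by_movie = defaultdict(lambda: {"M": [], "F": []})
    for movie_id, gender, score in ratings:
        by_movie[movie_id][gender].append(score)
    men, women = [], []
    for groups in by_movie.values():
        total = len(groups["M"]) + len(groups["F"])
        # A movie needs ratings from both genders to contribute a point.
        if total > min_ratings and groups["M"] and groups["F"]:
            men.append(sum(groups["M"]) / len(groups["M"]))
            women.append(sum(groups["F"]) / len(groups["F"]))
    return pearson(men, women)

# Hypothetical toy data, not MovieLens values.
toy = [(1, "M", 2), (1, "F", 3), (2, "M", 3), (2, "F", 4),
       (3, "M", 4), (3, "F", 5), (4, "M", 1)]
print(gender_correlation(toy, min_ratings=1))
```

Calling the function with `min_ratings` set to 0, 10, 20, and so on up to 100 would give one coefficient per filter in the sequence above.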
Problem 4
Business Intelligence
Occupation & Favorite Genres
Small number!
Only two choices?
Marketing plans!
Advertisement
----Twitter, Facebook
----Targeted delivery
What is more...
100% prefer Drama
95% prefer Comedy
55% prefer Comedy & Romance
For all 20 occupations
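A per-occupation genre ranking of this kind could be computed as in the sketch below. It assumes that "favorite" means the genre an occupation rated most often and that genres arrive as a pipe-separated string such as "Comedy|Romance". The function name `favorite_genres` and the toy records are hypothetical.

```python
from collections import Counter, defaultdict

def favorite_genres(records, top=2):
    """Map each occupation to its `top` most frequently rated genres.

    `records` is an iterable of (occupation, genres) pairs, one per rating,
    where `genres` is a pipe-separated string.
    """
    counts = defaultdict(Counter)
    for occupation, genres in records:
        for genre in genres.split("|"):
            counts[occupation][genre] += 1
    return {occ: [g for g, _ in c.most_common(top)] for occ, c in counts.items()}

# Hypothetical toy data, not MovieLens values.
toy = [
    ("student", "Drama"),
    ("student", "Comedy|Romance"),
    ("student", "Drama"),
    ("engineer", "Comedy"),
    ("engineer", "Drama|Comedy"),
]
print(favorite_genres(toy, top=1))
```

Counting in how many occupations a genre appears among the favorites, then dividing by the number of occupations, would give percentages of the kind quoted above.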
Questions that can be answered from the data:
The most popular genres
Targeted advertisements
Preferences by gender, age...
ZIP code
Regional investment strategies!
------Audience location distribution!
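The audience location distribution can be approximated from ZIP codes alone, as in this sketch. It relies on the fact that the first digit of a US ZIP code identifies a broad group of states, running from 0 in the Northeast to 9 on the West Coast. The function name `audience_by_zip_region` and the sample codes are assumptions, and codes that do not start with a digit are grouped under "other".

```python
from collections import Counter

def audience_by_zip_region(zip_codes):
    """Count users per leading ZIP digit; non-numeric codes go to "other"."""
    regions = Counter()
    for code in zip_codes:
        code = code.strip()
        if code[:1].isdigit():
            regions[code[0]] += 1
        else:
            regions["other"] += 1
    return dict(regions)

# Hypothetical sample codes, not taken from the data set.
sample = ["10001", "02139", "94043", "90210", "V3N4P"]
print(audience_by_zip_region(sample))
```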
If I were a Data Scientist:
More investment in Drama & Comedy
More focus on older audiences
More investment in the East/West
Q & A

Analyzing Movie Data and Business Intelligence