IBM Research - Almaden
© 2012 IBM Corporation
Leveraging Big Data to Derive
Actionable People Insights
Huahai Yang
USER Group, Computer Science
About Us
• A.k.a. IBM Almaden Research Center (ARC)
• On top of a hill at the southern tip of Silicon Valley
About Us
• ARC
– Science & Technology
– Storage Systems
– Service Science Research
– Computer Science
• Theory
• Database Management
• Intelligent Information System
• Healthcare Information Technology
• User System and Experience Research (USER)
– Currently led by Michelle X. Zhou
• Join us, we are always hiring!
– Interns, Postdocs, Software Engineers and Research Staff Members
Big Data Opportunities
• Industries
– Finance
– Retail
– Product Manufacturer
– Telecommunication
– Entertainment
– …
• Potentials
– Customer acquisition/retention
– Market segmentation
– Brand management
– Risk assessment
• 300+ million tweets daily
• 1+ million blog posts daily
Our Focus: Insights about People
• Perceptions, Sentiments, Personalities and other profiles
Outline
• Consumer users
– OpinionBlocks
• Visually summarizing product reviews
• Business users
– Brandy
• Understanding user perception and personality for direct marketing
– qCrowd
• Actively engaging individuals on social media
OpinionBlocks Motivation
• Online product reviews
– Significant in consumer decision making
– Large volume, uncurated
– Limited search tools
– Users have different priorities
• Help consumers make better use of reviews
Difficulties with Review Text
• Wide variation in:
– length
– clarity of language
– to the point vs vague
– emotional vs subjective
• Inconsistent with the rating
• Redundant
Prior Work
Chen et al. “Visual Analysis of Conflicting Opinions”, 2006
Oelke et al. “Visual Opinion Analysis of Customer Feedback Data”, 2009
Analyze first, and visualize the analysis results only
Can Consumers Trust the Analysis?
• Sentiment analysis is not a solved problem in NLP
– Often less than 80% accuracy
– Aspect-oriented sentiment is even less accurate
• Low resolution
– Often polarity only: positive, negative, neutral
• Not very actionable
“…it was not clean, but I am not expecting a better performance from any vacuum.”
Our Approach
• Support an interactive reading experience where users can search for relevant information.
• Visualize the text itself while highlighting analysis results.
• Show the categorized text in context so that users can judge fairness of the sentiment analysis.
• Progressively disclose textual information while continuously providing visual graphical summaries.
Overview First
• Provide summary of overall opinion
• Identify important features and key issues in each
• Interactivity reveals correlations among features
Filter on Demand
• Polarity of feature
• Keywords
• Snippets
Zoom across LODs (levels of detail)
Work in Progress
• Formal user studies
– Does the system help consumers?
• Learn more about the product domain
• Find information faster
• Make better decisions
• Redesign of the UI
– Compare products
– Scale to larger numbers of reviews
Brandy
• 360° understanding of a business brand: evidence-based brand management from social media
– Brand associations (e.g., key aspects)
– Competitive brands
– Brand evolution
– User modeling of those who voiced brand perception
• Demographics, personality traits, locations, brand association and sentiments
• Active brand management for effective marketing
– Craft/adjust marketing messages based on brand analysis
• Associations and customer needs
– Deliver marketing messages to target customers
• Individual customers (e.g., customer retention)
• Customer segments (e.g., customer acquisition)
Project Map
[Figure: project map organized in three layers: Data, Key Technology, Applications.
Data: Social Data of Known Customers (Existing Customers); Social Data of Unknown Customers (New Customers); Customer Profiles; Unica (Today).
Key Technology: (A) Profiling Fusion; (B) User profiling from social data; (C) Brand perception from social data; (D) Overlay; Enriched/new Customer Profile; Perceptual Map.
Applications: Individual-based Direct Marketing (Customer retention); Segment-based Direct Marketing (Customer Acquisition); Brand Management (Marketing Research); further labels (E), (F), (G).]
Modeling Personality
• Research in psychology has shown that word usage in one's writing, such as blogs and essays, is related to one's personality
– Our research has correlated Big Five facet features with willingness and readiness to respond to questions on social networks
Features | # | Examples | Computation
LIWC (dictionary-based measurement of aspects of word usage) | 68 | First person, Negation, Feeling, Communication, Leisure, Death | Let g be a LIWC category, N_g the number of occurrences of words in that category in one's tweets, and N the total number of words in those tweets. The score for category g is then N_g/N.
Big Five (personality types, OCEAN) | 5 | Openness | Using correlations with LIWC features as reported by previous researchers (e.g., Yarkoni et al.)
Big Five Facets | 30 | Liberalism, Imagination | Using correlations with LIWC features as reported by previous researchers (e.g., Yarkoni et al.)
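The LIWC category score N_g/N described on this slide can be sketched in a few lines of Python. This is only an illustration: the real LIWC dictionary is licensed and much larger, so the three-category `TOY_LEXICON`, the `tokenize` helper, and the function name `category_scores` are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary. Real LIWC categories and word lists are
# licensed and far larger; these entries are placeholders for illustration.
TOY_LEXICON = {
    "first_person": {"i", "me", "my", "mine"},
    "negation": {"no", "not", "never"},
    "leisure": {"movie", "game", "party"},
}


def tokenize(text):
    """Lowercase the text and return runs of letters/apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def category_scores(tweets, lexicon=TOY_LEXICON):
    """Return {category g: N_g / N} computed over all words in the tweets."""
    words = [w for tweet in tweets for w in tokenize(tweet)]
    total = len(words)  # N: total number of words in the user's tweets
    if total == 0:
        return {g: 0.0 for g in lexicon}
    counts = Counter()  # N_g: occurrences of words belonging to category g
    for w in words:
        for g, vocab in lexicon.items():
            if w in vocab:
                counts[g] += 1
    return {g: counts[g] / total for g in lexicon}


scores = category_scores(["I never watch a movie", "not my game"])
print(scores)
```

The two sample tweets contain eight words in total, two of which fall in each toy category, so every score is 2/8. The Big Five rows of the table build on such scores through published correlations; that step is not sketched here because the slide does not say how the correlations are combined.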
Ongoing: Correlating Brands with Personalities
• Data
– 3000 Twitter users discussing Walmart, Costco, and Sears
• Analytics
– Measured personality traits from tweets
– “Openness” scores per brand shown in the chart
[Chart: “Openness” scores for All Brands, Costco, Sears, Walmart]
Brand Perceptions from Twitter Data
• Twitter Data
– Collected via the Twitter4J streaming API
– Queries: Walmart, Costco, Sears
– Total ~1M tweets, from May 31 to June 21 (3 weeks)
• Data Filtering & Enhancements
– Started with 6 categories from Consumer Reports: “selection, quality, layout, service, checkout, value”
– Category expansion using synonyms, WordNet and related terms. E.g.,
• checkout: line, wait, cashier, counter, self checkout, cash, register
• layout: aisle, passage, lane, shelf, door, gate, lost, spacious, narrow, row
– Added new categories that emerged from the data: “jobs, people, visiting, experience, ...”
• E.g., “I don't know why I even try anymore. Walmart always disappoints. #walmartsucks”
• “Walmart stay crowded!”
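One plausible way to use the expanded keyword lists is to tag each tweet with every category whose keywords appear in it. The sketch below assumes simple whole-word matching; the slide does not describe the actual filtering logic, and the names `CATEGORY_KEYWORDS` and `match_categories` are made up for this example.

```python
import re

# Keyword lists adapted from the slide's two examples. A fuller system would
# hold one list per category, expanded with synonyms and WordNet terms.
CATEGORY_KEYWORDS = {
    "checkout": ["line", "wait", "cashier", "counter", "self checkout",
                 "cash", "register"],
    "layout": ["aisle", "passage", "lane", "shelf", "door", "gate", "lost",
               "spacious", "narrow", "row"],
}


def match_categories(tweet, keywords=CATEGORY_KEYWORDS):
    """Return the sorted categories whose keywords occur as whole words."""
    text = tweet.lower()
    hits = set()
    for category, terms in keywords.items():
        for term in terms:
            # \b keeps "row" from matching inside "crowded".
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                hits.add(category)
                break
    return sorted(hits)


print(match_categories("Long wait at the cashier in a narrow aisle"))
print(match_categories("Walmart stay crowded!"))
```

The second tweet matches neither list, which illustrates why new categories such as "experience" had to be added from the data itself.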
Classification - Inputs
• Training Data
– Manually categorized ~3500 tweets
• Walmart (1341), Costco (1420), Sears (1109)
– Ran WEKA with 3 different models
• SVM
• Multinomial
• KNN
– Ran ICM with some tuning
• Naïve Bayes based
• Knowledge based and rules based
– 2-fold cross-validation
• Data Expansion
– Ran an algorithm that finds tweets similar to the ~3500 categorized tweets, yielding ~75,000 tweets
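The slide does not name the similarity algorithm used for data expansion. As a stand-in, the sketch below propagates a label from a categorized tweet to an uncategorized one when their Jaccard token overlap reaches a threshold. The function names, the 0.5 threshold, and the sample tweets are all assumptions for illustration, not the method actually used.

```python
import re


def tokens(text):
    """Return the set of lowercase word-like tokens in a tweet."""
    return set(re.findall(r"[a-z0-9#']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def expand_labels(labeled, unlabeled, threshold=0.5):
    """Give each unlabeled tweet the category of its most similar labeled
    tweet, provided the similarity is at least `threshold`."""
    labeled_tokens = [(tokens(text), cat) for text, cat in labeled]
    expanded = []
    for tweet in unlabeled:
        tweet_tokens = tokens(tweet)
        best_score, best_cat = 0.0, None
        for label_tokens, cat in labeled_tokens:
            score = jaccard(tweet_tokens, label_tokens)
            if score > best_score:
                best_score, best_cat = score, cat
        if best_cat is not None and best_score >= threshold:
            expanded.append((tweet, best_cat))
    return expanded


# Hypothetical sample data, not taken from the study.
labeled = [
    ("the checkout line was so long", "checkout"),
    ("walmart is hiring cashiers", "jobs"),
]
unlabeled = ["checkout line was long again", "costco pizza tonight"]
expanded = expand_labels(labeled, unlabeled)
print(expanded)
```

A lower threshold grows the training set faster but risks propagating wrong labels, which would then depress the precision measured afterwards.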
Classification - Results
Category | Walmart #tweets | Walmart WEKA Precision | Walmart WEKA Recall | Walmart ICM Precision | Walmart ICM Recall | Costco #tweets | Costco WEKA Precision | Costco WEKA Recall | Costco ICM Precision | Costco ICM Recall
Visiting | 112 | 0.313 | 0.357 | 0.412 | 0.5 | 244 | 0.457 | 0.496 | 0.86 | 0.39
Jobs | 187 | 0.907 | 0.834 | 1 | 0.782 | 32 | 0.773 | 0.531 | 1 | 0.538
Experience | 99 | 0.253 | 0.222 | 0.272 | 0.312 | 114 | 0.199 | 0.254 | 0.177 | 0.544
Checkout | 135 | 0.927 | 0.748 | 0.963 | 0.791 | 16 | 0.333 | 0.188 | 0.2 | 0.12
Service | 17 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0
Value | 19 | 0.5 | 0.053 | 0 | 0 | 53 | 0.1 | 0.057 | 0.125 | 0.08
Layout | 6 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0
Quality | 18 | 0.25 | 0.056 | 0 | 0 | 53 | 0.16 | 0.075 | 0.2 | 0.318
People | 81 | 0.294 | 0.185 | 0.357 | 0.256 | 43 | 0.053 | 0.023 | 0.2 | 0.3889
Pharmacy | 63 | 0.925 | 0.778 | 0.92 | 0.741 | 95 | 0.926 | 0.916 | 0.905 | 0.941
Selection | 50 | 0.389 | 0.14 | 0.112 | 0.409 | 132 | 0.284 | 0.189 | 0.407 | 0.159
Other Category | 574 | 0.598 | 0.763 | 0.673 | 0.111 | 712 | 0.625 | 0.704 | 0.702 | 0.5938
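For readers less familiar with the two metrics in the table, per-category precision and recall can be computed from true and predicted labels as sketched below. The function name and the sample labels are hypothetical. Reporting 0 when a category is never predicted is a convention assumed here; the all-zero Service and Layout rows would be consistent with it, but the slide does not confirm how those values arose.

```python
def precision_recall(true_labels, predicted_labels, category):
    """Return (precision, recall) for one category.

    precision = TP / (TP + FP), recall = TP / (TP + FN).
    Either value is reported as 0.0 when its denominator is zero.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for five tweets, not taken from the study.
true_labels = ["jobs", "jobs", "checkout", "other", "jobs"]
predicted = ["jobs", "other", "checkout", "jobs", "jobs"]

jobs_p, jobs_r = precision_recall(true_labels, predicted, "jobs")
layout_p, layout_r = precision_recall(true_labels, predicted, "layout")
print(jobs_p, jobs_r, layout_p, layout_r)
```

In this toy data "jobs" has two true positives, one false positive and one false negative, so both metrics equal 2/3, while "layout" never appears and scores 0 on both.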
Classification - Insights
• Insufficient evidence: some categories are rarely discussed in the tweets. E.g., “layout”, “service”.
• Complexity of category: even with the same amount of training data, classification accuracy for some categories is much lower
[Scatter plot: Precision (0 to 1) versus #tweets (0 to 200) per category: Visiting, Jobs, Experience, Checkout, Service, Value, Layout, Quality, People, Pharmacy, Selection]
Infrastructure - UI
Work in progress
• Improve classification quality of brand perceptions
– Unsupervised methods to uncover unknown perceptions
• Relating brand perceptions with consumer personalities
• User studies on marketing professionals
– How well do the tools support marketing tasks?
qCrowd: Asking Targeted Strangers Questions
Engagement Continuum
System Architecture
Research Questions
• Where might this be helpful?
– Questions about an event that are best answered soon after the event
– Questions for which there might be a diversity of opinions
– More?
• How feasible is this approach?
– Will people answer questions from strangers?
– Will use of incentives increase responses?
– What is the quality of the answers?
Test Scenarios: TSA Tracker & Camera Review
• Crowdsourcing airport security wait times via Twitter
• Crowdsourcing product reviews via Twitter
– Ask follow-up questions if the person responds
Questions Asked
• TSA Tracker
– Without incentive
– With incentive
• Camera Reviews
Results
Follow-up Questions
Observations
Thank You! Questions?
User System and Experience Research (USER)
IBM Research – Almaden
http://www.almaden.ibm.com/cs/disciplines/user/
Jilin Chen, Allen Cypher, Eben Haber, Eser Kandogan, Tessa Lau, Jalal Mahmud, Jeffrey Nichols, Barton Smith, Huahai Yang, Michelle X. Zhou, Tara Mathews

Leveraging Big Data to Derive Actionable People Insights

  • 1. IBM Research - Almaden © 2012 IBM Corporation Leveraging Big Data to Derive Actionable People Insights Huahai Yang USER Group, Computer Science
  • 2. IBM Research - Almaden © 2012 IBM Corporation2 About Us • Aka. IBM Almaden Research Center (ARC) • On top of a hill in the southern tip of Silicon Valley
  • 3. IBM Research - Almaden © 2012 IBM Corporation3 About Us • ARC – Science & Technology – Storage Systems – Service Science Research – Computer Science • Theory • Database Management • Intelligent Information System • Healthcare Information Technology • User System and Experience Research (USER) – Currently led by Michelle X. Zhou • Join us, we are always hiring! – Interns, Postdocs, Software Engineers and Research Staff Members
  • 4. IBM Research - Almaden © 2012 IBM Corporation4 Big Data Opportunities • Industries – Finance – Retail – Product Manufacturer – Tel-communication – Entertainment – … • Potentials – Customer acquisition/retention – Market segmentation – Brand management – Risk assessment • 300+ million tweets daily • 1+ million blog posts daily
  • 5. IBM Research - Almaden © 2012 IBM Corporation5 Our Focus: Insights about People • Perceptions, Sentiments, Personalities and other profiles
  • 6. IBM Research - Almaden © 2012 IBM Corporation6 Outline  Consumer users – OpinionBlocks • Visually summarizing product reviews  Business users – Brandy • Understanding user perception and personality for direct marketing – qCrowd • Actively engaging individuals on social media
  • 7. IBM Research - Almaden © 2012 IBM Corporation7 OpinionBlocks Motivation  Online product reviews – Significant in consumer decision making – Large volume, uncurated – Limited search tools – Users have different priorities  Help consumer makes better use of reviews
  • 8. IBM Research - Almaden © 2012 IBM Corporation8 Difficulties with Review Text  A lot of variations in terms of: – length – clarity of language – to the point vs vague – emotional vs subjective  Incoherent with the rating  Redundant
  • 9. IBM Research - Almaden © 2012 IBM Corporation9 Prior Work Chen et al. “Visualizing Analysis of Conflicting Opinions”, 2006 Oelka et al. “Visual Opinion Analysis of Customer Feedback Data”, 2009 Analyze first, and visualize the analysis results only
  • 10. IBM Research - Almaden © 2012 IBM Corporation10 Can Consumer Trust the Analysis?  Sentiment analysis is not a solved problem in NLP – Often less than 80% accuracy – Aspect oriented sentiment is even less accurate  Low resolution – Often polarity only: positive, negative, neutral  Not very actionable “…it was not clean, but I am not expecting a better performance from any vacuum.”
  • 11. IBM Research - Almaden © 2012 IBM Corporation11 Our Approach  Support an interactive reading experience where users can search for relevant information.   Visualize the text itself while highlighting analysis results.   Show the categorized text in context so that users can judge fairness of the sentiment analysis.   Progressively disclose textual information while continuously providing visual graphical summaries
  • 12. IBM Research - Almaden © 2012 IBM Corporation12 Overview First  Provide summary of overall opinion  Identify important features and key issues in each  Interactivity reveals correlations among features
  • 13. Filter on Demand
      • Polarity of feature
      • Keywords
      • Snippets
  • 14. Zoom across LODs (levels of detail)
  • 15. Zoom across LODs
  • 16. Work in Progress
      • Formal user studies
          – Does the system help consumers?
              • Learn more about the product domain
              • Find information faster
              • Make better decisions
      • Redesign of the UI
          – Compare products
          – Scale to larger numbers of reviews
  • 17. Brandy
      • 360° understanding of a business brand: evidence-based brand management from social media
          – Brand associations (e.g., key aspects)
          – Competitive brands
          – Brand evolution
          – User modeling of those who voiced brand perceptions
              • Demographics, personality traits, locations, brand associations and sentiments
      • Active brand management for effective marketing
          – Craft/adjust marketing messages based on brand analysis
              • Associations and customer needs
          – Deliver marketing messages to target customers
              • Individual customers (e.g., customer retention)
              • Customer segments (e.g., customer acquisition)
  • 18. Project Map (diagram; legend columns: Data, Key Technology, Applications)
      • Data: Existing Customers; New Customers; Customer Profiles; Social Data of Known Customers; Social Data of Unknown Customers
      • Key Technology: (A) Profiling Fusion; (B) User profiling from social data; (C) Brand perception from social data; (D) Overlay
      • Applications (the diagram also carries the labels (E), (F) and (G)): Segment-based Direct Marketing (Customer Acquisition); Brand Management (Marketing Research); Individual-based Direct Marketing (Customer Retention)
      • Other labels: Enriched/new Customer Profile; Perceptual Map; Unica (Today)
  • 19. Modeling Personality
      • Research in psychology has shown that word usage in one’s writings, such as blogs and essays, is related to one’s personality
          – Our research has correlated Big Five facet features with willingness and readiness to respond to questions on social networks
      • Features
          – LIWC (dictionary-based measurement of aspects of word usage): 68 features
              • Examples: First person, Negation, Feeling, Communication, Leisure, Death
              • Computation: let g be a LIWC category, N_g the number of occurrences of words in that category in one’s tweets, and N the total number of words in his/her tweets. The score for category g is then N_g / N.
          – Big Five (personality types, OCEAN): 5 features
              • Examples: Openness
              • Computation: using correlations with LIWC features as reported by previous researchers (e.g., Yarkoni et al.)
          – Big Five Facets: 30 features
              • Examples: Liberalism, Imagination
              • Computation: using correlations with LIWC features as reported by previous researchers (e.g., Yarkoni et al.)
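The LIWC score on this slide is a simple ratio, so it can be sketched in a few lines. The example below is only an illustration under stated assumptions: `TOY_LEXICON` is a made-up two-category word list (the real LIWC dictionary is licensed and far larger, and also matches word stems), and the tokenizer and the name `category_scores` are hypothetical choices, not the presenters' implementation.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for LIWC categories (assumption).
TOY_LEXICON = {
    "first_person": {"i", "me", "my", "mine", "myself"},
    "negation": {"no", "not", "never"},
}


def category_scores(tweets, lexicon=TOY_LEXICON):
    """Return N_g / N for each category g over all of one user's tweets."""
    # Simple lowercase word tokenizer (assumption; LIWC has its own rules).
    words = [w for t in tweets for w in re.findall(r"[a-z']+", t.lower())]
    total = len(words)  # N: total number of words in the user's tweets
    if total == 0:
        return {g: 0.0 for g in lexicon}
    counts = Counter(words)
    # N_g: occurrences of words that belong to category g
    return {g: sum(counts[w] for w in vocab) / total
            for g, vocab in lexicon.items()}


scores = category_scores(["I never shop there", "Not my store"])
print(scores)
```

Normalizing by the total word count keeps scores comparable between users who tweet a lot and users who tweet rarely.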
  • 20. Ongoing: Correlating Brands with Personalities
      • Industry Solutions Joint Program
      • Data
          – 3000 Twitter users discussing Walmart, Costco, and Sears
      • Analytics
          – Measured personality traits from tweets
          – “Openness” scores per brand shown to the right (chart: All Brands, Costco, Sears, Walmart)
  • 21. Brand Perceptions from Twitter Data
      • Twitter Data
          – Collected with the Twitter4J streaming API
          – Queries: Walmart, Costco, Sears
          – Total ~1M tweets, from May 31 to June 21 (3 weeks)
      • Data Filtering & Enhancements
          – Started with 6 categories from Consumer Reports: “selection, quality, layout, service, checkout, value”
          – Category expansion using synonyms, WordNet and related terms. E.g.,
              • checkout: line, wait, cashier, counter, self checkout, cash, register
              • layout: aisle, passage, lane, shelf, door, gate, lost, spacious, narrow, row
          – Added new categories that pop out of the data: “jobs, people, visiting, experience..”
              • E.g., “I don’t know why I even try anymore. Walmart always disappoints. #walmartsucks”
              • “Walmart stay crowded!”
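One way to picture how expanded keyword lists can pre-sort tweets into candidate categories is the sketch below. The two keyword lists come from this slide (spelling normalized); the whole-word matching rule and the function name `candidate_categories` are assumptions for illustration, since the slides do not say how the lists were applied.

```python
import re

# Keyword lists as given on the slide, spelling normalized.
CATEGORY_KEYWORDS = {
    "checkout": ["line", "wait", "cashier", "counter",
                 "self checkout", "cash", "register"],
    "layout": ["aisle", "passage", "lane", "shelf", "door", "gate",
               "lost", "spacious", "narrow", "row"],
}


def candidate_categories(tweet, keywords=CATEGORY_KEYWORDS):
    """Return the categories whose keywords appear as whole words in the tweet."""
    text = tweet.lower()
    matched = []
    for category, terms in keywords.items():
        # \b word boundaries stop "row" from matching inside "crowded".
        if any(re.search(r"\b" + re.escape(term) + r"\b", text)
               for term in terms):
            matched.append(category)
    return matched


print(candidate_categories("The cashier was slow and the line was long"))
print(candidate_categories("Walmart stay crowded!"))
```

A tweet such as “Walmart stay crowded!” matches neither list, which is consistent with the slide's point that new categories had to be added from the data.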
  • 22. Classification: Inputs
      • Training Data
          – Manually categorized ~3500 tweets
              • Walmart (1341), Costco (1420), Sears (1109)
          – Ran WEKA with 3 different models
              • SVM
              • Multinomial
              • KNN
          – Ran ICM with some tuning
              • Naïve Bayes
              • Knowledge-based and rule-based
          – 2-fold cross-validation
      • Data Expansion
          – Ran an algorithm that finds similar tweets for those 3500 categorized tweets, ending up with ~75,000 tweets
  • 23. Classification: Results (P = precision, R = recall)
                      Walmart                                  Costco
      Category        #tweets  WEKA P  WEKA R  ICM P   ICM R   #tweets  WEKA P  WEKA R  ICM P   ICM R
      Visiting        112      0.313   0.357   0.412   0.5     244      0.457   0.496   0.86    0.39
      Jobs            187      0.907   0.834   1       0.782   32       0.773   0.531   1       0.538
      Experience      99       0.253   0.222   0.272   0.312   114      0.199   0.254   0.177   0.544
      Checkout        135      0.927   0.748   0.963   0.791   16       0.333   0.188   0.2     0.12
      Service         17       0       0       0       0       3        0       0       0       0
      Value           19       0.5     0.053   0       0       53       0.1     0.057   0.125   0.08
      Layout          6        0       0       0       0       4        0       0       0       0
      Quality         18       0.25    0.056   0       0       53       0.16    0.075   0.2     0.318
      People          81       0.294   0.185   0.357   0.256   43       0.053   0.023   0.2     0.3889
      Pharmacy        63       0.925   0.778   0.92    0.741   95       0.926   0.916   0.905   0.941
      Selection       50       0.389   0.14    0.112   0.409   132      0.284   0.189   0.407   0.159
      Other Category  574      0.598   0.763   0.673   0.111   712      0.625   0.704   0.702   0.5938
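The slide reports precision and recall separately. A common way to compare rows with a single number is the F1 score, the harmonic mean of the two. The slides do not report F1, so the sketch below is an added illustration; the three Walmart/WEKA value pairs are copied from this slide, and the function name `f1_score` is a hypothetical helper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs for Walmart with WEKA, copied from the slide.
walmart_weka = {
    "Jobs": (0.907, 0.834),
    "Checkout": (0.927, 0.748),
    "Service": (0.0, 0.0),
}

for category, (p, r) in walmart_weka.items():
    print(f"{category}: F1 = {f1_score(p, r):.3f}")
```

The zero guard matters here because several sparse categories (for example Service and Layout) have precision and recall of 0, which would otherwise cause a division by zero.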
  • 24. Classification: Insights
      • Insufficient evidence: some categories are rarely discussed in the tweets, e.g., “layout” and “service”.
      • Complexity of category: even with the same amount of training data, classification for some categories has much lower accuracy.
      • [Chart of #tweets (0 to 200) and precision (0 to 1) per category: Visiting, Jobs, Experience, Checkout, Service, Value, Layout, Quality, People, Pharmacy, Selection]
  • 25. Infrastructure: UI
  • 26. Work in Progress
      • Improve classification quality of brand perceptions
          – Unsupervised methods to uncover unknown perceptions
      • Relating brand perceptions with consumer personalities
      • User studies with marketing professionals
          – How well do the tools support marketing tasks?
  • 27. qCrowd: asking targeted strangers questions
  • 28. Engagement Continuum
  • 29. System Architecture
  • 30. Research Questions
      • Where might this be helpful?
          – Questions about an event that are best answered soon after the event
          – Questions for which there might be a diversity of opinions
          – More?
      • How feasible is this approach?
          – Will people answer questions from strangers?
          – Will use of incentives increase responses?
          – What is the quality of the answers?
  • 31. Test Scenarios: TSA Tracker & Camera Review
      • Crowdsourcing airport security wait times via Twitter
      • Crowdsourcing product reviews via Twitter
          – Ask follow-up questions when someone responds
  • 32. Questions Asked
      • TSA Tracker
          – Without incentive
          – With incentive
      • Camera Reviews
  • 33. Results
  • 34. Follow-up Questions
  • 35. Observations
  • 36. Thank You! Questions?
      • User System and Experience Research (USER), IBM Research – Almaden
      • http://www.almaden.ibm.com/cs/disciplines/user/
      • Jilin Chen, Allen Cypher, Eben Haber, Eser Kandogan, Tessa Lau, Jalal Mahmud, Jeffrey Nichols, Barton Smith, Huahai Yang, Michelle X. Zhou, Tara Mathews

Editor's Notes

  1. G is a must, and E&F are bonus
  2. Visiting, 112; Other Category, 574; Jobs, 187; Experience, 99; Checkout, 135; Value, 19; Service, 17; Layout, 6; Quality, 18; People, 81; Pharmacy, 63; Selection, 50