My talk from Carnegie Mellon's HCII Seminar on April 24, 2013.
Abstract:
On some social media platforms, such as Twitter, YouTube, Pinterest, and Tumblr, much of the content generated by users is publicly accessible, and communication can easily be initiated between strangers. The communities that have risen up around these platforms, particularly on Twitter, can also be inclusive and supportive of interactions between strangers. The public and open nature of these communities creates an opportunity to build a new kind of crowdsourcing system that identifies individuals who may be good candidates to complete various tasks based on their published content. We explore the potential of such a system through several information collection tasks, examining the response rate and information quality that can be obtained. We also explore a means of leveraging users' previous social media content to predict their likelihood of response and optimize our system's collection behavior. At IBM Research - Almaden, we are now looking to extend these ideas to additional domains, including proactive and reactive customer support and precision marketing campaigns.
1. Engaging with Users on Public Social Media
Jeffrey Nichols
IBM Research – Almaden
jwnichols@us.ibm.com
2. IBM Research – Almaden
• 400+ research employees; 100+ students and post-docs
• Research in Computer Science, Storage Systems, Science and Technology, Services Science
• User-Focused Systems in CS
3. The Buzz of the Crowd
People are generating 1+ billion status updates every day.*
Topics covered in status updates are highly diverse:
• Weather, traffic, and other day-to-day annoyances
• Experiences with products
• Reaction to events
How can we leverage this buzz to do something useful?
* 1/2 billion updates every day on Twitter as of October 2012
4. Challenge: The Information Iceberg
Information revealed through status updates is just the tip of the iceberg; much useful information known to members of the social network sits below the surface. GOAL: leverage the information above the surface to reach the information below it.
5. Example #1: Learning more about customer incidents to improve service
• What happened?
• Was it something in particular about this store?
• Could other people have the same experience?
• How can we make things right?
This information could be used to improve the customer experience.
6. Example #2: Tracking crime to improve reporting and better allocate resources
• Where was it stolen?
• Was a report filed with police?
Over time, this information could suggest how to allocate officers or funds to different areas of the city.
7. Example #3: Tracking wait times at airport security checkpoints
• How long did it take to get through security?
This information could be used by the security agency (TSA) to identify problem spots and allocate officers. It could also be used by consumers to plan their air travel.
Note that identifying the people with this information is somewhat indirect: their updates don't say they went through security, only that they are at an airport, which makes it likely they know the wait time.
8. Uses for Engagement on Social Media
The ability to actively identify and engage with the right people at the right time on social media can empower an organization to:
• Collect just-in-time information from users
• Disseminate important information (broadcast or targeted)
• Motivate users to perform a task
• Seize timely business opportunities (e.g., cross- or up-selling)
11. Where might this be helpful?
• Questions that have spatial and/or temporal specificity (e.g., about an event)
• Questions for which there might be a diversity of opinion
• More?
12. Other Advantages
• Information is easier to extract from responses because the question is known
• Sample range can be controlled by asking questions of users with a variety of different profiles
• No waiting needed… questions can be asked in real time
• Potential answerers can be primed with the question before they have the answer
13. How feasible is this approach?
• Will people answer questions from strangers?
• Will use of an incentive increase responses?
• What is the quality of the answers?
14. Concrete Prototype: TSA Tracker
Crowdsourcing airport security wait times through Twitter
Step #1. Watch for people tweeting about being at an airport
Step #2. Ask nicely if they would share their wait time to help others
Step #3. Collect responses and share relevant data on a web site
Step #4. Say thank you!
Key Question: Will people respond to questions from strangers?
http://tsatracker.org/
@tsatracker, @tsatracking
15. Questions
From @tsatracker (includes incentive):
“If you went through security at <airport code>, can you reply with your wait time? Info will be used to help other travelers”
From @tsatracking (no incentive):
“If you went through security at <airport code>, can you reply with your wait time?”
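To make the protocol concrete, here is a minimal Python sketch of steps 1-2 combined with these question templates. It assumes tweets arrive as plain dicts with "user" and "text" fields; the airport list and the word-boundary matching rule are illustrative placeholders, not TSA Tracker's actual logic.

```python
# Minimal sketch of steps 1-2: watch for airport mentions and build the
# wait-time question. Tweets are assumed to be dicts with "text" and
# "user" fields; airport list and matching rule are illustrative only.
import re

AIRPORT_CODES = ["SFO", "SJC", "OAK", "LAX", "JFK"]  # hypothetical subset

INCENTIVE_Q = ("@{user} If you went through security at {code}, can you "
               "reply with your wait time? Info will be used to help "
               "other travelers")
PLAIN_Q = ("@{user} If you went through security at {code}, can you "
           "reply with your wait time?")

def find_airport(text):
    """Return the first airport code mentioned as a standalone word."""
    for code in AIRPORT_CODES:
        if re.search(r"\b" + code + r"\b", text, re.IGNORECASE):
            return code
    return None

def make_question(tweet, incentive=True):
    """Build the question tweet for one candidate user, or None."""
    code = find_airport(tweet["text"])
    if code is None:
        return None
    template = INCENTIVE_Q if incentive else PLAIN_Q
    return template.format(user=tweet["user"], code=code)

# Example: a tweet that would trigger the incentive question.
print(make_question({"user": "traveler1", "text": "Stuck in line at SFO"}))
```

In the real system, the no-incentive variant was sent from the separate @tsatracking account.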
16. Concrete Prototype: Product Reviews
Step 1. Identify owners of a product
Step 2. Ask a focused question about the product
• How is the image quality?
• Does it take good low light pictures?
• How quickly does it take a picture after pressing the shutter button?
• How durable is it?
• What accessories are must-haves?
• Etc.
Steps 3-4. Ask more questions if the user responds (sketched below)
Step 5. Visualize results as a structured product review (future work)
Key Questions:
Will people respond to questions in this different domain?
Will people respond to follow-up questions at the same rate?
Do responses contain useful & accurate information?
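A sketch of the follow-up loop in steps 2-4: track which questions each user has already been asked and pick another when a reply arrives. The question bank is taken from the slide; the per-user bookkeeping and the purely random choice are assumptions standing in for the study's semi-random selection (described on slide 52).

```python
# Hedged sketch of steps 2-4: keep a per-user record of asked questions
# and, when a reply arrives, pick the next unasked one. The selection
# here is plain random; the study's choice was "semi-random" and also
# considered tweet content, which is not modeled in this sketch.
import random

QUESTION_BANK = [
    "How is the image quality?",
    "Does it take good low light pictures?",
    "How quickly does it take a picture after pressing the shutter button?",
    "How durable is it?",
    "What accessories are must haves?",
]

asked = {}  # user handle -> set of questions already asked

def next_question(user):
    """Return a not-yet-asked question for this user, or None when done."""
    remaining = [q for q in QUESTION_BANK
                 if q not in asked.setdefault(user, set())]
    if not remaining:
        return None
    question = random.choice(remaining)
    asked[user].add(question)
    return question

print(next_question("owner42"))  # first question for a hypothetical owner
print(next_question("owner42"))  # a different follow-up on their reply
```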
17. Product Review Scenarios
Samsung Galaxy Tab 10.1
• Popular consumer electronics product at the time of the study (we didn't want to use the iPad)
• Compared to reviews from Amazon.com
L.A.-area Food Trucks
• Vibrant scene in which Twitter is a primary means of communication
• Food trucks usually identified in tweets by @handle
• Compared to reviews from Yelp.com
19. Quality Evaluation Methods
Human Coding
• Twitter responses & Traditional Reviews
• Relevance of response
• Information Types
Information Entropy
• Comparison between Twitter/Amazon, Twitter/Yelp
Mechanical Turk Questionnaire
• Usefulness, Objectiveness, Trustworthiness, Balance, Readability
21. Suspended!
• @tsatracking account (no-incentive condition) given a 1-week suspension after asking 150 questions
• Did not violate Twitter Terms of Use
• Exceeded threshold for blocks or messages marked as spam
• Neither of our other accounts was suspended
22. Results
Key Question: Will people respond to questions from strangers?
Answer:
• 42% response rate
• 44% of answers received in 30 mins
• No significant difference between any conditions (taking the suspension into account)
23. Follow-up Question Results
• Significant differences between all 4 questions (H=50.12, df=3, p < 0.0001, Kruskal-Wallis) and among just the 3 follow-ups (H=25.46, df=2, p < 0.0001, Kruskal-Wallis); see the sketch below
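For readers who want to reproduce this kind of test, here is a sketch with SciPy. The per-question arrays are fabricated placeholders, since the slide shows only the resulting statistics (H, df, p), not the underlying per-response data.

```python
# Hedged sketch of the slide's analysis shape: a Kruskal-Wallis H-test
# across the four questions, and across just the three follow-ups.
# The arrays below are placeholder scores, not the study's data.
from scipy.stats import kruskal

q1 = [5, 3, 4, 5, 2, 4]   # placeholder per-response values for question 1
q2 = [2, 1, 3, 2, 2, 1]
q3 = [4, 4, 5, 3, 4, 5]
q4 = [1, 2, 1, 1, 3, 2]

H_all, p_all = kruskal(q1, q2, q3, q4)   # all 4 questions (df = 3)
H_fup, p_fup = kruskal(q2, q3, q4)       # just the 3 follow-ups (df = 2)
print(f"all: H={H_all:.2f}, p={p_all:.4f}")
print(f"follow-ups: H={H_fup:.2f}, p={p_fup:.4f}")
```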
24. Qualitative Results
• @tsatracker account picked up 16 followers
• Many positive responses (“this will be great for travelers”)
• Only one slightly negative response (“this is creepy”), but that person also gave an answer
26. Information Entropy
The Twitter method is dependent on the questions asked:
• Despite trying to base our questions on the contents of Amazon reviews, those reviews still contained more information.
• Our food truck questions went beyond Yelp reviews.

                                   Tablet            Food Truck
                               Amazon  Twitter      Yelp  Twitter
All Information (bits)*          4.25     3.76      3.27     4.24
Information In Both Sets (bits)* 4.09     3.73      3.27     3.02

* Calculated using a shrinkage entropy estimator
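The slide doesn't name the specific shrinkage estimator. One common choice is the James-Stein shrinkage estimator of Hausser & Strimmer, sketched below; whether the study used exactly this variant is an assumption.

```python
# James-Stein shrinkage estimate of entropy (in bits) from counts of
# observed categories, per Hausser & Strimmer. Whether the study used
# this exact estimator is an assumption; the counts are placeholders.
import numpy as np

def shrinkage_entropy_bits(counts):
    """Entropy estimate that shrinks frequencies toward uniform."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p = len(counts)
    theta_ml = counts / n               # maximum-likelihood frequencies
    target = np.full(p, 1.0 / p)        # shrinkage target: uniform
    denom = (n - 1) * np.sum((target - theta_ml) ** 2)
    lam = 1.0 if denom == 0 else (1.0 - np.sum(theta_ml ** 2)) / denom
    lam = min(1.0, max(0.0, lam))       # clip shrinkage intensity to [0, 1]
    theta = lam * target + (1.0 - lam) * theta_ml
    nz = theta > 0
    return -np.sum(theta[nz] * np.log2(theta[nz]))

# Example: counts of information types observed across a set of reviews.
print(round(shrinkage_entropy_bits([12, 7, 5, 3, 1]), 2))
```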
27. Mechanical Turk Evaluation

Tablet             Amazon  Twitter  Mann-Whitney U      p
Usefulness           3.19     2.64           868.5  0.006
Objectiveness        2.94     2.53           814.5  0.042
Trustworthiness      2.94     2.39           861.0  0.008
Balance              3.00     2.11           936.0  0.001
Readability          2.92     2.61           741.5  0.270

Food Truck           Yelp   Twitter  Mann-Whitney U      p
Usefulness           2.86     2.56           734.0  0.309
Objectiveness        2.17     2.08           672.0  0.783
Trustworthiness      2.58     2.14           800.5  0.071
Balance              2.47     1.72           921.0  0.002
Readability          2.89     2.11           896.0  0.004

Completion Times
• Tablet: 26.5 minutes for Amazon, 25.8 minutes for Twitter
• Food Truck: 19.9 minutes for Yelp, 16.8 minutes for Twitter

Explanation of Results
• Few concrete examples of experiences in Twitter answers
• Limited information about Twitter reviewers
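Each row in the tables above is a Mann-Whitney U comparison between the two sources on one rating dimension. A sketch of one such comparison with SciPy, using placeholder ratings rather than the study's actual Turk responses:

```python
# Hedged sketch of one row of the tables above: a two-sided Mann-Whitney
# U test between Amazon and Twitter usefulness ratings. The rating
# vectors are invented placeholders, not the study's data.
from scipy.stats import mannwhitneyu

amazon_usefulness = [4, 3, 3, 4, 2, 3, 4, 3]    # placeholder Turk ratings
twitter_usefulness = [3, 2, 3, 2, 3, 2, 3, 3]

U, p = mannwhitneyu(amazon_usefulness, twitter_usefulness,
                    alternative="two-sided")
print(f"U={U:.1f}, p={p:.3f}")
```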
28. Conclusions
• Response rates seem to be around 40-45%, independent of domain
• Providing an explanation or incentive does not seem to affect response
• Answer quality is fairly high, at 70-80%
• Quality seems to be tied to targeting accuracy; most “bad” answers come from people who didn't know the answer to our question
32. Problem
It's difficult to identify relevant tweets from keywords alone: keyword filters end up either underspecified or overspecified. Bridging the gap with regular expressions and rules can take hours or days of authoring by a human expert.
33. Use Crowd to Create Intelligent Filters
1. Collect a sample of relevant tweets (keyword filter)
2. Collect ground-truth filter results from the crowd on Mechanical Turk
3. Machine-learn a filter model using SPSS Modeler
4. Use the model to filter tweets in real time
5. Social media dashboard users can react faster and more accurately
Each filter requires a few hours and ~$35 to create (see the sketch below).
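A sketch of steps 2-4, with scikit-learn standing in for SPSS Modeler (a deliberate substitution for illustration, since Modeler is a GUI tool). The crowd-labeled tweets are invented placeholders.

```python
# Sketch of steps 2-4 with scikit-learn in place of SPSS Modeler.
# Crowd workers supply a relevant / not-relevant label per sampled
# tweet; a text classifier then filters the live stream.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Step 2 output: crowd-labeled ground truth (placeholder examples).
tweets = [
    "delta lost my bag again, worst airline",
    "flying delta to see my grandma next week!",
    "the delta of this function is zero",
    "delta customer service kept me on hold for an hour",
]
labels = [1, 1, 0, 1]  # 1 = relevant to Delta customer service

# Step 3: learn the filter model from the labeled sample.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression())
model.fit(tweets, labels)

# Step 4: apply the model to new tweets in real time.
print(model.predict(["delta gate agent was super helpful"]))
```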
34. Evaluation
Scenarios
• Customer service for Delta Airlines & Hertz Rent-a-car
• Relevance filter + Opinion filter
Evaluation Questions
• Quality of Crowd-Labeled Ground Truth
• Effectiveness of Filter Algorithms
• Usefulness to Users in Filtering Tasks
38. Baseline: Human Judgement
How well can humans identify “willingness” and “readiness”?
Two surveys on CrowdFlower:
• Willingness: Asked each participant to predict if a displayed Twitter user would be willing to respond to a given question, assuming that the user has the ability to answer
• Readiness: Asked each participant to predict how soon (e.g., 1 hour, 1 day) the person would respond, assuming that s/he is willing to respond
100 participants for the first survey and 50 for the second
39. Willingness Result
• 29% correct when only a user's tweets were displayed
• 38% correct when the complete Twitter profile was displayed
• Selecting users for question asking is also difficult for the crowd
40. Readiness Result
• 58% correct, compared with the ground truth
• For example, if a participant predicted that person X would respond within an hour, but the response was not received in that time, the prediction was counted as incorrect
41. Features for Machine Selection
• Responsiveness (e.g., mean response time to other users' mentions)
• Profile (e.g., use of particular words in the profile description)
• Activity (e.g., number of tweets)
• Readiness (e.g., percentage of tweets occurring at each hour of the day)
• Personality
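A sketch of how a few of these feature families might be computed from a user's recent tweets. The record fields ("created_at", "in_reply_to") and the reply-ratio responsiveness proxy are assumptions for illustration, not the study's actual schema.

```python
# Hedged sketch of activity, readiness, responsiveness, and profile
# features for one Twitter user. Field names are hypothetical.
from collections import Counter
from datetime import datetime

def user_features(tweets, profile_text):
    """Compute a small feature dict from a user's tweets and bio."""
    hours = Counter(t["created_at"].hour for t in tweets)
    return {
        # Activity: sheer volume of tweeting.
        "num_tweets": len(tweets),
        # Readiness: fraction of tweets in each hour of the day.
        **{f"hour_{h}": hours[h] / len(tweets) for h in range(24)},
        # Responsiveness proxy: share of tweets that are replies.
        "reply_ratio": sum(1 for t in tweets if t["in_reply_to"]) / len(tweets),
        # Profile: presence of particular words in the bio.
        "bio_mentions_travel": int("travel" in profile_text.lower()),
    }

sample = [{"created_at": datetime(2013, 4, 1, 9), "in_reply_to": "someone"},
          {"created_at": datetime(2013, 4, 1, 18), "in_reply_to": None}]
print(user_features(sample, "Frequent flyer and travel blogger")["reply_ratio"])
```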
43. Feature Analysis
Significant Features
• For the TSA-tracker-1 dataset, we found 42 significant features (FDR 2.8%)
• For the Product dataset, we found 31 significant features (FDR 4.2%)
• For the TSA-tracker-2 dataset, we found 11 significant features (FDR 11.2%)
Top-4 Discriminative Features
• The top-4 features were found through extensive experiments.
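The slide reports a false discovery rate per dataset. One standard way to control FDR across many per-feature significance tests is Benjamini-Hochberg correction; whether the study used this exact procedure is an assumption, and the p-values below are placeholders.

```python
# Hedged sketch of FDR control over per-feature p-values using the
# Benjamini-Hochberg procedure via statsmodels. P-values are invented.
from statsmodels.stats.multitest import multipletests

pvals = [0.001, 0.004, 0.019, 0.03, 0.21, 0.47]  # placeholder p-values
significant, p_adjusted, _, _ = multipletests(pvals, alpha=0.05,
                                              method="fdr_bh")
print(list(zip(significant, p_adjusted.round(3))))
```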
45. Live Experiment
Method
• Used Twitter's Search API and a set of rules to find 500 users who mentioned an airport and 500 who mentioned the product
• Randomly asked 100 users for the security wait time
• Used our algorithm to identify 100 users for questioning from the remaining 400 users
47. Engagement Continuum (manual → assisted → automatic)
Manual: humans do all the work
• Keyword filtering
• Unstructured engagement
• Domain-independent analytics
Assisted: analytics streamline decisions (“press button to engage”), as in System U
• Scenario-based filtering
• Smart engagement recommendations (e.g., based on location inference)
• Customizable engagement scenarios
• Domain-specific analytics
Automatic: system-driven engagement
• Rule-based engagement
• Exception identification and notification
• Intelligent transition to human-driven engagement as desired
49. To wrap up…
• Interaction on social media enables a variety of applications
• Collecting information using this approach is feasible and produces quality information
• Targeting can be improved flexibly through crowd-assisted filtering
• Likely responders can be identified from their social media content
52. Samsung Galaxy Tab 10.1
Questions
• 2 iterations
• First-round questions based on CNET and Engadget editor reviews
• Second round modified based on the top 10 user reviews of tablets on Amazon.com (removed a clarification phrase, added some additional content questions)
Procedure
• Identified users from the real-time Twitter stream via keywords followed by manual human inspection
• Questions chosen semi-randomly based on the content of the tweet and the answers received so far
Round #2 Questions
54. Los Angeles Food Trucks
Questions
• Based on our own intuitions about what information would be interesting
Procedure
• Identified users from the real-time Twitter stream via food trucks' @handles followed by manual human inspection
• Asked questions for the 90 LA food trucks active at the time of the study
• Most traffic was concentrated around just three (Kogi Taco, Grilled Cheese, and GrillEmAll), and we report results only for those
56. Example: Real-time Viewer Insight
• Real-time collection of relevant users
• Historical social media builds a comprehensive user profile: rule-based facts (e.g., lives in Chicago, IL; loves Deception on NBC) and deep traits from psycholinguistic analysis
• Directed engagement to learn more: collect opinions about a new show, market a new product, etc.