Some elementary principles and procedures for Facebook data mining, combining the Graph API with OpenRefine for parsing the JSON output. Two beer brands are analyzed with respect to their active fans and engagement.
The second part is dedicated to the interest positioning technique (as pioneered by PerfectCrowd) and to what OutWit Hub can do as a substitute for more sophisticated techniques and apps.
1. Pleasures of basic
Facebook data
shoveling
Jan Fait
STEM/MARK
Guest Lecture at Charles University,
Prague, 4.12.2013
2. Today we are going to talk about :
1. Why
A tiny philosophical
corner
2. How
No programming, just copy
pasting
3. Why would I even try to mine FB data
myself?
The Boring part
The Fun part
Why are we doing
this?
What‘s in it for you?
What are other ways
to do this?
How is it done?
4. What is a facebook like worth for your
business?
5. Here‘s why. Sample questions:
In what ways are my fans like my other customers?
What do I actually know about my fans and followers on top of
their age?
Can I group my followers into segments?
Can I target my followers based on what they (are) like ?
Which ones are creating the most activity?
What on earth are all the other ones doing?
How similar/different is my competitor's fanbase?
6. Built-in insights are fine for fanpage
managers, but not for research
Who could have
guessed..
7. Limitations of FB research?
External validity
Research in social media tells you little about life outside social
media
Facebook self vs. Real self
Sampling
Only some profiles are public > Is there enough data to make
claims about my fanbase?
Organic environment
Network engineers keep changing stuff so you are in constant
need of adjustment
8. OK, but there are other ways..
Bambillion !
Always posted by a lady in her 40s
9. Indeed, there are ways:
Ask professionals and pay them accordingly
(see below)
Setup a social media login or create an app
(a rather good
investment)
Use ready-made tools and solutions
(and pay for the useful ones)
DO IT YOURSELF – PARTISAN STYLE
13. Obstacles ahead
Facebook developers are smart so the road is a
bit thorny
Good tools are usually not free
Open source tools are usually not as good
It's mostly fine legally
14. … but I am not a
technical type.
a) Find someone who is
b) Break it down into little
steps
c) Your chance to stand
out
15. Tools to use
(where facebook meets google and google meets microsoft)
Facebook‘s own Graph API
https://developers.facebook.com/tools/explorer
OpenRefine
http://openrefine.org/download.html
Engineered at Google Inc., formerly named Google Refine
MS Excel / iOS Numbers
Programs > MS Office / ??
17. Subjects to examine
(pick any fanpage or group or event)
https://www.facebook.com/PilsnerUrquellCzech
18. Stand-off
Brand | Pilsner Urquell | Gambrinus
Product | More expensive, high-end beer | Widely and wildly consumed cheaper beer
Image | Quality, tradition, national heritage, craftsmanship | Fun, shared moments, soccer
Number of fans | 204 734 | 47 566
Number of posts in 2013 | 415 | 425
Not really competitors, they have the same mothership!
19. Hypothesis time
H1 : Their active fanbase consists of at least 10% of the total fans
H2 : There is more than 10% overlap in their active fanbase
H3 : Gambrinus and Pilsner Urquell have the same engagement per post
H4 : The interest positioning will show a small affinity as beer is widely appreciated across the population
21. Step 1 - Do not fear the Graph API
https://developers.facebook.com
22. Step 1 - Do not fear the Graph API
https://developers.facebook.com/tools/
23. Step 1 - Do not fear the Graph API
Access_token !
Result window
Fields selector
https://developers.facebook.com/tools/explorer
24. Step 1 – Facebook is nothing but a couple
big tables
https://developers.facebook.com/docs/reference/fql
25. Step 1 – The JSON result format
(JavaScript object notation)
Graph API gives you a
result in JSON Format.
Visually disturbing
yet convenient format
used in web applications.
Wait and see how
OpenRefine handles it..
No, not this Json
26. Step 2 – Making a simple Graph API query
Get the id of the fanpage. There are many ways to do it, e.g.:
1) Click on a page profile pic
2) Look in the address bar and cut the last number before
„type“
146991996743
27. Step 2 – Making a simple Graph API query
1) Get a fresh access_token
Important, otherwise you
will only get a handful
2) And get data from your own timeline
123455687/posts?post_id&limit=50
28. Step 2 – Making a more complex query
1) Repeat with our Gambrinus.cz fanpage
2) And add some more fields – query likes and comments,
increase limit, reduce timespan with a unix timestamp (135..)
146991996743/posts?fields=likes,comments&limit=20000&since=1356998400
(since=1356998400 means from 1.1.2013)
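The `since` value is a Unix timestamp, i.e. seconds counted from 1.1.1970. A minimal sketch of how to get one for any date, assuming midnight UTC is what you want (the helper name `unix_timestamp` is made up for this example):

```python
from datetime import datetime, timezone


def unix_timestamp(year, month, day):
    """Return the Unix timestamp for midnight UTC on the given date."""
    moment = datetime(year, month, day, tzinfo=timezone.utc)
    return int(moment.timestamp())


# The value used in the query above corresponds to 1.1.2013.
since = unix_timestamp(2013, 1, 1)
print(since)
```

Swap in another date to narrow or widen the timespan of the query.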
29. Step 3 – Build a string to post the same
query in browser address bar
A) URL :
https://graph.facebook.com/
B) query :
146991996743/posts?fields=likes,comments&limit=20000&since=1356998400
C) Access token :
&access_token=XXXXXXXXX……and so on
Put together A+B+C :
https://graph.facebook.com/146991996743/posts?fields=likes,comments&limit=20000&since=1356998400&access_token=XXXXX
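If copy-pasting gets tedious, the A+B+C assembly can be sketched in a few lines of Python. This is only an illustration: the function name `build_query_url` and its defaults are assumptions, and XXXXX stands for your own access token.

```python
from urllib.parse import urlencode

GRAPH_URL = "https://graph.facebook.com/"  # part A


def build_query_url(page_id, access_token, fields="likes,comments",
                    limit=20000, since=1356998400):
    """Glue together URL (A), query (B) and access token (C)."""
    params = {
        "fields": fields,
        "limit": limit,
        "since": since,
        "access_token": access_token,
    }
    # safe="," keeps the comma in the field list readable instead of %2C
    return GRAPH_URL + page_id + "/posts?" + urlencode(params, safe=",")


url = build_query_url("146991996743", "XXXXX")
print(url)
```

Changing the first argument to another page id gives the same query for a different fanpage.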
30. Step 4 – Run OpenRefine
1) Run the programme
(it opens in your browser)
2) Select Web Addresses
31. Step 5 – Paste your address into the field
1) Take our query
https://graph.facebook.com/146991996743/posts?fields=likes,comments&limit=20000&since=1356998400&access_token=XXXXXXX
2) Paste here
3) Click next
32. Step 6 – Transform your result
1) Tell the programme that
your result is JSON by
clicking on „JSON Files“
33. Step 7 – Pick an individual node !
This is one „like“ on a post made by user Maggu Ka
34. Step 7 – Behold !
Click on „Create Project“ in the upper left and download data
in Excel Sheet
Be sure this does
not exceed your
„limit“ in the query,
otherwise increase
the limit
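For the curious, here is roughly what OpenRefine does when you pick the "like" node: it turns nested JSON into one row per like. A minimal sketch, assuming the response has a top-level "data" list of posts and each post carries its likes under "likes" > "data" with "id" and "name" keys. The sample response and its ids are made up, and the output is CSV text (which Excel opens) rather than a real Excel sheet.

```python
import csv
import io

COLUMNS = ["post_id", "created_time", "user_id", "user_name"]


def flatten_likes(response):
    """Return one row (dict) per like found in the response."""
    rows = []
    for post in response.get("data", []):
        # Posts without any likes simply have no "likes" key.
        for like in post.get("likes", {}).get("data", []):
            rows.append({
                "post_id": post.get("id"),
                "created_time": post.get("created_time"),
                "user_id": like.get("id"),
                "user_name": like.get("name"),
            })
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample in the assumed shape, not real Facebook data.
sample_response = {
    "data": [
        {"id": "146991996743_1", "created_time": "2013-02-01T10:00:00+0000",
         "likes": {"data": [{"id": "u1", "name": "Fan One"},
                            {"id": "u2", "name": "Fan Two"}]}},
        {"id": "146991996743_2", "created_time": "2013-02-03T18:30:00+0000",
         "likes": {"data": [{"id": "u1", "name": "Fan One"}]}},
        {"id": "146991996743_3", "created_time": "2013-02-05T09:15:00+0000"},
    ]
}

rows = flatten_likes(sample_response)
print(to_csv(rows))
```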
35. Back to Step 3 !
The only thing you need to change is the id – instead of
Gambrinus, now try the Pilsner Urquell id
Don‘t remember?
https://www.youtube.com/watch?v=vUxdB-nl0Bw
36. Analysis
Note : The metrics chosen could be redesigned to reflect other stuff like time and location (sort of)
37. Engagement, like.. ehm, kiwi.. has layers
Skin : All fans
Core : Fans who interact regularly
Inside : Number of fans who interact
Sample question : Has my post attracted anyone outside the usual
bunch of followers who simply like everything?
38. Make crude metrics of those layers
Skin : All fans = 100%
Core : Fans with more than 1 interaction / All fans = 2%
Inside : Unique ids within interactions / All fans = 7%
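The same crude metrics can be sketched in Python instead of Excel. This assumes you already have a flat list with one user id per interaction (one entry per like or comment); the function name and the tiny sample are invented for illustration.

```python
from collections import Counter


def layer_metrics(interactor_ids, total_fans):
    """Compute the 'inside' and 'core' layers as shares of all fans."""
    counts = Counter(interactor_ids)          # interactions per user id
    unique = len(counts)                      # inside: anyone who interacted
    repeated = sum(1 for n in counts.values() if n > 1)  # core: more than 1
    return {
        "unique_interactors": unique,
        "unique_share": unique / total_fans,
        "repeat_interactors": repeated,
        "repeat_share": repeated / total_fans,
    }


# Hypothetical sample: six interactions by three users on a page of 100 fans.
metrics = layer_metrics(["a", "b", "a", "c", "b", "a"], total_fans=100)
print(metrics)
```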
Tip : By messing around with the column named created_time you can see how your core fanbase has been losing and gaining interest in your posts and whether it kept interacting = compute the lifetime of a fan
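One way to read "lifetime of a fan" is the number of days between the first and the last post a fan interacted with. A minimal sketch under that assumption, also assuming created_time comes in the form 2013-02-01T10:00:00+0000 (the sample rows are made up):

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S%z"  # assumed format of created_time


def fan_lifetimes(interactions):
    """Map each user id to the days between first and last interaction.

    `interactions` is an iterable of (user_id, created_time) pairs.
    """
    first_last = {}
    for user_id, created_time in interactions:
        moment = datetime.strptime(created_time, TIME_FORMAT)
        first, last = first_last.get(user_id, (moment, moment))
        first_last[user_id] = (min(first, moment), max(last, moment))
    return {user: (last - first).days
            for user, (first, last) in first_last.items()}


# Hypothetical sample rows.
lifetimes = fan_lifetimes([
    ("u1", "2013-01-10T12:00:00+0000"),
    ("u1", "2013-03-11T12:00:00+0000"),
    ("u2", "2013-05-01T08:00:00+0000"),
])
print(lifetimes)
```

A fan with a single interaction gets a lifetime of 0 days here, which may or may not be what you want to report.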
39. Try it with real Gambrinus fanpage data
All fans : 47 566 = 100%
Core : 575 interactors with more than 1 action = 1.2% (28% of all active fans)
Inside : 2004 unique interactors = 4.2%
Tip : What are these ratios among competitors? Isn't that more important than the widely cited number of fans? Are any of your fans also in the competitor's core fanbase? Uhh, you nasty weasels!
40. And now the Pilsner Urquell
All fans : 204 734 = 100%
Core : 715 interactors with more than 1 action = 0.35% (30% of all active fans)
Inside : 2358 unique interactors = 1%
41. Stand-off revisited. H1 rejected and H2 confirmed
Brand | Pilsner Urquell | Gambrinus
Number of fans | 204 734 | 47 566
Number of posts in 2013 | 415 | 425
Number of active fans in 2013 | 2358 / 1.1% | 2004 / 4.2%
Number of repeated interactions | 715 / 30% of active | 575 / 28% of active
Fanbase overlap | 5% of active
Variations : Share of all interactions created by the TOP 10% fans..
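The fanbase overlap boils down to a set intersection of the two lists of active fan ids. A minimal sketch; which denominator "5% of active" uses is not stated on the slide, so this reports the share relative to each brand separately, and the ids are invented.

```python
def fanbase_overlap(active_a, active_b):
    """Count fans active on both pages and their share of each fanbase."""
    fans_a, fans_b = set(active_a), set(active_b)
    shared = fans_a & fans_b
    return {
        "shared": len(shared),
        "share_of_a": len(shared) / len(fans_a) if fans_a else 0.0,
        "share_of_b": len(shared) / len(fans_b) if fans_b else 0.0,
    }


# Hypothetical ids of active fans on two pages.
overlap = fanbase_overlap(["u1", "u2", "u3", "u4"], ["u3", "u4", "u5"])
print(overlap)
```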
42. How to compute average engagement?
1) You may want to try to query the „insights“ table, but
mostly no success for pages other than yours
2) Else you need all the posts with likes,comments (and
shares) already aggregated
https://graph.facebook.com/fql?q=select post_id, like_info, comment_info, share_info from stream where source_id=146991996743 and created_time>1356998400 and actor_id=146991996743 LIMIT 20000&access_token=XXXXX
3) Paste this query to OpenRefine like previously and work
with Excel sheet from there
Tip : Limit the type by adding type in(46,80,128,247) to the where clause so you don‘t get posts like „group created“
43. Stand-off again. H3 rejected
Brand | Pilsner Urquell | Gambrinus
Average engagement | 248 | 74
Median engagement | 144 | 29
10% top trimmed average | 169 / diff of 79 | 44 / diff of 30
This may look surprising, especially considering the active fanbase is
more or less equal. Seems like the total fanbase does play a role.
Tip : For more precise information, you may want to exclude the top 5% fans to see how much it changes
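The three figures in the table can be sketched in Python once you have one engagement number per post (likes + comments + shares). "Top trimmed" is read here as dropping the top 10% of posts by engagement before averaging, which is an assumption about how the slide's numbers were made; the sample values are invented.

```python
from statistics import mean, median


def top_trimmed_mean(values, trim_share=0.10):
    """Average after dropping the highest `trim_share` of the values."""
    ordered = sorted(values)
    drop = int(len(ordered) * trim_share)     # how many top posts to drop
    kept = ordered[:len(ordered) - drop] if drop else ordered
    return mean(kept)


# Hypothetical engagement per post: nine ordinary posts and one viral hit.
engagement = [10, 20, 30, 40, 50, 60, 70, 80, 90, 1000]

print("average:", mean(engagement))
print("median:", median(engagement))
print("10% top trimmed average:", top_trimmed_mean(engagement))
```

A single viral post drags the plain average far away from the median, which is why the trimmed average is worth reporting next to it.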
44. Study competitor‘s top posts
https://www.facebook.com/PilsnerUrquellCzech/posts/10151304524945974
https://www.facebook.com/Gambrinus.cz/posts/10151581664231744
Tip : Take the URL of the page and add /posts/ and the post id you get from spreadsheet.
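The tip above as a tiny Python helper. It assumes the post ids in your spreadsheet may come in the form "page id_post id" and keeps only the part after the underscore; if your ids are already plain numbers they pass through unchanged. The function name is made up.

```python
def post_url(page_url, post_id):
    """Build a link to a single post from the page URL and a post id."""
    # Assumed id format "<page id>_<post id>": keep only the last part.
    short_id = post_id.split("_")[-1]
    return page_url.rstrip("/") + "/posts/" + short_id


link = post_url("https://www.facebook.com/Gambrinus.cz",
                "146991996743_10151581664231744")
print(link)
```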
45. Some conclusions
Followers have a lifespan: some are zombies, some have left Facebook.
A large group of active followers is superior to a large zombie fanbase => Facebook EdgeRank has buried your posts for those guys anyway.
You can make up metrics once you have the data > sometimes it is better to have the data first.
The Graph API returns errors all the time, so don't be discouraged.
46. Step 4 – Sum it up
The dodgy part : Know more about the fans
47. The fans are well described by their
favorites, likes, interests, ...
48. Facebook ids of fans + Web Scraper
You have the Facebook id of someone => you can visit her profile
You have a web scraper (like OpenRefine) => you can visit all the profiles without actually browsing through them
.. and download whatever the browser sees..
It is against Facebook policies to scrape profile pages en masse, but it's „ok“ as a training exercise.
Pete Warden scraped 200 000 000 FB profiles and they let the lawyers off the leash
http://www.facebook.com/apps/site_scraping_tos_terms.php
49. Step 2 – Preparing data for Outwit Hub
OutWit Hub is a free intelligent scraper (limited amounts of data)
Prepare the links to the Pilsner fans in a Notepad file like below, then use File => Open to open the .txt file in OutWit Hub
http://download.cnet.com/OutWitHub/3000-11745_410846181.html
50. Step 3 – Creating a scraper in Outwit Hub
Prepare a scraper
1) Go to the „scrapers“ tab
2) Click new
3) Name the scraper somehow
4) Do the rest as below
Get everything starting with -- and ending with
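The "marker before / marker after" idea behind such a scraper can be sketched with a regular expression. This is not how OutWit Hub is implemented, just the same principle in Python; the markers and the page snippet below are invented, since the real ones depend on the page you scrape.

```python
import re


def extract_between(text, marker_before, marker_after):
    """Return every piece of text found between the two markers."""
    pattern = re.escape(marker_before) + r"(.*?)" + re.escape(marker_after)
    # DOTALL lets a match span line breaks; strip() tidies the whitespace.
    return [match.strip() for match in re.findall(pattern, text, flags=re.DOTALL)]


# Hypothetical page snippet and markers.
page = '<li class="fav">Beer</li><li class="fav"> Football </li>'
favorites = extract_between(page, '<li class="fav">', "</li>")
print(favorites)
```

If the list comes back empty, the markers are probably not present on the page exactly as typed.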
51. Step 4 – Running the scraper on a couple of links
52. Step 5 – Calculate Affinity
Count occurrences of individual fanpages in the results and compare them to their occurrence in the total Czech Facebook population of 3 770 000
1) Natural affinity = Total fans of the page / 3 770 000
2) Pilsner affinity = Occurrences in results / Fans of Pilsner
3) Affinity ratio = Get the ratio of the two
4) Repeat for all fanpages
5) Bring up those where occurrence is the largest
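The affinity steps above as a worked sketch in Python. The direction of the ratio (Pilsner affinity divided by natural affinity) is an assumption, as are the function name and the numbers in the example. If you scraped only a sample of profiles, the size of that sample is probably the more sensible value for `brand_fans`.

```python
CZECH_FB_POPULATION = 3_770_000  # total Czech Facebook population from the slide


def affinity_ratio(page_fans, occurrences, brand_fans,
                   population=CZECH_FB_POPULATION):
    """How much more common a fanpage is among brand fans than in general."""
    natural_affinity = page_fans / population   # step 1
    brand_affinity = occurrences / brand_fans   # step 2
    return brand_affinity / natural_affinity    # step 3


# Hypothetical fanpage: 377 000 fans overall, found on 50 of 200 scraped profiles.
ratio = affinity_ratio(page_fans=377_000, occurrences=50, brand_fans=200)
print(round(ratio, 2))
```

A ratio above 1 means the fanpage is over-represented among the brand's fans, below 1 means it is under-represented.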
54. Step 6 – Troubleshooting
a) Go to Preferences > Time Settings and make sure none of
the sliders is „in the red“. That would result in frequent
CAPTCHA checks on most protected servers..
b) Make sure your scraper is targeting the right domain
c) Make sure your „Marker Before“ and „Marker After“ are
actually present on the page..
d) It is becoming easier to program an app than to try to scrape a meaningful amount of data
55. Thank you. Now to your questions.
fait@stemmark.cz
www.stemmark.cz
Credits for affinity idea :
Work by Jan Schmid & Josef Šlerka
Images :
Photopin.com
56. Download all materials at :
www.stemmark.cz/downloads/educ/fb_mining.zip
By the way, Mark Zuckerberg likes Pilsner
Urquell.