Part 2 of the "Making Sense of Twitter: Quantitative Analysis Using Twapperkeeper and Other Tools" workshop, presented at the Communities & Technologies 2011 conference, Brisbane, 29 June 2011.
Mapping Online Publics (Part 2)
1. Mapping Online Publics
Axel Bruns / Jean Burgess
ARC Centre of Excellence for Creative Industries and Innovation, Queensland University of Technology
a.bruns@qut.edu.au – @snurb_dot_info / je.burgess@qut.edu.au – @jeanburgess
http://mappingonlinepublics.net – http://cci.edu.au/
2. Gathering Data
Keyword / #hashtag archives:
- Twapperkeeper.com – no longer fully functional
- yourTwapperkeeper – open source solution:
  - Runs on your own server
  - Use our modifications to be able to export CSV / TSV
  - Uses the Twitter streaming API to track keywords, including #hashtags and @mentions
4. Processing Data
Gawk: command-line tool for processing CSV / TSV data
- Can use ready-made scripts for complex processing
- Vol. 1 of our scripts collection now online at MOP
Regular expressions (regex): key tool for working with Gawk
- Powerful way of expressing search patterns
- E.g. @[A-Za-z0-9_]+ = any @username
- See online regex primers...
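As an aside (not part of the workshop materials), the @username pattern from the slide can be tried out in any regex-capable language; a minimal Python sketch:

```python
import re

# The pattern from the slide: "@" followed by one or more
# letters, digits or underscores = any @username.
pattern = re.compile(r"@([A-Za-z0-9_]+)")

tweet = "@one @two hello from @user_3"
print(pattern.findall(tweet))  # → ['one', 'two', 'user_3']
```

The capturing group returns the usernames without the leading "@", which is what the network-extraction script below relies on.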
5. atextractfromtoonly.awk

# atextractfromtoonly.awk - Extract @replies for network visualisation
#
# this script takes a Twapperkeeper CSV/TSV archive of tweets, and reworks it
# into simple network data for visualisation
# the output format for this script is always CSV, to enable import into Gephi
# and other visualisation tools
#
# expected data format:
# text,to_user_id,from_user,id,from_user_id,iso_language_code,source,profile_image_url,geo_type,geo_coordinates_0,geo_coordinates_1,created_at,time
#
# output format:
# from,to
#
# the script extracts @replies from tweets, and creates duplicates where
# multiple @replies are present in the same tweet - e.g. the tweet
# "@one @two hello" from user @user results in @user,@one and @user,@two
#
# Released under Creative Commons (BY, NC, SA) by Axel Bruns - a.bruns@qut.edu.au

BEGIN {
  print "from,to"
}

/@([A-Za-z0-9_]+)/ {
  a = 0
  do {
    match(substr($1, a), /@([A-Za-z0-9_]+)?/, atArray)
    a = a + atArray[1, "start"] + atArray[1, "length"]
    if (atArray[1] != 0)
      print tolower($3) "," tolower(atArray[1])
  } while (atArray[1, "start"] != 0)
}
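For readers without Gawk, the same extraction step can be sketched in Python (illustrative only; the column positions follow the expected data format above, and the function name is mine, not part of the workshop scripts):

```python
import csv
import re
import sys

AT_PATTERN = re.compile(r"@([A-Za-z0-9_]+)")

def extract_edges(rows):
    """Yield (from, to) pairs for every @mention in every tweet.

    A tweet mentioning several users yields one edge per mention,
    mirroring the duplicate-edge behaviour of atextractfromtoonly.awk:
    "@one @two hello" from @user yields (user, one) and (user, two).
    """
    for row in rows:
        text, from_user = row[0], row[2]  # columns per the expected format
        for mention in AT_PATTERN.findall(text):
            yield from_user.lower(), mention.lower()

if __name__ == "__main__":
    # Read the archive on stdin, write "from,to" edges on stdout.
    reader = csv.reader(sys.stdin)
    writer = csv.writer(sys.stdout)
    writer.writerow(["from", "to"])
    writer.writerows(extract_edges(reader))
```

As with the Gawk version, the output is plain CSV that Gephi can import directly as an edge list.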
6. Running Gawk Scripts
Gawk command line execution:
- Open a terminal window
- Run command: #> gawk -F "\t" -f scripts/explodetime.awk input.tsv >output.tsv
Arguments:
- -F "\t" = field separator is a TAB (for CSV input, use -F , instead)
- -f scripts/explodetime.awk = run the explodetime.awk script (adjust the scripts path as required)
7. Basic #hashtag data: most active users
Pivot table in Excel – ‘from_user’ against ‘count of text’
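Outside Excel, the same "most active users" tally can be sketched with Python's standard library (illustrative; the from_user column name follows the archive format described earlier):

```python
import csv
from collections import Counter

def most_active_users(path, top_n=10):
    """Count tweets per sender - the code equivalent of an Excel
    pivot table of 'from_user' against 'count of text'."""
    with open(path, newline="", encoding="utf-8") as f:
        counts = Counter(row["from_user"] for row in csv.DictReader(f))
    return counts.most_common(top_n)
```

`most_common()` returns (user, count) pairs sorted by activity, ready for a quick bar chart.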
8. Identifying Time-Based Patterns
#> gawk -F "\t" -f scripts/explodetime.awk input.tsv >output.tsv
Output: additional time data:
- Original format + year,month,day,hour,minute
Uses:
- Time series per year, month, day, hour, minute
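The effect of explodetime.awk can be illustrated with a small Python sketch that appends year/month/day/hour/minute columns. This assumes the Unix 'time' field (the last column in the expected data format); the actual script may parse the archive's date fields differently:

```python
import datetime

def explode_time(row, time_index=12):
    """Append year, month, day, hour, minute columns derived from the
    archive's Unix 'time' field (column 13 in the expected format) -
    an illustrative stand-in for explodetime.awk."""
    ts = datetime.datetime.fromtimestamp(int(row[time_index]),
                                         tz=datetime.timezone.utc)
    return row + [ts.year, ts.month, ts.day, ts.hour, ts.minute]
```

With these extra columns in place, a pivot table (or groupby) per year, month, day, hour or minute becomes a one-liner.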
9. Basic #hashtag data: activity over time
Pivot table – ‘day’ against ‘count of text’
12. Basic @reply Network Visualisation
Gephi: open source network visualisation tool – Gephi.org
- Frequently updated, growing number of plugins
Load CSV into Gephi:
- Run the ‘Average Degree’ network metric
- Filter for minimum degree / indegree / outdegree
- Adjust node size and node colour settings: e.g. colour = outdegree, size = indegree
- Run a network layout: e.g. ForceAtlas – play with settings as appropriate
14. Tracking Themes (and More) over Time
#> gawk -F "\t" -f multifilter.awk search="term1,term2,..." input.tsv >output.tsv
Term examples:
- (julia|gillard),(tony|abbott)
- .?,@[A-Za-z0-9_]+,RT @[A-Za-z0-9_]+,http
Output: basic match information:
- Original format + term1 match, term2 match, ...
Uses:
- Use on output from explodetime.awk
- Graph occurrences of terms per time period (hour, day, ...)
15. Tracking Themes over Time
Pivot table – ‘day’ against keyword bundles, normalised to 100%
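The multifilter-plus-pivot workflow amounts to counting regex matches per day and normalising each day to 100%. A hedged Python sketch (function and argument names are my own, not from the workshop scripts):

```python
import re
from collections import defaultdict

def theme_shares(tweets, terms):
    """For (day, text) pairs, count tweets matching each regex term per
    day, then normalise each day's counts to percentages - the
    pivot-table step from the slide, done in code."""
    patterns = [re.compile(t, re.IGNORECASE) for t in terms]
    counts = defaultdict(lambda: [0] * len(terms))
    for day, text in tweets:
        for i, p in enumerate(patterns):
            if p.search(text):
                counts[day][i] += 1
    shares = {}
    for day, row in counts.items():
        total = sum(row)
        shares[day] = [100.0 * c / total if total else 0.0 for c in row]
    return shares

tweets = [("2011-06-29", "Julia Gillard speaks"),
          ("2011-06-29", "tony abbott replies"),
          ("2011-06-29", "gillard again")]
print(theme_shares(tweets, [r"(julia|gillard)", r"(tony|abbott)"]))
```

Note the normalisation is per matched tweet, so a tweet matching two terms counts towards both bundles, as with separate multifilter columns.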
16. Dynamic @reply Network Visualisation
Multi-step process:
- Make sure tweets are in ascending chronological order
- Use timeframe.awk to select the period to visualise:
  #> gawk -F , -f timeframe.awk start="2011 01 01 00 00 00" end="2011 01 01 23 59 59" tweets.csv >tweets-1Jan.csv
  start / end = start and end of the period to select (YYYY MM DD HH MM SS)
- Use preparegexfattimeintervals.awk to prepare the data:
  #> gawk -F , -f preparegexfattimeintervals.awk tweets-1Jan.csv >tweets-1Jan-prep.csv
- Use gexfattimeintervals.awk to convert to Gephi’s GEXF format:
  #> gawk -F , -f gexfattimeintervals.awk decaytime="1800" tweets-1Jan-prep.csv >tweets-1Jan.gexf
  decaytime = time in seconds that an @reply remains ‘active’, once made
This may take some time...
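The decaytime idea can be illustrated independently of the Gawk scripts: each @reply edge becomes active at the tweet's timestamp and expires decaytime seconds later, which is what time intervals in a dynamic GEXF file encode. A minimal Python sketch (names are illustrative, not the workshop's):

```python
def edge_intervals(edges, decaytime=1800):
    """Turn (from, to, unix_time) @reply edges into
    (from, to, start, end) intervals: each edge stays 'active' for
    decaytime seconds after the tweet was sent, as in the dynamic
    GEXF export step above."""
    return [(src, dst, t, t + decaytime) for src, dst, t in edges]
```

With decaytime="1800", an @reply sent at time t is visible in Gephi's timeline from t to t+1800, so the animated network shows conversations flaring up and fading out.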