How to create an interactive online report from Twitter data. The data is collected with Socioviz, the network analysis is done with Gephi, and the report itself is built with MS Power BI.
Workshop data is available online on request.
1. TWITTER - SOCIOVIZ - GEPHI - MS POWER BI WORKSHOP
WORKFLOW OF SOCIAL MEDIA ANALYSIS: FROM DATA COLLECTION TO INTERACTIVE ANALYSIS
2. MISSION OF THE WORKSHOP
• Find different clusters in the discussion
  • By hashtag network
  • By user mention network
• Find the opinion leaders in each of those clusters
• Let report users browse the most shared content from these opinion leaders
• Have a look at a possible sentiment analysis workflow
3. WORKFLOW
Data preparation
• We have a look at the Socioviz platform in order to understand the data source
• The data is ready in a zip file
• We create an Excel file for import into Power BI (PBI)
• 15 mins
Network analysis
• We create some network pictures
• We create CSV exports for Excel -> PBI
• 45 mins
Power BI reporting
• We import the data and look at how the data model is created
• We create some pages and visuals
• 45 mins
4. TOOLS
• Socioviz is an online platform for Twitter data collection. It gives us ready-made network files and all the tweets for a given search as an Excel file
• Gephi is an open source network analytics tool. It gives us the visualization of the network and also the metrics for MS Power BI
• MS Power BI is a business analytics platform for data analysis and reporting. Basically one can think of it as an interactive hybrid of Excel and PowerPoint that lives in the 'cloud'
5. SOCIOVIZ DATA SOURCE
• We get the data in a zip file
• All the tweets in Excel 97 (.xls) format -> convert to xlsx
• Top 10 Excel files for some attributes. These are 'nice to know', but we want to create our own report
• Network files for mentions, hashtags, words and emojis. We use the hashtag and mention networks
• Limitation in the data: no tweet-level metrics are present, such as retweets (RT), favorites etc. We can, however, count how many times a certain media item is shared
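Because the export has no retweet or favorite counts, the number of times a media item is shared can serve as a rough popularity signal. A minimal sketch in Python, assuming the tweets have been saved as CSV and that the media link sits in a column called `media_url` (the column name and the sample rows are hypothetical, not the actual Socioviz schema):

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the tweets export; the real
# column names in the Socioviz file may differ.
SAMPLE_TWEETS = """id,user,media_url
1,alice,https://example.com/a.jpg
2,bob,https://example.com/a.jpg
3,carol,
4,dave,https://example.com/b.jpg
5,erin,https://example.com/a.jpg
"""

def count_shared_media(csv_text, column="media_url"):
    """Count how many tweets carry each media URL, skipping empty cells."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = (row.get(column) or "").strip()
        if url:
            counts[url] += 1
    return counts

media_counts = count_shared_media(SAMPLE_TWEETS)
for url, n in media_counts.most_common():
    print(n, url)
```

The same count can of course be produced inside Power BI itself; the sketch only shows what is being counted.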
6. GEPHI TO-DO LIST
Create network images of
• Mentions network
• Hashtags network
Export node data with metrics for both of the above
Identify the clusters, in other words find the set of top nodes for each cluster
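Finding the top nodes of each cluster can also be checked outside Gephi once the node table is exported. A minimal sketch, assuming a CSV export with the columns `Label`, `modularity_class` and `Degree` (the header names depend on which statistics were run in Gephi, so treat them and the sample rows as hypothetical):

```python
import csv
import io
from collections import defaultdict

# Hypothetical node table; a real Gephi export has more columns and
# its header names depend on the statistics that were computed.
SAMPLE_NODES = """Id,Label,modularity_class,Degree
1,@alice,0,12
2,@bob,0,30
3,@carol,1,7
4,@dave,1,22
5,@erin,0,5
6,@frank,1,9
"""

def top_nodes_per_cluster(csv_text, cluster_col="modularity_class",
                          metric_col="Degree", top_n=2):
    """Return {cluster: [(label, metric), ...]} with the top_n nodes per cluster."""
    clusters = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        clusters[row[cluster_col]].append((row["Label"], float(row[metric_col])))
    return {
        cluster: sorted(nodes, key=lambda n: n[1], reverse=True)[:top_n]
        for cluster, nodes in clusters.items()
    }

top = top_nodes_per_cluster(SAMPLE_NODES)
for cluster, nodes in sorted(top.items()):
    print(cluster, nodes)
```

Degree is used here only as an example metric; any other exported metric column could be passed as `metric_col`.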
7. SOCIOVIZ DATA PREPARATION IN EXCEL
• Fix the date column bug
• Add _nodata to empty cells in the attribute columns. This will be handy later in Power Query in Power BI
• Import the Gephi node CSVs into Excel and prepare the cluster identifications, in other words give names to the modularity values of the hashtag and mention networks. This will also be handy later on in MS Power BI
• In the mentions data, create an attribute column for node type, here party, president of the party, media, NGO etc. This is also used later on in PBI
• In the mentions data, create rankings by metrics, in other words each node's position in the top lists by different metrics
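The workshop does these preparation steps by hand in Excel. As an illustration of what they amount to, here is a minimal Python sketch that fills empty cells with `_nodata`, names the modularity values and adds a rank by one metric. The cluster names, column names and rows are hypothetical, and the metric column is assumed to be filled for every node:

```python
# Hypothetical mapping from Gephi modularity values to readable names.
CLUSTER_NAMES = {"0": "Party accounts", "1": "Media accounts"}

# Hypothetical rows from the mentions node table.
nodes = [
    {"Label": "@alice", "modularity_class": "0", "node_type": "party", "Degree": "12"},
    {"Label": "@bob", "modularity_class": "0", "node_type": "", "Degree": "30"},
    {"Label": "@carol", "modularity_class": "1", "node_type": "media", "Degree": "7"},
]

def prepare_nodes(rows, metric="Degree"):
    """Fill empty cells, name the clusters, and add a rank by the given metric."""
    prepared = []
    for row in rows:
        # Replace empty cells with the _nodata placeholder.
        clean = {k: (v if str(v).strip() else "_nodata") for k, v in row.items()}
        # Give the modularity value a human-readable cluster name.
        clean["cluster_name"] = CLUSTER_NAMES.get(row["modularity_class"], "_nodata")
        prepared.append(clean)
    # Rank 1 goes to the node with the highest metric value.
    ordered = sorted(prepared, key=lambda r: float(r[metric]), reverse=True)
    for rank, row in enumerate(ordered, start=1):
        row[f"rank_{metric}"] = rank
    return prepared

prepared_nodes = prepare_nodes(nodes)
for row in prepared_nodes:
    print(row)
```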
8. MS POWER BI TO-DO LIST
• Import data
• Create the data model
• Import some visuals
• Create a filtering page
• Create a tweet content page with drillthrough
• Create a matrix with hashtag clusters on x and user clusters on y. As values we take the IDs from the tweets table -> look at the content
• Create a top items page by clusters (users, hashtags, words?)
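The matrix of hashtag clusters against user clusters is essentially a cross-tabulation of tweet IDs. A minimal sketch of that calculation in Python, assuming the tweets are already joined to their cluster names and that each tweet ID is counted once per cell (the field names, cluster names and the distinct-count aggregation are assumptions for illustration, not the report's actual settings):

```python
from collections import Counter

# Hypothetical tweets, already joined to the cluster names of their
# hashtag and of the mentioned user.
tweets = [
    {"id": 1, "hashtag_cluster": "Economy", "user_cluster": "Party accounts"},
    {"id": 2, "hashtag_cluster": "Economy", "user_cluster": "Party accounts"},
    {"id": 2, "hashtag_cluster": "Economy", "user_cluster": "Party accounts"},
    {"id": 3, "hashtag_cluster": "Climate", "user_cluster": "Media accounts"},
    {"id": 4, "hashtag_cluster": "Economy", "user_cluster": "Media accounts"},
]

def cluster_matrix(rows):
    """Count distinct tweet IDs per (user cluster, hashtag cluster) cell."""
    cells = Counter()
    seen = set()
    for row in rows:
        key = (row["user_cluster"], row["hashtag_cluster"], row["id"])
        if key not in seen:  # count each tweet ID once per cell
            seen.add(key)
            cells[(row["user_cluster"], row["hashtag_cluster"])] += 1
    return cells

matrix = cluster_matrix(tweets)
columns = sorted({hashtag for _, hashtag in matrix})
print("columns:", columns)
for user_cluster in sorted({user for user, _ in matrix}):
    print(user_cluster, [matrix[(user_cluster, h)] for h in columns])
```

Each non-empty cell is then a starting point for looking at the tweets behind it, which is what the drillthrough page is for.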
9. MS POWER BI VISUALS TIP 1
Enable this feature in settings. It will make creating many pages easier
All the visuals automatically filter each other
10. POSSIBLE OUTCOME
The outcome is this Power BI report, which is now published online
There are 6 different pages.
You can go and have a look prior to the workshop