Capstone project presentation from Team 1, Fall 2014. The project used data analytics processes and tools to perform text and network analytics on a sample of Twitter accounts and their publicly available data.
2. Problem & Hypothesis
Problem
In late August 2014, the terrorist group Islamic State of Iraq and Syria (ISIS) increased its use of Twitter to further its organizational agenda and to degrade U.S. competencies and reputation
Hypothesis
Collecting and analyzing ISIS Twitter data can enhance understanding of the group's networks and its tactics, techniques, and procedures (TTPs)
• ISIS Twitter activity correlates with U.S. military efforts to combat the organization
• Network analysis can identify the most influential users and their
communities
3. Data Challenges
Twitter removed known ISIS accounts in early September, prompting a shift in collection strategy
We instead selected accounts associated with self-identified jihadists, using the following criteria:
• English language
• “Active” – posted consistently in the last 30 days
• “Popular” – 200+ followers
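The three selection criteria can be expressed as a single filter. This is a minimal sketch under stated assumptions: the profile keys `lang` and `followers_count`, the function name `meets_criteria`, and the threshold `MIN_ACTIVE_WEEKS` are hypothetical. The slide does not define "consistently," so here it is read as "posted in at least three distinct weeks of the 30-day window."

```python
from datetime import datetime, timedelta, timezone

# Thresholds. MIN_FOLLOWERS comes from the slide ("200+ followers");
# MIN_ACTIVE_WEEKS is a hypothetical reading of "posted consistently".
MIN_FOLLOWERS = 200
ACTIVE_WINDOW = timedelta(days=30)
MIN_ACTIVE_WEEKS = 3


def meets_criteria(profile, tweet_times, now):
    """Return True if an account satisfies the three selection criteria.

    profile: dict with hypothetical keys "lang" and "followers_count".
    tweet_times: timezone-aware datetimes of the account's posts.
    now: timezone-aware datetime used as the reference point.
    """
    # English language
    if profile.get("lang") != "en":
        return False
    # "Popular": 200 or more followers
    if profile.get("followers_count", 0) < MIN_FOLLOWERS:
        return False
    # "Active": posts in at least MIN_ACTIVE_WEEKS distinct 7-day buckets
    # within the last 30 days
    recent = [t for t in tweet_times if timedelta(0) <= now - t <= ACTIVE_WINDOW]
    active_weeks = {(now - t).days // 7 for t in recent}
    return len(active_weeks) >= MIN_ACTIVE_WEEKS
```

For example, an English-language account with 202 followers that posted 1, 8, and 15 days before `now` passes. The same account with a single recent post does not.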
4. Data Collection
• @AbDujanah: "The coward dies a thousand deaths" (*collection error, unable to obtain data)
• @julaybeeeeb: "A very poor servant of Allah longing for his mercy"
• @4bu_Muhaj1r: "Islamic State"
• @AbooJihad2013, @AbuTalha001, @Dawlat_Islam2: "Amongst the Islamists in Sham"; Account Suspended
• @AbuHussain104: "Random British Mujahid Somewhere In The Islamic State"; Account Suspended
• @FarisBritani: "OUR REVOLUTION IS LIKE THE SALAH"
• @jab2victory: "Al-Nusra Fi Bilaad Shaam"
• @onthatpath3
Profile statistics of the nine collected accounts:
Tweets   Followers   Friends   Location
118      202         37        Baqiyaa
459      424         605       dunya
30       749         80        Dowlatul Islam
832      2166        484       Syria
872      984         340       NA
1794     359         84        NA
4470     1559        252       NA
2160     1503        58        Sham
458      1221        51        IS
6. Data Architecture
Identify, collect and prepare data:
1. Identified 10 jihadist Twitter accounts
2. Queried the Twitter API for each account's friend, follower, and status (tweet) data
3. Extracted key data fields
4. Stored in MongoDB
[Figure: data pipeline. Twitter (10 user accounts) → Data Ingestion with Twython and Tweepy (capture profile data) → Data Wrangling (extract metadata) → MongoDB NoSQL database]
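Step 3 ("Extracted key data fields") might look like the sketch below, which flattens one raw status into a record ready for storage. It assumes statuses shaped like Twitter's v1.1 REST API JSON, which is what Twython returns. The chosen fields and the name `extract_fields` are illustrative, not the team's actual schema.

```python
from datetime import datetime

# Format of the v1.1 "created_at" field, e.g. "Mon Sep 01 12:00:00 +0000 2014"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def extract_fields(status):
    """Flatten one raw v1.1-style status dict into an analysis record."""
    user = status.get("user", {})
    entities = status.get("entities", {})
    retweeted = status.get("retweeted_status")
    return {
        "tweet_id": status["id_str"],
        "created_at": datetime.strptime(status["created_at"], TWITTER_TIME_FORMAT),
        "text": status.get("text", ""),
        # Profile data captured alongside each status
        "screen_name": user.get("screen_name"),
        "followers_count": user.get("followers_count"),
        "friends_count": user.get("friends_count"),
        "statuses_count": user.get("statuses_count"),
        "location": user.get("location"),
        # Entities used later for text and network analysis
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
        # A retweet embeds the original tweet; keep its author for retweet counts
        "retweeted_user": retweeted["user"]["screen_name"] if retweeted else None,
    }
```

The resulting list of dicts could then be written to MongoDB in one call, for example with pymongo's `Collection.insert_many`.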
7. Analytic Approach
Network Analysis
Isolate jihadist communities and influencers
Tools
• NetworkX
• Gephi
Approaches
• Louvain Method
• Eigenvector Centrality
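The two network approaches can be illustrated in plain Python. This is a sketch, not the team's code. `modularity` scores a partition with Newman's modularity Q, which is the objective the Louvain method greedily maximizes. It does not implement Louvain's node-moving and aggregation phases. `eigenvector_centrality` uses power iteration on A + I. Adding the identity leaves the leading eigenvector unchanged and avoids oscillation on bipartite graphs. The edge list is assumed to be undirected (for example, follower links with direction ignored).

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs; self-loops are dropped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def modularity(adj, communities):
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2].

    L_c is the number of edges inside community c, d_c is its total degree,
    and m is the number of edges in the graph.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    q = 0.0
    for community in communities:
        nodes = set(community)
        internal = sum(1 for u in nodes for v in adj[u] if v in nodes) / 2
        degree_sum = sum(len(adj[u]) for u in nodes)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


def eigenvector_centrality(adj, iterations=100, tol=1e-9):
    """Power iteration with (A + I), normalized to unit Euclidean length."""
    nodes = list(adj)
    x = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new = {n: x[n] + sum(x[v] for v in adj[n]) for n in nodes}
        norm = math.sqrt(sum(val * val for val in new.values())) or 1.0
        new = {n: val / norm for n, val in new.items()}
        if sum(abs(new[n] - x[n]) for n in nodes) < tol * len(nodes):
            return new
        x = new
    return x
```

In practice NetworkX provides both directly as `nx.community.louvain_communities(G)` and `nx.eigenvector_centrality(G)`. The hand-rolled versions only show what those calls compute.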
Text Analytics
Identify common and similar terms; correlate them with events
Tools
• NLTK
• NumPy
• pandas
• SciPy
• scikit-learn (sklearn)
• Gensim
Approaches
• Frequency Analysis
• TF-IDF Cosine Similarity Matrix
• Trend Analysis (Mann-Kendall test)
• Pearson Correlation
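Frequency analysis and a TF-IDF cosine similarity matrix can be sketched in pure Python. The sketch makes several assumptions. Each document is one user's concatenated tweets. The tokenizer and the short `STOPWORDS` set are illustrative, and NLTK's stopword corpus is much larger. The IDF is the unsmoothed log(N / df), so a term found in every document gets zero weight. scikit-learn's `TfidfVectorizer` smooths IDF by default and would give somewhat different values.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption, far smaller than NLTK's)
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "rt"}


def tokenize(text):
    """Lowercase word tokens with URLs, @mentions, and stopwords removed."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return [w for w in re.findall(r"[a-z']+", text) if w not in STOPWORDS]


def top_terms(docs, n=10):
    """Frequency analysis: the n most common tokens across all documents."""
    return Counter(w for d in docs for w in tokenize(d)).most_common(n)


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (term -> weight): raw tf times log(N / df)."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return [
        {t: c * math.log(n_docs / df[t]) for t, c in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cosine_similarity_matrix(docs):
    """Pairwise document similarity: entry [i][j] compares docs i and j."""
    vecs = tfidf_vectors(docs)
    return [[cosine(a, b) for b in vecs] for a in vecs]
```

High off-diagonal values flag users whose vocabularies overlap on distinctive terms. Words shared by every user, which carry no IDF weight, do not contribute.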
12. Top Retweeted Accounts of the Nine
[Chart: number of retweets (y-axis, 0 to 800) by retweeted user (x-axis); series: the nine accounts, @4bu_Muhaj1r, @AbooJihad2013, @AbuHussain104, @AbuTalha001, @Dawlat_Islam2, @FarisBritani, @jab2victory, @julaybeeeeb, @onthatpath3]
*Chart reflects 2,700 of the 13,080 tweets collected
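The tallies behind a "top retweeted accounts" chart can be produced as in the sketch below. It assumes raw v1.1-style status dicts, where a retweet carries a `retweeted_status` object whose `user` is the original author. The function names are hypothetical. The second function breaks each retweeted user's count down by collecting account, which is the shape a stacked bar chart with one series per account needs.

```python
from collections import Counter, defaultdict


def top_retweeted(statuses, n=10):
    """Most frequently retweeted original authors across all statuses."""
    counts = Counter(
        s["retweeted_status"]["user"]["screen_name"]
        for s in statuses
        if s.get("retweeted_status")
    )
    return counts.most_common(n)


def retweets_by_collector(statuses):
    """Nested counts: retweeted author -> collecting account -> retweets."""
    table = defaultdict(Counter)
    for s in statuses:
        rt = s.get("retweeted_status")
        if rt:  # original (non-retweet) statuses are skipped
            table[rt["user"]["screen_name"]][s["user"]["screen_name"]] += 1
    return table
```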
20. Conclusions
There are two distinct communities of users connected by
@AbuTalha001:
• Users following @FarisBritani and @AbuHussain104, self-proclaimed British foreign fighters in Syria
• Smaller, overlapping clusters following information about Ahrar ash-Sham and al-Nusrah Front
A Syrian Islamist news and translation account, @IbnNabih, was identified as the most influential user in the collective network
21. Conclusions
No statistically significant correlation was found between U.S. airstrikes and the volume of Twitter activity
• One user, @Dawlat_Islam2, showed a positive correlation between an upward trend in “key words” and airstrikes
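The two tests behind these findings can be sketched without external libraries. The inputs are assumptions: equal-length daily series, such as a user's daily keyword count and the daily number of U.S. airstrikes. `mann_kendall` computes the S statistic and the normal-approximation Z and two-sided p-value, with no correction for tied values. `pearson` returns only the coefficient r. For p-values on real data, SciPy's `scipy.stats.pearsonr` and `scipy.stats.kendalltau` are the usual choices.

```python
import math


def mann_kendall(series):
    """Mann-Kendall trend test without tie correction.

    Returns (S, Z, p): S > 0 suggests an upward trend; p is two-sided.
    """
    n = len(series)
    # S = sum of signs of all later-minus-earlier differences
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity correction toward zero
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return s, z, p


def pearson(x, y):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / (sx * sy)
```

A keyword series with a significant Mann-Kendall trend (small p, S > 0) whose daily values also have a positive r against airstrike counts would match the pattern reported for one user.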
Very few tactical or threatening words or statements were found
Two primary categories for user statuses:
• Religious statements, top 10 words for each user included
“muslim(s)” and “allah”
• General news and updates about Islamist fighting in Syria
22. Lessons Learned & Next Steps
More data is needed; iteratively collecting from users discovered in our network would improve results
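Iterating collection outward from the seed accounts amounts to snowball sampling, a breadth-first traversal with depth and size limits. This is a sketch under assumptions. `get_neighbors` is a hypothetical caller-supplied function that returns the accounts linked to a given account, for example from stored friend and follower data. The default limits are arbitrary.

```python
from collections import deque


def snowball_sample(seeds, get_neighbors, max_depth=2, max_accounts=500):
    """Breadth-first snowball sample starting from the seed accounts.

    Returns accounts in the order they would be collected. Accounts more
    than max_depth hops from every seed are never reached.
    """
    seen = set(seeds)
    queue = deque((s, 0) for s in seeds)
    order = []
    while queue and len(order) < max_accounts:
        account, depth = queue.popleft()
        order.append(account)
        if depth == max_depth:
            continue  # collect this account but do not expand further
        for nbr in get_neighbors(account):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, depth + 1))
    return order
```

Breadth-first order means that if the size cap is hit, the accounts kept are those closest to the original seeds.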
Traditional text analytics methods were of limited effectiveness on Twitter data
Several other opportunities for analysis:
• Studying the way users interact
• The posting of original content vs. retweets (removal of
media statements)
• Collecting and analyzing text data of each community