What’s big data?

Some examples

Problems

A glimpse in the kitchen

Questions?

#bigdata in Communication Science
Some examples from research
by me and my students
Damian Trilling
d.c.trilling@uva.nl
@damian0604
www.damiantrilling.net
Afdeling Communicatiewetenschap
Universiteit van Amsterdam

October 2013
#bigdata

Damian Trilling
1 What’s big data?
2 Some examples
  • Rare events
  • Tone in tweets
  • Counting words and n-grams
  • Network analysis
3 Problems
4 A glimpse in the kitchen
5 Questions?

What’s big data?

No definition, but . . .
• Existing data
• Too big to code manually
• Sometimes also too big to handle with normal tools
• New research questions
• Call to revisit the relationship between theory and empirical research

Some sources
• Social Network Sites
• RSS feeds
• Databases
• Scraping text from the web
• ...

It’s out there!
You only have to collect it.

Some examples

Rare events

A recent master’s thesis

Imagine you want to analyze some very rare content.
Normal sampling won’t work, that’s for sure: a random sample would contain hardly any relevant cases.

So you’d better collect everything first

Getting all news coverage from Dutch news sites
We collected all articles from nine news sites over a period of
two months, resulting in a database of 74,000 articles.
In a second step, we filtered for articles containing specific
keywords. Those 292 articles were then manually coded.

Pöll, B. (2013). Social media: new sources, new profession? A content analysis of the use of social media as a source for journalists in online news articles. Master Thesis, Universiteit van Amsterdam.
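
The filtering step can be sketched in a few lines of Python. This is only a minimal sketch, not the script used in the thesis: the keywords, the sample articles, the dict format, and the function name filter_articles are all assumptions made for illustration.

```python
import re

def filter_articles(articles, keywords):
    """Return the articles whose text mentions at least one keyword.

    `articles` is a list of dicts with a "text" key (hypothetical format).
    Matching is case-insensitive and on whole words only.
    """
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [a for a in articles if pattern.search(a["text"])]

# Made-up example data; the real keyword list is not given in the slides.
articles = [
    {"id": 1, "text": "De minister schreef op Twitter dat ..."},
    {"id": 2, "text": "Het weer blijft de komende dagen zacht."},
    {"id": 3, "text": "Volgens een bericht op Facebook ..."},
]
keywords = ["twitter", "facebook"]

selected = filter_articles(articles, keywords)
print([a["id"] for a in selected])
```

Only the articles that survive this filter need to be coded by hand, which is what makes the collect-everything approach feasible for rare content.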

It’s just one line of code!

urls.txt
http://www.gmx.at/themen/wissen/mensch/108g5xi-baeuerlich-schiefe-zaehne
http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/408g740-fuermannbittet-um-verzeihung
http://www.gmx.at/themen/nachrichten/aufruhr-arabien/268g70u-regierungwill-zuruecktreten
http://www.gmx.at/themen/nachrichten/panorama/828g54y-neues-zur-klagegegen-republik
http://www.gmx.at/themen/nachrichten/panorama/968g72s-millionstrafewegen-oelpest
http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/368g6yc-keinbabybauch-nur-fast-food
...

wget command
wget -i urls.txt
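
If you would rather stay inside a script, roughly the same thing can be done with Python’s standard library. This is a minimal sketch under assumptions: the function name download_all, the output directory, and the numbered file names are made up here (wget itself derives file names from the URLs).

```python
import pathlib
import urllib.request

def download_all(url_file, out_dir):
    """Fetch every URL listed in `url_file` (one per line) into `out_dir`.

    Files are saved as 0000.html, 0001.html, ... (a hypothetical naming
    scheme). Returns the list of paths written.
    """
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    lines = pathlib.Path(url_file).read_text(encoding="utf-8").splitlines()
    urls = [line.strip() for line in lines if line.strip()]
    written = []
    for i, url in enumerate(urls):
        target = out / f"{i:04d}.html"
        # urlopen raises an exception on network or HTTP errors;
        # a real crawler would catch these and retry or log them.
        with urllib.request.urlopen(url) as response:
            target.write_bytes(response.read())
        written.append(target)
    return written

# Example call (assumes a file urls.txt in the working directory):
# download_all("urls.txt", "articles")
```

For a one-off collection the wget one-liner is hard to beat; a script like this mainly pays off when downloading is only the first step of a longer pipeline.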

Tone in tweets

A recent bachelor’s thesis

Imagine you want to know something about someone’s behavior on
Twitter, or how a specific topic is discussed there.
Do you really want to go through thousands of tweets by hand?

So you’d better think about automating your coding
Finding out how negative or positive politicians are towards
their opponents
We took lists of positive and negative words and a list of each
politician’s opponents.
We used a Python script to check which type of words were used
to refer to opponents.
For further analysis, the results were imported into SPSS.

Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam.
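
A dictionary-based approach of this kind might look roughly as follows. This is a sketch under assumptions, not the script from the thesis: the word lists, the example tweets, the function name code_tweet, and the column names are all invented for illustration.

```python
import csv
import io
import re

# Tiny made-up word lists; the thesis used full lists that are not shown here.
POSITIVE = {"great", "strong", "proud"}
NEGATIVE = {"failed", "weak", "wrong"}

def code_tweet(text, opponents):
    """Count positive and negative words in a tweet that mentions an opponent.

    Returns None if no opponent is mentioned. Opponent names are assumed
    to be single words. This is a simple bag-of-words approach: it
    ignores negation and sarcasm.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not any(name.lower() in tokens for name in opponents):
        return None
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return {"positive": pos, "negative": neg, "tone": pos - neg}

# Made-up example tweets.
tweets = [
    "Romney has the wrong plan and a weak record",
    "Proud of our great volunteers today",
]
rows = []
for i, tweet in enumerate(tweets):
    result = code_tweet(tweet, opponents=["Romney"])
    if result is not None:
        rows.append({"tweet_id": i, **result})

# Write CSV text that a statistics package such as SPSS can import.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["tweet_id", "positive", "negative", "tone"]
)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Writing the scores to a CSV file keeps the automated coding and the statistical analysis separate, so the second step can be done in whatever package you already know.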

Counting words and n-grams

How often are specific expressions used?

Imagine you want to know which words or expressions dominate a
discourse.
There are plenty of ways to get an answer within minutes;
here’s one:

Again, just one or two lines of code!

For example, with Stata
• Install the package wordscores (net install http://www.tcd.ie/Political_Science/wordscores/wordscores)
• For word counts: wordfreq /home/dami/texts/lab92.txt /home/dami/texts/lab97.txt
• For n-grams (trigrams in this case): phrasefreq 3 lab92.txt lab97.txt
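
The same kind of count takes only a few lines in Python as well. The following is a minimal sketch, not a replacement for the Stata package: the function name ngram_counts and the sample text are assumptions, and the tokenization is deliberately naive.

```python
import re
from collections import Counter

def ngram_counts(text, n):
    """Count all n-grams (runs of n consecutive words) in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    # Zip the token list with shifted copies of itself to get n-grams.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in ngrams)

# Made-up sample text; in practice you would read your own files.
text = "yes we can. yes we can change. we can change the world."

print(ngram_counts(text, 1).most_common(2))  # most frequent words
print(ngram_counts(text, 3).most_common(2))  # most frequent trigrams
```

Note that this sketch lets n-grams run across sentence boundaries; whether that is acceptable depends on the research question.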

Figure: trigrams in Obama tweets

Network analysis

Another approach

Imagine you want to know who talks to whom and how networks
are interconnected.
Use a tool like NodeXL or Gephi!
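
Before a network tool can draw anything, the tweets have to be turned into a list of who mentions whom. The sketch below shows one way to do that; the function name mention_edges, the (sender, text) input format, and the example tweets are assumptions. The Source/Target/Weight column names follow the edge-list layout that Gephi’s spreadsheet import commonly expects, but check the documentation of the tool and version you use.

```python
import csv
import io
import re
from collections import Counter

def mention_edges(tweets):
    """Build weighted (sender, mentioned user) edges from tweets.

    `tweets` is a list of (sender, text) pairs (hypothetical format).
    """
    edges = Counter()
    for sender, text in tweets:
        for mentioned in re.findall(r"@(\w+)", text):
            edges[(sender.lower(), mentioned.lower())] += 1
    return edges

# Made-up example tweets.
tweets = [
    ("alice", "@bob did you watch the debate?"),
    ("bob", "@alice yes, and so did @carol"),
    ("alice", "@bob interesting!"),
]
edges = mention_edges(tweets)

# Write an edge list with Source, Target, and Weight columns.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Source", "Target", "Weight"])
for (source, target), weight in sorted(edges.items()):
    writer.writerow([source, target, weight])
print(buffer.getvalue())
```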

Problems

You sometimes depend entirely on commercial parties
• Services can shut down (Google Reader) or change their API (Twitter)
• It’s rather easy to get (up to 3200) tweets from a specific user (e.g., allmytweets.net), but if you want to capture a #hashtag, you have to record it live
• Twitter doesn’t give you all tweets, but just about 1% (+ a bunch of other limits)

Not sure if this is a problem or a great opportunity . . .
You cannot rely (only) on ready-made software but should get ready
to use tools like bash scripts, grep, Python, . . . (Which can be fun!)

A glimpse in the kitchen

What I’m doing right now

Analyzing #tvduell
• 570,000 tweets
• Identifying clusters of nouns, verbs and adjectives
• Assigning positivity and negativity scores to tweets
• Seeing whether they can be interpreted as frames

⇒ How are Merkel and Steinbrück framed on the second screen
during the debate?

Something you can use?
1 What’s big data?
2 Some examples
  • Rare events
  • Tone in tweets
  • Counting words and n-grams
  • Network analysis
3 Problems
4 A glimpse in the kitchen
5 Questions?

Questions or comments?

Damian Trilling
d.c.trilling@uva.nl
@damian0604
www.damiantrilling.net
Understanding Big Data with Examples

  • 1. What’s big data? Some examples Problems A glimpse in the kitchen Questions? #bigdata in Communication Science Some examples from research by me and my students Damian Trilling d.c.trilling@uva.nl @damian0604 www.damiantrilling.net Afdeling Communicatiewetenschap Universiteit van Amsterdam October 2013 #bigdata Damian Trilling
  • 2. 1 What’s big data? 2 Some examples (Rare events; Tone in tweets; Counting words and n-grams; Network analysis) 3 Problems 4 A glimpse in the kitchen 5 Questions?
  • 3. What’s big data?
  • 4. What’s big data? No definition, but ...
      • Existing data
      • Too big to code manually
      • Sometimes also too big to handle with normal tools
      • New research questions
      • Call to revisit the relationship between theory and empirical research
  • 5. What’s big data? Some sources
      • Social network sites
      • RSS feeds
      • Databases
      • Scraping text from the web
      • ...
  • 6. It’s out there! You only have to collect it.
  • 7. Some examples
  • 8-10. Rare events (a recent master thesis): Imagine you want to analyze some very rare content. Normal sampling won’t work, that’s for sure.
  • 11-13. Rare events: So you’d better collect everything first. Getting all news coverage from Dutch news sites: We collected all articles from nine news sites during a period of two months, resulting in a database with 74,000 articles. In a second step, we selected the articles containing specific keywords. Those 292 articles were then manually coded. (Pöll, B. (2013). Social media: new sources, new profession? A content analysis of the use of social media as a source for journalists in online news articles. Master Thesis, Universiteit van Amsterdam.)
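The two-step procedure on this slide (collect everything, then keep only the articles that contain certain keywords) could be sketched in Python roughly as follows. The keyword list and the sample articles are invented for illustration; the slides do not say which keywords the thesis used or how the database was stored.

```python
import re

# Hypothetical keywords; the slides do not list the ones used in the thesis.
KEYWORDS = ["twitter", "facebook", "youtube"]


def build_pattern(keywords):
    """Compile one case-insensitive regex matching any keyword as a whole word."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def filter_articles(articles, keywords):
    """Return only the articles whose text mentions at least one keyword."""
    pattern = build_pattern(keywords)
    return [a for a in articles if pattern.search(a["text"])]


# Toy stand-in for the article database (invented examples).
articles = [
    {"id": 1, "text": "The minister announced the plan on Twitter this morning."},
    {"id": 2, "text": "Heavy rain is expected in the south."},
    {"id": 3, "text": "A video on YouTube showed the incident."},
]

selected = filter_articles(articles, KEYWORDS)
print([a["id"] for a in selected])  # ids of the articles left for manual coding
```

Only the filtered subset then needs to be coded by hand, which is what makes a rare-event study feasible on a large collection.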
  • 15. Rare events: It’s just one line of code!
      urls.txt:
        http://www.gmx.at/themen/wissen/mensch/108g5xi-baeuerlich-schiefe-zaehne
        http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/408g740-fuermannbittet-um-verzeihung
        http://www.gmx.at/themen/nachrichten/aufruhr-arabien/268g70u-regierungwill-zuruecktreten
        http://www.gmx.at/themen/nachrichten/panorama/828g54y-neues-zur-klagegegen-republik
        http://www.gmx.at/themen/nachrichten/panorama/968g72s-millionstrafewegen-oelpest
        http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/368g6yc-keinbabybauch-nur-fast-food
        ...
      wget command:
        wget -i urls.txt
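For readers who would rather stay inside Python than call wget, a rough equivalent of reading a URL list and saving each page might look like the sketch below. The function names, the `articles` target directory, and the injectable `fetch` parameter are assumptions made for this example, not part of the original slides; wget itself does considerably more (retries, redirect handling, name clashes).

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def read_urls(text):
    """Return the non-empty lines of a URL list, stripped of whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def local_name(url):
    """Derive a file name from the last path segment of the URL."""
    name = Path(urlparse(url).path).name
    return name or "index.html"


def fetch_bytes(url):
    """Download one URL and return the raw response body."""
    with urlopen(url) as response:
        return response.read()


def download_all(urls, target_dir, fetch=fetch_bytes):
    """Save every URL into target_dir; `fetch` can be replaced for offline testing."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        path = target / local_name(url)
        path.write_bytes(fetch(url))
        saved.append(path)
    return saved


# Usage, assuming urls.txt is in the working directory as on the slide:
#   urls = read_urls(Path("urls.txt").read_text(encoding="utf-8"))
#   download_all(urls, "articles")
```

Passing a custom `fetch` function makes it possible to try the file handling without touching the network.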
  • 16-18. Tone in tweets (a recent bachelor thesis): Imagine you want to know something about someone’s behavior on Twitter, or how a specific topic is discussed on Twitter. Do you really want to go through thousands of tweets by hand?
  • 19-22. Tone in tweets: So you’d better think about automating your coding. Finding out how negative or positive politicians are towards their opponents: We took lists of positive and negative words and of each politician’s opponents. We used a Python script to check which type of words were used to refer to opponents. For further analysis, the results were imported into SPSS. (Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam.)
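A dictionary-based coding script of the kind described here might look roughly like this. It is a sketch under assumptions, not the script used in the thesis: the word lists, opponent names, and example tweets are invented, and it simply counts tone words in tweets that mention an opponent rather than checking whether each word actually refers to that opponent.

```python
import csv
import io
import re

# Hypothetical word lists and opponent names, invented for illustration.
POSITIVE = {"good", "great", "strong"}
NEGATIVE = {"bad", "wrong", "failed"}
OPPONENTS = {"smith", "jones"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def code_tweet(text):
    """Code one tweet: does it mention an opponent, and how many tone words occur?"""
    tokens = tokenize(text)
    return {
        "mentions_opponent": int(any(t in OPPONENTS for t in tokens)),
        "positive": sum(t in POSITIVE for t in tokens),
        "negative": sum(t in NEGATIVE for t in tokens),
    }


def to_csv(rows):
    """Serialize the coded rows as CSV text, e.g. for import into a statistics package."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["mentions_opponent", "positive", "negative"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


tweets = [
    "Smith failed again, wrong on taxes.",
    "Great day in Ohio!",
    "Jones made a good point.",
]
rows = [code_tweet(t) for t in tweets]
print(to_csv(rows))
```

The CSV output has one row per tweet, which is a convenient shape for further analysis in a package such as SPSS.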
  • 25-27. Counting words and n-grams (how often are specific expressions used?): Imagine you want to know which words or expressions dominate a discourse. There are plenty of ways to get an answer within minutes; here’s one:
  • 28. Counting words and n-grams: Again, just one or two lines of code! For example with Stata:
      • Install the wordscores package (net install http://www.tcd.ie/Political_Science/wordscores/wordscores)
      • For word counts: wordfreq /home/dami/texts/lab92.txt /home/dami/texts/lab97.txt
      • For n-grams (trigrams in this case): phrasefreq 3 lab92.txt lab97.txt
  • 29. Counting words and n-grams: [Figure: trigrams in Obama tweets]
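The same kind of word and n-gram count can also be sketched with Python's standard library alone. This is an illustrative alternative to the Stata commands, not what the slides used, and the three example texts are invented.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(texts, n):
    """Count n-grams over a collection of texts (n=1 gives plain word counts)."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(tokenize(text), n))
    return counts


# Invented example texts.
texts = ["We can do it together", "Yes we can do it", "We can do more"]

word_counts = count_ngrams(texts, 1)
trigram_counts = count_ngrams(texts, 3)
print(trigram_counts.most_common(1))  # [(('we', 'can', 'do'), 3)]
```

N-grams are counted within each text separately, so no trigram spans the boundary between two tweets or documents.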
  • 30-32. Network analysis (another approach): Imagine you want to know who talks to whom and how networks are interconnected. Use a tool like NodeXL or Gephi!
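Before a network tool can draw anything, the "who talks to whom" relation has to be turned into an edge list. A minimal sketch, assuming tweets are available as (author, text) pairs and that @-mentions are treated as directed ties, could look like this. The example tweets are invented, and the Source/Target/Weight column layout is an assumption about what the chosen tool expects; check its import documentation.

```python
import csv
import io
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Count directed edges (author -> mentioned user) across all tweets."""
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            if target.lower() != author.lower():  # ignore self-mentions
                edges[(author.lower(), target.lower())] += 1
    return edges


def edges_to_csv(edges):
    """Write the weighted edge list as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# Invented example tweets as (author, text) pairs.
tweets = [
    ("anna", "@Ben did you see this?"),
    ("ben", "@anna yes! cc @carl"),
    ("anna", "@ben thanks"),
]

edges = mention_edges(tweets)
print(edges_to_csv(edges))
```

The weight column records how often one user mentioned another, which can later be mapped to edge thickness in the visualization.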
  • 34-38. Problems: You sometimes depend entirely on commercial parties
      • Services can shut down (Google Reader) or change their API (Twitter)
      • It’s rather easy to get (up to 3200) tweets from a specific user (e.g., allmytweets.net), but if you want to capture a #hashtag, you have to record it live
      • Twitter doesn’t give you all tweets, but just about 1% (+ a bunch of other limits)
  • 39. Problems: Not sure if this is a problem or a great opportunity... You cannot rely (only) on ready-made software but should get ready to use tools like bash scripts, grep, Python, ... (Which can be fun!)
  • 40. A glimpse in the kitchen
  • 41. What I’m doing right now: Analyzing #tvduell
      • 570,000 tweets
      • Identifying clusters of nouns, verbs and adjectives
      • Assigning positivity and negativity scores to tweets
      • Seeing if they can be interpreted as frames
      ⇒ How are Merkel and Steinbrück framed on the second screen during the debate?
  • 43. Something you can use? 1 What’s big data? 2 Some examples (Rare events; Tone in tweets; Counting words and n-grams; Network analysis) 3 Problems 4 A glimpse in the kitchen 5 Questions?
  • 44. Questions or comments? Damian Trilling, d.c.trilling@uva.nl, @damian0604, www.damiantrilling.net