3. User-based Collaborative Filtering
Predicts a target user's rating for an item
based on the rating tendencies of similar users:

$$\mathrm{pred}(u, i) = \bar{r}_u + \frac{\sum_{v \in N} \mathrm{sim}(u, v) \cdot (r_{v,i} - \bar{r}_v)}{\sum_{v \in N} \mathrm{sim}(u, v)}$$

where N is the set of users most similar to u (the neighbors), r_{v,i} is user v's rating of item i, and \bar{r}_v is v's average rating.
|       | Item5 | sim  | Average rating |
|-------|-------|------|----------------|
| Alice | ?     | 1    | 4              |
| User1 | 3     | 0.85 | 2.4            |
| User2 | 5     | 0.71 | 3.8            |

(User1 and User2 are the similar users.)
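As a worked check of the prediction formula, here is the Alice/Item5 computation from the table above, sketched in Python:

```python
# User-based CF prediction of Alice's unknown Item5 score,
# using the similarity and average-rating values from the table above.
neighbors = [
    # (sim to Alice, rating of Item5, neighbor's average rating)
    (0.85, 3, 2.4),  # User1
    (0.71, 5, 3.8),  # User2
]
alice_avg = 4  # Alice's average rating

# pred(Alice, Item5) = Alice's average + weighted average of neighbors' deviations
num = sum(sim * (r - avg) for sim, r, avg in neighbors)
den = sum(sim for sim, _, _ in neighbors)
pred = alice_avg + num / den
print(round(pred, 2))  # 4.87
```

Note that each neighbor contributes its deviation from its own average rating, which compensates for users who rate systematically high or low.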
4. Idea about Item-based Collaborative Filtering
|       | Item1 | Item2 | Item3 | Item4 | Item5 |
|-------|-------|-------|-------|-------|-------|
| Alice | 5     | 3     | 4     | 4     | ?     |
| User1 | 3     | 1     | 2     | 3     | 3     |
| User2 | 4     | 3     | 4     | 3     | 5     |
| User3 | 3     | 3     | 1     | 5     | 4     |
| User4 | 1     | 5     | 5     | 2     | 1     |

Predicts unknown scores
based on the rating tendencies of similar items
(item columns that resemble the Item5 column).
$$\mathrm{pred}(u, i) = \frac{\sum_{j \in S} \mathrm{sim}(i, j) \cdot r_{u,j}}{\sum_{j \in S} \mathrm{sim}(i, j)}$$

where S is the set of items similar to i that user u has rated, and r_{u,j} is u's rating of item j.
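The item-based formula can be sketched the same way. The similarity values below are hypothetical stand-ins; in practice sim(Item5, j) would be computed from the item columns of the rating matrix (e.g. with adjusted cosine similarity):

```python
# Item-based CF prediction of Alice's unknown Item5 score.
# The similarity values are hypothetical illustration values, not computed ones.
alice_ratings = {"Item1": 5, "Item4": 4}       # Alice's ratings for items similar to Item5
sims_to_item5 = {"Item1": 0.9, "Item4": 0.6}   # hypothetical sim(Item5, j)

# Weighted average of Alice's own ratings for the similar items
num = sum(sims_to_item5[j] * alice_ratings[j] for j in sims_to_item5)
den = sum(sims_to_item5.values())
pred = num / den
print(round(pred, 2))  # (0.9*5 + 0.6*4) / 1.5 = 4.6
```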
5. Problems on CF approaches (1/3)
Image reference: https://rafalab.github.io/dsbook/recommendation-systems.html
Real data is quite sparse!
Even on large e-commerce sites, there is little overlap
between user vectors (and item vectors).
6. Problems on CF approaches (2/3)
The curse of dimensionality
• In high-dimensional spaces, similarity measures become hard to handle
• Item/user vectors usually have very high dimensionality
(because the rating matrix is very large)
7. Problems on CF approaches (3/3)
High computational cost
• The rating matrix is processed directly every time the system finds
similar users/items and makes predictions
• CF approaches do not scale to most real-world scenarios
[Diagram: for user 1, the full rating matrix is processed once to compute the similar users, and again to compute the suggested items; the same two computations are then repeated for user 2, and so on for every user.]
8. Recent approaches for recommender systems
Model-based approach
- Based on offline pre-processing
- At run-time, only the learned model is used for rating prediction
- Learned models can be updated
[Diagram: the rating matrix is processed offline to build a learned model; at run time, only the model is used (online computation) to produce suggested items for a user.]
9. Memory-based approach vs. model-based approach
Memory-based approach:
- User-based CF
- Item-based CF

Model-based approach:
- Matrix factorization
- Association rule mining
- Probabilistic models
- Other ML techniques
11. Basic idea 1 (1/2)

Assumes that latent factors exist in users/items

| User  | Godfather | Terminator | Money game | Titanic | Back to the future | … | X-men |
|-------|-----------|------------|------------|---------|--------------------|---|-------|
| Alice | 5         | 1          | 4          | 4       | 3                  | … | 2     |

"I don't like horror… Sci-Fis often move me. I love humane and dramatic movies!"

Latent factors (which cannot be observed):

| User  | Horror | Sci-Fi | Humanity | Drama | … |
|-------|--------|--------|----------|-------|---|
| Alice | 0.3    | 2.1    | 6.1      | 4.7   | … |
12. Basic idea 1 (2/2)

Assumes that latent factors exist in users/items
(Image from Amazon.com)

| User  | Godfather | Terminator | Money game | Titanic | Back to the future | … | X-men |
|-------|-----------|------------|------------|---------|--------------------|---|-------|
| Alice | 5         | 1          | 4          | 4       | 3                  | … | 2     |

Latent factors (which cannot be observed):

| Item      | Horror | Sci-Fi | Humanity | Drama | … |
|-----------|--------|--------|----------|-------|---|
| Godfather | 2.3    | 0.4    | 4.9      | 5.7   | … |
13. Basic idea 2

Assumes that rating scores derive from
the latent factors of users and items

| User  | Godfather | Terminator | Money game | Titanic | Back to the future | … | X-men |
|-------|-----------|------------|------------|---------|--------------------|---|-------|
| Alice | 5         | 1          | 4          | 4       | 3                  | … | 2     |

| User  | Horror | Sci-Fi | Humanity | Drama | … |
|-------|--------|--------|----------|-------|---|
| Alice | 0.3    | 2.1    | 6.1      | 4.7   | … |

×

| Item      | Horror | Sci-Fi | Humanity | Drama | … |
|-----------|--------|--------|----------|-------|---|
| Godfather | 2.3    | 0.4    | 4.9      | 5.7   | … |

(e.g., Alice's score for Godfather is modeled from the product of her latent vector and Godfather's latent vector.)
14. Summary of matrix factorization
R ≈ P × Q^T

- R: rating matrix (m users × n items; □ = unknown scores)
- P: latent user matrix (m users × k latent factors)
- Q^T: latent item matrix (k latent factors × n items)

• The rating matrix can be decomposed into latent factors
of users and items
• The dimension of the latent factor vectors is much smaller
than the number of users and items (k ≪ m, n)
15. Prediction using matrix factorization
Predicted rating matrix R* = P × Q^T

- P: latent user matrix
- Q^T: latent item matrix
- R*: the □ (unknown) cells of R are now filled with predicted scores (e.g., 2, 5, 4)

If we obtain the latent user/item matrices, we can predict
unknown scores by multiplying the two latent matrices.
How can we obtain the latent user/item matrices?
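With small, made-up latent matrices (the numbers here are illustrative, not learned), the multiplication step looks like this:

```python
import numpy as np

# Hypothetical latent matrices with k = 2 latent factors.
P = np.array([[1.3, 0.7],    # user 0's latent vector
              [0.2, 4.9]])   # user 1's latent vector
Q = np.array([[0.2, 3.1],    # item 0's latent vector
              [1.1, 0.7],    # item 1's latent vector
              [0.9, 1.0]])   # item 2's latent vector

# Predicted rating matrix R* = P × Q^T (m users × n items)
R_star = P @ Q.T
print(R_star.shape)   # (2, 3)
print(R_star[0, 2])   # user 0's predicted score for item 2: 1.3*0.9 + 0.7*1.0 = 1.87
```

Every cell of R*, including the previously unknown ones, is simply the inner product of one user's latent vector and one item's latent vector.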
16. SVD for recommender systems
SVD: singular value decomposition
- A famous linear algebra technique for matrix decomposition
- It is often used for dimensionality reduction
- SVD delivers essentially the same result as PCA does
X = U × Σ × V^T

- X: m × n matrix
- U: m × m unitary matrix
- V: n × n unitary matrix
- Σ: rectangular diagonal matrix (its diagonal values are called singular values)
18. Important features of SVD (1/4)
We can approximate a given matrix by keeping
the largest singular values and the columns of U
and V that correspond to those singular values.

U =
[ -0.369  -0.325   0.282   0.343   0.749 ]
[ -0.459   0.448  -0.339  -0.584   0.364 ]
[ -0.468  -0.359   0.544  -0.459  -0.381 ]
[ -0.563   0.491   0.07    0.562  -0.348 ]
[ -0.341  -0.569  -0.71    0.118  -0.202 ]

Σ =
[ 13.368  0      0      0      0     ]
[ 0       4.708  0      0      0     ]
[ 0       0      2.792  0      0     ]
[ 0       0      0      1.586  0     ]
[ 0       0      0      0      0.904 ]

V =
[ -0.325   0.341  -0.101   0.682  -0.55  ]
[ -0.562   0.345   0.273   0.153   0.684 ]
[ -0.468  -0.642  -0.577   0.125   0.141 ]
[ -0.504  -0.327   0.601  -0.324  -0.416 ]
[ -0.324   0.496  -0.47   -0.626  -0.189 ]

The largest singular values are 13.368 and 4.708.
19. Important features of SVD (2/4)
(Same U, Σ, and V as on the previous slide.)

Ignore the unimportant values: keep only the two largest
singular values (13.368 and 4.708) and the corresponding
columns of U and V.
20. Important features of SVD (3/4)

Keeping only the two largest singular values gives:

Σ2 =
[ 13.368  0     ]
[ 0       4.708 ]

U2 = (the first two columns of U)
[ -0.369  -0.325 ]
[ -0.459   0.448 ]
[ -0.468  -0.359 ]
[ -0.563   0.491 ]
[ -0.341  -0.569 ]

V2 = (the first two columns of V)
[ -0.325   0.341 ]
[ -0.562   0.345 ]
[ -0.468  -0.642 ]
[ -0.504  -0.327 ]
[ -0.324   0.496 ]
21. Important features of SVD (4/4)

U2 × Σ2 × V2^T =
[ 1.08  2.24  3.29  2.98  0.84 ]
[ 2.72  4.17  1.52  2.41  3.04 ]
[ 1.46  2.93  4.02  3.71  1.19 ]
[ 3.24  5.03  2.04  3.04  3.59 ]
[ 0.57  1.64  3.86  3.18  0.15 ]

≈ X =
[ 1  3  3  3  0 ]
[ 2  4  2  2  4 ]
[ 1  3  3  5  1 ]
[ 4  5  2  3  3 ]
[ 1  1  5  2  1 ]

We can approximate a given matrix by keeping the largest
singular values and the corresponding columns of U and V.
23. Apply SVD for recommender systems (2/2)
Step 1: Run SVD: X = U × Σ × V^T
Step 2: Focus on the important features: keep U2, Σ2, and V2^T
Step 3: Multiply the three matrices: U2 × Σ2 × V2^T =
[ 1.08  2.24  3.29  2.98  0.84 ]
[ 2.72  4.17  1.52  2.41  3.04 ]
[ 1.46  2.93  4.02  3.71  1.19 ]
[ 3.24  5.03  2.04  3.04  3.59 ]
[ 0.57  1.64  3.86  3.18  0.15 ]
Step 4: Check the values that were zero before running SVD: these are the predicted scores
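The four steps can be reproduced with NumPy's SVD on the rating matrix from the earlier slides (zeros standing in for unknown scores):

```python
import numpy as np

# Rating matrix from the slides; unknown scores were replaced by 0.
X = np.array([[1, 3, 3, 3, 0],
              [2, 4, 2, 2, 4],
              [1, 3, 3, 5, 1],
              [4, 5, 2, 3, 3],
              [1, 1, 5, 2, 1]], dtype=float)

# Step 1: run SVD (X = U Σ V^T)
U, s, Vt = np.linalg.svd(X)

# Step 2: focus on the important features (the k largest singular values)
k = 2
U2, s2, Vt2 = U[:, :k], s[:k], Vt[:k, :]

# Step 3: multiply the three matrices (rank-k approximation of X)
X2 = U2 @ np.diag(s2) @ Vt2

# Step 4: inspect the entries that were zero before running SVD
print(np.round(X2, 2))
```

The entries that were zero in X come back as nonzero values in X2; those are read off as the predicted scores.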
24. Problems on SVD
Predicted values are often negative
- SVD does not take the rating score range into account

Zero replacement decreases prediction quality
- SVD analyzes the relations between all entries in the matrix
- The meaning of "zero" is different from that of "unknown"
26. Netflix Prize (2006-2010)
Image ref: http://blogs.itmedia.co.jp/saito/2009/09/httpjournalmyco.html
Netflix held an open competition to advance collaborative
filtering algorithms and to find the best one.
27. Simon Funk’s Matrix Factorization (2006)
Instead of running SVD (with zero replacement),
Simon Funk's method learns the matrices P and Q
from only the observed values in R.
Target optimization function:

$$\min_{P, Q} \sum_{(u, i) \in K} \left( r_{u,i} - \boldsymbol{p}_u \cdot \boldsymbol{q}_i \right)^2 + \lambda \left( \|\boldsymbol{p}_u\|^2 + \|\boldsymbol{q}_i\|^2 \right)$$

where K is the set of (user, item) pairs whose ratings are observed;
u: user; i: item; p_u: u's latent vector; q_i: i's latent vector.

Rating matrix R (m users × n items) ≈ P × Q.
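A minimal stochastic-gradient sketch of this objective (the toy data and hyperparameters below are assumptions for illustration; Funk's original trained one factor at a time on the Netflix data):

```python
import numpy as np

# Observed (user, item, rating) triples only; unknown entries are simply absent.
ratings = [(0, 0, 5), (0, 1, 3), (0, 2, 4),
           (1, 0, 3), (1, 1, 1), (1, 2, 2),
           (2, 0, 4), (2, 2, 5)]
m, n, k = 3, 3, 2                   # users, items, latent factors
lr, lam, epochs = 0.05, 0.02, 500   # learning rate, regularization, passes

rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(m, k))   # latent user matrix
Q = rng.normal(scale=0.1, size=(n, k))   # latent item matrix

for _ in range(epochs):
    for u, i, r in ratings:
        e = r - P[u] @ Q[i]                    # error on one observed rating
        P[u] += lr * (e * Q[i] - lam * P[u])   # gradient step on the user factor
        Q[i] += lr * (e * P[u] - lam * Q[i])   # gradient step on the item factor

# Predict a score that was never observed, e.g. user 2 / item 1
print(P[2] @ Q[1])
```

One design note: the update to Q[i] reuses the freshly updated P[u], a common simplification of taking both gradient steps simultaneously.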
28. Various approaches have been developed …
Ref: https://www.slideshare.net/databricks/deep-learning-for-recommender-systems-with-nick-pentreath
2003: Scalable models (Amazon's item-based CF)
2006-2009: Netflix Prize (the rise of matrix factorization, like Simon Funk's method)
2010: Factorization machines (generalized matrix factorization for dealing with various factors)
2013: Deep learning (Deep Factorization Machine; Content2Vec to get content embeddings)
30. Remaining challenges
Cold start problem
- How to recommend new items? What to recommend to new users?

Serendipity
- Recommending only favorable items cannot expand user interests
- How to recommend unexpected and surprising items?

Providing explanations
- Recommendation algorithms just decide what to recommend
- How can systems persuade users to purchase recommended items?

Reviewers' trust
- How to remove spam reviewers? How to find high-quality reviewers?
32. Cold start problem
|                 | Item1 | Item2 | Item3 | Item4 | Item5 (new item) |
|-----------------|-------|-------|-------|-------|------------------|
| Kate (new user) |       |       |       |       |                  |
| User1           | 3     | 1     | 2     | 3     |                  |
| User2           | 4     | 3     | 4     | 3     |                  |
| User3           | 3     | 3     | 1     | 5     |                  |
| User4           | 1     | 5     | 5     | 2     |                  |

CF approaches don't work for new items/users:
new items/users give no clues for predicting unknown scores because
CF cannot find neighboring users/items.
34. Reviewer Trust
T. Wang, D. Wang (2014) "Why Amazon's Ratings Might Mislead You: The Story of Herding Effects", Journal of Big Data, Vol.2, No.4, pp.196-204.
[Fig: average rating scores on Amazon.com (Books, Electronics, Movies & TV, Music): cumulative distribution of p-values of the Augmented Dickey-Fuller test, and mean rating vs. sequence number of rating (Wang & Wang, 2014).]
How to find trustworthy reviewers (rating experts)?
- People tend to give good scores to items
- Some reviewers intentionally give excessively high or low scores
(spammers)