Matrix Factorization:
Beyond Simple Collaborative Filtering
Yusuke Yamamoto
Lecturer, Faculty of Informatics
yusuke_yamamoto@acm.org
Data Engineering (Recommender Systems 3)
2019.11.11
1. Problems on Simple Collaborative Filtering
2
User-based Collaborative Filtering
3
Predicts a target user’s rating for an item
based on rating tendency of similar users
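$$\mathit{predict}(u_t, i) = \bar{r}_{u_t} + \frac{\sum_{u \in U_s} \mathit{sim}(u_t, u) \cdot (r_{u,i} - \bar{r}_{u})}{\sum_{u \in U_s} \mathit{sim}(u_t, u)}$$
(u_t: target user; U_s: set of users similar to u_t; r_{u,i}: rating of user u for item i; \bar{r}_u: average rating of user u)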
User    Item5   sim    Average rating
Alice   ?       1      4
User1   3       0.85   2.4
User2   5       0.71   3.8
(User1 and User2 are the users similar to Alice)
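As a worked example, the user-based prediction can be computed from the table values for Alice and her two similar users. This is a minimal sketch; the function name `predict_user_based` and the fallback for an empty neighborhood are illustrative assumptions, not part of the slides.

```python
# Values from the slide's table: (rating for Item5, similarity to Alice, average rating).
similar_users = {
    "User1": (3, 0.85, 2.4),
    "User2": (5, 0.71, 3.8),
}
alice_average = 4.0


def predict_user_based(target_average, neighbors):
    """Mean-centered, similarity-weighted prediction for a single item."""
    numerator = sum(sim * (rating - avg) for rating, sim, avg in neighbors.values())
    denominator = sum(sim for _, sim, _ in neighbors.values())
    if denominator == 0:
        # Assumption: with no usable neighbors, fall back to the user's own average.
        return target_average
    return target_average + numerator / denominator


prediction = predict_user_based(alice_average, similar_users)
print(round(prediction, 2))
```

With these numbers the weighted deviation is (0.85 × 0.6 + 0.71 × 1.2) / 1.56 ≈ 0.87, so the predicted rating for Alice is about 4.87.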
Idea about Item-based Collaborative Filtering
4
Item1 Item2 Item3 Item4 Item5
Alice 5 3 4 4 ?
User1 3 1 2 3 3
User2 4 3 4 3 5
User3 3 3 1 5 4
User4 1 5 5 2 1
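Predicts unknown scores based on the rating tendency for similar items
(the figure marks the items that are similar to Item5)
$$\mathit{predict}(u_t, i_t) = \frac{\sum_{i \in I_s} \mathit{sim}(i_t, i) \cdot r_{u_t,i}}{\sum_{i \in I_s} \mathit{sim}(i_t, i)}$$
(u_t: target user; i_t: target item; I_s: set of items similar to i_t)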
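The item-based prediction can be sketched on the rating table above. The slides do not say which similarity measure or neighborhood size is used, so plain cosine similarity and the two most similar items are assumptions made only for illustration, as are the helper names.

```python
import math

# Rating table from the slide; None marks Alice's unknown rating for Item5.
ratings = {
    "Alice": [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}


def item_column(item_index, users):
    """Ratings that the given users gave to one item (an item vector)."""
    return [ratings[user][item_index] for user in users]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def predict_item_based(target_user, target_item, n_neighbors=2):
    # Item vectors use only the users who rated the target item.
    # In this table those users have rated every item, so no vector contains None.
    others = [u for u in ratings if u != target_user and ratings[u][target_item] is not None]
    target_vector = item_column(target_item, others)
    sims = []
    for item in range(len(ratings[target_user])):
        if item == target_item or ratings[target_user][item] is None:
            continue
        sims.append((cosine(target_vector, item_column(item, others)), item))
    # Keep the most similar items (assumed neighborhood size: n_neighbors).
    neighbors = sorted(sims, reverse=True)[:n_neighbors]
    numerator = sum(sim * ratings[target_user][item] for sim, item in neighbors)
    denominator = sum(sim for sim, _ in neighbors)
    return numerator / denominator


prediction = predict_item_based("Alice", target_item=4)
print(round(prediction, 2))
```

Because the prediction is a similarity-weighted average of Alice's own ratings, it always stays within the range of the ratings she has already given.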
Problems on CF approaches (1/3)
5
Image reference: https://rafalab.github.io/dsbook/recommendation-systems.html
Real data is quite sparse!!
Even on large e-commerce sites, there are few intersections
between user vectors (& item vectors).
Problems on CF approaches (2/3)
6
The curse of dimensionality
• In high-dimensional space, it is difficult to measure similarity meaningfully
• Item/user vectors usually have very high dimensionality
(because the rating matrix is very large)
Problems on CF approaches (3/3)
7
High computational cost
• The rating matrix is used directly every time the system finds similar users/items and makes predictions
• Such CF approaches do not scale to most real-world scenarios
[Figure: for each user (User 1, User 2, …), the rating matrix is used once to compute the similar users and once more to compute the suggested items for that user]
Recent approaches for recommender systems
8
Model-based approach
- Based on offline pre-processing
- At run-time, only the learned model is used for rating prediction
- Learned models can be updated
[Figure: offline computation: rating matrix → learned model; online computation: user → learned model → suggested items for the user]
Memory-based approach vs. model-based approach
9
Memory-based approach
- User-based CF
- Item-based CF
Model-based approach
- Matrix factorization
- Association rule mining
- Probabilistic model
- Other ML techniques
2. Matrix Factorization
10
Basic idea 1 (1/2)
11
Assumes that latent factors exist in users/items
User    Godfather  Terminator  Money game  Titanic  Back to the future  …  X-men
Alice   5          1           4           4        3                   …  2
Alice: "I don’t like horror… Sci-Fis often move me. I love humane and dramatic movies!"
Latent factors (which cannot be observed)
User    Horror  Sci-Fi  Humanity  Drama  …
Alice   0.3     2.1     6.1       4.7    …
Basic idea 1 (2/2)
12
Assumes that latent factors exist in users/items
User    Godfather  Terminator  Money game  Titanic  Back to the future  …  X-men
Alice   5          1           4           4        3                   …  2
Image from Amazon.com
Latent factors (which cannot be observed)
Item       Horror  Sci-Fi  Humanity  Drama  …
Godfather  2.3     0.4     4.9       5.7    …
Basic idea 2
13
Assumes that rating scores derive from
latent factors of users and items
User    Godfather  Terminator  Money game  Titanic  Back to the future  …  X-men
Alice   5          1           4           4        3                   …  2
Alice's rating for Godfather derives from the user's latent factors × the item's latent factors:
User       Horror  Sci-Fi  Humanity  Drama  …
Alice      0.3     2.1     6.1       4.7    …
×
Item       Horror  Sci-Fi  Humanity  Drama  …
Godfather  2.3     0.4     4.9       5.7    …
Summary of matrix factorization
14
$$R \approx P \times Q^{T}$$
- R: rating matrix (m users × n items); □ = unknown scores
- P: latent user matrix (m users × k latent factors)
- Qᵀ: latent item matrix (k latent factors × n items)
• The rating matrix can be decomposed into latent factors
of users and items
• The dimension of the latent factors (vectors) is much smaller
than the number of users and items (k ≪ m, n)
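The shapes involved in the decomposition can be made concrete with a small sketch. All numbers below are hypothetical (they are not taken from the slides), and `predict` is an illustrative helper name.

```python
# Hypothetical latent matrices for m = 3 users, n = 4 items, k = 2 latent factors.
P = [  # latent user matrix, m x k
    [1.2, 0.3],
    [0.4, 1.5],
    [0.9, 0.9],
]
Q = [  # latent item matrix, n x k (so Q^T is k x n)
    [2.0, 0.5],
    [0.3, 2.2],
    [1.5, 1.5],
    [2.5, 0.1],
]


def predict(P, Q, user, item):
    """Predicted rating = dot product of the user's and the item's latent vectors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))


# Full predicted matrix R* = P x Q^T, with shape m x n.
R_star = [[predict(P, Q, u, i) for i in range(len(Q))] for u in range(len(P))]

# The model stores m*k + n*k numbers instead of the m*n entries of the rating matrix.
model_size = len(P) * len(P[0]) + len(Q) * len(Q[0])
print(len(R_star), len(R_star[0]), model_size)
```

For realistic sizes (k ≪ m, n), the two latent matrices are far smaller than the rating matrix, which is what makes the model cheap to use at run-time.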
Prediction using matrix factorization
15
$$R^{*} = P \times Q^{T}$$
- R*: predicted rating matrix (the unknown scores □ are filled in)
- P: latent user matrix
- Qᵀ: latent item matrix
If we obtain the latent user/item matrices, we can predict
unknown scores by multiplying the two latent matrices
How to obtain the latent user/item matrices?
SVD for recommender systems
16
SVD: singular value decomposition
- A famous linear algebra technique for matrix decomposition
- It is often used for dimensionality reduction
- SVD delivers essentially the same result as PCA does
$$X = U \Sigma V^{T}$$
- X: m × n matrix
- U: m × m unitary matrix
- Σ: m × n rectangular diagonal matrix
(the diagonal values are called singular values)
- V: n × n unitary matrix
Example of SVD
17
$$X = U \Sigma V^{T}$$
X =
1 3 3 3 0
2 4 2 2 4
1 3 3 5 1
4 5 2 3 3
1 1 5 2 1
U =
-0.369 -0.325 0.282 0.343 0.749
-0.459 0.448 -0.339 -0.584 0.364
-0.468 -0.359 0.544 -0.459 -0.381
-0.563 0.491 0.07 0.562 -0.348
-0.341 -0.569 -0.71 0.118 -0.202
Σ =
13.368 0 0 0 0
0 4.708 0 0 0
0 0 2.792 0 0
0 0 0 1.586 0
0 0 0 0 0.904
V =
-0.325 0.341 -0.101 0.682 -0.55
-0.562 0.345 0.273 0.153 0.684
-0.468 -0.642 -0.577 0.125 0.141
-0.504 -0.327 0.601 -0.324 -0.416
-0.324 0.496 -0.47 -0.626 -0.189
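The example decomposition can be reproduced with NumPy, assuming NumPy is available. This is only a sketch: `np.linalg.svd` returns V already transposed, and the signs of individual columns of U and V may differ from the slide, which does not change the product.

```python
import numpy as np

# The example matrix X from the slide.
X = np.array([
    [1, 3, 3, 3, 0],
    [2, 4, 2, 2, 4],
    [1, 3, 3, 5, 1],
    [4, 5, 2, 3, 3],
    [1, 1, 5, 2, 1],
], dtype=float)

# numpy returns the singular values as a 1-D array (largest first) and V transposed (Vt).
U, singular_values, Vt = np.linalg.svd(X)
Sigma = np.diag(singular_values)

# Multiplying the three factors gives back X (up to floating-point error).
reconstructed = U @ Sigma @ Vt
print(np.round(singular_values, 3))
print(np.allclose(reconstructed, X))
```

The printed singular values should agree with the diagonal of Σ above, starting with roughly 13.368.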
Important features of SVD (1/4)
18
We can approximate a given matrix by focusing
on the largest singular values and the vectors of U
and V which correspond to those singular values
[Figure: U, Σ, and V from the example, with the largest singular values in Σ highlighted]
Important features of SVD (2/4)
19
Ignore unimportant values!
[Figure: U, Σ, and V from the example; everything except the largest singular values and their corresponding vectors is ignored]
Important features of SVD (3/4)
20
Only the two largest singular values and the corresponding first two columns of U and V are kept:
Σ2 =
13.368 0
0 4.708
U2 =
-0.369 -0.325
-0.459 0.448
-0.468 -0.359
-0.563 0.491
-0.341 -0.569
V2 =
-0.325 0.341
-0.562 0.345
-0.468 -0.642
-0.504 -0.327
-0.324 0.496
Important features of SVD (4/4)
21
$U_2 \times \Sigma_2 \times V_2^{T}$ =
1.08 2.24 3.29 2.98 0.84
2.72 4.17 1.52 2.41 3.04
1.46 2.93 4.02 3.71 1.19
3.24 5.03 2.04 3.04 3.59
0.57 1.64 3.86 3.18 0.15
≈ X =
1 3 3 3 0
2 4 2 2 4
1 3 3 5 1
4 5 2 3 3
1 1 5 2 1
Apply SVD for recommender systems (1/2)
22
User    Item1  Item2  Item3  Item4  Item5
Alice   1      3      3      3      ?
User1   2      4      2      2      4
User2   1      3      3      5      1
User3   4      5      2      3      3
User4   1      1      5      2      1
The unknown score (?) is regarded as zero when the table is converted to a matrix:
X =
1 3 3 3 0
2 4 2 2 4
1 3 3 5 1
4 5 2 3 3
1 1 5 2 1
Apply SVD for recommender systems (2/2)
23
Step 1: Run SVD: $X = U \times \Sigma \times V^{T}$
Step 2: Focus on important features: keep $U_2$, $\Sigma_2$, $V_2$
Step 3: Multiply the three matrices: $U_2 \times \Sigma_2 \times V_2^{T}$ =
1.08 2.24 3.29 2.98 0.84
2.72 4.17 1.52 2.41 3.04
1.46 2.93 4.02 3.71 1.19
3.24 5.03 2.04 3.04 3.59
0.57 1.64 3.86 3.18 0.15
Step 4: Check the values which were zero before running SVD
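The four steps can be sketched with NumPy (assumed to be available). The variable names are illustrative, and k = 2 is chosen because the slide's result matrix corresponds to keeping the two largest singular values.

```python
import numpy as np

# Rating matrix from the slide; Alice's unknown rating for Item5 is replaced with zero.
X = np.array([
    [1, 3, 3, 3, 0],
    [2, 4, 2, 2, 4],
    [1, 3, 3, 5, 1],
    [4, 5, 2, 3, 3],
    [1, 1, 5, 2, 1],
], dtype=float)
k = 2  # number of singular values to keep

# Step 1: run SVD.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Step 2: keep only the k largest singular values and the matching vectors.
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

# Step 3: multiply the three truncated matrices.
X_approx = U_k @ S_k @ Vt_k

# Step 4: read off the entry that was zero before running SVD (Alice, Item5).
alice_item5 = float(X_approx[0, 4])
print(np.round(X_approx, 2))
print(round(alice_item5, 2))
```

For Alice's Item5 this yields roughly 0.84, the top-right value of the matrix shown in Step 3.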
Problems on SVD
24
Predicted values are often negative
- SVD does not take rating score range into account
Zero replacement decreases prediction quality
- SVD analyzes the relation between all data in matrix
- The meaning of “zero” is different from that of “unknown”
Bad example of SVD-based recommendation
25
User    Ghibli 1  Ghibli 2  Ghibli 3  Ghibli 4
User1   5         -         -         -
User2   3         4         -         -
User3   2         -         1         -
User4   -         5         -         4
User5   -         -         -         5
(- = not rated)
↓ Zero replacement
5 0 0 0
3 4 0 0
2 0 1 0
0 5 0 4
0 0 0 5
↓ Run SVD (U2, Σ2, V2), then multiply the matrices
3.53 1.88 0.16 -0.26
3.62 4.04 0.13 2.17
1.44 0.76 0.07 -0.12
2.76 5.41 0.06 4.34
1.67 5.97 0 5.74
-0.26 2.91 -0.06 3.52
Example from: http://smrmkt.hatenablog.jp/entry/2014/08/23/211555
Netflix Prize (2006-2009)
26
Image ref: http://blogs.itmedia.co.jp/saito/2009/09/httpjournalmyco.html
Netflix held an open competition to advance collaborative
filtering algorithms and to seek the best algorithm.
Simon Funk’s Matrix Factorization (2006)
27
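Instead of running SVD with zero replacement,
Simon Funk’s method learns the matrices P and Q
so that they reproduce only the observed values in R
Target optimization function:
$$\min_{P,Q} \sum_{(u,i) \in R} \left[ \left( r_{u,i} - \mathbf{p}_u \mathbf{q}_i^{T} \right)^2 + \lambda \left( \lVert \mathbf{p}_u \rVert^2 + \lVert \mathbf{q}_i \rVert^2 \right) \right]$$
[Figure: rating matrix R (m users × n items) ≈ P (user latent vectors) × Q (item latent vectors)]
u: user u; i: item i;
p_u: u’s latent vector; q_i: i’s latent vector;
(u, i) ∈ R: the observed ratings; λ: regularization weight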
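The objective above can be minimized with stochastic gradient descent over the observed ratings only. The sketch below is a common joint-update variant, not necessarily Simon Funk's exact training procedure; the function names, learning rate, regularization weight, initialization, and epoch count are all assumptions chosen for illustration.

```python
import math
import random


def train_funk_mf(observed, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=500, seed=0):
    """Learn P (n_users x k) and Q (n_items x k) from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(k):
                p_uf, q_if = P[u][f], Q[i][f]
                # Gradient step on (r - p.q)^2 + reg * (|p|^2 + |q|^2);
                # the constant factor 2 is folded into the learning rate.
                P[u][f] += lr * (err * q_if - reg * p_uf)
                Q[i][f] += lr * (err * p_uf - reg * q_if)
    return P, Q


def rmse(observed, P, Q):
    """Root-mean-square error over the observed ratings only."""
    squared = [(r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2 for u, i, r in observed]
    return math.sqrt(sum(squared) / len(squared))


# Rating table from the SVD example; Alice's Item5 is left out instead of being set to zero.
table = [
    [1, 3, 3, 3, None],
    [2, 4, 2, 2, 4],
    [1, 3, 3, 5, 1],
    [4, 5, 2, 3, 3],
    [1, 1, 5, 2, 1],
]
observed = [(u, i, r) for u, row in enumerate(table) for i, r in enumerate(row) if r is not None]

P0, Q0 = train_funk_mf(observed, 5, 5, epochs=0)  # untrained baseline
P, Q = train_funk_mf(observed, 5, 5)
alice_item5 = sum(p * q for p, q in zip(P[0], Q[4]))
print(round(rmse(observed, P0, Q0), 2), round(rmse(observed, P, Q), 2))
```

Because the unknown entry never enters the loss, it does not pull the learned factors toward zero the way zero replacement does.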
Various approaches have been developed …
28
Ref: https://www.slideshare.net/databricks/deep-learning-for-recommender-systems-with-nick-pentreath
- 2003: Scalable models (Amazon’s item-based CF)
- 2006-2009: Netflix Prize (the rise of matrix factorization, like Simon Funk’s method)
- 2010: Factorization machine (generalized matrix factorization for dealing with various factors)
- 2013: Deep Learning (Deep Factorization machine; Content2Vec to get content embeddings)
3. Challenges for recommender systems
29
Remaining challenges
30
Cold start problem
- How to recommend new items? What to recommend to new users?
Serendipity
- Recommending only favorable items cannot expand user interests
- How to recommend unexpected and surprising items?
Providing explanations
- Recommendation algorithms just decide what to recommend
- How can systems persuade users to purchase recommended items?
Reviewer’s trust
- How to remove spam reviewers? How to find high-quality reviewers?
Cold start problem
32
User             Item1  Item2  Item3  Item4  Item5 (new item)
Kate (new user)  -      -      -      -      -
User1            3      1      2      3      -
User2            4      3      4      3      -
User3            3      3      1      5      -
User4            1      5      5      2      -
(- = no rating)
CF approaches don’t work for new items/users
New items/users offer no clues for predicting unknown scores because
CF cannot find neighbor users/items
Possible approach to cold start problem
33
On-the-fly preference prediction
Systems ask/force users to rate a set of items to understand
their preference
Content-based recommendation
Finds similar items based on user interactions and content features
[Figure: a paper list from which the user clicks a paper to view; textually similar papers (based on TF-IDF, D2V, etc.) are recommended as similar items. The example pages shown are "ExtVision: Augmentation of Visual Experiences with Generation of Context Images for Peripheral Vision Using Deep Neural Network" (CHI 2018), "Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes" (WWW 2016), and "The science of fake news" (Science, 9 March 2018).]
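Finding textually similar items can be sketched with TF-IDF vectors and cosine similarity. The one-line descriptions below are hypothetical stand-ins for the full paper texts, and the whitespace tokenizer and the plain `log(N / df)` weighting are simplifying assumptions rather than the method used in the slides.

```python
import math
from collections import Counter

# Hypothetical one-line descriptions standing in for the full paper texts.
papers = {
    "fake_news": "fake news is false information spread to deceive people",
    "wiki_hoaxes": "hoax articles present false information on wikipedia to deceive an audience",
    "extvision": "a deep neural network generates context images for peripheral vision",
}


def tfidf_vectors(docs):
    """Map each document name to a sparse {term: tf-idf weight} vector."""
    tokenized = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(viewed, docs):
    """Return the name of the item most similar to the viewed one, plus all scores."""
    vectors = tfidf_vectors(docs)
    scores = {name: cosine(vectors[viewed], vec) for name, vec in vectors.items() if name != viewed}
    return max(scores, key=scores.get), scores


best, scores = most_similar("fake_news", papers)
print(best)
```

This needs no ratings at all, which is why content-based similarity remains usable for new items that nobody has rated yet.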
Reviewer Trust
34
Ref: T. Wang, D. Wang. (2014) “Why Amazon’s Ratings Might Mislead You: The Story of Herding Effects”, Journal of Big Data, Vol.2, No.4, pp.196-204.
[Figure: average rating scores on Amazon.com (mean rating vs. sequence number of rating, for Books, Electronics, Movies & TV, and Music)]
How to find trustworthy reviewers (rating experts)?
- People often give good scores to items
- Some reviewers (spammers) intentionally give too high/low scores to items

Mais conteúdo relacionado

Mais procurados

Wrapper feature selection method
Wrapper feature selection methodWrapper feature selection method
Wrapper feature selection methodAmir Razmjou
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational AutoencoderMark Chang
 
Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Alexandros Karatzoglou
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringViet-Trung TRAN
 
Singular Value Decompostion (SVD)
Singular Value Decompostion (SVD)Singular Value Decompostion (SVD)
Singular Value Decompostion (SVD)Isaac Yowetu
 
Deep Learning in Recommender Systems - RecSys Summer School 2017
Deep Learning in Recommender Systems - RecSys Summer School 2017Deep Learning in Recommender Systems - RecSys Summer School 2017
Deep Learning in Recommender Systems - RecSys Summer School 2017Balázs Hidasi
 
Optimization/Gradient Descent
Optimization/Gradient DescentOptimization/Gradient Descent
Optimization/Gradient Descentkandelin
 
Matrix factorization
Matrix factorizationMatrix factorization
Matrix factorizationLuis Serrano
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemMilind Gokhale
 
Tutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsTutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsYONG ZHENG
 
Recommendation system
Recommendation systemRecommendation system
Recommendation systemDing Li
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System ExplainedCrossing Minds
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks남주 김
 
Visualizing Data Using t-SNE
Visualizing Data Using t-SNEVisualizing Data Using t-SNE
Visualizing Data Using t-SNEDavid Khosid
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender SystemsJustin Basilico
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender systemStanley Wang
 
Recommendation system
Recommendation system Recommendation system
Recommendation system Vikrant Arya
 
Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsBenjamin Le
 

Matrix Factorization

  • 1. Matrix Factorization: Beyond Simple Collaborative Filtering. Yusuke Yamamoto, Lecturer, Faculty of Informatics (yusuke_yamamoto@acm.org). Data Engineering (Recommender Systems 3), 2019.11.11
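  • 3. User-based Collaborative Filtering
    Predicts a target user's rating for an item based on the rating tendency of similar users:
    $\mathit{predict}(u_a, i) = \bar{r}_{u_a} + \dfrac{\sum_{u \in N} \mathit{sim}(u_a, u)\,(r_{u,i} - \bar{r}_{u})}{\sum_{u \in N} \mathit{sim}(u_a, u)}$
    where $N$ is the set of users similar to the target user $u_a$, and $\bar{r}_{u}$ is user $u$'s average rating.
             Item5   sim    Average Rating
    Alice    ?       1      4
    User1    3       0.85   2.4
    User2    5       0.71   3.8
    (User1 and User2 are the similar users.)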
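The prediction on slide 3 can be worked through in a few lines. The following is a minimal sketch, assuming the mean-centered weighted average shown on the slide and the numbers from its table; the function name `predict_user_based` and the tuple layout are hypothetical choices for illustration.

```python
def predict_user_based(target_mean, neighbors):
    """Mean-centered weighted average over similar users.

    neighbors: list of (similarity, rating_for_item, neighbor_mean) tuples.
    """
    numerator = sum(sim * (rating - mean) for sim, rating, mean in neighbors)
    denominator = sum(sim for sim, _, _ in neighbors)
    if denominator == 0:
        # No usable neighbors: fall back to the target user's own average.
        return target_mean
    return target_mean + numerator / denominator


# Values from the slide: Alice's average is 4; User1 and User2 are her neighbors.
neighbors = [
    (0.85, 3, 2.4),  # User1: sim, rating for Item5, average rating
    (0.71, 5, 3.8),  # User2
]
pred = predict_user_based(4, neighbors)
print(round(pred, 3))  # 4 + (0.85*0.6 + 0.71*1.2) / (0.85 + 0.71)
```

With these numbers the prediction for Alice on Item5 comes out slightly below 4.9, because both neighbors rated Item5 above their own averages.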
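  • 4. Idea about Item-based Collaborative Filtering
    Predicts unknown scores based on the rating tendency for similar items:
    $\mathit{predict}(u_a, i_t) = \dfrac{\sum_{i \in I} \mathit{sim}(i_t, i)\, r_{u_a,i}}{\sum_{i \in I} \mathit{sim}(i_t, i)}$
    where $i_t$ denotes the target item and $I$ the set of items similar to it.
             Item1   Item2   Item3   Item4   Item5
    Alice    5       3       4       4       ?
    User1    3       1       2       3       3
    User2    4       3       4       3       5
    User3    3       3       1       5       4
    User4    1       5       5       2       1
    (The slide marks some item columns as similar to Item5.)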
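As a rough illustration of slide 4, the sketch below predicts Alice's rating for Item5 from her ratings of the other items. The slide does not say which similarity measure is used, so plain cosine similarity over the other users' ratings is an assumption here, as are the helper names `cosine`, `item_column`, and `predict_item_based`; all four other items are used as the "similar" set.

```python
import math

# Rating table from the slide; None marks Alice's unknown rating for Item5.
ratings = {
    "Alice": [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}


def cosine(a, b):
    """Plain cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def item_column(item, users):
    """Ratings that the given users assigned to one item."""
    return [ratings[u][item] for u in users]


def predict_item_based(user, target):
    # Similarities are computed from users who rated the target item.
    others = [u for u in ratings if u != user and ratings[u][target] is not None]
    target_col = item_column(target, others)
    num = den = 0.0
    for item, rating in enumerate(ratings[user]):
        if item == target or rating is None:
            continue
        sim = cosine(target_col, item_column(item, others))
        num += sim * rating  # weight the user's own rating by item similarity
        den += sim
    return num / den


pred = predict_item_based("Alice", 4)  # index 4 is Item5
print(round(pred, 2))
```

Because every similarity is positive here, the prediction is a weighted average of Alice's own ratings and therefore stays between 3 and 5.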
  • 5. Problems on CF approaches (1/3): Real data is quite sparse. Even on large e-commerce sites, there are few intersections between user vectors (and between item vectors). Image reference: https://rafalab.github.io/dsbook/recommendation-systems.html
  • 6. Problems on CF approaches (2/3): The curse of dimensionality
    - In high-dimensional space, it is difficult to handle similarity
    - Item/user vectors usually have quite high dimensionality (because the rating matrix is quite large)
  • 7. Problems on CF approaches (3/3): High computational cost
    - The rating matrix is used directly every time the system finds similar users/items and makes predictions
    - CF approaches do not scale for most real-world scenarios
    (Diagram: for each user, e.g. User 1 and User 2, the system first computes the similar users from the rating matrix, then computes the suggested items from the rating matrix again.)
  • 8. Recent approaches for recommender systems: Model-based approach
    - Based on offline pre-processing
    - At run-time, only the learned model is used for rating prediction
    - Learned models can be updated
    (Diagram: rating matrix → offline computation → learned model; user → online computation with the learned model → suggested items for the user.)
  • 9. Memory-based approach vs. model-based approach
    Memory-based approach:
    - User-based CF
    - Item-based CF
    Model-based approach:
    - Matrix factorization
    - Association rule mining
    - Probabilistic model
    - Other ML techniques
  • 11. Basic idea 1 (1/2): Assumes that latent factors exist in users/items
    User     Godfather   Terminator   Money game   Titanic   Back to the future   …   X-men
    Alice    5           1            4            4         3                    …   2
    Alice: "I don't like horror… Sci-Fis often move me. I love humane and dramatic movies!"
    Latent factors (which cannot be observed):
    User     Horror   Sci-Fi   Humanity   Drama   …
    Alice    0.3      2.1      6.1        4.7     …
  • 12. Basic idea 1 (2/2): Assumes that latent factors exist in users/items (image from Amazon.com)
    User     Godfather   Terminator   Money game   Titanic   Back to the future   …   X-men
    Alice    5           1            4            4         3                    …   2
    Latent factors (which cannot be observed):
    Item        Horror   Sci-Fi   Humanity   Drama   …
    Godfather   2.3      0.4      4.9        5.7     …
  • 13. Basic idea 2: Assumes that rating scores derive from the latent factors of users and items
    User     Godfather   Terminator   Money game   Titanic   Back to the future   …   X-men
    Alice    5           1            4            4         3                    …   2
    The rating is modeled as (user latent factors) × (item latent factors):
    User     Horror   Sci-Fi   Humanity   Drama   …
    Alice    0.3      2.1      6.1        4.7     …
    Item        Horror   Sci-Fi   Humanity   Drama   …
    Godfather   2.3      0.4      4.9        5.7     …
  • 14. Summary of matrix factorization
    $R \approx P \times Q^{T}$
    - $R$: rating matrix (m users × n items), built from (user, item, rating) records; □ = unknown scores
    - $P$: latent user matrix (m users × k latent factors)
    - $Q^{T}$: latent item matrix (k latent factors × n items)
    • The rating matrix can be decomposed into latent factors of users and items
    • The dimension of the latent factors (vectors) is much smaller than the number of users and items (k ≪ m, n)
  • 15. Prediction using matrix factorization
    $R^{*} = P \times Q^{T}$ (predicted rating matrix = latent user matrix × latent item matrix)
    If we obtain the latent user and item matrices, we can predict the unknown scores (□) by multiplying the two latent matrices.
    How do we obtain the latent user and item matrices?
  • 16. SVD for recommender systems
    SVD: singular value decomposition
    - A famous linear algebra technique for matrix decomposition
    - It is often used for dimensionality reduction
    - SVD delivers essentially the same result as PCA does
    $X = U \times \Sigma \times V^{T}$
    - $X$: m × n matrix
    - $U$: m × m unitary matrix
    - $V$: n × n unitary matrix
    - $\Sigma$: rectangular diagonal matrix (its diagonal values are called singular values)
  • 17. Example of SVD: $X = U \times \Sigma \times V^{T}$
    X =
      1  3  3  3  0
      2  4  2  2  4
      1  3  3  5  1
      4  5  2  3  3
      1  1  5  2  1
    U =
      -0.369  -0.325   0.282   0.343   0.749
      -0.459   0.448  -0.339  -0.584   0.364
      -0.468  -0.359   0.544  -0.459  -0.381
      -0.563   0.491   0.070   0.562  -0.348
      -0.341  -0.569  -0.710   0.118  -0.202
    Σ =
      13.368   0       0       0       0
       0       4.708   0       0       0
       0       0       2.792   0       0
       0       0       0       1.586   0
       0       0       0       0       0.904
    V =
      -0.325   0.341  -0.101   0.682  -0.550
      -0.562   0.345   0.273   0.153   0.684
      -0.468  -0.642  -0.577   0.125   0.141
      -0.504  -0.327   0.601  -0.324  -0.416
      -0.324   0.496  -0.470  -0.626  -0.189
  • 18. Important features of SVD (1/4): We can approximate a given matrix by focusing on the largest singular values and the vectors of U and V that correspond to them. (The slide repeats U, Σ, and V from slide 17 and highlights the largest singular values.)
  • 19. Important features of SVD (2/4): Ignore unimportant values! (The slide repeats U, Σ, and V from slide 17 and marks the smaller singular values, with their vectors, as ignored.)
  • 20. Important features of SVD (3/4): U2, Σ2, and V2 denote the truncated versions of U, Σ, and V from slide 17, which keep only the largest singular values (here the two largest, 13.368 and 4.708) and the corresponding vectors of U and V.
  • 21. Important features of SVD (4/4): Multiplying the truncated matrices approximates the given matrix, $U_2 \times \Sigma_2 \times V_2^{T} \approx X$.
    U2 × Σ2 × V2^T =
      1.08  2.24  3.29  2.98  0.84
      2.72  4.17  1.52  2.41  3.04
      1.46  2.93  4.02  3.71  1.19
      3.24  5.03  2.04  3.04  3.59
      0.57  1.64  3.86  3.18  0.15
    X =
      1  3  3  3  0
      2  4  2  2  4
      1  3  3  5  1
      4  5  2  3  3
      1  1  5  2  1
  • 22. Apply SVD for recommender systems (1/2): Convert the rating table to a matrix X, with the unknown score regarded as zero.
             Item1   Item2   Item3   Item4   Item5
    Alice    1       3       3       3       ?
    User1    2       4       2       2       4
    User2    1       3       3       5       1
    User3    4       5       2       3       3
    User4    1       1       5       2       1
    X =
      1  3  3  3  0
      2  4  2  2  4
      1  3  3  5  1
      4  5  2  3  3
      1  1  5  2  1
  • 23. Apply SVD for recommender systems (2/2)
    Step 1: Run SVD ($X = U \times \Sigma \times V^{T}$)
    Step 2: Focus on important features (keep U2, Σ2, V2)
    Step 3: Multiply the three matrices ($U_2 \times \Sigma_2 \times V_2^{T}$)
    Step 4: Check the values which were zero before running SVD
    Result of Step 3:
      1.08  2.24  3.29  2.98  0.84
      2.72  4.17  1.52  2.41  3.04
      1.46  2.93  4.02  3.71  1.19
      3.24  5.03  2.04  3.04  3.59
      0.57  1.64  3.86  3.18  0.15
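The four steps on slide 23 can be sketched with NumPy. This is a minimal sketch, assuming the zero-filled matrix X from slide 22 and a truncation to the two largest singular values (k = 2 is an assumption inferred from the slides' "U2, Σ2, V2" notation); variable names such as `X_k` and `alice_item5` are illustrative only.

```python
import numpy as np

# Zero-filled rating matrix from slide 22 (Alice's unknown Item5 score is 0).
X = np.array([
    [1, 3, 3, 3, 0],
    [2, 4, 2, 2, 4],
    [1, 3, 3, 5, 1],
    [4, 5, 2, 3, 3],
    [1, 1, 5, 2, 1],
], dtype=float)

# Step 1: run SVD. NumPy returns V already transposed (Vt).
U, s, Vt = np.linalg.svd(X)

# Step 2: keep only the k largest singular values and their vectors.
k = 2
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

# Step 3: multiply the three truncated matrices.
X_k = U_k @ S_k @ Vt_k

# Step 4: read off the cell that was zero before running SVD.
alice_item5 = X_k[0, 4]

print(np.round(s, 3))       # singular values, largest first
print(np.round(X_k, 2))     # rank-k approximation of X
print(round(alice_item5, 2))
```

The sign of individual columns of U and V may differ from the slides, since singular vectors are only defined up to sign; the product in Step 3 is unaffected. The slide reports 0.84 for Alice's Item5 cell, which a rank-2 truncation should reproduce up to rounding.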
  • 24. Problems on SVD
    Predicted values are often negative
    - SVD does not take the rating score range into account
    Zero replacement decreases prediction quality
    - SVD analyzes the relation between all data in the matrix
    - The meaning of "zero" is different from that of "unknown"
  • 25. Bad example of SVD-based recommendation (example from: http://smrmkt.hatenablog.jp/entry/2014/08/23/211555)
    Sparse rating table (- = no rating):
             Ghibli 1   Ghibli 2   Ghibli 3   Ghibli 4
    User1    5          -          -          -
    User2    3          4          -          -
    User3    2          -          1          -
    User4    -          5          -          4
    User5    -          -          -          5
    After zero replacement:
      5  0  0  0
      3  4  0  0
      2  0  1  0
      0  5  0  4
      0  0  0  5
    Run SVD to get U2, Σ2, V2, then multiply the matrices. Resulting values as extracted from the slide: 3.53 1.88 0.16 -0.26 3.62 4.04 0.13 2.17 1.44 0.76 0.07 -0.12 2.76 5.41 0.06 4.34 1.67 5.97 0 5.74 -0.26 2.91 -0.06 3.52
  • 26. Netflix Prize (2006-2009): Netflix held an open competition to advance collaborative filtering algorithms and to find the best one. Image ref: http://blogs.itmedia.co.jp/saito/2009/09/httpjournalmyco.html
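  • 27. Simon Funk’s Matrix Factorization (2006)
    Instead of running SVD with zero replacement, Simon Funk's method learns the matrices P and Q so that they reproduce only the observed values in R (R ≈ P × Q^T, with m users and n items).
    Target optimization function:
    $\min_{P,Q} \sum_{(u,i) \in R} \left( r_{u,i} - \mathbf{p}_u \mathbf{q}_i^{T} \right)^{2} + \lambda \left( \lVert \mathbf{p}_u \rVert^{2} + \lVert \mathbf{q}_i \rVert^{2} \right)$
    where the sum runs over the observed entries of R; u: user u; i: item i; p_u: u's latent vector; q_i: i's latent vector.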
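One common way to minimize the objective on slide 27 is stochastic gradient descent over the observed ratings. The slides do not specify the optimizer or any hyperparameters, so the sketch below is an assumption throughout: the function names `train_funk_mf` and `rmse`, the values of `k`, `lr`, `lam`, and `epochs`, and the choice of the rating table from slide 4 as training data are all illustrative.

```python
import numpy as np


def train_funk_mf(observed, n_users, n_items, k=2, lr=0.01, lam=0.02,
                  epochs=500, seed=0):
    """Learn P (users x k) and Q (items x k) from observed (u, i, r) triples."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - P[u] @ Q[i]          # error on one observed rating
            p_old = P[u].copy()            # use the pre-update p_u for q_i
            P[u] += lr * (err * Q[i] - lam * P[u])
            Q[i] += lr * (err * p_old - lam * Q[i])
    return P, Q


def rmse(observed, P, Q):
    """Root mean squared error on the observed ratings only."""
    return float(np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in observed])))


# Rating table from slide 4; None is Alice's unknown rating for Item5.
table = [
    [5, 3, 4, 4, None],
    [3, 1, 2, 3, 3],
    [4, 3, 4, 3, 5],
    [3, 3, 1, 5, 4],
    [1, 5, 5, 2, 1],
]
observed = [(u, i, r) for u, row in enumerate(table)
            for i, r in enumerate(row) if r is not None]

P0, Q0 = train_funk_mf(observed, 5, 5, epochs=0)   # untrained baseline
P, Q = train_funk_mf(observed, 5, 5)
rmse_before, rmse_after = rmse(observed, P0, Q0), rmse(observed, P, Q)

R_pred = P @ Q.T                 # predicted rating matrix R* = P x Q^T
print(rmse_before, rmse_after)
print(R_pred[0, 4])              # predicted score for Alice on Item5
```

Unlike the SVD approach, the unknown cell never enters the loss, so it is not pulled toward zero. The factor 2 from differentiating the squared error is absorbed into the learning rate.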
  • 28. Various approaches have been developed … (Ref: https://www.slideshare.net/databricks/deep-learning-for-recommender-systems-with-nick-pentreath)
    - 2003: Scalable models. Amazon's item-based CF
    - 2006-2009: Netflix Prize. The rise of matrix factorization, such as Simon Funk's method
    - 2010: Factorization machine. Generalized matrix factorization for dealing with various factors
    - 2013: Deep learning. Deep factorization machine; Content2Vec to get content embeddings
  • 30. Remaining challenges
    Cold start problem
    - How to recommend new items?
    - What to recommend to new users?
    Serendipity
    - Recommending only favorable items cannot expand user interests
    - How to recommend unexpected and surprising items?
    Providing explanations
    - Recommendation algorithms just decide what to recommend
    - How can systems persuade users to purchase recommended items?
    Reviewer's trust
    - How to remove spam reviewers?
    - How to find high-quality reviewers?
  • 32. Cold start problem
                          Item1  Item2  Item3  Item4  Item5 (new item)
          Kate (new user)   -      -      -      -      -
          User1             3      1      2      3      -
          User2             4      3      4      3      -
          User3             3      3      1      5      -
          User4             1      5      5      2      -
      CF approaches don't work for new items/users: there are no clues for predicting their unknown scores because CF cannot find neighbor users/items.
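The failure can be made concrete with the table from this slide. The sketch below assumes cosine similarity restricted to co-rated entries as the neighbor measure (one of several possible choices); the helper `cosine_on_overlap` is a hypothetical name introduced for this illustration.

```python
import numpy as np

nan = np.nan  # marks an unknown score

# Rows: User1..User4, then the new user Kate.
# Columns: Item1..Item4, then the new Item5.
ratings = np.array([
    [3, 1, 2, 3, nan],           # User1
    [4, 3, 4, 3, nan],           # User2
    [3, 3, 1, 5, nan],           # User3
    [1, 5, 5, 2, nan],           # User4
    [nan, nan, nan, nan, nan],   # Kate (new user)
])


def cosine_on_overlap(a, b):
    """Cosine similarity over co-rated entries; None when there is no overlap."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return None
    x, y = a[both], b[both]
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


kate = ratings[-1]
user_sims = [cosine_on_overlap(kate, row) for row in ratings[:-1]]

item5 = ratings[:, 4]
item_sims = [cosine_on_overlap(item5, ratings[:, j]) for j in range(4)]

print("Kate vs. existing users:", user_sims)
print("Item5 vs. existing items:", item_sims)
print("User1 vs. User2:", cosine_on_overlap(ratings[0], ratings[1]))
```

Every similarity involving Kate or Item5 is undefined because there are no co-rated entries, so neither user-based nor item-based CF has any neighbors to work with, while existing users such as User1 and User2 can still be compared.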
  • 33. Possible approach to cold start problem
      On-the-fly preference prediction: systems ask/force users to rate a set of items to understand their preferences.
      Content-based recommendation: finds similar items based on user interactions and content features.
      [Figure: a paper list ("Click to view"); papers that are textually similar (on TF-IDF, D2V, etc.) to the viewed paper are recommended as similar items.]
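As a rough sketch of the content-based idea (textual similarity on TF-IDF), the example below ranks documents by cosine similarity of their TF-IDF vectors using only the standard library. The toy "papers", the helper names, and the particular TF-IDF weighting (term frequency times log(N/df)) are assumptions made for this illustration; a real system would use full texts and a tuned weighting scheme or embeddings.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: tf-idf weight} dict."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy "papers", invented for this illustration.
papers = {
    "hoaxes": "false information hoax articles wikipedia detection",
    "fake_news": "fake news false information spread detection",
    "display": "peripheral vision display neural network images",
}
names = list(papers)
vecs = tfidf_vectors([papers[name].split() for name in names])


def most_similar(name):
    """Return the name of the paper whose text is closest to `name`."""
    i = names.index(name)
    scores = [(cosine(vecs[i], vecs[j]), names[j])
              for j in range(len(names)) if j != i]
    return max(scores)[1]


print(most_similar("hoaxes"))
```

Because the similarity depends only on item content, a newly added paper can be recommended as soon as its text is available, without waiting for any ratings.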
  • 34. Reviewer Trust
      How to find trustworthy reviewers (rating experts)?
        - People often give good scores to items
        - Some reviewers intentionally give too high/low scores to items (spammers)
      [Figure: average rating scores on Amazon.com; mean rating vs. sequence number of rating for Books, Electronics, Movies & TV, and Music.]
      Source: T. Wang, D. Wang. (2014) "Why Amazon's Ratings Might Mislead You: The Story of Herding Effects", Journal of Big Data, Vol.2, No.4, pp.196-204.