From Data to Decisions, a Mixed Path of Data Visualization and Machine Learning

Qianwen Wang
[Title-slide figure: HypoML hypothesis matrix, showing model results R(M, D), R(M, D+), R(M+, D), R(M+, D+), pairwise comparisons with p-values against a 0.05 threshold, and hypotheses H1 to H12]

Advisors: Huamin Qu, Nils Gehlenborg
[Figure: research timeline (2015, 2017, 2019, 2020) spanning Machine Learning, Data Visualization, and Human-Computer Interaction]

Machine Learning and Data Visualization: What Are We Talking About?

Machine Learning
• An ability to learn from data, extract patterns, and make decisions with minimal human intervention
Data Visualization
• An accessible way for humans to interpret data, identify patterns, and make data-driven decisions

[Diagram: Machine Learning and Data Visualization, two paths from Data to Decisions]

[Figure: Data Visualization and Machine Learning; image source: http://querytreeapp.com/blog/make-sense-with-data-visualization/]

Artificial intelligence is still human intelligence.
[Figure: Data Visualization and Machine Learning]

[Diagram: ML workflow: Problem Understanding, Data Collection, Model Development, Model Evaluation, Model Application]
[Diagram: Machine Learning and Data Visualization, two paths from Data to Decisions]
[Diagram: model of visualization (data, visualization, user): a specification turns data into an image; the user's perception of the image increases knowledge, and exploration modifies the specification]
• VIS4ML
• ML4VIS
• A better collaboration between ML and VIS

[Diagram: ML workflow: Problem Understanding, Data Collection, Model Development, Model Evaluation, Model Application]
Human intervention is needed at each step.
How can data visualization facilitate the process?

[Diagram: ML workflow: Problem Understanding, Data Collection, Model Development, Model Evaluation, Model Application]
How to choose a suitable model?

Overwhelmed by the Variety
[Figure: many Deep Neural Network (DNN) architectures]

DNN Genealogy
[Figure: a genealogy connecting DNN architectures]

Visual Genealogy of Deep Neural Networks
Qianwen Wang¹, Jun Yuan², Shuxin Chen², Hang Su², Huamin Qu¹, and Shixia Liu²
²Tsinghua University

Visualization Module
[Figure: views labeled Architecture, Evolution, and Performance]
http://dnn.hkustvis.org/

Case: Investigate Evolution Patterns
How to combine a skip connection with the main branch?
• Gate
• Addition (+)
• Concatenation (||)
• A mixture (+ and ||)
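The merge options on this slide can be illustrated on plain feature vectors. This is a minimal sketch, not code from the DNN Genealogy project. Addition corresponds to ResNet-style residual connections, concatenation to DenseNet-style connections, and the gate to Highway-network-style gating. The function names are hypothetical. A "mixture" would combine these operators within one architecture, so it is not shown separately.

```python
import math
from typing import List

Vector = List[float]


def merge_add(skip: Vector, main: Vector) -> Vector:
    """Addition: element-wise sum, as in residual connections; sizes must match."""
    if len(skip) != len(main):
        raise ValueError("addition needs vectors of equal length")
    return [s + m for s, m in zip(skip, main)]


def merge_concat(skip: Vector, main: Vector) -> Vector:
    """Concatenation: keep both feature sets side by side; the width grows."""
    return skip + main


def merge_gate(skip: Vector, main: Vector, gate_logits: Vector) -> Vector:
    """Gate: per-feature blend t * main + (1 - t) * skip, with t = sigmoid(logit).

    In a real network the gate logits come from a learned layer; here they are
    passed in directly to keep the sketch self-contained.
    """
    if not (len(skip) == len(main) == len(gate_logits)):
        raise ValueError("gating needs vectors of equal length")
    gates = [1.0 / (1.0 + math.exp(-g)) for g in gate_logits]
    return [t * m + (1.0 - t) * s for t, s, m in zip(gates, skip, main)]


if __name__ == "__main__":
    skip, main = [1.0, 2.0], [3.0, 4.0]
    print(merge_add(skip, main))               # [4.0, 6.0]
    print(merge_concat(skip, main))            # [1.0, 2.0, 3.0, 4.0]
    print(merge_gate(skip, main, [0.0, 0.0]))  # [2.0, 3.0]
```

Addition and gating keep the feature width fixed, whereas concatenation grows it. This is one practical reason why architectures differ in how they combine the branches.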

ATMSeer: Increasing Transparency and Controllability in Automated Machine Learning
Qianwen Wang, Yao Ming, Zhihua Jin, Qiaomu Shen, Dongyu Liu, Micah J. Smith, Kalyan Veeramachaneni, Huamin Qu

Developing ML Models
A model for my task
• Algorithms: SVM, MLP, Random Forest, KNN, …
• Hyperparameters: learning rate = ?, # layers = ?, batch size = ?, # neurons = ?, …

[Figure: a cloud of open questions: Support Vector Machine? Neural Network? Random Forest? K Nearest Neighbor? Linear Regression? Hidden Layer = ? Learning Rate = ? Kernel Function = ? Max Depth = ? Activation = ? Leaf Size = ? Min Samples Leaf = ? Min Samples Split = ?]
Automated Machine Learning: Make it automated!

[Figure: transparency and controllability in automated machine learning]

[Figure: ATMSeer views at three levels: Algorithm Level, Hyperpartition Level, Hyperparameter Level]
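The three levels on this slide describe a nested search space. First an algorithm is chosen, then a hyperpartition (a fixed choice of its categorical hyperparameters, such as an SVM kernel), then the remaining numeric hyperparameters. The following is a minimal random-search sketch over such a space. The search space, score function, and names are hypothetical, and the real AutoML system behind ATMSeer uses more sophisticated search strategies than uniform random sampling. Keeping the full trial history is what makes per-level summaries, like the views above, possible.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical search space, organized like the three levels on this slide:
# algorithm -> hyperpartition (fixed categorical choices) -> numeric ranges.
SEARCH_SPACE: Dict[str, Dict[str, Dict[str, Tuple[float, float]]]] = {
    "SVM": {
        "kernel=rbf": {"C": (0.01, 100.0), "gamma": (0.0001, 1.0)},
        "kernel=linear": {"C": (0.01, 100.0)},
    },
    "RandomForest": {
        "criterion=gini": {"max_depth": (2, 20)},
        "criterion=entropy": {"max_depth": (2, 20)},
    },
}

Trial = Tuple[str, str, Dict[str, float], float]


def sample_config(rng: random.Random) -> Tuple[str, str, Dict[str, float]]:
    """Pick an algorithm, then a hyperpartition, then numeric hyperparameters."""
    algorithm = rng.choice(sorted(SEARCH_SPACE))
    hyperpartition = rng.choice(sorted(SEARCH_SPACE[algorithm]))
    params: Dict[str, float] = {}
    for name, (low, high) in SEARCH_SPACE[algorithm][hyperpartition].items():
        if isinstance(low, int) and isinstance(high, int):
            params[name] = rng.randint(low, high)  # integer hyperparameter
        else:
            params[name] = rng.uniform(low, high)  # continuous hyperparameter
    return algorithm, hyperpartition, params


def random_search(score: Callable[[str, str, Dict[str, float]], float],
                  n_trials: int, seed: int = 0) -> List[Trial]:
    """Evaluate n_trials random configurations and keep the full history."""
    rng = random.Random(seed)
    history: List[Trial] = []
    for _ in range(n_trials):
        algorithm, hyperpartition, params = sample_config(rng)
        history.append((algorithm, hyperpartition, params,
                        score(algorithm, hyperpartition, params)))
    return history


def best_per_level(history: List[Trial], by_hyperpartition: bool = False) -> Dict:
    """Best score per algorithm, or per (algorithm, hyperpartition) pair."""
    best: Dict = {}
    for algorithm, hyperpartition, _, value in history:
        key = (algorithm, hyperpartition) if by_hyperpartition else algorithm
        best[key] = max(best.get(key, float("-inf")), value)
    return best
```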

[Diagram: ML workflow: Problem Understanding, Data Collection, Model Development, Model Evaluation, Model Application]
Can we conduct behavioral testing of ML models that goes beyond accuracy?

How to examine discrimination?

A College Admission Example
[Figure legend: accepted females, accepted males, rejected]
Acceptance rates: 50% > 42%. Seems unfair?

[Figure: the same applicants split into Low score and High score groups; legend: accepted females, accepted males, rejected]
Low score: 33.3% > 26.7%; High score: 75% > 65%

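[Figure: applicants further split by department (CS, EE) within the Low score and High score groups; legend: accepted females, accepted males, rejected]
Within each group the acceptance rates are equal: 20% = 20%, 40% = 40%, 60% = 60%, 80% = 80%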
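The three admission slides show a gap in overall acceptance rates that disappears once applicants are compared within the same score and department group. This is a Simpson's-paradox-style effect. The following minimal sketch reproduces the pattern with hypothetical counts (not the numbers on the slides). It uses one conditioning attribute, department, instead of the slides' two.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical counts (not the data on the slides):
# (gender, department, applicants, accepted)
RECORDS: List[Tuple[str, str, int, int]] = [
    ("male", "dept A", 80, 64),
    ("male", "dept B", 20, 4),
    ("female", "dept A", 20, 16),
    ("female", "dept B", 80, 16),
]


def acceptance_rates(records, by_department: bool) -> Dict:
    """Acceptance rate per gender, optionally conditioned on department."""
    applied: Dict = defaultdict(int)
    accepted: Dict = defaultdict(int)
    for gender, department, n_applied, n_accepted in records:
        key = (gender, department) if by_department else gender
        applied[key] += n_applied
        accepted[key] += n_accepted
    return {key: accepted[key] / applied[key] for key in applied}


if __name__ == "__main__":
    print(acceptance_rates(RECORDS, by_department=False))
    # {'male': 0.68, 'female': 0.32}: a large gap in aggregate
    print(acceptance_rates(RECORDS, by_department=True))
    # dept A accepts 80% and dept B 20% of both genders
```

The aggregate gap comes entirely from which department each group applies to, not from unequal treatment within a department.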

Two individuals who are similar with respect to a task are treated equally.
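One common way to quantify this individual-fairness idea is a k-nearest-neighbour consistency score. This score measures how often each individual receives the same decision as its most similar peers. The following is a sketch under the assumption that `features` contain only task-relevant attributes (the protected attribute is excluded). It is not necessarily the measure used in the paper.

```python
import math
from typing import Sequence


def consistency(features: Sequence[Sequence[float]],
                decisions: Sequence[int], k: int) -> float:
    """Consistency in [0, 1]: 1 - mean |y_i - mean y over i's k nearest neighbours|.

    `features` should describe task-relevant attributes only, so that
    "similar" means similar with respect to the task.
    """
    n = len(features)
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n - 1")
    total = 0.0
    for i in range(n):
        # Indices of the k other individuals closest to i (Euclidean distance).
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(features[i], features[j]),
        )[:k]
        neighbour_mean = sum(decisions[j] for j in neighbours) / k
        total += abs(decisions[i] - neighbour_mean)
    return 1.0 - total / n
```

A score of 1.0 means every individual is treated like its nearest peers, and lower values flag similar individuals who receive different decisions.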

Visual Analysis of Discrimination in Machine Learning
Qianwen Wang¹, Zhenhua Xu¹, Huamin Qu¹, Shixia Liu², Zhutian Chen¹, Yong Wang¹
²Tsinghua University

[Figure: Discriminatory Itemset; panels A and B]
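An itemset here is a combination of attribute values (for example, a department and a score range) that selects a group of individuals. The following sketch assumes that an itemset counts as discriminatory when, within that group, the positive-outcome rate of the protected group trails that of everyone else by more than a threshold (a risk difference). The attribute names and the 0.1 threshold are hypothetical, and the paper's own criterion may differ.

```python
from typing import Dict, List, Optional

Record = Dict[str, str]


def matches(record: Record, itemset: Dict[str, str]) -> bool:
    """True if the record has every attribute value in the itemset."""
    return all(record.get(attr) == value for attr, value in itemset.items())


def positive_rate(records: List[Record], outcome: str, positive: str) -> float:
    """Share of records whose outcome equals the positive label."""
    return sum(r[outcome] == positive for r in records) / len(records)


def risk_difference(records: List[Record], itemset: Dict[str, str],
                    protected_attr: str, protected_value: str,
                    outcome: str = "decision",
                    positive: str = "accept") -> Optional[float]:
    """Positive rate of the others minus that of the protected group, within
    the individuals covered by the itemset (None if either group is empty)."""
    covered = [r for r in records if matches(r, itemset)]
    protected = [r for r in covered if r[protected_attr] == protected_value]
    others = [r for r in covered if r[protected_attr] != protected_value]
    if not protected or not others:
        return None
    return (positive_rate(others, outcome, positive)
            - positive_rate(protected, outcome, positive))


def is_discriminatory(records: List[Record], itemset: Dict[str, str],
                      protected_attr: str, protected_value: str,
                      threshold: float = 0.1) -> bool:
    """Flag the itemset if the risk difference exceeds the threshold."""
    diff = risk_difference(records, itemset, protected_attr, protected_value)
    return diff is not None and diff > threshold
```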

Challenges in Analysis
• Long and Complex Definition
• Intertwining Relationship
[Figure: example discriminatory itemsets, each a long combination of attribute values]

[Figure: Attribute Matrix with axes Itemset and Attribute; example item: 23 < raised hands < 50]

Intertwining Relationships
[Figure: overlapping itemsets, visualized with RippleSet]

Designing RippleSet
[Figure legend: an item; items ∈ set A]
Each region holds the items whose set memberships are exactly the listed sets:
• ABC = (A∩B∩C)\(D∪E)
• ABE = (A∩B∩E)\(C∪D)
• BCE = (B∩C∩E)\(A∪D)
• ABCD = (A∩B∩C∩D)\E
• CD = (C∩D)\(A∪B∪E)
Items belonging to the same set are put together, using a weighted DAG and a circle packing algorithm.
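Region labels such as ABC or CD denote exclusive intersections. Computing them amounts to grouping every item by the exact combination of sets it belongs to. The following minimal sketch uses hypothetical function names and covers only this grouping step. The weighted DAG and circle-packing layout are not shown.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, Set


def exclusive_regions(sets: Dict[str, Set[Hashable]]) -> Dict[FrozenSet[str], Set[Hashable]]:
    """Group items by the exact combination of sets they belong to.

    The region keyed by {A, B, C} holds (A ∩ B ∩ C) minus every other set,
    e.g. (A∩B∩C) minus (D∪E) when the sets are A to E.
    """
    membership: Dict[Hashable, Set[str]] = defaultdict(set)
    for name, items in sets.items():
        for item in items:
            membership[item].add(name)
    regions: Dict[FrozenSet[str], Set[Hashable]] = defaultdict(set)
    for item, names in membership.items():
        regions[frozenset(names)].add(item)
    return dict(regions)


def region_label(names: FrozenSet[str]) -> str:
    """Label a region by its sets, e.g. frozenset({'C', 'D'}) -> 'CD'."""
    return "".join(sorted(names))
```

Each item lands in exactly one region, so the regions partition the items. This is what lets a layout draw every item once while still showing all of its memberships.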

• Hypothesize about the effect of the Common Orientation of an object
• Hypothesize about the effect of the Surrounding environment of an object
What concepts has the model learned?
Are the learned concepts always useful?

Black-box Analysis
[Diagram: input → model → prediction]

Black-box Analysis
[Diagram: input → model → prediction]
Prospector (Krause et al. 2016), What-If Tool (Wexler et al. 2019), Gamut (Hohman et al. 2019)
These tools examine hypotheses about how perturbations to inputs affect ML model outputs.
Not statistically meaningful:
• Only observations on individual predictions

White-box Analysis
Deconvnet (Zeiler and Fergus 2013), Guided backpropagation (Springenberg et al. 2014)
What has a neuron learned?
Not statistically meaningful:
• The depicted patterns provide largely a hunch rather than solid conclusions
Not efficient:
• It is infeasible to examine all neurons

Can we test concept-based hypotheses in an efficient and statistically meaningful way?

HypoML: Visual Analysis for Hypothesis-based Evaluation of Machine Learning Models
Qianwen Wang¹, William Alexander², Huamin Qu¹, Min Chen², Jack Pegg²

Concept-based Testing
[Diagram: ML Training. 2 pairs of datasets (D+, where D is extended with extra data that contains the testing concept; D, where D is paired with noise) are used to train 2 ML models, M+ and M.]

Concept-based Testing
[Diagram: ML Training on 2 pairs of datasets (D+ with extra data that contains the testing concept; D with noise) yields 2 ML models, M+ and M. ML Testing yields 4 sets of results, R(M, D), R(M, D+), R(M+, D), and R(M+, D+); pairs of results are compared as R(a) and R(b).]

Statistical Comparison
R(a) compared with R(b):
• significantly lower than
• significantly higher than
• insignificantly lower or higher than
Many uncontrolled variables… µ(R(a)) = 0.878 > µ(R(b)) = 0.876
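Comparing only the means (0.878 vs. 0.876) ignores the run-to-run variation caused by uncontrolled variables such as random initialization. The following sketch assumes that each condition has been trained and tested several times, yielding a sample of scores for R(a) and for R(b). It then sorts the comparison into the three categories above using a two-sided permutation test. The permutation test is a swapped-in technique for illustration, and HypoML's own statistical test may differ.

```python
import random
from typing import Sequence, Tuple


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def permutation_p_value(a: Sequence[float], b: Sequence[float],
                        n_permutations: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:len(a)]) - _mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # tolerance for floating-point rounding
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


def compare(a: Sequence[float], b: Sequence[float], threshold: float = 0.05,
            n_permutations: int = 10_000) -> Tuple[str, float]:
    """Classify R(a) relative to R(b) with the three categories of this slide."""
    p = permutation_p_value(a, b, n_permutations)
    if p >= threshold:
        return "insignificantly lower or higher than", p
    if _mean(a) > _mean(b):
        return "significantly higher than", p
    return "significantly lower than", p
```

A permutation test makes no normality assumption, which suits the small samples typical of repeated training runs.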

Top-down workflow
[Diagram: workflow linking Hypotheses, Model, Results, and Statistical Comparison]
Hypotheses:
H1. The concept is useful to M+ and would be useful to M
H2. The concept is harmful to M+ and would be harmful to M
H3. M has learned the concept ξ adequately
H4. M+ has learned the concept ξ adequately
H5. The extra information in D+ has a positive effect on M
H6. The extra information in D+ has a negative effect on M
H7. The extra information in D+ has a positive effect on M+
H8. The extra information in D+ has a negative effect on M+
H9. Learning with Dm+ affects the M part of M+ positively
H10. Learning with Dm+ affects the M part of M+ negatively
H11. Learning with Dm+ affects the extra part of M+ positively
H12. Learning with Dm+ affects the extra part of M+ negatively
[Figure: example model results for R(M, D), R(M, D+), R(M+, D), and R(M+, D+) (values 0.8133, 0.8347, 0.8365, 0.8356) and pairwise comparisons (p-values 0.446, 0.098, 0.256, 0.377, 0.061, 0.079)]

Visual Analysis of Hypotheses
[Figure: hypothesis matrix]
Model results:
• R(M, D) = 0.8757
• R(M, D+) = 0.6471
• R(M+, D) = 0.6092
• R(M+, D+) = 0.9188
Statistical comparisons (p-value threshold: 0.05):
• R(M, D+) < R(M, D), p = 0.032
• R(M+, D) < R(M, D), p = 0.002
• R(M+, D+) > R(M, D), p = 0.002
• R(M+, D+) > R(M, D+), p = 0.015
• R(M+, D) < R(M, D+), p = 0.405
• R(M+, D+) > R(M+, D), p = 0.002
Columns H1 to H12: the hypotheses of the top-down workflow.
Legend: a hypothesis is supported, unproven, or rejected based on the analyses in the rows.

[Figure: the hypothesis matrix again (model results, the six statistical comparisons, and hypotheses H1 to H12), now showing how each analysis relates to each hypothesis]
Legend: the analysis in a row rejects, supports, leaves unproven, is conditional on, or is unrelated to the hypothesis in a column.

[Figure: the hypothesis matrix again, with each comparison marked as statistically significant or insignificant]
Legend: the difference is statistically significant or insignificant (p-value threshold: 0.05).

Testing Concept: Color Space
[Figure: color spaces RGB, CIELAB, HSV, HSL, CMYK, YCrCb]
How does the concept Color Space influence the ML model?
RGB vs. HSV

Color Space: Experiment Design
[Diagram: D+ pairs RGB images with their HSV version; D pairs RGB images with noise; models M+ and M]
How to merge?
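As we read the diagram, D+ adds the HSV version of each image to the RGB input, while D fills that extra input with noise. The following minimal sketch builds both inputs. It assumes that images are nested lists of (r, g, b) floats in [0, 1] and that the noise is uniform. The function names are hypothetical. The conversion itself uses Python's standard `colorsys.rgb_to_hsv`.

```python
import colorsys
import random
from typing import List, Tuple

Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]


def to_hsv(image: Image) -> Image:
    """Convert every (r, g, b) pixel with channels in [0, 1] to (h, s, v)."""
    return [[colorsys.rgb_to_hsv(*pixel) for pixel in row] for row in image]


def noise_like(image: Image, rng: random.Random) -> Image:
    """Uniform random 3-channel pixels with the same height and width."""
    return [[(rng.random(), rng.random(), rng.random()) for _ in row] for row in image]


def make_inputs(image: Image, seed: int = 0) -> Tuple[Tuple[Image, Image], Tuple[Image, Image]]:
    """Return (D+ input, D input): RGB paired with HSV, or RGB paired with noise."""
    rng = random.Random(seed)
    d_plus = (image, to_hsv(image))
    d = (image, noise_like(image, rng))
    return d_plus, d
```

Keeping the noise the same shape as the HSV input means M and M+ can share one architecture, so any difference in results can be attributed to the information in the extra channel.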

Color Space: Experiment Design
How to merge?
[Diagram: CNN variants (Conv2D, Max Pooling, Conv2D, Max Pooling, Flatten, Dropout, Dense) with two input branches merged at maxpool 1 or maxpool 2, using add or max]
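The following toy sketch shows why the merge position and the merge method can change the outcome. It is a parameter-free stand-in: real branches have learned convolution weights, whereas the stages here are plain 2x2 max pooling, so the effect is visible with ordinary numbers. All names are hypothetical.

```python
import operator
from typing import Callable, List

FeatureMap = List[List[float]]


def maxpool2x2(fm: FeatureMap) -> FeatureMap:
    """2x2 max pooling with stride 2 (odd trailing rows/columns are dropped)."""
    return [
        [max(fm[i][j], fm[i][j + 1], fm[i + 1][j], fm[i + 1][j + 1])
         for j in range(0, len(fm[0]) - 1, 2)]
        for i in range(0, len(fm) - 1, 2)
    ]


MERGE_OPS = {"add": operator.add, "max": max}


def merge(a: FeatureMap, b: FeatureMap, method: str) -> FeatureMap:
    """Element-wise merge ("add" or "max") of two equally shaped feature maps."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature maps must have the same shape")
    op = MERGE_OPS[method]
    return [[op(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def two_branch(x_rgb: FeatureMap, x_hsv: FeatureMap,
               stages: List[Callable[[FeatureMap], FeatureMap]],
               merge_after: int, method: str) -> FeatureMap:
    """Run both inputs through the first `merge_after` stages separately,
    merge them, then run the remaining stages on the merged map."""
    a, b = x_rgb, x_hsv
    for stage in stages[:merge_after]:
        a, b = stage(a), stage(b)
    merged = merge(a, b, method)
    for stage in stages[merge_after:]:
        merged = stage(merged)
    return merged
```

With `stages = [maxpool2x2, maxpool2x2]`, merging with "add" after the first pooling and after the second can produce different outputs for the same inputs. This mirrors the slide's observation that results depend on where and how the branches are merged.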

Color Space: Results
[Figure: results when merging at maxpool 1]
The information from another color space, HSV, contributes to the prediction of this model.

Color Space: Results
[Figure: results when merging at maxpool 1 vs. maxpool 2]
The hypothesis testing results change when we merge at different positions.

Color Space: Results
[Figure: results when merging with add vs. max]
Merging using different methods

[Diagram: ML workflow: Problem Understanding, Data Collection, Model Development, Model Evaluation, Model Application]
What can data visualization do to facilitate the application of ML in a specific domain?

DrugxAI: Interactive Visualization for Explainable AI in Drug Discovery
Qianwen Wang, Nils Gehlenborg, Kexin Huang, Payal Chandak, Marinka Zitnik
[Figure: a heterogeneous biomedical knowledge graph. Node types: Drug, Disease, Protein/Gene, Anatomy, Molecular Function, Cellular Component, Biological Process, Phenotype/Effect, Reactome Pathway. Edge labels include indication, contraindication, and off-label use (drug-disease); drug side effects; disease symptoms/phenotypes; present, absent.]
Data about biomedicine: relationships about drugs, diseases, proteins, pathways, and effects, as a heterogeneous graph

The challenges go beyond just providing explanations:
1) find a form of explanation that doctors can easily interpret in the context of biomedicine
2) present the explanations in a scalable, effective, and steerable way
[Diagram: known relationships → deep learning → new therapeutic use; annotated with the knowledge learned by this model and the reasons for this prediction]
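One candidate form of explanation that a domain expert can read is a short chain of known relationships linking the drug to the disease (for example, drug → protein → disease). The following sketch enumerates such paths breadth-first, so shorter explanations come first. The toy graph, node names, and relation names are hypothetical, and this is not presented as DrugxAI's algorithm.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical toy knowledge graph: node -> [(relation, neighbour), ...]
TOY_GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "drug:X": [("targets", "protein:P1"), ("targets", "protein:P2")],
    "protein:P1": [("participates_in", "pathway:W")],
    "protein:P2": [("associated_with", "disease:Y")],
    "pathway:W": [("disrupted_in", "disease:Y")],
}


def explanation_paths(graph: Dict[str, List[Tuple[str, str]]],
                      source: str, target: str, max_hops: int) -> List[List[str]]:
    """Breadth-first enumeration of simple paths from source to target.

    Each path alternates node, relation, node, ..., so shorter paths come first.
    """
    paths: List[List[str]] = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        node, hops = path[-1], (len(path) - 1) // 2
        if node == target and hops > 0:
            paths.append(path)
            continue
        if hops == max_hops:
            continue
        for relation, neighbour in graph.get(node, []):
            if neighbour not in path[::2]:  # keep paths simple (no repeated nodes)
                queue.append(path + [relation, neighbour])
    return paths
```

Capping `max_hops` keeps the number of candidate explanations manageable, which is one way to address the scalability concern above.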

[Diagram: the model of visualization (data, specification, image, knowledge, exploration; modify specification, increase knowledge) alongside Machine Learning and Data Visualization, between Data and Decisions]

[Diagram: Data Visualization, modeled as data and a specification producing an image; perception of the image increases knowledge; exploration modifies the specification]
J. J. Van Wijk, “The value of visualization”, 2005
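The diagram follows Van Wijk's model of visualization. As we recall that paper, the model can be written with data D, specification S, image I, knowledge K, perception P, and exploration E as follows:

```latex
\begin{aligned}
I(t) &= V(D, S, t) \\
\frac{dK}{dt} &= P(I, K) \\
\frac{dS}{dt} &= E(K)
\end{aligned}
```

Integrating the second equation gives the knowledge gained over a session, $K(t) = K_0 + \int_0^t P(I, K)\,dt$. Exploration feeds that knowledge back into the specification.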

Applying Machine Learning Advances to Data Visualization: A Survey of ML4VIS
Qianwen Wang, Huamin Qu, Zhutian Chen, Yong Wang

[Figure: distribution of the surveyed ML4VIS papers. By venue: VIS 50, HCI 15, ML 9, DMM 7, Other 4. By year: 2009: 1, 2010: 0, 2011: 1, 2012: 1, 2013: 1, 2014: 3, 2015: 1, 2016: 5, 2017: 9, 2018: 19, 2019: 28, 2020 onward: 16]

[Diagram: six ML4VIS processes connecting DATA, VIS, and USER: VIS-driven Data Processing, Data Presentation, Insight Communication, Style Imitation, VIS Perception, VIS Interaction (driven by User Action)]

[Diagram: VIS-driven Data Preprocessing: Raw Data → Processed Data, driven by VIS]
Luo et al., Interactive Cleaning for Progressive Visualization through Composite Questions, 2020

[Diagram: Data Presentation: Processed Data → VIS]
Dibia and Demiralp, Data2Vis, 2018
Hu et al., VizML, 2018

[Diagram: Insight Communication: Insights → VIS]
Qian et al. 2020; Wang et al. 2020

[Diagram: Style Imitation: Data + VIS of a specific style → VIS]
Tang et al., PlotThread, 2020; Wu et al., MobileVisFixer, 2020; Smart et al., 2019

DeepDrawing: A Deep Learning Approach to Graph Drawing
Yong Wang, Zhihua Jin, Qianwen Wang, Weiwei Cui, Tengfei Ma, Huamin Qu
[Diagram: Style Imitation: Graph Data → Graph Drawing, learned from Graph Drawing Samples]
A graph-based LSTM for learning graph drawing:
• The curved green arrows (real edges of graphs) explicitly reflect the actual graph structure.
• The dotted yellow arrows (“fake” edges) propagate the prior nodes’ overall influence on the drawing of subsequent nodes.

Baseline model: a 4-layer bidirectional LSTM

[Diagram: VIS Perception: VIS → Data, Style, Insights]
Bylinskii et al. 2017; Poco et al. 2017; Kafle et al. 2018

Towards Automated Infographic Design: Deep Learning-based Auto-Extraction of Extensible Timeline
Zhutian Chen, Yun Wang, Qianwen Wang, Yong Wang, and Huamin Qu
[Diagram: VIS Perception: bitmap visualization → Mask-RCNN → post-processing based on GrabCut → visualization specification]
"encoding": { "x": { "field": "sale", "scale": { "bandSize": 30 }, "type": "quantitative" ..


[Diagram: VIS Interaction: User Action on VIS → VIS]
Chen et al. 2020; Ottley et al. 2019

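ID Paper Venue Year, followed by X marks for the visualization processes (VIS-driven Data Processing, Present Data, Communicate Insight, Imitate Style, VIS Perception, VIS Interaction) and ML tasks (Clustering, Dimension Reduction, Generative, Classification, Regression, Semi-supervised, Reinforcement)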
1 Gotz and Wen [86] IUI 2009 X X X
2 Savva et al. [107] UIST 2011 X X
3 Key et al. [11] SIGMOD 2012 X X
4 Steichen et al. [84] IUI 2013 X X
5 Brown et al. [62] TVCG 2014 X X
6 Lalle et al. [83] IUI 2014 X X
7 Toker et al. [12] IUI 2014 X X
8 Sedlmair and Aupetit [13] CGF 2015 X X
9 Mutlu et al. [14] TiiS 2016 X X
10 Aupetit and Sedlmair [95] PVis 2016 X X
11 Siegel et al. [102] ECCV 2016 X X
12 Kembhavi et al. [92] ECCV 2016 X x
13 Al-Zaidy et al. [15] AAAI 2016 X X
14 Poco et al. [88] VIS 2017 X X
15 Kwon et al. [74] VIS 2017 X x
16 Bylinskii et al. [64] UIST 2017 X X
17 Saha et al. [117] IJCAI 2017 X X
18 Kruiger et al. [16] EuroVis 2017 X X
19 Poco and Heer [89] EuroVis 2017 X X
20 Jung et al. [99] CHI 2017 X X
21 Bylinskii et al. [100] arxiv 2017 X X X
22 Al-Zaidy and Giles [17] AAAI 2017 X X
23 Siddiqui et al. [61] VLDB 2018 X X
24 Gramazio et al. [85] VIS 2018 X X
25 Moritz et al. [18] VIS 2018 X X x
26 Berger et al. [68] VIS 2018 X X
27 Wang et al. [53] VIS 2018 X X
28 Haehn et al. [19] VIS 2018 X x
29 Luo et al. [57] SIGMOD 2018 X X x
30 Milo and Somech [80] KDD 2018 X X
31 Zhou et al. [20] IJCAI 2018 X X
32 Kahou et al. [101] ICLR 2018 X X
33 Luo et al. [65] ICDE 2018 X X
34 Fan and Hauser [79] EuroVis 2018 X X
35 Chegini et al. [96] EuroVis 2018 X X
36 Kafle et al. [63] CVPR 2018 X X x
37 Kim et al. [106] CVPR 2018 X x
38 Battle et al. [108] CHI 2018 X X
39 Dibia and Demiralp [54] CGA 2018 X X
40 Haleem et al. [94] CGA 2018 X X
41 Madan et al. [103] arxiv 2018 X x X
42 Yu and Silva [82] VIS 2019 X X
43 He et al. [69] VIS 2019 X X
44 Chen et al. [59] VIS 2019 X X
45 Han and Wang [67] VIS 2019 X X
47 Kwon and Ma [75] VIS 2019 X X
48 Wang et al. [2] VIS 2019 X x
49 Han et al. [120] VIS 2019 X X x
50 Wall et al. [111] VIS 2019 X X
51 Fujiwara et al. [118] VIS 2019 X X
52 Fu et al. [3] VIS 2019 X x X
53 Porter et al. [21] VIS 2019 X X
54 Jo and Seo [119] VIS 2019 X X x
55 Ma et al. [93] VIS 2019 X X
56 Wang et al. [73] VIS 2019 x X
57 Cui et al. [56] VIS 2019 X X
58 Chen et al. [5] VIS 2019 x X
60 Smart et al. [58] VIS 2019 X X
61 Huang et al. [104] VIS 2019 X X
62 Hong et al. [23] PacificVis 2019 X X
63 Fan and Hauser [122] EuroVis 2019 X X
64 Ottley et al. [60] EuroVis 2019 X X
65 Abbas et al. [121] EuroVis 2019 X x x
66 Kassel and Rohs [24] EuroVis 2019 X X X
67 Hu et al. [66] CHI 2019 X X
68 Fan and Hauser [25] CGA 2019 X X
69 Kafle et al. [26] arxiv 2019 X X
70 Mohammed [27] VLDB 2020 X x
71 Zhang et al. [90] VIS 2020 x x x
72 Wu et al. [77] VIS 2020 x x
73 Tang et al. [76] VIS 2020 x x
74 Qian et al. [28] VIS 2020 x x
76 Fosco et al. [112] UIST 2020 x x
77 Giovannangeli et al. [139] PacificVis 2020 x x
78 Liu et al. [105] PacificVis 2020 x x x
79 Luo et al. [52] ICDE 2020 X x x
80 Lekschas et al. [113] EuroVis 2020 x x X x
81 Zhao et al. [30] CHI 2020 X x
82 Lai et al. [31] CHI 2020 x x x
83 Kim et al. [32] CHI 2020 x x x
84 Lu et al. [33] CHI 2020 x x x
85 Zhou et al. [109] arxiv 2020 X X
Machine Learning Tasks: Clustering, Dimension Reduction, Generation, Classification, Regression, Semi-supervised Learning, Reinforcement Learning
Visualization Process: VIS-driven Data Processing, Data Presentation, Insight Communication, Style Imitation, VIS Perception, VIS Interaction

Classification is the most widely used ML task among the surveyed papers.

This might be caused by the success of deep learning in computer vision tasks.

We need to better embrace the diversity of ML techniques.

ML4VIS: Opportunities & Challenges
• Public High-quality Datasets & Benchmark Tasks
• Visualization-Tailored Machine Learning
• User-Friendly ML4VIS

ML4VIS: Opportunities & Challenges
Public High-quality Datasets & Benchmark Tasks
• Most papers constructed their own datasets due to the lack of public visualization datasets
• The dataset quality may endanger the validity of the obtained ML models, e.g., DeepEye [Luo et al. 2019] learns to classify “good”/“bad” visualizations based on training examples labelled by 100 students
• Benchmark tasks for ML4VIS remain unclear

ML4VIS: Opportunities & Challenges
Visualization-Tailored Machine Learning
• Most ML4VIS studies directly apply general techniques developed in the field of ML
• General ML techniques do not always suit the specific problems in visualization

ML4VIS: Opportunities & Challenges
User-Friendly ML4VIS
• The employment of ML not only provides opportunities but also poses new challenges in designing visualizations
• Some ML4VIS studies have discussed the usability issues of ML4VIS, but these suggestions are scattered among different papers
• Future studies are needed to help designers better understand user behaviours and expectations in this new ML4VIS scenario
[Image source: https://qarea.com/blog/5-tips-for-creating-user-friendly-interface]

Machine Learning or Data Visualization?
Machine Learning + Data Visualization + Humans
[Figure: Pure Machine Learning and Pure Data Visualization placed along two axes: Amount of Information (Few to Large, with a Human Head marker) and Task Definition (Fuzzy to Clear)]
There is no panacea.
A better combination of the power of visualization, machine learning, and human users:
• How to split tasks
• How to dynamically modify the splitting based on user preference and expertise
• How to design novel algorithms & visualizations for the collaboration

Thanks!
https://wangqianwen0418.github.io/
qianwen_wang@hms.harvard.edu
[Summary diagram: the ML workflow (Problem Understanding, Data Collection, Model Development, Model Evaluation, Model Application); Machine Learning and Data Visualization between Data and Decisions; the six ML4VIS processes (VIS-driven Data Processing, Data Presentation, Insight Communication, Style Imitation, VIS Perception, VIS Interaction) connecting USER, VIS, and DATA]
