Slides for my talk at the VAST 2016 conference within IEEE VIS 2016. The details of the presented paper can be found on this page: http://www.gicentre.net/featuredpapers/#/turkaydesigning2016/
Designing Progressive and Interactive Analytics Processes for High-Dimensional Data Analysis
1. Cagatay Turkay
Erdem Kaya
Selim Balcisoy
Helwig Hauser www.gicentre.net/vis2016
Designing Progressive and Interactive
Analytics Processes
for High-Dimensional Data Analysis
2. Visual analytics (VA) can be considered as an
interactive and iterative dialogue between the
human and the computer where the interactive
analysis process is a sequence of actions by the
user and responses by the computer motivated by
an analytical question …
Based on several works, e.g., [Keim et al. 2008], [Green et al. 2008]
3. …. this iterative discourse serves as the fundamental
mechanism through which analysts make observations
… we need to ensure that this dialogue takes place at a
pace that respects the temporal cognitive capabilities
of users …
8. This paper …
… visual data analysis processes where a computational
tool is integrated to support high-dimensional data analysis
9. … instead of forcing the user to wait for an interactive computation to
finish, we present a best possible result within an acceptable time
frame.
In essence ..
10. …. techniques and design considerations to
incorporate progressive methods within interactive
analysis processes that involve high-dimensional data ….
Online
algorithms
Visual
Representations
Levels of
Operation
Interactions
12. Levels of Operation
Level 1 (0.1 sec.)
What: (animated) transitions between (computation) results
Why: ensures perceptually smooth transitions
Level 2 (1 sec.)
What: guaranteed response time for intermediate results
Why: Maintains dialog nature
Level 3 (10 – 30 sec.)
What: analytical unit task completion
Why: Answer a specific question
e.g., finding groups, locating outliers
A framework to implement human time constants
13. Integrating online algorithms
Can operate on small batches of data (on random sample subsets)
Produce approximate results
Updates can be done efficiently
Online PCA (Ross et al., 2008) Online clustering (Sculley et al., 2010)
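The three properties above (small random batches, approximate results, cheap updates) can be illustrated with a minimal sketch in the spirit of mini-batch k-means. This is not code from the talk: the function names, the per-center learning rate 1/count, and the explicit `init_centers` argument are assumptions made for illustration.

```python
import random


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])),
    )


def minibatch_update(centers, counts, batch):
    """Update centers in place from one batch.

    Assignments are computed against the centers as they were at the start
    of the batch; each center then moves towards its assigned points with a
    learning rate that shrinks as the center sees more data.
    """
    assignments = [nearest(x, centers) for x in batch]
    for x, j in zip(batch, assignments):
        counts[j] += 1
        eta = 1.0 / counts[j]
        centers[j] = [(1.0 - eta) * c + eta * xi for c, xi in zip(centers[j], x)]


def progressive_kmeans(data, init_centers, batch_size, n_batches, seed=0):
    """Yield an approximate set of centers after every random batch."""
    rng = random.Random(seed)
    centers = [list(c) for c in init_centers]
    counts = [0] * len(centers)
    for _ in range(n_batches):
        batch = rng.sample(data, batch_size)  # random subset, no replacement
        minibatch_update(centers, counts, batch)
        yield [list(c) for c in centers]  # intermediate result to display
```

Each yielded snapshot is an intermediate result that a view could render while later batches are still being processed.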
14. online PCA
- Incremental SVD computation
- Intermediate results every 1 sec.
- Immediate response
- Compute on subset but project all
- Colouring & improved transitions
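The "compute on subset but project all" idea can be sketched as follows. Note the substitution: instead of the incremental SVD named on the slide, this sketch keeps a running mean and covariance (Welford-style updates) and extracts only the leading axis by power iteration. All names are hypothetical, and the power iteration assumes its start vector is not orthogonal to the leading axis.

```python
import random


class RunningCovariance:
    """Mean and covariance accumulated one item at a time."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [[0.0] * dim for _ in range(dim)]  # co-moment sums

    def update(self, batch):
        for x in batch:
            self.n += 1
            before = [xi - mi for xi, mi in zip(x, self.mean)]
            self.mean = [mi + di / self.n for mi, di in zip(self.mean, before)]
            after = [xi - mi for xi, mi in zip(x, self.mean)]
            for i, di in enumerate(before):
                for j, dj in enumerate(after):
                    self.m2[i][j] += di * dj

    def covariance(self):
        denom = max(self.n - 1, 1)
        return [[v / denom for v in row] for row in self.m2]


def leading_component(cov, iters=100):
    """Leading eigenvector of cov by power iteration (unit length)."""
    dim = len(cov)
    v = [dim ** -0.5] * dim
    for _ in range(iters):
        w = [sum(c * vj for c, vj in zip(row, v)) for row in cov]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm == 0.0:  # no variance seen yet; keep the current guess
            return v
        v = [wi / norm for wi in w]
    return v


def progressive_pca_1d(data, batch_size, seed=0):
    """After each batch, yield (axis, projection of ALL items onto axis)."""
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)  # visit the items in random order
    stats = RunningCovariance(len(data[0]))
    for start in range(0, len(order), batch_size):
        stats.update([data[i] for i in order[start:start + batch_size]])
        axis = leading_component(stats.covariance())
        projected = [
            sum((xi - mi) * ai for xi, mi, ai in zip(x, stats.mean, axis))
            for x in data
        ]
        yield axis, projected
```

The model is fitted only on the items seen so far, yet every item receives a position at each step, so the view never shows a partial point set.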
15. Adaptive random sampling
Guarantee response in a fixed period of time (i.e. 1 sec.)
Faster convergence
1st batch size: 8%, time taken: 0.3 sec.
2nd batch size: 16%, time taken: 0.7 sec.
3rd batch size: 33%, time taken: 1.3 sec.
4th batch size: 25%, time taken: 0.9 sec.
5th batch size: 25%, time taken: 0.9 sec.
Let’s say these are your data items
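One possible heuristic for adapting the batch size to a fixed time budget is sketched below. It is an assumption loosely modelled on the example sequence of batch sizes and timings above, not the authors' published method: the function name, the 0.8 "comfortably under budget" threshold, and the growth factor of 2 are all illustrative choices.

```python
def next_batch_fraction(fraction, time_taken, budget=1.0,
                        grow_below=0.8, growth=2.0,
                        min_fraction=0.01, max_fraction=1.0):
    """Suggest the share of the data to use in the next batch.

    - Well under budget (time_taken < grow_below * budget): grow the batch.
    - Over budget: shrink in proportion to the overshoot.
    - Otherwise: keep the current size.
    """
    if time_taken < grow_below * budget:
        proposal = fraction * growth
    elif time_taken > budget:
        proposal = fraction * budget / time_taken
    else:
        proposal = fraction
    # Clamp so the batch is never empty and never exceeds the full data set.
    return max(min_fraction, min(max_fraction, proposal))
```

With a 1-second budget, a batch of 8% that took 0.3 sec. would be followed by 16%, while a batch of 32% that took 1.3 sec. would be cut back to roughly 25%.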
18. Interaction methods to moderate the process
Key-framed brushing [Turkay, 2014]
Well-defined sequences that can be represented in 30 sec.
Help define analytical unit task
19. Evaluation Workshops
Problem: Credit card transactions segmentation -- groups of expenditures with similar
characteristics
Data: 300K+ CC transactions, 5K customers (demog., location, financial metrics, etc.)
Methodology:
- 2-month-long case study, 4 analysis sessions
(1 for training) with 4 CRM analysts
- Fly-on-the-wall observations
- Semi-structured interviews
- Video and sound recorded, renounce times noted,
32 hours of video processed for the extraction
of inference moments and quotes transcribed.
- Insights, questions, hypotheses identified
21. Observed/reported positive aspects
- Generation and verification of hypotheses in short time
- Continuous engagement
“We could generate so many new hypotheses in a very short time without
waiting for the whole calculation to end.”
“..., [Visualization] is quite engaging as we don’t have to wait for even a
moment to get some initial results.”
22. Observed/reported positive aspects
- Generation and verification of hypotheses in short time
- Continuous engagement
- Stability is key in decisions
“… It seems like the clustering will not change. ... let’s switch to some
other set …”
23. Observed/reported issues
- Continuous update of the visualization can be distracting
“... it can be distracting to look at an ever-changing visualization. [If we were]
able to set the step size, … then we can have some time to talk about
intermediate results.”
- Uncertainty and instability are issues
“…. I’ve just seen a high response score for the selected cluster, but it has just
gone away.”
- Early decisions might be wrong
- Multiple views operating concurrently can be problematic
Unaligned convergence
24. Ten Design Recommendations
DR1: Employ human time constants as the underlying theoretical
framework that governs the pace of interaction in analytical processes
DR2: Employ online learning algorithms that are capable of handling
data in sub-batches to perform computational tasks.
DR3: Employ an adaptive sampling mechanism that estimates suitable sample sizes for
computations to ensure efficiency in convergence
while still respecting the temporal constraints.
DR4: Facilitate the immediate initiation of computations in response
to user interactions that limit the domain of the algorithms.
DR5: Provide users with interaction mechanisms enabling management (pause, step size,
re-run) of the progression.
DR6: During the interaction design of visual analytic solutions, consider the effects of possible
fluctuations due to unaligned progression
in multiple progressive views.
DR7: Provide interaction mechanisms to define structured investigation sequences for
systematic generation and comparisons of computational results.
DR8: Support the interpretation of the evolution of the results through
suitable visualization techniques.
DR9: Inform analysts on the progress of computations and indications
of time-to-completion.
DR10: Inform analysts on the uncertainty in the computations and the
way the computations develop.
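As an illustration of DR5 and DR9, the sketch below wraps any progressive computation (modelled here as an iterator of intermediate results) in a small controller that supports pausing, a configurable step size, re-running, and a progress indicator. The class and its method names are hypothetical and not taken from the paper.

```python
class Progression:
    """User-facing control over a progressive computation."""

    def __init__(self, make_steps, total_steps, step_size=1):
        self._make_steps = make_steps  # zero-argument callable -> new iterator
        self.total_steps = total_steps
        self.step_size = step_size     # steps consumed per display update
        self.paused = False
        self.rerun()

    def rerun(self):
        """Restart the computation from scratch (DR5: re-run)."""
        self._steps = self._make_steps()
        self.done_steps = 0
        self.latest = None

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False

    def advance(self):
        """Consume up to step_size steps unless paused; return latest result."""
        if self.paused:
            return self.latest
        for _ in range(self.step_size):
            try:
                self.latest = next(self._steps)
            except StopIteration:
                break
            self.done_steps += 1
        return self.latest

    @property
    def progress(self):
        """Share of steps completed, for a progress display (DR9)."""
        if not self.total_steps:
            return 1.0
        return self.done_steps / self.total_steps
```

A larger `step_size` means fewer visual updates per unit of work, which relates to the analysts' request to set the step size so that intermediate results stay on screen long enough to discuss.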
25. Future challenges & opportunities
Better heuristics/quality metrics
Reproducibility?
- different samples in each run
Provenance
26. …. instead of forcing the user to adjust to the temporal and
cognitive capabilities of visual analysis solutions, we orient the
technical solutions at the communication characteristics of
the users.
To conclude …
27. Cagatay Turkay
Erdem Kaya
Selim Balcisoy
Helwig Hauser
Designing Progressive and Interactive
Analytics Processes
for High-Dimensional Data Analysis
Full list of giCentre VIS 2016 contributions
www.gicentre.net/vis2016