Data visualization is often used as the first step while performing a variety of analytical tasks. With the advent of large, high-dimensional datasets and strong interest in data science, there is a need for tools that can support rapid visual analysis. In this paper we describe our vision for a new class of visualization recommendation systems that can automatically identify and interactively recommend visualizations relevant to an analytical task.
Towards Visualization Recommendation Systems
1. Aditya Parameswaran
Assistant Professor
University of Illinois
(w/ Manasi Vartak, Samuel Madden @ MIT;
Tarique Siddiqui, Silu Huang @ Illinois)
http://data-people.cs.illinois.edu
DSIA Workshop, VIS 2015
2. “Bring out your dead!” courtesy Monty Python
The Dark Ages of Visualization Recommendations
Substantial manual effort and tedious trial-and-error
3. To the Age of Enlightenment:
the Holy Grail
Can we build systems that automatically recommend
visualizations highlighting patterns of interest?
“The Holy Grail” courtesy Monty Python
4. Why now?
Reason 1: Too much data: records and attributes
Most of the dataset is unexplored!
7. Limitations in Current Tools
• Big Picture
– Poor comprehension of context
• Analyst Preferences
– Limited understanding of user interests
• Specification
– Insufficient means to specify trends of interest
• Exploration
– Inadequate navigation to unexplored areas
8. Recent Attempts at Vizrec Systems
• Tableau Elastic
• Voyager
• Harvest
• Profiler
• Our systems
– SeeDB [VLDB 14 x 2, VLDB 16]
– zenvisage [unpublished]
This conference!
Still early days!
9. SeeDB: Comparative Tasks
Task:
Compare staplers (target, query)
with other products
Results:
Visualizations where staplers
“differ most” from other products
Issue: Many attributes, hence many, many visualizations!
[Figure: example bar charts over states (MA, CA, IL, NY) comparing Stapler sales vs. Other sales, and Stapler prod vs. Other prod]
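The comparative task above can be illustrated with a minimal sketch of deviation-based ranking: for each candidate view (a grouping attribute plus a measure), compare the normalized aggregate for the target subset (staplers) against the reference subset (all other products), and surface the views where they differ most. This assumes SUM aggregates and an L1 distance purely for illustration; the data, the function names (`normalized_sums`, `l1_deviation`, `recommend`), and the distance choice are hypothetical, not SeeDB's actual implementation. Note how the candidate set is the product of dimensions and measures, which is why many attributes mean many, many visualizations.

```python
from collections import defaultdict
from itertools import product

# Made-up example data: one row per (product, state, channel) with a sales total.
ROWS = [
    {"product": "stapler", "state": "MA", "channel": "web", "sales": 40},
    {"product": "stapler", "state": "CA", "channel": "store", "sales": 10},
    {"product": "stapler", "state": "NY", "channel": "web", "sales": 10},
    {"product": "stapler", "state": "NY", "channel": "store", "sales": 40},
    {"product": "pen", "state": "MA", "channel": "web", "sales": 10},
    {"product": "pen", "state": "CA", "channel": "store", "sales": 40},
    {"product": "pen", "state": "NY", "channel": "web", "sales": 40},
    {"product": "pen", "state": "NY", "channel": "store", "sales": 10},
]


def normalized_sums(rows, dim, measure):
    """SUM(measure) GROUP BY dim, rescaled so the group totals add up to 1."""
    sums = defaultdict(float)
    for row in rows:
        sums[row[dim]] += row[measure]
    total = sum(sums.values())
    return {key: value / total for key, value in sums.items()} if total else {}


def l1_deviation(p, q):
    """L1 distance between two normalized distributions; missing groups count as 0."""
    return sum(abs(p.get(key, 0.0) - q.get(key, 0.0)) for key in set(p) | set(q))


def recommend(rows, is_target, dims, measures, k=1):
    """Score every (dimension, measure) view by target-vs-reference deviation; return the top k."""
    target = [row for row in rows if is_target(row)]
    reference = [row for row in rows if not is_target(row)]
    # len(dims) * len(measures) candidate views: this product explodes with many attributes.
    scored = [
        (l1_deviation(normalized_sums(target, d, m), normalized_sums(reference, d, m)), d, m)
        for d, m in product(dims, measures)
    ]
    scored.sort(reverse=True)
    return scored[:k]
```

Here `recommend(ROWS, lambda r: r["product"] == "stapler", ["state", "channel"], ["sales"])` ranks sales-by-state first (deviation of about 0.6), because in this toy data staplers and pens split their sales across states very differently but identically across channels.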
13. A Clarion Call to DSIA Researchers…
Visualization Recommendation Systems:
are critically important
are timely
lead to interesting viz, DB, ML, and HCI problems
Let’s move towards the age of enlightenment!
“The Holy Grail” courtesy Monty Python
data-people.cs.illinois.edu/papers/dsia.pdf
14. Ongoing Projects in Interactive Analytics
Minimizing effort & maximizing efficiency
http://data-people.cs.illinois.edu
• Data Manipulation [VLDB’15 x 2]
• Data Visualization [VLDB ’14 x 2, VLDB ’15, VLDB ’16]
• Data Collaboration [VLDB ’15 x 2, CIDR ’15, TAPP ’15]
• Data Processing with Crowds [VLDB ’15, HCOMP ’15, KDD ’15]
datahub
Recent Papers, Demos
POPULACE
16. Research Thrust II: Crowds
Minimizing cost and maximizing accuracy in
human-powered data management
Data Processing Algorithms
Auxiliary Plugins: Quality, Pricing
Data Processing Systems
Filter [SIGMOD12, VLDB14]  Max [SIGMOD12]
Clean [KDD12, TKDD13]  Categorize [VLDB11]
Search [ICDE14]  Debug [NIPS12]  Count [HCOMP15]
Deco [CIKM12, VLDB12, TR12, SIGMOD Record 12]
DataSift [HCOMP13, SIGMOD14]  HQuery [CIDR11]
Conf [KDD13, ICDE15]  Evict [TR12]  Debias [KDD15]
Pricing [VLDB15]  Quality [HCOMP14]
17. Human-in-the-Loop Data Management
Dual personalities
• Analysts supervising the analysis
– How do we help them get the insights they want?
• Crowds helping the analysis
– How do we best make use of them to process data?
20. User Study
Part I: Validate utility metric vs. other metrics
– See paper!
Part II: Study impact of recommendations
– H1: SeeDB finds interesting visualizations faster
– H2: Users prefer tool w/ recommendations
21. I. SeeDB enables faster analysis
• Users view more visualizations with SeeDB
• Users bookmark more visualizations with SeeDB
• Bookmark rate 3X higher with SeeDB
              # charts       # bookmarks    Bookmark rate
Manual        6.3 ± 3.8      1.1 ± 1.45     0.14 ± 0.16
SeeDB         10.8 ± 4.41    3.4 ± 1.35     0.43 ± 0.23
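As a quick arithmetic check of the "3X" claim, the ratio of the two mean bookmark rates is about 3.07. The mean rates do not equal the ratio of the mean counts, which suggests (an inference, not stated on the slide) that the bookmark rate was computed per participant and then averaged.

```latex
\frac{0.43}{0.14} \approx 3.07
\qquad\text{whereas}\qquad
\frac{3.4}{10.8} \approx 0.31 \neq 0.43,
\qquad
\frac{1.1}{6.3} \approx 0.17 \neq 0.14
```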
22. II. Users Prefer SeeDB
100% of users prefer SeeDB over Manual
“. . . quickly deciding what correlations are relevant” and
“[analyze] . . . a new dataset quickly”
“. . . great tool for proposing a set of initial queries for a
dataset”
“. . . potential downside may be that it made me lazy so I
didn’t bother thinking as much about what I really could study
or be interested in”
Despite the advent of visualization tools like Tableau, we’re still in the dark ages of visualization recommendations.
Current tools are akin to a movie catalog, where you can see the list of available movies, select the ones you want, and see information about them.
If you don’t know which movie you want to watch, you’ll have to look at a whole lot of movies before you find what you desire.
In other words, current visualization systems involve substantial manual effort and tedious trial-and-error before you get the desired result.
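Let’s move to the age of enlightenment.
Much like the Netflix and Amazon recommendations of today, can we build systems that automatically recommend visualizations highlighting patterns of interest?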
Why is this timely?
Datasets are increasingly large, with large numbers of records and attributes.
As a result, most of the dataset goes unexplored, motivating the need for recommendations that point to the unexplored areas.
The second reason is that everyone wants to be a data scientist (and who are we to argue?), but not everyone has the skills.
We need to build the tools that help them get the insights they need.
So what do current systems lack?
I’m a database guy, and for some reason we love chemistry-based acronyms, so here’s a new one: BASE (Big picture, Analyst preferences, Specification, Exploration).
Provide a … Is the February dip in sales expected? Or is it anomalous?
Current tools do not take into account typical browsing patterns.
For instance, what if the analyst wants to find all products that took a hit in February? Can we find all attributes on which two products differ?
Often, users focus on a tiny portion of the dataset, perhaps due to inexperience.
As it turns out, we aren’t the only ones preaching this wisdom.
Several recent systems partially address these limitations, including one from Tableau and one appearing at this very conference from Jeff and the UW folks.
I’m going to tell you about our systems to give you a flavor of what we’re talking about.
SeeDB caters to the user specification of a comparative task.
What SeeDB provides are … among all the visualizations.
The key issue here is that many attributes lead to many, many visualizations.
Caters to the user specification of a search task
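A search task can be illustrated with a small, hypothetical sketch: the user draws a trend shape (here, a dip in the second month, like the February dip mentioned earlier), and the system ranks candidate series by how closely their z-normalized shape matches the sketch. Euclidean distance on z-normalized series is an assumed, deliberately simple matching criterion; the data and function names are made up, and this is not claimed to be how our search system works.

```python
import math


def z_normalize(series):
    """Shift to mean 0 and scale to unit (population) standard deviation, keeping only the shape."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n  # a flat series has no shape to compare
    return [(x - mean) / std for x in series]


def shape_distance(a, b):
    """Euclidean distance between two equal-length series after z-normalization."""
    return math.dist(z_normalize(a), z_normalize(b))


def search_trends(query, candidates, k=1):
    """Return the names of the k candidate series whose shape is closest to the query sketch."""
    return sorted(candidates, key=lambda name: shape_distance(query, candidates[name]))[:k]


# Hypothetical monthly sales (Jan-Apr) and a user sketch of "a dip in February".
MONTHLY_SALES = {
    "stapler": [100, 40, 98, 101],
    "pen": [10, 20, 30, 40],
    "tape": [50, 50, 50, 50],
}
SKETCH = [5, 1, 5, 5]
```

With this toy data, `search_trends(SKETCH, MONTHLY_SALES)` returns `["stapler"]`. Z-normalizing first means the match depends on shape rather than scale, so a product selling around 100 units still matches a sketch drawn around 5.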
In our workshop paper, we identified 5 recommendation axes:
Which is very hard
There is a ton of work from the viz community on this.
In building these vizrec systems, there are a number of interesting systems challenges:
What should be done online, and what offline?
Online, how do we maximize sharing and parallelism in evaluating these recs?
How do we … that we know are not useful?
How do we leverage app to return results faster, or return approximate results?
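Skipping work on views that are unlikely to be useful can be illustrated with one possible pruning scheme, sketched here under stated assumptions: utility is estimated phase by phase over chunks of the data, each per-phase estimate lies in [0, 1] and is treated as an independent sample, and a Hoeffding confidence interval bounds each view's true utility. A view whose upper bound falls below the k-th best lower bound cannot make the top k and can be dropped early. The names, the bound, and the numbers are hypothetical.

```python
import math


def hoeffding_halfwidth(n, delta=0.05):
    """Half-width of a Hoeffding interval for the mean of n independent values in [0, 1]."""
    return math.sqrt(math.log(2 / delta) / (2 * n))


def surviving_views(phase_utilities, k, delta=0.05):
    """Keep only views that could still be in the top k after the phases seen so far.

    phase_utilities maps a view name to its per-phase utility estimates, each in [0, 1].
    A view is pruned when its upper bound falls below the k-th highest lower bound.
    (Per-view intervals only; no correction for comparing many views at once.)
    """
    bounds = {}
    for view, samples in phase_utilities.items():
        mean = sum(samples) / len(samples)
        eps = hoeffding_halfwidth(len(samples), delta)
        bounds[view] = (mean - eps, mean + eps)
    lower_bounds = sorted((low for low, _ in bounds.values()), reverse=True)
    threshold = lower_bounds[k - 1] if len(lower_bounds) >= k else -math.inf
    return {view for view, (_, high) in bounds.items() if high >= threshold}


# Hypothetical per-phase utility estimates after 50 phases.
PHASES = {
    "sales by state": [0.9] * 50,
    "sales by channel": [0.8] * 50,
    "profit by state": [0.2] * 50,
    "profit by channel": [0.1] * 50,
}
```

After all 50 phases, `surviving_views(PHASES, k=2)` keeps only the two sales views. After only the first five phases, the intervals (about ±0.61) are too wide to prune anything, which is the trade-off such a scheme makes between early termination and confidence.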
In the age of data science
Overall architecture
A middleware layer sits between the UI and the DBMS.
The user task (e.g., compare married vs. unmarried) is broken down into a collection of queries;
the optimizer handles these queries using a combination of … optimizations and issues repeated queries to the DBMS.
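The sharing idea behind such an optimizer can be sketched as a query rewriter. Assuming every view is a SUM of one measure grouped by one dimension, and that the target subset is given as a SQL predicate, views that share a dimension can be answered by one query, with the target/reference split folded into the GROUP BY. In the example, three views that would naively need six queries (target and reference for each) need only two. The function, table, and column names are hypothetical, and this is one illustrative rewrite, not the optimizer's actual rule set.

```python
import sqlite3
from collections import defaultdict


def shared_queries(table, target_pred, views):
    """Rewrite (dimension, measure) views into as few SQL queries as possible.

    Views that share a GROUP BY dimension are merged into one query computing all of
    their measures, and the target/reference split becomes an extra grouping column
    instead of a second query. Identifiers are interpolated directly, so they must be
    trusted (never raw user input).
    """
    measures_by_dim = defaultdict(list)
    for dim, measure in views:
        measures_by_dim[dim].append(measure)
    flag = f"CASE WHEN {target_pred} THEN 1 ELSE 0 END"
    queries = []
    for dim, measures in measures_by_dim.items():
        aggs = ", ".join(f"SUM({m}) AS sum_{m}" for m in measures)
        queries.append(
            f"SELECT {dim}, {flag} AS is_target, {aggs} FROM {table} "
            f"GROUP BY {dim}, {flag} ORDER BY {dim}, is_target"
        )
    return queries


if __name__ == "__main__":
    # Tiny in-memory table with made-up rows to run the rewritten queries against.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (product TEXT, state TEXT, channel TEXT, sales INTEGER, profit INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
        [
            ("stapler", "MA", "web", 40, 5),
            ("stapler", "CA", "store", 10, 2),
            ("pen", "MA", "web", 10, 1),
            ("pen", "CA", "store", 40, 4),
        ],
    )
    views = [("state", "sales"), ("state", "profit"), ("channel", "sales")]
    for sql in shared_queries("sales", "product = 'stapler'", views):
        print(conn.execute(sql).fetchall())
```

Each result row carries an `is_target` flag, so the middleware can split one result set into the target and reference series for every view it covers.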