Aditya Parameswaran
Assistant Professor
University of Illinois
(w/ Manasi Vartak, Samuel Madden @ MIT;
Tarique Siddiqui, Silu Huang @ Illinois)
http://data-people.cs.illinois.edu
DSIA Workshop, VIS 2015
Towards Visualization
Recommendation Systems
1
“Bring out your dead!” courtesy Monty Python
The Dark Ages of Visualization
Recommendations
Substantial manual effort and tedious trial-and-error
2
To the Age of Enlightenment:
the Holy Grail
Can we build systems that automatically recommend
visualizations highlighting patterns of interest?
3
“The Holy Grail” courtesy Monty Python
Why now?
Reason 1: Too much data: records and attributes
Most of the dataset is unexplored!
4
Why now?
Reason 2: Lack of skills
Harvard Business Review, Mashable.com
5
Limitations in Current Tools
• Big Picture
• Analyst Preferences
• Specification
• Exploration
not ACID …
6
Limitations in Current Tools
• Big Picture
– Poor comprehension of context
• Analyst Preferences
– Limited understanding of user interests
• Specification
– Insufficient means to specify trends of interest
• Exploration
– Inadequate navigation to unexplored areas
7
Recent Attempts at Vizrec Systems
• Tableau Elastic
• Voyager
• Harvest
• Profiler
• Our systems
– SeeDB [VLDB ’14 x 2, VLDB ’16]
– zenvisage [unpublished]
This conference!
8
Still early days!
SeeDB: Comparative Tasks
Task:
Compare staplers (target, query)
with other products
Results:
Visualizations where staplers
“differ most” from other products
Issue: Many attributes → many, many visualizations!
[Figure: bar charts by state (MA, CA, IL, NY) comparing Stapler sales vs. Other sales, and Stapler prod vs. Other prod]
9
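As a rough illustration of what "differ most" could mean, the sketch below scores each candidate view (a group-by attribute paired with a SUM-aggregated measure) by how far the target's normalized distribution lies from the reference's, then returns the top-k views. This is a minimal sketch under stated assumptions, not SeeDB's implementation: the toy table, column names, and helper functions are hypothetical, and L1 distance is a simple stand-in for whatever distance function a real system uses.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy table; columns and values are made up for illustration.
ROWS = [
    {"product": "stapler", "state": "MA", "quarter": "Q1", "sales": 40, "profit": 2},
    {"product": "stapler", "state": "CA", "quarter": "Q2", "sales": 5, "profit": 6},
    {"product": "stapler", "state": "NY", "quarter": "Q1", "sales": 15, "profit": 4},
    {"product": "pen", "state": "MA", "quarter": "Q2", "sales": 20, "profit": 5},
    {"product": "pen", "state": "CA", "quarter": "Q1", "sales": 25, "profit": 5},
    {"product": "pen", "state": "NY", "quarter": "Q2", "sales": 15, "profit": 2},
]

def grouped_sum(rows, dim, measure):
    """Equivalent of SELECT dim, SUM(measure) ... GROUP BY dim."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[dim]] += r[measure]
    return totals

def to_distribution(totals, keys):
    """Normalize group totals into a probability distribution over `keys`."""
    s = sum(totals.get(k, 0.0) for k in keys)
    return [totals.get(k, 0.0) / s if s else 0.0 for k in keys]

def deviation(target_rows, ref_rows, dim, measure):
    """L1 distance between the target's and the reference's normalized views."""
    t = grouped_sum(target_rows, dim, measure)
    r = grouped_sum(ref_rows, dim, measure)
    keys = sorted(set(t) | set(r))  # union of groups, so a missing group counts as 0
    p, q = to_distribution(t, keys), to_distribution(r, keys)
    return sum(abs(a - b) for a, b in zip(p, q))

def recommend(rows, is_target, dims, measures, k=2):
    """Score every (dimension, measure) view by deviation; return the top k."""
    target = [r for r in rows if is_target(r)]
    reference = [r for r in rows if not is_target(r)]
    scored = [((d, m), deviation(target, reference, d, m)) for d, m in product(dims, measures)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

for (dim, measure), score in recommend(ROWS, lambda r: r["product"] == "stapler",
                                       ["state", "quarter"], ["sales", "profit"]):
    print(f"SUM({measure}) BY {dim}: {score:.3f}")
# SUM(sales) BY quarter: 1.000
# SUM(sales) BY state: 0.667
```

Normalizing before comparing means a view is ranked by the shape of its distribution rather than by the raw volume of staplers vs. everything else.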
: Search Tasks
Very early demo! Feedback welcome.
(you saw it here first...)
10
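The slide shows only a demo, but one plausible reading of a search task is: given a trend the analyst sketches, find the series whose shape matches it best. The hypothetical sketch below z-normalizes every series (so only shape, not scale, matters) and ranks series by Euclidean distance to the sketched pattern; the data, function names, and choice of distance are assumptions, not the demoed system's method.

```python
import math

def z_normalize(xs):
    """Rescale a series to mean 0 and population std 1 so only its shape matters."""
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return [0.0] * n if sd == 0 else [(x - mu) / sd for x in xs]

def shape_search(series, pattern, k=1):
    """Return the k series names whose z-normalized trend is closest to `pattern`."""
    target = z_normalize(pattern)
    scored = []
    for name, ys in series.items():
        if len(ys) != len(target):
            raise ValueError(f"{name}: expected {len(target)} points, got {len(ys)}")
        scored.append((math.dist(z_normalize(ys), target), name))
    scored.sort()  # smallest distance first
    return [name for _, name in scored[:k]]

# Hypothetical monthly sales per product.
SERIES = {
    "stapler": [10, 12, 15, 19, 24],  # rising
    "pen": [30, 25, 20, 18, 15],      # falling
    "tape": [5, 9, 5, 9, 5],          # oscillating
}
print(shape_search(SERIES, [1, 2, 3, 4, 5], k=3))  # ['stapler', 'tape', 'pen']
```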
5 Recommendation Axes
• Specification of Intended Task or Insight
– e.g., comparative (X vs. Y), search (find X matching desired
criteria), outliers (find unusual X)
• Data Characteristics
– e.g., typical correlations, patterns, trends across
attributes, across rows
• Semantics or Domain Knowledge
• Visual Ease of Understanding
• Analyst Preferences
11
data-people.cs.illinois.edu/papers/dsia.pdf
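For the outlier flavor of task specification ("find unusual X"), one simple approach scores each series by how far its z-normalized trend lies from the average trend of all the other series. The helper names and toy data below are hypothetical, and this leave-one-out centroid distance is just one of many possible outlier scores, not a method taken from the slides.

```python
import math
from statistics import mean, pstdev

def z_normalize(xs):
    """Rescale a series to mean 0 and population std 1 (all zeros if constant)."""
    mu, sd = mean(xs), pstdev(xs)
    return [0.0] * len(xs) if sd == 0 else [(x - mu) / sd for x in xs]

def unusual_trends(series, k=1):
    """Rank series by distance from the average (z-normalized) trend of all the others."""
    normed = {name: z_normalize(ys) for name, ys in series.items()}
    scores = {}
    for name, ys in normed.items():
        others = [v for other, v in normed.items() if other != name]
        centroid = [mean(col) for col in zip(*others)]  # pointwise average trend
        scores[name] = math.dist(ys, centroid)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical quarterly sales: three products grow, one declines.
SERIES = {
    "stapler": [10, 20, 30, 40],
    "pen": [5, 7, 9, 11],
    "tape": [100, 110, 125, 130],
    "glue": [40, 30, 20, 10],
}
print(unusual_trends(SERIES))  # ['glue']
```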
Architectural Considerations
• Pre-computation
• Online computation
– Sharing
– Parallelism
– Pruning
– Approximations [VLDB’15]
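One form the sharing listed above can take is folding the target and reference halves of a comparison, and several measures over the same group-by attribute, into a single query, so the table is scanned once instead of once per aggregate and subset. The sketch below assumes a hypothetical `sales` table and a hypothetical helper `shared_view_sql`; it uses Python's built-in `sqlite3` only to show that the generated query runs.

```python
import sqlite3

def shared_view_sql(table, dim, measures, target_pred):
    """Build ONE query returning target and reference SUMs for every measure.

    Identifiers and the predicate are interpolated directly, so they must come from
    trusted schema metadata, never from raw user input.
    """
    cols = []
    for m in measures:
        cols.append(f"SUM(CASE WHEN {target_pred} THEN {m} ELSE 0 END) AS {m}_target")
        cols.append(f"SUM(CASE WHEN {target_pred} THEN 0 ELSE {m} END) AS {m}_ref")
    return f"SELECT {dim}, {', '.join(cols)} FROM {table} GROUP BY {dim} ORDER BY {dim}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, state TEXT, sales REAL, profit REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("stapler", "MA", 40, 2), ("stapler", "CA", 5, 6),
     ("pen", "MA", 20, 5), ("pen", "CA", 25, 5), ("tape", "CA", 10, 1)],
)
sql = shared_view_sql("sales", "state", ["sales", "profit"], "product = 'stapler'")
rows = conn.execute(sql).fetchall()
print(rows)  # [('CA', 5.0, 35.0, 6.0, 6.0), ('MA', 40.0, 20.0, 2.0, 5.0)]
```

Four separate queries (target and reference, for each of two measures) collapse into one scan with a single GROUP BY; the same idea extends to more measures at no extra scans.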
12
data-people.cs.illinois.edu/papers/dsia.pdf
A Clarion Call to DSIA Researchers…
Visualization Recommendation Systems:
are critically important
are timely
lead to interesting viz, db, ml, hci problems
Let’s move towards the age of enlightenment!
“The Holy Grail” courtesy Monty Python
13
data-people.cs.illinois.edu/papers/dsia.pdf
Ongoing Projects in Interactive Analytics
Minimizing effort & maximizing efficiency
http://data-people.cs.illinois.edu
• Data Manipulation [VLDB ’15 x 2]
• Data Visualization [VLDB ’14 x 2, VLDB ’15, VLDB ’16]
• Data Collaboration [VLDB ’15 x 2, CIDR ’15, TAPP ’15]
• Data Processing with [VLDB ’15, HCOMP ’15, KDD ’15]
datahub
14
Recent Papers, Demos
POPULACE
15
Research Thrust II: Crowds
Minimizing cost and maximizing accuracy in
human-powered data management
Data Processing
Algorithms
Auxiliary Plugins:
Quality, Pricing
Data Processing
Systems
Filter [SIGMOD12, VLDB14] Max [SIGMOD12]
Clean [KDD12, TKDD13] Categorize [VLDB11]
Search [ICDE14] Debug [NIPS12] Count [HCOMP15]
Deco [CIKM12, VLDB12, TR12, SIGMOD Record 12]
DataSift [HCOMP13, SIGMOD14] HQuery [CIDR11]
Conf [KDD13, ICDE15] Evict [TR12] Debias [KDD15]
Pricing [VLDB15] Quality [HCOMP14]
16
Human-in-the-loop
Data Management
Dual personalities
• Analysts supervising the analysis
– How do we help them get the insights they want?
• Crowds helping the analysis
– How do we best make use of them to process data?
17
[Figure: architecture. A middleware layer sits between the visualizations and the DBMS; its optimizer applies sharing and pruning to the 100s of queries it issues]
18
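Pruning, which sits inside the optimizer alongside sharing, can be sketched as phased execution: after each phase (say, each chunk of the data), keep a confidence interval around every candidate view's utility and drop views whose upper bound falls below the k-th best lower bound. This hypothetical sketch uses a Hoeffding bound and assumes per-phase utility estimates in [0, 1] that behave like independent samples, which real per-chunk estimates need not do; it is not the system's actual pruning code, and the function names are illustrative.

```python
import math

def hoeffding_halfwidth(n, delta=0.05):
    """Half-width of a two-sided Hoeffding interval for a mean of n values in [0, 1]."""
    return math.sqrt(math.log(2 / delta) / (2 * n))

def prune_top_k(phase_scores, k, delta=0.05):
    """Phased top-k selection with confidence-interval pruning.

    phase_scores maps each candidate view to its per-phase utility estimates in [0, 1].
    Returns (top-k views by mean utility, {pruned view: phase at which it was pruned}).
    """
    active = set(phase_scores)
    sums = dict.fromkeys(active, 0.0)
    pruned_at = {}
    n_phases = min(len(s) for s in phase_scores.values())
    for phase in range(n_phases):
        n = phase + 1
        for view in active:
            sums[view] += phase_scores[view][phase]
        if len(active) <= k:
            continue  # nothing left to prune; keep refining the surviving estimates
        eps = hoeffding_halfwidth(n, delta)
        # k-th largest lower confidence bound among the surviving views
        kth_lower = sorted((sums[v] / n - eps for v in active), reverse=True)[k - 1]
        for view in list(active):
            if sums[view] / n + eps < kth_lower:  # its upper bound cannot reach the top k
                active.remove(view)
                pruned_at[view] = n
    top = sorted(active, key=lambda v: sums[v] / max(n_phases, 1), reverse=True)[:k]
    return top, pruned_at

# Toy per-phase utilities for four hypothetical views, 30 phases each.
scores = {"a": [0.9] * 30, "b": [0.8] * 30, "c": [0.2] * 30, "d": [0.1] * 30}
top, pruned_at = prune_top_k(scores, k=2)
print(top, pruned_at)  # ['a', 'b'] {'d': 16, 'c': 21}
```

With these toy scores, the two weak views stop being evaluated after 16 and 21 phases instead of all 30; the wider the utility gap, the earlier a view is dropped.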
[Figure: tool front end with a Task Specification panel, a Manual Visualization Builder, a Visualization Pane, and a Recommendation Bar]
User Study
Part I: Validate utility metric vs. other metrics
– See paper!
Part II: Study impact of recommendations
– H1: SeeDB finds interesting visualizations faster
– H2: Users prefer tool w/ recommendations
I. SeeDB enables faster analysis
• Users view more visualizations with SeeDB
• Users bookmark more visualizations with SeeDB
• Bookmark rate 3X higher with SeeDB
         # charts        # bookmarks     bookmark rate
Manual   6.3 +/- 3.8     1.1 +/- 1.45    0.14 +/- 0.16
SeeDB    10.8 +/- 4.41   3.4 +/- 1.35    0.43 +/- 0.23
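The "3X" claim can be checked directly against the table's means. These ratios ignore the large standard deviations, so on their own they say nothing about statistical significance:

```latex
\frac{0.43}{0.14} \approx 3.07 \;(\text{bookmark rate}), \qquad
\frac{3.4}{1.1} \approx 3.09 \;(\text{bookmarks}), \qquad
\frac{10.8}{6.3} \approx 1.71 \;(\text{charts viewed})
```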
II. Users Prefer SeeDB
100% of users prefer SeeDB over Manual
“. . . quickly deciding what correlations are relevant” and
“[analyze] . . . a new dataset quickly”
“. . . great tool for proposing a set of initial queries for a
dataset”
“. . . potential downside may be that it made me lazy so I
didn’t bother thinking as much about what I really could study
or be interested in”
Questions on Part 2?
Overall research agenda …
Human-in-the-loop
Data Management
24
25


Speaker Notes

  1. Despite the advent of visualization tools like Tableau, we’re still in the dark ages. Current tools are akin to a movie catalog, where you can see the list of available movies, select the ones you want, and see information about them. If you don’t know which movie you want to watch, you’ll have to look at a whole lot of movies before you find what you desire. In other words, current visualization systems involve substantial manual effort and tedious trial-and-error before you get the desired result.
  2. Let’s move to the age of enlightenment, much like the Netflix and Amazon recommendations of today.
  3. Why is this timely? Datasets are increasingly large, with large numbers of records and attributes. As a result, most of the dataset goes unexplored, motivating the need for recommendations for the unexplored areas.
  4. The second reason is that everyone wants to be a data scientist (and who are we to argue?), but many don’t really have the skills. We need to build the tools that help them get the insights they need.
  5. So what do current systems lack? I’m a database guy, and for some reason, we love chemistry-based acronyms, so here’s a new one.
  6. Provide a.. Is the February dip in sales expected, or is it anomalous? Current tools do not take into account typical browsing patterns. For instance, what if the analyst wants to find all products that took a hit in February? Can we find all attributes on which two products differ? Often, users focus on a tiny portion of the dataset, perhaps due to inexperience.
  7. As it turns out, we aren’t the only ones preaching this wisdom. Several recent systems partially address these limitations, including one from Tableau and one appearing at this very conference from Jeff and the UW folks. I’m going to tell you about our systems to give you a flavor of what we’re talking about.
  8. Caters to the user specification of a comparative task. What SeeDB will provide are .. among all the visualizations. The key issue here is that
  9. Caters to the user specification of a search task
  10. In our workshop paper, we identified 5 recommendation axes: … which is very hard. There is a ton of work from the viz community on this.
  11. In building these vizrec systems, there are a number of interesting systems challenges. What should be done online, and what offline? Online, how do we maximize sharing and parallelism in evaluating these recommendations? How do we … that we know are not useful? How do we leverage app to return results faster, or return approximate results?
  12. In the age of data science
  13. Overall architecture: a middleware layer that sits between the UI and the DBMS. The user task (compare married vs. unmarried) is broken down into a collection of queries; the optimizer handles these queries using a combination of … optimizations and makes repeated queries to the DBMS.
  14. Note of caution