Explainability in AI and RecSys:
let’s make it interactive!
Martijn Willemsen
Why do we need explainability?
• Model validation: avoid biases, unfairness or overfitting, detect
issues in the training data, adhere to ethical/legal requirements
• Model debugging and improvement: improving the model fit,
adversarial learning (fooling a model with ‘hacked’ inputs), reliability
& robustness (sensitivity to small input changes)
• Knowledge discovery: explanations provide feedback to the Data
Scientist or user that can result in new insights by revealing hidden
underlying correlations/patterns.
• Trust and technology acceptance: explanations might convince
users to adopt the technology and give them more control
Poll: What is a good explanation?
A: complete and accurate evidence for the decision
B: gives a single good reason for this decision
C: tells me what I need to get a different decision
What is important for explainability in ML?
• Accuracy: does the explanation predict unseen data? Is it as
accurate as the model itself?
• Fidelity: does the explanation approximate the prediction of the
model? Especially important for black-box models (local
fidelity).
• Consistency: same explanations for different models?
• Stability: similar explanations for similar instances?
• Comprehensibility: do humans get it? (see previous slide)
Some of these are hard to achieve with some models…
https://christophm.github.io/interpretable-ml-book/properties.html
What is a good explanation (for humans)?
Confalonieri et al. (2020) & Molnar (2020) based on Miller:
• Contrastive: why was this prediction made instead of
another?
• Selective: focus on a few important causes (not all
features that contributed to the model).
• Social: should fit the mental model of the explainee /
target audience, consider the social context, and fit
their prior beliefs.
• Abnormalness: humans like rare causes (related to
counterfactuals)
• (Truthfulness: less important for humans than
selectiveness!)
https://christophm.github.io/interpretable-ml-book/explanation.html
Machine learning / AI interpretability
Some methods are inherently interpretable (glass-box or white box models)
• Regression, decision trees, GAM
• Some RecSys algorithms (content-based or classical CF)
Many others are not: black-box models
• Neural networks (CNN/RNN), random forests, matrix factorization, etc.
• These often require post-hoc explanations (which leave the model intact)
Further distinction can be made between:
• Model-specific methods (explanation is specific to the ML technique)
• Model-agnostic methods (explanation treats ML as black-box: use only the
input/outputs)
Explanations can be global, component-based, or local
(Figures: GAM — global explanation; SHAP — explanation components / dependence plot; local explanations)
Interpreting Interpretability: Understanding Data Scientists'
Use of Interpretability Tools for Machine Learning
Kaur et al. CHI 2020
Data scientists also do not get these visualizations!
Global explanations (how does it work in general?)
How does the model perform on average for the dataset, overall
approximation of the (black box) ML model?
• Feature importance ranks: permute/remove features and
see how the model output changes to find feature importance
• Feature effects: effect of a specific feature on the outcome of
the model: Partial Dependence Plots (marginal effects) or
Accumulated Local Effect plots (conditional effects)
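A minimal sketch of both global views with scikit-learn; the random-forest model and the breast-cancer dataset are illustrative assumptions, not part of the slides:

```python
# Hedged sketch of global explanations: permutation importance + partial dependence.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay, permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Feature importance ranks: permute each feature and measure the drop in score.
imp = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranking = imp.importances_mean.argsort()[::-1]
for i in ranking[:5]:
    print(f"{X.columns[i]}: {imp.importances_mean[i]:.3f}")

# Feature effect: partial dependence (marginal effect) of the top-ranked feature.
PartialDependenceDisplay.from_estimator(model, X_test, features=[X.columns[ranking[0]]])
```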
Local explanations: why do I get this prediction?
LIME (Local Interpretable Model-agnostic Explanations), an
algorithm that can explain the predictions of any classifier or
regressor in a faithful way, by approximating it locally with an
interpretable (surrogate) model.
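A minimal sketch of LIME in practice, assuming the `lime` package and a scikit-learn text pipeline; the newsgroup data (atheism vs. Christianity, as in the example a few slides on) is illustrative:

```python
# Hedged sketch: explaining one prediction of a text classifier with LIME.
from lime.lime_text import LimeTextExplainer
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

cats = ["alt.atheism", "soc.religion.christian"]
train = fetch_20newsgroups(subset="train", categories=cats)
pipe = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
pipe.fit(train.data, train.target)

# LIME perturbs the instance, fits an interpretable surrogate around it,
# and reports the words that pushed the prediction one way or the other.
explainer = LimeTextExplainer(class_names=train.target_names)
exp = explainer.explain_instance(train.data[0], pipe.predict_proba, num_features=6)
print(exp.as_list())  # e.g. [('church', 0.12), ('atheists', -0.09), ...]
```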
Local explanations that are model-agnostic…
By “explaining a prediction", we mean presenting textual or
visual artifacts that provide qualitative understanding of
the relationship between the instance's components (e.g.
words in text, patches in an image) and the model's
prediction.
Criteria:
Interpretable: provide qualitative understanding between the
input variables and the response.
Local fidelity: for an explanation to be meaningful it must at
least be locally faithful
Model-agnostic: an explainer should be able to explain any
model
LIME output: which algorithm works better?
Two algorithms with
similar accuracy
predicting if the text
below is about
Christianity or
atheism
Poll: Which model
should you trust
more, 1 or 2?
Works very well, but…
Sentiment of the sentence “This is not bad”
LIME can show that the sentiment is
detected correctly because of the conjunction
of “not” and “bad”
Same results for two very different models
But do you notice a difference?
Valence of the decision class: which is more
understandable?
Logistic regression on unigrams
LSTM on sentence embeddings
Ribeiro et al. 2016, Model-Agnostic Interpretability of Machine Learning, arXiv:1606.05386v1
Improving understandability of feature contributions in
model-agnostic explainable AI tools (CHI 2022)
Sophia Hadash, Martijn Willemsen, Chris Snijders, and Wijnand IJsselsteijn
Jheronimus Academy of Data Science
Human-Technology Interaction, TU/e
Visualizations of LIME (and SHAP) can be counterintuitive!
Prediction class: bad (ineligible for loan) (Data: credit-g)
Cognitively challenging due to (double) negations!
Proposed improvements
1) Frame feature contributions towards the decision class that the
reader perceives positively.
Proposed improvements
2) Add semantic labels to the feature contributions.
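A minimal sketch of how both improvements could be applied to raw feature contributions; the sign convention, class names, and label strings are illustrative assumptions, not the paper's actual implementation:

```python
# Hedged sketch: reframe contributions towards the positively perceived class
# and attach a semantic label (illustrative; not the CHI 2022 implementation).
def reframe(contributions, predicted_class, positive_class="eligible",
            positive_label="eligibility"):
    """contributions: feature -> signed weight towards the predicted class."""
    flip = 1 if predicted_class == positive_class else -1  # re-express towards the positive class
    lines = []
    for feature, weight in contributions.items():
        towards_positive = flip * weight
        sign = "+" if towards_positive >= 0 else "-"
        lines.append(f"{feature}: {sign}{abs(towards_positive):.0%} {positive_label}")
    return lines

# Example: LIME-style weights towards the predicted class "ineligible" (credit-g-style features).
contribs = {"checking_status < 0": 0.08, "duration > 24": 0.05, "housing = own": -0.03}
print("\n".join(reframe(contribs, predicted_class="ineligible")))
# checking_status < 0: -8% eligibility
# duration > 24: -5% eligibility
# housing = own: +3% eligibility
```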
Empirical User study
⚫ 133 participants (61 male), university database + convenience sampling
Factors:
⚫ Loan applications and music recommendations (within-subjects)
⚫ Framing: positive or negative (within-subjects)
⚫ Semantic labelling: no labels, “eligibility/like”, or “ineligibility/dislike”
⚫ Semantic labelling was between-subjects to prevent carry-over learning effects.
Measurement: perceived understandability using 4-pt Likert scale.
⚫ 6 trials per within-condition, 24 per participant
Results
Positive framing leads to higher
understandability, even when
the prediction/ decision class is
negative.
Results
Negatively framed semantic
labels do not improve
understandability.
⚫ (e.g. “+5% ineligibility”)
⚫ Not even when compatible
with the negative decision
class…
Results
Positively framed semantic
labels improve
understandability.
⚫ (e.g. “+5% eligibility”)
Framing is no longer relevant
for understandability.
Take away: do not forget the psychology!
Positive framing always works better than negative
framing (even for negative decision classes).
• Requires that decision classes are inherently “positive” or “negative”
Use of semantic labelling can improve understandability
of the visualizations of interpretability tools.
• Reduces framing effects!
Drawbacks of post-hoc explanations
These tools still just provide a retrospective explanation of the
outcome…
• Static, lacking contrastive, counterfactual insights…
Ben Shneiderman promoted prospective user interfaces
• Interactive tools that show you what aspects influence and
change the outcome of an AI
How would that work? It has already been done for decades!
How do we make explanations contrastive,
and selective?
How do we make sure they fit our mental
models and beliefs?
Let’s make them interactive!
Interactive ML is not new…
Dudley (2018) and Amershi (2019) show that two
decades of research have already looked at these
issues in communities like IUI and CHI…
Example: Crayons, 2003
Fails & Olson, Crayons, IUI (2003)
Traditional ML
Amershi et al. 2014: Power to the People
• ML works with experts on
feature selection / data
representation
• Use ML, build predictions, go
back to expert for validation
• Long and slow cycles, big steps
• Exploration is mostly on the side
of the ML / data scientist
Interactive ML
• User directly interacts with the
model
• Incremental but fast updates,
small steps, low-cost trial & error
• Smaller cycles, give a better
understanding of what happens
• Can be done by low-expertise
users
• Examples: recommender
systems and tools like Crayons
Amershi et al. 2014: Power to the People
Interface elements of an IML (Dudley 2018, sec. 4)
‘These elements represent distinct
functionality that the interface
must typically support to deliver a
comprehensive IML workflow’
Not necessarily physically distinct:
e.g., Crayons merges sample
review and feedback assignment
IML Workflow (sec 5)
Key Solution Principles according to Dudley (2018)
Exploit interactivity and promote rich interactions
• Interaction for understanding: many UX principles are hard to achieve
in IML (e.g., direct manipulation principles)
• Make the most of the user: balance effort and value of input, avoid
repeated requests, provide retrace of steps and undo
Engage the user
• Provide feedback, show partial predictions, do not ask trivial labeling
tasks
• Might encourage users to spend more time and improve the modeling
Guidelines for Human-AI interaction
Amershi et al., CHI 2019
18 guidelines
• UX design process
• Brings knowledge
from many related
fields together
• Goes back to earlier
classical work:
strongly founded in
the mixed-initiative
work of Horvitz (IUI
1999)
Two example applications of interactive AI /
RecSys from my lab that I consider to be
prospective user interfaces
Preparing for a marathon
Target finish times
Not too fast, not too slow
Pacing (min/km) strategy
Constant ‘flat’ speed is associated
with best performance
Heleen Rutjes
Prediction model for setting a
challenging, yet realistic finish time.
Model predictions are based on similar runners:
If runner *sunglasses* has had similar past performances as runner *hat*, yet has a
better Personal Best (PB), then runner *hat* can potentially achieve that too.
Approach: ‘case-based reasoning’ (CBR)
We asked coaches what aspects
they would like to control:
- Select similar runners?
- Select best races to serve as a case?
Research by Barry Smyth: http://medium.com/running-with-data/
Making the model interactive
Running coaches could indicate for every previous race how ‘representative’
they consider it.
By setting the slider, the model prediction
was continuously updated.
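A minimal sketch of how such sliders could enter a case-based prediction; the weighting scheme, improvement ratios, and numbers are illustrative assumptions, not the study's actual model:

```python
# Hedged sketch: case-based finish-time prediction where the coach's
# "representativeness" sliders weight each past race (illustrative scheme).
import numpy as np

def predict_finish_time(past_times, representativeness, similar_runner_ratios):
    """past_times: previous race times (minutes); representativeness: slider in [0, 1]
    per race; similar_runner_ratios: PB / past-performance ratios of similar runners."""
    w = np.asarray(representativeness, dtype=float)
    w = w / w.sum()                               # normalise the slider weights
    baseline = float(np.dot(w, past_times))       # weighted "representative" past performance
    improvement = float(np.mean(similar_runner_ratios))  # what similar runners achieved
    return baseline * improvement

past = [195.0, 188.0, 210.0]   # minutes
sliders = [0.9, 1.0, 0.2]      # coach: the 210-minute race is not representative
ratios = [0.97, 0.95, 0.96]    # similar runners ran ~4% faster in their PB
print(f"Target finish time: {predict_finish_time(past, sliders, ratios):.0f} min")
```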
Model interactivity increased
trust and acceptance
Acceptance
Coaches were more inclined to accept a model that they could interact with.
Trust
Model interactivity increased coaches’ perceived competence of the model.
“Without my adjustments the model did not make
sense, but by eliminating the race from Eindhoven,
we’re getting somewhere.”
(Coach 53, familiar runner, interactive condition)
Coaches improved the accuracy of the model
Model accuracy improved by coaches’ interactions
(Mean PercentError dropped from 3.14 to 2.33, p = 0.018)
What did the coaches adjust?
Systematic adjustments: more recent races were indicated as more
representative (p < .001)
‘Anecdotal’ adjustments: based on knowledge of the specific runner,
running in general, environmental circumstances, etc.
Even when working with unfamiliar runners:
“There is clearly something going on with this lady. Maybe she
stopped training, or she has a persistent injury?”
(Coach 45, unfamiliar runner, non-interactive condition)
Music Genre Exploration with
mood control and visualizations
Work by Yu Liang (IUI 2021)
How to better support users to explore a new music genre?
[Millecamp, M., Htun, N. N., Jin, Y., & Verbert, K. 2018]
[Bostandjiev, S., O’Donovan, J., & Höllerer, T. (2012)]
[Andjelkovic, I., et al. 2019], [He, C., Parra, D., & Verbert, K. 2016]
Simple bar plot visualization to explain recommendation
[Millecamp, et al. 2019]
Bar charts: easy to understand, but
not very informative: present
only the averaged preferences
More complex contour plot visualization
Contour plots:
1) Show the relation between the recommendations,
users’ current preferences and the new genre
2) Show the preference intensity of users
A bit hard to understand
Mood control
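A minimal sketch of the idea behind such a contour view, assuming tracks live in a 2-D audio-feature space; the energy/valence axes and the synthetic data are illustrative assumptions:

```python
# Hedged sketch: contour of a user's listening profile in a 2-D audio-feature
# space, with songs from the new genre overlaid (synthetic, illustrative data).
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
user_tracks = rng.normal(loc=[0.6, 0.4], scale=0.10, size=(200, 2))  # current preferences
recommended = rng.normal(loc=[0.4, 0.6], scale=0.05, size=(10, 2))   # new-genre recommendations

# Kernel density of the user's tracks -> preference intensity as contour levels.
kde = gaussian_kde(user_tracks.T)
xx, yy = np.meshgrid(np.linspace(0, 1, 100), np.linspace(0, 1, 100))
density = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)

plt.contourf(xx, yy, density, levels=10, cmap="Blues")
plt.scatter(recommended[:, 0], recommended[:, 1], c="red", label="recommended songs")
plt.xlabel("energy")
plt.ylabel("valence")
plt.legend()
plt.show()
```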
Contour plot + Mood control (Most helpful?)
Easily see how the recommendations change
Research questions
RQ1: How do different types of visualizations (bar charts/contour
plots) influence the perceived helpfulness for new music genre
exploration?
RQ2: How does mood control improve the perceived helpfulness
for new music genre exploration?
Study design
2×2 mixed factorial
design:
Mood control:
between-subject
Visualization:
within-subject
Measurements
• Subjective measures: post-task questionnaires
Perceived helpfulness, perceived control, perceived
informativeness and understandability
• Objective measures: user-interactions with the
system
• Musical Sophistication (active engagement &
emotional engagement)
• Participants: mainly university students
• 102 valid responses
(Figure: genre selection frequencies — genres participants wanted to explore)
Which is more helpful?
Contour plot (vs bar charts):
• More helpful
• Total effect: β = .378, se = .082, p < .001
Control (vs no control):
• Seems to be more helpful
• Total effect: β = .238, se = .123, p = .053 (marginally significant)
Contour + control:
• More helpful
• Total effect: β = .242, se = .123, p = .049
What we have found….
Good visualization is key for understandability and explainability
Contour plot is perceived more helpful than the bar chart
• More informative, thus more understandable & helpful
• Better mental model?
Interaction only helps with a good mental model/understanding
Mood control by itself does not make the system more helpful
• Paired with the contour plot, it benefits perceived
helpfulness, mostly due to increased informativeness
Further work on genre exploration
RecSys 2021: the role of default settings on genre
selection and exploration:
• tradeoff slider: from genre representative to more
personalized songs
• Defaults had a strong effect on how far users explored…
RecSys 2022 (just accepted): longitudinal study in which
participants used the same tool for 4 weeks
• Default effects fade over the weeks
• Users find the tool helpful / keep exploring after 4 weeks
• Some actual change in music profile after 6 weeks!
Conclusions
Two separate worlds:
• interactive Machine Learning: interpretability for data scientists
• human-AI interaction work focused on the user at CHI, UMAP,
IUI (and RecSys)
We should learn from each other and bring them more together!
Human-AI interaction requires a solid understanding of mental models,
cognitive processes and biases, visualization guidelines, and user
experience research!
Questions?
M.C.Willemsen@tue.nl
@MCWillemsen