Invited talk at the ExUM workshop at the UMAP 2022 conference
Abstract:
Explainability has become an important topic both in Data Science and AI in general and in recommender systems in particular, as algorithms have become much less inherently explainable. However, explainability has different interpretations and goals in different fields. For example, interpretability and explainability tools in machine learning are predominantly developed for Data Scientists to understand and scrutinize their models. Current tools are therefore often quite technical and not very ‘user-friendly’. I will illustrate this with our recent work on improving the explainability of model-agnostic tools such as LIME and SHAP. Another stream of research on explainability in the HCI and XAI fields focuses more on users’ needs for explainability, such as contrastive and selective explanations and explanations that fit with the mental models and beliefs of the user. However, how to satisfy those needs is still an open question. Based on recent work in interactive AI and machine learning, I will propose that explainability goes together with interactivity, and will illustrate this with examples from our own work on music genre exploration, which combines visualizations and interactive tools to help users understand and tune our exploration model.
2. Why do we need explainability?
• Model validation: avoid biases, unfairness or overfitting, detect
issues in the training data, adhere to ethical/legal requirements
• Model debugging and improvement: improving the model fit,
adversarial learning (fooling a model with ‘hacked’ inputs), reliability
& robustness (sensitivity to small input changes)
• Knowledge discovery: explanations provide feedback to the Data
Scientist or user that can result in new insights by revealing hidden
underlying correlations/patterns.
• Trust and technology acceptance: explanations may convince users to adopt the technology and give them more control
3. Poll: What is a good explanation?
A: complete and accurate evidence for decision
B: gives a single good reason why this decision
C: tells me what I need to get a different decision
4. What is important for explainability in ML?
• Accuracy: does the explanation predict unseen data? Is it as
accurate as the model itself?
• Fidelity: does the explanation approximate the prediction of the
model? Especially important for black-box models (local
fidelity).
• Consistency: same explanations for different models?
• Stability: similar explanations for similar instances?
• Comprehensibility: do humans get it (see previous slide)
Some of these are hard to achieve with some models…
https://christophm.github.io/interpretable-ml-book/properties.html
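To make the fidelity notion concrete, here is a minimal sketch (assuming scikit-learn and a generic dataset; simplified in that, unlike LIME, it does not weight samples by proximity): fit a simple surrogate on perturbations around one instance and measure how well it reproduces the black-box output there.

```python
# Minimal local-fidelity sketch (illustrative assumption, not from the talk):
# how well does a simple surrogate reproduce the black box around one instance?
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

X, y = load_breast_cancer(return_X_y=True)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

x0 = X[0]
# Perturb the instance locally and query the black box in that neighborhood.
rng = np.random.default_rng(0)
neighborhood = x0 + rng.normal(scale=0.1 * X.std(axis=0), size=(500, X.shape[1]))
bb_pred = black_box.predict_proba(neighborhood)[:, 1]

# Local surrogate: an interpretable model fitted only on the neighborhood.
surrogate = Ridge().fit(neighborhood, bb_pred)
print("local fidelity (R^2):", surrogate.score(neighborhood, bb_pred))
```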
5. What is a good explanation (for humans)?
Confalonieri et al. (2020) & Molnar (2020) based on Miller:
• Contrastive: why was this prediction made instead of another?
• Selective: focus on a few important causes (not all features that contributed to the model).
• Social: should fit the mental model of the explainee / target audience, consider the social context, and fit their prior beliefs.
• Abnormalness: humans like rare causes (related to counterfactuals).
• (Truthfulness: less important for humans than selectiveness!)
https://christophm.github.io/interpretable-ml-book/explanation.html
6. Machine learning / AI interpretability
Some methods are inherently interpretable (glass-box or white-box models)
• Regression, decision trees, GAM
• Some RecSys algorithms (content-based or classical CF)
Many others are not: black-box models
• Neural networks (CNN/RNN), random forests, matrix factorization, etc.
• These often require post-hoc explanations (which leave the model intact)
Further distinction can be made between:
• Model-specific methods (the explanation is specific to the ML technique)
• Model-agnostic methods (the explanation treats the ML model as a black box: uses only the inputs/outputs)
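As a minimal illustration of this distinction (an assumed scikit-learn example, not part of the original slides): the fitted coefficients of a glass-box model are its explanation, whereas a black-box model needs post-hoc, often model-agnostic, tools.

```python
# Illustrative sketch: glass-box coefficients vs. a black box needing post-hoc tools.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Glass-box: the standardized coefficients *are* a (global) explanation.
glass_box = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
top = sorted(zip(X.columns, glass_box[-1].coef_[0]), key=lambda t: abs(t[1]), reverse=True)
for name, coef in top[:5]:
    print(f"{name}: {coef:+.2f}")

# Black-box: feature importances must be recovered post hoc with
# model-agnostic methods (permutation importance, LIME, SHAP, ...).
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
```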
7. Explanations can be global, component-based, or local
[Figure: example global explanation, GAM components / SHAP dependence plot, and local explanations]
Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning (Kaur et al., CHI 2020)
Data Scientists also do not get these visualizations…!
8. Global explanations (how does it work in general?)
How does the model perform on average for the dataset, overall
approximation of the (black box) ML model?
• Feature importance ranks: permute/remove features and
see how the model output changes to find feature importance
• Feature effects: effect of a specific feature on the outcome of
the model: Partial Dependence Plots (marginal effects) or
Accumulated Local Effect plots (conditional effects)
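A minimal sketch of these two global, model-agnostic views using scikit-learn on an assumed public dataset (the talk itself contains no code; ALE plots would need a separate package such as alibi or PyALE).

```python
# Global explanations sketch: permutation feature importance + partial dependence.
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay, permutation_importance

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Feature importance ranks: permute each feature and measure the drop in score.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, imp in sorted(zip(X.columns, result.importances_mean),
                        key=lambda t: t[1], reverse=True):
    print(f"{name}: {imp:.3f}")

# Feature effect: partial dependence (marginal effect) of one feature on the output.
PartialDependenceDisplay.from_estimator(model, X, features=["MedInc"])
```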
9. Local explanations: why do I get this prediction?
LIME (Local Interpretable Model-agnostic Explanations), an
algorithm that can explain the predictions of any classifier or
regressor in a faithful way, by approximating it locally with an
interpretable (surrogate) model.
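In practice this looks roughly as follows: a minimal sketch with the lime package on an assumed scikit-learn classifier (illustrative, not the talk's own example).

```python
# LIME sketch: fit a sparse local surrogate around one instance of a black box.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)
# Explain one prediction with a selective (few-feature) local surrogate.
exp = explainer.explain_instance(data.data[0], model.predict_proba, num_features=5)
print(exp.as_list())  # [(feature condition, local weight), ...]
```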
10. Local explanations that are model-agnostic…
By “explaining a prediction", we mean presenting textual or
visual artifacts that provide qualitative understanding of
the relationship between the instance's components (e.g.
words in text, patches in an image) and the model's
prediction.
Criteria:
Interpretable: provide qualitative understanding between the
input variables and the response.
Local fidelity: for an explanation to be meaningful it must at
least be locally faithful.
Model-agnostic: an explainer should be able to explain any
model.
11. LIME output: which algorithm works better?
Two algorithms with similar accuracy predicting if the text below is about Christianity or atheism.
Poll: Which model should you trust more, 1 or 2?
12. Works very well, but…
Sentiment of the sentence “This is not bad”
LIME can show that the sentiment is
detected correctly because of the conjunction
of “not” and “bad”
Same results for two very different models
But do you notice a difference?
Valence of the decision class: which is more
understandable?
Logistic regression on unigrams
LSTM on sentence embeddings
Ribeiro et al. 2016, Model-Agnostic Interpretability of Machine Learning, arXiv:1606.05386v1
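A hedged sketch of this kind of text explanation with lime; the toy training data and the unigram/bigram pipeline below are assumptions for illustration, not the models compared by Ribeiro et al.

```python
# Text LIME sketch: word-level contributions for the sentence "This is not bad".
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["this is bad", "this is not bad", "great movie", "terrible plot",
         "not good at all", "really enjoyable"]          # toy data (assumed)
labels = [0, 1, 1, 0, 0, 1]                              # 0 = negative, 1 = positive

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["negative", "positive"])
exp = explainer.explain_instance("This is not bad", clf.predict_proba, num_features=4)
print(exp.as_list())  # contribution of each word toward the predicted sentiment
```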
13. Improving understandability of feature contributions in
model-agnostic explainable AI tools (CHI 2022)
Sophia Hadash, Martijn Willemsen, Chris Snijders, and Wijnand IJsselsteijn
Jheronimus Academy of Data Science
Human-Technology Interaction, TU/e
14. Visualizations of LIME (and SHAP) can be counterintuitive!
Prediction class: bad (ineligible for loan) (Data: credit-g)
Cognitively challenging due to (double) negations!
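The following is a hypothetical sketch of the semantic-labelling idea (not the CHI 2022 paper's implementation): contributions that LIME/SHAP report toward the negative decision class are re-expressed as positively framed, labelled contributions toward eligibility.

```python
# Hypothetical re-labelling sketch: avoid double negations such as
# "+0.05 toward ineligible" by always framing weights toward the positive class.
def positively_framed(contributions, predicted_class):
    """contributions: (feature, weight) pairs toward the predicted class.
    Weights are treated as illustrative percentage points."""
    framed = []
    for feature, weight in contributions:
        toward_eligible = -weight if predicted_class == "ineligible" else weight
        sign = "+" if toward_eligible >= 0 else "-"
        framed.append(f"{feature}: {sign}{abs(toward_eligible) * 100:.0f}% eligibility")
    return framed

# Raw LIME-style output for a loan predicted as "ineligible" (made-up values).
lime_output = [("checking_status < 0", 0.08), ("duration > 24", 0.05), ("age > 40", -0.03)]
print(positively_framed(lime_output, predicted_class="ineligible"))
# ['checking_status < 0: -8% eligibility', 'duration > 24: -5% eligibility',
#  'age > 40: +3% eligibility']
```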
17. Empirical User study
⚫ 133 participants (61 male), university database + convenience sampling
Factors:
⚫ Loan applications and music recommendations (within-subjects)
⚫ Framing: positive or negative (within-subjects)
⚫ Semantic labelling: no labels, “eligibility/like”, or “ineligibility/dislike”
⚫ Between-subject to prevent carry-over learning effects.
Measurement: perceived understandability using 4-pt Likert scale.
⚫ 6 trials per within-condition, 24 per participant
19. Results
Negatively framed semantic
labels do not improve
understandability.
⚫ (e.g. “+5% ineligibility”)
⚫ Not even when compatible
with the negative decision
class…
21. Take away: do not forget the psychology!
Positive framing always works better than negative
framing (even for negative decision classes).
• Requires that decision-classes are inherently “positive” or “negative”
Use of semantic labelling can improve understandability
of the visualizations of interpretability tools.
• Reduces framing effects!
22. Drawbacks of post-hoc explanations
These tools still just provide a retrospective explanation of the
outcome…
• Static, lacking contrastive, counterfactual insights…
Ben Shneiderman promoted prospective user interfaces
• Interactive tools that show you what aspects influence and
change the outcome of an AI
How would that work? It has already been done for decades!
23. How do we make explanations contrastive,
and selective?
How do we make sure they fit our mental
models and beliefs?
Let’s make them interactive!
24. Interactive ML is not new…
Dudley (2018) and Amershi (2019) show that two decades of research have already looked at these issues in communities like IUI and CHI…
Example: Crayons, 2003
Fails & Olson, Crayons, IUI (2003)
25. Traditional ML
Amershi et al. 2014: Power to the People
• ML works with experts on
feature selection / data
representation
• Use ML, build predictions, go
back to expert for validation
• Long and slow cycle, big steps
• Exploration is mostly on the side of the ML / data scientist
26. Interactive ML
• User directly interacts with the
model
• Incremental but fast updates,
small steps, low-cost trial & error
• Smaller cycles, gives better
understanding what happens
• Can be done by low-expertise
users
• Examples: recommender
systems and tools like Crayons
Amershi et al. 2014: Power to the People
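A minimal sketch of such an interactive ML loop, assuming scikit-learn's incremental SGDClassifier and a stand-in function for the user's labelling actions (illustrative, not the Crayons implementation):

```python
# Interactive ML loop sketch: retrain incrementally after every small user correction.
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(random_state=0)
classes = np.array([0, 1])            # e.g., background vs. object pixels
rng = np.random.default_rng(0)

def user_labels_some_pixels():
    """Stand-in for the user painting a few labelled pixels in the UI."""
    X = rng.random((5, 3))                   # 5 pixels with RGB features
    y = (X[:, 0] > 0.5).astype(int)          # pretend the user labels by redness
    return X, y

for interaction in range(10):                # each iteration = one quick correction
    X_new, y_new = user_labels_some_pixels()
    model.partial_fit(X_new, y_new, classes=classes)
    # Immediately re-render predictions so the user can judge and correct again.
    preview = model.predict(rng.random((100, 3)))
    print(f"round {interaction}: predicted object share = {preview.mean():.2f}")
```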
27. Interface elements of an IML (Dudley 2018, sec. 4)
‘These elements represent distinct
functionality that the interface
must typically support to deliver a
comprehensive IML workflow’
Not necessarily physically distinct: e.g., Crayons merges sample review and feedback assignment
29. Key Solution Principles according to Dudley (2018)
Exploit interactivity and promote rich interactions
• Interaction for understanding: many UX principles are hard to achieve
in IML (e.g. direct manipulation principles)
• Make the most of the user: balance effort and value of input, avoid
repeated requests, allow retracing of steps and undo
Engage the user
• Provide feedback, show partial predictions, do not ask trivial labeling
tasks
• This may encourage users to spend more time and improve the modeling
31. 18 guidelines
• UX design process
• Brings knowledge from many related fields together
• Goes back to earlier classical work: strongly founded in the mixed-initiative work of Horvitz (IUI 1999)
32. Two example applications of interactive AI / RecSys from my lab that I consider to be prospective user interfaces
33. Preparing for a marathon
Target finish times
Not too fast, not too slow
Pacing (min/km) strategy
Constant ‘flat’ speed is associated
with best performance
Heleen Rutjes
34. Prediction model for setting a
challenging, yet realistic finish time.
Model predictions are based on similar runners:
If runner *sunglasses* has had similar past performances as runner *hat*, yet has a
better Personal Best (PB), then runner *hat* can potentially achieve that too.
Approach: ‘case-based reasoning’ (CBR)
We asked coaches what aspects
they would like to control:
- Select similar runners?
- Select best races to serve as a case?
Research by Barry Smyth: http://medium.com/running-with-data/
35. Making the model interactive
Running coaches could indicate for every previous race how ‘representative’
they consider it.
By setting the slider, the model prediction
was continuously updated.
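A hypothetical sketch of the idea behind this interactive case-based model; the fixed-length, aligned race histories, the weighted distance, and the k-nearest averaging are illustrative assumptions, not the model used in the study.

```python
# Case-based reasoning sketch with coach-adjustable representativeness weights.
import numpy as np

def predict_finish_time(target_races, other_runners, representativeness, k=3):
    """target_races: past race times (minutes) of the runner we advise (aligned races).
    other_runners: list of (past_race_times, personal_best) for other runners.
    representativeness: coach-set weight in [0, 1] per past race (the slider)."""
    w = np.asarray(representativeness, dtype=float)
    target = np.asarray(target_races, dtype=float)

    def similarity(candidate_races):
        # Races the coach marked as more representative count more in the distance.
        diff = np.abs(np.asarray(candidate_races, dtype=float) - target)
        return -np.sum(w * diff)

    # Take the k most similar runners and use their personal bests as the
    # challenging-yet-realistic target; recompute whenever a slider moves.
    ranked = sorted(other_runners, key=lambda r: similarity(r[0]), reverse=True)
    return float(np.mean([pb for _, pb in ranked[:k]]))

# Example: the coach marks the runner's second (bad) race as barely representative.
cases = [([208, 255, 207], 195), ([215, 250, 210], 200),
         ([230, 280, 225], 215), ([209, 262, 204], 198)]
print(predict_finish_time([210, 260, 205], cases, representativeness=[1.0, 0.2, 1.0]))
```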
36. Model interactivity increased
trust and acceptance
Acceptance
Coaches were more inclined to accept a model that they could interact with.
Trust
Model interactivity increased coaches’ perceived competence of the model.
“Without my adjustments the model did not make
sense, but by eliminating the race from Eindhoven,
we’re getting somewhere.”
(Coach 53, familiar runner, interactive condition)
37. Coaches improved the accuracy of the model
Model accuracy was improved by the coaches’ interactions
(mean percent error dropped from 3.14 to 2.33, p = 0.018)
What did the coaches adjust?
Systematic adjustments: more recent races were indicated as more
representative (p < 0.001)
‘Anecdotal’ adjustments: based on knowledge of the specific runner,
running in general, environmental circumstances, etc.
Even when working with unfamiliar runners:
“There is clearly something going on with this lady. Maybe she
stopped training, or she has a persistent injury?”
(Coach 45, unfamiliar runner, non-interactive condition)
39. How to better support users to explore a new music genre?
[Millecamp, M., Htun, N. N., Jin, Y., & Verbert, K. 2018]
[Bostandjiev, S., O’Donovan, J., & Höllerer, T. (2012)]
[Andjelkovic, I., et al. 2019], [He, C., Parra, D., & Verbert, K. 2016]
40. Simple bar plot visualization to explain recommendation
[Millecamp, et al. 2019]
Bar charts: easy to understand, but not very informative (they present only the averaged preferences)
41. More complex contour plot visualization
1) Show the relation between the recommendations, users’ current preferences and the new genre
2) Show the preference intensity of users
Contour plots: a bit hard to understand
Mood control
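A hypothetical sketch of what such a contour visualization could look like in a two-dimensional audio-feature space; valence and energy as axes, and the KDE-based density, are assumptions for illustration, not the study's actual tool.

```python
# Contour-plot sketch: intensity of the user's preferences plus the new genre's tracks.
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
user_tracks = rng.normal([0.3, 0.4], 0.10, size=(200, 2))   # (valence, energy) of liked songs
genre_tracks = rng.normal([0.7, 0.6], 0.12, size=(50, 2))   # songs from the genre to explore

# Contour = preference intensity of the user in the feature space.
kde = gaussian_kde(user_tracks.T)
gx, gy = np.mgrid[0:1:100j, 0:1:100j]
density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)

plt.contour(gx, gy, density)
plt.scatter(genre_tracks[:, 0], genre_tracks[:, 1], marker="x", label="new genre")
plt.xlabel("valence"); plt.ylabel("energy"); plt.legend(); plt.show()
```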
42. Contour plot + Mood control (Most helpful?)
Easily see how the recommendation changes
43. Contour plot + Mood control (Most helpful?)
Easily see how the recommendation changes
44. Research questions
RQ1: How do different types of visualizations (bar charts/contour
plots) influence the perceived helpfulness for new music genre
exploration?
RQ2: How does mood control improve the perceived helpfulness
for new music genre exploration?
45. Study design
2×2 mixed factorial design:
Mood control: between-subject
Visualization: within-subject
Interactive Music Genre Exploration with Visualization and Mood Control
46. Measurements
• Subjective measures: post-task questionnaires
Perceived helpfulness, perceived control, perceived
informativeness and understandability
• Objective measures: user-interactions with the
system
• Musical Sophistication (active engagement &
emotional engagement)
• Participants: mainly university students
• 102 valid responses
[Figure: genre selection frequencies (the genres participants wanted to explore)]
47. Which is more helpful?
Contour plot (vs bar charts):
• More helpful
• Total effect: β = .378, se = .082, p < .001
Control (vs no control):
• Seems to be more helpful
• Total effect: β = .238, se = .123, p = .053 (marginally significant)
Contour + control:
• More helpful
• Total effect: β = 0.242, se = 0.123, p = .049
48. What we have found…
Good visualization is key for understandability and explainability
The contour plot is perceived as more helpful than the bar chart
• More informative, thus more understandable & helpful
• Better mental model?
Interaction only helps with a good mental model / understanding
Mood control itself does not make the system more helpful
• Paired with the contour plot, it benefits the perceived
helpfulness, mostly due to increased informativeness
49. Further work on genre exploration
RecSys 2021: the role of default settings on genre
selection and exploration:
• Tradeoff slider: from genre-representative to more
personalized songs
• Defaults had a strong effect on how far users explored…
RecSys 2022 (just accepted): a longitudinal study in which
participants used the same tool for 4 weeks
• Default effects fade over the weeks
• Users find the tool helpful / keep exploring after 4 weeks
• Some actual change in music profile after 6 weeks!
50. Conclusions
Two separate worlds:
• interpretable Machine Learning: interpretability for data scientists
• human-AI interaction work focused on the user at CHI, UMAP,
IUI (and RecSys)
We should learn from each other and bring them more together!
Human-AI interaction requires a solid understanding of mental models,
cognitive processes and biases, visualization guidelines and user
experience research!