SlideShare a Scribd company logo
Walking the Talk: Random
Exploration of a Chatbot API
@hiccupps
James Thomas
Ministry of Testing, Norwich 2023-06-28
www.associationforsoftwaretesting.org
@hiccupps
@hiccupps
www.ada.com
supported by
@hiccupps
@hiccupps
Project: Extract a Basic Chatbot API
@hiccupps
The Task
Presentation-agnostic API.
Integration with text-based clients.
Turn-based medical assessment.
Short deadline.
@hiccupps
Risks, Questions, Test Ideas
@hiccupps
Internal IDEs
Medical case
runners
E2E tests
MQEs
Case permuting
MKEs
Unit and
integration tests
Exact match to
report
Expected
questions asked
Assessment
without error
Expected
suggestions
present
Thresholds for
various medical
metrics
Literature
Existing behaviour
Expertise
Requirements
Internal KB
Vignettes
Personas
The “assessment space” is vast
Data
Tools Comparisons
A Walker
@hiccupps
Covering the Assessment Space
Unintended consequences.
Discover edges and corners.
Exercise the API extensively.
@hiccupps
The idea
Run unattended medical assessments.
Can be iterated and customised quickly.
Identify places for deeper inspection.
@hiccupps
Start
“OK”
Random
Choose
randomly
Card
type?
More
turns?
Stop
yes
no
text
input
choice
The Simplest Thing That Could Possibly Work
“Welcome to
Ada.”
“What is your
name?”
“Which option,
A, B, or C?”
@hiccupps
✅❌ ✅❌
✅❌
✅❌
✅❌
❓
❓
❓
❓
❓ ❓ ❓
Navigate Explore
Survey
@hiccupps
Demonstration
@hiccupps
Implementation
@hiccupps
Configuration
Error checks
~1/200 turns
Another
symptom
~1/5 turns
@hiccupps
Detailed, Parsable Logs
Archive
config
Archive
state
@hiccupps
Parallel
@hiccupps
Long Assessments
500 Server Error ❌
❌
Back end
Back end
Back end
❌
@hiccupps
Great, but …
@hiccupps
LOGIC-MODEL
TANGLE
LOGIC MODEL
Model-Based Testing
@hiccupps
What is a Model?
A way of describing a system …
… typically to aid understanding …
… often to permit predictions about how the system will behave.
The "system" need not be all of whatever you're looking at.
@hiccupps
http://www.harryrobinson.net/MBT-on-a-shoestring.pdf @hiccupps
@hiccupps
@hiccupps
What is Altwalker?
@hiccupps
@hiccupps
Reflection
@hiccupps
@hiccupps
@hiccupps
@hiccupps
What Worked Well
Verbose logs.
Asserting generally and on fixes.
Randomisation for unknowns.
Configuration for directed exploration.
Toolkit (replay, parallel, analysis, …).
Question-driven development.
Code changes and dependencies.
Card identifiers.
Medical testing.
No explicit model.
What Worked Well … and What was Challenging
@hiccupps
GREAT SHOT KID,
THAT WAS ONE
IN A MILLION
@hiccupps
YEAH!
I TOOK A MILLION
SHOTS
@hiccupps
Questions?
@hiccupps
References
Ada screenshots: https://www.uisources.com/explainer/ada-diagnosing-via-chat-bot
Rope images: https://unsplash.com/photos/pqx4G7EPomY, https://unsplash.com/photos/y5XnFwUgr-k
Wipotec: https://www.wipotec-ocs.com/en/product-inspection/
Microscope: https://londonlaboquip.com/product/microscope-binocular-biological-sc302
Messy lab: https://imgur.com/gallery/bQiK6
Dice: https://www.richardhughesjones.com/luck-randomness/dice-gif/
Altwalker: https://altom.gitlab.io/altwalker/altwalker/
Star Wars: https://www.starwars.com/video/one-in-a-million-shot
@qahiccupps

More Related Content

Similar to Walking the Talk

Text, Tags and Thumbnails: Latest Trends in Bioscience Literature Search
Text, Tags and Thumbnails:Latest Trends in Bioscience Literature SearchText, Tags and Thumbnails:Latest Trends in Bioscience Literature Search
Text, Tags and Thumbnails: Latest Trends in Bioscience Literature Search
marti_hearst
 

Similar to Walking the Talk (20)

Elsevier
ElsevierElsevier
Elsevier
 
AI x Design_Haverinen 2023.pdf
AI x Design_Haverinen 2023.pdfAI x Design_Haverinen 2023.pdf
AI x Design_Haverinen 2023.pdf
 
BI on Big Data Presentation
BI on Big Data PresentationBI on Big Data Presentation
BI on Big Data Presentation
 
Data Science-Why?What?How? By Hari Prasad
Data Science-Why?What?How? By Hari PrasadData Science-Why?What?How? By Hari Prasad
Data Science-Why?What?How? By Hari Prasad
 
Interpreting and Steering AI Explanations Via Interactive Visualizations
Interpreting and Steering AI Explanations Via Interactive VisualizationsInterpreting and Steering AI Explanations Via Interactive Visualizations
Interpreting and Steering AI Explanations Via Interactive Visualizations
 
Scott Edmunds: Using FAIR principles for more Open & Democratic Science
Scott Edmunds: Using FAIR principles for more Open & Democratic ScienceScott Edmunds: Using FAIR principles for more Open & Democratic Science
Scott Edmunds: Using FAIR principles for more Open & Democratic Science
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist Toolbox
 
Lean Security
Lean SecurityLean Security
Lean Security
 
Pathways to Technology Transfer and Adoption: Achievements and Challenges
Pathways to Technology Transfer and Adoption: Achievements and ChallengesPathways to Technology Transfer and Adoption: Achievements and Challenges
Pathways to Technology Transfer and Adoption: Achievements and Challenges
 
Proffer Blockchain Hackathon $17K+ prizes | Launch Presentation
Proffer Blockchain Hackathon $17K+ prizes | Launch PresentationProffer Blockchain Hackathon $17K+ prizes | Launch Presentation
Proffer Blockchain Hackathon $17K+ prizes | Launch Presentation
 
Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford MedMachine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
 
DevOps Support for an Ethical Software Development Life Cycle (SDLC)
DevOps Support for an Ethical Software Development Life Cycle (SDLC)DevOps Support for an Ethical Software Development Life Cycle (SDLC)
DevOps Support for an Ethical Software Development Life Cycle (SDLC)
 
Evaluation and User Study in HCI
Evaluation and User Study in HCIEvaluation and User Study in HCI
Evaluation and User Study in HCI
 
Text, Tags and Thumbnails: Latest Trends in Bioscience Literature Search
Text, Tags and Thumbnails:Latest Trends in Bioscience Literature SearchText, Tags and Thumbnails:Latest Trends in Bioscience Literature Search
Text, Tags and Thumbnails: Latest Trends in Bioscience Literature Search
 
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
 
Weapons of Math Instruction: Evolving from Data0-Driven to Science-Driven
Weapons of Math Instruction: Evolving from Data0-Driven to Science-DrivenWeapons of Math Instruction: Evolving from Data0-Driven to Science-Driven
Weapons of Math Instruction: Evolving from Data0-Driven to Science-Driven
 
Hci Overview
Hci OverviewHci Overview
Hci Overview
 
Tune up your data science process
Tune up your data science processTune up your data science process
Tune up your data science process
 
The Myths + Realities of Machine-Learning Cybersecurity
The Myths + Realities of Machine-Learning CybersecurityThe Myths + Realities of Machine-Learning Cybersecurity
The Myths + Realities of Machine-Learning Cybersecurity
 
Construction and Querying of Dynamic Knowledge Graphs
Construction and Querying of Dynamic Knowledge GraphsConstruction and Querying of Dynamic Knowledge Graphs
Construction and Querying of Dynamic Knowledge Graphs
 

More from James Thomas

Your Testing is a Joke
Your Testing is a JokeYour Testing is a Joke
Your Testing is a Joke
James Thomas
 

More from James Thomas (14)

Exploring with Automation
Exploring with AutomationExploring with Automation
Exploring with Automation
 
How to Test Anything
How to Test AnythingHow to Test Anything
How to Test Anything
 
We Don't Know?
We Don't Know?We Don't Know?
We Don't Know?
 
People problems
People problemsPeople problems
People problems
 
When Support Calls
When Support CallsWhen Support Calls
When Support Calls
 
Testing vs Chicken
Testing vs ChickenTesting vs Chicken
Testing vs Chicken
 
James thomas
James thomasJames thomas
James thomas
 
Theoreticus Prime vs Praktikertron
Theoreticus Prime vs PraktikertronTheoreticus Prime vs Praktikertron
Theoreticus Prime vs Praktikertron
 
Testing All the Way Down, and Other Directions
Testing All the Way Down, and Other DirectionsTesting All the Way Down, and Other Directions
Testing All the Way Down, and Other Directions
 
What is What is Professional Testing?
What is What is Professional Testing?What is What is Professional Testing?
What is What is Professional Testing?
 
Bug-Free Software? Go For It!
Bug-Free Software? Go For It!Bug-Free Software? Go For It!
Bug-Free Software? Go For It!
 
Your Testing is a Joke
Your Testing is a JokeYour Testing is a Joke
Your Testing is a Joke
 
You're Having a Laugh
You're Having  a LaughYou're Having  a Laugh
You're Having a Laugh
 
It's Like That
It's Like ThatIt's Like That
It's Like That
 

Recently uploaded

Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
UXDXConf
 

Recently uploaded (20)

Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Buy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdfBuy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdf
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdf
 
A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System Strategy
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering Teams
 

Walking the Talk

Editor's Notes

  1. **Navigating** Error handling Consistency of API E.g. Male only assessments - compromise test code to get a walking skeleton; POST vs GET, two similar but slightly different schemas in the API itself. These activities make me ask questions… what if I …? How about when? Could it possibly be that …? Problems with the walker and with the product. While automating I’m testing. Don’t be too quick to restrict to what you think the system wants. Can I get from start to finish? What assumptions are required? What workarounds are required? How might developers struggle here? … ** Checking** E.g. certain kinds of dialog turns have different properties to assert on - keys in DTO must be present, or in some kind of relationship. What can I assert specifically and generally? Where are the edge cases? (e.g. by general global assertions failing) What are the error cases? How valuable is it to check these things here? ** Exploring** When you explore you don’t know if you’ll find anything, and if you find something you won’t know whether it’s relevant, and if it’s relevant you won’t know whether it’s important. Code is a tool and a toolkit. Extend it to the next question you have. (Can I get to “call ambulance” outcomes? What would need to happen to do that? How could I avoid it?) Log paths and outcomes. Analyse outside the code for patterns (seen and missing) Don’t error check too heavily. Catch and investigate failures. (Expose assumptions) Check some positive cases by hand. Check failures by hand. Look for patterns. (Don’t have to catch all failures; TODO outcomes are fine because then you can filter them in later analysis.) Explore the data you produce. Replay for repeatability. Configuration to guide the direction of exploration. (Initially, I’d just hack the code) How can I take the extreme choice each time? How can I make the longest assessment? Can I run an assessment for ever? A toolkit to gather data … … for human analysis. Failures are targets. Patterns are indicators. …
  2. Parallel nothing clever - just run two or more copies!