Three related coverage risks stood out when I joined a new project to build a chatbot API for a medical symptom checker. With an infinite space of possible chats, how could we:
1. look for unintended consequences of changes.
2. discover some of the edge and corner case bugs.
3. exercise the API significantly.
To help mitigate these risks I built a client which would randomly walk through dialogs, unattended, and report
on what it had found.
In this talk, I'll describe how I implemented that client by iteratively adding functionality that I hoped would
facilitate my exploration of changes and fixes to the emerging API. I'll give examples of features that worked
well (such as configuration of probabilities for different types of answers) and those that did not (such as checking for specific classes of medical outcome),
explain how I built on top of the client to make a load testing tool, and think about what I'd do differently next time.
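To make that concrete, here is a minimal sketch of the client's core idea: an unattended random walk with configurable probabilities per answer type. Everything in it (the answer types, the weights, the dialog shape) is invented for illustration; the real client drove an HTTP API at the same point.

```python
import random

# Illustrative only: the answer types and weights are invented; the real
# client read probabilities like these from a configuration file.
ANSWER_WEIGHTS = {
    "yes": 0.4,
    "no": 0.4,
    "dont_know": 0.15,
    "abandon": 0.05,
}

def choose_answer(rng):
    """Pick an answer type according to the configured probabilities."""
    types, weights = zip(*ANSWER_WEIGHTS.items())
    return rng.choices(types, weights=weights, k=1)[0]

def walk_dialog(rng, max_turns=50):
    """Randomly walk one dialog, unattended, logging each step verbosely."""
    path = []
    for turn in range(max_turns):
        answer = choose_answer(rng)
        path.append(answer)
        print(f"turn {turn}: answered {answer}")  # verbose log for later analysis
        if answer == "abandon":
            break
    return path

if __name__ == "__main__":
    walk_dialog(random.Random(42))  # a fixed seed makes the walk replayable
```

Putting the weights in configuration rather than in code is what made directed exploration cheap: change the numbers, rerun, compare.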
30. What Worked Well
Verbose logs.
Asserting generally and on fixes.
Randomisation for unknowns.
Configuration for directed exploration.
Toolkit (replay, parallel, analysis, …).
Question-driven development.
What was Challenging
Code changes and dependencies.
Card identifiers.
Medical testing.
State.
No explicit model.
31. References
Ada screenshots: https://www.uisources.com/explainer/ada-diagnosing-via-chat-bot
Wipotec: https://www.wipotec-ocs.com/en/product-inspection/
Microscope: https://londonlaboquip.com/product/microscope-binocular-biological-sc302
Messy lab: https://imgur.com/gallery/bQiK6
Dice: https://www.richardhughesjones.com/luck-randomness/dice-gif/
Altwalker: https://altom.gitlab.io/altwalker/altwalker/
Star Wars: https://www.starwars.com/video/one-in-a-million-shot
Editor's Notes
**Navigating**
Error handling
Consistency of API
E.g. male-only assessments: compromise the test code to get a walking skeleton; POST vs GET; two similar but slightly different schemas in the API itself.
These activities make me ask questions… what if I …? How about when? Could it possibly be that …?
Problems with the walker and with the product.
While automating I’m testing.
Don’t be too quick to restrict yourself to what you think the system wants.
Can I get from start to finish? (A sketch follows these notes.)
What assumptions are required?
What workarounds are required?
How might developers struggle here?
…
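Here is a sketch of that walking-skeleton question ("Can I get from start to finish?"), using an invented in-memory stand-in for the API; the real client made HTTP calls at the same point, and the outcome name is made up.

```python
import random

# Invented stand-in for the chatbot API: after a few questions it returns
# a final outcome. The real client called the HTTP API here instead.
def next_turn(state, answer):
    if state >= 5:
        return {"type": "outcome", "advice": "see_gp"}
    return {"type": "question", "text": f"Question {state}?"}

def can_reach_an_outcome(rng, max_turns=100):
    """Walking skeleton: just try to get from start to finish once."""
    turn = next_turn(0, None)
    for state in range(1, max_turns):
        if turn["type"] == "outcome":
            return True
        turn = next_turn(state, rng.choice(["yes", "no"]))
    return False  # never finished: a problem with the walker or the product

print(can_reach_an_outcome(random.Random(1)))
```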
**Checking**
E.g. certain kinds of dialog turn have different properties to assert on: keys in the DTO must be present, or must stand in some relationship to each other. (Sketched after these questions.)
What can I assert specifically and generally?
Where are the edge cases? (e.g. by general global assertions failing)
What are the error cases?
How valuable is it to check these things here?
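A sketch of what general and specific assertions might look like on an invented dialog-turn DTO; the key names and the rules are made up, and the shape of the checking is the point. Failures of the general assertions are one way the edge cases show themselves.

```python
# General check: every turn, whatever its kind, must carry these keys.
REQUIRED_KEYS = {"id", "type"}

def check_turn(turn):
    # General assertion, applied to every turn; failures here often
    # point at edge cases nobody thought to look for.
    missing = REQUIRED_KEYS - turn.keys()
    assert not missing, f"turn {turn.get('id')} missing keys: {missing}"

    # Specific assertions for particular kinds of turn, including keys
    # that must be in some relationship with each other.
    if turn["type"] == "choice":
        assert "options" in turn, "choice turns must offer options"
        assert len(turn["options"]) >= 2, "a choice needs at least two options"

check_turn({"id": 7, "type": "choice", "options": ["yes", "no"]})
```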
**Exploring**
When you explore you don’t know if you’ll find anything, and if you find something you won’t know whether it’s relevant, and if it’s relevant you won’t know whether it’s important.
Code is a tool and a toolkit. Extend it to the next question you have. (Can I get to “call ambulance” outcomes? What would need to happen to do that? How could I avoid it?)
Log paths and outcomes. Analyse outside the code for patterns (seen and missing).
Don’t error check too heavily. Catch and investigate failures. (Expose assumptions)
Check some positive cases by hand.
Check failures by hand. Look for patterns. (Don’t have to catch all failures; TODO outcomes are fine because then you can filter them in later analysis.)
Explore the data you produce.
Replay for repeatability.
Configuration to guide the direction of exploration. (Initially, I’d just hack the code)
How can I take the extreme choice each time? How can I make the longest assessment? Can I run an assessment for ever?
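A sketch of how replay and directed exploration can hang off two small hooks: a logged seed and a configuration flag. The flag name and the severity scale are invented for illustration.

```python
import random
import time

def run_walk(seed=None, always_extreme=False):
    """One walk: replayable via its seed, steerable via configuration."""
    seed = seed if seed is not None else int(time.time())
    rng = random.Random(seed)
    print(f"seed={seed}")  # log the seed so this exact walk can be replayed
    severities = ["0", "5", "10"]  # invented answer scale
    for _ in range(10):
        # Directed exploration: optionally take the extreme choice each time.
        answer = severities[-1] if always_extreme else rng.choice(severities)
        print(answer)

run_walk()                     # random exploration, replayable via its seed
run_walk(seed=123)             # replay a specific earlier walk
run_walk(always_extreme=True)  # directed: the extreme choice every time
```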
A toolkit to gather data …
… for human analysis.
Failures are targets.
Patterns are indicators.
…
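A sketch of the gather-then-analyse idea: each walk appends one JSON line, and the counting happens afterwards, outside the walker, where a human can look for patterns seen and patterns missing. The file name and fields are invented.

```python
import json
from collections import Counter

def log_walk(path, outcome, logfile="walks.jsonl"):
    """Append one walk's path and outcome as a single JSON line."""
    with open(logfile, "a") as f:
        f.write(json.dumps({"path": path, "outcome": outcome}) + "\n")

def outcome_counts(logfile="walks.jsonl"):
    """Offline analysis: how often did each outcome occur?"""
    with open(logfile) as f:
        return Counter(json.loads(line)["outcome"] for line in f)

log_walk(["yes", "no", "yes"], "see_gp")
log_walk(["no", "no"], "self_care")
print(outcome_counts())  # outcomes that never occur are targets too
```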
Parallel: nothing clever, just run two or more copies!