Data/AI driven product
development
from video streaming to telehealth
Xavier Amatriain
Co-founder/CTO Curai
(with Anitha Kannan, Head of ML Research, Curai)
August 18, 2022
About me...
● Researcher in Recommender Systems
● Started and led ML Algorithms at Netflix
● Head of Engineering at Quora
● Currently co-founder/CTO at Curai
Outline
1. Data/AI driven product development: experiences in recommender
systems
2. Data/AI driven product development in healthcare: the Curai
experience
3. Principles for data/AI driven product development
Principles for data/AI driven product development (preview)
1. Make data trustworthy and accessible
2. Follow a hypothesis-driven offline/online
experimentation approach with clearly defined metrics
3. Start from the simplest approach, ensure AI improves
over time, with data/metrics driving improvement
4. More data only matters if it’s better data, and if the
model is complex enough to learn from it
5. AI affects UX and UX affects AI
1. Data/AI driven
product
development:
Recsys
What we were interested in:
▪ Improving the product with data + AI
▪ Hypothesis: higher quality recommendations
will lead to higher member retention
Proxy (offline) question:
▪ Accuracy in predicted rating
▪ Improve by 10% = $1 million!
▪ Metric: RMSE (root mean squared error)
▪ Top 2 algorithms
▪ SVD - Prize RMSE: 0.8914
▪ RBM - Prize RMSE: 0.8990
▪ Linear blend Prize RMSE: 0.88
▪ Limitations
▪ Designed for 100M ratings, not XB ratings
▪ Not adaptable as users add ratings
▪ Performance issues
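To make the offline metric concrete, here is a minimal sketch of RMSE and of a two-model linear blend fit by least squares. The ratings and both prediction lists are made-up toy values, and `fit_blend` and `blend` are hypothetical helpers written for this illustration, not the Prize code.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def fit_blend(pred_a, pred_b, actual):
    """Least-squares weight w for the blend w * pred_a + (1 - w) * pred_b."""
    num = sum((a - b) * (y - b) for a, b, y in zip(pred_a, pred_b, actual))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    return num / den if den else 0.5

def blend(pred_a, pred_b, w):
    """Combine two prediction lists with weight w on the first one."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]

# Toy data (illustrative only, not Netflix Prize data).
actual = [4.0, 3.0, 5.0, 2.0, 4.0]
model_a = [3.5, 3.4, 4.2, 2.9, 4.1]  # stand-in for one model's predictions
model_b = [4.4, 2.5, 4.9, 1.2, 3.3]  # stand-in for the other model's predictions

w = fit_blend(model_a, model_b, actual)
blended = blend(model_a, model_b, w)
print(rmse(model_a, actual), rmse(model_b, actual), rmse(blended, actual))
```

On the data it was fit on, the blend can never be worse than either input, because w = 1 and w = 0 recover the individual models. In practice the weight would be fit on held-out ratings.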
What about the final prize ensembles?
● Offline studies showed they were
too computationally intensive to
scale
● Expected improvement not worth
engineering effort
● Plus… we uncovered that the
proxy question (offline
experiment) did not correlate with
online product gains
https://amatriain.net/blog/
Evolution of the Recommender Problem
[Figure: progression from Rating (e.g. 4.7) to Ranking to Page Optimization to Context-aware Recommendations]
Example: Two features, linear model
[Figure: items plotted on Popularity and Predicted Rating axes, numbered 1 to 5 by their Final Ranking under the linear model]
Linear Model: f_rank(u,v) = w_1 p(v) + w_2 r(u,v) + b
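The two-feature linear model can be sketched in a few lines. The weights, the 0 to 1 popularity scale, and the catalog entries below are assumptions chosen for illustration; in a real system w1, w2 and b would be learned from data.

```python
def rank_score(popularity, predicted_rating, w1, w2, b):
    """f_rank(u, v) = w1 * p(v) + w2 * r(u, v) + b."""
    return w1 * popularity + w2 * predicted_rating + b

def rank_items(items, w1=0.3, w2=0.7, b=0.0):
    """Return titles ordered by descending rank score.

    items: list of (title, popularity, predicted_rating) tuples.
    """
    scored = [(rank_score(p, r, w1, w2, b), title) for title, p, r in items]
    return [title for _, title in sorted(scored, reverse=True)]

# Hypothetical catalog: (title, popularity in [0, 1], predicted rating in [1, 5]).
catalog = [("A", 0.9, 3.1), ("B", 0.2, 4.7), ("C", 0.6, 4.0)]

print(rank_items(catalog))                   # rating-heavy weights
print(rank_items(catalog, w1=1.0, w2=0.0))   # popularity only
```

Changing the weights changes the final ranking even though the predicted ratings stay the same, which is why rating accuracy alone is only a proxy for ranking quality.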
Ranking - Quora Feed
Goal: Present most interesting stories for
a user at a given time
Interesting = topical relevance +
social relevance + timeliness
Stories = questions + answers
Model: Personalized learning-to-rank
approach
Relevance-ordered vs time-ordered =
big gains in engagement
From ranking to page composition and beyond
From “Modeling User Attention and
Interaction on the Web” 2014 - D. Lagun
2. Data/AI driven
product
development
revisited:
Healthcare
Healthcare access, quality, and scalability
● >50% of the world has no access
to essential health services
○ ~30% of US adults are
under-insured
● ~15 min. to capture
information, diagnose, and
recommend treatment
● 30% of the medical errors
causing ~400k deaths a
year are due to
misdiagnosis
● Shortage of 120,000 physicians by 2030
Towards an AI powered learning health system
● Mobile-First Care, always
on, accessible, affordable
● AI + human providers in
the loop for quality care
● Always-Learning system
● AI to operate in-the-wild
(EHR)
[Figure: cycle linking FEEDBACK, DATA, and MODEL]
AI-augmented
medical
conversations
What does it mean for AI to be part of medical
practice?
Breakthroughs in AI & healthcare
Research areas at Curai
● Medical Reasoning and
Diagnosis
● NLP/Conversational AI
● Multimodal AI
Healthcare is knowledge intensive
● Medical terminologies/ontologies
○ SNOMED, UMLS, ICD 10
● Expert systems for clinical decision making
○ 1000s diseases and 3500+ findings
○ 30+ years of expert curation
● Electronic access to medical research
● Online reputed websites
Adding domain knowledge to modern AI
approaches is an active area of research
1. Differential
Diagnosis
SOTA Medical reasoning and diagnosis
ML + Expert systems for Dx models
Inputs: female, middle aged, fever, cough
DDx with expert system:
  Influenza 16.9
  bacterial pneumonia 16.9
  acute sinusitis 10.9
  asthma 10.9
  common cold 10.9
DDx with ML model:
  influenza 0.753
  bacterial pneumonia 0.205
  asthma 0.017
  acute sinusitis 0.008
  pulmonary tuberculosis 0.007
[Figure: Expert system → Clinical case simulator → Clinical cases + DDx → ML model; Other data (e.g. EHR) is also shown as an input. Example case: Female, Middle-aged, Chronic cough, Nasal congestion; DDx: Common cold, UTI, Acute bronchitis]
COVID-aware modeling
[Figure: Expert system → Clinical case simulator → Clinical cases with DDx → ML model; COVID-19 assessment data is also shown as an input. Example simulated case: Female, Middle-aged, Chronic cough, Nasal congestion; DDx: Common cold, UTI, Acute bronchitis. Example COVID-19 case: female, middle-age, cough, headache, nose discharge, cigarette smoking, hospital personnel; label: COVID-19]
Evaluation
Clinical cases from the Semigran dataset.
No clinical case corresponds to COVID.

Model                   top-1   top-3   top-5
Practitioners           72.1%   84.3%   -
Razzaki et al.          -       46.6%   64.7%
Expert system           66%     75%     86%
Ours - Baseline         67.6%   85.8%   92.9%
Ours - COVID as label   61.8%   84.4%   93.3%

Semigran et al. Evaluation of symptom checkers for self diagnosis and triage: audit study, BMJ 2015
● Adding COVID does not adversely affect performance
● Previous best result based on inference on graphical model
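The top-1, top-3 and top-5 columns are top-k accuracies: the share of cases whose true diagnosis appears among the first k entries of the ranked differential. A minimal sketch follows; the four ranked lists and their labels are made-up toy cases, not the Semigran vignettes.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of cases whose true label is within the first k predictions."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)

# Toy differentials, each ordered from most to least likely (illustrative only).
cases = [
    ["influenza", "bacterial pneumonia", "asthma"],
    ["common cold", "acute sinusitis", "influenza"],
    ["asthma", "acute bronchitis", "common cold"],
    ["UTI", "common cold", "acute bronchitis"],
]
truth = ["influenza", "acute sinusitis", "common cold", "acute bronchitis"]

for k in (1, 2, 3):
    print(k, top_k_accuracy(cases, truth, k))
```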
2. Conversational
History Taking
P: Right now my stomach hurts.
P: It feels like I do need to do a clean out. If you know what I
mean
D: Sorry for the abdominal pain. When did you have last
bowel movement?
P: It was yesterday
D: What was the consistency of stool. Was it soft
well-formed or was it hard?
P: Right now I just want and its watery and very loosely
P: That was was causing with my stomach hurts
D: Any blood or mucus with stools? Was it foul smelling?
P: Nope for all three
D: Any fever
P: Nope
D: I asked as blood or mucus in stool can be due an
underlying infection
D: Any nausea/vomiting?
P: Nope
P: Why does this happen to me?
P: Is it something I have ate?
D: Diarrhea can be often due to indigestion or infection. Did
you eat any outside food or packaged food?
P: yes
Patient-provider dialogue
*The conversation has been de-identified for privacy protection
Combining SOTA LLMs with knowledge
● LLMs are great at:
○ Ability to adapt to a broad range
of tasks and situations
○ Ability to engage with the
audience
○ Giving empathetic responses
○ Showing personality and
sounding natural
Thoppilan et al. LaMDA: Language Models for Dialog Applications, 2022
Roller et al. Recipes for building an open-domain chatbot, 2020
Adiwardana et al. Towards a human-like open-domain chatbot, 2020
● LLMs are not great at:
○ Staying truthful. I.e. they often
hallucinate knowledge
○ Dealing with long-range
dependencies and solving tasks
with large output space
○ Reasoning. They can “retrieve”
knowledge without deeper
understanding or reasoning
Conversational history taking
1. Natural language
understanding
a. What did the patient say?
2. Dialog management
a. What to ask when?
b. How to decide when to stop
3. Natural language generation
a. How to ask?
3. Summarization
Medical summarization using LLMs
● Insight 1: LLMs (e.g. GPT-3) can be
prompted to produce good
summaries in a few-shot setting
● Insight 2: LLMs can be ensembled
and used as data generators to
improve quality of summarization
results
● Insight 3: Medical domain knowledge
can be injected into these models so
that they produce medically correct
and complete summaries
[Figure: GPT-3-ENS pipeline. Priming: 21 labeled examples per priming context. Inference: 10 trials with GPT-3 on a chat snippet ("DR: Thanks for ... PT: No that’s everything"), giving candidate summaries such as "Hasn't used any thing to help", "Hasn’t used any thing to help other than hydrocortisone", and "Used nothing else to help other than Benadryl and hydrocortisone." The GPT-3-ENS labeled dataset + the doctor labeled/corrected dataset feed the in-house summarization model]
Confidentiality note: In accordance with our privacy policy, the illustrative examples included in this document do NOT correspond to real patients. They are either synthetic or fully anonymized.
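One way to picture the ensembling step: run several trials, each primed with a different sample of labeled examples, and keep the candidate summary that covers the most medical concepts found in the snippet. The selection rule, the two-word concept list, the toy snippet, and the `make_fake_llm` stand-in are all assumptions for illustration; no real GPT-3 call is made here.

```python
import random

# Hypothetical mini-vocabulary; a real system would use a medical ontology.
MEDICAL_CONCEPTS = {"hydrocortisone", "benadryl"}

def concepts_in(text):
    """Medical concepts mentioned in a piece of text (toy exact matching)."""
    words = {w.strip(".,").lower() for w in text.split()}
    return words & MEDICAL_CONCEPTS

def select_best(snippet, candidates):
    """Keep the candidate sharing the most medical concepts with the snippet."""
    target = concepts_in(snippet)
    return max(candidates, key=lambda c: len(concepts_in(c) & target))

def ensemble_summarize(snippet, labeled_pool, llm, trials=10, shots=21, seed=0):
    """Run several few-shot trials with different priming sets, keep the best."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(trials):
        priming = rng.sample(labeled_pool, min(shots, len(labeled_pool)))
        candidates.append(llm(priming, snippet))
    return select_best(snippet, candidates)

CANNED = [
    "Hasn't used any thing to help",
    "Hasn't used any thing to help other than hydrocortisone",
    "Used nothing else to help other than Benadryl and hydrocortisone.",
]

def make_fake_llm():
    """Stand-in for a few-shot LLM call: cycles through canned summaries."""
    calls = {"n": 0}
    def llm(priming_examples, snippet):
        out = CANNED[calls["n"] % len(CANNED)]
        calls["n"] += 1
        return out
    return llm

# Made-up snippet and labeled pool, for illustration only.
snippet = "PT: I tried Benadryl and hydrocortisone, nothing else."
pool = [(f"snippet {i}", f"summary {i}") for i in range(30)]
best = ensemble_summarize(snippet, pool, make_fake_llm())
print(best)
```

The summaries picked this way can then serve as synthetic labels alongside doctor-labeled data when training a smaller in-house model.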
Qualitative Results

Example 1
Snippet:
  DR: Have you ever been tested for any underlying health conditions such as diabetes, hypothyroidism or polycystic ovarian syndrome?
  PT: No
  PT: I have been told I have prediabetes.
Model trained on 6400 doctor-labeled:
  Has not been tested for any underlying health conditions.
Model trained on 6400 GPT-3 Ensembled:
  Hasn’t tested for any underlying health conditions such as diabetes, hypothyroidism or polycystic ovarian syndrome
Model trained on doctor-labeled + GPT-3 Ensembled:
  Has not been tested for any underlying health conditions. Has been told has prediabetes.

Example 2
Snippet:
  DR: Do you have pus appearing discharge from the site?
  PT: Yes. If the bubbles pop it leaks out a watery substance
Model trained on 6400 doctor-labeled:
  Has pus appearing from the site.
Model trained on 6400 GPT-3 Ensembled:
  Pus appearing from the site
Model trained on doctor-labeled + GPT-3 Ensembled:
  Pus discharge from the site. If bubbles pop it leaks out a substance.
*The conversation has been de-identified for privacy protection
Chintagunta et al. Medically aware GPT-3 as a data generator for medical dialog summarization, MLHC 2021
3. Principles
Do I need more
data, better data,
better AI algorithms,
or all the above?
What do I need to be “really” data/AI driven?
The case(s) for more/bigger data
Norvig:
“Google does not have
better Algorithms only
more Data”
The case(s) against more data
Is it about bigger models then?
The “Big data paradox” is not a paradox
● Not all data is good data (aka more data only matters if it is “better
data”)
● Only more complex models can benefit from more data -
bias/variance tradeoff
● We need to combine better data with better/more complex models
● And… all of this does not hold for highly parametrized deep learning
models where the bias/variance tradeoff breaks for still unknown
reasons (maybe related to double descent)
Better data leads to better models
Year | Breakthrough in AI | Datasets (First Available) | Algorithms (First Proposal)
1994 | Human-level spontaneous speech recognition | Spoken Wall Street Journal articles and other texts (1991) | Hidden Markov Model (1984)
1997 | IBM Deep Blue defeated Garry Kasparov | 700,000 Grandmaster chess games, aka “The Extended Book” (1991) | Negascout planning algorithm (1983)
2005 | Google’s Arabic- and Chinese-to-English translation | 1.8 trillion tokens from Google Web and News pages (collected in 2005) | Statistical machine translation algorithm (1988)
2011 | IBM Watson became the world Jeopardy! champion | 8.6 million documents from Wikipedia, Wiktionary, Wikiquote, and Project Gutenberg (updated in 2005) | Mixture-of-Experts algorithm (1991)
2014 | Google’s GoogLeNet object classification at near-human performance | ImageNet corpus of 1.5 million labeled images and 1,000 object categories (2010) | Convolutional neural network algorithm (1989)
2015 | Google’s DeepMind achieved human parity in playing 29 Atari games by learning general control from video | Arcade Learning Environment dataset of over 50 Atari games (2013) | Q-learning algorithm (1992)
Average No. of Years to Breakthrough | | 3 years | 18 years
The average elapsed time between key algorithm proposals and corresponding advances was about 18 years,
whereas the average elapsed time between key dataset availabilities and corresponding advances was less
than 3 years, or about 6 times faster.
WARNING! It is not
only about data +
models
Model learning depends on objective + metric
● Quora feed example:
○ Training data = implicit + explicit
○ Target function = Value of showing
a story to a user ~ weighted sum of
actions
■ Compute probability of each
action given a story, weight them
by their value to compute expected
value
○ Metric = Any ranking metric
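The target function described above, an expected value built from per-action probabilities and per-action values, can be sketched as follows. The action names, their values, and the predicted probabilities are hypothetical placeholders, not Quora's actual configuration.

```python
# Hypothetical value of each action to the product (negative = harmful).
ACTION_VALUES = {"upvote": 1.0, "share": 2.0, "click": 0.2, "hide": -3.0}

def expected_value(action_probs, action_values=ACTION_VALUES):
    """Sum over actions of P(action | user, story) * value(action)."""
    return sum(p * action_values[a] for a, p in action_probs.items())

def rank_stories(stories):
    """Order stories by descending expected value of showing them."""
    return sorted(stories, key=lambda s: expected_value(s["probs"]), reverse=True)

# Toy per-story action probabilities, as a model might predict them.
stories = [
    {"id": "q1", "probs": {"upvote": 0.10, "share": 0.02, "click": 0.50, "hide": 0.01}},
    {"id": "q2", "probs": {"upvote": 0.05, "share": 0.01, "click": 0.70, "hide": 0.10}},
    {"id": "q3", "probs": {"upvote": 0.20, "share": 0.05, "click": 0.30, "hide": 0.00}},
]

for story in rank_stories(stories):
    print(story["id"], round(expected_value(story["probs"]), 3))
```

Note that q2 has the highest click probability but ranks last, because its likely hides outweigh its clicks. The choice of action values is where the product objective enters the model.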
UI is key: e.g. explanations
The importance of the experimentation framework
● Offline
○ Measure model performance, using metrics
○ Offline performance = indication to make decisions
on follow-up A/B tests
○ A critical (and mostly unsolved) issue is how offline
metrics correlate with A/B test results.
● Online
○ Measure differences in metrics across statistically
identical populations that each experience a
different algorithm.
○ Overall Evaluation Criteria (OEC)
■ Use long-term metrics whenever possible
■ Short-term metrics can be informative and
allow faster decisions, but they are not always
aligned with the OEC
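For the online side, a common way to compare a binary metric (for example, whether a member was retained) across two populations is a two-proportion z-test. This is a generic statistical sketch rather than the specific framework used at any of the companies mentioned, and the counts are invented.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    conv_a / n_a: successes and sample size in control (A).
    conv_b / n_b: successes and sample size in treatment (B).
    Returns (z, p_value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Invented counts: 10.0% vs 11.0% success rate, 10,000 users per cell.
z, p = two_proportion_z_test(1000, 10_000, 1100, 10_000)
print(round(z, 2), round(p, 4))
```

A small p-value only says the two cells differ on this metric. Whether that metric tracks the long-term OEC is a separate question.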
PRINCIPLES
Principles for data/AI driven product development
● Make Data Trustworthy
● Make Data Accessible
● Follow a Hypothesis-driven approach
● Define Clear Metrics
● Measure offline/online
Principles for data/AI driven product development (I)
● Data/metrics drive AI
● AI should improve over time
● More data only matters if it’s better data
● Start with simplest model
● Increase model complexity and data size in parallel
● Connect AI to UI
Principles for data/AI driven product development (summary)
1. Make data trustworthy and accessible
2. Follow a hypothesis-driven offline/online
experimentation approach with clearly defined metrics
3. Start from the simplest approach, ensure AI improves
over time, with data/metrics driving improvement
4. More data only matters if it’s better data, and if the
model is complex enough to learn from it
5. AI affects UX and UX affects AI
4. Further
“reading”
4 hour lecture on recommendations
Carnegie Mellon (2014)
1 hour lecture on practical Deep Learning
UC Berkeley (2020)
10 minutes on AI for COVID
Stanford (2020)
1 hour podcast on AI for Healthcare
Gradient Dissent (2021)

Mais conteúdo relacionado

Mais procurados

Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
DAS Slides: Data Architect vs. Data Engineer vs. Data Modeler
DAS Slides: Data Architect vs. Data Engineer vs. Data ModelerDAS Slides: Data Architect vs. Data Engineer vs. Data Modeler
DAS Slides: Data Architect vs. Data Engineer vs. Data Modeler
DATAVERSITY
 
An overview of BigQuery
An overview of BigQuery An overview of BigQuery
An overview of BigQuery
GirdhareeSaran
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
Prakash Chockalingam
 
Big query
Big queryBig query
Big query
Tanvi Parikh
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
Lucian Neghina
 
Understanding big data and data analytics big data
Understanding big data and data analytics big dataUnderstanding big data and data analytics big data
Understanding big data and data analytics big data
Seta Wicaksana
 
Natural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4jNatural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4j
William Lyon
 
Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...
Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...
Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...
Edureka!
 
HITS + Pagerank
HITS + PagerankHITS + Pagerank
HITS + Pagerank
ajkt
 
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
Krishnaram Kenthapadi
 
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Edureka!
 
Reconciling your Enterprise Data Warehouse to Source Systems
Reconciling your Enterprise Data Warehouse to Source SystemsReconciling your Enterprise Data Warehouse to Source Systems
Reconciling your Enterprise Data Warehouse to Source Systems
Method360
 
Generative AI and Security (1).pptx.pdf
Generative AI and Security (1).pptx.pdfGenerative AI and Security (1).pptx.pdf
Generative AI and Security (1).pptx.pdf
Priyanka Aash
 
Introduction of Data Science
Introduction of Data ScienceIntroduction of Data Science
Introduction of Data Science
Jason Geng
 
Generative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGenerative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First Session
Gene Leybzon
 
AI and Future of Professions
AI and Future of ProfessionsAI and Future of Professions
AI and Future of Professions
Jeffrey Funk
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural search
Dmitry Kan
 
An Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBAn Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDB
Lee Theobald
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
Mahir Haque
 

Mais procurados (20)

Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
DAS Slides: Data Architect vs. Data Engineer vs. Data Modeler
DAS Slides: Data Architect vs. Data Engineer vs. Data ModelerDAS Slides: Data Architect vs. Data Engineer vs. Data Modeler
DAS Slides: Data Architect vs. Data Engineer vs. Data Modeler
 
An overview of BigQuery
An overview of BigQuery An overview of BigQuery
An overview of BigQuery
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
 
Big query
Big queryBig query
Big query
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 
Understanding big data and data analytics big data
Understanding big data and data analytics big dataUnderstanding big data and data analytics big data
Understanding big data and data analytics big data
 
Natural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4jNatural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4j
 
Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...
Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...
Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...
 
HITS + Pagerank
HITS + PagerankHITS + Pagerank
HITS + Pagerank
 
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
 
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
 
Reconciling your Enterprise Data Warehouse to Source Systems
Reconciling your Enterprise Data Warehouse to Source SystemsReconciling your Enterprise Data Warehouse to Source Systems
Reconciling your Enterprise Data Warehouse to Source Systems
 
Generative AI and Security (1).pptx.pdf
Generative AI and Security (1).pptx.pdfGenerative AI and Security (1).pptx.pdf
Generative AI and Security (1).pptx.pdf
 
Introduction of Data Science
Introduction of Data ScienceIntroduction of Data Science
Introduction of Data Science
 
Generative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGenerative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First Session
 
AI and Future of Professions
AI and Future of ProfessionsAI and Future of Professions
AI and Future of Professions
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural search
 
An Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBAn Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDB
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 

Semelhante a Data/AI driven product development: from video streaming to telehealth

AI Driven Product Innovation
AI Driven Product InnovationAI Driven Product Innovation
AI Driven Product Innovation
ebelani
 
AI-driven product innovation: from Recommender Systems to COVID-19
AI-driven product innovation: from Recommender Systems to COVID-19AI-driven product innovation: from Recommender Systems to COVID-19
AI-driven product innovation: from Recommender Systems to COVID-19
Xavier Amatriain
 
The future interface of mental health with information technology: high touch...
The future interface of mental health with information technology: high touch...The future interface of mental health with information technology: high touch...
The future interface of mental health with information technology: high touch...
HealthXn
 
HEC 2016 Panel: Putting User-Generated Data in Action: Improving Interpretabi...
HEC 2016 Panel: Putting User-Generated Data in Action: Improving Interpretabi...HEC 2016 Panel: Putting User-Generated Data in Action: Improving Interpretabi...
HEC 2016 Panel: Putting User-Generated Data in Action: Improving Interpretabi...
Pei-Yun Sabrina Hsueh
 
Diabetes therapies and technology: implications for doctors and patients
Diabetes therapies and technology: implications for doctors and patientsDiabetes therapies and technology: implications for doctors and patients
Diabetes therapies and technology: implications for doctors and patients
HealthXn
 
The Skynet Effect: How HR Can Best Utilize AI
The Skynet Effect: How HR Can Best Utilize AIThe Skynet Effect: How HR Can Best Utilize AI
The Skynet Effect: How HR Can Best Utilize AI
Aggregage
 
AMDIS CHIME Fall Symposium
AMDIS CHIME Fall SymposiumAMDIS CHIME Fall Symposium
AMDIS CHIME Fall Symposium
Dale Sanders
 
AI for COVID-19 - Q42020 update
AI for COVID-19 - Q42020 updateAI for COVID-19 - Q42020 update
AI for COVID-19 - Q42020 update
Xavier Amatriain
 
Renish Dadhaniya - GlobeSync Technologies | Work at a glance
Renish Dadhaniya - GlobeSync Technologies | Work at a glanceRenish Dadhaniya - GlobeSync Technologies | Work at a glance
Renish Dadhaniya - GlobeSync Technologies | Work at a glance
GlobeSync Technologies
 
The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...
The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...
The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...
marcus evans Network
 
Big Data Means Big Potential Challenges for Nurse Execs Response.pdf
Big Data Means Big Potential Challenges for Nurse Execs Response.pdfBig Data Means Big Potential Challenges for Nurse Execs Response.pdf
Big Data Means Big Potential Challenges for Nurse Execs Response.pdf
bkbk37
 
mini project computer engineering e-doctor.pdf
mini project computer engineering e-doctor.pdfmini project computer engineering e-doctor.pdf
mini project computer engineering e-doctor.pdf
289khashshamkhan
 
Conversation research: leveraging the power of social media
Conversation research: leveraging the power of social mediaConversation research: leveraging the power of social media
Conversation research: leveraging the power of social media
SKIM
 
Ai applied in healthcare
Ai applied in healthcareAi applied in healthcare
Ai applied in healthcare
Javier Samir Rey
 
The Vision for Data @ the NIH
The Vision for Data @ the NIHThe Vision for Data @ the NIH
The Vision for Data @ the NIH
Philip Bourne
 
Mental Health Chatbot System by Using Machine Learning
Mental Health Chatbot System by Using Machine LearningMental Health Chatbot System by Using Machine Learning
Mental Health Chatbot System by Using Machine Learning
IRJET Journal
 
Swapnil soni Thesis_Presentation
Swapnil soni Thesis_PresentationSwapnil soni Thesis_Presentation
Swapnil soni Thesis_Presentation
Swapnil Soni
 
Domain Specific Document Retrieval Framework for Near Real-time Social Health...
Domain Specific Document Retrieval Framework for Near Real-time Social Health...Domain Specific Document Retrieval Framework for Near Real-time Social Health...
Domain Specific Document Retrieval Framework for Near Real-time Social Health...
Artificial Intelligence Institute at UofSC
 
Improving health care outcomes with responsible data science
Improving health care outcomes with responsible data scienceImproving health care outcomes with responsible data science
Improving health care outcomes with responsible data science
Wessel Kraaij
 
From personal health data to a personalized advice
From personal health data to a personalized adviceFrom personal health data to a personalized advice
From personal health data to a personalized advice
Wessel Kraaij
 

Semelhante a Data/AI driven product development: from video streaming to telehealth (20)

AI Driven Product Innovation
AI Driven Product InnovationAI Driven Product Innovation
AI Driven Product Innovation
 
AI-driven product innovation: from Recommender Systems to COVID-19
AI-driven product innovation: from Recommender Systems to COVID-19AI-driven product innovation: from Recommender Systems to COVID-19
AI-driven product innovation: from Recommender Systems to COVID-19
 
The future interface of mental health with information technology: high touch...
The future interface of mental health with information technology: high touch...The future interface of mental health with information technology: high touch...
The future interface of mental health with information technology: high touch...
 
HEC 2016 Panel: Putting User-Generated Data in Action: Improving Interpretabi...
HEC 2016 Panel: Putting User-Generated Data in Action: Improving Interpretabi...HEC 2016 Panel: Putting User-Generated Data in Action: Improving Interpretabi...
HEC 2016 Panel: Putting User-Generated Data in Action: Improving Interpretabi...
 
Diabetes therapies and technology: implications for doctors and patients
Diabetes therapies and technology: implications for doctors and patientsDiabetes therapies and technology: implications for doctors and patients
Diabetes therapies and technology: implications for doctors and patients
 
The Skynet Effect: How HR Can Best Utilize AI
The Skynet Effect: How HR Can Best Utilize AIThe Skynet Effect: How HR Can Best Utilize AI
The Skynet Effect: How HR Can Best Utilize AI
 
AMDIS CHIME Fall Symposium
AMDIS CHIME Fall SymposiumAMDIS CHIME Fall Symposium
AMDIS CHIME Fall Symposium
 
AI for COVID-19 - Q42020 update
AI for COVID-19 - Q42020 updateAI for COVID-19 - Q42020 update
AI for COVID-19 - Q42020 update
 
Renish Dadhaniya - GlobeSync Technologies | Work at a glance
Renish Dadhaniya - GlobeSync Technologies | Work at a glanceRenish Dadhaniya - GlobeSync Technologies | Work at a glance
Renish Dadhaniya - GlobeSync Technologies | Work at a glance
 
The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...
The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...
The Randomized Controlled Trial: The Gold Standard of Clinical Science and a ...
 
Big Data Means Big Potential Challenges for Nurse Execs Response.pdf
Big Data Means Big Potential Challenges for Nurse Execs Response.pdfBig Data Means Big Potential Challenges for Nurse Execs Response.pdf
Big Data Means Big Potential Challenges for Nurse Execs Response.pdf
 
mini project computer engineering e-doctor.pdf
mini project computer engineering e-doctor.pdfmini project computer engineering e-doctor.pdf
mini project computer engineering e-doctor.pdf
 
Conversation research: leveraging the power of social media
Conversation research: leveraging the power of social mediaConversation research: leveraging the power of social media
Conversation research: leveraging the power of social media
 
Ai applied in healthcare
Ai applied in healthcareAi applied in healthcare
Ai applied in healthcare
 
The Vision for Data @ the NIH
The Vision for Data @ the NIHThe Vision for Data @ the NIH
The Vision for Data @ the NIH
 
Mental Health Chatbot System by Using Machine Learning
Mental Health Chatbot System by Using Machine LearningMental Health Chatbot System by Using Machine Learning
Mental Health Chatbot System by Using Machine Learning
 
Swapnil soni Thesis_Presentation
Swapnil soni Thesis_PresentationSwapnil soni Thesis_Presentation
Swapnil soni Thesis_Presentation
 
Domain Specific Document Retrieval Framework for Near Real-time Social Health...
Domain Specific Document Retrieval Framework for Near Real-time Social Health...Domain Specific Document Retrieval Framework for Near Real-time Social Health...
Domain Specific Document Retrieval Framework for Near Real-time Social Health...
 
Improving health care outcomes with responsible data science
Improving health care outcomes with responsible data scienceImproving health care outcomes with responsible data science
Improving health care outcomes with responsible data science
 
From personal health data to a personalized advice
From personal health data to a personalized adviceFrom personal health data to a personalized advice
From personal health data to a personalized advice
 

Mais de Xavier Amatriain

AI for COVID-19: An online virtual care approach
AI for COVID-19: An online virtual care approachAI for COVID-19: An online virtual care approach
AI for COVID-19: An online virtual care approach
Xavier Amatriain
 
Lessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systemsLessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systems
Xavier Amatriain
 
Data/AI driven product development: from video streaming to telehealth

  • 1. Data/AI driven product development from video streaming to telehealth Xavier Amatriain Co-founder/CTO Curai (with Anitha Kannan, Head of ML Research, Curai) August 18, 2022
  • 2. About me... ● Researcher in Recommender Systems ● Started and led ML Algorithms at Netflix ● Head of Engineering at Quora ● Currently co-founder/CTO at Curai
  • 3. Outline 1. Data/AI driven product development: experiences in recommender systems 2. Data/AI driven product development in healthcare: the Curai experience 3. Principles for data/AI driven product development
  • 4. Principles for data/AI driven product development (preview) 1. Make data trustworthy and accessible 2. Follow a hypothesis-driven offline/online experimentation approach with clearly defined metrics 3. Start from the simplest approach, ensure AI improves over time, with data/metrics driving improvement 4. More data only matters if it’s better data, and if the model is complex enough to learn from it 5. AI affects UX and UX affects AI
  • 6. What we were interested in: ▪ Improving the product with data + AI ▪ Hypothesis: higher quality recommendations will lead to higher member retention Proxy (offline) question: ▪ Accuracy in predicted rating ▪ Improve by 10% = $1 million! ▪ Metric: RMSE (root mean squared error) ▪ Top 2 algorithms ▪ SVD - Prize RMSE: 0.8914 ▪ RBM - Prize RMSE: 0.8990 ▪ Linear blend - Prize RMSE: 0.88 ▪ Limitations ▪ Designed for 100M ratings, not XB ratings ▪ Not adaptable as users add ratings ▪ Performance issues
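The "Prize RMSE" figures on this slide are root mean squared errors between predicted and actual ratings. As a minimal sketch of how such a number and a linear blend of two models could be computed: the ratings, model outputs, and blend weight below are made-up illustrations, not values from the talk.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))

def linear_blend(preds_a, preds_b, weight_a):
    """Weighted average of two models' predictions (weight_a is a hypothetical blend weight)."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(preds_a, preds_b)]

# Toy data: two models that err in opposite directions (illustrative only).
actual = [3.0, 3.0]
model_a = [4.0, 2.0]
model_b = [2.0, 4.0]

rmse_a = rmse(model_a, actual)
rmse_b = rmse(model_b, actual)
rmse_blend = rmse(linear_blend(model_a, model_b, 0.5), actual)
```

In this toy case the blend beats both individual models because their errors cancel, which is the intuition behind blending, though real gains are far smaller.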
  • 7. What about the final prize ensembles? ● Offline studies showed they were too computationally intensive to scale ● Expected improvement not worth engineering effort ● Plus… we uncovered that the proxy question (offline experiment) did not correlate with online product gains https://amatriain.net/blog/
  • 8. Evolution of the Recommender Problem: Rating, Ranking, Page Optimization, Context-aware Recommendations
  • 9. Example: Two features, linear model. Features: Popularity p(v) and Predicted Rating r(u,v). Linear Model: f_rank(u,v) = w1 p(v) + w2 r(u,v) + b. Output: Final Ranking
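The two-feature linear model on this slide can be sketched in a few lines. This is only an illustration: the weights, bias, and catalog values are hypothetical, and in practice w1, w2, and b would be learned from data rather than set by hand.

```python
def rank_score(popularity, predicted_rating, w1, w2, b):
    """Linear ranking score: w1 * p(v) + w2 * r(u, v) + b."""
    return w1 * popularity + w2 * predicted_rating + b

def rank_items(items, w1=0.3, w2=0.7, b=0.0):
    """Sort (name, popularity, predicted_rating) tuples by descending score.

    The default weights are illustrative assumptions, not values from the talk.
    """
    scored = [(name, rank_score(p, r, w1, w2, b)) for name, p, r in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical catalog: (title, popularity in [0, 1], predicted rating in [1, 5]).
catalog = [
    ("A", 0.9, 3.1),  # very popular, mediocre predicted rating
    ("B", 0.2, 4.7),  # niche, high predicted rating
    ("C", 0.5, 4.0),
]

ranking = rank_items(catalog)
ranked_titles = [name for name, _ in ranking]
```

With these weights the niche but highly rated title B ranks first; raising w1 relative to w2 would push the popular title A up instead.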
  • 12. Ranking - Quora Feed Goal: Present most interesting stories for a user at a given time Interesting = topical relevance + social relevance + timeliness Stories = questions + answers Model: Personalized learning-to-rank approach Relevance-ordered vs time-ordered = big gains in engagement
  • 13. From ranking to page composition and beyond From “Modeling User Attention and Interaction on the Web” 2014 - D. Lagun
  • 15. Healthcare access, quality, and scalability ● >50% of the world has no access to essential health services ○ ~30% of US adults under-insured ● Shortage of 120,000 physicians by 2030 ● ~15 min. to capture information, diagnose, recommend treatment ● 30% of the medical errors causing ~400k deaths a year are due to misdiagnosis
  • 16. Towards an AI powered learning health system ● Mobile-First Care, always on, accessible, affordable ● AI + human providers in the loop for quality care ● Always-Learning system ● AI to operate in-the-wild (EHR) (Diagram labels: Feedback, Data, Model, AI-augmented medical conversations)
  • 17. What does it mean for AI to be part of medical practice?
  • 18. Breakthroughs in AI & healthcare
  • 19. Research areas at Curai ● Medical Reasoning and Diagnosis ● NLP/Conversational AI ● Multimodal AI
  • 20. Healthcare is knowledge intensive ● Medical terminologies/ontologies ○ SNOMED, UMLS, ICD-10 ● Expert systems for clinical decision making ○ 1000s of diseases and 3500+ findings ○ 30+ years of expert curation ● Electronic access to medical research ● Reputable online websites Adding domain knowledge to modern AI approaches is an active area of research
  • 22. SOTA Medical reasoning and diagnosis
  • 23. ML + Expert systems for Dx models. Diagram: Expert system → Clinical case simulator → Clinical cases with DDx (e.g. findings: Female, Middle-aged, Chronic cough, Nasal congestion; DDx: Common cold, UTI, Acute bronchitis) + Other data (e.g. EHR) → ML model. Example inputs: female, middle aged, fever, cough. DDx with expert system: Influenza 16.9, bacterial pneumonia 16.9, acute sinusitis 10.9, asthma 10.9, common cold 10.9. DDx with ML model: influenza 0.753, bacterial pneumonia 0.205, asthma 0.017, acute sinusitis 0.008, pulmonary tuberculosis 0.007.
  • 24. COVID-aware modeling. Diagram: Expert system → Clinical case simulator → Clinical cases with DDx (e.g. findings: Female, Middle-aged, Chronic cough, Nasal congestion; DDx: Common cold, UTI, Acute bronchitis) + COVID-19 assessment data → ML model. Example COVID-19 case: female, middle-age, cough, headache, nose discharge, cigarette smoking, hospital personnel; label: COVID-19.
  • 25. Evaluation. Clinical cases from Semigran dataset. No clinical case corresponding to COVID.
        Approach                top-1   top-3   top-5
        Practitioners           72.1%   84.3%   -
        Razzaki et al.          -       46.6%   64.7%
        Expert system           66%     75%     86%
        Ours - Baseline         67.6%   85.8%   92.9%
        Ours - COVID as label   61.8%   84.4%   93.3%
        Adding COVID does not adversely affect performance. Previous best result based on inference on graphical model.
        Semigran et al. Evaluation of symptom checkers for self diagnosis and triage: audit study, BMJ 2015
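The top-1/top-3/top-5 columns measure how often the correct diagnosis appears among the first k entries of a ranked differential diagnosis. A minimal sketch of that metric follows; the four cases are invented for illustration and are not taken from the Semigran dataset.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of cases whose true label appears in the first k ranked predictions."""
    hits = sum(
        1
        for predictions, truth in zip(ranked_predictions, true_labels)
        if truth in predictions[:k]
    )
    return hits / len(true_labels)

# Hypothetical ranked differential diagnoses, most likely first.
ranked_ddx = [
    ["influenza", "bacterial pneumonia", "asthma"],
    ["common cold", "acute sinusitis", "influenza"],
    ["asthma", "common cold", "bacterial pneumonia"],
    ["common cold", "influenza", "asthma"],
]
# Hypothetical ground-truth diagnoses for the same four cases.
truth = ["influenza", "acute sinusitis", "bacterial pneumonia", "pulmonary tuberculosis"]

top1 = top_k_accuracy(ranked_ddx, truth, k=1)
top3 = top_k_accuracy(ranked_ddx, truth, k=3)
```

Top-k accuracy can only stay flat or rise as k grows, which matches the pattern in every row of the table.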
  • 27. Patient-provider dialogue (*The conversation has been de-identified for privacy protection)
        P: Right now my stomach hurts.
        P: It feels like I do need to do a clean out. If you know what I mean
        D: Sorry for the abdominal pain. When did you have last bowel movement?
        P: It was yesterday
        D: What was the consistency of stool. Was it soft well-formed or was it hard?
        P: Right now I just want and its watery and very loosely
        P: That was was causing with my stomach hurts
        D: Any blood or mucus with stools? Was it foul smelling?
        P: Nope for all three
        D: Any fever
        P: Nope
        D: I asked as blood or mucus in stool can be due an underlying infection
        D: Any nausea/vomiting?
        P: Nope
        P: Why does this happen to me?
        P: Is it something I have ate?
        D: Diarrhea can be often due to indigestion or infection. Did you eat any outside food or packaged food?
        P: yes
  • 28. Combining SOTA LLMs with knowledge ● LLMs are great at: ○ Adapting to a broad range of tasks and situations ○ Engaging with the audience ○ Giving empathetic responses ○ Showing personality and sounding natural ● LLMs are not great at: ○ Staying truthful, i.e. they often hallucinate knowledge ○ Dealing with long-range dependencies and solving tasks with large output space ○ Reasoning. They can “retrieve” knowledge without deeper understanding or reasoning References: Thoppilan et al. LaMDA: Language Models for Dialog Applications, 2022; Roller et al. Recipes for building an open-domain chatbot, 2020; Adiwardana et al. Towards a human-like open-domain chatbot, 2020
  • 29. Conversational history taking 1. Natural language understanding a. What did the patient say? 2. Dialog management a. What to ask when? b. How to decide when to stop? 3. Natural language generation a. How to ask?
  • 31. Medical summarization using LLMs ● Insight 1: LLMs (e.g. GPT-3) can be prompted to produce good summaries in a few-shot setting ● Insight 2: LLMs can be ensembled and used as data generators to improve quality of summarization results ● Insight 3: Medical domain knowledge can be injected into these models so that they produce medically correct and complete summaries
  • 32. (Diagram: GPT-3-ENS data generation.) Chat Snippet: DR: Thanks for ... PT: No that’s everything. Priming: 21 Labeled Examples per Priming Context. Inference: 10 Trials of GPT-3, with candidate summaries such as “Hasn't used any thing to help”, “Hasn’t used any thing to help other than hydrocortisone”, and “Used nothing else to help other than Benadryl and hydrocortisone.” GPT-3-ENS Labeled Dataset + Doctor Labeled/Corrected Dataset → In-House Summarization Model. Confidentiality note: In accordance with our privacy policy, the illustrative examples included in this document do NOT correspond to real patients. They are either synthetic or fully anonymized.
  • 33. Qualitative Results. Summaries of each snippet from three models: (a) trained on 6400 doctor-labeled, (b) trained on 6400 GPT-3 Ensembled, (c) trained on doctor-labeled + GPT-3 Ensembled.
        Snippet 1: DR: Have you ever been tested for any underlying health conditions such as diabetes, hypothyroidism or polycystic ovarian syndrome? PT: No PT: I have been told I have prediabetes.
          (a) Has not been tested for any underlying health conditions.
          (b) Hasn’t tested for any underlying health conditions such as diabetes, hypothyroidism or polycystic ovarian syndrome
          (c) Has not been tested for any underlying health conditions. Has been told has prediabetes.
        Snippet 2: DR: Do you have pus appearing discharge from the site? PT: Yes. If the bubbles pop it leaks out a watery substance
          (a) Has pus appearing from the site.
          (b) Pus appearing from the site
          (c) Pus discharge from the site. If bubbles pop it leaks out a substance.
        *The conversation has been de-identified for privacy protection
        Chintagunta et al. Medically aware GPT-3 as a data generator for medical dialog summarization, MLHC 2021
  • 35. Do I need more data, better data, better AI algorithms, or all the above? What do I need to be “really” data/AI driven?
  • 36. The case(s) for more/bigger data Norvig: “Google does not have better Algorithms only more Data”
  • 37. The case(s) against more data
  • 38. Is it about bigger models then?
  • 39. The “Big data paradox” is not a paradox ● Not all data is good data (aka more data only matters if it is “better data”) ● Only more complex models can benefit from more data - bias/variance tradeoff ● We need to combine better data with better/more complex models ● And… all of this does not hold for highly parametrized deep learning models where the bias/variance tradeoff breaks for still unknown reasons (maybe related to double descent)
  • 40. Better data leads to better models. Columns: Year, Breakthrough in AI, Datasets (First Available), Algorithms (First Proposal).
        1994: Human-level spontaneous speech recognition. Dataset: Spoken Wall Street Journal articles and other texts (1991). Algorithm: Hidden Markov Model (1984).
        1997: IBM Deep Blue defeated Garry Kasparov. Dataset: 700,000 Grandmaster chess games, aka “The Extended Book” (1991). Algorithm: Negascout planning algorithm (1983).
        2005: Google’s Arabic- and Chinese-to-English translation. Dataset: 1.8 trillion tokens from Google Web and News pages (collected in 2005). Algorithm: Statistical machine translation algorithm (1988).
        2011: IBM Watson became the world Jeopardy! champion. Dataset: 8.6 million documents from Wikipedia, Wiktionary, Wikiquote, and Project Gutenberg (updated in 2005). Algorithm: Mixture-of-Experts algorithm (1991).
        2014: Google’s GoogLeNet object classification at near-human performance. Dataset: ImageNet corpus of 1.5 million labeled images and 1,000 object categories (2010). Algorithm: Convolutional neural network algorithm (1989).
        2015: Google’s DeepMind achieved human parity in playing 29 Atari games by learning general control from video. Dataset: Arcade Learning Environment dataset of over 50 Atari games (2013). Algorithm: Q-learning algorithm (1992).
        Average No. of Years to Breakthrough: 3 years (datasets), 18 years (algorithms).
        The average elapsed time between key algorithm proposals and corresponding advances was about 18 years, whereas the average elapsed time between key dataset availabilities and corresponding advances was less than 3 years, or about 6 times faster.
  • 41. WARNING! It is not only about data + models
  • 42. Model learning depends on objective + metric ● Quora feed example: ○ Training data = implicit + explicit ○ Target function = Value of showing a story to a user ~ weighted sum of actions ■ Compute probability of each action given a story, weight them by their value to compute expected value ○ Metric = Any ranking metric
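The target function described on this slide, an expected value computed as a weighted sum of action probabilities, can be sketched as follows. The action names, their values, and the per-story probabilities are hypothetical placeholders; in a real system the probabilities would come from trained models and the values from product decisions.

```python
# Hypothetical value assigned to each user action (negative for unwanted outcomes).
ACTION_VALUES = {"upvote": 1.0, "share": 2.0, "click": 0.3, "hide": -3.0}

def expected_value(action_probs, action_values=ACTION_VALUES):
    """Weighted sum of predicted action probabilities for one (user, story) pair."""
    return sum(action_values[action] * prob for action, prob in action_probs.items())

# Made-up predicted probabilities of each action for two candidate stories.
stories = {
    "story_1": {"upvote": 0.10, "share": 0.02, "click": 0.40, "hide": 0.01},
    "story_2": {"upvote": 0.05, "share": 0.01, "click": 0.60, "hide": 0.10},
}

# Rank the feed by descending expected value.
feed = sorted(stories, key=lambda s: expected_value(stories[s]), reverse=True)
```

Here story_2 attracts more clicks but is also hidden more often, so the negative weight on hiding pushes it below story_1. This shows how the choice of action values shapes what the model optimizes for.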
  • 43. UI is key: e.g. explanations
  • 44. The importance of the experimentation framework ● Offline ○ Measure model performance, using metrics ○ Offline performance = indication to make decisions on follow-up A/B tests ○ A critical (and mostly unsolved) issue is how offline metrics correlate with A/B test results. ● Online ○ Measure differences in metrics across statistically identical populations that each experience a different algorithm. ○ Overall Evaluation Criteria (OEC) ■ Use long-term metrics whenever possible ■ Short-term metrics can be informative and allow faster decisions. But, not always aligned with OEC
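For the online side, comparing a metric across two statistically identical populations is often done with a significance test. Below is a minimal sketch using a pooled two-proportion z-test on a retention-style metric. The test choice and the user counts are assumptions for illustration, not the procedure or data used in the talk.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical experiment: retained users out of users exposed to each algorithm.
z, p_value = two_proportion_z_test(
    success_a=8000, n_a=10000,  # control cell
    success_b=8150, n_b=10000,  # treatment cell
)
significant = p_value < 0.05
```

A significant short-term difference is still only a proxy: as the slide notes, it may not align with the long-term Overall Evaluation Criteria.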
  • 46. Principles for data/AI driven product development ● Make Data Trustworthy ● Make Data Accessible ● Follow a Hypothesis-driven approach ● Define Clear Metrics ● Measure offline/online
  • 47. Principles for data/AI driven product development (I) ● Data/metrics drive AI ● AI should improve over time ● More data only matters if it’s better data ● Start with simplest model ● Increase model complexity and data size in parallel ● Connect AI to UI
  • 48. Principles for data/AI driven product development (summary) 1. Make data trustworthy and accessible 2. Follow a hypothesis-driven offline/online experimentation approach with clearly defined metrics 3. Start from the simplest approach, ensure AI improves over time, with data/metrics driving improvement 4. More data only matters if it’s better data, and if the model is complex enough to learn from it 5. AI affects UX and UX affects AI
  • 50. ● 4 hour lecture on recommendations, Carnegie Mellon (2014) ● 1 hour lecture on practical Deep Learning, UC Berkeley (2020) ● 10 minutes on AI for COVID, Stanford (2020) ● 1 hour podcast on AI for Healthcare, Gradient Dissent (2021)