12 March 2021
Marco Altini, PhD
Twitter: @altini_marco
USER GENERATED DATA: A
PARADIGM SHIFT FOR RESEARCH
AND DATA PRODUCTS
2
Marco Altini
• PhD cum laude in Machine Learning
• MSc cum laude in Computer Science
Engineering
• MSc cum laude in Human Movement
Sciences, High Performance
Coaching
• Founder of HRV4Training (2013)
• Data Science Advisor at Oura
• Guest Lecturer at VU Amsterdam
• 50+ publications at the intersection
between technology, health and
performance
5
IN THIS LECTURE
What’s user generated data?
• Typical study and product
development workflow
• A new paradigm
Challenges and opportunities
• Research and data products
All examples come from health and
sport science applications
WHAT’S USER GENERATED DATA?
8
MANY TYPES OF DATA
Content created by users of a product
Here we focus on sport and health:
• Wearables
• Phones
WHY DOES USER GENERATED
DATA MATTER?
11
DATA SCIENCE
As data scientists we can find new
clever ways to create value based
on the data collected:
• Research
• New features
• New products
• New insights
User generated data opens new
opportunities due to larger sample
size, realistic settings, unforeseen
outcomes
SOME EXAMPLES
13
APPS
How can we create value for our
customers using data?
• HRV4Training
• Cardiac activity (HR/HRV)
• Context
14
APPS
How can we create value for our
customers using data?
• HRV4Training
• Identify / manage stressors
15
WEARABLES
What can we learn?
• Bloomlife
• Uterine and cardiac activity
16
WEARABLES
What can we learn?
• Bloomlife
• Can we detect (or predict)
labour onset?
17
WEARABLES
It’s not just the hardware anymore
• Oura ring
• Cardiac activity (HR/HRV)
• Temperature
• Movement
• Sleep stages
18
WEARABLES
It’s not just the hardware anymore
• Oura ring
• Can we detect (or predict)
an infection?
21
WHAT DO THESE
EXAMPLES HAVE IN
COMMON?
• None of these applications were
the original goal of the app or
wearable
• User generated data made it
possible
24
WHAT DO THESE
EXAMPLES HAVE IN
COMMON?
• How?
• Contextual data
• Context / confounders /
additional parameters
monitored longitudinally
• Reference points
• APIs
• Manually reported (e.g.
clinical outcomes)
Let’s take a step back first
TYPICAL STUDY WORKFLOW
30
TYPICAL STUDY
WORKFLOW
1. Design the study
1. What dependent variables
to track
2. What independent
variables to track
2. Recruit participants (small N)
3. Collect high quality data
4. Perform data analysis
5. Use the outcome
1. If academic research: write
a paper
2. If company research:
deploy to consumers
31
EXAMPLES
1. Paper: investigate the effect of
training intensity on heart rate
variability (HRV)
2. Product: estimate VO2max
based on physiological data
collected during workouts
EXAMPLE 1: PAPER ON THE EFFECT
OF TRAINING INTENSITY ON HEART
RATE VARIABILITY
35
EXAMPLE 1: HEART RATE
VARIABILITY IN
RESPONSE TO EXERCISE
INTENSITY
1. Design the study
1. What dependent variables
to track: HRV
2. What independent
variables to track: training
intensity, age, sex, etc.
2. Recruit participants (N = 10 male
students)
3. Collect high quality data
36
EXAMPLE 1: HEART RATE
VARIABILITY IN
RESPONSE TO EXERCISE
INTENSITY
37
EXAMPLE 1: HEART RATE
VARIABILITY IN
RESPONSE TO EXERCISE
INTENSITY
1. Design the study
1. What dependent variables
to track
2. What independent
variables to track
2. Recruit participants
3. Collect high quality data
4. Perform data analysis
5. Use the outcome
1. Write a paper
44
EXAMPLE 1: HEART RATE
VARIABILITY IN
RESPONSE TO EXERCISE
INTENSITY
How generalizable is this?
• What about women?
• What about different
phases of the menstrual
cycle?
• What about people of
different age groups?
• What about people with
different health conditions?
• What about different
sports?
Not much
EXAMPLE 2: PRODUCT FOR
VO2MAX ESTIMATION
47
EXAMPLE 2: VO2MAX
ESTIMATION USING
WEARABLES
1. Design the study
1. What dependent variables
to track: VO2max as
measured by indirect
calorimetry
2. What independent
variables to track:
• Age, sex, weight, height,
heart rate at a specific
intensity, etc.
49
EXAMPLE 2: VO2MAX
ESTIMATION USING
WEARABLES
1. Design the study
1. What dependent variables
to track
2. What independent
variables to track
2. Recruit participants
• We get N = 50
3. Collect high quality data
50
EXAMPLE 2: VO2MAX
ESTIMATION USING
WEARABLES
51
EXAMPLE 2: VO2MAX
ESTIMATION USING
WEARABLES
1. Design the study
1. What dependent variables
to track
2. What independent
variables to track
2. Recruit participants (small N)
• We get N = 50
3. Collect high quality data
4. Perform data analysis
• Regression model to
estimate VO2max given
predictors
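A minimal sketch of the "regression model to estimate VO2max given predictors" step, assuming a plain linear model fitted by ordinary least squares. The two predictors (age, heart rate at a fixed pace), the coefficients and the data are synthetic and purely illustrative; they are not the model or data used in the lecture.

```python
def fit_ols(X, y):
    """Fit y ~ b0 + b1*x1 + ... by solving the normal equations (X'X) b = X'y."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for i in range(k):
            if i != c:
                f = A[i][c]
                A[i] = [vi - f * vc for vi, vc in zip(A[i], A[c])]
    return [A[i][k] for i in range(k)]


def predict(coefs, x):
    """Apply the fitted coefficients (intercept first) to one row of predictors."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], x))


# Synthetic, noise-free example: (age in years, heart rate in bpm at a fixed pace).
X = [(25, 140), (30, 150), (35, 145), (40, 160), (45, 155), (50, 170)]
# Made-up "lab reference" VO2max values generated from an arbitrary linear rule.
y = [90 - 0.2 * age - 0.25 * hr for age, hr in X]

coefs = fit_ols(X, y)
print("coefficients (intercept, age, hr):", [round(c, 3) for c in coefs])
print("estimate for a 33-year-old at 148 bpm:", round(predict(coefs, (33, 148)), 1))
```

With N = 50 lab participants, a model like this only learns the relationships present in that sample, which is the generalizability concern raised in the following slides.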
52
EXAMPLE 2: VO2MAX
ESTIMATION USING
WEARABLES
53
EXAMPLE 2: VO2MAX
ESTIMATION USING
WEARABLES
1. Design the study
1. What dependent variables
to track
2. What independent
variables to track
2. Recruit participants (small N)
• We get N = 50
3. Collect high quality data
4. Perform data analysis
5. Use the outcome
• Deploy to consumers
54
EXAMPLE 2: VO2MAX
ESTIMATION USING
WEARABLES
56
EXAMPLE 2: VO2MAX
ESTIMATION USING
WEARABLES
The real world is more complex:
- What about running on trails
where the relationship between
pace and heart rate changes?
- What about other sports, where
speed is less relevant, for
example cycling?
Also not really generalizable
TYPICAL LIMITATIONS
61
TYPICAL LIMITATIONS
• N = 2-10 in many sport science
studies
• Results valid only for the specific
sample analyzed
• What if we want to extend the
analysis?
• We need to run another
study (costs, time, etc.)
• We collected high quality data,
but was it representative of what
happens in real life?
• Come to the lab, don’t eat
or drink coffee, then “relax”
when I tell you to…
A NEW PARADIGM
66
OUTSOURCING DATA
COLLECTION
• In the past 10 years our ability to
run studies and monitor
physiology (and other variables)
outside of the lab has changed
dramatically
• Phones (+ sensors) make data
acquisition possible anywhere
and at a larger scale
• More realistic settings,
unforeseen outcomes
• Data science infrastructure
allows for cost-effective data
aggregation and analysis
WHAT DO WE NEED
TO GET THIS DONE?
70
THREE KEY STEPS
• Validate (or know the limitations
of) the technology to be
deployed
• Garbage in, garbage out
• Deploy. Confirm lab-based
insights (if possible)
• Data preparation becomes
the most important step
• Discover new relations, build
new products
71
EXAMPLES
1. Paper: investigate the effect of
training intensity on heart rate
variability (HRV)
2. Product: estimate VO2max
based on physiological data
collected during workouts
EXAMPLE 1: PAPER ON THE EFFECT
OF TRAINING INTENSITY ON HEART
RATE VARIABILITY
73
VALIDATE THE
TECHNOLOGY
Or use a validated tool
• Equivalency between phone
PPG and external ECG:
74
DEPLOY
• Collect data for months from
thousands of people. More than
50,000 measurements included
in the analysis
75
CONFIRM LAB BASED
INSIGHTS
• Reduction in HRV post higher
intensity exercise:
76
FIND NEW RELATIONS /
EXTEND ANALYSIS
• Same relationship in men and
women:
77
FIND NEW RELATIONS /
EXTEND ANALYSIS
• Same relationship in different
age groups:
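A minimal sketch of how the same comparison could be repeated per subgroup once the data is large enough. The record fields (`sex`, `age_group`, `intensity`, `hrv_change`) and the values are hypothetical; the actual analysis in the lecture may have been structured differently.

```python
from collections import defaultdict
from statistics import mean


def mean_hrv_change(records, group_key):
    """Mean next-day HRV change for every (subgroup, training intensity) cell."""
    cells = defaultdict(list)
    for r in records:
        cells[(r[group_key], r["intensity"])].append(r["hrv_change"])
    return {cell: mean(values) for cell, values in cells.items()}


# Hypothetical records: HRV change (ms) the morning after a workout.
records = [
    {"sex": "F", "age_group": "20-39", "intensity": "low", "hrv_change": 1.0},
    {"sex": "F", "age_group": "40-59", "intensity": "high", "hrv_change": -5.0},
    {"sex": "M", "age_group": "20-39", "intensity": "high", "hrv_change": -6.0},
    {"sex": "M", "age_group": "40-59", "intensity": "low", "hrv_change": 0.5},
]

# The same function answers "what about women?" and "what about age groups?"
for key in ("sex", "age_group"):
    for cell, value in sorted(mean_hrv_change(records, key).items()):
        print(key, cell, value)
```

Each extra stratification shrinks the number of measurements per cell, so it is worth checking cell sizes before reading anything into the means.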
82
FIND NEW RELATIONS /
EXTEND ANALYSIS
• What else?
• Relationship with different
stressors (alcohol, getting
sick, menstrual cycle, etc.)
• Relationship with different
outcomes (a new
pandemic?)
EXAMPLE 2: PRODUCT FOR
VO2MAX ESTIMATION
85
WHAT IF WE TARGET
CYCLISTS NOW?
• We developed our initial model
thinking like a physiologist
• We can develop our new model
thinking like a data scientist
86
WHAT IF WE TARGET
CYCLISTS NOW?
• We have deployed our model to
thousands of users. Many are
runners, and are using the
feature
• The user provides as input:
• Anthropometrics
• Workouts from Strava
• The user gets as output the
VO2max estimate
87
VALIDATION
• Only using running data:
88
CONFIRM LAB BASED
INSIGHTS
Or get clever about it
• Estimated VO2max is correlated
to running performance as
derived from Strava workouts:
91
WHAT IF WE TARGET
CYCLISTS NOW?
• For cyclists, we have:
• Heart rate during exercise
• Power during exercise
However, we have neither reference
VO2max data (from the lab) nor
estimated VO2max data (because
our model can only estimate it from
heart rate and speed)
The missing link: the triathlete
92
WHAT IF WE TARGET
CYCLISTS NOW?
Keep only triathletes in the dataset
and check VO2max vs running
performance again: it still works
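A minimal sketch of the "missing link" filter, assuming each workout record carries a user id and a sport label. The field names and the labels `"run"` and `"ride"` are hypothetical; here a "triathlete" simply means a user with both running and cycling workouts, which is all the approach needs.

```python
from collections import defaultdict


def find_triathletes(workouts, required=("run", "ride")):
    """Return the ids of users who logged every sport listed in `required`."""
    sports_by_user = defaultdict(set)
    for w in workouts:
        sports_by_user[w["user_id"]].add(w["sport"])
    needed = set(required)
    return sorted(u for u, sports in sports_by_user.items() if needed <= sports)


# Hypothetical workout log.
workouts = [
    {"user_id": "a", "sport": "run"},
    {"user_id": "a", "sport": "ride"},
    {"user_id": "b", "sport": "run"},
    {"user_id": "c", "sport": "ride"},
    {"user_id": "c", "sport": "swim"},
]

print(find_triathletes(workouts))
```

For these users the running-based VO2max estimate can act as the reference value, while heart rate and power from their rides become the predictors.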
98
WHAT IF WE TARGET
CYCLISTS NOW?
Build models, predict VO2max
from cycling data, then validate
(leave-one-out cross-validation).
R = 0.9
Deploy!
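A minimal sketch of leave-one-out cross-validation, assuming a one-predictor linear model. The predictor (a power-to-heart-rate ratio from cycling) and all numbers are hypothetical; the lecture does not specify which features or model were used. Each person is held out once, the model is fitted on everyone else, and R is the correlation between held-out predictions and reference values.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def loocv_predictions(x, y):
    """Predict each sample with a model fitted on all the other samples."""
    preds = []
    for i in range(len(x)):
        intercept, slope = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(intercept + slope * x[i])
    return preds


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))


# Hypothetical triathletes: cycling power / heart rate (W/bpm) and the
# running-based VO2max estimate used as reference (ml/kg/min).
ratio = [1.2, 1.4, 1.5, 1.7, 1.9, 2.1]
vo2max_ref = [42.0, 47.0, 48.0, 53.0, 56.0, 61.0]

held_out = loocv_predictions(ratio, vo2max_ref)
print("leave-one-out R:", round(pearson_r(held_out, vo2max_ref), 2))
```

Because the reference here is itself an estimate rather than a lab measurement, R describes agreement with the running model, not with true VO2max.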
99
IN THIS LECTURE
What’s user generated data?
• Typical study and product
development workflow
• A new paradigm
Challenges and opportunities
• Research and data products
USER GENERATED DATA:
CHALLENGES
104
CHALLENGES
• Data preparation:
• Quality control
• Noisy data
• Missing data
• Reference data
• What is available?
• Data engineering (not covered
today)
QUALITY CONTROL
107
NOISY DATA
• Data collected from wearables
and apps is extremely noisy
• Inaccurate very often
• Typically no signal quality
metric is reported (think
about heart rate)
How do we deal with it?
111
NOISY DATA
Example: training intensity based
on heart rate. To determine a
relative intensity, we need users'
maximal heart rate
No lab tests. So we need to make
some assumptions:
• There will be some hard sessions
during the period we monitor
(hence it needs to be long
enough)
112
NOISY DATA
Here is data from 500 people,
including heart rates above 300
bpm (or below 100 bpm):
114
NOISY DATA
Data for one person
We can use simple statistical
methods to try to approximate this
person’s max heart rate
But did they ever go hard?
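One possible "simple statistical method", sketched under stated assumptions: discard samples outside a plausible range, then take a high percentile of what remains rather than the raw maximum, so a few leftover artifacts do not define the estimate. The 30-220 bpm range and the 99th percentile are illustrative choices, not the thresholds used in the lecture.

```python
import math


def plausible(samples, low=30, high=220):
    """Keep only heart rate samples (bpm) inside an assumed plausible range."""
    return [s for s in samples if low <= s <= high]


def estimate_max_hr(samples, pct=99.0):
    """Nearest-rank percentile of the plausible samples, or None if none remain."""
    clean = sorted(plausible(samples))
    if not clean:
        return None
    rank = max(1, math.ceil(pct * len(clean) / 100))
    return clean[rank - 1]


def relative_intensity(hr, max_hr):
    """Heart rate as a fraction of the estimated maximal heart rate."""
    return hr / max_hr


# Hypothetical samples for one person, including two obvious sensor artifacts.
samples = list(range(100, 200)) + [310, 5]
max_hr = estimate_max_hr(samples)
print("estimated max HR:", max_hr)
print("relative intensity at 150 bpm:", round(relative_intensity(150, max_hr), 2))
```

If the person never trained hard during the monitoring period, this estimate is too low and every relative intensity derived from it is too high, which is why the monitoring window needs to be long enough.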
115
NOISY DATA
Estimated max heart rate:
118
MISSING DATA
Same example as before. What if:
• We don’t have any hard effort
• Workouts are missing
We can sometimes ignore or remove
individuals with missing data (we
have a lot of data after all) but this
could introduce a bias (we do not
have the full picture)
No universal answer, think critically
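A minimal sketch of one way to look for such a bias before dropping anyone: compare the users who would be removed with the users who would be kept on a variable available for both. The field names (`age`, `max_hr`) and values are hypothetical.

```python
from statistics import mean


def split_by_completeness(users, required=("max_hr",)):
    """Separate users with all required fields from users missing any of them."""
    kept = [u for u in users if all(u.get(k) is not None for k in required)]
    dropped = [u for u in users if any(u.get(k) is None for k in required)]
    return kept, dropped


def compare_groups(kept, dropped, field):
    """Mean of `field` in each group (None for an empty group)."""
    def avg(group):
        return mean(u[field] for u in group) if group else None
    return {"kept": avg(kept), "dropped": avg(dropped)}


# Hypothetical users: max_hr is None when no hard effort was ever recorded.
users = [
    {"age": 30, "max_hr": 185},
    {"age": 40, "max_hr": None},
    {"age": 50, "max_hr": None},
    {"age": 20, "max_hr": 190},
]

kept, dropped = split_by_completeness(users)
print(compare_groups(kept, dropped, "age"))
```

A large gap between the two groups suggests the remaining sample no longer represents the user base, so conclusions should be limited to the kind of user that was kept.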
120
QUALITY CONTROL
• Only a fraction of the collected
data will be usable
• It is key to define methods
to keep track of what data
to trust, automatically, and
to clean the data
• Trade-offs
• There is never enough data
anyway (you can always
do one more stratification)
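A minimal sketch of keeping track, automatically, of how much data survives each quality check. The checks, field names and thresholds are hypothetical examples, not the rules used in the lecture.

```python
def quality_funnel(records, checks):
    """Apply named checks in order; return surviving records and per-step counts."""
    counts = [("collected", len(records))]
    for name, is_ok in checks:
        records = [r for r in records if is_ok(r)]
        counts.append((name, len(records)))
    return records, counts


# Hypothetical checks on workout records.
checks = [
    ("has heart rate", lambda r: r.get("avg_hr") is not None),
    ("plausible heart rate", lambda r: 30 <= r["avg_hr"] <= 220),
    ("at least 10 minutes", lambda r: r["minutes"] >= 10),
]

records = [
    {"avg_hr": 150, "minutes": 45},
    {"avg_hr": None, "minutes": 30},
    {"avg_hr": 310, "minutes": 60},
    {"avg_hr": 140, "minutes": 5},
]

usable, counts = quality_funnel(records, checks)
for name, n in counts:
    print(f"{name}: {n}")
```

Reporting the funnel alongside any result makes the trade-off visible: stricter checks give cleaner data but fewer records per stratum.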
REFERENCE DATA
123
REFERENCE DATA
One of the biggest challenges with
user generated data is lack of
reference data
Users don’t come to the lab for tests
or report outcomes that are key for
model development
What could help you in the future?
• Tags / annotations / APIs
128
REFERENCE DATA
COVID example:
• When was the test done?
• Was it even done?
• Does it even matter? Maybe they
were already infected earlier
with no / mild symptoms
131
REFERENCE DATA
• Not all collected data becomes
valuable research or enables
future data products. Much of it
has to do with reference data:
• What are the outcomes?
• Can we track them?
• Are we asking too much of
the user?
• Not a clinical study
• What can we do about it?
• Is it ethical to collect them?
USER GENERATED DATA:
OPPORTUNITIES
133
OPPORTUNITIES
• Large scale
• Insights that we sometimes
cannot even aim for in the lab
• New guidelines
• New products
136
WHAT HAPPENED
DURING THE PANDEMIC?
5500 people, 3 months of data per
person, half a million
measurements:
138
WHAT HAPPENED
DURING THE PANDEMIC?
Why? Travel, sleep, etc.
139
LIMITATIONS STILL APPLY
Who are we talking about?
Does this really generalize?
140
OPPORTUNITIES
• Large scale
• Insights that we sometimes
cannot even aim for in the lab
• New guidelines
• New products
142
COVID INFECTION
AND HRV, HR
Can we build a predictive model?
144
LIMITATIONS STILL APPLY
What about the flu?
Can we just distinguish healthy vs
an infection or can we distinguish
infection type?
145
LIMITATIONS STILL APPLY
It’s easy to get fooled by the data
Think critically
146
OPPORTUNITIES
Large scale
Insights that we sometimes cannot
even aim for in the lab
New guidelines
New products
150
WHAT’S OPTIMAL BLOOD
GLUCOSE?
151
OPPORTUNITIES
Large scale
Insights that we sometimes cannot
even aim for in the lab
New guidelines
New products
153
ESTIMATING RUNNING
PERFORMANCE
One option could be to get a few
people on a treadmill in the lab,
and have them run a time trial
Or, we could grab workouts from
apps like Strava, analyze the training
patterns preceding, e.g., each runner's
best 10 km performance over a
year or so, and build a model
154
ESTIMATING RUNNING
PERFORMANCE
N = 2100
RMSE = 2 minutes (4%)
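For reference, a minimal sketch of how an RMSE and its relative size can be computed. The times below are made up; the relative error is taken here as RMSE divided by the mean actual time, which is an assumption about how the 4% was derived (under that reading, 2 minutes at 4% corresponds to a mean time of about 50 minutes).

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between predictions and reference values."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def relative_rmse(predicted, actual):
    """RMSE expressed as a fraction of the mean reference value."""
    return rmse(predicted, actual) / (sum(actual) / len(actual))


# Hypothetical 10 km times in minutes: model estimates vs actual results.
predicted = [52.0, 48.0]
actual = [50.0, 50.0]

print("RMSE (min):", rmse(predicted, actual))
print("relative RMSE:", relative_rmse(predicted, actual))
```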
THAT’S A WRAP
162
USER GENERATED DATA
• Not everything is (or can be) a
data product
• Often data is collected but not
used in any meaningful way, no
value created (either for the
company or the user)
• Reference points are key: you
can have unlimited data and still
have no use for it
• More research is being carried
out using consumer products
• Think critically about reference
points, data preparation, and
other challenges (estimated vs
measured)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 

User generated data: a paradigm shift for research and data products

  • 1. 12 March 2021 Marco Altini, PhD Twitter: @altini_marco USER GENERATED DATA: A PARADIGM SHIFT FOR RESEARCH AND DATA PRODUCTS
  • 2. 2 Marco Altini • PhD cum laude in Machine Learning • MSc cum laude in Computer Science Engineering • MSc cum laude in Human Movement Sciences, High Performance Coaching • Founder of HRV4Training (2013) • Data Science Advisor at Oura • Guest Lecturer at VU Amsterdam • 50+ publications at the intersection between technology, health and performance
  • 3. 3 IN THIS LECTURE What’s user generated data? • Typical study and product development workflow • A new paradigm
  • 4. 4 IN THIS LECTURE What’s user generated data? • Typical study and product development workflow • A new paradigm Challenges and opportunities • Research and data products
  • 5. 5 IN THIS LECTURE What’s user generated data? • Typical study and product development workflow • A new paradigm Challenges and opportunities • Research and data products All examples will be considering health and sport science applications
  • 7. 7 MANY TYPES OF DATA Content created by users of a product
  • 8. 8 MANY TYPES OF DATA Content created by users of a product Here we focus on sport and health: • Wearables • Phones
  • 9. WHY DOES USER GENERATED DATA MATTER?
  • 10. 10 DATA SCIENCE As data scientists we can find new clever ways to create value based on the data collected: • Research • New features • New products • New insights
  • 11. 11 DATA SCIENCE As data scientists we can find new clever ways to create value based on the data collected: • Research • New features • New products • New insights User generated data opens new opportunities due to larger sample size, realistic settings, unforeseen outcomes
  • 13. 13 APPS How can we create value for our customers using data? • HRV4Training • Cardiac activity (HR/HRV) • Context
  • 14. 14 APPS How can we create value for our customers using data? • HRV4Training • Identify / manage stressors
  • 15. 15 WEARABLES What can we learn? • Bloomlife • Uterine and cardiac activity
  • 16. 16 WEARABLES • Bloomlife • Can we detect (or predict) labour onset? What can we learn?
  • 17. 17 WEARABLES It’s not just the hardware anymore • Oura ring • Cardiac activity (HR/HRV) • Temperature • Movement • Sleep stages
  • 18. 18 WEARABLES It’s not just the hardware anymore • Oura ring • Can we detect (or predict) an infection?
  • 19. 19 WHAT DO THESE EXAMPLES HAVE IN COMMON?
  • 20. 20 WHAT DO THESE EXAMPLES HAVE IN COMMON? • None of these applications were the original goal of the app or wearable
  • 21. 21 WHAT DO THESE EXAMPLES HAVE IN COMMON? • None of these applications were the original goal of the app or wearable • User generated data made it possible
  • 22. 22 WHAT DO THESE EXAMPLES HAVE IN COMMON? • How? • Contextual data • Context / confounders / additional parameters monitored longitudinally
  • 23. 23 WHAT DO THESE EXAMPLES HAVE IN COMMON? • How? • Contextual data • Context / confounders / additional parameters monitored longitudinally • Reference points • APIs • Manually reported (e.g. clinical outcomes)
  • 24. 24 WHAT DO THESE EXAMPLES HAVE IN COMMON? • How? • Contextual data • Context / confounders / additional parameters monitored longitudinally • Reference points • APIs • Manually reported (e.g. clinical outcomes) Let’s take a step back first
  • 26. 26 TYPICAL STUDY WORKFLOW 1. Design the study 1. What dependent variables to track 2. What independent variables to track
  • 27. 27 TYPICAL STUDY WORKFLOW 1. Design the study 1. What dependent variables to track 2. What independent variables to track 2. Recruit participants (small N)
  • 28. 28 TYPICAL STUDY WORKFLOW 1. Design the study 1. What dependent variables to track 2. What independent variables to track 2. Recruit participants (small N) 3. Collect high quality data
  • 29. 29 TYPICAL STUDY WORKFLOW 1. Design the study 1. What dependent variables to track 2. What independent variables to track 2. Recruit participants (small N) 3. Collect high quality data 4. Perform data analysis
  • 30. 30 TYPICAL STUDY WORKFLOW 1. Design the study 1. What dependent variables to track 2. What independent variables to track 2. Recruit participants (small N) 3. Collect high quality data 4. Perform data analysis 5. Use the outcome 1. If academic research: write a paper 2. If company research: deploy to consumers
  • 31. 31 EXAMPLES 1. Paper: investigate the effect of training intensity on heart rate variability (HRV) 2. Product: estimate VO2max based on physiological data collected during workouts
  • 32. EXAMPLE 1: PAPER ON THE EFFECT OF TRAINING INTENSITY ON HEART RATE VARIABILITY
  • 33. 33 EXAMPLE 1: HEART RATE VARIABILITY IN RESPONSE TO EXERCISE INTENSITY 1. Design the study 1. What dependent variables to track: HRV 2. What independent variables to track: training intensity, age, sex, etc.
  • 34. 34 EXAMPLE 1: HEART RATE VARIABILITY IN RESPONSE TO EXERCISE INTENSITY 1. Design the study 1. What dependent variables to track: HRV 2. What independent variables to track: training intensity, age, sex, etc. 2. Recruit participants (N = 10 male students)
  • 35. 35 EXAMPLE 1: HEART RATE VARIABILITY IN RESPONSE TO EXERCISE INTENSITY 1. Design the study 1. What dependent variables to track: HRV 2. What independent variables to track: training intensity, age, sex, etc. 2. Recruit participants (N = 10 male students) 3. Collect high quality data
  • 36. 36 EXAMPLE 1: HEART RATE VARIABILITY IN RESPONSE TO EXERCISE INTENSITY
  • 37. 37 EXAMPLE 1: HEART RATE VARIABILITY IN RESPONSE TO EXERCISE INTENSITY 1. Design the study 1. What dependent variables to track 2. What independent variables to track 2. Recruit participants 3. Collect high quality data 4. Perform data analysis 5. Use the outcome 1. Write a paper
  • 38. 38 EXAMPLE 1: HEART RATE VARIABILITY IN RESPONSE TO EXERCISE INTENSITY How generalizable is this?
  • 39. 39 EXAMPLE 1: HEART RATE VARIABILITY IN RESPONSE TO EXERCISE INTENSITY How generalizable is this? • What about women?
  • 40. 40 EXAMPLE 1: HEART RATE VARIABILITY IN RESPONSE TO EXERCISE INTENSITY How generalizable is this? • What about women? • What about different phases of the menstrual cycle?
  • 41. 41 EXAMPLE 1: HEART RATE VARIABILITY IN RESPONSE TO EXERCISE INTENSITY How generalizable is this? • What about women? • What about different phases of the menstrual cycle? • What about people of different age groups?
  • 42. 42 EXAMPLE 1: HEART RATE VARIABILITY IN RESPONSE TO EXERCISE INTENSITY How generalizable is this? • What about women? • What about different phases of the menstrual cycle? • What about people of different age groups? • What about people with different health conditions?
  • 43. 43 EXAMPLE 1: HEART RATE VARIABILITY IN RESPONSE TO EXERCISE INTENSITY How generalizable is this? • What about women? • What about different phases of the menstrual cycle? • What about people of different age groups? • What about people with different health conditions? • What about different sports?
  • 44. 44 EXAMPLE 1: HEART RATE VARIABILITY IN RESPONSE TO EXERCISE INTENSITY How generalizable is this? • What about women? • What about different phases of the menstrual cycle? • What about people of different age groups? • What about people with different health conditions? • What about different sports? Not much
  • 45. EXAMPLE 2: PRODUCT FOR VO2MAX ESTIMATION
  • 46. 46 EXAMPLE 2: VO2MAX ESTIMATION USING WEARABLES 1. Design the study 1. What dependent variables to track 2. What independent variables to track
  • 47. 47 EXAMPLE 2: VO2MAX ESTIMATION USING WEARABLES 1. Design the study 1. What dependent variables to track: VO2max as measured by indirect calorimetry 2. What independent variables to track: • Age, sex, weight, height, heart rate at a specific intensity, etc.
  • 48. 48 EXAMPLE 2: VO2MAX ESTIMATION USING WEARABLES 1. Design the study 1. What dependent variables to track 2. What independent variables to track 2. Recruit participants • We get N = 50
  • 49. 49 EXAMPLE 2: VO2MAX ESTIMATION USING WEARABLES 1. Design the study 1. What dependent variables to track 2. What independent variables to track 2. Recruit participants • We get N = 50 3. Collect high quality data
  • 51. 51 EXAMPLE 2: VO2MAX ESTIMATION USING WEARABLES 1. Design the study 1. What dependent variables to track 2. What independent variables to track 2. Recruit participants (small N) • We get N = 50 3. Collect high quality data 4. Perform data analysis • Regression model to estimate VO2max given predictors
  • 53. 53 EXAMPLE 2: VO2MAX ESTIMATION USING WEARABLES 1. Design the study 1. What dependent variables to track 2. What independent variables to track 2. Recruit participants (small N) • We get N = 50 3. Collect high quality data 4. Perform data analysis 5. Use the outcome • Deploy to consumers
  • 55. 55 EXAMPLE 2: VO2MAX ESTIMATION USING WEARABLES The real world is more complex: - What about running on trails where the relationship between pace and heart rate changes? - What about other sports, where speed is less relevant, for example cycling?
  • 56. 56 EXAMPLE 2: VO2MAX ESTIMATION USING WEARABLES The real world is more complex: - What about running on trails where the relationship between pace and heart rate changes? - What about other sports, where speed is less relevant, for example cycling? Also not really generalizable
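The regression step in Example 2 can be illustrated with a minimal sketch. It assumes only two of the predictors named on the slides (age and heart rate at a fixed intensity), uses synthetic data with made-up generating coefficients, and fits ordinary least squares with the standard library. The function names `fit_ols` and `predict` are hypothetical; this is not the model used in the lecture.

```python
import random


def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, row):
    """Apply the fitted intercept and slopes to one set of predictors."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Synthetic stand-in for a lab dataset with N = 50 (all numbers are invented).
random.seed(0)
rows, vo2max = [], []
for _ in range(50):
    age = random.uniform(20, 60)        # years
    hr = random.uniform(120, 180)       # bpm at a fixed running pace
    rows.append((age, hr))
    vo2max.append(80 - 0.3 * age - 0.2 * hr + random.gauss(0, 2))

coefs = fit_ols(rows, vo2max)
print("intercept, age slope, HR slope:", [round(c, 2) for c in coefs])
print("estimate for age 30, HR 150:", round(predict(coefs, (30, 150)), 1))
```

A model like this only holds for the conditions it was fitted on, which is the generalizability concern raised on the slides: heart rate at a given pace means something different on trails or on a bike.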
  • 58. 58 TYPICAL LIMITATIONS • N = 2-10 in many sport science studies
  • 59. 59 TYPICAL LIMITATIONS • N = 2-10 in many sport science studies • Results valid only for the specific sample analyzed
  • 60. 60 TYPICAL LIMITATIONS • N = 2-10 in many sport science studies • Results valid only for the specific sample analyzed • What if we want to extend the analysis? • We need to run another study (cost, time, etc.)
  • 61. 61 TYPICAL LIMITATIONS • N = 2-10 in many sport science studies • Results valid only for the specific sample analyzed • What if we want to extend the analysis? • We need to run another study (cost, time, etc.) • We collected high quality data, but was it representative of what happens in real life? • Come to the lab, don’t eat or drink coffee, then “relax” when I tell you to…
  • 63. 63 OUTSOURCING DATA COLLECTION • In the past 10 years our ability to run studies and monitor physiology (and other variables) outside of the lab has changed dramatically
  • 64. 64 OUTSOURCING DATA COLLECTION • In the past 10 years our ability to run studies and monitor physiology (and other variables) outside of the lab has changed dramatically • Phones (+ sensors) make data acquisition possible anywhere and at a larger scale
  • 65. 65 OUTSOURCING DATA COLLECTION • In the past 10 years our ability to run studies and monitor physiology (and other variables) outside of the lab has changed dramatically • Phones (+ sensors) make data acquisition possible anywhere and at a larger scale • More realistic settings, unforeseen outcomes
  • 66. 66 OUTSOURCING DATA COLLECTION • In the past 10 years our ability to run studies and monitor physiology (and other variables) outside of the lab has changed dramatically • Phones (+ sensors) make data acquisition possible anywhere and at a larger scale • More realistic settings, unforeseen outcomes • Data science infrastructure allows for cost-effective data aggregation and analysis
  • 67. WHAT DO WE NEED TO GET THIS DONE?
  • 68. 68 THREE KEY STEPS • Validate (or know the limitations of) the technology to be deployed • Garbage in, garbage out
  • 69. 69 THREE KEY STEPS • Validate (or know the limitations of) the technology to be deployed • Garbage in, garbage out • Deploy. Confirm lab-based insights (if possible) • Data preparation becomes the most important step
  • 70. 70 THREE KEY STEPS • Validate (or know the limitations of) the technology to be deployed • Garbage in, garbage out • Deploy. Confirm lab-based insights (if possible) • Data preparation becomes the most important step • Discover new relations, build new products
  • 71. 71 EXAMPLES 1. Paper: investigate the effect of training intensity on heart rate variability (HRV) 2. Product: estimate VO2max based on physiological data collected during workouts
  • 72. EXAMPLE 1: PAPER ON THE EFFECT OF TRAINING INTENSITY ON HEART RATE VARIABILITY
  • 73. 73 VALIDATE THE TECHNOLOGY Or use a validated tool • Equivalency between phone PPG and external ECG:
  • 74. 74 DEPLOY • Collect data for months in thousands of people. More than 50 000 measurements included in the analysis
  • 75. 75 CONFIRM LAB BASED INSIGHTS • Reduction in HRV post higher intensity exercise:
  • 76. 76 FIND NEW RELATIONS / EXTEND ANALYSIS • Same relationship in men and women:
  • 77. 77 FIND NEW RELATIONS / EXTEND ANALYSIS • Same relationship in different age groups:
  • 78. 78 FIND NEW RELATIONS / EXTEND ANALYSIS • What else?
  • 80. 80 FIND NEW RELATIONS / EXTEND ANALYSIS • What else? • Relationship with different stressors (alcohol, getting sick, menstrual cycle, etc.)
  • 81. 81 FIND NEW RELATIONS / EXTEND ANALYSIS • What else? • Relationship with different stressors (alcohol, getting sick, menstrual cycle, etc.) • Relationship with different outcomes (a new pandemic?)
  • 83. EXAMPLE 2: PRODUCT FOR VO2MAX ESTIMATION
  • 84. 84 WHAT IF WE TARGET CYCLISTS NOW? • We developed our initial model thinking like a physiologist
  • 85. 85 WHAT IF WE TARGET CYCLISTS NOW? • We developed our initial model thinking like a physiologist • We can develop our new model thinking like a data scientist
  • 86. 86 WHAT IF WE TARGET CYCLISTS NOW? • We have deployed our model to thousands of users. Many are runners, and are using the feature • The user provides as input: • Anthropometrics • Workouts from Strava • The user gets as output the VO2max estimate
  • 88. 88 CONFIRM LAB BASED INSIGHTS Or get clever about it • Estimated VO2max is correlated to running performance as derived from Strava workouts:
  • 89. 89 WHAT IF WE TARGET CYCLISTS NOW? • For cyclists, we have: • Heart rate during exercise • Power during exercise
  • 90. 90 WHAT IF WE TARGET CYCLISTS NOW? • For cyclists, we have: • Heart rate during exercise • Power during exercise However, we do not have reference VO2max data (from the lab) nor estimated VO2max data (because we can only estimate from heart rate and speed)
  • 91. 91 WHAT IF WE TARGET CYCLISTS NOW? • For cyclists, we have: • Heart rate during exercise • Power during exercise However, we do not have reference VO2max data (from the lab) nor estimated VO2max data (because we can only estimate from heart rate and speed) The missing link: the triathlete
  • 92. 92 WHAT IF WE TARGET CYCLISTS NOW? Keep in the dataset only triathletes, check again VO2max vs running performance: still works
  • 97. 97 WHAT IF WE TARGET CYCLISTS NOW? Build models, predict VO2max cycling, then validate (leave one out cross-validation). R = 0.9
  • 98. 98 WHAT IF WE TARGET CYCLISTS NOW? Build models, predict VO2max cycling, then validate (leave one out cross-validation). R = 0.9 Deploy!
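The validation step mentioned above (leave-one-out cross-validation, summarized with a correlation coefficient) can be sketched as follows. This is an illustration under stated assumptions: a single hypothetical cycling feature (watts per heartbeat), a simple linear model, and synthetic data standing in for the triathletes' running-based VO2max estimates. The lecture does not say which features or model were used.

```python
import random
from math import sqrt


def fit_line(x, y):
    """Least-squares slope and intercept for one predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def pearson_r(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))


def loo_predictions(x, y):
    """Predict each point with a model fitted on all the other points."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)
    return preds


# Synthetic triathletes (invented numbers): cycling feature vs. running-based VO2max.
random.seed(1)
watts_per_beat = [random.uniform(1.0, 2.5) for _ in range(30)]
vo2max_running = [20 + 18 * v + random.gauss(0, 3) for v in watts_per_beat]

preds = loo_predictions(watts_per_beat, vo2max_running)
r = pearson_r(preds, vo2max_running)
print("leave-one-out R:", round(r, 2))
```

Because every prediction comes from a model that never saw that athlete, the resulting R is a fairer estimate of out-of-sample performance than a correlation computed on the training data.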
  • 99. 99 IN THIS LECTURE What’s user generated data? • Typical study and product development workflow • A new paradigm Challenges and opportunities • Research and data products
  • 102. 102 CHALLENGES • Data preparation: • Quality control • Noisy data • Missing data
  • 103. 103 CHALLENGES • Data preparation: • Quality control • Noisy data • Missing data • Reference data • What is available?
  • 104. 104 CHALLENGES • Data preparation: • Quality control • Noisy data • Missing data • Reference data • What is available? • Data engineering (not covered today)
  • 106. 106 NOISY DATA • Data collected from wearables and apps is extremely noisy • Inaccurate very often • Typically no signal quality metric is reported (think about heart rate)
  • 107. 107 NOISY DATA • Data collected from wearables and apps is extremely noisy • Inaccurate very often • Typically no signal quality metric is reported (think about heart rate) How do we deal with it?
  • 108. 108 NOISY DATA Example: training intensity based on heart rate
  • 109. 109 NOISY DATA Example: training intensity based on heart rate. To determine a relative intensity, we need users' maximal heart rate
  • 110. 110 NOISY DATA Example: training intensity based on heart rate. To determine a relative intensity, we need users' maximal heart rate No lab tests. So we need to make some assumptions:
  • 111. 111 NOISY DATA Example: training intensity based on heart rate. To determine a relative intensity, we need users' maximal heart rate No lab tests. So we need to make some assumptions: • There will be some hard sessions during the period we monitor (hence it needs to be long enough)
  • 112. 112 NOISY DATA Here is data from 500 people, including heart rates above 300 bpm (or below 100 bpm):
  • 113. 113 NOISY DATA Data for one person We can use simple statistical methods to try to approximate this person’s max heart rate
  • 114. 114 NOISY DATA Data for one person We can use simple statistical methods to try to approximate this person’s max heart rate But did they ever go hard?
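One way to read "simple statistical methods" is sketched below: drop physiologically implausible readings, take a high percentile of the per-workout peak heart rates rather than the raw maximum, and then ask whether the result suggests the person ever went hard. The bounds (100 to 220 bpm), the 95th percentile, the minimum of 10 workouts, and the comparison against the 220 minus age rule of thumb are all assumptions chosen for illustration, not values from the lecture.

```python
def estimate_max_hr(session_peaks, low=100, high=220, q=0.95, min_sessions=10):
    """Approximate max heart rate from per-workout peak values (bpm).

    Readings outside [low, high] are treated as sensor artifacts. A high
    percentile is used instead of the maximum to limit the effect of spikes.
    Returns None when too little plausible data is left.
    """
    plausible = sorted(hr for hr in session_peaks if low <= hr <= high)
    if len(plausible) < min_sessions:
        return None
    idx = min(len(plausible) - 1, int(q * len(plausible)))
    return plausible[idx]


def went_hard(max_hr_estimate, age, fraction=0.9):
    """Crude check: is the estimate close to an age-predicted maximum?

    220 - age is a rough population-level rule with large individual error,
    so this only flags users whose data probably lacks hard sessions.
    """
    return max_hr_estimate >= fraction * (220 - age)


def relative_intensity(session_avg_hr, max_hr):
    """Workout intensity as a fraction of the estimated maximal heart rate."""
    return session_avg_hr / max_hr


# Invented example: mostly easy workouts, two hard ones, two artifacts.
peaks = [150] * 20 + [185, 186, 320, 40]
max_hr = estimate_max_hr(peaks)
print(max_hr, went_hard(max_hr, age=40), round(relative_intensity(148, max_hr), 2))
```

The percentile trades robustness for a slight underestimate: with few hard sessions in the monitoring period, the true maximum may sit above the estimate, which is why the monitoring period needs to be long enough.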
  • 116. 116 MISSING DATA Same example as before. What if: • We don’t have any hard effort • Workouts are missing
  • 117. 117 MISSING DATA Same example as before. What if: • We don’t have any hard effort • Workouts are missing We can sometimes ignore or remove individuals with missing data (we have a lot of data after all) but this could introduce a bias (we do not have the full picture)
  • 118. 118 MISSING DATA Same example as before. What if: • We don’t have any hard effort • Workouts are missing We can sometimes ignore or remove individuals with missing data (we have a lot of data after all) but this could introduce a bias (we do not have the full picture) No universal answer, think critically
  • 119. 119 QUALITY CONTROL • Only a fraction of the collected data will be usable • It is key to define methods to keep track of what data to trust, automatically, and to clean the data
  • 120. 120 QUALITY CONTROL • Only a fraction of the collected data will be usable • It is key to define methods to keep track of what data to trust, automatically, and to clean the data • Trade-offs • There is never enough data anyway (you can always do one more stratification)
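Automated quality control can be as simple as attaching explicit flags to each user and counting why data was excluded, so that any bias introduced by the exclusions is at least visible. The sketch below is a hypothetical example: the flag names, the thresholds (20 workouts, 160 bpm, 10% implausible readings), and the data layout are all assumptions.

```python
from collections import Counter

PLAUSIBLE_HR = (100, 220)  # bpm; hypothetical bounds for a workout's peak heart rate


def qc_flags(peak_hrs, min_workouts=20, hard_effort_bpm=160, max_bad_fraction=0.1):
    """Return the reasons (possibly none) why a user's data should not be trusted."""
    flags = []
    if len(peak_hrs) < min_workouts:
        flags.append("too_few_workouts")
    low, high = PLAUSIBLE_HR
    plausible = [hr for hr in peak_hrs if low <= hr <= high]
    if peak_hrs and (len(peak_hrs) - len(plausible)) / len(peak_hrs) > max_bad_fraction:
        flags.append("noisy_sensor")
    if not plausible or max(plausible) < hard_effort_bpm:
        flags.append("no_hard_effort")
    return flags


def split_users(users):
    """Separate trusted users from excluded ones and count exclusion reasons."""
    kept, reasons = [], Counter()
    for user_id, peak_hrs in users.items():
        flags = qc_flags(peak_hrs)
        if flags:
            reasons.update(flags)
        else:
            kept.append(user_id)
    return kept, reasons


# Invented users: per-workout peak heart rates in bpm.
users = {
    "a": [150 + (i % 30) for i in range(40)],  # plenty of data, some hard efforts
    "b": [140] * 25,                           # enough workouts, never went hard
    "c": [170, 320, 45, 175],                  # few workouts, half are artifacts
}
kept, reasons = split_users(users)
print("kept:", kept)
print("exclusion reasons:", dict(reasons))
```

Reporting the reason counts alongside the results makes the trade-off explicit: stricter thresholds give cleaner data but leave fewer, and possibly less representative, users.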
  • 122. 122 REFERENCE DATA One of the biggest challenges with user generated data is lack of reference data
  • 123. 123 REFERENCE DATA One of the biggest challenges with user generated data is lack of reference data Users don’t come to the lab for tests or report outcomes that are key for model development What could help you in the future? • Tags / annotations / APIs
  • 125. 125 REFERENCE DATA COVID example: • When was the test done?
  • 126. 126 REFERENCE DATA COVID example: • When was the test done? • Was it even done?
  • 127. 127 REFERENCE DATA COVID example: • When was the test done? • Was it even done? • Does it even matter?
  • 128. 128 REFERENCE DATA COVID example: • When was the test done? • Was it even done? • Does it even matter? Maybe they were already infected earlier with no / mild symptoms
  • 129. 129 REFERENCE DATA • Not all collected data becomes valuable research or enables future data products. Much of it has to do with reference data:
  • 130. 130 REFERENCE DATA • Not all collected data becomes valuable research or enables future data products. Much of it has to do with reference data: • What are the outcomes? • Can we track them?
  • 131. 131 REFERENCE DATA • Not all collected data becomes valuable research or enables future data products. Much of it has to do with reference data: • What are the outcomes? • Can we track them? • Are we asking too much of the user? • Not a clinical study • What can we do about it? • Is it ethical to collect them?
  • 133. 133 OPPORTUNITIES • Large scale • Insights that we sometimes cannot even aim for in the lab • New guidelines • New products
  • 136. 136 WHAT HAPPENED DURING THE PANDEMIC? 5500 people, 3 months of data per person, half a million measurements:
  • 137. 137 WHAT HAPPENED DURING THE PANDEMIC? Why? Travel, sleep, etc.
  • 139. 139 LIMITATIONS STILL APPLY Who are we talking about? Does this really generalize?
  • 140. 140 OPPORTUNITIES • Large scale • Insights that we sometimes cannot even aim for in the lab • New guidelines • New products
  • 142. 142 COVID INFECTION AND HRV, HR Can we build a predictive model?
  • 144. 144 LIMITATIONS STILL APPLY What about the flu? Can we just distinguish healthy vs an infection or can we distinguish infection type?
  • 145. 145 LIMITATIONS STILL APPLY It’s easy to get fooled by the data Think critically
  • 146. 146 OPPORTUNITIES • Large scale • Insights that we sometimes cannot even aim for in the lab • New guidelines • New products
  • 152. 152 ESTIMATING RUNNING PERFORMANCE One option could be to get a few people on a treadmill in the lab, and have them run a time trial
  • 153. 153 ESTIMATING RUNNING PERFORMANCE One option could be to get a few people on a treadmill in the lab, and have them run a time trial Or, we could grab workouts from apps like Strava, analyze the training patterns preceding, e.g., their best 10 km performance over a year or so, and build a model
  • 154. 154 ESTIMATING RUNNING PERFORMANCE N = 2100 RMSE = 2 minutes (4%)
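For reference, the error metric quoted above can be computed as in the sketch below. It assumes the percentage is the RMSE divided by the mean actual time; the slide does not state how the 4% was derived. Under that assumption, 2 minutes at 4% would correspond to an average 10 km time of about 50 minutes. The function names and the two example times are hypothetical.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error, in the same unit as the inputs."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def relative_rmse(predicted, actual):
    """RMSE as a fraction of the mean actual value (an assumed definition)."""
    return rmse(predicted, actual) / (sum(actual) / len(actual))


# Invented 10 km times in minutes for two runners.
predicted = [48.0, 52.0]
actual = [50.0, 50.0]
print(rmse(predicted, actual), relative_rmse(predicted, actual))
```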
  • 158. 158 USER GENERATED DATA • Not everything is (or can be) a data product
  • 159. 159 USER GENERATED DATA • Not everything is (or can be) a data product • Often data is collected but not used in any meaningful way, no value created (either for the company or the user)
  • 160. 160 USER GENERATED DATA • Not everything is (or can be) a data product • Often data is collected but not used in any meaningful way, no value created (either for the company or the user) • Reference points are key, you can have unlimited data and still have no use for it
  • 161. 161 USER GENERATED DATA • Not everything is (or can be) a data product • Often data is collected but not used in any meaningful way, no value created (either for the company or the user) • Reference points are key, you can have unlimited data and still have no use for it • More research is being carried out using consumer products
  • 162. 162 USER GENERATED DATA • Not everything is (or can be) a data product • Often data is collected but not used in any meaningful way, no value created (either for the company or the user) • Reference points are key, you can have unlimited data and still have no use for it • More research is being carried out using consumer products • Think critically about reference points, data preparation, and other challenges (estimated vs measured)
  • 163. 12 March 2021 Marco Altini, PhD Twitter: @altini_marco USER GENERATED DATA: A PARADIGM SHIFT FOR RESEARCH AND DATA PRODUCTS