Machine Learning for COVID-19 targeted testing
A risk assessment tool that optimises the use of COVID19 tests in order to implement fact-based strategies for deconfinement.
A project supported by Compellio S.A.
15, côte d'Eich
L-1450 Luxembourg
https://compell.io
Contributors
All project contributors worked voluntarily in this project.
Christos Avrilionis, Credit Risk and Governance Manager, PayPal
Theo Papasternos, Business Manager, Compellio
Denis Avrilionis, CEO, Compellio
Vivi Tzekou, Machine Learning / Full-Stack Developer
Yuri Visovsiouk, Full-Stack Developer
Christina Dimopoulou, Business Operations, Compellio
Disclaimer: the opinions expressed in this publication are those of the authors. They do not express the opinions of any entity whatsoever with which they are affiliated.
Contacts
We would be delighted to discuss this project further with you. You can reach us directly at hello@compell.io.
The way forward
We are currently looking to join forces with governments, health organisations, laboratories, and pharmaceutical companies. Interested parties can fill in the partnership form on the project's website: https://covid19smartscreeningtool.launchaco.com
Stay healthy,
The COVID19 Smart Screening Tool Team
Hacking for #EUvsVirus
Overview
The project aims to develop and deploy a software platform that would:
Allow an individual to fill in an electronic questionnaire with health and demographic information, securing and protecting sensitive personal data using Compellio's blockchain-enabled registry technology
Predict the likelihood of a positive Covid-19 diagnosis for an individual at a given point in time using machine learning (ML) and artificial intelligence (AI)
Enable policy makers to build an optimal Covid-19 exit strategy based on targeted Covid-19 testing of high-risk individuals
This solution will be implemented in 2 phases.
1. Phase 1: Data collection and modelling
a. Design the questionnaire
b. Collect medical and demographic data of a person when that person takes a test for Covid-19 using the questionnaire from step 1.a
c. Link the test’s outcome (Covid-19 positive or negative) to the data collected in step 1.b
d. Build a machine learning model on data from step 1.c
2. Phase 2: Deployment and general availability
a. Use the model from step 1.d to generate a prediction of Covid-19 positiveness of any person
b. Target Covid-19 tests for persons having a high likelihood of positive Covid-19 diagnosis
c. Link the test result (Covid-19 positive or negative) to the prediction calculated in step 2.a
d. Monitor model performance and fine-tune the machine learning model built in step 1.d
This document illustrates phases 1.b, 1.c, 1.d, 2.a and 2.b using simulated data.
Phase 1.b.
Collect medical and demographic data of a person when that person takes a test for Covid-19 using the questionnaire
As a result of phase 1.a, let's assume that we have a questionnaire of 23 questions about medical and demographic information.
For the purpose of this illustration, let's assume that:
Each question is referred to as q1, q2, ..., q23
Each patient has a unique identifier from 1 to 1000
The questionnaire was administered to 1000 patients as part of the Covid-19 testing procedure
All patients answered all the questions
Each answer to each question is a continuous variable (this can be extended to categorical variables as well)
The output of phase 1.b. is a table similar to this (first 20 patients and questions q1 to q14 shown):
In [7]: df_X.head(20)
Out[7]:
q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 q13 q14
Patient ID
1 0.86360 -1.00490 -0.32880 0.63232 -0.76382 2.48937 0.30792 0.69260 0.87849 0.39463 1.00259 0.78609 -0.17739 0.77910
2 0.16313 -0.24986 0.13275 -1.50085 -2.92733 0.20983 0.66074 -0.65538 0.11255 0.59372 0.88890 -0.50316 0.25568 0.76523
3 -0.73682 -0.55274 -1.18046 -0.42813 -1.34706 -0.50470 0.23098 2.24479 -0.68447 -0.19708 -1.26557 -3.01053 0.82140 -0.71519
4 -1.04279 -0.65561 -0.76028 0.25715 -0.35681 -2.86298 -2.25225 -5.07991 0.63268 1.13516 0.00391 1.42961 0.43997 -1.27914
5 -0.51371 0.50819 -0.38651 -1.34063 1.36827 -0.89227 1.27378 -0.07773 0.77796 -0.47807 -2.04526 1.64837 -0.67013 1.54823
6 -2.55599 2.15763 -0.41735 -0.45903 -0.99303 0.68273 1.86696 0.92724 -0.51942 0.97351 1.01371 0.12639 -0.88108 0.90742
7 0.25446 -0.66239 0.14898 -1.14714 0.26427 -0.12783 -0.13116 0.68001 -0.22781 0.89755 -0.50767 -0.22261 -0.43984 -1.13151
8 -1.01046 -0.75284 0.22125 -0.25886 -1.23720 -1.26904 -0.06989 -0.54133 0.54484 0.41281 -0.17778 -2.23708 0.43346 -0.75199
9 0.21368 -1.49480 0.80215 -0.55766 -2.11991 0.22287 -2.60513 0.86176 0.86479 0.20923 -0.66948 -0.15163 0.98832 -1.26166
10 0.47976 0.04171 -2.12566 0.07869 0.83110 -0.12056 -1.66437 0.79751 -0.97663 1.29526 -0.57091 -1.01142 -0.88971 -2.32314
11 1.00421 0.99791 0.76168 0.40136 -0.52947 -1.03565 -0.96048 -0.63995 2.44512 1.08679 1.41085 2.73563 1.36788 -0.69813
12 0.52490 -0.47379 -0.65672 -1.35932 -2.25998 -2.31555 -0.89348 -4.19650 -0.84165 -0.69299 0.69274 2.07068 0.22034 0.43498
13 0.63667 1.19087 0.05326 1.21838 -0.08718 2.12081 0.13317 1.77220 -0.62710 -1.29448 -0.33494 -0.95674 0.53233 0.48086
14 -0.37914 -0.14957 0.76062 -0.83470 -0.77427 0.27242 1.21496 1.95481 -0.16722 0.31711 0.31330 -1.84803 0.40436 2.54233
15 -0.89963 -0.38678 0.95247 1.40977 2.22997 -2.98061 -0.18962 -5.50990 0.93988 0.17021 0.26472 2.38602 -0.99076 0.25673
16 -0.17194 0.40791 -0.80500 0.17541 0.35286 1.02670 -0.26927 0.56005 -0.07986 0.04604 -0.95871 -0.59690 1.51679 -1.01261
17 -2.05931 -1.57735 0.46265 -0.18091 2.63300 -2.58241 -0.87385 -4.06331 -0.57685 0.57172 1.38340 1.93061 -1.13699 -1.81634
18 0.17660 -1.64372 1.26173 -0.01106 -1.44764 1.86687 -0.28044 1.77416 -1.68730 -0.23558 -0.41543 -0.63237 -0.79932 0.64604
19 -0.58746 -0.40418 1.12721 -1.26260 -0.27359 -1.22690 -1.83024 -0.53333 0.93635 -0.91030 1.40048 2.55845 -0.12744 -0.34864
20 -0.00095 0.10829 -0.64400 0.06277 -0.76381 -0.67486 0.11844 2.36569 -0.63694 -0.69341 -1.27323 -0.42776 0.45477 0.90311
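The document does not say how its simulated questionnaire data were generated. As a hypothetical sketch only, scikit-learn's `make_classification` can produce a comparable table: 1000 patients, 23 continuous answers named q1 to q23, and a binary outcome with roughly 70% positives (kept in `df_y` for use in phase 1.c). The generator settings (`weights`, `flip_y`, `random_state`) are assumptions, not the authors' choices.

```python
import pandas as pd
from sklearn.datasets import make_classification

N_PATIENTS = 1000   # patients 1 ... 1000
N_QUESTIONS = 23    # questions q1 ... q23

# Hypothetical simulation: continuous answers plus a binary test outcome.
# weights=[0.3] asks for about 30% class 0 (negative), the rest class 1;
# flip_y=0.0 disables random label noise so the class shares follow `weights`.
X, y = make_classification(
    n_samples=N_PATIENTS,
    n_features=N_QUESTIONS,
    weights=[0.3],
    flip_y=0.0,
    random_state=0,
)

patient_ids = pd.RangeIndex(1, N_PATIENTS + 1, name="Patient ID")
questions = [f"q{i}" for i in range(1, N_QUESTIONS + 1)]

df_X = pd.DataFrame(X, index=patient_ids, columns=questions)
df_y = pd.DataFrame({"Covid-19 test outcome": y}, index=patient_ids)

print(df_X.shape)  # (1000, 23)
```

Only the shape and the class balance mimic the document; the answer values themselves will differ from the table above.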
Phase 1.c.
Link the test’s outcome (Covid-19 positive or negative) to the data collected in step 1.b.
Let's assume that:
All 1000 patients from phase 1.b. have been tested for Covid-19 positiveness using a lab test
The tests were done on respiratory samples obtained by a nasopharyngeal swab using real-time reverse transcription polymerase chain reaction (rRT-PCR)
The data of the test outcome are captured as follows:
If a patient is Covid-19 positive, the Covid-19 test outcome is equal to 1
If a patient is Covid-19 negative, the Covid-19 test outcome is equal to 0
The data of the test outcome for the first 20 patients are the following:
In [8]: df_y.head(20)
Out[8]:
Covid-19 test outcome
Patient ID
1 1
2 1
3 1
4 0
5 1
6 1
7 1
8 1
9 1
10 1
11 1
12 0
13 1
14 1
15 0
16 1
17 0
18 1
19 0
20 0
From the table above, we see that:
Patient with ID 3 is Covid-19 positive
Patient with ID 4 is Covid-19 negative
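Encoding the lab result as 1 or 0 is a simple mapping. The sketch below assumes a hypothetical raw column `rRT-PCR result` holding text labels such as "positive" and "negative"; the column name and labels are illustrative, not taken from the project.

```python
import pandas as pd

# Hypothetical raw lab results keyed by Patient ID (column name and labels
# are illustrative only).
lab = pd.DataFrame(
    {"rRT-PCR result": ["Positive", "positive", "positive", "negative"]},
    index=pd.Index([1, 2, 3, 4], name="Patient ID"),
)

# Encode the outcome: positive -> 1, negative -> 0 (case-insensitive).
outcome_map = {"positive": 1, "negative": 0}
encoded = lab["rRT-PCR result"].str.strip().str.lower().map(outcome_map)

# Any other label (e.g. an inconclusive result) maps to NaN: stop rather than guess.
unmapped = encoded.isna()
if unmapped.any():
    raise ValueError(f"Unrecognised lab result for patients {list(encoded.index[unmapped])}")

df_y = encoded.astype(int).to_frame("Covid-19 test outcome")
print(df_y)
```

Failing loudly on unexpected labels keeps an inconclusive test from being silently counted as negative.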
Let's assume that the proportion of Covid-19 positive patients is approximately 700 / 1000 (70%)
In [9]: df_y['Covid-19 test outcome'].value_counts()
Out[9]: 1 701
0 299
Name: Covid-19 test outcome, dtype: int64
In [10]: sns.countplot(x='Covid-19 test outcome', data=df_y, color='grey')
plt.ylabel('Count of patients')
plt.show()
Then, we link each patient's answers to the questionnaire with his/her test result.
The output of phase 1.c. is a table similar to this:
In [12]: df.head(20)
Out[12]:
Covid-19 test outcome q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 q13
Patient ID
1 1 0.86360 -1.00490 -0.32880 0.63232 -0.76382 2.48937 0.30792 0.69260 0.87849 0.39463 1.00259 0.78609 -0.17739
2 1 0.16313 -0.24986 0.13275 -1.50085 -2.92733 0.20983 0.66074 -0.65538 0.11255 0.59372 0.88890 -0.50316 0.25568
3 1 -0.73682 -0.55274 -1.18046 -0.42813 -1.34706 -0.50470 0.23098 2.24479 -0.68447 -0.19708 -1.26557 -3.01053 0.82140
4 0 -1.04279 -0.65561 -0.76028 0.25715 -0.35681 -2.86298 -2.25225 -5.07991 0.63268 1.13516 0.00391 1.42961 0.43997
5 1 -0.51371 0.50819 -0.38651 -1.34063 1.36827 -0.89227 1.27378 -0.07773 0.77796 -0.47807 -2.04526 1.64837 -0.67013
6 1 -2.55599 2.15763 -0.41735 -0.45903 -0.99303 0.68273 1.86696 0.92724 -0.51942 0.97351 1.01371 0.12639 -0.88108
7 1 0.25446 -0.66239 0.14898 -1.14714 0.26427 -0.12783 -0.13116 0.68001 -0.22781 0.89755 -0.50767 -0.22261 -0.43984
8 1 -1.01046 -0.75284 0.22125 -0.25886 -1.23720 -1.26904 -0.06989 -0.54133 0.54484 0.41281 -0.17778 -2.23708 0.43346
9 1 0.21368 -1.49480 0.80215 -0.55766 -2.11991 0.22287 -2.60513 0.86176 0.86479 0.20923 -0.66948 -0.15163 0.98832
10 1 0.47976 0.04171 -2.12566 0.07869 0.83110 -0.12056 -1.66437 0.79751 -0.97663 1.29526 -0.57091 -1.01142 -0.88971
11 1 1.00421 0.99791 0.76168 0.40136 -0.52947 -1.03565 -0.96048 -0.63995 2.44512 1.08679 1.41085 2.73563 1.36788
12 0 0.52490 -0.47379 -0.65672 -1.35932 -2.25998 -2.31555 -0.89348 -4.19650 -0.84165 -0.69299 0.69274 2.07068 0.22034
13 1 0.63667 1.19087 0.05326 1.21838 -0.08718 2.12081 0.13317 1.77220 -0.62710 -1.29448 -0.33494 -0.95674 0.53233
14 1 -0.37914 -0.14957 0.76062 -0.83470 -0.77427 0.27242 1.21496 1.95481 -0.16722 0.31711 0.31330 -1.84803 0.40436
15 0 -0.89963 -0.38678 0.95247 1.40977 2.22997 -2.98061 -0.18962 -5.50990 0.93988 0.17021 0.26472 2.38602 -0.99076
16 1 -0.17194 0.40791 -0.80500 0.17541 0.35286 1.02670 -0.26927 0.56005 -0.07986 0.04604 -0.95871 -0.59690 1.51679
17 0 -2.05931 -1.57735 0.46265 -0.18091 2.63300 -2.58241 -0.87385 -4.06331 -0.57685 0.57172 1.38340 1.93061 -1.13699
18 1 0.17660 -1.64372 1.26173 -0.01106 -1.44764 1.86687 -0.28044 1.77416 -1.68730 -0.23558 -0.41543 -0.63237 -0.79932
19 0 -0.58746 -0.40418 1.12721 -1.26260 -0.27359 -1.22690 -1.83024 -0.53333 0.93635 -0.91030 1.40048 2.55845 -0.12744
20 0 -0.00095 0.10829 -0.64400 0.06277 -0.76381 -0.67486 0.11844 2.36569 -0.63694 -0.69341 -1.27323 -0.42776 0.45477
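The notebook does not show how `df` was built. A minimal sketch, assuming `df_X` (answers) and `df_y` (outcomes) are both indexed by `Patient ID` as the tables above suggest; the toy values are copied from the first three patients, truncated to q1 and q2.

```python
import pandas as pd

# Toy inputs copied from the first three patients above (q1 and q2 only).
idx = pd.Index([1, 2, 3], name="Patient ID")
df_X = pd.DataFrame({"q1": [0.86360, 0.16313, -0.73682],
                     "q2": [-1.00490, -0.24986, -0.55274]}, index=idx)
df_y = pd.DataFrame({"Covid-19 test outcome": [1, 1, 1]}, index=idx)

# Flag patients who answered the questionnaire but have no test result yet.
missing_results = df_X.index.difference(df_y.index)
if len(missing_results) > 0:
    print("No test result yet for patients:", list(missing_results))

# Join on the shared Patient ID index, outcome column first as in the table
# above. validate="one_to_one" raises if a Patient ID is duplicated on either side.
df = pd.merge(df_y, df_X, left_index=True, right_index=True,
              how="inner", validate="one_to_one")
```

An inner join keeps only patients who have both answers and a result, which is what model fitting needs.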
The following figure shows the pairwise scatterplots for each combination of questions, as well as the distribution of values for each question, for a random 10% sample of patients.
Covid-19 positive patients are shown in orange.
Covid-19 negative patients are shown in blue.
In [13]: sns.set(style="ticks", color_codes=True)
df_sample = df.sample(frac=0.1, replace=False, random_state=0)
g = sns.pairplot(df_sample, hue='Covid-19 test outcome')
Phase 1.d.
Build a machine learning model on data from step 1.c
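As a best practice, we set aside 20% of the data (200 patients) in order to measure model performance on a subset of data that was not used to fit the model. The split is stratified on the Covid-19 test outcome, so both partitions keep the same share of positive patients.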
In [15]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
In [16]: print('The training partition has', X_train.shape[0], 'rows (patients) and', X_train.shape[1], 'inputs (answered questions from the questionnaire)')
print('The training partition has', y_train.shape[0], 'class labels (results of the Covid-19 test for each patient)')
print('\n')
print('The validation partition has', X_test.shape[0], 'rows (patients) and', X_test.shape[1], 'inputs (answered questions from the questionnaire)')
print('The validation partition has', y_test.shape[0], 'class labels (results of the Covid-19 test for each patient)')
The training partition has 800 rows (patients) and 23 inputs (answered questions from the questionnaire)
The training partition has 800 class labels (results of the Covid-19 test for each patient)
The validation partition has 200 rows (patients) and 23 inputs (answered questions from the questionnaire)
The validation partition has 200 class labels (results of the Covid-19 test for each patient)
The proportion of the target variable (Covid-19 test outcome) in the training partition is the following:
In [17]: pd.Series(y_train).value_counts(normalize=True)
Out[17]: 1 0.70125
0 0.29875
dtype: float64
The proportion of the target variable (Covid-19 test outcome) in the validation partition is the following:
In [18]: pd.Series(y_test).value_counts(normalize=True)
Out[18]: 1 0.7
0 0.3
dtype: float64
The details of how the Covid-19 risk model is fit are not shown.
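Because the fitting step is not shown, the following is only a stand-in: a scikit-learn logistic regression fitted on synthetic data of the same shape, producing arrays named like the notebook's `pred_proba_train` and `pred_proba_test`. The model, the scaling step and the data are all assumptions; the authors' model may be entirely different.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 1000 x 23 questionnaire table (assumption).
X, y = make_classification(n_samples=1000, n_features=23, weights=[0.3],
                           flip_y=0.0, random_state=0)

# Same 80/20 stratified split as in the notebook.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Hypothetical risk model: standardise the answers, then logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Risk score = predicted probability of the positive class (column 1).
pred_proba_train = model.predict_proba(X_train)[:, 1]
pred_proba_test = model.predict_proba(X_test)[:, 1]
```

Fitting the scaler inside the pipeline means it only ever sees training data, so the validation partition stays untouched until scoring.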
The graphs below show the distribution of the risk score for the training and the validation partitions. The distribution has two distinct spikes: risk scores close to 0 correspond to low-risk people, and risk scores close to 1 correspond to high-risk people.
In [31]: sns.histplot(pd.Series(pred_proba_train))
plt.xlabel('Covid-19 risk score')
plt.ylabel('Count of patients')
plt.title('Covid-19 risk score for the training partition')
plt.show()
In [32]: sns.histplot(pd.Series(pred_proba_test))
plt.xlabel('Covid-19 risk score')
plt.ylabel('Count of patients')
plt.title('Covid-19 risk score for the validation partition')
plt.show()
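The next cells call `adjusted_classes(pred_proba, prior_proba)`, which is not defined in the notebook excerpt. A plausible reading, and it is only an assumption, is that it turns risk scores into 0/1 labels by comparing them with a cut-off passed as the second argument. A minimal sketch of such a helper:

```python
import numpy as np

def adjusted_classes(y_scores, threshold):
    """Hypothetical helper: label a patient 1 (predicted positive) when the
    risk score is at or above `threshold`, otherwise 0."""
    y_scores = np.asarray(y_scores, dtype=float)
    return (y_scores >= threshold).astype(int)

print(adjusted_classes([0.05, 0.69, 0.71, 0.98], 0.7))  # [0 0 1 1]
```

Raising the threshold flags fewer patients as positive, which generally trades recall on the positive class for precision.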
The following output shows the model performance on the training partition.
In [36]: pred_train = adjusted_classes(pred_proba_train, prior_proba)
print('Training partition')
print('\n')
print(pd.DataFrame(confusion_matrix(y_train, pred_train),
columns=['Predicted Covid-19 = 0', 'Predicted Covid-19 = 1'],
index=['Actual Covid-19 = 0', 'Actual Covid-19 = 1']))
print('\n')
print(classification_report(y_train, pred_train))
Training partition
Predicted Covid-19 = 0 Predicted Covid-19 = 1
Actual Covid-19 = 0 239 0
Actual Covid-19 = 1 9 552
precision recall f1-score support
0 0.96 1.00 0.98 239
1 1.00 0.98 0.99 561
accuracy 0.99 800
macro avg 0.98 0.99 0.99 800
weighted avg 0.99 0.99 0.99 800
The following output shows the model performance on the validation partition.
In [37]: pred_test = adjusted_classes(pred_proba_test, prior_proba)
print('Validation partition')
print('\n')
print(pd.DataFrame(confusion_matrix(y_test, pred_test),
columns=['Predicted Covid-19 = 0', 'Predicted Covid-19 = 1'],
index=['Actual Covid-19 = 0', 'Actual Covid-19 = 1']))
print('\n')
print(classification_report(y_test, pred_test))
Validation partition
Predicted Covid-19 = 0 Predicted Covid-19 = 1
Actual Covid-19 = 0 58 2
Actual Covid-19 = 1 12 128
precision recall f1-score support
0 0.83 0.97 0.89 60
1 0.98 0.91 0.95 140
accuracy 0.93 200
macro avg 0.91 0.94 0.92 200
weighted avg 0.94 0.93 0.93 200
The output of phase 1.d is a machine learning model that can be deployed at scale to calculate the risk score of any person on the basis of his/her answers to the questionnaire.
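The validation classification report can be checked by hand from its confusion-matrix counts (58, 2, 12, 128). Precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean; for class 0 the roles of the cells swap. A plain-Python worked example:

```python
def class_metrics(tp, fp, fn):
    """Precision, recall and F1 for one class from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Validation confusion matrix (rows = actual, columns = predicted).
tn, fp, fn, tp = 58, 2, 12, 128

# Class 1 (Covid-19 positive).
p1, r1, f1_1 = class_metrics(tp=tp, fp=fp, fn=fn)
# Class 0 (Covid-19 negative): true negatives are the hits, and the
# false-positive / false-negative cells swap roles.
p0, r0, f1_0 = class_metrics(tp=tn, fp=fn, fn=fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"class 1: precision={p1:.2f} recall={r1:.2f} f1={f1_1:.2f}")  # 0.98 0.91 0.95
print(f"class 0: precision={p0:.2f} recall={r0:.2f} f1={f1_0:.2f}")  # 0.83 0.97 0.89
print(f"accuracy={accuracy:.2f}")                                    # 0.93
```

The same arithmetic with the training counts (239, 0, 9, 552) reproduces the training report.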
Phase 2.a.
Use the model from step 1.d to generate a prediction of Covid-19 positiveness of any
person
After model deployment (in production), let's assume that 500 previously unknown patients answered the questionnaire. The data for the first 10 of them are shown below.
In [45]: pd.set_option('display.precision', 5)
df_X_score.head(10)
Out[45]:
q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 q13 q14
Patient ID
1001 1.53175 -0.04577 0.24018 -3.01630 3.78771 0.14376 0.10031 1.51729 -1.76818 -1.63343 -0.17644 -0.93879 0.15535 1.46685
1002 -0.10546 0.27277 0.19986 -2.45632 2.71857 -0.67213 0.65718 1.81098 -1.08473 -2.55134 -0.01416 0.69048 -0.05910 0.62776
1003 -0.53627 1.91788 -1.27851 0.08680 0.73835 0.83755 -1.04482 -0.79318 0.05228 -0.39619 -0.05698 0.78990 0.75103 -1.12659
1004 0.22495 1.32366 2.31517 2.10098 0.12442 0.06402 0.13983 -2.27711 -0.34322 -0.45823 -0.86908 1.73886 -1.13832 -1.09103
1005 1.12422 2.21340 1.03132 2.05760 -3.09501 -3.00650 0.06349 -0.10149 1.79921 1.97837 0.02817 -0.22139 -0.08733 -0.17335
1006 0.13391 -0.76154 0.83318 1.40468 -1.82245 -0.19431 -0.20189 0.14828 -2.96337 -0.15200 -0.70969 0.09331 -0.62289 -0.52828
1007 0.55378 1.14685 1.57146 -1.62518 2.78392 -0.22966 1.07731 2.42854 -0.50233 1.09827 -0.25322 0.81109 -1.83957 1.25795
1008 -1.68730 1.69995 -0.99108 1.42300 -2.63067 -1.44764 0.89630 2.26976 -2.73905 0.17660 0.64604 1.48959 -1.64372 -1.62723
1009 -1.14398 1.17859 0.54627 0.11912 0.45548 -0.25665 -1.09810 -1.01112 0.41393 -0.73649 -0.62525 0.51227 0.55505 -1.23055
1010 -1.38965 1.11901 -1.67252 0.45198 -3.02987 0.72636 0.21500 -0.64388 -1.34596 -0.22011 0.13737 -0.67231 -2.50142 0.97951
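Scoring new patients reuses the fitted model without refitting it. A minimal sketch, assuming a fitted scikit-learn classifier with `predict_proba` and a simple probability cut-off; the random data, the logistic-regression placeholder and the 0.5 cut-off are assumptions, not the project's actual settings. The output names mirror those used in the next cells.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
questions = [f"q{i}" for i in range(1, 24)]  # q1 ... q23

# Placeholder for the phase 1.d model: a logistic regression fitted on
# random data (assumption, not the project's model).
X_train = rng.normal(size=(800, 23))
y_train = (X_train[:, 0] + rng.normal(scale=0.5, size=800) > -0.5).astype(int)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 500 previously unknown patients with Patient IDs 1001 ... 1500.
idx_score = pd.RangeIndex(1001, 1501, name="Patient ID")
df_X_score = pd.DataFrame(rng.normal(size=(500, 23)), index=idx_score,
                          columns=questions)

# Score only, no refitting: the columns must be in the same order (q1 ... q23)
# as the data the model was trained on.
pred_proba_score = model.predict_proba(df_X_score[questions].to_numpy())[:, 1]

THRESHOLD = 0.5  # hypothetical cut-off for a positive prediction
pred_score = (pred_proba_score >= THRESHOLD).astype(int)
```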
Distribution of Covid-19 risk score for 500 previously unknown patients
In [48]: sns.histplot(pd.Series(pred_proba_score))
plt.xlabel('Covid-19 risk score')
plt.ylabel('Count of patients')
plt.title('Covid-19 risk score for 500 previously unknown patients')
plt.show()
The table below shows the prediction for the first 10 previously unknown patients, based on their Covid-19 risk score.
In [50]: pred_score_df = pd.DataFrame(pred_score, index=idx_score, columns=['Predicted Covid-19 test outcome'])
pred_score_df.head(10)
Out[50]:
Predicted Covid-19 test outcome
Patient ID
1001 1
1002 1
1003 1
1004 0
1005 0
1006 1
1007 1
1008 1
1009 0
1010 1
Patient 1003 has a positive predicted Covid-19 test outcome, while patient 1004 has a negative one.
Phase 2.b.
Target Covid-19 tests for persons having a high likelihood of positive Covid-19 diagnosis
Get the top 20 persons with the highest Covid-19 risk score
In [51]: pred_proba_score_df = pd.DataFrame(pred_proba_score, index=idx_score, columns=['Predicted Covid-19 risk score'])
pd.concat([pred_proba_score_df, pred_score_df], axis=1).sort_values(by='Predicted Covid-19 risk score', ascending=False).head(20).drop(columns='Predicted Covid-19 risk score')
Out[51]:
Predicted Covid-19 test outcome
Patient ID
1204 1
1057 1
1223 1
1212 1
1374 1
1375 1
1026 1
1386 1
1293 1
1270 1
1407 1
1267 1
1417 1
1272 1
1195 1
1038 1
1322 1
1481 1
1163 1
1225 1
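When only a limited number of tests is available, targeting amounts to ranking patients by risk score and taking as many as there are tests. A sketch assuming a hypothetical capacity parameter `n_tests`; the helper name and the toy scores are made up for illustration.

```python
import pandas as pd

def select_for_testing(risk_scores: pd.Series, n_tests: int) -> list:
    """Hypothetical helper: Patient IDs of the n_tests highest risk scores.
    With keep="first", tied scores are taken in order of appearance."""
    return risk_scores.nlargest(n_tests, keep="first").index.tolist()

# Made-up risk scores for five patients.
scores = pd.Series([0.91, 0.12, 0.97, 0.55, 0.91],
                   index=pd.Index([1001, 1002, 1003, 1004, 1005], name="Patient ID"))

print(select_for_testing(scores, n_tests=3))  # [1003, 1001, 1005]
```

`nlargest` avoids sorting the whole table when the capacity is much smaller than the number of respondents.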
The proposed solution is capable of registering and reporting the following information:
Daily count of participants taking the questionnaire
Average daily Covid-19 risk score
Average Covid-19 risk score by age group
Average Covid-19 risk score by geographical region
etc.
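The plotting cells below use `daily_volume`, `daily_scores` and `weekly_avg_score_age`, whose construction is not shown. A sketch, assuming a hypothetical log with one row per completed questionnaire (`timestamp`, `age`, `risk_score`) and the age bands used in the plot labels; the log contents are made up.

```python
import pandas as pd

# Hypothetical questionnaire log: one row per completed questionnaire.
log = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-05-04 09:00", "2020-05-04 15:30",
                                 "2020-05-05 11:00", "2020-05-12 10:00"]),
    "age": [25, 47, 68, 33],
    "risk_score": [0.2, 0.8, 0.9, 0.4],
}).set_index("timestamp")

# Daily count of participants taking the questionnaire (days without
# submissions get a count of 0).
daily_volume = log.resample("D").size().to_frame("Volume")

# Daily average risk score (NaN on days without submissions).
daily_scores = (log["risk_score"].resample("D").mean()
                .to_frame("Average Covid-19 risk score"))

# Age bands matching the plot labels: [18, 40), [40, 60), [60, 200).
log["age_group"] = pd.cut(log["age"], bins=[18, 40, 60, 200], right=False,
                          labels=["18-39", "40-59", "60+"])

# Weekly (Sunday-ending) average risk score per age band, one column per band.
weekly_avg_score_age = (
    log.groupby([pd.Grouper(freq="W"), "age_group"], observed=True)["risk_score"]
    .mean()
    .unstack("age_group")
)
```

Aggregating only scores and coarse age bands, rather than raw answers, keeps the reporting layer away from individual sensitive data.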
In [53]: plt.figure(figsize=(15, 10))
plt.title("Daily count of participants taking the questionnaire", fontsize=16)
plt.plot(daily_volume.index, daily_volume['Volume'], color="b", linestyle="-")
plt.ylabel("Volume", fontsize=14)
plt.xlabel("Date", fontsize=14)
plt.ylim(0, 1000)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.grid(True)
plt.show()
In [55]: plt.figure(figsize=(15, 10))
plt.title("Daily average Covid-19 risk score for predicted positive patients", fontsize=16)
plt.plot(daily_scores.index, daily_scores['Average Covid-19 risk score'], color="b", linestyle="-")
plt.ylabel("Average Covid-19 risk score", fontsize=14)
plt.xlabel("Date", fontsize=14)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.ylim(0, 1)
plt.grid(True)
plt.show()
In [57]: plt.figure(figsize=(15, 10))
plt.title("Weekly average Covid-19 risk score by age group", fontsize=16)
plt.plot(weekly_avg_score_age.index, weekly_avg_score_age['18-39'], color="b", linestyle="-", label='18-39')
plt.plot(weekly_avg_score_age.index, weekly_avg_score_age['40-59'], color="r", linestyle="-", label='40-59')
plt.plot(weekly_avg_score_age.index, weekly_avg_score_age['60+'], color="g", linestyle="-", label='60+')
plt.ylabel("Average Covid-19 risk score", fontsize=14)
plt.xlabel("Week", fontsize=14)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.grid(True)
plt.legend(loc='best', fontsize=14)
plt.show()
Covid19 Smart Assessment Tool

Mais conteúdo relacionado

Semelhante a Covid19 Smart Assessment Tool

Aminullah assagaf mp10 manajemen proyek
Aminullah assagaf mp10 manajemen proyekAminullah assagaf mp10 manajemen proyek
Aminullah assagaf mp10 manajemen proyek
Aminullah Assagaf
 
2nd Dubai Marketing Club (Pharmaceutical Forecasting) by Dr.Samer Saeed
2nd Dubai Marketing Club (Pharmaceutical Forecasting) by Dr.Samer Saeed2nd Dubai Marketing Club (Pharmaceutical Forecasting) by Dr.Samer Saeed
2nd Dubai Marketing Club (Pharmaceutical Forecasting) by Dr.Samer Saeed
Mahmoud Bahgat
 
Tables•T-11Table entry for p and C isthe critical .docx
Tables•T-11Table entry for p and C isthe critical .docxTables•T-11Table entry for p and C isthe critical .docx
Tables•T-11Table entry for p and C isthe critical .docx
ssuserf9c51d
 
FIN-321Homework 5 – Fall 20191. In class, we learned that we.docx
FIN-321Homework 5 – Fall 20191. In class, we learned that we.docxFIN-321Homework 5 – Fall 20191. In class, we learned that we.docx
FIN-321Homework 5 – Fall 20191. In class, we learned that we.docx
lmelaine
 
Smarteyetrackingsystem.pdf
Smarteyetrackingsystem.pdfSmarteyetrackingsystem.pdf
Smarteyetrackingsystem.pdf
ManojTM6
 

Semelhante a Covid19 Smart Assessment Tool (20)

Scoring model
Scoring modelScoring model
Scoring model
 
Aminullah assagaf mp10 manajemen proyek
Aminullah assagaf mp10 manajemen proyekAminullah assagaf mp10 manajemen proyek
Aminullah assagaf mp10 manajemen proyek
 
Tim Pletcher Presentation
Tim Pletcher PresentationTim Pletcher Presentation
Tim Pletcher Presentation
 
The Analytics Opportunity in Healthcare
The Analytics Opportunity in HealthcareThe Analytics Opportunity in Healthcare
The Analytics Opportunity in Healthcare
 
Tim Pletcher Presentation
Tim Pletcher PresentationTim Pletcher Presentation
Tim Pletcher Presentation
 
Calculate the ROI of your SEO efforts
Calculate the ROI of your SEO effortsCalculate the ROI of your SEO efforts
Calculate the ROI of your SEO efforts
 
2nd Dubai Marketing Club (Pharmaceutical Forecasting) by Dr.Samer Saeed
2nd Dubai Marketing Club (Pharmaceutical Forecasting) by Dr.Samer Saeed2nd Dubai Marketing Club (Pharmaceutical Forecasting) by Dr.Samer Saeed
2nd Dubai Marketing Club (Pharmaceutical Forecasting) by Dr.Samer Saeed
 
Tables•T-11Table entry for p and C isthe critical .docx
Tables•T-11Table entry for p and C isthe critical .docxTables•T-11Table entry for p and C isthe critical .docx
Tables•T-11Table entry for p and C isthe critical .docx
 
Multi Objective Optimization of PMEDM Process Parameter by Topsis Method
Multi Objective Optimization of PMEDM Process Parameter by Topsis MethodMulti Objective Optimization of PMEDM Process Parameter by Topsis Method
Multi Objective Optimization of PMEDM Process Parameter by Topsis Method
 
Covid-19 Data Analysis and Visualization
Covid-19 Data Analysis and VisualizationCovid-19 Data Analysis and Visualization
Covid-19 Data Analysis and Visualization
 
SHPE Poster
SHPE PosterSHPE Poster
SHPE Poster
 
Statistical & Multi-objective Social Control of Infection Process of COVID-19
Statistical & Multi-objective Social Control of Infection Process of COVID-19Statistical & Multi-objective Social Control of Infection Process of COVID-19
Statistical & Multi-objective Social Control of Infection Process of COVID-19
 
FIN-321Homework 5 – Fall 20191. In class, we learned that we.docx
FIN-321Homework 5 – Fall 20191. In class, we learned that we.docxFIN-321Homework 5 – Fall 20191. In class, we learned that we.docx
FIN-321Homework 5 – Fall 20191. In class, we learned that we.docx
 
financial econometric
financial econometricfinancial econometric
financial econometric
 
Covid-19 Detection Using Deep Neural Networks
Covid-19 Detection Using Deep Neural NetworksCovid-19 Detection Using Deep Neural Networks
Covid-19 Detection Using Deep Neural Networks
 
R. De Santis, M. Fioramanti, A. Girardi, C. Pappalardo - The track record of...
R. De Santis, M. Fioramanti, A. Girardi,  C. Pappalardo - The track record of...R. De Santis, M. Fioramanti, A. Girardi,  C. Pappalardo - The track record of...
R. De Santis, M. Fioramanti, A. Girardi, C. Pappalardo - The track record of...
 
Marketing Research - Hypothetical Work-Life Balance App, presented at XIMB
Marketing Research - Hypothetical Work-Life Balance App, presented at XIMBMarketing Research - Hypothetical Work-Life Balance App, presented at XIMB
Marketing Research - Hypothetical Work-Life Balance App, presented at XIMB
 
Smarteyetrackingsystem.pdf
Smarteyetrackingsystem.pdfSmarteyetrackingsystem.pdf
Smarteyetrackingsystem.pdf
 
09.3 credit scoring
09.3   credit scoring09.3   credit scoring
09.3 credit scoring
 
Energy Analysis - Growth Rate 1984 - 2000 - 2019
Energy Analysis - Growth Rate 1984 - 2000 - 2019Energy Analysis - Growth Rate 1984 - 2000 - 2019
Energy Analysis - Growth Rate 1984 - 2000 - 2019
 

Último

怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
gajnagarg
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
HyderabadDolls
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
HyderabadDolls
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
HyderabadDolls
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Último (20)

Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 

Covid19 Smart Assessment Tool

This solution will be implemented in 2 phases.
1. Phase 1: Data collection and modelling
   a. Design the questionnaire
   b. Collect the medical and demographic data of a person, using the questionnaire from step 1.a, when that person takes a Covid-19 test
   c. Link the test's outcome (Covid-19 positive or negative) to the data collected in step 1.b
   d. Build a machine learning model on the data from step 1.c
2. Phase 2: Deployment and general availability
   a. Use the model from step 1.d to predict the Covid-19 positiveness of any person
   b. Target Covid-19 tests at persons with a high likelihood of a positive Covid-19 diagnosis
   c. Link the test result (Covid-19 positive or negative) to the prediction calculated in step 2.a
   d. Monitor model performance and fine-tune the machine learning model built in step 1.d
This document illustrates phases 1.b, 1.c, 1.d, 2.a and 2.b using simulated data.
Phase 1.b. Collect medical and demographic data of a person when that person takes a test for Covid-19 using the questionnaire
As a result of phase 1.a, let's assume that we have a questionnaire of 23 questions about medical and demographic information. For the purpose of this illustration, let's also assume that:
Each question is referred to as q1, q2, ..., q23
Each patient has a unique identifier from 1 to 1000
The questionnaire was given to 1000 patients as part of the Covid-19 testing procedure
All patients answered all the questions
Each answer is a continuous variable (the approach can be extended to categorical variables as well)
The output of phase 1.b is a table similar to this one (first 20 patients shown; only questions q1 to q14 are displayed):
In [7]: df_X.head(20)
Out[7]:
Patient ID       q1       q2       q3       q4       q5       q6       q7       q8       q9      q10      q11      q12      q13      q14
         1  0.86360 -1.00490 -0.32880  0.63232 -0.76382  2.48937  0.30792  0.69260  0.87849  0.39463  1.00259  0.78609 -0.17739  0.77910
         2  0.16313 -0.24986  0.13275 -1.50085 -2.92733  0.20983  0.66074 -0.65538  0.11255  0.59372  0.88890 -0.50316  0.25568  0.76523
         3 -0.73682 -0.55274 -1.18046 -0.42813 -1.34706 -0.50470  0.23098  2.24479 -0.68447 -0.19708 -1.26557 -3.01053  0.82140 -0.71519
         4 -1.04279 -0.65561 -0.76028  0.25715 -0.35681 -2.86298 -2.25225 -5.07991  0.63268  1.13516  0.00391  1.42961  0.43997 -1.27914
         5 -0.51371  0.50819 -0.38651 -1.34063  1.36827 -0.89227  1.27378 -0.07773  0.77796 -0.47807 -2.04526  1.64837 -0.67013  1.54823
         6 -2.55599  2.15763 -0.41735 -0.45903 -0.99303  0.68273  1.86696  0.92724 -0.51942  0.97351  1.01371  0.12639 -0.88108  0.90742
         7  0.25446 -0.66239  0.14898 -1.14714  0.26427 -0.12783 -0.13116  0.68001 -0.22781  0.89755 -0.50767 -0.22261 -0.43984 -1.13151
         8 -1.01046 -0.75284  0.22125 -0.25886 -1.23720 -1.26904 -0.06989 -0.54133  0.54484  0.41281 -0.17778 -2.23708  0.43346 -0.75199
         9  0.21368 -1.49480  0.80215 -0.55766 -2.11991  0.22287 -2.60513  0.86176  0.86479  0.20923 -0.66948 -0.15163  0.98832 -1.26166
        10  0.47976  0.04171 -2.12566  0.07869  0.83110 -0.12056 -1.66437  0.79751 -0.97663  1.29526 -0.57091 -1.01142 -0.88971 -2.32314
        11  1.00421  0.99791  0.76168  0.40136 -0.52947 -1.03565 -0.96048 -0.63995  2.44512  1.08679  1.41085  2.73563  1.36788 -0.69813
        12  0.52490 -0.47379 -0.65672 -1.35932 -2.25998 -2.31555 -0.89348 -4.19650 -0.84165 -0.69299  0.69274  2.07068  0.22034  0.43498
        13  0.63667  1.19087  0.05326  1.21838 -0.08718  2.12081  0.13317  1.77220 -0.62710 -1.29448 -0.33494 -0.95674  0.53233  0.48086
        14 -0.37914 -0.14957  0.76062 -0.83470 -0.77427  0.27242  1.21496  1.95481 -0.16722  0.31711  0.31330 -1.84803  0.40436  2.54233
        15 -0.89963 -0.38678  0.95247  1.40977  2.22997 -2.98061 -0.18962 -5.50990  0.93988  0.17021  0.26472  2.38602 -0.99076  0.25673
        16 -0.17194  0.40791 -0.80500  0.17541  0.35286  1.02670 -0.26927  0.56005 -0.07986  0.04604 -0.95871 -0.59690  1.51679 -1.01261
        17 -2.05931 -1.57735  0.46265 -0.18091  2.63300 -2.58241 -0.87385 -4.06331 -0.57685  0.57172  1.38340  1.93061 -1.13699 -1.81634
        18  0.17660 -1.64372  1.26173 -0.01106 -1.44764  1.86687 -0.28044  1.77416 -1.68730 -0.23558 -0.41543 -0.63237 -0.79932  0.64604
        19 -0.58746 -0.40418  1.12721 -1.26260 -0.27359 -1.22690 -1.83024 -0.53333  0.93635 -0.91030  1.40048  2.55845 -0.12744 -0.34864
        20 -0.00095  0.10829 -0.64400  0.06277 -0.76381 -0.67486  0.11844  2.36569 -0.63694 -0.69341 -1.27323 -0.42776  0.45477  0.90311
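The notebook does not show how the simulated questionnaire data were generated. Below is a minimal sketch that assumes scikit-learn's make_classification as the generator. The authors' actual method is not stated. The sketch produces 1000 patients with 23 continuous answers and roughly 70% positive outcomes. The choice of n_informative=10 and the random seed are arbitrary assumptions, so the values will not match the tables above.

```python
import pandas as pd
from sklearn.datasets import make_classification

# Assumption: make_classification stands in for the (unshown) simulation.
# weights=[0.3] asks for ~30% class 0 (negative) and ~70% class 1 (positive);
# flip_y (default 0.01) assigns ~1% of labels at random, so counts are approximate.
X, y = make_classification(
    n_samples=1000,
    n_features=23,
    n_informative=10,  # arbitrary choice for illustration
    weights=[0.3],
    random_state=0,
)

# Patient IDs 1..1000 and question names q1..q23, as in the tables above
patient_ids = pd.Index(range(1, 1001), name="Patient ID")
questions = [f"q{i}" for i in range(1, 24)]

df_X = pd.DataFrame(X, index=patient_ids, columns=questions).round(5)
df_y = pd.DataFrame({"Covid-19 test outcome": y}, index=patient_ids)

print(df_X.shape)                             # (1000, 23)
print(df_y["Covid-19 test outcome"].mean())   # share of positives, close to 0.7
```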
Phase 1.c. Link the test's outcome (Covid-19 positive or negative) to the data collected in step 1.b
Let's assume that:
All 1000 patients from phase 1.b have been tested for Covid-19 using a lab test
The tests were done on respiratory samples obtained by nasopharyngeal swab, using real-time reverse transcription polymerase chain reaction (rRT-PCR)
The test outcome is encoded as follows:
If a patient is Covid-19 positive, the Covid-19 test outcome is 1
If a patient is Covid-19 negative, the Covid-19 test outcome is 0
The test outcomes for the first 20 patients are the following:
In [8]: df_y.head(20)
Out[8]:
Patient ID  Covid-19 test outcome
1           1
2           1
3           1
4           0
5           1
6           1
7           1
8           1
9           1
10          1
11          1
12          0
13          1
14          1
15          0
16          1
17          0
18          1
19          0
20          0
From the table above, we see that patient 3 is Covid-19 positive, while patient 4 is Covid-19 negative.
Let's assume that the proportion of Covid-19 positive patients is approximately 700 / 1000 (70%):
In [9]: df_y['Covid-19 test outcome'].value_counts()
Out[9]:
1    701
0    299
Name: Covid-19 test outcome, dtype: int64
In [10]:
sns.countplot(x='Covid-19 test outcome', data=df_y, color='grey')
plt.ylabel('Count of patients')
plt.show()
Then, we link each patient's answers to the questionnaire with his or her test result. The output of phase 1.c is a table similar to this one (first 20 patients shown; only questions q1 to q13 are displayed):
In [12]: df.head(20)
Out[12]:
Patient ID  Covid-19 test outcome       q1       q2       q3       q4       q5       q6       q7       q8       q9      q10      q11      q12      q13
         1                      1  0.86360 -1.00490 -0.32880  0.63232 -0.76382  2.48937  0.30792  0.69260  0.87849  0.39463  1.00259  0.78609 -0.17739
         2                      1  0.16313 -0.24986  0.13275 -1.50085 -2.92733  0.20983  0.66074 -0.65538  0.11255  0.59372  0.88890 -0.50316  0.25568
         3                      1 -0.73682 -0.55274 -1.18046 -0.42813 -1.34706 -0.50470  0.23098  2.24479 -0.68447 -0.19708 -1.26557 -3.01053  0.82140
         4                      0 -1.04279 -0.65561 -0.76028  0.25715 -0.35681 -2.86298 -2.25225 -5.07991  0.63268  1.13516  0.00391  1.42961  0.43997
         5                      1 -0.51371  0.50819 -0.38651 -1.34063  1.36827 -0.89227  1.27378 -0.07773  0.77796 -0.47807 -2.04526  1.64837 -0.67013
         6                      1 -2.55599  2.15763 -0.41735 -0.45903 -0.99303  0.68273  1.86696  0.92724 -0.51942  0.97351  1.01371  0.12639 -0.88108
         7                      1  0.25446 -0.66239  0.14898 -1.14714  0.26427 -0.12783 -0.13116  0.68001 -0.22781  0.89755 -0.50767 -0.22261 -0.43984
         8                      1 -1.01046 -0.75284  0.22125 -0.25886 -1.23720 -1.26904 -0.06989 -0.54133  0.54484  0.41281 -0.17778 -2.23708  0.43346
         9                      1  0.21368 -1.49480  0.80215 -0.55766 -2.11991  0.22287 -2.60513  0.86176  0.86479  0.20923 -0.66948 -0.15163  0.98832
        10                      1  0.47976  0.04171 -2.12566  0.07869  0.83110 -0.12056 -1.66437  0.79751 -0.97663  1.29526 -0.57091 -1.01142 -0.88971
        11                      1  1.00421  0.99791  0.76168  0.40136 -0.52947 -1.03565 -0.96048 -0.63995  2.44512  1.08679  1.41085  2.73563  1.36788
        12                      0  0.52490 -0.47379 -0.65672 -1.35932 -2.25998 -2.31555 -0.89348 -4.19650 -0.84165 -0.69299  0.69274  2.07068  0.22034
        13                      1  0.63667  1.19087  0.05326  1.21838 -0.08718  2.12081  0.13317  1.77220 -0.62710 -1.29448 -0.33494 -0.95674  0.53233
        14                      1 -0.37914 -0.14957  0.76062 -0.83470 -0.77427  0.27242  1.21496  1.95481 -0.16722  0.31711  0.31330 -1.84803  0.40436
        15                      0 -0.89963 -0.38678  0.95247  1.40977  2.22997 -2.98061 -0.18962 -5.50990  0.93988  0.17021  0.26472  2.38602 -0.99076
        16                      1 -0.17194  0.40791 -0.80500  0.17541  0.35286  1.02670 -0.26927  0.56005 -0.07986  0.04604 -0.95871 -0.59690  1.51679
        17                      0 -2.05931 -1.57735  0.46265 -0.18091  2.63300 -2.58241 -0.87385 -4.06331 -0.57685  0.57172  1.38340  1.93061 -1.13699
        18                      1  0.17660 -1.64372  1.26173 -0.01106 -1.44764  1.86687 -0.28044  1.77416 -1.68730 -0.23558 -0.41543 -0.63237 -0.79932
        19                      0 -0.58746 -0.40418  1.12721 -1.26260 -0.27359 -1.22690 -1.83024 -0.53333  0.93635 -0.91030  1.40048  2.55845 -0.12744
        20                      0 -0.00095  0.10829 -0.64400  0.06277 -0.76381 -0.67486  0.11844  2.36569 -0.63694 -0.69341 -1.27323 -0.42776  0.45477
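The cell that builds df from df_X and df_y is not shown. One way to link each patient's answers to the lab outcome is a pandas merge on the shared Patient ID index. The toy frames below reuse the first four patients' q1 and q2 values from the tables above. The validate argument, a standard pandas merge option, raises an error if a Patient ID appears more than once on either side. This is a sketch, not the authors' code.

```python
import pandas as pd

# Toy versions of df_X (answers) and df_y (lab outcomes), indexed by Patient ID
patient_ids = pd.Index([1, 2, 3, 4], name="Patient ID")
df_X = pd.DataFrame(
    {"q1": [0.86360, 0.16313, -0.73682, -1.04279],
     "q2": [-1.00490, -0.24986, -0.55274, -0.65561]},
    index=patient_ids,
)
df_y = pd.DataFrame({"Covid-19 test outcome": [1, 1, 1, 0]}, index=patient_ids)

# Inner merge on the index: keeps only patients with both answers and a test result.
# validate="one_to_one" raises MergeError if a Patient ID is duplicated on either side.
df = df_y.merge(df_X, left_index=True, right_index=True,
                how="inner", validate="one_to_one")

# Patients who answered the questionnaire but have no recorded test result
missing_outcome = df_X.index.difference(df_y.index)

print(df)
print(len(missing_outcome))  # 0 for these toy frames
```

With real data, a non-empty missing_outcome would flag questionnaires that the inner merge silently drops.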
The following figure shows the pairwise scatterplots for each combination of questions and the distribution of values for each question, for a random 10% sample of the patients. Covid-19 positive patients are shown in orange; Covid-19 negative patients are shown in blue.
In [13]:
sns.set(style="ticks", color_codes=True)
df_sample = df.sample(frac=0.1, replace=False, random_state=0)
g = sns.pairplot(df_sample, hue='Covid-19 test outcome')
Phase 1.d. Build a machine learning model on data from step 1.c
As a best practice, we set aside 20% of the data (200 patients) so that model performance can be measured on a subset of data that was not used to fit the model. Stratifying on the test outcome (stratify=y) keeps the share of positive patients the same in both partitions.
In [15]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
In [16]:
print('The training partition has', X_train.shape[0], 'rows (patients) and', X_train.shape[1], 'inputs (answered questions from the questionnaire)')
print('The training partition has', y_train.shape[0], 'class labels (results of the Covid-19 test for each patient)')
print('\n')
print('The validation partition has', X_test.shape[0], 'rows (patients) and', X_test.shape[1], 'inputs (answered questions from the questionnaire)')
print('The validation partition has', y_test.shape[0], 'class labels (results of the Covid-19 test for each patient)')
The training partition has 800 rows (patients) and 23 inputs (answered questions from the questionnaire)
The training partition has 800 class labels (results of the Covid-19 test for each patient)

The validation partition has 200 rows (patients) and 23 inputs (answered questions from the questionnaire)
The validation partition has 200 class labels (results of the Covid-19 test for each patient)
The proportion of the target variable (Covid-19 test outcome) in the training partition is the following:
In [17]: pd.Series(y_train).value_counts(normalize=True)
Out[17]:
1    0.70125
0    0.29875
dtype: float64
The proportion of the target variable (Covid-19 test outcome) in the validation partition is the following:
In [18]: pd.Series(y_test).value_counts(normalize=True)
Out[18]:
1    0.7
0    0.3
dtype: float64
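The proportions in Out[17] and Out[18] follow from stratify=y. The check below uses only the class counts from Out[9] (701 positives, 299 negatives) and all-zero placeholder features, which are an assumption. With stratification, scikit-learn allocates 561 positives and 239 negatives to the 800-row training partition. It allocates 140 positives and 60 negatives to the 200-row validation partition. These are exactly 0.70125 / 0.29875 and 0.7 / 0.3.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels with the class counts reported in Out[9]
y = np.array([1] * 701 + [0] * 299)
# Placeholder features: only the labels matter for the stratified allocation
X = np.zeros((len(y), 23))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# np.bincount returns [count of 0s, count of 1s]
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print(train_counts, train_counts / train_counts.sum())
print(test_counts, test_counts / test_counts.sum())
```

Without stratify=y, the split is purely random, and the positive share in each partition would drift around 70% from one seed to another.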
The details of how the Covid-19 risk model is fitted are not shown. The graphs below show the distribution of the risk score for the training and the validation partitions. The distribution has two distinct spikes: risk scores close to 0 correspond to low-risk people, and risk scores close to 1 correspond to high-risk people.
In [31]:
# sns.distplot(..., kde=False) is deprecated since seaborn 0.11; histplot replaces it
sns.histplot(pd.Series(pred_proba_train))
plt.xlabel('Covid-19 risk score')
plt.ylabel('Count of patients')
plt.title('Covid-19 risk score for the training partition')
plt.show()
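Because the model-fitting cells are not shown, the following is only a stand-in. It assumes a scikit-learn LogisticRegression trained on simulated data, again generated with make_classification as an assumption. It shows where risk scores such as pred_proba_train and pred_proba_test typically come from. Column 1 of predict_proba is the estimated probability of class 1 (Covid-19 positive), and it serves as the risk score. The authors' model may well be different.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Simulated stand-in data (assumption): 1000 patients, 23 answers, ~70% positive
X, y = make_classification(n_samples=1000, n_features=23, n_informative=10,
                           weights=[0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Illustrative model choice; not the authors' (undisclosed) model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# predict_proba columns follow model.classes_, here [0, 1];
# column 1 is P(Covid-19 positive), used as the risk score
pred_proba_train = model.predict_proba(X_train)[:, 1]
pred_proba_test = model.predict_proba(X_test)[:, 1]

print(model.classes_)
print(pred_proba_test[:5].round(3))
```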
In [32]:
sns.histplot(pd.Series(pred_proba_test))
plt.xlabel('Covid-19 risk score')
plt.ylabel('Count of patients')
plt.title('Covid-19 risk score for the validation partition')
plt.show()
The following output shows the model performance on the training partition.
In [36]:
pred_train = adjusted_classes(pred_proba_train, prior_proba)
print('Training partition')
print('\n')
print(pd.DataFrame(confusion_matrix(y_train, pred_train),
                   columns=['Predicted Covid-19 = 0', 'Predicted Covid-19 = 1'],
                   index=['Actual Covid-19 = 0', 'Actual Covid-19 = 1']))
print('\n')
print(classification_report(y_train, pred_train))
Training partition
                     Predicted Covid-19 = 0  Predicted Covid-19 = 1
Actual Covid-19 = 0                     239                       0
Actual Covid-19 = 1                       9                     552
              precision    recall  f1-score   support
           0       0.96      1.00      0.98       239
           1       1.00      0.98      0.99       561
    accuracy                           0.99       800
   macro avg       0.98      0.99      0.99       800
weighted avg       0.99      0.99      0.99       800
The following output shows the model performance on the validation partition.
In [37]:
pred_test = adjusted_classes(pred_proba_test, prior_proba)
print('Validation partition')
print('\n')
print(pd.DataFrame(confusion_matrix(y_test, pred_test),
                   columns=['Predicted Covid-19 = 0', 'Predicted Covid-19 = 1'],
                   index=['Actual Covid-19 = 0', 'Actual Covid-19 = 1']))
print('\n')
print(classification_report(y_test, pred_test))
Validation partition
                     Predicted Covid-19 = 0  Predicted Covid-19 = 1
Actual Covid-19 = 0                      58                       2
Actual Covid-19 = 1                      12                     128
              precision    recall  f1-score   support
           0       0.83      0.97      0.89        60
           1       0.98      0.91      0.95       140
    accuracy                           0.93       200
   macro avg       0.91      0.94      0.92       200
weighted avg       0.94      0.93      0.93       200
The output of phase 1.d is a machine learning model that can be deployed at scale to calculate the risk score of any person on the basis of his or her answers to the questionnaire.
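The cells shown do not define adjusted_classes or prior_proba. The name suggests a function that turns risk scores into 0/1 predictions using a cutoff other than the default 0.5, perhaps derived from the prior share of positives. The version below is a hypothetical reimplementation under that assumption. The second helper recomputes the per-class precision, recall and F1 from the validation confusion matrix above, as a check on the classification report. Rows are actual classes and columns are predicted classes, which is scikit-learn's convention.

```python
import numpy as np

def adjusted_classes(scores, threshold):
    """Hypothetical: predict 1 (Covid-19 positive) when the risk score >= threshold."""
    return (np.asarray(scores) >= threshold).astype(int)

def per_class_metrics(cm):
    """Per-class precision, recall and F1 from a confusion matrix
    with rows = actual class and columns = predicted class."""
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)                 # correctly classified counts per class
    precision = correct / cm.sum(axis=0)  # divide by predicted totals (columns)
    recall = correct / cm.sum(axis=1)     # divide by actual totals (rows)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

preds = adjusted_classes([0.05, 0.69, 0.70, 0.98], threshold=0.7)
print(preds)  # [0 0 1 1]

# Validation confusion matrix from the output above
cm_val = [[58, 2], [12, 128]]
precision, recall, f1 = per_class_metrics(cm_val)
accuracy = np.trace(cm_val) / np.sum(cm_val)
print(precision.round(2), recall.round(2), f1.round(2), round(accuracy, 2))
```

For class 0, for example, precision is 58 / (58 + 12) ≈ 0.83 and recall is 58 / (58 + 2) ≈ 0.97, matching the report.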
Phase 2.a. Use the model from step 1.d to generate a prediction of Covid-19 positiveness of any person
After the model is deployed in production, let's assume that 500 previously unknown patients answer the questionnaire. The data for the first 10 of them are shown below (only questions q1 to q14 are displayed).
In [45]:
pd.set_option('display.precision', 5)
df_X_score.head(10)
Out[45]:
Patient ID       q1       q2       q3       q4       q5       q6       q7       q8       q9      q10      q11      q12      q13      q14
      1001  1.53175 -0.04577  0.24018 -3.01630  3.78771  0.14376  0.10031  1.51729 -1.76818 -1.63343 -0.17644 -0.93879  0.15535  1.46685
      1002 -0.10546  0.27277  0.19986 -2.45632  2.71857 -0.67213  0.65718  1.81098 -1.08473 -2.55134 -0.01416  0.69048 -0.05910  0.62776
      1003 -0.53627  1.91788 -1.27851  0.08680  0.73835  0.83755 -1.04482 -0.79318  0.05228 -0.39619 -0.05698  0.78990  0.75103 -1.12659
      1004  0.22495  1.32366  2.31517  2.10098  0.12442  0.06402  0.13983 -2.27711 -0.34322 -0.45823 -0.86908  1.73886 -1.13832 -1.09103
      1005  1.12422  2.21340  1.03132  2.05760 -3.09501 -3.00650  0.06349 -0.10149  1.79921  1.97837  0.02817 -0.22139 -0.08733 -0.17335
      1006  0.13391 -0.76154  0.83318  1.40468 -1.82245 -0.19431 -0.20189  0.14828 -2.96337 -0.15200 -0.70969  0.09331 -0.62289 -0.52828
      1007  0.55378  1.14685  1.57146 -1.62518  2.78392 -0.22966  1.07731  2.42854 -0.50233  1.09827 -0.25322  0.81109 -1.83957  1.25795
      1008 -1.68730  1.69995 -0.99108  1.42300 -2.63067 -1.44764  0.89630  2.26976 -2.73905  0.17660  0.64604  1.48959 -1.64372 -1.62723
      1009 -1.14398  1.17859  0.54627  0.11912  0.45548 -0.25665 -1.09810 -1.01112  0.41393 -0.73649 -0.62525  0.51227  0.55505 -1.23055
      1010 -1.38965  1.11901 -1.67252  0.45198 -3.02987  0.72636  0.21500 -0.64388 -1.34596 -0.22011  0.13737 -0.67231 -2.50142  0.97951
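The scoring cells are not shown either. The sketch assumes a fitted scikit-learn classifier, here a LogisticRegression trained on simulated data, which is an assumption. Under that assumption, scoring the 500 new patients amounts to calling predict_proba on their answers. The cutoff of 0.5 used to derive pred_score is a placeholder. The notebook instead uses adjusted_classes with prior_proba, whose value is not shown.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Simulated stand-in (assumption): 1000 labelled patients + 500 new ones
X_all, y_all = make_classification(n_samples=1500, n_features=23, n_informative=10,
                                   weights=[0.3], random_state=0)
questions = [f"q{i}" for i in range(1, 24)]

df_X_train = pd.DataFrame(X_all[:1000], columns=questions)
y_train = y_all[:1000]

# New patients get IDs 1001..1500, as in Out[45]
idx_score = pd.Index(range(1001, 1501), name="Patient ID")
df_X_score = pd.DataFrame(X_all[1000:], index=idx_score, columns=questions)

# Fit on a DataFrame so feature names match at scoring time
model = LogisticRegression(max_iter=1000).fit(df_X_train, y_train)

# Risk score = probability of class 1 (Covid-19 positive)
pred_proba_score = model.predict_proba(df_X_score)[:, 1]

threshold = 0.5  # placeholder cutoff, not the notebook's prior_proba
pred_score = (pred_proba_score >= threshold).astype(int)

print(pd.Series(pred_proba_score, index=idx_score).head())
```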
Distribution of the Covid-19 risk score for the 500 previously unknown patients:
In [48]:
sns.histplot(pd.Series(pred_proba_score))
plt.xlabel('Covid-19 risk score')
plt.ylabel('Count of patients')
plt.title('Covid-19 risk score for 500 previously unknown patients')
plt.show()
The table below shows the predicted outcome for the first 10 previously unknown patients, based on their Covid-19 risk score.
In [50]:
pred_score_df = pd.DataFrame(pred_score, index=idx_score, columns=['Predicted Covid-19 test outcome'])
pred_score_df.head(10)
Out[50]:
Patient ID  Predicted Covid-19 test outcome
1001        1
1002        1
1003        1
1004        0
1005        0
1006        1
1007        1
1008        1
1009        0
1010        1
Patient 1003 has a positive predicted Covid-19 test outcome, while patient 1004 has a negative one.
Phase 2.b. Target Covid-19 tests for persons having a high likelihood of positive Covid-19 diagnosis
Get the 20 persons with the highest Covid-19 risk score:
In [51]:
pred_proba_score_df = pd.DataFrame(pred_proba_score, index=idx_score, columns=['Predicted Covid-19 risk score'])
(pd.concat([pred_proba_score_df, pred_score_df], axis=1)
   .sort_values(by='Predicted Covid-19 risk score', ascending=False)
   .head(20)
   .drop(columns='Predicted Covid-19 risk score'))
Out[51]:
Patient ID  Predicted Covid-19 test outcome
1204        1
1057        1
1223        1
1212        1
1374        1
1375        1
1026        1
1386        1
1293        1
1270        1
1407        1
1267        1
1417        1
1272        1
1195        1
1038        1
1322        1
1481        1
1163        1
1225        1
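Choosing whom to test under a fixed testing capacity is a top-k query on the risk score. Below is a minimal sketch using pandas Series.nlargest, which gives the same ranking as the sort-then-head chain in In [51]. The risk scores and the capacity of 3 are hypothetical values chosen for illustration.

```python
import pandas as pd

def select_for_testing(risk_scores: pd.Series, capacity: int) -> list:
    """Return the Patient IDs of the `capacity` persons with the highest risk score."""
    # nlargest keeps the first occurrence on ties (keep="first" by default)
    return risk_scores.nlargest(capacity).index.tolist()

# Hypothetical risk scores for five new patients
scores = pd.Series(
    [0.91, 0.78, 0.99, 0.08, 0.21],
    index=pd.Index([1001, 1002, 1003, 1004, 1005], name="Patient ID"),
    name="Predicted Covid-19 risk score",
)

print(select_for_testing(scores, capacity=3))  # [1003, 1001, 1002]
```

If the capacity exceeds the number of candidates, nlargest simply returns all of them, ranked.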
The proposed solution can register and report information such as:
Daily count of participants taking the questionnaire
Average daily Covid-19 risk score
Average Covid-19 risk score by age group
Average Covid-19 risk score by geographical region
In [53]:
plt.figure(figsize=(15, 10))
plt.title("Daily count of participants taking the questionnaire", fontsize=16)
plt.plot(daily_volume.index, daily_volume['Volume'], color="b", linestyle="-")
plt.ylabel("Volume", fontsize=14)
plt.xlabel("Date", fontsize=14)
plt.ylim(0, 1000)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.grid(True)
plt.show()
[Figure: Daily count of participants taking the questionnaire]
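The cells that build daily_volume and daily_scores are not shown. The sketch assumes a log with one row per submitted questionnaire, holding the date, risk score and predicted outcome. The column names and values below are hypothetical. Under that assumption, both tables reduce to a pandas groupby on the date.

```python
import pandas as pd

# Hypothetical submission log: one row per completed questionnaire
submissions = pd.DataFrame({
    "Date": pd.to_datetime(["2020-05-01", "2020-05-01",
                            "2020-05-02", "2020-05-02", "2020-05-02"]),
    "Risk score": [0.9, 0.2, 0.8, 0.7, 0.1],
    "Predicted outcome": [1, 0, 1, 1, 0],
})

# Daily count of participants, in a 'Volume' column as plotted in In [53]
daily_volume = submissions.groupby("Date").size().to_frame("Volume")

# Daily mean risk score among predicted-positive participants, as plotted in In [55]
positives = submissions[submissions["Predicted outcome"] == 1]
daily_scores = (positives.groupby("Date")["Risk score"].mean()
                .to_frame("Average Covid-19 risk score"))

print(daily_volume)
print(daily_scores)
```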
In [55]:
plt.figure(figsize=(15, 10))
plt.title("Daily average Covid-19 risk score for predicted positive patients", fontsize=16)
plt.plot(daily_scores.index, daily_scores['Average Covid-19 risk score'], color="b", linestyle="-")
plt.ylabel("Average Covid-19 risk score", fontsize=14)
plt.xlabel("Date", fontsize=14)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.ylim(0, 1)
plt.grid(True)
plt.show()
[Figure: Daily average Covid-19 risk score for predicted positive patients]
In [57]:
plt.figure(figsize=(15, 10))
plt.title("Weekly average Covid-19 risk score by age group", fontsize=16)
# One line per age group
for group, color in [('18-39', 'b'), ('40-59', 'r'), ('60+', 'g')]:
    plt.plot(weekly_avg_score_age.index, weekly_avg_score_age[group], color=color, linestyle="-", label=group)
plt.ylabel("Average Covid-19 risk score", fontsize=14)
plt.xlabel("Week", fontsize=14)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.grid(True)
plt.legend(loc='best', fontsize=14)
plt.show()
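The cells shown do not construct weekly_avg_score_age. The sketch assumes each record carries an age and a risk score, using hypothetical data. pd.cut bins ages into the three groups used in In [57], and a weekly pd.Grouper followed by unstack yields one column per age group. Participants under 18 fall outside the bins and get NaN. The notebook does not say how such participants are handled, so this behaviour is also an assumption.

```python
import pandas as pd

# Hypothetical participant records
records = pd.DataFrame({
    "Date": pd.to_datetime(["2020-05-04", "2020-05-05", "2020-05-06",
                            "2020-05-11", "2020-05-12", "2020-05-13"]),
    "Age": [25, 47, 71, 33, 58, 80],
    "Risk score": [0.2, 0.5, 0.9, 0.3, 0.6, 0.8],
})

# right=False gives the bins [18, 40), [40, 60) and [60, inf)
records["Age group"] = pd.cut(
    records["Age"], bins=[18, 40, 60, float("inf")],
    labels=["18-39", "40-59", "60+"], right=False,
)

# Weekly bins (weeks ending on Sunday), one column per age group
weekly_avg_score_age = (
    records.groupby([pd.Grouper(key="Date", freq="W"), "Age group"], observed=False)
    ["Risk score"].mean()
    .unstack("Age group")
)

print(weekly_avg_score_age)
```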