BOOST MODEL ACCURACY OF IMBALANCED COVID-19 MORTALITY PREDICTION USING
GAN-BASED OVERSAMPLING TECHNIQUE
CONTENTS
• Abstract
• Introduction
• GAN
• Data Preprocessing
• Data Analysis
• Evaluation Metrics
• Model Comparison
• Conclusion
• References
ABSTRACT
The model uses COVID-19 patients' geographical, travel, health, and
demographic data to predict the severity of a case and its possible
outcome: recovery or death. The data analysis reveals a positive correlation
between patients' gender and deaths, and also indicates that the majority of
patients are aged between 20 and 70 years. This paper proposes a fine-tuned
Random Forest model boosted by the AdaBoost algorithm.
INTRODUCTION
• We study solved cases and data from forums and published research to
understand their methodology, and try to improve accuracy or reduce the
error with additional steps.
• Conventional methods such as Random Oversampling (ROS) and the Synthetic
Minority Oversampling Technique (SMOTE), among others, can be applied.
• The data used in the study comprise 222 patient records with 13
features.
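As a point of reference for the conventional methods named above, Random Oversampling can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `random_oversample` and the toy rows are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows of this class that can be copied.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 6 recovered (0) and 2 deceased (1) patients.
rows = [[25], [40], [33], [51], [62], [47], [78], [81]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))
```

Because ROS only repeats existing minority rows, a classifier can memorize them, which is the overfitting limitation that GAN-based oversampling aims to avoid.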
GENERATIVE ADVERSARIAL NETWORKS (GAN)
• Generative adversarial networks are based on a game-theoretic scenario in
which the generator network must compete against an adversary.
• Because a GAN learns to mimic the distribution of the data, it has been
applied in fields such as music, video, and natural language, and more
recently to imbalanced data problems.
GENERATIVE ADVERSARIAL NETWORKS (GAN)
• Oversampling based on Generative Adversarial Networks (GAN) overcomes
limitations of conventional methods, such as overfitting, and allows the
development of a highly accurate prediction model for imbalanced data.
FIG 1: GAN BASED OVERSAMPLING
https://cdn.Analytics.Com/wp-content/uploads/2020/10/image2-2.Png
HOW GAN GENERATE SYNTHETIC DATA?
• Two neural networks compete against each other to learn the target distribution
and generate artificial data.
• A generator network G: generates samples that resemble the training samples
to fool the discriminator.
• A discriminator network D: discriminates between training samples and
generated samples.
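The competition between G and D is commonly written as the minimax objective of the original GAN formulation. The slides do not state which loss the authors used, so this is the standard form rather than their exact objective; here z is latent noise drawn from a prior p_z and p_data is the distribution of real samples.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

D is trained to push V up by scoring real samples near 1 and generated samples near 0, while G is trained to push V down by producing samples that D scores near 1.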
DATASET
Column | Description | Values (for categorical variables) | Type
id | Patient Id | NA | Numeric
location | The location where the patient belongs to | Multiple cities located throughout the world | String, Categorical
country | Patient's native country | Multiple countries | String, Categorical
gender | Patient's gender | Male, Female | String, Categorical
age | Patient's age | NA | Numeric
sym_on | The date patient started noticing the symptoms | NA | Date
DATA PRE-PROCESSING
• The dataset contains columns of Date, String, and Numeric types, including
categorical variables.
• Since the ML model requires all input data to be in numeric form, we
performed label encoding of the categorical variables.
• This assigns a number to every unique categorical value in the column.
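The label-encoding step described above can be sketched as follows. This is a minimal pure-Python illustration under one assumption: codes are assigned in sorted order of the unique values, which the slides do not specify. The helper name `label_encode` and the sample values are hypothetical.

```python
def label_encode(values):
    """Map each unique categorical value to an integer code.

    Codes follow the sorted order of the unique values (an assumption;
    the slides do not say how codes are ordered).
    """
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


gender = ["Male", "Female", "Female", "Male"]  # hypothetical rows
codes, mapping = label_encode(gender)
print(codes)    # [1, 0, 0, 1]
print(mapping)  # {'Female': 0, 'Male': 1}
```

The integer codes carry no real ordering, so a model that treats them as magnitudes may read meaning into them; tree-based models such as Random Forest are usually less sensitive to this.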
DEFINING GENERATOR
• The generator takes input from latent space and generates new synthetic samples.
Using the leaky rectified linear unit (LeakyReLU) activation in both the
generator and the discriminator is good practice, because it keeps a small
non-zero output for negative values.
• It is used with the recommended slope of 0.2 and the matching weight
initializer "He uniform".
• In the output layer, the SoftMax activation function is used for categorical
variables and sigmoid is used for continuous variables.
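To make the two generator ingredients above concrete, here is a small pure-Python sketch of LeakyReLU with slope 0.2 and He uniform initialization for one dense layer. It is an illustration only, not the authors' network: the layer sizes, the latent vector, and the helper names are hypothetical, and biases are omitted for brevity.

```python
import math
import random


def leaky_relu(x, alpha=0.2):
    """Pass positive values through; scale negative values by alpha."""
    return x if x > 0 else alpha * x


def he_uniform(fan_in, fan_out, seed=0):
    """He uniform init: weights drawn from U(-limit, limit),
    with limit = sqrt(6 / fan_in)."""
    rng = random.Random(seed)
    limit = math.sqrt(6.0 / fan_in)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


def dense_leaky(inputs, weights, alpha=0.2):
    """One dense layer (no bias) followed by LeakyReLU."""
    fan_out = len(weights[0])
    return [
        leaky_relu(sum(x * row[j] for x, row in zip(inputs, weights)), alpha)
        for j in range(fan_out)
    ]


latent = [0.5, -1.0, 0.25, 2.0]        # hypothetical latent vector
W = he_uniform(fan_in=4, fan_out=3)    # hypothetical layer sizes
hidden = dense_leaky(latent, W)
print(leaky_relu(-1.0), leaky_relu(3.0))  # -0.2 3.0
```

Unlike plain ReLU, the negative branch never has zero gradient, which helps keep gradients flowing to the generator during adversarial training.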
DEFINING DISCRIMINATOR
• The discriminator model will take a sample from our data, such as a vector, and
output a classification prediction as to whether the sample is real or fake.
• This is a binary classification problem, so sigmoid activation is used in the
output layer and binary cross-entropy loss function is used in model
compilation.
• The Adam optimization algorithm with the learning rate LR of 0.0002 and the
recommended beta1 momentum value of 0.5 is used.
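The discriminator's output activation and loss can be illustrated in pure Python. This sketch shows only sigmoid and binary cross-entropy; it does not implement the Adam optimizer, and the logits below are hypothetical values rather than outputs of the authors' model.

```python
import math


def sigmoid(z):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


logits = [2.0, -1.5, 0.3, -3.0]   # hypothetical pre-activation scores
y_true = [1, 0, 1, 0]             # 1 = real sample, 0 = generated sample
probs = [sigmoid(z) for z in logits]
loss = binary_cross_entropy(y_true, probs)
print(round(loss, 4))
```

A confident correct prediction contributes a loss near 0, while an uninformative prediction of 0.5 contributes log 2, about 0.693.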
DATA ANALYSIS
• Fever, cough, cold, fatigue, body pain, and malaise were the most common
symptoms noticed in patients whose data is available in this dataset.
• Correlation between features of the dataset provides crucial information
about the features and the degree of influence they have over the target value.
FIG 2: SYMPTOMS IN PATIENTS
https://www.Ncbi.v/pmc/articles/PMC7350612/figure/F1/
EVALUATION METRICS
• The model is evaluated using the following metrics.
• ACCURACY: Given a dataset consisting of (TP + TN + FP + FN) data points, the
accuracy is equal to the ratio of total correct predictions (TP + TN) by the
classifier to the total data points. Accuracy is an important measure used to
assess the performance of the classification model.
• $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$, with $0.0 \le \text{Accuracy} \le 1.0$
PRECISION
• Precision is equal to the ratio of the True Positive (TP) samples to the sum of
True Positive (TP) and False Positive (FP) samples.
• Precision is also a key metric to identify the number of correctly classified
patients in an imbalanced class dataset.
• $\text{Precision} = \frac{TP}{TP + FP}$
RECALL
• Recall is equal to the ratio of the True Positive (TP) samples to the sum of True
Positive (TP) and False Negative (FN) samples.
• Recall is a significant metric to identify the number of correctly classified patients
in an imbalanced class dataset out of all the patients that could have been
correctly predicted.
• $\text{Recall} = \frac{TP}{TP + FN}$
F1 SCORE
• F1 Score is equal to the harmonic mean of the Recall and Precision values.
• The F1 Score balances Precision and Recall, which gives a fairer evaluation
of the model's performance in classifying COVID-19 patients on imbalanced data.
• This is the most significant measure that we will be using to evaluate the model.
• $\text{F1 Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
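The four metrics defined in this section can be computed directly from confusion-matrix counts. The sketch below uses hypothetical counts chosen only for illustration; they are not taken from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.
    Ratios with a zero denominator are reported as 0.0."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for an imbalanced test set of 100 patients.
m = classification_metrics(tp=8, tn=85, fp=2, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts accuracy is 0.93 while recall is only about 0.615, which shows why accuracy alone can be misleading when the positive class is rare.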
MODEL COMPARISON
• The model performance is tested on the actual (original) split test data.
• After splitting the original data into train and test, generated data from
the GAN is added to the train data to compare the performance with the base
model.
FIG 3: COMPARISON OF VARIOUS MODELS
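The evaluation protocol described on this slide (split first, then add synthetic rows to the training portion only) can be sketched as follows. The function name, the 20% test fraction, and the toy rows are hypothetical; the synthetic rows stand in for GAN output.

```python
import random


def split_then_augment(rows, labels, synthetic_rows, synthetic_labels,
                       test_fraction=0.2, seed=0):
    """Split the original data, then append synthetic rows to the
    training portion only, so the test set stays purely original."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    X_test = [rows[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    X_train = [rows[i] for i in train_idx] + list(synthetic_rows)
    y_train = [labels[i] for i in train_idx] + list(synthetic_labels)
    return X_train, y_train, X_test, y_test


# Hypothetical original data (10 rows) and 3 synthetic minority rows.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
synthetic_rows = [[100], [101], [102]]
synthetic_labels = [1, 1, 1]

X_train, y_train, X_test, y_test = split_then_augment(
    rows, labels, synthetic_rows, synthetic_labels)
print(len(X_train), len(X_test))  # 11 2
```

Keeping generated samples out of the test set matters: scoring on synthetic rows would measure how well the classifier fits the GAN rather than how well it predicts real patients.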
MODEL COMPARISON
Metric | Score of Base Model* | Score with Augmented Generated Data
Recall Score | 0.75 | 0.83
Precision Score | 1 | 1
F1 Score | 0.86 | 0.9
Accuracy | 0.9 | 0.95
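As a quick consistency check, the reported F1 scores can be recomputed from the reported precision and recall using the harmonic-mean definition. The inputs are the rounded table values, so the results are approximate.

```latex
F1_{\text{base}} = \frac{2 \times 1 \times 0.75}{1 + 0.75} = \frac{1.5}{1.75} \approx 0.857
\qquad
F1_{\text{augmented}} = \frac{2 \times 1 \times 0.83}{1 + 0.83} = \frac{1.66}{1.83} \approx 0.907
```

Both values agree with the table (0.86 and 0.9) at the precision the table reports.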
CONCLUSION
The proposed model provides a more accurate and robust result than the base
model, showing that GAN-based oversampling overcomes the limitations of the
imbalanced data and appropriately inflates the minority class.
REFERENCES
[1] WHO Situation Report-94 Coronavirus disease 2019 (COVID-19) (2020).
[2] Sujatha R, Chatterjee JM, Hassanien AE. (2020).
[3] Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017).
[4] Kathiresan S, Sait ARW, Gupta D, Lakshmanaprabu SK, Pandey HM (2020).]
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition.
[5] Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely
connected convolutional networks.
DATA AVAILABILITY STATEMENT
• Novel Corona Virus 2019 Dataset (accessed April 23, 2020).
• Bayes C, Valdivieso L. Modelling death rates due to COVID-19: a Bayesian
approach. arXiv (accessed May 5, 2020).
• The datasets presented in the study can be found in online repositories.
• GitHub repository link:
https://github.com/bindhu520/Boost-Model-Accuracy-of-Imbalanced-COVID-19-Mortality-Prediction-Using-GAN-based-Oversampling-Techni
THANK YOU

Mais conteúdo relacionado

Mais procurados

A Review on Feature Selection Methods For Classification Tasks
A Review on Feature Selection Methods For Classification TasksA Review on Feature Selection Methods For Classification Tasks
A Review on Feature Selection Methods For Classification TasksEditor IJCATR
 
Evaluation measures for models assessment over imbalanced data sets
Evaluation measures for models assessment over imbalanced data setsEvaluation measures for models assessment over imbalanced data sets
Evaluation measures for models assessment over imbalanced data setsAlexander Decker
 
Metasem: An R Package For Meta-Analysis Using Structural Equation Modelling: ...
Metasem: An R Package For Meta-Analysis Using Structural Equation Modelling: ...Metasem: An R Package For Meta-Analysis Using Structural Equation Modelling: ...
Metasem: An R Package For Meta-Analysis Using Structural Equation Modelling: ...Pubrica
 
Errors in chemical analyses
Errors in chemical analysesErrors in chemical analyses
Errors in chemical analysesGrace de Jesus
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysisWansuklangk
 
Statistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, DescribingStatistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, DescribingGalit Shmueli
 
JSM2013,Proceedings,paper307699_79238,DSweitzer
JSM2013,Proceedings,paper307699_79238,DSweitzerJSM2013,Proceedings,paper307699_79238,DSweitzer
JSM2013,Proceedings,paper307699_79238,DSweitzerDennis Sweitzer
 
A Threshold fuzzy entropy based feature selection method applied in various b...
A Threshold fuzzy entropy based feature selection method applied in various b...A Threshold fuzzy entropy based feature selection method applied in various b...
A Threshold fuzzy entropy based feature selection method applied in various b...IJMER
 
Sensitivity analysis
Sensitivity analysisSensitivity analysis
Sensitivity analysisSasquatch S
 
Statistical Modeling: The Two Cultures
Statistical Modeling: The Two CulturesStatistical Modeling: The Two Cultures
Statistical Modeling: The Two CulturesChristoph Molnar
 
Specification based or black box techniques
Specification based or black box techniques Specification based or black box techniques
Specification based or black box techniques Muhammad Ibnu Wardana
 
Specification based or black box techniques (andika m)
Specification based or black box techniques (andika m)Specification based or black box techniques (andika m)
Specification based or black box techniques (andika m)Andika Mardanu
 

Mais procurados (17)

Comparison and evaluation of alternative designs
Comparison and evaluation of alternative designsComparison and evaluation of alternative designs
Comparison and evaluation of alternative designs
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Pca analysis
Pca analysisPca analysis
Pca analysis
 
Dt33726730
Dt33726730Dt33726730
Dt33726730
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysis
 
A Review on Feature Selection Methods For Classification Tasks
A Review on Feature Selection Methods For Classification TasksA Review on Feature Selection Methods For Classification Tasks
A Review on Feature Selection Methods For Classification Tasks
 
Evaluation measures for models assessment over imbalanced data sets
Evaluation measures for models assessment over imbalanced data setsEvaluation measures for models assessment over imbalanced data sets
Evaluation measures for models assessment over imbalanced data sets
 
Metasem: An R Package For Meta-Analysis Using Structural Equation Modelling: ...
Metasem: An R Package For Meta-Analysis Using Structural Equation Modelling: ...Metasem: An R Package For Meta-Analysis Using Structural Equation Modelling: ...
Metasem: An R Package For Meta-Analysis Using Structural Equation Modelling: ...
 
Errors in chemical analyses
Errors in chemical analysesErrors in chemical analyses
Errors in chemical analyses
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysis
 
Statistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, DescribingStatistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, Describing
 
JSM2013,Proceedings,paper307699_79238,DSweitzer
JSM2013,Proceedings,paper307699_79238,DSweitzerJSM2013,Proceedings,paper307699_79238,DSweitzer
JSM2013,Proceedings,paper307699_79238,DSweitzer
 
A Threshold fuzzy entropy based feature selection method applied in various b...
A Threshold fuzzy entropy based feature selection method applied in various b...A Threshold fuzzy entropy based feature selection method applied in various b...
A Threshold fuzzy entropy based feature selection method applied in various b...
 
Sensitivity analysis
Sensitivity analysisSensitivity analysis
Sensitivity analysis
 
Statistical Modeling: The Two Cultures
Statistical Modeling: The Two CulturesStatistical Modeling: The Two Cultures
Statistical Modeling: The Two Cultures
 
Specification based or black box techniques
Specification based or black box techniques Specification based or black box techniques
Specification based or black box techniques
 
Specification based or black box techniques (andika m)
Specification based or black box techniques (andika m)Specification based or black box techniques (andika m)
Specification based or black box techniques (andika m)
 

Semelhante a Boost model accuracy of imbalanced covid 19 mortality prediction

CSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning ProjectCSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning Projectbutest
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationBoston Institute of Analytics
 
Predicting Life Expectancy of Hepatitis B Patients
Predicting Life Expectancy of Hepatitis B PatientsPredicting Life Expectancy of Hepatitis B Patients
Predicting Life Expectancy of Hepatitis B Patientsnabeelali11101999
 
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...cscpconf
 
Analysis of Surveillance Data
Analysis of Surveillance DataAnalysis of Surveillance Data
Analysis of Surveillance DataPerez Eric
 
Detecting Dif Between Conventional And Computerized Adaptive Testing.Ppt
Detecting Dif Between Conventional And Computerized Adaptive Testing.PptDetecting Dif Between Conventional And Computerized Adaptive Testing.Ppt
Detecting Dif Between Conventional And Computerized Adaptive Testing.Pptbarthriley
 
Bayesian Estimation of Reproductive Number for Tuberculosis in India
Bayesian Estimation of Reproductive Number for Tuberculosis in IndiaBayesian Estimation of Reproductive Number for Tuberculosis in India
Bayesian Estimation of Reproductive Number for Tuberculosis in Indiaarjun_bhardwaj
 
KG_based pharma marketing.pptx
KG_based pharma marketing.pptxKG_based pharma marketing.pptx
KG_based pharma marketing.pptxSridhar Nomula
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxrajalakshmi5921
 
Performance of the classification algorithm
Performance of the classification algorithmPerformance of the classification algorithm
Performance of the classification algorithmHoopeer Hoopeer
 
Churn Modeling-For-Mobile-Telecommunications
Churn Modeling-For-Mobile-Telecommunications Churn Modeling-For-Mobile-Telecommunications
Churn Modeling-For-Mobile-Telecommunications Salford Systems
 
2007 Pharmasug, Promotion Response Analysis
2007 Pharmasug, Promotion Response Analysis2007 Pharmasug, Promotion Response Analysis
2007 Pharmasug, Promotion Response AnalysisAlejandro Jaramillo
 
IRJET- Heart Disease Prediction System
IRJET- Heart Disease Prediction SystemIRJET- Heart Disease Prediction System
IRJET- Heart Disease Prediction SystemIRJET Journal
 
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...Seval Çapraz
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...ThinkInnovation
 
2014 IIAG Imputation Assessments
2014 IIAG Imputation Assessments2014 IIAG Imputation Assessments
2014 IIAG Imputation AssessmentsDr Lendy Spires
 
Multivariate sample similarity measure for feature selection with a resemblan...
Multivariate sample similarity measure for feature selection with a resemblan...Multivariate sample similarity measure for feature selection with a resemblan...
Multivariate sample similarity measure for feature selection with a resemblan...IJECEIAES
 
IRJET- Detection of Chronic Kidney Disease using Machine Learning in the R-En...
IRJET- Detection of Chronic Kidney Disease using Machine Learning in the R-En...IRJET- Detection of Chronic Kidney Disease using Machine Learning in the R-En...
IRJET- Detection of Chronic Kidney Disease using Machine Learning in the R-En...IRJET Journal
 

Semelhante a Boost model accuracy of imbalanced covid 19 mortality prediction (20)

CSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning ProjectCSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning Project
 
cadd.pptx
cadd.pptxcadd.pptx
cadd.pptx
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health Classification
 
Predicting Life Expectancy of Hepatitis B Patients
Predicting Life Expectancy of Hepatitis B PatientsPredicting Life Expectancy of Hepatitis B Patients
Predicting Life Expectancy of Hepatitis B Patients
 
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
 
Analysis of Surveillance Data
Analysis of Surveillance DataAnalysis of Surveillance Data
Analysis of Surveillance Data
 
Detecting Dif Between Conventional And Computerized Adaptive Testing.Ppt
Detecting Dif Between Conventional And Computerized Adaptive Testing.PptDetecting Dif Between Conventional And Computerized Adaptive Testing.Ppt
Detecting Dif Between Conventional And Computerized Adaptive Testing.Ppt
 
Bayesian Estimation of Reproductive Number for Tuberculosis in India
Bayesian Estimation of Reproductive Number for Tuberculosis in IndiaBayesian Estimation of Reproductive Number for Tuberculosis in India
Bayesian Estimation of Reproductive Number for Tuberculosis in India
 
KG_based pharma marketing.pptx
KG_based pharma marketing.pptxKG_based pharma marketing.pptx
KG_based pharma marketing.pptx
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptx
 
Performance of the classification algorithm
Performance of the classification algorithmPerformance of the classification algorithm
Performance of the classification algorithm
 
Churn Modeling-For-Mobile-Telecommunications
Churn Modeling-For-Mobile-Telecommunications Churn Modeling-For-Mobile-Telecommunications
Churn Modeling-For-Mobile-Telecommunications
 
JEDM_RR_JF_Final
JEDM_RR_JF_FinalJEDM_RR_JF_Final
JEDM_RR_JF_Final
 
2007 Pharmasug, Promotion Response Analysis
2007 Pharmasug, Promotion Response Analysis2007 Pharmasug, Promotion Response Analysis
2007 Pharmasug, Promotion Response Analysis
 
IRJET- Heart Disease Prediction System
IRJET- Heart Disease Prediction SystemIRJET- Heart Disease Prediction System
IRJET- Heart Disease Prediction System
 
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
 
2014 IIAG Imputation Assessments
2014 IIAG Imputation Assessments2014 IIAG Imputation Assessments
2014 IIAG Imputation Assessments
 
Multivariate sample similarity measure for feature selection with a resemblan...
Multivariate sample similarity measure for feature selection with a resemblan...Multivariate sample similarity measure for feature selection with a resemblan...
Multivariate sample similarity measure for feature selection with a resemblan...
 
IRJET- Detection of Chronic Kidney Disease using Machine Learning in the R-En...
IRJET- Detection of Chronic Kidney Disease using Machine Learning in the R-En...IRJET- Detection of Chronic Kidney Disease using Machine Learning in the R-En...
IRJET- Detection of Chronic Kidney Disease using Machine Learning in the R-En...
 

Mais de BindhuBhargaviTalasi (20)

Inheritance
InheritanceInheritance
Inheritance
 
Blood relations
Blood relationsBlood relations
Blood relations
 
Battery
BatteryBattery
Battery
 
Batteries
BatteriesBatteries
Batteries
 
Water
WaterWater
Water
 
Stories
StoriesStories
Stories
 
Predicates
PredicatesPredicates
Predicates
 
Mathematical foundations of computer science
Mathematical foundations of computer scienceMathematical foundations of computer science
Mathematical foundations of computer science
 
Jdbc
JdbcJdbc
Jdbc
 
Blue jacking
Blue jackingBlue jacking
Blue jacking
 
Mathematical foundations of computer science
Mathematical foundations of computer scienceMathematical foundations of computer science
Mathematical foundations of computer science
 
Algebraic structures
Algebraic structuresAlgebraic structures
Algebraic structures
 
Bike sharing prediction
Bike sharing predictionBike sharing prediction
Bike sharing prediction
 
Travel agency
Travel agencyTravel agency
Travel agency
 
Functions
FunctionsFunctions
Functions
 
Introduction to set theory
Introduction to set theoryIntroduction to set theory
Introduction to set theory
 
Library system
Library systemLibrary system
Library system
 
Data analytics
Data analyticsData analytics
Data analytics
 
Agristore
AgristoreAgristore
Agristore
 
Collection framework
Collection frameworkCollection framework
Collection framework
 

Último

CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 

Último (20)

CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...

Boost model accuracy of imbalanced covid 19 mortality prediction

  • 1. BOOST MODEL ACCURACY OF IMBALANCED COVID-19 MORTALITY PREDICTION USING GAN-BASED OVERSAMPLING TECHNIQUE.
  • 2. CONTENTS • Abstract • Introduction • GAN • Data Preprocessing • Data Analysis • Evaluation Metrics • Model Comparison • Conclusion • References
  • 3. ABSTRACT The model uses the COVID-19 patient's geographical, travel, health, and demographic data to predict the severity of the case and the possible outcome, recovery, or death. The data analysis reveals a positive correlation between patients' gender and deaths, and also indicates that the majority of patients are aged between 20 and 70 years. This paper proposes a fine-tuned Random Forest model boosted by the AdaBoost algorithm.
  • 4. INTRODUCTION • We study solved cases and data from forums and published research to understand their methodology, and try to improve accuracy or reduce the error with additional steps. • Conventional methods such as Random Oversampling (ROS), Synthetic Minority Oversampling Technique (SMOTE), and others can be applied. • The models in these studies were trained on 222 patient records with 13 features.
  • 5. GENERATIVE ADVERSARIAL NETWORKS (GAN) • Generative adversarial networks are based on a game-theoretic scenario in which the generator network must compete against an adversary. • Because a GAN learns to mimic the distribution of the data, it is applied in various fields such as music, video, and natural language, and more recently to imbalanced data problems.
  • 6. GENERATIVE ADVERSARIAL NETWORKS (GAN) • Oversampling based on Generative Adversarial Networks (GAN) overcomes the limitations of conventional methods, such as overfitting, and allows the development of a highly accurate prediction model for imbalanced data. FIG 1: GAN BASED OVERSAMPLING https://cdn.Analytics.Com/wp-content/uploads/2020/10/image2-2.Png
  • 7. HOW DOES A GAN GENERATE SYNTHETIC DATA? • Two neural networks compete against each other to learn the target distribution and generate artificial data. • A generator network G: produces samples that mimic the training samples to fool the discriminator. • A discriminator network D: discriminates between training samples and generated samples.
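The competition between the two networks is commonly written as a minimax objective. The slides do not state it, so the notation below (G, D, the latent vector z, and the distributions p_data and p_z) follows the standard GAN formulation rather than anything specific to the authors' implementation.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator D tries to push V up by scoring real samples near 1 and generated samples near 0, while the generator G tries to push V down by producing samples that D scores as real.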
  • 8. DATASET Each entry lists the column, its description, its values (for categorical variables), and its type.
      • id: Patient Id; values NA; Numeric
      • location: The location where the patient belongs to; multiple cities located throughout the world; String, Categorical
      • country: Patient’s native country; multiple countries; String, Categorical
      • gender: Patient’s gender; Male, Female; String, Categorical
      • age: Patient’s age; values NA; Numeric
      • sym_on: The date the patient started noticing the symptoms; values NA; Date
  • 9. DATA PRE-PROCESSING • The dataset consists of columns of Date, String, and Numeric types, some of which are categorical variables. • Since the ML model requires all input data to be in numeric form, we performed label encoding of the categorical variables. • This assigns a number to every unique categorical value in the column.
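A minimal sketch of label encoding, written in plain Python so it runs without extra libraries. The function name `label_encode` and the sample `gender` values are illustrative, and assigning codes in sorted order is an assumption of this sketch (it mirrors scikit-learn's `LabelEncoder`), not something the slides specify.

```python
def label_encode(values):
    """Map each unique categorical value to an integer code.

    Codes follow the sorted order of the unique values. That ordering
    is an assumption of this sketch, not a detail from the slides.
    """
    mapping = {value: code for code, value in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


# Hypothetical sample of the "gender" column.
gender = ["Male", "Female", "Female", "Male"]
codes, mapping = label_encode(gender)
print(codes)    # [1, 0, 0, 1]
print(mapping)  # {'Female': 0, 'Male': 1}
```

The returned mapping should be kept so that the same codes can be applied to the test split and to any generated samples.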
  • 10. DEFINING GENERATOR • The generator takes input from the latent space and generates new synthetic samples. Using the leaky rectified linear activation unit (LeakyReLU) in both the generator and the discriminator model is good practice, because it passes a small, non-zero signal for negative values. • It is used with the default recommended slope of 0.2 and the appropriate weight initializer “he uniform”. • In the output layer, the softmax activation function is used for categorical variables and sigmoid is used for continuous variables.
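To make the activation choices concrete, here is a small plain-Python sketch of the three functions named on this slide. It only illustrates the maths; it is not the authors' network code, and the function names are chosen for this example.

```python
import math


def leaky_relu(x, alpha=0.2):
    """LeakyReLU: pass positive values through, scale negatives by alpha."""
    return x if x > 0 else alpha * x


def sigmoid(x):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(leaky_relu(3.0), leaky_relu(-1.0))  # 3.0 -0.2
print(sigmoid(0.0))                       # 0.5
print(softmax([0.0, 0.0]))                # [0.5, 0.5]
```

With a slope of 0.2 a negative pre-activation still contributes a gradient, which is the usual reason LeakyReLU is preferred over plain ReLU in GAN layers.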
  • 11. DEFINING DISCRIMINATOR • The discriminator model takes a sample from our data, such as a vector, and outputs a classification prediction as to whether the sample is real or fake. • This is a binary classification problem, so sigmoid activation is used in the output layer and the binary cross-entropy loss function is used in model compilation. • The Adam optimization algorithm is used with a learning rate (LR) of 0.0002 and the recommended beta1 momentum value of 0.5.
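The binary cross-entropy loss mentioned above can be sketched in a few lines of plain Python. The labels and predicted probabilities are made up for illustration (1 = real, 0 = fake), and the clipping constant `eps` is an assumption added to avoid taking log(0).

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 1, 0, 0]                 # hypothetical: two real, two fake
confident = [0.9, 0.8, 0.1, 0.2]      # discriminator mostly right
unsure = [0.5, 0.5, 0.5, 0.5]         # discriminator cannot tell

print(binary_cross_entropy(labels, confident))
print(binary_cross_entropy(labels, unsure))
```

A discriminator that outputs 0.5 for every sample has a loss of ln 2 (about 0.693), which is the value the loss tends toward when the generator's samples become indistinguishable from real data.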
  • 12. DATA ANALYSIS • Fever, cough, cold, fatigue, body pain, and malaise were the most common symptoms noticed in the patients whose data is available in this dataset. • Correlation between features of the dataset provides crucial information about the features and the degree of influence they have over the target value. FIG 2: SYMPTOMS IN PATIENTS https://www.Ncbi.v/pmc/articles/PMC7350612/figure/F1/
  • 13. EVALUATION METRICS • The model is evaluated using the following metrics. • ACCURACY: Given a dataset consisting of (TP + TN + FP + FN) data points, the accuracy is equal to the ratio of total correct predictions (TP + TN) by the classifier to the total data points. Accuracy is an important measure used to assess the performance of the classification model. • Accuracy = (TP + TN) / (TP + TN + FP + FN), with 0.0 ≤ Accuracy ≤ 1.0
  • 14. PRECISION • Precision is equal to the ratio of the True Positive (TP) samples to the sum of True Positive (TP) and False Positive (FP) samples. • Precision is also a key metric to identify the number of correctly classified patients in an imbalanced class dataset. • Precision = TP / (TP + FP)
  • 15. RECALL • Recall is equal to the ratio of the True Positive (TP) samples to the sum of True Positive (TP) and False Negative (FN) samples. • Recall is a significant metric to identify the number of correctly classified patients in an imbalanced class dataset out of all the patients that could have been correctly predicted. • Recall = TP / (TP + FN)
  • 16. F1 SCORE • F1 Score is equal to the harmonic mean of the Recall and Precision values. • The F1 Score balances Precision and Recall, thereby providing a fair evaluation of the model's performance in classifying COVID-19 patients. • This is the most significant measure that we will be using to evaluate the model. • F1 Score = (2 × Precision × Recall) / (Precision + Recall)
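The four metrics defined on the preceding slides can be computed directly from confusion-matrix counts. The sketch below uses plain Python; the function name and the counts (TP = 3, TN = 6, FP = 0, FN = 1) are hypothetical values chosen for illustration, not figures taken from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical confusion counts for a 10-patient test set.
metrics = classification_metrics(tp=3, tn=6, fp=0, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these counts the accuracy is 0.90 while the recall is only 0.75, which shows why accuracy alone can look better than it should on an imbalanced dataset.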
  • 18. MODEL COMPARISON • The model performance is tested on the actual (original) split test data. • After splitting the original data into train and test sets, the data generated by the GAN is added to the train data to compare the performance with the base model. FIG 3: COMPARISON OF VARIOUS MODELS
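The order of operations described here (split first, then augment only the training portion) can be sketched as follows. This is an illustrative plain-Python outline, not the authors' pipeline: the function name, the 20% test fraction, the seed, and the placeholder rows standing in for real and GAN-generated records are all assumptions.

```python
import random


def split_then_augment(real_rows, synthetic_rows, test_fraction=0.2, seed=0):
    """Split real data, then add synthetic rows to the training part only."""
    rng = random.Random(seed)
    shuffled = real_rows[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test = shuffled[:n_test]         # evaluation uses real records only
    train = shuffled[n_test:]        # base model trains on this
    augmented_train = train + synthetic_rows  # augmented model trains on this
    return train, augmented_train, test


# Placeholder rows; in practice these would be patient feature vectors.
real = [("real", i) for i in range(10)]
synthetic = [("synthetic", i) for i in range(4)]

train, augmented_train, test = split_then_augment(real, synthetic)
print(len(train), len(augmented_train), len(test))  # 8 12 2
```

Keeping generated samples out of the test set means both the base model and the augmented model are scored on the same untouched real data, so the comparison is like for like.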
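  • 19. MODEL COMPARISON Each entry lists the metric, the score of the base model*, and the score with augmented generated data.
      • Recall Score: 0.75 (base model*); 0.83 (augmented)
      • Precision Score: 1 (base model*); 1 (augmented)
      • F1 Score: 0.86 (base model*); 0.9 (augmented)
      • Accuracy: 0.9 (base model*); 0.95 (augmented)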
  • 20. CONCLUSION The proposed model provides a more accurate and robust result than the base model, showing that GAN-based oversampling overcomes the limitations of the imbalanced data and appropriately inflates the minority class.
  • 21. REFERENCES [1] WHO Situation Report-94 Coronavirus disease 2019 (COVID-19) (2020). [2] Sujatha R, Chatterjee JM, Hassanien AE. (2020). [3] Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017). [4] Kathiresan S, Sait ARW, Gupta D, Lakshmanaprabu SK, Pandey HM (2020). He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. [5] Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks.
  • 22. DATA AVAILABILITY STATEMENT • Novel Corona Virus 2019 Dataset (accessed April 23, 2020). • Bayes C, Valdivieso L. Modelling death rates due to COVID-19: a Bayesian approach. arXiv (accessed May 5, 2020). • The datasets presented in the study can be found in online repositories. • GitHub repository link: https://github.com/bindhu520/Boost-Model-Accuracy-of-Imbalanced-COVID-19-Mortality-Prediction-Using-GAN-based-Oversampling-Techni