WATER QUALITY PREDICTION
FASIL KM
S3MCA
Guided by,
Prof.SibuSkaria
CONTENTS
Introduction
Literature Review
Dataset
Analysis of Algorithm
Project Pipeline
System Design
Results and Discussion
Model Deployment
Conclusion
INTRODUCTION
With a growing population and an ever-increasing need for resources, we are
faced with a dilemma about how to manage our lives. In that struggle, we
sometimes end up using a poor or contaminated source of water and
thus put our health at stake.
According to a recent survey by the World Health Organization (WHO), more than 2.2
billion people worldwide face problems due to unsafe drinking water, and 21% of
diseases are related to impure water.
The proposed system aims to address this problem by allowing users
to monitor water quality from a given sample of water and by predicting whether
the water is contaminated or not.
In the current scenario, facilities are available for testing water samples by
bringing them to the water authorities. But the process is time-consuming, as it usually
takes several weeks for the reports to be received. This causes dissatisfaction among
users.
LITERATURE REVIEW
PAPER 1 :- Nouraki, A.; Alavi, M.; Golabi, M.; Albaji, M. Prediction of water quality parameters using
machine learning models: A case study of the Karun River, Iran. Environ. Sci. Pollut. Res. 2021, 28,
57060–57072.
The growing worldwide emphasis on water quality is giving rise to
widespread research and an expanding market for novel, smart monitoring systems.
The current method is a laboratory process in which samples are taken from water bodies
and tested in labs. This method is time-consuming, wastes manpower, and is
not economical. So, an Artificial Neural Network (ANN) is used to solve this problem. This
method eliminates the chemical evaluation of water quality parameters and is
cost-effective. The paper gives a brief methodology for predicting unknown parameters such as
alkalinity, chloride, and sulphate from known parameters such as pH, electrical
conductivity, and TDS using the Levenberg-Marquardt algorithm, which helps in further
classifying water bodies for different applications. The results gave accuracies of 83.94%,
87.9%, 81.736%, and 79.48% in predicting chloride, total hardness, sulphate, and total alkalinity,
respectively.
PAPER 2 :- A. Abraham, D. Livingston, I. Guerra and J. Yang, "Exploring the Application of Machine Learning
Algorithms to Water Quality Analysis," 2022 IEEE/ACIS 7th International Conference on Big Data, Cloud
Computing, and Data Science (BCD), 2022, pp. 142-148, doi: 10.1109/BCD54882.2022.9900636.
In this experimental study, we use different Machine Learning algorithms to
decide the quality of the water within the San Antonio River and its tributaries
using datasets of previously collected water data from the San Antonio River
Authority along with the Kaggle water potability dataset. For each algorithm to
work, we used a set of parameters by which to measure the quality of the river
water. Out of all the algorithms utilized, we found that random forest and
K-Nearest Neighbor (KNN) were the best at achieving accurate results, with accuracy
ratings of 0.6520 and 0.6469, respectively. Using K-means, we were able to find
four distinct clusters in the San Antonio River data. The separation of these
clusters was low for the parameters used resulting in a silhouette score of 0.229.
From this data, we may be able to determine which sections of the main San
Antonio River, as well as which tributaries, are acceptable (or healthy enough) for
primary contact recreation use.
PAPER 3 :- H. Mohammed, I. A. Hameed and R. Seidu, "Random forest tree for predicting fecal indicator organisms in
drinking water supply," 2017 International Conference on Behavioral, Economic, Socio-cultural Computing (BESC),
2017, pp. 1-6, doi: 10.1109/BESC.2017.8256398.
A variety of modeling techniques have been widely applied for predicting levels of fecal indicator
organisms in raw water. However, deficiencies in the performances of some methods make it
difficult for implementation in full-scale water supply systems. This study examines the
efficiency of random forest (RF) which is made up of a number of decision trees in the prediction
of fecal indicator organisms in raw water based on records of conductivity, pH, color, turbidity
taken from a drinking water source in Bergen, Norway, as well as seasons. Results of the study
indicate that the method is capable of estimating important variations in levels of the
microorganisms in the raw water with acceptable accuracy. Color of water and the effect of
autumn season were the most important in explaining the variations in the levels of the coliform
bacteria, intestinal enterococci and E. coli in raw water in both the full and the reduced models.
Considerable reduction in the model out-of-bag sample error was achieved in the reduced
models, where only two most important variables were used as predictors. With further research
aimed at improving the estimation error, the random forest method can be a reliable tool for real
time prediction of potential levels of microorganisms in raw water.
Findings and Proposals
Paper 1 gives a brief methodology for predicting unknown parameters such as
alkalinity, chloride, and sulphate from known parameters such as pH,
electrical conductivity, and TDS using the Levenberg-Marquardt algorithm,
which helps in further classifying water bodies for different
applications. The results gave accuracies of 83.94%, 87.9%, 81.736%, and 79.48% in
predicting chloride, total hardness, sulphate, and total alkalinity, respectively.
Paper 2 concludes that the Random Forest method gives the best
accuracy for water quality prediction when compared with various other
machine learning methods. Out of all the algorithms utilized, it was found
that random forest and K-Nearest Neighbor (KNN) were the best at
achieving accurate results, with accuracy ratings of 0.6520 and 0.6469,
respectively.
In Paper 3, the results of the study indicate that the method is capable
of estimating important variations in the levels of microorganisms in
raw water with acceptable accuracy. With further research aimed at
improving the estimation error, the Random Forest method can be a
reliable tool for real-time prediction of potential levels of microorganisms
in raw water.
From the analysis of the above three papers, we can see that
machine learning, and specifically the Random Forest method, gives more
accurate results for the prediction of water quality. The most
important features to be considered include pH,
hardness, conductivity, and turbidity.
DATASET
The proposed system is implemented using the water potability dataset
from Kaggle. The water_potability.csv file contains water quality
metrics for 3276 samples, with 9 features and one class variable.
The features are pH, Hardness, Solids, Chloramines, Sulfate,
Conductivity, Organic Carbon, Trihalomethanes, and Turbidity.
Potability is the class label.
URL: https://www.kaggle.com/datasets/adityakadiwal/water-potability
The figure shows the first five rows of the dataset. The dataset contains null values,
which are later handled by preprocessing techniques.
Data Preprocessing
As null values are present in the dataset, data cleaning is carried out and the
missing values are filled. Null values in each feature are filled using the mean
of the respective feature.
[Figure: Data Cleaning]
[Figure: Cleaned Dataset]
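The mean-filling step described above can be sketched with pandas. This is a minimal illustration on a tiny made-up table, not the project's actual code; the column names `ph`, `Sulfate`, and `Potability` are assumed to match the Kaggle file, and the values are invented for the example.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample that mimics three columns of water_potability.csv;
# the values are illustrative, not taken from the real dataset.
df = pd.DataFrame({
    "ph": [7.0, np.nan, 8.0, 6.0],
    "Sulfate": [330.0, 350.0, np.nan, np.nan],
    "Potability": [0, 1, 0, 1],
})

# Count missing values per column before cleaning.
missing_before = df.isnull().sum()

# Replace each missing entry with the mean of its own column.
# DataFrame.mean() skips NaN by default, so only observed values are averaged.
cleaned = df.fillna(df.mean())

missing_after = cleaned.isnull().sum()
print(missing_before.to_dict())
print(cleaned)
```

Mean imputation keeps all 3276 rows available for training, at the cost of slightly reducing the variance of the filled features.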
Analysis of feature variables
1. pH value: pH is an important parameter in evaluating the acid–base balance
of water.
2. Hardness: Hardness is mainly caused by calcium and magnesium salts. These
salts are dissolved from geologic deposits through which water travels.
3. Solids (Total dissolved solids - TDS): Water has the ability to dissolve a wide
range of inorganic and some organic minerals or salts such as potassium,
calcium, sodium, bicarbonates, chlorides, magnesium, sulfates etc. These
minerals produce an unwanted taste and a diluted color in the appearance of water.
4. Chloramines: Chlorine and chloramine are the major disinfectants used in
public water systems. Chloramines are most commonly formed when
ammonia is added to chlorine to treat drinking water.
5. Sulfate: Sulfates are naturally occurring substances that are found in minerals,
soil, and rocks. They are present in ambient air, groundwater, plants, and food.
6. Conductivity: Pure water is not a good conductor of electric current; rather, it is a
good insulator. An increase in ion concentration enhances the electrical
conductivity of water.
7. Total Organic Carbon: TOC in source waters comes from decaying natural
organic matter (NOM) as well as synthetic sources. TOC is a measure of the total
amount of carbon in organic compounds in pure water.
8. Trihalomethanes: THMs are chemicals which may be found in water treated
with chlorine.
9. Turbidity: The turbidity of water depends on the quantity of solid matter
present in the suspended state. It is a measure of the light-scattering properties of
water.
Analysis of class variable
The figure displays the count of each value of the class variable, Potability. The
dataset contains 1998 samples labeled 0, which implies contaminated water, and
1278 samples labeled 1, which implies potable water.
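As a quick arithmetic check on the class counts reported here, the sketch below computes the share of each class and the accuracy a trivial model would reach by always predicting the more frequent class. The majority-class baseline is an added point of comparison, not something reported in the slides.

```python
# Class counts as reported for the Potability label.
counts = {0: 1998, 1: 1278}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial model that always predicts the most frequent class.
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

print(total)
print({label: round(share, 3) for label, share in shares.items()})
print(round(baseline_accuracy, 3))
```

Because roughly 61% of the samples are labeled 0, any reported accuracy is best read against that baseline rather than against 50%.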
Data Visualization
Histogram representation of all the features and the class variable:
ANALYSIS OF ALGORITHM
Random Forest
Random forests or random decision forests is an ensemble learning method for
classification, regression and other tasks that operates by constructing a
multitude of decision trees at training time.
For classification tasks, the output of the random forest is the class selected by
most trees. For regression tasks, the mean or average prediction of the
individual trees is returned.
Instead of relying on one decision tree, the random forest takes the prediction
from each tree and, based on the majority vote of those predictions, predicts
the final output.
The random forest algorithm reduces overfitting to a great extent.
The training algorithm for random forests applies the general technique of
bootstrap aggregating (bagging) to tree learners.
Given a training set X = x1, ..., xn with responses Y = y1, ..., yn, bagging
repeatedly (B times) selects a random sample with replacement of the
training set and fits trees to these samples. For b = 1, ..., B:
1. Sample, with replacement, n training examples from X, Y; call these Xb, Yb.
2. Train a classification or regression tree fb on Xb, Yb.
After training, predictions for unseen samples x' can be made by averaging
the predictions from all the individual regression trees on x':
\hat{f}(x') = \frac{1}{B} \sum_{b=1}^{B} f_b(x')
or by taking the majority vote in the case of classification trees.
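The bagging procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming scikit-learn's `DecisionTreeClassifier` as the tree learner and synthetic data from `make_classification` in place of the water samples; the helper names `fit_bagged_trees` and `predict_majority` are hypothetical. It shows plain bagging only: a full random forest additionally considers a random subset of features at each split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_trees, seed=0):
    """Fit n_trees decision trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        # Sample n row indices with replacement: these rows form Xb, Yb.
        idx = rng.integers(0, n, size=n)
        tree = DecisionTreeClassifier(random_state=0)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_majority(trees, X):
    """Combine the trees' class predictions by majority vote (labels 0/1)."""
    votes = np.stack([tree.predict(X) for tree in trees])  # shape (B, n_samples)
    # With binary 0/1 labels, the mean vote exceeds 0.5 when class 1 wins.
    return (votes.mean(axis=0) > 0.5).astype(int)


# Synthetic binary data standing in for the water samples (illustrative only).
X, y = make_classification(n_samples=300, n_features=9, random_state=0)
trees = fit_bagged_trees(X, y, n_trees=25)
predictions = predict_majority(trees, X[:10])
print(predictions)
```

An odd number of trees is used here so that a binary vote can never tie.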
Working of algorithm
The following steps explain the working of the Random Forest algorithm:
Step 1: Select random samples from the given data or training set.
Step 2: Construct a decision tree for every random sample.
Step 3: Collect the prediction of every decision tree as a vote (for
regression, the predictions are averaged instead).
Step 4: Finally, select the most voted prediction result as the final
prediction result.
This combination of multiple models is called an ensemble.
PROJECT PIPELINE
Data collection: A dataset with appropriate parameters such as pH,
Hardness, Solids, Chloramines, Sulfate, Conductivity, Organic
Carbon, Trihalomethanes, and Turbidity, along with the class variable
Potability, is used.
Data pre-processing: The acquired dataset is put into an organized
format. Data cleaning is the pre-processing method chosen;
missing values are filled in this phase.
Split data: In this phase the preprocessed data is split into
training and test data. 80% of the data is taken for training and the
remaining 20% is taken for testing.
Load train data: The training data is loaded for training the model
using the Random Forest algorithm.
Train model: The loaded data is provided for training, and a model is
created using the Random Forest algorithm and saved for further
use.
Confusion matrix: A confusion matrix is plotted from the model's predictions to
determine the True Positive, True Negative, False Positive, and False Negative
counts.
Export trained model: The trained model is exported for
testing purposes.
Load trained model: The exported model is then loaded
for testing.
Load test data: Finally, test data (input) is provided to predict whether
the water sample is contaminated or not by analyzing the provided
parameters.
The result is displayed on the user interface where the input
parameters were provided.
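The pipeline stages above can be sketched end to end with scikit-learn. This is an illustrative outline under stated assumptions, not the project's code: synthetic data from `make_classification` stands in for the cleaned dataset, `n_estimators=100` and the `random_state` values are arbitrary, and `pickle` is used as one possible way to export and reload the model.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned dataset: 9 features, binary label.
X, y = make_classification(n_samples=500, n_features=9, random_state=42)

# Split data: 80% for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train model.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Confusion matrix on the held-out data; for labels [0, 1], ravel() yields
# true negatives, false positives, false negatives, true positives.
y_pred = model.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
accuracy = accuracy_score(y_test, y_pred)

# Export the trained model, then load it back as a deployment step would.
# pickle.dumps keeps the sketch free of file paths; a real app would write a file.
saved = pickle.dumps(model)
loaded = pickle.loads(saved)

# Predict one new sample: 0 = contaminated, 1 = potable in the source's encoding.
sample = X_test[:1]
print(loaded.predict(sample)[0], accuracy, (tn, fp, fn, tp))
```

Here label 1 is treated as the positive class, so accuracy equals (TP + TN) divided by the number of test samples.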
SYSTEM DESIGN
Model Planning
The dataset is split so that one portion is used for training the model and the
other for testing it. 80% of the dataset is used as training data and the
remaining 20% is used as testing data.
Model Training
Model Testing
The trained model achieves an accuracy of 71.5%. A set of values from
the dataset is then selected and used for prediction. As the selected
values correspond to contaminated water, 0 is displayed as the output.
RESULTS AND DISCUSSION
An accuracy score of 71.5% was obtained by repeatedly training
the model while changing hyperparameters such as criterion,
n_estimators, and random_state.
The criterion was changed from entropy to gini, as the accuracy
score for entropy was much lower than for gini.
n_estimators is the number of trees, and it was increased to
reach the final accuracy.
The test-to-train splitting ratio of the dataset was changed from 3:7 to 2:8.
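The kind of hyperparameter comparison described here can be sketched as a small loop over `criterion` and `n_estimators`. The grid values, the synthetic data, and the `random_state` are assumptions made for illustration; the slides do not state which values were actually tried.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned dataset (illustrative only).
X, y = make_classification(n_samples=400, n_features=9, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

results = {}
for criterion, n_estimators in product(["gini", "entropy"], [10, 50, 100]):
    model = RandomForestClassifier(
        criterion=criterion, n_estimators=n_estimators, random_state=1
    )
    model.fit(X_train, y_train)
    # score() returns the mean accuracy on the given data.
    results[(criterion, n_estimators)] = model.score(X_test, y_test)

best_setting = max(results, key=results.get)
print(best_setting, results[best_setting])
```

Selecting hyperparameters against the test set can make the reported accuracy optimistic; cross-validation on the training data, for example with scikit-learn's `GridSearchCV`, is a common alternative.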
MODEL DEPLOYMENT
CONCLUSION
The project is meant to be a replacement for the existing manual
system of water testing, as the existing system is very
time-consuming and involves human labour.
The system automates the process of testing water samples
using the various parameters of water.
The proposed system takes various parameters of water as inputs
and predicts whether the water sample with the provided
parameters is contaminated or not.
The system is based on three research papers whose findings favor
the Random Forest algorithm for predicting the quality
of water samples, and hence the Random Forest algorithm
is used for the implementation.
THANK
YOU

Mais conteúdo relacionado

Mais procurados

Data cube computation
Data cube computationData cube computation
Data cube computation
Rashmi Sheikh
 
Plant disease detection and classification using deep learning
Plant disease detection and classification using deep learning Plant disease detection and classification using deep learning
Plant disease detection and classification using deep learning
JAVAID AHMAD WANI
 

Mais procurados (20)

5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
Heart disease prediction using machine learning algorithm
Heart disease prediction using machine learning algorithm Heart disease prediction using machine learning algorithm
Heart disease prediction using machine learning algorithm
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streams
 
Delivering a Parcel Tracking System for the Future
Delivering a Parcel Tracking System for the FutureDelivering a Parcel Tracking System for the Future
Delivering a Parcel Tracking System for the Future
 
software project management Artifact set(spm)
software project management Artifact set(spm)software project management Artifact set(spm)
software project management Artifact set(spm)
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 
Mining data streams
Mining data streamsMining data streams
Mining data streams
 
Machine Learning Approaches for Crime Pattern Detection
Machine Learning Approaches for Crime Pattern DetectionMachine Learning Approaches for Crime Pattern Detection
Machine Learning Approaches for Crime Pattern Detection
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and Correlations
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data mining
 
Srs for banking system
Srs for banking systemSrs for banking system
Srs for banking system
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysis
 
Introduction to Data Science.pptx
Introduction to Data Science.pptxIntroduction to Data Science.pptx
Introduction to Data Science.pptx
 
Human activity recognition
Human activity recognitionHuman activity recognition
Human activity recognition
 
Data cube computation
Data cube computationData cube computation
Data cube computation
 
Plant disease detection and classification using deep learning
Plant disease detection and classification using deep learning Plant disease detection and classification using deep learning
Plant disease detection and classification using deep learning
 
Spatial Database
Spatial DatabaseSpatial Database
Spatial Database
 
Bank Management System
Bank Management System Bank Management System
Bank Management System
 
Data mining tasks
Data mining tasksData mining tasks
Data mining tasks
 
lazy learners and other classication methods
lazy learners and other classication methodslazy learners and other classication methods
lazy learners and other classication methods
 

Semelhante a WATER QUALITY PREDICTION

Performance assessment of water filtration plants in pakistan - JBES
Performance assessment of water filtration plants in pakistan - JBESPerformance assessment of water filtration plants in pakistan - JBES
Performance assessment of water filtration plants in pakistan - JBES
Innspub Net
 
Statistical analysis to identify the main parameters to
Statistical analysis to identify the main parameters toStatistical analysis to identify the main parameters to
Statistical analysis to identify the main parameters to
eSAT Publishing House
 
Artigo pronto! desinfecção de efluentes primário municipal de águas residua...
Artigo pronto!   desinfecção de efluentes primário municipal de águas residua...Artigo pronto!   desinfecção de efluentes primário municipal de águas residua...
Artigo pronto! desinfecção de efluentes primário municipal de águas residua...
José Demontier Vieira de Souza Filho
 

Semelhante a WATER QUALITY PREDICTION (20)

An Efficient Method for Assessing Water Quality Based on Bayesian Belief Netw...
An Efficient Method for Assessing Water Quality Based on Bayesian Belief Netw...An Efficient Method for Assessing Water Quality Based on Bayesian Belief Netw...
An Efficient Method for Assessing Water Quality Based on Bayesian Belief Netw...
 
An efficient method for assessing water
An efficient method for assessing waterAn efficient method for assessing water
An efficient method for assessing water
 
IRJET- Modelling BOD and COD using Artificial Neural Network with Factor Anal...
IRJET- Modelling BOD and COD using Artificial Neural Network with Factor Anal...IRJET- Modelling BOD and COD using Artificial Neural Network with Factor Anal...
IRJET- Modelling BOD and COD using Artificial Neural Network with Factor Anal...
 
Estimating Fish Community Diversity through Linear and Non-Linear Statistical...
Estimating Fish Community Diversity through Linear and Non-Linear Statistical...Estimating Fish Community Diversity through Linear and Non-Linear Statistical...
Estimating Fish Community Diversity through Linear and Non-Linear Statistical...
 
Performance assessment of water filtration plants in pakistan - JBES
Performance assessment of water filtration plants in pakistan - JBESPerformance assessment of water filtration plants in pakistan - JBES
Performance assessment of water filtration plants in pakistan - JBES
 
Chlorine Dose Determination in Water Distribution System of Jabalpur City usi...
Chlorine Dose Determination in Water Distribution System of Jabalpur City usi...Chlorine Dose Determination in Water Distribution System of Jabalpur City usi...
Chlorine Dose Determination in Water Distribution System of Jabalpur City usi...
 
Gp3511691177
Gp3511691177Gp3511691177
Gp3511691177
 
IRJET- Hydrodynamic Integrated Modelling of Basic Water Quality and Nutrient ...
IRJET- Hydrodynamic Integrated Modelling of Basic Water Quality and Nutrient ...IRJET- Hydrodynamic Integrated Modelling of Basic Water Quality and Nutrient ...
IRJET- Hydrodynamic Integrated Modelling of Basic Water Quality and Nutrient ...
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
water-07-01568
water-07-01568water-07-01568
water-07-01568
 
11.application of principal component analysis & multiple regression models i...
11.application of principal component analysis & multiple regression models i...11.application of principal component analysis & multiple regression models i...
11.application of principal component analysis & multiple regression models i...
 
Estimation of Chlorine in Water Samples-ELECTROANALYSIS
Estimation of Chlorine in Water Samples-ELECTROANALYSISEstimation of Chlorine in Water Samples-ELECTROANALYSIS
Estimation of Chlorine in Water Samples-ELECTROANALYSIS
 
Use of Fuzzy Set Theory in Environmental Engineering Applications: A Review
Use of Fuzzy Set Theory in Environmental Engineering Applications: A ReviewUse of Fuzzy Set Theory in Environmental Engineering Applications: A Review
Use of Fuzzy Set Theory in Environmental Engineering Applications: A Review
 
IRJET- ANN-Based Modeling for Coagulant Dosage in Drinking Water Treatment Plant
IRJET- ANN-Based Modeling for Coagulant Dosage in Drinking Water Treatment PlantIRJET- ANN-Based Modeling for Coagulant Dosage in Drinking Water Treatment Plant
IRJET- ANN-Based Modeling for Coagulant Dosage in Drinking Water Treatment Plant
 
thesis
thesisthesis
thesis
 
Statistical analysis to identify the main parameters to
Statistical analysis to identify the main parameters toStatistical analysis to identify the main parameters to
Statistical analysis to identify the main parameters to
 
Statistical analysis to identify the main parameters to
Statistical analysis to identify the main parameters toStatistical analysis to identify the main parameters to
Statistical analysis to identify the main parameters to
 
T0 numtq0ndy=
T0 numtq0ndy=T0 numtq0ndy=
T0 numtq0ndy=
 
Artigo pronto! desinfecção de efluentes primário municipal de águas residua...
Artigo pronto!   desinfecção de efluentes primário municipal de águas residua...Artigo pronto!   desinfecção de efluentes primário municipal de águas residua...
Artigo pronto! desinfecção de efluentes primário municipal de águas residua...
 
Data-Mining-Project
Data-Mining-ProjectData-Mining-Project
Data-Mining-Project
 

Último

Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 

Último (20)

Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 

WATER QUALITY PREDICTION

  • 2. CONTENTS Introduction Literature Review Dataset Analysis of Algorithm Project Pipeline System Design Results and Discussion Model Deployment Conclusion
  • 3. INTRODUCTION With the population and the ever increasing need for various resources, we are faced with a dilemma on how to manage our lives. In a struggle to do that, we sometimes end up utilizing a poor or contaminated source of water for our use and thus put our health on stake. According to a recent survey of World Health Organization (WHO), more than 2.2 billion people in India face problems due to unsafe drinking water and 21% of the diseases are related to impure water.
  • 4. The proposed system aims to provide the solution to the same, by allowing users to monitor the water quality from a given sample of water and predicts whether the water is contaminated or not. In the current scenario,facilities are available for testing the water samples by bringing it to the water authorities.But the process is time consuming as it usually takes several weeks for the reports to be received.This causes dissatisfaction to the users.
  • 6. PAPER 1 :- Nouraki, A.; Alavi, M.; Golabi, M.; Albaji, M. Prediction of water quality parameters using machine learning models: A case study of the Karun River, Iran. Environ. Sci. Pollut. Res. 2021, 28, 57060– 57072. [CrossRef] [PubMed]. The growing worldwide emphasis on dealing with water quality is giving rise to widespread research and expanding market for novel and astute monitoring systems. The current method is laboratory process where samples are taken from water bodies and testing is done in labs. This method is time consuming, wastage of manpower, and not economical. So, Artificial Neural Network (ANN) is used to solve this problem. This method eliminates chemical method of evaluating water quality parameters and is cost effective. This paper gives brief methodology to predict unknown parameters such as Alkalinity, Chloride, Sulphate values using known parameters such as pH, Electrical Conductivity, TDS etc. using Levenberg-Marquardt algorithm, which helps in further classification of water bodies for different application. Results gave accuracy of 83.94%, 87.9%, 81.736%, 79.48% in predicting chloride, total-hardness, sulphate, total alkalinity respectively
  • 7. PAPER 2 :- A. Abraham, D. Livingston, I. Guerra and J. Yang, "Exploring the Application of Machine Learning Algorithms to Water Quality Analysis," 2022 IEEE/ACIS 7th International Conference on Big Data, Cloud Computing, and Data Science (BCD), 2022, pp. 142-148, doi: 10.1109/BCD54882.2022.9900636. In this experimental study, we use different Machine Learning algorithms to decide the quality of the water within the San Antonio River and its tributaries using datasets of previously collected water data from the San Antonio River Authority along with the Kaggle water potability dataset. For each algorithm to work, we used a set of parameters by which to measure the quality of the river water. Out of all the algorithms utilized, we found that random forest and K- Nearest Neighbor (KNN) were the best at achieving accurate results, with accuracy ratings of 0.6520 and 0.6469, respectively. Using K-means, we were able to find four distinct clusters in the San Antonio River data. The separation of these clusters was low for the parameters used resulting in a silhouette score of 0.229. From this data, we may be able to determine which sections of the main San Antonio River, as well as which tributaries, are acceptable (or healthy enough) for primary contact recreation use.
  • 8. PAPER 3 :- H. Mohammed, I. A. Hameed and R. Seidu, "Random forest tree for predicting fecal indicator organisms in drinking water supply," 2017 International Conference on Behavioral, Economic, Socio-cultural Computing (BESC), 2017, pp. 1-6, doi: 10.1109/BESC.2017.8256398. A variety of modeling techniques have been widely applied for predicting levels of fecal indicator organisms in raw water. However, deficiencies in the performance of some methods make them difficult to implement in full-scale water supply systems. This study examines the efficiency of random forest (RF), which is made up of a number of decision trees, in predicting fecal indicator organisms in raw water based on records of conductivity, pH, color, and turbidity taken from a drinking water source in Bergen, Norway, as well as seasons. Results of the study indicate that the method is capable of estimating important variations in levels of the microorganisms in the raw water with acceptable accuracy. Color of water and the effect of the autumn season were the most important in explaining the variations in the levels of coliform bacteria, intestinal enterococci, and E. coli in raw water in both the full and the reduced models. Considerable reduction in the model out-of-bag sample error was achieved in the reduced models, where only the two most important variables were used as predictors. With further research aimed at improving the estimation error, the random forest method can be a reliable tool for real-time prediction of potential levels of microorganisms in raw water.
  • 9. Findings and Proposals Paper 1 gives a brief methodology for predicting unknown parameters such as alkalinity, chloride, and sulphate from known parameters such as pH, electrical conductivity, and TDS using the Levenberg-Marquardt algorithm, which helps in further classifying water bodies for different applications. The reported accuracies were 83.94%, 87.9%, 81.736%, and 79.48% in predicting chloride, total hardness, sulphate, and total alkalinity, respectively. Paper 2 concludes that the Random Forest method gives the best accuracy for water quality prediction among the machine learning methods compared. Out of all the algorithms utilized, random forest and K-Nearest Neighbor (KNN) were the best at achieving accurate results, with accuracy ratings of 0.6520 and 0.6469, respectively.
  • 10. In the last paper, the results indicate that the method is capable of estimating important variations in the levels of microorganisms in raw water with acceptable accuracy. With further research aimed at improving the estimation error, the Random Forest method can be a reliable tool for real-time prediction of potential levels of microorganisms in raw water. From the analysis of the three papers above, we conclude that machine learning, and the Random Forest method in particular, gives more accurate results for predicting water quality. The most important features to consider include pH, Hardness, Conductivity, and Turbidity.
  • 11. DATASET The proposed system is implemented using the water potability dataset from Kaggle. The water_potability.csv file contains water quality metrics for 3276 records, with 9 features and one class variable. The features are pH, Hardness, Solids, Chloramines, Sulfate, Conductivity, Organic Carbon, Trihalomethanes, and Turbidity; Potability is the class label. URL: https://www.kaggle.com/datasets/adityakadiwal/water-potability
  • 12. The first five rows of the dataset are shown in the figure. The dataset contains null values, which are later handled by preprocessing.
  • 13. Data Preprocessing Data Cleaning As null values are present in the dataset, data cleaning is carried out and the missing values are filled. Null values in each feature are filled using the mean of that feature.
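The mean-filling step described above can be sketched as follows. This is a minimal illustration in plain Python, not the project's actual preprocessing code; the three-row sample and the helper name `fill_with_column_mean` are made up for the example.

```python
from statistics import mean

# Hypothetical mini-sample; None marks a missing (null) measurement.
rows = [
    {"ph": 7.0, "Sulfate": 330.0},
    {"ph": None, "Sulfate": 350.0},
    {"ph": 9.0, "Sulfate": None},
]


def fill_with_column_mean(rows):
    """Return a copy of rows where each None is replaced by its column's mean."""
    columns = list(rows[0].keys())
    # Mean of each column, computed over the non-missing values only.
    means = {
        col: mean(r[col] for r in rows if r[col] is not None)
        for col in columns
    }
    return [
        {col: (r[col] if r[col] is not None else means[col]) for col in columns}
        for r in rows
    ]


cleaned = fill_with_column_mean(rows)
```

With pandas, the same operation on a numeric DataFrame `df` is usually written as `df.fillna(df.mean())`.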
  • 15. Analysis of feature variables
    1. pH value: pH is an important parameter in evaluating the acid–base balance of water.
    2. Hardness: Hardness is mainly caused by calcium and magnesium salts. These salts are dissolved from geologic deposits through which water travels.
    3. Solids (Total dissolved solids - TDS): Water has the ability to dissolve a wide range of inorganic and some organic minerals or salts such as potassium, calcium, sodium, bicarbonates, chlorides, magnesium, and sulfates. These minerals produce an unwanted taste and a diluted color in the appearance of water.
    4. Chloramines: Chlorine and chloramine are the major disinfectants used in public water systems. Chloramines are most commonly formed when ammonia is added to chlorine to treat drinking water.
  • 16.
    5. Sulfate: Sulfates are naturally occurring substances that are found in minerals, soil, and rocks. They are present in ambient air, groundwater, plants, and food.
    6. Conductivity: Pure water is not a good conductor of electric current; rather, it is a good insulator. An increase in ion concentration enhances the electrical conductivity of water.
    7. Total Organic Carbon: Total organic carbon (TOC) in source waters comes from decaying natural organic matter (NOM) as well as synthetic sources. TOC is a measure of the total amount of carbon in organic compounds in pure water.
    8. Trihalomethanes: THMs are chemicals which may be found in water treated with chlorine.
    9. Turbidity: The turbidity of water depends on the quantity of solid matter present in the suspended state. It is a measure of the light-scattering properties of water.
  • 17. Analysis of class variable The figure displays the counts of the class variable, Potability. The dataset contains 1998 records labeled 0 (contaminated water) and 1278 records labeled 1 (potable water).
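As a worked check on these counts, the short sketch below computes the class shares and the accuracy a trivial "always predict the majority class" model would reach. That baseline is not part of the original slides; it is included only as a reference point when reading accuracy figures on an imbalanced dataset.

```python
# Class counts as stated on the slide: 0 = contaminated, 1 = potable.
counts = {0: 1998, 1: 1278}

total = sum(counts.values())
# Fraction of the dataset belonging to each class.
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a model that always predicts the more frequent class.
baseline_accuracy = max(shares.values())

print(f"total records: {total}")
print(f"share of class 0: {shares[0]:.3f}, share of class 1: {shares[1]:.3f}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.3f}")
```

The two counts sum to 3276, matching the dataset size, and roughly 61% of the records belong to class 0.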
  • 18. Data Visualization Histograms of all the features and the class variable:
  • 19. ANALYSIS OF ALGORITHM Random Forest Random forests, or random decision forests, are an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the mean prediction of the individual trees is returned. Instead of relying on one decision tree, the random forest takes the prediction from each tree and predicts the final output based on the majority vote. The random forest algorithm reduces overfitting to a great extent.
  • 21. The training algorithm for random forests applies the general technique of bootstrap aggregating (bagging) to tree learners. Given a training set X = x1, ..., xn with responses Y = y1, ..., yn, bagging repeatedly (B times) selects a random sample with replacement of the training set and fits trees to these samples. For b = 1, ..., B:
    1. Sample, with replacement, n training examples from X, Y; call these Xb, Yb.
    2. Train a classification or regression tree fb on Xb, Yb.
    After training, predictions for unseen samples x' can be made by averaging the predictions from all the individual regression trees on x', $\hat{f}(x') = \frac{1}{B} \sum_{b=1}^{B} f_b(x')$, or by taking the majority vote in the case of classification trees.
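The bagging procedure above can be sketched in plain Python. To keep the example short, each "tree" is replaced by a stand-in model that simply predicts the mean response of its bootstrap sample; the function names, the toy data, and B = 25 are all assumptions made for illustration.

```python
import random
from statistics import mean


def bootstrap_sample(X, Y, rng):
    """Draw n examples with replacement from (X, Y); returns (Xb, Yb)."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]
    return [X[i] for i in idx], [Y[i] for i in idx]


def fit_mean_predictor(Xb, Yb):
    """Stand-in for a regression tree: always predicts the mean of Yb."""
    m = mean(Yb)
    return lambda x: m


def bagged_predict(models, x):
    """Average the predictions of all B models on the unseen sample x."""
    return mean(f(x) for f in models)


rng = random.Random(0)  # fixed seed so the run is repeatable
X = [[1.0], [2.0], [3.0], [4.0]]
Y = [10.0, 20.0, 30.0, 40.0]
B = 25  # number of bootstrap rounds

models = [fit_mean_predictor(*bootstrap_sample(X, Y, rng)) for _ in range(B)]
prediction = bagged_predict(models, [2.5])
```

Swapping `fit_mean_predictor` for a real tree learner gives bagged trees; a random forest additionally restricts each split to a random subset of the features.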
  • 22. Working of algorithm The following steps explain the working of the Random Forest algorithm:
    Step 1: Select random samples from the given data or training set.
    Step 2: Construct a decision tree for every random sample.
    Step 3: Collect the prediction of every decision tree, by voting for classification or by averaging for regression.
    Step 4: Finally, select the most voted prediction result as the final prediction result.
    This combination of multiple models is called an ensemble.
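Steps 3 and 4 can be illustrated with a small voting sketch. The three "trees" below are hand-written threshold rules with made-up cut-offs, not trees learned from the dataset; they exist only to show how individual votes are combined into one label.

```python
from collections import Counter

# Hypothetical stand-ins for trained trees; each returns 1 (potable) or 0 (contaminated).
stub_trees = [
    lambda s: 1 if 6.5 <= s["ph"] <= 8.5 else 0,
    lambda s: 1 if s["Turbidity"] < 5.0 else 0,
    lambda s: 1 if s["Chloramines"] < 4.0 else 0,
]


def forest_predict(trees, sample):
    """Collect one vote per tree and return (majority label, list of votes)."""
    votes = [tree(sample) for tree in trees]
    majority_label = Counter(votes).most_common(1)[0][0]
    return majority_label, votes


# Hypothetical water sample.
sample = {"ph": 7.2, "Turbidity": 3.9, "Chloramines": 7.1}
label, votes = forest_predict(stub_trees, sample)
print(votes, label)
```

Here two of the three rules vote 1, so the combined prediction is 1. Using an odd number of voters avoids ties in a two-class problem.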
  • 24. Data collection: A dataset with appropriate parameters (pH, Hardness, Solids, Chloramines, Sulfate, Conductivity, Organic Carbon, Trihalomethanes, and Turbidity) and the class variable Potability is used.
    Data Pre-processing: The acquired dataset is put into an organized format. Data cleaning is the pre-processing method chosen; missing values are filled in this phase.
    Split Data: The preprocessed data is split into training and test data. 80% of the data is taken for training and the remaining 20% for testing.
    Load Train Data: The training data is loaded for training the model using the Random Forest algorithm.
  • 25. Train Model: The loaded data is used to train a model with the Random Forest algorithm, and the model is saved for further use.
    Confusion Matrix: A confusion matrix is plotted from the model's predictions to determine the True Positive, True Negative, False Positive, and False Negative counts.
    Export trained model: The trained model is exported for testing purposes.
    Load trained model: The exported model is loaded for testing.
    Load test data: Finally, test data (input) is provided to predict whether the water sample is contaminated or not by analyzing the provided parameters. The result is shown on the user interface where the input parameters were provided.
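The confusion-matrix stage can be sketched as below. The labels are invented for illustration, with 1 (potable) treated as the positive class; the function name `confusion_counts` is likewise an assumption, not something taken from the project.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted potable, actually potable
        elif predicted == positive:
            fp += 1  # predicted potable, actually contaminated
        elif actual == positive:
            fn += 1  # predicted contaminated, actually potable
        else:
            tn += 1  # predicted contaminated, actually contaminated
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical labels: 1 = potable, 0 = contaminated.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]

cm = confusion_counts(y_true, y_pred)
accuracy = (cm["TP"] + cm["TN"]) / len(y_true)
```

For a potability predictor, a false positive (contaminated water predicted as potable) is the costlier error, so the FP count is worth reading alongside overall accuracy.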
  • 26. SYSTEM DESIGN Model Planning The dataset is split so that one portion is used for training the model and the other for testing it. 80% of the dataset is used as training data and the remaining 20% as testing data.
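A minimal sketch of an 80/20 split over row indices, assuming a simple shuffle-then-slice approach; the helper name, the seed, and the rounding rule are choices made for this example and may differ from the splitting utility the project actually used.

```python
import random


def train_test_split_indices(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices and split them into (train, test) lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


# 3276 is the number of records in the dataset.
train_idx, test_idx = train_test_split_indices(3276)
print(len(train_idx), len(test_idx))
```

Shuffling before slicing matters because the rows of a file are not guaranteed to be in random order; slicing an unshuffled file could put most of one class in the test set.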
  • 28. Model Testing The trained model achieves an accuracy of 71.5%. A set of values from the dataset is then selected and used for prediction. Because the selected values correspond to contaminated water, 0 is displayed as the output.
  • 29. RESULTS AND DISCUSSION An accuracy score of 71.5% was obtained by repeatedly training the model while changing hyperparameters such as criterion, n_estimators, and random_state. The criterion was changed from entropy to gini because the accuracy score with entropy was much lower than with gini. n_estimators sets the number of trees, and it was increased to reach the final accuracy. The dataset splitting ratio (test:train) was changed from 3:7 to 2:8.
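The kind of hyperparameter comparison described above might look like the following sketch using scikit-learn. It runs on synthetic data from `make_classification` rather than water_potability.csv, and the candidate values for `n_estimators` are arbitrary, so the scores it produces say nothing about the 71.5% reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with 9 features (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=9, random_state=0)

# 80% training data, 20% test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

scores = {}
for criterion in ("gini", "entropy"):
    for n_estimators in (50, 200):
        model = RandomForestClassifier(
            criterion=criterion,
            n_estimators=n_estimators,
            random_state=0,  # fixed so repeated runs are comparable
        )
        model.fit(X_train, y_train)
        scores[(criterion, n_estimators)] = accuracy_score(
            y_test, model.predict(X_test)
        )

# Hyperparameter combination with the highest test accuracy.
best = max(scores, key=scores.get)
print(best, scores[best])
```

Selecting hyperparameters on the test set, as this sketch does for brevity, tends to give an optimistic accuracy; a separate validation split or cross-validation is the more careful choice.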
  • 31. CONCLUSION The project is meant to replace the existing manual system of water testing, which is very time-consuming and relies on human labour. The system automates the testing of water samples using various water parameters. The proposed system takes these parameters as inputs and predicts whether the water sample they describe is contaminated or not. The system is based on the three reviewed papers, which suggest the Random Forest algorithm as the best algorithm for predicting the quality of water samples, and hence the Random Forest algorithm is used for the implementation.