SlideShare uma empresa Scribd logo
1 de 49
Air Pollution in Nova
Scotia: Analysis and
Predictions
Carlo Carandang Zhengyang Ma
Navjot Kaur Sahil Behl
Funto Akinyemi Maroof Rahman
Air Pollution in NS: Problem
Air Pollution
● Detrimental to health (asthma, breathing problems)
● Detrimental to environment
● Detrimental to NS economy
Air Pollution in NS: Business Case
● Recent push for green and smart cities
● NS economy based on tourism and nature
● To carry out these green initiatives, Policy Makers need:
○ accurate and convenient access to current and future air pollution conditions
○ to highlight pollution problems areas
○ to measure the effects of interventions to reduce pollution
○ to predict future pollution problems
2018 Open Data Hackathon
● Analyzed air particulate pollution datasets from NS Open Data Portal
● Produced time series prediction on concatenated table from the datasets
R Shiny App
https://sahil4.shinyapps.io/nsopendata/
Further Analysis of Air Pollution Data
● After Hackathon, wanted to further analyze the data
● Built a Data Warehouse in Oracle DB for Business Intelligence
● Cleaned, extracted, and transformed the data
● Loaded the data into a data warehouse
○ Dimension Tables loading into a Fact Table
● Fact table connected directly to Tableau for visualization
Technical: ETL and Data Warehouse
Technologies
Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production
Oracle SQL Developer
SQL*Loader
MS Excel
Cloud Dataprep
Tableau
Draw.io
Data Selection: Motivation
● We chose these datasets as it contained enough data for us to extract insights
into a business problem
● When searching, many of the open datasets available had much sparcity or was
not populated fully, and also the datasets were small (small number of rows and
columns)
● We ended up extracting the air particulate pollution datasets from the Nova
Scotia Open Data Portal, as there were several dataset on air particulate
matter from all around the Province
● We chose these datasets as it gave us a chance to extract multiple datasets and
perform ETL and analytics on them.
Obtaining the Data
To obtain the data, the following procedure was followed:
1. Go to https://data.novascotia.ca
2. Click on button ‘Data Catalogue’
3. In the search bar, search for ‘pollution air particulate’
4. You will see 30 results: sort by ‘Most Relevant’
5. You want the first 12 results (ignore the 11th result, which is a lookup table)
This will leave you with 11 csv files for analysis.
csv Description: 44 MB; 643,453 rows total
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Kentville_BAM.csv 498 KB 8,785 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Halifax_BAM.csv 4.7 MB 69,958 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Port Hawkesbury_BAM.csv 4.1 MB 55,559 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Sable_Island_BAM.csv 4.6 MB 64,496 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Pictou_BAM.csv 6.8 MB 104,033 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Sydney_BAM.csv 3.4 MB 50,752 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Lake_Major_TEOM.csv 4 MB 60,691 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Halifax_TEOM.csv 790 KB 12,494 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Aylesford_BAM.csv 4.7 MB 68,025 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Lake_Major_BAM.csv 6.7 MB 96,433 rows
Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Sydney_TEOM.csv 3.8 MB 60,514 rows
Data Description
Column Sample Data Description
Date_Time 07/24/1998 03:00:00 PM date-time stamp
Pollutant PM2.5 fine particulate matter (microns)
Unit µg/m3 micrograms per cubic metre
Station Sydney location of instrument
Instrument TEOM instrument type
Average 24 average concentration
Time period coverage: 2001-01-01 to 2016-12-31
Cleaning
We chose to perform preprocessing of the data in Cloud Dataprep. We uploaded the
11 csv files that were extracted from the Nova Scotia Open Data Portal
Import the Data and Add to Flow
Recipe to Remove Rows with Null Values
Remove Rows with Null Values
Remove Rows with ‘InVld’
Remove Rows with ‘NoData’
Remove Rows with ‘<Samp’
Extraction: csv to staging tables
-- Extract csv to staging tables via Excel Macro using SQL Loader.
-- First, create upload table:
CREATE TABLE "DWUSER"."UPLOAD_TABLE"
( "DATE_TIME" VARCHAR2(250 BYTE),
"POLLUTANT" VARCHAR2(250 BYTE),
"UNIT" VARCHAR2(250 BYTE),
"STATION" VARCHAR2(250 BYTE),
"INSTRUMENT" VARCHAR2(250 BYTE),
"AVERAGE" FLOAT(126) ) ;
Extraction: csv to staging tables
Next, utilize Excel Macro to load csv’s via SQL Loader:
Extraction: csv to staging tables
-- Run SQL*Loader via control files, in conjunction with Excel Macros:
LOAD DATA
INFILE *
APPEND
INTO TABLE HLFX_BAM_MONTHLY_AVG
Fields terminated by '~'
Trailing Nullcols
(DATE_TIME,POLLUTANT,STATION,AVERAGE)
BEGINDATA
9-Jan-2008~PM2.5~Halifax~4.814259
-- Load all 11 csv files via this method
Testing
Log file of initial extraction of csv files into Oracle DB table:
SQL*Loader: Release 11.2.0.2.0 - Production on Fri Mar 9 14:05:54 2018
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
Control File: C:JUNKINTREVERSE.CTL
Data File: C:JUNKINTREVERSE.CTL
Bad File: C:JUNKINTREVERSE.bad
Discard File: none specified
(Allow all discards)
Number to load: ALL
Number to skip: 0
Errors allowed: 50
Bind array: 64 rows, maximum of 256000 bytes
Continuation: none specified
Path used: Conventional
Testing: continued
Table HLFX_BAM_MONTHLY_AVG, loaded from every logical record.
Insert option in effect for this table: APPEND
TRAILING NULLCOLS option in effect
Column Name Position Len Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
DATE_TIME FIRST * ~ CHARACTER
POLLUTANT NEXT * ~ CHARACTER
STATION NEXT * ~ CHARACTER
AVERAGE NEXT * ~ CHARACTER
Table HLFX_BAM_MONTHLY_AVG:
132 Rows successfully loaded.
0 Rows not loaded due to data errors.
0 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Transformation
-- Upload table transformed into tables partitioned by Station (location):
CREATE TABLE "DWUSER"."EVTB_HALIFAX_UPLOAD"
( "DATE_TIME" TIMESTAMP (6),
"POLLUTANT" VARCHAR2(250 BYTE),
"UNIT" VARCHAR2(250 BYTE),
"STATION" VARCHAR2(250 BYTE),
"INSTRUMENT" VARCHAR2(250 BYTE),
"AVERAGE" FLOAT(126));
-- Repeat this for the 7 other Station Locations
Load Dim Tables and Fact Table
CREATE TABLE "DWUSER"."EVTB_FACT"
( "TIME_ID" VARCHAR2(250 BYTE),
"DATE_ID" VARCHAR2(250 BYTE),
"STATION_ID" VARCHAR2(250 BYTE),
"INSTRUMENT_ID" VARCHAR2(250 BYTE),
"AVERAGE" FLOAT(126),
"ID" VARCHAR2(250 BYTE),
CONSTRAINT "PK_FACT" PRIMARY KEY ("ID"));
Load Dim Tables and Fact Table
CREATE TABLE "DWUSER"."EVTB_DIM_STATION"
( "STATION_ID" VARCHAR2(250 BYTE),
"STATION" VARCHAR2(250 BYTE),
"DESCRIPTION" VARCHAR2(250 BYTE),
CONSTRAINT "PK_STATION" PRIMARY KEY ("STATION_ID"));
-- Repeat this for DIM_DATE, DIM_INSTRUMENT, and DIM_TIME
Visualization
Fact table and Dim tables from Oracle database connected directly to Tableau for
visualization

Mais conteúdo relacionado

Mais procurados

Grid based distributed in memory indexing for moving objects
Grid based distributed in memory indexing for moving objectsGrid based distributed in memory indexing for moving objects
Grid based distributed in memory indexing for moving objects
Yunsu Lee
 
Presentation TDG #2
Presentation TDG #2Presentation TDG #2
Presentation TDG #2
edofrederix
 
SAS writing example
SAS writing exampleSAS writing example
SAS writing example
Tianyue Wang
 
Nutch and lucene_framework
Nutch and lucene_frameworkNutch and lucene_framework
Nutch and lucene_framework
samuelhard
 

Mais procurados (14)

Managing Multi-DBMS on a Single UI , a Web-based Spatial DB Manager-FOSS4G A...
Managing Multi-DBMS on a Single UI, a Web-based Spatial DB Manager-FOSS4G A...Managing Multi-DBMS on a Single UI, a Web-based Spatial DB Manager-FOSS4G A...
Managing Multi-DBMS on a Single UI , a Web-based Spatial DB Manager-FOSS4G A...
 
Grid based distributed in memory indexing for moving objects
Grid based distributed in memory indexing for moving objectsGrid based distributed in memory indexing for moving objects
Grid based distributed in memory indexing for moving objects
 
Weather scraper for your data warehouse
Weather scraper for your data warehouseWeather scraper for your data warehouse
Weather scraper for your data warehouse
 
Asymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, ExplainedAsymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, Explained
 
Horizons doc
Horizons docHorizons doc
Horizons doc
 
Experiment no 05
Experiment no 05Experiment no 05
Experiment no 05
 
Presentation TDG #2
Presentation TDG #2Presentation TDG #2
Presentation TDG #2
 
SAS writing example
SAS writing exampleSAS writing example
SAS writing example
 
Nutch and lucene_framework
Nutch and lucene_frameworkNutch and lucene_framework
Nutch and lucene_framework
 
Apache flink: data streaming as a basis for all analytics by Kostas Tzoumas a...
Apache flink: data streaming as a basis for all analytics by Kostas Tzoumas a...Apache flink: data streaming as a basis for all analytics by Kostas Tzoumas a...
Apache flink: data streaming as a basis for all analytics by Kostas Tzoumas a...
 
Big data processing systems research
Big data processing systems researchBig data processing systems research
Big data processing systems research
 
The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...
 
Xadoop - new approaches to data analytics
Xadoop - new approaches to data analyticsXadoop - new approaches to data analytics
Xadoop - new approaches to data analytics
 
Big Data - Load CSV File & Query the EZ way - HPCC Systems
Big Data - Load CSV File & Query the EZ way - HPCC SystemsBig Data - Load CSV File & Query the EZ way - HPCC Systems
Big Data - Load CSV File & Query the EZ way - HPCC Systems
 

Semelhante a Air Pollution in Nova Scotia: Analysis and Predictions

Presentation_BigData_NenaMarin
Presentation_BigData_NenaMarinPresentation_BigData_NenaMarin
Presentation_BigData_NenaMarin
n5712036
 
Predicting flight cancellation likelihood
Predicting flight cancellation likelihoodPredicting flight cancellation likelihood
Predicting flight cancellation likelihood
Aashish Jain
 
Speeding up information extraction programs: a holistic optimizer and a learn...
Speeding up information extraction programs: a holistic optimizer and a learn...Speeding up information extraction programs: a holistic optimizer and a learn...
Speeding up information extraction programs: a holistic optimizer and a learn...
INRIA-OAK
 
A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...
A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...
A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...
Ryousei Takano
 

Semelhante a Air Pollution in Nova Scotia: Analysis and Predictions (20)

Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2
 
Module Owb Tuning
Module Owb TuningModule Owb Tuning
Module Owb Tuning
 
Data Time Travel by Delta Time Machine
Data Time Travel by Delta Time MachineData Time Travel by Delta Time Machine
Data Time Travel by Delta Time Machine
 
Presentation_BigData_NenaMarin
Presentation_BigData_NenaMarinPresentation_BigData_NenaMarin
Presentation_BigData_NenaMarin
 
Holistic data application quality
Holistic data application qualityHolistic data application quality
Holistic data application quality
 
ontop: A tutorial
ontop: A tutorialontop: A tutorial
ontop: A tutorial
 
Ns fundamentals 1
Ns fundamentals 1Ns fundamentals 1
Ns fundamentals 1
 
Predicting flight cancellation likelihood
Predicting flight cancellation likelihoodPredicting flight cancellation likelihood
Predicting flight cancellation likelihood
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer Overview
 
Sorting_project_2.pdf
Sorting_project_2.pdfSorting_project_2.pdf
Sorting_project_2.pdf
 
Speeding up information extraction programs: a holistic optimizer and a learn...
Speeding up information extraction programs: a holistic optimizer and a learn...Speeding up information extraction programs: a holistic optimizer and a learn...
Speeding up information extraction programs: a holistic optimizer and a learn...
 
A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...
A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...
A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...
 
Pumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency AnalysisPumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency Analysis
 
Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science Research
 
Pumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency AnalysisPumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency Analysis
 
A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)
 
Wait! What’s going on inside my database?
Wait! What’s going on inside my database?Wait! What’s going on inside my database?
Wait! What’s going on inside my database?
 
io-Chem-BD, una solució per gestionar el Big Data en Química Computacional
io-Chem-BD, una solució per gestionar el Big Data en Química Computacionalio-Chem-BD, una solució per gestionar el Big Data en Química Computacional
io-Chem-BD, una solució per gestionar el Big Data en Química Computacional
 
New features in the version 4.6 of the CFD meteodyn WT dedicated to wind reso...
New features in the version 4.6 of the CFD meteodyn WT dedicated to wind reso...New features in the version 4.6 of the CFD meteodyn WT dedicated to wind reso...
New features in the version 4.6 of the CFD meteodyn WT dedicated to wind reso...
 

Mais de Carlo Carandang

Mais de Carlo Carandang (20)

Psychosis in Youth
Psychosis in YouthPsychosis in Youth
Psychosis in Youth
 
Metyrosine and Psychosis
Metyrosine and PsychosisMetyrosine and Psychosis
Metyrosine and Psychosis
 
Lamotrigine for Treatment Refractory Mood Disorders in Adolescents: A Case Se...
Lamotrigine for Treatment Refractory Mood Disorders in Adolescents: A Case Se...Lamotrigine for Treatment Refractory Mood Disorders in Adolescents: A Case Se...
Lamotrigine for Treatment Refractory Mood Disorders in Adolescents: A Case Se...
 
Anxiety Disorders
Anxiety DisordersAnxiety Disorders
Anxiety Disorders
 
Metyrosine in Adolescent Psychosis Associated with 22q11.2 Deletion Syndrome
Metyrosine in Adolescent Psychosis Associated with 22q11.2 Deletion SyndromeMetyrosine in Adolescent Psychosis Associated with 22q11.2 Deletion Syndrome
Metyrosine in Adolescent Psychosis Associated with 22q11.2 Deletion Syndrome
 
Velocardiofacial Syndrome Associated with Adolescent Psychosis
Velocardiofacial Syndrome Associated with Adolescent PsychosisVelocardiofacial Syndrome Associated with Adolescent Psychosis
Velocardiofacial Syndrome Associated with Adolescent Psychosis
 
Clinical Assessment of Children and Adolescents with Depression
Clinical Assessment of Children and Adolescents with DepressionClinical Assessment of Children and Adolescents with Depression
Clinical Assessment of Children and Adolescents with Depression
 
Data Safety Monitoring Boards in Pediatric Clinical Trials
Data Safety Monitoring Boards in Pediatric Clinical TrialsData Safety Monitoring Boards in Pediatric Clinical Trials
Data Safety Monitoring Boards in Pediatric Clinical Trials
 
Teen Depression and Suicide
Teen Depression and SuicideTeen Depression and Suicide
Teen Depression and Suicide
 
Pediatric Bipolar Disorder
Pediatric Bipolar DisorderPediatric Bipolar Disorder
Pediatric Bipolar Disorder
 
SSRIs and Suicidality in Youth
SSRIs and Suicidality in YouthSSRIs and Suicidality in Youth
SSRIs and Suicidality in Youth
 
The Neurobiology of Adolescent Development
The Neurobiology of Adolescent DevelopmentThe Neurobiology of Adolescent Development
The Neurobiology of Adolescent Development
 
Canadian Psychiatry: The Case for Universal Health Care and How Psychiatry Be...
Canadian Psychiatry: The Case for Universal Health Care and How Psychiatry Be...Canadian Psychiatry: The Case for Universal Health Care and How Psychiatry Be...
Canadian Psychiatry: The Case for Universal Health Care and How Psychiatry Be...
 
Clinical assessment of child and adolescent psychiatric emergencies
Clinical assessment of child and adolescent psychiatric emergenciesClinical assessment of child and adolescent psychiatric emergencies
Clinical assessment of child and adolescent psychiatric emergencies
 
Computer Anxiety
Computer AnxietyComputer Anxiety
Computer Anxiety
 
Support Vector Machines- SVM
Support Vector Machines- SVMSupport Vector Machines- SVM
Support Vector Machines- SVM
 
AI and Big Data in Psychiatry: An Introduction and Overview
AI and Big Data in Psychiatry: An Introduction and OverviewAI and Big Data in Psychiatry: An Introduction and Overview
AI and Big Data in Psychiatry: An Introduction and Overview
 
Workplace Disability from Stress, Anxiety, and Depression: Solutions and Prev...
Workplace Disability from Stress, Anxiety, and Depression: Solutions and Prev...Workplace Disability from Stress, Anxiety, and Depression: Solutions and Prev...
Workplace Disability from Stress, Anxiety, and Depression: Solutions and Prev...
 
Paxil Study 329 Retracted: A Critical Statistical Analysis
Paxil Study 329 Retracted: A Critical Statistical AnalysisPaxil Study 329 Retracted: A Critical Statistical Analysis
Paxil Study 329 Retracted: A Critical Statistical Analysis
 
How The Neurotransmitter GABA Works For Anxiety
How The Neurotransmitter GABA Works For AnxietyHow The Neurotransmitter GABA Works For Anxiety
How The Neurotransmitter GABA Works For Anxiety
 

Último

Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 

Último (20)

Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 

Air Pollution in Nova Scotia: Analysis and Predictions

  • 1. Air Pollution in Nova Scotia: Analysis and Predictions Carlo Carandang Zhengyang Ma Navjot Kaur Sahil Behl Funto Akinyemi Maroof Rahman
  • 2. Air Pollution in NS: Problem Air Pollution ● Detrimental to health (asthma, breathing problems) ● Detrimental to environment ● Detrimental to NS economy
  • 3. Air Pollution in NS: Business Case ● Recent push for green and smart cities ● NS economy based on tourism and nature ● To carry out these green initiatives, Policy Makers need: ○ accurate and convenient access to current and future air pollution conditions ○ to highlight pollution problems areas ○ to measure the effects of interventions to reduce pollution ○ to predict future pollution problems
  • 4. 2018 Open Data Hackathon ● Analyzed air particulate pollution datasets from NS Open Data Portal ● Produced time series prediction on concatenated table from the datasets
  • 6. Further Analysis of Air Pollution Data ● After Hackathon, wanted to further analyze the data ● Built a Data Warehouse in Oracle DB for Business Intelligence ● Cleaned, extracted, and transformed the data ● Loaded the data into a data warehouse ○ Dimension Tables loading into a Fact Table ● Fact table connected directly to Tableau for visualization
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19. Technical: ETL and Data Warehouse
  • 20. Technologies Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production Oracle SQL Developer SQL*Loader MS Excel Cloud Dataprep Tableau Draw.io
  • 21.
  • 22. Data Selection: Motivation ● We chose these datasets as it contained enough data for us to extract insights into a business problem ● When searching, many of the open datasets available had much sparcity or was not populated fully, and also the datasets were small (small number of rows and columns) ● We ended up extracting the air particulate pollution datasets from the Nova Scotia Open Data Portal, as there were several dataset on air particulate matter from all around the Province ● We chose these datasets as it gave us a chance to extract multiple datasets and perform ETL and analytics on them.
  • 23. Obtaining the Data To obtain the data, the following procedure was followed: 1. Go to https://data.novascotia.ca 2. Click on button ‘Data Catalogue’ 3. In the search bar, search for ‘pollution air particulate’ 4. You will see 30 results: sort by ‘Most Relevant’ 5. You want the first 12 results (ignore the 11th result, which is a lookup table) This will leave you with 11 csv files for analysis.
  • 24. csv Description: 44 MB; 643,453 rows total Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Kentville_BAM.csv 498 KB 8,785 rows Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Halifax_BAM.csv 4.7 MB 69,958 rows Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Port Hawkesbury_BAM.csv 4.1 MB 55,559 rows Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Sable_Island_BAM.csv 4.6 MB 64,496 rows Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Pictou_BAM.csv 6.8 MB 104,033 rows Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Sydney_BAM.csv 3.4 MB 50,752 rows Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Lake_Major_TEOM.csv 4 MB 60,691 rows Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Halifax_TEOM.csv 790 KB 12,494 rows Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Aylesford_BAM.csv 4.7 MB 68,025 rows Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Lake_Major_BAM.csv 6.7 MB 96,433 rows Nova_Scotia_Provincial_Ambient_Fine_Particulate_Matter__Sydney_TEOM.csv 3.8 MB 60,514 rows
  • 25. Data Description Column Sample Data Description Date_Time 07/24/1998 03:00:00 PM date-time stamp Pollutant PM2.5 fine particulate matter (microns) Unit ¬µg/m3 micrograms per cubic metre Station Sydney location of instrument Instrument TEOM instrument type Average 24 average concentration Time period coverage: 2001-01-01 to 2016-12-31
  • 26.
  • 27. Cleaning We chose to perform preprocessing of the data in Cloud Dataprep. We uploaded the 11 csv files that were extracted from the Nova Scotia Open Data Portal
  • 28. Import the Data and Add to Flow
  • 29. Recipe to Remove Rows with Null Values
  • 30. Remove Rows with Null Values
  • 31. Remove Rows with ‘InVld’
  • 32. Remove Rows with ‘NoData’
  • 33. Remove Rows with ‘<Samp’
  • 34.
  • 35. Extraction: csv to staging tables -- Extract csv to staging tables via Excel Macro using SQL Loader. -- First, create upload table: CREATE TABLE "DWUSER"."UPLOAD_TABLE" ( "DATE_TIME" VARCHAR2(250 BYTE), "POLLUTANT" VARCHAR2(250 BYTE), "UNIT" VARCHAR2(250 BYTE), "STATION" VARCHAR2(250 BYTE), "INSTRUMENT" VARCHAR2(250 BYTE), "AVERAGE" FLOAT(126) ) ;
  • 36. Extraction: csv to staging tables Next, utilize Excel Macro to load csv’s via SQL Loader:
  • 37. Extraction: csv to staging tables -- Run SQL*Loader via control files, in conjunction with Excel Macros: LOAD DATA INFILE * APPEND INTO TABLE HLFX_BAM_MONTHLY_AVG Fields terminated by '~' Trailing Nullcols (DATE_TIME,POLLUTANT,STATION,AVERAGE) BEGINDATA 9-Jan-2008~PM2.5~Halifax~4.814259 -- Load all 11 csv files via this method
  • 38. Testing Log file of initial extraction of csv files into Oracle DB table: SQL*Loader: Release 11.2.0.2.0 - Production on Fri Mar 9 14:05:54 2018 Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved. Control File: C:JUNKINTREVERSE.CTL Data File: C:JUNKINTREVERSE.CTL Bad File: C:JUNKINTREVERSE.bad Discard File: none specified (Allow all discards) Number to load: ALL Number to skip: 0 Errors allowed: 50 Bind array: 64 rows, maximum of 256000 bytes Continuation: none specified Path used: Conventional
  • 39. Testing: continued Table HLFX_BAM_MONTHLY_AVG, loaded from every logical record. Insert option in effect for this table: APPEND TRAILING NULLCOLS option in effect Column Name Position Len Term Encl Datatype ------------------------------ ---------- ----- ---- ---- --------------------- DATE_TIME FIRST * ~ CHARACTER POLLUTANT NEXT * ~ CHARACTER STATION NEXT * ~ CHARACTER AVERAGE NEXT * ~ CHARACTER Table HLFX_BAM_MONTHLY_AVG: 132 Rows successfully loaded. 0 Rows not loaded due to data errors. 0 Rows not loaded because all WHEN clauses were failed. 0 Rows not loaded because all fields were null.
  • 40.
  • 41.
  • 42.
  • 43. Transformation -- Upload table transformed into tables partitioned by Station (location): CREATE TABLE "DWUSER"."EVTB_HALIFAX_UPLOAD" ( "DATE_TIME" TIMESTAMP (6), "POLLUTANT" VARCHAR2(250 BYTE), "UNIT" VARCHAR2(250 BYTE), "STATION" VARCHAR2(250 BYTE), "INSTRUMENT" VARCHAR2(250 BYTE), "AVERAGE" FLOAT(126)); -- Repeat this for the 7 other Station Locations
  • 44.
  • 45.
  • 46. Load Dim Tables and Fact Table CREATE TABLE "DWUSER"."EVTB_FACT" ( "TIME_ID" VARCHAR2(250 BYTE), "DATE_ID" VARCHAR2(250 BYTE), "STATION_ID" VARCHAR2(250 BYTE), "INSTRUMENT_ID" VARCHAR2(250 BYTE), "AVERAGE" FLOAT(126), "ID" VARCHAR2(250 BYTE), CONSTRAINT "PK_FACT" PRIMARY KEY ("ID"));
  • 47. Load Dim Tables and Fact Table CREATE TABLE "DWUSER"."EVTB_DIM_STATION" ( "STATION_ID" VARCHAR2(250 BYTE), "STATION" VARCHAR2(250 BYTE), "DESCRIPTION" VARCHAR2(250 BYTE), CONSTRAINT "PK_STATION" PRIMARY KEY ("STATION_ID")); -- Repeat this for DIM_DATE, DIM_INSTRUMENT, and DIM_TIME
  • 48.
  • 49. Visualization Fact table and Dim tables from Oracle database connected directly to Tableau for visualization