Big Data
Airlines Project
ZIYAD SALEH
What is Big Data?
Big data is a broad term for data sets so large or complex that they
are difficult to process using traditional data processing applications.
Big data means terabytes (1 TB = 1024 GB) of data to be processed and
analyzed. Terabytes of new data are generated daily, so the speed of
analyzing this flow of data is itself a challenge.
Big data can be described by the 4 Vs: Volume, Velocity, Variety,
and Veracity.
Small Data Vs. Big Data
Map Reduce
Map Reduce model
Project Scope
The scope is limited to:
1. Installing and configuring the Hadoop Map/Reduce platform.
2. Analyzing a big data sample of U.S. domestic flight performance
and delays over 5 years, in order to:
   1. Identify the top carriers experiencing delays.
   2. Identify the top airports and states with departure delays.
   3. Plot state delays on a thematic map of the USA.
Source of Data for the project
Datasets will be collected from:
U.S. Department of Transportation's (DOT) – Statistical Computing
Size of Data
The dataset will be between 500 GB and 1 TB in size and will cover
5 years of flight statistics.
Field Name          Description
Year                Year of the scheduled flight.
Month               Month of the scheduled flight (1–12).
Day                 Day of the month (1–31).
DepTime             Actual departure time of the flight.
CRSDepTime          Scheduled departure time.
ArrTime             Actual arrival time in HH/MM format.
CRSArrTime          Scheduled arrival time.
FlightNum           Flight number.
ArrDelay            Arrival delay, in minutes.
DepDelay            Departure delay, in minutes.
CarrierDelay        Delay (in minutes) caused by factors within the control of the carrier.
WeatherDelay        Delay (in minutes) caused by extreme weather conditions.
NASDelay            Delay (in minutes) within the control of the National Airspace System (NAS).
SecurityDelay       Delay (in minutes) caused by security reasons.
LateAircraftDelay   Delay (in minutes) due to the same aircraft arriving late at a previous airport.
Table 1: Airline Dataset Dictionary.
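As a rough illustration of how one row of this dictionary could be read into a typed record, here is a minimal Python sketch. It assumes the data arrives as CSV with a header row using the field names from Table 1, and that missing delays are written as blanks or "NA"; both are assumptions, and the two sample rows are made up for the example. The names FlightRecord, to_minutes, and read_flights are hypothetical helpers, not part of the project.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


def to_minutes(cell: str) -> Optional[int]:
    """Convert a delay cell to whole minutes; blank or 'NA' (assumed marker) means missing."""
    cell = cell.strip()
    if cell == "" or cell.upper() == "NA":
        return None
    return int(float(cell))


@dataclass
class FlightRecord:
    """A small subset of the fields in Table 1."""
    year: int
    month: int
    day: int
    flight_num: str
    arr_delay: Optional[int]
    dep_delay: Optional[int]


def read_flights(csv_text: str) -> List[FlightRecord]:
    """Parse CSV text whose header row uses the Table 1 field names."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append(FlightRecord(
            year=int(row["Year"]),
            month=int(row["Month"]),
            day=int(row["Day"]),
            flight_num=row["FlightNum"],
            arr_delay=to_minutes(row["ArrDelay"]),
            dep_delay=to_minutes(row["DepDelay"]),
        ))
    return records


# Made-up sample rows, for illustration only.
SAMPLE = (
    "Year,Month,Day,FlightNum,ArrDelay,DepDelay\n"
    "2008,1,3,335,-14,8\n"
    "2008,1,3,3231,NA,NA\n"
)
flights = read_flights(SAMPLE)
```

Keeping missing delays as None rather than 0 avoids counting an unknown delay as an on-time flight.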
Data Pre-Processing, Processing and Analytics
Data pre-processing:
Data will be cleansed, and some artifacts will be filtered out as
necessary. Many fields in the airline data set need to be discarded
because they are irrelevant to the subject of delay that concerns us.
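The cleansing step described above might look roughly like the following Python sketch. The list of retained columns is an assumption made for the example ("Carrier" and "TailNum" are hypothetical column names that do not appear in Table 1), as is the rule that rows with a missing delay value are dropped.

```python
import csv
import io

# Assumed subset of columns relevant to delay analysis; "Carrier" is hypothetical.
KEEP = ["Year", "Month", "Carrier", "ArrDelay", "DepDelay"]
MISSING = ("", "NA")  # assumed markers for missing values


def cleanse(raw_csv: str) -> list:
    """Keep only the KEEP columns and drop rows with a missing delay value."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        slim = {name: (row.get(name) or "").strip() for name in KEEP}
        if slim["ArrDelay"] in MISSING or slim["DepDelay"] in MISSING:
            continue  # filter out rows that cannot contribute to delay totals
        cleaned.append(slim)
    return cleaned


# Made-up rows, for illustration only.
RAW = (
    "Year,Month,Carrier,TailNum,ArrDelay,DepDelay\n"
    "2008,1,WN,N712SW,-14,8\n"
    "2008,1,AA,N123AA,NA,NA\n"
    "2008,2,DL,N456DL,30,25\n"
)
cleaned_rows = cleanse(RAW)
```

Dropping unused columns early shrinks every record before it reaches the Map/Reduce stage.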
Data Processing and Analytics:
Data will be processed with Java programs on Map/Reduce to reduce
the size of the data and produce smaller, organized datasets.
Next, the resulting datasets will be analyzed using additional tools
such as R.
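The project runs this step as Java code on Hadoop. The Python sketch below is not that code; it only simulates the same map, shuffle, and reduce pattern in a single process to show how raw rows shrink into one total per carrier. The two-column input layout ("Carrier,DepDelay") and the rule of counting only positive delays are assumptions made for the example.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit (carrier, delay_minutes) for one 'Carrier,DepDelay' line."""
    fields = line.strip().split(",")
    if len(fields) != 2:
        return
    carrier, delay = fields
    try:
        minutes = int(delay)
    except ValueError:
        return  # skips the header row and missing values such as 'NA'
    if minutes > 0:  # assumption: early or on-time departures add no delay
        yield carrier, minutes


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(carrier, delays):
    """Reduce step: collapse one carrier's delays into a single total."""
    return carrier, sum(delays)


def run_job(lines):
    """Run map, shuffle, and reduce over an iterable of input lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


# Made-up input lines, for illustration only.
LINES = ["Carrier,DepDelay", "WN,10", "AA,5", "WN,20", "AA,NA", "DL,-3"]
totals = run_job(LINES)
```

On a real cluster the mapper and reducer run on different nodes and the shuffle is handled by the framework; here all three are ordinary function calls.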
Data Storage
Data will be stored across multiple HDFS storage nodes, with a total
size between 500 GB and 1 TB.
[Diagram: Airlines Big Data stored in HDFS]
Target Analysis:
Over the 5 years of flight information for all US domestic airlines:
1. Which carriers have the most aggregated delay in their flights?
2. Which states have the most delays?
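Once delay totals per carrier or per state are available, answering both questions reduces to a ranking. The Python sketch below shows one way to do that; the helper name top_n and all numbers are made up for illustration and are not results from the project.

```python
import heapq


def top_n(totals, n=3):
    """Return the n (key, total) pairs with the largest totals, largest first."""
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])


# Made-up aggregated delay minutes, for illustration only.
carrier_totals = {"WN": 120, "AA": 300, "DL": 45, "UA": 210}
state_totals = {"TX": 500, "CA": 650, "IL": 400}

top_carriers = top_n(carrier_totals, 2)
top_states = top_n(state_totals, 1)
```

The same function serves both questions because only the grouping key changes.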
Design
Airlines Project Workflow and Design
[Diagram labels: Master Node; Node 1; Node 2; Node 3; Node 4; Name Node; Job Tracker; Airlines Big Data; Task; Java Code; Reducer Node; HDFS; Mapper; Reducer; Top Airlines]
Implementation
Software and Tools
1. CentOS Linux operating system
2. Apache Hadoop
3. Cloudera CDH 5.3 virtual machine
4. Oracle VM VirtualBox Manager
5. Eclipse IDE
6. Java (Oracle JDK)
7. Maven
8. Microsoft Excel and Access 2010
9. The R statistical tool
Mapper:
Reducer:
R:
Findings
US Airlines Delay (Per Carrier)
[Chart: one group per carrier (WN, AA, OO, MQ, US, DL, UA, XE, NW, CO, EV, 9E, FL, YV, OH, B6, AS, F9, HA, AQ, PI, HP, EA, PS, TW); vertical axis from 0 to 1.2; series: ArrivalOnTime, ArrivalDelays, DepartureOnTime, DepartureDelays, Cancellations, Diversions]
Thematic Map of US Airlines Delay (Per State)
Conclusion
• Big data is the large amount of continuously generated data that cannot be processed and
analyzed using traditional data management tools.
• Big data is a new topic that is growing rapidly and reshaping the future. Demand for big
data scientists is high and is expected to continue for the foreseeable future.
• Hadoop is an open-source framework for storing and processing large datasets using clusters
of commodity hardware.
• Big data analytics is attracting both businesses and policy makers, who want to leverage this
new phenomenon for more informed decisions and planning for the future.
• Big data now, normal data tomorrow.
Big Data Tutorials
Online Big Data Tutorials:
1. Udemy: https://www.udemy.com/course/subscribe/?courseId=336982&dtcode=lGCe31035ujY
2. Udacity: https://www.udacity.com/courses#!/data-science
3. EMC: https://education.emc.com/guest/campaign/data_science.aspx
4. Coursera: https://www.coursera.org/course/datasci
5. Caltech's Learning from Data: http://work.caltech.edu/telecourse.html
6. MIT OpenCourseWare: http://ocw.mit.edu/courses/sloan-school-of-management/15-062-data-mining-spring-2003/index.htm
7. Stanford's OpenClassroom: http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning
8. Big Data University: https://bigdatauniversity.com/curriculum-map/
Thank You
Ziyad Saleh
O Allah, teach us what benefits us, benefit us through what You have taught us,
and increase us in knowledge.

Big Data Airline Project at UAEU
