The Paradigm of Fog Computing
with Bio-inspired Search Methods
and the “5Vs” of Big Data
Presenters:
Richard Millham, Israel Edem
Agbehadji, and Samuel Ofori Frimpong
Durban University of Technology, South Africa
Outline
• Introduction
• Growth of Big Data
• The 5Vs of Big Data
• Framework to Manage Big Data
• Data Streaming vs Datasets
• Edge/Fog Computing paradigm
• Challenges of Fog computing and Potential
Solutions
• Conclusion
Introduction
• This presentation seeks to briefly present some of the issues of
big data:
• What characteristics constitute big data?
• What methods and phases are needed to process big data?
• Datasets vs data streams: what is the difference?
• What is the role and domain of bio-inspired algorithms?
• What are the drivers for fog/edge computing architecture?
Big data
• Like many concepts, there is no consensus on what constitutes big data
• Many will say big data is a voluminous amount of varied data arriving at a high rate, but it possesses other characteristics as well (the 5 Vs)
• On its own, big data yields neither meaning nor value; it is important to understand the unique features of the data, which may inform the analysis
• Any framework for analysing big data must address the big data characteristics, namely velocity, variety, veracity, volume and value
• Sources of big data are numerous and have evolved with our changing society:
• IoT and smart entities
• Enterprise systems
• Social media
The growth of IoT, along with the resulting growth of IoT data, is one of the main contributors to the growth of big data and to the need for methods to manage it.
Smart Cities and IOT Sensors/Data Analytics
Smart City IOT/Data Analytics
• Smart cities enable their citizens to enjoy a wide range of new services:
• The health sector can monitor the quality of service delivery
• Government gains better insights for better social intervention programs for citizens
• Companies can understand customers' perceptions of their products
• These services are enabled through the use of IoT sensors to monitor the environment and data analytics to make sense of the data collected
The 5-Vs of Big Data
Big Data Framework
• Managing big data requires a framework consisting of a set of steps and phases. Although some of these phases may overlap and the steps may vary, the framework is as follows:
• Data Pre-Processing
• Data Cleansing
• Acquire data from a multitude of heterogeneous sources: social media, IoT sensors, mobile phones, enterprise system transactions, GPS devices, etc.
• Estimate missing values, if needed
• Remove redundant values
• Reformat heterogeneous data into more uniform format(s)
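The cleansing steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenters' implementation: the record layout (`sensor`, `ts`, `temp`/`temperature`), the mean-imputation rule and the function name `cleanse` are all hypothetical.

```python
from statistics import mean

def cleanse(records):
    """Hypothetical cleansing pass: reformat, deduplicate, then impute."""
    # Reformat heterogeneous records into one uniform shape.
    uniform = []
    for r in records:
        temp = r.get("temp", r.get("temperature"))  # two assumed field spellings
        uniform.append({
            "sensor": str(r["sensor"]).lower(),
            "ts": int(r["ts"]),
            "temp": None if temp is None else float(temp),
        })

    # Remove redundant values: keep the first record per (sensor, ts) pair.
    seen, deduped = set(), []
    for r in uniform:
        key = (r["sensor"], r["ts"])
        if key not in seen:
            seen.add(key)
            deduped.append(r)

    # Estimate missing values with the mean of the observed ones (one simple choice).
    observed = [r["temp"] for r in deduped if r["temp"] is not None]
    fill = mean(observed) if observed else None
    for r in deduped:
        if r["temp"] is None:
            r["temp"] = fill
    return deduped

raw = [
    {"sensor": "A", "ts": "1", "temp": "20.0"},
    {"sensor": "a", "ts": 1, "temperature": 20.0},   # duplicate once reformatted
    {"sensor": "B", "ts": 2, "temp": None},          # missing value
    {"sensor": "C", "ts": 3, "temperature": "24"},
]
clean = cleanse(raw)
```

Reformatting runs first so that duplicates which differ only in spelling or type are recognised as the same record.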
Big Data Framework (cont)
Data Cleansing (Data Reduction)
[Figure: data scattered in 3-D space]
• One of the most important steps
in data cleansing is data
reduction (reducing the amount
of data to be processed by later
stages). This can be
accomplished by:
• Removing outliers (noise)
• Removing redundant data
• Removing non-interesting data
(with little value)
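As one illustration of outlier removal, the sketch below drops readings that lie far from the median. The rule used (distance greater than three times the median absolute deviation), the function name `remove_outliers` and the sample readings are assumptions chosen for the example; the slides do not prescribe a particular test.

```python
from statistics import median

def remove_outliers(values, threshold=3.0):
    """Drop values whose distance from the median exceeds threshold * MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return list(values)  # no spread to judge against
    return [v for v in values if abs(v - med) <= threshold * mad]

readings = [21.0, 21.5, 20.8, 21.2, 95.0, 21.1, 20.9]  # 95.0 is a faulty reading
kept = remove_outliers(readings)
```

A median-based rule is used here because a single extreme value barely shifts the median, whereas it can distort a mean-based threshold.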
Big Data Framework (cont)
• After data cleansing is complete, the next step is data clustering: combining similar items into groups so that the data is easier to process in later stages
• Clustering methods include:
• K-Nearest Neighbour (strictly a classification method; K-means is the usual clustering analogue)
• Density-based scan (e.g. DBSCAN), which can discover clusters of different shapes
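To make density-based clustering concrete, here is a compact sketch in the style of DBSCAN, written with the standard library only. The parameter names `eps` and `min_pts` follow common usage, and the sample points are invented for illustration; a production system would more likely use an established library implementation.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # All points within eps of point i (the point itself included).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:   # core point: keep growing the cluster
                queue.extend(more)
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),      # one dense group
          (10, 10), (10, 11), (11, 10),        # a second dense group
          (50, 50)]                            # an isolated point
labels = dbscan(points, eps=1.5, min_pts=3)
```

Because clusters grow by chaining dense neighbourhoods, their shape is not fixed in advance, and isolated points are left as noise instead of being forced into a group.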
Big Data Framework (cont)
Feature Extraction and Classification
• The next step after data clustering
is feature extraction and
classification where important
features are extracted from the
data and classified (labeled). This
reduces the amount of resources
used to describe a group of data
• Many tools may be used, including:
• Autoencoders (to learn features from unlabeled data)
Big Data Framework (cont)
• Data Mining Phase
• This phase involves finding relationships
among groups of data identified during the
previous phase
• These relationships include correlations
(dependencies among variables) and
association rules (if-then rules) among others
• Methods include Apriori, PageRank etc.
• Many data mining tools exist, using a variety
of methods, including:
• Orange
• Weka
• Apache Mahout
• RapidMiner
• KNIME (integrates various components for machine learning and data mining)
Big Data Framework (cont)
• Visualisation/Business Intelligence Phase
• In this phase, the data relationships and classes identified in previous stages may be visualized in the form of pie charts, line graphs, and other diagrams, and/or incorporated into business rules within the organization.
• Some examples:
• A line graph may show the increase or decrease in sales of particular products based on the particular features offered. Hence, businesses may be able to determine the most popular features for each price range
• Business rules may capture associations between different itemsets. For example, a store might find a strong association between the sale of hamburgers and the sale of rolls.
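The strength of an association such as hamburgers and rolls is commonly measured by support (how often the items occur together) and confidence (how often the consequent appears given the antecedent). The sketch below computes both; the basket data and function names are invented for illustration.

```python
def support(transactions, items):
    """Fraction of transactions containing every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

baskets = [
    {"hamburgers", "rolls", "cola"},
    {"hamburgers", "rolls"},
    {"hamburgers", "rolls", "mustard"},
    {"rolls", "butter"},
    {"milk"},
]
sup = support(baskets, {"hamburgers", "rolls"})        # 3 of the 5 baskets
conf = confidence(baskets, {"hamburgers"}, {"rolls"})  # every hamburger basket has rolls
```

Algorithms such as Apriori search for all itemsets whose support exceeds a chosen minimum; this sketch only evaluates a single candidate rule.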
Datasets vs Data Streams
• Datasets may have high volume, veracity, value and variety but are often fixed in terms of velocity. In other words, a dataset may exhibit four of the Vs of big data and may be built from high-velocity data arriving while it is being formed. However, once the dataset is formed, it is stable. Consequently, many different methods and tools may be used to analyse it
• Data streams, on the other hand, share these characteristics but also have continuous high velocity, with often-changing variety, value, and veracity. Analysing such data is therefore problematic and requires huge computational resources (e.g. a supercomputer)
Datasets vs Data Streams (cont)
• As relying on supercomputer-scale resources is not usually practical, different methods must be used to manage data streams, including:
• Fixed or random sampling of the stream (e.g. 1 in 50 frames) to get a snapshot of current data
• Sliding windows to hold these samples and to ensure that they stay current as the stream changes
• Potentially different methods, designed for data streams, that handle the high velocity and still produce satisfactory results
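Fixed sampling and sliding windows can be combined as in the sketch below, which assumes a stream of integer frame numbers as a stand-in for real data. The function names and the window size of 3 are illustrative choices.

```python
from collections import deque
from itertools import count, islice

def sample_every(stream, n):
    """Fixed sampling: yield one item out of every n (the 1st, (n+1)th, ...)."""
    for i, item in enumerate(stream):
        if i % n == 0:
            yield item

def sliding_windows(samples, size):
    """Yield the most recent `size` samples each time a new one arrives."""
    window = deque(maxlen=size)   # the oldest sample falls off automatically
    for s in samples:
        window.append(s)
        if len(window) == size:
            yield list(window)

frames = count()                        # stand-in for an unbounded stream: 0, 1, 2, ...
sampled = sample_every(frames, 50)      # keep 1 in 50 frames
windows = list(islice(sliding_windows(sampled, 3), 3))
```

Both functions are generators, so the stream is consumed lazily and memory use is bounded by the window size instead of the stream length.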
Big Data Analytics
• The following diagram shows some of the methods mentioned, or to be mentioned, in this presentation under the term Big Data Analytics:
• Batch (dataset) vs stream processing
• Machine learning and advanced
learning (feature extraction,
classification, and business rules)
• Data mining
• Stochastic (probability) models for preprocessing of noise, feature extraction, classification, etc.
• Edge computing and cloud computing
Bio-inspired Computation
• Bio-inspired computation models the natural behavior of animals
(optimized over a very long time period) to achieve some set goal
• Numerous bio-inspired algorithms exist (200+), each with its own advantages and disadvantages
• One basic premise of these algorithms is the balance between exploration and exploitation
• Exploration: search different regions of the solution space to find a global solution
• Exploitation: search a small region around the present solution in order to improve its quality with a small perturbation
• Bio-inspired algorithms have been used in many application domains, such as route optimization, recommender systems, and renewable energy
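The exploration/exploitation trade-off can be illustrated with a deliberately simple stochastic search. This is not any named bio-inspired algorithm; the function `search`, the test function `bumpy` and all parameter values are assumptions made for the sketch.

```python
import math
import random

def search(f, lo, hi, iters=2000, explore_prob=0.2, step=0.1, seed=0):
    """Minimise f on [lo, hi] by mixing global exploration and local exploitation."""
    rng = random.Random(seed)
    best = rng.uniform(lo, hi)
    history = [f(best)]
    for _ in range(iters):
        if rng.random() < explore_prob:
            cand = rng.uniform(lo, hi)            # exploration: anywhere in the space
        else:
            cand = best + rng.gauss(0.0, step)    # exploitation: small perturbation
            cand = min(hi, max(lo, cand))         # stay inside the bounds
        if f(cand) < f(best):
            best = cand                           # keep only improvements
        history.append(f(best))
    return best, history

def bumpy(x):
    """Hypothetical test function with several local minima."""
    return x * x + 3.0 * math.sin(5.0 * x)

best, history = search(bumpy, -5.0, 5.0)
```

Raising `explore_prob` makes the search less likely to stall in a local minimum but slows refinement of the current solution; real bio-inspired algorithms typically tune this balance through population behaviour.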
Bio-inspired Computation (cont.)
• Search strategy based on the
behaviour of animals in their
natural habitat.
Application Domains of Bio-inspired Algorithms
Why is Edge/Fog Computing Needed?
Cloud Computing
Problems with Cloud – Need for New Paradigm
• As illustrated in the diagram, big data (huge amounts, from many types of devices) flows at high speed to the cloud, to be processed there using the data framework
• The network soon becomes overloaded, because many early phases (preprocessing and data reduction) are done only in the cloud [bottleneck]
Fog Computing Paradigm
• The focus is on devices connected to the
edge of networks.
• Fog computing, or edge computing, operates on the concept that, instead of devices working through a centralized location such as a cloud server, fog systems operate at the network ends (Naha et al. 2018).
• An advantage of fog computing is that it avoids delay by processing the raw data collected from edge networks near its source, rather than sending it directly to the cloud for processing
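The idea of processing raw data at the edge before anything reaches the cloud can be sketched as a fog node that forwards batch summaries in place of individual readings. The function name `fog_summarise`, the summary fields and the synthetic readings are all hypothetical; the sketch only illustrates the reduction in traffic, not latency in a real deployment.

```python
from statistics import mean

def fog_summarise(readings, batch_size):
    """Hypothetical fog-node step: reduce each batch of raw readings to one summary."""
    summaries = []
    for start in range(0, len(readings), batch_size):
        batch = readings[start:start + batch_size]
        summaries.append({"n": len(batch), "min": min(batch),
                          "max": max(batch), "mean": mean(batch)})
    return summaries

raw = [20 + (i % 5) for i in range(100)]       # 100 synthetic sensor readings
to_cloud = fog_summarise(raw, batch_size=20)   # only the summaries leave the edge
reduction = len(raw) / len(to_cloud)           # messages without vs with the fog step
```

In this toy setting the cloud receives 5 messages instead of 100; what to keep in each summary depends on what the later analysis phases need.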
Fog computing applications
• Smart city monitoring
• Energy-efficient model
• Fog computing in health monitoring
Quality Challenge of Fog Computing and the 5Vs, and a Solution
• There are many issues in fog computing with big data, but a key challenge is data quality.
• Solution: Fog Computing and “5Vs” for Quality-of-Use (QoU) Framework.
• This framework has an analytical model that considers the speed, size and type of data from IoT devices and then determines the quality and importance of the data to store on the cloud platform.
• The framework has two components, namely IoT (data) and fog computing
• The IoT (data) component is where the sensors and Internet-enabled devices are located; these capture large amounts of data, at speed and of different types
• The data generated are processed and analyzed by the fog computing component to produce useful, quality data
More Challenges in Fog Computing and IoT
• The challenges include:
• energy consumption
• data distribution
• heterogeneity of edge devices
• dynamicity of the fog network, etc.
• This motivates the search for new methods to address these challenges
• One promising approach is the use of bio-inspired algorithms (a subset of evolutionary algorithms) to manage different aspects of these problems
Fog Computing and Evolutionary Algorithm Models
• Evolutionary Algorithm for Energy Efficient Model.
• Bio-Inspired Algorithm for Scheduling of Service Requests to Virtual Machines (VMs).
• Bio-Inspired Algorithms and Fog Computing for Intelligent
Computing in Logistic Data Center.
• Ensemble of Swarm Algorithm for Fire-and-Rescue
Operations.
• Evolutionary Computation and Epidemic Models for Data
Availability in Fog Computing.
• Bio-Inspired Optimization for Job Scheduling in Fog
Computing.
Conclusion
• This presentation is a brief overview of big data along with many of its aspects
• Increasing technological and societal changes are making big data much more prevalent
• With increasing prevalence of big data comes a demand to manage this data (particularly
data streams) through new methods and new architectures (edge/fog computing)
• Promising methods have emerged in the field of bio-inspired algorithms which have been
applied to a variety of domains, including challenges with new architectures
The Paradigm of Fog Computing with Bio-inspired Search Methods and the “5Vs” of Big Data

  • 1. The Paradigm of Fog Computing with Bio-inspired Search Methods and the “5Vs” of Big Data Presenters: Richard Millham, Israel Edem Agbehadji, and Samuel Ofori Frimpong Durban Univeristy of Technology, South Afrca
  • 2. Outline • Introduction • Growth of Big Data • The 5Vs of Big Data • Framework to Manage Big Data • Data Streaming vs Datasets • Edge/Fog Computing paradigm • Challenges of Fog computing and Potential Solutions • Conclusion Durban Univeristy of Technology, South Afrca
  • 3. Introduction • This presentation seeks to briefly present some of the issues of big data: • What characteristics constitute big data? • What methods and phases are needed to process big data? • Datasets vs data streaming? What is the difference? • What is the role and domain of bio-inspired algorithms? • The drivers for fog/edge computing architecture? Durban Univeristy of Technology, South Afrca
  • 4. Big data • Like many concepts, there is no consensus of what constitutes big data • Many will say Big data is a voluminous amount of varied data available at high rate, but it possesses other characteristics as well (5 Vs) • Big data yields neither meaning nor value, it is important to understand the unique features of data which may inform the analysis • Any framework of analysing big data must address big data characteristics namely velocity, variety, veracity, volume and value • Sources of big data are numerous but have evolved with our changing society • IOT and smart entities • Enterprise systems • Social media Durban Univeristy of Technology, South Afrca
  • 5. The growth of IOT, along with the subsequent growth of IOT data, is one of the main contributors to the growth of Big Data and the need for methods to manage it Durban Univeristy of Technology, South Afrca
  • 6. Smart Cities and IOT Sensors/Data Analytics Smart City IOT/Data Analytics • Smart cities enables its citizens to enjoy a wide range of new services: • health sector to monitor quality of service delivery • Government gains better insights for better social intervention programs to citizens • Companies to customers to understand customers perception of products • These services are enabled through the use of IOT sensors to monitor the environment and data analytics to make sense of the monitored data collected Durban Univeristy of Technology, South Afrca
  • 7. The 5-Vs of Big Data Durban Univeristy of Technology, South Afrca
  • 8. Big Data Framework • To manage big data, a framework consisting of a set of steps and phases. Although some of these phases may overlap and the steps may vary, this framework is as follows: • Data Pre-Processing • Data Cleansing • Acquire data from a multitude of heterogeneous devices: social media, IOT sensors, mobile phones, enterprise system transactions, GPS devices, etc • Estimate missing values, if needed • Remove redundant values • Reformat heterogeneous data into a more uniform format(s) Durban Univeristy of Technology, South Afrca
  • 9. Big Data Framework (cont) Data Scattered in 3-D space Data Cleansing (Data Reduction) • One of the most important steps in data cleansing is data reduction (reducing the amount of data to be processed by later stages). This can be accomplished by: • Removing outliers (noise) • Removing redundant data • Removing non-interesting data (with little value) Durban Univeristy of Technology, South Afrca
  • 10. Big Data Framework (cont) • After data cleaning is complete, the next step is data clustering or the combining of similar items together into groups for easier processing of data in later stages • Clustering methods include: • K-Nearest Neighbour • Density-Based scan discovers different cluster shapes Durban Univeristy of Technology, South Afrca
  • 11. Big Data Framework (cont) Feature Extraction and Classification • The next step after data clustering is feature extraction and classification where important features are extracted from the data and classified (labeled). This reduces the amount of resources used to describe a group of data • Many tools may be used including: • Autoencoder (to learn unlabeled data) Durban Univeristy of Technology, South Afrca
  • 12. Big Data Framework (cont) • Data Mining Phase • This phase involves finding relationships among groups of data identified during the previous phase • These relationships include correlations (dependencies among variables) and association rules (if-then rules) among others • Methods include Apriori, PageRank etc. • Many data mining tools exist, using a variety of methods, including: • Orange • Weka • Apache Mahout • RapidMiner • KNIME integrates various components for machine learning and data mining. Durban Univeristy of Technology, South Afrca
  • 13. Big Data Framework (cont) • Visualisation/Business Intelligence Phase • In this phase, the data relationships and classes identified in previous stages may be visualized in the form of pie graphs, charts, linear diagrams, etc and/or incorporated into business rules within the organization. • Some examples: • Linear graph may show the increase/decrease in sales of particular products based on particular features offered. Hence, businesses may be able to determine the most popular features for each price range • Business rules may find associations between different itemsets. An example, a store might find a strong association between the sale of hamburgers and rolls. Durban Univeristy of Technology, South Afrca
  • 14. Datasets vs Data Streams • Datasets may consist of high volume, veracity, value and variety but are often fixed in terms of velocity. In other words, these datasets may contain the 4 Vs of big data and are modelled on high velocity data coming in during the formation of the dataset. However, once this dataset is formed, they are stable. Consequently, many different methods and tools may be used to analyse them • Data streaming, on the other hand, contains the same characteristics of datasets but also contain continuous high velocity with often changing varieties, values, and veracities of data. Analysis of this data, due to these characteristics, is problematic and requires huge resources in computation (i.e. a supercomputer) Durban Univeristy of Technology, South Afrca
  • 15. Datasets vs Data Streams (cont) • As this solution is not usually practical, different methods must be used to manage data streams including: • Fixed or random sampling of the stream (ex: 1 in 50 frames) to get a snapshot of current data • Sliding windows to contain these samples and to ensure that these samples are current as the streams may change • Potentially different methods that are used for data streams in order to handle the high velocity and produce satisfactory results Durban Univeristy of Technology, South Afrca
  • 16. Big Data Analytics • Following diagram shows some of the methods mentioned or to be mentioned in presentation under the term Big Data Analytics • Batch (dataset) vs stream processing • Machine learning and advanced learning (feature extraction, classification, and business rules) • Data mining • Stochastic (probability) models for preprocessing of noise, feature extraction, classification, etc • Edge computing and cloud computing Durban Univeristy of Technology, South Afrca
  • 17. Durban Univeristy of Technology, South Afrca
  • 18. Bio-inspired Computation • Bio-inspired computation models the natural behavior of animals (optimized over a very long time period) to achieve some set goal • Numerous bio-inspired algorithms exist (200+) each with their advantages and disadvantages • One basic premise of these algorithms is exploration vs exploitation • exploration:- search different regions of the solution space to find a global solution • exploitation:- search in a small region of the present solution in order to improve its quality with a small perturbation • Bio-inspired algorithms have been used in many application domains such as route optimization, recommender systems, renewable energy Durban Univeristy of Technology, South Afrca
  • 19. Bio-inspired Computation (cont.) • The search strategy is based on the behaviour of animals in their natural habitat.
  • 20. Application Domains of Bio-inspired Algorithms
  • 21. Why is Edge/Fog Computing Needed? Cloud Computing: Problems with Cloud and the Need for a New Paradigm • As illustrated in the diagram, big data (huge amounts of data from many types of devices, flowing at high speed) is sent to the cloud to be processed using a data framework there • The network soon becomes overloaded because many early phases (preprocessing and data reduction) are done only in the cloud [bottleneck]
  • 22. Fog Computing Paradigm • The focus is on devices connected to the edge of networks. • Fog computing, or edge computing, rests on the idea that, instead of having devices work from a centralized location (a cloud server), fog systems operate at the network ends (Naha et al. 2018). • An advantage of fog computing is that it avoids delay by processing raw data collected from edge networks near its source, rather than sending it directly to the cloud for processing
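As a rough illustration of how preprocessing and data reduction at the network edge can ease the load on the cloud, the Python sketch below reduces a batch of raw readings to a small summary before upload. The function name, the summary fields, the threshold, and the sample readings are hypothetical; the presentation does not prescribe a particular reduction scheme.

```python
from statistics import mean
from typing import Dict, List


def summarize_at_fog_node(raw_readings: List[float], threshold: float) -> Dict[str, float]:
    """Reduce a batch of raw sensor readings to a compact summary for the cloud."""
    return {
        "count": len(raw_readings),                               # readings seen locally
        "mean": mean(raw_readings),                               # typical value in the batch
        "max": max(raw_readings),                                 # peak value in the batch
        "alerts": sum(1 for r in raw_readings if r > threshold),  # readings above threshold
    }


# Hypothetical batch of temperature readings collected at the edge.
batch = [21.0, 21.5, 22.0, 35.0, 21.5]
summary = summarize_at_fog_node(batch, threshold=30.0)
```

Only the four summary values cross the network instead of every raw reading, which is the kind of early-phase reduction that would otherwise happen in the cloud.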
  • 25. Fog Computing Applications • Smart city monitoring • Energy efficient model • Fog computing in health monitoring
  • 26. Quality Challenge of Fog Computing and the 5Vs, and a Solution • There are many issues in fog computing with big data, but a key challenge is data quality. • Solution: Fog Computing and “5Vs” for Quality-of-Use (QoU) Framework. • This framework has an analytical model that considers the speed, size and type of data from IoT devices and then determines the quality and importance of the data to store on a cloud platform. • The framework has two components, namely IoT (data) and fog computing • The IoT (data) component is where the sensors and Internet-enabled devices are located; these capture large amounts of data of different types at high speed • The generated data are processed and analysed by the fog computing component to produce useful, quality data
  • 27. More Challenges in Fog Computing and IoT • The challenges include: • energy consumption • data distribution • heterogeneity of edge devices • dynamicity of the fog network, etc. • These challenges call for new methods to address them • One promising method is the use of bio-inspired algorithms (a subset of evolutionary algorithms) to manage different aspects of these problems
  • 28. Fog Computing and Evolutionary Algorithm Models • Evolutionary Algorithm for Energy Efficient Model. • Bio-Inspired Algorithm for Scheduling of Service Requests to Virtual Machines (VMs). • Bio-Inspired Algorithms and Fog Computing for Intelligent Computing in Logistic Data Center. • Ensemble of Swarm Algorithm for Fire-and-Rescue Operations. • Evolutionary Computation and Epidemic Models for Data Availability in Fog Computing. • Bio-Inspired Optimization for Job Scheduling in Fog Computing.
  • 29. Conclusion • This presentation is a brief overview of big data and many of its aspects • Ongoing technological and societal changes are making big data much more prevalent • With the increasing prevalence of big data comes a demand to manage this data (particularly data streams) through new methods and new architectures (edge/fog computing) • Promising methods have emerged in the field of bio-inspired algorithms, which have been applied to a variety of domains, including the challenges raised by these new architectures

Editor's Notes

  1. K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions).
  2. Auto-encoder: a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning).
  3. PageRank is a link analysis algorithm that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web.
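
The k-nearest-neighbours description in note 1 can be made concrete with a minimal Python sketch. It assumes Euclidean distance as the similarity measure and a simple majority vote. The function name, the value k = 3, and the tiny labelled data set are hypothetical.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_classify(cases: List[Tuple[Point, str]], query: Point, k: int = 3) -> str:
    """Classify query by majority vote among its k nearest stored cases."""
    # "Stores all available cases": no training step, just keep the labelled points.
    nearest = sorted(cases, key=lambda case: dist(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical stored cases: (feature vector, class label).
stored_cases = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.5), "low"),
    ((8.0, 8.0), "high"),
    ((8.5, 9.0), "high"),
    ((9.0, 8.5), "high"),
]

label_near_origin = knn_classify(stored_cases, (2.0, 2.0))
label_far_corner = knn_classify(stored_cases, (8.0, 9.0))
```

Other distance functions can be substituted for `math.dist` without changing the rest of the procedure.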