BIG DATA INFRASTRUCTURE AND
ANALYTICS SOLUTION
Erdenebayar Erdenebileg, Oyun-Erdene Namsrai
School of Information Technology, National University of Mongolia
erdenebayar.erdenebileg@gmail.com, oyunerdene@num.edu.mn
Overview

• Introduction
• Methods
• Proposed methods
• Experimental results
• Related work
• Discussion
• Future work

School of Information Technology, National University of Mongolia

FITAT/ISPM 2013
Introduction
• BIG DATA comes from both structured and unstructured information (web data, market purchases, credit card transactions …)
• BIG DATA: 10% is structured data, but 90% is unstructured data

• Almost every organization in Mongolia now faces BIG DATA problems.
• These organizations need to analyze their valuable information and make predictions from it


Why?
How?

Why?
Why are we facing a BIG DATA problem?
Big Data: 3V's
We face a big data problem for reasons of Volume, Variety, and Velocity:
• Volume: transactional data is growing day by day
• Variety: different types of data must be stored
• Velocity: data needs to be processed fast

[Figure: the three V's of big data. Data Velocity (fast analysis requirement): batch, periodic, near real time, real time. Data Variety (many types of data): table, database, photo, web, audio, social, video, mobile, unstructured. Data Volume (large amount of data): MB, GB, TB, PB.]
How?
How do we solve the BIG DATA problem?
How to solve the problem?

The full solution is:
1. Construct the BIG DATA infrastructure
2. Find and develop data transmission tools
3. Implement warehousing and mining tools and techniques
4. Provide BI and analytics tools

[Figure: the four parts of the solution, built on top of the data sources (structured, semi-structured, unstructured).]

Methods and Comparison
RDBMS versus NoSQL database?
RDBMS-based infrastructure
From my experiments:
• Optimization costs more (licenses and servers), although the license cost does not apply to an open source RDBMS
• An RDBMS does not perform well once the data grows beyond gigabytes
• It is not suited to storing unstructured data (video, audio, etc.)

Hadoop-based infrastructure
Based on the experience of the biggest companies (Facebook, Yahoo, Twitter …), the main advantages are:
• Distributed file system paradigm
• Powerful parallel computing framework (MapReduce)
• It can store any type of data: structured, semi-structured, or unstructured
• It is open source, and Hadoop-related products are easy to integrate

Brief introduction: HDFS Architecture
[Figure: a NameNode and a BackupNode above four DataNodes, labeled with balancing, replication, and failover.]

Each DataNode stores data on its local disks.
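
As a rough illustration of the replication idea in the architecture above, the sketch below splits a file into fixed-size blocks and assigns each block to several DataNodes. The node names, the 64 MB block size, the replication factor of 3, and the round-robin policy are all assumptions made for illustration; real HDFS placement is decided by the NameNode and is rack-aware.

```python
from math import ceil

def place_blocks(file_size_mb, datanodes, block_size_mb=64, replication=3):
    """Toy placement: block i goes to `replication` consecutive nodes.

    Assumed policy for illustration only; this is not the HDFS algorithm.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    n_blocks = ceil(file_size_mb / block_size_mb)
    placement = {}
    for i in range(n_blocks):
        placement[i] = [datanodes[(i + r) % len(datanodes)]
                        for r in range(replication)]
    return placement

nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical names for four DataNodes
layout = place_blocks(200, nodes)     # a 200 MB file needs 4 blocks of 64 MB
for block_id, replicas in layout.items():
    print(block_id, replicas)
```

With this layout every block lives on three different nodes, so losing any single DataNode still leaves two copies of each block.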
Brief introduction : MapReduce framework
1. We have a large data set (shown in green)
2. The data is split across different servers
3. Each server aggregates and calculates over its part of the data
4. The consolidated result is returned to the client

[Figure: a Job Tracker and four Task Tracker servers; the data is labeled by year: 2010, 2011, 2012, 2013.]
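
The four steps above can be sketched in plain Python. This is a minimal sketch, not Hadoop code: the (year, amount) records are made up, the four partitions stand in for the four Task Tracker servers, and everything runs sequentially in one process rather than in parallel.

```python
from collections import defaultdict

def split(records, n_servers):
    """Step 2: distribute records across servers (round-robin here)."""
    parts = [[] for _ in range(n_servers)]
    for i, rec in enumerate(records):
        parts[i % n_servers].append(rec)
    return parts

def map_and_combine(part):
    """Step 3: each server sums the values it holds, per year."""
    local = defaultdict(int)
    for year, value in part:
        local[year] += value
    return dict(local)

def reduce_all(partials):
    """Step 4: merge the per-server sums into one result for the client."""
    total = defaultdict(int)
    for partial in partials:
        for year, value in partial.items():
            total[year] += value
    return dict(total)

# Step 1: a tiny, made-up data set of (year, amount) records.
records = [(2010, 5), (2011, 7), (2010, 1), (2013, 4), (2012, 2), (2011, 3)]
result = reduce_all(map_and_combine(p) for p in split(records, 4))
print(sorted(result.items()))
```

Because each server only produces partial sums, the final merge touches far less data than the original input, which is the point of doing the aggregation where the data lives.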

Proposed method & solution
Hadoop and open source technologies
Proposed method selection (Hadoop stack)
The proposed method was selected for the following reasons:
• Data should be stored in a distributed system
• Aggregation and calculation should follow the parallel computing paradigm
• The data is both structured and unstructured: mobile call detail records
• Data size is about 20 TB
• The method should use open source technologies
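
To give a sense of scale for about 20 TB, the estimate below counts HDFS blocks and raw storage. It assumes a 64 MB block size and a replication factor of 3 (the Hadoop 1.x defaults) and binary units; the deck does not state which settings were used, so treat the numbers as illustrative.

```python
TB = 1024 ** 4
MB = 1024 ** 2

def hdfs_footprint(data_bytes, block_bytes=64 * MB, replication=3):
    """Return (number of blocks, raw bytes stored including replicas).

    Block size and replication are assumed defaults, not values from the deck.
    """
    blocks = -(-data_bytes // block_bytes)  # ceiling division
    return blocks, data_bytes * replication

blocks, raw = hdfs_footprint(20 * TB)
print(blocks, raw // TB)
```

Under these assumptions the cluster would track a few hundred thousand blocks and need roughly three times the logical data size in disk capacity.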

Full Infrastructure (3 main methods)
[Figure: the full infrastructure in three layers. Top: client machine (Jasper Business Intelligence) with client software (reporting tool) and JasperReports Server, using a Hive connector and an HBase connector. Middle: clustered big data infrastructure and data processing on physical machines (resources): Machine 1 (slave Hadoop) and Machine 2 (master Hadoop). Bottom: data sender and data resources feeding the big data infrastructure: sensor data (phone, web log, camera, etc.), structured data, semi-structured data, and unstructured data.]

Method 1: Clustered Big Data Infrastructure and Data Processing
• The first task is configuring the BIG DATA infrastructure with analytics products
• The configuration is a cluster of TWO physical machines

Method 2: Data transmission
• Data resources consist of RDBMSs and unstructured data (CDR files, video …)
• For structured data stored in relational databases, we need Sqoop for bulk data transfer
• For unstructured data such as video and files, we need to develop a custom application using the HDFS client (SSH)

• Manual data transfer
• Automatic data transfer (custom application)
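
The choice between the two transfer routes can be sketched as a small dispatcher that builds, but does not run, a command line for each source. `sqoop import` with `--connect`, `--table`, and `--target-dir`, and `hadoop fs -put`, are existing commands; the JDBC URL, table name, and paths are hypothetical, and this is not the custom application the deck refers to.

```python
def transfer_command(source):
    """Build (not run) a command line for moving one source into HDFS."""
    if source["kind"] == "rdbms":
        # Structured data: bulk import with Sqoop.
        return ["sqoop", "import",
                "--connect", source["jdbc_url"],
                "--table", source["table"],
                "--target-dir", source["hdfs_dir"]]
    if source["kind"] == "file":
        # Unstructured data: copy the file with the HDFS shell client.
        return ["hadoop", "fs", "-put", source["local_path"], source["hdfs_dir"]]
    raise ValueError(f"unknown source kind: {source['kind']!r}")

# Hypothetical sources, for illustration only.
sources = [
    {"kind": "rdbms", "jdbc_url": "jdbc:mysql://dbhost/billing",
     "table": "customers", "hdfs_dir": "/landing/customers"},
    {"kind": "file", "local_path": "/data/cdr_001.csv",
     "hdfs_dir": "/landing/cdr"},
]
for src in sources:
    print(" ".join(transfer_command(src)))
```

An automatic transfer tool could run such commands on a schedule; the manual route corresponds to typing them by hand.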

Method 3: Analytics solution over the BIG DATA
This is the main method; it addresses the following concepts:
• Reporting (What happened?)
• Analysis (Why did it happen?)
• Monitoring (What is happening now?)
• Prediction (What will happen?)

Business Intelligence: almost every organization is doing this now.
Predictive Analytics: organizations are focusing on this now.

[Figure: the four concepts plotted by complexity against business value.]
Method 3: Analytics solution over the BIG DATA
• This describes how to report, analyze, monitor, and predict over the BIG DATA infrastructure
[Figure: analytics data flow from the Hadoop Distributed File System (resources) to the end user (analytic tool). Sensor data is loaded into Hive tables and HBase tables. Hive warehouse: Hive table creation, data summarization and analysis with Hive Query Language (HQL), used for reporting, analyzing, and monitoring. HBase table management: HBase table creation, queried through a Thrift server, used for reporting, analyzing, and monitoring. Mahout machine learning and data mining: produces mined data for prediction. Flow labels: aggregated data, ad-hoc query, HBase query, mined data; two components are marked as having direct access to HDFS.]

Experimental results
Testing, Monitoring, Working
Experimental results
The experimental work focused on the following main tasks:
1. Install and configure the BIG DATA infrastructure (a cluster of 2 physical machines)
2. Import sample unstructured data into HDFS over SSH (into the big data infrastructure)
3. Run sample HiveQL queries, HBase queries, and a Mahout job on the MapReduce framework
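
The kind of summarization a sample reporting query performs can be mimicked in a few lines of Python. This is only an analogy for a grouped, ordered, limited query; the record fields (`caller`, `callee`, `seconds`) and the values are hypothetical, and the deck does not show the queries that were actually run.

```python
from collections import Counter

def top_callers(cdrs, n=2):
    """Total call seconds per caller, largest first.

    Comparable in spirit to GROUP BY caller, ORDER BY total DESC, LIMIT n.
    """
    totals = Counter()
    for rec in cdrs:
        totals[rec["caller"]] += rec["seconds"]
    return totals.most_common(n)

# Made-up call detail records.
cdrs = [
    {"caller": "A", "callee": "B", "seconds": 120},
    {"caller": "B", "callee": "C", "seconds": 30},
    {"caller": "A", "callee": "C", "seconds": 60},
    {"caller": "C", "callee": "A", "seconds": 45},
]
print(top_callers(cdrs))
```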

Running and monitoring HDFS and MapReduce framework
Sample results: HDFS and MapReduce

Master machine: DataNode, JobTracker, NameNode, SecondaryNameNode (SNN), and TaskTracker are running

Slave machine: DataNode and TaskTracker are running
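
One way to check these daemon lists is to compare them with the process names reported by `jps`, the JDK tool that lists running Java processes. The sketch below parses `jps`-style text; the sample output is made up, and reading SNN as SecondaryNameNode is an assumption.

```python
EXPECTED = {
    "master": {"NameNode", "SecondaryNameNode", "JobTracker",
               "DataNode", "TaskTracker"},
    "slave": {"DataNode", "TaskTracker"},
}

def missing_daemons(role, jps_output):
    """Return expected daemon names absent from `jps`-style output.

    Each non-empty line is assumed to look like "<pid> <ProcessName>".
    """
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            running.add(parts[1])
    return EXPECTED[role] - running

# Made-up output for a machine running only the slave daemons.
sample = """4321 DataNode
4400 TaskTracker
4512 Jps
"""
print(missing_daemons("slave", sample))
print(sorted(missing_daemons("master", sample)))
```

An empty result means every daemon expected for that role appears in the listing.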

Running and working Hive warehouse
Sample results: Hive warehouse and HiveQL

Running and working HBase table management
Sample results: HBase table management and RESTful web service

Future work and Conclusion
Continuing data mining research
Future work

I will continue my research on BIG DATA and analytics solutions:
1. Validate the proposed infrastructure with real-world data (mobile call logs, camera sensors)
2. Keep researching new technologies to support our architecture
3. Predict and analyze real data on the infrastructure (market basket analysis, recommendation, etc.)
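
Market basket analysis, named above as future work, can be illustrated with a minimal sketch that counts how often pairs of items appear together and keeps the pairs that meet a minimum support. The baskets are made up, and this is plain in-memory counting on one machine, not Mahout's implementation.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs whose support (the fraction
    of baskets containing both items) is at least `min_support`."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has one canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical purchase baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "tea"],
    ["milk", "tea"],
    ["bread", "milk"],
]
print(frequent_pairs(baskets, 0.5))
```

Pairs with high support are candidates for recommendation rules; on real data the counting step is the part that would be distributed across the cluster.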

Conclusion
1. This is a full analytics solution for analyzing big data over the Hadoop Distributed File System:
- Reporting (What happened?) (Hive)
- Analysis (Why did it happen?) (Hive, HBase)
- Monitoring (What is happening now?) (Hive)
- Prediction (What will happen?) (Mahout)

Thank you
Questions?

Mais conteúdo relacionado

Mais procurados

IBM-Why Big Data?
IBM-Why Big Data?IBM-Why Big Data?
IBM-Why Big Data?Kun Le
 
How to Avoid Pitfalls in Big Data Analytics Webinar
How to Avoid Pitfalls in Big Data Analytics WebinarHow to Avoid Pitfalls in Big Data Analytics Webinar
How to Avoid Pitfalls in Big Data Analytics WebinarDatameer
 
Big Data & Analytics Architecture
Big Data & Analytics ArchitectureBig Data & Analytics Architecture
Big Data & Analytics ArchitectureArvind Sathi
 
Overview of analytics and big data in practice
Overview of analytics and big data in practiceOverview of analytics and big data in practice
Overview of analytics and big data in practiceVivek Murugesan
 
IBM Smarter Analytics
IBM Smarter AnalyticsIBM Smarter Analytics
IBM Smarter AnalyticsAdrian Turcu
 
Telco Big Data 2012 Highlights
Telco Big Data 2012 HighlightsTelco Big Data 2012 Highlights
Telco Big Data 2012 HighlightsAlan Quayle
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonCapgemini
 
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...Denodo
 
Big Data World Forum
Big Data World ForumBig Data World Forum
Big Data World Forumbigdatawf
 
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...Capgemini
 
Telco Big Data Workshop Sample
Telco Big Data Workshop SampleTelco Big Data Workshop Sample
Telco Big Data Workshop SampleAlan Quayle
 
Extended Data Warehouse - A New Data Architecture for Modern BI with Claudia ...
Extended Data Warehouse - A New Data Architecture for Modern BI with Claudia ...Extended Data Warehouse - A New Data Architecture for Modern BI with Claudia ...
Extended Data Warehouse - A New Data Architecture for Modern BI with Claudia ...Denodo
 
Deutsche Telekom on Big Data
Deutsche Telekom on Big DataDeutsche Telekom on Big Data
Deutsche Telekom on Big DataDataWorks Summit
 
Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...
Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...
Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...Impetus Technologies
 
1524 how ibm's big data solution can help you gain insight into your data cen...
1524 how ibm's big data solution can help you gain insight into your data cen...1524 how ibm's big data solution can help you gain insight into your data cen...
1524 how ibm's big data solution can help you gain insight into your data cen...IBM
 
RFT for Business Intelligence and Data Strategy
RFT for Business Intelligence and Data StrategyRFT for Business Intelligence and Data Strategy
RFT for Business Intelligence and Data StrategySustainableEnergyAut
 
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersMonitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersDataWorks Summit
 

Mais procurados (20)

IBM-Why Big Data?
IBM-Why Big Data?IBM-Why Big Data?
IBM-Why Big Data?
 
How to Avoid Pitfalls in Big Data Analytics Webinar
How to Avoid Pitfalls in Big Data Analytics WebinarHow to Avoid Pitfalls in Big Data Analytics Webinar
How to Avoid Pitfalls in Big Data Analytics Webinar
 
Big Data & Analytics Architecture
Big Data & Analytics ArchitectureBig Data & Analytics Architecture
Big Data & Analytics Architecture
 
Overview of analytics and big data in practice
Overview of analytics and big data in practiceOverview of analytics and big data in practice
Overview of analytics and big data in practice
 
IBM Smarter Analytics
IBM Smarter AnalyticsIBM Smarter Analytics
IBM Smarter Analytics
 
Telco Big Data 2012 Highlights
Telco Big Data 2012 HighlightsTelco Big Data 2012 Highlights
Telco Big Data 2012 Highlights
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A Comparison
 
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
 
Big Data World Forum
Big Data World ForumBig Data World Forum
Big Data World Forum
 
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
 
Ibm big data
Ibm big dataIbm big data
Ibm big data
 
Telco Big Data Workshop Sample
Telco Big Data Workshop SampleTelco Big Data Workshop Sample
Telco Big Data Workshop Sample
 
Extended Data Warehouse - A New Data Architecture for Modern BI with Claudia ...
Extended Data Warehouse - A New Data Architecture for Modern BI with Claudia ...Extended Data Warehouse - A New Data Architecture for Modern BI with Claudia ...
Extended Data Warehouse - A New Data Architecture for Modern BI with Claudia ...
 
Deutsche Telekom on Big Data
Deutsche Telekom on Big DataDeutsche Telekom on Big Data
Deutsche Telekom on Big Data
 
Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...
Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...
Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...
 
1524 how ibm's big data solution can help you gain insight into your data cen...
1524 how ibm's big data solution can help you gain insight into your data cen...1524 how ibm's big data solution can help you gain insight into your data cen...
1524 how ibm's big data solution can help you gain insight into your data cen...
 
BD&A Day
BD&A Day BD&A Day
BD&A Day
 
Haven 2 0
Haven 2 0 Haven 2 0
Haven 2 0
 
RFT for Business Intelligence and Data Strategy
RFT for Business Intelligence and Data StrategyRFT for Business Intelligence and Data Strategy
RFT for Business Intelligence and Data Strategy
 
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersMonitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service Providers
 

Semelhante a Big Data Infrastructure and Analytics Solution on FITAT2013

IRJET- Youtube Data Sensitivity and Analysis using Hadoop Framework
IRJET-  	  Youtube Data Sensitivity and Analysis using Hadoop FrameworkIRJET-  	  Youtube Data Sensitivity and Analysis using Hadoop Framework
IRJET- Youtube Data Sensitivity and Analysis using Hadoop FrameworkIRJET Journal
 
Eecs6893 big dataanalytics-lecture1
Eecs6893 big dataanalytics-lecture1Eecs6893 big dataanalytics-lecture1
Eecs6893 big dataanalytics-lecture1Aravindharamanan S
 
Rabobank - There is something about Data
Rabobank - There is something about DataRabobank - There is something about Data
Rabobank - There is something about DataBigDataExpo
 
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)Piet J.H. Daas
 
Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)Jeffrey Sica
 
5. big data vs it stki - pini cohen
5. big data vs  it    stki - pini cohen5. big data vs  it    stki - pini cohen
5. big data vs it stki - pini cohenTaldor Group
 
Data Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and TradeoffsData Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and TradeoffsJ Singh
 
Real time big data analytical architecture for remote sensing application
Real time big data analytical architecture for remote sensing applicationReal time big data analytical architecture for remote sensing application
Real time big data analytical architecture for remote sensing applicationLeMeniz Infotech
 
Big Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformBig Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformIRJET Journal
 
Big Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesBig Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesRukshan Batuwita
 
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU Love Arora
 
Big data presentation (2014)
Big data presentation (2014)Big data presentation (2014)
Big data presentation (2014)Xavier Constant
 
[Infographic] Uniting Internet of Things and Big Data
[Infographic] Uniting Internet of Things and Big Data[Infographic] Uniting Internet of Things and Big Data
[Infographic] Uniting Internet of Things and Big DataSnapLogic
 
Sameer Kumar Das International Conference Paper 53
Sameer Kumar Das International Conference Paper 53Sameer Kumar Das International Conference Paper 53
Sameer Kumar Das International Conference Paper 53Mr.Sameer Kumar Das
 
Big Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewBig Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewIJERA Editor
 
Big data session five ( a )f
Big data session five ( a )fBig data session five ( a )f
Big data session five ( a )fmarukanda
 
2019-09-05Federated Learning.pdf
2019-09-05Federated Learning.pdf2019-09-05Federated Learning.pdf
2019-09-05Federated Learning.pdfjimjones227147
 
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...IRJET Journal
 

Semelhante a Big Data Infrastructure and Analytics Solution on FITAT2013 (20)

IRJET- Youtube Data Sensitivity and Analysis using Hadoop Framework
IRJET-  	  Youtube Data Sensitivity and Analysis using Hadoop FrameworkIRJET-  	  Youtube Data Sensitivity and Analysis using Hadoop Framework
IRJET- Youtube Data Sensitivity and Analysis using Hadoop Framework
 
Eecs6893 big dataanalytics-lecture1
Eecs6893 big dataanalytics-lecture1Eecs6893 big dataanalytics-lecture1
Eecs6893 big dataanalytics-lecture1
 
Rabobank - There is something about Data
Rabobank - There is something about DataRabobank - There is something about Data
Rabobank - There is something about Data
 
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
 
Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)
 
5. big data vs it stki - pini cohen
5. big data vs  it    stki - pini cohen5. big data vs  it    stki - pini cohen
5. big data vs it stki - pini cohen
 
Big Data and Fast Data combined – is it possible?
Big Data and Fast Data combined – is it possible?Big Data and Fast Data combined – is it possible?
Big Data and Fast Data combined – is it possible?
 
Data Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and TradeoffsData Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and Tradeoffs
 
Real time big data analytical architecture for remote sensing application
Real time big data analytical architecture for remote sensing applicationReal time big data analytical architecture for remote sensing application
Real time big data analytical architecture for remote sensing application
 
Big Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformBig Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop Platform
 
Research paper on big data and hadoop
Research paper on big data and hadoopResearch paper on big data and hadoop
Research paper on big data and hadoop
 
Big Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesBig Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our Lives
 
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU
 
Big data presentation (2014)
Big data presentation (2014)Big data presentation (2014)
Big data presentation (2014)
 
[Infographic] Uniting Internet of Things and Big Data
[Infographic] Uniting Internet of Things and Big Data[Infographic] Uniting Internet of Things and Big Data
[Infographic] Uniting Internet of Things and Big Data
 
Sameer Kumar Das International Conference Paper 53
Sameer Kumar Das International Conference Paper 53Sameer Kumar Das International Conference Paper 53
Sameer Kumar Das International Conference Paper 53
 
Big Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewBig Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –Review
 
Big data session five ( a )f
Big data session five ( a )fBig data session five ( a )f
Big data session five ( a )f
 
2019-09-05Federated Learning.pdf
2019-09-05Federated Learning.pdf2019-09-05Federated Learning.pdf
2019-09-05Federated Learning.pdf
 
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
 

Último

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Último (20)

DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf

Big Data Infrastructure and Analytics Solution on FITAT2013

  • 1. BIG DATA INFRASTRUCTURE AND ANALYTICS SOLUTION Erdenebayar Erdenebileg, Oyun-Erdene Namsrai School of Information Technology, National University of Mongolia erdenebayar.erdenebileg@gmail.com, oyunerdene@num.edu.mn
  • 2. Overview: Introduction; Methods; Proposed methods; Experimental results; Related work; Discussion; Future work. School of Information Technology, National University of Mongolia. FITAT/ISPM 2013
  • 3. Introduction • Big data comes from both structured and unstructured sources (web data, market purchases, credit card transactions, …). • About 10% of big data is structured; about 90% is unstructured. • Nowadays almost every organization in Mongolia faces big data problems. • They need to analyze their valuable information and make predictions from it. Why? How?
  • 4. Why? Why are we facing the big data problem?
  • 5. Big Data: the 3 V's. We face the big data problem for reasons of volume, variety, and velocity: • Transactional data grows day by day (volume). • Different types of data must be stored (variety). • Data needs to be processed fast (velocity). [Figure: three axes. Data Velocity (fast analysis requirement): batch, periodic, near real time, real time. Data Variety (many types of data): table, database, web, social, photo, audio, video, mobile, unstructured. Data Volume (large amount of data): MB, GB, TB, PB.]
  • 6. How? How do we solve the big data problem?
  • 7. How to solve the problem? The full solution is: 1. Construct a big data infrastructure. 2. Find and develop data transmission tools. 3. Implement warehousing and mining tools and techniques. 4. Provide a BI and analytics tool. [Figure: the four steps above, together with the data sources (structured, semi-structured, unstructured).]
  • 8. Methods and comparison: RDBMS versus NoSQL database?
  • 9. RDBMS-based infrastructure. From my experiments: • Optimization adds cost (licenses and servers), although license costs do not apply to an open-source RDBMS. • An RDBMS does not cope well with data beyond the gigabyte range. • It is not suited to storing unstructured data (video, audio, etc.).
  • 10. Hadoop-based infrastructure. From the experience of the biggest companies (Facebook, Yahoo, Twitter, …), the main advantages are: • Distributed file system paradigm. • Powerful parallel computing framework (MapReduce). • It can store any type of data: structured, semi-structured, or unstructured. • It is open source and easy to integrate with Hadoop-related products.
  • 11. Brief introduction: HDFS architecture. [Figure: NameNode and BackupNode; balancing, replication, failover; four DataNodes. Each DataNode stores data on its local disks.]
  • 12. Brief introduction: MapReduce framework. 1. We have a big data set (shown in green). 2. The data is split across different servers. 3. The data is aggregated and calculated. 4. The consolidated result is returned to the client. [Figure: a JobTracker and four TaskTracker servers; data labeled 2010, 2011, 2012, 2013.]
  • 13. Proposed method and solution: Hadoop and open-source technologies
  • 14. Proposed method selection (Hadoop stack). The proposed method was selected for the following reasons: • Data should be stored in a distributed system. • Aggregation and calculation should use a parallel computing paradigm. • The data is both structured and unstructured: mobile call detail records. • The data size is about 20 TB. • The method should use open-source technologies.
  • 15. Full infrastructure (3 main methods). [Figure: client machine with client software (reporting tool, Jasper Business Intelligence); JasperReports Server with Hive connector and HBase connector; clustered big data infrastructure and data processing on physical machines (Machine 1: Hadoop slave, Machine 2: Hadoop master); data sender; data resources: sensor data (phone, web log, camera, etc.), structured data, semi-structured data, unstructured data.]
  • 16. Method 1: Clustered big data infrastructure and data processing • The first task is to configure the big data infrastructure with the analytics products. • The configuration is a cluster of two physical machines.
  • 17. Method 2: Data transmission • Data resources consist of RDBMS data and unstructured data (CDR files, video, …). • For structured data stored in relational databases, we need Sqoop for bulk data transfer. • For unstructured data such as video and files, we need a custom application built on the HDFS client (SSH). Two ways of transfer: • Manual data transfer • Automatic data transfer (custom application)
  • 18. Method 3: Analytics solution over big data. This is the main method, and it tries to address the following concepts. [Figure: complexity versus business value. Business intelligence, which almost every organization is doing now: reporting (What happened?), analysis (Why did it happen?), monitoring (What is happening now?). Predictive analytics, which they are focusing on now: prediction (What will happen?).]
  • 19. Method 3: Analytics solution over big data • This describes how to report, analyze, monitor, and predict over the big data infrastructure. [Figure: sensor data (resources) in the Hadoop Distributed File System. Hive warehouse: Hive table creation, data summarization and analysis with Hive Query Language (HQL), direct access to HDFS (reporting, analyzing, monitoring). HBase table management: HBase table creation, HBase query, direct access to HDFS (reporting, analyzing, monitoring). Thrift server. Mahout machine learning and data mining (prediction). Aggregated data, ad-hoc queries, and mined data go to the end user (analytic tool).]
  • 21. Experimental results. The experimental work focused on the following main tasks: 1. Install and configure the big data infrastructure (a cluster of 2 physical machines). 2. Import sample unstructured data into HDFS (the big data infrastructure) using SSH. 3. Run sample HiveQL queries, HBase queries, and a Mahout job over the MapReduce framework.
  • 22. Running and monitoring HDFS and the MapReduce framework. Sample results (HDFS and MapReduce): on the master machine, DataNode, JobTracker, NameNode, SNN (SecondaryNameNode), and TaskTracker are running; on the slave machine, DataNode and TaskTracker are running.
  • 23. Running the Hive warehouse. Sample results: Hive warehouse and HiveQL
  • 24. Running HBase table management. Sample results: HBase table management and RESTful web service
  • 25. Future work and conclusion: continuing the data mining research
  • 26. Future work. I will continue my research on big data and analytics solutions: 1. Validate the proposed infrastructure with real-world data (mobile call logs, camera sensors). 2. Keep researching new technologies to support our architecture. 3. Predict and analyze real data on the infrastructure (market basket analysis, recommendation, etc.).
  • 27. Conclusion. 1. This is a full analytics solution for analyzing big data over the Hadoop Distributed File System: - Reporting (What happened?) (Hive) - Analysis (Why did it happen?) (Hive, HBase) - Monitoring (What is happening now?) (Hive) - Prediction (What will happen?) (Mahout)
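
The four MapReduce steps on slide 12 (a big data set, split across servers, aggregated, consolidated for the client) can be illustrated with a small single-process sketch. This is not Hadoop code and not the authors' implementation: the record layout `(year, amount)`, the function names, and the choice of a sum as the aggregation are all assumptions made for illustration, with the years 2010 to 2013 borrowed from the slide's figure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (year, amount). Not taken from the slides.
Record = Tuple[int, int]


def split(records: List[Record], n_workers: int) -> List[List[Record]]:
    """Step 2: spread the input over n_workers simulated servers."""
    return [records[i::n_workers] for i in range(n_workers)]


def map_phase(chunk: Iterable[Record]) -> List[Record]:
    """Each worker emits one (key, value) pair per record, keyed by year."""
    return [(year, amount) for year, amount in chunk]


def shuffle(mapped: Iterable[List[Record]]) -> Dict[int, List[int]]:
    """Group all emitted values under their key, across every worker."""
    grouped: Dict[int, List[int]] = defaultdict(list)
    for pairs in mapped:
        for year, amount in pairs:
            grouped[year].append(amount)
    return dict(grouped)


def reduce_phase(grouped: Dict[int, List[int]]) -> Dict[int, int]:
    """Step 3: aggregate the values of each key (a sum, as an assumption)."""
    return {year: sum(values) for year, values in grouped.items()}


def run_job(records: List[Record], n_workers: int = 4) -> Dict[int, int]:
    """Steps 1 to 4: split, map, shuffle, reduce, return one result."""
    chunks = split(records, n_workers)
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    sample = [(2010, 5), (2011, 3), (2010, 2), (2013, 7), (2012, 1), (2011, 4)]
    print(run_job(sample))
```

In a real cluster the map and reduce calls would run on separate TaskTracker machines and the shuffle would move data over the network; here everything runs in one process so that only the data flow is visible.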
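
The data transmission rule on slide 17 (Sqoop for relational sources, an HDFS client for files) can be sketched as a small planner that only builds the command line and does not run it. The helper names, the rule "a source starting with `jdbc:` is a relational database", and the example paths are hypothetical; the slides do not describe how the custom application decides. The command shapes assume the standard `sqoop import` options `--connect`, `--table`, and `--target-dir`, and the `hadoop fs -put` shell command.

```python
from typing import List


def sqoop_import_command(jdbc_url: str, table: str, target_dir: str) -> List[str]:
    """Bulk transfer of one relational table into an HDFS directory."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
    ]


def hdfs_put_command(local_path: str, hdfs_dir: str) -> List[str]:
    """Copy one local file (CDR file, video, ...) into an HDFS directory."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def plan_transfer(source: str, hdfs_dir: str, table: str = "") -> List[str]:
    """Pick the transfer tool for a source.

    Assumption for illustration: JDBC URLs are structured sources,
    anything else is treated as a local file path.
    """
    if source.startswith("jdbc:"):
        if not table:
            raise ValueError("a table name is required for a JDBC source")
        return sqoop_import_command(source, table, hdfs_dir)
    return hdfs_put_command(source, hdfs_dir)


if __name__ == "__main__":
    # Hypothetical sources; print the commands instead of executing them.
    print(plan_transfer("jdbc:mysql://db.example/cdr", "/data/cdr", table="calls"))
    print(plan_transfer("/tmp/calls.cdr", "/data/raw"))
```

Returning the argument list rather than executing it keeps the sketch runnable without a cluster; an automatic transfer application could pass the list to a process runner on a machine where the Hadoop and Sqoop clients are installed.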

Editor's Notes

  1. Good afternoon, dear professors, teachers, and students. My name is Erdenebayar, and I am a master's student at the School of Information Technology, National University of Mongolia. I very much appreciate the chance to introduce our research work; it is one of the important moments of my life. Today I will present my research on big data infrastructure and analytics solutions.
  2. These are the main topics.
  3. First of all, I'll explain why I'm researching big data and analytics. In Mongolia … Nowadays … Because I work on the data management team at a software development company and have discussed this with our biggest customers (government and business companies).
  4. Currently we face the big data problem for reasons of volume, variety, and velocity. The first is volume: transactional data grows day by day (MB, GB, TB, PB, ZB). The second is variety: it is mainly about data types; many different devices store different types of data. The last is velocity: every business needs to analyze and process data very fast to plan its future business.
  5. We can address the big data problem and the needs of businesses in the following way. This picture shows the conceptual solution.
  6. In this topic, I will describe some methods and compare the different approaches. We can store big data in an RDBMS or in a NoSQL database.
  7. Hadoop consists of two main components: the Hadoop Distributed File System and the MapReduce data processing framework. I will briefly introduce these two components.
  8. I would like to thank my professor, Oyun-Erdene; she always coaches and teaches me in every case.
  9. Thank you for your attention. If you have any questions, I would be happy to answer them.