Big Data Infrastructure and Analytics Solution on FITAT2013

  1. BIG DATA INFRASTRUCTURE AND ANALYTICS SOLUTION. Erdenebayar Erdenebileg, Oyun-Erdene Namsrai, School of Information Technology, National University of Mongolia (erdenebayar.erdenebileg@gmail.com, oyunerdene@num.edu.mn)
  2. Overview • Introduction • Methods • Proposed methods • Experimental results • Related work • Discussion • Future work (School of Information Technology, National University of Mongolia; FITAT/ISPM 2013)
  3. Introduction • BIG DATA comes from structured and unstructured information (web data, market purchases, credit card transactions, …) • About 10% of BIG DATA is structured and 90% is unstructured • Nowadays, almost every organization in Mongolia is facing BIG DATA problems • They need to analyze their valuable information and make predictions from it. Why? How?
  4. Why? Why are we facing the BIG DATA problem?
  5. Big Data: 3V's. We are facing the BIG DATA problem for three reasons, Volume, Variety and Velocity: • Transactional data grows day by day (Volume) • Different types of data are stored (Variety) • Data needs to be processed fast (Velocity) [Figure: Data Volume (large amount of data): MB, GB, TB, PB. Data Variety (many types of data): table/database, web, social, photo, audio, video, mobile, unstructured. Data Velocity (fast analysis requirement): batch, periodic, near real time, real time.]
  6. How? How can we solve the BIG DATA problem?
  7. How to solve the problem? The full solution is: 1. Construct a BIG DATA infrastructure 2. Find and develop data transmission tools 3. Implement warehousing and mining tools and techniques 4. Provide BI and analytics tools [Figure: the four steps drawn as layers above the data sources (structured, semi-structured, unstructured).]
  8. Methods and comparison: RDBMS versus NoSQL database?
  9. RDBMS-based infrastructure. From my experiments: • Optimization requires additional cost (licenses and servers), and these licensed optimizations do not fit open-source RDBMSs • An RDBMS does not perform well with data beyond the gigabyte scale • It is not suitable for storing unstructured data (video, audio, etc.)
  10. HADOOP-based infrastructure. From the experience of the biggest companies (Facebook, Yahoo, Twitter, …), the main advantages are: • Distributed file system paradigm • Powerful parallel computing framework (MapReduce) • It can store any type of data: structured, semi-structured and unstructured • It is open source and easy to integrate with Hadoop-related products
  11. Brief introduction: HDFS architecture [Figure: a NameNode and a BackupNode handle balancing, replication and failover for several DataNodes; each DataNode stores data on its local disks.]
  12. Brief introduction: MapReduce framework [Figure: a JobTracker coordinates four TaskTracker servers; the sample data is keyed by year, 2010 to 2013.] 1. We have big data (shown in green) 2. The data is split across different servers 3. Each server aggregates and calculates over its part of the data 4. The consolidated result is returned to the client
  13. Proposed method and solution: Hadoop and open-source technologies
  14. Proposed method selection (Hadoop stack). The proposed method was selected for the following reasons: • Data should be stored in a distributed system • Aggregation and calculation should be done in a parallel computing paradigm • The data, mobile call detail records (CDRs), is both structured and unstructured • The data size is about 20 TB • The method should use open-source technologies
  15. Full infrastructure (3 main methods) [Figure: data resources (sensor data from phones, web logs, cameras, etc.; structured, semi-structured and unstructured data) are passed by a data sender to the clustered big data infrastructure and data processing layer on physical machines: Machine 1 (Hadoop slave) and Machine 2 (Hadoop master). A client machine running Jasper Business Intelligence (client reporting software and JasperReports Server) connects to the cluster through Hive and HBase connectors.]
  16. Method 1: Clustered big data infrastructure and data processing • The first task is configuring the BIG DATA infrastructure with analytics products • This configuration is clustered across TWO physical machines
  17. Method 2: Data transmission • Data resources consist of RDBMSs and unstructured data (CDR files, video, …) • For structured data stores such as relational databases, we need Sqoop for bulk data transfer • For unstructured data such as video and files, we need a custom application built on the HDFS client (SSH) • Manual data transfer • Automatic data transfer (custom application)
  18. Method 3: Analytics solution over the BIG DATA. This is the main method; it tries to address the following concepts [Figure: analytics levels plotted by complexity against business value. Business Intelligence, which almost every organization is doing now: Reporting (What happened?), Analysis (Why did it happen?), Monitoring (What is happening now?). Predictive Analytics, which organizations are now focusing on: Prediction (What will happen?).]
  19. Method 3: Analytics solution over the BIG DATA • This describes how to perform reporting, analysis, monitoring and prediction over the BIG DATA infrastructure [Figure: sensor data stored in the Hadoop Distributed File System (resources) feeds three paths to the end user's analytics tool. (1) Hive: Hive table creation, Hive warehouse data and summarization with Hive Query Language (HQL), with direct access to HDFS, for reporting, analysis and monitoring. (2) HBase: HBase table creation and table management, aggregated data and ad-hoc HBase queries, with a Thrift server as the access point, for reporting, analysis and monitoring. (3) Mahout: machine learning and data mining with direct access to HDFS, producing mined data for prediction.]
  20. Experimental results: testing, monitoring, working
  21. Experimental results. The experimental work focused on the following main tasks: 1. Install and configure the BIG DATA infrastructure (a cluster of 2 physical machines) 2. Import sample unstructured data into HDFS using SSH 3. Run sample HiveQL queries, HBase queries and a Mahout job over the MapReduce framework
  22. Running and monitoring HDFS and the MapReduce framework. Sample results: HDFS and MapReduce. Master machine: DataNode, JobTracker, NameNode, SNN (SecondaryNameNode) and TaskTracker are running. Slave machine: DataNode and TaskTracker are running.
  23. Running and working with the Hive warehouse. Sample results: Hive warehouse and HiveQL
  24. Running and working with HBase table management. Sample results: HBase table management and RESTful web service
  25. Future work and conclusion: continue data mining research
  26. Future work. Continue my research on BIG DATA and analytics solutions: 1. Validate the proposed infrastructure with real-world data (mobile call logs, camera sensors) 2. Keep researching new technologies to support our architecture 3. Predict and analyze real data over the infrastructure (market basket analysis, recommendation, etc.)
  27. Conclusion 1. This is the full analytics solution for analyzing big data over the Hadoop Distributed File System: - Reporting (What happened?) (Hive) - Analysis (Why did it happen?) (Hive, HBase) - Monitoring (What is happening now?) (Hive) - Prediction (What will happen?) (Mahout)
  28. Thank you. Questions?
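Slide 11 names the HDFS roles but does not show how they interact. Below is a toy, in-memory model built on a few stated assumptions. The NameNode keeps only block-to-DataNode metadata. Files are split into fixed-size blocks, and each block is copied to several distinct DataNodes. The class and method names are hypothetical, and placement is plain round-robin rather than HDFS's real rack-aware policy.

```python
# Toy model of HDFS-style storage: the NameNode holds only metadata about
# where each block's replicas live. Placement is simple round-robin
# (an assumption for illustration); real HDFS uses a rack-aware policy.
from dataclasses import dataclass, field


@dataclass
class NameNode:
    datanodes: list[str]
    block_size: int
    replication: int
    block_map: dict[str, list[str]] = field(default_factory=dict)
    _next: int = 0  # rotating start position for replica placement

    def add_file(self, name: str, size: int) -> list[str]:
        """Split a file of `size` bytes into blocks and record replica locations."""
        n_blocks = max(1, -(-size // self.block_size))  # ceiling division
        replicas = min(self.replication, len(self.datanodes))
        block_ids = []
        for i in range(n_blocks):
            block_id = f"{name}#blk{i}"
            # Choose `replicas` distinct DataNodes, starting at a rotating offset.
            nodes = [self.datanodes[(self._next + k) % len(self.datanodes)]
                     for k in range(replicas)]
            self._next = (self._next + 1) % len(self.datanodes)
            self.block_map[block_id] = nodes
            block_ids.append(block_id)
        return block_ids

    def handle_failure(self, dead: str) -> None:
        """Drop a failed DataNode and re-replicate each block that lost a copy."""
        self.datanodes.remove(dead)
        for nodes in self.block_map.values():
            if dead in nodes:
                nodes.remove(dead)
                # Copy the block to the first live DataNode that lacks it.
                for candidate in self.datanodes:
                    if candidate not in nodes:
                        nodes.append(candidate)
                        break
```

For example, with four DataNodes, a 64-byte block size and replication 3, a 150-byte file becomes three blocks. After `handle_failure("dn2")`, every block is back to three replicas on the surviving nodes. Keeping the map on a single NameNode is why the slide pairs it with a BackupNode for failover.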
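The four numbered steps on slide 12 can be sketched in a single Python process. This is not Hadoop's API. `split`, `map_phase`, `shuffle`, `reduce_phase` and `run_job` are hypothetical names, the input lines are assumed to be `year,amount` text, and the "servers" are just list slices. The sketch only shows the data flow that a JobTracker coordinates across TaskTrackers.

```python
# Single-process sketch of the MapReduce data flow (not Hadoop's API).
# Input lines are assumed to look like "2012,4.5" (year, amount).
from collections import defaultdict


def split(lines: list[str], n_servers: int) -> list[list[str]]:
    """Step 2: distribute input lines across servers (round-robin)."""
    return [lines[i::n_servers] for i in range(n_servers)]


def map_phase(lines: list[str]) -> list[tuple[int, float]]:
    """Map: parse each line and emit a (year, amount) key/value pair."""
    pairs = []
    for line in lines:
        year, amount = line.split(",")
        pairs.append((int(year), float(amount)))
    return pairs


def shuffle(mapped: list[list[tuple[int, float]]]) -> dict[int, list[float]]:
    """Group every value emitted for the same key, across all servers."""
    groups: dict[int, list[float]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key: int, values: list[float]) -> tuple[int, float]:
    """Step 3: aggregate the values for one key (here, a sum)."""
    return key, sum(values)


def run_job(lines: list[str], n_servers: int = 4) -> dict[int, float]:
    """Step 4: return the consolidated result to the client, ordered by key."""
    mapped = [map_phase(chunk) for chunk in split(lines, n_servers)]
    return dict(sorted(reduce_phase(k, v) for k, v in shuffle(mapped).items()))
```

For example, `run_job(["2010,5", "2011,3", "2010,2", "2013,1", "2012,4", "2013,6"])` returns `{2010: 7.0, 2011: 3.0, 2012: 4.0, 2013: 7.0}`. In Hadoop, the map and reduce calls would run on different TaskTrackers, and the shuffle would move data over the network.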
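Slide 14 sizes the data at about 20 TB, slide 16 uses a two-machine cluster, and slide 22 shows a DataNode running on both machines. Here is a back-of-the-envelope estimate of what that means for HDFS. It assumes the HDFS default replication factor of 3 and the Hadoop 1.x default block size of 64 MB. Both values are configurable, and neither is stated in the slides.

```python
# Rough HDFS capacity estimate. Assumptions: replication factor 3 (HDFS
# default) and 64 MB blocks (Hadoop 1.x default); both are configurable.
TB = 1024 ** 4
MB = 1024 ** 2


def hdfs_estimate(data_bytes: int, block_bytes: int = 64 * MB,
                  replication: int = 3, datanodes: int = 2) -> dict[str, float]:
    # HDFS never stores two replicas of a block on the same DataNode, so the
    # effective replica count is capped by the number of DataNodes.
    effective = min(replication, datanodes)
    blocks = -(-data_bytes // block_bytes)  # integer ceiling division
    raw = data_bytes * effective
    return {
        "blocks": blocks,
        "effective_replication": effective,
        "raw_tb": raw / TB,
        "per_node_tb": raw / datanodes / TB,  # assumes evenly balanced nodes
    }
```

Under these assumptions, `hdfs_estimate(20 * TB)` gives 327,680 blocks, an effective replication of 2, and about 20 TB of disk per DataNode. In other words, each machine holds a full copy of the data. A third DataNode would be needed before the default replication factor of 3 could be met.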
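Slide 17's "automatic data transfer (custom application)" could look roughly like the sketch below. It finds new files in a local landing directory and uploads each one once with the standard HDFS shell command `hadoop fs -put <localsrc> <dst>`. Structured sources would go through Sqoop instead, which is not shown here. Everything apart from the `hadoop fs -put` command is a hypothetical illustration: the function names, the `.cdr` suffix, the `/data/cdr` target directory and the `done` set used for bookkeeping. With `dry_run=True`, the commands are only built and recorded.

```python
# Hypothetical automatic-transfer helper: uploads new local CDR files to HDFS
# with the standard `hadoop fs -put` shell command. Paths, suffix and the
# bookkeeping set are illustrative assumptions.
import subprocess
from pathlib import Path


def pending_files(landing: Path, done: set[str], suffix: str = ".cdr") -> list[Path]:
    """Files in the landing directory that have not been uploaded yet."""
    return sorted(p for p in landing.glob(f"*{suffix}") if p.name not in done)


def put_command(local: Path, hdfs_dir: str) -> list[str]:
    """Build the `hadoop fs -put <localsrc> <dst>` command for one file."""
    return ["hadoop", "fs", "-put", str(local), f"{hdfs_dir}/{local.name}"]


def transfer(landing: Path, hdfs_dir: str, done: set[str],
             dry_run: bool = True) -> list[list[str]]:
    """Handle each pending file once; with dry_run=True, only build the commands."""
    commands = []
    for path in pending_files(landing, done):
        cmd = put_command(path, hdfs_dir)
        if not dry_run:
            # check=True raises CalledProcessError on failure, so the file is
            # not marked done and will be retried on the next run.
            subprocess.run(cmd, check=True)
        done.add(path.name)
        commands.append(cmd)
    return commands
```

Run periodically (for example, from cron), this covers the "automatic" path. The "manual" path on the slide is simply typing the same `hadoop fs -put` command by hand.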
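Slide 19's Hive summarization path rests on the fact that HiveQL is SQL-like. In the sketch below, Python's built-in `sqlite3` is swapped in purely to show the shape of a reporting query. On the cluster, Hive would compile a similar `GROUP BY` query into MapReduce jobs over files in HDFS. The `cdr` table and its columns are hypothetical.

```python
# Stand-in for a Hive reporting query: sqlite3 (not Hive) runs an SQL
# GROUP BY of the kind HiveQL supports. The cdr schema is hypothetical.
import sqlite3

SUMMARY_SQL = """
SELECT call_date, COUNT(*) AS calls, SUM(duration_sec) AS total_sec
FROM cdr
GROUP BY call_date
ORDER BY call_date
"""


def daily_summary(rows: list[tuple[str, int, str]]) -> list[tuple[str, int, int]]:
    """Summarize (caller, duration_sec, call_date) rows per day."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE cdr (caller TEXT, duration_sec INTEGER, call_date TEXT)")
        conn.executemany("INSERT INTO cdr VALUES (?, ?, ?)", rows)
        return conn.execute(SUMMARY_SQL).fetchall()
    finally:
        conn.close()
```

The same query text, pointed at a Hive table, is the kind of statement a reporting tool such as JasperReports would send through the Hive connector shown on slide 15.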
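Slide 22 lists which Hadoop 1.x daemons should be running on each machine. A small hypothetical helper can compare that list with the output of the JDK's `jps` tool, which prints one `<pid> <MainClassName>` line per Java process. "SNN" on the slide is the `SecondaryNameNode` process. The role names and helper functions are assumptions for illustration.

```python
# Hypothetical health check: compare `jps` output with the daemons slide 22
# expects on each machine (Hadoop 1.x process names).
EXPECTED = {
    "master": {"NameNode", "SecondaryNameNode", "DataNode", "JobTracker", "TaskTracker"},
    "slave": {"DataNode", "TaskTracker"},
}


def running_daemons(jps_output: str) -> set[str]:
    """Parse `jps` lines of the form '<pid> <MainClassName>'."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def missing_daemons(role: str, jps_output: str) -> set[str]:
    """Daemons expected for `role` that do not appear in the jps output."""
    return EXPECTED[role] - running_daemons(jps_output)
```

In practice, the output would come from running `jps` on each machine, for example via SSH, and an empty result from `missing_daemons` means the node matches the slide.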
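Slide 26 lists market basket analysis as future work. Its usual first step is counting how often items occur together. The sketch below does that for item pairs with a minimum-support threshold. This is the pair-counting stage of Apriori-style mining, not a complete algorithm and not Mahout's implementation. The basket contents are hypothetical.

```python
# Pair-counting step of market basket analysis: how many baskets contain
# each unordered pair of items, filtered by a minimum support count.
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets: list[list[str]], min_support: int) -> dict[tuple[str, str], int]:
    """Return item pairs that appear together in at least `min_support` baskets."""
    counts: Counter[tuple[str, str]] = Counter()
    for basket in baskets:
        # Deduplicate and sort so each unordered pair counts once per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}
```

For example, with baskets of telecom services such as `["sms", "data", "voice"]`, `["data", "voice"]`, `["sms", "voice"]` and `["data", "voice"]`, a support of 2 keeps `("data", "voice")` (3 baskets) and `("sms", "voice")` (2 baskets). On 20 TB of CDRs, the per-basket counting is what would be distributed as a MapReduce job.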

Editor's Notes

  • Good afternoon, dear professors, teachers and students. My name is Erdenebayar, and I am a master's student at the School of Information Technology, National University of Mongolia. I very much appreciate the chance to introduce our research work; this is one of the important moments of my life. Today I will introduce my research on big data infrastructure and an analytics solution.
  • These are the main topics.
  • First of all, I'll explain why I'm researching big data and analytics. In Mongolia ….. Nowadays ….. This is because I work on the data management team at a software development company and have discussed these problems with our biggest customers (government and business companies).
  • Currently we are facing the big data problem for three reasons: Volume, Variety and Velocity. The first is Volume: transactional data grows day by day (MB, GB, TB, PB, ZB). The second is Variety, which is mainly about data types: many different devices store different types of data. The last is Velocity: every business needs to analyze and process data very fast to plan its future business.
  • We can address the big data problem and the needs of business companies in the following way. This picture shows the conceptual solution.
  • In this part, I will describe some methods and compare different methodologies. We can store big data in an RDBMS or in a NoSQL database.
  • Hadoop consists of two main components: the Hadoop Distributed File System and the MapReduce data processing framework. I will briefly introduce these two components.
  • I would like to thank my professor, Oyun-Erdene; she always coaches and teaches me in every case.
  • Thank you for your attention. If you have any questions, I would be happy to answer them.
