UNDERSTANDING BIG DATA
THINK BIG
MEHMET BURAK AKGÜN
First of All, What Is Big Data?
• Big Data is collected in unstructured form from different sources, such as social media posts, network logs, blogs, photos, videos, log files, etc.
• Big Data is data that could not be analyzed before because of its enormous size and/or diversity.
1) Why Big Data? How did we get to this point?
2) What are the components of Big Data?
3) How will it integrate with existing systems?
4) Are there any Big Data platforms / applications?
• Almost all data scientists believe that the ability to analyze information collected from all sources gives companies very important advantages.
• For example, according to researchers, retailers that analyze their data can improve their operating profit margins by 60 percent. Similar rates also apply to the public sector.
• According to research, US healthcare could save 300 million dollars a year by using data analysis.
• According to estimates, by 2020 about 50 billion devices, from household appliances to cars and phones, will produce data and silently communicate with each other.
• For companies that want to make forward-looking decisions, the predictions drawn from this «Big Data» must be accurate.
● Energy companies use smart grids and meters to store and process usage data for individual subscribers.
● Banks have become 24/7 branches. Using the information they collect and store about customers, the Internet branch recognizes the user, knows which days they log in, arranges the main page menu accordingly, reminds customers, and offers customizable interfaces in a rich, fast, and convenient branch.
● Hospitals store data in their databases in order to provide effective medical services.
● This information will be stored as "Big Data".
DATA = PRODUCT
Google succeeded by building its own technologies to meet its requirements instead of using conventional methods. Google keeps billions of web pages on the Google File System, uses Bigtable as its database, and uses MapReduce to process Big Data.
See http://www.google.com/about/datacenters
Google and Amazon publish academic articles about their work. Developers inspired by these articles, such as Doug Cutting, created similar technologies as open-source projects. The best examples are Apache projects such as Lucene, Solr, Hadoop, and HBase. Each of these projects can handle Big Data successfully.
Second-generation companies such as Facebook, Twitter, and LinkedIn went a step further by publishing the projects they developed for Big Data as open source. Cassandra, Hive, Pig, Voldemort, Storm, and IndexTank are examples.
So how will Big Data be stored?
How will it be processed?
How will it be analyzed?
4 «components» of being Big
• Volume: 12+ terabytes of Tweets created daily.
• Velocity: 5+ million trade events per second.
• Variety: 100s of different types of data.
• Veracity: Only 1 in 3 decision makers trust their information.
• Expected to reach 3 billion users in August 2015.
• 12 TB/day of «log data» is produced.
• Expected to reach 500+ million users by the end of 2014.
• 160 million users are online.
• 100 million active users. 12+ TB of tweet data per day!
Social Networks and Social Work
Google processes 24 petabytes of data every day
4.6 billion mobile phones exist
2 billion Internet users; annual traffic in 2014 equals 667 exabytes
Social Networks and Social Work
Users are not just people!
Each engine produces 10 TB of data every 30 minutes.
“Data generated by machines and sensors
will exceed that generated by social media by
at least a factor of 10.” *
Leon Katsnelson
Program Director, Big Data & Cloud
Computing IBM
By the way, we don't have enough space to store all this data!
We actually appear to be a game company. In fact, we are a data analysis company.
Ken Rudin, Zynga VP of Analytics
• Offers its games completely free of charge.
• Earns revenue by selling virtual goods.
• Has 232 million monthly active users on average.
• 95% of players never visit the shop!
• Using Big Data analysis, they disrupted the game world.
Four Entry Points of Big Data
• Unlock Big Data
• Simplify Your Warehouse
• Preprocess Raw Data
• Analyse Streaming Data
IBM Big Data Platform
• Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
• Visualization & Discovery
• Application Development
• Systems Management
• Accelerators
• Hadoop System
• Stream Computing
• Data Warehouse
• Information Integration & Governance
Applications for Big Data Analytics
• Homeland Security
• Finance
• Smarter Healthcare
• Multi-channel Sales
• Telecom
• Manufacturing
• Traffic Control
• Trading Analytics
• Fraud and Risk
• Log Analysis
• Search Quality
• Retail: Churn, NBO
"Companies that give importance to social media are winning"
The "Big Data: Great Opportunity" event organized by Teradata in Turkey brought together senior company executives in Istanbul. Hermann Wimmer, President of Teradata EMEA, said that they help companies that want to increase their profitability and get ahead of competitors to make quick decisions by analyzing data obtained from social media in a short time.
Technological solutions are preferred when they provide advantages
Big Data and Open Source Are Intertwined
• Open source community contributions made over the years
- Apache Hadoop and Jaql, Apache Derby, Apache Geronimo, Apache Jakarta
- Eclipse: founded by IBM.
- Lucene: IBM Lucene Extension Library (ILEL)
- DRDA, XQuery, SQL, XML4J, Xerces, HTTP, Java, Linux...
• Open source in IBM software
– WebSphere: Apache
– Rational: Eclipse and Apache
– InfoSphere: Eclipse and Apache
• IBM's BigInsights (Hadoop) is 100% open source software.
Hadoop
• Open source
• Distributed computing
• Very simply: MapReduce
• Connected computers
Example Hadoop Installations
• eBay – 532 nodes, 532 x 8 cores = 4,256 cores
• Facebook – 1,100 nodes, 8,800 cores, 12 PB storage
• LinkedIn – 1,200 + 580 + 120 nodes
• Quantcast – 3,000 nodes, 3.5 PB, 1 PB+ of daily data
Hadoop
• Components:
– Data storage: HDFS, a distributed disk file system
– Data processing: MapReduce
HDFS
• Modeled on the Google File System
• Distributed disk file system
• File clustering (file sizes are usually larger than GBs)
• Fault tolerant
• Replication
• HDFS is aware of the physical location of the data.
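The replication and fault-tolerance points above can be illustrated with a toy model. This is only a sketch, not HDFS code: the 128 MB block size, the replication factor of 3, the node names, and the round-robin placement are all assumptions made for illustration, and a real cluster uses its own configurable values and placement policy.

```python
from itertools import cycle

def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    ring = cycle(nodes)
    placement = {}
    for block_id in range(num_blocks):
        # Consecutive picks from the ring are distinct because
        # replication <= len(nodes).
        placement[block_id] = [next(ring) for _ in range(replication)]
    return placement

def surviving_blocks(placement, failed_node):
    """Blocks that still have at least one replica after one node fails."""
    return [block for block, holders in placement.items()
            if any(node != failed_node for node in holders)]

MB = 1024 * 1024
blocks = split_into_blocks(300 * MB, 128 * MB)  # hypothetical 300 MB file
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], 3)
```

Because every block lives on three different nodes in this model, `surviving_blocks(placement, "node1")` still lists every block, which is the intuition behind fault tolerance through replication.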
HDFS
• Accessible via the Hadoop shell, Java API, or Web UI
• Four application nodes:
– NameNode – manages the metadata of the file system
– JobTracker – MapReduce
– TaskTracker – MapReduce
– DataNode – stores the data and exchanges information with the NameNode
BIG DATA is not just HADOOP
• Manage & store huge volume of any data: Hadoop File System, MapReduce
• Manage streaming data: Stream Computing
• Analyze unstructured data: Text Analytics Engine
• Structure and control data: Data Warehousing
• Integrate and govern all data sources: Integration, Data Quality, Security, Lifecycle Management, MDM
• Understand and navigate federated big data sources: Federated Discovery and Navigation
MapReduce
• Big Data processing, from Google
• Distributed computation model
• Processes data as key-value pairs
• Easy programming framework
– To use it, you implement the map() and reduce() functions
Map: First Step
• Prepares elements for the next processing stage by pairing each one with a key
– Data cleaning
– Simple calculations
– Splitting strings
Reduce: Last Step
• Takes the list of values for the same key through iterators
– Filtering
– Combining
– Sampling
So the data is reduced.
The result is written to HDFS or HBase
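The map and reduce steps described above can be sketched as a word count that runs in a single process. This is an illustrative sketch only: `map_fn`, `reduce_fn`, and `run_job` are hypothetical names, not Hadoop APIs, and a real job would distribute the map and reduce calls across a cluster and write the result to HDFS or HBase.

```python
from collections import defaultdict

def map_fn(_key, line):
    """Map step: clean the line, split it, and emit (word, 1) pairs."""
    for word in line.lower().split():
        word = word.strip(".,!?")
        if word:
            yield word, 1

def reduce_fn(word, counts):
    """Reduce step: combine all values that share the same key."""
    yield word, sum(counts)

def run_job(records, mapper, reducer):
    """Run map, group the output by key, then reduce, all in one process."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    results = {}
    for out_key in sorted(groups):
        # The reducer sees one key and an iterator over its values.
        for final_key, final_value in reducer(out_key, iter(groups[out_key])):
            results[final_key] = final_value
    return results

# Hypothetical input: (line number, line text) pairs.
lines = [(0, "Big data is big."), (1, "Data is everywhere!")]
word_counts = run_job(lines, map_fn, reduce_fn)
```

The grouping loop in `run_job` stands in for the shuffle that the framework performs between the two steps, which is why the programmer only has to supply the two functions.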
Architecture: MapReduce
[Diagram: input pairs (K1, V1) are read from the HDFS cluster; mappers (M) emit (K2, V2) pairs; a grouping and ranking step collects them into (K2, Iterable<V2>) per key; reducers (R) emit (K3, V3) pairs.]
Apache Pig
• Developed at Yahoo!
• Designed for easy analysis of large data sets
• Easier than writing MapReduce functions in Java
– Coded in Pig Latin
– Can be extended
• Similar to other languages
• 10 lines of Pig code may be equivalent to hundreds of lines of Java
Running Pig!
• Grunt shell
• Java interface
• Eclipse and IntelliJ IDEA plugins
tweets = LOAD '/today/tweets' AS (user, mention, tweet);
twitters = GROUP tweets BY mention;
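For readers who have not used Pig, the effect of the grouping statement on this slide can be mimicked in plain Python. The sample rows below are invented for illustration, and the function is only a rough analogy of what the group-by produces, not how Pig executes it.

```python
from collections import defaultdict

# Hypothetical sample rows with the (user, mention, tweet) schema from the slide.
tweets = [
    ("alice", "bob", "hi @bob"),
    ("carol", "bob", "@bob nice talk"),
    ("bob", "alice", "thanks @alice"),
]

def group_by_mention(rows):
    """Collect all rows that share the same mention, like a group-by on mention."""
    grouped = defaultdict(list)
    for row in rows:
        _user, mention, _text = row
        grouped[mention].append(row)
    return dict(grouped)

twitters = group_by_mention(tweets)
# Number of tweets that mention each account.
mention_counts = {mention: len(rows) for mention, rows in twitters.items()}
```

Each key of `twitters` plays the role of the group key, and its list of rows plays the role of the bag of tweets that Pig attaches to that key.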
Apache Hive
• Data warehouse project for Hadoop
• Data summarization
• Ad hoc queries
• SQL-like language: HiveQL
– Allows custom mappers and reducers to be defined
Special Note*: The Hive compiler translates SQL queries into MapReduce operations
HBase
We do not always need relational databases
We need scalability
Tables can be very large, so we want very fast access
Distributed, key-sorted, persistent map
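The "key-sorted map" idea can be sketched with a small in-memory structure. This is a toy model under stated assumptions: the `SortedMap` class and the row keys are hypothetical, and the sketch is neither distributed nor persistent, so it only shows why keeping keys sorted makes range scans cheap.

```python
import bisect

class SortedMap:
    """Toy in-memory key-sorted map with point lookups and range scans."""

    def __init__(self):
        self._keys = []    # row keys, kept in sorted order
        self._values = {}  # row key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan(self, start, stop):
        """Yield (key, value) for start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]

table = SortedMap()
for row_key, value in [("user#42", "c"), ("user#07", "a"),
                       ("user#19", "b"), ("video#01", "d")]:
    table.put(row_key, value)

# "$" sorts immediately after "#", so this range covers every "user#" key.
user_rows = list(table.scan("user#", "user$"))
```

Choosing row keys with a common prefix, as in this example, is what lets one scan return all related rows without touching the rest of the table.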
HBase
Google Bigtable clone
Works on HDFS
*fault tolerance
*scalability
*MapReduce input/output
HBase = HDFS + random read/write
HBase
Where to Use:
Social Media
Recommender Systems
Search Engines
Intelligence and Monitoring Services
Financial Systems - fraud
Apache Mahout
• Lets us go beyond simple analysis
• Classification - email spam, call center
• Clustering - finding new news
• Recommender systems
Apache Mahout
Ready-made algorithms:
• Classification:
• Logistic Regression
• Bayesian
• Random Forests
– Application
• Call forwarding
• Face recognition
Apache Mahout
Ready-made algorithms:
Recommendation:
• Distributed item-based
• Matrix Factorization
• Non-distributed item/user-based
– Application:
• eCommerce - Amazon
• Movie recommendation - Netflix
• Music recommendation - Last.fm
• Venue recommendation - Foursquare
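To make the item-based idea concrete, here is a minimal sketch that scores items by how often they co-occur with items the user already has. It is a simplified stand-in, not Mahout's API or its similarity measures: the purchase histories, the function names, and the plain co-occurrence count are all assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase histories: user -> set of items.
histories = {
    "u1": {"book", "lamp", "pen"},
    "u2": {"book", "pen"},
    "u3": {"lamp", "desk"},
    "u4": {"book", "pen", "desk"},
}

def cooccurrence(histories):
    """Count how often each pair of items appears in the same history."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in histories.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(user, histories, counts, top_n=2):
    """Score unseen items by their co-occurrence with the user's items."""
    seen = histories[user]
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

counts = cooccurrence(histories)
suggestions = recommend("u2", histories, counts)
```

The co-occurrence table depends only on items, so it can be computed once and reused for every user, which is one reason item-based approaches suit large catalogs.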
In summary: Some Big Data Applications
• Log Analytics (IT for IT)
• Smart Grid / Smarter Utilities
• RFID Tracking & Analytics
• Fraud / Risk Management & Modeling
• 360° View of the Customer
• Warehouse Extension
• Email / Call Center Transcript Analysis
• Call Detail Record Analysis
• IBM Watson
The IBM Big Data Platform
IBM Big Data & Netezza Product Group
Hadoop
• InfoSphere BigInsights: Hadoop-based, low-latency analysis of diverse, high-volume data
Stream Computing
• InfoSphere Streams: low-latency analysis of streaming data
MPP Data Warehouse
• IBM Netezza High Capacity Appliance: queryable archive of structured data
• IBM Netezza 1000: BI + ad hoc analysis of structured data
• IBM Smart Analytics System: structured analysis of operational data
• IBM Informix Timeseries: time-structured analytics
• IBM InfoSphere Warehouse: high-volume structured data analysis
Information Integration
• InfoSphere Information Server: high-volume data integration and transformation
Big Data Exploration: Value & Diagram
[Diagram: data sources (File Systems, Relational Data, Content Management, Email, CRM, Supply Chain, ERP, RSS Feeds, Cloud, Custom Sources) feed Data Explorer, which serves applications and users.]
Find, visualize & understand all big data to improve business knowledge
• Greater efficiencies in business processes
• New insights from combining and analyzing data types in new ways
• Develop new business models with resulting increased market presence and revenue
Analysis of Diverse Data
Performing analysis on data with mixed characteristics.
Dynamic Data Analysis
Ad hoc analysis of high-volume streaming data
Explore and Experiment
Ad hoc analysis, discovery, and inspection of data
Manage and Plan
Data rules, data integrity checking, and enforcement
Very High Volume Data Analysis
Analysis of PB-scale data at an appropriate price/performance
What does the IBM platform do?
UNDERSTANDING BIG DATAS THINK BIG

Mais conteúdo relacionado

Mais procurados

Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An OverviewArvind Kalyan
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache HadoopSuman Saurabh
 
Lesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxLesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxPankajkumar496281
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataKaran Desai
 
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012Gigaom
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop IntroductionJayant Mukherjee
 
10 Most Effective Big Data Technologies
10 Most Effective Big Data Technologies10 Most Effective Big Data Technologies
10 Most Effective Big Data TechnologiesMahindra Comviva
 
Big Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewBig Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewSivashankar Ganapathy
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreSoftweb Solutions
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyNishant Gandhi
 
Core concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data AnalyticsCore concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data AnalyticsKaniska Mandal
 
Big data and Hadoop overview
Big data and Hadoop overviewBig data and Hadoop overview
Big data and Hadoop overviewNitesh Ghosh
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Simplilearn
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyRohit Dubey
 
Big data trends challenges opportunities
Big data trends challenges opportunitiesBig data trends challenges opportunities
Big data trends challenges opportunitiesMohammed Guller
 

Mais procurados (20)

Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An Overview
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache Hadoop
 
Lesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxLesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptx
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop Introduction
 
Big Data Hadoop Tutorial by Easylearning Guru
Big Data Hadoop Tutorial by Easylearning GuruBig Data Hadoop Tutorial by Easylearning Guru
Big Data Hadoop Tutorial by Easylearning Guru
 
10 Most Effective Big Data Technologies
10 Most Effective Big Data Technologies10 Most Effective Big Data Technologies
10 Most Effective Big Data Technologies
 
Big Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewBig Data - Applications and Technologies Overview
Big Data - Applications and Technologies Overview
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and more
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
 
#BigDataCanarias: "Big Data & Career Paths"
#BigDataCanarias: "Big Data & Career Paths"#BigDataCanarias: "Big Data & Career Paths"
#BigDataCanarias: "Big Data & Career Paths"
 
Core concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data AnalyticsCore concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data Analytics
 
Big data and Hadoop overview
Big data and Hadoop overviewBig data and Hadoop overview
Big data and Hadoop overview
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Research paper on big data and hadoop
Research paper on big data and hadoopResearch paper on big data and hadoop
Research paper on big data and hadoop
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit Dubey
 
Big data trends challenges opportunities
Big data trends challenges opportunitiesBig data trends challenges opportunities
Big data trends challenges opportunities
 

Destaque

From Tweets To Polls Linking Text Sentiment To Public Opinion Time Series
From Tweets To Polls Linking Text Sentiment To Public Opinion Time SeriesFrom Tweets To Polls Linking Text Sentiment To Public Opinion Time Series
From Tweets To Polls Linking Text Sentiment To Public Opinion Time SeriesJay Z
 
Lielplatones amatierteātris
Lielplatones amatierteātrisLielplatones amatierteātris
Lielplatones amatierteātrisLaura Gulbe
 
Timeline understanding
Timeline understandingTimeline understanding
Timeline understandingJJ Ly
 
business finance
business financebusiness finance
business financeezraken
 
active-directory-domain-services
active-directory-domain-servicesactive-directory-domain-services
active-directory-domain-services202066
 
American Heart Association Chicago 18 11-2014
American Heart Association Chicago 18 11-2014American Heart Association Chicago 18 11-2014
American Heart Association Chicago 18 11-2014Carlo Fino
 
Kefahaman yang salah mengenai Nabi dan Rasul yang tersebar di kalangan pengan...
Kefahaman yang salah mengenai Nabi dan Rasul yang tersebar di kalangan pengan...Kefahaman yang salah mengenai Nabi dan Rasul yang tersebar di kalangan pengan...
Kefahaman yang salah mengenai Nabi dan Rasul yang tersebar di kalangan pengan...norainabdulrahim
 
Internet Security is an Oxymoron
Internet Security is an OxymoronInternet Security is an Oxymoron
Internet Security is an OxymoronMax Nokhrin
 

Destaque (19)

From Tweets To Polls Linking Text Sentiment To Public Opinion Time Series
From Tweets To Polls Linking Text Sentiment To Public Opinion Time SeriesFrom Tweets To Polls Linking Text Sentiment To Public Opinion Time Series
From Tweets To Polls Linking Text Sentiment To Public Opinion Time Series
 
Lielplatones amatierteātris
Lielplatones amatierteātrisLielplatones amatierteātris
Lielplatones amatierteātris
 
Timeline understanding
Timeline understandingTimeline understanding
Timeline understanding
 
cv
cvcv
cv
 
Best buy
Best buy Best buy
Best buy
 
business finance
business financebusiness finance
business finance
 
active-directory-domain-services
active-directory-domain-servicesactive-directory-domain-services
active-directory-domain-services
 
American Heart Association Chicago 18 11-2014
American Heart Association Chicago 18 11-2014American Heart Association Chicago 18 11-2014
American Heart Association Chicago 18 11-2014
 
b.arab
b.arabb.arab
b.arab
 
Kefahaman yang salah mengenai Nabi dan Rasul yang tersebar di kalangan pengan...
Kefahaman yang salah mengenai Nabi dan Rasul yang tersebar di kalangan pengan...Kefahaman yang salah mengenai Nabi dan Rasul yang tersebar di kalangan pengan...
Kefahaman yang salah mengenai Nabi dan Rasul yang tersebar di kalangan pengan...
 
Adab darjah iii
Adab darjah iiiAdab darjah iii
Adab darjah iii
 
Social Media Sentiment Analysis
Social Media Sentiment AnalysisSocial Media Sentiment Analysis
Social Media Sentiment Analysis
 
Internet Security is an Oxymoron
Internet Security is an OxymoronInternet Security is an Oxymoron
Internet Security is an Oxymoron
 
Mongodb railscamphh
Mongodb railscamphhMongodb railscamphh
Mongodb railscamphh
 
Nosql part3
Nosql part3Nosql part3
Nosql part3
 
Ratchada
RatchadaRatchada
Ratchada
 
wisakon
wisakonwisakon
wisakon
 
Pm1
Pm1Pm1
Pm1
 
Rails南蛮通事
Rails南蛮通事Rails南蛮通事
Rails南蛮通事
 

Semelhante a UNDERSTANDING BIG DATAS THINK BIG

Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewAbhishek Roy
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusersBob Hardaway
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataIMC Institute
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoopSri Kanth
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptalmaraniabwmalk
 
Big-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigBig-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigManish Chopra
 
How Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessHow Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessAjay Ohri
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in detailsMahmoud Yassin
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - IntroductionTomy Rhymond
 

Semelhante a UNDERSTANDING BIG DATAS THINK BIG (20)

big-data-notes1.ppt
big-data-notes1.pptbig-data-notes1.ppt
big-data-notes1.ppt
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
 
Data analytics & its Trends
Data analytics & its TrendsData analytics & its Trends
Data analytics & its Trends
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Big data analytics - hadoop
Big data analytics - hadoopBig data analytics - hadoop
Big data analytics - hadoop
 
Big data
Big dataBig data
Big data
 
Hadoop HDFS.ppt
Hadoop HDFS.pptHadoop HDFS.ppt
Hadoop HDFS.ppt
 
Big Data
Big DataBig Data
Big Data
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
 
Big-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigBig-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-Koenig
 
How Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessHow Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help business
 
Big data Question bank.pdf
Big data Question bank.pdfBig data Question bank.pdf
Big data Question bank.pdf
 
Big data
Big dataBig data
Big data
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in details
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - Introduction
 
Big Data
Big DataBig Data
Big Data
 

Último

Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptxPoojaSen20
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 

Último (20)

Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
microwave assisted reaction. General introduction

UNDERSTANDING BIG DATAS THINK BIG

  • 1. UNDERSTANDING BIG DATAS THINK BIG MEHMET BURAK AKGÜN
  • 2. First of All, What Is Big Data? • Big data is collected in unstructured form from different sources, such as social media posts, network logs, blogs, photos, videos, and log files. • Big data is information that could not previously be analyzed because of its enormous size and/or diversity.
  • 3. 1) Why big data? How did we get to this point? 2) What are the components of big data? 3) How will it integrate with existing systems? 4) Are there any big data platforms or applications?
  • 4. • Almost all data scientists believe that companies able to analyze information collected from all sources gain very important advantages. • For example, according to scientists, retailers that analyze their data can improve their operating profit margin by 60 percent. Similar rates are also valid for the public sector. • According to research, US healthcare can save 300 million dollars a year by using data science. • According to estimates, by 2020 about 50 billion devices, from household appliances to cars and phones, will produce data and silently communicate with each other. • For companies that want to make forward-looking decisions, it is critical that predictions drawn from this «Big Data» be accurate.
  • 5. ● Energy companies using smart grids and meters store and process usage data for individual subscribers. ● Banks have become 24/7 branches: using the information they store about customers, the Internet branch recognizes the user, knows on which days the user logs in, arranges the main page menu accordingly for the greatest efficiency, and offers reminders, customizable interfaces, and a rich, fast, and convenient branch. ● Hospitals store data in their databases in order to provide effective medical services. ● All of this information will be stored as "Big Data".
  • 7. Google created the technologies it needed without using conventional methods, and this approach was a success. Google keeps billions of web pages on the Google File System, uses Bigtable as its database, and uses MapReduce to process big data. See http://www.google.com/about/datacenters Google and Amazon publish academic articles about their work. Developers inspired by these articles, such as Doug Cutting, created similar technologies as open source. The best-known examples are Apache projects such as Lucene, Solr, Hadoop, and HBase. Each of these projects can handle big data successfully. Second-generation companies such as Facebook, Twitter, and LinkedIn went a step further by publishing the projects they developed for big data as open source. Cassandra, Hive, Pig, Voldemort, Storm, and IndexTank are examples.
  • 8. So how will big data be stored? How will it be processed? How will it be analyzed?
  • 9. 4 «components» of being Big • Volume: 12+ terabytes of tweets created daily • Variety: 100s of different types of data • Veracity: only 1 in 3 decision makers trust their information • Velocity: 5+ million trade events per second
  • 10.
  • 11. Social Networks and Social Work • Expected to reach 3 billion users. In August 2015, producing 12 TB of «log data» per day. • Expected to reach 500+ million users by the end of 2014. 160 million users are online. • 100 million active users. 12+ TB of tweet data per day!
  • 12. Social Networks and Social Work • Google processes 24 petabytes of data every day • 4.6 billion mobile phones exist • 2 billion Internet users • Annual traffic in 2014 equals 667 exabytes
  • 13. Users are not just people! ..
  • 14. Each engine produces 10 TB of data every 30 minutes.
  • 15. “Data generated by machines and sensors will exceed that generated by social media by at least a factor of 10.” * Leon Katsnelson Program Director, Big Data & Cloud Computing IBM
  • 16. By the way, we don't have enough space to store all this data!
  • 17. “We appear to be a game company. In fact, we are a data analysis company.” Ken Rudin, Zynga VP of Analytics • Offers its games completely free of charge. • Earns revenue by selling virtual goods. • Averages 232 million monthly active users. • 95% of players have never visited the shop! • By using big data analysis, they disrupted the game world.
  • 18. Four Entry Points of Big Data • Unlock Big Data • Simplify Your Warehouse • Preprocess Raw Data • Analyse Streaming Data. IBM Big Data Platform • Systems Management • Application Development • Visualization & Discovery • Accelerators • Information Integration & Governance • Hadoop System • Stream Computing • Data Warehouse. Analytic Applications • BI / Reporting • Exploration / Visualization • Functional App • Industry App • Predictive Analytics • Content Analytics
  • 19. Applications for Big Data Analytics • Homeland Security • Finance • Smarter Healthcare • Multi-channel Sales • Telecom • Manufacturing • Traffic Control • Trading Analytics • Fraud and Risk • Log Analysis • Search Quality • Retail: Churn, NBO
  • 20. "Companies that give importance to social media are winning." The "Big Data: Great Opportunity" event organized by Teradata in Turkey brought together senior company executives in Istanbul. Hermann Wimmer, President of Teradata EMEA, said that Teradata helps companies that want to get ahead of their competitors and increase their profitability to make quick decisions by analyzing data obtained from social media in a short time.
  • 21. A technological solution is preferred when it provides advantages.
  • 22.
  • 23. Big Data and Open Source Are Intertwined • Contributions made to the open source community over the years - Apache Hadoop and Jaql, Apache Derby, Apache Geronimo, Apache Jakarta - Eclipse: founded by IBM - Lucene: IBM Lucene Extension Library (ILEL) - DRDA, XQuery, SQL, XML4J, XERCES, HTTP, Java, Linux... • IBM software built on open source – WebSphere: Apache – Rational: Eclipse and Apache – InfoSphere: Eclipse and Apache • IBM’s BigInsights (Hadoop) is 100% open source software.
  • 24. Hadoop • Open Source • Distributed Computing • Very Simply MapReduce • Connected computers
  • 25. Example Hadoop Installations • eBay – 532 nodes, 532 × 8 cores = 4,256 cores • Facebook – 1,100 nodes, 8,800 cores, 12 PB of storage • LinkedIn – 1,200 + 580 + 120 nodes • Quantcast – 3,000 nodes, 3.5 PB, 1 PB+ of daily data
  • 26. Hadoop • Components: – Data storage: HDFS, a distributed disk file system – Data processing: MapReduce
  • 27. HDFS • Modeled on the Google File System • Distributed disk file system • File clustering (file sizes are usually larger than gigabytes) • Fault tolerant • Replication • HDFS is aware of the physical location of the data.
  • 28. HDFS • Accessible through the Hadoop shell, the Java API, or the web UI • Four node roles: – NameNode: manages the file system metadata – JobTracker: MapReduce – TaskTracker: MapReduce – DataNode: stores the data, in coordination with the NameNode
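The block splitting, replication, and fault tolerance listed on the HDFS slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny block size, the replication factor of 3, the node names, and the round-robin placement are all hypothetical, and real HDFS uses much larger blocks and a rack-aware placement policy that is not modeled here.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumption: simple round-robin, not the real HDFS placement policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def surviving_blocks(placement, failed_node):
    """Blocks that still have at least one replica after failed_node is lost."""
    return [
        block_id
        for block_id, holders in placement.items()
        if any(node != failed_node for node in holders)
    ]


# Hypothetical cluster and file: 300 bytes cut into 128-byte blocks.
nodes = ["node-a", "node-b", "node-c", "node-d"]
blocks = split_into_blocks(300, 128)
placement = place_replicas(len(blocks), nodes)
still_readable = surviving_blocks(placement, "node-a")
```

Because every block lives on three different nodes, losing any single node leaves all blocks readable, which is the point of replication.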
  • 29. BIG DATA is not just HADOOP • Manage & store huge volume of any data: Hadoop File System, MapReduce • Manage streaming data: Stream Computing • Analyze unstructured data: Text Analytics Engine • Structure and control data: Data Warehousing • Integrate and govern all data sources: Integration, Data Quality, Security, Lifecycle Management, MDM • Understand and navigate federated big data sources: Federated Discovery and Navigation
  • 30. MapReduce • Big data processing model from Google • Distributed computation model • Processes data as key-value pairs • Easy programming framework – to use it, you implement the map() and reduce() functions
  • 31. Map: First Step • Prepares elements for the next stage of processing by pairing each one with a key – Data cleaning – Simple calculations – Splitting strings
  • 32. Reduce: Last Step • Receives the list of values for the same key through iterators – Filtering – Combining – Sampling • The data is thus reduced. The result is written to HDFS or HBase.
  • 33. Architecture: MapReduce • HDFS cluster: input records arrive as (K1, V1) pairs • Map (M): each mapper turns its (K1, V1) records into (K2, V2) pairs • Grouping and ranking: pairs with the same key are collected into (K2, Iterable<V2>) • Reduce (R): each reducer emits (K3, V3) pairs
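The map, grouping, and reduce stages can be imitated on a single machine with a word count, the usual introductory example. The sketch below is plain Python, not the Hadoop API: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, and the in-memory dictionary stands in for the distributed grouping step.

```python
from collections import defaultdict


def map_fn(_line_number, line):
    """Map: turn one (K1, V1) record into (K2, V2) pairs, here (word, 1)."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: fold the values for one key into (K3, V3), here (word, total)."""
    yield word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Run mapper, group by key, then run reducer over each key in sorted order."""
    groups = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)          # grouping: K2 -> list of V2
    results = []
    for k2 in sorted(groups):              # keys are processed in sorted order
        results.extend(reducer(k2, iter(groups[k2])))
    return results


lines = [(0, "big data is big"), (1, "data is data")]
word_counts = run_mapreduce(lines, map_fn, reduce_fn)
```

In a real cluster the mappers and reducers run on different machines and the grouping step moves data over the network; only the two small functions are written by the user.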
  • 34. Apache Pig • Developed at Yahoo! • Designed for easy analysis of large data sets • Easier than writing MapReduce functions in Java – Coded in Pig Latin – Extensible • Similar to other languages • 10 lines of Pig code may be equivalent to hundreds of lines of Java
  • 35. Running Pig • Grunt shell • Java interface • Eclipse and IntelliJ IDEA plugins • Example: tweets = load '/today/tweets' as (user, mention, tweet); twitters = group tweets by mention;
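To make the two Pig statements concrete, here is a rough Python equivalent of loading (user, mention, tweet) rows and grouping them by mention. The sample rows and the `group_by_mention` helper are invented for illustration; Pig would run the same logic as distributed jobs over the file at '/today/tweets'.

```python
from collections import defaultdict

# Hypothetical sample rows in the (user, mention, tweet) layout from the slide.
tweets = [
    ("alice", "bob", "hi @bob"),
    ("carol", "bob", "@bob see this"),
    ("bob", "alice", "thanks @alice"),
]


def group_by_mention(rows):
    """Collect the full rows under each mentioned user, like Pig's GROUP ... BY."""
    groups = defaultdict(list)
    for row in rows:
        _user, mention, _text = row
        groups[mention].append(row)
    return dict(groups)


twitters = group_by_mention(tweets)
```

As in Pig, each group keeps the complete original rows, so later steps can count them or inspect the tweet text.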
  • 36. Apache Hive • Data warehouse project for Hadoop • Summarizing data • Ad hoc queries • SQL-like language: HiveQL – Allows custom mappers and reducers to be defined • Special note*: the Hive compiler translates SQL-like queries into MapReduce operations
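The kind of summarizing query that an SQL-like language makes easy can be shown with ordinary SQL. The sketch below uses Python's built-in sqlite3 module purely as a stand-in: it is SQLite, not Hive, and the `tweets` table and its rows are hypothetical. In Hive a query of this shape would be compiled into MapReduce jobs rather than executed in-process.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (author TEXT, mention TEXT)")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?)",
    [("alice", "bob"), ("carol", "bob"), ("bob", "alice")],
)

# Summarize: how many times was each user mentioned?
mention_counts = conn.execute(
    "SELECT mention, COUNT(*) AS n FROM tweets "
    "GROUP BY mention ORDER BY n DESC, mention"
).fetchall()
conn.close()
```

The GROUP BY clause plays the role of the grouping step and COUNT(*) the role of the reducer.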
  • 37. HBase • We do not always need a relational database • We need scalability • Tables can be very large, yet we want very fast access • A distributed, key-sorted, persistent map
  • 38. HBase • A clone of Google's Bigtable • Runs on HDFS: fault tolerance, scalability, MapReduce input/output • HBase = HDFS + random read/write
  • 39. HBase: Where to Use • Social media • Recommender systems • Search engines • Intelligence and monitoring services • Financial systems (fraud)
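The "key-sorted map" idea behind HBase can be illustrated with a small in-memory class: keeping keys sorted makes both random lookups and range scans cheap. This is a single-process toy, not the HBase client API; `SortedMap`, its methods, and the row keys are hypothetical, and nothing here is distributed or persistent.

```python
import bisect


class SortedMap:
    """Toy key-sorted map supporting random reads, writes, and range scans."""

    def __init__(self):
        self._keys = []      # row keys, kept in sorted order
        self._values = {}    # row key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan(self, start, stop):
        """Return (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


table = SortedMap()
for row_key, name in [("user#003", "carol"), ("user#001", "alice"),
                      ("video#001", "intro"), ("user#002", "bob")]:
    table.put(row_key, name)

# "$" sorts right after "#", so this range covers every "user#..." key.
users = table.scan("user#", "user$")
```

Choosing row keys so that related rows sort next to each other is what makes such scans useful.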
  • 40. Apache Mahout • Lets us go beyond simple analysis • Classification - Email spam - Call center • Clustering - Finding new news • Recommender systems
  • 41. Apache Mahout Ready-Made Algorithms: • Classification: • Logistic regression • Bayesian • Random forests – Applications: • Call forwarding • Face recognition
  • 42. Apache Mahout Ready-Made Algorithms: Recommendation: • Distributed item-based • Matrix factorization • Non-distributed item/user-based – Applications: • eCommerce - Amazon • Movie recommendation - Netflix • Music recommendation - Last.fm • Venue recommendation - Foursquare
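As a rough idea of what non-distributed item-based recommendation involves, the sketch below scores unseen items by their cosine similarity to items a user has already rated. It is plain Python, not Mahout's API; the ratings, the function names, and the choice of cosine similarity are assumptions made for illustration.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: score}.
ratings = {
    "ann": {"A": 5, "B": 4},
    "ben": {"A": 4, "B": 5, "C": 2},
    "cat": {"B": 1, "C": 5},
}


def item_vectors(ratings):
    """Invert the data to item -> {user: score}."""
    items = {}
    for user, user_ratings in ratings.items():
        for item, score in user_ratings.items():
            items.setdefault(item, {})[user] = score
    return items


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    dot = sum(a[u] * b[u] for u in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(user, ratings):
    """Rank items the user has not rated by similarity-weighted scores."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {
        candidate: sum(cosine(items[candidate], items[liked]) * score
                       for liked, score in seen.items())
        for candidate in items
        if candidate not in seen
    }
    return sorted(scores, key=scores.get, reverse=True)


suggestions_for_ann = recommend("ann", ratings)
```

A distributed implementation computes the same item-to-item similarities, but spreads the work across many machines.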
  • 43. In summary: Some Big Data Applications • Log Analytics (IT for IT) • Smart Grid / Smarter Utilities • RFID Tracking & Analytics • Fraud / Risk Management & Modeling • 360° View of the Customer • Warehouse Extension • Email / Call Center Transcript Analysis • Call Detail Record Analysis • IBM Watson
  • 44. The IBM Big Data Platform
  • 45. IBM Big Data Platform: IBM Big Data & Netezza Product Group • Hadoop – InfoSphere BigInsights: Hadoop-based, low-latency analysis of diverse, high-volume data • MPP Data Warehouse – IBM Netezza High Capacity Appliance: queryable archive of structured data – IBM Netezza 1000: BI and ad hoc analysis of structured data – IBM Smart Analytics System: analysis of structured operational data – IBM Informix Timeseries: time-structured analytics – IBM InfoSphere Warehouse: high-volume structured data analysis • Stream Computing – InfoSphere Streams: low-latency analysis of streaming data • Information Integration – InfoSphere Information Server: high-volume data integration and transformation
  • 46. Big Data Exploration: Value & Diagram • Sources: File Systems, Relational Data, Content Management, Email, CRM, Supply Chain, ERP, RSS Feeds, Cloud, Custom Sources • Data Explorer • Application / Users • Find, visualize & understand all big data to improve business knowledge • Greater efficiencies in business processes • New insights from combining and analyzing data types in new ways • Develop new business models with resulting increased market presence and revenue
  • 47. What Does the IBM Platform Do? • Analysis of diverse data: analyzing data of mixed types • Dynamic data analysis: ad hoc analysis of high-volume streaming data • Explore and experiment: ad hoc analysis, discovery, and inspection of data • Manage and plan: data rules, data integrity checking and enforcement • Very-high-volume data analysis: analyzing PB-scale data at an appropriate price/performance