Intro to Big Data and Apache Hadoop
Dr. Amr Awadallah, CTO/Founder
@awadallah, aaa@cloudera.com
Who is Cloudera?
What the Enterprise Requires
• The market-leading Hadoop-based platform with batch and real-time processing frameworks
• A comprehensive suite of system and data management software
• Training and certification programs
• Comprehensive support and consulting services
Extensive Partner Ecosystem
• Over 400 partners across hardware, software and services
The Leader in Big Data Management
• Deliver a revolutionary data management platform based on Apache Hadoop
• Enable organizations to improve operational efficiency and Ask Bigger Questions of all their data
Customers & Users Across Industries
• More production deployments than all other vendors combined
©2013 Cloudera, Inc. All Rights Reserved.
Data Has Changed in the Last 30 Years
[Chart: DATA GROWTH from 1980 to 2012, with labels END-USER APPLICATIONS, THE INTERNET, MOBILE DEVICES, and SOPHISTICATED MACHINES. STRUCTURED DATA: 10%; UNSTRUCTURED DATA: 90%.]
What if you wanted to…
[Diagram labels: Data, Question, Speed, Usage, Type/Form]
So what is Apache Hadoop?
• Self-Healing, High-Bandwidth Clustered Storage
• Byte Streams
• Fault-Tolerant Distributed Processing
• Schema-on-Read
[Diagram, Storage: HDFS storage distribution. An input file of blocks 1 to 5 is spread across Node A to Node E, three blocks per node (as extracted: 2 4 5 | 1 2 5 | 1 3 4 | 2 3 5 | 1 3 4), so each block is stored on three nodes.]
[Diagram, Compute: MapReduce compute distribution. The same block layout across Node A to Node E feeds the output file.]
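The block layout in the HDFS diagram can be checked with a few lines of Python. This is only an illustrative sketch, not HDFS code: the assignment of the five extracted block groups to Node A through Node E in order is an assumption about the original picture, and `replica_counts` and `survives_loss` are hypothetical helper names.

```python
from collections import Counter

# Block layout as it appears in the slide text. Mapping the five groups to
# Node A..Node E in order is an assumption about the original diagram.
PLACEMENT = {
    "A": {2, 4, 5},
    "B": {1, 2, 5},
    "C": {1, 3, 4},
    "D": {2, 3, 5},
    "E": {1, 3, 4},
}


def replica_counts(placement):
    """Count how many nodes hold a copy of each block."""
    counts = Counter()
    for blocks in placement.values():
        counts.update(blocks)
    return dict(counts)


def survives_loss(placement, failed):
    """Return True if every block is still readable after `failed` nodes go down."""
    all_blocks = set().union(*placement.values())
    remaining = set()
    for node, blocks in placement.items():
        if node not in failed:
            remaining |= blocks
    return remaining == all_blocks


print(replica_counts(PLACEMENT))
print(survives_loss(PLACEMENT, {"A", "B"}))
```

With this layout every block has three copies, so the file stays readable after any two nodes fail. That redundancy is what makes the "self-healing" re-replication possible.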
Next-Gen Data Management
The Key Benefit: Agility/Flexibility
Schema-on-Write (RDBMS):
• Prescriptive Data Modeling:
• Create static DB schema
• Transform data into RDBMS
• Query data in RDBMS format
• New columns must be added explicitly before new data can propagate into the system.
• Good for Known Unknowns (Repetition)
Schema-on-Read (Hadoop):
• Descriptive Data Modeling:
• Copy data in its native format
• Create schema + parser
• Query data in its native format (does ETL on the fly)
• New data can start flowing any time and will appear retroactively once the schema/parser properly describes it.
• Good for Unknown Unknowns (Exploration)
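The schema-on-read idea can be sketched in plain Python, without Hadoop. The log format, the field names, and the `parse` and `query` helpers are all hypothetical, chosen only to show a field appearing "retroactively" once the parser describes it.

```python
# Raw records are stored exactly as they arrived (hypothetical log format).
RAW_LINES = [
    "2013-01-05 user=alice action=login device=mobile",
    "2013-01-06 user=bob action=purchase",
]


def parse(line, fields):
    """Apply a schema at read time: extract only the requested fields."""
    date, *pairs = line.split()
    key_values = dict(pair.split("=", 1) for pair in pairs)
    record = {"date": date}
    for name in fields:
        # None when the raw line does not carry the field.
        record[name] = key_values.get(name)
    return record


def query(lines, fields):
    """Parse every raw line with the given schema (ETL on the fly)."""
    return [parse(line, fields) for line in lines]


# First schema: the 'device' field is ignored but still present in the raw data.
v1 = query(RAW_LINES, ["user", "action"])

# Updated schema: 'device' now shows up for old records, with no reload.
v2 = query(RAW_LINES, ["user", "action", "device"])

print(v1[0])
print(v2[0])
```

In a schema-on-write system the equivalent change would need an explicit new column, and any data loaded before that change would not carry the field.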
Scalable Technology + Scalable Development
Grows without requiring developers to re-architect their algorithms/applications
AUTO SCALE
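A toy word count in the MapReduce style illustrates the claim. It is a single-process sketch, not the Hadoop API: `map_partition`, `reduce_counts`, and `word_count` are hypothetical names, and the partitions stand in for cluster nodes.

```python
from collections import Counter
from functools import reduce


def map_partition(lines):
    """Map step: count words within one partition of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


def word_count(lines, partitions):
    """Split the input into `partitions` chunks, then run map and reduce."""
    chunks = [lines[i::partitions] for i in range(partitions)]
    return reduce(reduce_counts, map(map_partition, chunks), Counter())


LINES = ["big data", "data hadoop", "big big"]

# The same map and reduce functions give the same answer on 1 or 4 partitions.
print(word_count(LINES, 1))
print(word_count(LINES, 4))
```

Only the `partitions` argument changes between the two calls. The map and reduce logic is untouched, which is the sense in which the application does not need to be re-architected as the cluster grows.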
Economics: Return on Byte
[Chart labels: High ROB; Low ROB (but still a ton of aggregate value)]
CDH: Cloudera Distribution incl. Apache Hadoop
• Hadoop Core Kernel: MapReduce, HDFS
• Coordination: Apache ZooKeeper
• Data Integration: Apache Flume, Apache Sqoop
• Fast Read/Write Access: Apache HBase
• Batch Processing Languages: Apache Pig, Apache Hive
• Interactive SQL: Impala
• Data Mining Lib: Apache Mahout
• Data Processing Lib: DataFu for Pig
• Web Console: Hue
• Job Workflow: Apache Oozie
• Metadata: Apache Hive MetaStore
• Connectivity: ODBC/JDBC/FUSE/HTTPS
• Cloud Deployment: Apache Whirr
• Build/Test: Apache Bigtop
• Cloudera Manager Free Edition (Installation Wizard)
Cloudera Enterprise
The Cloudera Solution Stack
• CLOUDERA UNIVERSITY: Developer Training, Administrator Training, Data Science Training, Certification Programs
• PROFESSIONAL SERVICES: Use Case Discovery, New Hadoop Deployment, Proof-of-Concept, Deployment Certification, Process & Team Development, Production Pilots
• MANAGEMENT SOFTWARE & TECHNICAL SUPPORT (SUBSCRIPTION)
• CDH: Ingest, Store, Explore, Process, Analyze, Serve
• CM: Cloudera Manager
• CS: Cloudera Support
• OSS: Apache Hadoop & Open Source Software
Powered by Cloudera Impala
• With Impala:
  • Interactive ANSI-92 SQL queries
  • Native distributed query engine
  • Optimized for low latency
• Provides:
  • Answers as fast as you can ask
  • Everyone can ask questions of all data
  • Big data storage and analytics together
• Unified storage:
  • Supports HDFS and HBase
  • Flexible file formats and schemas
• Unified Metastore
• Unified Security
• Unified Client Interfaces:
  • ODBC/JDBC
  • SQL syntax
  • Hue Beeswax Web UI
[Diagram: BEFORE IMPALA vs. WITH IMPALA, with layers labeled BATCH PROCESSING, USER INTERFACE, and REAL-TIME ACCESS]
Cloudera in the Enterprise Stack
Use Case: A Major Financial Institution
The Challenge:
• Current EDW at capacity; cannot support growing data depth and width
• Performance issues in business-critical apps; little room for innovation
The Solution:
• Cloudera Enterprise offloads data storage (S), processing (T) & some analytics (Q) from the EDW.
• EDW resources can now be focused on repeatable operational analytics.
• A month of data scanned in 4 secs vs. 4 hours
New solution saves tens of millions by optimizing existing EDW for analytics & reducing data storage costs by 99%
[Diagram, before: DATA WAREHOUSE split into Operational (44%), ELT Processing (42%), Analytics (11%). After: CLOUDERA takes Analytics, Processing, and Storage; DATA WAREHOUSE split into Operational (50%), Analytics (50%).]
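The figures quoted for this use case can be put side by side with a little arithmetic. The snippet below uses only the numbers on the slide; the variable names are illustrative.

```python
# Figures quoted on the slide: a month of data scanned in 4 seconds vs. 4 hours.
before_seconds = 4 * 60 * 60  # 4 hours expressed in seconds
after_seconds = 4

speedup = before_seconds / after_seconds

# Workload shares of the data warehouse before the offload, as shown in the diagram.
before_shares = {"Operational": 44, "ELT Processing": 42, "Analytics": 11}
accounted_for = sum(before_shares.values())

print(f"Scan speedup: {speedup:.0f}x")
print(f"Workload share accounted for before offload: {accounted_for}%")
```

The scan time works out to a 3600x improvement. The three shares in the "before" diagram add up to 97%, and the slide does not say what the remaining 3% covers.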
Beyond Data Warehousing
• COMMUNICATIONS: Location-based advertising
• HEALTH CARE: Patient sensors, monitoring, EHRs; Quality of care
• LAW ENFORCEMENT & DEFENSE: Threat analysis, Social media monitoring, Photo analysis
• EDUCATION & RESEARCH: Experiment sensor analysis
• FINANCIAL SERVICES: Risk & portfolio analysis; New products
• ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; Website optimization
• UTILITIES: Smart Meter analysis for network capacity
• CONSUMER PACKAGED GOODS: Sentiment analysis of what’s hot, customer service
• MEDIA / ENTERTAINMENT: Viewers / advertising effectiveness
• TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; Customer sentiment
• LIFE SCIENCES: Clinical trials; Genomics
• RETAIL: Consumer sentiment; Optimized marketing
• AUTOMOTIVE: Auto sensors reporting location, problems
• HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg quality; Warranty analysis
• OIL & GAS: Drilling exploration sensor analysis
The Road Ahead
• 2006-2012: Bringing Compute to Data
• 2013-???: Bringing Applications to Data
The Cloudera Platform for Big Data
Flexibility
• Store any data
• Run any analysis
• Keeps pace with the rate of change of incoming data
Scalability
• Proven growth to PBs/1,000s of nodes
• No need to rewrite queries; scales automatically
• Keeps pace with the rate of growth of incoming data
Economics
• Cost per TB at a fraction of other options
• Keep all of your data alive in an active archive
• Powering the "data beats algorithm" movement
Dr. Amr Awadallah
CTO/Founder
@awadallah
aaa@cloudera.com

Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13 from the Inevitable Cloud Community

  • 1. Intro to Big Data and Apache Hadoop Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com
  • 2. Who is Cloudera? 2 What the Enterprise Requires  The market-leading Hadoop-based platform with batch and real-time processing frameworks  A comprehensive suite of system and data management software  Training and certification programs  Comprehensive support and consulting services Extensive Partner Ecosystem  Over 400 partners across hardware, software and services The Leader in Big Data Management  Deliver a revolutionary data management platform based on Apache Hadoop  Enable organizations to improve operational efficiency and Ask Bigger Questions of all their data Customers & Users Across Industries  More production deployments than all other vendors combined ©2013 Cloudera, Inc. All Rights Reserved.
  • 3. Data Has Changed in the Last 30 Years. Data growth, 1980 to 2012: end-user applications, the Internet, mobile devices, sophisticated machines. Structured data: 10%; unstructured data: 90%.
  • 4. What if you wanted to… Data, Question, Speed, Usage, Type/Form
  • 5. So what is Apache Hadoop? Self-Healing, High-Bandwidth Clustered Storage; Byte Streams; Fault-Tolerant Distributed Processing; Schema-on-Read. Diagram, Storage (HDFS storage distribution): the Input File is split into blocks 1 to 5 and spread across Node A to Node E, three blocks per node (2 4 5, 1 2 5, 1 3 4, 2 3 5, 1 3 4 in diagram order), so each block is held on three nodes. Diagram, Compute (MapReduce compute distribution): the same block layout on the same five nodes is processed to produce the Output File.
  • 6. Next-Gen Data Management
  • 7. The Key Benefit: Agility/Flexibility. Schema-on-Write (RDBMS): Prescriptive Data Modeling: create static DB schema; transform data into RDBMS; query data in RDBMS format. New columns must be added explicitly before new data can propagate into the system. Good for Known Unknowns (Repetition). Schema-on-Read (Hadoop): Descriptive Data Modeling: copy data in its native format; create schema + parser; query data in its native format (does ETL on the fly). New data can start flowing any time and will appear retroactively once the schema/parser properly describes it. Good for Unknown Unknowns (Exploration).
  • 8. Scalable Technology + Scalable Development: auto scale. Grows without requiring developers to re-architect their algorithms/application.
  • 9. Economics: Return on Byte (ROB). High ROB; Low ROB (but still a ton of aggregate value).
  • 10. CDH: Cloudera Distribution incl. Apache Hadoop. Hadoop Core Kernel: MapReduce, HDFS. Coordination: Apache ZooKeeper. Data Integration: Apache Flume, Apache Sqoop. Fast Read/Write Access: Apache HBase. Batch Processing Languages: Apache Pig, Apache Hive. Web Console: Hue. Job Workflow: Apache Oozie. Metadata: Apache Hive MetaStore. Interactive SQL: Impala. Data Mining Lib: Apache Mahout. Data Processing Lib: DataFu for Pig. Connectivity: ODBC/JDBC/FUSE/HTTPS. Cloud Deployment: Apache Whirr. Build/Test: Apache Bigtop. Cloudera Manager Free Edition (Installation Wizard).
  • 11. Cloudera Enterprise
  • 12. The Cloudera Solution Stack. Cloudera University: developer training, administrator training, data science training, certification programs. Professional Services: use case discovery, new Hadoop deployment, proof-of-concept, deployment certification, process & team development, production pilots. Management Software & Technical Support (Subscription): CDH (ingest, store, explore, process, analyze, serve); CM: Cloudera Manager; CS: Cloudera Support; OSS: Apache Hadoop & open source software.
  • 13. Powered by Cloudera Impala. With Impala: interactive ANSI-92 SQL queries; native distributed query engine; optimized for low latency. Provides: answers as fast as you can ask; everyone can ask questions of all data; big data storage and analytics together. Unified storage: supports HDFS and HBase; flexible file formats and schemas. Unified Metastore. Unified Security. Unified Client Interfaces: ODBC/JDBC; SQL syntax; Hue Beeswax Web UI. Diagram (Before Impala vs. With Impala): batch processing, user interface, real-time access.
  • 14. Cloudera in the Enterprise Stack
  • 15. Use Case: A Major Financial Institution. The Challenge: current EDW at capacity, cannot support growing data depth and width; performance issues in business-critical apps, little room for innovation. The Solution: Cloudera Enterprise offloads data storage (S), processing (T) & some analytics (Q) from the EDW; EDW resources can now be focused on repeatable operational analytics; a month of data is scanned in 4 secs vs. 4 hours. New solution saves tens of millions by optimizing existing EDW for analytics & reducing data storage costs by 99%. Diagram: data warehouse workload of Operational (44%), ELT Processing (42%), Analytics (11%); with analytics, processing and storage moved to Cloudera, data warehouse workload of Operational (50%), Analytics (50%).
  • 16. Beyond Data Warehousing. Communications: location-based advertising. Health Care: patient sensors, monitoring, EHRs; quality of care. Law Enforcement & Defense: threat analysis, social media monitoring, photo analysis. Education & Research: experiment sensor analysis. Financial Services: risk & portfolio analysis; new products. On-line Services / Social Media: people & career matching; website optimization. Utilities: smart meter analysis for network capacity. Consumer Packaged Goods: sentiment analysis of what’s hot, customer service. Media / Entertainment: viewers / advertising effectiveness. Travel & Transportation: sensor analysis for optimal traffic flows; customer sentiment. Life Sciences: clinical trials; genomics. Retail: consumer sentiment; optimized marketing. Automotive: auto sensors reporting location, problems. High Technology / Industrial Mfg.: mfg quality; warranty analysis. Oil & Gas: drilling exploration sensor analysis.
  • 17. The Road Ahead. 2006-2012: Bringing Compute to Data. 2013-???: Bringing Applications to Data.
  • 18. The Cloudera Platform for Big Data. Flexibility: store any data; run any analysis; keeps pace with the rate of change of incoming data. Scalability: proven growth to PBs/1,000s of nodes; no need to rewrite queries, automatically scales; keeps pace with the rate of growth of incoming data. Economics: cost per TB at a fraction of other options; keep all of your data alive in an active archive; powering the "data beats algorithm" movement.
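
The schema-on-read idea from slide 7 can be illustrated with a minimal Python sketch. This is not Hadoop code and not from the deck: the log format, the names RAW_LINES, SCHEMA_V1, SCHEMA_V2, read_with_schema and slow_paths are all hypothetical, chosen only to show raw data being stored untouched while the schema/parser is applied at query time, so that a richer parser surfaces new fields retroactively.

```python
import re

# Raw records are kept exactly as they arrived (hypothetical log format).
RAW_LINES = [
    "2013-03-01 GET /home 200 12ms",
    "2013-03-01 GET /cart 500 340ms",
    "2013-03-02 POST /checkout 200 95ms",
]

# Version 1 of the schema/parser describes only the first three fields.
SCHEMA_V1 = re.compile(r"(?P<date>\S+) (?P<method>\S+) (?P<path>\S+)")

# Version 2 also describes status and latency; the stored lines are unchanged.
SCHEMA_V2 = re.compile(
    r"(?P<date>\S+) (?P<method>\S+) (?P<path>\S+) "
    r"(?P<status>\d+) (?P<latency_ms>\d+)ms"
)


def read_with_schema(lines, schema):
    """Apply the schema at read time, yielding one dict per parsable line."""
    for line in lines:
        match = schema.match(line)
        if match:
            yield match.groupdict()


def slow_paths(lines, threshold_ms):
    """Query the raw data through the newer schema (parsing on the fly)."""
    rows = read_with_schema(lines, SCHEMA_V2)
    return [row["path"] for row in rows if int(row["latency_ms"]) > threshold_ms]


if __name__ == "__main__":
    # Old view of the data: only date, method and path are visible.
    print(list(read_with_schema(RAW_LINES, SCHEMA_V1)))
    # New view of the same bytes: latency is now queryable for old records too.
    print(slow_paths(RAW_LINES, 90))
```

In a schema-on-write system the equivalent change would mean altering the table and reloading or backfilling the data before the new columns could be queried; here only the parser changes.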