Deutsche Telekom Perspective on
HADOOP and Big Data Technologies
Gregory Smith
VP Solution Design and Emerging Technologies and Architectures
T-Systems North America
Gregory.Smith@t-systems.com
Deutsche Telekom and T-Systems Key Stats
 Deutsche Telekom is Europe’s largest telecom service provider
– Revenue: $75 billion
– Employees: 232,342
 T-Systems is the enterprise division of Deutsche Telekom
– Revenue: $13 billion
– Employees: 52,742
– Services: data center, end user computing, networking, systems
integration, cloud and big data
Overwhelmed by new data types?
Sentiment
data
Call detail records (CDRs)
Sensor- / machine-based data
Big Data
Transactions, Interactions, Observations
Clickstream
data
80% of new data in 2015 will land on Hadoop!
Hadoop is like a data warehouse,
but it can store more data, more kinds of data,
and perform more flexible analyses
Hadoop is open source
and runs on industry standard hardware,
so it's 1-2 orders of magnitude more economical
than conventional data warehouse solutions
Hadoop provides more cost effective storage, processing,
and analysis. Some existing workloads run faster, cheaper, better
Hadoop can deliver a foundation for profitable growth:
Gain value from all your data by asking bigger questions
Reference architecture view of Hadoop
Security
Operations
Infrastructure
Virtualization Compute / Storage / Network
Workflow and Scheduling
Management and Monitoring
Data Isolation
Access Management
Data Encryption
Data
Integration
Data Processing
Batch Processing
Real Time/Stream
Processing
Search and Indexing
Application
Analytics Apps Transactional Apps
Analytics
Middleware
Presentation
Data Visualization and
Reporting
Clients
Real Time
Ingestion
Batch
Ingestion
Data
Connectors
Metadata
Services
Data Management
Distributed
Processing
(MapReduce)
Non-relational
DB
Structured
In-Memory
Distributed
Storage
(HDFS)
Hadoop Core
Hadoop Projects
Adjacent Categories
Example application landscape
ETL (Informatica, Talend, Spring Integration)
Real Time Streams (Social, sensors)
Structured and Unstructured Data (HDFS, MapR)
Real Time Database (Shark, GemFire, HBase, Cassandra)
Interactive Analytics (Impala, Greenplum, AsterData, Netezza…)
Batch Processing (Map-Reduce)
Real-Time Processing (S4, Storm, Spark)
Data Visualization (Excel, Tableau)
Hive
Machine Learning (Mahout, etc…)
Cloud Infrastructure: Compute, Storage, Networking
Source: VMware
Disruptive innovations in Big Data
Traditional Database
HADOOP
NoSQL Database
MPP Analytics
Data Warehouse
Schema: pre-defined, fixed and required on write, versus required on read (store first, ask questions later)
Processing: no or limited data processing, versus processing coupled with data and parallel processing / scale out
Data types: structured, versus any, including unstructured
Physical infrastructure: enterprise grade, mission critical, versus commodity is an option with much cheaper storage
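The schema contrast on this slide (required on write versus required on read) can be illustrated with a minimal Python sketch. This is not Hadoop code. The record layout, the field names and the `store` and `read_with_schema` helpers are hypothetical, chosen only to show raw records being kept untouched and a structure being applied at the moment a question is asked.

```python
import json

# Hypothetical raw store: records are kept exactly as they arrived,
# with no schema enforced at write time ("store first").
raw_store = []

def store(record_line):
    """Append a raw line without validating or reshaping it."""
    raw_store.append(record_line)

def read_with_schema(fields):
    """Apply a schema at read time ("ask questions later").

    Projects the requested fields from every line that parses as a
    JSON object, using None for fields a record does not carry.
    Lines that are not JSON objects are skipped, not deleted.
    """
    rows = []
    for line in raw_store:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        rows.append({name: record.get(name) for name in fields})
    return rows

# Three differently shaped inputs land in the same store.
store('{"caller": "A", "callee": "B", "seconds": 42}')
store('{"sensor": "s1", "temp_c": 21.5}')
store('free text that no schema anticipated')

# Two different questions, two different schemas, same raw data.
call_view = read_with_schema(["caller", "seconds"])
sensor_view = read_with_schema(["sensor", "temp_c"])
```

A schema-on-write system would have rejected or reshaped the second and third records at load time. Here they stay available for questions that had not been defined when the data arrived.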
Business
problem
Technology
Solution
Legacy BI
 Backward-looking
analysis
 Using data out of
business applications
 SAP Business Objects
 IBM Cognos
 MicroStrategy
 Structured
 Limited (2 – 3 TB in
RAM)
High Performance
BI
 Quasi-real-time
analysis
 Using data out of
business applications
 Oracle Exadata
 SAP HANA
 Structured
 Limited (2 – 8 TB in
RAM)
“Hadoop”
Ecosystem
 Forward-looking
predictive analysis
 Questions defined in
the moment, using
data from many
sources
 Hadoop distributions
 No ACID transactions
 Limited SQL Set (joins)
 Structured or
unstructured
 Unlimited (20 – 30 PB)
“True” big data
Legacy vendor definition of big data
Selected Vendors
Data Type/Scalability
Innovations: Hadoop is 100x cheaper per TB
than in-memory appliances like HANA and
handles unstructured data as well
Innovations:
Store first, ask questions later
SAN Storage: 3-5 €/GB, based on HDS SAN storage
NAS Filers: 1-3 €/GB, based on NetApp FAS-Series
White Box DAS 1): 0.50-1.00 €/GB, hardware can be self-assembled
Data Cloud 1): 0.10-0.30 €/GB, based on large scale object storage interfaces
Enterprise Class Hadoop Storage: ???€/GB, based on NetApp E-Series (NOSH)
1) Hadoop offers Storage + Compute (incl. search). Data Cloud offers Amazon S3 and native storage functions
Illustrative acquisition cost
Much cheaper storage
but not just storage…
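To make the per-GB figures on this slide easier to compare, the arithmetic can be sketched in Python. The price ranges are the illustrative acquisition costs quoted on the slide. The 100 TB volume, the decimal conversion of 1 TB = 1000 GB and the helper names `cost_range` and `cheaper_by` are assumptions made for this example only.

```python
# Illustrative acquisition cost ranges in EUR per GB, taken from the slide.
COST_PER_GB = {
    "SAN Storage": (3.00, 5.00),
    "NAS Filers": (1.00, 3.00),
    "White Box DAS": (0.50, 1.00),
    "Data Cloud": (0.10, 0.30),
}

GB_PER_TB = 1000  # assumption: decimal terabytes

def cost_range(option, terabytes):
    """Return the (low, high) acquisition cost in EUR for a given volume."""
    low, high = COST_PER_GB[option]
    gigabytes = terabytes * GB_PER_TB
    return (low * gigabytes, high * gigabytes)

def cheaper_by(option, baseline):
    """Return (conservative, optimistic) cost ratios of baseline to option.

    Conservative compares the cheapest baseline price with the most
    expensive option price; optimistic compares the reverse.
    """
    base_low, base_high = COST_PER_GB[baseline]
    opt_low, opt_high = COST_PER_GB[option]
    return (base_low / opt_high, base_high / opt_low)

san_100tb = cost_range("SAN Storage", 100)
das_100tb = cost_range("White Box DAS", 100)
das_vs_san = cheaper_by("White Box DAS", "SAN Storage")
```

Under these assumptions, 100 TB costs 300,000 to 500,000 € on SAN storage and 50,000 to 100,000 € on white box DAS, so the DAS option is roughly 3 to 10 times cheaper to acquire. The slide gives no price for the enterprise class Hadoop storage option, so it is left out of the table.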
Target use cases
IT Infrastructure
& Operations
Business
Intelligence &
Data Warehousing
Line of Business &
Business Analysts
CXO
Time to value (axis): Longer, Shorter
Potential value (axis): Lower, Higher
 Lower Cost
Storage
 Enterprise
Data Lake
 Enterprise Data
Warehouse
Offload
 Enterprise Data
Warehouse
Archive
 ETL Offload
 Capacity Planning &
Utilization
 Customer Profiling &
Revenue Analytics
 Targeted Advertising
Analytics
 Service Renewal
Implementation
 CDR based Data
Analytics
 Fraud Management
 New
Business
Models
Cost effective storage,
processing, and analysis
Foundation for
profitable growth
Enterprise data warehouse offload use case
The Challenge
 Many EDWs are at capacity
 Running out of budget before
running out of relevant data
 Older data archived “in the dark”,
not available for exploration
The Solution
 Hadoop for data storage and
processing: parse, cleanse,
apply structure and transform
 Free EDW for valuable queries
 Retain all data for analysis!
Operational (44%)
ETL Processing (42%)
Analytics (11%)
DATA WAREHOUSE
Storage & Processing
HADOOP
Operational (50%)
Analytics (50%)
DATA WAREHOUSE
Cost is
1/10th
GOAL:
Platform that natively supports
mixed workloads as shared service
AVOID:
Systems separated by workload
type due to contention
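The four steps named under The Solution (parse, cleanse, apply structure, transform) can be sketched as a small Python pipeline. This is a single-process illustration, not a Hadoop job. The comma-separated call record layout, the field names and all function names are hypothetical.

```python
from collections import defaultdict

def parse(line):
    """Split a raw comma-separated line into trimmed fields."""
    return [part.strip() for part in line.split(",")]

def cleanse(parts):
    """Reject malformed records: wrong field count or non-numeric duration."""
    if len(parts) != 3 or not parts[2].isdigit():
        return None
    return parts

def apply_structure(parts):
    """Give the fields names and types (hypothetical call record layout)."""
    caller, callee, seconds = parts
    return {"caller": caller, "callee": callee, "seconds": int(seconds)}

def transform(records):
    """Aggregate total call seconds per caller, the kind of compact
    summary that would then be loaded into the warehouse."""
    totals = defaultdict(int)
    for record in records:
        totals[record["caller"]] += record["seconds"]
    return dict(totals)

def offload_pipeline(lines):
    """Run parse, cleanse, apply structure and transform in order."""
    structured = []
    for line in lines:
        parts = cleanse(parse(line))
        if parts is not None:
            structured.append(apply_structure(parts))
    return transform(structured)

raw_lines = [
    "alice, bob, 120",
    "alice, carol, 30",
    "bob, alice, n/a",   # rejected: duration is not numeric
    "truncated line",    # rejected: wrong field count
    "bob, carol, 45",
]
summary = offload_pipeline(raw_lines)
```

In the offload pattern the slide describes, work of this kind runs on the Hadoop side, and the warehouse receives only the refined result, which frees its capacity for operational and analytic queries.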
From data puddles and ponds to lakes and oceans
Big
Data
BU1
Big
Data
BU2
Big
Data
BU3
Big Data
Transactions, Interactions, Observations
Refine Explore Enrich
Batch Interactive Online
Questions to ask in designing a solution
for a particular business use case
 Which distribution is right for your needs today vs. tomorrow?
 Which distribution will ensure you stay on the main path of
open source innovation, vs. trap you in proprietary forks?
Security
Operations
Infrastructure
Data Integration
Data Processing
Application
Presentation
Data Management
Note: Distributions include more than just the Data Management layer but are discussed at this point in the presentation.
Not shown: Intel, Fujitsu and other distributions
 Widely adopted, mature distribution
 GTM partners include Oracle, HP, Dell, IBM
 Fully open source distribution (incl. management tools)
 Reputation for cost-effective licensing
 Strong developer ecosystem momentum
 GTM partners include Microsoft, Teradata, Informatica, Talend
 More proprietary distribution with features that appeal to some
business critical use cases
 GTM partner AWS (M3 and M5 versions only)
 Just announced by EMC, very early stage
 Differentiator is HAWQ – claims manifold query speed
improvement, full SQL instruction set
Common objections to Hadoop
We don’t have big
data problems
We don’t have
petabytes of data
We can’t justify
the budget for a
new project
We don’t have
the skills
We’re not sure
Hadoop is
mature/secure/
enterprise-ready
We already have a
scale-out strategy
for our EDW/ETL
MYTH:
Big Data means “Petabytes”
 Not just Volume
 Remember Variety, Velocity
 Plenty of issues at smaller scales
– Data processing
– Unstructured data
 Often warehouse volumes are small
because the technology is
expensive, not because there is no
relevant data
 Scalability is about growing with the
business, affordably and predictably
Every organization has data problems!
Hadoop can help…
MYTH:
Big Data means Data Science
 Hadoop solves existing problems
faster, better, cheaper than
conventional technology, e.g.
– Landing zone – capturing and
refining multi-structured data
types with unknown future value
– Cost effective platform for
retaining lots of data for long
periods of time
 Walk before you run
 Big Data Is a State of Mind
Waves of adoption – crossing the chasm
Wave 1
Batch Orientation
Wave 2
Interactive Orientation
Wave 3
Real-Time Orientation
 Mainstream,
70% of organizations
 Early adopters,
20% of organizations
 Bleeding edge,
10% of organizations
Adoption
today*
 Refine:
archival and
transformation
 Explore:
query and
visualization
 Enrich:
real-time decisions
Example use
cases
 Hour(s)  Minutes  Seconds
Response time
 Volume  Velocity
Data characteristic
 EDW / RDBMS talk
to Hadoop
 Analytic apps talk
directly to Hadoop
 Derived data also
stored in Hadoop
Architectural
characteristic
 MapReduce, Pig, Hive
 ODBC/JDBC, Hive
 HBase, NoSQL, SQL
Example
technologies
* Among organizations using Hadoop
Hadoop in a nutshell
 The Hadoop open source ecosystem delivers powerful innovation
in storage, databases and business intelligence, promising
unprecedented price / performance compared to existing
technologies.
 Hadoop is becoming an enterprise-wide landing zone for large
amounts of data. Increasingly it is also used to transform data.
 Large enterprises have realized substantial cost reductions by
offloading some enterprise data warehouse, ETL and archiving
workloads to a Hadoop cluster.
Challenges in the Enterprise
 Use-case identification and cost justification
 Cooperation and coordination from independent business units
 As Hadoop increases its footprint in business-critical areas, the
business will demand mature enterprise capabilities, e.g. DR,
snapshots, etc.
 Hadoop’s disruptive approach is challenging strong legacy EDW
people, processes and technologies.
 Data harmonization is often a significant challenge.
 Fear of forking (think UNIX)
 Proprietary absorption (Borged)
 Audience: Hadoop addresses business problems, not IT problems
 Fear of data complexity (“I hated statistics class!”)
Questions?
gregory.smith@t-systems.com

HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Mais de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Último (20)

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Deutsche Telekom Perspective on HADOOP and Big Data Technologies

  • 1. Deutsche Telekom Perspective on HADOOP and Big Data Technologies. Gregory Smith, VP Solution Design and Emerging Technologies and Architectures, T-Systems North America, Gregory.Smith@t-systems.com
  • 2. Deutsche Telekom and T-Systems Key Stats. Deutsche Telekom is Europe’s largest telecom service provider (revenue: $75 billion; employees: 232,342). T-Systems is the enterprise division of Deutsche Telekom (revenue: $13 billion; employees: 52,742; services: data center, end user computing, networking, systems integration, cloud and big data).
  • 3. Overwhelmed by new data types? Big Data (Transactions, Interactions, Observations): sentiment data, call detail records (CDRs), sensor- / machine-based data, clickstream data.
  • 4. 80% of new data in 2015 will land on Hadoop! Hadoop is like a data warehouse, but it can store more data, more kinds of data, and perform more flexible analyses. Hadoop is open source and runs on industry standard hardware, so it's 1-2 orders of magnitude more economical than conventional data warehouse solutions. Hadoop provides more cost effective storage, processing, and analysis: some existing workloads run faster, cheaper, better. Hadoop can deliver a foundation for profitable growth: gain value from all your data by asking bigger questions.
  • 5. Reference architecture view of Hadoop. Diagram labels: Security; Operations; Infrastructure; Virtualization; Compute / Storage / Network; Workflow and Scheduling; Management and Monitoring; Data Isolation; Access Management; Data Encryption; Data Integration; Data Processing; Batch Processing; Real Time/Stream Processing; Search and Indexing; Application; Analytics Apps; Transactional Apps; Analytics Middleware; Presentation; Data Visualization and Reporting; Clients; Real Time Ingestion; Batch Ingestion; Data Connectors; Metadata Services; Data Management; Distributed Processing (MapReduce); Non-relational DB; Structured; In-Memory; Distributed Storage (HDFS). Legend: Hadoop Core, Hadoop Projects, Adjacent Categories.
  • 6. Example application landscape. Diagram labels: ETL (Informatica, Talend, Spring Integration); Real Time Streams (Social, sensors); Structured and Unstructured Data (HDFS, MAPR); Real Time Database (Shark, Gemfire, hBase, Cassandra); Interactive Analytics (Impala, Greenplum, AsterData, Netezza…); Batch Processing (Map-Reduce); Real-Time Processing (s4, storm, spark); Data Visualization (Excel, Tableau); HIVE; Machine Learning (Mahout, etc…); Cloud Infrastructure (Compute, Storage, Networking). Source: Vmware
  • 7. Disruptive innovations in Big Data. Technologies compared: Traditional Database, HADOOP, NoSQL Database, MPP Analytics, Data Warehouse.
       – Schema: pre-defined, fixed, required on write vs. required on read (store first, ask questions later)
       – Processing: no or limited data processing vs. processing coupled with data, parallel processing / scale out
       – Data types: structured vs. any, including unstructured
       – Physical infrastructure: enterprise grade, mission critical vs. commodity is an option, much cheaper storage
  • 8. Innovations: Hadoop is 100x cheaper per TB than in-memory appliances like HANA and handles unstructured data as well. Columns: business problem; technology solution (selected vendors, data type/scalability).
       – Legacy BI: backward-looking analysis, using data out of business applications. Vendors: SAP Business Objects, IBM Cognos, MicroStrategy. Data: structured; limited (2 – 3 TB in RAM).
       – High Performance BI: quasi-real-time analysis, using data out of business applications. Vendors: Oracle Exadata, SAP HANA. Data: structured; limited (2 – 8 TB in RAM).
       – “Hadoop” Ecosystem: forward-looking predictive analysis; questions defined in the moment, using data from many sources. Vendors: Hadoop distributions (no ACID transactions, limited SQL set (joins)). Data: structured or unstructured; unlimited (20 – 30 PB).
       – Callouts: “True” big data; legacy vendor definition of big data.
  • 9. Innovations: Store first, ask questions later. Much cheaper storage, but not just storage… Illustrative acquisition cost:
       – SAN Storage: 3-5€/GB (based on HDS SAN Storage)
       – NAS Filers: 1-3€/GB (based on Netapp FAS-Series)
       – White Box DAS 1): 0.50-1.00€/GB (hardware can be self-assembled)
       – Data Cloud 1): 0.10-0.30€/GB (based on large scale object storage interfaces)
       – Enterprise Class Hadoop Storage: ???€/GB (based on Netapp E-Series (NOSH))
       1) Hadoop offers Storage + Compute (incl. search). Data Cloud offers Amazon S3 and native storage functions.
  • 10. Target use cases. Audiences: IT Infrastructure & Operations; Business Intelligence & Data Warehousing; Line of Business & Business Analysts; CXO. Axes: time to value (longer to shorter) and potential value (lower to higher). Use cases: Lower Cost Storage; Enterprise Data Lake; Enterprise Data Warehouse Offload; Enterprise Data Warehouse Archive; ETL Offload; Capacity Planning & Utilization; Customer Profiling & Revenue Analytics; Targeted Advertising Analytics; Service Renewal Implementation; CDR based Data Analytics; Fraud Management; New Business Models. Themes: cost effective storage, processing, and analysis; foundation for profitable growth.
  • 11. Enterprise data warehouse offload use case.
       – The Challenge: many EDWs are at capacity; running out of budget before running out of relevant data; older data archived “in the dark”, not available for exploration.
       – The Solution: Hadoop for data storage and processing (parse, cleanse, apply structure and transform); free EDW for valuable queries; retain all data for analysis!
       – Before: data warehouse workload is Operational (44%), ETL Processing (42%), Analytics (11%).
       – After: Hadoop handles storage & processing; data warehouse workload is Operational (50%), Analytics (50%). Cost is 1/10th.
  • 12. From data puddles and ponds to lakes and oceans. GOAL: a platform that natively supports mixed workloads as a shared service (Big Data: Transactions, Interactions, Observations; Refine, Explore, Enrich; Batch, Interactive, Online). AVOID: systems separated by workload type due to contention (Big Data BU1, Big Data BU2, Big Data BU3).
  • 13. Questions to ask in designing a solution for a particular business use case: Which distribution is right for your needs today vs. tomorrow? Which distribution will ensure you stay on the main path of open source innovation, vs. trap you in proprietary forks? Layers shown: Security, Operations, Infrastructure, Data Integration, Data Processing, Application, Presentation, Data Management. Note: Distributions include more than just the Data Management layer but are discussed at this point in the presentation. Not shown: Intel, Fujitsu and other distributions. Distribution profiles:
       – Widely adopted, mature distribution; GTM partners include Oracle, HP, Dell, IBM
       – Fully open source distribution (incl. management tools); reputation for cost-effective licensing; strong developer ecosystem momentum; GTM partners include Microsoft, Teradata, Informatica, Talend
       – More proprietary distribution with features that appeal to some business critical use cases; GTM partner AWS (M3 and M5 versions only)
       – Just announced by EMC, very early stage; differentiator is HAWQ, which claims manifold query speed improvement and a full SQL instruction set
  • 14. Common objections to Hadoop: We don’t have big data problems. We don’t have petabytes of data. We can’t justify the budget for a new project. We don’t have the skills. We’re not sure Hadoop is mature/secure/enterprise-ready. We already have a scale-out strategy for our EDW/ETL.
  • 15. Every organization has data problems! Hadoop can help…
       – MYTH: Big Data means “Petabytes”. Not just Volume: remember Variety, Velocity. Plenty of issues at smaller scales (data processing, unstructured data). Often warehouse volumes are small because the technology is expensive, not because there is no relevant data. Scalability is about growing with the business, affordably and predictably.
       – MYTH: Big Data means Data Science. Hadoop solves existing problems faster, better, cheaper than conventional technology, e.g. as a landing zone (capturing and refining multi-structured data types with unknown future value) and as a cost effective platform for retaining lots of data for long periods of time. Walk before you run. Big Data is a state of mind.
  • 16. Waves of adoption: crossing the chasm.
       – Wave 1, Batch Orientation. Adoption today*: mainstream, 70% of organizations. Example use case: Refine (archival and transformation). Response time: hour(s). Architectural characteristic: EDW / RDBMS talk to Hadoop. Example technologies: MapReduce, Pig, Hive.
       – Wave 2, Interactive Orientation. Adoption today*: early adopters, 20% of organizations. Example use case: Explore (query and visualization). Response time: minutes. Architectural characteristic: analytic apps talk directly to Hadoop. Example technologies: ODBC/JDBC, Hive.
       – Wave 3, Real-Time Orientation. Adoption today*: bleeding edge, 10% of organizations. Example use case: Enrich (real-time decisions). Response time: seconds. Architectural characteristic: derived data also stored in Hadoop. Example technologies: HBase, NoSQL, SQL.
       – Data characteristic: Volume; Velocity.
       * Among organizations using Hadoop
  • 17. Hadoop in a nutshell. The Hadoop open source ecosystem delivers powerful innovation in storage, databases and business intelligence, promising unprecedented price / performance compared to existing technologies. Hadoop is becoming an enterprise-wide landing zone for large amounts of data. Increasingly it is also used to transform data. Large enterprises have realized substantial cost reductions by offloading some enterprise data warehouse, ETL and archiving workloads to a Hadoop cluster.
  • 18. Challenges in the Enterprise. Use-case identification and cost justification. Cooperation and coordination from independent business units. As Hadoop increases its footprint in business-critical areas, the business will demand mature enterprise capabilities, e.g. DR, snapshots, etc. Hadoop’s disruptive approach is challenging strong legacy EDW people, processes and technologies. Data harmonization is often a significant challenge. Fear of forking (think UNIX). Proprietary absorption (Borged). Audience: Hadoop addresses business problems, not IT problems. Fear of data complexity (“I hated statistics class!”).
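
The illustrative acquisition costs on slide 9 can be turned into a rough back-of-the-envelope comparison. The sketch below is only that: the per-GB ranges are copied from the slide, while the function names, the decimal conversion of 1 TB = 1000 GB, and the 100 TB example size are assumptions made for illustration. The "Enterprise Class Hadoop Storage" tier is left out because the slide gives no figure for it.

```python
# Illustrative acquisition cost ranges in EUR per GB, copied from slide 9.
COST_PER_GB_EUR = {
    "SAN Storage": (3.0, 5.0),
    "NAS Filers": (1.0, 3.0),
    "White Box DAS": (0.50, 1.00),
    "Data Cloud": (0.10, 0.30),
}

GB_PER_TB = 1000  # decimal terabytes; an assumption for this sketch


def acquisition_cost_range(tier, terabytes):
    """Return the (low, high) acquisition cost in EUR for a storage tier."""
    low, high = COST_PER_GB_EUR[tier]
    gigabytes = terabytes * GB_PER_TB
    return (low * gigabytes, high * gigabytes)


def midpoint_ratio(expensive_tier, cheap_tier):
    """Ratio of the midpoints of two tiers' per-GB cost ranges."""
    exp_low, exp_high = COST_PER_GB_EUR[expensive_tier]
    cheap_low, cheap_high = COST_PER_GB_EUR[cheap_tier]
    return ((exp_low + exp_high) / 2) / ((cheap_low + cheap_high) / 2)


if __name__ == "__main__":
    # Hypothetical 100 TB data set, chosen only to make the numbers concrete.
    for tier in COST_PER_GB_EUR:
        low, high = acquisition_cost_range(tier, 100)
        print(f"{tier}: {low:,.0f} to {high:,.0f} EUR")
```

Comparing range midpoints is a simplification; the slide presents the figures as illustrative, so any result should be read as an order-of-magnitude indication rather than a quote.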

Editor's Notes

  1. Big Data = Transactions + Interactions + Observations.
     Transactions are pretty simple to understand. This is our ERP data. It is the data that we maintain and track in our OLTP systems. It can be any record of any system-to-system or human-to-system interaction. It can even be a human-to-human interaction as long as it is captured electronically. We use a lot of this data in our analytics today.
     Interactions are the points in time we relate with a system. It could be a tweet or a Facebook post. It could be an electronic or paper customer satisfaction survey. Interactions are web logs and A/B tests. We have a lot of this data but typically no efficient way to understand or extract value from it.
     Observations are interesting because they represent a world of net new data sources that we once never thought of analyzing. It is data that was once thought of as low to medium value data or even exhaust data that was too bulky and just too expensive to store. This can be machine-generated data from sensors or web logs and clickstreams or even audio/video or largely unstructured content. Typically, we never even thought of this data before.
  2. Layers: Presentation, Application, Data Processing, Infrastructure, Data Ingestion, Security, Management & Monitoring.
     – Ambari: Apache Ambari is a monitoring, administration and lifecycle management project for Apache Hadoop clusters. Hadoop clusters require many inter-related components that must be installed, configured, and managed across the entire cluster.
     – ZooKeeper: ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. ZooKeeper is used heavily by many distributed applications such as HBase.
     – HBase: HBase is the distributed, scalable Hadoop database, able to collect and store big data volumes on HDFS. This class of database is often categorized as NoSQL (Not only SQL).
     – Pig: Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
     – Hive: Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time, this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
     – HCatalog: Apache HCatalog is a table and storage management service for data created using Apache Hadoop. It provides deep integration into enterprise data warehouses (e.g. Teradata) and with data integration tools such as Talend.
     – MapReduce: Hadoop MapReduce is a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes.
     – HDFS: Hadoop Distributed File System is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid parallel computations.
     – Talend Open Studio for Big Data: a 100% open source, graphical code generator used for Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) data movement and cleansing in and out of Hadoop.
     – Data Integration Services: HDP integrates Talend Open Studio for Big Data, the leading open source data integration platform for Apache Hadoop. Included is a visual development environment and hundreds of pre-built connectors to leading applications that allow you to connect to any data source without writing code.
     – Centralized Metadata Services: HDP includes HCatalog, a metadata and table management system that simplifies data sharing both between Hadoop applications running on the platform and between Hadoop and other enterprise data systems. HDP’s open metadata infrastructure also enables deep integration with third-party tools.
  3. Stakeholders and what they need:
     – Line of Business: Demand a 360 view of customer, employee, market, etc., but cannot be certain about what matters for analysis
     – Business Analysts: Need to incorporate more data into analysis; LOBs not sure what matters; want to reuse existing skill sets
     – Data Warehouse Owners: Must efficiently store, process, organize, and deliver massive and growing data volume and variety while meeting SLAs
     – IT Management: Drive innovation, reduce costs, meet growing analytic demands of LOBs, mitigate risk of adopting new technology
     – System Administrators: Ensure stability and reliability of systems
     Buyers: VP Analytics; VP/Director Business Intelligence; VP/Director Data Warehousing/Management; VP/Director Infrastructure; VP/Director Operations/IT Systems
     Faster customer acquisition; better product development; better quality; lower churn
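
The MapReduce programming model mentioned in note 2 can be illustrated with a minimal, single-process sketch. This is not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the word-count task is an assumed example rather than something taken from the slides. On a real cluster the map and reduce steps would run in parallel across compute nodes, with input read from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence within one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

The point of the split is that each map call sees only one record and each reduce call sees only one key, which is what lets a framework such as Hadoop distribute the work across many nodes.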