© 2015 IBM Corporation
IBM Leads the Way with Hadoop and Spark
The Keys to Getting Value out of Big Data
IBM’s Framework for Getting Value out of Big Data
• Everyone agrees on Big Data’s potential, but approaches to exploiting it diverge widely
• Pioneers who have started to harness Big Data have benefited greatly
• We see Big Data adoption as a continual process that moves through maturity levels
• IBM’s approach enables faster adoption of Big Data technologies through:
  • Open source innovation (Hadoop, Spark)
  • Standards-based technologies (ODP, SQL, R)
  • Familiar interfaces and integration with established tools (IBM innovations)
  • Advanced analytics (IBM innovations)
• IBM’s commitment to continued innovation
Hadoop and Spark Offer Significant Business Benefits
(Chart: Value, from Low to High, plotted against Big Data Maturity, from Low to High, spanning Operations, Data Warehousing, Line of Business and Analytics, and New Business Imperatives.)
Data-Informed Decision Making
• Full dataset analysis (no more sampling)
• Extract value from non-relational data
• 360° view of all enterprise data
• Exploratory analysis and discovery
Warehouse Modernization
• Data lake
• Data offload
• ETL offload
• Queryable archive and staging
Lower the Cost of Storage
Business Transformation
• Create new business models
• Risk-aware decision making
• Fight fraud and counter threats
• Optimize operations
• Attract, grow, retain customers
IBM Investing in Four Catalysts for Big Data Adoption
• Familiar Interfaces & Integration with Established Tools
• Open Source Innovation
• Technical Standards
• New Analytics Capabilities
Hadoop Advantages
Unlimited Scale
• Multiple data sources
• Multiple applications
• Multiple users
Enterprise Platform
• Reliability
• Resiliency
• Security
Wide Range of Data Formats
• Files
• Semi-structured
• Databases
Hadoop MapReduce Challenges
In-Memory Performance
• No in-memory framework
• Application tasks write to disk with each cycle
Ease of Development
• Need deep Java skills
• Few abstractions available for analysts
Combine Workflows
• Only suitable for batch workloads
• Rigid processing model
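The “write to disk with each cycle” point can be sketched with a toy, single-process imitation of the MapReduce model in Python. This is not Hadoop’s API: `run_job`, the mapper and reducer functions, and the sample lines are hypothetical, and a real job runs distributed across the cluster with its output in HDFS. The sketch chains two jobs, and the second job has to read the first job’s output back from disk before it can start.

```python
import json
import os
import tempfile
from collections import defaultdict


def run_job(records, mapper, reducer, out_path):
    """Run one map -> shuffle -> reduce cycle and persist the result to out_path."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # map phase emits (key, value) pairs
            groups[key].append(value)      # shuffle: group values by key
    result = {key: reducer(key, values) for key, values in groups.items()}  # reduce phase
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(result, f)  # every cycle ends with a write to disk
    return out_path


def load(path):
    """Read a job's persisted output back from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def word_count_mapper(line):
    for word in line.lower().split():
        yield word, 1


def length_mapper(item):
    word, count = item
    yield len(word), count


def sum_reducer(_key, values):
    return sum(values)


lines = ["Spark runs on Hadoop", "Hadoop stores data", "Spark caches data"]

with tempfile.TemporaryDirectory() as tmp:
    # Job 1: word count; its output lands on disk.
    job1 = run_job(lines, word_count_mapper, sum_reducer, os.path.join(tmp, "job1.json"))
    # Job 2 cannot start from memory: it re-reads job 1's output from disk.
    counts = load(job1)
    job2 = run_job(counts.items(), length_mapper, sum_reducer, os.path.join(tmp, "job2.json"))
    histogram = load(job2)  # JSON turns the integer keys into strings

print(counts)
print(histogram)
```

The file round trip between the two jobs is the cost the slide refers to; for multi-step or iterative workloads it is paid on every cycle.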
Spark Advantages
In-Memory Performance
• Resilient Distributed Datasets
• Unify processing
Ease of Development
• Easier APIs
• Python, Scala, Java
Combine Workflows
• Batch
• Interactive
• Iterative algorithms
• Micro-batch
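Why in-memory processing matters for iterative algorithms can be sketched in plain Python, assuming that each full pass over the data is the dominant cost of an iteration. `CountingSource` and `fit_mean` are hypothetical names, not Spark APIs; in Spark itself, the comparable step is calling `cache()` (or `persist()`) on an RDD that is reused across iterations.

```python
class CountingSource:
    """Hypothetical data source that counts how many times it is scanned."""

    def __init__(self, values):
        self._values = list(values)
        self.scans = 0

    def scan(self):
        self.scans += 1  # stands in for an expensive read from disk
        return list(self._values)


def fit_mean(load, iterations=20, learning_rate=0.5):
    """Estimate the mean by gradient descent; each iteration is a full pass over the data."""
    estimate = 0.0
    for _ in range(iterations):
        data = load()
        gradient = sum(estimate - v for v in data) / len(data)
        estimate -= learning_rate * gradient
    return estimate


# Without caching: the source is re-read on every iteration.
disk = CountingSource([2.0, 4.0, 6.0])
uncached = fit_mean(disk.scan)

# With caching: read once, keep the data in memory, and reuse it.
source = CountingSource([2.0, 4.0, 6.0])
cached_data = source.scan()
cached = fit_mean(lambda: cached_data)

print(disk.scans, source.scans)  # 20 1
```

Both runs compute the same estimate; the only difference is that the cached run touches the source once instead of once per iteration.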
Spark Libraries
Libraries on top of Apache Spark: Spark SQL, Spark Streaming, GraphX, MLlib, SparkR
Spark on Hadoop
• Compute layer: Apache Spark (Spark SQL, Spark Streaming, GraphX, MLlib, SparkR)
• Resource management: Apache Hadoop YARN
• Storage management: Apache Hadoop HDFS
• Nodes: Slave node 1, Slave node 2, … Slave node n
Spark on Mesos
• Compute layer: Apache Spark (Spark SQL, Spark Streaming, GraphX, MLlib, SparkR)
• Resource management: Apache Mesos
• Storage management: Apache Hadoop HDFS
• Nodes: Slave node 1, Slave node 2, … Slave node n
Spark as a Service
• Compute layer: Apache Spark (Spark SQL, Spark Streaming, GraphX, MLlib, SparkR)
• Resource management: Apache Hadoop YARN
• Storage management: Amazon S3
• Nodes: Amazon EC2 node 1, Amazon EC2 node 2, … Amazon EC2 node n
Spark on the Amazon Cloud
• Compute layer: Apache Spark (Spark SQL, Spark Streaming, GraphX, MLlib, SparkR)
• Resource management: Apache Hadoop YARN
• Storage management: Amazon S3
• Nodes: Amazon EC2 node 1, Amazon EC2 node 2, … Amazon EC2 node n
Spark Running in Standalone Mode
• Compute layer: Apache Spark (Spark SQL, Spark Streaming, GraphX, MLlib, SparkR)
• Resource management, storage management, and compute: a single node, with local storage
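The deployment options above differ mainly in which cluster manager Spark is pointed at, which `spark-submit` expresses through its `--master` option. The Python helper below is hypothetical (only the URL formats come from Spark's documentation): `yarn` for YARN in current releases (older 1.x releases also accepted `yarn-client` and `yarn-cluster`), `mesos://HOST:PORT` with default port 5050, `spark://HOST:PORT` with default port 7077 for Spark's own standalone cluster manager, and `local[N]` or `local[*]` for a single machine. Note that the single-node setup on the slide corresponds to what Spark's documentation calls local mode, not its standalone cluster manager. The host names and `my_job.py` are placeholders.

```python
def spark_master_url(mode, host=None, port=None, cores="*"):
    """Build a --master value for spark-submit (hypothetical helper)."""
    if mode == "yarn":
        # The YARN ResourceManager address is read from the Hadoop configuration
        # (HADOOP_CONF_DIR / YARN_CONF_DIR), so no host is needed here.
        return "yarn"
    if mode in ("mesos", "standalone"):
        if host is None:
            raise ValueError(f"{mode} mode needs a master host")
        scheme, default_port = ("mesos", 5050) if mode == "mesos" else ("spark", 7077)
        return f"{scheme}://{host}:{port or default_port}"
    if mode == "local":
        # Single machine; "*" means one worker thread per logical core.
        return f"local[{cores}]"
    raise ValueError(f"unknown deployment mode: {mode!r}")


def spark_submit_command(app, mode, **kwargs):
    """Return the argument list for launching `app` with spark-submit."""
    return ["spark-submit", "--master", spark_master_url(mode, **kwargs), app]


print(spark_submit_command("my_job.py", "mesos", host="mesos-master"))
```

The same application code runs unchanged in each case; only the master URL (and, for the Amazon variants, storage paths pointing at S3 instead of HDFS) changes.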
Spark Resilient Distributed Datasets
(Diagram: Slave nodes 1, 2, and 3 each hold partitions of RDD1, RDD2, and RDD3 in memory (Spark RDD: in-memory distribution), as well as data blocks on disk (HDFS: on-disk distribution).)
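The RDD diagram can be approximated with a small Python sketch of hash partitioning and recovery. It is loosely modeled on the idea behind Spark’s hash partitioning of keyed data, not on Spark’s code: the helper names, the round-robin placement, and the use of `zlib.crc32` (chosen because Python’s built-in `hash` for strings is randomized per process) are all assumptions. The “resilient” part is shown by rebuilding one lost partition from the source data, a simplified stand-in for Spark’s lineage-based recomputation.

```python
import zlib


def partition_index(key, num_partitions):
    """Deterministic hash partitioning: equal keys always map to the same partition."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def partition(records, num_partitions):
    """Split (key, value) records into num_partitions lists."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in records:
        parts[partition_index(key, num_partitions)].append((key, value))
    return parts


def place(num_partitions, nodes):
    """Toy placement policy: assign partitions to nodes round-robin."""
    placement = {node: [] for node in nodes}
    for i in range(num_partitions):
        placement[nodes[i % len(nodes)]].append(i)
    return placement


def recompute_partition(source, num_partitions, index):
    """Rebuild one lost partition by replaying the partitioning step over the source
    (a simplified stand-in for recomputing from lineage)."""
    return [(k, v) for k, v in source if partition_index(k, num_partitions) == index]


records = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("a", 5)]
parts = partition(records, 3)
placement = place(len(parts), ["node1", "node2", "node3"])

lost = 1  # pretend the node holding partition 1 failed
assert recompute_partition(records, 3, lost) == parts[lost]
print(parts)
print(placement)
```

Because partitioning is deterministic, replaying it over the source yields exactly the lost partition, which is why the data does not need to be replicated in memory to survive a node failure.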
The Combination: The Flexibility of Spark on a Stable Hadoop Platform
• Spark: In-Memory Performance, Ease of Development, Combine Workflows
• Hadoop: Unlimited Scale, Enterprise Platform, Wide Range of Data Formats
IBM Open Platform with Apache Hadoop
• 100% open source code
• Commitment to currency: “days, not months”
• Includes Spark
• Free for production use
• Apache Hadoop is decoupled from IBM analytics and data science technologies
• Production support offering available
Apache open source components: HDFS, YARN, MapReduce, Ambari, HBase, Spark, Flume, Hive, Pig, Sqoop, HCatalog, Solr/Lucene
IBM is Committed to Open Source
• Open source technologies are the base for IBM software and solutions
• IBM has a long history of deep open source commitment:
  • Apache Software Foundation: founding member in 1999
  • Cloud Foundry: #1 contributor; basis for Bluemix
  • OpenStack: #4 contributor; basis for IBM’s IaaS
  • Linux: #3 contributor; IBM was the first enterprise backer of Linux
  • Hadoop/Spark: extensive investment in open source contribution; Integration with
Analytics software
Infrastructure
Systems
Application
Goal of the Apache Software Foundation: Let 1000 Flowers Bloom!
• 249 Top Level Projects, 40 Incubating
• 2 Million+ Code Commits
• IBM co-founded the ASF in 1999 and is a Gold Sponsor
• The “Apache Way” is about fostering open innovation
• Not a standards organization
Apache Hadoop Ecosystem: Rapid Innovation, Few Standards
• Distributions include different projects at different version levels
“This proliferation of baskets [Hadoop distributions with different project versions] creates significant drag when it comes to building reliable applications ... makes it harder for customers to assess which basket of Hadoop that they need and harder for application developers to create solutions that work broadly.”
– Raymie Stata, CEO, Altiscale
• Even when project versions match, there are interface differences
“Setting a baseline of Hive 13 so we get access to some new syntax. Try it on one, it works great... Try it on another that says it also has Hive 13, and we get ‘syntax error’ …”
– Craig Rubendall, VP, SAS
“If the industry is truly committed to developing big data technologies and solutions …, it will require an ecosystem of providers … to create a consistent framework around which everyone can develop.”
– Siki Giunta, SVP, Verizon
• The Hadoop ecosystem is evolving at a faster pace than is comfortable
“My personal speculation is that it comes from some who have been evaluating for a while seeing change occur so rapidly that they are dropping back for another look.”
– Merv Adrian, VP, Gartner
Open Data Platform (ODP) goals
• Certify a standard “ODP Core” set of open source Hadoop family projects with specific versions and patch levels
• Develop tools and methods to help solution providers test applications against the ODP Core
• Contribute changes and fixes in the ODP Core Hadoop family projects to the ASF using the ASF processes
http://opendataplatform.org/
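The first goal, a certified core with specific versions, can be illustrated with a minimal Python sketch that compares a distribution’s component versions against a pinned baseline. Everything here is hypothetical: the baseline versions are illustrative, not an official ODP Core manifest, and `check_distribution` is not part of any ODP tooling.

```python
# Illustrative baseline only: these component versions are hypothetical,
# not an official ODP Core manifest.
CORE_BASELINE = {"hdfs": "2.6.0", "yarn": "2.6.0", "mapreduce": "2.6.0", "ambari": "2.0.0"}


def parse_version(text):
    """Turn '2.6.0' into (2, 6, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_distribution(distribution, baseline):
    """Report components that are missing or whose version differs from the baseline."""
    problems = {}
    for component, required in baseline.items():
        found = distribution.get(component)
        if found is None:
            problems[component] = "missing"
        elif parse_version(found) != parse_version(required):
            problems[component] = f"expected {required}, found {found}"
    return problems


vendor_x = {"hdfs": "2.6.0", "yarn": "2.6.0", "mapreduce": "2.7.1"}  # hypothetical distribution
print(check_distribution(vendor_x, CORE_BASELINE))
```

As the Hive quote on the previous slide shows, matching version labels do not guarantee matching behavior, so a check like this is only a first gate before the application-level testing that the second goal describes.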
Open Data Platform Initiative
Representation across the Hadoop ecosystem…
• Hadoop distribution vendors
• Software application providers
• System integrators/consultants
• Hardware vendors
• Customers
… who all believe in the need for a community-based effort to standardize Hadoop, which will lead to improved adoption
IBM Open Platform with Apache Hadoop adopts ODP Core
• BigInsights will include ODP certified Apache packages
• ODP will initially target core packages of a Hadoop distribution
  • Packages will expand over time
  • First certification set expected this summer
• Our goal for BigInsights on ODP:
  • Better compatibility and less testing against ecosystem software
  • Enable IBM Hadoop capabilities to run on other ODP-certified Hadoop distributions
Apache open source components in IBM Open Platform with Apache Hadoop: HDFS, YARN, MapReduce, Ambari, HBase, Spark, Flume, Hive, Pig, Sqoop, HCatalog, Solr/Lucene (a subset is marked ODP*)
* Candidate set of certified ODP modules, expected summer 2015
Goal of the ODP: Enable Innovation to Flourish on a Common Platform
• Complements the Apache Software Foundation’s governance model
• ODP efforts focus on integration, testing, and certifying a standard core of Apache Hadoop ecosystem projects
• Fixes for issues found in ODP testing will be contributed to the ASF projects in line with ASF processes
• The ODP will not override or replace any aspect of ASF governance
IBM BigInsights for Apache Hadoop
• IBM Open Platform with Apache Hadoop (foundation)
• IBM BigInsights Analyst: Big SQL, BigSheets
• IBM BigInsights Data Scientist: Big SQL, BigSheets, Text Analytics, Machine Learning with Big R, Big R
• IBM BigInsights Enterprise Management: POSIX Distributed File System; Multi-workload, Multi-tenant scheduling
IBM BigInsights for Apache Hadoop
Your choice of infrastructure and deployment model: IBM System z, IBM Power, Intel servers, or on cloud
IBM Analytic Platform Capabilities
IBM Software Integrates with and Extends Hadoop and Spark
• Data Warehousing: PureData for Analytics, Operational Analytics
• Entity Extraction and Matching: Big Match
• Security and Compliance: Optim, Guardium Audit and Encryption
• Data Integration and Governance: Information Server
• Enterprise Search: Watson Explorer
• Real-time Analytics: Streams
• Predictive Modeling and Descriptive Statistics: SPSS, Big R and Scalable Algorithms
• Analysis, Reporting, and Exploration: Watson Analytics, Cognos, BigSheets
• Fast, ANSI SQL 2011, and Secure SQL: Big SQL
• Enterprise File System: GPFS-FPO
• Cluster Resource and Workload Management: Platform Symphony
• Large Scale Text Extraction: Big Text
Foundation: IBM Open Platform with Apache Hadoop
IBM Leads the Market and Analysts Agree
“IBM’s all-in bet on Apache Hadoop clearly has had the biggest impact among developers we polled”
– Evans Big Data Survey
• Leading Hadoop Distribution
• Leading Streaming Analytics Solution
IBM’s Investment in the Big Data Community
Over 250,000 people have benefited from free Big Data skills training at http://bigdatauniversity.com
Spark Technology Center
• Focal point for IBM investment in Spark
• Code contributions to the Apache Spark project
• Build industry solutions using Spark
• Evangelize Spark technology inside and outside IBM
• Agile engagement across IBM divisions:
  • Systems: contribute enhancements to Spark core, and optimized infrastructure (hardware/software) for Spark
  • Analytics: IBM Analytics software will exploit Spark processing
  • Research: build innovations above (solutions that use Spark), inside (improvements to Spark core), and below (improve systems that execute Spark) the Spark stack
Goal: To be the #1 contributor and adopter in the Spark ecosystem
The IBM Difference
• IBM delivers the foundation for Big Data, now and in the future:
  • Embraces open source
  • Establishes standards
  • Integrates with familiar interfaces and established systems
  • Delivers advanced analytic capabilities
• Enables you to benefit from broader data and analytics capabilities:
  • Data Integration and Governance
  • Predictive and Real-time Analytics
• Provides expertise to help you on your journey:
  • 6,000 partners
  • Analytics services and solution centers
IBM Leads the Way with Hadoop and Spark, May 15, 2015

 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 

Último (20)

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 

IBM Leads the Way with Hadoop and Spark (May 15, 2015)

  • 1. IBM Leads the Way with Hadoop and Spark: The Keys to Getting Value out of Big Data
      © 2015 IBM Corporation
  • 2. IBM’s Framework for Getting Value out of Big Data
      - All agree on Big Data’s potential, but views diverge widely on how to exploit it
      - Pioneers who have started to harness Big Data have benefited greatly
      - We see Big Data adoption as a continual process, progressing through maturity levels
      - IBM’s approach enables faster adoption of Big Data technologies:
          - Open source innovation (Hadoop, Spark)
          - Standards-based technologies (ODP, SQL, R)
          - Familiar interfaces and integration with established tools (IBM innovations)
          - Advanced analytics (IBM innovations)
      - IBM’s commitment to continued innovation
  • 3. Hadoop and Spark Offer Significant Business Benefits
      Axes: Big Data Maturity and Value (each from low to high); maturity stages: Operations, Data Warehousing, Line of Business and Analytics, New Business Imperatives
      - Data-Informed Decision Making
          - Full dataset analysis (no more sampling)
          - Extract value from non-relational data
          - 360° view of all enterprise data
          - Exploratory analysis and discovery
      - Warehouse Modernization (lower the cost of storage)
          - Data lake
          - Data offload
          - ETL offload
          - Queryable archive and staging
      - Business Transformation
          - Create new business models
          - Risk-aware decision making
          - Fight fraud and counter threats
          - Optimize operations
          - Attract, grow, retain customers
  • 4. IBM Investing in Four Catalysts for Big Data Adoption
      - Familiar Interfaces & Integration with Established Tools
      - Open Source Innovation
      - Technical Standards
      - New Analytics Capabilities
  • 5. Hadoop Advantages
      - Unlimited Scale
          - Multiple data sources
          - Multiple applications
          - Multiple users
      - Enterprise Platform
          - Reliability
          - Resiliency
          - Security
      - Wide Range of Data Formats
          - Files
          - Semi-structured
          - Databases
  • 6. Hadoop MapReduce Challenges
      - Ease of Development
          - Need deep Java skills
          - Few abstractions available for analysts
      - In-Memory Performance
          - No in-memory framework
          - Application tasks write to disk with each cycle
      - Combine Workflows
          - Only suitable for batch workloads
          - Rigid processing model
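The “application tasks write to disk with each cycle” point can be made concrete with a small sketch. The Python below is plain standard-library code, not Hadoop or Spark: it runs the same toy iterative computation twice, once persisting each cycle’s output to a file and reading it back (MapReduce-style job chaining), and once keeping the working set in memory. The averaging step and all function names are hypothetical illustrations.

```python
import json
import os
import tempfile
from typing import List, Tuple


def step(values: List[float]) -> List[float]:
    """One cycle of a toy iterative algorithm: move every value halfway toward the mean."""
    mean = sum(values) / len(values)
    return [v + (mean - v) / 2 for v in values]


def iterate_via_disk(values: List[float], cycles: int) -> Tuple[List[float], int]:
    """MapReduce-style chaining: each cycle reads the previous output from disk and writes its own."""
    writes = 0
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "cycle_0.json")
        with open(path, "w") as f:
            json.dump(values, f)  # the job's initial input
        for i in range(1, cycles + 1):
            with open(path) as f:
                current = json.load(f)  # read the previous cycle's output back from disk
            path = os.path.join(workdir, f"cycle_{i}.json")
            with open(path, "w") as f:
                json.dump(step(current), f)  # materialize this cycle's output on disk
            writes += 1
        with open(path) as f:
            result = json.load(f)
    return result, writes


def iterate_in_memory(values: List[float], cycles: int) -> List[float]:
    """In-memory style: the working set stays in memory between cycles."""
    current = list(values)
    for _ in range(cycles):
        current = step(current)
    return current
```

Both functions produce the same values; the disk-based version simply pays for one write and one re-read per cycle. On a real cluster, MapReduce job output typically lands in replicated HDFS storage, so that per-cycle cost tends to be larger than this local sketch suggests.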
  • 7. Spark Advantages
      - In-Memory Performance
      - Ease of Development
          - Easier APIs
          - Python, Scala, Java
          - Resilient Distributed Datasets
          - Unify processing
      - Combine Workflows
          - Batch
          - Interactive
          - Iterative algorithms
          - Micro-batch
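“Unify processing” is easiest to see when one piece of logic serves both batch and micro-batch workloads. The sketch below is plain Python, not the Spark API: it counts words over a whole dataset at once, then chops the same lines into small batches (loosely mirroring the micro-batch model behind Spark Streaming) and merges the per-batch results. The function names and batch size are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def word_counts(lines: Iterable[str]) -> Counter:
    """Shared business logic: count lower-cased words in a collection of lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def as_micro_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Chop an incoming stream of lines into small fixed-size batches (the last may be shorter)."""
    batch: List[str] = []
    for line in stream:
        batch.append(line)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def streaming_word_counts(stream: Iterable[str], batch_size: int) -> Counter:
    """Apply the same word_counts logic to each micro-batch and merge the running totals."""
    running: Counter = Counter()
    for batch in as_micro_batches(stream, batch_size):
        running += word_counts(batch)
    return running
```

Because `word_counts` is reused unchanged, the batch total and the merged micro-batch total agree; only the driver loop around the logic differs.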
  • 8. Spark Libraries
      Apache Spark core, with Spark SQL, Spark Streaming, GraphX, MLlib, and SparkR
  • 9. Spark on Hadoop
      - Compute layer: Apache Spark (Spark SQL, Spark Streaming, GraphX, MLlib, SparkR)
      - Resource management: Apache Hadoop YARN
      - Storage management: Apache Hadoop HDFS
      - Nodes: slave node 1, slave node 2, … slave node n
  • 10. Spark on Mesos
      - Compute layer: Apache Spark (Spark SQL, Spark Streaming, GraphX, MLlib, SparkR)
      - Resource management: Apache Mesos
      - Storage management: Apache Hadoop HDFS
      - Nodes: slave node 1, slave node 2, … slave node n
  • 11. Spark as a Service
      - Compute layer: Apache Spark (Spark SQL, Spark Streaming, GraphX, MLlib, SparkR)
      - Resource management: Apache Hadoop YARN
      - Storage management: Amazon S3
      - Nodes: Amazon EC2 node 1, Amazon EC2 node 2, … Amazon EC2 node n
  • 12. Spark on the Amazon Cloud
      - Compute layer: Apache Spark (Spark SQL, Spark Streaming, GraphX, MLlib, SparkR)
      - Resource management: Apache Hadoop YARN
      - Storage management: Amazon S3
      - Nodes: Amazon EC2 node 1, Amazon EC2 node 2, … Amazon EC2 node n
  • 13. Spark Running in Standalone Mode
      - Compute layer: Apache Spark (Spark SQL, Spark Streaming, GraphX, MLlib, SparkR)
      - Resource management and storage management: single node, with local storage
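The deployment slides keep the compute layer fixed and swap the resource manager and storage underneath it. In practice that choice shows up mostly in the `--master` value passed to `spark-submit` and in the URI scheme of the data paths. The Python sketch below only assembles command lines and does not run Spark; the host name, bucket, paths, and `app.py` are hypothetical placeholders. The master formats (`yarn`, `mesos://HOST:5050`, `local[*]`) follow current Spark documentation: YARN mode expects `HADOOP_CONF_DIR` or `YARN_CONF_DIR` to point at the cluster configuration, Spark 1.x releases of the slides’ era used `yarn-client` or `yarn-cluster` instead of `yarn`, and Mesos support has since been deprecated. `s3a://` is the Hadoop S3A connector scheme (Amazon EMR also accepts `s3://`). Spark’s own standalone cluster manager uses `spark://HOST:7077`; the sketch maps the single-node, local-storage case of slide 13 to `local[*]`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Deployment:
    """Where a Spark job runs: the resource manager and the storage layer it reads from."""
    master: str     # value for spark-submit --master (selects resource management)
    input_uri: str  # input location; the URI scheme selects storage management


def deployment_for(target: str, host: str = "localhost") -> Deployment:
    """Map the deployment options on the slides to illustrative settings (placeholder paths)."""
    if target == "hadoop":
        # YARN locates the ResourceManager from the Hadoop config; hdfs:/// uses the default NameNode
        return Deployment("yarn", "hdfs:///data/events")
    if target == "mesos":
        # Mesos master on its default port, data still in HDFS
        return Deployment(f"mesos://{host}:5050", "hdfs:///data/events")
    if target == "amazon":
        # YARN on EC2 nodes, data in Amazon S3 through the Hadoop S3A connector
        return Deployment("yarn", "s3a://example-bucket/data/events")
    if target == "single-node":
        # One local process with as many worker threads as logical cores, local file system
        return Deployment("local[*]", "file:///tmp/data/events")
    raise ValueError(f"unknown deployment target: {target!r}")


def spark_submit_args(target: str, app: str = "app.py", host: str = "localhost") -> List[str]:
    """Build a spark-submit command line; the input URI is passed as an application argument."""
    d = deployment_for(target, host)
    return ["spark-submit", "--master", d.master, app, d.input_uri]
```

The application code (`app.py` here) would stay the same across all four targets; only the master value and the data URI change.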
  • 14. Spark Resilient Distributed Datasets
      [Diagram: partitions (partition1 to partition3) of RDD1, RDD2, and RDD3 held in memory across slave nodes 1, 2, and 3 (Spark RDD: in-memory distribution), compared with HDFS blocks a, b, c, and d stored as replicas on disk across the same nodes (HDFS: on-disk distribution).]
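The “resilient” in Resilient Distributed Datasets comes from lineage: Spark records how each partition was derived, so a partition lost with a node can be recomputed from its parent data instead of being restored from a replica, which is the approach HDFS takes on disk. The class below is a toy model in plain Python, not Spark’s API; the `ToyRDD` name, its methods, and the round-robin partitioning are simplifying assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Optional

Partition = List[int]


class ToyRDD:
    """Toy RDD: partitions cached in memory, plus lineage (parent + transformation) to rebuild them."""

    def __init__(self, num_partitions: int, source: Optional[List[int]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[Partition], Partition]] = None) -> None:
        self.num_partitions = num_partitions
        self.source = source  # raw input; only set on the base RDD
        self.parent = parent  # lineage: the RDD this one is derived from
        self.fn = fn          # lineage: per-partition transformation applied to the parent
        self.cache: Dict[int, Partition] = {}  # partitions currently held in memory

    @classmethod
    def parallelize(cls, data: List[int], num_partitions: int) -> "ToyRDD":
        return cls(num_partitions, source=list(data))

    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        # Lazy: only records the transformation; nothing is computed yet
        return ToyRDD(self.num_partitions, parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred: Callable[[int], bool]) -> "ToyRDD":
        return ToyRDD(self.num_partitions, parent=self, fn=lambda part: [x for x in part if pred(x)])

    def compute(self, i: int) -> Partition:
        """Return partition i, rebuilding it from lineage if it is not in memory."""
        if i not in self.cache:
            if self.parent is None:
                assert self.source is not None
                # Base RDD: round-robin split of the input (a simplifying assumption)
                self.cache[i] = self.source[i::self.num_partitions]
            else:
                assert self.fn is not None
                self.cache[i] = self.fn(self.parent.compute(i))
        return self.cache[i]

    def collect(self) -> List[int]:
        return [x for i in range(self.num_partitions) for x in self.compute(i)]

    def lose_partition(self, i: int) -> None:
        """Simulate a node failure: partition i disappears from memory at every level of the lineage."""
        rdd: Optional[ToyRDD] = self
        while rdd is not None:
            rdd.cache.pop(i, None)
            rdd = rdd.parent
```

After `lose_partition(1)`, calling `collect()` again returns the same elements, because partition 1 is rebuilt by replaying the recorded `filter` and `map` on the base data. In Spark itself, the scheduler performs this recomputation automatically.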
  • 15. The Combination: The Flexibility of Spark on a Stable Hadoop Platform
      - Spark: In-Memory Performance, Ease of Development, Combine Workflows
      - Hadoop: Unlimited Scale, Enterprise Platform, Wide Range of Data Formats
  • 16. IBM Open Platform with Apache Hadoop
      - 100% open source code
      - Commitment to currency: “days, not months”
      - Includes Spark
      - Free for production use
      - Decoupled Apache Hadoop from IBM analytics and data science technologies
      - Production support offering available
      Apache open source components: HDFS, YARN, MapReduce, Ambari, HBase, Spark, Flume, Hive, Pig, Sqoop, HCatalog, Solr/Lucene
  • 17. IBM is Committed to Open Source
      - Open source technologies are the base for IBM software and solutions
      - IBM’s long history of deep open source commitment:
          - Apache Software Foundation: founding member in 1999
          - Cloud Foundry: #1 contributor; basis for Bluemix
          - OpenStack: #4 contributor; basis for IBM’s IaaS
          - Linux: #3 contributor; IBM was the first enterprise backer of Linux
          - Hadoop/Spark: extensive investment in open source contribution; integration with analytics software
      [Diagram labels: Infrastructure, Systems, Application]
  • 18. Goal of the Apache Software Foundation: Let 1000 Flowers Bloom!
      - 249 top-level projects, 40 incubating
      - 2 million+ code commits
      - IBM co-founded the ASF in 1999 and is a Gold Sponsor
      - The “Apache Way” is about fostering open innovation
      - Not a standards organization
  • 19. Apache Hadoop Ecosystem: Rapid Innovation, Few Standards
      - Distributions include different projects at different version levels
          “This proliferation of baskets [Hadoop distributions with different project versions] creates significant drag when it comes to building reliable applications ... makes it harder for customers to assess which basket of Hadoop that they need and harder for application developers to create solutions that work broadly.” (Raymie Stata, CEO, Altiscale)
      - Even though the project versions match, there are interface differences
          “Setting a baseline of Hive 13 so we get access to some new syntax. Try it on one, it works great... Try it on another that says it also has Hive 13, and we get ‘syntax error’ …” (Craig Rubendall, VP, SAS)
          “If the industry is truly committed to developing big data technologies and solutions …, it will require an ecosystem of providers … to create a consistent framework around which everyone can develop.” (Siki Giunta, SVP, Verizon)
      - The Hadoop ecosystem is evolving at a faster pace than is comfortable
          “My personal speculation is that it comes from some who have been evaluating for a while seeing change occur so rapidly that they are dropping back for another look.” (Merv Adrian, VP, Gartner)
  • 20. Open Data Platform (http://opendataplatform.org/)
      - Certify a standard “ODP Core” set of open source Hadoop family projects with specific versions and patch levels
      - Develop tools and methods to help solution providers test applications against the ODP Core
      - Contribute changes and fixes in the ODP Core Hadoop family projects to the ASF using the ASF processes
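As a rough illustration of what “testing against a certified core” can mean for a solution provider: pin a manifest of exact component versions, check a distribution against it, and flag application dependencies that fall outside it. The component names and version strings below are hypothetical placeholders, not the actual ODP Core contents, and the two checks are a sketch rather than ODP’s certification tooling.

```python
from typing import Dict, Iterable, List

# Hypothetical core manifest: component -> exact required version (placeholder values)
CORE_MANIFEST: Dict[str, str] = {
    "ambari": "2.0.0",
    "hadoop-hdfs": "2.6.0",
    "hadoop-mapreduce": "2.6.0",
    "hadoop-yarn": "2.6.0",
}


def check_distribution(dist: Dict[str, str], manifest: Dict[str, str] = CORE_MANIFEST) -> List[str]:
    """List every way a distribution deviates from the core; an empty list means it conforms."""
    problems = []
    for component, required in sorted(manifest.items()):
        actual = dist.get(component)
        if actual is None:
            problems.append(f"{component}: missing (core requires {required})")
        elif actual != required:
            problems.append(f"{component}: core requires {required}, found {actual}")
    return problems


def check_application(dependencies: Iterable[str], manifest: Dict[str, str] = CORE_MANIFEST) -> List[str]:
    """Return the application's dependencies outside the core, where behavior may differ by distribution."""
    return sorted(set(dependencies) - set(manifest))
```

Dependencies reported by `check_application` are where per-distribution testing would still be needed; the Hive 13 anecdote on slide 19 describes exactly that kind of component.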
  • 21. Open Data Platform Initiative
      Representation across the Hadoop ecosystem…
      - Hadoop distribution vendors
      - Software application providers
      - System integrators/consultants
      - Hardware vendors
      - Customers
      … who all believe in the need for a community-based effort to standardize Hadoop, which will lead to improved adoption
  • 22. IBM Open Platform with Apache Hadoop Adopts ODP Core
      - BigInsights will include ODP-certified Apache packages
      - ODP will initially target the core packages of a Hadoop distribution
      - Packages will expand over time
      - First certification set expected this summer
      - Our goal for BigInsights on ODP:
          - Better compatibility and less testing against ecosystem software
          - Enable IBM Hadoop capabilities to run on other ODP-certified Hadoop distributions
      Apache open source components: HDFS, YARN, MapReduce, Ambari, HBase, Spark, Flume, Hive, Pig, Sqoop, HCatalog, Solr/Lucene
      * ODP: candidate set of certified ODP modules, expected summer 2015
  • 23. Goal of the ODP: Enable Innovation to Flourish on a Common Platform
      - Complements the Apache Software Foundation’s governance model
      - ODP efforts focus on integrating, testing, and certifying a standard core of Apache Hadoop ecosystem projects
      - Fixes for issues found in ODP testing will be contributed to the ASF projects in line with ASF processes
      - The ODP will not override or replace any aspect of ASF governance
  • 24. IBM BigInsights for Apache Hadoop
      - Base: IBM Open Platform with Apache Hadoop
      - IBM BigInsights Analyst: Big SQL, BigSheets
      - IBM BigInsights Data Scientist: Big SQL, BigSheets, Text Analytics, Machine Learning with Big R
      - IBM BigInsights Enterprise Management: POSIX Distributed File System; multi-workload, multi-tenant scheduling
  • 25. IBM BigInsights for Apache Hadoop
      Your choice of infrastructure and deployment model: IBM System z, IBM Power, Intel servers, or on cloud
  • 26. IBM Analytic Platform Capabilities: IBM Software Integrates and Extends Hadoop and Spark
      - Data Warehousing: PureData for Analytics, Operational Analytics
      - Entity Extraction and Matching: Big Match
      - Security and Compliance: Optim, Guardium Audit and Encryption
      - Data Integration and Governance: Information Server
      - Enterprise Search: Watson Explorer
      - Real-time Analytics: Streams
      - Predictive Modeling and Descriptive Statistics: SPSS, Big R and Scalable Algorithms
      - Analysis, Reporting, and Exploration: Watson Analytics, Cognos, BigSheets
      - Fast, ANSI SQL 2011, and Secure SQL: Big SQL
      - Enterprise File System: GPFS-FPO
      - Cluster Resource and Workload Management: Platform Symphony
      - Large Scale Text Extraction: Big Text
      Foundation: IBM Open Platform with Apache Hadoop
  • 27. IBM Leads the Market and Analysts Agree
      “IBM’s all-in bet on Apache Hadoop clearly has had the biggest impact among developers we polled” (Evans Big Data Survey)
      - Leading Hadoop Distribution
      - Leading Streaming Analytics Solution
  • 28. IBM’s Investment in the Big Data Community
      Over 250,000 people have benefited from free Big Data skills training: http://bigdatauniversity.com
  • 29. Spark Technology Center
      - Focal point for IBM investment in Spark
          - Code contributions to the Apache Spark project
          - Build industry solutions using Spark
          - Evangelize Spark technology inside and outside IBM
      - Agile engagement across IBM divisions
          - Systems: contribute enhancements to Spark core, and optimized infrastructure (hardware/software) for Spark
          - Analytics: IBM Analytics software will exploit Spark processing
          - Research: build innovations above (solutions that use Spark), inside (improvements to Spark core), and below (improvements to systems that execute Spark) the Spark stack
      Goal: to be the #1 contributor and adopter in the Spark ecosystem
  • 30. The IBM Difference
      - IBM delivers the foundation for Big Data, now and in the future:
          - Embraces open source
          - Establishes standards
          - Integrates with familiar interfaces and established systems
          - Delivers advanced analytic capabilities
      - Enables you to benefit from broader data and analytics capabilities:
          - Data Integration and Governance
          - Predictive and Real-time Analytics
      - Provides expertise to help you on your journey:
          - 6,000 partners
          - Analytics services and solution centers