Breaking the Silos:
Storage for Analytics & AI
Agenda
• IBM Software Defined Storage for Analytics & AI
• IBM AI Infrastructure Reference Architecture
• Why customers are choosing IBM Spectrum Scale storage for Hadoop
• Popular analytics use cases with IBM Spectrum Scale storage
IBM Spectrum Scale is flexible, scalable software-defined file storage
GLOBAL Namespace
Powered by
IBM Spectrum Scale
Automated data placement and data migration
Disk Tape Shared Nothing
Cluster
Flash
Transparent Cloud
Tier
JBOD/JBOF
Spectrum Scale RAID
NFS SMB POSIX HDFS Object
HPC
Genomics Traditional
applications
New Gen
applications
Enterprise class functionality:
Encryption
Compression
Synchronous Replication
Asynchronous Replication
Backup
Disaster Recovery
Audit Logging
4000+
clients
IBM Spectrum Scale supports file systems with sizes of tens of petabytes that contain billions of files and can be
accessed by thousands of nodes in a cluster.
4
IBM Spectrum Scale – Deployment models
Software
Install the software on your choice of
industry-standard x86 or POWER servers
Pre-built Systems
Elastic Storage Server (ESS)
with Spectrum Scale SW RAID
Cloud Services
Spectrum Scale can be deployed
on IBM Cloud and Amazon Web
Services (AWS)
Spectrum Scale
4
[Figure: rack diagram showing ESS 5U84 storage enclosures, EXP3524 expansion units, and System x3650 M4 servers]
5
• #1 Pure Open Source Hadoop Distribution
• 1300+ customers and 2100+ ecosystem partners
• Employs the original architects, developers and operators of Hadoop from Yahoo!
• Best-in-class 24x7 customer support
• Leading professional services and training
• #1 SQL Engine for complex, analytical workloads
• #1 Data Science Platform (Source: Gartner)
• Leader in On-premise and Hybrid Cloud solutions
• OpenPOWER performance leadership
• Software defined storage with unmatched scalability
+
The Power of ONE
One enterprise end-to-end solution for big data
#1 open source Hadoop platform + IBM’s leading value adds
IBM Systems:
A Reference
Architecture for
AI Infrastructure
June 2018
7
June 19th Announcement
IBM Systems is announcing IBM PowerAI Enterprise and an
AI infrastructure Reference Architecture for on-premises AI deployments.
IBM Systems is addressing the challenges organizations face experimenting
with PoCs, growing into multitenant, production systems, then expanding to
enterprise scale, all while integrating into an organization’s existing IT
infrastructure.
With a set of easy to use, integrated software tools built on optimized,
accelerated hardware, the architecture enables organizations to jump start AI
and Deep Learning projects, speeds time to model accuracy and provides
Enterprise-grade security, interoperability and support.
8
Autonomous
driving
Accident
avoidance
Location-based
advertising
Sentiment analysis of
what’s hot, problems
$
Market prediction
Fraud/Risk
Experiment sensor
analysis
Drilling exploration
sensor analysis
Consumer
sentiment Analysis
Sensor analysis for
optimal traffic flows
Smart Meter
analysis
for network
capacity,
Threat analysis - social
media monitoring, video
Surveillance
Clinical trials, drug
discovery,
Genomics
People & career
matching
Patient sensors,
medical image interpretation
Captioning,
search, real time
translation
Mfg. quality
Warranty
analysis
AI Examples in Every Industry
9
Data Science is a Team Sport
and Iterative
Extract
Data
Build
models
Prepare
Data
Train
Models
Evaluate Deploy
Use
models
Monetize
$$$
Monitor
Building cognitive apps using deep learning requires multiple skillsets
Connected infrastructure for data, development and iteration.
A common data platform and workflow is crucial for enterprise success.
Biz Analyst, Data Engineer, Data Scientist, App Developer, DevOps
IT Supports & Services the Complete Workflow
10
91% I&O Leaders
Across Inquiries
Cited "Data" as a
Main Inhibitor of AI
Initiatives.
This is not easy…
Source: Gartner, "AI State of The Market - and Where HPC intersects"
11
Data Source
New Data
Years of
Data
Workflow and data flow are complex
Inference
Trained Model
Deploy in
Production using
Trained Model
Seconds
to results
Data Preparation
Data Cleansing &
Pre-Processing
Training
Dataset
Testing
Dataset
Weeks &
months
Heavy IO
Iterate
Build, Train, Optimize Models
AI Deep Learning
Frameworks
(Tensorflow & Caffe)
Monitor &
Advise
Instrumentation
Distributed &
Elastic Deep Learning
Parallel Hyper-Parameter
Search & Optimization
Network
Models
Hyper-
Parameters
Days & weeks
Traditional
Business
IoT &
Sensors
Collaboration
Partners
Mobile Apps &
Social Media
Legacy
Training
Dataset
Testing
Dataset
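The "Parallel Hyper-Parameter Search & Optimization" step in the workflow above can be illustrated with a minimal sketch. It assumes a toy `validation_loss` function standing in for "train a model, then score it on the testing dataset"; that function, the parameter grid, and the use of a thread pool are illustrative assumptions and not part of any IBM product.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def validation_loss(learning_rate: float, batch_size: int) -> float:
    """Hypothetical stand-in for training a model and scoring it on held-out data."""
    return (learning_rate - 0.01) ** 2 * 1e4 + (batch_size - 64) ** 2 / 1e3


def parallel_grid_search(learning_rates, batch_sizes, max_workers=4):
    """Evaluate every hyper-parameter combination concurrently and return the best one."""
    grid = list(product(learning_rates, batch_sizes))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `grid`, so losses[i] belongs to grid[i].
        losses = list(pool.map(lambda params: validation_loss(*params), grid))
    best_index = min(range(len(grid)), key=losses.__getitem__)
    return grid[best_index], losses[best_index]


if __name__ == "__main__":
    best_params, best_loss = parallel_grid_search([0.001, 0.01, 0.1], [32, 64, 128])
    print("best (learning_rate, batch_size):", best_params, "loss:", best_loss)
```

In a real training run each evaluation would take hours rather than microseconds, which is why the slide pairs this step with distributed, elastic training and heavy I/O against shared storage.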
12
Production
Data
Sensor Data
Data from
collaboration
partners
Data from mobile
app and social
media
Legacy Data
Data Preparation
Pre-Processing
Data Source Model Training Inference
AI Deep Learning
Frameworks
(Tensorflow & IBM Caffe)
Monitor &
Advise
Instrumentation
Iterate
Distributed & Elastic Deep
Learning (Fabric)
Parallel Hyper-Parameter
Search & Optimization
Network
Models
Hyper-
Parameters
Trained Model
Deploy in
Production using
Trained Model
New Data
Years of
Data
Hours of
preparation
Weeks and
months of
training
Seconds to
results
Data requirements vary significantly
Data Variety
Data Quantity
Geo-dispersed,
On-prem & Cloud
Data Efficiency
Data Quality
Data
Gravity
HDFS/Spark
Model Velocity
Workflow Integration
Data Access Density
Data Velocity : Low latency
High throughput
Data Caching
Data Security, Governance and Resilience
© Copyright IBM Corporation 2017
IBM AI Architecture from Experimentation to Expansion
Experimentation
Single Tenant
Stabilization & Production
Secure Multitenant
Expansion
Enterprise Scale / Multiple Lines of Business
Data
Scientist’s
workstations
Internal
SAS
drives &
NVM’s
IBM
Power
Systems
AC922
High-Speed
Network
Subsystem
Existing
Organization
Infrastructure
IBM
Elastic
Storage
Server
(ESS)
Training & Inference Cluster
IBM Power Systems AC922, LC921 & LC922
Master & Failover Master
Nodes IBM Power Systems
LC921 & LC922
Login Nodes
IBM Power Systems
LC921 & LC922
Training Cluster
IBM Power Systems AC922
IBM
Elastic
Storage
Server
(ESS)
High-Speed
Network
Subsystem
Existing
Organization
Infrastructure
One software stack from experimentation to expansion
IBM PowerAI Enterprise
Red Hat Enterprise Linux (RHEL)
IBM Power System & x86 Servers
Services & Support
IBM Spectrum Scale / IBM Elastic Storage Server (ESS)
AI Adoption Cycle
–Single node
–Single user/tenant
–Small scale data
–Algorithm prototyping,
hyperparameter optimization
Experimentation Production Expansion
–Expanding use cases
–Multi-node
–Cluster
–Medium scale data
–Security
–Data Science Shared
Service
–Multitenant
–Upstream data pipeline
–Model iteration
–Scalable Inference
14
AI Data Journey
–Single node
–Single user/tenant
–Small scale data
–Algorithm prototyping,
hyperparameter optimization
Experimentation Production Expansion
–Expanding use cases
–Multi-node
–Cluster
–Medium scale data
–Security
–Data Science Shared
Service
–Multitenant
–Upstream data pipeline
–Model iteration
–Scalable Inference
15
Hadoop and Spark are the platforms of choice for the data pipeline.
16
Why customers are choosing
IBM Spectrum Scale Storage
with Hadoop
17
Reduce datacenter footprint and get
faster ingest with in-place analytics
Data
NFS
SMB POSIX Object
HDFS API
Access to the data using any of the industry standard protocols.
No need to maintain separate copies for different applications.
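To make "one copy, many protocols" concrete, here is a small sketch of how a single file in the data lake might be addressed by different clients. The mount point, host names, and the one-to-one path mapping are hypothetical assumptions chosen for illustration; the actual mapping depends on how the cluster is configured.

```python
from pathlib import PurePosixPath


def protocol_views(
    relative_path: str,
    mount_point: str = "/gpfs/datalake",            # hypothetical file system mount
    namenode: str = "namenode.example.com:8020",    # hypothetical HDFS endpoint
    nfs_server: str = "protocol-node.example.com",  # hypothetical NFS export host
) -> dict:
    """Return the addresses different clients would use for the same stored file."""
    posix_path = PurePosixPath(mount_point) / relative_path
    return {
        "posix": str(posix_path),
        "hdfs": f"hdfs://{namenode}/{PurePosixPath(relative_path)}",
        "nfs": f"{nfs_server}:{posix_path}",
    }


if __name__ == "__main__":
    for protocol, address in protocol_views("raw/2018/events.csv").items():
        print(f"{protocol:5s} {address}")
```

All three addresses refer to the same bytes, so an ingest job writing through POSIX and a Spark job reading through the HDFS API would not need a copy step in between.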
Grow storage independent of compute with the best data
protection technology
Grow storage independently of compute with the pre-integrated ESS system. Software RAID
eliminates the need for 3 copies of data, with faster disk rebuilds and no data corruption
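Why parity-based software RAID can replace three-way replication comes down to capacity overhead. The sketch below compares the two, assuming an 8 data + 2 parity layout purely for illustration; the slide does not state which erasure code is used, and the function names are hypothetical.

```python
def raw_per_usable_replication(copies: int) -> float:
    """Raw capacity consumed per unit of usable capacity with N full copies."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return float(copies)


def raw_per_usable_erasure(data_strips: int, parity_strips: int) -> float:
    """Raw capacity consumed per unit of usable capacity with data + parity strips."""
    if data_strips < 1 or parity_strips < 0:
        raise ValueError("need at least one data strip and non-negative parity")
    return (data_strips + parity_strips) / data_strips


if __name__ == "__main__":
    print("3-way replication:", raw_per_usable_replication(3))  # 3 units raw per unit usable
    print("8+2 parity layout:", raw_per_usable_erasure(8, 2))   # 1.25 units raw per unit usable
```

Both schemes in this example survive the loss of two devices holding a given piece of data, but the parity layout does so with far less raw capacity.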
Extreme scalability with
parallel file system architecture
Data + Metadata
Node
Data + Metadata
Node
Data + Metadata
Node
Data + Metadata
Node
Scale to billions of files.
No centralized metadata node bottleneck.
Global namespace that spans geographies
Stretch clusters and Active-Active replicas of data for real-time global collaboration
ESS
Why customers are choosing Spectrum Scale storage for Hadoop
Faster ingest, unmatched scalability, up to 60% less storage footprint for Hadoop workloads
18
Data Lake: Up to 60% less storage footprint
Ingest
ObjectFile
Direct Access
POSIX
Raw Data
Analysis
Less hardware
• HDFS Shared Nothing: 15 PB of physical for 5 PB usable
• Spectrum Scale on ESS: 6.5 PB of physical for 5 PB usable
Analytics in place
• No need to maintain copies of data for traditional applications
and analytics applications
Multi-purpose shared data lake
• Shared by Hadoop and many other use cases
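The "Less hardware" figures above can be checked with a few lines of arithmetic. The sketch uses only the numbers quoted on the slide (15 PB and 6.5 PB of physical capacity for 5 PB usable); the function name is an illustrative assumption.

```python
def footprint_reduction(baseline_raw_pb: float, alternative_raw_pb: float) -> float:
    """Fraction of physical capacity saved relative to the baseline."""
    return 1.0 - alternative_raw_pb / baseline_raw_pb


if __name__ == "__main__":
    usable_pb = 5.0
    hdfs_raw_pb = 15.0  # HDFS shared nothing, from the slide
    ess_raw_pb = 6.5    # Spectrum Scale on ESS, from the slide

    print("HDFS raw per usable PB:", hdfs_raw_pb / usable_pb)
    print("ESS raw per usable PB: ", ess_raw_pb / usable_pb)
    print(f"Footprint reduction: {footprint_reduction(hdfs_raw_pb, ess_raw_pb):.1%}")
```

With these inputs the reduction works out to roughly 57%, which is consistent with the "up to 60%" headline.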
19
HDP on Power with Elastic Storage Server
• Improve TCO
Up to 3X reduction in storage and compute
infrastructure when moving to Power Systems and Elastic
Storage Server versus commodity scale-out x86. Less
infrastructure means lower costs in many areas
(energy, cooling, server administration, floor space, software licensing)
• Position for future growth, avoid hitting the
data center wall with cluster sprawl
Separating storage from compute enables the
selection of the best compute node for the workload
– and Power has the greatest range of options
[Figure: IBM Power nodes running HDP services and the Spectrum Scale client, connected over InfiniBand (RDMA) / 40 GigE / 10 GigE to ESS. Legend: ESS = Elastic Storage Server (powered by Spectrum Scale); C = Spectrum Scale Client + HDFS Connector]
20
Popular analytics use cases with
IBM Spectrum Scale storage
21
Challenges …
• Expensive EDW (Enterprise Data Warehouse) setups
• Silos of infrastructure for various analytics workflows
• Multiple copies of the same data
• Time-consuming data ingest cycle
• Unmanageable analytics cluster sprawl
22
Popular use cases that help eliminate analytics silos
I. EDW Optimization
Optimize the data warehouse by shifting the right workloads to Hadoop
Reduce cost & improve efficiency
II. Integrated HPC and Hadoop
Efficiently transform data into insights with single data lake for HPC & Hadoop
Faster & better insights
III. Hadoop Storage Tiering
Disaggregate storage and compute for better utilization
Reduce cluster sprawl
IV. Unified Analytics Workflows
Single data lake for Hadoop and non-Hadoop analytics
Improve data governance
23
I. EDW Optimization
Optimize the data warehouse by shifting the right workloads to Hadoop
Archive Data away from EDW
- Move cold or rarely used data to Hadoop
as active archive
- Store more data for longer
Offload costly ETL process
- Free your EDW to perform high-value functions like
analytics & operations, not ETL
- Use Hadoop for advanced ETL
Optimize the value of your EDW
- Use Hadoop to refine new data sources, such as web and
machine data for new analytical context
Reduce migration effort & skillset gap
- Use existing investment in Oracle/DB2/Netezza skills
- BigSQL allows you to migrate applications without major
code rewrites and additional SQL development
Control cluster sprawl
- Grow storage independent of compute with ESS
- POWER servers deliver 1.7x throughput compared to
Hortonworks on x86
- Up to 60% less storage footprint
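The "archive data away from EDW" item above implies some rule for deciding which data is cold. A minimal sketch of one such rule follows, assuming a hypothetical catalog of last-access dates per table and a 180-day threshold; neither comes from the slides, and a real deployment would take this information from the warehouse's own usage statistics.

```python
from datetime import date, timedelta


def split_hot_cold(last_access_by_table: dict, today: date, cold_after_days: int = 180):
    """Partition tables into hot (stay in the EDW) and cold (candidates for Hadoop)."""
    cutoff = today - timedelta(days=cold_after_days)
    hot, cold = [], []
    for table, last_access in sorted(last_access_by_table.items()):
        # Anything not touched since the cutoff is treated as an archive candidate.
        (cold if last_access < cutoff else hot).append(table)
    return hot, cold


if __name__ == "__main__":
    usage = {  # hypothetical table names and access dates
        "orders_2018": date(2018, 6, 1),
        "orders_2015": date(2016, 1, 10),
        "clickstream_2017": date(2017, 11, 30),
    }
    hot, cold = split_hot_cold(usage, today=date(2018, 6, 19))
    print("keep in EDW:      ", hot)
    print("offload to Hadoop:", cold)
```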
Enterprise Data
Warehouse
DB2 / dashDB / Oracle /
Netezza / Teradata …
Hot Data
Hadoop
Cold Data, Archive Data,
New Sources
HDP On Power
SQL Interface BigSQL On Power
Analytics Software
(Business Analytics, Visualization such as SAS Grid, SAP HANA, etc.)
ESS for
Speed
ESS for
Data Lake
Spectrum Scale
A financial services company in Europe is optimizing its DB2 warehouse using a
combination of HDP, BigSQL, Power, and ESS.
New Data Sources
Streaming / IOT data
HDF On Power
24
II. Integrated HPC and Hadoop
Efficiently transform data into insights with single data lake for HPC & Hadoop
NASA and a healthcare company in the Middle East are using a common Spectrum Scale data
lake to efficiently get insights using traditional HPC and Hadoop analytics.
ESS for
Data Lake
POSIX
Interface
HDFS
Interface
Traditional HPC
Open, Read, Write, MPI, C-code,
Python etc
Hadoop
Map-Reduce,
Spark, ML/DL etc
HDP On Power
NFS/SMB/Object
Interface
Spectrum Scale
Protocol Node
ESS for
Speed
Fast Ingest
POSIX
Interface
Spectrum Scale
Extend HPC to add modern analytics capabilities
- Efficient movement of data between modern and traditional
applications with common namespace
- Spectrum Scale in-place analytics capabilities enable
accessing the same data using NFS/SMB/Object/POSIX/HDFS
without requiring any modifications to the data
- Improve data reliability and governance with single data lake
Ingest fast and improve time to insight
- POSIX interface combined with ESS Flash storage gives super
fast ingest ability
- Common namespace enables running some edge analytics at
the ingest layer as well
Control cluster sprawl
- Grow storage independent of compute with ESS
- Up to 60% less storage footprint
- POWER servers deliver 1.7x throughput compared to
Hortonworks on x86
25
III. Hadoop Storage Tiering
Disaggregate storage and compute for better utilization
An Indian conglomerate is implementing an ESS-based ingest tier for its existing
Hadoop data lake.
ESS for
Data Lake
POSIX
Interface
HDFS
Interface
New
Hadoop cluster
HDP On Power
ESS for
Speed
Fast Ingest
Existing
Hadoop cluster
Native
HDFS Storage
HDFS
Interface
HDFS
Interface
Use ESS as Ingest Tier to existing Hadoop setup
- Get super-fast ingest with POSIX and Flash storage
- Run in-place analytics directly on tier1 storage
Use ESS as Secondary Tier to existing Hadoop setup
- Grow storage independent of compute
- Reduce cluster sprawl
- Share data between old & new Hadoop setups
- Avoid copying data between the two clusters with a common
data lake
- Introduce new IBM Power-based HDP clusters for demanding
next gen analytics workflows on the same data lake
26
IV. Unified Analytics Workflows
Single data lake for Hadoop and non-Hadoop analytics
A bank in South Africa is implementing HDP and SAS Grid software on a common
ESS-based infrastructure.
ESS for
Data Lake
POSIX
Interface
HDFS
Interface
Other Analytics
Platforms
SAS grid, SAP
HANA/Vora, ML/DL,
Conductor with
Spark etc
Hadoop
Map-Reduce,
Spark, ML/DL etc
HDP On Power
ESS for
Speed
Fast Ingest
POSIX
Interface
Spectrum Scale
All analytics workflows on common storage
- Improve data reliability and governance with single data lake for
Hadoop and non-Hadoop analytics setups
- Build ML/DL workflows that use multiple analytics platforms
- Share data across analytics workflows as appropriate
Ingest fast and improve time to insight
- POSIX interface combined with ESS Flash storage gives super fast
ingest ability
Control cluster sprawl
- Grow storage independent of compute with ESS
- Up to 60% less storage footprint
- POWER servers deliver 1.7x throughput compared to Hortonworks
on x86
27
Thank You

Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...DataWorks Summit
 

Mais de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
 

Último

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 

Último (20)

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 

Breaking the Silos: Storage for Analytics & AI

  • 1. Breaking the Silos: Storage for Analytics & AI
  • 2. Agenda • IBM Software Defined Storage for Analytics & AI • IBM AI Infrastructure Reference Architecture • Why customers are choosing IBM Spectrum Scale storage for Hadoop? • Popular analytics use cases with IBM Spectrum Scale storage
  • 3. IBM Spectrum Scale is flexible and scalable software-defined file storage. Global namespace powered by IBM Spectrum Scale, with automated data placement and data migration. Storage media: Disk, Tape, Shared Nothing Cluster, Flash, Transparent Cloud Tier, JBOD/JBOF, Spectrum Scale RAID. Access protocols: NFS, SMB, POSIX, HDFS, Object. Workloads: HPC, Genomics, traditional applications, new-gen applications. Enterprise-class functionality: Encryption, Compression, Synchronous Replication, Asynchronous Replication, Backup, Disaster Recovery, Audit Logging. 4000+ clients. IBM Spectrum Scale supports file systems with sizes of tens of petabytes that contain billions of files and can be accessed by thousands of nodes in a cluster.
  • 4. IBM Spectrum Scale: Deployment models. Software: install the software on your own choice of industry-standard x86/POWER servers. Pre-built Systems: Elastic Storage Server (ESS) with Spectrum Scale SW RAID. Cloud Services: Spectrum Scale can be deployed on IBM Cloud and Amazon Web Services (AWS). [Figure: ESS 5U84 storage enclosures, System x3650 M4 servers, EXP3524 units]
  • 5. • #1 Pure Open Source Hadoop Distribution • 1300+ customers and 2100+ ecosystem partners • Employs the original architects, developers and operators of Hadoop from Yahoo! • Best-in-class 24x7 customer support • Leading professional services and training • #1 SQL Engine for complex, analytical workloads • #1 Data Science Platform (Source: Gartner) • Leader in on-premises and hybrid cloud solutions • OpenPOWER performance leadership • Software-defined storage with unmatched scalability. The Power of ONE: one enterprise end-to-end solution for big data. #1 open source Hadoop platform + IBM's leading value adds
  • 6. IBM Systems: A Reference Architecture for AI Infrastructure June 2018
  • 7. June 19th Announcement: IBM Systems is announcing IBM PowerAI Enterprise and an AI Infrastructure Reference Architecture for on-premises AI deployments. IBM Systems is addressing the challenges organizations face when experimenting with PoCs, growing into multitenant production systems, and then expanding to enterprise scale, all while integrating into an organization's existing IT infrastructure. With a set of easy-to-use, integrated software tools built on optimized, accelerated hardware, the architecture enables organizations to jump-start AI and Deep Learning projects, speeds time to model accuracy, and provides enterprise-grade security, interoperability and support.
  • 8. AI Examples in Every Industry: Autonomous driving, accident avoidance; Location-based advertising; Sentiment analysis of what's hot, problems; Market prediction; Fraud/Risk; Experiment sensor analysis; Drilling exploration sensor analysis; Consumer sentiment analysis; Sensor analysis for optimal traffic flows; Smart meter analysis for network capacity; Threat analysis (social media monitoring, video surveillance); Clinical trials, drug discovery, genomics; People & career matching; Patient sensors, medical image interpretation; Captioning, search, real-time translation; Mfg. quality; Warranty analysis
  • 9. Data Science is a Team Sport and Iterative. Steps shown: Extract Data, Build Models, Prepare Data, Train Models, Evaluate, Deploy, Use Models, Monetize ($$$), Monitor. Building cognitive apps using deep learning requires multiple skillsets: Biz Analyst, Dev Ops, Data Engineer, App Developer, Data Scientist. Connected infrastructure for data, development and iteration. A common data platform and workflow is crucial for enterprise success. IT supports and services the complete workflow.
  • 10. This is not easy… 91% of I&O leaders across inquiries cited "Data" as a main inhibitor of AI initiatives. Source: Gartner, "AI State of The Market - and Where HPC intersects"
  • 11. Workflow and data flow are complex. Data sources (new data, years of data): Traditional Business, IoT & Sensors, Collaboration Partners, Mobile Apps & Social Media, Legacy. Data Preparation: Data Cleansing & Pre-Processing, Training Dataset, Testing Dataset. Heavy IO. Iterate: Build, Train, Optimize Models; AI Deep Learning Frameworks (TensorFlow & Caffe); Monitor & Advise; Instrumentation; Distributed & Elastic Deep Learning; Parallel Hyper-Parameter Search & Optimization; Network Models; Hyper-Parameters. Inference: Trained Model, Deploy in Production using Trained Model. Timescales shown: weeks & months, days & weeks, seconds to results.
  • 12. Data requirements vary significantly. Data Source (new data, years of data): Production Data, Sensor Data, Data from collaboration partners, Data from mobile app and social media, Legacy Data. Data Preparation (hours of preparation): Pre-Processing, Training Dataset, Testing Dataset. Model Training (weeks and months of training): AI Deep Learning Frameworks (TensorFlow & IBM Caffe); Monitor & Advise; Instrumentation; Iterate; Distributed & Elastic Deep Learning (Fabric); Parallel Hyper-Parameter Search & Optimization; Network Models; Hyper-Parameters. Inference (seconds to results): Trained Model, Deploy in Production using Trained Model. Requirements: Data Variety; Data Quantity; Geo-dispersed, On-prem & Cloud; Data Efficiency; Data Quality; Data Gravity; HDFS/Spark; Model Velocity; Workflow Integration; Data Access Density; Data Velocity (low latency, high throughput); Data Caching; Data Security, Governance and Resilience.
  • 13. IBM AI Architecture from Experimentation to Expansion. Stages: Experimentation (Single Tenant); Stabilization & Production (Secure Multitenant); Expansion (Enterprise Scale / Multiple Lines of Business). Components shown: Data Scientist's workstations; internal SAS drives & NVMs; IBM Power Systems AC922; High-Speed Network Subsystem; Existing Organization Infrastructure; IBM Elastic Storage Server (ESS); Training & Inference Cluster: IBM Power Systems AC922, LC921 & LC922; Master & Failover Master Nodes: IBM Power Systems LC921 & LC922; Login Nodes: IBM Power Systems LC921 & LC922; Training Cluster: IBM Power Systems AC922. One software stack from experimentation to expansion: IBM PowerAI Enterprise; Red Hat Enterprise Linux (RHEL); IBM Power System & x86 Servers; IBM Spectrum Scale / IBM Elastic Storage Server (ESS); Services & Support. © Copyright IBM Corporation 2017
  • 14. AI Adoption Cycle: Experimentation, Production, Expansion. Experimentation: single node; single user/tenant; small-scale data; algorithm prototyping, hyperparameter optimization. Production and Expansion: expanding use cases; multi-node; cluster; medium-scale data; security; Data Science Shared Service; multitenant; upstream data pipeline; model iteration; scalable inference.
  • 15. AI Data Journey: Experimentation, Production, Expansion. Experimentation: single node; single user/tenant; small-scale data; algorithm prototyping, hyperparameter optimization. Production and Expansion: expanding use cases; multi-node; cluster; medium-scale data; security; Data Science Shared Service; multitenant; upstream data pipeline; model iteration; scalable inference. Hadoop and Spark are the choice for the data pipeline.
  • 16. Why customers are choosing IBM Spectrum Scale Storage with Hadoop?
  • 17. Why customers are choosing Spectrum Scale storage for Hadoop? Faster ingest, unmatched scalability, up to 60% less storage footprint for Hadoop workloads. (1) Reduce datacenter footprint and get faster ingest with in-place analytics: access to the data using any of the industry-standard protocols (NFS, SMB, POSIX, Object, HDFS API). No need to maintain separate copies for different applications. (2) Grow storage independent of compute with the best data protection technology: grow storage independent of compute with the pre-integrated ESS system. Eliminate the need for 3 copies of data with SW RAID, faster disk rebuilds, no data corruption. (3) Extreme scalability with parallel file system architecture: every node is a data + metadata node. Scale to billions of files. No centralized metadata node bottleneck. (4) Global namespace that spans geographies: stretch clusters and active-active replicas of data for real-time global collaboration.
  • 18. Data Lake: Up to 60% less storage footprint. [Diagram: ingest via Object and File; direct access via POSIX; raw data; analysis.] • Less hardware - HDFS Shared Nothing: 15 PB of physical for 5 PB usable - Spectrum Scale on ESS: 6.5 PB of physical for 5 PB usable • Analytics in place - No need to maintain copies of data for traditional applications and analytics applications • Multi-purpose shared data lake - Shared by Hadoop and many other use cases
  • 19. HDP on Power with Elastic Storage Server • Improve TCO: up to 3X reduction of storage and compute infrastructure moving to Power Systems and Elastic Storage Server vs commodity scale-out x86. Less infrastructure means reduced costs in many areas (energy, cooling, server administration, floor space, SW licensing). • Position for future growth, avoid hitting the data center wall with cluster sprawl: separating storage from compute enables the selection of the best compute node for the workload, and Power has the greatest range of options. [Diagram: IBM Power nodes running HDP services and the Spectrum Scale client (Spectrum Scale Client + HDFS Connector), connected over InfiniBand (RDMA) / 40 GigE / 10 GigE to Elastic Storage Server (ESS), powered by Spectrum Scale.]
  • 20. Popular analytics use cases with IBM Spectrum Scale storage
  • 21. Challenges … • Expensive EDW (Enterprise Data Warehouse) setups • Silos of infrastructure for various analytics workflows • Multiple copies of the same data • Time-consuming data ingest cycle • Unmanageable analytics cluster sprawl
  • 22. Popular use cases that help eliminate analytics silos. I. EDW Optimization: optimize the data warehouse by shifting the right workload to Hadoop. Reduce cost & improve efficiency. II. Integrated HPC and Hadoop: efficiently transform data into insights with a single data lake for HPC & Hadoop. Faster & better insights. III. Hadoop Storage Tiering: disaggregate storage and compute for better utilization. Reduce cluster sprawl. IV. Unified Analytics Workflows: single data lake for Hadoop and non-Hadoop analytics. Improve data governance.
  • 23. I. EDW Optimization: Optimize the data warehouse by shifting the right workload to Hadoop. • Archive data away from EDW - Move cold or rarely used data to Hadoop as active archive - Store more data for longer • Offload costly ETL process - Free your EDW to perform high-value functions like analytics & operations, not ETL - Use Hadoop for advanced ETL • Optimize the value of your EDW - Use Hadoop to refine new data sources, such as web and machine data, for new analytical context • Reduce migration effort & skillset gap - Use existing investment in Oracle/DB2/Netezza skills - BigSQL allows you to migrate applications without major code rewrites and additional SQL development • Control cluster sprawl - Grow storage independent of compute with ESS - POWER servers deliver 1.7x throughput compared to Hortonworks on x86 - Up to 60% less storage footprint. [Diagram: Enterprise Data Warehouse (DB2 / dashDB / Oracle / Netezza / Teradata …) holding hot data; Hadoop (HDP on Power) holding cold data, archive data and new sources; SQL interface (BigSQL on Power); analytics software (business analytics, visualization such as SAS grid, SAP HANA etc.); new data sources, streaming / IoT data (HDF on Power); ESS for Speed and ESS for Data Lake on Spectrum Scale.] A financial services company in Europe is optimizing their DB2 warehouse using the HDP, BigSQL, Power and ESS combination.
  • 24. II. Integrated HPC and Hadoop: Efficiently transform data into insights with a single data lake for HPC & Hadoop. NASA and a healthcare company from the Middle East are using a common Spectrum Scale data lake to efficiently get insights using traditional HPC and Hadoop analytics. [Diagram: ESS for Data Lake and ESS for Speed (fast ingest) on Spectrum Scale; POSIX interface for traditional HPC (open, read, write, MPI, C code, Python etc.); HDFS interface for Hadoop (MapReduce, Spark, ML/DL etc., HDP on Power); NFS/SMB/Object interface through a Spectrum Scale protocol node.] • Extend HPC to add modern analytics capabilities - Efficient movement of data between modern and traditional applications with a common namespace - Spectrum Scale in-place analytics capabilities enable accessing the same data using NFS/SMB/Object/POSIX/HDFS without requiring any modifications to the data - Improve data reliability and governance with a single data lake • Ingest fast and improve time to insight - POSIX interface combined with ESS Flash storage gives super-fast ingest ability - Common namespace enables running some edge analytics at the ingest layer as well • Control cluster sprawl - Grow storage independent of compute with ESS - Up to 60% less storage footprint - POWER servers deliver 1.7x throughput compared to Hortonworks on x86
  • 25. III. Hadoop Storage Tiering: Disaggregate storage and compute for better utilization
    Use ESS as an ingest tier for an existing Hadoop setup
      - Get super-fast ingest with POSIX and Flash storage
      - Run in-place analytics directly on tier 1 storage
    Use ESS as a secondary tier for an existing Hadoop setup
      - Grow storage independent of compute
      - Reduce cluster sprawl
    Share data between old and new Hadoop setups
      - Avoid copying data between the two clusters with a common data lake
      - Introduce new IBM Power-based HDP clusters for demanding next-gen analytics workflows on the same data lake
    Customer example: An Indian conglomerate is adding an ESS-based ingest tier to its existing Hadoop data lake.
    [Diagram: ESS for Data Lake with POSIX and HDFS interfaces; new Hadoop cluster (HDP on Power); ESS for Speed (fast ingest); existing Hadoop cluster with native HDFS storage; HDFS interfaces connecting the clusters to storage]
  • 26. IV. Unified Analytics Workflows: Single data lake for Hadoop and non-Hadoop analytics
    All analytics workflows on common storage
      - Improve data reliability and governance with a single data lake for Hadoop and non-Hadoop analytics setups
      - Build ML/DL workflows that use multiple analytics platforms
      - Share data across analytics workflows as appropriate
    Ingest fast and improve time to insight
      - POSIX interface combined with ESS Flash storage gives super-fast ingest
    Control cluster sprawl
      - Grow storage independent of compute with ESS
      - Up to 60% less storage footprint
      - POWER servers deliver 1.7x throughput compared to Hortonworks on x86
    Customer example: A bank in South Africa is implementing HDP and SAS grid software on a common ESS-based infrastructure.
    [Diagram: ESS for Data Lake with POSIX and HDFS interfaces; other analytics platforms (SAS grid, SAP HANA/Vora, ML/DL, Conductor with Spark, etc.); Hadoop (MapReduce, Spark, ML/DL, etc.) with HDP on Power; ESS for Speed (fast ingest) with POSIX interface; Spectrum Scale]
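
The "60% less storage footprint" figure repeated on the use-case slides can be sanity-checked against two numbers given in the speaker notes: 3x replication for native HDFS and about 30% overhead for ESS software RAID. The sketch below is only an illustration of that arithmetic. It assumes the 30% overhead is measured relative to usable capacity (the source does not say), and the function names and the 1,000 TB example size are hypothetical.

```python
# Illustrative arithmetic only: compares raw capacity needed for the same
# usable data under 3x replication versus ~30% software RAID overhead.
# Assumption: "30% overhead" means raw = 1.3 x usable (not stated in the source).

HDFS_REPLICATION_FACTOR = 3.0   # native HDFS keeps three copies of each block
ESS_RAID_FACTOR = 1.3           # assumed: usable capacity plus 30% protection overhead


def raw_capacity(usable_tb: float, protection_factor: float) -> float:
    """Raw capacity (TB) required to store `usable_tb` of data."""
    return usable_tb * protection_factor


def footprint_reduction(baseline_raw: float, alternative_raw: float) -> float:
    """Fraction of raw capacity saved by the alternative versus the baseline."""
    return 1.0 - alternative_raw / baseline_raw


usable = 1000.0  # hypothetical data set size in TB
hdfs_raw = raw_capacity(usable, HDFS_REPLICATION_FACTOR)
ess_raw = raw_capacity(usable, ESS_RAID_FACTOR)
reduction = footprint_reduction(hdfs_raw, ess_raw)

print(f"HDFS 3x replication: {hdfs_raw:.0f} TB raw")
print(f"ESS software RAID:   {ess_raw:.0f} TB raw")
print(f"Footprint reduction: {reduction:.1%}")
```

Under these assumptions the saving works out to a little under 57%, which is in line with an "up to 60%" claim. A different reading of "overhead" (for example, as a share of raw capacity) would give a somewhat different number.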

Editor's Notes

  1. Here is a snapshot of what Spectrum Scale has to offer: It supports accessing data through various access protocols such as POSIX, NFS, SMB, and HDFS, and hence can be used as a data lake to consolidate all your organization’s data. This allows you to strengthen HDP use cases like EDW Offload, Active Archive, and Single View of the Customer. In the background, Spectrum Scale offers automated data placement on any storage media, such as Flash, Disk, Tape, or Cloud. This helps with storage utilization and cost optimization. We already have 4000+ enterprise customers using Spectrum Scale today as their data store.
  2. TALK TRACK Together, we are better able to address the changing dynamics we’ve just outlined, solve the associated challenges and create valuable outcomes. By joining forces, we have brought together Hortonworks' deep expertise in data with IBM's data science platform, which was a leader in the Magic Quadrant, and the best SQL engine, broadening our toolset to help you accelerate your business. Additionally, we have added IBM Systems' differentiated Power and SDS offerings to improve ROI on these investments.
  3. The IBM Reference Architecture for AI Infrastructure is intended to be used as a reference by data scientists and IT professionals who are defining, deploying and integrating AI/ML/DL solutions into an organization. This document describes an architecture that will facilitate a productive proof of concept (PoC) and allow growth into a multitenant production system that can scale to enterprise level, while integrating the solution into an organization’s existing IT infrastructure.
  4. Every one of these is a use case that IBM Systems has worked on this year.
  5. Not only does the AI workflow involve multiple team members working on complex, often manual tasks, but each step in the pipeline can take weeks or months depending on a variety of factors. Data scientists are key contributors to successful model building and training; however, those with experience in orchestrating the ML/DL workflow and emerging AI frameworks and applications are hard to find. It requires coordination and cooperation. Even experienced data scientists can be challenged by the distributed data ingest and prep required for large, complex AI data sets, which can consume 80% of the time spent in an AI project.
  6. Talking points. Data Sources / Data Preparation: High data quality is critical to the success of any AI initiative, and the very large, diverse data sets needed for AI (typically 8-10X more than that used for traditional analytics [IDC]) create a data integration, transformation and labeling challenge that consumes significant time, human effort, and infrastructure resources. Data sensitivity requires multi-layered security across the AI data pipeline. IDC: ML and DL algorithms need huge quantities of training data (typically 8-10X more than that used for traditional analytics), and ... Model Build, Train, Optimize: AI is built on a complex mix of emerging, rapidly changing technologies and requires an accelerated, high-performance computing environment. Steep data scientist learning curves and open source framework complexity mean it can take weeks to get up and running. Building accurate AI models is a time-intensive, often manual process of experimentation and optimization of complex combinations of features and parameters. Training models requires massive amounts of data used in millions of jobs to make a model intelligent. Accessing distributed resources is often a manual, rigid process, resulting in a fixed, inflexible processing schema and training that can run for weeks or months.
  7. Will add talking points: “NO one is on the same curve..”
  8. TALK TRACK Hortonworks Powers the Future of Data: data-in-motion, data-at-rest, and Modern Data Applications. [NEXT SLIDE]
  9. TALK TRACK Spectrum Scale has its roots in HPC and runs on a number of supercomputers around the world. Customers have now started adopting it as the SDS behind Hadoop/Spark-based data lakes as well. Apart from the standard shared-storage advantages of growing storage independent of compute and eliminating the 3-way replication needed in standard HDFS, Spectrum Scale is also being adopted for its unmatched scalability and faster ingest.
     - Reduce datacenter footprint with the industry’s best in-place analytics: no need to maintain copies of the data for different applications requiring different access methods.
     - True software defined storage that can be purchased as software only or as a pre-integrated system: you can start small with the software-only option and still leverage enterprise storage system benefits from day 1. ESS brings the advantages of software RAID and eliminates the need for 3x replication for data protection.
     - Extreme scalability with a parallel file system architecture: this allows you to grow your Hadoop environment as your data grows without system-imposed limitations. It scales up to billions of files and thousands of nodes, whereas HDFS scales up to 350 million files due to centralized name node limitations.
     - Global namespace that can span geographies: this allows global and international organizations to form data lakes across the globe.
     POSIX support is one of the key differentiators that Spectrum Scale brings to the table, making Hortonworks Data Platform stronger against MapR.
  10. TALK TRACK As Hadoop clusters grow, it is quite typical for compute nodes to become under-utilized, because they are added primarily to increase storage capacity. A cluster design where compute and storage are locked together in a common building block removes a great deal of flexibility and can result in cluster sprawl and mounting TCO for data center space, power, software licensing, and administration and management costs. The IBM Elastic Storage Server, which includes the IBM Spectrum Scale file system, is 100% HDFS compatible and allows the storage to be separated into a high-performance, resilient storage appliance (ESS). This in turn allows the compute nodes to be right-sized for the demands of the workload, including mixing in workload-optimized nodes such as GPUs. And Power has the highest-performing compute nodes. This approach has the significant advantage of having the same data storage plane, and a single version of the data, shared between Hadoop and traditional analytics workloads. So there is no need to copy data between your POSIX and Hadoop environments and no need to use 3X replication, as is typical in local-storage Hadoop models, because ESS includes native software RAID for complete resiliency with only 30% overhead.
  11. TALK TRACK Hortonworks Powers the Future of Data: data-in-motion, data-at-rest, and Modern Data Applications. [NEXT SLIDE]