This document discusses OpenIO, an open source software-defined storage platform. It can transform commodity x86 servers into a large storage and computing pool. OpenIO offers elastic, scalable object storage for 1000+ petabytes of data. It uses a grid architecture with no single point of failure and real-time load balancing. OpenIO also enables running applications directly on the storage infrastructure through its "Grid for Apps" feature. The document outlines how OpenIO provides ease of use, supports storage tiering, and can integrate with services like Backblaze B2 to offer hybrid cloud storage solutions.
5. About
2006: Idea & 1st concept
2007: Design, dev starts
2009: 1st massive production above 1 PB
2012: Open sourced
2014: 10+ PB managed in a single cluster
2015: OpenIO fork
20 employees | 60 million end-users
San Francisco | Lille (FR) | Tokyo | Montreal
6. Use cases
> Email Platforms
> Video Streaming
> Object and file storage
• Storage-as-a-Service
• Compute + Storage Platform
• On-prem Data
7. Storage market evolution (80/20 rule)
> IOPS Realm: high frequency modifications
> Hyper-Scale Capacity Realm: low frequency modifications & immutable data
Chart labels: iops, capacity, "most", "88 % and growing fast"
8. The OpenIO Solution
x86 servers + Software-Defined Storage = Hyper Scalable Storage and Compute Platform
OpenIO transforms a rack of x86 servers into a large storage and compute pool
9. OpenIO Democratizes Large Data Platforms
Internet giants
have initiated the wave
and proved the technology
OpenIO offers
the same model
“on-premise”
> Simple
> Elastic
> Flexible
> Cost effective
> On-demand
10. OpenIO Approach
1. Object Storage: scale and store 1000+ PBs of data, billions of objects
2. Open Source software, commodity hardware: reduce cost and TCO
3. Application-aware Grid for Apps: run applications on the storage infrastructure
12. Directory with indirections
> Track containers, not objects
Multi tenancy
Flat structure
grid://namespace/account/container/object
Namespace > Account > Container > Object
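To make the flat addressing scheme concrete, here is a minimal sketch that splits a grid:// URL into its four levels. `GridAddress` and `parse_grid_url` are hypothetical names invented for this illustration, not part of any OpenIO API, and treating everything after the third slash as the object name is an assumption.

```python
from typing import NamedTuple


class GridAddress(NamedTuple):
    """Hypothetical holder for the four levels of a grid:// URL."""
    namespace: str
    account: str
    container: str
    obj: str


def parse_grid_url(url: str) -> GridAddress:
    """Split grid://namespace/account/container/object into its parts."""
    prefix = "grid://"
    if not url.startswith(prefix):
        raise ValueError(f"not a grid URL: {url!r}")
    # Flat structure: exactly three levels above the object.
    # Assumption: anything after the third slash belongs to the object name.
    parts = url[len(prefix):].split("/", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected namespace/account/container/object: {url!r}")
    return GridAddress(*parts)


# Example values are made up for the illustration.
addr = parse_grid_url("grid://OPENIO/my_account/photos/2016/cat.jpg")
print(addr.account, addr.container, addr.obj)
```

Because containers are not nested, the slash inside the object name carries no structural meaning here; it is simply part of the name.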
13. Grid of nodes with no consistent hashing
> Never rebalance
Scaling: new nodes are automatically discovered and immediately available
No consistent hashing algorithm > no recalculation of the key space
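The idea of resolving locations by lookup rather than by hashing can be sketched with a toy directory. This is an illustration only, not OpenIO's implementation: the `Directory` class, the node names, and the "least-used node" policy are all assumptions made for the example.

```python
class Directory:
    """Toy directory: remembers where each container was placed."""

    def __init__(self):
        self.nodes = []        # nodes currently known to the grid
        self.placements = {}   # container name -> node chosen at creation

    def add_node(self, node: str) -> None:
        # A new node is simply registered; nothing already stored moves.
        self.nodes.append(node)

    def locate(self, container: str) -> str:
        # Existing containers are resolved by lookup, not by hashing the name.
        if container not in self.placements:
            if not self.nodes:
                raise RuntimeError("no nodes available")
            # Placeholder policy (assumption): pick the least-used node.
            usage = {n: 0 for n in self.nodes}
            for n in self.placements.values():
                usage[n] += 1
            self.placements[container] = min(self.nodes, key=lambda n: usage[n])
        return self.placements[container]


d = Directory()
d.add_node("node-1")
d.add_node("node-2")
before = {c: d.locate(c) for c in ("a", "b", "c", "d")}
d.add_node("node-3")                      # scale out
after = {c: d.locate(c) for c in before}  # same lookups after scaling
assert before == after                    # nothing was rebalanced
print(d.locate("e"))                      # prints node-3: new data uses the new node
```

With a hash-based layout, adding a node changes the mapping of some existing keys; here the mapping is stored, so only new containers are affected.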
14. Conscience
> Realtime load balancing for optimal data placement
1. Collects metrics from the services of each node
2. Computes a score for each service
3. Distributes scores to every node and client
4. On-the-fly best matchmaking for each request
The score is computed with a configurable formula, usually based on capacity, I/O performance and CPU.
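The four steps above can be sketched as follows. The slides only say the formula is configurable and usually based on capacity, I/O performance and CPU; the geometric mean used here, the `ServiceMetrics` fields, the service names and the weighted random selection are assumptions chosen for illustration, not the actual Conscience formula.

```python
import random
from dataclasses import dataclass


@dataclass
class ServiceMetrics:
    """Metrics collected from one service (all values are percentages)."""
    name: str
    free_capacity: float   # % of disk space still free
    io_idle: float         # % of time the disk is idle
    cpu_idle: float        # % of CPU time idle


def score(m: ServiceMetrics) -> int:
    """Example formula (assumption): geometric mean of the three metrics."""
    product = m.free_capacity * m.io_idle * m.cpu_idle
    return round(product ** (1.0 / 3.0))


def pick_service(scores: dict, rng: random.Random) -> str:
    """Weighted random choice: a higher score attracts more requests."""
    candidates = {name: s for name, s in scores.items() if s > 0}
    if not candidates:
        raise RuntimeError("no service with a positive score")
    names = list(candidates)
    weights = [candidates[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Step 1: metrics collected from each service (made-up values).
metrics = [
    ServiceMetrics("svc-1", free_capacity=80, io_idle=90, cpu_idle=70),
    ServiceMetrics("svc-2", free_capacity=10, io_idle=50, cpu_idle=60),
    ServiceMetrics("svc-3", free_capacity=0, io_idle=95, cpu_idle=95),  # full disk
]
# Step 2: one score per service. Step 3 (distribution) is a plain dict here.
scores = {m.name: score(m) for m in metrics}
# Step 4: pick a target for one request.
chosen = pick_service(scores, random.Random(0))
print(scores, chosen)
```

A multiplicative formula has the convenient property that a service with one exhausted resource (here a full disk) scores zero and is never selected, however idle its CPU is.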
15. Grid for Apps
> Data usage at the heart of the datacenter
A data processing framework integrated inside OpenIO’s Grid
Scale-out application back-ends can be built on the storage platform itself
Avoid wasted resources and simplify load balancing for storage and processing
16. APIs and Specific App Connectors
> Data at the heart of the datacenter
Gateway layer
Optimized native APIs
• C
• Python
• Java
• Go
Command line interface
Specific App Connectors
• Enterprise Storage
• Email
• Video streaming
REST APIs
• Amazon S3
• OpenStack Swift
17. What makes us unique?
1. Grid for Apps: unifying Compute and Storage on a single platform
2. 0 TB > 1000+ PB: start small and grow with your needs (from one engine for small config to very large ones)
3. Ease of use: easy to test, deploy in production, manage and use
18. 1. Grid for Apps
> Scale-out real life use cases
Spam sample long-term archiving and search: Ingest, Store, Full Text Index, Search
Video workflow for user generated content: Ingest, Store, Transcode, Store, Stream
19. 2. From 0TB to 1000+PB
Start small! No compromise on 3 object storage characteristics
> Simple and standard APIs
> Data protection with replication or erasure coding on multiple nodes
> Scale-out capabilities to hyper-scale
Fast
> A 3-node cluster can be deployed in 5 minutes and be production ready
20. 3. Ease of use
Full Operational Control: CLI available
Ubiquitous Management: via Web GUI
25. OpenIO tiering is obvious
1. Built-in feature: across storage devices managed by OIO or connected to an external pool
2. Multicriteria settings: name, date, size, patterns, type, container name, etc.
3. Transparent: at upload, or with background jobs
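A multicriteria tiering decision of the kind listed above might look like the sketch below. The slides do not show OpenIO's rule syntax, so `TieringRule`, `ObjectInfo`, `choose_pool` and the pool names are hypothetical; the sketch only illustrates combining name pattern, container, size and age, with the first matching rule winning.

```python
import fnmatch
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ObjectInfo:
    name: str
    container: str
    size: int              # bytes
    modified: datetime


@dataclass
class TieringRule:
    """Hypothetical rule: every criterion that is set must match."""
    target_pool: str
    name_pattern: Optional[str] = None      # shell-style pattern, e.g. "*.mp4"
    container: Optional[str] = None
    min_size: Optional[int] = None          # bytes
    older_than: Optional[timedelta] = None

    def matches(self, obj: ObjectInfo, now: datetime) -> bool:
        if self.name_pattern and not fnmatch.fnmatch(obj.name, self.name_pattern):
            return False
        if self.container and obj.container != self.container:
            return False
        if self.min_size is not None and obj.size < self.min_size:
            return False
        if self.older_than is not None and now - obj.modified < self.older_than:
            return False
        return True


def choose_pool(obj, rules, now, default="fast"):
    """Return the pool of the first matching rule, else the default pool."""
    for rule in rules:
        if rule.matches(obj, now):
            return rule.target_pool
    return default


# Pool names and thresholds are invented for the example.
rules = [
    TieringRule("external", older_than=timedelta(days=90)),
    TieringRule("capacity", name_pattern="*.mp4", min_size=100 * 1024**2),
]
now = datetime(2016, 6, 1)
old_doc = ObjectInfo("report.pdf", "docs", 1_000, datetime(2016, 1, 1))
video = ObjectInfo("clip.mp4", "videos", 500 * 1024**2, datetime(2016, 5, 30))
note = ObjectInfo("note.txt", "docs", 10, datetime(2016, 5, 31))
for o in (old_doc, video, note):
    print(o.name, "->", choose_pool(o, rules, now))
```

The same function could be called at upload time or from a background job that rescans existing objects, which is how the "transparent" point above would be served in this sketch.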
26. OpenIO standard architecture
Client side applications
OIO directory
OIO chunk layer
29. OpenIO & Kinetic
Kinetic Open Source Project Member
Participation in the Plugfest in Raleigh, April ‘16
Working proof of concept in our lab
Ready to get certified during the next Plugfest
github.com/open-io/oio-kinetic
Thanks Seagate for the Ember system!
45. IT struggling with archiving data explosion
1. Explosion of Data Volume
2. Complexity
3. Infrastructure cost
46. Current solution: Silos
Hot data stored on disk
Cold data stored on LTO
Archiving software: move cold data to cold storage
47. But the current solution has lots of problems
1. Data is siloed
2. LTO has very slow performance
3. High cost of having multiple solutions
Backblaze B2 & OpenIO take up the challenge
48. A hybrid cloud storage solution to rule them all
1. Data is siloed > OpenIO single namespace & tiering
2. LTO has very slow performance > Backblaze B2 public cloud, no more "cold" storage
3. High cost of having multiple solutions > Backblaze B2 offered through OpenIO solution
49. 1. OpenIO single namespace & tiering
Client side applications
OIO directory
OIO chunk layer
Hybrid Cloud GW
Backblaze B2 objects
50. 2. Backblaze B2 public cloud / No more "cold" storage
> High performance
> Unlimited storage
> Data is instantly available
> Pay only for what you use
51. 3. Backblaze B2 offered through OpenIO solution
OpenIO (on premise)
Backblaze B2 (cloud)
hybrid
Push your datacenter’s walls!
52. Backblaze B2
Fully integrated in OpenIO
1. Cost Effective: only $0.005/GB/month. No Capex. Pay for what you use.
2. Fast Performance: all data are stored on in-production, always-online disks. Parallelization achieves fast throughput.
3. 99.999999% durability: all data checked every 30 days for degradation and rebuilt when necessary.
53. 3 years TCO for 1PB
Legacy: storage array, 3 years hosting, tape gear, backup software, human admin
OpenIO + Backblaze B2: OIO subscription, x86 servers, 3 years hosting, human admin, B2 (>40% archived in Backblaze B2)
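For a sense of scale, the B2 line of this comparison can be estimated from the $.005/GB/month price quoted earlier. This is a back-of-the-envelope sketch under stated assumptions: decimal units (1 PB = 1,000,000 GB), a constant stored volume, and storage charges only (no download or transaction fees, which the slides do not mention). The function name is invented for the example.

```python
PRICE_PER_GB_MONTH = 0.005   # USD per GB per month, as quoted in the slides
GB_PER_PB = 1_000_000        # decimal units (assumption)


def b2_storage_cost(petabytes: float, months: int) -> float:
    """Storage-only cost in USD for a constant volume kept for `months`."""
    return petabytes * GB_PER_PB * PRICE_PER_GB_MONTH * months


full = b2_storage_cost(1.0, 36)      # the whole 1 PB kept for 3 years
partial = b2_storage_cost(0.4, 36)   # 40% of 1 PB archived in B2
print(f"1 PB for 3 years:   ${full:,.0f}")
print(f"0.4 PB for 3 years: ${partial:,.0f}")
```

Under these assumptions the storage charge comes to roughly $180,000 for the full petabyte and roughly $72,000 for a 40% archive share; the other cost lines of the comparison are not quantified in the slides.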
57. Wrapping up
1. Grid for Apps: scale-out backend for applications
2. Tiering: built-in, multi-criteria and transparent
3. Ease of Use: 5 minutes deployment
4. Kinetic Drives: ready for the next storage hardware
5. Backblaze B2: unique hybrid cloud storage approach