BeoLink.org



Design and build an inexpensive DFS



           Fabrizio Manfredi Furuholmen




 FrOSCon                                  August 2008
Agenda



    Overview
    Introduction
    Old way
      openAFS
    New way
      Hadoop
      CEPH
    Conclusion
Overview

   Why a distributed file system?
 • Handle terabytes of data
 • Transparent to final user
 • Working in WAN environment
 • Good level of scalability
 • Inexpensive
 • Performance
Overview

Software vs Hardware

 Centralized Storage
 • Block device (SAN)
 • Shared file system (NAS)
 • Simple system management
 • Single point of failure

 DFS
 • Single file system across multiple computer nodes
 • More complicated system management
 • Scalable
 • HA (sometimes)
Overview

DFS Advantages

 • A small number of inexpensive file servers delivers similar
   performance on the client side
 • Increases in capacity are inexpensive
 • Better manageability and redundancy
Overview

Inexpensive

Terabyte Cost (SAS/FB)
• 18k $ NAS/SAN
• 5k $ DFS

Terabyte Cost (SATA)
• 2.5k $ NAS/SAN
• 0.7k $ DFS

Disk Type
• 500/1000 GB SATA disks reduce cost by more than 50%

Installation
• space
• network
• supply

Software
• Port extension
• Special software for HA

Discount
• Dumping

12 TB Storage (SATA)
Device        Quantity   Price ($)   Total Price ($)
SAN           1          37,000      37,000
File Server   1          23,995      23,995
DFS           3          2,500       7,500

96 TB Storage (SATA)
Device        Quantity   Price ($)   Total Price ($)
SAN           1          249,995     249,995
DFS           16         4,500       72,000
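
The per-terabyte cost behind the two SATA storage quotes can be cross-checked with a few lines of Python. This is only a sketch: the numbers are the device counts and prices listed on this slide, and the helper name `cost_per_tb` is illustrative.

```python
# Quotes from the two SATA storage tables: (device, units, unit price in $).
QUOTES = {
    12: [("SAN", 1, 37_000), ("File Server", 1, 23_995), ("DFS", 3, 2_500)],
    96: [("SAN", 1, 249_995), ("DFS", 16, 4_500)],
}


def cost_per_tb(capacity_tb: int, units: int, unit_price: float) -> float:
    """Return the total price divided by the capacity in terabytes."""
    return units * unit_price / capacity_tb


for capacity_tb, rows in QUOTES.items():
    for device, units, unit_price in rows:
        price = cost_per_tb(capacity_tb, units, unit_price)
        print(f"{capacity_tb} TB {device}: {price:,.0f} $/TB")
```

For the 12 TB quote the DFS works out to 625 $ per terabyte, and for the 96 TB quote to 750 $ per terabyte.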
Introduction

DFS
 • Distributed file systems: NFS, CIFS, Netware..
 • Distributed fault tolerant file systems: Coda, MS DFS..
 • Distributed parallel file systems: PVFS2, Lustre..
 • Distributed parallel fault tolerant file systems: Hadoop, GlusterFS, MogileFS..
 • Peer-to-peer file systems: Ivy, Infinit..
openAFS

Introduction

 Scalability
 • Client caching
 • Replication
 • Load balancing among servers while data is in use

 Transparent Access and Uniform Namespace
 • Cell
 • Partitions and volumes
 • Mount points
 • In-use volume moves

 Security
 • Authentication and secure communication
 • Authorization and flexible access control

 System Management
 • Single system interface
 • Administration tasks without system outage
 • Delegation
 • Backup
openAFS

Main Elements
   Cell

   • A cell is a collection of file servers and
     workstations
   • The directories under /afs are cells,
     forming a unique tree
   • A file server contains volumes

   Volumes

   • Volumes are "containers" or sets of
     related files and directories
   • Have a size limit
   • Three types: rw, ro, backup

   Mount Point Directory

   • Access to a volume is provided through a
     mount point
   • A mount point looks and acts just like a static
     directory
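
The relationship between cells, volumes and mount points can be pictured as a small lookup structure. The following Python sketch is a toy model, not how the AFS client is implemented: the classes `Volume` and `Cell`, the cell name `example.org` and the volume names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    name: str            # hypothetical volume name, e.g. "user.alice"
    kind: str = "rw"     # one of "rw", "ro", "backup"


@dataclass
class Cell:
    name: str                                   # hypothetical cell name
    mounts: dict = field(default_factory=dict)  # mount point path -> Volume

    def mount(self, path: str, volume: Volume) -> None:
        self.mounts[path] = volume

    def resolve(self, path: str):
        """Return (volume, path inside the volume) for the deepest mount point."""
        prefix = f"/afs/{self.name}"
        if path != prefix and not path.startswith(prefix + "/"):
            raise KeyError(f"{path} is not inside cell {self.name}")
        rel = path[len(prefix):] or "/"
        matches = [m for m in self.mounts
                   if rel == m or rel.startswith(m.rstrip("/") + "/")]
        if not matches:
            raise KeyError(f"no volume mounted for {path}")
        best = max(matches, key=len)
        return self.mounts[best], rel[len(best):].lstrip("/")


cell = Cell("example.org")
cell.mount("/", Volume("root.cell", "ro"))
cell.mount("/home/alice", Volume("user.alice"))
volume, inner = cell.resolve("/afs/example.org/home/alice/notes.txt")
print(volume.name, inner)
```

To the user the path is an ordinary directory tree; the mount point decides which volume actually serves each part of it.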
openAFS

Server Types
     File Server
     • File Server (Fileserver), delivers data files from the file server machine to
       workstations
     • Volume Server (Vol Server), performs all types of volume
       manipulation

     Database Server
     • Volume Location Server (VL Server), maintains the Volume
       Location Database (VLDB)
     • Protection Server (Ptserver), lets users grant access to several
       other users
     • Authentication Server (Kaserver), AFS version of Kerberos IV
       (deprecated)
     • Backup Server (Buserver), stores information related to the
       Backup System

     Ubik
     • Distributed database
openAFS

Implementation
    Problem: Company file system
    • Share documents
    • User home directories
    • Application file storage
    • WAN environment


    Solution
    • openAFS
     • Scalable, HA, good in WAN, inexpensive
     • More than 20 platforms
    • Samba (Gateway)
     • AFS Windows client is slow and a bit unstable
     • Clientless
    • Heimdal Kerberos (SSO)
     • KA emulation
     • LDAP backend
    • OpenLDAP
     • Centralized identity storage
openAFS

Usage

    Read/Write Volume

        •  Shared development areas
        •  Documentation data storage
        •  User home directories



        Read-Only Volume


        •    Application deployment
        •    Application executables (binaries, libraries, scripts)
        •    Configuration files
        •    Documentation (Model)
openAFS

Design

 Scalability

 • Storage scalability (File system
   layer)
 • User scalability (Samba
   Gateway layer)

 Performance

 • Load balancing
 • Roaming user/branch office

 Clientless

 • Windows client

 Centralized Identity

 • Kerberos
 • LDAP
openAFS

Tricks

 • At least 3 servers
 • Cache on a separate disk
 • Plan the directory tree
 • Replicate read-only data
 • Use volumes as much as possible
 • Use volume names that explain the mount point
 • Replicate the "mount point" volume
 • 400 clients per server
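
Two of these rules of thumb (at least 3 servers, 400 clients per server) combine into a simple sizing estimate. The Python sketch below is only an illustration of that arithmetic; the function name `file_servers_needed` is hypothetical and real sizing also depends on workload and hardware.

```python
import math

MIN_SERVERS = 3            # rule of thumb: at least 3 servers
CLIENTS_PER_SERVER = 400   # rule of thumb: 400 clients per server


def file_servers_needed(clients: int) -> int:
    """Apply both rules of thumb and return the larger requirement."""
    return max(MIN_SERVERS, math.ceil(clients / CLIENTS_PER_SERVER))


for clients in (100, 650, 1300):
    print(clients, "clients ->", file_servers_needed(clients), "servers")
```

With up to 1,200 clients the minimum of 3 servers dominates; beyond that the client count drives the number of servers.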
openAFS

Environment
     3 AFS Servers (3TB)
     • Disk 6 x 300 SAS RAID 5
     • 2 Gigabit Ethernet
     • 2 Xeon processors
     • 2 GB RAM

     2 Samba Servers
     • Disk 2 x 73 SAS RAID 1
     • 2 Gigabit Ethernet
     • 2 Xeon processors
     • 4 GB RAM

     2 Switches (Backbone)
     • 24 ports

     Users
     • 400 concurrent Unix
     • 250 concurrent Windows
openAFS

Linux Performance

 Write
 • 20-35 MB/s

 Read
 • Warm 35-100 MB/s
 • Cold 30-45 MB/s
openAFS

Windows through Samba Performance

 Write
 • 18-25 MB/s

 Read
 • 20-50 MB/s
openAFS

Who uses it?
    Morgan Stanley IT
    • Internal usage
    • Storage: 450 TB (ro) + 15 TB (rw)
    • Clients: 22,000

    Pictage, Inc
    • Online picture album
    • Storage: 265 TB (planned growth to 425 TB in twelve months)
    • Volumes: 800,000
    • Files: 200,000,000

    Embian
    • Internet shared folder
    • Storage: 500 TB
    • Servers: 200 storage servers
    • 300 app servers

    RZH
    • Internal usage: 210 TB
openAFS

Good for..

 Good
 • General purpose
 • Wide Area Network
 • Heterogeneous systems
 • Read operations > write operations
 • Small files

 Bad
 • Locking
 • Database
 • Unicode
 • Performance (until OSD)
New way

  OS
  • Object-based storage
  • Separation of file metadata management (MDS) from
    the storage of file data

  OSDs
  • Object storage devices
  • Replace the traditional block-level interface with one
    based on named objects
New way

 Stream
 • Multiple streams are parallel channels through which
   data can flow, thus improving the rate at which data can
   be written to the storage media

 Chunk
 • Files are striped across a set of nodes in order to
   facilitate parallel access
 • Chunks simplify fault-tolerance operations
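
Striping and chunk-level fault tolerance can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: round-robin placement and the function names below are illustrative choices, not the placement policy of any particular file system.

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list:
    """Cut a file into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_chunks(num_chunks: int, nodes: list, replicas: int = 2) -> dict:
    """Round-robin placement: chunk i goes to `replicas` consecutive nodes."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        i: [nodes[(i + r) % len(nodes)] for r in range(replicas)]
        for i in range(num_chunks)
    }


def surviving_holders(placement: dict, failed: str) -> dict:
    """Return, per chunk, the nodes that still hold a copy after one node fails."""
    return {i: [n for n in holders if n != failed]
            for i, holders in placement.items()}


chunks = split_into_chunks(b"abcdefghij", 4)
placement = place_chunks(len(chunks), ["n1", "n2", "n3"], replicas=2)
print(placement)
print(surviving_holders(placement, "n2"))
```

Because every chunk lives on two different nodes, a single node failure leaves at least one copy of each chunk, and only the affected chunks need to be copied again.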
Hadoop

Introduction

     “Moving Computation is Cheaper than Moving Data”

 • Scalable: can reliably store and process petabytes.
 • Economical: distributes the data and processing across clusters of
   commonly available computers.
 • Efficient: can process data in parallel on the nodes where the data is
   located.
 • Reliable: automatically maintains multiple copies of data and
   automatically redeploys computing tasks based on failures.
Hadoop

MapReduce

  MapReduce

  • A programming model and an associated implementation
    for processing and generating
    large data sets.

  Map

  • A function that processes a
    key/value pair to generate a set of
    intermediate key/value pairs.

  Reduce

  • A function that merges all
    intermediate values associated
    with the same intermediate key.
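
The map and reduce definitions above can be tried out in plain Python without a cluster. This is a minimal in-memory sketch of the model, not Hadoop's API: `run_mapreduce`, `map_words` and `reduce_counts` are illustrative names, and the grouping step stands in for the shuffle that a real framework performs across nodes.

```python
from collections import defaultdict


def map_words(key, line):
    """Map: one (offset, line) pair -> intermediate (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(word, counts):
    """Reduce: merge all values that share the same intermediate key."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)   # group intermediate values by key
    return dict(reducer(k, v) for k, v in sorted(groups.items()))


records = [(0, "a b a"), (1, "b c")]
print(run_mapreduce(records, map_words, reduce_counts))
```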
Hadoop

Map and Reduce

 Map
 • Input is split and mapped into key-value pairs

 Combine
 • For efficiency reasons, the
   combiner works directly on the
   map operation outputs.

 Reduce
 • The files are then merged,
   sorted and reduced
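
The role of the combine step can be shown with a small Python sketch: each mapper's output is pre-aggregated locally, so less data has to be merged in the reduce step. The function names and sample input are hypothetical, and the example assumes an operation (summing counts) where combining early does not change the result.

```python
from collections import Counter


def map_split(split):
    """Map: turn every word of one input split into a (word, 1) pair."""
    return [(word, 1) for line in split for word in line.split()]


def combine(pairs):
    """Combine: pre-aggregate one mapper's output before it is merged."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return sorted(local.items())


def reduce_merge(combined_outputs):
    """Reduce: merge the per-mapper outputs and sum the values per key."""
    total = Counter()
    for pairs in combined_outputs:
        for key, value in pairs:
            total[key] += value
    return dict(sorted(total.items()))


splits = [["error ok error"], ["ok error"]]
combined = [combine(map_split(split)) for split in splits]
print(combined)
print(reduce_merge(combined))
```

In this example the first mapper emits three pairs but hands over only two after combining.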
Hadoop

HDFS Architecture
Hadoop

Implementation

    Problem: Log centralization

    • Centralized log, keep track of all system activity
    • Search and statistics


    Solution

    • HDFS
     • Scalable, HA, task distribution
    • Heartbeat + DRBD
     • HA namenode
    • Syslog-ng
     • Flexible and scalable
    • Grep MapReduce function
     • Mail logging
     • Firewall logging
     • Webserver logging
     • Generic syslog
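
A grep-style MapReduce function over centralized logs might look like the following Python sketch. It runs in memory and is not the job used in this deployment: the sample log lines, host names and function names are hypothetical, and the lines are assumed to follow the classic syslog layout of timestamp, host, program and message.

```python
import re
from collections import defaultdict

# Hypothetical sample lines: "<month> <day> <time> <host> <program>: <message>".
LOG_LINES = [
    "Aug 23 10:00:01 mx1 postfix/smtpd[411]: connect from unknown[192.0.2.7]",
    "Aug 23 10:00:02 fw1 kernel: DROP IN=eth0 SRC=198.51.100.9",
    "Aug 23 10:00:03 mx1 postfix/smtpd[411]: reject: RCPT from unknown[192.0.2.7]",
    "Aug 23 10:00:05 web1 httpd[88]: GET /index.html 200",
]


def grep_map(line, pattern):
    """Map: emit (host, line) for every line that matches the pattern."""
    if re.search(pattern, line):
        host = line.split()[3]
        yield host, line


def grep_reduce(host, lines):
    """Reduce: count the matching lines per host."""
    return host, len(lines)


def grep_job(lines, pattern):
    groups = defaultdict(list)
    for line in lines:
        for host, match in grep_map(line, pattern):
            groups[host].append(match)
    return dict(grep_reduce(host, matches) for host, matches in groups.items())


print(grep_job(LOG_LINES, r"postfix"))
print(grep_job(LOG_LINES, r"DROP|reject"))
```

The same pattern-matching mapper serves mail, firewall and web server logs; only the regular expression changes.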
Hadoop

Solution:

  Scale on demand

  • Increase syslog concentrators
  • Hadoop cluster size

  Performance

  • Dedicated MapReduce functions for report and search
  • Parallel operation

  High Availability

  • Internal replication
  • Distribution on different shelves
Hadoop

Environment

     2 Log Servers
     • Disk 2 x 143 SAS RAID 1
     • 2 Gigabit Ethernet
     • 2 Xeon processors
     • 4 GB RAM

     2 Switches (Backbone)
     • 24-port Gigabit

     Hadoop
     • 2 namenodes: 8 GB RAM, 300 GB disk, 2 Xeon
     • 5 nodes: 4 GB RAM, 2 TB disk, 2 Xeon
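
Because HDFS keeps several copies of every block, the usable capacity of such a cluster is lower than the raw disk space. The Python sketch below shows the arithmetic for the 5 data nodes listed here; the replication factor of 3 is an assumption (it is the HDFS default for `dfs.replication`), since the slides do not state which value was used, and the function name is illustrative.

```python
def usable_capacity_tb(datanodes: int, disk_tb_per_node: float, replication: int) -> float:
    """Raw capacity divided by the number of copies kept of each block."""
    return datanodes * disk_tb_per_node / replication


raw_tb = usable_capacity_tb(5, 2, replication=1)      # no replication: raw space
usable_tb = usable_capacity_tb(5, 2, replication=3)   # assumed replication factor
print(f"raw: {raw_tb:.1f} TB, usable with 3 copies: {usable_tb:.2f} TB")
```

The estimate ignores space needed for the operating system and for temporary MapReduce output.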
Hadoop

Tricks

 • As many servers as possible
 • Block size
 • Parallel streams
 • Good network (Gigabit)
 • No old hardware
 • Map / Reduce / Partitioning functions
 • Simple software distribution
Hadoop

Who uses it?
    Yahoo!
    • 2000 nodes (2*4 CPU boxes with 3 TB disk each)
    • Used to support research for Ad Systems and Web Search


    A9.com - Amazon
    • Amazon's product search


     Facebook
    • Internal log storage
    • Reporting/analytics and machine learning
    • 320 machine cluster with 2,560 cores and about 1.3 PB raw storage


    Last.fm
    • Charts calculation and web log analysis
    • 25 node cluster (dual Xeon LV 2GHz, 4GB RAM, 1TB/node storage)
    • 10 node cluster (dual Xeon L5320 1.86GHz, 8GB RAM, 3TB/node storage)
Hadoop

Good for..

 Good
 • Task distribution (basic GRID
   infrastructure)
 • Distribution of content (high
   throughput of data access)
 • Read operations >> write
   operations

 Bad
 • Not a general purpose file system
 • Not POSIX compliant
 • Low granularity in security settings
Ceph

Next Generation

 Ceph addresses three critical challenges of storage systems:

 • Scalability
 • Performance
 • Reliability
Ceph

Introduction

    Capabilities

    • POSIX semantics.
    • Seamless scaling from a few nodes to many
      thousands
    • Gigabytes to Petabytes
    • High availability and reliability
    • No single points of failure
    • N-way replication of all data across multiple
      nodes
    • Automatic rebalancing of data on node addition/
      removal to efficiently utilize device resources
    • Easy deployment (userspace daemons)
Ceph

Architecture

        • Client
        • Metadata Cluster
        • Object Storage Cluster (OSD)
Ceph

Architecture difference

     Dynamic Distributed Metadata

     • Metadata Storage
     • Dynamic Subtree Partitioning
     • Traffic Control


     Reliable Autonomic Distributed Object
     Storage

     • Data Distribution
     • Replication
     • Data Safety
     • Failure Detection
     • Recovery and Cluster Updates
Ceph

Architecture
 • Pseudo-random data distribution function (CRUSH)
 • Reliable object storage service (RADOS)
 • Extent and B-tree based Object File System (EBOFS)
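
The idea of a pseudo-random data distribution function is that every client can compute where an object lives, without asking a central table. The Python sketch below illustrates this with rendezvous (highest-random-weight) hashing. It is a stand-in chosen for brevity, not the CRUSH algorithm itself, and the OSD names and function names are hypothetical.

```python
import hashlib


def _score(object_name: str, osd: str) -> int:
    """Deterministic pseudo-random score for an (object, OSD) pair."""
    digest = hashlib.sha256(f"{object_name}:{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, osds: list, replicas: int = 2) -> list:
    """Return the `replicas` OSDs with the highest score for this object."""
    ranked = sorted(osds, key=lambda osd: _score(object_name, osd), reverse=True)
    return ranked[:replicas]


osds = ["osd0", "osd1", "osd2", "osd3", "osd4"]
for name in ("inode42.chunk0", "inode42.chunk1", "inode77.chunk0"):
    print(name, "->", place(name, osds))
```

Placement depends only on the object name and the list of OSDs, so removing an OSD that does not hold a given object leaves that object's placement unchanged.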
Ceph

Transaction

  Splay Replication
  • Only after an update has been safely committed to disk is a final commit
    notification sent to the client.
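
The ordering of notifications can be sketched in Python as follows. This is a toy model under assumptions: the slide only states that the final commit follows the disk write, while the earlier "ack" once all replicas hold the update in memory follows Ceph's published design and is not taken from this deck. The `Replica` class and `replicated_write` function are hypothetical.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.memory = {}   # applied, but not yet durable
        self.disk = {}     # safely committed

    def apply(self, key, value):
        self.memory[key] = value

    def flush(self, key):
        self.disk[key] = self.memory[key]


def replicated_write(replicas, key, value):
    """Return the notifications a client would receive, in order."""
    events = []
    for replica in replicas:
        replica.apply(key, value)
    events.append("ack")        # assumed step: update is in memory on all replicas
    for replica in replicas:
        replica.flush(key)
    if all(r.disk.get(key) == value for r in replicas):
        events.append("commit")  # sent only once every replica has it on disk
    return events


replicas = [Replica("osd0"), Replica("osd1"), Replica("osd2")]
events = replicated_write(replicas, "object-1", b"payload")
print(events)
```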
Ceph

Good for..

             Good
             • General purpose (POSIX compliant)
             • High throughput of data access
               (scientific)
             • Heavy Read / Write operations
             • Coherent




                   Bad
                   • Young (not complete yet)
                   • Linux only
Conclusions

Environment Analysis
• No true generic DFS
• It is not simple to move 400 TB between different solutions

Dimension
• Start with the right size
• The number of servers depends on the speed needed and the number of
  clients
• Replication

Divide the system into classes of service
• Different disk types
• Different computer types

System Management
• Monitoring tools
• System/software deployment tools
Next

       Hadoop
       • Samba exporting (VFS ?)
       • Syslog server
       • HBase
       • Solr

       openAFS
       • AD integration (Q4)
       • AFS Manager
       • Upcoming release (OSD)
       • WebDAV

       CEPH
       • Snapshot
       • Test with Samba Cluster
Links

 OpenAFS
 • www.openafs.org
 • www.beolink.org

 Hadoop
 • Hadoop.apache.org
 • Hbase
 • Pig
 • Mahout

 Ceph
 • ceph.newdream.net
 • Publication
 • Mailing list
Last but not least



   Gluster
   • Stable
   • Good performance


   MogileFS
   • Application oriented
   • High Availability


   PVFS2
   • Scientific oriented
   • High Performance
   • Plugin

   Lustre
   • High performance
   • Stable

   Reference
• For Further Questions:

• Fabrizio Manfredi
• fabrizio.manfredi@gmail.com
  manfred.furuholmen@gmail.com

• http://www.beolink.org



                                  Too
                                 Long


                                        The End

Mais conteúdo relacionado

Mais procurados

Intro to GlusterFS Webinar - August 2011
Intro to GlusterFS Webinar - August 2011Intro to GlusterFS Webinar - August 2011
Intro to GlusterFS Webinar - August 2011GlusterFS
 
Microsoft Offical Course 20410C_08
Microsoft Offical Course 20410C_08Microsoft Offical Course 20410C_08
Microsoft Offical Course 20410C_08gameaxt
 
Microsoft Exchange 2010 Upgrade Seminar March 2010
Microsoft Exchange 2010 Upgrade   Seminar March 2010Microsoft Exchange 2010 Upgrade   Seminar March 2010
Microsoft Exchange 2010 Upgrade Seminar March 2010hagestadwt
 
The Alfresco ECM 1 Billion Document Benchmark on AWS and Aurora - Benchmark ...
The Alfresco ECM 1 Billion Document Benchmark on AWS and Aurora  - Benchmark ...The Alfresco ECM 1 Billion Document Benchmark on AWS and Aurora  - Benchmark ...
The Alfresco ECM 1 Billion Document Benchmark on AWS and Aurora - Benchmark ...Symphony Software Foundation
 
DSpace Tutorial : Open Source Digital Library
DSpace Tutorial : Open Source Digital LibraryDSpace Tutorial : Open Source Digital Library
DSpace Tutorial : Open Source Digital Libraryrajivkumarmca
 
Windows Server 2012 R2 Software-Defined Storage
Windows Server 2012 R2 Software-Defined StorageWindows Server 2012 R2 Software-Defined Storage
Windows Server 2012 R2 Software-Defined StorageAidan Finn
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBayMongoDB
 
Gluster Webinar: Introduction to GlusterFS
Gluster Webinar: Introduction to GlusterFSGluster Webinar: Introduction to GlusterFS
Gluster Webinar: Introduction to GlusterFSGlusterFS
 
Improving DSpace Backups, Restores & Migrations
Improving DSpace Backups, Restores & MigrationsImproving DSpace Backups, Restores & Migrations
Improving DSpace Backups, Restores & MigrationsTim Donohue
 
Oracle Cloud Infrastructure – Storage
Oracle Cloud Infrastructure – StorageOracle Cloud Infrastructure – Storage
Oracle Cloud Infrastructure – StorageMarketingArrowECS_CZ
 
Sizing your Content Databases: Understanding the Limits
Sizing your Content Databases: Understanding the LimitsSizing your Content Databases: Understanding the Limits
Sizing your Content Databases: Understanding the LimitsRandy Williams
 
Presentation database on flash
Presentation   database on flashPresentation   database on flash
Presentation database on flashxKinAnx
 
Linux and H/W optimizations for MySQL
Linux and H/W optimizations for MySQLLinux and H/W optimizations for MySQL
Linux and H/W optimizations for MySQLYoshinori Matsunobu
 
Storage virtualization citrix blr wide tech talk
Storage virtualization citrix blr wide tech talkStorage virtualization citrix blr wide tech talk
Storage virtualization citrix blr wide tech talkSisimon Soman
 
Tuning Linux Windows and Firebird for Heavy Workload
Tuning Linux Windows and Firebird for Heavy WorkloadTuning Linux Windows and Firebird for Heavy Workload
Tuning Linux Windows and Firebird for Heavy WorkloadMarius Adrian Popa
 
Alfresco scalability and performnce
Alfresco   scalability and performnceAlfresco   scalability and performnce
Alfresco scalability and performncePaul Hampton
 
Benefity Oracle Cloudu (3/4): Compute
Benefity Oracle Cloudu (3/4): ComputeBenefity Oracle Cloudu (3/4): Compute
Benefity Oracle Cloudu (3/4): ComputeMarketingArrowECS_CZ
 

Mais procurados (19)

Intro to GlusterFS Webinar - August 2011
Intro to GlusterFS Webinar - August 2011Intro to GlusterFS Webinar - August 2011
Intro to GlusterFS Webinar - August 2011
 
Microsoft Offical Course 20410C_08
Microsoft Offical Course 20410C_08Microsoft Offical Course 20410C_08
Microsoft Offical Course 20410C_08
 
Microsoft Exchange 2010 Upgrade Seminar March 2010
Microsoft Exchange 2010 Upgrade   Seminar March 2010Microsoft Exchange 2010 Upgrade   Seminar March 2010
Microsoft Exchange 2010 Upgrade Seminar March 2010
 
The Alfresco ECM 1 Billion Document Benchmark on AWS and Aurora - Benchmark ...
The Alfresco ECM 1 Billion Document Benchmark on AWS and Aurora  - Benchmark ...The Alfresco ECM 1 Billion Document Benchmark on AWS and Aurora  - Benchmark ...
The Alfresco ECM 1 Billion Document Benchmark on AWS and Aurora - Benchmark ...
 
DSpace Tutorial : Open Source Digital Library
DSpace Tutorial : Open Source Digital LibraryDSpace Tutorial : Open Source Digital Library
DSpace Tutorial : Open Source Digital Library
 
Windows Server 2012 R2 Software-Defined Storage
Windows Server 2012 R2 Software-Defined StorageWindows Server 2012 R2 Software-Defined Storage
Windows Server 2012 R2 Software-Defined Storage
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBay
 
Gluster Webinar: Introduction to GlusterFS
Gluster Webinar: Introduction to GlusterFSGluster Webinar: Introduction to GlusterFS
Gluster Webinar: Introduction to GlusterFS
 
Improving DSpace Backups, Restores & Migrations
Improving DSpace Backups, Restores & MigrationsImproving DSpace Backups, Restores & Migrations
Improving DSpace Backups, Restores & Migrations
 
Oracle Cloud Infrastructure – Storage
Oracle Cloud Infrastructure – StorageOracle Cloud Infrastructure – Storage
Oracle Cloud Infrastructure – Storage
 
Sizing your Content Databases: Understanding the Limits
Sizing your Content Databases: Understanding the LimitsSizing your Content Databases: Understanding the Limits
Sizing your Content Databases: Understanding the Limits
 
Presentation database on flash
Presentation   database on flashPresentation   database on flash
Presentation database on flash
 
Methods of NoSQL database systems benchmarking
Methods of NoSQL database systems benchmarkingMethods of NoSQL database systems benchmarking
Methods of NoSQL database systems benchmarking
 
Linux and H/W optimizations for MySQL
Linux and H/W optimizations for MySQLLinux and H/W optimizations for MySQL
Linux and H/W optimizations for MySQL
 
Openstorage Openstack
Openstorage OpenstackOpenstorage Openstack
Openstorage Openstack
 
Storage virtualization citrix blr wide tech talk
Storage virtualization citrix blr wide tech talkStorage virtualization citrix blr wide tech talk
Storage virtualization citrix blr wide tech talk
 
Tuning Linux Windows and Firebird for Heavy Workload
Tuning Linux Windows and Firebird for Heavy WorkloadTuning Linux Windows and Firebird for Heavy Workload
Tuning Linux Windows and Firebird for Heavy Workload
 
Alfresco scalability and performnce
Alfresco   scalability and performnceAlfresco   scalability and performnce
Alfresco scalability and performnce
 
Benefity Oracle Cloudu (3/4): Compute
Benefity Oracle Cloudu (3/4): ComputeBenefity Oracle Cloudu (3/4): Compute
Benefity Oracle Cloudu (3/4): Compute
 

Destaque

Managing OpenAFS users with OpenIDM
Managing OpenAFS users with OpenIDMManaging OpenAFS users with OpenIDM
Managing OpenAFS users with OpenIDMManfred Furuholmen
 
SouthEast LinuxFest 2015 - Managing linux in a engineering college
SouthEast LinuxFest 2015 -  Managing linux in a engineering collegeSouthEast LinuxFest 2015 -  Managing linux in a engineering college
SouthEast LinuxFest 2015 - Managing linux in a engineering collegeedgester
 
Architecting for Greater Security - London Summit Enteprise Track RePlay
Architecting for Greater Security - London Summit Enteprise Track RePlayArchitecting for Greater Security - London Summit Enteprise Track RePlay
Architecting for Greater Security - London Summit Enteprise Track RePlayAmazon Web Services
 
The economics of storage virtualization webinar
The economics of storage virtualization webinarThe economics of storage virtualization webinar
The economics of storage virtualization webinarHitachi Vantara
 
Cloud Storage Options: The True Costs
Cloud Storage Options:  The True CostsCloud Storage Options:  The True Costs
Cloud Storage Options: The True CostsHitachi Vantara
 
Omaha rug customer 2 cloud customer facing hcm ppt aug 2014
Omaha rug customer 2 cloud customer facing hcm ppt aug 2014Omaha rug customer 2 cloud customer facing hcm ppt aug 2014
Omaha rug customer 2 cloud customer facing hcm ppt aug 2014tecrecruiter
 
Cephfs jewel mds performance benchmark
Cephfs jewel mds performance benchmarkCephfs jewel mds performance benchmark
Cephfs jewel mds performance benchmarkXiaoxi Chen
 
The State of Ceph, Manila, and Containers in OpenStack
The State of Ceph, Manila, and Containers in OpenStackThe State of Ceph, Manila, and Containers in OpenStack
The State of Ceph, Manila, and Containers in OpenStackSage Weil
 
The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...
The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...
The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...Amazon Web Services
 
Use Distributed Filesystem as a Storage Tier
Use Distributed Filesystem as a Storage TierUse Distributed Filesystem as a Storage Tier
Use Distributed Filesystem as a Storage TierManfred Furuholmen
 
CephFS update February 2016
CephFS update February 2016CephFS update February 2016
CephFS update February 2016John Spray
 
Red Hat Gluster Storage, Container Storage and CephFS Plans
Red Hat Gluster Storage, Container Storage and CephFS PlansRed Hat Gluster Storage, Container Storage and CephFS Plans
Red Hat Gluster Storage, Container Storage and CephFS PlansRed_Hat_Storage
 

Destaque (15)

Managing OpenAFS users with OpenIDM
Managing OpenAFS users with OpenIDMManaging OpenAFS users with OpenIDM
Managing OpenAFS users with OpenIDM
 
AFS introduction
AFS introductionAFS introduction
AFS introduction
 
SouthEast LinuxFest 2015 - Managing linux in a engineering college
SouthEast LinuxFest 2015 -  Managing linux in a engineering collegeSouthEast LinuxFest 2015 -  Managing linux in a engineering college
SouthEast LinuxFest 2015 - Managing linux in a engineering college
 
TCO for a cloud
TCO for a cloudTCO for a cloud
TCO for a cloud
 
Architecting for Greater Security - London Summit Enteprise Track RePlay
Architecting for Greater Security - London Summit Enteprise Track RePlayArchitecting for Greater Security - London Summit Enteprise Track RePlay
Architecting for Greater Security - London Summit Enteprise Track RePlay
 
The economics of storage virtualization webinar
The economics of storage virtualization webinarThe economics of storage virtualization webinar
The economics of storage virtualization webinar
 
Cloud Storage Options: The True Costs
Cloud Storage Options:  The True CostsCloud Storage Options:  The True Costs
Cloud Storage Options: The True Costs
 
Omaha rug customer 2 cloud customer facing hcm ppt aug 2014
Omaha rug customer 2 cloud customer facing hcm ppt aug 2014Omaha rug customer 2 cloud customer facing hcm ppt aug 2014
Omaha rug customer 2 cloud customer facing hcm ppt aug 2014
 
Cephfs jewel mds performance benchmark
Cephfs jewel mds performance benchmarkCephfs jewel mds performance benchmark
Cephfs jewel mds performance benchmark
 
The State of Ceph, Manila, and Containers in OpenStack
The State of Ceph, Manila, and Containers in OpenStackThe State of Ceph, Manila, and Containers in OpenStack
The State of Ceph, Manila, and Containers in OpenStack
 
The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...
The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...
The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...
 
Use Distributed Filesystem as a Storage Tier
Use Distributed Filesystem as a Storage TierUse Distributed Filesystem as a Storage Tier
Use Distributed Filesystem as a Storage Tier
 
CephFS update February 2016
CephFS update February 2016CephFS update February 2016
CephFS update February 2016
 
Understanding AWS Security
Understanding AWS SecurityUnderstanding AWS Security
Understanding AWS Security
 
Red Hat Gluster Storage, Container Storage and CephFS Plans
Red Hat Gluster Storage, Container Storage and CephFS PlansRed Hat Gluster Storage, Container Storage and CephFS Plans
Red Hat Gluster Storage, Container Storage and CephFS Plans
 

Semelhante a Inexpensive storage

VDI storage and storage virtualization
VDI storage and storage virtualizationVDI storage and storage virtualization
VDI storage and storage virtualizationSisimon Soman
 
Dustin Black - Red Hat Storage Server Administration Deep Dive
Inexpensive storage

  • 1. BeoLink.org Design and build an inexpensive DFS Fabrizio Manfredi Furuholmen FrOSCon August 2008
  • 2. Agenda BeoLink.org   Overview   Introduction   Old way   openAFS   New way   Hadoop   CEPH   Conclusion
  • 3. Overview BeoLink.org Why Distributed File system ? • Handle terabytes of data • Transparent to final user • Working in WAN environment • Good level of scalability • Inexpensive • Performance
  • 4. Overview BeoLink.org Software vs Hardware. Centralized storage: • Block device (SAN) • Shared file system (NAS) • Simple system management • Single point of failure. DFS: • Single file system across multiple computer nodes • More complicated system management • Scalable • HA (sometimes)
  • 5. Overview BeoLink.org DFS Advantages • A small number of inexpensive fileservers provides similar performance to the client side • Increases in capacity are inexpensive • Better manageability and redundancy
  • 6. Overview BeoLink.org Inexpensive. Terabyte cost (SAS/FB): • 18k $ NAS/SAN • 5k $ DFS. Terabyte cost (SATA): • 2.5k $ NAS/SAN • 0.7k $ DFS. Disk type: • 500/1000GB SATA disks reduce >50%. Installation: • space • network • supply. Software: • Port extension • Special software for HA. Discount: • Dumping. 12 TB storage (SATA), listed as device: count × unit price ($) = total price ($): • SAN: 1 × 37,000 = 37,000 • File Server: 1 × 23,995 = 23,995 • DFS: 3 × 2,500 = 7,500. 96 TB storage (SATA): • SAN: 1 × 249,995 = 249,995 • DFS: 16 × 4,500 = 72,000
  • 7. Introduction BeoLink.org DFS • Distributed file systems: NFS, CIFS, Netware.. • Distributed fault tolerant file systems: CODA, MS DFS.. • Distributed parallel file systems: PVFS2, LUSTRE.. • Distributed parallel fault tolerant file systems: Hadoop, GlusterFS, MogileFS.. • Peer-to-peer file systems: Ivy, Infinit..
  • 8. openAFS BeoLink.org Introduction. Scalability: • Client caching • Replication • Load balance among servers while data is in use. Transparent Access and Uniform Namespace: • Cell • Partitions and volumes • Mount points • In-use volume moves. Security: • Authentication and secure communication • Authorization and flexible access control. System Management: • Single system interface • Administration tasks without system outage • Delegation • Backup
  • 9. openAFS BeoLink.org Main Elements. Cell: • A cell is a collection of file servers and workstations • The directories under /afs are cells, forming a unique tree • A fileserver contains volumes. Volumes: • Volumes are "containers" or sets of related files and directories • Have a size limit • 3 types: rw, ro, backup. Mount Point Directory: • Access to a volume is provided through a mount point • A mount point looks and behaves just like a static directory
  • 10. openAFS BeoLink.org Server Types. Fileserver: • Fileserver, delivers data files from the file server machine to workstations • Volume Server (Vol Server), performs all types of volume manipulation. Database Server: • Volume Location Server (VL Server), maintains the Volume Location Database (VLDB) • Protection Server (Ptserver), lets users grant access to several other users • Authentication Server (Kaserver), AFS version of Kerberos IV (deprecated) • Backup Server (Buserver), stores information related to the Backup System. Ubik: • Distributed database
  • 11. openAFS BeoLink.org Implementation. Problem: Company file system • Share documents • User home dir • Application file storage • WAN environment. Solution: • openAFS: scalable, HA, good in WAN, inexpensive, more than 20 platforms • Samba (gateway): the AFS Windows client is slow and a bit unstable; clientless • Heimdal Kerberos (SSO): KA emulation, LDAP backend • OpenLDAP: centralized identity storage
  • 12. openAFS BeoLink.org Usage Read/Write Volume •  Shared development areas •  Documentation data storage •  User home directories Read-Only Volume •  Application deployment •  Application executables (binaries, libraries, scripts) •  Configuration files •  Documentations (Model)
  • 13. openAFS BeoLink.org Design Scalability • Storage scalability (File system layer) • User scalability (Samba Gateway layer) Performance • Load balancing • Roaming user/branch office Clientless • Windows client Centralized Identity • Kerberos • Ldap
  • 14. openAFS BeoLink.org Tricks • Cache on a separate disk • Plan the directory tree • At least 3 servers • Use volume names that explain the mount point • Replicate read-only data • Use volumes as much as possible • Replicate the “mount point” volume • 400 clients per server
  • 15. openAFS BeoLink.org Environment. 3 AFS Servers (3TB): • Disk 6 x 300 SAS RAID 5 • 2 Gigabit Ethernet • 2 Processor Xeon • 2 GB RAM. 2 Samba Servers: • Disk 2 x 73 SAS RAID 1 • 2 Gigabit Ethernet • 2 Processor Xeon • 4 GB RAM. 2 Switches (Backbone): • 24 port. Users: • 400 concurrent Unix • 250 concurrent Windows
  • 16. openAFS BeoLink.org Linux Performance • 20-35 MB/s Write • Warm Read 35-100 MB/s • Cold 30-45 MB/s
  • 17. openAFS BeoLink.org Windows through Samba Performance • 18-25 MB/s Write • 20-50 MB/s Read
  • 18. openAFS BeoLink.org Who uses it? Morgan Stanley IT: • Internal usage • Storage: 450 TB (ro) + 15 TB (rw) • Clients: 22.000. Pictage, Inc: • Online picture album • Storage: 265TB (planned growth to 425TB in twelve months) • Volumes: 800,000 • Files: 200 000 000. Embian: • Internet shared folder • Storage: 500TB • Servers: 200 storage servers • 300 app servers. RZH: • Internal usage 210TB
  • 19. openAFS BeoLink.org Good for.. Good •  General purpose •  Wide Area Network •  Heterogeneous System •  Read operation > write operation •  Small File Bad • Locking • Database • Unicode • Performance (until OSD)
  • 20. New way BeoLink.org. OS: • Object-based storage • Separation of file metadata management (MDS) from the storage of file data. OSDs: • Object storage devices • Replace the traditional block-level interface with one based on named objects
  • 21. New way BeoLink.org. Stream: • Multiple streams are parallel channels through which data can flow, thus improving the rate at which data can be written to the storage media. Chunk: • Files are striped across a set of nodes in order to facilitate parallel access • Chunks simplify fault-tolerance operations
  • 22. Hadoop BeoLink.org Introduction Scalable: can reliably store and process petabytes. Economical: It distributes the data and processing across clusters of commonly available computers. “Moving Computation is Cheaper than Moving Data” Efficient: can process data in parallel on the nodes where the data is located. Reliable: automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.
  • 23. Hadoop BeoLink.org MapReduce MapReduce • it is an associated implementation for processing and generating large data sets. Map • It is a function that processes a key/value pair to generate a set of intermediate key/value pairs, Reduce • It is a function that merges all intermediate values associated with the same intermediate key.
  • 24. Hadoop BeoLink.org Map and Reduce. Map: • Input is split and mapped into key-value pairs. Combine: • For efficiency reasons, the combiner works directly on the map operation outputs. Reduce: • The files are then merged, sorted and reduced
  • 25. Hadoop BeoLink.org HDFS Architecture
  • 26. Hadoop BeoLink.org Implementation. Problem: Log centralization • Centralized log, keep track of all system activity • Search and statistics. Solution: • HDFS: scalable, HA, task distribution • Heartbeat+DRBD: HA namenode • Syslog-ng: flexible and scalable • Grep MapReduce function: mail logging, firewall logging, webserver logging, generic syslog
  • 27. Hadoop BeoLink.org Solution. Scale on demand: • Increase syslog concentrator • Hadoop cluster size. Performance: • Dedicated MapReduce function for report and search • Parallel operation. High Availability: • Internal replication • Distribution on different shelf
  • 28. Hadoop BeoLink.org Environment. 2 Log Servers: • Disk 2 x 143 SAS RAID 1 • 2 Gigabit Ethernet • 2 Processor Xeon • 4 GB RAM. 2 Switches (Backbone): • 24 port Gigabit. Hadoop: • 2 namenodes: 8 GB, 300GB, 2 Xeon • 5 nodes: 4 GB, 2TB, 2 Xeon
  • 29. Hadoop BeoLink.org Tricks • As many servers as possible • Parallel streams • Block size • Map/Reduce/Partitioning functions • Good network (Gigabit) • No old hardware • Simple software distribution
  • 30. Hadoop BeoLink.org Who uses it? Yahoo!: • 2000 nodes (2*4cpu boxes w 3TB disk each) • Used to support research for Ad Systems and Web Search. A9.com - Amazon: • Amazon's product search. Facebook: • Internal log storage • Reporting/analytics and machine learning • 320 machine cluster with 2,560 cores and about 1.3 PB raw storage. Last.fm: • Charts calculation and web log analysis • 25 node cluster (dual Xeon LV 2GHz, 4GB RAM, 1TB/node storage) • 10 node cluster (dual Xeon L5320 1.86GHz, 8GB RAM, 3TB/node storage)
  • 31. Hadoop BeoLink.org Good for.. Good • Task distribution (Basic GRID infrastructure) • Distribution of content (High throughput of data access ) • Read operations >> Write operations Bad • Not General purpose File system • Not Posix Compliant • Low granularity in security setting
  • 32. Ceph BeoLink.org Next Generation Ceph addresses three critical challenges of system storage Scalability Performance Reliability
  • 33. Ceph BeoLink.org Introduction Capabilities • POSIX semantics. • Seamless scaling from a few nodes to many thousands • Gigabytes to Petabytes • High availability and reliability • No single points of failure • N-way replication of all data across multiple nodes • Automatic rebalancing of data on node addition/ removal to efficiently utilize device resources • Easy deployment (userspace daemons)
  • 34. Ceph BeoLink.org Architecture • Client • Metadata Cluster OSD • Object Storage Cluster
  • 35. Ceph BeoLink.org Architecture difference. Dynamic Distributed Metadata: • Metadata Storage • Dynamic Subtree Partitioning • Traffic Control. Reliable Autonomic Distributed Object Storage: • Data Distribution • Replication • Data Safety • Failure Detection • Recovery and Cluster Updates
  • 36. Ceph BeoLink.org Architecture Pseudo-random data distribution function (CRUSH) Reliable object storage service (RADOS) Extent B-tree object File System
  • 37. Ceph BeoLink.org Transaction Splay Replication •  Only after it has been safely committed to disk is a final commit notification sent to the client.
  • 38. Ceph BeoLink.org Good for.. Good • General purpose (Posix compliant) • High throughput of data access (scientific) • Heavy Read / Write operations • Coherent Bad • Young (not complete yet) • Linux only
  • 39. Conclusions BeoLink.org. Environment Analysis: • No true generic DFS • Moving 400TB between different solutions is not simple. Dimension: • Start with the right size • The number of servers is related to the speed needed and the number of clients • Replication. Divide system in Class of Service: • Different disk type • Different computer type. System Management: • Monitoring tools • System/software deploy tools
  • 40. Next BeoLink.org. Hadoop: • Samba exporting (VFS?) • Syslog server • HBASE • Solr. openAFS: • AD integration (Q4) • AFS Manager • Upcoming release (OSD) • WEBDAV. CEPH: • Snapshot • Test with Samba Cluster
  • 41. Links BeoLink.org. OpenAFS: • www.openafs.org • www.beolink.org. Hadoop: • Hadoop.apache.org • Hbase • Pig • Mahout. Ceph: • ceph.newdream.net • Publication • Mailing list
  • 42. Last but not least.. BeoLink.org. Gluster: • Stable • Good performance. MogileFS: • Application oriented • High availability. PVFS2: • Scientific oriented • High performance • Plugin. Lustre: • High performance • Stable
  • 43. BeoLink.org Reference • For Further Questions: • Fabrizio Manfredi • fabrizio.manfredi@gmail.com manfred.furuholmen@gmail.com • http://www.beolink.org Too Long The End
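
The price tables on slide 6 can be turned into a cost per terabyte with a few lines of arithmetic. The sketch below is only an illustration: the device counts and unit prices are copied from the slide, while the function and variable names are made up for this example.

```python
# Figures copied from the slide 6 tables: name -> (device count, unit price in $).
OPTIONS_12TB = {"SAN": (1, 37_000), "File Server": (1, 23_995), "DFS": (3, 2_500)}
OPTIONS_96TB = {"SAN": (1, 249_995), "DFS": (16, 4_500)}


def cost_per_tb(devices, unit_price, capacity_tb):
    """Total purchase price divided by the usable capacity in TB."""
    return devices * unit_price / capacity_tb


def summarize(options, capacity_tb):
    """Return {option name: $ per TB}, rounded to cents."""
    return {
        name: round(cost_per_tb(count, price, capacity_tb), 2)
        for name, (count, price) in options.items()
    }


if __name__ == "__main__":
    for capacity, options in ((12, OPTIONS_12TB), (96, OPTIONS_96TB)):
        for name, dollars in summarize(options, capacity).items():
            print(f"{capacity} TB via {name}: {dollars} $/TB")
```

With these inputs the DFS option works out to 625 $/TB at 12 TB and 750 $/TB at 96 TB, which is in line with the roughly 0.7k $ per SATA terabyte quoted on the same slide.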
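
Slide 21 says files are striped into chunks across a set of nodes to allow parallel access. A minimal sketch of that idea follows, assuming simple round-robin placement; the node names, the chunk size and both function names are hypothetical and do not come from any of the file systems discussed.

```python
from __future__ import annotations


def stripe(data: bytes, nodes: list[str], chunk_size: int) -> dict[str, list[tuple[int, bytes]]]:
    """Split data into fixed-size chunks and assign them to nodes round-robin.

    Each node receives a list of (chunk index, payload) pairs.
    """
    layout: dict[str, list[tuple[int, bytes]]] = {node: [] for node in nodes}
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        node = nodes[index % len(nodes)]
        layout[node].append((index, data[offset:offset + chunk_size]))
    return layout


def reassemble(layout: dict[str, list[tuple[int, bytes]]]) -> bytes:
    """Collect the chunks from every node and join them in index order."""
    chunks = sorted(chunk for node_chunks in layout.values() for chunk in node_chunks)
    return b"".join(payload for _, payload in chunks)


if __name__ == "__main__":
    layout = stripe(b"abcdefghij", ["node-a", "node-b", "node-c"], chunk_size=4)
    print(layout)
    print(reassemble(layout))
```

Because each chunk carries its own index, a reader can fetch chunks from all nodes in parallel and still rebuild the file in order. Re-creating a lost chunk also only touches that chunk rather than the whole file, which is one way to read the slide's remark that chunks simplify fault tolerance.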
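
Slides 23 and 24 describe the map, combine and reduce steps. The sketch below walks through those steps in plain Python for a word count. It is a single-process illustration of the programming model, not Hadoop's API, and every function name in it is invented for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one input line into intermediate (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combine: pre-aggregate the map output of one split before the shuffle."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle(pair_lists):
    """Group all intermediate values that share the same key."""
    grouped = defaultdict(list)
    for pairs in pair_lists:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: merge all values associated with one intermediate key."""
    return key, sum(values)


def word_count(splits):
    """Run map and combine per split, then shuffle, sort and reduce."""
    combined = [
        combine(pair for line in split for pair in map_phase(line))
        for split in splits
    ]
    grouped = shuffle(combined)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count([["error disk full", "error timeout"], ["disk ok"]]))
```

The combiner matters because it shrinks each split's output before anything crosses the network. In a real cluster the splits would be processed on the nodes that hold the data, which is the "moving computation is cheaper than moving data" point from slide 22.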
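
Slide 36 names CRUSH as Ceph's pseudo-random data distribution function. The sketch below is not CRUSH. It uses rendezvous (highest-random-weight) hashing, a much simpler technique, only to show how a client can compute replica locations from an object name without asking a central lookup table. The node names and the `place` function are hypothetical.

```python
import hashlib


def _score(object_name, node):
    """Deterministic pseudo-random score for an (object, node) pair."""
    digest = hashlib.sha256(f"{object_name}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name, nodes, replicas=2):
    """Return the nodes that should hold the object, highest score first."""
    ranked = sorted(nodes, key=lambda node: _score(object_name, node), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    cluster = ["osd0", "osd1", "osd2", "osd3"]
    print(place("file.dat", cluster))
    # Removing a node that does not hold the object leaves its placement unchanged.
    holders = place("file.dat", cluster)
    spare = next(node for node in cluster if node not in holders)
    print(place("file.dat", [node for node in cluster if node != spare]))
```

Any client that knows the node list computes the same placement, and adding or removing a node only moves the objects whose ranking that node affects. The real CRUSH algorithm additionally accounts for device weights and the failure-domain hierarchy, which this sketch ignores.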