SlideShare uma empresa Scribd logo
1 de 64
Under the Hood of Oracle ASM:
Fault Tolerance
Alex Gorbachev


Oracle OpenWorld
San Francisco, 2011
Alex Gorbachev

    • CTO,  The Pythian Group
    • Blogger

    • OakTable Network member
    • Oracle ACE Director

    • BattleAgainstAnyGuess.com

    • President, Oracle RAC SIG




2                           © 2009/2010 Pythian
Why Companies Trust Pythian
    • Recognized Leader:
    •   Global industry-leader in remote database administration services and consulting for Oracle,
        Oracle Applications, MySQL and SQL Server
    •   Work with over 150 multinational companies such as Western Union, Fox Interactive Media, and
        MDS Inc. to help manage their complex IT deployments

    • Expertise:
    •   One of the world’s largest concentrations of dedicated, full-time DBA expertise.

    • Global Reach & Scalability:
    •   24/7/365 global remote support for DBA and consulting, systems administration, special
        projects or emergency response




3
8                                               © 2011 Pythian
ASM Striping

    • Extents and Allocation Units
    • Coarse striping & Fine striping

    • Random striping -> equal distribution
    • You cannot disable ASM striping




4                             © 2009/2010 Pythian
ASM Striping

    • Extents and Allocation Units
    • Coarse striping & Fine striping

    • Random striping -> equal distribution
    • You cannot disable ASM striping




4                             © 2009/2010 Pythian
ASM Striping

    • Extents and Allocation Units
    • Coarse striping & Fine striping

    • Random striping -> equal distribution
    • You cannot disable ASM striping




4                             © 2009/2010 Pythian
ASM Striping

    • Extents and Allocation Units
    • Coarse striping & Fine striping

    • Random striping -> equal distribution
    • You cannot disable ASM striping




4                             © 2009/2010 Pythian
ASM Striping

    • Extents and Allocation Units
    • Coarse striping & Fine striping

    • Random striping -> equal distribution
    • You cannot disable ASM striping




4                             © 2009/2010 Pythian
ASM Striping

    • Extents and Allocation Units
    • Coarse striping & Fine striping

    • Random striping -> equal distribution
    • You cannot disable ASM striping




4                             © 2009/2010 Pythian
ASM Striping

    • Extents and Allocation Units
    • Coarse striping & Fine striping

    • Random striping -> equal distribution
    • You cannot disable ASM striping




            file1




4                             © 2009/2010 Pythian
ASM Striping

    • Extents and Allocation Units
    • Coarse striping & Fine striping

    • Random striping -> equal distribution
    • You cannot disable ASM striping




            file1         file2




4                               © 2009/2010 Pythian
ASM Rebalancing

    • asm_power_limit       - rebalancing speed
     •   Async rebalancing in 11.2.0.2
    • No auto-magic re-layout based on performance
    • You can force rebalancing for a DG




5                                   © 2009/2010 Pythian
ASM Rebalancing

    • asm_power_limit       - rebalancing speed
     •   Async rebalancing in 11.2.0.2
    • No auto-magic re-layout based on performance
    • You can force rebalancing for a DG




5                                   © 2009/2010 Pythian
ASM Rebalancing

    • asm_power_limit       - rebalancing speed
     •   Async rebalancing in 11.2.0.2
    • No auto-magic re-layout based on performance
    • You can force rebalancing for a DG




5                                   © 2009/2010 Pythian
ASM Rebalancing

    • asm_power_limit       - rebalancing speed
     •   Async rebalancing in 11.2.0.2
    • No auto-magic re-layout based on performance
    • You can force rebalancing for a DG




5                                   © 2009/2010 Pythian
ASM Rebalancing

    • asm_power_limit       - rebalancing speed
     •   Async rebalancing in 11.2.0.2
    • No auto-magic re-layout based on performance
    • You can force rebalancing for a DG




5                                   © 2009/2010 Pythian
ASM Rebalancing

    • asm_power_limit       - rebalancing speed
     •   Async rebalancing in 11.2.0.2
    • No auto-magic re-layout based on performance
    • You can force rebalancing for a DG




5                                   © 2009/2010 Pythian
ASM Mirroring

    • Primary  extent & secondary extent (mirrored copy)
    • Write is done to all extent copies

    • Read is done always from primary extent by default




6                            © 2009/2010 Pythian
ASM Mirroring

    • Primary  extent & secondary extent (mirrored copy)
    • Write is done to all extent copies

    • Read is done always from primary extent by default




6                            © 2009/2010 Pythian
ASM Mirroring

    • Primary  extent & secondary extent (mirrored copy)
    • Write is done to all extent copies

    • Read is done always from primary extent by default




6                            © 2009/2010 Pythian
ASM Mirroring

    • Primary  extent & secondary extent (mirrored copy)
    • Write is done to all extent copies

    • Read is done always from primary extent by default




6                            © 2009/2010 Pythian
ASM Mirroring

    • Primary  extent & secondary extent (mirrored copy)
    • Write is done to all extent copies

    • Read is done always from primary extent by default




6                            © 2009/2010 Pythian
ASM Mirroring

    • Primary  extent & secondary extent (mirrored copy)
    • Write is done to all extent copies

    • Read is done always from primary extent by default




6                            © 2009/2010 Pythian
ASM Mirroring

    • Primary  extent & secondary extent (mirrored copy)
    • Write is done to all extent copies

    • Read is done always from primary extent by default




6                            © 2009/2010 Pythian
ASM Mirroring

    • Primary  extent & secondary extent (mirrored copy)
    • Write is done to all extent copies

    • Read is done always from primary extent by default




6                            © 2009/2010 Pythian
ASM Mirroring

    • Primary  extent & secondary extent (mirrored copy)
    • Write is done to all extent copies

    • Read is done always from primary extent by default




6                            © 2009/2010 Pythian
ASM Failure Groups

    • Group  of disks can fail at once
    • Mirror extents between failure groups

    • Beware of space provisioning




          SAN 1                            SAN 2




7                            © 2009/2010 Pythian
ASM Failure Groups

    • Group  of disks can fail at once
    • Mirror extents between failure groups

    • Beware of space provisioning




          SAN 1




7                            © 2009/2010 Pythian
ASM Failure Groups

    • Group  of disks can fail at once
    • Mirror extents between failure groups

    • Beware of space provisioning




          SAN 1




7                            © 2009/2010 Pythian
ASM Redundancy Levels

    • Perdiskgroup or per file
    • DG type external
     •   per file - unprotected only
    • DG    type normal redundancy
     •   per file - unprotected, two-way or three-way
    • DG    type high redundancy
     •   per file - three way only




8                                      © 2009/2010 Pythian
Read IO Failures

    • Can  re-read mirror block since 7.3 (initially done for
      Veritas)
    • Same happens with ASM - attempt to re-read from another
      mirror
     •   If successful, repair is done for all mirrors
    • H/w    RAID mirroring doesn’t give that flexibility
     •   Re-read is done by database blindly hopping for the best
    • Read failures in the database are handled depending on
      context
    • ASM can recover not only from media failure errors but
      from corruptions (bad checksum or wrong SCN)


9                                      © 2009/2010 Pythian
Read IO Failure Remapping

     • Read from secondary extent is performed
     • Write back to the *same* place is attempted
      •   Disk might do its own block reallocation
     • If the write to the same location fails then extent
       relocated and original AU marked as unusable
     • If the second write fails, then disk set OFFLINE like on any
       other write failure
     • Fix occurs only if a reading process can lock that extent



     • REMAP command - doesn’t detect block corruption but
      only read media failure


10                                    © 2009/2010 Pythian
Silent Corruptions of Secondary Extents

     • Reads    are first attempted from primary extent
      •   Secondary extent is accessed if primary read fails
     • preferred_read_failure_groups         can cause the same
       issue when some mirror extents are rarely read
     • Automatic remap is done when failure is detected

     • ASMCMD REMAP command forces read of an extent so if
       read media error is produced, remapping happens
      •   REMAP is used best alongside of disk scrubbing features
     • REMAP  doesn’t detect mirrors inconsistencies or logical
       block corruptions!
     • AMDU utility - Google for “Luca Canali AMDU”



11                                    © 2009/2010 Pythian
Partial Writes

     • ASM    doesn’t use DRL (Dirty Region Logging)
      •   So what happens if one mirror is done and second mirror write
          didn’t complete during a crash?
     • Oracle database has an ability to recover corrupted blocks
     • Oracle database reads both mirrors if corruption is
       detected and if one of the mirrors is good, repair happens




12                                   © 2009/2010 Pythian
Disk Failure

     • Diskis taken offline on write error - not on read error
     • On disk failure, ASM reads headers of disks in FG
      •   Whole FG is dismounted rather then individual disks
      •   Multiple FG failures -> dismount DG
     • ASM also probes partners when disk fails trying to identify
      failure group pathology
      •   If disks is lost while one of its partners is offline, DG is dismounted
     • Read    failure from disk header -> ASM takes disk offline




13                                      © 2009/2010 Pythian
What is MTBF?

     • Mean Time Before Failure (practically MTTF)
     • MTBF is inverse of the failure rate during useful life
      •    Thus MTBF is indicator of failure rate but does not predict useful
          lifetime
     • Assuming     failures have exponential distribution
      •   λ = 1-exp(-t/MTBF)
      •   Annualized Failure Rate (AFR) is λ calculated for time t = 1 year
     • MTBF    of 1,600,000 hours is 182+ years
      •   Doesn’t mean the drive will likely work 182 years!
      •   AFR = 1 - exp(24 * 365 / 1,600,000) = 0.546%



14                                    © 2009/2010 Pythian
“Bathtub” Failure Rate Curve




15             © 2009/2010 Pythian
Real AFR & Utilization (ref Google study)

     3 months infant
     mortality:
           10% AFR


     Assuming exponential
     distribution, hourly
     failure rate = 0.001%


     Poisson process:
     - exponential
     distribution
     - no failure
     correlation


16                           © 2009/2010 Pythian
Hardware Mirroring RAID and 2nd Disk Failure

                     • Data  loss happens only if 2nd failed disk
                       is the mirror of the 1st failed disk.
                     • Window of vulnerability = 5 hours
                      •   Time to replace a failed disk
                     • The chance that 2nd mirror fails in the
                      next 5 hours is 0.005% resulting in data
                      loss (based on Google’s data of 10% AFR)




17                           © 2009/2010 Pythian
Hardware Mirroring RAID and 2nd Disk Failure

                     • Data  loss happens only if 2nd failed disk
                       is the mirror of the 1st failed disk.
                     • Window of vulnerability = 5 hours
                      •   Time to replace a failed disk
                     • The chance that 2nd mirror fails in the
                      next 5 hours is 0.005% resulting in data
                      loss (based on Google’s data of 10% AFR)




17                           © 2009/2010 Pythian
Hardware Mirroring RAID and 2nd Disk Failure

                     • Data  loss happens only if 2nd failed disk
                       is the mirror of the 1st failed disk.
                     • Window of vulnerability = 5 hours
                      •   Time to replace a failed disk
                     • The chance that 2nd mirror fails in the
                      next 5 hours is 0.005% resulting in data
                      loss (based on Google’s data of 10% AFR)




17                           © 2009/2010 Pythian
ASM Mirroring and 2nd Disk Failure




18                       © 2009/2010 Pythian
ASM Mirroring and 2nd Disk Failure

                                               What if mirrored
                                               extent is placed
                                               randomly on other
                                               disks?




18                       © 2009/2010 Pythian
ASM Mirroring and 2nd Disk Failure

                                               What if mirrored
                                               extent is placed
                                               randomly on other
                                               disks?
                                               ... any 2nd disk
                                               failure would result
                                               in data loss




18                       © 2009/2010 Pythian
ASM Mirroring and 2nd Disk Failure

                                               What if mirrored
                                               extent is placed
                                               randomly on other
                                               disks?
                                               ... any 2nd disk
                                               failure would result
                                               in data loss
                                               Think Exadata - 168
                                               disks and one fails...
                                               then 2nd failure of
                                               any of 156 disks will
                                               result in data loss




18                       © 2009/2010 Pythian
ASM Mirroring and 2nd Disk Failure

                                               What if mirrored
                                               extent is placed
                                               randomly on other
                                               disks?
                                               ... any 2nd disk
                                               failure would result
                                               in data loss
                                               Think Exadata - 168
                                               disks and one fails...
                                               then 2nd failure of
                                               any of 156 disks will
                                               result in data loss
                                               5h window of
                                               vulnerability



18                       © 2009/2010 Pythian
ASM Mirroring and 2nd Disk Failure

                                               What if mirrored
                                               extent is placed
                                               randomly on other
                                               disks?
                                               ... any 2nd disk
                                               failure would result
                                               in data loss
                                               Think Exadata - 168
                                               disks and one fails...
                                               then 2nd failure of
                                               any of 156 disks will
                                               result in data loss
                                               5h window of
                                               vulnerability
                                               Chance of data loss =
                                               0.777%

18                       © 2009/2010 Pythian
ASM Mirroring and 2nd Disk Failure

                                               What if mirrored
                                               extent is placed
                                               randomly on other
                                               disks?
                                               ... any 2nd disk
                                               failure would result
                                               in data loss
                                               Think Exadata - 168
                                               disks and one fails...
                                               then 2nd failure of
                                               any of 156 disks will
                                               result in data loss
                                               5h window of
                                               vulnerability
                                               Chance of data loss =
                                               0.777%

18                       © 2009/2010 Pythian
ASM Partnering




19                    © 2009/2010 Pythian
ASM Partnering

                                            Each disk is assigned
                                            several partner disks
                                            (8 partners in
                                            11.2.0.1+
                                                                        )
                                            _asm_partner_target_disk_part




19                    © 2009/2010 Pythian
ASM Partnering

                                            Each disk is assigned
                                            several partner disks
                                            (8 partners in
                                            11.2.0.1+
                                                                        )
                                            _asm_partner_target_disk_part


                                            Now only one of 8
                                            partner disk failures
                                            would cause data loss
                                            - i.e. only 0.04%
                                            chance of data loss
                                            during 5 hours
                                            windows of
                                            vulnerability after
                                            the first disk failure




19                    © 2009/2010 Pythian
ASM Partnering

                                            Each disk is assigned
                                            several partner disks
                                            (8 partners in
                                            11.2.0.1+
                                                                        )
                                            _asm_partner_target_disk_part


                                            Now only one of 8
                                            partner disk failures
                                            would cause data loss
                                            - i.e. only 0.04%
                                            chance of data loss
                                            during 5 hours
                                            windows of
                                            vulnerability after
                                            the first disk failure
                                            Above is for Google’s
                                            10% AFR.


19                    © 2009/2010 Pythian
ASM Partnering

                                                   Each disk is assigned
                                                   several partner disks
       For manufacturer’s MTBF 1,600,000 hours (large partners in
                                                   (8
       diskgroups):                                11.2.0.1+
       •AFR = 0.546%                                   _asm_partner_target_disk_part)
       •data loss chance is 0.002% after first failure,
       during 5 hours recovery time                    Now only one of 8
                                                    partner disk failures
                                                    would cause data loss
       Assumption: disk failures follow Poisson process!
                                                    - i.e. only 0.04%
                                                    chance of data loss
                                                    during 5 hours
                                                    windows of
                                                    vulnerability after
                                                    the first disk failure
                                                    Above is for Google’s
                                                    10% AFR.


19                                  © 2009/2010 Pythian
Disk Failures in Real Life - a non-Poisson Process

     • Failures   are correlated in time
      •   Manufacturing defects affecting a batch
      •   Environmental (temperature)
      •   Operational (power surge)
      •   Software bugs
     • Non-exponential      distribution in time




20                                    © 2009/2010 Pythian
Consider a group of n disks all coming from the same production
batch. We will consider two distinct failure processes:
1. Each disk will be subject to independent failures that will be
   exponentially distributed with rate λ; these independent failures
   are the ones that are normally considered in reliability studies.
2. The whole batch will be subject to the unpredictable
   manifestation of a common defect. This event will be
   exponentially distributed with rate λ' << λ. It will not result in the
   immediate failure of any disk but will accelerate disk failures and
   make them happen at a rate λ'' >> λ.
21                             © 2009/2010 Pythian
Probability of Surviving a Data Loss After a Failure


              P   surv   = exp(–nλTR)
        n - # of partners for a failed ASM disk (8)
        λ - rate of failure
        TR - window of vulnerability (5 hours)




                                                 (normal redundancy)
22                         © 2009/2010 Pythian
Random Failure
                (No Global Batch Defect Manifestations)


                P   surv   = exp(–nλTR)
     λ - “normal” rate of failure (or 1/MTBF)

     MTBF = 1000000 hours:
     P   surv   = 99.996%
     Google’s AFR of 10%:
     P   surv   = 99.954%                            (normal redundancy)
23                             © 2009/2010 Pythian
Random Failure
                (No Global Batch Defect Manifestations)


                P   surv   = exp(–nλTR)
     λ - “normal” rate of failure (or 1/MTBF)

     MTBF = 1000000 hours:
     P   surv   = 99.996%
     Google’s AFR of 10%:
     P   surv   = 99.954%                            (normal redundancy)
23                             © 2009/2010 Pythian
After a Failure Caused by a Global Defect


                P   surv   = exp(–nλ’’TR)
          λ’’ - accelerated rate of failure

     λ’’ is one failure per week:
     P   surv   = 78.813%
     λ’’ is one failure per month:
     P   surv   = 94.596%                          (normal redundancy)
24                           © 2009/2010 Pythian
After a Failure Caused by a Global Defect


P    surv   = (1+nλ’’TR)exp(–nλ’’TR)
            λ’’ - accelerated rate of failure
            n - 5 hours
        λ’’ is one failure per week:
        P   surv   = 97.58%
        λ’’ is one failure per month:
        P   surv   = 99.85%                        (high redundancy)
25                           © 2009/2010 Pythian
After a Failure Caused by a Global Defect


P    surv   = (1+nλ’’TR)exp(–nλ’’TR)
            λ’’ - accelerated rate of failure
            n - 1 hour
        λ’’ is one failure per week:
        P   surv   = 99.89%
        λ’’ is one failure per month:
        P   surv   = 99.99%                        (high redundancy)
26                           © 2009/2010 Pythian
DEMO

     -- target # of partners
     select ksppstvl
       from sys.x$ksppi p, sys.x$ksppcv v
       where p.indx=v.indx and
         ksppinm='_asm_partner_target_disk_part';

     -- disk partners
     select d.path, p.number_kfdpartner
       from x$kfdpartner p, v$asm_disk_stat d
      where p.disk=&disk_no
        and p.grp=group_number
        and p.number_kfdpartner=d.disk_number;




27                      © 2009/2010 Pythian
Disk Failure Recovery - ASM vs RAID mirroring




     • Ask   the audience!




28                           © 2009/2010 Pythian
Q&A




     Email me - gorbachev@pythian.com
     Read my blog - http://www.pythian.com
     Follow me on Twitter - @AlexGorbachev
     Join Pythian fan club on Facebook & LinkedIn
29                           © 2009/2010 Pythian

Mais conteúdo relacionado

Mais procurados

Homeslide2
Homeslide2Homeslide2
Homeslide2
tykim001
 
Ibm san volume controller and ibm tivoli storage flash copy manager redp4653
Ibm san volume controller and ibm tivoli storage flash copy manager redp4653Ibm san volume controller and ibm tivoli storage flash copy manager redp4653
Ibm san volume controller and ibm tivoli storage flash copy manager redp4653
Banking at Ho Chi Minh city
 

Mais procurados (10)

z/OS Workload Management Update for z/OS V1.11 and V1.12
z/OS Workload Management Update for z/OS V1.11 and V1.12z/OS Workload Management Update for z/OS V1.11 and V1.12
z/OS Workload Management Update for z/OS V1.11 and V1.12
 
VMworld 2014: Site Recovery Manager and vSphere Replication
VMworld 2014: Site Recovery Manager and vSphere ReplicationVMworld 2014: Site Recovery Manager and vSphere Replication
VMworld 2014: Site Recovery Manager and vSphere Replication
 
Cloud infrastructure licensing and pricing customer presentation
Cloud infrastructure licensing and pricing customer presentationCloud infrastructure licensing and pricing customer presentation
Cloud infrastructure licensing and pricing customer presentation
 
Presentazione VMware @ VMUGIT UserCon 2015
Presentazione VMware @ VMUGIT UserCon 2015Presentazione VMware @ VMUGIT UserCon 2015
Presentazione VMware @ VMUGIT UserCon 2015
 
Homeslide2
Homeslide2Homeslide2
Homeslide2
 
VMworld 2013: Capacity Jail Break: vSphere 5 Space Reclamation Nuts and Bolts
VMworld 2013: Capacity Jail Break: vSphere 5 Space Reclamation Nuts and Bolts VMworld 2013: Capacity Jail Break: vSphere 5 Space Reclamation Nuts and Bolts
VMworld 2013: Capacity Jail Break: vSphere 5 Space Reclamation Nuts and Bolts
 
vCenter Site Recovery Manager: Architecting a DR Solution
vCenter Site Recovery Manager: Architecting a DR SolutionvCenter Site Recovery Manager: Architecting a DR Solution
vCenter Site Recovery Manager: Architecting a DR Solution
 
Building vSphere Perf Monitoring Tools
Building vSphere Perf Monitoring ToolsBuilding vSphere Perf Monitoring Tools
Building vSphere Perf Monitoring Tools
 
VMworld 2014: Site Recovery Manager and Stretched Storage
VMworld 2014: Site Recovery Manager and Stretched StorageVMworld 2014: Site Recovery Manager and Stretched Storage
VMworld 2014: Site Recovery Manager and Stretched Storage
 
Ibm san volume controller and ibm tivoli storage flash copy manager redp4653
Ibm san volume controller and ibm tivoli storage flash copy manager redp4653Ibm san volume controller and ibm tivoli storage flash copy manager redp4653
Ibm san volume controller and ibm tivoli storage flash copy manager redp4653
 

Destaque (7)

Tic y brecha digital
Tic y brecha digitalTic y brecha digital
Tic y brecha digital
 
kintone Overview for JAIT
kintone Overview for JAITkintone Overview for JAIT
kintone Overview for JAIT
 
Nutrition in older adults
Nutrition in older adultsNutrition in older adults
Nutrition in older adults
 
Chapter 7 lc business management skills communication
Chapter 7 lc business  management skills communicationChapter 7 lc business  management skills communication
Chapter 7 lc business management skills communication
 
Equipos informáticos. 1º bachillerato
Equipos informáticos. 1º bachilleratoEquipos informáticos. 1º bachillerato
Equipos informáticos. 1º bachillerato
 
Informatica y sociedad. 1º ESO. 01. La informática como elemento de innovación
Informatica y sociedad. 1º ESO. 01. La informática como elemento de innovaciónInformatica y sociedad. 1º ESO. 01. La informática como elemento de innovación
Informatica y sociedad. 1º ESO. 01. La informática como elemento de innovación
 
31 rapport activité cdsp 2011
31 rapport activité cdsp 201131 rapport activité cdsp 2011
31 rapport activité cdsp 2011
 

Semelhante a [INSIGHT OUT 2011] B16 analysis of oracle asm failability(alex)

We4IT lcty 2013 - infra-man - domino run faster
We4IT lcty 2013 - infra-man - domino run faster We4IT lcty 2013 - infra-man - domino run faster
We4IT lcty 2013 - infra-man - domino run faster
We4IT Group
 
Memory Hierarchy (RAM and ROM)
Memory Hierarchy (RAM and ROM)Memory Hierarchy (RAM and ROM)
Memory Hierarchy (RAM and ROM)
sumanth ch
 

Semelhante a [INSIGHT OUT 2011] B16 analysis of oracle asm failability(alex) (20)

Presentation implementing oracle asm successfully
Presentation    implementing oracle asm successfullyPresentation    implementing oracle asm successfully
Presentation implementing oracle asm successfully
 
Oracle ASM 11g - The Evolution
Oracle ASM 11g - The EvolutionOracle ASM 11g - The Evolution
Oracle ASM 11g - The Evolution
 
Oracle Flex ASM - What’s New and Best Practices by Jim Williams
Oracle Flex ASM - What’s New and Best Practices by Jim WilliamsOracle Flex ASM - What’s New and Best Practices by Jim Williams
Oracle Flex ASM - What’s New and Best Practices by Jim Williams
 
Less05 asm instance
Less05 asm instanceLess05 asm instance
Less05 asm instance
 
Linux on System z Optimizing Resource Utilization for Linux under z/VM - Part1
Linux on System z Optimizing Resource Utilization for Linux under z/VM - Part1Linux on System z Optimizing Resource Utilization for Linux under z/VM - Part1
Linux on System z Optimizing Resource Utilization for Linux under z/VM - Part1
 
Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...
Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...
Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...
 
les12.pdf
les12.pdfles12.pdf
les12.pdf
 
Scalable Relational Databases with Amazon Aurora. Madrid Summit 2019
Scalable Relational Databases with Amazon Aurora. Madrid Summit 2019Scalable Relational Databases with Amazon Aurora. Madrid Summit 2019
Scalable Relational Databases with Amazon Aurora. Madrid Summit 2019
 
Splunk-EMC
Splunk-EMCSplunk-EMC
Splunk-EMC
 
We4IT lcty 2013 - infra-man - domino run faster
We4IT lcty 2013 - infra-man - domino run faster We4IT lcty 2013 - infra-man - domino run faster
We4IT lcty 2013 - infra-man - domino run faster
 
D81242GC20_les01.pptx
D81242GC20_les01.pptxD81242GC20_les01.pptx
D81242GC20_les01.pptx
 
What's new in Amazon Aurora - ADB207 - New York AWS Summit
What's new in Amazon Aurora - ADB207 - New York AWS SummitWhat's new in Amazon Aurora - ADB207 - New York AWS Summit
What's new in Amazon Aurora - ADB207 - New York AWS Summit
 
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...
 
Mega Launch Recap Slide Deck
Mega Launch Recap Slide DeckMega Launch Recap Slide Deck
Mega Launch Recap Slide Deck
 
Understanding Elastic Block Store Availability and Performance
Understanding Elastic Block Store Availability and PerformanceUnderstanding Elastic Block Store Availability and Performance
Understanding Elastic Block Store Availability and Performance
 
Training Slides: 206 - Using the Tungsten Cluster AMI
Training Slides: 206 - Using the Tungsten Cluster AMITraining Slides: 206 - Using the Tungsten Cluster AMI
Training Slides: 206 - Using the Tungsten Cluster AMI
 
AWS Storage Tiers for Enterprise Workloads - Best Practices (STG301) | AWS re...
AWS Storage Tiers for Enterprise Workloads - Best Practices (STG301) | AWS re...AWS Storage Tiers for Enterprise Workloads - Best Practices (STG301) | AWS re...
AWS Storage Tiers for Enterprise Workloads - Best Practices (STG301) | AWS re...
 
Memory Hierarchy (RAM and ROM)
Memory Hierarchy (RAM and ROM)Memory Hierarchy (RAM and ROM)
Memory Hierarchy (RAM and ROM)
 
A presentaion on Panasas HPC NAS
A presentaion on Panasas HPC NASA presentaion on Panasas HPC NAS
A presentaion on Panasas HPC NAS
 
Storage tiering for Oracle Database on AWS and Oracle EBusiness Suite on AWS ...
Storage tiering for Oracle Database on AWS and Oracle EBusiness Suite on AWS ...Storage tiering for Oracle Database on AWS and Oracle EBusiness Suite on AWS ...
Storage tiering for Oracle Database on AWS and Oracle EBusiness Suite on AWS ...
 

Mais de Insight Technology, Inc.

コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...
コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...
コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...
Insight Technology, Inc.
 

Mais de Insight Technology, Inc. (20)

グラフデータベースは如何に自然言語を理解するか?
グラフデータベースは如何に自然言語を理解するか?グラフデータベースは如何に自然言語を理解するか?
グラフデータベースは如何に自然言語を理解するか?
 
Docker and the Oracle Database
Docker and the Oracle DatabaseDocker and the Oracle Database
Docker and the Oracle Database
 
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~
 
事例を通じて機械学習とは何かを説明する
事例を通じて機械学習とは何かを説明する事例を通じて機械学習とは何かを説明する
事例を通じて機械学習とは何かを説明する
 
仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーン
仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーン仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーン
仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーン
 
MBAAで覚えるDBREの大事なおしごと
MBAAで覚えるDBREの大事なおしごとMBAAで覚えるDBREの大事なおしごと
MBAAで覚えるDBREの大事なおしごと
 
グラフデータベースは如何に自然言語を理解するか?
グラフデータベースは如何に自然言語を理解するか?グラフデータベースは如何に自然言語を理解するか?
グラフデータベースは如何に自然言語を理解するか?
 
DBREから始めるデータベースプラットフォーム
DBREから始めるデータベースプラットフォームDBREから始めるデータベースプラットフォーム
DBREから始めるデータベースプラットフォーム
 
SQL Server エンジニアのためのコンテナ入門
SQL Server エンジニアのためのコンテナ入門SQL Server エンジニアのためのコンテナ入門
SQL Server エンジニアのためのコンテナ入門
 
Lunch & Learn, AWS NoSQL Services
Lunch & Learn, AWS NoSQL ServicesLunch & Learn, AWS NoSQL Services
Lunch & Learn, AWS NoSQL Services
 
db tech showcase2019オープニングセッション @ 森田 俊哉
db tech showcase2019オープニングセッション @ 森田 俊哉 db tech showcase2019オープニングセッション @ 森田 俊哉
db tech showcase2019オープニングセッション @ 森田 俊哉
 
db tech showcase2019 オープニングセッション @ 石川 雅也
db tech showcase2019 オープニングセッション @ 石川 雅也db tech showcase2019 オープニングセッション @ 石川 雅也
db tech showcase2019 オープニングセッション @ 石川 雅也
 
db tech showcase2019 オープニングセッション @ マイナー・アレン・パーカー
db tech showcase2019 オープニングセッション @ マイナー・アレン・パーカー db tech showcase2019 オープニングセッション @ マイナー・アレン・パーカー
db tech showcase2019 オープニングセッション @ マイナー・アレン・パーカー
 
難しいアプリケーション移行、手軽に試してみませんか?
難しいアプリケーション移行、手軽に試してみませんか?難しいアプリケーション移行、手軽に試してみませんか?
難しいアプリケーション移行、手軽に試してみませんか?
 
Attunityのソリューションと異種データベース・クラウド移行事例のご紹介
Attunityのソリューションと異種データベース・クラウド移行事例のご紹介Attunityのソリューションと異種データベース・クラウド移行事例のご紹介
Attunityのソリューションと異種データベース・クラウド移行事例のご紹介
 
そのデータベース、クラウドで使ってみませんか?
そのデータベース、クラウドで使ってみませんか?そのデータベース、クラウドで使ってみませんか?
そのデータベース、クラウドで使ってみませんか?
 
コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...
コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...
コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...
 
複数DBのバックアップ・切り戻し運用手順が異なって大変?!運用性の大幅改善、その先に。。
複数DBのバックアップ・切り戻し運用手順が異なって大変?!運用性の大幅改善、その先に。。 複数DBのバックアップ・切り戻し運用手順が異なって大変?!運用性の大幅改善、その先に。。
複数DBのバックアップ・切り戻し運用手順が異なって大変?!運用性の大幅改善、その先に。。
 
Attunity社のソリューションの日本国内外適用事例及びロードマップ紹介[ATTUNITY & インサイトテクノロジー IoT / Big Data フ...
Attunity社のソリューションの日本国内外適用事例及びロードマップ紹介[ATTUNITY & インサイトテクノロジー IoT / Big Data フ...Attunity社のソリューションの日本国内外適用事例及びロードマップ紹介[ATTUNITY & インサイトテクノロジー IoT / Big Data フ...
Attunity社のソリューションの日本国内外適用事例及びロードマップ紹介[ATTUNITY & インサイトテクノロジー IoT / Big Data フ...
 
レガシーに埋もれたデータをリアルタイムでクラウドへ [ATTUNITY & インサイトテクノロジー IoT / Big Data フォーラム 2018]
レガシーに埋もれたデータをリアルタイムでクラウドへ [ATTUNITY & インサイトテクノロジー IoT / Big Data フォーラム 2018]レガシーに埋もれたデータをリアルタイムでクラウドへ [ATTUNITY & インサイトテクノロジー IoT / Big Data フォーラム 2018]
レガシーに埋もれたデータをリアルタイムでクラウドへ [ATTUNITY & インサイトテクノロジー IoT / Big Data フォーラム 2018]
 

Último

Último (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

[INSIGHT OUT 2011] B16 analysis of oracle asm failability(alex)

  • 1. Under the Hood of Oracle ASM: Fault Tolerance Alex Gorbachev Oracle OpenWorld San Francisco, 2011
  • 2. Alex Gorbachev • CTO, The Pythian Group • Blogger • OakTable Network member • Oracle ACE Director • BattleAgainstAnyGuess.com • President, Oracle RAC SIG 2 © 2009/2010 Pythian
  • 3. Why Companies Trust Pythian • Recognized Leader: • Global industry-leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and SQL Server • Work with over 150 multinational companies such as Western Union, Fox Interactive Media, and MDS Inc. to help manage their complex IT deployments • Expertise: • One of the world’s largest concentrations of dedicated, full-time DBA expertise. • Global Reach & Scalability: • 24/7/365 global remote support for DBA and consulting, systems administration, special projects or emergency response 3 8 © 2011 Pythian
  • 4. ASM Striping • Extents and Allocation Units • Coarse striping & Fine striping • Random striping -> equal distribution • You cannot disable ASM striping 4 © 2009/2010 Pythian
  • 5. ASM Striping • Extents and Allocation Units • Coarse striping & Fine striping • Random striping -> equal distribution • You cannot disable ASM striping 4 © 2009/2010 Pythian
  • 6. ASM Striping • Extents and Allocation Units • Coarse striping & Fine striping • Random striping -> equal distribution • You cannot disable ASM striping 4 © 2009/2010 Pythian
  • 7. ASM Striping • Extents and Allocation Units • Coarse striping & Fine striping • Random striping -> equal distribution • You cannot disable ASM striping 4 © 2009/2010 Pythian
  • 8. ASM Striping • Extents and Allocation Units • Coarse striping & Fine striping • Random striping -> equal distribution • You cannot disable ASM striping 4 © 2009/2010 Pythian
  • 9. ASM Striping • Extents and Allocation Units • Coarse striping & Fine striping • Random striping -> equal distribution • You cannot disable ASM striping 4 © 2009/2010 Pythian
  • 10. ASM Striping • Extents and Allocation Units • Coarse striping & Fine striping • Random striping -> equal distribution • You cannot disable ASM striping file1 4 © 2009/2010 Pythian
  • 11. ASM Striping • Extents and Allocation Units • Coarse striping & Fine striping • Random striping -> equal distribution • You cannot disable ASM striping file1 file2 4 © 2009/2010 Pythian
  • 12. ASM Rebalancing • asm_power_limit - rebalancing speed • Async rebalancing in 11.2.0.2 • No auto-magic re-layout based on performance • You can force rebalancing for a DG 5 © 2009/2010 Pythian
  • 13. ASM Rebalancing • asm_power_limit - rebalancing speed • Async rebalancing in 11.2.0.2 • No auto-magic re-layout based on performance • You can force rebalancing for a DG 5 © 2009/2010 Pythian
  • 14. ASM Rebalancing • asm_power_limit - rebalancing speed • Async rebalancing in 11.2.0.2 • No auto-magic re-layout based on performance • You can force rebalancing for a DG 5 © 2009/2010 Pythian
  • 15. ASM Rebalancing • asm_power_limit - rebalancing speed • Async rebalancing in 11.2.0.2 • No auto-magic re-layout based on performance • You can force rebalancing for a DG 5 © 2009/2010 Pythian
  • 16. ASM Rebalancing • asm_power_limit - rebalancing speed • Async rebalancing in 11.2.0.2 • No auto-magic re-layout based on performance • You can force rebalancing for a DG 5 © 2009/2010 Pythian
  • 17. ASM Rebalancing • asm_power_limit - rebalancing speed • Async rebalancing in 11.2.0.2 • No auto-magic re-layout based on performance • You can force rebalancing for a DG 5 © 2009/2010 Pythian
  • 18. ASM Mirroring • Primary extent & secondary extent (mirrored copy) • Write is done to all extent copies • Read is done always from primary extent by default 6 © 2009/2010 Pythian
  • 19. ASM Mirroring • Primary extent & secondary extent (mirrored copy) • Write is done to all extent copies • Read is done always from primary extent by default 6 © 2009/2010 Pythian
  • 20. ASM Mirroring • Primary extent & secondary extent (mirrored copy) • Write is done to all extent copies • Read is done always from primary extent by default 6 © 2009/2010 Pythian
  • 21. ASM Mirroring • Primary extent & secondary extent (mirrored copy) • Write is done to all extent copies • Read is done always from primary extent by default 6 © 2009/2010 Pythian
  • 22. ASM Mirroring • Primary extent & secondary extent (mirrored copy) • Write is done to all extent copies • Read is done always from primary extent by default 6 © 2009/2010 Pythian
  • 23. ASM Mirroring • Primary extent & secondary extent (mirrored copy) • Write is done to all extent copies • Read is done always from primary extent by default 6 © 2009/2010 Pythian
  • 24. ASM Mirroring • Primary extent & secondary extent (mirrored copy) • Write is done to all extent copies • Read is done always from primary extent by default 6 © 2009/2010 Pythian
  • 25. ASM Mirroring • Primary extent & secondary extent (mirrored copy) • Write is done to all extent copies • Read is done always from primary extent by default 6 © 2009/2010 Pythian
  • 26. ASM Mirroring • Primary extent & secondary extent (mirrored copy) • Write is done to all extent copies • Read is done always from primary extent by default 6 © 2009/2010 Pythian
  • 27. ASM Failure Groups • Group of disks can fail at once • Mirror extents between failure groups • Beware of space provisioning SAN 1 SAN 2 7 © 2009/2010 Pythian
  • 28. ASM Failure Groups • Group of disks can fail at once • Mirror extents between failure groups • Beware of space provisioning SAN 1 7 © 2009/2010 Pythian
  • 29. ASM Failure Groups • Group of disks can fail at once • Mirror extents between failure groups • Beware of space provisioning SAN 1 7 © 2009/2010 Pythian
  • 30. ASM Redundancy Levels • Perdiskgroup or per file • DG type external • per file - unprotected only • DG type normal redundancy • per file - unprotected, two-way or three-way • DG type high redundancy • per file - three way only 8 © 2009/2010 Pythian
  • 31. Read IO Failures • Can re-read mirror block since 7.3 (initially done for Veritas) • Same happens with ASM - attempt to re-read from another mirror • If successful, repair is done for all mirrors • H/w RAID mirroring doesn’t give that flexibility • Re-read is done by database blindly hopping for the best • Read failures in the database are handled depending on context • ASM can recover not only from media failure errors but from corruptions (bad checksum or wrong SCN) 9 © 2009/2010 Pythian
  • 32. Read IO Failure Remapping • Read from secondary extent is performed • Write back to the *same* place is attempted • Disk might do its own block reallocation • If the write to the same location fails then extent relocated and original AU marked as unusable • If the second write fails, then disk set OFFLINE like on any other write failure • Fix occurs only if a reading process can lock that extent • REMAP command - doesn’t detect block corruption but only read media failure 10 © 2009/2010 Pythian
  • 33. Silent Corruptions of Secondary Extents • Reads are first attempted from primary extent • Secondary extent is accessed if primary read fails • preferred_read_failure_groups can cause the same issue when some mirror extents are rarely read • Automatic remap is done when failure is detected • ASMCMD REMAP command forces read of an extent so if read media error is produced, remapping happens • REMAP is used best alongside of disk scrubbing features • REMAP doesn’t detect mirrors inconsistencies or logical block corruptions! • AMDU utility - Google for “Luca Canali AMDU” 11 © 2009/2010 Pythian
  • 34. Partial Writes • ASM doesn’t use DRL (Dirty Region Logging) • So what happens if one mirror is done and second mirror write didn’t complete during a crash? • Oracle database has an ability to recover corrupted blocks • Oracle database reads both mirrors if corruption is detected and if one of the mirrors is good, repair happens 12 © 2009/2010 Pythian
  • 35. Disk Failure • Diskis taken offline on write error - not on read error • On disk failure, ASM reads headers of disks in FG • Whole FG is dismounted rather then individual disks • Multiple FG failures -> dismount DG • ASM also probes partners when disk fails trying to identify failure group pathology • If disks is lost while one of its partners is offline, DG is dismounted • Read failure from disk header -> ASM takes disk offline 13 © 2009/2010 Pythian
  • 36. What is MTBF? • Mean Time Before Failure (practically MTTF) • MTBF is inverse of the failure rate during useful life • Thus MTBF is indicator of failure rate but does not predict useful lifetime • Assuming failures have exponential distribution • λ = 1-exp(-t/MTBF) • Annualized Failure Rate (AFR) is λ calculated for time t = 1 year • MTBF of 1,600,000 hours is 182+ years • Doesn’t mean the drive will likely work 182 years! • AFR = 1 - exp(24 * 365 / 1,600,000) = 0.546% 14 © 2009/2010 Pythian
  • 37. “Bathtub” Failure Rate Curve 15 © 2009/2010 Pythian
  • 38. Real AFR & Utilization (ref Google study) 3 months infant mortality: 10% AFR Assuming exponential distribution, hourly failure rate = 0.001% Poisson process: - exponential distribution - no failure correlation 16 © 2009/2010 Pythian
  • 39. Hardware Mirroring RAID and 2nd Disk Failure • Data loss happens only if 2nd failed disk is the mirror of the 1st failed disk. • Window of vulnerability = 5 hours • Time to replace a failed disk • The chance that 2nd mirror fails in the next 5 hours is 0.005% resulting in data loss (based on Google’s data of 10% AFR) 17 © 2009/2010 Pythian
  • 40. Hardware Mirroring RAID and 2nd Disk Failure • Data loss happens only if 2nd failed disk is the mirror of the 1st failed disk. • Window of vulnerability = 5 hours • Time to replace a failed disk • The chance that 2nd mirror fails in the next 5 hours is 0.005% resulting in data loss (based on Google’s data of 10% AFR) 17 © 2009/2010 Pythian
  • 41. Hardware Mirroring RAID and 2nd Disk Failure • Data loss happens only if 2nd failed disk is the mirror of the 1st failed disk. • Window of vulnerability = 5 hours • Time to replace a failed disk • The chance that 2nd mirror fails in the next 5 hours is 0.005% resulting in data loss (based on Google’s data of 10% AFR) 17 © 2009/2010 Pythian
  • 42. ASM Mirroring and 2nd Disk Failure 18 © 2009/2010 Pythian
  • 43. ASM Mirroring and 2nd Disk Failure What if mirrored extent is placed randomly on other disks? 18 © 2009/2010 Pythian
  • 44. ASM Mirroring and 2nd Disk Failure What if mirrored extent is placed randomly on other disks? ... any 2nd disk failure would result in data loss 18 © 2009/2010 Pythian
  • 45. ASM Mirroring and 2nd Disk Failure What if mirrored extent is placed randomly on other disks? ... any 2nd disk failure would result in data loss Think Exadata - 168 disks and one fails... then 2nd failure of any of 156 disks will result in data loss 18 © 2009/2010 Pythian
  • 46. ASM Mirroring and 2nd Disk Failure What if mirrored extent is placed randomly on other disks? ... any 2nd disk failure would result in data loss Think Exadata - 168 disks and one fails... then 2nd failure of any of 156 disks will result in data loss 5h window of vulnerability 18 © 2009/2010 Pythian
  • 47. ASM Mirroring and 2nd Disk Failure What if mirrored extent is placed randomly on other disks? ... any 2nd disk failure would result in data loss Think Exadata - 168 disks and one fails... then 2nd failure of any of 156 disks will result in data loss 5h window of vulnerability Chance of data loss = 0.777% 18 © 2009/2010 Pythian
  • 48. ASM Mirroring and 2nd Disk Failure What if mirrored extent is placed randomly on other disks? ... any 2nd disk failure would result in data loss Think Exadata - 168 disks and one fails... then 2nd failure of any of 156 disks will result in data loss 5h window of vulnerability Chance of data loss = 0.777% 18 © 2009/2010 Pythian
  • 49. ASM Partnering 19 © 2009/2010 Pythian
  • 50. ASM Partnering Each disk is assigned several partner disks (8 partners in 11.2.0.1+ ) _asm_partner_target_disk_part 19 © 2009/2010 Pythian
  • 51. ASM Partnering Each disk is assigned several partner disks (8 partners in 11.2.0.1+ ) _asm_partner_target_disk_part Now only one of 8 partner disk failures would cause data loss - i.e. only 0.04% chance of data loss during 5 hours windows of vulnerability after the first disk failure 19 © 2009/2010 Pythian
  • 52. ASM Partnering Each disk is assigned several partner disks (8 partners in 11.2.0.1+ ) _asm_partner_target_disk_part Now only one of 8 partner disk failures would cause data loss - i.e. only 0.04% chance of data loss during 5 hours windows of vulnerability after the first disk failure Above is for Google’s 10% AFR. 19 © 2009/2010 Pythian
  • 53. ASM Partnering Each disk is assigned several partner disks For manufacturer’s MTBF 1,600,000 hours (large partners in (8 diskgroups): 11.2.0.1+ •AFR = 0.546% _asm_partner_target_disk_part) •data loss chance is 0.002% after first failure, during 5 hours recovery time Now only one of 8 partner disk failures would cause data loss Assumption: disk failures follow Poisson process! - i.e. only 0.04% chance of data loss during 5 hours windows of vulnerability after the first disk failure Above is for Google’s 10% AFR. 19 © 2009/2010 Pythian
  • 54. Disk Failures in Real Life - a non-Poisson Process • Failures are correlated in time • Manufacturing defects affecting a batch • Environmental (temperature) • Operational (power surge) • Software bugs • Non-exponential distribution in time 20 © 2009/2010 Pythian
  • 55. Consider a group of n disks all coming from the same production batch. We will consider two distinct failure processes: 1. Each disk will be subject to independent failures that will be exponentially distributed with rate λ; these independent failures are the ones that are normally considered in reliability studies. 2. The whole batch will be subject to the unpredictable manifestation of a common defect. This event will be exponentially distributed with rate λ' << λ. It will not result in the immediate failure of any disk but will accelerate disk failures and make them happen at a rate λ'' >> λ. 21 © 2009/2010 Pythian
  • 56. Probability of Surviving a Data Loss After a Failure P surv = exp(–nλTR) n - # of partners for a failed ASM disk (8) λ - rate of failure TR - window of vulnerability (5 hours) (normal redundancy) 22 © 2009/2010 Pythian
  • 57. Random Failure (No Global Batch Defect Manifestations) P surv = exp(–nλTR) λ - “normal” rate of failure (or 1/MTBF) MTBF = 1000000 hours: P surv = 99.996% Google’s AFR of 10%: P surv = 99.954% (normal redundancy) 23 © 2009/2010 Pythian
  • 58. Random Failure (No Global Batch Defect Manifestations) P surv = exp(–nλTR) λ - “normal” rate of failure (or 1/MTBF) MTBF = 1000000 hours: P surv = 99.996% Google’s AFR of 10%: P surv = 99.954% (normal redundancy) 23 © 2009/2010 Pythian
  • 59. After a Failure Caused by a Global Defect P surv = exp(–nλ’’TR) λ’’ - accelerated rate of failure λ’’ is one failure per week: P surv = 78.813% λ’’ is one failure per month: P surv = 94.596% (normal redundancy) 24 © 2009/2010 Pythian
  • 60. After a Failure Caused by a Global Defect P surv = (1+nλ’’TR)exp(–nλ’’TR) λ’’ - accelerated rate of failure n - 5 hours λ’’ is one failure per week: P surv = 97.58% λ’’ is one failure per month: P surv = 99.85% (high redundancy) 25 © 2009/2010 Pythian
  • 61. After a Failure Caused by a Global Defect P surv = (1+nλ’’TR)exp(–nλ’’TR) λ’’ - accelerated rate of failure n - 1 hour λ’’ is one failure per week: P surv = 99.89% λ’’ is one failure per month: P surv = 99.99% (high redundancy) 26 © 2009/2010 Pythian
  • 62. DEMO -- target # of partners select ksppstvl from sys.x$ksppi p, sys.x$ksppcv v where p.indx=v.indx and ksppinm='_asm_partner_target_disk_part'; -- disk partners select d.path, p.number_kfdpartner from x$kfdpartner p, v$asm_disk_stat d where p.disk=&disk_no and p.grp=group_number and p.number_kfdpartner=d.disk_number; 27 © 2009/2010 Pythian
  • 63. Disk Failure Recovery - ASM vs RAID mirroring • Ask the audience! 28 © 2009/2010 Pythian
  • 64. Q&A Email me - gorbachev@pythian.com Read my blog - http://www.pythian.com Follow me on Twitter - @AlexGorbachev Join Pythian fan club on Facebook & LinkedIn 29 © 2009/2010 Pythian