Agenda
     Solutions for Disaster Recovery
     Mailbox Server High Availability
     CCR and SCR: Better Together
     Why CCR? Why not SCC?
     Continuous Replication Demystified
Solutions for Disaster Recovery
 Deleted Item Retention – default 14 days
 Deleted Mailbox Retention – default 30 days
 Mailbox Service and Data Recovery
     Server Recovery
        Setup /m:RecoverServer
        Setup /recoverCMS
     Database portability
     Dial tone portability
     Continuous replication
     Backup and Restore
        Legacy streaming ESE backups
        Volume Shadow Copy Service (VSS) backups
        Recovery Storage Groups, alternate restores
 Edge Transport Server Cloned Configuration
Solutions for Disaster Recovery
 Augment built-in solutions with other processes
    Configuration Management
      Server build standardization
      Server build documentation
    Change management
    Release management
    Proactive monitoring
    Detailed recovery plans
    Regular integrity checks
    Regular practice drills
Server Recovery
     Setup /m:RecoverServer
        All roles except Edge Transport
           Edge Transport: fresh install, then ImportEdgeConfig
        All custom settings on Client Access server must be recreated
        Restrictions: Can’t use this for…
           repairing a failed setup
           migrating between different operating systems
           recovering or un-clustering a clustered mailbox server
     Setup /recoverCMS
        For CCR and SCC only
        Restrictions: Can’t use this for…
           changing from CCR to SCC or vice versa
           migrating between different operating systems
           clustering a standalone Mailbox server
           splitting or merging clustered Exchange environments
        Does not trigger Transport Dumpster redelivery
        Windows Server 2003 clustering has a dependency on the PDC Emulator
Data Recovery
     Switch to a replicated copy (Activation)
        Passive copy (LCR/CCR)
        Target copy (SCR)
     Restore from backup
        Same server
        Database portability on alternate server
          Database portability from Windows Server 2003 to Windows Server 2008 has an initial performance impact
        Dial tone and data merge using RSG
Mailbox Server High Availability
 Built-in features for various levels of availability
    Local Continuous Replication (LCR) – data availability
    Single Copy Cluster (SCC) – service availability
    Cluster Continuous Replication (CCR) – data and service availability
    Standby Continuous Replication (SCR) – disaster recovery and site resilience
Mailbox Server High Availability
            Local Continuous Replication (LCR)
Mailbox Server High Availability
               Single Copy Cluster (SCC)
Mailbox Server High Availability
              Cluster Continuous Replication (CCR)
Standby Continuous Replication
          SCR sources
             CCR
             SCC
             Standalone Mailbox server
          SCR targets
             Standalone Mailbox server (without LCR)
             Standby cluster with passive Mailbox role
CCR and SCR: Better Together
  CCR provides high availability for Mailbox data and services within the datacenter
  SCR replicates data remotely to provide site resilience for the Mailbox data
        [Diagram: Datacenter A, Datacenter B]
CCR across 2 Sites
        [Diagram: Datacenter A, Datacenter B]
CCR local / SCR to remote Site
        [Diagram: Datacenter A, Datacenter B]
CCR/SCR vs SCC/Sync – 2 sites
        [Diagram: CCR/SCR compared with SCC plus synchronous replication across Datacenter A and Datacenter B]
        CCR: physical corruption in the logs is detected immediately on replication at both targets;
             recover with Setup /recoverCMS and play logs forward
        SCC (with clones, VSS backups, and synchronous replication): physical corruption can be
             replicated to the remote copy and remain undetected (e.g., 1 month later)
             Recovery is through Exchange Disaster Recovery or 3rd-party failover
             On a full storage failure or site failure in the primary site, if the corruption was not
             detected and corrected through a test failover, you must recover from backup
Why CCR? Why not SCC?
     Single point of failure
        CCR: None when stretched across sites or combined with SCR for site resiliency
        SCC: Data, storage, and site single points of failure
             Potential for massive data loss on a single failure:
               • Storage device failures can lose collocated backups
               • Hardware replication can propagate physical errors
               • Storage failure requires activation of the remote copy, if one exists
               • Requires two VSS clones plus a remote copy of the data to achieve an RPO equal to CCR
     Simplicity
        CCR: Simple setup
               • No special storage configuration
             Built-in site resilience
             Same technology and redundancy model for intra- and inter-site protection
        SCC: Shared storage
             Storage configuration before and after forming the cluster
             Complex storage stack
             Complex deployment to get the RTO/RPO of 1 CCR cluster
Why CCR? Why not SCC?
     Backups
        CCR: Backups off the passive copy eliminate or reduce the backup window
        SCC: Backups must be taken off the active copy
     TCO
        CCR: Reduced TCO
               • Cheaper hardware
               • No special storage expertise required
               • In-the-box solution
               • Integrated management
               • Single operations team
               • Reduced backup cost
        SCC: Higher TCO
               • Additional products needed to achieve equivalent combined RTO/RPO
               • Separate management tools for HA operations may be required
               • Higher-end servers and storage required
               • Storage expertise needed
     Large mailboxes
        CCR: Great RTO/RPO, simplicity, no maintenance window, and reduced TCO → improved support for larger mailboxes
        SCC: Higher TCO and long recovery times constrain mailbox size
Why CCR? Why not SCC?
     Configurations compared
        CCR: Stretched CCR or CCR + SCR
        SCC: SCC + SCR/3rd-party replication + 2 VSS clones, to approach the combined RTO/RPO of 1 CCR cluster
     RTO by failure
        Server:                     CCR ~2 minutes; SCC ~2 minutes
        Data or LUN:                CCR ~2 minutes; SCC 15 minutes to 1 hour
        Full storage:               CCR ~2 minutes; SCC ~15 minutes with synchronous replication, days with VSS clones only
        Site:                       CCR ~2 minutes for stretched CCR, 30-60 minutes for CCR + SCR;
                                    SCC ~15 minutes with synchronous replication, days with VSS clones only
     RPO by failure
        Server:                     CCR 0 for mail*; SCC 0 (uses the same copy of data)
        Physical corruption, DB:    CCR 0; SCC hours to days with synchronous replication, point in time with VSS
        Physical corruption, logs:  CCR 0 (must reseed passive); SCC N/A if the log is not needed, same as DB if needed
        DB LUN dies:                CCR 0; SCC 0 with synchronous replication, point in time with VSS clones
        Log LUN dies:               CCR 0 for mail*; SCC 0 with synchronous replication, point in time with VSS clones
        Full storage:               CCR 0 for mail*; SCC 0 with synchronous replication, hours to days with VSS clones only
        Site:                       CCR same as Server for stretched CCR, 1 log** for CCR + SCR;
                                    SCC 0 with synchronous replication, hours to days with VSS clones
     * Assumes Transport Dumpster best-practice guidance is followed; appointment, contact, task, and draft items are not covered
     ** Assumes replication is keeping up
Why CCR? Why not SCC?
     Logical corruption
        Corruption caused by the application
        Logical corruption is replicated by all replication solutions
        SCR with lag replay can mitigate it if detected early
     Physical corruption
        SCC: no mechanism to detect database corruption on the copy replicated by 3rd-party solutions
        SCC: no mechanism to detect log corruption on the copy replicated by 3rd-party solutions
        With hardware-based replication, the deeper stack can lead to corruption caused by:
           HBA driver/firmware
           Multi-path driver
           Server hardware
           FC switch firmware
           Storage controller firmware/OS
           Target storage controller firmware/OS
Basic Replication Pipeline
      [Diagram: Store → Source DB and Source Log Directory → Log Copier → Inspector Directory → Log Inspector → Replica Log Directory → Log Replayer → Target DB]
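The ordering constraint in this pipeline can be modeled in a few lines. This is a minimal Python sketch, not the Replication service's implementation: `LogFile`, `Replica`, `copy_logs`, `inspect_logs`, and `replay_logs` are hypothetical names, and `checksum_ok` stands in for whatever validation the Log Inspector really performs. It shows that a log is replayed only after it has been copied and inspected, and that replay proceeds strictly in generation order.

```python
from dataclasses import dataclass, field


@dataclass
class LogFile:
    generation: int
    checksum_ok: bool = True  # hypothetical stand-in for the Log Inspector's checks


@dataclass
class Replica:
    inspector_dir: list = field(default_factory=list)    # copied, not yet inspected
    replica_log_dir: list = field(default_factory=list)  # inspected, ready for replay
    replayed_through: int = 0                            # highest generation applied to the target DB


def copy_logs(source_log_dir, replica):
    """Log Copier: pull closed logs the target does not already have."""
    seen = {log.generation for log in replica.inspector_dir + replica.replica_log_dir}
    for log in source_log_dir:
        if log.generation not in seen:
            replica.inspector_dir.append(log)


def inspect_logs(replica):
    """Log Inspector: move valid logs to the replica log directory; stop on a bad one."""
    while replica.inspector_dir:
        log = replica.inspector_dir[0]
        if not log.checksum_ok:
            raise ValueError(f"log generation {log.generation} failed inspection")
        replica.replica_log_dir.append(replica.inspector_dir.pop(0))


def replay_logs(replica):
    """Log Replayer: apply logs in generation order; a gap stops replay."""
    for log in sorted(replica.replica_log_dir, key=lambda entry: entry.generation):
        if log.generation <= replica.replayed_through:
            continue  # already applied
        if log.generation != replica.replayed_through + 1:
            break  # missing generation: wait for it to be copied
        replica.replayed_through = log.generation


source = [LogFile(g) for g in range(1, 6)]
target = Replica()
copy_logs(source, target)
inspect_logs(target)
replay_logs(target)
print(target.replayed_through)  # 5
```

Raising on a failed inspection mirrors the later point that ESE cannot skip over a log: replay simply cannot advance past a bad or missing generation.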
Continuous Replication Basics
      When the current log file is closed, the Replication service copies it to the replication target
      Replication service
         At the source: creates read-only shares for the log directory
         At the target: reads from the shares and pulls a copy of the log file
         Contains a ReplicaInstance for each storage group
           Configuration discovered from Active Directory (every 30 seconds for LCR/CCR, every 3 minutes for SCR)
Continuous Replication Basics
      Communication is done via logs, the registry, the cluster database, and RPC
         Logs: replicate database changes and backup status
         Registry: used in LCR and SCR; also used in CCR to checkpoint the current log generation value for loss calculation
         Cluster database: cluster res "Exchange Information Store Instance (CMSName)" /priv | findstr /i replay
         RPCs: the target Replication service makes RPCs into the Store for log truncation coordination
Lost Log Resilience (LLR)
      Designed to minimize the need to reseed after a lossy failover
      Normally, database changes are written to the log file before the database, and the database can be updated as soon as the change is logged
      LLR modifies this behavior by delaying updates to the database until 1 or more further log generations have been created
      Uses a new log stream marker called the waypoint
         The minimum log required to prevent database divergence
         No modifications after the waypoint have been written to the database
Log Stream Markers
 Committed: Log generation 20
 Checkpoint: Log generation 2
 Waypoint: Log generation 10
 What this means:
    Only logs 2-10 are needed
    Logs 11-20 can be discarded
             Initiating FILE DUMP mode...
             Database: priv1.edb
             ...
             State: Dirty Shutdown
             Log Required: 2-10 (0x2-0xA)
             Log Committed: 0-20 (0x0-0x14)
             ...
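The marker arithmetic on this slide can be checked mechanically. A minimal Python sketch, assuming (as in the example above) that the low end of the "Log Required" range is the checkpoint and the high end is the waypoint; `parse_log_range` and `classify_logs` are hypothetical helpers that read the header text shown above, not part of eseutil.

```python
import re


def parse_log_range(dump_text, field_name):
    """Return (low, high) from a header line such as 'Log Required: 2-10 (0x2-0xA)'."""
    match = re.search(rf"{re.escape(field_name)}:\s*(\d+)-(\d+)", dump_text)
    if match is None:
        raise ValueError(f"{field_name!r} not found in header dump")
    return int(match.group(1)), int(match.group(2))


def classify_logs(checkpoint, waypoint, committed):
    """Split generations into (needed, discardable) using the LLR markers."""
    if not checkpoint <= waypoint <= committed:
        raise ValueError("expected checkpoint <= waypoint <= committed")
    needed = list(range(checkpoint, waypoint + 1))           # must be present to recover
    discardable = list(range(waypoint + 1, committed + 1))   # never applied to the DB yet
    return needed, discardable


header = """State: Dirty Shutdown
Log Required: 2-10 (0x2-0xA)
Log Committed: 0-20 (0x0-0x14)"""

checkpoint, waypoint = parse_log_range(header, "Log Required")
_, committed = parse_log_range(header, "Log Committed")
needed, discardable = classify_logs(checkpoint, waypoint, committed)
print(needed[0], needed[-1], discardable[0], discardable[-1])  # 2 10 11 20
```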
[Diagram: NodeA and NodeB each hold log generations 1-21; checkpoint at generation 4, waypoint at generation 12]
   Healthy CCR
   NodeA fails and a failover to NodeB occurs
   Validate that the database can mount: logs lost < AutoDatabaseMountDial
   Logs are generated on NodeB (beyond generation 21)
   NodeA recovers and performs a divergence check
   NodeA performs an incremental reseed and copies logs
   Healthy CCR
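The two decisions in this sequence, whether the new active copy may mount and whether the returning node can resynchronize incrementally, can be sketched as below. This is a simplified Python model, not Exchange code: `can_auto_mount` and `resync_plan` are hypothetical names, the lost-log thresholds for the AutoDatabaseMountDial values are quoted from memory of the Exchange 2007 documentation and should be verified, and the generation numbers in the example are made up. The reseed rule follows the waypoint definition: if every divergent log on the old active node lies after its waypoint, none of those changes reached its database file.

```python
# Assumed thresholds (verify against the Exchange 2007 documentation):
# maximum number of lost logs tolerated for an automatic mount.
MAX_LOST_LOGS = {"Lossless": 0, "GoodAvailability": 3, "BestAvailability": 6}


def can_auto_mount(last_generated, last_copied, dial="BestAvailability"):
    """True if the logs lost in the failover fit within the mount dial."""
    lost = last_generated - last_copied
    return lost <= MAX_LOST_LOGS[dial]


def resync_plan(first_divergent_generation, old_active_waypoint):
    """Decide how the old active node rejoins after its divergence check."""
    if first_divergent_generation > old_active_waypoint:
        # Divergent logs were never applied to the old database file:
        # discard them and copy the new active node's logs instead.
        return "incremental reseed"
    # The old database already contains changes from divergent logs.
    return "full reseed"


# Hypothetical case: NodeA wrote through generation 21, NodeB had copied through 18.
print(can_auto_mount(21, 18))              # True (3 logs lost)
print(can_auto_mount(21, 18, "Lossless"))  # False
print(resync_plan(19, 12))                 # incremental reseed
```

The `full reseed` branch corresponds to the "lost log past the current waypoint" case on the next slide.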
When Do I Need A Full Reseed?
      Rarely
         Lost log past the current waypoint
           Admin accepted a large amount of loss by running Restore-StorageGroupCopy
           Automatic mount while LLR was “not honored”
           Automatic lossy mount with a “stale” loss window calculation
         Log corruption prior to log replay
           ESE cannot skip over logs
         Database files modified outside of the Store or Replication service
           E.g., offline defrag, eseutil /r
Transport Dumpster
       Hub Transport servers retain messages that have been delivered to the destination mailbox until a size or time limit is reached
       The Transport Dumpster is per storage group, per Hub Transport server, for Hub Transport servers in the same Active Directory site as the storage group
       Transport Dumpster statistics:
     Get-StorageGroupCopyStatus -DumpsterStatistics
       Output:
           DumpsterServersNotAvailable:{HUB1}
           DumpsterStatistics:
                 {HUB2(2/25/2009 10:20:37 PM; 2 ; 1032KB)}
CCR CMS
      [Diagram: CCR clustered mailbox server with active node MBX1 and passive node MBX2, each hosting SG1 and SG2]
      HUB1 dumpster contents: SG1: Msg1; SG2: Msg1, Msg3
      HUB2 dumpster contents: SG1: Msg2, Msg4; SG2: Msg4
      Redelivery of SG1 and SG2 is requested from HUB1 and HUB2; a request that returns a timeout or Retry is retried until it succeeds
      Resubmit required (tracked per storage group): SG1: HUB1, HUB2; SG2: HUB1, HUB2
How much data loss can transport dumpster mitigate?
        18 MB dumpster per storage group on 8 Hub Transport servers = 144 MB per storage group
        [20 MB per user per 10-hour day] x [100 users per SG] = 200 MB of message traffic per hour
        Putting the two together gives
            60 min x 144 / 200 = 43.2 minutes' worth of data
            In 43.2 minutes, 144+ logs (1 MB each) are created per SG
     Customize the transport dumpster size/time limit:
     Set-TransportConfig -MaxDumpsterSizePerStorageGroup 30MB -MaxDumpsterTime 07.00:00:00
     No time window guarantees
        If there are no message size limits, a single large message (e.g., 15 MB) will purge all other messages for the destination storage group(s) on a given Hub Transport server
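The estimate above can be reproduced, and rerun with other numbers, with a small calculator. A minimal Python sketch: `dumpster_coverage` is a hypothetical helper, and it keeps the slide's simplifying assumptions, namely that traffic is spread evenly so every Hub Transport server holds a full dumpster for the storage group, and that logs are 1 MB (the Exchange 2007 log size).

```python
def dumpster_coverage(dumpster_mb_per_sg, hub_servers, mb_per_user_per_day,
                      hours_per_day, users_per_sg, log_size_mb=1.0):
    """Estimate how much message traffic the transport dumpster can cover.

    Assumes every Hub Transport server holds a full dumpster for the
    storage group (traffic spread evenly across servers).
    """
    total_dumpster_mb = dumpster_mb_per_sg * hub_servers
    traffic_mb_per_hour = mb_per_user_per_day / hours_per_day * users_per_sg
    minutes_covered = 60 * total_dumpster_mb / traffic_mb_per_hour
    # Rough lower bound: logs also carry non-message changes, hence "144+".
    min_logs = total_dumpster_mb / log_size_mb
    return total_dumpster_mb, traffic_mb_per_hour, minutes_covered, min_logs


total, traffic, minutes, logs = dumpster_coverage(18, 8, 20, 10, 100)
print(total, traffic, round(minutes, 1), logs)  # 144 200.0 43.2 144.0
```

Raising the per-storage-group limit to 30 MB, as in the Set-TransportConfig example, gives 240 MB across eight servers and roughly 72 minutes of coverage under the same assumptions.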
When CCR detects a lossy failover:
         Expands the loss window by 12 hours back and 4 hours forward
         Finds all Hub Transport servers in the local Active Directory site
         Requests transport dumpster redelivery from all detected servers
            New servers are not added to the redelivery list
         Inaccessible servers: CCR retries the same request every 30 seconds until the configured MaxDumpsterTime
         If multiple lossy failovers take place, the new loss window is added to the previous one
     Restore-StorageGroupCopy on LCR is a one-time request, with no retries
     Redelivery is not triggered as part of Setup /recoverCMS
     There is no other way to redeliver messages from the transport dumpster
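The loss-window rule can be expressed as a short function. A minimal Python sketch: `redelivery_window` is a hypothetical name and the timestamps are made up. Treating "the new loss window is added to the previous one" as a single span covering both windows is an assumption, not documented behavior.

```python
from datetime import datetime, timedelta

LOOK_BACK = timedelta(hours=12)    # loss window expanded 12 hours back
LOOK_FORWARD = timedelta(hours=4)  # ... and 4 hours forward


def redelivery_window(loss_start, loss_end, previous=None):
    """Return the (start, end) window to request from the transport dumpster.

    Merging with a previous window as one covering span is an assumption
    about how repeated lossy failovers are combined.
    """
    start = loss_start - LOOK_BACK
    end = loss_end + LOOK_FORWARD
    if previous is not None:
        start = min(start, previous[0])
        end = max(end, previous[1])
    return start, end


first = redelivery_window(datetime(2009, 2, 25, 22, 0), datetime(2009, 2, 25, 22, 5))
print(first[0].isoformat(), first[1].isoformat())  # 2009-02-25T10:00:00 2009-02-26T02:05:00
```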
Redundant Networks
 Use redundant networks for log shipping and seeding in CCR
         Enable-ContinuousReplicationHostName
         Seeding:
             Update-StorageGroupCopy -DataHostNames:Host1,Host2
         Get-ClusteredMailboxServerStatus
             OperationalReplicationHostNames:
             FailedReplicationHostNames:
             InUseReplicationHostNames:
         Watch out for a misconfigured hosts file
Circular Logging
      One configuration setting with two consumers
         Store service: requires the database to be dismounted and remounted for the change to take effect
         Replication service: picks up the new setting dynamically
      In CCR, switching between on/off/on is no big deal
      In some scenarios, logs are deleted prematurely
         Example: turn off circular logging, then enable LCR without dismounting and remounting the database
            ESE is still truncating logs with circular logging logic
            Logs get truncated before they reach the LCR copy
      To be safe, follow this recipe:
         Suspend, dismount, change setting, mount, resume
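Why the recipe matters can be shown with a toy model of the two consumers. A minimal Python sketch: `StorageGroupModel`, `logs_at_risk`, and `change_circular_logging_safely` are hypothetical, and the model does not call Exchange. The comments name the Exchange 2007 cmdlets each step would correspond to in practice.

```python
class StorageGroupModel:
    """Hypothetical model of the two consumers of the circular logging setting."""

    def __init__(self, circular_logging):
        self.configured = circular_logging       # stored setting, read live by Replication
        self.store_effective = circular_logging  # what ESE uses; refreshed only at mount
        self.mounted = True
        self.copy_suspended = False

    def set_circular_logging(self, enabled):
        # e.g. Set-StorageGroup -CircularLoggingEnabled; the Store keeps
        # using store_effective until the database is mounted again.
        self.configured = enabled

    def dismount(self):
        self.mounted = False

    def mount(self):
        self.mounted = True
        self.store_effective = self.configured

    def logs_at_risk(self):
        """True when ESE still truncates circularly although the setting is off."""
        return self.mounted and self.store_effective and not self.configured


def change_circular_logging_safely(sg, enabled):
    """Suspend, dismount, change setting, mount, resume."""
    sg.copy_suspended = True   # e.g. Suspend-StorageGroupCopy
    sg.dismount()              # e.g. Dismount-Database
    sg.set_circular_logging(enabled)
    sg.mount()                 # e.g. Mount-Database
    sg.copy_suspended = False  # e.g. Resume-StorageGroupCopy


unsafe = StorageGroupModel(circular_logging=True)
unsafe.set_circular_logging(False)  # changed without a dismount/mount
print(unsafe.logs_at_risk())        # True

safe = StorageGroupModel(circular_logging=True)
change_circular_logging_safely(safe, False)
print(safe.logs_at_risk())          # False
```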
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 

Último (20)

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 

Disaster Recovery and Mailbox High Availability Solutions

  • 2. Agenda
      - Solutions for Disaster Recovery
      - Mailbox Server High Availability
      - CCR and SCR: Better Together
      - Why CCR? Why not SCC?
      - Continuous Replication Demystified
  • 4. Solutions for Disaster Recovery
      - Deleted Item Retention (default 14 days)
      - Deleted Mailbox Retention (default 30 days)
      - Mailbox service and data recovery
          - Server recovery: Setup /m:RecoverServer, Setup /recoverCMS
          - Database portability
          - Dial tone portability
          - Continuous replication
          - Backup and restore: legacy streaming ESE backups; Volume Shadow Copy Service (VSS) backups; Recovery Storage Groups and alternate restores
      - Edge Transport server cloned configuration
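The two retention defaults above define how long a deleted item or mailbox can still be recovered without touching backups. A minimal sketch of that window, assuming the unmodified defaults (administrators can change both) and assuming the last day of the window is inclusive; the function names are hypothetical.

```python
from datetime import date, timedelta

# Defaults quoted on the slide. Both can be changed by an administrator,
# so treat these constants as assumptions about an unmodified deployment.
DELETED_ITEM_RETENTION_DAYS = 14
DELETED_MAILBOX_RETENTION_DAYS = 30


def last_recoverable_day(deleted_on: date, retention_days: int) -> date:
    """Last day on which a deleted item or mailbox is assumed to still be recoverable."""
    return deleted_on + timedelta(days=retention_days)


def is_recoverable(deleted_on: date, today: date, retention_days: int) -> bool:
    """True while `today` is inside the retention window (inclusive end is an assumption)."""
    return today <= last_recoverable_day(deleted_on, retention_days)


deleted = date(2009, 2, 1)
print(last_recoverable_day(deleted, DELETED_ITEM_RETENTION_DAYS))                  # 2009-02-15
print(is_recoverable(deleted, date(2009, 2, 20), DELETED_ITEM_RETENTION_DAYS))     # False
print(is_recoverable(deleted, date(2009, 2, 20), DELETED_MAILBOX_RETENTION_DAYS))  # True
```

The same deletion date can be past the item window yet still inside the mailbox window, which is why the two settings are listed separately.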
  • 5. Solutions for Disaster Recovery
      - Augment built-in solutions with other processes:
          - Configuration management (server build standardization, server build documentation)
          - Change management
          - Release management
          - Proactive monitoring
          - Detailed recovery plans
          - Regular integrity checks
          - Regular practice drills
  • 6. Server Recovery
      - Setup /m:RecoverServer
          - All roles except Edge (for Edge: fresh install and ImportEdgeConfig)
          - All custom settings on the Client Access server must be recreated
          - Restrictions: can't be used for repairing a failed setup; migrating between different operating systems; recovering or un-clustering a clustered mailbox server
      - Setup /recoverCMS
          - For CCR and SCC only
          - Restrictions: can't be used for changing from CCR to SCC or vice versa; migrating between different operating systems; clustering a standalone Mailbox server; splitting or merging clustered Exchange environments
          - Does not trigger the transport dumpster
          - Windows Server 2003 clustering has a dependency on the PDC Emulator
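The restrictions on the two Setup switches amount to a small decision table. A minimal sketch that encodes only a subset of them (OS migration, Edge, CCR/SCC only, cluster-type changes, failed setups); the `RebuildScenario` type and its fields are hypothetical, and splitting/merging clusters or clustering a standalone server are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RebuildScenario:
    """Hypothetical description of a server being rebuilt; field names are illustrative only."""
    role: str                            # e.g. "Mailbox", "ClientAccess", "HubTransport", "Edge"
    clustered: bool                      # True for a clustered mailbox server
    cluster_type: Optional[str] = None   # "CCR" or "SCC" when clustered
    same_os_as_before: bool = True       # neither switch migrates between operating systems
    repairing_failed_setup: bool = False
    changing_cluster_type: bool = False  # e.g. rebuilding a CCR server as SCC


def pick_recovery_switch(s: RebuildScenario) -> str:
    """Apply a subset of the slide's restrictions and name the Setup switch that fits, if any."""
    if not s.same_os_as_before:
        return "none: neither switch migrates between operating systems"
    if s.role == "Edge":
        return "none: do a fresh install, then ImportEdgeConfig"
    if s.clustered:
        if s.cluster_type not in ("CCR", "SCC"):
            return "none: /recoverCMS supports CCR and SCC only"
        if s.changing_cluster_type:
            return "none: /recoverCMS cannot change between CCR and SCC"
        return "Setup /recoverCMS"
    if s.repairing_failed_setup:
        return "none: /m:RecoverServer cannot repair a failed setup"
    if s.role == "ClientAccess":
        return "Setup /m:RecoverServer (then recreate custom Client Access settings)"
    return "Setup /m:RecoverServer"


print(pick_recovery_switch(RebuildScenario("Mailbox", clustered=True, cluster_type="CCR")))
```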
  • 7. Data Recovery
      - Switch to a replicated copy (activation)
          - Passive copy (LCR/CCR)
          - Target copy (SCR)
      - Restore from backup
          - Same server
          - Database portability on an alternate server (database portability from Windows Server 2003 to Windows Server 2008 has an initial performance impact)
      - Dial tone and data merge using the RSG
  • 9. Mailbox Server High Availability
      - Built-in features for various levels of availability:
          - Local Continuous Replication (LCR): data availability
          - Single Copy Cluster (SCC): service availability
          - Cluster Continuous Replication (CCR): data and service availability
          - Standby Continuous Replication (SCR): disaster recovery and site resilience
  • 10. Mailbox Server High Availability: Local Continuous Replication (LCR) (diagram)
  • 11. Mailbox Server High Availability: Single Copy Cluster (SCC) (diagram)
  • 12. Mailbox Server High Availability: Cluster Continuous Replication (CCR) (diagram)
  • 13. Standby Continuous Replication
      - SCR sources: standalone Mailbox server, CCR, SCC
      - SCR targets: standalone Mailbox server (without LCR), standby cluster with passive Mailbox role
  • 15. CCR and SCR: Better Together (diagram: Datacenter A and Datacenter B)
      - CCR provides high availability for Mailbox data and services within the datacenter
      - SCR replicates data remotely to provide site resilience for the Mailbox data
  • 16. CCR across 2 sites (diagram: Datacenter A and Datacenter B)
  • 17. CCR local / SCR to remote site (diagram: Datacenter A and Datacenter B)
  • 18. CCR/SCR vs SCC/Sync: 2 sites (diagram: Datacenter A and Datacenter B)
      - CCR: physical log corruption is detected immediately on replication at both targets; recover with Setup /recoverCMS and play logs forward
      - SCC with 3rd-party synchronous replication: physical corruption can remain undetected (e.g., still undetected 1 month later). On a storage or site failure in the primary site, Exchange disaster recovery or 3rd-party failover is required, and if the corruption was not detected and corrected by a test failover, you must recover from backup (VSS clones)
  • 20. Why CCR? Why not SCC?
      |  | CCR | SCC |
      |---|---|---|
      | Single point of failure | None when stretched across sites or combined with SCR for site resiliency | Data, storage, and site single points of failure. Potential for massive data loss on a single failure: storage device failures can lose collocated backups; hardware replication can propagate physical errors; storage failure requires activation of the remote copy, if one exists; two VSS clones plus a remote copy of the data are required to achieve an RPO equal to CCR |
      | Simplicity | Simple setup; no special storage configuration; built-in site resilience; same technology and redundancy model for intra-cluster and inter-site protection | Shared storage; storage configuration before and after forming the cluster; complex storage stack; complex deployment to get the RTO/RPO of 1 CCR |
  • 21. Why CCR? Why not SCC?
      |  | CCR | SCC |
      |---|---|---|
      | Backups | Backups off the passive copy eliminate or reduce the backup window | Backups must be taken off the active copy |
      | TCO | Reduced TCO: cheaper hardware; no special storage expertise required; in-the-box solution; integrated management; single operations team; reduced backup cost | Higher TCO: additional products needed to achieve equivalent combined RTO/RPO; separate management tools for HA operations may be required; higher-end servers and storage required; storage expertise needed |
      | Large mailboxes | Great RTO/RPO, simplicity, no maintenance window, and reduced TCO → improved support for larger mailboxes | Higher TCO and long recovery times constrain mailbox size |
  • 22. Why CCR? Why not SCC?
      |  | Failure | CCR (stretched CCR or CCR + SCR) | SCC (SCC + SCR/3rd-party replication + 2 VSS clones, to approach the combined RTO/RPO of 1 CCR cluster) |
      |---|---|---|---|
      | RTO | Server | ~2 minutes | ~2 minutes |
      | RTO | Data or LUN | ~2 minutes | 15 minutes to 1 hour |
      | RTO | Full storage | ~2 minutes | ~15 minutes with synchronous replication; days with VSS clones only |
      | RTO | Site | ~2 minutes for stretched CCR; 30-60 minutes for CCR + SCR | ~15 minutes with synchronous replication; days with VSS clones only |
      | RPO | Server | 0 for mail*; appointments, contacts, tasks, and drafts can be lost | 0 (uses the same copy of data) |
      | RPO | Physical DB corruption | 0 | Hours to days with synchronous replication; point in time with VSS |
      | RPO | Corrupt logs | 0 (must reseed passive) | N/A if the log is not needed; same as DB if needed |
      | RPO | DB LUN dies | 0 | 0 with synchronous replication; point in time with VSS clones |
      | RPO | Log LUN dies | 0 for mail*; appointments, contacts, tasks, and drafts can be lost | 0 with synchronous replication; point in time with VSS clones |
      | RPO | Full storage | 0 for mail*; appointments, contacts, tasks, and drafts can be lost | 0 with synchronous replication; hours to days with VSS clones only |
      | RPO | Site | Same as Server for stretched CCR; 1 log** for CCR + SCR | 0 with synchronous replication; hours to days with VSS clones |
      * Assumes best-practice guidance for the transport dumpster is followed
      ** Assumes replication is keeping up
  • 23. Why CCR? Why not SCC?
      - Logical corruption (corruption caused by the application)
          - Logical corruption is replicated by all replication solutions
          - SCR with lagged replay can mitigate it if the corruption is detected early
      - SCC: no mechanism to detect database corruption on the copy replicated by 3rd-party solutions
      - SCC: no mechanism to detect log corruption on the copy replicated by 3rd-party solutions
      - Physical corruption: with hardware-based replication, a deeper stack can lead to corruption caused by:
          - HBA driver/firmware
          - Multipath driver
          - Server hardware
          - FC switch firmware
          - Storage controller firmware/OS
          - Target storage controller firmware/OS
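Lagged replay helps against logical corruption because logs reach the target immediately but are applied only later, leaving time to stop before a bad log is replayed. A minimal toy model, assuming the lag is measured in abstract ticks rather than SCR's real lag setting; `LaggedReplica` and `quarantined_from` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple


@dataclass
class LaggedReplica:
    """Toy lagged copy: logs are copied at once, but each is replayed only `lag` ticks later."""
    lag: int
    pending: Deque[Tuple[int, int]] = field(default_factory=deque)  # (arrival tick, generation)
    replayed: List[int] = field(default_factory=list)
    quarantined_from: Optional[int] = None  # first generation known to carry logical corruption

    def receive(self, tick: int, generation: int) -> None:
        self.pending.append((tick, generation))

    def replay_due(self, now: int) -> None:
        # Replay in order every log whose lag has expired, stopping before a known-bad generation.
        while self.pending and now - self.pending[0][0] >= self.lag:
            generation = self.pending[0][1]
            if self.quarantined_from is not None and generation >= self.quarantined_from:
                break
            self.pending.popleft()
            self.replayed.append(generation)


replica = LaggedReplica(lag=5)
for tick in range(1, 11):          # generations 1..10 arrive at ticks 1..10
    replica.receive(tick, tick)
    replica.replay_due(tick)
print(replica.replayed)            # [1, 2, 3, 4, 5]
replica.quarantined_from = 8       # corruption in generation 8 is noticed within the lag
replica.replay_due(100)
print(replica.replayed)            # [1, 2, 3, 4, 5, 6, 7]
```

Without a lag (`lag=0`), generation 8 would already have been replayed by the time anyone noticed, which is the "replicated by all replication solutions" case.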
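  • 25. Basic Replication Pipeline (diagram)
      - Source: the Store writes to the source database and the source log directory
      - Log Copier copies closed logs into the inspector log directory
      - Log Inspector validates each log and moves it to the replica log directory
      - Log Replayer replays the logs into the target database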
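The copier, inspector, and replayer stages can be sketched as one loop in which a log is replayed only after it passes inspection and is recopied when it fails. This is a toy model under stated assumptions: CRC32 stands in for the real log checksum, `stream_id` is a hypothetical stand-in for "is this log for this log stream?", and the generation check is an added assumption.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LogFile:
    generation: int
    stream_id: str   # hypothetical identifier tying the log to one storage group's log stream
    payload: bytes
    checksum: int    # checksum recorded when the log was closed at the source


def close_log(generation: int, stream_id: str, payload: bytes) -> LogFile:
    """Source side: the Store closes a log and records its checksum."""
    return LogFile(generation, stream_id, payload, zlib.crc32(payload))


def inspect(log: LogFile, stream_id: str, generation: int) -> bool:
    """Inspector: checksum matches, the log belongs to this stream, and it is the expected generation."""
    return (zlib.crc32(log.payload) == log.checksum
            and log.stream_id == stream_id
            and log.generation == generation)


def replicate(copy_log: Callable[[int], LogFile], generations: List[int],
              stream_id: str, max_attempts: int = 3) -> List[int]:
    """Copier -> Inspector -> Replayer for each closed log, recopying when inspection fails."""
    replayed: List[int] = []
    for gen in generations:
        for _ in range(max_attempts):
            log = copy_log(gen)                  # Log Copier pulls the closed log from the source
            if inspect(log, stream_id, gen):     # Log Inspector validates it before replay
                replayed.append(gen)             # Log Replayer would apply it to the target DB here
                break
        else:
            raise RuntimeError(f"log {gen} failed inspection {max_attempts} times")
    return replayed


attempts = {}


def flaky_copy(gen: int) -> LogFile:
    attempts[gen] = attempts.get(gen, 0) + 1
    log = close_log(gen, "SG1", b"changes-%d" % gen)
    if gen == 2 and attempts[gen] == 1:
        log.payload = b"damaged in transit"      # first copy of generation 2 is corrupt
    return log


print(replicate(flaky_copy, [1, 2, 3], "SG1"))   # [1, 2, 3]
print(attempts[2])                                # 2
```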
  • 26. Continuous Replication Basics
      - When the current log file is closed, the Replication service copies it to the replication target
      - Replication service:
          - At the source: creates read-only shares for the log directory
          - At the target: reads from the shares and pulls a copy of the log file
          - Contains a ReplicaInstance for each storage group
      - Configuration is discovered from Active Directory (every 30 seconds for LCR/CCR, every 3 minutes for SCR)
  • 27. Continuous Replication Basics
      - Communication is done via logs, the registry, the cluster database, and RPC
      - Logs: replicate database changes and backup status
      - Registry: used in LCR and SCR; also used in CCR to checkpoint the current log generation value for loss calculation
      - Cluster database: cluster res "Exchange Information Store Instance (CMSName)" /priv | findstr /i replay
      - RPC: the target Replication service makes RPC calls into the Store for log truncation coordination
  • 28. Lost Log Resilience (LLR)
      - Designed to minimize the need to reseed after a lossy failover
      - Normally, database changes are written to the log file before the database, and the database can be updated as soon as the change is logged
      - LLR modifies this behavior by delaying updates to the database until 1 or more log generations have been created
      - Uses a new log stream marker called the waypoint
          - Minimum log required to prevent database divergence
          - No modifications after the waypoint have been written to the database
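LLR's effect can be reduced to one inequality: a database file survives a lossy failover only if the surviving logs still reach its waypoint. A minimal sketch, assuming the waypoint is simply the committed generation minus a fixed LLR depth; `llr_depth=10` reproduces the next slide's numbers, and the depth Exchange actually uses is not stated here.

```python
def waypoint(committed: int, llr_depth: int) -> int:
    """Highest generation whose changes may already be in the database file.

    Without LLR (llr_depth == 0) a change can reach the database as soon as it is logged,
    so the waypoint equals the committed generation. With LLR, database writes wait until
    `llr_depth` newer generations exist.
    """
    return max(committed - llr_depth, 0)


def database_survives_loss(committed: int, llr_depth: int, logs_lost: int) -> bool:
    """After losing the newest `logs_lost` logs, the surviving logs must reach the waypoint;
    otherwise the database holds changes that no surviving log describes (divergence)."""
    return committed - logs_lost >= waypoint(committed, llr_depth)


print(waypoint(20, 10))                          # 10
print(database_survives_loss(20, 0, 1))          # False: without LLR even one lost log diverges
print(database_survives_loss(20, 10, 5))         # True: logs 16-20 never reached the database
```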
  • 29. Log Stream Markers
      - Committed: log generation 20
      - Checkpoint: log generation 2
      - Waypoint: log generation 10
      - What this means: only logs 2-10 are needed; logs 11-20 can be discarded
      - Sample output:
          Initiating FILE DUMP mode...
          Database: priv1.edb
          ...
          State: Dirty Shutdown
          Log Required: 2-10 (0x2-0xA)
          Log Committed: 0-20 (0x0-0x14)
          ...
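The "what this means" line is a range calculation over the three markers, and the header output prints each range in decimal and hex. A small sketch reproducing the slide's numbers; the function names are hypothetical.

```python
from typing import Tuple


def classify_logs(checkpoint: int, waypoint: int, committed: int) -> Tuple[range, range]:
    """Split log generations into those needed for recovery and those that can be discarded."""
    required = range(checkpoint, waypoint + 1)        # checkpoint through waypoint, inclusive
    discardable = range(waypoint + 1, committed + 1)  # everything after the waypoint
    return required, discardable


def header_range(first: int, last: int) -> str:
    """Format a range the way the sample header output does, e.g. '2-10 (0x2-0xA)'."""
    return f"{first}-{last} (0x{first:X}-0x{last:X})"


required, discardable = classify_logs(checkpoint=2, waypoint=10, committed=20)
print(header_range(required[0], required[-1]))   # 2-10 (0x2-0xA)
print(header_range(0, 20))                       # 0-20 (0x0-0x14)
print(discardable[0], discardable[-1])           # 11 20
```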
  • 30. Diagram: NodeA and NodeB, log generations 1-21 (checkpoint at generation 4, waypoint at generation 12)
      - Healthy CCR
      - NodeA fails and a failover to NodeB occurs
      - Validate that the database can mount: logs lost < AutoDatabaseMountDial
      - Logs are generated on NodeB (beyond generation 21)
      - NodeA recovers and performs a divergence check
      - NodeA performs an incremental reseed and copies logs
      - Healthy CCR
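The failover sequence involves two checks: whether the new active copy may mount given the estimated loss, and whether the returning node can keep its database. A minimal sketch, assuming `max_lost_logs` stands in for whatever log count the AutoDatabaseMountDial setting allows (the real mapping is not given here) and treating "within the dial" as less than or equal; `last_copied=18` is an assumed value, not taken from the diagram.

```python
def can_auto_mount(last_generated: int, last_copied: int, max_lost_logs: int) -> bool:
    """Failover-time check: mount automatically only if the estimated loss is within the dial."""
    return last_generated - last_copied <= max_lost_logs


def reseed_after_return(first_lost_generation: int, waypoint: int) -> str:
    """When the failed node comes back, its logs from `first_lost_generation` onward diverge.

    If every diverging log is past the waypoint, none of those changes reached the old
    database file, so it can be kept and only newer logs copied (incremental reseed).
    Otherwise the database itself has diverged and needs a full reseed.
    """
    return "incremental reseed" if first_lost_generation > waypoint else "full reseed"


# NodeA had generated up to 21 with the waypoint at 12; last_copied=18 is assumed.
print(can_auto_mount(last_generated=21, last_copied=18, max_lost_logs=6))  # True
print(reseed_after_return(first_lost_generation=19, waypoint=12))          # incremental reseed
print(reseed_after_return(first_lost_generation=12, waypoint=12))          # full reseed
```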
  • 31. When Do I Need a Full Reseed?
      - Rarely
      - Lost log past the current waypoint:
          - Admin accepted a large amount of loss by running Restore-StorageGroupCopy
          - Automatic mount while LLR was "not honored"
          - Automatic lossy mount with a "stale" loss window calculation
      - Log corruption prior to log replay (ESE cannot skip over logs)
      - Database files modified outside of the Store or Replication service (e.g., offline defrag, eseutil /r)
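"ESE cannot skip over logs" means replay is strictly sequential: one bad generation strands every intact log after it. A toy sketch of that rule, assuming (as the slide does) that the bad log cannot simply be recopied; the function names are hypothetical.

```python
from typing import Dict


def replay_sequentially(logs: Dict[int, bool], start: int) -> int:
    """Replay generations in strict order from `start`; return the last generation replayed.

    `logs` maps generation -> True when that log is present and intact. Replay cannot skip
    a generation, so the first missing or corrupt log stops it (returns start - 1 if nothing replays).
    """
    generation = start
    while logs.get(generation, False):
        generation += 1
    return generation - 1


def stuck_behind_gap(logs: Dict[int, bool], start: int, newest: int) -> bool:
    """True if intact logs exist that can never be replayed because an earlier one is bad."""
    return replay_sequentially(logs, start) < newest


logs = {1: True, 2: True, 3: False, 4: True, 5: True}   # generation 3 arrived corrupt
print(replay_sequentially(logs, start=1))             # 2
print(stuck_behind_gap(logs, start=1, newest=5))      # True: 4 and 5 are intact but unusable
```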
  • 32. Hub Transport servers retain messages that have been delivered to the destination mailbox until a size or time limit is reached
      - The transport dumpster is per storage group, per Hub Transport server, for servers in the same Active Directory site as the storage group
      - Transport dumpster statistics: Get-StorageGroupCopyStatus -DumpsterStatistics
        Output:
          DumpsterServersNotAvailable: {HUB1}
          DumpsterStatistics: {HUB2(2/25/2009 10:20:37 PM; 2 ; 1032KB)}
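When this output is captured as text (for example, from a monitoring script), the statistics entries can be pulled apart with a regular expression. A sketch that parses only the sample format shown above; the field meanings (oldest item, item count, total size) are assumptions, and in practice reading the returned object's properties is more robust than parsing formatted text.

```python
import re
from datetime import datetime
from typing import List, NamedTuple


class DumpsterStat(NamedTuple):
    server: str
    oldest_item: datetime  # assumed meaning of the first field
    item_count: int        # assumed meaning of the second field
    size_kb: int           # assumed meaning of the third field


# Matches entries shaped like "HUB2(2/25/2009 10:20:37 PM; 2 ; 1032KB)".
_ENTRY = re.compile(r"(\w+)\(([^;]+);\s*(\d+)\s*;\s*(\d+)KB\)")


def parse_dumpster_statistics(text: str) -> List[DumpsterStat]:
    return [
        DumpsterStat(server,
                     datetime.strptime(stamp.strip(), "%m/%d/%Y %I:%M:%S %p"),
                     int(count), int(size))
        for server, stamp, count, size in _ENTRY.findall(text)
    ]


def parse_unavailable_servers(text: str) -> List[str]:
    match = re.search(r"DumpsterServersNotAvailable:\s*\{([^}]*)\}", text)
    if not match:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


sample = ("DumpsterServersNotAvailable:{HUB1} "
          "DumpsterStatistics: {HUB2(2/25/2009 10:20:37 PM; 2 ; 1032KB)}")
print(parse_unavailable_servers(sample))             # ['HUB1']
print(parse_dumpster_statistics(sample)[0].size_kb)  # 1032
```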
  • 33. Diagram: transport dumpster redelivery for a CCR clustered mailbox server (CMS)
      - MBX1 (active) and MBX2 (passive); Hub Transport servers HUB1 and HUB2 each hold dumpster contents per storage group (SG1, SG2)
      - An "SG Resubmit Required" list tracks which Hub Transport servers still owe a resubmit for each storage group
      - "Redeliver SG1,SG2" requests go to each Hub Transport server; a request that times out is retried until it succeeds
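The "resubmit required" bookkeeping in the diagram can be sketched as a pending set per storage group that shrinks as Hub Transport servers acknowledge, with timeouts left for the next round. Simplification: the diagram shows one request per server covering SG1 and SG2, while this sketch tracks each (server, storage group) pair separately; `redeliver` and the callback are hypothetical.

```python
from typing import Callable, Dict, Set


def redeliver(pending: Dict[str, Set[str]],
              request: Callable[[str, str], bool],
              max_rounds: int) -> Dict[str, Set[str]]:
    """Ask Hub Transport servers to resubmit dumpster contents until every request succeeds.

    `pending` maps storage group -> Hub Transport servers that still owe a resubmit.
    `request(hub, sg)` returns True on success and False on a timeout; failed pairs stay
    pending and are retried next round. Returns what is still pending after `max_rounds`.
    """
    for _ in range(max_rounds):
        if not any(pending.values()):
            break
        for sg, hubs in pending.items():
            for hub in sorted(hubs):      # iterate over a copy so we can discard safely
                if request(hub, sg):
                    hubs.discard(hub)
    return pending


calls = []


def first_call_to_hub1_times_out(hub: str, sg: str) -> bool:
    calls.append((hub, sg))
    return not (hub == "HUB1" and calls.count((hub, sg)) == 1)


left = redeliver({"SG1": {"HUB1", "HUB2"}, "SG2": {"HUB1", "HUB2"}},
                 first_call_to_hub1_times_out, max_rounds=5)
print(left)                          # {'SG1': set(), 'SG2': set()}
print(calls.count(("HUB1", "SG1")))  # 2
```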
  • 34. How much data loss can the transport dumpster mitigate?
      - 18 MB dumpster per storage group on 8 Hub Transport servers = 144 MB per storage group
      - [20 MB per user per 10-hour day] × [100 users per SG] = 200 MB of message traffic per hour
      - Putting the two together: 60 min × 144 / 200 = 43.2 minutes' worth of data
      - In 43.2 minutes, 144+ logs are created per SG
      - Customize the transport dumpster size and time limit:
          Set-TransportConfig -MaxDumpsterSizePerStorageGroup 30MB -MaxDumpsterTime 07.00:00:00
      - No time window guarantees: if there are no message size limits, a single large message (e.g., 15 MB) will purge all other messages for the destination storage group(s) on a given Hub Transport server
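The sizing estimate above can be reproduced step by step. A small sketch using the slide's numbers, assuming 1 MB transaction logs and that log volume tracks message volume (the same assumption behind "144 MB → 144 logs"); the function names are hypothetical.

```python
def dumpster_capacity_mb(per_sg_mb: float, hub_servers: int) -> float:
    """Total dumpster space for one storage group across all Hub Transport servers in the site."""
    return per_sg_mb * hub_servers


def traffic_mb_per_hour(mb_per_user_per_day: float, day_hours: float, users_per_sg: int) -> float:
    return mb_per_user_per_day / day_hours * users_per_sg


def minutes_covered(capacity_mb: float, mb_per_hour: float) -> float:
    """How many minutes of delivered mail fit in the dumpster at the given traffic rate."""
    return 60 * capacity_mb / mb_per_hour


def logs_generated(minutes: float, mb_per_hour: float, log_size_mb: float = 1.0) -> float:
    """Logs produced in that time, assuming log volume tracks message volume."""
    return minutes * (mb_per_hour / 60) / log_size_mb


capacity = dumpster_capacity_mb(18, 8)       # 144 MB per storage group
rate = traffic_mb_per_hour(20, 10, 100)      # 200 MB per hour
window = minutes_covered(capacity, rate)
print(round(window, 1))                      # 43.2
print(round(logs_generated(window, rate)))   # 144
```

Raising the per-storage-group limit to 30 MB, as in the Set-TransportConfig example, gives 240 MB across 8 servers and stretches the window to 72 minutes at the same traffic rate.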
  • 35. When CCR detects a lossy failover, it:
      - Expands the loss window by 12 hours back and 4 hours forward
      - Finds all Hub Transport servers in the local Active Directory site
      - Requests transport dumpster redelivery from all detected servers
          - New servers are not added to the redelivery list
          - Inaccessible servers: CCR retries the same request every 30 seconds until the configured MaxDumpsterTime
      - If multiple lossy failovers take place, the new loss window is added to the previous one
      - Restore-StorageGroupCopy on LCR is a one-time request, with no retries
      - Redelivery is not triggered as part of Setup /recoverCMS
      - There is no other way to redeliver messages from the transport dumpster
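The window and retry rules above are simple time arithmetic. A minimal sketch, assuming that "added to the previous one" can be modeled as one span covering both windows; the `forward` default follows the slide, while the speaker notes mention 1 hour instead, so it is left as a parameter. All names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple

Window = Tuple[datetime, datetime]


def redelivery_window(loss_start: datetime, loss_end: datetime,
                      back: timedelta = timedelta(hours=12),
                      forward: timedelta = timedelta(hours=4)) -> Window:
    """Widen the calculated loss window before asking Hub Transport servers to resubmit."""
    return loss_start - back, loss_end + forward


def add_to_previous(previous: Window, new: Window) -> Window:
    """Model 'new loss window added to the previous one' as a single covering span."""
    return min(previous[0], new[0]), max(previous[1], new[1])


def retry_attempts(max_dumpster_time: timedelta,
                   interval: timedelta = timedelta(seconds=30)) -> int:
    """How many 30-second retries fit before MaxDumpsterTime runs out for an unreachable server."""
    return max_dumpster_time // interval


first = redelivery_window(datetime(2009, 2, 25, 22, 0), datetime(2009, 2, 25, 22, 5))
print(first[0], first[1])                 # 2009-02-25 10:00:00 2009-02-26 02:05:00
print(retry_attempts(timedelta(days=7)))  # 20160, for the 7-day MaxDumpsterTime example
```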
  • 36. Redundant Networks
      - Use for log shipping and seeding in CCR: Enable-ContinuousReplicationHostName
      - Seeding: Update-StorageGroupCopy -DataHostNames:Host1,Host2
      - Get-ClusteredMailboxServerStatus reports OperationalReplicationHostNames, FailedReplicationHostNames, and InUseReplicationHostNames
      - Watch out for a misconfigured hosts file
  • 37. Circular Logging
      - One configuration setting with two consumers
          - Store service: requires the database to be dismounted and remounted for the change to take effect
          - Replication service: picks up the new setting dynamically
      - In CCR, it's no big deal to switch between on/off/on
      - In some settings, logs are deleted prematurely
          - Example: turn off circular logging, then enable LCR without a dismount/mount of the database
          - ESE is still doing log truncation with circular logging logic
          - Logs will get truncated before making it to the LCR copy
      - To be safe, follow this recipe: suspend, dismount, change the setting, mount, resume
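The premature-truncation hazard comes from the Store acting on the setting it read at mount time while the configured value has already changed. A toy model of that "one setting, two consumers" gap, not Exchange's actual truncation logic: in this sketch non-circular mode simply keeps every log, and the suspend/resume steps of the recipe are not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StorageGroupModel:
    """Toy model of one setting read by two consumers; not Exchange's actual truncation logic."""
    configured_circular: bool   # the setting as stored in configuration
    store_circular: bool        # what the Store picked up at its last mount
    copied: List[int] = field(default_factory=list)      # generations already on the LCR copy
    truncated: List[int] = field(default_factory=list)   # generations deleted by the Store

    def change_setting(self, enabled: bool) -> None:
        # The Replication service would see this at once; the Store only after a remount.
        self.configured_circular = enabled

    def remount(self) -> None:
        self.store_circular = self.configured_circular

    def store_truncates(self, checkpoint: int) -> None:
        # With circular logging the Store deletes logs behind the checkpoint without waiting
        # for the copy; in this toy, non-circular mode simply keeps every log.
        if self.store_circular:
            self.truncated = list(range(1, checkpoint))

    def lost_before_copy(self) -> List[int]:
        return [g for g in self.truncated if g not in self.copied]


unsafe = StorageGroupModel(configured_circular=True, store_circular=True)
unsafe.change_setting(False)          # circular logging turned off, no dismount/mount
unsafe.copied = [1, 2]                # LCR enabled; the copy has only caught up to generation 2
unsafe.store_truncates(checkpoint=6)
print(unsafe.lost_before_copy())      # [3, 4, 5]

safe = StorageGroupModel(configured_circular=True, store_circular=True)
safe.change_setting(False)
safe.remount()                        # dismount/mount lets the Store pick up the change
safe.copied = [1, 2]
safe.store_truncates(checkpoint=6)
print(safe.lost_before_copy())        # []
```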
  • 38. © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Editor's Notes

  1. DB portability between different OS versions: watch out for the performance impact! Upgrading the operating system for an Exchange database updates the OS Version value in the database header, and this update triggers a rebuild of internal database indexes. When you use database portability to move a database from a Mailbox server running Windows Server 2003 to a Mailbox server running Windows Server 2008, the Extensible Storage Engine (ESE) detects the operating system upgrade and takes the following actions:
      - During the first database mount operation, all secondary indexes are discarded. A secondary index provides a specific view of the mailbox data (for example, when messages in a mail folder are sorted using Outlook in Online Mode). The database is not mounted and available to clients until this initial operation is complete. The time it takes depends largely on the size of the database: the larger the database, the longer the mount operation.
      - Secondary indexes are rebuilt on demand as Outlook users sort their views in Online Mode. In environments with large or extremely large databases, on-demand index rebuilding initially results in high processor and disk utilization.
  2. This illustrates why we believe CCR is a better solution than SCC. When you lose a database in SCC, you can recover by restoring a VSS clone, but that clone is a point-in-time restore (it could be 10 minutes old, 30 minutes old, etc., depending on how frequently backups occur).
  3. Storage failures for SCC can involve storage for data and the storage hosting the VSS clones. Typically, this is the same storage, so when it fails, you need to use remote data to recover.
  4. This is a summary showing the RTO and RPO for these two solutions. To achieve the same RTO/RPO as CCR, an SCC solution also needs to be extended with replication technology, as well as hardware-based VSS (at least two clones).
      RTO for Data/LUN failure is 15 minutes to 1 hour: while 3rd-party solutions can activate a VSS clone quickly, the Exchange server still has to be brought up, and recovery (playing the logs forward) still has to be run once the clone has been activated. This can take from several minutes to over an hour, depending on the log backup regimen.
      RPO: for CCR, the normal RPO can't really be measured in time. The items that can be lost are the items that don't go through Transport. If a deployment has synchronous replication and no geo-clustering, activating the copy is a manual DR process (expose the LUNs, go through the Exchange DR/database portability steps). The Exchange server may or may not be pre-built (this depends on the SLA and how much idle hardware a customer can afford). Geo-clustered synchronous replication solutions are almost always failed over manually (automatic failover between sites is a big deal for customers, and they prefer to hit the "big red button"). RTO is typically ~15 minutes if everything works correctly.
      RPO for LOG LUN: if the log LUN dies, the DB becomes unclean. Jet can't shut down, and all unflushed writes to the DB are lost, leaving the DB in a bad state. Recovery must be run but can't be, because the log LUN is dead; thus, the DB is also lost. If the logs have been synchronously replicated and the replicated copy of the logs is good, it can be used to recover the DB. However, if the log LUN was lost because of physical corruption on the logs, which gets replicated to the log LUN's replicated copy, the only option is to recover from a backup.
  5. - Polls, and uses file system notifications, to detect a new log in a directory
      - LogInspector verifies that the log is safe to replay (3rd-party synchronous replication cannot provide this type of replicated data verification for logs):
          - Checksum
          - Is this log for this log stream?
          - Recopy on failure
  6. If the shares already exist, they will not be re-created. If permissions on the shares are messed up, remove the shares manually and restart the Replication service. Different ReplicaInstance types cannot coexist.
  7. "Log Required" indicates that some transactions haven't been committed (some pages may have been written to disk, others may not have been). The checkpoint is the minimum log needed to perform recovery. The waypoint is the maximum log needed for recovery, i.e., the last log file that may contain log records that have been recorded in the physical database. The committed generation is the last log file generated by ESE for the particular storage group.
  8. Logically speaking, the dumpster is a property of the storage group, not of the storage group copy.
      - Loss calculation: now minus the last log inspected.
      - Request dumpster resubmit: 12 hours before the loss and 1 hour after the loss window.
      - No, it cannot grab extra space. Every SG has a maximum dumpster size dedicated to that specific SG. Messages are stored only once but are counted against multiple SGs if they happen to be in more than one SG's dumpster. Example: Msg1 is delivered to both SG1 and SG2, so it counts against the dumpster quota for both SGs. Suppose SG1 receives lots of messages and has to drop Msg1 from its dumpster; Msg1 is still on the Hub Transport server because it is part of SG2's dumpster. When a dumpster resubmit request comes in for SG1, Msg1 will get resubmitted because it happens to be on the server. This is not guaranteed, though.
  9. This traffic amounts to around 3 (= 20/6) logs per minute per SG (200 MB per hour ÷ 60 minutes ≈ 3.3 MB per minute, with 1 MB log files).