Strategies for Backing Up MongoDB


For the first time this year, 10gen will be offering a track completely dedicated to Operations at MongoSV, 10gen's annual MongoDB user conference on December 4. Learn more at MongoSV.com. Come learn about the different ways to back up your single servers, replica sets, and sharded clusters.

#MongoBoston

Strategies for Backing Up MongoDB
Jeff Yemin
Engineering Manager, 10gen

File and Directory Layout
• A set of files per database

Insert with write concern of {fsync: true}

Archive the data directory

Restore the data directory

Start mongod on restored data directory

Everything is fine, right?
• No, it's not
• But you can't tell until you look

Try validating the collection
• In the shell, run the validate command
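
As a rough illustration of the same check made from a driver rather than the shell, the sketch below sends the validate command and reads the valid field of the reply. It assumes a database handle that behaves like a pymongo Database (a command(name, value, **kwargs) method). The names collection_is_valid, FakeDatabase, and the people collection are hypothetical, and the fake exists only so the example runs without a server.

```python
def collection_is_valid(db, collection_name):
    """Run the validate command on one collection and return (ok, raw_reply)."""
    # full=True asks the server for the more thorough (and slower) check.
    reply = db.command("validate", collection_name, full=True)
    return bool(reply.get("valid", False)), reply


class FakeDatabase:
    """Hypothetical stand-in for a real database handle, used only for the demo."""

    def __init__(self, valid):
        self._valid = valid

    def command(self, name, value=1, **kwargs):
        # Mimic only the fields of the validate reply that the sketch reads.
        return {"ns": "test." + value, "valid": self._valid}


ok, reply = collection_is_valid(FakeDatabase(valid=False), "people")
print("clean" if ok else "corrupt: restore from a known-good backup")
```

Running a check like this against a restored copy, rather than trusting that mongod started, is the point of the slide: a copy of live data files can start up and still be damaged.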

How can we get a clean backup?
• kill mongod
• fsyncLock / fsyncUnlock
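
One way to make sure the lock is always released is to wrap the copy step in a context manager. This is a minimal sketch, assuming a client whose admin database accepts the fsync command with lock set, and a server recent enough to expose an fsyncUnlock command (older servers unlocked differently). FakeClient and FakeAdmin are hypothetical stand-ins so the example runs without a server.

```python
from contextlib import contextmanager


@contextmanager
def fsync_locked(client):
    """Flush pending writes and block new ones while the body runs."""
    client.admin.command("fsync", lock=True)
    try:
        yield
    finally:
        # Release the lock even if the copy step raises.
        client.admin.command("fsyncUnlock")


class FakeAdmin:
    """Hypothetical admin database that only records the commands it receives."""

    def __init__(self):
        self.log = []

    def command(self, name, value=1, **kwargs):
        self.log.append(name)
        return {"ok": 1}


class FakeClient:
    def __init__(self):
        self.admin = FakeAdmin()


client = FakeClient()
with fsync_locked(client):
    client.admin.log.append("copy data files")  # archive the data directory here
print(client.admin.log)
```

The try/finally matters more than the wrapper itself: a backup script that dies between lock and unlock leaves the server refusing writes.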

How can we get a clean backup?
• mongodump

mongodump
• Snapshot of each collection
– Does NOT represent a point in time, even for a single collection
• Can NOT be combined with fsyncLock
– Remember, you can't read…
• You CAN dump directly from data files to get a point in time backup
– mongodump --dbpath
• Can be costlier than archiving at FS level
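
The two ways of invoking mongodump on this slide can be sketched as a small command builder. Nothing is executed here; the function only assembles an argument list. The function name, host, and directory paths are hypothetical, and whether --dbpath is accepted depends on the mongodump version installed.

```python
import shlex


def mongodump_command(out_dir, host=None, db=None, dbpath=None):
    """Build a mongodump argument list; nothing is executed here."""
    argv = ["mongodump"]
    if host:
        # Dump through a running server.
        argv += ["--host", host]
    if dbpath:
        # Dump directly from the data files, as on the slide.
        argv += ["--dbpath", dbpath]
    if db:
        argv += ["--db", db]
    argv += ["--out", out_dir]
    return argv


live = mongodump_command("/backups/live", host="localhost:27017")
offline = mongodump_command("/backups/offline", dbpath="/data/db")
print(shlex.join(live))
print(shlex.join(offline))
```

To actually run one of these, the list could be passed to subprocess.run(argv, check=True), so that a failed dump raises instead of passing silently.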

How can we get a clean backup?
• journaling

Journaling
• Write-ahead log
• Guarantees a consistent view even after a hard crash
• Default behavior as of 2.0
• Journal stored in the journal folder under --dbpath
• --journalCommitInterval* (2ms - 300ms)
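
The write-ahead idea can be modelled in a few lines. This is a toy, not MongoDB's implementation: it has no memory-mapped views, group commits, or compression, and every name in it is hypothetical. It only shows why writing to the journal first lets a crashed process get back to a consistent state.

```python
class ToyJournaledStore:
    """Toy write-ahead log: journal first, data files later."""

    def __init__(self):
        self.journal = []     # durable, append-only log of recent writes
        self.data_files = {}  # durable data, updated only on sync
        self.memory = {}      # in-memory view, lost on a crash

    def write(self, key, value):
        self.journal.append((key, value))  # 1. record the change in the journal
        self.memory[key] = value           # 2. then update the in-memory view

    def sync(self):
        # Flush the in-memory view to the data files; the journal can be trimmed.
        self.data_files = dict(self.memory)
        self.journal.clear()

    def crash_and_recover(self):
        # Memory is gone; start from the data files and replay the journal in order.
        self.memory = dict(self.data_files)
        for key, value in self.journal:
            self.memory[key] = value


store = ToyJournaledStore()
store.write("a", 1)
store.sync()
store.write("b", 2)
store.write("a", 3)
store.crash_and_recover()
print(store.data_files, store.memory)
```

After the simulated crash the data files still hold only the last synced state, while the recovered view also contains the two writes that had reached the journal.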

Journaling implications for backup
• Logical Volume Manager (LVM)
• LVM snapshots to the rescue
– lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb

• No shutdown or fsyncLock needed
• True point in time backup for a single instance
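
The lvcreate call on this slide can be parameterised as below. This is a sketch that only builds the argument list and the expected snapshot device path; the helper names are hypothetical, the volume and snapshot names are the ones from the slide, and running the command for real requires root and an LVM volume group.

```python
import posixpath
import shlex


def lvm_snapshot_command(origin, snapshot_name, size):
    """Build the lvcreate call that snapshots the volume holding the data files."""
    return ["lvcreate", "--size", size, "--snapshot", "--name", snapshot_name, origin]


def snapshot_device(origin, snapshot_name):
    """The snapshot appears next to the origin volume in the same volume group."""
    return posixpath.join(posixpath.dirname(origin), snapshot_name)


argv = lvm_snapshot_command("/dev/vg0/mongodb", "mdb-snap01", "100M")
print(shlex.join(argv))
print(snapshot_device("/dev/vg0/mongodb", "mdb-snap01"))
```

The size argument caps how much change the snapshot can absorb while it exists, not how much data it covers, so it should be chosen with the write load during the backup window in mind.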

Backing up a replica set
• Back up a (hidden) secondary
– kill mongod
– fsyncLock
– mongodump
– LVM snapshot

Mongodump for replica sets
• True point in time
– mongodump --oplog
– mongorestore --oplogReplay
• Snapshot query of each collection, then replay the oplog at the end
– Similar to how a new secondary does an initial sync
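
Why replaying the oplog turns a collection-by-collection dump into a point-in-time backup can be shown with a toy model. Everything here is hypothetical and greatly simplified: collections are dictionaries, every write is an upsert, and the oplog is a plain list. It is a sketch of the idea, not of how mongodump or mongorestore are implemented.

```python
import copy


def dump_with_oplog(live, concurrent_writes):
    """Copy each collection in turn while writes keep arriving.

    concurrent_writes maps a collection name to the writes (collection, _id, doc)
    that land right after that collection has been copied.
    """
    dump, oplog = {}, []
    for name in sorted(live):
        dump[name] = copy.deepcopy(live[name])  # snapshot query of one collection
        for coll, doc_id, doc in concurrent_writes.get(name, []):
            live[coll][doc_id] = doc            # the server applies the write...
            oplog.append((coll, doc_id, doc))   # ...and records it in the oplog
    return dump, oplog


def restore_with_replay(dump, oplog):
    """Load the dump, then replay the captured oplog entries in order."""
    restored = copy.deepcopy(dump)
    for coll, doc_id, doc in oplog:
        # Upserts are idempotent, so replaying a write the dump already saw is harmless.
        restored[coll][doc_id] = doc
    return restored


live = {"a": {1: "x"}, "b": {1: "y"}}
writes = {"a": [("a", 2, "z"), ("b", 1, "y2")]}
dump, oplog = dump_with_oplog(live, writes)
restored = restore_with_replay(dump, oplog)
print(dump == live, restored == live)
```

The raw dump misses the write to collection a that arrived after a was copied, so it matches no single moment; after the replay, the restored data equals the live state at the end of the dump.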

Sharded clusters
[Diagram: mongos, three config servers, the balancer, and chunks 1-48 spread across four shards (1-12 on Shard 1, 13-24 on Shard 2, 25-36 on Shard 3, 37-48 on Shard 4)]

Backing up a sharded cluster

• mongodump through mongos
– (but no --oplog)

• mongorestore through mongos

Backup a Sharded Cluster
1. Stop the balancer, and wait until it is inactive (state: 0)
   db.settings.update( { _id: "balancer" }, { $set : { stopped: true } }, true )
2. Stop a config server
   Back up data
   – Each shard
   – Config server (mongodump --db config)
3. Restart the config server
4. Resume the balancer
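
The ordering of these steps, and the cleanup when one of them fails, can be sketched as follows. The sketch assumes a handle to the config database that behaves like a pymongo Database, and it assumes that the balancer's state is read from a document with _id "balancer" in the config database's locks collection. The three callables are hypothetical hooks for whatever stops the config server, runs the backups, and restarts the server; the fake classes exist only so the example runs without a cluster.

```python
import time


def backup_sharded_cluster(config_db, stop_config_server, backup_data,
                           restart_config_server, poll_seconds=1.0, max_polls=60):
    """Run the four steps in order, resuming the balancer even on failure."""
    settings = config_db.settings
    # 1. Stop the balancer (same upsert as the shell command on the slide).
    settings.update_one({"_id": "balancer"}, {"$set": {"stopped": True}}, upsert=True)
    try:
        # Wait until the balancer lock reports state 0 (inactive).
        for _ in range(max_polls):
            lock = config_db.locks.find_one({"_id": "balancer"})
            if lock is None or lock.get("state", 0) == 0:
                break
            time.sleep(poll_seconds)
        else:
            raise TimeoutError("balancer did not become inactive")
        # 2. Stop a config server, then back up each shard and the config data.
        stop_config_server()
        try:
            backup_data()
        finally:
            # 3. Restart the config server.
            restart_config_server()
    finally:
        # 4. Resume the balancer.
        settings.update_one({"_id": "balancer"}, {"$set": {"stopped": False}}, upsert=True)


class FakeCollection:
    """Hypothetical in-memory collection keyed by _id, for the demo only."""

    def __init__(self, docs=()):
        self.docs = {doc["_id"]: dict(doc) for doc in docs}

    def update_one(self, flt, update, upsert=False):
        if flt["_id"] not in self.docs and not upsert:
            return
        self.docs.setdefault(flt["_id"], {"_id": flt["_id"]}).update(update["$set"])

    def find_one(self, flt):
        return self.docs.get(flt["_id"])


class FakeConfigDB:
    def __init__(self):
        self.settings = FakeCollection()
        self.locks = FakeCollection([{"_id": "balancer", "state": 0}])


steps = []
config_db = FakeConfigDB()
backup_sharded_cluster(
    config_db,
    stop_config_server=lambda: steps.append("stop config server"),
    backup_data=lambda: steps.append("back up shards and config"),
    restart_config_server=lambda: steps.append("restart config server"),
)
print(steps, config_db.settings.find_one({"_id": "balancer"}))
```

The nested try/finally blocks mirror the slide's numbering in reverse: whatever happens during the backup, the config server is restarted first and the balancer is resumed last.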

#MongoBoston

Thank You
Jeff Yemin
Engineering Manager, 10gen

Strategies for Backing Up MongoDB

1. #MongoBoston Strategies for Backing Up MongoDB Jeff Yemin Engineering Manager, 10gen

2. File and Directory Layout • A set of files per database

3. Insert with write concern of {fsync : true}

4. Archive the data directory

5. Restore the data directory

6. Start mongod on restored data directory

7. Everything is fine, right? • No, it's not • But you can't tell until you look

8. Try validating the collection • In the shell, run the validate command

9. How can we get a clean backup? • kill mongod • fsyncLock / fsyncUnlock

10. How can we get a clean backup? • mongodump

11. mongodump • Snapshot of each collection – Does NOT represent a point in time, even for a single collection • Can NOT be combined with fsyncLock – Remember, you can't read… • You CAN dump directly from data files to get a point in time backup – mongodump --dbpath • Can be costlier than archiving at FS level

12. Snapshot Query 5 2 7 1 3 4 6 8 9

13. How can we get a clean backup? • journaling

14. Journaling • Write-ahead log • Guarantees a consistent view even after a hard crash • Default behavior as of 2.0 • Journal stored in the journal folder under --dbpath • --journalCommitInterval* (2ms - 300ms)

15. Journaling implications for backup • Logical Volume Manager (LVM) • LVM snapshots to the rescue – lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb • No shutdown or fsyncLock needed • True point in time backup for a single instance

16. Replica Sets

17. Backing up a replica set • Back up a (hidden) secondary – kill mongod – fsyncLock – mongodump – LVM snapshot

18. Mongodump for replica sets • True point in time – mongodump --oplog – mongorestore --oplogReplay • Snapshot query of each collection, then replay the oplog at the end – Similar to how a new secondary does an initial sync

19. Sharded clusters [Diagram: mongos, three config servers, the balancer, and chunks 1-48 spread across four shards (1-12 on Shard 1, 13-24 on Shard 2, 25-36 on Shard 3, 37-48 on Shard 4)]

20. Backing up a sharded cluster • mongodump through mongos – (but no --oplog) • mongorestore through mongos

21. Backup a Sharded Cluster 1. Stop Balancer, and wait till inactive (state:0) db.settings.update( { _id: "balancer" }, { $set : { stopped: true } } , true ) 2. Stop a config server Backup Data – Each shard – Config server (mongodump --db config) 3. Restart config server 4. Resume balancer

22. #MongoBoston Thank You Jeff Yemin Engineering Manager, 10gen

Editor's Notes

Do the fsyncLock/fsyncUnlock demo
I need a picture for the first bullet
Make the point that while you can turn journaling off, you shouldn't. Without journaling, the approach is quite straightforward: there is a one-to-one mapping of data files to memory, and when either the OS flushes or an explicit fsync happens, your data is safe on disk. With journaling we do some tricks. The journal is a write-ahead log; that is, we write the data to the journal before we update the data itself. Each file is mapped twice: once to a private view, which is marked copy-on-write, and once to the shared view (shared in the sense that the disk has access to this memory). Every time we do a write, we keep a list of the regions of memory that were written to. These writes are batched into group commits, compressed, and appended to a special journal file on disk. Once that data has been written to disk, we do a remapping phase that copies the changes into the shared view, at which point those changes can be synced to disk. Once that data is synced to disk, it is safe (barring hardware failure). If there is a failure before the shared/storage view is written to disk, we simply apply all the changes, in order, to the data files since the last time they were synced, and we get back to a consistent view of the data.
LVM: Logical Volume Manager. LVM is a program that abstracts disk images from physical devices, and provides a number of raw disk manipulation and snapshot capabilities useful for system management.
lvcreate: This command creates an LVM snapshot (with the --snapshot option) named mdb-snap01 of the mongodb volume in the vg0 volume group. This example creates a snapshot named mdb-snap01 located at /dev/vg0/mdb-snap01. The location and paths to your system's volume groups and devices may vary slightly depending on your operating system's LVM configuration. The snapshot has a cap of 100 megabytes, because of the parameter --size 100M. This size does not reflect the total amount of the data on the disk, but rather the quantity of differences between the current state of /dev/vg0/mongodb and the creation of the snapshot (i.e. /dev/vg0/mdb-snap01). Make sure you size this big enough.
EBS: If your deployment depends on Amazon's Elastic Block Storage (EBS) with RAID configured within your instance, it is impossible to get a consistent state across all disks using the platform's snapshot tool. As a result you may: 1. Flush all writes to disk and create a write lock to ensure consistent state during the backup process. If you choose this option, see Backup Without Journaling. 2. Configure LVM to run and hold your MongoDB data files on top of the RAID within your system. If you choose this option, perform the LVM backup operation described in Create Snapshot.
If the secondary is hidden, then options are more varied. Killing and locking are valid options, so long as there is enough spare capacity in the system to catch up after the backup is complete
Ok if you have enough space to store all the data on all the shards
