What's your backup strategy?

I mean both for your servers and for your computers.

Personally, I was using several different tools before (servers: Borg to a local disk and a Hetzner Storage Box, plus Restic to Backblaze B2; my Macs: Arq, also to a local disk, the Storage Box and Backblaze), but I switched to just using Duplicacy everywhere since it has lock-free multi-computer deduplication, which is awesome and saves storage. It also has a nice GUI. As for the destinations, they are the same apart from Backblaze, which I replaced with Wasabi (a rough sketch of my setup is below).

What do you use and where do you store your backups?
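
A rough sketch of how my Duplicacy setup can look (snapshot ID, hostname and paths here are placeholders; only the commands are real Duplicacy CLI):

    # initialise the repository with a default storage (local disk here)
    cd /data/to/backup
    duplicacy init my-laptop /mnt/backup-disk/duplicacy

    # add a second storage (SFTP, e.g. a Hetzner Storage Box) under the same snapshot ID
    duplicacy add storagebox my-laptop sftp://u123456@u123456.your-storagebox.de/duplicacy

    # back up to the default storage, then again to the second one
    duplicacy backup -stats
    duplicacy backup -stats -storage storagebox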


Comments

  • Nekki Veteran

    My servers are not backed up; I couldn't give a shit if the data gets lost. I have the necessary information to recreate any configurations I need.

    My important personal data is backed up to two physical storage devices (one solid state that travels with me, one spinning that stays at my house) and to 2 separate cloud storage providers.

    Thanked by 1desperand
    • Backup to EC2 using rsync
    • Backup to S3 using the MinIO client (rough sketch of both below)
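
    With made-up host, bucket and paths, that could look like:

        # push a directory to an EC2 instance over SSH with rsync
        rsync -az --delete /srv/data/ ec2-user@backup-ec2.example.com:/backup/data/

        # mirror the same directory to an S3 bucket with the MinIO client (mc)
        mc alias set s3 https://s3.amazonaws.com "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY"
        mc mirror --overwrite /srv/data s3/my-backup-bucket/data
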
  • Striking a deal with the Italians.

    It's not always effective, but fine as a backup strategy.

    :)

    For data: Hetzner Storage Box and external HDDs.

  • Everything I can't re-obtain easily: rsync to two off-site locations, with hard-linked snapshots kept so I don't just have the latest version of the content (sketch below). A third resource pulls data from one of those without being able to connect to the sources, or vice versa, as a soft-offline backup. Everything is covered by RAID1 or similar, either using multiple physical drives or the provider promising the same (RAID is not a backup solution, but it does reduce the risk of needing to rebuild/restore due to hardware issues). Everything is encrypted at rest (not that I have anything particularly sensitive). A small selection of important stuff (keys, some personally significant data) is also copied to removable media. A couple of important keys are backed up physically (QR codes on "indestructible" paper, usually stored away from anything with easy access to what they unlock) for extra paranoia.

    How do you test your backups? (which can be as important as taking them in the first place)

    In my case the backup locations scan and verify checksums and send alerts on unexpected changes, to protect against corruption from bad drives or my own error. Checksums of the latest snapshot are also verified against the live data (ignoring recently modified files, to reduce false positives). Some VMs have a low-spec mirror that automatically wipes its data and restores from the latest backup (errors are sent as alerts, and I occasionally check manually that they have recent data), so I know the restore process is solid. These mirrors are not on the same server as what they mirror, so at a pinch they could be given more resources and, with DNS changes, made active in the event of an issue that fully takes out the original.

    All manually scripted, which can be a faff to maintain. I might get around to improving it some day, but it works well enough. The two times recently that I've needed to restore anything (one due to filesystem corruption, the other due to human error), all was good.
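
    A minimal sketch of the hard-linked snapshot part mentioned above (host and paths are made up; the alerting and mirror-restore pieces are left out):

        dest=/backups/host1
        today=$(date +%F)
        prev=$(ls -1d "$dest"/2* 2>/dev/null | tail -n 1)

        # unchanged files become hard links into the previous snapshot, so every
        # snapshot looks complete but only changed files consume new space
        rsync -a --delete ${prev:+--link-dest="$prev"} \
            user@host1:/srv/data/ "$dest/$today/"

        # record checksums so later verification runs can detect silent corruption
        ( cd "$dest/$today" && find . -type f -exec sha256sum {} + > "$dest/$today.sha256" )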

  • Erisa Member
    edited April 2022

    Anything I have as a ZFS dataset uses zfs_uploader to send incremental snapshots to Scaleway Object Storage (C14 Glacier tier)

    Anything else or something that's simpler when expressed as a flat filesystem uses BorgMatic to send to BorgBase ( @m4nu )

    Thanked by 1m4nu
  • Mastodont Member
    edited April 2022

    WinSCP for files, DB managers for databases, an external HDD for my computer. All backups on my side, nothing in the cloud.

  • pbx Member

    rsync to /dev/null. Fast and cheap.

  • @Erisa said:
    Anything I have as a ZFS dataset uses zfs_uploader to send incremental snapshots to Scaleway Object Storage (C14 Glacier tier)

    Anything else or something that's simpler when expressed as a flat filesystem uses BorgMatic to send to BorgBase ( @m4nu )

    what is the advantage of zfs snapshots over regular backups?

  • No backup. Just live on the edge and prepare to start over.

  • Erisa Member
    edited April 2022

    @vitobotta said: what is the advantage of zfs snapshots over regular backups?

    Lots.

    For real though, mainly that ZFS snapshots can be sent as blocks of data rather than as a set of files. You don't need to crawl through every subdirectory and send every (changed) file one by one; you can just stream the however-many-GB snapshot as one. This results in much faster backups since it can use the full speed of your disk.

    You have no overhead from searching through every subdirectory for changed files, since ZFS is already tracking that and can make a snapshot in an instant.
    And ZFS has compression (and encryption if needed) built in, so as long as you remember to keep it when sending the backup, you get those benefits without additional overhead on each and every backup.

    I just generally love using ZFS to manage filesystems and the data within them; it would take far too long to list everything about ZFS that I love and use here.

    If you already use ZFS, then you should look into incremental snapshots and zfs send/zfs recv, or the above zfs_uploader project, for making backups from it.

    If you don't use ZFS, don't sweat the details, just use Borgmatic and be happy.
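
    A minimal sketch of the zfs send/recv flow (pool, dataset and host names are made up; zfs_uploader wraps a similar idea but pushes to object storage instead of another host):

        # snapshots are instant because ZFS already tracks changed blocks
        zfs snapshot tank/data@2022-04-01

        # first run: stream the whole snapshot to the backup host in one go
        zfs send tank/data@2022-04-01 | ssh backup-host zfs recv -u backup/data

        # later runs: take a new snapshot and send only the blocks changed since the last one
        zfs snapshot tank/data@2022-04-02
        zfs send -i tank/data@2022-04-01 tank/data@2022-04-02 | ssh backup-host zfs recv -u backup/data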

  • @Erisa said:

    @vitobotta said: what is the advantage of zfs snapshots over regular backups?

    Lots.

    For real though, mainly that ZFS snapshots can be sent as blocks of data rather than as a set of files. You don't need to crawl through every subdirectory and send every (changed) file one by one; you can just stream the however-many-GB snapshot as one. This results in much faster backups since it can use the full speed of your disk.

    You have no overhead from searching through every subdirectory for changed files, since ZFS is already tracking that and can make a snapshot in an instant.
    And ZFS has compression (and encryption if needed) built in, so as long as you remember to keep it when sending the backup, you get those benefits without additional overhead on each and every backup.

    I just generally love using ZFS to manage filesystems and the data within them; it would take far too long to list everything about ZFS that I love and use here.

    If you already use ZFS, then you should look into incremental snapshots and zfs send/zfs recv, or the above zfs_uploader project, for making backups from it.

    If you don't use ZFS, don't sweat the details, just use Borgmatic and be happy.

    Interesting. I would have to reformat my disks though :(
    But how do you restore the data on another system? For example if you want to restore just something on say a Mac?

  • Erisa Member
    edited April 2022

    @vitobotta said: Interesting. I would have to reformat my disks though :(

    But how do you restore the data on another system? For example if you want to restore just something on say a Mac?

    One disadvantage of the full ZFS setup is not being able to retrieve a single file from your backup. I have to restore the entire dataset (typically a folder, like your home folder; a specific service may also have its own dataset).

    To restore, I would create a new pool on the target if there wasn't already one, and then use the restore command of zfs_uploader, or otherwise run zfs recv with the snapshot and related data manually, and write it to the pool as a new dataset (or as an addition to an existing one).

    On Mac this is naturally a lot harder since ZFS is primarily designed for Linux. I also don't have a Mac, so I can't speak much further than that.

    ZFS is like... an acquired taste. The people who use it generally love it, but it changes the way you approach everything about server setup. It's so much more than a regular filesystem and I love it for that, but I wouldn't recommend it to someone unless I knew they were willing to invest the effort to get it working beautifully for their setup.

    Generally you should just stick to what you know and use Borgmatic or similar, unless you're in the mood for going down a huge rabbit hole.
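
    Roughly what a restore onto a fresh machine might look like (device, pool and dataset names are made up):

        # create a pool on the target if there isn't one already
        zpool create tank /dev/sdb

        # pull the snapshot back from the backup host and receive it as a new dataset
        ssh backup-host zfs send backup/data@2022-04-02 | zfs recv tank/restored-data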

    Thanked by 1abtdw
  • Backups are always a question of efficiency.

    Backups are important... but how often do you really use them?

    Some people may have backups of their backups of their backups of their backups or 365 different backups they can restore from. And I'm not knocking that, but at some point you start spending more on resources (i.e. the CPU cycles it takes to generate these backups, the disk space necessary to store them, etc.) than those backups are really worth.

    Additionally, it depends on the overall size of the account as well. It's one thing to back up a static 50MB account. It's another to back up a dynamic 50GB account. Holding multiple full backups of a 50GB account will quickly consume a lot of disk space. You can mitigate the disk space used by only backing up the deltas, but again, if the account is dynamic it's going to cost a lot of CPU cycles to build those deltas.

    As far as typical web hosting backups go - I like to do what I call a split backup. I create a backup package of just the bare minimum needed to recreate the account upon restore, then use rsync to sync the home directories of the accounts (see the sketch below). The home directory is TYPICALLY where the majority of the account's disk space is used. For example's sake - I use the --skiphomedir parameter in cPanel's pkgacct script to generate these bare-minimum packages. I could probably strip these down further by excluding database dumps from the packages and then dumping databases separately and rsyncing those, but the savings from doing this are negligible for the most part.

    Then when I need to restore, I have the backup package that can recreate the account, and I just copy the rsync'd home directory back over. Restores are probably a little slower this way compared to conventional backup restores... but again, how often am I restoring one of these backups? Not often. And it saves a lot of time in the backup process.
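
    A rough sketch of that split flow, assuming cPanel's pkgacct script (the account loop and destination paths are made up and simplified):

        for user in $(ls /var/cpanel/users); do
            # minimal package: account metadata and config, without the home directory
            /scripts/pkgacct --skiphomedir "$user" /backup/pkg/

            # home directory synced separately; only changed files are transferred
            rsync -a --delete "/home/$user/" "/backup/home/$user/"
        done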

    Thanked by 1desperand
  • Lee Veteran

    @sparek said: Some people may have backups of their backups of their backups of their backups or 365 different backups they can restore from.

    And never once do they actually check that those backups are usable until it's too late.

    Thanked by 1yongsiklee
  • @Erisa said:

    @vitobotta said: Interesting. I would have to reformat my disks though :(

    But how do you restore the data on another system? For example if you want to restore just something on say a Mac?

    One disadvantage of the full ZFS setup is not being able to retrieve a single file from your backup. I have to restore the entire dataset (typically a folder, like your home folder; a specific service may also have its own dataset).

    To restore, I would create a new pool on the target if there wasn't already one, and then use the restore command of zfs_uploader, or otherwise run zfs recv with the snapshot and related data manually, and write it to the pool as a new dataset (or as an addition to an existing one).

    On Mac this is naturally a lot harder since ZFS is primarily designed for Linux. I also don't have a Mac, so I can't speak much further than that.

    ZFS is like... an acquired taste. The people who use it generally love it, but it changes the way you approach everything about server setup. It's so much more than a regular filesystem and I love it for that, but I wouldn't recommend it to someone unless I knew they were willing to invest the effort to get it working beautifully for their setup.

    Generally you should just stick to what you know and use Borgmatic or similar, unless you're in the mood for going down a huge rabbit hole.

    Then it sounds very inflexible to me. I prefer regular backups.

    Thanked by 1desperand
  • @sparek said: Holding multiple full backups of a 50GB account will quickly consume a lot of disk space.

    Depending on what sort of account you are talking about, there are techniques to manage that somewhat. If you are holding multiple copies of an entire hosting account (/home, DB backups, ...), each copy in its own compressed archive, then you are going to use a lot of space quickly.

    My main backups use rsync and hard-link based snapshots: if very little has changed, a new snapshot takes very little space. This doesn't work for large files such as database backups; for those you need to start looking at log shipping or differentials if you want PiT recovery. It is also a per-file method, and piles of hard links can be inefficient in their own ways.

    There are other de-duplication methods, even some based on backing up whole filesystems this way; you just have to pick the right tool for the data and modification patterns you have, and the restoration options you (and your users, if you are not just looking after your own data) need or desire.

    Fully deduplicated backups are more susceptible to partial failure (bad sectors, filesystem corruption) because every copy of a given file in every snapshot is potentially the same set of bits on disk, so if one copy is corrupt they all are. Again, there are mitigations for this: multiple backup stores, keeping multiple snapshot chains within the same backup store, or breaking the chain occasionally and converting a snapshot into a full backup, for instance (I use the first two of those; next time I rearrange I might switch 2 for 3 to save a little bandwidth when updating the backups of data from my phone or laptop while on a slow link).

    but at some point you start spending more on resources (i.e. the CPU cycles it takes to generate these backups, the disk space necessary to store them, etc.) than those backups are really worth.

    This is very true. I go a bit OTT for some of my stuff, but even then a lot of data isn't backed up at all because I can re-obtain it easily enough (music & videos mainly), or it isn't something I'll really care about losing should something bad happen.

  • Erisa Member

    @vitobotta said: Then it sounds very inflexible to me. I prefer regular backups.

    Good, because it's not a regular backup. I was just sharing what I use.

  • desperand Member
    edited April 2022

    @vitobotta said: What do you use and where do you store your backups?

    If a Linux server -> depends on the project. If the project is small -> just archive it (zstd) plus add encryption, and upload with rclone to any cloud storage of mine (sketch below). If the project is medium or big -> incremental backups with restic (but the restic cache annoys me way too much).

    If Windows -> Macrium Reflect (a fucking killer in the market feature-wise; also, they have a free version). I create my own golden image, and it has already saved me tons of time restoring from a snapshot when things got weird on my PC (I love software dev, and experimenting with different tools from GitHub and other trashcans xD).

    As for "where to store" -> always a question. The easiest and cheapest solution is to purchase any 2TB HDD or similar plus an HDD enclosure, and voilà -> fast store/restore of my local PC data in an encrypted way. Also recommended: gocryptfs (everywhere, even on Android) / cppcryptfs (for Windows). Low overhead, nice features.
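
    A rough sketch of the small-project flow above (remote name and paths are made up; openssl stands in here for the unspecified encryption step):

        # archive -> compress with zstd -> encrypt -> stream straight to cloud storage with rclone
        tar -C /srv -cf - myproject \
          | zstd -19 -T0 \
          | openssl enc -aes-256-cbc -salt -pbkdf2 -pass file:/root/.backup-pass \
          | rclone rcat mycloud:backups/myproject-$(date +%F).tar.zst.enc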

  • stefeman Member
    edited April 2022

    @vitobotta said:
    What's your backup strategy?

    YOLO

    I'm a true degenerate.

  • Hi @vitobotta,
    Just here to tell you that I have really enjoyed all of your previous posts. You are asking the right questions and starting really interesting topics.
    By reading how others do stuff, I'm also learning new stuff too.

    Keep it up. Thank you!

    Thanked by 2desperand yongsiklee
  • @Talistech said:
    Hi @vitobotta,
    Just here to tell you that I have really enjoyed all of your previous posts. You are asking the right questions and starting really interesting topics.
    By reading how others do stuff, I'm also learning new stuff too.

    Keep it up. Thank you!

    Thanks! Glad my posts are useful :)

    Thanked by 1desperand
  • nfn Veteran

    Restic backup to rsync.net, rclone to B2, a live mirror server and some regional servers.

  • Pictures on DVD and NAS, rest of the files on NAS.

  • RAID 0, no backups taken at all. If the server fails, blame the provider for not providing DR.

  • Nothing overcomplicated: just Google Drive and a local backup PC, and I check the backups regularly.

  • Francisco Top Host, Host Rep, Veteran

    Encrypted zip file named after the latest marvel movie, uploaded to thepiratebay.

    Francisco

    Thanked by 3rcy026 fan Decicus
  • @Francisco said:
    Encrypted zip file named after the latest marvel movie, uploaded to thepiratebay.

    Francisco

    I'd seed your backups

    Thanked by 2Francisco Erisa
  • Homie Member

    Duplicati > Backblaze B2

  • @Homie said:
    Duplicati > Backblaze B2

    I used Duplicati in the past and it choked on large backups.

  • For my personal servers, I have a shell script that I simply copy to any new server and add to cron. The script then sets up the needed backup by checking what's installed, which directories exist, etc. It takes me 10 seconds and it is fully dynamic, so I never have to think about adding things to the backup whenever I install something new.
    The script uses Restic against a storage VPS running restic's rest-server, and the data on that server is duplicated to another storage VPS at another provider (rough sketch below).
    My PCs at home store everything on a NAS. That NAS is also backed up to the same rest-server. There is no local data on any PC that I cannot recreate.
    I use restic2influx to get performance metrics into Grafana, and Icinga notifies me as soon as any repository has not been touched for 24 hours.

    It has worked flawlessly for years, and I often ponder how effective such an extremely simple solution actually is. I've helped companies spend tens of thousands of dollars on backup solutions that performed much worse.
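
    For reference, a minimal sketch of what the restic side of a setup like that can look like (host, repository path and file list are made up; the dynamic what's-installed detection is left out):

        export RESTIC_REPOSITORY=rest:https://backup.example.com:8000/$(hostname)
        export RESTIC_PASSWORD_FILE=/root/.restic-pass

        # create the repository on the first run only
        restic snapshots >/dev/null 2>&1 || restic init

        restic backup /etc /var/www /var/lib/mysql-dumps
        restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune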
