High bandwidth usage from my backups
I have two of my VPSes which archive /var/www/ and a MySQL dump once a day, then I use another VPS to download them from the VPSes.
The backups together are about 500MB and are downloaded once a day with SFTP, so bandwidth usage should be ~15GB/mo, right?
But it has already used 50GB of bandwidth in 5 days.
One thing I noticed the other day: one of the VPSes it pulls backups from had 30+ SSH processes open and everything had crashed, but that hasn't happened since.
Here's my backup script
sftp -P port [email protected]:/*.7z backups/site1/;
sftp -P port [email protected]:/*.7z backups/site2/;
and crontab is set to run it every 24 hours
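For reference, the crontab entry looks roughly like this (the script path and time are just examples):
# m h dom mon dow  command
0 3 * * * /root/pull-backups.sh >> /var/log/pull-backups.log 2>&1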
Comments
Where are you seeing the 50GB usage?
SolusVM
P.S. On the VPSes being backed up, when the backup script runs, the old backup.7z is deleted first, then replaced, so it's not downloading 1 backup on day 1, 2 on day 2, 3 on day 3, etc.
Use rsync, not sftp. No need to transfer the same data day after day after day....
Transfer files + mysqldumps to the backup server with rsync, then build redundancy on the backup server instead of archiving things on the production servers.
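Something like this, run from the backup VPS (host names, port and paths are just placeholders):
# pull the live files and the latest dump into a plain mirror
rsync -az -e "ssh -p PORT" user@server1:/var/www/ /backups/site1/www/
rsync -az -e "ssh -p PORT" user@server1:/var/backups/db.sql /backups/site1/db.sql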
But if something happens and everything gets deleted on one of the production servers, won't that remove everything from the backup server, since it's synced?
Only the latest scratch copy; he's saying to archive on the backup server, so only the scratch dir would be hurt in that case.
Not as bad as when one of my backup scripts went wrong...
Updated post to include an example.
Don't have the --delete flag on.
My recommendation is: don't compress your backups before you send them to the backup server. Then use git or rsync instead of sftp, which should be better. At least with git you will only send the diffed changes; with rsync I never remember if it sends the file diffs or just the files that changed.
ah cool, I will look into rsync
thanks guys.
Ouch, did you get charged overage fees?
Disagree
If you don't use the --delete flag, then your synced copy will contain files that were intentionally removed from the live site. When the time comes to restore from the backup (which it will) you'll be restoring those files, which should not be there. Endless confusion ensues.
Your rsync copy should be a mirror. New files are added, deleted files are removed, changed files are updated.
To protect against loss or damage (e.g. a hacked site, or inadvertent user changes) you build redundancy on the backup server. So if the most recent backup (e.g. last night's) is corrupt, damaged, or missing files, you can fall back to a previous night.
You can build redundancy either by creating .gz archives (which is terribly wasteful of disk space) or by using an app designed to do it efficiently, like rdiff-backup.
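A rough sketch (paths, host and port are placeholders): rsync into a plain mirror, then snapshot that mirror locally with rdiff-backup so you can step back to earlier nights.
# nightly on the backup server
rsync -az --delete -e "ssh -p PORT" user@server1:/var/www/ /backups/site1/mirror/
rdiff-backup /backups/site1/mirror/ /backups/site1/history/
rdiff-backup --remove-older-than 4W /backups/site1/history/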
I just realised something: right now my backup method pulls a password-protected archive from the other servers, so if it gets compromised, that's the only thing they'll have access to.
If I use rsync, I'll need to have an account that has full read access to /var/www/ won't I? And the attacker will have access to every upload, config file and DB password (as well as a DB dump) on all of my servers.
To be honest the production servers are fine with the load, but it's still using a lot of bandwidth (it's used 27GB already since I made this thread).
Does anyone have any ideas why it's using so much bandwidth? Should I use rsync just for the backup archives instead of sftp? And if so, how can I set it up so the backup user is locked to its home directory? (Right now I'm using this for SFTP: https://wiki.archlinux.org/index.php/SFTP_chroot)
Use authprogs to restrict what an SSH user can do: https://pypi.python.org/pypi/authprogs/0.5.1
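If you want to stay with stock OpenSSH instead, a forced command in authorized_keys does a similar job; a rough sketch for a read-only rsync key (the rrsync helper ships with rsync, its install path varies by distro, and the key is a placeholder):
# on the production server, in the backup user's ~/.ssh/authorized_keys:
command="/usr/bin/rrsync -ro /var/www",no-pty,no-agent-forwarding,no-port-forwarding ssh-ed25519 AAAA...placeholder backup@backup-vps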
Use duplicity over rsync for the encryption and the compression. Both function similarly in efficiency (duplicity uses rsync's diff library, but doesn't demand anything from the remote server). Think of it like a poor man's rsync. Then you can continue using sftp.
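Roughly like this, run on the production server and pushed over SFTP (host, paths and the passphrase handling are placeholders):
export PASSPHRASE='long-random-passphrase'   # used by duplicity for GPG encryption
duplicity /var/www sftp://backup@backuphost//backups/site1
duplicity remove-older-than 30D --force sftp://backup@backuphost//backups/site1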
+1 for duplicity.
Another option is to use rdiff-backup if you are comfortable with hard links.
I just solved it; it was a stupid cron mistake.
I'm gonna look into these other methods too though, thanks everyone.