How would you provide clients access to easily download extremely large backups?

13

Comments

  • Francisco Top Host, Host Rep, Veteran
    edited May 2025

    @LordSpock said: Oh that's a neat idea.

    To be fair this is exactly how Google Takeout works. You request a backup and after some time (hours or days) you get a multi part zip file (2GB chunks?). It works and most people will understand how to work with it.

    The same staffer that is adamant about this did a POC of my streaming gzip as well. It works well, but with 2 large caveats. You don't know how big the archive is, so you never know when you're 'done' other than when the stream disconnects. Did it disconnect because it's all done or because of a network disruption? No idea.

    There's no way to know the final size with the streaming gzip option, nor any way to resume (since we have no guarantee that the files will be in the exact same order).

    The archive option is more involved since we have to spend the time creating the archive and then we have to allocate those additional resources to store the archive for whatever the hold time is (1 week?). It's not a big deal, we have a PB or two just for backups at the moment, with us using only ~100T of it.

    Francisco
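A minimal sketch of the multi-part idea discussed above, assuming the archive has already been built on disk. Each part gets a size and SHA-256 entry in a manifest, so a client can tell which parts are missing or damaged and re-fetch only those. The function names and the manifest layout are hypothetical, not anything Namecrane or Google Takeout is known to use.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Iterator, List, Tuple

Manifest = List[Tuple[int, int, str]]  # (part_number, size, sha256_hex)


def split_stream(src: BinaryIO, part_size: int) -> Iterator[bytes]:
    """Yield consecutive parts of at most part_size bytes from a buffered file."""
    while True:
        part = src.read(part_size)
        if not part:
            break
        yield part


def build_manifest(src: BinaryIO, part_size: int) -> Manifest:
    """Describe every part so the client knows the total up front."""
    manifest = []
    for number, part in enumerate(split_stream(src, part_size), start=1):
        manifest.append((number, len(part), hashlib.sha256(part).hexdigest()))
    return manifest


def missing_parts(manifest: Manifest, downloaded: Dict[int, bytes]) -> List[int]:
    """Return part numbers that are absent, short, or fail their checksum."""
    bad = []
    for number, size, digest in manifest:
        data = downloaded.get(number)
        if (data is None or len(data) != size
                or hashlib.sha256(data).hexdigest() != digest):
            bad.append(number)
    return bad
```

With a manifest like this the client knows the total size before it starts, and a failed transfer only costs one part rather than the whole archive.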

  • JabJab Member
    edited May 2025

    As there is no standardized way of importing those backups anywhere (it depends on the platform used), splitting things into different archives is a must. Most people are gonna take out mails, fuck the memos/notes, maybe someone will care about files?

    I don't get the streaming idea. In theory it's a nice idea, but in practice working with big files without a resume option (or chunks for multithreaded downloads) is a PITA. You will end up with a client downloading things to some remote shitty host at 8kB/s and the archive will be 300GB. Will you hold it in memory for weeks? Then of course it's gonna die like 157175 times and you will end up with 157175 attempts to download that huge thing, RIP memory/CPU/IO.

    Pack it on demand (could be via API, could be via ticket) and ship it to some remote S3. Remote as in the client's S3, not yours. He controls that S3, so only he can be blamed for speed / lack of space / cost, and he decides how long he wants to store it there. He is in control of his data.

    Plus if you ever end up with EU mail (still waaaaaaaaaiting), you won't be harassed for uploading EU data to a non-EU server?

    And yeah, on demand: most people won't care, and otherwise people are gonna forget about it and be angry that they stopped using your service (without terminating, of course) while you kept putting data into their S3 every week and their bill racked up!!1111

    Plus you can control it so you won't get (D)DOS-ed by everyone clicking "Backup" at the same time.
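One way to keep on-demand exports from turning into a self-inflicted (D)DoS is a small admission gate: a per-domain cooldown plus a cap on how many export jobs run at once. This is only a sketch; the class name, the limits, and the injectable clock are assumptions for illustration, and the actual packing and upload to the client's S3 bucket would happen inside the job that the gate admits.

```python
import time
from typing import Callable, Dict, Set


class ExportGate:
    """Admit export requests under a per-domain cooldown and a global cap."""

    def __init__(self, cooldown_s: float, max_running: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown_s = cooldown_s
        self.max_running = max_running
        self.clock = clock
        self.last_start: Dict[str, float] = {}
        self.running: Set[str] = set()

    def try_start(self, domain: str) -> bool:
        """Return True and mark the job as running if the request is allowed."""
        now = self.clock()
        if domain in self.running:
            return False  # an export for this domain is already in progress
        if len(self.running) >= self.max_running:
            return False  # too many exports running right now
        last = self.last_start.get(domain)
        if last is not None and now - last < self.cooldown_s:
            return False  # this domain exported too recently
        self.last_start[domain] = now
        self.running.add(domain)
        return True

    def finish(self, domain: str) -> None:
        """Mark the export as done (success or failure)."""
        self.running.discard(domain)
```

A rejected request could be queued or answered with "try again later" instead of being dropped; that choice is left open here.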

  • What fastmail does is take regular snapshots of their data (on a per-user basis) and then when you need to restore something, they "mount" a read-only copy of the selected version to the webmail client under a "restore" folder that is available for ~24 hours.

    The items (email, files, calendars, contacts) show under the corresponding "shared" section, as if another user had shared the folder, but I assume it's driven by some service account.

    To restore anything, they just select the desired item and copy it over to the desired folder like normal.

    Very user friendly, and they don't really need to learn anything new.

    Would be a bit annoying for an admin if they needed to restore a ton of accounts at once but tbh most of the time it's just one person who made the mistake...

    It does take a few minutes to load (it starts with an empty folder and then starts importing) but that's fine... They already survived ~30 days with the email "missing" with it sitting in their trash before it got expunged - they can wait a little longer.

  • Francisco Top Host, Host Rep, Veteran

    @JabJab said: As there is no standardized way of importing those backups anywhere (it depends on the platform used), splitting things into different archives is a must.

    You can always install a copy of smartermail, drop the whole domain in place, and run with it.

    It isn't a PST file, that'd require we have login/passwords to (attempt to) export.

    Francisco

  • Francisco Top Host, Host Rep, Veteran

    @macmouse said: What fastmail does is take regular snapshots of their data (on a per-user basis) and then when you need to restore something, they "mount" a read-only copy of the selected version to the webmail client under a "restore" folder that is available for ~24 hours.

    This isn't for 'restoring' within Namecrane. We'll have to think up something clever there.

    This is more for people that want their data so they can take it somewhere else altogether (or just to have it for legal archival/hold reasons).

    Francisco

  • JabJab Member

    @Francisco said: You can always install a copy of smartermail, drop the whole domain in place, and run with it.

    I mean most people (I assume) who migrate / back up from a hosted service are gonna go to another hosted service, using a different app stack.

  • Francisco Top Host, Host Rep, Veteran

    @JabJab said: I mean most people (I assume) who migrate / back up from a hosted service are gonna go to another hosted service, using a different app stack.

    Fair, and you're right, there isn't some standard in place to make that easy to do. We could try to export PST's that someone could then import and 'restore', but Outlook already does that locally. You can just save the PST, attach it to a new inbox, and push the data back.

    If you have a big inbox that's going to suck ass, but that's your problem.

    Francisco

  • Oops, my bad..

    IMO these are three different use cases that need three different solutions.

    For offsite backup purposes, you want to efficiently de-duplicate all the data and store it somewhere reliable.

    For migration purposes, it always depends on what tools they have and what the new destination service supports and it's always a little different each time..

    Most of the time you end up with having to move to the lowest common denominator (PST if it's microsoft/outlook world or in *nix land a folder of maildir/etc files).

    I would keep it simple and have one archive file per user in some standardized format...
    domain.com-username.zip or what have you.

    While some of the nicer services provide a user interface to migrate stuff (although generally that is using imap-to-imap but then you don't need to have an export in the first place), I would say half the time you end up having to run a script on your computer anyway because everyone has slightly different file formats.

    For archival/legal purposes, the data generally needs to be fed into a specialized database that is set up to lock the records (emails) and prevent them from being modified/deleted before a certain period of time has passed (this varies by industry and the specific regulations).

    The data also needs to be searchable, which generally means it needs to end up in some kind of a database...

    Since you're now using a database, might as well use a SQL connector or load in a nightly "file" with just the changes. Providing the whole data set every time will not be practical.

    At one construction firm I worked at, it was ~7 years because that is what the local regulatory commission wanted, but at a fintech firm it was like 20 years due to federal finance regulations.

    Doing a "once a day snapshot of everything" was good enough for the construction firm, and we used off-the-shelf software (MailStore) for that, running on the archive NAS. It read from "production" over IMAP (using a special service account that saw everything) and then fed the mail into the Microsoft(?) SQL server that MailStore used on the backend.

    However, the fintech one required us to store copies of every revision of emails, including draft versions they saved (even if it was later deleted and not actually sent anywhere).

    The latter in particular was a big pain and we ended up having to switch to an email server using an SQL backend that was "write only", so it kept every possible revision.

    Besides the regular archival job, we would have to do regular re-imports (alternating between the primary and standby server) so the expunged records were actually removed (make an empty table on the other server, sync the non-expunged messages, and then switch which one is primary), because Outlook gets grumpy when it has too much to slog through...
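The nightly "file with just the changes" mentioned above can be sketched as a diff between two snapshots. This assumes each snapshot is available as a mapping from a stable message ID to a content hash; that representation and the names below are hypothetical, not how MailStore or any particular mail server stores things.

```python
from typing import Dict, List, NamedTuple


class Delta(NamedTuple):
    added: List[str]    # message IDs that are new since the last snapshot
    changed: List[str]  # message IDs whose content hash differs
    removed: List[str]  # message IDs that disappeared (e.g. expunged)


def nightly_delta(previous: Dict[str, str], current: Dict[str, str]) -> Delta:
    """Compare two snapshots that map message ID -> content hash."""
    added = sorted(k for k in current if k not in previous)
    changed = sorted(k for k in current
                     if k in previous and current[k] != previous[k])
    removed = sorted(k for k in previous if k not in current)
    return Delta(added, changed, removed)
```

For a legal-hold archive the `removed` list would presumably be recorded rather than acted on, since the point is that records stay put until the retention period ends.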

  • @Francisco said:

    @cmeerw said: How is the customer supposed to use that backup? Set up their own cranemail compatible service to access the data? Is that documented somewhere how to do that?

    You can take the backup, throw it into a smartermail install, and be on your way. Smartermail has a free tier that allows 1 domain and 10 users. File Storage items are stored as the whole file w/ the correct name/path, making it easy enough to cherry pick.

    To be fair though, Google Takeout probably doesn't have an "Eat in" option where you can import the backup, do they? I'm assuming it's a one way trip. I think protonmail has an 'import' option.

    imapsync would allow 2 way push/pull on this as you said, but we do have people that want access to download backups. imapsync also requires you have every user's login/password, which is rarely the case.

    You can import gmail and Microsoft from within the web gui settings.

    Microsoft and Google allow mailboxes to delegate access to admins. I know for 365, it was PowerShell only to enable.

  • Francisco Top Host, Host Rep, Veteran

    @TimboJones said: You can import gmail and Microsoft from within the web gui settings.

    Microsoft and Google allow mailboxes to delegate access to admins. I know for 365, it was PowerShell only to enable.

    I think you're going in the wrong direction.

    We're talking about exporting all emails/etc on a domain into some format a user can then restore to another smartermail install, or what have you.

    Francisco

  • schwabene Member
    edited May 2025

    @Francisco said:

    Ideas? :)

    Francisco

    Here is one.
    You're a hosting business after all.

    1. For each user, set up a folder on a separate machine with rsync write access for you, and read-only access for them (e.g., via SFTP). This way, they cannot delete the folder and trigger a full resync.
    2. In your backup process, rsync to BOTH your ZFS storage (as you normally do) and the user's folder. This adds redundancy without affecting your current backups.
    3. If the user wants versioning (by simply creating daily archives or doing borg backups), sell them a storage slab on the same machine and let them handle it themselves - or maybe you’ll decide to handle it for them..

    Result: No need to transfer 10TB backup files around.

  • Francisco Top Host, Host Rep, Veteran

    @schwabene said: Result: No need to transfer 10TB backup files around.

    But then we need to keep at least 2 copies of the data... when we already do fairly paranoid R60's (6 disk vdev's).

    Since I opened this thread we've had time to test different setups, and we'll be replacing our rsync/zfs layer with restic. The reason is that with zfs/rsync we have to keep track of how much storage each remote node is using and shuffle things around. We can use JBODs and such, but that gets really iffy.

    Restic has an S3 backend and we have an S3 platform that easily scales out (we can just throw hardware at it and it's all under the same endpoint). Giving users a "mostly read-only" login would be fine (basically only has write access to the locks folder, or we just tell them to use the ignore locks option in restic).

    With that we're keeping 1 complete/full copy and then if we want to offer the tar/zips, we can without too much effort, just a little bit of storage bloat.

    Francisco
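From the customer side, access to a restic repository on an S3 backend might look roughly like the sketch below. It only builds the command line and environment; the endpoint, the bucket name, and the one-bucket-per-domain layout are placeholders and assumptions, not details confirmed in this thread. `--no-lock` is restic's global flag for skipping repository locks, which corresponds to the "ignore locks" option mentioned above for a read-only login.

```python
from typing import Dict, List


def restic_env(access_key: str, secret_key: str, repo_password: str) -> Dict[str, str]:
    """Environment variables restic reads for S3 credentials and the repo password."""
    return {
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "RESTIC_PASSWORD": repo_password,
    }


def restic_cmd(endpoint: str, bucket: str, *args: str) -> List[str]:
    """Build a restic command against an S3 repository without taking locks."""
    repo = f"s3:{endpoint}/{bucket}"  # hypothetical endpoint/bucket layout
    return ["restic", "-r", repo, "--no-lock", *args]


# Example commands (placeholders throughout):
list_snapshots = restic_cmd("https://s3.example.invalid", "example-com", "snapshots")
restore_latest = restic_cmd("https://s3.example.invalid", "example-com",
                            "restore", "latest", "--target", "/tmp/restore")
```

Either list could then be passed to `subprocess.run(cmd, env={**os.environ, **restic_env(...)}, check=True)`.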

    Thanked by 1: mp11
  • So we'll get access to one Restic repository per domain (just in read-only mode)? If so - would "restic mount" work too?

    Or I got it all wrong?

    TIA

  • Francisco Top Host, Host Rep, Veteran

    @ypmLA77zcs said:
    So we'll get access to one Restic repository per domain (just in read-only mode)? If so - would "restic mount" work too?

    Or I got it all wrong?

    TIA

    Mount would probably work too.

    Francisco

    Thanked by 2: ypmLA77zcs, mp11
  • lui Member

    @Francisco said:

    @LordSpock said: Oh that's a neat idea.

    To be fair this is exactly how Google Takeout works. You request a backup and after some time (hours or days) you get a multi part zip file (2GB chunks?). It works and most people will understand how to work with it.

    The same staffer that is adamant about this did a POC of my streaming gzip as well. It works well, but with 2 large caveats. You don't know how big the archive is, so you never know when you're 'done' other than when the stream disconnects. Did it disconnect because it's all done or because of a network disruption? No idea.

    There's no way to know the final size with the streaming gzip option, nor any way to resume (since we have no guarantee that the files will be in the exact same order).

    The archive option is more involved since we have to spend the time creating the archive and then we have to allocate those additional resources to store the archive for whatever the hold time is (1 week?). It's not a big deal, we have a PB or two just for backups at the moment, with us using only ~100T of it.

    Francisco

    You can code the streaming gzip with node.js streams and know whether it failed or succeeded. I've done that multiple times. If you DM me I can send an example.

  • Francisco Top Host, Host Rep, Veteran

    @luissousa said: You can code the streaming gzip with node.js streams and know whether it failed or succeeded. I've done that multiple times. If you DM me I can send an example.

    We already have a working POC of it :) It's just that there's no way to know the final size up front to set Content-Length.

    Francisco

  • cmeerw Member

    @Francisco said: You don't know how big the archive is, so you never know when you're 'done' other than when the stream disconnects. Did it disconnect because it's all done or because of a network disruption? No idea.

    Your http client should be able to tell the difference (even on a raw TCP socket you can tell the difference between a clean shutdown from the server or a network disruption - with a TLS layer on top, your TLS library will tell you as well; and then there is also HTTP chunked transfer encoding that could help)

    But yes, streaming doesn't go together that well with resuming.
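Independent of what the transport reports, the gzip format itself ends with a CRC32 and length trailer, so a client can check after the fact whether a streamed archive arrived whole. A minimal sketch in Python (the function name is hypothetical; it handles a single gzip member and stops at the end of that member):

```python
import zlib
from typing import Iterable


def gzip_stream_is_complete(chunks: Iterable[bytes]) -> bool:
    """Return True if the chunks form a whole gzip member with a valid trailer."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    try:
        for chunk in chunks:
            decoder.decompress(chunk)  # output is discarded, we only validate
            if decoder.eof:
                break
    except zlib.error:
        return False  # corrupt data or checksum mismatch
    # eof is only set once the end-of-stream marker and trailer were read.
    return decoder.eof
```

A download that was cut off mid-stream never reaches the trailer, so the check returns False even if the connection appeared to close normally.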

  • So which option did you go with?

  • Francisco Top Host, Host Rep, Veteran

    @Motion3549 said: So which option did you go with?

    Not decided yet. Akash really wants us to build gzip/zip's upon request, and then the user pulls that down. I think streaming would work nicely, but he's worried about incomplete archives, or archives that we can't resume.

    I agree with that and think we could likely just do the gzip/zip/zst archives without too much issue.

    Francisco
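The resume concern mostly goes away once the archive is a finished file with a known size, because the client can use standard HTTP range requests. A small sketch of the client-side bookkeeping, assuming the server honours `Range` and answers with a `Content-Range` header that includes the total size (the helper names are hypothetical):

```python
import re
from typing import Dict, Tuple


def resume_headers(bytes_on_disk: int) -> Dict[str, str]:
    """Ask for everything after the bytes we already have (none if starting fresh)."""
    if bytes_on_disk <= 0:
        return {}
    return {"Range": f"bytes={bytes_on_disk}-"}


def parse_content_range(value: str) -> Tuple[int, int, int]:
    """Parse 'bytes first-last/total' into integers; unknown totals ('*') are rejected."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last, total = (int(g) for g in match.groups())
    return first, last, total


def download_finished(bytes_on_disk: int, total: int) -> bool:
    """With a known total size, 'done' is an exact byte count rather than a guess."""
    return bytes_on_disk == total
```

This is exactly what the streaming option cannot offer: without a fixed file there is no stable byte offset to resume from.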

  • @Francisco said:

    @Motion3549 said: So which option did you go with?

    Not decided yet. Akash really wants us to build gzip/zip's upon request, and then the user pulls that down. I think streaming would work nicely, but he's worried about incomplete archives, or archives that we can't resume.

    I agree with that and think we could likely just do the gzip/zip/zst archives without too much issue.

    Francisco

    So restic is no longer in the cards?

  • Francisco Top Host, Host Rep, Veteran

    @ypmLA77zcs said:

    @Francisco said:

    @Motion3549 said: So which option did you go with?

    Not decided yet. Akash really wants us to build gzip/zip's upon request, and then the user pulls that down. I think streaming would work nicely, but he's worried about incomplete archives, or archives that we can't resume.

    I agree with that and think we could likely just do the gzip/zip/zst archives without too much issue.

    Francisco

    So restic is no longer in the cards?

    I think restic is fine and doable too. Most users wouldn’t be doing restic though, only the technical/advanced users.

    Francisco

  • Hi @Francisco, what solution did you end up using? Were you able to allow customers to use restic to access mail server backups?

    TIA

  • I would pay a small fee for S3 push, that’d be set up and forget with Backblaze B2

  • Francisco Top Host, Host Rep, Veteran

    @ypmLA77zcs said:
    Hi @Francisco, what solution did you end up using? Were you able to allow customers to use restic to access mail server backups?

    TIA

    We haven’t decided one way or another. I got caught up with some other big projects and this fell by the wayside.

    Francisco

    Thanked by 1: ypmLA77zcs
  • ypmLA77zcs Member
    edited September 2025

    I REALLY hope restic is still a serious contender :smile:

  • Francisco Top Host, Host Rep, Veteran

    @ypmLA77zcs said:
    I REALLY hope restic is still a serious contender :smile:

    I think it’s as close to “ideal” as I can come up with, and wouldn’t be overly complex to implement.

    Francisco

    Thanked by 3: edrebe, ypmLA77zcs, mp11
  • I'm on macOS so I just use https://thehorcrux.com to back up my email

    Thanked by 1: ypmLA77zcs
  • @vitobotta said:
    I'm on macOS so I just use https://thehorcrux.com to back up my email

    That sounds interesting, but I'd rather minimize the number of programs installed and stay away from a proprietary backup scheme. I'm using restic already for my backups, so it only makes sense to have access to a restic repository for my email.

  • For huge backups (10TB+ especially) I'd suggest the option to physically mail a hard drive with the contents. Place a hold on the user's credit card for the value of the drive until it is returned.

    Thanked by 1: OpaqueRegistrant
  • As a customer, either S3 or restic works for me. I feel sending to an S3 bucket is clean and versatile. Later, you can develop features based on S3, such as using restic to back up to S3, which combines the advantages of these two tools.
