
Comments
To be fair, this is exactly how Google Takeout works. You request a backup and after some time (hours or days) you get a multi-part zip file (2GB chunks?). It works and most people will understand how to work with it.
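A minimal sketch of the multi-part idea, not anything Google or Namecrane actually runs: cut the finished archive into fixed-size parts and publish a checksum per part, so a client can verify or re-fetch a single part instead of the whole thing. The `backup.zip.NNN` naming and the tiny part size are assumptions for the demo only.

```python
import hashlib
import io
from typing import BinaryIO, Iterator


def split_stream(src: BinaryIO, part_size: int) -> Iterator[tuple[str, bytes, str]]:
    """Yield (part_name, data, sha256_hex) for fixed-size parts of src."""
    index = 1
    while True:
        # For regular files and BytesIO this returns part_size bytes
        # except for the last, shorter part.
        data = src.read(part_size)
        if not data:
            break
        name = f"backup.zip.{index:03d}"  # hypothetical naming scheme
        yield name, data, hashlib.sha256(data).hexdigest()
        index += 1


# Tiny in-memory demo: 10 bytes split into 4-byte parts.
parts = list(split_stream(io.BytesIO(b"0123456789"), part_size=4))
manifest = {name: digest for name, _, digest in parts}
rejoined = b"".join(data for _, data, _ in parts)
```

With a manifest like this the total size and part count are known up front, which is exactly what the streaming approach cannot offer.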
The same staffer that is adamant about this did a POC of my streaming gzip as well. It works well, but with 2 large caveats. You don't know how big the archive is, so you never know when you're 'done' other than when the stream disconnects. Did it disconnect because it's all done or because of a network disruption? No idea.
There's no way to know the final size with the streaming gzip option, nor any way to resume (since we have no guarantee that the files will be in the exact same order).
The archive option is more involved since we have to spend the time creating the archive and then we have to allocate those additional resources to store the archive for whatever the hold time is (1 week?). It's not a big deal, we have a PB or two just for backups at the moment, with us using only ~100T of it.
Francisco
As there is no standardized way of importing those backups anywhere (it depends on the platform used), splitting things into different archives is a must. Most people are gonna take out mails, fuck the memos/notes, maybe someone will care about files?
I don't get the streaming idea. In theory it's a nice idea, but in practice working with big files without a resume option (or chunks for multithreaded downloads) is a PITA: you will end up with a client downloading things to some remote shitty host at 8kB/s and the archive will be 300GB. Will you hold it in memory for weeks? Then of course it's gonna die like 157175 times and you will end up with 157175 attempts to download that huge thing, RIP memory/CPU/IO.
Pack it on demand [could be API, could be ticket] and ship it to some remote S3. Remote, the client's S3, not yours. He controls that S3, so only he can be blamed for speed / lack of space / cost and how long he wants to store it there. He is in control of his data.
Plus if you ever end up with EU mail (still waaaaaaaaaiting) you won't be harassed for uploading EU data to a non-EU server?
And yeah, on demand: most people won't care, and otherwise people are gonna forget about it and be angry that they stopped using your service (without terminating, of course) while you kept putting data into their S3 every week and their bill racked up!!1111
Plus you can control it so you won't get (D)DOS-ed by everyone clicking "Backup" at the same time.
What Fastmail does is take regular snapshots of their data (on a per-user basis) and then when you need to restore something, they "mount" a read-only copy of the selected version to the webmail client under a "restore" folder that is available for ~24 hours.
The items (email, files, calendars, contacts) show under the corresponding "shared" section, as if another user had shared the folder, but I assume it's driven by some service account.
For anything they want to restore, they just select the desired item and copy it over to the desired folder like normal.
Very user friendly, and users don't really need to learn anything new.
Would be a bit annoying for an admin if they needed to restore a ton of accounts at once but tbh most of the time it's just one person who made the mistake...
It does take a few minutes to load (it starts with an empty folder and then starts importing) but that's fine... They already survived ~30 days with the email "missing" with it sitting in their trash before it got expunged - they can wait a little longer.
You can always install a copy of SmarterMail, drop the whole domain in place and run with it.
It isn't a PST file, that'd require we have login/passwords to (attempt to) export.
Francisco
This isn't for 'restoring' within Namecrane. We'll have to think up something clever there.
This is more for people that want their data so they can take it somewhere else all together (or just to have it for legal archival/hold reasons).
Francisco
I mean most people (I assume) that will migrate / backup from a hosted service are gonna go to another hosted service, using a different app stack.
Fair, and you're right, there isn't some standard in place to make that easy to do. We could try to export PST's that someone could then import and 'restore', but Outlook already does that locally. You can just save the PST, attach it to a new inbox, and push the data back.
If you have a big inbox that's going to suck ass, but that's your problem.
Francisco
Oops, my bad..
IMO these are three different use cases that need three different solutions.
For offsite backup purposes, you want to efficiently de-duplicate all the data and store it somewhere reliable.
For migration purposes, it always depends on what tools they have and what the new destination service supports and it's always a little different each time..
Most of the time you end up having to move to the lowest common denominator (PST if it's the Microsoft/Outlook world, or in *nix land a folder of maildir/etc files).
I would keep it simple and have one archive file per user in some standardized format...
domain.com-username.zip or what have you.
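Something like this, as a minimal sketch: walk one user's mail folder and zip it into `domain.com-username.zip`. The `mail/<domain>/<user>/...` directory layout is an assumption for the demo, not how any particular mail server stores data.

```python
import tempfile
import zipfile
from pathlib import Path


def export_user(mail_root: Path, domain: str, user: str, out_dir: Path) -> Path:
    """Zip everything under mail_root/domain/user into domain-user.zip."""
    user_dir = mail_root / domain / user
    archive = out_dir / f"{domain}-{user}.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(user_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the user's folder.
                zf.write(path, arcname=path.relative_to(user_dir).as_posix())
    return archive


# Demo with a throwaway directory tree (the layout is hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "mail"
    inbox = root / "domain.com" / "username" / "Inbox" / "cur"
    inbox.mkdir(parents=True)
    (inbox / "1.eml").write_text("Subject: hello\n\nhi\n")
    out = Path(tmp) / "out"
    out.mkdir()
    archive = export_user(root, "domain.com", "username", out)
    archive_name = archive.name
    with zipfile.ZipFile(archive) as zf:
        members = zf.namelist()
```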
While some of the nicer services provide a user interface to migrate stuff (although generally that is using imap-to-imap but then you don't need to have an export in the first place), I would say half the time you end up having to run a script on your computer anyway because everyone has slightly different file formats.
For archival/legal purposes, the data generally needs to be fed into a specialized database that is set up to lock the records (emails), which prevents them from being modified/deleted before a certain period of time has passed (varies by industry and the specific regulations).
The data also needs to be searchable, which generally means it needs to end up in some kind of a database...
Since you're now using a database, might as well use a SQL connector or load in a nightly "file" with just the changes. Providing the whole data set every time will not be practical.
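A sketch of what a nightly "just the changes" file could be built from, assuming every message has a stable ID and you keep last night's ID-to-hash index around. The IDs and message bodies here are made up.

```python
import hashlib


def fingerprint(raw_message: bytes) -> str:
    """Content hash used to detect changed messages."""
    return hashlib.sha256(raw_message).hexdigest()


def nightly_delta(previous: dict[str, str], current: dict[str, bytes]) -> dict:
    """Compare last night's {id: hash} index with today's {id: raw message}."""
    today = {msg_id: fingerprint(raw) for msg_id, raw in current.items()}
    return {
        "added": sorted(set(today) - set(previous)),
        "changed": sorted(m for m in today if m in previous and today[m] != previous[m]),
        "removed": sorted(set(previous) - set(today)),
        "index": today,  # becomes `previous` for the next run
    }


yesterday = {
    "a": fingerprint(b"mail A"),
    "b": fingerprint(b"mail B"),
    "c": fingerprint(b"mail C"),
}
mailbox_now = {"a": b"mail A", "b": b"mail B (edited draft)", "d": b"mail D"}
delta = nightly_delta(yesterday, mailbox_now)
```

Only the messages listed under `added` and `changed` need to be shipped to the archive that night; `removed` can be recorded without deleting anything on the archive side.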
At one construction firm I worked at, it was ~7 years because that is what the local regulatory commission wanted, but at a fintech firm it was like 20 years due to federal finance regulations.
Doing a "once a day snapshot of everything" was good enough for the construction firm and we used off-the-shelf software (MailStore) for that, running on the archive NAS. The mail was read over IMAP on "production" (using a special service account that saw everything) and then fed into the Microsoft(?) SQL server that MailStore used on the backend.
However, the fintech one required us to store copies of every revision of emails, including draft versions they saved (even if it was later deleted and not actually sent anywhere).
The latter in particular was a big pain and we ended up having to switch to an email server using an SQL backend that was "write only", so it kept every possible revision.
Besides the regular archival job, we would have to do regular re-imports (alternating between the primary and standby server) so the expunged records were actually removed (make an empty table on the other server, sync the non-expunged messages and then switch which one is primary), because Outlook gets grumpy when you have too much to slog through...
You can import Gmail and Microsoft mail from within the web GUI settings.
Microsoft and Google allow mailboxes to delegate access to admins. I know for 365, it was PowerShell only to enable.
I think you're going in the wrong direction.
We're talking about exporting all emails/etc on a domain into some format a user can then restore to another smartermail install, or what have you.
Francisco
Here is one.
You're a hosting business after all.
Result: No need to transfer 10TB backup files around.
But then we need to keep at least 2 copies of the data... when we already do fairly paranoid R60's (6 disk vdev's).
Since I opened this thread we've had time to test different setups and we'll be replacing our rsync/zfs layer with restic. The reason is that with zfs/rsync, we have to keep track of how much storage each remote node is using and shuffle things around. We can use JBOD's and such but that gets really iffy.
Restic has an S3 backend and we have an S3 platform that easily scales out (we can just throw hardware at it and it's all under the same endpoint). Giving users a "mostly read-only" login would be fine (basically it only has write access to the locks folder, or we just tell them to use the ignore locks option in restic).
With that we're keeping 1 complete/full copy and then if we want to offer the tar/zips, we can without too much effort, just a little bit of storage bloat.
Francisco
So we'll get access to one restic repository per domain (just in read-only mode)? If so, would "restic mount" work too?
Or did I get it all wrong?
TIA
Mount would probably work too.
Francisco
You can code the streaming gzip with Node.js streams and know whether it failed or succeeded. I've done that multiple times. If you DM me I can send an example.
We already have a working POC of it
It's just that there's no way to know the final size to Content-Length.
Francisco
Your http client should be able to tell the difference (even on a raw TCP socket you can tell the difference between a clean shutdown from the server or a network disruption - with a TLS layer on top, your TLS library will tell you as well; and then there is also HTTP chunked transfer encoding that could help)
But yes, streaming doesn't go together that well with resuming.
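On top of what the transport tells you, the gzip format itself ends with a trailer (CRC32 plus uncompressed length), so the client can check whether it saw a complete stream. A minimal sketch, where the payload and the "cut in half" truncation are made up for the demo:

```python
import gzip
import zlib


def stream_is_complete(chunks) -> bool:
    """Return True only if the gzip end-of-stream trailer was seen."""
    # MAX_WBITS | 16 tells zlib to expect gzip framing (header + trailer).
    decoder = zlib.decompressobj(zlib.MAX_WBITS | 16)
    try:
        for chunk in chunks:
            decoder.decompress(chunk)  # output discarded, we only validate
    except zlib.error:
        return False  # corrupt data or CRC/length mismatch in the trailer
    return decoder.eof


payload = gzip.compress(b"mail data " * 1000)
complete = stream_is_complete([payload])
# Simulate the connection dropping halfway through the download.
truncated = stream_is_complete([payload[: len(payload) // 2]])
```

This answers "did I get everything?" after the fact. It does not help with knowing the size in advance or with resuming.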
So which option did you go with?
Not decided yet. Akash really wants us to build gzip/zip's upon request, and then the user pulls that down. I think streaming would work nicely, but he's worried about incomplete archives, or archives that we can't resume.
I agree with that and think we could likely just do the gzip/zip/zst archives without too much issue.
Francisco
So restic is no longer in the cards?
I think restic is fine and doable too. Most users wouldn’t be doing restic though, only the technical/advanced users.
Francisco
Hi @Francisco, what solution did you end up using? Were you able to let customers use restic to access mail server backups?
TIA
I would pay a small fee for S3 push, that’d be set up and forget with Backblaze B2
We haven’t decided one way or another. I got caught up with some other big projects and this fell by the wayside.
Francisco
I REALLY hope restic is still a serious contender
I think it’s as close to “ideal” as I can come up with, and wouldn’t be overly complex to implement.
Francisco
I'm on macOS so I just use https://thehorcrux.com to back up my email
That sounds interesting, but I'd rather minimize the number of programs installed and stay away from a proprietary backup scheme. I'm using restic already for my backups, so it only makes sense having access to a restic repository for my email
For huge backups (10TB+ especially) I'd suggest the option to physically mail a hard drive with the contents. Place a hold on the user's credit card for the value of the drive until it is returned.
As a customer, either S3 or restic works for me. I feel sending to an S3 bucket is clean and versatile. Later, you can develop features based on S3, such as using restic to back up to S3, which combines the advantages of these two tools.