Comments
What a great thread! Great answers that one can really learn from.
Even though some are a bit... fragile... :-)
and what happens if the connection gets disconnected?
One way: write a log against which to compare. Quick and dirty way: script it and run it against all the directories; if the process gets interrupted you don't lose much time and still have everything transferred.
Ah, your post led me to an epiphany. I present the Grand Unified Theory of Transferring Millions of Files:
Use tar pipe
If tar pipe fails, decide if you want to start afresh or keep your progress. If the latter, use rsync.
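The two-step plan above can be sketched like this, using local temp directories so it's self-contained (the `user@dest` host and `/srv/data` paths in the comments are placeholders for your own setup):

```shell
# Over the network the tar pipe would look roughly like:
#   tar -C /srv/data -cf - . | ssh user@dest 'tar -C /srv/data -xf -'
# Local demo:
SRC=$(mktemp -d); DST=$(mktemp -d)
mkdir -p "$SRC/sub"
echo payload > "$SRC/sub/file.txt"

# Step 1: the tar pipe -- one continuous stream, no per-file handshake.
tar -C "$SRC" -cf - . | tar -C "$DST" -xf -

# Step 2 (only if step 1 died partway): resume with rsync, which skips
# files already present on the destination:
#   rsync -aP "$SRC/" user@dest:/srv/data/
cat "$DST/sub/file.txt"
```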
rsync running with parallel inside a screen?
nohup screen is totally a nerdhipster band waiting to be made
Given how rsync works, I'm not actually convinced that piping a tar is any faster than just rsyncing over SSH.
ssh into the server and type the following commands.
cd /
scp -r * [email protected]:/
Now get a new laptop, because it's now busy forever moving millions of files... or is it? Maybe if you google it you can find out.
null modem
Sure, there are many ways, each with its own advantages and disadvantages.
I haven't said a bad word about other ways like rsync; I just happen to like AnthonySmith's way (which, give or take, is the one I often use) and I've yet to experience failure with it.
Some here seem to view things from a competitive angle; I tend more to view multiple options/suggestions as welcome variety and choice.
xmodem/crc
tar+gz+pipe+ssh might be faster in the case of several million small files.
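A sketch of that compressed pipe, shown locally so it can actually run; over the wire it would be roughly `tar -czf - -C /srv/data . | ssh user@dest 'tar -xzf - -C /srv/data'` (host and paths are placeholders). Compression costs CPU, but for millions of small, compressible files it can cut the bytes on the wire considerably:

```shell
SRC=$(mktemp -d); DST=$(mktemp -d)
printf 'x%.0s' $(seq 1 10000) > "$SRC/small.txt"   # highly compressible payload

# Compress on the way in (-z), decompress on the way out.
tar -C "$SRC" -czf - . | tar -C "$DST" -xzf -
wc -c < "$DST/small.txt"
```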
I remember a HN or stackexchange thread about a similar problem with tens of millions of files. I'll try and dig it up from bookmarks.
I just use rsync for 99.9%.
(rsync -avPh) covers most transfer cases
I've never needed rsync's resume capability between datacenter dedis, except when I forget to add a bwlimit argument (online.net!)
Since we are now moving onto technical merits, I thought I'd just add in a few points:
This is bound to be so heavily IO bound (pardon the repetition) that the cipher shouldn't make any difference at all.
For such a (presumably very large) set of files, the tar pipe option is inherently fragile, and repeating from the start is not going to be a pleasant experience (which is precisely why I didn't think it was a good idea, as I mentioned in my earlier comment).
rsync is the simplest and easiest solution (but IMHO, not the best - see point 5)
I still think borgbackup will be the most efficient solution (granted I don't know the content of the files, but I'm willing to wager that there'll be a good level of deduplication - statistically on average, there isn't a great deal of randomness in ordinary files)
Repeating point 1, since this is going to be so heavily IO bound, using something like borg to use those available cpu cycles to compress/deduplicate and build a reusable archive (for posterity) is a really nice benefit. It should be way better than rsync compression wise since it is going to work across the entire repository (and not just a file at a time).
Come on people, like borg - it is a great tool and seems ideal for such a problem.
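A minimal borg workflow for the approach above, as a sketch (all paths are placeholders, and the block skips cleanly if borg isn't installed). For a real migration the repo would live on the destination box, e.g. `user@dest:/backups/repo`, and you'd `borg extract` there afterwards:

```shell
# Bail out gracefully if borg isn't available on this machine.
command -v borg >/dev/null 2>&1 || { echo "borg not installed"; exit 0; }
export BORG_PASSPHRASE=
SRC=$(mktemp -d); REPO=$(mktemp -d)/repo
echo data > "$SRC/f"

borg init --encryption=none "$REPO"            # create the repository
borg create --compression lz4 "$REPO::first" "$SRC"   # deduplicated, compressed archive
borg list "$REPO"                              # shows the 'first' archive
```

An interrupted `borg create` can simply be re-run; already-stored chunks are deduplicated away, which is what makes it resumable "in chunks".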
I don't waste my life on the 'what if' and 'might'; I deal with them only when and if they actually happen. Otherwise, before you know it, you're wasting 10x more time putting safety nets in place for your safety net's safety net, and productivity is dead because of what 'might' happen 5% of the time.
@Ruriko Yank out the HDD.
What if you tar to a sshfs mounted volume on your source serv, and then untar that on your target serv? That way you need the free space on your target serv only, not on your source serv.
Or ymodem-g, or (salivates) zmodem. Resumable downloads blew my mind. The guy who created ymodem/zmodem (Chuck Forsburg, don't quote me on that spelling) just died last year :-(
Of course, we could also uuencode every file and send it through a bespoke RabbitMQ deployment...
You're assuming (a) dedupe will be significant (if the OP has 10,000,000 images, it won't), and (b) that there will be posterity. Even if there is... I'm guessing the OP just wants to move files, not build another Trementina Base.
You'll be surprised and that's why I qualified it:
As for:
It's a fringe benefit of running borg. If you're ok running tar (or something similar to "create" an archive) I don't see why running borg to create a similar "archive" is bad.
Again my answer (and addition to the technical "variety" of solutions) was to highlight the very useful (and valid) case of running borg (even in chunks if required) to get the job done in a safe/sane manner.
There! Just look! A letter fell off the safety net!
................................................................... letter, falling -> x
........................................................wireSHARK, waiting -> Osssssssssssss
(Note the open snout! What an evil shark! Had only I used a safety net for the safety net!)
@raindog308 said:
Yep. I had forgotten that he created both Y and Z until I put on this rather in-depth, but incomplete, documentary on BBS systems. I wish it were purchasable, but the author released it under CC, so I can at least burn my own low-quality DVDs (being that it was filmed in 2001-2005, there will never be a high-def version). He went into the ANSI scene, cracking, and even the textfiles folks - but completely omitted the demoscene, which makes NO sense to me, even though those have since been covered pretty well.
I generally used GSZ on my client side, but did get to use HS/Link once or twice, so the Sysop and I could play Tetris while I downloaded at about 1MiB per 5min.
I'd be interested to read the thread, if you can find it.
The I/O overhead with lots of small files is going to be primarily in disk seeks (which happen regardless of whether you use scp, rsync or tar), and unlike scp, rsync works with a continuous stream of data, which should get you comparable latency to tar since it's pretty much taking the same approach.
The only more effective solution I can see is a straight disk image, since that totally disregards the filesystem - it's just a straight read from start to end, with no disk seeks.
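That approach can be sketched like so, using a 1 MiB file standing in for a block device; for a real disk it would be something like `dd if=/dev/sdX bs=1M | ssh user@dest 'dd of=/dev/sdY bs=1M'` (device names and host are placeholders - double-check them, dd is unforgiving):

```shell
IMG=$(mktemp); OUT=$(mktemp)
head -c 1048576 /dev/urandom > "$IMG"    # fake "disk" contents

# One sequential read, one sequential write -- no per-file seeks.
dd if="$IMG" of="$OUT" bs=64k 2>/dev/null
cmp "$IMG" "$OUT" && echo identical
```

The obvious trade-off: you copy free space too, and the destination device must be at least as large as the source.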
Couldn't find it in my pinboard.
But you're probably right that rsync's internal implementation is similarly performant (if you can skip metadata).
Some ideas here for extreme file sets: https://news.ycombinator.com/item?id=8305283
I'll put it to the test when I have to deal with this scenario someday.
Yeah, this is probably what I'd do, if my provider doesn't mind me bursting 1 Gbps for ~3 hrs/TB, at night or something.
Note that you can use ssh -C to compress on the fly.