What's the best way to transfer millions of files from server to server?
I have over 10 million files and I want to transfer them to another server. What's the best way to transfer the files over to a new server quickly? I don't want to tar the directory because I'd need double the space, which I don't have.
OS for both servers is Ubuntu 16.04
Comments
sftp, rsync, seafile.
Make a script that compresses a small batch of files (taking their total size into consideration, not only their number, since big files take longer to compress and transfer), uploads the archive directly to your new server, deletes the compressed file, and repeats.
The speed will depend on the available resources (space, I/O, CPU & RAM) on the source server, and obviously on your upload speed.
You can speed up the process by deleting the already-uploaded originals and increasing the number/size of files compressed per batch, but it's kinda risky IMO.
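A minimal sketch of that batch loop. All paths and the batch size of 10 files are arbitrary assumptions, and a local directory stands in for the remote end; in practice you'd `scp`/`rsync` each chunk to the new server instead of `cp`:

```shell
# Batch-compress files and ship them one chunk at a time, so we never
# need double the space on the source. Demo data is generated here so
# the script is self-contained.
SRC=$(mktemp -d); DEST=$(mktemp -d); WORK=$(mktemp -d)
for i in $(seq 1 25); do echo "data $i" > "$SRC/file$i"; done

cd "$SRC" || exit 1
find . -type f | split -l 10 - "$WORK/batch."   # slice file names into batches
n=0
for list in "$WORK"/batch.*; do
    n=$((n + 1))
    tar -czf "$WORK/chunk$n.tgz" -T "$list"     # compress one batch
    cp "$WORK/chunk$n.tgz" "$DEST/"             # stand-in for: scp ... user@host:
    rm "$WORK/chunk$n.tgz" "$list"              # free the space, then repeat
done
```

With real data you'd tune the batch size so each archive stays small relative to your free space.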
You can give lftp a try (over sftp) with concurrency set to x concurrent threads.
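For reference, an invocation of that shape might look like the following; the host, user, and paths are placeholders:

```shell
# Mirror a local tree up to the new server over sftp with 8 parallel
# transfers; running several files at once helps a lot when per-file
# overhead dominates, as it does with millions of small files.
lftp -u user, sftp://new-server.example.com \
     -e "mirror --reverse --parallel=8 /srv/data /srv/data; quit"
```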
Quickest way will be to tar them, but requires effectively double the space as you mentioned.
Trying to transfer all of the files over the network will be several times slower due to latency, so it's in your interest to archive them first where latency is lowest (on the local system). You could potentially gzip the archive to save on space, assuming the files compress well. This will slow the process down a bit, but I expect it would still be quicker than trying to rsync/scp/ftp 10 million files.
You could still tar the files and dump the resulting archive at the remote end, for best of both worlds. I haven't tested this exact example, but it illustrates the idea -- http://meinit.nl/using-tar-and-ssh-to-efficiently-copy-files-preserving-permissions
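The tar-pipe idea in miniature, demonstrated between two local directories; over the network you'd insert ssh in the middle, e.g. `tar -C "$SRC" -cf - . | ssh user@host 'tar -C /dest -xf -'` (host and paths are placeholders):

```shell
# Stream a tar archive through a pipe and unpack it on the fly: no
# intermediate archive ever touches the disk, so no double space needed.
SRC=$(mktemp -d); DEST=$(mktemp -d)
echo hello > "$SRC/a.txt"
mkdir "$SRC/sub"; echo world > "$SRC/sub/b.txt"
# Local stand-in for the ssh version described above:
tar -C "$SRC" -cf - . | tar -C "$DEST" -xf -
```

Adding `z` to both tar commands (or piping through gzip) trades CPU for bandwidth, as discussed above.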
rsync and forget it for a few days
Can agree that probably the easiest solution would be rsync
Transfer an entire disk partition as an image file.
You could also use Duplicati. Open up an FTP server on your new server and set Duplicati to back up there. Set the block size to around a gig or two. Then use Duplicati on the destination to unpack it again
+1 for rsync
make use of compression
simply restart to resume if it somewhat gets broken (which it normally doesn't)
make a second run to update files which might have changed in between
also use it with screen to be able to detach/logout while it's running
screen + rsync is the best if time is not an issue, because it also preserves timestamps (if that's important). Quickest would be to tar all files and then move the archive. If the data is compressible, you can do it even faster.
screen + tar + rsync done right!
screen is not entirely necessary..you can just
In fact, I think bash runs things nohup by default.
You do know that just because Cygwin crashing doesn't stop the process, that doesn't make it a default, right?
O crap does this count as mod sass?
Pull the hdd , attach and mount it on the other server then copy?
Depending on the hardware, I'd suggest using rsync if you have a decent CPU, or, with a crappy CPU, dd'ing the whole partition in rescue mode directly to the second server and mounting it there.
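The dd route in miniature, using a small file as a stand-in for the partition. For a real disk you'd read the device from rescue mode and pipe through ssh, roughly `dd if=/dev/sda1 bs=4M | gzip -1 | ssh host 'gzip -d > /dev/sdb1'` (device names and host are placeholders):

```shell
# Stream a raw image through a pipe with a large block size; gzip -1
# cheaply squeezes runs of empty space. 'disk.img' stands in for /dev/sdXN.
SRC=$(mktemp -d); DEST=$(mktemp -d)
dd if=/dev/zero of="$SRC/disk.img" bs=1024 count=64 2>/dev/null
dd if="$SRC/disk.img" bs=4M 2>/dev/null | gzip -1 | gzip -d > "$DEST/disk.img"
```

Note this copies free space too, so it only wins when the partition is reasonably full or compresses well.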
I think this is a good use case for borgbackup. Assuming that what you have compresses reasonably and possibly/hopefully/probably has duplicate block/content that can be deduplicated and assuming you have enough disk space (and cpu/memory) to run borg, give it a shot.
It should be better than vanilla rsync (which is the other best option, with compression) because either way you've got to read the whole damn set of files but with borg you'll at least get some (hopefully large) reduction in what you have to copy across the network.
Of course it may take a long time but that's the price you have to pay for bandwidth reduction.
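A rough shape of that borg workflow, assuming borgbackup is installed on both ends; the repo path, archive name, and host are placeholders:

```shell
# Initialise a borg repository on the destination over ssh, then push a
# deduplicated, compressed archive into it. Only unique chunks cross the
# network; a later 'create' after changes sends just the delta.
borg init --encryption=repokey ssh://user@new-server/srv/borg-repo
borg create --compression zstd --stats \
    ssh://user@new-server/srv/borg-repo::files-initial /path/to/files
# Then, on the new server:
borg extract /srv/borg-repo::files-initial
```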
If you really are a fan of tar you can tar to a pipe, through ssh, and untar on the remote host, but it'll be horribly fragile, so I don't even want to imagine running something like this (it can't be resumed if interrupted, so it's essentially useless in practice, but it can be done, just for pedantic purposes).
DD the hdd image / tar the files to an sshfs network disk, or something like that.
If you have NO free space to tar them, the easiest way is to mount a remote HDD via Samba (or SSHFS) and tar your small files to the newly mounted folder.
You're implying that I've ever used cygwin so yeah, that's sass.
I don't know about cygwin, but a little light binging shows that nohup behavior isn't the bash default...I guess that's why I always use nohup regardless :-)
I've never used a shell that has setup nohup as a default. Maybe it was in a forgotten global profile setting.
Another +1 for rsync. I use it along with tmux.
If you can't image transfer, then tarring subdirectories (containing say a few thousand files each, so not too much disk space) will likely be a lot faster than transferring individual files over ssh/rsync, because of inefficiencies in those protocols. Having millions of files on an HDD is likely to be painful regardless. SSD won't be as bad, but that's still an awful lot of files. Are you sure you don't want that info in a database?
I would go for the LFTP approach. I used it moving several sites with lots of little images.
It's not the protocol inefficiency so much as transferring all the metadata. With huge numbers of files, the metadata becomes huge.
They could be images, etc.
(Yes you can put images in a DB but for typical web hosting...you don't want to).
Tar piped over ssh takes up no space on the source and untars directly on the destination; use arcfour256 rather than AES and you will get a much better transfer rate at the cost of a weaker cipher.
It doesn't.
I've never heard about anything like that. tmux does some weird stuff with the daemonize syscall to work the way it does, and I'm not sure about screen.
Err.. not just "weaker". It's essentially considered "comically broken".
Pfft, I'm fairly certain that Obama won't wiretap the stream. /jks
First: Cool that you mentioned the possibility to tar and pipe.
I would, however, insert (via pipe) an intermediate zstd stage, because compression is key here. If one doesn't like that, it might be worthwhile to replace tar's 'z' flag with 'j' for bzip2 compression.
As for the cipher I stand with AnthonySmith unless the files are very sensitive (which they are probably not).
Please note that "cipher XYZ has been 'broken'" usually just means that cryptanalysts succeeded, in a lab, in significantly decreasing its security, say from 2^96 to 2^68. That rarely means that any and all use of that cipher is utterly unreasonable. Keep in mind that while, say, 68 bits of remaining security could be cracked (still with some effort and running big iron) by the NSA, it will not be easily drive-by cracked by just any scriptkiddie.
Usually the goal in cases like this here is something like "I do not want to send my files in plain text. I want some security (but I'm not transmitting state secrets)".
That said, you can actually have both because many modern sym. ciphers are blindingly fast. AES comes to mind.
So, my advice would be to use AES-128, which has no problem en/decrypting much faster than a Gb connection can pump data. As for PKE, AnthonySmith is right again; don't care, that is a single-shot operation anyway. But for the symmetric crypto (which is what actually en/decrypts the data) AES-128 might be a better choice than arcfour.
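Concretely, the cipher for the tar-over-ssh pipe is just a flag; `ssh -Q cipher` shows what your OpenSSH build supports. The transfer command below is a placeholder (host and paths invented for illustration), so only the query actually runs here:

```shell
# List the ciphers this OpenSSH build supports; aes128-ctr is present in
# any modern build and is very fast on CPUs with AES-NI.
ssh -Q cipher
# Then pick it for the tar pipe (host and paths are placeholders):
# tar -cf - /path/to/files | ssh -c aes128-ctr user@new-server 'tar -xf - -C /data'
```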
I like @AnthonySmith's solution... but if it were me, I'd probably still use rsync because it's resumable, whereas if the tar fails for whatever reason, you have to start over. Then again, if it's a few hours to copy and you don't care if you maybe have to repeat it, the tar pipe will be faster.
Multiple quality answers to a technical question on LowEndStackExchange. I like it.