Distributed file systems
tehdartherer
Member
in General
I am planning to create a RAID-like redundant distributed fs that connects a few storage servers into one single "virtual" fs. I know there are a few alternatives for creating such a fs, e.g. GlusterFS, XtreemFS, Ceph, Tahoe-LAFS, etc., each with its own feature set.
I am curious about the people here who are already using a distributed fs:
- Which software are you using?
- In which mode is it operating? Mirroring / striping / distributing?
- Are the storage nodes in one location (geographically) or scattered across different locations?
- How is it performing? Latencies, transfer rates etc.
- Maybe you have a favorite feature that the alternatives are missing?
This could help to get an overview of the possible use cases.
Thanks!
Comments
My suggestion is that you do your own testing.
In all cases you should run them with mirroring. It is a VERY bad idea to run them with striping. Even GlusterFS says you should never never never never use striping except for a few very, very specific cases.
Have I highlighted enough not to use striping? ..... NOPE, DON'T USE STRIPING!
You can never expect 100% uptime on any node, so you should keep a few replicas, period; actually, the more the better.
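To put the "more replicas" point in numbers, here is a back-of-the-envelope sketch (the per-node uptime figure is a made-up assumption) comparing a pure stripe, where losing any node loses data, with a mirror, where data survives as long as one replica is up:

```python
# Toy availability model: each node is independently up with probability p.
def striped_availability(p: float, n_nodes: int) -> float:
    # A pure stripe (RAID 0 style) needs ALL nodes up to read any file.
    return p ** n_nodes

def mirrored_availability(p: float, n_replicas: int) -> float:
    # A mirror only needs AT LEAST ONE replica up.
    return 1 - (1 - p) ** n_replicas

p = 0.99  # assumed per-node uptime, i.e. roughly 3.5 days of downtime a year
print(striped_availability(p, 4))   # 4-node stripe: ~0.961
print(mirrored_availability(p, 2))  # 2 replicas:    0.9999
print(mirrored_availability(p, 3))  # 3 replicas:    0.999999
```

Adding nodes makes a stripe strictly more fragile, while each extra replica multiplies the failure probability by the (small) chance of yet another node being down at the same time.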
Certainly I will do my own tests. I am not really interested in hints for my specific use case, but wanted to see what experiences people have had in different situations.
Also, it would be interesting to hear how resource needs scale up in different cases on the server / node side.
I like GlusterFS a lot since it's master-free, and the management tools and performance are not too bad. It's definitely sensitive to latency, so keep the nodes close together, unless you're doing a distributed setup (which works a bit differently from the other modes, with librsync under the hood, IIRC). I have a simple 2-node cluster (mirror mode) on AWS with about a dozen clients attached. It works pretty well as long as you don't bombard it too hard.
GlusterFS is OK, but it loves to fall over under load or if there are any latency issues. You must also make sure not to expose GlusterFS on your public interface; otherwise there are plenty of simple DoS attacks that can shut the daemons down.
XtreemFS is also ok, I have a test bed here at the moment running some loads.
Ultimately I think Ceph is probably the only high-performance/capacity production-ready one you've mentioned.
I use XtreemFS with striping and replication: replicas of each file plus stripes, basically a distributed RAID 10 (or is that a RAID 01?). It performs well and is currently bottlenecked by my Tinc VPN (which maxes out at 1.6 MB/s for some reason). I'm in the process of switching it over to SSL encryption for a WAN deployment.
I like XtreemFS for its automatic failover and its handling of WAN deployments, great for LowEndBoxes. XtreemFS also has a working Windows port; automatic 10 points for me. :P
As I remember, Tahoe-LAFS is a totally different beast. With Tahoe-LAFS you have to sacrifice speed and performance for data security and safety. With the other systems you mentioned there is a basic assumption that you can trust each host. With Tahoe-LAFS you don't need that assumption.
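Concretely, Tahoe-LAFS encrypts on the client and then erasure-codes the ciphertext into n shares of which any k reconstruct the file (3-of-10 by default, using Reed-Solomon coding). As a toy illustration of the k-of-n idea only, here is a 2-of-3 scheme built from XOR parity; Tahoe's real coding is more general than this:

```python
def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))

def split_2_of_3(data: bytes):
    # Split into two halves plus an XOR parity share; any 2 of 3 suffice.
    half = (len(data) + 1) // 2
    a = data[:half]
    b = data[half:] + b"\x00" * (half - len(data[half:]))  # pad to equal size
    return len(data), [a, b, xor_bytes(a, b)]

def recover_2_of_3(length: int, have: dict) -> bytes:
    # `have` maps share index (0, 1 or 2) to share bytes; any two will do.
    if 0 in have and 1 in have:
        a, b = have[0], have[1]
    elif 0 in have:
        a, b = have[0], xor_bytes(have[0], have[2])  # parity recovers b
    else:
        a, b = xor_bytes(have[1], have[2]), have[1]  # parity recovers a
    return (a + b)[:length]

length, shares = split_2_of_3(b"some file contents")
print(recover_2_of_3(length, {0: shares[0], 2: shares[2]}))  # b'some file contents'
```

With real Reed-Solomon coding the same trick generalizes: up to n-k servers can vanish and the file is still recoverable, which, combined with the client-side encryption, is why no single host needs to be honest or reliable.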
Why? I would say that RAID 10 is preferred over RAID 1 for speed. Of course I might be missing something.
I think he meant straight striping, i.e. RAID 0.
There's nothing wrong but everything right with a mirror+striping setup (RAID 10).
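For RAID 10 specifically (a stripe of mirrors, as opposed to RAID 01's mirror of stripes), a small sketch of the placement logic: chunks are striped round-robin across mirror pairs, and the data survives any failure pattern that leaves at least one node alive in every pair. The node names here are of course made up.

```python
# Stripe-of-mirrors (RAID 10 style) chunk placement over mirror pairs.
PAIRS = [("a1", "a2"), ("b1", "b2")]  # hypothetical storage nodes

def place(chunk_idx: int) -> tuple:
    # Round-robin striping across pairs; both pair members hold the chunk.
    return PAIRS[chunk_idx % len(PAIRS)]

def survives(n_chunks: int, failed: set) -> bool:
    # Data is readable iff every chunk still has at least one live replica.
    return all(any(n not in failed for n in place(i)) for i in range(n_chunks))

print(survives(8, {"a1"}))        # True: a2 still mirrors a1's chunks
print(survives(8, {"a1", "b2"}))  # True: one failure per pair is fine
print(survives(8, {"a1", "a2"}))  # False: a whole mirror pair is gone
```

In RAID 01 the mirroring is instead between two whole stripe sets, so one failure in each set kills everything, which is why stripe-of-mirrors is the usual recommendation.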
The XtreemFS Windows client seems to have issues with corruption for files >4 GB. Nonetheless, having clients for multiple platforms is really a nice thing.
With the release of GlusterFS 3.5 they introduced at-rest encryption. Unfortunately it hits performance quite hard; I saw dd numbers drop from ~50 MB/s to ~4 MB/s.