
Alternatives to gluster?

MuZo Member

I've set up Pydio in a cluster (Percona for the database and Gluster for the files), but it takes around 2 minutes to load the website.

After doing some tests I found out that the problem is probably Gluster, so I'm looking for alternatives.

I've checked DRBD, but from what I've understood it only works between 2 servers.

Comments

  • rm_ IPv6 Advocate, Veteran
    edited March 2014

    There's quite a lot of them, e.g. also check out XtreemFS.
    http://en.wikipedia.org/wiki/List_of_file_systems#Distributed_parallel_fault-tolerant_file_systems
    http://en.wikipedia.org/wiki/Comparison_of_distributed_file_systems
    But I found GlusterFS to be the best; it seemed to be the most actively developed and also easy enough to understand and set up, with a design that's not over-engineered (for example, with XtreemFS you need to run something like 3 separate server processes to even begin to do anything).
    Maybe you misconfigured GlusterFS, and it tries to do something (resolve some DNS?) and pauses the operation until that attempt times out.
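    In case it helps to rule that out, here is a minimal sketch (the volume name "cloud" and the hostname are only placeholders): check whether peers and bricks are defined by hostname or by IP, and time a lookup on each node to spot a DNS stall.

    # see how peers and bricks are addressed (hostnames vs IPs)
    gluster peer status
    gluster volume info cloud

    # if hostnames appear above, time a lookup to check for a slow resolver
    time getent hosts server1.example.com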

  • serverian Member

    Tahoe-LAFS

  • hbjlee17 Member, Host Rep

    moosefs

  • MuZo Member
    edited March 2014

    rm_ said: Maybe you misconfigured GlusterFS, and it tries to do something (resolve some DNS?) and pauses the operation until that attempt times out.

    I'm testing at the moment with only 2 servers, both running the Gluster server and client (I followed @Raymii's tutorial: https://github.com/RaymiiOrg/website-mirrors/blob/master/raymii.org/s/tutorials/Gluster-webroot-cluster.html).

    I'm using IPs to connect to each other.
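    For reference, the two-node replicated setup looks roughly like this (a sketch only; the volume name "cloud" and the brick path /srv/brick are assumptions based on the mounts shown below):

    # on server 1: peer with server 2 by IP
    gluster peer probe <server 2 IP>

    # create and start a 2-way replicated volume, one brick per server
    gluster volume create cloud replica 2 <server 1 IP>:/srv/brick <server 2 IP>:/srv/brick
    gluster volume start cloud

    # on each server: mount the volume with the native FUSE client
    mount -t glusterfs <server 1 IP>:/cloud /var/cloud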

    server 1 disk IO

    dd bs=1M count=256 if=/dev/zero of=test conv=fdatasync

    256+0 records in

    256+0 records out

    268435456 bytes (268 MB) copied, 0.734081 s, 366 MB/s

    server 2 disk IO

    dd bs=1M count=256 if=/dev/zero of=test conv=fdatasync

    256+0 records in

    256+0 records out

    268435456 bytes (268 MB) copied, 0.709718 s, 378 MB/s

    server 1 downloading from server 2

    35% [=============> ] 95,701,552 18.1M/s eta 10s

    server 2 downloading from server 1

    30% [===============> ] 81,204,717 10.7M/s eta 17s

    IO speed in gluster folder

    dd bs=1M count=256 if=/dev/zero of=test conv=fdatasync

    256+0 records in

    256+0 records out

    268435456 bytes (268 MB) copied, 68.3736 s, 3.9 MB/s

    XtreemFS looks good, and so does MooseFS, but as @rm_ said, they do require more services to be running. I would prefer something simple like Gluster.

    serverian said: Tahoe-LAFS

    I would have to set up SFTP access to it and mount it via sshfs; it'd probably end up being slower than Gluster.

  • FrankZ Veteran

    I am running a redundant Gluster setup with a drive in Montreal, Dallas, and Phoenix.

    dd if=/dev/zero of=iotest bs=64k count=16k conv=fdatasync && rm -fr iotest

    16384+0 records in

    16384+0 records out

    1073741824 bytes (1.1 GB) copied, 56.6597 s, 19.0 MB/s

    Not so great I/O, but good for backups.

    I also run a redundant Gluster setup on a couple of different providers, all in Dallas, for a similar purpose to yours. I/O is quite a bit better:

    dd if=/dev/zero of=iotest bs=64k count=16k conv=fdatasync && rm -fr iotest

    16384+0 records in

    16384+0 records out

    1073741824 bytes (1.1 GB) copied, 23.4155 s, 45.9 MB/s

    I would think that bandwidth and distance will make the biggest difference no matter which redundant drive system you use.

  • MuZo Member

    @FrankZ I ran the same dd test in the gluster folder:

    dd if=/dev/zero of=iotest bs=64k count=16k conv=fdatasync && rm -fr iotest

    16384+0 records in

    16384+0 records out

    1073741824 bytes (1.1 GB) copied, 367.283 s, 2.9 MB/s

    I'm using Backupsy in NL and XenPower (Prometeus) in IT; ping is around 23ms.

    Montreal - Phoenix is further away than Naaldwijk - Milan.

  • FrankZ Veteran
    edited March 2014

    @MuZo - I/O should be better than that. It should be close to what your download/upload speed is between the locations.

    Are you using the root partition/LV for the gluster brick ?


    EDIT: It is between 38ms and 43ms between my Montreal, Phoenix, and Dallas locations.
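    To see the ceiling you can expect, it helps to measure raw network throughput between the two nodes and compare it to the dd result on the gluster mount. A sketch, assuming iperf3 is installed on both ends:

    # on one node, start a listener
    iperf3 -s

    # on the other node, run a 30-second throughput test against it
    iperf3 -c <other node IP> -t 30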

  • MuZo Member

    server 1

    Filesystem            Size  Used Avail Use% Mounted on
    rootfs                119G  2.3G  110G   3% /
    udev                   10M     0   10M   0% /dev
    tmpfs                 101M  104K  101M   1% /run
    /dev/xvda1            119G  2.3G  110G   3% /
    tmpfs                 5.0M     0  5.0M   0% /run/lock
    tmpfs                 304M     0  304M   0% /run/shm
    <server 1 IP>:/cloud   98G  1.8G   91G   2% /var/cloud

    server 2

    Filesystem                                              Size  Used Avail Use% Mounted on
    rootfs                                                   98G  1.8G   91G   2% /
    udev                                                     10M     0   10M   0% /dev
    tmpfs                                                    51M  180K   50M   1% /run
    /dev/disk/by-uuid/332a044a-d26b-4d38-be73-19cbc21f3bda   98G  1.8G   91G   2% /
    tmpfs                                                   5.0M     0  5.0M   0% /run/lock
    tmpfs                                                   305M     0  305M   0% /run/shm
    <server 2 IP>:/cloud                                    98G  1.8G   91G   2% /var/cloud

  • dcc Member, Host Rep

    A few years back we used ceph for backups (on a few cheap servers packed with hard drives). It was running pretty well, benchmarks were decent... until we started putting considerable load on it.

    We ended up switching over to ZFS. It is not ideal, but at least it works.

  • FrankZ Veteran
    edited March 2014

    @MuZo said:
    server 1

    > Filesystem            Size  Used Avail Use% Mounted on
    > rootfs                119G  2.3G  110G   3% /
    > udev                   10M     0   10M   0% /dev
    > tmpfs                 101M  104K  101M   1% /run
    > /dev/xvda1            119G  2.3G  110G   3% /
    > tmpfs                 5.0M     0  5.0M   0% /run/lock
    > tmpfs                 304M     0  304M   0% /run/shm
    > <server 1 IP>:/cloud   98G  1.8G   91G   2% /var/cloud

    server 2

    > Filesystem                                              Size  Used Avail Use% Mounted on
    > rootfs                                                   98G  1.8G   91G   2% /
    > udev                                                     10M     0   10M   0% /dev
    > tmpfs                                                    51M  180K   50M   1% /run
    > /dev/disk/by-uuid/332a044a-d26b-4d38-be73-19cbc21f3bda   98G  1.8G   91G   2% /
    > tmpfs                                                   5.0M     0  5.0M   0% /run/lock
    > tmpfs                                                   305M     0  305M   0% /run/shm
    > <server 2 IP>:/cloud                                    98G  1.8G   91G   2% /var/cloud

    You need to have a separate block device for the gluster brick.

    Using logical volumes on 100 GB drives, you could set the root volume at 50 GB and make a separate 50 GB /cloud volume to use for the Gluster brick. You would need two separate logical volumes/partitions.

    /dev/mapper/yada-root -- /

    /dev/mapper/yada-cloud -- /mnt/cloud

    Server IP:cloud -- /var/cloud
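    As a rough sketch (the volume group name "yada" and the sizes are placeholders matching the example above), the layout could be created like this:

    # carve a separate 50 GB LV out of the volume group and put a filesystem on it
    lvcreate -L 50G -n cloud yada
    mkfs.ext4 /dev/yada/cloud

    # mount it and point the gluster bricks at it instead of the root filesystem
    mkdir -p /mnt/cloud
    mount /dev/yada/cloud /mnt/cloud
    gluster volume create cloud replica 2 <server 1 IP>:/mnt/cloud/brick <server 2 IP>:/mnt/cloud/brick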

    Thanked by: MuZo

  • hbjlee17 Member, Host Rep

    MuZo said: XtreemFS looks good, and so does MooseFS, but as @rm_ said, they do require more services to be running. I would prefer something simple like Gluster.

    We were able to achieve 90 MB/s in dd tests using MooseFS on a gigabit interface.

  • tchen Member

    @dcc said:
    A few years back we used ceph for backups (on a few cheap servers packed with hard drives). It was running pretty well, benchmarks were decent... until we started putting considerable load on it.

    We ended up switching over to ZFS. It is not ideal, but at least it works.

    I think it's improved a bit since then. DreamHost is using it now as the backing store for their DreamObjects service, so it can't be that bad.

  • dcc Member, Host Rep

    @tchen said:
    I think it's improved a bit since then. DreamHost is using it now as the backing store for their DreamObjects service, so it can't be that bad.

    IIRC, DreamHost was the company that conceived Ceph, so I am not surprised to hear that it works well for them :)

    Last time I played with ceph was about 6 months ago, and it still had issues under load. To reproduce, simply do a few simultaneous "dd if=/dev/zero of=/file/on/ceph/cluster bs=1M". First hour or so the cluster will perform just fine. Next few hours you will see delayed sync warnings. A few more hours into the test your nodes will start going offline.
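    Something along these lines reproduces the kind of load described above (a sketch; the mount point is just a placeholder, and the writers keep running until you stop them or the cluster fills up):

    # hammer the cluster with several parallel sequential writers
    for i in 1 2 3 4; do
        dd if=/dev/zero of=/mnt/ceph/stress-$i bs=1M &
    done
    wait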

    Back then I checked their mailing list, and I found that we were not alone with this issue, and there was no simple solution.

    We are currently looking to deploy a few 40TB+ clusters for backup purposes, and for now it seems like we would have to stick with ZFS + sshfs. This setup is quite ugly, but so far it has been the only solution that survived under heavy load.

    Thanked by: tchen

  • MuZo Member

    With the help of @FrankZ I got the I/O speed in the gluster folder to be almost the same as the download/upload speed between the two nodes, around 13 MB/s. No idea why, but this happened just by creating a separate partition for the gluster brick.

    Pydio works faster than before, but it still takes quite a long time to load pages.

    @FrankZ suggested getting nodes with 30-40 MB/s throughput to do the job.


    As an alternative, I was thinking about something that checks a local folder for changes and syncs it between multiple nodes. Would something like btsync work? (But not btsync itself, as it's not open source, it 'calls home', and it takes some time to detect changes.)

    That way I/O isn't a problem and a 'normal' 100 Mbps connection would be okay.
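    A minimal sketch of that idea (paths and the peer IP are placeholders; inotify-tools and rsync are assumed to be installed): watch the folder and push every change to the other node.

    # watch the web root and sync each change to the peer over rsync/ssh
    inotifywait -m -r -e close_write,create,delete,move /var/cloud |
    while read -r dir event file; do
        rsync -az --delete /var/cloud/ <server 2 IP>:/var/cloud/
    done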

  • dcc said: Last time I played with ceph was about 6 months ago, and it still had issues under load. To reproduce, simply do a few simultaneous "dd if=/dev/zero of=/file/on/ceph/cluster bs=1M". First hour or so the cluster will perform just fine. Next few hours you will see delayed sync warnings. A few more hours into the test your nodes will start going offline.

    No problems with Ceph here. We're using it on our Proxmox clusters in our testing area. We're hosting around 500 VMs on it with 3 nodes and Infiniband QDR. The Ceph cluster was built with the functions included in Proxmox. What's important is low latency on the network side and SSDs for journaling.

  • tchen Member

    @MuZo your alt is csync2 + lsyncd

  • MuZo Member

    @tchen said:
    MuZo your alt is csync2 + lsyncd

    The problem with csync2 + lsyncd on multiple servers (incron + unison too) is that they have to be set up like a chain; if one node goes down, all the others can't work.

  • FrankZ Veteran
    edited March 2014

    EDIT: removed silly comment

  • tchen Member

    @MuZo said:

    Depends on what you're looking for. From the posts so far, performance seems to rank the highest. No talk about hot concurrency of data, so...

    http://www.severalnines.com/blog/scaling-drupal-multiple-servers-galera-cluster-mysql

    The key is to set up multiple layers of csync2 fanouts, not a ring chain. The only gotcha is STONITH, which is needed if you don't want consistency clobbering - but that's something you'd have to deal with anyway, even with GlusterFS and similar real-time replicated stores, in their own fashion.
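    A hedged sketch of the csync2 side (host names, key path, and the synced path are all assumptions): every node carries the same group definition, so each one syncs directly with the others and there is no chain to break; lsyncd (or cron) just runs a sync pass whenever the watched directory changes.

    # /etc/csync2.cfg, identical on every node:
    #
    #   group web {
    #       host node1 node2 node3;
    #       key /etc/csync2.key_web;
    #       include /var/cloud;
    #       auto younger;
    #   }

    csync2 -k /etc/csync2.key_web   # generate the shared key once and copy it to all nodes
    csync2 -xv                      # one verbose sync pass; trigger this from lsyncd or cron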

  • You could consider Sheepdog:

    http://sheepdog.github.io/sheepdog/

  • devops231 Member

    Have you considered ObjectiveFS (https://objectivefs.com)?

    It is a FUSE filesystem running on top of AWS S3. Much easier to maintain and more reliable than GlusterFS.

  • pbgben Member, Host Rep

    Check out http://lustre.org

  • joepie91 Member, Patron Provider

    devops231 said: It is a FUSE filesystem running on top of AWS S3.

    Doesn't that kind of defeat the point?

  • FrankZ Veteran

    I thought this thread looked familiar.
    @devops231 has raised the dead with his first comment.

  • joepie91 Member, Patron Provider

    @FrankZ said:
    I thought this thread looked familiar.
    devops231 has raised the dead with his first comment.

    Hmm, odd. Starting to sound like spam, actually.

  • devops231 Member
    edited June 2015

    I came across this thread just this week and thought I'd chime in.

    @joepie91: Sorry, I didn't mean to come across as spam. In the case of FUSE on top of S3, it saves you from running a storage cluster like GlusterFS.

  • joepie91 Member, Patron Provider

    @devops231 said:
    joepie91: Sorry, I didn't mean to come across as spam. In the case of FUSE on top of S3, it saves you from running a storage cluster like GlusterFS.

    It also doesn't provide the same functionality. The whole point of GlusterFS is to remove the single point of failure by offering a distributed filesystem.

    As far as I can tell, ObjectiveFS is just a FUSE layer over S3 - a completely different kind of application for completely different use cases (and by definition not distributed).
