Cheap $5/mo 4-NODE redundant PVE Cluster Setup for Small Biz/Personal Use (low end needs)
Given Racknerd's awesome deals, here's some brainstorming:
$1.25 * 4 = $5, right? So...
Take Racknerd, for example: four data centers, buy a $1.25/mo VPS at each one. That costs you $14.88/year * 4 = $59.52/year. Call it $60, and $60/12 = $5/mo.
For $60/year, $5/mo, you can set up Debian on each server, install Proxmox VE, and use PVE as an LXC container manager.
For that $60/year, you'll have a 4 * 25GB = 100GB cluster with 4 vCPUs and 4GB RAM in total.
Using IPv4 you can set up GlusterFS, share a 10GB directory brick from each server, and get 20GB of redundant, networked, clustered FS storage (replica 2 over the four bricks).
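The GlusterFS step might look like this; a sketch, assuming hostnames node1-node4, a brick directory at /data/brick1 on each node, and a replica-2 layout (all names here are placeholders, not the original setup):

```shell
# On every node: install and start the Gluster daemon
apt install glusterfs-server
systemctl enable --now glusterd

# From node1: form the trusted pool
gluster peer probe node2
gluster peer probe node3
gluster peer probe node4

# Create a 2x2 distributed-replicated volume: 4 x 10GB bricks with
# replica 2 gives ~20GB usable, each file stored on two nodes
mkdir -p /data/brick1
gluster volume create gv0 replica 2 \
  node1:/data/brick1 node2:/data/brick1 \
  node3:/data/brick1 node4:/data/brick1
gluster volume start gv0

# Mount it on each node, then point PVE storage at the mount
mount -t glusterfs node1:/gv0 /mnt/gluster
```

Note that replica 2 across four bricks is what turns the 40GB raw into roughly 20GB redundant; an arbiter or replica 3 layout trades capacity for better split-brain safety.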
Now you're spread across four data centers around the country in a 4-node PVE cluster: you can move containers between nodes, schedule automatic backups, do migrations, set up load balancing, and get 4x redundancy on your priority sites/services. That's a better, more featureful setup than, say, plain Docker containers on a single Debian server.
That's $5/mo for a cheapo 4-node PVE cluster with 4x redundancy on your services. If you want to go bigger, you can do the same with better providers and machines, if you can afford it.
Couple this with a HostHatch 2TB storage VPS for $3/mo ($110/3yr) if you can find the deal, and that's now $8/mo.
Now you have pretty darn fast storage (HostHatch has been very good so far) and can serve 2TB of Proxmox Backup Server space to your four nodes. You can use the HH storage VPS for other storage needs as well, like Nextcloud.
That's $8/mo for 4 node PVE cluster with 2TB backup storage.
With this setup, you can run a pretty decent small business with redundancy. Cost? $8/mo. The real cost? The learning curve and the time spent setting things up, maintaining them, administering them, etc.
That's even better than a single-point-of-failure dedicated server. It's just not as beefy/powerful as bare metal, but you can still host dozens of small-to-medium websites, a few other services, even Nextcloud configured against the remote 2TB storage.
If you can't get the HH deal, you can set up PVE at home with a larger disk and use that as a backing store instead.
Cheers.
Comments
Yes you can do it within the same DC sure.
I've got a cluster up and it's working nicely. Coupled with ZRAM and some additional swap it runs quite smoothly. Every node is a different server in a different data center.
Here are the numbers on the Racknerd 1G KVM node, which runs 4 containers under a pretty decent load.
It takes 1-2 minutes to back up/restore normal-sized containers over such a network, but latency isn't an issue on the clustered FS.
I understand they recommend certain things, but for my needs, things have been working for a long time.
I mean, I said 'brainstorming', but I'm really speaking from actual experience. It's totally doable, and I don't find it too slow or experience hiccups. I run clusters built from nodes around the world and latency still isn't a big deal.
But I do have a private seventh server acting as a WireGuard VPN hub, so each node can talk to the others on the private network. Routing everything through that hub makes things very slow, though; direct connections between cluster nodes, even over 100Mbps, seem just fine (backups/replications/migrations take longer on a slower network, obviously).
If you turn off the clamav and freshclam services in PMG, even that runs nicely within 1GB RAM in a container and can handle a decent load of email traffic. As you can see, I have two mail proxies relaying to my third mail server for redundancy via MX. Redundancy is the name of the game, the primary goal, along with being cheap and low end!
I should add that you will need to do a lot of performance tuning and tweaking, but it's fun and entirely doable.
For production and business use, one should invest according to business needs. But if funding is lacking, a cheaper, lower-end alternative like this can still work for some people.
Ideally, each node should have 1G connection or higher and maybe 2G+ RAM. 1G is the bare minimum.
Hi @stoned,
Very interesting idea indeed.
Would you mind sharing some more details about how you tweaked Proxmox using ZRAM for use in a low ram (e.g. 1G) VPS?
Thanks very much in advance.
Cheers,
Thanks! Yes, sure.
Use zswap. Edit /etc/default/grub and add the boot parameters:

GRUB_CMDLINE_LINUX_DEFAULT="zswap.enabled=1 zswap.compressor=lz4hc zswap.max_pool_percent=40 zswap.zpool=z3fold"

For this to work, you need the compressor and allocator modules in your initramfs:

echo lz4 >> /etc/initramfs-tools/modules
echo lz4_compress >> /etc/initramfs-tools/modules
echo lz4hc >> /etc/initramfs-tools/modules
echo z3fold >> /etc/initramfs-tools/modules
update-initramfs -u
update-grub
reboot

You can also switch the compressor at runtime:

echo lz4hc > /sys/module/zswap/parameters/compressor

Ensure it's loaded:

dmesg | grep zswap
Combine with ZRAM:
You can even create a 512GB ZRAM device.
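Setting up the ZRAM swap device itself might look like the following sketch; the 512MB size, the lz4hc algorithm, and the swap priority are my illustrative choices, not necessarily the original setup:

```shell
# Sketch: a compressed RAM-backed swap device on a 1GB node
modprobe zram num_devices=1
echo lz4hc > /sys/block/zram0/comp_algorithm
echo 512M > /sys/block/zram0/disksize
mkswap /dev/zram0
# Higher priority than disk-backed swap, so zram fills first
swapon -p 100 /dev/zram0
```

On Debian you can also let the zram-tools package manage this via /etc/default/zramswap instead of raw sysfs writes.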
Now you want to set up /etc/sysctl.conf to tweak swapping:

vm.swappiness = 100
This increases how aggressively the kernel swaps memory pages. Since ZRAM is used first and we're on a low-memory machine, swap at all times, 100%.

vm.vfs_cache_pressure = 500
Cache pressure controls how readily the kernel reclaims memory used for caching directory and inode objects; 500 is a very high value. With swappiness at 100 and pressure at 500, anything not currently in use gets swapped out and only needed things remain in RAM. You use less memory over a longer period, and that benefit outweighs the cost of swapping sooner.

vm.dirty_background_ratio = 1
Start background writeback immediately at 1% dirty pages, instead of the usual 10% or 20%.

vm.dirty_ratio = 50
Don't force synchronous I/O until 50% of memory is dirty.

Over time you will actually end up needing more memory than you have; you know you will swap. ZRAM makes swapping less expensive by storing the same data compressed 2x to 3x smaller, so you should begin that swap exchange sooner.
The cache pressure helps because we are in effect telling the kernel that you don’t have any extra memory lying around to use for cache, so please get rid of it sooner and only store the most frequently used.
Even with reduced caching, if we use most of the installed memory over time, the kernel will start opportunistically swapping to ZRAM much sooner, so the CPU cost of compression and the swap I/O aren't delayed until they all hit at once.
And you also want a decent size SWAP space, 4GB should be enough.
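Creating that disk-backed swap is standard Debian fare; a minimal sketch, with the size and priority values as examples (give it a lower priority than the zram device so it's only used as overflow):

```shell
# Sketch: a 4GB swap file, lower priority than zram
fallocate -l 4G /swapfile    # use dd if the filesystem dislikes fallocate
chmod 600 /swapfile
mkswap /swapfile
swapon -p 10 /swapfile
# Persist across reboots
echo '/swapfile none swap sw,pri=10 0 0' >> /etc/fstab
```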
Some tips I found from another mind: https://unix.stackexchange.com/questions/707108/is-increasing-the-max-pool-percent-parameter-for-zswap-a-smart-idea-to-combine-c
You'll have to write your own custom memory management script based on your needs and what type of containers and/or applications you plan on running. Know how the software works, how much memory it consumes, and optimize from there.
Check out https://unix.stackexchange.com/questions/406925/prevent-zram-lru-inversion-with-zswap-and-max-pool-percent-100 as well.
Cheers. Let me know how your adventure goes.
A couple of comments:
1) I don't get why you wouldn't use LXD instead; it will save you a lot of overhead compared to proxmox, especially given that you're running this on 1GB VPSes.
2) I really don't get why you're using one provider for your four VPSes; you still have one single point of failure, it's just your host now. Why not do 1 GreenCloud, 1 RackNerd, 1 HostHatch, 1 liteserver?
Otherwise, really interesting idea!
For the features that come with Proxmox that aren't in LXD; and the overhead isn't that much. Ideally, you should run it on 2+GB machines at least, but it's also possible with 1GB. Someone else may have different reasons.
LXC is the technology allowing the segmentation of your system into independent containers, whereas LXD is a daemon running on top of it allowing you to manage and operate these instances in an easy and unified way.
In my earlier post I mentioned that the reason is primarily a cluster setup for redundancy (thanks for the point about different providers, I'd forgotten to mention that). Features like backups, clustering, automated replication, etc. don't come with LXC/LXD alone on a Debian box, or even with just Docker containers and Portainer.
Correct. Ideally, yes, spread them out over different providers. I thought I'd mentioned that. I guess I didn't.
I have only one 1GB node; the rest are dedicated servers. Though if it's possible with one 1GB node, it should be possible with N 1GB nodes.
This is an example of the possibility, and by no means a perfect setup. I encourage others to contribute if they'd like and help improve the setup for redundancy and lowend costs with other desired features or suggestions.
Thanks for your ideas and input
P.S. I plan on adding more 1GB nodes and even try it out on a 512MB node for experimentation.
I learned my lesson years ago not to run business stuff on minimum requirements, let alone less than that. Eventually an update will occur, the spec requirements get bumped, and now your server runs even worse than dogshit.
zstd works better than lz4hc
I just added another 1G node for testing. Let's see how all this fares.
Here's a 1GB RAM node in a PVE cluster.
It's now got a third MX for redundancy on a third provider (4 providers in my cluster so far), 512MB RAM container, with memory optimizations:
Same container from the inside:
Mails are being delivered just fine:
Mails are delivered slowly though, but not that I notice it from a user's perspective so far.
VS.
An AMD EPYC 64-core server with 8GB RAM, 2GB for the PMG container, and 4 vCPUs, performing 10x better.
I imagine performance would be about the same, maybe slightly less due to a bit more swapping, if I were to try a 512MB PVE node with a 256MB PMG container plus some other small self-hosted services.
I want to test how far one can push the load and still have acceptable levels of performance.
The only thing left to do is to try netboot.xyz from GRUB through VNC on Racknerd, repartition the 25GB, and try a single btrfs partition or a ZFS partition with max compression on, to squeeze more disk space out of the cheap 25GB nodes.
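If anyone tries that repartitioning idea, enabling heavy compression might look like this sketch; the partition name /dev/vda3, the pool name, and the mount point are all hypothetical:

```shell
# ZFS route: a pool on the spare partition with zstd at a high level
zpool create -o ashift=12 tank /dev/vda3
zfs set compression=zstd-19 tank

# Or the btrfs route: force zstd compression at mount time
mkfs.btrfs /dev/vda3
mount -o compress-force=zstd:15 /dev/vda3 /mnt/data
```

The high zstd levels trade CPU for disk space, which fits these nodes: disk is the scarce resource and the CPUs mostly sit idle.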
You probably don't want to run specs this low as a production setup, but if slightly less performance is acceptable, you can squeeze a lot of juice out of a cheap VPS, given the CPUs are good enough and you're on at least DDR3, preferably DDR4.
So far performance is acceptable, though I will continue to simulate normal load levels to see how the low-spec setup performs.
Terrible idea.
Your idea works with low-end dedis in the same rack. Once you add external storage and it struggles for just a minute, the entire cluster will have a stroke.
Thank you for your opinion.
Even this is fun and possible: PVE on a 512MB node, with a WordPress site up in a 256MB container.
The goal isn't to be as cheap as possible, but to push things as far as possible on the lowest end and still have acceptable levels of performance. It helps determine the lowest possible acceptable setup. It also helps to learn to tweak things, optimize things, and tune the system for performance, which ends up being valuable when you're in an actual production environment and need to tune performance etc. This is just one aspect that's useful. There could be more. It also helps you learn and determine how to meet your needs without going overboard or spending more than required. It's good practice as well.
You also learn how to squeeze the most out of what you have instead of throwing more hardware at the software. People in some countries on the lowest-end budgets may find this useful.
This may not be of interest to most people, but perhaps some might find this useful/interesting.
I personally only have two VPS nodes. The rest are dedicated machines. 2x KVM VPS, and 5x Dedicated. But it's still possible on 512MB VPS as well.
I think it is a pretty neat niche project. Even as just a way to test and learn. If you were looking to go from some sort of shared hosting to a 'higher availability' setup with more control/learning experience it is kind of fun. Especially on a real LET style budget.
Of course anyone relying on serious commercial activities would have more money to spend (otherwise, it's not a serious commercial activity) but I liked all the info and seeing how far you could push it. Something to look for if anyone is replicating this: I'm fairly certain your host has to have nested virtualization turned on for this to work.
LXD is cancer with extra steps.
I had the same idea. LXD is neat, easy to use, has a decent API, still does.
The first issue is snap; it's cancer, and the dev only ships LXD in snaps.
As for Ubuntu with snap as a whole: shoot it in the face and let it die.
Some people at Debian have been trying to get a .deb package ready for LXD for years; it may be ready soon, but apparently it has a ton of dependencies.
LXC is surely great, but LXD in its current state is not.
Not to mention the issues you get from forced updates via snap, even on LTS, that contain breaking changes.
Plus the stuff that breaks if you upgrade to another LXD version.
Proxmox never breaks, ever.
Ah yeah, and the stability issues: it just randomly shits itself, the host needs a hard reboot, but after the reboot everything works again.
If you want to run something in a cluster with high uptime, LXD is not the way to go right now.
Hi @stoned,
Thank you for your reply. May I ask if the setup details you shared are currently in use on your 1GB RAM VPS?
Either way, there are some spots I hope you could shed more light on:
In the Zram script:
Is the value you're using for max_comp_streams 1 or 3?
Did you mean 512"MB" Zram?
Why are there two HDD-backed swap spaces (/swapfile and /dev/vda2) on top of the zram?
Another thing I am wondering, how did you setup the swap inside the container itself?
Thank you again for your help and great post.
Cheers,
Hi @stoned,
Very well said.
Sure, you bet.
Coincidentally, in the past few days I just started playing with the idea of installing Alpine Linux + LXC on my HH 2GB RAM storage VPS as a lightweight abstraction layer, a "virtual management shell".
My first choice would have been Proxmox instead of Alpine+LXC, but I thought it simply wouldn't work.
That's why I was very excited to see your post.
Could you also share details on your WireGuard setup for the Proxmox cluster node connections over WAN? I do have a tiny USFF Proxmox box at home (also Oracle Free Tier, as you mentioned in another post) and would love a centralized setup like yours.
Thank you so much again for sharing the cool idea and setup details.
Cheers,
I've been trying to do this, but with LXD only, over IPv6. I've been spending a lot of time on the discussion forum at linuxcontainers. I must say the maintainers there are really helpful. Nevertheless, it's a learning curve; both LXD and IPv6 are new to me. I'm also realizing how commonly hosts have misconfigured IPv6. I'm now going to take a look at Proxmox VE too. I'm sure what you suggest can be done.
Is the value you're using for max_comp_streams 1 or 3?
The value = the number of threads/CPUs you have.
Yes. My error. Thanks for spotting it.
On Racknerd I used their Debian installer cloud image, which creates a 1GB swap partition. I thought that wasn't enough, so I created an additional swap file.
Since it's an LXC container, swap is shared with the host.
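For reference, a container's swap allowance is set from the PVE host with pct; a sketch, where the container ID 101 and the 512MB figures are hypothetical:

```shell
# Sketch: give container 101 512MB RAM plus 512MB swap from the host
pct set 101 --memory 512 --swap 512

# Check the resulting limits in the container config
pct config 101
```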
You're welcome. Glad to see someone finds it useful!
Hi. Thank you.
You are correct: nested virt needs to be enabled if you plan on using VMs. Without VT-x/VT-d, PVE still functions as an LXC container manager with the added benefits that come with PVE containers.
Cheers.
@stoned
I'm not just bullying and giving my opinion; it's factual.
Proxmox is not meant for ultra-low-end setups; much will be wasted on overhead. You want to squeeze everything out of things? Well, you're wasting a lot of it on overhead. Providers certainly won't like you as a customer either: 1) you swap like mad, and 2) your usage isn't that of the average customer; you're a noisy neighbor.
It's a recipe for disaster. If you don't know what you're doing, you'll have a very bad day once you actually start using such a setup, and you'll just throw away time. Either the provider kicks you out or any number of factors will collapse the cluster.
Thank you for your opinion.
Sure, np.
Say you have four servers. Before building a cluster, install one of them as a WireGuard host; there are many tutorials online. Then set up the rest of your servers as peers of this WireGuard host.
Configure WireGuard for a private IPv4/IPv6 range (10.x, 192.168.x, 172.16.x, or fd80::/64, etc.) and give each machine a private IP.
Connect each peer to the WireGuard host. Now you have a private internal network.
Now you can tell Proxmox to use the private fd80 or 10.x range, and when you create a cluster and add nodes to it, select the private IP from the drop-down list in Proxmox.
Sample config (I mostly use IPv6 only and try not to use legacy IP anymore unless I absolutely have to):

WireGuard server config (endpoints here are peer IPs; a:b:c:d: is just filler):

WireGuard peer/client config (note: ports are auto-selected by WG and updated in the config; Google WG setup tutorials and read the documentation):
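Since the original config screenshots didn't survive, here is a minimal sketch of what such a hub-and-spoke pair could look like; all keys, hostnames, and the fd80::/64 prefix are placeholders:

```ini
# /etc/wireguard/wg0.conf on the hub (the "7th server")
[Interface]
Address = fd80::1/64
ListenPort = 51820
PrivateKey = <hub-private-key>

# One [Peer] block per cluster node
[Peer]
PublicKey = <node1-public-key>
AllowedIPs = fd80::2/128

# /etc/wireguard/wg0.conf on a cluster node
[Interface]
Address = fd80::2/64
PrivateKey = <node1-private-key>

[Peer]
PublicKey = <hub-public-key>
Endpoint = hub.example.com:51820
AllowedIPs = fd80::/64
PersistentKeepalive = 25
```

Bring each side up with `wg-quick up wg0`; for node-to-node traffic that bypasses the hub, you would instead list every node as a peer of every other node.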
Now your cluster operates solely over secure WireGuard. WireGuard has minimal overhead, but you can test both configurations: set up a cluster using the servers' public IPs and compare the latency/performance against the private WireGuard network.
Use the PVE firewall and lock everything down on the external IPs. No problem. Enjoy.
Hi. Thanks for your input.
I prefer IPv6 to IPv4. I dislike having to do NAT, for one. IPv6 is more intuitive and makes sense, whereas IPv4 sometimes needs more setup.
You should always get a /64 from anyone. If someone cannot provide a /64, they may not have their IPv6 set up correctly, or may be lacking knowledge on how to segment their /48 or /56.
I just got around to checking the 1G nodes after a day after some more tweaking.
I switched the compression on both zswap and zram to zstd and started swapping a bit less, as I found I didn't need to swap all the time; 1G is plenty, actually.
Modified sysctl.conf to this a day before yesterday.
And here are the numbers. CPU generally doesn't go over 30% tops at idle unless I'm doing something. There are very minor spikes here and there when a mail comes in or when I'm in the GUI. So unless I'm actually using the CPU, it's not being too noisy and is still fair-share usage, which some were worried about.
And here's the other 1GB Racknerd node. I was updating the system earlier so CPU usage was high, but it idles around 20% with minor spikes here and there.
Even though it can be done, 1G is a bit low. 2G+ nodes are what you ideally want.
So if you get 2GB nodes, that's $10/mo, and you get another vCPU plus more disk space, for a better cluster even on KVM VPSes without nested virt. PVE is still a good manager, centralized, other features, LXC container manager, backups, replication, etc. etc.
So the 2GB+ might be a better setup option, but depends on your needs. If you're not running a lot and still want all the PVE features, then 1G is fine too with this low end ZRAM/ZSWAP setup.
If you're going to be running heavier containers needing more RAM, buy VPS with more RAM. That cannot be overstated.
For my needs, for $5/mo, I can get four redundant MX Proxmox Mail Gateways routing email to my server, so if one goes down, another can deliver it. Though for real MX redundancy there should be one outside the cluster, so that if the entire cluster fails and all nodes go down, mail can still be delivered.
In short, business/personal needs dictate what resources should be acquired. This was fun. Thanks for humoring me, folks.
You can get it for less.
Wat, a 3GB swap file and you're swapping 1.6GB to disk, on a cheap VPS?
You're aware this causes high I/O usage, which is a reason to suspend you?
iotop doesn't seem so busy... though I will collect iotop logs over a period of time.
https://termbin.com/hby59
It doesn't seem to be writing enough to be considered high I/O. It's busy, sure, but it's not writing a lot; a few hundred KB at most per iteration of iotop.
What you are seeing is the initial swapping at boot, after which it stabilizes unless you need to reboot, which I don't do frequently. I'm not sure how to collect vmstat data over time; it doesn't seem to have an option for non-interactive use.
This is basically equivalent to a web or self-hosted service where the disk is written to frequently, but not under conditions that would cause thrashing or unacceptably high I/O.
Furthermore, most of the active paging I see is in compressed memory, not to disk, after the initial swapping at boot and container startup.
512MB of that swap is actually ZRAM. Then 40% of 1GB, roughly 400MB, is compressed by zswap. Most of the active paging happens between memory and compressed memory before it ever hits the disk. Disk I/O is not much, but swapping is frequent thanks to ZRAM/zswap compression.
I'll continue monitoring vmstat and swap i/o over time and see how that looks.
Unless I'm wrong, which is likely.
New Debian Release going to ship LXD 5.0.
https://wiki.debian.org/LXD
Thanks for the update. That will be awesome. Hopefully Debian's packaging will be more stable than Ubuntu's.
So, I've been monitoring /swapfile i/o and other stats, and here are the numbers for the 2x 1G Racknerd nodes:
And node 2:
Sure, we see 1.5GB in swap, but the CPU idles at 10% usually, no more, with barely any spikes even when delivering mail, which is my primary use case for the 1G nodes.
Disk I/O is minimal at most. I do not see any kworker threads consuming disk I/O for /swapfile or the /dev/vda2 swap partition, per vmstat -d and other tools like perf. And here are the numbers averaged over a day:
Both 1G nodes idle at 10% CPU and swap in and out between zswap and zram. There is no high I/O, high swap I/O, high CPU usage, or any other noise that might attract the providers' attention.
In previous screenshots I was using the system a lot, installing, updating, etc. so the load seemed high. But when it's just sitting there doing its job, it's not bad at all.
I should mention the updated swap tuning: on a 1G node, you want 256MB ZRAM, 20% pool space in zswap, the zstd compressor for both, and updated sysctl.conf settings.
My initial 100% swappiness caused a bit too much swapping; 60-80% seems to work well. Reduce the cache pressure to 50% of the original value, and don't start pushing dirty pages out until 60% capacity is reached, with the dirty ratio at 50%.
Play around with the numbers here and see what works best for you.
Let's see how Debian 12 does. Cheers folks, thanks for reading!
What's the software in the screenshot?
May I ask how you handle the inbound public IP address when a VPS migrates to another node?
@stoned I like your idea! Sounds like a fun thing to try.