
Cheap $5/mo 4-NODE redundant PVE Cluster Setup for Small Biz/Personal Use (low end needs)


Comments

  • @trycatchthis said: What's the software in the screenshot?

    It's Proxmox.

  • I think this is how to combine VPSes?
    But I tried and couldn't. lol

  • To @stoned and whoever may be interested in this thread:

    It looks like ZRAM does not discard by default:

    https://www.reddit.com/r/linux/comments/u0hb2a/tips_enable_discard_on_zram/

    So it seems to be a good idea to add -d to swapon, otherwise it might fill up and not reclaim space after some time (please correct me if I am wrong):

    # swapon -d /dev/zram0

    Cheers,

  • @trycatchthis said:

    @stoned said:

    What's the software in the screenshot?

    Proxmox VE. https://www.proxmox.com/en/proxmox-ve

  • @michae1 said:
    @stoned I like your idea! Sounds like a fun thing to try.

    Glad to hear. Have fun! :)

  • @noviap09 said:
    I think this is how to combine VPSes?
    But I tried and couldn't. lol

    What have you tried? What results did you get?

  • michae1 Member
    edited January 2023

    @stoned I still have a lot to learn. Hope you will stick around. I'll PM you or post here if things get stuck.
    Cheers!

  • stoned Member
    edited January 2023

    @64383042a said:
    To @stoned and whoever may be interested in this thread:

    It looks like ZRAM does not discard by default:

    https://www.reddit.com/r/linux/comments/u0hb2a/tips_enable_discard_on_zram/

    So it seems to be a good idea to add -d to swapon, otherwise it might fill up and not reclaim space after some time (please correct me if I am wrong):

    # swapon -d /dev/zram0

    Cheers,

    from man swapon

           -d, --discard[=policy]
               Enable swap discards, if the swap backing device supports the discard or trim
               operation. This may improve performance on some Solid State Devices, but often it does
               not. The option allows one to select between two available swap discard policies:
    
               --discard=once
                   to perform a single-time discard operation for the whole swap area at swapon; or
    
               --discard=pages
                   to asynchronously discard freed swap pages before they are available for reuse.
    
               If no policy is selected, the default behavior is to enable both discard types. The
               /etc/fstab mount options discard, discard=once, or discard=pages may also be used to
               enable discard flags.
    

    and from the kernel's mm/swapfile.c

    /*
     * swapon tell device that all the old swap contents can be discarded,
     * to allow the swap device to optimize its wear-levelling.
     */
    static int discard_swap(struct swap_info_struct *si)
    {
        struct swap_extent *se;
        sector_t start_block;
        sector_t nr_blocks;
        int err = 0;
    
        /* Do not discard the swap header page! */
        se = first_se(si);
        start_block = (se->start_block + 1) << (PAGE_SHIFT - 9);
        nr_blocks = ((sector_t)se->nr_pages - 1) << (PAGE_SHIFT - 9);
        if (nr_blocks) {
            err = blkdev_issue_discard(si->bdev, start_block,
                    nr_blocks, GFP_KERNEL);
            if (err)
                return err;
            cond_resched();
        }
    
        for (se = next_se(se); se; se = next_se(se)) {
            start_block = se->start_block << (PAGE_SHIFT - 9);
            nr_blocks = (sector_t)se->nr_pages << (PAGE_SHIFT - 9);
    
            err = blkdev_issue_discard(si->bdev, start_block,
                    nr_blocks, GFP_KERNEL);
            if (err)
                break;
    
            cond_resched();
        }
        return err;     /* That will often be -EOPNOTSUPP */
    }
    

    and

    /*
     * swap allocation tell device that a cluster of swap can now be discarded,
     * to allow the swap device to optimize its wear-levelling.
     */
    static void discard_swap_cluster(struct swap_info_struct *si,
                     pgoff_t start_page, pgoff_t nr_pages)
    {
        struct swap_extent *se = offset_to_swap_extent(si, start_page);
    
        while (nr_pages) {
            pgoff_t offset = start_page - se->start_page;
            sector_t start_block = se->start_block + offset;
            sector_t nr_blocks = se->nr_pages - offset;
    
            if (nr_blocks > nr_pages)
                nr_blocks = nr_pages;
            start_page += nr_blocks;
            nr_pages -= nr_blocks;
    
            start_block <<= PAGE_SHIFT - 9;
            nr_blocks <<= PAGE_SHIFT - 9;
            if (blkdev_issue_discard(si->bdev, start_block,
                        nr_blocks, GFP_NOIO))
                break;
    
            se = next_se(se);
        }
    }
    

    If no policy is selected, the default behavior is to enable both discard types.

    Discard is the Linux term for telling a storage device that sectors are no longer storing valid data, and it applies equally to both ATA and SCSI devices, i.e.:

    TRIM is the actual ATA-8 command that is sent to an SSD to cause a sector range or set of sector ranges to be discarded. As such, it should only apply to ATA devices, but it is often used generically. Given the prevalence of ATA devices, TRIM is often the most used of these terms.

    Seems like ZRAM is DISCARD-capable.

    #  lsblk --discard /dev/zram0 
    NAME  DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
    zram0        0        4K       2T         0
    

    Explanation: When files are removed, zram doesn't remove the compressed pages in memory, because it's not notified that the space is no longer used for data. The discard option performs a discard when a file is removed. If you use the discard mount option, zram will be notified about the unused pages and will resize accordingly.

    Not sure if this requires manually clearing out the ZRAM device or what...

    Thanks for the tip. I'll test it out.
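
    For anyone else who wants to try it, here's a minimal sketch of setting up a zram swap device with discard enabled (the size, compression algorithm, and priority below are just examples, not the exact settings from this thread):

    # load the zram module and create one device
    modprobe zram num_devices=1
    # pick a compression algorithm and size (assumes zstd is available; check /sys/block/zram0/comp_algorithm)
    echo zstd > /sys/block/zram0/comp_algorithm
    echo 512M > /sys/block/zram0/disksize
    # format it as swap and enable it with -d so freed pages get discarded back to zram
    mkswap /dev/zram0
    swapon -d -p 100 /dev/zram0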

  • @michae1 said:
    @stoned I still have a lot to learn. Hope you will stick around. I'll PM you or post here if things get stuck.
    Cheers!

    I am a full-time student, on crunch time, and cannot provide personal tech support. I can try to do my best here in the forums. Thank you.

  • stoned Member
    edited January 2023

    @banana_mcn said:
    May I know how to handle the inbound public IP address when the VPS migrates to another node?

    I'm not sure what you're asking. VPS migration? Are you asking about moving your PVE installation to a different VPS with a different public IP?

    If so, then first back up the containers on that node somewhere, then run pvecm delnode nodename_here to remove the node from the cluster. Then install PVE on a different VPS, join the cluster again with the new public IP, and restore your containers/VMs.
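
    Roughly, the workflow looks like this (just a sketch; the storage name, node name, and backup filename are placeholders, and pct restore is for containers - use qmrestore for VMs):

    # on the old node: back up the container (101 and "local" are examples)
    vzdump 101 --storage local --mode snapshot
    # copy the dump archive off the node, then on a remaining cluster member:
    pvecm delnode oldnode
    # on the freshly installed PVE node (new VPS, new public IP):
    pvecm add <ip-of-an-existing-cluster-node>
    # restore the container from the copied archive
    pct restore 101 /var/lib/vz/dump/vzdump-lxc-101-<timestamp>.tar.zst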

  • @stoned said:

    @banana_mcn said:
    May I know how to handle the inbound public IP address when the VPS migrates to another node?

    I'm not sure what you're asking, but the correct method is to first use pvecm delnode nodename to delete the node, then do your other operations, set up somewhere else with a new IP, and then add that new PVE installation to the cluster.

    Just want to learn more about HA on PVE.

    Public IP of Node A 120.0.0.2
    Public IP of Node B 120.0.0.3
    Public IP of Node C 120.0.0.4

    If I put my website (lxc101) on Node A, the inbound IP from outside should be 120.0.0.2.
    Then, if I migrate the website (lxc101) to Node B, will the IP change to 120.0.0.3?
    How do I handle the public inbound IP change while migrating the VPS/LXC container to a different node? An auto DNS update script, or a layer 4 load balancer?

  • Maounique Host Rep, Veteran

    I think this is a great tinkering project to kill time and learn things in the LE style.
    Is it useful otherwise? Not really.
    Still, it should be great fun in our community.

  • stoned Member
    edited January 2023

    @banana_mcn said:

    @stoned said:

    @banana_mcn said:
    May I know how to handle the inbound public IP address when the VPS migrates to another node?

    I'm not sure what you're asking, but the correct method is to first use pvecm delnode nodename to delete the node, then do your other operations, set up somewhere else with a new IP, and then add that new PVE installation to the cluster.

    Just want to learn more about HA on PVE.

    Public IP of Node A 120.0.0.2
    Public IP of Node B 120.0.0.3
    Public IP of Node C 120.0.0.4

    If I put my website (lxc101) on Node A, the inbound IP from outside should be 120.0.0.2.
    Then, if I migrate the website (lxc101) to Node B, will the IP change to 120.0.0.3?
    How do I handle the public inbound IP change while migrating the VPS/LXC container to a different node? An auto DNS update script, or a layer 4 load balancer?

    Hi, I've been busy. But here I am now.

    Yes, any time you move containers between nodes, since containers are behind NAT (unless you use routed IPv6), you need to update their IP addresses.

    Public IP of Node A 120.0.0.2
    Public IP of Node B 120.0.0.3
    Public IP of Node C 120.0.0.4

    So you have 3 servers. If you talk to .2 to reach a container on Node A, then after you move that container to Node B, you reach it through .3's NAT.

    Check my other post here about PVE setup: https://lowendtalk.com/discussion/183188/how-to-setup-ipv6-on-proxmox-on-naranja-tech-server-they-only-give-a-b-c-d-1-64-in-their-panel#latest
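
    On the "auto DNS update script" idea from your question - a minimal sketch, assuming the record lives in Cloudflare DNS with a low TTL (Cloudflare, the zone/record IDs, and the hostname are my assumptions, not something from this thread):

    # repoint the A record at the new node's public IP after a migration
    curl -s -X PUT "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records/$RECORD_ID" \
      -H "Authorization: Bearer $CF_API_TOKEN" \
      -H "Content-Type: application/json" \
      --data '{"type":"A","name":"www.example.com","content":"120.0.0.3","ttl":60,"proxied":false}'

    A layer 4 load balancer in front of all nodes avoids waiting on DNS, but then the balancer itself becomes a single point of failure unless it's redundant too.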

  • stoned Member
    edited January 2023

    I received a PM that I'll address publicly so it may help others.

    Hello @stoned,
    What are the related applications for this kind of a setup? Could you list a few scenarios?
    Thank you!

    Redundancy is most important, so I have 4 providers in a 7-node cluster. I may remove PVE from the 1GB nodes, as I need to do things and 1GB for a PVE node is too little for anything beyond basic stuff. If you upgrade to servers with 2GB or more RAM, that would be much better.

    The application could be anything you want, really, but my goal was cheap redundancy instead of spending that much or more on a single server.

    I only wanted to see if this could be done, and yes it can, but it's not going to be a viable production solution at 1GB of RAM. I cannot stress this enough. If you want a PVE cluster, your best bet is, at the very minimum, 2GB RAM with the swap settings here.

    With 4GB RAM you can keep the same swap setting, but lower swappiness to, say, 10-20.

    I have about 12TB on this cluster: 3x Proxmox Backup Server doing daily syncing, and 4x Proxmox Mail Gateway for MX redundancy.

    Soon I will cluster a couple of SMTP servers together for outbound mail, so in case one fails, another can deliver. For that, I'm probably going to try Postfix and possibly manual syncing of mailboxes.

    Basically, I'm going to clone my SMTP server to at least 4 nodes, and if anything happens to one, I can simply point my PMG relay to the new SMTP server and mail will still go out. Same server, same config, just a different IP.

    I'm also going to have a backup with a hosted mailing service, so in case something happens to my IPs, I can quickly change the transport to Amazon SES, Mailgun, Sendinblue, or whatever.
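
    For the "change the transport" part, here is a minimal Postfix sketch of relaying outbound mail through a hosted service (the SES endpoint, port, and credentials file are assumptions for illustration, not my actual config):

    # /etc/postfix/main.cf - relay all outbound mail through an authenticated smarthost
    relayhost = [email-smtp.us-east-1.amazonaws.com]:587
    smtp_sasl_auth_enable = yes
    smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
    smtp_sasl_security_options = noanonymous
    smtp_tls_security_level = encrypt

    Then run postmap /etc/postfix/sasl_passwd and systemctl reload postfix to apply it.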

  • @stoned said: But I do have a private 7th server acting as a wireguard vpn, so each node can talk to other nodes on the private network. But that goes through the 7th server and makes things very slow. Direct connections to the cluster nodes even over 100mbps seem just fine (backups/replications/migrations take longer on slower network obviously).

    Maybe you should use Netmaker to build a mesh network, so each server can connect to the others directly without sending traffic to the centre node first.

  • Nice project. Do you know how to forward an IP from Proxmox A to B using WireGuard?

    We have 4 usable IPs on Proxmox A, but the resources are too low to make another LXC container.

  • @topper said:

    @stoned said: But I do have a private 7th server acting as a wireguard vpn, so each node can talk to other nodes on the private network. But that goes through the 7th server and makes things very slow. Direct connections to the cluster nodes even over 100mbps seem just fine (backups/replications/migrations take longer on slower network obviously).

    Maybe you should use Netmaker to build a mesh network, so each server can connect to the others directly without sending traffic to the centre node first.

    Oh, I didn't even know about this. Looks fantastic. I'll check it out on a few VMs first in VirtualBox. Thank you!
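
    In the meantime, a plain WireGuard full mesh is also doable by hand. Here's a minimal sketch of one node's config (the keys, tunnel subnet, and ports are placeholders; the 120.0.0.x addresses just reuse the example IPs from earlier in the thread):

    # /etc/wireguard/wg0.conf on node A
    [Interface]
    Address = 10.10.0.1/24
    PrivateKey = <node-a-private-key>
    ListenPort = 51820

    # node B
    [Peer]
    PublicKey = <node-b-public-key>
    Endpoint = 120.0.0.3:51820
    AllowedIPs = 10.10.0.2/32

    # node C
    [Peer]
    PublicKey = <node-c-public-key>
    Endpoint = 120.0.0.4:51820
    AllowedIPs = 10.10.0.3/32

    Each node lists every other node as a [Peer], so traffic goes host-to-host instead of through the central VPN server.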

  • stoned Member
    edited January 2023

    Update:

    The 1GB RAM RackNerd node in my PVE cluster, average loads over the past month:

    We reached 40% CPU once in the last week or so (during a system update).
    We have barely any disk activity, no thrashing at all, no noise, no high disk I/O.

    Every now and again, the pve-cluster service will crash because of OOM, which leads me to think 1GB is just not feasible for PVE, though it can be done. Even when the pve-cluster service crashes, it's just a matter of restarting it. Everything continues to work normally.
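
    Since the fix is always just restarting the service, one thing worth trying (a sketch, not something I've battle-tested on these nodes) is letting systemd restart pve-cluster automatically when it dies:

    # /etc/systemd/system/pve-cluster.service.d/override.conf
    [Service]
    Restart=on-failure
    RestartSec=10

    Then systemctl daemon-reload to pick it up.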

    Here's one container, Alpine, 64MB, nginx proxy manager:

    Here's the second container, Proxmox Mail Gateway:

    @LightBlade said:
    Nice project. Do you know how to forward an IP from Proxmox A to B using WireGuard?

    We have 4 usable IPs on Proxmox A, but the resources are too low to make another LXC container.

    Thanks. Sure. It depends on how you have things set up, but iptables forwarding is pretty easy. You can forward anything short of IPv6-to-IPv4, which can be done using 6tunnel, for example.

    Please ask your question in some detail. Also, it may be time to learn iptables!
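
    As a starting point, here's a rough sketch of forwarding one public IP/port on Proxmox A to a container reachable over the WireGuard tunnel (the IPs, the wg0 interface name, and the port are placeholders - adjust to your setup, and persist the rules with iptables-persistent or similar):

    # on Proxmox A: enable forwarding
    echo 1 > /proc/sys/net/ipv4/ip_forward
    # send traffic hitting the spare public IP on port 443 to the container's address on the tunnel network
    iptables -t nat -A PREROUTING -d 203.0.113.10 -p tcp --dport 443 -j DNAT --to-destination 10.10.0.2:443
    # masquerade over the tunnel so replies come back through Proxmox A
    iptables -t nat -A POSTROUTING -o wg0 -j MASQUERADE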

  • So I was wrong that the pve-cluster service gets OOM-killed and has to be restarted because of the low memory of 1GB.

    The same thing happened on a 32GB dedicated server. The service crashed, and the containers continued to work, but there was no administration panel. So I had to log in to the server, restart the service, and everything was back to normal.

    Looks like PVE isn't without its quirks. Still, it seems a better solution than most other things, clustering-wise.

    I'm going to set up another cluster, not PVE this time, with at least 3-4 machines, and see if I can get an HA cluster going with container management using LXD. Try to get away from PVE.

    Stay tuned.

  • stoned Member
    edited March 2023

    Hello. If anyone is still attempting to do this with 1GB nodes, here's a bit more info. Two things:

    1)
    Due to the following, I wanted to upgrade to 2+GB RAM nodes. I talked with RackNerd, where I have the 2x 1GB nodes, and realized they can't simply bump up my plan with rapid elasticity; I'd have to cancel the server and get another one, and that would cost me time changing everything over to the new IPs, etc.

    I was sometimes using 80-100% CPU for minutes at a time. Disk activity was normal, but the CPU was going mad, sometimes so much that corosync wouldn't reach quorum properly. Keep in mind, we're clustering over WAN links here, not local LAN links, and corosync is sensitive to jitter even though it may not use a lot of bandwidth.

    I removed ZRAM and ZSWAP, as various monitors and tests showed the compressed RAM was causing this. Without ZRAM/ZSWAP, and with just the following:

    vm.swappiness = 100
    vm.vfs_cache_pressure = 500
    vm.dirty_background_ratio = 10
    vm.dirty_ratio = 10
    

    It has been more stable, with far less CPU and disk activity on the 1GB RAM nodes. I didn't want to go through changing configs, so I kept the same IPs on the same 1GB RAM nodes instead of nuking the servers and getting larger ones.

    With the above config, even the 1GB RAM nodes are very quiet, running an Exim4 container, Dovecot, etc., plus Proxmox Mail Gateway and Nginx Proxy Manager, all three containers on a 1GB RAM KVM VPS, and performance is pretty good, disk activity is very low, and CPU is very low.

    I'm going to throw a few more static sites on there and do some intense traffic tests on Nginx in a container on the 1GB nodes, to see how far they can be pushed while staying performant.
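
    (For reference, to make those sysctl values survive a reboot, they can go in a drop-in file; the filename here is just an example:)

    # /etc/sysctl.d/99-lowmem-pve.conf
    vm.swappiness = 100
    vm.vfs_cache_pressure = 500
    vm.dirty_background_ratio = 10
    vm.dirty_ratio = 10

    # apply without rebooting
    sysctl --system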

    2)
    corosync is quite strange. I've noticed that over time, the corosync service starts to consume as much RAM as the system will allow. This is the reason corosync gets OOM-killed, and if two nodes are down, the cluster doesn't reach quorum properly. I have the following in cron:

    */5 * * * * for i in {0..9}; do sync; echo 1 > /proc/sys/vm/drop_caches; sync; echo 2 > /proc/sys/vm/drop_caches; sync; echo 3 > /proc/sys/vm/drop_caches; sync; done
    */30 */1 * * * service corosync restart
    

    We drop caches (FS caches, inode caches, etc.) every 5 minutes, 10 times each (still testing this, but it's been stable and keeps needless stuff out of RAM), and restart the corosync service every 30 minutes (per the cron entry above), just in case it starts to consume large amounts of RAM.

    I actually found this to be especially useful on the larger servers, like the dedicated server I have with 32GB RAM.

    Okay, cheers everyone. Please excuse any grammatical errors.

  • Sorry for bumping an old thread - I came across it as I'm about to try something similar with 4 x 2GB Incus nodes + MicroCeph / MicroOVN. 4 nodes is better than 3 for Ceph - it can continue operating with one node down.

    Incus (the new LXD fork) is probably a better low-end choice - a Proxmox cluster needs 2 x NICs on each node & 2 separate networks for reliable operation.

    As WireGuard has such low overhead, running it on each node to create a mesh so they can communicate directly will be a better choice. I forked wg-meshconf to add quantum-resistant security.

    For automating the WireGuard setup, ansible-semaphore (an Ansible web GUI) looks good & integrates with git (Gitea under Alpine works well in unprivileged LXD). Semaphore runs fine under rootless Podman inside unprivileged Ubuntu LXD (rootless Docker doesn't start properly after a reboot for some reason inside LXD).

    I also wrote distrobuilder-menu for creating custom LXD / LXC containers with distrobuilder, which you may find useful.
