Cheap $5/mo 4-NODE redundant PVE Cluster Setup for Small Biz/Personal Use (low end needs)
Given Racknerd's awesome deals, here's some brainstorming:
$1.25 * 4 = $5, right? So...
Take Racknerd, for example: four data centers, buy a $1.25/mo VPS at each one. That costs you $14.88/year * 4 = $59.52/year. Call it $60, and $60/12 = $5/mo.
For $60/year, $5/mo, you can set up Debian on each server, install Proxmox VE, and use PVE as an LXC container manager.
For that $60/year, you'll have a 4 * 25GB = 100GB cluster with 4 vCPUs and 4GB RAM in total.
Using IPv4 you can set up GlusterFS, share a 10GB directory brick from each server, and get 20GB of redundant, networked, clustered FS storage (replica 2 over the four bricks).
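The GlusterFS step might look like this; a sketch, assuming hostnames node1-node4, a brick directory at /data/brick1 on each node, and a replica-2 layout (all names here are placeholders, not the original setup):

```shell
# On every node: install and start the Gluster daemon
apt install glusterfs-server
systemctl enable --now glusterd

# From node1: form the trusted pool
gluster peer probe node2
gluster peer probe node3
gluster peer probe node4

# Create a 2x2 distributed-replicated volume: 4 x 10GB bricks with
# replica 2 gives ~20GB usable, each file stored on two nodes
mkdir -p /data/brick1
gluster volume create gv0 replica 2 \
  node1:/data/brick1 node2:/data/brick1 \
  node3:/data/brick1 node4:/data/brick1
gluster volume start gv0

# Mount it on each node, then point PVE storage at the mount
mount -t glusterfs node1:/gv0 /mnt/gluster
```

Note that replica 2 across four bricks is what turns the 40GB raw into roughly 20GB redundant; an arbiter or replica 3 layout trades capacity for better split-brain safety.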
Now you're spread across four data centers around the country in a 4-node PVE cluster: you can move containers between nodes, schedule automatic backups, do migrations, set up load balancing, and get 4x redundancy on your priority sites/services. That's a better, more featureful setup than, say, plain Docker containers on a single Debian server.
That's $5/mo for a cheapo 4-node PVE cluster with 4x redundancy on your services. If you want to go bigger, you can do the same with better providers and machines, if you can afford it.
Couple this with a HostHatch 2TB storage VPS for $3/mo ($110/3yr) if you can find the deal, and that's now $8/mo.
Now you have pretty darn fast storage (HostHatch has been very good so far) and can serve 2TB of Proxmox Backup Server space to your four nodes. You can use the HH storage VPS for other storage needs as well, like Nextcloud.
That's $8/mo for 4 node PVE cluster with 2TB backup storage.
With this setup, you can run a pretty decent small business with redundancy. Cost? $8/mo. The real cost? The learning curve and the time spent setting things up, maintaining them, administering them, etc.
That's even better than a single-point-of-failure dedicated server. It's just not as beefy/powerful as bare metal, but you can still host dozens of small-to-medium websites, a few other services, even Nextcloud configured against the remote 2TB storage.
If you can't get the HH deal, you can set up PVE at home with a larger disk and use that as a backing store instead.
Cheers.
Comments
Yes you can do it within the same DC sure.
I've got a cluster up and it's working nicely. Coupled with ZRAM and some additional swap it runs quite smoothly. Every node is a different server in a different data center.
Here are the numbers on the Racknerd 1G KVM node, which runs 4 containers under a pretty decent load.
It takes 1-2 minutes to back up/restore normal-sized containers over such a network, but latency isn't an issue on the clustered FS.
I understand they recommend certain things, but for my needs, things have been working for a long time.
I mean, I said 'brainstorming', but I'm really speaking from actual experience. It's totally doable, and I don't find it too slow or experience hiccups. I run clusters built from nodes around the world and latency still isn't a big deal.
But I do have a private seventh server acting as a WireGuard VPN hub, so each node can talk to the others on the private network. Routing everything through that hub makes things very slow, though; direct connections between cluster nodes, even over 100Mbps, seem just fine (backups/replications/migrations take longer on a slower network, obviously).
If you turn off the clamav and freshclam services in PMG, even that runs nicely within 1GB RAM in a container and can handle a decent load of email traffic. As you can see, I have two mail proxies relaying to my third mail server for redundancy via MX. Redundancy is the name of the game, the primary goal, along with being cheap and low end!
I should add that you will need to do a lot of performance tuning and tweaking, but it's fun and entirely doable.
For production and business use, one should invest according to business needs. But if funding is lacking, a cheaper, lower-end alternative like this can still work for some people.
Ideally, each node should have 1G connection or higher and maybe 2G+ RAM. 1G is the bare minimum.
Hi @stoned,
Very interesting idea indeed.
Would you mind sharing some more details about how you tweaked Proxmox using ZRAM for use in a low ram (e.g. 1G) VPS?
Thanks very much in advance.
Cheers,
Thanks! Yes, sure.
Use zswap. Edit /etc/default/grub and add the boot parameters:

GRUB_CMDLINE_LINUX_DEFAULT="zswap.enabled=1 zswap.compressor=lz4hc zswap.max_pool_percent=40 zswap.zpool=z3fold"

For this to work, you need the compressor and allocator modules in your initramfs:

echo lz4 >> /etc/initramfs-tools/modules
echo lz4_compress >> /etc/initramfs-tools/modules
echo lz4hc >> /etc/initramfs-tools/modules
echo z3fold >> /etc/initramfs-tools/modules
update-initramfs -u
update-grub
reboot

You can also switch the compressor at runtime:

echo lz4hc > /sys/module/zswap/parameters/compressor

Ensure it's loaded:

dmesg | grep zswap
Combine with ZRAM:
You can even create a 512GB ZRAM device.
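Setting up the ZRAM swap device itself might look like the following sketch; the 512MB size, the lz4hc algorithm, and the swap priority are my illustrative choices, not necessarily the original setup:

```shell
# Sketch: a compressed RAM-backed swap device on a 1GB node
modprobe zram num_devices=1
echo lz4hc > /sys/block/zram0/comp_algorithm
echo 512M > /sys/block/zram0/disksize
mkswap /dev/zram0
# Higher priority than disk-backed swap, so zram fills first
swapon -p 100 /dev/zram0
```

On Debian you can also let the zram-tools package manage this via /etc/default/zramswap instead of raw sysfs writes.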
Now you want to set up /etc/sysctl.conf to tweak swapping:

vm.swappiness = 100
This increases how aggressively the kernel swaps memory pages. Since ZRAM is used first and we're on a low-memory machine, swap at all times, 100%.

vm.vfs_cache_pressure = 500
Cache pressure controls how readily the kernel reclaims memory used for caching directory and inode objects; 500 is a very high value. With swappiness at 100 and pressure at 500, anything not currently in use gets swapped out and only needed things remain in RAM. You use less memory over a longer period, and that benefit outweighs the cost of swapping sooner.

vm.dirty_background_ratio = 1
Start background writeback immediately at 1% dirty pages, instead of the usual 10% or 20%.

vm.dirty_ratio = 50
Don't force synchronous I/O until 50% of memory is dirty.

Over time you will actually end up needing more memory than you have; you know you will swap. ZRAM makes swapping less expensive by storing the same data compressed 2x to 3x smaller, so you should begin that swap exchange sooner.
The cache pressure helps because we are in effect telling the kernel that you don’t have any extra memory lying around to use for cache, so please get rid of it sooner and only store the most frequently used.
Even with reduced caching, if we use most of the installed memory over time, the kernel will start opportunistically swapping to ZRAM much sooner, so the CPU cost of compression and the swap I/O aren't delayed until they all hit at once.
And you also want a decent size SWAP space, 4GB should be enough.
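Creating that disk-backed swap is standard Debian fare; a minimal sketch, with the size and priority values as examples (give it a lower priority than the zram device so it's only used as overflow):

```shell
# Sketch: a 4GB swap file, lower priority than zram
fallocate -l 4G /swapfile    # use dd if the filesystem dislikes fallocate
chmod 600 /swapfile
mkswap /swapfile
swapon -p 10 /swapfile
# Persist across reboots
echo '/swapfile none swap sw,pri=10 0 0' >> /etc/fstab
```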
Some tips I found from another mind: https://unix.stackexchange.com/questions/707108/is-increasing-the-max-pool-percent-parameter-for-zswap-a-smart-idea-to-combine-c
You'll have to write your own custom memory management script based on your needs and what type of containers and/or applications you plan on running. Know how the software works, how much memory it consumes, and optimize from there.
Check out https://unix.stackexchange.com/questions/406925/prevent-zram-lru-inversion-with-zswap-and-max-pool-percent-100 as well.
Cheers. Let me know how your adventure goes.
A couple of comments:
1) I don't get why you wouldn't use LXD instead; it will save you a lot of overhead compared to proxmox, especially given that you're running this on 1GB VPSes.
2) I really don't get why you're using one provider for your four VPSes; you still have one single point of failure, it's just your host now. Why not do 1 GreenCloud, 1 RackNerd, 1 HostHatch, 1 liteserver?
Otherwise, really interesting idea!
For the features that come with Proxmox that aren't in LXD; and the overhead isn't that much. Ideally, you should run it on 2+GB machines at least, but it's also possible with 1GB. Someone else may have different reasons.
LXC is the technology allowing the segmentation of your system into independent containers, whereas LXD is a daemon running on top of it allowing you to manage and operate these instances in an easy and unified way.
In my earlier post I mentioned that the reason is primarily a cluster setup for redundancy (thanks for the point about different providers, I'd forgotten to mention that). Features like backups, clustering, automated replication, etc. don't come with LXC/LXD alone on a Debian box, or even with just Docker containers and Portainer.
Correct. Ideally, yes, spread them out over different providers. I thought I'd mentioned that. I guess I didn't.
I have only one 1GB node; the rest are dedicated servers. Though if it's possible with one 1GB node, it should be possible with N 1GB nodes.
This is an example of the possibility, and by no means a perfect setup. I encourage others to contribute if they'd like and help improve the setup for redundancy and lowend costs with other desired features or suggestions.
Thanks for your ideas and input
P.S. I plan on adding more 1GB nodes and even try it out on a 512MB node for experimentation.
I learned my lesson years ago not to run business stuff on minimum requirements, let alone less than that. Eventually an update will occur, the spec requirements get bumped, and now your server runs even worse than dogshit.
zstd works better than lz4hc
I just added another 1G node for testing. Let's see how all this fares.
Here's a 1GB RAM node in a PVE cluster.
It's now got a third MX for redundancy on a third provider (4 providers in my cluster so far), 512MB RAM container, with memory optimizations:
Same container from the inside:
Mails are being delivered just fine:
Mails are delivered slowly though, but not that I notice it from a user's perspective so far.
VS.
An AMD EPYC 64-core server with 8GB RAM, 2GB for the PMG container, and 4 vCPUs, performing 10x better.
I imagine performance would be about the same, maybe slightly less due to a bit more swapping, if I were to try a 512MB PVE node with a 256MB PMG container plus some other small self-hosted services.
I want to test how far one can push the load and still have acceptable levels of performance.
The only thing left to do is to try netboot.xyz from GRUB through VNC on Racknerd, repartition the 25GB, and try a single btrfs partition or a ZFS partition with max compression on, to squeeze more disk space out of the cheap 25GB nodes.
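If anyone tries that repartitioning idea, enabling heavy compression might look like this sketch; the partition name /dev/vda3, the pool name, and the mount point are all hypothetical:

```shell
# ZFS route: a pool on the spare partition with zstd at a high level
zpool create -o ashift=12 tank /dev/vda3
zfs set compression=zstd-19 tank

# Or the btrfs route: force zstd compression at mount time
mkfs.btrfs /dev/vda3
mount -o compress-force=zstd:15 /dev/vda3 /mnt/data
```

The high zstd levels trade CPU for disk space, which fits these nodes: disk is the scarce resource and the CPUs mostly sit idle.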
You probably don't want to run specs this low as a production setup, but if slightly less performance is acceptable, you can squeeze a lot of juice out of a cheap VPS, given the CPUs are good enough and you're on at least DDR3, preferably DDR4.
So far performance is acceptable, though I will continue to simulate normal load levels to see how the low-spec setup performs.
Terrible idea.
Your idea works with low-end dedis in the same rack. Once you add external storage and it struggles for just a minute, the entire cluster will have a stroke.
Thank you for your opinion.
Even this is fun and possible: PVE on a 512MB node, with a WordPress site up in a 256MB container.
The goal isn't to be as cheap as possible, but to push things as far as possible on the lowest end and still have acceptable levels of performance. It helps determine the lowest possible acceptable setup. It also helps to learn to tweak things, optimize things, and tune the system for performance, which ends up being valuable when you're in an actual production environment and need to tune performance etc. This is just one aspect that's useful. There could be more. It also helps you learn and determine how to meet your needs without going overboard or spending more than required. It's good practice as well.
You also learn how to squeeze the most out of what you have instead of throwing more hardware at the software. People in some countries on the lowest-end budgets may find this useful.
This may not be of interest to most people, but perhaps some might find this useful/interesting.
I personally only have two VPS nodes. The rest are dedicated machines. 2x KVM VPS, and 5x Dedicated. But it's still possible on 512MB VPS as well.
I think it is a pretty neat niche project. Even as just a way to test and learn. If you were looking to go from some sort of shared hosting to a 'higher availability' setup with more control/learning experience it is kind of fun. Especially on a real LET style budget.
Of course anyone relying on serious commercial activities would have more money to spend (otherwise, it's not a serious commercial activity) but I liked all the info and seeing how far you could push it. Something to look for if anyone is replicating this: I'm fairly certain your host has to have nested virtualization turned on for this to work.
LXD is cancer with extra steps.
I had the same idea. LXD is neat, easy to use, has a decent API, still does.
The first issue is snap; it's cancer, and the dev only ships LXD in snaps.
As for Ubuntu with snap as a whole: shoot it in the face and let it die.
Some people at Debian have been trying to get a .deb package ready for LXD for years; it may be ready soon, but apparently it has a ton of dependencies.
LXC is surely great, but LXD in its current state is not.
Not to mention the issues you get from forced updates via snap, even on LTS, that contain breaking changes.
Plus the stuff that breaks if you upgrade to another LXD version.
Proxmox never breaks, ever.
Ah yeah, and the stability issues: it just randomly shits itself, the host needs a hard reboot, but after the reboot everything works again.
If you want to run something in a cluster with high uptime, LXD is not the way to go right now.
Hi @stoned,
Thank you for your reply. May I ask if the setup details you shared are currently in use on your 1GB RAM VPS?
Either way, there are some spots I hope you could shed more light on:
In the Zram script:
Is the value you're using for max_comp_streams 1 or 3?
Did you mean 512"MB" Zram?
Why are there two HDD-backed swap spaces (/swapfile and /dev/vda2) on top of the zram?
Another thing I am wondering, how did you setup the swap inside the container itself?
Thank you again for your help and great post.
Cheers,
Hi @stoned,
Very well said.
Sure, you bet.
Coincidentally, in the past few days I just started playing with the idea of installing Alpine Linux + LXC on my HH 2GB RAM storage VPS as a lightweight abstraction layer, a "virtual management shell".
My first choice would have been Proxmox instead of Alpine+LXC, but I thought it simply wouldn't work.
That's why I was very excited to see your post.
Could you also share details on your WireGuard setup for the Proxmox cluster node connections over WAN? I do have a tiny USFF Proxmox box at home (also Oracle Free Tier, as you mentioned in another post) and would love a centralized setup like yours.
Thank you so much again for sharing the cool idea and setup details.
Cheers,
I've been trying to do this, but with LXD only, over IPv6. I've been spending a lot of time on the discussion forum at linuxcontainers. I must say the maintainers there are really helpful. Nevertheless, it's a learning curve; both LXD and IPv6 are new to me. I'm also realizing how commonly hosts have misconfigured IPv6. I'm now going to take a look at Proxmox VE too. I'm sure what you suggest can be done.
Is the value you're using for max_comp_streams 1 or 3?
The value = the number of threads/CPUs you have.
Yes. My error. Thanks for spotting it.
On Racknerd I used their Debian installer cloud image, which creates a 1GB swap partition. I thought that wasn't enough, so I created an additional swap file.
Since it's an LXC container, swap is shared with the host.
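For reference, a container's swap allowance is set from the PVE host with pct; a sketch, where the container ID 101 and the 512MB figures are hypothetical:

```shell
# Sketch: give container 101 512MB RAM plus 512MB swap from the host
pct set 101 --memory 512 --swap 512

# Check the resulting limits in the container config
pct config 101
```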
You're welcome. Glad to see someone finds it useful!
Hi. Thank you.
You are correct: nested virt needs to be enabled if you plan on using VMs. Without VT-x/VT-d, PVE still functions as an LXC container manager with the added benefits that come with PVE containers.
Cheers.
@stoned
I'm not just bullying and giving my opinion; it's factual.
Proxmox is not meant for ultra-low-end setups; much will be wasted on overhead. You want to squeeze everything out of things? Well, you're wasting a lot of it on overhead. Providers certainly won't like you as a customer either: 1) you swap like mad, and 2) your usage isn't that of the average customer; you're a noisy neighbor.
It's a recipe for disaster. If you don't know what you're doing, you'll have a very bad day once you actually start using such a setup, and you'll just throw away time. Either the provider kicks you out or any number of factors will collapse the cluster.
Thank you for your opinion.
Sure, np.
Say you have four servers. Before building a cluster, install one of them as a WireGuard host; there are many tutorials online. Then set up the rest of your servers as peers of this WireGuard host.
Configure WireGuard for a private IPv4/IPv6 range (10.x, 192.168.x, 172.16.x, or fd80::/64, etc.) and give each machine a private IP.
Connect each peer to the WireGuard host. Now you have a private internal network.
Now you can tell Proxmox to use the private fd80 or 10.x range, and when you create a cluster and add nodes to it, select the private IP from the drop-down list in Proxmox.
Sample config (I mostly use IPv6 only and try not to use legacy IP anymore unless I absolutely have to):

WireGuard server config (endpoints here are peer IPs; a:b:c:d: is just filler):

WireGuard peer/client config (note: ports are auto-selected by WG and updated in the config; Google WG setup tutorials and read the documentation):
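Since the original config screenshots didn't survive, here is a minimal sketch of what such a hub-and-spoke pair could look like; all keys, hostnames, and the fd80::/64 prefix are placeholders:

```ini
# /etc/wireguard/wg0.conf on the hub (the "7th server")
[Interface]
Address = fd80::1/64
ListenPort = 51820
PrivateKey = <hub-private-key>

# One [Peer] block per cluster node
[Peer]
PublicKey = <node1-public-key>
AllowedIPs = fd80::2/128

# /etc/wireguard/wg0.conf on a cluster node
[Interface]
Address = fd80::2/64
PrivateKey = <node1-private-key>

[Peer]
PublicKey = <hub-public-key>
Endpoint = hub.example.com:51820
AllowedIPs = fd80::/64
PersistentKeepalive = 25
```

Bring each side up with `wg-quick up wg0`; for node-to-node traffic that bypasses the hub, you would instead list every node as a peer of every other node.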
Now your cluster operates solely over secure WireGuard. WireGuard has minimal overhead, but you can test both configurations: set up a cluster using the servers' public IPs and compare the latency/performance against the private WireGuard network.
Use the PVE firewall and lock everything down on the external IPs. No problem. Enjoy.
Hi. Thanks for your input.
I prefer IPv6 to IPv4. I dislike having to do NAT, for one. IPv6 is more intuitive and makes sense, whereas IPv4 sometimes needs more setup.
You should always get a /64 from anyone. If someone cannot provide a /64, they may not have their IPv6 set up correctly, or may be lacking knowledge on how to segment their /48 or /56.
I just got around to checking the 1G nodes after a day after some more tweaking.
I switched the compression on both zswap and zram to zstd and started swapping a bit less, as I found I didn't need to swap all the time; 1G is plenty, actually.
Modified sysctl.conf to this a day before yesterday.
And here are the numbers. CPU generally doesn't go over 30% tops at idle unless I'm doing something. There are very minor spikes here and there when a mail comes in or when I'm in the GUI. So unless I'm actually using the CPU, it's not being too noisy and is still fair-share usage, which some were worried about.
And here's the other 1GB Racknerd node. I was updating the system earlier so CPU usage was high, but it idles around 20% with minor spikes here and there.
Even though it can be done, 1G is a bit low. 2G+ nodes are what you ideally want.
So if you get 2GB nodes, that's $10/mo, and you get another vCPU plus more disk space, for a better cluster even on KVM VPSes without nested virt. PVE is still a good manager, centralized, other features, LXC container manager, backups, replication, etc. etc.
So the 2GB+ might be a better setup option, but depends on your needs. If you're not running a lot and still want all the PVE features, then 1G is fine too with this low end ZRAM/ZSWAP setup.
If you're going to be running heavier containers needing more RAM, buy VPS with more RAM. That cannot be overstated.
For my needs, for $5/mo, I can get four redundant MX Proxmox Mail Gateways routing email to my server, so if one goes down, another can deliver it. Though for real MX redundancy there should be one outside the cluster, so that if the entire cluster fails and all nodes go down, mail can still be delivered.
In short, business/personal needs dictate what resources should be acquired. This was fun. Thanks for humoring me, folks.
You can get it for less.
Wat, a 3GB swap file and you're swapping 1.6GB to disk, on a cheap VPS?
You're aware this causes high I/O usage, which is a reason to suspend you?
iotop doesn't seem so busy... though I will collect iotop logs over a period of time.
https://termbin.com/hby59
It doesn't seem to be writing enough to be considered high I/O. It's busy, sure, but it's not writing a lot; a few hundred KB at most per iteration of iotop.
What you are seeing is the initial swapping at boot, after which it stabilizes unless you need to reboot, which I don't do frequently. I'm not sure how to collect vmstat data over time; it doesn't seem to have an option for non-interactive use.
This is basically equivalent to a web or self-hosted service where the disk is written to frequently, but not under conditions that would cause thrashing or unacceptably high I/O.
Furthermore, most of the active paging I see is in compressed memory, not to disk, after the initial swapping at boot and container startup.
512MB of that swap is actually ZRAM. Then 40% of 1GB, roughly 400MB, is compressed by zswap. Most of the active paging happens between memory and compressed memory before it ever hits the disk. Disk I/O is not much, but swapping is frequent thanks to ZRAM/zswap compression.
I'll continue monitoring vmstat and swap i/o over time and see how that looks.
Unless I'm wrong, which is likely.
New Debian Release going to ship LXD 5.0.
https://wiki.debian.org/LXD
Thanks for the update. That will be awesome. Hopefully Debian's packaging will be more stable than Ubuntu's.
So, I've been monitoring /swapfile i/o and other stats, and here are the numbers for the 2x 1G Racknerd nodes:
And node 2:
Sure, we see 1.5GB in swap, but the CPU idles at 10% usually, no more, with barely any spikes even when delivering mail, which is my primary use case for the 1G nodes.
Disk I/O is minimal at most. I do not see any kworker threads consuming disk I/O for /swapfile or the /dev/vda2 swap partition, per vmstat -d and other tools like perf. And here are the numbers averaged over a day:
Both 1G nodes idle at 10% CPU and swap in and out between zswap and zram. There is no high I/O, high swap I/O, high CPU usage, or any other noise that might attract the providers' attention.
In previous screenshots I was using the system a lot, installing, updating, etc. so the load seemed high. But when it's just sitting there doing its job, it's not bad at all.
I should mention the updated swap tuning: on a 1G node, you want 256MB ZRAM, 20% pool space in zswap, the zstd compressor for both, and updated sysctl.conf settings.
My initial 100% swappiness caused a bit too much swapping; 60-80% seems to work well. Reduce the cache pressure to 50% of the original value, and don't start pushing dirty pages out until 60% capacity is reached, with the dirty ratio at 50%.
Play around with the numbers here and see what works best for you.
Let's see how Debian 12 does. Cheers folks, thanks for reading!
What's the software in the screenshot?
May I ask how you handle the inbound public IP address when a VPS migrates to another node?
@stoned I like your idea! Sounds like a fun thing to try.