Is my dream of a VPS optimized distro dead or misguided?
So for a while now I've been working on a gentoo system optimized for tiny VPS machines that could most benefit from optimized compilation but least afford to do said compilation themselves. Basically CFLAGS="-Os -march=x86-64-v3 ...", using MUSL as libc and a conservative
CPU_FLAGS_X86: aes mmx popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3
based on netcup, which I thought was the lowest common denominator for available CPU flags, but that turned out to be wrong. It works on binarylane and onidel, but trying it on crowd favourite racknerd keeps throwing illegal instructions, even after recompiling with even less ambitious flags. It turns out my racknerd VPS doesn't even support x86-64-v3!!! At first glance the list of supported flags looks comfortably long at "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust smep erms xsaveopt arat umip md_clear arch_capabilities" but based on my experiments I believe most of these are commands for the wagon driver to call out to the mules powering their horse-n-buggy setup.
So at this point I could start from scratch with racknerd as my new lowest common denominator but no doubt there's an even lamer provider waiting just around the corner and eventually I'll get to a point where I can't provide any advantages over just going with stock debian or fedora which is depressing. I know x86-64-v3 is too much for a general purpose distro, my own desktop doesn't even support it but I sort of assumed top rated cloud providers wouldn't be running on hardware old enough to legally date. Are there any resources out there discussing which CPU flags are available on which providers and what is the bare minimum considered acceptable in $currentyear?
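For anyone wanting to check a host before picking -march, here is a minimal sketch that maps a /proc/cpuinfo flag string onto the x86-64 microarchitecture levels. The flag groupings are my reading of the psABI level definitions, spelled the way the Linux kernel reports them (for example, LZCNT shows up as "abm" and SSE3 as "pni"); the function and variable names are made up for illustration.

```python
# Flag names as they appear in the "flags" line of Linux's /proc/cpuinfo.
# Grouping follows the x86-64 psABI microarchitecture levels (assumed spelling).
LEVELS = [
    (1, {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse", "sse2"}),
    (2, {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    (3, {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    (4, {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def x86_64_level(flags):
    """Return (highest fully supported level, flags missing for the next one)."""
    have = set(flags.split())
    level = 0
    for number, required in LEVELS:
        missing = required - have
        if missing:
            return level, sorted(missing)
        level = number
    return level, []


# The racknerd flag list quoted in the post.
RACKNERD_FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc "
    "arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq "
    "ssse3 cx16 pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes "
    "xsave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti ssbd ibrs ibpb "
    "stibp fsgsbase tsc_adjust smep erms xsaveopt arat umip md_clear "
    "arch_capabilities"
)

if __name__ == "__main__":
    level, missing = x86_64_level(RACKNERD_FLAGS)
    print(f"x86-64-v{level}; missing for the next level: {' '.join(missing)}")
```

Run against the quoted list, this reports x86-64-v2, with abm, avx2, bmi1, bmi2, fma and movbe missing for v3. That would explain the illegal instructions from a -march=x86-64-v3 build even though every entry in the CPU_FLAGS_X86 line is present.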

Comments
Interesting project - if you need a vm for your testing, lmk and we could perhaps provide your images for building client VMs at FOSSVPS. We have 4 different hosts with varying cpus so could be good for testing.
ask @anakara he can help you with many things
I find this confusing. Sure, you'll find some recent-ish hardware at premium and/or expensive providers such as netcup, but this is LowEndTalk. The only reason it's possible to get such a cheap VPS is because a lot of these providers already have racks upon racks of aging hardware installed in their datacenter; slicing it up and renting it out for cheap is more ecologically effective than throwing it all away and buying expensive new hardware that you can rent out for more than the monthly price of a mortgage. Even the OVH VPS I got recently has an old crusty Haswell CPU.
Haswell may be over a decade old, but it's still relatively new because it features AVX2 and is compatible with x86-64-v3.
That's technically true, but it's the first generation that supported it, and this is a brand new latest-generation VPS from a provider that doesn't sell anything for less than $4/mo. Some of RackNerd's nodes are at least Haswell, but many of them are Ivy Bridge, which isn't uncommon for the kinds of aging hardware that's available for the rock bottom prices people on this forum are willing to pay.
Are you sure? If I recall correctly they've been using Haswell CPU identifier for years on their VMs. The actual processor can be anything that supports Haswell's instruction set.
I think the dream of a universal VPS distro is misguided because of the different virtualization technologies at play, not to mention the different underlying hardware. My go-to 'lite' install is Alpine. If I do intend to use the machine semi-regularly I'll go for Arch, which is also very lightweight before you crud it up with modules, and there's no need to compile the OS itself. Debian is probably the 'set and forget' option because it updates so slowly that updating it is not going to help.
OP,
Intel tried this for years with the fastest distribution available (at the cost of not having apt/dnf copy and paste).
Zero fucks given by LET. This isn't the technically capable audience you think it is.
Or maybe this just isn't the problem worth optimising that OP thinks it is.
For any non-trivial workload, compiler optimisations in the OS play an insignificant role in performance. By all means, profile your web app and recompile / optimise the shit out of that. Maybe do the same with apache if that's your web server or haproxy or whatever, maybe perl/python/whatever your core things are written in, but there's basically no need to make sure that the entire distribution is compiled with maximum everything.
And in all honesty, when you get to the kind of workloads where it'd make a practical difference, it's easier just to migrate to a better CPU / more cores / whatever.
Make a system where you can SSH to your VPS, it gets the CPU information and compiles the distro for your exact VPS.
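A rough sketch of the first half of that idea, assuming key-based SSH login already works and the target is a Linux box exposing /proc/cpuinfo. The function names are made up, and any hostnames passed in would be placeholders. Intersecting the flag sets from several providers gives the actual lowest common denominator rather than a guessed one.

```python
import subprocess


def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def remote_flags(host):
    """Read /proc/cpuinfo over SSH; assumes non-interactive login to `host` works."""
    result = subprocess.run(
        ["ssh", host, "cat", "/proc/cpuinfo"],
        capture_output=True, text=True, check=True,
    )
    return parse_flags(result.stdout)


def common_flags(flag_sets):
    """Flags present on every host, i.e. the lowest common denominator."""
    sets = list(flag_sets)
    return set.intersection(*sets) if sets else set()
```

Something like `common_flags(remote_flags(h) for h in hosts)` would then produce the flag set a shared build could safely target. The compile step itself is left out here.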
Actually, I guess I should note that there have been a couple of significantly fast JSON / XML parsers, string compares, etc using AVX2 recently, so maybe if that's a hot point for you, that specific app is definitely worth using an AVX2 variant for. But in general, probably not.
You don't use arch btw
The most performance could be gotten with the profile-guided optimizations (PGO), and probably with LTO if it makes the code smaller.
With just march flags—not so much.
I'm quite deep into optimization of code, libs, and executables and also build and run my FreeBSD kernel version (where sensible). But when I need a small linux I simply install Alpine or in more extreme cases tinycore.
But I tend to agree with @ralf. In all but the most extreme cases, optimizing the OS build doesn't provide the performance increase many think it does (hello Gentoo fans). Actually, even on the application level, optimized compiling doesn't really drastically improve performance, unlike profiling, targeted benchmarking, and optimizing the performance-critical algorithms, in extreme cases even by rewriting them in assembler.
Note though that there is something you'll encounter throughout all levels: optimization tightens the targets your code runs on. You should also keep the costs in mind; that is, true optimization on all levels (from the critical algorithms up to the OS) only very rarely makes sense, e.g. for one single customer/client and a particular use case.
https://www.phoronix.com/review/clear-linux-48p-ubuntu
I disagree. On a low end system, you're getting more done on minimal resources.
On a beast, you're getting more value for the $$$ paid and getting benefits without needing to upgrade hardware ($$$).
There's no question there's a performance benefit; the question is whether it's worth the hassle of using something that requires you to change your SOP or learn something new.
https://arxiv.org/html/2507.16649v1#S5
I don't think PGO offers that much of a performance gain; the increase rarely exceeds a few percent. It could optimise some workloads though, but I don't see it as universally useful given the additional compilation overhead.
I usually work with lower-end CPUs in embedded and close-to-embedded devices, and PGO gave me 80% performance increase in Sega Genesis emulator back in the day.
Maybe it really is not that important on the best of modern superscalar out-of-order Intels and AMDs though, yet browsers are still compiled with PGO enabled.
Not going to lie, I am surprised and impressed. I wonder why it is like that? Were the code paths not optimised/cache-aware, or was there some other reason? AFAIK, what PGO mostly does is reorder code according to how frequently it is used by relevant functions to save on jumps.
Please correct me if I misunderstand something.
The profile allows the compiler to apply many optimizations in different places. Modern GCC tunes branch prediction, unrolls loops if it improves performance, and sometimes it helps with vectorizing.
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#:~:text=Enable profile feedback-directed optimizations, and the following optimizations
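For anyone who hasn't tried it, the GCC workflow comes down to three steps: an instrumented build, a training run, and a rebuild that consumes the profile. The sketch below only assembles the command lines and executes nothing. The `-fprofile-generate` and `-fprofile-use` flags are GCC's own; the file names and the `-O2` baseline are placeholders I picked for illustration.

```python
import shlex


def pgo_steps(source, binary, training_args):
    """Return the three GCC PGO steps as argument lists (nothing is executed)."""
    base = ["gcc", "-O2"]
    return [
        # 1. Instrumented build: running this binary writes .gcda profile files.
        base + ["-fprofile-generate", source, "-o", binary],
        # 2. Training run on a workload that resembles real usage.
        ["./" + binary] + list(training_args),
        # 3. Rebuild, letting the compiler use the collected profile.
        base + ["-fprofile-use", source, "-o", binary],
    ]


if __name__ == "__main__":
    # "emu.c" and "demo.rom" are hypothetical names.
    for step in pgo_steps("emu.c", "emu", ["demo.rom"]):
        print(shlex.join(step))
```

How much this gains depends on how representative the training run is, which is part of why it is awkward to apply across a whole distro.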
Thanks, I'll definitely keep FOSSVPS in mind when I get a project page and installer done. At the moment it's a tar file that I download to a VPS, chroot into and install grub from, but having more machines available would increase the number of options that could be offered. Until the racknerd discovery I was leaning towards two variants.
1) A VPS version built with x86-64-v3 and musl
2) A general purpose build with x86-64-v2 and glibc
Ideally both versions would come in v2, v3 and v4 flavors with an installer that recommends and downloads the right stream, but that's up to 6 already without taking arm64 into account. Can I ask whether it's possible to set up a FOSSVPS with a live-boot ISO? I didn't see that mentioned on your form, and it's something I would need to test installations.
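The "recommend the right stream" part could be as simple as picking the highest published level the host can still run. A minimal sketch, assuming the host's level has already been detected as an integer; the build matrix and the stream naming scheme here are hypothetical, not anything that exists yet.

```python
# Hypothetical build matrix: which (libc, level) tarballs have been published.
AVAILABLE = {
    ("musl", 2), ("musl", 3), ("musl", 4),
    ("glibc", 2), ("glibc", 3), ("glibc", 4),
}


def recommend_stream(host_level, libc="musl", available=AVAILABLE):
    """Pick the highest published level that the host can still run."""
    candidates = [
        level for (name, level) in available
        if name == libc and level <= host_level
    ]
    if not candidates:
        return None  # nothing published that this CPU can run
    return f"{libc}-x86-64-v{max(candidates)}"
```

A host that only reaches v1 gets `None` back, which is the case where the installer would have to point people at a stock distro instead.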
Happy to help if we can.

At present we would have to download the ISO to make it available for you to use - it also makes the ISO 'public' which is likely not what you would want
However I am always looking at ways to improve FOSSVPS so will say 'soon'
LOL I guess the fact yabs.sh dies a horrible death under musl makes it a non-starter for many then. I think one of Clear Linux's issues was that it lacked a compelling narrative. I only realized after its demise that it was a good option for AMD processors, which is why I never looked at it all that seriously. They probably would've done better making an optimized debian variant than reinventing the OS with that stateless stuff. Not saying it was wrong, but when you're trying to choose between 100 different distros you can really only hold one or two concepts per distro in mind, as in "that's the X distro", and X = 'Intel' wasn't the right choice for the times.
In truth the raison d'etre for my distro is supposed to be it's the openbsd of linux with s/qmail and hiawatha running out of the box. My interest in pushing the optimization as far as possible is more about getting some leeway to turn on more hardening options in the kernel rather than being the fastest.
It actually does use LTO; that was one of several options hidden away in the '...'. I must confess this is the first I'm hearing about PGO, and while I'm currently recompiling binutils and gcc to make use of it, it sounds like something better done by a large corporation with resources to spare to build an entire distro using it. I suspect the computing resources needed for PGO are why no one from the community forked Intel's distro and maintained it after they pulled the plug. Not sure why Fedora aren't all over it though.