Is my dream of a VPS optimized distro dead or misguided?

So for a while now I've been working on a Gentoo system optimized for tiny VPS machines, the ones that could benefit most from optimized compilation but can least afford to do that compilation themselves. Basically CFLAGS="-Os -march=x86-64-v3 ...", musl as libc, and a conservative
CPU_FLAGS_X86: aes mmx popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3
based on netcup's, which I thought was the lowest common denominator for available CPU flags, but that turned out to be wrong. It works on binarylane and onidel, but trying it on crowd favourite racknerd keeps throwing illegal instructions, even after recompiling with even less ambitious flags. It turns out my racknerd VPS doesn't even support x86-64-v3!!! At first glance the list of supported flags looks comfortably long at "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust smep erms xsaveopt arat umip md_clear arch_capabilities" but based on my experiments I believe most of these are commands for the wagon driver to call out to the mules powering their horse-n-buggy setup.

So at this point I could start from scratch with racknerd as my new lowest common denominator but no doubt there's an even lamer provider waiting just around the corner and eventually I'll get to a point where I can't provide any advantages over just going with stock debian or fedora which is depressing. I know x86-64-v3 is too much for a general purpose distro, my own desktop doesn't even support it but I sort of assumed top rated cloud providers wouldn't be running on hardware old enough to legally date. Are there any resources out there discussing which CPU flags are available on which providers and what is the bare minimum considered acceptable in $currentyear?
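
For what it's worth, the x86-64-v3 gap can be read straight off a flag list like the one quoted above. Below is a minimal Python sketch, assuming the flag spellings the Linux kernel uses in /proc/cpuinfo (for example `pni` for SSE3 and `abm` for LZCNT) and the feature sets from the x86-64 psABI level definitions. The names `x86_64_level` and `RACKNERD_FLAGS` are made up for illustration, and a hypervisor may mask flags the physical CPU actually has.

```python
# Feature names as spelled on the "flags" line of /proc/cpuinfo on Linux.
# Assumptions: "pni" stands for SSE3, "abm" for LZCNT, and "xsave" is used
# as a stand-in for OSXSAVE, which the kernel does not list separately.
LEVELS = [
    (1, {"cmov", "cx8", "fpu", "fxsr", "lm", "mmx", "sse", "sse2", "syscall"}),
    (2, {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    (3, {"abm", "avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "movbe", "xsave"}),
    (4, {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def x86_64_level(flags):
    """Return (highest level fully supported, flags missing for the next level)."""
    have = set(flags.split())
    level = 0
    for number, required in LEVELS:
        missing = required - have
        if missing:
            return level, sorted(missing)
        level = number
    return level, []


# The flag list quoted in the post for the racknerd VPS.
RACKNERD_FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc "
    "arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq "
    "ssse3 cx16 pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes "
    "xsave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti ssbd ibrs ibpb "
    "stibp fsgsbase tsc_adjust smep erms xsaveopt arat umip md_clear "
    "arch_capabilities"
)

if __name__ == "__main__":
    level, missing = x86_64_level(RACKNERD_FLAGS)
    print(f"x86-64-v{level}; missing for the next level: {' '.join(missing)}")
```

Fed that flag list, the sketch reports level 2, with abm, avx2, bmi1, bmi2, fma and movbe missing for v3. That fits the illegal instructions: AVX and F16C are present, but the rest of the v3 set is not.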

Thanked by 1: mandala

Comments

  • msatt Member, Host Rep

    Interesting project - if you need a vm for your testing, lmk and we could perhaps provide your images for building client VMs at FOSSVPS. We have 4 different hosts with varying cpus so could be good for testing.

  • Ask @anakara, he can help you with many things.

  • crystal Member
    edited December 2025

    @MaeIstrom said: I know x86-64-v3 is too much for a general purpose distro, my own desktop doesn't even support it but I sort of assumed top rated cloud providers wouldn't be running on hardware old enough to legally date.

    I find this confusing. Sure, you'll find some recent-ish hardware at premium and/or expensive providers such as netcup, but this is LowEndTalk. The only reason it's possible to get such a cheap VPS is that a lot of these providers already have racks upon racks of aging hardware installed in their datacenter; slicing it up and renting it out for cheap is more ecologically effective than throwing it all away and buying expensive new hardware that you can rent out for more than the monthly price of a mortgage. Even the OVH VPS I got recently has an old crusty Haswell CPU.

  • @crystal said:

    @MaeIstrom said: I know x86-64-v3 is too much for a general purpose distro, my own desktop doesn't even support it but I sort of assumed top rated cloud providers wouldn't be running on hardware old enough to legally date.

    I find this confusing. Sure, you'll find some recent-ish hardware at premium and/or expensive providers such as netcup, but this is LowEndTalk. The only reason it's possible to get such a cheap VPS is that a lot of these providers already have racks upon racks of aging hardware installed in their datacenter; slicing it up and renting it out for cheap is more ecologically effective than throwing it all away and buying expensive new hardware that you can rent out for more than the monthly price of a mortgage. Even the OVH VPS I got recently has an old crusty Haswell CPU.

    Haswell may be over a decade old, but it's still relatively new because it features AVX2 and is compatible with x86-64-v3.

    Thanked by 1: oloke
  • crystal Member
    edited December 2025

    @fadedmaple said: Haswell may be over a decade old, but it's still relatively new because it features AVX2 and is compatible with x86-64-v3.

    That's technically true, but it's the first generation that supported it, and this is a brand new latest-generation VPS from a provider that doesn't sell anything for less than $4/mo. Some of RackNerd's nodes are at least Haswell, but many of them are Ivy Bridge, which isn't uncommon for the kinds of aging hardware that's available for the rock bottom prices people on this forum are willing to pay.

  • @crystal said:

    @MaeIstrom said: I know x86-64-v3 is too much for a general purpose distro, my own desktop doesn't even support it but I sort of assumed top rated cloud providers wouldn't be running on hardware old enough to legally date.

    . Even the OVH VPS I got recently has an old crusty Haswell CPU.

    Are you sure? If I recall correctly they've been using the Haswell CPU identifier on their VMs for years. The actual processor can be anything that supports Haswell's instruction set.

  • I think the dream of a universal VPS distro is misguided because of the different virtualization technologies at play, not to mention the different underlying hardware. My go-to 'lite' install is Alpine. If I intend to use the machine semi-regularly I'll go for Arch, which is also very lightweight before you crud it up with modules, and there's no need to compile the OS itself. Debian is probably the 'set and forget' option because it updates so slowly that updating it is not going to help.

  • OP,

    Intel tried this for years with the fastest distribution available (at the cost of not having apt/dnf copy and paste).

    Zero fucks given by LET. This isn't the technically capable audience you think it is.

  • @TimboJones said:
    Zero fucks given by LET. This isn't the technically capable audience you think it is.

    Or maybe this just isn't the problem worth optimising that OP thinks it is.

    For any non-trivial workload, compiler optimisations in the OS play an insignificant role in performance. By all means, profile your web app and recompile / optimise the shit out of that. Maybe do the same with apache if that's your web server or haproxy or whatever, maybe perl/python/whatever your core things are written in, but there's basically no need to make sure that the entire distribution is compiled with maximum everything.

    And in all honesty, when you get to the kind of workloads where it'd make a practical difference, it's easier just to migrate to a better CPU / more cores / whatever.

    Thanked by 1: tentor
  • Make a system where you can SSH to your VPS, it gets the CPU information and compiles the distro for your exact VPS.

  • Actually, I guess I should note that there have been a couple of significantly fast JSON / XML parsers, string compares, etc using AVX2 recently, so maybe if that's a hot point for you, that specific app is definitely worth using an AVX2 variant for. But in general, probably not.

    Thanked by 1: tentor
  • @OpaqueRegistrant said:
    Make a system where you can SSH to your VPS, it gets the CPU information and compiles the distro for your exact VPS.

    You don't use arch btw

  • The most performance could be gotten with the profile-guided optimizations (PGO), and probably with LTO if it makes the code smaller.

    With just march flags—not so much.

    Thanked by 1: oloke
  • jsg Member, Resident Benchmarker
    edited December 2025

    I'm quite deep into optimization of code, libs, and executables and also build and run my FreeBSD kernel version (where sensible). But when I need a small linux I simply install Alpine or in more extreme cases tinycore.

    But I tend to agree with @ralf. In all but the most extreme cases, optimizing the OS build doesn't provide the performance increase many think it does (hello Gentoo fans ;) ). Actually, even on the application level, optimized compiling doesn't drastically improve performance, unlike profiling, targeted benchmarking, and optimizing the performance-critical algorithms, in extreme cases even by rewriting them in Assembler.

    Note though that there is something you'll encounter at every level: optimization narrows the set of targets your code runs on. You should also keep the costs in mind; that is, true optimization on all levels (from the critical algorithms up to the OS) only very rarely makes sense, e.g. for one single customer/client and a particular use case.

    Thanked by 1: ralf
  • @ralf said:

    @TimboJones said:
    Zero fucks given by LET. This isn't the technically capable audience you think it is.

    Or maybe this just isn't the problem worth optimising that OP thinks it is.

    For any non-trivial workload, compiler optimisations in the OS play an insignificant role in performance. By all means, profile your web app and recompile / optimise the shit out of that. Maybe do the same with apache if that's your web server or haproxy or whatever, maybe perl/python/whatever your core things are written in, but there's basically no need to make sure that the entire distribution is compiled with maximum everything.

    And in all honesty, when you get to the kind of workloads where it'd make a practical difference, it's easier just to migrate to a better CPU / more cores / whatever.

    https://www.phoronix.com/review/clear-linux-48p-ubuntu

    I disagree. On a low end system, you're getting more done on minimal resources.

    On a beast, you're getting more value for the $$$ paid and getting benefits without needing to upgrade hardware ($$$).

    There's no question there's a performance benefit; the question is whether it's worth the hassle of using something that requires changing your SOP or learning something new.

  • tentor Member, Host Rep

    @ValdikSS said:
    The most performance could be gotten with the profile-guided optimizations (PGO)

    https://arxiv.org/html/2507.16649v1#S5

    I don't think PGO offers that much performance gains, performance increase rarely exceeds a few percent. It could optimise some workloads though, but I don't see it as universally useful given additional compilation overhead.

  • @tentor said: I don't think PGO offers that much performance gains, performance increase rarely exceeds a few percent.

    I usually work with lower-end CPUs in embedded and close-to-embedded devices, and PGO gave me 80% performance increase in Sega Genesis emulator back in the day.

    Maybe it really isn't that important on the best modern superscalar out-of-order Intel and AMD CPUs, yet browsers are still compiled with PGO enabled.

    Thanked by 2: tentor, Murv
  • tentor Member, Host Rep

    @ValdikSS said:

    @tentor said: I don't think PGO offers that much performance gains, performance increase rarely exceeds a few percent.

    I usually work with lower-end CPUs in embedded and close-to-embedded devices, and PGO gave me 80% performance increase in Sega Genesis emulator back in the day.

    Not going to lie, I am surprised and impressed. I wonder why it is like that? Were the code paths not optimised/cache-aware, or was there some other reason? AFAIK, what PGO mostly does is reorder code according to how frequently it is used by relevant functions to save on jumps.

    Please correct me if I misunderstand something.

    Thanked by 2: oloke, Murv
  • @tentor said: what PGO mostly does is reorder code according to how frequently it is used by relevant functions to save on jumps.

    The profile allows the compiler to apply many optimizations in different places. Modern GCC tunes branch prediction, unrolls loops where that improves performance, and sometimes the profile helps with vectorizing.

    https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#:~:text=Enable profile feedback-directed optimizations, and the following optimizations

  • @msatt said:
    Interesting project - if you need a vm for your testing, lmk and we could perhaps provide your images for building client VMs at FOSSVPS. We have 4 different hosts with varying cpus so could be good for testing.

    Thanks, I'll definitely keep FOSSVPS in mind when I get a project page and installer done. At the moment it's a tar file that I download to a VPS, chroot into and install grub from, but having more machines available would increase the number of options that could be offered. Until the racknerd discovery I was leaning towards two variants.
    1) A VPS version built with x86-64-v3 and musl
    2) A general purpose build with x86-64-v2 and glibc

    Ideally both versions would come in v2, v3 and v4 flavors, with an installer that recommends and downloads the right stream, but that's up to 6 already without taking arm64 into account. Can I ask, is it possible to set up a FOSSVPS with a live-boot ISO? I didn't see that mentioned on your form and it's something I would need to test installations.
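
The installer idea above could start from something as small as this. It is a sketch only: the stream names, `available_streams`, `recommend_stream`, and the assumption that the installer already knows the highest x86-64 level the machine supports are all hypothetical, not anything the project has published.

```python
# Hypothetical stream naming scheme: "<libc>-v<level>".
LIBCS = ("musl", "glibc")
BUILT_LEVELS = (2, 3, 4)  # x86-64 microarchitecture levels with a prebuilt stream


def available_streams():
    """List every stream the project would have to build and host."""
    return [f"{libc}-v{level}" for libc in LIBCS for level in BUILT_LEVELS]


def recommend_stream(detected_level, libc="musl"):
    """Pick the most optimized stream the CPU can run, or None if none fits."""
    if libc not in LIBCS:
        raise ValueError(f"unknown libc: {libc}")
    usable = [level for level in BUILT_LEVELS if level <= detected_level]
    if not usable:
        return None  # CPU is below x86-64-v2, so no prebuilt stream applies
    return f"{libc}-v{max(usable)}"


if __name__ == "__main__":
    print(available_streams())
    print(recommend_stream(2))           # a v2-only host
    print(recommend_stream(3, "glibc"))  # a v3-capable host, glibc build
```

Two libc choices times three levels gives the six streams mentioned above; a v2-only host would simply fall back to the v2 stream of whichever libc was chosen.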

  • msatt Member, Host Rep

    Happy to help if we can.
    At present we would have to download the ISO to make it available for you to use - it also makes the ISO 'public' which is likely not what you would want :/
    However I am always looking at ways to improve FOSSVPS so will say 'soon' ;)

  • @TimboJones said:
    OP,

    Intel tried this for years with the fastest distribution available (at the cost of not having apt/dnf copy and paste).

    Zero fucks given by LET. This isn't the technically capable audience you think it is.

    LOL I guess the fact yabs.sh dies a horrible death under musl makes it a non-starter for many then. I think one of Clear Linux's issues was that it lacked a compelling narrative. I only realized after its demise that it was a good option for AMD processors, which is why I never looked at it all that seriously. They probably would've done better making an optimized Debian variant than reinventing the OS with that stateless stuff. Not saying it was wrong, but when you're trying to choose between 100 different distros you can really only hold one or two concepts per distro in mind, as in "that's the X distro", and X = 'Intel' wasn't the right choice for the times.

    In truth the raison d'être for my distro is supposed to be that it's the OpenBSD of Linux, with s/qmail and hiawatha running out of the box. My interest in pushing the optimization as far as possible is more about getting some leeway to turn on more hardening options in the kernel than about being the fastest.

  • @ValdikSS said:
    The most performance could be gotten with the profile-guided optimizations (PGO), and probably with LTO if it makes the code smaller.

    With just march flags—not so much.

    It actually does use LTO, that was one of several options hidden away in the '...'. I must confess this is the first I'm hearing about PGO, and while I'm currently recompiling binutils and gcc to make use of it, building an entire distro with it sounds like something better done by a large corporation with resources to spare. I suspect the computing resources needed for PGO are why no one from the community forked Intel's distro and maintained it after they pulled the plug. Not sure why Fedora isn't all over it though.
