x86-64-v1 is all you need? What does cherry-picking CPU virt model actually help providers with?

nopint3 · March 15

Today I bought a VPS from a newly formed provider. They seem to be using the Virtualizor panel, and the CPU model template they selected is “Common KVM processor”.

From what I can see, that template only exposes something close to the x86-64-v1 feature level. In other words, it does not enable most of the x86-64 extensions that have been around since the early 2010s, such as AVX and AES-NI. It does not even seem to include things like SSE3 or POPCNT from x86-64-v2.

For reference, the common x86-64 feature levels are roughly:

x86-64-v1: CMOV, CX8, FPU, FXSR, MMX, OSFXSR, SCE, SSE, SSE2

x86-64-v2: CMPXCHG16B, LAHF-SAHF, POPCNT, SSE3, SSE4_1, SSE4_2, SSSE3

x86-64-v3: AVX, AVX2, BMI1, BMI2, F16C, FMA, LZCNT, MOVBE, OSXSAVE

x86-64-v4: AVX512F, AVX512BW, AVX512CD, AVX512DQ, AVX512VL
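
One rough way to check which level a guest actually gets is to compare the flags line of /proc/cpuinfo against the lists above. Below is a minimal sketch in C, not a definitive tool. It assumes the Linux flag spellings (lm for 64-bit long mode, syscall for SCE, pni for SSE3, cx16 for CMPXCHG16B, lahf_lm for LAHF-SAHF, abm for LZCNT, xsave standing in for OSXSAVE), and the function names are made up for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` appears as a whole space-separated token in `flags`. */
static int has_flag(const char *flags, const char *name) {
    size_t n = strlen(name);
    const char *p = flags;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';
        if (starts && ends)
            return 1;
        p += 1; /* partial match (e.g. "sse" inside "sse4_1"), keep looking */
    }
    return 0;
}

/* Return 1 if every name in the NULL-terminated list is present. */
static int has_all(const char *flags, const char *const *names) {
    for (size_t i = 0; names[i] != NULL; i++)
        if (!has_flag(flags, names[i]))
            return 0;
    return 1;
}

/* Highest x86-64 level (1..4) whose flags are all present, or 0.
 * Flag spellings are an assumption based on Linux /proc/cpuinfo. */
int x86_64_level(const char *flags) {
    static const char *const v1[] = {"lm", "cmov", "cx8", "fpu", "fxsr",
                                     "mmx", "syscall", "sse", "sse2", NULL};
    static const char *const v2[] = {"cx16", "lahf_lm", "popcnt", "pni",
                                     "sse4_1", "sse4_2", "ssse3", NULL};
    static const char *const v3[] = {"avx", "avx2", "bmi1", "bmi2", "f16c",
                                     "fma", "abm", "movbe", "xsave", NULL};
    static const char *const v4[] = {"avx512f", "avx512bw", "avx512cd",
                                     "avx512dq", "avx512vl", NULL};
    const char *const *levels[] = {v1, v2, v3, v4};
    int level = 0;
    for (int i = 0; i < 4; i++) {
        if (!has_all(flags, levels[i]))
            break; /* levels are cumulative, so stop at the first gap */
        level = i + 1;
    }
    return level;
}
```

Passing it the text after "flags :" from inside the guest gives the highest level whose flags are all present, or 0 if even the baseline list is incomplete.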

When I asked the provider about it, they said they believe anything beyond x86-64-v1 is unnecessary for most users, and that if someone wants a different CPU model, they can open a ticket and request it.

That feels a bit strange to me. If their physical CPUs are Xeon E5-v3, which support x86-64-v3 features, why limit the guest CPU instruction set by default?

I did some searching, and it seems mainstream Linux distros like Ubuntu still build packages targeting older baselines. So most software will still run fine on x86-64-v1.

But at the same time, it is pretty obvious that instruction sets like AVX can help with workloads such as databases, media encoding/decoding, and other compute-heavy tasks. AES-NI can also speed up TLS encryption/decryption. And of course, these features can make a pretty big difference in Geekbench scores too.

So I’m wondering: what is the actual benefit for the VPS provider here?

From my point of view, this seems to:

lower the benchmark performance of the CPU,
potentially increase CPU usage for the same real workload, because older instructions may need more CPU cycles to do the same job, which in turn can increase steal time.

If you’re a hosting provider, how do you decide what virtual CPU model to expose to customers?

oloke · March 15

We (onidel) pass the host CPU through to the guest. There is no point in artificially limiting the features for our customers. I think this approach is pretty common nowadays among hosting providers.

Restricting CPU features may lead to reduced performance within the VM (since the guest OS is not able to use the optimal instruction set) and, as a consequence, higher load on the hypervisor itself.

The only real downside of this approach is the lack of live migration between hypervisors with different CPUs, since CPU features can't change without an OS reboot. However, if all hypervisors have the same CPU model, live migration is achievable.

@nopint3 said: When I asked the provider about it, they said they believe anything beyond x86-64-v1 is unnecessary for most users.

Modern Linux distros are moving toward packages built for the newer x86-64-v3 architecture level, to improve the performance of precompiled binaries:
https://developers.redhat.com/articles/2024/01/02/exploring-x86-64-v3-red-hat-enterprise-linux-10
https://www.omgubuntu.co.uk/2025/11/ubuntu-amd64v3-x86-64-v3-support

So I wouldn't say it's "unnecessary for most users" in 2026.

ps. I would also like to hear what @forest thinks on this topic.

tentor · March 15

I often see "host-model", whatever that resolves to, so it depends on the CPU generation actually in use.

zed · March 15

Are they intentionally limiting users or just don't know what they're doing?

nopint3 · March 15

@zed said:
Are they intentionally limiting users or just don't know what they're doing?

They did this intentionally

MannDude · March 15

@nopint3 said:

@zed said:
Are they intentionally limiting users or just don't know what they're doing?

They did this intentionally

This.

I don't think it's even a default value for Virtualizor or VirtFusion for new hypervisor deployments or anything.

yoursunny · March 15

We actively use x86-64-v2 instructions in our code.

https://github.com/usnistgov/ndn-dpdk/blob/67f84648ea71c18aae7e6306f872a235207f5c5d/csrc/iface/reassembler.c#L50-L56

__attribute__((nonnull)) static inline void
Reassembler_Drop_(Reassembler* reass, LpL2* pm, hash_sig_t hash) {
  Reassembler_Delete_(reass, pm, hash);

  reass->nDropFragments += pm->fragCount - rte_popcount32(pm->reassBitmap);
  rte_pktmbuf_free_bulk((struct rte_mbuf**)pm->reassFrags, pm->fragCount);
}

This function is part of the reassembler in a hop-by-hop fragmentation and reassembly protocol.
In the data structure, reassBitmap is a uint32 bitmap that indicates which fragments have arrived, and fragCount is the total number of fragments.
If the reassembly fails due to timeout, the function is called, where the nDropFragments counter must be incremented with the number of dropped fragments.
Instead of keeping an extra counter regarding "how many fragments have arrived", we used the POPCNT instruction.
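
To make the idea concrete, here is a small stand-alone sketch in C. It is not the DPDK implementation of rte_popcount32; the function names are illustrative stand-ins, and it assumes GCC or Clang for __builtin_popcount. The builtin only compiles to a single POPCNT instruction when the build target allows it (for example -mpopcnt or -march=x86-64-v2); otherwise the compiler falls back to slower generic code, similar in spirit to the portable loop:

```c
#include <stdint.h>

/* Portable fallback: clear the lowest set bit until none remain. */
static inline uint32_t popcount32_portable(uint32_t v) {
    uint32_t count = 0;
    while (v != 0) {
        v &= v - 1; /* drops the lowest set bit */
        count++;
    }
    return count;
}

/* GCC/Clang builtin; a single POPCNT only if the target permits it. */
static inline uint32_t popcount32_builtin(uint32_t v) {
    return (uint32_t)__builtin_popcount(v);
}

/* Hypothetical helper: fragment count minus the number of set bitmap
 * bits, mirroring the arithmetic in the snippet above. */
static inline uint32_t frags_minus_set_bits(uint32_t frag_count,
                                            uint32_t bitmap) {
    return frag_count - popcount32_builtin(bitmap);
}
```

For a 5-fragment packet with bitmap 0x0B (three bits set), the helper returns 2 without any separate counter being maintained.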

The underlying library also has specialized optimization for x86-64-v3 that can be applied automatically.
However, we do not invoke AVX instructions directly ourselves.

tedd_plat514 · March 16

"When I asked the provider about it, they said they believe anything beyond x86-64-v1 is unnecessary for most users, and that if someone wants a different CPU model, they can open a ticket and request it."

Are you kidding me? Do they expect all users to run a 2005-era web page or what?
Modern software can benefit greatly from the expanded ISA, and the v4 extensions have already been around for 5+ years, including for AI inference on servers.

Fubuki · March 16

Reading this thread sent a flashbang at near midnight

igctt · March 16

@yoursunny said:
We actively use x86-64-v2 instructions in our code.

https://github.com/usnistgov/ndn-dpdk/blob/67f84648ea71c18aae7e6306f872a235207f5c5d/csrc/iface/reassembler.c#L50-L56
__attribute__((nonnull)) static inline void
Reassembler_Drop_(Reassembler* reass, LpL2* pm, hash_sig_t hash) {
  Reassembler_Delete_(reass, pm, hash);

  reass->nDropFragments += pm->fragCount - rte_popcount32(pm->reassBitmap);
  rte_pktmbuf_free_bulk((struct rte_mbuf**)pm->reassFrags, pm->fragCount);
}
This function is part of the reassembler in a hop-by-hop fragmentation and reassembly protocol.
In the data structure, reassBitmap is a uint32 bitmap that indicates which fragments have arrived, and fragCount is the total number of fragments.
If the reassembly fails due to timeout, the function is called, where the nDropFragments counter must be incremented with the number of dropped fragments.
Instead of keeping an extra counter regarding "how many fragments have arrived", we used the POPCNT instruction.

The underlying library also has specialized optimization for x86-64-v3 that can be applied automatically.
However, we do not invoke AVX instructions directly ourselves.

I remember there being architecture-specific optimizations written in assembly in the DPDK library, especially for these rte_ functions, and they are meant to use the latest hardware features.

layer7 · March 16

Hi,

IF you do not run any software that benefits from those CPU extensions THEN it does not matter.

But we are talking here about what are (nowadays) standard applications, like VPNs, media processing, and encryption-related stuff.

If no CPU extension is available, then the job has to be done by the CPU in "software" mode. That consumes significantly more CPU, because the CPU has to do the math in a much more complicated and time-consuming way, without access to the "hardware" acceleration that comes with the CPU features/extensions.

So there are a lot of IFs in there... I would actually not use providers who do not offer CPU passthrough, or at least a very recent CPU model like Epyc or the Intel equivalent.

With these very basic CPU models, even simple tasks will consume CPU power like hell. So the total value/quality of the server is reduced a lot, because YES, the software will run, but your CPU consumption will explode. Simple tasks will eat away your available CPU power and will run with noticeably bad performance.

forest · March 18

@oloke said: The only real downside of this approach is the lack of live migration between hypervisors with different CPUs, since CPU features can't change without an OS reboot. However, if all hypervisors have the same CPU model, live migration is achievable.

Hosts that need to do live migration typically set the host model to the least common denominator. For example, if their oldest system is Broadwell, then they will set the host model to Broadwell. This is still leaps and bounds better than the default QEMU model which is meant to be compatible with Pentium 4-era software.

@layer7 said: IF you do not run any software that benefit from those CPU extension THEN it does not matter.

All software benefits from it, actually. It's not only heavy cryptography and video encoding that benefit. Even simple loops often get auto-vectorized by the compiler, and there are extensions that are not intended for performance but for security (see below).
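
As a rough illustration of the auto-vectorization point, here is a plain C loop with no intrinsics. The function name is made up, and whether a compiler actually vectorizes it depends on the compiler version and flags. With GCC or Clang, building with something like -O3 -march=x86-64-v3 allows AVX2/FMA code for it, while the default x86-64 baseline is limited to SSE2:

```c
#include <stddef.h>

/* Element-wise a[i] += b[i] * k. No explicit SIMD is written here;
 * `restrict` tells the compiler the arrays do not overlap, which makes
 * vectorizing the loop easier for it. */
void scale_add(float *restrict a, const float *restrict b, float k,
               size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] += b[i] * k;
}
```

The flip side is that a binary built for x86-64-v3 will die with an illegal instruction on a guest that only exposes v1, so a restricted CPU model forces either baseline builds or runtime dispatch.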

@oloke said: Restricting CPU features may lead to reduced performance within the VM (since the guest OS is not able to use optimal instruction set) and as a consequence, higher load on the hypervisor itself.

It doesn't only improve performance, but security as well. The Linux kernel will automatically make use of hardware security features when they are present, such as SMEP (Supervisor Mode Execution Prevention, which unconditionally blocks execution of user pages in kernelspace), SMAP (Supervisor Mode Access Prevention, which blocks access to user pages when in kernel mode, except in carefully guarded APIs intended for user-kernel copies), and UMIP (User Mode Instruction Prevention, which blocks several sensitive instructions outside of kernel mode). The first two stop a wide variety of severe kernel-level exploits dead in their tracks. It's a huge waste if the CPU supports those features but the guest is unable to use them. Seriously, it's 2026. The kernel should not be able to jump into user code, ever.

And you can say what you want about the trustworthiness of RDRAND/RDSEED, but at least they provide unique numbers that prevent a common VM-related randomness problem when they are injected into the kernel entropy pool.

@nopint3 said: When I asked the provider about it, they said they believe anything beyond x86-64-v1 is unnecessary for most users, and that if someone wants a different CPU model, they can open a ticket and request it.

So they don't think guest security is important. You should tell us which provider this is. I'd like to avoid it.
