Comments
Kinda. What I referred to is the fact that crypto is very sensitive to memory and caching (and to not wasting either). But at the level we're talking about here (max. 1 or 2 Gb/s, if that) a 32-bit OS and program will usually do fine (as long as the code is not idiotically designed and adapted).
Now, when we're talking 25 or 40 Gb/s systems the story dramatically changes of course, but that is way beyond the LET level.
I am talking about multiple threads and multiple routes. That complicates matters when each uses its own keys and possibly its own ciphers.
My experience has been around 30-40% saved. Indeed not a "small amount" in RAM-limited environments.
Too high crypto load means lower throughput.
Interesting. In low-load/low-traffic setups I assume the difference should be negligible performance-wise.
TLS is a major brake anyway, 32-bit or 64-bit. People don't seem to realize (probably due to all the "TLS everywhere!!!" hype noise) that asymmetric crypto, especially with ever-new (temporary) keys, is a major bottleneck. On the nodes typically used for cheap VPS a single core can typically generate about 10 - 30 key pairs per second (RSA 2k), and on Epycs, Ryzens, and the newer Xeons about 50 - 75, with key pair generation being a major factor in terms of performance.
Note though that on routers that factor is usually irrelevant.
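To get a feel for why key pair generation is so expensive, here is a minimal sketch of RSA key generation plus a crude "key pairs per second" timer. It is pure Python with a hand-rolled Miller-Rabin test, so it is a teaching toy and not what a TLS stack actually runs; all function names are made up for illustration, and absolute numbers will be far lower than the OpenSSL-class figures quoted above. What it does show is where the time goes: the search for two large random primes.

```python
import secrets
import time
from math import gcd

def is_probable_prime(n, rounds=40):
    """Miller-Rabin probabilistic primality test (illustrative, not hardened)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True

def random_prime(bits):
    """Draw random odd candidates until one passes the primality test."""
    while True:
        # Force the top two bits (so p*q has full length) and the low bit (odd).
        c = secrets.randbits(bits) | (1 << (bits - 1)) | (1 << (bits - 2)) | 1
        if is_probable_prime(c):
            return c

def generate_rsa_keypair(bits=2048, e=65537):
    """Return ((n, e), (n, d)) for a toy RSA key of the given modulus size."""
    while True:
        p = random_prime(bits // 2)
        q = random_prime(bits // 2)
        phi = (p - 1) * (q - 1)
        if p != q and gcd(e, phi) == 1:
            d = pow(e, -1, phi)  # modular inverse, Python 3.8+
            return (p * q, e), (p * q, d)

def keypairs_per_second(bits=2048, count=3):
    """Time `count` key generations on the current core."""
    start = time.perf_counter()
    for _ in range(count):
        generate_rsa_keypair(bits)
    return count / (time.perf_counter() - start)
```

Calling `keypairs_per_second(2048)` on a given box gives a rough relative number; for figures comparable to the ones in the post you would want to time a real library instead.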
More No than Yes. You see, it's not just about the "famous" PK algos. One of the real major problems, besides RSA and ECC, is what we call 'state size'. A very clear example is pseudo-random number generators (which aren't famous but are actually pretty much everywhere): a PRNG's quality is tightly related to the size of its internal state, and for cryptographically secure PRNGs that state is often in the 64+ bytes range (and not rarely well over 128 bytes). "Pumping" through and processing data that is larger than an L1 cache line (typ. 64 bytes), and doing so in small chunks (32 bits) rather than in 64-bit chunks, makes a major difference. And keep in mind that the full state needs to be changed for each and every random byte. The same goes for many other crypto routines.
Plus, the difference isn't just word size (32 vs. 64 bits) but quite a few other factors too, e.g. cache size. Let's have a quick look at a relatively old Xeon Nehalem (64-bit, X5560) and a shmick Pentium Prescott/Cedar Mill (32-bit):
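The state-size point can be put into rough numbers with a back-of-the-envelope sketch. The helper below (a hypothetical name, not from any library) just counts how many machine words and how many cache lines one full pass over a state touches, assuming the state is cache-line aligned and using the typical 64-byte line mentioned above. It ignores SIMD, pipelining, and everything else a real CPU does.

```python
def state_cost(state_bytes, word_bits, cache_line_bytes=64):
    """Return (word operations, cache lines) for one full pass over the state.

    Assumes an aligned state and one operation per machine word.
    """
    word_bytes = word_bits // 8
    words = -(-state_bytes // word_bytes)        # ceiling division
    lines = -(-state_bytes // cache_line_bytes)  # ceiling division
    return words, lines

# A 128-byte state, as in the "well over 128 bytes" case:
print(state_cost(128, 32))  # (32, 2): 32 word ops on a 32-bit machine
print(state_cost(128, 64))  # (16, 2): half the word ops on a 64-bit machine
```

So on a 32-bit machine every full state update costs twice the word operations, and that cost is paid again for each output if the whole state changes every time.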
plus 4+ times memory access speed, SSE 4.2, and more
@jsg very interesting, thank you very much!