Netcup pauses all G11 Root Server orders and reduces performance by 50%

Moopah Member
edited May 31 in General

I was planning to buy exactly 453 of the Root Server G11 VDS (4 dedicated cores) from Netcup based on a good recommendation from a fellow LET user (0% steal on many Netcup VDS) to run Quilibrium crypto nodes, which I also learned about from the same LET user.

However, upon investigating my only G11 VDS root server, I found its performance had dropped by 50% in just one month since I ordered it for testing.

Others report the same performance decrease on Netcup's forums: https://forum.netcup.de/administration-of-a-server-vserver/vserver-server-kvm-server/15354-aktuelle-benchmarks-rs-vps-produkte/?pageNo=25

Netcup also halted all sales of their G11 root servers:

(Translated)
Root Server G11 currently unavailable:
We are trying to offer you our products again as soon as possible.
Thank you for your understanding.

This is the before and after on my G11 Root Server, which was never rebooted between tests and is idling (clean Ubuntu install):

Netcup RS G11 - April 20 2024

Geekbench 6 Benchmark Test:
---------------------------------
Test            | Value
                |
Single Core     | 2069
Multi Core      | 6804
Full Test       | https://browser.geekbench.com/v6/cpu/5819654

Netcup RS G11 - May 28 2024

Geekbench 6 Benchmark Test:
---------------------------------
Test            | Value
                |
Single Core     | 1071
Multi Core      | 3249
Full Test       | https://browser.geekbench.com/v6/cpu/6316670
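
For anyone who wants to reproduce these numbers: the tables above are YABS output, and the canonical one-liner from the YABS README runs the same disk, network, and Geekbench tests (check the README for flags to skip individual sections):

    curl -sL https://yabs.sh | bash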

Comments

  • Interesting. Seems Netcup can't handle the load of 100s of instances running Quickburn (or whatever it's called...) 24/7 either. I guess I see this with one eye laughing and one eye crying. On one hand it exposes a dishonest marketing scheme, but on the other hand it'll come at the expense of people pushing the VMs hard but still not that hard.

  • remy Member

    I've also noticed this significant loss of performance.
    If you have a previous-generation offer, I advise you not to upgrade to the new gen.
    So I'd end up with a more expensive offer and less disk space for comparable performance.
    Not worth it.

    # ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## #
    #              Yet-Another-Bench-Script              #
    #                     v2024-04-22                    #
    # https://github.com/masonr/yet-another-bench-script #
    # ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## #
    
    Fri May 31 02:02:38 CEST 2024
    
    Basic System Information:
    ---------------------------------
    Uptime     : 6 days, 4 hours, 33 minutes
    Processor  : AMD EPYC 9634 84-Core Processor
    CPU cores  : 4 @ 2246.622 MHz
    AES-NI     : ✔ Enabled
    VM-x/AMD-V : ❌ Disabled
    RAM        : 7.8 GiB
    Swap       : 976.0 MiB
    Disk       : 249.9 GiB
    Distro     : Debian GNU/Linux 12 (bookworm)
    Kernel     : 6.1.0-21-amd64
    VM Type    : KVM
    IPv4/IPv6  : ✔ Online / ✔ Online
    
    IPv6 Network Information:
    ---------------------------------
    ISP        : netcup GmbH
    ASN        : AS197540 netcup GmbH
    Host       : NETCUP-GMBH
    Location   : Nuremberg, Bavaria (BY)
    Country    : Germany
    
    fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/mapper/vds--de--vg-root):
    ---------------------------------
    Block Size | 4k            (IOPS) | 64k           (IOPS)
      ------   | ---            ----  | ----           ----
    Read       | 117.81 MB/s  (29.4k) | 230.38 MB/s   (3.5k)
    Write      | 118.12 MB/s  (29.5k) | 231.59 MB/s   (3.6k)
    Total      | 235.93 MB/s  (58.9k) | 461.97 MB/s   (7.2k)
               |                      |
    Block Size | 512k          (IOPS) | 1m            (IOPS)
      ------   | ---            ----  | ----           ----
    Read       | 290.09 MB/s    (566) | 395.70 MB/s    (386)
    Write      | 305.51 MB/s    (596) | 422.05 MB/s    (412)
    Total      | 595.61 MB/s   (1.1k) | 817.76 MB/s    (798)
    
    Geekbench 6 Benchmark Test:
    ---------------------------------
    Test            | Value
                    |
    Single Core     | 1163
    Multi Core      | 3639
    Full Test       | https://browser.geekbench.com/v6/cpu/6327920
    
    YABS completed in 8 min 16 sec
    
  • Moopah Member

    It makes customers very sad when performance decreases so much after purchase, especially on a VDS with dedicated cores. I expect a 10-20% drop at most from CPU downclocking, but not 50%.

    I buy VDS to run only Geekbench to see pretty numbers appear on my terminal screen. But now my Netcup VDS shows not pretty numbers.

    I will also need to buy 453 VDS with 4 dedicated cores somewhere else now :/

    Good thing I did not buy a 12-month Netcup contract.

  • Advin Member, Patron Provider

    @Moopah said:
    It makes customers very sad when performance decreases so much after purchase, especially on a VDS with dedicated cores. I expect a 10-20% drop at most from CPU downclocking, but not 50%.

    I buy VDS to run only Geekbench to see pretty numbers appear on my terminal screen. But now my Netcup VDS shows not pretty numbers.

    I will also need to buy 453 VDS with 4 dedicated cores somewhere else now :/

    Good thing I did not buy a 12-month Netcup contract.

    As you may know, a lot of people run Geekbench 5 and Geekbench 6 tests, and some people notice that the score doesn't exactly scale with the number of cores available in the system. For example, an 8-core system with a CPU that scores 1000 GB5 points single-threaded won't exactly score 8000 points multi-threaded. This is probably due to a number of reasons, but I suspect it's hyperthreading and all-core CPU clock speeds being lower.

    EPYC 7702 Testing (64 cores, 128 threads)
    I had an EPYC 7702 for a short period of time, so I thought I would start with this processor in particular. Threads are basically equivalent to what most providers call vCPU cores.

    I'll be testing with:

    0 threads loaded (0%)
    16 threads loaded (12.5%)
    32 threads loaded (25%)
    48 threads loaded (37.5%)
    64 threads loaded (50%)
    96 threads loaded (75%)
    120 threads loaded (93.75%)

    The BIOS was set to default values, and the CPU is theoretically rated for a 200W TDP. In practice, I only saw up to 185W on the CPU.

    In each test, one VM runs a stress test using the stress -c command to load that number of threads. Then another VM with 8 threads allocated runs Geekbench 5 to see the changes in results.

    It's important to note that this may not be representative of real world results. The benchmarks are purely artificial, and most virtualization solutions typically schedule the VM onto certain threads rather than evenly distributing it across all of the cores.
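
    For clarity, a minimal sketch of what one round looks like from the host side (hostnames loadgen-vm and bench-vm are hypothetical, and the Geekbench path assumes the Linux CLI tarball):

    # load n threads in one VM, then benchmark from the 8-vCPU test VM
    for n in 16 32 48 64 96 120; do
        ssh loadgen-vm "nohup stress -c $n >/dev/null 2>&1 &"
        sleep 60    # let clocks and power management settle
        ssh bench-vm "./Geekbench-5.5.1-Linux/geekbench5" > "gb5_${n}_loaded.txt"
        ssh loadgen-vm "pkill stress"
    done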

    In this case, we are testing with the following configuration:

    AMD EPYC 7702 (64c/128t)
    GIGABYTE MZ32-AR0 REV 1.0
    16 x 64GB 2400 MHz DDR4 ECC RDIMM
    4 x 3.84TB NVMe SSD + 2 x 512GB NVMe SSD + 1TB NVMe SSD
    40G ConnectX-3 Pro NIC

    Test VM:

    Host Passthrough (8 vCPU Cores)
    16GB DDR4 ECC Memory
    60GB NVMe SSD Storage

    0 threads loaded
    Idle usage is approximately 145W-180W of power on the host node.

    Single Core: 1037
    Multi Core: 7892

    16 threads loaded
    Upon loading the test VM, the server has a power usage between 290-300W.

    Single Core: 1047
    Multi Core: 7519
    Full Test: https://browser.geekbench.com/v5/cpu/22231694

    Most of the cores that were loaded by the stress test seem to be at around 3.2 GHz.

    32 threads loaded
    Upon loading the test VM, the server has a power usage between 290-300W.

    Single Core: 938
    Multi Core: 6672
    Full Test: https://browser.geekbench.com/v5/cpu/22231637

    Most of the cores that were loaded by the stress test seem to be at around 3.05 GHz.

    48 threads loaded
    Upon loading the test VM, the server has a power usage around 300-305W.

    Single Core: 860
    Multi Core: 5606
    Full Test: https://browser.geekbench.com/v5/cpu/22231663

    Most of the cores that were loaded by the stress test seem to be at around 2.71 GHz.

    64 threads loaded
    Upon loading the test VM, the server has a power usage around 310W.

    Single Core: 550
    Multi Core: 4082
    Full Test: https://browser.geekbench.com/v5/cpu/22231650

    Most of the cores that were loaded by the stress test seem to be at around 2.45 GHz.

    96 threads loaded
    Upon loading the test VM, the server has a power usage around 315-320W.

    Single Core: 544
    Multi Core: 4139
    Full Test: https://browser.geekbench.com/v5/cpu/22231672

    Most of the cores that were loaded by the stress test seem to be at around 2.43 GHz.

    120 threads loaded
    Upon loading the test VM, the server has a power usage around 315-320W.

    Single Core: 536
    Multi Core: 3639
    Full Test: https://browser.geekbench.com/v5/cpu/22231683

    Most of the cores that were loaded by the stress test seem to be at around 2.39 GHz.

    Conclusion
    As you can see from the testing above, at the halfway point where 50% of the CPU is utilized, we see a 50% reduction in the single-core and multi-core values in the 8-core VM running GB5 tests. Despite almost 50% of the CPU being free and 64 threads (64 vCPU cores) literally sitting unutilized, we still see low results. There's a dramatic drop in clock speed to 2.45 GHz, and it seems to stay around there as more cores are loaded, which is probably the culprit behind the lower Geekbench results.

  • totally_not_banned Member
    edited May 31

    @Advin how's this configured? Threads bound to VMs or a big scheduler party? Not that it likely matters. The whole thing seems more like thermal throttling or hitting a power consumption limit (or maybe memory bandwidth, depending on how the GB test is constructed?). Well, at least I hope it's something like that, otherwise I'd call this a pretty shitty CPU. There is little sense in doubled thread counts if the overall performance doesn't increase.

  • remy Member
    edited May 31

    Thanks for these numbers.
    It would be interesting to compare with pinned CPU cores, where the impact should be much less significant, since context switching wouldn't be as frequent (only thermal throttling / memory bandwidth should have an impact).
    But there must be a reason why Netcup doesn't do it...

    Surprisingly, I've never seen such a drop in performance with the previous generation.
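
    For reference, this kind of pinning is one command per vCPU with libvirt; a sketch (domain name vm1 is hypothetical, and sibling numbering assumes the usual Linux layout where thread N pairs with N+64 on a 64c/128t EPYC):

    # pin a 4-vCPU guest onto two physical cores plus their SMT siblings
    virsh vcpupin vm1 0 12    # vCPU 0 -> physical core 12
    virsh vcpupin vm1 1 76    # vCPU 1 -> SMT sibling of core 12
    virsh vcpupin vm1 2 13
    virsh vcpupin vm1 3 77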

  • totally_not_banned Member
    edited May 31

    @remy said:
    Surprisingly, I've never seen such a drop in performance with the previous generation.

    Maybe it's a microcode "update"? Old x6xx Xeons used to be theoretically able to simultaneously turbo boost on all cores; it was just disabled in (microcode) software. There are some models where "faulty" microcode is floating around that allows unlocking it, and it's pretty crazy. If I remember correctly they are still bound by a power limit though, so even allowing boost on all cores still sees some throttling regardless of the amount of cooling applied. Modern CPUs often come with a stupid amount of artificial crippling.

  • Advin Member, Patron Provider
    edited May 31

    @totally_not_banned said:
    @Advin how's this configured? Threads bound to VMs or a big scheduler party? Not that it likely matters. The whole thing seems more like thermal throttling or hitting a power consumption limit (or maybe memory bandwidth, depending on how the GB test is constructed?). Well, at least I hope it's something like that, otherwise I'd call this a pretty shitty CPU. There is little sense in doubled thread counts if the overall performance doesn't increase.

    Scheduler party :D I can assure you that the CPU was not being thermal or power throttled manually, but the EPYC 7702 generally is a low-power 64 core SKU.

    Keep in mind that the CPU has 128 threads. 64 of those threads are just a result of hyperthreading, so anything past 50% CPU usage will generally result in a bigger performance loss, as far as I'm aware. I've sometimes seen CPU steal even though there's still 30-40% of CPU left; that's why I generally like keeping my nodes under 50% usage.

    I'll maybe redo the testing at some point with EPYC Milan since it can sustain clocks better with a higher default TDP; I'll also experiment with distributing the load and pinning CPU cores.
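
    If anyone wants to watch for that steal from inside a guest, it's the st column in vmstat (or %st in top); for example:

    # sample CPU counters once per second, five times; the st column is
    # the share of time the hypervisor ran something else on our CPU
    vmstat 1 5
    top -bn1 | grep "Cpu(s)"    # one-shot view including %st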

  • Advin Member, Patron Provider
    edited May 31

    Also, keep in mind that Netcup is likely facing the Quilibrium problem, meaning that I wouldn't be surprised if their nodes were sustaining large amounts of vCPU load from VMs.

    In our shared environments, even with overallocated vCPU cores, we generally see an average load of 40-50%.

    At Netcup, I wouldn't be surprised if loads were generally lower, especially since they don't overallocate on vCPU (supposedly), leading to better clock speeds (if it weren't for Quilibrium).

  • Moopah Member

    @Advin I'm curious, were both workloads pinned to their own independent group of cores (within same CCX for the Geekbench)?
    I wonder if NUMA configuration or cross-CCX latency across the Infinity Fabric is affecting the performance as well.

    I also see some claims that people were able to hit close to boost clock on all cores even with AVX workloads on EPYC Rome when they hit their max 200W TDP.

    In your test, it seems to only hit 185W of the 200W TDP limit?
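
    (On the host, the kernel exposes which logical CPUs share an L3 slice, i.e. one CCX, so the pinning question is checkable; a quick sketch:)

    # logical CPUs sharing cpu0's L3 cache (the same CCX)
    cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list
    # full CPU/core/socket/NUMA-node layout
    lscpu -e=CPU,CORE,SOCKET,NODE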

  • remy Member
    edited May 31

    @Advin said:
    At Netcup, I wouldn't be surprised if loads were generally lower, especially since they don't overallocate on vCPU (supposedly), leading to better clock speeds (if it weren't for Quilibrium).

    I doubt it. If that were the case, why not have the CPU cores pinned?
    Performance would be better.
    This may make things a little more complex. I doubt that's the only reason... o:)

  • totally_not_banned Member
    edited May 31

    @Advin said:
    thermal or power throttled manually

    To my best knowledge those kinds of "features" are sadly pretty much universally hardcoded these days. The thermal/power management done by the OS/BIOS mostly just tightens things further (well, outside of a couple of intended or unintended knobs sometimes). A good example is GFX cards: one of the biggest overclocking hacks these days is manipulating the power draw sensors.

    but the EPYC 7702 generally is a low-power 64 core SKU.

    Ah, that would kind of explain it trying to avoid drawing tons of power.

    Keep in mind that the CPU has 128 threads. 64 of those threads are just a result of hyperthreading, so anything past 50% CPU usage will generally result in a bigger performance loss, as far as I'm aware.

    Good point. I would have expected the HT cores to at least add like half the performance of a real core though, and not just stagnate. Here it would also make quite a bit of sense to pin threads to specific VMs, in my opinion, as it would guarantee a fair distribution of real and HT cores. The fact that not doing so allows VMs to draw from currently non-HT'd cores as long as the overall load is low might also be a reason not to do it, thereby allowing VMs to get more performance under ideal circumstances.

    I've sometimes seen CPU steal even though there's still 30-40% of CPU left; that's why I generally like keeping my nodes under 50% usage.

    This is pretty much baffling me, but then again I know next to nothing about how steal is actually calculated.

    I'll maybe redo the testing at some point with EPYC Milan since it can sustain clocks better with higher default TDP and with pinned CPU cores.

    Cool, it's pretty interesting.

  • zrj766 Member

    According to this thread, some people are buying VDS/root servers in large quantities and keeping the CPU occupied for long stretches. I think these reasons lead to the performance degradation and the shortage.

  • Advin Member, Patron Provider
    edited May 31

    @Moopah said:
    @Advin I'm curious, were both workloads pinned to their own independent group of cores (within same CCX for the Geekbench)?
    I wonder if NUMA configuration or cross-CCX latency across the Infinity Fabric is affecting the performance as well.

    I also see some claims that people were able to hit close to boost clock on all cores even with AVX workloads on EPYC Rome when they hit their max 200W TDP.

    In your test, it seems to only hit 185W of the 200W TDP limit?

    I just let the Linux scheduler handle everything on stock settings; each Geekbench ran in a VM. Perhaps there could be some performance optimization done there.

    I can't really find any all-core benchmarks for the EPYC 7702, so I can't really compare it. However, I do know that the lower core count EPYC Rome processors can sustain their clocks better, since there is more power for each core.

    AMD only started rating and advertising all-core boost speeds with EPYC Genoa, and the Genoa 9634 that Netcup uses downgrades from 3.7 GHz to 3.1 GHz under load.

    The EPYC 7702 is more of a worst case scenario since it's a lower end 64 core chip that focuses simply on power efficiency and core density. Perhaps Milan 7763 or Rome 7742 would be better for a 64 core chip, since there's a larger TDP. I know that my Milan processors will basically hit 280W under half load since they aggressively turbo and try to maintain clocks.

    I don't exactly know why it stuck to 185W; it's something that I should probably look at more closely in further tests. Perhaps it could be because not every core may have been under load, or maybe the 7702 just doesn't turbo as aggressively?
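
    One way to pin that down in a future run: turbostat on the host shows busy clocks and package power side by side (column names as in recent turbostat releases):

    # print per-CPU busy frequency and package watts every 5 seconds
    turbostat --quiet --show Busy%,Bzy_MHz,PkgWatt --interval 5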

  • Moopah Member

    @zrj766 said:
    According to this thread, some people are buying VDS/root servers in large quantities and keeping the CPU occupied for long stretches. I think these reasons lead to the performance degradation and the shortage.

    :O I find it hard to believe! People are really trying to order hundreds of VDS at once???!!!

    @Advin said:

    @Moopah said:
    @Advin I'm curious, were both workloads pinned to their own independent group of cores (within same CCX for the Geekbench)?
    I wonder if NUMA configuration or cross-CCX latency across the Infinity Fabric is affecting the performance as well.

    I also see some claims that people were able to hit close to boost clock on all cores even with AVX workloads on EPYC Rome when they hit their max 200W TDP.

    In your test, it seems to only hit 185W of the 200W TDP limit?

    I just let the Linux scheduler handle everything on stock settings; each Geekbench ran in a VM. Perhaps there could be some performance optimization done there.

    I can't really find any all-core benchmarks for the EPYC 7702, so I can't really compare it. However, I do know that the lower core count EPYC Rome processors can sustain their clocks better, since there is more power for each core.

    AMD only started rating and advertising all-core boost speeds with EPYC Genoa, and the Genoa 9634 that Netcup uses downgrades from 3.7 GHz to 3.1 GHz under load.

    The EPYC 7702 is more of a worst case scenario since it's a lower end 64 core chip that focuses simply on power efficiency and core density. Perhaps Milan 7763 or Rome 7742 would be better for a 64 core chip, since there's a larger TDP. I know that my Milan processors will basically hit 280W under half load since they aggressively turbo and try to maintain clocks.

    I don't exactly know why it stuck to 185W; it's something that I should probably look at more closely in further tests. Perhaps it could be because not every core may have been under load.

    Yeah, I couldn't find many benchmarks for the 7702 either, unfortunately. I wish EPYC Genoa was cheaper.

    If you look at the Geekbench history for the EPYC 9634, you can literally see the NetCup RS G11 score go down each week: https://browser.geekbench.com/search?page=1&q=9634+

  • totally_not_banned Member
    edited May 31

    @Moopah said:

    @zrj766 said:
    According to this thread, some people are buying VDS/root servers in large quantities and keeping the CPU occupied for long stretches. I think these reasons lead to the performance degradation and the shortage.

    :O I find it hard to believe! People are really trying to order hundreds of VDS at once???!!!

    Well, like 5-10 years back people emptied Hetzner's server auction to mine Monero. I know nothing about Guacamole, but maybe it's as lucrative as the Monero farms back then, and people tend to do crazy things for money.

    @Moopah said:
    In your test, it seems to only hit 185W of the 200W TDP limit?

    A rated TDP of 200W doesn't necessarily mean it'll hit exactly that in real life. It might well max out at 185W.

  • crunchbits Member, Patron Provider, Top Host
    edited May 31

    @Advin said:
    I just let the Linux scheduler handle everything on stock settings; each Geekbench ran in a VM. Perhaps there could be some performance optimization done there.

    I can't really find any all-core benchmarks for the EPYC 7702, so I can't really compare it. However, I do know that the lower core count EPYC Rome processors can sustain their clocks better, since there is more power for each core.

    Accurate, and usually less of a spread between base and boost theoretical max.

    The EPYC 7702 is more of a worst case scenario since it's a lower end 64 core chip that focuses simply on power efficiency and core density. Perhaps Milan 7763 or Rome 7742 would be better for a 64 core chip, since there's a larger TDP. I know that my Milan processors will basically hit 280W under half load since they aggressively turbo and try to maintain clocks.

    I've tested with the custom variants (7B12, 7B13) as well as the 7742/7763, and it's definitely an improvement; however, you're usually at max TDP for the socket/coolers, so a lot of it comes down to environmentals. It's a lot harder to keep that 280W TDP CPU cooled and boosted in a quad-node 2U versus a single-chassis 2U with a big hunk of metal sitting on the CPU.

    This is often where the closeted crypto miners get themselves into trouble. Pinned/dedicated CPUs or not, a lot of cloud or multi-node systems are not built assuming 100% load on every core 24/7 mining some memecoin. The miners that (at least here) let us know what they're doing get an explanation of what they actually need, what it costs, and whether we can provide it.

    I don't exactly know why it stuck to 185W; it's something that I should probably look at more closely in further tests. Perhaps it could be because not every core may have been under load, or maybe the 7702 just doesn't turbo as aggressively?

    Could have been cooling/headroom related, silicon lottery, etc.

    @totally_not_banned said:
    @Advin how's this configured? Threads bound to VMs or a big scheduler party? Not that it likely matters. The whole thing seems more like thermal throttling or hitting a power consumption limit (or maybe memory bandwidth, depending on how the GB test is constructed?). Well, at least I hope it's something like that, otherwise I'd call this a pretty shitty CPU. There is little sense in doubled thread counts if the overall performance doesn't increase.

    It's not so much a shitty CPU, just that the difference between base and boost clock (2.0GHz to 3.35GHz) is massive. Without the benefit of boosting, it's doing exactly what you'd expect from a ~2GHz Zen2 core. The more static load, the closer you get to always being @ base clock. I don't know the exact crossover (and this does depend on cooling, silicon, etc.) but I'd say @Advin's tests closely mirror what I saw. The drop-off is significant after ~50-60% of the actual physical cores are loaded up. Ignore HT, it definitely doesn't 'double' the cores/perf in the remotest sense, especially on synthetic benchmarks designed to load a system up. It's more usable in real-world scenarios though, with truly varied workloads.

  • Moopah Member

    @totally_not_banned said:

    @Moopah said:

    @zrj766 said:
    According to this thread, some people are buying VDS/root servers in large quantities and keeping the CPU occupied for long stretches. I think these reasons lead to the performance degradation and the shortage.

    :O I find it hard to believe! People are really trying to order hundreds of VDS at once???!!!

    Well, like 5-10 years back people emptied Hetzner's server auction to mine Monero. I know nothing about Guacamole, but maybe it's as lucrative as the Monero farms back then, and people tend to do crazy things for money.

    @Moopah said:
    In your test, it seems to only hit 185W of the 200W TDP limit?

    A rated TDP of 200W doesn't necessarily mean it'll hit exactly that in real life. It might well max out at 185W.

    I believe the TDP threshold measured and used by CPU governors is something that can be hit, given the proper BIOS and environmental configuration, as long as the CPU itself doesn't internally downclock before reaching that threshold.

  • artxs Member

    @Moopah said:
    :O I find it hard to believe! People are really trying to order hundreds of VDS at once???!!!

    That other guy was also buying 453 VMs, so maybe he stole your chicken power?

  • totally_not_banned Member
    edited May 31

    @crunchbits said:
    It's not so much a shitty CPU, just that the difference between base and boost clock (2.0GHz to 3.35GHz) is massive. Without the benefit of boosting, it's doing exactly what you'd expect from a ~2GHz Zen2 core.

    Yeah, when I wrote that I didn't know yet that it was supposed to be low power.

    The more static load, the closer you get to always being @ base clock.

    Well, ignoring the low power aspect for a second, that's still not overly impressive given (hacked) Xeons managed to boost on like at least half their cores while keeping base on the others.

    Ignore HT, it definitely doesn't 'double' the cores/perf

    Sure, like I've said above, I didn't expect any doubling, but like a 50% increase would have been nice or, well, at least not total stagnation. What's HT good for if it doesn't add any performance at all (again ignoring the likelihood of the CPU just hitting its power target - well, kind of... why not simply scale down if the power target prevents fully using the hardware anyways?).

  • Moopah Member
    edited May 31

    @totally_not_banned said:

    @crunchbits said:
    It's not so much a shitty CPU, just that the difference between base and boost clock (2.0GHz to 3.35GHz) is massive. Without the benefit of boosting, it's doing exactly what you'd expect from a ~2GHz Zen2 core.

    Yeah, when I wrote that I didn't know yet that it was supposed to be low power.

    The more static load, the closer you get to always being @ base clock.

    Well, ignoring the low power aspect for a second, that's still not overly impressive given (hacked) Xeons managed to boost on like at least half their cores while keeping base on the others.

    Ignore HT, it definitely doesn't 'double' the cores/perf

    Sure, like I've said above, I didn't expect any doubling, but like a 50% increase would have been nice or, well, at least not total stagnation. What's HT good for if it doesn't add any performance at all (again ignoring the likelihood of the CPU just hitting its power target - well, kind of... if the power target prevents fully using the hardware, why not simply scale down?).

    HT/SMT is heavily dependent on the workload, with some use cases where there is no performance difference (or even a decrease) whether SMT is enabled or disabled. It's quite suitable for varied workloads like VM hosting. I run a lot of uniform workloads on Ryzens and EPYCs with little performance benefit with SMT enabled (0-5% increase).

  • dev_vps Member

    @Advin

    May I request you to run a similar load test with HT turned off in the BIOS?

    Thanks and much appreciated
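
    (If a BIOS trip is inconvenient, recent kernels can also toggle it at runtime; a sketch, assuming Linux 4.19+ and root access:)

    echo off > /sys/devices/system/cpu/smt/control    # siblings go offline
    echo on  > /sys/devices/system/cpu/smt/control    # bring them back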

  • dev_vps Member

    @Moopah said:

    HT/SMT is heavily dependent on the workload, with some use cases where there is no performance difference (or even a decrease) whether SMT is enabled or disabled. It's quite suitable for varied workloads like VM hosting. I run a lot of uniform workloads on Ryzens and EPYCs with little performance benefit with SMT enabled (0-5% increase).

    The CPU cache is a critical resource for performance. When multiple threads run on the same core, they share the same cache, which can lead to frequent cache evictions and cache misses. This can significantly increase the latency for memory accesses and degrade performance.

    This is clearly evident when the active thread load is > 64 on a CPU with 64 cores (128 threads).
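
    (A quick way on Linux to see which two logical CPUs are contending for one physical core and its cache:)

    # cpu0's SMT sibling pair
    cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
    lscpu -e=CPU,CORE    # the same mapping for every logical CPU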

  • crunchbits Member, Patron Provider, Top Host
    edited May 31

    @totally_not_banned said:
    Yeah, when I wrote that I didn't know yet that it was supposed to be low power.

    Makes sense, it's more of just a standard with AMD since you're getting a lot of cores per watt. Not necessarily low power, unless you need to use them all at the same time. Kind of meta to this whole discussion if you really think about it :D

    Well, ignoring the low power aspect for a second, that's still not overly impressive given (hacked) Xeons managed to boost on like at least half their cores while keeping base on the others.

    I do think it's impressive when you consider ~200W for 64 cores in a single socket (especially at 2019 launch date) where even two beefy Xeons E5/G1 Scalable CPUs weren't getting to 64 cores while being double the wattage. It just comes down to use case. The EPYC will still boost well on half the cores (32 of 64 was still pretty close to unloaded score) but the further you push it the quicker it diminishes.

    Sure, like I've said above, I didn't expect any doubling, but like a 50% increase would have been nice or, well, at least not total stagnation. What's HT good for if it doesn't add any performance at all (again ignoring the likelihood of the CPU just hitting its power target - well, kind of... why not simply scale down if the power target prevents fully using the hardware anyways?).

    @Moopah explained it really well. I thought of a few examples but they all sucked; HT is just kind of hard to benchmark with the standard tools. They're not meant for that. It's just giving you the ability to use the 'unused' parts of a core (per cycle) by emulating 1 core as 2. Obviously if you're trying to do the same computations it'll have to wait, but it can parallelize a lot of things effectively. There are plenty of use cases (especially in heavy compute) where you don't want the overhead (or SMT is actually detrimental) and turn it off, though.

  • vpn2024 Member
    edited May 31

    Curious, what's special about 453?

    And Quilibrium is basically like AWS? I buy some coins with some greenbacks, exchange them for some quota of compute/storage, and somehow the Quilibrium network deploys my code on some nasty low-grade phpfriends or Netcup oversold node, and Moopah makes a decent profit? Is that the essence of it? Anything standout, or just another shitcoin with the typical Discord group hyping it?

  • totally_not_banned Member
    edited May 31

    @crunchbits said:

    Well, ignoring the low power aspect for a second, that's still not overly impressive given (hacked) Xeons managed to boost on like at least half their cores while keeping base on the others.

    I do think it's impressive when you consider ~200W for 64 cores in a single socket (especially at 2019 launch date) where even two beefy Xeons E5/G1 Scalable CPUs weren't getting to 64 cores while being double the wattage. It just comes down to use case. The EPYC will still boost well on half the cores (32 of 64 was still pretty close to unloaded score) but the further you push it the quicker it diminishes.

    Sure, it's kind of an achievement. Still, in my opinion it's somewhat underwhelming. There are a lot of cores that don't do much once you start using them. Maybe they could do 1024 cores next. If one tries to actually use them all, they degrade to C64 performance and run like a 15-year-old desktop PC, but the huge number will look nice on paper ;)

  • Moopah Member
    edited May 31

    @vpn2024 said:
    Curious, what's special about 453?

    And Quilibrium is basically like AWS? I buy some coins with some greenbacks, exchange them for some quota of compute/storage, and somehow the Quilibrium network deploys my code on some nasty low-grade phpfriends or Netcup oversold node, and Moopah makes a decent profit? Is that the essence of it? Anything standout, or just another shitcoin with the typical Discord group hyping it?

    Ordering exactly 453 instances of VDS nodes at one time against a single hosting provider with unrestricted automated provisioning is very important.

    My research (reading LET threads) suggests this number is the most optimal for generating high quantities of DramaState™ on LET.

    My Grafana and Prometheus metrics charts are showing that the level of DramaState™ on LET is unstable and needs to be maintained at a higher, healthy level.

    DramaState™ mining on LET is an extremely compute-intensive workload.

  • Moopah Member
    edited May 31

    @totally_not_banned said:

    @crunchbits said:

    Well, ignoring the low power aspect for a second, that's still not overly impressive given (hacked) Xeons managed to boost on like at least half their cores while keeping base on the others.

    I do think it's impressive when you consider ~200W for 64 cores in a single socket (especially at 2019 launch date) where even two beefy Xeons E5/G1 Scalable CPUs weren't getting to 64 cores while being double the wattage. It just comes down to use case. The EPYC will still boost well on half the cores (32 of 64 was still pretty close to unloaded score) but the further you push it the quicker it diminishes.

    Sure, it's kind of an achievement. Still, in my opinion it's somewhat underwhelming. There are a lot of cores that don't do much once you start using them. Maybe they could do 1024 cores next. If one tries to actually use them all, they degrade to C64 performance and run like a 15-year-old desktop PC, but the huge number will look nice on paper ;)

    Technically AMD advertises cores and threads separately and doesn't market their threads as actual cores. It's primarily in the hosting space where vCores and vCPUs are used instead of "threads", which leads to confusion about the true performance of a dedicated "core", due to 2 vCPUs contending on a single physical core and sharing cache.

  • dev_vps Member

    That is why it is recommended to pin both HT siblings to the VDS for optimal performance.

    Just like @crunchbits' VDS

  • Moopah Member
    edited May 31

    @dev_vps said:
    That is why it is recommended to pin both HT siblings to the VDS for optimal performance.

    Just like @crunchbits' VDS

    But that means you can't make more $$$ by selling each pinned thread as a dedicated Core
