The IncogNET thread - Discussion, news and updates.

Nekopara · February 28

@oloke said:

same here

iriska · February 28

Same issue on my end.

ServerBachelor · February 28

@iriska said:
Same issue on my end.

I don't have uptime monitoring software installed on my BG VM, but info about the server is not resolving in the control panel, so I can only assume my VM is undergoing similar ups and downs.

MatthewM · February 28

@iriska said:
Same issue on my end.

Over 5 hours of downtime today (in total), which started well before MannDude logged off LET today, and we've been left in the dark about what's causing it.

Nekopara · March 1

BG is finally back up 🥳

ServerBachelor · March 1

@Nekopara said:
BG is finally back up 🥳

Likewise.

Steal consistently below 1% for me in BG.

Steal usually around ~3% with spikes up to 30% in SE.
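For anyone who wants to compare steal numbers the same way, here is a minimal Python sketch. It assumes a Linux guest and derives steal % from two samples of the aggregate `cpu` line in /proc/stat, the same counters that tools like top read for their `st` figure. The function names and the 5-second default interval are illustrative choices, not anything from this thread.

```python
import time


def steal_percent(before: str, after: str) -> float:
    """Return steal time as a percentage of all CPU time between two samples.

    Each argument is the aggregate "cpu ..." line from /proc/stat.
    """
    def fields(line: str) -> list[int]:
        # Columns after the "cpu" label:
        # user nice system idle iowait irq softirq steal
        # (guest and guest_nice follow, but are already counted in user/nice).
        return [int(x) for x in line.split()[1:9]]

    deltas = [b - a for a, b in zip(fields(before), fields(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0  # no time elapsed between samples
    return 100.0 * deltas[7] / total  # index 7 is the steal column


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat (the all-CPU aggregate)."""
    with open(path) as f:
        return f.readline()


def live_steal(interval: float = 5.0) -> float:
    """Sample /proc/stat twice, `interval` seconds apart, and return steal %."""
    first = read_cpu_line()
    time.sleep(interval)
    return steal_percent(first, read_cpu_line())
```

Calling `live_steal()` in a loop and logging the result with a timestamp gives figures that can be lined up against the hypervisor's load alerts.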

tentor · March 1

If any of you are using Prometheus, I highly recommend combining vmagent + Prometheus: collect system metrics locally on the same VPS and then push them to Prometheus (see the docs). It's very useful for figuring out what's happening with the system when the network is temporarily offline.

Nekopara · March 1

just got this

JohnFilch123 · March 1

Incognet is prem, nothing else to add.

ServerBachelor · March 1

@JohnFilch123 said:
Incognet is prem, nothing else to add.

Yup, @MannDude makes Incognet one of my favorite providers.

MannDude · March 1

Ok, sorry about the Bulgaria incident. This falls on me since weekends are generally my time to keep an eye on things.

Got the downtime alert but overlooked it by mistake. Been moving to a new home and between packing, going back and forth across town, and other personal life stuff, I didn't notice until I sat down at my desk and saw the tickets. That is 100% on me.

I've updated how alerts are delivered so they spam us every 30 minutes until resolved. Also working on some better monitoring for individual VM resource consumption since a few VMs in Sweden still like to jump up and use a metric shit-ton of CPU for an amount of time that would be considered abuse. VirtFusion is great but it really lacks internal monitoring like Virtualizor has. Going to review this in more detail ( https://github.com/noxitylabs/virtfusion-cpu-abuse-detector ) and see about some alert delivery via webhooks (Slack, Telegram, SMS, etc)

Anyhow, credit (+1 week) has been applied to everyone already on the impacted hypervisor today. I think there were 3 or 4 people with lifetime plans on this hypervisor; I've added +256MB RAM to your VPS but you need to reboot for it to take effect.

We don't actually have or advertise an SLA of any sort, but when stuff like this happens, especially when things are basically "our fault" (as was the case with the slow response today), it only seems fair to compensate.
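A minimal Python sketch of the "repeat every 30 minutes until resolved" behaviour described above. This is not the actual alerting setup, just one way the re-notification rule could look. The function names, the payload field, and the example hostname are all hypothetical.

```python
from typing import Optional

REPEAT_EVERY_SECONDS = 30 * 60  # the 30-minute interval mentioned above


def should_notify(now: float, last_sent: Optional[float], resolved: bool,
                  repeat_every: float = REPEAT_EVERY_SECONDS) -> bool:
    """Decide whether an alert should be (re)sent at time `now` (epoch seconds)."""
    if resolved:
        return False  # stop nagging once the incident is closed
    if last_sent is None:
        return True   # the first notification goes out immediately
    return now - last_sent >= repeat_every


def build_payload(host: str, started_at: float, now: float) -> dict:
    """Build a generic JSON-style webhook body; the "text" key is an assumption."""
    minutes = int((now - started_at) // 60)
    return {"text": f"{host} still down after {minutes} min"}
```

A scheduler would call `should_notify` on each tick and, when it returns true, POST `build_payload(...)` to whichever webhook (Slack, Telegram, etc.) is configured.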

buggedout · March 1

@MannDude said:

We don't actually have or advertise an SLA of any sort, but when stuff like this happens, especially when things are basically "our fault" (as was the case with the slow response today), it only seems fair to compensate.

You should advertise an SLA! You did compensate for the "our fault" case, and that is what an SLA is about. Many providers just falsely promise an SLA and hide in the details that you must open a ticket within 3 days to get compensation. You, on the other hand, are fairly compensating users, so you should definitely advertise your strengths!

ServerBachelor · March 1

@MannDude said:
Ok, sorry about the Bulgaria incident. This falls on me since weekends are generally my time to keep an eye on things.

Got the downtime alert but overlooked it by mistake. Been moving to a new home and between packing, going back and forth across town, and other personal life stuff, I didn't notice until I sat down at my desk and saw the tickets. That is 100% on me.

I've updated how alerts are delivered so they spam us every 30 minutes until resolved. Also working on some better monitoring for individual VM resource consumption since a few VMs in Sweden still like to jump up and use a metric shit-ton of CPU for an amount of time that would be considered abuse. VirtFusion is great but it really lacks internal monitoring like Virtualizor has. Going to review this in more detail ( https://github.com/noxitylabs/virtfusion-cpu-abuse-detector ) and see about some alert delivery via webhooks (Slack, Telegram, SMS, etc)

Anyhow, credit (+1 week) has been applied to everyone already on the impacted hypervisor today. I think there were 3 or 4 people with lifetime plans on this hypervisor; I've added +256MB RAM to your VPS but you need to reboot for it to take effect.

Thanks, I see the free week in BG.

As compensation for the hypervisor issue in Sweden, support offered 512 MB extra RAM on my lifetime plan per ticket #0217Y55Y7. I'm mentioning this again since extra RAM was already applied for Bulgaria lifetime plans, but I understand if persistent hypervisor issues mean that RAM cannot be applied at this moment.

We don't actually have or advertise an SLA of any sort, but when stuff like this happens, especially when things are basically "our fault" (as was the case with the slow response today), it only seems fair to compensate.

Incognet has been consistent about this and I commend you and the team for that.

forest · March 1

@MannDude Your SE node is still suffering extremely high steal. I'm getting a consistent 35-40% right now (my CPU is 50-60% idle time). Could you look into it? It's not impacting me very heavily because I'm not CPU bound currently, but it should still be addressed.

@ServerBachelor said: As compensation for the hypervisor issue in Sweden, support offered 512 MB extra RAM on my lifetime plan per ticket #0217Y55Y7. I'm mentioning this again since extra RAM was already applied for Bulgaria lifetime plans, but I understand if persistent hypervisor issues mean that RAM cannot be applied at this moment.

I already got some free extra RAM for my lifetime plan after running into severe swapping, so it would be unfair for me to ask for even more. I'll just ask for an extension to my lifetime plan so I can use it in the afterlife too.

Radi · March 1

I have 2 BG lifetime VPS, didn't get RAM upgrade, but didn't ask for it either. I am running Adguard Home on both of them for my family and it's more than happy with what resources it got.

ServerBachelor · March 1

@forest said:
@MannDude Your SE node is still suffering extremely high steal. I'm getting a consistent 35-40% right now (my CPU is 50-60% idle time). Could you look into it? It's not impacting me very heavily because I'm not CPU bound currently, but it should still be addressed.

@ServerBachelor said: As compensation for the hypervisor issue in Sweden, support offered 512 MB extra RAM on my lifetime plan per ticket #0217Y55Y7. I'm mentioning this again since extra RAM was already applied for Bulgaria lifetime plans, but I understand if persistent hypervisor issues mean that RAM cannot be applied at this moment.

Yeah my steal in SE has occasionally spiked to around 30%, once to 52%, when I ran top and watched for a little while.

I already got some free extra RAM for my lifetime plan after running into severe swapping, so it would be unfair for me to ask for even more. I'll just ask for an extension to my lifetime plan so I can use it in the afterlife too.

Good idea

I figure that @MannDude is keeping track of who was affected by the Stockholm hypervisor issue, but as I said before, he has already applied free RAM to lifetime plans in both SE and BG, and I am still waiting on the free +512 MB offered by support on February 17th.

MatthewM · March 2

Still unstable in BG. Getting multiple downtime notifications per day still.

Nekopara · March 2

feels like these are more hardware related issues

MatthewM · March 2

@Nekopara said:
feels like these are more hardware related issues

@MannDude

ChrisMiller · March 2

@Nekopara said: feels like these are more hardware related issues

I'm thinking the same. I was talking to @MannDude about it. He just moved houses yesterday/today, so that might delay it being fully investigated.

ServerBachelor · March 3

SSH refused in Stockholm again. No issues in Sofia.

Michal212 · March 3

@MannDude Would it be possible to add the option to change the domain I manage in DNS Management?

ServerBachelor · March 3

@ServerBachelor said:
SSH refused in Stockholm again. No issues in Sofia.

SSH is still being refused in Stockholm. Anyone else have updates on their SE VMs?

zGato · March 3

@ServerBachelor said:

@ServerBachelor said:
SSH refused in Stockholm again. No issues in Sofia.

SSH is still being refused in Stockholm. Anyone else have updates on their SE VMs?

SSH is working fine here.

MannDude · March 3

Yeah, got moved and office moved. Still no internet (or air conditioning, in hot and humid Asia!) yet but I'm sweating it out for y'all on a mobile hotspot with a newly purchased fan blowing on me for the time being. At least my chair is one of those mesh back ones so I'm not sticking to it like those leather/fake-leather ones. Sadly, no one moves fast or with any sort of urgency here. Such is life. Should be settled in and comfortable in the next few days however.

Reviewing the Sweden stuff. I ran some tests on two project VMs on the same hypervisor. Even with two different VMs running things like GB6 benchmark tests at the same time (One VM has 1 vCPU, one has 4 vCPU) I saw no unusual or unexpected spikes in the CPU load of the hypervisor. Those measuring things like CPU steal, feel free to confirm that things didn't go crazy when purposely stressing two VMs. (Yabs results below for timestamps)

The issue everyone is experiencing is when the hypervisor is hitting triple digit CPU loads. When we receive these high load alerts I'll SSH in and take a look, and it's always a single VM as the culprit. (A single VM, but not always the same VM). This is important to note because I'm unable to replicate this when purposely using 100% of the two test VMs as described above. In fact, some of the "culprit VMs" belonged to some of the ones in here complaining, but I don't think they were actually doing anything malicious. Just pointing it out that it seems almost random.

There are no other measurable metrics spiking at the same time as the CPU load. No disk IO spikes. No network spikes. I'll check out what VM it is and what resources are assigned to it, and it's almost always a single or double core VPS. Any ideas here? There are plenty of available CPU resources. While these aren't dedicated CPU cores we've never really loaded things down to the point where individuals couldn't use 100% of their CPU for prolonged periods. In fact, I've never had to do CPU throttling on a VPS until recently. The option has always been there, but I'd never had to utilize the feature.

Anyhow, the simultaneous benchmarks for transparency. This is on a "full" and capped hypervisor, which was capped at least one or two weeks before any issues began so it's not like this was a result of new VM creation.

# ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## #
#              Yet-Another-Bench-Script              #
#                     v2025-04-20                    #
# https://github.com/masonr/yet-another-bench-script #
# ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## #

Tue Mar  3 03:25:33 PM GMT 2026

Basic System Information:
---------------------------------
Uptime     : 4 days, 19 hours, 47 minutes
Processor  : AMD EPYC 7402 24-Core Processor
CPU cores  : 4 @ 2794.748 MHz
AES-NI     : ✔ Enabled
VM-x/AMD-V : ✔ Enabled
RAM        : 3.8 GiB
Swap       : 2.0 GiB
Disk       : 49.2 GiB
Distro     : Debian GNU/Linux 13 (trixie)
Kernel     : 6.17.9-x64v3-xanmod1
VM Type    : KVM
IPv4/IPv6  : ✔ Online / ✔ Online

IPv6 Network Information:
---------------------------------
ISP        : IncogNet LLC
ASN        : AS40663 IncogNet LLC
Host       : Incognet LLC
Location   : Stockholm, Stockholm County (AB)
Country    : Sweden

fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/vda3):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 284.95 MB/s  (71.2k) | 1.38 GB/s    (21.7k)
Write      | 285.70 MB/s  (71.4k) | 1.39 GB/s    (21.8k)
Total      | 570.65 MB/s (142.6k) | 2.78 GB/s    (43.5k)
           |                      |                     
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 1.54 GB/s     (3.0k) | 1.88 GB/s     (1.8k)
Write      | 1.63 GB/s     (3.1k) | 2.00 GB/s     (1.9k)
Total      | 3.18 GB/s     (6.2k) | 3.89 GB/s     (3.8k)

iperf3 Network Speed Tests (IPv4):
---------------------------------
Provider        | Location (Link)           | Send Speed      | Recv Speed      | Ping           
-----           | -----                     | ----            | ----            | ----           
Clouvider       | London, UK (10G)          | 2.70 Gbits/sec  | 2.18 Gbits/sec  | 24.6 ms        
Eranium         | Amsterdam, NL (100G)      | 2.71 Gbits/sec  | 2.21 Gbits/sec  | 24.0 ms        
Uztelecom       | Tashkent, UZ (10G)        | busy            | 2.07 Gbits/sec  | 73.7 ms        
Leaseweb        | Singapore, SG (10G)       | 486 Mbits/sec   | 15.4 Mbits/sec  | 286 ms         
Clouvider       | Los Angeles, CA, US (10G) | 1.05 Gbits/sec  | 592 Mbits/sec   | 148 ms         
Leaseweb        | NYC, NY, US (10G)         | 1.59 Gbits/sec  | 1.85 Gbits/sec  | 94.1 ms        
Edgoo           | Sao Paulo, BR (1G)        | 650 Mbits/sec   | 1.13 Gbits/sec  | 220 ms         

iperf3 Network Speed Tests (IPv6):
---------------------------------
Provider        | Location (Link)           | Send Speed      | Recv Speed      | Ping           
-----           | -----                     | ----            | ----            | ----           
Clouvider       | London, UK (10G)          | 2.70 Gbits/sec  | 2.16 Gbits/sec  | 24.6 ms        
Eranium         | Amsterdam, NL (100G)      | 2.71 Gbits/sec  | 2.21 Gbits/sec  | 24.2 ms        
Uztelecom       | Tashkent, UZ (10G)        | 2.25 Gbits/sec  | 1.98 Gbits/sec  | 73.3 ms        
Leaseweb        | Singapore, SG (10G)       | 591 Mbits/sec   | 657 Mbits/sec   | 252 ms         
Clouvider       | Los Angeles, CA, US (10G) | 1.07 Gbits/sec  | 524 Mbits/sec   | 148 ms         
Leaseweb        | NYC, NY, US (10G)         | 1.82 Gbits/sec  | 1.77 Gbits/sec  | 94.1 ms        
Edgoo           | Sao Paulo, BR (1G)        | 664 Mbits/sec   | 672 Mbits/sec   | 220 ms         

Geekbench 6 Benchmark Test:
---------------------------------
Test            | Value                         
                |                               
Single Core     | 937                           
Multi Core      | 2771                          
Full Test       | https://browser.geekbench.com/v6/cpu/16844921

YABS completed in 16 min 7 sec

# ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## #
#              Yet-Another-Bench-Script              #
#                     v2025-04-20                    #
# https://github.com/masonr/yet-another-bench-script #
# ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## #

Tue Mar  3 03:25:24 PM GMT 2026

Basic System Information:
---------------------------------
Uptime     : 0 days, 0 hours, 18 minutes
Processor  : AMD EPYC 7402 24-Core Processor
CPU cores  : 1 @ 2794.748 MHz
AES-NI     : ✔ Enabled
VM-x/AMD-V : ✔ Enabled
RAM        : 1.9 GiB
Swap       : 1.5 GiB
Disk       : 29.5 GiB
Distro     : Debian GNU/Linux 13 (trixie)
Kernel     : 6.17.9-x64v3-xanmod1
VM Type    : KVM
IPv4/IPv6  : ✔ Online / ✔ Online

IPv6 Network Information:
---------------------------------
ISP        : IncogNet LLC
ASN        : AS40663 IncogNet LLC
Host       : Incognet LLC
Location   : Stockholm, Stockholm County (AB)
Country    : Sweden

fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/vda3):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 188.09 MB/s  (47.0k) | 1.59 GB/s    (24.8k)
Write      | 188.58 MB/s  (47.1k) | 1.60 GB/s    (25.0k)
Total      | 376.67 MB/s  (94.1k) | 3.19 GB/s    (49.9k)
           |                      |                     
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 1.94 GB/s     (3.7k) | 1.76 GB/s     (1.7k)
Write      | 2.04 GB/s     (3.9k) | 1.88 GB/s     (1.8k)
Total      | 3.98 GB/s     (7.7k) | 3.64 GB/s     (3.5k)

iperf3 Network Speed Tests (IPv4):
---------------------------------
Provider        | Location (Link)           | Send Speed      | Recv Speed      | Ping           
-----           | -----                     | ----            | ----            | ----           
Clouvider       | London, UK (10G)          | 1.63 Gbits/sec  | 1.34 Gbits/sec  | 24.9 ms        
Eranium         | Amsterdam, NL (100G)      | 1.63 Gbits/sec  | 1.34 Gbits/sec  | 24.0 ms        
Uztelecom       | Tashkent, UZ (10G)        | busy            | busy            | 72.4 ms        
Leaseweb        | Singapore, SG (10G)       | 487 Mbits/sec   | 12.3 Mbits/sec  | 282 ms         
Clouvider       | Los Angeles, CA, US (10G) | 1.05 Gbits/sec  | 719 Mbits/sec   | 148 ms         
Leaseweb        | NYC, NY, US (10G)         | 1.51 Gbits/sec  | 948 Mbits/sec   | 95.7 ms        
Edgoo           | Sao Paulo, BR (1G)        | 651 Mbits/sec   | 699 Mbits/sec   | 220 ms         

iperf3 Network Speed Tests (IPv6):
---------------------------------
Provider        | Location (Link)           | Send Speed      | Recv Speed      | Ping           
-----           | -----                     | ----            | ----            | ----           
Clouvider       | London, UK (10G)          | 1.63 Gbits/sec  | 1.35 Gbits/sec  | 25.0 ms        
Eranium         | Amsterdam, NL (100G)      | 1.63 Gbits/sec  | 1.32 Gbits/sec  | 24.0 ms        
Uztelecom       | Tashkent, UZ (10G)        | 1.56 Gbits/sec  | busy            | 72.0 ms        
Leaseweb        | Singapore, SG (10G)       | 589 Mbits/sec   | 615 Mbits/sec   | 246 ms         
Clouvider       | Los Angeles, CA, US (10G) | 1.06 Gbits/sec  | 563 Mbits/sec   | 148 ms         
Leaseweb        | NYC, NY, US (10G)         | 1.54 Gbits/sec  | busy            | 94.0 ms        
Edgoo           | Sao Paulo, BR (1G)        | 658 Mbits/sec   | 812 Mbits/sec   | 219 ms         

Geekbench 6 Benchmark Test:
---------------------------------
Test            | Value                         
                |                               
Single Core     | 1060                          
Multi Core      | 1054                          
Full Test       | https://browser.geekbench.com/v6/cpu/16844956

YABS completed in 18 min 54 sec

So, for Sweden, what to do? Well, we're out of stock and have no more hardware in that POP at the moment so will try to get some new hardware online. That doesn't actually fix anything, just allows for a transferable location for those who seek it. Will continue to review things but from my side I've not seen any spikes or grave urgent alerts in the last 12+ hours.

RE: Bulgaria. Looking into it now.

ServerBachelor · March 3

@MannDude said:
Yeah, got moved and office moved. Still no internet (or air conditioning, in hot and humid Asia!) yet but I'm sweating it out for y'all on a mobile hotspot with a newly purchased fan blowing on me for the time being. At least my chair is one of those mesh back ones so I'm not sticking to it like those leather/fake-leather ones. Sadly, no one moves fast or with any sort of urgency here. Such is life. Should be settled in and comfortable in the next few days however.

Glad the move is going well. Most of the chairs I've had were made of wood, so I have no idea what kinds of QOL improvements I might see if I switched to mesh etc

So, for Sweden, what to do? Well, we're out of stock and have no more hardware in that POP at the moment so will try to get some new hardware online. That doesn't actually fix anything, just allows for a transferable location for those who seek it. Will continue to review things but from my side I've not seen any spikes or grave urgent alerts in the last 12+ hours.

RE: Bulgaria. Looking into it now.

Take your time. We appreciate your commitment to the business and to communicating with us.

forest · March 3

@MannDude said: There are no other measurable metrics spiking at the same time as the CPU load. No disk IO spikes. No network spikes. I'll check out what VM it is and what resources are assigned to it, and it's almost always a single or double core VPS. Any ideas here? There are plenty of available CPU resources. While these aren't dedicated CPU cores we've never really loaded things down to the point where individuals couldn't use 100% of their CPU for prolonged periods. In fact, I've never had to do CPU throttling on a VPS until recently. The option has always been there, but I'd never had to utilize the feature.

Can you post some detailed usage metrics? Something like each QEMU PID, the core(s) it is currently running on, context switches per second, and CPU usage would be very valuable.

Another possible issue is a misconfiguration that's increasing overhead, for example not using virtio_net or not using vhost_net. You might also want to tweak the scheduler settings that determine how aggressively processes are migrated across cores.
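As a rough illustration of the context-switch metric asked for above, here is a minimal Python sketch that computes switches per second from two reads of a task's `status` file under /proc. It assumes a Linux hypervisor. Since QEMU normally runs each vCPU as its own thread, the sampler walks /proc/<pid>/task/<tid>/status rather than only the main thread. All function names are illustrative.

```python
import os
import time


def ctxt_switches(status_text: str) -> int:
    """Sum voluntary and nonvoluntary context switches from a status file's text."""
    total = 0
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
            total += int(value)  # int() tolerates the leading tab
    return total


def switches_per_second(before: str, after: str, interval: float) -> float:
    """Context switches per second between two status snapshots."""
    return (ctxt_switches(after) - ctxt_switches(before)) / interval


def snapshot(pid: int) -> dict[str, str]:
    """Read the status text of every thread of `pid`, keyed by thread ID."""
    task_dir = f"/proc/{pid}/task"
    result = {}
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/status") as f:
                result[tid] = f.read()
        except FileNotFoundError:
            pass  # thread exited between listdir() and open()
    return result


def sample(pid: int, interval: float = 1.0) -> dict[str, float]:
    """Return per-thread context switches per second for one process."""
    first = snapshot(pid)
    time.sleep(interval)
    second = snapshot(pid)
    return {tid: switches_per_second(first[tid], second[tid], interval)
            for tid in first if tid in second}
```

Running `sample(<qemu pid>)` during a load spike and again during a quiet period would show whether a particular vCPU thread is being switched far more often than its neighbours.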

slowservers · March 4

I assume there's a qemu KVM process per VM?

What's the process state on the offenders? Is it running or blocked on disk I/O? (D state)

I know you said there's no measurable I/O or other load at the time, but I am curious if I/O is blocked or something for a little bit.

What does your disk setup look like?

Do you have sysstat / sar logs? Those can be very handy.
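For the process-state question, a minimal Python sketch, assuming a Linux hypervisor, that pulls the state letter and the CPU a task last ran on out of a /proc/<pid>/stat line. State `D` is uninterruptible sleep, which usually means the task is waiting on I/O. The function names are illustrative.

```python
import os


def parse_stat(stat_line: str) -> tuple[str, int]:
    """Return (state, last_cpu) from one /proc/<pid>/stat line."""
    # comm (field 2) is wrapped in parentheses and may itself contain spaces,
    # so split on the last ')' instead of splitting everything on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    state = rest[0]           # field 3: R, S, D, Z, T, ...
    last_cpu = int(rest[36])  # field 39: CPU number last executed on
    return state, last_cpu


def is_blocked(state: str) -> bool:
    """True for uninterruptible sleep, typically a task stuck waiting on I/O."""
    return state == "D"


def thread_states(pid: int) -> dict[str, tuple[str, int]]:
    """Map each thread ID of `pid` to its (state, last_cpu)."""
    task_dir = f"/proc/{pid}/task"
    result = {}
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                result[tid] = parse_stat(f.read())
        except FileNotFoundError:
            pass  # thread exited while we were iterating
    return result
```

Polling `thread_states(<qemu pid>)` a few times per second during a spike and counting how often `is_blocked` is true would show whether the offending VM is burning CPU or sitting blocked.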

ServerBachelor · March 4

@MannDude said:
You can stay in Sweden if you'd like, or we can move you to Bulgaria if you'd prefer. Up to you.

Based on this response to @forest, would it be possible to move my 2 GB VM (Currently in Stockholm, SE) to KCMO, USA? Liberty Lake is also acceptable.

MatthewM · March 5

Looks like the downtime has returned..... 5 different outages today in BG.
