Clouvider VPS Dropping Traffic and Connection Timeouts constantly

tavii · February 2025

Hi, ever since I bought my first Clouvider VPS I've had issues with connection timeouts and dropped traffic. I asked @Clouvider to look into it and they said everything is fine.

Between 2%-7% of outgoing DNS requests get no replies and outgoing HTTP requests periodically get an exact 5 second response time added to them or they just get timed out completely.

I moved my services from AWS, where I never had these kinds of problems. The VPS is hosted in Manchester and it's the cheapest one they offer.

I suspect their DDoS protection is causing all of this, but they denied it. I moved to Google's DoH to fix the DNS problem temporarily (it works, maybe because of a higher timeout, or because that traffic just doesn't get dropped?). I also haven't changed anything in Ubuntu or the kernel.

I really don't know what to do anymore because this is unusable and I prepaid for it. I wanted it to be good.

tavii · February 2025

@Clouvider any idea what it could be?

Clouvider · February 2025

@tavii said:
@Clouvider any idea what it could be?

We don’t provide support through the forum. If you can demonstrate the evidence of the issue within our network, please work with the support on a resolution.

tavii · February 2025

@Clouvider said:

@tavii said:
@Clouvider any idea what it could be?

We don’t provide support through the forum. If you can demonstrate the evidence of the issue within our network, please work with the support on a resolution.

Okay I will try to make a ticket again. Thanks

Clouvider · February 2025

@tavii said:

@Clouvider said:

@tavii said:
@Clouvider any idea what it could be?

We don’t provide support through the forum. If you can demonstrate the evidence of the issue within our network, please work with the support on a resolution.

Okay I will try to make a ticket again. Thanks

👍
Once raised, please DM me the ticket ID, I will monitor it for you 😉.

Arirang · February 2025

How often do you make an HTTP connection?

jsg · February 2025

@Clouvider said:

@tavii said:
@Clouvider any idea what it could be?

We don’t provide support through the forum. If you can demonstrate the evidence of the issue within our network, please work with the support on a resolution.

OK. But you might want, in your own interest, to offer us a statement on the matter. After all, you live to a large degree off your "very good network" reputation.

Unless, of course, OP is not day-dreaming and there actually are some problems. In that case it might seem wiser to just block questions ...

@tavii

Between 2%-7% of outgoing DNS requests get no replies

That in my eyes (a) is not necessarily to do with Clouvider, and (b) is not a live or die issue but rather an annoying but not critical one

and outgoing HTTP requests periodically get an exact 5 second response time added to them or they just get timed out completely.

That probably is concerning and it would serve Clouvider well to either prove that it's not their fault or to investigate and solve the problem, and quickly.

tavii · February 2025

@jsg Do you know any tool I could use to troubleshoot this?

This is what I've got for now: my service sends a POST request about every 20 seconds to port 18080. Some requests take 5 seconds longer than normal and some time out completely (possibly a 73 second timeout).

I also tested it with curl doing different GET/POST, HTTP and HTTPS requests to different websites and saw the exact same behaviour. I uploaded some screenshots below:

https://imgur.com/a/ZGEYA3w

Again, my services work completely fine everywhere else including AWS.

tavii · February 2025

@Arirang said:
How often do you make an HTTP connection?

Doesn't seem to matter but currently it's every 20 seconds

amarc · February 2025

So you did not take into account that the "service at port 18080" you are making POST requests to might be "glitching"?

Edit:

I did not see the whole imgur page. So, what does your /etc/resolv.conf look like? Did you try to curl the IP instead of the DNS name, to rule out the DNS resolvers as the issue?

tavii · February 2025

@amarc said:
So you did not take into account that the "service at port 18080" you are making POST requests to might be "glitching"?

Edit:

I did not see the whole imgur page. So, what does your /etc/resolv.conf look like? Did you try to curl the IP instead of the DNS name, to rule out the DNS resolvers as the issue?

I did curl http://54.84.170.143/user-agent and got the same behaviour:

root@Alba:~/dnspyre# time curl http://54.84.170.143/user-agent
{
"user-agent": "curl/8.5.0"
}
real 0m0.177s
user 0m0.008s
sys 0m0.006s
root@Alba:~/dnspyre# time curl http://54.84.170.143/user-agent
curl: (28) Failed to connect to 54.84.170.143 port 80 after 135992 ms: Couldn't connect to server
real 2m16.021s
user 0m0.020s
sys 0m0.020s

/etc/resolv.conf looks like this:
root@Alba:~# cat /etc/resolv.conf
nameserver 8.8.8.8
nameserver 8.8.4.4
nameserver 2001:4860:4860::8888
nameserver 2001:4860:4860::8844
search .

tavii · February 2025

I use Google DoH through cloudflared for my DNS server but that's separate to the system DNS.
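
For anyone wanting to put a number on the DNS loss without going through cloudflared or the system resolver, here is a minimal Python sketch. It sends plain UDP queries straight to a resolver and counts the ones that get no matching reply. The function names (build_query, dns_loss), the 2 second timeout and the query count are illustrative assumptions, not something taken from this thread, and it only handles IPv4 resolvers.

```python
import random
import socket
import struct


def build_query(name, qtype=1):
    """Build a minimal DNS query: one question, class IN, recursion desired.

    qtype=1 asks for an A record. Returns (transaction_id, packet_bytes).
    """
    txid = random.randint(0, 0xFFFF)
    # Header: id, flags (0x0100 = RD), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return txid, header + qname + struct.pack("!HH", qtype, 1)


def dns_loss(server, name, count=100, timeout=2.0):
    """Send `count` UDP queries to `server`; return the fraction left unanswered."""
    lost = 0
    for _ in range(count):
        _txid, packet = build_query(name)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(packet, (server, 53))
                reply, _addr = sock.recvfrom(4096)
                if reply[:2] != packet[:2]:  # reply must echo our transaction id
                    lost += 1
            except OSError:  # includes socket timeouts
                lost += 1
    return lost / count
```

Something like dns_loss("8.8.8.8", "example.com") run on the VPS and on a second machine elsewhere would show whether the loss follows the VPS or the resolver.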

amarc · February 2025

Yeah, that looks like rate-limiting to me. But it's hard to believe it's on the originating side of the story.

Why can't you spin up Nginx on some other random provider/VPS and test this within "your environment" to confirm it's actually Clouvider's VPS network issue? Do not rely on some public service not somehow limiting certain requests/user agents/IPs/providers.

tavii · February 2025

@amarc said:
Yeah, that looks like rate-limiting to me. But it's hard to believe it's on the originating side of the story.

Why can't you spin up Nginx on some other random provider/VPS and test this within "your environment" to confirm it's actually Clouvider's VPS network issue? Do not rely on some public service not somehow limiting certain requests/user agents/IPs/providers.

Ok so I hosted a caddy server on my home network and curled a text file. Now I don't see the 5 second delays anymore but I still get the random 2 minute response times.

I tried doing 4 curls in parallel doing 10req/sec each and sometimes they seem to stop at the same time but not consistently.

It's definitely a Clouvider issue, either the Ubuntu image they provide or their network.

nanankcornering · February 2025

@tavii said: curl: (28) Failed to connect to 54.84.170.143 port 80 after 135992 ms: Couldn't connect to server

I would try going to other services such as cloudflare.

is it the same thing if you do
time curl "https://discord.com/cdn-cgi/trace"
or
time curl "https://cloudflare.com/cdn-cgi/trace"

tavii · February 2025

@nanankcornering said:

@tavii said: curl: (28) Failed to connect to 54.84.170.143 port 80 after 135992 ms: Couldn't connect to server

I would try going to other services such as cloudflare.

is it the same thing if you do
time curl "https://discord.com/cdn-cgi/trace"
or
time curl "https://cloudflare.com/cdn-cgi/trace"

Tried both, same thing. Cloudflare, Google, Linode, no matter what I try same problem

tavii · February 2025

The weirdest thing is that it's always exactly either 5 seconds or 2 minutes and 12 seconds. Even if it were a DDoS thing, it's definitely not intended.

Also weird that it's only for outgoing traffic. Incoming DNS queries don't get dropped and incoming HTTP requests also seem fine
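
One possible reading of those two numbers, offered as a hypothesis rather than a diagnosis: 5 seconds is the default per-query timeout of the glibc stub resolver, so a single lost UDP DNS query would add exactly that much before the retry. And a connect that gives up after a bit over two minutes is in the ballpark of what Linux does when no SYN is ever answered, assuming the defaults of a 1 second initial retransmission timer, doubling on each retry, and net.ipv4.tcp_syn_retries = 6. The sketch below only does that arithmetic. The function name is made up for illustration, and the idealised sum will not match a measured timeout exactly.

```python
def syn_timeout_seconds(syn_retries=6, initial_rto=1.0):
    """Idealised time until connect() gives up when no SYN is answered.

    Assumes the wait after the first SYN is `initial_rto` seconds and
    doubles after each of the `syn_retries` retransmissions.
    """
    return sum(initial_rto * 2 ** attempt for attempt in range(syn_retries + 1))


print(syn_timeout_seconds())  # 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127.0
```

If that reading is right, both symptoms would come down to the first outgoing packet of an exchange being dropped, which a packet capture on the VPS could confirm or rule out.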

Arirang · February 2025

It's weird. I have Clouvider VMs in all locations except Manchester, which is the one you have. I have no problem making outgoing HTTP connections with curl every 10 seconds over IPv4 and IPv6.

ehhthing · February 2025

Does this happen over ICMP as well?

tavii · February 2025

@ehhthing said:
Does this happen over ICMP as well?

No

nullnothere · February 2025

Could you test out with a few changes to --max-time and --connect-timeout (and maybe --retry-max-time) to see if things fail faster and/or more predictably?

You could also just reboot into one of the rescue environments and use curl from there (just to rule out some Ubuntu issue).
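
The same idea can be sketched in Python, in case separating the connect phase from the rest of the request is useful. The probe below makes one TCP connect with an explicit timeout and reports how long it took, so a dropped SYN shows up as a fast, predictable failure instead of a two minute hang. The function name and the 3 second default are assumptions for illustration only.

```python
import socket
import time


def probe_connect(host, port, connect_timeout=3.0):
    """Try one TCP connect; return (succeeded, seconds_elapsed)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal, unreachable host or timeout
        with socket.create_connection((host, port), timeout=connect_timeout):
            ok = True
    except OSError:
        ok = False
    return ok, time.monotonic() - start
```

Calling probe_connect("54.84.170.143", 80) in a loop and logging the failures with timestamps would give something concrete to attach to a ticket.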

Clouvider · February 2025

@Arirang said:
It's weird. I have Clouvider VMs in all locations except Manchester, which is the one you have. I have no problem making outgoing HTTP connections with curl every 10 seconds over IPv4 and IPv6.

We spun up a VM in Manchester on each of the hypervisors there when this thread showed up and could not replicate it, neither across a series of 1000 attempts to a known-good web server outside of our network nor for DNS resolution.

Plus it feels like something that even our StatusCake should pick up.

Andreix · February 2025

Dumb suggestion here, but have you tried an old-fashioned VM reinstall with a fresh OS (different from what you currently have)?

If @Clouvider said they tested in the same environment as yours and couldn't find any issue, I'm thinking maybe some corrupted binaries/libs on your OS.

JabJab · February 2025

It's probably time to look into tcpdump/wireshark and see what is going on there. Retransmissions?
Or start with strace to make sure it's not your system getting bottlenecked by something, so that the request is never actually sent out at the time you think it's sent.

jsg · February 2025

@tavii said:
@jsg Do you know any tool I could use to troubleshoot this?

This is what I've got for now: my service sends a POST request about every 20 seconds to port 18080. Some requests take 5 seconds longer than normal and some time out completely (possibly a 73 second timeout).

I also tested it with curl doing different GET/POST, HTTP and HTTPS requests to different websites and saw the exact same behaviour. I uploaded some screenshots below:

https://imgur.com/a/ZGEYA3w

Again, my services work completely fine everywhere else including AWS.

Also regarding some of @amarc's thoughts:

For a start, I got the error "Imgur is temporarily over capacity. Please try again later." Can you maybe put it somewhere else as well? If you want, on one of your servers, plus a PM to me with the URL.

Reading the whole thread so far I see the following basic problems wrt your testing:

curl is, well, curl; that is, a program whose inner workings you most likely don't know in detail (and using "time" makes sense as a first crude step to get a ballpark number, but not for precise timing). You see, I wrote my own routines for connecting to (and downloading from) servers for a reason: the moment you use a library, you don't really know all the small tidbits in detail. Et voilà, I saw many, many cases where targets (almost invariably) as well as, very rarely, the server I tested acted weirdly, sometimes being very fast and sometimes really slow, even up to timing out.
The internet often is a weird place, and testing from or towards virtual servers one sees even more weirdness, often with a spike from a node neighbour being the culprit.
The internet also is a complex place, at least when down in the plumbing. Your assertion, for example, is very easy to brush off and very hard to even properly investigate, let alone prove. One major reason is that by its very nature there is pretty much always (a) the source, (b) the target, and (c) a hard-to-know number of diverse hops in between, some of which are even deliberately hidden (including and especially firewalls).

Frankly, the main reason I don't simply brush your problem off saying "oh well, the internet" is the fact that I have seen @Clouvider's nodes (all over the world) behave strangely quite often, e.g. testing against their LAX test node I've seen everything in between "impressive!" and "yuck, again in between being snail slow and dead it seems". And please note that my aim is not to bash Clouvider but rather to stress that even supposedly good providers obviously can't control everything (well, that's the internet), although I've seen it more often with Clouvider than with others.

In your place I'd (a) try to run the same test from different, preferably good quality, providers and (b) towards a few different targets, preferably owned by you and of decent quality. And I'd do that multiple times at different times of day. And I'd strongly suggest not using a "practical Swiss pocket-knife" (like curl) but proper software for specific and detailed tests.
Oh, and btw, before even starting with that I'd run multiple mtrs to and from your set of sources and targets to get a first impression and ballpark numbers.

Finally, sorry, but my first guess would be that it's not a Clouvider (connectivity) problem but either your VPS/node or some kink in the connection, read: some hop(s) on the internet.
Btw DNS over https? Are you joking? That's adding additional layers and potential problems on top. There's a good reason why I wrote my own DNS test routines for my monitoring software ...

Clouvider · February 2025

@jsg said:
Frankly, the main reason I don't simply brush your problem off saying "oh well, the internet" is the fact that I have seen @Clouvider's nodes (all over the world) behave strangely quite often, e.g. testing against their LAX test node I've seen everything in between "impressive!" and "yuck, again in between being snail slow and dead it seems". And please note that my aim is not to bash Clouvider but rather to stress that even supposedly good providers obviously can't control everything (well, that's the internet), although I've seen it more often with Clouvider than with others.

Are we talking iperf nodes ? Often changing the port helps - iperf servers tend to bug out and the process needs restarting from time to time (different port = different process). We have it automated but you might be unlucky with your timing. You will appreciate our iperf servers are very popular, being used in many benchmark scripts, so the 10Gbps shared across up to 10 test users certainly doesn’t help. This doesn’t mean there’s any network issue either, nor is it affecting any services. LA has plenty of capacity, same 100% Juniper network and uses premium providers as in every other PoP of ours.

jsg · February 2025

@Clouvider said:

@jsg said:
Frankly, the main reason I don't simply brush your problem off saying "oh well, the internet" is the fact that I have seen @Clouvider's nodes (all over the world) behave strangely quite often, e.g. testing against their LAX test node I've seen everything in between "impressive!" and "yuck, again in between being snail slow and dead it seems". And please note that my aim is not to bash Clouvider but rather to stress that even supposedly good providers obviously can't control everything (well, that's the internet), although I've seen it more often with Clouvider than with others.

Are we talking iperf nodes ? Often changing the port helps - iperf servers tend to bug out and the process needs restarting from time to time (different port = different process). We have it automated but you might be unlucky with your timing. You will appreciate our iperf servers are very popular, being used in many benchmark scripts, so the 10Gbps shared across up to 10 test users certainly doesn’t help. This doesn’t mean there’s any network issue either, nor is it affecting any services. LA has plenty of capacity, same 100% Juniper network and uses premium providers as in every other PoP of ours.

Nope, I was talking about your "download x M|GB test file" speedtest servers. I generally try to avoid iperf crap as far as at all possible.

As for "culprit(s)" I clearly said that I have no basis to presume it's your fault, I merely said that I experience it often with your speedtest servers, whatever the reason for that may be. As I already often stated the whole internet consists, exaggerating somewhat but I guess my point gets clear, of "culprits" by its very nature.

Jokull · February 2025

@tavii said:
I did curl http://54.84.170.143/user-agent and got the same behaviour:

root@Alba:~/dnspyre# time curl http://54.84.170.143/user-agent
{
"user-agent": "curl/8.5.0"
}
real 0m0.177s
user 0m0.008s
sys 0m0.006s
root@Alba:~/dnspyre# time curl http://54.84.170.143/user-agent
curl: (28) Failed to connect to 54.84.170.143 port 80 after 135992 ms: Couldn't connect to server
real 2m16.021s
user 0m0.020s
sys 0m0.020s

In my opinion, this is not a problem on @Clouvider's side, especially not a DDoS or some kind of hacker attack.

I have edited the script and run it across all my VPS on Clouvider, and all ran flawlessly with responses under 500ms:

for i in {1..100}; do time curl http://54.84.170.143/user-agent ; sleep 0.5 ; done
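
A rough Python take on the same loop, in case a summary is easier to compare across machines than 100 lines of time output. It is only a sketch: the function names and the 1 second "slow" threshold are assumptions made for illustration, and the 150 second timeout is simply chosen to sit above the roughly 136 second failure reported earlier in the thread.

```python
import http.client
import time
import urllib.request
from collections import Counter


def timed_get(url, timeout=150.0):
    """Fetch `url` once; return elapsed seconds, or None on a network/HTTP error."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except (OSError, http.client.HTTPException):
        return None
    return time.monotonic() - start


def bucket(elapsed, slow_after=1.0):
    """Classify one sample as 'failed', 'slow' or 'ok'."""
    if elapsed is None:
        return "failed"
    return "slow" if elapsed >= slow_after else "ok"


def summarise(samples, slow_after=1.0):
    """Count how many samples fall into each bucket."""
    return Counter(bucket(s, slow_after) for s in samples)
```

For example, summarise(timed_get("http://54.84.170.143/user-agent") for _ in range(100)) returns the counts per bucket; add a time.sleep(0.5) between requests to match the pacing of the shell loop.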

The problem might lie in Cloudflare or DoH, if the packets are going through them.

OP, please edit your post, you almost gave me a heart attack.

CloudHopper · February 2025

If you need to check a network for packet loss, the easiest approach is to use MTR.

The following command will run a cycle of 20 tests against Google's DNS server and tell you what packet loss you're seeing at each hop

mtr --report-cycles=20 8.8.8.8

That IP should be available from pretty much anywhere, but you can replace 8.8.8.8 with any other IP or domain

TimboJones · February 2025

@tavii said:

@amarc said:
So you did not take into account that the "service at port 18080" you are making POST requests to might be "glitching"?

Edit:

I did not see the whole imgur page. So, what does your /etc/resolv.conf look like? Did you try to curl the IP instead of the DNS name, to rule out the DNS resolvers as the issue?

I did curl http://54.84.170.143/user-agent and got the same behaviour:

root@Alba:~/dnspyre# time curl http://54.84.170.143/user-agent
{
"user-agent": "curl/8.5.0"
}
real 0m0.177s
user 0m0.008s
sys 0m0.006s
root@Alba:~/dnspyre# time curl http://54.84.170.143/user-agent
curl: (28) Failed to connect to 54.84.170.143 port 80 after 135992 ms: Couldn't connect to server
real 2m16.021s
user 0m0.020s
sys 0m0.020s

/etc/resolv.conf looks like this:
root@Alba:~# cat /etc/resolv.conf
nameserver 8.8.8.8
nameserver 8.8.4.4
nameserver 2001:4860:4860::8888
nameserver 2001:4860:4860::8844
search .

You're using the IP address directly. All this DNS shit is a red herring since it's not involved in the above at all.

bobert · February 2025

I'm having the same issues.

Using mtr with TCP flag shows massive packet loss over their transit links (icmp and ix traffic is unaffected). My vps with them is in NJ. This is a very strange problem.

https://lowendtalk.com/discussion/202952/clouvider-vps-lagging-but-no-indications-of-the-problem#latest
