ANYCAST related issue and questions

miu · May 2022

Hello anycast ppl,

Situation:
I have 3 VPSes at BuyVM: LUX, NY and VEGAS,
and each of them has the same anycast IP configured/assigned (one anycast IPv4 with 3 POPs).

What I expected and intended was redundancy and automatic failover. For example:
When a visitor from Germany requests a service running on my anycast IP and the LUX VPS (EU POP) is offline/down, the visitor and his request should be automatically re-routed to the next nearest working POP (in this case the US NY POP, and if NY is also down, then the VEGAS POP).

But instead of this behavior, Fran's anycast does this:
If the LUX VPS (POP) is offline, an EU visitor is not re-routed to another POP (NY or VEGAS); mtr shows lost packets, and the anycast IP and its services are unreachable (offline, down) for the EU visitor. So this gives no redundancy at all.

If I understand correctly how real anycast should work, then:
if the nearest POP is offline, the visitor should be re-routed to another (next nearest) POP, and so on.

Or are my expectations and understanding of this wrong?

btw: here it is also explained the way I thought:

Thanks for all explanations, suggestions or recommendations (for services where anycast works with the redundancy behavior mentioned above) from people who understand this.

For completeness, here is the answer from support:

btw: this is not a complaint about this provider or anything similar; I am simply asking about this and for the opinions of experts (knowledgeable people in this field), because I have no experience with it.

Thanks in advance for all useful responses.

sanvit · May 2022

Correct me if I'm wrong. This is just my understanding of how it works.

Anycast can be redundant IF THE IP STOPS BEING ANNOUNCED (which should happen in a normal anycast setup: the router goes down, then the IP isn't announced in that specific region).

However, that requires you to announce your own /24 or bigger block (which is super expensive, and you won't really need 256+ IPs on a single server in most cases), and the router to go down when other systems go down (which should be the case on DC power failure, fiber cut, etc.). So, what BuyVM does instead is announce the /24 on their own equipment, and rent out a single IP from that block. In this case, even if your server is down, BuyVM's router is still announcing the IP block (at least as long as their infrastructure is up), so other routers will still try to reach the specific region that has the closest path.

On the last part of your ticket, the support rep mentions BGP. This is what I was talking about above (announcing your own /24). If you establish a BGP session with BuyVM, you can announce your own IP blocks (/24 or bigger for v4, /48 or bigger for v6). Then if your router/server goes down, you will stop announcing your block to BuyVM, which will in turn stop announcing it in that specific region.
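To make the difference concrete, here is a minimal sketch of the two situations. It is a toy model, not how BGP really selects paths: the `distance` numbers, the `Pop` class and the `route` function are all made up for illustration, and real path selection depends on AS paths and policy rather than a single number.

```python
from dataclasses import dataclass


@dataclass
class Pop:
    name: str
    distance: int      # hypothetical routing "cost" as seen by the visitor
    announcing: bool   # is the prefix still announced from this POP?
    server_up: bool    # is the customer's VPS behind the router alive?


def route(pops):
    """Return (chosen POP name, whether the service answers) for one visitor."""
    candidates = [p for p in pops if p.announcing]
    if not candidates:
        return None, False
    # The network only sees announcements, never the health of the VPS.
    nearest = min(candidates, key=lambda p: p.distance)
    return nearest.name, nearest.server_up


# Shared /24 announced by the provider: LUX keeps announcing while the VPS is down.
shared = [Pop("LUX", 1, True, False), Pop("NY", 5, True, True), Pop("VEGAS", 8, True, True)]
# Own /24 over BGP: the announcement from LUX disappears together with the VPS.
own = [Pop("LUX", 1, False, False), Pop("NY", 5, True, True), Pop("VEGAS", 8, True, True)]

print(route(shared))  # traffic still lands in LUX and is lost
print(route(own))     # traffic moves on to NY
```

In the first case the visitor keeps being sent to LUX because nothing told the rest of the internet that LUX is dead; in the second case the withdrawal is what triggers the move to NY.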

IMO BuyVM's anycast is meant for more specific use cases, like DNS or some CDN setup where GeoDNS routing, etc. isn't an option.

I guess you could do something similar with monitoring + API enabled DNS, or you can get a cheap /48 v6 block, announce it, and connect it via Cloudflare proxy

risharde · May 2022

Hmmm interesting, I never thought anycast did what OP was asking for; I always thought it would be region-specific, so I am subscribing to read what the BGP and anycast experts say.

Lunar · May 2022

@sanvit is correct, BGP anycast is only redundant if you stop advertising the IP prefix from the location where the server is no longer responding to requests. BGP is not aware of layer 4 protocols, so if TCP connections start to fail, it doesn't know to stop advertising the prefix or something. This has to be done by software that automatically stops the advertisement of a prefix when an issue is detected.

In the case of BuyVM this can't be done because there are several other customers using a /24 anycast prefix. This would only work & be possible if there were one customer per /24 prefix + the automation in place to action it.
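A rough sketch of what such automation could look like when you announce your own prefix: a health check that decides when to announce and when to withdraw. Everything here is an assumption for illustration. The prefix is a documentation range, the function names are made up, and the command strings follow the text format that ExaBGP-style helper processes print to stdout as I understand it, so check them against the BGP daemon you actually run.

```python
import socket
import sys
import time

PREFIX = "203.0.113.0/24"  # documentation range, stands in for your own /24


def service_is_up(host, port, timeout=2.0):
    """Layer 4 check that BGP itself never does: can we open a TCP connection?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def next_command(healthy, announced, prefix=PREFIX):
    """Return the command to send to the BGP speaker, or None if nothing changes."""
    if healthy and not announced:
        return f"announce route {prefix} next-hop self"
    if not healthy and announced:
        return f"withdraw route {prefix} next-hop self"
    return None


def run_forever(host="127.0.0.1", port=80, interval=5.0):
    """Poll the local service and emit announce/withdraw lines on state changes."""
    announced = False
    while True:
        healthy = service_is_up(host, port)
        command = next_command(healthy, announced)
        if command is not None:
            sys.stdout.write(command + "\n")
            sys.stdout.flush()
            announced = healthy
        time.sleep(interval)
```

Calling `run_forever()` from a script started by the BGP daemon would be the usual shape. A real setup would also want several failed checks in a row before withdrawing, so that one slow response does not flap the route.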

Francisco · May 2022

@miu said: What I expected and intended was redundancy and automatic failover. For example:
When a visitor from Germany requests a service running on my anycast IP and the LUX VPS (EU POP) is offline/down, the visitor and his request should be automatically re-routed to the next nearest working POP (in this case the US NY POP, and if NY is also down, then the VEGAS POP).

We could re-route, but it ends up with people just using anycast IP's as free IP's. "Oh, I'll just buy 3 VM's, cancel the next day, keep the 5 free IP's".

Your options are:

BYOIP
Setup a 2nd VM in each DC and then you can do failover/loadbalancing
Live with the current setup

If a location does go down, the location is pulled from BGP. We also pull locations from anycast if there's major maintenance windows (like when we migrated our racks in LUX).

EDIT - Fixed a word.

Francisco

kevertje · May 2022

What i did expect and intend, was redundancy & automatic fail over, for example..

The router that announces your anycast is in front of your VPS. The router is unaware if the VPS is offline and will keep announcing the anycast. Any router that is announcing the anycast to the closest route to the endpoint is offering their route to a VPS.

miu · May 2022

Hello @SplitIce
Can you please explain how the X4B config will act in the following theoretical situation?

An X4B anycast IP with POPs in London and Ams in Europe, plus another in the US.

When an EU visitor sends a request to this anycast IP, and theoretically both the Ams and London POPs are offline/down, then:

a) Will his request and packets be automatically re-routed by your anycast config to the next nearest POP in the US (east), and will he get a response from that US POP?

or:

b) Will he be unable to reach the service because the POP nearest to him is down, and not be re-routed to the next nearest POP (US east)?

(In other words: do your POPs give redundancy, so that traffic is re-routed to the next nearest online POP when the nearest POP for a given user's location is offline?)

Thank you in advance for your response and info.

miu · May 2022

@Francisco said:

@miu said: What I expected and intended was redundancy and automatic failover. For example:
When a visitor from Germany requests a service running on my anycast IP and the LUX VPS (EU POP) is offline/down, the visitor and his request should be automatically re-routed to the next nearest working POP (in this case the US NY POP, and if NY is also down, then the VEGAS POP).

We could re-route, but it ends up with people just using anycast IP's as free IP's. "Oh, I'll just buy 3 VM's, cancel the next day, keep the 5 free IP's".

Hello. Thank you for your response and explanations.
That is not my case (I put clients on servers and am not interested in moving them here and there...). But I fully understand your frustration; unfortunately this is common customer "behavior" (knowingly ordering more than needed in order to get free perks/free goodies, and then cancelling the redundant services in the following days).

Your options are:

BYOIP

Setup a 2nd VM in each DC and then you can do failover/loadbalancing

In this concrete case it is important for me to provide redundancy mainly for EU users (in the EU).
On each VM I want to run a reverse proxy with caching, so that when the VM nearest to them goes offline, they should immediately be able to reach the content from the next nearest one (at a minimum, be able to get cached content from another nearby node, so I do not need to solve redundancy/HA and synchronization for the backend web servers).

So my question on this point is: I have 3 of your VMs in the LUX location.
But probably all 3 are on the same node = single point of failure = when this node goes offline, failover in this location will still not work.

May I ask you to move 2 instances to a different node, please?
This way I would be able to get failover in your LUX location
(if yes, may I open a ticket with such a request?)

Thanks for the answer and all the time you spent on this thread and matter.

Live with the current setup

If a location does go down, the location is pulled from BGP. We also pull locations from anycast if there's major maintenance windows (like when we migrated our racks in LUX).

EDIT - Fixed a word.

Francisco

miu · May 2022

@Lunar said:
@sanvit is correct, BGP anycast is only redundant if you stop advertising the IP prefix from the location where the server is no longer responding to requests. BGP is not aware of layer 4 protocols, so if TCP connections start to fail, it doesn't know to stop advertising the prefix or something. This has to be done by software that automatically stops the advertisement of a prefix when an issue is detected.

In the case of BuyVM this can't be done because there are several other customers using a /24 anycast prefix. This would only work & be possible if there were one customer per /24 prefix + the automation in place to action it.

@Francisco We could re-route, but it ends up with people just using anycast IP's as free IP's.

My apologies, but now I am a bit confused, so:

a) Is it possible to re-route traffic to the next nearest POP (automatically, by software, when the VPS or target IP node in a given POP fails and goes offline) for only part of a subnet (say a /32 or /29)? (I assume Fran said YES...)

b) Or is it possible only for a whole subnet (/24 or larger)?
(This is what @Lunar & @sanvit said.)

Which option is correct?
THANKS

NoComment · May 2022

What you want to achieve is not that simple and I don't think providers will help you or teach you how to do that for free. The simplest solution is to just use a failover dns solution which does health checks for you. For example, aws route53 and cloudflare. For the reverse proxy part, the free nginx can do health checks.

Another way is probably with failover IPs on any cloud provider and you make use of APIs to automate the health checks yourself. But there may be some form of vendor lock in and your health checking may need failover too and this is why some cloud providers also offer "load balancing" products.

Without a lot of scale, just stick to the dns solution and let the providers do the health checks for you.
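For what it's worth, the logic behind a failover DNS service is roughly the following. This is only a sketch of the idea, not how Route 53 or Cloudflare implement it; the IPs are documentation addresses and both function names are made up.

```python
def failover_answer(ordered_ips, is_healthy):
    """Return the A records to publish: the first healthy IP in priority order."""
    for ip in ordered_ips:
        if is_healthy(ip):
            return [ip]
    # Nothing passed its check: fail open and publish everything.
    return list(ordered_ips)


def worst_case_failover_seconds(check_interval, failures_needed, ttl):
    """Rough upper bound: time to detect the outage plus time for caches to expire."""
    return check_interval * failures_needed + ttl


POPS = ["192.0.2.10", "198.51.100.10", "203.0.113.10"]  # LUX, NY, VEGAS stand-ins
down = {"192.0.2.10"}

print(failover_answer(POPS, lambda ip: ip not in down))
print(worst_case_failover_seconds(check_interval=10, failures_needed=3, ttl=30))
```

The second function is the part to keep in mind: with a 10 second check, 3 failures required and a 30 second TTL, a client that honors the TTL can still be pointed at the dead server for about a minute.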

sanvit · May 2022

@miu

I think I can answer that.

Traditionally, (b) is correct. However, you could implement (a) via software on the router end. It isn't technically re-routing; (a) can only be accomplished by having the router proxy/forward the request to whatever node is alive.

I'll give you an example,

Let's say you have one node on each region, and LUX goes out.

(a) could be accomplished by setting the router on LUX to forward the request to MIA, while (b) will just stop announcing on LUX, and the user will directly be routed to MIA.

So on traditional routing (method b), this will be

User -> MIA

while, if you implement (a), this will be (assuming Fran will support it)

User -> LUX -> MIA

However, I doubt that Fran will implement it, since this will require extra development (healthchecks, etc.)
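The two paths above can be written down as a small sketch. It is purely illustrative: the function names are made up, and the `preference` list is a stand-in for whatever order the network or the router would actually try locations in.

```python
def path_withdraw(announcing, preference):
    """Method (b): a dead location stops announcing, so the user goes straight
    to the first location in their preference order that still announces."""
    for pop in preference:
        if pop in announcing:
            return ["User", pop]
    return ["User"]  # nobody announces the prefix: unreachable


def path_forward(nearest, alive, preference):
    """Method (a): the nearest router keeps announcing and forwards the request
    to a live node when its local one is down."""
    if nearest in alive:
        return ["User", nearest]
    for pop in preference:
        if pop in alive:
            return ["User", nearest, pop]
    return ["User", nearest]  # nothing alive to forward to


preference = ["LUX", "MIA"]
print(" -> ".join(path_withdraw({"MIA"}, preference)))
print(" -> ".join(path_forward("LUX", {"MIA"}, preference)))
```

Method (a) adds the extra LUX hop to every request while the node is down, and the router still needs its own health checks to know where to forward.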

miu · May 2022

@NoComment said:
What you want to achieve is not that simple and I don't think providers will help you or teach you how to do that for free. The simplest solution is to just use a failover dns solution which does health checks for you. For example, aws route53 and cloudflare. For the reverse proxy part, the free nginx can do health checks.

Another way is probably with failover IPs on any cloud provider and you make use of APIs to automate the health checks yourself. But there may be some form of vendor lock in and your health checking may need failover too and this is why some cloud providers also offer "load balancing" products.

Without a lot of scale, just stick to the dns solution and let the providers do the health checks for you.

Thanks, but DNS is not a real-time failover solution:
Despite the fact that the TTL can be set short (a few seconds), the problem is that browsers (especially on Windows) often keep IPs for domains in their own cache for a long time (as long as they are not closed), ignoring the A record TTL. So in most cases a website with a changed A record may stay unreachable for a long time, until the user closes or reopens the browser, and of course he does not know that he should do so when he sees the site offline. (For example, I have seen this several times on Windows with Firefox; even when I tried /flushdns, Firefox continued to ignore it and kept using the old IP it had cached.)

So from this point of view, IMO the best solution is to use anycast (which is able to re-route to another POP if the current target fails) with a reverse proxy, OR an HA automatic failover cluster with failover IPs (say OVH + PVE + CEPH, where the automation is much more complicated to configure than solving this with anycast working with redundancy, as I mentioned at the beginning of this post).

Still, I assume that X4B should work well in just this way and cover this purpose well enough, but I need to wait for @SplitIce's response and correction.

NoComment · May 2022

@miu said:

@NoComment said:
What you want to achieve is not that simple and I don't think providers will help you or teach you how to do that for free. The simplest solution is to just use a failover dns solution which does health checks for you. For example, aws route53 and cloudflare. For the reverse proxy part, the free nginx can do health checks.

Another way is probably with failover IPs on any cloud provider and you make use of APIs to automate the health checks yourself. But there may be some form of vendor lock in and your health checking may need failover too and this is why some cloud providers also offer "load balancing" products.

Without a lot of scale, just stick to the dns solution and let the providers do the health checks for you.

Thanks, but DNS is not a real-time failover solution:
Despite the fact that the TTL can be set short (a few seconds), the problem is that browsers (especially on Windows) often keep IPs for domains in their own cache for a long time (as long as they are not closed), ignoring the A record TTL. So in most cases a website with a changed A record may stay unreachable for a long time, until the user closes or reopens the browser, and of course he does not know that he should do so when he sees the site offline. (For example, I have seen this several times on Windows with Firefox; even when I tried /flushdns, Firefox continued to ignore it and kept using the old IP it had cached.)

The cloudflare solution should be the simplest, the georouting/failover routing by default is sold as a "load balancing" service so it is real-time failover. If you wanted to use aws, there are multiple ways to do this and there are guides on how to do this written by aws. (Just google high availability aws)

CloudV · May 2022

if nearest POP is offline, visitor should be re-routed to another (next nearest) POP and so on..

this will work if you do your own software BGP announcing IPs from the VM.

Or is my expectations and understanding for this wrong?

in this case, the IPs are being announced at router which cannot detect if your VM is online or offline... so it's an understanding difference.

miu · May 2022

@miu said:

@Francisco said:

@miu said: What I expected and intended was redundancy and automatic failover. For example:
When a visitor from Germany requests a service running on my anycast IP and the LUX VPS (EU POP) is offline/down, the visitor and his request should be automatically re-routed to the next nearest working POP (in this case the US NY POP, and if NY is also down, then the VEGAS POP).

We could re-route, but it ends up with people just using anycast IP's as free IP's. "Oh, I'll just buy 3 VM's, cancel the next day, keep the 5 free IP's".

Hello. Thank you for your response and explanations.
That is not my case (I put clients on servers and am not interested in moving them here and there...). But I fully understand your frustration; unfortunately this is common customer "behavior" (knowingly ordering more than needed in order to get free perks/free goodies, and then cancelling the redundant services in the following days).

Your options are:

BYOIP

Setup a 2nd VM in each DC and then you can do failover/loadbalancing

In this concrete case it is important for me to provide redundancy mainly for EU users (in the EU).
On each VM I want to run a reverse proxy with caching, so that when the VM nearest to them goes offline, they should immediately be able to reach the content from the next nearest one (at a minimum, be able to get cached content from another nearby node, so I do not need to solve redundancy/HA and synchronization for the backend web servers).

So my question on this point is: I have 3 of your VMs in the LUX location.
But probably all 3 are on the same node = single point of failure = when this node goes offline, failover in this location will still not work.

May I ask you to move 2 instances to a different node, please?
This way I would be able to get failover in your LUX location
(if yes, may I open a ticket with such a request?)

Thanks for the answer and all the time you spent on this thread and matter.

Live with the current setup

If a location does go down, the location is pulled from BGP. We also pull locations from anycast if there's major maintenance windows (like when we migrated our racks in LUX).

EDIT - Fixed a word.

Francisco

@Francisco additional question:
If I assign the same anycast IP to 2 or more instances in one location (3 VPSes in LUX will have the same anycast IP configured):

will it work?
how will incoming traffic and requests be split between them? Will the router load-balance randomly across all 3? Or?

(Sorry if I have (many) stupid or basic questions, but I have no experience or knowledge of this kind of setup.)

miu · May 2022

@sanvit said:

Many thanks for all your explanations and the time spent on them, appreciated.

AXYZE · May 2022

Idea from me:
If this is an app/web server then consider using Cloudflare. CF Load Balancing does exactly what you need but will cost you like $20/mo.
You can do it cheaper (~$1 for 1 million req) by using Cloudflare Workers
https://medium.com/@theotow/how-to-build-an-application-loadbalancer-for-1-1mio-requests-5c268e1d9d02
But you also need to implement health checks and remove unhealthy servers from the pool - you can do it via the Cloudflare API. If you don't have the time or skills then CF Load Balancer does everything automatically.

With this method you are not stuck with one VPS provider, which can be really nice if you get good VPS deal in future.

Also consider getting just one big box in between the desired locations (NY or London for example) and then caching data at CF Edge servers. You can easily cache API responses so they will be near visitors. If the Edge Cache TTL is too long for you then you can cache via CF Workers for like 60 seconds. It's very fast for visitors & cost effective for you.
https://coffeencoding.com/how-i-used-cloudflare-to-reduce-api-response-time/

Francisco · May 2022

@miu said:
@Francisco additional question:
If I assign the same anycast IP to 2 or more instances in one location (3 VPSes in LUX will have the same anycast IP configured):

will it work?

how will incoming traffic and requests be split between them? Will the router load-balance randomly across all 3? Or?

(Sorry if I have (many) stupid or basic questions, but I have no experience or knowledge of this kind of setup.)

We don't do rerouting for users to other locations.

Yes, if you have multiple VM's on the same node we can spread them out.

Yes, you can load balance the IP's via BGP (we can give you a private ASN), or you can just use heartbeat to do master/slave failover.

Francisco
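For the heartbeat master/slave option, the core of it is just a timeout on "when did I last hear from the master". A minimal sketch of that decision on the backup VM follows; the class and function names are made up, this is not the actual heartbeat/keepalived code, and timestamps are passed in explicitly so the logic is easy to follow.

```python
class HeartbeatMonitor:
    """Tracks heartbeats received from the master VM."""

    def __init__(self, timeout=3.0):
        self.timeout = timeout   # seconds of silence before the master is presumed dead
        self.last_seen = None    # timestamp of the most recent heartbeat

    def beat(self, now):
        self.last_seen = now

    def master_alive(self, now):
        return self.last_seen is not None and (now - self.last_seen) <= self.timeout


def role_for_backup(monitor, now):
    """The backup takes over the shared IP only while the master is silent."""
    return "backup" if monitor.master_alive(now) else "master"


monitor = HeartbeatMonitor(timeout=3.0)
monitor.beat(now=100.0)
print(role_for_backup(monitor, now=102.0))  # heartbeat is fresh
print(role_for_backup(monitor, now=104.0))  # heartbeat is stale, take over
```

Note that in this sketch a backup that has never seen a heartbeat also promotes itself, which a real deployment would have to guard against to avoid both VMs claiming the IP at startup.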

NoComment · May 2022

The expert has spoken. As I said earlier, unless you are doing things at a large scale, it is just not worth the trouble trying to save a few dollars here and there. Cloudflare is honestly your best bet.

@AXYZE said: CF Load Balancing does exactly what you need but will cost you like $20/mo.

Would be $15/mo for your 3 POPs plus some extra for queries which honestly are not too expensive and simply just works.

@AXYZE said: You can do it cheaper (~$1 for 1 million req) by using Cloudflare Workers

Apart from this, I recently learnt another relatively cheap but hacky way on LET. This dude was using gcp for georouting and failover, then using cloudflare as a cname with short TTL. I have not tried it, but it sounds like it would work.

sots · May 2022

You need CDN & load balancer instead of anycast IP if you just want to keep your web service uptime. It's unnecessary, difficult and expensive to get & operate anycast IP for your service. Cloudflare is cheaper but enough.

Francisco · May 2022

@NoComment said: The expert has spoken. As I said earlier, unless you are doing things at a large scale, it is just not worth the trouble trying to save a few dollars here and there. Cloudflare is honestly your best bet.

It depends what you're load balancing. Does CF's solution load balance straight TCP/UDP? Or just HTTP?

We have plenty of people load balancing stateless applications and such and think our anycast offerings w/ BGP are the greatest thing since sliced bread.

Francisco

SplitIce · May 2022

@miu

I think the core of your problem is that you are confusing Anycast (a frontend routing technology) with Load Balancing (a backend service distribution method). Failover specifically seems to be what you are mostly referring to, and it is often placed under the umbrella of Load Balancing (and that's where you will find it in our panel).

For an anycast provider's traffic to be re-routed away from a PoP, that PoP must go offline. Your backend servers' online status is irrelevant to frontend routing (instead, ensuring uptime in these cases is a matter for failover / load balancing).

If what you are after is failover and load balancing, it is supported (both failover primary/backup and load balancing active/active modes). You simply enable Load Balancing on the port row and define your load balancing configuration with one or more active backend servers and zero or more backups (HTTP).

If what you are after is geo routing (using anycast technology) it is available. Along with custom configuration for each PoP at the port level. You simply set the region option in the port row (one entry per PoP required).

Combining those two - if you want to define a custom load balancing configuration for each PoP, that's supported. You just select the region for the port and define a port and your desired failover / load balancing configuration for each.
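As a generic illustration of the active/active and primary/backup modes described above (not the actual implementation behind our panel; the `Pool` class and backend names are hypothetical), backend selection could be sketched like this:

```python
from itertools import count


class Pool:
    """Round-robin over healthy active backends; backups are used only if none remain."""

    def __init__(self, active, backups=()):
        self.active = list(active)
        self.backups = list(backups)
        self._counter = count()

    def pick(self, healthy):
        """Return the backend for the next request, or None if nothing is healthy."""
        live = [b for b in self.active if b in healthy]
        if not live:
            live = [b for b in self.backups if b in healthy]
        if not live:
            return None
        return live[next(self._counter) % len(live)]


pool = Pool(active=["web-a", "web-b"], backups=["web-c"])
everything_up = {"web-a", "web-b", "web-c"}
print([pool.pick(everything_up) for _ in range(3)])  # alternates between the actives
print(pool.pick({"web-c"}))                          # both actives down: backup serves
```

This decision happens behind the PoP, after anycast has already delivered the request, which is why the backend's health never affects which PoP the visitor reaches.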

NoComment · May 2022

@Francisco said:

@NoComment said: The expert has spoken. As I said earlier, unless you are doing things at a large scale, it is just not worth the trouble trying to save a few dollars here and there. Cloudflare is honestly your best bet.

It depends what you're load balancing. Does CF's solution load balance straight TCP/UDP? Or just HTTP?

I believe on the cheapest plan you do get HTTP and TCP but for UDP you may have to pay a lot more. For OP's use case, he could even do away with the reverse proxies if he used cloudflare as a load balancer.

@Francisco said: We have plenty of people load balancing stateless applications and such and think our anycast offerings w/ BGP are the greatest thing since sliced bread.

Yeah, buyvm has a very nice anycast offering. There's no doubt about that here.

miu · May 2022

@sots said:
You need CDN & load balancer instead of anycast IP if you just want to keep your web service uptime. It's unnecessary, difficult and expensive to get & operate anycast IP for your service. Cloudflare is cheaper but enough.

No, I certainly do not need a CDN. A CDN is a very general solution, good mainly for large media files, but not for static and dynamic HTML content (there would be a bunch of problems with how to handle static and dynamic requests, AJAX requests, etc.).
I want to have full control over caching and URL path processing = Nginx/reverse proxy under my full control, with my own custom config and rules.

kevertje · May 2022

It's unnecessary, difficult and expensive to get & operate anycast IP for your service.

Yes just go to McDonalds instead of cooking a decent meal
