Serverica down

Falzo · July 2021

@Daniel15 said:
My storage VPS is back

@Falzo said: vmware might have carried over the uptime during the freeze

As far as I know, they're 100% Xen. They're the developers/maintainers of a module that connects WHMCS to Xen (https://xenmodule.com/).

Yes, Xen HVM, not VMware. Sorry, my mistake...

Maounique · July 2021

Xen VM state can be fully snapshotted.

Falzo · July 2021

@Maounique said:
Xen VM state can be fully snapshotted.

Yeah, that's what I saw, and why the uptime continued counting. However, shutting them down completely would most likely have been better, as I saw that a fresh full start of the VM was needed to get the network settings working properly in the new environment.

daozhi · July 2021

There is something wrong with the IPv6 configuration. I cannot reach my IPv6-only storage VPS. Waiting for them to reply to my ticket.

zongyouxiao · July 2021

@daozhi said:
There is something wrong with the IPv6 configuration. I cannot reach my IPv6-only storage VPS. Waiting for them to reply to my ticket.

Same here. I tried this and it finally works: https://servarica.com/clients/knowledgebase/13/Adding-IPv6.html. Strange, it worked before the migration; after the migration I had to change my network configuration according to their document to make it work again.

cochon · July 2021

@zongyouxiao said:

@daozhi said:
There is something wrong with the IPv6 configuration. I cannot reach my IPv6-only storage VPS. Waiting for them to reply to my ticket.

Same here. I tried this and it finally works: https://servarica.com/clients/knowledgebase/13/Adding-IPv6.html. Strange, it worked before the migration; after the migration I had to change my network configuration according to their document to make it work again.

Interesting KB article, noting that the debian instructions make no mention of an IPv6 gateway, yet my [debian] VPS has one configured, same as the one in the CentOS instructions.

Can't ping the gateway address at all now or make any outbound connections from the Storage VPS, but it has been working briefly since the migration, just not now. Network status in the portal says 'There are no Network Issues Currently' looks like a ticket's needed...

raindog308 · July 2021

@Daniel15 said: My storage VPS is back

As is mine.

# uptime
 10:12:51 up 2 days, 12:54,  1 user,  load average: 0.07, 0.02, 0.00

zongyouxiao · July 2021

@cochon said:

@zongyouxiao said:

@daozhi said:
There is something wrong with the IPv6 configuration. I cannot reach my IPv6-only storage VPS. Waiting for them to reply to my ticket.

Same here. I tried this and it finally works: https://servarica.com/clients/knowledgebase/13/Adding-IPv6.html. Strange, it worked before the migration; after the migration I had to change my network configuration according to their document to make it work again.

Interesting KB article, noting that the debian instructions make no mention of an IPv6 gateway, yet my [debian] VPS has one configured, same as the one in the CentOS instructions.

Can't ping the gateway address at all now or make any outbound connections from the Storage VPS, but it has been working briefly since the migration, just not now. Network status in the portal says 'There are no Network Issues Currently' looks like a ticket's needed...

There is no network if I manually write the gateway in the network configuration file, although that used to work. After removing it and rebooting, the network comes up.

cochon · July 2021

@zongyouxiao said:
There is no network if I manually write the gateway in the network configuration file, although that used to work. After removing it and rebooting, the network comes up.

Ah yes, I get one step further at least without a gateway at all, though I had one before... but still no connections. Destination unreachable packets at least come back from some 'other' router address in their network now.

Here's hoping the first team are back on duty after the weekend...

default · July 2021

I don't like the new location. Too many network issues.

servarica_hani · July 2021

Hi Everyone,

Sorry for the bad communication and late updates

As you all know, the move was done some time ago.
The only big issue we have now is related to IPv6-only VMs (mouse storage). For some reason IPv6 is not working, and we are working non-stop on it (the VMs are fine but have no network).

The few downtimes you had today and yesterday after the move (approx. 30 min each) were due to switching to our main router from the temporary one. It didn't work well initially, and we had to fix some config and try again.

Currently we are on our main router with 1 uplink only (the second uplink should be here in a few days).
After that the network will be back to exactly as before (same uplinks).

The IPv6 issue is still not resolved, but now that we have had some sleep, hopefully we can find a solution and fix it.

We were planning to send an email today after the IPv6 issue was resolved to explain everything that went wrong in the move, but since IPv6 is still down we are focusing on it.

Thanks

Maounique · July 2021

Good work, we also had a move like this in another building same compound and it was a nightmare.
Problems were solved faster, but there were also many we didn't anticipate. I presume you have many more customers (we only had some 10 racks to move).

cochon · July 2021

@servarica_hani said:
Sorry for the bad communication and late updates

The IPv6 issue is still not resolved, but now that we have had some sleep, hopefully we can find a solution and fix it.

OK, thanks for the update, sounds like it was a hectic holiday weekend.

It doesn't help that the Network Status page has said 'No Issues' all weekend, so many have been hacking their network configurations assuming it was a local issue.

default · July 2021

Thank you for the info @servarica_hani

rahulks · July 2021

@Falzo said: disk speeds are good since long time ago (after they kicked out the torrent sh*t)

So now

It's a blanket no-no for torrents or just for the noisy neighbours?

@servarica_hani

Falzo · July 2021

@rahulks said:

@Falzo said: disk speeds are good since long time ago (after they kicked out the torrent sh*t)

So now

It's a blanket no-no for torrents or just for the noisy neighbours?

@servarica_hani

torrents==noisy neighbours. please simply take it elsewhere. thanks.

hotsnow · July 2021

@servarica_hani said:
Hi Everyone,

Sorry for the bad communication and late updates

As you all know, the move was done some time ago.
The only big issue we have now is related to IPv6-only VMs (mouse storage). For some reason IPv6 is not working, and we are working non-stop on it (the VMs are fine but have no network).

The few downtimes you had today and yesterday after the move (approx. 30 min each) were due to switching to our main router from the temporary one. It didn't work well initially, and we had to fix some config and try again.

Currently we are on our main router with 1 uplink only (the second uplink should be here in a few days).
After that the network will be back to exactly as before (same uplinks).

The IPv6 issue is still not resolved, but now that we have had some sleep, hopefully we can find a solution and fix it.

We were planning to send an email today after the IPv6 issue was resolved to explain everything that went wrong in the move, but since IPv6 is still down we are focusing on it.

Thanks

any updates on IPv6? still can't connect...

vyas11 · July 2021

thanks for the update @servarica_hani and you deserve a beer after all that work

servarica_hani · July 2021

@hotsnow said:
any updates on IPv6? still can't connect...

It has been up for some time now.

You can't connect to Google, though, as the current uplink does not peer with them; this will be fixed when we get back the other uplink.

If you still can't access your IPv6, open a ticket, as it should be working for all by now.

contactwajeeh · July 2021

I am seeing this, anyone else facing the same? @servarica_hani

root@debian:~# ping yahoo.com
ping: yahoo.com: Temporary failure in name resolution
root@debian:~# ping 262:ffd5:1:100::1
PING 262:ffd5:1:100::1(262:ffd5:1:100::1) 56 data bytes
From 2001:550:0:1000::9a18:198d: icmp_seq=3 Destination unreachable: No route
From 2001:550:0:1000::9a18:198d: icmp_seq=4 Destination unreachable: No route
From 2001:550:0:1000::9a18:198d: icmp_seq=5 Destination unreachable: No route
From 2001:550:0:1000::9a18:198d: icmp_seq=6 Destination unreachable: No route
From 2001:550:0:1000::9a18:198d: icmp_seq=7 Destination unreachable: No route
From 2001:550:0:1000::9a18:198d: icmp_seq=8 Destination unreachable: No route
^C
--- 262:ffd5:1:100::1 ping statistics ---
8 packets transmitted, 0 received, +6 errors, 100% packet loss, time 48ms

servarica_hani · July 2021

@contactwajeeh said:
I am seeing this, anyone else facing the same? @servarica_hani

Remove the IPv6 address. By mistake, while debugging the IPv6 issue, we enabled stateless IPv6, which assigned an IPv6 address to all VMs, and in Linux, if you have IPv6, it is preferred for routing.

ip -6 addr del <address>/<prefix-length> dev <interface>

or open a ticket and the guys will do it for you

Aoi · July 2021

@contactwajeeh said:
I am seeing this, anyone else facing the same? @servarica_hani

root@debian:~# ping yahoo.com
ping: yahoo.com: Temporary failure in name resolution
root@debian:~# ping 262:ffd5:1:100::1
PING 262:ffd5:1:100::1(262:ffd5:1:100::1) 56 data bytes
From 2001:550:0:1000::9a18:198d: icmp_seq=3 Destination unreachable: No route
From 2001:550:0:1000::9a18:198d: icmp_seq=4 Destination unreachable: No route
From 2001:550:0:1000::9a18:198d: icmp_seq=5 Destination unreachable: No route
From 2001:550:0:1000::9a18:198d: icmp_seq=6 Destination unreachable: No route
From 2001:550:0:1000::9a18:198d: icmp_seq=7 Destination unreachable: No route
From 2001:550:0:1000::9a18:198d: icmp_seq=8 Destination unreachable: No route
^C
--- 262:ffd5:1:100::1 ping statistics ---
8 packets transmitted, 0 received, +6 errors, 100% packet loss, time 48ms

Maybe you want to ping '2602:ffd5:1:100::1' instead of '262:ffd5:1:100::1' ?

ping -6 -c 4 yahoo.com
PING yahoo.com(media-router-fp73.prod.media.vip.bf1.yahoo.com (2001:4998:124:1507::f000)) 56 data bytes
64 bytes from media-router-fp73.prod.media.vip.bf1.yahoo.com (2001:4998:124:1507::f000): icmp_seq=1 ttl=49 time=18.2 ms
64 bytes from media-router-fp73.prod.media.vip.bf1.yahoo.com (2001:4998:124:1507::f000): icmp_seq=2 ttl=49 time=18.1 ms
64 bytes from media-router-fp73.prod.media.vip.bf1.yahoo.com (2001:4998:124:1507::f000): icmp_seq=3 ttl=49 time=18.0 ms
64 bytes from media-router-fp73.prod.media.vip.bf1.yahoo.com (2001:4998:124:1507::f000): icmp_seq=4 ttl=49 time=18.0 ms
--- yahoo.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 6ms
rtt min/avg/max/mdev = 18.001/18.091/18.231/0.131 ms

ping -6 -c 4 2602:ffd5:1:100::1
PING 2602:ffd5:1:100::1(2602:ffd5:1:100::1) 56 data bytes
64 bytes from 2602:ffd5:1:100::1: icmp_seq=1 ttl=64 time=0.649 ms
64 bytes from 2602:ffd5:1:100::1: icmp_seq=2 ttl=64 time=0.831 ms
64 bytes from 2602:ffd5:1:100::1: icmp_seq=3 ttl=64 time=0.861 ms
64 bytes from 2602:ffd5:1:100::1: icmp_seq=4 ttl=64 time=0.571 ms
--- 2602:ffd5:1:100::1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 58ms

Shot2 · July 2021

Everything's back up and running, it seems. Just got some email full of details about the move, downtime, and sla refund (...omg i lost billions).

cochon · July 2021

@Aoi said:

@contactwajeeh said:
I am seeing this, anyone else facing the same? @servarica_hani

root@debian:~# ping yahoo.com
ping: yahoo.com: Temporary failure in name resolution
root@debian:~# ping 262:ffd5:1:100::1
PING 262:ffd5:1:100::1(262:ffd5:1:100::1) 56 data bytes
From 2001:550:0:1000::9a18:198d: icmp_seq=3 Destination unreachable: No route
From 2001:550:0:1000::9a18:198d: icmp_seq=4 Destination unreachable: No route
From 2001:550:0:1000::9a18:198d: icmp_seq=5 Destination unreachable: No route
From 2001:550:0:1000::9a18:198d: icmp_seq=6 Destination unreachable: No route
From 2001:550:0:1000::9a18:198d: icmp_seq=7 Destination unreachable: No route
From 2001:550:0:1000::9a18:198d: icmp_seq=8 Destination unreachable: No route
^C
--- 262:ffd5:1:100::1 ping statistics ---
8 packets transmitted, 0 received, +6 errors, 100% packet loss, time 48ms

Maybe you want to ping '2602:ffd5:1:100::1' instead of '262:ffd5:1:100::1' ?

ping -6 -c 4 yahoo.com
PING yahoo.com(media-router-fp73.prod.media.vip.bf1.yahoo.com (2001:4998:124:1507::f000)) 56 data bytes
64 bytes from media-router-fp73.prod.media.vip.bf1.yahoo.com (2001:4998:124:1507::f000): icmp_seq=1 ttl=49 time=18.2 ms
64 bytes from media-router-fp73.prod.media.vip.bf1.yahoo.com (2001:4998:124:1507::f000): icmp_seq=2 ttl=49 time=18.1 ms
64 bytes from media-router-fp73.prod.media.vip.bf1.yahoo.com (2001:4998:124:1507::f000): icmp_seq=3 ttl=49 time=18.0 ms
64 bytes from media-router-fp73.prod.media.vip.bf1.yahoo.com (2001:4998:124:1507::f000): icmp_seq=4 ttl=49 time=18.0 ms
--- yahoo.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 6ms
rtt min/avg/max/mdev = 18.001/18.091/18.231/0.131 ms

ping -6 -c 4 2602:ffd5:1:100::1
PING 2602:ffd5:1:100::1(2602:ffd5:1:100::1) 56 data bytes
64 bytes from 2602:ffd5:1:100::1: icmp_seq=1 ttl=64 time=0.649 ms
64 bytes from 2602:ffd5:1:100::1: icmp_seq=2 ttl=64 time=0.831 ms
64 bytes from 2602:ffd5:1:100::1: icmp_seq=3 ttl=64 time=0.861 ms
64 bytes from 2602:ffd5:1:100::1: icmp_seq=4 ttl=64 time=0.571 ms
--- 2602:ffd5:1:100::1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 58ms

Actually I had (still have) that 262:ffd5:1:100::1 address in the original gateway clause of my interfaces file. Thought I'd fluffed an edit and removed the '0' myself over the weekend, but no, it's still there but doesn't actually appear in the live route table which shows the correct 2602: version. I guess RA is taking priority.
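
A one-character slip like this is easy to miss by eye, since both spellings are syntactically valid IPv6 addresses. A minimal sketch of a check using Python's standard `ipaddress` module follows. The `/64` prefix length and the function name are assumptions made for illustration; they are not taken from the provider's documentation.

```python
import ipaddress


def gateway_in_prefix(gateway: str, prefix: str) -> bool:
    """Return True if the gateway address lies inside the given network prefix."""
    return ipaddress.ip_address(gateway) in ipaddress.ip_network(prefix)


# Assumed prefix: the /64 containing the gateway that answered pings in this thread.
ASSUMED_PREFIX = "2602:ffd5:1:100::/64"

for candidate in ("2602:ffd5:1:100::1", "262:ffd5:1:100::1"):
    verdict = "inside" if gateway_in_prefix(candidate, ASSUMED_PREFIX) else "outside"
    print(f"{candidate} is {verdict} {ASSUMED_PREFIX}")
```

Both addresses parse without error, which is why the typo survives in a config file; only the membership test tells them apart.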

Some connectivity back up for me, but still no connections to/from various Hurricane Electric POPs, which I guess/hope is related to the peering issue.

Edit: Just got the e-mail @servarica_hani thanks for the detailed and honest explanation.

brueggus · July 2021

@servarica_hani said: You can't connect to Google, though, as the current uplink does not peer with them; this will be fixed when we get back the other uplink.

They also don't peer with HE...

Kudos for honoring your SLAs and offering refunds, but... no, thank you. My service has been really solid and performing great so far and I'm fine with that short downtime. (Hint to other providers: I probably wouldn't be if communication hadn't been as comprehensive and continual as @servarica_hani's was.)
I just hope everything goes well with your second upstream so I can connect again from my Virmach boxes running behind HE's tunnelbroker.

Sounds like you had a really tough weekend, so I hope you have some time to relax now!

default · July 2021

I received an email about refunds. Basically people can ask for a refund or partial refund.

Hi [real_name]

As you know, from the 30th of June and for 2 days after that we were moving servers to a new location.

The move took much longer than expected for storage VPS plans. In this email we would like to explain what went wrong, explain the compensation for the downtime, and share some good news about new product lines.

How our storage works

As some of you know, we run our storage servers as SAN storage. That means we have separate servers that hold the disks and other servers that run the VPSes (VMs).

In this email we will call the servers that hold the disks storage servers, and the servers where the VMs run compute servers.

The benefit of this system is that it allows us to run compute servers in pools with N+1 redundancy.

So for a compute pool with 4 servers, we run VPSes (VMs) that fill the RAM of only 3 servers, keeping 1 free (we run the VMs on all 4 servers, but we have enough free RAM in the pool to free up 1 server).

We can move a VPS (VM) live between compute servers of the pool, and we use this feature to apply security updates and other maintenance to the servers (we free a server by live-moving all its VPSes to the other compute servers in the pool, then we can reboot that server).

It is a lot of work to do maintenance this way, but it allows us to keep client VMs up and running even when we need to reboot the host they run on.
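
The N+1 rule described above can be sketched as a simple capacity check. This is only an illustration under assumptions not stated in the email: the function name and RAM figures are hypothetical, and it considers total RAM only (real placement also has to fit each VM onto an individual host).

```python
from typing import Sequence


def pool_survives_one_failure(server_ram_gb: Sequence[float],
                              vm_ram_gb: Sequence[float]) -> bool:
    """True if all VM RAM still fits in the pool with its largest server drained."""
    if len(server_ram_gb) < 2:
        # A single-server pool has nowhere to move VMs to.
        return False
    # Worst case: the server being drained is the one with the most RAM.
    usable_ram = sum(server_ram_gb) - max(server_ram_gb)
    return sum(vm_ram_gb) <= usable_ram


# Hypothetical pool: 4 compute servers with 256 GB each.
pool = [256, 256, 256, 256]
print(pool_survives_one_failure(pool, [64] * 12))  # 768 GB of VMs fills exactly 3 servers
print(pool_survives_one_failure(pool, [64] * 13))  # 832 GB no longer leaves a server free
```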

We have several compute pools and several storage servers, and any VM in any pool can have its storage on any storage server.

This many-to-many relationship proved to be a huge issue while moving, as we were forced to move all compute servers and all storage servers in one shot.

Factors that affected the move

Now add to that other factors that made this move much slower.

Disks

Our top priority was not actually speed; our top priority was delivering the disks safely and keeping them secure through the journey, since user data was on them.
We could not move the servers with the disks in them, due to the very heavy weight and because it is too risky for the disks. So all disks had to be removed from the servers and packed, and at the new datacenter the disks had to be reinstalled in the servers.

In order to reduce move time, we decided to build the network in the new datacenter from scratch.
We did all the cabling in the new DC before the move.
Although having the network ready in the new DC was meant to save a lot of time, it didn’t end up saving much, due to mistakes in connecting the servers to the wrong network ports (even with correct labeling).

Manpower:

The move was estimated on the basis that several movers would be moving the servers.
The move of the first SSD and NVMe servers went fast, but we faced a few networking issues in the new location, and we decided to fix them before moving the remaining servers.
Fixing the network and testing that everything in the first 2 batches was good took almost all of the 30th of June.
We hadn’t arranged anyone for the 1st of July, and none of the movers was available for the 1st of July.
That means the biggest part of the move (physically, not necessarily in number of clients) had to be done by a single person on the 1st of July.

Date:

The 30th of June and 1st of July are the moving days for every home in Montreal (here all home rental contracts finish on the 30th of June, and in the week before and after it is very hard to find trucks and movers).
That is the reason why no one was available on the 1st of July.
The contract with the old datacenter ends for most of the racks on June 1st, and to avoid gray situations I decided to proceed with the move alone.

Due to those factors, the work of moving the storage servers was done by a single person. With all those disks and servers to move, it took a very long time.

By the time the server installation was done, a few mistakes had been made while connecting the network, which caused a few extra hours of delay as well.

SLA REFUND:

Storage server availability was between 95% and 97% due to the move, and as a result, based on our SLA, every storage user is entitled to a 50% refund.

For SSD and NVMe servers, since their uptime was between 99% and 99.9%, the refund will be 15%.

For IPv6-only storage servers (mouse storage with IPv6 only) the refund is 100%.

So please answer this email or open a ticket for the refund to be applied.

Few notes:

If the total refund value is under $6, it will simply be added as credit to your account and applied to the next invoice.
If the last payment cannot be refunded due to the payment method (cryptocurrencies) or because a long time has passed since the payment was made (6+ months for PayPal), the refund will be issued as credit.
For non-monthly payments, the refund is calculated as the total payment for the term / number of months in the term * refund percentage.

So for a yearly plan that costs $120 with a 50% refund, the total refund will be 120/12 * 0.50 = $5.
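
The refund arithmetic above can be written out as a short sketch. The function names are hypothetical, and only the "under $6 becomes credit" rule is modelled; the payment-method and payment-age conditions from the notes are left out.

```python
def term_refund(total_paid: float, months_in_term: int, refund_fraction: float) -> float:
    """Refund for a non-monthly plan: one month's share of the term price times the refund fraction."""
    if months_in_term <= 0:
        raise ValueError("months_in_term must be positive")
    monthly_share = total_paid / months_in_term
    return monthly_share * refund_fraction


def refund_method(amount: float, credit_threshold: float = 6.0) -> str:
    """Refunds under the threshold are applied as account credit, per the notes above."""
    return "account credit" if amount < credit_threshold else "refund to payment method"


# The yearly-plan example from the email: $120 per year with a 50% refund.
amount = term_refund(120, 12, 0.50)
print(amount, refund_method(amount))
```

With the email's own numbers this gives a $5 refund, which falls under the $6 threshold and would therefore be issued as credit.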

New Products Soon:

Since we have moved to our own space, we will be able to offer services that we didn’t offer in the past; mainly, we will start offering dedicated servers and even colocation services.

For dedicated servers, we may even offer storage dedicated servers.

Thanks

Hani

Team ServaRICA

cochon · July 2021

@default said:
I received an email about refunds. Basically people can ask for a refund or partial refund.

Or not ask at all if you weren't hugely impacted. Think it works out $1 for the Mouse plans

default · July 2021

@servarica_hani - If I may ask, why did you not hire more people, so the impact would be minimal?

default · July 2021

@default said:
@servarica_hani - If I may ask, why did you not hire more people, so the impact would be minimal?

EDIT: Also, why did you want to move in the first place? There is a saying: "If it works, don't fix it."

Shot2 · July 2021

@default said:

@default said:
@servarica_hani - If I may ask, why did you not hire more people, so the impact would be minimal?

EDIT: Also, why did you want to move in the first place? There is a saying: "If it works, don't fix it."

While we're at questions: what's the name/location of the new DC (is it a new DC)? so I can update my dns LOC records
