Did push-ups break a Ryzen? (Nexril Dallas)

yoursunny Member, IPv6 Advocate

Dallas was one of the initial locations on my push-ups delivery network.
Today it went offline:

Hello push-up specialist,

We are currently investigating a potential hardware issue with one of our VPS nodes (RYZEN01DAL) that has caused this node to crash and reboot several times this morning.

Customers with servers on this node may notice some downtime in the process. We will try to keep this as short as possible and have everyone back online soon.

Best Regards,
Nexril

According to UptimeRobot, the server has been restarted more than 3 times in the past 4 hours.
The incident is still ongoing.

Given how many push-ups are served from this server, I guess the sheer amount of push-ups has broken the Ryzen.
If you are watching push-up videos from me, my brother, and other contributors, you may experience higher latency and lower resolution because of this.
We apologize for the inconvenience.

Thanked by ariq01, FlorinMarian

Comments

  • Boogeyman

    Have your push-ups video backed up. The end is nigh.

    Thanked by yoursunny
  • Neoon Community Contributor, Veteran

    Too much bending on the motherboard, because one guy needed to show off by doing the push-ups with one hand instead of two.

    Thanked by nyamenk, dosai, JasonM
  • yoursunny Member, IPv6 Advocate

    The push-ups did indeed break the Ryzen.

    Hello push-up specialist,

    Due to a catastrophic CPU hardware failure in RYZEN01DAL, we were forced to migrate all Ryzen VPS customers on this node to a different, temporary node until we are able to make the appropriate repairs.

    At this time, your service(s) on this node should be back online and you should not expect any unplanned further disruptions. If you are still unable to access your service, please open a ticket immediately so we can investigate.

    We will send another follow-up email with more details about the incident, our next steps moving forwards, as well as compensation for the downtime.

    Best Regards,
    Nexril

    The temporary server has a different CPU model:

    • RYZEN01DAL: AMD Ryzen 9 3950X 16-Core Processor
    • RYZEN01DAL (TEMP): AMD Ryzen 9 5950X 16-Core Processor
  • MannDude Host Rep, Veteran

    We weren't the only one to do a mobo replacement to bring in the New Year.
    https://portal.incognet.io/serverstatus.php?view=resolved

    Also Ryzen.

    Node was rebooting as quickly as every other minute.

    Thanked by yoursunny
  • yoursunny Member, IPv6 Advocate

    @Boogeyman said:
    Have your push-ups video backed up.

    The push-ups repository has two replicas, currently in Nexril Dallas and Evolution Host Roubaix.
    If I lose a repository node, there would be no data loss.
    If I lose both, videos are unavailable.

    The push-ups delivery network is completely distributed.
    If I lose one node, there would be no service interruption.
    If I lose two nodes causing a network partition, videos are unavailable in certain regions.

    The end is nigh.

    I hope not.
    I paid 24 push-ups as the setup fee for this server.


    @MannDude said:
    We weren't the only one to do a mobo replacement to bring in the New Year.
    https://portal.incognet.io/serverstatus.php?view=resolved

    Also Ryzen.

    Node was rebooting as quickly as every other minute.

    Not caused by my push-ups.
    I have store credit but haven't ordered anything yet.

    Blame @Boogeyman and the Nigh Sect.

    Thanked by MannDude
  • Time to go from push-ups to pull-ups.

    Thanked by yoursunny
  • yoursunny Member, IPv6 Advocate

    @Ganonk said:

    That's a crunch, not a push-up.

    A crunch is a core exercise.
    A push-up is a chest and triceps exercise.

    Planet Fitness #UnitedWeMove campaign in 2020 was full of crunches.


    @Neoon said:
    Too much bending on the motherboard, because one guy needed to show off by doing the push-ups with one hand instead of two.

    The sheer amount of push-ups burnt the Ryzen, not the motherboard.

    I don't have the strength to do push-ups with one hand.
    If you can, please contribute a video.

  • bdl Member
    edited January 2022

    Pushups + Constipation = Reverse Log Ryzen Gang

  • TimboJones

    @yoursunny said:
    Dallas was one of the initial locations on my push-ups delivery network.
    Today it went offline:

    Hello push-up specialist,

    We are currently investigating a potential hardware issue with one of our VPS nodes (RYZEN01DAL) that has caused this node to crash and reboot several times this morning.

    Customers with servers on this node may notice some downtime in the process. We will try to keep this as short as possible and have everyone back online soon.

    Best Regards,
    Nexril

    According to UptimeRobot, the server has been restarted more than 3 times in the past 4 hours.
    The incident is still ongoing.

    Given how many push-ups are served from this server, I guess the sheer amount of push-ups has broken the Ryzen.
    If you are watching push-up videos from me, my brother, and other contributors, you may experience higher latency and lower resolution because of this.
    We apologize for the inconvenience.

    Why the fuck does latency matter so much for one way video streaming? It's just eaten up in the upfront buffering, which nobody will notice 50ms vs 150ms. It certainly doesn't make sense that the resolution would be lower unless the bandwidth was lower (which is a function of latency, but not a hard limit). Are you doing it wrong?

  • yoursunny Member, IPv6 Advocate

    @TimboJones said:
    Why does latency matter so much for one way video streaming? It's just eaten up in the upfront buffering, which nobody will notice 50ms vs 150ms. It certainly doesn't make sense that the resolution would be lower unless the bandwidth was lower (which is a function of latency, but not a hard limit). Are you doing it wrong?

    The Named Data Networking protocol requires the consumer (the data receiver, which is the browser in my app) to send one request for each segment (8 KB).
    This property is called flow balance, and it allows the consumer to perform congestion control.
    (TCP has a similar mechanism using ACK packets, but the congestion control algorithm runs on the sender side.)

    The congestion control algorithm is still being designed and improved.
    I'm currently using CUBIC, but it doesn't work well with in-network caching.
    In particular, if some segments are cached and some are not, the bandwidth estimate from the CUBIC algorithm fluctuates, and then Shaka Player goes crazy.

    Researchers at UCLA are investigating a BBR-inspired congestion control algorithm that is supposedly compatible with in-network caching.
    I'm still reading their paper to understand how it works.
    I expect some improvements after that.
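
    As a rough illustration of what consumer-driven fetching with flow balance looks like (hypothetical names, not the actual NDNts API; the window logic is a simplified AIMD stand-in, not real CUBIC):

    ```typescript
    // expressInterest stands in for whatever the NDN client library provides:
    // send one Interest (request) for a named segment, get back its Data.
    declare function expressInterest(name: string): Promise<Uint8Array>;

    // Fetch a video by requesting one 8 KB segment at a time, with the
    // congestion window owned entirely by the consumer (flow balance).
    function fetchVideo(prefix: string, totalSegments: number): Promise<Uint8Array[]> {
      const segments: Uint8Array[] = new Array(totalSegments);
      const pending: number[] = Array.from({ length: totalSegments }, (_, i) => i);
      let cwnd = 4;        // congestion window, in outstanding Interests
      let inFlight = 0;
      let received = 0;

      return new Promise((resolve) => {
        const pump = (): void => {
          while (inFlight < Math.floor(cwnd) && pending.length > 0) {
            const seg = pending.shift()!;
            inFlight++;
            expressInterest(`${prefix}/seg=${seg}`)
              .then((data) => {
                segments[seg] = data;
                received++;
                cwnd += 1 / cwnd;              // gentle additive increase
              })
              .catch(() => {
                pending.push(seg);             // retry this segment later
                cwnd = Math.max(2, cwnd / 2);  // multiplicative decrease on timeout
              })
              .finally(() => {
                inFlight--;
                if (received === totalSegments) resolve(segments);
                else pump();
              });
          }
        };
        pump();
      });
    }
    ```

    Because the window lives on the consumer, a mix of fast cache hits and slow producer fetches feeds very noisy signals into a CUBIC-style estimator, which is the fluctuation described above.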

    Thanked by Daniel15, Not_Oles
  • TimboJones

    @yoursunny said:

    @TimboJones said:
    Why does latency matter so much for one way video streaming? It's just eaten up in the upfront buffering, which nobody will notice 50ms vs 150ms. It certainly doesn't make sense that the resolution would be lower unless the bandwidth was lower (which is a function of latency, but not a hard limit). Are you doing it wrong?

    The Named Data Networking protocol requires the consumer (the data receiver, which is the browser in my app) to send one request for each segment (8 KB).
    This property is called flow balance, and it allows the consumer to perform congestion control.
    (TCP has a similar mechanism using ACK packets, but the congestion control algorithm runs on the sender side.)

    The congestion control algorithm is still being designed and improved.
    I'm currently using CUBIC, but it doesn't work well with in-network caching.
    In particular, if some segments are cached and some are not, the bandwidth estimate from the CUBIC algorithm fluctuates, and then Shaka Player goes crazy.

    Researchers at UCLA are investigating a BBR-inspired congestion control algorithm that is supposedly compatible with in-network caching.
    I'm still reading their paper to understand how it works.
    I expect some improvements after that.

    Sounds like your buffers are too small.

  • yoursunny Member, IPv6 Advocate

    @TimboJones said:

    @yoursunny said:

    @TimboJones said:
    Why does latency matter so much for one way video streaming? It's just eaten up in the upfront buffering, which nobody will notice 50ms vs 150ms. It certainly doesn't make sense that the resolution would be lower unless the bandwidth was lower (which is a function of latency, but not a hard limit). Are you doing it wrong?

    The Named Data Networking protocol requires the consumer (the data receiver, which is the browser in my app) to send one request for each segment (8 KB).
    This property is called flow balance, and it allows the consumer to perform congestion control.
    (TCP has a similar mechanism using ACK packets, but the congestion control algorithm runs on the sender side.)

    The congestion control algorithm is still being designed and improved.
    I'm currently using CUBIC, but it doesn't work well with in-network caching.
    In particular, if some segments are cached and some are not, the bandwidth estimate from the CUBIC algorithm fluctuates, and then Shaka Player goes crazy.

    Researchers at UCLA are investigating a BBR-inspired congestion control algorithm that is supposedly compatible with in-network caching.
    I'm still reading their paper to understand how it works.
    I expect some improvements after that.

    Sounds like your buffers are too small.

    Which buffer are you referring to?

    • video buffer in the browser?
    • in-network cache on the software routers?

    How did you infer the buffers are too small?

  • Levi Member
    edited January 2022

    @yoursunny said:

    @TimboJones said:

    @yoursunny said:

    @TimboJones said:
    Why does latency matter so much for one way video streaming? It's just eaten up in the upfront buffering, which nobody will notice 50ms vs 150ms. It certainly doesn't make sense that the resolution would be lower unless the bandwidth was lower (which is a function of latency, but not a hard limit). Are you doing it wrong?

    The Named Data Networking protocol requires the consumer (the data receiver, which is the browser in my app) to send one request for each segment (8 KB).
    This property is called flow balance, and it allows the consumer to perform congestion control.
    (TCP has a similar mechanism using ACK packets, but the congestion control algorithm runs on the sender side.)

    The congestion control algorithm is still being designed and improved.
    I'm currently using CUBIC, but it doesn't work well with in-network caching.
    In particular, if some segments are cached and some are not, the bandwidth estimate from the CUBIC algorithm fluctuates, and then Shaka Player goes crazy.

    Researchers at UCLA are investigating a BBR-inspired congestion control algorithm that is supposedly compatible with in-network caching.
    I'm still reading their paper to understand how it works.
    I expect some improvements after that.

    Sounds like your buffers are too small.

    Which buffer are you referring to?

    • video buffer in the browser?
    • in-network cache on the software routers?

    How did you infer the buffers are too small?

    Here is an accurate description of the buffer:

    https://www.urbandictionary.com/author.php?author=Rob Ta

  • yoursunny Member, IPv6 Advocate

    Hello push-up specialist,

    This is a follow-up email regarding the incident with Ryzen VPS node RYZEN01DAL on 01/01/2022.

    We have determined the cause of the failure to be a faulty processor. The processor threw several MCEs (Machine Check Exceptions) that caused the node to reboot several times during the morning. Later in the afternoon, reboots became more frequent until eventually the node only boot looped. While we do keep many components on-site like power supplies, RAM, and hard drives, we do not stock spare CPUs. We will be shipping this node out to be fully repaired and re-tested before being put back into service. This may take several weeks to complete.

    All customers were moved onto a temporary node the evening of 01/01/2022 to avoid further service disruption. We've been monitoring the temporary node and everything has been running smoothly so far with no issues. When RYZEN01DAL has been repaired and extensively tested, we will send another notification out to customers at least one week in advance before migrating customers back.

    We apologize for any inconveniences this disruption has caused to your service(s) and will be extending all affected services by two months. You should see this change active on your account now.

    If you have any questions regarding this message, feel free to reach out to us via ticket.

    Best Regards,
    Nexril

    We get a 2-month service time extension for a few hours of downtime and no data loss?
    Not acceptable!

    • If there's no data loss, the compensation should be 2 days.
    • If there's data loss (partial or total doesn't matter), the compensation should be 2 weeks.
    Thanked by Daniel15, donko
  • TimboJones

    @yoursunny said:

    @TimboJones said:

    @yoursunny said:

    @TimboJones said:
    Why does latency matter so much for one way video streaming? It's just eaten up in the upfront buffering, which nobody will notice 50ms vs 150ms. It certainly doesn't make sense that the resolution would be lower unless the bandwidth was lower (which is a function of latency, but not a hard limit). Are you doing it wrong?

    The Named Data Networking protocol requires the consumer (the data receiver, which is the browser in my app) to send one request for each segment (8 KB).
    This property is called flow balance, and it allows the consumer to perform congestion control.
    (TCP has a similar mechanism using ACK packets, but the congestion control algorithm runs on the sender side.)

    The congestion control algorithm is still being designed and improved.
    I'm currently using CUBIC, but it doesn't work well with in-network caching.
    In particular, if some segments are cached and some are not, the bandwidth estimate from the CUBIC algorithm fluctuates, and then Shaka Player goes crazy.

    Researchers at UCLA are investigating a BBR-inspired congestion control algorithm that is supposedly compatible with in-network caching.
    I'm still reading their paper to understand how it works.
    I expect some improvements after that.

    Sounds like your buffers are too small.

    Which buffer are you referring to?

    • video buffer in the browser?
    • in-network cache on the software routers?

    How did you infer the buffers are too small?

    Video buffer on client side if it's the one asking for video. Give it more runway.
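
    For reference, "more runway" in Shaka Player terms would mean raising the buffering goals, roughly like this (the config keys are Shaka's real streaming options, but the values and the init function are only illustrative):

    ```typescript
    import shaka from 'shaka-player';

    // Keep more content buffered ahead of the playhead so short throughput
    // dips don't force a quality switch or a stall. Larger buffers cost
    // memory and make quality changes slower to take effect.
    async function initPlayer(video: HTMLVideoElement, manifestUri: string) {
      const player = new shaka.Player(video);
      player.configure({
        streaming: {
          bufferingGoal: 60,    // seconds buffered ahead (default is ~10)
          rebufferingGoal: 4,   // seconds needed before playback (re)starts
          bufferBehind: 30,     // seconds kept behind the playhead
        },
      });
      await player.load(manifestUri);
      return player;
    }
    ```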

    Thanked by yoursunny