
750k RPS on a single dedicated server

angelius Member
edited January 2020 in General

https://vms2.terasp.net/

What do you think about this?

Comments

  • Neoon Community Contributor, Veteran

    Hetzner, so it must be good.
    But I also read Cloud and Node.js, so I am confused.

  • angelius Member
    edited January 2020

    @Neoon said:
    Hetzner, so it must be good.
    But I also read Cloud and Node.js, so I am confused.

    Where do you see the word cloud there?

  • Well, they test with the wrk load-testing tool, and that is HTTP/1.1 based, so they're testing HTTP/1.1 HTTPS loads, which in this day and age isn't always real-world given that HTTP/2 HTTPS is becoming the norm. They should also test using the h2load HTTP/2 load-testing tool.

    I could push much higher requests/sec using wrk with HTTP/1.1 HTTPS than with h2load with HTTP/2 HTTPS.

  • I'm super interested in this kind of high-perf project. What is your best score (RPS)? And with which stack? Go? C++? Java?

  • Let's see how far this goes..

    wrk -t8 -c1024 -d24h https://vms2.terasp.net/debug

    I highly doubt it's one AMD Ryzen 3600 generating 750k RPS.

    I'm running the above command x5 on my servers and have it up to 1M RPS now.

    I'm highly interested in this project for sure; I will run it as a cache or reverse proxy or even as a webserver if it turns out to be good.

    stefeman Member
    edited January 2020

    We'll see how far it works.

    I'm genuinely impressed now.

  • Cool, indeed it's now showing 1M RPS! It will be an amazing web/cache server for sure :)

  • eva2000 Veteran
    edited January 2020

    A Google search for the server name leads to a PDF press release for AppDrag:

    AppDrag unveils another transformational breakthrough in full-stack serverless platform technology at Web Summit

    AppDrag, the full-stack serverless platform, has gained a serious reputation among web professionals for how its technology is transforming the efficiency of enterprise web application development. The company was the first to introduce serverless technology catering for both backend and UI requirements in a single, fully integrated development platform.

    Today, at Web Summit in Lisbon where AppDrag has been invited to present at a Showcase event, the company announces a transformational breakthrough in cloud webserver technology. Smashing through conventional limitations, AppDrag has achieved server response times at least 100 times faster than current norms. Dubbed µFTL, the new generation cloud webserver software will be commercially available through AppDrag’s full-stack serverless platform in Q1 2020.

    In a statement today the company’s founder and CTO, Joseph Benguira, said: “This is an incredible breakthrough for the research team at AppDrag. The scale of improvement is more than we imagined possible and will really set AppDrag apart. AppDrag’s full-stack serverless cloud platform was already tuned to produce industry-leading speeds, but this development is truly revolutionary. Responding to the huge interest we’ve seen from enterprise cloud computing leaders, our team is working towards production release of µFTL, through the AppDrag platform, in the first quarter of next year. We are very excited to remain at the forefront of serverless platform technology. We’ve already started, so visit AppDrag.com to see how we can build the future, together.”

    and

    About AppDrag: AppDrag is a Dublin, Ireland based leader in full-stack serverless platform technology. It enables professional developers to build enterprise-grade web applications 10X faster and 55% cheaper than traditionally.

    About µFTL: µFTL is the webserver software developed by AppDrag. Pronounced “micro FTL”, it stands for micro Faster-Than-Light, so called because of the less-than-3-microsecond server response times it achieves, with super-low latency, in a 4 vCPU (virtual CPU) environment. For comparison purposes, µFTL delivers 500,000 requests-per-second (RPS), which compares to speeds achieved by popular languages such as Ruby on Rails (80 RPS), PHP (250 RPS), Node.js (5,000 RPS), .Net (15,000 RPS), Java (18,000 RPS), GO (25,000 RPS)

  • angelius Member
    edited January 2020

    Yup, Neoon already found this and posted the link to the PDF a few posts above; you are one day late, mate ;)

  • whoops missed that LOL

  • AppDrag's github account at https://github.com/jbenguira

  • jsg Member, Resident Benchmarker
    edited January 2020

    @angelius

    Frankly, from what little information (let alone tangible info) is available I think this is a load of BS.

    Appdrag marketing said (in their PDF)
    For comparison purposes, µFTL delivers 500,000 requests-per-second (RPS) which compares to speeds achieved by popular languages such as Ruby on Rails (80 RPS), PHP (250 RPS), Node.js (5,000 RPS), .Net (15,000 RPS), Java (18,000 RPS), GO (25,000 RPS).

    Experience tells us that someone who needs to bend the status quo to make himself look great does not have something great. Also note the fact that µFTL code as well as benchmark code is not yet provided - but marketing blabla is.

    They tell us nothing about the context, nothing tangible about the "cluster", virtually nothing tangible about µFTL, nothing about the benchmark. All we have is their marketing blurb.

    Plus the benchmark numbers they provide are between ridiculous and nonsensical, and not at all credible.

    Also note that in one place they claim µFTL to be 100% faster while in their press release they claim "response times at least 100 times faster".
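    As a rough sanity check of those headline numbers (a back-of-the-envelope sketch, assuming the 4 vCPU figure from their own PDF):

```python
# Back-of-the-envelope check of the press release's own numbers:
# 500,000 RPS claimed in a 4 vCPU (virtual CPU) environment.
rps_total = 500_000
vcpus = 4

rps_per_core = rps_total / vcpus                    # requests each core handles per second
us_budget_per_request = 1_000_000 / rps_per_core    # CPU time available per request (µs)

print(rps_per_core)           # 125000.0 requests/sec per core
print(us_budget_per_request)  # 8.0 µs of CPU budget per request per core
```

    That leaves 8 µs of CPU time per request per core against a claimed sub-3-µs response time; not impossible with concurrency, but it shows how little the raw numbers tell us without context.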

    @eva2000 said:
    I could push much higher requests/sec using wrk with HTTP/1.1 HTTPS than with h2load with HTTP/2 HTTPS.

    I guess that's in part due to http/2 being much more complex and in part due to a quite different request structure. In summary though (from the users perspective) http/2 should deliver somewhat better results (between 5% and 30%, typ. ca. 15%).

  • eva2000 Veteran
    edited January 2020

    jsg said: I guess that's in part due to http/2 being much more complex and in part due to a quite different request structure. In summary though (from the users perspective) http/2 should deliver somewhat better results (between 5% and 30%, typ. ca. 15%).

    HTTP/1.1 vs HTTP/2: for latency/response time in terms of page speed, yes, HTTP/2 is faster, but for throughput as in requests/second, maybe not.

    But yeah, without context on the benchmark environment it's hard to say.

    Did a quick test on my Centmin Mod Nginx builds with CentOS 7.7 64bit and an Intel Core i7 4790K 4C/8T, using a forked version of wrk, wrk-cmm (https://github.com/centminmod/wrk/tree/centminmod). For a wrk-cmm -t4 -c256 test of a hello-world static file, I could push around 160,000 to 165,000 requests/sec for HTTP/1.1 HTTPS workloads.

    wrk-cmm -t4 -c256 --latency --breakout https://domain.com/debug
    Running 10s test @ https://domain.com/debug
      4 threads and 256 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     1.50ms    1.51ms  50.57ms   92.75%
        Connect    37.33ms   21.74ms  94.33ms   55.16%
        TTFB        1.49ms    1.51ms  50.55ms   92.75%
        TTLB        3.36us   11.22us   4.79ms   99.97%
        Req/Sec    41.28k     6.54k   56.69k    75.50%
      Latency Distribution
         50%    1.32ms
         75%    1.90ms
         90%    2.51ms
         99%    7.99ms
      1647421 requests in 10.08s, 521.61MB read
    Requests/sec: 163497.57
    Transfer/sec:     51.77MB
    

    Since µFTL is going to be used on their own AppDrag cloud platform, they have access and the ability to tune their whole environment and web stack/networking for it. They may be doing the same thing Facebook and Cloudflare are doing, using XDP/DPDK-like tech to move network packet processing away from the kernel network stack, which can realistically produce that many requests/second on a good web server. I vaguely recall seeing someone experiment with a custom Nginx build with DPDK or XDP easily pushing 10-50x higher requests/sec than Nginx via normal kernel network processing.
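    For anyone curious what the XDP side of that looks like, here's a minimal illustrative sketch (not buildable as-is; it assumes kernel BPF headers and a clang BPF toolchain) of the kind of driver-level packet hook those setups build on:

```c
/* Minimal XDP sketch: the smallest possible packet hook.
 * Real kernel-bypass-style setups do filtering, load balancing, or
 * DDoS dropping here before packets ever reach the full kernel stack.
 * Requires linux/bpf.h and libbpf; compile with: clang -O2 -target bpf
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_pass_all(struct xdp_md *ctx)
{
    /* XDP_PASS hands the packet on to the normal kernel network stack;
     * returning XDP_DROP here instead would discard it at the driver level. */
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
```

    The win comes from doing the cheap decisions at this layer, so only packets that matter pay the cost of the regular network stack.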

  • eva2000 Veteran
    edited January 2020

    eva2000 said: HTTP/1.1 vs HTTP/2: for latency/response time in terms of page speed, yes, HTTP/2 is faster, but for throughput as in requests/second, maybe not.

    Guess it depends on the testing tool.

    Just did h2load HTTP/2 vs HTTP/1.1 HTTPS benchmarks:

    • HTTP/1.1 HTTPS = 191,757.50 req/s at 200k requests and 202K req/s at 1 million requests
    • HTTP/2 HTTPS = 226,008.36 req/s at 200K requests and 266K req/s at 1 million requests

    HTTP/1.1 HTTPS

    domain=domain.com
    h2load --h1 -t4 -c256 -n200000 -m60 https://$domain/debug
    starting benchmark...
    spawning thread #0: 64 total client(s). 50000 total requests
    spawning thread #1: 64 total client(s). 50000 total requests
    spawning thread #2: 64 total client(s). 50000 total requests
    spawning thread #3: 64 total client(s). 50000 total requests
    TLS Protocol: TLSv1.2
    Cipher: ECDHE-ECDSA-AES256-GCM-SHA384
    Server Temp Key: ECDH P-256 256 bits
    Application protocol: http/1.1
    progress: 10% done
    progress: 20% done
    progress: 30% done
    progress: 40% done
    progress: 50% done
    progress: 60% done
    progress: 70% done
    progress: 80% done
    progress: 90% done
    progress: 100% done
    
    finished in 1.04s, 191757.50 req/s, 60.71MB/s
    requests: 200000 total, 200000 started, 200000 done, 200000 succeeded, 0 failed, 0 errored, 0 timeout
    status codes: 200000 2xx, 0 3xx, 0 4xx, 0 5xx
    traffic: 63.32MB (66400000) total, 46.73MB (49000000) headers (space savings 0.00%), 5.34MB (5600000) data
                         min         max         mean         sd        +/- sd
    time for request:     1.45ms    197.72ms     59.51ms     28.16ms    72.83%
    time for connect:     1.63ms    148.28ms     74.09ms     31.75ms    67.19%
    time to 1st byte:    59.39ms    293.07ms    111.77ms     25.08ms    73.83%
    req/s           :     749.97     1244.79      920.06      128.68    51.95%
    

    HTTP/2 HTTPS

    domain=domain.com
    h2load -t4 -c256 -n200000 -m60 https://$domain/debug
    starting benchmark...
    spawning thread #0: 64 total client(s). 50000 total requests
    spawning thread #1: 64 total client(s). 50000 total requests
    spawning thread #2: 64 total client(s). 50000 total requests
    spawning thread #3: 64 total client(s). 50000 total requests
    TLS Protocol: TLSv1.2
    Cipher: ECDHE-ECDSA-AES256-GCM-SHA384
    Server Temp Key: ECDH P-256 256 bits
    Application protocol: h2
    progress: 10% done
    progress: 20% done
    progress: 30% done
    progress: 40% done
    progress: 50% done
    progress: 60% done
    progress: 70% done
    progress: 80% done
    progress: 90% done
    progress: 100% done
    
    finished in 884.92ms, 226008.36 req/s, 12.12MB/s
    requests: 200000 total, 200000 started, 200000 done, 200000 succeeded, 0 failed, 0 errored, 0 timeout
    status codes: 200000 2xx, 0 3xx, 0 4xx, 0 5xx
    traffic: 10.73MB (11247994) total, 1.94MB (2035450) headers (space savings 95.67%), 5.34MB (5600000) data
                         min         max         mean         sd        +/- sd
    time for request:      571us    148.68ms     51.16ms     18.57ms    73.83%
    time for connect:     1.17ms    142.35ms     83.51ms     35.98ms    73.83%
    time to 1st byte:    56.55ms    202.70ms    132.96ms     37.59ms    56.25%
    req/s           :     883.83     1274.17     1008.70       83.59    64.45%
    

    Note, my Centmin Mod Nginx server was running with the Cloudflare full HTTP/2 HPACK encoding patch, hence h2load reported header space savings in the 95%+ range. Upstream Nginx doesn't implement full HTTP/2 HPACK encoding, so usually you'd only see header space savings between 15-25%.

    So probably the difference between HTTP/2 and HTTP/1.1 HTTPS in the h2load tests came down to HTTP/2 HPACK header encoding savings: less data transferred = more requests/sec.
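    The traffic lines from the two h2load runs above back that up; a quick sketch of the arithmetic:

```python
# Header bytes per request, taken from the h2load "traffic:" lines above.
requests = 200_000
h1_header_bytes = 49_000_000   # HTTP/1.1 run, 0.00% space savings reported
h2_header_bytes = 2_035_450    # HTTP/2 run with full HPACK, 95.67% savings reported

h1_per_req = h1_header_bytes / requests
h2_per_req = h2_header_bytes / requests
savings = 1 - h2_per_req / h1_per_req

print(h1_per_req)             # 245.0 bytes of headers per request over HTTP/1.1
print(round(h2_per_req, 1))   # ~10.2 bytes per request over HTTP/2
print(round(savings, 3))      # ~0.958, in line with h2load's ~95.67% figure
```

    Roughly 235 fewer header bytes per request adds up fast at hundreds of thousands of requests per second.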

  • BunnySpeed Member, Host Rep
    edited January 2020

  • I can easily make nginx+lua respond with ~450k Hello World requests per second with a slightly tuned setup on an E3-1270 v5 that's doing a whole lot of other checks, configuration etc., so 750k on a Ryzen 3600 would probably be easily done as well.

    This seems incredibly stripped down and tweaked to perform really well in benchmarks; even running it with Apache Benchmark, for example, fails completely because it was tuned to get the best performance out of wrk.

    The interesting thing here is that this is Node.js, but besides that, it's nothing spectacular.

    In the end, the CPU will be eaten away by the application, SSL handshakes, disk IO etc. anyway.
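    For reference, a hello-world handler like that is only a few lines of nginx+lua config. A sketch, assuming nginx is built with the lua-nginx-module (e.g. OpenResty), in the same shape as the /lua_hello endpoint tested further down the thread:

```nginx
# Hypothetical location block for a hello-world benchmark endpoint.
# Requires nginx built with lua-nginx-module (e.g. OpenResty).
location /lua_hello {
    default_type 'text/plain';
    content_by_lua_block {
        ngx.say("Hello, World!")
    }
}
```

    Because the response is produced entirely inside the nginx worker, there's no upstream hop or disk IO, which is exactly why such benchmarks say little about real application workloads.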

  • jsg Member, Resident Benchmarker
    edited January 2020

    @eva2000 said:
    Since µFTL is going to be used on their own AppDrag cloud platform, they have access and the ability to tune their whole environment and web stack/networking for it. They may be doing the same thing Facebook and Cloudflare are doing, using XDP/DPDK-like tech to move network packet processing away from the kernel network stack, which can realistically produce that many requests/second on a good web server. I vaguely recall seeing someone experiment with a custom Nginx build with DPDK or XDP easily pushing 10-50x higher requests/sec than Nginx via normal kernel network processing.

    With some Node.js-based tool? I doubt that. And note that they didn't say that their platform is now so fast, but that their new tool is, and that's Node.js-based.

    I value your hands-on approach of running some benchmarks, as well as your thoughts re DPDK/XDP, but again: this whole thing is some marketing blabla with no relevant information whatsoever and some ridiculous data on the “competition”. Frankly, I think your work is but a waste of time. That marketing BS does not deserve your efforts.

    "http/1.1 vs http/2"

    As you noted correctly this still is somewhat of a lottery because http/2 is relatively new, a lot more complex than 1.1 and current implementations are early/not yet really sound.
    FWIW I myself am still quite reluctant re http/2 because, at least as of now, I don't like the tradeoff between sound and battle-proven http/1.1 vs. often flaky and not yet production-quality/real-world-proven code. Also, http/2 does no miracles; if someone's application is too slow then the reason rarely has to do with the http protocol version.

  • jsg said: "http/1.1 vs http/2"

    As you noted correctly this still is somewhat of a lottery because http/2 is relatively new, a lot more complex than 1.1 and current implementations are early/not yet really sound.

    FWIW I myself am still quite reluctant re http/2 because, at least as of now, I don't like the tradeoff between sound and battle-proven http/1.1 vs. often flaky and not yet production-quality/real-world-proven code. Also, http/2 does no miracles; if someone's application is too slow then the reason rarely has to do with the http protocol version.

    Yeah HTTP/2 implementations also differ between web servers so it can vary too. But my personal focus on HTTP/2 HTTPS is because all my sites by default use it.

    jsg said: I value your hands-on approach of running some benchmarks, as well as your thoughts re DPDK/XDP, but again: this whole thing is some marketing blabla with no relevant information whatsoever and some ridiculous data on the “competition”. Frankly, I think your work is but a waste of time. That marketing BS does not deserve your efforts.

    Yeah I was just curious heh. But yeah, for a 27-byte debug file with ~67 bytes of network transfer overhead, it's small enough that any properly configured web server can push decent numbers. Real-world tests, where file/HTML sizes are much larger, would be more telling.

  • @BunnySpeed said:
    I can easily make nginx+lua respond with ~450k Hello World requests per second with a slightly tuned setup on an E3-1270 v5 that's doing a whole lot of other checks, configuration etc., so 750k on a Ryzen 3600 would probably be easily done as well.

    This seems incredibly stripped down and tweaked to perform really well in benchmarks; even running it with Apache Benchmark, for example, fails completely because it was tuned to get the best performance out of wrk.

    The interesting thing here is that this is Node.js, but besides that, it's nothing spectacular.

    In the end, the CPU will be eaten away by the application, SSL handshakes, disk IO etc. anyway.

    Yeah, nginx+lua would be another option. But you do have a point; it would be interesting to see CPU/memory usage comparisons too.

  • BunnySpeed Member, Host Rep
    edited January 2020

    @eva2000 said:

    @BunnySpeed said:
    I can easily make nginx+lua respond with ~450k Hello World requests per second with a slightly tuned setup on an E3-1270 v5 that's doing a whole lot of other checks, configuration etc., so 750k on a Ryzen 3600 would probably be easily done as well.

    This seems incredibly stripped down and tweaked to perform really well in benchmarks; even running it with Apache Benchmark, for example, fails completely because it was tuned to get the best performance out of wrk.

    The interesting thing here is that this is Node.js, but besides that, it's nothing spectacular.

    In the end, the CPU will be eaten away by the application, SSL handshakes, disk IO etc. anyway.

    Yeah, nginx+lua would be another option. But you do have a point; it would be interesting to see CPU/memory usage comparisons too.

    I think it's performing decently, but I think this statement is a bit misleading: "Use your hardware resources up to 100x more efficiently. µFTL can be used as a load balancer, a firewall, a DDOS protection layer, an in-memory cache, an api gateway and a serverless runtime for Node.js. Can be scaled horizontally by adding more nodes in cluster mode."

    EDIT: Removed the last paragraph as I noticed they're actually doing over 1M, not 750k :)

  • @BunnySpeed you had me curious about nginx+lua hello-world tests too, and since my Centmin Mod Nginx build has the optional lua nginx module enabled, I decided to test it out.

    The first result is the best I squeezed out for plain Nginx throughput at ~280K requests/s, and the second is nginx+lua for the same hello-world test at ~334K requests/s.

    h2load -t4 -c256 -n1000000 -m100 https://$domain/debug
    starting benchmark...
    spawning thread #0: 64 total client(s). 250000 total requests
    spawning thread #1: 64 total client(s). 250000 total requests
    spawning thread #2: 64 total client(s). 250000 total requests
    spawning thread #3: 64 total client(s). 250000 total requests
    TLS Protocol: TLSv1.2
    Cipher: ECDHE-ECDSA-AES256-GCM-SHA384
    Server Temp Key: ECDH P-256 256 bits
    Application protocol: h2
    progress: 10% done
    progress: 20% done
    progress: 30% done
    progress: 40% done
    progress: 50% done
    progress: 60% done
    progress: 70% done
    progress: 80% done
    progress: 90% done
    progress: 100% done
    
    finished in 3.56s, 280511.44 req/s, 15.00MB/s
    requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored, 0 timeout
    status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
    traffic: 53.47MB (56062576) total, 9.58MB (10050032) headers (space savings 95.72%), 26.70MB (28000000) data
                         min         max         mean         sd        +/- sd
    time for request:      451us    222.56ms     83.82ms     23.72ms    76.56%
    time for connect:     1.60ms    165.09ms     92.76ms     40.13ms    74.22%
    time to 1st byte:    55.01ms    247.68ms    176.37ms     44.31ms    63.28%
    req/s           :    1096.09     1225.97     1140.31       20.45    80.08%
    
    h2load -t4 -c256 -n1000000 -m100 https://$domain/lua_hello
    starting benchmark...
    spawning thread #0: 64 total client(s). 250000 total requests
    spawning thread #1: 64 total client(s). 250000 total requests
    spawning thread #2: 64 total client(s). 250000 total requests
    spawning thread #3: 64 total client(s). 250000 total requests
    TLS Protocol: TLSv1.2
    Cipher: ECDHE-ECDSA-AES256-GCM-SHA384
    Server Temp Key: ECDH P-256 256 bits
    Application protocol: h2
    progress: 10% done
    progress: 20% done
    progress: 30% done
    progress: 40% done
    progress: 50% done
    progress: 60% done
    progress: 70% done
    progress: 80% done
    progress: 90% done
    progress: 100% done
    
    finished in 2.99s, 334942.73 req/s, 16.63MB/s
    requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored, 0 timeout
    status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
    traffic: 49.64MB (52051968) total, 5.76MB (6039424) headers (space savings 95.78%), 26.70MB (28000000) data
                         min         max         mean         sd        +/- sd
    time for request:      193us    140.41ms     72.10ms     12.95ms    73.45%
    time for connect:     4.75ms    153.01ms     81.14ms     34.87ms    71.48%
    time to 1st byte:    50.17ms    234.37ms    147.35ms     44.29ms    62.11%
    req/s           :    1308.70     1382.55     1326.85       16.14    59.77%
    

    The difference in header size is due to the HTTP response headers for plain Nginx versus the Nginx lua module response:

    curl -Is https://$domain/debug
    HTTP/2 200 
    date: Sat, 04 Jan 2020 18:36:50 GMT
    content-type: application/octet-stream
    content-length: 28
    last-modified: Sat, 04 Jan 2020 06:59:42 GMT
    vary: Accept-Encoding
    etag: "5e1037de-1c"
    server: nginx centminmod
    x-powered-by: centminmod
    accept-ranges: bytes
    
    curl -Is https://$domain/lua_hello                       
    HTTP/2 200 
    date: Sat, 04 Jan 2020 18:36:43 GMT
    content-type: text/plain; charset=utf-8
    server: nginx centminmod
    
    nginx -V
    nginx version: nginx/1.17.7 (251219-185254-centos7-4a417a7)
    built by gcc 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC) 
    built with OpenSSL 1.1.1d  10 Sep 2019
    