# Network Performance

## TCP
Compare TCP socket performance via an echo server.
### Test program
https://github.com/alibaba/PhotonLibOS/blob/main/examples/perf/net-perf.cpp
### Build

```bash
cmake -B build -D PHOTON_BUILD_TESTING=1 -D PHOTON_ENABLE_URING=1 -D CMAKE_BUILD_TYPE=Release
cmake --build build -j 8 -t net-perf
```
### Run

#### Server

```bash
./build/output/net-perf -port 9527 -buf_size 512
```

#### Streaming client

```bash
./build/output/net-perf -client -client_mode streaming -ip <server_ip> -port 9527 -buf_size 512
```

#### Ping-pong client

```bash
./build/output/net-perf -client -client_mode ping-pong -ip <server_ip> -port 9527 -buf_size 512 -client_connection_num 100
```
:::note
Of course you can use your own client, as long as it follows the TCP echo protocol.
:::
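The echo protocol here is simply: the client writes a buffer and the server writes the same bytes back. For reference, below is a minimal ping-pong style client sketch using plain POSIX sockets rather than Photon's API; the address, port, and buffer size are illustrative only.

```cpp
// Minimal TCP echo (ping-pong) client sketch: send a fixed-size buffer,
// then read the same number of bytes back, in a loop.
// Plain POSIX sockets are used here purely for illustration.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <vector>

int main() {
    const char* ip = "127.0.0.1";   // replace with the server IP
    const int port = 9527;
    const size_t buf_size = 512;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);
    if (connect(fd, (sockaddr*)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    std::vector<char> buf(buf_size, 'x');
    for (;;) {
        // send exactly buf_size bytes ...
        size_t sent = 0;
        while (sent < buf_size) {
            ssize_t n = write(fd, buf.data() + sent, buf_size - sent);
            if (n <= 0) { perror("write"); close(fd); return 1; }
            sent += n;
        }
        // ... then receive exactly buf_size echoed bytes
        size_t recvd = 0;
        while (recvd < buf_size) {
            ssize_t n = read(fd, buf.data() + recvd, buf_size - recvd);
            if (n <= 0) { perror("read"); close(fd); return 1; }
            recvd += n;
        }
    }
}
```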
### Measure

You can either monitor the server's network bandwidth via `iftop`, or print its QPS periodically from the code.
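If you go the in-code route, a simple pattern is a shared counter that the echo handlers increment and a background thread that prints and resets once per second. A minimal sketch (not the counters used by net-perf itself; the names are illustrative):

```cpp
// Per-second QPS reporter sketch: handlers call qps_count(),
// a background thread prints the count once per second.
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <thread>

static std::atomic<uint64_t> g_requests{0};

// Call this once per completed request in the echo handler.
inline void qps_count() { g_requests.fetch_add(1, std::memory_order_relaxed); }

void start_qps_reporter() {
    std::thread([] {
        for (;;) {
            std::this_thread::sleep_for(std::chrono::seconds(1));
            uint64_t n = g_requests.exchange(0, std::memory_order_relaxed);
            printf("QPS: %llu\n", (unsigned long long)n);
        }
    }).detach();
}
```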
### Results

#### 1. Streaming
| Lib | Language | Concurrency Model | Buffer Size | Conn Num | QPS | Bandwidth | CPU util |
|---|---|---|---|---|---|---|---|
| Photon | C++ | Stackful Coroutine | 512 Bytes | 4 | 1604K | 6.12Gb | 99% |
| cocoyaxi | C++ | Stackful Coroutine | 512 Bytes | 4 | 1545K | 5.89Gb | 99% |
| tokio | Rust | Stackless Coroutine | 512 Bytes | 4 | 1384K | 5.28Gb | 98% |
| acl/lib_fiber | C++ | Stackful Coroutine | 512 Bytes | 4 | 1240K | 4.73Gb | 94% |
| Go | Golang | Stackful Coroutine | 512 Bytes | 4 | 1083K | 4.13Gb | 100% |
| libgo | C++ | Stackful Coroutine | 512 Bytes | 4 | 770K | 2.94Gb | 99% |
| boost::asio | C++ | Async Callback | 512 Bytes | 4 | 634K | 2.42Gb | 97% |
| monoio | Rust | Stackless Coroutine | 512 Bytes | 4 | 610K | 2.32Gb | 100% |
| Python3 asyncio | Python | Stackless Coroutine | 512 Bytes | 4 | 517K | 1.97Gb | 99% |
| libco | C++ | Stackful Coroutine | 512 Bytes | 4 | 432K | 1.65Gb | 96% |
| zab | C++20 | Stackless Coroutine | 512 Bytes | 4 | 412K | 1.57Gb | 99% |
| asyncio | C++20 | Stackless Coroutine | 512 Bytes | 4 | 186K | 0.71Gb | 98% |
#### 2. Ping-pong
| Lib | Language | Concurrency Model | Buffer Size | Conn Num | QPS | Bandwidth | CPU util |
|---|---|---|---|---|---|---|---|
| Photon | C++ | Stackful Coroutine | 512 Bytes | 1000 | 412K | 1.57Gb | 100% |
| monoio | Rust | Stackless Coroutine | 512 Bytes | 1000 | 400K | 1.52Gb | 100% |
| boost::asio | C++ | Async Callback | 512 Bytes | 1000 | 393K | 1.49Gb | 100% |
| evpp | C++ | Async Callback | 512 Bytes | 1000 | 378K | 1.44Gb | 100% |
| tokio | Rust | Stackless Coroutine | 512 Bytes | 1000 | 365K | 1.39Gb | 100% |
| netty | Java | Async Callback | 512 Bytes | 1000 | 340K | 1.30Gb | 99% |
| Go | Golang | Stackful Coroutine | 512 Bytes | 1000 | 331K | 1.26Gb | 100% |
| acl/lib_fiber | C++ | Stackful Coroutine | 512 Bytes | 1000 | 327K | 1.25Gb | 100% |
| swoole | PHP | Stackful Coroutine | 512 Bytes | 1000 | 325K | 1.24Gb | 99% |
| zab | C++20 | Stackless Coroutine | 512 Bytes | 1000 | 317K | 1.21Gb | 100% |
| cocoyaxi | C++ | Stackful Coroutine | 512 Bytes | 1000 | 279K | 1.06Gb | 98% |
| libco | C++ | Stackful Coroutine | 512 Bytes | 1000 | 260K | 0.99Gb | 96% |
| libgo | C++ | Stackful Coroutine | 512 Bytes | 1000 | 258K | 0.98Gb | 156% |
| asyncio | C++20 | Stackless Coroutine | 512 Bytes | 1000 | 241K | 0.92Gb | 99% |
| TypeScript | nodejs | Async Callback | 512 Bytes | 1000 | 192K | 0.75Gb | 100% |
| Erlang | Erlang | - | 512 Bytes | 1000 | 165K | 0.63Gb | 115% |
| Python3 asyncio | Python | Stackless Coroutine | 512 Bytes | 1000 | 136K | 0.52Gb | 99% |
:::note
- The Streaming client measures echo server performance under high throughput. A comparable real-world scenario is the multiplexing used by RPC frameworks and HTTP/2. We set up 4 client processes, each creating a single connection; the send coroutine and the recv coroutine run in separate infinite loops (see the sketch after this list).
- The Ping-pong client measures echo server performance when handling a large number of connections. We set up 10 client processes, each creating 100 connections (1000 in total). On each connection, the client must send first, then receive.
- Server and client are both cloud VMs with 64 cores and 128 GB RAM, Intel Platinum CPUs at 2.70 GHz, kernel version 6.x, and 32Gb of network bandwidth.
- This test is only meant to compare per-core QPS, so we limited each runtime to a single thread, for instance by setting GOMAXPROCS=1 for Golang.
- Some libraries didn't provide an easy way to configure how many bytes the server receives in a single call, which the Streaming test requires, so we only ran their Ping-pong tests.
:::
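To make the streaming traffic pattern concrete, the sketch below shows what one such client process does on its single connection, again using plain POSIX sockets and `std::thread` instead of Photon coroutines; the function name and parameters are illustrative.

```cpp
// Streaming pattern sketch: one connection, a sender loop and a receiver loop
// running independently, so data flows in both directions at the same time.
// `fd` is assumed to be an already-connected TCP socket.
#include <sys/socket.h>
#include <unistd.h>
#include <thread>
#include <vector>

void run_streaming(int fd, size_t buf_size) {
    std::thread sender([fd, buf_size] {
        std::vector<char> buf(buf_size, 'x');
        for (;;) {
            if (write(fd, buf.data(), buf.size()) <= 0) break;  // keep sending
        }
    });
    std::thread receiver([fd, buf_size] {
        std::vector<char> buf(buf_size);
        for (;;) {
            if (read(fd, buf.data(), buf.size()) <= 0) break;   // keep draining echoes
        }
    });
    sender.join();
    receiver.join();
}
```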
### Conclusion

Photon sockets deliver the best per-core QPS in both the Streaming and Ping-pong traffic modes.
## HTTP

Compare Photon and Nginx when serving static files, using Apache Bench (ab) as the client.
### Test program
https://github.com/alibaba/PhotonLibOS/blob/main/net/http/test/server_perf.cpp
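An ab invocation for this kind of test might look like the following; the URL, request count, and concurrency level are placeholders, not the exact parameters behind the numbers below.

```bash
# Hypothetical example: 1,000,000 keep-alive requests for a 4KB static file,
# with 64 concurrent connections.
ab -n 1000000 -c 64 -k http://<server_ip>:8080/static/4k.html
```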
### Results
| Server | File Size | QPS | CPU util |
|---|---|---|---|
| Photon | 4KB | 114K | 100% |
| Nginx | 4KB | 97K | 100% |
:::note
Nginx only enables 1 worker (process).
:::
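For reference, a single-worker Nginx setup like the one described above could look roughly like this; the port, paths, and connection limit are illustrative, not the exact configuration used in the test.

```nginx
# Illustrative nginx.conf fragment: a single worker process serving static files.
worker_processes 1;

events {
    worker_connections 1024;
}

http {
    server {
        listen 8080;
        root /var/www/static;   # directory holding the 4KB test file
    }
}
```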
### Conclusion

Photon is faster than Nginx under this circumstance.