Deno can load shared libraries that follow the normal C calling convention, which lets Deno programs call C functions. Since there’s no way to create raw Ethernet packets within Deno and I found no library that does this, I think I’ll have to write one myself, similar to what I did with Dart and its FFI.
But first I wanted to know the overhead of calling a C function in a shared library, as this GitHub issue suggests it is very slow.
Time to measure! So I created a simple C function to add up bytes in an array:
#include <stdint.h>

// Add up the first count bytes of the array p.
int sum(const uint8_t *p, int count) {
    int res = 0;
    for (int i = 0; i < count; ++i) {
        res += p[i];
    }
    return res;
}
and called it with values of count from 1 to 4096, doubling each time. The Deno side looks like this:
for (let size = 1; size < buf.length; size *= 2) {
    const start = performance.now();
    for (let i = 0; i < 1_000_000; ++i) {
        result = dylib.symbols.sum(buf, size);
    }
    const end = performance.now();
    console.log(`Time used for ${size} bytes: ${end - start}ns`);
}
performance.now() returns a timestamp in milliseconds. Since the call is made 1M times per size, the elapsed milliseconds equal the time per call in nanoseconds:
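The unit conversion can be made explicit with a small helper. This is a minimal sketch, not the code used for the measurements: nsPerCall and jsSum are hypothetical names, and jsSum is a plain TypeScript stand-in for the C function so that the snippet runs without a shared library.

```typescript
// Hypothetical helper: average time per call in nanoseconds.
function nsPerCall(fn: () => void, iterations: number): number {
  const start = performance.now(); // milliseconds
  for (let i = 0; i < iterations; ++i) {
    fn();
  }
  const elapsedMs = performance.now() - start;
  // 1 ms = 1e6 ns, so with 1e6 iterations elapsedMs equals ns per call.
  return (elapsedMs * 1e6) / iterations;
}

// Stand-in for the C function (assumption: same semantics as sum()).
function jsSum(p: Uint8Array, count: number): number {
  let res = 0;
  for (let i = 0; i < count; ++i) {
    res += p[i];
  }
  return res;
}

const demoBuf = new Uint8Array(4096).fill(1);
let sink = 0; // keeps the result live so the call is not optimised away
const perCall = nsPerCall(() => {
  sink = jsSum(demoBuf, 64);
}, 100_000);
console.log(`~${perCall.toFixed(1)} ns per call, last result ${sink}`);
```

To time the FFI call instead, pass () => dylib.symbols.sum(buf, size) as fn.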
Bytes | AMD GX-420CA | AMD Ryzen 5 3400GE | RK3328@1.3GHz | Khadas VIM3L S905D3@1.9GHz | AWS C6G |
4 | 38 | 12 | 1274 | 864 | 228 |
8 | 56 | 18 | 1342 | 902 | 232 |
16 | 90 | 34 | 1464 | 982 | 252 |
32 | 174 | 64 | 1698 | 1134 | 292 |
64 | 310 | 140 | 2170 | 1436 | 386 |
128 | 584 | 268 | 3120 | 2042 | 556 |
256 | 1130 | 544 | 4998 | 3252 | 904 |
512 | 2224 | 1022 | 8764 | 5678 | 1620 |
1024 | 4410 | 2048 | 16304 | 10520 | 3014 |
2048 | 8800 | 4072 | 31378 | 20200 | 5808 |
4096 | 17554 | 8076 | 61690 | 39598 | 11374 |
Overhead | 23 | 14 | 1213 | 830 | 212 |
Linear regression, extrapolated to size=0, gives an overhead of 23ns and 14ns for the two x86_64 CPUs. Note how nicely the time scales with larger payloads. The ARM CPUs only start to show a linear increase at about 128 bytes, and their overhead is considerably higher: 1213ns, 830ns and 212ns.
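For readers who want to try the extrapolation themselves, here is a minimal sketch using an ordinary least-squares fit over all rows of the AMD GX-420CA column. linearFit is a hypothetical helper name. The post does not say which fitting method or which rows were used, so the intercept this produces need not match the Overhead row exactly.

```typescript
// Ordinary least-squares fit of y = slope * x + intercept.
function linearFit(
  xs: number[],
  ys: number[],
): { slope: number; intercept: number } {
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; ++i) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) ** 2;
  }
  const slope = sxy / sxx;
  // The intercept is the predicted time at size = 0, i.e. the call overhead.
  return { slope, intercept: meanY - slope * meanX };
}

// AMD GX-420CA column from the table (payload bytes, ns per call).
const sizes = [4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096];
const gx420 = [38, 56, 90, 174, 310, 584, 1130, 2224, 4410, 8800, 17554];

const fit = linearFit(sizes, gx420);
console.log(
  `slope ${fit.slope.toFixed(2)} ns/byte, intercept ${fit.intercept.toFixed(1)} ns`,
);
```

An unweighted fit is dominated by the largest payloads, so fitting only the small sizes, or weighting the points, can shift the intercept noticeably.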
Given that one 1500 byte Ethernet frame takes 12μs at 1 GBit/s, the overhead for the slower AMD CPU is only 0.2%, which is very acceptable. Even for more typical frames of 500 bytes (128 pixels, 3 colors, plus a bit of overhead), the overhead is only 0.6%.
The ARM CPUs have significantly more overhead: on the S905D3 it is 7% for a 1500 byte frame and 20% for a 500 byte frame. Even a server-class ARM CPU does not improve this by much.
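The percentages above follow from dividing the per-call overhead by the time a frame occupies the wire. A minimal sketch of that arithmetic, with hypothetical helper names frameTimeNs and overheadPercent, counting only the frame bytes themselves as the figures above do:

```typescript
// Time a frame occupies the wire, counting only the frame bytes.
function frameTimeNs(frameBytes: number, bitsPerSecond: number): number {
  return (frameBytes * 8 * 1e9) / bitsPerSecond;
}

// FFI call overhead as a percentage of the frame time (default 1 GBit/s).
function overheadPercent(
  overheadNs: number,
  frameBytes: number,
  bitsPerSecond = 1e9,
): number {
  return (overheadNs / frameTimeNs(frameBytes, bitsPerSecond)) * 100;
}

console.log(frameTimeNs(1500, 1e9)); // 12000 ns, i.e. 12 μs

// Overhead values taken from the table: 23 ns (AMD GX-420CA), 830 ns (S905D3).
const cases: Array<[string, number, number]> = [
  ["AMD GX-420CA, 1500 bytes", 23, 1500],
  ["AMD GX-420CA, 500 bytes", 23, 500],
  ["S905D3, 1500 bytes", 830, 1500],
  ["S905D3, 500 bytes", 830, 500],
];
for (const [label, overheadNs, bytes] of cases) {
  console.log(`${label}: ${overheadPercent(overheadNs, bytes).toFixed(2)}%`);
}
```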
Appendix
Deno version: 1.29.3 for x86_64, and 1.29.4 for ARMv8 compiled via
cargo install deno --locked