Deno and FFI – How much Overhead?

Deno can load shared libraries that follow the standard C calling convention, which lets Deno programs call C functions. Since there's no way to create raw Ethernet packets from within Deno, and I found no library that does this, I'll probably have to write one myself, similar to what I did with Dart and its FFI.

But first I wanted to know how much overhead a call into a C function in a shared library adds, since this GitHub issue suggests that it is very slow.

Time to measure! So I created a simple C function to add up bytes in an array:

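#include <stdint.h>

// Adds up the first `count` bytes of `p`; compiled into a shared library for Deno.
int sum(const uint8_t *p, int count) {
        int res = 0;
        for (int i = 0; i < count; ++i) {
                res += p[i];
        }
        return res;
}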
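To check that the value coming back through FFI is correct, it helps to have the same computation on the JavaScript side. The following is a hypothetical TypeScript counterpart of sum() (the name sumBytes is made up for this example); unlike the C version, it clamps count to the array length instead of reading past the end of the buffer.

```typescript
// Hypothetical TypeScript counterpart of the C sum() function, used only as a
// reference value to compare against what the FFI call returns.
// Unlike the C code, count is clamped to the array length so the loop never
// reads past the end of the buffer.
function sumBytes(p: Uint8Array, count: number): number {
  const n = Math.min(count, p.length);
  let res = 0;
  for (let i = 0; i < n; ++i) {
    res += p[i];
  }
  return res;
}
```

In the benchmark, comparing dylib.symbols.sum(buf, size) with sumBytes(buf, size) once per size is a quick sanity check before timing anything.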

I then call it with count doubling from 1 to 4096. From Deno, the benchmark looks like this:

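// Library path and file name are assumptions; the library is built from sum() above.
// Deno 1.29 needs --unstable and --allow-ffi for Deno.dlopen.
const dylib = Deno.dlopen("./libsum.so", {
  sum: { parameters: ["buffer", "i32"], result: "i32" },
});
const buf = new Uint8Array(4096);
let result = 0;
for (let size = 1; size <= buf.length; size *= 2) {
  const start = performance.now();
  for (let i = 0; i < 1_000_000; ++i) {
    result = dylib.symbols.sum(buf, size);
  }
  const end = performance.now();
  console.log(`Time used for ${size} bytes: ${end - start}ns`);
}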
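The loop above measures the total time for exactly 1,000,000 calls. If the iteration count is ever changed, a small helper that divides by the count explicitly keeps the per-call figure correct. This is a minimal sketch; nsPerCall is an illustrative name, not a Deno API, and the clock is passed in so it can be performance.now() in Deno or a fake clock in a test.

```typescript
// Hypothetical helper (not a Deno API): runs fn `iterations` times and returns
// the average time per call in nanoseconds.
// nowMs must return milliseconds, e.g. () => performance.now() in Deno.
function nsPerCall(
  fn: () => void,
  iterations: number,
  nowMs: () => number,
): number {
  if (iterations <= 0) {
    throw new Error("iterations must be positive");
  }
  const start = nowMs();
  for (let i = 0; i < iterations; ++i) {
    fn();
  }
  const elapsedMs = nowMs() - start;
  // ms -> ns is a factor of 1e6; then average over all calls.
  return (elapsedMs * 1e6) / iterations;
}
```

In Deno this could be called as nsPerCall(() => { dylib.symbols.sum(buf, size); }, 1_000_000, () => performance.now()); with exactly one million iterations it yields the same number the original loop prints.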

performance.now() returns a timestamp in milliseconds. Since the call is repeated 1,000,000 times, the total time in ms is numerically equal to the time per call in ns:
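Written out, with T the number printed for one size (the total milliseconds for 10^6 calls):

```latex
t_{\mathrm{call}}
  = \frac{T\,\mathrm{ms}}{10^{6}}
  = T \times 10^{-6}\,\mathrm{ms}
  = T \times 10^{-6} \times 10^{6}\,\mathrm{ns}
  = T\,\mathrm{ns}
```

For example, a printed total of 4410 for the 1024 byte case on the GX-420CA means 4410 ns per call.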

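Bytes | AMD GX-420CA | AMD Ryzen 5 3400GE | RK3328@1.3GHz | Khadas VIM3L S905D3@1.9GHz | AWS C6G
4 | 38 | 12 | 1274 | 864 | 228
8 | 56 | 18 | 1342 | 902 | 232
16 | 90 | 34 | 1464 | 982 | 252
32 | 174 | 64 | 1698 | 1134 | 292
64 | 310 | 140 | 2170 | 1436 | 386
128 | 584 | 268 | 3120 | 2042 | 556
256 | 1130 | 544 | 4998 | 3252 | 904
512 | 2224 | 1022 | 8764 | 5678 | 1620
1024 | 4410 | 2048 | 16304 | 10520 | 3014
2048 | 8800 | 4072 | 31378 | 20200 | 5808
4096 | 17554 | 8076 | 61690 | 39598 | 11374
Overhead (extrapolated to 0 bytes) | 23 | 14 | 1213 | 830 | 212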
Time per FFI call in ns, calling from Deno with an increasing amount of work done inside the C function; Deno version 1.29.3 (ARM: 1.29.4)

A linear regression, extrapolated to size=0, gives an overhead of 23 ns and 14 ns for the two x86_64 CPUs. Note how nicely the time grows linearly with larger payloads. The ARM CPUs only start to grow linearly at about 128 bytes, and their overhead is much higher: 1213 ns (RK3328), 830 ns (S905D3) and 212 ns (AWS C6G).
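The overhead figures come from fitting a straight line, time = overhead + slope × bytes, and reading off its value at 0 bytes. Below is a minimal ordinary least-squares sketch in TypeScript; fitLine is an illustrative name, and the post does not say exactly which points went into its own fit.

```typescript
// Ordinary least-squares fit of y = intercept + slope * x.
// With x = payload size in bytes and y = ns per call, the intercept is the
// extrapolated per-call overhead at size 0 and the slope is ns per byte.
interface LineFit {
  slope: number;
  intercept: number;
}

function fitLine(xs: number[], ys: number[]): LineFit {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("need at least two (x, y) pairs of equal length");
  }
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; ++i) {
    const dx = xs[i] - meanX;
    sxy += dx * (ys[i] - meanY);
    sxx += dx * dx;
  }
  if (sxx === 0) {
    throw new Error("all x values are identical; slope is undefined");
  }
  const slope = sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}
```

It would be fed one column of the table, for the ARM CPUs perhaps only the rows from 128 bytes upward, where growth is linear; which points are included affects the estimated overhead.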

Given that a 1500 byte Ethernet frame takes 12 μs to transmit at 1 Gbit/s, the 23 ns overhead of the slower AMD CPU is only 0.2% of that, which is very acceptable. Even for more typical frames of 500 bytes (128 pixels with 3 colors each, plus a bit of overhead), the overhead is only 0.6%.
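The percentages are simply the per-call overhead divided by the time the frame spends on the wire. A small sketch with illustrative function names; like the text, it ignores preamble and inter-frame gap.

```typescript
// Time to serialize a frame onto the wire, in ns.
// Simplification (as in the text): preamble and inter-frame gap are ignored,
// so 1500 bytes at 1 Gbit/s -> 12000 ns = 12 us.
function wireTimeNs(frameBytes: number, bitsPerSecond: number): number {
  return (frameBytes * 8 * 1e9) / bitsPerSecond;
}

// Per-call FFI overhead as a percentage of the frame's wire time.
function overheadPercent(
  overheadNs: number,
  frameBytes: number,
  bitsPerSecond: number,
): number {
  return (overheadNs / wireTimeNs(frameBytes, bitsPerSecond)) * 100;
}
```

overheadPercent(23, 1500, 1e9) gives about 0.19 and overheadPercent(23, 500, 1e9) about 0.575; with the S905D3's 830 ns the same calls give about 6.9 and 20.75, in line with the figures in the text.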

The ARM CPUs have significantly more overhead: for the S905D3 it is 7% of a 1500 byte frame and about 21% of a 500 byte frame. Even a server-class ARM CPU (AWS C6G) does not improve this by much.

Appendix

Deno version: 1.29.3 for x86_64, and 1.29.4 for ARMv8 compiled via

cargo install deno --locked