Deno can use shared libraries that follow the normal C calling convention, thus allowing Deno programs to call C functions. Since there's no way to create raw Ethernet packets within Deno and I found no library doing this, I think I'll have to create one myself, similar to what I did with Dart and its FFI.
But first I wanted to know what the overhead of calling a C function in a shared library is, since this GitHub issue suggests it is very slow.
Time to measure! So I created a simple C function to add up the bytes in an array:
// sum.c - add up `count` bytes starting at p.
// Built as a shared library, e.g. cc -O2 -shared -fPIC -o libsum.so sum.c
// (the library name is illustrative).
#include <stdint.h>

int sum(const uint8_t *p, int count) {
    int res = 0;
    for (int i = 0; i < count; ++i) {
        res += p[i];
    }
    return res;
}
and call this with varying values of count, from 1 to 4096. From Deno it is called like this:
// Open the shared library; the path matches the libsum.so name used above.
const dylib = Deno.dlopen("./libsum.so", {
  sum: { parameters: ["buffer", "i32"], result: "i32" },
});
const buf = new Uint8Array(4096);
let result = 0;
for (let size = 1; size <= buf.length; size *= 2) {
  const start = performance.now();
  for (let i = 0; i < 1_000_000; ++i) {
    result = dylib.symbols.sum(buf, size);
  }
  const end = performance.now();
  console.log(`Time used for ${size} bytes: ${end - start}ns`);
}
performance.now() returns a timestamp in ms. Since each test calls the function 1,000,000 times, the measured ms value is numerically the average time per call in ns:
Bytes     AMD GX-420CA   AMD Ryzen 5 3400GE   RK3328@1.3GHz   Khadas VIM3L S905D3@1.9GHz   AWS C6G
4         38             12                   1274            864                          228
8         56             18                   1342            902                          232
16        90             34                   1464            982                          252
32        174            64                   1698            1134                         292
64        310            140                  2170            1436                         386
128       584            268                  3120            2042                         556
256       1130           544                  4998            3252                         904
512       2224           1022                 8764            5678                         1620
1024      4410           2048                 16304           10520                        3014
2048      8800           4072                 31378           20200                        5808
4096      17554          8076                 61690           39598                        11374
Overhead  23             14                   1213            830                          212

Calling FFI from Deno with increasing amounts of work done inside the FFI, Deno version 1.29.3 (ARM: 1.29.4)
Linear regression shows 23ns and 14ns overhead (extrapolating to size=0) for the x86_64 CPUs. Note how nicely the time increases with larger payloads. The ARM CPUs start to show a linear increase only at about 128 bytes, and their overhead is quite a lot higher at 1213ns, 830ns, and 212ns.
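The overhead row is just the intercept of a linear fit of time over payload size; here is a minimal least-squares sketch of that idea (function and variable names are mine, and depending on which sizes you include in the fit the intercept will vary a bit):

// Least-squares fit of time = intercept + slope*bytes; the intercept
// is the extrapolated per-call FFI overhead at size = 0.
function linearFit(xs: number[], ys: number[]) {
  const n = xs.length;
  const mx = xs.reduce((a, b) => a + b) / n;
  const my = ys.reduce((a, b) => a + b) / n;
  let num = 0, den = 0;
  for (let i = 0; i < n; ++i) {
    num += (xs[i] - mx) * (ys[i] - my);
    den += (xs[i] - mx) ** 2;
  }
  const slope = num / den;
  return { slope, intercept: my - slope * mx };
}

// GX-420CA measurements from the table above (ns per call).
const bytes = [4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096];
const ns = [38, 56, 90, 174, 310, 584, 1130, 2224, 4410, 8800, 17554];
console.log(linearFit(bytes, ns)); // intercept = extrapolated overhead at size 0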
Given that one 1500 byte Ethernet frame at 1 GBit/s takes 12μs on the wire (1500 bytes × 8 bits at 1ns per bit), the 23ns overhead of the slower AMD CPU amounts to only 0.2%, which is very acceptable. Even for more typical frames of 500 bytes (128 pixels, 3 colors, plus a bit of overhead), the overhead is only 0.6%.
The ARM CPUs have significantly more overhead (7% for a 1500 byte frame on the S905D3, and 20% for a 500 byte frame). Even using a server-type ARM CPU does not improve this by much.
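The percentages are straightforward wire-time arithmetic; a quick sketch (the names are mine):

// At 1 Gbit/s one bit takes 1ns, so a frame of `bytes` occupies bytes*8 ns on the wire.
const frameTimeNs = (bytes: number) => bytes * 8;

// FFI overhead relative to the frame's wire time.
const overheadPct = (overheadNs: number, bytes: number) =>
  (100 * overheadNs) / frameTimeNs(bytes);

console.log(overheadPct(23, 1500).toFixed(1));  // 0.2  (GX-420CA, full frame)
console.log(overheadPct(830, 1500).toFixed(1)); // 6.9  (S905D3, full frame)
console.log(overheadPct(830, 500).toFixed(1));  // 20.8 (S905D3, 500 byte frame)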
Appendix
Deno version: 1.29.3 for x86_64, and 1.29.4 for ARMv8 (self-compiled).
I was sure I had linked to this awesome article about TensorFlow in the browser. The article is relatable and easy to read, it is fun to test out the actual program (if you have a camera), and it shows what you can do with machine learning inside your browser.
The VIM3L (Cortex-A55 cores) I have comes with a built-in NPU accelerator. An interesting article about it is here; however, before doing fancy NPU stuff, let's get TensorFlow working first. Should be easy.
Famous last words. Turns out that “pip install tensorflow” does not work: on arm64 (AKA aarch64 AKA ARMv8) TensorFlow is not officially supported, so I had to compile it first.
Compiling TensorFlow
https://www.tensorflow.org/install/source describes the compile process reasonably well, but it is missing a lot of details, so here is a more detailed walk-through. Start with an Ubuntu 20.xx image with an extra 70 GB disk for the TensorFlow source code:
# One-time action: for the data disk, create a volume and a filesystem
# to mount under /data
sudo bash
pvcreate /dev/nvme1n1
vgcreate vg_data /dev/nvme1n1
lvcreate -L69G -n data vg_data
mkfs.ext4 /dev/vg_data/data
mkdir /data
echo -e '/dev/mapper/vg_data-data\t/data\text4\tdefaults\t0 1' >>/etc/fstab
mount /data
chown ubuntu:users /data
umount /data
exit
sudo apt update
sudo apt -y upgrade
sudo reboot
After a reboot, you now have a /data of about 70GB.
sudo apt -y install build-essential python3 python3-dev python3-venv pkg-config zip zlib1g-dev unzip curl tmux wget vim git htop liblapack3 libblas3 libhdf5-dev openjdk-11-jdk
# Get bazel
wget https://github.com/bazelbuild/bazel/releases/download/4.2.2/bazel-4.2.2-linux-arm64
chmod a+x bazel-4.2.2-linux-arm64
sudo cp bazel-4.2.2-linux-arm64 /usr/local/bin/bazel
# bazel uses ~/.cache/bazel
mkdir -p /data/.cache/bazel
ln -s /data/.cache/bazel ~/.cache/bazel
# Build a Python 3 virtual environment
python3 -m venv ~/venv
source ~/venv/bin/activate
pip install wheel packaging
pip install six mock numpy grpcio h5py
pip install keras_applications --no-deps
pip install keras_preprocessing --no-deps
# Get TensorFlow source
cd /data
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow/
git checkout r2.8
cd /data/tensorflow
./configure
# Build Python package:
bazel build -c opt \
--copt=-O3 \
--copt=-std=c++11 \
--copt=-funsafe-math-optimizations \
--copt=-ftree-vectorize \
--copt=-fomit-frame-pointer \
--copt=-DRASPBERRY_PI \
--host_copt=-DRASPBERRY_PI \
--verbose_failures \
--config=noaws \
--config=nogcp \
//tensorflow/tools/pip_package:build_pip_package
# Build Python whl:
BDIST_OPTS="--universal" bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/tensorflow_pkg
# And for tfjs:
# (see https://github.com/tensorflow/tfjs/tree/master/tfjs-node)
bazel build --config=opt --config=monolithic //tensorflow/tools/lib_package:libtensorflow
# The result is at bazel-bin/tensorflow/tools/lib_package/libtensorflow.tar.gz
It does take a lot of time (about 2-3 hours each for the Python package and the tfjs-node library). When I tried 2 CPUs and 8 GB RAM, some compiler runs were killed because they ran out of memory; 4 CPUs and 16 GB RAM worked fine. Thus an AWS m6g.xlarge is recommended; the m6g.large failed to build.
Using spot instances for the m6g.xlarge (regular $0.154/h, spot price $0.04/h) helped a bit to limit the financial impact.
Python and TensorFlow
It took me several tries:
- When using Ubuntu 22.04 to compile TF, the resulting binary wanted GLIBC 2.35, which my VIM3L did not have (it had 2.31). It also used Python 3.10 to compile.
- When using Ubuntu 20.04, it used Python 3.8 to compile. My VIM3L had Python 3.9. While GLIBC was fine, Python was not.
- The created whl file could be loaded and used on the machine I compiled it on: no Python or GLIBC version problems there.
That covered all my Python needs. Now moving to the main target:
The biggest problem was making Node.js not look for the non-existent arm64 pre-compiled library, but instead use the one I created. The instructions in the above link do not explain in enough detail how to make this work. In hindsight it's easy, but it took me some tries to understand it. In short (see the example config after the list):
- Do an “npm install --ignore-scripts”
- Add a file scripts/custom-binary.json in the module directory for @tensorflow/tfjs-node (this gave me the hint)
- Run “npm install” in the tfjs-node directory; that will download the TensorFlow library archive
- Now do the “npm install” where your application is (which is the only “npm install” you'd usually do)
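For illustration, the custom-binary.json content looked roughly like this; check the tfjs-node sources linked above for the exact keys, and the URL is only a placeholder for wherever the libtensorflow.tar.gz built earlier is hosted:

{
  "tf-lib": "https://my-server.example/libtensorflow.tar.gz"
}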
As a benchmark I slightly modified server.js from tfjs-examples/baseball-node to not listen on the port, which means it exits after the training. I then ran this on the VIM3L (S905D3), my ThinkCentre m75q (Ryzen 5), and my HP T620 (GX-420CA), once with the CPU backend (tfjs) and once with the C++ TF library (tfjs-node):
CPU                              Backend     Time/s
Amlogic S905D3-N0N @ 1.9GHz      cpu         803
Amlogic S905D3-N0N @ 1.9GHz      tensorflow  189
AMD Ryzen 5 PRO 3400GE @ 3.3GHz  cpu         122
AMD Ryzen 5 PRO 3400GE @ 3.3GHz  tensorflow  36
AMD GX-420CA @ 2GHz              cpu         530
AMD GX-420CA @ 2GHz              tensorflow  119

All running Node.js 16.x
I did not expect Node.js to be only about 4 times slower than C++. Really impressive. Still, using tfjs-node makes a lot of sense. While on x86_64 this was not an issue, with the above instructions it's doable on arm64 too.
Learning is easiest with a goal. The goal here was to recreate this and expand it a bit. Nothing earth-shattering. Still learning how to use querySelector(), as in the snippet below.
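A minimal querySelector() refresher (the selectors and element types are made up for illustration):

// querySelector returns the first element matching a CSS selector, or null.
const button = document.querySelector<HTMLButtonElement>("#run");
// querySelectorAll returns a static NodeList of all matches.
const cells = document.querySelectorAll<HTMLTableCellElement>("td.pixel");

button?.addEventListener("click", () => {
  cells.forEach((cell) => (cell.style.backgroundColor = "red"));
});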
Continuing my expedition into JavaScript front-end land... I like Deno. A lot. It does lots of things right (from my point of view, e.g. permissions), and it fixes many problems Node.js has. Using Deno for back-end work is straightforward, but since I currently focus on the front-end side: how can I create front-end JavaScript with Deno and still use Deno's testing framework?
It's actually not hard. Here is a small TypeScript file:
import capitalize from "https://unpkg.com/lodash-es@4.17.15/capitalize.js";

function main() {
  console.log(capitalize("hello from the web browser"));
}

function sum(a: number, b: number): number {
  return a + b;
}

window.onload = () => {
  console.info(capitalize("module loaded!"));
};

export { main, sum };
So far that's straightforward: loading index.html runs main(), which prints a capitalized message on the console.
Here is the test script:
import { assertEquals } from "https://deno.land/std@0.106.0/testing/asserts.ts";
import { sum } from "./example.ts";

Deno.test("Sum 5 and 7", () => {
  assertEquals(sum(5, 7), 12);
});
Again, standard Deno, and deno test works as expected. To deploy as plain JavaScript, a simple deno bundle src/example.ts dist/browser.js does the trick.
And there you go: plain TypeScript, and yet the logic can be tested with the normal deno test command. No extra tools needed. No Babel, WebPack, or tsc. No require vs import either.
deno_dom is not yet complete enough though (see the stub definition of addEventListener() here), so until then only basic DOM operations work.
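Until then, one workaround is to keep DOM access separate from the logic, so deno test never needs a DOM; a minimal sketch under that assumption (the file names are mine):

// logic.ts -- pure functions, testable with `deno test` as-is.
export function sum(a: number, b: number): number {
  return a + b;
}

// main.ts -- the only module touching the DOM; bundle this one for the browser:
//   import { sum } from "./logic.ts";
//   window.onload = () => {
//     document.querySelector("#result")!.textContent = String(sum(5, 6));
//   };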