The VIM3L (Cortex-A55 cores) I have has an NPU accelerator built in. An interesting article about it is here; however, before doing fancy NPU stuff, let's get TensorFlow working first. Should be easy.
Famous last words. Turns out that “pip install tensorflow” does not work: on arm64 (AKA aarch64 AKA ARMv8) TensorFlow is not officially supported. So I had to compile it first.
Compiling TensorFlow
https://www.tensorflow.org/install/source describes the compile process reasonably well. It is missing a lot of details though, so here is a more detailed walk-through. Start with an Ubuntu 20.xx image with an extra 70 GB disk for the TensorFlow source code:
# One-time action: for the data disk, create a volume and a filesystem
# to mount under /data
sudo bash
pvcreate /dev/nvme1n1
vgcreate vg_data /dev/nvme1n1
lvcreate -L69G -n data vg_data
mkfs.ext4 /dev/vg_data/data
mkdir /data
echo -e '/dev/mapper/vg_data-data\t/data\text4\tdefaults\t0 1' >>/etc/fstab
mount /data
chown ubuntu:users /data
umount /data
exit
sudo apt update
sudo apt -y upgrade
sudo reboot
After a reboot, you now have a /data of about 70GB.
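A quick sanity check that the new filesystem is where we expect it:

# Should show a filesystem of roughly 69G mounted on /data
df -h /data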
sudo apt -y install build-essential python3 python3-dev python3-venv pkg-config zip zlib1g-dev unzip curl tmux wget vim git htop liblapack3 libblas3 libhdf5-dev openjdk-11-jdk
# Get bazel
wget https://github.com/bazelbuild/bazel/releases/download/4.2.2/bazel-4.2.2-linux-arm64
chmod a+x bazel-4.2.2-linux-arm64
sudo cp bazel-4.2.2-linux-arm64 /usr/local/bin/bazel
# bazel uses ~/.cache/bazel, so redirect it to the big data disk
mkdir -p /data/.cache/bazel ~/.cache
ln -s /data/.cache/bazel ~/.cache/bazel
# Build a Python 3 virtual environment
python3 -m venv ~/venv
source ~/venv/bin/activate
pip install wheel packaging
pip install six mock numpy grpcio h5py
pip install keras_applications --no-deps
pip install keras_preprocessing --no-deps
# Get TensorFlow source
cd /data
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow/
git checkout r2.8
cd /data/tensorflow
./configure
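./configure asks its questions interactively. For an unattended run it can also pick the answers up from environment variables (variable names as read by r2.8's configure.py); a minimal sketch for a CPU-only build, where the PYTHON_LIB_PATH and CC_OPT_FLAGS values are my assumptions for this venv and a generic armv8 target:

# Optional: answer ./configure via environment variables instead of
# interactively; values below assume the venv from above and a
# CPU-only arm64 build
PYTHON_BIN_PATH="$HOME/venv/bin/python3" \
PYTHON_LIB_PATH="$HOME/venv/lib/python3.8/site-packages" \
TF_NEED_CUDA=0 \
TF_NEED_ROCM=0 \
TF_SET_ANDROID_WORKSPACE=0 \
CC_OPT_FLAGS="-march=armv8-a" \
./configure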
# Build Python package:
bazel build -c opt \
--copt=-O3 \
--copt=-std=c++11 \
--copt=-funsafe-math-optimizations \
--copt=-ftree-vectorize \
--copt=-fomit-frame-pointer \
--copt=-DRASPBERRY_PI \
--host_copt=-DRASPBERRY_PI \
--verbose_failures \
--config=noaws \
--config=nogcp \
//tensorflow/tools/pip_package:build_pip_package
# Build Python whl:
BDIST_OPTS="--universal" bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/tensorflow_pkg
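The wheel ends up in ~/tensorflow_pkg. A quick smoke test on the build host (the exact filename depends on the Python and platform tags, hence the glob):

# Install and import the freshly built wheel
pip install ~/tensorflow_pkg/tensorflow-2.8*.whl
python3 -c 'import tensorflow as tf; print(tf.__version__)'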
# And for tfjs:
# (see https://github.com/tensorflow/tfjs/tree/master/tfjs-node)
bazel build --config=opt --config=monolithic //tensorflow/tools/lib_package:libtensorflow
# The result is at bazel-bin/tensorflow/tools/lib_package/libtensorflow.tar.gz
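npm needs to fetch this tarball over HTTP(S) later (see the custom-binary.json step below), so put it on a web server you control; MYSERVER.com and the document root here are placeholders:

# Copy the library archive to a web server; the URL must match what
# goes into custom-binary.json later (MYSERVER.com is a placeholder)
scp bazel-bin/tensorflow/tools/lib_package/libtensorflow.tar.gz \
    MYSERVER.com:/var/www/html/libtensorflow-2.8-arm64.tar.gz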
It does take a lot of time (about 2-3 hours each for the Python package and the tfjs-node library). When I tried 2 CPUs and 8 GB RAM, some compiler runs were killed because they ran out of memory; 4 CPUs and 16 GB RAM worked fine. Thus an AWS m6g.xlarge is recommended: an m6g.large failed to build.
Using spot instances for the m6g.xlarge (regular $0.154/h, spot price $0.04/h) helped a bit to limit the financial impact.
Python and TensorFlow
It took me several tries:
- When using Ubuntu 22.04 to compile TF, the resulting binary wanted GLIBC 2.35, which my VIM3L did not have (it has 2.31). It also compiled against Python 3.10.
- When using Ubuntu 20.04, the build used Python 3.8, but my VIM3L runs Python 3.9. While GLIBC was fine now, Python was not.
- The created whl file could be loaded and used on the machine I compiled it on; no Python or GLIBC version problems there.
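So before picking a build host, check what the target board actually runs; two standard commands suffice:

# On the target board: the GLIBC and CPython versions the wheel must match
ldd --version | head -1    # my VIM3L reports 2.31
python3 --version          # my VIM3L reports 3.9.x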
That covered all my Python needs. Now on to the main target:
Node.js and TensorFlow
tfjs-node uses the libtensorflow.so library, so that should remove some of the CPython version problems I have seen. Compiling was easy by this point: https://github.com/tensorflow/tfjs/tree/master/tfjs-node#optional-build-optimal-tensorflow-from-source is spot on.
The biggest problem was making Node.js use the library I had built instead of trying to download the pre-compiled one, which does not exist for arm64. The instructions in the above link do not explain in enough detail how to make this work. In hindsight it's easy, but it took me some tries to understand it. In short:
- Do an "npm install --ignore-scripts"
- Add a file scripts/custom-binary.json inside the @tensorflow/tfjs-node module directory (this gave me the hint)
- Run "npm install" in the tfjs-node directory
- That will download the TensorFlow library archive from your server
- Now do the "npm install" where your application is (which is the only "npm install" you'd usually do)

The full sequence looks like this:
❯ npm install --ignore-scripts
❯ pushd .
❯ cd node_modules/@tensorflow/tfjs-node/scripts
❯ cat >custom-binary.json <<_EOF_
{
"tf-lib": "https://MYSERVER.com/libtensorflow-2.8-arm64.tar.gz"
}
_EOF_
❯ cd ..
❯ npm install
[...]
> @tensorflow/tfjs-node@3.16.0 install
> node scripts/install.js
CPU-linux-3.16.0.tar.gz
* Downloading libtensorflow
https://MYSERVER.com/libtensorflow-2.8-arm64.tar.gz
[==============================] 3685756/bps 100% 0.0s
* Building TensorFlow Node.js bindings
[...]
❯ popd
❯ npm install
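To verify that the custom binding really loaded: tfjs-node registers a backend named "tensorflow", so the following should not print "cpu":

❯ node -e 'const tf = require("@tensorflow/tfjs-node"); console.log(tf.getBackend());'
tensorflow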
Benchmarks
As a benchmark I slightly modified server.js from tfjs-examples/baseball-node so that it does not listen on a port, which means it exits after training. I then ran it on the VIM3L (S905D3), my ThinkCentre m75q (Ryzen 5), and my HP T620 (GX-420CA), once with the pure-JavaScript CPU backend (tfjs) and once with the C++ TF library (tfjs-node):
| CPU | Backend | Time/s |
|-----|---------|--------|
| Amlogic S905D3-N0N @ 1.9 GHz | cpu | 803 |
| Amlogic S905D3-N0N @ 1.9 GHz | tensorflow | 189 |
| AMD Ryzen 5 PRO 3400GE @ 3.3 GHz | cpu | 122 |
| AMD Ryzen 5 PRO 3400GE @ 3.3 GHz | tensorflow | 36 |
| AMD GX-420CA @ 2 GHz | cpu | 530 |
| AMD GX-420CA @ 2 GHz | tensorflow | 119 |
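Timing was simple wall-clock time of the modified server; a sketch of the invocation (assumed here, adjust to how your checkout of tfjs-examples launches the example):

# Run the modified server and measure total runtime
❯ cd tfjs-examples/baseball-node
❯ time node server.js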
I did not expect Node.js to be only about 4 times slower than the C++ library. Really impressive. Still, using tfjs-node makes a lot of sense. While on x86_64 this was never an issue (pre-built binaries exist there), with the above instructions it's doable on arm64 too.