TensorFlow on arm64

The VIM3L (Cortex A55 cores) I have has a NPU accelerator built-in. An interesting article about it is here, however before doing fancy NPU stuff, let’s get TensorFlow working first. Should be easy.

Famous last words. Turns out that “pip install tensorflow” does not work: on arm64 (AKA aarch64 AKA ARMv8) TensorFlow is not officially supported. So I had to compile it first.

Compiling TensorFlow

https://www.tensorflow.org/install/source described the compile process reasonable well. It is missing a lot of details though, so here is a more detailed walk-through. Start with a Ubuntu 20.xx image with an extra 70 GB disk for TensorFlow source code:

# One-time action: for the data disk, create a volume and a filesystem
# to mount under /data

sudo bash
pvcreate /dev/nvme1n1
vgcreate vg_data /dev/nvme1n1
lvcreate -L69G -n data vg_data
mke2fs -j /dev/vg_data/data
mkdir /data
echo -e '/dev/mapper/vg_data-data\t/data\text4\tdefaults\t0 1' >>/etc/fstab
mount /data
chown ubuntu:users /data
umount /data

sudo apt update
sudo apt -y upgrade
sudo reboot

After a reboot, you now have a /data of about 70GB.

sudo apt -y install build-essential python3 python3-dev python3-venv pkg-config zip zlib1g-dev unzip curl tmux wget vim git htop liblapack3 libblas3 libhdf5-dev openjdk-11-jdk

# Get bazel

wget https://github.com/bazelbuild/bazel/releases/download/4.2.2/bazel-4.2.2-linux-arm64
chmod a+x bazel-4.2.2-linux-arm64
sudo cp bazel-4.2.2-linux-arm64 /usr/local/bin/bazel

# bazel uses ~/.cache/bazel

mkdir -p /data/.cache/bazel
ln -s /data/.cache/bazel ~/.cache/bazel

# Build a Python 3 virtual environment

python3 -m venv ~/venv
source ~/venv/bin/activate
pip install wheel packaging
pip install six mock numpy grpcio h5py
pip install keras_applications --no-deps
pip install keras_preprocessing --no-deps

# Get TensorFlow source

cd /data
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow/
git checkout r2.8
cd /data/tensorflow


# Build Python package:

bazel build -c opt \
--copt=-O3 \
--copt=-std=c++11 \
--copt=-funsafe-math-optimizations \
--copt=-ftree-vectorize \
--copt=-fomit-frame-pointer \
--host_copt=-DRASPBERRY_PI \
--verbose_failures \
--config=noaws \
--config=nogcp \

# Build Python whl:

BDIST_OPTS="--universal" bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/tensorflow_pkg

# And for tfjs:
# (see https://github.com/tensorflow/tfjs/tree/master/tfjs-node)

bazel build --config=opt --config=monolithic //tensorflow/tools/lib_package:libtensorflow
# The result is at bazel-bin/tensorflow/tools/lib_package/libtensorflow.tar.gz

It does take a lot of time (about 2-3h for each the Python package and the tfjs-node library). When I tried 2 CPU and 8 GB RAM, some compiler runs were killed as they were running out of memory. 4 CPU and 16 GB RAM worked fine.Thus AWS m6g.xlarge recommended. m6g.large failed to build.

Using spot instances for the m6g.xlarge (regular $0.154/h, spot price $0.04/h) helped a bit to limit the financial impact.

Python and TensorFlow

It took me several tries:

  • When using Ubuntu 22.04 to compile TF, the resulting binary wanted GLIBC 2.35 which my VIM3L did not have. It had 2.31. It also used Python 3.10 to compile.
  • When using Ubuntu 20.04, it used Python 3.8 to compile. My VIM3L had Python 3.9. While GLIBC was fine, Python was not.
  • The created whl file could be loaded and used on the machine I compiled it on. No Python or GLIBC version problems here.

That covered all my Python needs. Now moving to the main target:

Node.js and TensorFlow

tfjs-node uses the libtensorflow.so library, so that should remove some of the CPython version problems I have seen. Compiling was easy now: https://github.com/tensorflow/tfjs/tree/master/tfjs-node#optional-build-optimal-tensorflow-from-source is spot on.

The biggest problem was to make Node.js understand to not use the non-existing arm64 pre-compiled library, but instead use the one I created. The instructions in the above link did not explain in enough details how to make this work. In hindsight it’s easy, but it took some tries to make me understand it. In short:

  • Do an “npm install –ignore script”
  • Add a file scripts/custom-binary.json into the modules directory for @tensorflow/tfjs-node (this gave me the hint)
  • Run “npm install” in the tfjs-node directory
  • That will download the tensorflow library archive
  • Now do the “npm install” where your application is (which is the only “npm install” you’d usually do)
❯ npm install --ignore-script
❯ pushd .
❯ cd node_modules/@tensorflow/tfjs-node/scripts
❯ cat >>custom-binary.json <<_EOF_
  "tf-lib": "https://MYSERVER.com/libtensorflow-2.8-arm64.tar.gz"
❯ cd ..
❯ npm install
> @tensorflow/tfjs-node@3.16.0 install
> node scripts/install.js

* Downloading libtensorflow
[==============================] 3685756/bps 100% 0.0s
* Building TensorFlow Node.js bindings
❯ popd
❯ npm install


As a benchmark I modified slightly server.js from the tfjs-examples/baseball-node to not listen to the port which means after the training it’ll exit. Then run this on the VIM3L (S905D3), my ThinkCentre m75q (Ryzen 5), and my HP T620 (GX-420CA) once with CPU backend (tfjs) and once with the C++ TF library (tfjs-node):

Amlogic S905D3-N0N @ 1.9GHzcpu803
Amlogic S905D3-N0N @ 1.9GHztensorflow189
AMD Ryzen 5 PRO 3400GE @ 3.3GHzcpu122
AMD Ryzen 5 PRO 3400GE @ 3.3GHztensorflow36
AMD GX-420CA @ 2GHzcpu530
AMD GX-420CA @ 2 GHztensorflow119
All running Node.js 16.x

I did not expect Node.js to be just 4 times slower than C++. Really impressive. Still, using tfjs-node makes a lot of sense. While on x86_64 this was not an issue, with above instructions it’s doable on arm64 too.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.