TensorFlow on arm64

The VIM3L (Cortex A55 cores) I have has a NPU accelerator built-in. An interesting article about it is here, however before doing fancy NPU stuff, let’s get TensorFlow working first. Should be easy.

Famous last words. Turns out that “pip install tensorflow” does not work: on arm64 (AKA aarch64 AKA ARMv8) TensorFlow is not officially supported. So I had to compile it first.

Compiling TensorFlow

https://www.tensorflow.org/install/source described the compile process reasonable well. It is missing a lot of details though, so here is a more detailed walk-through. Start with a Ubuntu 20.xx image with an extra 70 GB disk for TensorFlow source code:

# One-time action: for the data disk, create a volume and a filesystem
# to mount under /data

sudo bash
pvcreate /dev/nvme1n1
vgcreate vg_data /dev/nvme1n1
lvcreate -L69G -n data vg_data
mke2fs -j /dev/vg_data/data
mkdir /data
echo -e '/dev/mapper/vg_data-data\t/data\text4\tdefaults\t0 1' >>/etc/fstab
mount /data
chown ubuntu:users /data
umount /data
exit

sudo apt update
sudo apt -y upgrade
sudo reboot

After a reboot, you now have a /data of about 70GB.

sudo apt -y install build-essential python3 python3-dev python3-venv pkg-config zip zlib1g-dev unzip curl tmux wget vim git htop liblapack3 libblas3 libhdf5-dev openjdk-11-jdk

# Get bazel

wget https://github.com/bazelbuild/bazel/releases/download/4.2.2/bazel-4.2.2-linux-arm64
chmod a+x bazel-4.2.2-linux-arm64
sudo cp bazel-4.2.2-linux-arm64 /usr/local/bin/bazel

# bazel uses ~/.cache/bazel

mkdir -p /data/.cache/bazel
ln -s /data/.cache/bazel ~/.cache/bazel

# Build a Python 3 virtual environment

python3 -m venv ~/venv
source ~/venv/bin/activate
pip install wheel packaging
pip install six mock numpy grpcio h5py
pip install keras_applications --no-deps
pip install keras_preprocessing --no-deps

# Get TensorFlow source

cd /data
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow/
git checkout r2.8
cd /data/tensorflow

./configure

# Build Python package:

bazel build -c opt \
--copt=-O3 \
--copt=-std=c++11 \
--copt=-funsafe-math-optimizations \
--copt=-ftree-vectorize \
--copt=-fomit-frame-pointer \
--copt=-DRASPBERRY_PI \
--host_copt=-DRASPBERRY_PI \
--verbose_failures \
--config=noaws \
--config=nogcp \
//tensorflow/tools/pip_package:build_pip_package

# Build Python whl:

BDIST_OPTS="--universal" bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/tensorflow_pkg

# And for tfjs:
# (see https://github.com/tensorflow/tfjs/tree/master/tfjs-node)

bazel build --config=opt --config=monolithic //tensorflow/tools/lib_package:libtensorflow
# The result is at bazel-bin/tensorflow/tools/lib_package/libtensorflow.tar.gz

It does take a lot of time (about 2-3h for each the Python package and the tfjs-node library). When I tried 2 CPU and 8 GB RAM, some compiler runs were killed as they were running out of memory. 4 CPU and 16 GB RAM worked fine.Thus AWS m6g.xlarge recommended. m6g.large failed to build.

Using spot instances for the m6g.xlarge (regular $0.154/h, spot price $0.04/h) helped a bit to limit the financial impact.

Python and TensorFlow

It took me several tries:

  • When using Ubuntu 22.04 to compile TF, the resulting binary wanted GLIBC 2.35 which my VIM3L did not have. It had 2.31. It also used Python 3.10 to compile.
  • When using Ubuntu 20.04, it used Python 3.8 to compile. My VIM3L had Python 3.9. While GLIBC was fine, Python was not.
  • The created whl file could be loaded and used on the machine I compiled it on. No Python or GLIBC version problems here.

That covered all my Python needs. Now moving to the main target:

Node.js and TensorFlow

tfjs-node uses the libtensorflow.so library, so that should remove some of the CPython version problems I have seen. Compiling was easy now: https://github.com/tensorflow/tfjs/tree/master/tfjs-node#optional-build-optimal-tensorflow-from-source is spot on.

The biggest problem was to make Node.js understand to not use the non-existing arm64 pre-compiled library, but instead use the one I created. The instructions in the above link did not explain in enough details how to make this work. In hindsight it’s easy, but it took some tries to make me understand it. In short:

  • Do an “npm install –ignore script”
  • Add a file scripts/custom-binary.json into the modules directory for @tensorflow/tfjs-node (this gave me the hint)
  • Run “npm install” in the tfjs-node directory
  • That will download the tensorflow library archive
  • Now do the “npm install” where your application is (which is the only “npm install” you’d usually do)
❯ npm install --ignore-script
❯ pushd .
❯ cd node_modules/@tensorflow/tfjs-node/scripts
❯ cat >>custom-binary.json <<_EOF_
{
  "tf-lib": "https://MYSERVER.com/libtensorflow-2.8-arm64.tar.gz"
}
_EOF_
❯ cd ..
❯ npm install
[...]
> @tensorflow/tfjs-node@3.16.0 install
> node scripts/install.js

CPU-linux-3.16.0.tar.gz
* Downloading libtensorflow
https://MYSERVER/libtensorflow-2.8-arm64.tar.gz
[==============================] 3685756/bps 100% 0.0s
* Building TensorFlow Node.js bindings
[...]
❯ popd
❯ npm install

Benchmarks

As a benchmark I modified slightly server.js from the tfjs-examples/baseball-node to not listen to the port which means after the training it’ll exit. Then run this on the VIM3L (S905D3), my ThinkCentre m75q (Ryzen 5), and my HP T620 (GX-420CA) once with CPU backend (tfjs) and once with the C++ TF library (tfjs-node):

CPUBackendTime/s
Amlogic S905D3-N0N @ 1.9GHzcpu803
Amlogic S905D3-N0N @ 1.9GHztensorflow189
AMD Ryzen 5 PRO 3400GE @ 3.3GHzcpu122
AMD Ryzen 5 PRO 3400GE @ 3.3GHztensorflow36
AMD GX-420CA @ 2GHzcpu530
AMD GX-420CA @ 2 GHztensorflow119
All running Node.js 16.x

I did not expect Node.js to be just 4 times slower than C++. Really impressive. Still, using tfjs-node makes a lot of sense. While on x86_64 this was not an issue, with above instructions it’s doable on arm64 too.

XKCD Userscript

Learning is easiest with a goal. The goal here was to recreate this and expand it a bit. Nothing earth-shaking. Still learning how to use querySelector().

To install, have Tampermonkey installed and click here.

Here the script:

// ==UserScript==
// @name         Display XKCD IMG title
// @namespace    https://hkubota.wordpress.com/
// @downloadURL  https://raw.githubusercontent.com/haraldkubota/xkcd-userscript/main/display_title.user.js
// @supportURL   https://hkubota.wordpress.com/2021/09/04/xkcd-userscript/
// @version      0.1
// @description  Show the IMG title and Explain XKCD Link
// @author       Harald Kubota
// @match        http*://*xkcd.com/*
// @icon         https://www.google.com/s2/favicons?domain=xkcd.com
// @grant        none
// ==/UserScript==

(function() {
    window.addEventListener('load', () => {
        const titleElement = document.querySelector('#comic [title]');
        if (titleElement) {
            const title = titleElement.title;
            let p = document.createElement('p');
            p.innerText = title;
            document.querySelector('#comic').append(p);
            const mystyle = {
                "font-variant": "none",
                "background": "lightgray",
                "padding": "10px"
            };
            Object.assign(p.style, mystyle);
        }
        const currentComic = document.querySelector('div#middleContainer.box > a');
        if (currentComic) {
            let uriPath = currentComic.text.split('/');
            let currentComicNumber=1;
            for (let i=uriPath.length-1; i>=0; --i) {
                let n = parseInt(uriPath[i]);
                if (!isNaN(n)) {
                    currentComicNumber = n;
                    break;
                }
            }
            let div = document.createElement('div');
            let a = document.createElement('a');
            var link = document.createTextNode('https://www.explainxkcd.com/wiki/index.php/'+currentComicNumber);
            a.appendChild(link);
            a.setAttribute('href', 'https://www.explainxkcd.com/wiki/index.php/'+currentComicNumber);
            div.appendChild(a);
            document.querySelector('#comic').append(div);
        }
    }, false);
})();

Now it looks like this:

The elements with the red arrows are added via above userscript

Deno and ES6 Modules

Continuing my expedition into the JavaScript front-end land…I like Deno. A lot. It does lots of things right (from my point of view, e.g. permissions), and it fixes many problem Node.js has. Using Deno for back-end work is straightforward, but since I focus currently on the front-end side, how can I create front-end JavaScript with Deno and still use Deno’s testing framework?

It’s actually not hard. Here a small TypeScript file:

import capitalize from "https://unpkg.com/lodash-es@4.17.15/capitalize.js";

function main() {
  console.log(capitalize("hello from the web browser"));
}

function sum(a: number, b: number): number {
    return a+b;
}

window.onload = () => {
  console.info(capitalize("module loaded!"));
};

export { main, sum }                                                                                                   

embedded in a simple index.html file:

<!DOCTYPE html>
<html>
<head>
    <title>Testing Deno</title>
</head>
<body>
    <div>Some Text</div>
<script type="module">
    import {main, sum} from './dist/browser.js';
    main();
    console.log(sum(5,6));
    console.log("Done.");
</script>
</body>
</html>

So far that’s straightforward: loading index.html runs main() and prints out the sum of 5+6 on the console.

Here the test script:

import { assertEquals } from "https://deno.land/std@0.106.0/testing/asserts.ts";

import { sum } from "./example.ts";

Deno.test("Sum 5 and 7", () => {
  assertEquals(sum(5, 7), 12);
});

Again, standard Deno. And deno test works as expected. And to deploy as plain-JavaScript, a simple deno bundle src/example.ts dist/browser.js does the trick.

And there you go: plain TypeScript and yet the logic can be tested with the normal deno test command. No extra tools needed. No Babel, WebPack, tsc. No require vs import either.

deno_dom is not yet working enough though (see the stub definition of addEventListener() here), so until then only basic DOM operations work.

Update: This commit added addEventListener()