Deno and FFI – How much Overhead?

Deno can use shared libraries which follow the normal C convention, thus allowing Deno programs to call C functions. Since there’s no way to create raw Ethernet packets within Deno and I found no library doing this, I think I’ll have to create it myself similar to what I did with Dart and its FFI.

But first I wanted to know what the overhead of calling a C function in a shared library is, as this GitHub issue suggests it is very slow.

Time to measure! So I created a simple C function to add up bytes in an array:

#include <stdint.h>

int sum(const uint8_t *p, int count) {
        int res = 0;
        for (int i = 0; i < count; ++i) {
                res += p[i];
        }
        return res;
}
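
To call this from Deno, the shared library first has to be opened via Deno.dlopen. A minimal sketch of that setup, assuming the C code is compiled into libsum.so (the file name, build command, and test buffer are my assumptions, not part of the original measurement):

// Assumption: the C code was built into ./libsum.so,
// e.g. with: cc -O2 -fPIC -shared -o libsum.so sum.c
// Run with --allow-ffi (and --unstable on this Deno version).
const dylib = Deno.dlopen("./libsum.so", {
  sum: { parameters: ["buffer", "i32"], result: "i32" },
});

// 4 KiB test buffer for the benchmark loop below
const buf = new Uint8Array(4096).fill(1);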

This function is called with varying values of count, from 1 to 4096. The benchmark loop in Deno:

let start, end, result;
for (let size = 1; size <= buf.length; size *= 2) {
  start = performance.now();
  for (let i = 0; i < 1_000_000; ++i) {
    result = dylib.symbols.sum(buf, size);
  }
  end = performance.now();
  console.log(`Time used for ${size} bytes: ${end - start}ns`);
}

performance.now() returns a timestamp in ms; since the test call is done 1M times, the printed result directly shows the per-call time in ns:

Bytes | AMD GX-420CA | AMD Ryzen 5 3400GE | RK3328 @1.3GHz | Khadas VIM3L S905D3 @1.9GHz | AWS C6G
4 | 38 | 12 | 1274 | 864 | 228
8 | 56 | 18 | 1342 | 902 | 232
16 | 90 | 34 | 1464 | 982 | 252
32 | 174 | 64 | 1698 | 1134 | 292
64 | 310 | 140 | 2170 | 1436 | 386
128 | 584 | 268 | 3120 | 2042 | 556
256 | 1130 | 544 | 4998 | 3252 | 904
512 | 2224 | 1022 | 8764 | 5678 | 1620
1024 | 4410 | 2048 | 16304 | 10520 | 3014
2048 | 8800 | 4072 | 31378 | 20200 | 5808
4096 | 17554 | 8076 | 61690 | 39598 | 11374
Overhead | 23 | 14 | 1213 | 830 | 212

(all times in ns per FFI call)
Calling FFI from Deno with increasing amount of work done inside the FFI, Deno version 1.29.3 (ARM: 1.29.4)

Linear regression shows 23ns and 14ns overhead (extrapolated to size=0) for the x86_64 CPUs. Note how nicely the time increases with larger payloads. The ARM CPUs start to show a linear increase only at about 128 bytes, and their overhead is quite a lot higher: 830ns for the S905D3 and 212ns for the AWS C6G.

Given that one 1500 byte Ethernet frame at 1 GBit/s takes 12μs, the overhead for the slower AMD CPU is only about 0.2%, which is very acceptable. Even for a more typical 500 byte frame (128 pixels, 3 colors, plus a bit of overhead), the overhead is only 0.6%.
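
For reference, the arithmetic behind these percentages, using the numbers from the table above:

// 1 bit at 1 GBit/s takes 1 ns, so a 1500 byte frame takes 1500 * 8 = 12,000 ns = 12 µs
const frameTimeNs = 1500 * 8;
const overheadNs = 23; // FFI overhead of the AMD GX-420CA (see table)
console.log(`${(overheadNs / frameTimeNs * 100).toFixed(2)} %`); // ≈ 0.19 %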

The ARM CPUs have significantly more overhead (7% for a 1500 byte frame for the S905D3, and 20% for a 500 byte frame). Even using a server type ARM CPU does not improve it by much.

Appendix

Deno version: 1.29.3 for x86_64, and 1.29.4 for ARMv8 compiled via

cargo install deno --locked

BASIC Benchmarks

Found this Wikipedia article about BASIC benchmarks and it had some run times for some old computers I used before. E.g. benchmark 7 took 21.1s on a BBC Micro, which was particularly fast. A C64 took 47.5s.

How long does a current computer take for this kind of work?

I have no BASIC, but JavaScript is kinda similar: it’s often the first language people learn programming with. So let’s see how long that takes (after translating the BASIC program into JavaScript):

function doNothing() {
    return;
}

function bench7() {
    let k = 0;
    let m = [];
    do {
        ++k;
        let a = k / 2 * 3 + 4 - 5;
        doNothing();
        for (let l = 0; l < 5; ++l) {
            m[l] = a;
        }
    } while (k < 1000);
}

function manyBench(n) {
    console.log("S");
    for (let i=0; i<n; ++i) {
        bench7();
    }
    console.log("E");
}

manyBench(500000);

Running this took not that long:

❯ time node benchmark7.js
S
E
node benchmark7.js  2.82s user 0.02s system 99% cpu 2.845 total

That’s for 500,000 runs though, so each benchmark run takes about 0.0056ms (5.6µs) on my low-end PC (Ryzen 5 Pro 3400GE). That’s over 3.7M times faster than the BBC Micro.
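
A quick sanity check of those numbers, using the measured total time:

// 500,000 bench7() runs in 2.845 s wall clock
const totalSeconds = 2.845;
const runs = 500_000;
const perRunMs = (totalSeconds / runs) * 1000;      // ≈ 0.0057 ms per run
const bbcSeconds = 21.1;                            // BBC Micro time for benchmark 7
const speedup = bbcSeconds / (totalSeconds / runs); // ≈ 3.7 million
console.log(perRunMs, speedup);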

And before anyone mentions it: yes, any modern compiler will optimize the whole benchmark away since no useful output or calculation is done. I am not sure how much Node.js (or rather the V8 engine) removes. Making the code less do-nothing-like and taking the number of loops from the command line did not increase the run time beyond what I would have expected from the additional code, so I concluded that the code is executed as-is and no parts have been optimized away.

Dart & Pool

Tip of the day: If you use Dart and want to use the Pool library, don’t expect much help from Google when searching for those keywords: you get exactly the results you’d expect for “dart” and “pool”, just not about the library. Adding “future” or “async” helps.

Anyway, the point of this post is a small example how to use a Pool to run commands in parallel, but not too many concurrently.

import 'dart:io';
import 'package:pool/pool.dart';
import 'package:test/test.dart';

Future<ProcessResult> runCommand(String command, List<String> args) {
  return Process.run(command, args);
}

void main() {
  test('Run 20 slow date commands', () async {
    var pool = Pool(5);
    List<Future<ProcessResult>> results = [];
    for (var i = 0; i < 20; ++i) {
      results.add(pool.withResource(() => runCommand('./date_slow', ['+%N'])));
    }
    pool.close();
    await pool.done;
    for (var process in results) {
      var res = await process;
      print(res.stdout);
    }
  });
}

./date_slow is a simple script which returns something on STDOUT and finishes in 2s:

#!/bin/bash
date $*
sleep 2

What happens then is that Pool(5) creates a pool with 5 slots. The first for loop tries to run 20 commands though and it’ll queue all 20 immediately, but only 5 at most will run. The rest simply waits until it’s their turn.

pool.close() stops any new entries and the await pool.done simply waits until the pool is closed and all jobs are executed.

The 2nd for loop (with the print() statement) uses await to get the ProcessResult from the Future<ProcessResult> which Process.run() returns and which is stored in the results list.

The outcome here is that if the pool has 5 slots and each command runs for 2 seconds, the complete set of 20 jobs runs in 20/5*2=8 seconds. If I make the pool 10 slots large, it runs in 20/10*2=4 seconds, and with 20 or more slots it takes 2 seconds. And it never runs more processes than there are slots.

Why do I need this? I have a list of URLs, from a few to hundreds, which I need to query. While I can query many concurrently, each instance takes up a non-trivial amount of memory since it’s an external program. At 100 concurrent calls it currently uses up all memory on a 16GB RAM notebook. While there are many ways to work around this (doing one at a time is the safest and slowest one), using a pool is perfect: it runs many commands in parallel, but I can limit how many run at the same time.

TensorFlow on arm64

The VIM3L (Cortex-A55 cores) I have has an NPU accelerator built in. An interesting article about it is here; however, before doing fancy NPU stuff, let’s get TensorFlow working first. Should be easy.

Famous last words. It turns out that “pip install tensorflow” does not work: on arm64 (AKA aarch64 AKA ARMv8) TensorFlow is not officially supported. So I had to compile it myself first.

Compiling TensorFlow

https://www.tensorflow.org/install/source describes the compile process reasonably well. It is missing a lot of details though, so here is a more detailed walk-through. Start with an Ubuntu 20.xx image with an extra 70 GB disk for the TensorFlow source code:

# One-time action: for the data disk, create a volume and a filesystem
# to mount under /data

sudo bash
pvcreate /dev/nvme1n1
vgcreate vg_data /dev/nvme1n1
lvcreate -L69G -n data vg_data
mke2fs -j /dev/vg_data/data
mkdir /data
echo -e '/dev/mapper/vg_data-data\t/data\text4\tdefaults\t0 1' >>/etc/fstab
mount /data
chown ubuntu:users /data
umount /data
exit

sudo apt update
sudo apt -y upgrade
sudo reboot

After a reboot, you now have a /data of about 70GB.

sudo apt -y install build-essential python3 python3-dev python3-venv pkg-config zip zlib1g-dev unzip curl tmux wget vim git htop liblapack3 libblas3 libhdf5-dev openjdk-11-jdk

# Get bazel

wget https://github.com/bazelbuild/bazel/releases/download/4.2.2/bazel-4.2.2-linux-arm64
chmod a+x bazel-4.2.2-linux-arm64
sudo cp bazel-4.2.2-linux-arm64 /usr/local/bin/bazel

# bazel uses ~/.cache/bazel

mkdir -p /data/.cache/bazel
ln -s /data/.cache/bazel ~/.cache/bazel

# Build a Python 3 virtual environment

python3 -m venv ~/venv
source ~/venv/bin/activate
pip install wheel packaging
pip install six mock numpy grpcio h5py
pip install keras_applications --no-deps
pip install keras_preprocessing --no-deps

# Get TensorFlow source

cd /data
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow/
git checkout r2.8
cd /data/tensorflow

./configure

# Build Python package:

bazel build -c opt \
--copt=-O3 \
--copt=-std=c++11 \
--copt=-funsafe-math-optimizations \
--copt=-ftree-vectorize \
--copt=-fomit-frame-pointer \
--copt=-DRASPBERRY_PI \
--host_copt=-DRASPBERRY_PI \
--verbose_failures \
--config=noaws \
--config=nogcp \
//tensorflow/tools/pip_package:build_pip_package

# Build Python whl:

BDIST_OPTS="--universal" bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/tensorflow_pkg

# And for tfjs:
# (see https://github.com/tensorflow/tfjs/tree/master/tfjs-node)

bazel build --config=opt --config=monolithic //tensorflow/tools/lib_package:libtensorflow
# The result is at bazel-bin/tensorflow/tools/lib_package/libtensorflow.tar.gz

It does take a lot of time (about 2-3h each for the Python package and the tfjs-node library). When I tried 2 CPUs and 8 GB RAM, some compiler runs were killed as they ran out of memory; 4 CPUs and 16 GB RAM worked fine. Thus an AWS m6g.xlarge is recommended; an m6g.large failed to build.

Using spot instances for the m6g.xlarge (regular $0.154/h, spot price $0.04/h) helped a bit to limit the financial impact.

Python and TensorFlow

It took me several tries:

  • When using Ubuntu 22.04 to compile TF, the resulting binary wanted GLIBC 2.35 which my VIM3L did not have. It had 2.31. It also used Python 3.10 to compile.
  • When using Ubuntu 20.04, it used Python 3.8 to compile. My VIM3L had Python 3.9. While GLIBC was fine, Python was not.
  • The created whl file could be loaded and used on the machine I compiled it on. No Python or GLIBC version problems here.

That covered all my Python needs. Now moving to the main target:

Node.js and TensorFlow

tfjs-node uses the libtensorflow.so library, so that should remove some of the CPython version problems I have seen. Compiling was easy now: https://github.com/tensorflow/tfjs/tree/master/tfjs-node#optional-build-optimal-tensorflow-from-source is spot on.

The biggest problem was to make Node.js not try to download the non-existent arm64 pre-compiled library, but instead use the one I created. The instructions in the above link do not explain in enough detail how to make this work. In hindsight it’s easy, but it took some tries until I understood it. In short:

  • Do an “npm install --ignore-scripts”
  • Add a file scripts/custom-binary.json into the modules directory for @tensorflow/tfjs-node (this gave me the hint)
  • Run “npm install” in the tfjs-node directory
  • That will download the tensorflow library archive
  • Now do the “npm install” where your application is (which is the only “npm install” you’d usually do)
❯ npm install --ignore-scripts
❯ pushd .
❯ cd node_modules/@tensorflow/tfjs-node/scripts
❯ cat >>custom-binary.json <<_EOF_
{
  "tf-lib": "https://MYSERVER.com/libtensorflow-2.8-arm64.tar.gz"
}
_EOF_
❯ cd ..
❯ npm install
[...]
> @tensorflow/tfjs-node@3.16.0 install
> node scripts/install.js

CPU-linux-3.16.0.tar.gz
* Downloading libtensorflow
https://MYSERVER/libtensorflow-2.8-arm64.tar.gz
[==============================] 3685756/bps 100% 0.0s
* Building TensorFlow Node.js bindings
[...]
❯ popd
❯ npm install

Benchmarks

As a benchmark I slightly modified server.js from tfjs-examples/baseball-node to not listen on the port, which means it exits after the training. Then I ran this on the VIM3L (S905D3), my ThinkCentre m75q (Ryzen 5), and my HP T620 (GX-420CA), once with the CPU backend (tfjs) and once with the C++ TF library (tfjs-node):

CPU | Backend | Time/s
Amlogic S905D3-N0N @ 1.9GHz | cpu | 803
Amlogic S905D3-N0N @ 1.9GHz | tensorflow | 189
AMD Ryzen 5 PRO 3400GE @ 3.3GHz | cpu | 122
AMD Ryzen 5 PRO 3400GE @ 3.3GHz | tensorflow | 36
AMD GX-420CA @ 2GHz | cpu | 530
AMD GX-420CA @ 2GHz | tensorflow | 119
All running Node.js 16.x

I did not expect Node.js to be only about 4 times slower than C++. Really impressive. Still, using tfjs-node makes a lot of sense. While on x86_64 this was never an issue, with the above instructions it’s doable on arm64 too.
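
For completeness, the only difference between the two rows per CPU is which package the benchmark loads; a minimal sketch (the training code itself stays unchanged):

// Pure JavaScript backend ("cpu" in the table):
// const tf = require('@tensorflow/tfjs');
// Native libtensorflow bindings ("tensorflow" in the table):
const tf = require('@tensorflow/tfjs-node');

console.log(tf.getBackend()); // "tensorflow" with tfjs-node, "cpu" with plain tfjs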

Dart HTTPS Server

In my previous post I used a simple HTTPS server written in Node.js and I was curious what that would look like in Dart. It’s very short too:

import 'dart:io';

Future<void> main() async {
  var chain = Platform.script.resolve('cert.pem').toFilePath();
  var key = Platform.script.resolve('key.pem').toFilePath();
  var context = SecurityContext()
    ..useCertificateChain(chain)
    ..usePrivateKey(key);
  var server = await HttpServer.bindSecure(InternetAddress.anyIPv4, 8080, context);
  await server.forEach((HttpRequest request) {
    print("${request.method} ${request.uri.path}");
    print(request.headers);
    request.response.close();
  });
}

Works just as well as the Node.js counterpart.

TP-Link Kasa KC120 – Streaming without Kasa

The main problems I have with IoT devices are:

  • They might send data home without me knowing about it
    • But I can monitor their traffic pattern and if they send home way more data than expected, I could disconnect them
  • They might be vulnerable to exploits
    • But I can put them on a separate VLAN at home so they don’t see other devices unless I allow it (via firewall rules)
    • I can sometimes update the firmware (definitely a problem after a few years)
  • They stop working when the company turns off its servers
    • I am able to use them without Internet connectivity

Most Kasa products I own (power switches) are supported by various projects like Home Assistant or python-kasa, so turning on my Kasa power switch on my own is a simple task. Same for my LIFX light bulbs; there’s even an official API for those.

The TP-Link KC120 camera however does not have any supported local API and contrary to my expectation, it does not support a local stream mode via a web browser interface. I can watch a live (and local) video stream via the Kasa application on the phone, but that functionality is at the mercy of TP-Link. I don’t like that.

Following are the steps to get local streaming (or recording) for the KC120. With that it’s possible to do whatever I’d like with the stream: publishing on the Internet, processing via OpenCV, local archiving, etc.

python-kasa

python-kasa does not support the camera, so you won’t see it during a normal discovery:

❯ kasa
No host name given, trying discovery..
Discovering devices on 255.255.255.255 for 3 seconds
== Plug Three - HS105(JP) ==
        Host: 192.168.21.180
        Device state: OFF

        == Generic information ==
        Time:         2022-05-03 11:37:55 (tz: {'index': 90, 'err_code': 0}
        Hardware:     2.1
        Software:     1.0.3 Build 210506 Rel.161924
        MAC (rssi):   10:27:F5:XX:XX:XX (-62)
        Location:     {'latitude': XX.0, 'longitude': XX.0}

        == Device specific information ==
        LED state: True
        On since: None

        == Modules ==
        + <Module Schedule (schedule) for 192.168.21.130>
        + <Module Usage (schedule) for 192.168.21.130>
        + <Module Antitheft (anti_theft) for 192.168.21.130>
        + <Module Time (time) for 192.168.21.130>
        + <Module Cloud (cnCloud) for 192.168.21.130>

== Plug One - HS105(JP) ==
        Host: 192.168.21.182
        Device state: OFF

        == Generic information ==
        Time:         2022-05-03 11:37:55 (tz: {'index': 90, 'err_code': 0}
        Hardware:     1.0
        Software:     1.5.8 Build 191125 Rel.135255
        MAC (rssi):   B0:BE:76:XX:XX:XX (-54)
        Location:     {'latitude': XX.0, 'longitude': XX.0}

        == Device specific information ==
        LED state: True
        On since: None

        == Modules ==
        + <Module Schedule (schedule) for 192.168.21.182>
        + <Module Usage (schedule) for 192.168.21.182>
        + <Module Antitheft (anti_theft) for 192.168.21.182>
        + <Module Time (time) for 192.168.21.182>
        + <Module Cloud (cnCloud) for 192.168.21.182>

But the camera shows up when adding the -d (debug) switch, although it is then ignored since the tool does not know how to handle it:

❯ kasa -d
No host name given, trying discovery..
Discovering devices on 255.255.255.255 for 3 seconds
DEBUG:kasa.discover:[DISCOVERY] ('255.255.255.255', 9999) >> {'system': {'get_sysinfo': None}}
DEBUG:kasa.discover:Waiting 3 seconds for responses...
[...]
DEBUG:kasa.discover:Unable to find device type from {'system': {'get_sysinfo': {'err_code': 0, 'system': {'sw_ver': '2.3.6 Build 20XXXXXX rel.XXXXX', 'hw_ver': '1.0', 'model': 'KC120(EU)', 'hwId': 'CBXXXXD5XXXXDEEFA98A18XXXXXX65CD', 'oemId': 'A2XXXX60XXXX108AD36597XXXXXX572D', 'deviceId': '80XXXX88XXXX76XXXX88XXXXX3AXXXXXXXXXXXB6', 'dev_name': 'Kasa Cam', 'c_opt': [0, 1], 'f_list': [], 'a_type': 2, 'type': 'IOT.IPCAMERA', 'alias': 'Camera', 'mic_mac': 'D80D17XXXXXX', 'mac': 'D8:0D:17:XX:XX:XX', 'longitude': XX, 'latitude': XX, 'rssi': -38, 'system_time': 1651545748, 'led_status': 'on', 'updating': False, 'status': 'configured', 'resolution': '720P', 'camera_switch': 'on', 'bind_status': True, 'last_activity_timestamp': 1651545210}}}}: Unable to find the device type field!
[...]

Important fields here are the deviceId and the MAC address: via the MAC address you can find out which IP address the camera has (if you use DHCP). In my case 192.168.21.187 is the camera’s IP address.

nmap

A default nmap scan shows only port 9999 open, which is the known TP-Link debug port. But a full port scan reveals more:

❯ sudo nmap -p- 192.168.21.187
Starting Nmap 7.80 ( https://nmap.org ) at 2022-05-03 11:51 JST
Nmap scan report for kc120.lan (192.168.21.187)
Host is up (0.012s latency).
Not shown: 65531 closed ports
PORT      STATE SERVICE
9999/tcp  open  abyss
10443/tcp open  unknown
18443/tcp open  unknown
19443/tcp open  unknown
MAC Address: D8:0D:17:XX:XX:XX (Tp-link Technologies)

Nmap done: 1 IP address (1 host up) scanned in 9.28 seconds

And with that port information I found this article: https://medium.com/@hu3vjeen/reverse-engineering-tp-link-kc100-bac4641bf1cd. It’s about a slightly different camera model, but since the ports match, maybe more does too.

I followed it; however, I could not get the authentication working: the Kasa account password, as used in the article, did not work. Time to do some ARP spoofing to see what the Android app uses to authenticate! Geistless did a great job explaining the steps he took.

My overall plan:

  1. Redirect the traffic from the Kasa app on the phone to my Linux machine (via arpspoof)
  2. Redirect the incoming HTTPS traffic to my HTTPS server (via iptables)
  3. Print the URL and headers for incoming HTTPS traffic which arrives at my HTTPS server

arpspoof

The dsniff package contains arpspoof:

❯ sudo apt install dsniff
[...]
❯ sudo setcap CAP_NET_RAW+ep /usr/sbin/arpspoof

My HTTPS Server

While the original author wrote an HTTPS server as part of his Rust learning, I created a Node.js version. But first we’ll need keys; self-signed is fine:

❯ openssl genrsa -out key.pem
❯ openssl req -new -key key.pem -out csr.pem
❯ openssl x509 -req -days 999 -in csr.pem -signkey key.pem -out cert.pem
❯ rm csr.pem

Now the simple HTTPS server listening on port 8080:

const https = require('https');
const fs = require('fs');

const options = {
  key: fs.readFileSync('key.pem'),
  cert: fs.readFileSync('cert.pem')
};

https.createServer(options, function (req, res) {
  console.log(req.url);
  console.log(req.headers);
  res.writeHead(200);
  res.end("");
}).listen(8080);

Some IP traffic routing rules to redirect all incoming TCP traffic on enp1s0 for ports 10443, 18443 and 19443 to port 8080:

❯ sudo iptables -t nat -A PREROUTING -i enp1s0 -p tcp --dport 10443 -j REDIRECT --to-port 8080
❯ sudo iptables -t nat -A PREROUTING -i enp1s0 -p tcp --dport 18443 -j REDIRECT --to-port 8080
❯ sudo iptables -t nat -A PREROUTING -i enp1s0 -p tcp --dport 19443 -j REDIRECT --to-port 8080
❯ sudo sysctl net.ipv4.ip_forward=1

Now run the https server and watch it display the URL and the headers for an incoming request on port 19443:

❯ node ./https.js

and to test, on another machine I ran:

$ curl -k -u admin:abc 'https://t621.lan:19443/test?a=3&b=5'

and this is the output of my https server:

/test?a=3&b=5
{
  host: 't621.lan:19443',
  authorization: 'Basic YWRtaW46YWJj',
  'user-agent': 'curl/7.68.0',
  accept: '*/*'
}

The basic authentication is base64 encoded. To decode:

❯ echo YWRtaW46YWJj | base64 -d
admin:abc

So that works. Now putting it all together.

  • Start the Kasa app on the phone. Make sure the KC120 is enabled and can display a live video stream. Stop the stream.
  • Have the iptables redirect rules in place. And IP forwarding in the kernel.
  • Start the HTTPS server.
  • Run arpspoof. 192.168.21.55 is the phone’s IP which runs the Kasa application. 192.168.21.187 is the IP of the KC120.
❯ arpspoof -i enp1s0 -t 192.168.21.55 192.168.21.187
7c:d3:a:xx:xx:xx 38:78:62:xx:xx:xx 0806 42: arp reply 192.168.21.187 is-at 7c:d3:a:xx:xx:xx
  • On the mobile app, try to connect to the video stream of the KC120 again
  • You should now see some output of the HTTPS server:
/https/stream/mixed?video=H264&audio=G711
{
  authorization: 'Basic aXXXXXXXXXXXXXXXXM=',
  connection: 'keep-alive',
  'user-agent': 'Dalvik/2.1.0 (Linux; U; Android 10; H8296 Build/52.1.A.3.49)',
  host: '192.168.21.187:19443',
  'accept-encoding': 'gzip'
}

And then I finally had the authentication string the camera wanted!

❯ echo 'aXXXXXXXXXXXXXXXXM=' | base64 -d
MY_KASA_ACCOUNT:THE_CAMERA_PASSWORD

Turns out that the password to use was not the Kasa password: it’s a longish string of hex digits. That might be a KC120 specialty or it might depend on the firmware version. I cannot say since I have no KC100, but whatever the password is, it’s possible to find it out relatively easily using the above approach.

The Result: Local Streaming!

I can connect to the video stream! And with very little CPU usage too.

❯ curl -k -u 'MY_KASA_ACCOUNT:THE_CAMERA_PASSWORD' \
--ignore-content-length \
"https://192.168.21.187:19443/https/stream/mixed?video=h264&audio=g711&resolution=hd&deviceId=80XXXX88XXXX76XXXX88XXXXX3AXXXXXXXXXXXB6" \
--output - | ffmpeg -hide_banner -y -i - -vcodec copy kc120stream.mp4

To change the resolution, change it in the Kasa app. 1920×1080 (1.4Mbit/s), 1280×720 (850kbit/s) and 640×360 (350kbit/s) are possible.

TODO

  • There is no audio coming from the camera. Audio works on the Kasa app.
  • It would also be nice to understand how to change the configuration of the camera (e.g. the resolution), but it’s ok to set it once via the Kasa app.
  • What options do the parameters video, audio and resolution support?

Moving Things

Servos are great to rotate/move things around, but they are limited in their capabilities. Steppers are more versatile and controlling them is not hard with the help of stepper driver modules. But since they do expect a fairly high rate of step pulses, a dedicated controller is needed. This is a solved problem though: GRBL takes care of that and it accepts G-Code which looks like this:

G0X100

This moves the X axis to the 100mm position. Generating movement is simply a matter of sending a stream of such strings. Sending a

G0X100.5

one second later results in a moving speed of 0.5mm/s. A nice part of GRBL is that it also controls acceleration and deceleration, which is important for moving heavy objects over long distances at high speed.
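
As a small sketch of that idea (jumping ahead to the network interface set up later in this post): stream a new X target once per second to get a slow, controlled movement. The IP address, step size, and end position here are made up for the example:

// Sketch: send a slightly larger X target every second => ~0.5 mm/s movement.
// Assumes a FluidNC/GRBL controller reachable via telnet on port 23.
const net = require('net');

const client = net.connect(23, '192.168.3.18', () => {
  let x = 0;
  const timer = setInterval(() => {
    x += 0.5;
    client.write(`G0X${x.toFixed(1)}\n`);
    if (x >= 100) {       // stop after 100 mm
      clearInterval(timer);
      client.end();
    }
  }, 1000);
});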

But traditional GRBL uses an Arduino which is not network connected. Luckily GRBL was ported to the ESP32 CPU with its WiFi interface. Even better: FluidNC was created improving on a lot of areas, like configuration (no need to recompile for a config change) and connectivity (IP or Bluetooth and of course serial).

Naturally that looked like an interesting thing to try out.

Hardware

  • A NEMA17 stepper (200 steps/rotation) with a timing belt moving a slider along an aluminium profile
  • A stepper driver (a DRV8825, I think)
  • Makerbase MKS DLC32
  • An end-stop sensor (a microswitch in my case)

Configuration

  • Get the FluidNC firmware from here
  • Erase the FLASH on the ESP32 with the included erase script (on Windows: run erase.bat)
  • Flash the WiFi version (on Windows: run install-wifi.bat)
  • You should now be able to connect via fluidterm.bat; this is very helpful for any debugging as you can see the boot process and early errors.
  • Configure WiFi according to this. That should be it, as most defaults are sensible and there is not much to configure besides the SSID and the password:
$Sta/SSID=myssid
$Sta/Password=mypasswordforthessid
  • Reboot ($Bye) and check network parameters ($I):
$I
[VER:3.4 FluidNC v3.4.3:]
[OPT:PHS]
[MSG: Machine: Slider]
[MSG: Mode=STA:SSID=myssid:Status=Connected:IP=192.168.3.18:MAC=66-55-44-33-22-11]
ok
  • Connect to the Web UI at http://192.168.3.18 (the IP you get via $I obviously)
  • Upload a configuration file for the MKS DLC32 and the hardware setup you have. In my case I only use the x-axis, so my config file looks like the one below. It’s almost 100% the example config; the main changes are:
    • idle_ms=255, which keeps the stepper powered forever so it can hold things in place
    • steps_per_mm and max_travel_mm for the x-axis to match my hardware
    • homing turned off for the y and z axes since I don’t use them
board: MKS-DLC32 V2.1
name: Slider
meta: (01.01.2022) by Skorpi

kinematics:
  Cartesian:

stepping:
  engine: I2S_STREAM
  idle_ms: 255
  pulse_us: 4
  dir_delay_us: 1
  disable_delay_us: 0
axes:
  shared_stepper_disable_pin: I2SO.0
  x:
    steps_per_mm: 40.7
    max_rate_mm_per_min: 15000.000
    acceleration_mm_per_sec2: 500.000
    max_travel_mm: 440.000
    soft_limits: true
    homing:
      cycle: 1
      positive_direction: false
      mpos_mm: 0.000
      feed_mm_per_min: 300.000
      seek_mm_per_min: 5000.000
      settle_ms: 500
      seek_scaler: 1.100
      feed_scaler: 1.100

    motor0:
      limit_neg_pin: gpio.36
      hard_limits: true
      pulloff_mm: 2.000
      stepstick:
        step_pin: I2SO.1
        direction_pin: I2SO.2

  y:
    steps_per_mm: 428.0
    max_rate_mm_per_min: 12000.000
    acceleration_mm_per_sec2: 300.000
    max_travel_mm: 440.000
    soft_limits: true
    homing:
      cycle: 0
      positive_direction: false
      mpos_mm: 0.000
      feed_mm_per_min: 300.000
      seek_mm_per_min: 5000.000
      settle_ms: 500
      seek_scaler: 1.100
      feed_scaler: 1.100

    motor0:
      limit_neg_pin: gpio.35
      hard_limits: false
      pulloff_mm: 2.000
      stepstick:
        step_pin: I2SO.5
        direction_pin: I2SO.6:low

  z:
    steps_per_mm: 157.750
    max_rate_mm_per_min: 12000.000
    acceleration_mm_per_sec2: 500.000
    max_travel_mm: 80.000
    soft_limits: true
    homing:
      cycle: 0
      positive_direction: false
      mpos_mm: 0.000
      feed_mm_per_min: 300.000
      seek_mm_per_min: 1000.000
      settle_ms: 500
      seek_scaler: 1.100
      feed_scaler: 1.100

    motor0:
      limit_neg_pin: gpio.34
      hard_limits: false
      pulloff_mm: 1.000
      stepstick:
        step_pin: I2SO.3
        direction_pin: I2SO.4

i2so:
  bck_pin: gpio.16
  data_pin: gpio.21
  ws_pin: gpio.17

spi:
  miso_pin: gpio.12
  mosi_pin: gpio.13
  sck_pin: gpio.14

sdcard:
  cs_pin: gpio.15
  card_detect_pin: NO_PIN

control:
  safety_door_pin: NO_PIN
  reset_pin: NO_PIN
  feed_hold_pin: NO_PIN
  cycle_start_pin: NO_PIN
  macro0_pin: gpio.33:low:pu
  macro1_pin: NO_PIN
  macro2_pin: NO_PIN
  macro3_pin: NO_PIN

macros:
  startup_line0:
  startup_line1:
  macro0: $SD/Run=lasertest.gcode
  macro1: $SD/Run=home.gcode
  macro2:
  macro3:

coolant:
  flood_pin: NO_PIN
  mist_pin: NO_PIN
  delay_ms: 0

probe:
  pin: gpio.22
  check_mode_start: true

Laser:
  pwm_hz: 5000
  #L on Beeper / IN on TTL
  output_pin: gpio.32
  enable_pin: I2SO.7
  disable_with_s0: false
  s0_with_disable: false
  tool_num: 0
  speed_map: 0=0.000% 0=12.500% 1700=100.000%
# 135=0mA 270=5mA 400=10mA 700=16mA
user_outputs:
  analog0_pin: NO_PIN
  analog1_pin: NO_PIN
  analog2_pin: NO_PIN
  analog3_pin: NO_PIN
  analog0_hz: 5000
  analog1_hz: 5000
  analog2_hz: 5000
  analog3_hz: 5000
  digital0_pin: NO_PIN
  digital1_pin: NO_PIN
  digital2_pin: NO_PIN
  digital3_pin: NO_PIN

start:
  must_home: false

  • When done, set the configuration file name to the file you just uploaded:
$Config/Filename=config2.yaml

  • Then you have to home once ($H) so the controller knows where everything is (using telnet for a change since the network is up now):
❯ telnet 192.168.3.18 23
Trying 192.168.3.18...
Connected to 192.168.3.18.
Escape character is '^]'.

Grbl 3.4 [FluidNC v3.4.3 (wifi) '$' for help]
$H
ok
?
<Idle|MPos:0.000,0.000,0.000|FS:0,0|Pn:PYZ|Ov:100,100,100>
ok
  • and now you should be able to move the slider via very simple G-Code (x axis to 100mm position):
G0X100
ok
  • If you get an error for the $H command, it’s likely that you don’t have a working end-stop on an axis which is supposed to have one. A quick fix is to use $X to disable end-stop checks. It’ll allow axis movements, but without any checks.

Node.js sending commands

GRBL has no single command to do a slow controlled motion, so in order to do that, a program needs to keep sending G-Code commands to it. Node.js to the rescue! The test program below moves the slider back and forth 2 times and, when done, closes the connection:

// Test to send commands to GRBL (FluidNC)

const net=require('net');

let stateIsIdle=false;
let statusLine='';

function gotALine(s) {
  console.log('Got a line: '+s);
  if (s.startsWith('<Idle|')) {
    if (stateIsIdle==true) {
      console.log('Idle detected again');
      client.end();
      process.exit(0);
    } else {
      stateIsIdle=true;
      console.log('Idle detected');
    }
  }
}

let client=new net.Socket();
client.connect(23, '192.168.21.118', () => { console.log('Got connected'); });
client.on('data', (data) => {
  let s=data.toString();
  if (s.indexOf('\n') < 0) {
    statusLine+=s;
  } else {
    statusLine+=s;
    gotALine(statusLine.trim());
    statusLine='';
  }
});

client.on('close', () => { console.log('Closed connection'); });

function sendStatusRequest() {
  if (client) client.write('?\n');
}

setInterval(sendStatusRequest, 1000);

for (let i=0; i<2; ++i) {
  client.write('G0X0\n');
  client.write('G0X400\n');
}

Problems

  • When requesting a status via ‘?’, the stepper pulses seem to take a short break, which causes a jerky movement. This is very reproducible. Issue created for this. Using I2S_STREAM helps a lot, but it’s not 100% fixed. I2S_STREAM has another problem though…
  • I2S_STREAM seems to be inaccurate: moving 4 times 100mm and then back to 0 leaves the position off by several mm. The same test with I2S_STATIC shows zero error.