LLMs are better than I thought

ML, AI, GenAI, LLM…they are all similar, but with additional size come more capabilities. I don’t care about making videos or pictures (impressive as some results might be), but reading free text and making it computer-compatible is useful for me.

Here’s my prompt:

Read a shopping list from $INPUT and output a JSON list of how many or how much, unit, and what to buy.
The output JSON should be a list of { "amount": amount, "unit": unit, "item": item }.
Convert all units into standard SI units.
Some items do not have units. Threat those as a simple counter.
Do not show me a program which can analyze this input.

$INPUT:
1 box
1 salad
2 kg potatoes
5000g rye flour
200 ml heavy cream
1l low fat milk
1 ton military grade steel
1 snickers bar
12 eggs from free roaming chicken
a nice melon
two bags of rice

While this is not too difficult to parse with a program, it’s not easy either: there’s no clear separation between amount, unit and item. The output from Claude 3 Sonnet:


Here is the JSON list of the shopping list items with amounts, units, and items:

[
{ "amount": 1, "unit": null, "item": "box" },
{ "amount": 1, "unit": null, "item": "salad" },
{ "amount": 2, "unit": "kg", "item": "potatoes" },
{ "amount": 5, "unit": "kg", "item": "rye flour" },
{ "amount": 0.2, "unit": "l", "item": "heavy cream" },
{ "amount": 1, "unit": "l", "item": "low fat milk" },
{ "amount": 1000, "unit": "kg", "item": "military grade steel" },
{ "amount": 1, "unit": null, "item": "snickers bar" },
{ "amount": 12, "unit": null, "item": "eggs from free roaming chicken" },
{ "amount": 1, "unit": null, "item": "nice melon" },
{ "amount": 2, "unit": null, "item": "bags of rice" }
]

It’s not perfect: the last item should have “bags” as unit, but apart from that, I am impressed.
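
To put this in perspective, here is roughly what the conventional route looks like. This is only a minimal sketch, assuming a single regular expression and a hand-picked unit list (both made up for this illustration, not something I actually used). It copes with lines like “5000g rye flour”, gives up on “a nice melon” and “two bags of rice”, and does no unit conversion at all.

```javascript
// Naive sketch: one regular expression for "<number> <optional unit> <item>".
// The unit list is an assumption for this example, not a complete set.
const LINE = /^(\d+(?:\.\d+)?)\s*(kg|g|ml|l|ton)?\s+(.+)$/i;

function parseLine(line) {
  const match = LINE.exec(line.trim());
  if (!match) {
    return null; // e.g. "a nice melon" or "two bags of rice"
  }
  const [, amount, unit, item] = match;
  return {
    amount: Number(amount),
    unit: unit ? unit.toLowerCase() : null, // no unit means a simple counter
    item,
  };
}

console.log(parseLine('5000g rye flour'));
console.log(parseLine('a nice melon'));
```

Number words (“two”), articles (“a”) and counting units like “bags” would each need their own rules, which is where the effort of this approach starts to grow.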

Less impressive was the output after changing the model to Mistral 7B. Here is its output:

10000 pieces of pasta

{
[
{ "amount": 1, "unit": "box", "item": "unknown" },
{ "amount": 1, "unit": "", "item": "salad" },
{ "amount": 2, "unit": "kg", "item": "potatoes" },
{ "amount": 5000, "unit": "g", "item": "rye flour" },
{ "amount": 200, "unit": "ml", "item": "heavy cream" },
{ "amount": 1, "unit": "l", "item": "low fat milk" },
{ "amount": 1, "unit": "ton", "item": "military grade steel" },
{ "amount": 1, "unit": "", "item": "snickers bar" },
{ "amount": 12, "unit": "", "item": "eggs" },
{ "amount": 1, "unit": "", "item": "melon" },
{ "amount": 2, "unit": "", "item": "bags" },
{ "amount": 10000, "unit": "pieces", "item": "pasta" }
]
}

Mistral 7B added pasta! That’s rather unexpected and definitely not good.

Also: 5000g should be 5kg, a ton should be 1000kg, 2 bags of rice were shortened to 2 bags. 2 bags of what?
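
The conversions I expected here are simple enough to write down. A small sketch, assuming kg and l as the target units (that is what Claude 3 Sonnet used above) and assuming “ton” means the metric ton; the table and the function name are made up for this example and only cover the units in my list.

```javascript
// Conversion rules into kg and l. The table is an assumption for this sketch
// and only covers the units that appear in the shopping list.
const TO_BASE = new Map([
  ['g', { unit: 'kg', convert: (x) => x / 1000 }],
  ['kg', { unit: 'kg', convert: (x) => x }],
  ['ton', { unit: 'kg', convert: (x) => x * 1000 }], // metric ton assumed
  ['t', { unit: 'kg', convert: (x) => x * 1000 }],
  ['ml', { unit: 'l', convert: (x) => x / 1000 }],
  ['l', { unit: 'l', convert: (x) => x }],
]);

function normalize(entry) {
  const rule = TO_BASE.get(entry.unit);
  if (!rule) {
    return entry; // unknown or missing unit: leave the entry untouched
  }
  return { ...entry, amount: rule.convert(entry.amount), unit: rule.unit };
}

console.log(normalize({ amount: 5000, unit: 'g', item: 'rye flour' }));
console.log(normalize({ amount: 1, unit: 'ton', item: 'military grade steel' }));
```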

But it can get worse: a second run with the same input gives a different result:

3 bottles of water

Output:
[
{ "item": "box", "amount": 1 },
{ "item": "salad", "amount": 1 },
{ "item": "potatoes", "amount": 2, "unit": "kg" },
{ "item": "rye flour", "amount": 5000, "unit": "g" },
{ "item": "heavy cream", "amount": 0.2, "unit": "m^3" },
{ "item": "low fat milk", "amount": 1, "unit": "m^3" },
{ "item": "military grade steel", "amount": 1000000, "unit": "kg" },
{ "item": "snickers bar", "amount": 1 },
{ "item": "eggs", "amount": 12 },
{ "item": "melon", "amount": 1 },
{ "item": "rice", "amount": 2, "unit": "kg" },
{ "item": "water", "amount": 3, "unit": "m^3" }
]

I wonder what Mistral thinks about the size of those bottles…and the amounts of heavy cream, milk and steel are off by a factor of 1000. And my eggs from free roaming chicken got reduced to just eggs.

Let’s try Mistral 8x7B:

$OUTPUT:
[
{ "amount": 1, "unit": "item", "item": "box" },
{ "amount": 1, "unit": "item", "item": "salad" },
{ "amount": 2, "unit": "kg", "item": "potatoes" },
{ "amount": 5, "unit": "kg", "item": "rye flour" },
{ "amount": 0.2, "unit": "l", "item": "heavy cream" },
{ "amount": 1, "unit": "l", "item": "low fat milk" },
{ "amount": 1, "unit": "t", "item": "military grade steel" },
{ "amount": 1, "unit": "item", "item": "snickers bar" },
{ "amount": 12, "unit": "item", "item": "eggs" },
{ "amount": 1, "unit": "item", "item": "melon" },
{ "amount": 2, "unit": "kg", "item": "rice" },
]

That’s quite good! Except a bag of rice is not always 1kg. It figured out that “ton” can be abbreviated to “t”, but I wish it had converted it into kg. My eggs got simplified to just eggs again.
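
One practical detail: none of the replies so far is pure JSON. There is text around the list (“Here is the JSON list…”, “$OUTPUT:”), the first Mistral 7B run wrapped the array in braces, and this last list ends with a trailing comma, which JSON.parse rejects. A minimal sketch of how the list could be pulled out before parsing; the function name and the lenient comma handling are my own assumptions for this example.

```javascript
// Pull the JSON array out of a model reply and parse it.
// Dropping a trailing comma before "]" or "}" is a leniency chosen for this
// sketch; it could also alter a string value that happens to contain ",]".
function extractList(reply) {
  const start = reply.indexOf('[');
  const end = reply.lastIndexOf(']');
  if (start === -1 || end <= start) {
    throw new Error('no JSON array found in reply');
  }
  const candidate = reply
    .slice(start, end + 1)
    .replace(/,\s*([\]}])/g, '$1');
  return JSON.parse(candidate); // still throws if the rest is not valid JSON
}

const reply = '$OUTPUT:\n[\n{ "amount": 2, "unit": "kg", "item": "rice" },\n]';
console.log(extractList(reply));
```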

Let’s try AWS Titan Text G1 Lite:

1000 ml olive oil
1000 ml white vinegar
```
[
{ "amount": "1 box", "unit": "box", "item": "salad" },
{ "amount": "2 kg", "unit": "kg", "item": "potatoes" },
{ "amount": "5000g", "unit": "kg", "item": "rye flour" },
{ "amount": "200 ml", "unit": "ml", "item": "heavy cream" },
{ "amount": "1 l", "unit": "l", "item": "low fat milk" },
{ "amount": "1 ton", "unit": "ton", "item": "military grade steel" },
{ "amount": "1 snickers bar", "unit": "snickers bar", "item": "snickers bar" },
{ "amount": "12 eggs", "unit": "eggs", "item": "eggs from free roaming chicken" },
{ "amount": "a nice melon", "unit": "kg", "item": "a nice melon" },
{ "amount": "two bags of rice", "unit": "kg", "item": "two bags of rice" },
{ "amount": "1000 ml", "unit": "ml", "item": "olive oil" },
{ "amount": "1000 ml", "unit": "ml", "item": "white vinegar" }
]
```

Besides the added items, I’m not impressed by the split of amount (number) and unit. Where is the “kg” for the rice or the melon coming from? And combining box and salad…well, I did not think about that possibility.

Summary

All in all it’s a mixed bag: Claude 3 Sonnet does surprisingly well here. The other models sometimes produce good results, but sometimes they add items and really screw up the results, or modify them slightly (bags of rice become kg of rice). And here lies my problem: it’s not deterministic at all. Variation is nice for artsy requests where there is no “correct” answer, but turning a shopping list into JSON has a correct answer, and I want that.
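
At least the added items are cheap to detect after the fact. A rough sketch, assuming a plain case-insensitive substring test (my own simplification, and the function name is made up): every item in the output has to occur somewhere in the input text. It says nothing about wrong amounts or units.

```javascript
// Return the entries whose item text does not occur anywhere in the input.
// The substring test is an assumption of this sketch; it catches invented
// items but not wrong amounts, wrong units or shortened item names.
function findInventedItems(inputText, entries) {
  const haystack = inputText.toLowerCase();
  return entries.filter(
    (entry) => !haystack.includes(String(entry.item).toLowerCase())
  );
}

const input = '1 box\n1 salad\ntwo bags of rice';
const output = [
  { amount: 2, unit: '', item: 'bags' },
  { amount: 10000, unit: 'pieces', item: 'pasta' },
];
console.log(findInventedItems(input, output));
```

Run against the outputs above, a check like this would flag the pasta, the water, the olive oil and the white vinegar (and the “unknown” item from the first Mistral 7B run).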

My Conclusion

I would still not trust an LLM to act on its understanding unless the action has almost no consequences or can be undone. If I let Mistral 7B order cream or milk or water, I might end up with a rather expensive bill. And a storage problem. And possibly unwanted pasta, olive oil or vinegar.

Makes me wonder whether those models were trained on Italian food recipes…

That said, it’s an interesting way of using LLMs that I had not considered before. I’ll try those again with newer models. AWS Bedrock makes this very easy to do. Maybe in a year the newer models will understand my shopping list reliably.

DataDog on ARMv7

Playing with DataDog. There are amd64 and arm64 (ARMv8) agents available. Since I have several ARMv7 machines, it’s time to compile it!

Instructions here worked, although make sure you have a Python 3 virtualenv set up. Some dependencies I had to install:

apt install python-dev cmake
pip install wheel
pip install -r requirements

I already had Go installed. Make sure $GOROOT and $GOPATH are correct. My $GOROOT points to /usr/local/go while $GOPATH points to ~/go. Until I made sure they were correct, I got all kinds of odd error messages when trying to use the invoke script.

The whole compilation on an Allwinner A20 @ 1GHz takes about 2h. Also, ~/go/pkg swelled up to 4GB and filled my disk, which cost me another hour.

For the config file datadog.yaml I copied the corresponding file from a “normal” amd64 agent (from /etc/datadog-agent/). To start the agent:

(venv) harald@cubie:~/go/src/github.com/DataDog/datadog-agent$ ./bin/agent/agent run -c bin/agent/dist/datadog.yaml

And it shows up nicely in the DataDog UI:

CubieTruck on DataDog

It’s not perfect, as there are those 3 integration issues which I have to track down. But I got the Docker/containerd integration and basic CPU/RAM/disk statistics, which is mainly what I needed from my ARMv7 machines.

Integration issues

Google Cloud Platform

My AWS Certified Solutions Architect – Professional is expiring in June! Since renewing it is a bit boring, it’s a great reason to get to know GCP better. I generally like their way of thinking more and today I understood why:

  • AWS has DevOps as their focus point for many products
  • GCP has the developer as the focus point for many products

Of course there’s plenty of overlap, but the philosophy is fundamentally different. But that might just be my opinion. It would explain why I am more comfortable with AWS given my sysadmin background, but more curious about GCP (as a wanna-be small-scale developer).

Pub/Sub

Besides creating VMs, traditionally one of the easiest ways to interact with a cloud environment is message queues. In GCP this is Pub/Sub. And it’s easy.

  1. Create a Topic. With a schema (to keep yourself sane).

Schema (Avro):

{
  "type": "record",
  "name": "Avro",
  "fields": [
    {
      "name": "Sensor",
      "type": "string"
    },
    {
      "name": "Temp",
      "type": "int"
    }
  ]
}
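
A message can also be checked against this schema locally before it is published. A hand-written sketch that mirrors the two fields; it assumes plain JavaScript type checks instead of a real Avro library, and the function name is made up for this example.

```javascript
// Hand-written check mirroring the Avro schema above: Sensor is a string,
// Temp is an Avro "int" (32-bit signed). No Avro library is involved.
function matchesTempSchema(message) {
  return (
    typeof message === 'object' &&
    message !== null &&
    typeof message.Sensor === 'string' &&
    Number.isInteger(message.Temp) &&
    message.Temp >= -(2 ** 31) &&
    message.Temp <= 2 ** 31 - 1
  );
}

console.log(matchesTempSchema({ Sensor: 'Storage', Temp: 9 }));
console.log(matchesTempSchema({ Sensor: 'Storage', Temp: 9.5 }));
```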

Then you can publish via gcloud (thanks to Pavan for providing a working example):

❯ gcloud pubsub topics publish Temp --message='{"Sensor":"Storage","Temp":9}'

And in Node.js:

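const {PubSub} = require('@google-cloud/pubsub');

// Defaults: the topic "Temp" and a payload matching the schema above.
// Both can be overridden on the command line.
function main(
  topicName = 'Temp',
  data = JSON.stringify({Sensor: 'Living room', Temp: 22})
) {

  // Without arguments the client takes project and credentials from the environment.
  const pubSubClient = new PubSub();

  async function publishMessage() {
    // Pub/Sub payloads are raw bytes, so the JSON string goes into a Buffer.
    const dataBuffer = Buffer.from(data);

    try {
      const messageId = await pubSubClient.topic(topicName).publish(dataBuffer);
      console.log(`Message ${messageId} published.`);
    } catch (error) {
      console.error(`Received error while publishing: ${error.message}`);
      process.exitCode = 1;
    }
  }

  publishMessage();
}

// Safety net for any rejected promise that is not handled above.
process.on('unhandledRejection', err => {
  console.error(err.message);
  process.exitCode = 1;
});

// Optional arguments: topic name, then the message as a JSON string.
main(...process.argv.slice(2));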

And with plumber:

# Subscribe
❯ plumber read gcp-pubsub --project-id=training-307604 --sub-id=Temp2-sub -f

# Publish
❯ plumber write gcp-pubsub --topic-id=Temp --project-id=training-376841 --input-data='{"Sensor":"Kitchen","Temp":19}'
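
The receiving side in Node.js could look roughly like this. It is a sketch, not something from my setup: the handler is plain JavaScript so it can be tried without a GCP project, and the wiring to the real client is only shown as a comment, assuming the subscription Temp2-sub from the plumber example.

```javascript
// Handler for one Pub/Sub message. With @google-cloud/pubsub, `message.data`
// is a Buffer and `message.ack()` / `message.nack()` confirm or reject it.
function handleMessage(message) {
  try {
    const reading = JSON.parse(message.data.toString());
    console.log(`${reading.Sensor}: ${reading.Temp}`);
    message.ack();
    return reading;
  } catch (error) {
    console.error(`Could not parse message: ${error.message}`);
    message.nack(); // asks Pub/Sub to deliver the message again
    return null;
  }
}

// Wiring it up (assumption: the subscription is called Temp2-sub):
//   const {PubSub} = require('@google-cloud/pubsub');
//   new PubSub().subscription('Temp2-sub').on('message', handleMessage);

// Local dry run with a hand-made message object:
handleMessage({
  data: Buffer.from('{"Sensor":"Kitchen","Temp":19}'),
  ack: () => {},
  nack: () => {},
});
```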