Dart & Pool

Tip of the day: if you use Dart and want to use the Pool library, don't expect much help from Google when searching for those two keywords: you get exactly the results you'd expect, none of them about the library. Adding “future” or “async” to the search helps.

Anyway, the point of this post is a small example of how to use a Pool to run commands in parallel, but not too many at once.

import 'dart:io';
import 'package:pool/pool.dart';
import 'package:test/test.dart';

Future<ProcessResult> runCommand(String command, List<String> args) {
  return Process.run(command, args);
}

void main() {
  test('Run 20 slow date commands', () async {
    var pool = Pool(5);
    List<Future<ProcessResult>> results = [];
    // Queue 20 commands; the pool lets at most 5 run at the same time.
    for (var i = 0; i < 20; ++i) {
      results.add(pool.withResource(() => runCommand('./date_slow', ['+%N'])));
    }
    pool.close();
    await pool.done;
    // All commands have finished; print their output in order.
    for (var process in results) {
      var res = await process;
      print(res.stdout);
    }
  });
}

./date_slow is a simple script which prints something to STDOUT and takes about 2 seconds to finish:

#!/bin/bash
date $*
sleep 2

What happens is that Pool(5) creates a pool with 5 slots. The first for loop queues all 20 commands immediately, but at most 5 run at any given time. The rest simply wait until it's their turn.

pool.close() stops any new entries from being added, and await pool.done waits until the pool is closed and all queued jobs have finished.
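To make that concurrency limit visible without running external processes, here is a small standalone sketch (not from the test above) that counts how many callbacks are active at once; the 2-second Future.delayed stands in for ./date_slow:

import 'package:pool/pool.dart';

void main() async {
  var pool = Pool(5);
  var running = 0;
  var maxRunning = 0;

  // Queue 20 delayed callbacks; the pool allows only 5 to run concurrently.
  var jobs = [
    for (var i = 0; i < 20; ++i)
      pool.withResource(() async {
        running++;
        if (running > maxRunning) maxRunning = running;
        await Future.delayed(Duration(seconds: 2)); // stand-in for ./date_slow
        running--;
      })
  ];

  pool.close();
  await pool.done;
  await Future.wait(jobs);
  print('max concurrent jobs: $maxRunning'); // prints 5
}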

The 2nd for loop (with the print() statement) uses await to get the ProcessResult out of the Future<ProcessResult> which Process.run() returns and which is stored in the results list.
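Since the results are only needed after everything has finished anyway, that second loop could just as well use Future.wait to collect all the ProcessResults in one go (order is preserved); this would be a drop-in replacement inside the same test body:

    // Collect all results at once instead of awaiting them one by one.
    var processResults = await Future.wait(results);
    for (var res in processResults) {
      print(res.stdout);
    }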

The outcome is that with a pool of 5 slots and each command running for 2 seconds, the complete set of 20 jobs finishes in 20/5*2=8 seconds. With 10 slots it runs in 20/10*2=4 seconds, and with 20 or more slots it takes 2 seconds. And it never runs more processes than there are slots.
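A quick way to check those numbers is to wrap the pooled runs in a Stopwatch. This is a separate sketch, and it assumes the ./date_slow script from above is in the current directory; the measured wall-clock times will vary a bit:

import 'dart:io';
import 'package:pool/pool.dart';

// Run [jobs] copies of ./date_slow through a pool of [slots] and time it.
Future<void> timedRun(int slots, int jobs) async {
  var pool = Pool(slots);
  var sw = Stopwatch()..start();
  var futures = [
    for (var i = 0; i < jobs; ++i)
      pool.withResource(() => Process.run('./date_slow', ['+%N']))
  ];
  pool.close();
  await pool.done;
  await Future.wait(futures);
  print('$jobs jobs / $slots slots: ${sw.elapsed.inSeconds}s');
}

void main() async {
  await timedRun(5, 20);  // roughly 8s
  await timedRun(10, 20); // roughly 4s
  await timedRun(20, 20); // roughly 2s
}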

Why do I need this? I have a list of URLs, from a few to hundreds, which I need to query. While I can query many concurrently, each query runs an external program and takes up a non-trivial amount of memory. At 100 concurrent calls it currently uses up all memory on a 16 GB RAM notebook. While there are many ways to work around this (doing one at a time is the safest and slowest), using a pool is perfect: it runs many commands in parallel, but I can limit how many run in parallel.
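For completeness, a sketch of how the URL case might look. The program name ./query_url and the default of 10 slots are made up for illustration; the right limit depends on how much memory each instance needs:

import 'dart:io';
import 'package:pool/pool.dart';

// Query every URL via an external program, at most [slots] at a time.
Future<List<ProcessResult>> queryAll(List<String> urls, {int slots = 10}) async {
  var pool = Pool(slots);
  var futures = [
    for (var url in urls)
      pool.withResource(() => Process.run('./query_url', [url]))
  ];
  pool.close();
  await pool.done;
  return Future.wait(futures);
}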
