How to transfer binary data efficiently across worker threads in Node.js
How to return a Buffer from a worker
postMessage calls
Worker communication uses postMessage calls, and values passed to it are cloned using the structured clone algorithm. This makes it easy to send objects, strings, arrays, and numbers, and it even supports circular references and Dates.
But what if the data is binary, held in a Buffer? For example, an image that the worker generated, a PDF file, or a ZIP archive.
I got curious: what are the ways to send binary data via postMessage? And which one is the best?
Sending a Buffer
Let's start with the baseline: what happens if the Buffer is sent directly?
parentPort.postMessage(buffer);
Where buffer is a Buffer with some bytes:
<Buffer 72 61 77 20 62 75 66 66 65 72>
In this case, the other side gets a Uint8Array:
Uint8Array(10) [
114, 97, 119, 32,
98, 117, 102, 102,
101, 114
]
The two objects contain the same bytes; the <Buffer ...> output is just in hexadecimal while the Uint8Array(10) [...] one is in decimal (0x72 = 7 * 16 + 2 = 114).
On the receiving end, it's easy to convert back to a Buffer: Buffer.from(uint8array).
The bytes are copied from one side to the other, but this solution is easy: just post the Buffer and the data makes it to the other side. The downsides are that it changes the type of the value (Buffer to Uint8Array) and that the copy has a performance impact for larger data.
ArrayBuffer
While a Buffer is not transferable, an ArrayBuffer is:
The items that various specifications indicate can be transferred are:
- ArrayBuffer
...
So, what is an ArrayBuffer and how does it relate to Buffer and Uint8Array?
A Buffer, like the typed arrays (for example, Uint8Array), is a view over an ArrayBuffer. This means the ArrayBuffer holds the bytes underlying the other objects. For example, buffer.buffer returns the ArrayBuffer that backs the Buffer.
An ArrayBuffer is transferable, which means it can be sent via postMessage without copying. Transferring makes the object unusable on the sending end: no reading or writing it after that point.
While it's tempting to simply transfer the ArrayBuffer of a Buffer, that is not correct. Since the Buffer is only a view over the ArrayBuffer, the latter can contain more bytes than the Buffer itself. This is especially apparent for small Buffers, which Node.js allocates from a shared pool:
> Buffer.from("abc", "utf8").buffer
ArrayBuffer {
[Uint8Contents]: <2f 00 00 00 00 00 00 00 61 62 63 00 00 00 00 00 61 62 63 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... 8092 more bytes>,
byteLength: 8192
}
The Buffer has two properties that define which part of the underlying ArrayBuffer it uses: byteOffset and length:
> buffer.byteOffset
128
> buffer.length
3
Part of the ArrayBuffer can be copied to a new ArrayBuffer using the slice
method. With that, it's possible to construct one that contains exactly the
bytes needed to transfer.
As an extra performance win, if the binary data is large enough, Node.js does not use the shared pool, so the ArrayBuffer contains no extra bytes:
> Buffer.from("a".padStart(10000, " ")).buffer
ArrayBuffer {
[Uint8Contents]: <20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 ... 9900 more bytes>,
byteLength: 10000
}
In that case, there is no need to copy at all. And since this happens for larger buffers, copying is avoided exactly where its impact would be the biggest.
To get an ArrayBuffer from a Buffer that contains only the necessary bytes:
const arrayBuf = (() => {
  if (buffer.byteOffset === 0 && buffer.byteLength === buffer.buffer.byteLength) {
    // no extra bytes, return the backing ArrayBuffer as-is
    return buffer.buffer;
  } else {
    // copy the relevant part to a new ArrayBuffer
    return buffer.buffer.slice(buffer.byteOffset, buffer.byteOffset + buffer.byteLength);
  }
})();
Finally, transfer it:
parentPort.postMessage(arrayBuf, [arrayBuf]);
The receiving end gets an ArrayBuffer that can be converted back to a Buffer:
Buffer.from(arrayBuf);
This implements an efficient transfer between contexts that is zero-copy exactly in the cases where it matters most.
ReadableStream
A surprising transferable object is the ReadableStream. In this case, the receiving end can read bytes on demand, which can make the transfer even more efficient because:
- the result can be larger than the available memory
- the receiving end might not want to read the whole stream
In Node.js, streams are currently in a somewhat messy state. There is Readable, which implements readable streams in a Node-specific way, and then there is ReadableStream (a.k.a. web streams), which is more cross-platform. The reasoning, from the documentation:
It is similar to the Node.js Streams API but emerged later and has become the "standard" API for streaming data across many JavaScript environments.
The ReadableStream is transferable, and the mechanism is similar to how the ArrayBuffer works: once transferred, it cannot be used on the sending side anymore.
To construct a ReadableStream from a Buffer:
import { Readable } from "node:stream";

const readableStream = Readable.toWeb(Readable.from(buffer));
parentPort.postMessage(readableStream, [readableStream]);
And to read the contents:
import { buffer } from "node:stream/consumers";

await buffer(msg);
Of all the possible ways to pass binary data, this is the most resource-friendly. On the other hand, reading a transferred stream requires the worker to keep running: if it is terminated or otherwise stopped, the stream becomes unreadable.
This is especially problematic when the worker is part of a thread pool, as pools usually rely on a clear "end state" for the workers, after which the pool manager is free to reuse or terminate them.
Nevertheless, I found it interesting that streams can be transferred as well.