Parallel Processing in JS
A detailed introduction to Web Workers
Luckily, with the introduction and widespread adoption of Web Workers, we can now do resource-intensive calculations on background threads. On the downside, the specification had to fit into the existing ecosystem, and it feels quite awkward at times. If you come from a language where threading has been supported from the beginning, you might find the number of restrictions surprising. It’s far from simply instantiating a new Thread and having everything you put into it processed in parallel.
This post is an introduction to Web Workers, how and when you can use them, and their peculiarities. I’ll also cover how to use them in Webpack, and the possible pitfalls.
Code (html, js) and demo are available.
Passing a script URL to the Worker constructor creates a Worker instance running that code. You can then communicate with the worker using postMessage, the same way you would with IFrames. Since there are no cross-origin problems in play, there is no need to verify origins.
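As a minimal sketch of these two steps, assuming a worker script at the placeholder path worker.js and a made-up message shape (neither is taken from the demo):

```javascript
// Main-page code. "worker.js" and the message fields are placeholders.
function startWorker(scriptUrl) {
  // The browser loads the script and runs it on a background thread.
  const worker = new Worker(scriptUrl);

  // Any structured-cloneable value can be sent; here, a plain object.
  worker.postMessage({ type: "sum", values: [1, 2, 3] });

  return worker;
}

// In a browser: const worker = startWorker("worker.js");
```

Wrapping the calls in a function keeps the snippet inert until the page decides to start the worker.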
And in the worker’s code, you can listen to these events.
This works both ways, so you can postMessage some data back from the worker’s code into your main program.
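A rough sketch of both directions could look like the following. The handleRequest helper, the message fields, and the logReplies function are illustrative names, not part of any API.

```javascript
// --- worker.js (hypothetical file) ---
// Pure helper that turns a request into a reply.
function handleRequest(data) {
  if (data.type === "sum") {
    const total = data.values.reduce((acc, n) => acc + n, 0);
    return { type: "sum-result", total };
  }
  return { type: "unknown-request" };
}

// WorkerGlobalScope only exists inside a worker, so this wiring is skipped
// if the file is loaded anywhere else.
if (typeof WorkerGlobalScope !== "undefined") {
  self.onmessage = (event) => {
    // event.data holds the value the main page passed to postMessage.
    self.postMessage(handleRequest(event.data));
  };
}

// --- main page ---
// Listen for whatever the worker posts back.
function logReplies(worker, log = console.log) {
  worker.onmessage = (event) => log("worker replied:", event.data);
}
```

Keeping the computation in a plain function makes it easy to exercise without spinning up a worker at all.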
These are the basics to get started with workers.
You have multiple methods to handle errors in the worker’s code. You can catch and pass them via the postMessage channel. That would require some extra coding, but this is the most versatile and safest way.
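One possible shape for the first approach is sketched below. The runSafely helper and the { ok, result, error } envelope are assumed conventions for illustration only.

```javascript
// Worker-side helper: run a task and always produce a plain, cloneable reply.
function runSafely(task, input) {
  try {
    return { ok: true, result: task(input) };
  } catch (err) {
    // Send the message as a string so the reply stays simple data.
    return { ok: false, error: String(err && err.message ? err.message : err) };
  }
}

// Inside the worker this could be wired as:
//   self.onmessage = (e) => self.postMessage(runSafely(JSON.parse, e.data));
```

The main page can then branch on the ok flag instead of relying on exceptions crossing the thread boundary.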
The other way is to use the onerror handler. This catches all exceptions that are not handled inside the worker itself and lets the caller code decide how to proceed. To set up error handling, all you need is to attach a handler.
To ease debugging, the error event carries some extra fields: the filename, lineno, and colno properties indicate where things went wrong.
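A small sketch of such a handler follows. The watchWorkerErrors name and the report callback are hypothetical; the message, filename, lineno, and colno fields come from the error event itself.

```javascript
// Main-page code: report uncaught worker exceptions.
// `report` is a hypothetical callback; console.error is only a default.
function watchWorkerErrors(worker, report = console.error) {
  worker.onerror = (event) => {
    // The event describes the uncaught exception and where it happened.
    report(`${event.message} (${event.filename}:${event.lineno}:${event.colno})`);
    // Mark the error as handled so it is not reported again on the page.
    event.preventDefault();
  };
}
```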
Cleaning up workers after they are not needed is crucial. They spawn real OS-level threads, and they can easily kill the browser process if you spawn too many of them simultaneously.
You have two ways to kill a worker: from inside the worker, or from outside. I think it’s best to handle the lifecycle from the main page, but in certain situations you might decide otherwise.
To kill a worker, simply call its terminate() method. This will abruptly kill it, freeing all resources it is using. If it was processing something, that would halt too.
If you want the worker to manage its own lifecycle, just call the close() method from the worker code.
Either way, the worker is stopped, and should not leave anything behind.
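Both options can be sketched as follows. Inside a dedicated worker, the method the global scope exposes for this is close(). The "shutdown" message name and the scope parameter are assumptions made to keep the example self-contained.

```javascript
// Main page: stop a worker from the outside.
function stopWorker(worker) {
  worker.terminate(); // takes effect immediately; unfinished work is dropped
}

// Worker file: stop from the inside once a "shutdown" message arrives.
// `scope` defaults to the worker's own global object.
function onControlMessage(event, scope = self) {
  if (event.data === "shutdown") {
    scope.close(); // the worker global scope's close() method
  }
}

// Wire the handler only when actually running inside a worker.
if (typeof WorkerGlobalScope !== "undefined") {
  self.addEventListener("message", (event) => onControlMessage(event));
}
```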
If you are using one-shot workers that do some computation and then get discarded, make sure you terminate them in the onerror handler too. Failing to do so will introduce hard-to-find leaks in your code.
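One way to make the cleanup hard to forget is to wrap the whole one-shot lifecycle in a single function. This is only a sketch: runOnce and the createWorker factory are hypothetical names.

```javascript
// Run one job on a fresh worker and always clean up afterwards.
// `createWorker` is a hypothetical factory, e.g. () => new Worker("worker.js").
function runOnce(createWorker, payload) {
  return new Promise((resolve, reject) => {
    const worker = createWorker();

    worker.onmessage = (event) => {
      worker.terminate(); // success path
      resolve(event.data);
    };

    worker.onerror = (event) => {
      worker.terminate(); // failure path, the one that is easy to forget
      reject(new Error(event.message));
    };

    worker.postMessage(payload);
  });
}
```

Usage would be along the lines of runOnce(() => new Worker("worker.js"), input).then(useResult), where the file name and useResult are again placeholders.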
Moving the worker code out to a separate file makes some simple cases more difficult than they should be. Luckily, workers can be instantiated from a Blob, which you can build however you’d like.
To make an inline worker, just create a Blob with the desired code, make an Object URL from it, and pass that URL to the Worker constructor.
Since you are creating a global ObjectURL, don’t forget to get rid of it when it’s not needed. Generally, you’d revoke it when you terminate the worker instance.
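The steps above, together with the cleanup, can be sketched like this. The createInlineWorker and dispose names are made up for the example.

```javascript
// Build a worker from a string of source code instead of a separate file.
function createInlineWorker(source) {
  const blob = new Blob([source], { type: "application/javascript" });
  const url = URL.createObjectURL(blob);
  const worker = new Worker(url);

  return {
    worker,
    // Stop the worker and release the Object URL together.
    dispose() {
      worker.terminate();
      URL.revokeObjectURL(url);
    },
  };
}

// Usage in a browser:
//   const { worker, dispose } = createInlineWorker(
//     "self.onmessage = (e) => self.postMessage(e.data * 2);"
//   );
```

Tying the revoke call to the same function that terminates the worker means the URL cannot outlive the worker by accident.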
Workers in Workers
In theory, you can spawn subworkers inside a worker, and it should work the same as when you create one from the main thread. There is even an example in the spec on how to do it. But unfortunately there is a long-standing Chrome bug that prevents this use case. It might get fixed at some point, but the ticket was opened back in 2010, and there has been little progress since. You should not rely on this feature.
Update: It should work now just fine.
There are a few edge cases that you should be aware of when passing data to and from the worker. Passing simple values like numbers, strings, and arrays works as you’d expect. You can pass simple structures and they get serialized and deserialized properly. You don’t need to serialize objects into JSON just to keep the structure; postMessage uses the structured clone algorithm, which can process a few more types, like RegExps and Blobs, as well as circular references.
That said, you should still limit what you pass to the simplest types possible. There is no way to pass functions, and even the supported types have some limitations that can easily manifest as hard-to-debug bugs. If you can define your API to handle only strings, numbers, arrays, and objects, you are less likely to face these kinds of problems.
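To get a feel for these limitations without starting a worker, the sketch below uses the global structuredClone() function, which applies the same algorithm. It assumes an environment that provides it (recent browsers, or Node.js 17 and later); the Point class is just an example.

```javascript
// structuredClone() applies the structured clone algorithm directly,
// so it can preview what would arrive on the other side of postMessage.
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
  length() {
    return Math.hypot(this.x, this.y);
  }
}

// The data survives, but the class identity and its methods do not.
const copiedPoint = structuredClone(new Point(3, 4));
console.log(copiedPoint.x, copiedPoint.y); // 3 4
console.log(copiedPoint instanceof Point); // false
console.log(typeof copiedPoint.length); // "undefined"

// Functions cannot be cloned at all.
let functionError = null;
try {
  structuredClone({ callback: () => {} });
} catch (err) {
  functionError = err;
}
console.log(functionError.name); // "DataCloneError"
```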
If you have a complex object, there can be circular references in it. If you try to serialize it into JSON, you’ll get a TypeError: Converting circular structure to JSON.
But you can safely pass the same object to postMessage and use it inside the worker.
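The difference is easy to see in a few lines. As a stand-in for postMessage, this sketch uses the global structuredClone() function, which runs the same algorithm; it assumes an environment that provides it.

```javascript
const node = { name: "root" };
node.self = node; // circular reference

// JSON cannot represent the cycle.
let jsonError = null;
try {
  JSON.stringify(node);
} catch (err) {
  jsonError = err;
}
console.log(jsonError instanceof TypeError); // true

// The structured clone algorithm copies the object and keeps the cycle intact.
const copy = structuredClone(node);
console.log(copy !== node); // true
console.log(copy.self === copy); // true
```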
To prevent concurrent modification, everything you pass to postMessage is copied to the other side. This makes sure that you cannot modify the same object from two places in parallel.
But if you want to pass large amounts of data around, you’ll quickly experience how slow these copy operations are. For example, if you are doing image-related calculations, you’re likely to pass whole images; making the copies might easily be the bottleneck.
Fortunately, there are transferable objects, and you can, well, transfer them instead of copying them. One such transferable object is an ArrayBuffer, which can contain just about any raw data.
If you transfer an object, the thread that originally owned it loses access. This ensures that, even though the data is not copied, no concurrent modifications can happen.
The postMessage syntax is quite awkward regarding transferables. You pass the data as the first argument, as before, but you also need to pass an array of the transferables as the second one.
Make sure you pass the transferable in the second argument; if you forget, the data will be copied over.
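The sketch below contrasts the two calls. To stay self-contained it sends through a MessageChannel port, whose postMessage follows the same copy and transfer rules as a worker’s; the function names and the pixels field are illustrative.

```javascript
// `target` is anything with a postMessage method: a Worker or a MessagePort.
function sendByCopy(target, buffer) {
  target.postMessage({ pixels: buffer }); // no transfer list: data is cloned
}

function sendByTransfer(target, buffer) {
  // Same data as the first argument, plus the list of objects to hand over.
  target.postMessage({ pixels: buffer }, [buffer]);
}

// Demonstration with a MessageChannel, which follows the same rules.
const { port1, port2 } = new MessageChannel();

const copied = new ArrayBuffer(1024);
sendByCopy(port1, copied);
console.log(copied.byteLength); // 1024, the sender keeps its data

const moved = new ArrayBuffer(1024);
sendByTransfer(port1, moved);
console.log(moved.byteLength); // 0, the sender's buffer is now detached

port1.close();
port2.close();
```

Checking byteLength after sending is a cheap way to confirm that a buffer was actually transferred rather than silently copied.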
To use Web Workers with Webpack, you can use the worker-loader. Just add it to the devDependencies section of your package.json, run npm install, and it’s all set up.
To use a worker, simply require it.
This will instantiate the worker, and you can use it the same way as without Webpack.
To instantiate an inline worker, all you need to do is add the inline query parameter to the loader.
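As a sketch of both variants, assuming a Webpack build with a worker-loader version that accepts this inline loader syntax, and a local file named ./worker.js (a placeholder):

```javascript
// Only meaningful inside a Webpack bundle with worker-loader installed.
// The loader-prefixed require() yields a constructor, not a worker instance.
function createBundledWorker() {
  const BundledWorker = require("worker-loader!./worker.js");
  return new BundledWorker();
}

// Same idea, but the worker source is embedded in the main bundle as a Blob.
function createInlineBundledWorker() {
  const InlineWorker = require("worker-loader?inline!./worker.js");
  return new InlineWorker();
}
```

The returned objects are regular workers, so postMessage, onmessage, and terminate behave as described earlier.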
You can even import and use any modules inside the workers.
The worse parts
Getting the workers up and running in Webpack is quite easy. But there are a few obstacles you need to be aware of when you use this approach.
First, there seems to be no way to move out the common parts of the code. If you have a worker that depends on a piece of code, then it will be included no matter whether other parts of your codebase also use it. And if you have multiple workers using the same library, then it will be included in all of them.
You might think that if you dump worker-loader, specify a new entry point, and then use the CommonsChunkPlugin, it will take care of this. But unfortunately workers are not like a browser window, and some features the resulting code would require are not available.
Also, inline workers do no better in this regard. The shared code is still present in multiple places inside the bundle.
And second, inline workers leak ObjectURLs. They are created, but never freed. This might not seem like a big problem, but if you use a lot of one-shot workers, it may affect performance.
Based on the above observations, my advice is to use normal workers and watch what you import into them. Also make sure you send appropriate cache headers, so that the browser does not need to download the code more than once.
Web Workers are very similar to IFrames, which might give the impression that using IFrames would also result in parallel processing. But since IFrames have access to non-threadsafe APIs, like the DOM, the browser cannot spawn new threads for them. Click here for a demonstration.
Cross-origin IFrames are quite different. They do not have access to most of the APIs, and they can communicate only via postMessage, same as Web Workers. This, in theory, allows browsers to run these IFrames on different threads and that would result in parallel processing.
But in practice, they are still single-threaded, and the browser doesn’t give them any special handling. To see it in action, click here.
That said, if you need parallel processing, you don’t need to wait for something else. Learn the rough edges of Web Workers, use the tools you need, and you can drastically improve the user experience.