How to write a Webpack loader

Getting started with processing files during a Webpack build

Tamás Sallai

Processing files in a webapp

Out of the box, Webpack can load many file types, such as JSON, images, and even raw data, which covers most use cases. In one of my projects, I needed to show a PDF file as images, and I started to wonder what the best approach would be and whether a loader would be a good solution.

In a situation like this, there are two approaches. The first is to process the file on the client side with a suitable library or web API. This is not always possible, as the browser limits what can be done. The other is to do the processing on the server side, either via a run-time API or during a compilation step, and only present the result to the client.

I opted for the latter approach. But that raised another question: how?

I could just run a script that takes the PDF and puts the results in a folder, then use the file-loader to reference the images on the client side. But I wanted something without intermediary files.

Webpack loaders offer a way to write any custom processing step between the source file and the result the frontend gets, and that step runs on the server side as part of the compilation. It's exactly what I needed: put the PDF next to the client sources and get the images ready to use in the frontend code.

Unfortunately, writing a loader is not well documented. There are some resources, but some pieces of the puzzle are missing. This article is about my experience writing a custom loader that processes an input file using an external program.

Loaders in Webpack

A loader is a function that gets some input data, usually a file or the output of another loader, and returns something.

Loaders load files in frontend code. For example, consider this import:

import Styles from 'style-loader!css-loader?modules!./styles.css';

It reads styles.css and feeds it to the css-loader, then whatever comes out becomes the input for the style-loader. The last one outputs a JS module, which will be the Styles value.

This sequential processing of loaders is roughly equivalent to UNIX pipes:

cat styles.css | css-loader | style-loader
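
To make the pipe analogy concrete, here is a minimal sketch that treats loaders as plain functions and applies them right to left, the same order the inline `!` syntax uses. The names `upperCaseLoader`, `wrapAsModuleLoader`, and `runChain` are hypothetical stand-ins; real loaders run inside Webpack and receive a loader context.

```javascript
// Hypothetical stand-ins for real loaders: each maps a source to a new source.
const upperCaseLoader = (source) => source.toUpperCase();
const wrapAsModuleLoader = (source) => `export default ${JSON.stringify(source)}`;

// Applies the loaders right to left, like 'wrap-loader!upper-loader!./file.txt'.
function runChain(loaders, source) {
	return loaders.reduceRight((current, loader) => loader(current), source);
}

const output = runChain([wrapAsModuleLoader, upperCaseLoader], "hello");
console.log(output); // export default "HELLO"
```

The rightmost function sees the file contents, and the leftmost one produces the module source that the import receives.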

Apart from producing a value, loaders can also emit files that Webpack will include in the result bundle. This is how file-loader works. It emits a file and returns the path so that the frontend code knows where to fetch it:

import lambdaIcon from "aws-svg-icons/lib/Arch_Compute/64/Arch_AWS-Lambda_64.svg";

<img src={lambdaIcon}/>

During compilation, Webpack emits a file for the svg and the lambdaIcon value is the path to it. In the browser, the image will be a regular svg file loaded from a remote location.

Loaders run during the Webpack build process. They can use the full power of Node.js and all the programs available in the system. You can run graphviz to generate graphs, MozJPEG to optimize images, or pdftocairo to create PNGs out of PDFs. As a compilation-time step, whatever you do here won't slow down the frontend clients. All visitors can see is the result and none of the work that it required.

Developing Webpack loaders

As a loader is a mapper function, it gets a source and returns the output:

module.exports = function (source) {
	const results = "Hello world!";

	return `export default ${JSON.stringify(results)}`;
}

The last loader in the chain should return a JS module so that when the client-side app imports it, it will be in a suitable format. That's what the export default ${JSON.stringify()} part does.
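
A short sketch of why JSON.stringify is a convenient way to build the module source: it turns strings, arrays, and plain objects into valid JS literals, escaping quotes along the way. The `toModule` helper is a hypothetical name used only for this illustration.

```javascript
// Hypothetical helper: serializes a JSON-compatible value into module source code.
const toModule = (value) => `export default ${JSON.stringify(value)}`;

const quoted = toModule('He said "hi"');
const list = toModule(["/a.png", "/b.png"]);

console.log(quoted); // export default "He said \"hi\""
console.log(list); // export default ["/a.png","/b.png"]
```

This only works for values JSON can represent; functions or Buffers would need a different serialization.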

The documentation shows a way to use the loader without the need for a separate package:

rules: [
	{
		test: /\.js$/,
		use: [
			{
				loader: path.resolve('path/to/loader.js')
			},
		],
	},
]

In my experience, it's easier to use a separate npm package from the start. First, you can use it inline, such as:

import pdf from "pdf-loader!./my.pdf";

Second, the custom loader likely needs some dependencies, and a separate package.json keeps those libraries apart from the webapp's.

With npm link it's easy to develop an npm package locally. In the loader's folder, run npm link to create a linkable project. Then in the webapp, run npm link <packagename> to set up the symlink. And that's it: from this moment, the webapp sees every modification to the loader as if you had run npm install after each change.

Raw mode

By default, Webpack converts the input to a string, which is great for text files, but not for binaries. To get the contents without conversion, export raw:

module.exports.raw = true;

With this, the source will be a Buffer with the file's contents.

Async mode

Loaders are synchronous by default, and you need to switch them to async mode to use await. Calling this.async() provides a callback that you need to call with the result. This feels like an old approach, especially as Promises and async/await are mainstream now, but fortunately it's not hard to convert to a modern structure:

module.exports = function (source) {
	const callback = this.async();
	(async () => {

		// calculate the results

		return `export default ${JSON.stringify(results)}`;
	})().then((res) => callback(undefined, res), (err) => callback(err));
};

With this boilerplate, you can use await in the function body and errors are also propagated properly.

Process and emit files

To generate the image files from the input PDF I used the node-pdftocairo project. It operates on Buffers and returns an array of images:

const {input} = require("node-pdftocairo");

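module.exports = function (source) {
	const callback = this.async();
	(async () => {
		// source is a Buffer because of raw mode
		const images = await input(source, {format: "png"}).output();
		// images: Buffer[]

		// ...
	})().then((res) => callback(undefined, res), (err) => callback(err));
};
module.exports.raw = true;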

These images have to be included in the Webpack output so that the frontend can load them. And the most critical part of that is naming them.

Emitted files should have unique names. Moreover, they should be revved to provide cache-busting.

Revving means the hash of the contents of the file is in its name. This allows perfect caching, as the same filename is guaranteed to have the same content, while files with different contents get different names. Built-in loaders emit files this way by default, and it boosts client-side performance, especially for returning visitors.
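
The idea behind revving can be sketched with Node's built-in crypto module alone. The `revvedName` helper and the choice of a truncated SHA-256 hash are assumptions made for this illustration, not what any particular loader uses.

```javascript
const crypto = require("crypto");

// Hypothetical helper: builds a revved filename from a base name and the file contents.
function revvedName(baseName, extension, content) {
	const hash = crypto.createHash("sha256").update(content).digest("hex").slice(0, 16);
	return `${baseName}-${hash}.${extension}`;
}

const first = revvedName("page0", "png", Buffer.from("first version"));
const same = revvedName("page0", "png", Buffer.from("first version"));
const changed = revvedName("page0", "png", Buffer.from("second version"));

// Identical contents give identical names; changed contents give a new name.
console.log(first === same, first === changed); // true false
```

Since the name changes whenever the contents do, the files can be served with long cache lifetimes.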

There is a separate package called loader-utils that provides an interpolateName function. It makes it easy to include the hash in the filename:

const filename = loaderUtils.interpolateName(this, `[name]-page${i}-[contenthash].png`, {content: file});

The [contenthash] is the hash of the content argument.

Emitting the file is a simple call with the filename and the contents:

this.emitFile(filename, file);

This includes the images in the output bundle. But the frontend also needs to know from where to load them. This is where the result of the loader is useful. But there is one detail we need to look into first.

The frontend needs the filenames of the emitted files but also their paths relative to the webapp URL. After some searching, I found references to a variable called __webpack_public_path__, but it seems like the more supported way is to read the path from an environment variable, process.env.ASSET_PATH, the same one the Webpack public path guide uses to set output.publicPath.

With the asset path and the filename, the only task left is to export the results as a JS module:

const ASSET_PATH = process.env.ASSET_PATH || "/";
const results = `${ASSET_PATH}${filename}`;

return `export default ${JSON.stringify(results)}`;
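
One caveat with the plain concatenation above: it relies on ASSET_PATH ending with a slash. A more defensive variant could look like the sketch below, where `toPublicUrl` is a hypothetical helper that makes sure exactly one slash separates the two parts.

```javascript
// Hypothetical helper: joins the asset path and a filename with a single slash.
function toPublicUrl(assetPath, filename) {
	const base = assetPath.endsWith("/") ? assetPath : `${assetPath}/`;
	return `${base}${filename.replace(/^\/+/, "")}`;
}

console.log(toPublicUrl("/", "my-page0.png")); // /my-page0.png
console.log(toPublicUrl("/static", "my-page0.png")); // /static/my-page0.png
console.log(toPublicUrl("https://cdn.example.com/assets/", "/my-page0.png"));
// https://cdn.example.com/assets/my-page0.png
```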

On the frontend side, the result of the import is the path to the image. For example, a React app can show the image:

import pdf from "pdf-loader!./my.pdf";

<img src={pdf}/>

In my case, I needed to output multiple images, but that's just a matter of emitting and outputting multiple files:

const imageNames = images.map((file, i) => {
	return loaderUtils.interpolateName(this, `[name]-page${i}-[contenthash].png`, {content: file});
});

images.forEach((file, i) => {
	this.emitFile(imageNames[i], file);
});
const results = imageNames.map((imageName, i) => {
	return `${ASSET_PATH}${imageName}`;
});
return `export default ${JSON.stringify(results)}`;

And the frontend gets an array of image paths:

import pdf from "pdf-loader!./my.pdf";

// first page
<img src={pdf[0]}/>
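
Putting the pieces together, the sketch below shows the overall shape of such a loader in a form that runs without Webpack or any external package. Two parts are swapped in for illustration: `renderPages` is a hypothetical stub that replaces the node-pdftocairo call, and `pageName` builds the filename with Node's crypto module instead of interpolateName. The fake context at the end implements only the members the loader uses.

```javascript
const crypto = require("crypto");
const path = require("path");

const ASSET_PATH = process.env.ASSET_PATH || "/";

// Hypothetical stub for the PDF conversion: resolves to one Buffer per page.
async function renderPages(source) {
	return [Buffer.from("page one"), Buffer.from("page two")];
}

// Swapped-in replacement for interpolateName: file name, page index, content hash.
function pageName(resourcePath, index, content) {
	const name = path.basename(resourcePath, path.extname(resourcePath));
	const hash = crypto.createHash("sha256").update(content).digest("hex").slice(0, 16);
	return `${name}-page${index}-${hash}.png`;
}

function pdfLoader(source) {
	const callback = this.async();
	(async () => {
		const images = await renderPages(source);
		const results = images.map((file, i) => {
			const filename = pageName(this.resourcePath, i, file);
			// Add the image to the build output and remember its public path.
			this.emitFile(filename, file);
			return `${ASSET_PATH}${filename}`;
		});
		return `export default ${JSON.stringify(results)}`;
	})().then((res) => callback(undefined, res), (err) => callback(err));
}
pdfLoader.raw = true;
module.exports = pdfLoader;

// Simulates a Webpack run with a minimal loader context.
const emitted = [];
const finished = new Promise((resolve, reject) => {
	const fakeContext = {
		resourcePath: "/project/src/my.pdf",
		async: () => (err, res) => (err ? reject(err) : resolve(res)),
		emitFile: (name, content) => emitted.push(name),
	};
	pdfLoader.call(fakeContext, Buffer.from("%PDF-"));
});
finished.then((res) => console.log(res, emitted));
```

In a real build, Webpack supplies the context, and the stub would be replaced by the actual conversion step.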

Conclusion

The above steps form the basis of writing a custom Webpack loader. With these, you can run any code during compilation, emit files, and use the results in the frontend.

March 23, 2021