Better collection processing with collection pipelines
Why moving past loops is an important step
Collections, collections everywhere
Collection processing is an everyday task. So much so that most program logic is about transforming, searching, and ordering data. Mastering it, therefore, is an essential skill for moving up the programmer ladder.
Think of a leaderboard that shows user data ordered by score, a company dashboard that shows the latest news headlines, or even a chess game that draws the board. Collection processing is at the core of each of them.
This series covers the basics and how to do it right.
The for loop
The traditional approach (i.e., how everyone did it 20 years ago) is to use a for loop.
The typical scenario is an array of elements where you want to do something with each of them, like adding three:
const array = [5, 10, 15];
for (let i = 0; i < array.length; i++) {
  array[i] += 3;
}
console.log(array); // 8, 13, 18
The problem
This looks like a good solution, but unfortunately, this approach has several problems.
The first is the side effect: when this code runs, it modifies the original array. This is not immediately apparent, and in isolation the loop looks like the best solution.
But when you pass the same array around, it quickly becomes a problem.
Consider the following example, where the first call modifies the array, and thus the second function returns the wrong result:
const users = [
{active: true},
{active: true},
{active: false}
];
const active = countActiveUsers(users); // 2
const all = countAllUsers(users); // 2 !!!
If the first function call removes non-active users, the second one can no longer count the total.
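The two counting functions above are not defined in the text. The following minimal sketch reproduces the wrong result. It assumes a hypothetical countActiveUsers that removes inactive users from the array it receives (the side effect) and a hypothetical countAllUsers that simply returns the array's length:

```javascript
// Hypothetical helper: counts active users, but mutates its argument
// by removing inactive users in place (the side effect).
const countActiveUsers = (users) => {
  // Walk backward so splice() does not skip any elements
  for (let i = users.length - 1; i >= 0; i--) {
    if (!users[i].active) {
      users.splice(i, 1);
    }
  }
  return users.length;
};

// Hypothetical helper: counts every user in the array.
const countAllUsers = (users) => users.length;

const users = [
  {active: true},
  {active: true},
  {active: false}
];

const active = countActiveUsers(users); // 2
const all = countAllUsers(users); // 2, although there were 3 users
```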
While changing the order of the calls solves this particular problem, it also creates a hidden constraint: the calls must now happen in a specific order. In larger codebases, these constraints accumulate, and simple changes might break totally unrelated parts of the program.
Ever felt that no matter what you change, something breaks? Now you know one quite common cause.
Avoiding side effects
Let's move on to step two: how do we solve the side-effect problem?
Instead of modifying the original array, build a new one:
const array = [5, 10, 15];
const result = [];
for (const num of array) {
  result.push(num + 3);
}
console.log(result); // 8, 13, 18
Note: Since the index is no longer needed, the shorter for...of loop works here.
This approach avoids the side effect: the original array stays unchanged. Problem solved.
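The same idea fixes the earlier users example. This sketch again uses hypothetical countActiveUsers and countAllUsers helpers. The active users are collected into a new array instead of being removed from the original, so the order of the calls no longer matters:

```javascript
// Hypothetical helper: collects active users into a new array,
// leaving the input untouched.
const countActiveUsers = (users) => {
  const active = [];
  for (const user of users) {
    if (user.active) {
      active.push(user);
    }
  }
  return active.length;
};

// Hypothetical helper: counts every user in the array.
const countAllUsers = (users) => users.length;

const users = [
  {active: true},
  {active: true},
  {active: false}
];

console.log(countActiveUsers(users)); // 2
console.log(countAllUsers(users)); // 3
```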
Runaway complexity
The other problem is that for loops can quickly get out of control. This is especially true for nested loops and conditions: the deeper the nesting, the worse it gets.
const array = [5, 10, 15];
const result = [];
for (const num of array) {
  if (num % 2 !== 0) {
    result.push(num < 10 ? num * 2 : num);
  }
}
console.log(result); // 10, 15
The above code keeps only the odd numbers and doubles those that are less than 10. The specification is simple, but it is not obvious from looking at the code.
A page-long for loop nested several layers deep tends to survive forever: no one dares to refactor it, and the brave souls who try often fail spectacularly.
Towards a solution
Fortunately, most real-world problems require only a handful of operations, and combining them covers most use cases (you'd be surprised how many!).
We'll look into two of these basic building blocks: map and filter.
The map function
map is the operation we've already written by hand: it transforms each element of a collection and returns a new collection.
The tricky part is writing a generalized function that can transform the elements in any possible way.
The solution is the iteratee function: map takes, as one of its parameters, a function that does the transformation. Its signature is (element) => newElement.
A simple map implementation:
const map = (coll, iter) => {
  const result = [];
  for (const e of coll) {
    result.push(iter(e));
  }
  return result;
};
And its usage:
const array = [5, 10, 15];
map(array, (i) => i + 3); // 8, 13, 18
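Because the transformation is passed in as the iteratee, the same map handles very different transformations. The following short sketch repeats the simple map from above so the snippet runs on its own. The users data and its name and score fields are made up for illustration:

```javascript
// The simple map from above: applies iter to every element
// and collects the results into a new array.
const map = (coll, iter) => {
  const result = [];
  for (const e of coll) {
    result.push(iter(e));
  }
  return result;
};

// Illustrative data; the fields are hypothetical.
const users = [
  {name: "Alice", score: 12},
  {name: "Bob", score: 7}
];

// Extract a single property from each element
const names = map(users, (u) => u.name); // ["Alice", "Bob"]

// Turn numbers into labels
const labels = map([5, 10, 15], (i) => `#${i}`); // ["#5", "#10", "#15"]
```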
The filter function
filter builds a subarray using a predicate. The predicate follows the same principles as the iteratee for map, but its signature is (element) => bool. Only the elements for which the predicate returns a truthy value are kept in the result.
A simple filter implementation:
const filter = (coll, pred) => {
  const result = [];
  for (const e of coll) {
    if (pred(e)) {
      result.push(e);
    }
  }
  return result;
};
And its usage:
const array = [5, 10, 15];
filter(array, (i) => i % 2 === 0); // 10
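A predicate can inspect any part of the element. For instance, the users from the side-effect example can be counted with filter. filter returns a new array and leaves the original alone. This sketch repeats the simple filter from above so it runs on its own:

```javascript
// The simple filter from above: keeps the elements
// for which pred returns a truthy value.
const filter = (coll, pred) => {
  const result = [];
  for (const e of coll) {
    if (pred(e)) {
      result.push(e);
    }
  }
  return result;
};

const users = [
  {active: true},
  {active: true},
  {active: false}
];

const activeCount = filter(users, (u) => u.active).length; // 2
const allCount = users.length; // 3, the original array is unchanged
```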
Composition
Now that we have the building blocks, let's consider how to compose them for more complex processing!
Just call them one after the other, passing along the intermediate result:
const array = [5, 10, 15];
filter(
  map(
    array,
    (i) => i + 3
  ),
  (i) => i % 2 === 0
); // 8, 18
Well, this looks ugly, not to mention that it reads backward: the step that runs last (filter) is written first.
Instead, store the intermediate results:
const array = [5, 10, 15];
const mapped = map(array, (i) => i + 3); // 8, 13, 18
const filtered = filter(mapped, (i) => i % 2 === 0);
console.log(filtered); // 8, 18
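The same building blocks can express the earlier loop that keeps the odd numbers and doubles those less than 10. In this sketch, map and filter are repeated from above so it runs on its own. Each step states one part of the specification:

```javascript
const map = (coll, iter) => {
  const result = [];
  for (const e of coll) {
    result.push(iter(e));
  }
  return result;
};

const filter = (coll, pred) => {
  const result = [];
  for (const e of coll) {
    if (pred(e)) {
      result.push(e);
    }
  }
  return result;
};

const array = [5, 10, 15];
// Step 1: keep only the odd numbers
const odd = filter(array, (i) => i % 2 !== 0); // 5, 15
// Step 2: double those that are less than 10
const result = map(odd, (i) => (i < 10 ? i * 2 : i));
console.log(result); // 10, 15
```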
Collection pipelines
Usually, you don't need the intermediate results; their only purpose is to avoid writing the operations backward. Since many one-shot variables lying around are error-prone, it would be better to define the processing steps once, then just pass in the source collection and get back the result.
Such a sequence of processing steps is a collection pipeline.
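Here is one possible shape, as a minimal sketch. A hypothetical pipeline helper (not from any library) takes processing steps, each a function from collection to collection. It returns a single function that passes the source collection through them in order. This only illustrates the idea; it is not one of the implementations the series examines:

```javascript
// map and filter from above
const map = (coll, iter) => {
  const result = [];
  for (const e of coll) {
    result.push(iter(e));
  }
  return result;
};

const filter = (coll, pred) => {
  const result = [];
  for (const e of coll) {
    if (pred(e)) {
      result.push(e);
    }
  }
  return result;
};

// Hypothetical helper: combines steps (collection => collection)
// into one function that applies them left to right.
const pipeline = (...steps) => (coll) => {
  let result = coll;
  for (const step of steps) {
    result = step(result);
  }
  return result;
};

// Define the processing steps once, in reading order...
const addThreeKeepEven = pipeline(
  (coll) => map(coll, (i) => i + 3),
  (coll) => filter(coll, (i) => i % 2 === 0)
);

// ...then pass in the source collection and get back the result
console.log(addThreeKeepEven([5, 10, 15])); // 8, 18
```

No intermediate variables remain in the calling code, and the steps read in the order they run.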
Implementations
There are several ways to define collection pipelines. In the next episodes of this series, we'll look into the different implementations and detail their strengths and weaknesses.