Collections, collections everywhere
Collection processing is an everyday task. So much so that most program logic is about transforming, searching, and ordering data. Mastering it is therefore an essential skill for moving up the programmer ladder.
Whether you work on a leaderboard that shows users ordered by score, a company dashboard that shows the headlines of the latest news, or even a chess game that draws the board, the core of the work is processing collections.
Learn the basics, and how to do it right, from this series.
The traditional approach (that is, how everyone did it 20 years ago) is to use a for loop.
The typical scenario is to have an array of elements and to do something with each of them, such as adding three to every element.
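A minimal sketch of such a loop, assuming an illustrative array named numbers (the name and values are assumptions, not taken from the original example):

```javascript
// Illustrative input; the values are an assumption for this sketch.
const numbers = [1, 2, 3];

// Classic index-based loop: overwrites each element in place.
for (let i = 0; i < numbers.length; i++) {
  numbers[i] = numbers[i] + 3;
}

console.log(numbers); // [4, 5, 6]
```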
This seems like a good solution, but unfortunately, there are many problems with this approach.
The first is the side effect: when this code runs, it modifies the original array. The problem is not immediately apparent, and in isolation the loop looks like the best solution.
But when you pass the same array around, it quickly becomes a problem.
Consider two functions that receive the same array: the first call modifies it, and thus the second one returns the wrong result.
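A sketch of that situation might look like the following. The removeInactive and countUsers functions and the user data are hypothetical, made up for illustration:

```javascript
// Hypothetical helper: removes non-active users from the array IN PLACE.
function removeInactive(users) {
  // Iterate backward so splice does not shift elements we have yet to visit.
  for (let i = users.length - 1; i >= 0; i--) {
    if (!users[i].active) {
      users.splice(i, 1);
    }
  }
  return users;
}

// Hypothetical helper: counts every user in the array.
function countUsers(users) {
  return users.length;
}

// Made-up data for the sketch.
const users = [
  { name: "Alice", active: true },
  { name: "Bob", active: false },
  { name: "Carol", active: true },
];

const activeUsers = removeInactive(users);
const total = countUsers(users);

console.log(activeUsers.length); // 2
console.log(total); // 2, although three users were defined
```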
If the first function call removes non-active users, the second one can no longer count the total.
While changing the order of the calls solves this particular problem, it also creates a constraint. For larger codebases, these constraints accumulate, and simple changes might break totally unrelated parts of the program.
Ever felt that no matter what you change, something breaks? Now you know one quite common cause.
Avoiding side effects
Let’s move on to step two: how do we solve the side-effect problem?
Do not modify the original array; build a new one instead.
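As a sketch, with an illustrative numbers array and a result array that are assumptions of this example:

```javascript
const numbers = [1, 2, 3];

// Collect the transformed values in a fresh array instead of overwriting the input.
const result = [];
for (let i = 0; i < numbers.length; i++) {
  result.push(numbers[i] + 3);
}

console.log(result); // [4, 5, 6]
console.log(numbers); // [1, 2, 3], unchanged
```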
Note: Since the index is no longer used, the for..of loop provides a shorter version.
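A sketch of the shorter form, again with illustrative names and values:

```javascript
const numbers = [1, 2, 3];

// for..of yields the elements directly, so no index bookkeeping is needed.
const result = [];
for (const number of numbers) {
  result.push(number + 3);
}

console.log(result); // [4, 5, 6]
```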
This approach avoids the side effect. Problem solved.
The other problem is that for loops can quickly get out of control. This is especially true for nested loops, and it gets worse with every additional level of nesting.
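A sketch of how the intent gets buried, here with just one loop and two levels of conditions. The input values are assumptions for illustration:

```javascript
const numbers = [1, 2, 5, 8, 11, 14];

const result = [];
for (let i = 0; i < numbers.length; i++) {
  if (numbers[i] % 2 === 1) {
    if (numbers[i] < 10) {
      result.push(numbers[i] * 2);
    } else {
      result.push(numbers[i]);
    }
  }
}

console.log(result); // [2, 10, 11]
```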
The above code keeps only the odd numbers and doubles those that are less than 10. Simple specification, but it is not immediately obvious from looking at the code.
A one-page-long for loop nested several layers deep usually survives forever: no one dares to refactor it, and the brave souls who try fail spectacularly.
Towards a solution
Fortunately, most real-world problems require only a handful of operations. And by combining them, most of the use cases can be covered (you’d be surprised by how many!).
We’ll look into two of these basic building blocks: map and filter.
map is the one we’ve already used. It transforms the elements of a collection and returns a new one.
The tricky part is writing a generalized function that can transform the elements in every possible way.
The solution is the iteratee function. map takes a function as one of its parameters, and that function does the transformation. Its signature is (element) => newElement.
A simple map implementation and its usage:
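The sketch below is one way to write it. The parameter order (array first, iteratee second) and the sample data are assumptions of this example:

```javascript
// A simple map sketch: calls the iteratee for each element and collects the results.
function map(array, iteratee) {
  const result = [];
  for (const element of array) {
    result.push(iteratee(element));
  }
  return result;
}

// Usage: add three to every element without touching the input.
const numbers = [1, 2, 3];
const plusThree = map(numbers, (element) => element + 3);

console.log(plusThree); // [4, 5, 6]
console.log(numbers); // [1, 2, 3], unchanged
```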
filter builds a new array from a subset of the elements, selected by a predicate. The predicate follows the same principles as the iteratee for map, but its signature is (element) => bool. Only the elements for which the predicate returns a truthy value are present in the result.
A simple filter implementation and its usage:
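Again a sketch rather than the definitive version, with the parameter order and the sample data as assumptions:

```javascript
// A simple filter sketch: keeps the elements for which the predicate is truthy.
function filter(array, predicate) {
  const result = [];
  for (const element of array) {
    if (predicate(element)) {
      result.push(element);
    }
  }
  return result;
}

// Usage: keep only the odd numbers.
const numbers = [1, 2, 3, 4, 5];
const odds = filter(numbers, (element) => element % 2 === 1);

console.log(odds); // [1, 3, 5]
console.log(numbers); // [1, 2, 3, 4, 5], unchanged
```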
Now that we have the building blocks, let’s consider how to compose them for more complex processing!
Just call them one after the other, passing the intermediary result along:
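A sketch of the nested call. To keep the snippet runnable on its own, map and filter are defined here as thin wrappers over the built-in array methods, which is an assumption of this example:

```javascript
// Stand-ins for the map and filter helpers, delegating to the built-in array methods.
const map = (array, iteratee) => array.map((element) => iteratee(element));
const filter = (array, predicate) => array.filter((element) => predicate(element));

const numbers = [1, 2, 3, 4, 5];

// filter runs first, yet map is the first thing you read.
const result = map(filter(numbers, (n) => n % 2 === 1), (n) => n * 2);

console.log(result); // [2, 6, 10]
```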
Well, this looks ugly, not to mention that it is written backward.
Instead, store the intermediary results:
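A sketch with named intermediary results. The map and filter wrappers and the variable names are assumptions, included so the snippet runs on its own:

```javascript
// Stand-ins for the map and filter helpers, delegating to the built-in array methods.
const map = (array, iteratee) => array.map((element) => iteratee(element));
const filter = (array, predicate) => array.filter((element) => predicate(element));

const numbers = [1, 2, 3, 4, 5];

// The steps now read in the order they run.
const odds = filter(numbers, (n) => n % 2 === 1);
const doubledOdds = map(odds, (n) => n * 2);

console.log(doubledOdds); // [2, 6, 10]
```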
Usually, you don’t need the intermediary results, and their only purpose is to prevent writing the operations backward. As it’s error-prone to have many one-shot variables around, it would be better to define multiple processing steps and just pass in the source collection and get back the result.
These are collection pipelines.
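To make the idea concrete, here is one possible shape, sketched with a hypothetical pipeline helper. The helper and the step names are assumptions for illustration and not necessarily any of the implementations this series covers:

```javascript
// Hypothetical helper: chains steps so that each receives the previous step's result.
const pipeline = (...steps) => (collection) =>
  steps.reduce((intermediate, step) => step(intermediate), collection);

// Each step takes a collection and returns a new one.
const keepOdd = (numbers) => numbers.filter((n) => n % 2 === 1);
const doubleAll = (numbers) => numbers.map((n) => n * 2);

// Define the processing once, then pass in the source collection.
const oddsDoubled = pipeline(keepOdd, doubleAll);

console.log(oddsDoubled([1, 2, 3, 4, 5])); // [2, 6, 10]
```

The steps are listed in the order they run, and no one-shot variables are left behind.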
There are several ways to define collection pipelines. In the next episodes in this series, we’ll look into the different implementations, and detail their strengths and weaknesses.