Function tests in Bash and the errexit mode
#!/bin/sh vs #!/bin/bash, POSIX mode and the inconsistencies of errexit
Leaking Bash options can cause serious headaches when
unit testing shell scripts
because sourcing a test file might completely change the behavior of the subsequent commands used in
the test. A leaking set -e
might be especially troublesome as it can cause the test case to terminate early.
However, there's one more problem with it: in some cases set -e
does not work with subshells.
How it affects testing
In unit testing it's common to invoke a function in a subshell to capture its output:
source myscript.sh # 'import' function definitions
result=$(myfunction) # execute function
... # verify the output
In fact most Bash testing frameworks --- such as Bats or bash_unit --- recommend this practice.
Due to inconsistencies related to the -e
option, it might be possible that when a function is
executed like this, it will not take -e
into account, not even when set -e
is defined in the sourced
script.
Example
Here is an example where this problem surfaces.
#!/bin/bash
set -e
function getResponseCode() {
local url=$1
local result
result=$(curl --fail -o /dev/null -s -w "%{http_code}\n" "${url}")
echo $result
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
getResponseCode "$url"
fi
The script queries a URL to print the response's HTTP result code. It is designed to terminate silently in case of errors, such as HTTP 404.
# A valid url returning HTTP 200
» ./query.sh "https://www.google.com"
200
» echo $? # check exit code
0 # 0 -> OK
# An URL that redirects to https://www.google.com
» ./query.sh "https://google.com"
301
» echo $? # check exit code
0 # 0 -> OK
# URL returning HTTP 404 not found
» ./query.sh "https://www.google.com/invalid-url-returning-404"
» echo $? # check exit code
22 # non-zero -> terminated with error
This behavior is achieved with two things:
- At the beginning of the script
set -e
enables Bash'serrexit
option. It causes the early termination of the script when a command returns with a non-zero exit code. - Curl uses the
--fail
flag, which makes curl to exit with code22
when it encounters typical 4XX and 5XX HTTP responses.
So far so good, works as expected. Let's write some function-level tests for getResponseCode
.
Unit tests on the loose
Let's jump directly into the negative test case where things get interesting. As a start, I wrote the following using bash_unit:
test_getResponseCode_should_be_silent_on_404() {
source query.sh
result=$(getResponseCode "https://www.google.com/invalid-url-returning-404")
assert_equals "" "${result}"
}
I expected this test case to fail because of sourcing query.sh
the test case inherits the script's errexit
option
and I knew that getResponseCode
will exit with 22
. However, it failed for a different reason:
Running test_getResponseCode_should_be_silent_on_404... FAILURE
expected [] but was [404]
😲 😱 😳
For some reason, in this test scenario getResponseCode
behaved differently than when it was called during
normal execution.
At first sight I suspected that maybe the test framework does something that negates the effects of errexit
, but the
problem persisted even when I removed it from the equation:
#!/bin/bash
source query.sh
result=$(getResponseCode "https://www.google.com/invalid-url-returning-404")
echo "${result}" # still printing 404 instead of failing early
The cause
Being puzzled, I've posted this problem to r/bash where experienced veterans revealed the truth.
This strange behavior is a quirk of Bash's errexit
option:
subshells do not inherit the -e
option
unless Bash is running in POSIX mode.
Subshells spawned to execute command substitutions inherit the value of the -e option from the parent shell. When not in posix mode, bash clears the -e option in such subshells.
Examining the Bash POSIX Mode manual
revealed that POSIX mode is not the default unless the script is invoked as sh
.
So what's going on here?
My original test case was executed via bash_unit. Here's how the source code of the test framework starts:
#!/usr/bin/env bash
...
When I replaced bash_unit with a plain Bash solution to load and exercise the function. If you scroll up to the end of the last section, you can see the full source code, but the important detail is in the first line:
#!/bin/bash
...
None of these are invoked as sh
... so can it be that POSIX mode was not enabled in these cases?
I've replaced the first line of the plain Bash solution to invoke sh
rather than bash
...
#!/bin/sh
...
... and the snippet started to behave as I expected.
Such small thing changes how Bash works big time. If you are interested in all the 59 ways how POSIX mode (or the lack of it) might change your script's behavior, just check the manual.
🙊 🙈 🙉
POSIX mode can be enabled explicitly with set -o posix
. Alternatively, it's possible to make subshells inherit -e
without
using POSIX mode with shopt -s inherit_errexit
, but that's only available in newer Bash versions.
Summary
First when I bumped into the Use Bash Strict Mode
article I started using -e
, as it seemed to bring the capabilities of Bash closer to a real programming language.
This might be true for shorter scripts, but for anything just a little bit more complex its downsides
start to appear.