I dislike shell scripting

I do a lot of shell scripting for Linux systems, and I have never liked it. By shell scripting, I mean writing small programs in shell script interpreted by dash (POSIX), bash or zsh. And since the terms are really fuzzy, from now on I will call shell script the common language interpreted by dash, bash or zsh, and I will call shell scripting the act of using shell script.

Let me explain why I do shell scripting, show you the flaws that make me dislike it, and walk through some traps that justify my dislike.

Why do I do shell scripting?

I like to automate things on my computer as much as possible, and I am also paid to do that as a software developer. Since I am using Linux (or GNU/Linux if you prefer), the best way to do that is shell scripting rather than another tool.

The first reason I use shell script is its availability. I know that when I start my scripts with #!/bin/sh, they will almost certainly run on any Linux system. This is not the case if I use Python or Tcl.

The second reason is its lack of breaking changes. If I write a script once, especially in POSIX, it will certainly be valid later. This is not the case with Python.

The third reason is its speed. Shell scripts often run faster than Python scripts, if only because the shell starts almost instantly while the Python interpreter has a noticeable startup cost.

All of those are excellent reasons to use shell script. Now let's talk about the flaws which make me dislike it.

Flaws

I show here the most obvious flaws of shell script. If you already know shell script well, you may be more interested in the traps I describe in the next section.

The syntax

Once you write shell scripts with the motto everything is a command, writing them can be an enjoyable process. But before you have that motto in mind, you will certainly be confused to discover that scripts like the following are incorrect.

This is the mistake I made most often when I started shell scripting, since I had a C background.

#!/bin/sh
my_var="foo"
if["$my_var" = "foo"]; then
	echo "my_var is foo"
fi

If you run this script, you get a very unhelpful error message.

3: Syntax error: "then" unexpected

Contrary to C, in which parentheses are punctuation and do not require surrounding spaces, in shell script the character [ is a command, not punctuation; you can even read its manual by entering man [ in a terminal. The whole if["$my_var" is therefore interpreted as a single command name. Once you have the motto everything is a command in mind, you stop making this mistake because you are more careful to separate commands from their arguments.
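Once the spaces are added around [ and its arguments, the script behaves as intended (the msg variable is added here only to make the result easy to check):

```shell
#!/bin/sh
# Corrected version: [ is a command and ] its last argument, so both
# must be separated from their neighbours by spaces.
my_var="foo"
if [ "$my_var" = "foo" ]; then
	msg="my_var is foo"   # kept in a variable just for illustration
	echo "$msg"
fi
```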

Speaking of mistakes due to spaces, here is another common rookie mistake.

print-argument
#!/bin/sh
if [ $1 = 'the argument' ]; then
	echo 'Your first argument is "the argument".'
fi

Execute the script like that.

./print-argument "the argument"

And you get this unexpected and weird error message.

./print-argument: 2: [: the: unexpected operator

Why? Because you forgot to put quotes around $1, so it was expanded and split into the two words the and argument. The [ command did not expect the word the where it expected an operator.
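The fix is simply to quote the expansion. Here is a sketch of the corrected print-argument, using set -- to simulate the command-line argument so the example is self-contained:

```shell
#!/bin/sh
# Fixed print-argument: with quotes, "$1" stays a single word even
# when the argument contains spaces.
set -- "the argument"   # simulates: ./print-argument "the argument"
matched=no
if [ "$1" = 'the argument' ]; then
	matched=yes
	echo 'Your first argument is "the argument".'
fi
```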

I cannot imagine how many hours were wasted in software development on this particular mistake. It is so common that the usual advice given to users is to never put spaces in file names. This is insane: it should not be up to the user to adapt their behaviour to the programming language, it should be the other way around!

Very loose data structure

In shell script, basically the only data structure that exists is the string. There is no record type, no associative array (or dictionary or hash map), no set, and the array structure in Bash looks more like a hack than anything else.

The lack of data structures is so ridiculous that you often have to resort to silly hacks like echo "$my_var" | cut -f2 to group and access multiple values in a single variable!
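To make the hack concrete, here is a sketch of such a poor man's record; user_record and its field names are invented for the illustration, with a colon as separator:

```shell
#!/bin/sh
# Several values packed into one string, then extracted with cut.
user_record="alice:1001:/home/alice"
name=$(printf '%s\n' "$user_record" | cut -d: -f1)
uid=$(printf '%s\n' "$user_record" | cut -d: -f2)
home_dir=$(printf '%s\n' "$user_record" | cut -d: -f3)
echo "$name has uid $uid and home $home_dir"
```

This falls apart as soon as a value contains the separator, which is exactly why it is a hack and not a data structure.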

If you need something a bit more elaborate in your script than a list of instructions with strings, you should use another language. And if you choose another language, do not choose Awk if you need an associative array inside an associative array!

Inconsistency

When you search online for help, more often than not you find answers for Bash but not for POSIX shell. POSIX shell, Bash and Zsh share a lot of similarities, but they also have a lot of subtle differences. Do not expect the read, set, trap or even test instructions to work the same everywhere.

Even if you stick with one shell script flavour, you can run into issues. For example with Zsh: if you put echo "Hello World!" in a Zsh script, it prints Hello World! just fine, but if you type the same command in an interactive terminal, it complains about a missing double quote!

And because shell script language is so limited, you must rely a lot on the software installed on the system. You must ask yourself if your script is executed on a UNIX system or a GNU/Linux system. When you use awk, is it POSIX awk or GNU awk? When you use grep, is it POSIX grep or GNU grep? It is so easy to have a script that works on my machine but cannot be used in production.
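If you have to depend on a GNU-only feature, one defensive move is to detect the implementation at run time instead of assuming it. A sketch (grep_flavour is a name I made up):

```shell
#!/bin/sh
# Detect whether grep is the GNU implementation before relying on a
# GNU-only flag. Errors are silenced because not every grep
# implementation accepts --version.
if grep --version 2>/dev/null | grep -q 'GNU grep'; then
	grep_flavour=gnu
else
	grep_flavour=other
fi
echo "grep flavour: $grep_flavour"
```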

Some traps

Now let's see some fun traps that I encountered. They prove that you cannot spell shell without hell.

Using trap in a function

The first trap I want to show involves the trap command itself! I use Bash only here because the code is not compatible with dash.

You know that when a function returns 0 or a command exits with 0, it means true. Any other value means false.

To get the return value of the last executed function or the exit status of the last executed command, you can check the variable $?. If you want to test in an if whether it means true or false, you can instead execute the function or the command directly in the if. Here is the example from the ShellCheck SC2181 error code wiki page. (You should use ShellCheck to check your shell scripts!)

make mytarget
if [ $? -ne 0 ]; then
  echo "Build failed"
fi

# The same behaviour (spoiler alert: it is not!):
if ! make mytarget; then
  echo "Build failed"
fi

To show you the trap here, I create a function named return_false which returns 1.

#!/bin/bash
return_false() {
	true
	return 1
}
return_false
echo "$?"
if return_false; then
	echo "This is true."
else
	echo "This is false."
fi

This code has the expected behaviour when executed.

1
This is false.

Now I want a more elaborate function. I want this function to return 1 only if a command in it returns something other than 0. If no command fails, then the function does not fail so it returns 0. To do that I use trap.

#!/bin/bash
return_false() {
	local old_ERR_trap=$(trap -p ERR)
	if [[ -z $old_ERR_trap ]]; then old_ERR_trap="trap - ERR"; fi
	trap 'local ret=$?; eval "$old_ERR_trap"; [[ $ret -ne 0 ]] && to_return=1' ERR
	to_return=0
	false
	true
	return $to_return
}
return_false
echo "$?"
if return_false; then
	echo "This is true."
else
	echo "This is false."
fi

And here is the strange behaviour.

1
This is true.

As you can see, even though the function returns 1 when called on its own, it behaves as if it returned 0 when checked directly in the if. The reason is documented in the Bash manual: the ERR trap is not executed when the failing command is part of the test in an if statement (or of a && or || list), and this suppression extends into functions called from such a context, so to_return is never set to 1. The problem does not appear when trap is used at the top level of a script; it bites when the trapped code runs in a conditional context like this.

As always, do not blindly follow the advice of a linter.

Using read in read

Let's say that I have a file named lines.txt containing the following.

lines.txt
Line 1
Line 2
Line 3
Line 4

I want to read this file line by line in a script named read-lines. The first method I found on the internet, and the most common one, is this one.

read-lines
#!/bin/sh
while IFS= read -r line; do
	printf '%s\n' "$line"
done < lines.txt

The script prints the content of lines.txt. Now I want to ask the user whether each line should be printed. I create a script named ask-confirmation which exits with 0 if the user confirms by pressing y, and with 1 otherwise.

ask-confirmation
#!/bin/sh
printf '%s (y/N) ' "$1"
read -r result
[ "$result" = 'y' ]

Now I want to use this small script in read-lines. I test it before really integrating it, so first I put in some fake code.

read-lines
#!/bin/sh

line_number=1
line="This is a fake line just to test."
if ./ask-confirmation "Do you want to print line #$line_number?"; then
	printf '\t%s\n' "$line"
fi

# while IFS= read -r line; do
# 	printf '%s\n' "$line"
# 	line_number=$((line_number+1))
# done < lines.txt

And then I test it. (Lines starting with $ are commands that I executed in my terminal.)

$ ./read-lines
Do you want to print line #1? (y/N) y
	This is a fake line just to test.
$ ./read-lines
Do you want to print line #1? (y/N) N
$ ./read-lines
Do you want to print line #1? (y/N) 
$

Great, it works as expected so now I can implement that for real!

read-lines
#!/bin/sh
line_number=1
while IFS= read -r line; do
	if ./ask-confirmation "Do you want to print line #$line_number?"; then
		printf '\t%s\n' "$line"
	fi
	line_number=$((line_number+1))
done < lines.txt

And now, to my great disappointment, here is the result. (The character % at the end means the output ended without a newline character.)

$ ./read-lines
Do you want to print line #1? (y/N) Do you want to print line #2? (y/N) %
$

As you can see, I could not even type anything. This is the typical behaviour when you use a read inside a read: the loop redirects standard input from lines.txt, ask-confirmation inherits that standard input, so its read consumes the next line of the file instead of waiting for the keyboard. And it is so easy to hit this issue, since this is the recommended way to read a file line by line. It also happens when a program run inside the loop prompts for an action (fossil sync, git push...).
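One common fix is to feed the loop through another file descriptor, so that standard input stays free for whatever runs inside the loop body. A sketch (the temporary file is created here only to make the example self-contained):

```shell
#!/bin/sh
# Read the file through file descriptor 3 instead of standard input.
tmpfile=$(mktemp)
printf 'Line 1\nLine 2\nLine 3\n' > "$tmpfile"
count=0
while IFS= read -r line <&3; do
	# Standard input is untouched here, so ./ask-confirmation (or
	# git push, fossil sync...) could still read from the keyboard.
	count=$((count+1))
done 3< "$tmpfile"
rm -f "$tmpfile"
echo "read $count lines"
```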

Using cd in a script

Let's say that I have a project, and in this project a directory containing multiple scripts. One of these scripts is used to build the project. I have this file hierarchy.

Project root:     /tmp/My-Project/
Project scripts:  /tmp/My-Project/scripts/
Build script:     /tmp/My-Project/scripts/build-project

I want to be able to call build-project from anywhere in my system. Here is the header of the script.

build-project
#!/bin/sh
echo "dirname is: $(dirname "$0")"
echo "Project root path is: $(cd "$(dirname "$0")"/.. && pwd -P)"

When I am in /tmp/My-Project, running the script as ./scripts/build-project returns the following.

dirname is: ./scripts
Project root path is: /tmp/My-Project

Now most of the scripts used by build-project are in the scripts directory, and I do not want to prefix all of them with the project root path! So I decide to use cd in the script so my script is executed from /tmp/My-Project/scripts.

build-project
#!/bin/sh
cd "$(dirname "$0")"
echo "dirname is: $(dirname "$0")"
echo "Project root path is: $(cd "$(dirname "$0")"/.. && pwd -P)"

But now, when I am in /tmp/My-Project just as before, running the script as ./scripts/build-project messes up the result for the project root path, even though the dirname is the same!

dirname is: ./scripts
Project root path is: /tmp/My-Project/scripts

The problem is that $0 holds a relative path: once the script has done cd, the relative ./scripts in $(dirname "$0") is resolved from the new working directory, and since cd resolves .. logically, cd ./scripts/.. lands right back in /tmp/My-Project/scripts. The lesson: get the project root path before any use of cd in the script.
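A sketch of the workaround (project_root is a name I chose for the example): resolve the root once from $0, before any cd, and keep it as an absolute path.

```shell
#!/bin/sh
# Compute the absolute project root first, while $0 is still
# meaningful relative to the original working directory.
project_root=$(cd "$(dirname "$0")/.." && pwd -P)
# Only now is it safe to change directory.
cd "$(dirname "$0")" || exit 1
echo "Project root path is: $project_root"
```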

Argument expansion with echo and printf

For this example I create a script named print-args which prints the arguments. I always execute this script this way.

./print-args "a b c" "x y z"

To access the list of arguments, you can use either $* or $@. I use echo to print these variables.

print-args
#!/bin/sh
echo "\$@: \"$@\""
echo "\$*: \"$*\""

It returns what I expect.

$@: "a b c x y z"
$*: "a b c x y z"

Using echo is not ideal: I have to escape special characters and it is not very portable. I change my code to use printf.

print-args
#!/bin/sh
printf '$@: %s\n' "$@"
printf '$*: %s\n' "$*"

And now I get a completely unexpected behaviour!

$@: a b c
$@: x y z
$*: a b c x y z

Why did the behaviour suddenly change for $@? The answer is explained in the manual of dash.

$*           Expands to the positional parameters, starting from one.
             When the expansion occurs within a double-quoted string it
             expands to a single field with the value of each parameter
             separated by the first character of the IFS variable, or by
             a ⟨space⟩ if IFS is unset.

$@           Expands to the positional parameters, starting from one.
             When the expansion occurs within double-quotes, each
             positional parameter expands as a separate argument.  If
             there are no positional parameters, the expansion of @
             generates zero arguments, even when @ is double-quoted.
             What this basically means, for example, is if $1 is "abc"
             and $2 is "def ghi", then "$@" expands to the two
             arguments:

                   "abc" "def ghi"

This means that this line.

printf '$@: %s\n' "$@"

Is equivalent in this example to this line.

printf '$@: %s\n' "a b c" "x y z"

This is obvious when it is explicitly described, as it is here. But imagine you are not a seasoned shell script developer and do not really know the difference between $* and $@. Maybe you would write the following code.

if [ $# -ge 2 ]; then
	printf 'The second argument in "%s" is "%s".\n' "$@" "$2"
fi

Then the result is certainly not what you expected.

The second argument in "a b c" is "x y z".
The second argument in "x y z" is "".
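If what you want is to visit each original argument exactly once, spaces included, the safe idiom is to loop over "$@" rather than splicing it into a format string. A sketch, again simulating the arguments with set --:

```shell
#!/bin/sh
# With "$@" in a for loop, every original argument stays one word.
set -- "a b c" "x y z"   # simulates: ./print-args "a b c" "x y z"
count=0
for arg in "$@"; do
	count=$((count+1))
	printf 'argument %d: %s\n' "$count" "$arg"
done
```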

Conclusion

After years of experience, I still regularly have to write small scripts just to check the behaviour of the language before putting code in my real scripts. It is ridiculous that I still have to do that.

When I write a script, I want to write it as quickly and as correctly as possible. All those traps and all the weirdness of shell script greatly reduce my productivity.

And unfortunately, for the moment, shell script remains the best language I can use to make portable, reliable and fast scripts. So I will continue to use it, even though I dislike shell scripting.