Note that some Linux commands have options which as a side-effect suppress some failures, e.g.
“mkdir -p” and “rm -f”.
Also note, that the “errexit” mode, while a valuable first line of defense, does not catch all failures, i.e. under certain circumstances failing commands will go undetected.
(For more info, have a look at this thread.)
A reader suggested the additional use of "set -o pipefail"
Functions
Bash lets you define functions which behave like other commands -- use them liberally; it will give your bash scripts a much needed boost in readability:
ExtractBashComments() {
egrep "^#"
}
cat myscript.sh | ExtractBashComments | wc
comments=$(ExtractBashComments < myscript.sh)
Some more instructive examples:
SumLines() { # iterating over stdin - similar to awk
local sum=0
local line=””
while read line ; do
sum=$((${sum} + ${line}))
done
echo ${sum}
}
SumLines < data_one_number_per_line.txt
log() { # classic logger
local prefix="[$(date +%Y/%m/%d\ %H:%M:%S)]: "
echo "${prefix} $@" >&2
}
log "INFO" "a message"
Try moving all bash code into functions leaving
only global variable/constant definitions and a call to “main” at the top-level.
Variable Annotations
Bash allows for a limited form of variable annotations. The most important ones are:
- local (for local variables inside a function)
- readonly (for read-only variables)
# a useful idiom: DEFAULT_VAL can be overwritten
# with an environment variable of the same name
readonly DEFAULT_VAL=${DEFAULT_VAL:-7}
myfunc() {
# initialize a local variable with the global default
local some_var=${DEFAULT_VAL}
...
}
Note that it is possible to make a variable
read-only that wasn't before:
x=5
x=6
readonly x
x=7 # failure
Strive to annotate almost all variables in a bash script with either local or readonly.
Favor $() over backticks (`)
Backticks are hard to read and in some fonts easily confused with single quotes.
$()also permits nesting without the quoting headaches.
# both commands below print out: A-B-C-D
echo "A-`echo B-\`echo C-\\\`echo D\\\`\``"
echo "A-$(echo B-$(echo C-$(echo D)))"
Favor [[]] (double brackets) over []
[[]] avoids problems like unexpected pathname expansion, offers some syntactical improvements,
and adds new functionality:
Operator Meaning
|| logical or (double brackets only)
&& logical and (double brackets only)
< string comparison (no escaping necessary within double brackets)
-lt numerical comparison
= string matching with globbing
== string matching with globbing (double brackets only, see below)
=~ string matching with regular expressions (double brackets only , see below)
-n string is non-empty
-z string is empty
-eq numerical equality
-ne numerical inequality
single bracket
[ "${name}" \> "a" -o ${name} \< "m" ]
double brackets
[[ "${name}" > "a" && "${name}" < "m" ]]
Regular Expressions/Globbing
These new capabilities within double brackets are best illustrated via examples:
t="abc123"
[[ "$t" == abc* ]] # true (globbing)
[[ "$t" == "abc*" ]] # false (literal matching)
[[ "$t" =~ [abc]+[123]+ ]] # true (regular expression)
[[ "$t" =~ "abc*" ]] # false (literal matching)
Note, that starting with bash version 3.2 the regular or globbing expression
must not be quoted. If your expression contains whitespace you can store it in a variable:
r="a b+"
[[ "a bbb" =~ $r ]] # true
Avoiding Temporary Files
Some commands expect filenames as parameters so straightforward pipelining does not work.
This is where
<() operator comes in handy as it takes a command and transforms it into something
which can be used as a filename:
# download and diff two webpages
diff <(wget -O - url1) <(wget -O - url2)
Also useful are "here documents" which allow arbitrary multi-line string to be passed
in on stdin. The two occurrences of 'MARKER' brackets the document.
'MARKER' can be any text.
# DELIMITER is an arbitrary string
command << MARKER
...
${var}
$(cmd)
...
MARKER
If parameter substitution is undesirable simply put quotes around the first occurrence of MARKER:
command << 'MARKER'
...
no substitution is happening here.
$ (dollar sign) is passed through verbatim.
...
MARKER
Built-In Variables
For reference
$0 name of the script
$
n positional parameters to script/function
$$
PID of the script
$!
PID of the last command executed (and run in the background)
$?
exit status of the last command (${PIPESTATUS} for pipelined commands)
$#
number of parameters to script/function
$@ all parameters to script/function (sees arguments as separate word)
$* all parameters to script/function (sees arguments as single word)
Note
$* is
rarely the right choice.
$@ handles empty parameter list and white-space within parameters correctly
$@ should usually be quoted like so "$@"
Debugging
To perform a syntax check/dry run of your bash script run:
bash -n myscript.sh
To produce a trace of every command executed run:
bash -v myscripts.sh
To produce a trace of the expanded command use:
bash -x myscript.sh
-v and -x can also be made permanent by adding
set -o verbose and set -o xtrace to the script prolog.
This might be useful if the script is run on a remote machine, e.g.
a build-bot and you are logging the output for remote inspection.
Signs you should not be using a bash script
- your script is longer than a few hundred lines of code
- you need data structures beyond simple arrays
- you have a hard time working around quoting issues
- you do a lot of string manipulation
- you do not have much need for invoking other programs or pipe-lining them
- you worry about performance
Instead consider scripting languages like Python or Ruby.
References
Thanks to Peter Brinkmann and Kim Hazelwood for their feedback on drafts of this post.