The tips and tricks below originally appeared as one of Google's "Testing on the Toilet" (TOTT) episodes.
This is a revised and augmented version.
Safer Scripting
I start every bash script with the following prolog:
If a failing command is to be tolerated use this idiom:
“mkdir -p” and “rm -f”.
Also note, that the “errexit” mode, while a valuable first line of defense, does not catch all failures, i.e. under certain circumstances failing commands will go undetected.
(For more info, have a look at this thread.)
A reader suggested the additional use of "set -o pipefail"
Some more instructive examples:
Try moving all bash code into functions leaving only global variable/constant definitions and a call to “main” at the top-level.
Strive to annotate almost all variables in a bash script with either local or readonly.
$()also permits nesting without the quoting headaches.
and adds new functionality:
Operator Meaning
|| logical or (double brackets only)
&& logical and (double brackets only)
< string comparison (no escaping necessary within double brackets)
-lt numerical comparison
= string matching with globbing
== string matching with globbing (double brackets only, see below)
=~ string matching with regular expressions (double brackets only , see below)
-n string is non-empty
-z string is empty
-eq numerical equality
-ne numerical inequality
single bracket
double brackets
These new capabilities within double brackets are best illustrated via examples:
must not be quoted. If your expression contains whitespace you can store it in a variable:
This is where <() operator comes in handy as it takes a command and transforms it into something
which can be used as a filename:
in on stdin. The two occurrences of 'MARKER' brackets the document.
'MARKER' can be any text.
If parameter substitution is undesirable simply put quotes around the first occurrence of MARKER:
$n positional parameters to script/function
$$ PID of the script
$! PID of the last command executed (and run in the background)
$? exit status of the last command (${PIPESTATUS} for pipelined commands)
$# number of parameters to script/function
$@ all parameters to script/function (sees arguments as separate word)
$* all parameters to script/function (sees arguments as single word)
$@ handles empty parameter list and white-space within parameters correctly
$@ should usually be quoted like so "$@"
bash -n myscript.sh
To produce a trace of every command executed run:
bash -v myscripts.sh
To produce a trace of the expanded command use:
bash -x myscript.sh
-v and -x can also be made permanent by adding
set -o verbose and set -o xtrace to the script prolog.
This might be useful if the script is run on a remote machine, e.g.
a build-bot and you are logging the output for remote inspection.
#!/bin/bashThis will take care of two very common errors:
set -o nounset
set -o errexit
- Referencing undefined variables (which default to "")
- Ignoring failing commands
If a failing command is to be tolerated use this idiom:
if ! <possible failing command> ; then
echo "failure ignored"
fi
Note that some Linux commands have options which as a side-effect suppress some failures, e.g.
“mkdir -p” and “rm -f”.
Also note, that the “errexit” mode, while a valuable first line of defense, does not catch all failures, i.e. under certain circumstances failing commands will go undetected.
(For more info, have a look at this thread.)
A reader suggested the additional use of "set -o pipefail"
Functions
Bash lets you define functions which behave like other commands -- use them liberally; it will give your bash scripts a much needed boost in readability:ExtractBashComments() {
egrep "^#"
}
cat myscript.sh | ExtractBashComments | wc
comments=$(ExtractBashComments < myscript.sh)
Some more instructive examples:
SumLines() { # iterating over stdin - similar to awk
local sum=0
local line=””
while read line ; do
sum=$((${sum} + ${line}))
done
echo ${sum}
}
SumLines < data_one_number_per_line.txt
log() { # classic logger
local prefix="[$(date +%Y/%m/%d\ %H:%M:%S)]: "
echo "${prefix} $@" >&2
}
log "INFO" "a message"
Try moving all bash code into functions leaving only global variable/constant definitions and a call to “main” at the top-level.
Variable Annotations
Bash allows for a limited form of variable annotations. The most important ones are:- local (for local variables inside a function)
- readonly (for read-only variables)
# a useful idiom: DEFAULT_VAL can be overwritten
# with an environment variable of the same name
readonly DEFAULT_VAL=${DEFAULT_VAL:-7}
myfunc() {Note that it is possible to make a variable read-only that wasn't before:
# initialize a local variable with the global default
local some_var=${DEFAULT_VAL}
...
}
x=5
x=6
readonly x
x=7 # failure
Strive to annotate almost all variables in a bash script with either local or readonly.
Favor $() over backticks (`)
Backticks are hard to read and in some fonts easily confused with single quotes.$()also permits nesting without the quoting headaches.
# both commands below print out: A-B-C-D
echo "A-`echo B-\`echo C-\\\`echo D\\\`\``"
echo "A-$(echo B-$(echo C-$(echo D)))"
Favor [[]] (double brackets) over []
[[]] avoids problems like unexpected pathname expansion, offers some syntactical improvements,and adds new functionality:
Operator Meaning
|| logical or (double brackets only)
&& logical and (double brackets only)
< string comparison (no escaping necessary within double brackets)
-lt numerical comparison
= string matching with globbing
== string matching with globbing (double brackets only, see below)
=~ string matching with regular expressions (double brackets only , see below)
-n string is non-empty
-z string is empty
-eq numerical equality
-ne numerical inequality
single bracket
[ "${name}" \> "a" -o ${name} \< "m" ]
double brackets
[[ "${name}" > "a" && "${name}" < "m" ]]
Regular Expressions/Globbing
These new capabilities within double brackets are best illustrated via examples:
t="abc123"Note, that starting with bash version 3.2 the regular or globbing expression
[[ "$t" == abc* ]] # true (globbing)
[[ "$t" == "abc*" ]] # false (literal matching)
[[ "$t" =~ [abc]+[123]+ ]] # true (regular expression)
[[ "$t" =~ "abc*" ]] # false (literal matching)
must not be quoted. If your expression contains whitespace you can store it in a variable:
r="a b+"
[[ "a bbb" =~ $r ]] # true
Globbing based string matching is also available via the case statement:
Basics
Substitution (with globbing)
case $t in
abc*) <action> ;;
esac
String Manipulation
Bash has a number of (underappreciated) ways to manipulate strings.Basics
f="path1/path2/file.ext"
len="${#f}" # = 20 (string length)
# slicing: ${<var>:<start>} or ${<var>:<start>:<length>}
slice1="${f:6}" # = "path2/file.ext"
slice2="${f:6:5}" # = "path2"
slice3="${f: -8}" # = "file.ext"(Note: space before "-")
pos=6
len=5
slice4="${f:${pos}:${len}}" # = "path2"
Substitution (with globbing)
f="path1/path2/file.ext"
single_subst="${f/path?/x}" # = "x/path2/file.ext"
global_subst="${f//path?/x}" # = "x/x/file.ext"
# string splitting
readonly DIR_SEP="/"
array=(${f//${DIR_SEP}/ })
second_dir="${array[1]}" # = path2
Deletion at beginning/end (with globbing)
f="path1/path2/file.ext"
# deletion at string beginning extension="${f#*.}" # = "ext"
# greedy deletion at string beginning
filename="${f##*/}" # = "file.ext"
# deletion at string end
dirname="${f%/*}" # = "path1/path2"
# greedy deletion at end
root="${f%%/*}" # = "path1"
Avoiding Temporary Files
Some commands expect filenames as parameters so straightforward pipelining does not work.This is where <() operator comes in handy as it takes a command and transforms it into something
which can be used as a filename:
# download and diff two webpagesAlso useful are "here documents" which allow arbitrary multi-line string to be passed
diff <(wget -O - url1) <(wget -O - url2)
in on stdin. The two occurrences of 'MARKER' brackets the document.
'MARKER' can be any text.
# DELIMITER is an arbitrary string
command << MARKER
...
${var}
$(cmd)
...
MARKER
If parameter substitution is undesirable simply put quotes around the first occurrence of MARKER:
command << 'MARKER'
...
no substitution is happening here.
$ (dollar sign) is passed through verbatim.
...
MARKER
Built-In Variables
For reference
$0 name of the script$n positional parameters to script/function
$$ PID of the script
$! PID of the last command executed (and run in the background)
$? exit status of the last command (${PIPESTATUS} for pipelined commands)
$# number of parameters to script/function
$@ all parameters to script/function (sees arguments as separate word)
$* all parameters to script/function (sees arguments as single word)
Note
$* is rarely the right choice.$@ handles empty parameter list and white-space within parameters correctly
$@ should usually be quoted like so "$@"
Debugging
To perform a syntax check/dry run of your bash script run:bash -n myscript.sh
To produce a trace of every command executed run:
bash -v myscripts.sh
To produce a trace of the expanded command use:
bash -x myscript.sh
-v and -x can also be made permanent by adding
set -o verbose and set -o xtrace to the script prolog.
This might be useful if the script is run on a remote machine, e.g.
a build-bot and you are logging the output for remote inspection.
Signs you should not be using a bash script
- your script is longer than a few hundred lines of code
- you need data structures beyond simple arrays
- you have a hard time working around quoting issues
- you do a lot of string manipulation
- you do not have much need for invoking other programs or pipe-lining them
- you worry about performance
Instead consider scripting languages like Python or Ruby.
References
- Advanced Bash-Scripting Guide: http://tldp.org/LDP/abs/html/
- Bash Reference Manual
Thanks to Peter Brinkmann and Kim Hazelwood for their feedback on drafts of this post.