NAME

Commands::Guarded - Better scripts through guarded commands

SYNOPSIS

  use Commands::Guarded;
  
  my $var = 0;
  
  step something =>
    ensure { $var == 1 }
    using { $var = 1 }
    ;  # $var is now 1
  
  step nothing =>
    ensure { $var == 1 }
    using { $var = 2 } # bug!
    ;  # $var is still 1 (good thing too)
  
  my $brokeUnless5 =
    step brokenUnless5 =>
    ensure { $var == 5 }
    using { $var = shift }
    ; # nothing happens yet
  
  print "var: $var\n"; # prints 1
  
  $brokeUnless5->do(5);
  
  print "now var: $var\n"; # prints 5
  
  step fail =>
    ensure { $var == 3 }
    using { $var = 2 }
    ; # Exception thrown here

DESCRIPTION

This module implements a deterministic, rectifying variant on Dijkstra's guarded commands. Each named step is passed two blocks: an "ensure" block that defines a test for a necessary and sufficient condition of the step, and a "using" block that will cause that condition to obtain. (If the "using" block is ommitted, the step acts as a simple assertion.)

If "step" is called in void context (i.e., is not assigned to anything or used as a value), the step is run immediately, as in this pseudocode:

  unless (ENSURE) {
    USING;
    die unless ENSURE;
  }

If "step" is called in scalar or array context, execution is deferred and instead a Commands::Guarded object is returned, which can be executed as above using the "do" method. If "do" is given arguments, they will be passed to the "ensure" block and (if necessary) the "using" block.

The interface to Commands::Guarded is thus a hybrid of exported subroutines (see SUBROUTINES below) and non-exported methods (see METHODS).

For a detailed discussion of the reason for this module's existence, see RATIONALE below.

SUBROUTINES

step NAME => EXPR...

Defines a new guarded command step. If called in void context, the step is executed immediately. If called in scalar or array context (i.e., in an expression or assignment), a Commands::Guarded object is returned (see METHODS below).

NAME is a string that will be printed on failure (also see "verbose" below).

EXPR is one or more Commands::Guarded blocks (see BLOCKS below). Typically at least a "ensure" and "using" block will be included.

Note that because "step" is a subroutine and not a control structure (though it acts like one in void context), it typically must be followed by a semicolon. It's recommended therefore to use the style

  step name =>
    ensure { ... }
    using { ... }
    ;

so as not to forget it.

verbose SCALAR

(Not exported by default.) If true, will print output not only on failure of a step, but also at the beginning of a step (i.e., after the "ensure" block is first run) indicating whether the condition failed ("Doing step name") or succeeded ("Skipping step name"). Also prints a message ("Step step name succeeded") if the "ensure" condition now obtains after running "using".

Whether or not "verbose" is set, an exception will be thrown if the condition fails to obtain after running "using", with the message "Step step name failed at line...".

Besides using this subroutine, the environment variable GUARDED_VERBOSE can also be used to control this behavior without modifying the code. GUARDED_VERBOSE will set the default behavior of "verbose"; when set to a true value, the script will run as if a verbose(1) were specified at the beginning. (A verbose(0) will always disable verbosity, no matter the value of GUARDED_VERBOSE.)

clear_rollbacks

(Not exported by default.) Clears rollbacks. See "rollback" in the section BLOCKS below.

BLOCKS

ensure BLOCK

Defines a test for the step. Should return true if the condition of the test has been met, false otherwise. It's common to write ensure blocks as a chain of boolean expressions:

  ensure { -d "$ENV{HOME}" and fgrep qr/^$userid:/, '/etc/passwd' }

but it is also possible to use "return" for more complicated tests:

  ensure {
    foreach my $dir (@dirs) {
      return 0 unless -d $dir;
    }
    return 1
  }

A true return from "ensure" will cause the script to continue execution. A false return can have two possible effects: it will run the step's "using" block, or, if the "using" block has already been run, it will throw an exception.

using BLOCK

Defines the code to affect the condition in "ensure". If the containing step's "ensure" block returns a false value, BLOCK will be run.

If the "using" block is omitted, the step will work as a simple assertion: if the "ensure" block returns a false value, an exception will be thrown.

sanity BLOCK

Defines a sanity check for a step. Like "ensure", BLOCK should define a condition. The condition is checked at the beginning of the enclosing step (prior to "ensure"), and again after running the "using" block (if the "using" block is run, of course). If it returns a false value, an exception is thrown with the message "Sanity check for step name failed".

Note given this behavior that a sanity check should specify an invariant condition, i.e. something you expect to be true whether or not the step has run with success or failure. For example:

  step removeScratch =>
    ensure { not fgrep qr|^\S*\s+/scratch|, '/etc/fstab' }
    using { ... }
    sanity { # Don't lose boot partition!
      fgrep qr|^\S*\s+/boot\s|, '/etc/fstab'
    }
    ;

rollback BLOCK

Defines a rollback action for the step. If this step, or any following step, fails (either through "ensure" verification or "sanity" check failure), the rollback will be run. If multiple rollbacks are defined, they will be run in LIFO (Last-In, First-Out) order.

Warning: if an exception ("die" or "croak") is thrown in your rollback, the script will stop and other rollbacks will not be called. If you truly intend to abort all previously set rollbacks, you should use "clear_rollbacks". You can (and probably should in most cases) call "clear_rollbacks" itself from within a "rollback" block:

  step clearRollbacks =>
    ensure { ... }
    using { ... }
    rollback { 
      clear_rollbacks;
      ...
    }
    ;

METHODS

->do
->do ARGS: Executes a step, possibly with arguments. If arguments are supplied, they will be passed to every block within the step. Note that the arguments are read-only within the block (i.e., attempting to modify an element of @_ will throw an exception), though you can use "shift", etc.
Some attempt is made to deal with return values, so you can get something approximating a reasonable result from "do" when the "using" block has executed. But the author has not found a real-world need for return values, so their behavior is not very well-defined. (Feel free to contact him if you believe you have a solution.)
->do_foreach LIST: For each item of LIST, check "ensure", passing the item as an argument. After all "ensure"s have been run, run "using" with those arguments whose "ensure" failed. Return values are not supported. At present, multiple arguments for each call are not supported, either (though you can certainly simulate that using a list-of-lists, if you write your blocks to take an arrayref).

UTILITY SUBROUTINES

These subroutines have nothing directly to do with the module, but they are so useful in conjunction with them, they have been included.

fgrep REGEX, SCALAR

Returns true if REGEX is found on any line of the file referenced by SCALAR. SCALAR can be a filehandle variable (not a bare filehandle) or a string, in which case it is opened. For instance:

  die "Load too high" 
    unless fgrep qr/averages: 0[.]/, '/usr/bin/uptime|';

Will throw an exception if the file cannot be opened for reading.

readf FILENAME

Returns a filehandle opened on FILENAME for reading. Will throw an exception if the file cannot be opened for reading.

writef FILENAME

Returns a filehandle opened on FILENAME for writing. Will throw an exception if the file cannot be opened for writing.

appendf

Returns a filehandle opened on FILENAME for appending. Will throw an exception if the file cannot be opened for appending.

RATIONALE

People often intuitively refer to some sorts of executables as "scripts" and others as "programs." When pressed for a definition, they will often fall back on language-specific criteria (such as whether the program is compiled or interpreted) that really do not capture the essence of the difference between scripting and more general-purpose programming.

A script generally differs from other programs in the following ways (there are exceptions):

1.: It makes heavy use of the external environment in which it runs
2.: It exports no complex data structures (though it may use them)
3.: It has no outer event loop and does not daemonize (a simple interactive prompt loop does not count)
4.: It is usually run by the author, the author's agent (cron, etc.), or by a system administrator, rather than by the anonymous "user"
5.: It has as its primary purpose ensuring that some desired state obtains in the system on which it runs (with "system" being defined as broadly as necessary).

Much has been written on good programming methodology, but in general such methodologies have general-purpose programs in mind. When applied to scripts, which are generally very high-level and procedural in nature, the methodologies can rapidly result in unreadable spaghetti, with more code devoted to methodology than to method.

Most scripters react in one of two ways: they either let the spaghetti ensue, or they throw up their hands and write fragile code.

An example

Suppose you want to write a script to mount a scratch directory from an NFS server. (This would usually be accomplished via a shell language such as bash, but for the sake of argument let's suppose that you're writing in Perl, because you need access to another module or perhaps just because you like Perl better.)

An optimistic implementation on a Red Hat Linux machine might be:

  # Add mount to filesystem table
  open FSTAB, ">>/etc/fstab";
  print FSTAB "$source:$scratch /net/$source/$scratch nfs $mount_opts\n";
  close FSTAB;
  # Create mountpoint
  mkdir $scratch;
  # Symlink to /scratch
  symlink "/net/$source/$scratch", '/scratch';
  # Start NFS services automatically at boot
  system "/sbin/chkconfig --level 3 portmap on";
  system "/sbin/chkconfig --level 3 nfslock on";
  # Start NFS services
  system "/sbin/service portmap start";
  system "/sbin/service nfslock start";
  # Mount at boot time
  system "/sbin/chkconfig --level 3 netfs on";
  # Mount now
  system "/sbin/service netfs start";

With no error-checking at all, this script would blindly charge on oblivious to any problems. If anything at all went wrong, the user would be left to pick up the pieces afterwards. Running the script a second time could be perilous, as the print statement would continue to append to /etc/fstab even if it had previously succeeded.

Good scripters will check for errors. The most common response to such errors is to abort:

  # Add mount to filesystem table
  open FSTAB, ">>/etc/fstab"
    or die "Can't open fstab for appending: $!\n";
  print FSTAB "$source:$scratch /net/$source/$scratch nfs $mount_opts\n";
  close FSTAB;
  # Create mountpoint
  mkdir $scratch
    or die "Can't create directory $scratch: $!\n";
  # Symlink to /scratch
  symlink "/net/$source/$scratch", '/scratch'
    or die "Can't make symlink to /scratch: $!\n";
  # Start NFS services automatically at boot
  system "/sbin/chkconfig --level 3 portmap on";
  if ($?) {
     die "Couldn't chkconfig on portmap\n";
  }
  system "/sbin/chkconfig --level 3 nfslock on";
  if ($?) {
     die "Couldn't chkconfig on nfslock\n";
  }
  # Start NFS services
  system "/sbin/service portmap start";
  if ($?) {
     die "Couldn't start portmap\n";
  }
  system "/sbin/service nfslock start";
  if ($?) {
     die "Couldn't start nfslock\n";
  }
  # Mount at boot time
  system "/sbin/chkconfig --level 3 netfs on";
  if ($?) {
     die "Couldn't start nfslock\n";
  }
  # Mount now
  system "/sbin/service netfs start";
  if ($?) {
     die "Couldn't start netfs\n";
  }

This implementation is certainly less likely to cause weird results, but it is by no means perfect. There are now nine places where the script may abnormally terminate, leaving the task incomplete and the user still to pick up the pieces. If the script aborts early, the user may choose to try to fix the problem encountered and then manually revert to the initial state so that the script can be re-executed.

But if the user misses any of the steps (say, deleting the line in /etc/fstab), the script will blithely carry on, unaware that some steps of the task are already done. (Worse yet, the first response of many users to an unexpected error message is simply to try the command again.)

If the script aborts late in the process, the user may try to fix the encountered problem and then finish the task manually. This too, is fraught with peril--and the entire point of automating the task was to reduce the chance of operator error!

One last observation about this new script--the functional code of the script has now been largely obscured by the error-checking code. In a larger, more complicated script, the code could rapidly degenerate into an unreadable mass.

Judicious use of a subroutine to factor out some of the error-checking improves readability somewhat:

  sub doOrDie (@) {
     system @_;
     if ($?) {
        die "Couldn't @_\n";
     }
  }
  # Add mount to filesystem table
  open FSTAB, ">>/etc/fstab"
    or die "Can't open fstab for appending: $!\n";
  print FSTAB "$source:$scratch /net/$source/$scratch nfs $mount_opts\n";
  close FSTAB;
  # Create mountpoint
  mkdir $scratch
    or die "Can't create directory $scratch: $!\n";
  # Symlink to /scratch
  symlink "/net/$source/$scratch", '/scratch'
    or die "Can't make symlink to /scratch: $!\n";
  # Start NFS services automatically at boot
  doOrDie "/sbin/chkconfig --level 3 portmap on";
  doOrDie "/sbin/chkconfig --level 3 nfslock on";
  # Start NFS services
  doOrDie "/sbin/service portmap start";
  doOrDie "/sbin/service nfslock start";
  # Mount at boot time
  doOrDie "/sbin/chkconfig --level 3 netfs on";
  # Mount now
  doOrDie "/sbin/service netfs start";

But suppose the system already had a preexisting mountpoint or symlink? This hardly seems like good reason for the script to entirely fail. The problem is that naive error-checking as above is syntactic in basis--a result of conditions intrinsic to the implementation of the script--rather than being semantic--i.e., relating to the state the script is trying to bring about.

Guarded commands to the rescue

These observations have resulted in the development of this module. Using guarded commands, the script can be written more resiliently, more clearly, and in many cases, more easily.

The first step in writing a script using guarded commands is to decompose the actions desired into a set of procedures, or steps. The above script can be so decomposed by observing the comments marking each action of the script:

  # Add mount to filesystem table
  # Create mountpoint
  # Symlink to /scratch
  # Start NFS services automatically at boot
  # Start NFS services
  # Mount at boot time
  # Mount now

These are the script's steps. (In this script, like many in system automation programming, the steps are strictly linear, with each dependent on one or more steps prior. Some scripts will have more complicated dependencies, loops, conditionals and the like.) For each step, one needs to define two things:

1.: A necessary and sufficient condition to judge whether the step has been completed.
2.: Code that will cause that condition to come into being.

To take the first step, "add mount to filesystem table," a necessary and sufficient condition can be expressed as

  `cat /etc/fstab` =~ m|^$source:$scratch\s+/net/$source/$scratch|

Note first that this check is semantic in nature. The code above would have created exactly one space between the two fields, but the regex allows for any amount of whitespace. One might be tempted to write the condition as

  `cat /etc/fstab` eq "$source:$scratch /net/$source/$scratch nfs $mount_opts\n"

since that is the text that the script will be writing out. But the script will be more resilient with the first condition, because it expresses exactly what later steps in the script need, no more, no less: that /etc/fstab contain an entry that will cause the desired filesystem to be mounted in the desired place via NFS. If conditions change--for example, a new machine is preconfigured with a suitable fstab entry--the script will continue to function.

Having written the condition for the step--expressed in an "ensure" block--the scripter then turns to how to bring the condition about. In this case, the code can be written

  open my $fstab, ">>/etc/fstab";
  print $fstab "$source:$scratch /net/$source/$scratch nfs $mount_opts";

Note that we do not check the return value of "open". There is no need. If we fail to open /etc/fstab, the "print" will fail. If the "print" fails, there will be no fstab entry corresponding to the regex above, and the script will fail for want of having obtained the condition. It may seem wrong at first--even blasphemous!--to willfully ignore the return value of a call like "open". This is the first Lesson:

Lesson 1.: Trust your conditions, and the rest will follow.

The entire script, rewritten with guarded commands, looks like this:

  use Commands::Guarded qw(:default fgrep appendf);
  
  step "Add mount to filesystem table" =>
    ensure { fgrep qr|^$source:$scratch\s+/net/$source/$scratch|, 
                   "/etc/fstab" }
    using {
       my $fstab = appendf '/etc/fstab';
       print $fstab 
         "$source:$scratch /net/$source/$scratch nfs $mount_opts";
    }
    ;
  step "Create mountpoint" =>
    ensure { -d $scratch }
    using { mkdir $scratch }
    ;
  step "Symlink to /scratch" =>
    ensure { readlink '/scratch' eq  "/net/$source/$scratch" }
    using { symlink "/net/$source/$scratch", '/scratch' }
    ;
  step "Start NFS services automatically at boot" =>
    ensure {
       fgrep qr/3:on/, '/sbin/chkconfig --list portmap|'
         and fgrep qr/3:on/, '/sbin/chkconfig --list nfslock|';
    }
    using {
       system "/sbin/chkconfig --level 3 portmap on";
       system "/sbin/chkconfig --level 3 nfslock on";
    }
    ;
  step "Start NFS services" =>
    ensure {
       fgrep qr/running/, "/sbin/service portmap status|"
         and fgrep qr/running/, "/sbin/service nfslock status|";
    }
    using {
       system "/sbin/service portmap start";
       system "/sbin/service nfslock start";
    }
    ;
  step "Mount at boot time" =>
    ensure { fgrep qr/3:on/, '/sbin/chkconfig --list netfs|' }
    using { system "/sbin/chkconfig --level 3 netfs on" }
    ;
  step "Mount now" =>
    ensure { fgrep qr|^$source:$scratch\b|, 'df|' }
    using { system "/sbin/service netfs start" }
    ;

With guarded commands, this script has numerous advantages over the previous ones:

It is more resilient. Because its checks are semantic in nature, the script will react properly to minor changes in the environment that would derail a conventionally written script.
If it fails due to some unforeseen problem, it can be rerun once the problem has been fixed. It will automatically pick up exactly where it left off.
Not only can this script be used to cause the intended state to come into being (in this case, mounting the scratch filesystem), it can be used to verify that the state exists. If it exits without failure, then the desired state is verified.
It would be perfectly reasonable, for instance, to include the above script in a crontab entry run periodically. If something went awry--e.g., the directory were unmounted or portmap was removed from the init list--the script would notice this problem and repair it.
Only important tests--the semantic ones--need to be written. There is no need to check every possible error condition of every line of code.
The error-checking code and the running code are held together in a single "step" command, but are separated into "ensure" and "using" blocks. This results in a more readable script without error-checking spaghetti.

Lesson 2.: With guarded commands, you can afford to be audacious.

If every line of code in a script that has a side effect is put into "using" blocks, the script becomes much less dangerous. There is a smaller chance that the script will "run away" and do something unexpected and horrible. Each step is checked after a "using" block is run, and so long as your semantic tests are correct, the script will halt if things start to go awry.

Lesson 3.: Take note of idempotence, and turn it to your advantage.

A function f is idempotent if it has the property

f(f(x)) = f(x)

in other words, something is idempotent if doing it twice has the same effect as doing it once. Many UNIX tools have the property of idempotence: ln -f, cp, and rsync are three examples. (Some tools look like they're idempotent but aren't, e.g. mount: you can mount the same filesystem twice on the "same", overlapping mountpoints.)

Idempotence is a large part of the power of Commands::Guarded. If you put every expression with side-effects (or, more precisely, side effects that will persist beyond the life of the script, e.g. writing files) into a "using" block, each successful step in the script becomes idempotent. In turn, a script that completes successfully is also idempotent: you can run it again and it should change nothing.

Knowing about idempotence can help you in writing your "ensure" and "using" blocks. For instance, the step above

  step "Start NFS services automatically at boot" =>
    ensure {
       fgrep qr/3:on/, '/sbin/chkconfig --list portmap|'
         and fgrep qr/3:on/, '/sbin/chkconfig --list nfslock|';
    }
    using {
       system "/sbin/chkconfig --level 3 portmap on";
       system "/sbin/chkconfig --level 3 nfslock on";
    }
    ;

Has been made simpler by noting the idempotence of chkconfig. It's possible that portmap is already enabled but <nfslock> is not, causing the "ensure" to fail. But because the chkconfig statements in the "using" block are idempotent, it is safe to run the line

       system "/sbin/chkconfig --level 3 portmap on";

again, even if portmap is already enabled.

EXPORTS

By default, "step", "ensure", and "using".

The following import tags can be used:

":step" (or ":default")

Imports the default subs of "step", "ensure", "using", "sanity", and "rollback". This is also what you get if you just say

  use Commands::Guarded;

but you will need to use one of these tags if you import another tag or named sub.

":utils"

Imports "fgrep", "readf", "writef", and "appendf".

TODO

A method to selectively clear rollbacks. This is complicated because the same rollback codeblock might be registered several times with different arguments using "do(ARGS)".
Rational behavior when "ensure" is omitted. Today it just throws an error.
A reasonable way to extend and subclass. You could do it today, but it would be relatively tough--which is why it's not documented.

SOURCE REPOSITORY

The source is available via git at <http://github.com/treyharris/Commands-Guarded/>.

ACKNOWLEDGMENTS

I would like to thank Damian Conway for his invaluable assistance on this module, including on naming of the constructs and the module itself, and for pointing out to me Dijkstra's prior work.

Thanks and love to J.D., for keeping me sane. As sane as I ever am, anyway.

AUTHOR

Trey Harris, <treyharris@gmail.com>

COPYRIGHT AND LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.