 |
|
| |
GOTO-INSTRUMENT(1) |
User Commands |
GOTO-INSTRUMENT(1) |
goto-instrument - Perform analysis or instrumentation of goto
binaries
goto-instrument reads a GOTO binary, performs a given
program transformation, and then writes the resulting program as GOTO binary
on disk.
- --nondet-volatile
- --nondet-volatile-variable
variable
- By default, cbmc(1) treats volatile variables the same as
non-volatile variables. That is, it assumes that a volatile variable does
not change between subsequent reads, unless it was written to by the
program. With the above options, goto-instrument can be instructed
to instrument the given goto program such as to (1) make reads from all
volatile expressions non-deterministic (--nondet-volatile), (2)
make reads from specific variables non-deterministic
(--nondet-volatile-variable), or (3) model reads from specific
variables by given models (--nondet-volatile-model).
Below we give two usage examples for the options. Consider the
following test, for function get_celsius and with harness
test_get_celsius:
#include <assert.h>
#include <limits.h>
#include <stdint.h>
// hardware sensor for temperature in kelvin
extern volatile uint16_t temperature;
int get_celsius() {
if (temperature > (1000 + 273)) {
return INT_MIN; // value indicating error
}
return temperature - 273;
}
void test_get_celsius() {
int t = get_celsius();
assert(t == INT_MIN || t <= 1000);
assert(t == INT_MIN || t >= -273);
}
Here the variable temperature corresponds to a hardware
sensor. It returns the current temperature on each read. The
get_celsius function converts the value in Kelvin to degrees
Celsius, given the value is in the expected range. However, it has a bug
where it reads temperature a second time after the check, which
may yield a value for which the check would not succeed. Verifying this
program as is with cbmc(1) would yield a verification success. We
can use goto-instrument to make reads from temperature
non-deterministic:
goto-cc -o get_celsius_test.gb get_celsius_test.c
goto-instrument --nondet-volatile-variable temperature \
get_celsius_test.gb get_celsius_test-mod.gb
cbmc --function test_get_celsius get_celsius_test-mod.gb
Here the final invocation of cbmc(1) correctly reports
a verification failure.
- --nondet-volatile-model
variable:model
- Simply treating volatile variables as non-deterministic may for some use
cases be too inaccurate. Consider the following test, for function
get_message and with harness test_get_message:
#include <assert.h>
#include <stdint.h>
extern volatile uint32_t clock;
typedef struct message {
uint32_t timestamp;
void *data;
} message_t;
void *read_data();
message_t get_message() {
message_t msg;
msg.timestamp = clock;
msg.data = read_data();
return msg;
}
void test_get_message() {
message_t msg1 = get_message();
message_t msg2 = get_message();
assert(msg1.timestamp <= msg2.timestamp);
}
The harness verifies that get_message assigns
non-decreasing time stamps to the returned messages. However, simply
treating clock as non-deterministic would not suffice to prove
this property. Thus, we can supply a model for reads from
clock:
// model for reads of the variable clock
uint32_t clock_read_model() {
static uint32_t clock_value = 0;
uint32_t increment;
__CPROVER_assume(increment <= 100);
clock_value += increment;
return clock_value;
}
The model is stateful in that it keeps the current clock value
between invocations in the variable clock_value. On each
invocation, it increments the clock by a non-deterministic value in the
range 0 to 100. We can tell goto-instrument to use the model
clock_read_model for reads from the variable clock as
follows:
goto-cc -o get_message_test.gb get_message_test.c
goto-instrument --nondet-volatile-model clock:clock_read_model \
get_message_test.gb get_message_test-mod.gb
cbmc --function get_message_test get_message_test-mod.gb
Now the final invocation of cbmc(1) reports
verification success.
- --isr
function
- instruments an interrupt service routine
- --mmio
- instruments memory-mapped I/O
- --nondet-static
- add nondeterministic initialization of variables with static lifetime
- --nondet-static-exclude
e
- same as nondet-static except for the variable e (use multiple times
if required)
- --nondet-static-matching
r
- add nondeterministic initialization of variables with static lifetime
matching regex r
- --function-enter
f
- --function-exit
f
- --branch
f
- instruments a call to f at the beginning, the exit, or a branch
point, respectively
- --splice-call
caller,callee
- prepends a call to callee in the body of caller
- --check-call-sequence
seq
- instruments checks to assert that all call sequences match seq
- --undefined-function-is-assume-false
- convert each call to an undefined function to assume(false)
- --insert-final-assert-false
function
- generate assert(false) at end of function
- --generate-function-body
regex
- This transformation inserts implementations of functions without
definition, i.e., a body. The behavior of the generated function is chosen
via --generate-function-body-options option:
- --generate-function-body-options
option
- One of assert-false, assume-false, nondet-return,
assert-false-assume-false and
havoc[,params:regex][,globals:regex][,params:p_n1;p_n2;..]
(default: nondet-return)
- assert-false: The body consists of a single command:
assert(0).
- assume-false: The body consists of a single command:
assume(0).
- assert-false-assume-false: Two commands as above.
- nondet-return: The generated function returns a non-deterministic
value of its return type.
- havoc[,params:p-regex][,globals:g-regex]:
Assign non-deterministic values to the targets of pointer-to-non-constant
parameters matching the regular expression p-regex, and
non-constant globals matching g-regex, and then (in case of
non-void function) returning as with nondet-return. The following
example demonstrates the use:
// main.c
int global;
const int c_global;
int function(int *param, const int *c_param);
Often we want to avoid overwriting internal symbols, i.e.,
those with an __ prefix, which is done using the pattern
(?!__).
goto-cc main.c -o main.gb
goto-instrument main.gb main-out.gb \
--generate-function-body-options 'havoc,params:(?!__).*,globals:(?!__).*' \
--generate-funtion-body function
This leads to a GOTO binary equivalent to the following C
code:
// main-mod.c
int function(int *param, const int *c_param) {
*param = nondet_int();
global = nondet_int();
return nondet_int();
}
The parameters should that should be non-deterministically
updated can be specified either by a regular expression (as above) or by
a semicolon-separated list of their numbers. For example
havoc,params:0;3;4 will assign non-deterministic values to the
first, fourth, and fifth parameter.
Note that only parameters of pointer type can be havoced and
goto-instrument will produce an error report if given a parameter
number associated with a non-pointer parameter. Requesting to havoc a
parameter with a number higher than the number of parameters a given
function takes will also results in an error report.
- --generate-havocing-body
option
fun_name,params:p_n1;p_n2;..
- --generate-havocing-body
option
fun_name[,call-site-id,params:p_n1;p_n2;..>]+
- Request a different implementation for a number of call-sites of a single
function. The option --generate-havocing-body inserts new functions
for selected call-sites and replaces the calls to the origin function with
calls to the respective new functions.
// main.c
int function(int *first, int *second, int *third);
int main() {
int a = 10;
int b = 10;
int c = 10;
function(&a, &b, &c);
function(&a, &b, &c);
}
The user can specify different behavior for each call-site as
follows:
goto-cc main.c -o main.gb
goto-instrument main.gb main-mod.gb \
--generate-havocing-body 'function,1,params:0;2,2,params:1'
This results in a GOTO binary equivalent to:
// main-mod.c
int function_1(int *first, int *second, int *third) {
*first = nondet_int();
*third = nondet_int();
}
int function_2(int *first, int *second, int *third) { *second = nondet_int(); }
int main() {
int a = 10;
int b = 10;
int c = 10;
function_1(&a, &b, &c);
function_2(&a, &b, &c);
}
- --restrict-function-pointer
- pointer_name/target[,targets]* Replace function
pointers by a user-defined set of targets. This may be required when
--remove-function-pointers creates to large a set of direct calls.
Consider the example presented for --remove-function-pointers.
Assume that call will always receive pointers to either f or
g during actual executions of the program, and symbolic execution
for h is too expensive to simply ignore the cost of its branch.
To facilitate the controlled replace, we will label the places
in each function where function pointers are being called, to this
pattern:
function-name.function_pointer_call.N
where N is refers to the N-th function call via
a function pointer in function-name, i.e., the first call to a
function pointer in a function will have N=1, the fifth
N=5 etc. Alternatively, if the calls carry labels in the source
code, we can also refer to a function pointer as
function-name.label
To implement this assumption that the first call to a function
pointer in function call an only be a call to f or
g, use
goto-instrument --restrict-function-pointer \
call.function_pointer_call.1/f,g in.gb out.gb
The resulting output (written to GOTO binary out.gb)
looks similar to the original example, except now there will not be a
call to h:
void call(fptr_t fptr) {
int r;
if (fptr == &f) {
r = f(10);
} else if (fptr == &g) {
r = g(10);
} else {
// sanity check
assert(false);
assume(false);
}
return r;
}
As another example imagine we have a simple virtual filesystem
API and implementation like this:
typedef struct filesystem_t filesystem_t;
struct filesystem_t {
int (*open)(filesystem_t *filesystem, const char *file_name);
};
int fs_open(filesystem_t *filesystem, const char *file_name) {
filesystem->open(filesystem, file_name);
}
int nullfs_open(filesystem_t *filesystem, const char *file_name) { return -1; }
filesystem_t nullfs_val = {.open = nullfs_open};
filesystem *const nullfs = &nullfs_val;
filesystem_t *get_fs_impl() {
// some fancy logic to determine
// which filesystem we're getting -
// in-memory, backed by a database, OS file system
// - but in our case, we know that
// it always ends up being nullfs
// for the cases we care about
return nullfs;
}
int main(void) {
filesystem_t *fs = get_fs_impl();
assert(fs_open(fs, "hello.txt") != -1);
}
In this case, the assumption is that in function main,
fs can be nothing other than nullfs. But perhaps due to
the logic being too complicated, symbolic execution ends up being unable
to figure this out, so in the call to fs_open we end up branching
on all functions matching the signature of filesystem_t::open,
which could be quite a few functions within the program. Worst of all,
if its address is ever taken in the program, as far as function pointer
removal via --remove-function-pointers is concerned it could be
fs_open itself due to it having a matching signature, leading to
symbolic execution being forced to follow a potentially infinite
recursion until its unwind limit.
In this case we can again restrict the function pointer to the
value which we know it will have:
goto-instrument --restrict-function-pointer \
fs_open.function_pointer_call.1/nullfs_open in.gb out.gb
- --function-pointer-restrictions-file
file_name
- If you have many places where you want to restrict function pointers, it'd
be a nuisance to have to specify them all on the command line. In these
cases, you can specify a file to load the restrictions from instead, which
you can give the name of a JSON file with this format:
{
"function_call_site_name": ["function1", "function2", ...],
...
}
If you pass in multiple files, or a mix of files and command
line restrictions, the final restrictions will be a set union of all
specified restrictions.
Note that if something goes wrong during type checking (i.e.,
making sure that all function pointer replacements refer to functions in
the symbol table that have the correct type), the error message will
refer to the command line option --restrict-function-pointer
regardless of whether the restriction in question came from the command
line or a file.
- --restrict-function-pointer-by-name
symbol_name/target[,targets]*
- Restrict a function pointer where symbol_name is the unmangled
name, before labeling function pointers.
- --remove-calls-no-body
- remove calls to functions without a body
- --add-library
- add models of C library functions
- --malloc-may-fail
- allow malloc calls to return a null pointer
- --malloc-fail-assert
- set malloc failure mode to assert-then-assume
- --malloc-fail-null
- set malloc failure mode to return null
- --no-malloc-may-fail
- do not allow malloc calls to fail by default
- --string-abstraction
- track C string lengths and zero-termination
- --model-argc-argv
n
- Create up to n non-deterministic C strings as entries to
argv and set argc accordingly. In absence of such modelling,
argv is left uninitialized except for a terminating NULL
pointer. Consider the following example:
// needs_argv.c
#include <assert.h>
int main(int argc, char *argv[]) {
if (argc >= 2)
assert(argv[1] != 0);
return 0;
}
If cbmc(1) is run directly on this example, it will
report a failing assertion for the lack of modeling of argv. To
make the assertion succeed, as expected, use:
goto-cc needs_argv.c
goto-instrument --model-argc-argv 2 a.out a.out
cbmc a.out
- --remove-function-body
f
- remove the implementation of function f (may be repeated)
- --replace-calls
f:g
- replace calls to f with calls to g
- --max-nondet-tree-depth
N
- limit size of nondet (e.g. input) object tree; at level N pointers are set
to null
- --min-null-tree-depth
N
- minimum level at which a pointer can first be NULL in a recursively nondet
initialized struct
- --ensure-one-backedge-per-target
- transform loop bodies such that there is a single edge back to the loop
head
- --drop-unused-functions
- drop functions trivially unreachable from main function
- --remove-pointers
- converts pointer arithmetic to base+offset expressions
- --constant-propagator
- propagate constants and simplify expressions
- --inline
- perform full inlining
- --partial-inline
- perform partial inlining
- --function-inline
function
- transitively inline all calls function makes
- --no-caching
- disable caching of intermediate results during transitive function
inlining
- --log
file
- log in JSON format which code segments were inlined, use with
--function-inline
- --remove-function-pointers
- Resolve calls via function pointers to direct function calls. Candidate
functions are chosen based on their signature and whether or not they have
their address taken somewhere in the program The following example
illustrates the approach taken. Given that there are functions with these
signatures available in the program:
int f(int x);
int g(int x);
int h(int x);
And we have a call site like this:
typedef int (*fptr_t)(int x);
void call(fptr_t fptr) {
int r = fptr(10);
assert(r > 0);
}
Function pointer removal will turn this into code similar to
this:
void call(fptr_t fptr) {
int r;
if (fptr == &f) {
r = f(10);
} else if (fptr == &g) {
r = g(10);
} else if (fptr == &h) {
r = h(10);
} else {
// sanity check
assert(false);
assume(false);
}
return r;
}
Beware that there may be many functions matching a particular
signature, and some of them may be costly to a subsequently run
analysis. Consider using --restrict-function-pointer to manually
specify this set of functions, or --value-set-fi-fp-removal.
- --remove-const-function-pointers
- remove function pointers that are constant or constant part of an
array
- --value-set-fi-fp-removal
- Build a flow-insensitive value set and replace function pointers by a case
statement over the possible assignments. If the set of possible
assignments is empty the function pointer is removed using the standard
--remove-function-pointers pass.
- --show-loops
- show the loops in the program
- --unwind
nr
- unwind all loops nr times
- --unwindset
[T:]L:B,...
- unwind loop L with a bound of B (optionally restricted to
thread T) (use --show-loops to get the loop IDs)
- --unwindset-file
file
- read unwindset from file
- --partial-loops
- permit paths that execute loops only partially (up to the given unwinding
bound) and then continue beyond the loop even when the loop condition
would still hold (such paths may be spurious, resulting in false
alarms)
- --no-unwinding-assertions
- do not generate unwinding assertions
- --unwinding-assertions
- generate unwinding assertions (which are enabled by default; overrides
--no-unwinding-assertions when both of these are given)
- --continue-as-loops
- add loop for remaining iterations after unwound part
- --k-induction
k
- check loops with k-induction
- --step-case
- k-induction: do step-case
- --base-case
- k-induction: do base-case
- --havoc-loops
- over-approximate all loops
- --accelerate
- add loop accelerators
- --z3
- use Z3 when computing loop accelerators
- --skip-loops
loop-ids
- add gotos to skip selected loops during execution
- --show-lexical-loops
- Show lexical loops. A lexical loop is a block of goto program instructions
with a single entry edge at the top and a single backedge leading from
bottom to top, where "top" and "bottom" refer to
program order. The loop may have holes: instructions which sit in between
the top and bottom in program order, but which can't reach the loop
backedge. Lexical loops are a subset of the natural loops, which are
cheaper to compute and include most natural loops generated from typical C
code.
- --show-natural-loops
- Show natural loop heads. A natural loop is when the nodes and edges of a
graph make one self-encapsulating circle with no incoming edges from
external nodes. For example A -> B -> C -> D -> A is a natural
loop, but if B has an incoming edge from X, then it isn't a natural loop,
because X is an external node. Outgoing edges don't affect the
natural-ness of a loop.
All tools honor the TMPDIR environment variable when generating
temporary files and directories.
If you encounter a problem please create an issue at
https://github.com/diffblue/cbmc/issues
2008-2013, Daniel Kroening
Visit the GSP FreeBSD Man Page Interface. Output converted with ManDoc.
|