lat_ctx measures context switching time for any reasonable
number of processes of any reasonable size.
The processes are connected in a ring of Unix pipes. Each process
reads a token from its pipe, possibly does some work, and then writes
the token to the next process.
Processes may vary in number. Smaller numbers of processes result in
faster context switches. More than 20 processes are not supported.
Processes may vary in size. A size of zero is the baseline process that
does nothing except pass the token on to the next process. A process size
of greater than zero means that the process does some work before passing
on the token. The work is simulated as the summing up of an array of the
specified size. The summing is done in an unrolled loop of about 2.7
thousand instructions.
The effect is that both the data and the instruction cache
get polluted by some amount before the token is passed on. The data
cache gets polluted by approximately the process size. The instruction
cache gets polluted by a constant amount, approximately 2.7 thousand
instructions.
The pollution of the caches results in larger context switching times for
the larger processes. This may be confusing because the benchmark takes
pains to measure only the context switch time, not including the overhead
of doing the work. The subtle point is that the overhead is measured using
hot caches. As the number and size of the processes increases, the caches
are more and more polluted until the set of processes no longer fits. The
context switch times go up because a context switch is defined as the switch
plus the time it takes to restore all of the process state, including
cache state. This means that the switch includes the time for the cache
misses on larger processes.