Tasting is a process that happens whenever a new class or new provider
is created, and it provides the class a chance to automatically configure an
instance on providers which it recognizes as its own.
A typical example is the MBR disk-partition class which will look for
the MBR table in the first sector and, if found and validated, will
instantiate a geom to multiplex according to the contents of the MBR.
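As a rough illustration of the "look for the MBR table and validate it" step, the following user-space sketch checks a first sector for the conventional boot signature and for plausible partition flags. The function name sketch_mbr_taste and the constants are hypothetical and are not part of GEOM; the checks a real MBR class performs may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE   512u  /* classic sector size assumed for this sketch */
#define MBR_SIG_OFF   510u  /* offset of the two-byte boot signature */
#define MBR_PART_OFF  446u  /* offset of the first of four partition entries */
#define MBR_PART_SIZE 16u   /* size of one partition entry */

/*
 * Hypothetical helper, not part of GEOM: return 1 if the sector carries
 * the 0x55 0xAA boot signature and every partition entry has a plausible
 * status byte (0x00 or 0x80), otherwise return 0.
 */
int
sketch_mbr_taste(const uint8_t *sector, size_t len)
{
	size_t i;

	assert(sector != NULL);
	if (len < SECTOR_SIZE)
		return (0);
	if (sector[MBR_SIG_OFF] != 0x55 || sector[MBR_SIG_OFF + 1] != 0xAA)
		return (0);
	for (i = 0; i < 4; i++) {
		uint8_t flag = sector[MBR_PART_OFF + i * MBR_PART_SIZE];

		if (flag != 0x00 && flag != 0x80)
			return (0);
	}
	return (1);
}
```

A class that gets 0 back from such a check would simply decline the offered provider.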
A new class will be offered to all existing providers in turn and a new provider will be offered to all classes in turn.
Exactly what a class does to recognize if it should accept the offered provider is not defined by GEOM, but the sensible options are:
Orphanization is the process by which a provider is removed while
it potentially is still being used.
When a geom orphans a provider, all future I/O requests will "bounce" on the provider with an error code set by the geom. Any consumers attached to the provider will receive notification about the orphanization when the event loop gets around to it, and they can take appropriate action at that time.
A geom which came into being as a result of a normal taste operation should self-destruct unless it has a way to keep functioning whilst lacking the orphaned provider. Geoms like disk slicers should therefore self-destruct whereas RAID5 or mirror geoms will be able to continue as long as they do not lose quorum.
When a provider is orphaned, this does not necessarily result in any immediate change in the topology: any attached consumers are still attached, any opened paths are still open, any outstanding I/O requests are still outstanding.
The typical scenario is:
While this approach seems byzantine, it does provide the maximum flexibility and robustness in handling disappearing devices.
The one absolutely crucial detail to be aware of is that if the device driver does not return all I/O requests, the tree will not unravel.
Spoiling is a special case of orphanization used to protect
against stale metadata.
It is probably easiest to understand spoiling by going through an example.
Imagine a disk, da0, on top of which an MBR geom provides da0s1 and da0s2, and on top of da0s1 a BSD geom provides da0s1a through da0s1e, and that both the MBR and BSD geoms have autoconfigured based on data structures on the disk media. Now imagine the case where da0 is opened for writing and those data structures are modified or overwritten: now the geoms would be operating on stale metadata unless some notification system can inform them otherwise.
To avoid this situation, when the open of da0 for write happens, all attached consumers are told about this and geoms like MBR and BSD will self-destruct as a result. When da0 is closed, it will be offered for tasting again and, if the data structures for MBR and BSD are still there, new geoms will instantiate themselves anew.
Now for the fine print:
If any of the paths through the MBR or BSD module were open, they would have opened downwards with an exclusive bit thus rendering it impossible to open da0 for writing in that case. Conversely, the requested exclusive bit would render it impossible to open a path through the MBR geom while da0 is open for writing.
From this it also follows that changing the size of open geoms can only be done with their cooperation.
Finally: the spoiling only happens when the write count goes from zero to non-zero and the retasting happens only when the write count goes from non-zero to zero.
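The rule about write-count transitions can be sketched in a few lines of user-space C. The structure and function names are hypothetical bookkeeping for illustration and do not reflect the kernel's own data structures.

```c
#include <assert.h>

/* Hypothetical bookkeeping for one provider; not the kernel's structure. */
struct sketch_provider {
	int write_count;    /* number of current write opens */
	int spoil_events;   /* times attached consumers were told to spoil */
	int retaste_events; /* times the provider was offered for tasting again */
};

/* Apply a change in write opens: positive for open, negative for close. */
void
sketch_access(struct sketch_provider *pp, int dw)
{
	int before = pp->write_count;
	int after = before + dw;

	assert(after >= 0);
	if (before == 0 && after > 0)
		pp->spoil_events++;     /* zero to non-zero: spoil */
	if (before > 0 && after == 0)
		pp->retaste_events++;   /* non-zero to zero: retaste */
	pp->write_count = after;
}
```

A second writer opening the provider while the first is still active therefore triggers neither event.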
Configuration is the process where the administrator issues instructions
for a particular class to instantiate itself.
There are multiple
ways to express intent in this case - a particular provider may be
specified with a level of override forcing, for instance, a BSD
disklabel module to attach to a provider which was not found palatable
during the TASTE operation.
Finally, I/O is the reason we even do this: it concerns itself with sending I/O requests through the graph.
I/O requests, represented by
.Vt "struct bio" ,
originate at a consumer, are scheduled on its attached provider and, when processed, are returned to the consumer. It is important to realize that the
.Vt "struct bio"
which enters through the provider of a particular geom does not "come out on the other side". Even simple transformations like MBR and BSD will clone the
.Vt "struct bio" ,
modify the clone, and schedule the clone on their own consumer. Note that cloning the
.Vt "struct bio"
does not involve cloning the actual data area specified in the I/O request.
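To make the cloning step concrete, here is a minimal user-space sketch of what a slicing geom might do with a request. The structure sketch_bio is a simplified stand-in with illustrative field names and is not the kernel's definition; sketch_clone_for_slice is likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the request descriptor; names are illustrative. */
struct sketch_bio {
	uint64_t offset;            /* byte offset on the target provider */
	size_t length;              /* number of bytes to transfer */
	void *data;                 /* caller's data area, never copied here */
	struct sketch_bio *parent;  /* request this one was cloned from */
};

/*
 * Fill in *clone from *parent for a slice that starts slice_start bytes
 * into the underlying provider. Only the descriptor is copied; the data
 * pointer is shared between the parent and the clone.
 */
void
sketch_clone_for_slice(struct sketch_bio *parent, struct sketch_bio *clone,
    uint64_t slice_start)
{
	assert(parent != NULL && clone != NULL);
	*clone = *parent;
	clone->parent = parent;
	clone->offset = parent->offset + slice_start;
}
```

The parent request is left untouched, so it can be completed toward the original consumer once the clone returns.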
In total, four different I/O requests exist in GEOM: read, write, delete, and "get attribute".
Read and write are self-explanatory.
Delete indicates that a certain range of data is no longer used and that it can be erased or freed as the underlying technology supports. Technologies like flash adaptation layers can arrange to erase the relevant blocks before they will become reassigned and cryptographic devices may want to fill random bits into the range to reduce the amount of data available for attack.
It is important to recognize that a delete indication is not a request and consequently there is no guarantee that the data actually will be erased or made unavailable unless guaranteed by specific geoms in the graph. If "secure delete" semantics are required, a geom should be pushed which converts delete indications into (a sequence of) write requests.
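One way such a converting geom could be structured is to split the deleted range into bounded overwrites. The sketch below shows only the range arithmetic in user-space C; the callback type, the function names, and the chunk size parameter are assumptions made for illustration and not an existing GEOM interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback that issues one overwrite of len bytes at off. */
typedef void sketch_write_fn(uint64_t off, size_t len, void *arg);

/*
 * Split the deleted range [off, off + len) into overwrites of at most
 * chunk bytes each and hand them to emit. Returns the number of writes.
 */
size_t
sketch_delete_as_writes(uint64_t off, uint64_t len, size_t chunk,
    sketch_write_fn *emit, void *arg)
{
	size_t n = 0;

	assert(chunk > 0 && emit != NULL);
	while (len > 0) {
		size_t step = len < chunk ? (size_t)len : chunk;

		emit(off, step, arg);
		off += step;
		len -= step;
		n++;
	}
	return (n);
}

/* Example callback: add up the bytes that would have been overwritten. */
void
sketch_count_bytes(uint64_t off, size_t len, void *arg)
{
	(void)off;
	*(uint64_t *)arg += len;
}
```

A real implementation would replace the counting callback with code that schedules a write request carrying the desired fill pattern.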
"Get attribute" supports inspection and manipulation of out-of-band attributes on a particular provider or path. Attributes are named by ASCII strings and they will be discussed in a separate section below.
(Stay tuned while the author rests his brain and fingers: more to come.)
Several flags are provided for tracing GEOM operations and unlocking protection mechanisms via the kern.geom.debugflags sysctl. All of these flags are off by default, and great care should be taken in turning them on.
0x01 (G_T_TOPOLOGY) Provide tracing of topology change events.
0x02 (G_T_BIO) Provide tracing of buffer I/O requests.
0x04 (G_T_ACCESS) Provide tracing of access check controls.
0x08 (unused)
0x10 (allow foot shooting) Allow writing to Rank 1 providers. This would, for example, allow the super-user to overwrite the MBR on the root disk or write random sectors elsewhere to a mounted disk. The implications are obvious.
0x40 (G_F_DISKIOCTL) This is unused at this time.
0x80 (G_F_CTLDUMP) Dump contents of gctl requests.
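Since the sysctl value is a bitwise OR of the flags listed above, a value can be decoded by testing each bit in turn. The following user-space sketch does that with a local table; the table, the structure, and the function sketch_decode are hypothetical, and the 0x10 entry is labelled with its description because the list gives no symbolic name for it.

```c
#include <assert.h>
#include <stddef.h>

/* Local copy of the flag list; bit values are taken from the text above. */
struct sketch_flag {
	unsigned int bit;
	const char *name;
};

static const struct sketch_flag sketch_flags[] = {
	{ 0x01, "G_T_TOPOLOGY" },
	{ 0x02, "G_T_BIO" },
	{ 0x04, "G_T_ACCESS" },
	{ 0x10, "allow foot shooting" }, /* no symbolic name given in the list */
	{ 0x40, "G_F_DISKIOCTL" },
	{ 0x80, "G_F_CTLDUMP" },
};

/*
 * Store the names of the listed bits that are set in flags, up to max
 * entries, and return how many listed bits were set in total.
 */
size_t
sketch_decode(unsigned int flags, const char **names, size_t max)
{
	size_t i, n = 0;

	assert(names != NULL || max == 0);
	for (i = 0; i < sizeof(sketch_flags) / sizeof(sketch_flags[0]); i++) {
		if ((flags & sketch_flags[i].bit) == 0)
			continue;
		if (n < max)
			names[n] = sketch_flags[i].name;
		n++;
	}
	return (n);
}
```

For example, a value of 0x13 decodes to topology tracing, bio tracing, and the foot-shooting override.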
libgeom(3), disk(9), DECLARE_GEOM_CLASS(9), g_access(9), g_attach(9), g_bio(9), g_consumer(9), g_data(9), g_event(9), g_geom(9), g_provider(9), g_provider_by_name(9)
This software was developed for the
.Fx
Project by
.An Poul-Henning Kamp
and NAI Labs, the Security Research Division of Network Associates, Inc. under DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the DARPA CHATS research program.
The first precursor for GEOM was a gruesome hack to Minix 1.2 and was never distributed. An earlier attempt to implement a less general scheme in
.Fx
never succeeded.
.An Poul-Henning Kamp Aq phk@FreeBSD.org