crushtool ( -d map | -c map.txt | --build --num_osds numosds layer1 ... | --test ) [ -o outfile ]
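CRUSH is a pseudo-random data distribution algorithm that efficiently maps input values (which, in the context of Ceph, correspond to Placement Groups) across a heterogeneous, hierarchically structured device map. The algorithm was originally described in detail in the paper "CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data" (Weil et al., SC 2006), although it has evolved somewhat since then.
The tool has four modes of operation, matching the alternatives in the synopsis: decompiling a compiled map (-d), compiling a text map (-c), building a new map (--build), and testing a map (--test).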
Unlike other Ceph tools, crushtool does not accept generic options such as --debug-crush from the command line. They can, however, be provided via the CEPH_ARGS environment variable. For instance, to silence all output from the CRUSH subsystem:
CEPH_ARGS="--debug-crush 0" crushtool ...
Note: Each Placement Group (PG) has an ID, which can be obtained from ceph pg dump. For example, PG 2.2f means pool ID 2, PG number 0x2f (47 in decimal), because the part after the dot is hexadecimal. The pool and PG IDs are combined by a function to produce the value that CRUSH maps to OSDs. crushtool does not know about PGs or pools; it only runs simulations by mapping values in the range [--min-x, --max-x].
With --show-statistics, crushtool displays a summary of the distribution. For instance:
rule 1 (metadata) num_rep 5 result size == 5: 1024/1024
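Because the PG part of an ID such as 2.2f is hexadecimal, it is easy to misread. The Python sketch below only splits such an ID into its pool ID and PG number; parse_pgid is a hypothetical helper, not part of Ceph, and it does not reproduce the function Ceph uses to turn the pair into the CRUSH input value.

```python
# Hypothetical helper, not part of Ceph: split a PG ID as printed by
# `ceph pg dump` ("<pool>.<pg number in hex>") into two integers.
# It does NOT compute the CRUSH input value x.

def parse_pgid(pgid: str) -> tuple[int, int]:
    pool_str, pg_str = pgid.split(".", 1)
    return int(pool_str), int(pg_str, 16)   # pool is decimal, PG number is hex


print(parse_pgid("2.2f"))  # (2, 47)
```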
shows that rule 1, which is named metadata, successfully mapped all 1024 values to result size == 5 devices when asked to map them to num_rep 5 replicas. When it fails to provide the required mapping, presumably because the number of tries must be increased, a breakdown of the failures is displayed. For instance:
rule 1 (metadata) num_rep 10 result size == 8: 4/1024
rule 1 (metadata) num_rep 10 result size == 9: 93/1024
rule 1 (metadata) num_rep 10 result size == 10: 927/1024
shows that although num_rep 10 replicas were required, 4 out of 1024 values (4/1024) were mapped to only result size == 8 devices, 93 to only 9 devices, and the remaining 927 to the full 10.
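A breakdown like the one above can be tallied mechanically when comparing runs. The following Python sketch is not an official parser: the regular expression is an assumption based only on the sample lines in this page, and tally is a hypothetical helper that expects the lines of a single rule and num_rep.

```python
import re

# Matches summary lines in the format shown above, e.g.
# "rule 1 (metadata) num_rep 10 result size == 8: 4/1024".
# The pattern is an ASSUMPTION derived from these examples only.
SUMMARY = re.compile(
    r"rule (\d+) \((\S+)\) num_rep (\d+) result size == (\d+): (\d+)/(\d+)"
)


def tally(text: str) -> dict:
    """Split mapped values into fully and partially mapped ones."""
    counts = {"full": 0, "short": 0, "total": 0}
    for m in SUMMARY.finditer(text):
        num_rep, size, count, total = (int(g) for g in m.group(3, 4, 5, 6))
        key = "full" if size >= num_rep else "short"
        counts[key] += count
        counts["total"] = total  # assumes all lines share one denominator
    return counts


breakdown = """\
rule 1 (metadata) num_rep 10 result size == 8: 4/1024
rule 1 (metadata) num_rep 10 result size == 9: 93/1024
rule 1 (metadata) num_rep 10 result size == 10: 927/1024
"""
print(tally(breakdown))  # {'full': 927, 'short': 97, 'total': 1024}
```

Here 4 + 93 = 97 values received fewer than 10 devices.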
With --show-mappings, each value is displayed together with the devices it maps to. For instance:
CRUSH rule 1 x 24 [11,6]
shows that value 24 is mapped to devices [11,6] by rule 1.
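With --show-bad-mappings, each value that fails to map to the required number of devices is displayed. For instance:
bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]
shows that when rule 1 was required to map 7 devices, it could map only six: [8,10,2,11,6,9].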
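With --show-utilization, the number of values stored on each device is displayed together with the expected number. For instance:
device 0: stored : 951 expected : 853.333
device 1: stored : 963 expected : 853.333
...
shows that device 0 stored 951 values and was expected to store 853.333. Implies --show-statistics.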
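The expected figure can be understood as each device's weighted share of all placements. The sketch below ASSUMES that share is proportional to the device's weight, and its setup (1024 values, num_rep 10, 12 devices of equal weight) is hypothetical: the sample output does not state it, although that combination does yield 853.333. expected_per_device is an illustrative name, not a crushtool function.

```python
# Hypothetical setup, NOT stated in the sample output: 1024 values,
# num_rep 10 and 12 devices of equal weight. The sketch ASSUMES each
# device's expected count is proportional to its share of the total weight.

def expected_per_device(num_values: int, num_rep: int, weights: list[float]) -> list[float]:
    placements = num_values * num_rep          # (value, replica) pairs to place
    total_weight = sum(weights)
    return [placements * w / total_weight for w in weights]


expected = expected_per_device(1024, 10, [1.0] * 12)
print(round(expected[0], 3))                   # 853.333

stored = 951                                   # device 0 in the sample output
print(f"device 0 deviation: {stored / expected[0] - 1:+.1%}")  # device 0 deviation: +11.4%
```

Under these assumptions, device 0 holds about 11.4% more values than its weight would suggest.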
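With --show-choose-tries, a histogram of the number of retries needed by the mappings is displayed. For instance:
0: 95224
1: 3745
2: 2225
...
shows that 95224 mappings succeeded without retries, 3745 mappings succeeded with one retry, and so on. There are as many rows as the value of the --set-choose-total-tries option.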
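With --output-csv, selected data generated during testing is exported to CSV files, named after the rule, for offline post-processing. For instance:
metadata-absolute_weights.csv
metadata-device_utilization.csv
...
The first line of each file briefly explains the column layout. For instance:
metadata-absolute_weights.csv:
Device ID, Absolute Weight
0,1
...
With --output-name FOO, the given name is prepended to each file name:
FOO-metadata-absolute_weights.csv
FOO-metadata-device_utilization.csv
...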
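Since the first line of each CSV file only describes the columns, a post-processing script should skip it. The Python sketch below reads an absolute-weights file into a dictionary; only the header and the "0,1" row come from the example above, and read_absolute_weights is a hypothetical helper. With a real file, pass open(path, newline="") instead of the in-memory sample.

```python
import csv
import io

# Only the header line and the "0,1" row come from the example above.
SAMPLE = "Device ID, Absolute Weight\n0,1\n"


def read_absolute_weights(f) -> dict[int, float]:
    """Map device ID -> absolute weight, skipping the header line."""
    reader = csv.reader(f, skipinitialspace=True)
    next(reader, None)                 # first line describes the columns
    weights = {}
    for row in reader:
        if not row:                    # tolerate blank lines
            continue
        weights[int(row[0])] = float(row[1])
    return weights


print(read_absolute_weights(io.StringIO(SAMPLE)))  # {0: 1.0}
```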
The --set-... options can be used to modify the tunables of the input crush map. The input crush map is modified in memory. For example:
$ crushtool -i mymap --test --show-bad-mappings
bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]
could be fixed by increasing choose-total-tries, for example:
$ crushtool -i mymap --test --show-bad-mappings --set-choose-total-tries 500
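The --build mode generates a hierarchical map from --num_osds devices and a list of layers, where each layer describes how the layer (or the devices) preceding it should be grouped. Each layer consists of: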
bucket ( uniform | list | tree | straw ) size
The bucket is the type of the buckets in the layer (e.g. "rack"). Each bucket name will be built by appending a unique number to the bucket string (e.g. "rack0", "rack1"...).
The second component is the type of bucket: straw should be used most of the time.
The third component is the maximum size of the bucket. A size of zero means a bucket of infinite capacity.
Suppose a cluster has two rows of two racks each, 20 nodes per rack and 4 devices per node, for a total of 320 OSDs. To reflect this hierarchy of devices, nodes, racks and rows, we would execute the following:
$ crushtool -o crushmap --build --num_osds 320 \
       node straw 4 \
       rack straw 20 \
       row straw 2 \
       root straw 0
# id    weight  type name       reweight
-87     320     root root
-85     160             row row0
-81     80                      rack rack0
-1      4                               node node0
0       1                                       osd.0   1
1       1                                       osd.1   1
2       1                                       osd.2   1
3       1                                       osd.3   1
-2      4                               node node1
4       1                                       osd.4   1
5       1                                       osd.5   1
...
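The number of buckets each layer produces follows from the layer sizes. The Python sketch below ASSUMES each layer packs the items of the layer below into as few buckets as its size allows, with size 0 meaning a single bucket of unlimited capacity; plan_layers is a hypothetical helper, not part of crushtool. For the 320-OSD example it gives 80 nodes, 4 racks, 2 rows and 1 root, 87 buckets in all, which is consistent with the root's ID of -87 in the tree above if bucket IDs are assigned -1, -2, ... in creation order.

```python
import math

# A sketch, ASSUMING each --build layer groups the items of the layer below
# into ceil(items / size) buckets, and size 0 means one unlimited bucket.

def plan_layers(num_osds: int, layers: list[tuple[str, int]]) -> list[tuple[str, int]]:
    items = num_osds
    plan = []
    for name, size in layers:
        count = 1 if size == 0 else math.ceil(items / size)
        plan.append((name, count))
        items = count                  # the next layer groups these buckets
    return plan


plan = plan_layers(320, [("node", 4), ("rack", 20), ("row", 2), ("root", 0)])
print(plan)                            # [('node', 80), ('rack', 4), ('row', 2), ('root', 1)]
print(sum(count for _, count in plan)) # 87
print([f"rack{i}" for i in range(4)])  # ['rack0', 'rack1', 'rack2', 'rack3']
```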
CRUSH rules are created so the generated crushmap can be tested. They are the same rules as the ones created by default when creating a new Ceph cluster. They can be further edited with:
# decompile
crushtool -d crushmap -o map.txt
# edit
emacs map.txt
# recompile
crushtool -c map.txt -o crushmap