Manual¶
sibench is a benchmarking tool for storage systems, and for Ceph in particular. sibench typically runs on many nodes in parallel, in order to be able to scale up to the very high bandwidths of which modern scalable storage systems are capable.
For more detailed information, visit https://sibench.io.
How It Works¶
Each benchmarking node runs sibench as a server daemon, listening for incoming connections from a sibench client.
Once the client starts a benchmark, the sibench servers send periodic summaries back to it, so that the user can see progress.
When a benchmark completes, the sibench servers send the full results back to the client. A summary of the results is displayed by the client, but the full results of every individual read and write operation are written out as a JSON file in case the user wishes to perform their own statistical analysis.
Once a client starts running a benchmark on some set of sibench servers, those servers will reject any other incoming connections until the benchmark completes or is aborted. There is no job queue or long-lived management process.
The client and server use the same binary, just with different command line options. The client is extremely lightweight, and may be run on one of the server nodes without significantly impacting benchmarking performance, though it may result in higher memory usage on that node.
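As a sketch of this flow (the hostnames here are purely illustrative), you would start the daemon on each benchmarking node and then drive a run from the client:

```
# On each benchmarking node (e.g. sibench1, sibench2): start the server daemon.
sibench server

# On the client machine: run a Rados benchmark, using those nodes as workers
# and a Ceph monitor (mon1) as the target.
sibench rados run --servers sibench1,sibench2 --ceph-key KEY mon1
```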
Command Line¶
The following is a list of all the commands that the sibench binary accepts:
- sibench -h | --help
Outputs the full command line syntax.
- sibench version
Outputs the version number of the sibench binary.
- sibench server [--verbosity LEVEL] [--port PORT] [--mounts-dir DIR]
Starts sibench as a server.
- sibench s3 run [--s3-port PORT] [--s3-bucket BUCKET] (--s3-access-key KEY) (--s3-secret-key KEY) <target> …
Starts a benchmark using the S3 object protocol against the specified targets, which may be S3 servers or RadosGateway nodes.
- sibench rados run [--ceph-pool POOL] [--ceph-user USER] (--ceph-key KEY) <target> …
Starts a benchmark using the Rados object protocol against the specified targets, which should be Ceph monitors.
- sibench cephfs run [--mounts-dir DIR] [--ceph-dir DIR] [--ceph-user USER] (--ceph-key KEY) <target> …
Starts a benchmark using CephFS against the specified targets, which should be Ceph monitors.
- sibench rbd run [--ceph-pool POOL] [--ceph-datapool POOL] [--ceph-user USER] (--ceph-key KEY) <target> …
Starts a benchmark using RBD against the specified targets, which should be Ceph monitors.
- sibench block run [--block-device DEVICE]
Starts a benchmark using a locally mounted block device.
- sibench file run [--file-dir DIR]
Starts a benchmark using a locally mounted filesystem.
Additional options shared by all run commands, omitted from above for clarity:
[--verbosity LEVEL]
[--port PORT]
[--object-size SIZE]
[--object-count COUNT]
[--ramp-up TIME]
[--run-time TIME]
[--ramp-down TIME]
[--read-write-mix MIX]
[--bandwidth BW]
[--output FILE]
[--workers FACTOR]
[--generator GEN]
[--slice-dir DIR]
[--slice-count COUNT]
[--slice-size BYTES]
[--skip-read-verification]
[--servers SERVERS]
[--use-bytes]
[--individual-stats]
Option Definitions¶
| Long | Short | Value | Description | Default |
|---|---|---|---|---|
| --help | -h | - | Show full usage. | - |
| --verbosity | -v | LEVEL | Set debugging output at level “off”, “debug” or “trace”. The “trace” level may generate enough output to affect benchmark performance, and should only be used when trying to track down issues. | off |
| --port | -p | PORT | The port on which sibench listens. | 5150 |
| --object-size | -s | SIZE | Object size to test, in units of K or M. | 1M |
| --object-count | -c | COUNT | The total number of objects to use as our working set. | 1000 |
| --ramp-up | -u | TIME | The number of seconds at the start of each phase where we don’t record data (to discount edge effects caused by new connections). | 5 |
| --run-time | -r | TIME | The number of seconds in the middle of each phase of the benchmark where we do record the data. | 30 |
| --ramp-down | -d | TIME | The number of seconds at the end of each phase where we don’t record data. | 2 |
| --read-write-mix | -x | MIX | The ratio between reads and writes, specified as the percentage of reads. A value of zero indicates that reads and writes should be done in separate passes, rather than being combined. | 0 |
| --bandwidth | -b | BW | Benchmark at a fixed bandwidth, in units of K, M or G bits/s. A value of zero indicates no limit. When the read/write mix is not zero (that is, when we are not doing separate passes for read and write), this is the bandwidth of the combined operations. | 0 |
| --output | -o | FILE | The file to which we write our JSON results. | - |
| --workers | -w | FACTOR | Number of worker threads per server, expressed as a factor multiplied by the number of CPU cores. | 1.0 |
| --mounts-dir | -m | DIR | The directory in which we should create any filesystem mounts that are performed by sibench. | /tmp/sibench_mnt |
| --generator | -g | GEN | Which object generator to use: “prng” or “slice”. | prng |
| --skip-read-verification | - | - | Disable validation on reads. This should only be used to check if the number of nodes in the sibench cluster is limiting performance. | - |
| --servers | - | SERVERS | A comma-separated list of sibench servers to use for the benchmark. | localhost |
| --s3-port | - | PORT | The port on which to connect to S3. | 7480 |
| --s3-bucket | - | BUCKET | The name of the bucket we wish to use for S3 operations. | sibench |
| --s3-access-key | - | KEY | S3 access key. | - |
| --s3-secret-key | - | KEY | S3 secret key. | - |
| --ceph-pool | - | POOL | The pool we use for benchmarking. | sibench |
| --ceph-datapool | - | POOL | Optional pool used for RBD. If set, ceph-pool is used only for metadata. | - |
| --ceph-user | - | USER | The Ceph username we wish to use. | admin |
| --ceph-key | - | KEY | The CephX secret key belonging to the Ceph user. | - |
| --ceph-dir | - | DIR | The directory within CephFS that we should use for a benchmark. This will be created by sibench if it does not already exist. | sibench |
| --block-device | - | DEVICE | The local block device to use for a benchmark. | /tmp/sibench_block |
| --file-dir | - | DIR | The local directory to use for file operations. The directory must already exist. | - |
| --slice-dir | - | DIR | The directory of files to be sliced up to form new workload objects. | - |
| --slice-count | - | COUNT | The number of slices to construct for workload generation. | 10000 |
| --slice-size | - | BYTES | The size of each slice in bytes. | 4096 |
| --use-bytes | - | - | Show bandwidth in bytes rather than bits. | off |
| --individual-stats | - | - | Record the individual stats in the output file. This may make the output file very big. | off |
| --clean-up | - | - | Delete the data at the end of the benchmark run. | off |
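As an illustrative example (hostnames and values are hypothetical, not taken from the official documentation), the following runs a longer S3 benchmark with a larger working set and writes the full results to a file:

```
sibench s3 run \
    --servers sibench1,sibench2,sibench3,sibench4 \
    --s3-access-key KEY --s3-secret-key KEY \
    --object-size 4M --object-count 5000 \
    --ramp-up 10 --run-time 120 --ramp-down 5 \
    --output results.json \
    rgw1 rgw2
```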
Targets¶
The targets are the nodes to which the worker threads connect. Each worker opens a connection to each target and round-robins their reads and writes across those connections.
For most Ceph operations, the targets are monitors, and there is no advantage to specifying more than one. All the monitors do is provide the state-of-the-cluster map so that the workers can connect to the OSDs directly.
For RGW/S3, however, you should definitely list all of the storage cluster’s RGW nodes as targets, since those nodes are doing real work, and it needs to be balanced.
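For instance (hypothetical hostnames), a Rados run needs only a single monitor target, whereas an S3 run should list every RadosGateway node:

```
# Rados: one monitor is enough; the workers talk to the OSDs directly.
sibench rados run --ceph-key KEY mon1

# S3 via RadosGateway: list every RGW node so the load is balanced across them.
sibench s3 run --s3-access-key KEY --s3-secret-key KEY rgw1 rgw2 rgw3
```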
RBD¶
RBD behaviour is a little different than you might expect: each worker creates an RBD image per target, just big enough to hold that worker’s share of the ‘objects’ for the benchmark. All reads and writes that the worker then does are within the RBD image.
For example, if you have the following:
- 10 sibench nodes, each with 16 cores
- A single target monitor
- An object count of 1600 and an object size of 1MB
Then sibench will create 160 workers (by default, one per core), each of which will create a single 10MB RBD image, and then proceed to read and write 1MB at a time to parts of that image.
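A command line matching that worked example might look like the following (hostnames are hypothetical):

```
# 10 sibench servers x 16 cores = 160 workers; 1600 x 1MB objects means
# each worker creates a 10MB RBD image on the single monitor target.
sibench rbd run \
    --servers sib1,sib2,sib3,sib4,sib5,sib6,sib7,sib8,sib9,sib10 \
    --object-count 1600 --object-size 1M \
    --ceph-key KEY \
    mon1
```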
Generators¶
Generators create the data that sibench uses as workloads for the storage system. There are currently two of them, selectable with the --generator option.
PRNG Generator¶
The PRNG generator creates data which is entirely pseudorandom. It requires no configuration, and is the default choice. However, it has one shortcoming: because it creates pseudorandom data, it is not compressible. If you wish to test compression in your storage system, then you will need to create a compressible workload. The same restriction applies to de-duplication technologies.
Slice Generator¶
The Slice generator builds workloads from existing files. It aims to reproduce the compressibility characteristics of those files, whilst still creating an effectively infinite supply of different objects.
It works by taking a directory of files (which will usually be of the same type: source code, VM images, movies, or whatever), and then loading fixed-size slices of bytes from random positions within those files. The end result is that we have a library of (say) 1000 slices, each containing (say) 4 KB of data. Both of those values may be set with command line options.
When asked to generate a new workload object, the slice generator does the following:
1. Creates a random seed.
2. Writes the seed into the start of the workload object.
3. Uses the seed to create a random number generator just for this workload object.
4. Uses that random number generator to select slices from our library, which are concatenated onto the object until we have as many bytes as we were asked for.
This approach means that we do not need to ever store the objects themselves: we can verify a read operation by reading the seed from the first few bytes, and then recreating the object we would expect.
Note that the directory of data to be sliced needs to be in the same location on each of the sibench server nodes.
The drivers do not need to have the same files in their slice directories, though it’s likely that they will. One option would be to mount the same NFS share on all the drivers as a repository for the slice data. Performance when loading the slices is not a consideration, since it is done before the benchmark begins, and so will not affect the numbers.
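As a hypothetical example, assuming each sibench server has the same sample files under /srv/slice-data, a CephFS run using the slice generator might look like this:

```
# Build workload objects by slicing the sample files into 4096-byte slices,
# keeping a library of 10000 slices on each server.
sibench cephfs run \
    --generator slice \
    --slice-dir /srv/slice-data --slice-count 10000 --slice-size 4096 \
    --ceph-key KEY \
    mon1
```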
Write Cycles¶
The --object-count parameter determines how many objects we create. However, for long benchmark runs, or for small counts or object sizes, we are likely to wrap around and start writing from the first object again. If this happens, sibench internally increments a cycle counter, which it uses to ensure that objects written in different cycles will have different contents, even though each object still uses the same key as before.
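As a rough worked example (the bandwidth figure is purely illustrative): with the default 1000 objects of 1M each, the working set is about 1 GB, so a cluster sustaining 5 GB/s of writes wraps around the whole set roughly every 0.2 seconds, and a 30-second write phase will therefore pass through many cycles.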
The Prepare Phase¶
sibench either benchmarks write operations first and then read operations, or else it benchmarks a mixture of the two, depending on the --read-write-mix option. When benchmarking reads, or a read-write mix, it must first ensure that there are enough objects there to read before it can start work. This is the prepare phase, and that is what is happening when you see messages about ‘Preparing’.
It also happens if we are doing separate writes and reads and we did not have a long enough run time for sibench to write all of the objects specified by the --object-count option. In this case, the prepare phase will keep writing until all the objects are ready for reading.
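For example (illustrative values), the following mixed run must first write all 5000 objects in a prepare phase before the timed 70% read / 30% write workload begins:

```
sibench rados run \
    --read-write-mix 70 \
    --object-count 5000 --object-size 1M \
    --ceph-key KEY \
    mon1
```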
The Delete Phase¶
sibench does not clean up after itself by default, since Ceph can be very slow at deleting objects. However, if you wish to execute multiple runs over a weekend (perhaps by using Benchmaster to control sibench), then you may run the risk of running out of storage space on the Ceph cluster. In such cases, deleting the objects at the end of the run may be necessary. You can enable this by using the --clean-up flag.
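For instance (an illustrative command), adding the flag to any run deletes the benchmark data once the run has finished:

```
sibench rados run --clean-up --ceph-key KEY mon1
```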
Setting --clean-up behaves differently depending on the protocol, but in essence there are two parts to it: deleting the individual objects, and cleaning up other resources. Protocols may do either, neither or both.
In addition, the clean-up may be synchronous or not. This is best illustrated by comparing the behaviour of RADOS and RBD.
- With RADOS, we can delete the individual objects, and we can do it synchronously, meaning that when sibench completes the run, Ceph will have deleted the objects and will have no pending workload.
- With RBD, we delete the RBD image synchronously, but under the hood that image is comprised of multiple objects, and Ceph does not delete them all at once: it adds them to a queue for later deletion.
Clearly, asynchronous deletes are bad if we wish to run a set of benchmarks: when the benchmark terminates, the Ceph cluster under test may still be deleting in the background, and thus degrading the performance of subsequent runs.
Sadly, there’s nothing sibench can do to determine completion in such cases.
| Protocol | Object Delete | End Of Run Clean-up | Synchronous |
|---|---|---|---|
| s3 | yes | Deletes the bucket, but only if we created it | yes |
| rados | yes | no | yes |
| cephfs | yes | Deletes the directories, but only if we created them | yes |
| rbd | no | Deletes the images | no |
| block | no | no | n/a |
| file | yes | no | Dependent on the underlying filesystem |
Lastly, if you’re not running a production cluster, then you can tell Ceph to delete more quickly (or more accurately, to insert smaller delays between delete operations) by adding the following to your ceph config (and then restarting the osd daemons).
:: osd_delete_sleep_hybrid = 0.001 osd_delete_sleep_hdd = 0.001 osd_delete_sleep_ssd = 0.001