Manual¶
sibench is a benchmarking tool for storage systems, and for Ceph in particular. sibench typically runs on many nodes in parallel, in order to be able to scale up to the very high bandwidths of which modern scalable storage systems are capable.
For more detailed information, visit https://sibench.io.
How It Works¶
Each benchmarking node runs sibench as a server daemon, listening for incoming connections from a sibench client.
Once the client starts a benchmark, the sibench servers send periodic summaries back to it, so that the user can see progress.
When a benchmark completes, the sibench servers send the full results back to the client. A summary of the results is displayed by the client, but the full results of every individual read and write operation are written out as a JSON file in case the user wishes to perform their own statistical analysis.
Once a client starts running a benchmark on some set of sibench servers, those servers will reject any other incoming connections until the benchmark completes or is aborted. There is no job queue or long-lived management process.
The client and server use the same binary, just with different command line options. The client is extremely lightweight, and may be run on one of the server nodes without significantly impacting benchmarking performance, though it may result in higher memory usage on that node.
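As a sketch of this flow (the hostnames here are purely illustrative), you would start the daemon on each benchmarking node and then drive a run from the client:

```
# On each benchmarking node (e.g. sibench1, sibench2): start the server daemon.
sibench server

# On the client machine: run a Rados benchmark, using those nodes as workers
# and a Ceph monitor (mon1) as the target.
sibench rados run --servers sibench1,sibench2 --ceph-key KEY mon1
```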
Command Line¶
The following is a list of all the commands that the sibench binary accepts:
- sibench -h | --help
Outputs the full command line syntax.
- sibench version
Outputs the version number of the sibench binary.
- sibench server [--verbosity LEVEL] [--port PORT] [--mounts-dir DIR]
Starts sibench as a server.
- sibench s3 run [--s3-port PORT] [--s3-bucket BUCKET] (--s3-access-key KEY) (--s3-secret-key KEY) <target> …
Starts a benchmark using the S3 object protocol against the specified targets, which may be S3 servers or RadosGateway nodes.
- sibench rados run [--ceph-pool POOL] [--ceph-user USER] (--ceph-key KEY) <target> …
Starts a benchmark using the Rados object protocol against the specified targets, which should be Ceph monitors.
- sibench cephfs run [--mounts-dir DIR] [--ceph-dir DIR] [--ceph-user USER] (--ceph-key KEY) <target> …
Starts a benchmark using CephFS against the specified targets, which should be Ceph monitors.
- sibench rbd run [--ceph-pool POOL] [--ceph-datapool POOL] [--ceph-user USER] (--ceph-key KEY) <target> …
Starts a benchmark using RBD against the specified targets, which should be Ceph monitors.
- sibench block run [--block-device DEVICE]
Starts a benchmark using a locally mounted block device.
- sibench file run [--file-dir DIR]
Starts a benchmark using a locally mounted filesystem.
Additional options shared by all run commands, omitted from above for clarity:
[--verbosity LEVEL]
[--port PORT]
[--object-size SIZE]
[--object-count COUNT]
[--ramp-up TIME]
[--run-time TIME]
[--ramp-down TIME]
[--read-write-mix MIX]
[--bandwidth BW]
[--output FILE]
[--workers FACTOR]
[--generator GEN]
[--slice-dir DIR]
[--slice-count COUNT]
[--slice-size BYTES]
[--skip-read-verification]
[--servers SERVERS]
[--use-bytes]
[--individual-stats]
Option Definitions¶
| Long | Short | Value | Description | Default |
|---|---|---|---|---|
| --help | -h | - | Show full usage. | - |
| --verbosity | -v | LEVEL | Set debugging output at level “off”, “debug” or “trace”. The “trace” level may generate enough output to affect benchmark performance, and should only be used when trying to track down issues. | off |
| --port | -p | PORT | The port on which sibench listens. | 5150 |
| --object-size | -s | SIZE | Object size to test, in units of K or M. | 1M |
| --object-count | -c | COUNT | The total number of objects to use as our working set. | 1000 |
| --ramp-up | -u | TIME | The number of seconds at the start of each phase where we don’t record data (to discount edge effects caused by new connections). | 5 |
| --run-time | -r | TIME | The number of seconds in the middle of each phase of the benchmark where we do record the data. | 30 |
| --ramp-down | -d | TIME | The number of seconds at the end of each phase where we don’t record data. | 2 |
| --read-write-mix | -x | MIX | The ratio between reads and writes, specified as the percentage of reads. A value of zero indicates that reads and writes should be done in separate passes, rather than being combined. | 0 |
| --bandwidth | -b | BW | Benchmark at a fixed bandwidth, in units of K, M or G bits/s. A value of zero indicates no limit. When the read/write mix is not zero (that is, when we are not doing separate passes for read and write), this is the bandwidth of the combined operations. | 0 |
| --output | -o | FILE | The file to which we write our JSON results. | - |
| --workers | -w | FACTOR | Number of worker threads per server, expressed as a factor multiplied by the number of CPU cores. | 1.0 |
| --mounts-dir | -m | DIR | The directory in which we should create any filesystem mounts that are performed by sibench. | /tmp/sibench_mnt |
| --generator | -g | GEN | Which object generator to use: “prng” or “slice”. | prng |
| --skip-read-verification | - | - | Disable validation on reads. This should only be used to check if the number of nodes in the sibench cluster is limiting performance. | - |
| --servers | - | SERVERS | A comma-separated list of sibench servers to use for the benchmark. | localhost |
| --s3-port | - | PORT | The port on which to connect to S3. | 7480 |
| --s3-bucket | - | BUCKET | The name of the bucket we wish to use for S3 operations. | sibench |
| --s3-access-key | - | KEY | S3 access key. | - |
| --s3-secret-key | - | KEY | S3 secret key. | - |
| --ceph-pool | - | POOL | The pool we use for benchmarking. | sibench |
| --ceph-datapool | - | POOL | Optional pool used for RBD. If set, ceph-pool is used only for metadata. | - |
| --ceph-user | - | USER | The Ceph username we wish to use. | admin |
| --ceph-key | - | KEY | The CephX secret key belonging to the Ceph user. | - |
| --ceph-dir | - | DIR | The directory within CephFS that we should use for a benchmark. This will be created by sibench if it does not already exist. | sibench |
| --block-device | - | DEVICE | The local block device to use for a benchmark. | /tmp/sibench_block |
| --file-dir | - | DIR | The local directory to use for file operations. The directory must already exist. | - |
| --slice-dir | - | DIR | The directory of files to be sliced up to form new workload objects. | - |
| --slice-count | - | COUNT | The number of slices to construct for workload generation. | 10000 |
| --slice-size | - | BYTES | The size of each slice in bytes. | 4096 |
| --use-bytes | - | - | Show bandwidth in bytes rather than bits. | off |
| --individual-stats | - | - | Record the individual stats in the output file. This may make the output file very big. | off |
| --clean-up | - | - | Delete the data at the end of the benchmark run. | off |
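As an illustrative example (hostnames and values are hypothetical, not taken from the official documentation), the following runs a longer S3 benchmark with a larger working set and writes the full results to a file:

```
sibench s3 run \
    --servers sibench1,sibench2,sibench3,sibench4 \
    --s3-access-key KEY --s3-secret-key KEY \
    --object-size 4M --object-count 5000 \
    --ramp-up 10 --run-time 120 --ramp-down 5 \
    --output results.json \
    rgw1 rgw2
```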
Targets¶
The targets are the nodes to which the worker threads connect. Each worker opens a connection to each target and round-robins their reads and writes across those connections.
For most Ceph operations, the targets are monitors, and there is no advantage to specifying more than one. All the monitors do is provide the state-of-the-cluster map so that the workers can connect to the OSDs directly.
For RGW/S3, however, you should definitely list all of the storage cluster’s RGW nodes as targets, since those nodes are doing real work, and it needs to be balanced.
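For instance (hypothetical hostnames), a Rados run needs only a single monitor target, whereas an S3 run should list every RadosGateway node:

```
# Rados: one monitor is enough; the workers talk to the OSDs directly.
sibench rados run --ceph-key KEY mon1

# S3 via RadosGateway: list every RGW node so the load is balanced across them.
sibench s3 run --s3-access-key KEY --s3-secret-key KEY rgw1 rgw2 rgw3
```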
RBD¶
RBD behaviour is a little different than you might expect: each worker creates an RBD image per target, just big enough to hold that worker’s share of the ‘objects’ for the benchmark. All reads and writes that the worker then does are within the RBD image.
For example, if you have the following:
- 10 sibench nodes, each with 16 cores
- A single target monitor
- An object count of 1600 and an object size of 1MB
Then sibench will create 160 workers (by default, one per core), each of which will create a single 10MB RBD image, and then proceed to read and write 1MB at a time to parts of that image.
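A command line matching that worked example might look like the following (hostnames are hypothetical):

```
# 10 sibench servers x 16 cores = 160 workers; 1600 x 1MB objects means
# each worker creates a 10MB RBD image on the single monitor target.
sibench rbd run \
    --servers sib1,sib2,sib3,sib4,sib5,sib6,sib7,sib8,sib9,sib10 \
    --object-count 1600 --object-size 1M \
    --ceph-key KEY \
    mon1
```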
Generators¶
Generators create the data that sibench uses as workloads for the storage system. There are currently two of them, selectable with the --generator option.
PRNG Generator¶
The PRNG generator creates data which is entirely pseudorandom. It requires no configuration, and is the default choice. However, it has one shortcoming: because it creates pseudorandom data, it is not compressible. If you wish to test compression in your storage system, then you will need to create a compressible workload. The same restriction applies to de-duplication technologies.
Slice Generator¶
The Slice generator builds workloads from existing files. It aims to reproduce the compressibility characteristics of those files, whilst still creating an effectively infinite supply of different objects.
It works by taking a directory of files (which will usually be of the same type: source code, VM images, movies, or whatever), and then loading fixed-size slices of bytes from random positions within those files. The end result is that we have a library of (say) 1000 slices, each containing (say) 4 KB of data. Both of those values may be set with command line options.
When asked to generate a new workload object, the slice generator does the following:
1. Creates a random seed.
2. Writes the seed into the start of the workload object.
3. Uses the seed to create a random number generator just for this workload object.
4. Uses that random number generator to select slices from our library, which are concatenated onto the object until we have as many bytes as we were asked for.
This approach means that we do not need to ever store the objects themselves: we can verify a read operation by reading the seed from the first few bytes, and then recreating the object we would expect.
Note that the directory of data to be sliced needs to be in the same location on each of the sibench server nodes.
The drivers do not need to have the same files in their slice directories, though it’s likely that they will. One option would be to mount the same NFS share on all the drivers as a repository for the slice data. Performance when loading the slices is not a consideration, since it is done before the benchmark begins, and so will not affect the numbers.
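As a hypothetical example, assuming each sibench server has the same sample files under /srv/slice-data, a CephFS run using the slice generator might look like this:

```
# Build workload objects by slicing the sample files into 4096-byte slices,
# keeping a library of 10000 slices on each server.
sibench cephfs run \
    --generator slice \
    --slice-dir /srv/slice-data --slice-count 10000 --slice-size 4096 \
    --ceph-key KEY \
    mon1
```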
Write Cycles¶
The --object-count parameter determines how many objects we create. However, for long benchmark runs, or for small counts or object sizes, we are likely to wrap around and start writing from the first object again. If this happens, sibench internally increments a cycle counter, which it uses to ensure that objects written in different cycles will have different contents, even though each object still uses the same key as before.
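As a rough worked example (the bandwidth figure is purely illustrative): with the default 1000 objects of 1M each, the working set is about 1 GB, so a cluster sustaining 5 GB/s of writes wraps around the whole set roughly every 0.2 seconds, and a 30-second write phase will therefore pass through many cycles.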
The Prepare Phase¶
sibench either benchmarks write operations first and then read operations, or else it benchmarks a mixture of the two, depending on the --read-write-mix option. When benchmarking reads, or a read-write mix, it must first ensure that there are enough objects there to read before it can start work. This is the prepare phase, and that is what is happening when you see messages about ‘Preparing’.
It also happens if we are doing separate writes and reads and we did not have a long enough run time for sibench to write all of the objects specified by the --object-count option. In this case, the prepare phase will keep writing until all the objects are ready for reading.
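For example (illustrative values), the following mixed run must first write all 5000 objects in a prepare phase before the timed 70% read / 30% write workload begins:

```
sibench rados run \
    --read-write-mix 70 \
    --object-count 5000 --object-size 1M \
    --ceph-key KEY \
    mon1
```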
The Delete Phase¶
sibench does not clean up after itself by default, since Ceph can be very slow at deleting objects. However, if you wish to execute multiple runs over a weekend (perhaps by using Benchmaster to control sibench), then you may run the risk of running out of storage space on the Ceph cluster. In such cases, deleting the objects at the end of the run may be necessary. You can enable this by using the --clean-up flag.
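For instance (an illustrative command), adding the flag to any run deletes the benchmark data once the run has finished:

```
sibench rados run --clean-up --ceph-key KEY mon1
```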
Setting --clean-up behaves differently depending on the protocol, but in essence there are two parts to it: deleting the individual objects, and cleaning up other resources. Protocols may do either, neither or both.
In addition, the clean-up may be synchronous or not. This is best illustrated by comparing the behaviour of RADOS and RBD.
- With RADOS, we can delete the individual objects, and we can do it synchronously, meaning that when sibench completes the run, Ceph will have deleted the objects and will have no pending workload.
- With RBD, we delete the RBD image synchronously, but under the hood that image is comprised of multiple objects, and Ceph does not delete them all at once: it adds them to a queue for later deletion.
Clearly, asynchronous deletes are bad if we wish to run a set of benchmarks: when the benchmark terminates, the Ceph cluster under test may still be deleting in the background, and thus degrading the performance of subsequent runs.
Sadly, there’s nothing sibench can do to determine completion in such cases.
| Protocol | Object Delete | End Of Run Clean-up | Synchronous |
|---|---|---|---|
| s3 | yes | Deletes the bucket, but only if we created it | yes |
| rados | yes | no | yes |
| cephfs | yes | Deletes the directories, but only if we created them | yes |
| rbd | no | Deletes the images | no |
| block | no | no | n/a |
| file | yes | no | Dependent on the underlying filesystem |
Lastly, if you’re not running a production cluster, then you can tell Ceph to delete more quickly (or more accurately, to insert smaller delays between delete operations) by adding the following to your ceph config (and then restarting the osd daemons).
:: osd_delete_sleep_hybrid = 0.001 osd_delete_sleep_hdd = 0.001 osd_delete_sleep_ssd = 0.001