Manual

sibench is a benchmarking tool for storage systems, and for Ceph in particular.

sibench typically runs on many nodes in parallel, in order to scale up to the very high bandwidths of which modern scalable storage systems are capable.

For more detailed information, visit https://sibench.io.

How It Works

Each benchmarking node runs sibench as a server daemon, listening for incoming connections from a sibench client.

Once the client starts a benchmark, the sibench servers send periodic summaries back to it, so that the user can see progress.

When a benchmark completes, the sibench servers send the full results back to the client. A summary of the results is displayed by the client, but the full results of every individual read and write operation are written out as a JSON file in case the user wishes to perform their own statistical analysis.

Once a client starts running a benchmark on some set of sibench servers, those servers will reject any other incoming connections until the benchmark completes or is aborted. There is no job queue or long-lived management process.

The client and server use the same binary, just with different command line options. The client is extremely lightweight, and may be run on one of the server nodes without significantly impacting benchmarking performance, though it may result in higher memory usage on that node.
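
As a minimal sketch of that workflow (the hostnames node1, node2, node3 and mon1 are placeholders, and KEY stands for a CephX key), start the daemon on each benchmarking node and then launch a run from any machine that can reach them:

  node1$ sibench server
  node2$ sibench server
  node3$ sibench server

  client$ sibench rados run --servers node1,node2,node3 --ceph-key KEY mon1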

Command Line

The following is a list of all the commands that the sibench binary accepts:

sibench -h | --help

Outputs the full command line syntax.

sibench version

Outputs the version number of the sibench binary.

sibench server [--verbosity LEVEL] [--port PORT] [--mounts-dir DIR]

Starts sibench as a server.

sibench s3 run [--s3-port PORT] [--s3-bucket BUCKET] (--s3-access-key KEY) (--s3-secret-key KEY) <target> …

Starts a benchmark using the S3 object protocol against the specified targets, which may be S3 servers or RadosGateway (RGW) nodes.

sibench rados run [--ceph-pool POOL] [--ceph-user USER] (--ceph-key KEY) <target> …

Starts a benchmark using the Rados object protocol against the specified targets, which should be Ceph monitors.

sibench cephfs run [--mounts-dir DIR] [--ceph-dir DIR] [--ceph-user USER] (--ceph-key KEY) <target> …

Starts a benchmark using CephFS against the specified targets, which should be Ceph monitors.

sibench rbd run [--ceph-pool POOL] [--ceph-datapool POOL] [--ceph-user USER] (--ceph-key KEY) <target> …

Starts a benchmark using RBD against the specified targets, which should be Ceph monitors.

sibench block run [--block-device DEVICE]

Starts a benchmark using a locally mounted block device.

sibench file run [--file-dir DIR]

Starts a benchmark using a locally mounted filesystem.

Additional options shared by all run commands, omitted from the syntax above for clarity (a combined example follows this list):

  • [--verbosity LEVEL]

  • [--port PORT]

  • [--object-size SIZE]

  • [--object-count COUNT]

  • [--ramp-up TIME]

  • [--run-time TIME]

  • [--ramp-down TIME]

  • [--read-write-mix MIX]

  • [--bandwidth BW]

  • [--output FILE]

  • [--workers FACTOR]

  • [--generator GEN]

  • [--slice-dir DIR]

  • [--slice-count COUNT]

  • [--slice-size BYTES]

  • [--skip-read-verification]

  • [--servers SERVERS]

  • [--use-bytes]

  • [--individual-stats]
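
As an illustrative sketch only, these shared options are simply added to whichever protocol command you are running. The hostnames below are placeholders, and KEY stands for your CephX key:

  sibench rados run --servers sibench1,sibench2 --object-size 4M --object-count 5000 --run-time 60 --output results.json --ceph-key KEY mon1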

Option Definitions

Each option is shown with its long form, its short form (where one exists), the value it takes, and its default.

--help, -h
    Show full usage.

--verbosity, -v LEVEL (default: off)
    Set debugging output to level “off”, “debug” or “trace”. The “trace” level may generate enough output to affect benchmark performance, and should only be used when trying to track down issues.

--port, -p PORT (default: 5150)
    The port on which sibench communicates.

--object-size, -s SIZE (default: 1M)
    Object size to test, in units of K or M.

--object-count, -c COUNT (default: 1000)
    The total number of objects to use as our working set.

--ramp-up, -u TIME (default: 5)
    The number of seconds at the start of each phase where we don’t record data (to discount edge effects caused by new connections).

--run-time, -r TIME (default: 30)
    The number of seconds in the middle of each phase of the benchmark where we do record the data.

--ramp-down, -d TIME (default: 2)
    The number of seconds at the end of each phase where we don’t record data.

--read-write-mix, -x MIX (default: 0)
    The ratio between reads and writes, specified as the percentage of reads. A value of zero indicates that reads and writes should be done in separate passes, rather than being combined.

--bandwidth, -b BW (default: 0)
    Benchmark at a fixed bandwidth, in units of K, M or G bits/s. A value of zero indicates no limit. When the read/write mix is not zero - that is, when we are not doing separate passes for read and write - then this is the bandwidth of the combined operations.

--output, -o FILE
    The file to which we write our JSON results.

--workers, -w FACTOR (default: 1.0)
    The number of worker threads per server, expressed as a factor to be multiplied by the number of CPU cores.

--mounts-dir, -m DIR (default: /tmp/sibench_mnt)
    The directory in which we should create any filesystem mounts that are performed by sibench itself, such as when using CephFS. It is not needed for running generic filesystem benchmarks, because those must be mounted outside of sibench.

--generator, -g GEN (default: prng)
    Which object generator to use: “prng” or “slice”.

--skip-read-verification
    Disable validation on reads. This should only be used to check if the number of nodes in the sibench cluster is a limiting factor when benchmarking read performance.

--servers SERVERS (default: localhost)
    A comma-separated list of sibench servers to connect to.

--s3-port PORT (default: 7480)
    The port on which to connect to S3.

--s3-bucket BUCKET (default: sibench)
    The name of the bucket we wish to use for S3 operations.

--s3-access-key KEY
    S3 access key.

--s3-secret-key KEY
    S3 secret key.

--ceph-pool POOL (default: sibench)
    The pool we use for benchmarking.

--ceph-datapool POOL
    Optional pool used for RBD. If set, --ceph-pool is used only for metadata.

--ceph-user USER (default: admin)
    The Ceph username we wish to use.

--ceph-key KEY
    The CephX secret key belonging to the Ceph user.

--ceph-dir DIR (default: sibench)
    The directory within CephFS that we should use for a benchmark. This will be created by sibench if it does not already exist.

--block-device DEVICE (default: /tmp/sibench_block)
    The local block device to use for a benchmark.

--file-dir DIR
    The local directory to use for file operations. The directory must already exist.

--slice-dir DIR
    The directory of files to be sliced up to form new workload objects.

--slice-count COUNT (default: 10000)
    The number of slices to construct for workload generation.

--slice-size BYTES (default: 4096)
    The size of each slice in bytes.

--use-bytes (default: off)
    Show bandwidth in bytes rather than bits.

--individual-stats (default: off)
    Record the stats of each individual operation in the output file. This may make the file very large.

--clean-up (default: off)
    Delete the benchmark data at the end of the run.

Targets

The targets are the nodes to which the worker threads connect. Each worker opens a connection to each target and round-robins their reads and writes across those connections.

For most Ceph operations, the targets are monitors, and there is no advantage to specifying more than one. All the monitors do is provide the state-of-the-cluster map so that the workers can connect to the OSDs directly.

For RGW/S3, however, you should definitely list all of the storage cluster’s RGW nodes as targets, since those nodes are doing real work, and that work needs to be balanced across them.
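
For example, a hypothetical S3 run against a cluster with three RGW nodes (all hostnames and keys below are placeholders) might look like this:

  sibench s3 run --servers sibench1,sibench2 --s3-access-key ACCESS --s3-secret-key SECRET rgw1 rgw2 rgw3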

RBD

RBD behaviour is a little different than you might expect: each worker creates an RBD image per target, just big enough to hold that worker’s share of the ‘objects’ for the benchmark. All reads and writes that the worker then does are within the RBD image.

For example, if you have the following:

  1. 10 sibench nodes, each with 16 cores

  2. A single target monitor

  3. An object count of 1600 and an object size of 1 MB

Then sibench will create 160 workers (by default, one per core), each of which will create a single 10 MB RBD image and then read and write 1 MB at a time to parts of that image.
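
That scenario corresponds roughly to the following command, where the --servers list names all ten sibench nodes (the hostnames and KEY are placeholders):

  sibench rbd run --servers node01,node02,node03,node04,node05,node06,node07,node08,node09,node10 --object-count 1600 --object-size 1M --ceph-key KEY mon1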

Generators

Generators create the data that sibench uses as workloads for the storage system. There are currently two of them, selectable with the --generator option.

PRNG Generator

The PRNG generator creates data which is entirely pseudorandom. It requires no configuration, and is the default choice. However, it has one shortcoming: because it creates pseudorandom data, it is not compressible. If you wish to test compression in your storage system, then you will need to create a compressible workload. The same restriction applies to de-duplication technologies.

Slice Generator

The Slice generator builds workloads from existing files. It aims to reproduce the compressibility characteristics of those files, whilst still creating an effectively infinite supply of different objects.

It works by taking a directory of files (which will usually be of the same type: source code, VM images, movies, or whatever), and then loading fixed-size slices of bytes from random positions within those files. The end result is that we have a library of (say) 1000 slices, each containing (say) 4 KB of data. Both of those values may be set with the --slice-count and --slice-size options.

When asked to generate a new workload object the slice generator does the following:

  1. Creates a random seed.

  2. Writes the seed into the start of the workload object.

  3. Uses the seed to create a random number generator just for this workload object.

  4. Uses that random number generator to select slices from our library, which are concatenated onto the object until we have as many bytes as we were asked for.

This approach means that we do not need to ever store the objects themselves: we can verify a read operation by reading the seed from the first few bytes, and then recreating the object we would expect.

Note that the directory of data to be sliced needs to be in the same location on each of the sibench server nodes.

The servers do not need to have the same files in their slice directories, though it is likely that they will. One option would be to mount the same NFS share on all the server nodes as a repository for the slice data. Performance when loading the slices is not a consideration, since the loading is done before the benchmark begins, and so it will not affect the numbers.
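
As a sketch of how the slice generator might be selected (the slice directory path, hostnames and KEY are placeholders; the directory must exist on every sibench server node):

  sibench cephfs run --generator slice --slice-dir /srv/slice-data --slice-count 10000 --slice-size 4096 --servers sibench1,sibench2 --ceph-key KEY mon1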

Write Cycles

The object count parameter determines how many objects we create. However, for long benchmark runs, or for small counts or object sizes, we are likely to wrap around and start writing from the first object again. If this happens, sibench internally increments a cycle counter, which it uses to ensure that objects written in different cycles have different contents, even though each object still uses the same key as before.

The Prepare Phase

sibench either benchmarks write operations first and then read operations, or else it benchmarks a mixture of the two (depending on the --read-write-mix option). When benchmarking reads, or a read-write mix, it must first ensure that there are enough objects to read before it can start work. This is the prepare phase, and it is what is happening when you see messages about ‘Preparing’.

It also happens if we are doing separate writes and reads and the run time was not long enough for sibench to write all of the objects specified by the object-count option. In that case, the prepare phase keeps writing until all the objects are ready for reading.

The Delete Phase

sibench does not clean up after itself by default, since Ceph can be very slow at deleting objects. However, if you wish to execute multiple runs over a weekend (perhaps by using Benchmaster to control sibench), then you risk running out of storage space on the Ceph cluster. In such cases, deleting the objects at the end of each run may be necessary. You can enable this with the --clean-up flag.
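
For example (hostnames and KEY are placeholders):

  sibench rados run --clean-up --servers sibench1,sibench2 --ceph-key KEY mon1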

The --clean-up flag behaves differently depending on the protocol, but in essence there are two parts to it: deleting the individual objects, and cleaning up other resources. Protocols may do either, neither or both.

In addition, the clean-up may or may not be synchronous. This is best illustrated by comparing the behaviour of RADOS and RBD.

With RADOS, we can delete the individual objects, and we can do it synchronously, meaning that when sibench completes the run, Ceph will have deleted the objects and will have no pending workload.

With RBD, we delete the RBD image synchronously, but under the hood that image is composed of multiple objects, and Ceph does not delete them at once, but instead adds them to a queue for later deletion.

Clearly, asynchronous deletes are bad if we wish to run a set of benchmarks: when one benchmark terminates, the Ceph cluster under test may still be deleting in the background, and thus degrading the performance of subsequent runs.

Sadly, there’s nothing sibench can do to determine completion in such cases.

The behaviour for each protocol is summarised below.

s3
    Object delete: yes. End-of-run clean-up: deletes the bucket, but only if we created it. Synchronous: yes.

rados
    Object delete: yes. End-of-run clean-up: no. Synchronous: yes.

cephfs
    Object delete: yes. End-of-run clean-up: deletes the directories, but only if we created them. Synchronous: yes.

rbd
    Object delete: no. End-of-run clean-up: deletes the images. Synchronous: no.

block
    Object delete: no. End-of-run clean-up: no. Synchronous: n/a.

file
    Object delete: yes. End-of-run clean-up: no. Synchronous: dependent on the underlying filesystem.

Lastly, if you are not running a production cluster, then you can tell Ceph to delete more quickly (or, more accurately, to insert smaller delays between delete operations) by adding the following to your Ceph config and then restarting the OSD daemons:

  osd_delete_sleep_hybrid = 0.001
  osd_delete_sleep_hdd = 0.001
  osd_delete_sleep_ssd = 0.001
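
Depending on your Ceph release, you may also be able to apply the same settings at runtime through the centralised configuration database rather than editing the config file; something along these lines should work on recent releases:

  ceph config set osd osd_delete_sleep_hybrid 0.001
  ceph config set osd osd_delete_sleep_hdd 0.001
  ceph config set osd osd_delete_sleep_ssd 0.001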