This benchmark is named bonnie++.  I was originally hoping to work with
Tim Bray on developing the next version of bonnie, but we could not agree
on the issue of whether C++ should be used in the program.  Tim has
graciously given me permission to use the name "bonnie++" for my program
which is based around his benchmark.
This program adds the facility to test more than 2G of storage on a 32-bit
machine, and adds tests for the file creat(), stat(), and unlink() operations.
It also has an option (via -t csv) to write its output in CSV spreadsheet
format.  The program csv2html.pl takes CSV data on standard input and writes
an HTML file on standard output which has a nice display of all the data.

Now for the serious doco:

The benchmark exercises the types of filesystem activity that have been
observed to be bottlenecks in I/O-intensive applications, in particular
the text database work done in connection with the New Oxford English
Dictionary Project at the University of Waterloo.
It now also tests file create/stat/unlink to simulate some operations that
are common bottlenecks on large Squid and INN servers, and on machines with
tens of thousands of mail files in /var/spool/mail.

It initially performs a series of tests on a file (or files) of known size.
By default, that size is 200 MB (but that's not enough - see below).  For
each test, Bonnie reports the bytes processed per elapsed second, per CPU
second, and the % CPU usage (user and system).  If a size >1G is specified
then we will use a number of files of size 1G or less.  This way we can use
a 32-bit program to test machines with 8G of RAM!  NB I have not yet tested
more than 2100M of file storage.  If you test with more storage than this,
please send me the results.
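
As a rough illustration of the "number of files of size 1G or less" rule
above, here is a minimal sketch.  The function name split_file_sizes and its
parameters are hypothetical and are not taken from the bonnie++ source; the
real program may divide the total differently.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the bonnie++ source): split a total test
// size, given in megabytes, into per-file sizes of at most max_file_mb each.
std::vector<int> split_file_sizes(int total_mb, int max_file_mb = 1024)
{
    assert(total_mb >= 0 && max_file_mb > 0);
    std::vector<int> sizes;
    while (total_mb > 0) {
        // Every file is full-sized except possibly the last one.
        int this_file = total_mb < max_file_mb ? total_mb : max_file_mb;
        sizes.push_back(this_file);
        total_mb -= this_file;
    }
    return sizes;
}
```

Under these assumptions a 2100M test would use three files (1024M, 1024M
and 52M), while the default size fits in a single file.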

In each case, an attempt is made to keep optimizers from noticing it's
all bogus.  The idea is to make sure that these are real transfers between
user space and the physical disk.  The file I/O tests are:

1. Sequential Output

1.1 Per-Character.  The file is written using the putc() stdio macro.
The loop that does the writing should be small enough to fit into any
reasonable I-cache.  The CPU overhead here is that required to do the
stdio code plus the OS file space allocation.

1.2 Block.  The file is created using write(2).  The CPU overhead
should be just the OS file space allocation.

1.3 Rewrite.  Each BUFSIZ of the file is read with read(2), dirtied, and
rewritten with write(2), requiring an lseek(2).  Since no space
allocation is done, and the I/O is well-localized, this should test the
effectiveness of the filesystem cache and the speed of data transfer.
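
The rewrite step (1.3) can be sketched roughly as follows.  This is an
illustration only: the function name rewrite_in_place and the chunk
parameter are assumptions, and the way the buffer is dirtied here (bumping
its first byte) is just one possibility, not necessarily what bonnie++ does.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>
#include <vector>

// Hypothetical sketch of the rewrite test: read each chunk, dirty it,
// seek back to where the chunk started, and write it over the old data.
// Returns the number of chunks rewritten, or -1 on error.
long rewrite_in_place(const char *path, size_t chunk)
{
    assert(chunk > 0);
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    std::vector<char> buf(chunk);
    long chunks = 0;
    for (;;) {
        ssize_t got = read(fd, buf.data(), chunk);
        if (got < 0) { chunks = -1; break; }
        if (got == 0)
            break;                                   // end of file
        buf[0] = (char)(buf[0] + 1);                 // dirty the chunk
        if (lseek(fd, -got, SEEK_CUR) < 0 ||         // back to chunk start
            write(fd, buf.data(), (size_t)got) != got) {
            chunks = -1;
            break;
        }
        chunks++;
    }
    close(fd);
    return chunks;
}
```

Calling it with chunk set to BUFSIZ (from <cstdio>) matches the chunk size
named in the description.  Because the file already exists at full length,
no new space is allocated while it runs.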

2. Sequential Input

2.1 Per-Character.  The file is read using the getc() stdio macro.  Once
again, the inner loop is small.  This should exercise only stdio and
sequential input.

2.2 Block.  The file is read using read(2).  This should be a very pure
test of sequential input performance.
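
To make the difference between the two input tests concrete, here is a
minimal sketch of both read loops.  The function names and the default
chunk size are assumptions made for illustration; neither is taken from the
bonnie++ source, and no timing is done here.

```cpp
#include <cassert>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>
#include <vector>

// Per-character input: one getc() per byte, with stdio doing the actual
// reads behind the scenes.  Returns the number of bytes read, or -1.
long read_per_char(const char *path)
{
    FILE *fp = std::fopen(path, "r");
    if (!fp)
        return -1;
    long total = 0;
    while (std::getc(fp) != EOF)
        total++;
    bool failed = std::ferror(fp) != 0;
    std::fclose(fp);
    return failed ? -1 : total;
}

// Block input: read(2) one chunk at a time, with no stdio layer involved.
// Returns the number of bytes read, or -1.
long read_block(const char *path, size_t chunk = 8192)
{
    assert(chunk > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    std::vector<char> buf(chunk);
    long total = 0;
    ssize_t got;
    while ((got = read(fd, buf.data(), chunk)) > 0)
        total += got;
    close(fd);
    return got < 0 ? -1 : total;
}
```

Both functions see the same bytes; the per-character version simply pays
the stdio overhead once per byte instead of once per chunk.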

3. Random Seeks

This test runs SeekProcCount processes in parallel, doing a total of
4000 lseek()s to locations in the file chosen by random() on BSD systems
or drand48() on SysV systems.  In each case, the block is read with read(2).
In 10% of cases, it is dirtied and written back with write(2).

The idea behind the SeekProcCount processes is to make sure there's always 
a seek queued up.
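
A single-process sketch of the seek loop might look like the following.
It is an illustration under several assumptions: the names random_seeks
and SeekStats are hypothetical, std::mt19937 is swapped in for random() or
drand48() so that the example is portable, and the SeekProcCount parallel
processes are left out entirely.

```cpp
#include <cassert>
#include <fcntl.h>
#include <random>
#include <unistd.h>
#include <vector>

struct SeekStats { int reads; int writes; };

// Hypothetical single-process version of the random seek test: seek to a
// random block, read it, and in roughly 10% of cases dirty it and write
// it back to the same place.
SeekStats random_seeks(const char *path, long num_blocks, size_t block_size,
                       int seeks, unsigned seed)
{
    assert(num_blocks > 0 && block_size > 0);
    SeekStats stats = {0, 0};
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return stats;
    std::mt19937 rng(seed);       // stands in for random()/drand48()
    std::uniform_int_distribution<long> pick_block(0, num_blocks - 1);
    std::uniform_int_distribution<int> pick_percent(0, 99);
    std::vector<char> buf(block_size);
    for (int i = 0; i < seeks; i++) {
        off_t where = (off_t)pick_block(rng) * (off_t)block_size;
        if (lseek(fd, where, SEEK_SET) < 0)
            break;
        ssize_t got = read(fd, buf.data(), block_size);
        if (got <= 0)
            break;
        stats.reads++;
        if (pick_percent(rng) < 10) {                // about 10% of cases
            buf[0] = (char)(buf[0] + 1);             // dirty the block
            if (lseek(fd, where, SEEK_SET) < 0 ||
                write(fd, buf.data(), (size_t)got) != got)
                break;
            stats.writes++;
        }
    }
    close(fd);
    return stats;
}
```

With only one process there will be moments when no seek is outstanding,
which is exactly the gap the parallel processes are meant to close.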

AXIOM: For any unix filesystem, the effective number of lseek(2) calls
per second declines asymptotically to near 30, once the effect of
caching is defeated.

The size of the file has a strong nonlinear effect on the results of
this test.  Many Unix systems that have the memory available will make
aggressive efforts to cache the whole thing, and report random I/O rates
in the thousands per second, which is ridiculous.  As an extreme
example, an IBM RISC 6000 with 64 MB of memory reported 3,722 per second
on a 50 MB file.  Some have argued that bypassing the cache is artificial
since the cache is just doing what it's designed to.  True, but the caches
will inevitably max out in any application that requires rapid random
access to file(s) significantly larger than main memory and that runs on
a system doing significant other work.  There is a hard limit hiding
behind the cache which the author has observed to be of significant
import in many situations - what we are trying to do here is measure
that number.

The file creation tests use file names made of a 7-digit number and a random
number (from 0 to 12) of random alphanumeric characters.
For the sequential tests the random characters in the file name follow the
number.  For the random tests the random characters come first.
The sequential tests involve creating the files in numeric order, then
stat()ing them in readdir() order (i.e. the order in which they are stored
in the directory, which is very likely to be the same order in which they
were created), and deleting them in the same order.
For the random tests we create the files in an order that will appear
random to the file system (the last 7 characters are in numeric order on
the files).  Then we stat() random files (NB this will return very good
results on file systems with sorted directories because not every file
will be stat()ed and the cache will be more effective).  After that we
delete all the files in random order.
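
The naming scheme described above could be sketched like this.  The
function name make_test_name, its parameters, and the use of std::mt19937
are all assumptions made for the example; bonnie++ itself may build its
names differently.

```cpp
#include <cassert>
#include <cstdio>
#include <random>
#include <string>

// Hypothetical helper: build one test file name from a 7-digit number and
// 0 to 12 random alphanumeric characters.  For the sequential tests the
// number comes first; for the random tests the random characters do.
std::string make_test_name(int number, std::mt19937 &rng, bool random_first)
{
    assert(number >= 0 && number <= 9999999);
    static const char alnum[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    std::uniform_int_distribution<int> pick_len(0, 12);
    // sizeof(alnum) counts the trailing NUL, so the last valid index is
    // sizeof(alnum) - 2.
    std::uniform_int_distribution<int> pick_char(0, (int)sizeof(alnum) - 2);

    char digits[8];
    std::snprintf(digits, sizeof(digits), "%07d", number);

    std::string extra;
    int len = pick_len(rng);
    for (int i = 0; i < len; i++)
        extra += alnum[pick_char(rng)];

    return random_first ? extra + digits : digits + extra;
}
```

With the random characters first, names created in numeric order no longer
sort in creation order, which is what makes the creation order appear
random to the file system.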

COPYRIGHT NOTICE: 
Copyright (c) Tim Bray, 1990.
Copyright (c) Russell Coker, 1999.  I have updated the program, added
support for >2G on 32bit machines, and tests for file creation.  Same
license as Tim's original code.
Everybody is hereby granted rights to use, copy, and modify this program, 
 provided only that this copyright notice and the disclaimer below
 are preserved without change.
DISCLAIMER:
This program is provided AS IS with no warranty of any kind, and
The author makes no representation with respect to the adequacy of this
program for any particular purpose or with respect to its adequacy to 
produce any particular result, and
The authors shall not be liable for loss or damage arising out of
the use of this program regardless of how sustained, and
In no event shall the author be liable for special, direct, indirect
or consequential damage, loss, costs or fees or expenses of any
nature or kind.

NB The results of running this program on live server machines can include
extremely bad performance of server processes, and excessive consumption of
disk space and/or inodes which may cause the machine to cease performing its
designated tasks.  Also the benchmark results are likely to be bad.
Do not run this program on live production machines.
