# Benchmarking and comparing DwarFS

DwarFS is a filesystem developed by the user mhx on GitHub [1], self-described as "A fast high compression read-only file system for Linux, Windows, and macOS." One of my ideas for blendOS was to layer different packages, and with its high compression and the option to mount it as a FUSE-based filesystem, DwarFS is an appealing option for that use case - blendOS is immutable, so it might as well have some compression.

## Methodology

The datasets used for this test are the following:

- 25 GiB of null data (just `00000000` in binary)
- 25 GiB of random data[^1]
- Data for a 100 million-sided regular polygon; ~26.5 GiB[^2]
- The current Linux longterm release source ([6.6.58](https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.6.58.tar.xz) [2]); ~1.5 GB
- For some rough latency testing:
  - 1024 4 KiB files filled with null data (again, just `00000000` in binary)
  - 1024 4 KiB files filled with random data

This data should cover both latency and read speed testing across data that compresses differently - extremely compressible null data, decently compressible data, and random data that can't be compressed well.

### What filesystems?

I'll be benchmarking DwarFS, fuse-archive (with tar files), and btrfs. In some early, basic testing, I found that mounting any *compressed* archives with `fuse-archive`, a tool for mounting archive file formats as read-only filesystems, took far too long. Additionally, being FUSE-based, these would have slightly worse performance than kernel filesystems, so I tried to use a FUSE driver for btrfs as well. Unfortunately, I ran into a bug, so I won't be able to do a fully equivalent test; btrfs will only be running in the kernel.

During that early testing, I also found that most compressed archives, like Gzip-compressed tar archives, took far too long to *create*, because Gzip is single-threaded. With all the options that had no chance of being used ruled out, I'll only be looking at these three. DwarFS also took far too long to create at its default setting, but at compression level 1 it's much faster - 11m2.738s for the ~80 GiB total.

## Running the benchmark

First, I installed the benchmark tool, `disk-read-benchmark`, by cloning its repository and installing it with Cargo, then added its completions to fish (just for this session):

```sh
git clone https://git.askiiart.net/askiiart/disk-read-benchmark
cd ./disk-read-benchmark
cargo install --path .
disk-read-benchmark generate-fish-completions | source
```

Then I prepared all the data:

```sh
disk-read-benchmark prep-dirs
disk-read-benchmark grab-data
./prepare.sh
```

`disk-read-benchmark` prepares all the directories and generates the data used for testing, then `./prepare.sh` uses that data to generate the DwarFS and tar archives.

To run the benchmark, I just ran this:

```sh
disk-read-benchmark benchmark
```

This outputs the data to `data/benchmark-data.csv` and `data/bulk.csv` for the single and bulk files, respectively.

## Results

After processing [the data](/assets/benchmarking-dwarfs/data/) with [this script](/assets/benchmarking-dwarfs/process-data.py) to make it a bit easier to work with, I put the resulting graphs here ↓ (a rough sketch of that kind of processing is included at the end of this post).

### Sequential read

### Random read

### Sequential read latency
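
As a rough illustration of what that post-processing step can look like, here's a minimal sketch - not the actual `process-data.py` - that reads the two CSVs and turns them into bar charts. It assumes pandas and matplotlib are available, and the column names (`dataset`, `filesystem`, `test`, `time_seconds`) are hypothetical stand-ins for whatever `disk-read-benchmark` actually writes.

```python
# Hypothetical sketch of the post-processing step, NOT the real process-data.py.
# Assumes pandas + matplotlib; the column names are made up and would need to be
# adjusted to match what disk-read-benchmark actually writes to its CSVs.
import pandas as pd
import matplotlib.pyplot as plt

# data/benchmark-data.csv holds the single-file results,
# data/bulk.csv holds the 1024-small-file (latency) results.
single = pd.read_csv("data/benchmark-data.csv")
bulk = pd.read_csv("data/bulk.csv")

def plot_metric(df: pd.DataFrame, metric: str, title: str, outfile: str) -> None:
    """Draw one grouped bar chart: one bar per filesystem for each dataset."""
    pivot = df[df["test"] == metric].pivot(
        index="dataset", columns="filesystem", values="time_seconds"
    )
    ax = pivot.plot.bar(rot=0, figsize=(8, 4))
    ax.set_ylabel("time (s)")
    ax.set_title(title)
    plt.tight_layout()
    plt.savefig(outfile)
    plt.close()

plot_metric(single, "sequential_read", "Sequential read", "seq-read.png")
plot_metric(single, "random_read", "Random read", "random-read.png")
plot_metric(bulk, "sequential_read_latency", "Sequential read latency", "seq-latency.png")
```

Grouped bar charts like this make it easy to compare the three filesystems side by side for each dataset, which is all the graphs above need to show.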