finish up benchmarking-dwarfs
This commit is contained in:
parent
9afa1fb075
commit
4d4ad996ba
4 changed files with 207 additions and 83 deletions
|
@ -10,12 +10,13 @@
|
|||
<h1 id="benchmarking-and-comparing-dwarfs">Benchmarking and
|
||||
comparing DwarFS</h1>
|
||||
<p>DwarFS is a filesystem developed by the user mhx on GitHub
|
||||
[1], which is self-described as "A fast high compression
|
||||
read-only file system for Linux, Windows, and macOS." One of my
|
||||
ideas for blendOS was to layer different packages, and combined
|
||||
with its compression and option to be mounted as a FUSE-based
|
||||
filesystem, it's an appealing option for this use case - blendOS
|
||||
is immutable, so it might as well have some compression.</p>
|
||||
(<em>mhx/dwarfs</em>), which is self-described as "A fast high
|
||||
compression read-only file system for Linux, Windows, and
|
||||
macOS." One of my ideas for blendOS was to layer different
|
||||
packages, and combined with its compression and option to be
|
||||
mounted as a FUSE-based filesystem, it's an appealing option for
|
||||
this use case - blendOS is immutable, so it might as well have
|
||||
some compression.</p>
|
||||
<h2 id="methodology">Methodology</h2>
|
||||
<p>The datasets being used for this test will be the
|
||||
following:</p>
|
||||
|
@ -29,7 +30,7 @@
|
|||
role="doc-noteref"><sup>2</sup></a></li>
|
||||
<li>The current Linux longterm release source (<a
|
||||
href="https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.6.58.tar.xz">6.6.58</a>
|
||||
[2]); ~1.5 GB</li>
|
||||
(<em>The Linux Kernel Archives</em>)); ~1.5 GB</li>
|
||||
<li>For some rough latency testing:
|
||||
<ul>
|
||||
<li>1024 4 KiB files filled with null data (again, just
|
||||
|
@ -42,7 +43,8 @@
|
|||
compressible files with null data, decently compressible files,
|
||||
and random data which can't be compressed well.</p>
|
||||
<h3 id="what-filesystems">What filesystems?</h3>
|
||||
<p>I'll be benchmarking DwarFS, fuse-archive (with tar files),
|
||||
<p>I'll be benchmarking DwarFS (<em>mhx/dwarfs</em>),
|
||||
fuse-archive (<em>Google/Fuse-Archive</em>) (with tar files),
|
||||
and btrfs. In some early, basic testing, I found that mounting
|
||||
any <em>compressed</em> archives with <code>fuse-archive</code>,
|
||||
a tool for mounting archive file formats as read-only
|
||||
|
@ -58,12 +60,19 @@
|
|||
single-threaded. So all the options with no chance of being used
|
||||
have been marked off, and I'll only be looking into these
|
||||
three.</p>
|
||||
<p>DwarFS also took far too long to create on its default
|
||||
setting, but on compression level 1, it's much faster -
|
||||
11m2.738s for the ~80 GiB total, and considering</p>
|
||||
<p>DwarFS also took far too long to create an archive on its
|
||||
default setting, but on compression level 1, it's much faster -
|
||||
11m2.738s for the ~80 GiB total, and considering my entire
|
||||
system is about 20 GiB, that should be about 2-3 minutes, which
|
||||
is reasonable. With no compression, tar took 3m3.378s. Mounting
|
||||
the DwarFS archive was nearly instant (0.022s), while mounting
|
||||
the tar archive took 1.352s - not bad, but not ideal when
|
||||
mounting many archives, and it will absolutely be taken into
|
||||
consideration.</p>
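<p>The 2-3 minute figure above can be checked with a quick back-of-the-envelope calculation. This is only a sketch: it assumes archive creation time scales linearly with input size, which real system data (with different compressibility and duplication than the test data) may not follow. The variable names are illustrative.</p>

```python
# Rough estimate of DwarFS archive creation time for a ~20 GiB system.
# Assumption: creation time scales linearly with input size.

def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a `time`-style XmY.YYYs reading to seconds."""
    return minutes * 60 + seconds

benchmark_size_gib = 80                     # ~80 GiB of test data
benchmark_time_s = to_seconds(11, 2.738)    # 11m2.738s at compression level 1
system_size_gib = 20                        # approximate system size

throughput_gib_s = benchmark_size_gib / benchmark_time_s
estimate_s = system_size_gib / throughput_gib_s

print(f"throughput: {throughput_gib_s * 1024:.0f} MiB/s")
print(f"estimated time: {int(estimate_s // 60)}m{estimate_s % 60:.0f}s")
```

<p>Under that linear assumption, the estimate is a quarter of the measured time, or a little under 2m46s.</p>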
|
||||
<h2 id="running-the-benchmark">Running the benchmark</h2>
|
||||
<p>First installed it by cloning the repository, installing it
|
||||
using Cargo, then added its completions to fish (just for this
|
||||
<p>First off, I installed my benchmark (<em>Disk Read
|
||||
Benchmark</em>) by cloning the repository, installing it using
|
||||
Cargo, and then adding its completions to fish (just for this
|
||||
session):</p>
|
||||
<div class="sourceCode" id="cb2"><pre
|
||||
class="language-sh"><code class="language-bash"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="fu">git</span> clone https://git.askiiart.net/askiiart/disk-read-benchmark</span>
|
||||
|
@ -93,50 +102,126 @@
|
|||
script</a> to make it a bit easier, I put the resulting graphs
|
||||
in here ↓</p>
|
||||
<h3 id="sequential-read">Sequential read</h3>
|
||||
<p>These results interest me quite a bit; unsurprisingly, DwarFS
|
||||
has an advantage on the null file, due to its compression,
|
||||
though it's disappointing the difference in time wasn't greater.
|
||||
However, it does far worse on the random file, and I'm not sure
|
||||
why; as discussed further down, DwarFS doesn't try to compress
|
||||
incompressible files as far as I know, but I could be wrong. As
|
||||
for the 100 million-sided polygon, it's somewhere in between,
|
||||
with an advantage due to its compression, but still taking
|
||||
longer than expected.</p>
|
||||
<p>As for fuse-archive, it handles the null file well, but takes
|
||||
longer on the others; not much to say.</p>
|
||||
<div>
|
||||
<canvas id="seq_read_chart" class="chart">
|
||||
</canvas>
|
||||
</div>
|
||||
<h3 id="random-read">Random read</h3>
|
||||
<p>There's nothing much to say here; although DwarFS took
|
||||
significantly longer, it's still pretty fast - a difference of
|
||||
about 14 milliseconds worst case, across a 25 GiB file; similar
|
||||
results for the 100 million-sided polygon, though to a lesser
|
||||
extent, given it can be compressed better. With the null file,
|
||||
due to its compression, DwarFS was actually on par with
|
||||
fuse-archive, but it can't compete with btrfs's performance,
|
||||
given it's so heavily optimized, and in the kernel.</p>
|
||||
<div>
|
||||
<canvas id="rand_read_chart" class="chart">
|
||||
</canvas>
|
||||
</div>
|
||||
<h3 id="sequential-read-latency">Sequential read latency</h3>
|
||||
<p>As expected, DwarFS performs a bit worse on the
|
||||
incompressible random data, but otherwise they're all roughly
|
||||
equal. I wasn't expecting this, given btrfs is in the kernel,
|
||||
while the other two are using FUSE.</p>
|
||||
<div>
|
||||
<canvas id="seq_read_latency_chart" class="chart">
|
||||
</canvas>
|
||||
</div>
|
||||
<h3 id="random-read-latency">Random read latency</h3>
|
||||
<p>Both DwarFS and fuse-archive had some trouble with this test.
|
||||
DwarFS doesn't seem to handle random access very well; this is
|
||||
supposedly fixed, as seen in issue 139 (<em>Issue #139 ·
|
||||
mhx/dwarfs</em>), but the performance issues are obvious
|
||||
regardless; I'm not sure why, given it doesn't compress
|
||||
incompressible data, not to mention it does just fine on the
|
||||
random read test, where the only difference is that it reads
|
||||
<em>more</em> data. But regardless, DwarFS ended up performing
|
||||
far worse than expected on both the incompressible random data,
|
||||
and the highly compressible null data.</p>
|
||||
<p>Meanwhile, when testing random read latency,
|
||||
<code>fuse-archive</code> pretty much just dies, becoming
|
||||
ridiculously slow (even compared to DwarFS), so I didn't include
|
||||
its single-file results. It succeeds on the bulk files, but
|
||||
since it just shows as 0 seconds anyway given the massive
|
||||
scale, I opted to not include it in this graph at all.</p>
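<p>For context, a random read latency test boils down to timing many tiny reads at random offsets. The following is a minimal sketch of that idea, not the actual implementation in Disk Read Benchmark (which is written in Rust); the function name, parameters, and defaults are all hypothetical.</p>

```python
import os
import random
import time

def random_read_latency(path: str, reads: int = 1024, size: int = 1, seed: int = 0) -> float:
    """Return the mean seconds per read for `reads` small reads at random offsets.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so runs visit the same offsets
    file_size = os.path.getsize(path)
    last_offset = max(file_size - size, 0)
    # buffering=0 skips Python's userspace buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        for _ in range(reads):
            f.seek(rng.randrange(last_offset + 1))
            f.read(size)
        elapsed = time.perf_counter() - start
    return elapsed / reads
```

<p>Each read here touches only a byte, so on a compressed filesystem the cost is dominated by locating and decompressing whichever block holds that byte, rather than by the amount of data returned.</p>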
|
||||
<div>
|
||||
<canvas id="rand_read_latency_chart" class="chart">
|
||||
</canvas>
|
||||
</div>
|
||||
<p>The FUSE-based filesystems run into a bit of trouble here -
|
||||
with incompressible data, DwarFS has a hard time keeping up for
|
||||
some reason, despite keeping up just fine with larger random
|
||||
reads on the same data, and so it takes 3 to 4 seconds to run
|
||||
random read latency testing on the 25 GiB random file.
|
||||
Meanwhile, when testing random read latency in
|
||||
<code>fuse-archive</code> pretty much just dies, becoming
|
||||
ridiculously slow (even compared to DwarFS), so I didn't test
|
||||
its random read latency at all and just had its results put as 0
|
||||
milliseconds.</p>
|
||||
<h3 id="summary-and-notes">Summary and notes</h3>
|
||||
<h2 id="sources">Sources</h2>
|
||||
<ol type="1">
|
||||
<li><a href="https://github.com/mhx/dwarfs"
|
||||
class="uri">https://github.com/mhx/dwarfs</a></li>
|
||||
<li><a href="https://www.kernel.org/"
|
||||
class="uri">https://www.kernel.org/</a></li>
|
||||
<li><a
|
||||
href="https://git.askiiart.net/askiiart/disk-read-benchmark"
|
||||
class="uri">https://git.askiiart.net/askiiart/disk-read-benchmark</a></li>
|
||||
<li><a
|
||||
href="https://git.askiiart.net/confused_ace_noises/maths-demos/src/branch/headless-deterministic"
|
||||
class="uri">https://git.askiiart.net/confused_ace_noises/maths-demos/src/branch/headless-deterministic</a></li>
|
||||
</ol>
|
||||
<h2 id="misc-notes">Misc notes</h2>
|
||||
<p>DwarFS can take up a fair amount of memory if mounting it
|
||||
many times (<em>Issue #219 · mhx/dwarfs</em>), and this should
|
||||
be kept in mind for use in blendOS.</p>
|
||||
<hr />
|
||||
<p>Ratarmount (<em>mxmlnkn/ratarmount</em>) should also be
|
||||
investigated; it's similar to fuse-archive, but with some
|
||||
improvements, and some important notes. From its README
|
||||
file:</p>
|
||||
<blockquote>
|
||||
<p>Note that fuse-archive daemonizes instantly but the mount
|
||||
point will not be usable for a long time and everything trying
|
||||
to use it will hang until then when not using
|
||||
--asyncprogress</p>
|
||||
</blockquote>
|
||||
<blockquote>
|
||||
<p>Mounting bzip2 and xz archives has actually become faster
|
||||
than archivemount and fuse-archive with ratarmount -P 0 on most
|
||||
modern processors because it actually uses more than one core
|
||||
for decoding those compressions. indexed_bzip2 supports block
|
||||
parallel decoding since version 1.2.0.</p>
|
||||
</blockquote>
|
||||
<p>Despite being written in Python, Ratarmount seems to have
|
||||
significant performance improvements over fuse-archive.</p>
|
||||
<hr />
|
||||
<p>This should also be tested on systems with different specs,
|
||||
like my Chromebook and laptop, and I should try getting the btrfs
|
||||
FUSE driver working and benchmarking that.</p>
|
||||
<h2 id="summary">Summary</h2>
|
||||
<p>DwarFS, or just the normal filesystem plus overlayfs, seem
|
||||
like they may be the best options - DwarFS's compression and
|
||||
deduplication are great, and the deduplication could probably be
|
||||
used in ways I haven't even thought of yet, but it has some niche
|
||||
issues. Overall, I'm leaning towards using DwarFS as an option,
|
||||
with just overlayfs as the default, but further testing is
|
||||
needed.</p>
|
||||
<h2 id="footnotes">Footnotes</h2>
|
||||
<h2 id="sources">Sources</h2>
|
||||
<p> - “Confused_ace_noises/Maths-Demos - Branch:
|
||||
Headless-Deterministic.” Forgea: Git with a Cup of Jea,
|
||||
git.askiiart.net/confused_ace_noises/maths-demos/src/branch/headless-deterministic.<br />
|
||||
- “Disk Read Benchmark - A Simple and Performant Read-Only Disk
|
||||
Benchmark, Written in Rust.” Forgea: Git with a Cup of Jea,
|
||||
git.askiiart.net/askiiart/disk-read-benchmark.<br />
|
||||
- Google. “Google/Fuse-Archive: Fuse File System for Archives
|
||||
and Compressed Files (ZIP, RAR, 7z, ISO, TGZ, Xz...).” GitHub,
|
||||
github.com/google/fuse-archive.<br />
|
||||
- The Linux Kernel Archives, Linux Kernel Organization, Inc.,
|
||||
www.kernel.org/.<br />
|
||||
- mhx. “Feature Request: Improve Block Management for
|
||||
Uncompressed Blocks to Save Memory and Enhance Deduplication ·
|
||||
Issue #139 · mhx/dwarfs.” GitHub,
|
||||
github.com/mhx/dwarfs/issues/139.<br />
|
||||
- mhx. “mhx/Dwarfs: A Fast High Compression Read-Only File
|
||||
System for Linux, Windows and macOS.” GitHub,
|
||||
github.com/mhx/dwarfs.<br />
|
||||
- mhx. “[Feature Request] Mounting Multiple Archives to the
|
||||
Same Path · Issue #219 · mhx/dwarfs.” GitHub,
|
||||
github.com/mhx/dwarfs/issues/219.<br />
|
||||
- mxmlnkn. “mxmlnkn/ratarmount: Access Large Archives as a
|
||||
Filesystem Efficiently, e.g., Tar, Rar, Zip, Gz, BZ2, XZ, ZSTD
|
||||
Archives.” GitHub, github.com/mxmlnkn/ratarmount.</p>
|
||||
<!-- JavaScript for graphs goes hereeeeeee -->
|
||||
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
|
||||
<script src="/assets/benchmarking-dwarfs/js/declare_vars.js"></script>
|
||||
|
@ -156,7 +241,8 @@
|
|||
href="#fnref1" class="footnote-back"
|
||||
role="doc-backlink">↩︎</a></p></li>
|
||||
<li id="fn2">This data is from a modified version of an
|
||||
abandoned math demonstration program [4] made by a friend; it
|
||||
abandoned math demonstration program
|
||||
(<em>confused_ace_noises/maths-demos</em>) made by a friend; it
|
||||
generates regular polygons and writes their data to a file. I
|
||||
chose this because it was an artificial and reproducible yet
|
||||
fairly compressible dataset (without being extremely
|
||||
|
|