finish up benchmarking-dwarfs
parent
9afa1fb075
commit
4d4ad996ba
4 changed files with 207 additions and 83 deletions
|
@ -10,12 +10,13 @@
|
||||||
<h1 id="benchmarking-and-comparing-dwarfs">Benchmarking and
|
<h1 id="benchmarking-and-comparing-dwarfs">Benchmarking and
|
||||||
comparing DwarFS</h1>
|
comparing DwarFS</h1>
|
||||||
<p>DwarFS is a filesystem developed by the user mhx on GitHub
|
<p>DwarFS is a filesystem developed by the user mhx on GitHub
|
||||||
[1], which is self-described as "A fast high compression
|
(<em>mhx/dwarfs</em>), which is self-described as "A fast high
|
||||||
read-only file system for Linux, Windows, and macOS." One of my
|
compression read-only file system for Linux, Windows, and
|
||||||
ideas for blendOS was to layer different packages, and combined
|
macOS." One of my ideas for blendOS was to layer different
|
||||||
with its compression and option to be mounted as a FUSE-based
|
packages, and combined with its compression and option to be
|
||||||
filesystem, it's an appealing option for this use case - blendOS
|
mounted as a FUSE-based filesystem, it's an appealing option for
|
||||||
is immutable, so it might as well have some compression.</p>
|
this use case - blendOS is immutable, so it might as well have
|
||||||
|
some compression.</p>
|
||||||
<h2 id="methodology">Methodology</h2>
|
<h2 id="methodology">Methodology</h2>
|
||||||
<p>The datasets being used for this test will be the
|
<p>The datasets being used for this test will be the
|
||||||
following:</p>
|
following:</p>
|
||||||
|
@ -29,7 +30,7 @@
|
||||||
role="doc-noteref"><sup>2</sup></a></li>
|
role="doc-noteref"><sup>2</sup></a></li>
|
||||||
<li>The current Linux longterm release source (<a
|
<li>The current Linux longterm release source (<a
|
||||||
href="https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.6.58.tar.xz">6.6.58</a>
|
href="https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.6.58.tar.xz">6.6.58</a>
|
||||||
[2]); ~1.5 GB</li>
|
(<em>The Linux Kernel Archives</em>)); ~1.5 GB</li>
|
||||||
<li>For some rough latency testing:
|
<li>For some rough latency testing:
|
||||||
<ul>
|
<ul>
|
||||||
<li>1024 4 KiB files filled with null data (again, just
|
<li>1024 4 KiB files filled with null data (again, just
|
||||||
|
@ -42,7 +43,8 @@
|
||||||
compressible files with null data, decently compressible files,
|
compressible files with null data, decently compressible files,
|
||||||
and random data which can't be compressed well.</p>
|
and random data which can't be compressed well.</p>
|
||||||
<h3 id="what-filesystems">What filesystems?</h3>
|
<h3 id="what-filesystems">What filesystems?</h3>
|
||||||
<p>I'll be benchmarking DwarFS, fuse-archive (with tar files),
|
<p>I'll be benchmarking DwarFS (<em>mhx/dwarfs</em>),
|
||||||
|
fuse-archive (<em>Google/Fuse-Archive</em>) (with tar files),
|
||||||
and btrfs. In some early, basic testing, I found that mounting
|
and btrfs. In some early, basic testing, I found that mounting
|
||||||
any <em>compressed</em> archives with <code>fuse-archive</code>,
|
any <em>compressed</em> archives with <code>fuse-archive</code>,
|
||||||
a tool for mounting archive file formats as read-only
|
a tool for mounting archive file formats as read-only
|
||||||
|
@ -58,12 +60,19 @@
|
||||||
single-threaded. So all the options with no chance of being used
|
single-threaded. So all the options with no chance of being used
|
||||||
have been marked off, and I'll only be looking into these
|
have been marked off, and I'll only be looking into these
|
||||||
three.</p>
|
three.</p>
|
||||||
<p>DwarFS also took far too long to create on its default
|
<p>DwarFS also took far too long to create an archive on its
|
||||||
setting, but on compression level 1, it's much faster -
|
default setting, but on compression level 1, it's much faster -
|
||||||
11m2.738s for the ~80 GiB total, and considering</p>
|
11m2.738s for the ~80 GiB total, and considering my entire
|
||||||
|
system is about 20 GiB, that should be about 2-3 minutes, which
|
||||||
|
is reasonable. With no compression, tar took 3m3.378s. Mounting
|
||||||
|
the DwarFS archive was nearly instant (0.022s), while mounting
|
||||||
|
the tar archive took 1.352s - not bad, but not ideal when
|
||||||
|
mounting many archives, and this will absolutely be taken into
|
||||||
|
consideration.</p>
|
||||||
<h2 id="running-the-benchmark">Running the benchmark</h2>
|
<h2 id="running-the-benchmark">Running the benchmark</h2>
|
||||||
<p>First installed it by cloning the repository, installing it
|
<p>First off, I installed my benchmark (<em>Disk Read
|
||||||
using Cargo, then added its completions to fish (just for this
|
Benchmark</em>) by cloning the repository, installing it using
|
||||||
|
Cargo, and then adding its completions to fish (just for this
|
||||||
session):</p>
|
session):</p>
|
||||||
<div class="sourceCode" id="cb2"><pre
|
<div class="sourceCode" id="cb2"><pre
|
||||||
class="language-sh"><code class="language-bash"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="fu">git</span> clone https://git.askiiart.net/askiiart/disk-read-benchmark</span>
|
class="language-sh"><code class="language-bash"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="fu">git</span> clone https://git.askiiart.net/askiiart/disk-read-benchmark</span>
|
||||||
|
@ -93,50 +102,126 @@
|
||||||
script</a> to make it a bit easier, I put the resulting graphs
|
script</a> to make it a bit easier, I put the resulting graphs
|
||||||
in here ↓</p>
|
in here ↓</p>
|
||||||
<h3 id="sequential-read">Sequential read</h3>
|
<h3 id="sequential-read">Sequential read</h3>
|
||||||
|
<p>These results interest me quite a bit; unsurprisingly, DwarFS
|
||||||
|
has an advantage on the null file, due to its compression,
|
||||||
|
though it's disappointing the difference in time wasn't greater.
|
||||||
|
However, it does far worse on the random file, and I'm not sure
|
||||||
|
why; as discussed further down, DwarFS doesn't try to compress
|
||||||
|
incompressible files as far as I know, but I could be wrong. As
|
||||||
|
for the 100 million-sided polygon, it's somewhere in between,
|
||||||
|
with an advantage due to its compression, but still taking
|
||||||
|
longer than expected.</p>
|
||||||
|
<p>As for fuse-archive, it handles the null file well, but takes
|
||||||
|
longer on the others; not much to say.</p>
|
||||||
<div>
|
<div>
|
||||||
<canvas id="seq_read_chart" class="chart">
|
<canvas id="seq_read_chart" class="chart">
|
||||||
</canvas>
|
</canvas>
|
||||||
</div>
|
</div>
|
||||||
<h3 id="random-read">Random read</h3>
|
<h3 id="random-read">Random read</h3>
|
||||||
|
<p>There's nothing much to say here; although DwarFS took
|
||||||
|
significantly longer, it's still pretty fast - a difference of
|
||||||
|
about 14 milliseconds worst case, across a 25 GiB file; similar
|
||||||
|
results for the 100 million-sided polygon, though to a lesser
|
||||||
|
extent, given it can be compressed better. With the null file,
|
||||||
|
due to its compression, DwarFS was actually on par with
|
||||||
|
fuse-archive, but it can't compete with btrfs's performance,
|
||||||
|
given it's so heavily optimized, and in the kernel.</p>
|
||||||
<div>
|
<div>
|
||||||
<canvas id="rand_read_chart" class="chart">
|
<canvas id="rand_read_chart" class="chart">
|
||||||
</canvas>
|
</canvas>
|
||||||
</div>
|
</div>
|
||||||
<h3 id="sequential-read-latency">Sequential read latency</h3>
|
<h3 id="sequential-read-latency">Sequential read latency</h3>
|
||||||
|
<p>As expected, DwarFS performs a bit worse on the
|
||||||
|
incompressible random data, but otherwise they're all roughly
|
||||||
|
equal. I wasn't expecting this, given btrfs is in the kernel,
|
||||||
|
while the other two are using FUSE.</p>
|
||||||
<div>
|
<div>
|
||||||
<canvas id="seq_read_latency_chart" class="chart">
|
<canvas id="seq_read_latency_chart" class="chart">
|
||||||
</canvas>
|
</canvas>
|
||||||
</div>
|
</div>
|
||||||
<h3 id="random-read-latency">Random read latency</h3>
|
<h3 id="random-read-latency">Random read latency</h3>
|
||||||
|
<p>Both DwarFS and fuse-archive had some trouble with this test.
|
||||||
|
DwarFS doesn't seem to handle random access very well; this is
|
||||||
|
supposedly fixed, as seen in issue 139 (<em>Issue #139 ·
|
||||||
|
mhx/dwarfs</em>), but the performance issues are obvious
|
||||||
|
regardless; I'm not sure why, given it doesn't compress
|
||||||
|
incompressible data, not to mention it does just fine on the
|
||||||
|
random read test, where the only difference is that it reads
|
||||||
|
<em>more</em> data. But regardless, DwarFS ended up performing
|
||||||
|
far worse than expected on both the incompressible random data,
|
||||||
|
and the highly compressible null data.</p>
|
||||||
|
<p>Meanwhile, when testing random read latency,
|
||||||
|
<code>fuse-archive</code> pretty much just dies, becoming
|
||||||
|
ridiculously slow (even compared to DwarFS), so I didn't include
|
||||||
|
its single-file results. It succeeds on the bulk files, but
|
||||||
|
since it just shows as 0 seconds anyway at this massive
|
||||||
|
scale, I opted to not include it in this graph at all.</p>
|
||||||
<div>
|
<div>
|
||||||
<canvas id="rand_read_latency_chart" class="chart">
|
<canvas id="rand_read_latency_chart" class="chart">
|
||||||
</canvas>
|
</canvas>
|
||||||
</div>
|
</div>
|
||||||
<p>The FUSE-based filesystems run into a bit of trouble here -
|
<h2 id="misc-notes">Misc notes</h2>
|
||||||
with incompressible data, DwarFS has a hard time keeping up for
|
<p>DwarFS can take up a fair amount of memory if mounting it
|
||||||
some reason, despite keeping up just fine with larger random
|
many times (<em>Issue #219 · mhx/dwarfs</em>), and this should
|
||||||
reads on the same data, and so it takes 3 to 4 seconds to run
|
be kept in mind for use in blendOS.</p>
|
||||||
random read latency testing on the 25 GiB random file.
|
<hr />
|
||||||
Meanwhile, when testing random read latency in
|
<p>Ratarmount (<em>mxmlnkn/ratarmount</em>) should also be
|
||||||
<code>fuse-archive</code> pretty much just dies, becoming
|
investigated; it's similar to fuse-archive, but with some
|
||||||
ridiculously slow (even compared to DwarFS), so I didn't test
|
improvements, and some important notes. From its README
|
||||||
its random read latency at all and just had its results put as 0
|
file:</p>
|
||||||
milliseconds.</p>
|
<blockquote>
|
||||||
<h3 id="summary-and-notes">Summary and notes</h3>
|
<p>Note that fuse-archive daemonizes instantly but the mount
|
||||||
<h2 id="sources">Sources</h2>
|
point will not be usable for a long time and everything trying
|
||||||
<ol type="1">
|
to use it will hang until then when not using
|
||||||
<li><a href="https://github.com/mhx/dwarfs"
|
--asyncprogress</p>
|
||||||
class="uri">https://github.com/mhx/dwarfs</a></li>
|
</blockquote>
|
||||||
<li><a href="https://www.kernel.org/"
|
<blockquote>
|
||||||
class="uri">https://www.kernel.org/</a></li>
|
<p>Mounting bzip2 and xz archives has actually become faster
|
||||||
<li><a
|
than archivemount and fuse-archive with ratarmount -P 0 on most
|
||||||
href="https://git.askiiart.net/askiiart/disk-read-benchmark"
|
modern processors because it actually uses more than one core
|
||||||
class="uri">https://git.askiiart.net/askiiart/disk-read-benchmark</a></li>
|
for decoding those compressions. indexed_bzip2 supports block
|
||||||
<li><a
|
parallel decoding since version 1.2.0.</p>
|
||||||
href="https://git.askiiart.net/confused_ace_noises/maths-demos/src/branch/headless-deterministic"
|
</blockquote>
|
||||||
class="uri">https://git.askiiart.net/confused_ace_noises/maths-demos/src/branch/headless-deterministic</a></li>
|
<p>Despite being written in Python, Ratarmount seems to have
|
||||||
</ol>
|
significant performance improvements over fuse-archive.</p>
|
||||||
|
<hr />
|
||||||
|
<p>This should also be tested on systems with different specs,
|
||||||
|
like my Chromebook and laptop, and I should try getting the btrfs
|
||||||
|
FUSE driver working and benchmarking that.</p>
|
||||||
|
<h2 id="summary">Summary</h2>
|
||||||
|
<p>DwarFS, or just the normal filesystem plus overlayfs, seem
|
||||||
|
like they may be the best options - DwarFS's compression and
|
||||||
|
deduplication are great, and the deduplication could probably be
|
||||||
|
used in ways I haven't even thought of yet, but it has some niche
|
||||||
|
issues. Overall, I'm leaning towards using DwarFS as an option,
|
||||||
|
with just overlayfs as the default, but further testing is
|
||||||
|
needed.</p>
|
||||||
<h2 id="footnotes">Footnotes</h2>
|
<h2 id="footnotes">Footnotes</h2>
|
||||||
|
<h2 id="sources">Sources</h2>
|
||||||
|
<p> - “Confused_ace_noises/Maths-Demos - Branch:
|
||||||
|
Headless-Deterministic.” Forgea: Git with a Cup of Jea,
|
||||||
|
git.askiiart.net/confused_ace_noises/maths-demos/src/branch/headless-deterministic.<br />
|
||||||
|
- “Disk Read Benchmark - A Simple and Performant Read-Only Disk
|
||||||
|
Benchmark, Written in Rust.” Forgea: Git with a Cup of Jea,
|
||||||
|
git.askiiart.net/askiiart/disk-read-benchmark.<br />
|
||||||
|
- Google. “Google/Fuse-Archive: Fuse File System for Archives
|
||||||
|
and Compressed Files (ZIP, RAR, 7z, ISO, TGZ, Xz...).” GitHub,
|
||||||
|
github.com/google/fuse-archive.<br />
|
||||||
|
- The Linux Kernel Archives, Linux Kernel Organization, Inc.,
|
||||||
|
www.kernel.org.<br />
|
||||||
|
- Mhx. “Feature Request: Improve Block Management for
|
||||||
|
Uncompressed Blocks to Save Memory and Enhance Deduplication ·
|
||||||
|
ISSUE #139 · MHX/Dwarfs.” GitHub,
|
||||||
|
github.com/mhx/dwarfs/issues/139.<br />
|
||||||
|
- mhx. “mhx/Dwarfs: A Fast High Compression Read-Only File
|
||||||
|
System for Linux, Windows and Macos.” GitHub,
|
||||||
|
github.com/mhx/dwarfs.<br />
|
||||||
|
- mhx. “[Feature Request] Mounting Multiple Archives to the
|
||||||
|
Same Path · Issue #219 · MHX/Dwarfs.” GitHub,
|
||||||
|
github.com/mhx/dwarfs/issues/219.<br />
|
||||||
|
- mxmlnkn. “mxmlnkn/ratarmount: Access Large Archives as a
|
||||||
|
Filesystem Efficiently, e.g., Tar, Rar, Zip, Gz, BZ2, XZ, ZSTD
|
||||||
|
Archives.” GitHub, github.com/mxmlnkn/ratarmount.</p>
|
||||||
<!-- JavaScript for graphs goes hereeeeeee -->
|
<!-- JavaScript for graphs goes hereeeeeee -->
|
||||||
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
|
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
|
||||||
<script src="/assets/benchmarking-dwarfs/js/declare_vars.js"></script>
|
<script src="/assets/benchmarking-dwarfs/js/declare_vars.js"></script>
|
||||||
|
@ -156,7 +241,8 @@
|
||||||
href="#fnref1" class="footnote-back"
|
href="#fnref1" class="footnote-back"
|
||||||
role="doc-backlink">↩︎</a></p></li>
|
role="doc-backlink">↩︎</a></p></li>
|
||||||
<li id="fn2">This data is from a modified version of an
|
<li id="fn2">This data is from a modified version of an
|
||||||
abandoned math demonstration program [4] made by a friend; it
|
abandoned math demonstration program
|
||||||
|
(<em>confused_ace_noises/maths-demos</em>) made by a friend; it
|
||||||
generates regular polygons and writes their data to a file. I
|
generates regular polygons and writes their data to a file. I
|
||||||
chose this because it was an artificial and reproducible yet
|
chose this because it was an artificial and reproducible yet
|
||||||
fairly compressible dataset (without being extremely
|
fairly compressible dataset (without being extremely
|
||||||
|
|
|
@ -1,6 +1,6 @@
|
||||||
# Benchmarking and comparing DwarFS
|
# Benchmarking and comparing DwarFS
|
||||||
|
|
||||||
DwarFS is a filesystem developed by the user mhx on GitHub [1], which is self-described as "A fast high compression read-only file system for Linux, Windows, and macOS." One of my ideas for blendOS was to layer different packages, and combined with its compression and option to be mounted as a FUSE-based filesystem, it's an appealing option for this use case - blendOS is immutable, so it might as well have some compression.
|
DwarFS is a filesystem developed by the user mhx on GitHub (*mhx/dwarfs*), which is self-described as "A fast high compression read-only file system for Linux, Windows, and macOS." One of my ideas for blendOS was to layer different packages, and combined with its compression and option to be mounted as a FUSE-based filesystem, it's an appealing option for this use case - blendOS is immutable, so it might as well have some compression.
|
||||||
|
|
||||||
## Methodology
|
## Methodology
|
||||||
|
|
||||||
|
@ -9,7 +9,7 @@ The datasets being used for this test will be the following:
|
||||||
- 25 GiB of null data (just `00000000` in binary)
|
- 25 GiB of null data (just `00000000` in binary)
|
||||||
- 25 GiB of random data[^1]
|
- 25 GiB of random data[^1]
|
||||||
- Data for a 100 million-sided regular polygon; ~26.5 GiB[^2]
|
- Data for a 100 million-sided regular polygon; ~26.5 GiB[^2]
|
||||||
- The current Linux longterm release source ([6.6.58](https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.6.58.tar.xz) [2]); ~1.5 GB
|
- The current Linux longterm release source ([6.6.58](https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.6.58.tar.xz) (*The Linux Kernel Archives*)); ~1.5 GB
|
||||||
- For some rough latency testing:
|
- For some rough latency testing:
|
||||||
- 1024 4 KiB files filled with null data (again, just `00000000` in binary)
|
- 1024 4 KiB files filled with null data (again, just `00000000` in binary)
|
||||||
- 1024 4 KiB files filled with random data
|
- 1024 4 KiB files filled with random data
|
||||||
|
@ -18,15 +18,15 @@ All this data should cover both latency and read speed testing for data that com
|
||||||
|
|
||||||
### What filesystems?
|
### What filesystems?
|
||||||
|
|
||||||
I'll be benchmarking DwarFS, fuse-archive (with tar files), and btrfs. In some early, basic testing, I found that mounting any *compressed* archives with `fuse-archive`, a tool for mounting archive file formats as read-only filesystems, took far too long. Additionally, being FUSE-based, these would have slightly worse performance than kernel filesystems, so I tried to use a FUSE driver as well for btrfs. Unforunately, I ran into a bug, so I won't be able to quite do an equivalent test; btrfs will only be running in the kernel.
|
I'll be benchmarking DwarFS (*mhx/dwarfs*), fuse-archive (*Google/Fuse-Archive*) (with tar files), and btrfs. In some early, basic testing, I found that mounting any *compressed* archives with `fuse-archive`, a tool for mounting archive file formats as read-only filesystems, took far too long. Additionally, being FUSE-based, these would have slightly worse performance than kernel filesystems, so I tried to use a FUSE driver as well for btrfs. Unfortunately, I ran into a bug, so I won't quite be able to do an equivalent test; btrfs will only be running in the kernel.
|
||||||
|
|
||||||
During said early testing, I also ran into the fact that most compressed archives, like Gzip-compressed tar archives, also took far too long to *create*, because Gzip is single-threaded. So all the options with no chance of being used have been marked off, and I'll only be looking into these three.
|
During said early testing, I also ran into the fact that most compressed archives, like Gzip-compressed tar archives, also took far too long to *create*, because Gzip is single-threaded. So all the options with no chance of being used have been marked off, and I'll only be looking into these three.
|
||||||
|
|
||||||
DwarFS also took far too long to create an archive on its default setting, but on compression level 1, it's much faster - 11m2.738s for the ~80 GiB total, and considering my entire system is about 20 GiB, that should be about 2-3 minutes, which is reasonable.
|
DwarFS also took far too long to create an archive on its default setting, but on compression level 1, it's much faster - 11m2.738s for the ~80 GiB total, and considering my entire system is about 20 GiB, that should be about 2-3 minutes, which is reasonable. With no compression, tar took 3m3.378s. Mounting the DwarFS archive was nearly instant (0.022s), while mounting the tar archive took 1.352s - not bad, but not ideal when mounting many archives, and this will absolutely be taken into consideration.
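
The 2-3 minute figure above can be reproduced with a quick calculation. This is only a rough sketch: it assumes archive creation time scales linearly with input size, which the benchmark did not verify, and the helper names `parse_time` and `estimate_seconds` are made up for illustration.

```python
# Rough scaling estimate for archive creation time.
# Assumption (not verified in the post): time grows linearly with input size.


def parse_time(text: str) -> float:
    """Convert a `time`-style duration such as '11m2.738s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def estimate_seconds(measured: str, measured_gib: float, target_gib: float) -> float:
    """Scale a measured duration from measured_gib of input to target_gib."""
    return parse_time(measured) / measured_gib * target_gib


if __name__ == "__main__":
    est = estimate_seconds("11m2.738s", 80, 20)
    print(f"{est:.0f} s (~{est / 60:.1f} min)")
```

With the numbers from the post (11m2.738s for ~80 GiB, ~20 GiB target), this comes out to roughly 166 seconds, which sits inside the 2-3 minute range quoted above.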
|
||||||
|
|
||||||
## Running the benchmark
|
## Running the benchmark
|
||||||
|
|
||||||
First installed it by cloning the repository, installing it using Cargo, then added its completions to fish (just for this session):
|
First off, I installed my benchmark (*Disk Read Benchmark*) by cloning the repository, installing it using Cargo, and then adding its completions to fish (just for this session):
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
git clone https://git.askiiart.net/askiiart/disk-read-benchmark
|
git clone https://git.askiiart.net/askiiart/disk-read-benchmark
|
||||||
|
@ -59,43 +59,66 @@ After processing [the data](/assets/benchmarking-dwarfs/data/) with [this script
|
||||||
|
|
||||||
### Sequential read
|
### Sequential read
|
||||||
|
|
||||||
|
These results interest me quite a bit; unsurprisingly, DwarFS has an advantage on the null file, due to its compression, though it's disappointing the difference in time wasn't greater. However, it does far worse on the random file, and I'm not sure why; as discussed further down, DwarFS doesn't try to compress incompressible files as far as I know, but I could be wrong. As for the 100 million-sided polygon, it's somewhere in between, with an advantage due to its compression, but still taking longer than expected.
|
||||||
|
|
||||||
|
As for fuse-archive, it handles the null file well, but takes longer on the others; not much to say.
|
||||||
|
|
||||||
<div>
|
<div>
|
||||||
<canvas id="seq_read_chart" class="chart"></canvas>
|
<canvas id="seq_read_chart" class="chart"></canvas>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
### Random read
|
### Random read
|
||||||
|
|
||||||
|
There's nothing much to say here; although DwarFS took significantly longer, it's still pretty fast - a difference of about 14 milliseconds worst case, across a 25 GiB file; similar results for the 100 million-sided polygon, though to a lesser extent, given it can be compressed better. With the null file, due to its compression, DwarFS was actually on par with fuse-archive, but it can't compete with btrfs's performance, given it's so heavily optimized, and in the kernel.
|
||||||
|
|
||||||
<div>
|
<div>
|
||||||
<canvas id="rand_read_chart" class="chart"></canvas>
|
<canvas id="rand_read_chart" class="chart"></canvas>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
### Sequential read latency
|
### Sequential read latency
|
||||||
|
|
||||||
|
As expected, DwarFS performs a bit worse on the incompressible random data, but otherwise they're all roughly equal. I wasn't expecting this, given btrfs is in the kernel, while the other two are using FUSE.
|
||||||
|
|
||||||
<div>
|
<div>
|
||||||
<canvas id="seq_read_latency_chart" class="chart"></canvas>
|
<canvas id="seq_read_latency_chart" class="chart"></canvas>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
### Random read latency
|
### Random read latency
|
||||||
|
|
||||||
|
Both DwarFS and fuse-archive had some trouble with this test. DwarFS doesn't seem to handle random access very well; this is supposedly fixed, as seen in issue 139 (*Issue #139 · mhx/dwarfs*), but the performance issues are obvious regardless; I'm not sure why, given it doesn't compress incompressible data, not to mention it does just fine on the random read test, where the only difference is that it reads *more* data. But regardless, DwarFS ended up performing far worse than expected on both the incompressible random data, and the highly compressible null data.
|
||||||
|
|
||||||
|
Meanwhile, when testing random read latency, `fuse-archive` pretty much just dies, becoming ridiculously slow (even compared to DwarFS), so I didn't include its single-file results. It succeeds on the bulk files, but since it just shows as 0 seconds anyway at this massive scale, I opted to not include it in this graph at all.
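
For readers who want a feel for what a random read latency test measures, here is a minimal Python sketch. It is not the Rust benchmark used for these results; the function name, the default of 1024 reads, and the 1-byte read size are assumptions chosen for illustration, and it makes no attempt to drop the page cache between runs.

```python
import os
import random
import time


def random_read_latency(path: str, reads: int = 1024, size: int = 1, seed: int = 0) -> float:
    """Return the mean time in seconds of `reads` small reads at random offsets.

    Hypothetical helper: the parameters are illustrative, not the settings
    used by the benchmark in this post.
    """
    file_size = os.path.getsize(path)
    if file_size < size:
        raise ValueError("file is smaller than the requested read size")
    rng = random.Random(seed)  # fixed seed so runs are comparable
    # buffering=0 avoids Python-level read-ahead hiding the cost of each seek
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        for _ in range(reads):
            f.seek(rng.randrange(file_size - size + 1))
            f.read(size)
        elapsed = time.perf_counter() - start
    return elapsed / reads
```

Pointing this at the same file on each mounted filesystem gives a rough per-read figure; on a compressed FUSE filesystem, each tiny read may force a whole block to be decompressed, which is one plausible reason small random reads look so much worse than large ones.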
|
||||||
|
|
||||||
<div>
|
<div>
|
||||||
<canvas id="rand_read_latency_chart" class="chart"></canvas>
|
<canvas id="rand_read_latency_chart" class="chart"></canvas>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
The FUSE-based filesystems run into a bit of trouble here - with incompressible data, DwarFS has a hard time keeping up for some reason, despite keeping up just fine with larger random reads on the same data, and so it takes 3 to 4 seconds to run random read latency testing on the 25 GiB random file. Meanwhile, when testing random read latency in `fuse-archive` pretty much just dies, becoming ridiculously slow (even compared to DwarFS), so I didn't test its random read latency at all and just had its results put as 0 milliseconds.
|
## Misc notes
|
||||||
|
|
||||||
### Summary and notes
|
DwarFS can take up a fair amount of memory if mounting it many times (*Issue #219 · mhx/dwarfs*), and this should be kept in mind for use in blendOS.
|
||||||
|
|
||||||
## Sources
|
---
|
||||||
|
|
||||||
1. <https://github.com/mhx/dwarfs>
|
Ratarmount (*mxmlnkn/ratarmount*) should also be investigated; it's similar to fuse-archive, but with some improvements, and some important notes. From its README file:
|
||||||
2. <https://www.kernel.org/>
|
|
||||||
3. <https://git.askiiart.net/askiiart/disk-read-benchmark>
|
> Note that fuse-archive daemonizes instantly but the mount point will not be usable for a long time and everything trying to use it will hang until then when not using --asyncprogress
|
||||||
4. <https://git.askiiart.net/confused_ace_noises/maths-demos/src/branch/headless-deterministic>
|
|
||||||
|
> Mounting bzip2 and xz archives has actually become faster than archivemount and fuse-archive with ratarmount -P 0 on most modern processors because it actually uses more than one core for decoding those compressions. indexed_bzip2 supports block parallel decoding since version 1.2.0.
|
||||||
|
|
||||||
|
Despite being written in Python, Ratarmount seems to have significant performance improvements over fuse-archive.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
This should also be tested on systems with different specs, like my Chromebook and laptop, and I should try getting the btrfs FUSE driver working and benchmarking that.
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
DwarFS, or just the normal filesystem plus overlayfs, seem like they may be the best options - DwarFS's compression and deduplication are great, and the deduplication could probably be used in ways I haven't even thought of yet, but it has some niche issues. Overall, I'm leaning towards using DwarFS as an option, with just overlayfs as the default, but further testing is needed.
|
||||||
|
|
||||||
## Footnotes
|
## Footnotes
|
||||||
|
|
||||||
[^1]: My code can generate up to 25 GB/s. However, it does random writes to my drive, which is *much* slower. So on one hand, you could say my code is so amazingly fast that current day technologies simply can't keep up. Or you could say that I have no idea how to code for real world scenarios.
|
[^1]: My code can generate up to 25 GB/s. However, it does random writes to my drive, which is *much* slower. So on one hand, you could say my code is so amazingly fast that current day technologies simply can't keep up. Or you could say that I have no idea how to code for real world scenarios.
|
||||||
[^2]: This data is from a modified version of an abandoned math demonstration program [4] made by a friend; it generates regular polygons and writes their data to a file. I chose this because it was an artificial and reproducible yet fairly compressible dataset (without being extremely compressible like null data).
|
[^2]: This data is from a modified version of an abandoned math demonstration program (*confused_ace_noises/maths-demos*) made by a friend; it generates regular polygons and writes their data to a file. I chose this because it was an artificial and reproducible yet fairly compressible dataset (without being extremely compressible like null data).
|
||||||
<details open>
|
<details open>
|
||||||
<summary>3-sided regular polygon data</summary>
|
<summary>3-sided regular polygon data</summary>
|
||||||
<br>
|
<br>
|
||||||
|
@ -108,6 +131,17 @@ The FUSE-based filesystems run into a bit of trouble here - with incompressible
|
||||||
</div>
|
</div>
|
||||||
</details>
|
</details>
|
||||||
|
|
||||||
|
## Sources
|
||||||
|
|
||||||
|
 - “Confused_ace_noises/Maths-Demos - Branch: Headless-Deterministic.” Forgea: Git with a Cup of Jea, git.askiiart.net/confused_ace_noises/maths-demos/src/branch/headless-deterministic.\
|
||||||
|
 - “Disk Read Benchmark - A Simple and Performant Read-Only Disk Benchmark, Written in Rust.” Forgea: Git with a Cup of Jea, git.askiiart.net/askiiart/disk-read-benchmark.\
|
||||||
|
 - Google. “Google/Fuse-Archive: Fuse File System for Archives and Compressed Files (ZIP, RAR, 7z, ISO, TGZ, Xz...).” GitHub, github.com/google/fuse-archive.\
|
||||||
|
 - The Linux Kernel Archives, Linux Kernel Organization, Inc., www.kernel.org.\
|
||||||
|
 - Mhx. “Feature Request: Improve Block Management for Uncompressed Blocks to Save Memory and Enhance Deduplication · ISSUE #139 · MHX/Dwarfs.” GitHub, github.com/mhx/dwarfs/issues/139.\
|
||||||
|
 - mhx. “mhx/Dwarfs: A Fast High Compression Read-Only File System for Linux, Windows and Macos.” GitHub, github.com/mhx/dwarfs.\
|
||||||
|
 - mhx. “[Feature Request] Mounting Multiple Archives to the Same Path · Issue #219 · MHX/Dwarfs.” GitHub, github.com/mhx/dwarfs/issues/219.\
|
||||||
|
 - mxmlnkn. “mxmlnkn/ratarmount: Access Large Archives as a Filesystem Efficiently, e.g., Tar, Rar, Zip, Gz, BZ2, XZ, ZSTD Archives.” GitHub, github.com/mxmlnkn/ratarmount.
|
||||||
|
|
||||||
<!-- JavaScript for graphs goes hereeeeeee -->
|
<!-- JavaScript for graphs goes hereeeeeee -->
|
||||||
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
|
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
|
||||||
<script src="/assets/benchmarking-dwarfs/js/declare_vars.js"></script>
|
<script src="/assets/benchmarking-dwarfs/js/declare_vars.js"></script>
|
||||||
|
|
22
feed.xml
|
@ -5,18 +5,10 @@
|
||||||
<title>eng.askiiart.net</title>
|
<title>eng.askiiart.net</title>
|
||||||
<description>This is the feed for engl.askiiart.net, I guess</description>
|
<description>This is the feed for engl.askiiart.net, I guess</description>
|
||||||
<link>https://askiiart.net</link>
|
<link>https://askiiart.net</link>
|
||||||
<lastBuildDate>Sun, 17 Nov 2024 07:02:04 +0000</lastBuildDate>
|
<lastBuildDate>Tue, 19 Nov 2024 18:55:27 +0000</lastBuildDate>
|
||||||
<item>
|
<item>
|
||||||
<title></title>
|
<title>Benchmarking and comparing DwarFS</title>
|
||||||
<link>https://engl.askiiart.net/blog/minimum.html</link>
|
<link>https://engl.askiiart.net/blog/benchmarking-dwarfs.html</link>
|
||||||
</item>
|
|
||||||
<item>
|
|
||||||
<title>Using `clap`</title>
|
|
||||||
<link>https://engl.askiiart.net/blog/using-clap.html</link>
|
|
||||||
</item>
|
|
||||||
<item>
|
|
||||||
<title>Checking out blendOS</title>
|
|
||||||
<link>https://engl.askiiart.net/blog/blendos.html</link>
|
|
||||||
</item>
|
</item>
|
||||||
<item>
|
<item>
|
||||||
<title>Building blendOS (and its packages)</title>
|
<title>Building blendOS (and its packages)</title>
|
||||||
|
@ -26,6 +18,14 @@
|
||||||
<title>OCI Images as a "Filesystem": Vanilla OS</title>
|
<title>OCI Images as a "Filesystem": Vanilla OS</title>
|
||||||
<link>https://engl.askiiart.net/blog/vanilla-os.html</link>
|
<link>https://engl.askiiart.net/blog/vanilla-os.html</link>
|
||||||
</item>
|
</item>
|
||||||
|
<item>
|
||||||
|
<title>Checking out blendOS</title>
|
||||||
|
<link>https://engl.askiiart.net/blog/blendos.html</link>
|
||||||
|
</item>
|
||||||
|
<item>
|
||||||
|
<title>Using `clap`</title>
|
||||||
|
<link>https://engl.askiiart.net/blog/using-clap.html</link>
|
||||||
|
</item>
|
||||||
<item>
|
<item>
|
||||||
<title>Glossary</title>
|
<title>Glossary</title>
|
||||||
<link>https://engl.askiiart.net/glossary.html</link>
|
<link>https://engl.askiiart.net/glossary.html</link>
|
||||||
|
|
44
style.css
|
@ -34,7 +34,8 @@ body {
|
||||||
}
|
}
|
||||||
|
|
||||||
li {
|
li {
|
||||||
font: 18px/1.35 'Atkinson Hyperlegible', sans-serif;}
|
font: 18px/1.35 'Atkinson Hyperlegible', sans-serif;
|
||||||
|
}
|
||||||
|
|
||||||
p,
|
p,
|
||||||
footer {
|
footer {
|
||||||
|
@ -81,22 +82,6 @@ code {
|
||||||
.force-word-wrap pre code {
|
.force-word-wrap pre code {
|
||||||
white-space: normal;
|
white-space: normal;
|
||||||
word-wrap: break-word;
|
word-wrap: break-word;
|
||||||
}
|
|
||||||
|
|
||||||
@media (max-device-width: 1200px) {
|
|
||||||
h1 {
|
|
||||||
line-height: 1.2;
|
|
||||||
font-size: 40px;
|
|
||||||
}
|
|
||||||
|
|
||||||
h2 {
|
|
||||||
line-height: 1.2;
|
|
||||||
font-size: 30px;
|
|
||||||
}
|
|
||||||
|
|
||||||
body {
|
|
||||||
font-size: 20px;
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
@media print {
|
@media print {
|
||||||
|
@ -120,7 +105,26 @@ img {
|
||||||
}
|
}
|
||||||
|
|
||||||
.chart {
|
.chart {
|
||||||
min-width: 600px;
|
max-width: 80vw;
|
||||||
max-width: 50vw;
|
max-height: 60vh;
|
||||||
max-height: 50vh;
|
}
|
||||||
|
|
||||||
|
@media (max-device-width: 1200px) {
|
||||||
|
h1 {
|
||||||
|
line-height: 1.2;
|
||||||
|
font-size: 40px;
|
||||||
|
}
|
||||||
|
|
||||||
|
h2 {
|
||||||
|
line-height: 1.2;
|
||||||
|
font-size: 30px;
|
||||||
|
}
|
||||||
|
|
||||||
|
body {
|
||||||
|
font-size: 20px;
|
||||||
|
}
|
||||||
|
|
||||||
|
.chart {
|
||||||
|
max-width: 95vw;
|
||||||
|
}
|
||||||
}
|
}
|