Re: [linux-audio-dev] [Fwd: [patch]2.4.0-test6 "spinlock" preemption patch]

Subject: Re: [linux-audio-dev] [Fwd: [patch]2.4.0-test6 "spinlock" preemption patch]
From: Benno Senoner (sbenno_AT_gardena.net)
Date: Thu Sep 07 2000 - 21:44:33 EEST


On Thu, 07 Sep 2000, Aki M Laukkanen wrote:
> On Thu, 7 Sep 2000, Benno Senoner wrote:
> > And yes your graph looks MUCH better than mine
> > (eg: http://www.linuxdj.com/hdrbench/graph-18-18.gif on a IBM16GB EIDE ext2
> > with 1024byte blocks)
>
> What I said that vm is very fragile, is not perhaps best seen in that graph.
> There are better and worse runs. For example on some repeat I can get
> behaviour which very much looks like what is in your graph. In that case
> interactive performance seems to go down the toilet too. Mouse pointer
> doesn't move etc.

Yes, this is a problem with Linux when doing heavy disk I/O,
and it is even worse if you don't enable DMA on EIDE drives.
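For reference, "hdparm -d1 /dev/hda" is the usual way to switch DMA on;
underneath it is just an ioctl. A minimal sketch (assuming a 2.2/2.4-era
kernel that exposes HDIO_SET_DMA, /dev/hda is only an example, and it
needs root):

  /* enable DMA on an (E)IDE drive, roughly what hdparm -d1 does */
  #include <stdio.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/hdreg.h>

  int main(int argc, char **argv)
  {
      const char *dev = (argc > 1) ? argv[1] : "/dev/hda";
      int fd = open(dev, O_RDONLY | O_NONBLOCK);
      if (fd < 0) { perror(dev); return 1; }
      if (ioctl(fd, HDIO_SET_DMA, 1UL) < 0)   /* 1 = DMA on, 0 = PIO */
          perror("HDIO_SET_DMA");
      close(fd);
      return 0;
  }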

>
> Hmm. I must admit that I didn't look very carefully what exactly your
> benchmark was measuring and how it relates to audio apps. I'm now
> enlightened. :) It should still prove very beneficial if the audio buffer
> sizes could be made smaller and interactive performance is always a
> priority. 62 tracks take 62MB afterall which is mlocked.

Yes, hdrbench's goal was to see how much disk streaming throughput
Linux can deliver under real-world conditions, plus to see how much buffering
we need in those cases.
Ideally the curves should be smooth without big peaks, but Linux produces
nice rollercoaster-like curves (the peaks happen when write buffers are
flushed; reading seems less problematic).
And I am not sure whether Windows can outperform us in terms of reliable
buffersize per track, although I have seen HDR software on Windows which
needs 1MB per track too.
But I think that is definitely too much, since 1MB of 32-bit samples is
more than 5 seconds of audio, which is a VERY long time for an OS.
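To make the numbers concrete, a little sketch; the mono / 32-bit / 44.1 kHz
figures are my assumptions, and the mlock() at the end (which is how such
per-track buffers get pinned, like hdrbench's 62MB) needs root:

  /* seconds of audio held by a 1 MB per-track buffer, plus pinning it */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/mman.h>

  int main(void)
  {
      const size_t buf_bytes   = 1024 * 1024;  /* 1 MB per track  */
      const size_t sample_size = 4;            /* 32-bit samples  */
      const double rate        = 44100.0;      /* mono, 44.1 kHz  */

      printf("1 MB buffers %.1f s of audio per track\n",
             buf_bytes / (sample_size * rate));       /* ~5.9 s */

      void *buf = malloc(buf_bytes);
      if (buf == NULL)
          return 1;
      if (mlock(buf, buf_bytes) != 0)          /* pin it in RAM; needs privileges */
          perror("mlock");
      return 0;
  }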

>
> > Keep in mind that blocksizes of 1024 on the ext2 filesystem perform very poorly
> > compared to blocksizes of 4096 bytes.
>
> Btw. I thought mkfs.ext2 nowadays uses blocksize of 4096 by default.

Yes, but my box at home is an RH 6.1 upgraded from RH 5.2, and the old
installer created the partition with an ext2 blocksize of 1024.
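(If you want to check what a given partition uses, "tune2fs -l /dev/hda1"
shows the block size, or from a program something like the statfs() call
below; nothing ext2-specific about it, and /dev/hda1 is just an example:)

  /* print the filesystem block size for a path, e.g. 1024 vs 4096 on ext2 */
  #include <stdio.h>
  #include <sys/vfs.h>

  int main(int argc, char **argv)
  {
      struct statfs sb;
      const char *path = (argc > 1) ? argv[1] : ".";
      if (statfs(path, &sb) != 0) { perror("statfs"); return 1; }
      printf("%s: block size %ld bytes\n", path, (long)sb.f_bsize);
      return 0;
  }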

>
> > Some suggestions for your filesystem:
> > if you can, disable the caching , write-behind and read-ahead completely.
>
> Within a filesystem disabling all these would mean duplicating most of
> fs/buffer.c code. In my simple prototype, I've added a new inode flag
> and modified the generic code. I think sct has plans for the real
> implementation in 2.5 timeframe.

I think that, although it will not boost performance (number of simultaneous
tracks), it will allow smaller buffer sizes, which is a good thing, especially
on low-memory boxes.
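For what it's worth, an O_DIRECT-style open flag like IRIX has would be the
natural userspace interface for this; it is not what your inode-flag
prototype does, and it is hypothetical on Linux today, but the idea from
the application side looks roughly like this (buffer, offset and size all
have to be block-aligned, and "track00.raw" is a made-up file name):

  /* uncached 256 KB reads via an O_DIRECT-style flag -- a sketch only */
  #define _GNU_SOURCE
  #include <stdio.h>
  #include <stdlib.h>
  #include <fcntl.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
      const size_t chunk = 256 * 1024;              /* 256 KB per read */
      void *buf = NULL;
      ssize_t n;

      int fd = open(argc > 1 ? argv[1] : "track00.raw", O_RDONLY | O_DIRECT);
      if (fd < 0) { perror("open"); return 1; }

      if (posix_memalign(&buf, 4096, chunk) != 0)   /* block-aligned buffer */
          return 1;

      while ((n = read(fd, buf, chunk)) > 0)
          ;                                         /* hand n bytes to the mixer */

      close(fd);
      free(buf);
      return 0;
  }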

> > As for fragmentation, I believe that this is not THAT a big factor, because
> > when streaming multiple tracks we have to seek after each block (256KB) anyway,
> > thus as long the fragments are in the range of the blocksize (256KB) it does
> > not matter if the file is continuous or fragmented.
>
> Hmm. I'm not completely following you. Why would you have to seek after each
> 256kB write? These writes are anyway broken in to fs blocksize pieces and
> given to the block device layer (ll_rw_block). Then elevator and the block
> layer code try to merge these requests into bigger ones which are issued
> on the disk. This can only be done if each request is held in the queue for
> some time to wait for incoming requests and the requests are physically
> adjacent. Hence, tunable variables elevator read and write latency.

OK, writes may be a special case because they can be queued up so that
adjacent blocks get written in one go, but for reading this will not
work so well IMHO.
(How long does the kernel wait before issuing a read?)

>
> Optimal case is of course that every incoming request can be merged until
> the request has aged and one huge request is given to the block device
> drivers. This why streamfs interleaves the tracks together. On ext2 fs there
> is no guarantee of the physical placement of the disk blocks. I think the
> block allocation strategies are optimized so that blocks within a file
> should be close to eachother but not with respect to other files. Disk seeks
> are not particularly cheap (possibly over 10ms) and thus should be minimized.

I agree that seeks should be minimized, but IMHO the best way to achieve
this is to let the userspace app read and write large blocks; at some point
increasing the read/write blocksize does not increase the throughput anymore.
I think this value lies around 256-512KB (and PBD will agree on this).
So whether the disk scheduling algorithm is smart or stupid does not matter
much in that case; HDR applications should not rely too heavily on smart
disk subsystems.
Plus, keep in mind that a good HDR app has to provide varispeed too
(e.g. one track playing slower or faster than the others), so all the
interleaving at the FS layer is worthless in that case (and perhaps it can
even lead to lower performance compared to a plain FS).
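To illustrate what I mean by letting the app do the big reads, here is a
rough sketch of one refill pass (the track file names and the 40-track
count are made up, and a real app would of course fill per-track
ringbuffers from a separate disk thread):

  /* one refill pass: a single big read per track instead of many small ones */
  #include <stdio.h>
  #include <fcntl.h>
  #include <unistd.h>

  #define NTRACKS 40
  #define CHUNK   (256 * 1024)

  static char buf[CHUNK];

  int main(void)
  {
      int  fd[NTRACKS];
      char name[64];
      int  t;

      for (t = 0; t < NTRACKS; t++) {
          snprintf(name, sizeof(name), "track%02d.raw", t);
          fd[t] = open(name, O_RDONLY);
      }

      /* the drive does (at most) one seek per track per pass this way */
      for (t = 0; t < NTRACKS; t++) {
          if (fd[t] < 0)
              continue;
          ssize_t n = read(fd[t], buf, CHUNK);
          (void)n;       /* copy n bytes into track t's ringbuffer here */
      }
      return 0;
  }
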
>
> I've not completely validated that it works currently as I wish it would but
> sard indicates large part of the requests are merged.

I am very sceptical that a large share of the requests get merged,
especially in the case of big read/write sizes.
(E.g. reading 256KB x 40 tracks = 10MB; that means that all the requests,
in order to optimize the read path, would need to be delayed by 1-2 seconds
before being issued, which I HARDLY believe happens.)

thoughts ?

Benno.

>
> --
> D.

