Re: [linux-audio-dev] more preallocation vs no prealloc / async vs sync tests.


Subject: Re: [linux-audio-dev] more preallocation vs no prealloc / async vs sync tests.
From: Benno Senoner (sbenno_AT_gardena.net)
Date: Fri Apr 21 2000 - 20:02:06 EEST


On Fri, 21 Apr 2000, Paul Barton-Davis wrote:
> >PS: Paul run it on your 10k rpm SCSI disk so that we can do some comparison.
>
> I hope you are ready for some *very* different numbers.
>
> /tmp/hdtest 500 async trunc
> SINGLE THREADED: 12.788 MByte/sec
> MULTI-THREADED: 12.788 MByte/sec
>
> /tmp/hdtest 500 async notrunc
> SINGLE THREADED: 6.096 MByte/sec
> MULTI-THREADED: 6.168 MByte/sec
>
> /tmp/hdtest 150 sync trunc
> SINGLE THREADED: 11.292 MByte/sec
> MULTI-THREADED: 12.233 MByte/sec
>
> /tmp/hdtest 150 sync notrunc
> SINGLE THREADED: 5.437 MByte/sec
> MULTI-THREADED: 6.383 MByte/sec
>
> A few notes.
>
> In the source you sent, you are not doing 256kB writes, but 1MB
> writes, since you defined MYSIZE as (262144*4). This is puzzling.
> However, changing it to 256kB doesn't change the results in any
> significant way, as far as I can tell.

Yes, you're right: I forgot to remove that *4 before sending the mail.
I actually experimented with writing 1-2 MB at a time, rather than the default
256 KB, but the performance is basically the same; you gain a few percent at
most, nothing significant.
Therefore I think 256 KB is the best tradeoff between buffer size and speed.

>
> It troubles me that the ongoing rate display is always significantly
> higher than the eventual effective speed. I understand the reason for
> the initially very high rate, but I typically see final rates from the
> ongoing display that are very much higher than in your effective rate
> display (e.g. 13MB/sec versus 5.5MB/sec, 20MB/sec versus 12MB/sec).
> I don't have the time to stare at the source and figure out why this
> is.

I think the effective speed doesn't reflect the real speed (it's too low),
since I call sync() at the end of the writes and then take the elapsed time
of the whole process.
I added this to avoid the write() loop finishing with much data still in the
buffer cache instead of on the disk.

I think the best way to get reasonable numbers is:
- use a test size that is at least 2-3 times your RAM, so the cache doesn't
distort the results.
- use the last number of the ongoing rate display as the "effective" average
data transfer rate.

Anyway, I ran the test on the RAID box again, this time with 256 KB writes,
and got the same performance as before (24-26 MB/sec), so, as you pointed
out, 256 KB is close to the ideal I/O size.

It's amazing that you got 12 MB/sec in SYNC mode. I wonder whether it's your
SCSI disk and/or the 2.3.x kernel; I'm guessing a combination of both, since,
as Stephen said, the SCSI driver performs much better when issuing lots of
requests.

I wasn't prepared for such fast O_SYNC results; can you please rerun the test
using 500 MB as the test size (and take the last MB/sec value of the ongoing
rate display as the final result)?
If you really get the same performance as in async mode, then it probably
makes more sense to use O_SYNC in ardour on SCSI boxes, since you get more
predictable buffer cache usage.
On EIDE, unfortunately, we have to forget about O_SYNC.

>
> Its very interesting that writing to pre-allocated files is 50%
> slower for me. This is even though your pre-allocation strategy causes
> block-interleaving of the files. I suspect, but at this time cannot
> prove, that this is due (in my case at least) to fs fragmentation. I
> will try the benchmark on a clean 18GB disk the next time I'm over at
> the studio.

Note that I even tried allocating the files in a non-interleaved fashion (by
creating 20 separate files in sequence), which gave me basically the same
performance as creating the files in interleaved mode.

But again, in your case it may be different; the only way to know is to test
it.
(Just create 20 files with dd, named outfile0, outfile1, etc., each of, say,
25 MB, and then run hdtest with the notrunc parameter.)

If you find interesting results let us know.

>
> Stephen Tweedie or someone else would know the answer to my last
> question: I am wondering if contiguous allocation of fs blocks to a
> file reduces the amount of metadata updating ? Does metadata belong to
> a fixed-sized unit, or an inode, or a variable-sized unit, or some
> combination ? I ask this because I see some visual indication of the
> disk stalls you have talked about when running your hdtest program (it
> may just be paging issues, however - hard to tell), and I still have
> not seen them in ardour. Assuming for a second that these are real
> stalls, one obvious difference is that your preallocation strategy
> does not produce contiguous files.
>
> --p

I can't say exactly here, but I guess your supposition may be true.
But again, an answer from the filesystem gurus would be nice.

Benno.



This archive was generated by hypermail 2b28 : Fri Apr 21 2000 - 20:29:43 EEST