[linux-audio-dev] Re: O_DIRECT architecture (was Re: info point on linux hdr)

Subject: [linux-audio-dev] Re: O_DIRECT architecture (was Re: info point on linux hdr)
From: Steve Lord (lord_AT_sgi.com)
Date: Tue Apr 18 2000 - 21:17:52 EEST


> Hi,
>
> On Tue, Apr 18, 2000 at 07:56:04AM -0500, Steve Lord wrote:
>
> > I said basic implementation because it is currently paying no attention
> > to cached data. The Irix approach to this was to flush or toss cached
> > data which overlapped a direct I/O; I am leaning towards keeping it
> > as part of the I/O.
>
> The big advantage of the scheme where I map the kiobuf pages into the
> real page cache before the I/O, and unmap after, is that cache
> coherency at the beginning of the I/O and all the way through it is
> guaranteed. The cost is that the direct I/O may end up doing copies
> if there is other I/O going on at the same time to the same page, but
> I don't see that as a problem!

I was thinking along these lines.

So I guess the question here is how you plan on keeping track of the
origin of the pages. Which ones were originally part of the kernel cache
and thus need copying up to user space? It does not seem hard, I am just
wondering what you had in mind. Also, I presume that if the page was
already present and up to date, then on a read you would not refill it
from disk, since it may be more recent than the on-disk data. Existing
buffer heads would give you this information.

>
> Ultimately we are going to have to review the whole device driver
> interface. We need that both to do things like >2TB block devices, and
> also to achieve better efficiency than we can attain right now with a
> separate buffer_head for every single block in the I/O. It's just using
> too much CPU; being able to pass kiobufs directly to ll_rw_block along
> with a block address list would be much more efficient.

Agreed. XFS was getting killed by this (and by the fixed block size
requirement of the interface). We have 512-byte I/O requests that we need
to do for some metadata; having to impose that block size on all I/O and
create 8 buffer heads for each 4K page was just nasty.
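
To make the arithmetic concrete, here is a minimal sketch of the
per-block descriptor count when every block in an I/O needs its own
descriptor, as with one buffer_head per block. The function names are
hypothetical and this is plain user-space C, not kernel code.

```c
#include <stddef.h>

/* Hypothetical helper: how many per-block descriptors are needed to
 * cover one page when each block gets its own descriptor.
 * Returns 0 if block_size is zero or does not divide page_size. */
static size_t descriptors_per_page(size_t page_size, size_t block_size)
{
    if (block_size == 0 || page_size % block_size != 0)
        return 0;
    return page_size / block_size;
}

/* Hypothetical helper: descriptors needed for an I/O of io_bytes,
 * rounded up to whole blocks. Returns 0 if block_size is zero. */
static size_t descriptors_for_io(size_t io_bytes, size_t block_size)
{
    if (block_size == 0)
        return 0;
    return (io_bytes + block_size - 1) / block_size;
}
```

With a 4096-byte page and 512-byte blocks, descriptors_per_page gives
the 8 mentioned above; a 1 MB request at that block size needs 2048
descriptors, against 256 at a 4096-byte block size.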

>
> > So if O_ALIAS allows user pages to be put in the cache (provided you use
> > O_UNCACHE with it), you can do this.
>
> Yes.
>
> > However, O_DIRECT would be a bit more
> > than this - since if there already was cached data for part of the I/O
> > you still need to copy those pages up into the user pages which did not
> > get into cache.
>
> That's the intention --- O_ALIAS _allows_ the user page to be mapped
> into the cache, but if existing cached data or alignment constraints
> prevent that, it will fall back to doing a copy.
>
> One consequence is that O_DIRECT I/O from a file which is already cached
> will always result in copies, but I don't mind that too much.
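
The fallback rule described above can be sketched as a toy user-space
model: a page is aliased into the cache only if nothing is cached for
it already and both the user address and the file offset are page
aligned; otherwise it is copied. All names here are hypothetical and the
4096-byte page size is an assumption. This is not the kernel
implementation, only an illustration of the decision.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size for this model */

enum toy_action { TOY_ALIAS, TOY_COPY };

/* Decide how one page of a direct I/O request is handled in this model.
 * user_addr:      address of the user buffer for this page
 * file_off:       file offset covered by this page
 * already_cached: true if the cache already holds data for this offset */
static enum toy_action toy_page_action(unsigned long user_addr,
                                       unsigned long file_off,
                                       bool already_cached)
{
    bool aligned = (user_addr % TOY_PAGE_SIZE == 0) &&
                   (file_off % TOY_PAGE_SIZE == 0);

    /* Existing cached data or misalignment forces the copy path. */
    if (already_cached || !aligned)
        return TOY_COPY;
    return TOY_ALIAS;
}

/* Count how many pages of an npages-long request end up being copied.
 * cached[i] says whether page i of the request is already in the cache. */
static size_t toy_count_copies(unsigned long user_addr,
                               unsigned long file_off,
                               const bool *cached, size_t npages)
{
    size_t copies = 0;

    for (size_t i = 0; i < npages; i++) {
        unsigned long delta = (unsigned long)i * TOY_PAGE_SIZE;

        if (toy_page_action(user_addr + delta, file_off + delta,
                            cached[i]) == TOY_COPY)
            copies++;
    }
    return copies;
}
```

In this model a fully cached file makes every page take the copy path,
which matches the consequence noted in the quoted text.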

So maybe an O_CLEANCACHE (or something similar) could be used to indicate
that anything which is found cached should be moved out of the way (flushed
to disk or tossed depending on what is happening). Some other sort of API
such as an fsync variant or that fadvise call which was mentioned recently
could be used to clean cache for a file. This would let those apps which really
want direct disk <-> user memory I/O get what they wanted.
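
As a rough illustration of the "flush, then toss" idea, the sketch
below uses fsync() followed by posix_fadvise() with
POSIX_FADV_DONTNEED. Both are real POSIX interfaces, but they are not
the O_CLEANCACHE flag floated above (which remains a proposal), and the
advice is only a hint that the kernel may ignore. The function name
clean_file_cache is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write back any dirty data for fd, then advise the
 * kernel that the cached pages for the whole file are no longer needed.
 * Returns 0 on success or an errno-style error number on failure. */
static int clean_file_cache(int fd)
{
    /* Flush first, so that dropping pages cannot lose modified data. */
    if (fsync(fd) != 0)
        return errno;

    /* offset 0, len 0 means "from the start to the end of the file".
     * posix_fadvise returns the error number directly, not via errno. */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

An application wanting uncached transfers could call something like
this once before starting its direct I/O, instead of paying for a cache
check on every request.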

>
> The pagebuf stuff sounds like it is fairly specialised for now. As
> long as all of the components that we are talking about can pass kiobufs
> between themselves, we should be able to make them interoperate pretty
> easily.
>
> Is the pagebuf code intended to be core VFS functionality or do you
> see it being an XFS library component for the foreseeable future?

We had talked about trying to use it on some other filesystem to see what
happened, but we don't really have the bandwidth to do that. We don't see
it as being just there for XFS - although, for existing Linux filesystems,
there may not be benefits to switching over to it.

>
> --Stephen

Steve


This archive was generated by hypermail 2b28 : Tue Apr 18 2000 - 21:45:06 EEST