[linux-audio-dev] Re: O_DIRECT architecture (was Re: info point on linux hdr)

Subject: [linux-audio-dev] Re: O_DIRECT architecture (was Re: info point on linux hdr)
From: Stephen C. Tweedie (sct_AT_redhat.com)
Date: Tue Apr 18 2000 - 20:45:19 EEST


Hi,

On Tue, Apr 18, 2000 at 07:56:04AM -0500, Steve Lord wrote:
>
> XFS is using the pagebuf code we wrote (or I should say are writing - it
> needs a lot of work yet). This uses kiobufs to represent data in a set of
> pages. So, we have the infrastructure to take a kiobuf and read or write
> it from disk (OK, it uses buffer heads under the covers).

That's fine, and in fact is exactly what kiobufs were designed for:
to abstract out the storage of the buffer from whatever construction
you happen to use to do the IO. (Raw IO also uses buffer_heads
internally but passes data around in kiobufs.)

> I said basic implementation because it is currently paying no attention
> to cached data. The Irix approach to this was to flush or toss cached
> data which overlapped a direct I/O, I am leaning towards keeping them
> as part of the I/O.

The big advantage of the scheme where I map the kiobuf pages into the
real page cache before the I/O, and unmap after, is that cache
coherency at the beginning of the I/O and all the way through it is
guaranteed. The cost is that the direct I/O may end up doing copies
if there is other I/O going on at the same time to the same page, but
I don't see that as a problem!

> o using caching to remove the alignment restrictions on direct I/O by
> doing unaligned head and tail processing via buffered I/O.

I'm just planning on doing a copy for any unaligned I/O. Raw character
devices simply reject unaligned I/O for now, but O_DIRECT will be a
bit more forgiving.
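
As a rough illustration of that policy (a sketch only: the names
io_path, io_is_aligned and choose_path are hypothetical, not kernel
interfaces, and the sector size is assumed to be a power of two), the
decision could look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcomes for one request; not kernel identifiers. */
enum io_path { IO_DIRECT, IO_COPY, IO_REJECT };

/* True when the file offset, the user buffer address and the length
 * are all multiples of the sector size (assumed power of two). */
static bool io_is_aligned(uint64_t file_off, uintptr_t addr,
                          size_t len, size_t sector)
{
    assert(sector != 0 && (sector & (sector - 1)) == 0);
    uint64_t bits = file_off | (uint64_t)addr | (uint64_t)len;
    return (bits & (uint64_t)(sector - 1)) == 0;
}

/* Aligned requests go straight to the device.  Unaligned requests are
 * rejected on a raw device and bounced through a copy otherwise. */
static enum io_path choose_path(bool is_raw_dev, uint64_t file_off,
                                uintptr_t addr, size_t len, size_t sector)
{
    if (io_is_aligned(file_off, addr, len, sector))
        return IO_DIRECT;
    return is_raw_dev ? IO_REJECT : IO_COPY;
}
```

With a 512-byte sector, a request at file offset 100 would take the
copy path on an O_DIRECT file and be rejected on a raw device.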

> > It's something I've been thinking about in the general case. Basically
> > what I want to do is this:
> >
> > Augment the inode operations with a new operation, "rw_kiovec" which
> > performs reads and writes on vectors of kiobufs.
>
> You should probably take a look at what we have been doing to the ops,
> although our extensions are really biased towards extent based filesystems,
> rather than using getblock to identify individual blocks of file data we
> added a bmap interface to return a larger range - this requires different
> locking semantics than getblock, since the mapping we return covers multiple
> pages. I suspect that any approach which assembles multiple pages in advance
> is going to have similar issues.

OK. These are probably orthogonal for now, but doing extent bmaps is
an important optimisation.
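
To make the difference concrete, here is a minimal sketch of turning
per-block lookups into extents. It is not the XFS bmap interface; the
struct toy_extent type and the blocks_to_extents function are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* A run of physically contiguous disk blocks (hypothetical type). */
struct toy_extent {
    unsigned long start;  /* first disk block of the run */
    size_t len;           /* number of blocks in the run */
};

/* Coalesce a per-block list of disk block numbers, as a getblock-style
 * lookup would return one at a time, into contiguous runs.  Returns
 * the number of extents written; stops early if 'out' fills up. */
static size_t blocks_to_extents(const unsigned long *blocks, size_t n,
                                struct toy_extent *out, size_t max_out)
{
    size_t used = 0;

    assert(blocks != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (used > 0 &&
            out[used - 1].start + out[used - 1].len == blocks[i]) {
            out[used - 1].len++;        /* extends the current run */
        } else {
            if (used == max_out)
                break;                  /* no room for another run */
            out[used].start = blocks[i];
            out[used].len = 1;
            used++;
        }
    }
    return used;
}
```

A caller that receives a few extents instead of one mapping per block
has far fewer lookups to make, but each mapping then spans several
pages, which is where the locking differences come from.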

Ultimately we are going to have to review the whole device driver
interface. We need that both to do things like >2TB block devices, and
also to achieve better efficiency than we can attain right now with a
separate buffer_head for every single block in the I/O. It's just using
too much CPU; being able to pass kiobufs directly to ll_rw_block along
with a block address list would be much more efficient.
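
A back-of-the-envelope sketch of the overhead, assuming one descriptor
per filesystem block today. The struct toy_kio_request layout below is
purely hypothetical (it is not the kiobuf definition); it only shows
the shape of "one request plus a block address list".

```c
#include <assert.h>
#include <stddef.h>

/* Descriptors needed today: one per block, rounding up. */
static size_t buffer_heads_needed(size_t io_bytes, size_t block_size)
{
    assert(block_size > 0);
    return (io_bytes + block_size - 1) / block_size;
}

/* Hypothetical single request covering the whole transfer. */
struct toy_kio_request {
    void **pages;                 /* pages holding the data */
    size_t nr_pages;
    const unsigned long *blocks;  /* disk block address list */
    size_t nr_blocks;
};

/* Describe a transfer with one request instead of one descriptor per
 * block; the block list is borrowed from the caller, not copied. */
static struct toy_kio_request toy_kio_make(void **pages, size_t nr_pages,
                                           const unsigned long *blocks,
                                           size_t nr_blocks)
{
    struct toy_kio_request req = { pages, nr_pages, blocks, nr_blocks };
    return req;
}
```

For a 64 KB transfer on a filesystem with 1 KB blocks,
buffer_heads_needed(65536, 1024) is 64, against a single request
structure in the second form.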

> So if O_ALIAS allows user pages to be put in the cache (provided you use
> O_UNCACHE with it), you can do this.

Yes.

> However, O_DIRECT would be a bit more
> than this - since if there already was cached data for part of the I/O
> you still need to copy those pages up into the user pages which did not
> get into cache.

That's the intention --- O_ALIAS _allows_ the user page to be mapped
into the cache, but if existing cached data or alignment constraints
prevent that, it will fall back to doing a copy.

One consequence is that O_DIRECT I/O from a file which is already cached
will always result in copies, but I don't mind that too much.
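
The fallback rule can be sketched per page as follows. This is a toy
model under stated assumptions: the cache is just an array of "already
cached" flags, and struct toy_plan and plan_io are hypothetical names,
not part of any kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* How many pages of one I/O were aliased into the cache versus
 * copied (hypothetical bookkeeping type). */
struct toy_plan {
    size_t aliased;
    size_t copied;
};

/* cached[i] says whether file page i is already in the page cache.
 * A user page may be aliased only if the request is aligned and no
 * cached page exists for that index; otherwise fall back to a copy. */
static struct toy_plan plan_io(const bool *cached, size_t npages,
                               size_t first, size_t count, bool aligned)
{
    struct toy_plan p = { 0, 0 };

    assert(first <= npages && count <= npages - first);
    for (size_t i = first; i < first + count; i++) {
        if (aligned && !cached[i])
            p.aliased++;
        else
            p.copied++;
    }
    return p;
}
```

If every page in the range is already cached, the plan comes back with
aliased == 0, which is the "always result in copies" case.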

> We (SGI) really need to get better hooked in on stuff like this - I really
> don't want to see us going off in one direction (pagebuf) and all the other
> filesystems going off in a different direction.

The pagebuf stuff sounds like it is fairly specialised for now. As
long as all of the components that we are talking about can pass kiobufs
between themselves, we should be able to make them interoperate pretty
easily.

Is the pagebuf code intended to be core VFS functionality, or do you
see it being an XFS library component for the foreseeable future?
 
> p.s. did you know we also cache meta data in pages directly?

That was one of the intentions in the new page cache structure, and we
may actually end up moving ext2's metadata caching to use the page
cache too in the future.

--Stephen



This archive was generated by hypermail 2b28 : Tue Apr 18 2000 - 21:14:30 EEST