[linux-audio-dev] Re: O_DIRECT architecture (was Re: info point on linux hdr)


Subject: [linux-audio-dev] Re: O_DIRECT architecture (was Re: info point on linux hdr)
From: Steve Lord (lord_AT_sgi.com)
Date: Tue Apr 18 2000 - 15:56:04 EEST


> Hi,
>
> On Mon, Apr 17, 2000 at 05:58:48PM -0500, Steve Lord wrote:
> >
> > O_DIRECT on Linux XFS is still a work in progress, we only have
> > direct reads so far. A very basic implementation was made available
> > this weekend.
>
> Care to elaborate on how you are doing O_DIRECT?

XFS is using the pagebuf code we wrote (or I should say are writing - it
needs a lot of work yet). This uses kiobufs to represent data in a set of
pages. So, we have the infrastructure to take a kiobuf and read or write
it from disk (OK, it uses buffer heads under the covers). I glued this
together with the map_user_kiobuf() and unmap_kiobuf() calls from your raw
I/O driver and that was about it.

We only build these kiobufs for data which is sequential on disk, not for
the whole user request. The sequence we do things in is a bit different;
basically:

        while data left to copy

                obtain bmap from filesystem representing location of next
                chunk of data (sequential on disk)

                for buffered I/O

                        go find pages covering this range - create if they
                        do not exist.

                        issue blocking read for pages which are not uptodate

                        copy out to user space

                for direct I/O

                        map user pages into a kiobuf

                        issue blocking read for pages

                        unmap pages
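
As a rough illustration only, the loop above can be modelled in user space.
This is a toy sketch, not the pagebuf code: `disk` is a bytes object standing
in for the device, `extents` is a hypothetical list of
(file_offset, disk_offset, length) tuples that are assumed to be page-aligned,
`cache` is a dict standing in for the page cache, and a plain slice copy
stands in for mapping user pages into a kiobuf.

```python
PAGE = 4096  # assumed page size for the model

def bmap(extents, offset):
    """Return (disk_addr, run_len): where `offset` lives on disk and how
    many bytes follow it contiguously in the same extent."""
    for file_start, disk_start, length in extents:
        if file_start <= offset < file_start + length:
            delta = offset - file_start
            return disk_start + delta, length - delta
    raise ValueError("offset beyond end of file")

def read_file(disk, extents, offset, count, direct, cache):
    user = bytearray(count)               # stands in for the user buffer
    done = 0
    while done < count:                   # while data left to copy
        pos = offset + done
        disk_addr, run = bmap(extents, pos)   # next chunk, sequential on disk
        n = min(run, count - done)
        if direct:
            # direct I/O: read from "disk" straight into the user buffer
            user[done:done + n] = disk[disk_addr:disk_addr + n]
        else:
            # buffered I/O: find or create pages, fill the ones that are
            # missing, then copy out to the user buffer
            copied = 0
            while copied < n:
                index, in_page = divmod(pos + copied, PAGE)
                if index not in cache:
                    page_disk = disk_addr + copied - in_page
                    cache[index] = bytes(disk[page_disk:page_disk + PAGE])
                step = min(PAGE - in_page, n - copied)
                user[done + copied:done + copied + step] = \
                    cache[index][in_page:in_page + step]
                copied += step
        done += n
    return bytes(user)
```

In this model the direct path never touches `cache`, which mirrors the
"paying no attention to cached data" limitation of the basic implementation.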

I said basic implementation because it is currently paying no attention
to cached data. The Irix approach to this was to flush or toss cached
data which overlapped a direct I/O; I am leaning towards keeping the
cached pages as part of the I/O.
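
One possible reading of "keeping them as part of the I/O" is sketched below.
It is only a guess at the behaviour, under simplifying assumptions: the
request is page-aligned, file page N sits at byte N * PAGE of `disk`, and
`cache` is a dict of file page index to page contents. The function name
`direct_read` is made up for the example.

```python
PAGE = 4096  # assumed page size for the model

def direct_read(disk, cache, first_page, npages):
    """Toy page-aligned direct read that uses cached pages where they exist."""
    user = bytearray(npages * PAGE)       # stands in for the user buffer
    for i in range(npages):
        index = first_page + i
        if index in cache:
            # a cached page may be newer than the disk copy, so use it
            data = cache[index]
        else:
            # uncached page: read from "disk"; it is not added to the cache
            data = disk[index * PAGE:(index + 1) * PAGE]
        user[i * PAGE:(i + 1) * PAGE] = data
    return bytes(user)
```

The flush-or-toss alternative would instead write back or drop every
overlapping entry in `cache` first and then read all pages from disk.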

Other future possibilities I see are:

  o using caching to remove the alignment restrictions on direct I/O by
    doing unaligned head and tail processing via buffered I/O.

  o Automatically switching to direct I/O under conditions where
    the I/O would flush too much cache.
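
To make the first idea concrete, here is a small sketch of how an unaligned
request might be split into a buffered head, an aligned direct body, and a
buffered tail. The function name `split_request` and the 512-byte default
alignment are assumptions for the example, and it only considers file offset
alignment, not alignment of the user buffer in memory.

```python
def split_request(offset, length, align=512):
    """Split [offset, offset+length) into buffered and direct pieces.

    Returns a list of (kind, offset, length) tuples in file order.
    """
    end = offset + length
    body_start = -(-offset // align) * align   # round offset up
    body_end = end // align * align            # round end down
    if body_start >= body_end:
        # no whole aligned unit inside the request: do it all buffered
        return [("buffered", offset, length)] if length else []
    parts = []
    if offset < body_start:
        parts.append(("buffered", offset, body_start - offset))   # head
    parts.append(("direct", body_start, body_end - body_start))   # body
    if body_end < end:
        parts.append(("buffered", body_end, end - body_end))      # tail
    return parts
```

For example, a 2000-byte request at offset 100 would become a 412-byte
buffered head, a 1536-byte direct body starting at offset 512, and a 52-byte
buffered tail.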

>
> It's something I've been thinking about in the general case. Basically
> what I want to do is this:
>
> Augment the inode operations with a new operation, "rw_kiovec" which
> performs reads and writes on vectors of kiobufs.

You should probably take a look at what we have been doing to the ops,
although our extensions are really biased towards extent-based filesystems.
Rather than using getblock to identify individual blocks of file data, we
added a bmap interface to return a larger range. This requires different
locking semantics than getblock, since the mapping we return covers multiple
pages. I suspect that any approach which assembles multiple pages in advance
is going to have similar issues.
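
The difference between the two styles of lookup can be shown with a toy
model. Nothing here is the real kernel interface: `get_block` and `bmap` are
simplified stand-ins, and `extents` is a hypothetical list of
(file_block, disk_block, nblocks) tuples.

```python
def get_block(extents, file_block):
    """Per-block lookup: one call maps exactly one file block."""
    for start, disk, n in extents:
        if start <= file_block < start + n:
            return disk + (file_block - start)
    raise ValueError("unmapped block")

def bmap(extents, file_block, max_blocks):
    """Range lookup: one call maps a whole contiguous run, up to max_blocks."""
    for start, disk, n in extents:
        if start <= file_block < start + n:
            skip = file_block - start
            return disk + skip, min(n - skip, max_blocks)
    raise ValueError("unmapped block")

def map_by_block(extents, first, count):
    # one lookup per block
    return [get_block(extents, first + i) for i in range(count)]

def map_by_extent(extents, first, count):
    # one lookup per contiguous run on disk
    runs = []
    while count > 0:
        disk, n = bmap(extents, first, count)
        runs.append((disk, n))
        first += n
        count -= n
    return runs
```

A five-block request that crosses one extent boundary needs five `get_block`
calls but only two `bmap` calls. Each `bmap` result describes several pages
at once, which is where the different locking requirement comes from.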

>
> Provide a generic_rw_kiovec() function which uses the existing page-
> oriented IO vectors to set up page mappings much as generic_file_{read,
> write} do, but honouring the following flags in the file descriptor:
>
> * O_ALIAS
>
> Allows the write function to install the page in the kiobuf
> into the page cache if the data is correctly aligned and there is
> not already a page in the page cache.
>
> For read, the meaning is different: it allows existing pages in
> the page cache to be installed into the kiobuf.
>
> * O_UNCACHE
>
> If the IO created a new page in the page cache, then attempt to
> unlink the page after the IO completes.
>
> * O_SYNC
>
> Usual meaning: wait for synchronous write IO completion.
>
> O_DIRECT becomes no more than a combination of these options.

So if O_ALIAS allows user pages to be put in the cache (provided you use
O_UNCACHE with it), you can do this. However, O_DIRECT would be a bit more
than this, since if cached data already existed for part of the I/O
you would still need to copy those cached pages up into the user pages
which did not get into the cache.

>
> Furthermore, by implementing this mechanism with kiobufs, we can go
> one step further and perform things like Larry's splice operations by
> performing reads and writes in kiobufs. Using O_ALIAS kiobuf reads and
> writes gives us copies between regular files entirely in kernel space
> with the minimum possible memory copies. sendfile() between regular
> files can be optimised to use this mechanism. The data never has to
> hit user space.
>
> As an example of the flexibility of the interface, you can perform
> an O_ALIAS, O_UNCACHE sendfile to copy one file to another, with full
> readahead still being performed on the input file but with no memory
> copies at all. You can also choose not to have O_UNCACHE and O_SYNC
> on the writes, in which case you have both readahead and writebehind
> with zero copy.
>
> This is all fairly easy to implement (at least for ext2), and gives
> us much more than just O_DIRECT for no extra work.
>
> --Stephen

We (SGI) really need to get better hooked in on stuff like this - I really
don't want to see us going off in one direction (pagebuf) and all the other
filesystems going off in a different direction.

Steve

P.S. Did you know we also cache metadata in pages directly?



This archive was generated by hypermail 2b28 : Tue Apr 18 2000 - 17:35:18 EEST