Re: [linux-audio-dev] Re: sched_setscheduler question ...

Subject: Re: [linux-audio-dev] Re: sched_setscheduler question ...
From: Paul Barton-Davis (pbd_AT_Op.Net)
Date: Sun Jun 11 2000 - 08:05:43 EEST


>So, the goal with SCHED_FIFO is to eliminate those instances.

Understood.

>risk of a click while letting the control-C get through. But this
>approach is less reasonable to try if the only way to let SCHED_OTHER
>processes run incurs a HZ delay, even if no other processes want to
>run.

I really don't see any way around this. If you put it to sleep, it
will sleep for a minimum of 1/HZ seconds.
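
To see the effect concretely, here's a throwaway measurement program
(the 1ms request is arbitrary; on a HZ=100 kernel you should see
something on the order of 10-20ms come back):

     /* ask for a 1ms sleep and report what we actually got */
     #include <stdio.h>
     #include <unistd.h>
     #include <sys/time.h>

     int main (void)
     {
             struct timeval before, after;
             long usecs;

             gettimeofday (&before, NULL);
             usleep (1000);               /* request 1ms */
             gettimeofday (&after, NULL);

             usecs = (after.tv_sec - before.tv_sec) * 1000000L
                     + (after.tv_usec - before.tv_usec);
             printf ("asked for 1000 usec, got %ld usec\n", usecs);
             return 0;
     }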

Who invented -timesync mode? Is it part of the MP4-SA specification?
I ask because it seems fundamentally broken to try to implement
something that meets your description of -timesync on a single
processor controlled by a multi-tasking OS.

You write:

>in this case, in -timesync mode, it spins to finish the
>control-period, then checks the MIDI In queue for new notes, and
>processes them.

I would have thought that a better model is:

  * check for MIDI events at the top of the control cycle
  * do processing/synthesis/whatever
  * send data to card
  * potentially block waiting for next interrupt from soundcard
       rather than busy-waiting, which is inimical to a UP/multitasking
       system, particularly if the thread that does it is SCHED_FIFO

I can't see any situations in which explicit busy-waiting is a win
over blocking, and from the sound of it, the problem you face would
arise in either case anyway.

>Essentially, in -timesync mode, sfront monitors the data from
>the SNDCTL_DSP_GETOPTR and SNDCTL_DSP_GETOSPACE ioctl's, and
>busy-wait spins when necessary to ensure this always happens. Actually,
>the development version (the upcoming sfront 0.62 in a few weeks) uses
>both, and thus works more reliably -- the current 0.61 only used
>SNDCTL_DSP_GETOPTR info and tried to deduce blocking status from the
>size value, with enough heuristics to work >99% of the time, and fall
>victim to an infinite loop bug under rare conditions.
>
>This is done because it was the most straightforward way to ensure
>MP4-SA semantics, in the case where you have "pre-recorded" SASL or
>MIDI data that a user is "playing along" with in real-time (either
>audio in or MIDI in or both), in a world where you only have a single
>thread (the "ANSI C in the core code section" requirement).

Well, it may be the most straightforward, but it's broken, which isn't
going to help you. Specifically:

>Once you go with this approach, it's natural to use MMMM=7FFF in the
>SETFRAGMENT ioctl, and use the maximal number of buffers, which is
>what the C code sfront generates currently does.

This is a bad idea. If you're in search of low latency (i.e. you want
MIDI events to cause a change in the emitted audio very quickly), the
maximal number of buffers is the last thing you want: use 3 whenever
possible. To recap an endless thread on LAD:

there are 2 kinds of latency:

      * event latency: the time between an external event being
           received by the program, and the time until the results of
           that event on the audio output are delivered to the outputs
           of the audio interface. This is determined by the total
           buffer size, since in the worst case, the event may be
           delivered just as the program has filled the entire buffer
           and the card has just started playing it. The new audio
           cannot be delivered until the entire buffer has played out.
 
      * in/out latency: the time between audio being received from
           the inputs of an audio interface and the time it is
           delivered to the outputs. This is determined by the
           fragment size primarily, since the program can only
           pick up the audio data when the soundcard sends an
           interrupt to notify us that the data is ready, and the
           interrupt frequency is determined by the fragment size
           (99% of the time).

So, a really all-round-low-latency program wants to use a small
fragment size, and have as few fragments as possible. From experiments
that Benno and I have done, 3 seems to be the best answer on a Linux
system.
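
To put numbers on it: at 44.1kHz, 16-bit stereo, a 256-byte fragment
holds 64 frames, i.e. about 1.45ms of audio, so 3 such fragments give
a total buffer of roughly 4.4ms. Requesting that layout from OSS looks
something like this (the device path and sizes are just examples):

     /* request 3 fragments of 256 bytes (2^8) each; the argument
        is encoded as 0xMMMMSSSS, where MMMM = max fragments and
        SSSS = log2 of the fragment size in bytes */
     #include <stdio.h>
     #include <fcntl.h>
     #include <sys/ioctl.h>
     #include <sys/soundcard.h>

     int open_low_latency_dsp (void)
     {
             int fd = open ("/dev/dsp", O_WRONLY);
             int frag = (3 << 16) | 8;

             if (fd < 0) {
                     perror ("/dev/dsp");
                     return -1;
             }
             if (ioctl (fd, SNDCTL_DSP_SETFRAGMENT, &frag) < 0) {
                     perror ("SNDCTL_DSP_SETFRAGMENT");
             }
             return fd;
     }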

Now, this will not help your situation when the generated decoder is
taking 100% of the fragment wall-clock time to compute the next
fragment. I can see two situations:

          1) it does this all the time

             In this case, you're just screwed. You can't afford
             to ever allow any other tasks to run, which means
             you can't allow signals to be raised by other tasks
             which might stop the decoder. If you try to break
             from the processing, you'll cause a dropout.

          2) it does this some of the time

             In this case, the question I see is: is it better to
             busy-wait using the OPTR and IPTR ioctls, or is it better
             to try to block-on-write, and check the time when you return
             from the write(2)? It's a rhetorical question, of course :)

So, I think you'd do better by doing something like this pseudo-code
super-simple model:

             while (1) {
                  if (MIDI events pending) {
                      do whatever is necessary to deal with them
                  }

                  do_processing_and_synthesis ();

                  write_start = gettimeofday();
                  write (fd, audio_data, nbytes);
                  write_end = gettimeofday();

                  if (write_end - write_start < some_threshold) {
                     damn_i_am_so_busy ();
                  } else {
                     things_are_pretty_relaxed_around_here ();
                  }
             }

obviously, damn_i_am_so_busy() may well be a no-op. By contrast,
things_are_pretty_relaxed_around_here() can yield the processor.
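
Fleshed out into real C, with the stubs left as the obvious
placeholders they are, the loop might look like this (the threshold
and fragment size are assumptions you'd tune):

     #include <unistd.h>
     #include <sys/time.h>

     #define NFRAMES 64                    /* one 256-byte fragment */
     #define BUSY_THRESHOLD_USEC 1000      /* "did write(2) block?" cutoff */

     extern void do_processing_and_synthesis (short *buf, int nframes);
     extern void damn_i_am_so_busy (void);
     extern void things_are_pretty_relaxed_around_here (void);

     static short audio_data[NFRAMES * 2]; /* 16-bit stereo */

     static long usecs_between (struct timeval *a, struct timeval *b)
     {
             return (b->tv_sec - a->tv_sec) * 1000000L
                     + (b->tv_usec - a->tv_usec);
     }

     void audio_loop (int fd)
     {
             struct timeval write_start, write_end;

             while (1) {
                     /* check for MIDI events here, at the top */

                     do_processing_and_synthesis (audio_data, NFRAMES);

                     gettimeofday (&write_start, NULL);
                     write (fd, audio_data, sizeof (audio_data));
                     gettimeofday (&write_end, NULL);

                     if (usecs_between (&write_start, &write_end)
                                     < BUSY_THRESHOLD_USEC) {
                             /* write(2) returned at once: the card had
                                already drained space waiting for us, so
                                we are running close to the edge */
                             damn_i_am_so_busy ();
                     } else {
                             /* write(2) blocked: we are ahead of the
                                card and have time to spare */
                             things_are_pretty_relaxed_around_here ();
                     }
             }
     }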

However, here I return to the earlier point about the minimal sleep
delay. If you want "play along performance", you're going to have to
run with fragment sizes on the order of 5ms or less. That's half the
typical HZ-induced minimal sleep time. So it's still not clear to me
what you can possibly do to yield the processor in a dropout-safe
way.

This suggests to me that the logic needs to be reversed a little:

     if (write_end - write_start < some_threshold) {
           i_have_been_busy++;
           if (i_have_been_busy > some_other_threshold) {
                things_are_too_hot_for_a_multitasking_os ();
           }
     } else {
           i_have_been_busy = 0;
     }

This will end up yielding the processor after a certain number of
"heavy duty" cycles, in a fairly natural way, with the implicit
message for the user (if the thresholds are chosen carefully): you're
asking me to compute too much for safety right now.
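
As for what things_are_too_hot_for_a_multitasking_os() actually does:
one possibility (a sketch, not the only answer) is to warn the user
and briefly drop out of SCHED_FIFO, so that ordinary SCHED_OTHER
processes -- including whatever wants to deliver your control-C --
get a chance to run, without paying the 1/HZ sleep penalty if nobody
else is runnable:

     #include <sched.h>
     #include <stdio.h>

     void things_are_too_hot_for_a_multitasking_os (void)
     {
             struct sched_param p;

             fprintf (stderr,
                      "warning: you're asking me to compute too much "
                      "for safety right now\n");

             /* drop to SCHED_OTHER and offer the CPU; sched_yield()
                returns immediately if nobody else wants to run */
             p.sched_priority = 0;
             sched_setscheduler (0, SCHED_OTHER, &p);
             sched_yield ();

             /* back to the front of the line (the priority value is
                an assumption -- use whatever you ran with before) */
             p.sched_priority = 10;
             sched_setscheduler (0, SCHED_FIFO, &p);
     }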

Another caveat: if the decoder is running on an SMP system, none of
this nonsense is necessary.

I hope that my point of view might be useful. It's a problem near and
dear to my heart, since Quasimodo faces very similar issues, with the
proviso that it's multithreaded from the start, making certain
solutions much easier, and not subject to any big black books of specs :)

--p

