Re: [LAU] JACK2 hardware buffer size for browser-based video conferencing

From: David Kastrup <dak@email-addr-hidden>
Date: Tue Jan 05 2021 - 18:45:28 EET

"Andrew A. Grathwohl" <andrew@grathwohl.me> writes:

> Hi David,
>
> Thanks, this was super-informative!
>
> We can likely rule out the idea that the smaller buffer sizes tax the
> computer further, leading to more fan/PSU noise, since the machine
> itself is not in the same room as the microphone.
>
> I am intrigued by your comments about the sampling rate on my Babyface
> Pro. I have always set it to 48kHz whenever doing low-latency audio or
> any audio that will be transmitted over a network cable, which I mostly
> do superstitiously. Is there any guidance out there about what the
> correct sampling rate would be for my device, or about how to determine
> this answer for myself?

With RME, I'd trust the soundcard, no questions asked. With other
soundcards offering substantially higher sample rates, there is a chance
that downsampling (after proper digital filtering) can lead to better
quality and/or lower latency.

Cf. the parameters in sox:

        rate [-q|-l|-m|-h|-v] [override-options] RATE[k]
                Change the audio sampling rate (i.e. resample the
                audio) to any given RATE (even non-integer if this is
                supported by the output file format) using a quality
                level defined as follows:
                      Quality     Band-width   Rej dB       Typical Use
                -q    quick       n/a          30 @ Fs/4    playback on ancient hardware
                -l    low         80%          100          playback on old hardware
                -m    medium      95%          100          audio playback
                -h    high        95%          125          16-bit mastering (use with dither)
                -v    very high   95%          175          24-bit mastering

                where Band-width is the percentage of the audio
                frequency band that is preserved and Rej dB is the
                level of noise rejection. Increasing levels of
                resampling quality come at the expense of increasing
                amounts of time to process the audio. If no quality
                option is given, the quality level used is `high' (but
                see `Playing & Recording Audio' above regarding
                playback).

                The `quick' algorithm uses cubic interpolation; all
                others use band-limited interpolation. By default, all
                algorithms have a `linear' phase response; for
                `medium', `high' and `very high', the phase response is
                configurable (see below).

                The rate effect is invoked automatically if SoX's -r
                option specifies a rate that is different to that of
                the input file(s). Alternatively, if this effect is
                given explicitly, then SoX's -r option need not be
                given. For example, the following two commands are
                equivalent:

                   sox input.wav -r 48k output.wav bass -b 24
                   sox input.wav output.wav bass -b 24 rate 48k

                though the second command is more flexible as it allows
                rate options to be given, and allows the effects to be
                ordered arbitrarily.
                * * *

                Warning: technically detailed discussion follows.

                The simple quality selection described above provides
                settings that satisfy the needs of the vast majority of
                resampling tasks. Occasionally, however, it may be
                desirable to fine-tune the resampler's filter response;
                this can be achieved using override options, as
                detailed in the following table:
                -M/-I/-L     Phase response = minimum/intermediate/linear
                -s           Steep filter (band-width = 99%)
                -a           Allow aliasing/imaging above the pass-band
                -b 74-99.7   Any band-width %
                -p 0-100     Any phase response (0 = minimum, 25 =
                             intermediate, 50 = linear, 100 = maximum)

                N.B. Override options cannot be used with the `quick'
                or `low' quality algorithms.

                All resamplers use filters that can sometimes create
                `echo' (a.k.a. `ringing') artefacts with transient
                signals such as those that occur with `finger snaps' or
                other highly percussive sounds. Such artefacts are
                much more noticeable to the human ear if they occur
                before the transient (`pre-echo') than if they occur
                after it (`post-echo'). Note that frequency of any
                such artefacts is related to the smaller of the
                original and new sampling rates but that if this is at
                least 44.1kHz, then the artefacts will lie outside the
                range of human hearing.

                A phase response setting may be used to control the
                distribution of any transient echo between `pre' and
                `post': with minimum phase, there is no pre-echo but
                the longest post-echo; with linear phase, pre and post
                echo are in equal amounts (in signal terms, but not
                audibility terms); the intermediate phase setting
                attempts to find the best compromise by selecting a
                small length (and level) of pre-echo and a medium
                lengthed post-echo.

                Minimum, intermediate, or linear phase response is
                selected using the -M, -I, or -L option; a custom phase
                response can be created with the -p option. Note that
                phase responses between `linear' and `maximum' (greater
                than 50) are rarely useful.

                A resampler's band-width setting determines how much of
                the frequency content of the original signal
                (w.r.t. the original sample rate when up-sampling, or
                the new sample rate when down-sampling) is preserved
                during conversion. The term `pass-band' is used to
                refer to all frequencies up to the band-width point
                (e.g. for 44.1kHz sampling rate, and a resampling
                band-width of 95%, the pass-band represents frequencies
                from 0Hz (D.C.) to circa 21kHz). Increasing the
                resampler's band-width results in a slower conversion
                and can increase transient echo artefacts (and vice
                versa).

                The -s `steep filter' option changes resampling
                band-width from the default 95% (based on the 3dB
                point), to 99%. The -b option allows the band-width to
                be set to any value in the range 74-99.7 %, but note
                that band-width values greater than 99% are not
                recommended for normal use as they can cause excessive
                transient echo.

                If the -a option is given, then aliasing/imaging above
                the pass-band is allowed. For example, with 44.1kHz
                sampling rate, and a resampling band-width of 95%, this
                means that frequency content above 21kHz can be
                distorted; however, since this is above the pass-band
                (i.e. above the highest frequency of
                interest/audibility), this may not be a problem. The
                benefits of allowing aliasing/imaging are reduced
                processing time, and reduced (by almost half) transient
                echo artefacts. Note that if this option is given,
                then the minimum band-width allowable with -b increases
                to 85%.

                Examples:

                   sox input.wav -b 16 output.wav rate -s -a 44100 dither -s

                default (high) quality resampling; overrides: steep
                filter, allow aliasing; to 44.1kHz sample rate;
                noise-shaped dither to 16-bit WAV file.

                   sox input.wav -b 24 output.aiff rate -v -I -b 90 48k

                very high quality resampling; overrides: intermediate
                phase, band-width 90%; to 48k sample rate; store output
                to 24-bit AIFF file.
                * * *

                The pitch and speed effects use the rate effect at
                their core.
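
To make the band-width figures in the excerpt concrete, here is a small
sketch of the pass-band arithmetic it describes. The function name and
the 96 kHz source rate are my own assumptions for illustration, not
anything SoX exposes; only the "percentage of the band below half the
smaller sample rate" rule comes from the text above.

```python
def pass_band_edge_hz(rate_in_hz, rate_out_hz, bandwidth_percent=95.0):
    """Upper edge of the pass-band, following the definition quoted above.

    Hypothetical helper for illustration; not part of SoX.
    """
    # Band-width is measured against the original rate when up-sampling and
    # against the new rate when down-sampling, i.e. the smaller of the two.
    reference_rate_hz = min(rate_in_hz, rate_out_hz)
    nyquist_hz = reference_rate_hz / 2.0
    return nyquist_hz * bandwidth_percent / 100.0


# 80% (-l), 95% (default for -m/-h/-v) and 99% (-s) when going 96 kHz -> 44.1 kHz
for percent in (80.0, 95.0, 99.0):
    edge_hz = pass_band_edge_hz(96000, 44100, percent)
    print(f"{percent:g}% band-width -> pass-band up to {edge_hz:.1f} Hz")
```

At 95% this gives 20947.5 Hz, which is the "circa 21kHz" the man page
mentions for a 44.1kHz sampling rate.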

As you can see, downsampling is a science... If you sacrifice linear
phase response (which makes the main difference at very high
frequencies), you can achieve lower latency, though the difference at
lower frequencies will be comparatively minimal.
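
A rough sketch of where the latency of a linear-phase filter comes from,
in case it helps. A symmetric FIR filter delays every frequency by
(N - 1) / 2 samples, so the cost depends on the filter length and the
rate it runs at. The tap count, the 96 kHz rate and the Hamming-windowed
sinc design below are assumptions picked for illustration; they are not
what SoX or any particular soundcard actually uses.

```python
import math


def windowed_sinc_lowpass(num_taps, cutoff):
    """Symmetric (linear-phase) low-pass FIR.

    cutoff is a fraction of the sample rate (0.25 = a quarter of it).
    """
    center = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - center
        if x == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * x) / (math.pi * x)
        # Hamming window to tame the truncated sinc
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def linear_phase_delay_ms(num_taps, sample_rate_hz):
    # A symmetric FIR delays all frequencies by (N - 1) / 2 samples.
    return (num_taps - 1) / 2.0 / sample_rate_hz * 1000.0


NUM_TAPS = 255          # hypothetical filter length
SAMPLE_RATE_HZ = 96000  # hypothetical converter rate before downsampling

taps = windowed_sinc_lowpass(NUM_TAPS, cutoff=0.25)
peak_index = max(range(NUM_TAPS), key=lambda i: abs(taps[i]))
delay_ms = linear_phase_delay_ms(NUM_TAPS, SAMPLE_RATE_HZ)
print(f"impulse response peaks at tap {peak_index} of {NUM_TAPS}")
print(f"linear-phase delay: {delay_ms:.3f} ms")
```

The main lobe sits in the middle of the impulse response, so half of the
filter has to pass before the bulk of the signal comes out. A
minimum-phase design with a similar magnitude response puts its energy
near the start instead, which is where the latency saving comes from.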

-- 
David Kastrup
Received on Wed Jan 6 04:15:02 2021