Re: [linux-audio-dev] peakfiles and EDL's

Subject: Re: [linux-audio-dev] peakfiles and EDL's
From: Tom Pincince (stillone_AT_snowcrest.net)
Date: Sun Feb 25 2001 - 23:50:11 EET


> might occur at sample 3786). So how can we possibly decide what
> max/min values to use for the 2nd chunk of 2048 samples in the audio
> stream? It's presumably based on both files, but can we determine it
> without reading the actual audio data for that part of the audio
> stream?
>
>
> And it gets worse: what happens if the inserted material is not
> aligned with the first sample of the second file, but is offset. Now,
> every precomputed max/min pair for this file is essentially useless
> because they are "out of phase" with the way the audio is actually
> being used.
>
I can see how one's mind could become wrapped up in this. However,
accuracy is not the issue with peakfiles. Their resolution is so crude
relative to the actual audio that their only purpose is to provide a
general overview for approximately locating audio
events. At this level of detail, the issues you are looking at become
non-issues. Detailed edits must be made at a zoom level that does not
use peakfiles. How general can peakfiles be and still be useful? Very
general. Consider that a 10 minute file may be represented on screen as
being 400 pixels wide at a particular zoom level. Most of your peakfile
samples are not even being used. Digidesign's method designates a
particular zoom level as the maximum at which peakfiles can be used.
Zooming in past this level requires that every displayed waveform be
calculated from the soundfile, but only the segment of the waveform
that is actually on screen gets computed. This is no big deal
because the zoom level that requires this results in only a few seconds
of audio taking up the whole display. If a region is dragged at this
zoom level by an amount greater than the width of the display, all other
displayed waveforms are computed on the fly, creating a stuttering kind
of scroll. Waveform scrolling during playback is only permitted when
the zoom level is low enough to permit the use of peakfiles.
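
In rough C, that decision might look like this (a sketch only; the
chunk size, function names, and threshold test are my own assumptions,
not Digidesign's actual code):

#define PEAK_CHUNK 2048   /* soundfile samples per peakfile sample */

void draw_from_peakfile(void);   /* hypothetical renderers, stand-ins */
void draw_from_soundfile(void);  /* for whatever the app provides     */

/* samples_per_pixel = visible soundfile samples / display width */
void draw_waveform(double samples_per_pixel)
{
    if (samples_per_pixel >= PEAK_CHUNK) {
        /* zoomed out: each pixel covers at least one whole peakfile
           sample, so the crude overview is sufficient */
        draw_from_peakfile();
    } else {
        /* zoomed in past the threshold: compute only the on-screen
           segment of the waveform directly from the soundfile */
        draw_from_soundfile();
    }
}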

I suggest the simplest method possible for constructing peakfiles and
using them in waveform displays. Each soundfile has its own peakfile.
Using your example, peakfile sample 0 represents soundfile samples 0 -
2047, peakfile sample 1 represents soundfile samples 2048 - 4095...
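
In rough C, with assumed 16-bit samples and the 2048-sample chunk from
your example (a sketch, not anything from a real implementation):

#include <stddef.h>

#define PEAK_CHUNK 2048

typedef struct { short min, max; } peak_t;

/* Build one min/max pair per PEAK_CHUNK soundfile samples.  Peakfile
   sample i covers soundfile samples i*PEAK_CHUNK .. i*PEAK_CHUNK+2047
   (the last chunk may be short).  Returns the peakfile length. */
size_t build_peakfile(const short *audio, size_t nframes, peak_t *peaks)
{
    size_t npeaks = (nframes + PEAK_CHUNK - 1) / PEAK_CHUNK;
    for (size_t i = 0; i < npeaks; i++) {
        size_t start = i * PEAK_CHUNK;
        size_t end = start + PEAK_CHUNK;
        if (end > nframes)
            end = nframes;
        peak_t p = { audio[start], audio[start] };
        for (size_t j = start + 1; j < end; j++) {
            if (audio[j] < p.min) p.min = audio[j];
            if (audio[j] > p.max) p.max = audio[j];
        }
        peaks[i] = p;
    }
    return npeaks;
}
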
When using peakfiles to produce waveform displays, any segment of a
region that begins within samples 0-2047 will use peakfile sample 0 as
the first sample in the waveform display. Now your issue is that the
region may begin on sample 2000 while the peak is located at sample
1234, so the displayed waveform includes a sample that is not even in
the region being used. This simple technique guarantees that, 50% of
the time, when the beginning of a region is not the beginning of the
soundfile, when a junction of two files does not fall exactly on a
multiple of 2048 samples from t=0, or when the end of a region is not
the end of the soundfile, the first and/or last sample in the waveform
display will be computed from a peakfile sample that is not actually
included in the region being displayed. This error occurs only in the
first and last pixel of the waveform display; all others will be
correct. If you synchronize the zoom resolution steps with the
peakfile sample rate, so that waveform display and peakfile samples
stay in phase with each other, then when you zoom out far enough that
one pixel covers more than one peakfile sample, you can choose the
highest-valued peakfile sample to represent that pixel, and that
peakfile sample may not be the first one. In that case the error
disappears.
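
Reusing PEAK_CHUNK and peak_t from the sketch above, the
pixel-to-peakfile mapping might look like this (names illustrative,
bounds checks omitted):

/* Peakfile sample index covering a given soundfile sample. */
static size_t chunk_of(size_t sample) { return sample / PEAK_CHUNK; }

/* Envelope drawn at pixel `pixel' of a region starting at soundfile
   sample region_start.  If the pixel spans several peakfile samples,
   take the widest excursion among them; note that the first and last
   pixel of a region may pull in a chunk whose audio partly lies
   outside the region -- the one-pixel error discussed above. */
peak_t pixel_peak(const peak_t *peaks, size_t region_start,
                  size_t samples_per_pixel, size_t pixel)
{
    size_t first = chunk_of(region_start + pixel * samples_per_pixel);
    size_t last = chunk_of(region_start
                           + (pixel + 1) * samples_per_pixel - 1);
    peak_t p = peaks[first];
    for (size_t i = first + 1; i <= last; i++) {
        if (peaks[i].max > p.max) p.max = peaks[i].max;
        if (peaks[i].min < p.min) p.min = peaks[i].min;
    }
    return p;
}

When one pixel covers more than one peakfile sample, this
automatically picks the loudest one, which is why the end-pixel error
disappears at those zoom levels.
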
Regarding the display of audio at a junction, simply treat the two
segments as separate regions with their own adjacent waveform
displays. The worst case occurs when the zoom level has 1 pixel = 2048
samples (since at greater zoom levels you will have to compute the
display from the soundfile instead of the peakfile). In your example,
pixel 0 displays peakfile 1 sample 0. Pixel 1 displays either peakfile
1 sample 1 or peakfile 2 sample 0, whichever is greater (or peakfile 1
sample 1 automatically, to keep things simple). Pixel 2 displays
peakfile 2 sample 0 (because this segment begins with a sample between
0 and 2047). Pixel 3 displays peakfile 2 sample 1... Regarding "out of
phase": the shift results in a waveform that is positioned incorrectly
by at most one pixel to the left or right, and that is simply not a
big deal for a generalized waveform display that shows only the
envelope of the sound and not the actual wave. None of these errors is
audible, since they affect only the display, and they disappear as
soon as you zoom in enough to trigger a precise recalculation of the
display from the soundfile. All of my final edits are done at that
level of zoom, so I am completely unconcerned with the accuracy of the
waveform overview display as long as it gets me within 5 seconds of
the edit I want to make. If form follows function, I can't see why you
would design a waveform display system that offers higher resolution
than this from peakfile information.

> >instances. Do you think that resampling the peak would hurt? It is hard
>
>
> Well, it depends. The model at this point is that each raw audio file
> has one corresponding peakfile. Since each raw audio file can be
> used many times, with different potential "peak phase" choices, that
> would mean generating (potentially) N-1 different peakfiles (where N
> is the number of samples per peak). This seems like a bad idea.
>
Stick with one peakfile per soundfile and live with the phase error,
unless I am completely missing something and the phase error actually
does more than shift the display by a maximum of one pixel (in which
case, please tell me what I am missing).

> However, in thinking about the "objects" (Samplitude) / "regions"
> (ProTools) model, I can see how they get this to work: you compute
> the peak data for each object/region, and it *never* changes because
> the object/region is atomic (you can't subdivide it without creating a
> new object/region). Hmm. There are other reasons for moving toward
> this model, but this might be the killer.
>
I don't know about PT, but in Session this is not true. The peakfile
is stored as header info in the SDII file format. I have a very slow
computer, so I always turn off the "calculate waveform overview"
feature. Regions are then displayed as empty boxes, unless I zoom in
to the point where the actual waveform is calculated. If for some
reason I decide that I want overview info for a particular file, I
select it and choose "calculate selected waveform overview". If I
start session A, record soundfiles 1 and 2, and calculate the waveform
overview for only soundfile 1, then close session A, start a new
session B, and import soundfiles 1 and 2 into it, then when I drag
regions 1 and 2 into tracks of session B, region 1 has waveform info
and region 2 does not. This means that Session writes the waveform
overview data into the soundfile's header, where any compatible app
can read it, and that Session uses only this data when computing
waveform displays. It does not compute new peak data for new regions;
my slow computer confirms this. If I record a 10 minute soundfile, the
peakfile takes about 1 minute to compute, yet if I create a new region
by trimming the first and last 30 seconds from the complete 10 minute
region, the new region's waveform display is available immediately. It
is also possible to change a region without generating a new one, if
it is the only instance of a region that was itself created by
modifying a pre-existing region.

Tom


