[linux-audio-dev] Advice on detecting sentence boundaries


Subject: [linux-audio-dev] Advice on detecting sentence boundaries
From: Krzysztof Kowalczyk (krzysztofk_AT_pobox.com)
Date: Fri Nov 17 2000 - 01:39:30 EET


I realize this is not strictly about filters, but...

Here's what I would like to do: given a long WAV file of spoken text
(e.g. a digitized audiobook), I would like to play separate sentences
one after another. To do that I need to detect where each sentence ends
(which in well-read text is marked by a longer moment of silence).
Purpose: when you try to learn the pronunciation of a foreign language,
the narrator usually goes too fast for you to follow and repeat what he's
saying. Extending the pauses between sentences would fix that (pausing
a Walkman by hand doesn't really cut it).

I have a feeling that it's not terribly difficult to do, and I'm willing
to do my homework (I have basic knowledge of signal processing), but I
would appreciate it if more knowledgeable people could point me in the
right direction. My questions:
- does anyone know of something like this that already exists, either as
software or as papers that describe it?
- what would be the best way to go: does writing this as a LADSPA plugin
make sense, or would I be better off just reading the data directly from
disk (using libsndfile or libaudiofile) and operating on it?
- what kind of processing does it involve: do I need to do DSP at all, or
can I just go the simple way, e.g. say that a sentence boundary is where
the average of the past n samples is below a certain low threshold (on
the (naive?) assumption that silence is when samples in the time domain
are near 0)?
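
To make the last question concrete, here is a minimal sketch of that
naive approach in Python. It is only an illustration under stated
assumptions, not a tested solution: the function name find_pauses, the
20 ms window, the 0.02 threshold and the 400 ms minimum pause are all
made-up values, and the input is assumed to be mono samples already
scaled to the range -1.0..1.0. One deliberate change from the wording
above: it averages the absolute value of the samples, since a plain
average of audio samples sits near 0 even when the signal is loud.

```python
import math


def find_pauses(samples, rate, window_ms=20, threshold=0.02, min_pause_ms=400):
    """Return (start, end) sample-index pairs of long quiet stretches.

    samples: mono samples as floats in [-1.0, 1.0] (assumed format)
    rate:    sample rate in Hz
    The default values are illustrative guesses, not tuned settings.
    """
    win = max(1, rate * window_ms // 1000)           # samples per window
    min_windows = max(1, math.ceil(min_pause_ms / window_ms))
    n_windows = len(samples) // win                  # trailing partial window is ignored

    pauses = []
    run_start = None  # index of the first quiet window in the current run
    for w in range(n_windows + 1):
        if w < n_windows:
            chunk = samples[w * win:(w + 1) * win]
            # Mean absolute amplitude of this window.
            quiet = sum(abs(s) for s in chunk) / win < threshold
        else:
            quiet = False  # sentinel pass: closes a quiet run at end of input

        if quiet and run_start is None:
            run_start = w
        elif not quiet and run_start is not None:
            # Keep the run only if it is long enough to count as a pause.
            if w - run_start >= min_windows:
                pauses.append((run_start * win, w * win))
            run_start = None
    return pauses


# Synthetic check: 1 s of tone, 0.5 s of silence, 1 s of tone at 8 kHz.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 220 * n / rate) for n in range(rate)]
signal = tone + [0.0] * (rate // 2) + tone
print(find_pauses(signal, rate))
```

On a real recording the silence is never exactly 0, so the threshold
would have to be chosen relative to the background noise level, and the
minimum pause length decides whether a gap counts as a sentence boundary
or just a breath between words. Loading the file is left out here; the
samples could come from libsndfile, libaudiofile or any other reader.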

Thanks for any advice you might provide.

Krzysztof Kowalczyk

http://www.fifthgate.org



This archive was generated by hypermail 2b28 : Fri Nov 17 2000 - 09:02:14 EET