Subject: Re: [Fwd: [linux-audio-dev] Command parser?]
From: Paul Winkler (slinkp23_AT_yahoo.com)
Date: Wed Aug 23 2000 - 18:31:37 EEST
> Maurizio Umberto Puxeddu wrote:
> > You should have a look at the Python standard library, in particular
> > modules like re (regular expressions) and cmd (a framework for command
> > interpreters), but there are modules for audio data handling too!
I'm also a great fan of Python for this sort of thing. But while
regular expressions are extremely powerful, they're also really
gunky-looking. It's annoying to have to come back to your code a
week later and try to remember what you meant by
    m{ \(
          (
            (?> [^()]+ )
          |
            \( [^()]* \)
          )+
       \)
     }x
(snipped from the perlre man page).
But there's a Better Way(tm).
With Python you can use the reverb module.
It allows you to create regular expressions in a syntax that is
actually readable by humans instead of "resembling quoted line
noise".
Reverb (Regular Expressions - Verbose) can be downloaded from
http://pobox.com/~JasonHarper/
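To give a feel for the idea: a builder of this kind can be little more
than a handful of functions that each return a piece of ordinary regex
source. The sketch below is not reverb's code or its API. The helper
names are made up for illustration (they only happen to resemble the
names used in the script that follows), and it relies on nothing but
the standard re module.

```python
import re

# Hypothetical helpers, NOT reverb's real API: each one just returns a
# fragment of ordinary regular-expression source text.
def text(s):
    """Match the string s literally."""
    return re.escape(s)

def optional(p):
    """Match p zero or one time."""
    return "(?:%s)?" % p

def repeated(p, lo, hi):
    """Match p at least lo and at most hi times."""
    return "(?:%s){%d,%d}" % (p, lo, hi)

def group(p, name):
    """Capture p so it can be looked up by name later."""
    return "(?P<%s>%s)" % (name, p)

digit = r"\d"

# "(212)" or "212", followed by an optional dash; reads left to right.
area_code = (optional(text("(")) +
             group(repeated(digit, 3, 3), "area") +
             optional(text(")")) +
             optional(text("-")))

pattern = re.compile(area_code)
match = pattern.search("call (212)-555 today")
print(match.group("area"))
```

The point is only that the pieces have names and compose with "+", so
the intent survives a week away from the code.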
Here's some example code. This searches a string and spits out
everything that looks like a US-style phone number. It's pretty
complex but it does quite a good job - it can deal with almost any
typical way people type US phone numbers, and it actually breaks the
numbers down into their component parts.
#!/usr/bin/env python
# Make nice regular expressions instead of gobbledygook.
from reverb import *
# Let's come up with some really ugly variations...
test_string = """
(914) 657-0000
212- 300-8299 ,
555 1212
666- 3434
1 888 777 - 2121 , 1 -( 246) 732 2222
1 222 333 4444 ,1 (444) 888
2000
"""
# Regular expression to extract plain old phone numbers.
# This is quite complicated, but it needs to deal with all of the above.
# Some explanations:
# group() means you can later refer to this part of the regexp by name.
# The numeric arguments to repeated() give the minimum and maximum
# number of repeats, so repeated(digit, 3, 3) matches exactly 3 digits.
# The last argument to RE() is a pair of flags that affect its behavior.
# IGNORECASE does the obvious; DOTALL is the standard re flag that lets
# "." match newlines as well.
phonegrabber = RE(
    wordbreak +
    group(
        optional(
            wordbreak +
            group(digit, name="prefix") +
            optional(whitespace) +
            optional("-")
        ) +
        optional(whitespace) +
        optional(
            optional("(") +
            group(
                optional(whitespace) +
                repeated(digit, 3, 3) +
                optional(whitespace),
                name="area") +
            optional(text(")") | "-") +
            optional(whitespace)
        ) +
        group(
            optional(whitespace) +
            repeated(digit, 3, 3) +
            optional(whitespace),
            name="exchange") +
        required("-" | whitespace) +
        group(repeated(digit, 4, 4), name='last_4'),
        name='phone_number') +
    optional(whitespace),
    IGNORECASE | DOTALL
)
############ Now do the actual work. ###################
def test():
    global test_string
    phones = phonegrabber.findall(test_string)
    # That returns a list of tuples containing all the matched groups.
    # The string format % operator will insert values from a tuple into
    # a string.
    for p in phones:
        print """Number: %s
Prefix: %s Area: %s Exchange: %s Number: %s
""" % p
    # Now remove all numbers from the string.
    test_string = phonegrabber.sub('', test_string)
    # No more numbers ... let's make sure by examining the string.
    print "Leftovers:\n", test_string

test()
# End of script.
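For anyone without reverb installed, roughly the same pattern can be
written with the standard re module and its VERBOSE flag, which also
allows whitespace and comments inside the expression. This is only a
sketch: it assumes that reverb's wordbreak, digit and
optional(whitespace) correspond to \b, \d and \s*, which has not been
checked against reverb itself. The sample string is made up, and the
IGNORECASE and DOTALL flags are left out because this pattern contains
no letters and no ".".

```python
import re

# Assumed translations of the reverb names (not checked against reverb):
#   wordbreak -> \b, digit -> \d, optional(whitespace) -> \s*
phone_re = re.compile(r"""
    \b
    (?P<phone_number>
        (?: \b (?P<prefix> \d ) \s* -? )?       # optional one-digit prefix
        \s*
        (?: \(?                                 # optional area code
            (?P<area> \s* \d{3} \s* )
            (?: \) | - )?
            \s*
        )?
        (?P<exchange> \s* \d{3} \s* )
        (?: - | \s+ )                           # required separator
        (?P<last_4> \d{4} )
    )
    \s*
""", re.VERBOSE)

sample = "(914) 657-0000, 555 1212, 1 212 555 1212"

# findall() returns one tuple per match, one item per group, in the
# order the groups open; groups that did not take part come back as "".
# The groups may carry stray whitespace, so strip each part.
phones = [tuple(part.strip() for part in p) for p in phone_re.findall(sample)]
for p in phones:
    print("Number: %s\n  Prefix: %s Area: %s Exchange: %s Number: %s" % p)

# Strip the matched numbers out and see what is left behind.
leftovers = phone_re.sub("", sample)
print("Leftovers: %r" % leftovers)
```

Named groups plus comments recover a good part of the readability,
though the punctuation is still regex punctuation.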
--
................. paul winkler ..................
slinkP arts: music, sound, illustration, design, etc.
web page: http://www.slinkp.com
A member of ARMS: http://www.reacharms.com