Re: [Fwd: [linux-audio-dev] Command parser?]


Subject: Re: [Fwd: [linux-audio-dev] Command parser?]
From: Paul Winkler (slinkp23_AT_yahoo.com)
Date: Wed Aug 23 2000 - 18:31:37 EEST


> Maurizio Umberto Puxeddu wrote:
> > You should have a look at the Python standard library, in particular
> > modules like re (regular expressions) and cmd (a framework for command
> > interpreters), but there are modules for audio data handling too!
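
For anyone who hasn't met the cmd module mentioned above, here is a
minimal sketch of a command interpreter built on it. The class name,
the two commands and the volume attribute are all made up for
illustration; only cmd.Cmd, onecmd() and the do_* naming convention
come from the standard library.

```python
import cmd
import io


class AudioShell(cmd.Cmd):
    """Hypothetical interpreter; the commands are invented for this sketch."""

    prompt = "audio> "

    def __init__(self, stdout=None):
        super().__init__(stdout=stdout)
        self.volume = 50  # pretend state that the commands manipulate

    def do_volume(self, arg):
        """volume N: set the (pretend) volume to N percent."""
        try:
            self.volume = max(0, min(100, int(arg)))
        except ValueError:
            self.stdout.write("usage: volume N\n")
            return
        self.stdout.write("volume is now %d\n" % self.volume)

    def do_quit(self, arg):
        """quit: leave the interpreter."""
        return True  # a true return value ends cmdloop()


# Feed one line to the parser without starting the interactive loop.
out = io.StringIO()
shell = AudioShell(stdout=out)
shell.onecmd("volume 80")
print(out.getvalue())
```

Calling AudioShell().cmdloop() instead would start an interactive
prompt, and "help volume" would print the method's docstring.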

I'm also a great fan of python for this sort of thing. But while
regular expressions are extremely powerful, they're also really
gunky-looking. It's annoying to have to come back to your code a
week later and try to remember what you meant by
                     m{ \(
                           (
                             (?> [^()]+ )
                           |
                             \( [^()]* \)
                           )+
                        \)
                      }x

(snipped from the perlre man page).
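
For reference, that pattern matches a parenthesized group that may
contain one level of nested parentheses. Below is a rough Python
sketch of the same thing using the standard re module's VERBOSE flag.
It is an approximation, not the perlre original: I have dropped the
atomic group (?> ... ), which Python's re only accepts from 3.11 on,
so this version can backtrack badly on long unbalanced input. The
variable names are mine.

```python
import re

# Approximate Python version of the perlre pattern, without the atomic group.
balanced = re.compile(r"""
    \(                    # an opening parenthesis
    (?:
        [^()]+            # a run of characters that are not parentheses
      |
        \( [^()]* \)      # or one inner pair of parentheses, itself flat
    )+
    \)                    # the closing parenthesis
""", re.VERBOSE)

match = balanced.search("f(a, (b), c)")
print(match.group(0))
```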

But there's a Better Way(tm).
With Python you can use the reverb module.
It allows you to create regular expressions in a syntax that is
actually readable by humans instead of "resembling quoted line
noise".
Reverb (Regular Expressions - Verbose) can be downloaded from
http://pobox.com/~JasonHarper/

Here's some example code. This searches a string and spits out
everything that looks like a US-style phone number. It's pretty
complex but it does quite a good job - it can deal with almost any
typical way people type US phone numbers, and it actually breaks the
numbers down into their component parts.

#!/usr/bin/env python

# Make nice regular expressions instead of gobbledygook.
from reverb import *

# Let's come up with some really ugly variations...
test_string = """
(914) 657-0000
   212- 300-8299 ,

555 1212
666- 3434
 1 888 777 - 2121 , 1 -( 246) 732 2222
1 222 333 4444 ,1 (444) 888
2000
"""

# Regular expression to extract plain old phone numbers.
# This is quite complicated, but it needs to deal with all of the above.

# Some explanations:

# group() means you can later refer to this part of the regexp by name.

# The numeric arguments to repeated() give the minimum and maximum
# number of repeats, so repeated(digit, 3, 3) matches exactly 3 digits.

# The last argument to RE() combines two flags that affect its behavior.
# IGNORECASE does the obvious. DOTALL, assuming it is the standard re
# module flag, lets the "any character" pattern match newlines too.

phonegrabber= RE(wordbreak +
                 group(optional(wordbreak + group(digit, name="prefix") +
                                optional(whitespace) +
                                optional("-")
                                ) + optional(whitespace) +
                       optional(optional("(") +
                                group(optional(whitespace) +
                                      repeated(digit, 3, 3) +
                                      optional(whitespace)
                                      , name="area") +
                                optional( text(")") | "-") +
                                optional(whitespace)
                                ) +
                       group(optional(whitespace) +
                             repeated(digit, 3, 3) +
                             optional(whitespace)
                             , name="exchange") +
                       required("-" | whitespace) +
                       group(repeated(digit, 4, 4)
                             , name='last_4')
                       , name='phone_number') +
                 optional(whitespace)
                 , IGNORECASE | DOTALL
                 )

############ Now do the actual work. ###################

def test():
    global test_string
        
    phones = phonegrabber.findall(test_string)
    # That returns a list of tuples containing all the matched groups.
    # The string format % operator will insert values from a tuple into
    # a string.
    for p in phones:
        # Parenthesized so the same line runs on Python 2 and Python 3.
        print("""Number: %s
   Prefix: %s Area: %s Exchange: %s Last 4: %s
 """ % p)

    # Now remove all numbers from the string.
    test_string = phonegrabber.sub('', test_string)
    # No more numbers ... let's make sure by examining the string.
    print("Leftovers:\n" + test_string)

test()

# End of script.
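
If reverb isn't installed, the same idea can be sketched with nothing
but the standard re module, using VERBOSE for readability and named
groups for the component parts. This is not a translation of the
reverb expression above: the pattern, the PHONE name and the
find_phone_numbers() helper are my own assumptions, and the pattern is
somewhat looser about separators.

```python
import re

# Hypothetical pattern, NOT a translation of the reverb expression above.
PHONE = re.compile(r"""
    (?<!\d)                          # do not start inside a longer digit run
    (?: (?P<prefix>1) [\s\-]* )?     # optional long-distance "1"
    (?:                              # optional area code, with or without ( )
        (?: \( \s* )?
        (?P<area>\d{3})
        \s* \)? [\s\-]*
    )?
    (?P<exchange>\d{3})              # three-digit exchange
    [\s\-]*                          # spaces and/or a dash
    (?P<last_4>\d{4})                # final four digits
    (?!\d)                           # do not stop inside a longer digit run
""", re.VERBOSE)


def find_phone_numbers(text):
    """Return one dict of named parts per phone number found in text."""
    return [m.groupdict() for m in PHONE.finditer(text)]


for parts in find_phone_numbers("555 1212, 1 888 777 - 2121"):
    print(parts)
```

Parts that are absent from a given number (for instance the area code
of "555 1212") come back as None, and PHONE.sub('', text) strips the
numbers out the same way phonegrabber.sub() does above.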

-- 
.................    paul winkler    ..................
slinkP arts:   music, sound, illustration, design, etc.
           web page:  http://www.slinkp.com
      A member of ARMS:   http://www.reacharms.com



This archive was generated by hypermail 2b28 : Wed Aug 23 2000 - 23:56:35 EEST