7.6 Use Parentheses to Extract Subpatterns

 

 

Parentheses let you write more complicated regular expressions, but their most important use may be extracting substrings from a matching string. Suppose you have a document in which negative numbers are written inside parentheses, for example, (45.32) or (2.94). Every number has exactly two digits to the right of the decimal point. There may or may not be digits to the left of the decimal point. Here is a pattern to match those numbers.

$LParen_$Number_$RParen_
The pattern relies on these preassigned subpatterns:
set LParen_ {\(}
set RParen_ {\)}
set Digit_ {[0-9]}
set Dot_ {\.}
set Number_ $Digit_*$Dot_$Digit_$Digit_
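The sketch below assembles the pattern from these subpatterns and tries it on a sample string. The names Pattern, Text, Found, and Match, and the sample text itself, are hypothetical and chosen only for illustration.

```tcl
# Subpatterns as defined above
set LParen_ {\(}
set RParen_ {\)}
set Digit_ {[0-9]}
set Dot_ {\.}
set Number_ $Digit_*$Dot_$Digit_$Digit_

# Combined pattern: a number wrapped in literal parentheses
set Pattern $LParen_$Number_$RParen_
puts $Pattern                   ;# prints \([0-9]*\.[0-9][0-9]\)

# Hypothetical sample text: a plain amount, then a parenthesized (negative) one
set Text "Credit 12.50, debit (45.32)"
set Found [regexp $Pattern $Text Match]
puts "$Found $Match"            ;# prints 1 (45.32)
```

The unparenthesized 12.50 is skipped because the pattern requires a literal left parenthesis before the digits.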

Now, suppose you want to search for a parenthesized negative number, extract the nonnegative number in the parentheses, and make it negative. There is a variation of regexp that will help:

regexp ?SWITCHES? PATTERN STRING VAR_NAME1 VAR_NAME2 ... VAR_NAMEn
As with the other forms of regexp, VAR_NAME1 is the name of the variable that will be assigned the entire matching substring. The remaining variables, VAR_NAME2 through VAR_NAMEn, are new. They are assigned substrings determined by where you place parentheses in your pattern.

VAR_NAME2 is the name of a variable that will be assigned the part of the matching substring that matches the subpattern inside the leftmost parentheses. VAR_NAME3 is the name of a variable that will be assigned the part that matches the subpattern inside the second parentheses from the left. And so on.

To decide which of two sets of parentheses is further to the left, look only at where their left (opening) parentheses appear. Ignore branches, nesting, and everything else: one set of parentheses comes before another if its left parenthesis appears earlier in the pattern.
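Here is a short sketch of the left-parenthesis rule. It uses hypothetical patterns written directly in braces, so Tcl passes the backslashes to regexp untouched. In the first pattern, the outer group's left parenthesis comes first, so that group fills VAR_NAME2 even though it closes last. The second pattern shows that branches do not change the numbering. According to the Tcl regexp documentation, a group that takes no part in the match is set to an empty string.

```tcl
# Hypothetical pattern: the outer group holds the whole number, and two
# inner groups hold the integer part and the two fraction digits.
set Pattern {\((([0-9]*)\.([0-9][0-9]))\)}
regexp $Pattern "(45.32)" Whole Num Int Frac
# Groups are numbered by their left parentheses: outer first, then inner.
puts "$Whole | $Num | $Int | $Frac"    ;# prints (45.32) | 45.32 | 45 | 32

# Branches do not change the numbering. Only the second group takes part
# in this match, so First is set to an empty string.
regexp {(x)|(y)} "y" Match First Second
puts "First='$First' Second='$Second'"  ;# prints First='' Second='y'
```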

The return value is 1 if the pattern matched and 0 if it did not.
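Because the result is 1 or 0, regexp usually appears directly as the condition of an if. In this sketch, the pattern is written in braces with the number part already parenthesized. The two sample strings are hypothetical.

```tcl
# Pattern written in braces so Tcl leaves the backslashes alone;
# the inner parentheses capture the number without its surrounding ( )
set Pattern {\(([0-9]*\.[0-9][0-9])\)}

# Hypothetical sample lines: one with a parenthesized number, one without
foreach Text [list "loss (2.94)" "gain 2.94"] {
    # regexp returns 1 on a match and 0 otherwise
    if {[regexp $Pattern $Text Junk Number]} {
        puts "matched: $Number"
    } else {
        puts "no match in: $Text"
    }
}
```

The first line prints `matched: 2.94` and the second prints `no match in: gain 2.94`.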

To extract the number part of the previous pattern, we need to put parentheses around it, something like this:

$LParen_($Number_)$RParen_

The variable LParen_ has been defined so that it matches a left parenthesis and is not treated as a special symbol by regexp. Unfortunately, in the string shown above, the left parenthesis is also a special symbol for Tcl. When Tcl substitutes variables, it reads $LParen_($Number_) as an element of an array named LParen_, indexed by the value of $Number_!
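The failure is easy to reproduce. This sketch repeats the subpattern definitions so that it runs on its own. It wraps the bad substitution in catch, so the script reports the error instead of stopping. The names Failed and Message are hypothetical.

```tcl
set LParen_ {\(}
set RParen_ {\)}
# Same value as the Number_ subpattern built earlier
set Number_ {[0-9]*\.[0-9][0-9]}

# Tcl reads $LParen_(...) as an element of an array named LParen_.
# LParen_ is an ordinary (scalar) variable, so the substitution fails;
# catch traps the error instead of aborting the script.
set Failed [catch {set Pattern $LParen_($Number_)$RParen_} Message]
puts $Failed     ;# prints 1
puts $Message    ;# the error message reports that LParen_ is not an array
```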

Tcl has ways of handling this problem. The left parenthesis could be protected with a backslash, or the variable name could be delimited with curly braces. Using the second trick, the regexp command looks like this:

 
regexp ${LParen_}($Number_)$RParen_ $Text Junk Number 
If a match is found, Junk will contain the entire matching substring, which we do not care about, and Number will contain the desired number.
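This sketch finishes the task set out at the start of this section. It builds the pattern both ways (with braces and with a backslash) and confirms that the two results are identical. It then extracts the number and negates it with expr. The sample text and the names Braced, Escaped, and Negative are hypothetical. The subpattern definitions repeat those given earlier so that the block runs on its own.

```tcl
set LParen_ {\(}
set RParen_ {\)}
set Digit_ {[0-9]}
set Dot_ {\.}
set Number_ $Digit_*$Dot_$Digit_$Digit_

# Hypothetical input text containing a parenthesized negative number
set Text "Net change: (45.32) this quarter"

# Two equivalent ways to keep Tcl from reading LParen_ as an array
set Braced ${LParen_}($Number_)$RParen_   ;# braces delimit the variable name
set Escaped $LParen_\($Number_)$RParen_   ;# backslash hides the left parenthesis
puts [string equal $Braced $Escaped]      ;# prints 1

if {[regexp $Braced $Text Junk Number]} {
    # Number holds 45.32; turn it into a negative value
    set Negative [expr {-$Number}]
    puts $Negative                        ;# prints -45.32
}
```

Using expr produces a numeric value. If you only need the text, `set Negative -$Number` works as well.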

Exercise 7.6a

A table is stored in an ASCII file. Each line contains three fields. First comes a label. Then come two positive (or at least nonnegative) integers giving the "before" and "after" values for the item named by the label. The fields are separated by blanks or tabs. The label can contain any character except a blank or tab, and it may be indented. Here is an example line:
BrandX    17    18

Assume that Line contains one of these lines. Write a regexp command that extracts the three things and places them in the variables Label, Before, and After.


 

 
