7.9  Solutions to Exercises

 

 

Solution To Exercise 7.2a

The \t will not be substituted with the nonprintable tab character unless you are running version 8.1 or later. Instead, the regular-expression evaluator will see it as the letter "t". Fix the problem this way:

set Space_ "\[ \t]"
The backslash before the left square bracket prevents the Tcl interpreter from doing command substitution.
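
If you want to see the effect for yourself, here is a small sketch to paste into tclsh. The test strings are my own illustrations, not part of the exercise.

```tcl
# Double quotes let Tcl turn \t into a real tab character;
# the backslash before [ stops command substitution.
set Space_ "\[ \t]"

# The pattern holds four characters: [, space, tab, and ]
puts [string length $Space_]     ;# prints 4

# It matches a tab (or a space) but not the letter t
puts [regexp $Space_ "a\tb"]     ;# prints 1
puts [regexp $Space_ "t"]        ;# prints 0
```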

Solution To Exercise 7.3a

The first part, $Pre1_, is not interpreted by the Tcl interpreter during preassignment. The backslashes tell the regular-expression translator that there are no special symbols. This part matches [0-9] exactly.

The second part, $Pre2_, is interpreted by the Tcl interpreter during preassignment. The regular-expression translator sees [0-9], which matches any single digit.

So, the whole regular expression pattern would match [0-9]3, but not 3[0-9].

The second preassignment to Pre2_ would cause an error because 0-9 is not a command name and so command substitution fails.

By the way, the preassignment to Pre1_ could have been written this way:

set Pre1_ {\[0-9]}
because the hyphen and the right square bracket are not considered to be special symbols by the regular-expression translator unless they follow a left square bracket.
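
The following sketch checks these claims. The two preassignments are my reconstruction from the discussion above, and the variable names Broken_, Failed, and Msg are made up for the illustration.

```tcl
# Braces: Tcl leaves the backslash alone, so the regexp engine
# sees \[0-9] and matches the five literal characters [0-9]
set Pre1_ {\[0-9]}

# Quotes: Tcl removes the backslash, so the regexp engine
# sees [0-9] and matches any single digit
set Pre2_ "\[0-9]"

puts [regexp $Pre1_$Pre2_ {[0-9]3}]   ;# prints 1
puts [regexp $Pre1_$Pre2_ {3[0-9]}]   ;# prints 0

# Without the backslash, Tcl tries to run 0-9 as a command
set Failed [catch {set Broken_ "[0-9]"} Msg]
puts $Failed                          ;# prints 1
puts $Msg
```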

Solution To Exercise 7.3b

regexp "^Tcl$" $Name
Variable substitution is not attempted when a symbol other than a letter, number, or underscore follows a dollar sign. This rule is consistent with what you have had to learn about safe variable names.
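
Here is a short sketch of the rule in action. The variable Tail and the second test string are assumptions added only for contrast.

```tcl
set Name Tcl

# The $ before the closing quote is left alone, so it reaches
# the regexp engine as the end-of-string anchor.
puts [regexp "^Tcl$" $Name]       ;# prints 1
puts [regexp "^Tcl$" "Tcl/Tk"]    ;# prints 0

# A $ followed by a letter does start a variable substitution.
set Tail cl
puts [regexp "^T$Tail$" $Name]    ;# prints 1
```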

Solution To Exercise 7.3c

set NoDot_ {[^\.]}
As it happens, the backslash is not necessary. Within square brackets, the only special symbols that are recognized are ^, -, and ]. I prefer to ignore this rule and do the backslash substitutions for nonalphameric characters. (The word "nonalphameric" is important here. Indeed, as of version 8.1, a backslash before a letter is either a request for a special backslash substitution, such as \t or \n, or an error.) If you want to take advantage of the rule, you should know that it even has a counterpart in glob pattern matching that I did not mention there.
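
A quick way to convince yourself that both spellings behave the same on ordinary text is the sketch below. The name NoDotPlain_ and the test strings are hypothetical.

```tcl
set NoDot_ {[^\.]}        ;# the solution as written
set NoDotPlain_ {[^.]}    ;# the same thing without the backslash

foreach Pat [list $NoDot_ $NoDotPlain_] {
    # Anchor the pattern so every character must be a non-dot
    puts [list [regexp "^$Pat*\$" abc] [regexp "^$Pat*\$" a.c]]
}
# Each pass prints: 1 0
```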

Solution To Exercise 7.3d

% regexp -indices "\[a-z]ab"  abab Match
1
% set Match
1 3
% regexp -indices t$ catbert Match
1
% set Match
6 6

Solution To Exercise 7.4a

regexp -indices $Space_$Quote_ {  "} Match
                                  Matches and Match is 1 2
regexp $Digit_.$Digit_ 201 Match  Matches and Match is 201
regexp $NoDot_*$Dot_ "Interesting. But not relevant." Match
                                  Matches and Match is Interesting.
regexp ".*" "" Match              Matches and Match is the empty string.

Solution To Exercise 7.4b

regexp catbert|cat catbert Match  Matches and Match is catbert
regexp cat|catbert catbert Match  Matches and 
                                  Match is cat in version 8.0 and earlier
                                  Match is catbert in version 8.1 and later
regexp c?t|at catbert Match       Matches and Match is at
regexp $NoLowerCase_*at|catbert Catbert Match
                                  Matches and Match is Cat
regexp $NoLowerCase_*bert|bert Catbert Match
                                  Matches and Match is bert
In the last one it is the leftmost branch that is used. Remember that the * repeater lets a quasichar match an empty string, that an imaginary empty string exists in front of each character in a string, and that when two matches are the same length the leftmost one prevails in all versions of Tcl.
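
The sketch below replays two of these cases and the empty-string point. It assumes version 8.1 or later and assumes that NoLowerCase_ was preassigned as shown; the x* test is my own addition.

```tcl
set NoLowerCase_ {[^a-z]}   ;# assumed definition

# Both branches find "bert" starting at the same place
regexp $NoLowerCase_*bert|bert Catbert Match
puts $Match                 ;# prints bert

# In 8.1 and later the longer alternative wins
regexp cat|catbert catbert Match
puts $Match                 ;# prints catbert

# A starred quasichar is satisfied by the empty string in front of "a"
puts [regexp {x*} abc Match]        ;# prints 1
puts [string length $Match]         ;# prints 0
```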

Solution To Exercise 7.4c

set CarriageRet_ "\n"
set NoCarriageRet_ "\[^\n]"
regexp "^$NoCarriageRet_*$CarriageRet_" $Str Match

Solution To Exercise 7.5a

This,

regexp "(cat | dog)*bert"  catdogbert Match
returns 1, but the "(cat | dog)*" part had to match an empty string because there is no space after the "cat" or before the "dog" in "catdogbert"; Match is "bert".
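
To see that the spaces inside the parentheses are part of each branch, compare the string above with one that supplies them. The second string, with two spaces, is my own example.

```tcl
# No spaces in the string: the starred group matches nothing
regexp "(cat | dog)*bert" catdogbert Match
puts "<$Match>"     ;# prints <bert>

# "cat " and " dog" both appear, so the group matches twice
regexp "(cat | dog)*bert" "cat  dogbert" Match
puts "<$Match>"     ;# prints <cat  dogbert>
```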

This,

 
regexp "($NoLetter_+|nil) + ($NoLetter_+|nil)" "Answer: 2.6 +nillem" Match 
returns 0. The + does not match the "+" in the string because it is a repeater. The match you may have thought you were getting happens with this version:
set Plus_ {\+}
regexp "($NoLetter_+|nil) $Plus_ ($NoLetter_+|nil)" "Answer: 2.6 + nillem" 
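
Here is a sketch that runs both patterns against the same string. It assumes NoLetter_ was preassigned as shown, which may differ from the exercise.

```tcl
set NoLetter_ {[^a-zA-Z]}   ;# assumed definition
set Plus_ {\+}
set Str "Answer: 2.6 + nillem"

# " + " means one or more spaces followed by one more space,
# and the string never has two spaces in a row.
puts [regexp "($NoLetter_+|nil) + ($NoLetter_+|nil)" $Str]          ;# prints 0

# With the plus sign escaped, the pattern finds the arithmetic.
puts [regexp "($NoLetter_+|nil) $Plus_ ($NoLetter_+|nil)" $Str Match]  ;# prints 1
puts "<$Match>"             ;# prints <: 2.6 + nil>
```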

This,

regexp -nocase "^(From:|To:) *$OkChar_+$" \
       "From: jazimmer@acm.org\n" \
       Match
returns 0. Here it is the \n that causes the trouble. The $ in the pattern does not match in front of it because the \n marks the end of a line, not the end of the string. This string, "From: jazimmer@acm.org", would match just fine.
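
The sketch below shows the failure and one possible workaround, trimming the newline before matching. The definition of OkChar_ is an assumption, and Pat is a made-up name.

```tcl
set OkChar_ {[a-z0-9@.]}    ;# assumed definition
set Pat "^(From:|To:) *$OkChar_+$"

# The trailing newline keeps $ from matching
puts [regexp -nocase $Pat "From: jazimmer@acm.org\n"]   ;# prints 0

# Without it the whole string matches
puts [regexp -nocase $Pat "From: jazimmer@acm.org"]     ;# prints 1

# One workaround: strip trailing newlines first
set Clean [string trimright "From: jazimmer@acm.org\n" "\n"]
puts [regexp -nocase $Pat $Clean]                       ;# prints 1
```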

Solution To Exercise 7.5b

proc getSummary String {
  set Beginning_ "(^|\n)"
  set Space_ "\[ \t]"
  set InLine_ "\[^\n]"
  if [regexp "$Beginning_$Space_*Summary$InLine_*" $String Line] {
     return [string trim $Line "\n \t"]
  } else {
     return ""
  }
}

Here is the way it is done using parentheses to extract subpatterns, as described earlier in "Use Parentheses to Extract Subpatterns":

proc getSummary String {
  set Beginning_ "(^|\n)"
  set Space_ "\[ \t]"
  set InLine_ "\[^\n]"
  if [regexp "$Beginning_$Space_*(Summary$InLine_*)" $String \
             Junk1 Junk2 Summary] \
  {
     return $Summary
  } else {
     return ""
  }
}
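
A usage sketch follows. It repeats the parenthesized version of the procedure, with the condition braced, so that it can be pasted into tclsh on its own; the Report text is invented for the illustration.

```tcl
proc getSummary String {
  set Beginning_ "(^|\n)"
  set Space_ "\[ \t]"
  set InLine_ "\[^\n]"
  # Junk1 gets the whole match, Junk2 the beginning-of-line part
  if {[regexp "$Beginning_$Space_*(Summary$InLine_*)" $String \
              Junk1 Junk2 Summary]} {
     return $Summary
  }
  return ""
}

set Report "Title: Q3\n  Summary: sales rose\nDetails follow"
puts [getSummary $Report]                          ;# prints Summary: sales rose
puts [string length [getSummary "no such line"]]   ;# prints 0
```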

Solution To Exercise 7.6a

set Space_ "\[ \t]"
set Labl_ "\[^ \t]+"
set Int_ {[0-9]*}
regexp "$Space_*($Labl_)$Space_+($Int_)$Space_+($Int_)" $Line \
       Junk Label Before After
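
Here is the solution applied to a sample line. The contents of Line are hypothetical; the exercise may have used a different layout.

```tcl
set Space_ "\[ \t]"
set Labl_ "\[^ \t]+"
set Int_ {[0-9]*}

# Two leading spaces, a label, a tab, and two integers
set Line "  widgets\t12   15"

regexp "$Space_*($Labl_)$Space_+($Int_)$Space_+($Int_)" $Line \
       Junk Label Before After
puts [list $Label $Before $After]    ;# prints widgets 12 15
```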

Solution To Exercise 7.7a

regsub -all & $Str && Str
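
This works because an & in the substitution argument stands for the matched text, so && writes each ampersand twice. A sketch with a made-up string follows; the escaped form at the end is included only for contrast, and Count, Src, and Out are hypothetical names.

```tcl
set Str "R&D & QA"
set Count [regsub -all & $Str && Str]
puts $Count     ;# prints 2
puts $Str       ;# prints R&&D && QA

# A backslash makes the ampersand in the substitution literal
set Src "R&D"
regsub -all & $Src {\&amp;} Out
puts $Out       ;# prints R&amp;D
```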

Solution To Exercise 7.7b

set ToLft_ "^|\[^a-zA-Z]"
set ToRght_ "\[^a-zA-Z]|$"
regsub -all ($ToLft_)cat($ToRght_) $Str \\1dog\\2 Str

regsub -all ($ToLft_)cat(s?)($ToRght_) $Str \\1dog\\2\\3 Str
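
The following sketch runs both substitutions on a string of my own choosing. It writes the results to the hypothetical variables Singular and Both so that Str stays unchanged between the two calls.

```tcl
set ToLft_ "^|\[^a-zA-Z]"
set ToRght_ "\[^a-zA-Z]|$"
set Str "cat concat cats catalog"

# Only the free-standing word "cat" is replaced
regsub -all ($ToLft_)cat($ToRght_) $Str \\1dog\\2 Singular
puts $Singular    ;# prints dog concat cats catalog

# The optional s lets "cats" become "dogs" as well
regsub -all ($ToLft_)cat(s?)($ToRght_) $Str \\1dog\\2\\3 Both
puts $Both        ;# prints dog concat dogs catalog
```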
 

 
