7.9 Solutions to Exercises
The \t will not be substituted with the nonprintable tab character unless you are running version 8.1 or later. Instead, the regular expression evaluator will see it as a letter "t." Fix the problem this way:
set Space_ "\[ \t]"
The backslash before the left square bracket prevents the Tcl interpreter from doing command substitution.
The first part, $Pre1_, is not interpreted by the Tcl interpreter during preassignment. The backslashes tell the regular-expression translator that there are no special symbols. This part matches [0-9] exactly. The second part, $Pre2_, is interpreted by the Tcl interpreter during preassignment. The regular-expression translator sees [0-9], which matches any single digit. So, the whole regular expression pattern would match [0-9]3, but not 3[0-9]. The second preassignment to Pre2_ would cause an error because 0-9 is not a command name and so command substitution fails. By the way, the preassignment to Pre1_ could have been written this way:
set Pre1_ {\[0-9]}
because the hyphen and the right square bracket are not considered to be
special symbols by the regular-expression translator unless they follow a left
square bracket.
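The two quoting styles discussed above can be compared directly. This is a minimal sketch, assuming Tcl 8.1.1 or later (for string equal); the names Digit1_, Digit2_, and Literal_ are made up for illustration and do not come from the exercises.

```tcl
# Hypothetical names; only the quoting technique comes from the text.
set Digit1_ "\[0-9]"   ;# backslash blocks command substitution inside quotes
set Digit2_ {[0-9]}    ;# braces block every kind of substitution
set SamePattern [string equal $Digit1_ $Digit2_]

# Inside braces the backslash survives and reaches the regexp translator,
# so this pattern matches the literal text [0-9] rather than a digit.
set Literal_ {\[0-9]}
set HitsLiteral [regexp $Literal_ {x[0-9]y}]
set HitsDigit   [regexp $Literal_ {x5y}]
puts "$SamePattern $HitsLiteral $HitsDigit"   ;# prints: 1 1 0
```

Braces are usually the easier choice when the pattern contains no variables, because only the regular-expression translator then has to be considered.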
regexp "^Tcl$" $Name
Variable substitution is not attempted when a symbol other than a letter, number, or underscore follows a dollar sign. This rule is consistent with what you have had to learn about safe variable names.
set NoDot_ {[^\.]}
As it happens, the backslash is not necessary. Within square brackets, the
only special symbols that are recognized are ^, -, and ]. I
prefer to ignore this rule and do the backslash substitutions for
nonalphameric characters. (The word "nonalphameric" is important here.
Indeed, with version 8.1, a backslash of a letter is either a request for a
special backslash substitution, such as \t or \n, or an error.) If
you want to take advantage of it, you should know that the rule even has a
counterpart with glob pattern matching that I did not mention there.
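As a quick check of the claim that the backslash is optional inside square brackets, the following sketch compares both spellings. It assumes Tcl 8.1 or later; NoDotPlain_ and the sample string are invented for illustration.

```tcl
# Hypothetical comparison; NoDotPlain_ is not a name from the text.
set NoDot_      {[^\.]}   ;# the form used in the solution
set NoDotPlain_ {[^.]}    ;# same set of characters, without the backslash
regexp "$NoDot_+"      "abc.def" Match1
regexp "$NoDotPlain_+" "abc.def" Match2
puts "$Match1 $Match2"   ;# prints: abc abc
```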
% regexp -indices "\[a-z]ab" abab Match
1
% set Match
1 3
% regexp -indices t$ catbert Match
1
% set Match
6 6
regexp -indices $Space_$Quote_ { "} Match
Matches and Match is 1 2
regexp $Digit_.$Digit_ 201 Match
Matches and Match is 201
regexp $NoDot_*$Dot_ "Interesting. But not relevant." Match
Matches and Match is Interesting.
regexp ".*" "" Match
Matches and Match is the empty string.
regexp catbert|cat catbert Match
Matches and Match is catbert
regexp cat|catbert catbert Match
Matches and
Match is cat in version 8.0 and earlier
Match is catbert in version 8.1 and later
regexp c?t|at catbert Match
Matches and Match is at
regexp $NoLowerCase_*at|catbert Catbert Match
Matches and Match is Cat
regexp $NoLowerCase_*bert|bert Catbert Match
Matches and Match is bert
In the last one it is the leftmost branch that is used. Remember that the
* repeater lets a quasichar match an empty string, an imaginary empty
string exists at the front of each character in a string, and that when two
matches are the same length the leftmost one prevails in all versions of Tcl.
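The rules just listed can be tried out in one short script. This is a sketch that assumes Tcl 8.1 or later; NoLowerCase_ is rebuilt here from its name, since its original definition is not shown in this section.

```tcl
# Assumed definition: any character that is not a lowercase letter.
set NoLowerCase_ {[^a-z]}

regexp c?t|at catbert Match1                      ;# leftmost match wins
regexp "$NoLowerCase_*bert|bert" Catbert Match2   ;# star matches nothing
set Found [regexp ".*" "" Match3]                 ;# empty match still counts
puts "$Match1 $Match2 $Found [string length $Match3]"   ;# prints: at bert 1 0
```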
set CarriageRet_ "\n"
set NoCarriageRet_ "\[^\n]"
regexp "^$NoCarriageRet_*$CarriageRet_" $Str Match
This,
regexp "(cat | dog)*bert" catdogbert Match
returns 1, but the "(cat | dog)*" part had to match an empty string because there is no space before the "dog" in "catdogbert"; Match is "bert."
This,
regexp "($NoLetter_+|nil) + ($NoLetter_+|nil)" "Answer: 2.6 +nillem" Match
returns 0. The + does not match the "+" in the string because it is a repeater. The match you may have thought you were getting happens with this version:
set Plus_ {\+}
regexp "($NoLetter_+|nil) $Plus_ ($NoLetter_+|nil)" "Answer: 2.6 + nillem"
This,
regexp -nocase "^(From:|To:) *$OkChar_+$" \
"From: jazimmer@acm.org\n" \
Match
returns 0. Here it is the \n that causes the trouble. The $ in the
pattern does not match it because it is the end of a line, not the end of a
string. This string, "From: jazimmer@acm.org", would match just fine.
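One way to see the effect of the trailing newline is to run the pattern on the string before and after trimming it. This is only a sketch: OkChar_ is an assumed definition, since the original is not shown in this section, and trimming with string trimright is one possible remedy, not necessarily the one the exercise intended.

```tcl
# OkChar_ is an assumed definition; the original is not shown in this section.
set OkChar_ "\[a-z0-9@.]"
set Pattern "^(From:|To:) *$OkChar_+$"
set Line "From: jazimmer@acm.org\n"

set Raw     [regexp -nocase $Pattern $Line]
set Trimmed [regexp -nocase $Pattern [string trimright $Line "\n"]]
puts "$Raw $Trimmed"   ;# prints: 0 1
```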
proc getSummary String {
    set Beginning_ "(^|\n)"
    set Space_ "\[ \t]"
    set InLine_ "\[^\n]"
    if [regexp "$Beginning_$Space_*Summary$InLine_*" $String Line] {
        return [string trim $Line "\n "]
    } else {
        return ""
    }
}
Here is the way it is done using parentheses to extract subpatterns as described above in Use Parentheses to Extract Subpatterns.
proc getSummary String {
    set Beginning_ "(^|\n)"
    set Space_ "\[ \t]"
    set InLine_ "\[^\n]"
    if [regexp "$Beginning_$Space_*(Summary$InLine_*)" $String \
            Junk1 Junk2 Summary] \
    {
        return $Summary
    } else {
        return ""
    }
}
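A short usage sketch may help here. It repeats the parenthesized version of getSummary, with a braced condition and no else branch, so that it runs on its own; the sample text is invented for illustration.

```tcl
proc getSummary String {
    set Beginning_ "(^|\n)"
    set Space_ "\[ \t]"
    set InLine_ "\[^\n]"
    # Junk1 receives the whole match and Junk2 the line-start group.
    if {[regexp "$Beginning_$Space_*(Summary$InLine_*)" $String \
            Junk1 Junk2 Summary]} {
        return $Summary
    }
    return ""
}
# Sample text is invented for illustration.
set Text "Title: Regular Expressions\n  Summary: patterns in Tcl\nBody text"
set Found   [getSummary $Text]
set Missing [getSummary "no such line here"]
puts "<$Found> <$Missing>"   ;# prints: <Summary: patterns in Tcl> <>
```

Because the second parenthesized group excludes the leading newline and spaces, no call to string trim is needed.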
set Space_ "\[ \t]"
set Labl_ "\[^ \t]+"
set Int_ {[0-9]*}
regexp "$Space_*($Labl_)$Space_+($Int_)$Space_+($Int_)" $Line \
Junk Label Before After
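The pattern above can be exercised on a sample line as follows. This is a sketch; the contents of Line are invented for illustration.

```tcl
set Space_ "\[ \t]"
set Labl_ "\[^ \t]+"
set Int_ {[0-9]*}
# Sample line invented for illustration: spaces and a tab as separators.
set Line "  widgets   12\t15"
regexp "$Space_*($Labl_)$Space_+($Int_)$Space_+($Int_)" $Line \
    Junk Label Before After
puts "$Label $Before $After"   ;# prints: widgets 12 15
```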
regsub -all & $Str && Str
set ToLft_ "^|\[^a-zA-Z]"
set ToRght_ "\[^a-zA-Z]|$"
regsub -all ($ToLft_)cat($ToRght_) $Str \\1dog\\2 Str
regsub -all ($ToLft_)cat(s?)($ToRght_) $Str \\1dog\\2\\3 Str
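To see how the two substitutions differ, each can be applied to the same sample string. This is a sketch; the sample sentence and the result names Singular and Plural are invented for illustration.

```tcl
set ToLft_ "^|\[^a-zA-Z]"
set ToRght_ "\[^a-zA-Z]|$"
# Sample sentence invented for illustration.
set Str "cat concat cats, cat"
regsub -all ($ToLft_)cat($ToRght_) $Str \\1dog\\2 Singular
regsub -all ($ToLft_)cat(s?)($ToRght_) $Str \\1dog\\2\\3 Plural
puts $Singular   ;# prints: dog concat cats, dog
puts $Plural     ;# prints: dog concat dogs, dog
```

Note that each match consumes the delimiter on its right, so two occurrences separated by a single character (as in "cat cat") would not both be replaced in one pass.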