Tcl => Regular Expressions

Syntax

regexp ?switches? exp string ?matchVar? ?subMatchVar subMatchVar ...?
regsub ?switches? exp string subSpec ?varName?

Remarks

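This topic does not try to explain regular expressions themselves. There are many resources on the Internet that explain regular expressions, along with tools that help you build them.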

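This topic covers the common switches and techniques for using regular expressions in Tcl, and some of the differences between Tcl's regular expression engine and other engines.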

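Regular expressions are generally slower than Tcl's plain string commands. The first question to ask is: "Do I really need a regular expression?" Match only what you need; if you don't need the rest of the data, don't match it.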
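As a rough illustration of that advice, a literal substring test can be done with plain string commands instead of the regex engine. The variable names are illustrative, and time is used only so you can compare the approaches on your own Tcl build; no particular timing result is implied.

```tcl
set mydata "hello world, this is Tcl"

# Literal substring search: no regular expression needed.
set byFirst [expr {[string first "world" $mydata] >= 0}]

# Glob-style match.
set byGlob [string match *world* $mydata]

# The regular expression version, for comparison.
set byRegexp [regexp {world} $mydata]

puts "$byFirst $byGlob $byRegexp"

# Compare the cost of each approach (results vary by machine and Tcl version).
puts [time {string first "world" $mydata} 10000]
puts [time {regexp {world} $mydata} 10000]
```

All three tests report a match (1); only the time lines differ between runs.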

The regular expression examples in this topic use the -expanded switch so that each part of the expression can be commented and explained.
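A small sketch of what -expanded changes: unescaped whitespace and #-comments in the pattern are ignored, so a literal space has to be written as a backslash-space or as [ ]. The patterns and strings below are made up for illustration.

```tcl
# Whitespace inside an -expanded pattern is ignored, so this is "abc".
set a [regexp -expanded {a b c} "abc"]
# For the same reason, this pattern does not match a string with spaces.
set b [regexp -expanded {^a b c$} "a b c"]
# An escaped space, or a space inside brackets, is matched literally.
set c [regexp -expanded {^a\ b[ ]c$} "a b c"]
# Comments run from # to the end of the line.
set d [regexp -expanded {
    ^hello    # the word hello
    $         # then the end of the string
} "hello"]
puts "$a $b $c $d"
```

This prints 1 0 1 1.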

Matching

The regexp command is used to match a regular expression against a string.

# This is a very simplistic e-mail matcher.
# e-mail addresses are extremely complicated to match properly.
# there is no guarantee that this regex will properly match e-mail addresses.
set mydata "send mail to someone@example.com please"
regexp -expanded {
    \y           # word boundary
    [^@\s]+      # characters that are not an @ or a whitespace character
    @            # a single @ sign
    [\w.-]+      # word characters, dots and dashes
    \.           # a dot character
    \w+          # word characters
    \y           # word boundary
    } $mydata emailaddr
puts $emailaddr
# output: someone@example.com
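The syntax above allows extra subMatchVar arguments after matchVar. A minimal sketch, reusing the illustrative address, of capturing the parts of a match with parenthesized groups:

```tcl
set mydata "send mail to someone@example.com please"
# whole receives the entire match; each parenthesized group is stored
# in the next subMatchVar (user, then domain).
if {[regexp {([^@\s]+)@([\w.-]+)} $mydata whole user domain]} {
    puts "user=$user domain=$domain"
}
```

This prints user=someone domain=example.com.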

The regexp command returns 1 (true) if a match is found and 0 (false) otherwise.

set mydata "hello wrld, this is Tcl"
# faster would be to use: [string match *world* $mydata] 
if { [regexp {world} $mydata] } {
   puts "spelling correct"
} else {
   puts "typographical error"
}

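To match every occurrence of an expression in some data, use the -all and -inline switches, which return the matched data as a list. Note that by default, newlines are treated like any other character.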

# simplistic english ordinal word matcher.
set mydata {
    This is the first line.
    This is the second line.
    This is the third line.
    This is the fourth line.
    }
set mymatches [regexp -all -inline -expanded {
    \y                  # word boundary
    \w+                 # standard characters
    (?:st|nd|rd|th)     # ending in st, nd, rd or th
                        # The ?: operator is used here as we don't
                        # want to return the match specified inside
                        # the grouping () operator.
    \y                  # word boundary
    } $mydata]
puts $mymatches
# output: first second third fourth
# if the ?: operator was not used, the data returned would be:
# first st second nd third rd fourth th
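Two related behaviours of -all, shown on made-up data: without -inline, regexp returns the number of matches; with -inline and capturing groups, each match contributes the whole match followed by each submatch to one flat list, which pairs naturally with a multi-variable foreach.

```tcl
set mydata "one two three"
# With -all but without -inline, regexp returns the number of matches.
set count [regexp -all {\w+} $mydata]
puts $count
# With -all -inline and capturing groups, the result is a flat list:
# whole match, first group, second group, then the next match, and so on.
set parts [regexp -all -inline {(\w)(\w+)} $mydata]
foreach {whole initial rest} $parts {
    puts "$whole starts with $initial"
}
```

Here count is 3 and parts is {one o ne two t wo three t hree}.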

Newline handling

# find real numbers at the end of a line (fake data).
set mydata {
    White 0.87 percent saturation.
    Specular reflection: 0.995
    Blue 0.56 percent saturation.
    Specular reflection: 0.421
    }
# the -line switch enables newline-sensitive matching: ^ and $ also match
# at the start and end of each line, and . does not match a newline.
# without -line, $ would match only at the end of the data.
set mymatches [regexp -line -all -inline -expanded {
    \y                  # word boundary
    \d\.\d+             # a real number
    $                   # at the end of a line.
    } $mydata]
puts $mymatches
# output: 0.995 0.421
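To see both effects of -line side by side, here is a short sketch on a made-up three-line string: the same pattern run with and without the switch, plus a check of whether . crosses a newline.

```tcl
set mydata "alpha 1\nbeta 2\ngamma 3"
# Without -line, $ matches only at the very end of the string.
set without [regexp -all -inline {\d+$} $mydata]
# With -line, $ also matches just before each newline.
set with [regexp -line -all -inline {\d+$} $mydata]
# By default . matches a newline; with -line it does not.
set dotDefault [regexp {1.beta} $mydata]
set dotLine [regexp -line {1.beta} $mydata]
puts "$without | $with | $dotDefault $dotLine"
```

This prints 3 | 1 2 3 | 1 0.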

Unicode does not need any special handling:

% set mydata {123ÂÃÄÈ456}
123ÂÃÄÈ456
% regexp {[[:alpha:]]+} $mydata match
1
% puts $match
ÂÃÄÈ
% regexp {\w+} $mydata match
1
% puts $match
123ÂÃÄÈ456
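One consequence worth checking: match positions reported by -indices are character positions, not byte offsets, so they can be passed straight to string range. The sketch uses \u escapes for the same four accented letters so the source stays ASCII.

```tcl
# Same data as above: 123ÂÃÄÈ456, written with \u escapes.
set mydata "123\u00C2\u00C3\u00C4\u00C8456"
# -indices stores the first and last character index of the match.
regexp -indices {[[:alpha:]]+} $mydata loc
lassign $loc first last
puts "$first $last"
puts [string range $mydata $first $last]
```

The first line printed is 3 6, and the second is the four accented letters.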

Documentation: regexp, re_syntax

Mixing greedy and non-greedy quantifiers

If the first quantifier in an RE is greedy, the whole RE is treated as greedy.

If the first quantifier in an RE is non-greedy, the whole RE is treated as non-greedy.

set mydata {
    Device widget1: port: 156 alias: input2
    Device widget2: alias: input1 
    Device widget3: port: 238 alias: processor2
    Device widget4: alias: output2
    }
regexp {Device\s(\w+):\s(.*?)alias} $mydata alldata devname devdata
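puts "$devname $devdata"
# output: devdata runs on to the last "alias" in the data:
# widget1 port: 156 alias: input2
#     Device widget2: alias: input1
#     Device widget3: port: 238 alias: processor2
#     Device widget4:
regexp {Device\s(.*?):\s(.*?)alias} $mydata alldata devname devdata
puts "$devname $devdata"
# output: widget1 port: 156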

In the first case, the first quantifier (\w+) is greedy, so every quantifier in the RE is treated as greedy, and the .*? matches more than expected.

In the second case, the first quantifier (.*?) is non-greedy, so every quantifier in the RE is treated as non-greedy.

Other regular expression engines may not have this problem with mixed greedy/non-greedy quantifiers, but they are much slower.

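Henry Spencer wrote: "... The trouble is that it's very, very hard to write a generalization of those statements which covers mixed-greediness regular expressions (a proper, implementation-independent definition of what mixed-greediness regular expressions should match) and makes them do 'what people expect'. I've tried. I'm still trying. No luck so far. ..."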
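One possible workaround for the device data above, sketched here as an alternative rather than the only fix: since each record sits on its own line, -line stops . from crossing newlines, so even though the RE is still greedy overall, each match is confined to one line. This assumes no line contains "alias" more than once.

```tcl
set mydata {
    Device widget1: port: 156 alias: input2
    Device widget2: alias: input1
    Device widget3: port: 238 alias: processor2
    Device widget4: alias: output2
    }
set result {}
# -all -inline returns whole match, devname, devdata for every line.
foreach {whole devname devdata} [regexp -all -inline -line \
        {Device\s(\w+):\s(.*?)alias} $mydata] {
    lappend result $devname [string trim $devdata]
}
puts $result
```

This prints widget1 {port: 156} widget2 {} widget3 {port: 238} widget4 {}.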

Substitution

The regsub command is used to match a regular expression and substitute the matched text.

set mydata {The yellow dog has the blues.}
# create a new string; only the first match is replaced.
set newdata [regsub {(yellow|blue)} $mydata green]
puts $newdata
# output: The green dog has the blues.
# replace the data in the same string; all matches are replaced.
regsub -all {(yellow|blue)} $mydata red mydata
puts $mydata
# output: The red dog has the reds.
# another way to create a new string; reset mydata first, because the
# previous command already replaced every "yellow" and "blue" in it.
set mydata {The yellow dog has the blues.}
regsub {(yellow|blue)} $mydata red mynewdata
puts $mynewdata
# output: The red dog has the blues.
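Two more regsub details, shown on the same sentence: when a variable name is given, regsub returns the number of substitutions it made, and in the substitution text & stands for the whole matched text.

```tcl
set mydata {The yellow dog has the blues.}
# With a variable name, the result goes into newdata and the
# return value is the number of substitutions.
set count [regsub -all {yellow|blue} $mydata red newdata]
puts "$count: $newdata"
# & in the substitution is replaced by the matched text.
set tagged [regsub -all {yellow|blue} $mydata {<&>}]
puts $tagged
```

This prints "2: The red dog has the reds." followed by "The <yellow> dog has the <blue>s."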

Back-references can be used to refer to the matched data:

set mydata {The yellow dog has the blues.}
regsub {(yellow)} $mydata {"\1"} mydata
puts $mydata
# output: The "yellow" dog has the blues.
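Back-references are numbered by group, so they can reorder the matched parts. The sketch below uses a made-up name to swap two words, and shows why the substitution text is usually braced: inside double quotes the Tcl parser would turn a bare \1 into the character with code 1 before regsub ever sees it, so each backslash has to be doubled.

```tcl
set name "John Smith"
# Braces keep Tcl from touching the backslashes, so regsub sees \2 and \1.
set swapped [regsub {(\w+) (\w+)} $name {\2, \1}]
# In double quotes, each backslash must be doubled to reach regsub.
set swapped2 [regsub {(\w+) (\w+)} $name "\\2, \\1"]
puts $swapped
puts $swapped2
```

Both lines print "Smith, John".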

Documentation: regsub, re_syntax

Differences between Tcl's RE engine and other RE engines

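\m : matches only at the beginning of a word.
\M : matches only at the end of a word.
\y : matches only at a word boundary.
\Y : matches only at a point that is not a word boundary.
\Z : matches only at the end of the data.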
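A quick check of these escapes on a made-up string. It also shows a related difference that often surprises people coming from other engines: in Tcl's REs, \b is a backspace character, not a word boundary, so \y is the one to use.

```tcl
set text "concatenate the cat"
# \m ... \M: "cat" as a whole word, found at characters 16-18.
regexp -indices {\mcat\M} $text wordPos
# \Y ... \Y: "cat" surrounded by word characters (inside "concatenate").
regexp -indices {\Ycat\Y} $text innerPos
# \y also works as a whole-word boundary.
set viaY [regexp {\ycat\y} $text]
# \Z anchors at the end of the data.
set atEnd [regexp {cat\Z} $text]
# \b is a backspace here, so this does not match.
set viaB [regexp {\bcat\b} $text]
puts "$wordPos | $innerPos | $viaY $atEnd $viaB"
```

This prints 16 18 | 3 5 | 1 1 0.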

Documentation: re_syntax

Matching literal strings with regular expressions

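Sometimes you need to match a literal (sub)string with a regular expression even though the substring contains RE metacharacters. You could, of course, write code that inserts the appropriate backslashes (using string map). The easiest way, however, is to prefix the pattern with ***=, which makes the RE engine treat the rest of the string as literal characters and disables all further metacharacters.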

set sampleText "This is some text with \[brackets\] in it."
set searchFor {[brackets]}

if {[ regexp ***=$searchFor $sampleText ]} {
    # This message will be printed
    puts "Found it!"
}

Because everything after ***= is literal, this means you cannot use anchors such as ^ or $ with it.
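If you do need anchors or other RE syntax around a literal substring, escaping the substring is the alternative mentioned above. This sketch swaps string map for regsub. The helper name reQuote is hypothetical, and it relies on the ARE rule that a backslash followed by a non-alphanumeric character stands for that character literally.

```tcl
# reQuote is a hypothetical helper, not part of Tcl: it backslash-escapes
# every non-word character so the result matches the input literally.
proc reQuote {str} {
    regsub -all {\W} $str {\\&}
}

set sampleText "This is some text with \[brackets\] in it."
set searchFor {[brackets]}
set escaped [reQuote $searchFor]

# The escaped text can be combined with anchors and other RE syntax.
set found [regexp "^This.*$escaped" $sampleText]
set atEnd [regexp "$escaped\$" $sampleText]
puts "$escaped $found $atEnd"
```

Here escaped is \[brackets\], found is 1, and atEnd is 0 because the sample text does not end with the brackets.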

Modified text is an extract of the original Stack Overflow Documentation

Licensed under CC BY-SA 3.0

Not affiliated with Stack Overflow
