Regular expression groups in C#

I've inherited a code block that contains the following regular expression, and I'm trying to understand how it gets its results.

var pattern = @"\[(.*?)\]";
var matches = Regex.Matches(user, pattern);
if (matches.Count > 0 && matches[0].Groups.Count > 1)
...

With the input user == "Josh Smith [jsmith]":

matches.Count == 1
matches[0].Value == "[jsmith]"

That much I understand, but:

matches[0].Groups.Count == 2
matches[0].Groups[0].Value == "[jsmith]"
matches[0].Groups[1].Value == "jsmith" <=== how?

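From my understanding, and from looking at this question, the Groups collection stores the entire match as well as the previous match. But doesn't the regexp above match only [open square bracket][text][close square bracket]? So why does "jsmith" match?

Also, does the Groups collection always store exactly 2 groups: the entire match and the last match?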


The parentheses identify a group as well, so group 0 is the entire match, and group 1 is the contents of what was found between the square brackets.

How? The answer is here

(.*?)

That is a subgroup of @"\[(.*?)\]".

The ( ) acts as a capture group. So the Matches collection holds every match that C# finds in your string, and each match's Groups collection holds the values of the capture groups inside that match. If you don't want that extra level of capture, just remove the ( ).
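To see the two levels side by side, here is a minimal sketch using the input and pattern from the question. It assumes a project that allows top-level statements (C# 9 or later); in older projects the same lines would go inside a Main method.

```csharp
using System;
using System.Text.RegularExpressions;

// Input and pattern taken from the question.
var user = "Josh Smith [jsmith]";
var match = Regex.Match(user, @"\[(.*?)\]");

// Group 0 is the whole match; group 1 is what the parentheses captured.
Console.WriteLine(match.Groups.Count);     // 2
Console.WriteLine(match.Groups[0].Value);  // [jsmith]
Console.WriteLine(match.Groups[1].Value);  // jsmith
```

The number of groups follows the number of capturing parentheses in the pattern (plus one for group 0), so it is not fixed at 2.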

Groups[0] is the entire match (not the entire input string).

Groups[1] is the group captured by the parentheses (.*?). You can configure the Regex to capture explicit (named) groups only by passing RegexOptions.ExplicitCapture when you create it, or use (?:.*?) to create a non-capturing group.
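A rough sketch of both options, reusing the question's input. The group name login is made up for illustration, and top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Text.RegularExpressions;

var user = "Josh Smith [jsmith]";

// (?:...) groups without capturing, so only group 0 remains.
var nonCapturing = Regex.Match(user, @"\[(?:.*?)\]");
Console.WriteLine(nonCapturing.Groups.Count);   // 1

// With ExplicitCapture, unnamed parentheses no longer capture.
var explicitOnly = Regex.Match(user, @"\[(.*?)\]", RegexOptions.ExplicitCapture);
Console.WriteLine(explicitOnly.Groups.Count);   // 1

// Named groups still capture under ExplicitCapture ("login" is a hypothetical name).
var named = Regex.Match(user, @"\[(?<login>.*?)\]", RegexOptions.ExplicitCapture);
Console.WriteLine(named.Groups["login"].Value); // jsmith
```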

  • match.Groups[0] is always the same as match.Value, which is the entire match.
  • match.Groups[1] is the first capturing group in your regular expression.

Consider this example:

var pattern = @"\[(.*?)\](.*)";
var match = Regex.Match("ignored [john] John Johnson", pattern);

In this case,

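  • match.Value is "[john] John Johnson". The leading "ignored " is not part of the match.
  • match.Groups[0] is always the same as match.Value, "[john] John Johnson".
  • match.Groups[1] is the group of captures from the (.*?); its Value is "john".
  • match.Groups[2] is the group of captures from the (.*); its Value is " John Johnson" (including the leading space).
  • match.Groups[1].Captures is yet another dimension: it lists every capture a group made, which matters when the group sits inside a quantifier.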

Consider another example:

var pattern = @"(\[.*?\])+";
var match = Regex.Match("[john][johnny]", pattern);

Note that we are looking for one or more bracketed names in a row. You need to be able to get each name separately. Enter Captures!

  • match.Groups[0] is always the same as match.Value, "[john][johnny]".
  • match.Groups[1] is the group of captures from the (\[.*?\])+. Its Value is the last capture, "[johnny]", not the whole match.
  • match.Groups[1].Captures.Count is 2, one capture per repetition of the group.
  • match.Groups[1].Captures[0] is [john]
  • match.Groups[1].Captures[1] is [johnny], the same as match.Groups[1].Value
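A quick way to check what the group and each Capture hold is to print them. This is a small sketch built on the pattern and input above, again assuming top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

var match = Regex.Match("[john][johnny]", @"(\[.*?\])+");
var group = match.Groups[1];

Console.WriteLine(match.Value);          // [john][johnny]
Console.WriteLine(group.Value);          // [johnny]  (the last capture)
Console.WriteLine(group.Captures.Count); // 2

// One Capture per repetition of the quantified group, in order.
foreach (Capture capture in group.Captures)
{
    Console.WriteLine(capture.Value);    // [john], then [johnny]
}
```

This is probably where the "entire match and the last match" idea in the question comes from: a repeated group's Value only exposes its last capture, while Captures keeps all of them.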