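Optional word matching

I am trying to write a regular expression that extracts the singers and the lyricist. I would like to know how to make the lyricist part of the search optional.

Example multi-line string: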

Fireworks Singer: Katy Perry
Vogue Singers: Madonna, Karen Lyricist: Madonna

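Regex: /Singers?:(.*)\s?Lyricists?:(.*)/

This correctly matches the second line and extracts Singers (Madonna, Karen) and Lyricist (Madonna).

However, it does not work for the first line, where there is no lyricist.

How can I make the lyricist search optional?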


You can enclose the part you want to make optional in a non-capturing group, (?:...). The group is then treated as a single unit in the regex, and you can put a ? after it to make it optional. Example:

/Singers?:(.*)\s?(?:Lyricists?:(.*))?/

Note that here the \s? is useless, since .* greedily consumes every character up to the end of the line and nothing forces it to backtrack. For the same reason, the (?:Lyricists?:(.*)) part will never match anything. You can fix this by using .*?, the non-greedy version of .*, together with the $ anchor:

/Singers?:(.*?)\s*(?:Lyricists?:(.*))?$/
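To make the difference concrete, here is a small sketch that runs both versions against the second sample line. The question does not name a language, so Python's re module is an assumption here; the variable names are only for illustration.

```python
import re

line = "Vogue Singers: Madonna, Karen Lyricist: Madonna"

# Greedy version: the first (.*) swallows the rest of the line, so the
# optional group never participates and its capture is None.
greedy = re.search(r"Singers?:(.*)\s?(?:Lyricists?:(.*))?", line)
print(greedy.groups())
# (' Madonna, Karen Lyricist: Madonna', None)

# Non-greedy version anchored with $: (.*?) grows one character at a time
# until the rest of the pattern can match up to the end of the line.
lazy = re.search(r"Singers?:(.*?)\s*(?:Lyricists?:(.*))?$", line)
print(lazy.groups())
# (' Madonna, Karen', ' Madonna')
```

Both captures of the second result still start with a space.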

Some extra whitespace still ends up in the captures; it can be removed as well, giving a final regex of:

/Singers?:\s*(.*?)\s*(?:Lyricists?:\s*(.*))?$/
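As a quick check, the final regex can be tried on both sample lines from the question. This is a minimal sketch assuming Python; extract_credits is a hypothetical helper name, not something from the question.

```python
import re

# Final regex from above, without the /.../ delimiters.
PATTERN = re.compile(r"Singers?:\s*(.*?)\s*(?:Lyricists?:\s*(.*))?$")

def extract_credits(line):
    """Return (singers, lyricist); lyricist is None when the line has none."""
    match = PATTERN.search(line)
    if match is None:
        return None  # the line has no "Singer:" / "Singers:" label at all
    return match.group(1), match.group(2)

print(extract_credits("Fireworks Singer: Katy Perry"))
# ('Katy Perry', None)
print(extract_credits("Vogue Singers: Madonna, Karen Lyricist: Madonna"))
# ('Madonna, Karen', 'Madonna')
```

The second capture group simply comes back as None when the optional part did not take part in the match, so the caller can tell "no lyricist" apart from an empty string.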

Just to add to Cameron's solution: if the source string has multiple lines, each containing both Singers and Lyricists, you'll probably need to add the 'm' (multi-line) modifier so that the '$' will match at the end of each line. (You didn't say what language you are using; you may want to add the 'i' modifier as well.)
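For illustration, this is roughly what the modifiers look like when the whole multi-line string is searched at once. Python is an assumption here (the question does not name a language); there, re.MULTILINE and re.IGNORECASE correspond to the 'm' and 'i' modifiers.

```python
import re

text = """Fireworks Singer: Katy Perry
Vogue Singers: Madonna, Karen Lyricist: Madonna"""

pattern = re.compile(
    r"Singers?:\s*(.*?)\s*(?:Lyricists?:\s*(.*))?$",
    re.MULTILINE | re.IGNORECASE,  # '$' matches at each line end; labels match in any case
)

# One (singers, lyricist) tuple per matching line.
credits = [m.groups() for m in pattern.finditer(text)]
print(credits)
# [('Katy Perry', None), ('Madonna, Karen', 'Madonna')]
```

One thing to watch for: \s also matches a newline, so if a line without a lyricist were followed directly by a line beginning with "Lyricist:", the optional group could reach into that next line. Replacing \s* with [ \t]* would be one way to avoid that.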