如何使用正则表达式删除括号内的文本?

我试图处理一大堆文件,然后需要进行修改以删除文件名中的多余信息; 值得注意的是,我试图删除括号内的文本。例如:

filename = "Example_file_(extra_descriptor).ext"

我想正则化一大堆文件,其中括号表达式可能在中间或结尾,长度可变。

正则表达式是什么样子的? Perl 或 Python 语法将是首选。

190342 次浏览
s/\([^)]*\)//

So in Python, you'd do:

re.sub(r'\([^)]*\)', '', filename)

If you can stand to use sed (possibly execute from within your program, it'd be as simple as:

sed 's/(.*)//g'

I would use:

\([^)]*\)

If a path may contain parentheses then the r'\(.*?\)' regex is not enough:

import os, re


def remove_parenthesized_chunks(path, safeext=True, safedir=True):
dirpath, basename = os.path.split(path) if safedir else ('', path)
name, ext = os.path.splitext(basename) if safeext else (basename, '')
name = re.sub(r'\(.*?\)', '', name)
return os.path.join(dirpath, name+ext)

By default the function preserves parenthesized chunks in directory and extention parts of the path.

Example:

>>> f = remove_parenthesized_chunks
>>> f("Example_file_(extra_descriptor).ext")
'Example_file_.ext'
>>> path = r"c:\dir_(important)\example(extra).ext(untouchable)"
>>> f(path)
'c:\\dir_(important)\\example.ext(untouchable)'
>>> f(path, safeext=False)
'c:\\dir_(important)\\example.ext'
>>> f(path, safedir=False)
'c:\\dir_\\example.ext(untouchable)'
>>> f(path, False, False)
'c:\\dir_\\example.ext'
>>> f(r"c:\(extra)\example(extra).ext", safedir=False)
'c:\\\\example.ext'
>>> import re
>>> filename = "Example_file_(extra_descriptor).ext"
>>> p = re.compile(r'\([^)]*\)')
>>> re.sub(p, '', filename)
'Example_file_.ext'

If you don't absolutely need to use a regex, useconsider using Perl's Text::Balanced to remove the parenthesis.

use Text::Balanced qw(extract_bracketed);


my ($extracted, $remainder, $prefix) = extract_bracketed( $filename, '()', '[^(]*' );


{   no warnings 'uninitialized';


$filename = (defined $prefix or defined $remainder)
? $prefix . $remainder
: $extracted;
}

You may be thinking, "Why do all this when a regex does the trick in one line?"

$filename =~ s/\([^}]*\)//;

Text::Balanced handles nested parenthesis. So $filename = 'foo_(bar(baz)buz)).foo' will be extracted properly. The regex based solutions offered here will fail on this string. The one will stop at the first closing paren, and the other will eat them all.

   $filename =~ s/\([^}]*\)//;
# returns 'foo_buz)).foo'


$filename =~ s/\(.*\)//;
# returns 'foo_.foo'


# text balanced example returns 'foo_).foo'

If either of the regex behaviors is acceptable, use a regex--but document the limitations and the assumptions being made.

Java code:

Pattern pattern1 = Pattern.compile("(\\_\\(.*?\\))");
System.out.println(fileName.replace(matcher1.group(1), ""));

The pattern that matches substrings in parentheses having no other ABC0 and ) characters in between (like (xyz 123) in Text (abc(xyz 123)) is

\([^()]*\)

Details:

  • \( - an opening round bracket (note that in POSIX BRE, ( should be used, see sed example below)
  • [^()]* - zero or more (due to the * Kleene star quantifier) characters other than those defined in the negated character class/POSIX bracket expression, that is, any chars other than ( and )
  • \) - a closing round bracket (no escaping in POSIX BRE allowed)

Removing code snippets:

  • JavaScript: string.replace(/\([^()]*\)/g, '')
  • PHP: preg_replace('~\([^()]*\)~', '', $string)
  • Perl: $s =~ s/\([^()]*\)//g
  • Python: re.sub(r'\([^()]*\)', '', s)
  • C#: Regex.Replace(str, @"\([^()]*\)", string.Empty)
  • VB.NET: Regex.Replace(str, "\([^()]*\)", "")
  • Java: s.replaceAll("\\([^()]*\\)", "")
  • Ruby: s.gsub(/\([^()]*\)/, '')
  • R: gsub("\\([^()]*\\)", "", x)
  • Lua: string.gsub(s, "%([^()]*%)", "")
  • Bash/sed: sed 's/([^()]*)//g'
  • Tcl: regsub -all {\([^()]*\)} $s "" result
  • C++ std::regex: std::regex_replace(s, std::regex(R"(\([^()]*\))"), "")
  • Objective-C:
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\([^()]*\\)" options:NSRegularExpressionCaseInsensitive error:&error]; NSString *modifiedString = [regex stringByReplacingMatchesInString:string options:0 range:NSMakeRange(0, [string length]) withTemplate:@""];
  • Swift: s.replacingOccurrences(of: "\\([^()]*\\)", with: "", options: [.regularExpression])
  • Google BigQuery: REGEXP_REPLACE(col, "\\([^()]*\\)" , "")

For those who want to use Python, here's a simple routine that removes parenthesized substrings, including those with nested parentheses. Okay, it's not a regex, but it'll do the job!

def remove_nested_parens(input_str):
"""Returns a copy of 'input_str' with any parenthesized text removed. Nested parentheses are handled."""
result = ''
paren_level = 0
for ch in input_str:
if ch == '(':
paren_level += 1
elif (ch == ')') and paren_level:
paren_level -= 1
elif not paren_level:
result += ch
return result


remove_nested_parens('example_(extra(qualifier)_text)_test(more_parens).ext')