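PHP: Best way to extract text within parentheses?

What's the best/most efficient way to extract text set between parentheses? Say I wanted to get the string "text" from the string "ignore everything except this (text)" in the most efficient manner possible.

So far, the best I've come up with is this: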

$fullString = "ignore everything except this (text)";
$start = strpos('(', $fullString);
$end = strlen($fullString) - strpos(')', $fullString);


$shortString = substr($fullString, $start, $end);

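Is there a better way to do this? I know that using a regex generally tends to be less efficient, but unless I can reduce the number of function calls, perhaps that would be the best approach? Thoughts?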


I'd just use a regex and get it over with. Unless you are doing enough iterations that it becomes a huge performance issue, it's just easier to code (and to understand when you look back on it).

$text = 'ignore everything except this (text)';
preg_match('#\((.*?)\)#', $text, $match);
print $match[1];

So, actually, the code you posted doesn't work: substr()'s parameters are $string, $start and $length, and strpos()'s parameters are $haystack, $needle. Slightly modified:

$str = "ignore everything except this (text)";
$start  = strpos($str, '(');
$end    = strpos($str, ')', $start + 1);
$length = $end - $start;
$result = substr($str, $start + 1, $length - 1);

Some subtleties: I used $start + 1 as the offset parameter to help PHP out while doing the strpos() search for the second parenthesis; we then increment $start by one and reduce $length by one to exclude the parentheses from the match.

Also, there's no error checking in this code: you'll want to make sure $start and $end do not === false before performing the substr.
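To make the missing checks concrete, here is a minimal sketch that wraps the strpos()/substr() approach in a function and bails out when either parenthesis is absent. The function name extractBetweenParens and the choice of null as the "not found" value are assumptions for illustration, not part of the code above.

```php
<?php
// Hypothetical helper: returns the text inside the first "(...)" pair,
// or null when an opening or closing parenthesis cannot be found.
function extractBetweenParens(string $str): ?string
{
    $start = strpos($str, '(');
    if ($start === false) {
        return null; // no opening parenthesis
    }
    // Search for ")" only after the opening parenthesis.
    $end = strpos($str, ')', $start + 1);
    if ($end === false) {
        return null; // no closing parenthesis
    }
    return substr($str, $start + 1, $end - $start - 1);
}

var_dump(extractBetweenParens('ignore everything except this (text)'));
var_dump(extractBetweenParens('no parentheses here'));
```

The strict === false comparison matters here: a parenthesis at index 0 would otherwise be mistaken for "not found".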

As for using strpos/substr versus regex; performance-wise, this code will beat a regular expression hands down. It's a little wordier though. I eat and breathe strpos/substr, so I don't mind this too much, but someone else may prefer the compactness of a regex.

Use a regular expression:

if (preg_match('!\(([^\)]+)\)!', $text, $match)) {
    $text = $match[1];
}

This is sample code that extracts all the text between '[' and ']' and stores it in two separate arrays (i.e. the text inside the brackets in one array and the text outside the brackets in another).

function extract_text($string)
{
    $text_outside = array();
    $text_inside = array();
    $t = "";
    $length = strlen($string);
    for ($i = 0; $i < $length; $i++) {
        if ($string[$i] == '[') {
            // Flush the outside text collected so far, then read up to ']'.
            $text_outside[] = $t;
            $t = "";
            $t1 = "";
            $i++;
            // The bounds check avoids running past the end if ']' is missing.
            while ($i < $length && $string[$i] != ']') {
                $t1 .= $string[$i];
                $i++;
            }
            $text_inside[] = $t1;
        } elseif ($string[$i] != ']') {
            $t .= $string[$i];
        }
    }
    if ($t != "") {
        $text_outside[] = $t;
    }

    var_dump($text_outside);
    echo "\n\n";
    var_dump($text_inside);
}

Output: extract_text("hello how are you?"); will produce:

array(1) {
[0]=>
string(18) "hello how are you?"
}


array(0) {
}

extract_text("hello [http://www.google.com/test.mp3] how are you?"); will produce

array(2) {
[0]=>
string(6) "hello "
[1]=>
string(13) " how are you?"
}




array(1) {
[0]=>
string(30) "http://www.google.com/test.mp3"
}

This function may be useful.

function getStringBetween($str, $from, $to, $withFromAndTo = false)
{
    // Everything after the first occurrence of $from.
    $sub = substr($str, strpos($str, $from) + strlen($from), strlen($str));
    // Cut at the last occurrence of $to.
    if ($withFromAndTo)
        return $from . substr($sub, 0, strrpos($sub, $to)) . $to;
    else
        return substr($sub, 0, strrpos($sub, $to));
}

$inputString = "ignore everything except this (text)";
$outputString = getStringBetween($inputString, '(', ')');
echo $outputString;
// output will be: text

$outputString = getStringBetween($inputString, '(', ')', true);
echo $outputString;
// output will be: (text)

strpos() => finds the position of the first occurrence of a substring in a string.

strrpos() => finds the position of the last occurrence of a substring in a string.
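A small sketch of how the two functions differ when a string has more than one closing parenthesis. The sample string and variable names are made up for illustration.

```php
<?php
$str = 'a (b) c (d)';

$open  = strpos($str, '(');  // first "(" at index 2
$first = strpos($str, ')');  // first ")" at index 4
$last  = strrpos($str, ')'); // last ")" at index 10

// Cutting at the first ")" yields the first group only.
$toFirst = substr($str, $open + 1, $first - $open - 1); // "b"
// Cutting at the last ")" spans everything up to the final parenthesis.
$toLast  = substr($str, $open + 1, $last - $open - 1);  // "b) c (d"

echo $toFirst, "\n", $toLast, "\n";
```

So a helper that cuts at strrpos() returns the widest possible span, which is only what you want when the string has a single pair of delimiters.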

function getStringsBetween($str, $start = '[', $end = ']', $with_from_to = true)
{
    $arr = [];
    $last_pos = strpos($str, $start);
    while ($last_pos !== false) {
        $t = strpos($str, $end, $last_pos);
        if ($t === false) {
            break; // no closing delimiter left
        }
        $arr[] = ($with_from_to ? $start : '')
            . substr($str, $last_pos + 1, $t - $last_pos - 1)
            . ($with_from_to ? $end : '');
        $last_pos = strpos($str, $start, $last_pos + 1);
    }
    return $arr;
}

This is a small improvement on the previous answer that returns all matches as an array:

getStringsBetween('[T]his[] is [test] string [pattern]') will return ['[T]', '[]', '[test]', '[pattern]'].

The already posted regex solutions - \((.*?)\) and \(([^\)]+)\) - do not return the innermost strings between an opening and a closing bracket. If the string is Text (abc(xyz 123) they both return (abc(xyz 123) as the whole match, and not (xyz 123).
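A quick way to check this claim yourself, as a sketch: run both earlier patterns against that sample string. The variable names are arbitrary.

```php
<?php
$input = 'Text (abc(xyz 123)';

// Lazy dot: starts at the first "(" and stops at the first ")".
preg_match('#\((.*?)\)#', $input, $lazy);
// Negated class that only excludes ")": it still consumes the inner "(".
preg_match('!\(([^\)]+)\)!', $input, $negated);

echo $lazy[0], "\n";    // (abc(xyz 123)
echo $negated[0], "\n"; // (abc(xyz 123)
```

Both matches begin at the outer, unbalanced parenthesis because nothing in either pattern forbids a second ( inside the match.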

The pattern that matches substrings (use with preg_match to fetch the first and preg_match_all to fetch all occurrences) in parentheses without other open and close parentheses in between is, if the match should include parentheses:

\([^()]*\)

Or, if you want to get the values without parentheses:

\(([^()]*)\)        // get Group 1 values after a successful call to preg_match_all, see code below
\(\K[^()]*(?=\))    // this and the one below get the values without parentheses as whole matches
(?<=\()[^()]*(?=\)) // less efficient, not recommended

Replace * with + if there must be at least 1 char between ( and ).
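To illustrate the difference between the two quantifiers, here is a small sketch on a made-up string that contains an empty pair of parentheses:

```php
<?php
$input = 'empty () and full (x)';

// "*" allows zero characters between the parentheses.
preg_match_all('~\([^()]*\)~', $input, $star);
// "+" requires at least one character, so "()" is skipped.
preg_match_all('~\([^()]+\)~', $input, $plus);

print_r($star[0]); // "()" and "(x)"
print_r($plus[0]); // "(x)" only
```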

Details:

  • \( - an opening round bracket (must be escaped to denote a literal parenthesis as it is used outside a character class)
  • [^()]* - zero or more characters other than ( and ) (note these ( and ) do not have to be escaped inside a character class as inside it, ( and ) cannot be used to specify a grouping and are treated as literal parentheses)
  • \) - a closing round bracket (must be escaped to denote a literal parenthesis as it is used outside a character class).

The \(\K part in the alternative regex matches ( and omits it from the match value (with the \K match reset operator). (?<=\() is a positive lookbehind that requires a ( to appear immediately to the left of the current location, but the ( is not added to the match value since lookbehind (lookaround) patterns are non-consuming. (?=\)) is a positive lookahead that requires a ) char to appear immediately to the right of the current location.

PHP code:

$fullString = 'ignore everything except this (text) and (that (text here))';
if (preg_match_all('~\(([^()]*)\)~', $fullString, $matches)) {
    print_r($matches[0]); // Get whole match values
    print_r($matches[1]); // Get Group 1 values
}

Output:

Array ( [0] => (text)  [1] => (text here) )
Array ( [0] => text    [1] => text here   )
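The two whole-match variants (\K and lookbehind) can be tried on the same input. This is only a sketch; the variable names are arbitrary, and here the values come back in index 0 of the result because there is no capturing group.

```php
<?php
$fullString = 'ignore everything except this (text) and (that (text here))';

// \K drops the already-matched "(" from the reported match.
preg_match_all('~\(\K[^()]*(?=\))~', $fullString, $viaReset);
// The lookbehind only asserts the "(" without consuming it.
preg_match_all('~(?<=\()[^()]*(?=\))~', $fullString, $viaLookbehind);

print_r($viaReset[0]);      // text, text here
print_r($viaLookbehind[0]); // text, text here
```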

I think this is the fastest way to get the text inside the first pair of parentheses in a string.

$string = 'ignore everything except this (text)';
$string = explode(')', (explode('(', $string)[1]))[0];
echo $string;
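Note that the one-liner assumes both parentheses are present; without a ( the [1] index does not exist and PHP raises a warning or notice. A guarded sketch of the same explode() idea follows, where the function name firstParenthesized and the null return value are my own assumptions:

```php
<?php
// Hypothetical helper: explode()-based extraction that returns null
// when either parenthesis is missing.
function firstParenthesized(string $string): ?string
{
    // Limit 2 splits only at the first "(".
    $afterOpen = explode('(', $string, 2);
    if (count($afterOpen) < 2) {
        return null; // no "(" in the input
    }
    // Split the remainder at the first ")".
    $beforeClose = explode(')', $afterOpen[1], 2);
    if (count($beforeClose) < 2) {
        return null; // no ")" after the "("
    }
    return $beforeClose[0];
}

var_dump(firstParenthesized('ignore everything except this (text)'));
```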