Tetris-ing an array

Consider the following array:

/www/htdocs/1/sites/lib/abcdedd
/www/htdocs/1/sites/conf/xyz
/www/htdocs/1/sites/conf/abc/def
/www/htdocs/1/sites/htdocs/xyz
/www/htdocs/1/sites/lib2/abcdedd

What is the shortest and most elegant way of detecting the common base path - in this case

/www/htdocs/1/sites/

and removing it from all elements in the array?

lib/abcdedd
conf/xyz
conf/abc/def
htdocs/xyz
lib2/abcdedd
$common = PHP_INT_MAX;
foreach ($a as $item) {
    $common = min($common, str_common($a[0], $item, $common));
}

$result = array();
foreach ($a as $item) {
    $result[] = substr($item, $common);
}
print_r($result);

// Returns the offset of the last '/' inside the common prefix of $a and $b.
function str_common($a, $b, $max)
{
    $pos = 0;
    $last_slash = 0;
    $len = min(strlen($a), strlen($b), $max + 1);
    while ($pos < $len) {
        // square-bracket offsets work on every PHP version
        if ($a[$pos] != $b[$pos]) return $last_slash;
        if ($a[$pos] == '/') $last_slash = $pos;
        $pos++;
    }
    return $last_slash;
}

Load them into a trie data structure. Starting from the root node, look for the first node that has more than one child. Once you find that magic node, just dismantle the parent node structure and make the current node the root.
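A minimal sketch of the trie idea, in case it helps. The function names buildPathTrie and commonSegments are made up for illustration, the trie is just nested arrays keyed by path segment, and a '/' key is used as an assumed end-of-path marker (a segment can never contain '/'). It needs PHP 7.3+ for array_key_first.

```php
<?php
// Hypothetical helper: builds a nested-array trie keyed by path segment.
function buildPathTrie(array $paths): array
{
    $root = [];
    foreach ($paths as $path) {
        $node = &$root;
        foreach (explode('/', $path) as $segment) {
            if (!isset($node[$segment])) {
                $node[$segment] = [];
            }
            $node = &$node[$segment];
        }
        $node['/'] = []; // end-of-path marker, so a path that is a prefix of another still branches
        unset($node);    // drop the reference before handling the next path
    }
    return $root;
}

// Hypothetical helper: walks down from the root while a node has exactly one real child.
function commonSegments(array $trie): array
{
    $segments = [];
    while (count($trie) === 1 && array_key_first($trie) !== '/') {
        $segment = array_key_first($trie);
        $segments[] = (string) $segment; // numeric segments come back as int keys
        $trie = $trie[$segment];
    }
    return $segments;
}

$paths = [
    '/www/htdocs/1/sites/lib/abcdedd',
    '/www/htdocs/1/sites/conf/xyz',
    '/www/htdocs/1/sites/conf/abc/def',
    '/www/htdocs/1/sites/htdocs/xyz',
    '/www/htdocs/1/sites/lib2/abcdedd',
];
echo implode('/', commonSegments(buildPathTrie($paths))), "\n"; // /www/htdocs/1/sites
```

The node where the walk stops is the "magic node"; everything below it is what remains after the common base path is removed.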

I would explode the values based on the / and then use array_intersect_assoc to detect the common elements and ensure they have the correct corresponding index in the array. The resulting array could be recombined to produce the common path.

function getCommonPath($pathArray)
{
$pathElements = array();


foreach($pathArray as $path)
{
$pathElements[] = explode("/",$path);
}


$commonPath = $pathElements[0];


for($i=1;$i<count($pathElements);$i++)
{
$commonPath = array_intersect_assoc($commonPath,$pathElements[$i]);
}


if(is_array($commonPath)) return implode("/",$commonPath);
else return null;
}


function removeCommonPath($pathArray)
{
$commonPath = getCommonPath($pathArray);


for($i=0;$i<count($pathArray);$i++)
{
$pathArray[$i] = substr($pathArray[$i],strlen($commonPath));
}


return $pathArray;
}

This is untested, but, the idea is that the $commonPath array only ever contains the elements of the path that have been contained in all path arrays that have been compared against it. When the loop is complete, we simply recombine it with / to get the true $commonPath

Update As pointed out by Felix Kling, array_intersect won't consider paths that have common elements but in different orders... To solve this, I used array_intersect_assoc instead of array_intersect

Update Added code to remove the common path (or tetris it!) from the array as well.
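One thing to watch with array_intersect_assoc: it keeps every index where all paths agree, including indexes after the first mismatch (for example /a/b/c and /a/x/c agree again on the last element). A rough sketch of one way to guard against that, keeping only the unbroken run of keys 0, 1, 2, ... The function name leadingCommonSegments is hypothetical and not part of the answer above.

```php
<?php
// Hypothetical helper: common leading segments, stopping at the first gap in the keys.
function leadingCommonSegments(array $paths): array
{
    $split = array_map(function ($p) { return explode('/', $p); }, array_values($paths));
    if ($split === []) {
        return [];
    }

    $common = $split[0];
    foreach (array_slice($split, 1) as $parts) {
        // keeps entries whose key AND value match in both arrays
        $common = array_intersect_assoc($common, $parts);
    }

    $leading = [];
    foreach ($common as $index => $segment) {
        if ($index !== count($leading)) {
            break; // a key was skipped, so the shared run ended before this index
        }
        $leading[] = $segment;
    }
    return $leading;
}

echo implode('/', leadingCommonSegments(['/a/b/c', '/a/x/c'])), "\n"; // /a
```

Without the gap check, imploding the intersected array for those two paths would give /a/c, which is not a prefix of either path.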

A naive approach would be to explode the paths at the / and successively compare every element in the arrays. For example, the first element is empty in all arrays, so it is removed; the next element is www, which is the same in all arrays, so it is removed too, and so on.

Something like (untested)

$exploded_paths = array();

foreach ($paths as $path) {
    $exploded_paths[] = explode('/', $path);
}

$equal = true;
$ref = &$exploded_paths[0]; // compare against the first path for simplicity

while ($equal && count($ref) > 0) {
    foreach ($exploded_paths as $path_parts) {
        // stop at the first path that is exhausted or differs
        if (!isset($path_parts[0]) || $path_parts[0] !== $ref[0]) {
            $equal = false;
            break;
        }
    }
    if ($equal) {
        foreach ($exploded_paths as &$path_parts) {
            array_shift($path_parts); // remove the first element
        }
        unset($path_parts); // break the reference so the next pass cannot overwrite the last path
    }
}

Afterwards you just have to implode the elements in $exploded_paths again:

function impl($arr) {
    return '/' . implode('/', $arr);
}
$paths = array_map('impl', $exploded_paths);

Which gives me:

Array
(
    [0] => /lib/abcdedd
    [1] => /conf/xyz
    [2] => /conf/abc/def
    [3] => /htdocs/xyz
    [4] => /lib2/abcdedd
)

This might not scale well ;)

$values = array('/www/htdocs/1/sites/lib/abcdedd',
'/www/htdocs/1/sites/conf/xyz',
'/www/htdocs/1/sites/conf/abc/def',
'/www/htdocs/1/sites/htdocs/xyz',
'/www/htdocs/1/sites/lib2/abcdedd'
);




function splitArrayValues($r) {
return explode('/',$r);
}


function stripCommon($values) {
$testValues = array_map('splitArrayValues',$values);


$i = 0;
foreach($testValues[0] as $key => $value) {
foreach($testValues as $arraySetValues) {
if (!isset($arraySetValues[$key]) || $arraySetValues[$key] !== $value) break 2;
}
$i++;
}


$returnArray = array();
foreach($testValues as $value) {
$returnArray[] = implode('/',array_slice($value,$i));
}


return $returnArray;
}




$newValues = stripCommon($values);


echo '<pre>';
var_dump($newValues);
echo '</pre>';

EDIT Variant of my original method using an array_walk to rebuild the array

$values = array('/www/htdocs/1/sites/lib/abcdedd',
'/www/htdocs/1/sites/conf/xyz',
'/www/htdocs/1/sites/conf/abc/def',
'/www/htdocs/1/sites/htdocs/xyz',
'/www/htdocs/1/sites/lib2/abcdedd'
);




function splitArrayValues($r) {
return explode('/',$r);
}


function rejoinArrayValues(&$r,$d,$i) {
$r = implode('/',array_slice($r,$i));
}


function stripCommon($values) {
$testValues = array_map('splitArrayValues',$values);


$i = 0;
foreach($testValues[0] as $key => $value) {
foreach($testValues as $arraySetValues) {
if (!isset($arraySetValues[$key]) || $arraySetValues[$key] !== $value) break 2;
}
$i++;
}


array_walk($testValues, 'rejoinArrayValues', $i);


return $testValues;
}




$newValues = stripCommon($values);


echo '<pre>';
var_dump($newValues);
echo '</pre>';

EDIT

The most efficient and elegant answer is likely to involve taking functions and methods from each of the provided answers

This has the disadvantage of not having linear time complexity; however, in most cases the sort will not be the operation that takes the most time.

Basically, the clever part (at least I couldn't find a fault with it) here is that after sorting you will only have to compare the first path with the last.

sort($a);
$a = array_map(function ($el) { return explode("/", $el); }, $a);
$first = reset($a);
$last = end($a);
// count the leading components shared by the first and the last path
for ($eqdepth = 0;
     isset($first[$eqdepth], $last[$eqdepth]) && $first[$eqdepth] === $last[$eqdepth];
     $eqdepth++) {}
array_walk($a,
    function (&$el) use ($eqdepth) {
        for ($i = 0; $i < $eqdepth; $i++) {
            array_shift($el);
        }
    });
$res = array_map(function ($el) { return implode("/", $el); }, $a);

Write a function longest_common_prefix that takes two strings as input. Then apply it to the strings in any order to reduce them to their common prefix. Since it is associative and commutative the order doesn't matter for the result.

This is the same as for other binary operations like for example addition or greatest common divisor.
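As a rough sketch of that fold, here is one way it could look with array_reduce. The names commonLeadingSegments and commonBasePath are made up for illustration, and the binary operation works on exploded path segments rather than raw characters, which is an assumption on my part so that the result always ends on a directory boundary.

```php
<?php
// Hypothetical binary operation: common leading segments of two exploded paths.
function commonLeadingSegments(array $a, array $b): array
{
    $n = min(count($a), count($b));
    $i = 0;
    while ($i < $n && $a[$i] === $b[$i]) {
        $i++;
    }
    return array_slice($a, 0, $i);
}

// Hypothetical wrapper: folds the binary operation over all paths.
function commonBasePath(array $paths): string
{
    $split = array_map(function ($p) { return explode('/', $p); }, array_values($paths));
    if ($split === []) {
        return '';
    }
    // start from the first path and reduce the rest into it
    $common = array_reduce(array_slice($split, 1), 'commonLeadingSegments', $split[0]);
    return implode('/', $common);
}

$paths = [
    '/www/htdocs/1/sites/lib/abcdedd',
    '/www/htdocs/1/sites/conf/xyz',
    '/www/htdocs/1/sites/conf/abc/def',
    '/www/htdocs/1/sites/htdocs/xyz',
    '/www/htdocs/1/sites/lib2/abcdedd',
];
echo commonBasePath($paths), "\n";                // /www/htdocs/1/sites
echo commonBasePath(array_reverse($paths)), "\n"; // same result in any order
```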

$arrMain = array(
'/www/htdocs/1/sites/lib/abcdedd',
'/www/htdocs/1/sites/conf/xyz',
'/www/htdocs/1/sites/conf/abc/def',
'/www/htdocs/1/sites/htdocs/xyz',
'/www/htdocs/1/sites/lib2/abcdedd'
);
function explodePath( $strPath ){
return explode("/", $strPath);
}


function removePath( $strPath)
{
global $strCommon;
return str_replace( $strCommon, '', $strPath );
}
$arrExplodedPaths = array_map( 'explodePath', $arrMain ) ;


//Check for common and skip first 1
$strCommon = '';
for( $i=1; $i< count( $arrExplodedPaths[0] ); $i++)
{
for( $j = 0; $j < count( $arrExplodedPaths); $j++ )
{
if( $arrExplodedPaths[0][ $i ] !== $arrExplodedPaths[ $j ][ $i ] )
{
break 2;
}
}
$strCommon .= '/'.$arrExplodedPaths[0][$i];
}
print_r( array_map( 'removePath', $arrMain ) );

This works fine... similar to Mark Baker's answer, but uses str_replace.

Ok, I'm not sure this is bullet-proof, but I think it works:

echo array_reduce($array, function($reducedValue, $arrayValue) {
if($reducedValue === NULL) return $arrayValue;
for($i = 0; $i < strlen($reducedValue); $i++) {
if(!isset($arrayValue[$i]) || $arrayValue[$i] !== $reducedValue[$i]) {
return substr($reducedValue, 0, $i);
}
}
return $reducedValue;
});

This will take the first value in the array as the reference string. Then it will iterate over the reference string and compare each char with the char of the second string at the same position. If a char doesn't match, the reference string will be shortened to the position of that char and the next string is compared. The function will then return the shortest matching string.

Performance depends on the strings given. The earlier the reference string gets shorter, the quicker the code will finish. I really have no clue how to put that in a formula though.

I found that Artefacto's approach to sort the strings increases performance. Adding

asort($array);
$array = array(array_shift($array), array_pop($array));

before the array_reduce will significantly increase performance.

Also note that this will return the longest matching initial substring, which is more versatile but won't give you the common path. You have to run

substr($result, 0, strrpos($result, '/'));

on the result. And then you can use the result to remove the values

print_r(array_map(function($v) use ($path){
return str_replace($path, '', $v);
}, $array));

which should give:

[0] => /lib/abcdedd
[1] => /conf/xyz
[2] => /conf/abc/def
[3] => /htdocs/xyz
[4] => /lib2/abcdedd

Feedback welcome.

The problem can be simplified if just viewed from the string comparison angle. This is probably faster than array-splitting:

$longest = $tetris[0];  # or array_pop()
foreach ($tetris as $cmp) {
while (strncmp($longest . "/", $cmp, strlen($longest)+1) !== 0) {
$longest = substr($longest, 0, strrpos($longest, "/"));
}
}

Probably too naive and noobish but it works. I have used this algorithm:

<?php


function strlcs($str1, $str2){
$str1Len = strlen($str1);
$str2Len = strlen($str2);
$ret = array();


if($str1Len == 0 || $str2Len == 0)
return $ret; //no similarities


$CSL = array(); //Common Sequence Length array
$intLargestSize = 0;


//initialize the CSL array to assume there are no similarities
for($i=0; $i<$str1Len; $i++){
$CSL[$i] = array();
for($j=0; $j<$str2Len; $j++){
$CSL[$i][$j] = 0;
}
}


for($i=0; $i<$str1Len; $i++){
for($j=0; $j<$str2Len; $j++){
//check every combination of characters
if( $str1[$i] == $str2[$j] ){
//these are the same in both strings
if($i == 0 || $j == 0)
//it's the first character, so it's clearly only 1 character long
$CSL[$i][$j] = 1;
else
//it's one character longer than the string from the previous character
$CSL[$i][$j] = $CSL[$i-1][$j-1] + 1;


if( $CSL[$i][$j] > $intLargestSize ){
//remember this as the largest
$intLargestSize = $CSL[$i][$j];
//wipe any previous results
$ret = array();
//and then fall through to remember this new value
}
if( $CSL[$i][$j] == $intLargestSize )
//remember the largest string(s)
$ret[] = substr($str1, $i-$intLargestSize+1, $intLargestSize);
}
//else, $CSL should be set to 0, which it was already initialized to
}
}
//return the list of matches
return $ret;
}




$arr = array(
'/www/htdocs/1/sites/lib/abcdedd',
'/www/htdocs/1/sites/conf/xyz',
'/www/htdocs/1/sites/conf/abc/def',
'/www/htdocs/1/sites/htdocs/xyz',
'/www/htdocs/1/sites/lib2/abcdedd'
);


// find the common substring
$longestCommonSubstring = strlcs( $arr[0], $arr[1] );


// remove the common substring
foreach ($arr as $k => $v) {
$arr[$k] = str_replace($longestCommonSubstring[0], '', $v);
}
var_dump($arr);

Output:

array(5) {
[0]=>
string(11) "lib/abcdedd"
[1]=>
string(8) "conf/xyz"
[2]=>
string(12) "conf/abc/def"
[3]=>
string(10) "htdocs/xyz"
[4]=>
string(12) "lib2/abcdedd"
}

:)

Perhaps porting the algorithm Python's os.path.commonprefix(m) uses would work?

def commonprefix(m):
    "Given a list of pathnames, returns the longest common leading component"
    if not m: return ''
    s1 = min(m)
    s2 = max(m)
    n = min(len(s1), len(s2))
    for i in xrange(n):
        if s1[i] != s2[i]:
            return s1[:i]
    return s1[:n]

That is, uh... something like

function commonprefix($m) {
if(!$m) return "";
$s1 = min($m);
$s2 = max($m);
$n = min(strlen($s1), strlen($s2));
for($i=0;$i<$n;$i++) if($s1[$i] != $s2[$i]) return substr($s1, 0, $i);
return substr($s1, 0, $n);
}

After that you can just substr each element of the original list with the length of the common prefix as the start offset.
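A minimal sketch of that last step. The function name stripCommonDir is made up for illustration, and the prefix is assumed to come from something like the commonprefix() port above. Since that prefix is character-based, it can end in the middle of a directory name (lib vs. lib2), so this version first backs up to the last / before cutting, which is an extra step not in the Python original.

```php
<?php
// Hypothetical helper: strips the directory part of a character-level common prefix.
function stripCommonDir(array $paths, string $prefix): array
{
    // back up to the last slash so a partial name such as "lib" is not cut off
    $slash = strrpos($prefix, '/');
    $offset = ($slash === false) ? 0 : $slash + 1;

    return array_map(function ($path) use ($offset) {
        return substr($path, $offset);
    }, $paths);
}

print_r(stripCommonDir(['/a/lib/x', '/a/lib2/y'], '/a/lib'));
// Array ( [0] => lib/x [1] => lib2/y )
```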

You could remove the prefix the fastest way, reading each character only once:

function findLongestWord($lines, $delim = "/")
{
    $max = 0;
    $len = strlen($lines[0]);

    // read first string once
    for ($i = 0; $i < $len; $i++) {
        for ($n = 1; $n < count($lines); $n++) {
            // a shorter line that ends here also counts as a difference
            if (!isset($lines[$n][$i]) || $lines[0][$i] != $lines[$n][$i]) {
                // we've found a difference between current token
                // stop search:
                return $max;
            }
        }
        if ($lines[0][$i] == $delim) {
            // we've found a complete token:
            $max = $i + 1;
        }
    }
    return $max;
}

$max = findLongestWord($lines);
// cut prefix of len "max"
for ($n = 0; $n < count($lines); $n++) {
    $lines[$n] = substr($lines[$n], $max);
}

I'll throw my hat in the ring …

function longestCommonPrefix($a, $b) {
$i = 0;
$end = min(strlen($a), strlen($b));
while ($i < $end && $a[$i] == $b[$i]) $i++;
return substr($a, 0, $i);
}


function longestCommonPrefixFromArray(array $strings) {
$count = count($strings);
if (!$count) return '';
$prefix = reset($strings);
for ($i = 1; $i < $count; $i++)
$prefix = longestCommonPrefix($prefix, $strings[$i]);
return $prefix;
}


function stripPrefix(&$string, $foo, $length) {
$string = substr($string, $length);
}

Usage:

$paths = array(
'/www/htdocs/1/sites/lib/abcdedd',
'/www/htdocs/1/sites/conf/xyz',
'/www/htdocs/1/sites/conf/abc/def',
'/www/htdocs/1/sites/htdocs/xyz',
'/www/htdocs/1/sites/lib2/abcdedd',
);


$longComPref = longestCommonPrefixFromArray($paths);
array_walk($paths, 'stripPrefix', strlen($longComPref));
print_r($paths);

Well, there are already some solutions here but, just because it was fun:

$values = array(
'/www/htdocs/1/sites/lib/abcdedd',
'/www/htdocs/1/sites/conf/xyz',
'/www/htdocs/1/sites/conf/abc/def',
'/www/htdocs/1/sites/htdocs/xyz',
'/www/htdocs/1/sites/lib2/abcdedd'
);


function findCommon($values){
$common = false;
foreach($values as &$p){
$p = explode('/', $p);
if(!$common){
$common = $p;
} else {
$common = array_intersect_assoc($common, $p);
}
}
return $common;
}
function removeCommon($values, $common){
foreach($values as &$p){
$p = explode('/', $p);
$p = array_diff_assoc($p, $common);
$p = implode('/', $p);
}


return $values;
}


echo '<pre>';
print_r(removeCommon($values, findCommon($values)));
echo '</pre>';

Output:

Array
(
[0] => lib/abcdedd
[1] => conf/xyz
[2] => conf/abc/def
[3] => htdocs/xyz
[4] => lib2/abcdedd
)

Well, you can use XOR in this situation to find the common parts of the strings. Any time you XOR two bytes that are the same, you get a null byte as the output. So we can use that to our advantage:

$first = $array[0];
$length = strlen($first);
$count = count($array);
for ($i = 1; $i < $count; $i++) {
$length = min($length, strspn($array[$i] ^ $first, chr(0)));
}

After that single loop, the $length variable will be equal to the length of the longest common base part shared by the strings in the array. Then, we can extract the common part from the first element:

$common = substr($array[0], 0, $length);

And there you have it. As a function:

function commonPrefix(array $strings) {
$first = $strings[0];
$length = strlen($first);
$count = count($strings);
for ($i = 1; $i < $count; $i++) {
$length = min($length, strspn($strings[$i] ^ $first, chr(0)));
}
return substr($first, 0, $length);
}

Note that it does use more than one iteration, but those iterations are done in libraries, so in interpreted languages this will have a huge efficiency gain...

Now, if you want only full paths, we need to truncate to the last / character. So:

$prefix = preg_replace('#/[^/]*$#', '', commonPrefix($paths));

Now, this may cut too much: two strings such as /foo/bar and /foo/bar/baz will be cut to /foo. But short of adding another iteration round to determine if the next character is either / or end-of-string, I can't see a way around that...
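For what it's worth, that extra round could look roughly like the sketch below. The name commonDirPrefix is hypothetical; it reuses the XOR/strspn idea from above and then checks whether every string either ends at the prefix length or has a / there, falling back to the last / only when that is not the case. Treat it as one possible way around the over-cutting, not a tested drop-in.

```php
<?php
// Hypothetical helper: XOR-based common prefix, trimmed to a directory boundary.
function commonDirPrefix(array $strings): string
{
    $strings = array_values($strings);
    if ($strings === []) {
        return '';
    }

    // character-level common prefix length: identical bytes XOR to "\0"
    $first = $strings[0];
    $length = strlen($first);
    foreach ($strings as $s) {
        $length = min($length, strspn($s ^ $first, "\0"));
    }
    $prefix = substr($first, 0, $length);

    // extra round: is the character right after the prefix a '/' or end-of-string everywhere?
    foreach ($strings as $s) {
        if (strlen($s) > $length && $s[$length] !== '/') {
            // the prefix ends inside a name, so cut back to the last slash
            $slash = strrpos($prefix, '/');
            return ($slash === false) ? '' : substr($prefix, 0, $slash);
        }
    }
    return $prefix;
}

echo commonDirPrefix(['/foo/bar', '/foo/bar/baz']), "\n"; // /foo/bar
echo commonDirPrefix(['/a/lib', '/a/lib2']), "\n";        // /a
```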