生成没有相邻等元素的列表的所有排列

当我们整理列表的时候,比如

a = [1,2,3,3,2,2,1]
sorted(a) => [1, 1, 2, 2, 2, 3, 3]

结果列表中的相等元素总是相邻的。

我怎样才能完成相反的任务——重新组合列表,使相等的元素永远不会(或尽可能少地)相邻?

例如,对于上面的列表,其中一个可能的解决方案是

p = [1,3,2,3,2,1,2]

更正式的说法是,给定一个列表 a,生成它的排列 p,使得对 p[i]==p[i+1]的数目最小化。

由于列表很大,因此不能生成和过滤所有排列。

附加问题: 如何有效地生成所有这些排列?

这是我用来测试解决方案的代码: https://gist.github.com/gebrkn/9f550094b3d24a35aebd

UPD: 在这里选择一个获胜者是一个艰难的选择,因为许多人发布了出色的答案。@ Vincent VanderWeele@ David Eisenstat@ Coady@ Enrico Bacis@ srgerg提供的功能可以完美地产生最佳的排列。@ Tobias _ k和 David 也回答了奖金问题(生成所有排列)。对大卫的正确性证明的附加要点。

来自@VincentvanderWeele 的代码似乎是最快的。

12353 次浏览

Pseudocode:

  1. Sort the list
  2. Loop over the first half of the sorted list and fill all even indices of the result list
  3. Loop over the second half of the sorted list and fill all odd indices of the result list

You will only have p[i]==p[i+1] if more than half of the input consists of the same element, in which case there is no other choice than putting the same element in consecutive spots (by the pidgeon hole principle).


As pointed out in the comments, this approach may have one conflict too many in case one of the elements occurs at least n/2 times (or n/2+1 for odd n; this generalizes to (n+1)/2) for both even and odd). There are at most two such elements and if there are two, the algorithm works just fine. The only problematic case is when there is one element that occurs at least half of the time. We can simply solve this problem by finding the element and dealing with it first.

I don't know enough about python to write this properly, so I took the liberty to copy the OP's implementation of a previous version from github:

# Sort the list
a = sorted(lst)


# Put the element occurring more than half of the times in front (if needed)
n = len(a)
m = (n + 1) // 2
for i in range(n - m + 1):
if a[i] == a[i + m - 1]:
a = a[i:] + a[:i]
break


result = [None] * n


# Loop over the first half of the sorted list and fill all even indices of the result list
for i, elt in enumerate(a[:m]):
result[2*i] = elt


# Loop over the second half of the sorted list and fill all odd indices of the result list
for i, elt in enumerate(a[m:]):
result[2*i+1] = elt


return result

Here is a good algorithm:

  1. First of all count for all numbers how often they occur. Place the answer in a map.

  2. sort this map so that the numbers that occur most often come first.

  3. The first number of your answer is the first number in the sorted map.

  4. Resort the map with the first now being one smaller.

If you want to improve efficiency look for ways to increase the efficiency of the sorting step.

The idea is to sort the elements from the most common to the least common, take the most common, decrease its count and put it back in the list keeping the descending order (but avoiding putting the last used element first to prevent repetitions when possible).

This can be implemented using Counter and bisect:

from collections import Counter
from bisect import bisect


def unsorted(lst):
# use elements (-count, item) so bisect will put biggest counts first
items = [(-count, item) for item, count in Counter(lst).most_common()]
result = []


while items:
count, item = items.pop(0)
result.append(item)
if count != -1:
element = (count + 1, item)
index = bisect(items, element)
# prevent insertion in position 0 if there are other items
items.insert(index or (1 if items else 0), element)


return result

Example

>>> print unsorted([1, 1, 1, 2, 3, 3, 2, 2, 1])
[1, 2, 1, 2, 1, 3, 1, 2, 3]


>>> print unsorted([1, 2, 3, 2, 3, 2, 2])
[2, 3, 2, 1, 2, 3, 2]

In python you could do the following.

Consider you have a sorted list l, you can do:

length = len(l)
odd_ind = length%2
odd_half = (length - odd_ind)/2
for i in range(odd_half)[::2]:
my_list[i], my_list[odd_half+odd_ind+i] = my_list[odd_half+odd_ind+i], my_list[i]

These are just in place operations and should thus be rather fast (O(N)). Note that you will shift from l[i] == l[i+1] to l[i] == l[i+2] so the order you end up with is anything but random, but from how I understand the question it is not randomness you are looking for.

The idea is to split the sorted list in the middle then exchange every other element in the two parts.

For l= [1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5] this leads to l = [3, 1, 4, 2, 5, 1, 3, 1, 4, 2, 5]

The method fails to get rid of all the l[i] == l[i + 1] as soon as the abundance of one element is bigger than or equal to half of the length of the list.

While the above works fine as long as the abundance of the most frequent element is smaller than half the size of the list, the following function also handles the limit cases (the famous off-by-one issue) where every other element starting with the first one must be the most abundant one:

def no_adjacent(my_list):
my_list.sort()
length = len(my_list)
odd_ind = length%2
odd_half = (length - odd_ind)/2
for i in range(odd_half)[::2]:
my_list[i], my_list[odd_half+odd_ind+i] = my_list[odd_half+odd_ind+i], my_list[i]


#this is just for the limit case where the abundance of the most frequent is half of the list length
if max([my_list.count(val) for val in set(my_list)]) + 1 - odd_ind > odd_half:
max_val = my_list[0]
max_count = my_list.count(max_val)
for val in set(my_list):
if my_list.count(val) > max_count:
max_val = val
max_count = my_list.count(max_val)
while max_val in my_list:
my_list.remove(max_val)
out = [max_val]
max_count -= 1
for val in my_list:
out.append(val)
if max_count:
out.append(max_val)
max_count -= 1
if max_count:
print 'this is not working'
return my_list
#raise Exception('not possible')
return out
else:
return my_list

You can generate all the 'perfectly unsorted' permutations (that have no two equal elements in adjacent positions) using a recursive backtracking algorithm. In fact, the only difference to generating all the permutations is that you keep track of the last number and exclude some solutions accordingly:

def unsort(lst, last=None):
if lst:
for i, e in enumerate(lst):
if e != last:
for perm in unsort(lst[:i] + lst[i+1:], e):
yield [e] + perm
else:
yield []

Note that in this form the function is not very efficient, as it creates lots of sub-lists. Also, we can speed it up by looking at the most-constrained numbers first (those with the highest count). Here's a much more efficient version using only the counts of the numbers.

def unsort_generator(lst, sort=False):
counts = collections.Counter(lst)
def unsort_inner(remaining, last=None):
if remaining > 0:
# most-constrained first, or sorted for pretty-printing?
items = sorted(counts.items()) if sort else counts.most_common()
for n, c in items:
if n != last and c > 0:
counts[n] -= 1   # update counts
for perm in unsort_inner(remaining - 1, n):
yield [n] + perm
counts[n] += 1   # revert counts
else:
yield []
return unsort_inner(len(lst))

You can use this to generate just the next perfect permutation, or a list holding all of them. But note, that if there is no perfectly unsorted permutation, then this generator will consequently yield no results.

>>> lst = [1,2,3,3,2,2,1]
>>> next(unsort_generator(lst))
[2, 1, 2, 3, 1, 2, 3]
>>> list(unsort_generator(lst, sort=True))
[[1, 2, 1, 2, 3, 2, 3],
... 36 more ...
[3, 2, 3, 2, 1, 2, 1]]
>>> next(unsort_generator([1,1,1]))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration

To circumvent this problem, you could use this together with one of the algorithms proposed in the other answers as a fallback. This will guarantee to return a perfectly unsorted permutation, if there is one, or a good approximation otherwise.

def unsort_safe(lst):
try:
return next(unsort_generator(lst))
except StopIteration:
return unsort_fallback(lst)

This is along the lines of Thijser's currently incomplete pseudocode. The idea is to take the most frequent of the remaining item types unless it was just taken. (See also Coady's implementation of this algorithm.)

import collections
import heapq




class Sentinel:
pass




def david_eisenstat(lst):
counts = collections.Counter(lst)
heap = [(-count, key) for key, count in counts.items()]
heapq.heapify(heap)
output = []
last = Sentinel()
while heap:
minuscount1, key1 = heapq.heappop(heap)
if key1 != last or not heap:
last = key1
minuscount1 += 1
else:
minuscount2, key2 = heapq.heappop(heap)
last = key2
minuscount2 += 1
if minuscount2 != 0:
heapq.heappush(heap, (minuscount2, key2))
output.append(last)
if minuscount1 != 0:
heapq.heappush(heap, (minuscount1, key1))
return output

Proof of correctness

For two item types, with counts k1 and k2, the optimal solution has k2 - k1 - 1 defects if k1 < k2, 0 defects if k1 = k2, and k1 - k2 - 1 defects if k1 > k2. The = case is obvious. The others are symmetric; each instance of the minority element prevents at most two defects out of a total of k1 + k2 - 1 possible.

This greedy algorithm returns optimal solutions, by the following logic. We call a prefix (partial solution) safe if it extends to an optimal solution. Clearly the empty prefix is safe, and if a safe prefix is a whole solution then that solution is optimal. It suffices to show inductively that each greedy step maintains safety.

The only way that a greedy step introduces a defect is if only one item type remains, in which case there is only one way to continue, and that way is safe. Otherwise, let P be the (safe) prefix just before the step under consideration, let P' be the prefix just after, and let S be an optimal solution extending P. If S extends P' also, then we're done. Otherwise, let P' = Px and S = PQ and Q = yQ', where x and y are items and Q and Q' are sequences.

Suppose first that P does not end with y. By the algorithm's choice, x is at least as frequent in Q as y. Consider the maximal substrings of Q containing only x and y. If the first substring has at least as many x's as y's, then it can be rewritten without introducing additional defects to begin with x. If the first substring has more y's than x's, then some other substring has more x's than y's, and we can rewrite these substrings without additional defects so that x goes first. In both cases, we find an optimal solution T that extends P', as needed.

Suppose now that P does end with y. Modify Q by moving the first occurrence of x to the front. In doing so, we introduce at most one defect (where x used to be) and eliminate one defect (the yy).

Generating all solutions

This is tobias_k's answer plus efficient tests to detect when the choice currently under consideration is globally constrained in some way. The asymptotic running time is optimal, since the overhead of generation is on the order of the length of the output. The worst-case delay unfortunately is quadratic; it could be reduced to linear (optimal) with better data structures.

from collections import Counter
from itertools import permutations
from operator import itemgetter
from random import randrange




def get_mode(count):
return max(count.items(), key=itemgetter(1))[0]




def enum2(prefix, x, count, total, mode):
prefix.append(x)
count_x = count[x]
if count_x == 1:
del count[x]
else:
count[x] = count_x - 1
yield from enum1(prefix, count, total - 1, mode)
count[x] = count_x
del prefix[-1]




def enum1(prefix, count, total, mode):
if total == 0:
yield tuple(prefix)
return
if count[mode] * 2 - 1 >= total and [mode] != prefix[-1:]:
yield from enum2(prefix, mode, count, total, mode)
else:
defect_okay = not prefix or count[prefix[-1]] * 2 > total
mode = get_mode(count)
for x in list(count.keys()):
if defect_okay or [x] != prefix[-1:]:
yield from enum2(prefix, x, count, total, mode)




def enum(seq):
count = Counter(seq)
if count:
yield from enum1([], count, sum(count.values()), get_mode(count))
else:
yield ()




def defects(lst):
return sum(lst[i - 1] == lst[i] for i in range(1, len(lst)))




def test(lst):
perms = set(permutations(lst))
opt = min(map(defects, perms))
slow = {perm for perm in perms if defects(perm) == opt}
fast = set(enum(lst))
print(lst, fast, slow)
assert slow == fast




for r in range(10000):
test([randrange(3) for i in range(randrange(6))])

The algorithm already given of taking the most common item left that isn't the previous item is correct. Here's a simple implementation, which optimally uses a heap to track the most common.

import collections, heapq
def nonadjacent(keys):
heap = [(-count, key) for key, count in collections.Counter(a).items()]
heapq.heapify(heap)
count, key = 0, None
while heap:
count, key = heapq.heapreplace(heap, (count, key)) if count else heapq.heappop(heap)
yield key
count += 1
for index in xrange(-count):
yield key


>>> a = [1,2,3,3,2,2,1]
>>> list(nonadjacent(a))
[2, 1, 2, 3, 1, 2, 3]
  1. Sort the list.
  2. Generate a "best shuffle" of the list using this algorithm

It will give the minimum of items from the list in their original place (by item value) so it will try, for your example, to put the 1's, 2's and 3's away from their sorted positions.

Start with the sorted list of length n. Let m=n/2. Take the values at 0, then m, then 1, then m+1, then 2, then m+2, and so on. Unless you have more than half of the numbers the same, you'll never get equivalent values in consecutive order.

Please forgive my "me too" style answer, but couldn't Coady's answer be simplified to this?

from collections import Counter
from heapq import heapify, heappop, heapreplace
from itertools import repeat


def srgerg(data):
heap = [(-freq+1, value) for value, freq in Counter(data).items()]
heapify(heap)


freq = 0
while heap:
freq, val = heapreplace(heap, (freq+1, val)) if freq else heappop(heap)
yield val
yield from repeat(val, -freq)

Edit: Here's a python 2 version that returns a list:

def srgergpy2(data):
heap = [(-freq+1, value) for value, freq in Counter(data).items()]
heapify(heap)


freq = 0
result = list()
while heap:
freq, val = heapreplace(heap, (freq+1, val)) if freq else heappop(heap)
result.append(val)
result.extend(repeat(val, -freq))
return result
  1. Count number of times each value appears
  2. Select values in order from most frequent to least frequent
  3. Add selected value to final output, incrementing the index by 2 each time
  4. Reset index to 1 if index out of bounds
from heapq import heapify, heappop
def distribute(values):
counts = defaultdict(int)
for value in values:
counts[value] += 1
counts = [(-count, key) for key, count in counts.iteritems()]
heapify(counts)
index = 0
length = len(values)
distributed = [None] * length
while counts:
count, value = heappop(counts)
for _ in xrange(-count):
distributed[index] = value
index = index + 2 if index + 2 < length else 1
return distributed

In answer to the bonus question: this is an algorithm which finds all permutations of a set where no adjacent elements can be identical. I believe this to be the most efficient algorithm conceptually (although others may be faster in practice because they translate into simpler code). It doesn't use brute force, it only generates unique permutations, and paths not leading to solutions are cut off at the earliest point.

I will use the term "abundant element" for an element in a set which occurs more often than all other elements combined, and the term "abundance" for the number of abundant elements minus the number of other elements.
e.g. the set abac has no abundant element, the sets abaca and aabcaa have a as the abundant element, and abundance 1 and 2 respectively.

  1. Start with a set like:

aaabbcd

  1. Seperate the first occurances from the repeats:

firsts: abcd
repeats: aab

  1. Find the abundant element in the repeats, if any, and calculate the abundance:

abundant element: a
abundance: 1

  1. Generate all permutations of the firsts where the number of elements after the abundant element is not less than the abundance: (so in the example the "a" cannot be last)

abcd, abdc, acbd, acdb, adbc, adcb, bacd, badc, bcad, bcda, bdac, bdca,
cabd, cadb, cbad, cbda, cdab, cdba, dabc, dacb, abac, dbca, dcab, dcba

  1. For each permutation, insert the set of repeated characters one by one, following these rules:

5.1. If the abundance of the set is greater than the number of elements after the last occurance of the abundant element in the permutation so far, skip to the next permutation.
e.g. when permutation so far is abc, a set with abundant element a can only be inserted if the abundance is 2 or less, so aaaabc is ok, aaaaabc isn't.

5.2. Select the element from the set whose last occurance in the permutation comes first.
e.g. when permutation so far is abcba and set is ab, select b

5.3. Insert the selected element at least 2 positions to the right of its last occurance in the permutation.
e.g. when inserting b into permutation babca, results are babcba and babcab

5.4. Recurse step 5 with each resulting permutation and the rest of the set.

EXAMPLE:
set = abcaba
firsts = abc
repeats = aab


perm3  set    select perm4  set    select perm5  set    select perm6


abc    aab    a      abac   ab     b      ababc  a      a      ababac
ababca
abacb  a      a      abacab
abacba
abca   ab     b      abcba  a      -
abcab  a      a      abcaba
acb    aab    a      acab   ab     a      acaba  b      b      acabab
acba   ab     b      acbab  a      a      acbaba
bac    aab    b      babc   aa     a      babac  a      a      babaca
babca  a      -
bacb   aa     a      bacab  a      a      bacaba
bacba  a      -
bca    aab    -
cab    aab    a      caba   ab     b      cabab  a      a      cababa
cba    aab    -

This algorithm generates unique permutations. If you want to know the total number of permutations (where aba is counted twice because you can switch the a's), multiply the number of unique permutations with a factor:

F = N1! * N2! * ... * Nn!

where N is the number of occurances of each element in the set. For a set abcdabcaba this would be 4! * 3! * 2! * 1! or 288, which demonstrates how inefficient an algorithm is that generates all permutations instead of only the unique ones. To list all permutations in this case, just list the unique permutations 288 times :-)

Below is a (rather clumsy) implementation in Javascript; I suspect that a language like Python may be better suited for this sort of thing. Run the code snippet to calculate the seperated permutations of "abracadabra".

// FIND ALL PERMUTATONS OF A SET WHERE NO ADJACENT ELEMENTS ARE IDENTICAL
function seperatedPermutations(set) {
var unique = 0, factor = 1, firsts = [], repeats = [], abund;


seperateRepeats(set);
abund = abundance(repeats);
permutateFirsts([], firsts);
alert("Permutations of [" + set + "]\ntotal: " + (unique * factor) + ", unique: " + unique);


// SEPERATE REPEATED CHARACTERS AND CALCULATE TOTAL/UNIQUE RATIO
function seperateRepeats(set) {
for (var i = 0; i < set.length; i++) {
var first, elem = set[i];
if (firsts.indexOf(elem) == -1) firsts.push(elem)
else if ((first = repeats.indexOf(elem)) == -1) {
repeats.push(elem);
factor *= 2;
} else {
repeats.splice(first, 0, elem);
factor *= repeats.lastIndexOf(elem) - first + 2;
}
}
}


// FIND ALL PERMUTATIONS OF THE FIRSTS USING RECURSION
function permutateFirsts(perm, set) {
if (set.length > 0) {
for (var i = 0; i < set.length; i++) {
var s = set.slice();
var e = s.splice(i, 1);
if (e[0] == abund.elem && s.length < abund.num) continue;
permutateFirsts(perm.concat(e), s, abund);
}
}
else if (repeats.length > 0) {
insertRepeats(perm, repeats);
}
else {
document.write(perm + "<BR>");
++unique;
}
}


// INSERT REPEATS INTO THE PERMUTATIONS USING RECURSION
function insertRepeats(perm, set) {
var abund = abundance(set);
if (perm.length - perm.lastIndexOf(abund.elem) > abund.num) {
var sel = selectElement(perm, set);
var s = set.slice();
var elem = s.splice(sel, 1)[0];
for (var i = perm.lastIndexOf(elem) + 2; i <= perm.length; i++) {
var p = perm.slice();
p.splice(i, 0, elem);
if (set.length == 1) {
document.write(p + "<BR>");
++unique;
} else {
insertRepeats(p, s);
}
}
}
}


// SELECT THE ELEMENT FROM THE SET WHOSE LAST OCCURANCE IN THE PERMUTATION COMES FIRST
function selectElement(perm, set) {
var sel, pos, min = perm.length;
for (var i = 0; i < set.length; i++) {
pos = perm.lastIndexOf(set[i]);
if (pos < min) {
min = pos;
sel = i;
}
}
return(sel);
}


// FIND ABUNDANT ELEMENT AND ABUNDANCE NUMBER
function abundance(set) {
if (set.length == 0) return ({elem: null, num: 0});
var elem = set[0], max = 1, num = 1;
for (var i = 1; i < set.length; i++) {
if (set[i] != set[i - 1]) num = 1
else if (++num > max) {
max = num;
elem = set[i];
}
}
return ({elem: elem, num: 2 * max - set.length});
}
}


seperatedPermutations(["a","b","r","a","c","a","d","a","b","r","a"]);