Shuffle two lists at once with the same order

I'm using the nltk library's movie_reviews corpus, which contains a large number of documents. My task is to measure predictive performance on these reviews with and without pre-processing of the data. The problem is that the lists documents and documents2 contain the same documents, and I need to shuffle them so that both lists end up in the same order. I cannot shuffle them separately, because each shuffle produces a different order. I need to shuffle both at once in the same order because I compare them at the end (and the comparison depends on the order). I'm using Python 2.7.

Example (in reality the strings are tokenized, but that is not relevant here):

documents = [(['plot : two teen couples go to a church party , '], 'neg'),
(['drink and then drive . '], 'pos'),
(['they get into an accident . '], 'neg'),
(['one of the guys dies'], 'neg')]


documents2 = [(['plot two teen couples church party'], 'neg'),
(['drink then drive . '], 'pos'),
(['they get accident . '], 'neg'),
(['one guys dies'], 'neg')]

And I need to get a result like this after shuffling both lists:

documents = [(['one of the guys dies'], 'neg'),
(['they get into an accident . '], 'neg'),
(['drink and then drive . '], 'pos'),
(['plot : two teen couples go to a church party , '], 'neg')]


documents2 = [(['one guys dies'], 'neg'),
(['they get accident . '], 'neg'),
(['drink then drive . '], 'pos'),
(['plot two teen couples church party'], 'neg')]

I have this code:

import random

import nltk
from nltk.corpus import movie_reviews, stopwords


def cleanDoc(doc):
    stopset = set(stopwords.words('english'))
    stemmer = nltk.PorterStemmer()
    clean = [token.lower() for token in doc if token.lower() not in stopset and len(token) > 2]
    final = [stemmer.stem(word) for word in clean]
    return final


documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]


documents2 = [(list(cleanDoc(movie_reviews.words(fileid))), category)
              for category in movie_reviews.categories()
              for fileid in movie_reviews.fileids(category)]


# random.shuffle(...)  <- here I need to shuffle documents and documents2 in the same order, somehow

You can do it like this:

import random


a = ['a', 'b', 'c']
b = [1, 2, 3]


c = list(zip(a, b))


random.shuffle(c)


a, b = zip(*c)


print(a)
print(b)


Output (one possible result; a and b are now tuples, because zip(*c) yields tuples):
('a', 'c', 'b')
(1, 3, 2)

Of course, this is an example with simple lists, but the adaptation for your case is the same.
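
Applied to the sample data from the question, the same zip/shuffle/unzip idea might look like the sketch below. It assumes both lists have the same length and that entry i of documents2 is the pre-processed version of entry i of documents. The name paired is only illustrative.

```python
import random

# Sample data from the question: raw and pre-processed versions of the same documents.
documents = [(['plot : two teen couples go to a church party , '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['they get into an accident . '], 'neg'),
             (['one of the guys dies'], 'neg')]
documents2 = [(['plot two teen couples church party'], 'neg'),
              (['drink then drive . '], 'pos'),
              (['they get accident . '], 'neg'),
              (['one guys dies'], 'neg')]

# Pair each raw document with its pre-processed counterpart.
paired = list(zip(documents, documents2))

# Shuffle the pairs, so both versions of a document move together.
random.shuffle(paired)

# Unzip back into two sequences; zip(*...) yields tuples, so convert to lists.
documents, documents2 = [list(column) for column in zip(*paired)]

print(documents)
print(documents2)
```

Converting back to lists matters only if later code mutates documents or documents2; for read-only use the tuples from zip(*paired) would do.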

You can use the second argument of the shuffle function to fix the order of shuffling.

Specifically, you can pass a zero-argument function that returns a value in [0, 1) as the second argument of shuffle. The return value of this function fixes the order of shuffling. (By default, i.e. if you do not pass any function as the second argument, random.random() is used. You can see it at line 277 here.) Note that this optional argument was deprecated in Python 3.9 and removed in Python 3.11, so this approach only works on older versions such as Python 2.7.

This example illustrates what I described:

import random


a = ['a', 'b', 'c', 'd', 'e']
b = [1, 2, 3, 4, 5]


r = random.random()            # randomly generating a real in [0,1)
random.shuffle(a, lambda: r)  # lambda: r is a zero-argument function which returns r
random.shuffle(b, lambda: r)  # using the same function as in the previous line so that the shuffling order is the same


print(a)
print(b)

Output:

['e', 'c', 'd', 'a', 'b']
[5, 3, 4, 1, 2]

Shuffle any number of lists simultaneously.

from random import shuffle


def shuffle_list(*ls):
    l = list(zip(*ls))
    shuffle(l)
    return zip(*l)


a = [0,1,2,3,4]
b = [5,6,7,8,9]


a1,b1 = shuffle_list(a,b)
print(a1,b1)


a = [0,1,2,3,4]
b = [5,6,7,8,9]
c = [10,11,12,13,14]
a1,b1,c1 = shuffle_list(a,b,c)
print(a1,b1,c1)

Output:

$ (0, 2, 4, 3, 1) (5, 7, 9, 8, 6)
$ (4, 3, 0, 2, 1) (9, 8, 5, 7, 6) (14, 13, 10, 12, 11)

Note:
The objects returned by shuffle_list() are tuples.

P.S. shuffle_list() can also be applied to numpy.array():

import numpy as np

# uses shuffle_list() as defined above
a = np.array([1,2,3])
b = np.array([4,5,6])


a1,b1 = shuffle_list(a,b)
print(a1,b1)

Output:

$ (3, 1, 2) (6, 4, 5)
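
Because shuffle_list() returns tuples, code that needs mutable lists has to convert the results. A minimal sketch of a variant that does the conversion itself is shown below; the name shuffle_lists is hypothetical, and the sketch assumes all inputs are non-empty and of equal length.

```python
from random import shuffle


def shuffle_lists(*ls):
    """Shuffle several equal-length sequences in unison and return lists."""
    zipped = list(zip(*ls))   # group the i-th elements of every input together
    shuffle(zipped)           # shuffle the groups in place
    return [list(column) for column in zip(*zipped)]  # split back, as lists


a = [0, 1, 2, 3, 4]
b = [5, 6, 7, 8, 9]

a1, b1 = shuffle_lists(a, b)
print(a1, b1)
```

The inputs a and b are left unchanged; only the returned lists are in shuffled order.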

I have a simple way to do this:

import numpy as np
a = np.array([0,1,2,3,4])
b = np.array([5,6,7,8,9])


indices = np.arange(a.shape[0])
np.random.shuffle(indices)


a = a[indices]
b = b[indices]
# a, array([3, 4, 1, 2, 0])
# b, array([8, 9, 6, 7, 5])

You can also use shuffle from sklearn.utils:

import numpy as np
from sklearn.utils import shuffle


a = ['a', 'b', 'c','d','e']
b = [1, 2, 3, 4, 5]


a_shuffled, b_shuffled = shuffle(np.array(a), np.array(b))
print(a_shuffled, b_shuffled)


#random output
#['e' 'c' 'b' 'd' 'a'] [5 3 2 4 1]

An easy and fast way to do this is to use random.seed() with random.shuffle(). It lets you reproduce the same random order as many times as you want. It would look like this:

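import random

a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]
seed = random.random()
random.seed(seed)   # seed the generator before the first shuffle
random.shuffle(a)
random.seed(seed)   # re-seed with the same value to repeat the same order
random.shuffle(b)
print(a)
print(b)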


>>[3, 1, 4, 2, 5]
>>[8, 6, 9, 7, 10]

This also works when you can't work with both lists at the same time, because of memory problems.
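
A variation on the same idea, sketched below, gives each shuffle its own random.Random instance created from the same seed, so the module-level generator is not re-seeded as a side effect. This is a swapped-in alternative rather than the code above; the seed value is arbitrary, and the lists must have the same length for the orders to match.

```python
import random

a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]

seed = 12345  # arbitrary illustrative value; any integer works

# Two generators created from the same seed produce the same sequence of
# random numbers, so they apply the same permutation to equal-length lists.
random.Random(seed).shuffle(a)
random.Random(seed).shuffle(b)

print(a)
print(b)
```

Since only the seed has to be remembered, the second list can be shuffled later, or in a different part of the program, without keeping both lists in memory together.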

You can store a shuffled order of the indices in a variable, then reorder both arrays with it:

import random

array1 = [1, 2, 3, 4, 5]
array2 = ["one", "two", "three", "four", "five"]

# list() is needed on Python 3, where range() does not return a list
order = list(range(len(array1)))
random.shuffle(order)

newarray1 = []
newarray2 = []
for x in range(len(order)):
    newarray1.append(array1[order[x]])
    newarray2.append(array2[order[x]])

print(newarray1)
print(newarray2)
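
The stored-order idea can also be written more compactly. The sketch below uses random.sample() to draw a random permutation of the indices and list comprehensions to apply it; it assumes both lists have the same length.

```python
import random

array1 = [1, 2, 3, 4, 5]
array2 = ["one", "two", "three", "four", "five"]

# Sampling every index exactly once gives a random permutation of the indices.
order = random.sample(range(len(array1)), len(array1))

# Apply the same index order to both lists.
newarray1 = [array1[i] for i in order]
newarray2 = [array2[i] for i in order]

print(newarray1)
print(newarray2)
```

Keeping order around also makes it possible to apply the same permutation to further lists later on.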

This approach works as well:

import numpy as np


a = ['a', 'b', 'c']
b = [1, 2, 3]


rng = np.random.default_rng()


state = rng.bit_generator.state
rng.shuffle(a)
# use same seeds for a & b!
rng.bit_generator.state = state # set state to same state as before
rng.shuffle(b)


print(a)
print(b)

Output:

['b', 'a', 'c']
[2, 1, 3]