Create the same random array every time

I'm waiting for another developer to finish a piece of code that will return an np array of shape (100, 2000) with values of -1, 0, or 1.

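In the meantime, I want to randomly create an array with the same characteristics so I can get a head start on development and testing. The thing is, I want this randomly created array to be the same every time, so that I'm not testing against an array whose values keep changing each time I re-run my process.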

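I can create my array like this, but is there a way to create it so that it is the same every time? I could pickle the object and unpickle it, but I'm wondering whether there's another way.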

import numpy as np
r = np.random.randint(3, size=(100, 2000)) - 1  # values 0..2 shifted to -1..1

Simply seed the random number generator with a fixed value, e.g.

numpy.random.seed(42)

This way, you'll always get the same random number sequence.
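A minimal sketch applying this to the array from the question: seeding immediately before each draw reproduces the same (100, 2000) array of -1/0/1 values. The seed value 42 is arbitrary.

```python
import numpy as np

# Seed the global generator, then build the stand-in array from the question.
np.random.seed(42)
first = np.random.randint(3, size=(100, 2000)) - 1

# Re-seeding with the same value restarts the same sequence...
np.random.seed(42)
second = np.random.randint(3, size=(100, 2000)) - 1

# ...so both arrays are element-for-element identical.
print(np.array_equal(first, second))  # True
```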

This function will seed the global default random number generator, and any call to a function in numpy.random will use and alter its state. This is fine for many simple use cases, but it's a form of global state with all the problems global state brings. For a cleaner solution, see Robert Kern's answer below.
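To make the global-state problem concrete, here is a small sketch: any other call to a numpy.random function between seeding and drawing (here an np.random.rand() call standing in for unrelated library code) consumes values from the shared stream, so the "reproducible" array changes.

```python
import numpy as np

np.random.seed(42)
expected = np.random.randint(3, size=(100, 2000)) - 1

np.random.seed(42)
_ = np.random.rand()  # stands in for unrelated code that also uses np.random
shifted = np.random.randint(3, size=(100, 2000)) - 1

# The extra draw advanced the shared global state, so the arrays differ.
print(np.array_equal(expected, shifted))  # False
```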

Create your own instance of numpy.random.RandomState() with your chosen seed. Do not use numpy.random.seed() except to work around inflexible libraries that do not let you pass around your own RandomState instance.

>>> from numpy.random import RandomState
>>> prng = RandomState(1234567890)
>>> prng.randint(-1, 2, size=10)
array([ 1,  1, -1,  0,  0, -1,  1,  0, -1, -1])
>>> prng2 = RandomState(1234567890)
>>> prng2.randint(-1, 2, size=10)
array([ 1,  1, -1,  0,  0, -1,  1,  0, -1, -1])
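Applied to the question, a minimal sketch: a dedicated RandomState produces the same (100, 2000) array every time and, unlike the global approach, is unaffected by other code calling np.random in between (the np.random.seed(0) and np.random.rand(5) calls below stand in for such code). The helper name make_fake_output is hypothetical.

```python
import numpy as np
from numpy.random import RandomState

def make_fake_output(seed=1234567890):
    # Hypothetical helper: uses a private generator, so global np.random state is irrelevant.
    prng = RandomState(seed)
    return prng.randint(-1, 2, size=(100, 2000))  # high=2 is exclusive: values -1, 0, 1

a = make_fake_output()
np.random.seed(0)      # unrelated code touching the global generator...
np.random.rand(5)
b = make_fake_output()  # ...does not change what the private generator produces

print(np.array_equal(a, b))  # True
```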

If other code also relies on the global random state, you can't just set an overall seed. Instead, create a function that generates your random list of numbers and takes the seed as a parameter. This will not disturb any other random generators in the code:

import numpy as np

# Random states
def get_states(random_state, low, high, size):
    # A private RandomState, so the global np.random state is left untouched
    rs = np.random.RandomState(random_state)
    states = rs.randint(low=low, high=high, size=size)
    return states


# Call function
states = get_states(random_state=42, low=2, high=28347, size=25)
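A quick check of the claim that this does not disturb other random generators, as a sketch (get_states is repeated so the snippet runs on its own): the global stream produces the same next value whether or not get_states was called in between, and repeated calls with the same seed return the same values.

```python
import numpy as np

def get_states(random_state, low, high, size):
    rs = np.random.RandomState(random_state)
    return rs.randint(low=low, high=high, size=size)

# Baseline: next value from the global generator right after seeding.
np.random.seed(0)
baseline = np.random.rand()

# Same again, but with a get_states call in between.
np.random.seed(0)
states = get_states(random_state=42, low=2, high=28347, size=25)
after = np.random.rand()

print(baseline == after)                                     # True: global stream untouched
print(np.array_equal(states, get_states(42, 2, 28347, 25)))  # True: reproducible
print(states.min() >= 2 and states.max() < 28347)            # True: high is exclusive
```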

It is important to understand what the seed of a random generator is and when/how it is set in your code (see e.g. here for a nice explanation of the mathematical meaning of the seed).

For that you need to set the seed by doing:

random_state = np.random.RandomState(seed=your_favorite_seed_value)

It is then important to generate the random numbers from random_state and not from np.random. I.e. you should do:

random_state.randint(...)

instead of

np.random.randint(...)

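which draws from NumPy's hidden global RandomState instance. Unless np.random.seed() has been called, that instance is seeded unpredictably (from operating-system entropy) when NumPy is imported, so you get different numbers on every run, and any other code using np.random shares and changes the same state.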
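One way to follow this advice throughout a codebase, sketched with a hypothetical function simulate_output: accept the RandomState as a parameter and draw only from it, so the caller controls the seed and two generators built from the same seed yield the same result.

```python
import numpy as np

def simulate_output(random_state, shape=(100, 2000)):
    # Hypothetical stand-in for the code under development: it draws only from
    # the RandomState it is given, never from the np.random module functions.
    return random_state.randint(-1, 2, size=shape)

out1 = simulate_output(np.random.RandomState(seed=7))
out2 = simulate_output(np.random.RandomState(seed=7))
print(np.array_equal(out1, out2))  # True
```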

I just want to clarify something about @Robert Kern's answer in case it is not clear. Even if you use a RandomState, every call advances its internal state, so you have to re-initialize it each time you want the same numbers again (as Robert's example does by creating prng2). Otherwise, you'll get results like the following:

Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> prng = np.random.RandomState(2019)
>>> prng.randint(-1, 2, size=10)
array([-1,  1,  0, -1,  1,  1, -1,  0, -1,  1])
>>> prng.randint(-1, 2, size=10)
array([-1, -1, -1,  0, -1, -1,  1,  0, -1, -1])
>>> prng.randint(-1, 2, size=10)
array([ 0, -1, -1,  0,  1,  1, -1,  1, -1,  1])
>>> prng.randint(-1, 2, size=10)
array([ 1,  1,  0,  0,  0, -1,  1,  1,  0, -1])
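Besides creating a new RandomState, an existing instance can also be rewound. A small sketch using the legacy RandomState.get_state() / set_state() methods (snapshot and restore the internal state) and RandomState.seed() (re-seed in place):

```python
import numpy as np

prng = np.random.RandomState(2019)
snapshot = prng.get_state()            # save the internal Mersenne Twister state

first = prng.randint(-1, 2, size=10)
second = prng.randint(-1, 2, size=10)  # state has advanced, so values differ

prng.set_state(snapshot)               # rewind to the saved state
replay = prng.randint(-1, 2, size=10)

prng.seed(2019)                        # or re-seed the same instance in place
reseeded = prng.randint(-1, 2, size=10)

print(np.array_equal(first, replay), np.array_equal(first, reseeded))  # True True
```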

Based on the latest updates to Random sampling (numpy.random), the preferred way is to use a Generator instead of RandomState. Refer to the "What's new or different" page to compare both approaches. One of the key changes is that RandomState is bound to the slower Mersenne Twister pseudo-random number generator, whereas the new approach (Generator) draws from a stream of random bits supplied by an interchangeable BitGenerator, which can use different algorithms.
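To illustrate the Generator/BitGenerator split, a small sketch: a Generator is a front end over a pluggable BitGenerator. default_rng(seed) uses PCG64, so the same seed passed through an explicitly constructed Generator(PCG64(seed)) gives the same stream. The Mersenne Twister is also available as a BitGenerator (MT19937) under the new API, though its seeding differs from the legacy RandomState, so its output will not match RandomState with the same seed.

```python
import numpy as np

rng_default = np.random.default_rng(42)
rng_explicit = np.random.Generator(np.random.PCG64(42))  # same bit generator, built by hand
rng_mt = np.random.Generator(np.random.MT19937(42))      # Mersenne Twister behind the new API

print(type(rng_default.bit_generator).__name__)  # PCG64

a = rng_default.integers(-1, 2, size=10)
b = rng_explicit.integers(-1, 2, size=10)
print(np.array_equal(a, b))  # True
print(rng_mt.integers(-1, 2, size=10))
```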

Otherwise, the steps for producing a random NumPy array are very similar:

  1. Initialize the random generator

Instead of a RandomState, you initialize a random Generator. default_rng is the recommended constructor for Generator, but you can of course try other ways.

import numpy as np


rng = np.random.default_rng(42)
# rng -> Generator(PCG64)

  2. Generate the NumPy array

Instead of the randint method, there is the Generator.integers method, which is now the canonical way to generate random integers from a discrete uniform distribution (see the "What's new or different" summary mentioned above). Note that endpoint=True samples from the closed interval [low, high] instead of the default half-open interval [low, high).

arr = rng.integers(-1, 1, size=10, endpoint=True)
# array([-1,  1,  0,  0,  0,  1, -1,  1, -1, -1])
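A sketch of the interval semantics, producing the question's -1/0/1 array with a Generator: with the default endpoint=False the upper bound is exclusive, so high=2 is needed; with endpoint=True, high=1 is itself included.

```python
import numpy as np

rng = np.random.default_rng(42)
half_open = rng.integers(-1, 2, size=(100, 2000))              # samples from [-1, 2)
closed = rng.integers(-1, 1, size=(100, 2000), endpoint=True)  # samples from [-1, 1]

# Both forms only ever produce -1, 0 or 1.
print(np.unique(half_open), np.unique(closed))
```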

As already discussed, you have to re-initialize the random generator (or random state) every time you want to generate an identical array. The simplest approach is therefore to define a custom function similar to the one in @mari756h's answer:

import numpy as np

def get_array(low, high, size, random_state=42, endpoint=True):
    # A fresh Generator per call, so the same arguments always give the same array
    rng = np.random.default_rng(random_state)
    return rng.integers(low, high, size=size, endpoint=endpoint)

When you call the function with the same parameters, you will always get an identical NumPy array.

get_array(-1, 1, 10)
# array([-1,  1,  0,  0,  0,  1, -1,  1, -1, -1])


get_array(-1, 1, 10, random_state=12345)  # change random state to get different array
# array([ 1, -1,  1, -1, -1,  1,  0,  1,  1,  0])


get_array(-1, 1, (2, 2), endpoint=False)
# array([[-1,  0],
#        [ 0, -1]])

And for your needs you would use get_array(-1, 1, size=(100, 2000)).
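Putting it together for the question, a minimal sketch: get_array (repeated so the snippet runs standalone) yields the same (100, 2000) stand-in array on every call, which can be fed to the code under test until the real function is ready. The variable name fake_output is illustrative only.

```python
import numpy as np

def get_array(low, high, size, random_state=42, endpoint=True):
    rng = np.random.default_rng(random_state)
    return rng.integers(low, high, size=size, endpoint=endpoint)

fake_output = get_array(-1, 1, size=(100, 2000))

# Same parameters -> same array, on every call and every run.
assert np.array_equal(fake_output, get_array(-1, 1, size=(100, 2000)))
assert fake_output.shape == (100, 2000)
assert set(np.unique(fake_output).tolist()) == {-1, 0, 1}
```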