Why use the C# class System.Random at all instead of System.Security.Cryptography.RandomNumberGenerator?

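Why would anybody use the "standard" random number generator from System.Random at all, instead of always using the cryptographically secure random number generator from System.Security.Cryptography.RandomNumberGenerator (or its subclasses, because RandomNumberGenerator is abstract)?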

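Nate Lawson tells us in his Google Tech Talk presentation "Crypto Strikes Back" at minute 13:11 not to use the "standard" random number generators from Python, Java and C#, and to use the cryptographically secure version instead.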

I know the difference between the two versions of random number generators (see question 101337).

But what rationale is there for not always using the secure random number generator? Why use System.Random at all? Performance, perhaps?


System.Random is much more performant since it does not generate cryptographically secure random numbers.

A simple test on my machine filling a buffer of 4 bytes with random data 1,000,000 times takes 49 ms for Random, but 2845 ms for RNGCryptoServiceProvider. Note that if you increase the size of the buffer you are filling, the difference narrows as the overhead for RNGCryptoServiceProvider is less relevant.
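A rough way to reproduce that kind of measurement is sketched below. This is not the code behind the numbers above; RngTiming and its method names are made up for illustration, and RandomNumberGenerator.Create() is used here as a stand-in for RNGCryptoServiceProvider. Absolute timings will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;

// Hypothetical helper: times how long it takes to fill a buffer repeatedly.
public static class RngTiming
{
    // Fill a buffer of bufferSize bytes `iterations` times with System.Random.
    public static TimeSpan TimeRandom(int iterations, int bufferSize)
    {
        var buffer = new byte[bufferSize];
        var random = new Random();
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            random.NextBytes(buffer);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    // Same loop, but with the platform's cryptographic generator.
    public static TimeSpan TimeCrypto(int iterations, int bufferSize)
    {
        var buffer = new byte[bufferSize];
        using (RandomNumberGenerator crypto = RandomNumberGenerator.Create())
        {
            var watch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                crypto.GetBytes(buffer);
            }
            watch.Stop();
            return watch.Elapsed;
        }
    }
}
```

Calling RngTiming.TimeRandom(1000000, 4) and RngTiming.TimeCrypto(1000000, 4), then repeating with a larger buffer size, should show whether the gap narrows on your own machine.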

Speed and intent. If you're generating a random number and have no need for security, why use a slow crypto function? You don't need security, so why make someone else think that the number may be used for something secure when it won't be?

Apart from the speed and the more useful interface (NextDouble() etc.), it is also possible to make a repeatable random sequence by using a fixed seed value. That is quite useful, for instance during testing.

Random gen1 = new Random();     // auto seeded by the clock
Random gen2 = new Random(0);    // Next(10) always yields 7,8,7,5,2,....
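To make the repeatability concrete, here is a minimal sketch. SeedDemo and Draw are names invented for this example; the only thing it relies on is that two Random instances built from the same seed produce the same sequence within one runtime.

```csharp
using System;

// Hypothetical helper: draws `count` values in [0, 10) from a seeded generator.
public static class SeedDemo
{
    public static int[] Draw(int seed, int count)
    {
        var generator = new Random(seed);
        var values = new int[count];
        for (int i = 0; i < count; i++)
        {
            values[i] = generator.Next(10);
        }
        return values;
    }
}
```

A unit test can call SeedDemo.Draw(42, 5) twice and expect identical arrays, which is not possible with a cryptographic generator that cannot be seeded.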

If I don't need the security, i.e., I just want a relatively indeterminate value not one that's cryptographically strong, Random has a much easier interface to use.

Not everyone needs cryptographically secure random numbers, and they might benefit more from a speedier plain PRNG. Perhaps more important is that you can control the sequence of numbers System.Random produces.

If you might want to recreate a simulation that uses random numbers, you can rerun it with the same seed. This is also handy for tracking down bugs when you want to regenerate a given faulty scenario: you run your program with the exact same sequence of random numbers that crashed it.

If you're programming an online card game or lottery, then you would want to make sure the sequence is next to impossible to guess. However, if you are showing users, say, a quote of the day, performance is more important than security.

The most obvious reasons have already been mentioned, so here's a more obscure one: cryptographic PRNGs typically need to be continually reseeded with "real" entropy. Thus, if you use a CPRNG too often, you could deplete the system's entropy pool, which (depending on the implementation of the CPRNG) will either weaken it (thus allowing an attacker to predict it) or cause it to block while trying to fill up its entropy pool (thus becoming an attack vector for a DoS attack).

Either way, your application has now become an attack vector for other, totally unrelated applications which – unlike yours – actually vitally depend on the cryptographic properties of the CPRNG.

This is an actual real world problem, BTW, that has been observed on headless servers (which naturally have rather small entropy pools because they lack entropy sources such as mouse and keyboard input) running Linux, where applications incorrectly use the /dev/random kernel CPRNG for all sorts of random numbers, whereas the correct behavior would be to read a small seed value from /dev/urandom and use that to seed their own PRNG.
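In C# terms, the "read a small seed once, then use your own PRNG" approach could look roughly like the sketch below. SeededRandomFactory is a hypothetical name, and RandomNumberGenerator.Create() plays the role that /dev/urandom plays in the Linux description; this is an illustration of the idea, not a hardened implementation.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical factory: takes one small seed from the cryptographic generator,
// then hands back an ordinary System.Random for all further numbers.
public static class SeededRandomFactory
{
    public static Random Create()
    {
        byte[] seedBytes = new byte[4];
        using (RandomNumberGenerator crypto = RandomNumberGenerator.Create())
        {
            crypto.GetBytes(seedBytes); // the only call into the crypto generator
        }

        // Clear the sign bit so the seed is always non-negative.
        int seed = BitConverter.ToInt32(seedBytes, 0) & int.MaxValue;
        return new Random(seed);
    }
}
```

The result is still not suitable for anything security-sensitive, since everything after the seed comes from System.Random.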

This has been discussed at some length, but ultimately, performance is a secondary consideration when selecting an RNG. There is a vast array of RNGs out there, and the canned Lehmer LCG that most system RNGs consist of is neither the best nor even necessarily the fastest. On old, slow systems it was an excellent compromise. That compromise is seldom really relevant these days. The thing persists into present-day systems primarily because A) it is already built, and there is no real reason to 'reinvent the wheel' in this case, and B) for what the vast bulk of people will be using it for, it's 'good enough'.

Ultimately, the selection of an RNG comes down to Risk/Reward ratio. In some applications, for example a video game, there is no risk whatsoever. A Lehmer RNG is more than adequate, and is small, concise, fast, well-understood, and 'in the box'.

If the application is, for example, an on-line poker game or lottery where there are actual prizes involved and real money comes into play at some point in the equation, the 'in the box' Lehmer is no longer adequate. In a 32-bit version, it only has 2^32 possible valid states before it begins to cycle at best. These days, that's an open door to a brute force attack. In a case like this, the developer will want to go to something like a Very Long Period RNG of some species, and probably seed it from a cryptographically strong provider. This gives a good compromise between speed and security. In such a case, the person will be out looking for something like the Mersenne Twister, or a Multiple Recursive Generator of some kind.

If the application is something like communicating large quantities of financial information over a network, now there is a huge risk, and it heavily outweighs any possible reward. There are still armored cars because sometimes heavily armed men are the only security that's adequate, and trust me, if a brigade of special ops people with tanks, fighters, and helicopters was financially feasible, it would be the method of choice. In a case like this, using a cryptographically strong RNG makes sense, because whatever level of security you can get, it's not as much as you want. So you'll take as much as you can find, and the cost is a very, very remote second-place issue, either in time or money. And if that means that every random sequence takes 3 seconds to generate on a very powerful computer, you're going to wait the 3 seconds, because in the scheme of things, that is a trivial cost.

Different needs call for different RNGs. For crypto, you want your random numbers to be as random as possible. For Monte Carlo simulations, you want them to fill the space evenly and to be able to start the RNG from a known state.

First of all the presentation you linked only talks about random numbers for security purposes. So it doesn't claim Random is bad for non security purposes.

But I do claim it is. The .NET 4 implementation of Random is flawed in several ways. I recommend only using it if you don't care about the quality of your random numbers. Otherwise, I recommend using a better third-party implementation.

Flaw 1: The seeding

The default constructor seeds with the current time. Thus all instances of Random created with the default constructor within a short time-frame (ca. 10ms) return the same sequence. This is documented and "by-design". This is particularly annoying if you want to multi-thread your code, since you can't simply create an instance of Random at the beginning of each thread's execution.

The workaround is to be extra careful when using the default constructor and manually seed when necessary.
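One common way to do that manual seeding in multi-threaded code is sketched below: each thread lazily gets its own Random, seeded from a shared counter so that no two threads start from the same value. ThreadSafeRandom is a name made up for this example, and the counter-based seeding is just one option, not something prescribed by the framework.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: one Random per thread, each with a distinct seed.
public static class ThreadSafeRandom
{
    // Start from the clock once, then hand out consecutive seeds.
    private static int _seed = Environment.TickCount;

    private static readonly ThreadLocal<Random> Local =
        new ThreadLocal<Random>(
            () => new Random(Interlocked.Increment(ref _seed) & int.MaxValue));

    public static int Next(int maxValue)
    {
        return Local.Value.Next(maxValue);
    }
}
```

Threads call ThreadSafeRandom.Next(...) instead of creating their own instances, which sidesteps the identical-clock-seed problem without sharing a single, non-thread-safe Random.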

Another problem here is that the seed space is rather small (31 bits). So if you generate 50k instances of Random with perfectly random seeds you will probably get one sequence of random numbers twice (due to the birthday paradox). So manual seeding isn't easy to get right either.
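The birthday estimate can be checked with the usual approximation P ≈ 1 - exp(-n(n-1)/(2d)), where n is the number of instances and d the number of distinct seeds. The sketch below assumes d = 2^31; BirthdayBound is a hypothetical helper, not part of any library.

```csharp
using System;

// Hypothetical helper: birthday-bound approximation for seed collisions.
public static class BirthdayBound
{
    // Probability that at least two of `instances` uniformly random seeds
    // coincide when there are `seedSpace` possible seeds.
    public static double CollisionProbability(long instances, double seedSpace)
    {
        double pairs = instances * (instances - 1) / 2.0;
        return 1.0 - Math.Exp(-pairs / seedSpace);
    }
}
```

BirthdayBound.CollisionProbability(50000, Math.Pow(2, 31)) comes out at roughly 0.44, so a duplicate sequence among 50k instances is close to a coin flip, and the odds climb quickly as the instance count grows.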

Flaw 2: The distribution of random numbers returned by Next(int maxValue) is biased

There are parameters for which Next(int maxValue) is clearly not uniform. For example if you calculate r.Next(1431655765) % 2 you will get 0 in about 2/3 of the samples. (Sample code at the end of the answer.)
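The 2/3 figure is what you would expect if Next(maxValue) scales a uniform sample by maxValue and truncates, which is my understanding of the .NET Framework implementation (treat that as an assumption). The scaled-down model below is not the real generator; ScalingBias is a hypothetical helper that maps every value of a small sample space through the same kind of scaling and counts parities.

```csharp
using System;

// Hypothetical model of "scale and truncate": k -> floor(k * maxValue / sampleSpace).
public static class ScalingBias
{
    // Returns { count of even results, count of odd results }.
    public static int[] ParityHistogram(int sampleSpace, int maxValue)
    {
        var histogram = new int[2];
        for (int k = 0; k < sampleSpace; k++)
        {
            int scaled = (int)((long)k * maxValue / sampleSpace);
            histogram[scaled % 2]++;
        }
        return histogram;
    }
}
```

With a sample space of 3000 and maxValue of 2000 (the same 2/3 ratio as 0x55555555 to 2^31), every even result is hit by two inputs and every odd result by one, giving 2000 even and 1000 odd outcomes.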

Flaw 3: The NextBytes() method is inefficient.

The per byte cost of NextBytes() is about as big as the cost to generate a full integer sample with Next(). From this I suspect that they indeed create one sample per byte.

A better implementation using 3 bytes out of each sample would speed NextBytes() up by almost a factor 3.

Thanks to this flaw Random.NextBytes() is only about 25% faster than System.Security.Cryptography.RNGCryptoServiceProvider.GetBytes on my machine (Win7, Core i3 2600MHz).

I'm sure if somebody inspected the source/decompiled byte code they'd find even more flaws than I found with my black box analysis.


Code samples

r.Next(0x55555555) % 2 is strongly biased:

Random r = new Random();
const int mod = 2;
int[] hist = new int[mod];
for (int i = 0; i < 10000000; i++)
{
    int num = r.Next(0x55555555);
    int num2 = num % 2;
    hist[num2]++;
}
for (int i = 0; i < mod; i++)
    Console.WriteLine(hist[i]);

Performance:

byte[] bytes = new byte[8 * 1024];
var cr = new System.Security.Cryptography.RNGCryptoServiceProvider();
Random r = new Random();

// Random.NextBytes
for (int i = 0; i < 100000; i++)
{
    r.NextBytes(bytes);
}

// One sample per byte
for (int i = 0; i < 100000; i++)
{
    for (int j = 0; j < bytes.Length; j++)
        bytes[j] = (byte)r.Next();
}

// One sample per 3 bytes
for (int i = 0; i < 100000; i++)
{
    for (int j = 0; j + 2 < bytes.Length; j += 3)
    {
        int num = r.Next();
        bytes[j + 2] = (byte)(num >> 16);
        bytes[j + 1] = (byte)(num >> 8);
        bytes[j] = (byte)num;
    }
    // Yes, I know I'm not handling the last few bytes, but that won't have a noticeable impact on performance
}

// Crypto
for (int i = 0; i < 100000; i++)
{
    cr.GetBytes(bytes);
}

Note that the System.Random class in C# is coded incorrectly, so should be avoided.

https://connect.microsoft.com/VisualStudio/feedback/details/634761/system-random-serious-bug#tabs

I wrote a game (Crystal Sliders on the iPhone) that would put up a "random" series of gems (images) on the map; you would rotate the map how you wanted it, select gems, and they went away, similar to Bejeweled. I was using Random(), and it was seeded with the number of 100ns ticks since the phone booted, a pretty random seed.

I found it amazing that it would generate games that were nearly identical to each other -of the 90 or so gems, of 2 colors, I would get two EXACTLY the same except for 1 to 3 gems! If you flip 90 coins and get the same pattern except for 1-3 flips, that is VERY unlikely! I have several screen shots that show them the same. I was shocked at how bad System.Random() was! I assumed, that I MUST have written something horribly wrong in my code and was using it wrong. I was wrong though, it was the generator.

As an experiment, and as the final solution, I went back to the random number generator that I've been using since 1985 or so, which is VASTLY better. It is faster and has a period of about 6.9 * 10^156 (2^521) before it repeats. The original algorithm was seeded with a 16-bit number, but I changed that to a 32-bit number and improved the initial seeding.

The original one is here:

ftp://ftp.grnet.gr/pub/lang/algorithms/c/jpl-c/random.c

Over the years, I've thrown every random number test I could think of at this, and it passed all of them. I don't expect that it is of any value as a cryptographic one, but it returns a number as fast as "return *p++;" until it runs out of the 521 bits, and then it runs a quick process over the bits to create new random ones.

I created a C# wrapper called JPLRandom() that implements the same interface as Random(), and changed all the places where I called Random() in the code.

The difference was VASTLY better - OMG I was amazed - there should be no way I could tell from just looking at the screens of 90 or so gems in a pattern, but I did an emergency release of my game following this.

And I would never use System.Random() for anything ever again. I'm SHOCKED that their version is blown away by something that is now 30 years old!

-Traderhut Games

Random is not a random number generator, it is a deterministic pseudo-random sequence generator, which takes its name for historical reasons.

The reason to use System.Random is if you want these properties, namely a deterministic sequence, which is guaranteed to produce the same sequence of results when initialized with the same seed.

If you want to improve the "randomness" without sacrificing the interface, you can inherit from System.Random overriding several methods.
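As a rough illustration of that inheritance approach, the sketch below derives from System.Random and routes every public method through RandomNumberGenerator. CryptoBackedRandom is a made-up name, and overriding all of these members (rather than only Sample()) is a defensive choice on my part, since which base methods call Sample() differs between runtime versions. The scale-and-truncate mapping it uses still carries a small bias for some ranges, so treat it as a sketch, not a drop-in for security-sensitive code.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical subclass: keeps the System.Random interface, but every value
// comes from the platform's cryptographic generator. Any seed is ignored.
public sealed class CryptoBackedRandom : Random
{
    private readonly RandomNumberGenerator _rng = RandomNumberGenerator.Create();

    // 53 random bits scaled into [0.0, 1.0).
    protected override double Sample()
    {
        byte[] buffer = new byte[8];
        _rng.GetBytes(buffer);
        ulong bits = BitConverter.ToUInt64(buffer, 0) >> 11;
        return bits / (double)(1UL << 53);
    }

    public override double NextDouble()
    {
        return Sample();
    }

    public override int Next()
    {
        return (int)(Sample() * int.MaxValue);
    }

    public override int Next(int maxValue)
    {
        if (maxValue < 0) throw new ArgumentOutOfRangeException("maxValue");
        return (int)(Sample() * maxValue);
    }

    public override int Next(int minValue, int maxValue)
    {
        if (minValue > maxValue) throw new ArgumentOutOfRangeException("minValue");
        long range = (long)maxValue - minValue;
        return (int)(minValue + (long)(Sample() * range));
    }

    public override void NextBytes(byte[] buffer)
    {
        _rng.GetBytes(buffer);
    }
}
```

Existing code that accepts a Random parameter can then be handed a CryptoBackedRandom without further changes, at the cost of losing seeded repeatability.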

Why would you want a deterministic sequence

One reason to have a deterministic sequence rather than true randomness is because it is repeatable.

For example, if you are running a numerical simulation, you can initialize the sequence with a (true) random number, and record what number was used.

Then, if you wish to repeat the exact same simulation, e.g. for debugging purposes, you can do so by instead initializing the sequence with the recorded value.

Why would you want this particular, not very good, sequence

The only reason I can think of would be for backwards compatibility with existing code which uses this class.

In short, if you want to improve the sequence without changing the rest of your code, go ahead.