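Why do salts make dictionary attacks "impossible"?

Update: Please note that I am not asking what a salt is, what a rainbow table is, what a dictionary attack is, or what the purpose of a salt is. I am asking: if you know a user's salt and hash, isn't it quite easy to calculate their password?

I understand the process, and I implement it myself in some of my projects.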

s =  random salt
storedPassword = sha1(password + s)

In the database you store:

username | hashed_password | salt

Every implementation of salting I have seen adds the salt either at the end of the password or at the beginning:

hashed_Password = sha1(s + password )
hashed_Password = sha1(password + s)
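
A minimal Python sketch of the storage scheme described above, for illustration only. The function names, the 16-byte salt, and the hex-encoded record layout are assumptions, and SHA-1 is used only because the question does.

```python
import hashlib
import hmac
import os


def make_record(username: str, password: str) -> dict:
    """Build the (username, hashed_password, salt) row described above."""
    salt = os.urandom(16)  # assumption: 16 random bytes per user
    digest = hashlib.sha1(password.encode("utf-8") + salt).hexdigest()
    return {"username": username, "hashed_password": digest, "salt": salt.hex()}


def check_password(record: dict, attempt: str) -> bool:
    """Recompute sha1(password + salt) for a login attempt and compare."""
    salt = bytes.fromhex(record["salt"])
    digest = hashlib.sha1(attempt.encode("utf-8") + salt).hexdigest()
    return hmac.compare_digest(digest, record["hashed_password"])
```

Note that the salt sits in the same row as the hash, which is exactly why anyone who steals the table also has every salt.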

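Therefore, a dictionary attack from a hacker who is worth his salt (ha ha) would simply run each keyword against the stored salts in the common combinations listed above.

Surely the implementation described above simply adds another step for the hacker, without actually solving the underlying issue? What alternatives are there to get around this, or am I misunderstanding the problem?

The only thing I can think of is a secret blending algorithm that laces the salt and password together in a random pattern, or adding other user fields to the hashing process, meaning the hacker would need access to the database AND the code that laces them for a dictionary attack to prove fruitful. (Update: as pointed out in the comments, it is best to assume the hacker has access to all your information, so this is probably not the best approach.)

Let me give an example of how I propose a hacker would hack a user database with a list of passwords and hashes:

Data from our hacked database: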

RawPassword (not stored)  |  Hashed   |     Salt
--------------------------------------------------------
letmein                       WEFLS...       WEFOJFOFO...

Common password dictionary:

   Common Password
--------------
letmein
12345
...

For each user record, loop over the common passwords and hash them:

for each user in hacked_DB
    salt = users_salt
    hashed_pw = users_hashed_password

    for each common_password
        testhash = sha1(common_password + salt)
        if testhash = hashed_pw then
            // Match! User's password = common_password
            // Let's visit the webpage and log in now.
        end if
    next
next
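
The loop above could be sketched in runnable Python roughly as follows. The record layout, the sample salts, and the two-word list are hypothetical stand-ins for the stolen data, and the salt is assumed to be appended to the password as in the pseudocode.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def dictionary_attack(hacked_db, common_passwords):
    """Return {username: password} for every record a wordlist entry matches."""
    cracked = {}
    for user in hacked_db:
        salt = user["salt"]
        for candidate in common_passwords:
            if sha1_hex(candidate.encode("utf-8") + salt) == user["hashed_password"]:
                cracked[user["username"]] = candidate
                break  # stop at the first match for this user
    return cracked


# Hypothetical stolen rows: the attacker sees each salt next to its hash.
hacked_db = [
    {"username": "alice", "salt": b"salt-a",
     "hashed_password": sha1_hex(b"letmein" + b"salt-a")},
    {"username": "bob", "salt": b"salt-b",
     "hashed_password": sha1_hex(b"correct horse battery staple" + b"salt-b")},
]
cracked = dictionary_attack(hacked_db, ["letmein", "12345"])
print(cracked)  # {'alice': 'letmein'}
```

Only the account whose password appears in the word list falls; the cost is one hash per user per candidate.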

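I hope this illustrates my point a lot better.

Given 10,000 common passwords and 10,000 user records, we would need to calculate 100,000,000 hashes to discover as many user passwords as possible. It might take a few hours, but that is not really an issue.

Update on Cracking Theory

We will assume we are a corrupt web host with access to a database of SHA1 hashes and salts, along with the algorithm used to blend them. The database has 10,000 user records.

This site claims to be able to calculate 2,300,000,000 SHA1 hashes per second using the GPU. (In a real-world situation it will probably be slower, but for now we will use that quoted figure.)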

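(((95^4)/2300000000)/2)*10000 = 177 seconds

Given the full range of 95 printable ASCII characters, with a maximum length of 4 characters, divided by the rate of calculation (variable), divided by 2 (assuming that finding a password takes on average 50% of the permutations), it would take 177 seconds to work out the passwords of all 10,000 users whose password length is <= 4.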

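Let's adjust it a bit for realism.

(((36^7)/1000000000)/2)*10000 = about 4.5 days

Assuming no case sensitivity, a password length <= 7, and only alphanumeric characters, it would take roughly 4 days to solve for 10,000 user records. I have roughly halved the quoted hash rate to reflect overhead and non-ideal circumstances.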

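It is important to recognise that this is a linear brute-force attack: all calculations are independent of one another, so it is a perfect task for multiple systems to solve. (I.e. it is easy to set up 2 computers running the attack from different ends, which would halve the execution time.)

Consider the case of recursively hashing a password 1,000 times to make this task more computationally expensive:

(((36^7)/1000000000)/2)*1000 seconds = 10.8839117 hours

This represents a maximum length of 7 alphanumeric characters, at less than half the speed of the quoted figure, for one user.

Recursively hashing 1,000 times effectively blocks a blanket attack, but user data is still vulnerable to targeted attacks.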
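
The estimates above can be reproduced with a short Python sketch. The helper name `crack_seconds` is illustrative, and, like the formulas in the question, it counts only passwords of exactly the maximum length and assumes a match is found after half the keyspace on average.

```python
def crack_seconds(alphabet_size, length, hashes_per_second, users=1, iterations=1):
    """Average brute-force time: half the keyspace, per user, per hash iteration."""
    keyspace = alphabet_size ** length
    return keyspace / hashes_per_second / 2 * users * iterations


# 95 printable ASCII characters, length 4, quoted GPU rate, 10,000 users
short = crack_seconds(95, 4, 2_300_000_000, users=10_000)
# 36 alphanumeric characters, length 7, roughly half the quoted rate
realistic = crack_seconds(36, 7, 1_000_000_000, users=10_000)
# Same keyspace, one user, 1,000 hash iterations per guess
stretched = crack_seconds(36, 7, 1_000_000_000, users=1, iterations=1_000)

print(f"{short:.0f} seconds")            # 177 seconds
print(f"{realistic / 86_400:.1f} days")  # 4.5 days
print(f"{stretched / 3_600:.2f} hours")  # 10.88 hours
```

Because the work scales linearly with both the number of users and the iteration count, stretching the hash by a factor of 1,000 multiplies every figure by 1,000.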


To be more precise, a dictionary attack, i.e. an attack where all words in an exhaustive list are tried, does not become "impossible", but it does become impractical: each bit of salt doubles the amount of storage and computation required.

This is different from pre-computed dictionary attacks like attacks involving rainbow tables where it does not matter whether the salt is secret or not.

Example: With a 64-bit salt (i.e. 8 bytes) you need to check 2^64 additional password combinations in your dictionary attack. With a dictionary containing 200,000 words you will have to make

200,000 * 2^64 = 3.69 * 10^24

tests in the worst case - instead of 200,000 tests without salt.
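
The worst-case figure can be checked with a few lines of Python; the variable names are illustrative, and the calculation assumes, as this answer does, that the attacker does not know the salt and must try every possible value.

```python
dictionary_words = 200_000
salt_bits = 64

# Every dictionary word has to be tried with every possible salt value.
worst_case_tests = dictionary_words * 2 ** salt_bits
print(f"{worst_case_tests:.2e}")  # 3.69e+24
```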

An additional benefit of using salt is that an attacker cannot pre-compute the password hashes from his dictionary. It would simply take too much time and/or space.

Update

Your update assumes that an attacker already knows the salt (or has stolen it). This is of course a different situation. It is still not possible for the attacker to use a pre-computed rainbow table. What matters a lot here is the speed of the hashing function: to make an attack impractical, the hashing function needs to be slow. MD5 and the SHA family are not good candidates because they are designed to be fast; better candidates are Blowfish-based schemes such as bcrypt.

Update 2

A good read on the matter of securing your password hashes in general (going much beyond the original question but still interesting):

Enough With The Rainbow Tables: What You Need To Know About Secure Password Schemes

Corollary of the article: Use salted hashes created with bcrypt (based on Blowfish) or Eksblowfish, which allow you to configure the setup time to make hashing slow.

The idea behind a dictionary attack is that you take a hash and find the password from which it was calculated, without computing any hashes yourself. Now try the same with a salted password: you can't.

Not using a salt makes a password search as easy as a lookup in a database. Adding a salt forces the attacker to compute the hash of every candidate password (even for a dictionary attack, this significantly increases the time the attack takes).

A dictionary is a structure where values are indexed by keys. In the case of a pre-computed dictionary attack, each key is a hash, and the corresponding value is a password that results in the hash. With a pre-computed dictionary in hand, an attacker can "instantly" lookup a password that will produce the necessary hash to log in.

With salt, the space required to store the dictionary grows rapidly… so rapidly, that trying to pre-compute a password dictionary soon becomes pointless.
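
As a rough illustration of the lookup described above, here is a small Python sketch. The three-word list and the 8-byte salt are hypothetical, and SHA-1 stands in for whatever hash the target site uses.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


wordlist = ["letmein", "12345", "password"]

# Built once, reusable against every database that stores unsalted SHA-1 hashes.
precomputed = {sha1_hex(word.encode()): word for word in wordlist}

unsalted_hash = sha1_hex(b"letmein")
# Hypothetical 8-byte salt appended before hashing.
salted_hash = sha1_hex(b"letmein" + b"\x8f\x1c\x02\xa7\x55\xe0\x39\x4b")

print(precomputed.get(unsalted_hash))  # letmein
print(precomputed.get(salted_hash))    # None
```

To cover the salted case, the table would need one entry per word per possible salt value, which is where the storage blow-up comes from.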

The best salts are randomly chosen from a cryptographic random number generator. Eight bytes is a practical size, and more than 16 bytes serves no purpose.
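
In Python, one way to draw such a salt is the standard `secrets` module, which reads from the operating system's cryptographic random number generator. The constant and function name below are illustrative, and 16 bytes is simply the upper end of the range mentioned above.

```python
import secrets

SALT_BYTES = 16  # assumption: upper end of the range suggested above


def new_salt() -> bytes:
    """Return a fresh salt from the operating system's CSPRNG."""
    return secrets.token_bytes(SALT_BYTES)
```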


Salt does much more than just "make an attacker's job more irritating." It eliminates an entire class of attack—the use of precomputed dictionaries.

Another element is necessary to completely secure passwords, and that is "key-strengthening." One round of SHA-1 is not good enough: a safe password hashing algorithm should be very slow computationally.

Many people use PBKDF2, a key derivation function, that feeds back results to the hash function thousands of times. The "bcrypt" algorithm is similar, using an iterative key derivation that is slow.

When the hashing operation is very slow, a precomputed table becomes more and more desirable to an attacker. But proper salt defeats that approach.
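
A minimal sketch of salted key-strengthening with PBKDF2, using `hashlib.pbkdf2_hmac` from the Python standard library. The iteration count, the 16-byte salt, and the function names are assumptions chosen for illustration; in practice the count should be tuned to the hardware.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumption: raise until one hash takes a noticeable fraction of a second


def hash_password(password: str, salt: bytes = b"") -> tuple:
    """Return (salt, derived_key); a fresh random salt is drawn if none is given."""
    salt = salt or os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)
```

Usage would look like `salt, key = hash_password("letmein")` at registration and `verify_password(attempt, salt, key)` at login; an attacker pays the same iteration cost for every single guess.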


Comments

Below are the comments I made on the question.


Without salt, an attacker wouldn't use the method demonstrated in "Update 2". He'd simply do a lookup in a pre-computed table and get the password in O(1) or O(log n) time (n being the number of candidate passwords). Salt is what prevents that and forces him to use the O(n) approach shown in "Update 2".

Once reduced to an O(n) attack, we have to consider how long each attempt takes. Key-strengthening can cause each attempt in the loop to take a full second, meaning that the time needed to test 10k passwords on 10k users will stretch from 3 days to 3 years… and with only 10k passwords, you're likely to crack zero passwords in that time.
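
The "3 years" figure follows from simple arithmetic, sketched here in Python under the assumption that every attempt costs exactly one second:

```python
users = 10_000
candidate_passwords = 10_000
seconds_per_attempt = 1.0  # assumption: key-strengthening tuned to one second per hash

total_seconds = users * candidate_passwords * seconds_per_attempt
years = total_seconds / (365 * 24 * 3600)
print(f"{years:.1f} years")  # 3.2 years
```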

You have to consider that an attacker is going to use the fastest tools he can, not PHP, so thousands of iterations, rather than 100, would be a good parameter for key-strengthening. It should take a large fraction of a second to compute the hash for a single password.

Key-strengthening is part of the standard key derivation algorithms PBKDF1 and PBKDF2, from PKCS #5, which make great password obfuscation algorithms (the "derived key" is the "hash").

A lot of users on StackOverflow refer to this article because it was a response to Jeff Atwood's post about the dangers of rainbow tables. It's not my favorite article, but it does discuss these concepts in more detail.


Of course you assume the attacker has everything: salt, hash, user name. Assume the attacker is a corrupt hosting company employee who dumped the user table on your myprettypony.com fansite. He's trying to recover these passwords because he's going to turn around and see if your pony fans used the same password on their citibank.com accounts.

With a well-designed password scheme, it will be impossible for this guy to recover any passwords.

It doesn't stop dictionary attacks.

What it does is stop someone who manages to get a copy of your password file from using a rainbow table to figure out what the passwords are from the hashes.

Eventually, it can be brute-forced, though. The answer to that part is to force your users to not use dictionary words as passwords (minimum requirements of at least one number or special character, for example).

Update:

I should have mentioned this earlier, but some (most?) password systems use a different salt for each password, likely stored with the password itself. This makes a single rainbow table useless. This is how the UNIX crypt library works, and modern UNIX-like OSes have extended this library with new hash algorithms.

I know for a fact that support for SHA-256 and SHA-512 was added in newer versions of GNU crypt.

Salt makes rainbow table attacks much more difficult since it makes a single password hash much harder to crack. Imagine you have a horrid password of just the number 1. A rainbow table attack would crack this immediately.

Now imagine each password in the db is salted with a long random value of many random characters. Now your lousy password of "1" is stored in the db as a hash of 1 plus a bunch of random characters (the salt), so in this example the rainbow table needs to have the hash for something like 1 followed by the salt.

So assuming your salt is something secure and random, say *()%ISLDGHASKLU(*%#%#, the hacker's rainbow table would need to have an entry for 1*()%ISLDGHASKLU(*%#%#. Now using a rainbow table on even this simple password is no longer practical.

Yes, you need just 3 days for sha1(salt | password). That's why good password storage algorithms use 1000-iteration hashing: you will need 8 years.

In simplest terms: without salting, each candidate password need only be hashed once to check it against every user, anywhere in the "known universe" (collection of compromised databases), whose password is hashed via the same algorithm. With salting, if the number of possible salt values substantially exceeds the number of users in the "known universe", each candidate password must be hashed separately for each user against whom it will be tested.

The point of salting is to prevent the amortization of the attacker's effort.

With no salt, a single table of precomputed hash-password entries (e.g. MD5 of all alphanumeric 5 character strings, easy to find online) can be used on every user in every database in the world.

With a site-specific salt, the attacker has to compute the table himself and can then use it on all users of the site.

With a per-user salt, the attacker has to expend this effort for every user separately.

Of course, this doesn't do much to protect really weak passwords straight out of a dictionary, but it protects reasonably strong passwords against this amortization.
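
The amortization argument can be made concrete by counting hash computations. The sketch below is illustrative only: the sample passwords, the 8-byte salts, and the function names are made up, and SHA-1 stands in for the site's hash.

```python
import hashlib


def count_hashes_unsalted(stored_hashes, candidates):
    """Each candidate is hashed once and checked against every user at the same time."""
    targets = set(stored_hashes)
    hashes_computed = 0
    cracked = set()
    for candidate in candidates:
        digest = hashlib.sha1(candidate.encode()).hexdigest()
        hashes_computed += 1
        if digest in targets:
            cracked.add(candidate)
    return hashes_computed, cracked


def count_hashes_salted(records, candidates):
    """Each candidate must be hashed again for every (salt, hash) record."""
    hashes_computed = 0
    cracked = set()
    for salt, stored in records:
        for candidate in candidates:
            digest = hashlib.sha1(salt + candidate.encode()).hexdigest()
            hashes_computed += 1
            if digest == stored:
                cracked.add(candidate)
    return hashes_computed, cracked


candidates = ["letmein", "12345", "qwerty", "dragon"]
passwords = ["letmein", "hunter2", "12345"]  # three hypothetical users
unsalted = [hashlib.sha1(p.encode()).hexdigest() for p in passwords]
salted = [(bytes([i]) * 8, hashlib.sha1(bytes([i]) * 8 + p.encode()).hexdigest())
          for i, p in enumerate(passwords)]

print(count_hashes_unsalted(unsalted, candidates)[0])  # 4
print(count_hashes_salted(salted, candidates)[0])      # 12
```

The same weak passwords are found either way; what changes is that the attacker's work grows with the number of users instead of being shared across all of them.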

Salts are implemented to prevent rainbow table attacks. A rainbow table is a list of pre-calculated hashes, which makes translating a hash into its phrase much simpler. You need to understand that salting isn't effective as a modern defence against cracking a password unless we also have a modern hashing algo.

So let's say we're working with SHA1, taking advantage of recent exploits discovered with this algo, and let's say we have a computer running at 1,000,000 hashes/second: it would take 5.3 million million million years to find a collision, so yeah, PHP can work 300 a second, big woop, doesn't really matter. The reason we salt is because if someone did bother to generate all common dictionary phrases, (2^160 people, welcome to 2007 era exploits).

So here's an actual database, with 2 users I use for testing and admin purposes.

RegistrationTime        UserName        UserPass
1280185359.365591       briang      a50b63e927b3aebfc20cd783e0fc5321b0e5e8b5
1281546174.065087       test        5872548f2abfef8cb729cac14bc979462798d023

In fact, the salting scheme is sha1(registration time + user name). Go ahead, tell me my password; these are real passwords in production. You can even sit there and hash out a word list in PHP. Go wild.

I'm not crazy, I just know that this is secure. For fun's sake, test's password is test: sha1(sha1(1281546174.065087 + test) + test) = 5872548f2abfef8cb729cac14bc979462798d023
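
A hedged Python sketch of the scheme as described. It assumes that "+" means plain string concatenation and that the inner digest is used as lowercase hex; the post does not spell out either detail, so the output is not guaranteed to reproduce the hashes in the table above. The function names are illustrative.

```python
import hashlib


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def user_pass(registration_time: str, user_name: str, password: str) -> str:
    """sha1(sha1(registration time + user name) + password), per the description."""
    # Assumption: string concatenation, inner digest as lowercase hex.
    salt = sha1_hex(registration_time + user_name)
    return sha1_hex(salt + password)
```

Since the salt is derived from fields stored in the same row, an attacker holding the table can recompute it; the scheme only guarantees that each user gets a different salt.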

You would need to generate an entire rainbow table prepended with 27662aee8eee1cb5ab4917b09bdba31d091ab732 for just this user. That means my passwords cannot all be compromised by a single rainbow table: the hacker needs to generate an entire rainbow table for 27662aee8eee1cb5ab4917b09bdba31d091ab732 for test, and another for f3f7735311217529f2e020468004a2aa5b3dee7f for briang. Think back to the 5.3 million million million years for all hashes. Think of the size of storing just the 2^80 hashes (that's well over 20 yottabytes); it's not going to happen.

Don't confuse salting with a means of making a hash something you can't ever decode; it's a means of preventing a rainbow table from translating all your user passwords. It's impossible at this level of technology.

Also, one more important point: using a USER-specific salt prevents the detection of two users with the SAME password. Without it, their hashes would match. That's why the hash is often computed as hash(salt + username + password).
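
A small Python sketch of that point, with two hypothetical users who picked the same password. The 16-byte salts and the hash(salt + username + password) layout are assumptions taken from the sentence above.

```python
import hashlib
import os


def unsalted_hash(password: str) -> str:
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def salted_hash(salt: bytes, username: str, password: str) -> str:
    """hash(salt + username + password), with a salt unique to the user."""
    return hashlib.sha1(salt + username.encode("utf-8") + password.encode("utf-8")).hexdigest()


alice_salt, bob_salt = os.urandom(16), os.urandom(16)

same_unsalted = unsalted_hash("letmein") == unsalted_hash("letmein")
same_salted = (salted_hash(alice_salt, "alice", "letmein")
               == salted_hash(bob_salt, "bob", "letmein"))

print(same_unsalted)  # True: the shared password is visible in the table
print(same_salted)    # False: the rows look unrelated
```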

If you also try to keep the salt secret, the attacker cannot verify the hashes either.

Edit- just noticed the main point was made in a comment above.

Simply put, salting does not prevent a hash from being attacked (brute force or dictionary); it only makes it harder. The attacker will either need to find the salting algorithm (which, if implemented properly, will make use of more iterations) or brute-force the algo, which, unless it is very simple, is nearly impossible. Salting also almost completely rules out rainbow table lookups...