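Cleansing user passwords

How should I escape or cleanse user-provided passwords before I hash them and store them in my database?

When PHP developers consider hashing users' passwords for security purposes, they often tend to think of those passwords as they would any other user-provided data. This subject comes up often in PHP questions about password storage; the developer often wants to cleanse the password with functions such as escape_string() (in its various iterations), htmlspecialchars(), addslashes() and others before hashing it and storing it in the database.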


You should never escape, trim or otherwise cleanse passwords that you will be hashing with PHP's password_hash(), for a number of reasons. The single largest is that additional cleansing of the password requires unnecessary additional code.

You will argue (and you see it in every post where user data is accepted for use in your systems) that we should cleanse all user input, and you would be right for every other piece of information we accept from our users. Passwords are different. A hashed password cannot pose any SQL injection threat, because the string is turned into a hash before it is stored in the database.

The act of hashing a password is the act of making the password safe to store in your database. The hash function doesn't give special meaning to any bytes, so no cleansing of its input is required for security reasons.

If you follow the mantra of letting users choose the passwords or passphrases they want, and you don't limit length, spaces or special characters, hashing makes the password or passphrase safe to store no matter what it contains. As of right now the most common hash (the default), PASSWORD_BCRYPT, turns the password into a 60-character string containing a random salt along with the hashed password information and a cost (the algorithmic cost of creating the hash):

PASSWORD_BCRYPT is used to create new password hashes using the CRYPT_BLOWFISH algorithm. This will always result in a hash using the "$2y$" crypt format, which is always 60 characters wide.

The space requirements for storing the hash are subject to change as different hashing methods are added to the function, so it is always better to go larger on the column type for the stored hash, such as VARCHAR(255) or TEXT.
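As a minimal sketch of the point above, assuming PHP 7.0 or later, the following hashes an unmodified password and inspects the result. The example password and variable names are illustrative only; the cost that is printed depends on the PHP version's default.

```php
<?php
declare(strict_types=1);

// Hypothetical example password; the trailing spaces are kept as typed.
$password = 'correct horse battery staple   ';

// Hash the raw, unmodified password with bcrypt.
$hash = password_hash($password, PASSWORD_BCRYPT);

// password_get_info() reports the algorithm and options encoded in the hash.
$info = password_get_info($hash);

echo strlen($hash), PHP_EOL;             // length of the bcrypt hash string
echo substr($hash, 0, 4), PHP_EOL;       // the crypt format prefix
echo $info['options']['cost'], PHP_EOL;  // cost factor stored inside the hash
```

Because the salt and cost travel inside the hash string, a single sufficiently wide column is all the storage that is needed.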

You could even use a complete SQL query as your password. It would simply be hashed, which makes it unexecutable by the SQL engine. For example,

SELECT * FROM `users`;

could be hashed to $2y$10$1tOKcWUWBW5gBka04tGMO.BH7gs/qjAHZsC5wyG0zmI2C.KgaqU5G
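To illustrate, here is a small sketch that hashes that query string and checks that the output consists only of the characters bcrypt emits ("$", ".", "/", digits and ASCII letters). The regular expression is my own description of the "$2y$" format, not something taken from the PHP manual.

```php
<?php
declare(strict_types=1);

// A complete SQL statement used as the password.
$password = 'SELECT * FROM `users`;';
$hash = password_hash($password, PASSWORD_BCRYPT);

// "$2y$", a two-digit cost, "$", then 53 characters of salt and hash.
$pattern = '~\A\$2y\$\d{2}\$[./A-Za-z0-9]{53}\z~';
$isInert = preg_match($pattern, $hash) === 1;

var_dump($isInert);                           // no quotes, backticks or semicolons survive
var_dump(password_verify($password, $hash));  // the original string still verifies
```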

Let's see how different sanitizing methods affect the password.

The password is I'm a "dessert topping" & a <floor wax>! (There are 5 spaces at the end of the password which are not displayed here.)

When we apply the following cleansing methods we get some wildly different results:

var_dump(trim($_POST['upassword']));
var_dump(htmlentities($_POST['upassword']));
var_dump(htmlspecialchars($_POST['upassword']));
var_dump(addslashes($_POST['upassword']));
var_dump(strip_tags($_POST['upassword']));

Results:

string(40) "I'm a "dessert topping" & a <floor wax>!" // spaces at the end are missing
string(65) "I'm a &quot;dessert topping&quot; &amp; a &lt;floor wax&gt;!     " // double quotes, ampersand and angle brackets have been encoded (on PHP 8.1+ the apostrophe is encoded too, giving string(70))
string(65) "I'm a &quot;dessert topping&quot; &amp; a &lt;floor wax&gt;!     " // same here
string(48) "I\'m a \"dessert topping\" & a <floor wax>!     " // escape characters have been added
string(34) "I'm a "dessert topping" & a !     " // looks like we have something missing

What happens when we send these to password_hash()? They all get hashed, just as the query did above. The problem comes when you try to verify the password. If we employ one or more of these methods before hashing, we must re-employ exactly the same methods before comparing with password_verify(). The following would fail:

password_verify($_POST['upassword'], $hashed_password); // where $hashed_password comes from a database query

You would have to run the posted password through the cleansing method you chose before using the result of that in password verification. It is an unnecessary set of steps and will make the hash no better.
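The mismatch can be reproduced with a short sketch. It assumes the example password from above (with its five trailing spaces) and uses htmlspecialchars() as the cleansing step; the variable names are hypothetical.

```php
<?php
declare(strict_types=1);

// The example password, including its five trailing spaces.
$submitted = 'I\'m a "dessert topping" & a <floor wax>!     ';

// Registration, variant A: the password was cleansed before hashing.
$hashOfCleansed = password_hash(htmlspecialchars($submitted), PASSWORD_BCRYPT);

// Registration, variant B: the raw password was hashed.
$hashOfRaw = password_hash($submitted, PASSWORD_BCRYPT);

// Login: the user types exactly the same password again.
$rawAgainstCleansed      = password_verify($submitted, $hashOfCleansed);
$cleansedAgainstCleansed = password_verify(htmlspecialchars($submitted), $hashOfCleansed);
$rawAgainstRaw           = password_verify($submitted, $hashOfRaw);

var_dump($rawAgainstCleansed);       // bool(false): the cleansing step was skipped at login
var_dump($cleansedAgainstCleansed);  // bool(true): works only if the step is repeated
var_dump($rawAgainstRaw);            // bool(true): no extra step to remember
```

Variant B needs no extra code at either end, which is the argument for leaving the password untouched.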


Using a PHP version less than 5.5? You can use the password_hash() compatibility pack.

You really shouldn't use MD5 password hashes.

Before hashing the password, you should normalise it as described in section 4 of RFC 7613 (since obsoleted by RFC 8265, which keeps these rules). In particular:

  2. Additional Mapping Rule: Any instances of non-ASCII space MUST be mapped to ASCII space (U+0020); a non-ASCII space is any Unicode code point having a Unicode general category of "Zs" (with the exception of U+0020).

and:

  4. Normalization Rule: Unicode Normalization Form C (NFC) MUST be applied to all characters.

This attempts to ensure that if the user types the same password using a different input method, the password is still accepted.
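A rough sketch of those two rules in PHP follows. It assumes the intl extension is installed (for the Normalizer class) and that the input is UTF-8. normalizePassword() is a hypothetical helper name, and it covers only the two quoted rules, not the full profile defined in the RFC.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: applies only the two rules quoted above.
 */
function normalizePassword(string $password): string
{
    if (!class_exists('Normalizer')) {
        throw new RuntimeException('The intl extension is required for NFC normalization.');
    }

    // Additional Mapping Rule: every "Zs" code point other than U+0020 becomes U+0020.
    $mapped = preg_replace('/(?! )\p{Zs}/u', ' ', $password);
    if ($mapped === null) {
        throw new InvalidArgumentException('Password is not valid UTF-8.');
    }

    // Normalization Rule: apply Unicode Normalization Form C.
    $normalized = Normalizer::normalize($mapped, Normalizer::FORM_C);
    if ($normalized === false) {
        throw new InvalidArgumentException('Password could not be normalized.');
    }

    return $normalized;
}

// The same passphrase entered through two different input methods.
$composed   = "caf\u{00E9}\u{00A0}au lait";  // precomposed e-acute, no-break space
$decomposed = "cafe\u{0301} au lait";        // e + combining acute accent, ASCII space

var_dump(normalizePassword($composed) === normalizePassword($decomposed)); // bool(true)
```

The normalized string is what would be passed to both password_hash() and password_verify(), so the same function has to run at registration and at login.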