在 MySQL 中搜索“整个单词匹配”

我想编写一个 SQL 查询,在文本字段中搜索关键字,但只有当它是“整个单词匹配”(例如,当我搜索“ rid”时,它不应该匹配“ ”,但它应该匹配“ a rid”。

我正在使用 MySQL。

幸运的是,在这个应用程序中性能并不重要,数据库大小和字符串大小都很小,但是我更喜欢在 SQL 中而不是在 PHP 中驱动它。

68830 次浏览

You can use like with the wildcard marker to catch the possibilities (at start, at end, in middle, and alone), something like this should suffice:

select blah blah blah where column like 'rid %' or column like '% rid' or column like '% rid %' or column = 'rid'

select * from table where Locate('rid ', FieldToSearch) > 0
or Locate(' rid', FieldToSearch) > 0

This will handle finding rid where it is preceded or followed by a space, you could extend the approach to take account of .,?! and so on, not elegant but easy.

This is the best answer I've come up myself with so far:

SELECT * FROM table
WHERE keywords REGEXP '^rid[ $]' OR keywords REGEXP ' rid[ $]'

I would have simplified it to:

SELECT *
FROM table
WHERE keywords REGEXP '[^ ]rid[ $]'

but [^ ] has a special meaning of "NOT a space", rather than "line-beginning or space".

How does REGEXP compare to multiple LIKE conditions? (Not that performance matters in this app.)

You can use REGEXP and the [[:<:]] and [[:>:]] word-boundary markers:

SELECT *
FROM table
WHERE keywords REGEXP '[[:<:]]rid[[:>:]]'

Update for 2020: (actually 2018+)

MySQL updated its RegExp-Engine in version 8.0.4, so you will now need to use the "standard" word boundary marker \b:

SELECT *
FROM table
WHERE keywords REGEXP '\\brid\\b'

Also be aware that you need to escape the backslash by putting a second backslash.

Found an answer to prevent the classic word boundary [[::<::]] clashing with special characters eg .@#$%^&*

Replace..

SELECT *
FROM table
WHERE keywords REGEXP '[[:<:]]rid[[:>:]]'

With this..

SELECT *
FROM table
WHERE keywords REGEXP '([[:blank:][:punct:]]|^)rid([[:blank:][:punct:]]|$)'

The latter matches (space, tab, etc) || (comma, bracket etc) || start/end of line. A more 'finished' word boundary match.

Use regexp with word boundaries, but if you want also accent insensitive search, please note that REGEXP is a single-byte operator, so it is Worth nothing to have utf8_general_ci collation, the match will not be accent insensitive.

To have both accent insensitive and whole word match, specify the word written in the same way the (deprecated) PHP function sql_regcase() did.

In fact:

  • utf8_general_ci allows you to make an equality (WHERE field = value) case and accent insensitive search but it doesn't allow you to specify an entire word match (word boundaries markers not recognized)

  • LIKE allows you case and accent insensitive search but you have to manually specify all combinations of possible word boundaries charactes (word boundaries markers not recognized)

  • word boundaries [[:<:]] and [[:>:]] are supported in REGEXP, who is a single byte functions so don't perform accent insensitive search.

The solution is to use REGEXP with word boundaries and the word modified in the way sql_regcase does.

Used on http://www.nonsolodiete.it