使用 InnoDB 进行全文检索

我正在开发一个大容量的 web 应用程序,其中一部分是一个 MySQL 讨论文章数据库,需要平稳地增长到20M 以上的行。

我最初计划对表使用 MyISAM (对于内置的 全文检索功能) ,但是一想到 整张桌子由于单个写操作而被锁定,我就感到很紧张。行级锁更有意义(更不用说 InnoDB 在处理大型表时的其他速度优势)。因此,出于这个原因,我决定使用 InnoDB。

问题是... InnoDB 没有内置的全文搜索功能。

我应该用第三方搜索系统吗?就像 Lucene (c + +)/狮身人面像?你们这些数据库忍者有什么建议或指导吗?LinkedIn 的 Zoie (基于 Lucene)似乎是目前最好的选择... 是围绕实时功能构建的(这对我的应用程序非常关键)我有点犹豫要不要做出承诺。

(仅供参考: 将在 EC2上使用高内存设备,使用 PHP 服务于前端)

74113 次浏览

Sphinx, as you point out, is quite nice for this stuff. All the work is in the configuration file. Make sure whatever your table is with the strings has some unique integer id key, and you should be fine.

I can vouch for MyISAM fulltext being a bad option - even leaving aside the various problems with MyISAM tables in general, I've seen the fulltext stuff go off the rails and start corrupting itself and crashing MySQL regularly.

A dedicated search engine is definitely going to be the most flexible option here - store the post data in MySQL/innodb, and then export the text to your search engine. You can set up a periodic full index build/publish pretty easily, and add real-time index updates if you feel the need and want to spend the time.

Lucene and Sphinx are good options, as is Xapian, which is nice and lightweight. If you go the Lucene route don't assume that Clucene will better, even if you'd prefer not to wrestle with Java, although I'm not really qualified to discuss the pros and cons of either.

You should spend an hour and go through installation and test-drive of Sphinx and Lucene. See if either meets your needs, with respect to data updates.

One of the things that disappointed me about Sphinx is that it doesn't support incremental inserts very well. That is, it's very expensive to reindex after an insert, so expensive that their recommended solution is to split your data into older, unchanging rows and newer, volatile rows. So every search your app does would have to search twice: once on the larger index for old rows and also on the smaller index for recent rows. If that doesn't integrate with your usage patterns, this Sphinx is not a good solution (at least not in its current implementation).

I'd like to point out another possible solution you could consider: Google Custom Search. If you can apply some SEO to your web application, then outsource the indexing and search function to Google, and embed a Google search textfield into your site. It could be the most economical and scalable way to make your site searchable.

Perhaps you shouldn't dismiss MySQL's FT so quickly. Craigslist used to use it.

MySQL’s speed and Full Text Search has enabled craigslist to serve their users .. craigslist uses MySQL to serve approximately 50 million searches per month at a rate of up to 60 searches per second."

edit

As commented below, Craigslist seems to have switched to Sphinx some time in early 2009.

try this

ROUND((LENGTH(text) - LENGTH(REPLACE(text, 'serchtext', ''))) / LENGTH('serchtext'),0)!=0

Along with the general phasing out of MyISAM, InnoDB full-text search (FTS) is finally available in MySQL 5.6.4 release.

Lots of juicy details at https://dev.mysql.com/doc/refman/5.6/en/innodb-fulltext-index.html.

While other engines have lots of different features, this one is InnoDB, so it's native (which means there's an upgrade path), and that makes it a worthwhile option.

You should take a look at Sphinx. It is worth a try. It's indexing is super fast and it is distributed. You should take a look at this (http://www.percona.com/webinars/2012-08-22-full-text-search-throwdown) webminar. It talks about searching and has some neat benchmarks. You may find it helpful.

If everything else fails, there's always soundex_match, which sadly isn't really fast an accurate

For anyone stuck on an older version of MySQL / MariaDB (i.e. CentOS users) where InnoDB doesn't support Fulltext searches, my solution when using InnoDB tables was to create a separate MyISAM table for the thing I wanted to search.

For example, my main InnoDB table was products with various keys and referential integrity. I then created a simple MyISAM table called product_search containing two fields, product_id and product_name where the latter was set to a FULLTEXT index. Both fields are effectively a copy of what's in the main product table.

I then search on the MyISAM table using fulltext, and do an inner join back to the InnoDB table.

The contents of the MyISAM table can be kept up-to-date via either triggers or the application's model.

I wouldn't recommend this if you have multiple tables that require fulltext, but for a single table it seems like an adequate work around until you can upgrade.