Scaling solutions for MySQL (replication, clustering)

At the startup where I work, we are now considering scaling solutions for our database. Things get a bit confusing (at least to me) with MySQL, which offers MySQL Cluster, replication, and MySQL Cluster replication (from ver. 5.1.6), an asynchronous version of MySQL Cluster. The MySQL manual explains some of the differences in its Cluster FAQ, but it is hard to work out from it when to use one or the other.

I would appreciate some advice from people familiar with the differences and the pros and cons of these solutions, and your recommendations on when to use each.

I haven't used them, but from the docs I'd say that replication is the preferred solution if the biggest load is reading from the database.

There are some good discussions about how the folks that maintain drupal.org have structured their database servers:

Both are from 2007, so the Clustering support may be stronger now, but at the time they chose replication.

Disclaimer: I have not used MySQL Cluster, so I'm only going from what I've heard.

MySQL Cluster is an HA (high availability) solution. It's fast, because it's all in memory, but its real selling point is the availability. There is no single point of failure. With replication, on the other hand, if the master goes down, you have to actually switch over to the replica, and there may be a small amount of downtime. (Although the DRBD solution is another alternative that offers high availability.)

Cluster requires that your entire database fit in memory. That means that each machine in the cluster needs to have enough memory to store the entire database. So this is not a feasible solution for very large databases (or at least it's a very expensive solution).

I think that unless HA is super important (read: probably not), it's more hassle (and money) than it's worth. Replication is more often the better way to go.

Edit: I also forgot to mention that Cluster does not allow foreign keys, and range scans are slower than on other engines. Here is a link that talks about the Known Limitations of MySQL Cluster.

The "in memory" limitation prevents us from using MySQL Cluster for our nearly 50 GB of data, so we are using DRBD plus Linux Heartbeat.

It's kind of like a RAID array between two (or more) boxes that keeps the databases / logs / configs in sync (but only one server can be "live" at a time). Failover is automatic, uses the same IP address, and is as quick as a MySQL restart, so that's been a good solution for us.

I've been doing A LOT of reading on the available options. I also got my hands on High Performance MySQL 2nd edition, which I highly recommend.

This is what I've managed to piece together:

Clustering

Clustering in the general sense is distributing load across many servers that appear to an outside application as one server.

MySQL NDB Cluster

MySQL NDB Cluster is a distributed, in-memory, shared-nothing storage engine with synchronous replication and automatic data partitioning (excuse me for borrowing literally from the High Performance book, but they put it very nicely there). It can be a high-performance solution for some applications, but web applications generally do not work well on it.

The major problem is that beyond very simple queries (that touch only one table), the cluster will generally have to search for data on several nodes, allowing network latency to creep in and significantly slow down completion time for queries. Since the application treats the cluster as one computer, it can't tell it which node to fetch the data from.

In addition, the in-memory requirement is not workable for many large databases.

Continuent Sequoia

This is another clustering solution for MySQL that acts as middleware on top of the MySQL server. It offers synchronous replication, load balancing and failover. It also ensures that requests always get the latest data, automatically choosing a node that has a fresh copy.

I've read some good things on it, and overall it sounds pretty promising.

Federation

Federation is similar to clustering, so I tucked it in here as well. MySQL offers federation via the FEDERATED storage engine. Like the NDB Cluster solution, it works well with simple queries only - but it's even worse than the cluster for complicated ones (since network latency is much higher).

Replication and load balancing

MySQL has the built-in capability to create replicas of a database on different servers. This can be used for many things - splitting the load between servers, hot backups, creating test servers, and failover.

The basic setup of replication involves one master server handling mostly writes and one or more slaves handling reads only. A more advanced variation is the master-master configuration, which allows you to scale writes as well by having several servers write at the same time.

Each configuration has its pros and cons, but one problem they all share is replication lag - since MySQL replication is asynchronous, not all nodes have the freshest data at all times. This requires the application to be aware of the replication and incorporate replication-aware queries to work as expected. For some applications this might not be a problem, but if you always need the freshest data, things get somewhat complicated.

Replication requires some load balancing to split the load between the nodes. This can be as simple as a few modifications to the application code, or it can be done with dedicated software and hardware solutions.
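To make the "modifications to the application code" concrete, here is a minimal Python sketch of replication-aware routing: writes always go to the master, reads go to the least-lagged replica within a tolerance, falling back to the master when every replica is too stale. The server names and lag values are invented for illustration; a real setup would poll Seconds_Behind_Master from SHOW SLAVE STATUS on each replica.

```python
# Hypothetical replica lag values (seconds); in a real setup you would
# poll Seconds_Behind_Master from SHOW SLAVE STATUS on each replica.
replica_lag = {
    "replica1": 0,
    "replica2": 45,  # badly lagged
    "replica3": 2,
}

MAX_LAG_SECONDS = 5  # staleness tolerance; tune per application


def pick_server(is_write, lag=replica_lag, max_lag=MAX_LAG_SECONDS):
    """Route writes to the master; route reads to the least-lagged
    replica within tolerance, falling back to the master."""
    if is_write:
        return "master"
    fresh = {name: s for name, s in lag.items() if s <= max_lag}
    if not fresh:
        return "master"  # every replica is too stale
    return min(fresh, key=fresh.get)


print(pick_server(is_write=True))   # master
print(pick_server(is_write=False))  # replica1
```

The same routing decision can also live outside the application, in a dedicated proxy layer, if you'd rather not touch application code.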

Sharding and partitioning

Sharding is a commonly used approach to scaling database solutions. You split the data into smaller shards and spread them across different server nodes. The application must be aware of this modification to the data storage to work efficiently, as it needs to know where to find the information it needs.

There are abstraction frameworks available to help deal with data sharding, such as Hibernate Shards, an extension to the Hibernate ORM (which, unfortunately, is in Java; I'm using PHP). HiveDB is another such solution, which also supports shard rebalancing.
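As an illustration of why the application has to know where data lives, here is a small Python sketch of deterministic hash routing (the shard names are hypothetical):

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2", "shard3"]  # hypothetical node names


def shard_for(user_id):
    """Map a key to a shard deterministically, so every part of the
    application agrees on where that user's rows live."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


# The same key always routes to the same shard:
assert shard_for(42) == shard_for(42)
```

The catch with plain modulo hashing is that adding a shard remaps most existing keys, which is exactly the rebalancing problem that tools like HiveDB try to address.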

Others

Sphinx

Sphinx is a full-text search engine that can be used for far more than text searches. For many queries it is much faster than MySQL (especially for grouping and sorting), and it can query remote systems in parallel and aggregate the results - which makes it very useful in combination with sharding.

In general, Sphinx should be used alongside other scaling solutions to get more out of the available hardware and infrastructure. The downside is that, again, the application code needs to be aware of Sphinx to use it wisely.
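The parallel-query-and-aggregate idea can be sketched in a few lines of Python; here the "shards" are just in-process dicts of term counts standing in for remote nodes, but the merge step has the same shape as what a distributed search does:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in data: each dict plays the role of one node's search result.
shards = [
    {"apple": 3, "pear": 1},
    {"apple": 2, "plum": 5},
]


def query_shard(shard):
    # Placeholder for querying one remote node.
    return shard


def scatter_gather(all_shards):
    """Query every shard in parallel and merge the partial counts."""
    merged = {}
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(query_shard, all_shards):
            for term, count in partial.items():
                merged[term] = merged.get(term, 0) + count
    return merged


print(scatter_gather(shards))  # {'apple': 5, 'pear': 1, 'plum': 5}
```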

Summary

Scaling solutions differ depending on the needs of the application. For us, and for most web applications, I believe replication (probably multi-master) is the way to go, with a load balancer distributing the load. Sharding of specific problem areas (huge tables) is also a must for being able to scale horizontally.

I'm also going to give Continuent Sequoia a shot and see if it can really do what it promises, since it would involve the least amount of changes to application code.

The cool thing about replication is that it's easy. Just set up two MySQL boxes, change the server-id on the second box, and then point the second box at the first using the CHANGE MASTER TO command.

Here is the relevant sample slave my.cnf config

#
#       Log names
#


log-bin=binlog
relay-log=relaylog
log-error=errors.log


#
#       Log tuning
#


sync_binlog = 1
binlog_cache_size = 1M


#
#       Replication rules (what are we interested in listening for...)
#
#       In our replicants, we are interested in ANYTHING that isn't a permission table thing
#


replicate-ignore-db =      mysql
replicate-wild-ignore-table=mysql.%


#
#       Replication server ID
#


server-id      =        2

So make sure each slave gets a server-id incremented by 1 (so the next slave is server-id 3).
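If you're bringing up several slaves, it may be worth generating that per-slave fragment of my.cnf from a script so the server-id never collides. A small Python sketch, rendering only the lines from the sample config above:

```python
def slave_config(server_id, ignore_db="mysql"):
    """Render the replication-related my.cnf lines for one slave.
    server_id must be unique across the whole topology."""
    return "\n".join([
        "log-bin=binlog",
        "relay-log=relaylog",
        "replicate-ignore-db=%s" % ignore_db,
        "replicate-wild-ignore-table=%s.%%" % ignore_db,
        "server-id=%d" % server_id,
    ])


print(slave_config(2))
```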

Set up a username and password that the slave can use to connect, then run:

CHANGE MASTER TO MASTER_HOST='x.x.x.x';
CHANGE MASTER TO MASTER_PASSWORD='xxxxx';

and so on.

Finally, run START SLAVE;

Up comes your slave and starts replicating. sweet huh!

This assumes you start with 2 empty servers. Then you can dump your db into the master server, and as it loads there, it will also load on the slave.

You can check the slave status by running:

show slave status \G
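If you want to check that from a script instead of by eye, here's a small Python sketch that parses the \G-style output. The field names are real SHOW SLAVE STATUS columns; note that Seconds_Behind_Master reads NULL while replication is stopped, which this treats as unhealthy.

```python
SAMPLE = """\
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0
"""


def slave_healthy(status_text, max_lag=30):
    """Parse `show slave status \\G` output and decide whether the
    slave is running and reasonably fresh."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    lag = fields.get("Seconds_Behind_Master", "NULL")
    return (fields.get("Slave_IO_Running") == "Yes"
            and fields.get("Slave_SQL_Running") == "Yes"
            and lag.isdigit()
            and int(lag) <= max_lag)


print(slave_healthy(SAMPLE))  # True
```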

Have fun with it.. soooo easy...

MySQL Cluster is a strange beastie, and every time we've evaluated it, it's either performed very badly or been unreliable.

It's horribly complicated to set up (you need at least three nodes, possibly more). Also, there is no provision for client failover, so you have to handle that yourself (or use something else to act as a proxy, etc.).

It's extremely clever, because it does automatic hash partitioning on the primary key which allows you to scale writes, and also because it has no single point of failure.

But I really think it's better suited to the very special purpose cases it was designed for. It cannot in most cases replace another database engine (e.g. InnoDB) in either performance or features.

While doing a high-availability study, I came across many solutions, and in our case - a more write-intensive system - I found a DRBD cluster better than the NDB cluster, as it provides a higher number of transactions per second.

MySQL replication can provide you with a backup machine, which can either be used as a read slave or for disaster recovery.

With the different transaction-management modes provided by DRBD, you can somewhat reduce the performance hit of device-level replication of data over the network. For a reliable system that must not lose any transactions in case of failure, use protocol C; otherwise, go for protocol B.

I tried to list some of the lessons I learned while setting up the DRBD cluster at http://www.techiegyan.com/?p=132

It works really well over a dedicated connection for replication, i.e. reserve separate high-speed interfaces on both machines just for DRBD replication. Heartbeat can control the cluster nicely, bringing up all the services one by one: IP addresses, partitions, DRBD and MySQL.

I have yet to explore the master-master configuration on DRBD. Will update as and when I get it working.

Thanks.

In my view, the confusion here just sends me back to Mnesia, with its fragmentation, its declarative and pragmatic way of handling indexes, the location transparency of database replicas, etc.

In our setup, we run both MySQL Cluster and Mnesia. Our data is kind of seasonal, so after some time we relieve Mnesia of data that's no longer used and throw it into the MySQL cluster. This keeps our Mnesia efficient. We also have applications implemented in mainstream languages (Python, Clojure, etc.) that use data directly from MySQL.

In a nutshell, we run Mnesia on top of MySQL Cluster. MySQL Cluster can handle large data sets; a database can grow to 50 GB plus. We have Mnesia powering the Erlang/OTP applications. Java and PHP access data from Mnesia over tailored REST (recently Thrift) APIs using JSON and XML as exchange formats.

The data access layer abstracts access to data in Mnesia and, when needed, to old shipped data in MySQL Cluster. Mnesia is here essentially to power the Erlang/OTP applications; once it gets hogged up with data, we throw that data into the MySQL cluster. The data access layer can access data in both Mnesia and MySQL through an abstracted API on behalf of all applications.

What I can say here is that Mnesia has been the best option for us. The tables are highly fragmented and indexed, queries perform very well, and the database is replicated across two locations, connected over a tunnel.

Early on, we feared that Mnesia might not handle as many records as we needed due to its table-size limitation. But we found this to be wrong. With good tuning (fragmentation), our Mnesia databases hold an average of about 250 million records per year.

We have benefited from Erlang's complex data structures and the fact that Mnesia can swallow them up unchanged. The Erlang/OTP applications are the most efficient of all our apps, and with our system we are planning on migrating everything to Erlang/OTP technology. From Erlang we seamlessly access data in MySQL Cluster and execute queries on its servers very nicely; in fact, we have found that it is Erlang/OTP that can fully use the MySQL server's resources, because of Erlang's massive concurrency.

Mnesia has worked very well for us. It has completely changed the way we look at databases because of its thrilling performance. Our Solaris server's CPU cores stay busy at an average of about 48% usage at peak hours.

I advise you to check out Mnesia, and who knows - it may answer a number of your distribution or replication needs.