How does data denormalization work with the microservice pattern?

I just read an article on Microservices and PaaS Architecture. About a third of the way into the article, the author states (under Denormalize like Crazy):

Refactor database schemas, and de-normalize everything, to allow complete separation and partitioning of data. That is, do not use underlying tables that serve multiple microservices. There should be no sharing of underlying tables that span multiple microservices, and no sharing of data. Instead, if several services need access to the same data, it should be shared via a service API (such as a published REST or a message service interface).

While this sounds great in theory, in practice it has some serious hurdles to overcome. The biggest of these is that databases are often tightly coupled: every table has some foreign key relationship with at least one other table. Because of this, it may be impossible to partition a database into n sub-databases controlled by n microservices.

So I ask: given a database that consists entirely of related tables, how does one denormalize it into smaller fragments (groups of tables) so that the fragments can be controlled by separate microservices?

For instance, given the following (rather small, but exemplar) database:

[users] table
=============
user_id
user_first_name
user_last_name
user_email


[products] table
================
product_id
product_name
product_description
product_unit_price


[orders] table
==============
order_id
order_datetime
user_id


[products_x_orders] table (for line items in the order)
=======================================================
products_x_orders_id
product_id
order_id
quantity_ordered

Don't spend too much time critiquing my design; I whipped it up on the fly. The point is that, to me, it makes logical sense to split this database into 3 microservices:

  1. UserService - for CRUDding users in the system; should ultimately manage the [users] table; and
  2. ProductService - for CRUDding products in the system; should ultimately manage the [products] table; and
  3. OrderService - for CRUDding orders in the system; should ultimately manage the [orders] and [products_x_orders] tables

However, all of these tables have foreign key relationships with each other. If we denormalize them and treat them as monoliths, they lose all their semantic meaning:

[users] table
=============
user_id
user_first_name
user_last_name
user_email


[products] table
================
product_id
product_name
product_description
product_unit_price


[orders] table
==============
order_id
order_datetime


[products_x_orders] table (for line items in the order)
=======================================================
products_x_orders_id
quantity_ordered

Now there's no way to know who ordered what, in what quantity, or when.

So is this article just academic hullabaloo, or is there real-world practicality to this denormalization approach? And if so, what does it look like (bonus points for using my example in the answer)?


This is subjective but the following solution worked for me, my team, and our DB team.

  • At the application layer, microservices are decomposed by semantic function.
    • e.g. a Contact service might CRUD contacts (metadata about contacts: names, phone numbers, contact info, etc.)
    • e.g. a User service might CRUD users with login credentials, authorization roles, etc.
    • e.g. a Payment service might CRUD payments and work under the hood with a 3rd party PCI compliant service like Stripe, etc.
  • At the DB layer, the tables can be organized however the devs/DBs/devops people want the tables organized

The problem is with cascading and service boundaries: Payments might need a User to know who is making a payment. Instead of modeling your services like this:

interface PaymentService {
    PaymentInfo makePayment(User user, Payment payment);
}

Model it like so:

interface PaymentService {
    PaymentInfo makePayment(Long userId, Payment payment);
}

This way, entities that belong to other microservices are referenced inside a particular service only by ID, not by object reference. This allows DB tables to have foreign keys all over the place, but at the app layer "foreign" entities (that is, entities living in other services) are available via ID. This stops object cascading from growing out of control and cleanly delineates service boundaries.
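As a minimal sketch of what this looks like at the entity level (the class and field names here are illustrative, not from any particular framework), the Payment entity carries only the owning user's id:

```java
// Sketch: a Payment stores the owning user's id, never a User object,
// so the User aggregate stays entirely inside its own service.
class Payment {
    private final long id;
    private final long userId;      // "foreign" entity referenced by id only
    private final long amountCents;

    Payment(long id, long userId, long amountCents) {
        this.id = id;
        this.userId = userId;
        this.amountCents = amountCents;
    }

    long getId() { return id; }
    long getUserId() { return userId; }
    long getAmountCents() { return amountCents; }
}
```

Resolving the id back into a full User is then the calling code's job, via the User service's API.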

The problem it does incur is that it requires more network calls. For instance, if I gave each Payment entity a User reference, I could get the user for a particular payment with a single call:

User user = paymentService.getUserForPayment(payment);

But using what I'm suggesting here, you'll need two calls:

Long userId = paymentService.getPayment(payment).getUserId();
User user = userService.getUserById(userId);

This may be a deal breaker. But if you're smart, implement caching, and build well-engineered microservices that respond in 50 - 100 ms per call, I have no doubt these extra network calls can be crafted so as not to add noticeable latency to the application.
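To illustrate the caching idea, here is a minimal sketch of a TTL cache a client could put in front of that second lookup; `TtlCache` and its API are my own invention, not a real library:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: memoize cross-service lookups for a short TTL so the second
// network hop is usually a local map read. All names are illustrative.
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;
        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;

    TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Returns the cached value if fresh, otherwise invokes the loader
    // (e.g. the remote service call) and caches the result.
    V getOrLoad(K key, Function<K, V> loader) {
        Entry<V> e = map.get(key);
        long now = System.currentTimeMillis();
        if (e == null || e.expiresAtMillis < now) {
            e = new Entry<>(loader.apply(key), now + ttlMillis);
            map.put(key, e);
        }
        return e.value;
    }
}
```

With something like this in place, the second call becomes `userCache.getOrLoad(userId, userService::getUserById)`, a local map read whenever the entry is still fresh (again, `userCache` and `userService` are hypothetical names).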

I would see each microservice as an object, and just as with any ORM you use those objects to pull the data and then create joins within your code and query collections, microservices should be handled in a similar manner. The only difference is that here each microservice represents one object at a time rather than a complete object tree. An API layer should consume these services and model the data in the way it has to be presented or stored.

Making several calls back to services for each transaction will not have an impact, as each service runs in a separate container and all these calls can be executed in parallel.

@ccit-spence, I liked the approach of intersection services, but how can it be designed and consumed by other services? I believe it will create a kind of dependency for other services.

Any comments please?

I realise this is possibly not a good answer but what the heck. Your question was:

Given a database that consists entirely of related tables, how does one denormalize this into smaller fragments (groups of tables)

WRT the database design I'd say "you can't without removing foreign keys".

That is, people pushing Microservices with the strict no shared DB rule are asking database designers to give up foreign keys (and they are doing that implicitly or explicitly). When they don't explicitly state the loss of FK's it makes you wonder if they actually know and recognise the value of foreign keys (because it is frequently not mentioned at all).

I have seen big systems broken into groups of tables. In these cases there can be either A) no FK's allowed between the groups or B) one special group that holds "core" tables that can be referenced by FK's to tables in other groups.

... but in these systems "groups of tables" is often 50+ tables so not small enough for strict compliance with microservices.

To me, the other related issue to consider with the microservice approach to splitting the DB is the impact this has on reporting: the question of how all the data is brought together for reporting and/or loading into a data warehouse.

Somewhat related is also the tendency to ignore built-in DB replication features in favor of messaging (and how DB-based replication of the core tables / DDD shared kernel impacts the design).

EDIT: (the cost of JOIN via REST calls)

When we split up the DB as suggested by microservices and remove FK's we not only lose the enforced declarative business rule (of the FK) but we also lose the ability for the DB to perform the join(s) across those boundaries.

In OLTP FK values are generally not "UX Friendly" and we often want to join on them.

In the example, if we fetch the last 100 orders we probably don't want to show the customer id values in the UX. Instead we need to make a second call to the customer service to get the customer names. And if we also want the order lines, we need yet another call to the products service to show product name, SKU, etc. rather than product id.

In general we can find that when we break up the DB design in this way we need to do a lot of "JOIN via REST" calls. So what is the relative cost of doing this?
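To make that cost concrete, here is a sketch of what "JOIN via REST" looks like in code: every row of the result forces extra per-id calls to other services. The lookup functions stand in for HTTP clients, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch of "JOIN via REST": what a single SQL join becomes when each
// foreign key must be resolved by a call to another service.
record OrderRow(long orderId, long customerId, long productId) {}
record OrderView(long orderId, String customerName, String productName) {}

class RestJoin {
    static List<OrderView> hydrate(List<OrderRow> orders,
                                   Function<Long, String> customerService,
                                   Function<Long, String> productService) {
        List<OrderView> out = new ArrayList<>();
        for (OrderRow o : orders) {
            // Each row costs two extra round-trips (absent caching/batching).
            out.add(new OrderView(o.orderId(),
                                  customerService.apply(o.customerId()),
                                  productService.apply(o.productId())));
        }
        return out;
    }
}
```

For 100 orders that is up to 200 extra calls; a DB with those foreign keys would do the whole thing in one join.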

Actual Story: Example costs for 'JOIN via REST' vs DB Joins

There are 4 microservices and they involve a lot of "JOIN via REST". A benchmark load for these 4 services comes to ~15 minutes. Those 4 microservices converted into 1 service with 4 modules against a shared DB (that allows joins) executes the same load in ~20 seconds.

This unfortunately is not a direct apples to apples comparison for DB joins vs "JOIN via REST" as in this case we also changed from a NoSQL DB to Postgres.

Is it any surprise that "JOIN via REST" performs relatively poorly when compared to a DB that has a cost-based optimiser etc.?

To some extent, when we break up the DB like this we are also walking away from the cost-based optimiser and all that it does with query execution planning for us, in favor of writing our own join logic (we are somewhat writing our own relatively unsophisticated query execution plan).

It is indeed one of the key problems in microservices, and one that is quite conveniently omitted in most articles. Fortunately there are solutions. As a basis for discussion, let's take the tables you provided in the question: in a monolith they are just a few tables with joins.


To refactor this into microservices we can use a few strategies:

API Join

In this strategy, foreign keys between microservices are broken, and each microservice exposes an endpoint which mimics the key. For example: the Product microservice will expose a findProductById endpoint, and the Order microservice can use this endpoint instead of a join.
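A minimal sketch of this strategy, assuming the `findProductById` endpoint above (the wiring is a plain interface here; in production it would be a REST client, and all other names are illustrative):

```java
// Sketch of the "API join": the Order side resolves each product through the
// Product service's endpoint where a SQL join to [products] used to be.
interface ProductClient {
    Product findProductById(long productId); // endpoint named in the text above
}

record Product(long id, String name, long unitPriceCents) {}
record LineItem(long productId, int quantity) {}

class OrderPricing {
    private final ProductClient products;

    OrderPricing(ProductClient products) { this.products = products; }

    // Total an order by calling the Product API once per line item,
    // instead of joining orders to products inside one database.
    long totalCents(Iterable<LineItem> items) {
        long total = 0;
        for (LineItem li : items) {
            total += products.findProductById(li.productId()).unitPriceCents()
                     * li.quantity();
        }
        return total;
    }
}
```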

It has an obvious downside: it is slower.

Read-only views

In the second solution you can create a copy of the table in the second database. The copy is read-only. Each microservice can use mutating operations on its own read/write tables; the read-only tables copied from other databases can (obviously) only be read.
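A sketch of the read-only copy, with an in-memory map standing in for the replicated table; the replication transport (DB replication, messaging) is left out, and all names are illustrative:

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: the Order service keeps a local, read-only replica of product rows,
// updated only by change events delivered from the owning Product service.
class ProductReplica {
    private final Map<Long, String> nameById = new ConcurrentHashMap<>();

    // Called by whatever replication mechanism delivers changes from the owner.
    void onProductChanged(long productId, String name) {
        nameById.put(productId, name);
    }

    // Readers get an unmodifiable view: local code must never mutate the copy.
    Map<Long, String> readView() {
        return Collections.unmodifiableMap(nameById);
    }
}
```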

High performance read

It is possible to achieve high-performance reads by introducing solutions such as Redis/Memcached on top of the read-only-view solution. Both sides of the join should be copied to a flat structure optimized for reading. You can introduce a completely new stateless microservice which is used for reading from this storage. While it seems like a lot of hassle, it is worth noting that it will have higher performance than a monolithic solution on top of a relational database.
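A sketch of that read-optimized flat structure, with a plain map standing in for Redis/Memcached; all names are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: both sides of the join are flattened into one read-optimized record
// and served by a stateless read service as a single key lookup.
record OrderSummary(long orderId, String userName, String productNames) {}

class OrderReadService {
    private final Map<Long, OrderSummary> store = new ConcurrentHashMap<>();

    // A projector would call this whenever the underlying
    // order/user/product data changes in the owning services.
    void project(OrderSummary summary) {
        store.put(summary.orderId(), summary);
    }

    // Reads are one key lookup: no joins, no cross-service calls.
    OrderSummary byId(long orderId) {
        return store.get(orderId);
    }
}
```

The trade-off is the usual one for materialized read models: fast, join-free reads in exchange for eventual consistency and the work of keeping the projection up to date.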


There are a few possible solutions. The ones which are simplest to implement have the lowest performance; the high-performance solutions will take a few weeks to implement.