When and how to use the Hibernate second-level cache?

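I have trouble understanding when Hibernate hits the second-level cache and when it invalidates the cache.

This is what I currently understand:

  • The second-level cache stores entities between sessions; its scope is the SessionFactory
  • You have to tell it which entities to cache; no entity is cached by default
  • The query cache stores the results of queries in the cache.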

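What I don't understand is:

  • When does Hibernate hit this cache?
  • Let's say I've set up the second-level cache but not the query cache. I want to cache my customers, and there are 50000 of them. In what ways can I retrieve the customers from the cache?
  • I assume I can get them by id from the cache. That would be easy, but also not worth caching. But what if I want to do some calculation with all my customers? Let's say I want to show a list of the customers; how would I access them then?
  • How would I get all my customers if query caching is disabled?
  • What would happen if someone updated one of the customers?
    • Would that customer be invalidated in the cache, or would all customers be invalidated?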

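Or am I thinking about caching totally wrong? In that case, what would be more appropriate uses of the second-level cache? The Hibernate documentation is not at all clear about how the cache works in reality. There are only instructions on how to set it up.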

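Update: So I've come to understand that the second-level cache (without the query cache) is good for loading data by id. For example, I have a user object whose permissions I want to check on every request in a web application. Would this be a good case for reducing database access by caching the user in the second-level cache? I would store the user id in the session or wherever, and when I need to check permissions, I would load the user by its id and check the permissions.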

  • The 2nd level cache is a key-value store. It only works if you get your entities by id.
  • The 2nd level cache is invalidated / updated per entity when an entity is updated or deleted via Hibernate. It is not invalidated if the database is updated in a different way.
  • For queries (e.g. a list of customers), use the query cache.
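
The per-entity behaviour described above can be pictured with a toy model. This is only a sketch: plain maps stand in for the database and the cache, every name in it is hypothetical, and none of it is Hibernate API. It evicts the entry on update for simplicity; depending on the concurrency strategy, a real provider may update the entry in place instead.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: plain maps stand in for the database and the second-level cache.
// None of these names are Hibernate APIs.
class EntityCacheInvalidationSketch {
    final Map<Long, String> database = new HashMap<>(); // id -> customer name
    final Map<Long, String> l2Cache = new HashMap<>();  // id -> cached state
    int databaseReads = 0;

    // Lookup by id: served from the cache when possible, otherwise from the database.
    String getById(long id) {
        String cached = l2Cache.get(id);
        if (cached != null) {
            return cached;
        }
        databaseReads++;
        String loaded = database.get(id);
        if (loaded != null) {
            l2Cache.put(id, loaded);
        }
        return loaded;
    }

    // An update that goes through the ORM: only the entry for this id is evicted.
    void updateThroughOrm(long id, String newName) {
        database.put(id, newName);
        l2Cache.remove(id);
    }

    // An update made behind the ORM's back (e.g. manual SQL): the cache is not told.
    void updateDirectlyInDatabase(long id, String newName) {
        database.put(id, newName);
    }

    public static void main(String[] args) {
        EntityCacheInvalidationSketch s = new EntityCacheInvalidationSketch();
        s.database.put(1L, "Alice");
        s.database.put(2L, "Bob");
        s.getById(1L);
        s.getById(2L);

        s.updateThroughOrm(1L, "Alicia");
        System.out.println(s.getById(1L)); // Alicia (reloaded, its entry was evicted)
        System.out.println(s.getById(2L)); // Bob (still served from the cache)

        s.updateDirectlyInDatabase(2L, "Robert");
        System.out.println(s.getById(2L)); // Bob (stale: the cache never saw the change)
    }
}
```

The last line is the case to watch for: a write that bypasses the ORM leaves a stale entry behind until it expires or is evicted by other means.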

In reality it is useful to have a distributed key-value cache - that's what memcached is, and it powers Facebook, Twitter and many more. But if you don't have lookups by id, then it won't be very useful.

First of all, let's talk about process level cache (or 2nd level cache as they call it in Hibernate). To make it work, you should

  1. configure a cache provider
  2. tell Hibernate which entities to cache (right in the hbm.xml file if you use this kind of mapping).

You tell the cache provider how many objects it should store and when/why they should be invalidated. So let's say you have Book and Author entities: each time you fetch them, only those that are not in the cache will actually be selected from the DB. This increases performance significantly. It's useful when:

  • You write to the database only via Hibernate (because it needs a way to know when to change or invalidate entities in the cache)
  • You read objects often
  • You have a single node, and you don't have replication. Otherwise you'll need to replicate the cache itself (using a distributed cache, for example one clustered over JGroups), which adds more complexity and doesn't scale as well as share-nothing apps.

So when does cache work?

  • When you call session.get() or session.load() for an object that was previously selected and resides in the cache. The cache is a store where the ID is the key and the properties are the values, so you can avoid hitting the DB only when the lookup can be done by ID.
  • When your associations are lazy-loaded (or eager-loaded with selects instead of joins)

But it doesn't work when:

  • When you don't select by ID. Again - the 2nd level cache stores a map of entities' IDs to their other properties (it doesn't actually store objects, but the data itself), so if your lookup looks like this: from Authors where name = :name, then you don't hit the cache.
  • When you use HQL (even if you use where id = ?).
  • When your mapping sets fetch="join". This means that joins will be used everywhere to load associations instead of separate select statements. The process-level cache works on child objects only if fetch="select" is used.
  • When you have fetch="select" but then use joins in HQL to select associations - those joins will be issued right away and they will override whatever you specified in hbm.xml or annotations.
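
To make the by-ID point concrete, here is a minimal sketch, assuming a cache that is nothing more than a map from ID to data. The class and method names are made up for illustration and are not Hibernate API; getById stands in for session.get(), and findIdByName stands in for a query such as from Authors where name = :name.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; hypothetical names, not Hibernate API.
class IdLookupVsQuerySketch {
    final Map<Long, String> authorTable = new HashMap<>(); // id -> name, stands in for the DB
    final Map<Long, String> l2Cache = new HashMap<>();     // id -> cached data
    int databaseHits = 0;

    // Stands in for a lookup by primary key: the cache key is known up front.
    String getById(long id) {
        String cached = l2Cache.get(id);
        if (cached != null) {
            return cached;
        }
        databaseHits++;
        String name = authorTable.get(id);
        if (name != null) {
            l2Cache.put(id, name);
        }
        return name;
    }

    // Stands in for a query by a non-ID attribute: a map keyed by id cannot
    // answer it, so every call goes to the database.
    Long findIdByName(String name) {
        databaseHits++;
        for (Map.Entry<Long, String> row : authorTable.entrySet()) {
            if (row.getValue().equals(name)) {
                return row.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        IdLookupVsQuerySketch s = new IdLookupVsQuerySketch();
        s.authorTable.put(7L, "Tolstoy");
        s.getById(7L);
        s.getById(7L);
        System.out.println(s.databaseHits); // 1: the second lookup by id was a cache hit
        s.findIdByName("Tolstoy");
        s.findIdByName("Tolstoy");
        System.out.println(s.databaseHits); // 3: each lookup by name hit the database
    }
}
```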

Now, about the query cache. Note that it is not a separate cache but an addition to the process-level cache. Let's say you have a Country entity. It's static, so you know that from Country returns the same result set every time. That makes it a perfect candidate for the query cache: the query cache stores only a list of IDs, and the next time you select all countries it hands this list to the process-level cache, which in turn returns the object for each ID, since those objects are already stored in the 2nd level cache. The query cache is invalidated each time anything related to the entity changes. So if you configured from Authors to be placed into the query cache, it wouldn't be effective, because Author changes often. Use the query cache only for more or less static data.
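
The "list of IDs" idea can be sketched as follows. This is a simplified model under stated assumptions, not how Hibernate is implemented: all names are hypothetical, there is only one entity type, and clearing the whole query-cache map stands in for invalidating every cached query over that entity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only; hypothetical names, not Hibernate API.
class QueryCacheSketch {
    final Map<Long, String> countryTable = new TreeMap<>();     // stands in for the DB table
    final Map<Long, String> entityCache = new HashMap<>();      // id -> entity data
    final Map<String, List<Long>> queryCache = new HashMap<>(); // query text -> list of ids
    int queriesSentToDatabase = 0;

    // Models running "from Country" with query caching enabled.
    List<String> listAllCountries() {
        String query = "from Country";
        List<Long> ids = queryCache.get(query);
        if (ids == null) {
            queriesSentToDatabase++;
            ids = new ArrayList<>(countryTable.keySet());
            queryCache.put(query, ids);       // only the ids are kept per query
            entityCache.putAll(countryTable); // the loaded rows go to the entity cache
        }
        List<String> result = new ArrayList<>();
        for (Long id : ids) {
            String name = entityCache.get(id);
            if (name == null) {
                // Entity entry missing (e.g. evicted): fall back to a load by id.
                name = countryTable.get(id);
                entityCache.put(id, name);
            }
            result.add(name);
        }
        return result;
    }

    // Any change to a Country drops the cached query results over Country.
    void saveCountry(long id, String name) {
        countryTable.put(id, name);
        entityCache.put(id, name);
        queryCache.clear();
    }

    public static void main(String[] args) {
        QueryCacheSketch s = new QueryCacheSketch();
        s.countryTable.put(1L, "France");
        s.countryTable.put(2L, "Japan");
        s.listAllCountries();
        s.listAllCountries();
        System.out.println(s.queriesSentToDatabase); // 1: second call used the cached id list
        s.saveCountry(3L, "Peru");
        System.out.println(s.listAllCountries().size()); // 3
        System.out.println(s.queriesSentToDatabase);     // 2: the write invalidated the query
    }
}
```

The model also shows why frequently changing data is a poor fit: every write throws the cached ID list away, so the next query goes back to the database.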

Late to the party, but I wanted to systematically answer these questions, which many developers ask.

Taking your questions one by one, here are my answers.

Q. When does hibernate hit this cache?

A. The first-level cache is associated with the Session object. The second-level cache is associated with the SessionFactory object. If an object is not found in the first-level cache, the second-level cache is checked.
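
That lookup order can be illustrated with a small model. It is only a sketch with hypothetical class names (FactoryModel and SessionModel are not Hibernate types): each session owns its own first-level map, while the second-level map is shared by every session opened from the same factory.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; hypothetical names, not Hibernate API.
class CacheLookupOrderSketch {

    // Shared by all sessions, like state owned by the SessionFactory.
    static class FactoryModel {
        final Map<Long, String> database = new HashMap<>();
        final Map<Long, String> secondLevelCache = new HashMap<>();
        int databaseReads = 0;

        SessionModel openSession() {
            return new SessionModel(this);
        }
    }

    // Each session has its own private first-level cache.
    static class SessionModel {
        private final FactoryModel factory;
        private final Map<Long, String> firstLevelCache = new HashMap<>();

        SessionModel(FactoryModel factory) {
            this.factory = factory;
        }

        String get(long id) {
            // 1. first-level cache, 2. second-level cache, 3. database
            String value = firstLevelCache.get(id);
            if (value == null) {
                value = factory.secondLevelCache.get(id);
            }
            if (value == null) {
                factory.databaseReads++;
                value = factory.database.get(id);
                if (value != null) {
                    factory.secondLevelCache.put(id, value);
                }
            }
            if (value != null) {
                firstLevelCache.put(id, value);
            }
            return value;
        }
    }

    public static void main(String[] args) {
        FactoryModel factory = new FactoryModel();
        factory.database.put(42L, "Carol");

        SessionModel first = factory.openSession();
        first.get(42L); // miss in both caches, read from the database
        first.get(42L); // first-level hit

        SessionModel second = factory.openSession();
        second.get(42L); // new session: first-level miss, second-level hit

        System.out.println(factory.databaseReads); // 1
    }
}
```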

Q. Let's say I've set up the second level cache but not the query caching. I want to cache my customers, there's 50000 of them. In what ways can I retrieve the customers from the cache?

A. You answered that in your update. Also, the query cache stores just the list of IDs of the objects, and the objects for those IDs are stored in the same second-level cache. So if you enable the query cache, you'll utilize the same resource. Neat, right?

Q. I assume I can get them by id from cache. That would be easy but also not worthy of caching. But what if I want to do some calculation with all my customers. Let's say I want to show a list of the customers then how would I access them?

A. Answered above.

Q. How would I get all my customers if query caching is disabled?

A. Answered above.

Q. What would happen if someone updated one of the customers? Would that customer get invalidated in the cache or would all customers get invalidated?

A. Hibernate has no idea, but you could plug a third-party IMDG / distributed cache in as the Hibernate second-level cache and have the entries invalidated there. TayzGrid is one such product, and there are more, I guess.

The Hibernate second-level cache is a little tricky to understand and implement. Here’s what we can say based on your questions:

When does Hibernate hit this cache?

As you suggest, the Hibernate L2 cache (if enabled; it’s not turned on by default) is queried only after the L1 cache. This is a key-value cache whose data is preserved across multiple sessions.

Let's say I've set up the second level cache but not the query caching. I want to cache my customers, there's 50000 of them. In what ways can I retrieve the customers from the cache?

Query caching would be best for this use case, since the customer data is static and retrieved from a relational database.

What would happen if someone updated one of the customers? Would that customer get invalidated in the cache or would all customers get invalidated?

It depends on the specific Hibernate cache strategy you’re using. Hibernate actually has four different cache strategies:

READ_ONLY: Objects don’t change once inside the cache.

NONSTRICT_READ_WRITE: Objects change (eventually) after the corresponding database entry is updated; this guarantees eventual consistency.

READ_WRITE: Objects change (immediately) after the corresponding database entry is updated; this guarantees strong consistency by using "soft" locks.

TRANSACTIONAL: Objects change using distributed XA transactions, ensuring data integrity; this guarantees either total success or rolling back all changes.

In all four of these cases, though, updating a single database entry would not invalidate the entire list of customers in the cache. Hibernate is a little smarter than that :)

To learn more about how L2 caching works in Hibernate, you can check out the article “What is the Hibernate L2 cache,” or the in-depth article “Caching in Hibernate with Redis.”