Why use StatefulSets? Can't a stateless Pod use persistent volumes?

I'm trying to understand StatefulSets. How does using them differ from using "stateless" pods with persistent volumes? That is, given that a "normal" Pod can claim persistent storage, what obvious thing am I missing that makes this new construct (ordered startup/shutdown, etc.) necessary?


Yes, a regular pod can use a persistent volume. However, sometimes you have multiple pods that logically form a "group". Examples of this would be database replicas, ZooKeeper hosts, Kafka nodes, etc. In all of these cases there's a bunch of servers and they work together and talk to each other. What's special about them is that each individual in the group has an identity. For example, for a database cluster one is the master and two are followers and each of the followers communicates with the master letting it know what it has and has not synced. So the followers know that "db-x-0" is the master and the master knows that "db-x-2" is a follower and has all the data up to a certain point but still needs data beyond that.

In such situations you need a few things you can't easily get from a regular pod:

  1. A predictable name: you want to start your pods telling them where to find each other so they can form a cluster, elect a leader, etc. but you need to know their names in advance to do that. Normal pod names are random so you can't know them in advance.
  2. A stable address/DNS name: you want whatever names were available in step (1) to stay the same. If a normal pod restarts (you redeploy, the host where it was running dies, etc.) on another host it'll get a new name and a new IP address.
  3. A persistent link between an individual in the group and its persistent volume: if the host where one of your database masters was running dies, the pod will get moved to a new host, but it should connect to the same persistent volume, as there's one and only one volume that contains the right data for that "individual". So, for example, if you redeploy your group of 3 database hosts you want the same individual (by DNS name and IP address) to get the same persistent volume, so the master is still the master and still has the same data, replica1 gets its data, etc.

StatefulSets solve these issues because they provide (quoting from https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/):

  1. Stable, unique network identifiers.
  2. Stable, persistent storage.
  3. Ordered, graceful deployment and scaling.
  4. Ordered, graceful deletion and termination.

I didn't really talk about (3) and (4), but those can also help with clusters: you can tell the first pod that deploys to become the master, have the next one find the first and treat it as the master, and so on.
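For concreteness, here is a minimal sketch of what that looks like (the `db-x` names, the `postgres` image, and the storage size are invented for this example): a headless Service plus a StatefulSet whose `volumeClaimTemplates` give each pod its own PersistentVolumeClaim.

```yaml
# Headless Service: gives each pod a stable DNS name such as
# db-x-0.db-x.default.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: db-x
spec:
  clusterIP: None          # headless
  selector:
    app: db-x
  ports:
    - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db-x
spec:
  serviceName: db-x        # must reference the headless Service above
  replicas: 3
  selector:
    matchLabels:
      app: db-x
  template:
    metadata:
      labels:
        app: db-x
    spec:
      containers:
        - name: db
          image: postgres:15            # example image
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:                 # one PVC per pod: data-db-x-0, data-db-x-1, ...
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

With this, the pods are named `db-x-0`, `db-x-1`, `db-x-2`, they start in that order, each is reachable at a stable DNS name such as `db-x-0.db-x`, and each keeps its own claim (`data-db-x-0`, and so on) even if it's rescheduled to another node.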

As some have noted, you can indeed get some of the same benefits by using regular pods and services, but it's much more work. For example, if you wanted 3 database instances you could manually create 3 deployments and 3 services. Note that you must manually create 3 deployments, as you can't have a service point to a single pod in a deployment. Then, to scale up, you'd manually create another deployment and another service. This does work and was somewhat common practice before PetSets/StatefulSets came along. Note that it is missing some of the benefits listed above (persistent volume mapping and fixed start order, for example).
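To make that manual alternative concrete, here is a rough sketch of one of those three instance pairs (the `db-manual-0` names and the image are placeholders, not anything from the original answer): each instance gets its own single-replica Deployment plus a Service that selects only that instance's label.

```yaml
# One Deployment per instance, pinned to a single replica
apiVersion: apps/v1
kind: Deployment
metadata:
  name: db-manual-0
spec:
  replicas: 1
  selector:
    matchLabels:
      app: db-manual
      instance: "0"            # unique label so a Service can target only this pod
  template:
    metadata:
      labels:
        app: db-manual
        instance: "0"
    spec:
      containers:
        - name: db
          image: postgres:15   # example image
---
# One Service per instance, giving that single pod a stable DNS name
apiVersion: v1
kind: Service
metadata:
  name: db-manual-0
spec:
  selector:
    app: db-manual
    instance: "0"
  ports:
    - port: 5432
```

You would repeat this pair as `db-manual-1` and `db-manual-2`, and to scale up you would copy it yet again; nothing here enforces a start order or ties a particular PersistentVolume to a particular instance unless you also hand-create one PVC per Deployment.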

1: Why StatefulSets?

Stateless app: Usually, frontend components have completely different scaling requirements than the backends, so we tend to scale them individually; not to mention that backends such as databases are usually much harder to scale than (stateless) frontend web servers. Yes, the term "stateless" means that no past data or state is stored or needs to be persisted when a new container is created.

Stateful app: Stateful applications typically involve some database, such as Cassandra, MongoDB, or MySQL, and process reads and/or writes to it.

2: Can't a stateless Pod use persistent volumes?

Basically, there are a few ways you can do it, but each has its own disadvantages.

1: USING ONE REPLICASET PER POD INSTANCE

  • you could create multiple ReplicaSets, one for each pod, with each ReplicaSet's desired replica count set to one and each ReplicaSet's pod template referencing a dedicated PersistentVolumeClaim (a rough sketch follows this list).


  • Although this takes care of the automatic rescheduling in case of node failures or accidental pod deletions, it’s much more cumbersome compared to having a single ReplicaSet.

  • For example, think about how you'd scale the pods in that case. You couldn't simply change the desired replica count; you'd have to create additional ReplicaSets instead. Using multiple ReplicaSets is therefore not the best solution.
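For illustration, here is a hedged sketch of that per-instance approach (the names, image, and storage size are made up): one ReplicaSet with `replicas: 1` whose pod template mounts a PersistentVolumeClaim dedicated to that single instance.

```yaml
# A PVC dedicated to this one instance
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-instance-0
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
# A ReplicaSet with a desired count of one, whose pod template
# references that dedicated claim
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: app-instance-0
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
      instance: "0"
  template:
    metadata:
      labels:
        app: myapp
        instance: "0"
    spec:
      containers:
        - name: app
          image: busybox             # placeholder image
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: data-instance-0
```

Every additional instance means copying both objects with a new index, which is exactly the scaling awkwardness described above.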

2: USING MULTIPLE DIRECTORIES IN THE SAME VOLUME

  • A trick you can use is to have all pods use the same PersistentVolume, but with a separate file directory inside that volume for each pod. Because you can't configure pod replicas differently from a single pod template, you can't tell each instance what directory it should use, but you can make each instance automatically select (and possibly also create) a data directory that isn't being used by any other instance at that time (a sketch follows after this list).


  • This solution does require coordination between the instances, and isn’t easy to do correctly. It also makes the shared storage volume the bottleneck.
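As a minimal sketch of that trick, assuming a pre-existing shared claim named `shared-data` (the names and image are invented for the example): an init container walks the candidate directories and "claims" the first one it can lock by atomically creating a `.lock` directory inside it.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shared-dir-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: shared-dir-app
  template:
    metadata:
      labels:
        app: shared-dir-app
    spec:
      initContainers:
        - name: claim-dir
          image: busybox             # any small image with a shell
          command: ["sh", "-c"]
          args:
            - |
              set -e
              i=0
              while true; do
                mkdir -p "/shared/data-$i"
                # mkdir of the lock dir is atomic, so only one pod can claim each directory
                if mkdir "/shared/data-$i/.lock" 2>/dev/null; then
                  echo "claimed /shared/data-$i"
                  break
                fi
                i=$((i+1))
              done
          volumeMounts:
            - name: shared
              mountPath: /shared
      containers:
        - name: app
          image: busybox             # placeholder for the real workload
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: shared
              mountPath: /shared
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: shared-data   # assumed to exist already; needs ReadWriteMany if pods land on different nodes
```

Even this naive version shows the problems: a restarted pod leaves its old `.lock` behind and claims a fresh directory, and the main container still has to discover which directory was claimed (for example via a file written to a shared `emptyDir`), which is exactly the coordination the bullet above warns about.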

That's why the use of StatefulSets is encouraged for workloads like these.