How much abstraction is too much?

In an object-oriented program: how much abstraction is too much? How much is just right?

I have always been a nuts-and-bolts kind of person. I understand the concepts behind high levels of encapsulation and abstraction, but my instinct has always been that adding too much of it just muddles the program.

I have always tried to aim for an amount of abstraction that leaves no empty classes or layers. And where in doubt, instead of adding a new layer to the hierarchy, I would try to fit something into an existing layer.

Recently, however, I have been encountering more highly abstracted systems, where anything that might ever need a representation later in the hierarchy gets one up front. This leads to a lot of empty layers, which at first looks like bad design. On second thought, though, I have come to realize that leaving these empty layers gives you more places to hook into later without much refactoring. It lets you add new functionality on top of the old without doing nearly as much work to adjust the old code.

The two risks of this seem to be that you could get the layers you need wrong. In that case you would still end up doing a lot of refactoring to extend the code, and would still have a pile of never-used layers. But depending on how much time you spend on the initial abstraction, the odds of getting it wrong, and the time saved later if you get it right, it may still be worth the attempt.

The other risk I can think of is overdoing it and never needing all the extra layers. But is that really so bad? Are extra class layers really so expensive that they are a big loss if they are never used? The main expense and loss here would be the time spent up front coming up with the layers. But much of that time might still be saved later, when you can work with the abstracted code rather than more low-level code.

So when is it too much? At what point do the empty layers and extra "might need" abstractions become overkill? How little is too little? Where's the sweet spot?

Are there any dependable rules of thumb you have found in the course of your careers that help you judge the amount of abstraction needed?


The reality is that it depends on how well you can look into the future. You want to plan for changes you can foresee without creating too many extra layers. If you have a design that transfers data between systems, go ahead and create an interface and use the planned implementation as the default. For example, you use FTP to move files around but know the standard will be message-based (or whatever) next year.
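A rough sketch of that idea, assuming hypothetical names (FileTransport, FtpTransport, MessageTransport are invented for illustration, not taken from any particular design): the callers depend only on the interface, and FTP is just today's default implementation.

```java
// Sketch only: all type names here are illustrative.
interface FileTransport {
    void send(String fileName, byte[] contents);
}

class FtpTransport implements FileTransport {
    @Override
    public void send(String fileName, byte[] contents) {
        // today's default: push the file over FTP (details omitted)
        System.out.println("FTP upload: " + fileName + " (" + contents.length + " bytes)");
    }
}

class MessageTransport implements FileTransport {
    @Override
    public void send(String fileName, byte[] contents) {
        // next year's message-based transport plugs in here without
        // touching any code that only knows about FileTransport
        System.out.println("Publishing message for: " + fileName);
    }
}

class TransferDemo {
    public static void main(String[] args) {
        FileTransport transport = new FtpTransport(); // swap for MessageTransport later
        transport.send("report.csv", new byte[] {1, 2, 3});
    }
}
```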

As for layers within the design, sometimes the added layers make it easier to write smaller classes. It's OK to add conceptual layers if it means the concrete classes become straightforward.

Simply put, there is too much abstraction if the code is difficult to understand.

Now this isn't to say that you should hard code everything, because that's the easiest code to write and read.

The easiest test is to put it down for a few days, pick it back up, and ask yourself whether it still makes sense. A better approach is to give it to someone else and see if they can make heads or tails of it.

See item (6a) of RFC 1925 and know that it is indeed true. The only problems you can't fix by adding abstraction layers are those caused by having too many abstraction layers. (In particular, every piece of abstraction makes the whole thing harder to understand.)

So when is it too much? At what point do the empty layers and extra "might need" abstractions become overkill? How little is too little? Where's the sweet spot?

I don't think there is a definitive answer to these questions. Experience is needed to develop a feeling of what is "too much" and "too little". Maybe the usage of some metric or quality control tools can help, but it's hard to generalize. It mostly depends on each case.


Development is all about finding the right balance between the various tensions that are present in any software engineering effort.

In theory, it should be a matter of simple math using only three (fairly simple) variables:

  • S = savings from use
  • C = cost of the extra abstractions
  • P = probability of use

If S * P > C, then the code is good. If S * P < C, then it's bad.
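For instance (numbers made up purely to illustrate the inequality): if an extra interface would save 10 hours should it ever be used (S = 10), costs 2 hours to build and carry around (C = 2), and has a 30% chance of being needed (P = 0.3), then S * P = 3 > 2 and it pays off on average; at P = 0.1 it would not.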

The reason that's purely theoretical, however, is that you generally can't guess at the probability of use or the savings you'll get from using it. Worse, you usually can't even measure the cost of its presence.

At least some people have drawn a conclusion from this. In TDD, the standard mantra is "you ain't gonna need it" (YAGNI). Simply put, anything that doesn't directly contribute toward the code meeting its current requirements is considered a bad thing. In essence, they've concluded that the probability of use is so low, that including such extra code is never justified.

Some of this comes back to "bottom up" versus "top down" development. I tend to think of bottom-up development as "library development" -- i.e., instead of developing a specific application, you're really developing libraries for the kinds of things you'll need in the application. The thinking is that with a good enough library, you can develop almost any application of that general type relatively easily.

Quite a bit also depends on the size of the project. Huge projects that stay in use for decades justify a lot more long-term investment than smaller projects that are discarded and replaced much more quickly. This has obvious analogs in real life as well. You don't worry nearly as much about the fit, finish, or workmanship in a disposable razor you'll throw away in less than a week as you do in something like a new car that you'll be using for the next few years.

Every abstraction that is not actually used is too much. The simpler a system, the easier it is to change. Abstraction layers nearly always make systems more complicated.

OTOH, it's certainly possible to program a complex, unmaintainable mess without any kind of abstraction, and sometimes, "standardized" abstraction layers can help structure a system better than most people would be able to do on their own.

How little is too little?

When you keep working with "low level" elements on a routine basis and you constantly feel like you don't want to be doing this, abstract 'em away.

So when is it too much?

When you regularly can't make sense of bits and pieces of some code and have to debug them down into the previous layer. You feel this particular layer does not contribute anything; it's just an obstacle. Drop it.

Where's the sweet spot?

I like to apply the pragmatic approach. If you see a need for an abstraction and understand how it will improve your life, go for it. If you've heard there should "officially" be an extra layer of abstraction but you're not clear why, don't do it but research first. If somebody insists on abstracting something but cannot clearly explain what it will bring, tell them to go away.

The point of abstractions is to factor out common properties from the specific ones, like in the mathematical operation:

ab + ac => a(b + c)

Now you do the same thing with two operations instead of three. This factoring made our expression simpler.
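A rough code-level analogue of that factoring, with class and method names invented for illustration: the shared check (the common factor) moves into a single helper.

```java
// Sketch only: Before/After and the method names are illustrative.
class Before {
    void saveCustomer(String name) {
        if (name == null || name.isEmpty()) throw new IllegalArgumentException("empty name");
        // ...persist the customer
    }

    void saveOrder(String id) {
        if (id == null || id.isEmpty()) throw new IllegalArgumentException("empty id");
        // ...persist the order
    }
}

// After factoring the common property out, the check lives in exactly one place:
class After {
    private void requireNonEmpty(String value, String what) {
        if (value == null || value.isEmpty()) throw new IllegalArgumentException("empty " + what);
    }

    void saveCustomer(String name) {
        requireNonEmpty(name, "name");
        // ...persist the customer
    }

    void saveOrder(String id) {
        requireNonEmpty(id, "id");
        // ...persist the order
    }
}
```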

A typical example of an abstraction is the file system. For example, you want your program to be able to write to many kinds of storage devices: pen drives, SD cards, hard drives, etc...

If we didn't have a file system, we would need to implement the direct disk writing logic, the pen drive writing logic, and the SD card writing logic. But all of these have something in common: they create files and directories, so these common things can be abstracted away by creating an abstraction layer and providing an interface for the hardware vendors to do the device-specific stuff.
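A minimal sketch of that idea, with hypothetical names (Storage, HardDriveStorage, SdCardStorage, Backup are all invented here): the application code depends only on the Storage interface, and each device supplies its own implementation.

```java
// Sketch only: all type names here are illustrative.
interface Storage {
    void createDirectory(String path);
    void writeFile(String path, byte[] data);
}

class HardDriveStorage implements Storage {
    public void createDirectory(String path) { /* direct disk logic */ }
    public void writeFile(String path, byte[] data) { /* direct disk logic */ }
}

class SdCardStorage implements Storage {
    public void createDirectory(String path) { /* SD-card-specific logic */ }
    public void writeFile(String path, byte[] data) { /* SD-card-specific logic */ }
}

class Backup {
    // This code neither knows nor cares which device it is writing to.
    static void saveReport(Storage storage, byte[] report) {
        storage.createDirectory("/reports");
        storage.writeFile("/reports/latest.bin", report);
    }
}
```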

The more things share a common property, the more beneficial an abstraction can be:

ab + ac + ad + ae + af

to:

a(b + c + d + e + f)

This would reduce the 9 operations to 5.

Basically each good abstraction roughly halves the complexity of a system.

You always need at least two things sharing a common property to make an abstraction useful. Of course you can tear a single thing apart so that it looks like an abstraction, but that does not mean it's useful:

10 => 5 * 2

You cannot define the word "common" if you have only one entity.

So to answer your question: you have enough abstraction when it makes your system as simple as possible.

(In my examples, addition connects the parts of the system, while multiplication defines an abstract-concrete relationship.)

TLDR

There is no single right level of abstraction. A good code-level abstraction can only be judged in the context of the programmers working on the code, and it should reflect their actual mental model.


Abstraction has no intrinsic value whatsoever. The CPU couldn't care less about what abstractions you use in your code. In fact, from a pure computation perspective abstraction is detrimental; that's part of why higher-level languages tend to run slower than lower-level ones.

The reason for doing it is because we humans have a finite cognitive ability with a very small amount of working memory. By abstraction, we break up problems into smaller subproblems that our tiny brain can actually process. Therefore, the value of abstraction must be evaluated in the context of the programmers who write, maintain, and use the code.

How much abstraction is too much depends on how complex the problem is relative to the programmers working on it. No sane person would think it is a good idea to factor x = x + 1 into a plus_one function. Why? Because it is already a trivial problem that everyone can solve without effort. In contrast, a find-the-greatest-common-divisor function would be considered a good thing to have, because it is a complex enough problem that most people have to put in quite a bit of cognitive effort to solve it.
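A small illustrative contrast of the two cases just described (the class and method names are made up for this sketch):

```java
class AbstractionWorth {
    // Not worth abstracting: the name is no easier to grasp than x + 1 itself.
    static int plusOne(int x) {
        return x + 1;
    }

    // Worth abstracting: Euclid's algorithm hides a genuinely non-trivial idea
    // behind a name the caller can trust without re-deriving it.
    static int greatestCommonDivisor(int a, int b) {
        while (b != 0) {
            int t = a % b;
            a = b;
            b = t;
        }
        return Math.abs(a);
    }
}
```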

Abstraction is often treated as synonymous with refactoring and avoiding code duplication, but they are actually different things. Code duplication is often a sign that some extra abstraction could make the problem easier to think about, but that is not always true. This is what most people who "over-engineer" and "over-abstract" get wrong. Abstraction is the mental model of how you break the problem down into subproblems, and a good code-level abstraction should reflect that same mental model. Abstraction should make your code easier to think about, not harder.

Other answers to this question also mention adapting more easily to future changes as a reason for abstraction. I think that is another misconception. If abstraction were only needed to make code easier to adapt to future changes, it would logically follow that when I know the requirements will never change, no abstraction is needed at all, which is clearly false. What's really happening is that by foreseeing potential changes, we are altering the problem we set out to solve in the first place, and that different problem calls for a different abstraction as its solution.

It is also worth pointing out that people can, and usually do, have different mental models. An abstraction that makes perfect sense to you might cause a serious headache for someone else. So if you're working in a team environment, it is important to reach consensus on at least the high-level abstractions.