When does a config file become a programming language?

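I have been mulling over config files and their relationship to code for a while now, and depending on the day and the direction of the wind, my opinions seem to change. More and more, though, I keep coming back to the realization I first had while learning Lisp: there is little difference between data and code. This seems doubly true for config files. Looked at in the right light, a Perl script is little more than a config file for perl. This tends to have fairly heavy consequences for tasks such as QA and for divisions of labor, like who should be responsible for changing config files.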

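The creep from config file to full-fledged language is generally slow and seems to be driven by the desire to have a generic system. Most projects start small, with a few config items like where to write logs, where to look for data, user names and passwords, etc. But then they start to grow: features can be turned on or off, the timing and order of operations start to be controlled, and, inevitably, someone wants to start adding logic (e.g. use 10 if the machine is X and 15 if the machine is Y). At a certain point the config file becomes a domain-specific language, and a poorly written one at that.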
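One way to keep the "use 10 if the machine is X and 15 if the machine is Y" case from turning into logic is to store it as data: a per-host table with a default, with the lookup done in code. A minimal Python sketch; the key name `worker_count` and the host names are hypothetical.

```python
import socket

# Hypothetical config data: a per-host table plus a default.
# The config only lists values; choosing between them happens in code.
CONFIG = {
    "worker_count": {"default": 10, "hosts": {"machine-x": 10, "machine-y": 15}},
}


def setting_for_host(config: dict, key: str, hostname: str):
    """Return the value of `key` for `hostname`, or the key's default."""
    entry = config[key]
    return entry["hosts"].get(hostname, entry["default"])


if __name__ == "__main__":
    host = socket.gethostname()
    print(f"{host}: worker_count = {setting_for_host(CONFIG, 'worker_count', host)}")
```

The table can grow, but the format never needs an `if`.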

Now that I have rambled on to set the stage, here are my questions:

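  1. What is the true purpose of a config file?
  2. Should an attempt be made to keep config files simple?
  3. Who should be responsible for making changes to them (developers, users, admins, etc.)?
  4. Should they be source controlled (see question 3)?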

As I said earlier, my answers to these questions change constantly, but right now I am thinking:

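  1. To allow non-programmers to change large chunks of behavior quickly
  2. Yes, anything that is not coarse-grained should be in code
  3. Users should be responsible for config files, and programmers should be responsible for a configuration layer between config files and code that provides more fine-grained control of the application
  4. No, but the finer-grained middle layer should be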
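Answer 3 describes a programmer-owned layer between coarse user settings and the code. A minimal sketch of that idea, assuming made-up setting names (`performance`, `log_level`) and made-up profile values:

```python
# Coarse, user-facing settings: the only things a user edits.
USER_CONFIG = {"performance": "high", "log_level": "info"}

# Programmer-maintained middle layer: maps each coarse choice to the
# fine-grained parameters the code actually consumes (values are made up).
PERFORMANCE_PROFILES = {
    "low": {"threads": 2, "cache_mb": 64, "batch_size": 10},
    "high": {"threads": 16, "cache_mb": 1024, "batch_size": 500},
}


def expand(user_config: dict) -> dict:
    """Turn coarse user settings into the full fine-grained configuration."""
    profile = user_config.get("performance", "low")
    if profile not in PERFORMANCE_PROFILES:
        raise ValueError(f"unknown performance profile: {profile!r}")
    settings = dict(PERFORMANCE_PROFILES[profile])  # copy so callers can't mutate the profile
    settings["log_level"] = user_config.get("log_level", "warning")
    return settings


print(expand(USER_CONFIG))
```

Under this split, the profile table is what would sit in version control, while the two-line user file stays with whoever deploys the application.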

Here are my thoughts:

  1. To allow the runtime behavior of an application to be modified easily. This can be by programmers or non-programmers, depending on the needs. This can be during development, but I often view configuration files as a way to help make a program more flexible at any point.

  2. Yes. I think config files should be as simple as possible, given the constraint that you may need various options to control different behaviors of your runtime. I prefer grouping configuration settings, and simplifying them as much as possible.

  3. It depends on what change is being made and why. If users are going to be changing it, a front end should be built to hide the details from them. The same is often true of non-developers in general.

  4. I often source control the "default" configuration, but have a way to override this per system for the actual runtime.
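The "versioned defaults plus per-system override" pattern maps directly onto Python's standard `configparser`, where values read later replace values read earlier. A minimal sketch; the section and key names are hypothetical, and the two layers are inlined as strings to keep it self-contained:

```python
import configparser

# Version-controlled defaults.
DEFAULTS = """
[database]
host = localhost
port = 5432
pool_size = 5
"""

# Per-system override: only the keys this machine needs to change.
LOCAL_OVERRIDE = """
[database]
host = db01.internal
"""


def load_layers(*sources: str) -> configparser.ConfigParser:
    """Read each source in order; later sources override earlier ones."""
    parser = configparser.ConfigParser()
    for text in sources:
        parser.read_string(text)
    return parser


config = load_layers(DEFAULTS, LOCAL_OVERRIDE)
print(config["database"]["host"], config["database"].getint("port"))  # db01.internal 5432
```

With real files, `parser.read(["app.defaults.ini", "app.local.ini"])` does the same layering and silently skips a file that cannot be opened, so a machine with no override file just gets the defaults (file names hypothetical).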

As for adding logic to the config file - I'd avoid this. I think it's better to just have the configuration file switch on the logic in your application. Behavior in config files leads to a lack of maintainability and understanding, in my experience. I strongly prefer keeping configuration files as simple as possible.
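A minimal sketch of letting the config file "switch on" logic that lives in the application: the config holds only a name, and the code maps that name to an implementation. The `compression` key and its values are hypothetical.

```python
import zlib
from typing import Callable


def no_compression(data: bytes) -> bytes:
    return data


def zlib_compression(data: bytes) -> bytes:
    return zlib.compress(data)


# Every supported behavior is implemented here, in code.
COMPRESSORS: dict[str, Callable[[bytes], bytes]] = {
    "none": no_compression,
    "zlib": zlib_compression,
}


def pick_compressor(config: dict) -> Callable[[bytes], bytes]:
    """Look up the behavior named in the config; reject unknown names early."""
    name = config.get("compression", "none")
    if name not in COMPRESSORS:
        raise ValueError(f"compression must be one of {sorted(COMPRESSORS)}, got {name!r}")
    return COMPRESSORS[name]


# The config file itself holds nothing but a choice, e.g. "compression = zlib".
compress = pick_compressor({"compression": "zlib"})
print(len(compress(b"a" * 1000)))
```

Adding a new behavior means adding a function and a table entry in code; the config format does not change.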

Very interesting questions!

I tend to limit my config files to a very simple "key=value" format, because I fully agree with you that config files can very quickly become full-blown programs. For example, anyone who has ever tried to "configure" OpenSER knows the feeling you are talking about: it's not configuration, it's (painful) programming.

When you need your application to be very "configurable" in ways that you cannot imagine today, then what you really need is a plugins system. You need to develop your application in a way that someone else can code a new plugin and hook it into your application in the future.
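A plugin system in miniature, as a hedged sketch: plugins register themselves under a name, and the config file only lists which names to use, in order. The registry, decorator, and plugin names are hypothetical; a real application would also need to discover third-party plugin code (for example with `importlib.import_module`).

```python
from typing import Callable

# Hypothetical plugin registry: name -> text-transforming function.
PLUGINS: dict[str, Callable[[str], str]] = {}


def plugin(name: str):
    """Decorator that registers a function as a named plugin."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        PLUGINS[name] = func
        return func
    return register


@plugin("strip")
def strip_spaces(text: str) -> str:
    return text.strip()


@plugin("upper")
def to_upper(text: str) -> str:
    return text.upper()


def run_pipeline(config: dict, text: str) -> str:
    """Apply the plugins named in the config, in the order listed."""
    for name in config.get("plugins", []):
        if name not in PLUGINS:
            raise KeyError(f"plugin not installed: {name}")
        text = PLUGINS[name](text)
    return text


print(run_pipeline({"plugins": ["strip", "upper"]}, "  hello  "))  # HELLO
```

The config stays a plain list of names; new behavior arrives as code.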

So, to answer your questions:

  1. What is the true purpose of a config file?

    I would say, to allow the people who will install your application to tweak some deployment-related parameters, such as host name, number of threads, names of the plugins you need, and the deployment parameters for those plugins (check out FreeRadius's configuration for an example of this principle), etc. Definitely not the place to express business logic.

  2. Should an attempt be made to keep config files simple?

    Definitely. As you suggested, "programming" in a config file is horrible. I believe it should be avoided.

  3. Who should be responsible for making changes to them (developers, users, admins, etc.)?

    In general, I would say admins, who deploy the application.

  4. Should they be source controlled (see question 3)?

    I usually don't source-control the configuration files themselves, but I do source-control a template configuration file, with all the parameters and their default values, and comments describing what they do. For example, if a configuration file is named database.conf, I usually source-control a file named database.conf.template. Now of course I am talking about what I do as a developer. As an admin, I may want to source-control the actual settings that I chose for each installation. For example, we manage a few hundred servers remotely, and we need to keep track of their configurations: we chose to do this with source-control.
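Keeping a `database.conf.template` under source control also makes it possible to check a deployed file against it. A minimal sketch with Python's `configparser`, reporting keys the deployed file lacks and keys the template doesn't know about (often typos); the template contents are made up for the example:

```python
import configparser

TEMPLATE = """
[database]
# Host name of the database server.
host = localhost
# TCP port the server listens on.
port = 5432
"""

DEPLOYED = """
[database]
host = db01.internal
prot = 5433
"""


def option_names(text: str) -> set[str]:
    """Return every option in the INI text as 'section.key'."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {f"{section}.{key}" for section in parser.sections() for key in parser[section]}


def compare_to_template(template: str, deployed: str) -> tuple[list[str], list[str]]:
    """Return (missing, unknown) option names relative to the template."""
    expected, actual = option_names(template), option_names(deployed)
    return sorted(expected - actual), sorted(actual - expected)


missing, unknown = compare_to_template(TEMPLATE, DEPLOYED)
print("missing:", missing)  # missing: ['database.port']
print("unknown:", unknown)  # unknown: ['database.prot']
```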


Edit: Although I believe the above to be true for most applications, there are always exceptions, of course. Your application may allow its users to dynamically configure complex rules, for example. Most email clients allow the users to define rules for the management of their emails (for example, "all emails coming from 'john doe' and not having me in the To: field should be discarded"). Another example is an application that allows the user to define a new complex commercial offer. You may also think about applications like Cognos which allow their users to build complex database reports. The email client will probably offer the user a simple interface to define the rules, and this will generate a complex configuration file (or even perhaps a bit of code). On the other hand, the user-defined configuration for the commercial offers might be saved in a database, in a structured way (neither a simple key=value structure nor a portion of code). And some other applications might even allow the user to code in python or VB, or some other automation-capable language. In other words... your mileage may vary.

Every (sufficiently-long-lived) config file schema eventually becomes a programming language. Due to all the implications you describe, it is wise for the config-file designer to realize she is authoring a programming language and plan accordingly, lest she burden future users with bad legacy.

Yes, config files should be simple. They should contain no 'logic' themselves - think of them as a list of expressions in if statements, not the conditional statements in their entirety.

They're there to let the user decide which of the options coded within the application should be used, so don't try to make them complicated; it'll end up being self-defeating. Otherwise you may end up writing simple config files to control how the original config file should be configured!

One of the purposes of the "Oslo" work at Microsoft is to permit (though not require) resolution of this issue.

  1. An application would ship with models of any new components it includes. It would also use existing models. For instance, it might include a web service, so it could reuse the system model of a web service.
  2. The models will include metadata describing them, including enough information for tools to access them, either textually or graphically.
  3. Parts of the models will correspond to "configuration"

This means that the equivalent of today's configuration files may be rich enough to support both textual and graphical editing. The graphical tool will be supplied with "Oslo" (code name "Quadrant").

It depends on what you agree on with the other developers on the team. Are you using config files just as config files, or are you creating a model-driven application?

Symptoms of config file becoming a programming language:

  • name=value pairs start to depend on each other
  • you feel a need to have flow control (e.g. if (this) then that)
  • documentation for the config file becomes essential in order to do further development (instead of just using the application)
  • before a value from the config can be read, some context is required (i.e. values depend on something external to the config file itself)
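The first symptom can show up even in a standard key=value format. Python's `configparser` with `ExtendedInterpolation` lets values refer to other values via `${key}` (same section) or `${section:key}`; the paths below are made up:

```python
import configparser

# Values that depend on other values: nothing here is "logic" yet,
# but the file must now be read as a dependency graph, not a list.
TEXT = """
[paths]
root = /srv/app
logs = ${root}/logs
archive = ${logs}/archive

[backup]
target = ${paths:archive}/nightly
"""

parser = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
parser.read_string(TEXT)
print(parser["backup"]["target"])  # /srv/app/logs/archive/nightly
```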

OK. You will have some users who want a really simple config, and you should give it to them. At the same time, you will have constant requests of "Can you add this? How do I do it in the config file?" I don't see why you can't support both groups.

The project I am currently working on uses Lua for its configuration file. Lua is a scripting language, and it works quite well in this scenario. An example of our default configuration is available.

You'll note that it is mainly key=value statements, where value can be any of Lua's built-in types. The most complicated thing there are lists, and they aren't really complicated (it's just a matter of syntax).

Now I'm just waiting for someone to ask how to set their server's port to a random value every time they start it up...

Config files invariably inch their way to becoming ugly, illogical "full fledged programming languages". It takes art and skill to design good programming languages, and config languages turned programming language tend to be horrendous.

A good approach is to use a nicely designed language, say Python or Ruby, and use it to create a DSL for your configuration. That way your configuration language can remain simple on the surface but actually be a full-fledged programming language.

I'll be the contrarian and submit it's only a language when it embodies more than can be represented by XML; or else when XML is considered to be a language.

Alternatively, most config files can be thought of as classes, but with only properties and no methods. And without methods, I don't think it's a language.

Ultimately, "language" is a squishy abstraction, but yes, the edges are ambiguous.

The code of our applications becomes less important... There is scripting, and there are all kinds of attributes that define the behaviour of classes, methods, method arguments and properties. Users can define database triggers and database constraints. There can be very complicated config files. Sometimes the user can define XSLT stylesheets to manipulate input and output because our systems need to be open (SOA). And there is stuff like BizTalk that needs complex configuration too. Users can define complex workflows.

We have to write better code to deal with this complex environment, so the code of our applications becomes more important...

I believe your question is very relevant given the move to "fluent interfaces". Many developers have "seen the light" with respect to XML-configured applications. Using XML can be very verbose and difficult to edit correctly (especially if no schema is provided). Having a fluent interface allows the developer to configure the application in a domain-specific language with the assistance of some key-value pairs from a plain text configuration file (or perhaps command-line parameters). It also makes it very easy to set up and configure new instances of the application for testing or whatever.
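A fluent interface in miniature, as a hedged sketch: each builder method returns the builder so calls chain, and a value from a plain key=value source is fed in where needed. `AppConfig`, `AppConfigBuilder`, and the setting names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AppConfig:
    host: str = "localhost"
    port: int = 8080
    plugins: list[str] = field(default_factory=list)


class AppConfigBuilder:
    """Fluent builder: each method returns self so calls can be chained."""

    def __init__(self) -> None:
        self._config = AppConfig()

    def host(self, name: str) -> "AppConfigBuilder":
        self._config.host = name
        return self

    def port(self, number: int) -> "AppConfigBuilder":
        if not 0 < number < 65536:
            raise ValueError(f"invalid port: {number}")
        self._config.port = number
        return self

    def with_plugin(self, name: str) -> "AppConfigBuilder":
        self._config.plugins.append(name)
        return self

    def build(self) -> AppConfig:
        # Return a copy so later builder calls can't change a built config.
        c = self._config
        return AppConfig(host=c.host, port=c.port, plugins=list(c.plugins))


# A simple key=value source (file or command line) supplies what varies per run.
settings = {"port": "9000"}
config = (
    AppConfigBuilder()
    .host("test-host")
    .port(int(settings["port"]))
    .with_plugin("auth")
    .build()
)
print(config)
```

Because the builder is ordinary code, a test can construct a differently configured instance in one expression, which is the testing convenience mentioned above.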

Here are my answers to your question:

  • What is the true purpose of a config file?

A config file is a way to allow the user to customize the behavior of their program at run-time.

  • Should an attempt be made to keep config files simple?

Ideally, I would think that config files should at least be supplemented by a fluent interface to configure the program (this is useful in many respects). If you do require a config file then it should be kept very simple, nothing other than key-value pairs.

  • Who should be responsible for making changes to them (developers, users, admins, etc.)?

I think the answer to this depends on your organization. It should be the responsibility of the person deploying the software to ensure that it is properly configured.

  • Should they be source controlled (see question 3)?

I will steal this answer from someone else :) I like the idea of storing a template configuration in source control and modifying it for each local user's needs. Chances are one developer's config file is another developer's nightmare so it is best to leave things that vary by user out of source control. Having a template is also a nice way to let the person deploying the application (or other developers) see exactly what values are valid for the config file.

I have a different philosophy about config files. Data about how an application should be run is still data, and therefore belongs in a data store, not in code (a config file IMO is code). If end users need to be able to change the data, then the application should provide an interface to do so.

I only use config files to point at data stores.
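A minimal sketch of that approach with Python's built-in `sqlite3`: the config file would hold only the database path, and settings live in a table that the application's own interface reads and writes. The table name and schema are hypothetical; `":memory:"` keeps the example self-contained.

```python
import sqlite3
from typing import Optional

# The only thing a config file would still need to hold: where the store is.
DATABASE_PATH = ":memory:"


def open_settings(path: str) -> sqlite3.Connection:
    """Open the settings store, creating the (hypothetical) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS settings (name TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    return conn


def set_setting(conn: sqlite3.Connection, name: str, value: str) -> None:
    """Insert or update one setting; `with conn` commits the transaction."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO settings (name, value) VALUES (?, ?)", (name, value)
        )


def get_setting(conn: sqlite3.Connection, name: str, default: Optional[str] = None) -> Optional[str]:
    """Return a setting's value, or `default` if it has never been set."""
    row = conn.execute("SELECT value FROM settings WHERE name = ?", (name,)).fetchone()
    return row[0] if row else default


conn = open_settings(DATABASE_PATH)
set_setting(conn, "theme", "dark")
print(get_setting(conn, "theme"))         # dark
print(get_setting(conn, "font", "mono"))  # mono
```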

Recently I was working upon a project and I realised that I wanted to have conditionals inside my configuration file - which had previously just been a pretty simple one of the form:


key = val
key2 = val
name = `hostname`

I didn't want to write a mini-language, because unless I did it very carefully I couldn't allow the flexibility that would be useful.

Instead I decided that I'd have two forms:

  1. If the file started with "#!" and was executable I'd parse the result of running it.

  2. Otherwise I'd read it as-is

This means that I can now allow people to write "configuration files" that look like this:

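#!/usr/bin/perl
use strict;
use warnings;
# Print different key=value settings depending on whether /bin/foo is executable.
if ( -x "/bin/foo" )
{
    print <<EOF;
foo=me
bar=you
EOF
}
else
{
    print <<EOF;
foo=bar
bar=foo
EOF
}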

This way I get the power of a dynamic configuration file if the user wants to use it, and the simplicity of not having to write my own mini-language.
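The two-form rule can be sketched in Python as follows. This is an assumption about how such a loader might look, not the author's code: a file that starts with `#!` and has the executable bit set is run and its standard output used; anything else is read as-is. Either way the text goes through the same simple key=value parser. Running a config file executes arbitrary code, so this only makes sense for files you trust.

```python
import os
import subprocess


def read_config_text(path: str) -> str:
    """Return the config text for `path`.

    If the file starts with '#!' and is executable, run it and return its
    standard output; otherwise return the file's contents unchanged.
    """
    with open(path, "rb") as f:
        starts_with_shebang = f.read(2) == b"#!"
    if starts_with_shebang and os.access(path, os.X_OK):
        # The OS uses the #! line to choose the interpreter.
        result = subprocess.run(
            [os.path.abspath(path)], capture_output=True, text=True, check=True
        )
        return result.stdout
    with open(path, encoding="utf-8") as f:
        return f.read()


def parse_key_values(text: str) -> dict[str, str]:
    """Parse 'key=value' lines, skipping blank lines and '#' comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {line!r}")
        settings[key.strip()] = value.strip()
    return settings
```

Typical use would be `parse_key_values(read_config_text("app.conf"))` (file name hypothetical). A non-executable file that happens to start with `#!` is simply read, and its first line is skipped as a comment.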

I tend to agree with the premise of this question. I avoid getting myself into trouble by predicting early that this is going to happen, and therefore never roll my own config system.

  • Either I use the operating system's config facility (such as a plist, or gconf, or whatever is appropriate),
  • Or a simple flat file, as can be handled by something like an off-the-shelf INI parser,
  • Or I bite the bullet and plug a lightweight language parser, usually Lua, sometimes Tcl, into the application,
  • Or I store data in SQLite or a similar relational database.

And I resign myself to live with whatever decision I made, or, if I can't, refactor to use whichever of the above choices better suits the application.

The point is, there's not really any reason to use a home-grown config solution. For one thing, it's harder on your users to have to learn a new, application-specific config format. For another, you benefit from all the bug fixes and updates that come free with an off-the-shelf solution. Finally, feature creep is put to rest, because you can't just add one more feature without a major overhaul: the config system isn't really in your hands in the first place.

I have seen Python programs where the config file is code. If you don't need to do anything special (conditionals, etc.), it doesn't look much different from other config styles. For example, I could make a file config.py with stuff like:

num_threads = 13
hostname = 'myhost'

and the only burden on the user, compared with (say) INI files, is that they need to put quotes around strings. No doubt you could do the same thing in other interpreted languages. It gives you unlimited ability to complicate your config file if necessary, at the risk of possibly scaring your users.
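A minimal sketch of loading such a `config.py`, assuming the file's source text has already been read: run it in a fresh namespace and keep the public names. It executes arbitrary code, so it is only appropriate for trusted files. For a file on disk, the standard library's `runpy.run_path("config.py")` returns the resulting globals in a similar way.

```python
def load_python_config(source: str) -> dict:
    """Execute config source and return the names it defines.

    Names starting with '_' (including the '__builtins__' entry that exec
    adds) are dropped. This runs arbitrary code: only use it on trusted files.
    """
    namespace: dict = {}
    exec(source, namespace)
    return {name: value for name, value in namespace.items() if not name.startswith("_")}


SIMPLE = """
num_threads = 13
hostname = 'myhost'
"""

# The same format, once someone needs a conditional.
WITH_LOGIC = """
hostname = 'big-box'
num_threads = 32 if hostname.startswith('big-') else 13
"""

print(load_python_config(SIMPLE))  # {'num_threads': 13, 'hostname': 'myhost'}
```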

You could turn to theory of computation to define what counts as a programming language. If your configuration file format is Turing Complete then it reasonably counts as a programming language. By this definition, a file format to describe levels of Sokoban counts as a programming language (see here). There are other levels of complexity below Turing Complete that may also count, such as Regular Grammars and Pushdown Automata.

Another way to look at it is that many config files are only capable of data markup, whereas a proper programming language must be able to implement algorithms. For example, JSON is a config file format, whereas ECMAScript is a programming language.

I'm a big fan of using Python programs as config files, especially for daemons. I like to take the tack of making the daemon completely empty of configuration except for the "configuration port". The Python program then connects to the daemon and proceeds to create objects in the daemon and wire them together to create the desired configuration. Once everything is set up, the daemon can be left to run on its own. The benefits, of course, are that you get a full-fledged programming language to write your config files in, and since you already have a way to talk to the daemon from another program, you can use it for debugging and getting stats. The major downside is having to deal with messages from another program coming in at any time.
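A hedged, in-process sketch of the "configuration port" idea: the daemon starts empty and builds its object graph only from commands it receives, and the configuration is an ordinary program that sends those commands. Here the port is just a method taking JSON text; a real daemon would read the same messages from a socket. The command names (`create`, `connect`, `stats`) and object types are hypothetical.

```python
import json


class Daemon:
    """Starts with no configuration; everything arrives as commands."""

    def __init__(self) -> None:
        self.objects: dict[str, dict] = {}
        self.links: list[tuple[str, str]] = []

    def handle(self, message: str) -> str:
        """The 'configuration port': one JSON command in, one JSON reply out."""
        cmd = json.loads(message)
        op = cmd.get("op")
        if op == "create":
            self.objects[cmd["name"]] = {"type": cmd["type"], **cmd.get("params", {})}
        elif op == "connect":
            for name in (cmd["src"], cmd["dst"]):
                if name not in self.objects:
                    return json.dumps({"ok": False, "error": f"no such object: {name}"})
            self.links.append((cmd["src"], cmd["dst"]))
        elif op == "stats":
            # The same channel doubles as a debugging/statistics interface.
            return json.dumps({"ok": True, "objects": len(self.objects), "links": len(self.links)})
        else:
            return json.dumps({"ok": False, "error": f"unknown op: {op}"})
        return json.dumps({"ok": True})


def configure(send) -> None:
    """The 'config file': an ordinary program that wires the daemon up."""
    send({"op": "create", "name": "listener", "type": "tcp_listener", "params": {"port": 8080}})
    for i in range(2):
        send({"op": "create", "name": f"worker{i}", "type": "worker"})
        send({"op": "connect", "src": "listener", "dst": f"worker{i}"})


daemon = Daemon()
configure(lambda cmd: daemon.handle(json.dumps(cmd)))
print(daemon.handle(json.dumps({"op": "stats"})))  # {"ok": true, "objects": 3, "links": 2}
```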

Config file: "What is my purpose?"
You: "Configure the butter."
Config file: "Ok..."
Config file: "What is my purpose?"
You: "You configure butter."
Config file: "Oh my god."
You: "Yeah, welcome to the club."

  1. There is no "true purpose" of a configuration file. It's whatever makes sense for your application. In general, things that differ (or might differ) between machines and don't change in the middle of your application run should probably be in a configuration file. Defaults, ports, and addresses for other services are all great candidates. Keys and secrets are also great candidates but should be handled separately from your normal config for security reasons. I disagree that the purpose of a config file is to allow quick changes to be made. The purpose should be to allow flexibility in the setup of your application. If a config file is a quick and easy way to allow that flexibility, so much the better, but you should not intend your config files to change frequently.

  2. Yes and no. Should you attempt to make your application's code simple? Yes. You should attempt to make everything you write simple and to the point, no more complicated than it needs to be. The same is true of your config. However, this is very application-specific. Hardcoding what should be in config because it would make your config "too complicated" is bad design. In fact, trying to "keep things simple" is why config files end up being a giant mess. Sometimes the simplest move is to modularize. This is why your configuration files should be written in a well-known general-purpose programming language, not some terrible configuration language (read: all "configuration languages" suck).

  3. Again, who should be modifying config files is completely application dependent. But I agree with miniquark, whoever is deploying the application should be in charge of the configuration.

  4. Source control everything you can. Source control is great. You can roll stuff back super easily and you have a full history of the changes you've made and a record of who made those changes. So why not?
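A minimal sketch of handling secrets separately from the normal config, as point 1 suggests: the ordinary config holds machine-specific but non-secret values and can live in source control, while the secret is read from the environment at startup. The variable name `APP_DB_PASSWORD` and the setting names are hypothetical; a secrets manager would fit the same slot.

```python
import os
from collections.abc import Mapping


def load_settings(config: Mapping[str, str], environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Combine ordinary config values with a secret taken from the environment."""
    settings = dict(config)
    try:
        settings["db_password"] = environ["APP_DB_PASSWORD"]
    except KeyError:
        raise RuntimeError("APP_DB_PASSWORD is not set") from None
    return settings


# Ordinary config: no secrets, so safe to keep in source control.
BASE_CONFIG = {"db_host": "db01.internal", "db_port": "5432"}
```

Passing `environ` explicitly keeps the function testable without touching the real process environment.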

Keep conditional logic in your program and data/params in your config file. Simples people!