Unusual usage of .h file in C

While reading an article about filtering, I found a strange use of a .h file, where it fills an array of coefficients:

#define N 100 // filter order
float h[N] = { #include "f1.h" }; //insert coefficients of filter
float x[N];
float y[N];


short my_FIR(short sample_data)
{
    float result = 0;

    for ( int i = N - 2 ; i >= 0 ; i-- )
    {
        x[i + 1] = x[i];
        y[i + 1] = y[i];
    }

    x[0] = (float)sample_data;

    for (int k = 0; k < N; k++)
    {
        result = result + x[k]*h[k];
    }
    y[0] = result;

    return ((short)result);
}

So, is it normal practice to use float h[N] = { #include "f1.h" }; this way?


Preprocessor directives like #include are just doing some textual substitution (see the documentation of GNU cpp inside GCC). It can occur at any place (outside of comments and string literals).

However, a #include should have its # as the first non-blank character of its line. So you would write

float h[N] = {
#include "f1.h"
};

The original question did not have #include on its own line, so had wrong code.

It is not normal practice, but it is permitted practice. In that case, I would suggest using an extension other than .h, e.g. #include "f1.def" or #include "f1.data".

Ask your compiler to show you the preprocessed form. With GCC, compile with gcc -C -E -Wall yoursource.c > yoursource.i and look at the generated yoursource.i with an editor or a pager.

I actually prefer to have such data in its own source file. So I would instead suggest generating a self-contained h-data.c file with a tool like GNU awk (so h-data.c would start with const float h[345] = { and end with };). And if the data is constant, declare it const float h[] so that it can sit in a read-only segment such as .rodata on Linux. Also, if the embedded data is big, the compiler might spend time (uselessly) optimizing it; in that case you could compile h-data.c quickly without optimizations.
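A minimal sketch of what such a generated file might contain. The four values and the h_len name are placeholders made up for illustration; a real generator would emit the actual coefficients and count.

```c
/* h-data.c: sketch of a generated, self-contained data file.
 * The values below are placeholders, not real filter coefficients. */
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

/* const allows the toolchain to place the table in a read-only section. */
const float h[] = {
    0.25f, 0.25f, 0.25f, 0.25f
};

/* Element count, derived from the initializer so it cannot drift. */
const size_t h_len = sizeof h / sizeof h[0];

/* The generator knows how many values it wrote and can emit a check. */
static_assert(sizeof h / sizeof h[0] == 4,
              "unexpected number of generated coefficients");
```

Other source files would then only need the declarations extern const float h[]; and extern const size_t h_len; (for example from a small ordinary header).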

When the preprocessor finds the #include directive, it simply opens the specified file and inserts its content, as though that content had been written at the location of the directive.

As already said in the comments this is not normal practice. If I see such code, I try to refactor it.

For example f1.h could look like this

#ifndef _f1_h_
#define _f1_h_


#ifdef N
float h[N] = {
// content ...
};


#endif // N


#endif // _f1_h_

And the .c file:

#define N 100 // filter order
#include "f1.h"


float x[N];
float y[N];
// ...

This seems a bit more normal to me - although the above code could be improved further still (eliminating globals for example).

So, is it normal practice to use float h[N] = { #include "f1.h" }; this way?

It is not normal, but it is valid (will be accepted by the compiler).

Advantages of using this: it spares you the small amount of effort that would be required to think of a better solution.

Disadvantages:

  • it increases WTF/SLOC ratio of your code.
  • it introduces unusual syntax, both in client code and in the included code.
  • in order to understand what f1.h does, you would have to look at how it is used (this means either you need to add extra docs to your project to explain this beast, or people will have to read the code to see what it means - neither solution is acceptable)

This is one of those cases where an extra 20 minutes spent thinking before writing the code can spare you a few tens of hours of cursing code and developers during the lifetime of the project.

Adding to what everyone else said - the contents of f1.h must be like this:

20.0f, 40.2f,
100.0f, 12.40f,
-122,
0

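Because the text in f1.h is going to initialize the array in question!

Yes, it may contain comments, macro invocations, constant expressions, etc. (function calls are not allowed here, because in C the initializers of a file-scope array must be constant expressions).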

As already explained in previous answers, it is not normal practice but it's a valid one.

Here is an alternative solution:

File f1.h:

#ifndef F1_H
#define F1_H


#define F1_ARRAY                   \
{                                  \
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, \
10,11,12,13,14,15,16,17,18,19, \
20,21,22,23,24,25,26,27,28,29, \
30,31,32,33,34,35,36,37,38,39, \
40,41,42,43,44,45,46,47,48,49, \
50,51,52,53,54,55,56,57,58,59, \
60,61,62,63,64,65,66,67,68,69, \
70,71,72,73,74,75,76,77,78,79, \
80,81,82,83,84,85,86,87,88,89, \
90,91,92,93,94,95,96,97,98,99  \
}


// Values above used as an example


#endif

File f1.c:

#include "f1.h"


float h[] = F1_ARRAY;


#define N (sizeof(h)/sizeof(*h))


...

A long time ago people overused the preprocessor. See for instance the XPM file format which was designed so that people could:

#include "myimage.xpm"

in their C code.

It's not considered good any more.

The OP's code looks like C, so I will talk about C.

Why is it overuse of the preprocessor?

The preprocessor #include directive is intended to include source code. In this case and in the OP's case it is not real source code but data.

Why is it considered bad?

Because it is very inflexible. You cannot change the image without recompiling the whole application. You cannot even include two images with the same name because it will produce non-compilable code. In the OP's case, he cannot change the data without recompiling the application.

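Another issue is that it creates tight coupling between the data and the source code: for instance, the number of values in the data file has to match the N macro defined in the source file. If the file holds fewer values, the remaining elements are silently zero-initialized; if it holds more, the compiler issues a diagnostic.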
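One way to make this coupling visible at build time is a compile-time check on the element count. The sketch below assumes C11; the array name h_data and the inline values are made up here and stand in for the included file.

```c
#include <assert.h>   /* static_assert (C11) */

#define N 4           /* hypothetical filter order for this sketch */

/* Let the initializer decide the array size ... */
static const float h_data[] = {
    0.1f, 0.2f, 0.3f, 0.4f      /* stands in for: #include "f1.h" */
};

/* ... then fail the build if the count does not match N. */
static_assert(sizeof h_data / sizeof h_data[0] == N,
              "f1.h must provide exactly N coefficients");
```

With float h[N] = { ... } instead, a file that is too short would compile without complaint and leave the trailing coefficients at zero.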

The tight coupling also imposes a format on your data: for instance, if you want to store the values of a 10x10 matrix, you can choose either a one-dimensional array or a two-dimensional array in your source code. Switching from one format to the other will impose a change in your data file.

This problem of loading data is easily solved by using the standard I/O functions. If you really need to include some default images you can give a default path to the images in your source code. This will at least allow the user to change this value (through a #define or -D option at compile time), or to update the image file without need to recompile.

In the OP's case, the code would be more reusable if the FIR coefficients and the x and y vectors were passed as arguments. You could create a struct to hold these values together. The code would not be inefficient, and it would become reusable with other coefficients. The coefficients could be loaded at startup from a default file unless the user passes a command line parameter overriding the file path. This would remove the need for any global variables and make the programmer's intentions explicit. You could even use the same FIR function in two threads, provided each thread has its own struct.
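A rough sketch of that idea, assuming the caller owns both buffers. The struct fir_filter type and the fir_step function are hypothetical names, and the y history of the original code is left out because it never influences the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for one filter instance. */
struct fir_filter {
    const float *coeffs;   /* n coefficients, owned by the caller */
    float       *history;  /* n past inputs, newest first, owned by the caller */
    size_t       n;        /* number of taps */
};

/* Push one sample through the filter and return the filtered value. */
float fir_step(struct fir_filter *f, float sample)
{
    assert(f != NULL && f->n > 0);

    /* Shift the history by one slot to make room for the new sample. */
    for (size_t i = f->n - 1; i > 0; i--)
        f->history[i] = f->history[i - 1];
    f->history[0] = sample;

    /* Multiply-accumulate the history against the coefficients. */
    float result = 0.0f;
    for (size_t k = 0; k < f->n; k++)
        result += f->history[k] * f->coeffs[k];
    return result;
}
```

Because all state lives in the struct, two threads can run fir_step at the same time as long as each one has its own struct fir_filter and history buffer.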

When is it acceptable?

When you cannot do dynamic loading of data. In this case you have to load your data statically and you are forced to use such techniques.

We should note that not having access to files means you're programming for a very limited platform, and as such you have to make tradeoffs. This will be the case if your code runs on a micro-controller, for instance.

But even in that case I would prefer to create a real C source file instead of including floating point values from a semi-formatted file.

For instance, provide a real C function returning the coefficients, rather than a semi-formatted data file. This C function could then be defined in two different files, one using I/O for development purposes, and another one returning static data for the release build. You would compile the correct source file conditionally.
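A sketch of such a function. To keep it in one listing, the two variants are selected with an #ifdef instead of living in two source files; the names fir_coefficients, FIR_COEFFS_FROM_FILE and MAX_TAPS, the path fir_coeffs.txt, and the values are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_TAPS 64   /* hypothetical upper bound on the filter length */

#ifdef FIR_COEFFS_FROM_FILE
/* Development variant: read whitespace-separated values from a text file. */
const float *fir_coefficients(size_t *count)
{
    static float table[MAX_TAPS];
    size_t n = 0;
    FILE *in = fopen("fir_coeffs.txt", "r");   /* hypothetical default path */

    assert(count != NULL);
    if (in != NULL) {
        while (n < MAX_TAPS && fscanf(in, "%f", &table[n]) == 1)
            n++;
        fclose(in);
    }
    *count = n;   /* 0 if the file is missing or empty */
    return table;
}
#else
/* Release variant: the data is compiled in and needs no file system. */
const float *fir_coefficients(size_t *count)
{
    static const float table[] = { 0.25f, 0.5f, 0.25f };   /* placeholders */

    assert(count != NULL);
    *count = sizeof table / sizeof table[0];
    return table;
}
#endif
```

The caller sees the same signature either way, so the filter code does not change between the development and release builds.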

No, it is not normal practice.

There is little to no advantage in using such a format directly; instead, the data could be generated in a separate source file, or at least a complete definition could be formed in this case.


There is, however, a "pattern" which involves including a file in such random places: X-macros.

The point of an X-macro is to define a collection once and use it in various places, with the single definition ensuring the coherence of the whole. As a trivial example, consider:

// def.inc
MYPROJECT_DEF_MACRO(Error,   Red,    0xff0000)
MYPROJECT_DEF_MACRO(Warning, Orange, 0xffa500)
MYPROJECT_DEF_MACRO(Correct, Green,  0x7fff00)

which can now be used in multiple ways:

// MessageCategory.hpp
#ifndef MYPROJECT_MESSAGE_CATEGORY_HPP_INCLUDED
#define MYPROJECT_MESSAGE_CATEGORY_HPP_INCLUDED


namespace myproject {


enum class MessageCategory {
#   define MYPROJECT_DEF_MACRO(Name_, dummy0_, dummy1_) Name_,
#   include "def.inc"
#   undef MYPROJECT_DEF_MACRO
NumberOfMessageCategories
}; // enum class MessageCategory


enum class MessageColor {
#   define MYPROJECT_DEF_MACRO(dummy0_, Color_, dummy1_) Color_,
#   include "def.inc"
#   undef MYPROJECT_DEF_MACRO
NumberOfMessageColors
}; // enum class MessageColor


MessageColor getAssociatedColorName(MessageCategory category);


RGBColor getAssociatedColorCode(MessageCategory category);


} // namespace myproject


#endif // MYPROJECT_MESSAGE_CATEGORY_HPP_INCLUDED
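The same idea can be sketched in plain C. Since a single listing cannot contain a second file, this variant keeps the list in a macro (MESSAGE_LIST, a name made up here) instead of def.inc; the mechanism of expanding one list several ways is otherwise the same. All identifiers below are illustrative.

```c
#include <assert.h>   /* static_assert (C11) */

/* Single definition of the collection: name, colour name, RGB code. */
#define MESSAGE_LIST(X)            \
    X(Error,   Red,    0xff0000)   \
    X(Warning, Orange, 0xffa500)   \
    X(Correct, Green,  0x7fff00)

/* Expansion 1: one enumerator per entry. */
enum message_category {
#define AS_ENUM(name, color, rgb) CATEGORY_##name,
    MESSAGE_LIST(AS_ENUM)
#undef AS_ENUM
    CATEGORY_COUNT
};

/* Expansion 2: colour codes, indexed by enum message_category. */
static const unsigned long category_rgb[] = {
#define AS_RGB(name, color, rgb) rgb,
    MESSAGE_LIST(AS_RGB)
#undef AS_RGB
};

/* Expansion 3: printable names, indexed the same way. */
static const char *const category_name[] = {
#define AS_NAME(name, color, rgb) #name,
    MESSAGE_LIST(AS_NAME)
#undef AS_NAME
};

/* Both tables come from the same list, so their sizes stay in step. */
static_assert(sizeof category_rgb / sizeof category_rgb[0] == CATEGORY_COUNT,
              "one colour code per category");
```

Adding a fourth entry to MESSAGE_LIST extends the enum and both tables at once.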

There are at times situations which require either using external tools to generate .C files based upon other files that contain source code, having external tools generate C files with an inordinate amount of code hard-wired into the generating tools, or having code use the #include directive in various "unusual" ways. Of these approaches, I would suggest that the latter--though icky--may often be the least evil.

I would suggest avoiding the .h suffix for files which do not abide by the normal conventions associated with header files (e.g. by including method definitions, allocating space, requiring an unusual inclusion context such as the middle of a method, or requiring multiple inclusion with different macros defined). I also generally avoid using .c or .cpp for files which are incorporated into other files via #include, unless those files are primarily used standalone. I might in some cases have, e.g., a file fooDebug.c containing #define SPECIAL_FOO_DEBUG_VERSION followed on the next line by #include "foo.c", if I wish to have two object files with different names generated from the same source, and one of them is affirmatively the "normal" version.

My normal practice is to use .i as the suffix for either human-generated or machine-generated files that are designed to be included, but in unusual ways, from other C or C++ source files; if files are machine-generated, I will generally have the generation tool include as the first line a comment identifying the tool used to create it.

BTW, one trick where I've used this was when I wanted to allow a program to be built using just a batch file, without any third-party tools, but wanted to count how many times it was built. In my batch file, I included echo +1 >> vercount.i; then in file vercount.c, if I recall correctly:

const int build_count = 0
#include "vercount.i"
;

The net effect is that I get a value which increments on every build without having to rely upon any third-party tools to produce it.
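To see why this works, here is a sketch of what the compiler sees after three builds. The macro VERCOUNT_I_CONTENTS is a stand-in invented for this listing; in the real setup the +1 tokens come from the lines the batch file appended to vercount.i.

```c
#include <assert.h>   /* static_assert (C11) */

/* Stand-in for the text of vercount.i after three builds. */
#define VERCOUNT_I_CONTENTS +1 +1 +1

/* After preprocessing this reads: const int build_count = 0 +1 +1 +1 ; */
const int build_count = 0
VERCOUNT_I_CONTENTS
;

/* The initializer is an ordinary constant expression. */
static_assert((0 VERCOUNT_I_CONTENTS) == 3,
              "three appended +1 tokens add up to 3");
```

An empty vercount.i leaves the initializer as plain 0, so the definition is valid before the first build as well.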

I have read that people want to refactor this and call it evil. Still, I have used it in some cases. As others have said, #include is a preprocessor directive, so it simply inserts the content of the file. Here is a case where I used it: building random numbers. I generate a set of random numbers and do not want to do this every time I compile, nor at run time. So another program (usually a script) fills a file with the generated numbers, and that file is included. This avoids copying by hand, and makes it easy to change the numbers, the algorithm which generates them, and so on. You cannot easily condemn the practice; in that case it is simply the right way.

I used the OP's technique of putting an include file for the data initialization portion of a variable declaration for quite some time. Just like the OP, the included file was generated.

I isolated the generated .h files into a separate folder so they could be easily identified:

#include "gensrc/myfile.h"

This scheme fell apart when I started to use Eclipse. Eclipse syntax checking was not sophisticated enough to handle this. It would react by reporting syntax errors where there were none.

I reported samples to Eclipse mailing list, but there did not seem to be much interest in "fixing" the syntax checking.

I changed my code generator to take additional arguments so it could generate the entire variable declaration, not just the data. Now it generates syntactically correct include files.

Even if I was not using Eclipse, I think it is a better solution.
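A sketch of what the output stage of such a generator could look like. The poster's actual tool is not shown, so emit_float_table and its parameters are hypothetical; the point is only that the whole definition is written, not a bare list of values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write a complete definition of the form
 *     const float <name>[<count>] = { ... };
 * to `out`. Returns 0 on success, -1 on a write error. */
int emit_float_table(FILE *out, const char *name,
                     const float *values, size_t count)
{
    assert(out != NULL && name != NULL && values != NULL && count > 0);

    if (fprintf(out, "/* Generated file, do not edit. */\n") < 0)
        return -1;
    if (fprintf(out, "const float %s[%zu] = {\n", name, count) < 0)
        return -1;
    for (size_t i = 0; i < count; i++) {
        /* %.8e always prints a decimal point, so the f suffix stays valid,
         * and 9 significant digits are enough to round-trip a float. */
        if (fprintf(out, "    %.8ef%s\n", (double)values[i],
                    i + 1 < count ? "," : "") < 0)
            return -1;
    }
    if (fprintf(out, "};\n") < 0)
        return -1;
    return 0;
}
```

The generated file can then be compiled on its own or included like any other source, and an IDE can parse it without knowing about the surrounding context.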

It is normal practice for me.

The preprocessor allows you to split a source file into as many chunks as you like, which are assembled by #include directives.

It makes a lot of sense when you don't want to clutter the code with lengthy/not-to-be-read sections such as data initializations. As it turns out, my record "array initialization" file is 11000 lines long.

I also use them when some parts of the code are automatically generated by some external tool: it is very convenient to have the tool generate just its chunks, and include them in the rest of the code written by hand.

I have a few such inclusions for some functions that have several alternative implementations depending on the processor, some of them using inline assembly. The inclusions make the code more manageable.

By tradition, the #include directive has been used to include header files, i.e. sets of declarations that expose an API. But nothing mandates that.

In the Linux kernel, I found an example that is, IMO, beautiful. If you look at the cgroup.h header file

http://lxr.free-electrons.com/source/include/linux/cgroup.h

you can find the directive #include <linux/cgroup_subsys.h> used twice, after different definitions of the macro SUBSYS(_x); this macro is used inside cgroup_subsys.h, to declare several names of Linux cgroups (if you're not familiar with cgroups, they are user-friendly interfaces Linux offers, which must be initialized at system boot).

In the code snippet

#define SUBSYS(_x) _x ## _cgrp_id,
enum cgroup_subsys_id {
#include <linux/cgroup_subsys.h>
CGROUP_SUBSYS_COUNT,
};
#undef SUBSYS

each SUBSYS(_x) declared in cgroup_subsys.h becomes an element of the type enum cgroup_subsys_id, while in the code snippet

#define SUBSYS(_x) extern struct cgroup_subsys _x ## _cgrp_subsys;
#include <linux/cgroup_subsys.h>
#undef SUBSYS

each SUBSYS(_x) becomes the declaration of a variable of type struct cgroup_subsys.

In this way, kernel programmers can add cgroups by modifying only cgroup_subsys.h, while the pre-processor will automatically add the related enumeration values/declarations in the initialization files.