Remove comments from C/C++ code

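Is there an easy way to remove comments from a C/C++ source file without doing any preprocessing? (That is, I think you can use gcc -E, but that would expand macros.) I just want the source code with the comments stripped; nothing else should be changed.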

Edit:

I would prefer an existing tool. I don't want to write this myself with regexes, because I foresee too many surprises in the code.


Run the following command on your source file:

gcc -fpreprocessed -dD -E test.c

Thanks to KennyTM for finding the right flags. Here’s the result for completeness:

test.c:

#define foo bar
foo foo foo
#ifdef foo
#undef foo
#define foo baz
#endif
foo foo
/* comments? comments. */
// c++ style comments

gcc -fpreprocessed -dD -E test.c:

#define foo bar
foo foo foo
#ifdef foo
#undef foo
#define foo baz
#endif
foo foo

It depends on how perverse your comments are. I have a program scc to strip C and C++ comments. I also have a test file for it, and I tried GCC (4.2.1 on MacOS X) with the options in the currently selected answer - and GCC doesn't seem to do a perfect job on some of the horribly butchered comments in the test case.

NB: This isn't a real-life problem - people don't write such ghastly code.

Consider this subset (36 of 135 lines in total) of the test case:

/\
*\
Regular
comment
*\
/
The regular C comment number 1 has finished.


/\
\/ This is not a C++/C99 comment!


This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.


/\
\* This is not a C or C++ comment!


This is followed by regular C comment number 2.
/\
*/ This is a regular C comment *\
but this is just a routine continuation *\
and that was not the end either - but this is *\
\
/
The regular C comment number 2 has finished.


This is followed by regular C comment number 3.
/\
\
\
\
* C comment */

On my Mac, the output from GCC (gcc -fpreprocessed -dD -E subset.c) is:

/\
*\
Regular
comment
*\
/
The regular C comment number 1 has finished.


/\
\/ This is not a C++/C99 comment!


This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.


/\
\* This is not a C or C++ comment!


This is followed by regular C comment number 2.
/\
*/ This is a regular C comment *\
but this is just a routine continuation *\
and that was not the end either - but this is *\
\
/
The regular C comment number 2 has finished.


This is followed by regular C comment number 3.
/\
\
\
\
* C comment */

The output from 'scc' is:

The regular C comment number 1 has finished.


/\
\/ This is not a C++/C99 comment!


This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.


/\
\* This is not a C or C++ comment!


This is followed by regular C comment number 2.


The regular C comment number 2 has finished.


This is followed by regular C comment number 3.

The output from 'scc -C' (which recognizes double-slash comments) is:

The regular C comment number 1 has finished.


/\
\/ This is not a C++/C99 comment!


This is followed by C++/C99 comment number 3.


The C++/C99 comment number 3 has finished.


/\
\* This is not a C or C++ comment!


This is followed by regular C comment number 2.


The regular C comment number 2 has finished.


This is followed by regular C comment number 3.

Source for SCC now available on GitHub

The current version of SCC is 6.60 (dated 2016-06-12), though the Git versions were created on 2017-01-18 (in the US/Pacific time zone). The code is available from GitHub at https://github.com/jleffler/scc-snapshots. You can also find snapshots of the previous releases (4.03, 4.04, 5.05) and two pre-releases (6.16, 6.50) — these are all tagged release/x.yz.

The code is still primarily developed under RCS. I'm still working out how I want to use sub-modules or a similar mechanism to handle common library files like stderr.c and stderr.h (which can also be found in https://github.com/jleffler/soq).

SCC version 6.60 attempts to understand C++11, C++14 and C++17 constructs such as binary constants, numeric punctuation, raw strings, and hexadecimal floats. It defaults to C11 mode operation. (Note that the meaning of the -C flag — mentioned above — flipped between version 4.0x described in the main body of the answer and version 6.60 which is currently the latest release.)
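The butchered comments in the test file all rely on backslash-newline line splices, which the C standard removes before comments are recognized. As a rough illustration only (this is not how scc is implemented, and the function name splice_lines is made up for this sketch), the following C function removes the splices so that a simple comment scanner sees an ordinary /* or //. It assumes LF line endings and changes the line layout of the file, which a real tool would probably want to avoid.

```c
#include <stddef.h>

/* Copy src to dst, dropping every backslash-newline pair (a line splice).
   dst must have room for strlen(src) + 1 bytes. Returns the output length.
   Hypothetical helper for illustration; assumes LF line endings. */
size_t splice_lines(const char *src, char *dst)
{
    size_t n = 0;
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n') {
            src += 2;            /* skip the splice */
        } else {
            dst[n++] = *src++;   /* copy everything else unchanged */
        }
    }
    dst[n] = '\0';
    return n;
}
```

After splicing, the first comment of the test case starts with a plain slash-star and can be found by any of the simpler scanners in this thread.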

gcc -fpreprocessed -dD -E did not work for me but this program does it:

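#include <stdio.h>

/* Copies f to stdout, replacing each C-style comment with a single space.
   String and character literals are copied verbatim; // comments are not handled. */
static void process(FILE *f)
{
    int c;
    while ((c = getc(f)) != EOF)
    {
        if (c == '\'' || c == '"')          /* literal */
        {
            int q = c;
            do
            {
                putchar(c);
                if (c == '\\')              /* copy the escaped character as well */
                {
                    if ((c = getc(f)) == EOF) return;
                    putchar(c);
                }
                if ((c = getc(f)) == EOF) return;   /* unterminated literal */
            } while (c != q);
            putchar(c);
        }
        else if (c == '/')                  /* opening comment ? */
        {
            c = getc(f);
            if (c != '*')                   /* no, recover */
            {
                putchar('/');
                ungetc(c, f);
            }
            else
            {
                int p;
                putchar(' ');               /* replace comment with space */
                c = 0;                      /* the opening star must not also close the comment */
                do
                {
                    p = c;
                    c = getc(f);
                } while (c != EOF && (c != '/' || p != '*'));
            }
        }
        else
        {
            putchar(c);
        }
    }
}

int main(void)
{
    process(stdin);
    return 0;
}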

There is a stripcmt program that can do this:

StripCmt is a simple utility written in C to remove comments from C, C++, and Java source files. In the grand tradition of Unix text processing programs, it can function either as a FIFO (First In - First Out) filter or accept arguments on the command line.

(per hlovdal's answer to: question about Python code for this)

I believe you can remove comments from C with a single statement:

perl -i -pe 's{/\*.*?\*/}{}g' file.c removes C-style /* ... */ comments that start and end on the same line.
perl -i -pe 's{//.*}{}' file.cpp removes C++-style // comments.

The only problem with these commands is that they can't remove comments that span more than one line, but you can build multi-line comment removal on top of the same regular expressions. They also do not recognize comment markers inside string literals.

This is a perl script to remove //one-line and /* multi-line */ comments

#!/usr/bin/perl


undef $/;
$text = <>;


$text =~ s/\/\/[^\n\r]*(\n\r)?//g;
$text =~ s/\/\*+([^*]|\*(?!\/))*\*+\///g;


print $text;

It requires your source file as a command-line argument. Save the script to a file, say remove_comments.pl, and call it with the following command: perl -w remove_comments.pl [your source file]

Hope it will be helpful
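One surprise with pattern-based removal is that the // pattern in the script above also matches inside string literals, so a line such as puts("http://example.com"); loses everything after the //. A minimal sketch of a quote-aware alternative in C follows. The function name strip_line_comment is hypothetical, it works on one line at a time, and it deliberately ignores block comments and line continuations.

```c
#include <stddef.h>

/* Truncate line at the first // that lies outside a string or character
   literal. Hypothetical helper: one line at a time, block comments ignored. */
void strip_line_comment(char *line)
{
    char quote = '\0';               /* current literal delimiter, 0 when outside */
    for (size_t i = 0; line[i] != '\0'; i++) {
        char ch = line[i];
        if (quote != '\0') {
            if (ch == '\\' && line[i + 1] != '\0')
                i++;                 /* skip the escaped character */
            else if (ch == quote)
                quote = '\0';        /* literal closed */
        } else if (ch == '"' || ch == '\'') {
            quote = ch;              /* literal opened */
        } else if (ch == '/' && line[i + 1] == '/') {
            line[i] = '\0';          /* cut the comment off */
            return;
        }
    }
}
```

Tracking the quote state is the smallest change that avoids this particular failure; a complete tool also has to deal with block comments, continuations and, for C++11 and later, raw strings.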

I had this problem as well. I found this tool (Cpp-Decomment), which worked for me. However, it does not handle a // comment that is continued onto the next line with a backslash. E.g.:

// this is my comment \
comment continues ...

I couldn't find a way to handle this case in the program, so I just searched for the lines it had missed and fixed them manually. There may be an option for it, or you could change the program's source code to handle it.
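For anyone patching a stripper to cope with this, the rule is that a backslash immediately before the newline continues the // comment onto the next line. The following C sketch shows one way to find where such a comment really ends. The function name end_of_line_comment is an assumption made for this example and is not taken from Cpp-Decomment; it also assumes LF line endings.

```c
#include <stddef.h>

/* Given p pointing at the "//" that opens a comment, return a pointer to the
   newline (or terminating NUL) that really ends it. A backslash immediately
   before a newline continues the comment onto the next line.
   Hypothetical helper for illustration. */
const char *end_of_line_comment(const char *p)
{
    while (*p != '\0') {
        if (p[0] == '\\' && p[1] == '\n')
            p += 2;              /* spliced line: the comment continues */
        else if (*p == '\n')
            break;               /* unescaped newline ends the comment */
        else
            p++;
    }
    return p;
}
```

Applied to the two-line example above, the returned pointer is the newline after "comment continues ...", so both lines are treated as one comment.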

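#include <stdio.h>

int main(void)
{
    int c;                   /* int rather than char, so EOF is distinguishable */
    int tmp = '\0';          /* a pending '/' that may start a comment */
    int inside_comment = 0;  // A flag to check whether we are inside comment

    while ((c = getchar()) != EOF) {
        if (tmp) {
            if (c == '/') {
                /* line comment: discard up to the end of the line */
                while ((c = getchar()) != '\n' && c != EOF)
                    ;
                tmp = '\0';
                putchar('\n');
                continue;
            } else if (c == '*') {
                int prev = '\0';
                inside_comment = 1;
                /* block comment: discard until the closing star-slash or EOF */
                while (inside_comment && (c = getchar()) != EOF) {
                    if (prev == '*' && c == '/')
                        inside_comment = 0;
                    prev = c;
                }
                tmp = '\0';
                continue;
            } else {
                putchar(tmp);    /* the '/' was not a comment opener */
                putchar(c);
                tmp = '\0';
                continue;
            }
        }
        if (c == '/') {
            tmp = c;
        } else {
            putchar(c);
        }
    }
    if (tmp)
        putchar(tmp);            /* '/' was the last character of the input */
    return 0;
}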

This program handles both kinds of comment, i.e. // and /* ... */. It does not treat string literals specially.

Recently I wrote some Ruby code to solve this problem. It takes the following cases into account:

  • comment markers inside strings
  • several block comments on one line (avoiding a greedy match)
  • block comments that span multiple lines

Here is the code:

It uses the following placeholder strings to preprocess each line in case comment markers appear inside strings. If one of these placeholders appears in your code, uh, bad luck. You can replace them with more complex strings.

  • MUL_REPLACE_LEFT = "MUL_REPLACE_LEFT"
  • MUL_REPLACE_RIGHT = "MUL_REPLACE_RIGHT"
  • SIG_REPLACE = "SIG_REPLACE"

USAGE: ruby -w inputfile outputfile

I know it's late, but I thought I'd share my code and my first attempt at writing a compiler.

Note: this does not account for "*/" inside a multiline comment, e.g. /*...."*/"...*/. Then again, gcc 4.8.1 doesn't either.

#include <stdio.h>

void function_removeComments(char *pchar_sourceFile, long long_sourceFileSize)
{
    long long_sourceFileIndex = 0;
    long long_logIndex = 0;
    int int_EOF = 0;  // set once an unterminated comment has been found

    for (long_sourceFileIndex = 0; long_sourceFileIndex < long_sourceFileSize; long_sourceFileIndex++)
    {
        // array bounds check given we want to peek at the next character
        if (pchar_sourceFile[long_sourceFileIndex] == '/' && int_EOF == 0
            && long_sourceFileIndex + 1 < long_sourceFileSize)
        {
            long_logIndex = long_sourceFileIndex;  // log "possible" start of comment

            if (pchar_sourceFile[long_sourceFileIndex + 1] == '*')  // multiline comment
            {
                int int_terminated = 0;

                for (long_sourceFileIndex += 2; long_sourceFileIndex + 1 < long_sourceFileSize; long_sourceFileIndex++)
                {
                    if (pchar_sourceFile[long_sourceFileIndex] == '*' && pchar_sourceFile[long_sourceFileIndex + 1] == '/')
                    {
                        // found the end of the multiline comment: step onto the '/'
                        // so that the outer loop increment moves past the comment
                        long_sourceFileIndex++;
                        int_terminated = 1;
                        break;  // terminating sequence found
                    }
                }

                if (int_terminated)
                    continue;  // nothing from the comment is printed

                // didn't find terminating sequence so it must be eof.
                // set file pointer position to initial comment start position
                // so we can display file contents.
                long_sourceFileIndex = long_logIndex;
                int_EOF = 1;
            }
            else if (pchar_sourceFile[long_sourceFileIndex + 1] == '/')  // single line comment
            {
                // since we know its a single line comment, increment file pointer
                // until we encounter a new line or its the eof
                while (long_sourceFileIndex < long_sourceFileSize && pchar_sourceFile[long_sourceFileIndex] != '\n')
                    long_sourceFileIndex++;
                if (long_sourceFileIndex >= long_sourceFileSize)
                    break;  // comment ran to the end of the buffer
            }
        }

        printf("%c", pchar_sourceFile[long_sourceFileIndex]);
    }
}

Because you use C, you might want to use something that's "natural" to C. You can use the C preprocessor to just remove comments. The examples given below work with the C preprocessor from GCC. They should work the same or in similar ways with other C preprocessors as well.

For C, use

cpp -dD -fpreprocessed -o output.c input.c

It also works for removing comments from JSON, for example like this:

cpp -P -o - - <input.json >output.json

In case your C preprocessor is not accessible directly, you can try to replace cpp with cc -E, which calls the C compiler telling it to stop after the preprocessor stage. In case your C compiler binary is not cc you can replace cc with the name of your C compiler binary, for example clang. Note that not all preprocessors support -fpreprocessed.

I wrote a C program of around 200 lines, using only the standard C library, that removes comments from a C source file: qeatzy/removeccomments

behavior

  1. A C-style comment that spans multiple lines or occupies an entire line is zeroed out.
  2. A C-style comment in the middle of a line remains unchanged, e.g. void init(/* do initialization */) {...}
  3. A C++-style comment that occupies an entire line is zeroed out.
  4. C string literals are respected, by checking for " and \".
  5. Line continuation is handled: if the previous line ends with \, the current line is treated as part of it.
  6. Line numbers remain the same. Zeroed-out lines or parts of lines become empty.
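Keeping line numbers stable (point 6) mostly comes down to never deleting a newline. The sketch below illustrates that one idea in C and is not taken from removeccomments: the function name blank_block_comments is made up, it blanks every block comment (unlike point 2 above), and it ignores string literals and // comments.

```c
/* Overwrite every block comment in buf with spaces, keeping newlines so that
   line numbers do not change. Hypothetical helper for illustration; string
   literals and // comments are not handled. */
void blank_block_comments(char *buf)
{
    int in_comment = 0;
    for (char *p = buf; *p != '\0'; p++) {
        if (!in_comment) {
            if (p[0] == '/' && p[1] == '*') {
                in_comment = 1;
                p[0] = p[1] = ' ';
                p++;                 /* skip the star that was just blanked */
            }
        } else if (p[0] == '*' && p[1] == '/') {
            in_comment = 0;
            p[0] = p[1] = ' ';
            p++;                     /* skip the slash that was just blanked */
        } else if (*p != '\n') {
            *p = ' ';                /* blank comment text, keep newlines */
        }
    }
}
```

Because the buffer keeps its length and its newlines, compiler diagnostics on the stripped file still point at the same lines as in the original.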

testing & profiling

I tested it with the largest CPython source file, which contains many comments. In this case it does the job correctly and fast, 2-5 times faster than gcc:

time gcc -fpreprocessed -dD -E Modules/unicodeobject.c > res.c 2>/dev/null
time ./removeccomments < Modules/unicodeobject.c > result.c

usage

/path/to/removeccomments < input_file > output_file