How to convert Windows line endings to Unix line endings (CR/LF to LF)

I'm a Java developer and I use Ubuntu for development. The project was created on Windows with Eclipse, and it uses the Windows-1252 encoding.

To convert it to UTF-8, I used the recode program:

find Web -iname \*.java | xargs recode CP1252...UTF-8

This command gives the following error:

recode: Web/src/br/cits/projeto/geral/presentation/GravacaoMessageHelper.java failed: Ambiguous output in step `CR-LF..data'

I searched for it and found an answer in "Recode: Ambiguous output in step 'data.CR-LF'", which says:

Convert line endings from CR/LF to a single LF: edit the file with Vim, give the command :set ff=unix and save the file. Recode should now run without errors.

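Nice, but I have many files to strip the CR/LF characters from, and I can't open each one by hand. Vi doesn't provide any command-line option for Bash operations.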

Can sed be used to do this? How?


There should be a program called dos2unix that will fix line endings for you. If it's not already on your Linux box, it should be available via the package manager.

The tr command can also do this:

tr -d '\15\32' < winfile.txt > unixfile.txt

and should be available to you.
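In case the octal escapes look cryptic: \15 is the carriage return (0x0D) and \32 is Ctrl-Z (0x1A), the old DOS end-of-file marker. Here is a minimal Python sketch of the same byte deletion; the function name strip_cr_and_ctrl_z is made up for illustration.

```python
def strip_cr_and_ctrl_z(data: bytes) -> bytes:
    # bytes.translate(None, delete) drops every listed byte and keeps the rest,
    # which is what tr -d does with its character set.
    return data.translate(None, b"\x0d\x1a")


if __name__ == "__main__":
    sample = b"line one\r\nline two\r\n\x1a"
    print(strip_cr_and_ctrl_z(sample))  # b'line one\nline two\n'
```

Note that this removes every CR, not only those directly before a newline, just as the tr command does.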

You'll need to run tr from within a script, since it reads only from standard input and cannot take file names as arguments. For example, create a file myscript.sh:

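#!/bin/bash

# -print0 with read -d '' keeps file names containing spaces intact
find . -iname '*.java' -print0 | while IFS= read -r -d '' f; do
    echo "$f"
    # delete CR (octal 15) and Ctrl-Z (octal 32), then swap the result in
    tr -d '\15\32' < "$f" > "$f.tr"
    mv "$f.tr" "$f"
    recode CP1252...UTF-8 "$f"
done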

Running myscript.sh would process all the java files in the current directory and its subdirectories.
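If recode is not installed, roughly the same loop can be sketched in Python alone. This is only a sketch under assumptions: the names convert_file and convert_tree are made up here, the files are assumed to really be CP1252, and Python's built-in cp1252 codec stands in for recode.

```python
from pathlib import Path


def convert_file(path: Path) -> None:
    """Strip CR and Ctrl-Z bytes, then re-encode one file from CP1252 to UTF-8 in place."""
    raw = path.read_bytes()
    cleaned = raw.translate(None, b"\x0d\x1a")  # the same bytes tr -d '\15\32' deletes
    text = cleaned.decode("cp1252")             # raises UnicodeDecodeError on bytes CP1252 leaves undefined
    path.write_bytes(text.encode("utf-8"))


def convert_tree(root: str) -> int:
    """Convert every file below root whose suffix is .java, ignoring case (like -iname)."""
    count = 0
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() == ".java":
            convert_file(path)
            count += 1
    return count
```

Calling convert_tree(".") would process the current directory and its subdirectories. Like the shell script, it overwrites the files in place, so try it on a copy first.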

Go back to Windows, tell Eclipse to change the encoding to UTF-8, then back to Unix and run d2u on the files.

Try the Python script by Bryan Maupin found here (I've modified it a little bit to be more generic):

#!/usr/bin/env python3

import sys

input_file_name = sys.argv[1]
output_file_name = sys.argv[2]

# binary mode, so each line arrives as raw bytes and is decoded explicitly
input_file = open(input_file_name, 'rb')
output_file = open(output_file_name, 'wb')

line_number = 0

for input_line in input_file:
    line_number += 1
    try:  # first try to decode it using cp1252 (Windows, Western Europe)
        output_line = input_line.decode('cp1252').encode('utf8')
    except UnicodeDecodeError as error:  # if there's an error
        sys.stderr.write('ERROR (line %s):\t%s\n' % (line_number, error))  # write to stderr
        try:  # then if that fails, try to decode using latin1 (ISO 8859-1)
            output_line = input_line.decode('latin1').encode('utf8')
        except UnicodeDecodeError as error:  # if there's an error
            sys.stderr.write('ERROR (line %s):\t%s\n' % (line_number, error))  # write to stderr
            sys.exit(1)  # give up on the whole file
    output_file.write(output_line)

input_file.close()
output_file.close()

You can use that script with

$ ./cp1252_utf8.py file_cp1252.sql file_utf8.sql

In order to overcome

Ambiguous output in step `CR-LF..data'

the simple solution might be to add the -f flag to force the conversion.

sed cannot match \n because the trailing newline is removed before the line is put into the pattern space, but it can match \r, so you can convert \r\n (DOS) to \n (Unix) by removing \r:

sed -i 's/\r//g' file

Warning: this will change the original file.

However, this approach cannot convert Unix line endings to DOS or old Mac (\r) line endings. Further reading:

How can I replace a newline (\n) using sed?

Actually, Vim does allow what you're looking for. Enter Vim, and type the following commands:

:args **/*.java
:argdo set ff=unix | update | next

The first of these commands sets the argument list to every file matching **/*.java, which is all Java files, recursively. The second of these commands does the following to each file in the argument list, in turn:

  • Sets the line-endings to Unix style (you already know this)
  • Writes the file out iff it's been changed
  • Proceeds to the next file
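For anyone who would rather script those three steps than run them inside Vim, they can be sketched in Python. The function name normalize_line_endings and its pattern parameter are made up for this sketch, and it touches only CR/LF pairs, which may differ from what Vim does with files of mixed line endings.

```python
from pathlib import Path


def normalize_line_endings(root: str, pattern: str = "*.java") -> list:
    """Rewrite matching files below root with LF endings; return the paths that changed."""
    changed = []
    for path in sorted(Path(root).rglob(pattern)):  # recursive, like **/*.java
        if not path.is_file():
            continue
        original = path.read_bytes()
        converted = original.replace(b"\r\n", b"\n")
        if converted != original:  # like :update, leave files that need no change alone
            path.write_bytes(converted)
            changed.append(path)
    return changed
```

Calling normalize_line_endings(".") returns the files it rewrote, so it is easy to see what was touched.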

I'll take a little exception to jichao's answer. You can actually do everything he just talked about fairly easily. Instead of looking for a \n, just look for carriage return at the end of the line.

sed -i 's/\r$//' "${FILE_NAME}"
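The difference from 's/\r//g' is that only a carriage return at the end of a line is removed, so a stray \r in the middle of a line survives. A small Python sketch of that behaviour, with a made-up function name:

```python
import re


def strip_trailing_cr(data: bytes) -> bytes:
    # In MULTILINE mode, $ matches just before each newline and at the very end
    # of the input, so only a CR that ends a line is removed.
    return re.sub(rb"\r$", b"", data, flags=re.MULTILINE)


if __name__ == "__main__":
    print(strip_trailing_cr(b"a\rb\r\nc\r\n"))  # b'a\rb\nc\n'
```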

To change from Unix back to DOS, simply look for the last character on the line and add a carriage return after it. (I'll add -r to enable extended regular expressions, which makes this easier.)

sed -ri 's/(.)$/\1\r/' "${FILE_NAME}"

Theoretically, the file could be changed to Mac style by adding code to the last example that also appends the next line of input to the first line until all lines have been processed. I won't try to make that example here, though.
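Outside of sed, the Mac-style conversion is much less awkward, because nothing forces line-by-line processing. As a rough Python sketch (the function name is made up, and the input is assumed to use plain LF endings already):

```python
def lf_to_cr(data: bytes) -> bytes:
    # Classic Mac OS terminated lines with a lone CR, so every LF becomes a CR.
    return data.replace(b"\n", b"\r")


if __name__ == "__main__":
    print(lf_to_cr(b"one\ntwo\n"))  # b'one\rtwo\r'
```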

Warning: -i changes the actual file. If you want a backup to be made, add a suffix directly after -i (for example, -i.bak). This will move the existing file to a file with the same name with your suffix added to the end.

Update: The Unix to DOS conversion can be simplified and made more efficient by not bothering to look for the last character. This also allows us to not require using -r for it to work:

sed -i 's/$/\r/' "${FILE_NAME}"
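One thing to watch with this command: run on a file that already has DOS line endings, it adds a second carriage return to every line. A Python sketch that skips lines already ending in CR/LF might look like this. The function name is made up, and unlike the sed command it leaves a final line without a trailing newline untouched.

```python
import re


def lf_to_crlf(data: bytes) -> bytes:
    # The negative lookbehind skips any LF already preceded by a CR,
    # so converting twice gives the same result as converting once.
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


if __name__ == "__main__":
    print(lf_to_crlf(b"unix\ndos\r\n"))  # b'unix\r\ndos\r\n'
```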