How do I remove carriage returns with Ruby?

---

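lines2 = lines.split(/\r?\n/).join("\n")

(String#split with no argument splits on any run of whitespace, so the spaces inside each line would become line breaks as well. Splitting on /\r?\n/ breaks only at line endings.)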
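A quick sketch of why it matters whether you split on all whitespace or only on line endings. `text` is a hypothetical sample string:

```ruby
text = "first line\r\nsecond line\r\n"

# String#split with no argument splits on runs of any whitespace,
# so the spaces inside each line become line breaks too
by_whitespace = text.split.join("\n")

# Splitting on "\r\n" or "\n" keeps each line's spaces intact
by_line_ending = text.split(/\r?\n/).join("\n")

p by_whitespace   # "first\nline\nsecond\nline"
p by_line_ending  # "first line\nsecond line"
```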

---

How about the following?

irb(main):003:0> my_string = "Some text with a carriage return \r"
=> "Some text with a carriage return \r"
irb(main):004:0> my_string.gsub(/\r/,"")
=> "Some text with a carriage return "
irb(main):005:0>

Or...

irb(main):007:0> my_string = "Some text with a carriage return \r\n"
=> "Some text with a carriage return \r\n"
irb(main):008:0> my_string.gsub(/\r\n/,"\n")
=> "Some text with a carriage return \n"
irb(main):009:0>
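When the goal is only to delete a single character, `String#delete` from Ruby's core library does the same job without a regular expression. Here is a minimal sketch comparing it with the gsub call from the irb session above:

```ruby
my_string = "Some text with a carriage return \r\n"

# Regex-based removal, as in the irb session
via_gsub = my_string.gsub(/\r/, "")

# Character-based removal: String#delete drops every "\r" without a regex
via_delete = my_string.delete("\r")

p via_gsub                # "Some text with a carriage return \n"
p via_delete              # "Some text with a carriage return \n"
p via_gsub == via_delete  # true
```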

---

Generally when I deal with stripping \r or \n, I'll look for both by doing something like

lines.gsub(/\r\n?/, "\n")

I've found that depending on how the data was saved (the OS used, the editor used, Jupiter's relation to Io at the time), there may or may not be a newline after the carriage return. It does seem weird that you see both characters in hex mode. Hope this helps.
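A sketch of that substitution applied to a hypothetical string that mixes Windows (\r\n), classic Mac (\r) and Unix (\n) line endings:

```ruby
mixed = "windows\r\nold mac\runix\n"

# \r\n? matches "\r\n" or a lone "\r"; both become "\n".
# Existing "\n" characters are left untouched.
normalized = mixed.gsub(/\r\n?/, "\n")

p normalized  # "windows\nold mac\nunix\n"
```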

---

Best answer:

What do you get when you do puts lines? That will give you a clue.
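A related tip: `puts` prints the characters themselves, so a `\r` is invisible on the terminal. `p` calls `inspect` and shows the escape sequences instead. A quick sketch with a hypothetical `lines` value:

```ruby
lines = "a\r\nfile\r\n"

puts lines  # prints the text; the \r characters are not visible
p lines     # prints "a\r\nfile\r\n" with the escapes shown

# inspect returns that escaped form as a string
shown = lines.inspect
p shown.include?('\r')  # true
```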

On Windows, File.open opens files in text mode by default, so your \r\n line endings are automatically converted to \n when the file is read. Maybe that's the reason lines is always equal to lines2. To prevent Ruby from converting the line endings, use the "rb" (binary) mode:

C:\> copy con lala.txt
a
file
with
many
lines
^Z


C:\> irb
irb(main):001:0> text = File.open('lala.txt').read
=> "a\nfile\nwith\nmany\nlines\n"
irb(main):002:0> bin = File.open('lala.txt', 'rb').read
=> "a\r\nfile\r\nwith\r\nmany\r\nlines\r\n"
irb(main):003:0>

But from your question and code, I see you simply need to open the file in the default (text) mode. You don't need any conversion and can use the shorter File.read.
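Here is a portable sketch of the binary-mode side. `File.binwrite` and `File.binread` (Ruby core) always work in binary mode, so the `\r\n` pairs survive on every platform. The file name `crlf_sample.txt` is hypothetical and is created in a temporary directory:

```ruby
require "tmpdir"

raw = nil
normalized = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "crlf_sample.txt")

  # Write Windows-style line endings byte for byte
  File.binwrite(path, "a\r\nfile\r\n")

  # Binary read: no newline conversion on any platform
  raw = File.binread(path)

  # Normalize explicitly instead of relying on the platform's text mode
  normalized = raw.gsub(/\r\n?/, "\n")
end

p raw         # "a\r\nfile\r\n"
p normalized  # "a\nfile\n"
```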

---

Use String#strip

Returns a copy of str with leading and trailing whitespace removed.

e.g.:

"    hello    ".strip   #=> "hello"
"\tgoodbye\r\n".strip   #=> "goodbye"

Using gsub

string = string.gsub(/\r/," ")
string = string.gsub(/\n/," ")
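A short sketch contrasting the two approaches on a hypothetical multi-line string. `strip` only touches the ends, while the two gsub calls replace every `\r` and `\n`, so each `\r\n` pair turns into two spaces:

```ruby
text = "line one\r\nline two\r\n"

# strip removes whitespace only at the start and end
stripped = text.strip

# Two gsub passes: "\r" then "\n", each replaced by a space
spaced = text.gsub(/\r/, " ").gsub(/\n/, " ")

p stripped  # "line one\r\nline two"
p spaced    # "line one  line two  "
```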

---

Why not read the file in text mode, rather than binary mode?

---

"still the same\n".chomp
or
"still the same\n".chomp!

http://www.ruby-doc.org/core-1.9.3/String.html#method-i-chomp
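`chomp` only works on the end of a string, removing one trailing `\r\n`, `\n` or `\r`. A sketch applying it to each line of a hypothetical multi-line string via `each_line`:

```ruby
text = "still the same\r\nnext line\r\n"

p "still the same\r\n".chomp  # "still the same"

# chomp every line, then rejoin with Unix line endings
unix = text.each_line.map(&:chomp).join("\n")
p unix  # "still the same\nnext line"
```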

---

modified_string = string.gsub(/\s+/, ' ').strip

---

You can use this:

my_string.strip.gsub(/\s+/, ' ')

---

If you are using Rails (ActiveSupport), there is a squish method:

"\tgoodbye\r\n".squish #=> "goodbye"

"\tgood \t\r\nbye\r\n".squish #=> "good bye"
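Without Rails, a rough plain-Ruby stand-in for squish is sketched below. The helper name `squish_plain` is hypothetical. It uses the POSIX bracket class `[[:space:]]`, which, unlike `\s`, also matches Unicode spaces such as the no-break space (U+00A0):

```ruby
# Hypothetical helper approximating ActiveSupport's String#squish
def squish_plain(str)
  # Collapse any run of whitespace (including "\r\n") to one space, then trim the ends
  str.gsub(/[[:space:]]+/, " ").strip
end

p squish_plain("\tgoodbye\r\n")         # "goodbye"
p squish_plain("\tgood \t\r\nbye\r\n")  # "good bye"
p squish_plain("good\u00A0bye")         # "good bye"
```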

---

If lines is an Array of lines (for example, from File.readlines):

lines.map(&:strip).join(" ")
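`map` exists on an Array but not on a String. If `lines` is a single String, a sketch using `each_line` gives the same result:

```ruby
lines = "a\r\nfile\r\nwith\r\nmany\r\nlines\r\n"

# each_line yields each line with its ending; strip removes "\r\n" and surrounding spaces
joined = lines.each_line.map(&:strip).join(" ")

p joined  # "a file with many lines"
```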

---

I think your regex is almost complete; here's what I would do:

lines2 = lines.gsub(/[\r\n]+/m, "\n")

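In the above, I've put \r and \n into a character class (that way it doesn't matter in which order they appear) and added the "+" quantifier (so that "\r\n\r\n\r\n" also matches once and the whole run is replaced with a single "\n"). Note that this also collapses blank lines. The m flag has no effect here, since it only changes what . matches.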
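A quick sketch of that side effect on a hypothetical input: a blank line (`\r\n\r\n`) is also a single match, so it disappears. Using `/\r\n?/` instead keeps it:

```ruby
text = "para one\r\n\r\npara two\r\n"

collapsed = text.gsub(/[\r\n]+/m, "\n")  # the blank line is swallowed
preserved = text.gsub(/\r\n?/, "\n")     # one "\n" per original line ending

p collapsed  # "para one\npara two\n"
p preserved  # "para one\n\npara two\n"
```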

---

Just another variant, using String#delete, which removes every character in the given set:

lines.delete("\r")
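`String#delete` treats its argument as a set of characters, not as a substring. A short sketch on a hypothetical string showing why the set should contain just `"\r"`:

```ruby
text = "a line\r\nanother line\r\n"

# Deletes every space and every "\n"; the "\r" characters stay
wrong_set = text.delete(" \n")

# Deletes only the carriage returns
cr_only = text.delete("\r")

p wrong_set  # "aline\ranotherline\r"
p cr_only    # "a line\nanother line\n"
```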

---

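def dos2unix(input)
  # Keep every byte except 13 (carriage return, "\r"), then restore the
  # original encoding, since packing bytes yields a binary (ASCII-8BIT) string
  input.bytes.reject { |b| b == 13 }.pack("C*").force_encoding(input.encoding)
end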


remove_all_the_carriage_returns = dos2unix(some_blob)
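Applied to a whole file, the same idea might look like the sketch below. The helper `dos2unix_file` and the temporary file are hypothetical. Reading and writing in binary mode keeps Ruby from converting line endings on Windows, and only `\r\n` pairs are turned into `\n` (a lone `\r` is left alone, unlike the byte-filtering dos2unix above):

```ruby
require "tmpdir"

# Hypothetical helper: rewrite a file in place with Unix line endings
def dos2unix_file(path)
  content = File.binread(path) # raw bytes, no newline translation
  File.binwrite(path, content.gsub("\r\n", "\n"))
end

result = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.txt")
  File.binwrite(path, "one\r\ntwo\r\n")
  dos2unix_file(path)
  result = File.binread(path)
end

p result  # "one\ntwo\n"
```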