What is the best way to chop a string into chunks of a given length in Ruby?

I have been looking for an elegant and efficient way to chunk a string into substrings of a given length in Ruby.

So far, the best I could come up with is this:

def chunk(string, size)
  (0..(string.length - 1) / size).map { |i| string[i * size, size] }
end


>> chunk("abcdef",3)
=> ["abc", "def"]
>> chunk("abcde",3)
=> ["abc", "de"]
>> chunk("abc",3)
=> ["abc"]
>> chunk("ab",3)
=> ["ab"]
>> chunk("",3)
=> []

You might want chunk("", n) to return [""] instead of []. If so, just add this as the first line of the method:

return [""] if string.empty?
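For reference, here is a sketch of the question's slice-based method with that guard in place. Nothing beyond the two snippets already shown is assumed; the guard is marked optional because it only matters if you prefer [""] for empty input.

```ruby
# The question's slice-based chunk, with the optional empty-string guard
# placed as the first line of the method body.
def chunk(string, size)
  return [""] if string.empty? # optional: makes chunk("", n) return [""]

  (0..(string.length - 1) / size).map { |i| string[i * size, size] }
end

p chunk("abcde", 3) # prints ["abc", "de"]
p chunk("", 3)      # prints [""]
```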

Do you have any better suggestions?

Edit

Thanks to Jeremy Ruten for this elegant and efficient solution: [edit: NOT efficient!]

def chunk(string, size)
  string.scan(/.{1,#{size}}/)
end

Edit

The scan solution takes about 60 seconds to chop 512k into 1k chunks 10000 times, compared with 2.4 seconds for the original slice-based solution.
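If you want to reproduce a comparison like this, a minimal sketch using Ruby's standard Benchmark module might look like the following. The method names chunk_slice and chunk_scan are hypothetical labels for the two versions above, the test data is assumed to be plain ASCII, and the iteration count is deliberately far lower than 10000 to keep the run short. Absolute timings will vary by machine and Ruby version, so no numbers are claimed here.

```ruby
require "benchmark"

# Hypothetical names for the two versions from the question.
def chunk_slice(string, size)
  (0..(string.length - 1) / size).map { |i| string[i * size, size] }
end

def chunk_scan(string, size)
  string.scan(/.{1,#{size}}/)
end

data = "x" * (512 * 1024) # 512 KB of ASCII test data (an assumption)
iterations = 100          # far fewer than 10000, to keep the run short

# Both versions must agree before it makes sense to time them.
raise "results differ" unless chunk_slice(data, 1024) == chunk_scan(data, 1024)

Benchmark.bm(6) do |x|
  x.report("slice") { iterations.times { chunk_slice(data, 1024) } }
  x.report("scan")  { iterations.times { chunk_scan(data, 1024) } }
end
```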


Are there some other constraints you have in mind? Otherwise I'd be awfully tempted to do something simple like:

# one slice of width w per chunk; the last one may be shorter
(0...(str.length + w - 1) / w).map { |i| str[i * w, w] }
Another option is split with a capturing group:

test.split(/(...)/).reject { |v| v.empty? }

The reject is necessary because split otherwise returns empty strings between the captured groups. My regex-fu isn't quite up to seeing how to fix that right off the top of my head.
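A sketch of the same split/reject idea with the chunk size as a parameter instead of a hard-coded `...`. The method name is hypothetical, and the /m flag is an addition so that `.` also matches newlines.

```ruby
# Hypothetical name: the split/reject idea with the chunk size as a parameter.
# The capturing group makes split keep each matched chunk; the empty strings
# it leaves between consecutive matches are then rejected.
def chunk_by_split(string, size)
  string.split(/(.{#{size}})/m).reject(&:empty?)
end

p chunk_by_split("abcdefg", 3) # prints ["abc", "def", "g"]
```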

Use String#scan:

>> 'abcdefghijklmnopqrstuvwxyz'.scan(/.{4}/)
=> ["abcd", "efgh", "ijkl", "mnop", "qrst", "uvwx"]
>> 'abcdefghijklmnopqrstuvwxyz'.scan(/.{1,4}/)
=> ["abcd", "efgh", "ijkl", "mnop", "qrst", "uvwx", "yz"]
>> 'abcdefghijklmnopqrstuvwxyz'.scan(/.{1,3}/)
=> ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yz"]

Here is another way to do it:

"abcdefghijklmnopqrstuvwxyz".chars.each_slice(3).map(&:join)

=> ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yz"]
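Wrapped as a reusable method, the slicing approach might look like this sketch (the name is hypothetical). Using each_char instead of chars avoids building the full intermediate array of single characters, and since it works on characters rather than bytes, multibyte characters stay intact.

```ruby
# Hypothetical name: the each_slice approach wrapped in a method.
# each_char yields characters (not bytes), and join rebuilds each group.
def chunk_by_slices(string, size)
  string.each_char.each_slice(size).map(&:join)
end

p chunk_by_slices("abcdefg", 3) # prints ["abc", "def", "g"]
p chunk_by_slices("", 3)        # prints []
```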

I think this is the most efficient solution if you know your string length is a multiple of the chunk size:

def chunk(string, size)
  (string.length / size).times.collect { |i| string[i * size, size] }
end

and for splitting into a given number of parts:

def parts(string, count)
  size = string.length / count
  count.times.collect { |i| string[i * size, size] }
end
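To make the "multiple of the chunk size" precondition concrete, this sketch runs the two methods from this answer, unchanged, on a 7-character string. Both use integer division, so any remainder is silently dropped.

```ruby
# The two methods from this answer, unchanged, to show what happens when the
# length is NOT a multiple of the chunk size (or of the part count).
def chunk(string, size)
  (string.length / size).times.collect { |i| string[i * size, size] }
end

def parts(string, count)
  size = string.length / count
  count.times.collect { |i| string[i * size, size] }
end

p chunk("abcdefg", 3) # prints ["abc", "def"]      (trailing "g" dropped)
p parts("abcdefg", 3) # prints ["ab", "cd", "ef"]  (trailing "g" dropped)
```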

Here is another solution for a slightly different case: processing large strings when there is no need to store all chunks at once. It holds only a single chunk at a time and performs much faster than slicing strings:

require 'stringio'

io = StringIO.new(string)
until io.eof?
  chunk = io.read(chunk_size) # reads up to chunk_size bytes
  do_something(chunk)         # your per-chunk processing
end
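The same loop can be packaged as a block-taking helper. This is a sketch: each_chunk is a hypothetical name, and note that StringIO#read(n) counts bytes, not characters, so with multibyte text a character can be split across two chunks.

```ruby
require "stringio"

# Hypothetical helper: yields successive chunks of a string to the block,
# so no array holding all chunks is ever built.
# Note: StringIO#read(n) counts bytes, so a multibyte character can be split
# across two chunks.
def each_chunk(string, chunk_size)
  io = StringIO.new(string)
  until io.eof?
    yield io.read(chunk_size)
  end
end

each_chunk("abcdefg", 3) { |chunk| puts chunk }
# prints abc, def and g, one per line
```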

A better solution, which takes into account that the last part of the string may be shorter than the chunk size:

def chunk(str, size)
  return [str] if str.length < size
  rem = str.length % size # length of the last, partial chunk
  chunks = (str.length / size).times.collect { |i| str[i * size, size] }
  chunks << str[-rem..-1] if rem.nonzero? # add the last part
  chunks
end
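A shorter alternative sketch (not the code above, and the name is hypothetical) steps through the start offsets and slices at each one. String#[] with a length returns whatever is left, so the short last chunk needs no special case. One behavioral difference: for an empty string this returns [] rather than [""].

```ruby
# Hypothetical alternative: step through the start offsets 0, size, 2*size, ...
# and slice at each one. String#[] with a length returns whatever is left,
# so the short last chunk comes for free.
def chunk_by_step(string, size)
  0.step(string.length - 1, size).map { |i| string[i, size] }
end

p chunk_by_step("abcdefg", 3) # prints ["abc", "def", "g"]
p chunk_by_step("", 3)        # prints []
```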

I ran a small test that chops about 593 MB of data into 18991 pieces of 32 KB. Your slice+map version ran for at least 15 minutes at 100% CPU before I pressed Ctrl+C. This version using String#unpack finished in 3.6 seconds:

def chunk(string, size)
  string.unpack("a#{size}" * (string.size / size.to_f).ceil)
end
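One caveat worth illustrating: the "a" directive of unpack counts bytes, while string.size counts characters, so for non-ASCII input the computed format can cover fewer bytes than the string holds and chunks can cut a multibyte character in half. The sketch below (hypothetical names) derives the repeat count from bytesize instead, and contrasts it with a character-based scan.

```ruby
# Hypothetical names. The "a" directive counts bytes, so the repeat count is
# derived from bytesize here; chunks may still cut a multibyte character in
# half. scan with /m counts characters instead.
def chunk_bytes(string, size)
  string.unpack("a#{size}" * ((string.bytesize + size - 1) / size))
end

def chunk_chars(string, size)
  string.scan(/.{1,#{size}}/m)
end

s = "héllo"                          # 5 characters, 6 bytes in UTF-8
p chunk_bytes(s, 2).map(&:bytesize)  # prints [2, 2, 2]
chunk_chars(s, 2)                    # returns ["hé", "ll", "o"]
```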

Just text.scan(/.{1,4}/m) solves the problem; the m flag makes . match newlines too.
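A small sketch of why the m flag matters: without it, `.` does not match a newline, so scan silently drops newline characters from the output.

```ruby
text = "ab\ncd"

# Without /m, . does not match "\n", so scan silently skips the newline.
p text.scan(/.{1,2}/)  # prints ["ab", "cd"]

# With /m, . matches "\n" as well, so every character ends up in a chunk.
p text.scan(/.{1,2}/m) # prints ["ab", "\nc", "d"]
```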