如何在 Lua 字符串中迭代单个字符?

我在 Lua 中有一个字符串,并希望在其中迭代单个字符。但是我试过的代码都没有用,官方手册只显示了如何查找和替换子字符串: (

str = "abcd"
for char in str do -- error
print( char )
end


for i = 1, str:len() do
print( str[ i ] ) -- nil
end
114207 次浏览

If you're using Lua 5, try:

for i = 1, string.len(str) do
print( string.sub(str, i, i) )
end

In lua 5.1, you can iterate of the characters of a string this in a couple of ways.

The basic loop would be:

for i = 1, #str do
local c = str:sub(i,i)
-- do something with c
end

But it may be more efficient to use a pattern with string.gmatch() to get an iterator over the characters:

for c in str:gmatch"." do
-- do something with c
end

Or even to use string.gsub() to call a function for each char:

str:gsub(".", function(c)
-- do something with c
end)

In all of the above, I've taken advantage of the fact that the string module is set as a metatable for all string values, so its functions can be called as members using the : notation. I've also used the (new to 5.1, IIRC) # to get the string length.

The best answer for your application depends on a lot of factors, and benchmarks are your friend if performance is going to matter.

You might want to evaluate why you need to iterate over the characters, and to look at one of the regular expression modules that have been bound to Lua, or for a modern approach look into Roberto's lpeg module which implements Parsing Expression Grammers for Lua.

Depending on the task at hand it might be easier to use string.byte. It is also the fastest ways because it avoids creating new substring that happends to be pretty expensive in Lua thanks to hashing of each new string and checking if it is already known. You can pre-calculate code of symbols you look for with same string.byte to maintain readability and portability.

local str = "ab/cd/ef"
local target = string.byte("/")
for idx = 1, #str do
if str:byte(idx) == target then
print("Target found at:", idx)
end
end

All people suggest a less optimal method

Will be best:

    function chars(str)
strc = {}
for i = 1, #str do
table.insert(strc, string.sub(str, i, i))
end
return strc
end


str = "Hello world!"
char = chars(str)
print("Char 2: "..char[2]) -- prints the char 'e'
print("-------------------\n")
for i = 1, #str do -- testing printing all the chars
if (char[i] == " ") then
print("Char "..i..": [[space]]")
else
print("Char "..i..": "..char[i])
end
end

There are already a lot of good approaches in the provided answers (here, here and here). If speed is what are you primarily looking for, you should definitely consider doing the job through Lua's C API, which is many times faster than raw Lua code. When working with preloaded chunks (eg. load function), the difference is not that big, but still considerable.

As for the pure Lua solutions, let me share this small benchmark, I've made. It covers every provided answer to this date and adds a few optimizations. Still, the basic thing to consider is:

How many times you'll need to iterate over characters in string?

  • If the answer is "once", than you should look up first part of the banchmark ("raw speed").
  • Otherwise, the second part will provide more precise estimation, because it parses the string into the table, which is much faster to iterate over. You should also consider writing a simple function for this, like @Jarriz suggested.

Here is full code:

-- Setup locals
local str = "Hello World!"
local attempts = 5000000
local reuses = 10 -- For the second part of benchmark: Table values are reused 10 times. Change this according to your needs.
local x, c, elapsed, tbl
-- "Localize" funcs to minimize lookup overhead
local stringbyte, stringchar, stringsub, stringgsub, stringgmatch = string.byte, string.char, string.sub, string.gsub, string.gmatch


print("-----------------------")
print("Raw speed:")
print("-----------------------")


-- Version 1 - string.sub in loop
x = os.clock()
for j = 1, attempts do
for i = 1, #str do
c = stringsub(str, i)
end
end
elapsed = os.clock() - x
print(string.format("V1: elapsed time: %.3f", elapsed))


-- Version 2 - string.gmatch loop
x = os.clock()
for j = 1, attempts do
for c in stringgmatch(str, ".") do end
end
elapsed = os.clock() - x
print(string.format("V2: elapsed time: %.3f", elapsed))


-- Version 3 - string.gsub callback
x = os.clock()
for j = 1, attempts do
stringgsub(str, ".", function(c) end)
end
elapsed = os.clock() - x
print(string.format("V3: elapsed time: %.3f", elapsed))


-- For version 4
local str2table = function(str)
local ret = {}
for i = 1, #str do
ret[i] = stringsub(str, i) -- Note: This is a lot faster than using table.insert
end
return ret
end


-- Version 4 - function str2table
x = os.clock()
for j = 1, attempts do
tbl = str2table(str)
for i = 1, #tbl do -- Note: This type of loop is a lot faster than "pairs" loop.
c = tbl[i]
end
end
elapsed = os.clock() - x
print(string.format("V4: elapsed time: %.3f", elapsed))


-- Version 5 - string.byte
x = os.clock()
for j = 1, attempts do
tbl = {stringbyte(str, 1, #str)} -- Note: This is about 15% faster than calling string.byte for every character.
for i = 1, #tbl do
c = tbl[i] -- Note: produces char codes instead of chars.
end
end
elapsed = os.clock() - x
print(string.format("V5: elapsed time: %.3f", elapsed))


-- Version 5b - string.byte + conversion back to chars
x = os.clock()
for j = 1, attempts do
tbl = {stringbyte(str, 1, #str)} -- Note: This is about 15% faster than calling string.byte for every character.
for i = 1, #tbl do
c = stringchar(tbl[i])
end
end
elapsed = os.clock() - x
print(string.format("V5b: elapsed time: %.3f", elapsed))


print("-----------------------")
print("Creating cache table ("..reuses.." reuses):")
print("-----------------------")


-- Version 1 - string.sub in loop
x = os.clock()
for k = 1, attempts do
tbl = {}
for i = 1, #str do
tbl[i] = stringsub(str, i) -- Note: This is a lot faster than using table.insert
end
for j = 1, reuses do
for i = 1, #tbl do
c = tbl[i]
end
end
end
elapsed = os.clock() - x
print(string.format("V1: elapsed time: %.3f", elapsed))


-- Version 2 - string.gmatch loop
x = os.clock()
for k = 1, attempts do
tbl = {}
local tblc = 1 -- Note: This is faster than table.insert
for c in stringgmatch(str, ".") do
tbl[tblc] = c
tblc = tblc + 1
end
for j = 1, reuses do
for i = 1, #tbl do
c = tbl[i]
end
end
end
elapsed = os.clock() - x
print(string.format("V2: elapsed time: %.3f", elapsed))


-- Version 3 - string.gsub callback
x = os.clock()
for k = 1, attempts do
tbl = {}
local tblc = 1 -- Note: This is faster than table.insert
stringgsub(str, ".", function(c)
tbl[tblc] = c
tblc = tblc + 1
end)
for j = 1, reuses do
for i = 1, #tbl do
c = tbl[i]
end
end
end
elapsed = os.clock() - x
print(string.format("V3: elapsed time: %.3f", elapsed))


-- Version 4 - str2table func before loop
x = os.clock()
for k = 1, attempts do
tbl = str2table(str)
for j = 1, reuses do
for i = 1, #tbl do -- Note: This type of loop is a lot faster than "pairs" loop.
c = tbl[i]
end
end
end
elapsed = os.clock() - x
print(string.format("V4: elapsed time: %.3f", elapsed))


-- Version 5 - string.byte to create table
x = os.clock()
for k = 1, attempts do
tbl = {stringbyte(str,1,#str)}
for j = 1, reuses do
for i = 1, #tbl do
c = tbl[i]
end
end
end
elapsed = os.clock() - x
print(string.format("V5: elapsed time: %.3f", elapsed))


-- Version 5b - string.byte to create table + string.char loop to convert bytes to chars
x = os.clock()
for k = 1, attempts do
tbl = {stringbyte(str, 1, #str)}
for i = 1, #tbl do
tbl[i] = stringchar(tbl[i])
end
for j = 1, reuses do
for i = 1, #tbl do
c = tbl[i]
end
end
end
elapsed = os.clock() - x
print(string.format("V5b: elapsed time: %.3f", elapsed))

Example output (Lua 5.3.4, Windows):

-----------------------
Raw speed:
-----------------------
V1: elapsed time: 3.713
V2: elapsed time: 5.089
V3: elapsed time: 5.222
V4: elapsed time: 4.066
V5: elapsed time: 2.627
V5b: elapsed time: 3.627
-----------------------
Creating cache table (10 reuses):
-----------------------
V1: elapsed time: 20.381
V2: elapsed time: 23.913
V3: elapsed time: 25.221
V4: elapsed time: 20.551
V5: elapsed time: 13.473
V5b: elapsed time: 18.046

Result:

In my case, the string.byte and string.sub were fastest in terms of raw speed. When using cache table and reusing it 10 times per loop, the string.byte version was fastest even when converting charcodes back to chars (which isn't always necessary and depends on usage).

As you have probably noticed, I've made some assumptions based on my previous benchmarks and applied them to the code:

  1. Library functions should be always localized if used inside loops, because it is a lot faster.
  2. Inserting new element into lua table is much faster using tbl[idx] = value than table.insert(tbl, value).
  3. Looping through table using for i = 1, #tbl is a bit faster than for k, v in pairs(tbl).
  4. Always prefer the version with less function calls, because the call itself adds a little bit to the execution time.

Hope it helps.

Iterating to construct a string and returning this string as a table with load()...

itab=function(char)
local result
for i=1,#char do
if i==1 then
result=string.format('%s','{')
end
result=result..string.format('\'%s\'',char:sub(i,i))
if i~=#char then
result=result..string.format('%s',',')
end
if i==#char then
result=result..string.format('%s','}')
end
end
return load('return '..result)()
end


dump=function(dump)
for key,value in pairs(dump) do
io.write(string.format("%s=%s=%s\n",key,type(value),value))
end
end


res=itab('KOYAANISQATSI')


dump(res)

Puts out...

1=string=K
2=string=O
3=string=Y
4=string=A
5=string=A
6=string=N
7=string=I
8=string=S
9=string=Q
10=string=A
11=string=T
12=string=S
13=string=I