How to split strings into text and number?

I'd like to split strings like these

'foofo21'
'bar432'
'foobar12345'

into

['foofo', '21']
['bar', '432']
['foobar', '12345']

Does somebody know an easy and simple way to do this in python?

184171 次浏览

I would approach this by using re.match in the following way:

import re
match = re.match(r"([a-z]+)([0-9]+)", 'foofo21', re.I)
if match:
items = match.groups()
print(items)
>> ("foofo", "21")
>>> r = re.compile("([a-zA-Z]+)([0-9]+)")
>>> m = r.match("foobar12345")
>>> m.group(1)
'foobar'
>>> m.group(2)
'12345'

So, if you have a list of strings with that format:

import re
r = re.compile("([a-zA-Z]+)([0-9]+)")
strings = ['foofo21', 'bar432', 'foobar12345']
print [r.match(string).groups() for string in strings]

Output:

[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]

I'm always the one to bring up findall() =)

>>> strings = ['foofo21', 'bar432', 'foobar12345']
>>> [re.findall(r'(\w+?)(\d+)', s)[0] for s in strings]
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]

Note that I'm using a simpler (less to type) regex than most of the previous answers.

Yet Another Option:

>>> [re.split(r'(\d+)', s) for s in ('foofo21', 'bar432', 'foobar12345')]
[['foofo', '21', ''], ['bar', '432', ''], ['foobar', '12345', '']]
def mysplit(s):
head = s.rstrip('0123456789')
tail = s[len(head):]
return head, tail
>>> [mysplit(s) for s in ['foofo21', 'bar432', 'foobar12345']]
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]
import re


s = raw_input()
m = re.match(r"([a-zA-Z]+)([0-9]+)",s)
print m.group(0)
print m.group(1)
print m.group(2)

without using regex, using isdigit() built-in function, only works if starting part is text and latter part is number

def text_num_split(item):
for index, letter in enumerate(item, 0):
if letter.isdigit():
return [item[:index],item[index:]]


print(text_num_split("foobar12345"))

OUTPUT :

['foobar', '12345']

here is a simple function to seperate multiple words and numbers from a string of any length, the re method only seperates first two words and numbers. I think this will help everyone else in the future,

def seperate_string_number(string):
previous_character = string[0]
groups = []
newword = string[0]
for x, i in enumerate(string[1:]):
if i.isalpha() and previous_character.isalpha():
newword += i
elif i.isnumeric() and previous_character.isnumeric():
newword += i
else:
groups.append(newword)
newword = i


previous_character = i


if x == len(string) - 2:
groups.append(newword)
newword = ''
return groups


print(seperate_string_number('10in20ft10400bg'))
# outputs : ['10', 'in', '20', 'ft', '10400', 'bg']

Here is simple solution for that problem, no need for regex:

user = input('Input: ') # user = 'foobar12345'
int_list, str_list = [], []


for item in user:
try:
item = int(item)  # searching for integers in your string
except:
str_list.append(item)
string = ''.join(str_list)
else:  # if there are integers i will add it to int_list but as str, because join function only can work with str
int_list.append(str(item))
integer = int(''.join(int_list))  # if you want it to be string just do z = ''.join(int_list)


final = [string, integer]  # you can also add it to dictionary d = {string: integer}
print(final)

This is a little longer, but more versatile for cases where there are multiple, randomly placed, numbers in the string. Also, it requires no imports.

def getNumbers( input ):
# Collect Info
compile = ""
complete = []


for letter in input:
# If compiled string
if compile:
# If compiled and letter are same type, append letter
if compile.isdigit() == letter.isdigit():
compile += letter
            

# If compiled and letter are different types, append compiled string, and begin with letter
else:
complete.append( compile )
compile = letter
            

# If no compiled string, begin with letter
else:
compile = letter
        

# Append leftover compiled string
if compile:
complete.append( compile )
    

# Return numbers only
numbers = [ word for word in complete if word.isdigit() ]
        

return numbers

In Addition to the answer of @Evan If the incoming string is in this pattern 21foofo then the re.match pattern would be like this.

import re
match = re.match(r"([0-9]+)([a-z]+)", '21foofo', re.I)
if match:
items = match.groups()
print(items)
>> ("21", "foofo")

Otherwise, you'll get UnboundLocalError: local variable 'items' referenced before assignment error.