Substring of an entire column in a pandas dataframe


Best answer

Use the str accessor with square brackets:

df['col'] = df['col'].str[:9]

Or str.slice:

df['col'] = df['col'].str.slice(0, 9)
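Both spellings should give the same result: strings shorter than the cutoff come back unchanged, and missing values stay missing. A minimal sketch, assuming a made-up Series named s (not the OP's data):

```python
import pandas as pd

# Hypothetical sample data: one long string, one short string, one missing value
s = pd.Series(['abcdefghijkl', 'short', None])

by_index = s.str[:9]          # square-bracket slicing on the str accessor
by_slice = s.str.slice(0, 9)  # the same slice spelled with str.slice

print(by_index)
print(by_slice)
```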


If the column does not contain strings, convert it with astype first:

df['col'] = df['col'].astype(str).str[:9]
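For example, a column of integers has no usable .str accessor until it is converted. A small sketch, assuming a made-up dataframe df_int with an integer column named code:

```python
import pandas as pd

# Hypothetical integer column; the .str accessor only works on string data
df_int = pd.DataFrame({'code': [20201208, 20191231]})

# Convert to strings first, then keep the first 4 characters
df_int['year'] = df_int['code'].astype(str).str[:4]

print(df_int)
```

The new column still holds text; chain another astype(int) if numbers are needed.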


Since the OP's dataframe isn't shown, we can create one to test with.

import pandas as pd

df = pd.DataFrame({'col': ['2020-12-08'] * 10})


[Out]:
          col
0  2020-12-08
1  2020-12-08
2  2020-12-08
3  2020-12-08
4  2020-12-08
5  2020-12-08
6  2020-12-08
7  2020-12-08
8  2020-12-08
9  2020-12-08

Assuming we want to keep only the first 4 characters and store them in the same dataframe df, in a new column called col_substring, there are several options.

Option 1

Using pandas.Series.str

df['col_substring'] = df['col'].str[:4]


[Out]:


          col col_substring
0  2020-12-08          2020
1  2020-12-08          2020
2  2020-12-08          2020
3  2020-12-08          2020
4  2020-12-08          2020
5  2020-12-08          2020
6  2020-12-08          2020
7  2020-12-08          2020
8  2020-12-08          2020
9  2020-12-08          2020

Option 2

Using pandas.Series.str.slice as follows

df['col_substring'] = df['col'].str.slice(0, 4)


[Out]:


          col col_substring
0  2020-12-08          2020
1  2020-12-08          2020
2  2020-12-08          2020
3  2020-12-08          2020
4  2020-12-08          2020
5  2020-12-08          2020
6  2020-12-08          2020
7  2020-12-08          2020
8  2020-12-08          2020
9  2020-12-08          2020

or, passing only the stop position by keyword:

df['col_substring'] = df['col'].str.slice(stop=4)
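str.slice also accepts a start position, so the same approach can pull a piece out of the middle or the end of each string. A sketch on a smaller copy of the test dataframe; the month and day column names are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'col': ['2020-12-08'] * 3})

# 0-based positions 5-6 hold the month; everything from position 8 on is the day
df['month'] = df['col'].str.slice(5, 7)
df['day'] = df['col'].str.slice(start=8)

print(df)
```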

Option 3

Using a custom lambda function

df['col_substring'] = df['col'].apply(lambda x: x[:4])


[Out]:


          col col_substring
0  2020-12-08          2020
1  2020-12-08          2020
2  2020-12-08          2020
3  2020-12-08          2020
4  2020-12-08          2020
5  2020-12-08          2020
6  2020-12-08          2020
7  2020-12-08          2020
8  2020-12-08          2020
9  2020-12-08          2020
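One caveat with the plain lambda: x[:4] raises a TypeError on values that are not strings, such as a float NaN. A hedged sketch of a guard, where first_n is a hypothetical helper name and the sample data is made up:

```python
import pandas as pd


def first_n(value, n=4):
    """Return the first n characters of a string; pass other values through unchanged."""
    return value[:n] if isinstance(value, str) else value


# Hypothetical data with a missing value mixed in
s = pd.Series(['2020-12-08', float('nan'), '2021-01-15'])

result = s.apply(first_n)
print(result)
```

The str accessor in Option 1 and Option 2 already leaves missing values alone, so this guard only matters when apply is used.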

Option 4

Using a custom lambda function with a regular expression (with re)

import re


df['col_substring'] = df['col'].apply(lambda x: re.findall(r'^.{4}', x)[0])


[Out]:


          col col_substring
0  2020-12-08          2020
1  2020-12-08          2020
2  2020-12-08          2020
3  2020-12-08          2020
4  2020-12-08          2020
5  2020-12-08          2020
6  2020-12-08          2020
7  2020-12-08          2020
8  2020-12-08          2020
9  2020-12-08          2020
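pandas also exposes regular expressions through the str accessor, which avoids the Python-level lambda. A sketch using Series.str.extract on made-up sample data. Note that re.findall(...)[0] raises an IndexError when a string has fewer than 4 characters, whereas str.extract returns a missing value in that case:

```python
import pandas as pd

# Hypothetical data: one string long enough to match, one too short
s = pd.Series(['2020-12-08', 'abc'])

# The parentheses form the capture group that str.extract returns;
# expand=False yields a Series instead of a one-column DataFrame
first4 = s.str.extract(r'^(.{4})', expand=False)

print(first4)
```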

Option 5

Using numpy.vectorize

import numpy as np

df['col_substring'] = np.vectorize(lambda x: x[:4])(df['col'])


[Out]:


          col col_substring
0  2020-12-08          2020
1  2020-12-08          2020
2  2020-12-08          2020
3  2020-12-08          2020
4  2020-12-08          2020
5  2020-12-08          2020
6  2020-12-08          2020
7  2020-12-08          2020
8  2020-12-08          2020
9  2020-12-08          2020

Note:

The best option depends on the use case, any constraints, and the dataframe itself.
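One way to compare the options on your own data is to time them. A rough sketch using the standard-library timeit module; the row count and the number of runs are arbitrary assumptions, and timings will vary by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Arbitrary test size; adjust it to resemble the real data
df_big = pd.DataFrame({'col': ['2020-12-08'] * 10_000})

# Each entry wraps one of the options above in a zero-argument callable
candidates = {
    'str[:4]': lambda: df_big['col'].str[:4],
    'str.slice': lambda: df_big['col'].str.slice(0, 4),
    'apply': lambda: df_big['col'].apply(lambda x: x[:4]),
    'np.vectorize': lambda: np.vectorize(lambda x: x[:4])(df_big['col']),
}

for name, func in candidates.items():
    seconds = timeit.timeit(func, number=5)
    print(f'{name:>12}: {seconds:.4f} s for 5 runs')
```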