Floor or ceiling of a pandas series in python?

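I have a pandas Series called series. If I want to get the element-wise floor or ceiling, is there a built-in method, or do I have to write the function and use apply? I ask because the data is big, so I appreciate efficiency. Also, this question has not been asked before with respect to the pandas package.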


UPDATE: THIS ANSWER IS WRONG, DO NOT DO THIS

Explanation: using Series.apply() with a native vectorized NumPy function makes no sense in most cases, as it can end up running the NumPy function element by element in a Python loop, which performs much worse. You'd be much better off using np.floor(series) directly, as suggested by several other answers.
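A minimal sketch of the contrast. To make the element-by-element path explicit, it passes the standard-library math.floor (a plain Python function, swapped in here purely for illustration) to Series.apply, and compares that with calling np.floor on the whole Series at once. The input values are made up.

```python
import math

import numpy as np
import pandas as pd

series = pd.Series([1.7, -0.3, 2.0, -4.5])

# Element by element: Series.apply calls math.floor once per value in Python
looped = series.apply(math.floor)

# Vectorized: np.floor is a NumPy ufunc applied to the whole Series in one call;
# the result is still a pandas Series with the original index
vectorized = np.floor(series)

print(looped.tolist())      # [1, -1, 2, -5]
print(vectorized.tolist())  # [1.0, -1.0, 2.0, -5.0]
```

The values agree, but the dtypes differ: math.floor returns Python ints, so the applied version comes back as integers, while np.floor keeps float64.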

You could do something like this using NumPy's floor, for instance, with a dataframe:

import numpy as np

floored_data = data.apply(np.floor)

Can't test it right now but an actual and working solution might not be far from it.

You can use NumPy's built-in functions to do this: np.ceil(series) or np.floor(series).

Both return a Series object (not an array), so the index information is preserved.
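A small sketch of that behaviour, using a made-up Series with string labels to show that the index survives both calls:

```python
import numpy as np
import pandas as pd

series = pd.Series([2.5, -1.2, 3.0], index=["a", "b", "c"])

floored = np.floor(series)  # element-wise floor: a 2.0, b -2.0, c 3.0
ceiled = np.ceil(series)    # element-wise ceiling: a 3.0, b -1.0, c 3.0

# Both results are pandas Series that keep the original labels
print(type(floored).__name__, list(floored.index))  # Series ['a', 'b', 'c']
```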

I am the OP, but I tried this and it worked:

np.floor(series)

With pd.Series.clip, you can set a floor via clip(lower=x) or ceiling via clip(upper=x):

import pandas as pd

s = pd.Series([-1, 0, -5, 3])

print(s.clip(lower=0))
# 0    0
# 1    0
# 2    0
# 3    3
# dtype: int64

print(s.clip(upper=0))
# 0   -1
# 1    0
# 2   -5
# 3    0
# dtype: int64

pd.Series.clip also allows generalised functionality, e.g. applying a floor and a ceiling simultaneously with s.clip(-1, 1).
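A short sketch of both bounds at once, plus a contrast that may matter depending on what "floor" means in the question: clip bounds values from below or above, whereas np.floor rounds each value down. The example values are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([-1, 0, -5, 3])

# Lower and upper bound in one call: every value is clamped into [-1, 1]
both = s.clip(-1, 1)
print(both.tolist())  # [-1, 0, -1, 1]

# clip bounds values; it does not round them. Compare with np.floor on floats:
x = pd.Series([0.5, 2.7])
print(x.clip(lower=1).tolist())  # [1.0, 2.7]
print(np.floor(x).tolist())      # [0.0, 2.0]
```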

NOTE: Answer originally referred to clip_lower / clip_upper which were removed in pandas 1.0.0.

The pinned answer is already the fastest. Here I provide some alternatives for ceiling and floor using pure pandas and compare them with the NumPy approach.

import numpy as np
import pandas as pd

series = pd.Series(np.random.normal(100, 20, 1000000))

Floor

%timeit np.floor(series) # 1.65 ms ± 18.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit series.astype(int) # 2.2 ms ± 131 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit (series-0.5).round(0) # 3.1 ms ± 47 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit round(series-0.5,0) # 2.83 ms ± 60.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

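Why does astype(int) work? Converting a float to an integer truncates toward zero, which matches the floor only for non-negative values. It works here because the sampled values (mean 100, std. dev. 20) are in practice all positive; for negative numbers truncation rounds up instead (e.g. -2.7 becomes -2, while its floor is -3). It also returns an integer dtype and fails if the series contains NaN.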
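A small sketch contrasting astype(int) with np.floor on negative values and on missing data; the inputs are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([2.7, -2.7, -0.5])

# astype(int) truncates toward zero ...
truncated = s.astype(int)
# ... while np.floor always rounds toward negative infinity
floored = np.floor(s)

print(truncated.tolist())  # [2, -2, 0]
print(floored.tolist())    # [2.0, -3.0, -1.0]

# NaN cannot be cast to an integer dtype, so astype(int) raises,
# whereas np.floor simply propagates the NaN
with_nan = pd.Series([1.5, np.nan])
cast_failed = False
try:
    with_nan.astype(int)
except ValueError:
    cast_failed = True
print(cast_failed)  # True
```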

Ceil

%timeit np.ceil(series) # 1.67 ms ± 21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit (series+0.5).round(0) # 3.15 ms ± 46.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit round(series+0.5,0) # 2.99 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
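The rounding-based alternatives in both timing lists can also disagree with np.floor and np.ceil on exact integer inputs: shifting by 0.5 lands those values exactly on a half, and pandas (like NumPy) rounds exact halves to the nearest even number. A minimal check on a few integer-valued floats:

```python
import numpy as np
import pandas as pd

# Integer-valued inputs sit exactly on a .5 boundary after the +/- 0.5 shift
s = pd.Series([1.0, 2.0, 3.0])

# Exact halves are rounded to the nearest even number
floor_trick = (s - 0.5).round(0)
ceil_trick = (s + 0.5).round(0)

print(floor_trick.tolist())  # [0.0, 2.0, 2.0]
print(np.floor(s).tolist())  # [1.0, 2.0, 3.0]
print(ceil_trick.tolist())   # [2.0, 2.0, 4.0]
print(np.ceil(s).tolist())   # [1.0, 2.0, 3.0]
```

On continuous random data like the benchmark series, exact integers essentially never occur, so this does not affect the speed comparison, but it does make the tricks unsafe as general replacements.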

So yeah, just use the numpy function.
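%timeit is an IPython/Jupyter magic. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module. The helper name benchmark, the repeat counts, and the subset of candidates are illustrative assumptions, and timings will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd


def benchmark(series, number=20, repeat=5):
    """Hypothetical helper: best per-call time (seconds) for each candidate."""
    candidates = {
        "np.floor(series)": lambda: np.floor(series),
        "series.astype(int)": lambda: series.astype(int),
        "(series-0.5).round(0)": lambda: (series - 0.5).round(0),
        "np.ceil(series)": lambda: np.ceil(series),
        "(series+0.5).round(0)": lambda: (series + 0.5).round(0),
    }
    results = {}
    for label, fn in candidates.items():
        # timeit.repeat accepts a callable; keep the fastest repeat to reduce noise
        best_total = min(timeit.repeat(fn, number=number, repeat=repeat))
        results[label] = best_total / number
    return results


if __name__ == "__main__":
    series = pd.Series(np.random.normal(100, 20, 1000000))
    for label, seconds in benchmark(series).items():
        print(f"{label:<24} {seconds * 1e3:.2f} ms per call")
```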