NumPy: formal definition of "array_like" objects?

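In numpy, the constructors of many objects accept an "array_like" as the first argument. Is there a definition of such an object, either as an abstract meta class, or as documentation of the methods it should contain?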


The term "array-like" is used in NumPy to refer to anything that can be passed as the first parameter to numpy.array() to create an array.

As per the NumPy documentation:

In general, numerical data arranged in an array-like structure in Python can be converted to arrays through the use of the array() function. The most obvious examples are lists and tuples. See the documentation for array() for details for its use. Some objects may support the array-protocol and allow conversion to arrays this way. A simple way to find out if the object can be converted to a numpy array using array() is simply to try it interactively and see if it works! (The Python Way).
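The "just try it" advice can be wrapped in a few lines. This is only a sketch: try_as_array is a hypothetical helper name, not part of NumPy, and the exception types it catches are an assumption about what a failed conversion raises.

```python
import numpy as np


def try_as_array(obj):
    """Return np.array(obj), or None if the conversion raises.

    Hypothetical helper for illustration; not a NumPy function.
    """
    try:
        return np.array(obj)
    except (TypeError, ValueError):
        return None


from_list = try_as_array([1, 2, 3])       # a list, the most obvious case
from_tuple = try_as_array((1.5, 2.5))     # a tuple works the same way

print(from_list.shape, from_list.dtype.kind)   # (3,) i
print(from_tuple.shape, from_tuple.dtype)      # (2,) float64
```

Keep in mind that a successful conversion says little on its own, since NumPy can wrap nearly any object.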

For more information, read:

It turns out almost anything is technically an array-like. "Array-like" is more of a statement of how the input will be interpreted than a restriction on what the input can be; if a parameter is documented as array-like, NumPy will try to interpret it as an array.

There is no formal definition of array-like beyond the nearly tautological one -- an array-like is any Python object that np.array can convert to an ndarray. To go beyond this, you'd need to study the source code.
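To see how permissive this is, here is a small sketch with a class that has nothing to do with arrays. Unrelated is a made-up name used only for this illustration.

```python
import numpy as np


class Unrelated:
    """Hypothetical class with no sequence or array behaviour at all."""


obj = Unrelated()
wrapped = np.array(obj)

print(wrapped.shape)          # ()
print(wrapped.dtype)          # object
print(wrapped.item() is obj)  # True: the array simply holds the object
```

The result is a 0-dimensional array of object dtype, so the call "works" even though the input is not array-shaped in any useful sense.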

NPY_NO_EXPORT PyObject *
PyArray_FromAny(PyObject *op, PyArray_Descr *newtype, int min_depth,
                int max_depth, int flags, PyObject *context)
{
    /*
     * This is the main code to make a NumPy array from a Python
     * Object.  It is called from many different places.
     */
    PyArrayObject *arr = NULL, *ret;
    PyArray_Descr *dtype = NULL;
    int ndim = 0;
    npy_intp dims[NPY_MAXDIMS];

    /* Get either the array or its parameters if it isn't an array */
    if (PyArray_GetArrayParamsFromObject(op, newtype,
                        0, &dtype,
                        &ndim, dims, &arr, context) < 0) {
        Py_XDECREF(newtype);
        return NULL;
    }
    ...

Particularly interesting is PyArray_GetArrayParamsFromObject, whose comments enumerate the types of objects np.array expects:

NPY_NO_EXPORT int
PyArray_GetArrayParamsFromObject(PyObject *op,
                        PyArray_Descr *requested_dtype,
                        npy_bool writeable,
                        PyArray_Descr **out_dtype,
                        int *out_ndim, npy_intp *out_dims,
                        PyArrayObject **out_arr, PyObject *context)
{
    PyObject *tmp;

    /* If op is an array */

    /* If op is a NumPy scalar */

    /* If op is a Python scalar */

    /* If op supports the PEP 3118 buffer interface */

    /* If op supports the __array_struct__ or __array_interface__ interface */

    /*
     * If op supplies the __array__ function.
     * The documentation says this should produce a copy, so
     * we skip this method if writeable is true, because the intent
     * of writeable is to modify the operand.
     * XXX: If the implementation is wrong, and/or if actual
     *      usage requires this behave differently,
     *      this should be changed!
     */

    /* Try to treat op as a list of lists */

    /* Anything can be viewed as an object, unless it needs to be writeable */

}

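So by studying the source code we can conclude that an array-like is:

  • a NumPy array, or
  • a NumPy scalar, or
  • a Python scalar, or
  • any object that supports the PEP 3118 buffer interface, or
  • any object that supports the __array_struct__ or __array_interface__ interface, or
  • any object that supplies the __array__ function, or
  • any object that can be treated as a list of lists, or
  • anything else, since any object can be viewed as an object (unless it needs to be writeable).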
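The cases named in the source comments above can be tried one by one. The sketch below is illustrative only: HasArrayMethod is a hypothetical class, and array.array stands in for "an object that supports the buffer interface".

```python
import array

import numpy as np


class HasArrayMethod:
    """Hypothetical class that supplies the __array__ function."""

    def __array__(self, dtype=None, copy=None):
        return np.array([1.0, 2.0], dtype=dtype)


inputs = {
    "ndarray": np.arange(3),
    "NumPy scalar": np.float32(1.5),
    "Python scalar": 7,
    "buffer (array.array)": array.array("d", [1.0, 2.0, 3.0]),
    "__array__ method": HasArrayMethod(),
    "list of lists": [[1, 2], [3, 4]],
    "anything else": object(),
}

# Convert every input and keep the results for inspection.
results = {label: np.array(value) for label, value in inputs.items()}

for label, result in results.items():
    print(f"{label:22s} shape={result.shape} dtype={result.dtype}")
```

Only the last entry falls through to the catch-all case and ends up as a 0-dimensional array of object dtype.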

It's just a concept. Besides the User Guide explanation mentioned in other answers, there is an official statement about it in the NumPy Glossary:

array_like

Any sequence that can be interpreted as an ndarray. This includes nested lists, tuples, scalars and existing arrays.

So even scalars count as array-like, as in np.array(1024).
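A quick check of the scalar case, as a minimal sketch comparing it with a one-element list:

```python
import numpy as np

zero_d = np.array(1024)     # scalar input
one_d = np.array([1024])    # one-element list input

print(zero_d.ndim, zero_d.shape)   # 0 ()
print(one_d.ndim, one_d.shape)     # 1 (1,)
print(zero_d.item())               # 1024
```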

NumPy 1.20 introduces numpy.typing.ArrayLike.


Originally defined as follows in this commit:

class _SupportsArray(Protocol):
    @overload
    def __array__(self, __dtype: DtypeLike = ...) -> ndarray: ...
    @overload
    def __array__(self, dtype: DtypeLike = ...) -> ndarray: ...


ArrayLike = Union[bool, int, float, complex, _SupportsArray, Sequence]

However, a more recent definition of ArrayLike can be found in numpy/_typing/_array_like.py:

_ArrayLike = Union[
    _NestedSequence[_SupportsArray[_DType]],
    _NestedSequence[_T],
]


ArrayLike = Union[
    _RecursiveSequence,
    _ArrayLike[
        "dtype[Any]",
        Union[bool, int, float, complex, str, bytes]
    ],
]
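In user code the alias is mostly useful as an annotation. A minimal sketch, assuming a NumPy version that ships numpy.typing; total is a hypothetical function name:

```python
import numpy as np
from numpy.typing import ArrayLike


def total(values: ArrayLike) -> float:
    """Sum any array-like input (hypothetical helper for illustration)."""
    return float(np.asarray(values).sum())


print(total([1, 2, 3]))                    # 6.0
print(total(((1.5, 2.5), (3.0, 4.0))))     # 11.0
print(total(5))                            # 5.0
```

The annotation only helps static type checkers; at runtime np.asarray still decides what is accepted.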

I posted this as a comment but I guess I'll put it up as an answer too.

It turns out the concept of "array-like" is less of an abstract base class or protocol and more about HOW the array-likeness of various objects will be treated. That is, array-like is a statement about what will be done with the object when it is supplied as an array-like argument. But just about any object can be supplied as an array-like argument.

For many objects, array-likeness is treated much the same as being an iterable. This is true of sequence objects:

>>> x=[1,2,3]
>>> a = np.array(x)
>>> a[0]
1

However, numpy does not treat iterable objects in general (when supplied as an array-like argument) in the same way. Instead, it treats array-like objects as either nested or atomic, depending on the type.

Here are a few examples of objects that are iterable (an idea that sounds closely related to array-like, but is quite different), but that numpy treats as atomic.

  • strings (str objects)
  • dictionaries/mappings
  • sets
  • iterators
  • buffers/file handlers

The array() factory treats all of these as atomic (i.e., not nested) values. In short: array-like is in no way a synonym for typing.Iterable.
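Here is a small sketch of the atomic treatment for a few of the iterables listed above, plus two ways to get an element-wise array when that is what you actually want:

```python
import numpy as np

atomic_inputs = {
    "str": "abc",
    "dict": {"a": 1, "b": 2},
    "set": {1, 2, 3},
    "generator": (n for n in range(3)),
}

# Each iterable becomes a single 0-dimensional element, not a sequence.
atomic_results = {label: np.array(value) for label, value in atomic_inputs.items()}
for label, arr in atomic_results.items():
    print(label, arr.shape, arr.dtype)     # shape is () in every case

# Materialise the iterable first, or use np.fromiter, to get its elements.
from_list = np.array(list(n for n in range(3)))
from_iter = np.fromiter((n for n in range(3)), dtype=int)
print(from_list.shape, from_iter.shape)    # (3,) (3,)
```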

According to https://numpy.org/doc/stable/reference/arrays.interface.html

"The array interface (sometimes called array protocol) was created in 2005 as a means for array-like Python objects to re-use each other’s data buffers intelligently whenever possible."

I believe any object that implements the array interface is array-like. The interface defines properties such as shape and type, so that other objects can interpret and share the array data properly.
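As a rough sketch of what that looks like in practice, the hypothetical SharedBuffer class below exposes __array_interface__ by delegating to an ndarray it owns. A real implementation would typically build the dictionary itself from its own memory layout.

```python
import numpy as np


class SharedBuffer:
    """Hypothetical container that exposes its data via the array interface."""

    def __init__(self, values):
        self._data = np.asarray(values, dtype=np.float64)

    @property
    def __array_interface__(self):
        # Reuse the dict (shape, typestr, data pointer, ...) of the owned array.
        return self._data.__array_interface__


wrapper = SharedBuffer([1.0, 2.0, 3.0])
view = np.asarray(wrapper)       # no copy: built from the interface dict

view[0] = 99.0                   # a write through the view...
print(wrapper._data[0])          # 99.0 ...is visible in the original buffer
print(np.shares_memory(view, wrapper._data))   # True
```

np.asarray is used here rather than np.array because the latter copies by default, which would hide the buffer sharing.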