How to fix "TypeError: an integer is required (got type bytes)" error when trying to run pyspark after installing Spark 2.4.4

I've installed OpenJDK 13.0.1, Python 3.8 and Spark 2.4.4. The instructions for testing the install are to run ./bin/pyspark from the root of the Spark installation. I'm not sure whether I missed a step in the Spark installation, like setting some environment variable, but I can't find any further detailed instructions.

I can run the Python interpreter on my machine, so I'm confident it is installed correctly, and running "java -version" gives me the expected response, so I don't think the problem is with either of those.

I get a stack of errors from cloudpickle.py:

Traceback (most recent call last):
  File "C:\software\spark-2.4.4-bin-hadoop2.7\bin\..\python\pyspark\shell.py", line 31, in <module>
    from pyspark import SparkConf
  File "C:\software\spark-2.4.4-bin-hadoop2.7\python\pyspark\__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "C:\software\spark-2.4.4-bin-hadoop2.7\python\pyspark\context.py", line 31, in <module>
    from pyspark import accumulators
  File "C:\software\spark-2.4.4-bin-hadoop2.7\python\pyspark\accumulators.py", line 97, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "C:\software\spark-2.4.4-bin-hadoop2.7\python\pyspark\serializers.py", line 71, in <module>
    from pyspark import cloudpickle
  File "C:\software\spark-2.4.4-bin-hadoop2.7\python\pyspark\cloudpickle.py", line 145, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
  File "C:\software\spark-2.4.4-bin-hadoop2.7\python\pyspark\cloudpickle.py", line 126, in _make_cell_set_template_code
    return types.CodeType(
TypeError: an integer is required (got type bytes)

This is happening because you're using python 3.8. The latest pip release of pyspark (pyspark 2.4.4 at time of writing) doesn't support python 3.8. Downgrade to python 3.7 for now, and you should be fine.
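For background, the failure comes from cloudpickle calling types.CodeType() with the Python 3.7 positional signature; Python 3.8 inserted a new posonlyargcount parameter right after argcount, so every later argument shifts by one slot and the bytes co_code lands where the integer flags is expected. Below is a minimal illustration of a version-aware rebuild; it is my own sketch (rebuild_code is a made-up helper), not pyspark or cloudpickle code:

import sys
import types

def rebuild_code(co):
    """Copy a code object in a way that works on both 3.7 and 3.8+."""
    if sys.version_info >= (3, 8):
        # CodeType.replace() is new in 3.8 and sidesteps the enlarged signature.
        return co.replace()
    # Python 3.7 positional order; on 3.8 this exact call raises
    # "TypeError: an integer is required (got type bytes)".
    return types.CodeType(
        co.co_argcount, co.co_kwonlyargcount, co.co_nlocals, co.co_stacksize,
        co.co_flags, co.co_code, co.co_consts, co.co_names, co.co_varnames,
        co.co_filename, co.co_name, co.co_firstlineno, co.co_lnotab,
        co.co_freevars, co.co_cellvars,
    )

print(rebuild_code(rebuild_code.__code__))  # works on 3.7 and on 3.8+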

Try installing the latest version of pyinstaller that is compatible with Python 3.8, using this command:

pip install https://github.com/pyinstaller/pyinstaller/archive/develop.tar.gz

reference:
https://github.com/pyinstaller/pyinstaller/issues/4265

As a dirty workaround, one can replace _cell_set_template_code with the Python 3-only implementation suggested by the docstring of the _make_cell_set_template_code function (a sketch of wiring this in follows the quoted note):

Notes
-----
In Python 3, we could use an easier function:

.. code-block:: python

    def f():
        cell = None

        def _stub(value):
            nonlocal cell
            cell = value

        return _stub

    _cell_set_template_code = f()
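One way to wire that replacement in is sketched below. This is my own rough sketch, not the gist linked next: it assumes the surrounding cell_set helper in cloudpickle.py builds a function out of this object with types.FunctionType, in which case the stub's __code__ is what should be stored, and it relies on _make_cell_set_template_code being the existing function defined just above this assignment in that file:

import sys


def f():
    cell = None

    def _stub(value):
        nonlocal cell
        cell = value

    return _stub


if sys.version_info >= (3, 8):
    # Let the compiler emit the LOAD_FAST(arg); STORE_DEREF pair for us.
    _cell_set_template_code = f().__code__
else:
    # Keep the original hand-built code object on older interpreters.
    _cell_set_template_code = _make_cell_set_template_code()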

Here is a patch for spark v2.4.5: https://gist.github.com/ei-grad/d311d0f34b60ebef96841a3a39103622

Apply it by:

git apply <(curl https://gist.githubusercontent.com/ei-grad/d311d0f34b60ebef96841a3a39103622/raw)

This fixes the problem with ./bin/pyspark, but ./bin/spark-submit uses the bundled pyspark.zip, which has its own copy of cloudpickle.py. And even if it were fixed there, it still wouldn't work: it would fail with the same error while unpickling some object in pyspark/serializers.py.
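For reference, that bundled copy can be listed directly; a small sketch of my own, assuming the same spark-2.4.4-bin-hadoop2.7 layout as in the traceback above:

import zipfile

# spark-submit ships PySpark's Python sources inside this archive, so a patch
# applied to python/pyspark/cloudpickle.py on disk never reaches it.
archive = r"C:\software\spark-2.4.4-bin-hadoop2.7\python\lib\pyspark.zip"

with zipfile.ZipFile(archive) as zf:
    print([name for name in zf.namelist() if name.endswith("cloudpickle.py")])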

But it looks like Python 3.8 support has already arrived in Spark v3.0.0-preview2, so one can try that. Or stick to Python 3.7, as the accepted answer suggests.

It's a Python and PySpark version mismatch, as John rightly pointed out. For a newer Python version you can try:

pip install --upgrade pyspark

That will update the package if a newer one is available. If this doesn't help, you might have to downgrade to a compatible version of Python.


The pyspark package documentation clearly states:

NOTE: If you are using this with a Spark standalone cluster you must ensure that the version (including minor version) matches or you may experience odd errors.
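A quick way to check whether the interpreter and the installed package actually line up is a sanity check like the one below (my own snippet, not part of the docs):

import sys
import pyspark

# PySpark 2.4.x needs Python <= 3.7; Python 3.8 needs PySpark 3.0.0 or newer.
print("Python :", sys.version.split()[0])
print("PySpark:", pyspark.__version__)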

Make sure to use the right versions of Java, Python and Spark. I got the same error caused by an outdated Spark version (Spark 2.4.7).

Downloading the latest Spark 3.0.1, alongside Python 3.8 (as part of Anaconda3 2020.07) and Java JDK 8, solved the problem for me!
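Once the versions line up, a short smoke test confirms the error is gone; a minimal sketch, assuming pyspark is importable (e.g. installed via pip or conda):

from pyspark.sql import SparkSession

# Importing pyspark already exercises the cloudpickle path that raised the
# TypeError; running a tiny local job proves the rest of the stack works too.
spark = (SparkSession.builder
         .master("local[1]")
         .appName("py38-smoke-test")
         .getOrCreate())

print("Spark version:", spark.version)
print(spark.range(5).count())  # should print 5

spark.stop()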

The problem with Python 3.8 has been resolved in the most recent versions. I got this error because my scikit-learn version was very outdated; running

pip install scikit-learn --upgrade

solved the problem.

As of today (21 Oct 2022), I can confirm that, on Anaconda,

python   3.8.13
pyspark  3.1.2

work together.

I had the same issue earlier: all I had to do was run a

conda update --all

and wait.