Is there a way to run multiple cells simultaneously in an IPython notebook?

One cell in my notebook takes a long time to execute, while the other CPUs in the machine sit idle. Is it possible to run other cells in parallel?


Yes. Here is the documentation for ipyparallel (formerly IPython parallel), which shows you how to spawn multiple IPython kernels. Once they are running, you are free to distribute work across the cores: prefix cells with %%px0, %%px1 ... %%px999 (once set up) to execute each cell on a specific engine, which in practice amounts to parallel execution of cells. I would suggest having a look at Dask as well.

This does not answer your question directly, but I think it will help a lot of people who have the same problem. You can easily move variables between notebooks, continue running the functions in another notebook, and then move the result back to the main notebook.

For example:

Notebook 1:

%store X
%store y

Notebook 2:

%store -r X
%store -r y


new_df = ...
%store new_df

Notebook 1:

%store -r new_df

I want to introduce a library that has this feature and does not require tricks with multiple notebooks.

Parsl is a library for productive parallel programming in Python.

Configuration

import parsl
from parsl.app.app import python_app, bash_app
parsl.load()

As an example, here is a snippet adapted from the parsl/parsl-tutorial repository.

# App that generates a random number after a delay
@python_app
def generate(limit, delay):
    from random import randint
    import time
    time.sleep(delay)
    return randint(1, limit)


# Generate 5 random numbers between 1 and 10
import time
st = time.time()
rand_nums = []
for i in range(5):
    rand_nums.append(generate(10, 1))


# Wait for all apps to finish and collect the results
outputs = [i.result() for i in rand_nums]
et = time.time()
print(f"Execution time: {et - st:.2f}")


# Print results
print(outputs)

Result:

Execution time: 3.00
[1, 6, 4, 8, 3]

Note that the code takes about 3 s to execute rather than the 5 s it would take if the five delays ran serially.

So what you can do is call the function (in this example, generate(...)) in a cell. generate(...) returns a future-like object. If you then call .result() on that object, it will either:

  1. Block the program until the result is ready, if it is still running.
  2. Return the result, if it has completed.

Therefore, as long as you call .result() only in the last few cells, the work keeps running in the background, and by the time those cells execute you can be sure the results can be collected.
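The same submit-now, collect-later pattern exists in the standard library's concurrent.futures, if you want to see the blocking behaviour of .result() without installing Parsl:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_square(x, delay=1.0):
    time.sleep(delay)      # stand-in for a long computation
    return x * x

pool = ThreadPoolExecutor(max_workers=5)

start = time.time()
# submit() returns immediately with a Future, much like calling a @python_app
futures = [pool.submit(slow_square, i) for i in range(5)]

# .result() blocks until that future is done, then returns its value
values = [f.result() for f in futures]
elapsed = time.time() - start

print(values)       # [0, 1, 4, 9, 16]
print(elapsed < 5)  # the five sleeps overlapped, so well under 5 s total
pool.shutdown()
```

Parsl's futures behave the same way from the caller's point of view; Parsl adds the app decorators, dependency tracking, and multi-node executors on top.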

Regarding data dependencies, Parsl is smart about them: an app will wait for the data it depends on, even when that data is produced by another function decorated with @python_app.

I got very hopeful with Matt's answer about the ipyparallel module, but the truth is that ipyparallel does not run two cells in parallel. It lets you work on two or more engines, but not simultaneously.

Take this example: you run the first snippet, and one second later you run the second one, each in a different cell:

%%px --targets 0
import time
for i in range(0, 6):
    time.sleep(1)
    print(time.ctime())

Gives:

Thu Jun 16 10:30:53 2022
Thu Jun 16 10:30:54 2022
Thu Jun 16 10:30:55 2022
Thu Jun 16 10:30:56 2022
Thu Jun 16 10:30:57 2022

And

%%px --targets 1
import time
for i in range(0, 6):
    time.sleep(1)
    print(time.ctime())

Gives:

Thu Jun 16 10:30:59 2022
Thu Jun 16 10:31:00 2022
Thu Jun 16 10:31:01 2022
Thu Jun 16 10:31:02 2022
Thu Jun 16 10:31:03 2022

So in conclusion, the cells are not running at the same time; they are just running on different engines. The second cell waits for the first one to finish, and only once it finishes does the second cell start.

Hope there is a simple solution for this -.-


When someone wanted to leave a long-running calculation going in the background while running other things in the notebook, we were able to hack together a solution using Python's multiprocessing. That allowed a long-running cell to keep executing while another cell ran, both in the classic notebook interface and in JupyterLab, see here.