How do I set up workers for parallel processing in R using snowfall and multiple Windows nodes?

I have successfully used snowfall to set up a cluster on a single server with 16 processors.

require(snowfall)
if (sfIsRunning() == TRUE) sfStop()


number.of.cpus <- 15
sfInit(parallel = TRUE, cpus = number.of.cpus)
stopifnot( sfCpus() == number.of.cpus )
stopifnot( sfParallel() == TRUE )


# Print the hostname for each cluster member
sayhello <- function() {
  info <- Sys.info()[c("nodename", "machine")]
  paste("Hello from", info[1], "with CPU type", info[2])
}
names <- sfClusterCall(sayhello)
print(unlist(names))

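Now I am looking for complete instructions on how to move to a distributed model. I have 4 different Windows machines with a total of 16 cores that I would like to use as a 16-node cluster. So far, I understand that I could either set up a SOCK connection manually or use MPI. Both appear possible, but I have not found clear and complete directions for either.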

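The SOCK route appears to depend on code in a script under the snow library directory (snowlib). I can generate a stub from the master end with the following code: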

winOptions <-
list(host="172.01.01.03",
rscript="C:/Program Files/R/R-2.7.1/bin/Rscript.exe",
snowlib="C:/Rlibs")


cl <- makeCluster(c(rep(list(winOptions), 2)), type = "SOCK", manual = T)

This yields the following:

Manually start worker on 172.01.01.03 with
"C:/Program Files/R/R-2.7.1/bin/Rscript.exe"
C:/Rlibs/snow/RSOCKnode.R
MASTER=Worker02 PORT=11204 OUT=/dev/null SNOWLIB=C:/Rlibs
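The printed instruction is a single command line split over several lines. As a minimal sketch, assuming the values shown in the output above, it can be reassembled as follows. `build_worker_command` is a hypothetical helper name, not part of snow or snowfall.

```r
# Hypothetical helper: assemble the worker start command that
# makeCluster(..., manual = TRUE) prints, as one command line.
build_worker_command <- function(rscript, snowlib, master, port,
                                 out = "/dev/null") {
  # RSOCKnode.R sits inside the snow package directory under snowlib
  node_script <- file.path(snowlib, "snow", "RSOCKnode.R")
  paste(
    shQuote(rscript, type = "cmd"),  # quoted because the path contains spaces
    node_script,
    paste0("MASTER=", master),
    paste0("PORT=", port),
    paste0("OUT=", out),
    paste0("SNOWLIB=", snowlib)
  )
}

cmd <- build_worker_command(
  rscript = "C:/Program Files/R/R-2.7.1/bin/Rscript.exe",
  snowlib = "C:/Rlibs",
  master  = "Worker02",
  port    = 11204
)
cat(cmd, "\n")
```

The resulting string is what would be typed into a command prompt on the worker machine while the master is waiting inside makeCluster.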

I found the code for RSOCKnode.R in the snow package on GitHub:

local({
master <- "localhost"
port <- ""
snowlib <- Sys.getenv("R_SNOW_LIB")
outfile <- Sys.getenv("R_SNOW_OUTFILE") ##**** defaults to ""; document


args <- commandArgs()
pos <- match("--args", args)
args <- args[-(1 : pos)]
for (a in args) {
pos <- regexpr("=", a)
name <- substr(a, 1, pos - 1)
value <- substr(a,pos + 1, nchar(a))
switch(name,
MASTER = master <- value,
PORT = port <- value,
SNOWLIB = snowlib <- value,
OUT = outfile <- value)
}


if (! (snowlib %in% .libPaths()))
.libPaths(c(snowlib, .libPaths()))
library(methods) ## because Rscript as of R 2.7.0 doesn't load methods
library(snow)


if (port == "") port <- getClusterOption("port")


sinkWorkerOutput(outfile)
cat("starting worker for", paste(master, port, sep = ":"), "\n")
slaveLoop(makeSOCKmaster(master, port))
})
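To make the argument handling in the script easier to follow, here is a minimal sketch of the same NAME=value parsing as a standalone function. `parse_worker_args` is a hypothetical name, and the defaults are simplified: the real script reads `snowlib` and `outfile` from environment variables. Note that the script ends by calling `makeSOCKmaster(master, port)`, which suggests that the worker connects out to the master rather than listening itself.

```r
# Sketch of the NAME=value parsing done in RSOCKnode.R, as a plain function.
# parse_worker_args is a hypothetical name, not part of snow.
parse_worker_args <- function(args) {
  # Simplified defaults; the real script takes SNOWLIB and OUT from
  # the R_SNOW_LIB and R_SNOW_OUTFILE environment variables.
  opts <- list(MASTER = "localhost", PORT = "", SNOWLIB = "", OUT = "")
  for (a in args) {
    pos <- regexpr("=", a, fixed = TRUE)  # position of the first "="
    if (pos < 1) next                     # skip arguments without "="
    name  <- substr(a, 1, pos - 1)
    value <- substr(a, pos + 1, nchar(a))
    if (name %in% names(opts)) opts[[name]] <- value
  }
  opts
}

opts <- parse_worker_args(c("MASTER=Worker02", "PORT=11204",
                            "OUT=/dev/null", "SNOWLIB=C:/Rlibs"))
str(opts)
```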

It is not clear how to start a SOCK listener on the workers, unless that is buried in snow::recvData.

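Looking into the MPI route, as far as I can tell, Microsoft MPI version 7 is a starting point. However, I could not find a Windows alternative to sfCluster. I was able to start the MPI service, but it does not appear to listen on port 22, and no amount of bashing against it with snowfall::makeCluster has yielded a result. I have disabled the firewall and tried testing with makeCluster, and then connecting directly from the master to the worker with PuTTY.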


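Is there a comprehensive, step-by-step guide to setting up a snowfall cluster with Windows workers that I have missed? I am fond of snowfall::sfClusterApplyLB and would like to keep using it, but if there is an easier solution, I am willing to change course. Looking into Rmpi and parallel, I found alternative solutions for the master side of the work, but still little to no specific detail on how to set up workers running Windows.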

Due to the nature of the work environment, moving to AWS or to Linux is not an option.



I considered several options for the HPC infrastructure: MPICH, Open MPI, and MS MPI. I initially tried MPICH2 but gave up, because the latest stable release for Windows, 1.4.1, dates back to 2013 and has not been supported since. Open MPI does not support Windows. That leaves MS MPI as the only option.

Unfortunately, snowfall does not support MS MPI, so I decided to go with the pbdMPI package, which supports MS MPI by default. pbdMPI implements the SPMD paradigm, in contrast with Rmpi, which uses manager/worker parallelism.
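To illustrate what SPMD means in practice, here is a minimal sketch in which every process runs the same code and selects its own share of the work from its rank. In a real run the rank and size would come from pbdMPI's comm.rank() and comm.size(); here they are plain arguments so that the logic can be tried without MPI. `chunk_for_rank` is a hypothetical helper name.

```r
# SPMD sketch: every process runs this same code and picks its own slice.
# rank and size stand in for pbdMPI's comm.rank() and comm.size().
# chunk_for_rank is a hypothetical helper name.
chunk_for_rank <- function(n_tasks, rank, size) {
  ids <- seq_len(n_tasks)
  # Round-robin assignment: task i goes to rank (i - 1) %% size
  ids[(ids - 1) %% size == rank]
}

# Simulate 3 processes (ranks 0, 1, 2) sharing 8 tasks
for (rank in 0:2) {
  cat("rank", rank, "handles tasks", chunk_for_rank(8, rank, 3), "\n")
}
```

There is no manager handing out tasks here: each rank works out its own assignment, which is the main difference from the manager/worker style that snowfall and Rmpi use.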

MS MPI installation, configuration, and execution

  1. Install MS MPI v.10.1.2 on all machines that will form the Windows HPC cluster.
  2. Create a directory accessible to all nodes, where the R scripts and resources will reside, for example \\HeadMachine\SharedDir.
  3. Check that the MS MPI Launch Service (MsMpiLaunchSvc) is running on all nodes.
  4. Check that MS MPI has the rights to run R on all nodes on behalf of the same user, for example SharedUser. The user name and password must be the same on all machines.
  5. Check that R is launched on behalf of the SharedUser user.
  6. Finally, execute mpiexec with the following options:

mpiexec.exe -n %1 -machinefile "C:\MachineFileDir\hosts.txt" -pwd SharedUserPassword -wdir "\\HeadMachine\SharedDir" Rscript hello.R

where

  • -n is the number of MPI processes to launch. %1 is batch-file syntax for the first argument, so the command is presumably run from a batch file.
  • -wdir is a network path to the directory with the shared resources.
  • -pwd is the password of the SharedUser user, for example SharedUserPassword.
  • -machinefile is the path to the hosts.txt text file, for example C:\MachineFileDir\hosts.txt. The file must be readable from the head node at the specified path, and it contains a list of the IP addresses of the nodes on which the R script is to be run.
  7. As a result of the command in Step 6, MPI logs in as SharedUser with the password SharedUserPassword and starts the R processes on each computer listed in the hosts.txt file.
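If the launch is driven from R on the head node rather than from a batch file, the same options can be assembled as an argument vector. This is only a sketch: `mpiexec_args` is a hypothetical helper, the process count of 6 is an assumed value, and the share is written as the UNC path \\HeadMachine\SharedDir, which is an assumption about the intended location.

```r
# Hypothetical helper that assembles the mpiexec options described above.
mpiexec_args <- function(n, machinefile, wdir, script, password = NULL) {
  args <- c("-n", as.character(n),
            "-machinefile", shQuote(machinefile, type = "cmd"))
  # -pwd is only added when a password is supplied
  if (!is.null(password)) args <- c(args, "-pwd", password)
  c(args, "-wdir", shQuote(wdir, type = "cmd"), "Rscript", script)
}

args <- mpiexec_args(
  n           = 6,                                # assumed process count
  machinefile = "C:\\MachineFileDir\\hosts.txt",
  wdir        = "\\\\HeadMachine\\SharedDir",     # assumed UNC path
  script      = "hello.R",
  password    = "SharedUserPassword"
)
cat("mpiexec.exe", args, "\n")

# To launch from R on the head node (not run here):
# system2("mpiexec.exe", args)
```

Building the arguments as a vector keeps the quoting of paths in one place and makes it easy to print the full command before running it.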

Details

hello.R:

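# Load pbdMPI and initialise MPI
library(pbdMPI, quietly = TRUE)
init()
# Each process reports its own rank and the total number of processes
cat("Hello World from process", comm.rank(), "of", comm.size(), "!\n")
finalize()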

hosts.txt

hosts.txt, the MPI machine file, is a text file in which each line contains the network name of a computer on which the R script will be launched. For MS MPI, the name is followed, separated by a space, by the number of MPI processes to launch on that computer. Usually this equals the number of processors in the node.

Sample hosts.txt for three nodes with 2 processors each:

192.168.0.1 2
192.168.0.2 2
192.168.0.3 2
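As a small sketch of how this file relates to the -n option, the following R code writes the sample file and adds up the per-node process counts. The temporary file location and the variable names are assumptions for the demo; in practice the file would live at the path given to -machinefile.

```r
# Sketch: generate hosts.txt from a table of nodes and derive the -n value.
nodes <- data.frame(
  host  = c("192.168.0.1", "192.168.0.2", "192.168.0.3"),
  procs = c(2L, 2L, 2L),
  stringsAsFactors = FALSE
)

hosts_file <- file.path(tempdir(), "hosts.txt")  # temp location for the demo
writeLines(paste(nodes$host, nodes$procs), hosts_file)

# Read the file back and add up the per-node process counts
parsed <- read.table(hosts_file,
                     col.names  = c("host", "procs"),
                     colClasses = c("character", "integer"))
total_n <- sum(parsed$procs)
cat("total MPI processes:", total_n, "\n")
```

With the sample file, the total is the value that would be passed to mpiexec as -n to use every listed processor.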