Windows distributed cluster
Distributed computing infrastructure, using text scripts.
Optional: AutoIt-based screensaver for pausable executables on Windows nodes.
Benefit: no centralized host is needed. The job can be initiated from one "master" node and the results retrieved by a different machine, so no dedicated master is required.
Compared to the above tools, this approach carries a lot of administrative overhead, but it is the most accessible for me since I have access to hundreds of Windows computers. The nodes do not need to be rebooted, nor does additional software need to be installed. The process can be broken into three steps:
A chunked executable is pushed to each node
The chunk is remotely executed
After the node is done, the finished data is retrieved, then concatenated into one file.
The requirements for this system are:
a local administrator account on each computer (node)
Windows file sharing turned on for each computer (node)
a job that can be run from the CLI (such as a Fortran or C executable) with no GUI
A more detailed description of the three-step process follows:
1.) pushing the chunked executable to nodes:
a computer is set up as the initialization master, which has the scripts (listed below), a list of the nodes to be used for this job, and the input files and executable job
The input files should be set up with the number of nodes in mind: if your loop is do i=1:1000 and you are using ten nodes, then each input file should specify a loop upper limit of 100
each node should have Windows file sharing turned on
the local administrator account information should be known
DO: if necessary, set the unique random seed files for each node
DO: run the script that copies the scripts, executables, and input files to the remote machines
2.) remotely executing the chunks:
DO: remotely execute the scripts, which in turn execute the job
3.) data retrieval and concatenation:
a computer is set up as the retrieval master [doesn't need to be the same as the initialization master], which has the scripts (listed below), a list of the nodes that were used for this job, and a Windows-accessible share
DO: run the script that zips, then retrieves the data from the nodes
DO: run the script that unzips the copied data and concatenates the pieces
to put the pieces back together on Windows, use Cygwin's "cat": http://www.cygwin.com/
or install cat for windows: http://gnuwin32.sourceforge.net/packages/coreutils.htm
On Linux, "cat" is built in
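As a rough illustration of the input-file setup in step 1, the loop limits for each node can be computed rather than typed by hand. This is only a sketch in POSIX shell: the function name chunk_bounds and the choice to hand out absolute first/last indices (instead of giving every node the same upper limit of 100, as described above) are my assumptions, not part of the original scripts.

```shell
#!/bin/sh
# chunk_bounds TOTAL NODES INDEX   (hypothetical helper)
# Print "first last" loop limits for node INDEX (1-based) when TOTAL
# iterations are split over NODES nodes. If TOTAL does not divide evenly,
# the first (TOTAL mod NODES) nodes each take one extra iteration.
chunk_bounds() (
    total=$1; nodes=$2; index=$3
    base=$(( total / nodes ))     # iterations every node gets
    extra=$(( total % nodes ))    # leftover iterations to spread out
    if [ "$index" -le "$extra" ]; then
        first=$(( (index - 1) * (base + 1) + 1 ))
        last=$(( first + base ))
    else
        first=$(( extra * (base + 1) + (index - 1 - extra) * base + 1 ))
        last=$(( first + base - 1 ))
    fi
    echo "$first $last"
)
```

For the 1000-iteration, ten-node example, chunk_bounds 1000 10 1 prints "1 100" and chunk_bounds 1000 10 10 prints "901 1000". The chunk size for a node is last minus first plus one.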
If the nodes were purely Linux-based, they could be controlled from a Windows master (using PuTTY and WinSCP, or Cygwin) or a Linux master (using ssh and scp). The equivalent shell script would look like
$ scp compiled_executable user@remotehostnode:/home/user/distributed_files
$ scp seed.input user@remotehostnode:/home/user/
$ scp parameters.input user@node:/home/user
$ ssh user@remotehostnode nohup nice /home/user/distributed_files/compiled_executable
or possibly
$ ssh user@node 'nice nohup /home/user/executable >> output.txt'
<wait for program to run>
## to display a system's vital stats and the process using the most CPU:
$ ssh user@node 'top -b -n 1 | head -n 8'
$ scp user@remotehostnode:/home/user/distributed_files/results .
$ scp user@node:/home/user/executable .
$ scp user@node:/home/user/seed.input .
$ scp user@node:/home/user/parameters.input .
$ scp user@node:/home/user/output.dat .
$ cat output.dat >> outputAll.dat
http://bashcurescancer.com/run_remote_commands_with_ssh.html
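With more than one node, each retrieved output.dat would overwrite the previous one locally, so the pieces need distinct names before they are joined. The sketch below assumes a hypothetical nodes.txt with one hostname per line and reuses the user name and paths from the commands above. The names fetch_outputs and merge_outputs are illustrative, not part of any existing script.

```shell
#!/bin/sh
# fetch_outputs NODEFILE DIR   (hypothetical helper)
# Copy output.dat from every node listed in NODEFILE (one hostname per
# line, an assumed format) into DIR as output.<node>.dat.
fetch_outputs() (
    nodefile=$1; dir=$2
    while read -r node; do
        [ -n "$node" ] || continue
        # </dev/null keeps scp from reading the rest of the node list
        scp "user@$node:/home/user/output.dat" "$dir/output.$node.dat" </dev/null
    done < "$nodefile"
)

# merge_outputs DIR   (hypothetical helper)
# Concatenate DIR/output.<node>.dat, in the shell's glob order, into
# DIR/outputAll.dat, replacing any previous outputAll.dat.
merge_outputs() (
    dir=$1
    : > "$dir/outputAll.dat"
    for piece in "$dir"/output.*.dat; do
        [ -e "$piece" ] || continue   # no pieces found: leave the file empty
        cat "$piece" >> "$dir/outputAll.dat"
    done
)
```

Something like "fetch_outputs nodes.txt . && merge_outputs ." would then replace the last two commands of the script above. Note that glob order is alphabetical, so a node named node10 is merged before node2. If the order of the pieces matters, zero-padded node names are one way around that.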
For a Windows master controlling Windows nodes, the required capability (executing remote code from the CLI) is not built into the OS. For that reason, PsExec is used.
PsExec is part of PsTools, from
http://technet.microsoft.com/en-us/sysinternals/bb897553.aspx
One batch file copies the executable to each node listed in nodes.txt:
FOR /F "eol=; tokens=1* delims=, " %%i in (nodes.txt) do copy "myprogram.exe" \\%%i\c$\dump\
A second batch file runs the remote file on each node:
FOR /F "eol=; tokens=1* delims=, " %%i in (nodes.txt) do psexec \\%%i c:\dump\runner.cmd
The following batch file copies an executable to a remote workstation (whose name
and logon credentials are in "nodes.txt") and starts the process remotely using psexec
A second batch file script copies the results back when the remote process is completed.
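For comparison, here is roughly what the FOR /F options in the batch lines above do, written as a POSIX shell function in case the master is a Linux or Cygwin box. With eol=; lines starting with a semicolon are skipped, and with delims set to comma and space the first token on each line is the node name. The function name list_nodes and the exact nodes.txt layout are assumptions for illustration.

```shell
#!/bin/sh
# list_nodes FILE   (hypothetical helper)
# Print the first comma- or space-delimited token of each line in FILE,
# skipping blank lines and lines that start with ';'.
# Assumes node names are not preceded by spaces or commas.
list_nodes() (
    cr=$(printf '\r')
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%"$cr"}            # tolerate Windows (CRLF) line endings
        case $line in
            ''|';'*) continue ;;      # blank line or ';' comment
        esac
        node=${line%%[, ]*}           # keep text before the first comma or space
        if [ -n "$node" ]; then
            printf '%s\n' "$node"
        fi
    done < "$1"
)
```

The output is one node name per line, which can be fed to a loop that runs scp and ssh for each node.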
If you want to be polite to any users that may be on the remote machines, it would be nice
to be able to pause the execution while the node is in use. The following AutoIt script waits
for the screensaver to initiate the executable. If the executable can be paused (by changing
the status of an input file), then the AutoIt script pauses the executable when the screensaver
is interrupted (by a node user).
It might be possible to use batch files instead, if they could handle math (specifically, inequalities)
tasklist | findstr screensaver.scr
if not ERRORLEVEL 1 echo good news, screen saver is running
The last configuration I'll cover is a Linux box controlling Windows nodes. Unfortunately, the
ability to control Windows nodes from the command line is not built in, so a program like
winexe is needed. The node shares can be accessed via Samba. Regrettably, cp cannot copy directly from Linux to a Windows share (if it could, mounting would be unnecessary), so a command like this does not work:
$ cp file smb://domain\;user@<IP>/c$/
Instead, the remote share must be mounted on a local directory using Samba (CIFS):
http://www.linux.com/feature/118225
So the set of commands to control Windows nodes from Linux looks like
$ mkdir remotedir
$ sudo mount -t cifs //<IP>/c$ remotedir -o rw,username=domain\/user,password=<password>
$ cp -r for_remote/* remotedir
$ sudo umount remotedir
$ winexe -U domain/user%password //<IP> 'cmd /C c:\executable'
and then when the program finishes,
$ mkdir localresultsdir
$ mkdir rresultsdir
$ sudo mount -t cifs //<IP>/c$ rresultsdir -o rw,username=domain\/user,password=<password>
$ cp -r rresultsdir localresultsdir
$ sudo umount rresultsdir
where your Windows executable lives in for_remote, and "remotedir" is the remote Windows share mounted locally.
Your results will end up in localresultsdir.