What do I not mean? Windows servers use the word "cluster," but really they mean automatic fail-over and load balancing. Microsoft does offer high-performance computing (HPC), but that's just a minimum requirement for Vista [joke]. Below, various LiveCD implementations of Linux clustering are reviewed, with my own Windows distributed technique thrown in.
There are three ways to go about building a cluster, given many small PCs. This is not the same as a single computer, like a Cray. Clusters give you the ability to run many processes at once, as though instead of a "dual core" or "quad core" machine, you had an "N core" machine, where N scales with the number of nodes in the cluster. The three ways are (1) MPI (message passing interface), (2) openMosix, manually setting up each process, and (3) Windows distributed scripting, manually setting up each process.
The MPI route involves rewriting the code. For example, Fortran 77 --> mpif77; Fortran 90 --> mpif90; C --> mpicc. This code rewrite can sometimes take more time than it would to just write a script that "manually" sets up each node.
Not all programs are parallelizable. This clustering idea is not useful for processes where the current step in the program depends on the previous step. This type of program ("serial") is not easily distributed. The motivation for the cluster is when you have a program that uses a simple do-while loop, and each iteration is not dependent on the previous step. For example, if each loop iteration used a different random number for its calculations, this program is easily distributed across many processes [just remember to use a different seed on each process].
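As a minimal sketch of that "embarrassingly parallel" pattern, here is a shell loop that launches independent runs in the background, each with its own seed, then gathers the output. The awk one-liner is a stand-in for a real numerical executable:

```shell
# Launch N independent runs, each seeded differently, then gather the
# results. Each background job is one "node" worth of work; the awk
# command simulates a Monte Carlo executable that takes a seed.
N=4
for seed in $(seq 1 "$N"); do
  awk -v s="$seed" 'BEGIN { srand(s); printf "seed %d: %f\n", s, rand() }' \
    > "out_$seed.txt" &
done
wait                                   # block until all runs finish
cat out_1.txt out_2.txt out_3.txt out_4.txt > results.txt
```

On a real cluster, each iteration would be dispatched to a different node instead of a background process, but the structure (launch, wait, gather) is the same.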
With that disclaimer out of the way, here are the methods of implementing a cluster:
http://www.firewall.cx/linux-openmosix-building-a-cluster.php
Project: Looking for two cluster setups for a single user, for free
parallel computing, like MPI F90
research
john-mpi
possibly easier: multiple CPUs acting as a multi-core system, then using shell scripts with regular Fortran
research
djohn
Progress so far:
I haven't found any distros that work with MPIF90 out of the box. Dirk Eddelbuettel compiled a custom Quantian ISO for me that does work with mpif90.
The advantage of LiveCDs is that they allow the box to revert to its original installation, or to operate without a hard drive.
Overall, the multi-core approach (openMosix) seems easier. One caveat to ClusterKnoppix is that development stopped in 2004 (the ISO is still available). The Quantian DVD includes openMosix and supports PXE boot; it also has Fortran, MPI, and Octave.
parallel MPI, but gfortran is not recognized for mpif90
Parallel Knoppix (LiveCD)
PS3 (Fedora)
Pelican HPC (LiveCD)
Quantian (LiveDVD)
BCCD (LiveCD)
many CPUs acting as multi-core
clusterknoppix (LiveCD)
Quantian (LiveDVD)
requires admin privileges
requires lots of administrative overhead
but if you have 600 Windows nodes operating in labs (like on a university campus), this is the easiest
Testing on virtualbox:
If the network is set to "NAT", then each box behind the NAT will receive the same IP via internal DHCP, and the boxes won't be able to communicate. Instead, select "internal network," but then there is no automatic DHCP.
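The same switch can be made from the command line with VBoxManage; a sketch, where "head" and "node1" are hypothetical VM names:

```
VBoxManage modifyvm "head"  --nic1 intnet --intnet1 "clusternet"
VBoxManage modifyvm "node1" --nic1 intnet --intnet1 "clusternet"
```

Putting both VMs on the same named internal network lets the head node's DHCP server (on distros that run one, like ParallelKnoppix and PelicanHPC) hand out addresses to the compute nodes.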
to get ClusterKnoppix to work, I used the cheat code "knoppix noscsi"; otherwise the bootup hung at the SCSI probe
No MPI
No gfortran
installing gfortran via apt-get doesn't seem to work either (the kernel is 2.4)
No octave
Where to get the ISO: http://clusterknoppix.sw.be/
http://www.knoppix.net/wiki/Cheat_Codes
Parallel Knoppix
ParallelKnoppix 2.9 (Knoppix) [DHCP server, PXE boot]:
http://idea.uab.es/mcreel/ParallelKnoppix/
uses 1 CD (PXE)
MPI possible
octave MPI
gfortran works
does not appear as a multi-core system
haven't been able to get MPI to work
no longer supported. Upgrade to Pelican HPC
Run "Start PK" to create a list of nodes. Then initiate the LAM environment:
$ lamboot ~/tmp/bhosts
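The bhosts file is just a plain-text list of node hostnames or IP addresses, one per line; a hypothetical example:

```
192.168.0.101
192.168.0.102
192.168.0.103
```

lamboot reads this list and starts the LAM daemon on each listed node so that mpirun can later schedule processes across them.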
For some reason, mpif90 doesn't work initially; it needs to be tied to gfortran. In /usr/lib/mpich/bin, edit the mpif90 shell script and change f90linkerbase="" to
f90linkerbase="gfortran"
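That one-line edit can be scripted with sed; a sketch that simulates the wrapper with a stub file so the edit can be demonstrated anywhere (on ParallelKnoppix the real file is /usr/lib/mpich/bin/mpif90):

```shell
# Patch the mpif90 wrapper so it links with gfortran. The stub file
# stands in for the real wrapper script.
printf 'f90linkerbase=""\n' > mpif90.stub
sed -i 's/^f90linkerbase=""$/f90linkerbase="gfortran"/' mpif90.stub
grep 'f90linkerbase' mpif90.stub
```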
Now the mpif90 command should work. Note that
./usr/lib/mpich/bin/mpif90
doesn't work. Specify
sh /usr/lib/mpich/bin/mpif90
~/Examples$ ./usr/lib/mpich/bin/mpif77 -o pi pi.f
available commands
mpirun -np 1 ./a.out : "cannot execute binary file"
mpiexec -np 1 ./a.out : "cannot execute binary file", "mpirun failed with exit status 252"
gfortran works
mpif90: "command not found"
ifort: "command not found"
lamexec -np 1 ./a.out : "cannot execute binary file"
Note: mounting external drives is most intuitive in PK
Idea: dpkg -i libopenmpi-dev.deb to get mpif90 working, then boot the nodes
Pelican HPC 1.6
PelicanHPC (Debian) [DHCP server, PXE boot]:
http://pareto.uab.es/mcreel/PelicanHPC/
interface = xfce
octave MPI installed
ifconfig not found (?)
mpirun works
mpiexec works
gfortran works
mpif90: "command not found"
To initiate cluster setup,
$ pelican_setup
BCCD
Bootable Cluster CD
External DHCP
"trivial-net-setup"
broadcasts SSH keys ("heartbeat")
requires entry of password during setup loading
manual startx
interface = fluxbox
root = letmein
no "sudo" available
gcc available
no gfortran
has MPI included
has mpif90, but no compiler is specified
BCCD 2.2.1c7 [DHCP server, ssh "heartbeats"]:
mpiexec: "command not found"
mpirun -np 1 ./a.out : "cannot execute binary file"
mpif90: "no fortran 90 compiler specified when mpif90 was created"
gfortran: "command not found"
(Note: mpichversion returns a segmentation fault)
gunzip gfortran-i686-linux.tar.gzip
tar -xvvf gfortran-i686-linux.tar
cp /mpich/bin/mpif90 /home/bccd
vi /home/bccd/mpif90
add "gfortran" to line 30:
F90base=""
[becomes]
F90base="gfortran"
(to see if your gfortran works:) ./gfortran --version
(additional high precision math libraries are available (not tested))
When compiling a plain F90 program, I get "libgfortran.so.3: cannot open shared object file: no such file or directory". Solution:
echo $LD_LIBRARY_PATH
declare -x LD_LIBRARY_PATH=$LD_LIBRARY_PATH":"/home/bccd/gfortran/irun/lib
(now your gfortran should compile (non-mpi code))
(compile the MPI F90 code using the edited ~/mpif90)
(to run the compiled output, use mpirun)
on the nic cluster
/opt/mpich/intel/bin/mpif90 code.f90
works (results in a compiled file) whereas
/opt/mpich/gnu/bin/mpif90 code.f90
results in "no fortran90 compiler specified"
add a directory to the path: declare -x PATH=$PATH":"/home/bccd/gfortran/irun/bin
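Both environment tweaks can be combined; a sketch that makes the unpacked gfortran usable in the current shell (paths are the install locations from above):

```shell
# Extend PATH so the shell finds the gfortran binary, and
# LD_LIBRARY_PATH so the loader finds libgfortran.so.3.
export PATH="${PATH}:/home/bccd/gfortran/irun/bin"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/home/bccd/gfortran/irun/lib"
echo "$LD_LIBRARY_PATH"
```

Note these exports only last for the current shell session; to make them permanent, add the same lines to ~/.bashrc.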
http://dirk.eddelbuettel.com/quantian.html
interface = Knoppix
gfortran
mpif90 installed, but not functional
octave 2.1.7.2
openmosix [needs the 2.4 kernel]
openmosix terminal server
Distributed computing infrastructure, using text scripts.
Optional: AutoIt-based screensaver for pausable executables on Windows nodes.
Benefit: no "centralized host" is needed: the code can be initiated from a "master" node, and the results retrieved by some other box. No dedicated master is needed.
Compared to the above tools, this approach requires a lot of administrative overhead, but it is the most accessible since I have access to hundreds of Windows computers. No rebooting of the nodes is needed, nor does additional software need to be installed. Also, no "central server" needs to be maintained. The process can be broken into three steps:
A chunked executable is pushed to each node
The chunk is remotely executed
After the node is done, the finished data is retrieved, then catted to one file.
The requirements for this system are
Need a local administrator account on each computer (node)
Need to have Windows file sharing turned on for each computer (node)
Need to be able to run the job from the CLI (such as a Fortran or C exe) [no GUI]
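The three push/execute/retrieve steps can be sketched in POSIX shell, with local directories standing in for the Windows nodes (the real setup pushes over Windows file shares and executes remotely):

```shell
# Simulate the three-step distributed flow: push a chunk to each
# "node", execute it (here as a background job), then retrieve the
# finished data and cat it to one file.
mkdir -p node1 node2
for n in node1 node2; do
  printf 'echo result-from-%s\n' "$n" > "$n/chunk.sh"   # 1. push chunk
  ( cd "$n" && sh chunk.sh > out.txt ) &                # 2. execute
done
wait
cat node1/out.txt node2/out.txt > all_results.txt       # 3. retrieve + cat
```

Because each chunk writes its own output file, the gather step is a simple concatenation, and any box with read access to the shares can do it; that is why no dedicated master is needed.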
PS3
clustering an 8-core PS3 with Fedora and MPI: