- How to use the cluster resources.
- Getting an account.
- Subscribe to the mailing list.
- Logging in and setting up.
- Batch scheduler.
Twickel is a computing environment that consists of a compute cluster and several compute servers.
Most resources are controlled by means of a torque/MAUI scheduling and execution system.
This means that most resources have to be requested.
Resources.¶The available computational resources are:
- A group of 4 machines that use Virtual Box to run virtual machines. For example, the SHARE project is hosted on warmelo.
- A group of 10 compute nodes with dual Intel E5335 CPUs and 24GB RAM, which are connected with gigabit ethernet.
- A group of 10 compute nodes with dual Intel E5520 CPUs and 24GB RAM, which are connected with DDR Infiniband.
- Two compute servers for exclusive access:
- big3: dual Intel X5550, 144GB RAM.
- big4: dual Intel X5550, 72GB RAM.
- One compute server for experimentation with multi-core algorithms:
- big5: contact Alfons Laarman before using it.
- Two head nodes:
- twickel: dual Xeon with 4GB RAM, can be used for compilation and job submission.
- weldam: dual X5365 with 64GB RAM, can be used for compilation and job submission as well as for doing development work.
- A total of 2TB of semi-reliable shared storage. (An effort will be made to keep the data safe, but no backups are made.)
- A single 6TB distributed scratch file system. (If something goes wrong with this file system, it will be reformatted.)
- Every machine has a limited local scratch file system.
A picture and a complete listing can be found here.
How to use the cluster resources.¶
Getting an account.¶First, you need to get a working EWI account. Then
- Employees of DACS, SE or FMT can ask the ICTS help desk. (If you mention the names Enno Oosterhuis and Twickel you reduce the chances of the message being forwarded to the wrong person.)
- Others (including students) need to have an employee ask for them.
Subscribe to the mailing list.¶
A mailing list for cluster users has been created on the UTwente list server: TWICKEL-USERS.
Logging in and setting up.¶
Once you have an account, you can log in to twickel.ewi.utwente.nl
and/or weldam.ewi.utwente.nl. (Weldam seems to be unavailable at the moment.)
Twickel is a real head node: it's meant for compilation and job submission but not for running applications.
Weldam is a mix between a compute server and a head node: you can use it both for running applications
and for submitting jobs.
The cluster is being migrated from openSUSE 11.2 to Scientific Linux.
Twickel, and the compute nodes that it controls, already run Scientific Linux;
Weldam, and the compute nodes that it controls, still run openSUSE 11.2 and are being migrated.
For more information see the Hardware Inventory.
Additional software has been installed in the /software directory. To get access to that software you need to add the appropriate lines to your .bashrc (or similar file), as described in the sections below.
The FMT group also maintains a directory of software, which uses environment modules.
Currently there are two FMT software directories available: a current one, which occasionally gets updated with new software, and a legacy one, which is no longer extended.
It is not advisable to use those two together.
There is a third FMT software directory that contains software installed by the Jenkins continuous integration server; this one can be used together with the current FMT software directory.
Current FMT directory of Software¶
To get access to this software, you also need to add the following lines to your .bashrc (or similar file):

    export MOD_BASE=/software/fmtv2
    . $MOD_BASE/bin/mod_setup.sh
    module load cadp
Software installed via the Jenkins continuous integration server¶
This software is installed in /software/fmt-jenkins.
It is probably best to use this together with the current FMT software.
To use this software together with the current FMT software, such that software in
/software/fmt-jenkins takes preference over software in
/software/fmtv2, write the following in your
.bashrc (or similar file), or at the bash prompt, instead of what was written in the section above:

    export MOD_PATH=/software/fmt-jenkins:/software/fmtv2
    . /software/fmtv2/bin/mod_setup.sh
    module load cadp mcrl2
Now, when you use
module load to load a package like
ltsmin without specifying a version, you will get the latest version from the first tree on MOD_PATH that provides it. With
module avail you can see all available packages.
So, to tell
mod_setup.sh that you want to access software from multiple trees, set the environment variable
MOD_PATH to those trees before
mod_setup.sh is sourced (read).
(Right now we only have two such trees: /software/fmt-jenkins and /software/fmtv2.)
Moreover, it is not necessary to set the
MOD_BASE variable if you set
MOD_PATH
(as long as you only want to use the software installed in those trees; when you want to install software in
/software/fmt-jenkins you have to set
MOD_BASE to indicate where the software must be installed).
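For completeness: module load can also pin a specific version when several versions of a package are installed. The version number below is hypothetical; use module avail to see the real ones.

```shell
# List all installed versions of a package (here: ltsmin).
module avail ltsmin
# Pin a specific version (the number 1.9 is hypothetical).
module load ltsmin/1.9
# Or load without a version to get the newest match on MOD_PATH.
module load ltsmin
```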
Legacy FMT directory of Software¶
There still is an older version of the directory of FMT software.
To access that, add the following lines to your .bashrc (or similar file):

    export MOD_HOME=/software/fmt
    export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH:$MOD_HOME/pkg/tcltk-8.5.6/lib
    if [ -f /etc/profile.d/modules.sh ] ; then
        . /etc/profile.d/modules.sh
    else
        . $MOD_HOME/Modules/3.2.6/init/bash
    fi
    module use $MOD_HOME/modules
    module load cadp
To get access to the compute nodes and the other compute servers, you need to
use the torque/MAUI batch scheduler.
A job can be submitted to the torque/MAUI batch scheduler by means of the tool qsub.
We continue with a very brief discussion of qsub, mostly focusing on the resource syntax for Twickel.
There are several important options. The
-W option allows passing information to MAUI.
For example, to tell MAUI to get exclusive access to a node: -W x=NACCESSPOLICY:SINGLEJOB.
And most important of all is the
-l option, which specifies the resources needed.
We'll provide a few examples.
- To get one processor on an E5335, use -l nodes=1:E5335.
- To get 4 processors on a single E5335, use -l nodes=1:ppn=4:E5335.
- To get 8 processors each on 2 E5520 machines, use -l nodes=2:ppn=8:E5520.
- To run a batch job on big3, use -l nodes=big3 (doesn't work yet: use ssh).
Normally qsub will take a script as argument and run the script.
When you need interactive access to, e.g., big4, type:

    qsub -I -l nodes=big4 -W x=NACCESSPOLICY:SINGLEJOB
And after waiting for currently running jobs to complete, the entire machine will be yours.
Instead of passing options on the command line to
qsub, you can also put them in the script. For example, file1.pbs:

    #PBS -N the-name-of-the-job
    #PBS -l nodes=1:E5335
    #PBS -W x=NACCESSPOLICY:SINGLEJOB
    hostname
    date
    sleep 60
    date

and file2.pbs:

    hostname
    date
    sleep 60
    date
Then the commands

    qsub file1.pbs
    qsub -N the-name-of-the-job -l nodes=1:E5335 -W x=NACCESSPOLICY:SINGLEJOB file2.pbs

behave exactly the same. Note that command line options override the options in the script.
Status of your job¶
You can get information about the status of your job with the command qstat.
Other commands that give information about the status of the queues and cluster nodes are listed below.
Removing your job¶
If you want to remove your job, use qdel jobId,
where jobId is the job id returned when you submitted the job with
qsub; this is shown in the output of
qstat (in the leftmost column).
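Since qsub prints the id of the newly submitted job on standard output, a script can capture it for later use with qstat and qdel. A sketch (the exact id format is cluster-specific):

```shell
#!/bin/sh
# Submit a job script and remember the id that qsub prints,
# so the job can be monitored and, if necessary, removed later.
jobid=$(qsub file1.pbs)
echo "submitted as $jobid"
qstat "$jobid"   # show the status of just this job
qdel "$jobid"    # remove it again
```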
(Overview taken from http://clusterinfo.physik.hu-berlin.de/)
Node and queue status¶
pbsnodes -a # show status of all nodes
pbsnodes -a nodeNN # show status of specified node
pbsnodes -l # list inactive nodes
pbsnodelist # list status of all nodes (one per line)
qstat -Q # show all queues
qstat -Q queue # show status of specified queue
qstat -f -Q queue # show full info for specified queue
qstat -q # show all queues (alternative format)
qstat -q queue # show status of specified queue (alt.)
Job submission and monitoring¶
qsub jobscript # submit to default queue
qsub -q queue jobscript # submit to specified queue
qsub -l nodes=4:ppn=2 jobscript # request 4x2 processors
qsub -l nodes=nodeNN jobscript # run on specified node
qsub -l cput=HH:MM:SS jobscript # limit on CPU time (serial job)
qsub -l walltime=HH:MM:SS jobscript # limit on wallclock time (parallel job)
qdel job_no # delete job (with job_no from qstat)
qstat -a # show all jobs
qstat -a queue # show all jobs in specified queue
qstat -f job_no # show full info for specified job
qstat -n # show all jobs and the nodes they occupy
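The cput and walltime limits above must be given as HH:MM:SS. When a script computes its limits in seconds, a small helper can do the conversion (a sketch; the function name is ours):

```shell
#!/bin/sh
# Convert a number of seconds to the HH:MM:SS form expected by
# qsub -l walltime=... and -l cput=... (helper name is illustrative).
secs_to_hms() {
    s=$1
    printf '%02d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}

secs_to_hms 5400    # prints 01:30:00
# e.g.: qsub -l walltime=$(secs_to_hms 5400) jobscript
```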
Executing a job involving multiple compute nodes communicating through MPI
requires allocating the appropriate number of resources through the batch scheduler
and then invoking the
mpirun command with appropriate arguments, including the
program to be run.
Suppose, e.g., we would like to execute the
lps2lts-mpi tool with a file
model.lps on ten compute nodes using two processors per node. This
can be achieved by submitting a script with the following contents to the scheduler:

    #PBS -N the-name-of-the-job
    #PBS -l nodes=10:ppn=2:E5335
    #PBS -W x=NACCESSPOLICY:SINGLEJOB
    mpirun -mca btl tcp,self lps2lts-mpi model.lps
More information on the
mpirun command and the arguments that can be passed
to it can be found in the
mpirun manual page.
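Assuming the script above is saved as mpi-job.pbs (the file name is ours), submitting and monitoring it then follows the usual pattern:

```shell
qsub mpi-job.pbs   # submit the MPI job to the scheduler
qstat -n           # show the job and the nodes it occupies
```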