Difference between revisions of "Cluster"

From ITSwiki
<div style="background-color: #FFFF00; border-style: dotted;"> This guide is for users at '''DTU Compute''' only</div>
This page describes the cluster facilities at '''DTU Compute'''.

Most CPU resources are available at http://www.hpc.dtu.dk (via the "Compute" queue). The resources here are kept for backward compatibility and to allow for more interactive usage.
The cluster is made of:

* 6 servers (grid20, ..., grid27) running Debian 12 64-bit Linux with 2 E5-2660 v3 10-core CPUs @ 2.60GHz, 128GB RAM
* 16 servers (grid01, grid02, ...) running 64-bit Linux with 2 X5650 6-core processors @ 2.66GHz, 48GB RAM
* 2 servers (hms1 & hms2) running 64-bit Linux with 8 quad-core AMD Opteron 8356 processors @ 2.3GHz, 256GB RAM

In addition there are 4 servers (cimbi1-4) running 64-bit Linux with 4 dual-core AMD Opteron 880 processors @ 2.4GHz, 32GB RAM. These run an older version of the grid software (Ubuntu 10.04).

The grid is rebooted once per month, generally the first Wednesday after the 15th. This is announced in the ''message of the day (motd)'' on the grid terminals.
 
=Setup=

Access to most of the servers is controlled via Sun Grid Engine. However, in order to submit jobs one has to log on to hms1 or one of grid01-04 (using the normal user account/password which is also used for the linuxterm servers [https://itswiki.compute.dtu.dk/index.php/Linux_Terminal_Server]). The servers hms1 and grid01-04 can be used for development purposes, i.e. compiling, testing etc., and for submitting jobs to the other servers using the '''qsub''' command. grid01-04 are available from the linuxterm servers via the '''gridterm''' command and through the menu system. hms1 is also available for running interactive jobs.
There are 5 queues defined on the grid:

# '''fast'''<br /> This queue is for very short (test) jobs which require max 10 min. of WALL time. Jobs taking more time will be killed.<br /> In order to use it, jobs have to be submitted with '''qsub -q fast -P fast job.sh'''. Jobs submitted without arguments (i.e. '''qsub job.sh''') will not run in this queue.<br /> The queue has access to almost all slots on all machines.
# '''long'''<br /> This queue is for long-lasting jobs (more than 12 hours) - there is no enforced upper limit.<br /> The queue can only utilize 4 slots on cimbi2-4 and grid05-16, and 16 on hms2, i.e. jobs submitted to this queue cannot saturate these machines.
# '''himem'''<br /> For jobs needing up to 12 hours of WALL time; these will be executed on either grid05-16 or hms1-2.
# '''himem-long'''<br /> For jobs needing more than 12 hours, running on either grid05-16 or hms1-2.
# '''himem2'''<br /> For jobs up to 12 hours; these will be executed on hms2.
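Choosing between these queues can be sketched as a small shell helper. This is purely illustrative (the function name and thresholds are not part of the grid software; the limits simply mirror the list above, with 12 hours = 720 minutes):

```shell
#!/bin/bash
# Illustrative helper, not part of the grid software: pick a queue name from
# the expected WALL time (in minutes) and whether the job needs much memory.
# Prints nothing for the default queue (plain "qsub job.sh").
pick_queue() {
    local minutes=$1 needs_himem=$2    # needs_himem: "yes" or "no"
    if [ "$minutes" -le 10 ]; then
        echo "fast"                    # remember: the fast queue also needs -P fast
    elif [ "$needs_himem" = "yes" ] && [ "$minutes" -le 720 ]; then
        echo "himem"
    elif [ "$needs_himem" = "yes" ]; then
        echo "himem-long"
    elif [ "$minutes" -gt 720 ]; then
        echo "long"
    fi
}
```

A job would then be submitted with e.g. '''qsub -q $(pick_queue 600 yes) job.sh''' (the fast queue additionally requires '''-P fast''').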
 
 
On each node there is a '''/space''' directory (about 20GB) where everybody can write files. However, files older than one week will be deleted automatically.
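A typical job-script pattern for such a scratch area is sketched below. The per-user subdirectory layout is an assumption, and /tmp is used as a stand-in for /space so the sketch also runs outside the cluster:

```shell
#!/bin/bash
# Sketch of using the per-node scratch area from a job script. On a grid
# node SCRATCH_ROOT would be /space; /tmp is a stand-in so the sketch runs
# anywhere. Assumption: a per-user subdirectory under /space is allowed.
SCRATCH_ROOT=${SCRATCH_ROOT:-/tmp}       # on the cluster: /space
WORKDIR="$SCRATCH_ROOT/${USER:-$(id -un)}/job_$$"
mkdir -p "$WORKDIR"
# Heavy intermediate I/O goes to the local disk, not over NFS:
echo "intermediate data" > "$WORKDIR/tmp.dat"
# Files under /space are deleted automatically after one week, so copy
# anything worth keeping back before the job ends:
cp "$WORKDIR/tmp.dat" ./result.dat
rm -rf "$WORKDIR"                        # clean up the scratch area
```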
 
 
The grid has support for OpenMPI.


For a howto on using the grid, submitting Matlab jobs in parallel and using OpenMPI, see the [[Cluster | grid howto]].
 
  
 
=Software=

The cluster currently runs Debian 12, and with that OS comes a suite of standard utilities like the gcc compiler suite, emacs, etc. Other software is installed under /opt/appl if not specified otherwise. The interactive versions of the programs are generally available from the menu system on the [[Linux_Terminal_Server | linuxterm servers]] (either using a linuxterm hardware client or [[Linux_Terminal_Server | Thinlinc]]) or from the command line (text in [] denotes the name of the command).
  
'''''<u>matlab:</u>''''' [matlab] version 2018b (aka version 9.5). 2016b [matlab91], 2017b [matlab93].<br />
'''''<u>mathematica:</u>''''' [mathematica] version 11.3 and version 10.4 [mathematica10]<br />
'''''<u>maple:</u>''''' [maple/xmaple] version 2018<br />
'''''<u>sas:</u>''''' version 9.4 [sas]<br />
'''''<u>R:</u>''''' [R] newest version<br />
'''''<u>Rstudio:</u>''''' [rstudio]<br />
'''''<u>splus:</u>''''' [Splus] version 8.2<br />
'''''<u>SUN studio 12u1:</u>''''' [sunstudio] installed under /opt/SS12u1/....<br />
'''''<u>TotalView debugger:</u>''''' [totalview]<br />
 
 
=How to=
 
 
This page contains howtos for SUN gridengine, Matlab with gridengine, and OpenMPI.


For a general description of the cluster facilities at DTU Compute, please follow this [[Cluster | link]].<br />
 
 
==SUN gridengine - a mini howto==
 
 
This is a 1-minute introduction to using the gridengine.
 
 
===To submit jobs===
 
 
All jobs must be submitted using a batch file (a binary file cannot be submitted directly). Create a file like this <code>'''simplesum.sh'''</code>:
 
 
 #!/bin/bash
 # -- our job name --
 #$ -N simplesum
 # -- your mail address --- (like abc@dtu.dk)
 #$ -M abc@dtu.dk
 # -- request /bin/bash --
 #$ -S /bin/bash
 # -- send an email once the job has ended or been aborted --
 #$ -m ea
 # -- run in the current working (submission) directory --
 #$ -cwd
 matlab -nodesktop < simplesum.m
 
 
The content of <code>simplesum.m</code> is
 
 
x=sum(sum(inv(rand(3300))))
 
 
Next submit it using:
 
 
qsub simplesum.sh
 
 
If it is a memory demanding job submit it using
 
 
qsub -q himem simplesum.sh
 
 
 
 
More (much more!) information can be found in the man page of qsub.
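Since qsub is just a command, submissions are easy to script. The sketch below generates one job script per (made-up) matrix size and submits each one; SUBMIT defaults to echo as a dry run, so set SUBMIT=qsub on the cluster:

```shell
#!/bin/bash
# Sketch: a simple parameter sweep - write one job script per parameter
# value and submit each with qsub. The matrix sizes are illustrative.
SUBMIT=${SUBMIT:-echo}    # dry run by default; on the cluster: SUBMIT=qsub
for n in 1000 2000 3300; do
    cat > "sweep_$n.sh" <<EOF
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
matlab -nodesktop -r "x=sum(sum(inv(rand($n)))), exit"
EOF
    "$SUBMIT" "sweep_$n.sh"
done
```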
 
 
===Job status===
 
 
After you have submitted your job, you can check the status with the qstat command:
 
 
qstat
 
 
Without extra arguments, '''qstat''' will only print your own jobs. Other useful options:
 
 
qstat -u '*'
 
 
Show jobs belonging to all users<br />
 
 
qstat -u user_name1 [user_name2 ....]
 
 
Show only jobs belonging to user(s) user_name1 (user_name2 ...)
 
 
qstat -j job_id1 [job_id2 ...]
 
 
Show only jobs with the requested job_ids (long listing!). This lists (almost) everything GE knows about the jobs, and the output from this command can be useful for checking the reasons why your job will not start.
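Because qstat's default output is column-oriented (job-ID, prior, name, user, state, ...), it is easy to post-process with standard tools. The sample output below is fabricated just to show the idea:

```shell
#!/bin/bash
# Sketch: count jobs per user and state by piping qstat output through awk.
# qstat_sample stands in for the real command; on the cluster you would run
#   qstat -u '*' | awk 'NR>2 {print $4, $5}' | sort | uniq -c
qstat_sample() {
cat <<'EOF'
job-ID  prior   name       user   state submit/start at     queue         slots
-------------------------------------------------------------------------------
    101 0.55500 simplesum  alice  r     09/01/2024 10:00:00 himem@grid05  1
    102 0.55500 simplesum  alice  qw    09/01/2024 10:01:00               1
    103 0.50000 hello      bob    r     09/01/2024 09:55:00 mpi@grid07    8
EOF
}
# Skip the two header lines, keep the user and state columns, then tally:
qstat_sample | awk 'NR>2 {print $4, $5}' | sort | uniq -c
```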
 
 
For more options see the qstat manual pages:
 
 
man qstat
 
 
===Stop/Delete a job===
 
 
Usually your jobs will run, finish and disappear from the gridengine system, but sometimes you might want to stop a job or remove a job from the queue that does not start due to wrong submission options:
 
 
qdel job_id
 
 
For more options see the manual pages for qdel:
 
 
man qdel
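When many jobs have to go at once, the job ids can be extracted from qstat's output and fed to qdel. The helper below is a sketch that assumes the default qstat layout (two header lines, job id in the first column); QDEL=echo keeps it a dry run:

```shell
#!/bin/bash
# Sketch: pull job ids out of qstat output so they can be handed to qdel.
# Assumes the default layout: two header lines, job id in column 1.
job_ids() { awk 'NR>2 {print $1}'; }

QDEL=${QDEL:-echo}    # dry run by default; on the cluster: QDEL=qdel
# Typical use (deletes ALL of your jobs, so be careful):
#   qstat -u "$USER" | job_ids | xargs -r "$QDEL"
```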
 
 
==Matlab and gridengine==
 
 
At the time of writing (18/3-2008) ''matlab distributed engine'' is only supported on matlab 2006b and 2007a. Matlab jobs which can be processed in parallel (like a parameter sweep) can benefit from ''matlab distributed engine''. Each subtask should consume several CPU minutes, otherwise the benefit of creating subtasks within matlab is negligible. The method illustrated below for matlab 2007b can be beneficial for smaller jobs than the method for matlab 2006b (i.e. the overhead per job is less). A small mini-example is provided (courtesy Martin Vester Christensen). Read through the m files and it should give a starting point.
 
 
 
===Matlab 2013a===
 
 
# Create a folder '''~/matlab81'''
 
# Download [[media:103-colsum.m | colsum.m]] to the directory '''~/matlab81'''
 
# Edit the files according to need
 
# Either run [[media:paralleltest81.m | paralleltest81.m]] interactively or using the '''qsub''' command as described above.
 
 
===Matlab 2011b===
 
 
# Create a folder '''~/matlab713'''
 
# Download [[media:103-colsum.m | colsum.m]] to the directory '''~/matlab713'''
 
# Edit the files according to need
 
# Either run '''paralleltest713.m''' interactively or using the '''qsub''' command as described above.
 
 
==OpenMPI and gridengine==
 
 
''The OpenMPI installation hasn't been tested widely.''<br />
 
 
Two versions are installed:
 
 
===OpenMPI from SUNhpc tools===
 
 
The Open MPI installation comes from SUN and is based on version 1.3.4. There are 2 versions installed: one for compiling with the gcc compiler suite and one for the SUN Studio compiler suite, located in /appl/SUNWhpc/HPC8.2.1/gcc and /appl/SUNWhpc/HPC8.2.1/sun respectively. Neither of these is in the default PATH, i.e. the user must add the relevant bin directory in his/her dot files.
 
 
It is beyond the scope of this page to explain the usage of OpenMPI. However, to use it, follow the idea in the following example (which uses the "gcc" compiler suite):
 
 
* Download the following [[media:122-hello_world.c | hello_world.c]] example.
 
* Compile it using the gcc wrapper ''mpicc'':
 
export PATH=/appl/SUNWhpc/HPC8.2.1/gnu/bin:$PATH
 
mpicc hello_world.c -o hello_world
 
* create a script file ''hello_world.sh'' like this
 
 
 #!/bin/sh
 # -- our name ---
 # -- request /bin/sh --
 #$ -S /bin/sh
 # -- run in the current working (submission) directory --
 #$ -cwd
 #$ -m bea
 #$ -pe mpi 8
 export PATH=/appl/SUNWhpc/HPC8.2.1/gnu/bin:$PATH
 mpirun -np $NSLOTS hello_world
 
 
The key line here is '''#$ -pe mpi 8''' which tells SUN grid Engine to use the parallel environment '''mpi''' and request '''8''' slots (change the latter according to your needs).
 
* Submit it using '''qsub hello_world.sh'''
 
* To avoid setting up '''PATH''' every time, one should add that line to '''~/.profile.linux'''.
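A sketch of doing that without duplicating the line on repeated runs (grep -qxF just checks whether the exact line is already present; the PROFILE variable is only there so the snippet can be pointed at another file):

```shell
#!/bin/bash
# Sketch: append the PATH line to ~/.profile.linux only if it is not there
# already, so running this several times does not pile up duplicates.
PROFILE="${PROFILE:-$HOME/.profile.linux}"
LINE='export PATH=/appl/SUNWhpc/HPC8.2.1/gnu/bin:$PATH'
grep -qxF "$LINE" "$PROFILE" 2>/dev/null || echo "$LINE" >> "$PROFILE"
```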
 
 
===OpenMPI from open-mpi.org===
 
 
* Download the following hello_world.c example (the same file as above)

* Compile it using the gcc wrapper ''mpicc'':
 
mpicc hello_world.c -o hello_world
 
* create a script file hello_world.sh like this
 
 
 #!/bin/sh
 # -- our name ---
 # -- request /bin/sh --
 #$ -S /bin/sh
 # -- run in the current working (submission) directory --
 #$ -cwd
 #$ -m bea
 #$ -pe mpi 8
 mpirun -np $NSLOTS hello_world
 
 
The key line here is '''#$ -pe mpi 8''' which tells SUN grid Engine to use the parallel environment '''mpi''' and request '''8''' slots (change the latter according to your needs).
 
  
* Submit it using '''qsub hello_world.sh'''
 
* To avoid setting up '''PATH''' every time, one should add that line to '''~/.bashrc'''.
 
  
 
[[Category:IT]]
 

Latest revision as of 13:30, 16 February 2024
