Cluster

This page describes the cluster facilities at DTU Informatics.

The cluster consists of:

  • 16 servers (grid01, grid02, ...) running 64-bit Linux with 2 X5650 6-core processors at 2.66 GHz and 48 GB RAM
  • 2 servers (hms1 & hms2) running 64-bit Linux with 8 quad-core AMD Opteron 8356 processors at 2.3 GHz and 256 GB RAM
  • 3 servers (cimbi2-4) running 64-bit Linux with 4 dual-core AMD Opteron 880 processors at 2.4 GHz and 32 GB RAM [*]

Setup

Access to most of the servers is controlled via Sun Grid Engine. However, in order to submit jobs one has to log on to hms1 or one of grid01-04 (using the normal user account/password, which is also used for the SunRay servers). The servers hms1 and grid01-04 can be used for development purposes, i.e. compiling, testing etc., and for submitting jobs to the other servers using the qsub command. grid01-04 are available from the SunRay servers via the gridterm command and through the menu system. hms1 is also available for running interactive jobs.
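
The following is a minimal sketch of a batch job script and its submission; the script name job.sh, the program name my_program and the Grid Engine directives chosen here are illustrative placeholders, not a site-specific recipe:

  # job.sh - minimal example batch script (names are placeholders)
  #$ -S /bin/sh        # interpret the script with /bin/sh
  #$ -cwd              # run the job in the directory it was submitted from
  #$ -j y              # merge stderr into stdout
  echo "Running on $(hostname)"
  ./my_program         # replace with the actual program to run

Submit it from hms1 or one of grid01-04 with qsub job.sh and follow its progress with qstat.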

There are 5 queues defined on the grid (example qsub invocations for each queue are shown after the list).

  1. fast
    This queue is for very short (test) jobs which require at most 10 min. of WALL time. Jobs taking more time will be killed.
    In order to use it, jobs have to be submitted with qsub -q fast -P fast job.sh. Jobs submitted without these arguments (i.e. plain qsub job.sh) will not run in this queue.
    The queue has access to almost all slots on all machines.
  2. long
    This queue is for long-running jobs (more than 12 hours) - there is no enforced upper limit.
    The queue can only utilize 4 slots on cimbi2-4 and grid05-16, and 16 slots on hms2, i.e. jobs submitted to this queue cannot saturate these machines.
  3. himem
    For jobs needing up to 12 hours of WALL time; these are executed on grid05-16, cimbi2-4 or hms1-2.
  4. himem-long
    For jobs needing more than 12 hours; these are executed on grid05-16, cimbi2-4 or hms1-2.
  5. himem2
    For jobs needing up to 12 hours of WALL time; these are executed on hms2.
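
As a sketch of how the queues above might be selected, jobs are directed to a specific queue with qsub's -q option (job.sh is a placeholder script; only the fast queue requires the additional -P fast argument according to the description above):

  qsub -q fast -P fast job.sh    # short test job, at most 10 min. of WALL time
  qsub -q long job.sh            # job expected to run for more than 12 hours
  qsub -q himem job.sh           # up to 12 hours on grid05-16, cimbi2-4 or hms1-2
  qsub -q himem2 job.sh          # up to 12 hours on hms2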

On each node there is a /space directory (about 20 GB) where everybody can write files. However, files older than one week are deleted automatically.
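
A sketch of how a job script might use the node-local scratch space; all directory and file names below are placeholders, and $JOB_ID is set by Grid Engine for every job:

  # inside job.sh (placeholder names) - use node-local /space as scratch
  SCRATCH=/space/$USER/$JOB_ID
  mkdir -p "$SCRATCH"
  cp ~/input.dat "$SCRATCH"/       # copy input to the local disk
  cd "$SCRATCH"
  ./my_program input.dat > output.dat
  cp output.dat ~/results/         # copy results back before the job ends
  rm -rf "$SCRATCH"                # clean up (old files are purged after a week anyway)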

The grid has support for OpenMPI.

For instructions on how to use the grid, how to submit Matlab jobs in parallel, and how to use OpenMPI, see the grid howto.
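
As a rough sketch only (the grid howto is the authoritative reference), an OpenMPI job is typically submitted through a Grid Engine parallel environment; the parallel environment name orte and the executable name my_mpi_app below are assumptions and may differ on this cluster:

  # mpi_job.sh - hypothetical OpenMPI batch script
  #$ -S /bin/sh
  #$ -cwd
  #$ -pe orte 8                     # request 8 slots; the PE name "orte" is an assumption
  mpirun -np $NSLOTS ./my_mpi_app   # $NSLOTS is set by Grid Engine to the number of granted slots

The script would then be submitted as usual with qsub mpi_job.sh.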

Software

The cluster currently runs Ubuntu 10.04, and with that OS comes a suite of standard utilities such as the gcc compiler suite, emacs, etc. Other software is installed under /appl unless otherwise specified. The interactive versions of the programs are generally available from the menu system on the SunRay servers (either using a SunRay client or Thinlinc) or from the command line (text in [] denotes the name of the command).

matlab: [matlab] version 2006b (aka version 7.3); versions 2008b [matlab77] and 2011b [matlab713] are installed as well. Make sure to see the examples in the howto for running Matlab on the grid (a minimal batch sketch is also shown at the end of this section).
mathematica: [mathematica] version 6.0, version 7.0 [mathematica7] and version 8.0 [mathematica8]
maple: [maple/xmaple] version 12, version 14 [maple14/xmaple14] and version 15 [maple15/xmaple15]
sas: version 9.2 [sas]
R: [R] newest version (as of February 6th, 2012: 2.14.1)
splus: [Splus] version 8.0.4.
SUN studio 12u1: [sunstudio] installed under /opt/SS12u1/...
TotalView debugger: [totalview]
OpenMPI: version 1.3.x. See the howto for usage.
Wine: [wine] version 1.0.1. See special setup instructions.
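
A hedged sketch of running Matlab non-interactively as a batch job (the authoritative examples are in the grid howto); the script matlab_job.sh and the Matlab script myscript.m are placeholders, and it is assumed that the matlab command listed above is available on the execution hosts:

  # matlab_job.sh - hypothetical Matlab batch script
  #$ -S /bin/sh
  #$ -cwd
  matlab -nodisplay -nosplash -r "myscript; exit"   # runs myscript.m from the working directory

Submit it with qsub matlab_job.sh.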