
DANCER

Section: ICL cluster users' guide (7)
Updated: 2016-09-29
 

NAME

dancer - Introduction to the Dancer cluster, ICL, University of Tennessee  

DESCRIPTION

This manual page describes the features and architecture of the Dancer cluster, at the University of Tennessee.

The Dancer cluster is a small Infiniband cluster administered by the DisCo team (mostly Aurelien). To receive help, contact ICL support at icl-help@icl.utk.edu.

 

ACCOUNTS

To obtain access to the Dancer cluster, send an email to icl-help@icl.utk.edu. You will need to provide a user name (the same as your ICL account is preferred, if you have one) and an SSH public key. It is recommended that you create a new public/private key pair for security reasons.

To create an SSH key pair, use the following command:
ssh-keygen -o -f ~/.ssh/dancer
Once you get the confirmation that your account has been created, log in using the following command:
slogin myname@dancer.icl.utk.edu -i ~/.ssh/dancer
Password authentication is not possible on Dancer, and repeated attempts to log in with the wrong credentials will get your IP banned, so beware. There are various ways of automating the selection of the key (see ssh_config(5)), or of logging in from multiple machines without putting your private key at risk by copying it to all of them (see ssh-agent(1)). See also the SSH_CONFIG EXAMPLE below.
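A minimal sketch of the ssh-agent(1) approach (assuming the key created above lives in ~/.ssh/dancer):
# start an agent in the current shell and add the Dancer key (the passphrase is asked once)
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/dancer
# log in; the agent supplies the key automatically (add -A to forward the agent to the headnode)
ssh myname@dancer.icl.utk.edu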

 

ETIQUETTE

The Dancer cluster is a shared resource. Please be mindful of other users and avoid being disruptive. Most of our policy is based on good will and trusts our users' good manners. Remember that the Dancer home area is shared over NFS, and is therefore only loosely secured. We advise against importing sensitive or confidential material onto this system.

Do not run compute-intensive or disk-intensive activities on the headnode. Always run your compute tasks (including serial ones) on the compute nodes. The headnode is for editing your files, compiling your programs, and launching mpirun/qsub commands. Only light-duty processing (like visualization) should take place there.

Do not write to the NFS home areas from multiple nodes at the same time. Use the scratch disks (local or shared) if you need to write large files from your compute tasks.
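For example (a sketch; the program name and output file are placeholders), send large output to the node's local scratch disk instead of your NFS home:
mkdir -p /scratch/local/$USER
./myprogram > /scratch/local/$USER/run.log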

Do not reserve nodes in exclusive mode, except for performance measurements. Do not reserve them for long stretches of time at once, and try to schedule exclusive activity overnight when possible.

 

SYSTEM DESCRIPTION

The Dancer cluster contains 32 Westmere/Gainestown compute nodes and also hosts 6+9 Haswell GPU machines. The compute nodes are named d00-d31; the Haswell machines are named nd01-nd06 and arc00-arc08.

 

Hardware

Not all nodes are the same:
d00-d15
2x Intel Xeon E5606 (Westmere-EP) @ 2.13GHz. 8 cores, 24 GB RAM, Infiniband 10G, Ethernet

http://ark.intel.com/products/52583/Intel-Xeon-Processor-E5606-8M-Cache-2_13-GHz-4_80-GTs-Intel-QPI

d16-d31
2x Intel Xeon E5520 (Gainestown) @ 2.27GHz. 8 cores, ~12 GB RAM, Infiniband 20G, Ethernet

http://ark.intel.com/products/40200/Intel-Xeon-Processor-E5520-8M-Cache-2_26-GHz-5_86-GTs-Intel-QPI

nd01-nd06
2x Intel Xeon E5-2650 v3 (Haswell) @ 2.30GHz. 20 cores, 32 GB RAM, Infiniband QDR 40G, Ethernet

http://ark.intel.com/products/81705/Intel-Xeon-Processor-E5-2650-v3-25M-Cache-2_30-GHz

arc00-arc08
2x Intel Xeon E5-2650 v3 (Haswell) @ 2.30GHz. 20 cores, 64 GB RAM, Infiniband FDR 56G, Ethernet

http://ark.intel.com/products/81705/Intel-Xeon-Processor-E5-2650-v3-25M-Cache-2_30-GHz

 

Networks

The machines are connected through a shared Ethernet network. Home areas are accessible through NFS on all nodes. In addition, we have four SEPARATE Infiniband compute networks. NOTE: MPI jobs over IB are limited to 16 nodes, within either d00-d15 or d16-d31; only Ethernet jobs can span d00-d31.
d00-d15: Infiniband 10G network.
mpirun -np 16 -hostfile /opt/etc/ib10g.machinefile.ompi
d16-d31: Infiniband 20G network (DDR).
mpirun -np 16 -hostfile /opt/etc/ib20g.machinefile.ompi
nd01-nd06: Infiniband 40G network (QDR).
mpirun -np 6 -hostfile /opt/etc/nd.machinefile.ompi
arc00-arc08: Infiniband 56G network (FDR).
mpirun -np 9 -hostfile /opt/etc/arc.machinefile.ompi
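For instance, a minimal sketch (assuming Open MPI's mpicc wrapper is in your PATH and hello.c is your own MPI source) to build on the headnode and run over the 10G segment:
# compile on the headnode
mpicc -O2 -o hello hello.c
# launch on the 16 nodes of the IB10G segment
mpirun -np 16 -hostfile /opt/etc/ib10g.machinefile.ompi ./hello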

 

GPUs/Coprocessors

Some of the d16-d31 machines form a GPU-accelerated cluster with IB20G. You can start a job on the NV-C2050/70 cluster with
mpirun -np 12 -hostfile /opt/etc/c2050.machinefile.ompi
Most of the nd??, arc?? machines are heavily accelerated, with NV-K40, NV-K80, AMD, or MIC coprocessors.
See the DANCERSH COMMAND section below to inquire about the availability and type of coprocessor on each node.
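As a quick check (a sketch; d16 and nd01 are arbitrary examples, pick nodes that dancersh -g reports as accelerated), list the boards visible on a node:
ssh d16 nvidia-smi -L      # NVIDIA boards, if any
ssh nd01 clinfo            # OpenCL devices (NVIDIA, AMD, MIC), if clinfo is installed there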

 

SOFTWARE

There is a lot of pre-installed software; most of it can be found in /opt/.

We are transitioning toward the use of module(1) to locate optional software. Note that you can also maintain your own set of modules compiled in your home directory. If you identify software that is missing and would benefit most users, please contact us (especially if you want to have some CentOS package installed).
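A sketch of typical module usage (the module name below is an example; module avail shows what is actually installed):
module avail                      # list the optional software made available through modules
module load ompi                  # example name, pick one from the avail list
module use $HOME/modulefiles      # also search your own modulefiles (assumed personal directory)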

 

FILESYSTEMS

/homes/myname/
This is the home area, NFS-exported from the headnode to the compute nodes. It is not the same home area as your regular ICL account. You can use rsync(1) to transfer files to and from Dancer.
/cores/
This is where your core files are generated when your program crashes on a compute node. This is an NFS volume (so the core files are visible from the headnode). To reduce NFS server load, core file generation is disabled by default. To re-enable core files, run the command you wish to debug with mpirun -np 2 bash -c "ulimit -c unlimited; myprogram -a myarg1 -b myarg2" (see the debugging sketch after this list).
/scratch/shared/
This filesystem is NFS exported from the headnode, and is available on all nodes. This is a network volume, writeable by everybody. Its content is cleared only when space gets scarce.
/scratch/local/
This filesystem is available on all nodes, including the headnode. This is a local disk, writeable by everybody. Its content may be wiped without notice (although this is expected to be rare, especially on the headnode).
/scratch/ssd/
This filesystem is available on some nodes. This is a local SSD disk, writeable by everybody. Beware, its content may be wiped without notice.
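Once a core file shows up under /cores/, a minimal debugging sketch (the binary and core file names are placeholders) is:
gdb ./myprogram /cores/core.12345      # then type 'bt' at the (gdb) prompt to see the backtrace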

 

THE DANCERSH COMMAND

There is a nice dancersh command to quickly inquire about the nodes; try the following options:
To obtain the list of active user processes (all users) at each node
dancersh -p
To obtain the load average of the nodes
dancersh -l
To list all YOUR threads at each node
dancersh -u
To execute a command
dancersh ls /scratch/local executes 'ls /scratch/local' at each node, which shows the content of the local hard drive scratch space
To restrict the command to a range of nodes
dancersh -r 05 08 uname -a shows the operating system version on nodes d05, d06, d07, d08
To see what GPU accelerators are available on the nodes
dancersh -g

 

EXCLUSIVE ACCESS

It is possible to make exclusive reservations for some nodes. Exclusive reservations are managed through PBS, with the qsub(1) command. Please refrain from requesting exclusive access unless you need to do performance measurements. By default, you can access all machines in shared mode, without reservation, simply by using ssh(1) or mpirun(1).
To start a job at 9 PM
qsub -a 2100
To get 6 Haswell nodes
qsub -lnodes=6:haswell
To get 6 Haswell nodes on the same Infiniband network
qsub -lnodes=6:ib56
To get 12 nodes with a cuda board on the same IB section
qsub -lnodes=12:ib20:cuda
To get nodes by name
qsub -lnodes=dancer02+dancer03
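A minimal batch script sketch (the resource list, walltime, and program name are examples; Open MPI built with PBS support is assumed to pick up the allocated nodes automatically):
#!/bin/bash
#PBS -lnodes=4:ib20:cuda
#PBS -lwalltime=01:00:00
cd $PBS_O_WORKDIR
mpirun -np 32 ./myprogram
Submit it with qsub myscript.pbs (the script name is a placeholder), or pass the same -l options to qsub -I for an interactive session.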

 

GANGLIA

Ganglia is available from the ICL Ganglia dashboard http://icl.cs.utk.edu/ganglia/?c=dancer. It is also served directly from the Dancer headnode, but that web server is firewalled, so you will need to establish an SSH tunnel to access it this way.
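An ad-hoc tunnel sketch (the SSH_CONFIG EXAMPLE below sets up the same forwarding permanently):
# forward local port 8086 to the headnode web server, then browse http://localhost:8086
ssh -L 8086:localhost:80 myname@dancer.icl.utk.edu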

 

SSH_CONFIG EXAMPLE

You can ease your access to the Dancer cluster with the following SSH tricks. Adapt and insert the following material into your .ssh/config file on the host you use to connect to the Dancer headnode.

Host dancer dancer.icl.utk.edu
    HostName dancer.icl.utk.edu
    User myname
    IdentityFile ~/.ssh/dancer_dsa
    ForwardAgent yes
    # Use only one SSH tunnel for all your actions (including rsync, scp, etc.)
    ControlMaster auto
    ControlPersist yes
    ControlPath /tmp/%r@%h:%p
    # Lets you debug with mpirun -xterm
    ForwardX11 yes
    ForwardX11Timeout 1w
    # Make Ganglia accessible on your machine at the URL http://localhost:8086
    LocalForward 8086 localhost:80

# Direct login to arc nodes
Host arc?? arc??.icl.utk.edu
    ProxyCommand ssh -q dancer nc %h 22
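With this configuration in place, access becomes, for example (results/ is a placeholder directory):
ssh dancer
rsync -av results/ dancer:results/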

 

KNOWN ISSUES

 

NFS slow to update recompiled programs/libraries

FIXED. If this issue comes back, an imperfect workaround is: make && mpirun ls >/dev/null.

CUDA on Kernel 4.5.4

Using CUDA on this kernel can leave the machine out of memory. Issue is being investigated.  

X11 forwarding to nodes is flaky

You get xauth error messages when trying to use mpirun -xterm, etc. This is a known issue with CentOS 6. It will be fixed when we upgrade to CentOS 7 later this year.

Using 'mpi_leave_pinned' leads to my MPI program hanging/crashing

This is normal. We have now set the default to 0 in the system-installed Open MPI. If you use your own build of Open MPI, make sure you disable this optimization, unless you know what you are doing.
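For your own Open MPI builds, a sketch of disabling it per run or per shell (the hostfile and program name are examples):
mpirun --mca mpi_leave_pinned 0 -np 16 -hostfile /opt/etc/ib20g.machinefile.ompi ./myprogram
export OMPI_MCA_mpi_leave_pinned=0     # or disable it for every run launched from this shell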

Ethernet performance unstable

FIXED

 

CHANGELOG

 

Hardware Upgrades

15/10/05
arc00-08 machines online. nd01-06 machines have infiniband.
15/07/27
Defective card mic3 on nd03 replaced.
15/07/17
Defective Ethernet switch has been replaced.

 

Software Upgrades

16/09/30
Kernel-4.7.2 MPSS-3.7.2 gcc-4.9.4 gcc-5.4.0 gcc-6.2.0 papi-5.5.0 ompi-2.0.1
16/05/12
Kernel-4.5.4 MPSS-3.7 gcc-6.1
16/04/26
autotools(automake-1.15)
15/09/10
CUDA-7.5 ompi-1.10.0(module)
15/06/18
Kernel-4.0.4 MPSS-3.5.1 ompi-1.8.6 Totalview-8.15.4
15/05/22
gcc-5.1 gdb-7.9.1 ompi-1.8.5
15/03/28
CUDA-7.0 MPSS-3.4.3 PAPI-5.4.1 Modules kernel-3.10.72

 

SEE ALSO

ompi_info(1) mpirun(1) module(1) qsub(1) hwloc-ls(1) nvidia-smi(1) miccheck(1) clinfo(1)

 

NOTES

 

Authors and Copyright Conditions

Look at the header of the manual page source for the author(s) and copyright conditions.


 

