Biomedical Informatics High Performance Computing Cluster
Credit: Kevin Ernst, Matthew Weirauch, and Leah Kottyan for compiling and hosting much of this information not found on the DBS HPC website.
We have access to a Linux computational cluster through the Data Management and Analysis Collaborative (DMAC). To request access to the cluster, please use the linked web form, or send an email to Cluster Support at help-cluster@bmi.cchmc.org. Once your access is enabled, you should be able to log in to the cluster with your CCHMC employee network credentials.
Up-to-date policies are described here. In brief, every user gets a default disk (home directory) quota of 100 GB and a free job walltime quota of 10,000 hours per quarter. The home directory quota is fixed and cannot be increased. To increase your walltime quota, please send an email to Cluster Support.
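There is no cluster-specific quota command documented here, but a generic way to see how much of your 100 GB you are using is plain du (a sketch using standard Linux tools; it can take a while on a large home directory):
du -sh $HOME                # total size of your home directory
du -sh $HOME/* | sort -h    # per-subdirectory breakdown, smallest to largest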
Your home directory is accessible at the path /users/YOURUSERID and has a default 100 GB quota. You also have access to a “scratch” volume at /scratch/YOURUSERID, which also has a 100 GB quota but can burst up to 5 TB for up to 7 days; anything there is removed after 60 days. Users typically do not have (read/write) access to other users’ home directories. Thus, /data/hpc-troutman/ is the place to store any data or analyses (not personal files) that anyone else in the lab might ever need to work with.
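As a quick orientation at the shell, these paths follow directly from the conventions above ($USER expands to your user ID):
ls /users/$USER          # home directory: 100 GB quota, private
ls /scratch/$USER        # scratch: bursts to 5 TB, purged after 60 days
ls /data/hpc-troutman    # shared lab storage for anything others may need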
Accessing the HPC cluster
Connect to the CCHMC cluster by following the [BMI instructions](https://hpc.research.cchmc.org/accessing-the-cluster), or here, or follow the examples below.
If you are in the CCHMC network or connected to CCHMC VPN:
ssh [yourusername]@bmiclusterp.chmcres.cchmc.org
If you are outside of the CCHMC network, connect to ssh.research.cchmc.org first, then to bmiclusterp.chmcres.cchmc.org:
ssh [yourusername]@ssh.research.cchmc.org
ssh [yourusername]@bmiclusterp.chmcres.cchmc.org
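If you connect from outside often, a ProxyJump entry in your ~/.ssh/config collapses the two hops into one command (the bmicluster alias below is our own choice, and ProxyJump requires a reasonably recent OpenSSH):
Host bmicluster
    HostName bmiclusterp.chmcres.cchmc.org
    User yourusername
    ProxyJump yourusername@ssh.research.cchmc.org
After that, ssh bmicluster works from anywhere.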
Alternatively, connect through the OnDemand Portal. Detailed instructions are here.
Running jobs on the HPC cluster
Do not run computationally intensive jobs on the login/head nodes (e.g., bmiclusterp2). Basic file management or resuming a GNU Screen or tmux session should be about the extent of it. The head nodes are where everyone logs in and submits their jobs, so if you were to run computationally heavy tasks there, you could prevent others from logging in or submitting their own jobs to the actual cluster nodes. Processes run directly on a head node (i.e., not submitted using bsub as shown below) are limited to 75% of one CPU and 10% of total system RAM (currently 10 GB, so 1 GB) and will be terminated if they go over.
What is a “job”? A job is essentially a Bash shell script that runs a computationally intensive program of your making on the cluster.
What is a cluster? A cluster is a form of supercomputer made up of individual servers, or nodes, which share access to a large pool of disk storage through NAS (Network Attached Storage). Our cluster uses IBM’s “Platform LSF” workload management platform, which means, among other things, that there is a scheduler: you submit batch jobs (programs) to it, and it starts them as soon as resources are available.
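To get a feel for the scheduler’s state, two standard LSF commands are useful (the queue and host names you will see are site-specific):
bqueues    # list the scheduling queues and their job counts
bhosts     # list the compute hosts and their available job slots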
The BMI cluster offers several advantages:
- Big-computer capabilities: the largest nodes have 512 GB RAM and 16 cores.
- A Linux environment with lots of software ready to go, handled through many pre-installed software environments called Modules. Please refer to the modules page for more information and examples.
- Pseudo-parallelism: you can break a problem into many pieces and run many jobs in a distributed fashion, as in the sketch below.
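As a sketch of that pseudo-parallel pattern, the loop below submits one small job per input file (the path and the md5sum command are placeholders for your real data and analysis):
for f in /data/hpc-troutman/example/*.fastq.gz; do
    bsub -W 2:00 -n 1 -M 8000 "md5sum $f > $f.md5"
done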
Scheduling computation jobs with LSF
The cluster batch system is currently managed by the LSF resource manager and scheduler. You can find some examples of submitting jobs using LSF here. Please read the job scheduling policies page for recent policy changes.
Essentially, users submit jobs to the scheduler using the bsub command. The current state of the queue can be viewed using bjobs. There are a host of other LSF utilities as well: bkill, bmod, bstop, bmig, bresume, etc. bsub can be used for batch as well as interactive submission of jobs. Interactive submission should be used only when you need to run and debug your code, and for short-duration jobs. In either case, you need to submit your job to a cluster node so that you can do your testing without burdening the head node.
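A few of those day-to-day utilities in context (the job ID 12345 is made up; take real IDs from your bjobs output):
bjobs             # list your pending and running jobs
bjobs -l 12345    # full details for one job
bkill 12345       # kill a job
bstop 12345       # suspend a job
bresume 12345     # resume a suspended job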
An example to directly submit a script:
bsub -W 4:00 -n 1 -M 10000 < script
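The same resource requests can also live inside the script itself as #BSUB directives, which LSF reads when the script arrives on standard input as above. A minimal sketch (the module and final command are placeholders for your actual work):
#!/bin/bash
#BSUB -W 4:00         # walltime limit, hh:mm
#BSUB -n 1            # number of cores
#BSUB -M 10000        # memory limit, same units as the command-line examples here
#BSUB -J example      # job name
#BSUB -o %J.out       # stdout; %J expands to the job ID
#BSUB -e %J.err       # stderr
module load samtools    # hypothetical module; load whatever your job needs
samtools --version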
An example command to request an interactive session:
bsub -W 4:00 -n 1 -M 10000 -Is /bin/bash
We have wrapper scripts to automate assembly of .bat files and submission to the LSF scheduler. More information is available here.
Checking your cluster balance
By default, you have 10,000 hours of “wall time” (real-world computation time for the jobs you run) on the cluster, and this quota resets quarterly (ref).
BMI uses an accounting system named “Gold” to keep track of how many wall hours you have remaining. To check it, load the gold module, which provides the gbalance command; you should be on a cluster node (e.g., bmiclusterp2) when you do this:
module load gold
gbalance -h -u $USER
Leave off the -h if for some reason you want to see the output in seconds of wall time instead of the more human-readable hours.
The string $USER is an environment variable that always refers to the currently logged-in user (you!). You can replace $USER with another username if you want to spy on someone else’s cluster usage.
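For example (abc1de is a made-up username):
gbalance -h -u abc1de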
The columns in the output of gbalance are:
| Column Name | Meaning |
|---|---|
| Id | Unique user identifier |
| Name | Your username |
| Amount | Wall time requested for currently running jobs |
| Reserved | Wall time requested for currently pending jobs |
| Balance | Total wall time currently requested (sum of Amount and Reserved) |
| CreditLimit | Amount of wall time credited each quarter |
| Available | Wall time remaining after current requests |
Requesting a cluster account
Cluster accounts are now requested using a web form. The older email system below is outdated, but it may still be useful if the form is broken.
Here is a boilerplate request for a BMI HPC cluster account that you can update with your own information:
To: help@bmi.cchmc.org
Subject: New HPC cluster account
Hi everyone,
May I please have an HPC cluster account? My username is usr0xy (replace with your actual username).
Please also grant me the appropriate roles so that I can use the BMI HTTP proxy to download scientific data to the cluster.
Thanks very much.
--Your Name
Make sure to say “please” and “thank you,” and include your Children’s username, which is something like usr0xy (three letters of your last name, then three random characters). Also include with your request that you want BMI proxy access, and mention that they can look at account ern6xv if they need a template.