Interactive Jobs
If you want to test software that requires a GUI, MPI/parallel execution, or multiple threads, interactive jobs can be useful.
Note that if you need a GUI, you'll need to ssh with -X (X11 forwarding).
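For example, to log in with X11 forwarding enabled (same username/host as in the tunnel command below):
ssh -X avajpeyi@ozstar.swin.edu.au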
Run the following to start an interactive job:
sinteractive --ntasks 1 --nodes 1 --time 00:30:00 --mem 2GB
Once resources are allocated, you'll be placed on a different machine (a compute node) for your interactive session. You will need to re-load your modules there.
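For example (module versions copied from the Jupyter steps below):
hostname  # confirm you have landed on a compute node, not the login node
module load git/2.18.0 gcc/9.2.0 openmpi/4.0.2 python/3.8.5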
Jupyter notebooks + Slurm
In an interactive job session you can open a Jupyter notebook with the following steps:
Source the environments for your interactive session. For example, you may run the following:
source ~/.bash_profile
module load git/2.18.0 gcc/9.2.0 openmpi/4.0.2 python/3.8.5
source venv/bin/activate
Set up the tunnel + Jupyter instance on the cluster
To do this, run the following:
ipnport=$(shuf -i8000-9999 -n1)
ipnip=$(hostname -i)
echo "Run on local >>> ssh -N -L $ipnport:$ipnip:$ipnport avajpeyi@ozstar.swin.edu.au"
jupyter-notebook --no-browser --port=$ipnport --ip=$ipnip
Local connection to interactive job
- Run the command echoed above
- Open the link to the Jupyter notebook (printed in the previous window)
Run exit when done. Otherwise the job will keep running, hogging resources, until it times out.
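To confirm the interactive job has actually ended, you can check the queue (squeue is used the same way in the sequential-jobs section below):
squeue -u $USER  # the interactive job should no longer be listed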
For convenience, I have added the following to my OzStar .bash_profile:
# Interactive Jupyter notebooks
alias start_ijob="sinteractive --ntasks 2 --time 01:00:00 --mem 4GB"
start_jupyter () {
ipnport=$(shuf -i8000-9999 -n1)
ipnip=$(hostname -i)
echo "Run on local >>>"
echo "ssh -N -L $ipnport:$ipnip:$ipnport avajpeyi@ozstar.swin.edu.au"
    jupyter-notebook --no-browser --port=$ipnport --ip=$ipnip
}
export -f start_jupyter
This allows me to start an interactive job with start_ijob and start the Jupyter notebook with start_jupyter.
Plot CPU hours used for jobs
Academics should try to be cognizant of the energy impact of their jobs.
The following creates a file jobstats.txt that contains the CPU time (seconds) and start time of each job run between the start and end dates specified:
sacct -S 2021-01-01 -E 2021-10-06 -u avajpeyi -X -o "jobname%-40,cputimeraw,start" --parsable2 > jobstats.txt
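With --parsable2 the file is pipe-delimited; it should look something like this (job names and values here are made up for illustration):
JobName|CPUTimeRAW|Start
analysis_0|3600|2021-05-01T10:00:00
analysis_1|7200|2021-05-02T11:30:00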
To plot the data, you can use the following:
""" Plots total number of CPU hours used | |
To create a "jobstats.txt" run somthing like the following: | |
> sacct -S 2020-01-01 -E 2021-10-06 -u avajpeyi -X -o "jobname%-40,cputimeraw,start" --parsable2 > jobstats.txt | |
""" | |
import matplotlib.pyplot as plt | |
import numpy as np | |
import pandas as pd | |
from humanize import intword | |
from matplotlib import ticker | |
plt.style.use( | |
"https://gist.githubusercontent.com/avivajpeyi/4d9839b1ceb7d3651cbb469bc6b0d69b/raw/4ee4a870126653d542572372ff3eee4e89abcab0/publication.mplstyle") | |
plt.rcParams['axes.grid']= False | |
FNAME = "jobstats.txt" | |
SEC_IN_HR = 60.0 * 60.0 | |
def read_file(): | |
with open(FNAME, 'r') as f: | |
filecontents = f.read().split("\n") | |
header = filecontents[0].split("|") | |
data = filecontents[1:] | |
data = [d for d in data if len(d) > 1] | |
data = np.array([np.array(row.split("|")) for row in data]) | |
data = data.T | |
data_dict = {header[i]: data[i] for i in range(len(header))} | |
data = pd.DataFrame(data_dict) | |
data['CPUTimeRAW'] = data['CPUTimeRAW'].astype('float64') | |
data['CPU Hrs'] = data['CPUTimeRAW'] / SEC_IN_HR | |
data['Start'] = pd.to_datetime(d['Start'], format='%Y-%m-%dT%H:%M:%S') | |
return data | |
def get_total_cpu_hrs(data): | |
return np.sum(data['CPUTimeRAW'].values) / SEC_IN_HR | |
def plot_time(data): | |
plt.figure(figsize=(4, 3)) | |
data['CPU Hrs'] = data['CPUTimeRAW'] / SEC_IN_HR | |
hrs = data['CPU Hrs'] | |
hrs = hrs[hrs > 0.01] | |
min_h, max_h = min(hrs), max(hrs) | |
plt.hist(hrs, density=False, bins=np.geomspace(min_h, max_h, 100)) | |
plt.xlabel("CPU Hrs") | |
plt.xlim(left=min_h) | |
plt.yscale('log') | |
plt.xscale('log') | |
plt.ylabel("Jobs") | |
plt.title(f"Total: {intword(get_total_cpu_hrs(data), '%.1f')} Hr") | |
plt.tight_layout() | |
plt.savefig('cpuhrs_hist.png') | |
def bin_dates_data(df, delta=5): | |
delta = timedelta(days=delta) | |
res = {} | |
# end of first bin: | |
binstart = df['Start'][0] | |
bin_key = str(binstart) | |
res[bin_key] = 0 | |
# iterate through the data item | |
for i, row in df.iterrows(): | |
cur_date, cur_data = row['Start'], row['CPU Hrs'] | |
# if the data item belongs to this bin, append it into the bin | |
if cur_date < binstart + delta: | |
res[bin_key] = res.get(bin_key,0) + cur_data | |
continue | |
# otherwise, create new empty bins until this data fits into a bin | |
binstart += delta | |
bin_key = str(binstart) | |
while cur_date > binstart + delta: | |
res[bin_key] = 0 | |
binstart += delta | |
bin_key = str(binstart) | |
# create a bin with the data | |
res[bin_key] = res.get(bin_key,0) + cur_data | |
date_bins, cpu_hrs = list(res.keys()), list(res.values()) | |
return date_bins, cpu_hrs | |
def format_date_ticklabel(d): | |
d = datetime.strptime(d, "%Y-%m-%d %H:%M:%S") | |
return d.strftime("%b, '%y") | |
def plot_cpu_timseries(data, delta=20): | |
date_bins, cpu_hrs = bin_dates_data(data, delta=delta) | |
fig, ax = plt.subplots(1,1, figsize=(4,2.5)) | |
ax.bar(date_bins, cpu_hrs,width=1) | |
plt.xticks(rotation=-45) | |
num_bins = len(date_bins) | |
ticks = [i for i in range(0, num_bins, int(num_bins/5))] | |
labels = [format_date_ticklabel(date_bins[i]) for i in ticks] | |
ax.set_xticks(ticks) | |
ax.set_xticklabels(labels) | |
ax.set_yscale('log') | |
plt.minorticks_off() | |
ax.set_ylim(bottom=0.3) | |
ax.set_ylabel("Hrs") | |
plt.grid(visible=False) | |
plt.savefig('cpuhrs_timeseries.png') | |
def main(): | |
data = read_file() | |
total_hrs = get_total_cpu_hrs(data) | |
print(f"Total CPU hrs: {total_hrs:.2f}") | |
plot_time(data) | |
plot_cpu_timseries(d) | |
if __name__ == '__main__': | |
main() |
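Assuming the script is saved as plot_jobstats.py (the name is arbitrary) in the same directory as jobstats.txt, running it prints the total and writes both plots:
python plot_jobstats.py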

Downloading/Uploading data
Slurm job with data-download
The nodes with the fastest network speeds are the data-mover nodes.
The compute nodes don't have an internet connection, so any jobs that require data download should do it as a pre-processing step on the data-mover nodes.
For example:
#!/bin/bash
#
#SBATCH --job-name={{jobname}}
#SBATCH --output={{log_dir}}/download_%A_%a.log
#SBATCH --ntasks=1
#SBATCH --time={{time}}
#SBATCH --mem={{mem}}
#SBATCH --cpus-per-task={{cpu_per_task}}
#SBATCH --partition=datamover
#SBATCH --array=0-4
module load {{module_loads}}
source {{python_env}}
ARRAY_ARGS=(0 1 2 3 4)
srun download_dataset ${ARRAY_ARGS[$SLURM_ARRAY_TASK_ID]}
Rsync data from/to OzStar
rsync -avPxH --no-g --chmod=Dg+s <LOCAL_PATH> avajpeyi@data-mover01.hpc.swin.edu.au:/fred/<OZ_PROJ>
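The same command works in reverse to pull results back to your machine (placeholders as above, plus a hypothetical <REMOTE_PATH>):
rsync -avPxH avajpeyi@data-mover01.hpc.swin.edu.au:/fred/<OZ_PROJ>/<REMOTE_PATH> <LOCAL_PATH>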
Sequential jobs
Say you want to trigger sequential jobs (like a DAG). You will need the JobID for this, along with --dependency=afterany:<JOBID>. For example:
==> submit.sh <==
#!/bin/bash
ANALYSIS_FN=('slurm_analysis_0.sh' 'slurm_analysis_1.sh')
POST_FN='slurm_post.sh'
JOB_IDS=()
for index in ${!ANALYSIS_FN[*]}; do
echo "Submitting ${ANALYSIS_FN[$index]}"
JOB_ID=$(sbatch --parsable ${ANALYSIS_FN[$index]})
JOB_IDS+=($JOB_ID)
done
IDS="${JOB_IDS[@]}"
IDFORMATTED=${IDS// /:}
echo "Submitting ${POST_FN}"
echo "sbatch --dependnecy=afterany:${IDFORMATTED} ${POST_FN}"
sbatch --dependency=afterany:$IDFORMATTED $POST_FN
squeue -u $USER -o '%.4u %.20j %.10A %.4C %.10E %R'
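Note that afterany launches the post job no matter how the analysis jobs finished. If the post-processing should only run when every analysis succeeds, Slurm's afterok dependency type can be used instead:
sbatch --dependency=afterok:$IDFORMATTED $POST_FN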
The submission scripts:
==> slurm_analysis_0.sh <==
#!/bin/bash
#
#SBATCH --job-name=analysis_0
#SBATCH --output=out.log
#
#SBATCH --ntasks=1
#SBATCH --time=0:01:00
#SBATCH --mem=100MB
#SBATCH --cpus-per-task=1
module load git/2.18.0 gcc/9.2.0 openmpi/4.0.2 python/3.8.5
echo "analysis 0"
==> slurm_analysis_1.sh <==
#!/bin/bash
#
#SBATCH --job-name=analysis_1
#SBATCH --output=out.log
#
#SBATCH --ntasks=1
#SBATCH --time=0:01:00
#SBATCH --mem=100MB
#SBATCH --cpus-per-task=1
module load git/2.18.0 gcc/9.2.0 openmpi/4.0.2 python/3.8.5
echo "analysis 1"
==> slurm_post.sh <==
#!/bin/bash
#
#SBATCH --job-name=post
#SBATCH --output=out.log
#
#SBATCH --ntasks=1
#SBATCH --time=0:01:00
#SBATCH --mem=100MB
#SBATCH --cpus-per-task=1
module load git/2.18.0 gcc/9.2.0 openmpi/4.0.2 python/3.8.5
echo "post"