# pp-containers
In this repository you will find the [apptainer](https://apptainer.org/) definition files used to build
the apptainer containers that can be used for plasma PIC simulations on the **virgo3** cluster.

The containers are all built on top of [Rocky Linux v8.10](https://rockylinux.org/news/rocky-linux-8-8-ga-release/), which is
similar to the system used by all **virgo3** nodes.

The containers provide a self-consistent environment to perform PIC simulations using either
 - [EPOCH 1d, 2d, 3d](https://github.com/Warwick-Plasma/epoch) 
 - [WarpX 1d, 2d, rz, 3d](https://github.com/ECP-WarpX/WarpX)
 
All the containers have been tested and guarantee
  - full-speed InfiniBand interconnect communication
  - MPI I/O optimised for the Lustre filesystem


## Software stack highlights
Containers provide the **latest** versions of both the required libraries/dependencies and the PIC codes.
The latest versions are always installed in the **/dev** subdirectory, while former releases are kept in
the **/prod** and **/old** subdirectories.

The **/dev** (latest) software stack includes:
- Linux 
  - [RockyLinux v8.10](https://docs.rockylinux.org/release_notes/8_8/)
- Lustre
  - [Lustre fs client v2.15.5](http://downloads.whamcloud.com/public/lustre/lustre-2.15.5/el8.10/client)

- MPI
  - [openMPI v5.0.5](https://www.open-mpi.org/software/ompi/v5.0/)
  - [PMix v5.0.3](https://github.com/openpmix/openpmix)
  - [UCX v1.17.0](https://github.com/openucx/ucx)

- I/O Libraries
  - [HDF5 v1.14.4](https://www.hdfgroup.org/solutions/hdf5/)
  - [ADIOS2 2.10.1](https://github.com/ornladios/ADIOS2)

- PIC codes
  - [EPOCH v4.19.0](https://github.com/Warwick-Plasma/epoch/releases/tag/v4.19.0)
     - epoch1d,2d,3d
     - epoch1d_lstr,2d_lstr,3d_lstr (increased string length)
  - [WarpX v24.08](https://github.com/ECP-WarpX/WarpX)
     - warpx_1d,2d,3d
	 
- [OpenPMD-api v0.15.2](https://github.com/openPMD/openPMD-api.git)

- [Python](https://www.python.org/)
  - v3.6.8 (Rocky Linux 8.10 native version)
  - v3.12.1 (latest version)
  - scientific packages (numpy, matplotlib, lmfit, yt, etc.)
  - `sdfutils` bindings for `Epoch`
  - `openpmd-api` bindings
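
As a quick sanity check, the bindings can be imported with the container's default `python3` (a minimal sketch, assuming one of the CPU containers listed in the Availability section below):

```
# Verify that the Python bindings are visible inside the container
export CONT=/cvmfs/phelix.gsi.de/sifs/cpu/dev/rlx8_ompi_ucx.sif
singularity exec $CONT python3 -c "import numpy, sdf, openpmd_api; print('bindings OK')"
```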
  
## Availability
Containers are available on `/cvmfs/phelix.gsi.de/sifs/`:

### [Epoch](https://github.com/Warwick-Plasma/epoch/releases/tag/v4.19.0) + [WarpX](https://github.com/ECP-WarpX/WarpX)
-  `GCC 8.5 + python v3.6.8`   (RockyLinux 8.10 system compiler+python)
   - `/cvmfs/phelix.gsi.de/sifs/cpu/dev/rlx8_ompi_ucx.sif`
   - `/cvmfs/phelix.gsi.de/sifs/cpu/dev/rlx8_ompi_ucx_dask.sif` 

-  `GCC 13.2 + python v3.12.1` (latest GNU compiler + Python versions)
   - `/cvmfs/phelix.gsi.de/sifs/cpu/dev/rlx8_ompi_ucx_gcc13_py312.sif`
-  `FLASH dedicated container using the latest software stack`
   - `/cvmfs/phelix.gsi.de/sifs/cpu/dev/rlx8_ompi_ucx_gcc13_flash.sif`

### GPUs (AMD Instinct MI100)
### [WarpX](https://github.com/ECP-WarpX/WarpX)
-  `GCC 8.5 + python v3.6.8 + ROCm 6.2`   (RockyLinux 8.10 system compiler+python)
   - `/cvmfs/phelix.gsi.de/sifs/gpu/dev/rlx8_rocm-6.2_warpx.sif`
### [PicOnGPU](https://github.com/ComputationalRadiationPhysics/picongpu)
-  `GCC 8.5 + python v3.6.8 + ROCm 5.7.1`   (RockyLinux 8.10 system compiler+python)
   - `/cvmfs/phelix.gsi.de/sifs/gpu/prod/rlx8_rocm-5.7.1_picongpu.sif`
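
All published images can be listed directly from the read-only `/cvmfs` mount, for example:

```
# Browse the published CPU and GPU container images
ls /cvmfs/phelix.gsi.de/sifs/cpu/dev/
ls /cvmfs/phelix.gsi.de/sifs/gpu/dev/
```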


## Apptainer in unprivileged user namespace mode
A new version of `Apptainer` (v1.3) running in unprivileged user namespace mode is now installed on Virgo3.

In order to avoid possible user namespace exhaustion and to benefit from the full
MPI intra-node optimizations, MPI users need to use the so-called
Apptainer sharens mode.

To do that, MPI users should add the following environment variables in
their submit script:

```
export APPTAINER_SHARENS=true
export APPTAINER_CONFIGDIR=/tmp/$USER
```

Setting `APPTAINER_SHARENS` to `true` tells `Apptainer` to switch to sharens mode.
All the MPI processes spawned on one node are then moved into the same user namespace,
defined by a unique Apptainer instance created on that node.

`APPTAINER_CONFIGDIR` is the location where the metadata
of the unique Apptainer instance is stored. This information does not need
to be shared between nodes, so one can safely use `/tmp/$USER`.
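
As an optional sanity check, one can verify from within a job allocation that the shared instance exists on the node (a minimal sketch; instance names are assigned by Apptainer and may vary):

```
# With sharens mode active, a single shared instance per node should be listed
export APPTAINER_SHARENS=true
export APPTAINER_CONFIGDIR=/tmp/$USER
apptainer instance list
```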

## Getting started with pp-containers 
To use a container you first need to log in to **virgo3** on a baremetal submit host:

```
ssh user_name@virgo3.hpc.gsi.de
``` 

From the baremetal submit node, you can submit a job with the environment defined in
your own container.

To do that, just modify your submit script as follows:

```
# PP RLX8 container with gcc 8.5.0
export CONT=/cvmfs/phelix.gsi.de/sifs/cpu/dev/rlx8_ompi_ucx.sif

export APPTAINER_BINDPATH=/lustre/rz/dbertini/,/cvmfs
export APPTAINER_SHARENS=true
export APPTAINER_CONFIGDIR=/tmp/$USER

# openMPI I/O module 
export OMPI_MCA_io=romio341

# run your application as if it was installed on the host !
echo "." | srun --export=ALL -- $CONT epoch3d
```
And that's it!

The scheduling is handled by the SLURM installation on the baremetal host; for all the rest
(execution of the MPI processes, I/O and the PIC code itself) the software stack installed inside the container
takes over.
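
For reference, a complete submit script could look like the following (a sketch only: the partition name, node counts and time limit are placeholders to adapt to your project and resources):

```
#!/bin/bash
#SBATCH --partition=main        # placeholder partition name
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH --time=02:00:00

# Containerized environment
export CONT=/cvmfs/phelix.gsi.de/sifs/cpu/dev/rlx8_ompi_ucx.sif
export APPTAINER_BINDPATH=/lustre/rz/dbertini/,/cvmfs
export APPTAINER_SHARENS=true
export APPTAINER_CONFIGDIR=/tmp/$USER

# openMPI I/O module
export OMPI_MCA_io=romio341

# launch the containerized PIC code (epoch reads its data directory from stdin)
echo "." | srun --export=ALL -- $CONT epoch3d
```

It is then submitted as usual with `sbatch`.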

## Interaction with the container
Once your data is produced, you can analyse it using the same containerized environment, since it also provides
the necessary Python libraries:
```
[dbertini@lxbk1131 ~]$ singularity exec /cvmfs/phelix.gsi.de/sifs/cpu/rlx8_ompi_ucx.sif bash -l  
Centos system profile loaded ...
Apptainer> python3 --version
Python 3.6.8
Apptainer> python3
Python 3.6.8 (default, Feb 21 2023, 16:57:46) 
[GCC 8.5.0 20210514 (Red Hat 8.5.0-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> import matplotlib
>>> import sdf
...
```
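
The same environment can also be used non-interactively for post-processing, e.g. to run one of your analysis scripts stored on Lustre (a sketch; the script name and Lustre path are hypothetical examples):

```
# run a (hypothetical) analysis script with the containerized Python stack
export CONT=/cvmfs/phelix.gsi.de/sifs/cpu/dev/rlx8_ompi_ucx.sif
singularity exec -B /lustre -B /cvmfs $CONT python3 /lustre/rz/$USER/analysis/my_analysis.py
```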


## GPU-WarpX container
A dedicated container can be used to run the latest [WarpX 24.01](https://github.com/ECP-WarpX/WarpX/tree/development)
on the virgo3 `gpu` partition.
The container is built on top of the standard `rlx8` container, featuring a Rocky Linux 8.10 system with the common additional
libraries (openMPI + I/O libraries).
The latest Radeon Open Compute [ROCm 6.0](https://github.com/RadeonOpenCompute/ROCm)
stack is installed and has been tested on the AMD Instinct MI100 GPUs available on the
virgo3 `gpu` partition.

To submit on the `gpu` partition, example scripts are available in the `gpu_scripts` directory.
The submission is similar to the one used for the CPU-based partitions, i.e.

```
#!/bin/bash

# Define container image and working directory
export CONT=/cvmfs/phelix.gsi.de/sifs/gpu/dev/rlx8_rocm-6.0_warpx.sif
export WDIR=/lustre/rz/dbertini/gpu/warpx

# Define I/O module for openMPI 
export OMPI_MCA_io=romio341

# Define apptainer external filesystem bindings 
export APPTAINER_BINDPATH=/lustre/rz/dbertini/,/cvmfs
export APPTAINER_SHARENS=true
export APPTAINER_CONFIGDIR=/tmp/$USER

# Executable with dimensionality and corresponding input deck file.
srun --export=ALL -- $CONT warpx_2d  $WDIR/scripts/inputs/warpx_opmd_deck

```

## New Container directory layout on `/cvmfs`
The container directory layout now provides `old`, `prod` and `dev` directories.
It is recommended to always use the `dev` containers since they feature the latest
openMPI software stack versions.
Newly produced containers are pushed to the `dev` directory, waiting for validation by
the user community. After validation they are moved to `prod`, and the previous `prod` containers
are then moved to the `old` directory.

CVMFS directory layout (as of `Tue Sep  3 12:42:56 CEST 2024`):

```
[dbertini@lxbk0724 /cvmfs/phelix.gsi.de/sifs]$ tree
.
├── cpu
│   ├── dev
│   │   ├── rlx8_ompi_ucx_dask.sif
│   │   ├── rlx8_ompi_ucx_flash.sif
│   │   ├── rlx8_ompi_ucx_gcc13_py312.sif
│   │   └── rlx8_ompi_ucx.sif
│   ├── old
│   └── prod
│       ├── rlx8_ompi_ucx_gcc12.sif
│       └── rlx8_ompi_ucx.sif
└── gpu
    ├── dev
    │   └── rlx8_rocm-6.2_warpx.sif
    ├── old
    │   ├── rlx8_rocm-5.4.6.def
    │   ├── rlx8_rocm-5.4.6.sif
    │   ├── rlx8_rocm-5.4.6_warpx.def
    │   ├── rlx8_rocm-5.4.6_warpx.sif
    │   ├── ubuntu-20.04_rocm-5.4.2_picongpu.def
    │   ├── ubuntu-20.04_rocm-5.4.2_picongpu.sif
    │   ├── ubuntu-20.04_rocm-5.4.2_warpx.def
    │   └── ubuntu-20.04_rocm-5.4.2_warpx.sif
    └── prod
        ├── rlx8_rocm-5.7.1_picongpu.sif
        ├── rlx8_rocm-5.7.1_warpx_aware.sif
        ├── rlx8_rocm-6.0_warpx_aware.sif
        └── rlx8_rocm-6.0_warpx.sif

8 directories, 19 files
```

## Compiling and installing your own packages
One can compile and install their own packages using the provided containers, e.g.

```
/cvmfs/phelix.gsi.de/sifs/cpu/dev/rlx8_ompi_ucx.sif
```

This container provides all the components necessary to self-install user packages.
To compile within the container environment you first need to load the container:

```
> export CONT=/cvmfs/phelix.gsi.de/sifs/cpu/dev/rlx8_ompi_ucx.sif
> singularity exec -B /lustre -B /cvmfs $CONT bash -l
Apptainer>
```
The `Apptainer>` prompt means that the created shell now contains the containerized environment.
You can get the normal Unix prompt back by typing the `bash` command once:

```
Apptainer> bash
[dbertini@lxbk1130 /lustre/rz/dbertini]$
```

You can check, for example, which versions are available within the environment:

```
[dbertini@lxbk1130 /lustre/rz/dbertini]$ singularity exec rlx8_ompi_ucx.sif bash -l 
 
RLX system profile loaded ...
[pp_container]/lustre/rz/dbertini]$ g++ --version
g++ (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20)
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[pp_container]/lustre/rz/dbertini] $  ompi_info | grep ucx
  Configure command line: '--prefix=/usr/local' '--with-pmix=/usr/local' '--with-libevent=/usr' '--with-ompi-pmix-rte' '--with-orte=no' '--disable-oshmem' '--enable-mpirun-prefix-by-default' '--enable-shared' '--without-verbs' '--with-hwloc' '--with-ucx=/usr/local/ucx' '--with-lustre' '--with-slurm' '--enable-mca-no-build=btl-uct'
                 MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v5.0.1)
                 MCA pml: ucx (MCA v2.1.0, API v2.1.0, Component v5.0.1)

```

Installation of packages should be done on the Lustre shared filesystem, which is accessible from every node in the cluster.
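
For a CMake-based package, a typical build inside the container shell could look like this (a sketch; the source and install paths are hypothetical examples):

```
# run inside the container shell, installing to a prefix on Lustre
export PREFIX=/lustre/rz/$USER/local
cd /lustre/rz/$USER/src/my_package
cmake -S . -B build -DCMAKE_INSTALL_PREFIX=$PREFIX
cmake --build build -j 8
cmake --install build
```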

To run the self-installed package, use the following commands in your `run-file.sh`:

```
export CONT=/cvmfs/phelix.gsi.de/sifs/cpu/dev/rlx8_ompi_ucx.sif
export WDIR=/lustre/rz/dbertini/warpx

export OMPI_MCA_io=romio341
export APPTAINER_BINDPATH=/lustre/rz/dbertini/,/cvmfs
export APPTAINER_SHARENS=true
export APPTAINER_CONFIGDIR=/tmp/$USER

srun --export=ALL -- singularity exec -B /lustre -B /cvmfs  $CONT <my_compiled_exec>  <options>

```