# pp-ana

An [OpenPMD](https://github.com/openPMD/openPMD-standard) Python post-processor using parallel I/O.
## Motivation

Realistic PIC simulations, for instance those involving 3D grids and/or collisions, usually produce large datasets that do not fit into memory on a single machine.

**pp-ana** uses parallel computing techniques to efficiently process arbitrarily large datasets produced by any [OpenPMD](https://github.com/openPMD/openPMD-standard)-compatible PIC code, such as [WarpX](https://github.com/ECP-WarpX/WarpX), [PIConGPU](https://github.com/ComputationalRadiationPhysics/picongpu), or [fbpic](https://github.com/fbpic/fbpic).

By leveraging [MPI (Message Passing Interface)](https://www.open-mpi.org/) and the [openpmd-api](https://openpmd-api.readthedocs.io/en/0.15.2/) library, **pp-ana** reads input datasets in chunks, which mitigates memory limitations on a single node. Furthermore, **pp-ana** runs well on High-Performance Computing clusters, allowing for scalable and efficient data processing.
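The chunked-read pattern boils down to giving each MPI rank a contiguous slice of the dataset. A minimal sketch of that decomposition (the function name and the even-splitting policy are illustrative, not pp-ana's actual code):

```python
def chunk_bounds(n_total: int, n_ranks: int, rank: int):
    """Return (offset, extent) of the contiguous chunk owned by `rank`.

    Elements are split as evenly as possible; the first
    n_total % n_ranks ranks each receive one extra element.
    """
    base, extra = divmod(n_total, n_ranks)
    offset = rank * base + min(rank, extra)
    extent = base + (1 if rank < extra else 0)
    return offset, extent

# 10 elements over 3 ranks -> (0, 4), (4, 3), (7, 3)
bounds = [chunk_bounds(10, 3, r) for r in range(3)]
```

Each rank would then pass its `(offset, extent)` pair to the openpmd-api chunk-loading call, so that only its own slice is ever resident in memory.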

## Overview

The **pp-ana** design follows a two-step approach:

1. The input OpenPMD simulation data, which usually consist of large datasets, are reduced using selections (kinematical, geometrical, ...), re-sampling, or particle-merging methods. The data is then stored as reduced datasets. This step is done once, or only a few times.
2. The post-processing (analysis), which usually consists in visualizing the data with different kinds of plots, can be performed by directly reading the reduced datasets provided by the first step. This step is usually run many times and thus naturally benefits from the smaller datasets to be read.

To implement this two-step approach, **pp-ana** provides two main post-processing components:

- (`opmd_filter.py`) the filtering program, which processes OpenPMD data files in parallel and generates [Parquet files](https://parquet.apache.org/) for both field and particle data.

- (`opmd_pq_reader.py`) the main analysis program, which reads the generated [Parquet files](https://parquet.apache.org/) to produce histograms and visualizations ([matplotlib](https://matplotlib.org/)).

The code is designed to leverage parallel processing using [MPI (Message Passing Interface)](https://www.open-mpi.org/) via the [mpi4py python interface](https://mpi4py.readthedocs.io/en/stable/) and [Dask](https://www.dask.org/) for efficient data handling.
## Workflow

1. **Data Processing**: The filtering script reads the simulation data from OpenPMD files in parallel, extracts/selects the relevant field and particle information, and saves it in an efficient structured columnar format, [Parquet](https://parquet.apache.org/), for further analysis.

2. **Parallel Computing**: The code utilizes MPI for parallel processing, allowing multiple processes to work on different parts of the data simultaneously. This is particularly useful for large datasets where reducing memory usage per node is essential.
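A common way to realize this parallel step is a static round-robin split of the work items (files or iterations) over the ranks. This sketch is illustrative and not necessarily pp-ana's actual scheduling; the file names are invented for the example:

```python
def assign_round_robin(items, n_ranks):
    """Give each rank every n_ranks-th work item, starting at its own index."""
    return {rank: items[rank::n_ranks] for rank in range(n_ranks)}

files = ["data_000.bp", "data_100.bp", "data_200.bp", "data_300.bp"]
owned = assign_round_robin(files, 2)
# rank 0 -> data_000.bp, data_200.bp ; rank 1 -> data_100.bp, data_300.bp
```

With MPI, each rank would look up its own entry via `comm.Get_rank()` and process only those files.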

3. **Data Storage**: The use of [Parquet files](https://parquet.apache.org/) provides an efficient way to store large amounts of data with support for compression, making it easier to read and write data in a distributed environment. This format has proven to be very efficient on the [Lustre filesystem](https://www.lustre.org/) installed on the [GSI Virgo cluster](https://hpc.gsi.de/virgo/).

4. **Data Analysis**: The analysis code reads the Parquet files and performs various analyses, including generating histograms of particle energy and field data visualizations.
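The particle-energy part of the analysis reduces to a standard relativistic formula plus a NumPy histogram. A sketch, assuming momenta are stored normalized as u = p/(m·c) and taking the electron rest energy; the momentum values are invented:

```python
import numpy as np

# Normalized momenta u = p / (m c) for a few macro-particles.
ux = np.array([0.0, 1.0, 2.0])
uy = np.zeros(3)
uz = np.zeros(3)

gamma = np.sqrt(1.0 + ux**2 + uy**2 + uz**2)  # Lorentz factor
mc2_mev = 0.511                               # electron rest energy in MeV
ekin_mev = (gamma - 1.0) * mc2_mev            # kinetic energy per particle

counts, edges = np.histogram(ekin_mev, bins=4, range=(0.0, 1.0))
```

`counts` and `edges` would then be drawn with matplotlib, e.g. `plt.stairs(counts, edges)`.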
## Main components

### Filtering Script (`opmd_filter.py`)

- **Command Line Arguments**:
  - `--opmd_dir` or `-d`: Directory containing OpenPMD input files.
  - `--opmd_file` or `-f`: Specific OpenPMD file to process (optional).

  - `--output_dir` or `-o`: Directory to save the output [Parquet files](https://parquet.apache.org/).

  - `--species` or `-s`: Particle species name (default: "electrons").
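The options above map onto a standard `argparse` setup. The following is a sketch of an equivalent interface, not a copy of the script's actual code:

```python
import argparse

parser = argparse.ArgumentParser(description="Filter OpenPMD data (sketch).")
parser.add_argument("--opmd_dir", "-d", help="directory with OpenPMD input files")
parser.add_argument("--opmd_file", "-f", help="specific OpenPMD file (optional)")
parser.add_argument("--output_dir", "-o", help="directory for the Parquet output")
parser.add_argument("--species", "-s", default="electrons",
                    help="particle species name")

# Parse an example command line instead of sys.argv:
args = parser.parse_args(["-d", "./sim", "-o", "./out"])
```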

- **Implemented Features**:

  - Traverses the specified OpenPMD directory to find relevant simulation files.
  - Reads electric field data (Ex, Ey, Ez) and particle data (positions and momenta).
  - Normalizes and filters particle data based on energy thresholds.

  - Saves electric field and particle data as [Parquet files](https://parquet.apache.org/), with metadata for field information.
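The energy-threshold filtering amounts to a boolean mask over the particle arrays; a minimal sketch (the threshold and energy values are invented):

```python
import numpy as np

# Illustrative per-particle kinetic energies (MeV) and a lower cut.
ekin_mev = np.array([0.05, 0.8, 2.3, 0.01, 5.0])
threshold_mev = 0.5

mask = ekin_mev > threshold_mev   # True for particles passing the cut
selected = ekin_mev[mask]         # reduced dataset written to Parquet
```

The same mask would be applied to every per-particle column (positions and momenta) so the reduced dataset stays consistent.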

### Analysis Script (`opmd_pq_reader.py`)

- **Command Line Arguments**:
  - `--pq_dir` or `-d`: Directory containing the Parquet files.
  - `--output_dir` or `-o`: Directory to save the output plots.
  - `--opmd_file` or `-f`: Specific OpenPMD file to analyze.
  - `--species` or `-s`: Particle species name (default: "electrons").
  - `--analyze` or `-a`: Type of analysis to perform: 'field', 'particle', or 'full' (default: 'full').

- **Implemented Features**:

  - Initializes analyzers for field and particle data based on user input.

  - Reads particle data and calculates energy, generating 2D/1D histograms of particle distributions.

  - Analyzes the divergence of particle momenta.

  - Reads electric/magnetic field data and generates visualizations for any 2D projection, sliced at the midpoint of the non-visible direction.
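The divergence can be sketched as the angle between a particle's momentum and the propagation axis (taking z as that axis here; this is a common convention, not necessarily the script's exact definition, and the momentum values are invented):

```python
import numpy as np

# Normalized momenta; z is assumed to be the propagation axis.
ux = np.array([0.0, 0.1, -0.1])
uz = np.ones(3)

theta_x = np.arctan2(ux, uz)   # divergence angle in the x-z plane (rad)
theta_x_mrad = theta_x * 1e3   # often quoted in milliradians
```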

## Requirements
- Python 3.x
- Required libraries:
  - `numpy`
  - `pandas`
  - `dask`
  - `mpi4py`
  - `pyarrow`
  - `scipy`
  - `openpmd-api`
  - `openpmd-viewer`

You can install the required libraries using pip:
```bash
pip install numpy pandas dask mpi4py pyarrow scipy openpmd-api openpmd-viewer
```
## Installation
### Clone the repository:
```bash
git clone https://git.gsi.de/d.bertini/pp-ana
cd pp-ana
```
The main components are located in the `analysis/` directory.
## Usage
### Run the filtering process
```bash
mpirun -np <num_processes> python opmd_filter.py -d <opmd_directory> -f <opmd_file> -o <output_directory>
```
- `mpirun -np <num_processes>`: Specifies the number of parallel processes to run. Replace `<num_processes>` with the desired number of MPI processes.
- `python opmd_filter.py`: The command to execute the filtering script.
- `-d <opmd_directory>` or `--opmd_dir <opmd_directory>`: The directory containing the OpenPMD input files. Replace `<opmd_directory>` with the path to your OpenPMD data.
- `-f <opmd_file>` or `--opmd_file <opmd_file>`: (Optional) The specific OpenPMD file to process. If not provided, the script processes all files in the specified directory.
- `-o <output_directory>` or `--output_dir <output_directory>`: The directory where the output Parquet files will be saved. Replace `<output_directory>` with the desired output path.
- `-s <species>` or `--species <species>`: (Optional) The particle species name to filter (default: "electrons"). Replace `<species>` with the desired species name.
### Run the analysis process
```bash
mpirun -np <num_processes> python opmd_pq_reader.py -d <parquet_directory> -o <output_directory> -f <opmd_file> -a <analysis_type>
```
- `mpirun -np <num_processes>`: Specifies the number of parallel processes to run. Replace `<num_processes>` with the desired number of MPI processes.
- `python opmd_pq_reader.py`: The command to execute the analysis script.
- `-d <parquet_directory>` or `--pq_dir <parquet_directory>`: The directory containing the Parquet files generated by the filtering script. Replace `<parquet_directory>` with the path to your Parquet files.
- `-o <output_directory>` or `--output_dir <output_directory>`: The directory where the output plots will be saved. Replace `<output_directory>` with the desired output path.
- `-f <opmd_file>` or `--opmd_file <opmd_file>`: The specific OpenPMD file to analyze. This should match the file processed in the filtering step.
- `-s <species>` or `--species <species>`: (Optional) The particle species name to analyze (default: "electrons"). Replace `<species>` with the desired species name.
- `-a <analysis_type>` or `--analyze <analysis_type>`: Specifies which type of analysis to run. Options include:
  - `field`: Analyze only the electric field data.
  - `particle`: Analyze only the particle data.
  - `full`: Perform both field and particle analyses (default).
### Examples
- To filter data from a specific OpenPMD file:
```bash
mpirun -np 4 python opmd_filter.py -d /path/to/opmd_data -f simulation.bp -o /path/to/output
```
- To analyze the generated Parquet files and create histograms:
```bash
mpirun -np 4 python opmd_pq_reader.py -d /path/to/output/simulation/ -o /path/to/plots/ -f simulation.bp -a full
```
## Acknowledgments

Special thanks to the [openPMD](https://github.com/openPMD) community, and particularly the [openPMD-api](https://github.com/openPMD/openPMD-api) developers, for their support and feedback.