# pp-ana
[OpenPMD](https://github.com/openPMD/openPMD-standard) python post processor using parallel I/O.

## Overview
**pp-ana** enables the reading of large PIC simulation [openPMD](https://github.com/openPMD/openPMD-standard) datasets.
It internally uses the [openpmd-api](https://openpmd-api.readthedocs.io/en/0.15.2/) for parallel
reading/writing.

The **pp-ana** design follows a two-step approach:
1. The simulation input openPMD data, which usually consist of large datasets, are reduced using selections (kinematical, geometrical, ...), re-sampling, or particle-merging methods. The data is then stored as reduced datasets. This step is done once or only a few times.
2. The post-processing (analysis), which usually consists in visualizing the data using different kinds of plots, can be performed by reading directly the reduced datasets provided by the first step. This step is usually done many times and therefore benefits naturally from the reduced datasets.

To implement this two-step approach, **pp-ana** contains two main post-processing components:
- `opmdfilter.py`: the filtering program that processes OpenPMD data files in parallel and generates Parquet files for both field and particle data.
- `analyze.py`: the main analysis program that reads the generated Parquet files to produce histograms and visualizations (matplotlib).

The code is designed to leverage parallel processing using [MPI (Message Passing Interface)](https://www.open-mpi.org/) via the [mpi4py python interface](https://mpi4py.readthedocs.io/en/stable/) and [Dask](https://www.dask.org/) for efficient data handling.
## Workflow
1. **Data Processing**: The filtering script reads the simulation data from OpenPMD files in parallel, extracts/selects the relevant field and particle information, and saves it in an efficient structured columnar format (Parquet) for further analysis.
2. **Parallel Computing**: The code utilizes MPI for parallel processing, allowing multiple processes to work on different parts of the data simultaneously. This is particularly useful for large datasets where reducing memory usage per node is essential.
3. **Data Storage**: The use of Parquet files provides an efficient way to store large amounts of data with support for compression, making it easier to read and write data in a distributed environment. This format has proven to be very efficient on the [lustre filesystem](https://www.lustre.org/) installed on the [GSI Virgo cluster](https://hpc.gsi.de/virgo/).
4. **Data Analysis**: The analysis code reads the Parquet files and performs various analyses, including generating histograms of particle energy and visualizations of the field data.
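The parallel-computing step typically distributes the input files (or iterations) across MPI ranks. A minimal sketch of such an assignment; the helper name and round-robin scheme are illustrative, not pp-ana's actual implementation:

```python
# Hypothetical helper: round-robin assignment of input files to MPI ranks,
# so each rank reads a disjoint subset of the simulation data.
def files_for_rank(files, rank, size):
    """Return the subset of `files` handled by MPI rank `rank` out of `size`."""
    return files[rank::size]

# With mpi4py, rank and size would come from the communicator:
#   from mpi4py import MPI
#   comm = MPI.COMM_WORLD
#   my_files = files_for_rank(all_files, comm.Get_rank(), comm.Get_size())
files = [f"sim_{i:03d}.bp" for i in range(10)]
print(files_for_rank(files, 1, 4))  # every 4th file starting at index 1
```

Because the subsets are disjoint and cover the whole list, no inter-rank communication is needed until results are gathered.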
## Main components
### Filtering Script (`opmdfilter.py`)
- **Command Line Arguments**:
  - `--opmd_dir` or `-d`: Directory containing OpenPMD input files.
  - `--opmd_file` or `-f`: Specific OpenPMD file to process (optional).
  - `--output_dir` or `-o`: Directory to save the output Parquet files.
  - `--species` or `-s`: Particle species name (default: "electrons").
- **Key Features**:
  - Traverses the specified OpenPMD directory to find relevant simulation files.
  - Reads electric field data (Ex, Ey, Ez) and particle data (positions and momenta).
  - Normalizes and filters particle data based on energy thresholds.
  - Saves electric field and particle data as Parquet files, with metadata for field information.
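The energy-threshold filtering above can be sketched with NumPy. The threshold value and the assumption that momenta are normalized to m_e·c (a common PIC convention) are illustrative; pp-ana's exact normalization may differ:

```python
import numpy as np

# Relativistic kinetic energy from momenta assumed normalized to m_e*c.
ux = np.array([0.1, 2.0, 10.0])
uy = np.zeros(3)
uz = np.zeros(3)

gamma = np.sqrt(1.0 + ux**2 + uy**2 + uz**2)
energy_mev = (gamma - 1.0) * 0.511      # electron rest mass ~ 0.511 MeV

threshold_mev = 1.0                     # illustrative cut
mask = energy_mev > threshold_mev
print(ux[mask])                         # only particles above the cut survive
```

Only the particles passing the cut are written to the reduced Parquet dataset, which is what keeps the analysis step cheap.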
### Analysis Script (`analyze.py`)
- **Command Line Arguments**:
  - `--pq_dir` or `-d`: Directory containing the Parquet files.
  - `--output_dir` or `-o`: Directory to save the output plots.
  - `--opmd_file` or `-f`: Specific OpenPMD file to analyze.
  - `--species` or `-s`: Particle species name (default: "electrons").
  - `--analyze` or `-a`: Type of analysis to perform: `field`, `particle`, or `full` (default: `full`).
- **Key Features**:
  - Initializes analyzers for field and particle data based on user input.
  - Reads particle data and calculates energies, generating 2D histograms of particle distributions.
  - Analyzes the divergence of particle momenta.
  - Reads electric field data and generates visualizations.
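The 2D particle-distribution histograms mentioned above boil down to binning two particle quantities against each other. A minimal sketch with synthetic data (the quantities and bin counts are illustrative):

```python
import numpy as np

# 2D histogram of a particle distribution, e.g. position vs energy,
# the core operation behind the particle-analysis plots. Synthetic data.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 10_000)       # e.g. longitudinal position
e = rng.exponential(1.0, 10_000)       # e.g. kinetic energy in MeV

hist, xedges, eedges = np.histogram2d(x, e, bins=(50, 40))

# With matplotlib the result would be rendered via e.g.
#   plt.pcolormesh(xedges, eedges, hist.T)
print(hist.shape)  # (50, 40)
```

Computing the histogram separately from the rendering keeps the binning testable without a display.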
## Requirements
- Python 3.x
- Required libraries:
- `numpy`
- `pandas`
- `dask`
- `mpi4py`
- `pyarrow`
- `scipy`
- `openpmd-api`
- `openpmd-viewer`
You can install the required libraries using pip:
```bash
pip install numpy pandas dask mpi4py pyarrow scipy openpmd-api openpmd-viewer
```
## Installation
### Clone the repository:
```bash
git clone https://git.gsi.de/d.bertini/pp-ana
cd pp-ana
```
The main components are located in the `analysis/` directory.
## Usage
### Run the filtering process
```bash
mpirun -np <num_processes> python opmd_filter.py -d <opmd_directory> -f <opmd_file> -o <output_directory>
```
- `mpirun -np <num_processes>`: Specifies the number of parallel processes to run. Replace `<num_processes>` with the desired number of MPI processes.
- `python opmd_filter.py`: The command to execute the filtering script.
- `-d <opmd_directory>` or `--opmd_dir <opmd_directory>`: The directory containing the OpenPMD input files. Replace `<opmd_directory>` with the path to your OpenPMD data.
- `-f <opmd_file>` or `--opmd_file <opmd_file>`: (Optional) The specific OpenPMD file to process. If not provided, the script will process all files in the specified directory.
- `-o <output_directory>` or `--output_dir <output_directory>`: The directory where the output Parquet files will be saved. Replace `<output_directory>` with the desired output path.
- `-s <species>` or `--species <species>`: (Optional) The particle species name to filter (default is "electrons"). Replace `<species>` with the desired species name.
### Run the analysis process
```bash
mpirun -np <num_processes> python opmd_pq_reader.py -d <parquet_directory> -o <output_directory> -f <opmd_file> -a <analysis_type>
```
- `mpirun -np <num_processes>`: Specifies the number of parallel processes to run. Replace `<num_processes>` with the desired number of MPI processes.
- `python opmd_pq_reader.py`: The command to execute the analysis script.
- `-d <parquet_directory>` or `--pq_dir <parquet_directory>`: The directory containing the Parquet files generated by the filtering script. Replace `<parquet_directory>` with the path to your Parquet files.
- `-o <output_directory>` or `--output_dir <output_directory>`: The directory where the output plots will be saved. Replace `<output_directory>` with the desired output path.
- `-f <opmd_file>` or `--opmd_file <opmd_file>`: The specific OpenPMD file to analyze. This should match the file processed in the filtering step.
- `-s <species>` or `--species <species>`: (Optional) The particle species name to analyze (default is "electrons"). Replace `<species>` with the desired species name.
- `-a <analysis_type>` or `--analyze <analysis_type>`: Specifies which type of analysis to run. Options include:
  - `field`: Analyze only the electric field data.
  - `particle`: Analyze only the particle data.
  - `full`: Perform both field and particle analyses (default).
### Examples
- To filter data from a specific OpenPMD file:
```bash
mpirun -np 4 python opmd_filter.py -d /path/to/opmd_data -f simulation.bp -o /path/to/output
```
- To analyze the generated Parquet files and create histograms:
```bash
mpirun -np 4 python opmd_pq_reader.py -d /path/to/output/simulation/ -o /path/to/plots/ -f simulation.bp -a full
```
## Acknowledgments
Special thanks to the [openPMD](https://github.com/openPMD) community and particularly the [openpmd-api](https://github.com/openPMD/openPMD-api)
developers for their support and feedback.
## Contact
For any questions or inquiries, please contact [d.bertini@gsi.de](mailto:D.Bertini@gsi.de) or [j.hornung@gsi.de](mailto:J.Hornung@gsi.de).