From bc83356dd808502b2713edb4f2f455a38214a65f Mon Sep 17 00:00:00 2001
From: Denis <D.bertini@gsi.de>
Date: Tue, 10 Dec 2024 20:05:59 +0100
Subject: [PATCH 1/4] New design using analyse steps 1 - reducing openpmd files
 to columnar parquet files 2 - analysis from reduced sets to produce
 histograms + readme

---
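The README added by this patch says that MPI lets each process work on a different part of the data. As a minimal sketch of that idea, assuming a simple contiguous block decomposition: the helper name `block_range` is hypothetical and is not taken from `opmdfilter.py`, and in a real run `rank` and `size` would come from mpi4py (`MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`) rather than from a loop.

```python
def block_range(n_items: int, rank: int, size: int) -> range:
    """Return the contiguous index range owned by `rank` out of `size` ranks.

    Hypothetical helper for illustration only: the first `n_items % size`
    ranks each receive one extra item, so the load differs by at most one.
    """
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


if __name__ == "__main__":
    # Emulate 4 ranks sharing 10 items without requiring an MPI launcher.
    n_items, size = 10, 4
    for rank in range(size):
        print(rank, list(block_range(n_items, rank, size)))
```

Each rank reads only its own slice, which is what keeps the per-node memory footprint small for large datasets.
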
 README.md | 156 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 154 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 336b2fc..aceceb6 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,159 @@
-# PP Ana
+# pp-ana
+
 [OpenPMD](https://github.com/openPMD/openPMD-standard) python post processor using parallel I/O.
 
-It enables the reading of large PIC simulation [openPMD](https://github.com/openPMD/openPMD-standard) datasets.
+## Overview
+**pp-ana** enables the reading of large PIC simulation [openPMD](https://github.com/openPMD/openPMD-standard) datasets.
 It uses internally the [openpmd-api](https://openpmd-api.readthedocs.io/en/0.15.2/) for parallel
 reading/writing. 
 
+The **pp-ana** design follows a two-step approach:
+- 1) The simulation input OpenPMD data, which usually consists of large datasets, is reduced using selections (kinematical, geometrical, ...), re-sampling, or
+particle merging methods. The data is then stored as reduced datasets. This steps is done once or only few times.
+
+- 2) The post-processing (analysis), which usually consists of visualizing the data using different kinds of plots, can be performed by directly reading the reduced
+datasets provided by the first step. This step is usually done many times and therefore benefits naturally from the reduced datasets.
+
+To implement this two-step approach, **pp-ana** contains two main post-processing components:
+- (`opmdfilter.py`) the filtering program, which processes OpenPMD data files in parallel and generates Parquet files for both field and particle data.
+- (`analyze.py`) the main analysis program, which reads the generated Parquet files to produce histograms and visualizations (matplotlib).
+
+The code is designed to leverage parallel processing using [MPI (Message Passing Interface)](https://www.open-mpi.org/) via the [mpi4py Python interface](https://mpi4py.readthedocs.io/en/stable/) and [Dask](https://www.dask.org/) for efficient data handling.
+
+## Workflow
+
+1. **Data Processing**: The filtering script reads the simulation data from OpenPMD files in parallel, extracts and selects the relevant field and particle information, and saves it in an efficient, structured columnar format (Parquet) for further analysis.
+
+2. **Parallel Computing**: The code utilizes MPI for parallel processing, allowing multiple processes to work on different parts of the data simultaneously. This is particularly useful for large datasets where reducing memory usage per node is essential.
+
+3. **Data Storage**: The use of Parquet files provides an efficient way to store large amounts of data with support for compression, making it easier to read and write data in a distributed environment. This format has proven to be very efficient on the [Lustre filesystem](https://www.lustre.org/) installed on the [GSI Virgo cluster](https://hpc.gsi.de/virgo/).
+
+4. **Data Analysis**: The analysis code reads the Parquet files and performs various analyses, including generating histograms of particle energy and field data visualizations.
+
+## Main components
+
+### Filtering Script (`opmdfilter.py`)
+
+- **Command Line Arguments**:
+  - `--opmd_dir` or `-d`: Directory containing OpenPMD input files.
+  - `--opmd_file` or `-f`: Specific OpenPMD file to process (optional).
+  - `--output_dir` or `-o`: Directory to save the output Parquet files.
+  - `--species` or `-s`: Particle species name (default: "electrons").
+
+- **Key Features**:
+  - Traverses the specified OpenPMD directory to find relevant simulation files.
+  - Reads electric field data (Ex, Ey, Ez) and particle data (positions and momenta).
+  - Normalizes and filters particle data based on energy thresholds.
+  - Saves electric field and particle data as Parquet files, with metadata for field information.
+
+### Analysis Script (`analyze.py`)
+
+- **Command Line Arguments**:
+  - `--pq_dir` or `-d`: Directory containing the Parquet files.
+  - `--output_dir` or `-o`: Directory to save the output plots.
+  - `--opmd_file` or `-f`: Specific OpenPMD file to analyze.
+  - `--species` or `-s`: Particle species name (default: "electrons").
+  - `--analyze` or `-a`: Type of analysis to perform: 'field', 'particle', or 'full' (default: 'full').
+
+- **Key Features**:
+  - Initializes analyzers for field and particle data based on user input.
+  - Reads particle data and calculates energy, generating 2D histograms of particle distributions.
+  - Analyzes divergence of particle momenta.
+  - Reads electric field data and generates visualizations.
+
+## Requirements
+
+- Python 3.x
+- Required libraries:
+  - `numpy`
+  - `pandas`
+  - `dask`
+  - `mpi4py`
+  - `pyarrow`
+  - `scipy`
+  - `openpmd-api`
+  - `openpmd-viewer`
+  
+   
+You can install the required libraries using pip:
+
+```bash
+pip install numpy pandas dask mpi4py pyarrow scipy openpmd-api openpmd-viewer
+```
+
+## Installation
+
+### Clone the repository:
+   ```bash
+   git clone https://git.gsi.de/d.bertini/pp-ana
+   cd pp-ana
+   ```
+The main components are located in the `/analysis` directory.
+
+## Usage
+
+### Run the filtering process
+
+```bash
+mpirun -np <num_processes> python opmd_filter.py -d <opmd_directory> -f <opmd_file> -o <output_directory> 
+```
+  
+- `mpirun -np <num_processes>`: Specifies the number of parallel processes to run. 
+  	Replace <num_processes> with the desired number of MPI processes.
+
+- `python opmd_filter.py`: The command to execute the filtering script.
+
+- `-d <opmd_directory>` or `--opmd_dir <opmd_directory>`: The directory containing the OpenPMD input files. Replace <opmd_directory> with the path to your OpenPMD data.
+
+- `-f <opmd_file>` or `--opmd_file <opmd_file>`: (Optional) The specific OpenPMD file to process. If not provided, the script will process all files in the specified directory.
+
+- `-o <output_directory>` or `--output_dir <output_directory>`: The directory where the output Parquet files will be saved. Replace <output_directory> with the desired output path.
+
+- `-s <species>` or `--species <species>`: (Optional) The particle species name to filter (default is "electrons"). Replace `<species>` with the desired species name. 
+   
+   
+### Run the analysis process
+
+```bash
+mpirun -np <num_processes> python opmd_pq_reader.py -d <parquet_directory> -o <output_directory> -f <opmd_file> -a <analysis_type>
+```
+	
+- `mpirun -np <num_processes>`: Specifies the number of parallel processes to run. Replace <num_processes> with the desired number of MPI processes.
+
+- `python opmd_pq_reader.py`: The command to execute the analysis script.
+
+- `-d <parquet_directory>` or `--pq_dir <parquet_directory>`: The directory containing the Parquet files generated by the filtering script. Replace `<parquet_directory>` with the path to your Parquet files.
+
+- `-o <output_directory>` or `--output_dir <output_directory>`: The directory where the output plots will be saved. Replace `<output_directory>` with the desired output path.
+
+- `-f <opmd_file>` or `--opmd_file <opmd_file>`: The specific OpenPMD file to analyze. This should match the file processed in the filtering step.
+
+- `-s <species>` or `--species <species>`: (Optional) The particle species name to analyze (default is "electrons"). Replace <species> with the desired species name.
+
+-  `-a <analysis_type>` or `--analyze <analysis_type>`: Specifies which type of analysis to run. Options include:
+	- `field`: Analyze only the electric field data.
+	- `particle`: Analyze only the particle data.
+	- `full`: Perform both field and particle analyses (default).
+
+
+### Examples
+
+- To filter data from a specific OpenPMD file:
+
+	```bash
+	mpirun -np 4 python opmd_filter.py -d /path/to/opmd_data -f simulation.bp -o /path/to/output
+	```
+
+- To analyze the generated Parquet files and create histograms:
+
+	```bash
+	mpirun -np 4 python opmd_pq_reader.py -d /path/to/output/simulation/ -o /path/to/plots/ -f simulation.bp -a full
+	```
+
+## Acknowledgments
+    Special thanks to the [openpmd](https://github.com/openPMD) community and particularly the [openpmd-api](https://github.com/openPMD/openPMD-api) 
+	developers for their support and feedback.
+
+## Contact
+
+For any questions or inquiries, please contact [d.bertini@gsi.de](mailto:D.Bertini@gsi.de) or [j.hornung@gsi.de](mailto:J.Hornung@gsi.de).
-- 
GitLab


From a5917fe273e476896c62e22bed054777cb4fb46d Mon Sep 17 00:00:00 2001
From: Denis <D.bertini@gsi.de>
Date: Tue, 10 Dec 2024 20:26:02 +0100
Subject: [PATCH 2/4] New design using analyse steps 1 - reducing openpmd files
 to columnar parquet files 2 - analysis from reduced sets to produce
 histograms + readme

---
 README.md | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index aceceb6..85e95b1 100644
--- a/README.md
+++ b/README.md
@@ -40,7 +40,7 @@ The code is designed to leverage parallel processing using [MPI (Message Passing
   - `--output_dir` or `-o`: Directory to save the output Parquet files.
   - `--species` or `-s`: Particle species name (default: "electrons").
 
-- **Key Features**:
+- **Implemented Features**:
   - Traverses the specified OpenPMD directory to find relevant simulation files.
   - Reads electric field data (Ex, Ey, Ez) and particle data (positions and momenta).
   - Normalizes and filters particle data based on energy thresholds.
@@ -55,11 +55,11 @@ The code is designed to leverage parallel processing using [MPI (Message Passing
   - `--species` or `-s`: Particle species name (default: "electrons").
   - `--analyze` or `-a`: Type of analysis to perform: 'field', 'particle', or 'full' (default: 'full').
 
-- **Key Features**:
+- **Implemented Features**:
   - Initializes analyzers for field and particle data based on user input.
-  - Reads particle data and calculates energy, generating 2D histograms of particle distributions.
+  - Reads particle data and calculates energy, generating 2D/1D histograms of particle distributions.
   - Analyzes divergence of particle momenta.
-  - Reads electric field data and generates visualizations.
+  - Reads electric/magnetic field data and generates visualizations of any 2D projection, taken at the middle of the non-visible direction.
 
 ## Requirements
 
@@ -135,7 +135,6 @@ mpirun -np <num_processes> python opmd_pq_reader.py -d <parquet_directory> -o <o
 	- `particle`: Analyze only the particle data.
 	- `full`: Perform both field and particle analyses (default).
 
-
 ### Examples
 
 - To filter data from a specific OpenPMD file:
-- 
GitLab


From 0fa30be2493a38164160082d0d7ff81f03e39a52 Mon Sep 17 00:00:00 2001
From: Denis <D.bertini@gsi.de>
Date: Tue, 10 Dec 2024 20:26:54 +0100
Subject: [PATCH 3/4] New design using analyse steps 1 - reducing openpmd files
 to columnar parquet files 2 - analysis from reduced sets to produce
 histograms + readme

---
 README.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 85e95b1..ffeab22 100644
--- a/README.md
+++ b/README.md
@@ -150,8 +150,9 @@ mpirun -np <num_processes> python opmd_pq_reader.py -d <parquet_directory> -o <o
 	```
 
 ## Acknowledgments
-    Special thanks to the [openpmd](https://github.com/openPMD) community and particularly the [openpmd-api](https://github.com/openPMD/openPMD-api) 
-	developers for their support and feedback.
+
+Special thanks to the [openpmd](https://github.com/openPMD) community and particularly the [openpmd-api](https://github.com/openPMD/openPMD-api) 
+developers for their support and feedback.
 
 ## Contact
 
-- 
GitLab


From 0afe18400ee67df3e317eff5a06f7414d48937f9 Mon Sep 17 00:00:00 2001
From: Denis <D.bertini@gsi.de>
Date: Tue, 10 Dec 2024 20:32:36 +0100
Subject: [PATCH 4/4] New design using analyse steps 1 - reducing openpmd files
 to columnar parquet files 2 - analysis from reduced sets to produce
 histograms + readme

---
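The filtering script's command-line arguments, listed in the README section that the second hunk below touches, could be declared roughly as follows. This is only a sketch assuming the script uses `argparse`; the function name `build_parser`, the help strings, and the choice to mark `--opmd_dir` and `--output_dir` as required are assumptions for illustration, not taken from the actual script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags documented in the README (sketch)."""
    parser = argparse.ArgumentParser(
        description="Reduce OpenPMD data to Parquet files (illustrative sketch)."
    )
    # Marking these two as required is an assumption made for this sketch.
    parser.add_argument("-d", "--opmd_dir", required=True,
                        help="Directory containing OpenPMD input files.")
    parser.add_argument("-o", "--output_dir", required=True,
                        help="Directory to save the output Parquet files.")
    # Optional arguments with the defaults stated in the README.
    parser.add_argument("-f", "--opmd_file", default=None,
                        help="Specific OpenPMD file to process (optional).")
    parser.add_argument("-s", "--species", default="electrons",
                        help="Particle species name.")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch runs without a shell.
    args = build_parser().parse_args(
        ["-d", "/path/to/opmd_data", "-o", "/path/to/output"]
    )
    print(args.opmd_dir, args.opmd_file, args.output_dir, args.species)
```
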
 README.md | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index ffeab22..d8fbb32 100644
--- a/README.md
+++ b/README.md
@@ -9,24 +9,26 @@ reading/writing.
 
 The **pp-ana** design follows a two-step approach:
 - 1) The simulation input OpenPMD data, which usually consists of large datasets, is reduced using selections (kinematical, geometrical, ...), re-sampling, or
-particle merging methods. The data is then stored as reduced datasets. This steps is done once or only few times.
+particle merging methods. The data is then stored as reduced datasets. This step is done once or only a few times.
 
 - 2) The post-processing (analysis), which usually consists of visualizing the data using different kinds of plots, can be performed by directly reading the reduced
-datasets provided by the first step. This step is usually done many times and therefore benefits naturally from the reduced datasets.
+datasets provided by the first step. This step is usually done many times and therefore benefits naturally from the reduced datasets to be read.
 
 To implement this two-step approach, **pp-ana** contains two main post-processing components:
-- (`opmdfilter.py`) the filtering program, which processes OpenPMD data files in parallel and generates Parquet files for both field and particle data.
-- (`analyze.py`) the main analysis program, which reads the generated Parquet files to produce histograms and visualizations (matplotlib).
+- (`opmdfilter.py`) the filtering program, which processes OpenPMD data files in parallel and generates [Parquet files](https://parquet.apache.org/)
+for both field and particle data.
+- (`analyze.py`) the main analysis program, which reads the generated [Parquet files](https://parquet.apache.org/)
+to produce histograms and visualizations ([matplotlib](https://matplotlib.org/)).
 
 The code is designed to leverage parallel processing using [MPI (Message Passing Interface)](https://www.open-mpi.org/) via the [mpi4py Python interface](https://mpi4py.readthedocs.io/en/stable/) and [Dask](https://www.dask.org/) for efficient data handling.
 
 ## Workflow
 
-1. **Data Processing**: The filtering script reads the simulation data from OpenPMD files in parallel, extracts and selects the relevant field and particle information, and saves it in an efficient, structured columnar format (Parquet) for further analysis.
+1. **Data Processing**: The filtering script reads the simulation data from OpenPMD files in parallel, extracts and selects the relevant field and particle information, and saves it in an efficient, structured columnar format, [Parquet](https://parquet.apache.org/), for further analysis.
 
 2. **Parallel Computing**: The code utilizes MPI for parallel processing, allowing multiple processes to work on different parts of the data simultaneously. This is particularly useful for large datasets where reducing memory usage per node is essential.
 
-3. **Data Storage**: The use of Parquet files provides an efficient way to store large amounts of data with support for compression, making it easier to read and write data in a distributed environment. This format has proven to be very efficient on the [Lustre filesystem](https://www.lustre.org/) installed on the [GSI Virgo cluster](https://hpc.gsi.de/virgo/).
+3. **Data Storage**: The use of [Parquet files](https://parquet.apache.org/) provides an efficient way to store large amounts of data with support for compression, making it easier to read and write data in a distributed environment. This format has proven to be very efficient on the [Lustre filesystem](https://www.lustre.org/) installed on the [GSI Virgo cluster](https://hpc.gsi.de/virgo/).
 
 4. **Data Analysis**: The analysis code reads the Parquet files and performs various analyses, including generating histograms of particle energy and field data visualizations.
 
@@ -37,14 +39,14 @@ The code is designed to leverage parallel processing using [MPI (Message Passing
 - **Command Line Arguments**:
   - `--opmd_dir` or `-d`: Directory containing OpenPMD input files.
   - `--opmd_file` or `-f`: Specific OpenPMD file to process (optional).
-  - `--output_dir` or `-o`: Directory to save the output Parquet files.
+  - `--output_dir` or `-o`: Directory to save the output [Parquet files](https://parquet.apache.org/).
   - `--species` or `-s`: Particle species name (default: "electrons").
 
 - **Implemented Features**:
   - Traverses the specified OpenPMD directory to find relevant simulation files.
   - Reads electric field data (Ex, Ey, Ez) and particle data (positions and momenta).
   - Normalizes and filters particle data based on energy thresholds.
-  - Saves electric field and particle data as Parquet files, with metadata for field information.
+  - Saves electric field and particle data as [Parquet files](https://parquet.apache.org/), with metadata for field information.
 
 ### Analysis Script (`analyze.py`)
 
-- 
GitLab