Current status: the upgrade does not work automatically (e.g. via cinc-solo) on a previously installed execution node (Virgo 2.1), because the new slurmd RPM package tries to write configuration files under /etc/slurm, which is exported to all nodes via NFS and is therefore not writable on the client side.
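For reference, the offending files can be listed straight from the RPM metadata; a quick check along these lines shows what the package wants to place under /etc/slurm (the package name slurm-slurmd is an assumption, adjust to the actual RPM name):

```
# everything the package would put under /etc/slurm
rpm -ql slurm-slurmd | grep '^/etc/slurm'
# only the files marked %config in the spec
rpm -qc slurm-slurmd
```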
Our proposal: would it be possible to have a slurmd RPM that stores its configuration files under /usr/share/slurmd/... as examples, instead of /etc/slurm? @d.klein do you think it is possible to rebuild the slurmd package and, if yes, how long do you estimate it would take to make such a change?
Our proposal: would it be possible to have a slurmd RPM that stores its configuration files under /usr/share/slurmd/... as examples, instead of /etc/slurm?
Yes, sounds good. I also looked for the possibility of an install-time parameter, but this does not exist, so we need to move them unconditionally to a different path (or skip them), as you suggested.
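To illustrate, a minimal sketch of how the spec file change could look, using the standard %{_datadir}/%{_sysconfdir} macros; the actual sections and file names in the real slurm.spec will differ:

```
# hypothetical excerpt from a rebuilt slurm.spec
%install
# ...existing install steps...
# ship the default configs as examples instead of owning /etc/slurm
mkdir -p %{buildroot}%{_datadir}/slurm/examples
mv %{buildroot}%{_sysconfdir}/slurm/*.conf %{buildroot}%{_datadir}/slurm/examples/

%files
%{_datadir}/slurm/examples/
# note: no %dir %{_sysconfdir}/slurm and no %config entries any more
```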
@d.klein do you think it is possible to rebuild the slurmd package and, if yes, how long do you estimate it would take to make such a change?
I am on it, give me an hour or so. This change will also affect the configuration of slurm-singularity-exec, since it depends on the /etc/slurm/plugstack.conf.d directory being created by the slurm package. This will no longer happen, so slurm-singularity-exec can no longer install its config there either.
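For context, the coupling works via Slurm's SPANK plugstack mechanism: a site plugstack.conf can include drop-in files from a directory, and slurm-singularity-exec ships its config as such a drop-in. A sketch (the exact include line and drop-in file name are assumptions):

```
# /etc/slurm/plugstack.conf (site-maintained)
# pulls in drop-ins from the directory the slurm package used to create:
include /etc/slurm/plugstack.conf.d/*.conf
# slurm-singularity-exec installs its drop-in (e.g. singularity-exec.conf)
# into that directory, so without the packaged directory the file has
# nowhere to be installed
```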
Obviously the slurmctld and slurmdbd packages should not install configuration files or modify anything in /etc/slurm, as that would overwrite the configuration maintained in virgo-3/slurm-config.
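A simple way to verify this for a rebuilt package before rolling it out (a sketch; rpm -qlp inspects the not-yet-installed .rpm file):

```
# should print "clean" if the package keeps its hands off /etc/slurm
rpm -qlp slurm-slurmctld-23.11.1-*.el8.x86_64.rpm | grep '^/etc/slurm' || echo "clean"
```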
```
# ...after an automatic package upgrade this morning
[root@lxrm10 slurm]# grep -i upgraded /var/log/dnf.log
2024-01-31T06:24:19+0100 DEBUG Upgraded: munge-0.5.15-6.el8.x86_64
2024-01-31T06:24:19+0100 DEBUG Upgraded: munge-libs-0.5.15-6.el8.x86_64
2024-01-31T06:24:19+0100 DEBUG Upgraded: slurm-23.11.1-3.el8.x86_64
2024-01-31T06:24:19+0100 DEBUG Upgraded: slurm-libs-23.11.1-3.el8.x86_64
2024-01-31T06:24:19+0100 DEBUG Upgraded: slurm-slurmctld-23.11.1-3.el8.x86_64
2024-01-31T06:24:19+0100 DEBUG Upgraded: slurm-slurmdbd-23.11.1-3.el8.x86_64

# ...restart was triggered by the package
[root@lxrm10 slurm]# systemctl status slurmdbd
● slurmdbd.service - Slurm DBD accounting daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmdbd.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2024-01-31 06:24:19 CET; 59min ago
Condition: start condition failed at Wed 2024-01-31 07:18:11 CET; 5min ago
           └─ ConditionPathExists=/etc/slurm/slurmdbd.conf was not met
#...

# ...slurmdbd is dead ...since the configuration is missing
[root@lxrm10 slurm]# ls /etc/slurm/slurmdbd*
/etc/slurm/slurmdbd.conf.rpmsave
```
The existing configuration files are moved to *.rpmsave, which makes a restart of the service impossible.
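On the affected host this should be recoverable by hand, since the content was preserved; a one-off fix along these lines (assuming the saved file is still current):

```
# restore the preserved config so ConditionPathExists is satisfied again
mv /etc/slurm/slurmdbd.conf.rpmsave /etc/slurm/slurmdbd.conf
systemctl start slurmdbd
```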
I'm still unsure about automatic restarts, which you mentioned in another comment below. Depending on the situation, restarting slurmctld could render the cluster unresponsive for all users. Personally I would propose to NOT automatically restart slurmctld and slurmdbd, but let's see what the others think about that...
No config (including for the slurmctld/dbd packages) will be written to /etc/slurm any more by the updated packages. The *.rpmsave files must be a one-time effect of uninstalling the old package, which used to own those files via %config(noreplace).
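For the record, this is standard rpm behaviour; a sketch of how the old spec presumably declared the file:

```
# old slurm-slurmdbd spec (presumed): the file was owned like this
%config(noreplace) %{_sysconfdir}/slurm/slurmdbd.conf
# -> when the owning package is removed (or replaced by one that no longer
#    ships the file), a locally modified copy is renamed to *.rpmsave
#    instead of being deleted -- hence the one-time effect above
```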
Regarding the triggered restart, I can easily remove this behaviour. Linux keeps the old binaries alive as long as they are in use by the still-running services (even though they are no longer reachable via the filesystem once the new package is installed).
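A sketch of the scriptlet side of that change, assuming the package uses the standard systemd rpm macros (the actual slurm.spec may wire this differently):

```
%postun
# current behaviour: restarts the unit on upgrade
#%systemd_postun_with_restart slurmctld.service
# proposed: no automatic restart, only a daemon-reload
%systemd_postun slurmctld.service
```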
...after an automatic package upgrade this morning
This is of course now in our hands to coordinate better as well. E.g. I could publish updates only to the repo-history git repo, and you decide when to copy them to cluster-mirror, thereby controlling when a restart would be triggered. In addition, we could remove the triggered restart altogether, so you can trigger it via config management or manually.
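With the automatic restart removed, the restart becomes an explicit, operator-controlled step, e.g. (a sketch; try-restart only acts on units that are already running):

```
# after confirming the config in /etc/slurm is in place:
systemctl try-restart slurmdbd
systemctl try-restart slurmctld
```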
I just pushed new el8 packages to http://cluster-mirror.hpc.gsi.de/packages/virgo-3/el8/. The full diff to the previous repo state is repo-history@33080cf1.
I modified the slurm and slurm-singularity-exec packages so that they no longer put their configuration into /etc/slurm (the directory is not even created), but into /usr/share/slurm. However, the configuration and service files still assume the Slurm config to be present in /etc/slurm as before, which means you have to deploy it at that location before any services can be started.
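In practice that means something like the following on a controller host (a sketch; the source path into virgo-3/slurm-config and the ownership/mode are assumptions):

```
# see which example configs the rebuilt package ships
rpm -ql slurm | grep '^/usr/share/slurm'
# deploy the maintained config before starting any service
install -D -m 0600 -o slurm -g slurm \
    /path/to/virgo-3/slurm-config/slurmdbd.conf /etc/slurm/slurmdbd.conf
systemctl start slurmdbd
```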
In case you are wondering why the munge package is updated as well: it has not changed, except that the build version number is now rendered correctly. I only noticed this bug after the first repo build was published; see builder-virgo#1 (closed) for details. This is also the reason why it took a bit longer, sorry: I first tried to get rpmautospec working on epel8, but it needs too many changes in the code, so I got stuck.
If you update the slurmctld/dbd packages, note that they will currently restart the services. I would be interested in your feedback on this.
Now also updating the el9 packages to match the new behaviour regarding /etc/slurm. This should finish some time later today.
I just pushed new el8 packages to http://cluster-mirror.hpc.gsi.de/packages/virgo-3/el8/. The full diff to the previous repo state is repo-history@33080cf1.
I can confirm that the slurmd package can be installed without modifying /etc/slurm. Thanks!