Edit: The issue has reappeared, even with the temporary solution I mentioned in the previous message. The problem is that, inconsistently, in some runs a .lock file is generated when writing to disk (either for a checkpoint or for post-processing output), and the run blocks. This occurs when running on an HPC cluster.
The workaround we found is to force sequential (non-parallel) writing. This can be set from the GUI: Performance settings > Input/output > Read/write method: Standard I/O, serial.
I understand that for heavy cases that write a lot of data to disk this could be a time constraint, but in general there shouldn't be a problem.
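For reference, the same thing can be forced from the user sources instead of the GUI. Below is a minimal sketch of what this looks like in cs_user_parallel_io() (the user function provided by the cs_user_performance_tuning-parallel_io.c example); the function and enumerator names are taken from that example file and should be checked against the cs_file.h of your code_saturne version before use.

/* Sketch only: force serial standard I/O for reads and writes.
 * This body goes in cs_user_parallel_io(), in a copy of the
 * cs_user_performance_tuning-parallel_io.c example placed in the
 * case's SRC directory, which already provides the needed includes. */
void
cs_user_parallel_io(void)
{
#if defined(HAVE_MPI)
  cs_file_set_default_access(CS_FILE_MODE_READ,
                             CS_FILE_STDIO_SERIAL,
                             MPI_INFO_NULL);
  cs_file_set_default_access(CS_FILE_MODE_WRITE,
                             CS_FILE_STDIO_SERIAL,
                             MPI_INFO_NULL);
#endif
}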
Strange Halting at Half Total Iterations
Re: Strange Halting at Half Total Iterations
The issue has resurfaced: forcing sequential disk writing at the most critical moments (checkpoint saving and postprocessing output) currently does not solve the problem. The symptom is practically the same: the run simply hangs indefinitely while checkpointing and/or saving postprocessing files (although it no longer generates .lock files).
I recently tried changing options within cs_user_performance_tuning-parallel_io.c, specifically: minimum buffer size, block size, and options related to romio_cb_write and romio_ds_write.
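For context, this kind of tuning works by attaching MPI-IO hints to the writes. Below is a minimal, self-contained sketch of how such hints are built with standard MPI calls, shown here with a plain MPI_File_open so the hints themselves are clear (the values are placeholders, and whether romio_cb_write / romio_ds_write are honoured depends on the ROMIO/file-system driver; in code_saturne the MPI_Info object would instead be handed to the I/O layer from cs_user_parallel_io()).

#include <mpi.h>

/* Sketch: write a buffer collectively with ROMIO hints attached. */
static void
write_with_hints(const char *path, const void *buf, int n_bytes)
{
  MPI_Info hints;
  MPI_Info_create(&hints);
  MPI_Info_set(hints, "romio_cb_write", "enable");   /* collective buffering */
  MPI_Info_set(hints, "romio_ds_write", "disable");  /* data sieving */
  MPI_Info_set(hints, "cb_buffer_size", "16777216"); /* placeholder: 16 MB */

  MPI_File fh;
  MPI_File_open(MPI_COMM_WORLD, path,
                MPI_MODE_CREATE | MPI_MODE_WRONLY, hints, &fh);
  MPI_File_write_all(fh, buf, n_bytes, MPI_BYTE, MPI_STATUS_IGNORE);
  MPI_File_close(&fh);
  MPI_Info_free(&hints);
}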
Let me remind you that I am running this on an HPC cluster with the following specifications: https://cimec.org.ar/c3/pirayu/index.php#equipo. The cluster administrators temporarily lifted any possible resource restrictions for my user:
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 513004
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 70000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 513004
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
This problem has not been reported by users of other CFD software running on this cluster.
Another solution I tried was running the case from RAM, i.e. keeping the case directory in RAM (on my cluster, under /dev/shm). However, the same issue occurs, and it remains random: sometimes the run fails when trying to save those files, and sometimes it doesn't.
This issue is truly a headache. I can't understand how it is possible to still experience this problem while writing sequentially and with all resource limitations lifted.
What else could I try? I am using Code_Saturne 8.0.1 with Open MPI 4.1.1 (MPI standard 3.1). I have also tested with Code_Saturne 8.2 and with other Open MPI versions (I do not have the exact versions recorded, but they were in the 4.0.x and 3.x ranges).
Re: Strange Halting at Half Total Iterations
Hello. I'd try compiling with OpenMPI 1.8.4, because I found that newer OpenMPI versions caused unpredictable hangs of the calculation on my system (a local CentOS 7.5 machine). Newer does not always mean better, with software as with music; for example, the old versions of Torque and Ganglia (for CentOS 6.2) were better.
Also, I don't think such an issue is related to the amount of memory available. If memory is insufficient, Saturne crashes rather than hangs, and you will see from the error output that it could not allocate enough memory. By contrast, OS/MPI/solver interaction is a likely cause of hangs and slowdowns.
Re: Strange Halting at Half Total Iterations
Hello,
Did you try using the Crystal Router algorithm instead of the default all-to-all algorithm (in the GUI, under "Performance Settings -> MPI Algorithms -> All to all data movement")?
Also, you may try MPICH instead of OpenMPI. Since this sort of issue is related to low-level bugs in MPI libraries, depending on interactions with firmware and lower-level network drivers and libraries, using a different MPI implementation can sometimes help (we do not have the issues we had with OpenMPI 4.0/4.1 when using Intel MPI, which is MPICH-based).
All of this of course depends on what is available on your cluster, as installing/configuring an MPI library which integrates well with the batch system might require help from the admins (at the very least, when installing MPI, you may need to tell it where the high-speed network drivers and batch system headers are, otherwise you may end up using the default (slow) network and missing batch integration).
Best regards,
Yvan
Re: Strange Halting at Half Total Iterations
Hi all,
Over the past few days, with the help of the cluster administrators, I have been trying to run with other MPI implementations, as you recommended. These are the results I got (originally I was running with OpenMPI 4.1.1, which is the problematic one):
- I tried with Intel MPI (based on MPICH 3.4a2), and with the little I tested, I didn’t experience any crashes, but the computation times increased by 30-50%, which is unacceptable.
- I tried with MVAPICH2 2.3.6, and this time, instead of hanging, the run terminates and produces an error log (always when writing a large amount of data to disk: postprocessing or checkpoint). The log is attached. What stands out to me is that the function cs_all_to_all_copy_array, defined in base/cs_all_to_all.c, always appears in the stack.
- I tried using Crystal Router, and it’s still crashing.
- I mentioned the InfiniBand drivers to the cluster administrators, but they said it’s not in their plans to change anything related to that, as it would jeopardize the functioning of other programs installed on the system.
- How could I try to improve performance/computation time using Intel MPI?
- What I most suspect is that it’s a race condition issue related to the cs_all_to_all.c code; so could any basic changes inside the cs_all_to_all_copy_array function help with this? Something simple like adding an MPI_Barrier/MPI_Wait?
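For what it's worth, here is the kind of purely diagnostic barrier/timing instrumentation I have in mind (a sketch only: suspect_exchange() is a placeholder, not cs_all_to_all_copy_array's real interface, and on the code_saturne side one would presumably pass the global communicator, cs_glob_mpi_comm). It does not fix anything by itself; it only shows whether all ranks reach the exchange and how long the slowest one spends in it.

#include <stdio.h>
#include <mpi.h>

/* Diagnostic sketch: synchronize before and after a suspect exchange and
 * time it, so a rank that never returns shows up as a hang at the second
 * barrier rather than somewhere undefined. */
static void
timed_exchange(MPI_Comm comm, void (*suspect_exchange)(void))
{
  int rank;
  MPI_Comm_rank(comm, &rank);

  MPI_Barrier(comm);              /* all ranks enter together */
  double t0 = MPI_Wtime();

  suspect_exchange();             /* placeholder for the real call */

  MPI_Barrier(comm);              /* all ranks have returned */
  double t1 = MPI_Wtime();

  if (rank == 0)
    printf("exchange took %.3f s\n", t1 - t0);
}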
Attachments: error.log (1.66 KiB)
Re: Strange Halting at Half Total Iterations
Hello,
Do you have the crash with Crystal Router with different MPI libraries? Or with which ones? Do you have a backtrace for this?
With Intel MPI, there are some environment variables which allow choosing some algorithm variants for all to all or reduction operators, so you might check the doc and experiment with those. Last time I tried, the default options were the fastest but this might depend on your hardware.
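For example, the Intel MPI tunables in question are the I_MPI_ADJUST_* environment variables; something along these lines could go in the batch template or calling script (the numeric values below are placeholders, as the list of algorithms each number selects is given in the Intel MPI Library reference):

export I_MPI_ADJUST_ALLTOALL=1    # placeholder: pick an all-to-all algorithm variant
export I_MPI_ADJUST_ALLTOALLV=1   # placeholder: same for the vector (Alltoallv) variant
export I_MPI_ADJUST_ALLREDUCE=1   # placeholder: pick a reduction algorithm variant
export I_MPI_DEBUG=5              # print MPI library/runtime information at startup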
In any case, we use MPI_Alltoall so as to have a collective operation and avoid race conditions, but implementations which are too simple can still have these races, which is why we added the Crystal Router. The issues we encountered on our side, however, were much lower level, due to a bug or incompatibility much lower in the software stack. According to other users, it also impacted commercial codes (Star-CCM+ if I remember correctly), and the workaround for them was to switch to another MPI library.
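To illustrate the point about collectives (a generic sketch, not code_saturne's actual implementation): with MPI_Alltoall, every rank's sends and receives are matched inside a single library call, so there is no message ordering for user code to get wrong, unlike a hand-rolled isend/irecv exchange.

#include <stdlib.h>
#include <mpi.h>

/* Generic illustration: every rank sends one int to every other rank.
 * The collective matches all sends and receives internally. */
static void
exchange_one_int_per_rank(MPI_Comm comm)
{
  int n_ranks, rank;
  MPI_Comm_size(comm, &n_ranks);
  MPI_Comm_rank(comm, &rank);

  int *sendbuf = malloc(sizeof(int)*n_ranks);
  int *recvbuf = malloc(sizeof(int)*n_ranks);
  for (int i = 0; i < n_ranks; i++)
    sendbuf[i] = rank;                     /* payload: our own rank id */

  MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, comm);

  free(sendbuf);
  free(recvbuf);
}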
How much memory does the code use compared to what is available? How many cells per rank do you have?
Also, in case you are on a code path with a specific bug, did you try running a few iterations with a debug build? It will be slower but might catch some upstream errors.
Do you have user-defined functions? You could have memory leaks in those, or (more rarely) in the code itself.
Can you run a few iterations with
export CS_MEM_LOG=mem.log
in the batch template (in run.cfg) or calling script, and check the end of the mem.log file?
Regards,
Yvan