user_rescon problem with parallel processing

This forum is dedicated to Syrthes related issues, as the Syrthes tool does not have its own forum.
Fanny.DM
Posts: 5
Joined: Fri Mar 27, 2020 8:35 am

Post by Fanny.DM » Tue Jun 23, 2020 8:40 am

Attachments: user_cond.c (10.56 KiB), syrthes.log (11.24 KiB)

Hello,

I recently installed Syrthes 4.3 on a Linux computer, and I get a segmentation fault when running a function that worked perfectly with version 4.1.

When I use the user_rescon function (from user_cond.c) to program a variable thermal contact resistance, the sequential calculation runs.
But when I try the same calculation with parallel processing, I get the following error:

Code:

*** Process received signal ***
 Signal: Segmentation fault (11)
 Signal code: Address not mapped (1)
 Failing at address: 0x5585736d9790
 [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7f6d30e93f20]
 [ 1] ./syrthes(+0x50f8)[0x55881be690f8]
 [ 2] ./syrthes(+0x3eba)[0x55881be67eba]
 [ 3] ./syrthes(+0x78341)[0x55881bedc341]
 [ 4] ./syrthes(+0x25119)[0x55881be89119]
 [ 5] ./syrthes(+0x39f1)[0x55881be679f1]
 [ 6] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f6d30e76b97]
 [ 7] ./syrthes(+0x3c2a)[0x55881be67c2a]
 *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 5760 on node serveur1-Precision-7820-Tower exited on signal 11 (Segmentation fault).
The error seems to come from the call to

Code:

prepare_paires_rc(maillnodes,rescon,t,tcor,sdparall);
and disappears when I comment out this call. The problem is that I do need this function for my script.

Attached are the listing and the user_rescon function from the test case I tried (a simplified case for debugging user_rescon on my computer).

Could the error come from my installation?

Another question: at line 297 of the attached user_cond.c, I replaced

Code:

nr=maillnodeus.nrefe[i]
by

Code:

nr=maillnodeus.nrefe[ne]
 or nr=maillnodeus.nrefe[rescon.numf[i]]
Is that correct? It seems to be an error dating from Syrthes version 4.1.
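
To make the indexing question concrete, here is a minimal sketch of the indirection involved. It assumes, as the replacement above suggests, that rescon.numf[i] holds the mesh element number ne of the i-th contact-resistance face and that nrefe is indexed by mesh element number. The struct layouts and the function name contact_face_ref are hypothetical simplifications for illustration, not the real Syrthes definitions.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the Syrthes structures.
   The real definitions in Syrthes are different and larger. */
struct Mesh {
    int nelem;          /* number of elements in the (local) mesh */
    const int *nrefe;   /* nrefe[e]: reference number of mesh element e */
};

struct Contact {
    int nelem;          /* number of contact-resistance faces */
    const int *numf;    /* numf[i]: mesh element number of the i-th contact face */
};

/* Return the reference of the i-th contact face.
   i counts faces inside the contact list, so it must first be
   translated to a mesh element number before indexing nrefe. */
int contact_face_ref(const struct Mesh *mesh, const struct Contact *rc, int i)
{
    assert(i >= 0 && i < rc->nelem);        /* i must be a valid contact index */
    int ne = rc->numf[i];
    assert(ne >= 0 && ne < mesh->nelem);    /* ne must be a valid local element */
    return mesh->nrefe[ne];
}
```

Under that assumption, nrefe[i] and nrefe[rescon.numf[i]] only agree when the contact faces happen to be the first elements of the mesh. The bounds checks are there because an out-of-range element number is one of the ways such a lookup can end in a segmentation fault.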

Thanks for your help!
Fanny

Fanny.DM
Posts: 5
Joined: Fri Mar 27, 2020 8:35 am

Re: user_rescon problem with parallel processing

Post by Fanny.DM » Tue Aug 11, 2020 3:05 pm

Hello,

I’m bumping this post because, unfortunately, I still haven’t found a solution to my problem. Here is some new information:
- The problem is not strictly tied to parallel processing; it seems to depend on the partitioning. For example, in a very simple case with a contact resistance between two materials, the calculation runs correctly with 1 or 2 processors, but with 3 or more processors I get the error message described in my first post. The problem is the same if I use METIS rather than SCOTCH to partition the mesh, but with a limit of 3 processors before the problem occurs. When I check the partitioning of my domain, the error seems to appear when the contact resistance is included in more than one partition of the domain.
- I tried installing Syrthes with several versions of Open MPI (from 1.8.3 to 2.1.1), and the same problem occurs.
- I also tried the latest version of Syrthes I could find (4.3.5 instead of 4.3.0), but nothing changed.
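
The partitioning check mentioned in the first point can be automated with a small helper. The sketch below counts how many partitions contain at least one contact-resistance face. The arrays part and numf and the function name count_contact_partitions are hypothetical and are not part of the Syrthes API; they assume the partitioner output is available as one partition id per element.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical inputs (not the Syrthes API):
     part[e] : partition id (0..nparts-1) given to element e by the partitioner
     numf[i] : element number of the i-th contact-resistance face
   Returns the number of distinct partitions that hold at least one
   contact face, or -1 on invalid input or allocation failure. */
int count_contact_partitions(const int *part, const int *numf,
                             int nfaces, int nparts)
{
    assert(part != NULL && numf != NULL);
    if (nparts <= 0 || nfaces < 0)
        return -1;

    unsigned char *seen = calloc((size_t)nparts, 1);
    if (seen == NULL)
        return -1;

    int count = 0;
    for (int i = 0; i < nfaces; i++) {
        int p = part[numf[i]];
        if (p < 0 || p >= nparts) {   /* partition id out of range */
            free(seen);
            return -1;
        }
        if (!seen[p]) {               /* first contact face found in partition p */
            seen[p] = 1;
            count++;
        }
    }
    free(seen);
    return count;
}
```

A result greater than 1 corresponds to the situation described above, where the contact resistance is split across several partitions.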

Any advice or ideas would be welcome!

Thanks
Fanny

Yvan Fournier
Posts: 3049
Joined: Mon Feb 20, 2012 3:25 pm

Re: user_rescon problem with parallel processing

Post by Yvan Fournier » Tue Aug 11, 2020 8:16 pm

Hello,

Have you tried writing to the Syrthes support e-mail? I am not sure, but I believe there is a syrthes-support at edf.fr address, though answers are not guaranteed.

Otherwise, I can always try sending them an e-mail, with no guarantee either.

Do you use the Syrthes radiative module, or just thermal diffusion? The internal coupling in code_saturne handles neither the radiative model nor thermal contact resistance yet, but thermal contact resistance would probably be easy to add and test rapidly.

Best regards,

Yvan
