
user_rescon problem with parallel processing

Posted: Tue Jun 23, 2020 8:40 am
by Fanny.DM
Attachments: user_cond.c (10.56 KiB), syrthes.log (11.24 KiB)
Hello,

I've recently installed Syrthes 4.3 on a Linux computer, and I get a segmentation fault when running a function that worked perfectly with version 4.1.

When I use the user_rescon function (from user_cond.c) to program a variable thermal contact resistance, the sequential calculation runs.
But when I try the same calculation with parallel processing, I get the following error:

Code:

*** Process received signal ***
 Signal: Segmentation fault (11)
 Signal code: Address not mapped (1)
 Failing at address: 0x5585736d9790
 [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7f6d30e93f20]
 [ 1] ./syrthes(+0x50f8)[0x55881be690f8]
 [ 2] ./syrthes(+0x3eba)[0x55881be67eba]
 [ 3] ./syrthes(+0x78341)[0x55881bedc341]
 [ 4] ./syrthes(+0x25119)[0x55881be89119]
 [ 5] ./syrthes(+0x39f1)[0x55881be679f1]
 [ 6] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f6d30e76b97]
 [ 7] ./syrthes(+0x3c2a)[0x55881be67c2a]
 *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 5760 on node serveur1-Precision-7820-Tower exited on signal 11 (Segmentation fault).
The error seems to come from the call to

Code:

prepare_paires_rc(maillnodes,rescon,t,tcor,sdparall);
and disappears when I comment out this sub-function. The problem is that I do need this function for my script.

Here are the listing and the user_rescon function from the test case I tried (which is a simplification to debug user_rescon on my computer).

Can the error come from my installation?

Another question: at line 297 of the attached user_cond.c, I replaced

Code:

nr=maillnodeus.nrefe[i]
by

Code:

nr=maillnodeus.nrefe[ne]
 or nr=maillnodeus.nrefe[rescon.numf[i]]
Am I correct? It looks like an error in the Syrthes 4.1 version.

Thanks for your help!
Fanny

Re: user_rescon problem with parallel processing

Posted: Tue Aug 11, 2020 3:05 pm
by Fanny.DM
Hello,

I’m bumping this post because, unfortunately, I have not found a solution to my problem yet. Here is some new information:
- The problem doesn’t come from parallel processing as such, but seems to depend on the partitioning. For example, in a very simple case with a contact resistance between two materials, the calculation runs correctly with 1 or 2 processors, but with 3 or more processors I get the error message described in my first post. The problem is the same if I use METIS rather than SCOTCH to partition the mesh, except that the limit before the problem occurs is 3 processors. When I check the partitioning of my domain, it seems the error appears when the contact resistance is included in more than 1 partition of the domain.
- I tried installing Syrthes with several versions of Open MPI (from 1.8.3 to 2.1.1), and the same problem occurs.
- I also tried the latest version of Syrthes I could find (4.3.5 instead of 4.3.0), but there is no change either.

Any advice or ideas would be welcome!

Thanks
Fanny

Re: user_rescon problem with parallel processing

Posted: Tue Aug 11, 2020 8:16 pm
by Yvan Fournier
Hello,

Have you tried writing to the Syrthes support e-mail? I am not sure, but I believe there is a syrthes-support at edf.fr address, though answers are not guaranteed.

Otherwise, I can always try sending them an e-mail, with no guarantee either.

Do you use the Syrthes radiative module, or just thermal diffusion? The internal coupling in code_saturne handles neither the radiative model nor thermal contact resistance yet, but thermal contact resistance would probably be easy to add and test rapidly.

Best regards,

Yvan