
Installation on cluster

Posted: Tue Sep 15, 2020 5:29 pm
by Puneeth
Hello,

While testing a fresh installation of Code_Saturne v5.1.6 on a cluster, I am facing a compile or link error, shown in slurm-275114.out.
The error does not give much information about what is going wrong or where.
Could you please help me debug it?
The compile.log is attached as well.

Thanks and Regards,
Puneeth

Re: Installation on cluster

Posted: Tue Sep 15, 2020 6:11 pm
by Yvan Fournier
Hello,

The compile log indicates that the link with libxml2 fails. This is possibly due to the libxml2 dev package not being installed on the compute nodes (i.e. libxml2.so.x is present but not the libxml2.so link).
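
For example, something like this on a compute node shows whether the unversioned link is there (the path is illustrative and depends on the distribution):

ls -l /usr/lib64/libxml2.so*
# the runtime package usually provides libxml2.so.2 (and libxml2.so.2.x.y);
# the unversioned libxml2.so link needed at link time comes from the dev package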

Do you use "code_saturne submit" or the GUI, or do you submit a runcase directly (not recommended, for the above reason)?

You also have compile warnings you should check.

Best regards,

Yvan

Re: Installation on cluster

Posted: Wed Sep 16, 2020 9:06 am
by Puneeth
Hi,

Thanks for the reply.

I will check with the cluster admins whether Code_Saturne was installed with the libxml2 option enabled, and also ask them to install the libxml2 dev package.

The simulation is run using a batch file.
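
For reference, the batch file is essentially a Slurm script along these lines (simplified; the directives are illustrative):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=40
#SBATCH --time=01:00:00
cd $SLURM_SUBMIT_DIR
./runcase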

Best regards,

Puneeth

Re: Installation on cluster

Posted: Wed Sep 16, 2020 2:17 pm
by Puneeth
Hello,

There has been an update on the situation.

The admin has recompiled Code_Saturne v5.1.6, taking your recommendations about libxml2 into account.
However, the simulations still fail due to another error:
"/gpfslocalsup/pub/code-saturne/5.1.6/libexec/code_saturne/cs_preprocess: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory"
Please find the slurm-xxxx.out file attached, which shows this error.

Surprisingly, libimf.so is resolved as a dependency of cs_preprocess when we check the output of
"$ldd /gpfslocalsup/pub/code-saturne/5.1.6/libexec/code_saturne/cs_preprocess":

$ldd /gpfslocalsup/pub/code-saturne/5.1.6/libexec/code_saturne/cs_preprocess
linux-vdso.so.1 (0x00007fff3c184000)
libhdf5.so.10 => /gpfslocalsup/spack_soft/hdf5/1.8.21/intel-19.0.4-ze52g22lxxwb7ezsvxepmcixo6lmotwe/lib/libhdf5.so.10 (0x00007f0468a9a000)
libm.so.6 => /lib64/libm.so.6 (0x00007f0468718000)
libz.so.1 => /lib64/libz.so.1 (0x00007f0468501000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f04682fd000)
libmpifort.so.12 => /gpfslocalsys/intel/parallel_studio_xe_2019_update4_cluster_edition/compilers_and_libraries_2019.4.243/linux/mpi/intel64/lib/libmpifort.so.12 (0x00007f0467f3e000)
libmpi.so.12 => /gpfslocalsys/intel/parallel_studio_xe_2019_update4_cluster_edition/compilers_and_libraries_2019.4.243/linux/mpi/intel64/lib/release/libmpi.so.12 (0x00007f046704c000)
librt.so.1 => /lib64/librt.so.1 (0x00007f0466e43000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f0466c23000)
libgcc_s.so.1 => /gpfslocalsup/spack_soft/gcc/7.3.0/gcc-8.3.1-vqzoua4fyg6e5jiz3vhkpjb4qtofjfrf/lib64/libgcc_s.so.1 (0x00007f0466a0b000)
libc.so.6 => /lib64/libc.so.6 (0x00007f0466648000)
libimf.so => /gpfslocalsys/intel/parallel_studio_xe_2019_update4_cluster_edition/compilers_and_libraries_2019.4.243/linux/compiler/lib/intel64_lin/libimf.so (0x00007f04660a8000)
libsvml.so => /gpfslocalsys/intel/parallel_studio_xe_2019_update4_cluster_edition/compilers_and_libraries_2019.4.243/linux/compiler/lib/intel64_lin/libsvml.so (0x00007f0464704000)
libirng.so => /gpfslocalsys/intel/parallel_studio_xe_2019_update4_cluster_edition/compilers_and_libraries_2019.4.243/linux/compiler/lib/intel64_lin/libirng.so (0x00007f0464392000)
libintlc.so.5 => /gpfslocalsys/intel/parallel_studio_xe_2019_update4_cluster_edition/compilers_and_libraries_2019.4.243/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x00007f0464120000)
/lib64/ld-linux-x86-64.so.2 (0x00007f046906b000)
libfabric.so.1 => /gpfslocalsys/intel/parallel_studio_xe_2019_update4_cluster_edition/compilers_and_libraries_2019.4.243/linux/mpi/intel64/libfabric/lib/libfabric.so.1 (0x00007f0463ee8000)
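
For completeness, I suppose the same check could also be run from inside the batch job itself, to see what is actually missing at run time, e.g.:

echo $LD_LIBRARY_PATH
ldd /gpfslocalsup/pub/code-saturne/5.1.6/libexec/code_saturne/cs_preprocess | grep "not found"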

Would you happen to have an idea about why this error appears?

Thank you,

Best regards,

Puneeth

Re: Installation on cluster

Posted: Wed Sep 16, 2020 3:15 pm
by Puneeth
Hello,

There is also something going wrong with LD_LIBRARY_PATH.
The path printed in the summary doesn't correspond to the path printed on the terminal.

The terminal output for ">echo $LD_LIBRARY_PATH" is:
/gpfslocalsup/spack_soft/scotch/6.0.6/intel-19.0.4-v5fgt76h3qeay6moyrh3w5jmof5kd5mq/lib:
/gpfslocalsup/spack_soft/hdf5/1.8.21/intel-19.0.4-ze52g22lxxwb7ezsvxepmcixo6lmotwe/lib:
/gpfslocalsys/intel/parallel_studio_xe_2019_update4_cluster_edition/compilers_and_libraries_2019.4.243/linux/mkl/lib/intel64_lin:
/gpfslocalsup/pub/code-saturne/5.1.6/lib:
/gpfslocalsup/spack_soft/libxml2/2.9.9/gcc-8.3.1-oeywxcenymqugus6ctqdzstgjibgnwvj/lib:
/gpfslocalsup/spack_soft/petsc/3.11.3/intel-19.0.4-npalh4bbtqx2n646lssj4yqxzkejhwls/lib:
/gpfslocalsys/intel/parallel_studio_xe_2019_update4_cluster_edition/compilers_and_libraries_2019.4.243/linux/mpi/intel64/libfabric/lib:
/gpfslocalsys/intel/parallel_studio_xe_2019_update4_cluster_edition/compilers_and_libraries_2019.4.243/linux/mpi/intel64/lib/release:
/gpfslocalsys/intel/parallel_studio_xe_2019_update4_cluster_edition/compilers_and_libraries_2019.4.243/linux/mpi/intel64/lib:
/gpfslocalsup/spack_soft/metis/5.1.0/intel-19.0.4-2rnvhtykdeapptm3tr5a4qle5y3miact/lib:
/gpfslocalsys/intel/parallel_studio_xe_2019_update4_cluster_edition/compilers_and_libraries_2019.4.243/linux/compiler/lib/intel64_lin:
/gpfslocalsup/spack_soft/gcc/7.3.0/gcc-8.3.1-vqzoua4fyg6e5jiz3vhkpjb4qtofjfrf/lib64:
/gpfslocalsup/spack_soft/gcc/7.3.0/gcc-8.3.1-vqzoua4fyg6e5jiz3vhkpjb4qtofjfrf/lib:
/gpfslocalsys/slurm/current/lib/slurm:
/gpfslocalsys/slurm/current/lib


But the LD_LIBRARY_PATH in the summary file is different:
LD_LIBRARY_PATH=/gpfslocalsup/spack_soft/libxml2/2.9.9/gcc-8.3.1-oeywxcenymqugus6ctqdzstgjibgnwvj/lib
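
A way to test this would be to run ldd with LD_LIBRARY_PATH restricted to the value from the summary, for example:

env LD_LIBRARY_PATH=/gpfslocalsup/spack_soft/libxml2/2.9.9/gcc-8.3.1-oeywxcenymqugus6ctqdzstgjibgnwvj/lib \
    ldd /gpfslocalsup/pub/code-saturne/5.1.6/libexec/code_saturne/cs_preprocess | grep "not found"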

Could this be the reason why Code-Saturne doesn't find libimf.so?

Thank you,

Best regards,

Puneeth

Re: Installation on cluster

Posted: Thu Sep 17, 2020 6:49 am
by Yvan Fournier
Hello,

Yes, this could explain the issue. Do you have environment modules loaded at install time?

Regards,

Yvan

Re: Installation on cluster

Posted: Thu Sep 17, 2020 8:23 am
by Puneeth
Hello,

The installation is based on the following modules:
-intel-compilers/19.0.4
-intel-mpi/2019.4
-intel-mkl/2019.4
-hdf5/1.8.21-mpi
-scotch/6.0.6-mpi
-petsc/3.11.3-mpi
-libxml2/2.9.9

Best,

Puneeth

Re: Installation on cluster

Posted: Thu Sep 17, 2020 10:20 pm
by Yvan Fournier
Hello,

Yes, but how is the environment sourced/loaded?

Can you post the config.log ?

Regards,

Yvan

Re: Installation on cluster

Posted: Fri Sep 18, 2020 7:59 am
by Puneeth
Hello,

Please find the config.log attached herewith.

Best regards,

Puneeth

Re: Installation on cluster

Posted: Sat Sep 19, 2020 5:55 pm
by Yvan Fournier
Hello,

Environment modules were detected (and probably loaded) when running the "configure" step.

It is possible that those modules are not loaded correctly when running (depending on the module system variant), as they are loaded by a Python script.

Starting with code_saturne 5.0.10 (the latest 5.0 release is 5.0.12), there is a "--with-shell-env" configure option which allows sourcing a shell environment script first; this might help in your case. If you use it, you can also add --with-modules=no, since you need one mechanism or the other, not both.

To use the --with-shell-env option, first install using --with-shell-env with no path, then copy/adapt the <install_prefix>/bin/code_saturne script so that it loads the modules or environment variables you need, and re-install using --with-shell-env=<path-to-modified-script> (unless everything already works on the first pass).
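
For example, a rough sketch of the two-pass approach (paths, module names and other configure options to be adapted to your setup):

# first pass: enable the shell environment mechanism, with no script yet
./configure --prefix=<install_prefix> --with-shell-env --with-modules=no [other options]
make && make install

# copy the installed launcher and adapt it so it loads your environment,
# e.g. add lines such as:
#   module load intel-compilers/19.0.4 intel-mpi/2019.4 hdf5/1.8.21-mpi libxml2/2.9.9
cp <install_prefix>/bin/code_saturne <path-to-modified-script>

# second pass: point configure to the adapted script and re-install
./configure --prefix=<install_prefix> --with-shell-env=<path-to-modified-script> --with-modules=no [other options]
make && make install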

Best regards,

Yvan