computation start error with number of processes

Posted: Mon Oct 26, 2015 1:59 pm
by sirlb
Hello,

My computations run fine when I use the full resources of each node.
But when Grid Engine gives my job half nodes (6 cores already allocated to another job), the computation fails with the error below:

--------------------------------------------------------------------------
Your job has requested a conflicting number of processes for the
application:

App: ./cs_solver
number of procs:  84

This is more processes than we can launch under the following
additional directives and conditions:

number of nodes:   8
npernode:   10

Please revise the conflict and try again.
--------------------------------------------------------------------------
My configuration is two 6-core processors per node.
Is there any option in Code_Saturne to fix this problem? Or do you think it is linked to my Grid Engine system?

Thank you for your feedback.
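
The arithmetic behind the error message can be checked with a short sketch. The function names below are hypothetical and only illustrate the check; the three values come from the message itself. How the scheduler arrived at 84 slots and 10 per node is not stated in the thread, so the sketch only shows that the numbers are inconsistent, not why.

```python
def max_launchable(nodes: int, per_node: int) -> int:
    """Upper bound on ranks when every node is capped at per_node ranks."""
    return nodes * per_node


def check_request(requested: int, nodes: int, per_node: int) -> str:
    """Compare the requested rank count with what the per-node cap allows."""
    limit = max_launchable(nodes, per_node)
    if requested > limit:
        return f"conflict: {requested} requested, at most {limit} launchable"
    return f"ok: {requested} requested, up to {limit} launchable"


# Values copied from the error message above.
print(check_request(requested=84, nodes=8, per_node=10))
# prints "conflict: 84 requested, at most 80 launchable"
```

With 8 nodes capped at 10 processes each, at most 80 processes fit, which is 4 short of the 84 requested.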

Re: computation start error with number of processes

Posted: Mon Oct 26, 2015 3:37 pm
by Yvan Fournier
Hello,

I am not too knowledgeable regarding Grid Engine, and from a Code_Saturne script management point of view, it is the worst batch system to support (due to its peculiar syntax, where you basically need to know the name of computing environments in advance). I was hoping that with Oracle buying Sun and then almost abandoning Grid Engine, this system would die, but unfortunately, some admins still cling to it...

In the Code_Saturne post-install phase (code_saturne.cfg), you can force additional options to mpiexec (or mpirun, or whatever the local recommended command is), so this might help you work around the issue, but I'm not sure.

In any case, in parallel, I do not recommend sharing nodes with other codes, at least not without prior benchmarking. Depending on the resource usage of the other code, if Code_Saturne is slowed down on even a single node, all other nodes will wait, so resources are wasted. Sharing a node with a tool that is light on resources (such as a co-processing tool which only activates once in a while) could be a much better idea, so there are scenarios where sharing a node can be good, but when in doubt (or if you haven't benchmarked on a similarly sized case), don't do it. Use fewer nodes, don't share them.

Regards,

Yvan
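
The point about a single slowed node holding back all the others can be illustrated with a toy model. This is only a sketch under a simplifying assumption: every node synchronises at the end of each step, so the step lasts as long as the slowest node. The timings are hypothetical, not measurements from Code_Saturne.

```python
def step_time(node_times):
    """With a synchronisation at each step, all nodes wait for the slowest."""
    return max(node_times)


def wasted_fraction(node_times):
    """Fraction of total node-time spent idle, waiting for the slowest node."""
    slowest = step_time(node_times)
    busy = sum(node_times)
    return 1.0 - busy / (slowest * len(node_times))


# Hypothetical timings: 8 nodes, one of them 50% slower because it is shared.
dedicated = [1.0] * 8
shared = [1.0] * 7 + [1.5]

print("dedicated:", step_time(dedicated), round(wasted_fraction(dedicated), 3))
print("shared:   ", step_time(shared), round(wasted_fraction(shared), 3))
```

In this model, slowing one node out of eight by 50% slows the whole step by 50%, and the seven unaffected nodes sit idle for part of each step.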

Re: computation start error with number of processes

Posted: Mon Oct 26, 2015 4:45 pm
by sirlb
Hello Yvan,

Thank you for the feedback.
We try to keep nodes reserved per job, so we usually do not share nodes between multiple jobs / codes.

But on rare occasions, some jobs leave resources free on a node, and then my cs jobs are launched when I don't want them to start.

I'll dig into the Grid Engine configuration to see if I can avoid this.