Dear Code_Saturne experts,
This is Dr. Fahad. I recently convinced my department to procure a Dell R815 blade server with 48 processing nodes. It is not much, but it is a beginning. Its purpose is to host open-source CFD; CAELinux 2013 is installed on it with Code_Saturne version 3.0. I want to promote the use of open-source software and build a strong community of researchers in the field of thermo-fluids.
After more than two months of initial study and practice with Code_Saturne, I am still unable to answer many basic questions that arise daily. I did not want to ask basic questions on the forum, so I took a lot of time to self-learn.
I will start with the most frustrating issue.
1. Small meshes of a few hundred thousand cells run perfectly well. Running them on multiple parallel nodes shows good acceleration, and the speed-up is noticeable as the number of nodes is increased.
2. But this is not the case on fairly large meshes like the one I am using nowadays: 22 million cells (unstructured, generated with Gmsh) for flow around a circular cylinder, on which I intend to run LES. The behaviour is strange: on more than 10 nodes, the solver gets stuck either at mesh partitioning or at computing geometric quantities (mostly the latter). It goes on for days without any error and without any progress, with CPU loads matching the number of nodes engaged, yet no output.
3. The lower the number of nodes, the faster the solver's preprocessing, though it still takes time on partitioning and computing geometric quantities. I suppose this makes sense, even though I do not know what "computing geometric quantities" means, or why it is needed if a mesh check is already performed after importing the mesh.
4. Right now I am running the 22-million-cell case on just 4 nodes. It has performed 1 iteration and is now stuck on the second, without any error. It took 7 hours of wall-clock time from pressing "run batch calculation" to the first iteration, and a large part of that was consumed by the solver's preprocessing.
5. Sometimes I kill a calculation because no iterations have been performed after 24 hours of the run, and mysteriously one iteration then appears in the listing.
6. Yet small meshes work perfectly fine.
7. I would be grateful if someone could guide me through step-by-step instructions to resolve these issues. I am better with the GUI and less comfortable with the command line.
Thanks, and I apologize for the lengthy post.
Mesh partition and computing geometric quantities
Re: Mesh partition and computing geometric quantities
Hello,
For a new setup, I really recommend Code_Saturne 4.0, which will be maintained for 3+ more years, versus version 3.0, which will be maintained only up to the release of version 5.0, in slightly less than one year... It also has additional features.
In any case, the issues you describe may be (at least in part) related to MPI driver issues, so I have a few questions:
- Do you use user subroutines? If so, could you post them here, along with the XML file?
- Which MPI library do you use? (We might be able to suggest some tuning parameters; a way to check is sketched below.)
- For cases which got stuck and then progressed a bit, do you have "listing" files?
Here again, regarding "listing" files, the latest 4.0 version may provide a little more useful information...
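If you are unsure which MPI library and version your installation uses, a minimal test program such as the following can report it. This is a generic MPI sketch, not specific to Code_Saturne; MPI_Get_library_version requires an MPI-3 library, and with older installations the ompi_info (Open MPI) or mpichversion (MPICH) commands give similar information:

```c
/* mpi_version.c - report the MPI standard level and library version.
 * Build: mpicc mpi_version.c -o mpi_version
 * Run:   mpirun -n 1 ./mpi_version
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    char lib[MPI_MAX_LIBRARY_VERSION_STRING];
    int len, major, minor;

    MPI_Init(&argc, &argv);
    MPI_Get_version(&major, &minor);      /* MPI standard level, e.g. 3.0 */
    MPI_Get_library_version(lib, &len);   /* implementation name and version */
    printf("MPI standard %d.%d\n%s\n", major, minor, lib);
    MPI_Finalize();
    return 0;
}
```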
Regards,
Yvan
Re: Mesh partition and computing geometric quantities
Hello:
Thanks for replying.
I cannot immediately update to version 4.0 until the current issue is resolved or shown to be specific to version 3.0.
The answers to your questions are:
1. The reported issue is independent of user subroutines: I am still learning the structure of the code, so no user subroutines have been used, only the GUI.
2. I have attached the few listings and corresponding XML files that I have at the moment in a zipped folder. Details are in the file names.
3. All in all, the solver did not progress beyond the first time step, even on a small number of nodes.
4. I have contacted my network support; I am not sure which version of MPI is being used, if that is what you asked. If you were instead asking about partitioning libraries, I have used 'default' in the GUI options as well as 'METIS', with the same issue. (A test I could pass on to the network support is sketched below.)
I would be grateful if you could point out flaws in my set-up, things I missed, perhaps a wrong initialization, or any tips or tweaks: MPI ranks, mesh input, etc.
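In the meantime, something concrete the network support could run is a generic MPI collective smoke test, independent of Code_Saturne (a hypothetical sketch; the buffer size is an arbitrary choice). If the hangs are MPI-related, a test like this may also hang or be very slow at higher process counts, which would point at the MPI setup rather than the solver:

```c
/* alltoall_test.c - smoke-test MPI collective communication.
 * Build: mpicc alltoall_test.c -o alltoall_test
 * Run:   mpirun -n 32 ./alltoall_test
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    const int n = 1 << 18;   /* ints sent to each rank; arbitrary test size */
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *sbuf = malloc((size_t)n * size * sizeof(int));
    int *rbuf = malloc((size_t)n * size * sizeof(int));
    for (int i = 0; i < n * size; i++) sbuf[i] = rank;

    double t0 = MPI_Wtime();
    MPI_Alltoall(sbuf, n, MPI_INT, rbuf, n, MPI_INT, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("Alltoall on %d ranks: %.3f s\n", size, t1 - t0);

    free(sbuf);
    free(rbuf);
    MPI_Finalize();
    return 0;
}
```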
Thanks
Attachments: Listings.zip (40.9 KiB)
Re: Mesh partition and computing geometric quantities
If it helps: my network administrator has set up a virtual machine for me with 32 processing nodes available, and that is how I run CAELinux on this virtual machine remotely.
Re: Mesh partition and computing geometric quantities
Hello,
It is difficult to say what is wrong from these listings. In any case, even if you remain with version 3.0.x, there are some fixes in version 3.0.2 at least (we are now at 3.0.8) which may be important for wall boundary conditions.
I see you have at most 3 interior faces per cell, so I assume your mesh is 1 cell thick and meshed with tetrahedra, which is not optimal at all for LES...
To get more information, could you run one of the cases which gets stuck after 1 iteration for a single iteration? This will provide more performance data, including a useful "performance.log" file with additional detail.
Also, on how many physical nodes and cores do your 32 virtual nodes run? This is important for performance. Which virtual machine type do you use?
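If gathering that information is difficult, each MPI rank can report where it actually runs. Something along these lines (a generic MPI sketch, not part of Code_Saturne) shows how ranks map onto hosts and how many cores the OS exposes:

```c
/* rank_map.c - show on which host each MPI rank runs and how many
 * cores the OS exposes there.
 * Build: mpicc rank_map.c -o rank_map
 * Run:   mpirun -n 32 ./rank_map
 */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    char host[MPI_MAX_PROCESSOR_NAME];
    int rank, len;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);   /* cores visible to the OS */
    printf("rank %d runs on %s (%ld cores online)\n", rank, host, ncpu);
    MPI_Finalize();
    return 0;
}
```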
Regards,
Yvan
Re: Mesh partition and computing geometric quantities
Hi Yvan,
1. Yes, it is one cell thick, with tetrahedra. I agree it is not optimal for LES, but my simulation more or less meets the same fate when run laminar and steady for testing purposes.
2. I mean 1 node = 1 processor. In total the blade server has 48 processors; out of these 48, I am given a virtual machine with 32 processors, and 1 virtual processor (or node) = 1 physical processor (or node).
3. Virtualization is performed using VMware vSphere (if that answers your question; otherwise kindly rephrase). Also, my VM has 128 GB RAM, which translates to 4 GB per VM node.
4. I have run the simulation for 1 iteration. The listing, XML, performance.log and other files are attached in a zipped folder. Please advise.
Regards
Fahad
Attachments: fahad.zip (14.71 KiB)
Re: Mesh partition and computing geometric quantities
Hello,
The AMD Opteron 6348 seems to have 12 cores, so I assume your server has 4 processors (though the virtual machine might change this). If it really had 48 processors, that would be 48x12 cores, and with 32 of those you could actually run 32*12 = 384 MPI processes (or 192 at half load, which would seem more reasonable). But I doubt a 48-processor (not core) machine would have "only" 128 GB RAM, so I guess you really have 4*12 and are allotted 3*12.
I do not know which type of network you have between processors (or whether they are all on the same board, sharing memory), but from your performance.log I have the impression that bandwidth is more of an issue than network latency (and memory bandwidth might be the biggest issue, more so than network bandwidth).
If memory bandwidth is the main issue, you should see similar performance using 16 MPI processes as using 32. In the case you sent, you used 5. Could you try again with 16 and 32, for comparison? (A way to test this directly is sketched below.)
Also, I do not know whether virtualization has a significant impact on performance. My recommendations are based on non-virtualized experience, so I cannot say whether the virtualization you use affects performance significantly...
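To probe the memory-bandwidth hypothesis, a STREAM-style triad run under MPI can be compared at 16 and 32 processes (a rough sketch, not the official STREAM benchmark; the array size is an arbitrary choice meant to exceed the caches). If the aggregate bandwidth barely grows from 16 to 32 ranks, memory bandwidth is the bottleneck:

```c
/* triad_bw.c - rough aggregate memory bandwidth under MPI contention.
 * Build: mpicc -O2 triad_bw.c -o triad_bw
 * Run:   mpirun -n 16 ./triad_bw ; mpirun -n 32 ./triad_bw
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 20000000L   /* 160 MB per array, well beyond any cache */

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    MPI_Barrier(MPI_COMM_WORLD);     /* start all ranks together */
    double t0 = MPI_Wtime();
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];    /* STREAM triad kernel */
    double t = MPI_Wtime() - t0;

    double local = 3.0 * N * sizeof(double) / 1e9 / t;  /* GB/s on this rank */
    double total;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0 && a[N / 2] == 7.0)   /* touch the result so the loop is not optimized away */
        printf("%d ranks: aggregate triad bandwidth ~%.1f GB/s\n", size, total);

    free(a); free(b); free(c);
    MPI_Finalize();
    return 0;
}
```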
Regards,
Yvan
Re: Mesh partition and computing geometric quantities
Hi Yvan
Thanks for replying. Tomorrow I will get back to you with performance logs for 16 and 32 MPI processes on a 1-iteration simulation. I would really like to get to the bottom of this, so I will also gather all the necessary information regarding the network between processors, shared memory, same board, memory bandwidth, etc., after a sitting with my network admin.
The confusion between processors and cores is as old as parallel processing itself. You are right; I will come back with properly verified numbers this time. Neither I nor my network admin has the technical knowledge to judge whether the virtual environment affects performance, so I can only ask you to consult your colleagues on this, perhaps.
Thanks
Fahad
Re: Mesh partition and computing geometric quantities
Hi Yvan:
The specs of the physical system are:
-------------------------------------
4 AMD Opteron 6348 processors with 12 cores each, hence 4*12 = 48 cores in total
4 sockets, 12 cores per socket
All 4 processors on the same board, so NO networking between them
Total RAM: 256 GB DDR3-1333
Total storage: 4 TB
The specs of my virtual machine are:
------------------------------------
32 cores = 8 cores from each socket
128 GB RAM dedicated to my use
1 TB storage
1-iteration unsteady LES simulations:
-------------------------------------
Three performance logs are attached, for one iteration with 5, 16 and 32 MPI processes, as you asked.
I was surprised to see that the LES simulation on 32 MPI processes ran for one iteration, and faster than on 16 and 5, since the solver used to get stuck at the preprocessing stages at high MPI counts (listings already provided).
For further investigation, I ran the same simulation (32 MPI) for 10 time steps: it performed 1 time step and got stuck during the second; that listing is also attached. And if I run it for 1000 time steps, it will not perform even a single iteration. I do not know if we are getting somewhere, but it seems that the higher the number of time steps, the earlier the run gets stuck at high MPI counts. Maybe it is something else.
I am performing some further tests as well to get to the bottom of it. Meanwhile, I would ask you to check the attachments.
Regards
Fahad
Attachments: performance-32MPI-1iter.log (19.81 KiB), performance-16MPI-1iter.log (20.98 KiB), performance-5MPI-1iter.log (19.81 KiB)
Re: Mesh partition and computing geometric quantities
I could not attach this in the previous post: it is the stuck listing of the 32-MPI run for 100 iterations mentioned in the previous post.
Thanks
Attachments: listing-32MPI-100iter-stuck.zip (8.98 KiB)