					
				RSM SSG model + Least Squares + All vertexes does not work
				Posted: Wed Oct 12, 2022 4:11 pm
				by Antech
				Hello. I found what looks like a bug with RSM SSG (I didn't test EBRSM) on a tetra mesh when Least Squares gradient reconstruction is selected with the "Full (All vertex adjacent)" option and "Improved pressure interpolation" is on. The calculation crashes at the first iteration, when the turbulence variables solver is engaged. If I switch to "Non-ortho faces threshold", the calculation starts normally (I think it will diverge anyway, because it's hard to converge with RSM in virtually any code, but that is natural, while a SIGSEGV is not). I tried Scalar diffusivity (Shir model) in the RSM options, but without success. The subroutines in the call stack differ, but it is always the same SIGSEGV at the first iteration, somewhere in the linear solvers stage.
The calculation also runs with a different, coarser mesh (but it's too coarse to give a good solution; the aerodynamic resistance comes out too low). So the problem is intermittent and depends on the mesh.
The setup is a simple air-cooled heat exchanger with distributed aerodynamic resistance and a heat source (a simple user subroutine) in the tube bundle areas. Air is driven by the fan model (4/9 fans depending on the particular case). I only slightly patched cs_fan.c to set a flat fan pressure difference profile with torque; I don't think it can affect the turbulence model or gradient reconstruction. This setup worked fine with k-epsilon in different variations (geometry, mesh, discretisation schemes).
Sorry, I can't share the mesh here, it's a commercial case. But if you review the RSM part of the code, please take a look at what could cause this error. Maybe something was not updated for RSM when the "Full (All vertex adjacent)" option was introduced. One particular feature of this case: 3 of the 4 fans are cut by the mesh symmetry boundaries (defined as walls), so only sectors of these fans are inside the calculation volume, but this is OK with k-epsilon.
I attached the case xml file.
Linux: CentOS 7.5
Saturne: 7.0.2, 7.0.4 (any version)
MPI: 1.8.4 (with Saturne 7.0.2), 1.10.7 (with Saturne 7.0.4)
 
			
					
				Re: RSM SSG model + Least Squares + All vertexes does not work
				Posted: Thu Oct 13, 2022 1:36 am
				by Yvan Fournier
				Hello,
Are you running this on a workstation or a cluster? If you are running on a workstation and cannot share your mesh, there are at least 2 options to debug in more detail on your machine (I doubt we will find the bug by simply reviewing the code here, especially as the crash can appear in a delayed manner relative to the bug, which is probably what is happening here).
- If the mesh is small, you can run a debug build (configured with "--enable-debug") under the Valgrind tool.
- If the mesh is too large, you can run a debug + Address Sanitizer build (configured with "--enable-debug CFLAGS=-fsanitize=address  CXXFLAGS=-fsanitize=address  LDFLAGS=-fsanitize=address").
In both cases, you should be able to detect the exact line causing the crash.
If you need more info, I can help. You can also check info on debugging here: 
https://www.code-saturne.org/documentat ... gging.html.
Please keep us updated.
Best regards,
  Yvan
 
			
					
				Re: RSM SSG model + Least Squares + All vertexes does not work
				Posted: Thu Oct 13, 2022 8:25 am
				by Antech
				Hello, thanks for your response. I have no time for a deep analysis of the problem right now, sorry. But it's OK to use the old gradient reconstruction option.
I have another question related to this topic. With both meshes (whole domain + "All adjacent" gradient option, or partial geometry + "Non-orthogonal faces threshold" gradient option) the case diverges after some iterations. Target CFL is 1.0 (the real one is always < 1.5), fan flow rates are quite low, fan pressures are 3000 Pa at the beginning, dropping to 1800...2800 Pa, and turbulence is RSM SSG.
At some iteration, a rapid pressure ripple occurs in some cell and the calculation diverges quickly. The cells are arbitrary; it may even be a cell on the bottom boundary with no features and no resistance areas around it, just ordinary volume mesh... The pressure maximum is around 5000...6000 Pa and then rises to 10^5 Pa in one iteration!
Is this expected with RSM, which is known to have convergence problems, or do I need to change the case settings? The xml file is in the first post of the thread. It looks strange, because there is no cause for these pressure ripples at such low velocities at the start of the calculation... Below is a table with the pressure and velocity maxima per iteration.
Code:
======================================================
| Itr (abs)  | VelMag Max | Prs Min     | Prs Max    |
| [---]      | [m/s]      | [Pa]        | [Pa]       |
======================================================
| 1          | 0.72942    | -6021.1     | 6211.2     |
| 2          | 1.5544     | -6246.7     | 6567.1     |
| 3          | 2.4442     | -6374.8     | 6763.4     |
| 4          | 3.4138     | -6234.3     | 6571.7     |
| 5          | 4.7079     | -6184       | 6515.6     |
| 6          | 6.1844     | -6173.2     | 6490.8     |
| 7          | 7.8699     | -6152.3     | 6461.8     |
| 8          | 9.7741     | -6128.2     | 6420.8     |
| 9          | 11.906     | -6094       | 6374       |
| 10         | 14.274     | -6058.2     | 6322.2     |
| 11         | 16.888     | -6013.7     | 6271.2     |
| 12         | 19.751     | -5966.8     | 6216.4     |
| 13         | 22.863     | -5915.1     | 6152       |
| 14         | 26.216     | -5857.8     | 6077.8     |
| 15         | 29.79      | -5795.1     | 5994.9     |
| 16         | 33.549     | -5725.3     | 5904.5     |
| 17         | 37.434     | -5650       | 5809.4     |
| 18         | 41.351     | -5566.1     | 5708.3     |
| 19         | 45.167     | -5477.9     | 5596.3     |
| 20         | 48.7       | -5378.9     | 5477.2     |
| 21         | 51.705     | -5274.3     | 5348       |
| 22         | 53.59      | -5158.2     | 5210       |
| 23         | 53.856     | -5036.2     | 73912      |
| 24         | 56.805     | -1.1282e+05 | 7400.5     |
| 25         | 63.765     | -1.0855e+05 | 52389      |
| 26         | 65.198     | -4616.3     | 1.5016e+05 |
| 27         | 57.838     | -1.1346e+06 | 6.8951e+05 |
| 28         | 53.622     | -1.3136e+06 | 1.6121e+06 |
| 29         | 83.346     | -3.5274e+06 | 2.6567e+06 |
| 30         | 2754.5     | -1.5123e+07 | 1.8698e+07 |
======================================================
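The iteration at which the pressure runaway starts can be picked out of a table like the one above automatically. The following is only a minimal sketch: it assumes the table is available as text in the same pipe-separated layout, and the function name and the growth factor of 5 are hypothetical choices for illustration, not part of code_saturne.

```python
def first_pressure_jump(table_text, factor=5.0):
    """Return the first iteration whose pressure peak, max(|Prs Min|, |Prs Max|),
    exceeds `factor` times the peak of the previous iteration, or None."""
    previous_peak = None
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 4:
            continue  # border line or unrelated text
        try:
            iteration = int(cells[0])
            p_min, p_max = float(cells[2]), float(cells[3])
        except ValueError:
            continue  # header or units row
        peak = max(abs(p_min), abs(p_max))
        if previous_peak is not None and peak > factor * previous_peak:
            return iteration
        previous_peak = peak
    return None


# A few rows copied from the table in this post.
sample = """\
| 21         | 51.705     | -5274.3    | 5348       |
| 22         | 53.59      | -5158.2    | 5210       |
| 23         | 53.856     | -5036.2    | 73912      |
| 24         | 56.805     | -1.1282e+05 | 7400.5     |
"""
print(first_pressure_jump(sample))  # prints 23
```

With these rows the jump is reported at iteration 23, where the pressure maximum goes from about 5 kPa to about 74 kPa while the velocity maximum barely changes.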
I also attached a picture of the divergence area. The background is semi-transparent and coloured by pressure; vectors are coloured by velocity. As you can see, the surrounding area is still calm, with divergence in just 1-2 cells. Pressure in the "diverged cells" is up to 10^6 Pa, although it's in the -600...-500 Pa range in the area around them, and velocities are low except in the "diverged cells", where velocity components reach 42...117 m/s (the background is ~0.5 m/s at this iteration).
========================================
The strange thing is that the calculation does not run again, although I opened exactly the same XML from RESU for the successful run (the one that diverged but started normally). When I switch to k-epsilon it's OK; if I switch back to RSM SSG or EBRSM, it fails. It seems the bug is intermittent and the cause is not the gradient reconstruction option but RSM with this particular mesh (around 20 million cells; the calculation runs on a Xeon desktop machine with a 2678v3 CPU). Unfortunately, I have no time now to install debug tools and dig deeper into this issue... The error message only contains a call stack; two examples:
Code:
SIGTERM signal (termination) received.
--> computation interrupted by environment.
Call stack:
   1: 0x7f86f4a68bbe <+0x485bbe>                      (libsaturne-7.0.so)
   2: 0x7f86f4a8eddd <cs_convection_diffusion_tensor+0x10cd> (libsaturne-7.0.so)
   3: 0x7f86f4a393fb <cs_balance_tensor+0x4db>        (libsaturne-7.0.so)
   4: 0x7f86f471b49e <cs_equation_iterative_solve_tensor+0x58e> (libsaturne-7.0.so)
   5: 0x7f86f4e70e3f <__cs_c_bindings_MOD_coditts+0x389> (libsaturne-7.0.so)
   6: 0x7f86f4c6ea69 <resssg2_+0x3b59>                (libsaturne-7.0.so)
   7: 0x7f86f4c7caf6 <turrij_+0x35f6>                 (libsaturne-7.0.so)
   8: 0x7f86f4836b91 <tridim_+0x4171>                 (libsaturne-7.0.so)
   9: 0x7f86f469ddf7 <caltri_+0x1e77>                 (libsaturne-7.0.so)
  10: 0x7f86f57859ba <main+0x70a>                     (libcs_solver-7.0.so)
  11: 0x7f86f1f8f555 <__libc_start_main+0xf5>         (libc.so.6)
  12: 0x400c99     <>                               (cs_solver)
End of stack
Code:
SIGTERM signal (termination) received.
--> computation interrupted by environment.
Call stack:
   1: 0x7fefbc3e5adb <+0x3adb>                        (mca_btl_vader.so)
   2: 0x7fefc1292d2a <opal_progress+0x4a>             (libopen-pal.so.6)
   3: 0x7fefc31a8005 <ompi_request_default_wait_all+0x225> (libmpi.so.1)
   4: 0x7fefc31d874f <PMPI_Waitall+0x9f>              (libmpi.so.1)
   5: 0x7fefc46b4cc9 <cs_halo_sync_var_strided+0x459> (libsaturne-7.0.so)
   6: 0x7fefc4a6d6b8 <cs_matrix_pre_vector_multiply_sync+0x28> (libsaturne-7.0.so)
   7: 0x7fefc4aac02b <+0x54602b>                      (libsaturne-7.0.so)
   8: 0x7fefc4aaec52 <cs_sles_it_solve+0x152>         (libsaturne-7.0.so)
   9: 0x7fefc4a9c5ca <cs_sles_solve+0x28a>            (libsaturne-7.0.so)
  10: 0x7fefc4a9d824 <cs_sles_solve_native+0x514>     (libsaturne-7.0.so)
  11: 0x7fefc469f137 <cs_equation_iterative_solve_tensor+0x1227> (libsaturne-7.0.so)
  12: 0x7fefc4df3e3f <__cs_c_bindings_MOD_coditts+0x389> (libsaturne-7.0.so)
  13: 0x7fefc4bf1a69 <resssg2_+0x3b59>                (libsaturne-7.0.so)
  14: 0x7fefc4bffaf6 <turrij_+0x35f6>                 (libsaturne-7.0.so)
  15: 0x7fefc47b9b91 <tridim_+0x4171>                 (libsaturne-7.0.so)
  16: 0x7fefc4620df7 <caltri_+0x1e77>                 (libsaturne-7.0.so)
  17: 0x7fefc57089ba <main+0x70a>                     (libcs_solver-7.0.so)
  18: 0x7fefc1f12555 <__libc_start_main+0xf5>         (libc.so.6)
  19: 0x400c99     <>                               (cs_solver)
End of stack
The common element is the cs_equation_iterative_solve_tensor subroutine. The error may then occur in different functions that it calls: matrix multiplication or "balancing" (sorry, I don't know what that means). So I don't think I will find the cause of this error, because it looks like it is generated somewhere else and here we only see the result, as with memory access problems. I will now also check whether there is a problem with free memory.
Oops! It seems that it just runs out of memory. The mesh is 22M cells, which is OK for simple turbulence models, but with RSM and its bunch of Rij fields it consumes almost the entire memory. It has now run 2 times with RSM after a reboot (I had some problem with KDE), but peak memory usage is almost the 64 GB that the system has. Sorry for the "many words"; I'm 99% sure the mesh is just too large for this system with RSM.
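To get a feel for why tensor fields weigh more than scalar ones on a mesh of this size, the raw storage of a single cell-based array can be computed by hand. This is only back-of-the-envelope arithmetic, assuming double precision values (8 bytes) and one array copy; it is not a measurement of what code_saturne actually allocates (matrices, work arrays, halos and previous-time-step copies are not counted), and the function name is hypothetical.

```python
BYTES_PER_DOUBLE = 8


def array_gib(n_cells, values_per_cell):
    """Size in GiB of one cell-based array of doubles (raw storage only)."""
    return n_cells * values_per_cell * BYTES_PER_DOUBLE / 2**30


n_cells = 22_000_000  # mesh size quoted in this post

for label, values_per_cell in [
    ("one scalar field (e.g. k or epsilon)", 1),
    ("symmetric Rij tensor (6 components)", 6),
    ("gradient of Rij (6 x 3 components)", 18),
]:
    print(f"{label}: {array_gib(n_cells, values_per_cell):.2f} GiB")
```

Each additional copy of a 6-component field costs about 1 GiB at 22M cells under these assumptions, so a model that needs several such arrays at once adds up quickly on a 64 GB machine.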
But the question about the divergence (pressure runaway) remains. Maybe I need to tweak the numerical settings? Or make the fan curves "softer"?
 
			
					
				Re: RSM SSG model + Least Squares + All vertexes does not work
				Posted: Thu Oct 13, 2022 2:04 pm
				by Yvan Fournier
				Hello,
Thanks for the update. In case of a memory access error (for example accessing a value at an uninitialized index), the behavior can be semi-random, as you seem to observe (that is why using Valgrind would really help).
Regarding memory usage, a simple test would be to run a few (5 to 10) iterations, doing the following:
- In the "advanced" run options in the GUI, check "initialize only", then run (this prepares the case, but does not launch it).
- cd to the new RESU/<run_id> directory.
- Edit the "run_solver" script, adding "export CS_MEM_LOG=mem.log" after the 2nd line.
- Run "./run_solver".
Check the end of the resulting mem.log file (or post it here) to see if there are non-freed arrays. Only the C part (not the Fortran part) is instrumented, but this can help locate memory leaks which would cause you to run out of memory.
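The edit to "run_solver" in the steps above can also be scripted. A minimal sketch, assuming "run_solver" is a plain text shell script and that inserting the line after its 2nd line is what is wanted; the helper names are hypothetical, and only the "export CS_MEM_LOG=mem.log" line itself comes from the instructions above.

```python
from pathlib import Path

MEM_LOG_LINE = "export CS_MEM_LOG=mem.log"


def add_mem_log(script_text, after_line=2):
    """Return the script text with MEM_LOG_LINE inserted after line `after_line`.

    The text is returned unchanged if the line is already present.
    """
    lines = script_text.splitlines()
    if MEM_LOG_LINE in lines:
        return script_text
    lines.insert(after_line, MEM_LOG_LINE)
    return "\n".join(lines) + "\n"


def patch_run_solver(path="run_solver"):
    """Apply add_mem_log() in place to the script at `path`."""
    script = Path(path)
    script.write_text(add_mem_log(script.read_text()))
```

Calling `patch_run_solver()` from inside the RESU/<run_id> directory, before running "./run_solver", applies the change; running it twice does not insert the line twice.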
You can also post (or send me via private message) your user functions and any modified non-user functions, so I can check them.
Regards,
  Yvan
			 
			
					
				Re: RSM SSG model + Least Squares + All vertexes does not work
				Posted: Thu Oct 13, 2022 2:57 pm
				by Antech
				Thanks for your support.
The error was due to insufficient memory. I checked memory usage during the run and it used almost all of it (so when some memory was already occupied before the reboot, there was not enough left to run the RSM case). The same happened with a 41M mesh and k-epsilon... It's OK, it's not a Saturne problem (I can remesh if needed).
But the RSM divergence remains a problem. I reduced the target CFL from 1.0 to 0.1. The pressure increase relaxation is 0.1. The behavior is exactly the same: rapid pressure divergence after 20+ iterations, while the real CFL maximum is 0.12. It is linked to the fan flow, which reaches 4 of 43 m3/s at that point, not to the CFL. So once the flow reaches a particular level, the pressure field starts to diverge in a few (usually 1 or 2) cells, then the velocity diverges... An example is in my table above; it's similar now with CFL 0.1. Are there any numerical tricks I can use to stabilize it? The XML file is in my first post.
			 
			
					
				Re: RSM SSG model + Least Squares + All vertexes does not work
				Posted: Thu Oct 13, 2022 9:10 pm
				by Yvan Fournier
				Hello,
Since you mentioned changing the fan characteristics, that might be worth trying, so as to increase the velocity more progressively.
Otherwise, I've never tested this, but clipping excessive velocity values in cs_user_extra_operations might be an option, as long as you can determine/choose a maximum expected velocity value in the domain.
Significantly increasing the fluid viscosity or density (to decrease the Reynolds number) for the first few time steps might help, though I've never tested that either.
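One possible way to apply such a temporary viscosity increase is to ramp a multiplier down to 1 over the first time steps instead of switching it off abruptly. The sketch below only illustrates such a schedule: the function name, the geometric decay, the starting multiplier and the ramp length are all assumptions chosen for illustration, none of this has been tested on the case discussed here, and it does not use any code_saturne API.

```python
def viscosity_multiplier(step, start=500.0, n_ramp=200):
    """Multiplier applied to the physical viscosity at time step `step`.

    Decays geometrically from `start` at step 0 to 1.0 at step `n_ramp`,
    and stays at 1.0 afterwards.
    """
    if step <= 0:
        return start
    if step >= n_ramp:
        return 1.0
    return start ** (1.0 - step / n_ramp)


MU_AIR = 1.83e-5  # Pa.s, approximate dynamic viscosity of air at room temperature

for step in (0, 50, 100, 150, 200):
    mu = MU_AIR * viscosity_multiplier(step)
    print(f"step {step:3d}: mu = {mu:.3e} Pa.s")
```

A geometric decay spends more steps at low multipliers than a linear one would, which is where the flow is closest to its final Reynolds number; whether that is preferable here is an open question.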
Best regards,
  Yvan
			 
			
					
				Re: RSM SSG model + Least Squares + All vertexes does not work
				Posted: Fri Oct 14, 2022 8:05 am
				by Antech
				Hello. I tried reducing the maximum fan pressure (at zero flow) from 3000 to 500 Pa and enabling the pseudo-coupled velocity-pressure solver. It didn't help; divergence starts at almost the same iteration (23). The target CFL is 1.0 now, but 0.1, as I described, didn't solve the problem either, nor did a pressure increase relaxation of 0.1 (to tell you the truth, I have never noticed any effect from this relaxation in my experience).
I will try starting with high viscosity, thanks for the hint; I have never used this before (we usually use small steps). It looks useful.
Also, I noticed that the divergence again starts near the bottom wall boundary. Maybe I need to change the wall treatment? Currently it's:
- 2-scale model (log law),
- Tensorial diffusivity,
- No gravity in turbulence equations.
Here is how velocity and pressure maximums behave with reduced fan pressures:
Code:
======================================================
| Itr (abs)  | VelMag Max | Prs Min     | Prs Max    |
| [---]      | [m/s]      | [Pa]        | [Pa]       |
======================================================
| 1          | 0.13134    | -1001.7     | 1023       |
| 2          | 0.27843    | -1052.7     | 1083.3     |
| 3          | 0.43722    | -1033.7     | 1067.2     |
| 4          | 0.60984    | -1029.1     | 1055.7     |
| 5          | 0.79864    | -1029.3     | 1055.3     |
| 6          | 1.0053     | -1029.6     | 1055.8     |
| 7          | 1.2315     | -1029.7     | 1055.7     |
| 8          | 1.4784     | -1029.5     | 1055.2     |
| 9          | 1.7474     | -1029.5     | 1054.7     |
| 10         | 2.0395     | -1029.5     | 1054.3     |
| 11         | 2.3557     | -1029.5     | 1053.9     |
| 12         | 2.6964     | -1029.5     | 1053.6     |
| 13         | 3.0863     | -1029.4     | 1053.3     |
| 14         | 3.5165     | -1029.4     | 1053       |
| 15         | 3.9771     | -1029.5     | 1052.8     |
| 16         | 4.6054     | -1029.5     | 1052.5     |
| 17         | 5.4894     | -1029.5     | 1052.4     |
| 18         | 6.4938     | -1029.6     | 1052.3     |
| 19         | 7.6251     | -1029.7     | 1052.2     |
| 20         | 8.8756     | -1029.9     | 1052.1     |
| 21         | 10.208     | -1030.1     | 1052       |
| 22         | 11.717     | -1030.3     | 1052       |
| 23         | 13.791     | -22485      | 25200      |
| 24         | 16.146     | -11288      | 66823      |
| 25         | 18.834     | -26946      | 7583.3     |
| 26         | 21.897     | -1030.3     | 31303      |
| 27         | 25.421     | -57201      | 28161      |
| 28         | 28.271     | -75752      | 1.3232e+05 |
| 29         | 31.121     | -2.268e+05  | 13239      |
| 30         | 137.31     | -2.5358e+06 | 2.3066e+06 |
| 31         | 702.59     | -5.7847e+07 | 7.6566e+07 |
| 32         | 7196.2     | -1.3365e+08 | 1.3232e+08 |
======================================================
 
			
					
				Re: RSM SSG model + Least Squares + All vertexes does not work
				Posted: Tue Oct 18, 2022 2:25 pm
				by Antech
				I tried to stabilize the RSM calculation with artificial viscosity. It worked, so thanks for the hint. For air at room temperature, a viscosity multiplier of 500 is appropriate. Target CFL of 1.0 + Upwind is OK. But it only works while the artificial viscosity is on. When the viscosity is set back to normal, the calculation diverges after some iterations. The pressure rapidly rises "to infinity" in a single iteration and then it never converges.
So the next thing I tried is relaxation and pressure/velocity limiting with user functions. The values used are:
Pressure relaxation factor: 0.15
Velocity relaxation factor: 0.3
Turbulence relaxation factor: 0.8
|Velocity component| maximum: 70 m/s
|Pressure| maximum: 1500 Pa
In this setup, velocity magnitude should be up to ~20 m/s, pressure - up to ~300 Pa, so these limits are compatible with expected velocity/pressure levels.
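The limiting described above amounts to clamping each velocity component and the pressure to a symmetric bound. The following is a minimal sketch of that logic in plain Python, using the limits quoted in this post; it is not the user function from this case and does not use the code_saturne API, and the function names are hypothetical.

```python
V_MAX = 70.0    # m/s, bound on the absolute value of each velocity component
P_MAX = 1500.0  # Pa, bound on the absolute value of the pressure


def clamp(value, bound):
    """Restrict value to the interval [-bound, +bound]."""
    return max(-bound, min(bound, value))


def limit_cell(velocity, pressure):
    """Clamp one cell's velocity components and pressure.

    Returns (clipped_velocity, clipped_pressure, changed), where `changed`
    tells whether any value had to be limited (useful for counting or
    logging the affected cells).
    """
    clipped_velocity = tuple(clamp(c, V_MAX) for c in velocity)
    clipped_pressure = clamp(pressure, P_MAX)
    changed = clipped_velocity != tuple(velocity) or clipped_pressure != pressure
    return clipped_velocity, clipped_pressure, changed


# A "diverged" cell and a normal one.
print(limit_cell((117.0, -42.0, 0.5), 1.0e6))
print(limit_cell((1.0, 2.0, 3.0), -300.0))
```

Returning the `changed` flag makes it easy to count how many cells are limited at each iteration, which shows whether the problematic region shrinks or stays the same.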
Although the relaxation factors are set (this is confirmed by setup.log) and the limiting works, the calculation doesn't converge even with the Upwind scheme. It doesn't diverge, it just stays at the limited levels in a few problematic cells. After 10 iterations the original velocity and pressure maxima (before limiting) are comparable to those at the starting iteration; they don't drop to normal levels.
Also, it looks like some model problem. The pressure peak is too sharp (for example, from 2.4 kPa to 23 kPa in just one iteration). Maybe there is some issue in the RSM implementation; I can't say anything specific because it's too complex for me, but it doesn't look like regular slow divergence.
Is there anything else I can try?
Also, I have a question. How can I printf to the listing from all processes, to trace what happens with the limiting? Standard printf only prints to the GUI window, although from all processes.
Thanks for your attention.
			 
			
					
				Re: RSM SSG model + Least Squares + All vertexes does not work
				Posted: Tue Oct 18, 2022 3:05 pm
				by Yvan Fournier
				Hello,
Not sure I have many ideas left... I forgot to ask whether the mesh in the fan area (especially in the fan itself) is regular (i.e. extruded in the direction of the flow through the fan). This might play a large role.
Also, the fact that viscosity plays a role may mean that a model generating more turbulence (and higher turbulent viscosity) might help, so be sure to choose a model that does not underestimate turbulent viscosity in this type of flow (I'll let the turbulence experts indicate whether some models are recommended or not recommended here).
Also, have you used the fan model from code_saturne before? Do you have some documentation? I have an old, minimal doc from an older code that I could post here (though it may already be present in older posts), as the current doc is weak. This is just to make sure the implemented fan characteristics model matches what you expect/define.
Finally, regarding the listing/log, you can add "--logp" to the cs_solver executable's command line (this may be done in cs_user_scripts.py), or simply use "printf" instead of "bft_printf" calls for the specific output you need, to get output on all ranks. The first approach creates one log per rank; the second sends everything to the standard output (which may appear in non-guaranteed order, though you can then use additional mpiexec options with OpenMPI to control that, or at least force it to separate files).
Best regards,
  Yvan
			 
			
					
				Re: RSM SSG model + Least Squares + All vertexes does not work
				Posted: Tue Oct 18, 2022 3:38 pm
				by Antech
				Thanks!
The fan area mesh is ordinary tetra, but there are no problems in the fan volumes. I attached the axial section of the fan volume mesh. Fan pressures are monitored and they are good, around what they should be. I have used the fan model before, including this case with k-epsilon, and it's generally OK (I already described the results in a report, so they are checked thoroughly). It would be better if the fan model accepted a radial pressure (head) profile in the GUI (so the user could implement a non-uniform radial fan pressure distribution). In industry, fans are usually not simple, in order to obtain a more even radial pressure (head) field, so these fans cannot be described by the built-in Saturne model; but they are also non-ideal, so a flat pressure distribution is not precise either.
The divergence area is now under the extruded cylindrical volume that represents the fan motor casing, which is located under the fan blades volume in the attached picture (flow is upward, so air flows around the motor casing, then enters the fan blade zone). So, for now, the divergence is roughly in the frontal area of the cylinder that the air flows around. The divergence is not in the fan blade area.
If you have the fan model doc, you could add it to the download page. I saw it once, but it looks like I didn't save it.
Regarding the turbulence model: actually, RSM is not needed in such cases (k-epsilon is enough; SST is better but runs into the mesh size limit and meshing complexity). The purpose of RSM here is to try the most reliable model for the bulk flow (although not for wall separation / reattachment) to confirm the k-epsilon results. I tried CFX with RSM on cyclones, but it produces lots of non-realistic vortices, and it's also unstable with RSM (the problem with RSM is its low stability).
OK, I will try to run the calculation with velocity/pressure limiting and relaxation starting from a state without divergence. Maybe it will make it to a stable flow if there is no divergence initially. In the previous tests there was local divergence initially, so the vanishing of this local problem would have meant that the stability tips were very effective. But they didn't help under those more complex conditions, so I will now try with a simpler initial approximation.
Anyway, thanks for your support.