The previous attempt (dsl_feature_completeness_2019-08-23) to enable arbitrary kernel functions was a failure: we get a significant performance loss (25-100%) if step_number is not passed as a template parameter to the integration kernel. Apparently the CUDA compiler cannot perform some optimizations if there is an if/else construct in a performance-critical part that cannot be evaluated at compile time. This branch keeps step_number as a template parameter but takes the rest of the user parameters as uniforms (dt is no longer passed as a function parameter but is instead declared as a uniform in the DSL).
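The trade-off described above can be sketched in plain host-side C++ (not the DSL or the generated CUDA code). Everything here is hypothetical and for illustration only: the `Uniforms` struct stands in for DSL uniforms such as AC_dt, `integrate_substep` is a toy update rule rather than the real rk3, and the coefficients are placeholders. The point is only that a branch on a template parameter is a compile-time constant, while the same branch on a function argument is not.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for DSL uniforms: set once by the host,
// read by every kernel invocation (dt plays the role of AC_dt).
struct Uniforms {
    double dt;
};

// Placeholder coefficients, NOT the integrator's real values.
constexpr double kAlpha[3] = {0.0, -0.5, -1.0};
constexpr double kBeta[3]  = {0.25, 0.5, 1.0};

// step_number is a template parameter: the condition below is a
// compile-time constant, so the compiler can discard the untaken branch.
template <int step_number>
double integrate_substep(double current, double previous, double rate,
                         const Uniforms& u)
{
    if (step_number == 0) {
        return current + kBeta[0] * rate * u.dt;
    } else {
        return current + kBeta[step_number] *
               (kAlpha[step_number] * (current - previous) + rate * u.dt);
    }
}

// Same arithmetic, but step_number is only known at run time, so the
// branch has to stay in the generated code.
double integrate_substep_runtime(int step_number, double current,
                                 double previous, double rate,
                                 const Uniforms& u)
{
    if (step_number == 0) {
        return current + kBeta[0] * rate * u.dt;
    } else {
        return current + kBeta[step_number] *
               (kAlpha[step_number] * (current - previous) + rate * u.dt);
    }
}

// The host maps the run-time substep index to one of three instantiations.
double dispatch_substep(int step_number, double current, double previous,
                        double rate, const Uniforms& u)
{
    switch (step_number) {
    case 0:  return integrate_substep<0>(current, previous, rate, u);
    case 1:  return integrate_substep<1>(current, previous, rate, u);
    default: return integrate_substep<2>(current, previous, rate, u);
    }
}
```

Under these assumptions, the host would fill `Uniforms` once per timestep and call `dispatch_substep(s, ...)` for s = 0, 1, 2. Only the substep index needs a separate instantiation, while dt and other user parameters travel through the uniforms.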
@@ -15,6 +15,7 @@ uniform int AC_bin_steps;
 uniform int AC_bc_type;
 
 // Real params
+uniform Scalar AC_dt;
 // Spacing
 uniform Scalar AC_dsx;
 uniform Scalar AC_dsy;
@@ -294,8 +294,9 @@ out ScalarField out_tt(VTXBUF_TEMPERATURE);
 #endif
 
 Kernel void
-solve(Scalar dt)
+solve()
 {
+    Scalar dt = AC_dt;
     out_lnrho = rk3(out_lnrho, lnrho, continuity(uu, lnrho), dt);
 
 #if LMAGNETIC