I am implementing the Runge-Kutta-Fehlberg 5(4) method in Rust, so I have to keep track of the current state of the system I am trying to model, perform several computations based on the current state and as a result of those computations, update the state.
In the process, I ran into ownership issues, because the nalgebra
type I am using does not implement copy
. The compiler suggested I clone
the data instead.
I ended up with four .clone()
statements within a loop and am wondering about the performance impact on the one hand, and possible alternatives on the other.
You can find the relevant function below.
pub fn rkf54(
initial_time: f64,
initial_state: State<f64>,
initial_step_size: f64,
control: bool,
tolerance: f64,
final_time: f64,
f: &dyn Fn(f64, &State<f64>) -> State<f64>)
-> Vec<State<f64>> {
let mut output: Vec<State<f64>> = vec![];
let mut time = initial_time;
let mut state = initial_state;
let mut step_size = initial_step_size;
while time < final_time {
if control {
// CLONE 1
step_size = rkf_control(time, state.clone(), initial_step_size, tolerance, f);
}
// CLONE 2
let stages = rkf_stages(time, &state.clone(), step_size, A, C, f);
// CLONE 3
state = rkf_step(&state.clone(), step_size, &stages, B_PRIMARY);
time += step_size;
// CLONE 4
output.push(state.clone())
}
let stages = rkf_stages(final_time, &state.clone(), step_size, A, C, f);
state = rkf_step(&state.clone(), step_size, &stages, B_PRIMARY);
output.push(state.clone());
return output;
}
Will repeated cloning noticeably impact performance? If I understand correctly, this copies the same data from and to different areas of memory 4 times per iteration.
If so, how can I avoid this kind of cloning/allocation?
Here is a full working example along with type and constant definitions.
#![allow(dead_code)]
extern crate nalgebra; // 0.33.2
use nalgebra::DVector; // 0.33.2
use std::f64::INFINITY;
type Stages<T, const D: usize> = [DVector<T>; D];
pub type State<T> = DVector<T>;
pub type StageCoefficients<T, const D: usize> = [&'static [T]; D];
pub type Weights<T, const D: usize> = [T; D];
pub const A: StageCoefficients<f64, 6> = [
&[],
&[1./4.],
&[3./32., 9./32.],
&[1932./2197., -7200./2197., 7296./2197.],
&[439./216., 8., 3680./513., -845./4104.],
&[-8./27., 2., -3544./2565., 1859./4104., -11./40.]
];
pub const C: Weights<f64, 6> = [0.0, 1./4., 3./8., 12./13., 1., 1./2. ];
pub const B_PRIMARY: Weights<f64, 6> = [16./135., 0., 6656./12825., 28561./56430., -9./50., 2./55. ];
pub const B_EMBEDDED: Weights<f64, 6> = [25./216., 0., 1408./2565., 2197./4104., -1./5., 0. ];
const K: f64 = 0.02;
const B0: f64 = 3.5;
const B1: f64 = 16.72;
pub fn f(t:f64, state: &State<f64>) -> State<f64> {
let x = state[0];
let y = state [1];
let x_dot = y;
let y_dot = (-K*y) - (x.powf(3.)) + (B0) + (B1 * t.cos());
return State::from(vec![x_dot, y_dot]);
}
pub fn rkf54(
initial_time: f64,
initial_state: State<f64>,
initial_step_size: f64,
control: bool,
tolerance: f64,
final_time: f64,
f: &dyn Fn(f64, &State<f64>) -> State<f64>)
-> Vec<State<f64>> {
let mut output: Vec<State<f64>> = vec![];
let mut time = initial_time;
let mut state = initial_state;
let mut step_size = initial_step_size;
while time < final_time {
if control {
// CLONE 1
step_size = rkf_control(time, state.clone(), initial_step_size, tolerance, f);
}
// CLONE 2
let stages = rkf_stages(time, &state.clone(), step_size, A, C, f);
// CLONE 3
state = rkf_step(&state.clone(), step_size, &stages, B_PRIMARY);
time += step_size;
// CLONE 4
output.push(state.clone())
}
let stages = rkf_stages(final_time, &state.clone(), step_size, A, C, f);
state = rkf_step(&state.clone(), step_size, &stages, B_PRIMARY);
output.push(state.clone());
return output;
}
fn rkf_stages<const D: usize>(
time: f64,
state: &State<f64>,
step_size: f64,
stage_coefficients: StageCoefficients<f64, D>,
time_fractions: Weights<f64, D>,
f: &dyn Fn(f64, &State<f64>) -> State<f64>
) -> Stages<f64, D> {
let mut k: Stages<f64, D> = std::array::from_fn(|_| DVector::zeros(state.nrows()));
for i in 0..stage_coefficients.len() {
let mut x: State<f64> = DVector::zeros(state.nrows());
for j in 0..stage_coefficients[i].len() {
x += &k[j] * stage_coefficients[i][j];
}
x = state + step_size*(x);
let t = time + (step_size*time_fractions[i]);
k[i] = f(t,&x);
}
return k;
}
fn rkf_step<const D: usize>(state: &State<f64>, step_size: f64, stages: &Stages<f64, D>, weights: Weights<f64, D>) -> State<f64> {
let mut x_out = 0.0 * DVector::zeros(state.nrows());
for i in 0..A.len() {
x_out += &stages[i] * weights[i];
}
x_out *= step_size;
x_out += state;
return x_out;
}
fn rkf_control(time: f64, state: State<f64>, initial_step_size: f64, tolerance: f64,
f: &dyn Fn(f64, &State<f64>) -> State<f64>) -> f64 {
let mut error_estimate = INFINITY;
let mut step_size = initial_step_size;
while error_estimate > tolerance {
let k = rkf_stages(time, &state, step_size, A, C, f);
let x_primary = rkf_step(&state, step_size, &k, B_PRIMARY);
let x_embedded = rkf_step(&state, step_size, &k, B_EMBEDDED);
error_estimate = (x_primary - x_embedded).norm();
step_size = 0.9 * step_size * (tolerance / error_estimate).powf(1. / 5.);
}
return step_size;
}
fn main() {
let initial_time = 0.;
let initial_state = State::from(vec![2., 2.]);
let initial_step_size = 1.;
let control = true;
let tolerance = 10e-3;
let final_time = 1.;
let output = rkf54(initial_time, initial_state, initial_step_size, control, tolerance, final_time, &f);
for state in output {
println!("{state:1.3e}");
}
}
I think you can avoid all of the clones. But I can't test it because you didn't provide definition for your custom types and constants.
In all cases where you do &state.clone()
you can just do &state
.
You can modify the rkf_step
to take &State<f64>
as argument and pass the state to it again just with &state
instead of state.clone()
.
In case of the last clone, you can modify the loop like so:
pub fn rkf54(
initial_time: f64,
initial_state: State<f64>,
initial_step_size: f64,
control: bool,
tolerance: f64,
final_time: f64,
f: &dyn Fn(f64, &State<f64>) -> State<f64>)
-> Vec<State<f64>> {
let A: StageCoefficients<f64, 1> = [Default::default()];
let mut output: Vec<State<f64>> = vec![];
let mut time = initial_time;
let mut state = initial_state;
let mut step_size = initial_step_size;
while time < final_time {
if control {
// Just use &state here
step_size = rkf_control(time, &state, initial_step_size, tolerance, f);
}
// Again just &state
let stages = rkf_stages(time, &state, step_size, A, C, f);
// Modified to take &state.
let new_state = rkf_step(&state, step_size, &stages, B_PRIMARY);
output.push(state); // Push the state from previous iteration.
state = new_state; // Overwrite the old state to the new state.
time += step_size;
}
let stages = rkf_stages(final_time, &state, step_size, A, C, f);
let new_state = rkf_step(&state, step_size, &stages, B_PRIMARY);
output.push(state); // Push state from the last iteration.
output.push(new_state); // Here you will need to clone only if you plan to use new_state later
return output;
}
Note that in this new version, the initial state is also stored in the output. That is not the case in your original code. If you want to avoid that, you can just check for first iteration and don't do output.push(state)
in the first iteration.
EDIT:
To answer the other questions: Cloning is usually expensive operation. Sometimes it is necesary, but it is good to avoid it when you can.
If you have some data type (DataType
) that you don't need to modify, it is expensive to clone and it is not clear where it should be owned, you can use Rc<DataType>
(or Arc<DataType>
if it is shared across threads). That will create reference counted pointer. You will still have to use .clone()
but this clone is in this case quite cheap compared to how expensive the clone of DataType
may be.