I am writing a CLI application to "restore" deleted and overwritten object versions in S3 using the AWS SDK, as my first "real" Rust project.
One part of this is allowing the user to pass in a start and end date between which file changes should be undone. To that end, I've written this function to parse a UTC date-time (using the chrono crate) from the user's input:
use chrono::{DateTime, NaiveDate, Utc};
use regex::Regex;

fn create_option_datetime_from_string(input: String) -> Option<DateTime<Utc>> {
    let date_regex = Regex::new(r"^\d{4}-\d{2}-\d{2}$").unwrap();
    if !date_regex.is_match(&input) {
        return None;
    }
    // "%Y-%m-%d" has no time fields, so parse a NaiveDate;
    // NaiveDateTime::parse_from_str rejects date-only input.
    let naive_date = NaiveDate::parse_from_str(&input, "%Y-%m-%d").ok()?;
    // Interpret the date as midnight UTC.
    let naive_datetime = naive_date.and_hms_opt(0, 0, 0)?;
    Some(DateTime::<Utc>::from_utc(naive_datetime, Utc))
}
I also have a function to fetch object version data from S3 that looks something like this:
struct ObjectVersionsFetchResponse {
    next_key_marker: Option<String>,
    versions: Vec<ObjectVersion>,
}

async fn fetch_object_versions_from_s3<'a>(
    client: &'a s3::Client,
    bucket: &'a str,
    limit: Option<i32>,
    prefix: &'a String,
    key_marker: Option<String>,
) -> Result<ObjectVersionsFetchResponse, SdkError<ListObjectVersionsError>> {
    let resp = client
        .list_object_versions()
        .bucket(bucket)
        .set_max_keys(limit)
        .prefix(prefix)
        .set_key_marker(key_marker)
        .send()
        .await?;
    /* If this is Some, there are more objects to be fetched so we need to do another request */
    let next_key_marker: Option<&str> = resp.next_key_marker();
    /* Technically we don't need a vector as the response stays at a fixed length, maybe TODO fix */
    let versions: Vec<ObjectVersion> = resp.versions().unwrap_or_default().to_vec();
    /* We return the data including the version marker to the calling method to allow for after-fetching */
    Ok(ObjectVersionsFetchResponse {
        next_key_marker: next_key_marker.map(|s| s.to_string()),
        versions,
    })
}
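The "another request while a marker comes back" idea from the comments above can be sketched without the SDK. This is only a minimal illustration: `Page`, `fetch_all` and `fake_fetch` are made-up names, and the fake backend stands in for the real S3 call.

```rust
/// Hypothetical stand-in for one page of a listing response.
struct Page {
    items: Vec<String>,
    next_key_marker: Option<String>,
}

/// Keeps requesting pages until no marker is returned, collecting all items.
/// `fetch_page` is a placeholder for the real request function.
fn fetch_all<F>(mut fetch_page: F) -> Vec<String>
where
    F: FnMut(Option<String>) -> Page,
{
    let mut all_items = Vec::new();
    let mut key_marker: Option<String> = None;
    loop {
        let page = fetch_page(key_marker.take());
        all_items.extend(page.items);
        match page.next_key_marker {
            // A marker means more results are waiting behind it.
            Some(marker) => key_marker = Some(marker),
            // No marker: this was the last page.
            None => break,
        }
    }
    all_items
}

/// Fake two-page backend, used only to exercise the loop.
fn fake_fetch(key_marker: Option<String>) -> Page {
    match key_marker {
        None => Page {
            items: vec!["a".to_string(), "b".to_string()],
            next_key_marker: Some("b".to_string()),
        },
        Some(_) => Page {
            items: vec!["c".to_string()],
            next_key_marker: None,
        },
    }
}
```

Calling `fetch_all(fake_fetch)` walks both fake pages and returns their items in order.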
Together, these functions are expected to give me two things: the dates parsed from the user's input and the object versions fetched from S3.
Now I'm trying to write a function that grabs only the object versions that fit the timeframe.
To achieve this, I've made a struct to hold the parsed timeframe:
#[derive(Debug)]
struct Timeframe {
    start: Option<DateTime<Utc>>,
    end: DateTime<Utc>,
}
And this function, which is supposed to compare the DateTimes in the ObjectVersions with the Timeframe parsed from user input:
fn filter_object_versions(object_versions: &Vec<ObjectVersion>, timeframe: Timeframe) -> Vec<&ObjectVersion> {
    println!("object_versions: {:?}", object_versions.len());
    let filtered_object_versions: Vec<_> = object_versions
        .into_par_iter()
        .filter(|object_version| {
            let is_latest = object_version.is_latest;
            let last_modified = object_version.last_modified.as_ref().unwrap_or(&Utc::now());
            let is_after_start = last_modified > timeframe.start.unwrap_or_else(|| Utc::now());
            let is_before_end = last_modified < timeframe.end;
            return is_after_start && is_before_end && is_latest;
        })
        .collect();
    println!("filtered_object_versions: {:?}", filtered_object_versions.len());
    return filtered_object_versions;
}
Sadly, this comparison does not work as expected.
Since my Timeframe fields are chrono DateTime<Utc> values, they can't be compared directly to the "Smithy DateTime" that the S3 SDK appears to use.
I'm now looking for advice on how best to do this comparison.
Amazon made a crate for these types of conversions. It's called aws-smithy-types-convert.
Just add it to your Cargo.toml like so:
[dependencies]
aws-smithy-types-convert = { version = "0.56.1", features = ["convert-chrono"] }
Then you can turn the smithy DateTime into a chrono DateTime before comparing.
use aws_smithy_types_convert::date_time::DateTimeExt;
//...
.filter(|object_version| {
    let is_latest = object_version.is_latest;
    let last_modified = object_version.last_modified
        .map(|t| t.to_chrono_utc())
        .unwrap_or(Utc::now());
    let is_after_start = last_modified > timeframe.start.unwrap_or_else(|| Utc::now());
    let is_before_end = last_modified < timeframe.end;
    return is_after_start && is_before_end && is_latest;
})
That being said, the code that converts to a "UTC" chrono datetime really just copies over the seconds and subsecond nanoseconds without any timezone handling. So either S3 uses UTC for everything, or it just spits out a timestamp that says it is UTC but isn't. You get to find out.
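To make the "just copies the numbers over" point concrete, here is a std-only sketch. It assumes the Smithy DateTime is an offset from the Unix epoch (whole seconds plus a subsecond part), which is how I read its documentation. `instant_from_epoch` is a made-up helper, not part of any AWS crate.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Turns an epoch-based (seconds, subsecond nanoseconds) pair into a SystemTime.
/// No time zone is involved: the pair already identifies a single instant.
fn instant_from_epoch(secs: i64, subsec_nanos: u32) -> SystemTime {
    let nanos = Duration::new(0, subsec_nanos);
    if secs >= 0 {
        UNIX_EPOCH + Duration::from_secs(secs as u64) + nanos
    } else {
        // Negative seconds count backwards from the epoch;
        // the nanoseconds still count forwards from that second.
        UNIX_EPOCH - Duration::from_secs(secs.unsigned_abs()) + nanos
    }
}
```

If that assumption holds, two such values can be ordered directly, which is all the filter needs.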
Also, your unwrapping doesn't seem right to me. I would filter out all objects that don't have a timestamp instead of creating a fake Utc::now() timestamp for them. And finally, I would return an impl Iterator instead of a Vec. It reduces allocation (significantly if it's a big list!) in case the API consumer uses it as an iterator or wants it as some other type of collection.
// Note: a rayon parallel iterator is not an Iterator, so this uses a plain .iter().
fn filter_object_versions<'a>(
    object_versions: &'a [ObjectVersion],
    timeframe: Timeframe,
) -> impl Iterator<Item = &'a ObjectVersion> + 'a {
    object_versions
        .iter()
        .filter(|version| version.is_latest)
        .filter(move |version| {
            version
                .last_modified
                .map(|last_modified| last_modified.to_chrono_utc())
                // removes all versions whose timestamp is None
                .map_or(false, |last_modified| {
                    let is_after_start = last_modified > timeframe.start.unwrap_or_else(Utc::now);
                    let is_before_end = last_modified < timeframe.end;
                    is_after_start && is_before_end
                })
        })
}
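If you want to try the filtering logic without the SDK or chrono, the same shape can be sketched with plain epoch seconds. Everything here is a simplified stand-in: `Version`, `EpochTimeframe` and `filter_versions` are made-up names. One deliberate difference, which is my assumption about the intent: a missing start is treated as "no lower bound", because falling back to Utc::now() would only ever match timestamps in the future.

```rust
/// Hypothetical, simplified stand-in for the SDK's ObjectVersion.
struct Version {
    key: String,
    is_latest: bool,
    /// Epoch seconds; None when no timestamp was reported.
    last_modified: Option<i64>,
}

/// Simplified timeframe using epoch seconds instead of chrono types.
struct EpochTimeframe {
    start: Option<i64>,
    end: i64,
}

/// Lazily yields the latest versions whose timestamp lies strictly inside the timeframe.
/// Assumption: a missing `start` means "no lower bound".
fn filter_versions<'a>(
    versions: &'a [Version],
    timeframe: &'a EpochTimeframe,
) -> impl Iterator<Item = &'a Version> + 'a {
    versions
        .iter()
        .filter(|v| v.is_latest)
        .filter(move |v| match v.last_modified {
            Some(t) => timeframe.start.map_or(true, |s| t > s) && t < timeframe.end,
            // Versions without a timestamp are dropped rather than given a fake one.
            None => false,
        })
}
```

Nothing is allocated until the caller collects, so `filter_versions(&versions, &timeframe).count()` or a `for` loop works without building an intermediate Vec.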