I want to broaden my horizons in terms of programming languages and am therefore trying to build a little helper app in Rust for managing my .bib file.
Now I'm stuck on a problem for which I couldn't find a solution so far.
I wrote a module that reads my .bib file, parses it, and processes the entries into a vector that contains one inner vector per bibliographic entry, holding the needed fields. The output should look like this (using {:#?} with println):
[
    [
        "Grandsire",
        "The METAFONTtutorial",
        "2004",
        "online",
        "grandsire_the_metafonttutorial_2004",
    ],
    [
        "Gruber",
        "Daring Fireball",
        "2001",
        "online",
        "gruber_markdown",
    ],
    [
        "Schmuli (ed.)",
        "How TeX macros actually work",
        "emtpy",
        "online",
        "how_tex_macros_actually_work",
    ],
    [
        "Skibinski",
        "Automated JATS XML to PDF conversion",
        "2018",
        "online",
        "skibinski_automated_jats_xml_to_pdf_conversion_2018",
    ],
],
My code is able to produce that output, but with a test file containing around 500 separate (and simple) biblatex entries it takes a huge amount of time!
When I run the binary with
$ time bibfilebin /path/to/testbib.bib
it takes about 10 seconds to process the whole thing!
That seems waaay too long for a language like Rust, known for its speed, processing a simple plain text file.
I'm sure the mistake is due to my very limited knowledge of Rust in particular and of real programming languages in general, since I'm not a trained programmer.
It may be due to the many loops/iterators in the code or a programming mistake which reads the file over and over again for every entry. But, so far, I couldn't find the source of it.
My code is below; it uses the additional crates biblatex and sarge:
use bib::BibiData;
use cliargs::*;

pub mod cliargs {
    use core::panic;
    use std::path::PathBuf;

    use sarge::prelude::*;

    sarge! {
        // Name of the struct
        ArgumentsCLI,
    }

    pub struct PosArgs {
        pub bibfilearg: PathBuf,
    }

    impl PosArgs {
        pub fn parse_pos_args() -> Self {
            let (_, pos_args) =
                ArgumentsCLI::parse().expect("Could not parse positional arguments");
            Self {
                bibfilearg: if pos_args.len() > 1 {
                    PathBuf::from(&pos_args[1])
                } else {
                    panic!("No path to bibfile provided as argument")
                },
            }
        }
    }
}

pub mod bib {
    use super::PosArgs;
    use std::{fs, path::PathBuf};

    use biblatex::{self, Bibliography};
    use biblatex::{ChunksExt, Type};

    pub struct BibiMain {
        pub bibfile: PathBuf, // path to bibfile
        pub bibfilestring: String, // content of bibfile as string
        pub bibliography: Bibliography, // parsed bibliography
        pub citekeys: Vec<String>, // list of all citekeys
    }

    impl BibiMain {
        pub fn new() -> Self {
            let bibfile = PosArgs::parse_pos_args().bibfilearg;
            let bibfilestring = fs::read_to_string(&bibfile).unwrap();
            let bibliography = biblatex::Bibliography::parse(&bibfilestring).unwrap();
            let citekeys = Self::get_citekeys(&bibliography);
            Self {
                bibfile,
                bibfilestring,
                bibliography,
                citekeys,
            }
        }

        pub fn get_citekeys(bibstring: &Bibliography) -> Vec<String> {
            let mut citekeys: Vec<String> =
                bibstring.iter().map(|entry| entry.to_owned().key).collect();
            citekeys.sort_by_key(|name| name.to_lowercase());
            citekeys
        }
    }

    #[derive(Debug)]
    pub struct BibiData {
        pub entry_list: BibiDataSets,
    }

    impl BibiData {
        pub fn new() -> Self {
            let citekeys = BibiMain::new().citekeys;
            Self {
                entry_list: BibiDataSets::from_iter(citekeys),
            }
        }
    }

    #[derive(Debug)]
    pub struct BibiDataSets {
        pub bibentries: Vec<Vec<String>>,
    }

    impl FromIterator<String> for BibiDataSets {
        fn from_iter<T: IntoIterator<Item = String>>(iter: T) -> Self {
            let bibentries = iter
                .into_iter()
                .map(|citekey| BibiEntry::new(&citekey))
                .collect();
            Self { bibentries }
        }
    }

    #[derive(Debug)]
    pub struct BibiEntry {
        pub authors: String,
        pub title: String,
        pub year: String,
        pub pubtype: String,
        pub citekey: String,
    }

    impl BibiEntry {
        pub fn new(citekey: &str) -> Vec<String> {
            vec![
                Self::get_authors(citekey),
                Self::get_title(citekey),
                Self::get_year(citekey),
                Self::get_pubtype(citekey),
                citekey.to_string(),
            ]
        }

        fn get_authors(citekey: &str) -> String {
            let biblio = BibiMain::new().bibliography;
            let authors = {
                if biblio.get(&citekey).unwrap().author().is_ok() {
                    let authors = biblio.get(&citekey).unwrap().author().unwrap();
                    if authors.len() > 1 {
                        let authors = format!("{} et al.", authors[0].name);
                        authors
                    } else if authors.len() == 1 {
                        let authors = authors[0].name.to_string();
                        authors
                    } else {
                        let editors_authors = format!("empty");
                        editors_authors
                    }
                } else {
                    if biblio.get(&citekey).unwrap().editors().is_ok() {
                        let editors = biblio.get(&citekey).unwrap().editors().unwrap();
                        if editors.len() > 1 {
                            let editors = format!("{} (ed.) et al.", editors[0].0[0].name);
                            editors
                        } else if editors.len() == 1 {
                            let editors = format!("{} (ed.)", editors[0].0[0].name);
                            editors
                        } else {
                            let editors_authors = format!("empty");
                            editors_authors
                        }
                    } else {
                        let editors_authors = format!("empty");
                        editors_authors
                    }
                }
            };
            authors
        }

        fn get_title(citekey: &str) -> String {
            let biblio = BibiMain::new().bibliography;
            let title = {
                if biblio.get(&citekey).unwrap().title().is_ok() {
                    let title = biblio
                        .get(&citekey)
                        .unwrap()
                        .title()
                        .unwrap()
                        .format_verbatim();
                    title
                } else {
                    let title = format!("no title");
                    title
                }
            };
            title
        }

        fn get_year(citekey: &str) -> String {
            let biblio = BibiMain::new().bibliography;
            let year = biblio.get(&citekey).unwrap();
            let year = {
                if year.date().is_ok() {
                    let year = year.date().unwrap().to_chunks().format_verbatim();
                    let year = year[..4].to_string();
                    year
                } else {
                    let year = format!("emtpy");
                    year
                }
            };
            year
        }

        fn get_pubtype(citekey: &str) -> String {
            let biblio = BibiMain::new().bibliography;
            let pubtype = biblio.get(&citekey).unwrap().entry_type.to_string();
            pubtype
        }
    }
}

fn main() {
    let entry_vec = BibiData::new().entry_list;
    println!("{:#?}", entry_vec);
}
I'm aware that there might be some real beginner's mistakes.
Therefore, I appreciate every kind of help or just a hint. First of all, to solve the problem, but also to help me learn the concepts and ways of coding in Rust.
As suggested, I'll provide a (for the moment, partial) answer:
In all get_... functions I replaced the direct call of BibiMain::new() with a parameter biblio: &Bibliography, as @MindSwipe explained.
E.g. get_title now looks like this (the old variable is commented out):
fn get_title(citekey: &str, biblio: &Bibliography) -> String {
    // let biblio: &Bibliography = &BibiMain::new().bibliography;
    let title = {
        if biblio.get(&citekey).unwrap().title().is_ok() {
            let title = biblio
                .get(&citekey)
                .unwrap()
                .title()
                .unwrap()
                .format_verbatim();
            title
        } else {
            let title = format!("no title");
            title
        }
    };
    title
}
It solves most issues and makes the startup at runtime much faster!
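To illustrate why this helps: in the original code every get_... function called BibiMain::new(), so with four getters per entry and about 500 entries the file was read and parsed roughly 2000 times. Below is a minimal sketch of the difference that uses only the standard library. Biblio, parse_bib, title_reparsing, title_borrowing and compare_parse_counts are hypothetical stand-ins for illustration, not part of biblatex, and the "key=title" input format is made up to keep the example small.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the parsed bibliography: citekey -> title.
type Biblio = HashMap<String, String>;

// Stand-in for "read and parse the .bib file"; one `key=title` pair per line.
// `parse_calls` counts how often this expensive step runs.
fn parse_bib(src: &str, parse_calls: &mut usize) -> Biblio {
    *parse_calls += 1;
    src.lines()
        .filter_map(|line| line.split_once('='))
        .map(|(key, title)| (key.trim().to_string(), title.trim().to_string()))
        .collect()
}

// New pattern: borrow a bibliography that the caller parsed once.
fn title_borrowing(biblio: &Biblio, citekey: &str) -> String {
    biblio
        .get(citekey)
        .cloned()
        .unwrap_or_else(|| "no title".to_string())
}

// Old pattern: every lookup parses the whole input again.
fn title_reparsing(src: &str, citekey: &str, parse_calls: &mut usize) -> String {
    let biblio = parse_bib(src, parse_calls);
    title_borrowing(&biblio, citekey)
}

// Returns (parses needed by the old pattern, parses needed by the new pattern).
fn compare_parse_counts(src: &str, citekeys: &[&str]) -> (usize, usize) {
    let mut old_calls = 0;
    for key in citekeys {
        let _ = title_reparsing(src, key, &mut old_calls);
    }

    let mut new_calls = 0;
    let biblio = parse_bib(src, &mut new_calls);
    for key in citekeys {
        let _ = title_borrowing(&biblio, key);
    }

    (old_calls, new_calls)
}
```

With the old pattern the number of parses grows with the number of lookups, while the borrowing version parses once no matter how many entries are processed.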
Furthermore, I merged the FromIterator implementation into the BibiData struct and passed the needed fields as parameters:
impl BibiData {
    pub fn new(biblio: &Bibliography, citekeys: &Vec<String>) -> Self {
        Self {
            entry_list: {
                let bibentries = citekeys
                    .into_iter()
                    .map(|citekey| BibiEntry::new(&citekey, &biblio))
                    .collect();
                BibiDataSets { bibentries }
            },
        }
    }
}
Everything works fine now and is instantly ready.
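As a side note on style, the is_ok() / unwrap() pairs in the getters could probably be collapsed with the combinators on Result. The following is only a sketch under assumptions: title_or_default and year_or_default are hypothetical helpers, and they take a plain Result<String, String> instead of the actual biblatex return types, which are not reproduced here.

```rust
// Hypothetical helper: use the title if present, else a fallback string.
fn title_or_default(title: Result<String, String>) -> String {
    title.unwrap_or_else(|_| "no title".to_string())
}

// Hypothetical helper: keep at most the first four characters of the date.
// Unlike slicing with `[..4]`, this does not panic on shorter strings.
fn year_or_default(date: Result<String, String>) -> String {
    date.map(|d| d.chars().take(4).collect::<String>())
        .unwrap_or_else(|_| "empty".to_string())
}
```

The same idea should carry over to the real getters: look the entry up once, then map the Ok case and supply the fallback for the Err case, instead of checking is_ok() and calling unwrap() afterwards.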
Thanks to everybody!