kotlincsvdata-sciencekotlin-notebookkotlin-dataframe

With kotlin-notebook, how do I use DataFrame.readCsv to parse a column of dates formatted mm/dd/yyyy?


The latest release (0.15.0) of the Dataframe library for Kotlin Notebook uses the method readCsv (note: lowercase sv), and in the following code, I'm using it to read in a CSV of my Discover card financials, which has dates in the Trans. Date column that look like 12/31/2024, but it is throwing the following CellConversionException parsing this format:

org.jetbrains.kotlinx.dataframe.exceptions.CellConversionException: Failed to convert '12/31/2024' from kotlin.String to kotlinx.datetime.LocalDate in column 'Trans. Date' in column [Trans. Date], row 0
at Cell In[6], line 17

Here is the Kotlin notebook cell:

// import kotlin dataframe library
%use dataframe(v=0.15.0, enableExperimentalCsv=true)

import kotlinx.datetime.format.DateTimeFormat
import kotlinx.datetime.format.char
import kotlinx.datetime.format.Padding.ZERO as ZEROpad

// seems like I need to define the DateTimeFormat,
// but I don't know how to pass it to the CSV parser.
val dateFormat : DateTimeFormat<LocalDate> = LocalDate.Format {
  monthNumber(ZEROpad)
  char('/')
  dayOfMonth(ZEROpad)
  char('/')
  year(ZEROpad)
}

// read the csv file into a dataframe
var discover = DataFrame.readCsv("2024 discover.csv",
  colTypes = mapOf(
    // how do I get this conversion to work for given mm/dd/yyyy format?
    "Trans. Date" to ColType.LocalDate
  )
)

discover.getRows(0..4)

Note that the dateFormat val is unused! I don't know what to do with it! How do I pass this format (assuming it is correct) to the parser to get a column of machine-readable dates?


Solution

  • @Adrian Klaver is correct. DataFrame.readCsv() works mostly the same as DataFrame.readCSV(). This also applies to their parserOptions argument.

    To read a CSV file with a custom date-time pattern, you can use:

    DataFrame.readCsv("2024 discover.csv",
        // optional, should not be required if the specified column can successfully be parsed
        colTypes = mapOf("Trans. Date" to ColType.LocalDate),
        
        // proving custom ParserOptions
        parserOptions = ParserOptions(dateTimePattern = "MM/dd/yyyy"),
    )
    

    At the moment it's not (yet) possible to use the kotlinx-datetime DateTimeFormat with ParserOptions. DataFrame still uses the java.time DateTimeFormatter, like:

    val dateTimeFormatter = DateTimeFormatter.ofPattern("MM/dd/yyyy")
    DataFrame.readCsv("2024 discover.csv",
        ...,
        
        // proving custom ParserOptions
        parserOptions = ParserOptions(dateTimeFormatter = dateTimeFormatter),
    )
    

    If you do require kotlinx-datetime DateTimeFormat, you can parse the column as String and manually parse the column:

    val dateFormat: DateTimeFormat<LocalDate> = ...
    DataFrame.readCsv("2024 discover.csv",
        // force the csv parser to read the column as String
        colTypes = mapOf("Trans. Date" to ColType.String),
    )
        // manually convert the column to LocalDate with kotlinx-datetime DateTimeFormat.parse
        .convert { "Trans. Date"<String>() }.with { dateFormat.parse(it) }