I want to convert strings in the following formats to a java.util.Date range (start and end date):
January 16th – 21st, 2025
January 16th – February 21st, 2025
January 16th – February 21st, 2025
December 16th 2024 – January 2nd 2025
November 16th – 19th
There are examples with different months and different years, and also examples without a year, which should use the current year.
Do you have any idea how I should process all these formats?
It appears your format is basically this:
month day[[,] year] – [month] day[[,] year]
I would probably go with a regular expression:
import java.time.LocalDate;
import java.time.Month;
import java.time.Year;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class DateRangeParser {
    public record DateRange(LocalDate start,
                            LocalDate end) {
    }

    private final Pattern format = Pattern.compile(
        "(\\p{L}+)"                         // start month
        + "\\s*(\\d+)(?i:st|nd|rd|th)?"     // start day
        + "(?:,?\\s*(\\d+))?"               // start year (optional)
        + "\\s*\\p{Pd}"                     // dash separating start and end
        + "\\s*(\\p{L}+)?"                  // end month (optional)
        + "\\s*(\\d+)(?i:st|nd|rd|th)?"     // end day
        + "(?:,?\\s*(\\d+))?");             // end year (optional)

    private final NumberFormat numberFormat =
        NumberFormat.getIntegerInstance();

    private final DateTimeFormatter monthFormat =
        DateTimeFormatter.ofPattern("MMMM").withLocale(Locale.ENGLISH);

    public DateRange parse(String s)
    throws ParseException {
        Matcher matcher = format.matcher(s);
        if (!matcher.matches()) {
            throw new ParseException(s, 0);
        }

        String startMonthStr = matcher.group(1);
        String startDayStr = matcher.group(2);
        String startYearStr = matcher.group(3);
        String endMonthStr = matcher.group(4);
        String endDayStr = matcher.group(5);
        String endYearStr = matcher.group(6);

        try {
            Month startMonth = Month.from(monthFormat.parse(startMonthStr));
            int startDay = numberFormat.parse(startDayStr).intValue();
            Integer startYear;
            if (startYearStr != null) {
                startYear = numberFormat.parse(startYearStr).intValue();
            } else {
                startYear = null;
            }

            Month endMonth;
            if (endMonthStr != null) {
                endMonth = Month.from(monthFormat.parse(endMonthStr));
            } else {
                endMonth = startMonth;
            }
            int endDay = numberFormat.parse(endDayStr).intValue();
            Integer endYear;
            if (endYearStr != null) {
                endYear = numberFormat.parse(endYearStr).intValue();
            } else {
                endYear = null;
            }

            // A missing year falls back to the other year, or to the current year.
            if (startYear == null) {
                if (endYear == null) {
                    endYear = Year.now().getValue();
                }
                startYear = endYear;
            } else if (endYear == null) {
                endYear = startYear;
            }

            LocalDate start = LocalDate.of(startYear, startMonth, startDay);
            LocalDate end = LocalDate.of(endYear, endMonth, endDay);
            return new DateRange(start, end);
        } catch (DateTimeParseException e) {
            ParseException pe = new ParseException(s, e.getErrorIndex());
            pe.initCause(e);
            throw pe;
        }
    }

    public static void main(String[] args)
    throws ParseException {
        String[] testInputs = {
            "January 16th – 21st, 2025",
            "January 16th – February 21st, 2025",
            "January 16th – February 21st, 2025",
            "December 16th 2024 – January 2nd 2025",
            "November 16th – 19th",
        };

        DateRangeParser parser = new DateRangeParser();
        for (String input : testInputs) {
            DateRange range = parser.parse(input);
            System.out.println(range);
        }
    }
}
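The question asks for java.util.Date, while the class above returns LocalDate values. If you really need the legacy type, a minimal sketch of the conversion could look like this. The class name LocalDateToDate and the helper toDate are made up for illustration, and taking the start of the day in the system default time zone is an assumption; use whichever zone fits your data.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.Date;

public class LocalDateToDate {
    // Hypothetical helper: converts a LocalDate to a java.util.Date
    // representing midnight at the start of that day in the given zone.
    public static Date toDate(LocalDate date, ZoneId zone) {
        return Date.from(date.atStartOfDay(zone).toInstant());
    }

    public static void main(String[] args) {
        // Assumption: the system default zone is the one you want.
        ZoneId zone = ZoneId.systemDefault();

        LocalDate start = LocalDate.of(2025, 1, 16);
        LocalDate end = LocalDate.of(2025, 1, 21);

        System.out.println(toDate(start, zone) + " to " + toDate(end, zone));
    }
}
```

A java.util.Date is an instant rather than a calendar date, so the same LocalDate maps to different Date values in different zones. That is one reason to stay with LocalDate unless an older API forces Date on you.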
Some notes about the regular expression:
\p{L}+ matches one or more letters.
\d+ matches one or more digits.
\p{Pd} matches any dash. In your examples, you are using an en dash character ('\u2013'), not an ASCII hyphen. This will match both.
\p{L} and \p{Pd} refer to Unicode general categories.
More information is available in the documentation for Pattern.
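To see the dash matching in isolation, here is a small sketch with a cut-down pattern (month, start day, dash, end day only). The class name DashPatternDemo and the helper split are invented for this illustration and are not part of the parser above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DashPatternDemo {
    // Simplified pattern: letters, digits, any Unicode dash, digits,
    // with optional whitespace between the parts.
    static final Pattern RANGE =
        Pattern.compile("(\\p{L}+)\\s*(\\d+)\\s*\\p{Pd}\\s*(\\d+)");

    // Returns "month:startDay:endDay", or null if the input does not match.
    public static String split(String s) {
        Matcher m = RANGE.matcher(s);
        return m.matches()
            ? m.group(1) + ":" + m.group(2) + ":" + m.group(3)
            : null;
    }

    public static void main(String[] args) {
        System.out.println(split("November 16 \u2013 19")); // en dash
        System.out.println(split("November 16 - 19"));      // ASCII hyphen
        System.out.println(split("November 16 to 19"));     // no dash at all
    }
}
```

The first two calls return November:16:19, because both the en dash and the ASCII hyphen belong to the Pd category. The third returns null.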
Using Month startMonth = Month.from(monthFormat.parse(string)) instead of just Month.valueOf(s.toUpperCase()) allows for localized matching of month names. As Turo points out, if the computer is not set (or might not be set) to an English locale, but you want to parse English month names, you will have to specify English in the month format:
private final DateTimeFormatter monthFormat =
DateTimeFormatter.ofPattern("MMMM").withLocale(Locale.ENGLISH);
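As a rough illustration of what the locale changes, the sketch below parses a month name with an explicit locale. The class name MonthNameDemo and the helpers parseMonth and parseMonthIgnoreCase are hypothetical. The case-insensitive variant is an optional extra that the parser above does not use, and the French example assumes the JDK's locale data contains the usual French month names.

```java
import java.time.Month;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

public class MonthNameDemo {
    // Parses a full month name in the given locale (case-sensitive).
    public static Month parseMonth(String name, Locale locale) {
        DateTimeFormatter f =
            DateTimeFormatter.ofPattern("MMMM").withLocale(locale);
        return Month.from(f.parse(name));
    }

    // Optional variant: also accepts "january", "JANUARY", and so on.
    public static Month parseMonthIgnoreCase(String name, Locale locale) {
        DateTimeFormatter f = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("MMMM")
            .toFormatter(locale);
        return Month.from(f.parse(name));
    }

    public static void main(String[] args) {
        System.out.println(parseMonth("January", Locale.ENGLISH));
        System.out.println(parseMonthIgnoreCase("january", Locale.ENGLISH));
        // Assumes French month names are available in the JDK locale data.
        System.out.println(parseMonth("janvier", Locale.FRENCH));
    }
}
```

With the formatter pinned to Locale.ENGLISH, the result no longer depends on the default locale of the machine running the code.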