fastymd

Overview

fastymd is a package for working with Year-Month-Day (YMD) style date objects. It provides extremely fast passing of character strings and numeric values to date objects as well as fast decomposition of these in to their year, month and day components. The underlying algorithms follow the approach of Howard Hinnant for calculating days from the UNIX Epoch of Gregorian Calendar dates and vice versa.

The API won’t give any surprises:

library(fastymd)
cdate <- c("2025-04-16", "2025-04-17")
(res <- fymd(cdate))
#> [1] "2025-04-16" "2025-04-17"
res == as.Date(cdate)
#> [1] TRUE TRUE
get_ymd(res)
#>   year month day
#> 1 2025     4  16
#> 2 2025     4  17
fymd(2025, 4, 16) == res[1L]
#> [1] TRUE

Invalid dates will return NA and a warning:

fymd(2021, 02, 29) # not a leap year
#> NAs introduced due to invalid month and/or day combinations.
#> [1] NA

More interesting is the handling of output after a valid date. Consider the following timestamp:

timelt <- as.POSIXlt(Sys.time(), tz = "UTC")
(timestamp <- strftime(timelt , "%Y-%m-%dT%H:%M:%S%z"))
#> [1] "2025-05-12T19:22:45+0000"

By default the time element is ignored:

(res <- fymd(timestamp))
#> [1] "2025-05-12"
res == as.Date(timestamp, tz = "UTC")
#> [1] TRUE

This ignoring of the timestamp is both good and bad. For timestamps it makes perfect sense, but perhaps you have simple dates and a concern that some are corrupted. For these we can use the strict argument:

cdate <- "2025-04-16nonsense "
fymd(cdate)
#> [1] "2025-04-16"
fymd(cdate, strict = TRUE)
#> NAs introduced due to invalid date strings.
#> [1] NA

Benchmarks

The character method of fymd() parses input strings in a fixed, year, month and day order. These values must be digits but can be separated by any non-digit character. This is similar in spirit to the fastDate() function in Simon Urbanek’s fasttime package, using pure text parsing and no system calls for maximum speed.

For extremely fast passing of POSIX style timestamps you will struggle to beat the performance of fasttime. This works fantastically for timestamps that do not need validation and are within the date range supported by the package (currently 1970-01-01 through to the year 2199).

fymd() fills the, admittedly small, niche where you want fast parsing of YMD strings along with date validation and support for a wider range of dates from the Proleptic Gregorian calendar (currently we support years in the range [-9999, 9999]). This additional capability does come with a small performance penalty but, hopefully, this has been kept to a minimum and the implementation remains competitive.

library(microbenchmark)

# 1970-01-01 (UNIX epoch) to "2199-01-01"
dates <- seq.Date(from = .Date(0), to = fymd("2199-01-01"), by = "day")

# comparison timings for fymd (character method) 
cdates  <- format(dates)
(res_c <- microbenchmark(
    fasttime  = fasttime::fastDate(cdates),
    fastymd   = fymd(cdates),
    ymd       = ymd::ymd(cdates),
    lubridate = lubridate::ymd(cdates),
    check     = "equal"
))
#> Unit: microseconds
#>       expr      min        lq      mean   median        uq       max neval
#>   fasttime  529.073  534.7990  576.1368  539.618  545.3535  1901.278   100
#>    fastymd  775.796  780.8305  788.5262  783.881  788.9855   959.180   100
#>        ymd 4384.888 4492.1355 4554.7300 4521.390 4561.2445  5976.746   100
#>  lubridate 4955.098 5070.1950 5999.4278 5150.251 6527.8450 37595.723   100
# comparison timings for fymd (numeric method)
ymd  <- get_ymd(dates)
(res_n <- microbenchmark(
    fastymd   = fymd(ymd[[1]], ymd[[2]], ymd[[3]]),
    lubridate = lubridate::make_date(ymd[[1]], ymd[[2]], ymd[[3]]),
    check     = "equal"
))
#> Unit: microseconds
#>       expr     min      lq     mean  median       uq      max neval
#>    fastymd 340.839 342.167 365.6678 345.413 347.5220 1761.325   100
#>  lubridate 535.485 541.316 640.3393 545.138 550.2875 2482.258   100
# comparison timings for year getter
(res_get_year <- microbenchmark(
    fastymd   = get_year(dates),
    ymd       = ymd::year(dates),
    lubridate = lubridate::year(dates),
    check     = "equal"
))
#> Unit: microseconds
#>       expr      min        lq      mean    median        uq       max neval
#>    fastymd  488.537  497.4885  540.4405  501.6010  506.5205  1720.088   100
#>        ymd  497.003  504.5170  549.3673  509.8665  514.8515  2041.231   100
#>  lubridate 7589.773 7604.6310 7896.1416 7612.0955 7629.4425 10253.322   100
# comparison timings for month getter
(res_get_month <- microbenchmark(
    fastymd   = get_month(dates),
    ymd       = ymd::month(dates),
    lubridate = lubridate::month(dates),
    check     = "equal"
))
#> Unit: microseconds
#>       expr      min       lq      mean    median        uq       max neval
#>    fastymd  452.900  464.857  572.1960  469.7465  474.3150  1960.229   100
#>        ymd  531.428  536.847  566.3219  539.0115  544.1915  2074.063   100
#>  lubridate 8215.207 8247.988 8876.5144 8271.0365 8307.0845 39568.495   100
# comparison timings for mday getter
(res_get_mday <- microbenchmark(
    fastymd   = get_mday(dates),
    ymd       = ymd::mday(dates),
    lubridate = lubridate::day(dates),
    check     = "equal"
))
#> Unit: microseconds
#>       expr      min       lq      mean   median       uq      max neval
#>    fastymd  449.464  460.855  501.8638  466.430  469.852 1802.673   100
#>        ymd  535.535  539.848  577.7772  542.463  545.624 1841.536   100
#>  lubridate 7531.354 7546.001 7755.3286 7553.971 7568.017 9465.392   100