fastymd

Overview

fastymd is a package for working with Year-Month-Day (YMD) style date objects. It provides extremely fast passing of character strings and numeric values to date objects as well as fast decomposition of these in to their year, month and day components. The underlying algorithms follow the approach of Howard Hinnant for calculating days from the UNIX Epoch of Gregorian Calendar dates and vice versa.

The API won’t give any surprises:

library(fastymd)
cdate <- c("2025-04-16", "2025-04-17")
(res <- fymd(cdate))
#> [1] "2025-04-16" "2025-04-17"
res == as.Date(cdate)
#> [1] TRUE TRUE
get_ymd(res)
#>   year month day
#> 1 2025     4  16
#> 2 2025     4  17
fymd(2025, 4, 16) == res[1L]
#> [1] TRUE

Invalid dates will return NA and a warning:

fymd(2021, 02, 29) # not a leap year
#> NAs introduced due to invalid month and/or day combinations.
#> [1] NA

More interesting is the handling of output after a valid date. Consider the following timestamp:

timelt <- as.POSIXlt(Sys.time(), tz = "UTC")
(timestamp <- strftime(timelt , "%Y-%m-%dT%H:%M:%S%z"))
#> [1] "2025-04-25T12:58:48+0000"

By default the time element is ignored:

(res <- fymd(timestamp))
#> [1] "2025-04-25"
res == as.Date(timestamp, tz = "UTC")
#> [1] TRUE

This ignoring of the timestamp is both good and bad. For timestamps it makes perfect sense, but perhaps you have simple dates and a concern that some are corrupted. For these we can use the strict argument:

cdate <- "2025-04-16nonsense "
fymd(cdate)
#> [1] "2025-04-16"
fymd(cdate, strict = TRUE)
#> NAs introduced due to invalid date strings.
#> [1] NA

Benchmarks

The character method of fymd() parses input strings in a fixed, year, month and day order. These values must be digits but can be separated by any non-digit character. This is similar in spirit to the fastDate() function in Simon Urbanek’s fasttime package, using pure text parsing and no system calls for maximum speed.

For extremely fast passing of POSIX style timestamps you will struggle to beat the performance of fasttime. This works fantastically for timestamps that do not need validation and are within the date range supported by the package (currently 1970-01-01 through to the year 2199).

fymd() fills the, admittedly small, niche where you want fast parsing of YMD strings along with date validation and support for a wider range of dates from the Proleptic Gregorian calendar (currently we support years in the range [-9999, 9999]). This additional capability does come with a small performance penalty but, hopefully, this has been kept to a minimum and the implementation remains competitive.

library(microbenchmark)

# 1970-01-01 (UNIX epoch) to "2199-01-01"
dates <- seq.Date(from = .Date(0), to = fymd("2199-01-01"), by = "day")

# comparison timings for fymd (character method) 
cdates  <- format(dates)
(res_c <- microbenchmark(
    fasttime  = fasttime::fastDate(cdates),
    fastymd   = fymd(cdates),
    ymd       = ymd::ymd(cdates),
    lubridate = lubridate::ymd(cdates),
    check     = "equal"
))
#> Unit: microseconds
#>       expr      min        lq      mean   median        uq       max neval
#>   fasttime  529.212  534.0410  614.6670  537.833  548.0680  3176.584   100
#>    fastymd  775.403  780.3025  821.9901  783.288  788.7785  3701.388   100
#>        ymd 4449.140 4525.8585 4643.8929 4551.756 4597.2325  7874.310   100
#>  lubridate 5015.090 5166.2885 7039.9450 5314.832 6771.9085 37672.764   100
# comparison timings for fymd (numeric method)
ymd  <- get_ymd(dates)
(res_n <- microbenchmark(
    fastymd   = fymd(ymd[[1]], ymd[[2]], ymd[[3]]),
    lubridate = lubridate::make_date(ymd[[1]], ymd[[2]], ymd[[3]]),
    check     = "equal"
))
#> Unit: microseconds
#>       expr     min       lq     mean  median       uq      max neval
#>    fastymd 342.993 344.7160 380.7935 347.467 349.5500 2044.251   100
#>  lubridate 537.447 544.7915 616.5864 547.742 551.2075 2400.349   100
# comparison timings for year getter
(res_get_year <- microbenchmark(
    fastymd   = get_year(dates),
    ymd       = ymd::year(dates),
    lubridate = lubridate::year(dates),
    check     = "equal"
))
#> Unit: microseconds
#>       expr      min       lq      mean   median        uq       max neval
#>    fastymd  481.482  496.976  535.5263  501.881  504.3110  2243.586   100
#>        ymd  498.755  507.937  543.8808  511.980  515.5915  2648.965   100
#>  lubridate 7583.374 7600.496 7872.6538 7606.718 7629.4710 10319.503   100
# comparison timings for month getter
(res_get_month <- microbenchmark(
    fastymd   = get_month(dates),
    ymd       = ymd::month(dates),
    lubridate = lubridate::month(dates),
    check     = "equal"
))
#> Unit: microseconds
#>       expr      min        lq      mean    median        uq       max neval
#>    fastymd  455.033  463.5890  510.1937  467.1205  471.7345  1949.854   100
#>        ymd  532.889  537.4675  584.1824  539.7115  542.1165  2039.753   100
#>  lubridate 8230.087 8265.2170 8872.8169 8281.1420 8319.9645 38371.485   100
# comparison timings for mday getter
(res_get_mday <- microbenchmark(
    fastymd   = get_mday(dates),
    ymd       = ymd::mday(dates),
    lubridate = lubridate::day(dates),
    check     = "equal"
))
#> Unit: microseconds
#>       expr      min        lq      mean    median        uq      max neval
#>    fastymd  447.238  463.3640  504.0715  467.7415  471.8895 1838.937   100
#>        ymd  536.435  540.2325  550.2922  542.3120  544.5660  867.186   100
#>  lubridate 7529.523 7543.0935 7696.1561 7547.6175 7557.5015 8946.419   100