pythonpostgresqldatetimesqlalchemyavailability

Date intersection and space availability


I am currently trying to check availability on a "space" within a date range, while this date range can be indefinitely long. The tables are as follows:

Space:

id  available_spaces    name
1   20                  Space 1
2   40                  Space 2
3   10                  Space 3

Booking (end_date can be null, this meaning endless booking):

id  space_id    start_date  end_date    spaces
1   1           13/12-2017  null        9
1   1           13/12-2017  13/12-2018  10

Then i would like to be able to make a search, an example could be:

from:   11/12-2016
to:     null (again meaning endless)
spaces: 2

This query should return the spaces: Space 2, Space 3, since those both have enough availability for that time interval.

And by changing the amount of spaces needed in the search to 1 instead of 2 should yield the following: Search:

from:   11/12-2016
to:     null (again meaning endless)
spaces: 1

Space 1, Space 2, Space 3. The thing i find hard to solve is the variable amounts of spaces that can be available from month to month, and the ability to have endless bookings.


Solution

  • Revisited

    As always, SQL offers multiple ways to solve a given task. The original proposed solution (below) used a self-join, but another way would be to take advantage of window functions. The idea is to increment the used up space every time a new booking starts, and to decrement when it ends:

    with bs as (
        select space_id as _sid
             , unnest(array[start_date,
                            coalesce(end_date, date 'infinity')]) as _d
             , unnest(array[spaces, -spaces]) as _sp
        from booking
        where end_date is null or end_date >= '2016-12-11'),
    cs as (
        select _sid
            -- The inner sum collapses starting and ending bookings on the same
            -- date to a single spaces value, the outer is the running sum. This
            -- avoids the problem where the order of bookings starting or ending
            -- on the same date is unspecified and would produce possibly falsely
            -- high values for spaces, if all starting bookings just so happen to
            -- come first.
             , sum(sum(_sp)) over (partition by _sid
                                   order by _d) as _usp
        from bs
        group by _sid, _d)
    select *
    from space
    where not exists (
        select from cs
        where cs._sid = space.id
          and space.available_spaces - cs._usp < 2)
    

    The same in Python/SQLAlchemy:

    from sqlalchemy import or_
    from sqlalchemy.dialects.postgresql import array
    
    bs = session.query(
            Booking.space_id,
            func.unnest(array([
                Booking.start_date,
                func.coalesce(Booking.end_date, func.date('infinity'))
            ])).label('date'),
            func.unnest(array([Booking.spaces, -Booking.spaces])).label('spaces')).\
        filter(or_(Booking.end_date == None,
                   Booking.end_date >= '2016-12-11')).\
        cte()
    
    cs = session.query(bs.c.space_id,
                       func.sum(func.sum(bs.c.spaces)).over(
                           partition_by=bs.c.space_id,
                           order_by=bs.c.date).label('spaces')).\
        group_by(bs.c.space_id, bs.c.date).\
        cte()
    
    query = session.query(Space).\
        filter(~session.query(cs).
               filter(cs.c.space_id == Space.id,
                      Space.available_spaces - cs.c.spaces < 2).
               exists())
    

    It is easier to explain the workings of the query using SQL first, and then to build up the SQLAlchemy. I'll assume that a booking and a search will always have a start, or in other words can only be unbounded in the end. Using range types and operators, you should start by finding the bookings that overlap your search.

    select *
    from booking
    where daterange(start_date, end_date, '[)')
       && daterange('2016-12-11', null, '[)');
    

    From found bookings you need to find intersections and sum used space. To find the intersections use the start of a booking and look up bookings that contain it. Repeat for all bookings at hand. For example:

    |-------| 5
    .  .  .
    .  |-------------| 2
    .  .  .
    .  .  |-------------------- 3
    .  .  .              .
    .  .  .              |---| 1
    .  .  .              .
    5  7  10             4
    

    and in query form:

    with bs as (
        select *
        from booking
        where daterange(start_date, end_date, '[)')
           && daterange('2016-12-11', null, '[)')
    )
    select distinct
           b1.space_id,
           sum(b2.spaces) as sum
    from bs b1
    join bs b2
      on b1.start_date <@ daterange(b2.start_date, b2.end_date, '[)')
     and b1.space_id = b2.space_id
    group by b1.id, b1.space_id;
    

    which given your example data results in

     space_id | sum 
    ----------+-----
            1 |  19
    (1 row)
    

    because there are 2 bookings only and they have the same start date. The query is far from optimal and for each range has to scan through all the ranges, so O(n^2) at least. In a procedural setting you'd use an interval tree or such for lookups and maybe with some suitable indexing and changes could improve the SQL as well.

    With the intersecting booking sums you can then check that there doesn't exist a sum that leaves less space than required by the search:

    with bs as (
            select *
            from booking
            where daterange(start_date, end_date, '[)')
               && daterange('2016-12-11', null, '[)')
    ), cs as (
            select distinct
                   b1.space_id,
                   sum(b2.spaces) as sum
            from bs b1
            join bs b2
              on b1.start_date <@ daterange(b2.start_date, b2.end_date, '[)')
             and b1.space_id = b2.space_id
            -- Could also use distinct & sum() over (partition by b1.id) instead
            group by b1.id, b1.space_id
    )
    select *
    from space
    where not exists(
            select 1
            from cs
            where cs.space_id = space.id
                  -- Check if there is not enough space
              and space.available_spaces - cs.sum < 2
    );
    

    From this it is straightforward to form the SQLAlchemy version:

    from functools import partial
    from sqlalchemy.dialects.postgresql import DATERANGE
    
    # Hack. Proper type for passing daterange values is
    # psycopg2.extras.DateRange, but that does not have
    # the comparator methods.
    daterange = partial(func.daterange, type_=DATERANGE)
    
    bs = session.query(Booking).\
        filter(daterange(Booking.start_date, Booking.end_date, '[)').
               overlaps(daterange('2016-12-11', None, '[)'))).\
        cte()
    
    bs1 = bs.alias()
    bs2 = bs.alias()
    
    cs = session.query(bs1.c.space_id,
                       func.sum(bs2.c.spaces).label('sum')).\
        distinct().\
        join(bs2, (bs2.c.space_id == bs1.c.space_id) &
                  daterange(bs2.c.start_date,
                            bs2.c.end_date).contains(bs1.c.start_date)).\
        group_by(bs1.c.id, bs1.c.space_id).\
        cte()
    
    query = session.query(Space).\
        filter(~session.query(cs).
               filter(cs.c.space_id == Space.id,
                      Space.available_spaces - cs.c.sum < 2).
               exists())