python xml http usps

USPS Package Track API is not returning XML child elements for TrackSummary


Please see the temporary solution at the end.

Summary (added 12/24/22 for clarification):

USPS's tracking API is not returning responses in the format shown in their documentation. The actual format makes it difficult to extract the event date since there is no EventDate XML element. Worst case, I can use regex, but I was wondering if there is a way to receive API responses as shown in USPS's documentation.

Details

In USPS's Track and Confirm API documentation page 19, the sample response shows <TrackSummary> with child elements (<EventTime>, <EventDate>, etc.):


Here's USPS's sample response in text:

<TrackResponse>
 <TrackInfo ID=" XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX ">
 <GuaranteedDeliveryDate>June 24, 2022</GuaranteedDeliveryDate>
 <TrackSummary>
 <EventTime>9:00 am</EventTime>
 <EventDate>June 22, 2022</EventDate>
 <Event>Delivered, To Agent</Event>
 <EventCity>AMARILLO</EventCity>
 <EventState>TX</EventState>
 <EventZIPCode>79109</EventZIPCode>
 <EventCountry/>
 <FirmName/>
 <Name>RXXXXXX XXXXXXX</Name>
 <AuthorizedAgent>false</AuthorizedAgent>
 <DeliveryAttributeCode>23</DeliveryAttributeCode>
 <GMT>14:00:00</GMT>
 <GMTOffset>-05:00</GMTOffset>
 </TrackSummary>

However, when performing the call, the actual XML response lacks these child elements within TrackSummary:

<?xml version="1.0" encoding="UTF-8"?>
<TrackResponse>
    <TrackInfo ID="9405511206213782679396">
        <TrackSummary>Your item departed our WEST PALM BEACH FL DISTRIBUTION CENTER destination facility on December 23, 2022 at 12:40 pm. The item is currently in transit to the destination.</TrackSummary>
        <TrackDetail>Arrived at USPS Regional Facility, December 23, 2022, 4:49 am, WEST PALM BEACH FL DISTRIBUTION CENTER</TrackDetail>
        <TrackDetail>In Transit to Next Facility, 12/22/2022, 9:41 pm</TrackDetail>
        <TrackDetail>In Transit to Next Facility, 12/22/2022, 1:36 pm</TrackDetail>
        <TrackDetail>Departed USPS Facility, 12/22/2022, 5:58 am, HARRISBURG, PA 17112</TrackDetail>
        <TrackDetail>Arrived at USPS Regional Origin Facility, 12/21/2022, 10:12 pm, HARRISBURG PA PACKAGE SORTING CENTER</TrackDetail>
        <TrackDetail>Departed Post Office, December 21, 2022, 4:34 pm, DALLASTOWN, PA 17313</TrackDetail>
        <TrackDetail>USPS picked up item, December 21, 2022, 2:37 pm, DALLASTOWN, PA 17313</TrackDetail>
        <TrackDetail>Shipping Label Created, USPS Awaiting Item, December 21, 2022, 2:16 pm, DALLASTOWN, PA 17313</TrackDetail>
    </TrackInfo>
</TrackResponse>

This can be reproduced with Lob's USPS Postman workspace.

The problem I'm trying to solve is obtaining the date from the TrackSummary data, which now requires regex since USPS's API is not returning an EventDate child element.
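
For contrast, here is a minimal sketch of how the date could be read without regex if the response did contain the documented child elements. The XML is a trimmed copy of the documentation sample above, and `summary_datetime` is a hypothetical helper name, not part of any USPS library.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Trimmed copy of the documented sample response (child elements present).
FIELDS_STYLE_RESPONSE = """<TrackResponse>
 <TrackInfo ID="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX">
  <TrackSummary>
   <EventTime>9:00 am</EventTime>
   <EventDate>June 22, 2022</EventDate>
   <Event>Delivered, To Agent</Event>
  </TrackSummary>
 </TrackInfo>
</TrackResponse>"""


def summary_datetime(xml_text):
    """Return the TrackSummary event datetime, or None if no EventDate is present."""
    summary = ET.fromstring(xml_text).find(".//TrackSummary")
    if summary is None:
        return None
    event_date = summary.findtext("EventDate")
    event_time = summary.findtext("EventTime")
    if not event_date:
        return None
    if event_time:
        # e.g. "June 22, 2022 9:00 am"
        return datetime.strptime(f"{event_date} {event_time}", "%B %d, %Y %I:%M %p")
    return datetime.strptime(event_date, "%B %d, %Y")


print(summary_datetime(FIELDS_STYLE_RESPONSE))
```

With the plain-text TrackSummary that the API actually returns, `findtext("EventDate")` finds nothing and the helper returns None, which is why the regex fallback is needed.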

Is there an option when making the request to return these helpful XML child elements? I couldn't find one in the documentation and the sample responses I've seen all contain these child elements.

I've tried forming the request in Python and with Lob's USPS workspace and both XML responses lack the TrackSummary child elements.

Long-term solution (in progress 12/26/22)

@Parfait pointed out that I should use the Package Tracking “Fields” API instead of the Package Track API.

Here's how I'm currently forming the XML request with Package Track API:

from lxml import etree

# NOTE: base_url, url_vars and config are defined elsewhere in my project (not shown here)

def generate_url_tracking(tracking_numbers: list[str]) -> str:
    """generate the USPS tracking request url
    :param: tracking_numbers - list of strings of tracking numbers
    :return url: str tracking url for calling the USPS API
    """
    xml = generate_xml_tracking(tracking_numbers)
    url = f"{base_url}{url_vars['track']}{xml}"
    return url

def generate_xml_tracking(tracking_numbers: list[str]) -> str:
    """
    Generate USPS track and confirm API xml
    :param tracking_numbers: list of strings of tracking numbers
    :return: xml string
    """
    xml = etree.Element("TrackRequest", {"USERID": config("USPS_USER")})
    # loop through tracking numbers
    for tracking in tracking_numbers:
        etree.SubElement(xml, "TrackID", {"ID": tracking})
    xml_string = etree.tostring(xml, encoding="utf8", method="xml").decode()
    return xml_string

I'll update this to the Package Tracking “Fields” API request when I get time.
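
A rough sketch of what the "Fields" version of the request might look like. It assumes the root element is `TrackFieldRequest` with `Revision`, `ClientIp` and `SourceId` children, which is my reading of the USPS documentation and should be checked against the current docs. The function name, default values, and the use of the standard-library `xml.etree.ElementTree` instead of lxml are illustrative choices only.

```python
import xml.etree.ElementTree as ET


def generate_xml_tracking_fields(tracking_numbers, user_id,
                                 client_ip="127.0.0.1", source_id="example"):
    """Build a TrackFieldRequest XML string (element names assumed from the USPS docs)."""
    root = ET.Element("TrackFieldRequest", {"USERID": user_id})
    # Assumed required children; verify names and values against the USPS documentation.
    ET.SubElement(root, "Revision").text = "1"
    ET.SubElement(root, "ClientIp").text = client_ip
    ET.SubElement(root, "SourceId").text = source_id
    for tracking in tracking_numbers:
        ET.SubElement(root, "TrackID", {"ID": tracking})
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(root, encoding="unicode")


xml_string = generate_xml_tracking_fields(["9405511206213782679396"], user_id="XXXXXXXX")
print(xml_string)
```

The returned string would take the place of the output of `generate_xml_tracking` when building the URL.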

Temporary Solution (12/25/22)

Until USPS's actual responses match their API docs, this solution extracts the last updated date from <TrackSummary> for several different statuses (pre-shipment, delivered, RTS, etc.)

The TRACK_SUMMARIES dict has the different statuses it's tested against. Some statuses without dates (no_info, out_for_delivery_no_date) return None.

import re
from datetime import datetime
from typing import Optional

from dateutil.parser import ParserError, parse

TRACK_SUMMARIES = {
    "delivered": """Your
     item was delivered in or at the mailbox at 10:23 am on December 24, 2022 in HOBE SOUND, FL 33455.""",
    "out_for_delivery": "Out for Delivery, December 13, 2021, 6:10 am, ARLINGTON, VA 22204.",
    "out_for_delivery_no_date": "Out for Delivery, Expected Delivery Between 9:45am and 1:45pm",
    "arrived_at_post_office": """Arrived at Post Office,
     Arrived at USPS Regional Origin Facility, December 11, 2021, 9:23 pm, HARRISBURG PA PACKAGE SORTING CENTER""",
    "acceptance": "Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313",
    "pre_shipment": "Pre-Shipment Info Sent to USPS, USPS Awaiting Item, December 27, 2021",
    "rts": """Your item was returned to the sender on January 31, 2022 at 9:14 am in YORK, PA 17402
     because of an incorrect address.""",
    "no_info": "The Postal Service could not locate the tracking information for your request",
    "label_prepared": "A shipping label has been prepared for your item at 10:47 am on December 16, 2021 in WINSTON",
    "forwarded": """Your item was forwarded to a different address at 5:13 pm on January 4, 2022
        in REDDING, CA. This was because of forwarding instructions or because the
        address or ZIP Code on the label was incorrect.
        """,
}

def get_last_updated(track_summary: str) -> Optional[datetime]:
    """Take the USPS TrackSummary string and return the last updated datetime, or None if no date is found"""
    # remove the zip code since it interferes with the date parser
    track_summary = re.sub(r"\d{5}", "", track_summary)
    months_regex = "January|February|March|April|May|June|July|August|September|October|November|December"
    first_result = re.search(rf"(?={months_regex}).*", track_summary)
    # return early if there's no Month
    if not first_result:
        return
    first_result = first_result.group()
    # some summaries have am/pm and some don't
    result_for_parser = re.search(r".*(?<=am|pm)", first_result)
    if result_for_parser:
        result_for_parser = result_for_parser.group()
    else:
        result_for_parser = first_result
    try:
        # fuzzy parsing is required for dates in certain summaries
        result = parse(result_for_parser, fuzzy=True)
    except ParserError:
        return
    return result

Sources:

Using the dateutil parser
Regex for finding months


Solution

  • xml.etree.ElementTree does a good job of finding a child element by XPath.

    It provides only limited support for XPath expressions for locating elements in a tree, but that is enough to find the TrackSummary data.

    To find the first TrackSummary element anywhere below the root:

    root.find(".//TrackSummary").text ->
    Your item departed our WEST PALM BEACH FL DISTRIBUTION CENTER destination facility on December 23, 2022 at 12:40 pm. The item is currently in transit to the destination.
    

    Python demo:

    import xml.etree.ElementTree as ET
    import datetime
    
    document = """\
    <?xml version="1.0" encoding="UTF-8"?>
    <TrackResponse>
        <TrackInfo ID="9405511206213782679396">
            <TrackSummary>Your item departed our WEST PALM BEACH FL DISTRIBUTION CENTER destination facility on December 23, 2022 at 12:40 pm. The item is currently in transit to the destination.</TrackSummary>
            <TrackDetail>Arrived at USPS Regional Facility, December 23, 2022, 4:49 am, WEST PALM BEACH FL DISTRIBUTION CENTER</TrackDetail>
            <TrackDetail>In Transit to Next Facility, 12/22/2022, 9:41 pm</TrackDetail>
            <TrackDetail>In Transit to Next Facility, 12/22/2022, 1:36 pm</TrackDetail>
            <TrackDetail>Departed USPS Facility, 12/22/2022, 5:58 am, HARRISBURG, PA 17112</TrackDetail>
            <TrackDetail>Arrived at USPS Regional Origin Facility, 12/21/2022, 10:12 pm, HARRISBURG PA PACKAGE SORTING CENTER</TrackDetail>
            <TrackDetail>Departed Post Office, December 21, 2022, 4:34 pm, DALLASTOWN, PA 17313</TrackDetail>
            <TrackDetail>USPS picked up item, December 21, 2022, 2:37 pm, DALLASTOWN, PA 17313</TrackDetail>
            <TrackDetail>Shipping Label Created, USPS Awaiting Item, December 21, 2022, 2:16 pm, DALLASTOWN, PA 17313</TrackDetail>
        </TrackInfo>
    </TrackResponse>
    """
    
    def find_between( s, first, last ):
        try:
            start = s.index( first ) + len( first )
            end = s.index( last, start )
            return s[start:end]
        except ValueError:
            return ""
    
    root = ET.fromstring(document)
    
    date_time_obj = datetime.datetime.strptime(find_between(root.find(".//TrackSummary").text,' on ', '.'), '%B %d' + ", " + '%Y at %I:%M %p')
    print('Date:', date_time_obj.date())
    print('Time:', date_time_obj.time())
    print('Date-time:', date_time_obj)
    

    Result

    $ python track-summary.py
    Date: 2022-12-23
    Time: 12:40:00
    Date-time: 2022-12-23 12:40:00
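
The same idea can be extended to the TrackDetail entries with `findall`. Note that the sample response mixes two date styles ("December 23, 2022" and "12/22/2022"), so the sketch below tries one pattern per style. The names `DATE_STYLES` and `parse_detail` are made up for this illustration, and only the two styles seen in the sample are covered.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

DOCUMENT = """<TrackResponse>
    <TrackInfo ID="9405511206213782679396">
        <TrackDetail>Arrived at USPS Regional Facility, December 23, 2022, 4:49 am, WEST PALM BEACH FL DISTRIBUTION CENTER</TrackDetail>
        <TrackDetail>In Transit to Next Facility, 12/22/2022, 9:41 pm</TrackDetail>
    </TrackInfo>
</TrackResponse>"""

# (regex, strptime format) pairs for the two date styles seen in the sample response
DATE_STYLES = [
    (re.compile(r"[A-Z][a-z]+ \d{1,2}, \d{4}, \d{1,2}:\d{2} [ap]m"), "%B %d, %Y, %I:%M %p"),
    (re.compile(r"\d{1,2}/\d{1,2}/\d{4}, \d{1,2}:\d{2} [ap]m"), "%m/%d/%Y, %I:%M %p"),
]


def parse_detail(text):
    """Return the datetime found in one TrackDetail string, or None if nothing matches."""
    for pattern, fmt in DATE_STYLES:
        match = pattern.search(text)
        if match:
            try:
                return datetime.strptime(match.group(), fmt)
            except ValueError:
                continue  # looked like a date but was not a valid one
    return None


root = ET.fromstring(DOCUMENT)
detail_times = [parse_detail(el.text) for el in root.findall(".//TrackDetail")]
for when in detail_times:
    print(when)
```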
    

    Update: regular expression parsing

    Based on your updated question (Temporary Solution, 12/25/22), I added a parsing part that uses the re library.

    Code

    import re
    from datetime import date, time, datetime

    def get_date(date_string):
        # month names in calendar order, so list index + 1 is the month number
        months = ['January','February','March','April','May','June','July','August','September','October','November','December']
        pattern = re.compile(r'(January|February|March|April|May|June|July|August|September|October|November|December)\s(\d{2}|\d{1})\,\s(\d{4})')
        match = re.search(pattern, date_string)
        if not match:
            d = None
        else:
            month_data = match.groups()[0]
            month = months.index(month_data) + 1
            day = int(match.groups()[1])
            year = int(match.groups()[2])
            try:
                d = date(year, month, day)
            except ValueError:
                d = None  # or handle error in a different way
        return d
    
    def get_hour_min(hour, minute, am_pm):
        # convert a 12-hour clock reading to a 24-hour [hour, minute] pair
        hour = int(hour) % 12  # 12 am becomes 0; 12 pm becomes 0 and gets +12 below
        minute = int(minute)
        if am_pm == 'pm':
            hour += 12
        return [hour, minute]
    
    def get_time(date_string):
        pattern = re.compile(r'(\d{2}|\d{1})\:(\d{2})\s*(am|pm)')
        matches = re.findall(pattern, date_string)
        if (len(matches) == 2):
            hour, min = get_hour_min(matches[0][0], matches[0][1], matches[0][2])
            start_t = time(hour, min, 0)
            hour, min = get_hour_min(matches[1][0], matches[1][1], matches[1][2])
            end_t = time(hour, min, 0)
            return [start_t, end_t]
    
        match = re.search(pattern, date_string)
        if not match:
            t = None
        else:
            hour, min = get_hour_min(match.groups()[0], match.groups()[1], match.groups()[2])
            try:
                t = time(hour, min, 0)
            except ValueError:
                t = None  # or handle error in a different way
        return [t, None]
    
    TRACK_SUMMARIES = {
        "delivered": """Your
         item was delivered in or at the mailbox at 10:23 am on December 24, 2022 in HOBE SOUND, FL 33455.""",
        "out_for_delivery": "Out for Delivery, December 13, 2021, 6:10 am, ARLINGTON, VA 22204.",
        "out_for_delivery_no_date": "Out for Delivery, Expected Delivery Between 9:45am and 1:45pm",
        "arrived_at_post_office": """Arrived at Post Office,
         Arrived at USPS Regional Origin Facility, December 11, 2021, 9:23 pm, HARRISBURG PA PACKAGE SORTING CENTER""",
        "acceptance": "Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313",
        "pre_shipment": "Pre-Shipment Info Sent to USPS, USPS Awaiting Item, December 27, 2021",
        "rts": """Your item was returned to the sender on January 31, 2022 at 9:14 am in YORK, PA 17402
         because of an incorrect address.""",
        "no_info": "The Postal Service could not locate the tracking information for your request",
        "label_prepared": "A shipping label has been prepared for your item at 10:47 am on December 16, 2021 in WINSTON",
        "forwarded": """Your item was forwarded to a different address at 5:13 pm on January 4, 2022
            in REDDING, CA. This was because of forwarding instructions or because the
            address or ZIP Code on the label was incorrect.
            """,
    }
    
    tracks = {}
    # parse each summary and store [date, start time, end time, text] by key (delivered, out_for_delivery, and so on)
    for key in TRACK_SUMMARIES:
        value = TRACK_SUMMARIES[key].replace("\n", "")
        found_date = get_date(value)
        start_time, end_time = get_time(value)
        tracks[key] = [ found_date, start_time, end_time, value ]

    # read the stored values back by key (tracks['delivered'], tracks['out_for_delivery'], and so on)
    for key in tracks.keys():
        found_date, start_time, end_time, value = tracks[key]
        
        found_date = found_date.strftime("%m/%d/%Y") if found_date != None else None
        start_time = start_time.strftime("%H:%M:%S") if start_time != None else None
        end_time = end_time.strftime("%H:%M:%S") if end_time != None else None
    
        print(value)
        print(key)
        if (found_date != None):
            print('found date: ' + found_date)
        if (start_time != None):
            if(end_time == None):
                print('time: ' + start_time)
            else:
                print('start time: ' + start_time + ' end time: ' + end_time)
        print('------------------------------------------------------------------------')
    

    Result

    $ python reg-express.py
    Your     item was delivered in or at the mailbox at 10:23 am on December 24, 2022 in HOBE SOUND, FL 33455.
    delivered
    found date: 12/24/2022
    time: 10:23:00
    ------------------------------------------------------------------------
    Out for Delivery, December 13, 2021, 6:10 am, ARLINGTON, VA 22204.
    out_for_delivery
    found date: 12/13/2021
    time: 06:10:00
    ------------------------------------------------------------------------
    Out for Delivery, Expected Delivery Between 9:45am and 1:45pm
    out_for_delivery_no_date
    start time: 09:45:00 end time: 13:45:00
    ------------------------------------------------------------------------
    Arrived at Post Office,     Arrived at USPS Regional Origin Facility, December 11, 2021, 9:23 pm, HARRISBURG PA PACKAGE SORTING CENTER
    arrived_at_post_office
    found date: 12/11/2021
    time: 21:23:00
    ------------------------------------------------------------------------
    Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313
    acceptance
    found date: 12/10/2021
    time: 12:54:00
    ------------------------------------------------------------------------
    Pre-Shipment Info Sent to USPS, USPS Awaiting Item, December 27, 2021
    pre_shipment
    found date: 12/27/2021
    ------------------------------------------------------------------------
    Your item was returned to the sender on January 31, 2022 at 9:14 am in YORK, PA 17402     because of an incorrect address.
    rts
    found date: 01/31/2022
    time: 09:14:00
    ------------------------------------------------------------------------
    The Postal Service could not locate the tracking information for your request
    no_info
    ------------------------------------------------------------------------
    A shipping label has been prepared for your item at 10:47 am on December 16, 2021 in WINSTON
    label_prepared
    found date: 12/16/2021
    time: 10:47:00
    ------------------------------------------------------------------------
    Your item was forwarded to a different address at 5:13 pm on January 4, 2022        in REDDING, CA. This was because of forwarding instructions or because the        address or ZIP Code on the label was incorrect.
    forwarded
    found date: 01/04/2022
    time: 17:13:00
    ------------------------------------------------------------------------
    
    

    Date/time patterns

    I extracted the date and time patterns below from your TRACK_SUMMARIES dictionary. Some lines have no date, and some have a time range ("Between ... and ...").

    10:23 am on December 24, 2022
    December 13, 2021, 6:10 am
    Between 9:45am and 1:45pm
    December 10, 2021, 12:54 pm
    December 27, 2021
    January 31, 2022 at 9:14 am
    at 10:47 am on December 16, 2021
    at 5:13 pm on January 4, 2022
    

    Date parsing

    (January|February|March|April|May|June|July|August|September|October|November|December)\s(\d{2}|\d{1})\,\s(\d{4})
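
A quick way to see which groups this pattern captures is to run it against a few of the summaries. This is a small illustrative check; `date_groups_for` is a made-up helper name.

```python
import re

DATE_PATTERN = re.compile(
    r'(January|February|March|April|May|June|July|August|September|October|November|December)'
    r'\s(\d{2}|\d{1})\,\s(\d{4})'
)


def date_groups_for(text):
    """Return the (month, day, year) strings captured by the pattern, or None."""
    match = DATE_PATTERN.search(text)
    return match.groups() if match else None


samples = [
    "Out for Delivery, December 13, 2021, 6:10 am, ARLINGTON, VA 22204.",
    "Your item was forwarded to a different address at 5:13 pm on January 4, 2022",
    "Out for Delivery, Expected Delivery Between 9:45am and 1:45pm",
]
date_groups = [date_groups_for(s) for s in samples]
for sample, groups in zip(samples, date_groups):
    print(groups, "<-", sample)
```

The last sample has no month name, so the search returns None for it.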
    

    The three capture groups (month, day, year) are the ones used in the code.

    Time parsing

    (\d{2}|\d{1})\:(\d{2})\s*(am|pm)
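
As a hedged alternative to converting the hour by hand, the captured groups can be passed to `datetime.strptime`, which also maps 12 am to hour 0. The sketch below assumes the same pattern. `find_times` is an illustrative name, and `strptime` will raise ValueError for hours outside 1-12.

```python
import re
from datetime import datetime

TIME_PATTERN = re.compile(r'(\d{2}|\d{1})\:(\d{2})\s*(am|pm)')


def find_times(text):
    """Return every clock time in the text as a datetime.time, in order of appearance."""
    return [
        datetime.strptime(f"{hour}:{minute} {am_pm}", "%I:%M %p").time()
        for hour, minute, am_pm in TIME_PATTERN.findall(text)
    ]


window = find_times("Out for Delivery, Expected Delivery Between 9:45am and 1:45pm")
single = find_times("Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313")
midnight = find_times("Arrived at 12:05 am")
print(window, single, midnight)
```

A summary with a "Between" time range yields two entries (start and end), while the other summaries yield at most one.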
    

    The three capture groups (hour, minute, am/pm) are the ones used in the code.

    References

    Find string between two substrings

    Converting Strings Using datetime

    Regexper

    regular expression 101