Please see the temporary solution at the end.
Summary (added 12/24/22 for clarification):
USPS's tracking API is not returning responses in the same format as their documentation. The actual format makes it difficult to extract the event date, since there is no EventDate XML element. Worst case, I can use regex, but I was wondering whether there is a way to receive API responses as shown in USPS's documentation.
Details
In USPS's Track and Confirm API documentation (page 19), the sample response shows <TrackSummary> with child elements (<EventTime>, <EventDate>, etc.):
Screenshot of USPS's sample response
Here's USPS's sample response in text:
<TrackResponse>
<TrackInfo ID=" XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX ">
<GuaranteedDeliveryDate>June 24, 2022</GuaranteedDeliveryDate>
<TrackSummary>
<EventTime>9:00 am</EventTime>
<EventDate>June 22, 2022</EventDate>
<Event>Delivered, To Agent</Event>
<EventCity>AMARILLO</EventCity>
<EventState>TX</EventState>
<EventZIPCode>79109</EventZIPCode>
<EventCountry/>
<FirmName/>
<Name>RXXXXXX XXXXXXX</Name>
<AuthorizedAgent>false</AuthorizedAgent>
<DeliveryAttributeCode>23</DeliveryAttributeCode>
<GMT>14:00:00</GMT>
<GMTOffset>-05:00</GMTOffset>
</TrackSummary>
However, when performing the call, the actual XML response lacks these child elements within TrackSummary:
<?xml version="1.0" encoding="UTF-8"?>
<TrackResponse>
<TrackInfo ID="9405511206213782679396">
<TrackSummary>Your item departed our WEST PALM BEACH FL DISTRIBUTION CENTER destination facility on December 23, 2022 at 12:40 pm. The item is currently in transit to the destination.</TrackSummary>
<TrackDetail>Arrived at USPS Regional Facility, December 23, 2022, 4:49 am, WEST PALM BEACH FL DISTRIBUTION CENTER</TrackDetail>
<TrackDetail>In Transit to Next Facility, 12/22/2022, 9:41 pm</TrackDetail>
<TrackDetail>In Transit to Next Facility, 12/22/2022, 1:36 pm</TrackDetail>
<TrackDetail>Departed USPS Facility, 12/22/2022, 5:58 am, HARRISBURG, PA 17112</TrackDetail>
<TrackDetail>Arrived at USPS Regional Origin Facility, 12/21/2022, 10:12 pm, HARRISBURG PA PACKAGE SORTING CENTER</TrackDetail>
<TrackDetail>Departed Post Office, December 21, 2022, 4:34 pm, DALLASTOWN, PA 17313</TrackDetail>
<TrackDetail>USPS picked up item, December 21, 2022, 2:37 pm, DALLASTOWN, PA 17313</TrackDetail>
<TrackDetail>Shipping Label Created, USPS Awaiting Item, December 21, 2022, 2:16 pm, DALLASTOWN, PA 17313</TrackDetail>
</TrackInfo>
</TrackResponse>
This can be reproduced with Lob's USPS Postman workspace.
The problem I'm trying to solve is obtaining the date from the TrackSummary data, which now requires regex since USPS's API is not returning an EventDate child element.
Is there an option when making the request to return these helpful XML child elements? I couldn't find one in the documentation and the sample responses I've seen all contain these child elements.
I've tried forming the request in Python and with Lob's USPS workspace and both XML responses lack the TrackSummary child elements.
Long-term solution (in progress 12/26/22)
@Parfait pointed out that I should use the Package Tracking “Fields” API instead of the Package Track API.
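If the response does come back in the documented shape, the date can be read straight from the child elements without any regex. The sketch below assumes the <TrackSummary> layout from USPS's sample response above; `summary_datetime` and `SAMPLE` are hypothetical names used only for illustration, and the sample is a trimmed, closed copy of the documented XML.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Optional

# Trimmed copy of the documented sample response (closing tags added).
SAMPLE = """\
<TrackResponse>
  <TrackInfo ID="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX">
    <TrackSummary>
      <EventTime>9:00 am</EventTime>
      <EventDate>June 22, 2022</EventDate>
      <Event>Delivered, To Agent</Event>
    </TrackSummary>
  </TrackInfo>
</TrackResponse>
"""


def summary_datetime(xml_text: str) -> Optional[datetime]:
    """Read EventDate/EventTime from the first TrackSummary, if present."""
    root = ET.fromstring(xml_text)
    summary = root.find(".//TrackSummary")
    if summary is None:
        return None
    event_date = summary.findtext("EventDate")
    event_time = summary.findtext("EventTime")
    if not event_date:
        # plain-text summaries (no child elements) end up here
        return None
    if event_time:
        return datetime.strptime(f"{event_date} {event_time}", "%B %d, %Y %I:%M %p")
    return datetime.strptime(event_date, "%B %d, %Y")


print(summary_datetime(SAMPLE))
```

A plain-text <TrackSummary> like the one in the actual response has no EventDate child, so the function returns None for it rather than guessing.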
Here's how I'm currently forming the XML request with Package Track API:
from lxml import etree

# base_url, url_vars, and config() are defined elsewhere in the project (not shown here)


def generate_url_tracking(tracking_numbers: list[str]) -> str:
    """Generate the USPS tracking request URL.

    :param tracking_numbers: list of strings of tracking numbers
    :return: tracking URL for calling the USPS API
    """
    xml = generate_xml_tracking(tracking_numbers)
    url = f"{base_url}{url_vars['track']}{xml}"
    return url


def generate_xml_tracking(tracking_numbers: list[str]) -> str:
    """Generate USPS track and confirm API XML.

    :param tracking_numbers: list of strings of tracking numbers
    :return: xml string
    """
    xml = etree.Element("TrackRequest", {"USERID": config("USPS_USER")})
    # add one TrackID element per tracking number
    for tracking in tracking_numbers:
        etree.SubElement(xml, "TrackID", {"ID": tracking})
    xml_string = etree.tostring(xml, encoding="utf8", method="xml").decode()
    return xml_string
I'll update this to the Package Tracking “Fields” API request when I get time.
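As a starting point, here is a rough sketch of what the “Fields” request body might look like. It assumes, from my reading of the USPS documentation, that the root element is TrackFieldRequest and that it takes Revision, ClientIp and SourceId children before the TrackID elements; those names should be checked against the current docs. The function name, its parameters and the default values are hypothetical, and it uses the standard library's ElementTree rather than lxml only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET


def generate_xml_tracking_fields(
    tracking_numbers: list[str],
    user_id: str,
    client_ip: str = "127.0.0.1",   # placeholder value (assumption)
    source_id: str = "my-app",      # placeholder value (assumption)
) -> str:
    """Build a TrackFieldRequest XML string (element names assumed from the USPS docs)."""
    root = ET.Element("TrackFieldRequest", {"USERID": user_id})
    ET.SubElement(root, "Revision").text = "1"
    ET.SubElement(root, "ClientIp").text = client_ip
    ET.SubElement(root, "SourceId").text = source_id
    # one TrackID element per tracking number, same as in the Package Track request
    for tracking in tracking_numbers:
        ET.SubElement(root, "TrackID", {"ID": tracking})
    return ET.tostring(root, encoding="unicode")


print(generate_xml_tracking_fields(["9405511206213782679396"], user_id="XXXXXXXX"))
```

The resulting string would take the place of the output of generate_xml_tracking when building the URL.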
Temporary Solution (12/25/22)
Until USPS's actual responses match their API docs, this solution extracts the last updated date from <TrackSummary> for several different statuses (pre-shipment, delivered, RTS, etc.).
The TRACK_SUMMARIES dict lists the statuses it has been tested against. Statuses without dates (no_info, out_for_delivery_no_date) return None.
import re
from datetime import datetime
from typing import Optional

from dateutil.parser import ParserError, parse

TRACK_SUMMARIES = {
"delivered": """Your
item was delivered in or at the mailbox at 10:23 am on December 24, 2022 in HOBE SOUND, FL 33455.""",
"out_for_delivery": "Out for Delivery, December 13, 2021, 6:10 am, ARLINGTON, VA 22204.",
"out_for_delivery_no_date": "Out for Delivery, Expected Delivery Between 9:45am and 1:45pm",
"arrived_at_post_office": """Arrived at Post Office,
Arrived at USPS Regional Origin Facility, December 11, 2021, 9:23 pm, HARRISBURG PA PACKAGE SORTING CENTER""",
"acceptance": "Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313",
"pre_shipment": "Pre-Shipment Info Sent to USPS, USPS Awaiting Item, December 27, 2021",
"rts": """Your item was returned to the sender on January 31, 2022 at 9:14 am in YORK, PA 17402
because of an incorrect address.""",
"no_info": "The Postal Service could not locate the tracking information for your request",
"label_prepared": "A shipping label has been prepared for your item at 10:47 am on December 16, 2021 in WINSTON",
"forwarded": """Your item was forwarded to a different address at 5:13 pm on January 4, 2022
in REDDING, CA. This was because of forwarding instructions or because the
address or ZIP Code on the label was incorrect.
""",
}


def get_last_updated(track_summary: str) -> Optional[datetime]:
    """Take the USPS TrackSummary string and return the last updated datetime."""
    # remove the zip code since it interferes with the date parser
    track_summary = re.sub(r"\d{5}", "", track_summary)
    months_regex = "January|February|March|April|May|June|July|August|September|October|November|December"
    # keep everything from the first month name to the end of that line
    first_result = re.search(rf"(?={months_regex}).*", track_summary)
    # return early if there's no month
    if not first_result:
        return None
    first_result = first_result.group()
    # some summaries have am/pm and some don't
    result_for_parser = re.search(r".*(?<=am|pm)", first_result)
    if result_for_parser:
        result_for_parser = result_for_parser.group()
    else:
        result_for_parser = first_result
    try:
        # fuzzy parsing is required for dates in certain summaries
        result = parse(result_for_parser, fuzzy=True)
    except ParserError:
        return None
    return result
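One gap worth noting: some <TrackDetail> lines in the actual response use a numeric date (12/22/2022) instead of a month name, which a month-name search will not find. The sketch below is a standard-library-only variant that returns just the date and tries both formats; `extract_date` and the pattern names are hypothetical, and it has only been checked against the strings shown in this question.

```python
import re
from datetime import date, datetime
from typing import Optional

# "December 23, 2022" style
LONG_DATE = re.compile(
    r"(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}"
)
# "12/22/2022" style
NUMERIC_DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def extract_date(text: str) -> Optional[date]:
    """Return the first date found in a TrackSummary/TrackDetail string, or None."""
    for pattern, fmt in ((LONG_DATE, "%B %d, %Y"), (NUMERIC_DATE, "%m/%d/%Y")):
        match = pattern.search(text)
        if match:
            try:
                return datetime.strptime(match.group(), fmt).date()
            except ValueError:
                # looked like a date but is not a valid one (e.g. 13/45/2022)
                continue
    return None


print(extract_date("In Transit to Next Facility, 12/22/2022, 9:41 pm"))
print(extract_date("Departed Post Office, December 21, 2022, 4:34 pm, DALLASTOWN, PA 17313"))
```

Because the patterns only match complete dates, the ZIP code does not need to be stripped first.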
Sources:
xml.etree.ElementTree does a good job of finding a child element by XPath. It provides only limited support for XPath expressions for locating elements in a tree, but that is enough to find the TrackSummary data.
To find the TrackSummary element anywhere below the root:
root.find(".//TrackSummary").text ->
Your item departed our WEST PALM BEACH FL DISTRIBUTION CENTER destination facility on December 23, 2022 at 12:40 pm. The item is currently in transit to the destination.
This Python demo extracts the date and time from that text:
import xml.etree.ElementTree as ET
import datetime
document = """\
<?xml version="1.0" encoding="UTF-8"?>
<TrackResponse>
<TrackInfo ID="9405511206213782679396">
<TrackSummary>Your item departed our WEST PALM BEACH FL DISTRIBUTION CENTER destination facility on December 23, 2022 at 12:40 pm. The item is currently in transit to the destination.</TrackSummary>
<TrackDetail>Arrived at USPS Regional Facility, December 23, 2022, 4:49 am, WEST PALM BEACH FL DISTRIBUTION CENTER</TrackDetail>
<TrackDetail>In Transit to Next Facility, 12/22/2022, 9:41 pm</TrackDetail>
<TrackDetail>In Transit to Next Facility, 12/22/2022, 1:36 pm</TrackDetail>
<TrackDetail>Departed USPS Facility, 12/22/2022, 5:58 am, HARRISBURG, PA 17112</TrackDetail>
<TrackDetail>Arrived at USPS Regional Origin Facility, 12/21/2022, 10:12 pm, HARRISBURG PA PACKAGE SORTING CENTER</TrackDetail>
<TrackDetail>Departed Post Office, December 21, 2022, 4:34 pm, DALLASTOWN, PA 17313</TrackDetail>
<TrackDetail>USPS picked up item, December 21, 2022, 2:37 pm, DALLASTOWN, PA 17313</TrackDetail>
<TrackDetail>Shipping Label Created, USPS Awaiting Item, December 21, 2022, 2:16 pm, DALLASTOWN, PA 17313</TrackDetail>
</TrackInfo>
</TrackResponse>
"""


def find_between(s, first, last):
    """Return the part of s between the markers first and last, or "" if either is missing."""
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""


root = ET.fromstring(document)
# the summary reads "... on December 23, 2022 at 12:40 pm. ..."
summary_text = root.find(".//TrackSummary").text
date_time_obj = datetime.datetime.strptime(
    find_between(summary_text, ' on ', '.'), '%B %d, %Y at %I:%M %p'
)
print('Date:', date_time_obj.date())
print('Time:', date_time_obj.time())
print('Date-time:', date_time_obj)
Result
$ python track-summary.py
Date: 2022-12-23
Time: 12:40:00
Date-time: 2022-12-23 12:40:00
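The same ElementTree approach extends to the <TrackDetail> elements. The sketch below collects the summary and detail texts per tracking number, in document order; `track_events` and `SAMPLE` are hypothetical names, and the XML is a shortened copy of the response from the question.

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
<TrackResponse>
  <TrackInfo ID="9405511206213782679396">
    <TrackSummary>Your item departed our WEST PALM BEACH FL DISTRIBUTION CENTER destination facility on December 23, 2022 at 12:40 pm. The item is currently in transit to the destination.</TrackSummary>
    <TrackDetail>In Transit to Next Facility, 12/22/2022, 9:41 pm</TrackDetail>
    <TrackDetail>Departed Post Office, December 21, 2022, 4:34 pm, DALLASTOWN, PA 17313</TrackDetail>
  </TrackInfo>
</TrackResponse>
"""


def track_events(xml_text):
    """Return {tracking_id: [summary_text, detail_text, ...]} in document order."""
    root = ET.fromstring(xml_text)
    events = {}
    for info in root.iter("TrackInfo"):
        texts = []
        summary = info.findtext("TrackSummary")
        if summary:
            texts.append(summary.strip())
        # findall returns the direct TrackDetail children in document order
        texts.extend(d.text.strip() for d in info.findall("TrackDetail") if d.text)
        events[info.get("ID")] = texts
    return events


for tracking_id, texts in track_events(SAMPLE).items():
    print(tracking_id, len(texts), "entries")
```

Each string can then be passed to whichever date parser you settle on.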
Based on your updated question (Temporary Solution, 12/25/22), I added a parsing step that uses the re library.
Code
import re
from datetime import date, time

MONTHS = ['January', 'February', 'March', 'April', 'May', 'June',
          'July', 'August', 'September', 'October', 'November', 'December']


def get_date(date_string):
    """Return the first "Month D, YYYY" date found in date_string, or None."""
    pattern = re.compile(r'(January|February|March|April|May|June|July|August|September|October|November|December)\s(\d{2}|\d{1})\,\s(\d{4})')
    match = re.search(pattern, date_string)
    if not match:
        return None
    month_name, day, year = match.groups()
    month = MONTHS.index(month_name) + 1
    try:
        return date(int(year), month, int(day))
    except ValueError:
        return None  # or handle error in a different way


def get_hour_min(hour, minute, am_pm):
    """Convert 12-hour clock parts to [hour, minute] on the 24-hour clock."""
    hour = int(hour)
    minute = int(minute)
    if am_pm == 'pm' and hour != 12:
        hour += 12
    elif am_pm == 'am' and hour == 12:
        hour = 0  # 12:xx am is just after midnight
    return [hour, minute]


def get_time(date_string):
    """Return [start, end]; end is None unless the text holds a time range."""
    pattern = re.compile(r'(\d{2}|\d{1})\:(\d{2})\s*(am|pm)')
    matches = re.findall(pattern, date_string)
    if len(matches) == 2:
        # two times, e.g. "Between 9:45am and 1:45pm"
        hour, minute = get_hour_min(*matches[0])
        start_t = time(hour, minute, 0)
        hour, minute = get_hour_min(*matches[1])
        end_t = time(hour, minute, 0)
        return [start_t, end_t]
    match = re.search(pattern, date_string)
    if not match:
        t = None
    else:
        hour, minute = get_hour_min(*match.groups())
        try:
            t = time(hour, minute, 0)
        except ValueError:
            t = None  # or handle error in a different way
    return [t, None]


TRACK_SUMMARIES = {
"delivered": """Your
item was delivered in or at the mailbox at 10:23 am on December 24, 2022 in HOBE SOUND, FL 33455.""",
"out_for_delivery": "Out for Delivery, December 13, 2021, 6:10 am, ARLINGTON, VA 22204.",
"out_for_delivery_no_date": "Out for Delivery, Expected Delivery Between 9:45am and 1:45pm",
"arrived_at_post_office": """Arrived at Post Office,
Arrived at USPS Regional Origin Facility, December 11, 2021, 9:23 pm, HARRISBURG PA PACKAGE SORTING CENTER""",
"acceptance": "Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313",
"pre_shipment": "Pre-Shipment Info Sent to USPS, USPS Awaiting Item, December 27, 2021",
"rts": """Your item was returned to the sender on January 31, 2022 at 9:14 am in YORK, PA 17402
because of an incorrect address.""",
"no_info": "The Postal Service could not locate the tracking information for your request",
"label_prepared": "A shipping label has been prepared for your item at 10:47 am on December 16, 2021 in WINSTON",
"forwarded": """Your item was forwarded to a different address at 5:13 pm on January 4, 2022
in REDDING, CA. This was because of forwarding instructions or because the
address or ZIP Code on the label was incorrect.
""",
}

tracks = {}
# parse each summary and store the results by key (for example: delivered, out_for_delivery and so on)
for key in TRACK_SUMMARIES:
    value = TRACK_SUMMARIES[key].replace("\n", "")
    found_date = get_date(value)
    start_time, end_time = get_time(value)
    tracks[key] = [found_date, start_time, end_time, value]
    # print(key, '->', value)
    # if found_date is not None:
    #     print('found date: ' + found_date.strftime("%m/%d/%Y"))
    # if start_time is not None:
    #     if end_time is None:
    #         print('time: ' + start_time.strftime("%H:%M:%S"))
    #     else:
    #         print('start time: ' + start_time.strftime("%H:%M:%S") + ' end time: ' + end_time.strftime("%H:%M:%S"))
    # print('=========================================================================')

# read the stored results back by key (tracks['delivered'], tracks['out_for_delivery'] and so on)
for key in tracks.keys():
    found_date, start_time, end_time, value = tracks[key]
    found_date = found_date.strftime("%m/%d/%Y") if found_date is not None else None
    start_time = start_time.strftime("%H:%M:%S") if start_time is not None else None
    end_time = end_time.strftime("%H:%M:%S") if end_time is not None else None
    print(value)
    print(key)
    if found_date is not None:
        print('found date: ' + found_date)
    if start_time is not None:
        if end_time is None:
            print('time: ' + start_time)
        else:
            print('start time: ' + start_time + ' end time: ' + end_time)
    print('------------------------------------------------------------------------')
Result
$ python reg-express.py
Your item was delivered in or at the mailbox at 10:23 am on December 24, 2022 in HOBE SOUND, FL 33455.
delivered
found date: 12/24/2022
time: 10:23:00
------------------------------------------------------------------------
Out for Delivery, December 13, 2021, 6:10 am, ARLINGTON, VA 22204.
out_for_delivery
found date: 12/13/2021
time: 06:10:00
------------------------------------------------------------------------
Out for Delivery, Expected Delivery Between 9:45am and 1:45pm
out_for_delivery_no_date
start time: 09:45:00 end time: 13:45:00
------------------------------------------------------------------------
Arrived at Post Office, Arrived at USPS Regional Origin Facility, December 11, 2021, 9:23 pm, HARRISBURG PA PACKAGE SORTING CENTER
arrived_at_post_office
found date: 12/11/2021
time: 21:23:00
------------------------------------------------------------------------
Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313
acceptance
found date: 12/10/2021
time: 12:54:00
------------------------------------------------------------------------
Pre-Shipment Info Sent to USPS, USPS Awaiting Item, December 27, 2021
pre_shipment
found date: 12/27/2021
------------------------------------------------------------------------
Your item was returned to the sender on January 31, 2022 at 9:14 am in YORK, PA 17402 because of an incorrect address.
rts
found date: 01/31/2022
time: 09:14:00
------------------------------------------------------------------------
The Postal Service could not locate the tracking information for your request
no_info
------------------------------------------------------------------------
A shipping label has been prepared for your item at 10:47 am on December 16, 2021 in WINSTON
label_prepared
found date: 12/16/2021
time: 10:47:00
------------------------------------------------------------------------
Your item was forwarded to a different address at 5:13 pm on January 4, 2022 in REDDING, CA. This was because of forwarding instructions or because the address or ZIP Code on the label was incorrect.
forwarded
found date: 01/04/2022
time: 17:13:00
------------------------------------------------------------------------
These are the date and time patterns I extracted from your TRACK_SUMMARIES dictionary. Some lines have no date, and some have a time range ("Between ... and ..."):
10:23 am on December 24, 2022
December 13, 2021, 6:10 am
Between 9:45am and 1:45pm
December 10, 2021, 12:54 pm
December 27, 2021
January 31, 2022 at 9:14 am
at 10:47 am on December 16, 2021
at 5:13 pm on January 4, 2022
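(January|February|March|April|May|June|July|August|September|October|November|December)\s(\d{2}|\d{1})\,\s(\d{4})
Matches the date, with capture groups for month, day and year; this is the pattern used in the code.
(\d{2}|\d{1})\:(\d{2})\s*(am|pm)
Matches the time, with capture groups for hour, minute and am/pm; this is the pattern used in the code.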
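To see what the two patterns above capture, here is a small check against two of the summaries from the question. The names `DATE_PATTERN` and `TIME_PATTERN` are only for this illustration.

```python
import re

DATE_PATTERN = re.compile(
    r"(January|February|March|April|May|June|July|August|September|October|November|December)"
    r"\s(\d{2}|\d{1})\,\s(\d{4})"
)
TIME_PATTERN = re.compile(r"(\d{2}|\d{1})\:(\d{2})\s*(am|pm)")

acceptance = "Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313"
print(DATE_PATTERN.search(acceptance).groups())  # ('December', '10', '2021')
print(TIME_PATTERN.findall(acceptance))          # [('12', '54', 'pm')]

window = "Out for Delivery, Expected Delivery Between 9:45am and 1:45pm"
print(DATE_PATTERN.search(window))               # None
print(TIME_PATTERN.findall(window))              # [('9', '45', 'am'), ('1', '45', 'pm')]
```

The second summary has no date but two times, which is why get_time treats exactly two matches as a start/end range.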
References
Find string between two substrings