I have a table with address1, city, state, and postal code. However, some address1 will also contains city, state and postal code (separated by either comma or space or both). Example:
Address1: 9999 western Rd, Los Angeles, CA, 90001
City: Los Angeles
State: CA
Postal: 90001
I want Address1 to be: 9999 western Rd
I've tried the following SQL to fix those addresses: (To simplify here I assume all fields are not null or empty, in the real system for countries without states it will be either empty or same as the country name.)
SELECT LEFT(address1, PATINDEX('%[, ]'+city+'%', billingAddress) - 1)
FROM addresses
WHERE address1 like '%[, ]'+city+'%'+state+'%'+postal_code+'%'
AND PATINDEX('%[, ]'+City+'%', address1) < 12
The problem here is many street names include the city they are in. For example, with 9999 KIRKLAND WAY
in city KIRKLAND
, this query will cause the remaining street name from this query to be 9999
.
Is there anyway to solve this problem in SQL?
First, I will agree with other comments that there is no flawless system - working with addresses can be a giant pain. That being said...
If your Address1 values are sanitized enough to only use commas to separate the various address parts, you might try using string_split to break it apart and then filter out all but the street information. If it's not that clean, maybe this can at least be a starting point. And this definitely doesn't comprehensively cover international information or any other weirdness that may crop up.
The fiddle example provided below uses some dummy addresses where I've overpopulated a few of the Address1 values and then filter them down to streets with string_split.
CREATE TABLE [dbo].[addresses](
[Address1] [nvarchar](max) NULL,
[City] [nvarchar](max) NULL,
[State] [char](2) NULL,
[Postal] [varchar](20) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
INSERT [dbo].[addresses] ([Address1], [City], [State], [Postal]) VALUES (N'204 Cowley Avenue', N'Corpus Christi', N'FL', N'94075-8586')
GO
INSERT [dbo].[addresses] ([Address1], [City], [State], [Postal]) VALUES (N'273 Green Milton Road', N'Albuquerque', N'ND', N'46038-4384')
GO
INSERT [dbo].[addresses] ([Address1], [City], [State], [Postal]) VALUES (N'662 Clarendon Street', N'Detroit', N'RI', N'27706-7936')
GO
INSERT [dbo].[addresses] ([Address1], [City], [State], [Postal]) VALUES (N'57 Oak Boulevard', N'St. Petersburg', N'UT', N'25607-1143')
GO
INSERT [dbo].[addresses] ([Address1], [City], [State], [Postal]) VALUES (N'53 South Rocky Fabien Blvd., Des Moines, IL, 53316-8523', N'Des Moines', N'IL', N'53316-8523')
GO
INSERT [dbo].[addresses] ([Address1], [City], [State], [Postal]) VALUES (N'38 Shreveport Way, Shreveport, NC, 72394-4783', N'Shreveport', N'NC', N'72394-4783')
GO
INSERT [dbo].[addresses] ([Address1], [City], [State], [Postal]) VALUES (N'938 South First Road', N'Arlington', N'MN', N'83467-8269')
GO
INSERT [dbo].[addresses] ([Address1], [City], [State], [Postal]) VALUES (N'674 South Cowley Avenue, Grand Rapids, KY, 51374-4173', N'Grand Rapids', N'KY', N'51374-4173')
GO
SELECT
LTRIM(RTRIM(t.value)) AS fixedAddress1,
a.* from addresses a
CROSS APPLY string_split(a.Address1, ',') t
WHERE LTRIM(RTRIM(t.value)) <> a.City
AND LTRIM(RTRIM(t.value)) <> a.State
AND LTRIM(RTRIM(t.value)) <> a.Postal