Been banging my head at this for a few trying to come up with a clever way to regex (or if someone has a better idea) a size value from a very random string. I've setup the below test to show what I'm working with, I'm trying to get the Final output in this statement to be like these in Green, and not these in Red by using the @pattern in the query to pull out the...
Number x Number
Hoping there's a regex or string guru out there that might have an idea for me :)
DECLARE @pattern AS VARCHAR(100)
SET @pattern = '%[0-9][0-9. x][0-9.x ][0-9. x]%'
BEGIN
WITH cte AS (
SELECT 'Italy Terrazzo Sacra Nero 24x24 Honed' [Name]
UNION
SELECT 'Nero Marquina 1x3 Herringbone' [Name]
UNION
SELECT 'Myorka Blue 2x8' [Name]
UNION
SELECT 'Chrysler Driftwood Hickory Sea 12mil Wear Layer Glue Down 6.3x48.4' [Name]
UNION
SELECT 'Myorka Blue 2x8' [Name]
UNION
SELECT 'Nero Marquina 1x3 Herringbone' [Name]
UNION
SELECT 'Broadway Lvt Ash 9x72 Rigid Core Click- 6.0mm/28mil Wear Layer' [Name]
UNION
SELECT 'Minetta Concreto Pearl 2.5mm/28mil Wear Layer Glue Down 18x36' [Name]
UNION
SELECT 'Speak Leather Black 24 x 24' [Name]
UNION
SELECT 'Accent Montana White 12 x 36 Glossy' [Name]
)
SELECT
--FULL NAME--
[Name]
--HELPERS TO SEE WHAT I'M TESTING--
,PATINDEX('%[0-9]%x%', [Name]) AS [START]
,SUBSTRING([Name], PATINDEX('%[0-9]%x%', [Name]), LEN([Name]))
,REVERSE([Name]) AS [REVERSE]
,LEN([Name]) AS [STRLenght]
,PATINDEX(@pattern, REVERSE([Name])) + 1
,LEN([Name]) - PATINDEX('%[0-9][^A-z]x%', REVERSE([Name])) + 1 AS [END]
--FULL CALCULATION FOR FINAL OUTPUT--
,CASE WHEN [Name] LIKE '%[0-9] x [0-9]%'
THEN SUBSTRING([Name], PATINDEX('%[0-9]%x%', [Name]), (LEN([Name]) - PATINDEX('%[0-9]%x%', REVERSE([Name])) + 1) - PATINDEX('%[0-9]%x%', [Name]) + 1)
WHEN [Name] LIKE '%[0-9]x[0-9]%'
THEN REPLACE(SUBSTRING([Name], PATINDEX(@pattern, [Name]), (LEN([Name]) - PATINDEX(@pattern, REVERSE([Name])) + 1) - PATINDEX(@pattern, [Name]) + 1), 'x', ' x ')
ELSE NULL
END AS [Final]
FROM cte
END
I'm still trying to make sense of the query from Yitzhak has suggested. There are still a few outliers I'm trying to attack by changing your
,c.query('
for $x in /root/r[lower-case(text()[1])="x"]
let $pos := count(root/r[. << $x]) + 1
let $before := /root/r[$pos - 1]
,$twobefore := /root/r[$pos - 2]
,$after := /root/r[$pos + 1]
return
if (xs:decimal($before[1]) instance of xs:decimal and
xs:decimal($after[1]) instance of xs:decimal and
xs:string($twobefore[1]) instance of xs:string)
then data(($before, $x, $after))
else if (xs:decimal($before[1]) instance of xs:decimal and
xs:decimal($after[1]) instance of xs:decimal and
xs:decimal($twobefore[1]) instance of xs:decimal)
then data(($twobefore, $before, $x, $after))
else()
').value('text()[1]', 'VARCHAR(20)') AS result
This was in an attempt to attack this outlier case but doesn't seem to be getting the $twobefore which what I thought would get the 1 instead of missing it.
Baroque Crackled 1 3/4 X 6 Chair Rail Blanco
Please try the following solution based on tokenization.
It is leveraging SQL Server built-in XML and XQuery functionality.
Notable points:
CROSS APPLY
is tokenizing input string as XML.[lower-case(text()[1])="x"]
./root/r[$pos - 1]
and /root/r[$pos + 1]
XPath predicates get
preceding and following tokens.where
clause is checking for a decimal data type for at least one of
the surrounding values.SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, tokens VARCHAR(256));
INSERT @tbl (tokens) VALUES
('Italy Terrazzo Sacra Nero 24x24 Honed'),
('Nero Marquina 1x3 Herringbone'),
('Myorka Blue 2x8'),
('Chrysler Driftwood Hickory Sea 12mil Wear Layer Glue Down 6.3x48.4'),
('Nero Marquina 1x3 Herringbone'),
('Broadway Lvt Ash 9x72 Rigid Core Click- 6.0mm/28mil Wear Layer'),
('Minetta Concreto Pearl 2.5mm/28mil Wear Layer Glue Down 18x36'),
('Speak Leather Black 24 x 24'),
('Accent Montana White 12 x 36 Glossy'),
('Baroque Crackled 1/2 X 6 Pencil Capri'),
('Banza Green Cielo 8" Hex'),
('Masia Torello Nero Brillo 3/4x12 Pencil'),
('Baroque Crackled 1 3/4 X 6 Chair Rail Blanco');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = SPACE(1);
SELECT t.*
, c.query('
for $x in /root/r[lower-case(text()[1])="x"]
let $pos := count(root/r[. << $x]) + 1
let $before2 := /root/r[$pos - 2],
$before := /root/r[$pos - 1],
$after := /root/r[$pos + 1]
where xs:decimal($before[1]) instance of xs:decimal
or xs:decimal($after[1]) instance of xs:decimal
return data((if (xs:decimal($before2[1]) instance of xs:decimal) then $before2 else ()
, $before, $x, $after))
').value('text()[1]', 'VARCHAR(20)') AS result
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' +
REPLACE(REPLACE(REPLACE(tokens,'x', ' x '),SPACE(2),SPACE(1)), @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t1(c)
ORDER BY id;
Output
id | tokens | result |
---|---|---|
1 | Italy Terrazzo Sacra Nero 24x24 Honed | 24 x 24 |
2 | Nero Marquina 1x3 Herringbone | 1 x 3 |
3 | Myorka Blue 2x8 | 2 x 8 |
4 | Chrysler Driftwood Hickory Sea 12mil Wear Layer Glue Down 6.3x48.4 | 6.3 x 48.4 |
5 | Nero Marquina 1x3 Herringbone | 1 x 3 |
6 | Broadway Lvt Ash 9x72 Rigid Core Click- 6.0mm/28mil Wear Layer | 9 x 72 |
7 | Minetta Concreto Pearl 2.5mm/28mil Wear Layer Glue Down 18x36 | 18 x 36 |
8 | Speak Leather Black 24 x 24 | 24 x 24 |
9 | Accent Montana White 12 x 36 Glossy | 12 x 36 |
10 | Baroque Crackled 1/2 X 6 Pencil Capri | 1/2 x 6 |
11 | Banza Green Cielo 8" Hex | NULL |
12 | Masia Torello Nero Brillo 3/4x12 Pencil | 3/4 x 12 |
13 | Baroque Crackled 1 3/4 X 6 Chair Rail Blanco | 1 3/4 x 6 |