sql-serverfor-xml-path

How to add distinct to a string concat via for xml type query


I have a query which works well except that it doesn't return unique values, but repeats some.

If the columns of the query were independent and not part of a string concatenation I am able to add distinct as long as I also add each item in the select to the order by clause, which is what you would expect.

The complication comes when I use a concatenated string (as presented here) because, by design, if I add a distinct to the select, SQL Server will complain that one needs to include each order by item in the select and vice versa, but how does one do that when a string concat via for xml is being done? That is where I am stuck.

I tried various incantations and haven't landed on anything that works with the string concatenation scheme I have presented here.

Unfortunately I must return the data concatenated as presented here as that data must appear all in one column which is later consumed by some SQLCLR code.

So again, the select needs a distinct keyword, but in this current form, SQL Server will need some adjustments before it will accept it.

(Also note that the code is part of a larger query so the alias and the ending comma can be removed and a select added to the beginning and @Warehouses and s.itemcode can be declared as variables in a test environment which is how I've tested various [failed] approaches.)

My sense is that some SQL Server XML skills might be needed or turning this into a subquery with an outer query that somehow picks up the now unique subquery columns and concatenates them.

Any ideas?

coalesce(stuff((select '; ' + convert(nvarchar, t0.docnum) + ': '
                    + convert(nvarchar, t0.duedate, 101) + ', '
                    + convert(nvarchar, convert(int, t0.plannedqty))
                from owor t0
                inner join wor1 t1 on t0.docentry = t1.docentry
                where t0.status in ('P','R')
                  and t1.warehouse in (select value from string_split(@Warehouses,','))
                  and t0.itemcode = s.itemcode
                order by t0.duedate
                for xml path(''), type).value('text()[1]', 'nvarchar(max)'), 1, 2, N''), N'') as p_pros_data,

Solution

  • distinct is in any case often less efficient when used on calculated columns, as it cannot use indexes.

    So just use a group by instead

    coalesce(
      stuff((
        select '; ' + convert(nvarchar(30), o.docnum) + ': '
                    + convert(nvarchar(30), o.duedate, 101) + ', '
                    + convert(nvarchar(30), convert(int, o.plannedqty))
        from owor o
        inner join wor1 w1 on o.docentry = w1.docentry
        where o.status in ('P','R')
          and w1.warehouse in (select value from string_split(@Warehouses,','))
          and o.itemcode = s.itemcode
        group by
          o.docnum,
          o.duedate,
          o.plannedqty
        order by
          o.duedate
        for xml path(''), type
        ).value('text()[1]', 'nvarchar(max)'), 1, 2, N''
      ),
      N'') as p_pros_data,
    

    If you have SQL Server 2017+ available, you can use the much more efficient STRING_AGG function. You still need to group separately, because there is no DISTINCT modifier, it's also more efficient as mentioned.

    coalesce(
      (
        select string_agg(convert(nvarchar(30), o.docnum) + ': '
                    + convert(nvarchar(30), o.duedate, 101) + ', '
                    + convert(nvarchar(30), convert(int, o.plannedqty)),
               '; ')
                   within group (order by t0.duedate)
        from (
            select
              o.docnum,
              o.duedate,
              o.plannedqty
            from owor o
            inner join wor1 w1 on o.docentry = w1.docentry
            where o.status in ('P','R')
              and w1.warehouse in (select value from string_split(@Warehouses,','))
              and o.itemcode = s.itemcode
            group by
              o.docnum,
              o.duedate,
              o.plannedqty
        ) o
      ),
      N'') as p_pros_data,