sqlrsql-servermicrosoft-machine-learning-server

Query for R Machine Learning Services - Filtering Categories in Where Clause


I am trying to run a query in SQL Server using the Machine Learning Services extension.

So this is the problem I am having;

DECLARE @query NVARCHAR(MAX) = '
SELECT CATEGORY1,CATEGORY2
FROM TABLE
WHERE CATEGORY1 = 'A'
AND CATEGORY2 = 'B'
'
EXEC sp_execute_external_script @language = N'R',
@script = N'data <- Rscript'
,@input_data_1 = @query
,@input_data_1_name= N'data'

You see I am creating a variable with a string value, which is the SQL Query. This query is then passed into the stored procedure that runs R in SQL Server. The problem is the WHERE clause in the query. I have string values specified and it breaks the query because now I have Apostrophes within Apostrophes.

I have tried to replace the apostrophes with different characters, but I cannot get this to work. Is establishing string values in the query possible in this fashion?


Solution

  • When debugging these, I find it best to declare the query and just print the results, rather than just execute it.

    If you want to reference an apostrophe within a string, you actually need to put in two apostrophes e.g.,

    DECLARE @query NVARCHAR(MAX) = '
    SELECT CATEGORY1,CATEGORY2
    FROM TABLE
    WHERE CATEGORY1 = ''A''
    AND CATEGORY2 = ''B''
    '
    

    Now, if A and B are actually variables or other data sources, you often cannot just refer to them as their variable names within the dynamic SQL, you need to add it as text e.g.,

    DECLARE @query NVARCHAR(MAX) = '
    SELECT CATEGORY1,CATEGORY2
    FROM TABLE
    WHERE CATEGORY1 = ''' + @A + '''
    AND CATEGORY2 = ''' + @B + '''
    '
    

    There's one last bridge - if @A or @B can have apostrophes in them. That becomes awkward.

    Basically, if it was written as normal SQL (not as an nvarchar variable), and assuming @A was "Bob's" and @B is "Peter's", you'd want the output to be

    SELECT CATEGORY1,CATEGORY2
    FROM TABLE
    WHERE CATEGORY1 = 'Bob''s'
    AND CATEGORY2 = 'Peter''s'
    

    However, this means when creating the variable version, you'd need to replace every apostrophe with two apostrophes. However, that means you'd need to replace the apostrophes in the categories with double apostrophes - and that your REPLACE function, as it's referring to apostrophes, needs to use double apostophes e.g.,

    DECLARE @query NVARCHAR(MAX) = '
    SELECT CATEGORY1,CATEGORY2
    FROM TABLE
    WHERE CATEGORY1 = ''' + REPLACE(@A,'''','''''') + '''
    AND CATEGORY2 = ''' + REPLACE(@B,'''','''''') + '''
    '
    

    Here's a DB<>fiddle with examples.