pagination linq-to-sql sql-order-by

Linq to SQL returns different results when using TOP and BETWEEN


I'm using Linq to SQL (in fact it's Dynamic Linq to SQL, which allows you to pass strings at runtime for where clauses, orderby etc.). However, I'm getting different results, and it seems to depend on whether the underlying T-SQL uses the TOP keyword or BETWEEN.

I've tried to break the problem down into a small example. Here's the scenario:

I'm using the repository pattern, with the following method that simply joins two tables with a left outer join:

    public IQueryable<TestGalleryViewModel> FetchGalleryItems()
    {
        var galleryItems = from painting in Gallery
                           join artist in Artists
                               on painting.ArtistID equals artist.ArtistID
                               into paintingArtists
                           from artist in paintingArtists.DefaultIfEmpty()
                           select new TestGalleryViewModel
                           {
                               Id = painting.PaintingID,
                               ArtistName = artist == default(Artist) ? "" : artist.Surname + " " + artist.Forenames,
                           };

        return galleryItems;
    }
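To experiment with the join semantics without a database, the left outer join in FetchGalleryItems can be reproduced with LINQ to Objects. This is a minimal sketch: the anonymous-type rows and their IDs (including a hypothetical painting 900 whose ArtistID has no matching artist) are invented stand-ins for the Gallery and Artists tables. LINQ to Objects is not LINQ to SQL, so this shows only what join ... into plus DefaultIfEmpty() means, not the T-SQL that gets generated.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for the Gallery and Artists tables.
var gallery = new[]
{
    new { PaintingID = 176, ArtistID = 1 },
    new { PaintingID = 620, ArtistID = 1 },
    new { PaintingID = 900, ArtistID = 42 } // no matching artist row
};
var artists = new[]
{
    new { ArtistID = 1, Surname = "ADAMS", Forenames = "Charles James" }
};

// Left outer join: a group join (join ... into) followed by DefaultIfEmpty(),
// which yields null for a painting that has no matching artist.
var galleryItems =
    (from painting in gallery
     join artist in artists on painting.ArtistID equals artist.ArtistID into paintingArtists
     from match in paintingArtists.DefaultIfEmpty()
     select new
     {
         Id = painting.PaintingID,
         ArtistName = match == null ? "" : match.Surname + " " + match.Forenames
     }).ToList();

foreach (var item in galleryItems)
    Console.WriteLine($"{item.Id} '{item.ArtistName}'");
```

In LINQ to SQL the same null check is what becomes the CASE WHEN [t2].[test] IS NULL expression in the generated query.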

I then have a little test method that uses the FetchGalleryItems method:

        var query = repository.FetchGalleryItems().Where("ArtistName.Contains(\"Adams Charles James\")");

        var orderedlist = query.OrderBy("ArtistName asc");
        var page1 = orderedlist.Skip(0).Take(5);
        var page2 = orderedlist.Skip(5).Take(5);

The orderedlist query returns the following values:

176 ADAMS Charles James
620 ADAMS Charles James
621 ADAMS Charles James
660 ADAMS Charles James
683 ADAMS Charles James
707 ADAMS Charles James
735 ADAMS Charles James
739 ADAMS Charles James
740 ADAMS Charles James
741 ADAMS Charles James

This is what I would expect. However, page1 contains:

707 ADAMS Charles James
683 ADAMS Charles James
660 ADAMS Charles James
621 ADAMS Charles James
620 ADAMS Charles James

As you can see, these are NOT the first 5 items. Page2 contains:

707 ADAMS Charles James
735 ADAMS Charles James
739 ADAMS Charles James
740 ADAMS Charles James
741 ADAMS Charles James

This is what I would expect: items 6 to 10. Note, though, that 707 now appears on both pages, and 176 appears on neither.

The underlying T-SQL for page1 is

SELECT TOP (5) [t3].[PaintingID] AS [Id], [t3].[value] AS [ArtistName]
FROM (
    SELECT [t0].[PaintingID], 
        (CASE 
            WHEN [t2].[test] IS NULL THEN CONVERT(NVarChar(101),'')
            ELSE ([t2].[Surname] + ' ') + [t2].[Forenames]
         END) AS [value]
    FROM [dbo].[Gallery] AS [t0]
    LEFT OUTER JOIN (
        SELECT 1 AS [test], [t1].[ArtistID], [t1].[Surname], [t1].[Forenames]
        FROM [dbo].[Artists] AS [t1]
        ) AS [t2] ON [t0].[ArtistID] = ([t2].[ArtistID])
    ) AS [t3]
WHERE [t3].[value] LIKE '%Adams Charles James%'
ORDER BY [t3].[value]

Notice it's using TOP(5)

The underlying T-SQL for page2 is

SELECT [t4].[PaintingID] AS [Id], [t4].[value] AS [ArtistName]
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY [t3].[value], [t3].[Surname], [t3].[Forenames]) AS [ROW_NUMBER], [t3].[PaintingID], [t3].[value]
    FROM (
        SELECT [t0].[PaintingID], 
            (CASE 
                WHEN [t2].[test] IS NULL THEN CONVERT(NVarChar(101),'')
                ELSE ([t2].[Surname] + ' ') + [t2].[Forenames]
             END) AS [value], [t2].[Surname], [t2].[Forenames]
        FROM [dbo].[Gallery] AS [t0]
        LEFT OUTER JOIN (
            SELECT 1 AS [test], [t1].[ArtistID], [t1].[Surname], [t1].[Forenames]
            FROM [dbo].[Artists] AS [t1]
            ) AS [t2] ON [t0].[ArtistID] = ([t2].[ArtistID])
        ) AS [t3]
    WHERE [t3].[value] LIKE '%Adams Charles James%'
    ) AS [t4]
WHERE [t4].[ROW_NUMBER] BETWEEN 5 + 1 AND 5 + 5
ORDER BY [t4].[ROW_NUMBER]

Notice it's using BETWEEN

When I paste the T-SQL commands into SQL Server Management Studio Express, I get the results I've described. If I take the page2 T-SQL and change the line

 WHERE [t4].[ROW_NUMBER] BETWEEN 5 + 1 AND 5 + 5

to be

WHERE [t4].[ROW_NUMBER] BETWEEN 1 AND 5

I get the results I was expecting for page1, i.e. the first 5 items:

176 ADAMS Charles James
620 ADAMS Charles James
621 ADAMS Charles James
660 ADAMS Charles James
683 ADAMS Charles James

So, in a nutshell, when the T-SQL uses BETWEEN instead of TOP, I get the results I expected.
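The duplicated 707 and the missing 176 can be reproduced without a database if you assume that the TOP query and the ROW_NUMBER query return tied rows in different orders. In this minimal sketch every row has the same ArtistName, so any order is a valid result of sorting by ArtistName. The two physical read orders are hypothetical: the one labelled as the TOP plan is invented purely to reproduce the pages shown above, not taken from SQL Server. Take(5) stands in for TOP (5), and Skip(5).Take(5) for ROW_NUMBER BETWEEN 6 AND 10.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const string Name = "ADAMS Charles James";

// Hypothetical physical read orders for the two query plans. Both are valid
// outcomes of ORDER BY ArtistName, because every row has the same ArtistName.
int[] rowNumberPlanIds = { 176, 620, 621, 660, 683, 707, 735, 739, 740, 741 };
int[] topPlanIds       = { 707, 683, 660, 621, 620, 176, 735, 739, 740, 741 };

// Enumerable.OrderBy is a stable sort, so tied rows keep their input order.
List<int> SortByName(int[] ids) =>
    ids.Select(id => (Id: id, ArtistName: Name))
       .OrderBy(r => r.ArtistName)
       .Select(r => r.Id)
       .ToList();

var page1 = SortByName(topPlanIds).Take(5).ToList();               // stands in for TOP (5)
var page2 = SortByName(rowNumberPlanIds).Skip(5).Take(5).ToList(); // stands in for BETWEEN 6 AND 10

var onBothPages = page1.Intersect(page2).ToList();
var onNeitherPage = rowNumberPlanIds.Except(page1.Concat(page2)).ToList();

Console.WriteLine("page1: " + string.Join(", ", page1));                // 707, 683, 660, 621, 620
Console.WriteLine("page2: " + string.Join(", ", page2));                // 707, 735, 739, 740, 741
Console.WriteLine("on both pages: " + string.Join(", ", onBothPages));  // 707
Console.WriteLine("on neither page: " + string.Join(", ", onNeitherPage)); // 176
```

Each page, taken on its own, is correctly sorted by ArtistName; the problem only appears when pages are cut from two different tie orders.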

I'm using filtering (where clause), sorting (orderBy) and paging (skip and take) all over my app and need to handle this fairly generically.

Apologies for the long post.

Regards, Simon


Solution

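  • Regardless of how the SQL is generated (LINQ or otherwise), if you ORDER BY a column that has duplicate values, the order of the tied rows is undefined. It can change from one execution to the next, and two differently shaped queries (here the TOP query for page1 and the ROW_NUMBER query for page2) can break the ties differently. That is how a row such as 707 can land on two pages while another row, such as 176, is skipped.

    When you ORDER BY [t3].[value], you are sorting on a column containing many duplicate values; in this example every row has the same value.

    You can test this by running a very simple SQL SELECT with such an ORDER BY from Management Studio. Repeated runs are not guaranteed to return the tied rows in the same order.

    ROW_NUMBER on its own does not fix this: its numbering is only deterministic when the ORDER BY inside OVER (...) is unique. In the page2 query it orders by [t3].[value], [t3].[Surname], [t3].[Forenames], which are identical for all of these rows, so it merely happened to return the order you expected. To get consistent results, add a column that is unique (presumably [PaintingID] here) as the last ORDER BY column. It doesn't matter whether that column has anything to do with your query, just that it's unique.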
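A minimal sketch of that fix in C#, written generically since filtering, sorting and paging are used throughout the app. It uses LINQ to Objects through AsQueryable() so it runs without a database; the rows and the Page helper are hypothetical, and Id is assumed to be unique (as PaintingID presumably is). The idea is simply to apply ThenBy on the unique key after the caller's ordering and before Skip/Take, so every page is cut from the same total order.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical rows: ten paintings that all share the same ArtistName,
// supplied in an arbitrary order.
int[] ids = { 741, 176, 707, 620, 740, 621, 739, 660, 735, 683 };
var items = ids.Select(id => new { Id = id, ArtistName = "ADAMS Charles James" })
               .AsQueryable();

// Hypothetical paging helper: appends a unique key as the final sort column
// before paging, so ties can no longer be broken differently per page.
IQueryable<T> Page<T, TKey>(IOrderedQueryable<T> ordered,
                            Expression<Func<T, TKey>> uniqueKey,
                            int pageIndex, int pageSize) =>
    ordered.ThenBy(uniqueKey).Skip(pageIndex * pageSize).Take(pageSize);

var byName = items.OrderBy(x => x.ArtistName);
var page1 = Page(byName, x => x.Id, 0, 5).Select(x => x.Id).ToList();
var page2 = Page(byName, x => x.Id, 1, 5).Select(x => x.Id).ToList();

Console.WriteLine("page1: " + string.Join(", ", page1)); // 176, 620, 621, 660, 683
Console.WriteLine("page2: " + string.Join(", ", page2)); // 707, 735, 739, 740, 741
```

Against LINQ to SQL, the ThenBy should appear as an extra column in the generated ORDER BY, and in the ROW_NUMBER() OVER (ORDER BY ...) clause for later pages. Since the question builds orderings from strings with Dynamic LINQ, the equivalent there is to append the unique key to the ordering string, e.g. query.OrderBy("ArtistName asc, Id asc"), because the Dynamic LINQ OrderBy accepts a comma-separated list of orderings.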