I'm using LINQ to SQL (in fact, Dynamic LINQ to SQL, which lets you pass strings at runtime for where clauses, order by, etc.), but I'm getting inconsistent results, and it seems to depend on whether the underlying T-SQL uses the TOP keyword or BETWEEN.
I've tried to break the problem down into a small example. Here's the scenario:
I'm using a repository pattern and the following method, which simply joins two tables with a left outer join.
public IQueryable<TestGalleryViewModel> FetchGalleryItems()
{
var galleryItems = from painting in Gallery
join artist in Artists
on painting.ArtistID equals artist.ArtistID
into paintingArtists
from artist in paintingArtists.DefaultIfEmpty()
select new TestGalleryViewModel
{
Id = painting.PaintingID,
ArtistName = artist == default(Artist) ? "" : artist.Surname + " " + artist.Forenames,
};
return galleryItems;
}
I then have a little test method that uses the FetchGalleryItems method:
var query = respository.Test_FetchGalleryItems().Where("ArtistName.Contains(\"Adams Charles James\")");
var orderedlist = query.OrderBy("ArtistName asc");
var page1 = orderedlist.Skip(0).Take(5);
var page2 = orderedlist.Skip(5).Take(5);
orderedlist contains the following underlying values:
176 ADAMS Charles James
620 ADAMS Charles James
621 ADAMS Charles James
660 ADAMS Charles James
683 ADAMS Charles James
707 ADAMS Charles James
735 ADAMS Charles James
739 ADAMS Charles James
740 ADAMS Charles James
741 ADAMS Charles James
Which is what I would expect. But page1 contains
707 ADAMS Charles James
683 ADAMS Charles James
660 ADAMS Charles James
621 ADAMS Charles James
620 ADAMS Charles James
As you can see, that is NOT the first 5 items. page2 contains
707 ADAMS Charles James
735 ADAMS Charles James
739 ADAMS Charles James
740 ADAMS Charles James
741 ADAMS Charles James
Which is what I would expect: items 6 to 10.
The underlying T-SQL for page1 is
SELECT TOP (5) [t3].[PaintingID] AS [Id], [t3].[value] AS [ArtistName]
FROM (
SELECT [t0].[PaintingID],
(CASE
WHEN [t2].[test] IS NULL THEN CONVERT(NVarChar(101),'')
ELSE ([t2].[Surname] + ' ') + [t2].[Forenames]
END) AS [value]
FROM [dbo].[Gallery] AS [t0]
LEFT OUTER JOIN (
SELECT 1 AS [test], [t1].[ArtistID], [t1].[Surname], [t1].[Forenames]
FROM [dbo].[Artists] AS [t1]
) AS [t2] ON [t0].[ArtistID] = ([t2].[ArtistID])
) AS [t3]
WHERE [t3].[value] LIKE '%Adams Charles James%'
ORDER BY [t3].[value]
Notice it's using TOP(5)
The underlying T-SQL for page2 is
SELECT [t4].[PaintingID] AS [Id], [t4].[value] AS [ArtistName]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t3].[value], [t3].[Surname], [t3].[Forenames]) AS [ROW_NUMBER], [t3].[PaintingID], [t3].[value]
FROM (
SELECT [t0].[PaintingID],
(CASE
WHEN [t2].[test] IS NULL THEN CONVERT(NVarChar(101),'')
ELSE ([t2].[Surname] + ' ') + [t2].[Forenames]
END) AS [value], [t2].[Surname], [t2].[Forenames]
FROM [dbo].[Gallery] AS [t0]
LEFT OUTER JOIN (
SELECT 1 AS [test], [t1].[ArtistID], [t1].[Surname], [t1].[Forenames]
FROM [dbo].[Artists] AS [t1]
) AS [t2] ON [t0].[ArtistID] = ([t2].[ArtistID])
) AS [t3]
WHERE [t3].[value] LIKE '%Adams Charles James%'
) AS [t4]
WHERE [t4].[ROW_NUMBER] BETWEEN 5 + 1 AND 5 + 5
ORDER BY [t4].[ROW_NUMBER]
Notice it's using BETWEEN
When I paste the T-SQL commands into SQL Express Management Studio, I get the results I've described. If I take the page2 T-SQL and amend the line
WHERE [t4].[ROW_NUMBER] BETWEEN 5 + 1 AND 5 + 5
to be
WHERE [t4].[ROW_NUMBER] BETWEEN 1 AND 5
I get the results I was expecting for page1, i.e. the first 5 items:
176 ADAMS Charles James
620 ADAMS Charles James
621 ADAMS Charles James
660 ADAMS Charles James
683 ADAMS Charles James
So, in a nutshell, when the T-SQL uses BETWEEN instead of TOP, I get the results I expected.
I'm using filtering (where clause), sorting (orderBy) and paging (skip and take) all over my app and need to handle this fairly generically.
Apologies for the long post.
Regards, Simon
Regardless of how the SQL is generated (LINQ or otherwise), if you ORDER BY a column that has duplicate values, the order of the tied rows is not guaranteed, so you can get different results each time you run the query.
When you ORDER BY [t3].[value] you are sorting on a column containing many duplicate values, so TOP (5) is free to return any five of the tied rows.
You can test this by running a very simple SQL SELECT from Management Studio: the tied rows may come back in a different order from one run to the next.
One way to get consistent results is to use ROW_NUMBER as you have done, though it is only deterministic when the columns in its OVER (ORDER BY ...) are unique together. Alternatively, adding any other column to the ORDER BY that is unique will cause the results to always be returned in the same order. It doesn't matter whether that other column has anything to do with your query, just that it's unique.
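As a rough sketch of the unique tie-breaker idea in LINQ terms: sort by the display column first, then by a unique key, before applying Skip and Take. Everything named here (GalleryRow, StablePaging, GetPage, SampleRows) is hypothetical and only stands in for the question's TestGalleryViewModel and repository; it runs in memory with no database. Note that in-memory LINQ sorting is already stable, so this does not reproduce the problem, it only shows the shape of the fix.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for TestGalleryViewModel, so the sketch runs without a database.
public sealed class GalleryRow
{
    public int Id { get; set; }
    public string ArtistName { get; set; }
}

public static class StablePaging
{
    // Sort by the requested column, then by the unique Id, so rows that tie on
    // ArtistName keep the same relative order on every page.
    public static List<GalleryRow> GetPage(IQueryable<GalleryRow> source, int pageIndex, int pageSize)
    {
        return source
            .OrderBy(r => r.ArtistName)
            .ThenBy(r => r.Id)              // unique tie-breaker
            .Skip(pageIndex * pageSize)
            .Take(pageSize)
            .ToList();
    }

    // In-memory sample using the ten Ids from the question, deliberately shuffled.
    public static IQueryable<GalleryRow> SampleRows()
    {
        var ids = new[] { 741, 176, 707, 620, 740, 621, 739, 660, 735, 683 };
        return ids
            .Select(id => new GalleryRow { Id = id, ArtistName = "ADAMS Charles James" })
            .AsQueryable();
    }
}
```

Calling StablePaging.GetPage(StablePaging.SampleRows(), 0, 5) and then the same with page index 1 should give two pages that never overlap or skip rows. With Dynamic LINQ the equivalent is probably a comma-separated ordering string such as OrderBy("ArtistName asc, Id asc"); that assumes your version of the library accepts multiple orderings in one string, so check it against the generated T-SQL.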