Wednesday, March 28, 2012

How can I refactor these GROUP BY queries to be faster?

(SQL Server 2000, SP3a)
I'm trying to construct a VIEW that will return *one* row per DossierUNID,
and the row should be the DossierUNID row with the lowest SortOrder value.
However, I have some cases where SortOrder is NULL (which should be treated
as the lowest). Additionally, some SortOrder values may be *duplicated*.
Consider the following representative data:
if (object_id(N'[tempdb].[dbo].[#Education]') is not NULL) drop table
[dbo].[#Education]
create table [dbo].[#Education]
(
[RowUNID] uniqueidentifier,
[DossierUNID] uniqueidentifier,
[SortOrder] int,
[TStamp] timestamp
)
insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder]) select
'3C2CE763-DAF8-46CD-8B09-67979CA68325',
'A552B76B-FCD7-4D7C-BE5E-AA17A730904B', 1
insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder]) select
'C98A6838-2523-4507-A4F3-FFC6FAD51329',
'A552B76B-FCD7-4D7C-BE5E-AA17A730904B', 1
insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder]) select
'4E0733DF-B15B-4895-925B-6A4ADF0C1BC1',
'A552B76B-FCD7-4D7C-BE5E-AA17A730904B', 2
insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder]) select
'885B1138-E13A-4D87-8F5A-B8C379B1F328',
'A552B76B-FCD7-4D7C-BE5E-AA17A730904B', 3
insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder]) select
'D14DDA37-1E51-42BF-BC0A-25744AB638C6',
'A552B76B-FCD7-4D7C-BE5E-AA17A730904B', 4
insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder]) select
'EC70574F-6470-4348-B471-4DE3E81D402C',
'A552B76B-FCD7-4D7C-BE5E-AA17A730904B', 5
insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder]) select
'B3010834-9CBD-4075-84BA-D0AA3B2D8420',
'CFD039FC-EAF4-4F0B-B6DB-CB143948CE26', NULL
insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder]) select
'D0C43744-B59A-48BC-99ED-61A9D759261C',
'CFD039FC-EAF4-4F0B-B6DB-CB143948CE26', NULL
insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder]) select
'BC97CB2C-C629-47EB-8F84-3EF97B99C664',
'CFD039FC-EAF4-4F0B-B6DB-CB143948CE26', 1
insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder]) select
'D5E90FED-3023-470B-ADA2-4F809B746F81',
'8B359795-73CE-469C-B797-5AD55A9B023A', 1
insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder]) select
'446C0422-2F90-4B01-957C-9ADA2669F385',
'8B359795-73CE-469C-B797-5AD55A9B023A', 2
insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder]) select
'8383D1B5-ACBD-46CD-B4A0-A83AB23D6CCC',
'8B359795-73CE-469C-B797-5AD55A9B023A', 3
insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder]) select
'591C60B8-23A8-4D01-A519-3515A1032951',
'8B359795-73CE-469C-B797-5AD55A9B023A', 4
insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder]) select
'0EEED645-27B0-459C-B964-BF1B5BCB35C0',
'5548776F-6DEE-4AEA-8CD4-4108D331BE63', NULL
insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder]) select
'1544DF52-F93A-4871-86C9-E67DB976BDB7',
'5548776F-6DEE-4AEA-8CD4-4108D331BE63', NULL
I have a couple of queries that yield the results that I'm looking for:
select e.*
from (
select [RowUNIDChecksum] = min(binary_checksum(e.[RowUNID]))
from (
select [DossierUNID] = e.[DossierUNID],
[SortOrder] = min(isnull(e.[SortOrder],
0))
from [dbo].[#Education] as e with (nolock)
group by e.[DossierUNID]
) as q
inner join [dbo].[#Education] as e with (nolock) on
e.[DossierUNID] = q.[DossierUNID]
and
isnull(e.[SortOrder], 0) = q.[SortOrder]
group by e.[DossierUNID]
) as q
inner join [dbo].[#Education] as e with (nolock) on
binary_checksum(e.[RowUNID]) = q.[RowUNIDChecksum]
Or:
select e.*
from [dbo].[#Education] as e with (nolock)
inner join (
select [DossierUNID] = e.[DossierUNID],
[Key] = min(right(replicate('0', 3) +
convert(varchar(3), isnull(e.[SortOrder], 0)), 3) + convert(varchar(36),
e.[RowUNID]))
from [dbo].[#Education] as e with (nolock)
group by e.[DossierUNID]
) as q on right(q.[Key], 36) = e.[RowUNID]
But I'm wondering if there's a better (read: more performant ;-) way to
accomplish this.
Thanks for any help anyone can provide! :-)
John Peterson
OK, what about that ?

> I'm trying to construct a VIEW that will return *one* row per DossierUNID,
> and the row should be the DossierUNID row with the lowest SortOrder value.
Select RowUNID, MIN(ISNULL(SortOrder,-1))
FROM #Education
GROUP BY RowUNID
HTH, Jens SUessmeyer.
http://www.sqlserver2005.de
"John Peterson" <j0hnp@.comcast.net> schrieb im Newsbeitrag
news:eHadnvYUFHA.3140@.TK2MSFTNGP14.phx.gbl...
> (SQL Server 2000, SP3a)
> I'm trying to construct a VIEW that will return *one* row per DossierUNID,
> and the row should be the DossierUNID row with the lowest SortOrder value.
> However, I have some cases where SortOrder is NULL (which should be
> treated as the lowest). Additionally, some SortOrder values may be
> *duplicated*.
> Consider the following representative data:
> if (object_id(N'[tempdb].[dbo].[#Education]') is not NULL) drop table
> [dbo].[#Education]
> create table [dbo].[#Education]
> (
> [RowUNID] uniqueidentifier,
> [DossierUNID] uniqueidentifier,
> [SortOrder] int,
> [TStamp] timestamp
> )
> insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder])
> select '3C2CE763-DAF8-46CD-8B09-67979CA68325',
> 'A552B76B-FCD7-4D7C-BE5E-AA17A730904B', 1
> insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder])
> select 'C98A6838-2523-4507-A4F3-FFC6FAD51329',
> 'A552B76B-FCD7-4D7C-BE5E-AA17A730904B', 1
> insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder])
> select '4E0733DF-B15B-4895-925B-6A4ADF0C1BC1',
> 'A552B76B-FCD7-4D7C-BE5E-AA17A730904B', 2
> insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder])
> select '885B1138-E13A-4D87-8F5A-B8C379B1F328',
> 'A552B76B-FCD7-4D7C-BE5E-AA17A730904B', 3
> insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder])
> select 'D14DDA37-1E51-42BF-BC0A-25744AB638C6',
> 'A552B76B-FCD7-4D7C-BE5E-AA17A730904B', 4
> insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder])
> select 'EC70574F-6470-4348-B471-4DE3E81D402C',
> 'A552B76B-FCD7-4D7C-BE5E-AA17A730904B', 5
> insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder])
> select 'B3010834-9CBD-4075-84BA-D0AA3B2D8420',
> 'CFD039FC-EAF4-4F0B-B6DB-CB143948CE26', NULL
> insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder])
> select 'D0C43744-B59A-48BC-99ED-61A9D759261C',
> 'CFD039FC-EAF4-4F0B-B6DB-CB143948CE26', NULL
> insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder])
> select 'BC97CB2C-C629-47EB-8F84-3EF97B99C664',
> 'CFD039FC-EAF4-4F0B-B6DB-CB143948CE26', 1
> insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder])
> select 'D5E90FED-3023-470B-ADA2-4F809B746F81',
> '8B359795-73CE-469C-B797-5AD55A9B023A', 1
> insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder])
> select '446C0422-2F90-4B01-957C-9ADA2669F385',
> '8B359795-73CE-469C-B797-5AD55A9B023A', 2
> insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder])
> select '8383D1B5-ACBD-46CD-B4A0-A83AB23D6CCC',
> '8B359795-73CE-469C-B797-5AD55A9B023A', 3
> insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder])
> select '591C60B8-23A8-4D01-A519-3515A1032951',
> '8B359795-73CE-469C-B797-5AD55A9B023A', 4
> insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder])
> select '0EEED645-27B0-459C-B964-BF1B5BCB35C0',
> '5548776F-6DEE-4AEA-8CD4-4108D331BE63', NULL
> insert into [dbo].[#Education]([RowUNID], [DossierUNID], [SortOrder])
> select '1544DF52-F93A-4871-86C9-E67DB976BDB7',
> '5548776F-6DEE-4AEA-8CD4-4108D331BE63', NULL
> I have a couple of queries that yield the results that I'm looking for:
> select e.*
> from (
> select [RowUNIDChecksum] = min(binary_checksum(e.[RowUNID]))
> from (
> select [DossierUNID] = e.[DossierUNID],
> [SortOrder] = min(isnull(e.[SortOrder],
> 0))
> from [dbo].[#Education] as e with (nolock)
> group by e.[DossierUNID]
> ) as q
> inner join [dbo].[#Education] as e with (nolock) on
> e.[DossierUNID] = q.[DossierUNID]
> and
> isnull(e.[SortOrder], 0) = q.[SortOrder]
> group by e.[DossierUNID]
> ) as q
> inner join [dbo].[#Education] as e with (nolock) on
> binary_checksum(e.[RowUNID]) = q.[RowUNIDChecksum]
> Or:
> select e.*
> from [dbo].[#Education] as e with (nolock)
> inner join (
> select [DossierUNID] = e.[DossierUNID],
> [Key] = min(right(replicate('0', 3) +
> convert(varchar(3), isnull(e.[SortOrder], 0)), 3) + convert(varchar(36),
> e.[RowUNID]))
> from [dbo].[#Education] as e with (nolock)
> group by e.[DossierUNID]
> ) as q on right(q.[Key], 36) = e.[RowUNID]
>
> But I'm wondering if there's a better (read: more performant ;-) way to
> accomplish this.
> Thanks for any help anyone can provide! :-)
> John Peterson
>
|||Sorry, Jens -- I probably wasn't clear: I want one row per DossierUNID (not
RowUNID). You'll notice that the RowUNIDs are unique, and there are
multiple DossierUNIDs in the table. See my sample output with those queries
for the results that I'm hoping to achieve.
"Jens Smeyer" <Jens@.Remove_this_For_Contacting.sqlserver2005.de> wrote in
message news:evhI12YUFHA.612@.TK2MSFTNGP12.phx.gbl...
> OK, what about that ?
>
> Select RowUNID, MIN(ISNULL(SortOrder,-1))
> FROM #Education
> GROUP BY RowUNID
> HTH, Jens SUessmeyer.
> --
> http://www.sqlserver2005.de
> --
> "John Peterson" <j0hnp@.comcast.net> schrieb im Newsbeitrag
> news:eHadnvYUFHA.3140@.TK2MSFTNGP14.phx.gbl...
>
|||Ok, sorry messed up with your explanation, so you want one one per
DossierUNID, you have 4 different DossierUNIDs:
Select count(*), DossierUNID
From #Education
Group by DossierUNID
DossierUNID
-- --
2 5548776F-6DEE-4AEA-8CD4-4108D331BE63
4 8B359795-73CE-469C-B797-5AD55A9B023A
6 A552B76B-FCD7-4D7C-BE5E-AA17A730904B
3 CFD039FC-EAF4-4F0B-B6DB-CB143948CE26
You should get the result that query (or did i missed a thing in your
description)
Select DossierUNID, MIN(ISNULL(SortOrder,-1))
FROM #Education
GROUP BY DossierUNID
DossierUNID
-- --
5548776F-6DEE-4AEA-8CD4-4108D331BE63 -1
8B359795-73CE-469C-B797-5AD55A9B023A 1
A552B76B-FCD7-4D7C-BE5E-AA17A730904B 1
CFD039FC-EAF4-4F0B-B6DB-CB143948CE26 -1
Jens Suessmeyer.
"John Peterson" <j0hnp@.comcast.net> schrieb im Newsbeitrag
news:eJy2s7YUFHA.3176@.TK2MSFTNGP12.phx.gbl...
> Sorry, Jens -- I probably wasn't clear: I want one row per DossierUNID
> (not RowUNID). You'll notice that the RowUNIDs are unique, and there are
> multiple DossierUNIDs in the table. See my sample output with those
> queries for the results that I'm hoping to achieve.
>
> "Jens Smeyer" <Jens@.Remove_this_For_Contacting.sqlserver2005.de> wrote
> in message news:evhI12YUFHA.612@.TK2MSFTNGP12.phx.gbl...
>
|||Yes -- but now I want to somehow *relate* that result set to the appropriate
record, in toto, in the original table. If the SortOrder is duplicated, we
can pick one at random (though, ideally, I would want the one with the max
TStamp value).
"Jens Smeyer" <Jens@.Remove_this_For_Contacting.sqlserver2005.de> wrote in
message news:OZsVxFZUFHA.4056@.TK2MSFTNGP15.phx.gbl...
> Ok, sorry messed up with your explanation, so you want one one per
> DossierUNID, you have 4 different DossierUNIDs:
> --
> Select count(*), DossierUNID
> From #Education
> Group by DossierUNID
> DossierUNID
> -- --
> 2 5548776F-6DEE-4AEA-8CD4-4108D331BE63
> 4 8B359795-73CE-469C-B797-5AD55A9B023A
> 6 A552B76B-FCD7-4D7C-BE5E-AA17A730904B
> 3 CFD039FC-EAF4-4F0B-B6DB-CB143948CE26
> --
> You should get the result that query (or did i missed a thing in your
> description)
>
> Select DossierUNID, MIN(ISNULL(SortOrder,-1))
> FROM #Education
> GROUP BY DossierUNID
> DossierUNID
> -- --
> 5548776F-6DEE-4AEA-8CD4-4108D331BE63 -1
> 8B359795-73CE-469C-B797-5AD55A9B023A 1
> A552B76B-FCD7-4D7C-BE5E-AA17A730904B 1
> CFD039FC-EAF4-4F0B-B6DB-CB143948CE26 -1
>
> Jens Suessmeyer.
> "John Peterson" <j0hnp@.comcast.net> schrieb im Newsbeitrag
> news:eJy2s7YUFHA.3176@.TK2MSFTNGP12.phx.gbl...
>
|||ok, getting nearer ;-)
What about that one:
Select E.* from #Education AS E
INNER JOIN
(
Select DossierUNID, min(TStamp) TStamp,
MIN(SortOrder) SortOrder
FROM #Education
GROUP BY DossierUNID
) Subquery
ON Subquery.DossierUNID = E.DossierUNID AND
Subquery.TStamp = E.TStamp
"John Peterson" <j0hnp@.comcast.net> schrieb im Newsbeitrag
news:ejM70JZUFHA.2768@.tk2msftngp13.phx.gbl...
> Yes -- but now I want to somehow *relate* that result set to the
> appropriate record, in toto, in the original table. If the SortOrder is
> duplicated, we can pick one at random (though, ideally, I would want the
> one with the max TStamp value).
>
> "Jens Smeyer" <Jens@.Remove_this_For_Contacting.sqlserver2005.de> wrote
> in message news:OZsVxFZUFHA.4056@.TK2MSFTNGP15.phx.gbl...
>
|||I thought about something like that -- but I think that we're not able to
correlate the min TStamp and min SortOrder values. (That is, the min TStamp
might be referencing a row that has a larger SortOrder.)
That's the problem I'm having, in a nutshell. Is there any way to use the
GROUP BY clause in some "magic" way, that it will let me retain the original
row from which it came (either by RowUNID or TStamp)?
"Jens Smeyer" <Jens@.Remove_this_For_Contacting.sqlserver2005.de> wrote in
message news:%23ElenXZUFHA.3572@.TK2MSFTNGP12.phx.gbl...
> ok, getting nearer ;-)
>
> What about that one:
> Select E.* from #Education AS E
> INNER JOIN
> (
> Select DossierUNID, min(TStamp) TStamp,
> MIN(SortOrder) SortOrder
> FROM #Education
> GROUP BY DossierUNID
> ) Subquery
> ON Subquery.DossierUNID = E.DossierUNID AND
> Subquery.TStamp = E.TStamp
>
> "John Peterson" <j0hnp@.comcast.net> schrieb im Newsbeitrag
> news:ejM70JZUFHA.2768@.tk2msftngp13.phx.gbl...
>
|||I finally settled on:
select t.*
from [dbo].[#Education] as t with (nolock)
where exists (
select [TStamp] = max(e.[TStamp])
from [dbo].[#Education] as e with (nolock)
where e.[TStamp] = t.[TStamp]
group by e.[DossierUNID],
e.[SortOrder]
having max(e.[TStamp]) = t.[TStamp]
and isnull(e.[SortOrder], 0) <= all
(
select min(isnull(x.[SortOrder], 0))
from [dbo].[#Education] as x with (nolock)
where x.[DossierUNID] = e.[DossierUNID]
group by x.[DossierUNID]
)
)
It looks a little "ugly", but it was more performant that the other
alternatives that I came up with. Further, it avoided parallelization in
the query, which was problematic in this case.
"John Peterson" <j0hnp@.comcast.net> wrote in message
news:%23IiytcaUFHA.1508@.tk2msftngp13.phx.gbl...
>I thought about something like that -- but I think that we're not able to
>correlate the min TStamp and min SortOrder values. (That is, the min
>TStamp might be referencing a row that has a larger SortOrder.)
> That's the problem I'm having, in a nutshell. Is there any way to use the
> GROUP BY clause in some "magic" way, that it will let me retain the
> original row from which it came (either by RowUNID or TStamp)?
>
> "Jens Smeyer" <Jens@.Remove_this_For_Contacting.sqlserver2005.de> wrote
> in message news:%23ElenXZUFHA.3572@.TK2MSFTNGP12.phx.gbl...
>

No comments:

Post a Comment