Friday, March 23, 2012

How Can I obtain the Top 3 of a dataset

If I have say 20 records of sales employees, for example, how can I get the top 3 locations for $$$sales for EACH employee? Each employee can have multiple locations where they have sold(let's say up to 50). I only want the names of the top 3 locations. The closest I can get is filtering the dataset by a HAVING clause > a dollar amount but this still gives me between 3 - 12 records for each plus I have to literally enter each salesperson's number as it stands now. Is this a loop or a cursor? Thanks.

ddaveOnce you have your SQL set up to return the sales dollars for all of the locations, just add TOP 3 before the select:

SELECT TOP 3 Sum(Dollars) FROM MyTable;


Note that you can also SELECT TOP n PERCENT.|||General solution to your problem:

The TOP N clause in TSQL is useful, but is limited by the fact that it only applies to the entire recordset, and that it will not accept a variable as a parameter.

This method will return any requested number of records for each group of one or more column values in a dataset. It uses a type of join called a Theta join, where the values in two datasets are compared, but do not necessarily have to be equal:


declare @.N int
set @.N = 5 --The number of records to return for each grouping.
select YourTable.YourColumns
from YourTable
inner join YourTable ThetaTable
on YourTable.GroupColumns = ThetaTable.GroupColumns
and YourTable.SortColumn <= ThetaTable.SortColumn
group by YourTable.YourColumns
having count(*) <= @.N|||The TOP function does not work as blindman indicates as it gives me only the top 3 of 120+ records, not the top 3 for EACH person. I will try the Theta join. Thanks to both for replying.

ddave|||blindman, your queries are starting to look good :)

but the join should be outdented

use foo and bar if more than one column is intended (YourColumns alone does not show comma syntax)

and shorter aliases make it easier to read ;)select T.foo, T.bar
from YourTable as T
inner join YourTable as Th
on T.GroupColumns = Th.GroupColumns
and T.SortColumn <= Th.SortColumn
group by T.foo, T.bar
having count(*) <= n|||blindman, your queries are starting to look good :)

Looks can be deceiving...it doesn't work

USE Northwind
GO

SET NOCOUNT ON
CREATE TABLE myTable99(SiteId int, EmpId int, Sales money)
GO

INSERT INTO myTable99(SiteId, EmpId, Sales)
SELECT 1, 1, 10.00 UNION ALL
SELECT 2, 1, 15.00 UNION ALL
SELECT 3, 1, 20.00 UNION ALL
SELECT 4, 1, 50.00 UNION ALL
SELECT 5, 1, 10.00 UNION ALL
SELECT 6, 1, 5.00 UNION ALL
SELECT 1, 2, 100.00 UNION ALL
SELECT 2, 2, 1500.00 UNION ALL
SELECT 3, 2, 2000.00 UNION ALL
SELECT 4, 2, 5000.00 UNION ALL
SELECT 5, 2, 1000.00 UNION ALL
SELECT 6, 2, 500.00 UNION ALL
SELECT 1, 3, 1.00 UNION ALL
SELECT 2, 3, 1.50 UNION ALL
SELECT 3, 3, 2.00 UNION ALL
SELECT 4, 3, 5.00 UNION ALL
SELECT 5, 3, 1.00 UNION ALL
SELECT 6, 3, .50
GO

DECLARE @.N int
SET @.N = 3 --The number of records to return for each grouping.

SELECT a.SiteId, a.EmpId
FROM myTable99 a
JOIN myTable99 b
ON a.SiteId = b.SiteId
AND a.EmpId = b.EmpId
AND a.Sales < = b.Sales
GROUP BY a.SiteId, a.EmpId
HAVING COUNT(*) <= @.N
GO

SET NOCOUNT OFF
DROP TABLE myTable99
GO

I know I've seen this work somehow though|||At least to me, it is a lot more intuitively obvious to do this via a sub-query, something like: DECLARE @.N int
SET @.N = 3 --The number of records to return for each grouping.

SELECT a.SiteId, a.EmpId, a.Sales
FROM myTable99 a
WHERE (SELECT Count(*)
FROM myTable99 b
WHERE b.EmpId = a.EmpId
AND a.Sales <= b.Sales) <= @.N
ORDER BY a.EmpID, a.Sales DESC, a.SiteIDThe problem with any set-based solution to this kind of problem is that it does not deal well with "ties" in the data... They make your results a bit "funky", but I don't know any reliable way to resolve that using set based logic.

-PatP|||I know I've seen this work somehow thoughyeah, it's tricky

here's what you were searching for -- select a.EmpId
, a.SiteId
, a.sales
from myTable99 a
inner
join myTable99 b
on a.EmpId = b.EmpId
and a.Sales <= b.Sales
group
by a.EmpId
, a.SiteId
, a.sales
having count(*) <= 3
order
by a.EmpId
, a.sales descwhich produces the following (correct) results:1 4 50.0000
1 3 20.0000
1 2 15.0000

2 4 5000.0000
2 3 2000.0000
2 2 1500.0000

3 4 5.0000
3 3 2.0000
3 2 1.5000pat, i like the subquery method too, i guess i just got used to the join solution after so many times showing pre-4.1 mysql people how to do it

:) :)|||I indents 'em as I wants 'em, thank you! :)|||Hey thanks guys...

And yes Pat, I agree it may be more intuitive...but have a look at the plans...it looks like Rudy's Join is more effecient...even with the group by.

I would not have guessed this. Anyone care if I blog this? With appropriate references of course.

It's just one of things I know can be done...and I forget how to contruct it.

Anyway, here's the plans|||blog away

:cool: :cool: :cool:|||OK, You've all been Blogged (http://weblogs.sqlteam.com/brettk/archive/2005/02/10/4153.aspx)|||OK, You've all been Blogged (http://weblogs.sqlteam.com/brettk/archive/2005/02/10/4153.aspx)ta very much

and usually i just ignore your frequent typos, but in this case, i must insist that you correct the spelling of my surname

thanks in advance|||mia culpa...done|||mia culpa...donethanks

and now, would you mind changing the link so that it points to the correct person

:) :)|||Done...it's been a rough week...they've got me doing project plans...aaaahhhhhhhhh|||"The problem with any set-based solution to this kind of problem is that it does not deal well with "ties" in the data... They make your results a bit "funky", but I don't know any reliable way to resolve that using set based logic."

That is the catching point. I have 20 people, for each of whom I wish to get the "TOP 3" records. When I have a "tie" it fails, giving me only 2 of the 3 desired records.

I inserted a couple of extra records to the table given above by:

INSERT INTO myTable99(SiteId, EmpId, Sales)
SELECT 2, 2, 15.00
SELECT 2, 2, 1500.00

The 1500.00 will create a tie with another record, already having a value of 1500.00.

Here are all the records:

select *
from myTable99

Results:
1 1 10.0000
2 1 15.0000
3 1 20.0000
4 1 50.0000
5 1 10.0000
6 1 5.0000
1 2 100.0000
2 2 1500.0000
3 2 2000.0000
4 2 5000.0000
5 2 1000.0000
6 2 500.0000
1 3 1.0000
2 3 1.5000
3 3 2.0000
4 3 5.0000
5 3 1.0000
6 3 .5000
2 2 1500.0000
2 2 15.0000

When I run your query:

select a.EmpId
, a.SiteId
, a.sales
from myTable99 a
inner
join myTable99 b
on a.EmpId = b.EmpId
and a.Sales <= b.Sales
group
by a.EmpId
, a.SiteId
, a.sales
having count(*) <= 3
order
by a.EmpId
, a.sales desc

My results are here:

1 4 50.0000
1 3 20.0000
1 2 15.0000
2 4 5000.0000
2 3 2000.0000
3 4 5.0000
3 3 2.0000
3 2 1.5000

I need Employee 2 to show at least one of his two 1500.00 records. Any way to do this?

ddave|||This can still be done with a set-based solution:

select a.EmpId,
a.SiteId,
a.sales
from myTable99 a
inner join myTable99 b
on a.EmpId = b.EmpId
and (a.Sales < b.Sales
or a.Sales = b.Sales and a.EmpID < b.EmpID)
group by a.EmpId,
a.SiteId,
a.sales
having count(*) <= 3
order by a.EmpId,
a.sales desc

Output:

EmpId SiteId sales
---- ---- -------
1 3 20.0000
1 2 15.0000
1 1 10.0000
1 5 10.0000
2 3 2000.0000
2 2 1500.0000
2 5 1000.0000
3 3 2.0000
3 2 1.5000
3 1 1.0000
3 5 1.0000|||No kidding...

SELECT a.EmpId, a.SiteId, a.sales
FROM (SELECT DISTINCT EmpId, SiteId, sales FROM myTable99) a
INNER JOIN (SELECT DISTINCT EmpId, SiteId, sales FROM myTable99) b
ON a.EmpId = b.EmpId
AND a.Sales <= b.Sales
GROUP BY a.EmpId, a.SiteId, a.sales
HAVING COUNT(*) <= @.N
ORDER BY a.EmpId, a.sales desc
GO|||without getting into the actual sql, the problem of ties isn't really a problem of sql, it's more of a problem of semantics

blindman's last example is perfect --

3 3 2.0000
3 2 1.5000
3 1 1.0000
3 5 1.0000

as far as i'm concerned, a tie across the last place requires that all rows with that value be included

let's say we had 25 people in a classroom, and we wanted the top 3 students based on marks

joe has a gpa of 3.90, mary has a gpa of 3.85, and all twenty-three others have exactly the same gpa, 3.80

who ya gonna exclude?

No comments:

Post a Comment