Monday, March 12, 2012

How can I keep only the top ten rows and delete the rest

I have a table that I would like to only keep the top 10 rows for each username. How can I kep the top 10 and delete the rest?

You could do something like

DELETE FROM <table> where ID not in (SELECT TOP 10 id from <table>)

So if the results were 1-10 any row with an id column > 10 would be deleted.

|||

Delete MainTable where id not in (select top 10 id from MainTable)

I assumed that "id" is the primary key column

|||

Doug,

That helped, however, it is more complicated than that. Heres the summary; I have a table that keeps track of the clients records that users open. It puts the username, clientid, lastviewed (datetime), and the crvid (key) into table tblclientlastviewed. I want to periodically remove all but the top ten rows based on clientid for each username. The user could open a client several times in a day, so I don't want ten entries for the same client for that user. I can easily correct this in the UI by using a group by in the query. But the removing of the extra rows in the table is much more challenging.

Do you have any ideas?

|||

You DONT want to use IN.

You can get the top 10 rows into some temp table, empty the original table (you can use TRUNCATE if you dont have identity identity columns or resetting identity columns does not cause problems), then get the 10 rows back into this table from temp table.

|||

If I understand your question, I would actually do this differently than the other posts. I'm not sure what you mean by top 10, but I assume you want the last 10 rows to exist -- to only keep 10 rows at a time. If this is the case I would use a SQL trigger. Basically at any given time you want 10 rows max per user. So on Insert you would use a SQL trigger with the same SQL that others have posted to delete the other rows.

edit:
Seems as if this comment grew stale as it sat on my screen. by the time i posted it it wasn't really relevant anymore... sorry

|||

Are you looking for a single SQL statement? It seems like it would be an easy thing to handle with a TSQL script:

1. Create a cursor for the users:SELECT username, clientid, count(*) FROM table GROUP BY username, clientid HAVING count(*) > 10

2. Loop thru each user/client, and holdSELECT TOP 10 crvid FROM table WHERE username = @.username AND clientid = @.clientid ORDER BY lastviewed DESC in a temp table or table variable

3. Delete records for each user/client not in the holding table

It's not elegant, but it would get the job done.

|||

CURSORS/TRIGGERS ?

God!!! Save me !!! right now !!!

|||

This is actually quite an advanced query.

Yes, you could use IN queries, especially if you only want to perform the query for one user.

DELETE FROM tblclientlastviewed WHERE clientid=@.clientid AND crvid NOT IN (SELECT TOP 10 crvid FROM tblclientlastviewed WHERE clientid=@.clientid ORDER BY crvid DESC)

This query could of course be executed in a cursor iterating over all the users. It's not beautiful, but easy to understand and maintain.

A third option is to use a smarter query. In the example below I've used a table variable to hold the values that should remain, in order to be able to truncate the table (more efficient than deleting individual rows).

Assuming that crvid is used as an identity. I strongly recommend executing this in a transaction

DECLARE @.remainder TABLE
(
clientid int,
username varchar(50),
lastviewed datetime
)

INSERT INTO @.remainder (clientid,username,lastviewed)
SELECT clientid,username,lastviewed FROM tblclientlastviewed t1
WHERE (SELECT COUNT(*) FROM tblclientlastviewed t2 WHERE t1.clientid=t2.clientid AND t2.crvid>t1.crvid)<10

TRUNCATE TABLE tblclientlastviewed

INSERT INTO tblclientlastviewed (clientid,username,lastviewed)
SELECT clientid,username,lastviewed FROM @.remainder

It would alse be possible to do it in one query, but if there's a lot of data, this may be very inefficient.

DELETE FROM tblclientlastviewed
WHERE crvid NOT IN (
SELECT crvid FROM tblclientlastviewed t1
WHERE (SELECT COUNT(*) FROM tblclientlastviewed t2 WHERE t1.clientid=t2.clientid AND t2.crvid>t1.crvid)<10
)

|||

I think what I will try and do is to:

get the rows into a temp table that I want to keep, per user

remove all rows for that user

insert back into the table what is in the temp table.

as suggested.

I don't know what is the best practise for this.

|||

I agree with gunteman.

You can schedule a job that runs every X hours/minutes and cleans up the table.

|||

Except that TRUNCATE statements aren't transactional, kudos togunteman for giving so many options

|||

doyleits:

Except that TRUNCATE statements aren't transactional

Good catch! Indeed they aren't. Oh well, the strategy should be quite performant even with a delete operation.

|||

So does this mean that a job would be better than a trigger? If yes, how can I create a job using Management Studio Express?

|||

Jackxxx:

So does this mean that a job would be better than a trigger? If yes, how can I create a job using Management Studio Express?

Yes, Triggers are not used for this purpose.In your Object Explorer, go to SQL Server Agent -> Jobs -> Right click -> New Job.

No comments:

Post a Comment