Friday, February 24, 2012

How can I get modified data using timestamp columns

I am putting together an SQL script that is pulling recently modified data from 3 tables and INSERTing that data into another table.

All 3 of my input tables have a timestamp column and I have the previous values for these 3 timestamp columns at the time my SQL script was run previously. So, using the timestamp column values that I had from the previous run of my SQL script and the current timestamp columns that exist in my 3 tables, I am able to derive any recently modified rows.

So, here are my 3 input tables:

Items (has a timestamp column) and has several million rows.

Attributes1 (has a timestamp column) and has a million rows.

Attributes2 (has a timnestamp column) and has a million rows.

The Attributes1 and Attributes2 tables have attributes that describe the items in the Items table. I want to INSERT the Items rows with all of their attributes into a fourth table (that doesn't need a timestamp column).

The kicker is if any attribute changes in the Attributes1 and/or Attributes2 tables, I want to completely resummarize the entire item in the fourth table.

So, I have 3 INSERT/SELECTs in my SQL Script so that I can pickup any combination of modified data in my 3 input tables.

INSERT INTO Table4

.......

SELECT

.....

FROM Items

LEFT OUTER JOIN Attributes1 ...

LEFT OUTER JOIN Attributes2 ...

WHERE Items.TimestampColumn BETWEEN a AND b

INSERT INTO Table4

.......

SELECT

.....

FROM Items, Attributes1

LEFT OUTER JOIN Attributes1 ...

LEFT OUTER JOIN Attributes2 ...

WHERE Attributes1.TimestampColumn BETWEEN c AND d

AND (the Items row is not already in Table4)

INSERT INTO Table4

.......

SELECT

.....

FROM Items, Attributes2

LEFT OUTER JOIN Attributes1 ...

LEFT OUTER JOIN Attributes2 ...

WHERE Attributes2.TimestampColumn BETWEEN e AND f

AND (the Items row is not already in Table4)

This SQL takes a whole long time to run (more than an hour).

I would like to consense my SQL into a single INSERT/SELECT.

Does anybody know of an SQL technique that I haven't thought of...

TIA

Will this work:

INSERT INTO Table4

.......

SELECT

.....

FROM Items, Attributes1

LEFT OUTER JOIN Attributes1 ...

LEFT OUTER JOIN Attributes2 ...

WHERE (Attributes1.TimestampColumn BETWEEN a AND b

OR Attributes1.TimestampColumn BETWEEN c AND d

OR Attributes1.TimestampColumn BETWEEN e AND f)

AND (the Items row is not already in Table4)

Alternatively, you could try doing a UNION on the select statements to get them into one derived table (if the items table is empty before the first query is run, you can drop the NOT EXISTS as the UNION will remove duplicate rows), and then insert into the table in one go from the derived table (UNION statement). The latter may be quicker if the items table is empty to begin with.

No comments:

Post a Comment