Wednesday, March 21, 2012

How can I merge Identity tables?

Ok,
I have tables t1, t2, t3 and t4. I also have identical tables in a
CopyOfFirstDataBase, except this one contains older data that needs to
combine with the newer. It's like:
DATABASE1 --> DATABASE2
[1995 through 2003] [2004 through Present]
Both are structured identically. But, they both have Identity
fields. Each row in t1 through t4 is really one record relationally
and needs to remain so when merged.
Of course, there is already in the Identity columns records 1 through
whatever the last one is in both sets of the t1 - t4 1995 and records 1
through whatever in t1 - t4 2004 to present.
I don't care what the Identity numbers are, as long as I remain with
all records in tact. Any help is appreciated.
Thanks,
TrintI'd suggest you add a suitable negative offset to the identity values in
DATABASE1. Assuming your tables have fewer than 1000000000 rows,
set identity_insert t1 on
go
insert into DATABASE2..t1(identCol, dataCol1, dataCol2...)
select
-1000000000 + DATABASE1..t1.identCol,
dataCol1,
dataCol2,
...
go
set identity_insert t1 off
go
set identity_insert t2 on
go
-- repeat for t2, then t3, then t4
Steve Kass
Drew University
trint wrote:

>Ok,
>I have tables t1, t2, t3 and t4. I also have identical tables in a
>CopyOfFirstDataBase, except this one contains older data that needs to
>combine with the newer. It's like:
> DATABASE1 --> DATABASE2
>[1995 through 2003] [2004 through Present]
>Both are structured identically. But, they both have Identity
>fields. Each row in t1 through t4 is really one record relationally
>and needs to remain so when merged.
>Of course, there is already in the Identity columns records 1 through
>whatever the last one is in both sets of the t1 - t4 1995 and records 1
>through whatever in t1 - t4 2004 to present.
>I don't care what the Identity numbers are, as long as I remain with
>all records in tact. Any help is appreciated.
>Thanks,
>Trint
>
>|||You can use the proprietary IDENTITY_INSERT extension to preserve your
problem.
The real problems are that you do not know that rows are not records
and columns are not fields,and that IDENTITY cannot be a relational
key. You are still using the mindset and terminology of a sequential
file system -- a magnetic tape merge to be exact. The right answer is
to find a relational in the data and use it, so you do not keep
mimicing 1950's technology in SQL.|||> The real problems are that you do not know that rows are not records
> and columns are not fields,and that IDENTITY cannot be a relational
> key. You are still using the mindset and terminology of a sequential
> file system -- a magnetic tape merge to be exact. The right answer is
> to find a relational in the data and use it, so you do not keep
> mimicing 1950's technology in SQL.
Tilt that windmill Don Celko.
Thomas|||An identity column is an acceptable candidate for a primary key, especially
if the natural key is composed of several columns and thus very wide.
Another problem with natural keys is that things like last names, zip codes,
and even social security numbers tend are subject to change.
"--CELKO--" <jcelko212@.earthlink.net> wrote in message
news:1122296478.891875.17300@.g14g2000cwa.googlegroups.com...
> You can use the proprietary IDENTITY_INSERT extension to preserve your
> problem.
> The real problems are that you do not know that rows are not records
> and columns are not fields,and that IDENTITY cannot be a relational
> key. You are still using the mindset and terminology of a sequential
> file system -- a magnetic tape merge to be exact. The right answer is
> to find a relational in the data and use it, so you do not keep
> mimicing 1950's technology in SQL.
>|||>> An identity column is an acceptable candidate for a primary key, .. <<
Can you give me one authority for that statement? Damn that Dr. Codd!
Where did he and the last 30+ years of RDBMS definitions and theory
screw up? Now that you got it right, why not publish a paper with all
the formal proofs? By definition, a relational key is a subset of
attributes, not the internal state of the hardware in which the data is
stord.
And the answer to all questions is 42 when the numbrew get too large
for you to compute correctly? Has it ever occured to you that if you
have a (n) column natural key, then you HAVE to guarantee its
uniqueness? Well, only if you want to have data integrity, to avoid
redundant storage, and have validationa dn verification of your data.
But that would mean learning RDBMS, taking the time to do it right and
all those other anoiding things that professionals programmers do.
But, a "cowboy coder" jsut needs to pop IDENTITY on every table and
start coding.
LOL! Did yoiu notice that the problem in this thread was that IDENTITY
has complete duplicates for entiites in the **same schema**' Again,
look at the definition of a key. It must always refer to the same
entity in the same schema or (better) in the entire universe of
discourse (VIN) .
This non-key changes in the schema and no meaning in the Universe as
well as having no way to verify or validate it. What should have been
a simple INSERT INTO has to use proprietary kludges, special functions
and when the dat i smoved, it still will have no data integrity.
One of the advantages of an industry standard is that when it changes,
it changes everywhere, not just on one local machine. Have you looked
at the changes in retail with the UPC codes? Do you think that these
changes will destroy the Retail Industry? Do you think that Retail
would work better if every POS computer in the world used an IDENTITY?
You don't know what a key is, and you don't know what a surrogate key
is. Please do some reading before Ishow up at your job to try to save
a screwed up database at an insanely high per-day rate. Wait a minute,
what am I saying' :)
.|||Add a uniqueidentifier column populate all rows in all existing DB's with
newid()
Update t1
Set GUID = NEWID()
etc.
Once you've done that, peform an insert into your t1 and let SQL create new
identity values.
OK, here's the key to the system.
Now if you do the following SQL statements...
Select
a.Identity As OLDID,
b.Identity as NEWID
From
OldDB..t1 a
Join NewDB..t1 b on a.GUID = b.GUID
that way you can match the old ID to the new ID.
Then you can use this as part of a lookup so that when you perform the
insert for T2,3 and 4 you can substitute the new ID.
"--CELKO--" <jcelko212@.earthlink.net> wrote in message
news:1122312065.439046.68720@.g43g2000cwa.googlegroups.com...
> Can you give me one authority for that statement? Damn that Dr. Codd!
> Where did he and the last 30+ years of RDBMS definitions and theory
> screw up? Now that you got it right, why not publish a paper with all
> the formal proofs? By definition, a relational key is a subset of
> attributes, not the internal state of the hardware in which the data is
> stord.
>
> And the answer to all questions is 42 when the numbrew get too large
> for you to compute correctly? Has it ever occured to you that if you
> have a (n) column natural key, then you HAVE to guarantee its
> uniqueness? Well, only if you want to have data integrity, to avoid
> redundant storage, and have validationa dn verification of your data.
> But that would mean learning RDBMS, taking the time to do it right and
> all those other anoiding things that professionals programmers do.
> But, a "cowboy coder" jsut needs to pop IDENTITY on every table and
> start coding.
>
> LOL! Did yoiu notice that the problem in this thread was that IDENTITY
> has complete duplicates for entiites in the **same schema**' Again,
> look at the definition of a key. It must always refer to the same
> entity in the same schema or (better) in the entire universe of
> discourse (VIN) .
> This non-key changes in the schema and no meaning in the Universe as
> well as having no way to verify or validate it. What should have been
> a simple INSERT INTO has to use proprietary kludges, special functions
> and when the dat i smoved, it still will have no data integrity.
> One of the advantages of an industry standard is that when it changes,
> it changes everywhere, not just on one local machine. Have you looked
> at the changes in retail with the UPC codes? Do you think that these
> changes will destroy the Retail Industry? Do you think that Retail
> would work better if every POS computer in the world used an IDENTITY?
> You don't know what a key is, and you don't know what a surrogate key
> is. Please do some reading before Ishow up at your job to try to save
> a screwed up database at an insanely high per-day rate. Wait a minute,
> what am I saying' :)
> .
>|||Xref: TK2MSFTNGP08.phx.gbl microsoft.public.sqlserver.programming:541372
Hi Joe,
Your message, like many of your other anti-IDENTITY messages, uses many
terribly bad arguments. And I think that's a shame. You know, you might
well have a very valid point, in saying that IDENTITY and other methods
for generating meaningless surrogate keys have no plcae in a relational
database - but the arguments you use to defend this case are so flawed
that they fail to convince me.
I've been a strong defender of natural keys for a long time. Lately,
I've seen some very convincing arguments from people in favor of
surrogate keys. This has caused a shift in opinion - I am now in favor
of judging each individual situation to decide whether or not to use a
surrogate key.
I'm open to new arguments that might have me return to the "natural keys
only" point of view. But they have to be better than what you'e writing
ion the groups lately. I'll illustrate this by showing how flawed the
arguments in this particular message are.
On 25 Jul 2005 10:21:05 -0700, --CELKO-- wrote:

> Can you give me one authority for that statement?
How about Dr. Codd? "..Database users may cause the system to generate
or delete a surrogate, but they have no control over its value, nor is
its value ever displayed to them ..." (CODD 1979, pp 409-410)
From: Codd, E. (1979), Extending the database relational model to
capture more meaning. ACM Transactions on Database Systems, 4(4). pp.
397-434.

> Damn that Dr. Codd!
> Where did he and the last 30+ years of RDBMS definitions and theory
>screw up?
Could you please tell me which of Codd's 12 rules prohibits the use of
IDENTITY as a primary key, and why? I fail to see a violation when I
check the rules, so please enlighten me.

> Now that you got it right, why not publish a paper with all
>the formal proofs?
Cynicism != arguments (or, in ANSI: cynicism <> arguments).

> By definition, a relational key is a subset of
>attributes, not the internal state of the hardware in which the data is
>stord.
A relational key, yes. But how does that definition preclude the use of
a surrogate key in addition to the natural key (which is, indeed, a
subset of attributes).
Oh and by the way - an IDENTITY is in no way dependent on the internal
state of the hardware. If you save a script with SQL commands that
create a table and populate it with data, then you can run it on as many
different machines as you wish, you'll always get the same identity
values. Not that it matters (since the value of a surrogate key is
irrelevant) - but these kind of flagrant misunderstandings about how
IDENTITY works seriously weaken your credibility, and as a result weaken
your other arguments as well.

>And the answer to all questions is 42 when the numbrew get too large
>for you to compute correctly?
I'm always in for references to Douglas Adams, but in this case I fail
to see how your remark relates to the quote directly above it.

> Has it ever occured to you that if you
>have a (n) column natural key, then you HAVE to guarantee its
>uniqueness? Well, only if you want to have data integrity, to avoid
>redundant storage, and have validationa dn verification of your data.
Has it ever occured to you that one single UNIQUE constraint *will*
guarantee the uniqueness of said (n) column natural key, even if the
table sports a surrogate primary key as well?

>But that would mean learning RDBMS, taking the time to do it right and
>all those other anoiding things that professionals programmers do.
Cynicism != arguments (or, in ANSI: cynicism <> arguments).

>But, a "cowboy coder" jsut needs to pop IDENTITY on every table and
>start coding.
Yup, you're right. Cowboy coders abuse the IDENTITY property, and later
they (or rather: their employers and customers) come to regret it. So
let's all try to teach these cowboys to use the tools properly, instead
of trying to keep other codes from actually using the tools for their
intended purpose.
There are many "cowboy drivers" who abuse their cars: they drive drunk,
neglect traffic lights and speed limits and run over little children. Do
you now think that everybody should be forbidden to use cars?

>LOL! Did yoiu notice that the problem in this thread was that IDENTITY
>has complete duplicates for entiites in the **same schema**'
Yes, the original post in this thread is a fine example of bad design.
It's a variation of the design flaw known as "attribute splitting".
Allthough there might be historical reasons why the situation has grown
to it's current situation, there's no denying that the current situation
is not something you or I would design.
The good news is that the original poster tries to fix it, instead of
using a kludge that would really destroy the database. The other good
news is that fixing this particular problems (two different entities
have the same surrogate key value) is not quite as hard as fixing a
similar problem I've seen with natural keys: two different entities
having the same *natural* key value!

> Again,
>look at the definition of a key. It must always refer to the same
>entity in the same schema or (better) in the entire universe of
>discourse (VIN) .
It seems that you keep forgetting that a surrogate key is a surrogate
for something - for the natural key. The natural key (that is included
in the main table, and constrained by a UNIQUE constraint) refers to the
same entity in the same schema; the surrogate key is used to refer to
that row from other places.

>One of the advantages of an industry standard is that when it changes,
>it changes everywhere, not just on one local machine. Have you looked
>at the changes in retail with the UPC codes? Do you think that these
>changes will destroy the Retail Industry? Do you think that Retail
>would work better if every POS computer in the world used an IDENTITY?
And one of the advantages of using a surrogate key is that changes to
the industry standard take far less downtime - none if things are
properly prepared.

>You don't know what a key is, and you don't know what a surrogate key
>is. Please do some reading before Ishow up at your job to try to save
>a screwed up database at an insanely high per-day rate. Wait a minute,
>what am I saying' :)
I don't know what the original poster does or doesn't know, so let's
forget the ad-hom. But please show me how a design like below would
require your insanely expensive intervention. How would the use of the
IDENTITY property screw up my integrity in this case?
CREATE TABLE Products (ProductID int NOT NULL IDENTITY PRIMARY KEY,
UPC char(10) NOT NULL UNIQUE,
-- other columns,
);
CREATE TABLE Orders (OrderNo int NOT NULL PRIMARY KEY,
-- other columns,
);
CREATE TABLE OrderProducts (ProductID int NOT NULL,
OrderNo int NOT NULL,
Amount int NOT NULL CHECK (Amount > 0),
-- other columns,
PRIMARY KEY (ProductID, OrderNo),
FOREIGN KEY (ProductID)
REFERENCES Products(ProductID),
FOREIGN KEY (OrderNum)
REFERENCES Orders(OrderNum),
);
I'm sorry for the long-winded reply. I've been saving this up for quite
some time now. Even back when I was still in the natural key only camp,
I often found myself grinding my teeth, asking myself why you did such a
lousy job of defending the Right Way To Do It. When I started getting
doubts, I found myself reading youyr messages, hoping to find some good
arguments not to change my mind - but I only saw the same flawed
arguments over and over again. Often, I started to write a message - and
just as often, I held myself back. Asking myself: "Why bother?"
This time, I won't hold back. I still think that you might be defending
a very good point. But you do need to supply better arguments. If you
can't - well, I guess that I'll than have to conclude that I was right
in changing my mind about surrogate keys...
Awaiting your reply!
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||"Thomas Coleman" <replyingroup@.anywhere.com> wrote in message
news:uotSpGTkFHA.3756@.TK2MSFTNGP15.phx.gbl...
> Tilt that windmill Don Celko.
>
> Thomas
>
Snipe that Celko, Thomas.
Jeremy|||Hugo Kornelis wrote:
> I don't know what the original poster does or doesn't know, so let's
> forget the ad-hom. But please show me how a design like below would
> require your insanely expensive intervention. How would the use of the
> IDENTITY property screw up my integrity in this case?
> CREATE TABLE Products (ProductID int NOT NULL IDENTITY PRIMARY KEY,
> UPC char(10) NOT NULL UNIQUE,
> -- other columns,
> );
> CREATE TABLE Orders (OrderNo int NOT NULL PRIMARY KEY,
> -- other columns,
> );
> CREATE TABLE OrderProducts (ProductID int NOT NULL,
> OrderNo int NOT NULL,
> Amount int NOT NULL CHECK (Amount > 0),
> -- other columns,
> PRIMARY KEY (ProductID, OrderNo),
> FOREIGN KEY (ProductID)
> REFERENCES Products(ProductID),
> FOREIGN KEY (OrderNum)
> REFERENCES Orders(OrderNum),
> );
>
--BEGIN PGP SIGNED MESSAGE--
Hash: SHA1
Hugo,
Always enjoy your posts. How about this for your example - I'll use
natural keys instead of IDENTITY keys (I admit I use IDENTITY alot in my
DBs, though I don't set them as PKs. Set them as UNIQUE instead):
CREATE TABLE Products (UPC char(10) NOT NULL PRIMARY KEY,
-- other columns,
);
CREATE TABLE Orders (OrderNo int NOT NULL IDENTITY UNIUQE,
order_date DATETIME NOT NULL,
customer_id int NOT NULL,
-- other columns,
PRIMARY KEY (order_date, customer_id)
);
The PK in Orders assumes only 1 order per customer per date (just for
example's sake) ;-). OrderNo acts as a surrogate key (as I understand
surrogates).
CREATE TABLE OrderProducts (UPC char(10) NOT NULL
FOREIGN KEY REFERENCES Products (UPC)
ON DELETE CASCADE
ON UPDATE CASCADE,
OrderNo int NOT NULL
FOREIGN KEY REFERENCES Orders(OrderNo)
ON DELETE CASCADE
ON UPDATE CASCADE,
OrderAmount int NOT NULL CHECK (Amount > 0),
-- other columns,
PRIMARY KEY (UPC, OrderNo),
);
The reason I use OrderNo as a surrogate key is to avoid this:
CREATE TABLE OrderProducts (UPC char(10) NOT NULL
FOREIGN KEY REFERENCES Products (UPC)
ON DELETE CASCADE
ON UPDATE CASCADE,
order_date DATETIME NOT NULL
FOREIGN KEY REFERENCES Orders
ON DELETE CASCADE
ON UPDATE CASCADE,
customer_id int NOT NULL
FOREIGN KEY REFERENCES Orders
ON DELETE CASCADE
ON UPDATE CASCADE,
OrderAmount int NOT NULL CHECK (Amount > 0),
-- other columns,
PRIMARY KEY (UPC, order_date, customer_id),
);
MGFoster:::mgf00 <at> earthlink <decimal-point> net
Oakland, CA (USA)
--BEGIN PGP SIGNATURE--
Version: PGP for Personal Privacy 5.0
Charset: noconv
iQA/ AwUBQuVX1IechKqOuFEgEQJs2QCfVP9Cy7NKU6YR
T96fVlZ/uN71dSQAn2OL
KN52zWrCb83lQKGiTvlJuBq5
=voUL
--END PGP SIGNATURE--

No comments:

Post a Comment