Tuesday, 1 October 2013

Why do you store a GUID in your database?

I remember way back when I 1st came across a database that used GUIDs to identify uniqueness,

Well today I don't see the point in storing the GUID.

A GUID is 128bits and displayed as a 32 byte hexadecimal string with an additional 4 bytes in separators displayed as Hyphens.

Most of the "Stored" version of a GUID that I have come across  is in a 36 Byte Unicode string and this is just because the Warehouse architect didn't care about how much data he loaded on the warehouse or its client.

Lets do some number crunching:

So lets say that you have 100 million rows in a table, the table has a row GUID and a Related GIUD (Forging Key) both stored in an NVARCHAR(36) column, this means that each row consumes 72 Bytes of data and that means that the table consumes a total of 6.7GB of data in identities alone.... so here is my suggestion, how about we use the GUID only as a seed in a hash and convert the hash to a Big INT IE:

 This way we cut down the Stored GUID from 36 bytes to 8 Bytes and thus reducing the size of stored identities by a factor of 4.5 times, in our example above that would bring the space consumed down to 1.4GB.

If you now think that hold on how do I trace back to my source data (if you convert a system GUID (SourceKey) to a hashed BIG INT) well that is simple since you know how you seed the hash you can always calculate it and find it in your source and warehouse.

Happy hunting!

No comments:

Post a Comment