小开

I work with a student management system which uses UUID's in the form of an integer. They have a table which hold the next unique ID.

Although this is probably a good idea for an architectural point of view, it makes working with on a daily basis difficult. Sometimes there is a need to do bulk inserts and having a UUID makes this very difficult, usually requiring writing a cursor instead of a simple SELECT INTO statement.

小开

I think using a GUID would be the better choice in your situation. It takes up more space but it's more secure.

小开

最佳答案

I can't say about the web side of your question. But uuids are great for n-tier applications. PK generation can be decentralized: each client generates it's own pk without risk of collision. And the speed difference is generally small.

Make sure your database supports an efficient storage datatype (16 bytes, 128 bits). At the very least you can encode the uuid string in base64 and use char(22).

I've used them extensively with Firebird and do recommend.

小开

I can answer you that in SQL server if you use a uniqueidentifier (GUID) datatype and use the NEWID() function to create values you will get horrible fragmentation because of page splits. The reason is that when using NEWID() the value generated is not sequential. SQL 2005 added the NEWSEQUANTIAL() function to remedy that

One way to still use GUID and int is to have a guid and an int in a table so that the guid maps to the int. the guid is used externally but the int internally in the DB

for example

457180FB-C2EA-48DF-8BEF-458573DA1C10    1
9A70FF3C-B7DA-4593-93AE-4A8945943C8A    2

1 and 2 will be used in joins and the guids in the web app. This table will be pretty narrow and should be pretty fast to query

小开

I don't think a GUID gives you many benefits. Users hate long, incomprehensible URLs.

Create a shorter ID that you can map to the URL, or enforce a unique user name convention (http://example.com/user/brianly). The guys at 37Signals would probably mock you for worrying about something like this when it comes to a web app.

Incidentally you can force your database to start creating integer IDs from a base value.

小开

For what it's worth, I've seen a long running stored procedure (9+ seconds) drop to just a few hundred milliseconds of run time simply by switching from GUID primary keys to integers. That's not to say displaying a GUID is a bad idea, but as others have pointed out, joining on them, and indexing them, by definition, is not going to be anywhere near as fast as with integers.

小开

Rather than URLs like this:

http://example.com/user/783

Why not have:

http://example.com/user/yukondude

Which is friendlier to humans and doesn't leak that tiny bit of information?

小开

You could use an integer which is related to the row number but is not sequential. For example, you could take the 32 bits of the sequential ID and rearrange them with a fixed scheme (for example, bit 1 becomes bit 6, bit 2 becomes bit 15, etc..).
This will be a bidirectional encryption, and you will be sure that two different IDs will always have different encryptions.
It would obviously be easy to decode, if one takes the time to generate enough IDs and get the schema, but, if I understand correctly your problem, you just want to not give away information too easily.

小开

I think that this is one of these issues that cause quasi-religious debates, and its almost futile to talk about. I would just say use what you prefer. In 99% of systems it will no matter which type of key you use, so the benefits (stated in the other posts) of using one sort over the other will never be an issue.

小开

We use GUIDs as primary keys for all our tables as it doubles as the RowGUID for MS SQL Server Replication. Makes it very easy when the client suddenly opens an office in another part of the world...

小开

It also depends on what you care about for your application. For n-tier apps GUIDs/UUIDs are simpler to implement and are easier to port between different databases. To produce Integer keys some database support a sequence object natively and some require custom construction of a sequence table.

Integer keys probably (I don't have numbers) provide an advantage for query and indexing performance as well as space usage. Direct DB querying is also much easier using numeric keys, less copy/paste as they are easier to remember.

小开

Why couple your primary key with your URI?

Why not have your URI key be human readable (or unguessable, depending on your needs), and your primary index integer based, that way you get the best of both worlds. A lot of blog software does that, where the exposed id of the entry is identified by a 'slug', and the numeric id is hidden away inside of the system.

The added benefit here is that you now have a really nice URL structure, which is good for SEO. Obviously for a transaction this is not a good thing, but for something like stackoverflow, it is important (see URL up top...). Getting uniqueness isn't that difficult. If you are really concerned, store a hash of the slug inside a table somewhere, and do a lookup before insertion.

edit: Stackoverflow doesn't quite use the system I describe, see Guy's comment below.

小开

As long as you use a DB system with efficient storage, HDD is cheap these days anyway...

I know GUID's can be a b*tch to work with some times and come with some query overhead however from a security perspective they are a savior.

Thinking security by obscurity they fit well when forming obscure URI's and building normalised DB's with Table, Record and Column defined security you cant go wrong with GUID's, try doing that with integer based id's.

小开

I've tried both in real web apps.

My opinion is that it is preferable to use integers and have short, comprehensible URLs.

As a developer, it feels a little bit awful seeing sequential integers and knowing that some information about total record count is leaking out, but honestly - most people probably don't care, and that information has never really been critical to my businesses.

Having long ugly UUID URLs seems to me like much more of a turn off to normal users.

小开

YouTube uses 11 characters with base64 encoding which offers 11^64 possibilities, and they are usually pretty manageable to write. I wonder if that would offer better performance than a full on UUID. UUID converted to base 64 would be double the size I believe.

More information can be found here: https://www.youtube.com/watch?v=gocwRvLhDf8

小开

Pros and Cons of UUID

Note: uuid_v7 is time based uuid instead of random. So you can use it to order by creation date and solve some performance issues with db inserts if you do really many of them.

Pros:

can be generated on api level (good for distributed systems)
hides count information about entity
doesn't have limit 2,147,483,647 as 32-bit int
removes layer of errors related to passing one entity id userId: 25 to get another bookId: 25 accidently
more friendly graphql usage as ID key

Cons:

128-bit instead 32-bit int (slightly bigger size in db and ~40% bigger index, around ~30MB for 1 million rows), should be a minor concern
can't be sorted by creation (can be solved with uuid_v7)
non-time-ordered UUID versions such as UUIDv4 have poor database index locality (can be solved with uuid_v7)

URL usage

Depending on app you may care or not care about url. If you don't care, just use uuid as is, it's fine.

If you care, then you will need to decide on url format.

Best case scenario is a use of unique slug if you ok with never changing it:

http://example.com/sale/super-duper-phone

If your url is generated from title and you want to change slug on title change there is a few options. Use it as is and query by uuid (slug is just decoration):

http://example.com/book/035a46e0-6550-11dd-ad8b-0800200c9a66/new-title

Convert it to base64url:

you can get uuid back from AYEWXcsicACGA6PT7v_h3A
AYEWXcsicACGA6PT7v_h3A - 22 characters
035a46e0-6550-11dd-ad8b-0800200c9a66 - 36 characters

http://example.com/book/AYEWXcsicACGA6PT7v_h3A/new-title

Generate a unique short 11 chars length string just for slug usage:

http://example.com/book/icACEWXcsAY-new-title
http://example.com/book/icACEWXcsAY/new-title

If you don't want uuid or short id in url and want only slug, but do care about seo and user bookmarks, you will need to redirect all request from

http://example.com/sale/phone-1-title

to

http://example.com/sale/phone-1-title-updated

this will add additional complexity of managing slug history, adding fallback to history for all queries where slug is used and redirects if slugs doesn't match

What's your opinion on using UUIDs as database row identifiers, particularly in web apps?

Pros and Cons of UUID

URL usage