Why isn't the SQL ANSI-92 standard adopted more widely than ANSI-89?

At every company I have worked at, I have found that people still write their SQL queries in the ANSI-89 standard:

select a.id, b.id, b.address_1
from person a, address b
where a.id = b.id

rather than the ANSI-92 standard:

select a.id, b.id, b.address_1
from person a
inner join address b
on a.id = b.id

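For an extremely simple query like this, there's not a big difference in readability. But for large queries, I find that grouping my join criteria with the list of tables makes it much easier to see where my joins might have problems, and it lets me keep all my filtering in the WHERE clause. Not to mention that I find outer joins much more intuitive than the (+) syntax in Oracle.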

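As I try to evangelize ANSI-92 to people, are there any concrete performance benefits to using ANSI-92 over ANSI-89? I would try it on my own, but the Oracle setups we have here don't allow us to use EXPLAIN PLAN - wouldn't want people to try to optimize their code, would ya?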


According to "SQL Performance Tuning" by Peter Gulutzan and Trudy Pelzer, of the six or eight RDBMS brands they tested, there was no difference in optimization or performance of SQL-89 versus SQL-92 style joins. One can assume that most RDBMS engines transform the syntax into an internal representation before optimizing or executing the query, so the human-readable syntax makes no difference.
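The book's point can be spot-checked on an engine you can run locally. Below is a minimal sketch that uses SQLite through Python's built-in sqlite3 module (not Oracle, which is what the question is about). It loads a few made-up rows into the question's person/address tables, runs both join styles, and compares the result sets and the detail text of SQLite's EXPLAIN QUERY PLAN. Identical output here only shows what SQLite does with this toy schema; it is not a benchmark.

```python
import sqlite3


def compare_join_styles():
    """Run the question's two queries against a throwaway in-memory SQLite DB."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE address (id INTEGER PRIMARY KEY, address_1 TEXT);
        INSERT INTO person  VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO address VALUES (1, '1 Main St'), (2, '2 Oak Ave');
    """)
    ansi89 = """SELECT a.id, b.id, b.address_1
                FROM person a, address b
                WHERE a.id = b.id
                ORDER BY a.id"""
    ansi92 = """SELECT a.id, b.id, b.address_1
                FROM person a
                INNER JOIN address b ON a.id = b.id
                ORDER BY a.id"""
    rows89 = con.execute(ansi89).fetchall()
    rows92 = con.execute(ansi92).fetchall()
    # Each EXPLAIN QUERY PLAN row ends with a human-readable "detail" column;
    # compare only that text so the check does not depend on internal ids.
    plan89 = [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + ansi89)]
    plan92 = [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + ansi92)]
    con.close()
    return rows89, rows92, plan89, plan92


if __name__ == "__main__":
    rows89, rows92, plan89, plan92 = compare_join_styles()
    print("same rows:", rows89 == rows92, "same plan:", plan89 == plan92)
```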

I also try to evangelize the SQL-92 syntax. Sixteen years after it was approved, it's about time people start using it! And all brands of SQL database now support it, so there's no reason to continue to use the nonstandard (+) Oracle syntax or *= Microsoft/Sybase syntax.

As for why it's so hard to break the developer community of the SQL-89 habit, I can only assume that there's a large "base of the pyramid" of programmers who code by copy & paste, using ancient examples from books, magazine articles, or another code base, and these people don't learn new syntax abstractly. Some people pattern-match, and some people learn by rote.

I am gradually seeing people using SQL-92 syntax more frequently than I used to, though. I've been answering SQL questions online since 1994.

A few reasons come to mind:

  • people do it out of habit
  • people are lazy and prefer the "old style" joins because they involve less typing
  • beginners often have problems wrapping their heads around the SQL-92 join syntax
  • people don't switch to new syntax just because it is there
  • people are unaware of the benefits the new (if you want to call it that) syntax has, primarily that it enables you to filter a table before you do an outer join, and not after it when all you have is the WHERE clause.
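The last point, filtering a table before the outer join rather than after it, is easiest to see side by side. Here is a minimal sketch in SQLite via Python's sqlite3 module. The person/address tables and the `kind` column are hypothetical.

```python
import sqlite3


def outer_join_filter_placement():
    """Same LEFT JOIN, with the filter on the outer-side table placed in ON vs. WHERE."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE address (person_id INTEGER, kind TEXT, address_1 TEXT);
        INSERT INTO person  VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO address VALUES (1, 'home', '1 Main St'),
                                   (2, 'work', '2 Oak Ave');
    """)
    # Filter in ON: address rows are restricted before the outer join,
    # so every person survives and unmatched people get NULLs.
    in_on = con.execute("""
        SELECT p.name, a.address_1
        FROM person p
        LEFT JOIN address a ON a.person_id = p.id AND a.kind = 'home'
        ORDER BY p.id""").fetchall()
    # Filter in WHERE: applied after the outer join. The NULL-extended rows
    # fail a.kind = 'home', so here the result matches an inner join.
    in_where = con.execute("""
        SELECT p.name, a.address_1
        FROM person p
        LEFT JOIN address a ON a.person_id = p.id
        WHERE a.kind = 'home'
        ORDER BY p.id""").fetchall()
    con.close()
    return in_on, in_where
```

With the predicate in ON, Bob and Cy stay in the result with NULL addresses. With it in WHERE, only Ann survives.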

For my part, I do all my joins in the SQL-92 syntax, and I convert code where I can. It's the cleaner, more readable and powerful way to do it. But it's hard to convince someone to use the new style, when they think it hurts them in terms of more typing work while not changing the query result.

I don't know the answer for sure... this is a religious war (albeit to a lesser degree than Mac vs. PC or others).

A guess is that until fairly recently, Oracle (and maybe other vendors as well) did not adopt the ANSI-92 standard (I think it was in Oracle v9, or thereabouts). So DBAs and DB developers at companies that were still using these versions (or that wanted code to be portable across servers that might be using these versions) had to stick to the old standard...

It's a shame really, because the new join syntax is much more readable, and the old syntax generates incorrect results in several well-documented scenarios.

  • Specifically, outer joins when there are conditional filtering predicates on non-join columns from the table on the "outer" side of the join.

I can answer from the point of view of an average developer who knows just enough SQL to understand both syntaxes but still googles the exact syntax of INSERT each time I need it... :-P (I don't do SQL all day, I just fix some problems from time to time.)

Well, actually, I find the first form more intuitive, making no apparent hierarchy between the two tables. The fact I learned SQL with possibly old books, showing the first form, probably doesn't help... ;-)
And the first reference I find when I search Google for "sql select" (which returns mostly French answers for me...) shows the older form first and then explains the second one.

Just giving some hints on the "why" question... ^_^ I should read a good, modern book (DB agnostic) on the topic. If somebody has suggestions...

Oracle does not implement ANSI-92 at all well. I've had several problems, not least because the data tables in Oracle Apps are so very well endowed with columns. If the number of columns in your joins exceeds about 1050 columns (which is very easy to do in Apps), then you will get this spurious error which makes absolutely no logical sense:

ORA-01445: cannot select ROWID from a join view without a key-preserved table.

Re-writing the query to use old style join syntax makes the issue disappear, which seems to point the finger of blame squarely at the implementation of ANSI-92 joins.

Until I encountered this problem, I was a steadfast promoter of ANSI-92 because it reduces the chance of an accidental cross join, which is far too easy to do with the old-style syntax.

Now, however, I find it much more difficult to insist on it. They point to Oracle's bad implementation and say "We'll do it our way, thanks."

I can't speak for all schools, but at my university, when we were doing the SQL module of our course, they didn't teach ANSI-92. They taught ANSI-89, and on an old VAX system at that! I wasn't exposed to ANSI-92 until I started digging around in Access, after building some queries in the query designer and then looking at the SQL code. I realised I had no idea how it was completing the joins or what the implications of the syntax were, so I started digging deeper to understand it.

Given that the available documentation isn't exactly intuitive in a lot of cases, and that people tend to stick to what they know and in many cases don't strive to learn any more than they need in order to get their job done, it's easy to see why adoption is taking so long.

Of course, there are those technical evangelists that like to tinker and understand and it tends to be those types that adopt the "newer" principles and try to convert the rest.

Oddly, it seems to me that a lot of programmers come out of school and stop advancing; thinking that because this is what they were taught, this is how it's done. It's not until you take off your blinkers that you realise that school was only meant to teach you the basics and give you enough understanding to learn the rest yourself and that really you barely scratched the surface of what there is to know; now it's your job to continue that path.

Of course, that's just my opinion based on my experience.

Well, the ANSI-92 standard includes some pretty heinous syntax. NATURAL JOIN is one and the USING clause is another. IMHO, adding a column to a table shouldn't break code, but a NATURAL JOIN breaks in a most egregious fashion. The "best" way to break is with a compilation error. For example, if you SELECT * somewhere, adding a column could make the code fail to compile. The next best way to fail is a runtime error. That's worse because your users may see it, but it still gives you a clear warning that you've broken something. If you use ANSI-92 and write queries with NATURAL joins, nothing breaks at compile time or at run time. The query just suddenly starts producing wrong results. These types of bugs are insidious: reports go wrong, and financial disclosures can end up incorrect.

For those unfamiliar with NATURAL joins: they join two tables on every column name that exists in both tables. That's really cool when you have a 4-column key and you're sick of typing it. The problem comes when Table1 has a pre-existing column named DESCRIPTION and you add a new column to Table2 named, oh I don't know, something innocuous like, mmm, DESCRIPTION. Now you're joining the two tables on a free-form VARCHAR2(1000) field.
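Here is a small reproduction of that failure mode, using SQLite via Python's sqlite3 module. table1/table2 and their columns are hypothetical stand-ins, with a 2-column key instead of 4 and TEXT instead of VARCHAR2(1000). The same query text runs before and after the schema change without any error:

```python
import sqlite3


def natural_join_after_schema_change():
    """Run one NATURAL JOIN query before and after adding a same-named column."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (k1 INTEGER, k2 INTEGER, description TEXT);
        CREATE TABLE table2 (k1 INTEGER, k2 INTEGER, amount INTEGER);
        INSERT INTO table1 VALUES (1, 1, 'first row'), (1, 2, 'second row');
        INSERT INTO table2 VALUES (1, 1, 100), (1, 2, 200);
    """)
    # NATURAL JOIN matches on every column name the two tables share;
    # right now that is just the key (k1, k2).
    query = """SELECT k1, k2, amount
               FROM table1 NATURAL JOIN table2
               ORDER BY k1, k2"""
    before = con.execute(query).fetchall()

    # An innocuous-looking change: table2 gets its own free-form DESCRIPTION.
    con.executescript("""
        ALTER TABLE table2 ADD COLUMN description TEXT;
        UPDATE table2 SET description = 'some note';
    """)
    # Same query text and no error, but it now also joins on description,
    # and no row pair has matching free-form text.
    after = con.execute(query).fetchall()
    con.close()
    return before, after
```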

The USING clause can lead to total ambiguity in addition to the problem described above. In another SO post, someone showed this ANSI-92 SQL and asked for help reading it.

SELECT c.*
FROM companies AS c
JOIN users AS u USING(companyid)
JOIN jobs AS j USING(userid)
JOIN useraccounts AS us USING(userid)
WHERE j.jobid = 123

This is completely ambiguous. I put a UserID column in both Companies and user tables and there's no complaint. What if the UserID column in companies is the ID of the last person to modify that row?

I'm serious: can anyone explain why such ambiguity was necessary? Why is it built straight into the standard?
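What an engine does when USING(userid) matches a column in more than one table on the left (reject it, or silently pick one) may be engine-specific, so the sketch below does not rely on it. Instead, it writes the same query with explicit ON conditions, in SQLite via Python's sqlite3 module, with hypothetical data where companies.userid means "last modified by". The second query spells out the other reading a maintainer could plausibly assume, and it returns a different answer:

```python
import sqlite3


def explicit_on_version():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- companies.userid is the hypothetical "last modified by" column
        CREATE TABLE companies    (companyid INTEGER PRIMARY KEY, name TEXT, userid INTEGER);
        CREATE TABLE users        (userid INTEGER PRIMARY KEY, companyid INTEGER, name TEXT);
        CREATE TABLE jobs         (jobid INTEGER PRIMARY KEY, userid INTEGER);
        CREATE TABLE useraccounts (userid INTEGER PRIMARY KEY, login TEXT);
        INSERT INTO companies    VALUES (1, 'Acme', 99);
        INSERT INTO users        VALUES (10, 1, 'Ann'), (99, 2, 'Admin');
        INSERT INTO jobs         VALUES (123, 10);
        INSERT INTO useraccounts VALUES (10, 'ann'), (99, 'admin');
    """)
    # Every join condition names both sides, so jobs and useraccounts are tied
    # to the user (u.userid), never to the company's last-modifier column.
    intended = con.execute("""
        SELECT c.companyid, c.name
        FROM companies AS c
        JOIN users AS u         ON u.companyid = c.companyid
        JOIN jobs AS j          ON j.userid = u.userid
        JOIN useraccounts AS us ON us.userid = u.userid
        WHERE j.jobid = 123""").fetchall()
    # The other plausible reading: jobs and accounts matched on c.userid.
    wrong_column = con.execute("""
        SELECT c.companyid, c.name
        FROM companies AS c
        JOIN users AS u         ON u.companyid = c.companyid
        JOIN jobs AS j          ON j.userid = c.userid
        JOIN useraccounts AS us ON us.userid = c.userid
        WHERE j.jobid = 123""").fetchall()
    con.close()
    return intended, wrong_column
```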

I think Bill is correct that there is a large base of developers who copy/paste their way through coding. In fact, I can admit that I'm kind of one when it comes to ANSI-92. Every example I ever saw showed multiple joins nested in parentheses. Honestly, that makes picking out the tables in the SQL difficult at best. But then an SQL-92 evangelist explained that this would actually force a join order. JESUS... all those copy/pasters I've seen are now actually forcing a join order, a job that 95% of the time is better left to the optimizer, especially for a copy/paster.

Tomalak got it right when he said,

people don't switch to new syntax just because it is there

It has to give me something and I don't see an upside. And if there is an upside, the negatives are an albatross too big to be ignored.

Inertia and practicality.

ANSI-92 SQL is like touch-typing. In some theoretical way it might make everything better someday, but I can type much faster looking at the keys with four fingers now. I would need to go backwards in order to go forwards, with no guarantee that there would ever be a pay-off.

Writing SQL is about 10% of my job. If I need ANSI-92 SQL to solve a problem that ANSI-89 SQL can't solve then I'll use it. (I use it in Access, in fact.) If using it all the time would help me solve my existing problems much faster, I'd spend the time to assimilate it. But I can whip out ANSI-89 SQL without ever thinking about the syntax. I get paid to solve problems--thinking about SQL syntax is a waste of my time and of my employer's money.

Someday, young Grasshopper, you'll be defending your use of ANSI-92 SQL syntax against young people whining that you should be using SQL3 (or whatever). And then you'll understand. :-)

First let me say that in SQL Server the outer join syntax (*=) does not give correct results all the time. There are times when it interprets that as a cross join and not an outer join. So right there is a good reason to stop using it. And that outer join syntax is a deprecated feature and will not be in the next version of SQL Server after SQL Server 2008. You'll still be able to do the inner joins but why on earth would anyone want to? They are unclear and much much harder to maintain. You don't easily know what is part of the join and what is really just the where clause.

One reason why I believe you should not use the old syntax is that understanding joins and what they do and do not do is a critical step for anyone who will write SQL code. You should not write any SQL code without understanding joins thoroughly. If you understand them well, you will probably come to the conclusion that the ANSI-92 syntax is clearer and easier to maintain. I've never met a SQL expert who didn't use the ANSI-92 syntax in preference to the old syntax.

Most people who I have met or dealt with who use the old code, truly don't understand joins and thus get into trouble when querying the database. This is my personal experience so I'm not saying it is always true. But as a data specialist, I've had to fix too much of this junk through the years not to believe it.

In response to the NATURAL JOIN and USING post above.

WHY would you ever see the need to use these - they weren't available in ANSI-89 and were added for ANSI-92 as what I can only see as a shortcut.

I would never leave a join to chance and would always specify the table/alias and id.

For me, the only way to go is ANSI-92. It is more verbose and the syntax isn't liked by ANSI-89 followers but it neatly separates your JOINS from your FILTERING.

I had a query that was originally written for SQL Server 6.5, which did not support the SQL 92 join syntax, i.e.

select foo.baz
from foo
left outer join bar
on foo.a = bar.a

was instead written as

select foo.baz
from foo, bar
where foo.a *= bar.a

The query had been around for a while, and the relevant data had grown to the point where the query ran too slowly, taking about 90 seconds to complete. By the time this problem arose, we had upgraded to SQL Server 7.

After mucking about with indexes and other Easter-egging, I changed the join syntax to be SQL 92 compliant. The query time dropped to 3 seconds.

There's a good reason to switch.

Reposted from here.

I was taught ANSI-89 in school and worked in industry for a few years. Then I left the fabulous world of DBMS for 8 years. But then I came back and this new ANSI 92 stuff was being taught. I have learned the Join On syntax and now I actually teach SQL and I recommend the new JOIN ON syntax.

But the downside I see is that correlated subqueries don't seem to make sense in light of ANSI-92 joins. When join information was included in the WHERE clause and correlated subqueries were "joined" in the WHERE clause too, everything seemed right and consistent. In ANSI-92, the table join criteria are not in the WHERE clause but the subquery "join" is, so the syntax seems inconsistent. On the other hand, trying to "fix" this inconsistency would probably just make it worse.
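To make the inconsistency concrete, here is the same query written both ways, as a sketch in SQLite via Python's sqlite3 module. The person/address/orders tables are hypothetical. In the ANSI-92 version, the table join sits in ON while the subquery's correlation predicate still sits in a WHERE:

```python
import sqlite3


def correlated_subquery_styles():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE address (person_id INTEGER, address_1 TEXT);
        CREATE TABLE orders  (id INTEGER PRIMARY KEY, person_id INTEGER);
        INSERT INTO person  VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO address VALUES (1, '1 Main St'), (2, '2 Oak Ave');
        INSERT INTO orders  VALUES (10, 1);
    """)
    # ANSI-89: the table join and the subquery's correlation both sit in WHERE.
    ansi89 = con.execute("""
        SELECT p.name, a.address_1
        FROM person p, address a
        WHERE a.person_id = p.id
          AND EXISTS (SELECT 1 FROM orders o WHERE o.person_id = p.id)
        ORDER BY p.id""").fetchall()
    # ANSI-92: the table join moves to ON, but the correlation predicate
    # still lives in the subquery's WHERE clause.
    ansi92 = con.execute("""
        SELECT p.name, a.address_1
        FROM person p
        JOIN address a ON a.person_id = p.id
        WHERE EXISTS (SELECT 1 FROM orders o WHERE o.person_id = p.id)
        ORDER BY p.id""").fetchall()
    con.close()
    return ansi89, ansi92
```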

1) Standard way to write OUTER JOIN, versus *= or (+)=

2) NATURAL JOIN

3) Depending on the database engine, ANSI-92 tends to be better optimized.

4) Manual optimization :

Let's say that we have the following query (ANSI-89):

(1)
select * from TABLE_OFFICES tof, BIG_TABLE_USERS btu
where tof.iduser = btu.iduser and tof.idoffice = 1

It could be written as:

(2)
select * from TABLE_OFFICES tof
inner join BIG_TABLE_USERS btu on tof.iduser = btu.iduser
where tof.idoffice = 1

But also as:

(3)
select * from TABLE_OFFICES tof
inner join BIG_TABLE_USERS btu on tof.iduser = btu.iduser and tof.idoffice = 1

All of them, (1), (2) and (3), return the same result, but they may be optimized differently. It depends on the database engine, but most of them do the following:

  • (1) it's up to the database engine to decide how to optimize.
  • (2) it joins both tables, then filters by office.
  • (3) it uses the idoffice filter to limit the rows joined from BIG_TABLE_USERS, then joins both tables.
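Here is a quick check that the three forms really return the same rows, as a sketch in SQLite via Python's sqlite3 module with a few hypothetical rows. The alias `to` is renamed `tof` because TO is a reserved word in many SQL dialects, and the `tbu`/`btu` alias mismatch is fixed. This compares results only. Whether the plans differ depends on the engine.

```python
import sqlite3


def office_query_variants():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE TABLE_OFFICES   (idoffice INTEGER, iduser INTEGER);
        CREATE TABLE BIG_TABLE_USERS (iduser INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO TABLE_OFFICES   VALUES (1, 10), (1, 11), (2, 12);
        INSERT INTO BIG_TABLE_USERS VALUES (10, 'Ann'), (11, 'Bob'), (12, 'Cy');
    """)
    variants = {
        # (1) ANSI-89: join and filter both in WHERE
        1: """SELECT * FROM TABLE_OFFICES tof, BIG_TABLE_USERS btu
              WHERE tof.iduser = btu.iduser AND tof.idoffice = 1""",
        # (2) join in ON, filter in WHERE
        2: """SELECT * FROM TABLE_OFFICES tof
              INNER JOIN BIG_TABLE_USERS btu ON tof.iduser = btu.iduser
              WHERE tof.idoffice = 1""",
        # (3) join and filter both in ON
        3: """SELECT * FROM TABLE_OFFICES tof
              INNER JOIN BIG_TABLE_USERS btu
                ON tof.iduser = btu.iduser AND tof.idoffice = 1""",
    }
    # Sort in Python so row order cannot differ between variants.
    results = {k: sorted(con.execute(sql).fetchall()) for k, sql in variants.items()}
    con.close()
    return results
```

The three forms are equivalent because the join is an INNER JOIN. With a LEFT JOIN, moving `tof.idoffice = 1` between ON and WHERE can change the result.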

5) Longer queries are less messy.

Reasons people use ANSI-89 from my practical experience with old and young programmers and trainees and fresh graduates:

  • They learn SQL from existing code they see (rather than books) and learn ANSI-89 from code
  • They use ANSI-89 because it is less typing
  • They do not think about it. They use one style or the other, and they do not know (or care) which of the two is considered new or old
  • The idea that code is also a communication to the next programmer coming along maintaining the code does not exist. They think they talk to the computer and the computer does not care.
  • The art of "clean coding" is unknown
  • Knowledge of programming language and SQL specifically is so poor that they copy and paste together what they find elsewhere
  • Personal preference

I personally prefer ANSI-92, and I rewrite every ANSI-89 query I see, sometimes only to better understand the SQL statement at hand. But I have realized that most of the people I work with are not skilled enough to write joins over many tables. They code as well as they can and use whatever they memorized the first time they encountered a SQL statement.

Here are a few points comparing SQL-89, and SQL-92 and clearing up some misconceptions in other answers.

  1. NATURAL JOINS are a horrible idea. They're implicit and they require meta-information about the table. Nothing about SQL-92 requires their use so simply ignore them. They're not relevant to this discussion.
  2. USING is a great idea, it has two effects:
    1. It produces only one column on the result set from an equijoin.
    2. It enforces a sound and sane convention. In SQL-89, people wrote a column named id in both tables. After you join the tables, this becomes ambiguous and requires explicit aliasing. Further, the ids in the join almost certainly held different data. If you join person to company, you now have to alias one id to person_id and the other to company_id, or the join produces two ambiguous columns. Giving each table's surrogate key a globally unique column name is the convention the standard rewards with USING.
  3. The SQL-89 syntax is an implicit CROSS JOIN. A CROSS JOIN doesn't reduce the set; it grows it. FROM T1,T2 is the same as FROM T1 CROSS JOIN T2, which produces a Cartesian product that is usually not what you want. Moving the predicates that narrow it down into a distant WHERE condition means you're more likely to make mistakes during design.
  4. SQL-89 commas and SQL-92 explicit JOINs have different precedence: JOIN has the higher precedence. Even worse, some databases like MySQL got this wrong for a very long time. So mixing the two styles is a bad idea, and the far more popular style today is the SQL-92 style.
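Points 2 and 3 can be checked directly. Below is a minimal sketch in SQLite via Python's sqlite3 module, with hypothetical person/company tables that follow the person_id/company_id naming convention. SELECT * over a USING join yields a single company_id column while the equivalent ON join yields two, and a comma join with no WHERE predicate produces as many rows as an explicit CROSS JOIN (3 × 2 = 6):

```python
import sqlite3


def using_and_comma_join():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE company (company_id INTEGER PRIMARY KEY, company_name TEXT);
        CREATE TABLE person  (person_id INTEGER PRIMARY KEY,
                              company_id INTEGER, person_name TEXT);
        INSERT INTO company VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO person  VALUES (10, 1, 'Ann'), (11, 1, 'Bob'), (12, 2, 'Cy');
    """)
    # Column names of SELECT * for the same equijoin written with USING vs. ON.
    using_cur = con.execute("SELECT * FROM person JOIN company USING (company_id)")
    using_cols = [d[0] for d in using_cur.description]
    using_cur.fetchall()
    on_cur = con.execute("SELECT * FROM person JOIN company "
                         "ON person.company_id = company.company_id")
    on_cols = [d[0] for d in on_cur.description]
    on_cur.fetchall()
    # A comma join with no WHERE predicate is a CROSS JOIN: every pairing.
    comma_rows = con.execute("SELECT COUNT(*) FROM person, company").fetchone()[0]
    cross_rows = con.execute(
        "SELECT COUNT(*) FROM person CROSS JOIN company").fetchone()[0]
    con.close()
    return using_cols, on_cols, comma_rows, cross_rows
```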

A new SQL standard inherits everything from the previous standard, a.k.a. 'the shackles of compatibility'. So the 'old' / 'comma-separated' / 'unqualified' join style is perfectly valid SQL-92 syntax.

Now, I argue that SQL-92's NATURAL JOIN is the only join you need. For example, I argue it is superior to inner join because it does not generate duplicate columns: no more range variables in SELECT clauses to disambiguate columns! But I can't be expected to change every heart and mind, so I need to work with coders who will continue to use what I personally consider legacy join styles (and they may even refer to range variables as 'aliases'!). This is the nature of teamwork and not operating in a vacuum.

One of the criticisms of the SQL language is that the same result can be obtained using a number of semantically-equivalent syntaxes (some using relational algebra, some using the relational calculus), where choosing the 'best' one simply comes down to personal style. So I'm as comfortable with the 'old-style' joins as I am with INNER. Whether I'd take the time to rewrite them as NATURAL depends on context.