MySQL Join Where Not Exists

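I have a MySQL query that joins two tables:

  • Voters
  • Households

They join on voters.household_id and household.id.

Now I need to modify it so that the voter table is also joined to a third table called elimination, on voter.id and elimination.voter_id. The catch is that I want to exclude any records in the voter table that have a corresponding record in the elimination table.

How do I craft a query to do this?

This is my current query: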

SELECT `voter`.`ID`, `voter`.`Last_Name`, `voter`.`First_Name`,
`voter`.`Middle_Name`, `voter`.`Age`, `voter`.`Sex`,
`voter`.`Party`, `voter`.`Demo`, `voter`.`PV`,
`household`.`Address`, `household`.`City`, `household`.`Zip`
FROM (`voter`)
JOIN `household` ON `voter`.`House_ID`=`household`.`id`
WHERE `CT` = '5'
AND `Precnum` = 'CTY3'
AND  `Last_Name`  LIKE '%Cumbee%'
AND  `First_Name`  LIKE '%John%'
ORDER BY `Last_Name` ASC
LIMIT 30

I'd probably use a LEFT JOIN, which will return rows even if there's no match, and then you can select only the rows with no match by checking for NULLs.

So, something like:

SELECT V.*
FROM voter V LEFT JOIN elimination E ON V.id = E.voter_id
WHERE E.voter_id IS NULL

Whether that's more or less efficient than using a subquery depends on optimization, indexes, whether it's possible to have more than one elimination per voter, etc.
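As a rough illustration of the LEFT JOIN / IS NULL pattern above, here is a minimal sketch. It assumes an in-memory SQLite database (via Python's sqlite3) as a stand-in for MySQL, and the sample rows and the `reason` column are made up. It also covers the "more than one elimination per voter" case: only unmatched voters survive the IS NULL filter, so duplicate elimination rows do not duplicate the output.

```python
import sqlite3

# SQLite stands in for MySQL here; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE voter (id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE elimination (voter_id INTEGER, reason TEXT);
    INSERT INTO voter VALUES (1, 'Adams'), (2, 'Baker'), (3, 'Clark');
    -- voter 2 has two elimination rows, voter 3 has one
    INSERT INTO elimination VALUES (2, 'moved'), (2, 'duplicate'), (3, 'deceased');
""")

# Anti-join: keep only voters for which the LEFT JOIN found no match.
kept = conn.execute("""
    SELECT V.id, V.last_name
    FROM voter V
    LEFT JOIN elimination E ON V.id = E.voter_id
    WHERE E.voter_id IS NULL
    ORDER BY V.id
""").fetchall()

# Without the IS NULL filter, voter 2 would appear once per elimination row.
unfiltered = conn.execute("""
    SELECT V.id
    FROM voter V
    LEFT JOIN elimination E ON V.id = E.voter_id
    ORDER BY V.id
""").fetchall()

print(kept)        # [(1, 'Adams')]
print(unfiltered)  # [(1,), (2,), (2,), (3,)]
```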

I'd use a 'where not exists' -- exactly as you suggest in your title:

SELECT `voter`.`ID`, `voter`.`Last_Name`, `voter`.`First_Name`,
`voter`.`Middle_Name`, `voter`.`Age`, `voter`.`Sex`,
`voter`.`Party`, `voter`.`Demo`, `voter`.`PV`,
`household`.`Address`, `household`.`City`, `household`.`Zip`
FROM (`voter`)
JOIN `household` ON `voter`.`House_ID`=`household`.`id`
WHERE `CT` = '5'
AND `Precnum` = 'CTY3'
AND  `Last_Name`  LIKE '%Cumbee%'
AND  `First_Name`  LIKE '%John%'


AND NOT EXISTS (
SELECT * FROM `elimination`
WHERE `elimination`.`voter_id` = `voter`.`ID`
)


ORDER BY `Last_Name` ASC
LIMIT 30

That may be marginally faster than doing a left join (of course, depending on your indexes, cardinality of your tables, etc), and is almost certainly much faster than using IN.
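To try the NOT EXISTS version end to end, a small sketch along these lines may help. It assumes SQLite (through Python's sqlite3) in place of MySQL, a cut-down set of columns, invented sample rows, and that `CT` and `Precnum` live on the voter table, which the question does not confirm.

```python
import sqlite3

# Assumed schema: a reduced version of the tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE household (id INTEGER PRIMARY KEY, Address TEXT, City TEXT, Zip TEXT);
    CREATE TABLE voter (ID INTEGER PRIMARY KEY, Last_Name TEXT, First_Name TEXT,
                        House_ID INTEGER, CT TEXT, Precnum TEXT);
    CREATE TABLE elimination (voter_id INTEGER);
    INSERT INTO household VALUES (10, '1 Main St', 'Springfield', '00001');
    INSERT INTO voter VALUES
        (1, 'Cumbee', 'John',   10, '5', 'CTY3'),
        (2, 'Cumbee', 'Johnny', 10, '5', 'CTY3'),
        (3, 'Smith',  'John',   10, '5', 'CTY3');
    -- voter 2 matches the name filters but has been eliminated
    INSERT INTO elimination VALUES (2);
""")

sql = """
    SELECT voter.ID, voter.Last_Name, voter.First_Name, household.Address
    FROM voter
    JOIN household ON voter.House_ID = household.id
    WHERE voter.CT = ? AND voter.Precnum = ?
      AND voter.Last_Name LIKE ? AND voter.First_Name LIKE ?
      AND NOT EXISTS (SELECT 1 FROM elimination
                      WHERE elimination.voter_id = voter.ID)
    ORDER BY voter.Last_Name ASC
    LIMIT 30
"""
rows = conn.execute(sql, ("5", "CTY3", "%Cumbee%", "%John%")).fetchall()
print(rows)  # [(1, 'Cumbee', 'John', '1 Main St')]
```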

There are three possible ways to do that.

  1. Option

    SELECT  lt.*
    FROM    table_left lt
    LEFT JOIN
    table_right rt
    ON      rt.value = lt.value
    WHERE   rt.value IS NULL
    
  2. Option

    SELECT  lt.*
    FROM    table_left lt
    WHERE   lt.value NOT IN
    (
    SELECT  value
    FROM    table_right rt
    )
    
  3. Option

    SELECT  lt.*
    FROM    table_left lt
    WHERE   NOT EXISTS
    (
    SELECT  NULL
    FROM    table_right rt
    WHERE   rt.value = lt.value
    )
    
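The three options agree as long as `table_right.value` contains no NULLs, but NOT IN behaves differently once it does. Here is a minimal sketch of that difference, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL and made-up integer data. The `run_all` helper is only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_left (value INTEGER);
    CREATE TABLE table_right (value INTEGER);
    INSERT INTO table_left VALUES (1), (2), (3);
    INSERT INTO table_right VALUES (2);
""")

QUERIES = {
    "left_join": """SELECT lt.value FROM table_left lt
                    LEFT JOIN table_right rt ON rt.value = lt.value
                    WHERE rt.value IS NULL ORDER BY lt.value""",
    "not_in": """SELECT lt.value FROM table_left lt
                 WHERE lt.value NOT IN (SELECT value FROM table_right)
                 ORDER BY lt.value""",
    "not_exists": """SELECT lt.value FROM table_left lt
                     WHERE NOT EXISTS (SELECT NULL FROM table_right rt
                                       WHERE rt.value = lt.value)
                     ORDER BY lt.value""",
}

def run_all(connection):
    """Run each variant and return {name: [values]} (hypothetical helper)."""
    return {name: [row[0] for row in connection.execute(sql)]
            for name, sql in QUERIES.items()}

without_null = run_all(conn)   # all three variants return [1, 3]

# A single NULL on the right-hand side makes "x NOT IN (...)" evaluate to
# NULL rather than TRUE for every non-matching x, so NOT IN returns no rows.
conn.execute("INSERT INTO table_right VALUES (NULL)")
with_null = run_all(conn)      # left_join / not_exists: [1, 3]; not_in: []

print(without_null)
print(with_null)
```

If the column on the right can be NULL, NOT EXISTS or LEFT JOIN / IS NULL is usually the safer choice.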

Be wary of LEFT JOINs. LEFT JOIN is shorthand for LEFT OUTER JOIN, and different RDBMS query parsers and optimizers may handle outer joins very differently. Take, for instance, how LEFT (OUTER) JOINs are simplified by MySQL's query optimizer, and the different execution plans that can result:

https://dev.mysql.com/doc/refman/8.0/en/outer-join-simplification.html

The result of a LEFT JOIN is well defined, but the execution plan chosen for it can vary between engines and versions. IMO, they deserve extra scrutiny before going into production code.

I prefer to write JOIN-type statements in a more "old school" style first, leaving out any specific JOIN declarations. Let the RDBMS query parser do what it's designed to do: analyze your statement and translate it into the most optimal execution plan based on its evaluation of your index stats and data model design. That said, the built-in query parsers and optimizers can get it wrong; trust me, I've seen it happen many times. In general, I find that taking this approach first provides a sufficient baseline for making informed tuning decisions in most cases.

To illustrate - using the question query from this thread:

SELECT `voter`.`ID`, `voter`.`Last_Name`, `voter`.`First_Name`,
`voter`.`Middle_Name`, `voter`.`Age`, `voter`.`Sex`,
`voter`.`Party`, `voter`.`Demo`, `voter`.`PV`,
`household`.`Address`, `household`.`City`, `household`.`Zip`
FROM (`voter`)
JOIN `household` ON `voter`.`House_ID`=`household`.`id`
WHERE `CT` = '5'
AND `Precnum` = 'CTY3'
AND  `Last_Name`  LIKE '%Cumbee%'
AND  `First_Name`  LIKE '%John%'


AND NOT EXISTS (
SELECT * FROM `elimination`
WHERE `elimination`.`voter_id` = `voter`.`ID`
)


ORDER BY `Last_Name` ASC
LIMIT 30

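Consider it rewritten with an implicit (comma) join in place of the explicit JOIN (this assumes the unqualified fields in the WHERE clause belong to the voter table). The exclusion still has to be a NOT EXISTS subquery. Adding `elimination` to the FROM list with v.`ID` != e.`voter_id` is not equivalent: it pairs each voter with every elimination row that belongs to a different voter, so an eliminated voter is still returned whenever the table holds a row for anyone else, rows come back duplicated, and nothing is returned at all when the elimination table is empty.

SELECT v.`ID`, v.`Last_Name`, v.`First_Name`,
v.`Middle_Name`, v.`Age`, v.`Sex`,
v.`Party`, v.`Demo`, v.`PV`,
h.`Address`, h.`City`, h.`Zip`
FROM `voter` v, `household` h
WHERE v.`House_ID` = h.`id`
AND v.`CT` = '5'
AND v.`Precnum` = 'CTY3'
AND  v.`Last_Name`  LIKE '%Cumbee%'
AND  v.`First_Name`  LIKE '%John%'
AND NOT EXISTS (
SELECT 1 FROM `elimination` e
WHERE e.`voter_id` = v.`ID`
)
ORDER BY v.`Last_Name` ASC
LIMIT 30;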
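To see how a `!=` condition against a comma-joined elimination table differs from NOT EXISTS, here is a minimal sketch that runs both forms on the same data. It assumes SQLite (via Python's sqlite3) standing in for MySQL, a cut-down voter table, and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE voter (id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE elimination (voter_id INTEGER);
    INSERT INTO voter VALUES (1, 'Adams'), (2, 'Baker'), (3, 'Clark');
    INSERT INTO elimination VALUES (2), (3);
""")

NOT_EXISTS_SQL = """
    SELECT v.id FROM voter v
    WHERE NOT EXISTS (SELECT 1 FROM elimination e WHERE e.voter_id = v.id)
    ORDER BY v.id
"""
# Comma join with !=: each voter is paired with every elimination row
# that belongs to some *other* voter.
NOT_EQUAL_SQL = """
    SELECT v.id FROM voter v, elimination e
    WHERE v.id != e.voter_id
    ORDER BY v.id
"""

not_exists = conn.execute(NOT_EXISTS_SQL).fetchall()
not_equal = conn.execute(NOT_EQUAL_SQL).fetchall()
print(not_exists)  # [(1,)]
print(not_equal)   # [(1,), (1,), (2,), (3,)]

# With an empty elimination table the comma join has nothing to pair with.
conn.execute("DELETE FROM elimination")
not_exists_empty = conn.execute(NOT_EXISTS_SQL).fetchall()
not_equal_empty = conn.execute(NOT_EQUAL_SQL).fetchall()
print(not_exists_empty)  # [(1,), (2,), (3,)]
print(not_equal_empty)   # []
```

Comparing the two result sets on real data, as suggested in this answer, is a quick way to catch this kind of mismatch.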

Try writing some of your future SQL queries BOTH ways syntactically going forward, compare their results, and see what you think. Writing your SQL in the style I have suggested above comes with the added benefit of being more RDBMS agnostic, also.

Cheers!