What's the difference between INNER, LEFT and RIGHT JOINs?
It seems every so often, I forget the answer to that question - I usually get confused ("what happens if there is a null - do I get a row in my results or not?"). Now, I'm sure it is easy for most people to remember how these JOINs work, but I always seem to forget. However, recently I got a good idiot lesson from Joseph Marques on this; he gave me a very short and easy way to remember the rules of these three types of JOINs. It was such an elegant way to explain it that I promised myself I would write it down so I don't forget it, so here it goes (again, I'm sure everyone knows this, and I'm sure I was even told this myself at some point in the past, but of course, I don't remember):
JOINs can be thought of in the context of a Venn diagram. You have two circles A and B. These circles partially overlap. (circles A and B represent your table data - when you join the two circles/tables together, the overlap represents relationships between the data.)Note: A LEFT or RIGHT JOIN are OUTER JOINs - sometimes you see these specified as "LEFT OUTER JOIN" or "RIGHT OUTER JOIN". Looking at the Venn diagram you can see why - they include the data outside of the inner overlap area (but also including the inner data). If you just see the word "JOIN" without a specifier, it typically is referring to an "INNER JOIN".
- The overlap is the INNER join - it contains only the elements in A that also have associated B elements.
- The A circle is the LEFT join - it contains all A elements regardless of whether they have associated B elements.
- The B circle is the RIGHT join - it contains all B elements regardless of whether they have associated A elements.
Below is an example complete with this Venn diagram visualization:
I have a Person table containing three people: John, Mary and Bob. I have a Phone table containing three phones that have extensions of 987, 555 and 123. John and Bob have been assigned phones (I can call John on extension 987, Bob on extension 123). Mary is not assigned a phone. The phone with the extension 555 is not assigned to any person.
The INNER join:
select name,dial_number
from person
inner join phone on person.phone_id=phone.phone_id
returns results as found in the overlapping area (John:987 and Bob:123).
The LEFT join:
select name,dial_number
from person
left join phone on person.phone_id=phone.phone_id
returns results as found in the entire left circle (John:987, Mary:null, Bob:123). Because the left table is the Person table, I have one row for each person, regardless of whether they have a phone assigned or not.
The RIGHT join:
select name,dial_number
from person
right join phone on person.phone_id=phone.phone_id
returns results as found in the entire right circle (John:987, null:555, Bob:123). Because the right table is the Phone table, I have one row for each phone, regardless of whether a person is assigned a phone or not.
I don't know why it is so hard for me to remember this - it's so obvious when you see it written down like that. But, nevertheless, I find it helpful to refresh my memory with that Venn diagram explanation whenever I experience a brain freeze while developing queries.
...and a full (outer) join is the union of both circles. thus inner, left, right, and full joins will at most give you N+M rows - the total amount of data inside both circles.
ReplyDeletethe venn diagram analogy breaks down for the cross join, however, which gives you up to NxM rows. the lay analogy i use for cross joins is an excel spreadsheet where each table becomes one of the (vertical or horizontal) axes. adding where conditions then starts to "blank-out" certain cells that don't match those conditions...and the rest are returned.
You share my affliction -- you're too hard on yourself.
ReplyDeleteExcellent post, and right when I needed it! Thanks,