The problem has to do with paging through database data and the query used to do the paging.
The code in question was using a JPA query that looked like this:
SELECT r.id, r.foo
FROM Table t
JOIN t.relation r
WHERE t.id = :tid
This query has the potential to return many hundreds of rows. The code using this query, therefore, used paging to retrieve chunks of the data, one page at a time. This paging code did so by setting the appropriate start/max rows via the JPA API:
query.setFirstResult(startRow);
query.setMaxResults(pageSize);
In my unit test, I set up test data that contained over 1,000 rows to be returned by this query. Much to my surprise, my unit test was failing because it wasn't getting the full amount of rows expected. Worse yet, after multiple runs of my test (which paged through what was supposed to be the entire data set), it was getting what seemed to be a random number of rows back for each run - it wasn't even consistent! Sometimes the unit test would get back 800 rows, other times 900. Sometimes it would work with 2 or 3 pages of data, but would be broken with more pages. The reason why this was hard to detect (without a unit test) was because it only showed up when there was many multiple pages of data to be retrieved (thus it required first to have a lot of data and second it required you to actually scan the data and notice you had less rows than expected).
It turns out that some database implementations do not guarantee the order in which a query returns back results without a defined ORDER BY clause in the query. In this case here, because the query didn't have a ORDER BY clause, each time the paging code ran a query to obtain a page of data, the query was returning the data in a different order. Code that ran the query multiple times to obtain different pages (that is, with different firstRow/maxResults settings) wasn't guaranteed that it wouldn't get back duplicate rows from a previous query execution.
I found this behavior in both Postgres 8 and Oracle 10.
The good news is once I added an ORDER BY clause to that query, all worked well and my unit test started passing:
SELECT r.id, r.foo
FROM Table t
JOIN t.relation r
WHERE t.id = :tid
ORDER BY r.id
So the moral of the story is: if you ever use paging queries, always double check your results by testing with many multiple pages of data and if you see problems, put an ORDER BY clause in your query to see if it fixes the issue.
NOTE: I found the following two resources that talk about this issue (I'm sure there are more). See: