Getting Pagination Right: Offset vs Seek

ℹ️
This post is a summary of Paging Through Results of the amazing “Use The Index, Luke” DB performance guide by Markus Winand, showing the best practices of pagination.

The offset method

When implementing pagination, we tend to go for the more obvious solution:

SELECT *
FROM posts p
ORDER BY p.publish_date DESC
OFFSET 10
FETCH NEXT 10 ROWS ONLY

This is what we call the offset method, because we offset by an amount and then fetch the next N rows.

Although simple, there are downsides to this method.

Pages drift if new items are added while you browse pages. Let’s say you’re browsing a list of Git tutorials. There are 10 posts per page, where the posts are ordered by publish date descending, and you’re on page 1. The final post on the current page is “Using git push and pull”. Just before going to page 2, a new post is added. When you click on page 2, guess what you see as the first post on the page? Yep, “Using git push and pull” again. Not an ideal user experience.

Response time increases the more pages you browse through. This is the more significant downside. When you offset by an amount, you’re not avoiding the sequential scan of those rows. These rows are still selected, although then promptly discarded from the final result.

We can prove this by using an EXPLAIN of a sample query (from an example table containing millions of car trips):

EXPLAIN SELECT *
FROM trips
OFFSET 3000000 -- Skip the first 3 million rows
FETCH NEXT 10 ROWS ONLY

/* 

QUERY PLAN                                                                       |
---------------------------------------------------------------------------------+
Limit  (cost=64091.52..64091.73 rows=10 width=67)                                |
  ->  Seq Scan on trips  (cost=0.00..70256.89 rows=3288589 width=67)|
*/

Notice how the Seq Scan node in the query plan! We want to avoid these when possible. But this is the last resort in this case, because no indexes can be used, so we can only deal with row numbers.

Fortunately, this won’t be much of an issue if the number of rows is small. Here’s some sample runtimes of that same query but with different OFFSET values:

Adding ~50ms per million rows.

The seek method

The seek method uses the last entry from the previous page as the delimiter. You keep track of the value of the last entry of the page, then use that in the

We show this in the following query, where we use the last trip_date from the previous page as the delimiter, and fetch the next 10 rows after that.

EXPLAIN SELECT *
FROM trips
WHERE trip_date < '2022-01-01'
ORDER BY trip_date DESC
FETCH NEXT 10 ROWS ONLY

/*

QUERY PLAN                                                                                                       |
-----------------------------------------------------------------------------------------------------------------+
Limit  (cost=0.43..0.83 rows=10 width=67)                                                                        |
  ->  Index Scan Backward using trip_date_index on trips  (cost=0.43..96001.21 rows=2424673 width=67)            |
        Index Cond: (trip_date < '2022-01-01 00:00:00+01'::timestamp with time zone)                             |
*/

You might notice from the query plan that we avoid the biggest performance issue that we have in the offset method: the database uses an index (on the trip_date column), and avoids a sequential scan. The engine truly skips the rows, giving us that sweet O(1) performance.

Deterministic ordering is crucial

The previous query is admittedly a bit simplified, but a deliberate choice to explain the importance of deterministic ordering.

It has two issues.

First, filtering by date has the risk of leaving out trips that meet the criteria (< ‘2022-01-1’), but are left out because we’re only selecting the first 10.

Second, the rows aren’t truly ordered deterministically. We’re likely to see the same order when running the same query over and over again, but that’s just because the database tends to order things the same way. But realistically, it could shuffle around the rows and still fulfill the ORDER BY condition. Who knows, maybe a future release of the database could include a new optimization feature that, as a side effect, ends up giving a different ordering of items on every query run. We’re at the database’s mercy, and we don’t want that.

We can order our rows deterministically by including additional rows that can be ordered deterministically. For example, an ID.

SELECT *
FROM trips
WHERE (trip_date, trip_id) < ('2022-01-01', '05fcce76-347c-4227-8d39-84907820d427')
ORDER BY trip_date DESC, trip_id DESC
FETCH NEXT 10 ROWS ONLY

This is made possible by SQL Row Values (also known as composite values), a way to combine multiple values into one logical value, which can then be compared to each other. Most databases support it, but for those that don’t, the main post contains a (rather ugly) workaround.

ℹ️

The seek method is similar to GraphQL’s relay-style cursor pagination, a natural way to paginate through graph edges. In the GraphQL query, an edge must define a cursor, which will then be used in the after argument, accompanied by a first argument:

{
  user {
    id
    name
    friends(first: 10, after: "opaqueCursor") {
      edges {
        cursor
        node {
          id
          name
        }
      }
      pageInfo {
        hasNextPage
      }
    }
  }
}

For the reverse direction, you can use a combination of last and before.

Conclusion

The seek method is obviously better for performance, but it falls short if you need to fetch arbitrary pages. So, if you want a footer with numbered pages (and the previous and next buttons) at the bottom of your list, the offset method is the way to go. The seek method is ideal if you want to implement an infinite scrolling mechanism for your pagination.


See Also