Paginating Magento catalogs without OFFSET

I was building a module that reads every product in a Magento catalog to generate an llms.txt file. On a store with 70,000 products, the standard Magento collection was fast at page 1 and unusably slow around page 40. The code did not change — only the page number.

This is not specific to Magento. OFFSET pagination gets slower with page depth in every SQL database, and any ORM that exposes LIMIT offset, n inherits the same curve — Laravel, Django, Rails, Hibernate, all of them. setPageSize() and setCurPage() are the standard Magento 2 way to iterate a collection, so the pattern shows up a lot in Magento code. This post is about the fix.

The standard way, and why it breaks

In Magento 2, if you want to read every product, you usually write something like this:

php
$collection = $this->collectionFactory->create()
    ->addAttributeToSelect('*')
    ->setPageSize(1000)
    ->setCurPage($page);

Under the hood, setPageSize and setCurPage turn into SQL LIMIT and OFFSET:

sql
SELECT * FROM catalog_product_entity
ORDER BY entity_id ASC
LIMIT 1000 OFFSET 999000;

MySQL reads this as: "start from row 1, skip 999,000 rows, then give me the next 1,000." The database has no shortcut. It walks the index from the start every single time.

That is the slowdown. At page 1, OFFSET 0 is free. At page 1000, OFFSET 999000 is 999,000 wasted row reads per request. The cost grows with the page number — and adds up if the same query runs many times.

The fix, part one: keyset pagination

Instead of using OFFSET, use the index you already have — the primary key — as a cursor:

sql
SELECT * FROM catalog_product_entity
WHERE entity_id > :lastId
ORDER BY entity_id ASC
LIMIT 1000;

This tells MySQL: "start from the row right after the last one I saw, and give me the next 1,000." It is an index range scan — the database jumps straight to the right spot through the B-tree. The cost is the same whether :lastId is 0, 10,000, or 1,000,000.

You pass the last entity_id you saw into the next query as :lastId. The cursor moves forward through the table without ever skipping rows.

Other teams have written about large speedups from this switch on production tables. The point is not the exact number — it is that the cost no longer grows with page depth.

Here is what it looks like in the module:

php
private function buildProductSelect(int $storeId, int $lastId): Select
{
    $select = $this->connection->select()
        ->from('catalog_product_entity')
        ->where('entity_id > ?', $lastId)
        ->order('entity_id ASC')
        ->limit(1000);
 
    // ... EAV joins, status filter, visibility filter, etc.
 
    return $select;
}

The WHERE entity_id > :lastId is the whole trick. Everything else is standard Magento code.

The fix, part two: PHP generators

Keyset pagination solves the database side. But if you load 70,000 products into PHP arrays in one go, you have only moved the problem.

This is where yield helps. A PHP generator lets a function behave like an iterator — it hands back one batch at a time, pauses, and continues on the next call. Each batch can be freed from memory before the next one is fetched.

php
private function getProductGenerator(int $storeId): Generator
{
    $lastId = 0;
 
    while (true) {
        $rows = $this->connection->fetchAll(
            $this->buildProductSelect($storeId, $lastId)
        );
 
        if (empty($rows)) {
            break;
        }
 
        $lastId = (int) end($rows)['entity_id'];
 
        yield $rows;
 
        if (count($rows) < 1000) {
            break;
        }
    }
}

The caller reads it like any other iterable:

php
foreach ($this->getProductGenerator($storeId) as $batch) {
    foreach ($batch as $row) {
        // process one product
    }
    // $batch can be freed from memory here
    // before the next batch is fetched
}

Memory per batch stays small — around 5–10 MB for 1,000 rows in my case. The database runs constant-cost range scans. The caller does not need to know how many products exist.

One honest caveat

A generator keeps per-batch memory bounded. But if you collect results into a string or a big array across batches, total memory through the full loop still grows with catalog size — only the peak per fetch stays small.

So the full picture:

  • Database side: truly constant cost per page, no matter the page depth
  • Per-batch PHP memory: small and bounded
  • Total memory through the full loop: still grows with catalog size, unless you also stream the output to disk

The first two are big wins on their own. Do not overstate the third.

When to actually use this

Keyset pagination is the right default for any loop that:

  • Reads a large table from start to finish. Catalog exports, data sync jobs, sitemap generation, search reindexing — anywhere the caller needs every row, not a slice.
  • Runs in a background job, cron, or CLI. The sequential nature of cursor pagination is fine when nothing downstream needs to jump to an arbitrary page.
  • Uses a primary key that is a number and always goes up. Magento's entity_id works this way — every new product gets a bigger ID than the last one, so WHERE entity_id > :lastId always moves forward without skipping or repeating rows.

It is not the right tool when:

  • You need to jump to an arbitrary page. Keyset pagination is sequential — to load page 50 you need the cursor value from page 49. There is no way to compute "the row just after position 49,000" without walking there first. Storefront product listings where a user clicks "page 27" directly still need OFFSET.
  • The sort column is not the primary key. If users sort by price or name, duplicate values exist (two products at €9.99), so the cursor needs a deterministic tie-breaker. The WHERE clause becomes a pair: WHERE (price, entity_id) > (:lastPrice, :lastId). It works and it is still fast, but the SQL is longer and the index needs to cover both columns in the right order.
  • The table is small. Under a few thousand rows, OFFSET cost is lost in the noise. Stick with the default collection — extra complexity is not worth it.

The one-line version

Use WHERE pk > :lastId ORDER BY pk ASC LIMIT n instead of OFFSET, and yield your batches from a generator function. Your catalog loops will stop slowing down as the catalog grows.

Source for the module that uses this technique: github.com/iamrobindhiman/magento2-module-llms-txt.

Further reading