Tags: postgresql, database-design, pg

How does the LIMIT clause work in Postgres?


So I have a table with 1,000 rows. When I run a simple `select * from hugedata limit 1`, the query plan just shows a table scan with a Limit operator on top. The query plan is attached:

[
  {
    "Plan": {
      "Node Type": "Limit",
      "Parallel Aware": false,
      "Async Capable": false,
      "Startup Cost": 0.00,
      "Total Cost": 0.02,
      "Plan Rows": 1,
      "Plan Width": 41,
      "Actual Startup Time": 0.007,
      "Actual Total Time": 0.007,
      "Actual Rows": 1,
      "Actual Loops": 1,
      "Output": ["pk", "description", "flags"],
      "Shared Hit Blocks": 1,
      "Shared Read Blocks": 0,
      "Shared Dirtied Blocks": 0,
      "Shared Written Blocks": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 0,
      "Temp Written Blocks": 0,
      "WAL Records": 0,
      "WAL FPI": 0,
      "WAL Bytes": 0,
      "Plans": [
        {
          "Node Type": "Seq Scan",
          "Parent Relationship": "Outer",
          "Parallel Aware": false,
          "Async Capable": false,
          "Relation Name": "hugedata",
          "Schema": "public",
          "Alias": "hugedata",
          "Startup Cost": 0.00,
          "Total Cost": 15406.01,
          "Plan Rows": 1000001,
          "Plan Width": 41,
          "Actual Startup Time": 0.006,
          "Actual Total Time": 0.006,
          "Actual Rows": 1,
          "Actual Loops": 1,
          "Output": ["pk", "description", "flags"],
          "Shared Hit Blocks": 1,
          "Shared Read Blocks": 0,
          "Shared Dirtied Blocks": 0,
          "Shared Written Blocks": 0,
          "Local Hit Blocks": 0,
          "Local Read Blocks": 0,
          "Local Dirtied Blocks": 0,
          "Local Written Blocks": 0,
          "Temp Read Blocks": 0,
          "Temp Written Blocks": 0,
          "WAL Records": 0,
          "WAL FPI": 0,
          "WAL Bytes": 0
        }
      ]
    },
    "Settings": {
    },
    "Planning": {
      "Shared Hit Blocks": 0,
      "Shared Read Blocks": 0,
      "Shared Dirtied Blocks": 0,
      "Shared Written Blocks": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 0,
      "Temp Written Blocks": 0
    },
    "Planning Time": 0.035,
    "Triggers": [
    ],
    "Execution Time": 0.069
  }
]

The question is: how does Postgres apply the LIMIT clause?

  1. The DB engine selects all 1,000 records and passes them to the Limit operator, which accepts the first record and throws away the other 999.

OR

  2. The DB engine retrieves the first record and checks back with the Limit operator, which accepts it. The engine then fetches a second record and checks back with the Limit operator, which rejects it, so the engine stops searching any further. The difference from the first option is that the engine only ever fetches N+1 rows (where N is the limit), whereas in the first option it always reads every row in the table.

From my searching it seems the Limit operator works as the first option suggests. If that is true, why such an inefficient implementation? Or am I missing something?
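The cost difference between the two hypotheses can be sketched with a counting generator. This is a toy model in Python, not PostgreSQL internals: the eager version materializes every row before limiting, while the lazy version stops after the first fetch.

```python
import itertools

def make_scan(n, counter):
    """Yield n fake rows, recording how many are actually produced."""
    for pk in range(n):
        counter[0] += 1
        yield {"pk": pk}

# Hypothesis 1: fetch everything, then keep the first row.
eager_count = [0]
eager = list(make_scan(1000, eager_count))[:1]

# Hypothesis 2: pull rows on demand and stop at the limit.
lazy_count = [0]
lazy = list(itertools.islice(make_scan(1000, lazy_count), 1))

print(eager_count[0])  # 1000 -- every row was produced
print(lazy_count[0])   # 1    -- the scan stopped after one row
```

Both versions return the same single row; only the amount of work done by the scan differs.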

The LIMIT documentation says the planner takes the limit into account when generating the query plan, but whether execution stops at row N+1, or "taken into account" means something else, is not specified: https://www.postgresql.org/docs/current/queries-limit.html


Solution

  • The PostgreSQL executor works on demand and top down.

    To calculate the first result row, the executor fetches the first row from the Limit node, which in turn fetches the first row from the Seq Scan.

    When PostgreSQL asks the Limit node for a second result row, the Limit node reports that execution is done, so nothing further is fetched from the scan.

    In sum, only a single row is ever fetched from the sequential scan.

    You can find this documented in the executor README:

    The executor processes a tree of "plan nodes". The plan tree is essentially a demand-pull pipeline of tuple processing operations. Each node, when called, will produce the next tuple in its output sequence, or NULL if no more tuples are available. If the node is not a primitive relation-scanning node, it will have child node(s) that it calls in turn to obtain input tuples.
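The demand-pull pipeline the README describes can be modeled with Python generators. This is an illustrative sketch with made-up names (`seq_scan`, `limit_node`), not PostgreSQL internals: each node yields one tuple per request, so a child node only produces as many rows as its parent pulls.

```python
rows_scanned = 0  # how many rows the "Seq Scan" node actually produced

def seq_scan(table):
    # Leaf node: produce the next tuple each time the parent pulls.
    global rows_scanned
    for row in table:
        rows_scanned += 1
        yield row

def limit_node(child, n):
    # Inner node: pull at most n tuples from the child, then signal "done".
    for _ in range(n):
        try:
            yield next(child)
        except StopIteration:
            return

hugedata = [{"pk": i} for i in range(1000)]
result = list(limit_node(seq_scan(hugedata), 1))

print(result)        # [{'pk': 0}]
print(rows_scanned)  # 1 -- the scan never touched the other 999 rows
```

This matches the plan in the question: the Seq Scan node reports `"Actual Rows": 1` even though its estimate (`"Plan Rows"`) covers the whole table, because the Limit node above it stopped pulling after the first tuple.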