sql-serverindexingclustered-indexnon-clustered-indexfragmentation

External fragmentation when we do not have clustered index


If we don't have a clustered index, and consequently, the data is physically stored in heap order, and we create a non-clustered index on the heap. Data could store everywhere without any limitation. In this case, there won't be external fragmentation because it doesn't make sense to split pages?

I want to clarify the impact of clustered index on non clustered indexes and fragmentation in general.


Solution

  • I think I see what you're thinking but it is based upon some misconceptions.

    Let's start with a simple phonebook analogy. You make your brand new phonebook only to be contacted with details of a new phone number. Damn, you think. The pages are full. The phonebook was nice and efficient with consistent full pages of content. You slip a post-it inside the relevant page with details. Fragmentation.

    Now let's take a different example. This time, you have a log of transactions, where the transaction datetime is the primary key, or even a sequential transaction number. Some new data arrives. You add it to the last page of your book in sequence. Great. No fragmentation. However, you need to go back to a previous record and update it. Suddenly it no longer fits on the row. You have an issue. Fragmentation.

    The bottom line is that fragmentation is a natural part of changing data content. It's not limited to clustered indexes either. These issues will occur on NON clustered indexes too. The ONLY key difference is that the clustered index reflects the actual record ordering in the base table - the data is stored in the leaf node of the index rather than a pointer to a record.

    There are strategies to mitigate fragmentation. Ola's index maintenance scripts are a mainstream go-to for example. Fill factors can postpone small volumes of changes having fragmentation impact too, or mitigating them.

    However, the bottom line is that if you're sat there "faced by a phonebook", you will undoubtedly want an index. Depending on your application requirements, you may well opt to create "covered indexes" ie non clustered indexes that allow you to locate records and the required content without the overheads of lookups. For the most part, indexes should be designed around the required usage cases. Look into the downsides of ranges, sorting, grouping, higher update frequency and more with heaps.

    An afterthought: There ARE valid scenarios where a heap may fulfil requirements better. Are you pushing huge numbers of records into a table, cleaning content and pushing them out ie in a data warehouse? If it's a dumping ground as part of a pipeline, fair enough. If the bulk of the data is sat there fairly consistently and not so dynamic in change, why the aversion to indexes? Ultimately it's your usage that will determine the best course of action. Please elaborate.