Table Fragmentation in SQL Anywhere 9

This is an excerpt from a new book, SQL Anywhere Studio 9 Developer's Guide, ISBN 1-55622-506-7, published this summer by Wordware Publishing. The material comes from two sections in Chapter 10, "Tuning," which discuss table fragmentation in ASA databases.

The first part of this article describes how rows and columns are stored in the physical database file, how that data can become so fragmented that performance suffers, and how fragmentation can be measured. The second part presents one solution to table fragmentation, the REORGANIZE TABLE statement, and it describes several ways to avoid fragmentation in the first place.

Table Fragmentation
Even if the disk isn't fragmented, and all the database files are contained in contiguous areas on the disk, the actual table data inside the database can be highly fragmented. Like disk fragmentation, table fragmentation can slow down performance and it is worth investigating.

When a new row is inserted, it is placed in a contiguous area on a single page. If there is not enough free space on a page that already contains data from other rows, an empty page is taken. If the row is larger than one page, or contains one or more long columns that contain 255 or more bytes, the row is split across two or more pages, but those are the only conditions that prevent a new row from being inserted entirely within a single page.

Note: Data for different tables is never stored on the same page. Also, the table's PCTFREE setting (free space percentage) is honored when a new row is inserted: if the free space on a page containing existing data would sink below the PCTFREE setting when inserting a new row, that page is not chosen for the new row. In other words, the PCTFREE free space may be consumed by an UPDATE later on, but an INSERT leaves it alone.

Subsequent UPDATE operations may cause a row to grow beyond its original page. That can happen for two reasons: one or more short columns may grow in size by an amount larger than the free space remaining on the page, or one or more long columns may grow to 255 bytes or longer. In the first case, the row splits because it won't fit. In the second case, a blob column causes a split as soon as it grows to 255 bytes.

Row splits are generally a bad thing, especially row splits caused by short columns. Row splits lead to table fragmentation and heavy table fragmentation can slow down processing a great deal, affecting both SELECT operations, which must gather together the split rows, and UPDATE operations, which must spend time handling new splits.

SQL Anywhere offers some facilities for measuring table fragmentation. These measurements are based on the following definitions.

A short column is any column that can't grow beyond 254 bytes in length, whereas a long column can hold more. Both kinds of columns can cause fragmentation, and the kind of fragmentation caused by short columns is often worse.

A row segment is all or part of one row that is contained on one page, excluding the portions of long columns beyond 254 bytes. A row may have one or more row segments.

A table page is a page containing the leading row segments of one or more rows of data.

The placement of each leading row segment on a page is immutable. That means that once the leading row segment for a row has been inserted on a particular table page, subsequent UPDATE operations never move that leading row segment to a different page. In other words, an UPDATE may cause parts of the row to be moved to other row segments on other pages because they no longer fit on this page, but the first part of the row always remains on its original table page. The only exception to this is the REORGANIZE TABLE statement, discussed in the next section.

A row continuation is that part of a row that does not fit in the leading row segment, excluding the portions of long columns beyond 254 bytes. A row continuation consists of one or more row segments. Row continuations are caused by short column row splits.

A blob continuation is that part of a row containing the portion of a single long column that has grown longer than 254 bytes. Each long column is treated separately, with its own blob continuation for the overflow data. Blob continuations are caused by long column row splits. Note that a column exactly 255 bytes in length does require a blob continuation, but that the blob continuation is empty of data; the first 255 bytes of each long column are always stored together with the short columns in one of the row segments.

An extension page is a page containing data for row and/or blob continuations. Extension pages are sometimes referred to as the "blob arena" but they are not limited to blobs. Unlike table pages, data on extension pages is dynamically rearranged so that each row continuation is stored contiguously in column order. Also, each blob continuation is managed independently and is stored contiguously. However, SQL Anywhere does not use table bitmaps to efficiently locate extension pages; table bitmaps are only used for table pages.

Tip: If your UPDATE performance grinds to a halt, it might not have anything to do with execution plans or indexes. It might be caused by short columns that are growing in size, causing more row splits in an already heavily fragmented table. SQL Anywhere might be spending a lot of time shuffling pages around to keep the row continuation data in contiguous order.

The built-in sa_table_fragmentation stored procedure can be used to show the average number of row segments per row for each table. Here is an example of how to call sa_table_fragmentation to show all the tables owned by "DBA":

CHECKPOINT;
SELECT *
FROM sa_table_fragmentation ( owner_name = 'DBA' );

Note: The examples in this article make use of stored procedure calls in the FROM clause, a wonderful new feature of SQL Anywhere 9.

Note: Explicit CHECKPOINT statements are not normally required but in this case one is used to force SQL Anywhere to store up-to-date row counts in the SYSTABLE system catalog table so they can be reported by sa_table_fragmentation.

Here is an example of the output from sa_table_fragmentation: the segs_per_row column shows the average number of row segments per row. A segs_per_row value of 1 is as good as it gets, and any value much above 1 indicates a high degree of fragmentation; e.g., a value of 1.5 means that half of the rows are partitioned. Three of the tables listed here have very high values of segs_per_row: child, parent, and twide. On average, almost every row in the child and parent tables have row continuations caused by short column row splits, and the single row in the twide table has been split several times. Only the normal and tblob tables have no row continuations; their segs_per_row values of 1.0 are ideal (see Table 1).

Note: A call to sa_table_fragmentation can take a long time and require a lot of disk I/O, especially for large fragmented tables with many pages, which are not currently in the cache.

Another useful built-in procedure is called sa_table_stats; it shows the number of table pages and extension pages for each table, as follows:

SELECT sa_table_stats.table_name AS table_name,
    sa_table_stats.count AS rows,
    sa_table_stats.table_page_count AS table_pages,
    sa_table_stats.ext_page_count AS ext_pages
  FROM sa_table_stats()
WHERE sa_table_stats.creator = 'DBA'
ORDER BY table_name;

Here's what the output from sa_table_stats looks like for the same tables as the earlier example. As expected, the three tables that had a lot of row continuations also have a large number of extension pages: child, parent, and twide. However, the tblob table also has a large number of extension pages (104), even though it didn't show up earlier as having any row continuations. That's because tblob contains a long column that required a blob continuation rather than a row continuation (see Table 2).

Listing 1 shows a procedure that combines the output of both sa_table_fragmentation and sa_table_stats procedures so you can view the data about row segments and extension pages side by side.


Table 3 shows the output from the call to p_table_fragmentation showing the same five tables. Tables where segments_per_row is large but extension_pages more or less matches row_segments (in this case child, parent, twide) probably have fragmentation caused by row continuations, whereas tables where the segments_per_row is 1 or close to it but which have a comparatively large number of extension pages (tblob) probably have fragmentation caused by blob continuations.

Note: The sa_table_stats procedure is currently undocumented and it is primarily intended to report on the current contents of the cache. In the p_table_fragmentation procedure above, sa_table_stats is only being called to obtain information about table and extension pages. That information is not available immediately after database startup because the cache hasn't been populated yet, and that's why the call to sa_table_fragmentation is placed first in the procedure: it forces the cache to be populated so the subsequent call to sa_table_stats will return useful information.

Listing 2 is yet another query that reports "Fragmentation!" for tables where these conditions are satisfied: the average number of segments per row is 1.05 or larger, and/or the ratio of extension pages to row segments is 2 or larger. The first condition detects table fragmentation caused by row continuations and the second attempts to detect fragmentation caused by blob continuations. Both numbers, 1.05 and 2, are arbitrary thresholds that can be changed, perhaps to match actual values measured when performance begins to degrade.

Table 4 shows what the "Fragmentation!" report looks like - nice and easy to read and interpret; only the normal table escapes detection.

Note: Extension pages may be used for both row continuations and blob continuations, and there is no accurate way to determine which is which. The "Fragmentation!" query above isn't perfect, but it probably won't miss too many tables that are causing trouble, and it probably won't give too many false warnings.

Table Reorganization
One way to deal with table fragmentation is to periodically use the REORGANIZE TABLE statement; here is the syntax:

reorganize_table ::= "REORGANIZE TABLE" [ owner_name "." ] table_name

The REORGANIZE TABLE statement rebuilds all the table pages by deleting and reinserting rows. If a clustered index exists, it is used to determine the order in which the rows are inserted, otherwise the order is determined by the primary key. That means REORGANIZE TABLE won't work for a table that has neither a primary key nor a clustered index.

The REORGANIZE TABLE process works by deleting and reinserting rows in large groups while holding an exclusive lock on the table. When it finishes with a group of rows, it releases the lock to allow other processes to proceed if they've been blocked by the reorganization process. After a while the reorganization process will obtain the table lock again and proceed with the next group. This process won't completely kill performance for other connections, but it may reduce concurrency on a busy server. Therefore running REORGANIZE TABLE during the busiest time of the day might not be a good idea.

Tip: Execute the following statement just before starting a long-running REORGANIZE TABLE statement: SET TEMPORARY OPTION BACKGROUND_PRIORITY = 'ON'. This tells SQL Anywhere that this connection should give up control more often to allow other higher priority connections to get some work done. In fact, this tip isn't just for REORGANIZE TABLE, it applies to any connection that does a lot of work but doesn't need a rapid response.

The REORGANIZE TABLE statement does not cause any triggers to fire, and it does not write anything to the transaction log; from a logical point of view it isn't changing any data, just moving it around on different pages. It does cause the checkpoint log to grow, however, because database pages are changing and they must be written to the database file at the next checkpoint.

Here is an example of REORGANIZE TABLE followed by a call to p_table_fragmentation to show the effect:

REORGANIZE TABLE child;
CHECKPOINT;
SELECT * FROM p_table_fragmentation ( 'DBA' );

Table 5 shows what p_table_fragmentation reports for the child table, both before and after the REORGANIZE TABLE. The segments_per_row value drops to a perfect 1.0 indicating that all the row continuations have been eliminated. Also, the number of extension pages has dropped by half, with the remaining extension pages required for blob continuations.

Several techniques may be used to avoid table fragmentation in the first place, or at least minimize its effects:

  1. Don't insert empty rows and immediately update them with nonempty values. If the real values are available at the point of INSERT, use them.
  2. Avoid inserting NULL values in columns that will be updated later with non-NULL values. A NULL column value takes up no space at all, so updating it with any non-NULL value will always cause the row to grow in size.
  3. Avoid inserting short or empty strings if they will be updated later with longer values. Insert the real values, or pad the initial value with blanks.
  4. Do not repeatedly update large numbers of rows so they grow in size with each update. That kind of processing is almost guaranteed to result in a heavily fragmented table, and the update process will probably become slower and slower with each repetition, eventually running about as fast as continental drift.
  5. Specify a larger PCTFREE value for tables whose rows are likely to grow in size when short columns are updated.
  6. Place frequently used short columns near the front of the row so that even if a row split occurs, those columns will remain on the table page instead of moving way out on an extension page.
About the Book
SQL Anywhere Studio 9 Developer's Guide, ISBN 1-55622-506-7, is being published this summer by Wordware Publishing of Plano, Texas. It is available at Amazon and other booksellers; see www.amazon.com/exec/obidos/ASIN/1556225067/risingroad-20.

Other sections in Chapter 10 discuss file and index fragmentation, as well as request-level logging, the Index Consultant and Execution Profiler, the graphical plan, index design, and database performance counters. Other chapters cover creating tables, inserting, selecting, updating, deleting, fetching, synchronizing with MobiLink, packaging code, and protecting your database. For more information about the book see www.risingroad.com and www.wordware.com/computer.

© 2008 SYS-CON Media