Microsoft SQL Server supports different compression types for tables and indexes, and also supports archival compression for them. The data compression feature in SQL Server is very helpful in reducing the size of the database file. Additionally, data compression is helpful in improving performance of I/O intensive workloads because the data is stored in fewer pages and queries need to read fewer pages from disk.
Different types of SQL Server compression used for MDF database:
Non-leaf levels of pages in database are compressed with ROW-level compression. Data rows are put on the page in a serial wise, starting immediately after the header. The maximum amount of data and overhead contained in a single row on a page is 8,060 bytes (8 KB). However, this does not include the data stored in the Text/Image page type. When the total row size of all fixed and variable columns in a table exceeds the 8,060 byte limitation, SQL Server dynamically moves one or more variable length columns to pages in the ROW_OVERFLOW_DATA allocation unit, starting from the column with the largest width.
When Row level compression is applied, the following changes take place:
- It reduces the metadata overhead that is associated with the record.
- It uses variable-length storage format for numeric types (for example integer, decimal, and float) and the types that are based on numeric (for example date & time and money).
- It stores fixed character strings by using variable-length format by not storing the blank characters.
Page compression consists of three different operations in the following order:
- Row Compression
- Prefix Compression
- Dictionary Compression
Row Compression: When you use page compression, non-leaf level pages of indexes are compressed by using only Row compression.
Prefix Compression: For each page that is being compressed, prefix compression uses the following steps:
- For each column, a value is identified that can be used to reduce the storage space for the values in each column.
- A row that represents the prefix values for each column is created and stored in the compression information (CI) structure that immediately follows the page header.
- The repeated prefix values in the column are replaced by a reference to the corresponding prefix. If the value in a row does not exactly match the selected prefix value, a partial match can still be indicated.
Dictionary Compression: After prefix compression has been completed, dictionary compression is applied on the database. Dictionary compression searches for repeated values anywhere on the page, and stores them in the CI area. Unlike prefix compression, dictionary compression is not restricted to one column. Dictionary compression can replace repeated values that occur anywhere on a page.
SQL server compresses Unicode values that are stored in row or page compressed objects. The Database Engine stores Unicode data as 2 bytes, regardless of locale. This is known as UCS-2 encoding. Unicode compression supports the fixed-length nchar (n) and nvarchar (n) data types. Data values that are stored off row or in nvarchar (max) columns are not compressed with Unicode compression.