Glossary |
Non-resident attributes are stored in intervals of clusters called runs. Each run is represented by its starting cluster and its length. The starting cluster of a run is coded as an offset to the starting cluster of the previous run.
Normal, compressed and sparse files are all defined by runs.
The examples start simple, then quickly get complicated.
This is a table written in the content part of a non-resident file attribute, which allows to have access to its stream.
NB Assume a 1KB cluster size, throughout. And little endian disk storage.
Therefore, Data1 is a unfragmented file, of size 0x18 clusters, starting at LCN 0x5634.
Therefore, Data2 is a fragmented file, of size 0x18E clusters, with data blocks at LCNs 0x342573, 0x363758 and 0x393802.
Therefore, Data3 is a badly fragmented file of size 0x60 clusters, with data blocks at LCNs 0x60, 0x160 and 0x140. Furthermore, the third block of data is physically located between the first and second blocks. (The third run has a negative offset, placing it before the previous run).
Therefore, Data4 is a sparse, unfragmented file, of size 0xA0 clusters, with data blocks at LCNs 0x20 and 0x50.
This file has a sparse part in the middle of size 0x60 clusters. It takes up no space on disk, but it it represented by 0x60 VCNs.
Therefore, Data5 is a compressed, unfragmented, file of length 0x30, with data blocks at LCNs 0x40, 0x48 and 0x58.
The data, as stored on disk, is contiguous. The sparse runs pad out the compression units to blocks of 16 clusters (0x10).
The runlist is a sequence of elements: each element stores an offset to the starting LCN of the previous element and the length in clusters of a run.
To save space, Offset and Length are variable size fields (probably up to 8 bytes), and an element is written in this crunched format:
Offset in nibble to the beginning of the element | Size | Description |
---|---|---|
0 | 1 | F=Size of the Offset field |
1 | 1 | L=Size of the Length field |
2 | 2*L | Length of the run |
2+2*L | 2*F | Offset to the starting LCN of the previous element |
The layout of the runlist must take account of the data compression: the set of VCNs containing the stream of a compressed file attribute is divided in compression units (also called chunks) of 16 clusters: VCNs 0 to 15 constitutes the 1st compression unit, VCNs 16 to 31 the 2nd one, and so on... For each compression unit,
Runlist: 21 14 00 01 11 10 18 11 05 15 01 27 11 20 05 Decode 0x14 at 0x100 21 0x100, 0x14 0x10 at + 0x18 11 0x18, 0x10 0x05 at + 0x15 11 0x15, 0x05 0x27 at + none 01 0x27, none 0x20 at + 0x05 11 0x05, 0x20 Absolute LCNs 0x14 at 0x100 0x10 at 0x118 0x05 at 0x12D 0x27 at none 0x20 at 0x132 Regroup 0x10 at 0x100 0x04 at 0x110 0x0C at 0x118 0x04 at 0x118 0x05 at 0x12D 0x07 at none 0x10 at none 0x10 at none 0x10 at 0x132 0x10 at 0x142 Compression unit beginning at VCN 0x0 0x10 clusters at LCN 0x100 Unit not compressed Compression unit beginning at VCN 0x10 0x4 clusters at LCN 0x110 0xC clusters at LCN 0x118 Unit not compressed Compression unit beginning at VCN 0x20 0x4 clusters at LCN 0x124 0x5 clusters at LCN 0x12D 0x7 unused clusters: compressed unit Compression unit beginning at VCN 0x30 0x10 zeroed clusters: sparse unit Compression unit beginning at VCN 0x40 0x10 zeroed clusters: sparse unit Compression unit beginning at VCN 0x50 0x10 clusters at LCN 0x132 Unit not compressed Compression unit beginning at VCN 0x60 0x10 clusters at LCN 0x142 Unit not compressed --------------------------------------------------- file.txt 31KB bytes (disk has a 1KB cluster size) it's stored at clusters 10-26, 45-49, 100-108 17 clusters at LCN 10 5 clusters at LCN 45 9 clusters at LCN 100 next make the offsets relative 17 clusters at LCN 10 5 clusters at LCN 45 9 clusters at LCN 100 is encoded as 11 working in unit of 16 clusters relative offsets (including -ve) compressed sparse variable length structures stored as: save space implies wherever MFT places data it's best not to spread it too far. -ve implies an offset of +129 would have to use two bytes therefore -10 = 0xF6 0x80 = -128 0XFF7F = -129 21 14 00 01 11 10 18 11 05 15 01 27 11 20 05
Length and starting cluster are variable size fields. The first byte of a run indicates the size of both. The size of the offset is stored in the high nibble, and the size of the length in the low nibble.
For compressed or sparse runs, the offset is 0, and the size of the offset is also 0. Each compression unit starts at a multiple of 16 clusters. If compression is possible, at the VCN of a unit will be one or more data runs followed by an empty run. If there are data runs for more than 16 clusters, the unit was not compressible. If there is no data run at all (only a large empty run), the unit Consists of All zeroes.
Example: 21 20 ED 05 22 48 07 48 22 21 28 C8 DB First run: 20 clusters starting from 5ED (5ED to 60D) 2nd run: 748 clusters starting from 5ED+2248 (2835 to 2F7D) 3rd run: 28 clusters starting from 2835+DBC8 (3FD to 425)
Note that the offset is interpreted as signed value.
Take a file of size 0x80 clusters (anywhere on disk). This is represented by VCN (virtual cluster numbers) 0x00 to 0x7F. These VCNs are mapper to LCN (logical cluster numbers) in runs (or extents), eg 21 80 30 60 00.
These runs are variable length, terminated with a zero. The low nibble of the first byte determines the length of the next number (1 byte) namely 80. The high nibble determines the length of the following number (2 bytes) namely 6030.
Outcome: 80 clusters, starting at cluster 6030.
The "sizes" are stored in one byte. The length is unsigned. The offset is signed and relative to the previous offset.
11 30 60 - 21 10 00 01 - 11 20 E0 - 00
Run 1 length 30 offset 60 (first run relative to 0) Run 2 length 10 offset 100 + 60 Run 3 length 20 offset 160 - 20 (EO == -20) -- 80
Files are represented by a set of VCNs. Sparse files, simply, have VCNs missing, eg
21 09 F5 47 9 clusters from 47F5 01 07 7 clusters from nowhere (0) 11 07 09 7 clusters from 47F5 + 9 ---- 0x17 123456789ABCDEFG1234... VCN RRRRRRRRRZZZZZZZRRRR... Real/Zero
Compresses files are first broken into blocks of 16 (0x10) clusters. Imagine:
VCN0123... XXXXXXXXXXOOOOO X=DATA O=SPACE
The data is compressed, here, into just ten clusters (If we can't save 1 cluster in 16, we don't bother) The above is coded as:
21 0A 10 F6 10 clusters of compressed data at F610 01 06 6 clusters of nothing to round up this block to 16
The 6 extra clusters aren't actually taking up any disk space. The VCNs are bunched into 16s. {{ If a block cannot be compressed, it would be represented by:
21 10 10 F6 16 clusters of compressed data at F610
FIXME: In fact, life is more complicated because adjacent entries of the same type can be coalesced. This means that one has to keep track of the number of clusters handled and work on a basis of X clusters at a time being one block. An example: if length L > X this means that this particular run list entry contains a block of length X and part of one or more blocks of length L - X. Another example: if length L > X, this does not necessarily mean that the block is compressed as it might be that the lcn changes inside the block and hence the following run list entry describes the continuation of the potentially compressed block. The block would be compressed if the following run list entry describes at least X - L sparse clusters, thus making up the compression block length as described in point 3 above. (Of course, there can be several run list entries with small lengths so that the sparse entry does not follow the first data containing entry with length < X.) NOTE: At the end of the compressed attribute value, there most likely is not just the right amount of data to make up a compression block, thus this data is not even attempted to be compressed. It is just stored as is.
Compressed and sparse runs can be intermixed. All this to save space.