Glossary

Concept - Data Runs

Previous Next

Overview

Non-resident attributes are stored in intervals of clusters called runs. Each run is represented by its starting cluster and its length. The starting cluster of a run is coded as an offset to the starting cluster of the previous run.

Normal, compressed and sparse files are all defined by runs.

The examples start simple, then quickly get complicated.

This is a table written in the content part of a non-resident file attribute, which allows to have access to its stream.

NB Assume a 1KB cluster size, throughout. And little endian disk storage.


Example 1 - Normal, Unfragmented File

Data runs:

Run 1:

Run 2:

Summary:

Therefore, Data1 is a unfragmented file, of size 0x18 clusters, starting at LCN 0x5634.


Example 2 - Normal, Fragmented File

Data runs:

Run 1:

Run 2:

Run 3:

Run 4:

Summary:

Therefore, Data2 is a fragmented file, of size 0x18E clusters, with data blocks at LCNs 0x342573, 0x363758 and 0x393802.


Example 3 - Normal, Scrambled File

Data runs:

Run 1:

Run 2:

Run 3:

Run 4:

Summary:

Therefore, Data3 is a badly fragmented file of size 0x60 clusters, with data blocks at LCNs 0x60, 0x160 and 0x140. Furthermore, the third block of data is physically located between the first and second blocks. (The third run has a negative offset, placing it before the previous run).


Example 4 - Sparse, Unfragmented File

Data runs:

Run 1:

Run 2:

Run 3:

Run 4:

Summary:

Therefore, Data4 is a sparse, unfragmented file, of size 0xA0 clusters, with data blocks at LCNs 0x20 and 0x50.

This file has a sparse part in the middle of size 0x60 clusters. It takes up no space on disk, but it it represented by 0x60 VCNs.


Example 5 - Compressed, Unfragmented File

Data runs:

Run 1:

Run 2:

Run 3:

Run 4:

Run 5:

Run 6:

Summary:

Therefore, Data5 is a compressed, unfragmented, file of length 0x30, with data blocks at LCNs 0x40, 0x48 and 0x58.

The data, as stored on disk, is contiguous. The sparse runs pad out the compression units to blocks of 16 clusters (0x10).


Example 6 - Compressed, Sparse, Fragmented File

brain damaged file

Layout

The runlist is a sequence of elements: each element stores an offset to the starting LCN of the previous element and the length in clusters of a run.

To save space, Offset and Length are variable size fields (probably up to 8 bytes), and an element is written in this crunched format:

Offset in nibble to the beginning of the element Size Description
0 1 F=Size of the Offset field
1 1 L=Size of the Length field
2 2*L Length of the run
2+2*L 2*F Offset to the starting LCN of the previous element
Offset to the starting LCN of the previous element
This is a signed value. For the first element, consider the offset as relative to the LCN 0, the beginning of the volume.

The layout of the runlist must take account of the data compression: the set of VCNs containing the stream of a compressed file attribute is divided in compression units (also called chunks) of 16 clusters: VCNs 0 to 15 constitutes the 1st compression unit, VCNs 16 to 31 the 2nd one, and so on... For each compression unit,

In practice, this is a bit more complicated because some of the element can be gathered. But let's take an ...

...Example

We have to decode the following runlist:
Runlist:
    21 14 00 01 11 10 18 11 05 15 01 27 11 20 05

Decode
    0x14   at   0x100   21 0x100, 0x14
    0x10   at  + 0x18   11  0x18, 0x10
    0x05   at  + 0x15   11  0x15, 0x05
    0x27   at  + none   01  0x27, none
    0x20   at  + 0x05   11  0x05, 0x20

Absolute LCNs
    0x14   at   0x100
    0x10   at   0x118
    0x05   at   0x12D
    0x27   at   none
    0x20   at   0x132

Regroup
    0x10   at   0x100

    0x04   at   0x110
    0x0C   at   0x118

    0x04   at   0x118
    0x05   at   0x12D
    0x07   at   none

    0x10   at   none

    0x10   at   none

    0x10   at   0x132

    0x10   at   0x142

Compression unit beginning at VCN 0x0
 0x10 clusters at LCN 0x100
 Unit not compressed

Compression unit beginning at VCN 0x10
 0x4 clusters at LCN 0x110
 0xC clusters at LCN 0x118
 Unit not compressed

Compression unit beginning at VCN 0x20
 0x4 clusters at LCN 0x124
 0x5 clusters at LCN 0x12D
 0x7 unused clusters: compressed unit

Compression unit beginning at VCN 0x30
 0x10 zeroed clusters: sparse unit

Compression unit beginning at VCN 0x40
 0x10 zeroed clusters: sparse unit

Compression unit beginning at VCN 0x50
 0x10 clusters at LCN 0x132
 Unit not compressed

Compression unit beginning at VCN 0x60
 0x10 clusters at LCN 0x142
 Unit not compressed

---------------------------------------------------

file.txt 31KB bytes (disk has a 1KB cluster size)

it's stored at clusters 10-26, 45-49, 100-108

17 clusters at LCN 10
5  clusters at LCN 45
9  clusters at LCN 100

next make the offsets relative

17 clusters at LCN 10
5  clusters at LCN 45
9  clusters at LCN 100

is encoded as

11

working in unit of 16 clusters
relative offsets (including -ve)
compressed sparse
variable length structures
stored as:
save space implies wherever MFT places
data it's best not to spread it too far.

-ve implies an offset of +129 would have to use two bytes
therefore -10 = 0xF6
0x80 = -128
0XFF7F = -129

21 14 00 01 11 10 18 11 05 15 01 27 11 20 05

data runs

Length and starting cluster are variable size fields. The first byte of a run indicates the size of both. The size of the offset is stored in the high nibble, and the size of the length in the low nibble.

For compressed or sparse runs, the offset is 0, and the size of the offset is also 0. Each compression unit starts at a multiple of 16 clusters. If compression is possible, at the VCN of a unit will be one or more data runs followed by an empty run. If there are data runs for more than 16 clusters, the unit was not compressible. If there is no data run at all (only a large empty run), the unit Consists of All zeroes.

    Example: 21 20 ED 05 22 48 07 48 22 21 28 C8 DB
    First run: 20 clusters starting from 5ED (5ED to 60D)
    2nd run: 748 clusters starting from 5ED+2248 (2835 to 2F7D)
    3rd run: 28 clusters starting from 2835+DBC8 (3FD to 425)
    

Note that the offset is interpreted as signed value.

Take a file of size 0x80 clusters (anywhere on disk). This is represented by VCN (virtual cluster numbers) 0x00 to 0x7F. These VCNs are mapper to LCN (logical cluster numbers) in runs (or extents), eg 21 80 30 60 00.

These runs are variable length, terminated with a zero. The low nibble of the first byte determines the length of the next number (1 byte) namely 80. The high nibble determines the length of the following number (2 bytes) namely 6030.

Outcome: 80 clusters, starting at cluster 6030.

The "sizes" are stored in one byte. The length is unsigned. The offset is signed and relative to the previous offset.

11 30 60 - 21 10 00 01 - 11 20 E0 - 00

    Run 1 length 30 offset 60 (first run relative to 0)
    Run 2 length 10 offset 100 + 60
    Run 3 length 20 offset 160 - 20 (EO == -20)
                 --
                 80
    

Files are represented by a set of VCNs. Sparse files, simply, have VCNs missing, eg

    21 09 F5 47  9 clusters from 47F5
    01 07        7 clusters from nowhere (0)
    11 07 09     7 clusters from 47F5 + 9
      ----
      0x17

    123456789ABCDEFG1234... VCN
    RRRRRRRRRZZZZZZZRRRR... Real/Zero
    

Compresses files are first broken into blocks of 16 (0x10) clusters. Imagine:

    VCN0123...
       XXXXXXXXXXOOOOO  X=DATA O=SPACE
    

The data is compressed, here, into just ten clusters (If we can't save 1 cluster in 16, we don't bother) The above is coded as:

    21 0A 10 F6   10 clusters of compressed data at F610
    01 06         6 clusters of nothing to round up this block to 16
    

The 6 extra clusters aren't actually taking up any disk space. The VCNs are bunched into 16s. {{ If a block cannot be compressed, it would be represented by:

    21 10 10 F6   16 clusters of compressed data at F610
    
    FIXME:
    In fact, life is more complicated because adjacent entries of the same type
    can be coalesced. This means that one has to keep track of the number of
    clusters handled and work on a basis of X clusters at a time being one
    block. An example: if length L > X this means that this particular run list
    entry contains a block of length X and part of one or more blocks of length
    L - X. Another example: if length L > X, this does not necessarily mean that
    the block is compressed as it might be that the lcn changes inside the block
    and hence the following run list entry describes the continuation of the
    potentially compressed block. The block would be compressed if the
    following run list entry describes at least X - L sparse clusters, thus
    making up the compression block length as described in point 3 above. (Of
    course, there can be several run list entries with small lengths so that the
    sparse entry does not follow the first data containing entry with
    length < X.)

    NOTE: At the end of the compressed attribute value, there most likely is not
    just the right amount of data to make up a compression block, thus this data
    is not even attempted to be compressed. It is just stored as is.
    

Compressed and sparse runs can be intermixed. All this to save space.


Copyright ©