Compartir a través de


3.1.4.1.1.2.2.4 Match Length

Unlike the metadata offset, the match length is extensible. If the length is less than 10 bytes, it is encoded in the 3 low-order bits of the 2-byte metadata. Although 3 bits seems to allow for a maximum length of 6 (the value b'111' is reserved), because the minimum match is 3 bytes, these 3 bits actually allow for the expression of lengths from 3 to 9. The match length goes from L = b'000' + 3 bytes, to L = b'110' + 3 bytes. Because smaller lengths are much more common than the larger lengths, the algorithm tries to optimize for smaller lengths. To encode a length between 3 and 9, we use the 3 bits that are "in-line" in the 2-byte metadata.

If the length of the match is greater than 9 bytes, an initial bit pattern of b'111' is put in the 3 bits. This does not signify a length of 10 bytes, but instead a length that is greater than or equal to 10, which is included in the low-order nibble of the following byte.

Every other time that the length is greater than 9, an additional byte follows the initial 2-byte metadata. The first time that the additional byte is included, the low-order nibble is used as the additive length. The next nibble is "reserved" for the next metadata instance when the length is greater than 9. Therefore, the first time that the decoder encounters a length that is greater than 9, it reads the next byte from the data stream and the low-order nibble is extracted and used to compute length for this metadata instance. The high-order nibble is remembered and used the next time that the decoder encounters a metadata length that is greater than 9. The third time that a length that is greater than 9 is encountered, another extra byte is added after the 2-byte metadata, with the low-order nibble used for this length and the high-order nibble reserved for the fourth length that is greater than 9, and so on.

If the nibble from this "shared" byte is all 1s (for example, b'1111'), another byte is added after the shared byte to hold more length. In this manner, a length of 24 is encoded as follows:

b'111' (in the 3 bits in the original 2 bytes of metadata), plus

b'1110' (in the nibble of the 'shared' byte of extended length)

b'111' means 10 bytes plus b'1110', which is 14, which results in a total of 24.

If the length is more than 24, the next byte is also used in the length calculation. In this manner, a length of 25 is encoded as follows:

b'111' (in the 3 bits in the original 2 bytes of metadata), plus

b'1111' (in the nibble of the 'shared' byte of extended length), plus

b'00000000' (in the next byte)

This scheme is good for lengths of up to 279 (a length of 10 in the 3 bits in the original 2 bytes of metadata, plus a length of 15 in the nibble of the "shared" byte of extended length, plus a length of up to 254 in the extra byte).

A "full" (all b'1') bit pattern (b'111', b'1111', and b'11111111') means that there is more length in the following 2 bytes.

The final 2 bytes of length differ from the length information that comes earlier in the metadata. For lengths that are equal to 280 or greater, the length is calculated only from these last 2 bytes and is not added to the previous length bits. The value in the last 2 bytes, a 16-bit integer, is 3 bytes less than the metadata length. These last 2 bytes allow for a match length of up to 65,535 bytes + 3 bytes (the minimum match length).

The following table summarizes the length representation in metadata.

Note Length is computed from the bits that are included in the metadata plus the minimum match length of 3.

Length representation in metadata

Match length

Length bits in the metadata

24

b'111' (3 bits in the original 2 bytes of metadata)

+

b'1110' (in the high- or low-order nibble, as appropriate, of the shared byte)

25

b'111' (3 bits in the original 2 bytes of metadata)

+

b'1111' (in the high- or low-order nibble, as appropriate, of the shared byte)

+

b'00000000' (in the next byte)

26

b'111' (3 bits in the original 2 bytes of metadata)

+

b'1111' (in the high- or low-order nibble, as appropriate, of the shared byte)

+

b'00000001' (in the next byte)

279

b'111' (3 bits in the original 2 bytes of metadata)

+

b'1111' (in the high- or low-order nibble, as appropriate, of the shared byte)

+

b'11111110' (in the next byte)

280

b'111' (3 bits in the original 2 bytes of metadata)

b'1111' (in the high- or low-order nibble, as appropriate, of the shared byte)

b'11111111' (in the next byte)

0x0115 (in the next 2 bytes). These 2 bytes represent a length of 277 + 3 (minimum match length).

Note All the length is included in the final 2 bytes and is not additive, as were the previous length calculations for lengths that are smaller than 280 bytes.

281

b'111' (3 bits in the original 2 bytes of metadata)

b'1111' (in the high- or low-order nibble, as appropriate, of the shared byte)

b'11111111' (in the next byte)

0x0116 (in the next 2 bytes). This is 278 + 3 (minimum match length).

Note All the length is included in the final 2 bytes and is not additive, as were the previous length calculations for lengths that are smaller than 280 bytes.

A "full" bit pattern in that last half word does not mean that more metadata is coming after the last bytes.

The LZ77 compression algorithm produces a well-compressed encoding for small valued lengths, but as the length increases, the encoding becomes less well compressed. A match length of greater than 279 bytes requires a relatively large number of bits: 3 + 4 + 8 + 16. This includes 3 bits in the original 2 bytes of metadata, 4 bits in the nibble in the "shared" byte, 8 bits in the next byte, and 16 bits in the final 2 bytes of metadata.