6 New ESENT features in Windows 7

A quick look at some of the new features that are available in the Windows 7 version of ESENT. You'll need esent.h from the Windows 7 SDK to see these definitions:

1. Column Compression

Columns of type JET_coltypLongBinary and JET_coltypLongText can now be compressed. There are two ways of enabling compression.

  • Create the column with the JET_bitColumnCompressed flag. This can only be used with long-value columns; otherwise ESENT will return JET_errColumnCannotBeCompressed.
  • Set the column with JET_bitSetCompressed. This flag can be used with any column type; we just ignore it for non-compressible columns. Alternatively, you can use JET_bitSetUncompressed when updating a compressed column with data that you don't want compressed.

How does compression work?

For 'short' columns (less than 1kb in size) we look to see if all the data is 7-bit ASCII characters. If so, we compress each character to its 7-bit value. This is the only compression done for short values. It was added because Exchange stores a lot of strings that are often just 7-bit ASCII values, e-mail headers for example. Turning a 16-bit Unicode character into a 7-bit ASCII value gives a large savings and is cheap to do.
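To make the 7-bit idea concrete, here is a minimal sketch in C. It is not ESENT's actual on-disk encoding (that format isn't described here); the function names pack7, unpack7 and packed7_size are made up for illustration, and the bit layout is just one reasonable choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes needed to hold n characters at 7 bits each. */
size_t packed7_size(size_t n)
{
    return (n * 7 + 7) / 8;
}

/*
 * Pack n UTF-16 code units into dst, 7 bits per character.
 * Returns the number of bytes written, or 0 if the input is empty or
 * contains anything outside 7-bit ASCII (the caller then stores the
 * value as-is). dst must have room for packed7_size(n) bytes.
 * Hypothetical layout: character i starts at bit offset i * 7.
 */
size_t pack7(const uint16_t *src, size_t n, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        if (src[i] > 0x7F)
            return 0;
    }
    size_t out = packed7_size(n);
    memset(dst, 0, out);
    for (size_t i = 0; i < n; i++) {
        size_t bit = i * 7;                        /* bit offset of character i */
        unsigned v = (unsigned)src[i] << (bit % 8);
        dst[bit / 8] |= (uint8_t)(v & 0xFF);
        if (v >> 8)                                /* spills into the next byte */
            dst[bit / 8 + 1] |= (uint8_t)(v >> 8);
    }
    return out;
}

/* Reverse of pack7: expand n packed characters back to UTF-16. */
void unpack7(const uint8_t *src, size_t n, uint16_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t bit = i * 7;
        unsigned v = (unsigned)src[bit / 8] >> (bit % 8);
        if (bit % 8 > 1)                           /* character straddles two bytes */
            v |= (unsigned)src[bit / 8 + 1] << (8 - bit % 8);
        dst[i] = (uint16_t)(v & 0x7F);
    }
}
```

With this layout an 8-character header name such as "Subject:" takes 16 bytes as UTF-16 and 7 bytes packed, which is where the savings come from.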

For 'long' columns (greater than 1kb) we apply Microsoft's XPRESS compression, which is similar to ZIP or RAR compression. We only store the compressed data if it is smaller than the uncompressed data. Of course, determining that data doesn't compress well costs CPU cycles, so if you are storing already-compressed data you shouldn't use compressed columns.
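The "only store it if it is smaller" rule can be sketched as follows. XPRESS itself isn't reproduced here; a trivial run-length encoder stands in for it, and rle_compress and store_smaller are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Stand-in compressor (NOT XPRESS): run-length encoding as (count, byte)
 * pairs. Returns the compressed size, or 0 if the output would not fit
 * in cap bytes.
 */
size_t rle_compress(const uint8_t *src, size_t n, uint8_t *dst, size_t cap)
{
    size_t out = 0;
    size_t i = 0;
    while (i < n) {
        size_t run = 1;
        while (i + run < n && run < 255 && src[i + run] == src[i])
            run++;
        if (out + 2 > cap)
            return 0;                  /* gave up: output is not getting smaller */
        dst[out++] = (uint8_t)run;
        dst[out++] = src[i];
        i += run;
    }
    return out;
}

/*
 * Try to compress src into dst (capacity n). Keep the compressed form only
 * if it is strictly smaller; otherwise store the original bytes.
 * Returns the stored size and sets *is_compressed accordingly.
 */
size_t store_smaller(const uint8_t *src, size_t n, uint8_t *dst, int *is_compressed)
{
    assert(src != NULL && dst != NULL && is_compressed != NULL);
    size_t c = rle_compress(src, n, dst, n);
    if (c != 0 && c < n) {
        *is_compressed = 1;
        return c;
    }
    memcpy(dst, src, n);               /* compression did not help: store raw */
    *is_compressed = 0;
    return n;
}
```

Note that the failed attempt still runs the compressor over the data before falling back, which is the CPU cost mentioned above for data that is already compressed.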

Unlike SQL Server's page compression, this feature is designed to compress large column values, not to remove duplication when multiple records have similar values. That is because the feature was added for Microsoft Exchange, where most of the data is stored in relatively large BLOB columns (e.g. e-mail bodies).

What else do I need to know?

Nothing, really. A compressed column looks just like a normal column and can be indexed, set and retrieved as normal. There is no way to retrieve the compressed data; you will always get uncompressed data when calling JetRetrieveColumn or a similar API. If you want to know how well a record compressed, use the new JetGetRecordSize2 API, which gives both the logical (uncompressed) and physical (compressed) size of a record.

2. 32kb and 16kb page support

ESENT now supports 32kb and 16kb pages. Changing the page size affects b-tree performance, and you may find your application runs faster with bigger pages. Bigger pages also allow more columns to be set in one record before getting JET_errRecordTooLarge.

3. JetPrereadKeys

If you want to retrieve non-contiguous records quickly, you can use JetMakeKey to calculate their keys and then call JetPrereadKeys. Prereading the records will issue the I/O as efficiently as possible, which makes retrieving the records a LOT faster if I/O is involved. Records must be preread in either ascending or descending key order. JetPrereadKeys may not be able to preread all the records, so it tells you how many keys were preread; you can then issue a preread call for the rest after processing the initial batch.
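The sort, preread, process, repeat pattern looks roughly like this. To keep the sketch runnable without esent.h, the real call is hidden behind a callback: preread_fn is a hypothetical stand-in for JetPrereadKeys, demo_preread is a mock that handles at most a few keys per call, and comparing keys with memcmp is an assumption about how the keys built by JetMakeKey order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const void *pv;   /* key bytes, e.g. as produced via JetMakeKey */
    size_t cb;        /* key length in bytes */
} PrereadKey;

/* Ascending byte-wise order; a shorter key sorts before its extensions. */
static int compare_keys(const void *a, const void *b)
{
    const PrereadKey *ka = a, *kb = b;
    size_t n = ka->cb < kb->cb ? ka->cb : kb->cb;
    int c = memcmp(ka->pv, kb->pv, n);
    if (c != 0)
        return c;
    return (ka->cb > kb->cb) - (ka->cb < kb->cb);
}

/* Hypothetical stand-in for JetPrereadKeys: returns how many of the first
 * `count` keys were actually preread. */
typedef size_t (*preread_fn)(const PrereadKey *keys, size_t count, void *ctx);
typedef void (*process_fn)(const PrereadKey *key, void *ctx);

/* Sorts the keys, then alternates between prereading a batch and processing
 * it until every key is handled. Returns the number of preread calls. */
size_t preread_and_process(PrereadKey *keys, size_t count,
                           preread_fn preread, process_fn process, void *ctx)
{
    size_t calls = 0, done = 0;
    qsort(keys, count, sizeof keys[0], compare_keys);
    while (done < count) {
        size_t n = preread(keys + done, count - done, ctx);
        calls++;
        if (n == 0)
            n = 1;    /* nothing preread: read one record cold to make progress */
        for (size_t i = 0; i < n; i++)
            process(&keys[done + i], ctx);
        done += n;
    }
    return calls;
}

/* Mock used for the demo: prereads at most max_per_call keys per call. */
typedef struct {
    size_t max_per_call;
    char seen[16];
    size_t nseen;
} DemoCtx;

size_t demo_preread(const PrereadKey *keys, size_t count, void *ctx)
{
    const DemoCtx *d = ctx;
    (void)keys;
    return count < d->max_per_call ? count : d->max_per_call;
}

void demo_process(const PrereadKey *key, void *ctx)
{
    DemoCtx *d = ctx;
    assert(key->cb > 0 && d->nseen < sizeof d->seen);
    d->seen[d->nseen++] = *(const char *)key->pv;   /* record first key byte */
}
```

In a real application the processing step would seek to each key and retrieve the record, and the preread callback would wrap the actual JetPrereadKeys call with the direction that matches the sort order.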

4. Space Hints

In older versions of ESENT you could specify the initial size of a table/index, but didn't have much control over how the table size grew. Space hints allow you to give the initial size of a table/index and then indicate how much space the table should use as it grows. This feature is designed to allow databases with large numbers of tables where the tables start out small (perhaps one page) and then grow their allocation size so that they have contiguous space.
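As a toy model of "start small, then grow the allocation size", the sketch below computes successive extent sizes from a growth percentage with a lower and upper bound. The GrowthHints struct and its fields are invented for this example and are not the ESENT space hints structure; the real allocator may round or behave differently.

```c
#include <assert.h>

/* Hypothetical hints, in pages. Not the ESENT structure. */
typedef struct {
    unsigned long initial_pages;     /* size of the first allocation */
    unsigned long growth_percent;    /* next extent = last extent * percent / 100 */
    unsigned long min_extent_pages;  /* never allocate less than this */
    unsigned long max_extent_pages;  /* never allocate more than this */
} GrowthHints;

/* Size of the next extent given the size of the previous one. */
unsigned long next_extent(const GrowthHints *h, unsigned long last_extent)
{
    assert(h != 0 && h->min_extent_pages <= h->max_extent_pages);
    unsigned long next = last_extent * h->growth_percent / 100;
    if (next < h->min_extent_pages)
        next = h->min_extent_pages;
    if (next > h->max_extent_pages)
        next = h->max_extent_pages;
    return next;
}

/* Total pages owned after the initial allocation plus `growths` extents. */
unsigned long total_after(const GrowthHints *h, unsigned growths)
{
    unsigned long extent = h->initial_pages;
    unsigned long total = extent;
    for (unsigned i = 0; i < growths; i++) {
        extent = next_extent(h, extent);
        total += extent;
    }
    return total;
}
```

With an initial size of 1 page, 200% growth and a 64-page cap, the extents in this model run 1, 2, 4, 8 and so on up to 64, so a table that stays tiny costs one page while a table that keeps growing ends up with large contiguous chunks.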

5. JET_paramWaypointLatency and JET_bitReplayIgnoreLostLogs

Write-ahead logging depends on the underlying storage being durable, which isn't really true for inexpensive desktop-class machines. To improve database recoverability we have added a 'Lost Log Resiliency' (LLR) feature to the Windows 7 version of ESENT. Setting the JET_paramWaypointLatency parameter to a non-zero value will enable the LLR feature, causing ESENT to delay writes to the database until N logfiles have been generated, where N is the waypoint depth.

For example, if the waypoint is set to 1 and the current log generation is 28 then ESENT will not write any pages to the database that have been modified by log records in generation 28. This means that the database can be recovered without the most recent log, but there will be data loss! If you are comfortable with some potential data loss and want to emphasise database recovery then set JET_paramWaypointLatency to 1 (or another small value; there is a performance penalty for maintaining the waypoint). You have to use the JET_bitReplayIgnoreLostLogs option with JetInit2 to tell ESENT that it is OK to perform a lossy recovery.
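The waypoint rule in that example can be written down as a small predicate. This is only a model inferred from the description above, not ESENT's internal logic, and the function names are hypothetical.

```c
#include <assert.h>

/*
 * Model of the waypoint rule: a page last modified by log generation
 * gen_modified may be written to the database only once at least
 * waypoint_latency newer log files exist. With a latency of 0 the
 * waypoint imposes no extra delay.
 */
int page_may_be_flushed(long gen_modified, long gen_current, long waypoint_latency)
{
    assert(waypoint_latency >= 0 && gen_modified <= gen_current);
    return gen_modified <= gen_current - waypoint_latency;
}

/* Newest log generation whose changes may already be in the database file. */
long newest_flushable_generation(long gen_current, long waypoint_latency)
{
    assert(waypoint_latency >= 0);
    return gen_current - waypoint_latency;
}
```

In this model, losing the newest waypoint_latency log files never leaves the database file containing changes that the surviving logs cannot account for, which is what makes the lossy recovery possible.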

This will allow you to recover your database even if a system crash loses the most recent log. In our power-down testing this improves the chances of a successful database recovery after a machine with non-battery-backed write caching is powered down.

6. JET_bitTermDirty

If you want to shut down REALLY quickly you can use JET_bitTermDirty, which won't even flush the database cache. Recovery will take longer the next time, but no data will be lost.

I know this was a very quick summary. If you have specific questions, leave them in the comments or use the 'contact me' area and I'll try to answer them or provide samples.

Comments

  • Anonymous
    March 07, 2011
    Thanks for the details! It's a shame much of this information is missing from the MSDN documentation...