Don’t let documents and mixed media complicate your product!
For the last twenty-five years, I have faced the pain of needing to store documents and mixed media in a relational database management system. When you are designing or building your ISV product you have probably experienced the same pain.
By mixed media I mean images, video and audio files – not oils, inks and collage. In the jargon of the trade, they are called BLOBs or Binary Large Objects.
The earliest solution was simple: store the location of the file in the database. This solution still works fine for simple systems but often exhibits the following problems:
- File corruption is not easily detected.
- The ability to roll back to an earlier version of content is rarely implemented.
- Slow performance because:
- The operating system has to hunt for the file through levels of directory indices.
- No or low-intelligence caching by the operating system.
- File fragmentation.
- Orphaned (unreferenced) content.
- Efficient replication is difficult to implement.
There are a variety of solutions that kludge the files into the database – from the use of binary to text encoding and saving it into a TEXT column, to the use of IMAGE and VARBINARY(max). These solutions have problems with large content sizes (when the size exceeds 1 megabyte) and there is often added coding complexity. For a discussion of these issues see To BLOB or Not To BLOB: Large Object Storage in a Database or a File System.
With SQL Server 2008, we have an enhancement to data storage called FILESTREAM, which lets you store BLOBs data directly in the file system allowing the best of both approaches in most cases. Oracle has a similar feature called BFILE ; unfortunately the BFILE content is not captured in an Oracle backup, whereas FILESTREAM content is captured.
Bottom line: Parallel logic may be used for SQL Server 2008 and Oracle in your ISV product to handle BLOBs. The Oracle implementation needs an auxiliary component written to back up the files, and may have some complexities in doing a recovery because the database backups and file backups may not be in tight synchronization. SQL Server 2008 always retains synchronization between the BLOBs and the rest of the data in the database.
To learn more technical details:
- An excellent white paper has just published -- FILESTREAM Design and Implementation Considerations,
- An earlier technical article -- FILESTREAM Storage in SQL Server 2008.