Share via


SQL Server Driver for PHP Connection Options: CharacterSet

In this post, I’ll add to my slowly growing series that elaborates on the connection options for the SQL Server Driver for PHP. This time I’ll focus on the CharacterSet option. This is the option that specifies how data will be encoded when it is sent to and from the server.  I’ll take a look at how and when to use the different values for this option, as well as briefly look at what the driver is doing under the hood for each value. This is especially interesting in the case of storing UTF-8 encoded data since SQL Server only supports storing USC-2 encoded data.

One thing that helped me understand the CharacterSet option was to realize that its name is a bit misleading (although it seems to be inline with other uses of CharacterSet or charset). It is used to specify the encoding of data that is being sent to the server, not the character set. With that in mind, the possible values for the option begin to make sense: SQLSRV_ENC_CHAR, SQLSRV_ENC_BINARY, and UTF-8. The Connection Options topic in the official documentation will give you a clue about how to use the various settings, but I hope to elaborate on that here.

Note: If character encoding is somewhat mysterious to you, this article by Joel Spolsky may help to demystify things. It’s a bit old, but well written and still relevant.

To set the CharacterSet option on a connection, simply assign one of the 3 values listed above (SQLSRV_ENC_CHAR, SQLSRV_ENC_BINARY, or UTF-8) to the CharacterSet keyword in the connection options array. The following example shows how to specify UTF-8 encoding on a connection:

$serverName = "serverName\sqlexpress";
$connectionInfo = array( "Database"=>"DBName", "CharacterSet" =>"UTF-8" );
$conn = sqlsrv_connect( $serverName, $connectionInfo);

Note that this means encoding for all data on the open connection will be assumed to be UTF-8. Of course, if you want to send data with a different format on the same connection, you can. You will just need to specify the encoding at a more granular level (see the example in SQLSRV_ENC_BINARY below).

If you are using the PDO_SQLSRV driver, the CharacterSet option is not available when creating a connection object. However, you can achieve essentially the same functionality by using the PDO::setAttribute method after you have created a connection object. You can also determine the character set encoding when calling PDO::prepare, PDOStatement::setAttribute, PDOStatement::bindColumn, or PDOStatement::bindParam. For a list of constants used to set character encoding, see Constants in the driver documentation. Note that unlike the SQLSRV driver, the default character set encoding in the PDO_SQLSRV driver is UTF-8.

SQLSRV_ENC_CHAR (the default for php_sqlsrv)

This value indicates that data will be stored according to the server collation.  In most cases, which collation a database uses is determined by the Windows locale at the time of installation. If all of your data does not match your server’s collation you should not use this option. Instead, consider using the UTF-8 option (see below for details).

SQLSRV_ENC_BINARY

This value indicates that binary data (such as image data) is being sent to the server. Unless all the data going to the server on a given connection is binary, it is advisable not to set this at the connection level. Instead, it is better to specify that you are sending binary data at the query level. The following example shows how to specify the encoding for a query parameter. (Note that since this example assumes image data is being sent to the server, the encoding is specified within the specified PHP stream type, SQLSRV_PHPTYPE_STREAM.)

$sql = "INSERT INTO Production.ProductPhoto (LargePhoto)
         VALUES (?); SELECT SCOPE_IDENTITY() AS PhotoID";

$fileStream = fopen($_FILES['file']['tmp_name'], "r");

$params = array(
                array(&$fileStream,
                      SQLSRV_PARAM_IN,
                      SQLSRV_PHPTYPE_STREAM(SQLSRV_ENC_BINARY) ,
                      SQLSRV_SQLTYPE_VARBINARY('max'))
                );

$uploadPic = sqlsrv_prepare($conn, $sql, $params);

sqlsrv_execute($uploadPic);

 

UTF-8

This value indicates that UTF-8 encoded data is being sent to the server. If you are doing this, there are a few things to keep in mind:

  1. Make sure the data you are sending is UTF-8 encoded! This is a no-brainer, but is easy to overlook. If you are processing form data, you can (in most cases) set the accept-charset attribute of the form to “UTF-8”.
  2. Make sure that the source or destination column is of type nchar or nvarchar. SQL Server will pass data to these column types “as is”. For other column types (such as char or varchar), SQL Server interprets the data according to the server collation (which is largely dependent on the Windows locale of the machine).

Note that, as with binary data in the example above, you can specify that a particular parameter is UTF-8 encoded by setting its PHP type to SQLSRV_PHPTYPE_STRING(“UTF-8”).

Given that SQL Server actually only stores UCS-2 encoded data, it might be interesting to understand how the SQL Server Driver for PHP allows you to store and retrieve UTF-8 encoded data. When you indicate that incoming data is UTF-8 encoded (at either the connection level or the parameter level), the driver passes the data to the MultiByteToWideChar Windows function to convert the the data to UTF-16 encoded data (which, in most cases is the same as UCS-2 encoded data). Similarly, if you indicate that you want UTF-8 encoded data on the way out, the driver calls the WideCharToMuliByte function to perform the conversion in reverse.

That’s it for today. Hope this post was useful/interesting.

Thanks.

-Brian

Share this on Twitter

Comments

  • Anonymous
    August 04, 2011
    The comment has been removed
  • Anonymous
    August 05, 2011
    The comment has been removed
  • Anonymous
    October 05, 2011
    Are you aware of any tools or scripts that will convert data stored in 'default' character encoding into UTF-8?
  • Anonymous
    October 07, 2011
    @Neil: For text documents on a local filesystem, notepad can do the trick.Here's a powershell thing that looks like it would work as a scripted solution: stackoverflow.com/.../powershell-setting-encoding-for-get-content-pipelineHope that helps.-Brian