SSIS DT_TEXT column contains alternating CHAR(0) characters

Peter Bishop 181 Reputation points
2025-03-10T16:13:40.14+00:00

I have an SSIS package that is pulling JSON data from a REST source. This was originally pulling data as (DT_STR, 8000, 1252) and writing to a [varchar] column but we have started getting a couple of values than will not fit.

I have changed the source to (DT_TEXT) and writing this to a [varchar] column. However, every other character in the resulting column is CHAR(0).

I've added a transform script task into the package to try and see what's going on and this is what I've found:

byte[] GetBytes(string str)
{
    byte[] bytes = new byte[str.Length * sizeof(char)];
    System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
    return bytes;
}

BlobColumn blobColumn = Row.jsonData;
System.Int32 blobLength = System.Convert.ToInt32(blobColumn.Length);
byte[] blobData = blobColumn.GetBlobData(0, blobLength);
string stringData = System.Text.Encoding.Unicode.GetString(blobData);
byte[] convertedData = GetBytes(stringData);
return convertedData;
  • The original column's content:
    blobData =
[0]: 91

  [1]: 0

  [2]: 13

  [3]: 0

  [4]: 10

  [5]: 0

  [6]: 32

  [7]: 0

  [8]: 32

  [9]: 0

  [10]: 123

  [11]: 0

  [12]: 13

  [13]: 0

  [14]: 10

  [15]: 0

  [16]: 32

  [17]: 0

  [18]: 32

  [19]: 0
```- **The string equivalent (all characters as expected):**  
stringData = "[\r\n  {\r\n  ..."

- **The character values, the ensure no hidden characters:**  
stringData.ToCharArray() =  

...


- **The value I started with:**  
    [0]: 91

      [1]: 0
  
      [2]: 13
  
      [3]: 0
  
      [4]: 10
  
      [5]: 0
  
      [6]: 32
  
      [7]: 0
  
      [8]: 32
  
      [9]: 0
  
      [10]: 123
  
      [11]: 0
  
      [12]: 13
  
      [13]: 0
  
      [14]: 10
  
      [15]: 0
  
      [16]: 32
  
      [17]: 0
  
      [18]: 32
  
      [19]: 0
  
So my question is, why does a DT_TEXT data type, written to a [varchar] column add these CHAR(0) characters and is there a way to stop it or remove them from with a stored procedure (where the data will be consumed)?

I'm guessing I'm missing something obvious but I thought DT_TEXT has to be written to a varchar(MAX) - and I've not seen these extra characters previously.

Thanks.

C#
C#
An object-oriented and type-safe programming language that has its roots in the C family of languages and includes support for component-oriented programming.
11,339 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. Jiale Xue - MSFT 49,126 Reputation points Microsoft External Staff
    2025-03-11T05:28:45.2866667+00:00

    Hi @Peter Bishop , Welcome to Microsoft Q&A,

    Your problem should be caused by encoding. The DT_TEXT data type in SSIS is usually stored in Unicode (UTF-16) format. When you extract JSON data as DT_TEXT, it is most likely stored as UTF-16, where each character is represented by two bytes (hence there are empty CHAR(0) between each visible character)

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        BlobColumn blobColumn = Row.jsonData;
        int blobLength = Convert.ToInt32(blobColumn.Length);
        byte[] blobData = blobColumn.GetBlobData(0, blobLength);
    
        // Convert from UTF-16 (default) to UTF-8
        string stringData = Encoding.Unicode.GetString(blobData);  // SSIS default encoding
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(stringData); // Convert to UTF-8
        string utf8String = Encoding.UTF8.GetString(utf8Bytes); // Ensure it's a proper UTF-8 string
    
        // Assign cleaned-up string back to the output column
        Row.CleanJsonData = utf8String;
    }
    
    

    Best Regards,

    Jiale


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment". 

    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.