Importing CHAR() fields containing null values from DB2 to SQL Server via SSIS
Earlier this year, I was working with a customer who was doing a mainframe migration from DB2 to SQL Server, and they were using SSIS to import the tables.
Several of their columns were CHAR data types, with a null byte as the first byte of the field. The columns, when imported, all contained empty character strings. Also, when using t-sql from a linked server, the import was also failing.
For testing purposes, I setup a simple table as such, in DB2 (all columns are CHAR(10) in this table:
Row 1:
Column 1 = “Bob”
Column 2 = “[null]Bob”
Column 3 = “Bob[null]”
Row 2:
Column 1 = “Bob”
Column 2 = “[null][null]Bob”
Column 3 = “Bob[null] [null]”
Now, as we all know, NULL is a 'string terminator'. So when SSIS or our data provider would see the null, the data was truncated at that point. We were able to provide a fix to correct the issue with the data provider truncating the null, so that t-sql scripts similar to the below would work:
insert into localtable (select * from linkedserver.database.schema.table)
The above would be fine if you only had a few rows, but if you had millions of rows, a performance hit. And when using SSIS to import the data, even though we passed the correct string to SSIS, it was still terminating the string at the null.
Through some creative scripting with the help of some SSIS Developers, we were able to come up with a workaround that was acceptable to the customer for their migration project.
First, you would need to create a temporary database in SQL, using the TEXT data type for the columns containing NULLs, similar to this:
CREATE TABLE [DB2TestDatabases].[dbo].[TEMPDATABASE]
([C1] char(10),
[C2] text,
[C3] text)
Then, in your source query for SSIS, use a query similar to this:
Select
C1,
cast(c2 as char(10) for bit data) as C2,
cast(c3 as char(10) for bit data) as C3
from schema .table
The above query tells DB2 to cast columns C2 and C3 as binary data.
Next, you need to create a scripting component in the SSIS package.
In the script component, select C2 and C3 as the input columns. No need, in this case, to use C1 as it does not contain nulls, and can be passed directly through.
Next, create 2 output columns. I called mine (not being very creative) newcolumn2 and newcolumn3. Set these columns to be DT_TEXT data types, since this is what we are going to be writing to the SQL Server Database.
Then, in the script editor, add code similar to this:
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
'
' Add your code here
'
Dim i As Integer
Dim C2bytearray(UBound(Row.C2)) As Byte
For i = 0 To UBound(Row.C2)
C2bytearray(i) = Row.C2(i)
Next
Row.NewColumn2.AddBlobData(C2bytearray)
Dim C3Bytearray(UBound(Row.C3)) As Byte
For i = 0 To UBound(Row.C3)
C3Bytearray(i) = Row.C3(i)
Next
Row.NewColumn3.AddBlobData(C3Bytearray)
End Sub
The above was not the original code, but proved to be much faster. It takes the binary data and converts it to a format acceptable for the TEXT data type. The original code I uses was a bit slower, converting each byte from binary to CHR, inserting into a STRING, then assigning the string to the new columns.
After running the package, you can verify the import by running the following t-sql script (you have to double cast c2 and c3 to read the text data as binary to actually see that the nulls are present.
select cast(c1 as varbinary(10)) as C1,
cast(cast(c2 as char(10)) as varbinary(10)) as C2,
cast(cast(c3 as char(10)) as varbinary(10)) as C3
from dbo.TEMPDATABASE
This returns:
0x426F6220202020202020 0x00426F62202020202020 0x426F6200202020202020
0x426F6220202020202020 0x0000426F622020202020 0x426F6200002020202020
Notice in the 2nd columns the nulls(00) before the ‘42’ and in the 3rd column the nulls after the ‘62’.
Now, the only issue left here is that the data in the temp database is a TEXT data type, not char. So, create the final table:
CREATE TABLE [dbo].[newTable](
[C1] [char](10) NULL,
[C2] [char](10) NULL,
[C3] [char](10) NULL
)
Then run the following t-sql:
insert into dbo.newTable
select * from dbo.TEMPDATABASE
Once this completes, the data in the newTable contains the null char values as was originally needed:
select cast(c1 as varbinary(10)) as C1,
cast(c2 as varbinary(10)) as C2,
cast(c3 as varbinary(10)) as C3
from dbo. newTable
This returns:
0x426F6220202020202020 0x00426F62202020202020 0x426F6200202020202020
0x426F6220202020202020 0x0000426F622020202020 0x426F6200002020202020
This is not the optimal solution, but I currently do not have a better method.
Finally, if the data types in the original DB2 database had been CHAR() FOR BIT DATA (binary), this would not have been an issue, as you could have directly mapped the columns to VarBinary() in SQL and the import would have worked properly.
If anyone knows of another method, I would love to hear about it! And hope this helps someone else in a similar situation.
Comments
- Anonymous
October 30, 2008
PingBack from http://blog.a-foton.ru/index.php/2008/10/30/importing-char-fields-containing-null-values-from-db2-to-sql-server-via-ssis/