How to Store Different Languages Data in the Same Database and Analyse them Effectively
Reference # http://social.msdn.microsoft.com/Forums/sqlserver/en-US/d6da2336-5208-4d1c-9406-dcccbb003dca/collation
Today we got an interesting question from the user, where he is trying to store multiple collations consisting of multiple languages in the same database and he needs to manage it, i.e. run sort, search on top of the same fields.
A re-cap on collation might be good over here, a collation of a particular column / table / schema / database dictates the sort, case and accent sensitivity of the data stored in the char and varchar columns in SQL Server databases.
So, what if we want to store multiple languages in the same database? The easiest approach is we will create one table per language and store the language specific info in that table in the nvarchar / nchar columns, or if the languages are few (1 to 10 ). We can add columns with different collations to the same table and add logic in the application layer to populate the appropriate field.
Now, what if we want to sort the data and analyse this data on a set of rules? Remember, by converting collations from one to another, SQL Server does not translate the text from one language to another for us.
We can use a free API from Microsoft for translating, the Microsoft Translator API. It has various interfaces. More of the same can be found here http://www.microsofttranslator.com/dev/.
By using this interface we can convert foreign language data into the native language of the country and store them in a different column in the same table.
This new derived column can be used for sorting and analysis purposes down the road more effectively than the alternatives.
Points to note:
Although the solution seems to be good, there are a few drawbacks. The translator may not be able to translate all the words, in some cases, when there is a spelling mistake it may not be able to translate the input entirely.
And the data sent to the translator will be sent to Microsoft servers, so this solution may not be advisable where the data should be behind the red tape at all the time.