Freigeben über


SYSK 150: RTF2PlainText

The other day I needed to display an RTF text stored in a database in a SQL Reporting Services report…  After doing some research, the team came to a conclusion that there is no easy way to do that, especially given the time we had left on the project.  The decision was made to strip-off the RTF formatting characters and display it as plain text.  A co-worker of mine, Scott Jennings, a.k.a. the “RegEx-Man”, came up with the following expression to do the job:

Regex.Replace(Regex.Match(rtf, @"\x5cviewkind4[^ ]*(.+)\x5cpar").Groups[1].Value, @"[\n\r\f]|(\x5cpar)|(\x5c[a-zA-Z0-9]+)", "");

 

We’ve done some preliminary testing and it looks good so far…  However, if you come up with any improvements, kindly post them to this blog so all could benefit.

Comments

  • Anonymous
    July 21, 2006
    The comment has been removed
  • Anonymous
    July 21, 2006
    I tried your RegEx expression but got an empty string back. I also tried variations but could not fix it (my RegEx level is still 101;)

    I am using it exactly as posted. Do you have a clue what could be wrong?
  • Anonymous
    July 24, 2006
    Our original plan was to use a .NET class that's invoked on the server side by SSRS report... But then we figured out that you can use it directly from SSRS...  
  • Anonymous
    July 24, 2006
    Post your code and I'll take a look...
  • Anonymous
    July 24, 2006
    Thanks. Probably a stupid mistake (I am much more fluent in SQL than C#)

    here goes:

           [SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None, IsPrecise = true)]
           [return: SqlFacet(MaxSize = -1, IsFixedLength = false, IsNullable = true)]
           public static SqlString RTF2Text([SqlFacet(MaxSize = -1, IsFixedLength = false, IsNullable = true)] SqlString input)
           {
               if (input.IsNull)
                   return SqlString.Null;
               else
               {
                   String rtf;
                   rtf = input.Value.ToString();

                   return (SqlString)Regex.Replace(Regex.Match(rtf, @"x5cviewkind4[^ ]*(.+)x5cpar").Groups[1].Value, @"[nrf]|(x5cpar)|(x5c[a-zA-Z0-9]+)", "");
                   
               }
  • Anonymous
    July 24, 2006
    Try passing in the following RTF as input to the proc: '{rtf1ansiansicpg1252deff0deflang1033{fonttbl{f0fnilfcharset0 Microsoft Sans Serif;}}viewkind4uc1pardf0fs17 Testpar}'

    If it returns 'Test', then it's functionning as intended.  This RegEx was designed to convert RTF created by .NET's RichTextBox control to plain text.  It is likely that RTF from Microsoft Word would not work with this RegEx, so if that's what you're using in your tests, that would be the reason...

    Just FYI -- there is no need to call ToString on input.Value since the returned value is of type String (no harm, but no need).

    IMPORANT: I would not recommend doing this kind of CPU intensive processing on the SQL Server; consider putting it on your application tier instead.
  • Anonymous
    July 25, 2006
    Your example works. Thanks.

    Indeed I was testing with rtf produced by Office, but intended to use it .NET's RichTextBox control. And the application tier will do the work most of the time, but having it server side too is sometimes useful. A mirror server is often used for reports and a view can expose raw text if necessary for 3rd party reporting tools. Too bad there is no standard control in VS to support html. Today, it would a much better choice to store xhtml fragments than rtf.

    Thanks again for your help
  • Anonymous
    March 07, 2007
    Hi this expression returns empty... could you check with Scott what's happening ? {rtf1ansideff0{fonttbl{f0fnilfcharset0 MS Sans Serif;}{f1fnil MS Sans Serif;}} {colortbl ;red0green0blue0;} viewkind4uc1pardcf1lang2067f0fs16 RENOVEREN BRIEVENBUS - BEKLEDING BRIEVENBUSKLEP IS WEG par VERVANGEN KLEP, ONTROESTEN EN HERSPUITEN VAN DE GEVERFDE METALEN DELEN par SCHOONMAKEN VAN DE INOX BUITENPLAAT par ZIE FOTO'Sf1 par }