CStdioFile::ReadString does not correctly read Unicode text file

semicode 40 Reputation points
2024-12-15T01:29:13.58+00:00

While reading a UTF-8 text file in a Unicode C++ build, the CStdioFile::ReadString method fails to read non-ASCII characters correctly. For example, “John W. Gates” Day, is read as â€œJohn W. Gatesâ€ Day, with each curly quotation mark turned into three separate characters.

In memory I see this:

    0x000001B291ADC258  e2 00 80 00 9c 00 4a 00 6f 00 68 00 6e 00 20 00  â.€.œ.J.o.h.n. .
    0x000001B291ADC268  57 00 2e 00 20 00 47 00 61 00 74 00 65 00 73 00  W... .G.a.t.e.s.
    0x000001B291ADC278  e2 00 80 00 9d 00 20 00 44 00 61 00 79 00 2c 00  â.€... .D.a.y.,.

I tried opening the file with the CFile::typeUnicode flag instead of CFile::typeText, but that makes things worse: the text comes back as 8-bit ASCII, which is completely unreadable in a Unicode environment.

The edit box itself is not the problem: if I paste the correct text into it, the text displays correctly. So the problem is strictly with reading the text file.

Am I doing something wrong or does this call just not support UTF-8?

C++

Accepted answer
  1. Viorel 118.6K Reputation points
    2024-12-15T05:04:27.6766667+00:00

    If the UTF-8 file is correctly encoded, then try this approach:

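    FILE* f = nullptr;
    errno_t e = fopen_s( &f, "C:\\MyFile.txt", "rt, ccs=UTF-8" ); // opened successfully if e == 0
    if ( e == 0 )
    {
        CStdioFile file( f );   // wraps the already-open FILE*
        CString text;
        while ( file.ReadString( text ) )   // one line per call, decoded from UTF-8
        {
            // . . . use text
        }
        file.Close( ); // also closes the FILE*
    }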
    