Unicode/Wide Character Handling in C++
I was bitten again.
Life was never meant to be easy, and it gets tougher when you deal with wide characters in C++ through wfstream, wcout, or any other wide version of the standard I/O facilities.
Two rules of thumb:
#1 Unicode files must be opened as binary
Example:
std::wifstream xmlFile(m_FileName, std::ios::binary);  // for reading
std::wofstream xmlFile(m_FileName, std::ios::binary);  // for writing
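To see why binary mode matters, consider a character such as U+010A, whose UTF-16LE bytes are 0A 01. In text mode on Windows, the runtime expands every 0x0A byte to 0D 0A on output, which shifts all the bytes that follow and corrupts the file. The sketch below writes raw bytes in binary mode and reads them back unchanged. The helper names writeBytes and readBytes are made up for this illustration.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: write raw bytes with no newline translation.
void writeBytes(const std::string& path, const std::string& bytes) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    assert(out.is_open());
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}

// Hypothetical helper: read the whole file back as raw bytes.
std::string readBytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Calling writeBytes("test.bin", std::string("\x0A\x01", 2)) and then readBytes("test.bin") should give back the same two bytes on any platform, because binary mode turns off newline translation.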
#2 When working with languages other than English, wifstream/wofstream must be imbued with a non-default codecvt facet to read from or write to a real Unicode file; otherwise wofstream ends up writing an ANSI file.
An explanation is available in the references at the end of this post.
Example:
wstring ws(L"this is a wide string");

// Imbue before opening, so the facet is in place for the first write.
wofstream of_imbued;
IMBUE_NULL_CODECVT(of_imbued);
// open() with a wide-character path is a Microsoft extension.
of_imbued.open(L"c:\\imbued.txt", ios::binary);
of_imbued << ws;

// Same output without the facet, for comparison.
wofstream of_not_imbued;
of_not_imbued.open(L"c:\\not_imbued.txt", ios::binary);
of_not_imbued << ws;
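Outputs of the above code: imbued.txt holds the wide characters as written (a real Unicode file), while not_imbued.txt ends up as a plain ANSI file.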
Two imbue facilities are available:
- Boost Library
- imbue_null_codecvt (the one used in the above example)
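For readers who want to see what such a facet does, here is a minimal sketch of a hand-written one. It is not the imbue_null_codecvt source, which is not shown in this post. The class name RawWideCodecvt and the helper writeRawWide are made up, and the sketch only handles output. It copies each wchar_t to the file as sizeof(wchar_t) raw bytes in host byte order, so the result is 2 bytes per character on Windows and typically 4 on Linux.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>
#include <fstream>
#include <locale>
#include <string>

// Illustrative facet: passes wide characters through as raw bytes
// instead of narrowing them to the locale's single-byte encoding.
class RawWideCodecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
protected:
    result do_out(std::mbstate_t&, const wchar_t* from, const wchar_t* fromEnd,
                  const wchar_t*& fromNext, char* to, char* toEnd,
                  char*& toNext) const override {
        assert(from <= fromEnd && to <= toEnd);
        fromNext = from;
        toNext = to;
        while (fromNext != fromEnd) {
            if (static_cast<std::size_t>(toEnd - toNext) < sizeof(wchar_t))
                return partial;  // output buffer full; the stream calls again
            std::memcpy(toNext, fromNext, sizeof(wchar_t));
            toNext += sizeof(wchar_t);
            ++fromNext;
        }
        return ok;
    }
    result do_unshift(std::mbstate_t&, char* to, char*,
                      char*& toNext) const override {
        toNext = to;  // stateless encoding: nothing to flush
        return ok;
    }
    bool do_always_noconv() const noexcept override { return false; }
    int do_encoding() const noexcept override {
        return static_cast<int>(sizeof(wchar_t));  // fixed width
    }
    int do_max_length() const noexcept override {
        return static_cast<int>(sizeof(wchar_t));
    }
};

// Hypothetical helper: write a wide string through the facet above.
bool writeRawWide(const std::string& path, const std::wstring& text) {
    std::wofstream out;
    // Imbue before opening so the facet is in place for the first write.
    // The locale takes ownership of the facet and deletes it.
    out.imbue(std::locale(std::locale::classic(), new RawWideCodecvt));
    out.open(path, std::ios::binary | std::ios::trunc);
    out << text;
    out.close();
    return !out.fail();
}
```

A call such as writeRawWide("imbued.txt", L"\u4e2d\u6587") should then produce a file whose size is the string length times sizeof(wchar_t). Reading such a file back would need a matching do_in and do_length, which this sketch leaves out.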
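There's also a classic C way to write Unicode files:
wchar_t myWString[] = L"Some strange characters.";
// myFile is assumed to be a FILE* opened in binary mode ("wb").
// Subtract 1 so the terminating null is not written to the file.
fwrite(myWString, sizeof(wchar_t), sizeof(myWString)/sizeof(wchar_t) - 1, myFile);
However, it is not portable: the size and encoding of wchar_t, as well as the byte order, differ between platforms.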
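One way around the portability problem is to fix the encoding and byte order yourself before calling fwrite. The following is a minimal sketch that assumes UTF-16LE as the target file format; the helper names toUtf16LE and writeFileC are made up for this illustration. It takes code points as a std::u32string, so the result does not depend on how wide wchar_t is.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: encode code points as UTF-16LE bytes, independent
// of the platform's wchar_t size and byte order.
std::string toUtf16LE(const std::u32string& text) {
    std::string bytes;
    auto putUnit = [&bytes](char32_t unit) {
        bytes.push_back(static_cast<char>(unit & 0xFF));         // low byte first
        bytes.push_back(static_cast<char>((unit >> 8) & 0xFF));  // then high byte
    };
    for (char32_t cp : text) {
        // Reject values that are not valid Unicode scalar values.
        assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
        if (cp < 0x10000) {
            putUnit(cp);
        } else {
            const char32_t v = cp - 0x10000;  // 20 bits, split into a surrogate pair
            putUnit(0xD800 + (v >> 10));
            putUnit(0xDC00 + (v & 0x3FF));
        }
    }
    return bytes;
}

// Hypothetical helper: write the bytes with the C API in binary mode.
bool writeFileC(const char* path, const std::string& bytes) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const std::size_t written = std::fwrite(bytes.data(), 1, bytes.size(), f);
    return std::fclose(f) == 0 && written == bytes.size();
}
```

For example, writeFileC("out.txt", toUtf16LE(U"Some strange characters.")) writes two bytes per character here, whatever the compiler's wchar_t looks like. The sketch writes no byte order mark; whether a reader expects one is a separate decision.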
References:
- Unicode Implementation
http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/ffe0912d1462d7a5/7601a62008fdd25a?lnk=st&q=wfstream+fstream+cout+wcout&rnum=6&hl=en#7601a62008fdd25a
- Unicode in C++
http://groups.google.com/group/comp.lang.c++/browse_thread/thread/f4a6a434b0453187/1edc2bc1f4187597?lnk=st&q=wfstream+fstream+cout+wcout&rnum=3&hl=en#1edc2bc1f4187597
- how to read a Unicode file with fstream?
http://groups.google.com/group/microsoft.public.vc.stl/browse_thread/thread/45d7520ec3ad3f51/d57b41e9abb20117?lnk=st&q=wfstream+fstream+cout+wcout&rnum=2&hl=en#
- A very puzzling problem: cout vs. wcout, fstream vs. wfstream
http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/37c3e24861ca09e3/78fe0aeed7b728de?lnk=st&q=wfstream+fstream+cout+wcout&rnum=1&hl=en#78fe0aeed7b728de
- Upgrading an STL-based application to use Unicode
http://www.codeproject.com/vcpp/stl/upgradingstlappstounicode.asp