Windows API: Returning Unicode data in C6 using BSTRINGs

2007-04-12 -- Vadim Berman

While the new C7 has full Unicode support, it is also possible to return Unicode data from prior versions using a bit of Windows API and direct memory access. This is not a Unicode-enabled GUI (that is also possible, but the window or control has to be rebuilt for it); it simply returns BSTRING data in Unicode. It is particularly relevant when Clarion-created COM objects are used in ASP pages or by other Unicode applications.

A little background on BSTRING and Unicode conversions. SoftVelocity has courteously made the conversion of 1-byte strings (STRINGs, CSTRINGs, PSTRINGs) to BSTRING seamless, so a simple assignment such as this:

my1ByteStr  CSTRING(31)
myBStr      BSTRING
...
  CODE
  ...
  my1ByteStr = 'This is a test'
  myBStr = my1ByteStr

will allocate enough space for the new string (length * 2 + 2), copy the entire string byte by byte, initialize the other byte of each character, and add 2 NULLs to the end of the new BSTRING (unlike regular zero-terminated strings, two NULLs are required to terminate it, because some 2-byte characters contain 0 in either the high or the low byte). You have to appreciate what SoftVelocity has done, because it usually takes about 4-6 more error-prone lines in C/C++.

What it doesn't do, however, is convert your characters to Unicode. The assignment is a straightforward byte copy. As a result, the codes of national characters (Greek, Cyrillic, Arabic, etc.) may land on unrelated Unicode characters, and the new BSTRING will look funny.

Microsoft provides a function called MultiByteToWideChar that does roughly the same as that assignment, plus conversion from the specified code page to Unicode. (I haven't tested it with Asian languages, but I think it should also work.) The only downside is that it can't handle Clarion BSTRINGs directly, because it writes to a plain buffer of 2-byte characters rather than a classic BSTRING.

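
The difference between the plain assignment and a real code page conversion is easy to see at the byte level. The following is only a sketch in Python (not Clarion, and not part of the original article); the function names are hypothetical, widen_bytes imitates the zero-extending copy described above, and Python's built-in cp1251 codec stands in for what MultiByteToWideChar does with code page 1251.

```python
def widen_bytes(raw: bytes) -> str:
    """Imitate the plain assignment: every byte becomes one 16-bit
    character whose high byte is zero (no code page lookup)."""
    return "".join(chr(b) for b in raw)


def convert_with_codepage(raw: bytes, codepage: int) -> str:
    """Decode through a Windows code page, which is roughly the job
    MultiByteToWideChar performs (assumption: Python's cpNNNN codec
    matches the Windows table for this code page)."""
    return raw.decode(f"cp{codepage}")


# The Russian word "Привет" as it is stored in a Windows-1251 STRING.
ansi = b"\xcf\xf0\xe8\xe2\xe5\xf2"

naive = widen_bytes(ansi)                    # code points stay 0xCF, 0xF0, ...
proper = convert_with_codepage(ansi, 1251)   # code points become 0x41F, 0x440, ...

print([hex(ord(c)) for c in naive])
print([hex(ord(c)) for c in proper])
```

The first list stays in the 0x00C0-0x00FF range (accented Latin letters), which is why the result "looks funny"; the second list lands in the Cyrillic block at 0x0400 and up. Plain ASCII text is unaffected either way, which is why the problem only shows up with national characters.
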
A classic BSTRING consists of:

1. A 4-byte pointer to the actual set of 2-byte characters. ADDRESS() on a BSTRING variable returns the address of this pointer, not of the characters themselves.
2. A 4-byte segment holding the length of the string, stored immediately before the characters.
3. The actual set of 2-byte characters.

Therefore, in order to use MultiByteToWideChar, we need to:

1. Allocate space for the newly created array.
2. Pass the address of that array, which we read from the 4-byte pointer.

Another minor problem is that the Clarion runtime does not hold information about code pages; we only have PROP:Charset. This is easy to solve either by using TranslateCharsetInfo (which is a bit unreliable) or by creating a homebrew conversion procedure (not a big effort, so this is what I did). The result is listed below:

!=============================================
ANSIToUnicode        PROCEDURE(STRING pSrc,SHORT pCharset)
l:LenA                 SIGNED
l:LenW                 SIGNED
l:CodePage             UNSIGNED(CP_ACP)        ! CP_ACP (0) = system default ANSI code page
l:RVAddress            ULONG
l:RV                   BSTRING
  CODE
  IF pSrc = ''
    RETURN pSrc
  END
  l:CodePage = CharsetToCodepage(pCharset)
  l:LenA = LEN(CLIP(pSrc))
  ! first call: no output buffer, just ask how many wide characters are needed
  l:LenW = MultiByteToWideChar(l:CodePage, 0, ADDRESS(pSrc), l:LenA, 0, 0)
  IF l:LenW > 0
    l:RV = ALL(' ',l:LenW)            ! resize the BSTRING, there's no other way
    PEEK(ADDRESS(l:RV),l:RVAddress)   ! get the address of the actual wide char string
    ! second call: convert straight into the BSTRING's character buffer
    MultiByteToWideChar(l:CodePage, 0, ADDRESS(pSrc), l:LenA, l:RVAddress, l:LenW)
  ELSE
    l:RV = pSrc                       ! conversion failed, fall back to the plain assignment
  END
  RETURN l:RV

!=============================================
! reference: http://www.treodesktop.com/codepages.htm
!            http://blogs.msdn.com/shawnste/archive/2006/09/29/list-of-ansi-code-pages-used-by-windows.aspx
CharsetToCodepage    PROCEDURE(SHORT pCharset)
  CODE
  CASE pCharset
  OF CHARSET:SHIFTJIS
    RETURN 932    ! Japanese
  OF CHARSET:HANGEUL
    RETURN 949    ! Korean (Hangeul is a set of precombined Korean characters; about 4,260 unique Hanja characters exist)
  OF CHARSET:JOHAB
    RETURN 1361   ! Korean (Johab is a set of combinations between Hangul characters totaling to about 12,000)
                  ! WARNING: not always supported
  OF CHARSET:GB2312
    RETURN 936    ! simplified Chinese - used in mainland China and Singapore
  OF CHARSET:CHINESEBIG5
    RETURN 950    ! traditional Chinese - used in Hong Kong and Taiwan
  OF CHARSET:GREEK
    RETURN 1253   ! Greek
  OF CHARSET:TURKISH
    RETURN 1254   ! Turkish
  OF CHARSET:HEBREW
    RETURN 1255   ! Hebrew, also suitable for Yiddish
  OF CHARSET:ARABIC
    RETURN 1256   ! Arabic and languages using Arabic scripts - Urdu (Pakistan), Persian (Iran)
  OF CHARSET:BALTIC
    RETURN 1257   ! Estonian, Latvian, Lithuanian
  OF CHARSET:CYRILLIC
    RETURN 1251   ! languages using Cyrillic scripts: Azerbaijanian (sometimes can be Latin), Belarussian, Bulgarian,
                  ! Macedonian, Kazakh, Kyrgyz, Mongolian, Russian, Serbian (Cyrillic), Ukrainian,
                  ! Uzbek (sometimes can be Latin)
  OF CHARSET:THAI
    RETURN 874    ! Thai
  OF CHARSET:EASTEUROPE
    RETURN 1250   ! Central / Eastern European languages: Albanian, Croatian, Czech, Hungarian, Polish, Romanian,
                  ! Serbian (Latin), Slovak, Slovenian
  OF CHARSET:DEFAULT
    RETURN CP_ACP ! system default charset / codepage
  END
  RETURN 1252     ! West European
!=================================================================================

The API function is prototyped like this:

MultiByteToWideChar(long CodePage, long dwFlags, long lpMultiByteStr, long cbMultiByte, |
                    long lpWideCharStr, long cchWideCharStr),long,pascal
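
The BSTRING layout described earlier (length segment, 2-byte characters, two terminating NULLs) can be pictured with a small byte-level model. This is a sketch in Python rather than Clarion, and it rests on assumptions not stated in the article: the helper name build_bstr_block is hypothetical, the length segment is taken to hold the byte count without the terminator (as in a standard COM BSTR), and the characters are little-endian UTF-16. The BSTRING variable itself would hold a pointer to offset 4 of this block, which is the address the PEEK in ANSIToUnicode retrieves.

```python
import struct


def build_bstr_block(text: str) -> bytes:
    """Model the memory behind a BSTRING: a 4-byte length prefix,
    the 2-byte characters, then a 2-byte NULL terminator."""
    payload = text.encode("utf-16-le")        # 2 bytes per character here
    prefix = struct.pack("<I", len(payload))  # byte count, terminator excluded
    return prefix + payload + b"\x00\x00"


block = build_bstr_block("This is a test")

# Read the block back the way a consumer of the BSTRING would.
(byte_len,) = struct.unpack_from("<I", block, 0)
chars = block[4:4 + byte_len].decode("utf-16-le")

print(len(block), byte_len, chars)
```

For the 14-character test string the character area is 14 * 2 + 2 = 30 bytes, matching the "length * 2 + 2" allocation mentioned above, plus 4 bytes for the length segment in front of it.
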