Returning Unicode data in C6 using BSTRINGs (Vadim Berman) - Icetips Article
Icetips - Templates, Tools & Utilities for Clarion Developers


Windows API: Returning Unicode data in C6 using BSTRINGs
2007-04-12 -- Vadim Berman
 
While the new C7 has full Unicode support, it is also possible to enable Unicode in earlier versions using a bit of Windows API and direct memory access. This does not give you a Unicode-enabled GUI (that is also possible, but the window or control has to be rebuilt for it). It simply returns BSTRING data in Unicode. This is particularly relevant when Clarion-created COM objects are used in ASP pages or by other Unicode applications.

A little background on BSTRING and Unicode conversions. SoftVelocity has courteously made the conversion of 1-byte strings (STRINGs, CSTRINGs, PSTRINGs) to BSTRING seamless, so a simple assignment such as this:

    my1ByteStr  CSTRING(31)
    myBStr      BSTRING
      ...
      CODE
      ...
      my1ByteStr = 'This is a test'
      myBStr = my1ByteStr

allocates enough space for the new string (length * 2 + 2 bytes), copies the entire string byte by byte, zero-fills the high byte of each character, and adds two NULL bytes to the end of the new BSTRING. (Unlike regular zero-terminated strings, a BSTRING needs two NULLs to terminate it, because many 2-byte characters contain 0 in either the high or the low byte.) You have to appreciate what SoftVelocity has done, because this usually takes about 4-6 more error-prone lines in C/C++.

What the assignment doesn't do, however, is convert your characters to Unicode: it is a straight byte-for-character copy. As a result, national characters (Greek, Cyrillic, Arabic, etc.) land on the code points of unrelated characters, and the new BSTRING looks garbled.

Microsoft has created a function called MultiByteToWideChar that does roughly the same as that assignment, plus conversion from the specified code page to Unicode. (I haven't tested it with Asian languages, but I expect it to work there as well.) The only downside is that it can't handle Clarion BSTRINGs, because it writes to a plain array of 2-byte characters rather than to a classic BSTRING.

A classic BSTRING, on the other hand, consists of:
1. A 4-byte pointer to the actual set of 2-byte characters.
   ADDRESS() applied to a BSTRING variable returns the address of this pointer.
2. A 4-byte segment holding the length of the string (in bytes), stored immediately before the character data that the pointer refers to.
3. The actual set of 2-byte characters.

Therefore, in order to use MultiByteToWideChar, we need to:
1. Allocate space for the new array of wide characters (by resizing the BSTRING).
2. Pass MultiByteToWideChar the address of that array, which we read from the BSTRING's 4-byte pointer.

Another minor problem is that the Clarion runtime does not hold information about code pages; we only have PROP:Charset. This is easy to solve, either with TranslateCharsetInfo (which is a bit unreliable) or with a homebrew conversion procedure (not a big effort, so this is what I did). The result is listed below:

    !=============================================
    ANSIToUnicode        PROCEDURE(STRING pSrc,SHORT pCharset)
    l:LenA                 SIGNED
    l:LenW                 SIGNED
    l:CodePage             UNSIGNED(CP_ACP)  ! CP_ACP (0) = system default ANSI code page
    l:RVAddress            ULONG
    l:RV                   BSTRING
      CODE
      IF pSrc = ''
        RETURN pSrc
      END
      l:CodePage = CharsetToCodepage(pCharset)
      l:LenA = LEN(CLIP(pSrc))
      ! first call: ask how many wide characters are needed
      l:LenW = MultiByteToWideChar(l:CodePage, 0, ADDRESS(pSrc), l:LenA, 0, 0)
      IF l:LenW > 0
        l:RV = ALL(' ',l:LenW)           ! resize the BSTRING, there's no other way
        PEEK(ADDRESS(l:RV),l:RVAddress)  ! get the address of the actual wide char string
        ! second call: convert directly into the BSTRING's own buffer
        l:LenW = MultiByteToWideChar(l:CodePage, 0, ADDRESS(pSrc), l:LenA, l:RVAddress, l:LenW)
      ELSE
        l:RV = pSrc
      END
      RETURN l:RV

    !=============================================
    ! reference: http://www.treodesktop.com/codepages.htm
    !            http://blogs.msdn.com/shawnste/archive/2006/09/29/list-of-ansi-code-pages-used-by-windows.aspx
    CharsetToCodepage    PROCEDURE(SHORT pCharset)
      CODE
      CASE pCharset
      OF CHARSET:SHIFTJIS
        RETURN 932    ! Japanese
      OF CHARSET:HANGEUL
        RETURN 949    ! Korean (Hangeul is a set of precombined Korean characters; about 4,260 unique Hanja characters exist)
      OF CHARSET:JOHAB
        RETURN 1361   ! Korean (Johab is a set of combinations between Hangul characters totaling about 12,000)
                      ! WARNING: not always supported
      OF CHARSET:GB2312
        RETURN 936    ! simplified Chinese - used in mainland China and Singapore
      OF CHARSET:CHINESEBIG5
        RETURN 950    ! traditional Chinese - used in Hong Kong and Taiwan
      OF CHARSET:GREEK
        RETURN 1253   ! Greek
      OF CHARSET:TURKISH
        RETURN 1254   ! Turkish
      OF CHARSET:HEBREW
        RETURN 1255   ! Hebrew, also suitable for Yiddish
      OF CHARSET:ARABIC
        RETURN 1256   ! Arabic and languages using Arabic scripts - Urdu (Pakistan), Persian (Iran)
      OF CHARSET:BALTIC
        RETURN 1257   ! Estonian, Latvian, Lithuanian
      OF CHARSET:CYRILLIC
        RETURN 1251   ! languages using Cyrillic scripts: Azerbaijani (sometimes can be Latin), Belarusian, Bulgarian,
                      ! Macedonian, Kazakh, Kyrgyz, Mongolian, Russian, Serbian (Cyrillic), Ukrainian,
                      ! Uzbek (sometimes can be Latin)
      OF CHARSET:THAI
        RETURN 874    ! Thai
      OF CHARSET:EASTEUROPE
        RETURN 1250   ! Central / Eastern European languages: Albanian, Croatian, Czech, Hungarian, Polish, Romanian,
                      ! Serbian (Latin), Slovak, Slovenian
      OF CHARSET:DEFAULT
        RETURN CP_ACP ! system default charset / codepage
      END
      RETURN 1252     ! West European
    !=================================================================================

The API function is prototyped like this:

    MultiByteToWideChar(long CodePage, long dwFlags, long lpMultiByteStr, long cbMultiByte, |
                        long lpWideCharStr, long cchWideCharStr),long,pascal
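To make the difference between the plain BSTRING assignment and a code-page-aware conversion concrete, here is a small portable C sketch (C rather than Clarion, so it runs without the Windows API). naive_widen reproduces what the assignment does: each byte becomes one 2-byte character with a zero high byte. cp1251_to_utf16 is a hypothetical, deliberately partial stand-in for MultiByteToWideChar with code page 1251 (Cyrillic). It only knows ASCII and the Cyrillic letters, and it imitates the two-call pattern ANSIToUnicode relies on, where an output size of 0 asks for the required length. The real API covers the whole code page; this sketch simply returns 0 for any byte outside its small table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What the plain 1-byte -> BSTRING assignment does: every byte becomes one
   2-byte character whose high byte is zero, followed by a 0x0000 terminator.
   dst must have room for n + 1 code units. */
static void naive_widen(const unsigned char *src, size_t n, uint16_t *dst)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    dst[n] = 0;
}

/* Hypothetical, partial stand-in for
   MultiByteToWideChar(1251, 0, src, src_len, dst, dst_cap).
   Only ASCII and the Cyrillic letters of code page 1251 are handled.
   dst_cap == 0 means "just report the required length" (dst is ignored);
   otherwise the result is written to dst. Returns the number of UTF-16
   units, or 0 on failure (bad length, buffer too small, unknown byte). */
static int cp1251_to_utf16(const unsigned char *src, int src_len,
                           uint16_t *dst, int dst_cap)
{
    int measuring = (dst_cap == 0);

    if (src == NULL || src_len <= 0)
        return 0;
    if (!measuring && (dst == NULL || dst_cap < src_len))
        return 0;                      /* single-byte code page: 1 unit per byte */

    for (int i = 0; i < src_len; i++) {
        unsigned char b = src[i];
        uint16_t u;

        if (b < 0x80)
            u = b;                                   /* ASCII range is unchanged */
        else if (b >= 0xC0)
            u = (uint16_t)(0x0410 + (b - 0xC0));     /* 0xC0..0xFF -> U+0410..U+044F */
        else if (b == 0xA8)
            u = 0x0401;                              /* Cyrillic capital letter IO */
        else if (b == 0xB8)
            u = 0x0451;                              /* Cyrillic small letter io */
        else
            return 0;                                /* not covered by this sketch */

        if (!measuring)
            dst[i] = u;
    }
    return src_len;
}
```

With the code page 1251 bytes for the Russian word "При" (0xCF 0xF0 0xE8), naive_widen produces U+00CF U+00F0 U+00E8, which displays as "Ïðè", while cp1251_to_utf16 produces U+041F U+0440 U+0438, the intended Cyrillic letters.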
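ANSIToUnicode never allocates the wide-character buffer itself. It first resizes l:RV with ALL(), then uses PEEK to read the data pointer out of the variable, and only then lets MultiByteToWideChar write through that pointer. The C sketch below models that sequence under stated assumptions. It lays out a block the way an OLE BSTR is laid out (a 4-byte byte count, the UTF-16 data, then a 2-byte NUL that the count does not include) and returns a pointer to the first character. It is not how the Clarion runtime is implemented, and the helper names (bstr_alloc_spaces, bstr_byte_len, bstr_free, fill_bstring) are hypothetical. It also uses the native pointer size, whereas a 32-bit Clarion program stores a 4-byte pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a BSTR allocation:
     [uint32 byte count][UTF-16 code units][0x0000]
   Returns a pointer to the first code unit, so the length prefix sits
   4 bytes before it. Every unit is set to a space (0x0020), mirroring
   l:RV = ALL(' ', l:LenW). */
static uint16_t *bstr_alloc_spaces(size_t nchars)
{
    uint32_t nbytes = (uint32_t)(nchars * sizeof(uint16_t));
    unsigned char *block = malloc(sizeof nbytes + nbytes + sizeof(uint16_t));
    if (block == NULL)
        return NULL;
    memcpy(block, &nbytes, sizeof nbytes);              /* length prefix (bytes) */
    uint16_t *data = (uint16_t *)(block + sizeof nbytes);
    for (size_t i = 0; i < nchars; i++)
        data[i] = 0x0020;                               /* space */
    data[nchars] = 0;                                   /* wide NUL, not counted */
    return data;
}

/* Reads the byte-count prefix stored just before the character data. */
static uint32_t bstr_byte_len(const uint16_t *data)
{
    uint32_t nbytes;
    assert(data != NULL);
    memcpy(&nbytes, (const unsigned char *)data - sizeof nbytes, sizeof nbytes);
    return nbytes;
}

/* Frees the whole block, starting from the length prefix. */
static void bstr_free(uint16_t *data)
{
    if (data != NULL)
        free((unsigned char *)data - sizeof(uint32_t));
}

/* Mirrors the resize / PEEK / fill sequence of ANSIToUnicode:
   *var plays the role of the BSTRING variable (a pointer slot), and
   var itself is what ADDRESS(l:RV) would return. */
static int fill_bstring(uint16_t **var, const uint16_t *units, size_t n)
{
    uint16_t *chars;

    *var = bstr_alloc_spaces(n);               /* resize first */
    if (*var == NULL)
        return 0;
    memcpy(&chars, var, sizeof chars);         /* like PEEK(ADDRESS(l:RV), l:RVAddress) */
    memcpy(chars, units, n * sizeof *units);   /* like the second MultiByteToWideChar call */
    return 1;
}
```

After fill_bstring(&rv, units, n) succeeds, rv points at n characters with a byte-count prefix of 2 * n and a trailing 0x0000; release it with bstr_free. Copying the pointer out of the slot with memcpy is the C counterpart of PEEK; a plain `chars = *var` would do the same.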

