Windows API: Returning Unicode data in C6 using BSTRINGs
2007-04-12 -- Vadim Berman

While the new C7 has full Unicode support, it is also possible to enable it
in prior versions using a bit of Windows API and direct memory access.
This is not a Unicode-enabled GUI (that is also possible, but the window or
control has to be rebuilt for it); it simply returns BSTRING data
in Unicode. This is particularly relevant when
Clarion-created COM objects are used in ASP pages or by other Unicode
applications.
A little background on BSTRING and Unicode conversions. SoftVelocity has
courteously made the conversion of 1-byte strings (STRINGs, CSTRINGs, PSTRINGs)
to BSTRING seamless. Take a simple assignment such as this:
my1ByteStr  CSTRING(31)
myBStr      BSTRING
...
  CODE
...
  my1ByteStr = 'This is a test'
  myBStr = my1ByteStr
This assignment allocates enough space for the new string (length * 2
+ 2 bytes), copies the entire string byte by byte, zeroes the other byte
of each character, and adds 2 NULLs to the end of the new BSTRING (unlike in regular
zero-terminated strings, two NULLs are required to terminate it, because
some 2-byte characters contain 0 in either the high or the low byte). You have to
appreciate what SoftVelocity has done, because it usually takes about 4-6
more error-prone lines in C/C++.
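To make the widening concrete, here is a minimal Python sketch of what the assignment is described as doing. It is an illustration only, not SoftVelocity's implementation; the function name `widen_bytes` is hypothetical, and little-endian byte order (low byte first, as on Windows) is assumed.

```python
def widen_bytes(src: bytes) -> bytes:
    """Widen each 1-byte character to 2 bytes and append a 2-byte terminator.

    Hypothetical helper for illustration; assumes little-endian layout.
    """
    out = bytearray()
    for b in src:
        out.append(b)   # low byte: the original character value
        out.append(0)   # high byte: zero-filled, no code page lookup
    out += b"\x00\x00"  # two NULLs terminate the wide string
    return bytes(out)


wide = widen_bytes(b"This is a test")
print(len(wide))  # length * 2 + 2
```

For the 14-character sample string this yields 14 * 2 + 2 = 30 bytes, matching the size formula above.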
What it doesn't do, however, is convert your characters to Unicode. The
bytes are copied as they are. As a result, the byte values of national characters
(Greek, Cyrillic, Arabic, etc.) end up on the code points of unrelated characters,
and the new BSTRING will look funny.
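The effect is easy to reproduce outside Clarion. The sketch below uses Python's built-in codecs as a stand-in: decoding as Latin-1 maps every byte to the code point with the same number, which is assumed here to be equivalent to the plain widening described above, while decoding as `cp1251` performs a real code page conversion. The sample word is arbitrary.

```python
# Bytes of a Cyrillic word as they would sit in a 1-byte string (code page 1251).
raw = "Привет".encode("cp1251")

# Plain widening: byte value N becomes code point N (Latin-1 behaves this way).
widened = raw.decode("latin-1")

# Code-page-aware conversion: each byte is looked up in the 1251 table.
converted = raw.decode("cp1251")

# Print code points in hex so the difference is visible on any console.
print([hex(ord(c)) for c in widened])    # values in the 0xc0..0xff range
print([hex(ord(c)) for c in converted])  # values in the Cyrillic block, 0x4xx
```

The first list contains accented Latin letters (0xcf, 0xf0, ...), the second the intended Cyrillic letters (0x41f, 0x440, ...).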
Microsoft has created a function called MultiByteToWideChar that does
roughly the same as that assignment, plus conversion from the specified
code page to Unicode. (I haven't tested it with Asian languages, but I think
it should also work.) The only downside is that it can't handle Clarion
BSTRINGs directly. This is because it writes to a plain array of 2-byte characters rather
than to a classic BSTRING. A classic BSTRING, on the other hand, is:
1. A 4-byte pointer to the actual set of 2-byte characters. ADDRESS() on a
BSTRING variable returns the address of this pointer.
2. A 4-byte segment holding the length of the string, immediately preceding
the characters that the pointer refers to.
3. The actual set of 2-byte characters.
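The layout of the memory block behind the pointer can be sketched in Python with `struct`. This assumes the standard Windows BSTR convention, in which the 4-byte prefix holds the length in bytes and excludes the terminator; whether the Clarion runtime follows it in every detail is an assumption. The helper names are hypothetical.

```python
import struct

DATA_OFFSET = 4  # the BSTR pointer refers to this offset, just past the prefix


def build_bstr_block(text: str) -> bytes:
    """Lay out length prefix + UTF-16LE characters + 2-byte terminator."""
    chars = text.encode("utf-16-le")
    prefix = struct.pack("<I", len(chars))  # byte count, terminator excluded
    return prefix + chars + b"\x00\x00"


def read_bstr_block(block: bytes) -> str:
    """Read the string back using the length prefix, not the terminator."""
    (byte_len,) = struct.unpack_from("<I", block, 0)
    return block[DATA_OFFSET:DATA_OFFSET + byte_len].decode("utf-16-le")


block = build_bstr_block("Hi")
print(block.hex())             # prefix, then the characters, then two NULLs
print(read_bstr_block(block))
```

Because the length is stored explicitly, a reader does not have to scan for the terminator, which is why the character data can be overwritten in place as long as the length stays the same.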
Therefore, in order to use MultiByteToWideChar, we need to:
1. Allocate space for the newly created array.
2. Pass the pointer to the array, which we'll read from the 4-byte pointer.
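The two steps above follow the usual "ask for the size, then fill the buffer" pattern. The Python sketch below imitates it with a hypothetical stand-in, `fake_multibyte_to_widechar`; it is not the real API and takes a Python codec name instead of a numeric code page. Only the calling sequence is the point.

```python
def fake_multibyte_to_widechar(codepage, src, dest, dest_chars):
    """Hypothetical stand-in for MultiByteToWideChar (illustration only).

    Returns the number of wide characters; with dest_chars == 0 it only
    reports the required size and writes nothing.
    """
    wide = src.decode(codepage).encode("utf-16-le")
    needed = len(wide) // 2
    if dest_chars == 0:
        return needed            # sizing call
    if dest_chars < needed:
        return 0                 # buffer too small: signal failure
    dest[:len(wide)] = wide      # write into the caller's buffer
    return needed


def ansi_to_wide(src: bytes, codepage: str) -> bytearray:
    # Step 1: find out how many wide characters are needed and allocate.
    needed = fake_multibyte_to_widechar(codepage, src, None, 0)
    buf = bytearray(needed * 2 + 2)  # characters plus a 2-byte terminator
    # Step 2: pass the buffer so the converted characters land in it.
    fake_multibyte_to_widechar(codepage, src, buf, needed)
    return buf


greek = "αβ".encode("cp1253")
print(ansi_to_wide(greek, "cp1253").hex())
```

In the Clarion version the allocation is done by resizing the BSTRING, and the buffer address is read from the BSTRING's 4-byte pointer.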
Another minor problem is that the Clarion runtime does not hold information
about code pages; we only have PROP:Charset. But this is easy to solve by
either using TranslateCharsetInfo (which is a bit unreliable) or creating a
homebrew conversion procedure (not a big effort, so this is what I did). The
result is listed below:
!=============================================
ANSIToUnicode        PROCEDURE(STRING pSrc,SHORT pCharset)
l:LenA                 SIGNED
l:LenW                 SIGNED
l:CodePage             UNSIGNED(CP_ACP)
l:RVAddress            ULONG
l:RV                   BSTRING
  CODE
  IF pSrc = ''
    RETURN pSrc
  END
  l:CodePage = CharsetToCodepage(pCharset)
  l:LenA = LEN(CLIP(pSrc))
  ! first call, without an output buffer: returns the required length in wide characters
  l:LenW = MultiByteToWideChar(l:CodePage, 0, ADDRESS(pSrc), l:LenA, 0, 0)
  IF l:LenW > 0
    l:RV = ALL(' ',l:LenW)          ! resize the BSTRING, there's no other way
    PEEK(ADDRESS(l:RV),l:RVAddress) ! get the address of the actual wide char string
    MultiByteToWideChar(l:CodePage, 0, ADDRESS(pSrc), l:LenA, l:RVAddress, l:LenW)
  ELSE
    l:RV = pSrc
  END
  RETURN l:RV
!=============================================
! reference: http://www.treodesktop.com/codepages.htm
!            http://blogs.msdn.com/shawnste/archive/2006/09/29/list-of-ansi-code-pages-used-by-windows.aspx
CharsetToCodepage    PROCEDURE(SHORT pCharset)
  CODE
  CASE pCharset
  OF CHARSET:SHIFTJIS
    RETURN 932   ! Japanese
  OF CHARSET:HANGEUL
    RETURN 949   ! Korean (Hangeul is a set of precombined Korean characters;
                 ! about 4,260 unique Hanja characters exist)
  OF CHARSET:JOHAB
    RETURN 1361  ! Korean (Johab is a set of combinations between Hangul
                 ! characters totaling about 12,000)
                 ! WARNING: not always supported
  OF CHARSET:GB2312
    RETURN 936   ! simplified Chinese - used in mainland China and Singapore
  OF CHARSET:CHINESEBIG5
    RETURN 950   ! traditional Chinese - used in Hong Kong and Taiwan
  OF CHARSET:GREEK
    RETURN 1253  ! Greek
  OF CHARSET:TURKISH
    RETURN 1254  ! Turkish
  OF CHARSET:HEBREW
    RETURN 1255  ! Hebrew, also suitable for Yiddish
  OF CHARSET:ARABIC
    RETURN 1256  ! Arabic and languages using Arabic scripts - Urdu
                 ! (Pakistan), Persian (Iran)
  OF CHARSET:BALTIC
    RETURN 1257  ! Estonian, Latvian, Lithuanian
  OF CHARSET:CYRILLIC
    RETURN 1251  ! languages using Cyrillic scripts: Azerbaijani (sometimes
                 ! can be Latin), Belarusian, Bulgarian, Macedonian, Kazakh,
                 ! Kyrgyz, Mongolian, Russian, Serbian (Cyrillic), Ukrainian,
                 ! Uzbek (sometimes can be Latin)
  OF CHARSET:THAI
    RETURN 874   ! Thai
  OF CHARSET:EASTEUROPE
    RETURN 1250  ! Central / Eastern European languages: Albanian, Croatian,
                 ! Czech, Hungarian, Polish, Romanian, Serbian (Latin),
                 ! Slovak, Slovenian
  OF CHARSET:DEFAULT
    RETURN CP_ACP ! system default charset / codepage
  END
  RETURN 1252    ! West European
!=============================================
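The same mapping can also be kept as a lookup table, which is easier to extend than a CASE structure. The Python sketch below is an illustration of that design choice, not a replacement for the Clarion procedure. The numeric charset values are the Windows GDI charset constants, and it is an assumption that Clarion's CHARSET: equates carry the same values; the function name is hypothetical.

```python
CP_ACP = 0  # Windows constant: use the system default ANSI code page

# Windows GDI charset value -> ANSI code page (assumed to match CHARSET: equates).
CHARSET_TO_CODEPAGE = {
    1: CP_ACP,   # DEFAULT
    128: 932,    # SHIFTJIS, Japanese
    129: 949,    # HANGEUL, Korean
    130: 1361,   # JOHAB, Korean (not always supported)
    134: 936,    # GB2312, simplified Chinese
    136: 950,    # CHINESEBIG5, traditional Chinese
    161: 1253,   # GREEK
    162: 1254,   # TURKISH
    177: 1255,   # HEBREW
    178: 1256,   # ARABIC
    186: 1257,   # BALTIC
    204: 1251,   # CYRILLIC
    222: 874,    # THAI
    238: 1250,   # EASTEUROPE
}


def charset_to_codepage(charset: int) -> int:
    """Return the code page for a charset, falling back to 1252 (West European)."""
    return CHARSET_TO_CODEPAGE.get(charset, 1252)


print(charset_to_codepage(204))  # Cyrillic
print(charset_to_codepage(0))    # ANSI charset is not in the table: fallback
```

As in the Clarion procedure, any charset that is not listed falls through to 1252.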
The API function is prototyped like this:
MultiByteToWideChar(long CodePage, long dwFlags, long lpMultiByteStr, long cbMultiByte, |
                    long lpWideCharStr, long cchWideCharStr),long,pascal