Windows API: Returning Unicode data in C6 using BSTRINGs
2007-04-12 -- Vadim Berman
While the new C7 has full Unicode support, it is also possible to enable it
in prior versions using a bit of Windows API and direct memory access.
This is not a Unicode-enabled GUI (that is also possible, but the window or
control has to be rebuilt for it); it simply returns BSTRING data in
Unicode. That is particularly relevant when Clarion-created COM objects are
used in ASP pages or by other Unicode applications.
A little background on BSTRING and Unicode conversions. SoftVelocity has
courteously made 1-byte string (STRINGs, CSTRINGs, PSTRINGs) conversion to
BSTRING seamless, so simple assignment such as this:
my1ByteStr  CSTRING(31)
myBStr      BSTRING
...
  CODE
  ...
  my1ByteStr = 'This is a test'
  myBStr = my1ByteStr
This assignment allocates enough space for the new string (length * 2 + 2
bytes), copies the entire source string byte by byte, initializes the other
byte of each 2-byte character, and appends two NULLs to the end of the new
BSTRING (unlike regular zero-terminated strings, two NULLs are required to
terminate it, because some 2-byte characters contain 0 in either the high or
the low byte). You have to appreciate what SoftVelocity has done, because
the same thing usually takes about 4-6 more error-prone lines in C/C++.
What it doesn't do, however, is convert your characters to Unicode. The
assignment is a plain byte-for-byte copy. As a result, the codes of national
characters (Greek, Cyrillic, Arabic, etc.) land on unrelated Unicode
characters, and the new BSTRING looks garbled.
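The difference between the two conversions is easy to see outside Clarion. The following Python sketch is only an illustration, not the Clarion runtime's actual code: `naive_widen` and `codepage_widen` are hypothetical helper names, and it assumes that the plain assignment fills the extra byte of each character with zero, as the description above suggests.

```python
def naive_widen(raw: bytes) -> bytes:
    """Mimic the plain assignment: widen each byte to 2 bytes, add 2 NULLs."""
    out = bytearray()
    for b in raw:
        out += bytes((b, 0))      # low byte = source byte, high byte = 0 (assumed)
    out += b"\x00\x00"            # two-byte terminator
    return bytes(out)


def codepage_widen(raw: bytes, codepage: str) -> bytes:
    """Widen with a real code page conversion, then add the same terminator."""
    return raw.decode(codepage).encode("utf-16-le") + b"\x00\x00"


raw = "Привет".encode("cp1251")            # 6 bytes of Cyrillic ANSI text
naive = naive_widen(raw)
proper = codepage_widen(raw, "cp1251")

print(len(naive))                           # length * 2 + 2 = 14
print(naive[:-2].decode("utf-16-le"))       # Ïðèâåò  (Western accented letters)
print(proper[:-2].decode("utf-16-le"))      # Привет
```

Both results have the same size; only the character codes differ, which is why the unconverted string displays as unrelated accented letters.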
Microsoft provides a function called MultiByteToWideChar that does roughly
the same as that assignment, plus conversion from the specified code page to
Unicode. (I haven't tested it with Asian languages, but I think it should
also work.) The only downside is that it can't handle Clarion BSTRINGs
directly, because it writes to a plain buffer of 2-byte characters rather
than to a classic BSTRING. A classic BSTRING, on the other hand, consists of:
1. A 4-byte pointer to the actual set of 2-byte characters. When using
ADDRESS() on a BSTRING variable, this pointer is what it points to.
2. A 4-byte length prefix (the length of the string in bytes), stored
immediately before the characters the pointer refers to.
3. The actual set of 2-byte characters, followed by the terminating NULLs.
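To make the layout concrete, here is a small Python sketch that builds the raw memory image of such a string and reads it back the way a consumer would. It is a model only: `make_bstr_block` is a hypothetical helper, the "pointer" is represented by an offset into a bytes object, and no real COM allocation takes place.

```python
import struct


def make_bstr_block(text: str) -> bytes:
    """Build the memory image: 4-byte length prefix, characters, 2 NULLs."""
    chars = text.encode("utf-16-le")
    # The prefix holds the length in bytes, not counting the terminator.
    return struct.pack("<I", len(chars)) + chars + b"\x00\x00"


block = make_bstr_block("Test")

# The BSTR pointer refers to the first character, i.e. 4 bytes into the block.
data_offset = 4

# Reading the length means stepping 4 bytes back from where the pointer points.
byte_len, = struct.unpack_from("<I", block, data_offset - 4)
text = block[data_offset:data_offset + byte_len].decode("utf-16-le")

print(byte_len)    # 8 bytes for 4 characters
print(text)        # Test
```

This is why a wide-character API needs the address stored in the pointer (item 1) rather than the address of the BSTRING variable itself.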
Therefore, in order to use MultiByteToWideChar, we need to:
1. Allocate space for the newly created array.
2. Pass the pointer to the array, which we'll read from the 4-byte pointer.
Another minor problem is that the Clarion runtime does not hold information
about code pages; we only have PROP:Charset. This is easy to solve, either
by using TranslateCharsetInfo (which is a bit unreliable) or by writing a
homebrew conversion procedure (not a big effort, so this is what I did). The
result is listed below:
!=============================================
ANSIToUnicode        PROCEDURE(STRING pSrc,SHORT pCharset)
l:LenA      SIGNED
l:LenW      SIGNED
l:CodePage  UNSIGNED(CP_ACP)
l:RVAddress ULONG
l:RV        BSTRING
  CODE
  IF pSrc = ''
    RETURN pSrc
  END
  l:CodePage = CharsetToCodepage(pCharset)
  l:LenA = LEN(CLIP(pSrc))
  ! first call: no output buffer, returns the required length in wide characters
  l:LenW = MultiByteToWideChar(l:CodePage, 0, ADDRESS(pSrc), l:LenA, 0, 0)
  IF l:LenW > 0
    l:RV = ALL(' ',l:LenW)           ! resize the BSTRING, there's no other way
    PEEK(ADDRESS(l:RV),l:RVAddress)  ! get the address of the actual wide char string
    ! second call: convert straight into the BSTRING's character buffer
    MultiByteToWideChar(l:CodePage, 0, ADDRESS(pSrc), l:LenA, l:RVAddress, l:LenW)
  ELSE
    l:RV = pSrc
  END
  RETURN l:RV
!=============================================
! reference: http://www.treodesktop.com/codepages.htm
!            http://blogs.msdn.com/shawnste/archive/2006/09/29/list-of-ansi-code-pages-used-by-windows.aspx
CharsetToCodepage    PROCEDURE(SHORT pCharset)
  CODE
  CASE pCharset
  OF CHARSET:SHIFTJIS
    RETURN 932    ! Japanese
  OF CHARSET:HANGEUL
    RETURN 949    ! Korean (Hangeul is a set of precombined Korean characters;
                  ! about 4,260 unique Hanja characters exist)
  OF CHARSET:JOHAB
    RETURN 1361   ! Korean (Johab is a set of combinations between Hangul
                  ! characters totaling to about 12,000)
                  ! WARNING: not always supported
  OF CHARSET:GB2312
    RETURN 936    ! simplified Chinese - used in mainland China and Singapore
  OF CHARSET:CHINESEBIG5
    RETURN 950    ! traditional Chinese - used in Hong Kong and Taiwan
  OF CHARSET:GREEK
    RETURN 1253   ! Greek
  OF CHARSET:TURKISH
    RETURN 1254   ! Turkish
  OF CHARSET:HEBREW
    RETURN 1255   ! Hebrew, also suitable for Yiddish
  OF CHARSET:ARABIC
    RETURN 1256   ! Arabic and languages using Arabic scripts - Urdu
                  ! (Pakistan), Persian (Iran)
  OF CHARSET:BALTIC
    RETURN 1257   ! Estonian, Latvian, Lithuanian
  OF CHARSET:CYRILLIC
    RETURN 1251   ! languages using Cyrillic scripts: Azerbaijani (sometimes
                  ! can be Latin), Belarusian, Bulgarian, Macedonian, Kazakh,
                  ! Kyrgyz, Mongolian, Russian, Serbian (Cyrillic), Ukrainian,
                  ! Uzbek (sometimes can be Latin)
  OF CHARSET:THAI
    RETURN 874    ! Thai
  OF CHARSET:EASTEUROPE
    RETURN 1250   ! Central / Eastern European languages: Albanian, Croatian,
                  ! Czech, Hungarian, Polish, Romanian, Serbian (Latin),
                  ! Slovak, Slovenian
  OF CHARSET:DEFAULT
    RETURN CP_ACP ! system default charset / codepage
  END
  RETURN 1252     ! West European
!=============================================
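A quick way to sanity-check a charset-to-codepage table like the one above is to decode the same byte under several entries and see which character comes out. The Python sketch below does that under stated assumptions: the numeric keys are the Windows charset identifiers from wingdi.h (it is assumed, not verified here, that the Clarion CHARSET: equates carry the same values), `charset_to_codepage` and `decode_with_charset` are hypothetical names, and the DEFAULT / CP_ACP case is left out because it depends on the machine's settings.

```python
# Windows charset identifiers (wingdi.h) mapped to ANSI code pages,
# mirroring the CASE structure of the Clarion listing.
CHARSET_TO_CODEPAGE = {
    128: 932,    # SHIFTJIS
    129: 949,    # HANGEUL
    130: 1361,   # JOHAB
    134: 936,    # GB2312
    136: 950,    # CHINESEBIG5
    161: 1253,   # GREEK
    162: 1254,   # TURKISH
    177: 1255,   # HEBREW
    178: 1256,   # ARABIC
    186: 1257,   # BALTIC
    204: 1251,   # CYRILLIC
    222: 874,    # THAI
    238: 1250,   # EASTEUROPE
}


def charset_to_codepage(charset: int, default: int = 1252) -> int:
    """Return the code page for a charset, falling back to West European."""
    return CHARSET_TO_CODEPAGE.get(charset, default)


def decode_with_charset(raw: bytes, charset: int) -> str:
    """Decode ANSI bytes using the code page that belongs to the charset."""
    return raw.decode(f"cp{charset_to_codepage(charset)}")


# The same byte, 0xE0, is a different letter in each code page.
for charset in (0, 161, 177, 204):
    print(charset, charset_to_codepage(charset), decode_with_charset(b"\xe0", charset))
```

The loop prints a Latin, a Greek, a Hebrew and a Cyrillic letter for the same input byte, which is the ambiguity the code page argument of MultiByteToWideChar resolves.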
The API function is prototyped like this:
MultiByteToWideChar(long CodePage, long dwFlags, long lpMultiByteStr, long cbMultiByte, |
                    long lpWideCharStr, long cchWideCharStr),long,pascal