Returning Unicode data in C6 using BSTRINGs (Vadim Berman)


Windows API: Returning Unicode data in C6 using BSTRINGs
2007-04-12 -- Vadim Berman

While the new C7 has full Unicode support, it is also possible to enable it
in prior versions using a bit of Windows API and direct memory access. 

This is not about a Unicode-enabled GUI (that is also possible, but the
window or control has to be rebuilt for it). It is simply about returning
BSTRING data in Unicode, which is particularly relevant when
Clarion-created COM objects are used in ASP pages or by other Unicode
applications.

A little background on BSTRING and Unicode conversions. SoftVelocity has
courteously made the conversion of 1-byte strings (STRINGs, CSTRINGs,
PSTRINGs) to BSTRING seamless, so a simple assignment such as this works:

my1ByteStr            CSTRING(31)
myBStr                BSTRING
...
    CODE
...
    my1ByteStr = 'This is a test'
    myBStr = my1ByteStr

This assignment allocates enough space for the new string (length * 2 + 2
bytes), copies the entire string byte by byte, initializes the other byte
of each 2-byte character, and appends two NULLs to the end of the new
BSTRING (unlike regular zero-terminated strings, a wide string needs two
NULLs to terminate it, because some 2-byte characters contain 0 in either
the high or the low byte). You have to appreciate what SoftVelocity has
done, because the same thing usually takes about 4-6 more error-prone lines
in C/C++.

What it doesn't do, however, is convert your characters to Unicode. The
assignment is a straightforward byte copy. As a result, national characters
(Greek, Cyrillic, Arabic, etc.) land on the code points of unrelated
characters, and the new BSTRING displays as garbage.
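The difference between plain widening and a real code page conversion is easy to see at the byte level. The following is only an illustrative sketch in Python (not Clarion code); it assumes the source text is Cyrillic stored under code page 1251, and it models the byte-by-byte copy described above by decoding as Latin-1, which maps each byte value to the code point with the same number.

```python
# Cyrillic text as it would sit in a 1-byte string under code page 1251.
ansi_bytes = "Привет".encode("cp1251")

# Plain widening: every byte becomes a 2-byte character with a zero high byte.
# Decoding as Latin-1 has exactly that effect (byte value N -> code point N).
widened = ansi_bytes.decode("latin-1")

# Code-page-aware conversion: each byte is looked up in the 1251 table.
converted = ansi_bytes.decode("cp1251")

print(widened)    # Ïðèâåò
print(converted)  # Привет
```

The widened result is what a Unicode consumer such as an ASP page would show after the simple assignment; the converted result is what we actually want.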

The Windows API provides a function called MultiByteToWideChar that does
roughly the same as that assignment, plus conversion from the specified
code page to Unicode. (I haven't tested it with Asian languages, but I think
it should also work.) The only downside is that it can't handle Clarion
BSTRINGs directly, because it writes to a plain buffer of 2-byte characters
rather than to a classic BSTRING. A classic BSTRING, on the other hand,
consists of:

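1. A 4-byte pointer to the actual set of 2-byte characters. ADDRESS() on a
BSTRING variable returns the address of this pointer, not of the characters
themselves.

2. A 4-byte segment holding the length of the string in bytes, stored
immediately before the characters that the pointer refers to.

3. The actual set of 2-byte characters, followed by the two terminating
NULLs.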
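To make the layout concrete, here is a small Python sketch (an illustration only, not Clarion code) that builds the bytes of such an allocation by hand. The helper name make_bstr_block and the DATA_OFFSET constant are made up for this example; the layout it assumes is the standard one: a 4-byte little-endian byte count, then the UTF-16 characters, then two NULL bytes.

```python
import struct

def make_bstr_block(text: str) -> bytes:
    """Build the bytes of a BSTR allocation: length prefix, characters, terminator."""
    chars = text.encode("utf-16-le")          # 2 bytes per UTF-16 code unit
    prefix = struct.pack("<I", len(chars))    # 4-byte length in bytes, terminator excluded
    return prefix + chars + b"\x00\x00"       # two NULL bytes close the string

block = make_bstr_block("Test")
DATA_OFFSET = 4   # a BSTR pointer refers to this offset, just past the length prefix

print(len(block))                                 # 14 = 4 + 4*2 + 2
print(struct.unpack_from("<I", block, 0)[0])      # 8
print(block[DATA_OFFSET:-2].decode("utf-16-le"))  # Test
```

The pointer stored in the BSTRING variable corresponds to DATA_OFFSET here: it skips the length prefix and lands on the first character.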

Therefore, in order to use MultiByteToWideChar, we need to:

1. Allocate space for the newly created array of 2-byte characters.

2. Pass the address of that array, which we read from the BSTRING's 4-byte
pointer.
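The two steps follow the usual "ask for the size, then convert in place" pattern. The Python sketch below only models that pattern so it can be run anywhere: multi_byte_to_wide_char is a hypothetical stand-in written for this example, not the real API, and it takes a Python codec name instead of a numeric code page.

```python
def multi_byte_to_wide_char(codec, src, dst, cch_wide):
    """Hypothetical stand-in for the API: returns the wide-character count.

    With cch_wide == 0 nothing is written and only the required size is returned.
    """
    wide = src.decode(codec).encode("utf-16-le")
    needed = len(wide) // 2
    if cch_wide == 0:
        return needed
    if cch_wide < needed:
        return 0                      # buffer too small: report failure
    dst[:len(wide)] = wide            # write into the caller's buffer in place
    return needed

src = "Αθήνα".encode("cp1253")        # Greek text in a 1-byte code page

# Call 1: ask how many 2-byte characters the result needs.
len_w = multi_byte_to_wide_char("cp1253", src, None, 0)

# Allocate the target (in Clarion: resize the BSTRING), then take a writable
# view of its character area (in Clarion: the address read from the BSTRING).
buffer = bytearray(len_w * 2)
target = memoryview(buffer)

# Call 2: convert straight into that memory.
written = multi_byte_to_wide_char("cp1253", src, target, len_w)

print(len_w, written)                 # 5 5
print(buffer.decode("utf-16-le"))     # Αθήνα
```

The point of the first call is that the caller never has to guess the size of the wide string; the second call then fills memory the caller already owns.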



Another minor problem is that the Clarion runtime does not hold information
about code pages; we only have PROP:Charset. But this is easy to solve by
either using TranslateCharsetInfo (which is a bit unreliable) or creating a
homebrew conversion procedure (not a big effort, so this is what I did). The
result is listed below:



!=============================================
ANSIToUnicode PROCEDURE(STRING pSrc,SHORT pCharset)
l:LenA        SIGNED
l:LenW        SIGNED
l:CodePage    UNSIGNED(CP_ACP) ! CP_ACP = 0, the system default ANSI code page
l:RVAddress   ULONG
l:RV          BSTRING
  CODE
  IF pSrc = ''
    RETURN pSrc
  END
  l:CodePage = CharsetToCodepage(pCharset)
  l:LenA = LEN(CLIP(pSrc))
  l:LenW = MultiByteToWideChar(l:CodePage, 0, ADDRESS(pSrc), l:LenA, 0, 0)
  IF l:LenW > 0
    l:RV = ALL(' ',l:LenW)  ! resize the BSTRING, there's no other way
    PEEK(ADDRESS(l:RV),l:RVAddress) ! get the address of the actual wide char string
    MultiByteToWideChar(l:CodePage, 0, ADDRESS(pSrc), l:LenA, l:RVAddress, l:LenW)
  ELSE
    l:RV = pSrc
  END

  RETURN l:RV


!=============================================
! reference: http://www.treodesktop.com/codepages.htm
!            http://blogs.msdn.com/shawnste/archive/2006/09/29/list-of-ansi-code-pages-used-by-windows.aspx
CharsetToCodepage PROCEDURE(SHORT pCharset)
  CODE
  CASE pCharset
  OF CHARSET:SHIFTJIS
    RETURN 932  ! Japanese
  OF CHARSET:HANGEUL
    RETURN 949  ! Korean (Hangeul is a set of precombined Korean characters;
                ! about 4,260 unique Hanja characters exist)
  OF CHARSET:JOHAB
    RETURN 1361 ! Korean (Johab is a set of combinations between Hangul
                ! characters totaling to about 12,000)
                ! WARNING: not always supported
  OF CHARSET:GB2312
    RETURN 936  ! simplified Chinese - used in mainland China and Singapore
  OF CHARSET:CHINESEBIG5
    RETURN 950  ! traditional Chinese - used in Hong Kong and Taiwan
  OF CHARSET:GREEK
    RETURN 1253 ! Greek
  OF CHARSET:TURKISH
    RETURN 1254 ! Turkish
  OF CHARSET:HEBREW
    RETURN 1255 ! Hebrew, also suitable for Yiddish
  OF CHARSET:ARABIC
    RETURN 1256 ! Arabic and languages using Arabic scripts - Urdu
                ! (Pakistan), Persian (Iran)
  OF CHARSET:BALTIC
    RETURN 1257 ! Estonian, Latvian, Lithuanian
  OF CHARSET:CYRILLIC
    RETURN 1251 ! languages using Cyrillic scripts: Azerbaijanian (sometimes
                ! can be Latin), Belarussian, Bulgarian, Macedonian, Kazakh,
                ! Kyrgyz, Mongolian, Russian, Serbian (Cyrillic), Ukrainian,
                ! Uzbek (sometimes can be Latin)
  OF CHARSET:THAI
    RETURN 874  ! Thai
  OF CHARSET:EASTEUROPE
    RETURN 1250 ! Central / Eastern European languages: Albanian, Croatian,
                ! Czech, Hungarian, Polish, Romanian, Serbian (Latin),
                ! Slovak, Slovenian
  OF CHARSET:DEFAULT
    RETURN CP_ACP ! system default charset / codepage
  END
  RETURN 1252   ! West European

!=================================================================================
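The same lookup idea can be tried outside Clarion to check which code page a given charset ends up with. The Python sketch below is only an illustration: the numeric keys are the Windows SDK charset identifiers, which are assumed here to be the values behind Clarion's CHARSET: equates, and the codec names are Python's names for the corresponding code pages. The DEFAULT charset is left out because Python has no direct equivalent of CP_ACP; anything not listed falls back to 1252, as in the procedure above.

```python
# Windows charset identifiers (assumed to match Clarion's CHARSET: equates)
# mapped to the Python codec for the matching Windows code page.
CHARSET_TO_CODEC = {
    128: "cp932",   # SHIFTJIS
    129: "cp949",   # HANGEUL
    130: "johab",   # JOHAB (code page 1361)
    134: "cp936",   # GB2312
    136: "cp950",   # CHINESEBIG5
    161: "cp1253",  # GREEK
    162: "cp1254",  # TURKISH
    177: "cp1255",  # HEBREW
    178: "cp1256",  # ARABIC
    186: "cp1257",  # BALTIC
    204: "cp1251",  # CYRILLIC
    222: "cp874",   # THAI
    238: "cp1250",  # EASTEUROPE
}

def ansi_to_unicode(data: bytes, charset: int) -> str:
    """Decode 1-byte text using the codec for its charset; fall back to 1252."""
    return data.decode(CHARSET_TO_CODEC.get(charset, "cp1252"))

print(ansi_to_unicode(b"\xcf\xf0\xe8\xe2\xe5\xf2", 204))  # Привет
print(ansi_to_unicode(b"\xc1\xe8\xde\xed\xe1", 161))      # Αθήνα
print(ansi_to_unicode(b"caf\xe9", 0))                     # café
```

Feeding the same bytes with the wrong charset is a quick way to reproduce the garbled output that the plain BSTRING assignment gives.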



The API function is prototyped like this:

    MultiByteToWideChar(long CodePage, long dwFlags, long lpMultiByteStr, long cbMultiByte, |
                        long lpWideCharStr, long cchWideCharStr),long,pascal
