Thursday, January 10, 2008

Random thoughts on Unicode

Recently, CodeGear engineers started talking about Unicode support [1][2][3][4][5] in the next major release of Delphi (codenamed Tiburon [6]). I am excited to hear about it, although it should have been done ten years ago.

Is Unicode support an NP problem?

You might ask the same question. In fact, a set of wide-string types was introduced many years ago, yet the RTL, VCL and IDE have stayed at the Ansi stage. Why no update? IMO, 50% is down to technical issues and 50% to market focus and operational issues. On the one hand, a smooth migration is a big challenge that must be tested thoroughly. On the other hand, their human resources may be limited, so Unicode support has been postponed several times.

What about the days without Unicode support?

Fortunately, some nice guys created some nice components for us in the most difficult days. TntControls [7], a collection of basic Unicode-enabled RTL and VCL replacements, is one of them. The concept behind TntControls is to override all Ansi versions of string properties and routines with wide versions. On Win9x platforms, every wide value is cast to its Ansi counterpart at runtime, so that everything works on all Windows platforms. But you have to create wide versions of components and functions one by one. This is time-consuming, boring and sometimes a little difficult. Furthermore, WideString is not reference counted, so it performs /much/ slower [8] than AnsiString. Unfortunately, there are few other options nowadays...
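
To make the Win9x fallback concrete, here is a minimal sketch of a wide-to-Ansi cast. It is not TntControls code; the program name and variable names are made up for illustration. The cast goes through the system's Ansi code page, so the result depends on the machine: text that the code page cannot represent will not survive the round trip.

```delphi
program WideToAnsiSketch; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

var
  W, RoundTrip: WideString;
  A: AnsiString;
begin
  // Two CJK characters written as UTF-16 code units (U+4E2D, U+6587).
  W := #$4E2D#$6587;

  // Narrowing conversion through the system Ansi code page.
  A := AnsiString(W);

  // Widening again only restores what the code page could represent.
  RoundTrip := WideString(A);

  Writeln('Wide length: ', Length(W));
  Writeln('Ansi length: ', Length(A));
  if RoundTrip = W then
    Writeln('Round trip preserved the text')
  else
    Writeln('Round trip lost characters');
end.
```

On a system whose Ansi code page covers these characters (a Chinese Windows, for example) the round trip succeeds; elsewhere the lost characters typically show up as '?'.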

What will be done in the next Delphi?

First, a new reference-counted string type, UnicodeString, will be introduced. Second, all type aliases and function aliases will be switched from their Ansi versions to their wide versions. As a result, most existing projects can be upgraded to Unicode without difficulty or performance loss. It sounds very simple, doesn't it? Thanks in advance to those geniuses.

But there are also two things I do not like:
  1. The new type name is inconsistent with the existing types. The counterpart of AnsiString is UnicodeString, but the counterpart of AnsiChar is WideChar. I suggest deprecating WideChar as well and introducing a new type, UnicodeChar.
  2. Type aliases and function aliases are not switchable, which means you have to make sure that UnicodeString will NOT break your code. Unfortunately, if your project is not test driven, that is very hard to say...
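
One way the alias switch can break existing code is the byte-count assumption discussed in [2] and [3]. The sketch below is only an illustration with made-up names, not code from any real project. It assumes nothing beyond Char and string being aliases: compiled with Char = AnsiChar both variants of the Move call behave the same, while with Char = WideChar the commented-out one copies only half of the text.

```delphi
program CharSizeSketch; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

var
  S: string;
  Buf: array[0..15] of Char;
begin
  S := 'Delphi';

  // SizeOf(Buf) is the buffer size in bytes, whatever Char maps to.
  FillChar(Buf, SizeOf(Buf), 0);

  // Fragile: treats Length(S) as a byte count, which only holds while
  // SizeOf(Char) = 1. With a two-byte Char this copies half of the text.
  //   Move(S[1], Buf, Length(S));

  // Robust: scale the character count by the element size.
  Move(S[1], Buf, Length(S) * SizeOf(Char));

  Writeln('SizeOf(Char) = ', SizeOf(Char));
  Writeln('Characters   = ', Length(S));
  Writeln('Bytes copied = ', Length(S) * SizeOf(Char));
  Writeln('Buffer holds = ', PChar(@Buf[0]));
end.
```

Searching a project for Move, FillChar and similar byte-oriented calls that take Length(...) as the count is a cheap first check when there is no test suite to lean on.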

Conclusion

There are so many questions, discussions and requests about the new UnicodeString. No doubt CodeGear engineers will be quite busy this year. Full Unicode support is a big challenge for everyone. Are you ready? [9]

References

  1. Chris Bensen: Unicode
  2. Chris Bensen: Unicode: SizeOf is Different than Length Part II
  3. Chris Bensen: Unicode: SizeOf is Different than Length
  4. The Oracle at Delphi: DPL & Unicode - a toss up
  5. The Oracle at Delphi: More FAQs about Unicode in Tiburón
  6. Delphi and C++ Builder Roadmap
  7. TMSUnicode Components (formerly TntControls)
  8. Tobias Gurock: What’s wrong with Delphi’s WideString?
  9. The Chinese version of this article (on my blog @csdn)
 

1 comment:

topaz said...

Hello, I have a problem in the Delphi 2007 Update Pack 3 trial edition.
The character typed with "Alt+0161" is converted to "?" after I leave Label->Caption.
What is the reason?
Could you please help me? Here is a picture of the problem.
Thank you.

http://www.noavari.com/images/err.jpg