[Building Sakai] UTF8 vs AL32UTF8

Beth Kirschner bkirschn at umich.edu
Fri May 28 08:20:42 PDT 2010


Hi Rémi,

I hadn't heard of AL32UTF8 before, but we both probably came upon the same
text via Google:

"Recently, one of our clients had a question on the differences  
between these two character sets since they were in the process of  
making their application global.  In an upcoming whitepaper, we will  
discuss in detail what it takes (from a RDBMS perspective) to address  
localization and globalization issues.  As far as these two character  
sets go in Oracle,  the only difference between AL32UTF8 and UTF8  
character sets is that AL32UTF8 stores characters beyond U+FFFF as  
four bytes (exactly as Unicode defines UTF-8). Oracle’s “UTF8” stores  
these characters as a sequence of two UTF-16 surrogate characters  
encoded using UTF-8 (or six bytes per character).  Besides this  
storage difference, another difference is better support for  
supplementary characters in AL32UTF8 character set."

My interpretation of the above statement is that the choice between
AL32UTF8 and UTF8 only affects how characters beyond U+FFFF are stored.
Oracle's UTF8 character set has correctly stored every language tried to
date, including Chinese, Arabic, and Japanese.
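
To make "characters beyond U+FFFF" concrete: these are the supplementary
characters outside the Basic Multilingual Plane (for example musical
symbols, emoji, and some rare CJK ideographs). Here is a rough Python
sketch of the two storage schemes as the quoted text describes them. It
does not call Oracle at all; the function names are made up for
illustration, and the six-byte form is my assumption of what "two UTF-16
surrogates encoded using UTF-8" looks like.

```python
def standard_utf8(text: str) -> bytes:
    """Standard UTF-8, which the quoted text says AL32UTF8 uses."""
    return text.encode("utf-8")


def surrogate_pair_utf8(text: str) -> bytes:
    """Hypothetical model of the storage the quoted text attributes to
    Oracle's "UTF8": characters beyond U+FFFF are split into two UTF-16
    surrogates, and each surrogate is encoded as a 3-byte UTF-8 sequence.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            offset = cp - 0x10000
            high = 0xD800 + (offset >> 10)    # high (lead) surrogate
            low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
            for unit in (high, low):
                # "surrogatepass" lets Python encode a lone surrogate
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            # Characters up to U+FFFF are stored identically in both schemes
            out += ch.encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    samples = ["\u00e9", "\u4e2d", "\U0001D11E"]  # e-acute, a CJK char, G clef
    for s in samples:
        print(f"U+{ord(s):04X}",
              len(standard_utf8(s)), "bytes vs",
              len(surrogate_pair_utf8(s)), "bytes")
```

In this model, French accents and a common Chinese character take the same
bytes either way; only the U+1D11E sample differs (four bytes vs six).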

- Beth

On May 28, 2010, at 10:37 AM, Remi Saias wrote:

> Hello,
>
> I was wondering what the impact is of using an Oracle database encoded
> with UTF8 instead of AL32UTF8, which is what Confluence recommends.
>
> I read that the difference only shows up in characters beyond U+FFFF  
> but I have no idea what that means practically!
>
> We have been running with this encoding for a couple of months and
> have seen no problems with French or Spanish characters. We don't have
> specific plans for using Chinese or Arabic characters yet, but I
> wonder whether that would be problematic with Oracle's UTF8, which
> apparently is not true UTF-8.
>
> Thanks for any information!
> -- 
> Rémi Saïas
> Analyste en informatique
> Gestion des technologies de l'information - HEC Montréal
> Projet Sakai-OpenSyllabus: 514.340.6776 - Decelles 4.070 (UdeM B-2249)


