Unicode CLDR integration into Squeak/Pharo

Mentor: Henrik Sperre Johansen
Second mentor: Paul DeBruicker
Level: Beginner
Invited students: Siddharth Bhatia, Nirbhai Singh
Students interested: Siddharth Bhatia, Nirbhai Singh, Gareth Cox (lightly)


The Common Locale Data Repository (CLDR, http://cldr.unicode.org/) is a maintained set of locale-specific information for use in programs. Creating a package that integrates it into Squeak and Pharo would help with internationalization (i18n) and localization (l10n) on both platforms. It should be possible to load a limited number of locales rather than the whole repository into every image. We would probably benefit from having a Fuel-based data repository somewhere that provides access to the updated and interpreted standards.

Technical details

We envision the project proceeding in this manner:

1. Create a limited object model for the part of CLDR Locales that provides the same capabilities as the current Locales (including a legacy protocol)

2. Implement parsing/reading of the CLDR data files.

3. System integration: import a CLDR Locale from the data files for the system-reported locale (en_GB, nb_NO) and install it as the current locale

4. Implement import/export of Locale objects using Fuel

5. Expand the object model to include a new capability of a CLDR Locale not present in the current C-inspired locales (string collation, etc.)

6. Provide useful system integration of said capability

7. Repeat steps 5 and 6 until time runs out
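
Steps 1 and 2 can be illustrated with a minimal, language-neutral sketch. It is written in Python rather than Smalltalk only so that it can be run anywhere; the class name CldrLocale, its fields, and the function parse_ldml are hypothetical and not part of any existing Squeak/Pharo package. The XML is a hand-written, heavily abbreviated fragment in the shape of a CLDR LDML file, not a real data file.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hand-written, abbreviated fragment shaped like a CLDR LDML file (assumption:
# real files are far larger and inherit most values from parent locales).
SAMPLE_LDML = """<ldml>
  <identity>
    <language type="en"/>
    <territory type="GB"/>
  </identity>
  <dates>
    <calendars>
      <calendar type="gregorian">
        <months>
          <monthContext type="format">
            <monthWidth type="wide">
              <month type="1">January</month>
              <month type="2">February</month>
            </monthWidth>
          </monthContext>
        </months>
      </calendar>
    </calendars>
  </dates>
  <numbers>
    <symbols numberSystem="latn">
      <decimal>.</decimal>
      <group>,</group>
    </symbols>
  </numbers>
</ldml>"""


@dataclass
class CldrLocale:
    """Hypothetical limited object model: only what a legacy locale offers."""
    language: str
    territory: str
    decimal_point: str
    group_separator: str
    month_names: dict = field(default_factory=dict)

    @property
    def locale_id(self):
        return f"{self.language}_{self.territory}"

    def format_integer(self, value):
        # Group digits by thousands, then swap in the locale's separator.
        return f"{value:,}".replace(",", self.group_separator)


def parse_ldml(xml_text):
    """Read the handful of fields the limited model needs from LDML text."""
    root = ET.fromstring(xml_text)
    month_path = (
        "./dates/calendars/calendar[@type='gregorian']/months/"
        "monthContext[@type='format']/monthWidth[@type='wide']/month"
    )
    months = {int(m.get("type")): m.text for m in root.findall(month_path)}
    return CldrLocale(
        language=root.find("./identity/language").get("type"),
        territory=root.find("./identity/territory").get("type"),
        decimal_point=root.findtext("./numbers/symbols/decimal"),
        group_separator=root.findtext("./numbers/symbols/group"),
        month_names=months,
    )


locale = parse_ldml(SAMPLE_LDML)
print(locale.locale_id, locale.month_names[1], locale.format_integer(1234567))
```

Keeping the model this small at first mirrors step 1: new capabilities such as collation would be added to the model only in steps 5 and 6, once the basic import path works.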

The CLDR is continually edited and updated, and is published as a specification and a set of XML files. Those files would need to be downloaded and parsed regularly to create a publicly available repository from which Smalltalkers could import only the locales they want in their images. A repository on GitHub that contains one Fuel file per locale would be an acceptable solution.
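
The one-file-per-locale layout can be sketched as follows. This is only an illustration of the layout: JSON stands in for Fuel serialization, a temporary directory stands in for the GitHub repository, and the function names and the locale values are placeholders chosen for the example.

```python
import json
import tempfile
from pathlib import Path


def export_locales(locales, repo_dir):
    """Write one file per locale id (JSON here, as a stand-in for Fuel)."""
    repo_dir = Path(repo_dir)
    repo_dir.mkdir(parents=True, exist_ok=True)
    for locale_id, data in locales.items():
        path = repo_dir / f"{locale_id}.json"
        path.write_text(json.dumps(data), encoding="utf-8")


def import_locales(repo_dir, wanted):
    """Load only the requested locale ids; other files are never read."""
    repo_dir = Path(repo_dir)
    return {
        locale_id: json.loads(
            (repo_dir / f"{locale_id}.json").read_text(encoding="utf-8")
        )
        for locale_id in wanted
    }


# Placeholder data for illustration, not taken from real CLDR files.
all_locales = {
    "en_GB": {"decimal": ".", "group": ","},
    "nb_NO": {"decimal": ",", "group": "\u00a0"},
}

with tempfile.TemporaryDirectory() as repo:
    export_locales(all_locales, repo)
    available = sorted(p.stem for p in Path(repo).glob("*.json"))
    loaded = import_locales(repo, ["nb_NO"])

print(available, list(loaded))
```

Because each locale lives in its own file, an image that only needs nb_NO fetches and materializes a single file, which is the property the proposal asks for.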

Benefits to the Student

Work with international standards and web services, and gain a wider knowledge of the effects of the actions of politicians and of history.

Benefits to the Community

Being able to properly interpret, compare, and format currencies, dates, and times inside their images.

Updated: 24.4.2013