The infamous Turkish locale bug

I discovered a quirky comment today in Confluence’s Permission.forName(String) method:

// use the english locale to avoid the infamous turkish locale bug
String upperName = permissionName.toUpperCase(Locale.ENGLISH);

Naturally the question popped into my mind: what is the ‘infamous Turkish locale bug’? Looking into the JIRA issues related to the commit (CONF-5931, CONF-7168), I found a link that Agnes had added to an article about a common Java bug in the Turkish locale: Turkish Java Needs Special Brewing.

In the Turkish alphabet there are two letters for ‘i’, dotless and dotted. The problem is that each keeps its form when the case changes: the dotted lowercase ‘i’ becomes a dotted uppercase ‘İ’, and the dotless lowercase ‘ı’ becomes the dotless uppercase ‘I’. At first glance this wouldn’t appear to be a problem; however, the problem lies in what programmers do with upper- and lowercase in their code.

The two lowercase letters are \u0069 ‘i’ and \u0131 ‘ı’ (dotless ‘i’) and are totally unrelated. Their uppercase versions are, respectively, \u0130 ‘İ’ (capital letter ‘I’ with a dot above it) and \u0049 ‘I’. English behaves differently: its single lowercase dotted ‘i’ becomes an uppercase dotless ‘I’.
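A minimal sketch of these mappings follows. The class name and the hex helper are made up for illustration, and the Turkish locale is built from the language tag "tr". Results are printed as code points so that console encoding does not get in the way.

```java
import java.util.Locale;
import java.util.stream.Collectors;

class TurkishCaseMapping {
    // Turkish locale, built from its language tag.
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    // Illustrative helper: renders each code point of s as U+XXXX.
    static String hex(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        for (Locale locale : new Locale[] {Locale.ENGLISH, TURKISH}) {
            System.out.println(locale.toLanguageTag()
                    + ": upper(i) = " + hex("i".toUpperCase(locale))
                    + ", lower(I) = " + hex("I".toLowerCase(locale)));
        }
    }
}
```

With the English locale the two results are U+0049 and U+0069; with the Turkish locale they are U+0130 and U+0131.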

With String.toUpperCase(), most Java programmers are trying to neutralize case. Consider a HashMap with string keys and a key that you want to look up. If you want to ignore case, you’ll probably uppercase everything going into the map, and also the string you’re doing the lookup with. The no-argument toUpperCase() uses the JVM’s default locale, so this works fine for English but not for Turkish, where the dotted ‘i’ becomes the dotted ‘İ’ and no longer matches a key written with a plain ‘I’.
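Here is a rough sketch of how such a lookup goes wrong. The registry, its keys and its values are hypothetical and are not taken from Confluence. The locale is passed in explicitly so the example doesn’t have to change the JVM default; calling toUpperCase() with no argument on a Turkish-locale JVM behaves like the Turkish case here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class CaseInsensitiveLookup {
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    // Hypothetical registry whose keys are plain ASCII uppercase names.
    static final Map<String, String> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("VIEW", "View permission");
        REGISTRY.put("EDIT", "Edit permission");
    }

    // Uppercases the name with the given locale before looking it up.
    static String lookup(String name, Locale locale) {
        return REGISTRY.get(name.toUpperCase(locale));
    }

    public static void main(String[] args) {
        // Found: "edit" becomes "EDIT".
        System.out.println(lookup("edit", Locale.ENGLISH));
        // Prints null: the 'i' becomes U+0130, so the key does not match.
        System.out.println(lookup("edit", TURKISH));
    }
}
```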

This is a nice example of where you need to be very careful how you handle upper- and lowercasing in your application. Changing the word ‘quit’ to uppercase in the Turkish locale will result in ‘QUİT’, not ‘QUIT’. The German ß (sharp ‘s’) doesn’t behave as English speakers might expect either: String.toUpperCase() turns it into the two letters ‘SS’, so the string gets longer.
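Both surprises can be reproduced in a few lines. This is only a sketch: the class and method names are invented for illustration, and the ß is written as the escape \u00DF to avoid source-encoding problems.

```java
import java.util.Locale;

class CaseSurprises {
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    // "quit" uppercased under Turkish rules contains U+0130 instead of 'I'.
    static String turkishQuit() {
        return "quit".toUpperCase(TURKISH);
    }

    // The sharp s (U+00DF) expands to "SS" when a String is uppercased.
    static String upperStrasse() {
        return "Stra\u00DFe".toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(turkishQuit().equals("QUIT"));  // false
        System.out.println(upperStrasse());                // STRASSE
        System.out.println(upperStrasse().length());       // 7, one more than the input
    }
}
```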

There are two ways to properly perform a case-insensitive comparison of Strings in Java in any locale:

  • (preferred) use String.equalsIgnoreCase()
  • use a fixed locale (like Locale.ENGLISH) as an argument to String.toUpperCase(Locale) or String.toLowerCase(Locale).
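As a small sketch of the two options above (the class and method names are made up for illustration):

```java
import java.util.Locale;

class LocaleSafeComparison {
    // Option 1: compare directly, without consulting the default locale.
    static boolean sameName(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    // Option 2: normalize with a fixed locale, e.g. before using the
    // result as a map key.
    static String normalize(String name) {
        return name.toUpperCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        System.out.println(sameName("quit", "QUIT"));  // true
        System.out.println(normalize("quit"));         // QUIT
    }
}
```

Both give the same answer whatever the JVM’s default locale is. Locale.ROOT would be another reasonable fixed locale for the second option.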

You can also use Character.toLowerCase() or Character.toUpperCase() to derive a locale-independent case-insensitive String value. This was the solution used in a recent (and still unreleased) fix for the same problem in the Commons Collections CaseInsensitiveMap.
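A sketch of that per-character approach might look like the following. The class and method names are hypothetical, and this is not presented as the actual Commons Collections code. It works on individual char values, so it won’t fold characters outside the Basic Multilingual Plane, and it never changes the string’s length.

```java
class CharCaseFolding {
    // Folds case one char at a time, with no locale involved.
    // Uppercasing first and then lowercasing also folds characters
    // whose case mappings are not simple one-to-one pairs.
    static String foldCase(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = Character.toLowerCase(Character.toUpperCase(chars[i]));
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(foldCase("QUIT"));                          // quit
        System.out.println(foldCase("Quit").equals(foldCase("qUIT"))); // true
    }
}
```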

About Matt

I’m a technology nerd, husband and father of four, living in beautiful Sydney, Australia.

My passion is building software products that make the world a better place. For the last 15 years, I’ve led product teams at Atlassian to create collaboration tools.

I'm also a startup advisor and investor, with an interest in advancing the Australian space industry. You can read more about my work on my LinkedIn profile.