I had to quickly write a method to replace German umlauts in filenames before saving them to disk (file upload in a JSF application).
Using the java.text.Normalizer class or the regex match \\p{InCombiningDiacriticalMarks}+ didn't help much, because that approach only strips the diacritic (ö becomes o), while I needed the usual transliteration with an appended e (ö as oe etc.). Additionally, there is an issue with capitalization. ÜBUNG should become the all-caps UEBUNG, while Übung should become Uebung, and a standalone capital Ü (followed by a space) should become Ue. An umlaut followed by a digit is treated as all caps: “TÜ24” => “TUE24”.
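To illustrate why the Normalizer route falls short, here is a minimal sketch. The names `NormalizerDemo` and `stripDiacritics` are made up for this example. It decomposes the string (NFD) and removes the combining marks, which drops the dots instead of appending an e, and it leaves ß untouched. The umlauts are written as Unicode escapes so the snippet compiles regardless of the source file encoding.

```java
import java.text.Normalizer;

class NormalizerDemo {

    // Hypothetical helper: decompose accented characters (NFD), then
    // remove every character from the Combining Diacritical Marks block.
    static String stripDiacritics(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // Input is "Uebung Koeln" spelled with U-umlaut and o-umlaut.
        System.out.println(stripDiacritics("\u00DCbung K\u00F6ln")); // prints "Ubung Koln"
    }
}
```

The result is "Ubung Koln" rather than "Uebung Koeln", so this technique alone cannot produce the German replacement scheme.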
Based on this solution, I wrote the following function; the results are almost the same:
/**
 * Replaces all German umlauts in the input string with the usual replacement
 * scheme, also taking capitalization into account.
 * A test string such as
 * "Käse Köln Füße Öl Übel Äü Üß ÄÖÜ Ä Ö Ü ÜBUNG" will yield the result
 * "Kaese Koeln Fuesse Oel Uebel Aeue Uess AEOEUe Ae Oe Ue UEBUNG"
 * @param input the string that may contain umlauts
 * @return the input string with the umlauts replaced
 */
private static String replaceUmlaut(String input) {

    // replace all lowercase umlauts and ß
    String o_strResult = input
            .replaceAll("ü", "ue")
            .replaceAll("ö", "oe")
            .replaceAll("ä", "ae")
            .replaceAll("ß", "ss");

    // first replace all capital umlauts in a non-capitalized context (e.g. Übung),
    // i.e. when followed by a lowercase letter or a space; a capital umlaut at the
    // very end of the input has no following character and falls through to the
    // all-caps rule below
    o_strResult = o_strResult
            .replaceAll("Ü(?=[a-zäöüß ])", "Ue")
            .replaceAll("Ö(?=[a-zäöüß ])", "Oe")
            .replaceAll("Ä(?=[a-zäöüß ])", "Ae");

    // now replace all the other capital umlauts
    o_strResult = o_strResult
            .replaceAll("Ü", "UE")
            .replaceAll("Ö", "OE")
            .replaceAll("Ä", "AE");

    return o_strResult;
}
“Käse Köln Füße Öl Übel Äü Üß ÄÖÜ Ä Ö Ü ÜBUNG” will become: “Kaese Koeln Fuesse Oel Uebel Aeue Uess AEOEUe Ae Oe Ue UEBUNG”
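The function above only yields Ue for a lone capital Ü when a space follows it. At the very end of the input there is nothing for the lookahead to match, so "Ü" on its own turns into "UE". The following is a hedged sketch of one possible variant, not a definitive fix: it adds `$` (end of input) to the lookahead. The class name `UmlautDemo` is made up for this example, and the umlauts are again written as Unicode escapes to avoid encoding problems when compiling.

```java
class UmlautDemo {

    // Variant of replaceUmlaut: same scheme, but a capital umlaut at the
    // end of the input is also treated as a non-capitalized context.
    static String replaceUmlaut(String input) {
        // Lowercase umlauts and sharp s first (plain literal replacement).
        String result = input
                .replace("\u00FC", "ue")
                .replace("\u00F6", "oe")
                .replace("\u00E4", "ae")
                .replace("\u00DF", "ss");

        // Capital umlaut followed by a lowercase letter, a space, or end of input.
        // Lowercase umlauts are already gone here, so [a-z ] is sufficient.
        result = result
                .replaceAll("\u00DC(?=[a-z ]|$)", "Ue")
                .replaceAll("\u00D6(?=[a-z ]|$)", "Oe")
                .replaceAll("\u00C4(?=[a-z ]|$)", "Ae");

        // Every remaining capital umlaut sits in an all-caps context.
        return result
                .replace("\u00DC", "UE")
                .replace("\u00D6", "OE")
                .replace("\u00C4", "AE");
    }

    public static void main(String[] args) {
        System.out.println(replaceUmlaut("\u00DC"));      // prints "Ue"
        System.out.println(replaceUmlaut("T\u00DC24"));   // prints "TUE24"
        System.out.println(replaceUmlaut("\u00DCBUNG"));  // prints "UEBUNG"
    }
}
```

Like the original, this variant still produces mixed case for an all-caps word that ends in an umlaut (MENÜ becomes MENUe), so it is only a starting point if such names matter.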