I had to quickly write a method to replace German umlauts in filenames before saving them to disk (a file upload in a JSF application).
Using the java.text.Normalizer class or the regex match \\p{InCombiningDiacriticalMarks}+ wasn’t much help, because that approach only strips the diacritic (ö becomes o), whereas I needed the conventional replacement that appends an e (ö as oe, etc.). Additionally, there is an issue with capitalization: ÜBUNG should become the all-caps UEBUNG, while Übung should become Uebung, and a single capital Ü followed by a space becomes Ue. An umlaut followed by a digit is treated as all-caps: “TÜ24” => “TUE24”.
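To illustrate why the Normalizer route falls short, here is a minimal sketch of that approach. The class name NormalizerDemo and the helper stripDiacritics are made up for this example; only java.text.Normalizer and the regex come from the text above.

```java
import java.text.Normalizer;

class NormalizerDemo {

    // Hypothetical helper: decomposes each character into base letter plus
    // combining mark (NFD), then removes the combining marks.
    static String stripDiacritics(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // The umlaut dots are dropped, no "e" is appended.
        System.out.println(stripDiacritics("Käse Köln Übung")); // prints "Kase Koln Ubung"
        // ß has no decomposition, so it passes through unchanged.
        System.out.println(stripDiacritics("Füße")); // prints "Fuße"
    }
}
```

The result is readable, but it is not the usual German transliteration, and ß is left in the filename.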
Building on an existing solution, I wrote the following function; the results are almost the same:
/**
 * Replaces all German umlauts (and ß) in the input string with the usual
 * replacement scheme, also taking capitalization into account.
 * A test string such as
 * "Käse Köln Füße Öl Übel Äü Üß ÄÖÜ Ä Ö Ü ÜBUNG" will yield the result
 * "Kaese Koeln Fuesse Oel Uebel Aeue Uess AEOEUe Ae Oe Ue UEBUNG".
 * Note: a capital umlaut at the very end of the input has no following
 * character and is therefore replaced in upper case ("Ü" alone yields "UE").
 *
 * @param input the string in which to replace umlauts
 * @return the input string with umlauts replaced
 */
private static String replaceUmlaut(String input) {
    // replace all lowercase umlauts and ß first (plain literals, no regex needed)
    String result = input
            .replace("ü", "ue")
            .replace("ö", "oe")
            .replace("ä", "ae")
            .replace("ß", "ss");
    // then replace capital umlauts in a non-capitalized context (e.g. Übung):
    // the lookahead requires a lowercase letter or a space to follow
    result = result
            .replaceAll("Ü(?=[a-zäöüß ])", "Ue")
            .replaceAll("Ö(?=[a-zäöüß ])", "Oe")
            .replaceAll("Ä(?=[a-zäöüß ])", "Ae");
    // now replace all the other capital umlaute (all-caps context, e.g. ÜBUNG, TÜ24)
    result = result
            .replace("Ü", "UE")
            .replace("Ö", "OE")
            .replace("Ä", "AE");
    return result;
}
“Käse Köln Füße Öl Übel Äü Üß ÄÖÜ Ä Ö Ü ÜBUNG” will become: “Kaese Koeln Fuesse Oel Uebel Aeue Uess AEOEUe Ae Oe Ue UEBUNG”
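The capitalization rules can also be checked on a few short edge cases. The following is only a sketch: UmlautCheck is a hypothetical class name, and its replaceUmlaut is a condensed copy of the function above (same three passes, same lookahead), made non-private so it can be called directly.

```java
class UmlautCheck {

    // Condensed copy of the function above: lowercase first, then capitals
    // followed by a lowercase letter or space, then all remaining capitals.
    static String replaceUmlaut(String input) {
        return input
                .replace("ü", "ue").replace("ö", "oe")
                .replace("ä", "ae").replace("ß", "ss")
                .replaceAll("Ü(?=[a-zäöüß ])", "Ue")
                .replaceAll("Ö(?=[a-zäöüß ])", "Oe")
                .replaceAll("Ä(?=[a-zäöüß ])", "Ae")
                .replace("Ü", "UE").replace("Ö", "OE").replace("Ä", "AE");
    }

    public static void main(String[] args) {
        System.out.println(replaceUmlaut("ÜBUNG")); // prints "UEBUNG"
        System.out.println(replaceUmlaut("Übung")); // prints "Uebung"
        System.out.println(replaceUmlaut("TÜ24"));  // prints "TUE24"
        // At the very end of the input nothing follows the umlaut, so the
        // lookahead fails and the upper-case replacement is used.
        System.out.println(replaceUmlaut("Ü"));     // prints "UE"
    }
}
```

If a trailing capital umlaut should become Ue instead, the lookahead would need an end-of-input alternative; whether that is desirable depends on the filenames you expect.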