
Replacing German Umlauts

I had to quickly write a method to replace German umlauts in file names before saving the files to disk (a file upload in a JSF application).

Using the java.text.Normalizer class or the regex \\p{InCombiningDiacriticalMarks}+ wasn’t much help, because with them I couldn’t replace the umlauts in the correct form, i.e. by appending an e (ö as oe, etc.).

Additionally, there is the issue of capitalization. ÜBUNG should become the all-caps UEBUNG, while Übung should become Uebung, and a single capital Ü should become Ue. An umlaut followed by a digit is treated as all caps: “TÜ24” => “TUE24”.
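To illustrate why the Normalizer approach falls short, here is a minimal sketch (the class name NormalizerDemo is hypothetical, and the source file is assumed to be saved and compiled as UTF-8). It decomposes the text with Normalizer.Form.NFD and then deletes the combining marks with the regex mentioned above: the umlauts simply lose their dots instead of gaining an e, and ß, which has no canonical decomposition, is left untouched.

```java
import java.text.Normalizer;

public class NormalizerDemo {

    // Decompose each character into base letter + combining marks (NFD),
    // then delete the combining marks: ä becomes a, not ae.
    static String stripDiacritics(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // ß has no canonical decomposition, so it survives unchanged
        System.out.println(stripDiacritics("Käse Übung Füße")); // prints: Kase Ubung Fuße
    }
}
```

The output is neither the expected transliteration nor free of non-ASCII characters, which is why a hand-written replacement is needed here.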

Based on this solution, I wrote the following method; the results are almost the same:

    /**
     * Replaces all German umlauts in the input string with the usual
     * replacement scheme (ä -> ae, ö -> oe, ü -> ue, ß -> ss), also taking
     * capitalization into account.
     * A test string such as
     * "Käse Köln Füße Öl Übel Äü Üß ÄÖÜ Ä Ö Ü ÜBUNG" will yield the result
     * "Kaese Koeln Fuesse Oel Uebel Aeue Uess AEOEUe Ae Oe Ue UEBUNG".
     *
     * @param input the string to convert, e.g. the name of an uploaded file
     * @return the input string with all umlauts and ß replaced
     */
    private static String replaceUmlaut(String input) {

        // replace all lower-case umlauts and ß (plain literals, so no regex is needed)
        String result =
                input
                .replace("ü", "ue")
                .replace("ö", "oe")
                .replace("ä", "ae")
                .replace("ß", "ss");

        // first replace all capital umlauts in a non-capitalized context (e.g. Übung):
        // the lookahead only matches if the next character is lower case or a space
        result =
                result
                .replaceAll("Ü(?=[a-zäöüß ])", "Ue")
                .replaceAll("Ö(?=[a-zäöüß ])", "Oe")
                .replaceAll("Ä(?=[a-zäöüß ])", "Ae");

        // now replace all the other capital umlauts in all-caps form (e.g. ÜBUNG)
        result =
                result
                .replace("Ü", "UE")
                .replace("Ö", "OE")
                .replace("Ä", "AE");

        return result;
    }

“Käse Köln Füße Öl Übel Äü Üß ÄÖÜ Ä Ö Ü ÜBUNG” will become: “Kaese Koeln Fuesse Oel Uebel Aeue Uess AEOEUe Ae Oe Ue UEBUNG”
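To check the method against the capitalization rules described at the top of the post, the method can be dropped into a small runnable class. The sketch below is only an assumption of how such a check might look: the class name UmlautReplacerDemo is hypothetical, the method body is the one shown above, and the source file is assumed to be compiled as UTF-8.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UmlautReplacerDemo {

    // Same three passes as replaceUmlaut above.
    static String replaceUmlaut(String input) {
        // lower-case umlauts and ß
        String result = input
                .replace("ü", "ue").replace("ö", "oe")
                .replace("ä", "ae").replace("ß", "ss");
        // capital umlaut followed by a lower-case letter or a space: mixed case
        result = result
                .replaceAll("Ü(?=[a-zäöüß ])", "Ue")
                .replaceAll("Ö(?=[a-zäöüß ])", "Oe")
                .replaceAll("Ä(?=[a-zäöüß ])", "Ae");
        // every remaining capital umlaut: all caps
        return result.replace("Ü", "UE").replace("Ö", "OE").replace("Ä", "AE");
    }

    public static void main(String[] args) {
        // input -> expected output
        Map<String, String> cases = new LinkedHashMap<>();
        cases.put("Übung", "Uebung");
        cases.put("ÜBUNG", "UEBUNG");
        cases.put("TÜ24", "TUE24");
        cases.put("Ü ", "Ue ");
        cases.put("Käse Köln Füße Öl Übel Äü Üß ÄÖÜ Ä Ö Ü ÜBUNG",
                "Kaese Koeln Fuesse Oel Uebel Aeue Uess AEOEUe Ae Oe Ue UEBUNG");
        // a capital umlaut at the very end of the input is not matched by the lookahead
        cases.put("Ü", "UE");

        for (Map.Entry<String, String> c : cases.entrySet()) {
            String actual = replaceUmlaut(c.getKey());
            String status = actual.equals(c.getValue()) ? "ok  " : "FAIL";
            System.out.println(status + " \"" + c.getKey() + "\" -> \"" + actual + "\"");
        }
    }
}
```

The last case shows a limitation of the lookahead: a capital umlaut at the very end of the input, or one followed by punctuation, is always treated as all caps, so a lone “Ü” becomes “UE”, while “Ü ” followed by a space becomes “Ue ”. Extending the lookahead to `(?=[a-zäöüß ]|$)` would change that, at the price of turning an all-caps word ending in an umlaut, such as “TÜ”, into “TUe”.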