There is a large similarity in the way data travels to and from a web server.
Unfortunately, there is one difference: while the web server tells the browser which
character encoding the page it sends is in (via the Content-Type HTTP header),
the client does not send such information.
According to the HTTP specification, the HTTP request that the browser sends to the server
(the one that contains the submitted form) may well contain the Content-Type header too.
This would give the server the key to decode the form parameters.
Regrettably, present-day browsers do not send it.
The browser generally does the following:
- it takes the user input in national characters
- it translates the input to a byte sequence using the character encoding of the web page
  that contains the form
- it encodes the resulting byte sequence into the query string according to the usual rules
  for encoding query strings. That is, all bytes that correspond to legal ASCII alphanumeric
  characters are encoded as those characters, and all other bytes are converted to the %xy
  representation, where xy is the hexadecimal code of the corresponding byte (%C1, for example)
Then the encoded query (possibly containing %xy codes) is sent to the server. Following the
procedure described above, ASCII characters are sent to the server as they are, provided that they
have the same codes in the ASCII character encoding and in the national character encoding in use.
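The procedure above can be sketched in a few lines of Java. This is only an illustration of the simplified rules as described here (real browsers also leave a few punctuation characters unescaped and may encode spaces as +); the class name QueryEncodingSketch and its methods are hypothetical and not part of MMBase. It shows that the same input yields different query strings depending on the page encoding, and that the server recovers the input only if it assumes the same encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of MMBase.
class QueryEncodingSketch {

    // Browser side: characters -> bytes in the page's charset, then ASCII
    // letters and digits are kept and every other byte becomes %XY.
    static String encode(String text, Charset pageCharset) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(pageCharset)) {
            int v = b & 0xFF;
            boolean alnum = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9');
            if (alnum) {
                out.append((char) v);
            } else {
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    // Server side: %XY -> bytes, then bytes -> characters in the charset
    // the server assumes the browser used.
    static String decode(String encoded, Charset assumedCharset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%') {
                bytes.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                bytes.write(c);
            }
        }
        return new String(bytes.toByteArray(), assumedCharset);
    }

    public static void main(String[] args) {
        String input = "caf\u00e9"; // "cafe" with an acute accent on the e

        String utf8Query = encode(input, StandardCharsets.UTF_8);
        System.out.println(utf8Query);                                  // caf%C3%A9
        System.out.println(encode(input, StandardCharsets.ISO_8859_1)); // caf%E9

        // Decoding with the matching charset restores the input ...
        System.out.println(decode(utf8Query, StandardCharsets.UTF_8).equals(input));      // true
        // ... decoding with a different one does not.
        System.out.println(decode(utf8Query, StandardCharsets.ISO_8859_1).equals(input)); // false
    }
}
```

The last two lines are the whole problem in miniature: the query string itself carries no hint about which charset produced it, so the server has to be told.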
This filter sets the character encoding of the request before any parameters are handled.
It takes the encoding from the first of the following sources that defines one:
- the HTTP Content-Type header of the request
- the encoding parameter of the filter in WEB-INF/web.xml
- the MMBase encoding set in mmbase-config/modules/mmbaseroot.xml
- if none of the above defines an encoding, UTF-8 is used as the default
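The lookup order listed above could be sketched roughly as follows. This is not the actual source of CharacterEncodingFilter; the class EncodingChoiceSketch, its methods and its parameters are assumptions made for illustration. In a real filter the result would typically be passed to ServletRequest.setCharacterEncoding before any parameter is read.

```java
import java.util.Locale;

// Hypothetical illustration class, not the MMBase implementation.
class EncodingChoiceSketch {

    // Returns the charset parameter of a Content-Type value such as
    // "application/x-www-form-urlencoded; charset=ISO-8859-1", or null if absent.
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // The value may be quoted, e.g. charset="UTF-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Picks the first source that defines an encoding:
    // request header, then filter parameter, then MMBase setting, then UTF-8.
    static String chooseEncoding(String contentType, String filterParam, String mmbaseEncoding) {
        String fromHeader = charsetFromContentType(contentType);
        if (fromHeader != null) {
            return fromHeader;
        }
        if (filterParam != null && !filterParam.isEmpty()) {
            return filterParam;
        }
        if (mmbaseEncoding != null && !mmbaseEncoding.isEmpty()) {
            return mmbaseEncoding;
        }
        return "UTF-8";
    }

    public static void main(String[] args) {
        // The header wins when it names a charset.
        System.out.println(chooseEncoding("text/html; charset=ISO-8859-1", "UTF-8", null)); // ISO-8859-1
        // No charset in the header: fall back to the filter parameter.
        System.out.println(chooseEncoding("application/x-www-form-urlencoded", "ISO-8859-2", null)); // ISO-8859-2
        // Nothing defined anywhere: the default applies.
        System.out.println(chooseEncoding(null, null, null)); // UTF-8
    }
}
```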
To enable the filter, add the following piece of XML to your web.xml:
<filter>
  <filter-name>Set Character Encoding</filter-name>
  <filter-class>org.mmbase.servlet.CharacterEncodingFilter</filter-class>
  <!-- Overrides config/modules/mmbaseroot.xml#encoding -->
  <!--
  <init-param>
    <param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
  -->
</filter>
<filter-mapping>
  <filter-name>Set Character Encoding</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>