War Story: UTF-8 Content Disposition Header

Recently I had some problems with special characters in file names during file upload on the WildFly 8.2.0 application server. In my case the file name contained Chinese characters and was decoded with the wrong charset on the application server side. After searching for a suitable solution I found an interesting thread on the JBoss developer forum. It seems that a bug in the server causes the wrong file name encoding.

See the full forum thread here:

https://developer.jboss.org/thread/263484

 

In my case the request was like this:


------WebKitFormBoundary6hQpVgFVOpr9ArmL
Content-Disposition: form-data; name="Upload_file"; filename="香港.pdf"
Content-Type: application/pdf

The filename 香港.pdf was not displayed correctly, but after implementing the solution described in the thread mentioned above, the file name characters were right:


	// Requires: import java.nio.charset.StandardCharsets;
	// Re-decodes a file name that the server decoded as ISO-8859-1 although the
	// browser sent it as UTF-8.
	private String restoreUtf8FileName(String isoFileName) {
		byte[] fileNameISOBytes = isoFileName.getBytes(StandardCharsets.ISO_8859_1);
		String fileNameUTF8 = new String(fileNameISOBytes, StandardCharsets.UTF_8);
		// A multi-byte UTF-8 sequence collapses into fewer characters, so a changed
		// length means the name was mis-decoded. Otherwise keep the name untouched.
		return isoFileName.length() != fileNameUTF8.length() ? fileNameUTF8 : isoFileName;
	}

...

	final Part filePart = req.getPart(FILE_DATA);
	String fileName = getFileName(filePart);

	fileName = restoreUtf8FileName(fileName);
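
To see why re-decoding helps, here is a small stand-alone sketch that reproduces the effect outside the server. It only assumes that the server decoded the UTF-8 bytes of the file name as ISO-8859-1. The class and method names (FileNameRoundTrip, misdecode, restore) are made up for this illustration and are not part of WildFly or the forum thread.

```java
import java.nio.charset.StandardCharsets;

final class FileNameRoundTrip {

    // Simulates the assumed server behaviour: UTF-8 bytes read as ISO-8859-1.
    static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses it: ISO-8859-1 maps every byte to exactly one char, so the
    // original bytes can be recovered and decoded as UTF-8 again.
    static String restore(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "香港.pdf" written with Unicode escapes so the source file encoding does not matter.
        String original = "\u9999\u6e2f.pdf";
        String garbled = misdecode(original);

        System.out.println(original.length());                  // 6
        System.out.println(garbled.length());                   // 10, each Chinese character became 3 chars
        System.out.println(restore(garbled).equals(original));  // true
    }
}
```

The length difference (10 instead of 6) is what the length comparison in the workaround relies on to detect a mis-decoded name.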

It’s a little surprising that this bug was not noticed earlier by the application server developers. But UTF-8 encoding of file names during upload and download is a little bit messy, see here:

https://tools.ietf.org/html/rfc6266#section-4.5

https://issues.jboss.org/browse/RESTEASY-1214

http://stackoverflow.com/questions/93551/how-to-encode-the-filename-parameter-of-content-disposition-header-in-http
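
For the download direction, the links above describe the filename* parameter from RFC 6266, which carries a percent-encoded UTF-8 value in the RFC 5987 syntax next to a plain ASCII fallback. The following is a minimal sketch of how such a header value could be built. The class and method names (ContentDispositionSketch, encodeRfc5987, attachmentHeader) are hypothetical, and replacing non-ASCII characters with "_" in the fallback is just one possible choice.

```java
import java.nio.charset.StandardCharsets;

final class ContentDispositionSketch {

    // Non-alphanumeric characters that RFC 5987 allows unescaped (attr-char).
    private static final String ATTR_CHARS = "!#$&+-.^_`|~";

    // Percent-encodes the UTF-8 bytes of the value, leaving attr-char as is.
    static String encodeRfc5987(String value) {
        StringBuilder sb = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
            if (alnum || ATTR_CHARS.indexOf(c) >= 0) {
                sb.append((char) c);
            } else {
                sb.append('%').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    // Builds a header value with an ASCII fallback first and the UTF-8 form second.
    static String attachmentHeader(String fileName) {
        // Assumption: anything outside printable ASCII, plus quote and backslash, becomes "_".
        String asciiFallback = fileName.replaceAll("[^\\x20-\\x7E]|[\"\\\\]", "_");
        return "attachment; filename=\"" + asciiFallback + "\"; filename*=UTF-8''" + encodeRfc5987(fileName);
    }

    public static void main(String[] args) {
        // "香港.pdf" written with Unicode escapes so the source file encoding does not matter.
        System.out.println(attachmentHeader("\u9999\u6e2f.pdf"));
        // attachment; filename="__.pdf"; filename*=UTF-8''%E9%A6%99%E6%B8%AF.pdf
    }
}
```

Clients that understand filename* should prefer it over filename, while older clients fall back to the ASCII name. Whether a given browser or server version handles this correctly still needs to be tested.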