Friday, May 8, 2009

Encodings in Dojo, JSON and mod_python

Just had a hell of a time getting UTF-8 encoding to come all the way through using Dojo, JSON and mod_python. Here's how I ended up doing it:

To get the correct encoding from a published Python function:
import simplejson

def testJSONFromDB(req):
    # Setting the charset here is what makes the browser decode the response as UTF-8
    req.content_type = "application/json; charset=UTF-8"
    cur = getCursor()
    cur.execute("select Orthography FROM Cognate WHERE CognateId = 609")
    # ensure_ascii=False keeps the non-ASCII characters intact;
    # the /* */ wrapper is for Dojo's json-comment-filtered handling
    return req.write('/*' + simplejson.dumps(cur.fetchone()['Orthography'], ensure_ascii=False) + '*/')

The content-type was subtle: it's set as a plain attribute on the request object, while other response headers are set in a different way (a sketch follows below). The ensure_ascii argument defaults to True, which is annoying if you're trying to be international. (The DB statement just fetches one of the words that contains non-ASCII characters.)
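For contrast, here's a minimal sketch of the two mechanisms as I understand them: content_type is an attribute on the mod_python request object, while any other response header goes into the headers_out table. The Cache-Control header is just an illustration, not something this example actually needs:

def headerExample(req):
    # content_type is a plain attribute on the request object
    req.content_type = "application/json; charset=UTF-8"
    # other response headers go through the headers_out table
    req.headers_out['Cache-Control'] = 'no-cache'
    return req.write('/* "ok" */')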

To read the encoding correctly on the Dojo side:
dojo.xhrGet({
    handleAs: 'json-comment-filtered',
    url: '/Python/swadesh.py/testJSONFromDB',
    // ask for plain text in UTF-8; handleAs tells Dojo how to parse the body
    headers: {
        "Content-Type": "text/plain; charset=utf-8"
    },
    load: function(data) {
        document.getElementById('JSONFromDB').innerHTML = data;
    }
});

Here, the trick is that the request headers ask for text/plain in UTF-8, but the handleAs value tells Dojo to parse the response as comment-filtered JSON. Go fig. It works now.

Also, don't use just the code tag when posting Python; it removes the syntactically important whitespace.
