Thursday 28 May 2009

ruby oracle oci8 UTF8 corruption

I have a small script oci1.rb:
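require 'oci8'

# Connect as user "test" (password "test") to the database "test_db"
conn = OCI8.new("test", "test", "test_db")
File.open("out.txt", "wb") do |out|
  # exec yields each fetched row as an array of column values
  conn.exec('SELECT * from test_utf8') do |r|
    out.write(r.join(','))
    out.write("\n")
  end
end
conn.logoff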

That runs perfectly on Windows, producing the following from my test_db:
polish,łóśżć 
russsian,фывафыва
german,äöü

When I run exactly the same script on my Linux box, I get:
polish,loszc 
russsian,????????
german,aou
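
To tell whether the characters were lost in the file itself or only on screen, it helps to look at the bytes rather than at the terminal. Here is a minimal sketch, assuming Ruby 1.9 or later (for String#force_encoding). The name describe_text is a hypothetical helper of my own, not part of oci8.

```ruby
# Hypothetical helper: classifies raw bytes read from a file such as out.txt.
#   :invalid_utf8  - the bytes are not valid UTF-8 at all
#   :ascii_only    - valid, but every character is plain ASCII
#   :has_non_ascii - valid UTF-8 containing accented or non-Latin characters
def describe_text(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  return :invalid_utf8 unless text.valid_encoding?
  text.ascii_only? ? :ascii_only : :has_non_ascii
end

# Report on each line of out.txt, if the file is present.
if File.exist?("out.txt")
  File.binread("out.txt").each_line do |line|
    puts "#{describe_text(line)}: #{line.inspect}"
  end
end
```

If every line of the Linux out.txt comes back as :ascii_only, the conversion happened before the data reached the file, so the terminal is not to blame.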

I then tested a small script that reads and writes UTF8 on Linux, to be sure my Linux terminal can display UTF8
(read_write_utf8.rb):
File.open("utf8_out.txt", "wb") do |out|
  # File.foreach closes utf8.txt once the loop finishes
  File.foreach("utf8.txt") do |line|
    out.write(line)
  end
end

and that works fine:
chris@emeadb:~/work/ruby/oci$ ruby read_write_utf8.rb 
chris@emeadb:~/work/ruby/oci$ cat utf8.txt
polish,łóśżć
russsian,фывафыва
german,äöü
chris@emeadb:~/work/ruby/oci$ cat utf8_out.txt
polish,łóśżć
russsian,фывафыва
german,äöü


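In the end, the issue was my NLS_LANG variable, which was not set on the Linux box for the account I used to run the script. When NLS_LANG is unset, the Oracle client falls back to its default character set (US7ASCII, as far as I know), which would explain why the accented letters came back as plain ASCII and the Cyrillic ones as question marks.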

I first tried to set it in the Ruby script itself:
ENV['NLS_LANG']='AMERICAN_AMERICA.UTF8'

That did not work.

The solution was to set it in the shell:
export NLS_LANG=AMERICAN_AMERICA.UTF8
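
Since a missing NLS_LANG corrupts the output silently, it can be worth failing early when it is not set as expected. The following is only a sketch: nls_charset and check_nls_lang! are hypothetical helper names, and the only thing assumed about NLS_LANG is its usual LANGUAGE_TERRITORY.CHARSET form.

```ruby
# Hypothetical helper: extracts the character set from an NLS_LANG value
# of the form LANGUAGE_TERRITORY.CHARSET; returns nil if it is missing.
def nls_charset(env = ENV)
  value = env['NLS_LANG'].to_s
  charset = value.split('.', 2)[1]
  charset.nil? || charset.empty? ? nil : charset
end

# Hypothetical guard: raises unless the NLS_LANG charset matches `expected`.
def check_nls_lang!(env = ENV, expected = 'UTF8')
  charset = nls_charset(env)
  return charset if charset && charset.casecmp(expected).zero?
  raise "NLS_LANG charset is #{charset.inspect}, expected #{expected}; " \
        "run: export NLS_LANG=AMERICAN_AMERICA.UTF8"
end
```

Calling check_nls_lang! at the top of oci1.rb, before connecting, turns the silent corruption into an immediate error message.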


You can set it in your .bashrc (or the equivalent file for the shell you use) to make it the default for the account.
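
If changing the shell configuration is not an option, another approach is to have the script restart itself with the variable already in its environment, which is what the shell export achieves. I have not tracked down why the plain ENV assignment failed, so this sketch simply sidesteps the question. It assumes Ruby 1.9.2 or later (for RbConfig.ruby and the env-hash form of Kernel#exec), and reexec_args is a hypothetical helper name.

```ruby
require 'rbconfig'

# Hypothetical helper: builds the arguments for Kernel#exec, or returns nil
# when NLS_LANG is already set and no restart is needed.
def reexec_args(env, script, argv, nls_lang = 'AMERICAN_AMERICA.UTF8')
  return nil unless env['NLS_LANG'].to_s.empty?
  [{ 'NLS_LANG' => nls_lang }, RbConfig.ruby, script, *argv]
end

# Intended use, at the very top of the script and before require 'oci8':
#   args = reexec_args(ENV, $0, ARGV)
#   exec(*args) if args
```

Because the restarted process sees NLS_LANG as set, reexec_args returns nil the second time and the script carries on normally.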