How to insert the encoding: UTF-8 directive automatically in Ruby 1.9 files

September 23, 2011, revised March 4, 2013. Tags: bash, Ruby, Ruby 1.9, sed, shell script, UTF-8

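I had to upgrade a project to Ruby 1.9.3 and, as usual, the encoding problem came up: Ruby 1.9 assumes source files are US-ASCII, so if a file contains UTF-8 (or any other non-ASCII) characters, you have to declare its encoding explicitly with a comment such as # encoding: utf-8 at the top of the file.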

Turns out this is really easy to do with a bit of shell-script-fu:

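# Note: "file -I" is the macOS spelling; GNU file on Linux prints the same MIME output with "file -i".
find . -iname '*.rb' -o -iname '*.rake' | \
  xargs file -I | \
  grep 'charset=utf-8' | \
  sed -E 's/:.+$//' | \
  xargs -I {} sh -c "printf '# encoding: utf-8\n\n' | cat - {} > /tmp/utf8comment.tmp && mv /tmp/utf8comment.tmp {}"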

The tedious work is complete in under five minutes.

Here is what happens: find lists all Ruby files (*.rb and *.rake), file -I guesses each file’s encoding, grep keeps only the files reported as UTF-8, sed strips everything after the filename, and finally xargs prepends the directive to each file with some tricks described here.
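
If you want the prepend step to be safe to run more than once, it can be pulled out into a small shell function. What follows is only a sketch under a few assumptions of mine: prepend_encoding is a made-up helper name, files that start with a shebang are skipped rather than edited (Ruby would expect the directive on the second line there), and the content is copied back with cat instead of mv so the original file keeps its permissions.

```shell
# prepend_encoding is a hypothetical helper, not part of the one-liner above.
# It adds "# encoding: utf-8" plus a blank line to the top of one file,
# unless the file already declares an encoding or starts with a shebang.
prepend_encoding() {
  target=$1

  # Assumption: shebang scripts are left for manual editing, because the
  # directive would have to go on line 2 instead of line 1.
  case $(head -n 1 "$target") in
    '#!'*) echo "skipping $target: starts with a shebang" >&2; return 0 ;;
  esac

  # Ruby looks for the directive in the first lines of the file,
  # so only those are checked. Already declared: nothing to do.
  if head -n 2 "$target" | grep -Eq '^#.*coding[:=]'; then
    return 0
  fi

  # Build the new content in a private temporary file.
  tmp=$(mktemp) || return 1
  if { printf '# encoding: utf-8\n\n'; cat "$target"; } > "$tmp"; then
    # Copy the bytes back instead of using mv, so the original file
    # keeps its permissions and ownership.
    cat "$tmp" > "$target"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

It takes a single filename, so it could be called from a while IFS= read -r f; do prepend_encoding "$f"; done loop fed by the find, file, grep and sed stages instead of the final xargs step. Reading line by line also copes with spaces in filenames, which plain xargs does not.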

Needless to say, I offer no guarantees here, so be sure to check the git diff output before committing the changes. You do have the code under version control, right?