How to insert the encoding: UTF-8 directive automatically in Ruby 1.9 files

Updated on April 09, 2012, first published on September 23, 2011

I had to upgrade a project to Ruby 1.9.3, and, as usual, the encoding problem came up – if your source file is in UTF-8 (or any other non-ASCII encoding), you have to explicitly declare that.

Turns out this is really easy to do with a bit of shell-script-fu:

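# Note: file -I is the BSD/macOS spelling; GNU file on Linux uses -i instead.
# printf replaces echo because not every sh expands "\n" inside echo.
find . -iname '*.rb' -o -iname '*.rake' | \
  xargs file -I | \
  grep utf-8 | \
  sed -E 's/:.+$//' | \
  xargs -I {} sh -c "printf '# encoding: utf-8\n\n' | cat - {} > /tmp/utf8comment.tmp && mv /tmp/utf8comment.tmp {}"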

The tedious work is complete in under five minutes.

Here is what happens: find lists all Ruby and Rake files, file -I guesses each file’s encoding, grep keeps only the files detected as UTF-8, sed strips everything after the filename, and finally xargs prepends the directive to each file through a temporary file, using some tricks described here.
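For a more defensive take on the same steps, here is a minimal sketch in POSIX shell. The function names add_encoding_comment and add_encoding_comments_under are made up for illustration, and the sketch assumes your file command understands -b and --mime-encoding (check man file on your system). Compared to the pipeline, it skips files that already carry a magic comment, copes with spaces in filenames, and writes the result back with cat rather than mv so the original file permissions stay intact.

```shell
#!/bin/sh

# Hypothetical helper: prepend the magic comment to one file,
# unless one of its first two lines already declares an encoding.
add_encoding_comment() {
  target=$1
  # Ruby reads the magic comment from line 1, or line 2 after a shebang.
  if head -n 2 "$target" | grep -Eq '^#.*coding[:=]'; then
    return 0
  fi
  tmp=$(mktemp) || return 1
  # Directive, blank line, then the original content.
  { printf '# encoding: utf-8\n\n'; cat "$target"; } > "$tmp" &&
    cat "$tmp" > "$target"   # overwrite in place, keeping permissions
  rm -f "$tmp"
}

# Hypothetical driver: apply the helper to every .rb/.rake file under a
# directory (default: current one) that file reports as UTF-8.
# Assumption: file supports -b and --mime-encoding.
add_encoding_comments_under() {
  find "${1:-.}" -type f \( -iname '*.rb' -o -iname '*.rake' \) |
  while IFS= read -r path; do
    if [ "$(file -b --mime-encoding "$path")" = "utf-8" ]; then
      add_encoding_comment "$path"
    fi
  done
}

# Usage (run from the project root):
#   add_encoding_comments_under .
```

Like the pipeline, this sketch puts the directive above everything else, so a script that starts with a #! line would need manual attention. Filenames containing newlines are not handled either.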

Needless to say, I offer no guarantees here, so be sure to check the git diff output before committing the changes. You do have the code under version control, right?
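One way to make that check less of an eyeball exercise is to let git count the changed lines. The sketch below is only a suggestion: flag_unexpected_changes is a hypothetical name, and it assumes the run added exactly two lines per file (the directive and a blank line) and deleted none. It reads the tab-separated output of git diff --numstat (added lines, deleted lines, path) and prints every path that does not fit that pattern.

```shell
#!/bin/sh

# Hypothetical helper: read `git diff --numstat` output on stdin and print
# the paths whose change is anything other than "2 added, 0 deleted".
flag_unexpected_changes() {
  awk -F '\t' '$1 != 2 || $2 != 0 { print $3 }'
}

# Usage (inside the repository, before committing):
#   git diff --numstat | flag_unexpected_changes
```

An empty result suggests every touched file gained just the two expected lines. A file whose content was wiped would show up here because of its deleted lines.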

Five comments.
  1. On November 12, 2011 Rytis Lukoševičius (tech.rytis.net) wrote:

    Hello, a nice script. I just noticed one very interesting side effect: for the files that had Lithuanian non-ASCII letters in them, this script totally yanked the content and only #encoding: utf-8 was left (thank you, Linus, for git). It converted all the Latin files correctly every time.

    1. On November 12, 2011 Leonid Shevtsov (the author) wrote:

      Whoops, sorry, the last line should of course do an append (>>{}) instead of rewrite (>{}). Thanks for mentioning!

  2. On March 04, 2013 J. Random Hacker wrote:

    Maybe you could consider correcting the code on the page instead of just apologizing for the error?

  3. On July 16, 2013 Dima (twitter.com/ipronix) wrote:

    find . -name *.rb -exec grep -ic 'encoding' {} ;sed -i -e '1i\# encoding: utf-8' {} \;

    1. On July 16, 2013 Leonid Shevtsov (the author) wrote:

      Wow, thanks for making a twitter-friendly version! :)

