Monday, October 12, 2009

Should PERL_UNICODE be considered harmful?

I set PERL_UNICODE to "SDL" as a matter of course when setting up my environment. This means that all of my filehandles will use the UTF-8 PerlIO layer unless the locale says otherwise or a specific layer is chosen explicitly. I do this because I don't want to have to worry about calling binmode or explicitly setting the PerlIO layer when opening a file:
    open my $fh, "<:utf8", $filename
        or die "could not open $filename: $!";
This has worked fine for me for years; however, recently I have noticed a few problems with it:
  • You cannot compile Perl with it set
  • Many modules fail their tests when it is set
  • Scripts that work just fine in your environment fail in other environments
  • Since it affects all filehandles, it could cause bugs in modules (I have never actually seen this)
Given these issues, I am starting to consider PERL_UNICODE harmful and thinking about giving it up.
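
One possible alternative, sketched here rather than offered as a drop-in replacement, is to declare the layers inside the script with the open pragma, so that the behavior no longer depends on whoever's environment runs it. Unlike PERL_UNICODE, the pragma is lexically scoped, so it should not leak into modules. The helper name write_and_read is hypothetical and exists only to demonstrate the round trip.

```perl
use strict;
use warnings;

# Declare the layers in the script itself instead of in the environment:
# :std covers STDIN, STDOUT, and STDERR; the encoding layer becomes the
# default for filehandles opened in this lexical scope.
use open qw(:std :encoding(UTF-8));

# write_and_read is a hypothetical helper, used only for illustration.
sub write_and_read {
    my ($filename, $text) = @_;

    # No layer is named here; the default from "use open" applies.
    open my $out, ">", $filename
        or die "could not open $filename: $!";
    print {$out} $text;
    close $out
        or die "could not close $filename: $!";

    open my $in, "<", $filename
        or die "could not open $filename: $!";
    my $got = do { local $/; <$in> };    # slurp the whole file
    close $in
        or die "could not close $filename: $!";

    return $got;
}
```

Because the pragma travels with the script, the same code should behave the same way whether or not PERL_UNICODE is set on the machine running it.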

Does anyone know of any arguments to keep using it, or, conversely, more arguments to stop using it?


  1. You should not be using the :utf8 layer. You want :encoding(UTF-8) instead. The :utf8 layer does nothing more than the equivalent of _utf8_on – you want to de-/encode the data instead.
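
     A minimal sketch of the explicit form suggested in this comment, assuming a file that contains UTF-8 encoded bytes. The helper name slurp_utf8 is hypothetical; the point is only that the :encoding(UTF-8) layer decodes bytes into characters as they are read.

```perl
use strict;
use warnings;

# slurp_utf8 is a hypothetical helper name, used only for illustration.
sub slurp_utf8 {
    my ($filename) = @_;

    # :encoding(UTF-8) decodes the bytes into characters as they are read,
    # rather than merely flagging the raw bytes as UTF-8.
    open my $fh, "<:encoding(UTF-8)", $filename
        or die "could not open $filename: $!";
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close $fh
        or die "could not close $filename: $!";

    return $text;
}
```

     For example, a file holding the five bytes 63 61 66 C3 A9 should come back as a four-character string, since the last two bytes decode to a single "é".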

  2. Thanks Aristotle, I don't tend to use the layers at all (since I use PERL_UNICODE), so I haven't paid much attention to which one I really want.

