Monday, October 12, 2009

Should PERL_UNICODE be considered harmful?

I set PERL_UNICODE to "SDL" as a matter of course when setting up my environment. This means that all of my filehandles will use the UTF-8 PerlIO layer unless the locale says otherwise or a specific layer is chosen explicitly. I do this because I don't want to have to worry about calling binmode or explicitly setting the PerlIO layer when opening a file:
    open my $fh, "<:utf8", $filename
        or die "could not open $filename: $!";
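For anyone unsure what "SDL" actually switches on, here is a minimal sketch that expands the letters into the numeric value Perl exposes in ${^UNICODE}. The bit values are the ones listed for the -C switch in perlrun; the helper name unicode_flags_value is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Bit values for the single-letter -C / PERL_UNICODE flags, as listed in perlrun.
my %flag_bits = (
    I => 1,     # STDIN is assumed to be UTF-8
    O => 2,     # STDOUT will be UTF-8
    E => 4,     # STDERR will be UTF-8
    i => 8,     # UTF-8 is the default PerlIO layer for input streams
    o => 16,    # UTF-8 is the default PerlIO layer for output streams
    L => 64,    # make the flags above conditional on the locale
);

# Shorthand letters that stand for groups of the flags above.
my %expand = (S => "IOE", D => "io");

# Hypothetical helper: turn a setting such as "SDL" into its numeric value.
sub unicode_flags_value {
    my ($letters) = @_;
    my $value = 0;
    for my $letter (split //, $letters) {
        my $group = exists $expand{$letter} ? $expand{$letter} : $letter;
        for my $single (split //, $group) {
            $value |= $flag_bits{$single} if exists $flag_bits{$single};
        }
    }
    return $value;
}

my $sdl = unicode_flags_value("SDL");
print "SDL corresponds to $sdl\n";
print "this interpreter is running with ", ${^UNICODE}, "\n";
```

Comparing the two printed numbers is a quick way to check whether a given environment really has the setting you think it has.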
This has worked fine for me for years; however, recently I have noticed a few problems with it:
  • You cannot compile Perl with it set
  • Many modules fail their tests when it is set
  • Scripts that work just fine in your environment fail in other environments
  • Since it affects all filehandles, it could cause bugs in modules (I have never actually seen this)
Given these issues, I am starting to consider PERL_UNICODE harmful and thinking about giving it up.
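The portability problem in the list above is the one that is easiest to sidestep in code. A minimal sketch, assuming a scratch file created with File::Temp stands in for $filename: if every open names its layer explicitly, the script should behave the same whether or not PERL_UNICODE is set, since perlrun describes the -C defaults as applying only when no explicit layer is given.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file standing in for $filename (an assumption made for this demo).
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close $tmp_fh or die "could not close $filename: $!";

# "cafe" with an accented e, written with an escape so the source stays ASCII.
my $text = "caf\x{e9}\n";

# Name the layer on every open instead of relying on PERL_UNICODE.
open my $out, ">:encoding(UTF-8)", $filename
    or die "could not open $filename: $!";
print {$out} $text;
close $out or die "could not close $filename: $!";

open my $in, "<:encoding(UTF-8)", $filename
    or die "could not open $filename: $!";
my $line = <$in>;
close $in or die "could not close $filename: $!";

# The characters read back should match the characters written.
my $round_trip_ok = ($line eq $text) ? 1 : 0;
print "round trip ", ($round_trip_ok ? "ok" : "failed"), "\n";
```

The cost is exactly the typing that PERL_UNICODE was meant to avoid, so this is a trade-off rather than a free fix.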

Does anyone know of any arguments to keep using it, or, conversely, more arguments to stop using it?

2 comments:

  1. You should not be using the :utf8 layer. You want :encoding(UTF-8) instead. The :utf8 layer does nothing more than the equivalent of _utf8_on – you want to de-/encode the data instead.

  2. Thanks, Aristotle. I don't tend to use the layers at all (since I use PERL_UNICODE), so I haven't paid much attention to which one I really want.

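To make the point in the first comment concrete, here is a rough sketch of what "decoding" buys you. It reads valid UTF-8 bytes through the :encoding(UTF-8) layer using an in-memory filehandle, then uses Encode's decode with FB_CROAK to show that a strict decode rejects bytes that are not valid UTF-8. The sample strings are made up for the illustration, and the sketch deliberately does not read the bad bytes through :utf8, since what happens there seems to vary between Perl versions.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "cafe" with an accented e, encoded as UTF-8: the accented e is bytes C3 A9.
my $valid_bytes = "caf\xC3\xA9";

# Reading through :encoding(UTF-8) decodes those 5 bytes into 4 characters.
open my $fh, "<:encoding(UTF-8)", \$valid_bytes
    or die "could not open in-memory file: $!";
my $text = <$fh>;
close $fh or die "could not close in-memory file: $!";
my $char_count = length $text;

# The same word in Latin-1: a lone E9 byte, which is not valid UTF-8.
my $invalid_bytes = "caf\xE9";

# Decode a copy, because decode may modify its argument when a check is given.
# FB_CROAK makes a strict decode die on malformed input instead of guessing.
my $copy     = $invalid_bytes;
my $decoded  = eval { decode("UTF-8", $copy, Encode::FB_CROAK()) };
my $rejected = defined $decoded ? 0 : 1;

print "valid input: $char_count characters\n";
print "invalid input rejected: ", ($rejected ? "yes" : "no"), "\n";
```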
