Thursday, July 30, 2009

adventures in ignorance: continue

So, what is a continue block and why would you want to use one? I have a habit of writing mini-daemonoid (for real daemons look at Proc::Daemon) scripts. Here is a trivial example:
#!/usr/bin/perl

use strict;
use warnings;

#wait to be killed by SIGTERM or a control-c
my $continue = 1;
local $SIG{INT} = local $SIG{TERM} = sub { $continue = 0 };
while ($continue) {
print "foo: ", time(), "\n";
sleep 1;
}
This works very well for most purposes, but it has a minor annoyance: if you say next in the body of the loop, you create a tight loop that will consume 100% of a CPU. The problem here is that we would skip the sleep statement that puts a break on the loop. That sleep statement is not really part of the body of the loop. It something extra that we want to execute each time through the loop. And that is what a continue block is: something that runs each time through the loop. For example, the following code prints "Hello World\n" five times despite the fact that the next statement skips to the next iteration:
#!/usr/bin/perl

use strict;
use warnings;

for (1 .. 5) {
print "Hello ";
next;
print "beautiful ";
} continue {
print "World\n";
}
This means we can use the continue block to solve the problem:
#!/usr/bin/perl

use strict;
use warnings;

#wait to be killed by SIGTERM or a control-c
my $continue = 1;
local $SIG{INT} = local $SIG{TERM} = sub { $continue = 0 };
while ($continue) {
print "foo: ", time(), "\n";
} continue {
sleep 1;
}
Even though we don't need it now, it is good practice to identify the portions of your loop that should always run, and move them into continue blocks. This servers as a hedge against some future enhancement that might want to use next and it is a useful clue to some later maintainer that the code is important.

Sunday, July 26, 2009

adventures in ignorance: each, keys, and values

So, each, keys, and values all use the same iterator under the covers. I knew this and it is the reason I almost never use each. There is just too much chance of each leaving the iterator partway through the hash. For instance, the following code is fine:
while (my ($key, $value) = each %hash) {
do_something($key, $value);
}
But this code has a subtle bug just waiting to bite you:
eval {
while (my ($key, $value) = each %hash) {
do_something($key, $value);
}
1;
} or do {
die $@ unless $@ eq "We're all fine here now, thank you. How are you?\n";
}
If the eval block dies with the acceptable message then the code will continue on with a borked iterator. A more common mistake is the use of last in a loop using each:
while (my ($key, $value) = each %hash) {
last unless do_something($key, $value);
}
Adding or removing items from the hash while using each can also bite you. So, all of those little problems means I tend to write the loops above like this:
for my $key (keys %hash) {
my $value = $hash{$key};
do_something($key, $value);
}
All of this means I only dust off each and try to remember all of its issues when I know the number of keys in the hash (or the size of the keys themselves) is going to be huge in relation to memory (which generally means it is a tied dbm). However, it just struck me today that this behavior has can be used for good. I want to loop over a bunch of hash entries checking to see if the are all equal to each other. Now I could say:
my ($first, @others) = values %hash;
die "bad" if grep { $first ne $_ } @others;
but I could also say:
my ($key, $value) = each %hash;
die "bad" if grep { $value ne $_ } values %hash;
And the second bit of code is between twice as fast (tiny hashes) and five times as fast (mid-size and up):
#!/usr/bin/perl

use strict;
use warnings;

use Carp;
use Benchmark;

sub benchmark {
my $subs = shift;

my %results;
for my $sub (keys %$subs) {
$results{$sub} = $subs->{$sub}->();
}
my ($k, $v) = each %results;
croak "bad" if grep { $v ne $_ } values %results;

Benchmark::cmpthese -1, $subs;
}

for my $n (10, 100, 1_000, 10_000) {
my %h = map { $_ => $_ } 1 .. $n;
print "for $n:\n";
benchmark {
values => sub {
my ($first, @others) = values %h;
return join "", $first, @others;
},
each => sub {
my ($k, $v) = each %h;
return join "", values %h;
},
};
}
And yes, the reason I started thinking about this is the bit in the benchmark function. Of course, now that I am looking at it with a critical eye, I see it should be
my ($k, $sub) = each %$subs;
my $value = $sub->();
croak "bad" if first { $value ne $_->() } values %$subs

Saturday, July 25, 2009

adventures in ignorance: making do with the new \d

So, I have given up hope of \d being [0-9] in Perl 5. Even if it gets changed back in 5.12, it will be unsafe to consider it to be [0-9] for a long time (since it will still be wrong on 5.8 and 5.10, and we will need for those interpreters to leave the ecosystem). By the time it would be safe to assume \d means [0-9], Perl 6 will be here, and the current Perl 6 policy is that \d will continue to match any Unicode digit.

In light of this surrender, it would be nice if there were a simple way of specifying a specific digit. Right now, you see regexes like
    /(?<!\d) ( 03 (?: \d\d-\d{7} | \d{9} ) ) (?!\d)/gx
The hardcoded 03 in that regex causes a problem in the brave new world where we happily deal with digits other than [0-9]. To live in that world, we have three choices I can see:
  1. add new syntax to handle this to regexes
  2. use \d and a code block to check the value character
  3. create a character class of every 0 character and every 3 character
I am not certain what option 1 would look like (maybe \p{0}, \p{1}, etc.), but I am not holding my breath. Option 2 is dangerous because (?{}) is marked as experimental and is ugly (if it even works, I spent a couple of hours trying to make it work this morning to no avail). Option 3 is probably the most likely (if only because I can do it for myself) and safest (a new module won't have to worry about backwards compatibility or Unicode adding a numeric name) choice.

The problem is that between Perl 5.8 and 5.10, new digit characters were added to Unicode (this, by the way, is why Unicode::Digits is failing one of its tests on 5.8), so we can't use a static list, we must build it dynamically. Luckily there is a file, at least in Perl 5.8.0 – Perl 5.10.0, in one of the lib directories named unicore/lib/To/Digit.pl that has a mapping of digit characters to their decimal values. This makes it easy to build the character classes we need:
#!/usr/bin/perl

use perl5i;

my @digits;
for (split "\n", require "unicore/To/Digit.pl") {
my ($ord, $val) = split;
$digits[$val] .= "\\x{$ord}";
}
@digits = map { qr/[$_]/ } @digits;

my $mobile = qr{
(?<!\d) ( $digits[0] $digits[3] (?: \d\d-\d{7} | \d{9} ) ) (?!\d)
}x;

my $thai = "\x{0e53}" x 9; #9 THAI DIGIT THREE characters

my @cases = (
"0312-1234567",
"03123456789",
"03$thai",
"\x{0e50}\x{0e53}$thai",
"0212-1234567",
);

for my $case (@cases) {
say "$case ", $case =~ /$mobile/ ? "matches" : "doesn't match";
}
Which outputs:
0312-1234567 matches
03123456789 matches
03๓๓๓๓๓๓๓๓๓ matches
๐๓๓๓๓๓๓๓๓๓๓ matches
0212-1234567 doesn't match
If unicore/To/Digit.pl is supported (I have a question on the Perl 5 Porters list at the moment) I will probably be creating a nice interface to it and the other files. Once I have that interface I can build a better, more efficient version of Unicode::Digits and have better tests for it (i.e. ones that won't break because of the version of Perl).

Maybe this new world won't be as bad as I thought.

Friday, July 24, 2009

Google overreaches in latest anti-SPAM feature

Google is now adding an "Unsubscribe and report spam" option to newsletters and mailing lists. The problem I see with this is the classic "UNSUBSCRIBE ME" type post you tend to see on high traffic lists like Perl Beginners. People sign up for mailing lists. SPAM, by definition, is unsolicited. Encouraging people to mark things that are not SPAM as SPAM is wrong and dilutes the term. It is all well and good if Google wants to make it easy for people to unsubscribe from mailing lists, but they shouldn't be stigmatizing those mailing lists because someone wants to unsubscribe and doesn't know the proper way to do it.

Monday, July 20, 2009

adventures in ignorance: Multiple roles with MooseX::Declare

I am really liking Moose, in particular MooseX::Declare; however, the docs and error messages can be very confusing. For instance, I was trying to create a class that used two roles and kept getting the following error:
expected option name at [path to MooseX/Declare/Syntax/NamespaceHandling.pm] line 45
What this meant was that I needed to say:
class FooBar with (Fooable, Barable) {}
instead of:
class FooBar with Fooable, Barable {}
Now, I should never have gotten that error if I had followed the docs, so lets see what MooseX::Declare has to say about with:
with
    class Foo with Role { ... }

Applies a role to the class being declared.

No parentheses there (and no example of multiple roles). How about the Moose docs then?
with (@roles)

This will apply a given set of @roles to the local class.
Well, this has the parentheses, but let's look at how with is used in the examples:
package MovieCar;

use Moose;

extends 'Car';

with 'Breakable', 'ExplodesOnBreakage';
Hey! Where are the parentheses? In Moose they are optional, and in MooseX::Declare they are optional if you have only one role, but required if you have more than one.

Monday, July 13, 2009

When the alarm clock goes of unexpectedly.

Recently, I saw someone questioning the need for an alarm 0; after code like
eval {
alarm 5;
do_stuff();
alarm 0;
};
You don't need it under the two obvious code paths (code runs successfully within the time limit and code doesn't finish before the time limit), but if do_stuff(); dies, then you need to disable the alarm (because the alarm 0; in the block eval won't get a chance to run). My solution to this problem is
sub timeout {
my ($wait, $code, $timedout, $error) = (@_,
sub { warn $@ }, sub { die $@ });

eval {
local $SIG{ALRM} = sub { die "timeout\n" };
alarm $wait;
$code->();
alarm 0;
1;
} or do {
alarm 0; #ensure that alarm is not still set
#raise error if it isn't a timeout
if ($@ eq "timeout\n") {
$timedout->();
} else {
$error->();
}
};
}
This function takes between two and four arguments. The first two are the number of seconds to wait before timing out and a reference to the code to run respectively. The next argument is a reference to code that should be run in the event that the code times out, and the last is a reference to code that should be run in the event that an error occurs. Here are a few examples of how to call it:
timeout 1,
sub { die "oops\n" },
sub { warn "timeout out\n" },
sub { warn "died with $@" };

timeout 1,
sub { select undef, undef, undef, 2 },
sub { warn "timeout out\n" },
sub { warn "died with $@" };

timeout 1,
sub { print "normal execution\n" },
sub { warn "timeout out\n" },
sub { warn "died with $@" };

timeout 1, sub { select undef, undef, undef, 2 };
timeout 1, sub { die "and here it ends" };
This is probably reinventing the wheel, but it works for me.

Here is the full code.

Wednesday, July 8, 2009

»ö«

I am now the proud owner of xn--iba5a8l.net, also known as »ö«.net. I plan on putting some basic Perl 6 information (where to get Rakudo, Parrot, the specs, etc.) on a web site there shortly, but mostly I just wanted it because I think Camelia is cool.

Saturday, July 4, 2009

adventures in ignorance: hex vs oct

At one time I must have known this, but, like many parts of Perl I don't use on a common basis, it must have fallen out of my head. The hex function does exactly what I would expect it to do; that is it turns a string of hexadecimal digits into a Perl number. It can also handle strings that start with "0x". However, the oct function does significantly more than I would expect it to. In addition to converting strings of octal digits to Perl numbers, it can convert hexadecimal numbers (if they start with "0x") and binary numbers (if they start with "0b"). The reason for this is obvious: hex can't determine if "0b10" is binary for 2 or hexadecimal for 2_832, but oct can. There is no common convert function for the same reason. I do still find it odd that hex throws a warning and returns zero when confronted with leading spaces, but has no problem with trailing spaces.

#!/usr/bin/perl

use perl5i;

my @strings = ("10 ", " 10", "0b10", "010", "0x10");

say "testing hex:";
for my $string (@strings) {
say "\thex '$string' is '", hex $string, "'";
}

say "testing oct:";
for my $string (@strings) {
say "\toct '$string' is '", oct $string, "'";
}