Tuesday, October 20, 2009

Day six with Google Wave

Well, I am starting to get a real understanding of why they made the UI decisions that seemed weird at first. I am even considering reading the specs and seeing if I can talk to their implementation of the server with my own client (to get things like growl/libnotify notifications working).

On the browser side, it seems to work best with Google Chrome. This is not a big shock. It doesn't seem as memory hungry with Google Chrome (63 MB vs 200 MB in Safari or Firefox on the first page), but it does seem to grow in size over time as I access waves (102 MB after accessing 10 waves). Another nice thing about Chrome is its model for tabs. Each tab runs in its own process, so killing a tab frees all of the memory associated with it. This means that when it starts eating too much memory, I can simply start a new Google Wave tab and close the old one instead of having to shut down the whole application (which is what I have resorted to with Firefox and Safari). The downside is that Google Chrome has not been released for OS X. Luckily there is a developer version available for Linux and OS X (which is what I have been running).

Thursday, October 15, 2009

Google Wave day two

I have finally had an IM-like exchange with one of my friends. It is confusing to say the least. Should I start a new blip after each exchange? Should I modify my blip in response to the other one instead? Should I clean the exchange up for others on the Wave? I figure social rules will develop similar to how bottom vs top vs inline rules developed for email; of course, that means we are in for another round of religious debates and flames.

The idea of flames brings up another thought. What will flames look like when you can edit not only your own messages, but others' messages as well? Google Wave needs a reputation system of some form, one that weights a person's reputation based on the opinions of people you already trust. It also needs a way to control who can see a wave vs who can edit it. There are already read-only waves, but only the system can create them. I think it might become important for blips to be marked read-only, or writable only by a specific subset of the wave participants.

As for the memory issue I mentioned yesterday, it isn't so bad if I stay away from Waves that have many blips. Firefox (on OS X) is eating about 300 MB, which is up from the 100 MB it normally eats, but it has not yet exploded to the 1 GB mark like it did yesterday.

Wednesday, October 14, 2009

Day one with Google Wave

So I got a Google Wave invite yesterday morning and immediately signed up. The interface is nice, if a bit slow at times on my 2.16 GHz MacBook Pro with 2 GB of RAM. It also ate 1 GB at one point and Firefox started acting weirdly (left clicks on flash apps were generating right click behavior). Firefox normally eats around 100 MB of RAM on my machine, but right now with Google Wave open, and having only worked with two waves, it is eating 300 MB.

I have mostly used it like email so far; I haven't had a chance to play with the IM-like features. I have used the wiki-like editing feature though, replayed the history of a couple of waves, and used the map gadget, and everything worked pretty much the way I expected it to. I played a little with the kasyntaxy bot and it was interesting, but its Perl syntax highlighting leaves a lot to be desired.

If you are on Google Wave as well, drop me a line at chas.owens.

Monday, October 12, 2009

A blogger gadget for Ironman

So, since my badge is now working, I thought it would be a good idea to add my badge to my blog. It turned out to be fairly simple, and if you are using blogger, you can just reuse my work. All you need to do is
  1. Click "Customize"
  2. Click "Add a Gadget"
  3. Click "Add your own"
  4. Enter the URL http://wonkden.net/ironman_perl_gadget.xml
  5. Click "ADD BY URL"
  6. Set the background color
  7. Set the name you registered with the Ironman competition
  8. Set the sex of your badge
  9. And, finally, click "Save"
Just for the record, here is the XML used to create the gadget:
<?xml version="1.0" encoding="UTF-8" ?> 
<Module>
<ModulePrefs
title = "Ironman Perl Gadget"
title_url = "http://wonkden.net/ironman_perl_gadget.xml"
height = "80"
author = "Chas. Owens"
author_email = "chas.owens@gmail.com"
/>
<UserPref
name = "bg"
display_name = "Background Color"
datatype = "string"
required = "true"
/>
<UserPref
name = "ironman-name"
display_name = "Ironman Perl name"
datatype = "string"
required = "true"
/>
<UserPref
name = "ironman-sex"
display_name = "Sex"
datatype = "enum"
required = "true"
>
<EnumValue value="male" display_value="Male"/>
<EnumValue value="female" display_value="Female"/>
</UserPref>

<Content type="html">
<![CDATA[
<script type="text/javascript">
var prefs = new _IG_Prefs();
var name = prefs.getString("ironman-name");
var sex = prefs.getString("ironman-sex");
var color = prefs.getString("bg");
var html = '<div align="center" style="background-color: ' + color + '"><img src="http://ironman.enlightenedperl.org/munger/mybadge/' + sex + '/' + name + '.png" alt="Ironman Perl badge for ' + name + '"></div>'

document.write(html);
</script>
]]>
</Content>
</Module>

Should PERL_UNICODE be considered harmful?

I set PERL_UNICODE to "SDL" as a matter of course when setting up my environment. This means that all of my filehandles will use the UTF-8 PerlIO layer unless the locale says otherwise or a specific layer is chosen explicitly. I do this because I don't want to have to worry about calling binmode or explicitly setting the PerlIO layer when opening a file:
open my $fh, "<:utf8", $filename
    or die "could not open $filename: $!";
This has worked fine for me for years; however, recently I have noticed a few problems with it:
  • You cannot compile Perl with it set
  • Many modules fail their tests when it is set
  • Scripts that work just fine in your environment fail in other environments
  • Since it affects all filehandles, it could cause bugs in modules (I have never actually seen this)
Given these issues, I am starting to consider PERL_UNICODE harmful and thinking about giving it up.

Does anyone know of any arguments to keep using it, or, conversely, more arguments to stop using it?
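For comparison, one way to get similar behavior without the environment variable is the core open pragma, which sets default layers lexically, so only the file (or block) that asks for it is affected. The sketch below is an illustration, not a drop-in replacement: it assumes UTF-8 unconditionally and does not reproduce the locale check that the L flag adds. File::Temp is only there to make the snippet self-contained.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Lexically scoped default layers: handles opened in this file get
# :encoding(UTF-8), and :std applies the same layer to STDIN, STDOUT,
# and STDERR. Modules you load are not affected, unlike PERL_UNICODE.
use open qw(:std :encoding(UTF-8));

use File::Temp qw(tempfile);

# A scratch file so the sketch runs anywhere.
my ($tmp, $filename) = tempfile(UNLINK => 1);
close $tmp;

open my $out, ">", $filename or die "could not open $filename: $!";
print $out "caf\x{e9}\n";    # one non-ASCII character (e with acute)
close $out or die "could not close $filename: $!";

# No explicit "<:encoding(UTF-8)" here; the pragma supplies it.
open my $in, "<", $filename or die "could not open $filename: $!";
my $line = <$in>;
close $in;
chomp $line;

printf "%d characters, %d bytes on disk\n", length $line, -s $filename;
```

The string is 4 characters in memory but 6 bytes on disk (the accented character takes two bytes in UTF-8, plus the newline).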

Monday, October 5, 2009

Yay! my badge works, Boo! no time for Perl

Yay! I am a bronze man. Also, all of my projects are either on hold or going to progress very slowly for the next few months. My wife has been in and out of the hospital for the last few months, the last time via an ambulance. I have been forced to make lifestyle changes, such as fixed times to go to bed at night (2200 ET versus somewhere between 2000 ET and 0600 ET) and get up in the morning (0630 ET versus somewhere between 0900 ET and 1100 ET), and the careful cooking of all of our meals (did you know there was a meal called breakfast that didn't entail just grabbing a soda on the way out the door?), so I don't have nearly as much discretionary time as I once did. Hopefully in a couple of months she will be strong enough to start helping with things again and I will get some time back for Perl.

Monday, September 28, 2009

perlopref needs you

The perlopref document is nearing completion. I need to add the file test, quote-like, regex quote-like, and I/O operators, but now is a good time for other people to go over the document and make sure I haven't made any mistakes or omissions.

I am also thinking about changing the name to perlopquick since it isn't really a reference (it doesn't contain everything about the operators).

Sunday, September 20, 2009

giving some love to do

The do {} construct does not get much love in most of the code I see. Besides do {} while loops, it is very useful for combining steps and limiting the scope of variables:
my $contents = do {
    local $/;
    open my $fh, "<", $filename
        or die "could not open $filename: $!";
    <$fh>;
};
The snippet above keeps the change to $/ and the $fh variable local to the section that needs it.
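To see the scoping in action, here is a self-contained version of the same idiom. The temporary file (via the core File::Temp module) is an assumption added only so the snippet runs anywhere; it is not part of the technique.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use File::Temp qw(tempfile);

# Create a throwaway file so the example is self-contained.
my ($tmp, $filename) = tempfile(UNLINK => 1);
print $tmp "line one\nline two\n";
close $tmp or die "could not close $filename: $!";

# The do block evaluates to its last statement (<$fh> read with $/
# undefined, i.e. the whole file). The localized $/ and the lexical $fh
# both go away when the block ends, and dropping $fh closes the file.
my $contents = do {
    local $/;
    open my $fh, "<", $filename
        or die "could not open $filename: $!";
    <$fh>;
};

print length($contents), " characters read\n";    # 18 characters read
print defined $/ ? "\$/ is restored\n" : "\$/ is still undef\n";
```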

Sunday, September 13, 2009

adventures in ignorance: what I have learned from perlopref so far

While working on perlopref I have learned a few things about Perl I had never known. This is a short list of the things I learned (besides the bit about modulo that I have already blogged about).

Only the plain assignment operator (=) can do list assignment. I found this out when I expected @a x= 5 to do something (besides throw an error, that is). This makes sense for most of the assignment operators. For instance, @x += 2 makes very little sense; I think @a x= 3 does make some sense, but alas it is an error.
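A small sketch of what does work: x= is fine on a scalar, and for an array you can spell the repetition out with plain assignment and the list form of x.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# x= works on scalars: the string is repeated in place.
my $s = "ab";
$s x= 3;                 # "ababab"

# For arrays, use plain assignment; the parentheses around @a make
# x repeat the list instead of repeating a string.
my @a = (1, 2);
@a = (@a) x 3;           # (1, 2, 1, 2, 1, 2)

print "$s\n@a\n";
```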

The bitwise and operator (&) has different behavior with respect to strings than the bitwise or and xor operators (| and ^ respectively). Bitwise and truncates to the length of the shorter string, but bitwise or and xor extend the shorter string to the length of the longer one with nul characters.
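A quick demonstration of the length difference, assuming both operands are strings and the newer bitwise feature (which changes these operators) is not enabled:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# With two string operands, &, |, and ^ work character by character.
my $and = "abc" & "ab";    # truncated to the shorter operand: "ab"
my $or  = "abc" | "ab";    # shorter operand padded with "\0": "abc"
my $xor = "abc" ^ "ab";    # "a"^"a" and "b"^"b" are "\0": "\0\0c"

printf "and: %d chars, or: %d chars, xor: %d chars\n",
    length $and, length $or, length $xor;
```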

The arrow operator has surprising behavior with respect to coderefs. This code throws an error as I would expect:
#!/usr/bin/perl

use strict;
use warnings;

sub func {
    return [qw/a b c/];
}

print func()[0], "\n";
But this code prints "a\n":
#!/usr/bin/perl

use strict;
use warnings;

sub func {
    return [qw/a b c/];
}

my $ref = \&func;

print $ref->()[0], "\n";
This seems to be an extension of the rule that lets you pretend that an AoA is really a multi-dimensional array. I knew this rule allowed you to say $aoa->[0][1]("arg1", "arg2"), but I had never tried it with the function call first.
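For the named sub, the fix is to write the arrow yourself; with the coderef, the arrow between the call and the subscript is optional, which is why the second example works:

```perl
#!/usr/bin/perl

use strict;
use warnings;

sub func {
    return [qw/a b c/];
}

my $ref = \&func;

print func()->[0], "\n";      # a: explicit arrow after a named call
print $ref->()[0], "\n";      # a: arrow between subscripts omitted
print $ref->()->[0], "\n";    # a: same thing with the arrow written out
```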

Some people use ~~X (i.e. ~(~X)) to force scalar context on X, boy are they going to be annoyed by the smartmatch operator.

I go through bouts of knowing and forgetting this, but the flip-flop operators (.. and ...) actually return useful information, not just true or false:
#!/usr/bin/perl

use strict;
use warnings;

for my $i (1 .. 10) {
    my $range = ($i == 2 or $i == 6) .. ($i == 4 or $i == 8);
    print "$range: $i\n" if $range;
}
The code above prints
1: 2
2: 3
3E0: 4
1: 6
2: 7
3E0: 8

The comparison operators (<=> and cmp) return -1, 0, and 1, not just some negative, zero, or positive value. I had always thought the exact return value wasn't guaranteed, but apparently it is.
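A tiny check of the exact values. One caveat from perlop worth keeping in mind: if either operand of <=> is NaN, it returns undef rather than any of the three.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# <=> and cmp return exactly -1, 0, or 1.
my @num = map { $_ <=> 5 } 3, 5, 9;
my @str = map { $_ cmp "m" } qw(a m z);

print "@num\n@str\n";    # prints "-1 0 1" on both lines
```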

Perl 5.10.1 does not have the err operator (i.e. the low precedence version of //). I could have sworn it did, but it is not there. It looks like I am not the only one confused by its non-existence, though: perlop has a section titled "Logical or, Defined or, and Exclusive Or" that only describes or and xor.
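The high precedence // does exist (Perl 5.10 and later). As a reminder of why it matters, here is a short sketch contrasting it with ||; the %opt hash is just illustrative data.

```perl
#!/usr/bin/perl

use strict;
use warnings;

my %opt = (verbose => 0);

# // picks its right side only when the left side is undef,
# so a defined false value such as 0 survives; || replaces it.
my $verbose = $opt{verbose} // 1;     # 0
my $port    = $opt{port}    // 8080;  # 8080, $opt{port} does not exist
my $clobber = $opt{verbose} || 1;     # 1, because 0 is false

print "$verbose $port $clobber\n";    # 0 8080 1
```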

The left bit shift operator (<<) is not defined if you shift past the boundary of your native integer. I had always assumed that it dropped those bits, but apparently that is just the behavior of the versions of C I have used in the past (ANSI/ISO C does not define the behavior of overflow due to shifting, so it is a crap shoot).
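If you need a specific width, one hedge is to mask explicitly so the result never depends on how wide perl's native integers are. The shl32 helper below is hypothetical, not part of perlopref or any module.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: shift left within a fixed 32-bit width by
# masking the input and the result, so bits past bit 31 are dropped
# on purpose rather than by whatever the platform happens to do.
sub shl32 {
    my ($value, $bits) = @_;
    return 0 if $bits >= 32;    # every bit would be shifted out
    return (($value & 0xFFFF_FFFF) << $bits) & 0xFFFF_FFFF;
}

printf "0x%08X\n", shl32(0x8000_0001, 1);    # 0x00000002
```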

The and and or operators have different precedence levels. The and operator binds more tightly than or, so
$x == 5 and $y == 6 or $x == 6 and $y == 5
is the same as
(($x == 5) and ($y == 6)) or (($x == 6) and ($y == 5))
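A brute-force way to convince yourself: compare the bare expression with the fully parenthesized one for a small grid of values.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Compare the unparenthesized expression with the explicit grouping
# for every combination of $x and $y from 4 through 7.
my ($agree, $true) = (0, 0);
for my $x (4 .. 7) {
    for my $y (4 .. 7) {
        my $plain  = ($x == 5 and $y == 6 or $x == 6 and $y == 5) ? 1 : 0;
        my $parens = ((($x == 5) and ($y == 6)) or (($x == 6) and ($y == 5))) ? 1 : 0;
        $agree++ if $plain == $parens;
        $true++  if $plain;
    }
}
print "$agree of 16 agree; true for $true pairs\n";
```

Only (5, 6) and (6, 5) make it true, and all 16 combinations agree.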

Saturday, September 12, 2009

Adding house policies to Perl::Critic.

Recently on Stack Overflow, someone wanted code that could either modify Perl source code to remove comments that were on the same line as code, or warn him that comments were on the same line as code. Apparently he had a house style that needed to be enforced. I decided a Perl::Critic policy to warn him of the style violation was the way to go. I spent about fifteen minutes looking at the docs and writing code, and I came up with this:
package Perl::Critic::Policy::CodeLayout::NoSideComments;

use strict;
use warnings;

use Readonly;

use Perl::Critic::Utils qw{ :severities :classification :ppi };
use parent 'Perl::Critic::Policy';

our $VERSION = 20090904;

Readonly::Scalar my $DESC => "side comments are not allowed";
Readonly::Scalar my $EXPL => "put the comment above the line, not next to it";

sub supported_parameters { return }
sub default_severity { return 5 }
sub default_themes { return qw( custom ) }
sub applies_to { return 'PPI::Token::Comment' }

sub violates {
    my ($self, $elem) = @_;

    #look backwards until you find whitespace that contains a
    #newline (good) or something other than whitespace (error)

    my $prev = $elem->previous_sibling;
    while ($prev) {
        return $self->violation( $DESC, $EXPL, $elem )
            unless $prev->isa("PPI::Token::Whitespace");
        return if $prev->content =~ /\n/;
        $prev = $prev->previous_sibling;
    }

    #catch # after a block start, but leave the #! line alone
    return $self->violation( $DESC, $EXPL, $elem )
        unless $elem->parent->isa("PPI::Document");
    return;
}

1;
I was surprised at just how easy PPI and Perl::Critic make this.

Announcing perlcolor

So, recently on the Perl Beginners list there was some discussion about the merits and flaws of documentation that comes with Perl. At one point Raymond Wan said
Perldoc is somewhat hard to get into...but it's the manual for a programming language, so that's expected; I don't think having pages to color and draw on would be a feasible idea for the next update. :-)
I immediately disagreed. I think it is a fine idea and began work on such a document. At first it was simply a joke, but I am beginning to think that it might actually be a very good, light-hearted, introduction to Perl concepts such as CPAN, reporting bugs, etc. As with just about everything I do, I have put it up on GitHub. Fork it and add to it.

=head1 NAME

perlcolor - coloring book for new perlers

=head1 IMAGES

=head2 Camels and Llamas

Camels in connection with Perl are a trademark of O'Reilly Media. O'Reilly
publishes many of the most important Perl books such as Programming Perl (aka the Camel) and Learning Perl (aka the Llama). Have you read them?

               _
.--' |
/___^ | .--.
) | / \
/ | /` '.
| '-' / \
\ | |\
\ / \ /\|
\ /'----`\ /
|| \ |
(| (|
|| ||
jgs /_( /_(

_ _
( \__//)
.' )
__/b d . )
(_Y_`, .)
`--'-,-' )
(. )
( )
( )
( . ) .---.
( ) ( )
( . ) ( . )
( ) ( . ),
( . `"'` . `)\
( . .)\
(( . . ( . )\\
(( . ( ) \\
(( ) _( . . ) \\
( ( . )"'"`(.( ) ( ;
( ( ) ( ( . ) \'
|~( ) |~( )
| ||~| | ||~|
jgs | || | | || |
_| || | _| || |
/___(| | /___(| |
/___( /___(


=head2 Monkeys

In the end we are all code monkeys. Code, code monkeys, code!

                             .="=.
_/.-.-.\_ _
( ( o o ) ) ))
|/ " \| //
.-------. \'---'/ //
_|~~ ~~ |_ /`"""`\\ ((
=(_|_______|_)= / /_,_\ \\ \\
|:::::::::| \_\\_'__/ \ ))
|:::::::[]| /` /`~\ |//
|o=======.| / / \ /
jgs `"""""""""` ,--`,--'\/\ /
'-- "--' '--'


=head2 Butterflies

Camelia is a butterfly. She is the Perl 6 mascot. Can you find the hidden
P6?

           _                           _
/ `._ _.' \
( @ : `. .' : @ )
\ `. `. ._ _. .' .' /
\;' P. `. \ / .' 6' `;/
\`. `. \ \_/ / .' .'/
) :-._`. \ (:) / .'_.-: (
(`.....,`.\/:\/.',.....')
>------._|:::|_.------<
/ .'._>_.-|:::|-._<_.'. \
|o _.-'_.-^|:|^-._`-._ o|
|`' ;_.-'|:|`-._; `'|
jgs ".o_.-' ;."|:|".; `-._o."
".__." \:/ ".__."
^

=head2 Bugs

Bugs are errors in code. Sometimes the bug is in Perl, but most of the time
the bug is your code. If you think the bug is in someone else's code, you
should report it to them. CPAN has a link for reporting bugs on each
module's page. Have you ever found a bug?

          ,_      _,
'.__.'
'-, (__) ,-'
'._ .::. _.'
_'(^^)'_
_,` `>\/<` `,_
` ,-` )( `-, `
| /==\ |
,-' |=-| '-,
)-=(
jgs \__/



=head2 Penguins

This is Tux. He is the Linux mascot. Linux is an important platform for
Perl. Linux comes in many flavors: Redhat, Ubuntu, SUSE, and many others.
Do you run Linux?
                  ___
,-' '-.
/ _ _ \
| (o)_(o) |
\ .-""-. /
//`._.-'`\\
// : ; \
//. - '' -.| |
/: : | |
| | : ,/ /,
jgs _;'`-, ' |`.-' `\
) `\.___./; .'
'.__ )----'\__.-'
`""`


=head2 Daemons

This is the BSD Daemon. His name is Beastie. BSD is an important platform
for Perl. FreeBSD, NetBSD, OS X, and others are all variants of BSD. Do
you run a BSD variant?

                    ,        ,         
/( )`
\ \___ / |
/- _ `-/ '
(/\/ \ \ /\
/ / | ` \
O O ) / |
`-^--'`< '
(_.) _ ) /
`.___/` /
`-----' /
<----. __ / __ \
<----|====O)))==) \) /====
<----' `--' `.__,' \
| |
\ / /\
______( (_ / \______/
,' ,-----' |
`--{__________)


=head2 Strawberries

Strawberry Perl is a version of Perl for Microsoft Windows. It comes with
its own build environment to make installing Perl modules from CPAN easy.
Have you ever installed a Perl module from CPAN?

          VVVVVVVV 
'oOOOOOOOOo'
'ooOOOOOOoo'
'oooOOooo'
'oooooo'
'oooo'



=head2 Shebang

The shebang line is what tells the OS the path to Perl. What is the path to
your Perl?

        #  #    #  #               !!  !!
# # # # !! !!
# # # # !! !!
#### #### #### !! !!
!! !!
#### #### #### !! !!
# # # # !! !!
# # # # !! !!
# # # # !! !!
#### #### #### !! !!

#### #### ####
# # # # !! !!
# # # # !! !!
# # # # !! !!


=head1 TODO

We need a good ASCII art version of Hexley (the Darwin Mascot).

My strawberry is terrible.

=head1 LICENSE

The BSD Daemon appears to be public domain. The strawberry and the #! are
released under the same terms as Perl. The rest of the ASCII art came from
L<http://www.geocities.com/soho/7373/index.html>.

Monday, September 7, 2009

What to call the ... operator?

What should we call Perl 5's ... operator? In list context it is identical to the range operator (..) and in scalar context it is almost identical to the flip-flop operator (also ..). The Camel and perlop are strangely silent on this. I am currently calling it the alternate range/flip-flop operator for lack of a better term. What do you call it?
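For anyone unsure of the "almost" in that sentence, here is a small sketch of the one difference in scalar context: .. tests the end condition on the same evaluation that started the range, while ... waits until the next one. The data is made up for illustration.

```perl
#!/usr/bin/perl

use strict;
use warnings;

my @data = qw(a X b X c);
my (@two_dots, @three_dots);

for (@data) {
    # .. checks the end condition right away, so each "X" is a
    # range that starts and ends on that one element.
    push @two_dots, $_ if $_ eq "X" .. $_ eq "X";

    # ... does not check the end condition until the next element,
    # so the first "X" opens a range that the second "X" closes.
    push @three_dots, $_ if $_ eq "X" ... $_ eq "X";
}

print "..:  @two_dots\n";      # ..:  X X
print "...: @three_dots\n";    # ...: X b X
```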

Sunday, August 30, 2009

meta: no time to write what I want! badges! brokenness!

Well, it has been eight days and I am pushing the limit of staying in the competition (or at least not resetting the clock). There are a few things I want to write, but don't have the time to pay attention to right now. Quick list of posts I want to write so I don't forget them:
  • simple parsing
  • shared memory (IPC and persistence)
  • dealing with Unicode combining characters (needs more research)
In other news, the Perl Ironman Badges are up finally, but sadly my name seems to be breaking it. I can find a Chas and an Owens badge, but not a Chas. Owens badge. Also, I am showing up as a paperman in both which is either a bug in its calculation or mine.

Saturday, August 22, 2009

Projects in progress as of 2009-08-22

Unicode::Digits (GitHub, CPAN)

A test is failing on Perl 5.8. This is due to the test using a character that is a digit in Perl 5.10, but not in Perl 5.8. I need to fix this problem.

I also want to add a function that will return a regex that can match a given digit or digits (i.e. it should be able to match "42", "𝟜𝟚", or "᠔᠒"), but I am still not certain how the interface should work. For instance, should it only allow single digits (forcing the user to build more complicated regexes)?
my $one = digit_one();
Should it take a string and replace the digits and character classes with the expanded character classes?
my $re = digit_regex "This matches 42";
Should it be OO based and let you choose specific sets of digits?
my $ud = Unicode::Digit->new(
"ASCII", #special case, "" looks stupid
"MATHEMATICAL DOUBLE-STRUCK",
"MONGOLIAN"
);
my $re = $ud->digit_regex("[1-3]");
I am leaning toward the last one. And I am leaning toward only allowing a small subset of regex-like syntax (basically numbers and character classes containing numbers or ranges only).

autobox::dump (GitHub, CPAN)

I need to finish extending it to handle YAML and Data::Dump::Streamer.

Term::Throbber (not published anywhere)

I saw Term::Spinner and hubris struck me. First I patched Term::Spinner to be able to handle arbitrary sized frames, but then I found myself annoyed by the interface, terminology, and use of vanilla Moose in Term::Spinner, so I have written my own MooseX::Declare based version. I haven't written any tests for it yet. Once the tests are written and passing (and I am satisfied with the interface) I will put it up on GitHub and CPAN.

perlopref (GitHub)

Lots more work needs to be done before this is finished. My goal of adding at least one operator a day failed. I need to recommit myself to getting it done.

warnin's (GitHub)

I think warnin's is going to die an ignoble death. The talk of removing ' as a package separator combined with my own apathy means it is unlikely for anything new to be done with it.

Sunday, August 16, 2009

project: Announcing perlopref

In my last post I discussed the need for a perlopref document and even posted a portion of the version I am working on. I got some positive feedback and no negative feedback, so I have created a GitHub repository and uploaded what I currently have to it. There has already been one fork that fixed some of my POD stupidity.

If you want to help, just fork my repository, file an issue against my repository claiming an operator, make your change, and then submit a pull request. Note: not all of the operators are currently listed in the file (e.g. the filetest operators and the quote-like operators are not yet listed).

There is no guarantee that this document will make it into a release of Perl, but at the very least I will be creating a CPAN distribution for it.

Saturday, August 15, 2009

adventures in ignorance: modulo operator

Recently on the Perl Beginners mailing list I saw a new user having difficulty understanding the documentation for ||=. My first reaction was "hey, it is spelled out in straightforward English: ||= is like += but using || instead of +, so go look up || and there you are." Then I started thinking about it. As a reference, perlop is less than optimal. Many of the operators are discussed tangentially (like ||=) and many others are never mentioned (like the file test operators, which I know are documented in perlfunc, but they look like operators to me). This has inspired me to write perlopref. I haven't socialized this anywhere but the Perl Beginners mailing list and here because I want to make sure the idea is viable first. So far it seems to be working for me, and I am learning a lot of the nooks and crannies I had been able to ignore in the past.

One of these nooks (or is it a cranny?) is the modulo operator (%), or more specifically what happens with negative numbers. I had never bothered to consider how negative numbers would affect modulo. I found the text in perlop to be very opaque. Every time I tried to read it I found my eyes slipping down the page trying to get away, and I know what modulo does. I don't know if it is me or the text, but I can't imagine trying to understand what the text was saying if I didn't already know what it did. Here is the part that covers modulo in my first draft of perlopref.pod (the pod is available here)

X % Y
Description

This is the modulo operator. It computes the remainder of X divided by Y. The remainder is affected by the type of the numbers and whether they are positive or negative.

Given integer operands X and Y: If Y is positive, then X % Y is X minus the largest multiple of Y less than or equal to X. If Y is negative, then X % Y is X minus the smallest multiple of Y that is not less than X (i.e. the result will be less than or equal to zero). To illustrate this, here are the results of modding -9 through 9 with 4:
when X is     -9 -8 -7 -6 -5 -4 -3 -2 -1  0  1  2  3  4  5  6  7  8  9
the result is  3  0  1  2  3  0  1  2  3  0  1  2  3  0  1  2  3  0  1
And here is -9 through 9 modded with -4:
when X is     -9 -8 -7 -6 -5 -4 -3 -2 -1  0  1  2  3  4  5  6  7  8  9
the result is -1  0 -3 -2 -1  0 -3 -2 -1  0 -3 -2 -1  0 -3 -2 -1  0 -3
From this we can see that a positive Y constrains the result to a range from 0 to (Y - 1) that wraps around, and a negative Y constrains it to a range from (Y + 1) to 0.
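If you want to check the tables against your own perl, this short script regenerates both rows of results (without use integer, whose behavior for negative operands depends on your C compiler):

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Reproduce the two tables: X from -9 through 9 modded with 4 and with -4.
my %table;
for my $y (4, -4) {
    $table{$y} = [ map { $_ % $y } -9 .. 9 ];
    printf "%2d: %s\n", $y, join " ", map { sprintf "%2d", $_ } @{ $table{$y} };
}
```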

When Y is a floating point number whose absolute value is less than (UV_MAX + 1) (where UV_MAX is the maximum of the unsigned integer type), X and Y are truncated to integers. If the absolute value of Y is greater than or equal to (UV_MAX + 1), then % computes the floating point remainder R in the formula R = X - I * Y, where I is a certain integer that makes R have the same sign as Y and a smaller absolute value than Y. For example, on 32-bit systems 4.5 % (2 ** 32 - 1) is 4, but 4.5 % 2 ** 32 is 4.5.

Note: when the integer pragma is in scope, % gives you direct access to the modulo operator as implemented by your C compiler. This operator is not as well defined for negative operands, but it will execute faster.

Example
my $odd = $x % 2;          # $odd is 1 when $x is odd and 0 when $x is even
$hour = ($hour + 1) % 24;  # 23 (11pm) plus 1 hour is 0 (12am)

Sunday, August 9, 2009

meta: how I am keeping myself motivated part 2

oylenshpeegul asked on my last meta post about whether or not Google Analytics tracks RSS feeds. It does not; however, another Google service does: FeedBurner. The data for clicks matches up pretty well against what Analytics is saying. I find the interface to Analytics nicer than FeedBurner, but FeedBurner has statistics about how the feed is being used. I will probably visit both now.

Friday, August 7, 2009

error building Perl 5.10.1-RC1

So, if you are trying to compile 5.10.1-RC1 and it throws the error "Can't locate unicore/PVA.pl in @INC", you most likely have the PERL_UNICODE environment variable set. A quick
unset PERL_UNICODE
will fix the problem (or, at least, it did for me).

meta: how I am keeping myself motivated

So, I have had blogs in the past, but I have never managed to stay interested in them. This time is different and the reason is simple: data. I turned on Google Analytics and I visit the graphs once or twice a day.

Just this little reassurance that people are looking at what I am writing is keeping the blog in my mind. Because it is in my mind, when something happens (like, say, my next article about my stupidity using split and ord) I remember to start writing stuff down.

Tuesday, August 4, 2009

I am living in the future

I saw an interesting post today on the Perl Iron Man feed about the fact that /\Q foo \E/x turns into /\ foo\ / despite the /x option being set. This is not so remarkable because of the content (which was interesting and unexpected, although I am not sure what I would have expected to happen there), but rather because the post was in Japanese and I can't read Japanese. With machine translation I was able to understand what he/she was driving at. Truly, we are living in the future.
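The behavior is easy to reproduce: \Q uses quotemeta, which backslashes every non-word ASCII character, spaces included, so /x sees escaped (and therefore literal) spaces rather than ignorable whitespace.

```perl
#!/usr/bin/perl

use strict;
use warnings;

my $quoted = quotemeta " foo ";    # the same escaping \Q applies
my $re     = qr/\Q foo \E/x;

print "$quoted\n";                                  # prints "\ foo\ "
print " foo " =~ $re ? "match\n" : "no match\n";    # match
print "foo" =~ $re ? "match\n" : "no match\n";      # no match
```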

Thursday, July 30, 2009

adventures in ignorance: continue

So, what is a continue block and why would you want to use one? I have a habit of writing mini-daemonoid (for real daemons look at Proc::Daemon) scripts. Here is a trivial example:
#!/usr/bin/perl

use strict;
use warnings;

#wait to be killed by SIGTERM or a control-c
my $continue = 1;
local $SIG{INT} = local $SIG{TERM} = sub { $continue = 0 };
while ($continue) {
    print "foo: ", time(), "\n";
    sleep 1;
}
This works very well for most purposes, but it has a minor annoyance: if you say next in the body of the loop, you create a tight loop that will consume 100% of a CPU. The problem here is that we would skip the sleep statement that puts a brake on the loop. That sleep statement is not really part of the body of the loop. It is something extra that we want to execute each time through the loop. And that is what a continue block is: something that runs each time through the loop. For example, the following code prints "Hello World\n" five times despite the fact that the next statement skips to the next iteration:
#!/usr/bin/perl

use strict;
use warnings;

for (1 .. 5) {
    print "Hello ";
    next;
    print "beautiful ";
} continue {
    print "World\n";
}
This means we can use the continue block to solve the problem:
#!/usr/bin/perl

use strict;
use warnings;

#wait to be killed by SIGTERM or a control-c
my $continue = 1;
local $SIG{INT} = local $SIG{TERM} = sub { $continue = 0 };
while ($continue) {
    print "foo: ", time(), "\n";
} continue {
    sleep 1;
}
Even though we don't need it now, it is good practice to identify the portions of your loop that should always run, and move them into continue blocks. This serves as a hedge against some future enhancement that might want to use next, and it is a useful clue to some later maintainer that the code is important.
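One thing to keep in mind when moving code into a continue block: it runs after next and after a normal pass through the body, but last skips it. A small trace makes that visible (the @log array is just for illustration):

```perl
#!/usr/bin/perl

use strict;
use warnings;

my @log;
my $i = 0;
while ($i < 5) {
    $i++;
    next if $i == 2;    # skips the rest of the body; continue still runs
    last if $i == 4;    # leaves the loop; continue does NOT run
    push @log, "body $i";
} continue {
    push @log, "continue $i";
}

print "$_\n" for @log;
```

This prints body 1, continue 1, continue 2, body 3, continue 3, and nothing for 4.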

Sunday, July 26, 2009

adventures in ignorance: each, keys, and values

So, each, keys, and values all use the same iterator under the covers. I knew this and it is the reason I almost never use each. There is just too much chance of each leaving the iterator partway through the hash. For instance, the following code is fine:
while (my ($key, $value) = each %hash) {
    do_something($key, $value);
}
But this code has a subtle bug just waiting to bite you:
eval {
    while (my ($key, $value) = each %hash) {
        do_something($key, $value);
    }
    1;
} or do {
    die $@ unless $@ eq "We're all fine here now, thank you. How are you?\n";
};
If the eval block dies with the acceptable message then the code will continue on with a borked iterator. A more common mistake is the use of last in a loop using each:
while (my ($key, $value) = each %hash) {
    last unless do_something($key, $value);
}
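If you do leave an each loop early, one common remedy is to reset the iterator by calling keys (or values) on the hash before the next traversal. A minimal sketch, with a throwaway three-key hash:

```perl
#!/usr/bin/perl

use strict;
use warnings;

my %hash = (a => 1, b => 2, c => 3);

# Leave the loop early; the iterator is now partway through %hash.
while (my ($key, $value) = each %hash) {
    last;
}

# Calling keys (here in void context) resets the hash's iterator.
keys %hash;

my $seen = 0;
while (my ($key, $value) = each %hash) {
    $seen++;
}
print "$seen\n";    # 3; without the reset it would be 2
```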
Adding or removing items from the hash while using each can also bite you. So, all of those little problems mean I tend to write the loops above like this:
for my $key (keys %hash) {
    my $value = $hash{$key};
    do_something($key, $value);
}
All of this means I only dust off each and try to remember all of its issues when I know the number of keys in the hash (or the size of the keys themselves) is going to be huge in relation to memory (which generally means it is a tied dbm). However, it just struck me today that this behavior can be used for good. I want to loop over a bunch of hash entries checking to see if they are all equal to each other. Now I could say:
my ($first, @others) = values %hash;
die "bad" if grep { $first ne $_ } @others;
but I could also say:
my ($key, $value) = each %hash;
die "bad" if grep { $value ne $_ } values %hash;
And the second bit of code is between twice as fast (tiny hashes) and five times as fast (mid-size and up):
#!/usr/bin/perl

use strict;
use warnings;

use Carp;
use Benchmark;

sub benchmark {
    my $subs = shift;

    my %results;
    for my $sub (keys %$subs) {
        $results{$sub} = $subs->{$sub}->();
    }
    my ($k, $v) = each %results;
    croak "bad" if grep { $v ne $_ } values %results;

    Benchmark::cmpthese -1, $subs;
}

for my $n (10, 100, 1_000, 10_000) {
    my %h = map { $_ => $_ } 1 .. $n;
    print "for $n:\n";
    benchmark {
        values => sub {
            my ($first, @others) = values %h;
            return join "", $first, @others;
        },
        each => sub {
            my ($k, $v) = each %h;
            return join "", values %h;
        },
    };
}
And yes, the reason I started thinking about this is the bit in the benchmark function. Of course, now that I am looking at it with a critical eye, I see it should be
use List::Util qw/first/;

my ($k, $sub) = each %$subs;
my $value = $sub->();
croak "bad" if first { $value ne $_->() } values %$subs;

Saturday, July 25, 2009

adventures in ignorance: making do with the new \d

So, I have given up hope of \d being [0-9] in Perl 5. Even if it gets changed back in 5.12, it will be unsafe to consider it to be [0-9] for a long time (since it will still be wrong on 5.8 and 5.10, and we will need for those interpreters to leave the ecosystem). By the time it would be safe to assume \d means [0-9], Perl 6 will be here, and the current Perl 6 policy is that \d will continue to match any Unicode digit.

In light of this surrender, it would be nice if there were a simple way of specifying a specific digit. Right now, you see regexes like
    /(?<!\d) ( 03 (?: \d\d-\d{7} | \d{9} ) ) (?!\d)/gx
The hardcoded 03 in that regex causes a problem in the brave new world where we happily deal with digits other than [0-9]. To live in that world, we have three choices I can see:
  1. add new syntax to handle this to regexes
  2. use \d and a code block to check the value character
  3. create a character class of every 0 character and every 3 character
I am not certain what option 1 would look like (maybe \p{0}, \p{1}, etc.), but I am not holding my breath. Option 2 is dangerous because (?{}) is marked as experimental and is ugly (if it even works, I spent a couple of hours trying to make it work this morning to no avail). Option 3 is probably the most likely (if only because I can do it for myself) and safest (a new module won't have to worry about backwards compatibility or Unicode adding a numeric name) choice.

The problem is that between Perl 5.8 and 5.10, new digit characters were added to Unicode (this, by the way, is why Unicode::Digits is failing one of its tests on 5.8), so we can't use a static list; we must build it dynamically. Luckily, at least in Perl 5.8.0 through Perl 5.10.0, there is a file in one of the lib directories named unicore/To/Digit.pl that maps digit characters to their decimal values. This makes it easy to build the character classes we need:
#!/usr/bin/perl

use perl5i;

# each line of Digit.pl is a hex code point followed by its decimal value;
# collect the code points into one string per value
my @digits;
for (split "\n", require "unicore/To/Digit.pl") {
    my ($ord, $val) = split;
    $digits[$val] .= "\\x{$ord}";
}
@digits = map { qr/[$_]/ } @digits;

my $mobile = qr{
    (?<!\d) ( $digits[0] $digits[3] (?: \d\d-\d{7} | \d{9} ) ) (?!\d)
}x;

my $thai = "\x{0e53}" x 9; #9 THAI DIGIT THREE characters

my @cases = (
    "0312-1234567",
    "03123456789",
    "03$thai",
    "\x{0e50}\x{0e53}$thai",
    "0212-1234567",
);

for my $case (@cases) {
    say "$case ", $case =~ /$mobile/ ? "matches" : "doesn't match";
}
Which outputs:
0312-1234567 matches
03123456789 matches
03๓๓๓๓๓๓๓๓๓ matches
๐๓๓๓๓๓๓๓๓๓๓ matches
0212-1234567 doesn't match
If unicore/To/Digit.pl is supported (I have a question on the Perl 5 Porters list at the moment) I will probably be creating a nice interface to it and the other files. Once I have that interface I can build a better, more efficient version of Unicode::Digits and have better tests for it (i.e. ones that won't break because of the version of Perl).
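If it turns out that unicore/To/Digit.pl is not something code should rely on, one alternative (a different technique from the one above) is to ask the core Unicode::UCD module instead. The sketch below assumes that charinfo's decimal field is filled in for every character with the Nd property, and it only scans up to U+1FFFF, which as far as I know covers every decimal digit Unicode currently defines. It trades speed for not depending on an internal file.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use Unicode::UCD qw/charinfo/;

# collect every decimal digit (General_Category Nd) by its value, 0 .. 9
my @members;
for my $ord (0 .. 0x1FFFF) {
    next if $ord >= 0xD800 and $ord <= 0xDFFF;    # skip surrogates
    no warnings 'utf8';                            # quiet noncharacter complaints
    next unless chr($ord) =~ /\p{Nd}/;
    my $info = charinfo($ord) or next;
    next unless length $info->{decimal};
    push @{ $members[ $info->{decimal} ] }, sprintf "\\x{%04X}", $ord;
}

# one character class per digit value, as in the Digit.pl version
my @digits = map { my $class = join "", @$_; qr/[$class]/ } @members;

my $mobile = qr{
    (?<!\d) ( $digits[0] $digits[3] (?: \d\d-\d{7} | \d{9} ) ) (?!\d)
}x;

binmode STDOUT, ":encoding(UTF-8)";
my $thai = "\x{0e53}" x 9;    # 9 THAI DIGIT THREE characters
for my $case ("0312-1234567", "\x{0e50}\x{0e53}$thai", "0212-1234567") {
    print "$case ", ($case =~ $mobile ? "matches" : "doesn't match"), "\n";
}
```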

Maybe this new world won't be as bad as I thought.

Friday, July 24, 2009

Google overreaches in latest anti-SPAM feature

Google is now adding an "Unsubscribe and report spam" option to newsletters and mailing lists. The problem I see with this is the classic "UNSUBSCRIBE ME" type post you tend to see on high traffic lists like Perl Beginners. People sign up for mailing lists. SPAM, by definition, is unsolicited. Encouraging people to mark things that are not SPAM as SPAM is wrong and dilutes the term. It is all well and good if Google wants to make it easy for people to unsubscribe from mailing lists, but they shouldn't be stigmatizing those mailing lists because someone wants to unsubscribe and doesn't know the proper way to do it.

Monday, July 20, 2009

adventures in ignorance: Multiple roles with MooseX::Declare

I am really liking Moose, in particular MooseX::Declare; however, the docs and error messages can be very confusing. For instance, I was trying to create a class that used two roles and kept getting the following error:
expected option name at [path to MooseX/Declare/Syntax/NamespaceHandling.pm] line 45
What this meant was that I needed to say:
class FooBar with (Fooable, Barable) {}
instead of:
class FooBar with Fooable, Barable {}
Now, I should never have gotten that error if I had followed the docs, so let's see what MooseX::Declare has to say about with:
with
    class Foo with Role { ... }

Applies a role to the class being declared.

No parentheses there (and no example of multiple roles). How about the Moose docs then?
with (@roles)

This will apply a given set of @roles to the local class.
Well, this has the parentheses, but let's look at how with is used in the examples:
package MovieCar;

use Moose;

extends 'Car';

with 'Breakable', 'ExplodesOnBreakage';
Hey! Where are the parentheses? In Moose they are optional, and in MooseX::Declare they are optional if you have only one role, but required if you have more than one.

Monday, July 13, 2009

When the alarm clock goes off unexpectedly.

Recently, I saw someone questioning the need for an alarm 0; after code like
eval {
    alarm 5;
    do_stuff();
    alarm 0;
};
You don't need it under the two obvious code paths (the code finishes within the time limit, or it doesn't finish before the time limit), but if do_stuff() dies, then you need to disable the alarm (because the alarm 0; in the block eval won't get a chance to run). My solution to this problem is
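sub timeout {
    my ($wait, $code, $timedout, $error) = @_;

    #by default, warn on a timeout and rethrow any other error
    #(set separately so that passing only $timedout keeps the die default)
    $timedout ||= sub { warn $@ };
    $error    ||= sub { die $@ };

    eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $wait;
        $code->();
        alarm 0;
        1;
    } or do {
        alarm 0; #ensure that alarm is not still set
        #send timeouts and other errors to the right handler
        if ($@ eq "timeout\n") {
            $timedout->();
        } else {
            $error->();
        }
    };
}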
This function takes between two and four arguments. The first two are the number of seconds to wait before timing out and a reference to the code to run respectively. The next argument is a reference to code that should be run in the event that the code times out, and the last is a reference to code that should be run in the event that an error occurs. Here are a few examples of how to call it:
timeout 1,
    sub { die "oops\n" },
    sub { warn "timed out\n" },
    sub { warn "died with $@" };

timeout 1,
    sub { select undef, undef, undef, 2 },
    sub { warn "timed out\n" },
    sub { warn "died with $@" };

timeout 1,
    sub { print "normal execution\n" },
    sub { warn "timed out\n" },
    sub { warn "died with $@" };

timeout 1, sub { select undef, undef, undef, 2 };
timeout 1, sub { die "and here it ends" };
This is probably reinventing the wheel, but it works for me.
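The need for an alarm 0 outside the eval is easy to demonstrate. In this small sketch, a die stands in for a failing do_stuff(), and the SIGALRM handler just sets a flag instead of dying, so the program survives long enough to report what happened. Each half sleeps for two seconds, so it takes a few seconds to run.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# record the alarm instead of letting the default SIGALRM action end the program
my $fired = 0;
$SIG{ALRM} = sub { $fired = 1 };

# 1. the code in the eval dies before it can cancel its alarm
eval {
    alarm 1;
    die "do_stuff failed\n";    # stands in for a do_stuff() that dies
    alarm 0;                    # never reached
};
print "eval failed with: $@";
sleep 2;                        # the alarm is still set and goes off here
my $without_cleanup = $fired;

# 2. the same failure, followed by an alarm 0 outside the eval
$fired = 0;
eval {
    alarm 1;
    die "do_stuff failed\n";
    alarm 0;
};
alarm 0;                        # cancel any alarm the eval left behind
sleep 2;
my $with_cleanup = $fired;

print "without alarm 0: ", ($without_cleanup ? "alarm went off" : "no alarm"), "\n";
print "with alarm 0:    ", ($with_cleanup    ? "alarm went off" : "no alarm"), "\n";
```

With the default SIGALRM action, the leftover alarm in the first half would simply kill the process in the middle of whatever unrelated code happened to be running.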


Wednesday, July 8, 2009

»ö«

I am now the proud owner of xn--iba5a8l.net, also known as »ö«.net. I plan on putting some basic Perl 6 information (where to get Rakudo, Parrot, the specs, etc.) on a web site there shortly, but mostly I just wanted it because I think Camelia is cool.

Saturday, July 4, 2009

adventures in ignorance: hex vs oct

At one time I must have known this, but, like many parts of Perl I don't use on a regular basis, it must have fallen out of my head. The hex function does exactly what I would expect it to do; that is, it turns a string of hexadecimal digits into a Perl number. It can also handle strings that start with "0x". However, the oct function does significantly more than I would expect it to. In addition to converting strings of octal digits to Perl numbers, it can convert hexadecimal numbers (if they start with "0x") and binary numbers (if they start with "0b"). The reason for this is obvious: hex can't determine if "0b10" is binary for 2 or hexadecimal for 2_832, but oct can, because neither "b" nor "x" is an octal digit. There is no common convert function for the same reason. I do still find it odd that hex throws a warning and returns zero when confronted with leading spaces, but has no problem with trailing spaces.

#!/usr/bin/perl

use perl5i;

my @strings = ("10 ", " 10", "0b10", "010", "0x10");

say "testing hex:";
for my $string (@strings) {
    say "\thex '$string' is '", hex $string, "'";
}

say "testing oct:";
for my $string (@strings) {
    say "\toct '$string' is '", oct $string, "'";
}
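If you are willing to adopt Perl's own rule for numeric literals (a leading 0 means octal unless it is followed by x or b), oct can serve as the dispatcher for a "common convert" function. A minimal sketch, where the helper name str2num is hypothetical and treating a leading 0 as octal is an assumption that may well be wrong for numbers typed by people:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# hypothetical helper: interpret a string the way Perl interprets a numeric
# literal in source code (0x.. hex, 0b.. binary, other leading 0 octal,
# everything else decimal)
sub str2num {
    my ($string) = @_;
    $string =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    # oct already understands the 0x and 0b prefixes
    return $string =~ /^0/ ? oct $string : 0 + $string;
}

for my $string ("10 ", " 10", "0b10", "010", "0x10") {
    print "'$string' is ", str2num($string), "\n";
}
```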

Friday, June 26, 2009

Perl Ironman

This blog is my entry in the Perl Ironman competition. I had been thinking about it for a while, but MST's lightning talk at YAPC NA pushed me over the edge.

So, what to expect from the content? Well, in addition to any good emails or answers I write, I am planning on a journey through Perl 5's standard library. There is a lot of stuff in there that I have never known about or have forgotten. The realization that I need to take this journey came when I wanted to know the current process's parent's PID. I hit Google and was shocked to find that getppid is a core function.
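For anyone else who missed it, here is getppid in action. In this small sketch (fork and pipe are both core), a child process sends the value of getppid back to its parent, which compares it with its own PID in $$.

```perl
#!/usr/bin/perl

use strict;
use warnings;

my $parent = $$;    # PID of this process

# open a pipe so the child can report back to the parent
pipe my $reader, my $writer or die "pipe failed: $!";

my $pid = fork;
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # child: getppid returns the PID of the process that forked us
    close $reader;
    print {$writer} getppid(), "\n";
    close $writer;
    exit 0;
}

# parent: read what the child reported, then reap it
close $writer;
chomp(my $reported = <$reader>);
waitpid $pid, 0;

print "parent is $parent, child says its parent is $reported\n";
```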

Perl 6 is also likely to come up from time to time as I play with Rakudo.

Thursday, June 25, 2009

An Email to a Beginner

On Thu, Jun 25, 2009 at 09:23, REDACTED wrote:
> Respected Sir,

Hmm, was this supposed to be directed at me? I am just a geek.

> I am very keen to gather knowledge of perl. I use perl for last six months,
> and I am new to perl. So I need your help and kind suggestion to develop my
> skill in perl.

There is a simple four-pronged approach:
  • read
  • write
  • ask questions
  • answer questions
You need to read docs, books, blogs, and code.

The docs are available on your system through the perldoc command
(type perldoc perldoc to learn how to use it), but you can also
access the same information online at perldoc.perl.org (core language)
and search.cpan.org (everything under the sun).

Suggested books are
If you do not have a computer science background, you will want to get
a good algorithms book. I have not looked at it in any detail, but I
have heard good things about
but beware, the book was written ten years ago, and Perl has had two
major releases since then. There are many new features to take
advantage of (this advice also applies to Programming Perl).

Read programming blogs. Some blogs I read are:
If you have co-workers or friends who are writing Perl code, take a look at it. CPAN is also a good source for code to read.

Just as important as reading is writing. You should be reading a part
of one of the books and then writing code. And I don't just mean the
examples from the book. Play with the concept the book introduced.
See what you can make it do. These don't have to be useful programs.
And don't worry when you make mistakes; every programmer makes mistakes.
By making, and learning from, these mistakes now, you will be better
off later.

In addition to playing with the concepts in the books, you should
choose an ambitious project to implement. When I started out with
Perl, I was a DBA/Developer, and I wanted a nice SQL editor. I was
running on Linux at the time and was missing Informix's SQL Editor (it
only ran on MS Windows) and was dissatisfied with dbaccess (their
terminal client). I looked around and found a GUI toolkit (Gtk) that
worked with Perl and a way to connect to the database (DBI) and I just
started trying to make something work. When I finally got something
working, I realized I needed a new feature, and got coding again. And
so the cycle goes. When you run out of ideas, you can always try
reimplementing UNIX commands. There is a wealth of programming
information to be gleaned by doing this.

And don't be afraid to ask questions if something confuses you or you
can't get something to work. I suggest two resources: the perl
beginner's list and Stack Overflow; just remember not to post the
same question to both (it annoys people).

Answering questions is just as important as asking them. If you think
you know the answer, respond to the question. You will probably be
smacked down, but what is worse: thinking you know the answer when you
don't or a little bit of embarrassment? To try to get you over being
afraid of embarrassment, here is an exchange where I was thoroughly
taken to task for being wrong, and this didn't happen years ago, it
happened this month. The trick is to care more about the information than your pride.

--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.