Rewriting Perl code for Raku Part V

Last week we started to talk about the pack() and unpack() builtins for Raku and Perl. These aren’t terribly common built-ins to use, so I thought I’d take some time to go over these in detail and talk about how I use them and debug files that use them.

As a gentle reminder, OLE::Storage_Lite is a Perl module to read and write a subset of the Microsoft OLE storage format. As a first pass, I pounded out a “translation” of the original Perl code, without much thought to whether it’ll work, or really even compile. It looks like the Perl version, but with most of the {} changed to <>, and -> changed to ‘.’.

What to test first… The reading side seems to be the easiest, because I can check object-by-object to see what the data should look like. Replicating that for Raku becomes essentially fixing the bugs I know I’ve introduced on the way.

Testing testing… is this on?

Before we dive into the Raku code, though, let’s just set up a quick test in Perl. There really wasn’t one to begin with, which is a testament to how well-used the module is. I’ve got a ‘test.xls’ file that I’ve already checked in LibreOffice to make sure it works, so I’ll add a test script that reads the file and checks the root object.

use Test::More;
use OLE::Storage_Lite;

my $root = OLE::Storage_Lite->new( 'sample/test.xls' );
use YAML; die Dump($root);
isa_ok $root, 'OLE::Storage_Lite::PPS::Root';
is $root->No, 0;
is $root->PrevPps, 0xfffffffe;
done_testing;

You might be reading the code and wondering what the heck die() is doing in a test suite. In my current copy it’s actually commented out; it’s just a quick and dirty way to get the data for the Raku version of the file, which looks almost the same.

use Test;
use OLE::Storage_Lite;

my $root = OLE::Storage_Lite.new( 'sample/test.xls' );
die $root.perl;
isa_ok $root, 'OLE::Storage_Lite::PPS::Root';
is $root.No, 0;
is $root.PrevPps, 0xfffffffe;
done-testing;

Notice there’s hardly any difference overall, just a few minor syntax tweaks. And I don’t need to use YAML. But I’ve got a Q&D way to run my code, and since my screen looks something like this:

I’ve got most of what I need in my face. This is all a rather plain tmux setup, running multiple panes so I can see what’s going on. On the left is vim running in split-screen mode with the Perl and Raku test files open. The rest are shells in the Perl and Raku directories, and some commands to get byte dumps of the files.

I’ve also set up the following aliases in my shells:

alias 5="perl -Ilib"
alias 5p="prove -Ilib"
alias 6="perl6 -Ilib"
alias 6p="prove -e'perl6 -Ilib'"

This way I can run both Perl and Raku test suites with just a few keystrokes, and not have to worry about details such as -I paths. You’re of course welcome to do things exactly the same, completely different, or even radically better than I am, in which case please let me know.

You might notice the use of the language’s old name here. I haven’t changed over to the new binaries yet, but the techniques I’ll talk about here won’t change.

Keeping it Clean

We now have two scripts that should produce the same output, but probably won’t, for any number of reasons. I’ve got a whole article’s worth of things that I had to do to make the new module compile, let alone run. But that’s for a later issue.

Let’s start out with this section, which might be familiar to longtime (ha!) readers. 

  $rhInfo->{_FILEH_}->seek(0, 0);
  $rhInfo->{_FILEH_}->read($sWk, 8);
  return undef unless($sWk eq "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1");

This is in Perl, of course. In Raku I’ve chosen to write

  $file.seek( 0, SeekFromBeginning );
  my Str $sWk = $file.read( 8 ).unpack( 'A8' );
  die "Header ID incorrect" if $sWk ne HEADER-ID;

It’s a bit ungraceful to die() inside a module, but this guarantees that execution stops way before it can cause a hard-to-debug problem down the road. The first change is that I’ve refactored $rhInfo->{_FILEH_} out into its own $file variable so I don’t have to repeat references to $rhInfo all over the place, like the original.

Next is using the built-in IO::Handle constant ‘SeekFromBeginning’ instead of the rather anodyne 0 as in Perl. Probably the parent OLE::Storage module looked ahead in the file to determine something before reading in earnest. I’m keeping it here for no good reason other than it might be nice to separate ‘read’ functionality into a different method.

Diving in

The next line will cause some consternation, so I’ll unpack it slowly. The original author used Hungarian notation for their variable names, so the ‘s’ of $sWk means that it’s a string type. I’ve adopted this for the Raku code as well, actually enforcing the variable type without additional code.

File handles have both a fancy lines() method that lets you read files line-by-line, and a raw read() method that lets you read raw bytes. If I stopped right here and just looked at the raw bytes, the code would actually fail, and I’ve talked about why in earlier parts. Suffice to say that read() returns a buffer of uninterpreted bytes that you have to decode later, not a string.

Decoding here is the job of the unpack() statement. It acts just like its Perl counterpart, but is experimental. Lucky for me, it implements enough of the Perl builtin that I can use it to read the entire OLE file. 

Now, unlike other builtins (again, keeping in mind it’s experimental), it’s only available as a method call. There is a version of unpack() that works on multiple arguments, but if you try to call it as a builtin, expect:

===SORRY!=== Error while compiling -e
Undeclared routine:
    unpack used at line 1. Did you mean 'pack'?

This may be fixed in your version, feel free to try it and let me know if I should upgrade 🙂 In any case, the last bit you’re wondering about is the ‘A8’ business as its argument. I think this isn’t explained correctly in the documentation, so I’ll explain in my own way.

read() returns a raw string of bytes, without interpretation. If it sees hex 0x41, it doesn’t “know” whether you meant the ASCII character ‘A’ or the number 0x41, so it doesn’t interpret the data, it just puts the data into the buffer. It relies on the Buf(fer)’s pack() and unpack() methods to assign types to the data.

So finally, unpack( "A8" ) pulls out 8 “ASCII” characters and puts them into $sWk. Now I used scare-quotes there because ASCII is a 7-bit encoding, not 8 bits as many people seem to think. It only encodes from 0x00-0x7f, so anything over that isn’t legal ASCII.

Which just means that the “A” of “A8” doesn’t truly correspond to ASCII, but it’s close enough. So, we call unpack( "A8" ) on the buffer that $ 8 ) returns, and get back a string that we can finally check against our header.
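If you’re more at home in Python, the same trick looks like this with the struct module. This is purely my own sketch, not part of the module under translation; the check_header helper and the sample buffers are made up for illustration, and "8s" plays roughly the same role as Raku’s "A8" (it grabs raw bytes rather than stripping padding, which is close enough for a fixed header).

```python
import struct

# The header constant from the article, as raw bytes.
HEADER_ID = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"

def check_header(raw: bytes) -> bool:
    """Pull 8 bytes out of a buffer, much as unpack('A8') does."""
    (header,) = struct.unpack("8s", raw[:8])   # "8s": 8 raw bytes
    return header == HEADER_ID

print(check_header(HEADER_ID + b"\x00" * 8))   # True
print(check_header(b"not an OLE file!"))       # False
```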


But what if the header isn’t what we expect? Your first instinct might be to say you must’ve screwed up and sent it the wrong file. Luckily that’s pretty easy to check: just call $file.slurp.print; and look at the contents. If it’s text, you’ve probably got the wrong file – OLE files do contain text, but it’s usually zipped or in UCS-2.

Let’s assume though that it’s an actual binary file, and a real spreadsheet that Excel (or LibreOffice in my case) can read. Since the headers don’t match, it must be a different version of OLE that our code isn’t ready to handle.

That means we need to know what the first 8 bytes of the file actually are. We’ve got a bunch of tools at our disposal, but what I want to introduce is hexdump(1) (don’t worry about the (1), force of habit.) Run this command on the file:

hexdump -C sample-file.xls | head -1

This should generate something like this:

00000000  d0 c9 11 a0 af b1 13 d1  00 00 00 00 00 00 00 00  |................|

(Original bytes changed to protect the innocent file.) The numbers on the left (‘00000000’) tell us how far we are into the file (in hex), the next two groups of 8 are the hex values of the individual bytes of the file, and the dots between ‘|..|’ are where any printable characters would appear, if there were any.
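Just to demystify hexdump’s output, here’s a quick Python sketch (entirely my own; the hexdump_line helper is hypothetical) that builds the same style of line from raw bytes:

```python
def hexdump_line(chunk: bytes, offset: int = 0) -> str:
    """Format 16 bytes the way `hexdump -C` does."""
    left  = " ".join(f"{b:02x}" for b in chunk[:8])
    right = " ".join(f"{b:02x}" for b in chunk[8:16])
    # Printable ASCII shows through; everything else becomes a dot.
    text  = "".join(chr(b) if 0x20 <= b < 0x7f else "." for b in chunk[:16])
    return f"{offset:08x}  {left}  {right}  |{text}|"

row = bytes([0xd0, 0xc9, 0x11, 0xa0, 0xaf, 0xb1, 0x13, 0xd1]) + b"\x00" * 8
print(hexdump_line(row))
# 00000000  d0 c9 11 a0 af b1 13 d1  00 00 00 00 00 00 00 00  |................|
```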

So now we know what the first 8 bytes of this file look like, and we can add (without much muss or fuss) some checks to our original file, and come up with this:

$file.seek( 0, SeekFromBeginning );
my Str $sWk = $file.read( 8 ).unpack( 'A8' );
die "Unknown OLE header!" if $sWk eq "\xd0\xc9\x11\xa0\xaf\xb1\x13\xd1";
die "Header ID incorrect" if $sWk ne HEADER-ID;

This check isn’t in my source, so don’t go looking for it. As far as I know there aren’t any other OLE header strings than what I check for, but then I’m trying to get away without reading the spec. My blood pressure doesn’t need that.

Getting at the details

Of course, binary packed formats contain more stuff than just ASCII strings. OLE was originally written in the days of 16-bit CPUs, so it’s got other ways to pack in data. Let’s look at a fragment of the file format: (not from the spec, this is just my interpretation)

0000: 0xD0 0xCF 0x11 0xE0 0xA1 0xB1 0x1A 0xE1 # header
0008: 0x00 0x09           # size of large block of data (in power-of-2)
000a: 0x00 0x06           # size of small block of data
000c: 0x00 0x00 0x00 0x03 # Number of BDB blocks
0010: 0xff 0xff 0xff 0xfe # Starting block

So, this is the first 20 (0x0010+4) bytes of an OLE header block. You may have already caught on to the fact that there are at least 3 sizes of data here. The first 8 bytes on line 0000 are the header data we talked about ad nauseam.

Next, the header says that a “large” block of data is 2**9 bytes long, and a “small” block of data is 2**6 bytes long, this time in pairs of bytes. Finally we’ve got the number of BDB blocks (whatever those are, probably Berkeley DB) and the starting block’s index number, all in 4-byte chunks.
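Here’s how I’d mimic those reads with Python’s struct module, as a sketch. The byte values are made up to match the fragment above, and note one wrinkle: "v" and "V" are little-endian formats, so in a real file the size 9 arrives as the bytes 09 00, low byte first.

```python
import io
import struct

# Hypothetical header bytes, little-endian the way "v" and "V" read them.
fh = io.BytesIO(b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"   # 8-byte header ID
                b"\x09\x00"                            # big block size (2**9)
                b"\x06\x00"                            # small block size (2**6)
                b"\x03\x00\x00\x00"                    # number of BDB blocks
                b"\xfe\xff\xff\xff")                   # starting block

magic = fh.read(8)
(big_exp,)   = struct.unpack("<H", fh.read(2))   # "<H" is struct's "v"
(small_exp,) = struct.unpack("<H", fh.read(2))
(n_bdb,)     = struct.unpack("<I", fh.read(4))   # "<I" is struct's "V"
(start,)     = struct.unpack("<I", fh.read(4))

print(2 ** big_exp, 2 ** small_exp, n_bdb, hex(start))   # 512 64 3 0xfffffffe
```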

This means we need to read 2 2-byte chunks and 2 4-byte chunks into memory. This time though, we have to read them as numbers. Once again, unpack() comes to the rescue. Last time we used the ‘A’ character, this time we’ll do something just a little bit different.

Let’s read the documentation for unpack() to see what we can use. Halfway down the page we come to a table which gives us the letter abbreviations for each type of data we can read, and what it is in terms of where it is in memory.

For now, replace the term ‘element’ with ‘byte’ while you’re reading the documentation. We need to read (0x00, 0x09) as a 2-byte integer, so let’s look for “two elements” on the right-hand side. “Extracts two elements and returns them as a single unsigned integer” seems to be what we need.

So it looks like the letter we need to use is “S”, and since we only want to read one at a time, that’s all we need. But the original Perl source uses “v”, so that’s what I’ll use as well.

  $iWk = _getInfoFromFile($rhInfo->{_FILEH_}, 0x1E, 2, "v");
  return undef unless(defined($iWk));
  $rhInfo->{_BIG_BLOCK_SIZE} = 2 ** $iWk;

But as you can see, the Perl source wraps the seek/read/unpack dance in a helper, much to my annoyance. I’d prefer to simply write this:

$iWk = $file.read( 2 ).unpack( "v" );
%hInfo<_BIG_BLOCK_SIZE> = 2**$iWk;

but to keep things looking as similar to the original Perl code as I can, my code looks like 

  my Int $iWk = self._getInfoFromFile( $file, 0x1E, 2, "v" );
  die "Big block size missing" unless defined( $iWk );
  %hInfo<_BIG_BLOCK_SIZE> = 2 ** $iWk;

which is just one line longer, and that’s because of the safety check. Of course, pack() and unpack() can take more than one format character at a time. In Perl, there’s yet another mini-language (like regex, and what used to be called the format statement) for these builtins, and Raku’s version isn’t quite done yet.

But you can still take the entire header we’ve collected so far, and read it with a single unpack() statement like so:

my ( $header, $large-size, $small-size, $num-bdbs, $start-block ) =
  $file.read( 20 ).unpack( "A8 vv VV" );
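For comparison, Python’s struct can do the same all-at-once unpacking; "8s" plays the role of "A8", "H" of "v", and "I" of "V". The buffer here is hand-built for illustration:

```python
import struct

# Hand-built 20-byte header: magic, two 16-bit sizes, two 32-bit values.
raw = (b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"
       + struct.pack("<HHII", 9, 6, 3, 0xFFFFFFFE))

# One call, like unpack("A8 vv VV").
header, large_size, small_size, num_bdbs, start_block = struct.unpack("<8sHHII", raw)

print(large_size, small_size, num_bdbs, hex(start_block))   # 9 6 3 0xfffffffe
```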

This format is of course much more compact and much easier to read. In all probability once I get done with the main module I’ll convert everything over to this style and the code will become much, much quieter. Binary protocols, especially those for moisture evaporators, tend to have lots of code that looks like:

my $rev = $file.read( 2 ).unpack( "v" );
if $rev == 0x01 {
  $r2 = $file.read( 2 ).unpack( "v" );
}
else {
  $d2 = $file.read( 4 ).unpack( "V" );
}

where the next bytes you read depend upon the version of the protocol. Even though I’ve just been rattling off code based on the Perl version, I don’t know what the protocol may do at any given point. So it makes sense to read just one int or long ahead while developing.

I could read a version number as “V” because they started out using “v1”, “v2” and so on up to “v42792643522”. But then 30 lines and 2 revs later they may have changed from “V” to “vcc” because they wanted to support “v2.1.0” style.

And if that header were something like “A8 V CC* V vv” I have to go back and break up the format string and statement at the very least. If I go term-by-term I just have to find the version number and add an if-then statement just below.

Now that you’ve got a fairly good grounding in unpack(), I think it’s time for break. Next time we’ll cover writing our file back out, the most fun part of the operation.

Again, many thanks to those of you that have read this far. As usual, Gentle Reader, please feel free to leave constructive questions, comments, critiques and improvements in the comment section. I do require an email address for validation, but I don’t use it for any other purpose. Thank you again, and I’ll see you in part VI of this series.

Rewriting Perl Code for Raku IV: A New Hope

Back in Part III of our series on Raku programming, we talked about some of the basics of OO programming. This time we’ll talk about another aspect of OO programming. Perl objects can be made from any kind of reference, although the most common is a hash. I think Raku objects can do the same, but in this article we’ll just talk about hash-style Perl objects.

Raku objects let you superclass and subclass them, instantiate them, run methods on them, and store data in them. In previous articles we’ve talked about all but storing data. It’s time to remedy that, and talk about attributes.

Instance attributes

We used unit class OLE::Storage_Lite; to declare our class, and method save( $x, $y ) { ... } to create methods. Or in our case rewrite existing functions into methods. Now, we focus our attention on some of the variables that should really be instance attributes, and why.

Let’s get to know which variables behave like attributes, and which don’t. This will change how we write our Raku code, but hopefully for the better. We’ll start from the outside in, and look at the API. There are a few “test” scripts that use the module, and this fragment is pretty common.

use OLE::Storage_Lite;
my $oOl = OLE::Storage_Lite->new('test.xls');
my $oPps = $oOl->getPpsTree(1);
die( "test.xls must be a OLE file") unless($oPps);

The author creates an object ($oOl) from an existing file, then fetches a tree of “Pps” objects, whatever they are. So, one OLE::Storage_Lite object equals one file. This gives me my first instance variable, the filename.

sub new($$) {
  my($sClass, $sFile) = @_;
  my $oThis = {
    _FILE => $sFile,
  };
  bless $oThis;
  return $oThis;
}

Above is how they wrote it in Perl, and below is how we’d write it (exactly as specified) in Raku:

has $._FILE;

multi method new( $sFile ) { self.bless( _FILE => $sFile ) }

Later on, we can call my $file = OLE::Storage_Lite.new( 'test.xls' ); just like we did in Perl. We wouldn’t even need the new method if we had users call my $file = OLE::Storage_Lite.new( _FILE => 'test.xls' );. This gives users the option of calling the API in the old Perl fashion or the new Raku fashion without additional work on our part.

Strict Raku-style

There’s a problem lurking here, though. The constructor Raku provides us lets us call my $file = OLE::Storage_Lite.new; without specifying a value for $._FILE. If you know Perl’s Moose module, though, the ‘has’ there just might look familiar.

And for good reason. A lot of the ideas from Moose migrated into Raku during its design, and the attributes were one of those. Moose lets you do a lot of things with attributes, and so does Raku. One of those is you can add “adverbs” to them. Let’s do that now.

has $._FILE is required;

Calling OLE::Storage_Lite.new now fails, because you’re not passing in the _FILE argument. That solves one problem. Actually, it solves two, come to think of it. In the original Perl code, you could call OLE::Storage_Lite->new() too, and it wouldn’t complain. Now we’ve fixed that, with one new term.

Progressive Typing

No, we’re not talking about some new editor like Comma (the link does work, despite the certificate problem.) Our code would run just fine, as-is. Users could call our .new() API, Raku would make sure the filename existed, and we could go on with translating.

But there’s something more we can take advantage of here, and that is the fact that any Raku object (and anything we can instantiate is an object) is a type as well. We haven’t mentioned that because we really couldn’t use that information until now.

The original Perl code is littered with clues to types, hidden in the variable names. When we wrote our own API call, the Perl code called the file name $sNm. The ‘s’ tells the Perl compiler nothing, but it tells us that $sNm is a String type. Perl may not have true types, but Raku does. Let’s fix our attribute with that in mind.

has Str $._FILE is required;

We knew all along that $._FILE is a string of some sort, but telling Raku that lets it allocate space more efficiently. Making sure it’s a required attribute lets anyone that calls new() know if they forget an argument. We could go a little farther with this, but locking down attributes will help in the long run, when we start dealing with the pack and unpack built-ins.
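If it helps to see the same idea outside of Raku, Python’s dataclasses give a rough analogue. This is a sketch of mine, not anything from the module; the Storage class and its field are hypothetical. The field is required, exactly like is required, though unlike Raku, Python won’t enforce the str annotation at runtime.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Storage:            # hypothetical stand-in for OLE::Storage_Lite
    _FILE: str            # required, like "has Str $._FILE is required"

print(Storage("test.xls")._FILE)   # test.xls

try:
    Storage()             # no _FILE: the constructor refuses
except TypeError as e:
    print("refused:", type(e).__name__)   # refused: TypeError
```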

Packing It All In

We’re now getting to the heart of the module. There’s a lot of mechanics above us, allocating objects and doing math and checking types, and not much below us. The class’ entire purpose is to read and write OLE-formatted files. We’ll talk more about the boilerplate, but here’s the real meat of the file.

Let’s start with what should be simple, reading in data. Just like in Perl, we open a file and get back a “file handle” (assuming the file exists, of course). By default, calling my $fh = open $._FILE; gives us a read-only file handle. The file handle itself has a bunch of attributes associated with it, but the important one right now is its encoding.

Namely, the fact that it has none. An OLE file is essentially a miniature filesystem (probably based on FAT) packed onto disk, complete with a root directory, subdirectories and files. Files have names encoded in UCS-2, but the rest is entirely dependent upon what the application requires.

The upshot of which is that we can’t read the format with something simple like my @lines = $fh.lines; which would read line after line into the @lines array. Instead we’ll use calls like read() and write() that return byte-oriented buffers.


All OLE files start off with the header “\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1”, so we should probably start there. That’s important twice in the code, in fact. First, when we’re reading off disk, we can check it against what we’ve just read to make sure this file is OLE, and not, say, a JSON file. Later on, when we’re saving out an OLE file, we can write it as the header string.

constant HEADER-ID = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

I’ll make it a constant as well, so when I revisit this code in a month I don’t have to go looking in specs for ‘0xd0 0xcf’ to remember what this is. Reading is straight-forward too. It needs just a byte count.

my Buf $header = $ 8 );

Something important to notice here is the type, ‘Buf’. If our file were in Markdown or JSON, we could get away with just writing my @lines = $fh.lines; like I tried earlier. But these are raw bytes, hindered by no interpretation. Let’s see what happens when we compare these bytes to our HEADER-ID.

t/01-internals.t ............ Cannot use a Buf as a string, but you called the Stringy method on it
  in method _getHeaderInfo at /home/jgoff/GitHub/drforr/raku-OLE-Storage_Lite/lib/OLE/Storage_Lite.pm6 (OLE::Storage_Lite) line 169
  in block <unit> at t/01-internals.t line 42

Another brick in the wall

Ka-blam. But… hold the phone here a minute, I just said $header eq HEADER-ID, I didn’t write anything like ‘Stringy’! There’s no ‘Stringy’ in the source… oh. HEADER-ID is a string, so Raku is being helpful. I’m trying to use string comparison (‘eq’) between something that’s not a Str ( $header ) and something that is (HEADER-ID).

Pull up the Stringy documentation, and look for the Type graph. Midway down you’ll see ‘Buf’ and ‘Str’, as of this writing Buf is on the left, and Str is popular so it’s in the middle.

Trace the inheritance paths from Buf and Str upwards, and you’ll see they pass Buf -> Blob -> Stringy and Str -> Stringy, and stop. What the error message therefore is saying is this, anthropomorphized:

You wanted to convert Buf to Str, and didn’t care how you did it. So I looked. First, on the Buf type. No .Str method there, at least without arguments. No good. So I looked in its parent, Blob. Nothing doing there. Then I looked at Stringy, and couldn’t find anything else.

There’s nothing above me, nothing below. So I’ll let you know I looked for a conversion method in a bunch of places, stopped at Stringy, and couldn’t go any farther. Sorry.


You’re probably wondering how to get out of this quandary. Reading the Blob documentation closely, you might think that the decode method is the way out of our present jam. If you look closer, though, there’s a spanner in the works. “\xD0” is the byte 0xD0, so if you try to decode to ASCII, you run into the problem that ASCII only covers 0x00-0x7F; everything outside of that is undefined.
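You can watch exactly this failure happen in Python, where a strict ASCII decode refuses anything over 0x7F:

```python
# 0xD0 falls outside ASCII's 0x00-0x7F range, so a strict decode blows up.
header = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"

try:
    header.decode("ascii")
except UnicodeDecodeError as e:
    print("no good:", e.reason)   # no good: ordinal not in range(128)
```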

Packing for vacation

If you’ve kept up with things, you might surmise by now that the key to our quandary lies in the pack and unpack builtins. Specifically unpack(), because we’re trying to “decode” a buffer into something suitable for Raku.

Unless you’ve done things like network programming or security, the pack and unpack builtins are going to be unfamiliar territory. The closest analogue of pack() is the builtin sprintf().

Both of these builtins take a format string telling the compiler how to arrange its arguments. Both of them take a mixture of string and integer arguments afterwards. But while sprintf() takes the arguments and treats its output as a UTF-8 encoded string, pack() takes the same arguments and treats its output as a raw buffer of bytes.

And now you can see one way out of our little predicament. If we could just find the right invocation, pack() would be able to take our string “\xd0\xcf…” and turn it into a Buf object. Then we could compare the buffer we got by reading 8 bytes to the buffer we expected.

So instead of cluttering up the main code, let’s write a quick test.

use experimental :pack;
constant HEADER-ID = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

use Test;
my $fh = open "test.xls";
my Buf $buf = $ 8 );

is $buf, pack( "A8", HEADER-ID ); # Pack 8 ASCII characters


Let’s take it from the top. We tell Raku to use the “experimental” pack() builtin, and declare the header we want to check against. Then we tell Raku we want to use the Test module, and open a new Microsoft Excel test file.

Last, we read a chunk of 8 bytes from the file into a buffer, and check to see that the 8 bytes matches the header we expect to see. Now, how did we get that weird ‘A8’ string in there? I thought pack() looked more like sprintf()?

Well, it does, to an extent. I/O routines like sscanf() and sprintf() can do all sorts of things to your strings and numbers on the way in and out; think about what ‘%-2.10f’ means in a format specifier, for instance. You can follow along with the unpack() documentation if you like.

pack(), by contrast, just takes 8, 16, or 32-bit chunks of your input, and places them into a buffer. The “A” in “A8” says that it wants to convert an ASCII-sized chunk of your input (“\xd0” in our case) into a byte in the buffer, so our Buf now looks like ( 0xd0 ).
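In Python terms, the test above boils down to turning the expected header string into a buffer of bytes, character by character, and then comparing buffers. HEADER_ID here mirrors the Raku constant, and the “read back” bytes are simulated:

```python
# HEADER_ID mirrors the Raku constant; each character becomes one byte.
HEADER_ID = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"

# The moral equivalent of pack("A8", HEADER-ID): string in, byte buffer out.
expected = bytes(ord(c) for c in HEADER_ID)

read_back = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"   # what read(8) would hand us
print(expected == read_back)   # True
```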

I could just as well have said “AAAAAAAA” in order to translate all 8 characters of the buffer, but I think it’s a little tidier to use the ‘repeat’ option, and say “A8” in order to convert just 8 characters (yes, yes, I know, they’re glyphs, but let’s not confuse matters.)

I could write “A*” just as well, but “A8” makes sure that 8 and only 8 (the number that thou shalt count to…) characters get converted. I doubt that the header in an OLE file will change, but it’s a nice bit of forward planning.

For those of you that made it this far, thank you. As usual, gentle Reader, if you have any comments, criticisms (constructive, please) or questions, feel free to post them below.

Next week I’ll delve deeper into the mysteries of pack() and unpack() and some of the tips and tricks I use to keep on my toes and make sure that I generate clean Microsoft-compatible output.

Rewriting Perl Code for Raku III: The Sorceror

Last week, we started testing, learned how to create proper Raku classes, and the basics of functions. This time we’ll take a closer look at functions, arguments, and make some decisions about the API. And maybe while writing this I’ll argue myself out of a decision. It’s happened before.

One good thing about writing about a module is that you can slip into a certain mindset. For instance, right now I’m thinking a few paragraphs ahead, wondering how to explain why I changed the API from Perl 5 references to regular Raku types.

It’s at odds with some of the principles I laid down at the start, which states that I should have minimal changes in the API from Perl to Raku. In Perl 5, you would create the “filesystem root” object like so:

my $root = OLE::Storage_Lite::PPS::Root->new(
  [ 0, 0, 0, 25, 1, 100 ],
  [ 0, 0, 0, 25, 1, 100 ],
  [ $workbook, $page_1, $sheet_1 ]
);

with a bunch of references to lists. By all rights, and the principles I set up earlier, the Raku equivalent should be almost exactly the same:

my $root =
  [ 0, 0, 0, 25, 1, 100 ],
  [ 0, 0, 0, 25, 1, 100 ],
  [ $workbook, $page_1, $sheet_1 ]
);

In fact, all I did was copy and change two characters, specifically the Perl ‘->’ to the Raku ‘.’ operator. Clean, and very simple. And I think what I’ll do is actually just change the code back to using the Perl reference, at least in the API. Dereferencing it will be just a few lines, and I’ll have to change it in the tests as well, but I think the pain will be worthwhile.

This way I don’t have to field questions like “Why did you end up potentially breaking old code?” during talks. See, speaking at conferences about your code really can be a useful motivator!

I’d like a formal argument, please

So, I think I’ve settled on Perl-style formal references, at least for the current iteration. There are actually better ways to do this, but I’ll leave that for the proper Raku version. For right now, quick-n-dirty is the name of the game.

Moving on, we see an important method in the original Perl code, saving an object to disk.

sub save($$;$$) {
  my($oThis, $sFile, $bNoAs, $rhInfo) = @_;
  #0.Initial Setting for saving
  $rhInfo = {} unless($rhInfo);
  # ..

As I’ve mentioned before, OLE::Storage_Lite has been around for a long, long time. And it’s obvious here. Function prototypes (not signatures, which are a different kettle of fish) and the use of ‘$oThis’ instead of the more conventional ‘$self’.

Being prototypical

Prototypes were originally meant as a way to save you from having to write checks in your code. Theoretically, if your function was called sub save($$) and you tried to call it with save($fh) you would get an error, because the ‘$$’ means the subroutine took two arguments, and you gave it just one.

But it also predated objects (yes, Virginia, objects in Perl haven’t been around all that long.) and they could have unforeseen side effects. So they were a fad for a while, but quickly faded out of existence.

These days they’re a reason for a more experienced Perl hacker to take the junior aside and explain quietly why we don’t use those anymore, and point them to some modern references, like Modern Perl (not an affiliate link, yet.)

Let’s at least partially convert that to Raku, like so:

method save($sFile, $bNoAs, $rhInfo) {
  #0.Initial Setting for saving
  $rhInfo = {} unless($rhInfo);
  # ..

The ‘$oThis’ means that this is a method call, so instead of writing sub save( $oThis, ... ) we can rewrite it to a method and gain ‘self’ instead of the arbitrary variable ‘$oThis’. Of course we do have to do a search-and-replace on ‘$oThis’ with ‘self’, but that’s relatively simple. More complex is what to do with the ‘;’ in the original prototype.

Having options

It’s worth pointing out that OLE::Storage_Lite is taken at least in part from another (larger) module, OLE::Storage. This means that the internal code is redundant in a few places. Raku would let us rewrite what we have as:

method save($sFile, $bNoAs, $rhInfo = {}) {
  #0.Initial Setting for saving
  # ..

making $rhInfo an optional variable with a default value. Now, this is a pretty common pattern for a recursive method, so I did a bit of digging. Namely I grep’ed for ‘save’ in the original (all-in-one) module, and found no recursive calls to it.

Debugging both sides now

This is also where the test suite I wrote earlier comes in handy, as it actually exercises the ‘save’ method. So I added a quick debugging message warn "Saving $rhInfo"; to my local copy of the code, and ran the test suite. Seeing just one ‘Saving …’ message in my test output convinced me it wasn’t recursive. So now the code just looks like:

method save($sFile, $bNoAs) {
  #0.Initial Setting for saving
  my %hInfo;
  # ..

Also, since $rhInfo is created in this method, there’s no reason to leave it as a reference. So the initial ‘r’ goes away, and we have left just ‘%hInfo’. It may get passed in to other methods, but Raku lets us pass hashes and arrays as ordinary variable types, so I’ll take advantage of that.

To be fair, leaving it as a reference would have saved me a bit of typing, but I’d already kind of decided that at least internally I’d try to use Raku types and calling conventions, and that left me with the choice of how to pass variables around.

Having options

Finally, there’s the question of what to do with the semicolon. Remember at the start, the function prototype was ‘($$;$$)’ which meant $oThis and $sFile were before the semicolon, and $bData and $rhInfo were after. I can now reveal that ‘;’ in a Perl prototype means that whatever appears afterward is optional.

True to Raku’s nature, I can account for this in at least two ways. One way would be to decide that $bData is always there and just has a default value, probably 0. That would look like method save( $sFile, $bData = 0 ). But the documentation puts $bData in square brackets, indicating that it’s optional.

Raku has an alternate syntax to indicate if a variable is optional, which looks like method save( $sFile, $bData? ). I think this method is better than the alternative syntax because it states clearly that $bData is optional. Both methods work, I just happen to like the ‘?’ modifier.

Waiting for Huffman

Moving on, we have this wonderful line of code:

$rhInfo->{_BIG_BLOCK_SIZE}  = 2**
  (($rhInfo->{_BIG_BLOCK_SIZE}) ?
    _adjust2($rhInfo->{_BIG_BLOCK_SIZE})  : 9);

When I was translating this initially, I was in something of a drone mindset, not truly thinking about what I was doing. I’d copied the $rhInfo variable into the method signature and just kept on writing. I ended up with a statement that I eventually shortened quite a bit.

$rhInfo.<_BIG_BLOCK_SIZE> = 2**
  ( $rhInfo.<_BIG_BLOCK_SIZE> ??
    _adjust2( $rhInfo.<_BIG_BLOCK_SIZE> ) !!
                                        9 );

The ‘.’ after $rhInfo indicates we’re dealing with a reference, and the <..> notation is now how barewords look inside hashes. The old {_BIG_BLOCK_SIZE} is still there, but it’s pronounced {‘_BIG_BLOCK_SIZE’}. A lot of people use the {‘..’} in Perl already so it’s not a big change, and it actually simplifies the backend enormously.

Also, at the start Larry and Damian pulled statistics on Perl code from CPAN and other repositories. They were looking for operator frequencies, among other things. Frequently used operators like qw() and -> got even shorter in Raku.

Others, like the ternary operator, weren’t so lucky. It got longer, and stretched to ‘?? .. !!’. So this is one place where the code will look a little funky. Maybe one day I’ll write a slang to fix it, but back to work.
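Funky or not, it reads naturally once the pattern settles in. A minimal sketch of the new ternary, using this module's default block size as the example:

```raku
# Raku's ternary: CONDITION ?? TRUE-VALUE !! FALSE-VALUE
my $configured-size = 0;
my $power = $configured-size ?? $configured-size !! 9;
say 2 ** $power;  # 512, the default OLE big-block size
```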

Trimming the verge

Earlier I mentioned that this module was trimmed down from a much larger full OLE reader/writer. This was the first place that became evident. Since $rhInfo is now called %hInfo and initialized inside the method, this statement deserves to be looked at a little closer.

my %hInfo;
%hInfo<_BIG_BLOCK_SIZE> = 2**
  ( %hInfo<_BIG_BLOCK_SIZE> ??
    _adjust2( %hInfo<_BIG_BLOCK_SIZE> ) !!
                                        9 );

After replacing $rhInfo with %hInfo this is what I got. But since %hInfo is defined just above, the test %hInfo<_BIG_BLOCK_SIZE> will never be true, so this entire block can be reduced to:

my %hInfo = _BIG_BLOCK_SIZE => 2**9;

While I’m here I’ll delete _adjust2(). No code pathway uses it, so out it goes. I’ll restore it if I have to, but right now I want the test scripts to pass, and that’s it. I’ve got the original source, and a map from Perl to Raku, and that’s all I need.

Culling yaks from the herd

Where there’s smoke there’s fire, so I stop what I’m doing and grep out every ‘sub X’ declaration in the source, putting the list in a scratch monkey. Then I go through the source (which I keep below the new Raku source, deleting lines as I go) and look for methods that aren’t used, like _adjust2(). I delete each of these methods with extreme prejudice, because every line of code I don’t see is one I don’t have to translate.

Checkpoint in git, and now it’s time for a lunch break. Afterward, I get into the save() method, and see what looks like a new yak to shave. Or a package to translate, to be precise.

  if(ref($sFile) eq 'SCALAR') {
    require IO::Scalar;
    my $oIo = new IO::Scalar $sFile, O_WRONLY;
    $rhInfo->{_FILEH_} = $oIo;
    # ...

In both Raku and Perl, you can create a single method called new( $sFile ) that treats $sFile as either a filename (plain scalar), file content (scalar reference) or file handle (object). In Perl, if we want to handle filenames, file contents, or file handles differently, we have to switch on ref() like this, or use different method names.

In Raku, we can handle this differently. In fact I can write the code to save() to a filename, and add save() to a filehandle later with no modifications needed. Above, I briefly touched on the fact that you can write more than one new() method, as long as the two method signatures are distinct.

multi method save( Str $filename ) {...}
multi method save( IO::Handle $fh ) {...}

Raku will let you write two methods called save(), as long as it can tell which one to call at runtime. So, I can call $oDt.save( '/tmp/test.xlsx' ) or $oDt.save( $out_filehandle ), and Raku will “dispatch” the call to the right save() method automatically.

We call it ‘multiple dispatch’ for just that reason, dispatching a function call to multiple versions of a method. And this means that I can write the first save( Str $filename ) method without worrying about the other methods. I don’t have to add a new if-then branch to the existing code, or modify save() in any way.

I can just write my save() method and ignore the other IO:: types. Also, if someone gets my code later and wants to add a save() method that saves to something I know nothing about, they can write their new save() method without interfering with mine.
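The dispatch behaviour is easy to demonstrate outside the module. A minimal sketch, with an illustrative sub name rather than the real save():

```raku
# Two candidates with the same name; Raku picks by argument type.
multi sub describe( Str $filename )  { "save to file '$filename'" }
multi sub describe( IO::Handle $fh ) { "save to an open handle" }

say describe( '/tmp/test.xlsx' );   # dispatches on Str
say describe( $*OUT );              # $*OUT is an IO::Handle
```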

In this installment we’ve covered the basics of function and method calls, delved into the ternary operator, removed dead code and learned a little about multiple dispatch. Next time, we’ll open the binary filehandle we created above and delve into the mysteries of pack() and unpack().

I’ll also show you a new (yes, I couldn’t resist) grammar-based version of pack() that should cover the entire Perl gamut of packed types, with a bit of patience and a large enough test suite.

As always, gentle Reader, thank you for your time and attention. If you have any (constructive, please) comments, criticisms or questions, please let me know in the comment section below.

Rewriting Perl Code for Raku II: Electric Boogaloo

Picking up from Part One, we’d just finished up rewriting a Perl script into the test suite for the Raku translation of OLE::Storage_Lite. Raku programming is made easier by having lots of tools, but Microsoft documents aren’t yet well-represented in the Raku ecosystem.

Being able to read/write OLE allows us to create a whole range of Microsoft documents (at least where they’re documented). We’re focusing on Excel here because of how heavily it gets used: many businesses still rely on it for day-to-day task management, time tracking and home-grown processes.

I’ve been known to wax philosophical about this after a few Westmalle Tripels at various conferences. Now is the time for doing something about it. Here’s what our burgeoning test suite looked like, at least in part. The current code is in raku-OLE-Storage_Lite over on GitHub. I’ve gotten rid of most of the Perl 5 test skeleton, but the essence remains.

use v6;
use Test;
use OLE::Storage_Lite;

plan 1;

my $oDt = OLE::Storage_Lite::PPS::Root.new(
  ( 0, 0, 16, 4, 10, 100 ), # 2000/11/4 16:00:00:0000
  ( $oWk, $oDir )
);
subtest 'Root', {
  isa-ok $oDt, 'OLE::Storage_Lite::PPS::Root';
  is $oDt.Name, 'Root Entry';
  is-deeply $oDt.Time2nd, [ 0, 0, 16, 4, 10, 100 ];
  # ...

Originally there really weren’t any Perl 5 tests for this module. I’m sure the original author treated the entire module as a black box, and they were happy to be able to run samples/, open the new test.xls in Excel, and when it actually read the file, treat that as ‘ok 1’, push it to CPAN and call it a day.

Testing, testing

That’s wonderful, and I may eventually adopt that methodology. For the moment, the lack of a test suite leaves me a bit unsatisfied. I suppose I could treat the entire module as a black box and fix the translated version line-by-line as I go through it. I’ll have to do that eventually (spoiler alert: That’s actually where I am – I’m writing these pieces a bit after the fact.)

That leaves me with the question of what to test, and what the quickest way to get there is. The individual Directory, Root and File objects are exposed to the user, and are part of the public API. So it makes some sense to create an object, look at the internals, and do my best to match that in Raku.

I Think I’m A Clone Now

There’s always two [implementations] of me standing around… I don’t want to get sidetracked by reading the entire OLE spec. I might start to realize what a huge job this really is, and abandon ship. So, I’m going to limit myself to the following:

Create a narrowly defined 1:1 clone of the exact source of OLE::Storage_Lite in Perl 5. The objects will act exactly like the Perl 5 version, as will the API. This way I don’t have to think about what the API should do, how it should look in Raku, how the objects get laid out, anything fancy. All I need to worry about is:

  1. When I write warn $oDt.raku, does the output look the same as use YAML; warn Dump($oDt); in Perl 5?
  2. When I write the final file to disk, does the Raku code output exactly the same file as the original Perl 5 version?

That’s it. It takes away a lot of possibilities, but it lets me focus on getting the job done, not how things should look. Being able to test how the individual objects look will tell me that the read API works and saves enough data to be able to reconstruct the object in memory.

Conversely, being able to match the binary output tells me that the write API works, so I’ve effectively tested as much as the original module did. Plus I can automate some of the process, especially on the read side.

Lost in Translation

You can check out the current source at raku-OLE-Storage_Lite, and follow along with some of the changes I’ve made. I also made sure to keep a working copy of the original OLE::Storage_Lite Perl 5 module around. My Raku tree right now is very close to Perl 5.

I can insert a debug statement like die "[$iBlockNo] [$sData]\n" in the Perl 5 code, go to the equivalent line in Raku, and expect that when I run the two test suites, that they’ll die in exactly the same way.

This way when they don’t, I can immediately narrow down the problem simply by moving the ‘die’ statements up in the code until they return the same values. The line immediately below the ‘die’ statement will be the culprit.

The Nitty Gritty Perl Band

I’ll mention one thing in passing – the original Perl 5 source code is in a single file containing all of the packages. That’s not Raku style, so I’ve unpacked it into lib/OLE/Storage_Lite/* following the usual style of one Perl 5 class – one file.

So, time to get our hands dirty. The new Raku module won’t compile for quite a while, so we’d better put this into git. I’m also using App::Mi6 to do my development and eventual push to CPAN, so all of that boilerplate is there too.

So, cue the montage scene of the dedicated Raku hacker pounding away at the keyboard, with the occasional break for food and/or adult beverage. Looking over her shoulder, we see a familiar split-screen view, with Perl 5 code on top, and a new Raku file below.

use OLE::Storage_Lite::PPS;
package OLE::Storage_Lite::PPS::Root;
use vars qw($VERSION @ISA);
@ISA = qw(OLE::Storage_Lite::PPS);
use OLE::Storage_Lite::PPS;
unit class OLE::Storage_Lite::PPS::Root is OLE::Storage_Lite::PPS;

Raku has classes where Perl 5 has packages. The ‘unit’ declaration there says that the class declaration takes up the remainder of the file. This is sort of how Perl 5 does it, but gets rid of the ‘1;’ at the end of your package declaration.

It’s also useful for another reason I’m not going to show. Namely that the Perl 5 code is directly below the Raku code, commented out. I’m flipping between vim windows to delete lines as I translate them by hand. So the ‘unit class’ declaration helps in case I accidentally un-comment Perl 5 code – I’ll get big honkin’ warnings when I run the test suite.


Raku borrowed liberally from Perl 5’s Moose OO metamodel (for those of you that remember that module’s release), to the point where using Raku will feel very similar. Just drop a few bits of syntactic sugar that Moose needed to work under Perl, and it’ll feel the same.

In this case the ‘is’ does the same job as in Moose, to introduce a parent class. Raku doesn’t need the sugar that Moose sweetens your code with, so you can just say your class ‘is’ a subclass of any other class.

Let’s keep rolling along here, with the next lines of the Perl 5 library:

require Exporter;
use strict;
use IO::File;
use IO::Handle;
use Fcntl;
use vars qw($VERSION @ISA);
@ISA = qw(OLE::Storage_Lite::PPS Exporter);
$VERSION = '0.19';
sub _savePpsSetPnt($$$);
sub _savePpsSetPnt2($$$);
use OLE::Storage_Lite::PPS;
unit class OLE::Storage_Lite::PPS::Root:ver<0.19> is OLE::Storage_Lite::PPS;

Moving along… Okay, ya caught me, ‘:ver<0.19>’ is something new that we should add. Versions are now integrated into classes, so you can check them and even instantiate based on version number.

The module actually doesn’t export anything, so we don’t need Exporter at all. Raku enables ‘strict’ automatically, has IO modules in core, and doesn’t need Fcntl. The forward declarations aren’t needed for Raku, so all that’s left is the module’s version number, which gets added to the class name. You can add other attributes, too.
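Here’s a tiny sketch of that version syntax in action (the class name is illustrative):

```raku
# Versions attached with :ver<> are introspectable via the metamodel.
class Demo:ver<0.19> { }
say Demo.^ver;  # v0.19
```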

Making things functional

To keep things simple for me writing the code, and me having to read the code weeks, months or years later, I want as close to a 1:1 relation between Perl 5 and Raku as I can. Another place where this requires an accommodation (but not much of one) is just a few lines down, writing the creation method ‘new’.

sub new ($;$$$) {
    my($sClass, $raTime1st, $raTime2nd, $raChild) = @_;
        # ...

By this point you’ll probably see more of why I say this module is a hard worker. It’s been around a long time, and function prototypes like this are one easy way to tell. Let’s rewrite it in a more modern Perl 5 style before making the jump to Raku, with function signatures.

sub new($sClass, $raTime1st, $raTime2nd, $raChild) {
        # ...

Just drop the old function prototype, and replace it with the variables we need to populate. Well, almost. If you know what a subroutine prototype is, you might think I’m pulling a fast one on you. And you’d be right. Look back at the original Perl 5 code, and you’ll see ‘($;$$$)’ is the prototype.

The ‘;’ separates required variables from optional variables, and we haven’t accounted for that in our Perl 5 code. Since I’m not here to modernize Perl 5 code but convert it to Raku, I’m going to ignore that in Perl 5 and go straight to Raku.

multi method new( @aTime1st?, @aTime2nd?, @aChild? ) {
  self.bless(
    Time1st => @aTime1st,
    Time2nd => @aTime2nd,
    Child   => @aChild
  );
}

Under Construction

And there we are. Now, there’s quite a bit to take in, so I’ll take things slow. The first thing you’ll notice is the keyword ‘multi’. In Perl 5, you get to hand-roll your own constructors, so you can make them any way you like. In this case, the author chose to write new($raTime1st, $raTime2nd, $raChild), which is pretty common.

Raku gives me a default ‘new’ method, so I only need to hand-roll constructors when I want. Since I want to keep as close as reasonable to the original API, I’ll write a constructor that takes 3 arguments too. In my case I chose to simplify things just a bit here.

I’ve found over several years of writing Raku code that I rarely use references. In Perl 5 they were pretty much the only way to pass arrays or hashes into a function, because of its propensity to “flatten” arguments.

In Raku, you can still use the Perl 5 style, but formal argument lists are the way to go in my opinion. If you need to pass both an array and a hash to a Raku function, go for it. I encourage that in my tutorial courses, and recommend it to help break students out of their Perl 5 mindset.
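A minimal sketch of what that buys you; the sub name is illustrative:

```raku
# Raku signatures keep containers intact where Perl 5 would
# flatten everything into one long @_ list.
sub takes-both( @items, %options ) {
    say @items.elems;       # 3 (the array arrived whole)
    say %options<verbose>;  # True
}

takes-both( [ 1, 2, 3 ], { verbose => True } );
```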

This is not to say that there’s anything wrong with Perl 5’s argument list, in fact they’ve taken some ideas from Raku for formal argument lists, and I encourage that. Cross-pollination of ideas should be encouraged, it’s how both languages grow and add new features.

Last week was about the overall module, this week we delved a bit into the OO workings. Next week we’ll talk about references, attributes, and maybe progressive typing.

Rewriting Perl Code for Raku

This time around we’re going to talk about how to rewrite Perl code in Raku. Even in 2019, a lot of the office world revolves around spreadsheets, whether they be Excel, LibreOffice or simple .csv files. Perl 5 has a plethora of modules to handle them; a quick search for ‘Spreadsheet’ on MetaCPAN should convince you of that.

The Raku world doesn’t have quite as many modules as you’d expect, though. While it’s been around for a few years, “heavy lifting” modules like the Spreadsheet family really haven’t arrived yet. Reading those formats involves packing and unpacking binary data, and in Perl 5 that centered around the pack and unpack builtins, which are relative newcomers to Raku.

But Raku has built-in binary buffers, which take care of most of the need for pack/unpack. The main obstacle I can see is the OLE storage format itself. Basically it’s Microsoft’s way of packing a file system into a single data file. At this point the proverbial yaks start to pile up, and reasonable people say “You know, Excel still accepts .csv files, I know how to build those.”
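Before the yaks arrive, here’s a hedged sketch of what those built-in buffers do, standing in for one common unpack template:

```raku
# Pulling a little-endian 32-bit integer out of a byte buffer,
# roughly what Perl 5's unpack 'V' would do.
my $buf =, 0x02, 0x00, 0x00, 0x00);
say $buf.read-uint32(1, LittleEndian);  # 2
```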

Enter raku-OLE-Storage_Lite – this is my translation-in-progress from Perl 5 to Raku. As of this writing it can read an entire OLE file (without data) and write a good portion of the sample file – I believe I’ve got maybe two methods left to debug.

Knee deep in yaks

CSV files are easy to write, but they come with their own set of troubles. When you import a .csv file into your Excel app (or LibreOffice, or whatever) you’re faced with a complex dialog asking you how to import your data, and the average user doesn’t want that every time, they just want to open their spreadsheet.

So, it’s time to follow Liz’s lead and rewrite an existing Perl module in Raku. The first thing I did was go to Spreadsheet::ParseExcel and see how they did things. Within a few minutes I’d already encountered the first yak. After opening the file, it delegates the work to OLE::Storage_Lite, which is much like James Brown, the “hardest-working man in show business”.

It’s still on version 0.19 at the time of writing, but I assure you that’s only because the current maintainer hasn’t updated the version to reflect reality. It may be legacy Perl rough-and-tumble code, but it’s been around for a long time. It wears its battle scars proudly.

It relies heavily on pack and unpack, which are still technically experimental in Raku. The OO and coding style betrays its pre-5.00 origins, and the tests are, well, very pragmatic. “Does it load? Great! Can it convert timestamps internally? Great! Ship it!”

To its credit, there’s a sample directory with a script to view the contents of the internal filesystem of any OLE file, and a sample writer to create a known-working OLE file. That’ll do as a starting point.

Buckling down

So, reading an Excel spreadsheet means reading an OLE file system. And when I say file system, I’m not kidding. Inside your typical .xls file, there’s a small header and a root object. The root object contains “pointers” (really file offsets) to a document object, and inside that are file objects, each with pointers to the different blocks.

This is all intended to reflect the original disk layout, so it looks very much like an NTFS superblock and block layout. The documentation seems to have moved to this page detailing OLE 1.0 and 2.0 formats, I’m not at all certain what the current version has.

How are Excel spreadsheets arranged in here? Worksheets are OLE directories, and inside each worksheet, tabs are individual files. How’s that for a bit of inspiration? Luckily the Root directory, Files and nested Directories are all separate objects, with at least a few common methods aggregated into a superclass.

Legacy Code

This is a long-winded way of saying the module in question is very much legacy code. And, as I want to bring it into the proverbial light, I’ve got to give some issues some thought.

  1. No useful tests, so I’ll have to write those.
  2. How much code do I want to sacrifice?
  3. How much can I save?

Well, I can put off #2 and #3 while writing some tests. Whoa, wait a minute. I don’t have a test file to work with, just some scripts over in sample/. Mumble, mumble, more yaks. Read the README, find the sample script that will create one, run that.

Great, I’ve got a sample test.xls file. But given the amount of potential bit-rot it seems prudent to actually make sure that I’ve got a working Excel file before committing a few days (ha!) to getting a module working. Double-click it, launch into Excel’s cloud-serviced app, find that it’s one of those Win10 panes I’ve never figured out how to close, open task-killer, kill that.

Launch LibreOffice which I happen to have lying around – my current project at work is parsing a spreadsheet in Perl 5, which is what inspired this whole workload.

Yep, that parses; looks a bit odd because it’s coming up with a Japanese font, and some arbitrary English text, but it works. Also, looking at the code it generates all three object types – Root, File and Dir, so it’ll exercise the major code paths. Bonus.

Testing, testing

Now I’ve got the makings of a simple test file. The script builds objects individually, so I can run the individual calls, and check that the object’s internals look the way I want.

my $oDt = OLE::Storage_Lite::PPS::Root->new(
  [ ],
  [ 0, 0, 16, 4, 10, 100 ], # 2000/11/4 16:00:00:0000
  [ $oWk, $oDir ]
);

In Raku, this converts to:

my $oDt = OLE::Storage_Lite::PPS::Root.new(
  ( 0, 0, 16, 4, 10, 100 ), # 2000/11/4 16:00:00:0000
  ( $oWk, $oDir )
);

I’ve made one change already, to make things simpler for Raku users. In Perl, you have to pass lists as references unless you want to use the new function signatures. In Raku, you can just pass lists as you would ordinarily to your method call.

Using native data types rather than passing references around may seem a bit odd at first to new Raku programmers, but the new variable classes are easier to enforce strong typing on later, when you get used to the language.

Going with the flow

Now we’ve got something we can test, namely making sure that we’ve got a valid OLE Root document. So, before we go ahead with the code, I’ll share a few little things. I know very little about this code, so I want to make sure that I intimately copy each detail of the object at this stage. Later on I might get fancy and replace things with their own object types, but for now, my goal is going to be 1:1 replication.

I tend to like tmux as a shell environment, haven’t really gotten along with UIs. So, keeping in mind that I wanted an absolute 1:1 copy of the original object, I ended up doing this:

  1. Switch to new window, open my copy of ‘samples/’ in vim
  2. Add ‘use YAML; die Dump( $oDt );’ just below the line where it gets created
  3. Switch to new window, run the sample script, copy the YAML output
  4. Close the two new windows I created to keep clutter down
  5. Paste the YAML code into the new Raku test.
my $oDt = OLE::Storage_Lite::PPS::Root.new(
  ( 0, 0, 16, 4, 10, 100 ), # 2000/11/4 16:00:00:0000
  ( $oWk, $oDir )
);
# Pasted from the Perl 5 YAML dump:
#   Name: "R\0o\0o\0t\0 \0E\0n\0t\0r\0y\0"
#   No: ~
#   Time2nd:
#     - 0
#     - 0
#     - 16
#     - 4
#     - 10
#     - 100
# and so on...

This should contain all I need to create an OLE file from this set of objects. I’m using this as a sneaky way of not reading the spec, at least not yet. As Wirth’s old title goes: Algorithms + Data Structures = Programs. Using YAML (or Data::Dumper) gives me the data structure, copying the Perl 5 code into Raku gives me the algorithm.

I should almost be able to keep line-for-line fidelity, so when a patch is posted to the Perl 5 source I can import it into Raku without too much trouble. But once I’ve got a better test base and a few users in Raku I’ll probably rewrite this whole module in a more Raku-ready fashion. I can keep the old module around for reference.

Encoding worries

But we’ve also got a surprise lurking here. “R\0o\0o\0t\0 \0E\0n\0t\0r\0y\0” looks like binary garbage, but is actually UCS-2, I think. If it is, then the OLE file is limited to a subset of Unicode. I can put restrictions on it later if I have to, but ATM I actually don’t care.

I’ve done enough time in the i18n salt mines that I know how to deal with this. Store the string in the best format possible (UTF-8 here) internally. When the time comes to write it to the network or disk, translate it to the final encoding.
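A hedged sketch of that encode-at-the-edge step (assuming UTF-16LE on disk, which covers UCS-2 for characters in the Basic Multilingual Plane):

```raku
# Keep the name as a plain Str internally; produce bytes only
# when it's time to write to disk.
my $name  = 'Root Entry';
my $bytes = $name.encode('utf16le');
say $bytes.bytes;  # 20 (two bytes per character)
```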

This way I can see what all the attributes are at a glance without changing encoding. I can also manipulate everything using regular Raku code until the last moment. If I have to, I can use Raku’s gradual typing to constrain the string. More importantly, I don’t have to do any of this now.

Got any change?

This means I’m going to change things just a little bit more. When data gets added to ‘Name’ I’m going to assume it’s UTF-8. Since I’m not doing any I/O yet, I can make whatever assumptions I want. Keeping the internals simple keeps my life simple, at least.

So I’ll write out a quick is-deeply test and get on with things:

is-deeply $oDt, (
  Name => 'Root Entry',
  Time2nd => ( 0, 0, 16, 4, 10, 100 ),
  # ...
  Child => ( $oWk, $oDir )
);

This looks pretty straightforward, and almost how you’d write the original test in Perl 5. It won’t run yet, but that’s something we’ll tackle in the next part in the series.

I’m not done quite yet, because I’ve got a lot of these things to write, and not all of them may have the ‘Child’ attribute. I could write a tiny method that skipped over the ‘Child’ attribute along with anything else I wanted, but that felt clumsy. It looked like:

ok sorta-deeply $oDt, (
  Name => 'Root Window',
  Time2nd => ( 0, 0, 16, 4, 10, 100 ),
  # ...
), ( 'Child' );

And notice that sorta-deeply is a function that does all the work, then passes a simple Bool back to the test. I’d end up writing all of the code that is-deeply does (except for the recursion), and get something back that’s less useful.

Next time we’ll get into making these tests pass. I’m writing the next section right after this, but you won’t get to see it for another week or so, I’m afraid. If you have questions or comments about the first part of this series, please feel free to comment below.

Templates II: Electric Boogaloo

Last time on this adventure writing the Template Toolkit language in Raku, we’d just created a small test suite that encompasses some of the problems we’re going to encounter. It’s no use without a grammar and a bunch of other parts, but it does give us an idea of what it’s going to look like.

use Test;
use Template::Toolkit::Grammar;
use Template::Toolkit::Actions;

# ... similar lines above this
is-deeply the-tree( 'aa[% name %]a' ),
    [ 'a', 'a', :content( 'name' ) ), 'a', ];
# ... and similar lines below this.

The list here is what we’re going to return to render(), and I’d love to make that as simple as it can be without being too simple. Let’s focus for the moment just on one bit of the test suite here, the array I’m getting back.

[ 'a', 'a', :content( 'name' ) ), 'a', ];

If these elements were all strings, then all render() would have to do is join the strings together, simples!

method render( Str $text ) returns Str {
  my @terms = ...; # magic to turn text into an array of terms
  @terms.join: '';
}

Let’s create the ‘Directive’ class and see what happens, though.

class Directive { has $.content }

my @terms = 'a', 'a', :content( 'name' ) ), 'a';
say @terms.join: '';
# aaDirective<94444485232315>a

Whoops, that’s not what we want. Not bad exactly, but not what we want, either. Well, not to fear. Remember that in Template Toolkit, directives will always return a string. It may be an empty string, but they’ll always return some kind of string.

As a side note, this may not always be true – some directives will even tell the renderer to stop parsing entirely. But it’s a pretty solid starting assumption. For instance, we could say that encountering the STOP directive just makes all future directives return ”.

Of course, I’m harping on the term ‘string’ for a reason. Internally, everything is an object, and every object has a Str method that returns a readable value. Our Directive class didn’t specify one, so we get the default, which returns something like ‘$name<$address>’.

So, let’s supply our own method.

class Directive { has $.content; method Str { $.content } }

my @terms = 'a', 'a', :content( 'name' ) ), 'a';
say @terms.join: ', ';
# a, a, name, a

There. If we supply a .Str method we can make Directives do what we want. INCLUDE directives would open the file, slurp the contents and return them. Argument directives would take their argument name, look up the value, and return that. Or, more likely, would have a context object passed that does the lookup for them.

Where do we go from here?

Next time we’ll convince Grammars and Actions to work together, making processing a template as simple as:

parse-template( $text ).join( '' );

Next in this series on writing your own template language using Raku, you should be able to define your own Template Toolkit directives and have them return the pre-processed text. We’ll add support for context and the ability to do simple ‘[% name %]’ tags, and maybe explore how to change ‘[%’..’%]’ tags on-the-fly.

Thank you again, dear reader, for your interest, comments and critiques.

A Regex amuse-bouche

Before continuing with the Template series, I thought I’d talk briefly about an interesting (well, at least to me) solution to a little problem. System and user libraries (the kind that end in .so or .a, not Perl libraries) have a section at the top that maps a function name (‘load_user’ or whatever) to an offset into the library, say, 0x193a.

This arrangement worked fine for many years for C, Algol, FORTRAN and most other languages out there. But then along came languages that upset the apple cart, like C++ and Smalltalk, where a programmer could write two ‘load_user’ functions, call ‘load_user(1234)’ or ‘load_user(“Smith, John”)’ and expect the linker to load the right version of ‘load_user.’

The problem here is that the library, the linker and all of the other programs in the tool chain expect there to only be one function called ‘load_user’ in any given library.

Those of us that do Perl 5 and Raku programming don’t have to worry about this, but if you ever want to link to a C++ library, you probably should know at least a bit about “name mangling.”

For a while, utilities like ‘CFront’ for the Macintosh (which the author actually filed bug reports on) were used to “rename” functions like ‘load_user(int)’ and ‘load_user(char*)’ to ‘i_load_user’ and ‘cs_load_user’ before being added to the library, and other tools to do the reverse.

Has Your Mother Sold Her Mangle?

Eventually things settled down, and this process of changing names to fit into the library was “baked in” to the tool chains. Not consistently, of course, couldn’t have that. But conventions arose and even today Wikipedia lists at least 12 different ways to “mangle” ‘void h(void)’ into the existing library formats.

We’ll just look at the first one, ‘_Z1hv’. The ‘_Z’ can be safely ignored, its purpose there is mainly to tell the linker something “special” is going on. ‘1h’ is the function name, and ‘v’ is its first (and only) parameter. Suppose, then, that you were tasked with writing a tool that undid this name mangling.

Your first cut at extracting something useful might look something like

'_Z9load_useri' ~~ m{ ^ '_Z' \d+ (\w+) (.) $ };

Assuming the target string holds ‘_Z9load_useri’ (the mangled version of ‘void load_user(int)’), the regex engine goes through a bunch of simple steps.

  • Read and ignore ‘_Z’
  • Read and ignore ‘9’
  • Capture ‘load_user’ into $0
  • Capture ‘i’ into $1
  • There is no fifth thing.

But the person that wrote this library is playing silly buggers with someone (obviously us in this case) and there’s also a ‘_Z9load_userss’ which comes out of the other end of the mangle looking like ‘void load_user(char*, char*)’, loading a user with first and last names.

Now we’re in a bit of a quandary. Run the same expression and see what happens:

'_Z9load_userss' ~~ m{ ^ '_Z' \d+ (\w+) (.) $ };

Sure enough, $1 is ‘s’, just as we wanted it, but what about $0? It’s now ‘load_users’, which… y’know, looks too legit to quit. But we must. And now we’re faced with the quandary. Do we make the first parameter an optional capture? ‘m{ … (.)? (.) $ }’ like so?

No, that would capture the ‘r’ of ‘_Z9load_users’. There must be something else in the name that we’re overlooking, some clue… Aha! ‘load_user’ has 9 characters, and look just before it, we’ve got the number 9! Surely that tells us the number of characters in the function name! (and thankfully it actually does.)

Regexes 201

Now, how can we use this to our advantage? First things first, let’s get rid of some dead weight. We don’t care (for the moment) about parameters, so let’s just match the name and number of characters. And because we’re getting all serious up in here, let’s create a quick test.

use Test;
'_Z9load_user' ~~ m{ ^ '_Z' (\d+) (\w+) };
is $0, '9';
is $1, 'load_user';

Run the test script, see if it passes, I’m sure you know the drill. Go ahead and copy that, I’ll wait. Okay, the tests pass, so it’s time to play. I usually am working in a library that’s in git, so I’m usually on the “edit, run tests, git reset, edit…” treadmill by this point.

So… How do we make use of this number? Well, let’s pull up the Regexes page over at docs.raku.org and look around. Back in Perl 5 there used to be this feature ‘m{ a{5} }x’ that matched exactly 5 copies of the thing just before it; that might be a good place to start looking.

That’s now morphed into ‘m{ a ** 5 }’. Great, so let’s replace 5 with $0 and go for it.

'_Z9load_user' ~~ m{ ^ '_Z' (\d+) (\w ** $0) };

“Quantifier quantifies nothing…” That’s weird. $0 is right there, staring me in the face. Maybe I just got the syntax wrong somehow?

'_Z9load_user' ~~ m{ ^ '_Z' (\d+) (\w ** 9) };

Nope, that works. What’s going on here? $0 is defined… Wait, it’s a variable inside a regex, that used to require the ‘e’ modifier, didn’t it? Or something like that… <read the manpage, scratch head… nothing there> Hm. Are we at a dead end?

Kick it up a notch

No, we just need to remember about how string interpolation works. In Raku, “Hello, {$name}!” is a perfectly fine way to interpolate variables into your expression, and it works because no matter where it is, {} signals a code block. Let’s try that, surround $0 with braces.

'_Z9load_user' ~~ m{ ^ '_Z' (\d+) (\w ** {$0}) };

Weird. This time the test failed with ‘’ (the empty string) instead of ‘load_user’. Maybe $0 really isn’t defined? Now that it’s just regular Raku code, let’s check.

'_Z9load_user' ~~ m{ ^ '_Z' (\d+) (\w ** {warn "Got '$0'"; $0}) };

“Use of Nil in string context.” So it’s really empty. Now, we have to really do some reading. Looking at the section on general quantifiers says “only basic literal syntax for the right-hand side of the quantifier [what we want to play with] is supported,” so it looks like we’re at a dead end.

But things like ‘{$0}’ do work, so we can use variables. That means that my problem isn’t that the variable is being ignored, it’s just not being populated when I need it. Let’s look at the section on Capture numbers to see when they get populated.

Aha, you need to “publish” the capture using ‘{}’ right after it. Let’s see if that works…

'_Z9load_user' ~~ m{ ^ '_Z' (\d+) {} (\w ** {warn "Got '$0'"; $0}) };

Nope, something else is going on. And the next block down tells us the final solution – ‘:my’. This lets us create a variable inside the scope of the regular expression and use it as well, so let’s do just that.

'_Z9load_user' ~~ m{ ^ '_Z'
                     :my $length;          # Put $length in the proper scope
                     (\d+) {$length = +$0} # Capture the length
                     (\w ** {$length})     # And extract that many chars.
                   };

And reformat things just a wee bit so we’ve got some room to work with. Now the test actually runs, and reads only as many characters of the function name as needs be.
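As a quick sanity check (my own addition, not part of the original test file), the same length-aware regex handles both the one-parameter and two-parameter mangles:

```raku
# Both samples should yield 'load_user' as the function name; the
# trailing parameter codes are simply left unmatched for now.
for '_Z9load_user', '_Z9load_userss' -> $mangled {
    $mangled ~~ m{ ^ '_Z'
                   :my $length;
                   (\d+) {$length = +$0}
                   (\w ** {$length})
                 };
    say "$mangled => $1";  # both print 'load_user' as the name
}
```

Note there’s no ‘$’ anchor here; anchoring at the end would make the second sample fail outright, since the parameter codes would be left over after the counted name.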

And just one more thing…

It’s not just function names that follow this pattern, it’s also namespaces, and any special types that the function might use as parameters, so let’s package this up into something more useful.

my regex pascalish-string {
  :my $length;
  (\d+) {$length = +$0}
  (\w ** {$length})
}
'_Z9load_user' ~~ m{ ^ '_Z' <pascalish-string> };
is $/<pascalish-string>[0], 9;
is $/<pascalish-string>[1], 'load_user';

Pascal implementations were done back when RAM was at more of a premium, and stored a string like ‘load_user’ as ‘\x{09}load_user’ so the compiler knew how many bytes were available immediately rather than having to guess. It was limiting, but this was on computers like the early Macs (we’re talking pre-OS X, for that matter pre-System 7, for those of you that remember that far back.)
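Just to make that counted-string layout concrete, here is how one might pull such a string apart by hand (illustrative only):

```raku
# A Pascal-style counted string: the first byte holds the length.
my $raw    = "\x[09]load_user";
my $length = $raw.substr( 0, 1 ).ord;    # 9
my $name   = $raw.substr( 1, $length );  # 'load_user'
say $name;
```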

So we can use this <pascalish-string> regular expression anywhere we want to match one of our counted terms. Because we’re using ‘my’ inside a regular expression nested inside another regular expression inside a burrito wrapped in an enigma, there are no scoping troubles.
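For instance, Itanium-style nested names wrap the namespace and function name, each counted, between ‘N’ and ‘E’. A sketch of matching one, reusing the named regex, might look like this:

```raku
# '_ZN4User4loadEi' is (roughly) the mangled form of 'User::load(int)'.
'_ZN4User4loadEi' ~~ m{ ^ '_ZN' <pascalish-string> ** 2 'E' };
say $<pascalish-string>[0][1];  # User
say $<pascalish-string>[1][1];  # load
```

Because the regex is quantified, $&lt;pascalish-string&gt; holds a list of matches, one per counted name.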

There are probably other ways of doing this, and I would love to see them. If you do come up with a better way to solve this, let me know in the comments and I’ll work your solution into an upcoming article.

As usual, gentle reader, thank you for your time and attention, and if you have any comments, questions, clarifications or criticisms (constructive, please) let me know.

Templates and a Clean Start

Before I get into the meat of the topic, which will eventually lead to a self-modifying grammar (yes, you heard me, self-modifying…) I have a confession to make, in that a series of articles on the old site may have led people astray. I wrote that series thinking to make parsing things where no grammar existed easier.

It may have backfired. So, as a penance, I’m simultaneously pointing theperlfisher.{com,net} to this new site, and starting a new series of articles on Raku programming with a different approach. This time I’ll be incorporating more of my thoughts and what hopefully will be a different approach.

Begin as you mean to go on.

I would love to dump the CMS I’m currently using for something written in Raku. Among the many challenges that presents is displaying HTML, and to paraphrase Clint Eastwood, I do know my limitations. So, I don’t want to write HTML. Ideally, not ever.

So, that means stealing… er, borrowing HTML from other sites and making it my own. Since those are usually Perl 5 sites, that means dealing with Template Toolkit. And already I can hear some of you screaming “Raku already handles everything TT used to! Just use interpolated here-docs!”

And, for the most part, you’re absolutely correct. Instead of the clunky ‘[% variable_name %]’ notation you can use clean inline interpolation with ‘{$variable-name}’, and being able to insert blocks of code inline means you don’t have to go through many of the hoops that you’re required to jump through with Template Toolkit.

That’s all absolutely true, and I hope to be able to use all of those features and more in the final CMS, whatever that happens to be. This approach ignores the fact that most HTML out there is written with Template Toolkit, and that rewriting HTML, even if it’s just a few tiny tags, is an investment of time that could be better spent elsewhere.

If only there were Template Toolkit for Raku…

Let’s dive in!

If you’re not familiar with Template Toolkit, it’s a fairly lightweight programming language for writing HTML templates, among other things. Please don’t confuse it with a markup language designed to be rendered into HTML; this is a language that lets you combine your own code with a template and generate dynamic displays.

<h1>Hello, [% name %]!</h1>

That is a simple bit of Template Toolkit. Doesn’t look like much, does it? It’s obviously a fragment of a proper HTML document because there’s no ‘<html>’..'</html>’ bracketing it, and obviously whatever’s between ‘[%’ and ‘%]’ is being treated specially. In this case, it’s being rendered by an engine that fills in the name, maybe something like…

$tt.render( '', :name( 'Jeff' ) );

where the first argument is the name of the template file containing the previous code, and ‘Jeff’ is the name we want to substitute. We’ve got a lot of work to go through before we can get there, though. If you’ve read previous articles of mine on the subject, please try to ignore what I’ve said there.

Off the Deep End

First things first, we need a package to work in. For this, I generally rely on App::Mi6 to do the hard work for me. Start by installing the package with zef, and then we’ll get down to business. (zef itself should be installed by default; if you’re still using rakudobrew, please don’t.)

$ zef install App::Mi6
{a bit of noise}
$ mi6 new Template::Toolkit
Successfully created Template-Toolkit
$ cd Template-Toolkit

Ultimately, we want this test (in t/01-basic.t – go ahead and add it) to pass:

use Test;
use Template::Toolkit;
my $tt = Template::Toolkit.new;
is $tt.render( '', :name( 'Jeff' ) ), '<h1>Hello, Jeff!</h1>';

It’ll fail (and miserably, at that) but at least it’ll give us a goal. Also it should give us an idea of how others will use our API. Let’s think about that for a few moments, just to make sure we’re not painting ourselves into any obvious corners.

In order to be useful, our module has to parse Perl 5 Template Toolkit files, and process them in a way that’s useful in Raku. Certain things will go by the wayside, to be sure, but the core will be a module that lets us load, maybe compile, and fill in a template.

Hrm, I just said ‘fill in’ rather than ‘render’, what I said above. Should I change the method name? No, not really, the new module will still do what the Perl 5 code used to, it just won’t do it using Perl 5, so some of the old conventions won’t work. Let’s leave that decision for now, and go on.

Retrograde is all the rage

Let’s apply some basic retrograde logic to what we’ve got here, given what we know of Raku tools. In order to get the string ‘<h1>Hello, Jeff!</h1>’ from ‘<h1>Hello, [% name %]!</h1>’, we need a lot of mechanics at work.

At first glance, it seems pretty obvious that ‘[% name %]’ is a substitution marker, so let’s just do a quick regexp like this:

$text ~~ s:g{ '[%' \s* (\w+) \s* '%]' } = %args{$0};

That should replace every marker in the text with something from an %arguments hash that render() supplies to us. End of column, end of story. But not so fast, if all Template Toolkit supplied to us was the ability to substitute values for keys, then … there’s really no need for the module. And in fact, if you look at the docs, it can do many more things for us.
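For the record, that quick-and-dirty approach might be wrapped up like so; naive-render() is my own hypothetical name, and this sketch will of course fall over the moment a real directive shows up:

```raku
# Substitution-only rendering: every [% key %] marker is replaced
# with the matching value from the named-arguments hash.
sub naive-render( Str $text is copy, *%args ) {
    $text ~~ s:g{ '[%' \s* (\w+) \s* '%]' } = %args{$0};
    $text;
}
say naive-render( '<h1>Hello, [% name %]!</h1>', :name('Jeff') );
# <h1>Hello, Jeff!</h1>
```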

For example, ‘[% INCLUDE %]’ lets us include other template files in our own, ‘[% IF %]’ .. ‘[% END %]’ lets us do things conditionally, and a whole host of other “directives” are available. But you’ll see here the one thing they have in common is they all start with ‘[%’ and end with ‘%]’.

Hold the phone

That isn’t entirely true, and in fact there’s going to be another article in the series about that. But it’s a good starting point. We may not know much about what the language itself looks like, but I can tell you that tags are balanced, not nested, and every ‘[%’ opening tag has a ‘%]’ tag that closes it.

I’ll also point out that directives ( ‘[% foo %]’ ) can occur one after another without any intervening white space, and may not occur at all. So already some special cases are starting to creep in.
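One quick way to eyeball that alternation of text and directives (purely exploratory, not the parser we’ll end up with) is to split on the tag pattern and keep the separators:

```raku
# :v keeps the matched separators, so plain-text chunks and directive
# matches alternate; adjacent directives leave an empty string between.
my @chunks = 'xx[% name %][% other %]x'.split( / '[%' <-[%]>* '%]' /, :v );
say @chunks.elems;  # 5: 'xx', [% name %], '', [% other %], 'x'
```

That empty string between back-to-back directives is exactly the sort of special case the tests below need to cover.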

In fact, let’s put this in as a separate test file entirely. So separate that we’re going to put it in a nested directory, in fact. Let’s open t/parser/01-basic.t and add this set of tests:

use Test;
use Template::Toolkit::Parser;

my $p = Template::Toolkit::Parser.new;

0000, AAAA
0001, AAAB
0010, AABA
0011, AABB
0100, ABAA
0101, ABAB
... # and so on up to
1110, BBBA
1111, BBBB

Now just HOLD THE PHONE here… we’re testing directives for Template Toolkit, not binary numbers, and whatever that other column is! Well, that’s true. We want to test text and directives, and make sure that we can get back text when we want it, and directives when we want them.

At first blush you might think it’s just enough to make sure that ‘<h1> Hello,’ is parsed as text, and that ‘[% name %]’ is parsed as a directive, and just leave it at that. But those of you that have worked with regular expressions for a while might wonder how ‘[% name %][% other %]’ gets parsed… does it end at the first ‘%]’, or continue on to the next one?

And what about text mixed with directives? Leading? Trailing text? Wow, a lot of combinations. In fact, if you wanted to be thorough, it wouldn’t hurt to cover all possible combinations of text and directives up to… say, 4 in a row.

Let’s call text ‘T’, and directives ‘D’. I’ve got 4 slots, and only two choices for each. Filling the first slot gives me ‘T_ _ _’ and ‘D_ _ _’, for two choices. I can fill the next slot with ‘T T _ _’, ‘T D _ _’, ‘D T _ _’, and ‘D D _ _’, and I think you can see where we’re going with this. 

In fact, replace T with 0 and D with 1, and you’ve got the binary numbers from 0000 to 1111. So, let’s take advantage of this fact, and do some clever editing in our editor of choice:

0010, AABA                            =>
is-deeply the-tree( '0010, AABA       =>
is-deeply the-tree( '0010' ), [ AABA  =>
is-deeply the-tree( '0010' ), [ AABA ];

A few quick search-and-replace commands should get you from the first line to the last line. Now it’s looking more like a Raku test, right? We’re not quite there yet, ‘0010’ still doesn’t look like a string of text and directives, and what’s this AABA thing? One more search-and-replace pass, this time global, should solve that.

is-deeply the-tree( '0010' ), [ AABA ]; =>
is-deeply the-tree( 'xx1x' ), [ AABA ]; =>
is-deeply the-tree( 'xx[% name %]x' ), [ AABA ]; =>
is-deeply the-tree( 'xx[% name %]x' ), [ 'a', 'a', B, 'a', ]; =>
is-deeply the-tree( 'xx[% name %]x' ),
          [ 'a', 'a', B, 'a', ]; =>
is-deeply the-tree( 'xx[% name %]x' ),
    [ 'a', 'a', :content( 'name' ), 'a', ];

Starting out with the padded binary numbers covers every combination of text and directive possible (at least 4 long). A clever bit of search-and-replace in your favorite editor gives us a working set of test cases that check a set of “real-world” strings, and a file you can almost run. Next time we’ll fill in the details, and get from zero to a minimal (albeit working) Template Toolkit implementation.
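Incidentally, if editor search-and-replace isn’t your thing, the same matrix of inputs can be generated in a few lines (a sketch, using ‘x’ for a character of text and ‘[% name %]’ for a directive, as in the article):

```raku
# Each 4-bit number becomes one test input: 0 is a character of
# plain text, 1 is a directive.
for ^16 -> $n {
    my $pattern = $n.fmt('%04b');
    my $input   = $pattern.trans( '0' => 'x', '1' => '[% name %]' );
    say "$pattern => $input";  # e.g. 0010 => xx[% name %]x
}
```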

As always, dear reader, feel free to post whatever comments, questions, and/or suggestions that you may have, including ideas for future articles. I read and respond to every comment, and thank you for your time.

Logic Programming in Raku

This is a small example of conference-driven development. I’m sitting in the board room at TPCiP (The Perl Conference in Pittsburgh) surrounded by people doing both Perl 5 and Raku programming, and decided to look again at Picat, working on some simple examples. I was thinking that I might be able to translate some of the simpler backtracking examples from Picat to Raku, and here’s a simple example.

First the Picat code:

fib(0,F) => F=1.
fib(1,F) => F=1.
fib(N,F),N>1 => fib(N-1,F1),fib(N-2,F2),F=F1+F2.

Now here’s my equivalent Raku code:

multi fib( 0, $F is rw ) { $F = 1 }
multi fib( 1, $F is rw ) { $F = 1 }
multi fib( $N is rw where * > 1, $F is rw ) {
  my ( $F1, $F2 ) = 0, 0;
  my $N1 = $N - 1;
  my $N2 = $N - 2;
  fib( $N1, $F1 ) && fib( $N2, $F2 ) && $F = $F1 + $F2;
}

The Raku version is slightly larger because I need to declare some variables that Picat would ordinarily declare for me ($F1, $F2). There may be a way to work around declaring ($N1, $N2), but otherwise the two versions are identical.

How does it work?

You’ve probably guessed based on the inputs that N is the index of the Fibonacci number, and F is the Fibonacci number itself. Picat doesn’t require you to declare variables, so you could ask it for the 7th Fibonacci number by calling fib(7,F) and looking at F.

my $N   = 7;
my $Fib = 0;
fib( $N, $Fib );
say $Fib;    # 21

Or you could do the above in Raku, letting the code populate $Fib for you. This code relies on the fact that Raku lets you dispatch not just on the number of arguments, not just on their types, but on their values. Look at the base case above:

multi fib( 0, $F is rw ) { ... }

fib(…) is the function signature, and this function will get called whenever the first argument is 0, like so: fib(0, $Fib). This happens even if ‘multi fib( $N, $F )’ is the one doing the calling, everything gets run through the same dispatcher each time.

So fib(2, $Fib) calls fib(1, $Fib), which dispatches to ‘multi fib( 1, $F )’ and gives us a base case, for example. This lets the recursive calls reach our base cases and still get the right value.

What are we missing?

Well, the Picat code can do something the Raku code can’t, at least for the moment, and this is what I want to spend some time working on. In Picat, I can call ‘fib(6,F)’ and F will be 13 when the code is done. This works in Raku too.

But Picat will also let you call ‘fib(N,21)’ and N will be 7 when the calculation is finished. Take some time to let that settle. Yes, you can run the calculation both forward and backward. Give N a value, and F will be the Nth Fibonacci number. Give F a value, and it will tell you what N is.

In fact, Picat will go one step further. If you don’t specify a value for either parameter but just specify variables, like ‘fib(N,F)’, then it will generate all the Fibonacci numbers and their indexes until you tell it to stop.

This is because of the backtracking engine that it uses, which I want to see if I can mimic. ‘F=F1+F2’ doesn’t mean “Assign the sum of F1 and F2 to F”. It means “If any values are missing, find values that satisfy the equation, and keep generating them until you run out of possibilities.”

That’s a bit of a mouthful, so let’s look at just F1. Supposing F=8 and F2=5, the backtracking engine would search all values of F1, and return just the matching value of 3. Now of course, it can’t search all values, because that means you’d be waiting forever, so there are pruning algorithms at work here.

But the same logic can work with any combination of arguments, so if both F1 and F2 were missing, then the backtracking engine would run through all possible combinations of values (pruned appropriately) until it found a combination that would work.

In this case, since in our example F=8, it would return a bunch of combinations, starting with (F1=1, F2=7), (F1=2, F2=6) and so on. But why, then, you ask, does it only return (F1=3, F2=5)? That’s because each value F1 also has to satisfy fib(N1,F1), which means that F1 has to be a Fibonacci number, as does F2.
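A crude, brute-force version of that search, just to make the idea concrete (the real engine prunes rather than enumerating everything):

```raku
# Find all (F1, F2) pairs with F1 + F2 == 8 where both values are
# themselves Fibonacci numbers.
my @fibs = 1, 1, 2, 3, 5, 8, 13, 21;
my $F = 8;
for 1 ..^ $F -> $f1 {
    my $f2 = $F - $f1;
    say "F1 = $f1, F2 = $f2" if $f1 (elem) @fibs && $f2 (elem) @fibs;
}
# Prints F1 = 3, F2 = 5 and F1 = 5, F2 = 3
```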


This is the part where Raku breaks down a little bit. But what I think I might be able to do is use a trick I used a while ago, relying on the fact that operators are just functions, and they dispatch just like other functions. So I should be able to start out with something crude like:

my $F = Operator.new( :lhs(3) );
my Value ($F1, $F2);
$F = $F1 + $F2;

This way both $F1 and $F2 are bound to backtracking Values, the ‘+’ returns an Operator, and the Operator is part of the backtracking engine. Once the engine determines the range of possible combinations of $F1 and $F2 that add to 3, it can assign them concurrently to $F1.value and $F2.value.

Smooth Operators

The Operator and Value classes, along with their overloaded operators, would look something like this:

class Operator {
  has $.lhs;
  has $.rhs;
}
class PlusOperator is Operator { }
class AssignmentOperator is Operator {
  method make-combinations() {...}
}
class Value {
  has @.value;
}
multi infix:<=>( Operator $lhs, Operator $rhs ) { AssignmentOperator.new( :$lhs, :$rhs ) }
multi infix:<+>( Value $lhs, Value $rhs ) { PlusOperator.new( :$lhs, :$rhs ) }

This is purely a sketch that I haven’t tried out at all. My idea here is that once you’ve executed ‘$F = $F1 + $F2’, $F will be an AssignmentOperator instance. You should be able to call $F.make-combinations() that will solve the equation ‘3 = $F1 + $F2’ for all (constrained) values of $F1 and $F2.

That would populate $F1.value and $F2.value with (1,2) and (2,1) respectively. I’m about to play my first game of Azul, so I’ll leave the article here. The next article will hopefully implement this so you can see it all working. It won’t quite be a true backtracking engine, but it’s a start.

Dear Reader, thank you for your attention, and please feel free to add comments, questions and suggestions.

Quantum Tunneling

Introducing the new Perl Fisher site

Before I get on to the meat of the article, welcome to the new home of The Perl Fisher. I intend to cover both Perl 5 and Raku programming here, but it’ll be mostly Raku content because that’s the language I find the most fun. Please excuse the dust, I’m still very much settling into the new home, and the overall look of the site is bound to change while I play with the new toys available to me.

Defeating Thanos with Raku

Don’t worry, no spoilers here. We’re just going to talk about a little-known feature of Raku, the quantum-tunneling variable type. If you’ve worked with Raku for any length of time, you’ve probably seen or written a class declaration that looks something like below.

class Point2D {
  has Real $.x;
  has Real $.y;
}

While the word ‘has’ does the real work, second-sigil syndrome strikes as well, in the shape of the ‘.’ between the scalar sigil ‘$’ and the variable name. Here it’s syntactic sugar marking an attribute name, but we can enlarge that ‘.’ into a ‘*’ and open up a world of possibilities.

When we add the ‘*’ twigil to a variable name, we turn that variable into one that can quantum-tunnel between scopes and solve problems that you probably used to solve with a global variable. You can read more about dynamic variables and how they differ from ordinary globals in the “The * twigil” section of the Raku documentation.
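Here is the smallest demonstration I can think of, with made-up names: the dynamic variable is declared in one scope and read in a sub that was never handed it as a parameter.

```raku
# $*GREETING is looked up dynamically, through the call chain,
# rather than lexically, through the enclosing blocks.
sub deep-inside       { say $*GREETING }
sub somewhere-between { deep-inside }
sub top-level {
    my $*GREETING = 'Hello from above';
    somewhere-between;
}
top-level;  # Hello from above
```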

Testing, testing

I’m working on a project to try to augment the Raku grammar debugger with an emulator. The grammar-debugging tools we have are wonderful, but they’re limited because Raku compiles grammar rules down to single methods, which is wonderful for speed, but makes them almost impossible to look into.

The grammar I’m writing isn’t important at the moment, but the testing part is. Below is a sample subtest that I’m writing for each term of a grammar that’s probably going to have ~50 terms by the time I’m done.

subtest 'binary-number', {
  subtest 'failing', {
    ok fails( '0b', 'binary-number' );
    ok fails( '3g', 'binary-number' );
  };

  is build-ast( '0101', 'binary-number' ), 5;
};

This tests the ‘binary-number’ rule to see if it properly fails on ‘0b’ and ‘3g’. ‘0b’ fails because it’s just the prefix of a binary number, and ‘3g’ because neither ‘3’ nor ‘g’ are binary digits. It also makes certain that ‘0101’ gets translated into the decimal number 5. All important when testing a grammar that parses … well, itself eventually. Ouroboros redux, as it were.
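build-ast() itself isn’t shown here, but to keep things concrete, a toy version (with a stand-in grammar; none of these names are from the real project) might look like this:

```raku
# A stand-in grammar and actions class: parse a rule by name and
# return whatever the matching action method 'make's.
grammar Toy {
    token binary-number { <[01]>+ }
}
class ToyActions {
    method binary-number($/) { make $/.Str.parse-base(2) }
}
sub build-ast( Str $sample, Str $rule-name ) {
    my $m = Toy.parse( $sample, :rule($rule-name), :actions(ToyActions.new) );
    $m ?? $m.made !! Nil;
}
say build-ast( '0101', 'binary-number' );  # 5
```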

Dry up, will you…

The test is simple, and straightforward. ‘0b’ should fail, ‘0101’ should be built into a node of an abstract syntax tree. But it’s got some flaws. It talks too much. See how ‘binary-number’ repeats itself? If I want to copy that, rename it to ‘hex-number’ and add a few changes, I have to copy the block, rename all the instances of ‘binary-number’ to ‘hex-number’ and then fix the existing tests.

Thus I run the risk of forgetting to update the name ‘binary-number’. And there’s an even greater bugaboo: if I do forget, the tests won’t fail, because the subtest doesn’t know it’s supposed to be testing the ‘binary-number’ rule. There are a bunch of ways to solve this problem, of course, but for this post we’re going to use Ant-Man(tm).

Entering the Quantum Realm

I don’t want to do too much work here, I just want to get rid of the duplicate ‘binary-number’ entries. So, let’s take a look at what fails() does.

sub fails( Str $sample, Str $rule-name ) returns Bool {
  !?( $g.parse( $sample, :rule( $rule-name ) ) );
}

The ‘!?(…)’ casts $g.parse(…) to a Boolean and negates it, so if $g can’t parse the statement, it returns True. So, first let’s make $rule-name optional. 

sub fails( Str $sample, Str $rule-name? ) returns Bool {
  !?( $g.parse( $sample, :rule( $rule-name ) ) );
}

Opening the wormhole

Now, we’re going to summon Ant-Man(tm). Remember earlier I mentioned that quantum variables use a wormhole? Well, we’re going to open one end of the wormhole right here in our fails() function, just like this.

sub fails( Str $sample, Str $rule-name? ) returns Bool {
  !?( $g.parse( $sample, :rule( $*ANT-MAN // $rule-name ) ) );
}

Rerun our tests, and … wait, they should fail, we haven’t declared $*ANT-MAN anywhere! Well, just like in quantum physics, $*ANT-MAN doesn’t have enough energy to tunnel over the quantum barrier because we haven’t defined him yet. 

So let’s do that, but remember that $*ANT-MAN is a quantum variable, so he can tunnel through the quantum barrier of a function scope. In fact, he can tunnel through any number of them. So, let’s define a new version of subtest() that looks and acts like the old one first before we go boldly where no Raku programmer has gone before.

sub Subtest( Str $rule-name, Block $test-code ) {
  subtest $rule-name, $test-code;
}

We should be able now to replace the outer subtest() block with our new Subtest() block, and it should act just as it used to.

Subtest 'binary-number', {
  subtest 'failing', {
    ok fails( '0b', 'binary-number' );

Tunneling through

Our test suite still works, and the output is still what we expect. Now, let’s give $*ANT-MAN enough energy to tunnel through the quantum barrier by defining him as the subtest name we want:

sub Subtest( Str $rule-name, Block $test-code ) {
  my $*ANT-MAN = $rule-name;
  subtest $rule-name, $test-code;
}

And now run our test suite. Which… doesn’t change. Come to think of it, we don’t want it to change. If it did change, we’d have to go through and change all of our test suites, which would be bad. So, putting things together, this code works just fine.

sub fails( Str $sample, Str $rule-name? ) returns Bool {
  !?( $g.parse( $sample, :rule( $*ANT-MAN // $rule-name ) ) );
}
sub Subtest( Str $rule-name, Block $test-code ) {
  my $*ANT-MAN = $rule-name;
  subtest $rule-name, $test-code;
}

Subtest 'binary-number', {
  subtest 'failing', {
    ok fails( '0b', 'binary-number' );

Notice by the way that $*ANT-MAN has tunneled through not one but two function signatures to get to where he is. And to prove it, finally, delete the inside ‘binary-number’.

Subtest 'binary-number', {
  subtest 'failing', {
    ok fails( '0b' );

So ‘binary-number’ gets passed along to $*ANT-MAN who jumps into the quantum realm, tunnels through the outer and inner pairs of braces, and finally lands in fails() where he passes the value on to the :rule() declaration. Whew, that’s a lot of work.

Oh, snap.

(sorry, couldn’t resist.) If you’re like me, and I know I am, you’ve probably come across a few cases where this technique would come in handy. Especially if you’re dealing with legacy code. Sometimes you need to add just one little flag to a function and set that flag in a top-level handler on one page of a website.

The catch is that between the lower level and the top level there’s a chain of 8 function calls where you have to add that as a parameter. Wouldn’t it be nice if there was a workaround? Well, in Raku there is.
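In dynamic-variable terms (all the names here are invented for illustration), the flag set at the top level simply tunnels past the intermediate calls:

```raku
# The top-level handler sets the flag once; the low-level helper
# reads it without any of the middle layers passing it along.
sub low-level-helper {
    say 'debug output enabled' if $*DEBUG-FLAG // False;
}
sub middle-layer { low-level-helper }   # imagine eight of these
sub top-level-handler {
    my $*DEBUG-FLAG = True;
    middle-layer;
}
top-level-handler;  # debug output enabled
```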

Thanks for getting all the way to the bottom of this, my inaugural article on Raku on the next generation of the Perl Fisher website. Feel free to leave comments and constructive criticism in the comments section below, and come back every so often to watch the website grow over the coming months.
