The Perfect Day

What can you do to make today perfect? Of the parts that you can control, what will you do to make the day the best day possible?

On the day I took this photo, I was playing the Perfect Day game. I woke up early, forced myself out of bed, and squeezed in a 20-mile bike ride on an old favorite route while the air was still cool on what would be a hot summer day.

Frankly, I don’t remember much else from that day, but I remember this part. I remember this S curve between corn field and horse pasture, just before the road makes a sharp right turn into the woods and down Abington hill. I remember my eyes tearing up as the speedometer flirted with 50 and the forest blurred into green. Then just as fast, the hill flattening out again and safety returning, like a calculated roller coaster.

I’ve forgotten that Friday, but I remember this small perfection.

Bicycling saves lives

Perl’s prove is a great tool for running TAP-based test suites. I use it for a large project with about 20,000 tests. Our test suite runs multiple times per day. At that scale and frequency, the run time is a concern.

To address that, prove has a -j option, allowing you to run tests in parallel. Unfortunately, in our large test suite not all tests are “parallel ready”. I needed a way to run most tests in parallel, but mark a few to be run in sequence. Ideally, I’d like these tests to be part of the same run as the parallel tests.

After much digging around, I found that prove has had support for adding exceptions to parallel test runs since 2008. The feature has just never been documented in prove, making it difficult to discover. Here’s an example which works with the currently released version of prove, 3.25:

# All tests are allowed to run in parallel, except those starting with "p"
--rules='seq=t/p*.t' --rules='par=**'

For more details and documentation, you can see my related pull request. You can leave a comment there to advocate for or against merging the patch, or you can download the raw patch directly to apply to your own copy of the Test-Harness-3.25 distribution.

**UPDATE 9/14/2012:** It turns out it doesn’t work quite as I thought. I tried tweaking some of the internals of prove further, but hit a bug. See the details in my follow-up to the Perl-QA list. I changed approach and started working on making my exceptional tests parallel-safe. Perhaps I can cover some of the techniques I used for that in a future post.

leaves falling off the tree

For me, one of the promises of PSGI and Plack is to get away from programming with global variables in Perl, particularly the ability to modify the request object after it has been created. Long ago, CGI.pm replaced the global %FORM hash with a “query object”, but that object has essentially been used as an always-accessible read/write data structure.

It would appear at first look that improving this might be a fundamental part of the new system: a lexical environment hashref is explicitly passed around, as seen here:
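A minimal sketch of the explicit style (my own illustration, not the original snippet; the app body and response are hypothetical):

```perl
# PSGI style: the environment arrives as an explicit, lexical hashref
my $app = sub {
    my ($env) = @_;                       # passed in, not grabbed from a global
    my $path = $env->{PATH_INFO};
    [ 200, [ 'Content-Type' => 'text/plain' ], [ "path: $path" ] ];
};

# Exercise it with a hand-built environment hashref:
my $res = $app->({ PATH_INFO => '/hello', REQUEST_METHOD => 'GET' });
print $res->[2][0], "\n";                 # prints "path: /hello"
```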


vs the old implicit grab from the environment:
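Something like this (a hedged reconstruction of the implicit style, not the author's exact code):

```perl
# Old CGI style: request data is grabbed implicitly from process globals
sub query_string {
    return $ENV{QUERY_STRING};    # reaches into the global %ENV hash
}

$ENV{QUERY_STRING} = 'a=1';       # anything, anywhere, can set this...
print query_string(), "\n";       # ...and every caller sees it: prints "a=1"
```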


You might hope to be escaping some of the action-at-a-distance badness of global variables, like having a change to the environment alter your object after the object is created. You might also like changes made to your object not to alter the global environment.

Not only does Plack::Request require an explicit environment hash to be passed in, nearly all its methods are read-only, including a param() method inspired by a read/write method of the same name in CGI.pm. That’s all good.

This would all seem to speak to safety and simplicity for using Plack::Request, but the reality turns out to be far muddier than you might hope. I encourage you to download and run this short, safe interactive perl script which illustrates some differences. It shows that:

  • Plack::Request objects can be altered after they are created by changing the external environment.
  • Modifying a Plack::Request object can potentially alter the external environment hash (something which CGI.pm explicitly does not allow).
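The same reference-sharing mechanism can be shown without Plack at all; this toy class (hypothetical, for illustration only) stores the environment hashref instead of copying it, which is the root of both behaviors above:

```perl
package Toy::Request;
sub new   { my ($class, $env) = @_; bless { env => $env }, $class }  # keeps the ref, no copy
sub param { my ($self, $key) = @_; $self->{env}{$key} }

package main;
my %env = ( name => 'alice' );
my $req = Toy::Request->new(\%env);

$env{name} = 'bob';                 # external change after construction...
print $req->param('name'), "\n";    # ...leaks into the object: prints "bob"

$req->{env}{name} = 'carol';        # and a change made via the object...
print $env{name}, "\n";             # ...alters the external hash: prints "carol"
```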

In effect, the situation with global variables is in some regards worse: Plack::Request gives the impression of a move away from action-at-a-distance programming, but the fundamental properties of being affected by global changes, and of locally creating them, are still present.

On the topic of surprising read/write behavior in Plack::Request, you may also be interested to note that the behavior of query_parameters(), body_parameters() and parameters() is not consistent in this regard. I submitted tests and a suggestion to clarify this, although that contribution has not yet been accepted.

Here’s the deal: the hashrefs returned by query_parameters(), body_parameters() and parameters() are all read/write – subsequent calls to the same method return the modified hashref.

However, modifying the hashes returned by body_parameters() or query_parameters() does not modify the hashref returned by parameters(), which claims to be a merger of the two.

It seems that either all the return values should be read-only (always returning the same values), or, if modifying them is supported, then the parameters() hash should be updated when either of the body_parameters() or query_parameters() hashes is updated.
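A simplified model of why the merged view goes stale (my sketch of the behavior, not Plack's actual internals): if the merge is computed once and cached, later writes to the source hashes never reach it.

```perl
my %query = ( q => 'x' );
my %body  = ( b => 'y' );

# parameters() effectively merges once and keeps returning that result:
my %merged = ( %query, %body );

$query{q} = 'changed';          # write to the "query" hash...
print $merged{q}, "\n";         # ...the merged copy still says "x"
```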


An incoming HTTP request to your server is by its nature read-only. It’s analogous to a paper letter being delivered to you by postal mail.

It’s a perfect application for the immutable object design that Yuval Kogman eloquently advocates for. Plack::Request comes close to implementing the idea with mostly read-only accessors, but falls short. The gap it leaves unfortunately carries forward some possibilities for the action-at-a-distance cases that have been sources of bugs in the past. I’d like to see Plack::Request, or some alternative to it, with the holes plugged: it should copy the input, not modify it by reference, and parameter-related methods should also return copies rather than references to internal data structures.

rocks near Greens Fork, Indiana

In my previous post I summarized the current state of Percent-encoding in Perl. One of my conclusions was that the perfect percent-encoding solution would automatically handle UTF-8 encoding, using logic like this:

utf8::encode $string if utf8::is_utf8 $string;

Respected Plack author miyagawa quickly responded in a post of his own to say that the above code approach is a bug, although the code pattern is already in wide use: it is present in Catalyst (and by extension CGI::Application and other frameworks) as well as Mojo.

In one sense, he’s right. The pattern goes against the advice found in the official perlunifaq documentation which states that

It’s better to pretend that the internal format is some unknown encoding, and that you always have to encode and decode explicitly.

In other words: don’t use the “is_utf8()” function.
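Following that advice, the explicit version of the same operation looks like this (a small example of my own):

```perl
use Encode qw(encode);

my $text   = "caf\x{e9}";              # a decoded Perl character string
my $octets = encode('UTF-8', $text);   # explicit encode; no is_utf8() check needed

print length($text),   " characters\n";  # 4 characters
print length($octets), " octets\n";      # 5 octets ("é" becomes two bytes)
```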

Before drawing a conclusion about whether this code pattern is the best design in practice, let’s look at some related facts about the matter.

  • The W3C is clear that it is a best practice to convert text to UTF-8 before percent encoding it.
  • However, as Miyagawa points out, only newer protocols, since about 2005, are expected to follow that. The HTTP protocol is far older, and it is valid to pass binary data through HTTP query strings.
  • But conveniently, Perl reads binary data as 8-bit characters. You can read in /usr/bin/perl, pass it to uri_escape(), and it will percent-encode it just fine – without blowing up due to the high-bit characters that some UTF-8 sequences correspond to.
  • In practice, the approach of automatic encoding used by Catalyst and Mojo has not been problematic. From reviewing both bug queues, I don’t see any open or previously-closed bugs that have been caused by the current behavior. It does seem reasonable and logical that it should be safe to UTF-8-encode text that is marked as UTF-8.
  • People would reasonably expect the percent-encode/decode process to be a symmetric round trip – you should get back exactly the data that you started with. When UTF-8 encoding is added to the process, this isn’t necessarily the case, since UTF-8 encoding happens only before percent-encoding; the reverse does not happen when percent-decoding. The one-way trip to UTF-8 is probably what you wanted, but since the conversion is automatic, the intent of the programmer is not explicit.
  • The most popular percent-encoding solutions for Perl – CGI.pm and URI::Escape – both pre-date proper Unicode support in Perl, and both include built-in code to re-implement UTF-8 functionality in case you are using Perl 5.6, which lacks proper Unicode support. So in the context of designing those modules, it wasn’t really an option to tell people to just call encode($string) if they wanted to UTF-8 encode their text before it was URI-encoded.
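The round-trip asymmetry described above can be demonstrated with a hand-rolled percent-encoder (my sketch; uri_escape_utf8() behaves the same way):

```perl
use Encode qw(encode);

my $char  = "\x{e5}";                          # one character: å
my $bytes = encode('UTF-8', $char);            # two octets

# Percent-encode everything outside the RFC 3986 unreserved set:
(my $pct = $bytes) =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
print "$pct\n";                                # prints "%C3%A5"

# Percent-decoding reverses only the percent step, not the UTF-8 step:
(my $back = $pct) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
print length($back), " vs ", length($char), "\n";   # 2 octets vs the original 1 character
```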

So, then, which of these two designs represents a best practice for handling UTF-8 encoding in combination with percent-encoding in Perl?

  1. First option: the popular solution used in practice, which UTF-8-encodes data if the UTF-8 flag is set. This approach automatically handles the recommended practice of UTF-8 encoding so users don’t have to think about it.
  2. Second option: UTF-8 handling should not be built into a percent-encoding solution. Following the advice of perlunifaq, an ideal solution would not check the UTF-8 flag, but would instead offer clear documentation that advises the user about the best practice of possibly UTF-8 encoding text before it is percent-encoded. This design cleanly externalizes all the character encoding issues that sometimes get bundled with percent-encoding, such as translating from EBCDIC or UTF-16 surrogate pairs. It provides users a bit of education about UTF-8 best practices which they may be able to apply in other areas. As long as UTF-8 handling is automatic, programmers can continue to stay in the dark about the best practice of explicitly handling character encodings themselves. The solution might be as simple as documenting this in the synopsis:

use Encode 'encode_utf8';
$uri = uri_percent_encode(encode_utf8($string));

Note that URI::Escape essentially already has the code part of the second alternative in place. It offers percent-encoding without any character encoding, as well as uri_escape_utf8(), which is basically just sugar for the code sample above. The URI::Escape documentation explains in precise technical terms what the two alternatives do, but offers very little guidance on which method to choose – the plain one or the UTF-8 one? It doesn’t address which reflects a best practice, or cover the possibility of any drawbacks to choosing the UTF-8 version. A Google code search shows about 10x more hits for “uri_escape()” than for “uri_escape_utf8()”. I suspect that many of these cases would benefit from being able to handle UTF-8 characters, but the programmers weren’t really educated about when to choose that option. Considering that masses of programmers aren’t suddenly going to become UTF-8 experts, it’s clear to me that an ideal solution should go with one of the options above: either try to handle UTF-8 automatically, or offer strong guidance in the percent-encoding documentation about best practices regarding character encodings. I think with clear, informative documentation, the second option could be the better way to go.

What do you think?

This post is about the current state of URI encoding in Perl. This is the problem space of safely passing arbitrary text into and out of a URI format. If you’ve ever seen a space in a URL represented as “%20”, that’s the topic of the moment.

The best general introduction I’ve found on the topic is the Wikipedia page on Percent-encoding.

RFCs on the topic include the 2005 RFC 3986, which defined the generic syntax of URIs. It replaces RFC 1738 from 1994, which defined Uniform Resource Locators (URLs), and RFC 1808 from 1995, which defined Relative Uniform Resource Locators. Sometimes this transformation is called “URI escaping” and sometimes it’s referred to as “URL encoding”. RFC 3986 clarified the naming issue:

“In general, the terms “escaped” and “unescaped” have been replaced with “percent-encoded” and “decoded”, respectively, to reduce confusion with other forms of escape mechanisms.”

Elsewhere it’s clarified that percent encoding applies to all URIs, not just URLs.

I think the Perl community would do well to adopt “percent encode URI” and “percent decode URI” as ways to describe this process that are unambiguous and in line with the RFC.

There are two URI percent-encoding solutions in Perl that seem to be in the widest use. Both have a significant deficiency.

fallen tree in Kentucky

Percent-encoding with CGI::Util

The first is CGI::Util, which provides escape() and unescape() as a pair. This solution has a lot going for it – it’s been in the core for years, it works back to Perl 5.6, it automatically handles UTF-8 encoding, and it handles some edge cases like EBCDIC encoding and UTF-16 surrogate pairs. Further, you can use escape() and unescape() without using the rest of CGI.pm or ever creating a CGI object. There’s just one major deficiency: these methods have never been documented! Many take advantage of them by using CGI.pm directly or indirectly, as it uses them internally. A few people have found them and use them directly. As someone with commit access to the repo, I’ll be documenting them shortly, once I’m done with the detour that became this post.

Percent-encoding with URI::Escape

Probably the most widely used module intended for URI percent-encoding is URI::Escape. URI::Escape is not in the core, but the URI distribution depends only on MIME::Base64, and that module is not actually needed for the URI::Escape functionality. Like CGI::Util, URI::Escape also advertises support back to Perl 5.6.1. It does not handle EBCDIC or UTF-16 surrogate pairs, but as I’ll explain later, it’s questionable whether those abilities truly belong built in to a percent-encoding solution. The deficiency with URI::Escape is that it doesn’t handle UTF-8 automatically like most other solutions do. Many perl scripts and modules have called URI::Escape::uri_escape expecting that it will always “just work” for encoding all text. Instead, you have to explicitly ask for UTF-8 handling by calling uri_escape_utf8() instead. To URI::Escape’s credit, it has clearly documented how it behaves in this regard, but it seems like a missed opportunity to handle UTF-8 input automatically. By contrast, most other solutions handle either case automatically with a single line like this:

utf8::encode $_[0] if utf8::is_utf8 $_[0];

RFC 3986 is quite clear that UTF-8 encoding should be part of the solution:

“Non-ASCII characters must first be encoded according to UTF-8 [STD63], and then each octet of the corresponding UTF-8 sequence must be percent-encoded to be represented as URI characters”

URI::Escape likely suffers from being far older than RFC 3986; it added a new method specific to UTF-8 to keep uri_escape() perfectly backwards compatible. In hindsight from 2010, I think that was an unfortunate choice.

Summary of all known percent-encoding solutions for Perl

I researched further to see what other percent-encoding solutions exist for Perl and how they differ. Here’s what I found, including CGI::Util and URI::Escape again for completeness.


CGI::Util has the benefit of being in the core, but the drawback of being undocumented as of version 3.50.

  • Names: escape / unescape
  • Min Perl version: 5.6.0
  • Handles UTF-8 encoding: Yes, on Perl 5.8 and newer
  • Notes: also handles EBCDIC and UTF-16 surrogate pairs.


CGI::Simple 1.112 appears to have a bug regarding RFC 2396, section 2.2, concerning reserved characters. It explicitly translates spaces to “+”, unlike most other solutions here, which translate them to %20. It also lacks automatic UTF-8 handling. Its implementation is notably not compatible with the one in CGI.pm, as some would assume.

  • Names: url_encode / url_decode
  • Min Perl Version: 5.6.1
  • Handles UTF-8 encoding: No.
  • Notes: The implementation here isn’t the same as a second one in the distribution, in CGI::Simple::Util.


CGI::Simple::Util contains a second percent-encoding implementation in CGI::Simple 1.112; it is not compatible with CGI.pm’s implementation either. Compare:

CGI::escape                  å -> %C3%A5%20X
URI::Escape::uri_escape_utf8 å -> %C3%A5%20X
CGI::Simple->url_encode      å -> %E5+X
CGI::Simple::Util::escape    å -> %E5%20X
  • Names: escape / unescape
  • Min Perl Version: 5.6.1
  • Handles UTF-8 encoding: No.
  • Notes: Handles EBCDIC encoding, inherited from before the fork.


Mojo::Util 0.999941 provides a modern, simple implementation with automatic UTF-8 encoding. My gripes with it are that the names say “url” and “escape” instead of “uri” and “encode”, which would follow the RFCs more closely. It also doesn’t allow a rather normal syntax: Mojo::Util::url_escape('å'). That’s because Mojo uses the unconventional implementation of modifying the input by reference instead of returning a modified copy. Presumably this is done for performance.

  • Names: url_escape / url_unescape
  • Min Perl Version: 5.8.7.
  • Handles UTF-8 encoding: Yes.
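The design difference behind that gripe can be contrasted with two toy subs (illustrative only, not Mojo's actual code):

```perl
# Modify-by-reference, in the style Mojo::Util uses: $_[0] aliases the caller's variable
sub escape_in_place { $_[0] =~ s/ /%20/g }

# Return-a-copy, as most other solutions do:
sub escape_copy { my ($s) = @_; $s =~ s/ /%20/g; return $s }

my $a = "a b";
escape_in_place($a);                 # fine: $a is now "a%20b"
print "$a\n";

print escape_copy("a b"), "\n";      # also fine, and works on a literal

# But escape_in_place("a b") would die with
# "Modification of a read-only value attempted" -- hence no url_escape('å')
```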


Tie::UrlEncoder 0.02 provides a unique interface through a %urlencode hash. However, it doesn’t provide a decoding routine. Basic UTF-8 tests pass for it, but the solution employed is unorthodox: instead of calling UTF-8-related functions, it calls use bytes;. Official Perl documentation is clearly opinionated about this approach. In perlunifaq, it says plainly “Don’t use it.” in regard to use bytes;.

  • Names: %urlencode
  • Min Perl version: 5.6.
  • Handles UTF-8 encoding: The implementation does not follow best practices. See above.


Not to be confused with URI::Escape, URI::Encode is meant to be a newer and simpler take on the problem space. It offers automatic UTF-8 encoding, and includes an option for whether or not to encode reserved characters – the option to not encode reserved characters is nice for those who know what they are doing. Unfortunately, it has a poor object-oriented UI. It offers a constructor which does nothing, when the reserved-characters setting could be an option there. It also doesn’t document that you can call the key methods as class methods to bypass the do-nothing constructor. While it also offers a procedural interface, that is implemented in terms of calling the do-nothing constructor every time, adding an unnecessary penalty.

  • Names: uri_encode / uri_decode
  • Min Perl version: Perl 5.8.1.
  • Handles UTF-8 encoding: Yes


URI::Escape provides three APIs: two that don’t handle UTF-8 encoding and one that does. It’s popular, works well and is well documented. Its main drawback is that UTF-8 encoding is not automatic in uri_escape(), and as a result it has not been used by many applications, when UTF-8 support here could have otherwise been a free benefit. uri_escape_utf8() can be used for UTF-8 support.

  • Names: uri_escape / uri_escape_utf8 / uri_unescape / %escapes
  • Min Perl version: 5.6.1
  • Handles UTF-8 encoding: Not automatically


URI::Escape::XS sounds like a module that’s compatible with URI::Escape, only faster due to a C-based XS implementation. It does benchmark as much faster, and it is somewhat compatible, but it lacks a uri_escape_utf8 method, which could be a valuable addition for better compatibility. Instead, it has a uri_escape method that includes UTF-8 support automatically. It also has a higher minimum Perl requirement – 5.8 vs 5.6 – which is another important difference that’s not documented. As an additional benefit, Any::URI::Escape exists, which will use the XS version if available and the pure-Perl version otherwise. The wrapper module unfortunately glosses over the difference in UTF-8 handling between the XS version and the pure-Perl version.

  • Names: uri_escape / uri_unescape
  • Min Perl version: 5.8.1
  • Handles UTF-8 encoding: Yes
  • Notes: Requires a C-compiler (but very fast)


All the URI percent encoding solutions I reviewed had flaws, but the pieces are all there to produce an optimal solution. Here’s my recommendation for designing a perfect solution:

  • Name the module URI::PercentEncode;
  • Name the functions uri_percent_encode() and uri_percent_decode()
  • Return the changed value (don’t modify by reference)
  • Require at least Perl 5.8.1. Supporting older versions is unnecessary baggage at this point.
  • Don’t build in support for getting data into UTF-8 beyond a simple call to utf8::encode(). Anything else belongs in the domain of the Encode module. If I’m wrong about leaving support for UTF-16 surrogate pairs out of a percent-encoding solution, let me know.
  • Automatically handle UTF-8 encoding (like this: utf8::encode $_[0] if utf8::is_utf8 $_[0];)
  • Use faster XS-based code by default, but allow building a Pure-Perl version for those who need or want it. (Follow the model of Params::Validate here).
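Putting the list together, a pure-Perl sketch of the recommended interface might look like this (a hypothetical module of my own; the character class is the RFC 3986 unreserved set):

```perl
# Sketch of the proposed URI::PercentEncode interface
sub uri_percent_encode {
    my ($string) = @_;                               # work on a copy: return the changed value
    utf8::encode($string) if utf8::is_utf8($string); # the automatic UTF-8 step recommended above
    $string =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $string;
}

sub uri_percent_decode {
    my ($string) = @_;
    $string =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $string;
}

print uri_percent_encode("a b"), "\n";          # prints "a%20b"
print uri_percent_decode("a%20b"), "\n";        # prints "a b"
print uri_percent_encode("\x{263a}"), "\n";     # prints "%E2%98%BA"
```

(A real implementation would want the XS backend and pure-Perl fallback from the last bullet; this shows only the interface and semantics.)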

Any volunteers?

That’s my take on URI percent-encoding in Perl. What do you have to add?

**Update:** See the reply by miyagawa, who states that this code is a bug: utf8::encode $_[0] if utf8::is_utf8 $_[0];. It is used by CGI::Util and Mojo::Util in the versions given above, as well as in Catalyst. URI::Escape and URI::Encode do UTF-8 encoding without checking the UTF-8 flag. He has more experience with UTF-8, and I defer to his advice here.