Bicycling saves lives

Perl’s prove is a great tool for running TAP-based test suites. I use it for a large project with about 20,000 tests. Our test suite runs multiple times per day. At that scale and frequency, the run time is a concern.

To address that, prove has a -j option, allowing you to run tests in parallel. Unfortunately, in our large test suite not all tests are “parallel ready”. I needed a way to run most tests in parallel, but mark a few to be run in sequence. Ideally, I’d like these tests to be part of the same run as the parallel tests.

After much digging around, I found that prove has had support for adding exceptions to parallel test runs since 2008. The feature has just never been documented in prove, making it difficult to discover. Here’s an example which works with the currently released version of prove, 3.25:

# All tests are allowed to run in parallel, except those starting with "p"
--rules='seq=t/p*.t' --rules='par=**'
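Putting the pieces together, a full invocation for a suite like ours might look something like this (the -j count and the t/ directory layout are illustrative, not from the original post):

prove -j 4 --rules='seq=t/p*.t' --rules='par=**' -r t/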

For more details and documentation, you can see my related pull request. You can leave a comment there to advocate for or against merging the patch, or you can download the raw patch directly to apply to your own copy of the Test-Harness-3.25 distribution.

*UPDATE 9/14/2012:* It turns out it doesn’t work quite as I thought. I tried tweaking some of the internals of prove further, but hit a bug. See the details in my follow-up to the Perl-QA list. I changed my approach and started working on making my exceptional tests parallel-safe. Perhaps I can cover some of the techniques I used for that in a future post.

leaves falling off the tree

For me, one of the promises of PSGI and Plack is to get away from programming with global variables in Perl, particularly the ability to modify the request object after it’s been created. Long ago, CGI.pm replaced using a global %FORM hash with a “query object”, but it has essentially been used as an always-accessible read/write data structure.

It would appear at first glance that improving this might be a fundamental part of the new system: a lexical environment hashref is explicitly passed around, as seen here:

Plack::Request->new($psgi_env);

vs the old implicit grab from the environment:

CGI->new;

You might hope to be escaping some of the action-at-a-distance badness of global variables, like having a change to the environment alter your object after the object is created. You might also like to avoid having changes made to your object alter the global environment.

Not only does Plack::Request require an explicit environment hash to be passed in, but nearly all of its methods are read-only, including a param() method inspired by the read/write method of the same name from CGI.pm. That’s all good.

This would all seem to speak to safety and simplicity for using Plack::Request, but the reality turns out to be far muddier than you might hope. I encourage you to download and run this short, safe, interactive Perl script, which illustrates some differences. It shows that:

  • Plack::Request objects can be altered after they are created by changing the external environment.
  • Modifying a Plack::Request object can potentially alter the external environment hash (something which CGI.pm explicitly does not allow).
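For instance, the first point can be demonstrated in a few lines. Here’s a minimal sketch (the environment hash is pared down to just the keys needed for this example):

use Plack::Request;

my $env = {
    REQUEST_METHOD    => 'GET',
    PATH_INFO         => '/before',
    QUERY_STRING      => '',
    SERVER_NAME       => 'localhost',
    SERVER_PORT       => 80,
    'psgi.url_scheme' => 'http',
};

my $req = Plack::Request->new($env);
print $req->path_info, "\n";   # "/before"

# Action at a distance: mutating the environment hash after the
# object was created changes what the object reports, because the
# accessors read the hash by reference rather than from a copy.
$env->{PATH_INFO} = '/after';
print $req->path_info, "\n";   # "/after"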

In effect, the situation with global variables is in some regards worse. Plack::Request gives the impression of a move away from action-at-a-distance programming, but the fundamental properties of being affected by global changes and of locally creating them are still present.

On the topic of surprising read/write behavior in Plack::Request, you may also be interested to note that the behavior of query_parameters(), body_parameters() and parameters() is not consistent in this regard. I submitted tests and a suggestion to clarify this, although that contribution has not yet been accepted.

Here’s the deal: the hashrefs returned by query_parameters(), body_parameters() and parameters() are all read/write — subsequent calls to the same method return the modified hashref.

However, modifying the hashes returned by body_parameters() or query_parameters() does not modify the hashref returned by parameters(), which claims to be a merger of the two.
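In code, the inconsistency looks roughly like this (a sketch of the behavior described above, assuming a GET request whose query string is "a=1"):

# $req is a Plack::Request for a request with query string "a=1"
my $query = $req->query_parameters;
$query->{a} = 'changed';

# The same cached reference comes back, modification included:
print $req->query_parameters->{a};   # "changed"

# But parameters(), which claims to merge the query and body
# parameters, was built separately and still shows the original:
print $req->parameters->{a};         # "1"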

It seems that either all the return values should be read-only (always returning the same values), or, if modifying them is supported, the parameters() hash should be updated when either the body_parameters() or query_parameters() hash is updated.

Reflections

An incoming HTTP request to your server is by its nature read-only. It’s analogous to a paper letter being delivered to you by postal mail.

It’s a perfect application for the immutable object design that Yuval Kogman eloquently advocates for. Plack::Request comes close to implementing the idea with mostly read-only accessors, but falls short. The gap it leaves unfortunately carries forward some possibilities for the action-at-a-distance cases that have been sources of bugs in the past. I’d like to see Plack::Request, or some alternative to it, with the holes plugged: it should copy the input, not modify it by reference, and parameter-related methods should also return copies rather than references to internal data structures.
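As a sketch of what “copy the input” could look like, here’s a hypothetical wrapper (My::ImmutableRequest is a name I made up, not a real module):

package My::ImmutableRequest;
use parent 'Plack::Request';

# Take a shallow copy of the environment at construction time, so
# later changes to the caller's hash can't reach this object.
# (Nested references would still be shared; a deep copy would be
# needed to plug that hole completely.)
sub new {
    my ($class, $env) = @_;
    return $class->SUPER::new({ %$env });
}

1;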

rocks near Greens Fork, Indiana

In my previous post I summarized the current state of Percent-encoding in Perl. One of my conclusions was that the perfect percent-encoding solution would automatically handle UTF-8 encoding, using logic like this:

utf8::encode $string if utf8::is_utf8 $string;

Respected Plack author miyagawa quickly responded with a post of his own to say that the above code approach is a bug, although the code pattern is already in wide use: it is present in Catalyst, CGI.pm (and by extension CGI::Application and other frameworks) as well as Mojo.

In one sense, he’s right. The pattern goes against the advice found in the official perlunifaq documentation, which states:

It’s better to pretend that the internal format is some unknown encoding, and that you always have to encode and decode explicitly.

In other words: don’t use the “is_utf8()” function.
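For contrast, here’s the explicitly-managed approach the FAQ recommends, as a minimal sketch (the variable names are mine):

use Encode qw(encode decode);

# Decode raw bytes into Perl's internal character strings at the
# input boundary...
my $string = decode('UTF-8', $raw_bytes);

# ...and explicitly encode back to UTF-8 bytes on the way out,
# without ever asking whether the internal UTF-8 flag is set:
my $octets = encode('UTF-8', $string);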

Before drawing a conclusion about whether this code pattern is the best design in practice, let’s look at some related facts about the matter.

This post is about the current state of URI encoding in Perl. This is the problem space of safely passing arbitrary text into and out of a URI format. If you’ve ever seen a space in a URL represented as “%20”, that’s the topic of the moment.

The best general introduction I’ve found on the topic is the Wikipedia page on Percent-encoding.
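To make the transformation concrete, here’s a quick example using URI::Escape, one module in this problem space (more on the available options below):

use URI::Escape qw(uri_escape uri_unescape);

my $escaped = uri_escape('this has spaces');   # "this%20has%20spaces"
my $plain   = uri_unescape($escaped);          # back to "this has spaces"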

RFCs on the topic include RFC 3986 from 2005, which defined the generic syntax of URIs. It replaced RFC 1738 from 1994, which defined Uniform Resource Locators (URLs), and RFC 1808 from 1995, which defined Relative Uniform Resource Locators. Sometimes this transformation is called “URI escaping” and sometimes it’s referred to as “URL encoding”. RFC 3986 clarified the naming issue:

“In general, the terms “escaped” and “unescaped” have been replaced with “percent-encoded” and “decoded”, respectively, to reduce confusion with other forms of escape mechanisms.”

Elsewhere it’s clarified that percent encoding applies to all URIs, not just URLs.

I think the Perl community would do well to adopt “percent encode URI” and “percent decode URI” as ways to describe this process that are unambiguous and in line with the RFC.

There are two URI percent-encoding solutions in Perl that seem to be in the widest use. Both have a significant deficiency.

A midwinter box bike ride

Recently I’ve been reviewing how various Perl frameworks and modules generate HTTP headers. After reviewing several approaches, it’s clear that there are two major camps: those which put the response headers in a specific order and those which don’t. You might expect one approach or the other to be more spec-compliant, but RFC 2616 provides conflicting guidance on this point.

The bottom line is that the spec says that “the order in which header fields with differing field names are received is not significant”. But then it goes on to say that it is a “good practice” (and it puts “good practice” in quotes) to order the headers a particular way. So, without strict guidance from the spec about the importance of header ordering, it would be interesting to know whether header order causes problems in practice.
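For reference, RFC 2616’s “good practice” is to send general-header fields first, then request- or response-header fields, and to end with the entity-header fields. An illustrative response header block in that order (hand-written, not the output of any particular module):

Date: Tue, 15 Nov 1994 08:12:31 GMT
Cache-Control: no-cache
Server: Apache
Content-Type: text/html; charset=UTF-8
Content-Length: 1234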

The Plack::Middleware::RearrangeHeaders documentation suggests there is some benefit to strict header ordering: “to work around buggy clients like very old MSIE or broken HTTP proxy servers”.

You might wonder what the big deal is— why not just stick to the “good practice” recommendation all the time? The difference can be seen in the benchmarks provided by HTTP::Headers::Fast. By ignoring the good-practice header order, an alternate implementation was able to speed up header generation to be about twice as fast. Considering that a web app needs to generate headers on every single request, making header generation smaller and faster is potentially a tangible win, while still being spec-compliant.
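A rough way to reproduce that comparison yourself, as a sketch (assuming both modules are installed; the particular fields are arbitrary):

use Benchmark qw(cmpthese);
use HTTP::Headers;
use HTTP::Headers::Fast;

my @fields = (
    'Content-Type'   => 'text/html; charset=UTF-8',
    'Content-Length' => 1234,
    'Cache-Control'  => 'no-cache',
);

# Compare how quickly each module can build and stringify headers:
cmpthese( -1, {
    original => sub { HTTP::Headers->new(@fields)->as_string },
    fast     => sub { HTTP::Headers::Fast->new(@fields)->as_string },
} );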

box biking at 10F

I don’t tap my own phone. I don’t xerox postcards before I mail them back from vacation. I don’t take a voice recorder when I go out with friends. And I don’t have a copy machine at home to duplicate handwritten notes I may send.

But if I send a message of equal importance by e-mail, then my e-mail program will automatically save a copy of every one of these messages.

E-mails I don’t need waste my time. They increase the time it takes to search and browse through old email. They increase the time it takes for my email to “sync” when I want to go offline. Continuing to save every e-mail I send perpetuates the unsustainable myth that as long as our actions are online they are “green”.

retired radiators

The PDF spec includes an option to cause PDFs to open full screen when users open them. I’m a fan of the feature because it maximizes screen real estate and creates a simple, focused experience for the reader. Using this option is one of my two essential tips for creating an impactful newsletter intended to be read online. The other tip is to use a “portrait” format document, to match the shape of most screens.

Many PDF viewers respond to PDFs that are set to open full screen, but a number of PDF generation tools don’t provide the option to set this preference when creating PDFs. I ran into this with Xournal, which is a nice application for Linux-based tablets, but offers no PDF export options.

So I found a way to update a pre-existing PDF to set the preference to have it open full screen by default. The key here is that PDF is a text-based format, so preferences in it can be updated manually by opening and editing the file according to the PDF spec, or the same effect can be accomplished with automated tools. In this case, I found that I needed to update a line that started like this:

<< /Type /Catalog

After /Catalog, this is all that needed to be added:

/PageMode /FullScreen

I automated this with a simple script that I named make-pdf-full-screen.sh. It works for the simple case when no “PageMode” has been declared, as in the Xournal case. I don’t expect it would update the PageMode properly if it was already declared. For a safer solution, consider opening the PDF in a text editor to manually set “/PageMode /FullScreen” on the initial /Catalog line. Alternatively, you could use a formal solution like PDF::API3::Compat::API2, which appears to have the features needed to solve this with Perl.
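For instance, a sketch of that approach with PDF::API2, the module that compatibility shim wraps (I haven’t verified this against every PDF::API2 version):

use PDF::API2;

my $pdf = PDF::API2->open('newsletter.pdf');
$pdf->preferences( -fullscreen => 1 );   # sets /PageMode /FullScreen
$pdf->saveas('newsletter-fullscreen.pdf');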

Here’s the contents of my little script to automate the update:

#!/bin/sh
# usage: make-pdf-full-screen.sh file.pdf
#   The file will be modified in place so that it opens full screen.
#   The current approach is naive... it assumes no Initial View has been defined.
# by Mark Stosberg
perl -pi -e 's?<< /Type /Catalog?<< /Type /Catalog /PageMode /FullScreen?' "$1"

Kent and Kurt on the Whitewater Gorge Trail

A few weeks ago I had my laptop stolen. Earlier that morning I had been reflecting and writing on the laptop about the intersection of our spiritual lives with our digital lives. And then, as if by divine intervention, my laptop disappeared— during church service no less— and I was given an even greater opportunity to answer the question: When we spend more time browsing the web, what is it that we are doing less of? When we spend more time checking e-mail, what are we doing less of? And when we spend more time on Facebook, what are we spending less time doing? Apparently, the answer in my case is cleaning my desk and organizing the garage. Those are the things I did more when I used the internet less. I joke about this, but I do envision my home as a place of rest and rejuvenation, yet I let clutter accumulate while I spent more time on my computer doing “productive” things.

There are many implications of shifting our increasingly precious free time online. Today I’d like to delve into the carbon footprint of our online lives.

You can use the audio player here to listen to a 15-minute version of the message delivered at my church (or you can download the audio file).

The message continues below the jump.

baby sleeps again

I’ve spent a lot of time recently triaging bugs for CGI.pm. I’ve enjoyed the process, and respect CGI.pm as a widely used Perl module. I’m not in love with all aspects of the module. I don’t use or recommend the HTML generation features— I recommend using HTML template files and HTML::FillInForm for filling them.

Whenever I think about how I’d like to change CGI.pm, what I have in mind is often the same choice that CGI::Simple made. There was a time years ago when I focused my attention on CGI::Simple and tried it in production, only to be bitten by a compatibility issue, so I reverted back to CGI.pm. I don’t remember what the specific issue was, and it’s likely been fixed by now. But the pragmatic point remained with me: CGI::Simple may have clean code and a good test suite, but it’s not necessarily free of defects, and in particular it lacks the vastly larger user base that CGI.pm has to provide real-world beta testing.

new bikes-at-work trailer

Get off the couch and pull your weight—
There’s a CGI.pm bug with your name on it.

There were nearly 150 active entries in the CGI.pm bug tracker when I was approved recently as a new co-maintainer. As I had time in the evenings after the baby was asleep, I went through and reviewed every one of these bug reports. Many had already been addressed by Lincoln some time ago. Those were simply closed. Still, I found about 20 fairly ready-to-go patches, and those have now been processed and released today as CGI.pm 3.45. Whenever code changes were made, I also strove to make sure new automated tests were added to cover those cases. You may be surprised how many methods in CGI.pm have no automated tests at all.

Now there are still about 50 open issues in the CGI.pm bug tracker. For these, I have tried to use the subject line to give some summary indication of what is needed to move each one forward, like “Needs Test: ”, “Needs Peer Review: ” or “Needs Confirmation”. Generally, I plan to wait patiently for volunteers to help with these. If you use CGI.pm, consider helping to move one of these forward.
