Results tagged “perl”

Perl's [prove]( is a great tool for running [TAP]( test suites. I use it for a large project with about 20,000 tests. Our test suite runs multiple times per day. At that scale and frequency, the run time is a concern. To address that, `prove` has a `-j` option, allowing you to run tests in parallel. Unfortunately, in our large test suite not all tests are "parallel ready". I needed a way to run *most* tests in parallel, but mark a few to be run in sequence. Ideally, I'd like these tests to be part of the same run as the parallel tests.

After much digging around, I found that `prove` has had support for adding exceptions to parallel test runs since 2008. The feature has just never been documented in `prove`, making it difficult to discover. Here's an example which works with the currently released version of `prove`, 3.25:

    # All tests are allowed to run in parallel, except those starting with "p"
    --rules='seq=t/p*.t' --rules='par=**'

For more details and documentation, you can see my related [pull request]( You can leave a comment there to advocate for or against merging the patch, or you can download the [raw patch]( directly to apply to your own copy of the Test-Harness-3.25 distribution.

**UPDATE 9/14/2012:** It turns out this doesn't work quite as I thought. I tried tweaking some of the internals of `prove` further, but hit a bug. See the details in [my follow-up to the Perl-QA list]( I changed my approach and started working on making my exceptional tests parallel-safe. Perhaps I can cover some of the techniques I used for that in a future post.
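For completeness, and subject to the caveat in the update above, here's how those rules combine with `-j` on the command line. The worker count and test directory are illustrative placeholders, not values from my actual project:

    # Run up to 4 tests at once, keeping the t/p*.t tests sequential
    prove -j 4 --rules='seq=t/p*.t' --rules='par=**' t/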
For me, one of the promises of [PSGI and Plack]( is to get away from programming with global variables in Perl, particularly from being able to modify the request object after it's been created. Long ago, CGI.pm replaced using a global hash with a "query object", but that object has essentially been used as an always-accessible read/write data structure.

At first look, it would appear that improving this might be a fundamental part of the new system: a lexical environment hashref is explicitly passed around, as seen here:

    Plack::Request->new($psgi_env);

vs the old implicit grab from the environment:

    CGI->new;

You might hope to be escaping some of the action-at-a-distance badness of global variables, like having a change to the environment alter your object *after the object is created.* You might also like changes made to your object to avoid altering the global environment. Not only does Plack::Request require an explicit environment hash to be passed in, but nearly all its methods are read-only, including a param() method inspired by a read/write CGI.pm method of the same name. That's all good.

This would all seem to speak to safety and simplicity for using Plack::Request, but the reality turns out to be far muddier than you might hope. I encourage you to download and run this short, safe [interactive perl script](/blog/2011/02/ which illustrates some differences. It shows that:

* Plack::Request objects can be altered after they are created by changing the external environment.
* Modifying a Plack::Request object can potentially alter the external environment hash (something which CGI.pm explicitly does not allow).

In effect, the situation with global variables is in some regards worse. Plack::Request provides the impression that there is a move away from action-at-a-distance programming, but the fundamental properties of being affected by global changes and locally creating them are still present.

On the topic of surprising read/write behavior in Plack::Request, you may also be interested to note that the behavior of query\_parameters(), body\_parameters() and parameters() is not consistent in this regard. I submitted [tests and a suggestion to clarify this](, although that contribution has not yet been accepted. Here's the deal: the hashrefs returned by query\_parameters(), body\_parameters() and parameters() are all read/write -- subsequent calls to the same method return the modified hashref. However, modifying the hashes returned by body\_parameters() or query\_parameters() does not modify the hashref returned by parameters(), which claims to be a merger of the two. It seems that either all the return values should be read-only (always returning the same values), or, if modifying them is supported, then the parameters() hash should be updated when either of the body\_parameters() or query\_parameters() hashes is updated.

## Reflections

An incoming HTTP request to your server is by its nature read-only. It's analogous to a paper letter being delivered to you by postal mail. It's a perfect application for the immutable object design that Yuval Kogman [eloquently advocates for]( Plack::Request comes close to implementing the idea with mostly read-only accessors, but falls short. The gap it leaves unfortunately carries forward some possibilities for the action-at-a-distance cases that have been sources of bugs in the past. I'd like to see Plack::Request, or some alternative to it, with the holes plugged: it should copy the input, not modify it by reference, and parameter-related methods should also return copies rather than references to internal data structures.
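As a coda, here's a minimal sketch of the first point above. The linked script is more thorough; this assumes only Plack is installed, and the abbreviated environment omits PSGI keys a real server would supply:

    use Plack::Request;

    # A hand-rolled, abbreviated PSGI environment for demonstration.
    my $env = {
        REQUEST_METHOD => 'GET',
        PATH_INFO      => '/before',
        QUERY_STRING   => '',
    };
    my $req = Plack::Request->new($env);

    print $req->path_info, "\n";    # "/before"

    # Mutating the external environment hash after the object exists...
    $env->{PATH_INFO} = '/after';

    # ...is visible through the already-created request object, because
    # Plack::Request holds a reference to the hash rather than a copy.
    print $req->path_info, "\n";    # "/after"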
In my [previous post]( I summarized the current state of percent-encoding in Perl. One of my conclusions was that the perfect percent-encoding solution would automatically handle UTF-8 encoding, using logic like this:

    utf8::encode $string if utf8::is_utf8 $string;

Respected Plack author miyagawa quickly responded [in a response post]( to say that the above approach is a bug, although the code pattern is already in wide use: it is present in Catalyst, CGI.pm (and by extension CGI::Application and other frameworks), as well as Mojo. In one sense, he's right. The pattern goes against the advice found in the official [perlunifaq]( documentation, which states that

> It's better to pretend that the internal format is some unknown encoding, and
> that you always have to encode and decode explicitly.

In other words: don't use the "is_utf8()" function. Before drawing a conclusion about whether this code pattern is the best design *in practice*, let's look at some related facts about the matter.
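Before that, for reference, here is a sketch of the explicit style perlunifaq recommends instead. The sample string is my own, assumed to already hold decoded characters:

    use Encode qw(encode);

    my $string = "caf\x{e9}";    # decoded characters ("café")

    # Encode explicitly at the output boundary, treating the internal
    # format as opaque rather than checking the UTF-8 flag.
    my $bytes = encode('UTF-8', $string);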
This post is about the current state of URI encoding in Perl. This is the problem space of being able to safely pass arbitrary text into and out of a URI format. If you've ever seen a space in a URL represented as "%20", that's the topic of the moment.

The best general introduction I've found on the topic is the [Wikipedia page on Percent-encoding]( RFCs on the topic include the 2005 [RFC 3986]( which defined the generic syntax of URIs. It replaces [RFC 1738]( from 1994, which defined Uniform Resource Locators (URLs), and [RFC 1808]( from 1995, which defined Relative Uniform Resource Locators.

Sometimes this transformation is called "URI escaping" and sometimes it's referred to as "URL encoding". RFC 3986 clarified the naming issue:

> "In general, the terms "escaped" and "unescaped" have been replaced with
> "percent-encoded" and "decoded", respectively, to reduce confusion with other
> forms of escape mechanisms."

Elsewhere it's clarified that percent-encoding applies to all [URIs, not just URLs]( I think the Perl community would do well to adopt "percent encode URI" and "percent decode URI" as ways to describe this process that are unambiguous and in line with the RFC.

There are two URI percent-encoding solutions in Perl that seem to be in the widest use. Both have a significant deficiency.
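For illustration of the transformation itself, here's a round trip with URI::Escape, one widely used CPAN module for the job. The sample string is mine:

    use URI::Escape qw(uri_escape uri_unescape);

    # Percent-encode a string for safe inclusion in a URI, and back.
    my $encoded = uri_escape('10% of $5');   # "10%25%20of%20%245"
    my $decoded = uri_unescape($encoded);    # "10% of $5" again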
Recently I've been reviewing how various Perl frameworks and modules generate HTTP headers. After reviewing several approaches, it's clear that there are two major camps: those which put the response headers in a specific order and those which don't. Surely one approach or the other would seem to be more spec-compliant, but RFC 2616 provides [conflicting guidance on this point]( The bottom line is that the spec says that *"the order in which header fields with differing field names are received is not significant"*. But then it goes on to say that it is a "good practice" (and it puts "good practice" in quotes) to order the headers a particular way.

So, without strict guidance from the spec about the importance of header ordering, it would be interesting to know if header order causes problems in practice. The [Plack::Middleware::RearrangeHeaders]( documentation suggests there is some benefit to strict header ordering: *"to work around buggy clients like very old MSIE or broken HTTP proxy servers"*.

You might wonder what the big deal is-- why not just stick to the "good practice" recommendation all the time? The difference can be seen in the benchmarks provided by [HTTP::Headers::Fast]( By ignoring the good-practice header order, an alternate implementation was able to speed up header generation to be about twice as fast. Considering that a web app needs to generate a header on every single request, making header generation smaller and faster is potentially a tangible win, while still being spec-compliant.
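For a concrete look at the order-enforcing camp, here's a sketch using HTTP::Headers from libwww-perl, which (as I understand it) is the module whose "good practice" sorting HTTP::Headers::Fast trades away for speed. The header values are made up for illustration:

    use HTTP::Headers;

    my $h = HTTP::Headers->new;
    $h->header('Content-Type' => 'text/plain');
    $h->header('Connection'   => 'close');
    $h->header('Date'         => 'Sat, 01 Jan 2011 00:00:00 GMT');

    # as_string emits general headers like Date and Connection before
    # entity headers like Content-Type, regardless of insertion order.
    print $h->as_string;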
The [PDF spec]( includes an option to cause a PDF to open full screen when users open it. I'm a fan of the feature because it maximizes screen real estate and creates a simple, focused experience for PDF readers. Using this option is one of my two essential tips for creating an impactful newsletter targeted at being read online. The other tip is to use a "portrait" format document, to match the shape of most screens.

Many PDF viewers respond to PDFs that are set to open full screen, but a number of PDF generation tools don't provide an option to set this preference when creating PDFs. I ran into this with [Xournal]( which is a nice application for Linux-based tablets, but offers no PDF export options. So I found a way to update a pre-existing PDF to set the preference to have it open full screen by default.

The key here is that PDF is a text-based format, so preferences in it can be updated manually by opening and editing the file according to the PDF spec, or the same effect can be accomplished with automated tools. In this case, I found that I needed to update a line that started like this:

    << /Type /Catalog

After `/Catalog`, this is all that needed to be added:

    /PageMode /FullScreen

I automated this with a simple script. It works for the simple case when no "PageMode" has been declared, as in the Xournal case. I don't expect it would update the PageMode properly if it was already declared. For a safer solution, consider opening the PDF in a text editor to manually set "/PageMode /FullScreen" on the initial `/Catalog` line. Alternatively, you could use a formal solution like [PDF::API3::Compat::API2]( which appears to have the features needed to solve this with Perl.

Here's the contents of my little script to automate the update:

    #!/bin/sh
    # usage: file.pdf
    # The file will be modified in place so that it opens full screen.
    # The current approach is naive... it assumes no Initial View has been defined.
    # by Mark Stosberg
    perl -pi -e 's?<< /Type /Catalog?<< /Type /Catalog /PageMode /FullScreen?' $1
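For the formal route, here's a hedged sketch using the PDF::API2 interface (which PDF::API3::Compat::API2 appears to mirror); I haven't verified it against the compat module itself, and the file names are placeholders:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use PDF::API2;

    # Open an existing PDF, ask viewers for full-screen mode, save a copy.
    my $pdf = PDF::API2->open('newsletter.pdf');
    $pdf->preferences(-fullscreen => 1);
    $pdf->saveas('newsletter-fullscreen.pdf');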
Get off the couch and pull your weight--
There's a bug with your name on it.
There were nearly 150 active entries in the CGI.pm bug tracker when I was approved recently as a new co-maintainer. As I had time in the evenings after the baby was asleep, I went through and reviewed every one of these bug reports. Many had already been addressed by Lincoln some time ago. Those were simply closed. Still, I found about 20 fairly ready-to-go patches, and those have now been processed and [released today as 3.45]( Whenever code changes were made, I also strived to make sure new automated tests were added that covered those cases. You may be surprised how many methods in CGI.pm have no automated tests for them at all.

Now there are still about 50 open issues in the [CGI.pm bug tracker]( For these, I have tried to use the subject line to give some summary indication of what is needed to move each one forward, like "Needs Test: ", "Needs Peer Review: " or "Needs Confirmation". Generally, I plan to wait patiently for volunteers to help with these. If you use CGI.pm, consider helping to move one of these forward.
Today Melody was announced as a fork of the perl-based Movable Type platform. I helped the Melody project as it prepared to launch, in part advising on how best to relate to the Perl community. One of the stated interests of Melody is to refactor the project to use CGI::Application, which I maintain. Tim Appnel has already spelled out a vision of what a "CPANization" of Movable Type might look like, and I've looked in depth at what the initial steps towards using CGI::Application could be.

My own vision for Melody is a code base that's very focused on publishing and content management, with all the infrastructure outsourced to CPAN modules that are well-written, well-documented, and well-tested. The collaboration between Melody and CPAN would be a two-way code flow. While there are more CPAN modules that Melody could make use of, there are a number of pieces of Melody which should be packaged as independent modules on their own and released to CPAN.

One example is the great "dirification" that already exists in Movable Type. This is the functionality that turns any given string of words into a reasonable representation in URLs. It seems like an easy problem on the surface, but Movable Type has a sophisticated solution that takes into account what it means to do this well across many different languages. I also couldn't find any existing CPAN module which already takes on this problem space, so I started to extract this out of Movable Type myself and published a draft of String::Dirify. For that initial release, I ripped out all the fancy multi-language support, and there is still more significant work to be done to untangle this layer from Movable Type. (If you want to pick up that project and work on it, there's also some discussion of testing String::Dirify.)
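Here's roughly what the draft module's interface looks like in use; the sample string and the exact output shown are illustrative:

    use String::Dirify;

    # Reduce an arbitrary string to a URL-friendly form.
    my $dir = String::Dirify->dirify('Make Way for Melody!');
    # e.g. "make_way_for_melody"

    # An optional second argument chooses the separator character.
    my $dashed = String::Dirify->dirify('Make Way for Melody!', '-');
    # e.g. "make-way-for-melody"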

While Movable Type already had an open source release, I expect Melody to have a more adventurous evolution, and I look forward to it becoming a shining star in the Perl community, not just for its exterior functionality, but also because the internals have an opportunity to become an example of best practices.
Titanium [1.01 was released]( recently. The new release includes a README with clearer instructions on how to install Titanium if you are not already familiar with installing modules from [CPAN]( Initial feedback on Titanium has been positive. A couple of recent quotes from users: "Titanium is much, much simpler [than Catalyst] and has the advantages that entails." [1](, "CGI::Application and Titanium (including modules like HTML::Template and HTML::FillInForm) are simple to use, work with all of the authentication stuff that I interface with, and scale perfectly for the number of users that I typically have." [2]( Simplicity is a goal of Titanium, and our feedback confirms our success with it.
This weekend I spent some quality time with the HTTP cookie specs ([RFC 2109]( and [RFC 2965]( ), and looked closely at how cookie parsing and handling is done in three Perl frameworks: [Titanium](, [Catalyst]( and [Mojo]( Titanium uses [CGI::Cookie]( by default, while Catalyst uses [CGI::Simple::Cookie]( and Mojo uses built-in modules including [Mojo::Cookie::Request]( I'll look at these solutions through the filters of Standards, Security, and Convenience.

## Standards: Max-Age, Set-Cookie2 and commas

Max-Age is a cookie attribute which gives the expiration time as a relative value. This is considered a more secure replacement for the "Expires" attribute, which gives the time as an absolute value, making it vulnerable to clock skew on users' systems. CGI.pm and Mojo support it, but CGI::Simple does not. This is potentially an issue for Catalyst users: they may believe they have Max-Age support because the documentation refers them to CGI::Cookie, but they actually don't, because they are using CGI::Simple::Cookie.

Set-Cookie2 is a standard from 2000 intended to replace Set-Cookie, which became a standard in 1997. Mojo is the only one of the three that supports it. However, Set-Cookie2 [never caught on]( Firefox 3 doesn't even support it, and neither does IE 6. Still, I like the idea of deciding for myself about supporting new standards, rather than having tools that only support older standards. Mojo wins here.

The RFCs say that servers should accept a comma as well as a semicolon between cookie values. CGI.pm and Mojo comply here; CGI::Simple does not. (I've submitted a [patch to address this](, along with fixes for a few other places where I felt CGI::Simple cookie parsing lagged behind CGI.pm.)

## Security

CGI::Simple cookies are potentially less secure because they lack "Max-Age" support. Mojo's cookie implementation appears to be vulnerable to an injection attack in which untrusted data in a cookie value can write a new HTTP body. I have notified the developers of my findings there. CGI.pm and CGI::Simple both avoid the injection attack by URI-encoding the cookie values (a spec-compliant solution).

## Convenience

CGI.pm and CGI::Simple share several convenient user-interface features which Mojo currently lacks. They allow you to set multiple values for a single cookie, including setting a hashref. They also provide a convenient shorthand for giving expiration times, like "+10m" for "10 minutes in the future". Mojo lacks these features. (A sketch of these conveniences appears at the end of this post.) If you have a Catalyst app that uses the multiple-values feature, a port to Mojo could mean a painful cookie transition, since Mojo does not have a built-in understanding of the format CGI.pm uses to store cookie values. (This detail is not dictated by the cookie spec, so both value formats are "spec compliant".)

## Conclusions

Sebastian Riedel, the Mojo author, promotes Mojo as being focused on standards. From my findings here, I have to agree that Mojo is a leader here, though currently at the expense of a potentially serious security issue, and lacking some usability features that the others offer.

CGI::Simple has a reputation for being a lighter and better engineered version of CGI.pm. Certainly the overall design and focus of CGI::Simple is an improvement. But the reality is that CGI::Simple was forked from CGI.pm in 2001. CGI.pm has received many improvements since then, including improved cookie handling, like adding support for "Max-Age". However, CGI::Simple doesn't seem to make a point of tracking and merging improvements that originate in CGI.pm. CGI::Simple is perhaps more like a lighter, tighter alternative to CGI.pm as it existed several years ago.
The mature-but-maligned CGI.pm comes out faring the best for cookie handling in my opinion. It did not have any of the potential security issues I found with the other two, and it has a range of convenient methods for cookie access. But as a final note, I encourage you to check with the specific projects for the most current information, as some of the deficiencies I found here may already be addressed.
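As promised above, here's a sketch of the CGI::Cookie conveniences discussed in this post. The cookie name and values are made up for illustration:

    use CGI::Cookie;

    # Multiple values for one cookie via a hashref, plus the relative
    # expiration shorthand (lowercase "m" is minutes, uppercase "M" months).
    my $cookie = CGI::Cookie->new(
        -name    => 'prefs',
        -value   => { color => 'red', size => 'large' },
        -expires => '+3M',
    );

    # CGI::Cookie objects stringify to a Set-Cookie header value.
    print "Set-Cookie: $cookie\r\n";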
There's a lot of trash talk among professional web programmers regarding vanilla CGI, like Stevan Little's [recent comment]( *"There is no excuse to still use vanilla CGI as it is simply just a waste of resources"*. As an experienced professional website developer myself, I find that CGI has its place. First, let's recap what we're talking about.
What's the minimum time it could take to serve a web page using a given Perl-based solution with CGI? What's the minimum amount of memory it would take? To check the relative differences between the several options listed below, I made the simplest possible "Hello World" page and benchmarked the time and memory used to serve it. To create a baseline, I also measured the results for a bare Perl script that just prints "Hello World". The result summary is after the jump.
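The baseline was presumably something close to this bare CGI responder (my reconstruction, not the exact script used):

    #!/usr/bin/perl
    # Minimal CGI: print the header, a blank line, then the body.
    print "Content-Type: text/html\r\n\r\n";
    print "Hello World\n";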
Here's my take on [Mojo]( 0.8.7, a new web framework for Perl. The primary author of Mojo, Sebastian Riedel, was once a primary contributor to Catalyst. There are clearly some similarities, and it's easy for me to see Mojo as an evolution of Catalyst.

One major difference from Catalyst sparked my interest in Mojo. Catalyst now depends on Moose among other things, with a very long overall dependency chain. How long? I downloaded Catalyst 5.8 along with all of its non-core Perl module dependencies. The result was over **250** modules, not counting the Catalyst modules themselves, or anything in the Test::* name space. Bleh.

*Want to see for yourself? Get my "self-contained" patch out of the local::lib tracker and run the following. It will install Catalyst and all its non-core dependencies into a "Catalyst" folder. Be aware, this could take half an hour or more... (The TinyURL points to a Catalyst 5.8 tarball.)*

    perl -MCPAN -Mlocal::lib=--self-contained,Catalyst \
      -e 'install ""'

Leaning on dependencies can be a great thing. It works well when you are able to outsource part of your needs to an external module that is already well written, well documented and well tested. I'm sure there's some of that happening in the Catalyst dependency chain. But there's also a good deal of duplication, as different authors solve things in different ways. For example: Exporter.pm, Sub::Exporter and Moose::Exporter all serve the same function, and are dependencies somewhere along the way. Class::Accessor::Fast competes with MooseX::Emulate::Class::Accessor::Fast (a sketch of that overlap appears at the end of this post). And this is where a long dependency chain can start to look and feel like bloat, and it can be difficult to overcome if the owners of the dependencies don't share the project's preferences about how to export subroutines or build accessors. It could perhaps be said, though, that Mojo suffers from re-using too little. Mojo::Base is yet another accessor-generation solution, like Class::Accessor.

The potential I see in Mojo is summed up in the following:

* No dependency chain, for less complexity and easy deployment
* Built-in support for several backends, for portability
* A rewrite of HTTP request and response objects, as a sanely designed evolution of what CGI.pm has been used for
* No ties to a specific framework design beneath the server/request-response object layer, for flexibility and potential code sharing between frameworks based on it.

It is this last item that has allowed me to ignore the bundled Mojolicious framework for this review-- it's not required for use with Mojo and deserves its own review.

Overall, I feel positive about the Mojo project, although I have no current plans to quit developing with [Titanium]( myself. In theory, they could be used together. Mojo could provide the backend-server support and query object, and CGI::Application could run inside the Mojo handler() in much the same way CGI::Application apps can run in a mod_perl handler. Now, CGI::Application can already run under various servers and with different query objects, so whether or not you'd actually want to use CGI::Application with Mojo is left up to the reader.

I don't recommend using Mojo yet-- it needs more documentation and tests for my taste. But it's Mojo's clean, scalable and extensible design that makes it a project worth following. I'll be keeping my eye on Mojo.

*This post is being [discussed on](*
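As a footnote to the accessor-duplication point above, here's the Class::Accessor::Fast idiom that MooseX::Emulate::Class::Accessor::Fast re-creates on top of Moose, and that Mojo::Base covers yet again with its own API. The package and field names are invented for illustration:

    package My::Widget;
    use base 'Class::Accessor::Fast';

    # Generate simple read/write accessors for these fields.
    __PACKAGE__->mk_accessors(qw(name size));

    package main;
    my $w = My::Widget->new({ name => 'sprocket' });
    print $w->name, "\n";   # "sprocket"
    $w->size(42);           # accessors are read/write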