<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Mark Stosberg</title>
    <link rel="alternate" type="text/html" href="http://mark.stosberg.com/blog/" />
    <link rel="self" type="application/atom+xml" href="http://mark.stosberg.com/blog/atom.xml" />
    <id>tag:mark.stosberg.com,2008-09-06:/blog/2</id>
    <updated>2012-09-15T01:55:21Z</updated>
    <subtitle>balancing simplicity and technology in Richmond, Indiana</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.35-en</generator>

<entry>
    <title>Running Perl tests in parallel with &quot;prove&quot;, with some exceptions</title>
    <link rel="alternate" type="text/html" href="http://mark.stosberg.com/blog/2012/08/running-perl-tests-in-parallel-with-prove-with-some-exceptions-2.html" />
    <id>tag:mark.stosberg.com,2012:/blog//2.357</id>

    <published>2012-08-28T01:18:27Z</published>
    <updated>2012-09-15T01:55:21Z</updated>

    <summary> Perl&#8217;s prove is a great tool for running TAP-based test suites. I use it for a large project with about 20,000 tests. Our test suite runs multiple times per day. At that scale and frequency, the run time is...</summary>
    <author>
        <name>Mark Stosberg</name>
        <uri>http://mark.stosberg.com/</uri>
    </author>
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://mark.stosberg.com/blog/">
        <![CDATA[<div class="floatimgright"><a href="http://www.flickr.com/photos/markstos/7648079964/" title="Bicycling saves lives by Mark Stosberg, on Flickr"><img src="http://farm9.staticflickr.com/8001/7648079964_0a6500a9fc_m.jpg" width="240" height="180" alt="Bicycling saves lives"></a></div>

<p>Perl&#8217;s <a href="https://metacpan.org/module/prove">prove</a> is a great tool for running <a href="http://testanything.org/">TAP</a>-based test suites. I use it for a large project with about 20,000 tests. Our test suite runs multiple times per day.  At that scale and frequency, the run time is a concern.</p>

<p>To address that, <code>prove</code> has a <code>-j</code> option, allowing you to run tests in parallel. Unfortunately, in our large test suite not all tests are &#8220;parallel ready&#8221;. I needed a way to run <em>most</em> tests in parallel, but mark a few to be run in sequence. Ideally, I&#8217;d like these tests to be part of the same run as the parallel tests. </p>

<p>After much digging around, I found that <code>prove</code> has had support for adding exceptions to parallel test runs since 2008. The feature has just never been documented in <code>prove</code>, making it difficult to discover.  Here&#8217;s an example which works with the currently released version of <code>prove</code>, 3.25: </p>

<pre><code># All tests are allowed to run in parallel, except those starting with "p"
--rules='seq=t/p*.t' --rules='par=**'
</code></pre>

<p>For more details and documentation, you can see my related <a href="https://github.com/Perl-Toolchain-Gang/Test-Harness/pull/5">pull request</a>. You can leave a comment there to advocate for or against merging the patch, or you can download the <a href="https://github.com/Perl-Toolchain-Gang/Test-Harness/pull/5.patch">raw patch</a> directly to apply to your own copy of the Test-Harness-3.25 distribution. </p>

<p><em>UPDATE 9/14/2012:</em> It turns out it doesn&#8217;t work quite as I thought. I tried tweaking some of the internals of <code>prove</code> further, but hit a bug. See the details in <a href="http://www.nntp.perl.org/group/perl.qa/2012/08/msg13251.html">my follow-up to the Perl-QA list</a>. I changed approach and started working on making my exceptional tests parallel-safe. Perhaps I can cover some of the techniques I used for that in a future post. </p>
]]>
        

    </content>
</entry>

<entry>
    <title>Plack::Request: not as read-only as it might seem</title>
    <link rel="alternate" type="text/html" href="http://mark.stosberg.com/blog/2011/02/plackrequest-not-as-read-only-as-it-might-seem.html" />
    <id>tag:mark.stosberg.com,2011:/blog//2.350</id>

    <published>2011-02-05T14:24:16Z</published>
    <updated>2011-02-16T03:08:40Z</updated>

    <summary> For me, one of the promises of PSGI and Plack is to get away from programming with global variables in Perl, particularly being able to modify the request object after it&#8217;s been created. Long ago, CGI.pm replaced using a...</summary>
    <author>
        <name>Mark Stosberg</name>
        <uri>http://mark.stosberg.com/</uri>
    </author>
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="cgipm" label="CGI.pm" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="plack" label="plack" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="psgi" label="psgi" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://mark.stosberg.com/blog/">
        <![CDATA[<div class="floatimgright"><a href="http://www.flickr.com/photos/markstos/5112912016/" title="leaves falling off the tree by Mark Stosberg, on Flickr"><img src="http://farm2.static.flickr.com/1414/5112912016_997ee5d5c0_m.jpg" width="180" height="240" alt="leaves falling off the tree" /></a></div>

<p>For me, one of the promises of <a href="http://plackperl.org">PSGI and Plack</a> is
to get away from programming with global variables in Perl, particularly
being able to modify the request object after it&#8217;s been created. Long
ago, CGI.pm replaced using a global <code>%FORM</code> hash with a &#8220;query
object&#8221;, but it has essentially been used as an always-accessible
read/write data structure.</p>

<p>It would appear at first look that improving this might be a fundamental
part of the new system: A lexical environment hashref is explicitly
passed around, as seen here:</p>

<p><code>Plack::Request->new($psgi_env);</code></p>

<p>vs the old implicit grab from the environment:</p>

<p><code>CGI->new;</code></p>

<p>You might hope to be escaping some of the action-at-a-distance badness of global
variables, like having a change to the environment alter your object <em>after the
object is created.</em>  You might also like to keep changes made to your object
from altering the global environment.</p>

<p>Not only does Plack::Request require an explicit environment hash to be
passed in, but nearly all of its methods are read-only, including a param()
method inspired by a read/write method from CGI.pm of the same name.
That&#8217;s all good.</p>

<p>This would all seem to speak to safety and simplicity for using Plack::Request,
but the reality turns out to be far muddier than you might hope. I encourage
you to download and run this short, safe <a href="/blog/2011/02/plack_vs_cgi.pl">interactive perl script</a> which
illustrates some differences. It shows that:</p>

<ul>
<li>Plack::Request objects can be altered after they are created by changing the
external environment.</li>
<li>Modifying a Plack::Request object can potentially alter the external environment hash
(Something which CGI.pm explicitly does not allow).</li>
</ul>

<p>In effect, the situation with global variables is in some regards worse.
Plack::Request provides the impression that there is a move away from
action-at-a-distance programming, but the fundamental properties of being
affected by global changes and locally creating them are still present.</p>

<p>On the topic of surprising read/write behavior in Plack::Request, you may also
be interested to note that the behavior of <code>query_parameters()</code>, <code>body_parameters()</code> and
<code>parameters()</code> is not consistent in this regard. I submitted <a href="https://github.com/markstos/Plack/commit/0aefed0900eeb2281eb0fe925bd18dfd60076b1e">tests and
a suggestion to clarify
this</a>,
although that contribution has not yet been accepted.</p>

<p>Here&#8217;s the deal: The hashrefs returned by <code>query_parameters()</code>,
<code>body_parameters()</code> and <code>parameters()</code> are all read/write: subsequent calls to
the same method return the modified hashref.</p>

<p>However, modifying the hashes returned by <code>body_parameters()</code> or
<code>query_parameters()</code> does not modify the hashref returned by <code>parameters()</code>, which
claims to be a merger of the two.</p>

<p>It seems that either all the return values should be read-only (always
returning the same values), or, if modifying them is supported, the
<code>parameters()</code> hash should be updated when either of the
<code>body_parameters()</code> or
<code>query_parameters()</code> hashes are updated.</p>

<h2>Reflections</h2>

<p>An incoming HTTP request to your server is by its nature read-only. It&#8217;s
analogous to a paper letter being delivered to you by postal mail.</p>

<p>It&#8217;s a perfect application for the immutable object design that Yuval
Kogman <a href="http://blog.woobling.org/2009/05/immutable-data-structures.html">eloquently advocates
for</a>.
Plack::Request comes close to implementing the idea with mostly read-only
accessors, but falls short. The gap it leaves unfortunately carries forward some
possibilities for action-at-a-distance cases that have been sources of bugs
in the past. I&#8217;d like to see Plack::Request, or some alternative to it, with
the holes plugged: It should copy the input, not modify it by reference, and
parameter-related methods should also return copies rather than references to
internal data structures.</p>
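
<p>As a rough sketch of the copy-in, copy-out idea (this is not Plack::Request&#8217;s code): the hypothetical <code>My::ReadOnlyRequest</code> class below copies the environment hash in its constructor and hands out copies from its accessor. The class and method names are made up for illustration, and the copies are shallow, which is only enough for a flat hash of plain strings.</p>

<pre><code>use strict;
use warnings;

package My::ReadOnlyRequest;    # hypothetical class, for illustration only

sub new {
    my ($class, $env) = @_;
    # Shallow-copy the caller's hash so later changes to it cannot reach us.
    my %copy = %$env;
    return bless { env => \%copy }, $class;
}

sub env {
    my ($self) = @_;
    # Hand out a fresh copy rather than a reference to internal data.
    my %copy = %{ $self->{env} };
    return \%copy;
}

package main;

my $env = { REQUEST_METHOD => 'GET', QUERY_STRING => 'a=1' };
my $req = My::ReadOnlyRequest->new($env);

$env->{QUERY_STRING}        = 'a=2';     # change the external environment
$req->env->{REQUEST_METHOD} = 'POST';    # change a returned copy

print $req->env->{QUERY_STRING},   "\n";    # a=1
print $req->env->{REQUEST_METHOD}, "\n";    # GET
print $env->{REQUEST_METHOD},      "\n";    # GET
</code></pre>

<p>Neither change crosses the boundary: the object keeps the values it was built with, and the caller&#8217;s hash is untouched by anything done to the returned copy.</p>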
]]>
        

    </content>
</entry>

<entry>
    <title>Best practice for handling UTF-8 when percent-encoding? </title>
    <link rel="alternate" type="text/html" href="http://mark.stosberg.com/blog/2010/12/best-practice-for-handling-utf-8-when-percent-encoding.html" />
    <id>tag:mark.stosberg.com,2010:/blog//2.346</id>

    <published>2010-12-20T03:47:29Z</published>
    <updated>2010-12-20T04:02:12Z</updated>

    <summary> In my previous post I summarized the current state of Percent-encoding in Perl. One of my conclusions was that the perfect percent-encoding solution would automatically handle UTF-8 encoding, using logic like this: utf8::encode $string if utf8::is_utf8 $string; Respected Plack...</summary>
    <author>
        <name>Mark Stosberg</name>
        <uri>http://mark.stosberg.com/</uri>
    </author>
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="percentencoding" label="percent-encoding" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="Perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="unicode" label="Unicode" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="utf8" label="UTF-8" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://mark.stosberg.com/blog/">
        <![CDATA[<div class="floatimgright"><a href="http://www.flickr.com/photos/markstos/5084789203/" title="rocks near Greens Fork, Indiana by Mark Stosberg, on Flickr"><img src="http://farm5.static.flickr.com/4113/5084789203_6bf6e7ff29_m.jpg" width="180" height="240" alt="rocks near Greens Fork, Indiana" /></a></div>

<p>In my <a href="http://mark.stosberg.com/blog/2010/12/percent-encoding-uris-in-perl.html">previous post</a> I summarized the current state of Percent-encoding
in Perl. One of my conclusions was that the perfect percent-encoding solution
would automatically handle UTF-8 encoding, using logic like this:</p>

<p><code>utf8::encode $string if utf8::is_utf8 $string;</code></p>

<p>Respected Plack author miyagawa quickly replied <a href="http://bulknews.typepad.com/blog/2010/12/re-percent-encoding-uris-in-perl-mark-stosberg.html">in a response post</a> to say that the above code approach is a bug, although the
code pattern is already in wide use, as it is present in Catalyst, CGI.pm (and by extension CGI::Application and other frameworks) as well as Mojo.</p>

<p>In one sense, he&#8217;s right. The pattern goes against the advice found in the official <a href="http://perldoc.perl.org/perlunifaq.html#What-is-%22the-UTF8-flag%22%3f">perlunifaq</a> documentation which states that</p>

<blockquote>
  <p>It&#8217;s better to pretend that the internal format is some unknown encoding, and
  that you always have to encode and decode explicitly.</p>
</blockquote>

<p>In other words: don&#8217;t use the &#8220;is_utf8()&#8221; function.</p>
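
<p>A minimal sketch of why the flag is a shaky thing to branch on, using only functions built into Perl. The helper name <code>flag_based_encode</code> is made up for this illustration and is not taken from any of the modules mentioned. It applies the pattern above and then percent-encodes every octet, so the difference is easy to see.</p>

<pre><code>use strict;
use warnings;

# Hypothetical helper: UTF-8 encode only if the internal flag is set,
# then percent-encode every octet of the result.
sub flag_based_encode {
    my ($string) = @_;
    utf8::encode($string) if utf8::is_utf8($string);
    return join '', map { sprintf('%%%02X', ord($_)) } split //, $string;
}

my $plain    = "\x{e5}";    # LATIN SMALL LETTER A WITH RING ABOVE
my $upgraded = "\x{e5}";
utf8::upgrade($upgraded);   # same character, different internal storage

print $plain eq $upgraded ? "equal\n" : "different\n";    # equal
print flag_based_encode($plain),    "\n";                 # %E5
print flag_based_encode($upgraded), "\n";                 # %C3%A5
</code></pre>

<p>The two strings compare as equal, yet the flag check sends them down different paths and the encoded results differ.</p>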

<p>Before drawing a conclusion whether this code pattern is the best design <em>in practice,</em>
let&#8217;s look at some related facts about the matter. </p>
]]>
        <![CDATA[<ul>
<li>The <a href="http://www.w3.org/International/O-URL-code.html">W3C is clear</a> that it is a best practice to convert text to UTF-8 before percent encoding it.</li>
<li>However, as Miyagawa points out, only newer protocols, since about 2005, are expected to follow that. The HTTP protocol is far older, and it is valid to pass binary data through HTTP query strings.</li>
<li>But conveniently, Perl reads binary data as 8-bit characters. You can read in <code>/usr/bin/perl</code>, pass it to <code>uri_escape</code>, and it will percent-encode it just fine&#8212; without blowing up due to the &#8220;hi-bit&#8221; characters that some UTF-8 characters correspond to.</li>
<li>In practice, the approach of automatic encoding used by Catalyst and CGI.pm has not been problematic. From reviewing both bug queues, I don&#8217;t see any open or previously-closed bugs that have been caused by the current behavior. It does seem reasonable and logical that it should be safe to UTF-8-encode text that is marked as UTF-8.</li>
<li>People would reasonably expect the percent-encode/decode process to be a symmetric round trip&#8212; you should get back exactly the data that you started with. When UTF-8 encoding is added to the process, this isn&#8217;t necessarily the case, since UTF-8 encoding happens only before percent-encoding. The reverse does not happen when percent-decoding. The one way trip to UTF-8 is probably what you wanted, but since the conversion is automatic, the intent of the programmer is not explicit.</li>
<li>The most popular percent-encoding solutions for Perl, CGI.pm and URI::Escape, both pre-date proper Unicode support in Perl, and both include built-in code to re-implement UTF-8 functionality in case you are using Perl 5.6, which lacks proper Unicode support. So in the context of designing those modules, it wasn&#8217;t really an option to tell people to just call <code>encode($string)</code> if they wanted to UTF-8 encode their text before it was URI-encoded.</li>
</ul>

<p>So, then, which of these two designs represents a best practice for handling
UTF-8 encoding in combination with percent-encoding in Perl?</p>

<ol>
<li>First option: The popular solution used in practice to UTF-8-encode data if the UTF-8 flag is set. The approach automatically handles
the recommended practice of UTF-8 encoding so users don&#8217;t have to think about it. </li>
<li>Second option: UTF-8 handling should not be built into a percent-encoding solution. Following the advice of <a href="http://www.w3.org/International/O-URL-code.html">perlunifaq</a>, an ideal solution would not check the UTF-8 flag, but would instead offer clear documentation that advises the user on best practices for UTF-8 encoding text before it is percent-encoded. This design cleanly externalizes all the character encoding issues that sometimes get bundled with percent-encoding, such as translating from EBCDIC or UTF-16 surrogate pairs. It provides users a bit of education about UTF-8 best practices which they may be able to apply in other areas. As long as UTF-8 handling tries to be automatic, programmers can continue to stay in the dark that the best practice is for them to explicitly handle character encodings themselves.
The solution might be as simple as documenting this in the synopsis:</li>
</ol>

<pre><code>use Encode 'encode_utf8';
$uri = uri_percent_encode(encode_utf8($string));
</code></pre>
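
<p>To make the synopsis idea concrete, here is a minimal, core-Perl sketch of the explicit approach. <code>uri_percent_encode()</code> and <code>uri_percent_decode()</code> are hypothetical stand-ins written for this example, not functions from an existing module. They work on octets only and leave all character encoding to the caller.</p>

<pre><code>use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Hypothetical stand-in: percent-encode every octet outside the
# RFC 3986 "unreserved" set. It expects octets, not characters.
sub uri_percent_encode {
    my ($octets) = @_;
    $octets =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $octets;
}

# Hypothetical counterpart: turn %XX sequences back into octets.
sub uri_percent_decode {
    my ($string) = @_;
    $string =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $string;
}

my $text = "\x{e5} X";    # a-with-ring, a space, and an X

# Explicit UTF-8 encode on the way out ...
my $uri = uri_percent_encode(encode_utf8($text));
print "$uri\n";    # %C3%A5%20X

# ... and explicit UTF-8 decode on the way back in.
my $back = decode_utf8(uri_percent_decode($uri));
print $back eq $text ? "round trip ok\n" : "round trip failed\n";
</code></pre>

<p>Because both directions are spelled out by the caller, the round trip is symmetric and the programmer&#8217;s intent is visible in the code.</p>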

<p>Note that URI::Escape essentially already has the <em>code</em> part of the second alternative in place. It offers percent-encoding without any character encoding, as well as <code>uri_escape_utf8</code>, which is basically just sugar for the code sample above. The <a href="http://search.cpan.org/~gaas/URI-1.56/URI/Escape.pm">URI::Escape documentation</a> explains in precise technical terms what the two alternatives do, but offers very little guidance on which method to choose: the plain one or the UTF-8 one? It doesn&#8217;t address which reflects a best practice, or cover the possibility of any drawbacks to choosing the UTF-8 version.</p>

<p>A Google code search shows that there are about 10x more hits for &#8220;<a href="http://www.google.com/codesearch?hl=en&amp;lr=&amp;q=lang%3Aperl+file%3A^.*%5C.pm%24+uri_escape%5C%28&amp;sbtn=Search">uri_escape()</a>&#8221; vs &#8220;<a href="http://www.google.com/codesearch?hl=en&amp;lr=&amp;q=lang%3Aperl+file%3A^.*%5C.pm%24+uri_escape_utf8%5C%28&amp;sbtn=Search">uri_escape_utf8()</a>&#8221;. I suspect that many of these cases would benefit from being able to handle UTF-8 characters, but the programmers weren&#8217;t really educated about when to choose that option.</p>

<p>Considering that masses of programmers aren&#8217;t suddenly going to become UTF-8 experts, it&#8217;s clear to me that an ideal solution should go with one of the options above: Either try to handle UTF-8 automatically, or offer strong guidance in documentation on percent-encoding about best practices regarding character encodings. I think with clear, informative documentation, the second option could be the better way to go.</p>

<p>What do you think?</p>
]]>
    </content>
</entry>

<entry>
    <title>Percent-encoding URIs in Perl</title>
    <link rel="alternate" type="text/html" href="http://mark.stosberg.com/blog/2010/12/percent-encoding-uris-in-perl.html" />
    <id>tag:mark.stosberg.com,2010:/blog//2.342</id>

    <published>2010-12-17T16:45:45Z</published>
    <updated>2010-12-18T18:55:08Z</updated>

    <summary>This post is about the current state of URI encoding in Perl. This is the problem space of being able to safely pass arbitrary text into and out of a URI format. If you&#8217;ve even seen a space in URL...</summary>
    <author>
        <name>Mark Stosberg</name>
        <uri>http://mark.stosberg.com/</uri>
    </author>
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="Perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="uri" label="URI" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://mark.stosberg.com/blog/">
        <![CDATA[<p>This post is about the current state of URI encoding in Perl. This is the
problem space of being able to safely pass arbitrary text into and out of a URI
format. If you&#8217;ve ever seen a space in a URL represented as &#8220;%20&#8221;, that&#8217;s the
topic of the moment.</p>

<p>The best general introduction I&#8217;ve found on the topic is the <a href="http://en.wikipedia.org/wiki/Percent-encoding">Wikipedia page
on Percent-encoding</a>.</p>

<p>RFCs on the topic include the 2005 <a href="http://www.ietf.org/rfc/rfc3986.txt">RFC 3986</a> 
that defined the generic syntax of
URIs. It replaces <a href="http://www.ietf.org/rfc/rfc1738.txt">RFC 1738</a> 
from 1994 which defined Uniform Resource Locators (URLs), 
and <a href="http://www.rfc-editor.org/rfc/rfc1808.txt">RFC 1808</a> from 1995 which defined
Relative Uniform Resource Locators. Sometimes this transformation is called
&#8220;URI escaping&#8221; and sometimes it&#8217;s referred to as &#8220;URL encoding&#8221;. RFC 3986 clarified
the naming issue:</p>

<blockquote>
  <p>&#8220;In general, the terms &#8220;escaped&#8221; and &#8220;unescaped&#8221; have been replaced with
  &#8220;percent-encoded&#8221; and &#8220;decoded&#8221;, respectively, to reduce confusion with other
  forms of escape mechanisms.&#8221;</p>
</blockquote>

<p>Elsewhere it&#8217;s clarified that percent encoding applies to all <a href="http://www.damnhandy.com/2009/08/26/url-vs-uri-vs-urn-in-more-concise-terms/">URIs, not just URLs</a>.</p>

<p>I think the Perl community would do well to adopt &#8220;percent encode URI&#8221; and
&#8220;percent decode URI&#8221; as unambiguous ways to describe this process, in
line with the RFC.</p>

<p>There are two URI percent-encoding solutions in Perl that seem to be in the
widest use. Both have a significant deficiency. </p>
]]>
        <![CDATA[<p><a href="http://www.flickr.com/photos/markstos/5221908601/" title="fallen tree in Kentucky by Mark Stosberg, on Flickr"><img src="http://farm6.static.flickr.com/5047/5221908601_8774ed800f_z.jpg" width="169" height="640" alt="fallen tree in Kentucky" align="right" style="margin-left: 15px; margin-bottom: 15px" /></a></p>

<h2>Percent-encoding with CGI.pm</h2>

<p>The first is
<code>CGI::Util</code> which provides <code>escape()</code> and
<code>unescape()</code> as a pair. This solution has a lot going for it&#8212; it&#8217;s
been in the core for years, it works back to Perl 5.6, it automatically handles
UTF-8 encoding, and it handles some edge cases like EBCDIC encoding and
UTF-16 surrogate pairs. Further you can use escape() and unescape() without
using the rest of CGI.pm or ever creating a CGI.pm object. There&#8217;s just one
major deficiency: <em>These methods have never been documented!</em> Many take advantage
of them by using CGI.pm directly or indirectly, as CGI.pm uses them internally.
A few people have found them and use them directly. As someone with commit access
to the CGI.pm repo, I&#8217;ll be documenting them shortly, once I&#8217;m done with the detour
that became this post. </p>

<h2>Percent-encoding with URI::Escape</h2>

<p>Probably the most intentionally widely used module for URI percent encoding is
<a href="http://search.cpan.org/perldoc?URI::Escape">URI::Escape</a>. URI::Escape is not
in the core, but the URI distribution depends only on MIME::Base64, and that
module is not actually needed for the URI::Escape functionality. Like CGI.pm,
URI::Escape also advertises support back to Perl 5.6.1. It does not handle
EBCDIC or UTF-16 surrogate pairs, but as I&#8217;ll explain later, it&#8217;s questionable
whether those abilities are truly desirable to be built into a percent-encoding
solution. The deficiency with URI::Escape is that it doesn&#8217;t handle UTF-8 automatically like most other solutions do.
Many perl scripts and modules have called
<code>URI::Escape::uri_escape</code>
expecting that it will always &#8220;just work&#8221; for encoding all text.
Instead, you have to explicitly ask for UTF-8 handling by calling
<code>uri_escape_utf8()</code>.  To credit URI::Escape, it has clearly
documented how it behaves in this regard, but it seems like a missed
opportunity to handle UTF-8 input automatically. By contrast, most other
solutions handle either case automatically with a single line like this:</p>

<pre><code>utf8::encode $_[0] if utf8::is_utf8 $_[0];
</code></pre>

<p>RFC 3986 is quite clear that UTF-8 encoding should be part of the solution: </p>

<blockquote>
  <p>&#8220;Non-ASCII characters must first be encoded according to UTF-8 [STD63], and then
  each octet of the corresponding UTF-8 sequence must be percent-
  encoded to be represented as URI characters&#8221;</p>
</blockquote>

<p>URI::Escape is likely suffering from being far older than RFC 3986, and added a
new method specific to UTF-8 to keep <code>uri_escape()</code> perfectly backwards compatible. 
In hindsight from 2010, I think that was an unfortunate choice.</p>

<h2>Summary of all known percent-encoding solutions for Perl</h2>

<p>I researched further to see what other percent-encoding solutions exist for Perl and how they differ. Here&#8217;s what I found, including CGI.pm and URI::Escape again for completeness.</p>

<h3><a href="http://search.cpan.org/perldoc?CGI::Util">CGI::Util</a></h3>

<p>Has the benefit of being in the core, but the drawback of being undocumented as of version 3.50.</p>

<ul>
<li><em>Names:</em>  <code>escape / unescape</code></li>
<li><em>Min Perl version:</em> 5.6.0</li>
<li><em>Handles UTF-8 encoding:</em> Yes, on Perl 5.8 and newer</li>
<li><em>Notes:</em> also handles EBCDIC and UTF-16 surrogate pairs.</li>
</ul>

<h3><a href="http://search.cpan.org/perldoc?CGI::Simple">CGI::Simple</a></h3>

<p>CGI::Simple 1.112 appears to have a bug regarding RFC 2396, section 2.2, concerning reserved
characters. It explicitly translates spaces to &#8220;+&#8221;, unlike most other solutions
here which translate them to %20. It also lacks automatic UTF-8 handling. Its
implementation is notably not compatible with the one in CGI.pm, as some would
assume. </p>

<ul>
<li><em>Names:</em> <code>url_encode / url_decode</code></li>
<li><em>Min Perl Version:</em> 5.6.1</li>
<li><em>Handles UTF-8 encoding:</em> No.</li>
<li><em>Notes</em>: The implementation here isn&#8217;t the same as a second one in the distribution, in CGI::Simple::Util.</li>
</ul>

<h3><a href="http://search.cpan.org/perldoc?CGI::Simple::Util">CGI::Simple::Util</a></h3>

<p>This is a second percent-encoding implementation in CGI::Simple 1.112, and it is not compatible with CGI.pm&#8217;s
implementation either. Compare:</p>

<pre><code>CGI::escape                  å -&gt; %C3%A5%20X
URI::Escape::uri_escape_utf8 å -&gt; %C3%A5%20X
CGI::Simple-&gt;url_encode      å -&gt; %E5+X
CGI::Simple::Util::escape    å -&gt; %E5%20X
</code></pre>

<ul>
<li><em>Names:</em> <code>escape / unescape</code></li>
<li><em>Min Perl Version</em>: 5.6.1</li>
<li><em>Handles UTF-8 encoding</em>: No.</li>
<li><em>Notes</em>: Handles EBCDIC encoding, inherited from CGI.pm before the fork. </li>
</ul>

<h3><a href="http://search.cpan.org/perldoc?Mojo::Util">Mojo::Util</a></h3>

<p>Mojo::Util 0.999941 provides a modern, simple implementation with automatic UTF-8
encoding. My gripes with it are that the names say &#8220;url&#8221; and &#8220;escape&#8221; instead
of &#8220;uri&#8221; and &#8220;encode&#8221; to follow the RFCs more closely. It also doesn&#8217;t allow
you to use a rather normal syntax: <code>Mojo::Util::url_escape('å')</code>.
That&#8217;s because Mojo uses the unconventional implementation of modifying the
input by reference instead of returning a modified copy. Presumably this is
done for performance.</p>
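
<p>The difference between the two calling styles can be sketched in plain Perl. This is not Mojo&#8217;s code. The function names below are hypothetical, and the shared helper assumes ASCII-only input to keep the example short.</p>

<pre><code>use strict;
use warnings;

# Hypothetical shared helper: percent-encode everything outside the
# RFC 3986 unreserved set. ASCII-only input is assumed.
sub _encode {
    my ($string) = @_;
    $string =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $string;
}

# Style 1: modify the argument in place. $_[0] is an alias for the
# caller's variable, so nothing useful is returned.
sub escape_in_place { $_[0] = _encode($_[0]); return }

# Style 2: leave the argument alone and return a modified copy.
sub escape_copy { return _encode($_[0]) }

my $value = 'a b';
escape_in_place($value);
print "$value\n";                  # a%20b

print escape_copy('a b'), "\n";    # a%20b

# A string literal is read-only, so the in-place style dies on it.
my $ok = eval { escape_in_place('a b'); 1 };
print "in-place on a literal failed: $@" unless $ok;
</code></pre>

<p>The in-place style avoids a copy but needs a real variable to work on, which is why passing a literal directly is not an option with that kind of interface.</p>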

<ul>
<li><em>Names:</em> <code>url_escape / url_unescape</code></li>
<li><em>Min Perl Version:</em> 5.8.7.</li>
<li><em>Handles UTF-8 encoding:</em> Yes.</li>
</ul>

<h3><a href="http://search.cpan.org/perldoc?Tie::UrlEncoder">Tie::UrlEncoder</a></h3>

<p>Tie::UrlEncoder 0.02 provides a unique interface through a %urlencode hash. However,
it doesn&#8217;t provide a decoding routine. Basic UTF-8 tests pass for it, but the
solution employed is unorthodox. Instead of calling UTF-8 related functions,
it calls <code>use bytes;</code>. Official Perl documentation is clearly
opinionated about this approach. In
<a href="http://search.cpan.org/perldoc?perlunifaq">perlunifaq</a>, it says plainly
<em>&#8220;Don&#8217;t use it.&#8221;</em>  in regard to <code>use bytes;</code>.</p>

<ul>
<li><em>Names:</em> <code>%urlencode</code></li>
<li><em>Min Perl version:</em> 5.6.</li>
<li><em>Handles UTF-8 encoding:</em> The implementation does not follow best practices. See above.</li>
</ul>

<h3><a href="http://search.cpan.org/perldoc?URI::Encode">URI::Encode</a></h3>

<p>Not to be confused with URI::Escape, URI::Encode is meant to be a newer and
simpler take on the problem space. It offers automatic UTF-8 encoding, and
includes an option on whether or not to encode reserved characters. The
option to not encode reserved characters is nice for those who know what they
are doing. Unfortunately, it has a poor object-oriented UI. It offers a
constructor which does nothing, when the reserved characters option could be
used as an option there.  Then, it doesn&#8217;t document that you can call the key methods as
class methods to bypass the do-nothing constructor. While it also offers a
procedural interface, it&#8217;s implemented in terms of calling the do-nothing
constructor every time, adding an unnecessary penalty.</p>

<ul>
<li><em>Names:</em> <code>uri_encode / uri_decode</code></li>
<li><em>Min Perl version:</em> Perl 5.8.1.</li>
<li><em>Handles UTF-8 encoding:</em> Yes</li>
</ul>

<h3><a href="http://search.cpan.org/perldoc?URI::Escape">URI::Escape</a></h3>

<p>URI::Escape provides three APIs, two that don&#8217;t handle UTF-8 encoding and one
that does. It&#8217;s popular, works well and is well documented. Its main drawback
is that UTF-8 encoding is not automatic in <code>uri_escape()</code>, and as a result it has
not been used by many applications, when UTF-8 support here could have
otherwise been a free benefit. <code>uri_escape_utf8()</code> can be used for UTF-8 support.</p>

<ul>
<li><em>Names:</em> <code>uri_escape / uri_escape_utf8 / uri_unescape / %escapes</code></li>
<li><em>Min Perl version:</em> 5.6.1</li>
<li><em>Handles UTF-8 encoding:</em> Not automatically</li>
</ul>

<h3><a href="http://search.cpan.org/perldoc?URI::Escape::XS">URI::Escape::XS</a></h3>

<p>It sounds like a module that&#8217;s compatible with URI::Escape, only faster due to
a C-based XS implementation. It does benchmark to be much faster, and it is
somewhat compatible, but it lacks a <code>uri_escape_utf8</code> method, which
could be a valuable addition for better compatibility. Instead, it has a
<code>uri_escape</code> method that includes UTF-8 support automatically.  It
also has a higher minimum Perl requirement&#8212; Perl 5.8 vs 5.6, which is another
important difference that&#8217;s not documented. As an additional benefit,
<a href="http://search.cpan.org/perldoc?Any::URI::Escape">Any::URI::Escape</a> exists
which will use the XS version if it exists, and the Pure-Perl version otherwise.
The wrapper module also unfortunately glosses over the difference in UTF-8 handling
in the XS version and the pure-Perl version. </p>

<ul>
<li><em>Names:</em> <code>uri_escape / uri_unescape</code></li>
<li><em>Min Perl version:</em> 5.8.1</li>
<li><em>Handles UTF-8 encoding:</em> Yes</li>
<li><em>Notes:</em> Requires a C-compiler (but very fast)</li>
</ul>

<h3>Recommendations</h3>

<p>All the URI percent encoding solutions I reviewed had flaws, but the pieces are all
there to produce an optimal solution. Here&#8217;s my recommendation for designing a perfect
solution:</p>

<ul>
<li>Name the module URI::PercentEncode;</li>
<li>Name the functions <code>uri_percent_encode()</code> and <code>uri_percent_decode()</code></li>
<li>Return the changed value (don&#8217;t modify by reference)</li>
<li>Require at least Perl 5.8.1. Supporting older versions is unnecessary baggage at this point.</li>
<li>Don&#8217;t build in support for getting data into UTF-8 beyond a simple call to utf8::encode(). Anything else belongs in the domain of the &#8220;Encode&#8221; module. If I&#8217;m wrong about including support for UTF-16 surrogate pairs in a percent encoding solution, let me know.</li>
<li>Automatically handle UTF-8 encoding (like this: <code>utf8::encode $_[0] if utf8::is_utf8 $_[0];</code>)</li>
<li>Use faster XS-based code by default, but allow building a Pure-Perl version for those who need or want it. (Follow the model of Params::Validate here).</li>
</ul>
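<p>To make the list above concrete, here is a rough sketch of one way the proposed behavior could look. It is written in Python only to keep it short. The function names come from the list above, but no such module exists, and everything else here is an assumption. The sketch always encodes to UTF-8 rather than checking a flag, returns a new value, and leaves only the RFC 3986 &#8220;unreserved&#8221; characters unescaped.</p>

<pre><code>import urllib.parse

# RFC 3986 "unreserved" characters, which never need percent-encoding.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)


def uri_percent_encode(text):
    """Return a percent-encoded copy of text (hypothetical API)."""
    # Always encode to UTF-8 first, then escape byte by byte.
    data = text.encode("utf-8")
    return "".join(
        chr(byte) if byte in UNRESERVED else "%{:02X}".format(byte)
        for byte in data
    )


def uri_percent_decode(encoded):
    """Return the decoded text, assuming the escaped bytes are UTF-8."""
    return urllib.parse.unquote_to_bytes(encoded).decode("utf-8")
</code></pre>

<p>With this sketch, a space becomes <code>%20</code> and a non-ASCII character becomes one escape per UTF-8 byte. Decoding reverses the process.</p>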

<p>Any volunteers?</p>

<p>That&#8217;s my take on URI percent-encoding in Perl. What do you have to add?</p>

<p><strong>Update:</strong> <em>See the <a href="http://bulknews.typepad.com/blog/2010/12/re-percent-encoding-uris-in-perl-mark-stosberg.html">reply by miyagawa</a>, who states that this code is a bug: <code>utf8::encode $_[0] if utf8::is_utf8 $_[0];</code>. It is used by CGI::Util and Mojo::Util in the versions given above, as well as in Catalyst. URI::Escape and URI::Encode do UTF-8 encoding without checking the UTF-8 flag. He has more experience with UTF-8, and I defer to his advice here.</em></p>
]]>
    </content>
</entry>

<entry>
    <title>generating HTTP headers: sorted or unsorted?</title>
    <link rel="alternate" type="text/html" href="http://mark.stosberg.com/blog/2010/01/generating-http-headers-sorted-or-unsorted.html" />
    <id>tag:mark.stosberg.com,2010:/blog//2.318</id>

    <published>2010-01-26T01:42:21Z</published>
    <updated>2010-01-26T03:06:51Z</updated>

    <summary> Recently I&#8217;ve been reviewing how various Perl frameworks and modules generate HTTP headers. After reviewing several approaches, it&#8217;s clear that there are two major camps: those which put the response headers in a specific order and those which don&#8217;t....</summary>
    <author>
        <name>Mark Stosberg</name>
        <uri>http://mark.stosberg.com/</uri>
    </author>
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="catalyst" label="Catalyst" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="cgiapplication" label="CGI::Application" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="cgipm" label="CGI.pm" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="cgisimple" label="CGI::Simple" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="Perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="plack" label="Plack" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://mark.stosberg.com/blog/">
        <![CDATA[<p><div class="floatimgright"><a href="http://www.flickr.com/photos/markstos/4304868291/" title="A midwinter box bike ride by Mark Stosberg, on Flickr"><img src="http://farm5.static.flickr.com/4021/4304868291_0b6e3e2052_m.jpg" width="240" height="180" alt="A midwinter box bike ride" /></a></div> Recently I&#8217;ve been reviewing how various Perl frameworks and modules generate HTTP headers. After reviewing several approaches, it&#8217;s clear that there are two major camps: those which put the response headers in a specific order and those which don&#8217;t. You might expect one approach or the other to be more spec-compliant, but RFC 2616 provides <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2">conflicting guidance on this point</a>. </p>

<p>The bottom line is that the spec says that <em>&#8220;the order in which header fields with differing field names are received is not significant&#8221;</em>. But then it goes on to say that it is a &#8220;good practice&#8221; (and it puts &#8220;good practice&#8221; in quotes) to order the headers a particular way. So, without strict guidance from the spec about the importance of header ordering, it would be interesting to know whether header order causes problems in practice.</p>

<p>The <a href="http://search.cpan.org/perldoc?Plack::Middleware::RearrangeHeaders">Plack::Middleware::RearrangeHeaders</a> documentation suggests there is some benefit to strict header ordering: <em>&#8220;to work around buggy clients like very old MSIE or broken HTTP proxy servers&#8221;</em></p>

<p>You might wonder what the big deal is&#8212; why not just stick to the &#8220;good practice&#8221; recommendation all the time? The difference can be seen in the benchmarks provided by <a href="http://search.cpan.org/perldoc?HTTP::Headers::Fast">HTTP::Headers::Fast</a>. By ignoring the good-practice header order, an alternate implementation was able to speed up header generation to about twice as fast. Considering that a web-app needs to generate headers on every single request, making header generation smaller and faster is potentially a tangible win, while still being spec-compliant. </p>
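<p>To illustrate the two camps, here is a minimal sketch in Python (not taken from any of the modules discussed here). It assumes the &#8220;good practice&#8221; order means general headers first, then response headers, then entity headers, using the field lists from RFC 2616. The function names are hypothetical.</p>

<pre><code># General and response header fields as listed in RFC 2616.
# Anything else is treated as an entity header in this sketch.
GENERAL = {
    "cache-control", "connection", "date", "pragma", "trailer",
    "transfer-encoding", "upgrade", "via", "warning",
}
RESPONSE = {
    "accept-ranges", "age", "etag", "location", "proxy-authenticate",
    "retry-after", "server", "vary", "www-authenticate",
}


def header_rank(name):
    """Rank a field name: 0 general, 1 response, 2 entity."""
    lowered = name.lower()
    if lowered in GENERAL:
        return 0
    if lowered in RESPONSE:
        return 1
    return 2


def format_headers(headers, good_practice=False):
    """Serialize (name, value) pairs, optionally in good-practice order."""
    if good_practice:
        # sorted() is stable, so headers keep their relative order
        # within each group.
        headers = sorted(headers, key=lambda pair: header_rank(pair[0]))
    lines = ["{}: {}\r\n".format(name, value) for name, value in headers]
    return "".join(lines) + "\r\n"
</code></pre>

<p>The unsorted path is a single pass over the headers. The ordered path adds a ranking lookup and a sort on every response, which is the kind of extra work the faster implementations skip.</p>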
]]>
        <![CDATA[<p>So let&#8217;s look at who is doing what:</p>

<ul>
<li><a href="http://search.cpan.org/perldoc?CGI::Application">CGI.pm</a> does not order headers by default or have an option to do so. CGI.pm is used by default by CGI::Application and also by Mason and Jifty when they run under CGI.</li>
<li>CGI::Simple predictably models CGI.pm&#8217;s behavior</li>
<li><a href="http://search.cpan.org/perldoc?Dancer">Dancer</a> generates headers itself, without regard to order</li>
<li><a href="http://search.cpan.org/perldoc?Plack">Plack</a> is a mixed bag. If you generate your own headers, I believe they will be passed through unmodified, but if you use <a href="http://search.cpan.org/perldoc?Plack::Request">Plack::Request</a> they will be ordered (with no option to disable this). With Plack you also have the option to use the RearrangeHeaders middleware if you want to be certain that your headers are in the &#8220;good practice&#8221; order. </li>
<li><a href="http://search.cpan.org/perldoc?Mojo">Mojo</a> generates its own headers, always in the &#8220;good practice&#8221; order, with no way to disable the ordering.</li>
<li><a href="http://search.cpan.org/perldoc?Catalyst">Catalyst</a>, Plack and others rely on <a href="http://search.cpan.org/perldoc?HTTP::Headers">HTTP::Headers</a> to generate headers. HTTP::Headers also currently always generates headers in good-practice order, although there has been <a href="http://www.mail-archive.com/libwww@perl.org/msg06573.html">some discussion</a> about adding an option to produce the headers in an unsorted order. </li>
<li>And finally, I found <a href="http://search.cpan.org/perldoc?HTTP::Headers::Fast">HTTP::Headers::Fast</a> which is used by HTTP::Engine and provides the user equal options to generate headers in a sorted or unsorted order.</li>
<li>For a bonus consultation, I looked at Rack, a web server interface written in Ruby sometimes paired with Rails, and one of the inspirations for Plack. It appears that <a href="http://github.com/chneukirchen/rack/blob/master/lib/rack/handler/cgi.rb">Rack does not order HTTP headers either</a>. </li>
</ul>

<p>So which approach is best? What makes the most sense to me is to leave HTTP headers unsorted by default. This complies with the spec and can be accomplished with a simpler implementation and better performance. The good-practice ordering is a nice-to-have option if you know you need it or want the tradeoff.  </p>

<p>And should we be worried about header ordering issues in practice? With CGI.pm in use for over a decade, I can&#8217;t recall any header-ordering bugs among the hundreds of CGI.pm bugs I&#8217;ve triaged in its bug tracker. I&#8217;m interested to know if there are any real-world web clients or proxies that require or benefit from the good-practice ordering. </p>
]]>
    </content>
</entry>

<entry>
    <title>The cost of saving sent e-mail</title>
    <link rel="alternate" type="text/html" href="http://mark.stosberg.com/blog/2010/01/the-cost-of-saving-sent-e-mail.html" />
    <id>tag:mark.stosberg.com,2010:/blog//2.317</id>

    <published>2010-01-20T01:17:21Z</published>
    <updated>2010-01-20T01:51:22Z</updated>

    <summary> I don&#8217;t tap my own phone. I don&#8217;t xerox postcards before I mail them back from vacation. I don&#8217;t take a voice recorder when I go out with friends. And I don&#8217;t have a copy machine at home to...</summary>
    <author>
        <name>Mark Stosberg</name>
        <uri>http://mark.stosberg.com/</uri>
    </author>
    
        <category term="Simplicity" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Tech" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="carbonfootprint" label="carbon footprint" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="email" label="email" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="sustainability" label="sustainability" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="technology" label="technology" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://mark.stosberg.com/blog/">
        <![CDATA[<p><div class="floatimgright"><a href="http://www.flickr.com/photos/markstos/4263436474/" title="box biking at 10F by Mark Stosberg, on Flickr"><img src="http://farm5.static.flickr.com/4036/4263436474_48b5ecaa79_m.jpg" width="180" height="240" alt="box biking at 10F" /></a></div></p>

<p>I don&#8217;t tap my own phone.  I don&#8217;t xerox postcards before I mail them back from
vacation. I don&#8217;t take a voice recorder when I go out with friends. And I don&#8217;t
have a copy machine at home to duplicate handwritten notes I may send.</p>

<p>But if I send a message of equal importance by e-mail, then my e-mail program
will automatically save a copy of every one of these messages.</p>

<p>E-mails I don&#8217;t need waste my time. They increase the time it takes to search
and browse through old email. They increase the time it takes for my email to
&#8220;sync&#8221; when I want to go offline.    To continue to save every e-mail I send
perpetuates the unsustainable myth that as long as our actions are online they
are &#8220;green&#8221;.</p>
]]>
        <![CDATA[<p>First, the small action of saving an e-mail is amplified by the <a href="http://software.tekrati.com/research/9512/">1.5 billion
 people using e-mail globally</a>.
More saved e-mail means more disks to store the e-mail, larger backup systems
to handle the volume, and faster processors so that all the archived messages
can be searched through efficiently. Since sent mail accumulates over time, an
increasing amount of resources is needed to handle the saved mail. In five
years there will be more people on the planet.  It&#8217;s likely a greater percentage
will be sending e-mail, and likely the e-mail volume will be even higher than
it is now. What is the energy cost, now and in the future, of storing so
much e-mail?</p>

<p>While we can debate the magnitude of the impact, more data storage equals more
resource consumption. In the United States, data centers already draw more
power than our TV usage [<a href="#references">1</a>], and data centers are only part of
the energy consumed by our increasingly networked life.  In turn, about 40% of
our water supply is devoted to generating power. Water is used in part to
cool massive data centers. (Only 15% of the water supply is actually
used by the public.) [<a href="#references">2</a>]</p>

<p>If you&#8217;re with me on this one, here&#8217;s one tip that could make a big reduction in
the size of your &#8220;Sent Mail&#8221; folder, while still retaining the memory of the
correspondence:</p>

<p>Consider not saving a copy of attachments you send in your Sent Mail folder.
Attachments are often 10 to 1,000 times larger than a typical e-mail, and you
already have a copy of the document on your hard drive. Plus, your recipient is
about to receive a copy and she may then also download a copy from her e-mail
to her hard drive. Saving the attachment in your Sent folder could mean
keeping a fourth copy of the document. If it&#8217;s important to have a record
that the attachment was sent, you could send one message with the
correspondence that references the attachment, and save that message. Then,
send the attachment in its own message, and don&#8217;t save the attachment in the
Sent folder.</p>

<p>Another idea: While some places have data retention rules (or laws!), these
typically do not apply to personal e-mail accounts. Consider turning off the
option to save e-mails by default, and consciously choose which e-mails you
think are important enough to save. Note that Google&#8217;s Gmail service (146 million users) does not
have an option to turn off automatically saving the messages you send. Yahoo is
another web-based e-mail provider that does provide this option. Check your
e-mail program for details.</p>

<p>It is powerful to ask &#8220;what is the impact of this?&#8221;, whether you consider e-mail
storage, the toxic batteries in our cell phones, or the impact of broadcasting
wifi radio waves through our homes 24/7. A daily choice such as not saving
a sent e-mail can be a mindful practice to connect our abstract online lives
with the real world.</p>

<p>For details about calculating the carbon footprint of e-mail storage, read on. </p>

<p><a name="calculation"></a></p>

<h2>Calculating the carbon footprint of e-mail storage</h2>

<p>Research about the carbon footprint of e-mail storage turned up little in the
way of existing estimates. emailfootprint.org estimated that it would take 1 kilowatt
hour to store 1 Gigabyte of data for a year, but they didn&#8217;t explain how they
came up with that number. The site is now offline, but it is still accessible
through the <a href="http://74.125.93.132/search?q=cache:WXkI6AepK_oJ:emailfootprint.org/main.aspx">Google cache</a>.</p>

<p>TreeHugger referenced a report that said that it would take 1 lb of coal to
store 20 megabytes of data for a year, according to the US Department of
Energy. But the report they linked to is no longer available.
[<a href="#references">3</a>]</p>

<p>I decided to see if I could piece together my own estimate of the impact of
storing sent e-mail, based on my own usage.</p>

<p>I have about 40 megabytes of mail saved from messages I sent in 2007.  This
perhaps already represents removing some messages which contained large
attachments.  For the sake of the example, let&#8217;s assume this is an average
amount. (To personalize the example, check the size of your own Sent Mail
folders!) Multiply this by an approximate 1.5 billion e-mailers and you
get about 56 petabytes of data (about 58 million gigabytes). To picture that:
imagine all the data was crammed onto reasonably large hard drives: 500
gigabytes each. It would take about 117,000 hard drives to store that data,
assuming it was stored relatively efficiently and no extra space was required
to store the operating system on these drives! The actual number would be
higher because much data can be expected to be on older hard drives,
manufactured before the very large drives were an option. Further, it&#8217;s an
industry best practice to always duplicate data using &#8220;RAID&#8221;, so the same
data would be written to at least two hard drives. Large providers such as
Google may further be duplicating data in at least two data centers, for extra
redundancy.</p>

<p>Using a number from
<a href="http://web.mit.edu/annakot/MacData/afs.annakot/OldFiles/MacData/afs.course.lockers/2/2.813/www/readings/EricWilliamsHybrid.pdf">here</a>,
I&#8217;ll estimate that it takes 446 MegaJoules of energy to produce a hard drive,
which I&#8217;ll convert to 124 kilowatt hours.  So, it would take about 14.5 million
kilowatt hours of energy just to produce 117,000 hard drives, without getting
into the energy required to keep them turned on.</p>

<p>Using <a href="http://www.eia.doe.gov/cneaf/electricity/page/co2_report/co2report.html#electric">data from US Department of Energy</a>,
it looks like we can expect about 2 lbs of CO2 to be generated to produce each
kilowatt hour of energy.</p>

<p>So that puts us at an estimated 29 million lbs of CO2 generated to produce
enough hard drives to efficiently store all the sent e-mail in the world.
(Using my own amount as an average). </p>

<p>To visualize that number, let&#8217;s equate it to the number of miles you&#8217;d have to
drive in an average car to generate the same amount of CO2.  <a href="http://www.streetsblog.org/2008/10/14/how-clean-is-your-commute/">According to
Streetsblog</a>,
the average car generates 1.2 pounds of CO2 per mile, equating to about
24 million miles.</p>
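<p>The chain of estimates above can be reproduced in a few lines. This is only a sketch in Python using the figures quoted in this post. The constant and function names are made up for illustration, and it assumes binary units (1,024 megabytes per gigabyte), which is what matches the 58 million gigabyte figure.</p>

<pre><code># Inputs quoted in the post; all names here are hypothetical.
MB_PER_USER = 40          # sent mail saved per person per year
USERS = 1.5e9             # estimated e-mail users worldwide
DRIVE_GB = 500            # capacity of one hard drive
MJ_PER_DRIVE = 446        # energy to manufacture one drive
MJ_PER_KWH = 3.6          # unit conversion
LBS_CO2_PER_KWH = 2       # CO2 per kilowatt hour generated
LBS_CO2_PER_MILE = 1.2    # CO2 per mile driven in an average car


def sent_mail_footprint():
    """Return each step of the estimate as a dictionary."""
    total_gb = MB_PER_USER * USERS / 1024
    drives = total_gb / DRIVE_GB
    kwh = drives * MJ_PER_DRIVE / MJ_PER_KWH
    lbs_co2 = kwh * LBS_CO2_PER_KWH
    miles = lbs_co2 / LBS_CO2_PER_MILE
    return {
        "total_gb": total_gb,
        "drives": drives,
        "kwh": kwh,
        "lbs_co2": lbs_co2,
        "miles": miles,
    }


if __name__ == "__main__":
    for step, value in sent_mail_footprint().items():
        print("{:10s} {:,.0f}".format(step, value))
</code></pre>

<p>To personalize the estimate, change <code>MB_PER_USER</code> to the size of your own Sent Mail folder. Every later step scales in proportion.</p>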

<p>In perspective: 24 million miles is a very large absolute number, but it pales
in comparison to the estimated 5 <em>billion</em> miles that Americans drive each
day [<a href="#references">4</a>]. The real danger to address is the way of thinking that
a digital paperless life is automatically a green one.</p>

<p><em>This post is a follow-up to one entitled <a href="http://mark.stosberg.com/blog/2009/11/stewardship-of-our-online-lives.html">Stewardship and Sustainability of our Online Lives</a></em>.  </p>

<h2>See Also</h2>

<ul>
<li><a href="http://ms609.blogspot.com/2009/06/how-much-energy-does-it-take-to-store.html">How Much Energy does it take to store an e-mail?</a></li>
<li><a href="http://blogs.wsj.com/digits/2009/04/15/spams-noxious-carbon-footprint/">The Carbon Footprint of Spam - Wall Street Journal</a></li>
</ul>

<h2>References:</h2>

<p><a name="references"></a></p>

<ol>
<li><em>an EPA study stating that the data center industry devours <a href="http://datacenterjournal.com/content/view/2851/43/">61 billion kWh of energy annually</a></em> compared to <em>...about 275 million TVs currently in use in the U.S., consuming over <a href="http://www.energystar.gov/index.cfm?fuseaction=find_a_product.showProductGroup&amp;pgw_code=TV">50 billion kWh of energy each year</a></em></li>
<li><em><a href="http://datacenterjournal.com/content/view/2851/43/">Quenching the Thirst of Power-Hungry Data Centers</a>, citing primary data from the US government.</em></li>
<li><em><a href="http://www.treehugger.com/files/2008/02/the_footprint_o_1.php">The Footprint of Gmail: How Much Energy Would Deleting Email Save?</a></em></li>
<li><em>Americans drive <a href="http://www.ibtta.org/files/PDFs/Yermack_Larry.pdf">5 billion miles per day</a></em></li>
</ol>
]]>
    </content>
</entry>

<entry>
    <title>Modifying PDFs so they open full screen</title>
    <link rel="alternate" type="text/html" href="http://mark.stosberg.com/blog/2009/11/modifying-pdfs-so-they-open-full-screen.html" />
    <id>tag:mark.stosberg.com,2009:/blog//2.313</id>

    <published>2009-11-30T02:49:01Z</published>
    <updated>2009-11-30T03:31:36Z</updated>

    <summary> The PDF spec includes an option to cause PDFs to open full screen when users open them. I&#8217;m a fan of the feature because it maximizes screen real estate and creates a simple, focused, experience for the PDF readers....</summary>
    <author>
        <name>Mark Stosberg</name>
        <uri>http://mark.stosberg.com/</uri>
    </author>
    
        <category term="Tech" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="linux" label="linux" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="pdf" label="pdf" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://mark.stosberg.com/blog/">
        <![CDATA[<p><div class="floatimgright"><a href="http://www.flickr.com/photos/markstos/4116496576/" title="retired radiators by Mark Stosberg, on Flickr"><img src="http://farm3.static.flickr.com/2778/4116496576_b40a00a979_m.jpg" width="240" height="180" alt="retired radiators" /></a></div> The <a href="http://www.adobe.com/devnet/pdf/pdf_reference_archive.html">PDF spec</a> includes an option to cause PDFs to open full screen when users open them. I&#8217;m a fan of the feature because it maximizes screen real estate and creates a simple, focused experience for the PDF readers. Using this option is one of my two essential tips for creating an impactful newsletter targeted at being read online. The other tip is to use a &#8220;landscape&#8221; format document, to match the shape of most screens. </p>

<p>Many PDF viewers respond to PDFs that are set to open full screen, but a number of PDF generation tools don&#8217;t provide an option to set this preference when creating PDFs. I ran into this with <a href="http://xournal.sourceforge.net/">Xournal</a>, which is a nice application for Linux-based tablets, but offers no PDF export options. </p>

<p>So I found a way to update a pre-existing PDF to set the preference to have it open full screen by default. The key here is that PDF is a text-based format, so preferences in it can be updated manually by opening and editing the file  according to the PDF spec, or the same effect can be accomplished with automated tools. In this case, I found that I needed to update a line that started like this:</p>

<pre><code>&lt;&lt; /Type /Catalog
</code></pre>

<p>After <code>/Catalog</code>, this is all that needed to be added:</p>

<pre><code>/PageMode /FullScreen
</code></pre>

<p>I automated this with a simple script that I named <code>make-pdf-full-screen.sh</code>. It works for the simple case when no &#8220;PageMode&#8221; has been declared, as in the Xournal case.  I don&#8217;t expect it would update the PageMode properly if it were already declared. For a safer solution consider opening the PDF in a text editor to manually set  &#8220;/PageMode /FullScreen&#8221; on the initial <code>/Catalog</code> line. Alternatively, you could use a formal solution like <a href="http://search.cpan.org/dist/PDF-API3/lib/PDF/API3/Compat/API2.pm">PDF::API3::Compat::API2</a> which appears to have the features needed to solve this with Perl.</p>

<p>Here are the contents of my little script to automate the update:</p>

<pre><code>#!/bin/sh
# usage: make-pdf-full-screen.sh file.pdf
#   The file will be modified in place so that it opens full screen.
#   The current approach is naive... it assumes no Initial View has been defined.
# by Mark Stosberg
# Quote "$1" so file names containing spaces are handled correctly.
perl -pi -e 's?&lt;&lt; /Type /Catalog?&lt;&lt; /Type /Catalog /PageMode /FullScreen?' "$1"
</code></pre>
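<p>For the case the script doesn&#8217;t cover, where a PageMode is already declared, here is a slightly less naive sketch. It is written in Python rather than as a Perl one-liner, and the function name <code>set_full_screen</code> is made up for illustration. It assumes the first <code>/PageMode</code> entry in the file belongs to the catalog. Like the script, it edits the raw bytes and does not update the PDF&#8217;s cross-reference table, so treat it as a quick hack and not a substitute for a real PDF library.</p>

<pre><code>import re

# Patterns operate on raw bytes, since a PDF is not guaranteed to be text.
CATALOG = re.compile(rb"/Type\s*/Catalog")
PAGE_MODE = re.compile(rb"/PageMode\s*/\w+")


def set_full_screen(pdf_bytes):
    """Return a copy of pdf_bytes that asks viewers to open full screen."""
    if PAGE_MODE.search(pdf_bytes):
        # A PageMode is already declared: overwrite its value.
        return PAGE_MODE.sub(b"/PageMode /FullScreen", pdf_bytes, count=1)
    # No PageMode yet: add one right after the catalog type.
    return CATALOG.sub(
        b"/Type /Catalog /PageMode /FullScreen", pdf_bytes, count=1
    )
</code></pre>

<p>To use it, read the file in binary mode, pass the bytes through <code>set_full_screen</code>, and write the result to a new file so the original is kept as a backup.</p>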
]]>
        

    </content>
</entry>

<entry>
    <title>Stewardship and Sustainability of our online lives</title>
    <link rel="alternate" type="text/html" href="http://mark.stosberg.com/blog/2009/11/stewardship-of-our-online-lives.html" />
    <id>tag:mark.stosberg.com,2009:/blog//2.312</id>

    <published>2009-11-26T05:04:09Z</published>
    <updated>2009-12-05T23:37:01Z</updated>

    <summary> A few weeks ago I had my laptop stolen. Earlier that morning I had been reflecting and writing on the laptop about the intersection of our spiritual lives with our digital lives. And then, as if by divine intervention,...</summary>
    <author>
        <name>Mark Stosberg</name>
        <uri>http://mark.stosberg.com/</uri>
    </author>
    
        <category term="Simplicity" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Tech" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="brethren" label="brethren" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="carbonfootprint" label="carbon footprint" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="church" label="church" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="facebook" label="facebook" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="google" label="google" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="internet" label="internet" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="online" label="online" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="sermon" label="sermon" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="stewardship" label="stewardship" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="sustainability" label="sustainability" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="twitter" label="twitter" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://mark.stosberg.com/blog/">
        <![CDATA[<p><span class="floatimgright"><a href="http://www.flickr.com/photos/markstos/4157172806/" title="Kent and Kurt on the Whitewater Gorge Trail by Mark Stosberg, on Flickr"><img src="http://farm3.static.flickr.com/2691/4157172806_3ba4273709_m.jpg" width="180" height="240" alt="Kent and Kurt on the Whitewater Gorge Trail" /></a></span></p>

<p>A few weeks ago I had my laptop stolen. Earlier that morning I had been reflecting and writing on the laptop about the intersection of our spiritual lives with our digital lives. And then, as if by divine intervention, my laptop disappeared&#8212; during church service no less&#8212; and I was given an even greater opportunity to answer the question: When we spend more time browsing the web, what is it that we are doing less of? When we spend more time checking e-mail, what are we doing less of? And when we spend more time on Facebook, what are we spending less time doing? Apparently, the answer in my case is cleaning my desk and organizing the garage.  Those are the things I did more when I could use the internet less. I joke about this, but I do envision my home as a place of rest and rejuvenation, yet I let clutter accumulate while I spent more time on my computer doing &#8220;productive&#8221; things. </p>

<p>There are many implications of shifting our increasingly precious free time online. Today I&#8217;d like to delve into the carbon footprint of our online lives.</p>

<p>You can use the audio player here to listen to a 15 minute version of the message delivered at my church, (or you can also <a href="http://mark.stosberg.com/audio/stewardship_of_our_online_lives.mp3">download the audio file</a>.)</p>

<p><span class="mt-enclosure mt-enclosure-podcast" style="display: inline;">
<object type="application/x-shockwave-flash" data="/mt/mt-static/plugins/Podcast/player_mp3_maxi.swf" width="200" height="20">
                <param name="movie" value="/mt/mt-static/plugins/Podcast/player_mp3_maxi.swf" />
                    <param name="bgcolor" value="#ffffff" />
                        <param name="FlashVars" value="mp3=http%3A//mark.stosberg.com/audio/stewardship_of_our_online_lives.mp3&amp;showvolume=1" />
                    </object>
</span></p>

<p>The message continues below the jump. </p>
]]>
        <![CDATA[<p><a href="http://www.flickr.com/photos/markstos/4086956596/" title="Fall box bike commute by Mark Stosberg, on Flickr"><img src="http://farm3.static.flickr.com/2551/4086956596_e621817fd9.jpg" width="500" height="375" alt="Fall box bike commute" /></a></p>

<p>As individuals and organizations, many of us profess to hold up the value of stewardship, of caring for the earth&#8217;s resources. But as some of us move more of our lives online, how much do we really know about the real-world impact of our online actions and data? </p>

<p>When Google&#8217;s Gmail service launched, it advertised &#8220;never delete an email again&#8221;. Instead, you can archive the e-mail with a single click, and it will always be there in case you might like to find it later. </p>

<p>As part of the launch, Google was offering about 100 times more e-mail storage than their competitors. This was enough, they claimed, to never delete another e-mail in your life. This was a decisive moment that changed web-based e-mail forever. Competitors scrambled to dramatically increase their storage options so they could compete.</p>

<p>Something there bothered me.  In the physical world, this is a way of thinking that no environmentalist would stand for&#8212; NEVER THROW ANYTHING AWAY AGAIN?  The circle of life is broken, replaced with a one-way trip from creation to permanent storage.</p>

<p>Are the rules for sustainability online really that different? </p>

<p>There&#8217;s been a belief that when we move activities online, we are being green.  We laud &#8220;Going paperless&#8221;, and celebrate e-everything. </p>

<p>There is of course some truth in the efficiencies of digital living. It&#8217;s certainly intuitive that it&#8217;s less resource intensive to send an e-mail instead of a physical letter, or teleconference instead of flying somewhere for a meeting. </p>

<p>But along with some of these efficient uses of the internet, we&#8217;ve moved some of our unsustainable practices online without deeply questioning the impact of this.</p>

<p>While it may be efficient to send an e-mail instead of a letter, many of us now send and receive far more e-mails than we wrote letters.  Our use of the internet has gone far beyond replacing physical tasks with efficient digital alternatives.</p>

<p>I&#8217;ll share what I know about the carbon footprint of our online lives now.</p>

<p>To talk about the carbon footprint of our online lives, let&#8217;s start with the physical existence of the Internet. Websites and e-mail are served from computers all over the world. Many websites are now clustered in a relatively small number of large data centers. </p>

<p>Picture a data center as a dimly lit, windowless warehouse. On the concrete floor sits aisle after aisle of floor-to-ceiling stacks of computers, neatly set on identical racks, with blinking lights on the front and neatly organized cables on the back. There is an incessant hum from thousands of spinning disk drives and fans to cool the systems.  The temperature is comfortable, thanks to dedicated cooling systems for the computers. The aisles are even emptier of workers than those at Lowe&#8217;s. A small number of people may be onsite to tend to the rare physical needs of the machines, but most people who use the systems could be anywhere. Like you or me, they could even be sitting at home in their underwear.</p>

<p>Already, the data centers that host major Internet sites are drawing more electrical power in the United States than our TV use. [1] Let me say that again: the electricity Americans consume to power their Internet habit has surpassed the amount of electricity used to power their TV habit.  And while we keep our TVs on just a few hours a day, we expect websites to be available 24 hours a day, every day.</p>

<p>Data centers tend to be powered by traditional power sources, with a few exceptions that choose to use wind or solar to power their operations. Google has expressed sincere interest in greening their operations, but so far continues to focus on building out their infrastructure as fast as they can, with a plan to throw money at the sustainability problem, hoping for a solution later.</p>

<p>A scientist researched the energy consumed by a Google search and determined that executing just two Google searches would use enough energy to boil a kettle of water. [2] Google refuted this claim, saying that this estimate was far too high. Google performed its own carbon footprint calculation of a Google search.  According to Google&#8217;s own estimate, it would take 1,000 Google searches to equal the impact of driving an average automobile a kilometer, or 6/10ths of a mile [3]. Sending a search to Google isn&#8217;t just asking a question to a single computer.  Clusters of super computers are used to calculate a response. The footprint of a search is small, but the number being executed every day is staggering. I&#8217;m sure Google was trying to present their environmental impact in the best possible light. It&#8217;s no wonder then that they didn&#8217;t cross reference these statistics with the number of searches that are currently performed each day. It&#8217;s estimated that about 300 million Google searches are performed each day. [7]</p>

<p>This means that according to Google&#8217;s own estimate, the daily impact of Google searches adds up to the equivalent of driving about 180,000 miles each day.  Calculating this number was one of my deciding points in preparing this message.  It&#8217;s such a big number.  Imagine if there were 180,000 fewer miles driven each day!</p>

<p>With some further research I was able to put this number in perspective. (I think it took fewer than a thousand additional Google searches.) The United States Postal Service logs an estimated 2.6 million miles each day, or about 15 times more. [6] Americans in total drive about 5 BILLION miles a day. [4] The impact of Google searches is statistically insignificant compared to this. To try to put this into perspective: If every American were to drive one mile less per year, it would have more than a thousand times the impact of the entire world abstaining from searching Google for a single day. </p>
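
<p>As a rough check on that arithmetic, here is a minimal Python sketch that uses only the figures quoted in this post and its references. The variable and function names are my own, chosen for illustration, and the 0.6 miles-per-kilometer conversion is the rounded value from Google&#8217;s statement, so treat the results as approximations rather than measurements.</p>

<pre><code># Figures quoted in this post and its references.
SEARCHES_PER_DAY = 300_000_000     # estimated Google searches per day [7]
SEARCHES_PER_KM = 1_000            # Google's estimate: 1,000 searches per km driven [3]
MILES_PER_KM = 0.6                 # rounded conversion from Google's statement [3]
USPS_MILES_PER_DAY = 2_600_000     # Postal Service estimate [6]
US_MILES_PER_DAY = 5_000_000_000   # all American driving [4]


def search_miles_per_day():
    """Daily Google searches expressed as equivalent miles of driving."""
    km_equivalent = SEARCHES_PER_DAY / SEARCHES_PER_KM
    return km_equivalent * MILES_PER_KM


miles = search_miles_per_day()
print(f"Search equivalent: about {miles:,.0f} miles per day")
print(f"USPS drives about {USPS_MILES_PER_DAY / miles:.1f} times that")
print(f"All US driving is about {US_MILES_PER_DAY / miles:,.0f} times that")
</code></pre>

<p>With these rounded inputs the search equivalent comes out at roughly 180,000 miles a day. The Postal Service ratio lands a little under 15, and all American driving is tens of thousands of times larger.</p>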

<p>I don&#8217;t mean to diminish the original number: the daily impact of Google searches equates to 180,000 miles of driving in terms of carbon footprint. It&#8217;s still a big number and it would be great to reduce it further. Comparing the impact of the different activities we perform helps us to put things in perspective and prioritize which lifestyle changes could be most effective. And we don&#8217;t always have to choose making one improvement at the expense of another. </p>

<p>The Google search statistic was an example of taking an action online. Life online involves more than just Google searches though.</p>

<p>Our online lives are also composed of data we generate or that is collected about us, sitting up there in the &#8220;cloud&#8221;, at these data centers. There are e-mail folders of archived messages. There are archived posts to mailing lists and forums, and photos of old summer vacations posted on photo sharing websites.</p>

<p>Our data has a cost to exist as well.  Data that seems to be inactive is likely to be regularly accessed for maintenance like virus scans, causing an energy draw proportional to the amount of data involved. Any data stored online is likely backed up every day.  Even inactive data is copied repeatedly to back-up tapes, causing additional power consumption.</p>

<p>What is the impact of this storage in context? I don&#8217;t know, but it&#8217;s clear that the more data is out there, the greater the cost to store it.</p>

<p>There&#8217;s so much data being stored about us, often not because we care about it, but because it benefits the corporations who are collecting it.  The more data Google, Facebook and others collect, the more content they have for pages to serve ads on, and the more relevant ads they are able to display based on the data we give them.</p>

<p>So Google strongly encourages us to archive e-mail, not delete it, which would reclaim resources. Likewise, Facebook and many other sites have few or no limits on the amount of content you can post. Instead, they focus on infinite data structures, like Flickr&#8217;s &#8220;photostream&#8221;, Facebook &#8220;walls&#8221;, and the endless river of status updates on Twitter.</p>

<p>The design of these sites is not to encourage us to review all of someone&#8217;s content, or even someone&#8217;s best content. The design pattern we see over and over online now is to encourage infinite streams of data, and to have us focus only on the most recent entries of those streams, while the old data is not removed and recycled but encouraged to stay online forever for reference and profit.</p>

<p>It&#8217;s a hard problem to design tools that find the most relevant information regardless of whether it&#8217;s the newest or not. Google search tries to solve just that problem. The problem could be partly addressed voluntarily if people took greater care to update the information they have posted online, or to delete content they control and know is obsolete.</p>

<p>As stewards of our online lives, we should apply the same kind of thinking we do about physical world sustainability to our online lives.</p>

<p>Re-consider allowing so much of your data to make a one-way trip to permanent archiving.  Cultivate your data like a garden. Something with finite boundaries.  Review the things you&#8217;ve planted online periodically. Throw away content that has rotted or expired over time. Prune out the typos. Trim and rewrite your best pieces so they can flourish.</p>

<p>Use your data gardening time to reflect on your past. You may ask yourself &#8220;Whatever planted the seed for that article in my head?&#8221; But you may also find some heirloom crops, still bright with flavor today.</p>

<p>Now let&#8217;s zoom out some. How can we profess to be good stewards of the earth,
when we engage in activity where we don&#8217;t really know the impact? </p>

<p>Religious history has seen groups split over such questions. Should we use
automobiles? Electricity? The Internet? The Amish stand out for choosing the
simpler life, while other denominations attempt to live &#8220;in the world but not
of it.&#8221;</p>

<p>Communicating through the internet is just one example of lifestyle choices
which create a more abstract existence, where the effects of simple daily
activities touch back to data centers in California and factories in China. </p>

<p>To embrace this complexity while still prioritizing stewardship means taking on the responsibility of understanding the impact of our abstracted actions, from using the internet, to driving cars, to buying foreign-made products.</p>

<p>When it comes to being a good steward of our online lives, there are many ways to address the complexities and reduce our carbon footprint. Here are three specific practices that I use. The impact of each action may be small, but like a vote, the cumulative effect of small actions can add up to something big. The benefits of such practices go beyond simply reducing carbon footprints. Each one is a practice in mindfulness that reminds us that our abstracted actions have real world impacts. </p>

<ol>
<li>The first tip: I put our home cable modem and wireless router on a power strip. We turn the strip off at night and on in the morning. Not only does this save electricity, it also improves security by completely preventing outside access. It also reduces the amount of radio waves being broadcast through the house.</li>
<li>A second tip: When sending an e-mail that is primarily an attachment, I consider using the option to not save the message in my sent-mail folder. These messages are much larger than normal e-mails, and I already have a copy of the document on my hard drive, plus the recipients will also have a second copy in their Inbox, and likely a third that is saved to their own hard drive. </li>
<li>Finally, here&#8217;s a tip that could vastly reduce the number of Google searches, while at the same time finding what you are looking for even faster. Top Google searches include queries for &#8220;YouTube&#8221; and &#8220;Facebook&#8221;. Instead of going directly to a site like &#8220;YouTube.com&#8221;, many people first type &#8220;YouTube&#8221; into Google and click on the first result. Using a bookmark for popular sites would save a small but repetitive amount of time and energy by going directly to the sites. A bookmark is not only efficient here, it also means that Google is not tracking your search and mediating your experience as you pass through Google. You are saving yourself from seeing one more ad that day, which would otherwise be displayed in the right sidebar of Google as you click through. </li>
</ol>

<p>Ultimately I think the wisdom of &#8220;less is more&#8221; applies to being stewards of our online lives. You have the option to just not post something. Or don&#8217;t sign up for some website. Or just unplug and go outside. Visit someone in person. Stewardship the old fashioned way has a beautiful simplicity to it. </p>

<p>How have you found satisfaction and success in being a steward of your online
life? If you don&#8217;t use the Internet, or have even just avoided Facebook, what
has it meant for you to make this choice while so many others embrace it?
What do you find at the intersection of our spiritual and digital lives? </p>

<h2>References:</h2>

<ol>
<li><em>an EPA study stating that the data center industry devours <a href="http://datacenterjournal.com/content/view/2851/43/">61 billion kWh of energy annually</a></em> compared to <em>...about 275 million TVs currently in use in the U.S., consuming over <a href="http://www.energystar.gov/index.cfm?fuseaction=find_a_product.showProductGroup&amp;pgw_code=TV">50 billion kWh of energy each year</a></em></li>
<li><em>Performing two Google searches from a desktop computer can generate about the <a href="http://www.natscience.com/Uwe/Forum.aspx/physics/32927/Revealed-THE-ENVIRONMENTAL-IMPACT-OF-GOOGLE-SEARCHES">same amount of carbon dioxide as boiling a kettle for a cup of tea.</a></em></li>
<li><em>the average car driven for one kilometer (0.6 miles for those in the U.S.) produces as many greenhouse gases as <a href="http://googleblog.blogspot.com/2009/01/powering-google-search.html">a thousand Google searches.</a></em></li>
<li><em>Americans drive <a href="http://www.ibtta.org/files/PDFs/Yermack_Larry.pdf">5 billion miles per day</a></em></li>
<li><em>The Dept. of Transportation estimates that Americans drive an average of <a href="http://www.associatedcontent.com/article/32937/more_gas_and_better_gas_mileage_for.html">29 miles per day</a></em></li>
<li><em>The Postal Service operates a fleet of 219,000 vehicles, including 146,000 delivery vehicles&#8230;The average LLV is driven about 18 miles a day.</em> (146,000*18 = ~ <a href="http://www.altenergystocks.com/archives/2009/09/usps_study_ev_economics_depend_on_smartgrid_revenue.html">2.6 million miles per day</a> )</li>
<li><em><a href="http://blog.usaseopros.com/2009/06/26/google-searches-per-day-reach-299-million-in-may-2009/">&#8230;299.83 million Google searches per day</a>  in May 2009</em></li>
<li>The book <a href="http://www.amazon.com/dp/1416546960/">Planet Google</a> was also a useful reference. </li>
</ol>
]]>
    </content>
</entry>

<entry>
    <title>A vision for CGI.pm and CGI::Simple</title>
    <link rel="alternate" type="text/html" href="http://mark.stosberg.com/blog/2009/09/a-vision-for-cgipm-and-cgisimple.html" />
    <id>tag:mark.stosberg.com,2009:/blog//2.311</id>

    <published>2009-09-24T01:19:23Z</published>
    <updated>2009-09-24T02:02:34Z</updated>

    <summary> I&#8217;ve spent a lot of time recently triaging bugs for CGI.pm. I&#8217;ve enjoyed the process, and respect CGI.pm as a widely used Perl module. I&#8217;m not in love with all aspects of the module. I don&#8217;t use or recommend the HTML generation...</summary>
    <author>
        <name>Mark Stosberg</name>
        <uri>http://mark.stosberg.com/</uri>
    </author>
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="cgipm" label="CGI.pm" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="cgisimple" label="CGI::Simple" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="psgi" label="PSGI" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://mark.stosberg.com/blog/">
        <![CDATA[<p><a href="http://www.flickr.com/photos/markstos/3933330782/" title="baby sleeps again by Mark Stosberg, on Flickr"><img src="http://farm3.static.flickr.com/2508/3933330782_e081dcf5ac_m.jpg" width="240" class="floatimgleft" height="180" alt="baby sleeps again" /></a> I&#8217;ve spent a lot of time recently <a href="http://mark.stosberg.com/blog/2009/08/almost-100-cgipm-bugs-closed-help-with-the-50-still-open.html">triaging bugs for CGI.pm</a>. I&#8217;ve enjoyed the process, and respect CGI.pm as a widely used Perl module. I&#8217;m not in love with all aspects of the module. I don&#8217;t use or recommend the HTML generation features&#8212; I recommend using HTML template files and <a href="http://search.cpan.org/perldoc?HTML::FillInform">HTML::FillInForm</a> for filling them.</p>

<p>Whenever I think about how I&#8217;d like to change CGI.pm, what I have in mind is often the same choice that <a href="http://search.cpan.org/perldoc?CGI::Simple">CGI::Simple</a> made.  There was a time years ago that I focused my attention on CGI::Simple and tried it in production, only to be bitten by a compatibility issue, so I reverted back to CGI.pm. I don&#8217;t remember what the specific issue was, and it&#8217;s likely been fixed by now. But the pragmatic point remained with me: CGI::Simple may have clean code and a good test suite, but it&#8217;s not necessarily free of defects, and in particular it lacks the vastly larger user base that CGI.pm has to provide real world beta testing. </p>
]]>
        <![CDATA[<p>I recently took another look at CGI::Simple, its <a href="http://mark.stosberg.com/blog/2008/12/cookie-handling-in-titanium-catalyst-and-mojo.html">cookie handling implementation</a>, and its bug queue. One thing became clear: CGI::Simple forked from CGI.pm in 2001, and they have not evolved in parallel since then. Each has had different bugs filed against it, with some issues fixed in one and not the other. They both have test suites, but they have evolved with different test coverage as new tests are written to respond to bugs filed against one particular module. </p>

<p>And, unfortunately for the better design of CGI::Simple, it is CGI.pm that continues to receive far more of the attention and updates. (Although to be fair, some of this relates to the HTML functions, which are intentionally omitted from CGI::Simple). </p>

<p>I would like to say that CGI::Simple is a clear path forward from CGI.pm if you are willing to let go of the HTML generation functions. Unfortunately, the current situation is ripe for running into subtle differences that have been created since the projects forked about eight years ago. </p>

<p>My vision for a solution is simple: CGI.pm and CGI::Simple should be maintained together. Where their features overlap, the combined project should have the best version of the documentation from both projects, the best code from both projects, and the combined test coverage of both projects. CGI::Simple is intentionally incompatible in a few ways in the name of better design, and I support that. Still, the projects should strive to remain compatible whenever possible to make it easy for people to transition from CGI.pm to CGI::Simple. When a change comes in that could affect either module, it should be changed in both modules. </p>

<p>A great example of a possible collaboration point is the <a href="http://rt.cpan.org/Public/Bug/Display.html?id=49943">request</a> to add <a href="http://bulknews.typepad.com/blog/2009/09/psgi-perl-wsgi.html">PSGI</a> support to CGI.pm. Ideally if the proposal is accepted, it could be added to both CGI.pm and CGI::Simple at the same time, with the same API, tests and documentation. </p>
]]>
    </content>
</entry>

<entry>
    <title>Almost 100 CGI.pm bugs closed, help with the 50 still open</title>
    <link rel="alternate" type="text/html" href="http://mark.stosberg.com/blog/2009/08/almost-100-cgipm-bugs-closed-help-with-the-50-still-open.html" />
    <id>tag:mark.stosberg.com,2009:/blog//2.308</id>

    <published>2009-08-15T01:28:23Z</published>
    <updated>2009-08-15T02:14:08Z</updated>

    <summary> Get off the couch and pull your weight&#8212; There&#8217;s a CGI.pm bug with your name on it. There were nearly 150 active entries in the CGI.pm bug tracker when I was approved recently as a new co-maintainer. As I had...</summary>
    <author>
        <name>Mark Stosberg</name>
        <uri>http://mark.stosberg.com/</uri>
    </author>
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="cgipm" label="CGI.pm" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://mark.stosberg.com/blog/">
        <![CDATA[<div class="floatimgright"> 
<a href="http://www.flickr.com/photos/markstos/3802117476/" title="new bikes-at-work trailer by Mark Stosberg, on Flickr"><img src="http://farm3.static.flickr.com/2634/3802117476_bd83c36b1f_m.jpg" width="240" height="180" alt="new bikes-at-work trailer" /></a>Get off the couch and pull your weight&#8212; <br/>There&#8217;s a CGI.pm bug with your name on it.</div> 

<p>There were nearly 150 active entries in the CGI.pm bug tracker when I was approved recently as a new co-maintainer. As I had time in the evenings after the baby was asleep, I went through and reviewed every one of these bug reports. Many had already been addressed by Lincoln some time ago. Those were simply closed.  Still, I found about 20 fairly ready-to-go patches, and those have now been processed and <a href="http://cpansearch.perl.org/src/LDS/CGI.pm-3.45/Changes">released today as CGI.pm 3.45</a>. Whenever code changes were made, I also strived to make sure new automated tests were added that covered those cases. You may be surprised how many methods in CGI.pm have no automated tests for them at all. </p>

<p>Now there are still about 50 open issues in the <a href="http://rt.cpan.org/Public/Dist/Display.html?Name=CGI.pm">CGI.pm bug tracker</a>. For these, I have tried to put in the subject line a summary indication of what is needed to move each one forward, like &#8220;Needs Test:&#8221;, &#8220;Needs Peer Review:&#8221; or &#8220;Needs Confirmation&#8221;. Generally, I plan to wait patiently for volunteers to help with these. If you use CGI.pm, consider helping to move one of these forward. </p>
]]>
        <![CDATA[<p>To make collaboration easier, <a href="http://github.com/markstos/CGI.pm/tree/master">CGI.pm is now on github</a>. You are welcome to fork and send pull requests through there, although posting patches to the bug tracker continues to work fine for small changes.  The full CVS history has not been translated yet, but may be eventually. </p>

<p>CGI.pm has been in the Perl core since 5.4. With its maturity comes quirks. I&#8217;m not a fan of the HTML generation functions. I find the support for both OO and procedural styles awkward. As I have more time, I also hope to continue updating the CGI.pm documentation to promote more modern practices, and de-emphasize other parts of it, like the HTML generation functions and the procedural interface. </p>

<p>I would like to thank Lincoln Stein for building and releasing CGI.pm, and for maintaining it for over a decade. Many projects from <a href="http://www.cgi-app.org">CGI::Application</a> to <a href="http://www.movabletype.com">Movable Type</a> depend on it. I also appreciate his willingness to allow for direct help on the project, and his receptiveness to the documentation overhaul idea. Lincoln is continuing his involvement. He completed the final review and release of the changes proposed for 3.45. </p>

<p>I understand that CGI.pm is broadly used, but like ExtUtils::MakeMaker, not always well loved. It&#8217;s true that some day it will be completely replaced by next-generation tools, and some reasonable candidates exist now. Until then, there are thousands of existing users who will appreciate our collective maintenance of the module.  Let&#8217;s get the bug tracker back down to zero!</p>
]]>
    </content>
</entry>

<entry>
    <title>Movable Type fork is an opportunity to harness CPAN</title>
    <link rel="alternate" type="text/html" href="http://mark.stosberg.com/blog/2009/06/movable-type-fork-is-an-opportunity-to-harness-cpan.html" />
    <id>tag:mark.stosberg.com,2009:/blog//2.301</id>

    <published>2009-06-23T16:21:51Z</published>
    <updated>2009-06-24T13:21:20Z</updated>

    <summary><![CDATA[ Today Melody was announced as a fork of the Perl-based Movable Type platform. I helped the Melody project as it prepared to launch, in part advising on how best to relate to the Perl community.&nbsp; One of the...]]></summary>
    <author>
        <name>Mark Stosberg</name>
        <uri>http://mark.stosberg.com/</uri>
    </author>
    
        <category term="Movable Type" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="cgiapplication" label="CGI::Application" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="fork" label="fork" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="melody" label="Melody" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="movabletype" label="movabletype" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="Perl" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://mark.stosberg.com/blog/">
        <![CDATA[<span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="melody-logo-mark-on-white-thumb-200x200-7.jpg" src="http://mark.stosberg.com/blog/images/melody-logo-mark-on-white-thumb-200x200.jpg/melody-logo-mark-on-white-thumb-200x200-7.jpg" class="mt-image-right" style="margin: 0pt 0pt 20px 20px; float: right;" height="200" width="200" /></span>
Today <a href="http://www.openmelody.org/">Melody</a> was announced as a fork of the Perl-based <a href="http://www.movabletype.org/">Movable Type platform</a>. I helped the Melody project as it prepared to launch, in part advising on how best to relate to the Perl community.&nbsp; One of the stated interests of Melody is to refactor the project to use <a href="http://www.cgi-app.org/">CGI::Application</a>, which I maintain. Tim Appnel has already spelled out&nbsp; <a href="http://wiki.movabletype.org/Proposal:Foundry">a vision of what a "CPANization" of Movable Type might look like</a>, and I've looked in depth at what the <a href="http://wiki.movabletype.org/Proposal:QueryObjectRefactor">initial steps towards using CGI::Application</a> could be.<br /><br />My own vision for Melody is a code base that's very focused on publishing and content management, with all the infrastructure outsourced to <a href="http://search.cpan.org/">CPAN</a> modules that are well-written, well-documented, and well-tested.&nbsp; The collaboration between Melody and CPAN would be a two-way code flow. While there are more CPAN modules that Melody could make use of, there are a number of pieces of Melody which should be packaged as independent modules on their own and released to CPAN. One example is the great "dirification" that already exists in Movable Type. This is the functionality that turns any given string of words into a reasonable representation in URLs. It seems like an easy problem on the surface, but Movable Type has a sophisticated solution that takes into account what it means to do this well across many different languages. I also couldn't find any existing CPAN module which already takes on this problem space, so I started to extract this out of Movable Type myself and <a href="http://www.sixapart.com/pipermail/mtos-dev/2008-December/002249.html">published a draft of String::Dirify</a>. 
For that initial release, I ripped out all the fancy multi-language support, and there is still more significant work to be done to untangle this layer from Movable Type. (If you want to pick up that project and work on it, there's also <a href="http://www.sixapart.com/pipermail/mtos-dev/2008-September/001982.html">some discussion of testing String::Dirify</a>).<br /><br />While Movable Type already had an open source release, I expect Melody to have&nbsp; a more adventurous evolution, and I look forward to it becoming a shining star in the Perl community, not just for the exterior functionality, but also because the internals have an opportunity to become an example of best practices. <br />]]>
        
    </content>
</entry>

<entry>
    <title>Towards a Pure Perl HTML::FillInForm: 61% tests passing</title>
    <link rel="alternate" type="text/html" href="http://mark.stosberg.com/blog/2009/04/towards-a-pure-perl-htmlfillinform-61-tests-passing.html" />
    <id>tag:mark.stosberg.com,2009:/blog//2.279</id>

    <published>2009-04-28T02:50:44Z</published>
    <updated>2009-04-28T03:16:02Z</updated>

    <summary> I have a goal of distributing Titanium along with all of its dependencies. The vision is to have a pure perl stack, so it&#8217;s easy to unpack and start using the distribution, without worrying about binaries tied to a...</summary>
    <author>
        <name>Mark Stosberg</name>
        <uri>http://mark.stosberg.com/</uri>
    </author>
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://mark.stosberg.com/blog/">
        <![CDATA[<p><a href="http://www.flickr.com/photos/markstos/3478645938/" title="kentucky creek by Mark Stosberg, on Flickr"><img  align="left"
style="margin-right: 10px" src="http://farm4.static.flickr.com/3540/3478645938_d4c1817754.jpg" width="103" height="500" alt="kentucky creek" /></a> I have a goal of distributing <a href="http://search.cpan.org/perldoc?Titanium">Titanium</a> along with all of its dependencies. The vision is to have a pure perl stack, so it&#8217;s easy to unpack and start using the distribution, without worrying about binaries tied to a specific platform. </p>

<p>One stumbling block with this has been <a href="http://search.cpan.org/perldoc?HTML::Parser">HTML::Parser</a>, which requires XS and is a dependency of <a href="http://search.cpan.org/perldoc?HTML::FillInForm">HTML::FillInForm</a>, which is used by <a href="http://search.cpan.org/perldoc?CGI::Application::Plugin::ValidateRM">CGI::Application::Plugin::ValidateRM</a>. </p>

<p>I&#8217;ve been working with Ron Savage on a Pure Perl replacement for HTML::Parser, and I made good progress over the weekend based on his foundation.  He created <a href="http://search.cpan.org/perldoc?HTML::Parser::Simple">HTML::Parser::Simple</a>, which is really two major parts in one. First, it provides a pure-perl HTML parser. Secondly, it also provides a specific use of the parser, which is to store a representation of an HTML document as a <a href="http://search.cpan.org/perldoc?Tree::Simple">Tree::Simple</a> data structure. </p>

<p>In a future version, I&#8217;d like to see the Tree::Simple part split into its own
module to clarify the parts, and to eliminate this dependency for people who
don&#8217;t use it. For example, in the new sub-class I created, I don&#8217;t use the
Tree::Simple object at all.</p>

<p>I made a short-term goal of getting all the tests for HTML::FillInForm to pass when we modify one line in HTML::FillInForm to say that it &#8220;isa&#8221; &#8220;HTML::Parser::Simple::Compat&#8221; rather than being a sub-class of HTML::Parser. Then I run the HTML::FillInForm test suite and see how many tests pass and fail. Currently our parser emulates enough of HTML::Parser to pass 61 percent of the tests. </p>

<p>The idea with &#8220;HTML::Parser::Simple::Compat&#8221; is to provide an API that is compatible (enough) with the HTML::Parser 2.x API.  I think the approach seems promising, but there is still more to do:</p>

<ul>
<li>I have focused on getting tests to pass, so the documentation is still poor</li>
<li>Perhaps my &#8220;Compat&#8221; sub-class should really be merged into the parent class. I&#8217;ll discuss this with Ron</li>
<li>There are few of our own tests, as I&#8217;ve been focusing on getting HTML::FillInForm tests to pass. Perhaps more of those tests could be &#8220;ported&#8221; into our own test suite.</li>
<li>Performance and memory consumption have not been benchmarked. </li>
</ul>

<p>If this project interests you, feel free to fork the project on github and start contributing code or other feedback.</p>
]]>
        

    </content>
</entry>

<entry>
    <title>fgdb: Free software to manage community hardware recycling</title>
    <link rel="alternate" type="text/html" href="http://mark.stosberg.com/blog/2009/04/fgdb-free-software-to-manage-community-hardware-recycling.html" />
    <id>tag:mark.stosberg.com,2009:/blog//2.271</id>

    <published>2009-04-03T23:50:44Z</published>
    <updated>2009-04-04T00:11:23Z</updated>

    <summary> Hardware recycling in Richmond took a leap forward last weekend. A small group of Richmond volunteers toured Free Geek Columbus. We learned much from the visit. One valuable detail I&#8217;ll focus on today is that they run their organization on...</summary>
    <author>
        <name>Mark Stosberg</name>
        <uri>http://mark.stosberg.com/</uri>
    </author>
    
        <category term="Tech" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="freegeek" label="freegeek" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="hardware" label="hardware" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="richmondindiana" label="Richmond, Indiana" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://mark.stosberg.com/blog/">
        <![CDATA[<p><a href="http://www.flickr.com/photos/markstos/3402969378/" title="pandarama #1 by Mark Stosberg, on Flickr"><img style="margin-right: 10px" align="left" src="http://farm4.static.flickr.com/3435/3402969378_f42814b440.jpg" width="130" height="500" alt="pandarama #1"  /></a> Hardware recycling in Richmond took a leap forward last weekend. A small group of Richmond volunteers toured <a href="http://www.freegeekcolumbus.org">Free Geek Columbus</a>. We learned much from the visit.  One valuable detail I&#8217;ll focus on today is that they run their organization on a web-based,
database-driven software system called &#8220;fgdb.rb&#8221;. The software is available on their intranet, allowing several volunteers to use and access the system at the same time. </p>

<p>fgdb.rb tracks hardware donations, volunteer time, recycling trips, and
hardware distribution. fgdb.rb works with a neat tool called &#8220;printme&#8221; that
takes an automatic snapshot of all of a computer system&#8217;s details, and
uploads it to the database. This automates tedious data entry and creates a
great reference.</p>

<p>I was pleased to find that &#8220;fgdb.rb&#8221; was available for free as open source
software, and is designed to run on Linux. However, the documentation was lacking
on details on how to get the system up and running on Ubuntu Hardy Linux, which
is what we use on our server at our hardware co-op, and also on my laptop.</p>

<p><a href="http://www.freegeek.org/">Free Geek Portland</a> developed fgdb.rb for their own
use and had tested installing the software primarily on Debian Lenny Linux.
Now I have managed to install it on Ubuntu Hardy Linux, and have submitted a
patch back to the authors to update the documentation to help others do this as
well. You can see my <a href="http://github.com/markstos/fgdb.rb/blob/8eed86ea36da6b16bd32ecbd5e644c47ea31085c/doc/README_FOR_APP">current version of the installation instructions</a>, but my changes should be merged
into the <a href="http://git.ryan52.info/?p=fgdb.rb;a=summary">main fgdb.rb git repository</a>
soon, and I recommend checking there for the current version.</p>

<p>I&#8217;ve also published a <a href="http://www.flickr.com/photos/markstos/sets/72157616045453334/">few photos from the Free Geek Columbus field trip</a>.</p>
]]>
        

    </content>
</entry>

<entry>
    <title>Darcs vs. git annoyances: getting to know you</title>
    <link rel="alternate" type="text/html" href="http://mark.stosberg.com/blog/2009/04/darcs-vs-git-annoyances-getting-to-know-you.html" />
    <id>tag:mark.stosberg.com,2009:/blog//2.270</id>

    <published>2009-04-02T02:09:23Z</published>
    <updated>2009-04-02T02:18:48Z</updated>

    <summary><![CDATA[I just saw a patch I created on github attributed to &#8220;mark@freekbox.(none)&#8221;. &nbsp; That&#8217;s nothing like an email I use and it&#8217;s annoying that git not only selected this as my identity to use in the commit log, but then...]]></summary>
    <author>
        <name>Mark Stosberg</name>
        <uri>http://mark.stosberg.com/</uri>
    </author>
    
        <category term="Darcs" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="darcs" label="darcs" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="git" label="git" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://mark.stosberg.com/blog/">
        <![CDATA[<p>I just saw a patch I created on github attributed to &#8220;mark@freekbox.(none)&#8221;. &nbsp; That&#8217;s nothing like an email I use and it&#8217;s annoying that git not only selected this as my identity to use in the commit log, but then didn&#8217;t even run it by me for a reality check. Now that I&#8217;ve committed that patch and pushed it to github, there&#8217;s no real way to alter that. Extra annoying.</p>

<p>By contrast, the first time you record a patch in a darcs repo, you are prompted for your e-mail address in a friendly way:</p>

<pre><code>$ darcs record
Darcs needs to know what name (conventionally an email address)
to use as the patch author, e.g. 'Fred Bloggs &lt;fred@bloggs.invalid&gt;'.
If you provide one now it will be stored in the file 
'_darcs/prefs/author' and used as a default in the future.  
To change your preferred author address, simply delete or edit
this file.

What is your email address?
</code></pre>

<p>Easy and pleasant. </p>
]]>
        

    </content>
</entry>

<entry>
    <title>Richmond Schools: Consider Thin Clients</title>
    <link rel="alternate" type="text/html" href="http://mark.stosberg.com/blog/2009/02/school-board-consider-thin-clients.html" />
    <id>tag:mark.stosberg.com,2009:/blog//2.265</id>

    <published>2009-02-08T02:44:12Z</published>
    <updated>2009-02-18T20:51:32Z</updated>

    <summary>Richmond High School student Jonathan Ulrich helped to set up and test a thin client lab. This is an open letter to the Richmond, Indiana Community School system. There is a school board meeting coming up to discuss how to...</summary>
    <author>
        <name>Mark Stosberg</name>
        <uri>http://mark.stosberg.com/</uri>
    </author>
    
        <category term="Linux" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="hardware" label="hardware" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="linux" label="Linux" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="opensource" label="Open Source" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="richmondindiana" label="Richmond, Indiana" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://mark.stosberg.com/blog/">
        <![CDATA[<div class="floatimgright"><a href="http://www.flickr.com/photos/markstos/2098105333/" title="computer hardware co-op launches by Mark Stosberg, on Flickr"><img  src="http://farm3.static.flickr.com/2405/2098105333_dcbfd4d28a_m.jpg" width="240" height="180" alt="computer hardware co-op launches" /></a><br/>Richmond High School student Jonathan Ulrich helped to set up and test a thin client lab.</div>

<p>This is an open letter to the Richmond, Indiana Community School system. There is a
school board meeting coming up to discuss how to fund technology upgrades
with a dwindling budget. I strongly suggest the school system consider Linux thin client
labs as part of the solution. Thin client labs are made with low-cost, low-power,
low-maintenance stations and have many advantages.</p>

<p>A Linux thin client lab is already being used successfully in the area.  Four
years ago in Brookville, Indiana, a thirty-seat thin client lab was set up at
St. Michael&#8217;s School.  Initial costs were kept low through low hardware
requirements and the use of free, open source software.  The lab is still in
use four years later. Minimal maintenance has been required, including zero
virus/spyware/malware infections due to the use of Linux.</p>

<p>Thin clients don&#8217;t need hard drives, which are at the top of the list of
parts that commonly fail in a computer. Instead, every workstation pulls all the
software it needs from a single server, meaning there is only one computer to
maintain software on in the lab, not thirty. So St. Michael&#8217;s unplugged the
hard drives in their machines, cutting down on noise in the lab, as well as
reducing the energy consumed by the lab.</p>

<p>I recommend checking on this success story for yourself. For the
administrator perspective, contact the Principal, Ken Saxon, at (765)
647-4961. For the IT perspective, contact Mike Heins, who set up the system
and maintains it: (765) 328 4479 (also at mikeh@perusion.net).</p>

<p>The use of Linux in Indiana schools is not new, either. In 2005 the state of
Indiana launched a state-wide initiative to put <a href="http://news.cnet.com/Indiana-schools-enroll-Linux/2100-7344_3-5820237.html">Linux on the desktop of
300,000 Indiana high school
students</a>.
Locally, Northeastern High School has made significant use of Linux.</p>

<p>I&#8217;ve already hinted that thin clients have lower power requirements and can
be lower maintenance. The hardware needed for thin client workstations is not
special. In fact, old desktop hardware that would otherwise be discarded for
being slow is ideal. In a thin client system, the performance is determined
by the server, and the workstation needs just a minimal amount of resources
to connect to it.</p>

<p>With these principles, I built a four-seat demonstration lab at my church,
using three computers so old that a local computer store gave them to me. I
paid only $50 for a memory upgrade for the server. As a thin client lab, these
old computers came back to life and performed like modern desktops, although
they ran Windows 98 in their former lives.</p>

<p>Because a school lab setting is an ideal place to deploy a thin client network,
there are several projects that focus on exactly this, and give away the
required software. These include <a href="https://fedorahosted.org/k12linux/">K12Linux</a>
and <a href="http://www.edubuntu.org">Edubuntu</a>. In my experience, both are
exceptionally easy to try out and install.</p>

<p>Pursuing thin clients now is a strategic move that works towards the goal
of the City&#8217;s Comprehensive Plan to be a &#8220;Sustainable City&#8221;. The plan
is fiscally conservative and technologically advanced, with low impact
on the environment and energy bills.</p>
]]>
        

    </content>
</entry>

</feed>
