A midwinter box bike ride

Recently I've been reviewing how various Perl frameworks and modules generate HTTP headers. After reviewing several approaches, it's clear that there are two major camps: those which put the response headers in a specific order and those which don't. You might expect one approach to be more spec-compliant than the other, but RFC 2616 provides conflicting guidance on this point.

The bottom line is that the spec says that "the order in which header fields with differing field names are received is not significant". But then it goes on to say that it is "good practice" (and it puts "good practice" in quotes) to order the headers a particular way: general headers first, then request or response headers, then entity headers. So, without strict guidance from the spec about the importance of header ordering, it would be interesting to know whether header order causes problems in practice.

The Plack::Middleware::RearrangeHeaders documentation suggests there is some benefit to strict header ordering: "to work around buggy clients like very old MSIE or broken HTTP proxy servers".

You might wonder what the big deal is: why not just stick to the "good practice" recommendation all the time? The difference can be seen in the benchmarks provided by HTTP::Headers::Fast. By ignoring the good-practice header order, an alternate implementation was able to make header generation about twice as fast. Considering that a web app needs to generate headers on every single request, making the header-generation code smaller and faster is potentially a tangible win, while still being spec-compliant.

So let's look at who is doing what:

  • CGI.pm does not order headers by default or have an option to do so. CGI.pm is used by default by CGI::Application and also by Mason and Jifty when they run under CGI.
  • CGI::Simple predictably models CGI.pm's behavior.
  • Dancer generates headers itself, without regard to order.
  • Plack is a mixed bag. If you generate your own headers, I believe they will be passed through unmodified, but if you use Plack::Request they will be ordered (with no option to disable this). With Plack you also have the option to use the RearrangeHeaders middleware if you want to be certain that your headers are in the "good practice" order.
  • Mojo generates its own headers, always in the "good practice" order, with no way to disable the ordering.
  • Catalyst, Plack and others rely on HTTP::Headers to generate headers. HTTP::Headers also currently always generates headers in good-practice order, although there has been some discussion about adding an option to produce the headers in an unsorted order.
  • And finally, I found HTTP::Headers::Fast, which is used by HTTP::Engine and gives the user the option to generate headers in either sorted or unsorted order.
  • For a bonus consultation, I looked at Rack, a web server interface written in Ruby that is sometimes paired with Rails, and one of the inspirations for Plack. It appears that Rack does not order HTTP headers either.

So which approach is best? What makes most sense to me is to leave HTTP headers unsorted by default. This complies with the spec and can be accomplished with a simpler implementation and better performance. The good-practice ordering is a nice-to-have option if you know you need it or want the tradeoff.
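To make the two camps concrete, here is a minimal sketch of "unsorted by default, good-practice order on request". It is written in Python purely for illustration and is not how any of the Perl modules above are implemented. The function names are hypothetical, the header sets are only a subset of the classes listed in RFC 2616, and treating unknown headers as entity headers is an assumption.

```python
# Subsets of the RFC 2616 header classes (general and response headers).
# Anything not listed is treated as an entity header (an assumption).
GENERAL = {"cache-control", "connection", "date", "pragma", "transfer-encoding", "via"}
RESPONSE = {"accept-ranges", "age", "etag", "location", "server", "vary"}


def header_rank(name):
    """Return 0 for general, 1 for response, 2 for entity or unknown headers."""
    lower = name.lower()
    if lower in GENERAL:
        return 0
    if lower in RESPONSE:
        return 1
    return 2


def serialize_headers(headers, good_practice=False):
    """Render a list of (name, value) pairs as an HTTP header block."""
    if good_practice:
        # sorted() is stable, so headers within one class keep their original order.
        headers = sorted(headers, key=lambda pair: header_rank(pair[0]))
    return "".join(f"{name}: {value}\r\n" for name, value in headers)


example = [
    ("Content-Type", "text/html"),
    ("Server", "demo"),
    ("Date", "Thu, 01 Jan 2009 00:00:00 GMT"),
]
```

Calling serialize_headers(example) emits the headers in the order given, while serialize_headers(example, good_practice=True) emits Date, then Server, then Content-Type. The unsorted path skips the classification and sort entirely, which is the kind of work the faster implementations avoid.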

And should we be worried about header ordering issues in practice? With CGI.pm in use for over a decade, I can't recall any header-ordering bugs among the hundreds of CGI.pm bugs I've triaged in that bug tracker. I'm interested to know if there are any real-world web clients or proxies that require or benefit from the good-practice ordering.

box biking at 10F

I don't tap my own phone. I don't xerox postcards before I mail them back from vacation. I don't take a voice recorder when I go out with friends. And I don't have a copy machine at home to duplicate handwritten notes I may send.

But if I send a message of equal importance by e-mail, then my e-mail program will automatically save a copy of every one of these messages.

E-mails I don't need waste my time. They increase the time it takes to search and browse through old email. They increase the time it takes for my email to "sync" when I want to go offline. To continue to save every e-mail I send perpetuates the unsustainable myth that as long as our actions are online they are "green".

First, the small action of saving an e-mail is amplified by the 1.5 billion people using e-mail globally. More saved e-mail means more disks to store the e-mail, larger backup systems to handle the volume, and faster processors so that all the archived messages can be searched through efficiently. Since sent mail accumulates over time, an increasing amount of resources is needed to handle the saved mail. In five years there will be more people on the planet. It's likely a greater percentage will be sending e-mail, and likely the e-mail volume will be even higher than it is now. What is the energy cost, now and in the future, of storing so much e-mail?

While we can debate the magnitude of the impact, more data storage equals more resource consumption. In the United States, data centers already draw more power than our TV usage [1], and data centers are only part of the energy consumed by our increasingly networked life. In turn, about 40% of our water supply is devoted to generating power. Water is used in part to cool massive data centers. (Only 15% of the water supply is actually used by the public.) [2]

If you're with me on this one, here's one tip that could make a big reduction in the size of your "Sent Mail" folder, while still retaining the memory of the correspondence:

Consider not saving a copy of attachments you send in your Sent Mail folder. Attachments are often 10 to 1,000 times larger than a typical e-mail, and you already have a copy of the document on your hard drive. Plus, your recipient is about to receive a copy and she may then also download a copy from her e-mail to her hard drive. Saving the attachment in your Sent folder could mean keeping a fourth copy of the document. If it's important to have a record that the attachment was sent, you could send one message with the correspondence that references the attachment, and save that message. Then, send the attachment in its own message, and don't save the attachment in the Sent folder.

Another idea: While some places have data retention rules (or laws!), these typically do not apply to personal e-mail accounts. Consider turning off the option to save e-mails by default, and consciously choose which e-mails you think are important enough to save. Note that Google's Gmail service (146 million users) does not have an option to turn off automatically saving the messages you send. Yahoo is another web-based e-mail provider that does provide this option. Check your e-mail program for details.

It is powerful to ask "what is the impact of this?", whether you consider e-mail storage, the toxic batteries in our cell phones, or the impact of broadcasting wifi radio waves through our homes 24/7. A daily choice such as not saving a sent e-mail can be a mindful practice to connect our abstract online lives with the real world.

For details about calculating the carbon footprint of e-mail storage, read on.

Calculating the carbon footprint of e-mail storage

Research about the carbon footprint of e-mail storage turned up little in the way of existing estimates. emailfootprint.org estimated that it would take 1 kilowatt hour to store 1 gigabyte of data for a year, but they didn't explain how they came up with that number. The site is now offline, though it is still accessible through the Google cache.

TreeHugger referenced a report that said that it would take 1 lb of coal to store 20 megabytes of data for a year, according to the US Department of Energy. But the report they linked to is no longer available. [3]

I decided to see if I could piece together my own estimate of the impact of storing sent e-mail, based on my own usage.

I have about 40 megabytes of mail saved from messages I sent in 2007. This perhaps already represents removing some messages which contained large attachments. For the sake of the example, let's assume this is an average amount. (To personalize the example, check the size of your own Sent Mail folders!) Multiply this by an approximate 1.5 billion e-mailers and you get about 56 petabytes of data (about 58 million gigabytes). To picture that: imagine all the data was crammed on to reasonably large hard drives: 500 gigabytes each. It would take about 117,000 hard drives to store that data, assuming it was stored relatively efficiently and no extra space was required to store the operating system on these drives! The actual number would be higher because much data can be expected to be on older hard drives, manufactured before the very large drives were an option. Further, it's an industry best practice to always duplicate data using "RAID", so the same data would be written to at least two hard drives. Large providers such as Google may further be duplicating data in at least two data centers, for extra redundancy.

Using a number from here, I'll estimate that it takes 446 MegaJoules of energy to produce a hard drive, which I'll convert to 124 kilowatt hours. So, it would take about 14.5 million kilowatt hours of energy just to produce 117,000 hard drives, without getting into the energy required to keep them turned on.

Using data from the US Department of Energy, it looks like we can expect about 2 lbs of CO2 to be generated to produce each kilowatt hour of energy.

So that puts us at an estimated 29 million lbs of CO2 generated to produce enough hard drives to efficiently store all the sent e-mail in the world (using my own amount as an average).

To visualize that number, let's equate it to the number of miles you'd have to drive in an average car to generate the same amount of CO2. According to Streetsblog, the average car generates 1.2 pounds of CO2 per mile, so 29 million lbs equates to about 24 million miles.

In perspective: 24 million miles is a very large absolute number, but it pales in comparison to the estimated 5 billion miles that Americans drive each day [4]. The real danger to address is the way of thinking that a digital, paperless life is automatically a green one.
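The chain of estimates above can be reproduced in a few lines. This is only a sketch of the same back-of-the-envelope arithmetic, written in Python for convenience. Every input is a figure quoted in the text, the variable names are my own, and binary units (1 GB = 1024 MB) are assumed because that is what matches the 56 petabyte figure.

```python
# Inputs, all taken from the estimates quoted in the text.
MB_PER_PERSON = 40        # one year of sent mail (the author's own folder)
EMAIL_USERS = 1.5e9       # approximate number of e-mail users
DRIVE_GB = 500            # capacity of one "reasonably large" hard drive
MJ_PER_DRIVE = 446        # energy to manufacture one drive
LBS_CO2_PER_KWH = 2       # CO2 per kilowatt hour generated
LBS_CO2_PER_MILE = 1.2    # CO2 per mile driven in an average car

total_gb = MB_PER_PERSON * EMAIL_USERS / 1024   # about 58 million GB
total_pb = total_gb / 1024 / 1024               # about 56 PB
drives = total_gb / DRIVE_GB                    # about 117,000 drives

kwh_per_drive = MJ_PER_DRIVE / 3.6              # 1 kWh = 3.6 MJ, so about 124 kWh
total_kwh = drives * kwh_per_drive              # about 14.5 million kWh
co2_lbs = total_kwh * LBS_CO2_PER_KWH           # about 29 million lbs
car_miles = co2_lbs / LBS_CO2_PER_MILE          # about 24 million miles

print(f"{total_pb:.0f} PB on {drives:,.0f} drives")
print(f"{total_kwh:,.0f} kWh, {co2_lbs:,.0f} lbs CO2, {car_miles:,.0f} car miles")
```

To personalize the estimate, change MB_PER_PERSON to the size of your own Sent Mail folder. Note that this counts only the energy to manufacture the drives, not the energy to keep them running, and ignores RAID and multi-site duplication.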

This post is a follow-up to one entitled Stewardship and Sustainability of our Online Lives.

References:

  1. an EPA study stating that the data center industry devours 61 billion kWh of energy annually compared to ...about 275 million TVs currently in use in the U.S., consuming over 50 billion kWh of energy each year
  2. Quenching the Thirst of Power-Hungry Data Centers, citing primary data from the US government.
  3. The Footprint of Gmail: How Much Energy Would Deleting Email Save?
  4. Americans drive 5 billion miles per day

retired radiators

The PDF spec includes an option to cause PDFs to open full screen when users open them. I'm a fan of the feature because it maximizes screen real estate and creates a simple, focused experience for readers. Using this option is one of my two essential tips for creating an impactful newsletter targeted at being read online. The other tip is to use a "landscape" format document, to match the shape of most screens.

Many PDF viewers respond to PDFs that are set to open full screen, but a number of PDF generation tools don't provide an option to set this preference when creating PDFs. I ran into this with Xournal, which is a nice application for Linux-based tablets but offers no PDF export options.

So I found a way to update a pre-existing PDF to set the preference to have it open full screen by default. The key here is that the structure of a PDF is largely text-based, so preferences in it can be updated manually by opening and editing the file according to the PDF spec, or the same effect can be accomplished with automated tools. In this case, I found that I needed to update a line that started like this:

<< /Type /Catalog

After /Catalog, this is all that needed to be added:

/PageMode /FullScreen

I automated this with a simple script that I named make-pdf-full-screen.sh. It works for the simple case when no "PageMode" has been declared, as in the Xournal case. I don't expect it would update the PageMode properly if one were already declared. For a safer solution, consider opening the PDF in a text editor to manually set "/PageMode /FullScreen" on the initial /Catalog line. Alternatively, you could use a formal solution like PDF::API3::Compat::API2, which appears to have the features needed to solve this with Perl.

Here's the contents of my little script to automate the update:

#!/bin/sh
# usage: make-pdf-full-screen.sh file.pdf
#   The file will be modified in place so that it opens full screen.
#   The current approach is naive... it assumes no Initial View has been defined.
# by Mark Stosberg
perl -pi -e 's?<< /Type /Catalog?<< /Type /Catalog /PageMode /FullScreen?' "$1"
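For the case the script above does not handle, where a PageMode is already declared, here is a rough sketch of one way to cover both situations. It is written in Python rather than Perl purely for illustration. The function name set_full_screen is hypothetical, and the sketch assumes the catalog dictionary is stored uncompressed, that any existing /PageMode entry belongs to the catalog, and that the viewer tolerates the small shift in byte offsets that the edit introduces.

```python
import re

# Matches the start of the catalog dictionary, as in the line shown above.
CATALOG = re.compile(rb"<<\s*/Type\s*/Catalog")
# Matches an existing PageMode entry such as "/PageMode /UseNone".
PAGE_MODE = re.compile(rb"/PageMode\s*/\w+")


def set_full_screen(pdf_bytes):
    """Return a copy of pdf_bytes whose PageMode is set to FullScreen."""
    if PAGE_MODE.search(pdf_bytes):
        # A PageMode already exists: replace its value rather than add a second key.
        return PAGE_MODE.sub(b"/PageMode /FullScreen", pdf_bytes, count=1)
    # No PageMode yet: add one right after "/Type /Catalog".
    return CATALOG.sub(
        lambda match: match.group(0) + b" /PageMode /FullScreen",
        pdf_bytes,
        count=1,
    )
```

To use it, read the file in binary mode, pass the bytes through set_full_screen, and write the result back out, ideally to a new file so the original survives if the viewer rejects the edited copy.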

Kent and Kurt on the Whitewater Gorge Trail

A few weeks ago I had my laptop stolen. Earlier that morning I had been reflecting and writing on the laptop about the intersection of our spiritual lives with our digital lives. And then, as if by divine intervention, my laptop disappeared (during church service, no less) and I was given an even greater opportunity to answer the question: When we spend more time browsing the web, what is it that we are doing less of? When we spend more time checking e-mail, what are we doing less of? And when we spend more time on Facebook, what are we spending less time doing? Apparently, the answer in my case is cleaning my desk and organizing the garage. Those are the things I did more when I could use the internet less. I joke about this, but I do envision my home as a place of rest and rejuvenation, yet I let clutter accumulate while I spent more time on my computer doing "productive" things.

There are many implications of shifting our increasingly precious free time online. Today I'd like to delve into the carbon footprint of our online lives.

You can listen to a 15-minute version of the message delivered at my church, or download the audio file: http://mark.stosberg.com/audio/stewardship_of_our_online_lives.mp3

Fall box bike commute

As individuals and organizations, many of us profess to hold up the value of stewardship, of caring for the earth's resources. But as some of us move more of our lives online, how much do we really know about the real-world impact of our online actions and data?

When Google's Gmail service launched, it advertised "never delete an email again". Instead, you can archive the e-mail with a single click, and it will always be there in case you might like to find it later.

As part of the launch, Google was offering about 100 times more e-mail storage than their competitors. This was enough, they claimed, to never delete another e-mail in your life. This was a decisive moment that changed web-based e-mail forever. Competitors scrambled to dramatically increase their storage options so they could compete.

Something there bothered me. In the physical world, this is a way of thinking that no environmentalist would stand for: NEVER THROW ANYTHING AWAY AGAIN? The circle of life is broken, replaced with a one-way trip from creation to permanent storage.

Are the rules for sustainability online really that different?

There's been a belief that when we move activities online, we are being green. We laud "Going paperless", and celebrate e-everything.

There is of course some truth in the efficiencies of digital living. It's certainly intuitive that it's less resource intensive to send an e-mail instead of a physical letter, or to teleconference instead of flying somewhere for a meeting.

But along with some of these efficient uses of the internet, we've moved some of our unsustainable practices online without deeply questioning the impact of this.

While it may be efficient to send an e-mail instead of a letter, many of us now send and receive far more e-mails than we wrote letters. Our use of the internet has gone far beyond replacing physical tasks with efficient digital alternatives.

I'll share what I know about the carbon footprint of our online lives now.

To talk about the carbon footprint of our online lives, let's start with the physical existence of the Internet. Websites and e-mail are served from computers all over the world. Many websites are now clustered in a relatively small number of large data centers.

Picture a data center as a dimly lit, windowless warehouse. On the concrete floor sits aisle after aisle of floor-to-ceiling stacks of computers, neatly set on identical racks, with blinking lights on the front and neatly organized cables on the back. There is an incessant hum from thousands of spinning disk drives and fans to cool the systems. The temperature is comfortable, thanks to dedicated cooling systems for the computers. The aisles are even emptier of workers than they are at Lowe's. A small number of people may be onsite to tend to the rare physical needs of the machines, but most people who use the systems could be anywhere. Like you or me, they could even be sitting at home in their underwear.

Already, the data centers that host major Internet sites are drawing more electrical power in the United States than our TV use. [1] Let me say that again: the electricity Americans consume to power their Internet habit has surpassed the amount of electricity used to power their TV habit. And while we keep our TVs on just a few hours a day, we expect websites to be available 24 hours a day, every day.

Data centers tend to be powered by traditional power sources, with a few exceptions that choose to use wind or solar to power their operations. Google has expressed sincere interest in greening their operations, but so far continues to focus on building out their infrastructure as fast as they can, with a plan to throw money at the sustainability problem, hoping for a solution later.

A scientist researched the energy consumed by a Google search and determined that executing just two Google searches would use enough energy to boil a kettle of water. [2] Google refuted this claim, saying that this estimate was far too high. Google performed its own carbon footprint calculation of a Google search. According to Google's own estimate, it would take 1,000 Google searches to equal the impact of driving an average automobile a kilometer, or 6/10ths of a mile. [3] Sending a search to Google isn't just asking a question to a single computer. Clusters of super computers are used to calculate a response. The footprint of a search is small, but the number being executed every day is staggering. I'm sure Google was trying to present their environmental impact in the best possible light. It's no wonder then that they didn't cross reference these statistics with the number of searches that are currently performed each day. It's estimated that about 300 million Google searches are performed each day. [7]

This means that according to Google's own estimate, the daily impact of Google searches adds up to the equivalent of driving about 180,000 miles each day. Calculating this number was one of my deciding points in preparing this message. It's such a big number. Imagine if there were 180,000 fewer miles driven each day!

With some further research I was able to put this number in perspective. (I think it took fewer than a thousand additional Google searches.) The United States Postal Service logs an estimated 2.6 million miles each day, or about 15 times more. [6] Americans in total drive about 5 BILLION miles a day. [4] The impact of Google searches is tiny compared to this. To try to put this into perspective: if each American were to drive just one mile less per year, it would have more than a thousand times the impact of the entire world abstaining from searching Google for a single day.
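The comparison above is simple enough to check in a few lines. The following is just a sketch of the arithmetic, written in Python. All inputs are the figures cited in the text and its references, the variable names are my own, and the rounded conversion of 0.6 miles per kilometer from the text is used rather than a more precise factor.

```python
SEARCHES_PER_DAY = 300e6      # estimated Google searches per day [7]
SEARCHES_PER_CAR_KM = 1000    # Google's own estimate [3]
MILES_PER_KM = 0.6            # rounded conversion used in the text

USPS_VEHICLES = 146_000       # delivery vehicles [6]
USPS_MILES_EACH = 18          # average miles per vehicle per day [6]
US_MILES_PER_DAY = 5e9        # all driving by Americans [4]

# Daily searches expressed as equivalent miles of driving.
search_miles = SEARCHES_PER_DAY / SEARCHES_PER_CAR_KM * MILES_PER_KM

usps_miles = USPS_VEHICLES * USPS_MILES_EACH
usps_ratio = usps_miles / search_miles
us_ratio = US_MILES_PER_DAY / search_miles

print(f"Searches: about {search_miles:,.0f} miles per day")
print(f"USPS: {usps_miles:,} miles per day, {usps_ratio:.1f} times the searches")
print(f"All US driving: about {us_ratio:,.0f} times the searches")
```

Run as written, the search figure comes out at about 180,000 miles per day, the Postal Service at a little under 15 times that, and all American driving at well over 25,000 times that.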

I don't mean to diminish the original number: the daily impact of Google searches equates to 180,000 miles of driving in terms of carbon footprint. It's still a big number and it would be great to reduce it further. Comparing the impact of different activities we perform helps us to put things in perspective and prioritize which lifestyle changes could be most effective. And we don't always have to choose making one improvement at the expense of another.

The Google search statistic was an example of taking an action online. Life online involves more than just Google searches though.

Our online lives are also composed of data we generate or that is collected about us, sitting up there in the "cloud", at these data centers. There are e-mail folders of archived messages. There are archived posts to mailing lists and forums, and photos of old summer vacations posted on photo sharing websites.

Our data has a cost to exist as well. Data that seems to be inactive is likely to be regularly accessed for maintenance like virus scans, causing an energy draw proportional to the amount of data involved. Any data stored online is likely backed up every day. Even inactive data is copied repeatedly to back-up tapes, causing additional power consumption.

What is the impact of this storage in context? I don't know, but it's clear that the more data is out there, the greater the cost to store it.

There's so much data being stored about us, often not because we care about it, but because it benefits the corporations who are collecting it. The more data Google, Facebook and others collect, the more content they have for pages to serve ads on, and the more relevant ads they are able to display based on the data we give them.

So Google strongly encourages us to archive e-mail, not delete it, which would reclaim resources. Likewise, Facebook and many other sites have few or no limits on the amount of content you can post. Instead, they focus on infinite data structures, like Flickr's "photostream", Facebook "walls", and the endless river of status updates on Twitter.

The design of these sites is not to encourage us to review all of someone's content, or even someone's best content. The design pattern we see over and over online now is to encourage infinite streams of data, and have us focus only on the most recent entries of those streams, while the old data is encouraged not to be removed and recycled, but to stay online forever for reference and profit.

It's a hard problem to design tools that find the most relevant information regardless of whether it's the newest or not. Google search tries to solve just that problem. The problem could be partly addressed voluntarily if we took greater care to update the information we post online, or to delete content we control that we know is obsolete.

As stewards of our online lives, we should apply the same kind of thinking we do about physical world sustainability to our online lives.

Reconsider allowing so much of your data to make a one-way trip to permanent archiving. Cultivate your data like a garden, something with finite boundaries. Review the things you've planted online periodically. Throw away content that has rotted or expired over time. Prune out the typos. Trim and rewrite your best pieces so they can flourish.

Use your data gardening time to reflect on your past. You may ask yourself "Whatever planted the seed for that article in my head?" But you may also find some heirloom crops, still bright with flavor today.

Now let's zoom out some. How can we profess to be good stewards of the earth, when we engage in activity where we don't really know the impact?

Religious history has seen groups split over such questions. Should we use automobiles? Electricity? The Internet? The Amish stand out for choosing the simpler life, while other denominations attempt to live "in the world but not of it."

Communicating through the internet is just one example of lifestyle choices which create a more abstract existence, where the effects of simple daily activities touch back to data centers in California and factories in China.

To embrace this complexity while still prioritizing stewardship means taking on the responsibility of understanding the impact of our abstracted actions, from using the internet, to driving cars, to buying foreign-made products.

When it comes to being a good steward of our online lives, there are many ways to address the complexities and reduce our carbon footprint. Here are three specific practices that I use. The impact of each action may be small, but like a vote, the cumulative effect of small actions can add up to something big. The benefits of such practices go beyond simply reducing carbon footprints. Each one is a practice in mindfulness that reminds us that our abstracted actions have real-world impacts.

  1. The first tip: I put our home cable modem and wireless router on a power strip. We turn the strip off at night and on in the morning. Not only does this save electricity, it also improves security by completely preventing outside access. It also reduces the amount of radio waves being broadcast through the house.
  2. A second tip: When sending an e-mail that is primarily an attachment, I consider using the option to not save the message in my sent-mail folder. These messages are much larger than normal e-mails, and I already have a copy of the document on my hard drive, plus the recipients will also have a second copy in their Inbox, and likely a third that is saved to their own hard drive.
  3. Finally, here's a tip that could vastly reduce the number of Google searches, while at the same time finding what you are looking for even faster. Top Google searches include queries for "YouTube" and "Facebook". Instead of going directly to a site like "YouTube.com", many people first type "YouTube" into Google and click on the first result. Using a bookmark for popular sites would save a small but repetitive amount of time and energy by going directly to the sites. A bookmark is not only efficient here, it also means that Google is not tracking your search and mediating your experience as you pass through Google. You are saving yourself from seeing one more ad that day, which would otherwise be displayed in the right sidebar of Google as you click through.

Ultimately I think the wisdom of "less is more" applies to being stewards of our online lives. You have the option to just not post something. Or don't sign up for some website. Or just unplug and go outside. Visit someone in person. Stewardship the old-fashioned way has a beautiful simplicity to it.

How have you found satisfaction and success in being a steward of your online life? If you don't use the Internet, or have even just avoided Facebook, what has it meant for you to make this choice while so many others embrace it? What do you find at the intersection of our spiritual and digital lives?

References:

  1. an EPA study stating that the data center industry devours 61 billion kWh of energy annually compared to ...about 275 million TVs currently in use in the U.S., consuming over 50 billion kWh of energy each year
  2. Performing two Google searches from a desktop computer can generate about the same amount of carbon dioxide as boiling a kettle for a cup of tea.
  3. the average car driven for one kilometer (0.6 miles for those in the U.S.) produces as many greenhouse gases as a thousand Google searches.
  4. Americans drive 5 billion miles per day
  5. The Dept. of Transportation estimates that Americans drive an average of 29 miles per day
  6. The Postal Service operates a fleet of 219,000 vehicles, including 146,000 delivery vehicles...The average LLV is driven about 18 miles a day. (146,000*18 = ~ 2.6 million miles per day )
  7. ...299.83 million Google searches per day in May 2009
  8. The book Planet Google was also a useful reference.

baby sleeps again

I've spent a lot of time recently triaging bugs for CGI.pm. I've enjoyed the process, and respect CGI.pm as a widely used Perl module. I'm not in love with all aspects of the module. I don't use or recommend the HTML generation features; I recommend using HTML template files and HTML::FillInForm for filling them.

Whenever I think about how I'd like to change CGI.pm, what I have in mind is often the same choice that CGI::Simple made. There was a time years ago that I focused my attention on CGI::Simple and tried it in production, only to be bitten by a compatibility issue, so I reverted back to CGI.pm. I don't remember what the specific issue was, and it's likely been fixed by now. But the pragmatic point remained with me: CGI::Simple may have clean code and a good test suite, but it's not necessarily free of defects, and in particular it lacks the vastly larger user base that CGI.pm has to provide real-world beta testing.

I recently took another look at CGI::Simple, its cookie handling implementation, and its bug queue. One thing became clear: CGI::Simple forked from CGI.pm in 2001, and they have not evolved in parallel since then. Each has had different bugs filed against it, with some issues fixed in one and not the other. They both have test suites, but they have evolved with different test coverage as new tests are written to respond to bugs filed against one particular module.

And, unfortunately for the better design of CGI::Simple, it is CGI.pm that continues to receive far more of the attention and updates. (Although to be fair, some of this relates to the HTML functions, which are intentionally omitted from CGI::Simple).

I would like to say that CGI::Simple is a clear path forward from CGI.pm if you are willing to let go of the HTML generation functions. Unfortunately, the current situation is ripe for running into subtle differences that have been created since the projects forked about eight years ago.

My vision for a solution is simple: CGI.pm and CGI::Simple should be maintained together. Where their features overlap, the combined project should have the best version of the documentation from both projects, the best code from both projects, and the combined test coverage of both projects. CGI::Simple is intentionally incompatible in a few ways in the name of better design, and I support that. Still, the projects should strive to remain compatible whenever possible to make it easy for people to transition from CGI.pm to CGI::Simple. When a change comes in that could affect either module, it should be made in both modules.

A great example of a possible collaboration point is the request to add PSGI support to CGI.pm. Ideally if the proposal is accepted, it could be added to both CGI.pm and CGI::Simple at the same time, with the same API, tests and documentation.