Tips for upgrading Postgres from 6.5.3 to 7.0.3
version 1.2 by Mark Stosberg
Last significant update: 2001-04-04
Here are some tips that came out my experience
upgrading Summersault's database from Postgres 6.5.3 to 7.0.3. Some of the information will be specific to crossing the 7.0 version line, other bits will useful for postgres upgrades and general, and some of it is probably specific to our system and data. It does not try to be a complete "HOW-TO" document, but rather to offer some insights about challenges that we ran into, and how we got around them. Bits of it are also specific to UNIX.
Dealing with the date format change
In the 7.x branch of Postgres, the default date style changed from being "Postgres" to being "ISO". We wanted to adopt the new standard, but at the same time, we didn't want all of our legacy applications to break that depended on the old
style format. Here are some options we thought of for dealing with this:
- Configure Postgres to continue to use the 'Postgres' date style format. This solution was attractive because it involved making a change in just one place. However, it there appeared to more benefits to adopting the new ISO standard if possible, so we passed on this solution.
- Evaluate and update each SQL statement that is expecting the Postgres date style format. This would mean digging through several hundred instances in code of this happening, and debugging, updating and testing each one. Needless to say, we continued to look for other options. :)
- the solution: We realized that each database project we've done has it's own "connect_db" routine, which occurs in a somewhat predictable place in the code line. If we updated all the old projects to include this SQL command in the connect routine:
set datestyle to 'Postgres' then
all applications would continue to perform as before. Since we almost all of projects with Perl/DBI, we were adding a line like this:
$dbh->do("set datestyle to 'Postgres'");
For new applications, there would be no need to this, since we could process and format the ISO style string to our tastes.
Since this step did not require the new database, we were able to do it in advance to save us time and trouble when were actually doing the upgrade.
Postgresql.org has detailed documentation on the differences with
the Postgres and ISO date formats.
from the change to the ISO date format.
Processing the dump file
We set up a instance of Postgres 7.0.3 on a test server, so that we could test dumping and importing into the new system before we tried it on our production server. I highly recommend this step because in our case, some errors in the dump files appeared to be specific to our own projects, so they would not have been documented elsewhere. :)
Dump the schema first
I found that most of the errors that occurred when trying to import the dump file were in the schema, not the data. So to save time in debugging process, I just dumped the schemas, and worked with importing that and debugging the errors. I used this command to the dump the schema:
pg_dumpall -cs >schema.sql
The "c" flag means "clean drop", it means that drop statements will be included. This will be useful when you are repeatedly trying to import the schema. :) the "s" flag means "schema only"
When I ran pg_dump, I got this error:
pq_recvbuf: unexpected EOF on client connection
However, as explained here, this is not something to worry about. The dump and import went fine regardless of this error.
Dump with the old pg_dump, import with the new psql
It's important to use the pg_dump that came with the older version of the database when exporting. It's also important to use the psql that came with the new database when importing. In both cases, it's a good idea to be doing this as the Unix user that owns the database files, probably "pgsql".
Trap the import output for debugging
I think the most useful system I developed for debugging the import process was to pipe the output and errors to a file for later review. I did that like this:
psql -e <schema.sql 2>&1 | tee schema-full-results.txt; grep ERROR schema-full-results.txt >schema-err-results.txt
What's happening here is that I'm actually sending the output to three places at once:
- The Screen, so that I can hit "Ctrl-C" to cancel as soon as there is an error
- schema-err-results.txt -- This file shows all the errors in the order that they occurred. I can browse it quickly to find any errors that I missed on the screen.
- schema-full-results.txt -- this is a complete version of the output.If I find an error I need to investigate, the first thing I do is to try to find it in this file to see exactly where it occurred in the schema.
So my debugging cycle would go like this: try the import, find an error in the error log, find the error in the full log to see where it occurred in the schema, edit that location in the schema, try the import again. (Of course, I made a backup of the schema dump before I started editing it!)
It's normal to get a lot of errors to the effect of "Does not exist" or "Already exits". These occur when you try to drop a table that has not yet been created, or the entity has already been successfully created.
Things that might need changing in the schema dump
Here's the list of things we ended up needing to change in the schema dump. In many cases, the easiest way to make change was using perl on the command line. This saves the time of opening, processing and saving a large file in a text editor:
perl -pi.bak -e 's/old/new/' schema.sql
That will change the first occurrence of "old" to "new" on every line
that it appears on. If you don't want perl to create a backup file with this
operation, remove the ".bak" from the command.
- s/text 'now'/'now'/ Somehow in our old schema we managed to have the default date defined as " text 'now' ". This just happened in a few cases. CURRENT_DATE and CURRENT_TIMESTAMP are even better to use than 'now', since they are part of the ANSI standard SQL
- s/postgres-6\.5/pgsql/ Our old installation was in /usr/local/postgres-6.5 and the new one was going to be in /usr/local/pgsql, so we needed to make this change. The path appeared where we had defined a custom function.
- s/onnect template1 reidhosp/onnect template1 postgres/ This is another situation I'm not sure how we got ourselves into-- the dump file was specifying to connect as a normal user, when super-user privilege was needed.
- Bad Indexes In a few cases I got errors about bad indexes, I think specifically where we had indexed the type "point". I dealt with these by simply deleting the create statements for the index-- they could be recreated correctly once the import is complete. Keep a log of any indexes you delete so that you can easily create them all later.
- s/"balance" money DEFAULT 0/"balance" money DEFAULT money\('0'\)/
In this case a money field had an integer as a default value, so it needed to be replaced with a default of type money. The money type is going away I believe, so the best solution here would be to not use the money type at all. :)
Notes on handling the import file
I followed much the same process for dumping and inserting the data as I did the schema. I used this command to export it:
pg_dumpall -a >inserts.sql
That dumps just the data for all the databases. Here's what I used to import it:
psql -e <inserts.sql 2>&1 | tee inserts-full-results.txt; grep ERROR inserts-full-results.txt >inserts-err-results.txt
There was only one change to the data that needed to be made in my (18 Meg) dump file. The old database user was named 'postgres' and the new database user was named 'pgsql'. I quickly updated all these entries with a good perl trick:
perl -pi -e 's/connect(.*)postgres/connect$1pgsql/g' inserts.sql
A note for BBEdit users: Here's why it doesn't work to edit Postgres data dump files in BBEdit. The dump files contain two styles of line breaks, and it's important to preserve this. For the file to be translated into BBEdit in a useful way, both the line-break styles get converted to a single style, and that information gets lost when you re-save the file-- although you can't easily tell this by looking at it in BBEdit! There may be a Mac workaround to get this to work with BBEdit-- my workaround was to use a Unix text editor when I needed to edit the file, which correctly preserved the line breaks.
Upgrading SQL incompatibilities
So far I've discovered a few places where the SQL statements we were running on Postgres 6.5.3 no longer work on 7.0.3:
- text_timespan is gone -- I believe this function served to convert text to the timespan format. The same thing can accomplished by using
'foo'::interval instead of text_timespan('foo')
- timespan_cmp is gone -- the timespan_cmp function has been removed. You should be able to replace calls to it with calls to
interval_cmp
- aggregates in insert statements must be explicit -- This won't work any more:
insert into t (c1) values (max(c1)). You have to explicitly give the table name in the aggregate now:
insert into t (c1) values (max(t.c1))
- '1-1-0' is no longer a valid date -- This was an old hack that we had used in place of negative infinity. I still can't get '-infinity'
to work with the
date type, so I used this new hack instead: '0001-01-01 BC'
Good luck!
Mark Stosberg