Hyperlinks need to live forever - Blog edition
THE bummer mistake in any web revamp is a total disregard for page addresses. The most you'll find is a nice 404 page with a notice that things have been revamped and an invitation to search. What a waste of human time and disregard for a site's users!
The links to the original pages live outside the site's control, and Jacob already stated in 1998: "Pages need to live forever". So what can you do when swapping blog platforms?
If your new platform runs behind an Apache HTTP Server (IBM's IHS is based on it), there is mod_rewrite, which lets you rewrite incoming addresses (the old links) into the new destinations based on a pattern match (other HTTP servers have similar functions, but that's a story for another time).
HTTP knows two classic redirection codes: 302 (temporary) and 301 (permanent). You want to use the latter, so that at least the search engines update their links.
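To see what a permanent redirect looks like from the client side, here is a minimal sketch using only the Python standard library. It is a stand-in, not the Apache setup: a tiny local server answers every request with status 301 and a Location header, and `probe` fetches one old link without following the redirect. The target path is the hypothetical example from this post, hard-coded for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical target; in the real setup mod_rewrite computes this per request.
NEW_LOCATION = "/blog/2001/10/is-this-on.html"


class PermanentRedirectHandler(BaseHTTPRequestHandler):
    """Answers every GET with 301 Moved Permanently and a Location header."""

    def do_GET(self):
        self.send_response(301)  # 302 would tell clients the move is temporary
        self.send_header("Location", NEW_LOCATION)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def probe(path):
    """Serve on a free local port, request `path`, return (status, Location)."""
    server = HTTPServer(("127.0.0.1", 0), PermanentRedirectHandler)
    worker = threading.Thread(target=server.serve_forever, daemon=True)
    worker.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        conn.request("GET", path)
        response = conn.getresponse()  # http.client does not follow redirects
        response.read()
        conn.close()
        return response.status, response.getheader("Location")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(probe("/myblog.nsf/d6plinks/ABCDEF"))
```

The only difference between the two codes on the wire is the status number; a crawler that sees 301 is expected to replace the old address in its index, while with 302 it keeps the old one.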
Now your new URL pattern most likely uses a different structure than the old one, so a simple regex might not help with that transition. E.g. your existing format might be
/myblog.nsf/d6plinks/ABCDEF
while the new pattern would be /blog/2001/10/is-this-on.html.
For this case mod_rewrite provides RewriteMap, where you can use your old value (ABCDEF in our case) to look up the new URL. Unfortunately, mod_rewrite is very close to dark magic. A map can be anything from a simple key/value lookup to an external program that computes the result. For the key/value lookup you need to make your key case-insensitive, so all the possible case variations work. This is what I figured out:
RewriteEngine on
RewriteMap lowercase int:tolower
RewriteMap blog-map dbm:/var/www/blogmap.map
RewriteRule ^/myblog.nsf/d6plinks/(.*) /blog/${blog-map:${lowercase:$1}} [NC,R=301,L]
Let me pick that apart for you:
- RewriteEngine on
This switches the rewrite engine on. It requires that mod_rewrite is loaded (check your server documentation for that).
- RewriteMap lowercase int:tolower
This enables an internal conversion of the incoming string into its lower case form.
- RewriteMap blog-map dbm:/var/www/blogmap.map
This defines the actual lookup. The simplest case would be a text file with the key and result on one line, separated by a space. However, that might not perform well enough for larger numbers of links, so I chose an indexed (DBM) format. It is very easy to create, since the tool is included in the Apache install. I generated my translation list as a text file and then invoked
httxt2dbm -v -i /var/www/blogmap.txt -o /var/www/blogmap.map
and the indexed file is created/updated.
- RewriteRule ^/myblog.nsf/d6plinks/(.*) /blog/${blog-map:${lowercase:$1}} [NC,R=301,L]
This is the rewrite rule with a nested set of parameters that first converts the key to lower case and then looks up the new URL. If a key isn't found, the lookup yields an empty string and the visitor is redirected to /blog/, which suits my needs; you might want to handle things differently.
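The text file fed to httxt2dbm has one "key value" pair per line. A detail that is easy to miss: because the rule looks keys up through ${lowercase:$1}, the keys in the file must be lower case too, or they will never match. Below is a minimal Python sketch that writes such a file, assuming a hypothetical export `OLD_TO_NEW` from the old blog (old d6plinks key to new path relative to /blog/, since the rule prepends "/blog/" itself). Since keys and values are whitespace-separated, the sketch rejects entries containing spaces, and it refuses keys that only differ by case but point to different posts.

```python
# Hypothetical export from the old blog: original d6plinks key -> new path
# relative to /blog/ (the RewriteRule prepends "/blog/" itself).
OLD_TO_NEW = {
    "ABCDEF": "2001/10/is-this-on.html",
}


def map_lines(mapping):
    """Yield 'key value' lines for a RewriteMap text file.

    Keys are lower-cased because the rule looks them up via
    ${lowercase:$1}; mixed-case keys in the file would never match.
    """
    entries = {}
    for old_key, new_path in mapping.items():
        if any(ch.isspace() for ch in old_key + new_path):
            raise ValueError(f"whitespace not allowed: {old_key!r} -> {new_path!r}")
        key = old_key.lower()
        if key in entries and entries[key] != new_path:
            raise ValueError(f"keys collide after lower-casing: {key!r}")
        entries[key] = new_path
    for key in sorted(entries):
        yield f"{key} {entries[key]}"


def write_map(mapping, filename):
    """Write the text map that httxt2dbm turns into the indexed file."""
    with open(filename, "w", encoding="utf-8") as out:
        for line in map_lines(mapping):
            out.write(line + "\n")


if __name__ == "__main__":
    for line in map_lines(OLD_TO_NEW):
        print(line)
```

In practice you would call write_map(OLD_TO_NEW, "/var/www/blogmap.txt") and then run the httxt2dbm command shown above to rebuild the indexed map.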
In detail:
- ^/myblog.nsf/d6plinks/(.*) matches all links inside d6plinks; the () "captures" ABCDEF (from our example), so it can be used as $1
- ${lowercase:$1} converts ABCDEF into abcdef
- ${blog-map: ... } finally looks it up in the map file
- [NC,R=301,L] are the flags governing the execution of the rewrite rule:
- NC stands for NoCase. It allows matching /MyBlog.nsf/, /MYBLOG.NSF/, /myblog.NSF/ etc. It doesn't, however, convert the string, which is why the lowercase map is needed for the lookup
- R=301 issues a permanent redirect response (the default is 302, temporary)
- L stops the evaluation of further rewrite rules
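To check the logic of the rule before touching the server, it can be emulated in a few lines of Python. This is only a model of the steps listed above, not mod_rewrite itself: a plain dict `BLOG_MAP` (hypothetical content) stands in for the DBM file, re.IGNORECASE plays the role of [NC], and a missing key yields an empty string, so the visitor lands on /blog/.

```python
import re

# Hypothetical stand-in for /var/www/blogmap.map; keys are already lower case.
BLOG_MAP = {"abcdef": "2001/10/is-this-on.html"}

# Same pattern as the RewriteRule; re.IGNORECASE mirrors the [NC] flag.
OLD_LINK = re.compile(r"^/myblog.nsf/d6plinks/(.*)", re.IGNORECASE)


def rewrite(path):
    """Return (status, location) for an old link, or None if the rule doesn't match."""
    match = OLD_LINK.match(path)
    if match is None:
        return None  # rule not applied; the request passes through unchanged
    key = match.group(1).lower()    # ${lowercase:$1}
    target = BLOG_MAP.get(key, "")  # ${blog-map:...}; a miss yields ""
    return 301, "/blog/" + target   # [R=301,L]: permanent redirect, stop here


if __name__ == "__main__":
    for old in ("/myblog.nsf/d6plinks/ABCDEF",
                "/MyBlog.NSF/d6plinks/abcdef",
                "/myblog.nsf/d6plinks/UNKNOWN"):
        print(old, "->", rewrite(old))
```

Running a list of your real old links through a model like this is a cheap way to spot keys missing from the map before they silently end up on /blog/.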
Posted by Stephan H Wissel on 05 December 2012 | Comments (4) | categories: Blog Software WebDevelopment