Sunday, 8 November 2009

Advanced url rewriting for the rest of us - how to create a custom transform in UrlRewriter.net

Introduction

At first glance url rewriting can be a daunting task, but today we will see that it doesn't have to be. Every developer wants clean urls for their apps, but because it's not built into the framework in a straightforward way it's usually left on the "nice to haves" list.

The .net framework does come with a few building blocks for url rewriting, such as HttpContext.Current.RewritePath(), but it is a simple mechanism and you are largely left to code your own features on top of it. Today we are going to look at using a 3rd party library, UrlRewriter.Net.

When asp.net 4.0 is released it is going to feature improved webforms support for Routing, a feature that was introduced in .net 3.5 sp1. It lives in the System.Web.Routing namespace, and while it is a pretty awesome feature it hasn't got seamless integration with webforms yet, so I will leave it for another day.

The scenario

Which dazzling feat of url rewriting are we going to attempt today? Well today we are looking at turning a nice friendly name (such as a car name) into a database ID and we are going to do that with one of the features of UrlRewriter.Net called a Transform.

We will take a url such as:

http://www.example.com/Cars/Ford_Fiesta.aspx

And behind the scenes translate this into a request that looks like:

http://www.example.com/ViewCar.aspx?CarId=10

When I say behind the scenes what I mean is that the user always sees the first url but the system is displaying the page listed in the second url.

Url Rewrites vs Redirects

This is the difference between doing it in the background and sending the user on to a different page.

Url Rewrite
  Explanation: Translates a clean url into your actual url in the background. The user only sees the clean url in the address bar of the browser.
  Common usage: To provide a clean url for dynamically generated content like blog posts, classified ads, articles, etc.

Url Redirect
  Explanation: Translates a clean url into your actual url and then sends the user to that page. The actual url is then shown in the address bar of the browser.
  Common usage: To redirect users from old content systems to new content systems, such as after a redesign or restructuring of a site. To provide an automatic alternative to content no longer available.

We aren't going to cover everything (it just feels like it)

Let me just say now that we aren't going to be covering everything in this article. I just want to present the juicy part of the url rewriting technique called a custom transform. In a real world project you are going to have to take extra steps.

For example I won't show how to actually look up the name in a database, just where you should be looking it up. That depends on the data access technology your project uses.

I won't be showing you how to set up extensionless urls either. Not all requests are mapped to the asp.net pipeline, so if your urls don't end in .aspx they won't get handled by asp.net on a live server. It will work on Cassini, the built in development server, so you will be able to test it. When you're ready for prime time with this feature you will need to set up something called wildcard mapping in IIS. Chris Love is my go-to guy for a tutorial on this:

I won't be covering how to generate a list of your products or adverts in a nice format. For this article we will just presume that a nice clean url has just appeared on your site and you need to process it.

As you can tell, this article would end up very long if I tried to cover everything!

Downloading and extracting the library

The download for UrlRewriter.net is over on SourceForge at:

You should get the file marked newest files which at the time of writing is UrlRewriterNet-1_8.zip.

When it's downloaded I recommend you make a folder outside of your project to extract it to. I keep all of my 3rd party libraries in a folder on my computer called D:\libraries\<projectname>.

It's handy for several reasons. The main one is that when you add a reference to a library an auto-refresh file is created alongside it. Each time you compile or deploy your website the latest version of the library is copied into your project. This means that if several projects all use the library they will get updated to the latest build automatically.

It can save space on your drive; you can easily link in a library without having to stop what you're doing and download it; and when you get a new version you can just drop it in. Convinced? Good!

Configuring your project to use the library

My preferred way to add a new reference is to right click on the project name in the solution explorer and click Add Reference. You can also do it by clicking the Website menu and choosing Add Reference.

After a small amount of resistance Visual Studio will chug through the list of available assemblies and show you a dialog box to pick from. At this stage UrlRewriter.Net isn't going to be in that list, so you have to click the Browse tab along the top and navigate to the location you extracted it to (D:\Libraries\UrlRewriterNet\ right?).

When you get to the folder you should pick the bin folder and then the Release folder. Select the assembly file called Intelligencia.UrlRewriter.dll so at this point you are looking at a path like this:

D:\libraries\UrlRewriterNet\UrlRewriterV1.8\bin\Release\Intelligencia.UrlRewriter.dll

Clicking the OK button will add a Bin folder if it didn't already exist and drop in the assembly.

Add the references in your web.config

At this point you are going to have to add two blocks to your web.config so that the project knows how to use the library.

First we have to add a custom config section. All of our rules will go inside their own section called <rewriter>, so we need to make sure asp.net understands how to read that section.

  1. Open up your web.config file.
  2. Find <configSections>
  3. Add the following <section> to it:
<section name="rewriter" requirePermission="false" 
type="Intelligencia.UrlRewriter.Configuration.RewriterConfigurationSectionHandler, Intelligencia.UrlRewriter" />

The second step is to register UrlRewriter as an HttpModule. This is so that it will be included in the pipeline when a new http request comes in.

  1. Open up web.config if it's not still open
  2. Find <system.web>
  3. Find <httpModules>
  4. Add the following <add> to it:
<add type="Intelligencia.UrlRewriter.RewriterHttpModule, Intelligencia.UrlRewriter" name="UrlRewriter" />
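Put together, the two additions sit in web.config roughly like this. This is only a skeleton: a real web.config will already contain plenty of other sections, and the empty <rewriter> element is just a placeholder for the rules. One thing worth knowing is that <configSections> has to be the first child of <configuration>.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- configSections must come before any other element -->
  <configSections>
    <section name="rewriter" requirePermission="false"
             type="Intelligencia.UrlRewriter.Configuration.RewriterConfigurationSectionHandler, Intelligencia.UrlRewriter" />
  </configSections>

  <system.web>
    <httpModules>
      <add type="Intelligencia.UrlRewriter.RewriterHttpModule, Intelligencia.UrlRewriter" name="UrlRewriter" />
    </httpModules>
  </system.web>

  <!-- Placeholder: the rewrite rules live in here -->
  <rewriter>
  </rewriter>
</configuration>
```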

A simple starter rule

Just to dip your toes in the water we are going to start off with a simple rewrite rule. The basic rewrite takes the form

<rewrite url="" to="" />

The url part is a regular expression. This means that certain special characters are reserved, so you have to be careful that what you put in here means what you think it does. Regular expressions are the true power behind the url rewriting engine. While they can look very complex at first glance, they give you an incredible amount of power in a very short string of text. I am going to delve a little deeper into regex syntax in the next section.

The second part of the rewrite rule is "to". This is where you define the url that you want to end up with (either redirected visibly or rewritten behind the scenes). This is a normal url so you can use the tilde ~ notation to signify the start of your url.

So for a working example let's say we wanted to hide an unfriendly url with parameters and replace it with a nice clean url.

<rewrite url="^~/Cars/Ford_Fiesta.aspx$" to="~/ViewCar.aspx?CarId=10" />

So what does that mean? If somebody types http://www.example.com/Cars/Ford_Fiesta.aspx into their web browser then behind the scenes the actual page loaded would be http://www.example.com/ViewCar.aspx?CarId=10. The visitor to the web page (and search engines) would only see the clean Ford_Fiesta.aspx url.
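To make "behind the scenes" a bit more concrete, here is a rough hand-written equivalent of this one rule using the built in RewritePath call mentioned in the introduction. Treat it as a sketch only: putting it in Global.asax and comparing against a single hard-coded path are my own assumptions for illustration, not how UrlRewriter.Net works internally.

```csharp
using System;
using System.Web;

// Code-behind for Global.asax (assumed location for this sketch)
public class Global : HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        HttpContext context = HttpContext.Current;

        // App-relative path of the incoming request, e.g. "~/Cars/Ford_Fiesta.aspx"
        string path = context.Request.AppRelativeCurrentExecutionFilePath;

        if (string.Equals(path, "~/Cars/Ford_Fiesta.aspx", StringComparison.OrdinalIgnoreCase))
        {
            // The browser keeps showing the clean url; asp.net serves ViewCar.aspx
            context.RewritePath("~/ViewCar.aspx?CarId=10");
        }
    }
}
```

You can see why this gets old quickly: every new url means another if statement, which is exactly the bookkeeping the library's rules take off your hands.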

In a real system we would probably want to use a more complex rewrite rule. Instead of making a new <rewrite> for every type of CarId in the system we could make a rewrite rule which could automatically match up the page names to the name of the car stored in the database.

There are two common ways to do this. One way is that you would embed an id at the start and then provide a clean url after it, something like:

http://www.example.com/Cars/10-Ford_Fiesta.aspx

The 10 at the start is all the system really uses. Everything else is discarded and the 10 is passed in as the CarId on the ViewCar.aspx page. But for the ultimate in clean urls we are going to aim for a url rewriting mechanism that doesn't have to embed the CarId into the url at all.

In the next section we will do that using a custom transform which will turn the car name into a CarId.
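For comparison, the id-prefix approach described above needs no custom code at all. A rule along these lines should do it; the exact pattern is my own sketch and the comments explain each part.

```xml
<rewriter>
  <!-- (\d+) captures the leading id, e.g. the 10 in 10-Ford_Fiesta.aspx -->
  <!-- The name after the dash is matched but never used -->
  <rewrite url="^~/Cars/(\d+)-[\w-]+\.aspx$" to="~/ViewCar.aspx?CarId=$1" />
</rewriter>
```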

An advanced url rewriting scenario with Custom Transforms

We've got past all the introduction and configuration stuff now and we're into the main part of the article - using a custom transform with UrlRewriter.Net.

The rule looks like this:

<rewriter>
   <rewrite url="^~/Cars/([\w-_]+).aspx" 
    to="~/ViewCar.aspx?CarId=${CarNameToIdTransform($1)}" />
</rewriter>  

So let's take a look in more detail at how it is constructed. The first attribute "url" is the url that is typed into the address bar of the browser. As we said in the last section, it is a regular expression aka a regex.

If you don't have any experience with regex then you might be intimidated by the cryptic language it uses but once you start to use them you will see how powerful they are at expressing text manipulation rules.

If you recall back to the start of this article you'll remember that this particular rule is trying to match urls like the following:

http://www.example.com/Cars/Ford_Fiesta.aspx

And ^~/Cars/([\w-_]+).aspx achieves this by saying:

^
  Match the start of the url; nothing can come before it.

~
  Map to the root of the application (the normal meaning of the tilde in asp.net urls).

/Cars/
  Match the text /Cars/

([\w-_]+)
  Match any letter, digit, dash or underscore (the [\w-_]) and match 1 or more of them (the +). By putting this part of the pattern in brackets you are signifying that it will be grouped together as a match group in the regex results.

.aspx
  Match the text .aspx. Strictly speaking an unescaped dot matches any single character, and only a trailing $ would force the url to end here, so \.aspx$ is the stricter way to write it.

The explanation above mentions that the brackets have a special meaning. Anything that is matched by the pattern contained within these brackets becomes the text that is inserted at the place we put the $1 in the "to" part of the rewrite.

So in this case of the Ford Fiesta example the $1 would contain Ford_Fiesta. This is then passed into the custom transform for further processing. The word Ford_Fiesta is cleaned up and searched for within the database of cars. If a match is found then the CarId is returned.
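If you want to poke at the pattern outside of a website, a few lines of plain C# will do. This is a sketch with a few assumptions of mine: the class and method names are made up, the application is assumed to sit at the site root (so the ~ is simply left out), and I have used the slightly stricter form of the pattern with an escaped dot and a trailing $. [\w-] matches the same characters as [\w-_] because \w already includes the underscore.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CarUrlDemo
{
    // Stricter variant of the rule's pattern, without the ~ prefix
    private static readonly Regex CarUrl = new Regex(@"^/Cars/([\w-]+)\.aspx$");

    // Returns the captured car name (what $1 would hold), or null if there is no match
    public static string ExtractCarName(string path)
    {
        Match match = CarUrl.Match(path);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Calling CarUrlDemo.ExtractCarName("/Cars/Ford_Fiesta.aspx") gives back Ford_Fiesta, while a path containing a space such as "/Cars/Ford Fiesta.aspx" returns null because a space is not in the allowed character set.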

The custom transform class

Without further ado I present to you the custom transform class:

using Intelligencia.UrlRewriter;
using System.Web;

namespace CustomTransforms
{
    /// <summary>
    /// Convert a car name into its ID value
    /// </summary>
    public class CarNameToIdTransform : IRewriteTransform
    {
        /// <summary>
        /// The name you reference this custom transform by
        /// </summary>
        public string Name
        {
            get
            {
                return "CarNameToIdTransform";
            }
        }

        /// <summary>
        /// Apply the transform
        /// </summary>
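        /// <param name="input">The text captured from the url, e.g. Ford_Fiesta</param>
        /// <returns>The matching car id as a string, or "0" if no car was found</returns>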
        public string ApplyTransform(string input)
        {
            // Clean up the input
            input = input.Replace("_", " ");
            input = HttpUtility.UrlDecode(input);

            // Look up the car id from a data store
            int CarId = ConvertCarNameToCarId(input);

            // return the value back to the URL
            return CarId.ToString();
        }

        public int ConvertCarNameToCarId(string CarName)
        {
            int CarId = 0;

            // you would do a real lookup here if this was a production app
            if (CarName.Equals("Ford Fiesta"))
            {
                CarId = 1234;
            }

            return CarId;
        }
    }
}

This is a simple class which implements the IRewriteTransform interface. You should put it in a file of the same name in your App_Code folder.

The IRewriteTransform interface comprises a public property and a method. The public property returns the name that you refer to the transform by in your rewrite rules. The method (called ApplyTransform) gives you a string input, which is the portion of the url that you are supposed to apply a transformation to. In this case we will be taking the url encoded car name and turning it into a car id.

You can see I have set up one extra method called ConvertCarNameToCarId to handle the actual conversion. After we have cleaned up the input I pass it to this method for conversion. In your production app this is where you would do a real lookup against a database.
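As a halfway house between the hard-coded if statement and a real database call, the lookup could be sketched with an in-memory dictionary. Everything here is illustrative: the CarLookup class is a hypothetical stand-in for your data access code, and the case-insensitive comparison is my own choice so that ford_fiesta and Ford_Fiesta resolve to the same car.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a real data store
public static class CarLookup
{
    private static readonly Dictionary<string, int> CarIdsByName =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
        {
            { "Ford Fiesta", 1234 }
        };

    // Returns the car id for a cleaned-up car name, or 0 when the name is unknown
    public static int ConvertCarNameToCarId(string carName)
    {
        int carId;
        return CarIdsByName.TryGetValue(carName, out carId) ? carId : 0;
    }
}
```

Returning 0 for an unknown name mirrors the class above, so ViewCar.aspx needs to cope with an id that doesn't exist.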

Integrating the custom transform with UrlRewriter.net

Great! We nearly have a working model now. All that's left is the configuration of our new custom transform.

We are going to make some changes to the <rewriter> element in your web.config so that

  1. The custom transform is loaded in for use in url rewrites
  2. The url rewriting rule we dissected earlier is configured to use the custom transform

The complete <rewriter> section (which should be placed just after your closing tag </system.web>) looks like this:

<rewriter>
  <register transform="CustomTransforms.CarNameToIdTransform, App_Code" />
  <rewrite url="^~/Cars/([\w-_]+).aspx" to="~/ViewCar.aspx?CarId=${CarNameToIdTransform($1)}" />
</rewriter>

The <register> tag takes the format of the name of the class including the namespace and the assembly that you are loading it from. Because this is part of the same project you can use App_Code. I will go into this in a little more detail before the end of this article.

We have already covered a lot of the syntax for the <rewrite> so I just want to explain the syntax of the to section and how the data is passed in from the url attribute.

The ${CarNameToIdTransform()} section should match the name that you set up in your public property called Name. The $1 matches the group you defined in your regex ([\w-_]+).

You can add in as many custom rewrite rules as you want. In fact, in large projects this section can get quite lengthy, especially if you have to set up a number of static redirects to forward old page urls to new page urls.
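A static redirect of that kind uses the library's <redirect> element, which takes the same url and to attributes as <rewrite>. The page names below are hypothetical; they just stand in for an old url and its replacement.

```xml
<rewriter>
  <!-- Hypothetical old page from a previous version of the site -->
  <redirect url="^~/Showroom\.aspx$" to="~/CarList.aspx" />
</rewriter>
```

Unlike a rewrite, the visitor ends up with the new url in their address bar.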

The built in transforms

There are a total of 6 pre-built transforms which come included with UrlRewriter.net. They are for url encoding and decoding, converting case, encoding and decoding base64.
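They are called in the same ${name(value)} style as our custom transform. As a sketch, and assuming the lower case transform is registered under the name lower (check this against the documentation), a rule that lower-cases the captured name might look like this. The Name query string parameter is made up for the example.

```xml
<rewriter>
  <!-- Assumes the built in lower case transform is named "lower" -->
  <rewrite url="^~/Cars/([\w-]+)\.aspx$" to="~/ViewCar.aspx?Name=${lower($1)}" />
</rewriter>
```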

The documentation for them all is available here:

Using App_Code for your assembly

I wanted to point out that using App_Code as the assembly name is actually a good tip for you to remember because it will come in handy in other situations as well. You can get away with using App_Code because internally asp.net compiles the folder into an assembly at run time.

Like I have said, this is important to know because this knowledge isn't on the UrlRewriter.net site or its support forums. The first time that I tried to create a custom rewrite I didn't know this. I was also using Visual Web Developer. This meant that I had to go off and download Visual C# Express, learn about making class libraries, link the class library into my project and then figure out how to debug the whole thing. In fact I didn't find this knowledge out until several months later when I was reading a different article and had an AHA! moment.
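To show the two options side by side, here is the registration for a transform living in App_Code next to one living in a separate class library. The assembly name MyCompany.Transforms is purely hypothetical; you would use the name of your own dll, and only one of the two lines.

```xml
<rewriter>
  <!-- Class lives in the website's App_Code folder -->
  <register transform="CustomTransforms.CarNameToIdTransform, App_Code" />

  <!-- Alternative: class lives in a separate class library (hypothetical assembly name) -->
  <!-- <register transform="CustomTransforms.CarNameToIdTransform, MyCompany.Transforms" /> -->
</rewriter>
```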

Demo Application

A demo application containing a working example of the code from this article can be downloaded here:

Conclusion

We have come on quite a journey in this article, and if you hadn't come across any of this before then it should have touched on quite a few interesting topics:

  • Learning about Url Rewriting
  • Learning about Regular Expressions
  • Realising this custom transform feature exists in UrlRewriter.net
  • Seeing a real world example so that you can internalise it as one of your problem solving tools
  • Learning about other built in transforms
  • Learning about App_Code as an assembly


14 comments:

Suresh said...

Does not work, I get a 404 error, page not found.

rtpHarry said...

@Suresh: The main site has been down for a couple of weeks unfortunately and looks like it might not be coming back. The project is still available on the sourceforge page at http://sourceforge.net/projects/urlrewriter/files/

Rasaiyah said...

Hi rptHarry,
How are you? I got this link from the answer to my question on the asp.net forums. Thank you once again for your great help. In my case every slug comes from the database, so how can I call my method from web.config to return the url and attach it to the mapping url?

I'm also trying to contact you. Thank you.

hs_jha said...

rpt Harry ! Yes !
I stumbled on your blog just now...This is the topic i was searching for.

My strategy for page url rewriting is declaring a separate slug field in db for title of the page.

But images are not displaying ...

Could you help on dat ?

Anonymous said...

I love how Apache can handle this in 5 or 6 short lines of code in a small little tiny winy .htaccess file.

waqas said...

nice article, can you please write an article about url routing in .net 4.0, how to accomplish this functionality in .net 4.0

thanks

Lenen said...

Regular expressions are really difficult to understand, but i'm glad you touched the subject. I will find more blogs about these.

rtpHarry said...

@apache .htaccess comment: I am not a hater of apache. I think you have misunderstood this article; the htaccess is equal to the regex in this iis based tutorial. The other side of the article is taking the text phrase and looking it up in the database to turn it into an id. This is not provided by .htaccess to my knowledge?

rtpHarry said...

@waqas: it is on my list of things to investigate but I haven't used this particular technology yet so I don't have anything valuable to contribute! :)

Yah000000 said...

Got the butter man.!! thank you so much..

Fayas said...

I applied this rewriting concept to my project. The rewriting was fine but my style sheets and images are not getting loaded. What could be the problem?

rtpHarry said...

@Fayas: Yes this is a somewhat common occurrence and is touched on in Scott Gu's url rewriting article. Scroll down to the last section "Handling CSS and Image Reference Correctly". Basically you need to root qualify your css and image urls as in /folder/image.jpg not folder/image.jpg. There is an additional gotcha that might occur with App_Themes urls that HeartAttack covers in his article.

Unknown said...

1 question on the regex part.
Lets say I want to allow default.aspx to work.

For example, Cars/Ford.aspx will rewrite, but Cars/default.aspx will have a general cars page and not redirect anywhere.

Thank you,

Anonymous said...

I hope someone can help me with my problem using UrlRewriter. After adding the configuration settings to the web.config file and trying to run the page, it throws an error, something like "Microsoft JScript: BBD is undefine". It seems to break the javascript or the ScriptManager. What do I need to add or change to resolve this problem?