Writing No-Framework ASP.NET (part 1: encodings)

For the past week, I have been writing some ASP.NET code that I wanted to run on a Microsoft IIS server. Here I want to post what I learnt and what the structure of my code looked like.

First, my requirements.

  1. I didn’t want to code a full web application. All I needed to do was to add a few dynamic elements to otherwise static web pages written as HTML files.
  2. I wanted to modify the existing pages as little as possible.
  3. I didn’t want a solution that would require installing plugins to the IIS server. This meant that I couldn’t use PHP and that I had to code in ASP.NET.
  4. I didn’t want to learn a lot to do this. Having to use an IDE or having to learn a framework was completely out of the question.

These are really simple requirements. However, the solution turned out to be quite complicated, touching on everything from code reuse to character encoding. The following is an outline of what I will discuss.

  1. Understanding how ASP.NET handles source file encoding. (part 1)
  2. Basic ASP.NET web page (.aspx) structure. (part 2)
  3. Making Visual Basic function calls as terse as possible in the view code. (part 2)
  4. Ways to reuse code in ASP.NET. (part 3)

I’ll start with understanding source file encoding and then describe the others in separate articles.

The .aspx files

The simplest way to code ASP.NET web sites is to use ASP.NET web pages, that is, .aspx files. These are similar to PHP .php pages in that you can include both code and HTML markup in a single file.

The .aspx files are compiled before being run in IIS.

An important thing is that all strings in .NET are represented in Unicode. Strings that are not in Unicode are not allowed. This is very different from PHP. In PHP, a string is simply a sequence of bytes. It is up to the programmer to keep track of which charset each string is encoded in.

This means that any .aspx file that is not in Unicode has to be converted from its original charset to Unicode when it is compiled. This includes all the hard-coded HTML strings. Also, if the output from the server is going to be non-Unicode, the reverse conversion (internal Unicode to output encoding) has to happen as well.

Example of how encoding of an .aspx file would happen

Since a large number of websites in Japan still use Shift-JIS encoding, let’s assume that we are working with a Shift-JIS encoded website. Hence the source files to which we want to add dynamic ASP.NET code are in Shift-JIS. We also want the HTML output to be in Shift-JIS so that all web pages on the site (both static HTML pages and .aspx pages) have the same charset.

In this scenario, character encoding conversion of .aspx files would happen in the following manner:

  1. Shift-JIS encoded .aspx source files are converted to Unicode on compile.
  2. Code inside the .aspx source files is run as Unicode.
  3. Output is converted back to Shift-JIS.

With this in mind, let’s look at how to configure stuff to ensure that the encoding happens correctly.

ASP.NET configuration hierarchy

Before going into the charset configuration, I want to briefly touch on how ASP.NET web applications are configured. The configuration hierarchy is quite complex. Compare this to PHP where you basically have one php.ini file to configure all PHP instances, and the Apache .htaccess file where you can put additional settings. In both ASP.NET and PHP, you can additionally change settings inside the application (.aspx or .php).

However, even with ASP.NET’s complex hierarchy, you will probably only have to worry about the web.config file. You basically place a web.config file at any location in your web application’s file hierarchy, and that file will change the settings for that directory and any subdirectories. It works like Apache .htaccess.

Telling ASP.NET what charset the source code files (.aspx files) are encoded in

Encoding settings go in the globalization element of a web.config file. The charset of the source files is set with its fileEncoding attribute.
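
As a minimal sketch, assuming the Shift-JIS site from the example above (the value shift_jis is my assumption for illustration; use whatever encoding your files are actually saved in), such a web.config could look like this:

```xml
<?xml version="1.0"?>
<configuration>
  <system.web>
    <!-- fileEncoding: charset assumed for .aspx source files that have no BOM -->
    <globalization fileEncoding="shift_jis" />
  </system.web>
</configuration>
```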


  
    
  

If fileEncoding is not specified in the configuration hierarchy, the system encoding of the server is used. For machines running a Japanese OS, this would be Shift-JIS.

A list of possible encodings is provided by Microsoft. For Japanese text, we have “utf-8” (code page 65001), “shift_jis” (CP932: code page 932) and “EUC-JP” (code page 20932). The latter two are not pure Shift-JIS or pure EUC-JP but have extensions for Windows.

Now that I’ve talked about how to set the ASP.NET configuration for fileEncoding, let’s see how this affects charset conversion.

The rules are as follows:

  1. If the .aspx source code file has a Unicode BOM, then the file is considered to be in the Unicode encoding as described in the BOM.
  2. If there is no BOM, then the file is considered to be in the encoding as configured in the ASP.NET settings (i.e. web.config, etc.).

Telling ASP.NET what charset the HTML output should be encoded in

ASP.NET internally manages the .aspx file contents in Unicode, and converts them to the responseEncoding before it sends the response to the client browser. ASP.NET also sets the “Content-type: text/html; charset=???” HTTP header to responseEncoding.

ASP.NET however does not set the <meta charset=???> tag inside the HTML <head> element. You have to manage this yourself.

ASP.NET uses the following locations to set responseEncoding.

web.config
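
A minimal sketch of the web.config form, again assuming a Shift-JIS site (the value shift_jis is only an illustrative assumption):

```xml
<?xml version="1.0"?>
<configuration>
  <system.web>
    <!-- responseEncoding: charset of the HTML output and of the Content-Type header -->
    <globalization responseEncoding="shift_jis" />
  </system.web>
</configuration>
```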


  
    
  

The @ Page directive
.aspx files contain a @ Page directive to set page-specific attributes. You can set responseEncoding here with the following syntax:
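
As a sketch, a directive at the top of an .aspx file might look like the following. The Language attribute and the shift_jis value are my assumptions for a Visual Basic page on a Shift-JIS site:

```aspx
<%@ Page Language="VB" ResponseEncoding="shift_jis" %>
```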


The Page object
You can also set responseEncoding on the Page object directly in code.

Page.ResponseEncoding = "shift_jis"

Telling ASP.NET what charset the request parameters are encoded in

In addition to setting the charset of the source file (fileEncoding) and setting the charset of the HTML output (responseEncoding), ASP.NET has another charset that you can specify. That is the charset of the request (requestEncoding).

This setting affects how the query-string data and the data coming in from POST requests are interpreted by the ASP.NET server. You set this in web.config like so:
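
A minimal sketch, here with the value set to utf-8 (the value is only an illustration; whether it is right depends on how your clients actually encode their requests):

```xml
<?xml version="1.0"?>
<configuration>
  <system.web>
    <!-- requestEncoding: charset used to decode query strings and POST data -->
    <globalization requestEncoding="utf-8" />
  </system.web>
</configuration>
```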


  
    
  

The default value is “utf-8”.

The charset used by browsers to send queries and post data is a complicated issue. Ruby-on-Rails for example, adds an extra parameter to ensure that all data is in UTF-8.

Microsoft’s documentation suggests that requestEncoding should be set to the same charset as responseEncoding for single-server applications. Of course this depends on how ASP.NET servers work, but in general I don’t think this is a good idea. I think requestEncoding should be set to “utf-8” regardless of the responseEncoding (the charset of the HTML output), and this is also how Ruby-on-Rails does it.

Encoding settings for a Shift-JIS encoded website

Let’s go back to the settings that would be required if we were working on a Shift-JIS encoded website. The requirements are:

  1. .aspx source files are encoded in Shift-JIS.
  2. HTML output is in Shift-JIS.

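Then the web.config file should look like this:

<configuration>
  <system.web>
    <globalization fileEncoding="shift_jis" responseEncoding="shift_jis" requestEncoding="utf-8" />
  </system.web>
</configuration>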

I arbitrarily set requestEncoding to “utf-8”. This is how I would set up the system, but it really depends on how your server decodes requests. It does not affect the HTML output from your .aspx files.

Summary

This was the first of my series on working with ASP.NET. It dealt with how ASP.NET handles source file encoding. ASP.NET has to convert the whole source file (.aspx file) into Unicode even before any of the programmer’s code runs, and that is why you need a setting for the source encoding. This is also why you have to specify the output encoding.

PHP doesn’t meddle with string encodings in the source files. Conversions are performed on a per-function basis and the programmer is responsible for managing them. In practice, PHP programmers will convert request parameters and output, much as ASP.NET does. However, programmers will seldom touch the hard-coded HTML strings in the source files. The idea of converting the encoding of hard-coded HTML is quite surprising and it was a shock that ASP.NET does this in the background.

The ASP.NET way is not inherently a good or bad idea, but it can cause issues when you are simply adding dynamic content to a pre-existing website. You need to make sure that your settings align with the encodings and the workflows of your colleagues who might edit your .aspx files with various editors.

Coming from a web-development background, I certainly find the ASP.NET way alien.

Other articles in this series

  1. Understanding how ASP.NET handles source file encoding. (part 1)
  2. Basic ASP.NET web page (.aspx) structure. (part 2)
  3. Making Visual Basic function calls as terse as possible in the view code. (part 2)
  4. Ways to reuse code in ASP.NET. (part 3)