Writing No-Framework ASP.NET (part 2: code structure)

Basic ASP.NET web page structure

This is the second article in my series on writing a simple ASP.NET web page that resembles a simple PHP file. My first article discussed how encodings are handled in ASP.NET.

In this second article, I will discuss the code structure.

The Single-File Page Model structure

Microsoft describes the single-file page model as one where the page’s markup and its programming code are in the same physical .aspx file. This is exactly the same model that ASP.NET’s predecessor, classic ASP, was built upon, and it is also the model that PHP uses.

In my first article, I described my requirements as being quite simple. All I needed to do was to add simple functionality to a pre-existing web page (static HTML file). The single-file page model was the obvious place to start.

The following is a bird’s-eye view of Microsoft’s example;

<script runat="server">
  ' Controller code to set the text on Label1
  ' in response to a click on Button1
</script>

  ' View code in plain HTML with tags indicating the locations
  ' of Label1 and Button1

Let’s see how I added the functionality that I required onto this structure.

Encoding links easily

My objective is to create a helper function that generates url-encoded links.

Suppose I want to have links to a query results page. The query that we want to send is;

reactivity: "Bovine（ウシ）"

Using an online url encoder, we can see that “Bovine（ウシ）” should be url encoded to “Bovine%ef%bc%88%e3%82%a6%e3%82%b7%ef%bc%89” (if we use UTF-8). Note that the parentheses are full-width characters, which is why they are encoded too. Hence the link should look like the following;

<a href="query.php?reactivity=Bovine%ef%bc%88%e3%82%a6%e3%82%b7%ef%bc%89">Bovine（ウシ）</a>

We could always use the online url encoder and then paste the results into a static HTML file. However, the file would be difficult to manage because a url-encoded URL is incomprehensible to human beings.
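The encoding can also be checked locally instead of with an online tool. The following is a minimal sketch in Ruby, used here purely as a cross-check outside ASP.NET. It assumes Ruby 1.9.2 or later, where the standard library provides URI.encode_www_form_component. The full-width parentheses are written as \uFF08 and \uFF09 escapes so that they cannot be mistaken for ASCII parentheses.

```ruby
require 'uri'

# The query value from the article: "Bovine" followed by ウシ wrapped in
# full-width parentheses (U+FF08 and U+FF09).
reactivity = "Bovine\uFF08ウシ\uFF09"

# Percent-encode the UTF-8 bytes of every character that is not URL-safe.
encoded = URI.encode_www_form_component(reactivity)

puts encoded
```

Ruby emits upper-case hex digits, whereas the link above uses lower-case ones. Hex digits in percent-encoding are case-insensitive, so the two forms are equivalent.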

What we need is a link function that will take non-ASCII arguments and create the url encoded link for us.

The code that I came up with is shown here;

<script runat="server">

  Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs)
    ' initialization stuff to do immediately after page load
  End Sub

  ' URL-encode a string as UTF-8
  Private Function urlEncodeUtf8(myString As String) As String
    urlEncodeUtf8 = HttpUtility.UrlEncode(myString, New System.Text.UTF8Encoding)
  End Function

  ' Build an <a> tag whose query string consists of url-encoded name/value pairs.
  ' query is a two-dimensional array: query(i, 0) is the name, query(i, 1) the value.
  Public Function link(label As String, endpoint As String, query As String(,)) As String
    Dim tuples As New ArrayList()
    Dim i As Integer
    For i = LBound(query) To UBound(query)
      tuples.Add(query(i, 0) & "=" & urlEncodeUtf8(query(i, 1)))
    Next
    Dim href As String = "http://api.example.com/api.php?ep=" & endpoint & "&" & Join(tuples.ToArray(), "&")
    link = "<a href=""" & href & """>" & label & "</a>"
  End Function

</script>




'...


'...

'...


Visual Basic syntax itself is quite simple and easy to grasp. However, without an ASP.NET or Visual Basic background, there were quite a few concepts that I initially had difficulty with.

Code/Render blocks in ASP.NET

In PHP, you have <?php ?> tags that you can put anywhere in the page. Inside these tags, you place PHP code. There is no limit to what code you can insert and all the tags function identically. However, this is not the case for ASP.NET. ASP.NET .aspx files have two distinct kinds of blocks, called code blocks and render blocks. These two are intended to be used in different ways.

The code blocks are the regions surrounded by <script runat="server"> tags. Here you define global variables and functions. On the other hand, you cannot directly write code that will be executed; you can only write definitions. If you want to write code that will be executed, you have to write it inside the Page_Load function which is the event handler that is called immediately after the page is loaded.

Render blocks are surrounded by <% %> or <%= %> tags and are executed when the page is rendered. At first glance, they look exactly like the tags in classic ASP or the <?php ?> tags in PHP; you embed code inside HTML. There is a big difference though. You cannot declare functions or subroutines inside these render blocks. You can however declare variables.

So basically, you define your functions in the code blocks and render your results in render blocks. I suppose the idea is to make sure that you don’t mix too much code with your HTML.

In my example, we defined all our functions (Page_Load, urlEncodeUtf8 and link) at the top of the page inside a code block (<script>). In the render block (<%= %>), we simply called the link function.

Passing complex data as arguments in Visual Basic

For the link function, we want to pass the query as a data structure (not a string). In PHP, the natural choice would be to use associative arrays. Hence the syntax for calling the link function would look like this;

<?php echo link("Bovine（ウシ）", "endpoint.php",
                                 array("type" => "Whole IgG",
                                       "reactivity" => "Bovine（ウシ）",
                                       "title" => "Bovine Whole IgG secondary-antibodies")) ?>

PHP 5.4 has added a short array syntax which makes it even more convenient to write the arguments for this function;

<?php echo link("Bovine（ウシ）", "endpoint.php",
                                 ["type" => "Whole IgG",
                                  "reactivity" => "Bovine（ウシ）",
                                  "title" => "Bovine Whole IgG secondary-antibodies"]) ?>

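Ruby also has a simple syntax for associative arrays or hashes;

<%= link "Bovine（ウシ）", "endpoint.php",
                                 "type" => "Whole IgG",
                                 "reactivity" => "Bovine（ウシ）",
                                 "title" => "Bovine Whole IgG secondary-antibodies" %>

Ruby 1.9 took this one step further and made it this simple;

<%= link "Bovine（ウシ）", "endpoint.php",
                                 type: "Whole IgG",
                                 reactivity: "Bovine（ウシ）",
                                 title: "Bovine Whole IgG secondary-antibodies" %>
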
I am emphasizing the terseness of the argument syntax because this code is going to sit inside the HTML. Verbose code would completely interrupt the HTML and make it harder to understand. Hence if we are going to insert Visual Basic code into the view at all, we should try to make it simple.
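For comparison, here is a minimal sketch of what a Ruby counterpart of the link function might look like when it receives the query as a hash. It is an illustration only: the method mirrors the Visual Basic version in this article, the api.example.com URL is the same placeholder, and the ASCII-only label is chosen just to keep the output easy to read.

```ruby
require 'uri'

# Hypothetical Ruby counterpart of the article's link function.
# query is a hash of parameter names to (unencoded) values.
def link(label, endpoint, query)
  pairs = query.map do |name, value|
    "#{name}=#{URI.encode_www_form_component(value)}"
  end
  href = "http://api.example.com/api.php?ep=#{endpoint}&#{pairs.join('&')}"
  %(<a href="#{href}">#{label}</a>)
end

# The trailing key/value pairs are collected into a single hash argument,
# which is what keeps the call site short.
puts link("Bovine", "endpoint.php",
          "type" => "Whole IgG",
          "reactivity" => "Bovine")
```

The call site stays close to plain data, which is the property the Visual Basic version is trying to approximate.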

Now Visual Basic has collections like ArrayList, Hashtable, SortedList, NameValueCollection and others. A Hashtable or a NameValueCollection would be ideal for the arguments that we want to pass. In fact, the Request.QueryString property that is used to retrieve GET parameters returns a NameValueCollection.

The problem is that the NameValueCollection and all the other collections do not have a terse syntax for populating them with values. There apparently is a new From keyword available from .NET Framework 4 that addresses this, but it doesn’t seem to be widely used.

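If we were to use a NameValueCollection, the traditional syntax (without “From”) would be like this;

<%
  Dim query As New NameValueCollection()
  query.Add("type", "Whole IgG")
  query.Add("reactivity", "Bovine（ウシ）")
  query.Add("title", "Bovine Whole IgG secondary-antibodies")
%>
<%= link("Bovine（ウシ）", "endpoint.php", query) %>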

This is totally ridiculous.

The exception is a Visual Basic Array. Visual Basic provides a relatively terse syntax to initialize arrays.

Dim myArray() As String = {"first_element", "second_element"}

and you can do multi-dimensional arrays like so;

Dim myArray(,) As String = {{"1-1", "1-2"}, {"2-1", "2-2"}}

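Because the only terse way was to use multi-dimensional arrays, we ended up using the following argument structure for the link function.

<%= link("Bovine（ウシ）", "endpoint.php", New String(,) {{"type", "Whole IgG"}, {"reactivity", "Bovine（ウシ）"}, {"title", "Bovine Whole IgG secondary-antibodies"}}) %>
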
To sum up, we gave up on the more desirable collections like NameValueCollection because it would be ridiculously verbose. Instead we used multi-dimensional arrays. I would say that the result is pretty OK, but it is troubling how Visual Basic seems to be indifferent to making code concise.

Solution for .NET 4.0

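If we are on .NET 4.0, we can use the From keyword to use collections and still keep the code concise (official documentation);

<%= link("Bovine（ウシ）", "endpoint.php", New NameValueCollection From {{"type", "Whole IgG"}, {"reactivity", "Bovine（ウシ）"}, {"title", "Bovine Whole IgG secondary-antibodies"}}) %>
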
This is much better, but unfortunately doesn’t work on .NET 2.0. It’s unfortunate that Microsoft doesn’t promote this more in their documentation for the collection objects, because I think it’s a very important feature. I was initially bewildered that the NameValueCollection class couldn’t take a Visual Basic Array as an argument to its constructor, but I suppose that was due to compatibility concerns with C#. The corresponding syntax in C# is simpler and actually looks as if we are passing an array to the constructor, although that isn’t what is happening behind the scenes.

Summary

In this article, I showed how we used the single-file page structure to add simple code to a web page.

Although this was a very simple exercise, we learnt that we can’t place functions anywhere we want in ASP.NET. We have to place them inside a code block. We also learnt that sending complex arguments to a function can be a bit difficult.

In the next article, I will show how to reuse code for the link function in other pages, which was again quite a surprise for me.

Other articles in this series

  1. Understanding how ASP.NET handles source file encoding. (part 1)
  2. Basic ASP.NET web page (.aspx) structure. (part 2)
  3. Making Visual Basic function calls as terse as possible in the view code. (part 2)
  4. Ways to reuse code in ASP.NET. (part 3)

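UTF-8 and writing XML in Ruby

Update: There are services such as Twitterfeed that automatically post from RSS to Twitter, but they apparently fail to do item 2 below (“leave UTF-8 as-is in the XML”), and this results in garbled text. Web browsers correctly convert character references such as “&#26085;” before displaying them, but most Twitter clients do not perform this conversion.

I spent a whole day struggling with this. I finally managed to solve it, so I am recording the solution here.

What I wanted to do

  1. Write UTF-8 data out to XML.
  2. Leave UTF-8 as-is in the XML file. For example, the conversion “日本語” => “&#26085;&#26412;&#35486;” should not happen.
  3. Because I want to write large XML files, write to the file a little at a time instead of accumulating all of the XML in memory before writing it.

None of this is particularly unusual, so I expected it to be easy, but it turned out to be quite hard. It may be somewhat easier in Ruby 1.9, but it is certainly hard in Ruby 1.8.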
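To make the three requirements concrete, here is a minimal sketch that uses only the Ruby standard library (Ruby 1.9 or later). It is not how any of the libraries discussed below work internally. The helper names escape_xml_text and write_products, and the <products> root element, are made up for this illustration. Only the three characters that must be escaped in XML text are replaced, so UTF-8 text passes through untouched, and each element is written to the IO object as soon as it is produced.

```ruby
require 'stringio'

# Only these characters need escaping in XML text nodes.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Escape markup characters but leave non-ASCII characters as they are.
def escape_xml_text(text)
  text.gsub(/[&<>]/, XML_ESCAPES)
end

# Write one <product> element per name, directly to the IO object,
# so the whole document never has to be held in memory.
def write_products(io, names)
  io.write(%(<?xml version="1.0" encoding="UTF-8"?>\n))
  io.write("<products>\n")
  names.each do |name|
    io.write(" <product>\n  <name>#{escape_xml_text(name)}</name>\n </product>\n")
  end
  io.write("</products>\n")
end

# A StringIO stands in for File.open("output_file.xml", "w") here.
out = StringIO.new(+"")
write_products(out, ["日本語", "A & B"])
puts out.string
```

With a real file, passing the File object in place of the StringIO gives the same incremental behaviour.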

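Option 1: REXML from the Ruby standard library

Not adopted.

  1. To begin with, REXML has many bugs. For example, simply indenting the XML triggers a bug.
  2. It cannot write to a file a little at a time.

Option 2: The Builder (version 2.1.2) bundled with Ruby on Rails’ ActiveSupport

Not adopted.

  1. Being able to write the file a little at a time is a big plus.
  2. However, the “日本語” => “&#26085;&#26412;&#35486;” conversion does happen.

Option 3: Nokogiri

Not adopted.

  1. It is a big plus that the “日本語” => “&#26085;&#26412;&#35486;” conversion does not happen.
  2. However, it cannot write the file a little at a time.

Option 4: The latest version of Builder (version 2.2.0 or later, on GitHub)

Adopted. I used it roughly as follows.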

$KCODE = 'UTF8'
require 'rubygems'
gem 'bigfleet-builder'
require 'builder'
x = Builder::XmlMarkup.new(File.open("output_file.xml", "w"), :indent => 1)
x.instruct!(:xml, :encoding => "UTF-8")
1000.times do
  x.product do
    x.name("日本語")
  end
end

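To improve processing speed, Rails loads the fast_xs gem if it is installed and uses it to patch Builder. However, version 2.2.0 of Builder calls String#to_xs with an argument inside XmlBase#_escape, so it appears to be no longer compatible with the to_xs from fast_xs, which takes no arguments.

For example, take the following;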

$KCODE = 'UTF8'
require 'rubygems'
gem 'bigfleet-builder'
require 'builder'
require 'active_support'
x = Builder::XmlMarkup.new(File.open("output_file.xml", "w"), :indent => 1)
x.instruct!(:xml, :encoding => "UTF-8")
1000.times do
  x.product do
    x.name("日本語")
  end
end

Running this fails with ArgumentError: wrong number of arguments (1 for 0) : method to_xs in xmlbase.rb at line 118.

To get around this, I monkey patch String so that fast_xs is not used.

$KCODE = 'UTF8'
require 'rubygems'
gem 'bigfleet-builder'
require 'builder'
require 'active_support'

class String
  alias_method :to_xs, :original_xs if method_defined?(:original_xs)
end

x = Builder::XmlMarkup.new(File.open("output_file.xml", "w"), :indent => 1)
x.instruct!(:xml, :encoding => "UTF-8")
1000.times do
  x.product do
    x.name("日本語")
  end
end

It is far from pretty, but with this I was finally able to write out the XML the way I wanted.
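The restore-from-alias trick in the monkey patch can be seen in isolation in the following minimal sketch. The Report class and its methods are stand-ins invented for this illustration. The sketch assumes, as the patch above does, that whoever replaced to_xs kept the original implementation under a backup alias named original_xs.

```ruby
# Hypothetical class whose to_xs accepts an optional argument.
class Report
  def to_xs(escape = true)
    escape ? "escaped" : "raw"
  end
end

# A speed-up patch swaps in a faster method that takes no arguments,
# but first keeps the original under a backup alias.
class Report
  alias_method :original_xs, :to_xs

  def fast_xs
    "fast"
  end

  alias_method :to_xs, :fast_xs
end

# Callers that pass an argument now fail.
begin
  Report.new.to_xs(false)
rescue ArgumentError => e
  puts "before restore: #{e.class}"
end

# Restore the original, but only if the backup alias actually exists.
class Report
  alias_method :to_xs, :original_xs if method_defined?(:original_xs)
end

puts "after restore: #{Report.new.to_xs(false)}"
```

The method_defined? guard makes the patch a no-op when the speed-up was never applied, so it is safe to load in either situation.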

Using Hex coded characters inside Ruby regex character classes

I ran into a (bug? | annoyance) in Ruby 1.8.6 on MacOS X.

Run all following code with

$KCODE = 'u'

Working with the hex coded em-dash character.

puts "\xE2\x80\x94"
output: —

I can successfully use the hex coding to generate a simple regular expression.

puts "—" =~ /\xE2\x80\x94/
output: 0

However, this doesn’t work if I put the hex coded character inside a character class.

puts "—" =~ /[\xE2\x80\x94]/
output: nil

I can work around this by evaluating the hex coded character and generating a UTF-8 character, before putting into the character class brackets.

puts "—" =~ /[#{"\xE2\x80\x94"}]/
output: 0

To see what’s happening, I inspected the regex objects.

/[\xE2\x80\x94]/.inspect
output: "/[\xE2\x80\x94]/"
/#{"\xE2\x80\x94"}/.inspect
output: "/—/"

It looks like using the hex code directly inside the regex is a bad idea if I want to reliably use Unicode within Ruby regular expressions. I should evaluate the hex code and generate a Unicode character before sticking it into the regex.
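The workaround can be sketched as follows. This sketch assumes Ruby 1.9 or later with a UTF-8 source file, which is not the Ruby 1.8.6 environment where the problem was observed, so it shows the pattern rather than reproducing the bug. The variable names are made up for the illustration.

```ruby
# Build the em-dash from its UTF-8 bytes first ...
em_dash = "\xE2\x80\x94"

# ... and only then interpolate the finished character into the class.
from_bytes = /[#{em_dash}]/

# On Ruby 1.9 and later, a \u escape names the code point directly.
from_code_point = /[\u2014]/

subject = "a\u2014b"
p(subject =~ from_bytes)
p(subject =~ from_code_point)
```

Both patterns find the dash at character index 1 of the subject string.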

‘ran out of buffer space on element’ errors in Hpricot

Hpricot is a great gem for parsing web pages, and combined with the automatic navigation capabilities provided by WWW::Mechanize, it really becomes easy to create a robot to scrape web sites.

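One problem, mentioned in this blog post, is that an ever increasing number of ASP.NET web sites have huge amounts of data in a single HTML attribute. This overflows Hpricot’s default buffer and raises the ‘ran out of buffer space on element’ error.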

Instead of using the methods provided by Hpricot and WWW::Mechanize to work around this issue (as described in the blog post), I used the following monkey patch.


module WWW
  require 'hpricot'
  class Mechanize
    Hpricot.buffer_size = 262144  # added by naofumi
  end
end

You can put it in an initializer if you are working in Rails.

Handling UTF16 line endings in Ruby

A quick memo of a problem that I was having with Ruby.

I was reading in a UTF-16 Little-Endian text file with Windows (CR+LF) line endings, using the Ruby ‘read’ command, then converting it to UTF8 using the NKF library. I was constantly running into a problem where some of the characters were garbled.

After some digging around, I found this post (in Japanese).
Ruby List

What it is saying is that UTF-16 Little-Endian CR+LF line endings are encoded as

"\r \00 \n \00"

The problem is that since the Ruby gets command uses “\n” as the default separator string, the string that is actually read in is

"\r \00 \n"

As a result, the final character “\n” is only 8 bits long and is not a valid UTF-16 character. This causes NKF to misbehave and garble the text (with Iconv, it spits out an error and quits).

Instead of using a simple gets to fetch a line from UTF-16 Little-Endian CR+LF text, simply use

gets("\n\00")

You can then use either NKF or Iconv without any problems.
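The effect of the separator can be seen without a real file. The following minimal sketch assumes Ruby 2.0 or later and uses String#encode in place of NKF or Iconv. A StringIO holding raw UTF-16 Little-Endian bytes stands in for the file, and the sample text is made up.

```ruby
require 'stringio'

# "ab" and "cd" with CR+LF line endings, as raw UTF-16LE bytes.
bytes = "ab\r\ncd\r\n".encode("UTF-16LE").force_encoding("BINARY")
io = StringIO.new(bytes)

# The default separator "\n" stops after the first byte of the
# two-byte line feed, leaving an odd number of bytes.
naive = io.gets
io.rewind

# A separator that covers both bytes of the line feed keeps the line whole.
whole = io.gets("\n\0".b)

puts naive.bytesize
puts whole.bytesize

# The complete line now converts cleanly to UTF-8.
decoded = whole.dup.force_encoding("UTF-16LE").encode("UTF-8")
puts decoded.inspect
```

The naive read returns 7 bytes, which cannot be valid UTF-16, while the read with the two-byte separator returns 8 bytes that decode to the first line including its CR+LF.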