Kamishibai vs. Server-generated Javascript Responses

We’re writing some new capabilities for our Kamishibai.js JavaScript library that powers our Ponzu conference system.

I haven’t documented Kamishibai.js, and at this point, it’s not even an independent library. Still, I just wanted to note a few things that crossed my mind recently.

Enough with the JavaScript Already!

A well-written slide deck by Nicholas C. Zakas.

After consulting with several companies on performance related issues, it became clear that one of the biggest performance issues facing websites today is the sheer amount of JavaScript needed to power the page. The demand for more interactive and responsive applications has driven JavaScript usage through the roof. It’s quite common for large sites to end up with over 1 MB of JavaScript code on their page even after minification. But do today’s web applications really need that much JavaScript?

The answer, in my opinion, is no. Not nearly. Kamishibai.js is less than 50KB minified, excluding HTML templates; that is smaller than jquery.min.js.

Server-generated JavaScript Responses

Written by David Heinemeier Hansson on the Signal v. Noise blog.

The essence of Server-generated JavaScript Responses (SJR) is as follows:

  1. Form is submitted via an XMLHttpRequest-powered form.
  2. Server creates or updates a model object.
  3. Server generates a JavaScript response that includes the updated HTML template for the model.
  4. Client evaluates the JavaScript returned by the server, which then updates the DOM.
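The round trip above can be simulated in a toy Node sketch (this is not Rails code; the serverCreateComment function and the dom object are stand-ins I made up for the real server endpoint and the browser DOM):

```javascript
// Toy simulation of the SJR round trip. In a real app the "server" is a
// Rails action rendering a JavaScript template, and "dom" is the browser's
// document; both are faked here so the flow can run standalone.
const dom = { comments: "<div>first comment</div>" };

// Steps 2 and 3: the server saves the model, then renders a JavaScript
// response containing the updated HTML for it.
function serverCreateComment(body) {
  const html = `<div>${body}</div>`;
  return `dom["comments"] += ${JSON.stringify(html)};`;
}

// Steps 1 and 4: the client submits via XHR, then evaluates the returned
// JavaScript, which patches the DOM in place.
const response = serverCreateComment("second comment");
eval(response);

console.log(dom.comments); // now contains both comments
```

The point of the pattern is that the server, which already knows how to render the model's HTML, ships both the markup and the instruction for where it goes.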

I fully agree with this approach. In Kamishibai.js, we extend it in the following ways:

Instead of returning a JavaScript response in step 3, we usually send a plain HTML fragment. Kamishibai.js inspects the fragment and checks whether any of its top-level element ids are already present in the DOM. If so, Kamishibai.js replaces the content of those DOM elements with the content from the fragment. This lets us do common DOM replacements without writing any JavaScript. If you want animations, you can declare them through data- attributes in the HTML fragment.
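The id-matching step could be sketched roughly like this (a schematic Node example: a plain object stands in for the DOM, and a naive regex stands in for real HTML parsing; Kamishibai.js itself works on actual DOM nodes):

```javascript
// Fake "DOM": maps element ids to their current inner content.
const dom = {
  main: "<p>old article</p>",
  sidebar: "<p>old links</p>",
};

// Replace the content of any DOM element whose id also appears as a
// top-level element id in the returned HTML fragment. Ids not present
// in the DOM are ignored; ids not present in the fragment are untouched.
function mergeFragment(dom, fragment) {
  const re = /<(\w+)\s+id="([^"]+)"[^>]*>([\s\S]*?)<\/\1>/g;
  let m;
  while ((m = re.exec(fragment)) !== null) {
    const [, , id, content] = m;
    if (id in dom) dom[id] = content; // update only known ids
  }
}

mergeFragment(dom, '<div id="main"><p>fresh article</p></div>');
// dom.main is replaced; dom.sidebar is left alone
```

Because the fragment itself names the targets, the server never has to emit imperative DOM-manipulation code.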

Another extension is the use of JSON. We fully agree that returning HTML beats JSON when performance or code readability is your main concern. However, Kamishibai caches responses in localStorage, which has very limited capacity. Since JSON can be made many times more compact than HTML, we use JSON for the responses we need to store in bulk in localStorage.
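As a rough illustration of the size difference (a made-up record; the actual ratio depends on your markup):

```javascript
// The same record as a rendered HTML fragment vs. a compact JSON array.
const html =
  '<tr id="user-1"><td class="name">Alice</td>' +
  '<td class="email">alice@example.com</td></tr>';
const json = JSON.stringify([1, "Alice", "alice@example.com"]);

console.log(html.length, json.length);
// here the JSON form is roughly a third the size, which matters when
// caching many responses in size-limited localStorage
```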

In Kamishibai.js, we take a progressive approach to JavaScript HTML templates. We start by returning HTML fragments. When we decide a view should be sent as JSON, we write a JavaScript HTML template and a JSON response for that view. Kamishibai.js automatically determines whether the response is an HTML fragment or JSON to be used with a template. If it is JSON, it loads the appropriate template and converts the JSON into an HTML fragment, which is then processed and inserted into the DOM.
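The dispatch amounts to something like the following sketch (the "view" key and the template lookup are my own assumptions for illustration, not Kamishibai.js's actual conventions):

```javascript
// Decide whether a response body is JSON (to be run through a
// client-side template) or a ready-made HTML fragment.
function renderResponse(body, templates) {
  let data;
  try {
    data = JSON.parse(body); // parses → JSON, render through a template
  } catch (e) {
    return body;             // does not parse → already an HTML fragment
  }
  const template = templates[data.view]; // hypothetical "view" key
  return template(data);
}

const templates = {
  user: (d) => `<div id="user">${d.name}</div>`,
};

renderResponse('{"view":"user","name":"Alice"}', templates);
// → '<div id="user">Alice</div>'
renderResponse('<div id="user">Bob</div>', templates);
// → the fragment, unchanged
```

Either way, the caller ends up with an HTML fragment, so everything downstream (the id-matching DOM replacement) stays the same.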

Summary

Kamishibai.js uses JavaScript to generate pages, but the code is small and simple. We simply build on ideas from those who eschew complex JavaScript libraries, and provide the JavaScript that makes these approaches easier to follow.

I hope to write more on Kamishibai.js in the future.

Ruby 1.9.2: CSV misbehaves when you specify an internal encoding

A quick note, since this cost me several hours today.

This is a bug in Ruby 1.9's CSV library when reading Shift-JIS files.

# encoding: UTF-8

require 'csv'

CSV.foreach('test.csv', row_sep: "\n", encoding: "SJIS:UTF-8") do |row|
  puts row.join(':')
end

CSV.foreach('test.csv', encoding: "SJIS:UTF-8") do |row|
  puts row.join(':')
end

The file "test.csv" contains the following, saved in Shift-JIS with "\n" line endings.

今日は,2月末なのに,東京でも,大雪でした

Running this with ruby 1.9.2-p290 gives:

NaoAir:Desktop nao$ ruby test.rb 
今日は:2月末なのに:東京でも:大雪でした
/Users/nao/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/csv.rb:2027:in `=~': invalid byte sequence in UTF-8 (ArgumentError)
	from /Users/nao/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/csv.rb:2027:in `init_separators'
	from /Users/nao/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/csv.rb:1570:in `initialize'
	from /Users/nao/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/csv.rb:1335:in `new'
	from /Users/nao/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/csv.rb:1335:in `open'
	from /Users/nao/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/csv.rb:1201:in `foreach'
	from test.rb:19:in `<main>'

The first call, which specifies row_sep, works fine; the second, without row_sep, raises the error.

The reason is that when you do not specify the row separator with row_sep, CSV reads ahead in the file to guess it on its own, and with some encodings this guessing logic is buggy.

The fix is either to specify the row separator explicitly, as in the first example, so that the auto-detection never runs; or to stop letting CSV transcode and instead call #encode on each value CSV returns (example below).

CSV.foreach('test.csv', encoding: "SJIS") do |row|
  puts row.map{|c| c.encode('UTF-8')}.join(':')
end

Alternatively, this bug appears to be fixed in Ruby 1.9.3, so upgrading to 1.9.3 also makes it work without changes.

Writing UTF-8 XML with Ruby

Update: Services like Twitterfeed, which automatically post from RSS to Twitter, apparently cannot do item 2 below (leave the UTF-8 as-is in the XML) and end up producing garbled text. Numeric character references like "&#26085;" are converted and displayed correctly by web browsers, but most Twitter clients do not perform this conversion.

I spent a whole day struggling with this. I finally solved it, so I am recording it here.

What I wanted to do

  1. Write UTF-8 data out to XML.
  2. Leave the UTF-8 in the XML file as-is; that is, do not perform conversions like "日本語" => "日&#26412;&#35486;".
  3. The XML files are large, so write them out to the file incrementally, instead of accumulating the whole document in memory first.

None of this seemed particularly unusual, so I expected it to be easy, but it turned out to be quite a struggle. Things may be a little easier in Ruby 1.9, but in Ruby 1.8 at least, it is hard.

Option 1: REXML from the Ruby standard library

Not adopted.

  1. REXML is buggy; for example, simply indent-formatting XML triggers a bug.
  2. It cannot write the file out incrementally.

Option 2: Builder, bundled with Ruby on Rails' ActiveSupport (version 2.1.2)

Not adopted.

  1. Being able to write the file incrementally is a big plus.
  2. However, the "日本語" => "日&#26412;&#35486;" conversion happens.

Option 3: Nokogiri

Not adopted.

  1. Not converting "日本語" => "日&#26412;&#35486;" is a big plus.
  2. However, it cannot write the file incrementally.

Option 4: The latest version of Builder (version 2.2.0 or later, on GitHub)

Adopted. I used it roughly like this:

$KCODE = 'UTF8'
require 'rubygems'
gem 'bigfleet-builder'
require 'builder'
x = Builder::XmlMarkup.new(:target => File.open("output_file.xml", "w"), :indent => 1)
x.instruct!(:xml, :encoding => "UTF-8")
1000.times do
  x.product do
    x.name("日本語")
  end
end

To improve performance, Rails patches Builder to use the fast_xs gem when it is installed. However, Builder version 2.2.0 calls String#to_xs with an argument inside XmlBase#_escape, so it seems to no longer be compatible with fast_xs's to_xs, which takes no arguments.

For example,

$KCODE = 'UTF8'
require 'rubygems'
gem 'bigfleet-builder'
require 'builder'
require 'active_support'
x = Builder::XmlMarkup.new(:target => File.open("output_file.xml", "w"), :indent => 1)
x.instruct!(:xml, :encoding => "UTF-8")
1000.times do
  x.product do
    x.name("日本語")
  end
end

fails with ArgumentError: wrong number of arguments (1 for 0) : method to_xs in xmlbase.rb at line 118.

To get around this, monkey-patch String so that fast_xs is not used:

$KCODE = 'UTF8'
require 'rubygems'
gem 'bigfleet-builder'
require 'builder'
require 'active_support'

class String
  alias_method :to_xs, :original_xs if method_defined?(:original_xs)
end

x = Builder::XmlMarkup.new(:target => File.open("output_file.xml", "w"), :indent => 1)
x.instruct!(:xml, :encoding => "UTF-8")
1000.times do
  x.product do
    x.name("日本語")
  end
end

It is not pretty at all, but with this I could finally write out XML the way I wanted.

Using Hex coded characters inside Ruby regex character classes

I ran into a (bug? | annoyance) in Ruby 1.8.6 on Mac OS X.

Run all following code with

$KCODE = 'u'

Let's work with the hex-coded em-dash character.

puts "\xE2\x80\x94"
output: —

I can successfully use the hex coding to generate a simple regular expression.

puts "—" =~ /\xE2\x80\x94/
output: 0

However, this doesn’t work if I put the hex coded character inside a character class.

puts "—" =~ /[\xE2\x80\x94]/
output: nil

I can work around this by evaluating the hex coded character and generating a UTF-8 character, before putting it inside the character class brackets.

puts "—" =~ /[#{"\xE2\x80\x94"}]/
output: 0

To see what’s happening, I inspected the regex objects.

/[\xE2\x80\x94]/.inspect
output: "/[\xE2\x80\x94]/"
/#{"\xE2\x80\x94"}/.inspect
output: "/—/"

It looks like, if I want to reliably use Unicode within Ruby regular expressions, writing the hex codes directly inside the regex is a bad idea. I should evaluate the hex code into a Unicode string first, and interpolate that into the regex.

‘ran out of buffer space on element’ errors in Hpricot

Hpricot is a great gem for parsing web pages, and combined with the automatic navigation capabilities provided by WWW::Mechanize, it really becomes easy to create a robot to scrape web sites.

One problem, mentioned in this blog post, is that an ever-increasing number of ASP.NET web sites stuff huge amounts of data into a single HTML attribute.

Instead of using the methods provided by Hpricot and WWW::Mechanize to work around this issue (as described in the blog post), I used the following monkey patch.


module WWW
  require 'hpricot'
  class Mechanize
    Hpricot.buffer_size = 262144  # added by naofumi
  end
end

You can put it in an initializer if you are working in Rails.

Handling UTF16 line endings in Ruby

A quick memo of a problem that I was having with Ruby.

I was reading in a UTF-16 Little-Endian text file with Windows (CR+LF) line endings, using the Ruby ‘read’ command, then converting it to UTF8 using the NKF library. I was constantly running into a problem where some of the characters were garbled.

After some digging around, I found this post on the Ruby List mailing list (in Japanese).

What it is saying is that in UTF-16 Little-Endian, CR+LF line endings are encoded as

"\r\000\n\000" (i.e. the bytes 0D 00 0A 00)

The problem is that since Ruby's gets method uses "\n" as the default separator string, the string that is actually read in is

"\r\000\n"

The result is that the final "\n" character is only 8 bits long and is not a valid UTF-16 character. This causes NKF to misbehave and garble the text (Iconv spits out an error and quits).

Instead of using a simple gets to fetch a line from UTF-16 Little-Endian CR+LF text, simply use

gets("\n\000")

You can then use either NKF or Iconv without any problems.
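The byte-level situation is easy to verify in any language; here is a quick Node sketch of the same issue:

```javascript
// UTF-16LE encodes each ASCII character as two bytes, so a CR+LF line
// ending becomes the four bytes 0d 00 0a 00.
const line = Buffer.from("hi\r\n", "utf16le");
console.log(line.toString("hex")); // 680069000d000a00

// A reader that splits on the single byte 0x0a (a plain "\n") cuts the
// stream mid-character: the chunk has odd length and ends ...0d 00 0a,
// which is not a whole number of 16-bit units.
const naive = line.subarray(0, line.indexOf(0x0a) + 1);
console.log(naive.length % 2); // 1 → invalid UTF-16

// Splitting on the two-byte sequence 0a 00 ("\n\0") keeps every
// character intact.
const ok = line.subarray(0, line.indexOf(0x0a) + 2);
console.log(ok.length % 2); // 0 → valid UTF-16
```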