String concatenation in Ruby

Posted by Jonas Elfström Mon, 01 Feb 2010 23:04:00 GMT

There's no StringBuilder class in Ruby because the String class has the << for appending. The problem is that not every Ruby programmer seems to be aware of it. Recently I've seen += being used to append to strings where << would have been a much better choice.

The problem with using += is that it creates a new String instance and if you do that in a loop you can get really horrible performance.

If you are dealing with an array you don't even have to use << because Array#join is even faster and shows intent in a nice way.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
require 'benchmark'

array_of_random_strings=(0...262144).map{65.+(rand(25)).chr}.join.scan(/.{1,8}/m)

Benchmark.bm do |benchmark|
  benchmark.report do
    str=array_of_random_strings.join
  end
  benchmark.report do
    str2=""
    array_of_random_strings.each do |s|
      str2<<s
    end
  end
  benchmark.report do
    str3=""
    array_of_random_strings.each do |s|
      str3+=s
    end
  end
end
The array_of_random_strings is an array of 32768 8 characters long random strings.

usersystemtotalreal
0.0300000.0000000.030000( 0.027184)
0.1600000.0100000.170000( 0.190277)
106.0200000.300000106.320000(113.457793)


The performance of += was even worse than I imagined!

Comparing instance variables in Ruby

Posted by Jonas Elfström Mon, 02 Nov 2009 21:50:00 GMT

Say you have two objects of the same class and you want to know what differs between them. Well actually you just want to know the instance variables in object b that differs from the ones in object a.

To begin with, we need a class. I like cheese.

1
2
3
4
5
6
class Cheese
  attr_accessor :name, :weight, :expire_date
  def initialize(name, weight, expire_date)
    @name, @weight, @expire_date = name, weight, expire_date
  end
end


Then we need some cheese objects.

1
2
stilton=Cheese.new('Stilton', 250, Date.parse("2009-11-02"))
gorgonzola=Cheese.new('Gorgonzola', 250, Date.parse("2009-11-17"))


With only name, weight and an expiration date it would be easy to compare those but imagine that these two objects has 42 properties. It does not stop there, you are being asked to compare 24 different classes in this way. Are you cringing yet?

Object#instance_variables to the rescue! Well, that and a small hack by me. Below I add a new method called instance_variables_compare to Object. The long method name is because I wanted to follow the naming already in place. Usually I prefer to do these kind of things as a module and then include them where appropriate but in this case I find that a monkey patch will do.

1
2
3
4
5
6
7
class Object
  def instance_variables_compare(o)
    Hash[*self.instance_variables.map {|v|
      self.instance_variable_get(v)!=o.instance_variable_get(v) ? 
      [v,o.instance_variable_get(v)] : []}.flatten]
  end
end


It returns the instance variables that differs as a hash because it's handy and because I like it that way.

1
2
3
4
5
6
7
8
9
10
>> stilton.instance_variables_compare(gorgonzola)
=> {"@name"=>"Gorgonzola", "@expire_date"=>#<Date: 4910305/2,0,2299161>}
>> gorgonzola.instance_variables_compare(stilton)
=> {"@name"=>"Stilton", "@expire_date"=>#<Date: 4910275/2,0,2299161>}
>> stilton.expire_date=gorgonzola.expire_date
=> #<Date: 4910305/2,0,2299161>
>> stilton.instance_variables_compare(gorgonzola)
=> {"@name"=>"Gorgonzola"}
>> stilton.instance_variables_compare(stilton)
=> {}


If you ever think of using this code you should be aware of two things.

  1. This code is very untested and comes with no guarantees.
  2. Since instance variables spring into life the first time they are assigned to you either have to work with objects that always initialize everything or you have to change instance_variables_compare to handle this.

Infinite ranges in C# 1

Posted by Jonas Elfström Tue, 20 Oct 2009 18:41:00 GMT

I recently learned that C# is compliant with the IEEE 754-1985 for floating point arithmetics. That wasn't a big surprise but that division by zero is defined as Infinity in it was! It actually kind of bothers me that I didn't know this.

In mathematics division by zero is undefined for real numbers but I guess Infinity is a more pragmatic result. Or as a friend put it "IEEE stands for Institute of Electrical and Electronics Engineers not Institute of Mathematics"

1
2
3
4
double n = 1.0;
n = n / 0;
if (n > 636413622384679305)
  System.Console.WriteLine("Yes it certainly is!");

This C# code does not throw an exception, it simply leaves n defined as Infinity and a line written to the console.

Ruby is also IEEE 754-1985 compliant. It even lets you define infinite ranges.

1
2
3
4
5
6
Infinity=1.0/0
=>Infinity
(1..Infinity).include?(162259276829213363391578010288127)
=> true
(7..Infinity).step(7).take(3).inject(&:+) # 7+14+21
=> 42


I can't say I see very much use of this but it brings a kind of completeness to the handling of infinities. Unfortunately it seems we don't get that in C# out of the box because Enumerable.Range takes <int>,<int> as parameters and there's no Infinity definition for int. That's unless someone wrote a generic Range class. Turns out none other than Jon Skeet did in his MiscUtil. Dowload MiscUtil and then by using MiscUtil.Collections; you can:

1
2
3
4
5
6
double n = 1.0;
var infinity = n / 0;
var r = new Range<double>(0, infinity);
if (r.Contains(4711))
  System.Console.WriteLine("Yes it certainly does!");
var sum = r.Step(7.0).Take(3).Sum();


And guess what, it works like a charm! 4711 is part of positive infinity and sum is 42.0 and all is good.

Edit

There's also a couple of predefined constants. Thanks to Eric for pointing that out.

1
2
var r = new Range<double>(7,  System.Double.PositiveInfinity);
var sum = r.Step(7.0).Take(3).Sum();

The Thrush combinator in C# 1

Posted by Jonas Elfström Tue, 06 Oct 2009 18:35:00 GMT

Last year I read Reg "Raganwald'" Braithwaite's excellent post The Thrush and he explains it as

The thrush is written Txy = yx. It reverses evaluation.

Back then I didn't even consider trying to implement it in C#. That was before I digged deeper into lambda expressions and extension methods in C# 3.0 and way before last night when I read Debasish Ghosh's post on how to implement the Thrush in Scala. After reading that my first thought was if it was possible to do the same in C#. Here's my attempt.

At first I struggled with the static typing and headed for an easy way out using Object in the extension method of Object:

1
2
public static object Into(this Object obj, Func<object, object> f)
{  return f.Invoke(obj); }


My goal was to translate the Ruby example


(1..100).select(&:odd?).inject(&:+).into { |x| x * x }

in Raganwald's post to C#.

Which reads "Take the numbers from 1 to 100, keep the odd ones, take the sum of those, and then answer the square of that number."

But with the Object based extension method I had to do some ugly casts.


var r = Enumerable.Range(1, 100).Where(x => Odd(x)).Sum().Into(x => (int)x * (int)x);


With som added typing I could do:


var result = Enumerable.Range(1, 100).Where(x => Odd(x)).Sum().Into(x => x * x);


But that merely moved the cast to the ext. method and also made it work for integers only.

1
2
public static int Into(this Object obj, Func<int, int> f)
{ return f.Invoke((int)obj); }


Then I remembered generics and method type inference.

1
2
public static T Into<T>(this T obj, Func<T, T> f)
{ return f(obj); }

And now not only the casts were gone but I also got a thrush combinator almost as flexible as the one in Ruby.

Contrived example follows:

1
2
var test = "ball";
var ball = test.Into(s => "Are we having a " + s + " yet?");
 

The odd part

The Odd(x) method call in the calculation above is a plain static method.

1
2
private static bool Odd(int n)
{ return (n % 2 != 0); }

If you want an even more terse syntax you could try an ext. method on IEnumerable like this:

1
2
public static IEnumerable<int> Odd(this IEnumerable<int> en)
{ return en.Where(n => n % 2 != 0); }

Gives:


var result = Enumerable.Range(1, 100).Odd().Sum().Into(x => x * x);


Also as a general alternative to .Sum() I could have used .Aggregate((x, y) => x + y)) but I found it a bit verbose.

In C# I don't think it's possible to pull off the Symbol#to_proc stuff that Ruby does. That's the &: in the select(&:odd?) and the inject(&:+) in the Ruby example. Raganwald has a great post on that too.

Edit

Check out Jon Skeet's nice answer on StackOverflow to my question on how to make this even more Ruby-like. I have to try out that Operator class later though.

Edit 2009-10-07

One thing I found a bit surprising is that by implementing the Into ext. method in this way it not only works for all objects based on System.Object but it also works for value types.

1
2
3
4
int n=4711;
int oddOrZero = n.Into(x => x % 2 !=0 ? x : 0); // 4711
n = 4712;
oddOrZero = n.Into(x => x % 2 != 0 ? x : 0); // 0


Edit 2009-10-12

My confusion did stem from my lack of understanding of extension methods. Ex. methods are in fact not extending System.Object or any other type, they are "nothing more than a pleasant syntax for calling a static method" in case no instance method with the same name can be found.

A case for using only three different digits in keypad codes 3

Posted by Jonas Elfström Sun, 27 Sep 2009 19:02:00 GMT

Keypads have obvious security problems and keypads accepting a stream of digits with no # or enter in between, while checking for the four digit long code, are even worse.

The important part is to not leak the digits in the code by wear or intentional markings because if they leak it's suddenly very far from 10000 combinations.

If the "lock picker" only knows that the code contains four digits there are 10000 combinations. Keypads accepting a stream of digits can then be opened in a maximum of 10003 keystrokes using the De Bruijn sequence. That is still quite a lot.

Below is a Ruby implementation of the De Bruijn sequence.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
# De Bruijn sequence
# Original implementation by Frank Ruskey (1994)
# translated to C by Joe Sawada
# and further translated to Ruby by Jonas Elfström (2009)

@n=4
@k=10
@a=[0]
@sequence=[]

def debruijn(t, p, alike)
  if t>@n
    if @n%p==0
      1.upto(p) {|j| @sequence<<@a[j]}
    end
  else
    @a[t]=@a[t-p]
    if @a[t]>0
      debruijn(t+1,p,alike+1)
    else
      debruijn(t+1,p,alike)
    end
    (@a[t-p]+1).upto(@k-1) {|j|
      @a[t]=j
      debruijn(t+1,t,alike+1)
    }
  end
end

debruijn(1,1,0)
print @sequence.join
 

It's not uncommon to find keypads with 4 of the 10 keys worn down and if you do you can be pretty sure that the code contains those four different digits. The number of possible combinations are 4! = 4x3x2x1 = 24. I got curious to see if there's a kind of De Bruijn sequence for this that brings down the 4*24=96 keystrokes. By scribbling in a text editor I quickly realized there's not a clean sequence. Not clean in the way that a sequence following the rules can be created. Also it's probably even quite daunting to present it as mathematically dense and beautiful as the De Bruijn but that could be my less than great combinatorics speaking.

I made a quick and dirty brute force hack to try to find a shorter sequence.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
seq=[]
1.upto(4) {|a| 1.upto(4) {|b| 1.upto(4) {|c| 1.upto(4) {|d|
  seq << "%d%d%d%d" % [a,b,c,d] if !(a==b || a==c || a==d || b==c || b==d || c==d)
}}}}

s=seq[0]
seq.delete_at(0)
while (seq.length>0)
  next_code=(seq.select {|c| c[0..2]==s[-3..-1]})
  if next_code.empty?
    next_code=(seq.select {|c| c[0..1]==s[-2..-1]})
    if next_code.empty? 
      next_code=(seq.select {|c| c[0]==s[-1]})
      s+=next_code[0][1..3]
      seq.delete_at(seq.index(next_code[0]))
    else
      s+=next_code[0][2..3]
      seq.delete_at(seq.index(next_code[0]))
    end
  else
    next_code=(seq.select {|c| c[0..2]==s[-3..-1]})
    s+=next_code[0][3].chr
    seq.delete_at(seq.index(next_code[0]))
  end
end

The above code takes the first code "1234" of the 24 and then searches the rest of the array for a code beginning with "234". It finds "2341" and adds "1" to the end of s and continues to look for "341" and so on. Relatively soon there is no three digit match and then it tries two digits and eventually even that fails and then it gets the first one digit match. The resulting sequence is:

123412314231243121342132413214321

From 96 to 33 keystrokes. Not as effective as De Bruijn but still significant. Unlike De Bruijn I have absolutely no proof that this is the shortest one possible but it seems likely. Also notice that in the middle of the sequence we find "3121" and "1213". Those break the criteria of four different digits but they seem to be necessary to be able to enter the reversed mode. Try reading the sequence forward and backwards to see what I mean.

If the code only contains two digits it's gets even more trivial to try them all. There are 14 possible codes and by compressing those to one sequence you get down to 20 keystrokes.

Things get a little more interresting if three buttons are worn. It turns out that the repeated digits can be placed in the code in six different ways.

0012,1002,1200,0102,0120,1020

That's 6x2x3=36 combinations and, maybe a little unintuitive, 12 more than if you are using four different digits. I compressed it down to 49 key strokes. Unlike the sequence for four different digits I can't find it with google and I know it's kind of security by obscurity but that could give a tiny amount of extra trouble to an attacker. I will not be the first one publishing it.

Be aware that if an attacker knows you are using a 0012-like code he gets a smaller space to search. 6x8x9x10=4320 instead of 10000. You have to weight the risk of button leaks against a code protocol leak.

Older posts: 1 2 3 4