Sunday 8 April 2012

The Ladder of Design Ascent

One of the first design decision I had to take when developing several years ago the interval library in Prolog was how to represent intervals considering that I wanted to have:

  • a representation as simple as possible
  • both open and closed intervals
  • intervals of any underlying type, including non-numerics
  • natural blending in the language
  • being able to handle infinity gracefully

At the time I came up with the <interval>(_, _) term where <interval> was ‘[)’, ‘(] or ‘[]’. The ‘()’ atom was the empty interval and infinity was represented by '+inf' and '-inf'. I also defined a little algebra to allow for infinite operands participating to +, - and the like.

With the library the goals are the same but the way to get there is different. Much different.

To write or not to write. That is the question

Prolog has the unification mechanism which guarantees that a term, once “assigned”, stays unchanged. Not so for .NET: here having a immutable data structure is a conscious design decision. Here are some of the trade-offs:

  • An immutable structure is less prone to errors (it doesn’t have state transitions).
  • An immutable structure works better with functional programming and parallelism (virtually no side effects).
  • An immutable structure is less efficient (it creates a new instance for any change).
  • An immutable structure should not be inherited from (it’s easy to break the substitution principle with inheritance).
  • An immutable structure requires special equality/hashing logic (the reference to the object is more or less irrelevant).

Given than an interval should not be much more than a number, I opted in Pavings.NET for immutability with the provision that creating a new interval out of an old one plus some change should be inexpensive.

Setting limits

Another design challenge for Pavings.NET is the treatment of limits. The lack of pattern matching in .NET makes it impossible to treat intervals and their limits as a single unit.

One cannot have the equivalent of ‘[)’ and ‘(]’ Prolog atoms to be treated agnostically in C# unless those entities inherit from the same parent entity (class or interface) but in such case any information about the actual type is lost when the parent type is used (except by down-casting which is not a good idea anyway).

Therefore, in order to have actual and explicit knowledge about the limits, data structures to represent them are. They should satisfy the following conditions:

  • be as simple as the underlying type
  • engage in the same comparison operations as the underlying type as well as with each other
  • accommodate infinity

The answer to these requirements is the following structs:

public struct LoLimit<T>: IComparable<LoLimit<T>>, IComparable<HiLimit<T>>, IComparable<T>,
                           IEquatable<LoLimit<T>>, IEquatable<HiLimit<T>>, IEquatable<T>
                           where T: IComparable<T>, new()

public struct HiLimit<T>: IComparable<LoLimit<T>>, IComparable<HiLimit<T>>, IComparable<T>,
                           IEquatable<LoLimit<T>>, IEquatable<HiLimit<T>>, IEquatable<T>
                           where T: IComparable<T>, new()

One should observe that the constraints on T lack IEquatable<T>. This is to lessen the requirements on T as much as possible, not because some arcane incompatibility. The new() constraint is required to allow for limits without an underlying value, such as infinite limits.

Computations without INumeric

One of the first limitation that a numeric programmer encounters with .NET is the lack of a INumeric interface that defines the contract that any numeric type should uphold.

In the absence of INumeric, Pavings.NET must compensate by providing the interval operations explicitly. However, because the underlying type is generic and structs block inheritance, computation within intervals has to be externalized.

This is achieved via the Bridge Pattern applied twice: the Interval<T> struct is using an object of type ComputationObject<T> which in turn is using an object of type AlgebraObject<T>. AlgebraObject is responsible with basic algebraic operations (addition, subtraction, etc) whereas Computation performs more complex operations such as calculating the median of two values. Both ComputationObject and AlgebraObject are roots of their own hierarchies.

Each Interval<T> object has its own internal ComputationObject<T> instance which can default to a general, convenient value for type T (the same applies to AlgebraObject embedded within Computation). The static classes Algebra<T> and Computation<T> stand as factories for such defaults.

Detaching computations from intervals, besides being necessary in the context, has many advantages:

  • one can extend behaviours of intervals even if inheritance from intervals is not possible.
  • one can provide temporary computations on instance-by-instance basis.
  • one can change the behaviour of computations without replacing the computation objects, if those objects permit that.

The last point is of particular significance. For example, one can decide to change the precision of computation over floating-point types on the fly via the AlgebraObject<T>.Delta property. This is very useful when taking samples of values over intervals: the distance between samples can decrease and the sample set can grow in precision without any change in the data structures.

Performance considerations

While the existing design largely satisfies the initial goals of generality, extensibility and efficiency, there is one concern that the design of Interval<T> and its peers much fully acknowledge: the performance penalties incurred by hidden side effects induced by the compiler and/or the CLR.

At the top of the list stands the involuntary boxing or unboxing induced by struct interface implementations. More precisely, by the boxing that occurs when a struct implementing an interface is accessed via a variable of that interface type.

These penalties can be avoided in two ways:

  • by using variables of generic type and not interface type even though the generic type is constrained to implement that interface.
  • by using parameters of generic type and not interface type even though the generic type is constrained to implement that interface.

In other words, the solution is to go as concrete as possible (and not as general as possible) when dealing with types that are (or can be) structs.

A perfect example is the IIntervalComparable<T> interface. The Interval<T> struct is implementing this interface which features two methods:

ICompareResult CompareTo(T obj);

ICompareResult CompareTo(Interval<T> obj);

The second method is the interesting one: even though CompareTo(IIntervalComparable<T>) would have been more general, CompareTo(Interval<T>) is more efficient because a copy on the parameter stack is more efficient than boxing Interval<T> into an object in the heap.

Conclusions

Making software design decisions is always a tricky and challenging endeavour and it strongly relies on both intuition and experience.

Life shows that such challenges do not belong to high-level designs only: even creating a simple struct like Interval<T> may involve non-trivial considerations regarding the run-time under which the structure will operate.

Moreover, the lessons learned in some programming paradigm do not necessarily transfer elsewhere – as the two implementation of intervals in Prolog and Pavings.NET show.

No comments:

Post a Comment