Saturday 28 March 2015

Enumerables

The EnumJ library needs one small addition before going public: an Enumerable<E> interface.

Enumerator<E> versus Enumerable<E>

As I mentioned in a previous post, Enumerator<E> is like Iterator<E> but endowed with many methods that make enumerators look a lot like Java 8 streams. In addition and unlike streams, enumerators exhibit high composability, shareability and fault tolerance.
All is great, but with one caveat: they don’t repeat (streams do the same). Once you finish enumerating, there is no way to reuse the enumerator, it’s dead forever.
True, it is possible to re-generate the enumerator from the same source and it is also possible to rebuild the pipeline, but what if both the source and the pipeline are unknown? What if all we get is a single Enumerator<E> object>? In such case, it is not possible to re-enumerate the same sequence again.
Enumerable<E> solves this conundrum. This interface extends Iterable<E> and defines the iterator() method in terms of a new method called enumerator():
public interface Enumerable<E> extends Iterable<E> {

   
@Override
   
public default Iterator<E> iterator() {
       
return enumerator();
   
} 

   
public Enumerator<E> enumerator(); }

Implementation details

In addition to replacing iterator() with enumerator(), the Enumerable<E> interface will feature the same composability that Enumerator<E> enjoys: map(), filter() and the like will be part of its portfolio of methods and it will have the same degree of composablity as Enumerator<E>. This means that, for example, applying one hundred thousand concatenations will not overflow the stack when the iterator is being returned.
Internally, Enumerable<E> will contain a source of type Iterable<E> along with a pipe of builders that will apply various operators upon the original iterator when returning the enumerator. This way, the desired enumerator will get returned every single time enumerator() or iterator() gets called.
From a more “theoretical” point of view, Enumerable<E> will be a way to build abstract collections that can be enumerated repeatedly in a manner similar to IEnumerable<E> from .NET LINQ.

Why all the fuss

Firstly, because enumeration repeatability is a serious matter. Passing around an enumerable that can be enumerated repeatedly is much more potent than passing around an enumerator that can be traversed only once.
Secondly, abstracting over sets opens the door for something much bigger: to be able to treat sets as variables, therefore to be able to write programs that work upon entire sets and not just individual values. The idea is not new, at least three well known domains in Computer Science have been doing it for a long time:
  • Constraint Programming over finite domains – which restricts sets based on propagate-able constraints
  • Logic Programming – which defines sets of solutions in logic terms
  • Probabilistic Programming – which replaces data-holding variables with random-holding variables; that is, with random variables that we know from Mathematics. These random variables are nothing else than bags of elements having occurring frequencies differing according to some probabilistic distribution
But about all the possibilities opened by Enumerable<E> I will write in future posts.

No comments:

Post a Comment