Monday, August 26, 2013

Slick: 5 seconds of hate

Simple reminder why I hate ORM. Some daily experience from trying to make Slick doing something more useful than the examples they generously provide in the very incomplete and hazy documentation (I just wanted to count the rows in some query):

Warning #1:
method count in class ColumnExtensionMethods is deprecated: Use Query.count instead

After some code changes and more esoteric responses from the compiler, I get this:

Warning #2:
method count in class Query is deprecated: Use .length instead of .count

LOL

++ to add more misery to insult, I can't print out the SQL statement that this thing generates to get the row count, because it's now a method of a query! (Of course, there are always the database logs...)

Argh.

Update:  as a result of this rant, I had a quick twitter conversation with , who explained that myQuery.length.run will give the result I want, and Query(myQuery.length).selectStatement would produce the SQL, and that in Slick 2.0.0, that's coming out somewhere in September, it will be possible to write myQuery.length.selectStatement directly. That is nice, except that Slick 2.0.0 won't be backwards compatible, which probably means that the ongoing projects might not be able to benefit from the new library, unless people are already using alpha version of 2.0.0 (which we don't).

Nevertheless, good to know that someone's working on the problem :-}

Saturday, August 10, 2013

Scala, lists, vectors and mergesort

Today I noticed the following example of mergesort, quoted in the online book "Scala by example" (9.3 Example: Merge sort, page 69):

def msort[A](less: (A, A) => Boolean)(xs: List[A]): List[A] = {
    def merge(xs1: List[A], xs2: List[A]): List[A] =
        if (xs1.isEmpty) xs2
        else if (xs2.isEmpty) xs1
        else if (less(xs1.head, xs2.head)) xs1.head :: merge(xs1.tail, xs2)
        else xs2.head :: merge(xs1, xs2.tail)
    val n = xs.length/2
    if (n == 0) xs
    else merge(msort(less)(xs take n), msort(less)(xs drop n))
}

Further the authors say that the complexity of this algorithm is O(N log (N)), which makes it an attractive option for sorting lists.

Being in an inquisitive mood, I couldn't help wondering how big can in practice be a list sorted with such algorithm. (My current machine is 64-bit Ubuntu with 6 GB RAM and 8 cores)

So here we go.

scala> import scala.util.Random
scala> val randomNums = Seq.fill(1000)(Random.nextInt)
scala> msort((x:Int,y:Int)=>x<y)(randomNums.toList)

res27: List[Int] = List(-2145589793, -2143814602, -2143330861, -2142630038, -2136184047, -2135476301, -2132371275, -2129131922, -2123018872, -2120069375, -2118819114, -2117572993, -2112618055, -2102489605, -2096417279, -2095978655, -2095806565, -2087860343, -2087105558, -2085005596, -2083360810, -2077234330, -2065393243, -2058765966, -2056823240, -2053145149, -2047696716, -2044737011, -1847777706, -18259...

OK, for 1000 elements it works, for 10000 it works too. But let's push one more order of magnitude...

scala> val randomNums = Seq.fill(100000)(Random.nextInt).toList
randomNums: List[Int] = List(186865346, 638415825, -637620864, 220723809, -1536234831, 1710286185, 126091472, -1621728642, -1819749330, -294195052, -613979926, 1278841478, -111715804, -1953497441, 1891679544, 582175290, 1555531003, -430520072, 652471392, 1211722008, -112446234, -1900260621, 2058382521, 564201400, -1225275015, 2069052362, 797097978, 1077363576, 1469066877, -303059738, -166855116, -1876385701, 285630983, -1956550564, -1991336959, -1713232594, -868759609, -723403847, -282664963, 1965397484, 1563483549, -618177790, -297223307, -197365661, -703715983, 28207094, -1793590690, 374050582, 992041027, -1931269739, -932512120, -1551657371, 1523463808, -742246427, -1665172973, 50892779, -286029416, -1654054925, -874783455, -1825744857, -571856180, 289326103, -215127347, -1488600483,...
scala> msort((x:Int,y:Int)=>x<y)(randomNums)
java.lang.StackOverflowError
at .merge$1(<console>:20)
at .merge$1(<console>:22)
at .merge$1(<console>:20)
at .merge$1(<console>:20)
at .merge$1(<console>:20)
at .merge$1(<console>:20)
at .merge$1(<console>:22)
        ...
Oi wei, we ran out of stack!
The reasons are obvious: this line of code

xs2.head :: merge(xs1, xs2.tail)

means that all the time while you are busy merging the partial lists, there are orphan elements hanging around, waiting until they are allowed to be the head. And the stack isn't made of rubber!

Well... there is some sport, of course, could we make it at least work? Using the tail recursion, perhaps? (If you have forgotten, tail recursion happens when the recursive function ends with calling itself. Then the compiler can be smart enough to make a cycle instead of the sequence of the recursive calls, at least if you ask it politely :) )

import scala.annotation.tailrec
...
@tailrec
def merge(xs1: List[A], xs2: List[A]): List[A] =
   if (xs1.isEmpty) xs2
       else if (xs2.isEmpty) xs1
       else if (less(xs1.head, xs2.head)) xs1.head :: merge(xs1.tail, xs2)
       else xs2.head :: merge(xs1, xs2.tail)

However, this won't compile, and rightly so:

[error] 
<filename>: could not optimize @tailrec annotated method merge: it contains a recursive call not in tail position
[error] else xs2.head :: merge(xs1, xs2.tail)

Ah right. Operator :: is right associative, that means it's called after merge(arg1, arg2) was called, therefore tail recursion isn't possible.

Well, without further ado, that is the variant I have ended up with. It worked in the sense that it didn't break - there was now a cycle - but for the list with 100000 elements it took about 5 minutes to do the sorting!

import scala.math.Ordering.Implicits._ 
import scala.util.Random 
import scala.annotation.tailrec

object Sorting {
def less[T:Ordering](x:T,y:T) = x<y
@tailrec
def merge[A:Ordering ](xs0: List[A], xs1: List[A], xs2: List[A]): List[A] =
if (xs1.isEmpty) if (xs0 == Nil) xs2 else xs0 ::: xs2
else if (xs2.isEmpty) if (xs0 == Nil) xs1 else xs0 ::: xs1
else if (less(xs1.head, xs2.head)) merge(xs0 :+ xs1.head, xs1.tail, xs2)
else merge(xs0 :+ xs2.head, xs1, xs2.tail)

def msort[A:Ordering](xs: List[A]): List[A] = {
val n = xs.length/2
if (n == 0) xs
else merge(List[A](), msort(xs take n), msort(xs drop n))
}
def checkSorted[A:Ordering](x:A, xs:List[A]):Boolean = xs match {
case Nil => true
case head::tail => if (head < x) false else checkSorted(head, tail)
}
def test(size:Int, printAll:Boolean = true) = {
val randomNums = Seq.fill(size)(Random.nextInt).toList
val sorted = msort(randomNums)
if (printAll) println(sorted)
if (!checkSorted(sorted.head, sorted.tail)) println("Not sorted!!!")
}
}

The reason why this code is slow is in the fact that now we have added the accumulator, and we want to add elements to the end of it. However, List (as also mentioned here) only guarantees the quick access to the head - the complexity of accessing the last element would be proportional to the size of the list!

So is there anything to do to make it faster, without trying hard to imagine some other algorithm? My solution was to use Vectors instead of Lists.

object Sorting {
def less[T:Ordering](x:T,y:T) = x<y
@tailrec
def merge[A:Ordering ](xs0: Vector[A], xs1: Vector[A], xs2: Vector[A]): Vector[A] =
if (xs1.isEmpty) if (xs0 == Nil) xs2 else xs0 ++ xs2
else if (xs2.isEmpty) if (xs0 == Nil) xs1 else xs0 ++ xs1
else if (less(xs1.head, xs2.head)) merge(xs0 :+ xs1.head, xs1.tail, xs2)
else merge(xs0 :+ xs2.head, xs1, xs2.tail)

def msort[A:Ordering](xs: Vector[A]): Vector[A] = {
val n = xs.length/2
if (n == 0) xs
else merge(Vector[A](), msort(xs take n), msort(xs drop n))
}
def checkSorted[A:Ordering](x:A, xs:List[A]):Boolean = xs match {
case Nil => true
case head::tail => if (head < x) false else checkSorted(head, tail)
}
def test(size:Int, printAll:Boolean = true) = {
val randomNums = Seq.fill(size)(Random.nextInt).toVector
val sortedVec = msort(randomNums)
if (printAll) println(sortedVec)
val sorted = sortedVec.toList
if (!checkSorted(sorted.head, sorted.tail)) println("Not sorted!!!")
}
}

Vectors guarantee constant access to any element they contain. It is probably slightly slower than accessing the head of the list, but for the tasks like this one it seems to be a definitely better option. It took very little time to sort both 100000 and 1000000 vectors of random integers. With 10000000, however, it still takes quite some time and apparently more memory, but 10 millions is already more than the amount of rows in many decent tables :-)

I hope it was useful, or at least that you did rid to the end. I still want to write a C++ implementation and see which one finishes first. So may be there will be the part two...

Sunday, August 4, 2013

A Voyage into the Scala Reflection land, part 2: case or no case

Let's make a little module (using :paste mode of the console or by compiling it separately and importing the corresponding module) - it looks a bit weird but suitable enough for what I want to demonstrate:

trait DemoStuff {
  lazy val i = 0 
  lazy val s = ""
  lazy val printMe = s"${s},${i}"
}

case object CaseEnglish extends DemoStuff {
  override lazy val i = 1 
  override lazy val s = "Hello"
}

case object CaseGerman extends DemoStuff {
  override lazy val i = 2 
  override lazy val s = "Guten TaG"
}

object NonCaseOne extends DemoStuff {
  override lazy val i = 3 
  override lazy val s = "Bye"
}

(Regarding the usage of lazy vals, I was inspired by this explanation).

In theory, both "normal" objects and case objects in Scala are supposed to be singletons. However, as far as serialization is concerned, it seems that case objects are more singletons than their non-case siblings

scala> val nco = NonCaseOne
nco: NonCaseOne.type = NonCaseOne$@74232853

scala> val ce = CaseEnglish
ce: CaseEnglish.type = CaseEnglish

This mimics the same conventions for the classes versus case classes.

Let's go on with reflection. Assume that, like in the previous case, we imported the universe, the current mirror and defined the helper to get the type tag for us.

scala> val ceType = getTypeTag(ce)
ceType: reflect.runtime.universe.Type = CaseEnglish.type

Let's look at the base classes for this type.

scala> val baseClasses = ceType.baseClasses
baseClasses: List[reflect.runtime.universe.Symbol] = List(object CaseEnglish, trait Serializable, trait Serializable, trait Product, trait Equals, trait DemoStuff, class Object, class Any)

I am still not sure why train Serializable is listed twice in these cases, but let's look at the symbol for DemoStuff, which is, as we know, the direct parent of our objects.

scala> val dsSymbol = baseClasses.filter(_.name.toString=="DemoStuff").head
res9: reflect.runtime.universe.Symbol = trait DemoStuff

Interesting to note that both ceType.typeSymbol and dsSymbol are seen as classes:

scala> ceType.typeSymbol.isClass
res12: Boolean = true

scala> dsSymbol.isClass
res14: Boolean = true

Apparently, there is a class implicitly defined for every object. We can get the symbols for the corresponding objects by asking for the companion of the original symbol, and see that one of them is considered to be a class and another isn't, also that one of them is described as "module" and another as "module class":

scala> val ceoSymbol = ceType.typeSymbol.companionSymbol
ceoSymbol: reflect.runtime.universe.Symbol = object CaseEnglish

scala> ceoSymbol.isClass
res16: Boolean = false

scala> ceoSymbol.isModule
res20: Boolean = true

scala> ceoSymbol.isModuleClass
res20: Boolean = false

scala> ceType.typeSymbol
res17: reflect.runtime.universe.Symbol = object CaseEnglish

scala> ceType.typeSymbol.isClass
res18: Boolean = true

scala> ceType.typeSymbol.isModule
res21: Boolean = false

scala> ceType.typeSymbol.isModuleClass
res21: Boolean = true

There is no companion symbol for a trait, however:

scala> dsSymbol.companionSymbol
res19: reflect.runtime.universe.Symbol = <none>

Now the question: if the dsSymbol is a class, can't we get its successors?

scala> val descendants = dsSymbol.asClass.knownDirectSubclasses
descendants: Set[reflect.runtime.universe.Symbol] = Set()

No luck? Now let's make the trait DemoStuff sealed, reload the example and do the same steps:

sealed trait DemoStuff {
  lazy val i = 0 
...(the rest of the code and the reflection steps are the same)...

scala> val dsSymbol = ru.typeOf[DemoStuff]
dsSymbol: reflect.runtime.universe.Type = DemoStuff

scala> val descendants = dsSymbol.asClass.knownDirectSubclasses
descendants: Set[reflect.runtime.universe.Symbol] = Set(object CaseEnglish, object CaseGerman, object NonCaseOne)

Yay, look what we've got! Just like with the good old enums, we can see them all, because for the sealed trait, all its direct descendants should be defined in the same module, so the parent class knows about them all.

Now suppose we want to instantiate all such objects. One way of using all this info would be to map the types and some inside information - e.g. val i in our case - so that we could mimic accessing the enums by value. Of course, if all elements are case objects, we can also just map the types to their string representations.

There is one right way and one wrong way to instantiate objects. Let's do the right way first.
In implies getting object companions and converting each of them from Symbol to ModuleSymbol:

scala> val descendantObjects = descendants map (m=> if (m.isModuleClass) m.companionSymbol else m) map (_.asModule)
descendantObjects: scala.collection.immutable.Set[reflect.runtime.universe.ModuleSymbol] = Set(object CaseEnglish, object CaseGerman, object NonCaseOne)

Finally, we can use the mirror to turn these symbols into instances:

scala> val reflectedObjects = descendantObjects map (m=>mirror.reflectModule(m).instance.asInstanceOf[DemoStuff])
reflectedObjects: scala.collection.immutable.Set[DemoStuff] = Set(CaseEnglish, CaseGerman, NonCaseOne$@74232853)

They do the right things, and they are equal to the "normally" created instances:

scala> reflectedObjects map (_.printMe)
res25: scala.collection.immutable.Set[String] = Set(Hello,1, Guten TaG,2, Bye,3)

scala> reflectedObjects.head == ce
res26: Boolean = true

Now, the wrong way! (DANGER!)
Suppose we go ahead with classes, not bothering to get the object companions, and try to instantiate them using constructors:

scala> val dsType = ru.typeOf[DemoStuff]
dsType: reflect.runtime.universe.Type = DemoStuff

scala> val descendants = dsType.typeSymbol.asClass.knownDirectSubclasses
descendants: Set[reflect.runtime.universe.Symbol] = Set(object CaseEnglish, object CaseGerman, object NonCaseOne)

scala> val ctors = descendants map (_.typeSignature.member(ru.nme.CONSTRUCTOR))
ctors: scala.collection.immutable.Set[reflect.runtime.universe.Symbol] = Set(constructor CaseEnglish, constructor CaseGerman, constructor NonCaseOne)

So let's reflect and instantiate these classes:

scala> val reflected = descendants map (m=>mirror.reflectClass(m.asClass))
reflected: scala.collection.immutable.Set[reflect.runtime.universe.ClassMirror] = Set(class mirror for CaseEnglish (bound to null), class mirror for CaseGerman (bound to null), class mirror for NonCaseOne (bound to null))

scala> val stuff = (reflected zip ctors) map (m=> m._1.reflectConstructor(m._2.asMethod))
stuff: scala.collection.immutable.Set[reflect.runtime.universe.MethodMirror] = Set(constructor mirror for CaseEnglish.<init>(): CaseEnglish.type (bound to null), constructor mirror for CaseGerman.<init>(): CaseGerman.type (bound to null), constructor mirror for NonCaseOne.<init>(): NonCaseOne.type (bound to null))

scala> val instances = stuff map (_.apply().asInstanceOf[DemoStuff])
instances: scala.collection.immutable.Set[DemoStuff] = Set(CaseEnglish, CaseGerman, NonCaseOne$@661a6677)

scala> instances map (_.printMe)
res1: scala.collection.immutable.Set[String] = Set(Hello,1, Guten TaG,2, Bye,3)

Looks about right? But... what if we compare the newly instantiated CaseEnglish with the our old friend val ce? They should be the same, right?

scala> instances.head == ce
res2: Boolean = false

OOPS. We managed to create those shadow classes for our, supposedly one and unique, case objects. Congratulations, what a bummer!

That concludes the story for now... but you'll never know what the future brings. Hope it was useful :)