Thursday, November 14, 2013

Abusing Scala streams

Tried to run a simple experiment with the Sieve of Eratosphenes, defined in Scala course by Odersky, and find out how far are the limits for streams (which are, in fact, recursive functions). I expected that it will run out of stack at some point, instead it produced a funny error I've never seen before garbage collector gave up :)

scala> Welcome to Scala version 2.10.2 (OpenJDK 64-Bit Server VM, Java 1.7.0_25).

scala> def sieve(s: Stream[Int]): Stream[Int] =
     | 
     | s.head #:: sieve(s.tail filter (_ % s.head != 0))
sieve: (s: Stream[Int])Stream[Int]

scala> def from(n: Int): Stream[Int] = n #:: from(n+1)
from: (n: Int)Stream[Int]

scala> val primes = sieve(from(2))
primes: Stream[Int] = Stream(2, ?)

scala> primes.take(10).toList
res0: List[Int] = List(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)

scala> primes.take(100).toList
res1: List[Int] = List(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541)

scala> primes.take(1000).toList
res2: List[Int] = List(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947...
scala> primes.take(10000).toList
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.Throwable.getStackTraceElement(Native Method)
at java.lang.Throwable.getOurStackTrace(Throwable.java:826)
at java.lang.Throwable.printStackTrace(Throwable.java:655)
at java.lang.Throwable.printStackTrace(Throwable.java:720)
at scala.tools.nsc.util.package$$anonfun$stackTraceString$1.apply(package.scala:84)
at scala.tools.nsc.util.package$$anonfun$stackTraceString$1.apply(package.scala:84)
at scala.tools.nsc.util.package$.stringFromWriter(package.scala:73)
at scala.tools.nsc.util.package$.stackTraceString(package.scala:84)
at scala.tools.nsc.Global.throwableAsString(Global.scala:299)
at scala.tools.nsc.interpreter.ILoop$$anonfun$1.applyOrElse(ILoop.scala:533)
at scala.tools.nsc.interpreter.ILoop$$anonfun$1.applyOrElse(ILoop.scala:531)
at scala.runtime.AbstractPartialFunction$mcZL$sp.apply$mcZL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcZL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcZL$sp.apply(AbstractPartialFunction.scala:25)
at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:573)
at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:576)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:867)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:822)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:822)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:822)
at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:83)
at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:96)
at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:105)
at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)

I guess the reason there is that the list which is given to the next stream element is recomputed at every step.

Fibonacci stream (defined in Scaladocs for Stream) went somewhat better, it "simply" runs out of memory:

scala> import scala.math.BigInt
import scala.math.BigInt

scala> val fibs:Stream[BigInt] = BigInt(0) #:: BigInt(1) #:: fibs.zip(
     | fibs.tail).map(n=> {
     | n._1 + n._2})
fibs: Stream[scala.math.BigInt] = Stream(0, ?)

scala> fibs.take(10000)
res0: scala.collection.immutable.Stream[scala.math.BigInt] = Stream(0, ?)

scala> fibs.take(10000).toList
res1: List[scala.math.BigInt] = List(0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437, 701408733, 1134903170, 1836311903, 2971215073, 4807526976, 7778742049, 12586269025, 20365011074, 32951280099, 53316291173, 86267571272, 139583862445, 225851433717, 365435296162, 591286729879, 956722026041, 1548008755920, 2504730781961, 4052739537881, 6557470319842, 10610209857723, 17167680177565, 27777890035288, 44945570212853, 72723460248141, 117669030460994, 190392490709135, 308061521170129, 498454011879264, 806515533049393, 1304969544928657, 2111485077978050,...

scala> fibs.take(100000).toList
java.lang.OutOfMemoryError: Java heap space
at java.math.BigInteger.add(BigInteger.java:1075)
at java.math.BigInteger.add(BigInteger.java:1048)
at scala.math.BigInt.$plus(BigInt.scala:203)
at $anonfun$1$$anonfun$apply$1$$anonfun$apply$2.apply(<console>:10)
at $anonfun$1$$anonfun$apply$1$$anonfun$apply$2.apply(<console>:9)
at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:376)
at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:376)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1085)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1077)
at scala.collection.immutable.Stream$$anonfun$take$2.apply(Stream.scala:731)
at scala.collection.immutable.Stream$$anonfun$take$2.apply(Stream.scala:731)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1085)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1077)
at scala.collection.immutable.Stream.foreach(Stream.scala:548)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
at scala.collection.TraversableLike$class.to(TraversableLike.scala:629)
at scala.collection.AbstractTraversable.to(Traversable.scala:105)
at scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:257)
at scala.collection.AbstractTraversable.toList(Traversable.scala:105)
...

OK, let's look at the stream of the natural numbers. It seemed to survive longer, but as we reach certain limit things still crash horribly.

scala> def from(n:Int): Stream[Int] = n #:: from(n+1)
from: (n: Int)Stream[Int]

scala> val nats = from(0)
nats: Stream[Int] = Stream(0, ?)

scala> nats.take(10000)
res1: scala.collection.immutable.Stream[Int] = Stream(0, ?)

scala> nats.take(10000).toList
res2: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176,...
scala> nats.take(100000).toList
res3: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176,...
scala> nats.take(1000000).toList
res4: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176,...
scala> nats.take(10000000).toList
java.lang.OutOfMemoryError: Java heap space
at scala.collection.immutable.Stream.take(Stream.scala:731)
at scala.collection.immutable.Stream$$anonfun$take$2.apply(Stream.scala:731)
at scala.collection.immutable.Stream$$anonfun$take$2.apply(Stream.scala:731)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1085)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1077)
at scala.collection.immutable.Stream.foreach(Stream.scala:548)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
at scala.collection.TraversableLike$class.to(TraversableLike.scala:629)
at scala.collection.AbstractTraversable.to(Traversable.scala:105)
at scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:257)
at scala.collection.AbstractTraversable.toList(Traversable.scala:105)
...

scala> nats.slice(10000000,10000010).toList
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Neither will it work if we pretend that only the necessary part is evaluated (underlying List will have to go through all preceding elements anyway), and again, failing at the Garbage Collector level:

scala> nats.take(10000000).slice(9999990,10000000).toList
error: 
     while compiling: <console>
        during phase: erasure
     library version: version 2.10.2
    compiler version: version 2.10.2
  reconstructed args: -nowarn

  last tree to typer: This(object $eval)
              symbol: object $eval in package $line15 (flags: <module>)
   symbol definition: class $eval extends Object
                 tpe: type
       symbol owners: object $eval -> package $line15
      context owners: package <root>

== Enclosing template or block ==

Import( // val <import>: ImportType(scala)
  "scala" // final package scala, tree.tpe=scala.type
  List(
    ImportSelector(_,-1,null,-1)
  )
)

== Expanded type of tree ==

ThisType(object $eval)

uncaught exception during compilation: java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: GC overhead limit exceeded
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:745)
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:790)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:702)
at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:566)
at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:573)
at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:576)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:867)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:822)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:822)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:822)
at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:83)
at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:96)
at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:105)
at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)


By the way, at that point the console compiler is already unusable (and it doesn't realize that it was not this but the previous instruction that sent it to ashes) - apparently Garbage Collector failing is almost the last line of defense:
scala> from(10000000).take(10).toList
error: 
     while compiling: <console>
        during phase: global=terminal, atPhase=jvm
     library version: version 2.10.2
    compiler version: version 2.10.2
  reconstructed args: 

  last tree to typer: Apply(method toList)
              symbol: method toList in trait TraversableOnce (flags: <method> <triedcooking>)
   symbol definition: def toList(): List
                 tpe: List
       symbol owners: method toList -> trait TraversableOnce -> package collection
      context owners: object iw -> package $line16

== Enclosing template or block ==

Template( // val <local $iw>: <notype>, tree.tpe=type
  "java.lang.Object" // parents
  ValDef(
    private
    "_"
    <tpt>
    <empty>
  )
  DefDef( // def <init>(): type
    <method>
    "<init>"
    []
    List(Nil)
    <tpt> // tree.tpe=type
    Block( // tree.tpe=Unit
      Apply( // def <init>(): Object in class Object, tree.tpe=Object
        $read$super."<init>" // def <init>(): Object in class Object, tree.tpe=()Object
        Nil
      )
      ()
    )
  )
)

== Expanded type of tree ==

TypeRef(
  TypeSymbol(
    sealed abstract class List extends collection.AbstractSeq with collection.immutable.LinearSeq with Product with collection.generic.GenericTraversableTemplate with collection.LinearSeqOptimized
    
  )
)

uncaught exception during compilation: java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: GC overhead limit exceeded

That entry seems to have slain the compiler.  Shall I replay
your session? I can re-run each line except the last one.
[y/n]

even though normally, this solution would just work:
scala> def from(n:Int): Stream[Int] = n #:: from(n+1)
from: (n: Int)Stream[Int]

scala> from(10000000).take(10).toList
res0: List[Int] = List(10000000, 10000001, 10000002, 10000003, 10000004, 10000005, 10000006, 10000007, 10000008, 10000009)

And while we are at that... normal (non-stream) List seems to do a better job on the large sequences of numbers:

scala> (1 to 10000000)
res1: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170...
scala> res1.slice(9999990,10000000)
res2: scala.collection.immutable.IndexedSeq[Int] = Vector(9999991, 9999992, 9999993, 9999994, 9999995, 9999996, 9999997, 9999998, 9999999, 10000000)

We can go until we hit the "real" limit:
scala> (1 to 1000000000)
res6: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170...
scala> res6.slice(999999000,1000000000)
res7: scala.collection.immutable.IndexedSeq[Int] = Vector(999999001, 999999002, 999999003, 999999004, 999999005, 999999006, 999999007, 999999008, 999999009, 999999010, 999999011, 999999012, 999999013, 999999014, 999999015, 999999016, 999999017, 999999018, 999999019, 999999020, 999999021, 999999022, 999999023, 999999024, 999999025, 999999026, 999999027, 999999028, 999999029, 999999030, 999999031, 999999032, 999999033, 999999034, 999999035, 999999036, 999999037, 999999038, 999999039, 999999040, 999999041, 999999042, 999999043, 999999044, 999999045, 999999046, 999999047, 999999048, 999999049, 999999050, 999999051, 999999052, 999999053, 999999054, 999999055, 999999056, 999999057, 999999058, 999999059, 999999060, 999999061, 999999062, 999999063, 999999064, 999999065, 999999066, 999999067, 99...
scala> (1 to 10000000000)
<console>:1: error: integer number too large
       (1 to 10000000000)
             ^

scala> (1L to 10000000000L)
java.lang.IllegalArgumentException: 1 to 10000000000 by 1: seqs cannot contain more than Int.MaxValue elements.
at scala.collection.immutable.NumericRange$.count(NumericRange.scala:249)
at scala.collection.immutable.NumericRange.numRangeElements$lzycompute(NumericRange.scala:53)
at scala.collection.immutable.NumericRange.numRangeElements(NumericRange.scala:52)
at scala.collection.immutable.NumericRange.length(NumericRange.scala:55)
at scala.collection.immutable.NumericRange.toString(NumericRange.scala:208)
at scala.runtime.ScalaRunTime$.scala$runtime$ScalaRunTime$$inner$1(ScalaRunTime.scala:321)
at scala.runtime.ScalaRunTime$.stringOf(ScalaRunTime.scala:333)
at scala.runtime.ScalaRunTime$.replStringOf(ScalaRunTime.scala:341)
...

Conclusion is as usual: magic should be applied where it's applicable :-}


Monday, August 26, 2013

Slick: 5 seconds of hate

Simple reminder why I hate ORM. Some daily experience from trying to make Slick doing something more useful than the examples they generously provide in the very incomplete and hazy documentation (I just wanted to count the rows in some query):

Warning #1:
method count in class ColumnExtensionMethods is deprecated: Use Query.count instead

After some code changes and more esoteric responses from the compiler, I get this:

Warning #2:
method count in class Query is deprecated: Use .length instead of .count

LOL

++ to add more misery to insult, I can't print out the SQL statement that this thing generates to get the row count, because it's now a method of a query! (Of course, there are always the database logs...)

Argh.

Update:  as a result of this rant, I had a quick twitter conversation with , who explained that myQuery.length.run will give the result I want, and Query(myQuery.length).selectStatement would produce the SQL, and that in Slick 2.0.0, that's coming out somewhere in September, it will be possible to write myQuery.length.selectStatement directly. That is nice, except that Slick 2.0.0 won't be backwards compatible, which probably means that the ongoing projects might not be able to benefit from the new library, unless people are already using alpha version of 2.0.0 (which we don't).

Nevertheless, good to know that someone's working on the problem :-}

Saturday, August 10, 2013

Scala, lists, vectors and mergesort

Today I noticed the following example of mergesort, quoted in the online book "Scala by example" (9.3 Example: Merge sort, page 69):

def msort[A](less: (A, A) => Boolean)(xs: List[A]): List[A] = {
    def merge(xs1: List[A], xs2: List[A]): List[A] =
        if (xs1.isEmpty) xs2
        else if (xs2.isEmpty) xs1
        else if (less(xs1.head, xs2.head)) xs1.head :: merge(xs1.tail, xs2)
        else xs2.head :: merge(xs1, xs2.tail)
    val n = xs.length/2
    if (n == 0) xs
    else merge(msort(less)(xs take n), msort(less)(xs drop n))
}

Further the authors say that the complexity of this algorithm is O(N log (N)), which makes it an attractive option for sorting lists.

Being in an inquisitive mood, I couldn't help wondering how big can in practice be a list sorted with such algorithm. (My current machine is 64-bit Ubuntu with 6 GB RAM and 8 cores)

So here we go.

scala> import scala.util.Random
scala> val randomNums = Seq.fill(1000)(Random.nextInt)
scala> msort((x:Int,y:Int)=>x<y)(randomNums.toList)

res27: List[Int] = List(-2145589793, -2143814602, -2143330861, -2142630038, -2136184047, -2135476301, -2132371275, -2129131922, -2123018872, -2120069375, -2118819114, -2117572993, -2112618055, -2102489605, -2096417279, -2095978655, -2095806565, -2087860343, -2087105558, -2085005596, -2083360810, -2077234330, -2065393243, -2058765966, -2056823240, -2053145149, -2047696716, -2044737011, -1847777706, -18259...

OK, for 1000 elements it works, for 10000 it works too. But let's push one more order of magnitude...

scala> val randomNums = Seq.fill(100000)(Random.nextInt).toList
randomNums: List[Int] = List(186865346, 638415825, -637620864, 220723809, -1536234831, 1710286185, 126091472, -1621728642, -1819749330, -294195052, -613979926, 1278841478, -111715804, -1953497441, 1891679544, 582175290, 1555531003, -430520072, 652471392, 1211722008, -112446234, -1900260621, 2058382521, 564201400, -1225275015, 2069052362, 797097978, 1077363576, 1469066877, -303059738, -166855116, -1876385701, 285630983, -1956550564, -1991336959, -1713232594, -868759609, -723403847, -282664963, 1965397484, 1563483549, -618177790, -297223307, -197365661, -703715983, 28207094, -1793590690, 374050582, 992041027, -1931269739, -932512120, -1551657371, 1523463808, -742246427, -1665172973, 50892779, -286029416, -1654054925, -874783455, -1825744857, -571856180, 289326103, -215127347, -1488600483,...
scala> msort((x:Int,y:Int)=>x<y)(randomNums)
java.lang.StackOverflowError
at .merge$1(<console>:20)
at .merge$1(<console>:22)
at .merge$1(<console>:20)
at .merge$1(<console>:20)
at .merge$1(<console>:20)
at .merge$1(<console>:20)
at .merge$1(<console>:22)
        ...
Oi wei, we ran out of stack!
The reasons are obvious: this line of code

xs2.head :: merge(xs1, xs2.tail)

means that all the time while you are busy merging the partial lists, there are orphan elements hanging around, waiting until they are allowed to be the head. And the stack isn't made of rubber!

Well... there is some sport, of course, could we make it at least work? Using the tail recursion, perhaps? (If you have forgotten, tail recursion happens when the recursive function ends with calling itself. Then the compiler can be smart enough to make a cycle instead of the sequence of the recursive calls, at least if you ask it politely :) )

import scala.annotation.tailrec
...
@tailrec
def merge(xs1: List[A], xs2: List[A]): List[A] =
   if (xs1.isEmpty) xs2
       else if (xs2.isEmpty) xs1
       else if (less(xs1.head, xs2.head)) xs1.head :: merge(xs1.tail, xs2)
       else xs2.head :: merge(xs1, xs2.tail)

However, this won't compile, and rightly so:

[error] 
<filename>: could not optimize @tailrec annotated method merge: it contains a recursive call not in tail position
[error] else xs2.head :: merge(xs1, xs2.tail)

Ah right. Operator :: is right associative, that means it's called after merge(arg1, arg2) was called, therefore tail recursion isn't possible.

Well, without further ado, that is the variant I have ended up with. It worked in the sense that it didn't break - there was now a cycle - but for the list with 100000 elements it took about 5 minutes to do the sorting!

import scala.math.Ordering.Implicits._ 
import scala.util.Random 
import scala.annotation.tailrec

object Sorting {
def less[T:Ordering](x:T,y:T) = x<y
@tailrec
def merge[A:Ordering ](xs0: List[A], xs1: List[A], xs2: List[A]): List[A] =
if (xs1.isEmpty) if (xs0 == Nil) xs2 else xs0 ::: xs2
else if (xs2.isEmpty) if (xs0 == Nil) xs1 else xs0 ::: xs1
else if (less(xs1.head, xs2.head)) merge(xs0 :+ xs1.head, xs1.tail, xs2)
else merge(xs0 :+ xs2.head, xs1, xs2.tail)

def msort[A:Ordering](xs: List[A]): List[A] = {
val n = xs.length/2
if (n == 0) xs
else merge(List[A](), msort(xs take n), msort(xs drop n))
}
def checkSorted[A:Ordering](x:A, xs:List[A]):Boolean = xs match {
case Nil => true
case head::tail => if (head < x) false else checkSorted(head, tail)
}
def test(size:Int, printAll:Boolean = true) = {
val randomNums = Seq.fill(size)(Random.nextInt).toList
val sorted = msort(randomNums)
if (printAll) println(sorted)
if (!checkSorted(sorted.head, sorted.tail)) println("Not sorted!!!")
}
}

The reason why this code is slow is in the fact that now we have added the accumulator, and we want to add elements to the end of it. However, List (as also mentioned here) only guarantees the quick access to the head - the complexity of accessing the last element would be proportional to the size of the list!

So is there anything to do to make it faster, without trying hard to imagine some other algorithm? My solution was to use Vectors instead of Lists.

object Sorting {
def less[T:Ordering](x:T,y:T) = x<y
@tailrec
def merge[A:Ordering ](xs0: Vector[A], xs1: Vector[A], xs2: Vector[A]): Vector[A] =
if (xs1.isEmpty) if (xs0 == Nil) xs2 else xs0 ++ xs2
else if (xs2.isEmpty) if (xs0 == Nil) xs1 else xs0 ++ xs1
else if (less(xs1.head, xs2.head)) merge(xs0 :+ xs1.head, xs1.tail, xs2)
else merge(xs0 :+ xs2.head, xs1, xs2.tail)

def msort[A:Ordering](xs: Vector[A]): Vector[A] = {
val n = xs.length/2
if (n == 0) xs
else merge(Vector[A](), msort(xs take n), msort(xs drop n))
}
def checkSorted[A:Ordering](x:A, xs:List[A]):Boolean = xs match {
case Nil => true
case head::tail => if (head < x) false else checkSorted(head, tail)
}
def test(size:Int, printAll:Boolean = true) = {
val randomNums = Seq.fill(size)(Random.nextInt).toVector
val sortedVec = msort(randomNums)
if (printAll) println(sortedVec)
val sorted = sortedVec.toList
if (!checkSorted(sorted.head, sorted.tail)) println("Not sorted!!!")
}
}

Vectors guarantee constant access to any element they contain. It is probably slightly slower than accessing the head of the list, but for the tasks like this one it seems to be a definitely better option. It took very little time to sort both 100000 and 1000000 vectors of random integers. With 10000000, however, it still takes quite some time and apparently more memory, but 10 millions is already more than the amount of rows in many decent tables :-)

I hope it was useful, or at least that you did rid to the end. I still want to write a C++ implementation and see which one finishes first. So may be there will be the part two...

Sunday, August 4, 2013

A Voyage into the Scala Reflection land, part 2: case or no case

Let's make a little module (using :paste mode of the console or by compiling it separately and importing the corresponding module) - it looks a bit weird but suitable enough for what I want to demonstrate:

trait DemoStuff {
  lazy val i = 0 
  lazy val s = ""
  lazy val printMe = s"${s},${i}"
}

case object CaseEnglish extends DemoStuff {
  override lazy val i = 1 
  override lazy val s = "Hello"
}

case object CaseGerman extends DemoStuff {
  override lazy val i = 2 
  override lazy val s = "Guten TaG"
}

object NonCaseOne extends DemoStuff {
  override lazy val i = 3 
  override lazy val s = "Bye"
}

(Regarding the usage of lazy vals, I was inspired by this explanation).

In theory, both "normal" objects and case objects in Scala are supposed to be singletons. However, as far as serialization is concerned, it seems that case objects are more singletons than their non-case siblings

scala> val nco = NonCaseOne
nco: NonCaseOne.type = NonCaseOne$@74232853

scala> val ce = CaseEnglish
ce: CaseEnglish.type = CaseEnglish

This mimics the same conventions for the classes versus case classes.

Let's go on with reflection. Assume that, like in the previous case, we imported the universe, the current mirror and defined the helper to get the type tag for us.

scala> val ceType = getTypeTag(ce)
ceType: reflect.runtime.universe.Type = CaseEnglish.type

Let's look at the base classes for this type.

scala> val baseClasses = ceType.baseClasses
baseClasses: List[reflect.runtime.universe.Symbol] = List(object CaseEnglish, trait Serializable, trait Serializable, trait Product, trait Equals, trait DemoStuff, class Object, class Any)

I am still not sure why train Serializable is listed twice in these cases, but let's look at the symbol for DemoStuff, which is, as we know, the direct parent of our objects.

scala> val dsSymbol = baseClasses.filter(_.name.toString=="DemoStuff").head
res9: reflect.runtime.universe.Symbol = trait DemoStuff

Interesting to note that both ceType.typeSymbol and dsSymbol are seen as classes:

scala> ceType.typeSymbol.isClass
res12: Boolean = true

scala> dsSymbol.isClass
res14: Boolean = true

Apparently, there is a class implicitly defined for every object. We can get the symbols for the corresponding objects by asking for the companion of the original symbol, and see that one of them is considered to be a class and another isn't, also that one of them is described as "module" and another as "module class":

scala> val ceoSymbol = ceType.typeSymbol.companionSymbol
ceoSymbol: reflect.runtime.universe.Symbol = object CaseEnglish

scala> ceoSymbol.isClass
res16: Boolean = false

scala> ceoSymbol.isModule
res20: Boolean = true

scala> ceoSymbol.isModuleClass
res20: Boolean = false

scala> ceType.typeSymbol
res17: reflect.runtime.universe.Symbol = object CaseEnglish

scala> ceType.typeSymbol.isClass
res18: Boolean = true

scala> ceType.typeSymbol.isModule
res21: Boolean = false

scala> ceType.typeSymbol.isModuleClass
res21: Boolean = true

There is no companion symbol for a trait, however:

scala> dsSymbol.companionSymbol
res19: reflect.runtime.universe.Symbol = <none>

Now the question: if the dsSymbol is a class, can't we get its successors?

scala> val descendants = dsSymbol.asClass.knownDirectSubclasses
descendants: Set[reflect.runtime.universe.Symbol] = Set()

No luck? Now let's make the trait DemoStuff sealed, reload the example and do the same steps:

sealed trait DemoStuff {
  lazy val i = 0 
...(the rest of the code and the reflection steps are the same)...

scala> val dsSymbol = ru.typeOf[DemoStuff]
dsSymbol: reflect.runtime.universe.Type = DemoStuff

scala> val descendants = dsSymbol.asClass.knownDirectSubclasses
descendants: Set[reflect.runtime.universe.Symbol] = Set(object CaseEnglish, object CaseGerman, object NonCaseOne)

Yay, look what we've got! Just like with the good old enums, we can see them all, because for the sealed trait, all its direct descendants should be defined in the same module, so the parent class knows about them all.

Now suppose we want to instantiate all such objects. One way of using all this info would be to map the types and some inside information - e.g. val i in our case - so that we could mimic accessing the enums by value. Of course, if all elements are case objects, we can also just map the types to their string representations.

There is one right way and one wrong way to instantiate objects. Let's do the right way first.
In implies getting object companions and converting each of them from Symbol to ModuleSymbol:

scala> val descendantObjects = descendants map (m=> if (m.isModuleClass) m.companionSymbol else m) map (_.asModule)
descendantObjects: scala.collection.immutable.Set[reflect.runtime.universe.ModuleSymbol] = Set(object CaseEnglish, object CaseGerman, object NonCaseOne)

Finally, we can use the mirror to turn these symbols into instances:

scala> val reflectedObjects = descendantObjects map (m=>mirror.reflectModule(m).instance.asInstanceOf[DemoStuff])
reflectedObjects: scala.collection.immutable.Set[DemoStuff] = Set(CaseEnglish, CaseGerman, NonCaseOne$@74232853)

They do the right things, and they are equal to the "normally" created instances:

scala> reflectedObjects map (_.printMe)
res25: scala.collection.immutable.Set[String] = Set(Hello,1, Guten TaG,2, Bye,3)

scala> reflectedObjects.head == ce
res26: Boolean = true

Now, the wrong way! (DANGER!)
Suppose we go ahead with classes, not bothering to get the object companions, and try to instantiate them using constructors:

scala> val dsType = ru.typeOf[DemoStuff]
dsType: reflect.runtime.universe.Type = DemoStuff

scala> val descendants = dsType.typeSymbol.asClass.knownDirectSubclasses
descendants: Set[reflect.runtime.universe.Symbol] = Set(object CaseEnglish, object CaseGerman, object NonCaseOne)

scala> val ctors = descendants map (_.typeSignature.member(ru.nme.CONSTRUCTOR))
ctors: scala.collection.immutable.Set[reflect.runtime.universe.Symbol] = Set(constructor CaseEnglish, constructor CaseGerman, constructor NonCaseOne)

So let's reflect and instantiate these classes:

scala> val reflected = descendants map (m=>mirror.reflectClass(m.asClass))
reflected: scala.collection.immutable.Set[reflect.runtime.universe.ClassMirror] = Set(class mirror for CaseEnglish (bound to null), class mirror for CaseGerman (bound to null), class mirror for NonCaseOne (bound to null))

scala> val stuff = (reflected zip ctors) map (m=> m._1.reflectConstructor(m._2.asMethod))
stuff: scala.collection.immutable.Set[reflect.runtime.universe.MethodMirror] = Set(constructor mirror for CaseEnglish.<init>(): CaseEnglish.type (bound to null), constructor mirror for CaseGerman.<init>(): CaseGerman.type (bound to null), constructor mirror for NonCaseOne.<init>(): NonCaseOne.type (bound to null))

scala> val instances = stuff map (_.apply().asInstanceOf[DemoStuff])
instances: scala.collection.immutable.Set[DemoStuff] = Set(CaseEnglish, CaseGerman, NonCaseOne$@661a6677)

scala> instances map (_.printMe)
res1: scala.collection.immutable.Set[String] = Set(Hello,1, Guten TaG,2, Bye,3)

Looks about right? But... what if we compare the newly instantiated CaseEnglish with the our old friend val ce? They should be the same, right?

scala> instances.head == ce
res2: Boolean = false

OOPS. We managed to create those shadow classes for our, supposedly one and unique, case objects. Congratulations, what a bummer!

That concludes the story for now... but you'll never know what the future brings. Hope it was useful :)

Tuesday, July 30, 2013

Voyage into the Scala reflection land, part 1.

Recently I have played a little with Scala reflection capabilities. Hence I would like to share what I have learned.

Welcome to Scala version 2.10.2 (OpenJDK 64-Bit Server VM, Java 1.7.0_21).
Type in expressions to have them evaluated.
Type :help for more information.

scala> import reflect.runtime.{universe=>ru}
import reflect.runtime.{universe=>ru}
scala> import reflect.runtime.{currentMirror=>mirror}

First step: import the "experimental" reflection library and assign a tag to it (this way it's easy to see in code where this library has been used. Also, import the "root" mirror through which you, like Alice, can enter the reflection world. Another way to obtain the "root" mirror would be by calling:

scala> val otherRootMirror = ru.runtimeMirror(getClass.getClassLoader)
otherRootMirror: reflect.runtime.universe.Mirror = JavaMirror with scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@59a10521 of type class scala.tools.nsc.interpreter.IMain$TranslatingClassLoader with classpath [(memory)] and parent being scala.tools.nsc.util.ScalaClassLoader$URLClassLoader@5a57e77f of type class scala.tools.nsc.util.ScalaClassLoader$URLClassLoader with classpath [file:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/resources.jar,file:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/rt.jar,file:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/jsse.jar,file:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/jce.jar,file:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/charsets.jar,file:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/rhino.jar,file:/usr/local/share/scala/scala-2.10.2/lib/akka-actors.jar,f...


Now, let's do an easy thing: define a case class and inspect its instance with reflection.

scala> case class ReflectDemo(intVal: Integer, stringVal:String) {
     | def multiply(byThat: Int): Int = intVal*byThat
     | def reverse(): String = stringVal.reverse
     | }
defined class ReflectDemo

scala> val rd = ReflectDemo(5, "five")
rd: ReflectDemo = ReflectDemo(5,five)

Let's get the type symbol for the object we are using. The docs about the Scala reflection suggest defining a helper in the following way:

scala> def getTypeTag[T:ru.TypeTag](obj:T) = ru.typeOf[T]
getTypeTag: [T](obj: T)(implicit evidence$1: reflect.runtime.universe.TypeTag[T])reflect.runtime.universe.Type

Now we can get the type tag of our object!

scala> val typeTag = getTypeTag(rd)
typeTag: reflect.runtime.universe.Type = ReflectDemo

It opens a lot of possibilities...

scala> typeTag.
=:=                 asInstanceOf        asSeenFrom          baseClasses         baseType            contains            
declaration         declarations        erasure             exists              find                foreach             
isInstanceOf        map                 member              members             normalize           substituteSymbols   
substituteTypes     takesTypeArgs       termSymbol          toString            typeConstructor     typeSymbol          
weak_<:<            widen               

For example, we can use it to get all the base classes for this type:

scala> typeTag.baseClasses
res10: List[reflect.runtime.universe.Symbol] = List(class ReflectDemo, trait Serializable, trait Serializable, trait Product, trait Equals, class Object, class Any)

We can get the type name:

scala> typeTag.typeSymbol.name
res18: reflect.runtime.universe.Name = ReflectDemo

And all the members (defined in all base classes up to the very generic ones:

scala> typeTag.members
res8: reflect.runtime.universe.MemberScope = Scopes(method equals, method toString, method hashCode, method canEqual, method productIterator, method productElement, method productArity, method productPrefix, method copy$default$2, method copy$default$1, method copy, method reverse, method multiply, constructor ReflectDemo, value stringVal, value stringVal, value intVal, value intVal, method $init$, method $asInstanceOf, method $isInstanceOf, method synchronized, method ##, method !=, method ==, method ne, method eq, constructor Object, method notifyAll, method notify, method clone, method getClass, method wait, method wait, method wait, method finalize, method asInstanceOf, method isInstanceOf, method !=, method ==)

The other method, declarations, seems to omit the generics:

scala> typeTag.declarations
res9: reflect.runtime.universe.MemberScope = SynchronizedOps(value intVal, value intVal, value stringVal, value stringVal, constructor ReflectDemo, method multiply, method reverse, method copy, method copy$default$1, method copy$default$2, method productPrefix, method productArity, method productElement, method productIterator, method canEqual, method hashCode, method toString, method equals)

In any case, we can use one of these to get the list of all the variables defined in the object of this type:

scala> val variables = typeTag.members.filter(!_.isMethod)
variables: Iterable[reflect.runtime.universe.Symbol] = SynchronizedOps(value stringVal, value intVal)

Now, back to the mirrors and stuff. Let's use the root mirror to get the InstanceMirror for the runtime object which we are inspecting.

scala> val instanceMirror = mirror.reflect(rd)
instanceMirror: reflect.runtime.universe.InstanceMirror = instance mirror for ReflectDemo(5,five)

We can use this mirror to reflect any of the variables belonging to that object, using the Symbols we obtained earlier:

scala> variables.map(m=>instanceMirror.reflectField(m.asTerm))
res12: Iterable[reflect.runtime.universe.FieldMirror] = List(field mirror for ReflectDemo.stringVal (bound to ReflectDemo(5,five)), field mirror for ReflectDemo.intVal (bound to ReflectDemo(5,five)))

Ultimately, we can obtain the values for these variables:

scala> val values = variables.map(m=>instanceMirror.reflectField(m.asTerm).get)
values: Iterable[Any] = List(five, 5)

Also, we can use the same Symbols to get access to the variables' types:

scala> val types = variables.map(_.typeSignature)
types: Iterable[reflect.runtime.universe.Type] = List(String, java.lang.Integer)

Now, if we combine all this, we get the whole definition of the variable's properties:

scala> (variables zip( types zip values)).toMap
res17: scala.collection.immutable.Map[reflect.runtime.universe.Symbol,(reflect.runtime.universe.Type, Any)] = Map(value stringVal -> (String,five), value intVal -> (java.lang.Integer,5))

This can be used, for example, if you get a fancy to create a wrapper for persisting the given object into the database based solely on its structure. You can use the type information to create the table structure and then use the values of the given objects to populate the rows. (I did something like that for someone recently, as an experimental prototype. It did work :) ).

Update: How about creating new instances of classes during reflection? Well, let's try to create a new instance of our ReflectDemo class.

The original input, then, would be the type and the map of variables, where the names point to the values (in principle, we could also reflect the type of every variable and use the type info to check if the variables match the expectations of the constructor...)

scala> val values = Map("intVal"->5, "stringVal"->"five")
values: scala.collection.immutable.Map[String,Any] = Map(intVal -> 5, stringVal -> five)

scala> val rdType = ru.typeOf[ReflectDemo]
rdType: reflect.runtime.universe.Type = ReflectDemo

Let's reflect the class:

scala> val rdClass = mirror.reflectClass(rdType.typeSymbol.asClass)
rdClass: reflect.runtime.universe.ClassMirror = class mirror for ReflectDemo (bound to null)

The result is of the type ClassMirror and it can be used to invoke the constructor for the given class.
We can get all constructors for a class from its Type information:

scala> val constructors = rdType.members.filter(m=>m.isMethod && m.asMethod.isConstructor)
constructors: Iterable[reflect.runtime.universe.Symbol] = SynchronizedOps(constructor ReflectDemo, method $init$, constructor Object)

In our case, we know we only have one constructor, and we can also access it using the predefined name tag:

scala> val defaultCtor = rdType.member(ru.nme.CONSTRUCTOR)
defaultCtor: reflect.runtime.universe.Symbol = constructor ReflectDemo

Let's have a look which parameters it expects, now that's the piece of magic you just have to know (took me some research :) ) :

scala> val paramsList = defaultCtor.asMethod.paramss
paramsList: List[List[reflect.runtime.universe.Symbol]] = List(List(value intVal, value stringVal))

I suppose that this reflects the fact that one can invoke the constructor with various sets of parameters.
In our case we only have one constructor, so let's to the easy way. If there would be more than one constructor, we would have to use the name and type information from the available values in order to find out the suitable one, which is the exercise I leave for the curious reader (as such things go :) )

scala> val params = paramsList.head

params: List[reflect.runtime.universe.Symbol] = List(value intVal, value stringVal)
scala> val mappedValues = params map (m=>values(m.name.toString))
mappedValues: List[Any] = List(5, five)

Now we are almost there. Use the ClassMirror to reflect the constructor:

scala> val runtimeCtor = rdClass.reflectConstructor(defaultCtor.asMethod)
runtimeCtor: reflect.runtime.universe.MethodMirror = constructor mirror for ReflectDemo.<init>(intVal: java.lang.Integer, stringVal: String): ReflectDemo (bound to null)

MethodMirror provides the method apply(args: Any*) which allows to supply a sequence of any kind of values. However, it won't accept the List, because it expects a Seq, so it will consider the List to be a single parameter, and complain loudly:

scala> runtimeCtor.apply(mappedValues)
java.lang.IllegalArgumentException: wrong number of arguments
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at scala.reflect.runtime.JavaMirrors$JavaMirror$JavaConstructorMirror.apply(JavaMirrors.scala:444)
....

But (that's another little piece of magic) one can easily make a sequence from the List:

scala> val newRD = runtimeCtor.apply(mappedValues:_*)
newRD: Any = ReflectDemo(5,five)

Remember the instance of the same class we had in the beginning? Let's compare them and prove that we could recreate the same instance!

scala> rd == newRD
res24: Boolean = true

Yay! 

Well, that's enough for now. Next topic (if I get to that): the specifics of case classes and objects, and the dangers of their instantiating via reflection :) By the way, in this post, the case class could also be the normal class - nothing would have changed except that you'd create it with New. The ways you instantiate (case) objects and (case) classes via reflection, though, are very different, and should be used with some care! To be continued :)