Posted by: leppie | November 3, 2008

IronScheme does CPS!

After a week or two of struggling, IronScheme is slowly but surely getting CPS support that will enable full first-class continuations (i.e., they can be re-invoked multiple times).

Here is a little teaser:

> (+ 8 (call/cc (case-lambda [(c) (set! plus8 c) (c 5)])))
13
> (plus8 3)
11

Currently, this executes from the REPL without issues, but there are still a number of problem areas that need to be addressed.
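
For readers wondering why CPS makes this possible: once every procedure takes its continuation as an explicit argument, call/cc needs no stack magic at all. A minimal sketch of the idea (the convention and names are illustrative, not IronScheme's actual internals):

;; Assume every CPS'd procedure takes its continuation k as a first
;; argument. call/cc then just hands the user procedure f a reified
;; version of k: a closure that ignores its own continuation and
;; resumes k instead. Because that closure is an ordinary value, it
;; can be invoked any number of times -- which is why plus8 above
;; still works after the original expression has returned.
(define (call/cc-cps k f)
  (f k (lambda (ignored-k v) (k v))))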

Also note that with this version the entire Scheme source gets converted to CPS, which results in a bootfile roughly 2.5 times the previous size. Not sure about performance yet.
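
To give a feel for why the converted source grows, here is a tiny hand-converted example of what a CPS transform does (primitives like * and display are left in direct style for brevity; the actual transform IronScheme performs will differ in detail):

;; Direct style:
(define (square x)
  (* x x))
(display (square 5))

;; After a hand-done CPS transform: every procedure gains a
;; continuation parameter, and every non-primitive call passes one
;; explicitly.
(define (square-cps k x)
  (k (* x x)))
(square-cps (lambda (v) (display v)) 5)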

Update:

Now I need to figure out how to handle a CPS-compatible version of letrec/letrec*. Redefining map not to use letrec allows the following to work (a sketch of such a map follows the transcript):

> (call/cc 
  (lambda (outer) 
    (map 
      (lambda (x) 
        (call/cc 
          (lambda (inner) 
            (set! next inner) (outer x)))) 
      '(1 2 3))))
1
> (next values)
2
> (next values)
3
> (next values)
( ... garbage ... )
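
For illustration, a letrec-free map along the lines described above could look something like this (my sketch, single-list only, not the actual IronScheme definition):

;; Recurses through a top-level definition instead of an internal
;; letrec binding, so the CPS converter never sees a letrec here.
(define (map1 f lst)
  (if (null? lst)
      '()
      (cons (f (car lst)) (map1 f (cdr lst)))))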

Update 2:

It appears letrec/letrec* is not the problem, but rather values/call-with-values; that should be trivial to fix 🙂

Update 3:

Appears to be not so trivial after all! 😐

Update 4:

Damn you, Confucius! I was using the wrong identity function to pass as a continuation! All working now 🙂

Responses

  1. Congratulations Leppie!

  2. […] IronScheme does CPS! This was written by Grant. Posted on Monday, November 3, 2008, at 1:32 pm. Filed under Link. Tagged .NET, Scheme. Bookmark the permalink. Follow comments here with the RSS feed. Post a comment or leave a trackback. […]

  3. Thanks, still quite far from working correctly, but at least the 'difficult' bits are now integrated. Now to get rid of the parts where I 'cheated'.

    It is also pretty slow at the moment. While it did manage to build the bootfile, it took 6 times longer 😦

  4. ;; Can IronScheme handle this?

    (define (ev? n) (or (= n 0) (od? (- n 1))))

    (define (od? n) (and (> n 0) (ev? (- n 1))))

    (ev? 123456789)

  5. @George: Yes, it does handle tail calls correctly, your example does take a loooong time to run though.

  6. Great and very interesting.
    How is this implemented?
    I'm working on the PyPy JIT for CLI
    http://morepypy.blogspot.com/2008/11/porting-jit-to-cli-part-1.html
    and I found that usually .NET does a very bad job with code that doesn't follow the usual "C#-like" style (e.g., it has terrible performance for tail calls).

  7. @Antonio: Currently I use a modified version of the DLR (only about 15% of it) to help with code generation. Ideally, I would like to write a better, more suitable code generator for it.

    I have not seen performance penalties using tail calls. In fact, a CPS'd version of the naive factorial algorithm runs faster (and needs no stack space); a hand-converted sketch of what that looks like follows at the end of this comment. My biggest problem with tail calls is the lack of caller info available when debugging. In fact, without tail calls I just run out of stack space in CPS mode.

    My biggest cost (IMO) so far is closures, especially their associated environments, which can end up taking a lot of space depending on the 'depth'. Not only do they take up a huge amount of space (in terms of code generated), but access times are slow too.

    And thanks for the link 🙂
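
    Roughly, that CPS'd factorial looks like the sketch below (hand-converted for illustration; not the exact code IronScheme emits):

    ;; Direct style:
    (define (fact n)
      (if (= n 0)
          1
          (* n (fact (- n 1)))))

    ;; CPS'd by hand: the recursive call is now a tail call, so the
    ;; stack does not grow; the pending multiplications live in
    ;; continuation closures on the heap instead.
    (define (fact-cps k n)
      (if (= n 0)
          (k 1)
          (fact-cps (lambda (v) (k (* n v))) (- n 1))))

    ;; Example: (fact-cps (lambda (v) v) 10) => 3628800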

  8. @Antonio (again): Wow, very impressive numbers! I think I should start looking into using PyPy in some way or other (I did have a look at it about a year back).

  9. @leppie
    uhm, I'm surprised that tail calls work so well for you; I wonder if I did something wrong.
    I wrote a little benchmark that sums all the integers up to N using a loop, a normal recursive call and a tail call; the original C# source is here:
    http://codespeak.net/svn/user/antocuni/cli-bench/tailcall.cs

    then I patched the IL to add the tail. prefix to the call:
    http://codespeak.net/svn/user/antocuni/cli-bench/tailcall.il

    Here is the final .exe:
    http://codespeak.net/svn/user/antocuni/cli-bench/tailcall.exe

    With CLR 2.0.50727 the tail call is about 6.5 times *slower* than the normal call, which is by itself 2 times slower than the for loop.
    Could you try to run the benchmark, please?

  10. @Antonio:
    I have not tested it yet, but from the looks of it I do something different. Due to the dynamic requirements of Scheme, types should be checked inside the procedure, hence I always use object types for parameters and return values in the signature.

    I will fiddle a bit with your code to see if some implicit boxing is occurring on the return value.

  11. @Antonio: Thanks for the information; my results are the same as yours. I did not realize that tail calls on .NET behave this badly 😦

    In my tests with the code, tail calls were roughly 10 times slower. I tried a few variations; it seems using an object type for the return value gets the best performance, but it is still very slow!

    It also probably explains the 6-times slowdown I get with the CPS version of IronScheme, since in CPS every call should be a tail call.
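
    For reference, the three shapes the benchmark compares translate to Scheme roughly as follows (my sketch of the idea, not Antonio's C#/IL code):

    ;; 1. loop
    (define (sum-loop n)
      (do ((i 0 (+ i 1))
           (acc 0 (+ acc i)))
          ((> i n) acc)))

    ;; 2. non-tail recursion: the addition happens after the recursive
    ;; call returns, so the stack grows with n.
    (define (sum-rec n)
      (if (= n 0)
          0
          (+ n (sum-rec (- n 1)))))

    ;; 3. tail recursion: the recursive call is the last action, so it
    ;; can be emitted with the tail. prefix.
    (define (sum-tail n acc)
      (if (= n 0)
          acc
          (sum-tail (- n 1) (+ acc n))))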

  12. @leppie
    😦
    Ok, it seems that tail calls are indeed not an option for me at the moment. For Scheme it's even worse, because you *need* to use tail calls anyway, I suppose.

    FWIW, Mono behaves much better with tail calls: on my machine tail calls are about 30% faster than normal calls, so you may want to run your benchmarks there as well.

  13. I applied the NoInlining option to the recursive calls, and it seems tail calls are never inlined (same time), which is probably intentional, but that still makes the tail call 6 times slower 😦

  14. OTOH, that gives us an interesting problem to solve 🙂

  15. @Antonio:

    I was just browsing the F#-generated assemblies, and it seems they use tail calls wherever possible. I wonder how they deal with the performance impact. Perhaps .NET 4 will offer better performance in that respect 🙂

  16. Note that Mono's implementation of tail calls is broken: they only eliminate direct self and mutual tail recursion and not general tail calls, i.e. to dynamic locations. Consequently, many conventional FP idioms (e.g. combinators) will leak stack space until the stack overflows on Mono.

    • Hi Jon

      I kinda realised that, but after investigation it appears the problem is with virtual calls only. I have changed my emitted code to call delegates non-virtually (as a virtual call is not really needed according to the spec), but I have not tested tail recursion on Mono recently.

      How does F# cope on Mono? I see they use tail calls in all tail positions.

      Cheers

      leppie

