Ola Bini's complete blog can be found at: http://olabini.com/blog


Saturday, February 11, 2012

For the last few years the expressiveness of programming languages has been on my mind. There are many things that come into consideration for expressiveness, no matter what definition you actually end up using. However, what I’ve been thinking about lately is syntax. There’s a lot of talk about syntax and many opinions. What made me start thinking more about it lately was a few blog posts I read that annoyed me a bit. So I thought it was time to put out some of my thoughts on syntax here.

I guess the first question to answer is whether syntax matters for a programming language. The traditional computer science view is largely that syntax doesn’t matter. And in a reductionist, system level view of the world this is understandable. However, you also have the opposite view which comes strongly into effect especially when talking about learning a new language, but also for reading existing code. At that point many people are of the opinion that syntax is extremely important.

The way I approach the question is based on programming language design. What can I do when designing a language to make it more expressive for as many users as possible? To me, syntax plays a big part in this. I am not saying that a language should be designed with a focus on syntax or even with syntax first. But the language syntax is the user interface for a programmer, and as such there are many aspects of the syntax that should help a programmer. Help them with what? Well, understanding for one. Reading. Communicating. I suspect that writing is not something we’re very much interested in optimizing for in syntax, but that’s OK. Typing fewer characters doesn’t actually optimize for writing either - the intuition behind that statement is quite easy: imagine you had to write a book. However, instead of writing it in English, you just wrote the gzipped version of the book directly. You would definitely have to type much less - but would that in any way help you write the book? No, probably it would make it harder. So typing I definitely don’t want to optimize. However, I would like to make it easy for a programmer to express an idea as concisely as they can. To me, this is about mentioning all things that are relevant, without mentioning irrelevant things. But incidentally, a syntax with that property is probably going to be easier to communicate with, and also to read, so I don’t think focusing on writing at all is the right thing to do.

Fundamentally, programming is about building abstractions. We are putting together extremely intricate mind castles and then trying to express them in such a way that our computers will realize them. Concepts, abstractions - and manipulating and communicating them - are the pieces underlying programming languages, and it’s really what all languages must do in some way. A syntax that makes it easier to think about hard abstractions is a syntax that will make it easier to write good and robust programs. If we talk about the Sapir-Whorf hypothesis and linguistic relativity, I suspect that programmers have an easier time reasoning about a problem if their language choice makes those abstractions clearer. And syntax is one way of making that process easier. Simply put, the things we manipulate with programming languages are hard to think about, and good syntax can improve that.

Seeing as we are talking about reading - who is this person reading? It makes a huge difference if we’re trying to design something that should be easy to read for a novice or we’re trying to design a syntax that makes it easier for an expert to understand what’s going on. Optimally we would like to have both, I guess, but that doesn’t seem very realistic. The things that make syntax useful to an expert are different than what makes it easy to read for a novice.

At this point I need to make a request - Rich Hickey gave a talk at Strange Loop a few months ago. It’s called Simple made Easy and you can watch it here: http://www.infoq.com/presentations/Simple-Made-Easy - you should watch it now.

Simply put, if you had never learnt any German, should you really expect to be able to read it? Is it such a huge problem that someone who has never studied Prolog will have no idea what’s going on until they study it a bit? Doesn’t it make sense that people who understand German can express all the things they need to say in that language? Even worse, when it comes to programming languages, people expect them to be readable to people who have never programmed before! Why in the world would that ever be a useful goal? It would be like saying German is not readable (and is thus a bad language) because dolphins can’t read it.

A tangential aspect to the simple versus easy of programming languages is also how our current syntactic choices echo what’s been done earlier. It’s quite uncommon for a syntax design to become wildly successful while looking completely different from previous languages. This seems to have more to do with how easy a language is to learn, rather than how good the syntax actually is by itself. As such, it’s suspect. Historical accidents seem to contribute much more to syntax design than I am comfortable with.

Summarizing: when we talk about reading programming languages, it doesn’t make much sense to optimize for someone who doesn’t know the language. In fact, we need to take as a given that a person knows a programming language. Then we can start talking about what aspects reduce complexity and improve communication for a programmer.

When we are talking about reading languages, one thing that sometimes comes up is the need for redundancy. Specifically, one of the blogs that inspired these thoughts basically claimed that the redundancy in the design of Java was a good thing, because it improved readability. Now, I find this quite interesting - I have never seen any research that explains why this would be the case. In fact, the only argument I’ve heard that backs up the idea is that natural languages have highly redundant elements, and thus programming languages should too. First, that’s not actually true for all natural languages - but we must also consider _why_ natural languages have so much redundancy built in. Natural languages are not designed (with a few exceptions) - they grow to have the features they have because they are useful. But reading, writing, speaking and listening in natural languages have such different evolutionary pressures from each other that they should be treated differently. The reason we need redundancy is simply because it’s very hard to speak and listen without it. For all intents and purposes, what is considered good and idiomatic in spoken language is very different from written language. I just don’t buy this argument for redundancy. Redundancy in programming language syntax might be a good thing, but so far I remain to be convinced.

It is sometimes educational to look at mathematical notation. However, mathematical notation is just that - notation. I’m not convinced we can have one single notation for programming languages, and I don’t think it’s something to aspire to. But the useful lesson from math notation is how terse it is. However, you still need to spend a long time to digest what it means. That’s because the ideas are deep. The thinking that went into them is deep. If we ever come to a point where programming languages can embody as deep ideas in as terse a notation, I suspect we will have figured out how to design programming language syntax that is way better than what we have right now.

I think this covers most of the things I wanted to cover. At some point I would like to talk about why I think Smalltalk, Ruby, Lisp and some others have quite good syntax, and how that syntax is intimately related with why those languages are powerful and expressive. Some other random thoughts I wanted to cover were evolvability of language syntax, whether a syntax should be designed to be easy to parse, and possibly also how much English specifically has impacted the design of programming languages. But these are thoughts for another time. Suffice to say, syntax matters.


Wednesday, November 9, 2011

It seems the JavaScript tool space is not completely saturated yet. As I mentioned in my previous post I’ve had particular trouble finding a good solution to code coverage. So I decided to build my own version of it. The specific features to notice are transparent translation of source code and support for branch coverage. It also has some limitations at the moment, of course. This is release 0.0.1 and as such is definitely a first release. If you happen to use the Jasmine JUnit runner it should be possible to drop this in directly and have something working immediately.

You can find information, examples and downloads here: http://jescov.olabini.com


Monday, October 24, 2011

My most recent project was a fairly typical Java Web project where we had a component that should be written in JavaScript. Nothing fancy, and nothing big. It does seem like people are still not taking JavaScript seriously in these kinds of environments. So I wanted to take a few minutes and talk about how we developed JavaScript on this project. The kind of advice I’ll be giving here is well suited for web projects with small to medium amounts of JavaScript. If you’re writing large parts of your application on the client side, you probably want to go with a full stack framework to help you out, so these things are less relevant.

Of course, most if not all things I’ll cover here can be gleaned from other sources, and probably better. And if you’re an experienced JavaScript developer, you are probably fine without this article.

I had to do two things to get efficient in using JavaScript. The first one was to learn to ignore the syntax. The syntax is clunky and definitely gets in the way. But with the right habits (such as having a shortcut for function/lambda literals, and making sure to always put the returned value on the same line as the return statement) I’ve been able to see through the syntax and basically use JavaScript in a Scheme-like style. The second thing is to completely ignore the object system. I use a lot of object literals, but not really any constructors or the this-keyword. Both of these features can be used well, but they are also very clunky, and hard to get everyone on a team to understand the same way. I love prototype based OO as a model, and I’ve used it with success in Ioke and Seph. But with JavaScript I generally shy away from it.

The module pattern

The basic idea of the module pattern is that you encapsulate all your code in anonymous functions that are then immediately evaluated to generate the actual top level object. Since JavaScript has some unfortunate problems with global variables (like, they are there), it’s safest to just put all your code inside of one or more of these modules. You can also make your modules take the dependencies you want to use. A simple module might look like this:

var olaBiniSeriousBanking = (function() {
  var balance = 0;

  function deposit(num) {
    balance += num;
  }

  function checkOverdraft(amount) {
    if(balance - amount < 0) {
      throw new Error("Can't withdraw more than exists in account");
    }
  }

  function withdraw(amount) {
    checkOverdraft(amount);
    balance -= amount;
  }

  return {deposit: deposit, withdraw: withdraw};
})();

In this case the balance variable is completely hidden inside a lexical closure, and can only be accessed by the deposit and withdraw functions. These functions are also not in the global namespace so there is no risk of clobbering. It’s also possible to have lots and lots of helper functions that no one else can see. That makes it easier to make your functions smaller - and incidentally, the largest problem I’ve seen with JavaScript code quality is that functions tend to be very large. Don’t do that!

A useful variation of the module pattern is to extract the construction function and give it a name. Even though you might use it immediately, it makes it possible to create more than one of these, use different dependencies, or make it accessible from tests so you can inject collaborators:

var olaBiniGreeterModule = (function(greeting) {
  return {greet: function(name) {
    console.log(greeting + ", " + name);
  }};
});
var olaBiniGreeterEng = olaBiniGreeterModule("Hello");
var olaBiniGreeterSwe = olaBiniGreeterModule("Hejsan");
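
To make the point about injecting collaborators concrete, here is a minimal sketch in which the module function takes its dependency as an argument. The names makeGreeter, out and fakeOut are made up for this illustration and are not from the project:

```javascript
// Hypothetical example: the module function takes its dependency ("out")
// as a parameter instead of reaching for a global such as console.
var makeGreeter = function(greeting, out) {
  return {greet: function(name) {
    out.log(greeting + ", " + name);
  }};
};

// Production wiring: use the real console.
var greeterEng = makeGreeter("Hello", console);

// Test wiring: inject a fake collaborator that records what was logged.
var logged = [];
var fakeOut = {log: function(msg) { logged.push(msg); }};
var greeterForTest = makeGreeter("Hejsan", fakeOut);
greeterForTest.greet("Ola");
// logged is now ["Hejsan, Ola"]
```

Because the module never touches a global, the same function serves both the page and the test without any changes.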

RequireJS

The module pattern is good on its own, but there are some things a loader can do that make things even better. There are several variations of these module loaders, but my favorite so far is RequireJS. I have several reasons for this, but the main one is probably that it is very lightweight, and is actually a net win even for very small web applications. There are lots of benefits with letting RequireJS handle your modules. The main one is that it takes care of dependencies between modules, and loads them automatically. This means you can define one single entry point for your JavaScript, and RequireJS makes sure to load everything else. Another good aspect of RequireJS is that it allows you to avoid any global names at all. Everything is handled by callbacks inside of RequireJS. So how does it look? Well, a simple module with a dependency can look like this:

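// in file foo.js
// Note: the RequireJS documentation uses define(...) for module definitions;
// the dependency list and module function have the same shape.
require(["bar", "quux"], function(bar, quux) {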
  return {doSomething: function() {
    return bar.something() + quux.something();
  }};
});

If you have something else that uses foo, then this file will be loaded, bar.js and quux.js will be loaded and the results of loading them (the return value from the module function) will be sent in as arguments to the function that creates the foo module. So RequireJS takes care of all this loading. But how do you kick it off? Well, you should have one single script tag in your HTML, that will point to require.js. You will also add an extra attribute to this script tag that points to the entry point to the JavaScript:

<script data-main="scripts/main" src="scripts/require.js"> </script>

This will do a number of things. It will load require.js. It will set the scripts directory as the base for all module references in your JavaScript. And it will load scripts/main.js as if it’s a RequireJS module. And if you want to use our foo-module earlier, you can create a main.js that looks like this:

// in file main.js
require(["foo"], function(foo) {
  require.ready(function() {
    console.log(foo.doSomething());
  });
});

This will make sure that foo.js and its dependencies bar.js and quux.js will be loaded before the function is invoked. However, one aspect of JavaScript that people sometimes get wrong is that you have to wait until the DOM is ready before executing JavaScript that works with the page. With RequireJS we use the ready function inside the require object to make sure we can do something when everything is ready. Your main module should always wait with doing something until the document is ready.

In general, RequireJS has helped a lot with structure and dependencies and it makes it very simple to break up JavaScript into much smaller pieces. I like it a lot. There are a few downsides, though. The main one is that it doesn’t interact well with server side JavaScript (or at least it didn’t when I read up on it a month ago). Also, it doesn’t provide a clean way of getting access to the module functions without executing them, which becomes annoying when testing these things. I’ll talk a bit more about that in the section on testing.
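
The core idea, that each module function receives the results of its dependencies as arguments, can be sketched with a toy registry. This is an illustration only and not how RequireJS works internally: it is synchronous, it loads no files, it does not handle circular dependencies, and the names toyLoader, register and load are invented for the example.

```javascript
// A toy, synchronous module registry. It only illustrates the idea that
// each module function receives the results of its dependencies.
var toyLoader = (function() {
  var factories = {};  // name -> {deps: [...], fn: module function}
  var instances = {};  // name -> result of calling the module function

  function register(name, deps, fn) {
    factories[name] = {deps: deps, fn: fn};
  }

  function load(name) {
    if (!(name in instances)) {
      var entry = factories[name];
      if (!entry) {
        throw new Error("Unknown module: " + name);
      }
      // Resolve dependencies first, then pass their results as arguments.
      instances[name] = entry.fn.apply(null, entry.deps.map(load));
    }
    return instances[name];
  }

  return {register: register, load: load};
})();

toyLoader.register("bar", [], function() {
  return {something: function() { return 1; }};
});
toyLoader.register("quux", [], function() {
  return {something: function() { return 2; }};
});
toyLoader.register("foo", ["bar", "quux"], function(bar, quux) {
  return {doSomething: function() {
    return bar.something() + quux.something();
  }};
});

var result = toyLoader.load("foo").doSomething();  // 1 + 2
```

Each module function runs at most once; later requests for the same name get the cached result.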

No JavaScript in HTML

I don’t want any JavaScript whatsoever in the HTML, if I can avoid it. The only script tag should be the one that starts your module and loading framework - in my case RequireJS. We don’t have any event handlers embedded in the pages at all. We started out from a place where some of our pages had lots of event handlers and refactored to a much smaller code base that was much easier to work with by extracting all of these things into separate JavaScript modules. This has a side effect that anything you want to work with should be possible to semantically identify, either by using CSS classes or data attributes. Try to avoid convoluted paths to find elements. It’s OK to add some extra classes and attributes to make your JavaScript clean and simple.
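
As a rough sketch of what identifying elements semantically can look like, the module below finds its elements through a data attribute and attaches the handlers itself. It uses the standard DOM calls querySelectorAll and addEventListener rather than Dojo, and the module name saveButtons and the attribute value data-action="save" are assumptions made for the example:

```javascript
// Hypothetical module: finds elements by a data attribute and wires them up.
// "root" is anything with querySelectorAll (normally the document).
// The matching markup would be e.g. <button data-action="save">Save</button>
var saveButtons = (function() {
  function init(root, onSave) {
    var nodes = root.querySelectorAll("[data-action='save']");
    for (var i = 0; i < nodes.length; i++) {
      nodes[i].addEventListener("click", onSave);
    }
    return nodes.length;  // number of elements that got a handler
  }
  return {init: init};
})();
```

The HTML carries only the attribute; all behavior lives in the module, and a test can pass in a fake root instead of a real document.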

Init functions on ready

In terms of how we structure modules in a real application, we don’t actually do much work on startup. Instead, most of the work involves setting up event handlers and so on. The way we are doing that is to have the top level modules expose an init method, that is expected to be called by the main module when it starts up. Imagine a system where you have Dojo as the main framework, and you have this code:

// foo.js
require(["bar"], function(bar) {
  function sayHello(node) {
    console.log("hello " + node);
  }

  function attachEventHandlers(dom) {
    dom.query(".fluxCapacitors").onclick(sayHello);
  }

  function init(dom) {
    bar.init(dom);
    attachEventHandlers(dom);
  }

  return {init: init};
});

// main.js
require(["foo"], function(foo) {
  require.ready(function() {
    foo.init(dojo);
  });
});

This will make sure to set up all event handlers and put the application in the right state to be used.

Lots of callbacks

Once you’ve taught yourself to ignore the verbosity of anonymous lambdas in JavaScript, they become very handy tools for creating APIs and helper functions. In general, the code we write uses a lot of callbacks and helper wrapper functions. I also use functions that generate new functions quite liberally, doing things like currying and similar techniques. A fairly typical example is something like this:

function checkForChangesOn(node) {
  return function() {
    if(dojo.query(node).length > 42) {
      console.log("Warning, flux reactor in flax");
    }
  };
}

dojo.query(".clixies").onclick(checkForChangesOn(".fluxes"));
dojo.query(".moxies").onclick(checkForChangesOn(".flexes"));

This kind of abstraction can lead to very readable and clean JavaScript if done well. It can also lead to code where every piece is as small as it can be. In fact, one of the ways we use to make the syntax a little bit more bearable is to extract creation of anonymous functions into factory functions like this.
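
A generic version of the same trick is partial application, which is close to the currying mentioned above. The sketch below is a minimal hand-written helper; partial and warnIfOver are hypothetical names, not functions from the project or from Dojo:

```javascript
// Hypothetical helper: fix the first arguments of fn now, supply the rest later.
function partial(fn) {
  var fixed = Array.prototype.slice.call(arguments, 1);
  return function() {
    var rest = Array.prototype.slice.call(arguments);
    return fn.apply(this, fixed.concat(rest));
  };
}

function warnIfOver(limit, label, count) {
  return count > limit ? "Warning, too many " + label : null;
}

// The limit and label are fixed once; callers only pass the count.
var checkFluxes = partial(warnIfOver, 42, "fluxes");
var quiet = checkFluxes(10);   // null
var loud = checkFluxes(100);   // "Warning, too many fluxes"
```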

Lots of anonymous objects

Anonymous objects are great for many things. They work as a substitute for named arguments, and can be very useful to return more than one value. In our code base we use anonymous objects a lot, and it definitely helps with code readability.
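
Both uses fit in a few lines. In this sketch the function transfer and its fields are invented for illustration:

```javascript
// An options object stands in for named arguments...
function transfer(options) {
  var fee = options.fee || 0;  // optional "argument" with a default
  var remaining = options.balance - options.amount - fee;
  // ...and an object literal returns more than one value.
  return {ok: remaining >= 0, remaining: remaining};
}

var outcome = transfer({balance: 100, amount: 30, fee: 5});
// outcome.ok is true, outcome.remaining is 65
```

The call site documents itself, and adding another optional field later does not break existing callers.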

Testing

We use Jasmine for unit testing our JavaScript. This works quite well in general. Since this is a fairly typical Java web application we wanted to run it as part of our regular build process. This means we ended up using the JUnit Jasmine runner, which allows us to run these tests outside of browsers and format the results using all the available JUnit tools. Since we’ve tried to make the scripts as modular and small as possible, and also extracted most of the DOM behavior, we have avoided using HTML fixtures. This means our tests are leaning more towards traditional unit tests, rather than BDD style tests - which I’m not sure I’m comfortable with. But with the current size of the application, this is not really a problem.
Seeing as we wanted to test each module in isolation, we wanted to be able to instantiate the RequireJS module with our custom mock dependencies. This ended up not being very easy with RequireJS, so instead of trying to fit in to that model, we just don’t load RequireJS at all during testing, but instead have a top-level require function that just saves away the module function with a well defined name. This means we can instantiate the modules as many times as we want and inject different mocks for different purposes.
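
One way such a stand-in could look is sketched below. It is not the project’s actual harness: the names moduleFunctions and makeRecordingRequire are hypothetical, and how the name for each module is chosen is an assumption (here it is passed in explicitly when the file is evaluated).

```javascript
// Test-only stand-in for the loader. Instead of loading anything, it stores
// the module function under a name so tests can call it later.
var moduleFunctions = {};

function makeRecordingRequire(moduleName) {
  return function(deps, moduleFunction) {
    moduleFunctions[moduleName] = moduleFunction;
  };
}

// Simulate evaluating foo.js with the recording function in place of require.
(function(require) {
  require(["bar", "quux"], function(bar, quux) {
    return {doSomething: function() {
      return bar.something() + quux.something();
    }};
  });
})(makeRecordingRequire("foo"));

// In a test: instantiate the module with mocks, as many times as needed.
var mockBar = {something: function() { return 10; }};
var mockQuux = {something: function() { return 32; }};
var foo = moduleFunctions.foo(mockBar, mockQuux);
var answer = foo.doSomething();  // 10 + 32
```

Since the module function is only stored and never run during loading, each test decides which collaborators it gets.
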

In general, Jasmine works well for us, but there are some features missing from the mocking/stubbing framework that make certain things a bit complicated. One thing I miss a lot is the capability of having stubs return different values depending on the arguments sent in. Some ugly code has been written to get around this.
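
For illustration, a stub of that kind can be hand-rolled in a few lines. This sketch is independent of Jasmine and is not the workaround used on the project; stubByArgs, when and returns are invented names. It keys answers on the JSON-encoded argument list, so it only suits arguments that serialize cleanly (strings, numbers, plain objects).

```javascript
// Hypothetical helper: a stub whose return value depends on the arguments
// it is called with.
function stubByArgs() {
  var answers = {};  // JSON-encoded argument list -> return value

  function stub() {
    var key = JSON.stringify(Array.prototype.slice.call(arguments));
    stub.calls.push(key);
    return answers[key];  // undefined when no answer was registered
  }
  stub.calls = [];
  stub.when = function() {
    var key = JSON.stringify(Array.prototype.slice.call(arguments));
    return {returns: function(value) { answers[key] = value; return stub; }};
  };
  return stub;
}

var lookupRate = stubByArgs();
lookupRate.when("SEK", "USD").returns(0.15);
lookupRate.when("USD", "SEK").returns(6.6);

var a = lookupRate("SEK", "USD");  // 0.15
var b = lookupRate("USD", "SEK");  // 6.6
var c = lookupRate("EUR", "USD");  // undefined, nothing registered
```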

Open questions

Our current JavaScript process works well for us, but there are still some open things we haven’t done yet. First among these is to integrate JSLint into our build process. I really think that should be there, so I have no excuse. We don’t have tests running inside of browsers. I’m actually OK with this, since we’re trying to do more unit level coverage with Jasmine. Hopefully our acceptance tests cover some of the browser based testing. We are not doing minification at all, and we probably won’t need it based on the current expected usage. For a different audience we would certainly minify everything - this is something RequireJS can do really well though. We don’t have any coverage tool running on our JavaScript either. This is something I’m also uncomfortable with, but I haven’t really found a good tool that allows us to run coverage as part of our CI process yet. I also care more about branch coverage than line coverage, and no tool seems to give you this at the moment.
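
The difference between the two kinds of coverage shows up even in a one-line function. The function fee below is a made-up example:

```javascript
// One line, two branches.
function fee(amount, isPremium) {
  return isPremium ? 0 : amount * 0.01;
}

// A suite containing only this call executes every line of fee, so line
// coverage reports 100%. The non-premium branch is never taken, though,
// which only a branch-coverage tool would flag.
var premiumFee = fee(200, true);  // 0
```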

Summary

JavaScript can be completely OK to work with, provided you treat it as a real language. It’s quite powerful, but we also have a lot of bad habits based on hacking together small things, or just doing what works. As we go forward with JavaScript, this needs to stop. But the good news is that if you’re a decent developer, you shouldn’t have any problem picking any of this up.

Wednesday, August 10, 2011

On my current project we are using Spring MVC and we try to use autowiring as much as possible. I personally strongly prefer constructor injection, since this gives me the luxury of working with final fields. I also like being able to inject all things a class needs - including loggers. Most of the time I don’t really want to use custom loggers from tests, but sometimes I do want to make sure something gets logged correctly, and being able to inject a logger seems like a natural way of doing that. So, with that preamble out of the way, my problem was that this seemed quite hard to achieve in Spring. Specifically, I use SLF4J, and I want to inject the equivalent of doing LoggerFactory.getLogger(MyBusinessObject.class). Sadly, Spring doesn’t give access to the place where something is going to be injected in any of the hooks available. Most solutions I found to this problem rely on using a BeanPostProcessor to set a field on the object after it’s been created. This defeats three of my purposes/principles - I can’t use the logger in the constructor, the field will be mutable and I won’t get told by Spring if I’ve made a mistake in my wiring.

There was however one solution I found in a StackOverflow post - sadly it wasn’t complete. Specifically, I needed to use it in a Spring MVC setting and also from inside of tests. So this blog post is mainly to provide the complete solution for something like this. It’s a simple problem, but it was surprisingly tricky to get working correctly. But now that I have it, it will be very convenient. This code is for Spring 3.1, and I haven’t tested it on anything else.

The first part of this injection is to create our own custom BeanFactory - which is what Spring uses internally to manage beans and dependencies. The default one is called DefaultListableBeanFactory and we will just subclass it like this:

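import java.util.Set;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.BeansException;
import org.springframework.beans.TypeConverter;
import org.springframework.beans.factory.BeanFactory;
import org.springframework.beans.factory.annotation.QualifierAnnotationAutowireCandidateResolver;
import org.springframework.beans.factory.config.DependencyDescriptor;
import org.springframework.beans.factory.support.DefaultListableBeanFactory;
import org.springframework.core.LocalVariableTableParameterNameDiscoverer;

public class LoggerInjectingListableBeanFactory
                extends DefaultListableBeanFactory {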
    public LoggerInjectingListableBeanFactory() {
        setParameterNameDiscoverer(
            new LocalVariableTableParameterNameDiscoverer());
        setAutowireCandidateResolver(
           new QualifierAnnotationAutowireCandidateResolver());
    }

    public LoggerInjectingListableBeanFactory(
              BeanFactory parentBeanFactory) {
        super(parentBeanFactory);
        setParameterNameDiscoverer(
            new LocalVariableTableParameterNameDiscoverer());
        setAutowireCandidateResolver(
            new QualifierAnnotationAutowireCandidateResolver());
    }

    @Override
    public Object resolveDependency(
               DependencyDescriptor descriptor, String beanName,
              Set<String> autowiredBeanNames, TypeConverter typeConverter)
                     throws BeansException {
        Class<?> declaringClass = null;

        if(descriptor.getMethodParameter() != null) {
            declaringClass = descriptor.getMethodParameter()
                    .getDeclaringClass();
        } else if(descriptor.getField() != null) {
            declaringClass = descriptor.getField()
                    .getDeclaringClass();
        }

        if(Logger.class.isAssignableFrom(descriptor.getDependencyType())) {
            return LoggerFactory.getLogger(declaringClass);
        } else {
            return super.resolveDependency(descriptor, beanName,
                    autowiredBeanNames, typeConverter);
        }
    }
}

The magic happens inside of resolveDependency where we can figure out the declaring class by checking either the method parameter or the field - and then see whether the thing asked for is a Logger. Otherwise we just delegate to the super implementation.

In order to use this from anything we need an actual ApplicationContext that uses it. I didn’t find any hook to set the BeanFactory after the application context was created, so I ended up creating two new ApplicationContext implementations - one for tests and one for the Spring MVC purpose. They are slightly different, but try to do as little as possible while retaining the behavior of the original. The application context for the tests looks like this:

public class LoggerInjectingGenericApplicationContext
                    extends GenericApplicationContext {
    public LoggerInjectingGenericApplicationContext() {
        super(new LoggerInjectingListableBeanFactory());
    }
}

This one just calls the super constructor with an instance of our custom bean factory. The application context for Spring MVC looks like this:

public class LoggerInjectingXmlWebApplicationContext
                    extends XmlWebApplicationContext {
    @Override
    protected DefaultListableBeanFactory createBeanFactory() {
        return new LoggerInjectingListableBeanFactory(
                    getInternalParentBeanFactory());
    }
}

The XmlWebApplicationContext doesn’t have a constructor that takes a bean factory, so instead we override the createBeanFactory method to return our custom instance. In order to actually use these implementations some more things are needed. In order to get our tests to use it, a test.context.support.ContextLoader implementation is necessary. This code is mostly just copied from the default implementation - sadly it doesn’t provide any extension points and the places I want to override are in the middle of two final methods. It feels quite ugly to just copy the implementations, but there are no hooks for this…

public class LoggerInjectingApplicationContextLoader
                        extends AbstractContextLoader {
    public final ApplicationContext loadContext(
     MergedContextConfiguration mergedContextConfiguration)
                                  throws Exception {
        String[] locations = mergedContextConfiguration.getLocations();
        GenericApplicationContext context =
                  new LoggerInjectingGenericApplicationContext();
        context.getEnvironment().setActiveProfiles(
               mergedContextConfiguration.getActiveProfiles());
        loadBeanDefinitions(context, locations);
        AnnotationConfigUtils.registerAnnotationConfigProcessors(context);
        context.refresh();
        context.registerShutdownHook();
        return context;
    }

    public final ConfigurableApplicationContext
            loadContext(String... locations) throws Exception {
        GenericApplicationContext context =
              new LoggerInjectingGenericApplicationContext();
        loadBeanDefinitions(context, locations);
        AnnotationConfigUtils.registerAnnotationConfigProcessors(context);
        context.refresh();
        context.registerShutdownHook();
        return context;
    }

    protected void loadBeanDefinitions(
            GenericApplicationContext context, String... locations) {
        createBeanDefinitionReader(context).
               loadBeanDefinitions(locations);
    }

    protected BeanDefinitionReader createBeanDefinitionReader(
                      final GenericApplicationContext context) {
        return new XmlBeanDefinitionReader(context);
    }

    @Override
    public String getResourceSuffix() {
        return "-context.xml";
    }
}

The final thing necessary to get your tests to use the custom Bean Factory is to specify the loader to use in the ContextConfiguration on your test class, like this:

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(value = "file:our-app-config.xml",
          loader = LoggerInjectingApplicationContextLoader.class)
public class SomeTest {
}

In order to get Spring MVC to pick this up, you can edit your web.xml and add a new init-param for the DispatcherServlet, like this:

    <servlet>
        <servlet-name>Spring MVC Dispatcher Servlet</servlet-name>
        <servlet-class>
           org.springframework.web.servlet.DispatcherServlet
        </servlet-class>
        <init-param>
            <param-name>contextConfigLocation</param-name>
            <param-value>WEB-INF/our-app-config.xml</param-value>
        </init-param>
        <init-param>
            <param-name>contextClass</param-name>
            <param-value>
               com.example.LoggerInjectingXmlWebApplicationContext
            </param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>

This approach seems to work well enough. Some of the code is slightly ugly and I would definitely love to have a better hook for injection points to know where it will get injected. Having factory methods be able to take the receiver object might be very convenient, for example. Being able to customize the bean factory seems like it also should be much easier than this.


Sunday, March 13, 2011

I have recently started work on Seph again. I preannounced it last summer (here), then promptly became extremely busy at work. Busy enough that I didn’t really have any energy to work on this project for a while. Sadly, I’m still as busy, but I’ve still managed to find some small slivers of time to start working on the compiler parts of the implementation. This has been made much easier and more fun since JSR292 is getting near to completion, and an ASM 4 branch is available that makes it easier to compile Java bytecode with support for invoke dynamic built in.

So that means that the current code in the repository actually goes a fair bit of the way to where I want it to be. Specifically, the compiler compiles most code except for abstractions that create abstractions, and calls that take keyword arguments. Assignment is not supported either right now. I don’t expect any of these features to be very tricky to implement, so I’m waiting with that and working on other more complicated things.

This blog post is meant to serve two purposes. The first one is to just tell the world that Seph as an idea and project actually is alive and being worked on - and what progress has been made. The other aspect of this post is to talk about some of the things that make Seph a quite tricky language to compile. I will also include some thoughts I have on how to solve these problems - and suggestions are very welcome if you know of a better approach.

To recap, the constraints Seph is working under are that it has to run on Java 7. It has to be fully compiled (in fact, I haven’t decided if I’ll keep the interpreter at all after the compiler is working). And it has to be fast. Ish. I’m aiming for Ruby 1.8-speed at least. I don’t think that’s unreasonable, considering the dimensions of flexibility Seph will have to allow.

So let’s dive in. These are the major pain points right now - and they are in some cases quite interconnected…

Tail recursion

All Seph code has to be tail recursive, which means a tail call should never grow the stack. In order to make this happen on the JVM you need to save information away somewhere about where to continue the call. Then anyone using a value has to check for a tail marker token, and if one is found, that caller will have to do a repeated call on the current tail until a real value is produced. All the information necessary for the tail will also have to be saved away somewhere.

The approach I’m currently taking is fairly similar to Erjang’s. I have a SThread object that all Seph calls will have to pass along - this will act as a thread context as soon as I add light weight threads to Seph. But this place also serves as a good place to save away information on where to go next. My current encoding of the tail is simply a MethodHandle that takes no arguments. So the only thing you need to do to pump the tail call is to repeatedly check for the token and call the tail method handle. Still, doing this all over the place might not be that performant. At the moment, the code is not looking up a MethodHandle from scratch in the hot path, but it will have to bind several arguments in order to create the tail method handle. I’m unsure what the performance implications of that will be right now.
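To make the pumping loop concrete, here is a minimal sketch of the idea in plain Java. It is not Seph's actual code: ThreadContext stands in for SThread, and TAIL_MARKER, countDown and pump are hypothetical names invented for the illustration. The only real APIs used are the java.lang.invoke ones (MethodHandles.lookup, findStatic, insertArguments).

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class TailCallSketch {
    // Marker token returned instead of a real value when a tail call is pending.
    static final Object TAIL_MARKER = new Object();

    // Hypothetical stand-in for SThread: holds the pending tail as a no-argument handle.
    static final class ThreadContext {
        MethodHandle tail;
    }

    // Looked up once, so the hot path only has to bind arguments.
    static final MethodHandle COUNT_DOWN;
    static {
        try {
            COUNT_DOWN = MethodHandles.lookup().findStatic(
                    TailCallSketch.class, "countDown",
                    MethodType.methodType(Object.class,
                            ThreadContext.class, long.class, long.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Sums n + (n-1) + ... + 1 into acc using a self tail call.
    static Object countDown(ThreadContext thread, long n, long acc) {
        if (n == 0) {
            return acc; // a real value, nothing left to pump
        }
        // Bind all arguments so the saved tail takes no arguments at all.
        thread.tail = MethodHandles.insertArguments(
                COUNT_DOWN, 0, thread, n - 1, acc + n);
        return TAIL_MARKER;
    }

    // Anyone who needs a real value keeps calling the tail until the marker goes away.
    static Object pump(ThreadContext thread, Object result) {
        try {
            while (result == TAIL_MARKER) {
                MethodHandle next = thread.tail;
                thread.tail = null;
                result = next.invoke();
            }
            return result;
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Calling pump(ctx, countDown(ctx, 1_000_000L, 0L)) runs a million "tail calls" in a loop without growing the Java stack. The cost the paragraph above worries about is visible here too: every step allocates a freshly bound method handle.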

Argument evaluation from the callee

One aspect of Seph that works the same as in Ioke is that a method invocation will never evaluate the arguments. The responsibility of evaluating arguments will be in the receiving code, not the calling code. And since we don’t know whether something will do a regular evaluation or do something macro-like, it’s impossible to actually pre-evaluate the arguments and push them on the stack.

The approach Ioke and the Seph interpreter take is to just send in the Message object and allow the callee to evaluate it. But that’s exactly what I want to avoid with Seph - everything should be possible to compile, and be running hot if that’s possible. So sending Messages around defeats the purpose.

I’ve found an approach to compile this that actually works quite well. It also reduces code bloat in most circumstances. Basically, every piece of code that is part of a message send will be compiled to a separate method. So if you have something like foo(bar baz, qux) that will compile into the main activation method and two argument methods. This approach is recursive, of course. What this gives me is a protocol where I can use method handles to the argument methods, push them on the stack, and then allow the callee to evaluate them however they want. I can provide a standard evaluation path that just calls each of the method handles in turn to generate the values. But it also becomes very easy for me to send them in unevaluated. As an example this is almost exactly what the current implementation of the built in “if” method looks like. (It’s not exactly like this right now, because of transitional interpreter details).

public final static SephObject _if(SThread thread, LexicalScope scope,
        MethodHandle condition, MethodHandle then, MethodHandle _else)
        throws Throwable {
    SephObject result = (SephObject)condition.invokeExact(thread, scope,
                                                          true, true);

    if(result.isTrue()) {
        if(null != then) {
            return (SephObject)then.invokeExact(thread, scope,
                                                true, true);
        } else {
            return Runtime.NIL;
        }
    } else {
        if(null != _else) {
            return (SephObject)_else.invokeExact(thread, scope,
                                                 true, true);
        } else {
            return Runtime.NIL;
        }
    }
}

Of course, this approach is not perfect. It’s still a lot of code bloat, I can’t use the stack to pass things to the argument evaluation, and the code to bind the argument method handles takes up most of the generated code at the moment. Still, it seems to work and gives a lot of flexibility. And compiling regular method evaluations will make it possible to bind these argument method handles straight in to an invoke dynamic call site, which could improve the performance substantially when evaluating arguments (something that will probably happen quite often in real world code… =).
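The protocol described in this section can be sketched in a few lines of Java. This is a simplified illustration under stated assumptions, not what the Seph compiler emits: arg0 and arg1 stand for the two argument methods generated for foo(bar baz, qux), the handles take no parameters (the real ones take the thread, the scope and two booleans, as in the _if listing), and evaluateAll, foo and firstOnly are hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class ArgumentMethodsSketch {
    // Counts how many argument methods actually ran.
    static int evaluations = 0;

    // Hypothetical compiled argument methods for foo(bar baz, qux).
    static Object arg0() { evaluations++; return "bar baz"; }
    static Object arg1() { evaluations++; return "qux"; }

    // Obtains a handle to one of the argument methods by name.
    static MethodHandle handleFor(String name) {
        try {
            return MethodHandles.lookup().findStatic(
                    ArgumentMethodsSketch.class, name,
                    MethodType.methodType(Object.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Standard evaluation path: call each argument handle in turn.
    static Object[] evaluateAll(MethodHandle... args) {
        Object[] values = new Object[args.length];
        try {
            for (int i = 0; i < args.length; i++) {
                values[i] = args[i].invoke();
            }
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
        return values;
    }

    // A regular callee: evaluates every argument before doing its work.
    static Object foo(MethodHandle a, MethodHandle b) {
        Object[] v = evaluateAll(a, b);
        return v[0] + ", " + v[1];
    }

    // A macro-like callee: receives both arguments but only evaluates the first.
    static Object firstOnly(MethodHandle a, MethodHandle b) {
        return evaluateAll(a)[0];
    }
}
```

The caller does the same thing in both cases: it passes handles. Whether an argument is ever evaluated is decided entirely by the callee, which is what lets something like "if" stay a regular message.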

Intrinsics are just regular messages

Many of the things that are syntax elements in other languages are just messages in Seph. Things like “nil”, “true”, “false”, “if” and many others work exactly the same way as a regular message send to something you have defined yourself. In many cases this is totally unnecessary though - and knowing the implementation at the call site allows you to improve things substantially. I think it’s going to be fairly uncommon to override any of those standard names. But I still want to make it possible to do it. And I’m fine with the programs that do this taking a performance hit from it.

So the approach I’ve come up with (but not implemented yet) is this - I will special case the compilation of every place that has the same name as one of the intrinsics. This special casing will bind to a different bootstrap method than regular Seph methods. As a running example, let’s consider compiling a piece of code with “true” in it. This will generate a message send that will be taken care of by a sephTrueBootstrapMethod. We still have to send in all the regular method activation arguments, though. What this bootstrap method will do is to set up a call site that points to a very special method handle. This method handle will be a guardWithTest (GWT) created through a SwitchPoint specific to the true value. The first path of that GWT will just return the true value directly without any checks whatsoever. The else path of the GWT will fall back to a regular Seph fallback method that does inline caching and regular lookup. The magic happens with the SwitchPoint - the places that create new bindings will check for these intrinsic names and if one of those names is used anywhere in the client code, the SwitchPoint will be changed over to the slow path.

In summary, I think a fast path can be possible for many of these things for most programs. The behaviour when you override “if” should still work as expected, but will make the global performance of that program slower for the rest of the execution.
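A rough sketch of the SwitchPoint mechanics for “true”, using only java.lang.invoke calls that exist in Java 7 (SwitchPoint.guardWithTest, SwitchPoint.invalidateAll, MethodHandles.constant). Everything else is an assumption made for the illustration: the class and field names, defineBinding as the place where bindings get created, and handles that take no activation arguments. A real bootstrap method would wrap the resulting handle in a call site.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

final class IntrinsicTrueSketch {
    // Stand-in for the runtime's true object.
    static final Object TRUE = new Object();

    // Valid for as long as no client code has bound the name "true".
    static final SwitchPoint TRUE_NOT_OVERRIDDEN = new SwitchPoint();

    // Hypothetical user-level binding that shadows the intrinsic.
    static Object userDefinedTrue = null;

    // Slow path: stands in for inline caching plus regular lookup.
    static Object slowLookupTrue() {
        return userDefinedTrue != null ? userDefinedTrue : TRUE;
    }

    // What a bootstrap method for "true" could install as the call site target.
    static MethodHandle linkTrueCallSite() {
        try {
            MethodHandle fast = MethodHandles.constant(Object.class, TRUE);
            MethodHandle slow = MethodHandles.lookup().findStatic(
                    IntrinsicTrueSketch.class, "slowLookupTrue",
                    MethodType.methodType(Object.class));
            // Fast path while the SwitchPoint is valid, slow path afterwards.
            return TRUE_NOT_OVERRIDDEN.guardWithTest(fast, slow);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Every place that creates a binding checks for intrinsic names.
    static void defineBinding(String name, Object value) {
        if ("true".equals(name)) {
            userDefinedTrue = value;
            SwitchPoint.invalidateAll(new SwitchPoint[] { TRUE_NOT_OVERRIDDEN });
        }
    }

    // Helper that invokes a linked handle and hides the checked Throwable.
    static Object call(MethodHandle target) {
        try {
            return target.invoke();
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Invalidation is one-way: once a SwitchPoint has been invalidated it never becomes valid again, which matches the expectation that overriding an intrinsic slows the program down for the rest of its execution.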

When do lexical scopes escape?

Seph has mutable lexical scopes. But it’s impossible to know which names will escape and which won’t - so as far as I can see, I can’t use the Java stack to represent variables except in a small number of very degenerate cases. I’m not sure if it’s worth it to have that code path yet, so I haven’t thought much about it.

Class based PICs aren’t a good fit

One of the standard optimizations that object oriented languages use is something called a polymorphic inline cache. The basic idea is that looking up a method is the really slow operation. So if you can save away the result of doing that, guarded by a very cheap test, then you can streamline the most common cases. Now, that cheap test is usually a check against the class. As long as you send in an instance with the same class, then a new method lookup doesn’t have to happen. Doing a getClass and then an identity equality on that is usually fairly fast (a pointer comparison on most architectures) - so you can build PICs that don’t actually spend much time in the guard.
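For readers who haven’t seen one, a class-guarded PIC can be sketched as below. This is a generic illustration, not code from Seph or any particular VM: the cached "method" is just a string, and CachingSite, lookupDescribe and MAX_ENTRIES are made-up names.

```java
final class InlineCacheSketch {
    // Counts how often the slow path runs.
    static int slowLookups = 0;

    // Slow path: stands in for a full method lookup on the receiver's class.
    static String lookupDescribe(Class<?> cls) {
        slowLookups++;
        return "describe@" + cls.getSimpleName();
    }

    // One cache per call site, holding a handful of (class, target) pairs.
    static final class CachingSite {
        private static final int MAX_ENTRIES = 4;
        private final Class<?>[] classes = new Class<?>[MAX_ENTRIES];
        private final String[] targets = new String[MAX_ENTRIES];
        private int size = 0;

        String dispatch(Object receiver) {
            Class<?> cls = receiver.getClass();
            // The cheap guard: an identity comparison per cached class.
            for (int i = 0; i < size; i++) {
                if (classes[i] == cls) {
                    return targets[i];
                }
            }
            // Miss: do the slow lookup and remember it if there is room.
            String target = lookupDescribe(cls);
            if (size < MAX_ENTRIES) {
                classes[size] = cls;
                targets[size] = target;
                size++;
            }
            return target;
        }
    }
}
```

Dispatching on two strings and then two integers through the same CachingSite triggers only two slow lookups, one per receiver class. The whole scheme depends on many receivers sharing a small number of classes.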

But Seph is a prototype based language. So any object in the system can have different methods or values associated with a name, and there is no clear delineation on objects with new names and values in them. Especially, since Seph objects are immutable, every new object will most likely have a new set of values in them. And saving away objects and dispatching on them becomes much less performant, since the call sites will basically never work on the same object. Now, there are solutions to this - but most of them are tailored for languages where you usually use a class based pattern. V8 uses an approach called hidden classes to figure out things like that. I’m considering implementing something similar, but I’m a bit worried that the usage pattern of Seph will be far enough away from the class based world that it might not work well.
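The general idea behind hidden classes can be sketched as follows. This is a textbook-style illustration of shape transitions, not V8’s implementation and not something Seph has: Shape, Obj and their methods are hypothetical, and the sketch assumes a name is only ever added to an object once.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class ShapeSketch {
    // A shape records which names an object has and in which slot each lives.
    static final class Shape {
        final Map<String, Integer> slots = new HashMap<>();
        private final Map<String, Shape> transitions = new HashMap<>();

        // Adding the same name to the same shape always yields the same successor.
        // Assumes the name is not already present in this shape.
        Shape with(String name) {
            return transitions.computeIfAbsent(name, n -> {
                Shape next = new Shape();
                next.slots.putAll(slots);
                next.slots.put(n, slots.size());
                return next;
            });
        }
    }

    static final Shape EMPTY_SHAPE = new Shape();

    // An immutable object: adding a value produces a new object.
    static final class Obj {
        final Shape shape;
        private final Object[] values;

        Obj(Shape shape, Object[] values) {
            this.shape = shape;
            this.values = values;
        }

        Obj with(String name, Object value) {
            Object[] next = Arrays.copyOf(values, values.length + 1);
            next[values.length] = value;
            return new Obj(shape.with(name), next);
        }

        Object get(String name) {
            return values[shape.slots.get(name)];
        }
    }

    static final Obj EMPTY = new Obj(EMPTY_SHAPE, new Object[0]);
}
```

Two objects built by adding "x" and then "y" end up with the identical Shape instance even though their values differ, so an inline cache can guard on the shape instead of a class. Objects that add the same names in a different order get different shapes, which hints at why usage patterns far from the class based world might defeat the optimization.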

Summary

So, Seph is not terribly easy to compile, and I don’t have a good feeling for how fast it can actually be made. I guess we’ll have to wait and see. But it’s also an interesting challenge, coming up with solutions to these problems. I think I might also have to go on a new research binge, investigating how Self and NewtonScript did things.

