Part 4 of 5

Compiling Brainfuck to JS

Since threaded code didn’t perform very well in combination with JS, we will explore another way of executing Brainfuck. Instead of running the Brainfuck code or a converted version of it, this time the goal is to translate it completely into a different programming language. By harnessing the power of a sophisticated JIT-compiled language, we might be equipped to challenge our record, set by version 2.

Since the program is going to translate Brainfuck into source code of another language, a fitting name would be source-to-source compiler, also known as a transpiler. But these are too long to type and read, which is why I will refer to it just as a compiler in the following paragraphs. 😅

Simple language – Simple Conversion

As with the rest of this project, I’m going to stick with JS, both as the language the compiler is written in and as the output format it produces. One benefit, besides consistency, is the fact that evaluating dynamically created JS is dead simple. In the future, I might take a shot at emitting either C or, more interestingly, native machine code. But all in good time.

Again, Brainfuck makes our lives pretty easy. We only have to convert 8 different commands to JS. Technically it’s just 7, as [ and ] together form loops. Additionally, I will leave , unimplemented once more, for the same reason as in version 1. That leaves only 6 instructions to be translated!



let src= '', i= 10;
while( i-- ) {
  src+= `console.log('Hello world! Nr: ${i}');`;
}

const f= new Function( src );
f();

The example shows how to dynamically create a callable function in JS. It’s just a matter of building the source code that should be executed as the body of the function and then calling the Function constructor with the string as a parameter.

Our compiler will do the exact same thing. Just like we appended opcodes or function objects to the code array before, we are going to append JS snippets to a string. The string is initialized with some extra code that should run before the Brainfuck code does. This includes the creation of the data array and the cell pointer, but also some utility functions.

Probably the simplest commands to convert are the increment and decrement instructions. The only thing they do is change the current cell's value. For five merged increments, the following will be added to the string: data[ptr]+= 5;

Because the movement commands < and > also do some additional checking, their code is a bit different. Left movement appends a check for whether the pointer underflows. Right movement instead checks for an overflow and extends the cell array appropriately. Since this is way too much code to append for each and every occurrence of a right movement, a predefined function is called after every move instead.

This is one of the two functions in the head of the string that it gets initialized with. Every right move is translated into the following: ptr+= 3; ae(); The array-extend function ae() does exactly what its name says. The other function in the head is pc(), which is responsible for printing characters to the console.
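The article never shows the head itself, so here is a minimal sketch of how the two helpers could look. The growth strategy, the plain-array cells, and the direct console output are my assumptions, not the original implementation:

```javascript
// Sketch of the two helper functions in the string head
// (assumed details: plain Array cells, push-based growth, direct stdout output)
let data= [ 0 ], ptr= 0;

// ae: array-extend — make sure the cell the pointer moved to exists
function ae() {
  while( data.length<= ptr ) {
    data.push( 0 );
  }
}

// pc: print the current cell's value as an ASCII character
function pc() {
  process.stdout.write( String.fromCharCode( data[ptr] ) );
}
```

Because these live in the head of the generated source, they are in scope for every snippet that the compiler appends afterwards.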

While loops (pun intended) have been the most complicated thing to translate up to this point, requiring a stack to keep track of jump addresses, they are arguably the simplest now. Every loop beginning is converted to a while-loop head while( data[ptr] ) and every end to a closing curly bracket. That’s it.
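As a quick sanity check, this is what a tiny cell-clearing loop turns into — the emitted JS is just a regular while loop. The cell setup around it is mine, for illustration:

```javascript
// The Brainfuck loop '[-]' (decrement until zero) becomes plain JS:
let data= [ 5 ], ptr= 0;

while( data[ptr] ){ data[ptr]-= 1; }

console.log( data[ptr] ); // 0
```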



// ... Counting number of repetitions

switch( cur ) {
  case ASCII.Plus:    str+= 'data[ptr]+= '+ n+ ';'; break;
  case ASCII.Minus:   str+= 'data[ptr]-= '+ n+ ';'; break;
  case ASCII.Left:    str+= 'if( ptr< '+ n+ ' ) { throw Error("Pointer underflow"); } ptr-= '+ n+ ';'; break;
  case ASCII.Right:   str+= 'ptr+= '+ n+ '; ae(); '; break;
}



// ...

case ASCII.Dot:
  str+= 'pc(); ';
  break;

case ASCII.OBracket:
  str+= 'while( data[ptr] ){ ';
  break;

case ASCII.CBracket:
  str+= '}';
  break;

// ...

Running the code is as simple as creating a function object and calling it. But when we finally run it, we make a disappointing observation.

It is basically on par with our first version. How can this be? How can a script compiled to JS be only slightly faster than an interpreter that has to do a linear search every time it runs or skips a loop?

The reason might be quite simple. V8 has limits in place that keep it from optimizing extremely large functions. If we console.log() our function body, we can see that this is probably the case here: we are talking about 358kB of pretty dense JS code in a single function!

Hence, the solution might be as simple as splitting the source code into multiple smaller functions. Then V8 should be able to analyze the code better and create a better-optimized machine-code version. (If it even created one up until this point.)

Code Segmentation

Of course, the first question that comes up is: where should the code be split? The best approach would be to create mostly equal-sized functions that are small enough to be optimized, but still contain enough code to keep the number of individual functions down.

Well, that would be elegant, but way too complicated, and I’m not in the mood for such funny business. (I’m saving that energy for the final article…) Therefore I decided to go with the blocks that Brainfuck code already comes segmented into: loops.

The plan is to convert each loop into a function, which is placed in the ‘global’ scope and called where the loop used to be in the Brainfuck source. This means that the JS source code is now emitted out of order, as the nested loops are converted into flattened function definitions.
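To make that layout concrete, here is roughly what the flattened output could look like for the tiny program ++[>+<-]. I wrote this by hand to mirror the described scheme, so the helper and function names are illustrative, and the underflow check is omitted for brevity:

```javascript
// Hand-written approximation of the compiler's flattened output for '++[>+<-]'
let data= [ 0 ], ptr= 0;
function ae() { while( data.length<= ptr ) data.push( 0 ); }

// The loop body became its own function in the 'global' scope...
function fn0() { while( data[ptr] ){ ptr+= 1; ae(); data[ptr]+= 1; ptr-= 1; data[ptr]-= 1; } }

// ...and the former global scope calls it where the loop used to be
data[ptr]+= 2; fn0();

console.log( data ); // [ 0, 2 ]
```

The program moves the value 2 from cell 0 into cell 1, and the nested structure of the source is completely gone from the emitted code.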

The compile() function now calls mkFunction() to create the global function body. mkFunction() is very similar to the old JS code generator, as it appends all commands to a source code string. However, instead of using a global string object, every function has its own string, which is pushed to an array at the end. Additionally, a function declaration head is emitted for every function except the first one, which represents the global scope. Whenever a loop is encountered, the function-id counter fid is incremented, and mkFunction() recurs into itself to load the contents of the loop. fid is used as the function name by the loop's parent to call the loop's function.



let fid= 0;
function mkFunction( strs, i= 0 ) {
  let str= '';

  // Print head except for global scope
  if( i ) {
    str+= 'function fn'+ fid+ '() { while( data[ptr] ){ ';
    fid++;
  }
  
  for( ; i< src.length; i++ ) {
    const cur= src.readUInt8(i);
    switch( cur ) {
      // ...

      case ASCII.OBracket:
        // Put function call instead of loop
        str+= 'fn'+ fid+ '(); ';

        // Recur to load loop body
        i= mkFunction(strs, i+1 );
        break;

      case ASCII.CBracket:
        // End of function / loop
        str+= '}}';
        strs.push( str );
        return i;
    }
  }

  // End of global scope
  strs.push( str );
}

When the whole code is converted, we are left with an array of functions that only need to be joined and evaluated to a callable JS function.



let func;
function compile() {
  console.log('Compiling...');
  
  let str= ''; // ... plus the init code (data array, ptr, ae() and pc())

  let strs= [ str ];
  mkFunction( strs );

  func= new Function( strs.join(' ') );
}

Some people might now be wondering whether all the additional function calls have an impact on the execution speed. I would say “no”, or if so, only a negligibly small one. A lot of the functions are extremely short, which allows for really easy inlining if V8 decides to do so, eliminating the call altogether. Moreover, none of these functions take parameters or return anything, reducing the cost of a call to pushing the return address (and some other bookkeeping) and doing a jump. This also means there is no need to collect type info for parameters or to deal with function polymorphism. And since no local variables are declared, not even stack-frame space is required.

We got a winner

Benchmarking the segmenting version (4.5) reveals a new high score! Compared to our previous runs, the performance has increased by a factor of about 8. And compared to the bytecode interpreter (version 2), this one is about two times faster.

Just by helping V8 a little bit, we were able to massively improve the execution speed of the JS code. But there are still multiple things that could be improved in the code the compiler currently creates. Next time we will take a look at these optimizations and try to squeeze a bit more performance out of Brainfuck.

last edited 11th Aug 20