llvmruby.org

Archive for August, 2008

writing a YARVish bytecode compiler in ruby

Since llvmruby lets us write natively compiled code that interacts with Ruby interpreter values, an obvious thing to do with it is to write a compiler for the bytecode used in Ruby 1.9, i.e. YARV. This is the YARV instruction set, and if you click through to an opcode definition, you will see the C code used to execute that opcode. For example, the opt_lt instruction is implemented with this C code:

if (FIXNUM_2_P(recv, obj) &&
    BASIC_OP_UNREDEFINED_P(BOP_LT)) {
    long a = FIX2LONG(recv), b = FIX2LONG(obj);

    if (a < b) {
        val = Qtrue;
    }
    else {
        val = Qfalse;
    }
}
else {
    PUSH(recv);
    PUSH(obj);
    CALL_SIMPLE_METHOD(1, idLT, recv);
}

In my LLVM-based compiler, which is woefully incomplete and for now just assumes that the incoming arguments are Fixnums, the opcode is implemented this way:

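when :opt_lt
  obj = b.pop
  recv = b.pop
  x = b.fix2int(recv)
  y = b.fix2int(obj)
  # signed strict less-than, matching the `a < b` test in the C version
  val = b.icmp_slt(x, y)
  # zero-extend the 1-bit result to a machine word, then scale 1 to 2 (Qtrue)
  val = b.int_cast(val, LONG, false)
  val = b.mul(val, 2.llvm)
  b.push(val)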

This bit of assembly avoids jumps by relying on the fact that Ruby represents Qtrue and Qfalse as 2 and 0 respectively, so the single-bit 0 or 1 result of the compare instruction is simply zero-extended to a machine word and multiplied by 2.
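To make the trick concrete, here is a minimal plain-Ruby model of the same arithmetic. It is only a sketch: the `model_*` method names are made up for illustration, the ternary stands in for the 1-bit result of the compare instruction, and it assumes the Ruby 1.8/1.9 encoding where a Fixnum n is stored as (n << 1) | 1.

```ruby
# Hypothetical model of the branch-free opt_lt lowering (not llvmruby code).

# Untag a Fixnum VALUE: drop the low tag bit with an arithmetic shift.
def model_fix2int(value)
  value >> 1
end

# Tag an integer as a Fixnum VALUE, assuming the (n << 1) | 1 encoding.
def model_int2fix(int)
  (int << 1) | 1
end

def model_opt_lt(recv, obj)
  # The ternary stands in for the 1-bit compare result (0 or 1).
  bit = model_fix2int(recv) < model_fix2int(obj) ? 1 : 0
  # Scaling by 2 maps 1 to Qtrue (2) and 0 to Qfalse (0).
  bit * 2
end

lt_true  = model_opt_lt(model_int2fix(3), model_int2fix(5))
lt_false = model_opt_lt(model_int2fix(5), model_int2fix(3))
lt_equal = model_opt_lt(model_int2fix(3), model_int2fix(3))
p [lt_true, lt_false, lt_equal]
```

The three results correspond to Qtrue, Qfalse and Qfalse: equal operands are not "less than", which is why the comparison has to be strict.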

I’ve implemented a small and growing subset of the YARV instruction set in ruby_vm.rb, and there are some tests in test_ruby_vm.rb showing small example functions put together with this YARVish assembly. For example, the sequence:

bytecode = [
  [:newarray],
  [:dup],
  [:putobject, LLVM::Value.get_immediate_constant(0)],
  [:putobject, LLVM::Value.get_immediate_constant('shaka')],
  [:opt_aset],
  [:pop]
]

creates a new array and sets the first element of the array to a string. Since the guts of using LLVM from Ruby are basically finished, I am now mostly working on translating the YARV instruction set into Ruby.
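The stack effects of that sequence can be traced with a tiny interpreter in plain Ruby. This is a sketch only: `run_yarvish` is a hypothetical helper, it handles just these five opcodes, and it pushes ordinary Ruby objects where the real compiler emits LLVM instructions and wraps constants with LLVM::Value.get_immediate_constant.

```ruby
# Hypothetical pure-Ruby interpreter for the opcodes in the sequence above.
def run_yarvish(bytecode)
  stack = []
  bytecode.each do |op, arg|
    case op
    when :newarray
      stack.push([])            # push a fresh, empty array
    when :dup
      stack.push(stack.last)    # duplicate the reference on top of the stack
    when :putobject
      stack.push(arg)           # push a literal operand
    when :opt_aset
      val = stack.pop
      idx = stack.pop
      recv = stack.pop
      recv[idx] = val           # recv[idx] = val
      stack.push(val)           # an assignment leaves its value on the stack
    when :pop
      stack.pop                 # discard the top of the stack
    else
      raise ArgumentError, "unsupported opcode: #{op.inspect}"
    end
  end
  stack
end

yarvish_sequence = [
  [:newarray],
  [:dup],
  [:putobject, 0],
  [:putobject, 'shaka'],
  [:opt_aset],
  [:pop]
]

final_stack = run_yarvish(yarvish_sequence)
p final_stack
```

The :dup is what keeps the array alive: :opt_aset consumes one reference and leaves the assigned value, :pop discards that value, and the duplicated reference to the now one-element array remains on the stack.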

Written by tom

August 31st, 2008 at 3:50 pm

Posted in Uncategorized

support for 32-bit and 64-bit ruby interpreters

Tonight, I implemented and tested support for 32-bit Ruby. How does this work?

  // Figure out details of the target machine
  const IntegerType *machine_word_type;
  if(sizeof(void*) == 4) {
    machine_word_type = Type::Int32Ty;
  } else {
    machine_word_type = Type::Int64Ty;
  }
  rb_define_const(cLLVMRuby, "MACHINE_WORD", Data_Wrap_Struct(cLLVMType, NULL, NULL, const_cast<IntegerType*>(machine_word_type)));

First the extension creates a MACHINE_WORD type based on the pointer size of the machine.
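The same check can be approximated from plain Ruby, which is handy for seeing what a given interpreter build would pick. This is a sketch, not part of llvmruby: it assumes the 'J' pack directive (pointer-width unsigned integer, available since Ruby 2.3), and the variable names are made up.

```ruby
# Size in bytes of a pointer-width integer on this interpreter build.
word_bytes = [0].pack('J').bytesize

# Mirror the C++ branch: 4-byte pointers mean 32-bit words, otherwise 64-bit.
machine_word_bits = word_bytes == 4 ? 32 : 64

puts "this interpreter uses #{machine_word_bits}-bit machine words"
```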

  # describe structures used by the ruby 1.8/1.9 interpreters
  module RubyInternals
    FIXNUM_FLAG = 0x1.llvm
    CHAR = Type::Int8Ty
    P_CHAR = Type::pointer(CHAR)
    LONG = MACHINE_WORD
    VALUE = MACHINE_WORD
    P_VALUE = Type::pointer(VALUE)
    ID = MACHINE_WORD
    RBASIC = Type::struct([VALUE, VALUE])
    RARRAY = Type::struct([RBASIC, LONG, LONG, P_VALUE])
    P_RARRAY = Type::pointer(RARRAY)
    RSTRING = Type::struct([RBASIC, LONG, P_CHAR, VALUE])
    P_RSTRING = Type::pointer(RSTRING)
  end

All the Ruby data types are defined in the RubyInternals module.

If you are familiar with Ruby internals from having worked on C extensions, you will know that everything is a VALUE: one machine word that is either an immediate (nil, true, false, a Fixnum, or a Symbol) or a pointer to a struct somewhere on the heap. This keeps our definitions pretty simple: all that changes between machines is the size of this word.
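As a rough illustration of how a single machine word can hold either an integer or a pointer, the Fixnum tagging scheme can be modelled in plain Ruby. The helper names below are hypothetical, and the encoding (low bit set for Fixnums, heap pointers word-aligned so their low bit is clear) is the Ruby 1.8/1.9 layout that FIXNUM_FLAG refers to.

```ruby
# Hypothetical model of Fixnum tagging; not llvmruby code.
TAG_FIXNUM = 0x1

# Encode an integer as a Fixnum VALUE: shift left and set the tag bit.
def tag_fixnum(int)
  (int << 1) | TAG_FIXNUM
end

# Decode a Fixnum VALUE: an arithmetic shift right drops the tag bit.
def untag_fixnum(value)
  value >> 1
end

# A VALUE is a Fixnum exactly when its low bit is set.
def tagged_fixnum?(value)
  (value & TAG_FIXNUM) == TAG_FIXNUM
end

tagged = tag_fixnum(21)
p tagged
p untag_fixnum(tagged)
p tagged_fixnum?(tagged)
p tagged_fixnum?(0x1000)   # an aligned heap address has a clear low bit
```

Because the tag costs one bit, a Fixnum holds one bit less than the machine word, which is why the word size matters for everything built on top of VALUE.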

You’ll also notice that the native Ruby data types themselves are defined in Ruby code here. For example, an Array in Ruby is represented internally using this struct:

struct RArray {
    struct RBasic basic;
    long len;
    union {
        long capa;
        VALUE shared;
    } aux;
    VALUE *ptr;
};

And in the Ruby mappings that allow us to work on this structure, it’s defined basically the same way using LLVM data types. The aux union collapses to a single LONG because both of its members are one machine word wide:

RARRAY = Type::struct([RBASIC, LONG, LONG, P_VALUE])

Easy!
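To see what changes between 32-bit and 64-bit builds, the byte offsets of the RArray fields can be worked out by hand. The sketch below is plain Ruby with made-up names; it assumes every field is a whole number of machine words, so no padding is needed, and that RBasic is two words as in the RBASIC definition.

```ruby
# Field sizes in machine words, in the same order as
# RARRAY = Type::struct([RBASIC, LONG, LONG, P_VALUE]).
RARRAY_FIELD_WORDS = { basic: 2, len: 1, aux: 1, ptr: 1 }

# Return a hash of field name => byte offset for a given word size.
def rarray_offsets(word_bytes)
  offset = 0
  RARRAY_FIELD_WORDS.each_with_object({}) do |(name, words), offsets|
    offsets[name] = offset
    offset += words * word_bytes
  end
end

offsets_32 = rarray_offsets(4)
offsets_64 = rarray_offsets(8)
p offsets_32
p offsets_64
```

The layout is the same shape on both machines; every offset simply doubles when the word size goes from 4 to 8 bytes.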

Written by tom

August 27th, 2008 at 5:30 am

Posted in Uncategorized

what does llvmruby look like?

LLVMRuby is a fairly straightforward wrapper around the C++ API for generating LLVM bytecode and JIT compiling it. The LLVM API is both very well designed and very object oriented, so it maps naturally onto Ruby. Here is a simple example of using LLVMRuby to construct and JIT compile a function that manipulates native Ruby objects.

require 'llvm'
include LLVM
include RubyInternals

class Builder
  include RubyHelpers
end

m = LLVM::Module.new('ruby_bindings_examples')
ExecutionEngine.get(m)

def ftype(ret_type, arg_types)
  Type.function(ret_type, arg_types)
end

rb_ary_new = m.external_function('rb_ary_new', ftype(VALUE, []))
rb_to_id = m.external_function('rb_to_id', ftype(ID, [VALUE]))
rb_ivar_get = m.external_function('rb_ivar_get', ftype(VALUE, [VALUE, ID]))
rb_ivar_set = m.external_function('rb_ivar_set', ftype(VALUE, [VALUE, ID, VALUE]))

class TestClass
  def initialize
    @shaka = 'khan'
  end
end

test_instance = TestClass.new

# take an object and an instance variable symbol, return value of instance variable
type = Type.function(VALUE, [VALUE, VALUE])
f = m.get_or_insert_function('shakula', type)
obj, ivar_sym = f.arguments
b = f.create_block.builder
new_ary = b.call(rb_ary_new)
ivar_id = b.call(rb_to_id, ivar_sym)
ret_val = b.call(rb_ivar_get, obj, ivar_id)
b.return(ret_val)
ret = ExecutionEngine.run_function(f, test_instance, :@shaka)
puts "get instance variable @shaka: #{ret.inspect}"
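For comparison, the value the JIT-compiled function returns can be reproduced with ordinary Ruby reflection. This is only a reference sketch: `shakula_reference` is a hypothetical name, and it uses instance_variable_get where the compiled version calls rb_to_id and rb_ivar_get directly.

```ruby
class TestClass
  def initialize
    @shaka = 'khan'
  end
end

# Plain-Ruby counterpart of the compiled function: look up an instance
# variable by symbol on an arbitrary object.
def shakula_reference(obj, ivar_sym)
  obj.instance_variable_get(ivar_sym)
end

reference_ret = shakula_reference(TestClass.new, :@shaka)
puts "get instance variable @shaka: #{reference_ret.inspect}"
```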

Written by tom

August 26th, 2008 at 2:24 am

Posted in Uncategorized

llvmruby is fun

I recently wrote an extension for Ruby that lets you use the LLVM compiler infrastructure from inside the Ruby interpreter. While still in development, it already supports enough of the LLVM API to write interesting programs. I have gotten the library to the point where it can interact with the native internals of the interpreter, meaning that it is possible to create the equivalent of C Ruby extensions from within Ruby itself and just-in-time (JIT) compile them.

I think you will agree that generating abstract assembler with Ruby is much more fun than generating it from C++.

The project now lives on GitHub: http://github.com/tombagby/llvmruby

I have developed and used it only on my home computer, which is a 64-bit Linux machine. I imagine that it errors hilariously on 32-bit machines. It does, however, have a nice extconf and should build in a nice, standard way. I am very interested to hear about your build problems on different platforms so that I can fix them!

Share and enjoy.

Written by tom

August 25th, 2008 at 1:40 am

Posted in Uncategorized