Archive for August, 2008
writing a YARVish bytecode compiler in ruby
Since llvmruby lets us write natively compiled code that interacts with Ruby interpreter values, an obvious thing to do with it is to write a compiler for the bytecode used in Ruby 1.9, i.e. YARV. In the YARV instruction set reference, clicking through to an opcode definition shows the C code used to execute that opcode. For example, the opt_lt instruction is implemented with this C code:
    if (FIXNUM_2_P(recv, obj) && BASIC_OP_UNREDEFINED_P(BOP_LT)) {
        long a = FIX2LONG(recv), b = FIX2LONG(obj);
        if (a < b) {
            val = Qtrue;
        }
        else {
            val = Qfalse;
        }
    }
    else {
        PUSH(recv);
        PUSH(obj);
        CALL_SIMPLE_METHOD(1, idLT, recv);
    }
In my LLVM-based compiler, which is woefully incomplete and for now just assumes that the incoming arguments are Fixnums, the opcode is implemented this way:
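    when :opt_lt
      obj = b.pop
      recv = b.pop
      x = b.fix2int(recv)
      y = b.fix2int(obj)
      # signed less-than, matching `a < b` in the C version
      # (assumes icmp_slt is exposed alongside icmp_sle)
      val = b.icmp_slt(x, y)
      # widen the 1-bit result to a machine word (unsigned, so it stays 0 or 1)
      val = b.int_cast(val, LONG, false)
      # scale to Qfalse (0) or Qtrue (2)
      val = b.mul(val, 2.llvm)
      b.push(val)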
This bit of assembly avoids jumps by exploiting the fact that Ruby represents Qtrue and Qfalse as 2 and 0 respectively, so the single-bit 0 or 1 result of the comparison is simply widened to a machine word and multiplied by 2.
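To make the arithmetic concrete, here is a pure-Ruby model of the same trick, with no LLVM involved. The helper names int2fix, fix2int and opt_lt are made up for this sketch, and it assumes the Ruby 1.8/1.9 encoding in which a Fixnum n is stored as (n << 1) | 1. The ternary only stands in for the 1-bit result that the compare instruction would produce.

```ruby
# Pure-Ruby model of the tagged-word arithmetic (illustration only).
QFALSE = 0
QTRUE  = 2

# Encode an integer as a Fixnum VALUE: shift left and set the tag bit.
def int2fix(n)
  (n << 1) | 1
end

# Decode a Fixnum VALUE: an arithmetic shift right drops the tag bit.
def fix2int(v)
  v >> 1
end

# Less-than on two Fixnum VALUEs, returning Qtrue or Qfalse as a raw word.
def opt_lt(recv, obj)
  bit = fix2int(recv) < fix2int(obj) ? 1 : 0  # stands in for the icmp result
  bit * 2                                     # 1 * 2 == Qtrue, 0 * 2 == Qfalse
end

puts opt_lt(int2fix(1), int2fix(2))  # prints 2 (Qtrue)
puts opt_lt(int2fix(2), int2fix(2))  # prints 0 (Qfalse)
```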
I’ve implemented a small and growing subset of the YARV instruction set in ruby_vm.rb, and there are some tests in test_ruby_vm.rb showing small example functions put together with this YARVish assembly. For example, the sequence:
    bytecode = [
      [:newarray],
      [:dup],
      [:putobject, LLVM::Value.get_immediate_constant(0)],
      [:putobject, LLVM::Value.get_immediate_constant('shaka')],
      [:opt_aset],
      [:pop]
    ]
creates a new array and sets the first element of the array to a string. Since the guts of using LLVM from Ruby have been basically finished, I am now mostly working on translating the YARV instruction set into Ruby.
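To see what that sequence does to the operand stack, here is a tiny plain-Ruby interpreter for just those five opcodes. It is a hypothetical sketch, not the compiler in ruby_vm.rb: run_bytecode is a made-up name, the operands are ordinary Ruby objects rather than LLVM constants, and the stack effects are my reading of the opcodes as used above.

```ruby
# Hypothetical interpreter for the handful of opcodes in the example.
# Returns the operand stack as it stands after the last instruction.
def run_bytecode(bytecode)
  stack = []
  bytecode.each do |op, arg|
    case op
    when :newarray  then stack.push([])          # push a fresh empty array
    when :dup       then stack.push(stack.last)  # duplicate the top reference
    when :putobject then stack.push(arg)         # push a literal operand
    when :opt_aset
      val  = stack.pop
      idx  = stack.pop
      recv = stack.pop
      recv[idx] = val                            # recv[idx] = val
      stack.push(val)                            # assignment yields its value
    when :pop       then stack.pop               # discard the top of the stack
    else raise ArgumentError, "unsupported opcode: #{op.inspect}"
    end
  end
  stack
end

bytecode = [
  [:newarray],
  [:dup],
  [:putobject, 0],
  [:putobject, 'shaka'],
  [:opt_aset],
  [:pop]
]

result = run_bytecode(bytecode)
p result  # prints [["shaka"]]
```

The dup is what keeps the array alive: opt_aset consumes one reference to it, and the final pop only discards the assigned value.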
support for 32-bit and 64-bit ruby interpreters
Tonight, I implemented and tested support for 32-bit Ruby. How does this work?
    // Figure out details of the target machine
    const IntegerType *machine_word_type;
    if(sizeof(void*) == 4) {
      machine_word_type = Type::Int32Ty;
    } else {
      machine_word_type = Type::Int64Ty;
    }
    rb_define_const(cLLVMRuby, "MACHINE_WORD",
                    Data_Wrap_Struct(cLLVMType, NULL, NULL,
                                     const_cast<IntegerType*>(machine_word_type)));
First the extension creates a MACHINE_WORD type based on the pointer size of the machine.
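The same check can be approximated from plain Ruby, without the extension. This is only a sketch: it relies on the 'p' directive of Array#pack, which packs a native pointer, so the length of the packed string is the pointer size in bytes. The constant names are made up for the example.

```ruby
# Size in bytes of a native pointer in the running interpreter.
POINTER_BYTES = ['x'].pack('p').size

# Width in bits of the machine word the extension would pick.
MACHINE_WORD_BITS = POINTER_BYTES * 8

puts "machine word: #{MACHINE_WORD_BITS} bits"
```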
    # describe structures used by the ruby 1.8/1.9 interpreters
    module RubyInternals
      FIXNUM_FLAG = 0x1.llvm
      CHAR = Type::Int8Ty
      P_CHAR = Type::pointer(CHAR)
      LONG = MACHINE_WORD
      VALUE = MACHINE_WORD
      P_VALUE = Type::pointer(VALUE)
      ID = MACHINE_WORD
      RBASIC = Type::struct([VALUE, VALUE])
      RARRAY = Type::struct([RBASIC, LONG, LONG, P_VALUE])
      P_RARRAY = Type::pointer(RARRAY)
      RSTRING = Type::struct([RBASIC, LONG, P_CHAR, VALUE])
      P_RSTRING = Type::pointer(RSTRING)
    end
All the Ruby data types are defined in the RubyInternals module.
If you are familiar with Ruby internals from having worked on C extensions, you will know that everything is a VALUE, which is one machine word that is either an immediate value (nil, true, false, a Fixnum, or a Symbol) or a pointer to a struct somewhere on the heap. This makes our definitions pretty simple: all that changes between machines is the size of this word.
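As an illustration of how a single word can carry all of those cases, here is a plain-Ruby sketch that classifies a raw VALUE by its bit pattern. It assumes the constants from the Ruby 1.8/1.9 ruby.h (Qfalse = 0, Qtrue = 2, Qnil = 4, Qundef = 6, a low tag bit for Fixnums, and 0x0e in the low byte for Symbols). classify_value is a made-up helper, and later Ruby versions use different encodings.

```ruby
# Tag constants as found in the Ruby 1.8/1.9 headers (assumed here).
FIXNUM_FLAG = 0x01
SYMBOL_FLAG = 0x0e
QFALSE = 0
QTRUE  = 2
QNIL   = 4
QUNDEF = 6

# Classify a raw VALUE word by inspecting its bit pattern.
def classify_value(word)
  return :fixnum if (word & FIXNUM_FLAG) == FIXNUM_FLAG  # odd words are Fixnums
  case word
  when QFALSE then :false
  when QTRUE  then :true
  when QNIL   then :nil
  when QUNDEF then :undef
  else
    # Symbols carry SYMBOL_FLAG in the low byte; anything else is a heap pointer.
    (word & 0xff) == SYMBOL_FLAG ? :symbol : :pointer
  end
end

p classify_value(7)         # the Fixnum 3, stored as (3 << 1) | 1
p classify_value(4)         # nil
p classify_value(0x7f0010)  # an aligned address, so a heap pointer
```

Heap objects are word-aligned, so their addresses never have the low bit set and never collide with the small special constants.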
You’ll also notice that the native Ruby data types themselves are defined here in Ruby code. For example, an Array in Ruby is represented internally using this struct:
    struct RArray {
        struct RBasic basic;
        long len;
        union {
            long capa;
            VALUE shared;
        } aux;
        VALUE *ptr;
    };
In our Ruby mappings, which let us work with this structure, it is defined in basically the same way using LLVM data types. The aux union becomes a single LONG, because both of its members are one machine word wide:
RARRAY = Type::struct([RBASIC, LONG, LONG, P_VALUE])
Easy!
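Because every field in that struct is exactly one machine word, the layout is easy to work out by hand. The sketch below computes the byte offsets for 32-bit and 64-bit words in plain Ruby. word_layout and the field names are made up for the example, and it assumes that long is as wide as a pointer (true on Linux) and that RBasic consists of a flags word followed by a klass word.

```ruby
# Lay out one-word fields back to back and report their byte offsets.
# No padding is needed because every field has the same size and alignment.
def word_layout(fields, word_bytes)
  offsets = {}
  fields.each_with_index { |name, i| offsets[name] = i * word_bytes }
  offsets[:size] = fields.size * word_bytes
  offsets
end

# RBasic contributes flags and klass; aux is the capa/shared union.
RARRAY_FIELDS = [:flags, :klass, :len, :aux, :ptr]

layout64 = word_layout(RARRAY_FIELDS, 8)
layout32 = word_layout(RARRAY_FIELDS, 4)

puts "64-bit: ptr at byte #{layout64[:ptr]}, struct is #{layout64[:size]} bytes"
puts "32-bit: ptr at byte #{layout32[:ptr]}, struct is #{layout32[:size]} bytes"
```

The field order is the same in both cases and only the word size changes, which is why a single MACHINE_WORD type is enough to describe both interpreters.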
what does llvmruby look like?
LLVMRuby is a pretty straightforward wrapper around the C++ API for generating LLVM bytecode and JIT compiling it. The LLVM API is both very well designed and very object oriented, so it maps naturally into Ruby. Here is a simple example of using LLVMRuby to construct and JIT compile a function which manipulates native Ruby objects.
    require 'llvm'

    include LLVM
    include RubyInternals

    class Builder
      include RubyHelpers
    end

    m = LLVM::Module.new('ruby_bindings_examples')
    ExecutionEngine.get(m)

    def ftype(ret_type, arg_types)
      Type.function(ret_type, arg_types)
    end

    rb_ary_new = m.external_function('rb_ary_new', ftype(VALUE, []))
    rb_to_id = m.external_function('rb_to_id', ftype(VALUE, [VALUE]))
    rb_ivar_get = m.external_function('rb_ivar_get', ftype(VALUE, [VALUE, ID]))
    rb_ivar_set = m.external_function('rb_ivar_set', ftype(VALUE, [VALUE, ID, VALUE]))

    class TestClass
      def initialize
        @shaka = 'khan'
      end
    end

    test_instance = TestClass.new

    # take an object and an instance variable symbol, return value of instance variable
    type = Type.function(VALUE, [VALUE, VALUE])
    f = m.get_or_insert_function('shakula', type)
    obj, ivar_sym = f.arguments
    b = f.create_block.builder
    new_ary = b.call(rb_ary_new)
    ivar_id = b.call(rb_to_id, ivar_sym)
    ret_val = b.call(rb_ivar_get, obj, ivar_id)
    b.return(ret_val)

    ret = ExecutionEngine.run_function(f, test_instance, :@shaka)
    puts "get instance variable @shaka: #{ret.inspect}"
llvmruby is fun
I recently wrote an extension for Ruby which allows you to use the LLVM compiler infrastructure from inside the Ruby interpreter. While still in development, it already supports enough of the LLVM API to write interesting programs. I have gotten the library to the point where it can interact with the native internals of the interpreter, meaning that it is possible to create the equivalent of C Ruby extensions from within Ruby itself and just-in-time (JIT) compile them.
I think you will agree that generating abstract assembler with Ruby is much more fun than generating it from C++.
The project now lives on GitHub: http://github.com/tombagby/llvmruby
I have developed and used it only on my home computer, which is a 64-bit Linux machine. I imagine that it errors hilariously on 32-bit machines. It does, however, have a nice extconf and should build in a nice, standard way. I am very interested to hear of your build problems with it on different platforms so that I can fix them!
Share and enjoy.