llvmruby now available in gem format
Thanks to Christian Plessl getting gemification started, llvmruby is now available as a gem install from github. LLVM itself is still a separate install and build. This will probably remain true as LLVM is huge and takes forever to compile. Including it with every version of a gem seems ridiculous. The other good news is that LLVM 2.4 is due out at the end of the month at which point we will be targeting a real non-development version!
To install as a gem, you need to be able to install gems from github.
- Upgrade to at least version 1.2 of RubyGems
- Add github as a gem source: gem sources -a http://gems.github.com
- Make sure that you have built LLVM, that you built it with --enable-pic, and that llvm-config is in your path
- gem install tombagby-llvmruby
Now just:
    require 'rubygems'
    require 'llvm'
And you are good to go!
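If the install or the require fails, one thing worth checking is whether llvm-config is actually reachable from your path, as the steps above require. A minimal sketch of such a check in plain Ruby follows; the helper name find_on_path is hypothetical and not part of llvmruby or RubyGems.

```ruby
# Hypothetical helper (not part of llvmruby): return the full path of an
# executable found in one of the PATH directories, or nil if there is none.
def find_on_path(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

llvm_config = find_on_path('llvm-config')
if llvm_config
  puts "found llvm-config at #{llvm_config}"
else
  puts 'llvm-config is not on PATH; build LLVM and add its bin directory to PATH first'
end
```

This only confirms that the tool can be found. It says nothing about whether LLVM was built with --enable-pic.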
recent developments
I’ve been busy with my day job and not so busy on llvmruby. I got more interested in writing a static compiler for a type-inferred subset of Ruby, à la RPython, than in the YARV/JIT work I was doing, so I got a little distracted. Fortunately, open-source software lets others move in and pick up the slack, and today I noticed a new yarv2llvm project on github which looks like it has already yielded some interesting results.
As far as development on llvmruby itself, I added more complete floating point and casting support in response to some requests from Christoffer Lernö who was also kind enough to write some preliminary documentation. I also received some nice patches from Christian Plessl and Marc-Andre Cournoyer. It’s very satisfying to see that kind of support from the community.
I wish it was easier to communicate with the Japanese side of the Ruby community as I’ve noticed some interesting things like this (seemingly successful) effort to get llvmruby to work on Windows under Cygwin. Time to try contacting and crossing the language barrier I guess.
compiling a standalone binary
I added support for function types with variable arguments thus allowing us to call printf as an external function. This is very important as it allows us to tell the world how much we like grapes. Here is how you would compile a standalone executable that states your feelings about grapes:
    require 'llvm'
    include LLVM

    m = LLVM::Module.new('grapes')
    ExecutionEngine.get(m)

    char_star = Type.pointer(Type::Int8Ty)
    main_type = Type.function(Type::Int32Ty, [
      Type::Int32Ty,
      Type.pointer(char_star)
    ])

    ftype = Type.function(Type::Int32Ty, [char_star], true)
    printf = m.external_function('printf', ftype)

    main = m.get_or_insert_function('main', main_type)
    b = main.create_block.builder
    grapes_str = b.create_global_string_ptr("I LIKE GRAPES!\n")
    b.call(printf, grapes_str)
    b.return(0.llvm(Type::Int32Ty))

    puts m.inspect
    m.write_bitcode("main.o")
What happens when you run this program? First, you will see the resulting llvm code on the console:
    ; ModuleID = 'grapes'
    internal constant [16 x i8] c"I LIKE GRAPES!\0A\00"    ; <[16 x i8]*>:0 [#uses=1]

    declare void @abort()

    declare i32 @printf(i8*, ...)

    define i32 @main(i32, i8**) {
    bb:
        call i32 (i8*, ...)* @printf( i8* getelementptr ([16 x i8]* @0, i32 0, i32 0) )    ; <i32>:2 [#uses=0]
        ret i32 0
    }
This all got turned into bitcode and saved in the file “main.o”. Now all you have to do is link it with the command:
llvm-ld --native main.o
Now you have a lovely a.out file which is a little native binary bursting with grape love:
    $ ./a.out
    I LIKE GRAPES!
writing a YARVish bytecode compiler in ruby
Since llvmruby lets us write native compiled things that interact with Ruby interpreter values, an obvious thing to do with it is to write a compiler for the bytecode used in Ruby 1.9, i.e. YARV. This is the YARV instruction set and if you click through to an opcode definition, you will see the C code used to execute that opcode. For example, the opt_lt instruction is implemented with this C code:
    if (FIXNUM_2_P(recv, obj) &&
        BASIC_OP_UNREDEFINED_P(BOP_LT)) {
        long a = FIX2LONG(recv), b = FIX2LONG(obj);

        if (a < b) {
            val = Qtrue;
        } else {
            val = Qfalse;
        }
    } else {
        PUSH(recv);
        PUSH(obj);
        CALL_SIMPLE_METHOD(1, idLT, recv);
    }
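In my LLVM-based compiler, which is woefully incomplete and for now just assumes that the incoming arguments are Fixnums, the opcode is implemented this way:
    when :opt_lt
      obj = b.pop                          # right operand
      recv = b.pop                         # left operand (the receiver)
      x = b.fix2int(recv)                  # untag both Fixnums
      y = b.fix2int(obj)
      # NOTE: sle is LLVM's signed "less than or equal" predicate; a strict
      # opt_lt would call for the signed "less than" (slt) comparison instead.
      val = b.icmp_sle(x, y)
      val = b.int_cast(val, LONG, false)   # widen the one-bit result to a machine word
      val = b.mul(val, 2.llvm)             # 0 -> Qfalse (0), 1 -> Qtrue (2)
      b.push(val)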
This bit of assembly avoids jumps by knowing that Ruby represents Qtrue and Qfalse as 2 and 0 respectively, so the single bit 0 or 1 result of the cmp instruction is simply expanded and multiplied by 2.
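To make the arithmetic concrete, here is a minimal plain-Ruby model of the same trick. It assumes the Ruby 1.8/1.9 encoding in which a Fixnum n is stored as (n << 1) | 1. The method names int2fix, fix2int and opt_lt are illustrative stand-ins, not llvmruby calls, and the ternary merely stands in for the one-bit result of the compare instruction.

```ruby
# Plain-Ruby model of the tagged-integer arithmetic (Ruby 1.8/1.9 encoding).
QFALSE = 0
QTRUE  = 2

# A Fixnum n is stored in a VALUE as (n << 1) | 1.
def int2fix(n)
  (n << 1) | 1
end

# Inverse: an arithmetic shift right drops the tag bit.
def fix2int(value)
  value >> 1
end

# "Less than" without choosing between two constants: the comparison yields
# 0 or 1, and multiplying by 2 turns that into Qfalse or Qtrue.
def opt_lt(recv, obj)
  bit = fix2int(recv) < fix2int(obj) ? 1 : 0 # stands in for the one-bit compare result
  bit * 2
end

puts opt_lt(int2fix(1), int2fix(2)) == QTRUE  # 1 < 2
puts opt_lt(int2fix(2), int2fix(2)) == QFALSE # 2 < 2 is false
```

Both lines print true. The model uses a strict less-than, which is what opt_lt means in YARV.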
I’ve implemented a small and growing subset of the YARV instruction set in ruby_vm.rb, and there are some tests in test_ruby_vm.rb showing some small example functions put together with this YARVish assembly. For example the sequence:
    bytecode = [
      [:newarray],
      [:dup],
      [:putobject, LLVM::Value.get_immediate_constant(0)],
      [:putobject, LLVM::Value.get_immediate_constant('shaka')],
      [:opt_aset],
      [:pop]
    ]
creates a new array and sets the first element of the array to a string. Since the guts of using LLVM from Ruby have been basically finished, I am now mostly working on translating the YARV instruction set into Ruby.
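To see what that sequence does to the operand stack, here is a toy interpreter for just those five opcodes in plain Ruby. It is a model for illustration only, not the llvmruby compiler in ruby_vm.rb. The method name run_yarvish is made up, and plain Ruby objects take the place of the LLVM::Value immediates.

```ruby
# Toy stack interpreter for the five opcodes used above.
# A model for illustration, not the llvmruby compiler.
def run_yarvish(bytecode)
  stack = []
  bytecode.each do |op, operand|
    case op
    when :newarray  then stack.push([])
    when :dup       then stack.push(stack.last)
    when :putobject then stack.push(operand)
    when :opt_aset
      value = stack.pop
      index = stack.pop
      recv  = stack.pop
      recv[index] = value
      stack.push(value) # the assignment leaves its value on the stack
    when :pop       then stack.pop
    else raise ArgumentError, "unsupported opcode: #{op.inspect}"
    end
  end
  stack
end

bytecode = [
  [:newarray],
  [:dup],
  [:putobject, 0],
  [:putobject, 'shaka'],
  [:opt_aset],
  [:pop]
]

p run_yarvish(bytecode) # prints [["shaka"]]
```

The dup keeps a second reference to the array on the stack, so after opt_aset consumes one reference and pop discards the assigned value, the array itself is what remains.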
support for 32-bit and 64-bit ruby interpreters
Tonight, I implemented and tested support for 32-bit Ruby. How does this work?
    // Figure out details of the target machine
    const IntegerType *machine_word_type;
    if(sizeof(void*) == 4) {
      machine_word_type = Type::Int32Ty;
    } else {
      machine_word_type = Type::Int64Ty;
    }
    rb_define_const(cLLVMRuby, "MACHINE_WORD",
                    Data_Wrap_Struct(cLLVMType, NULL, NULL, const_cast<IntegerType*>(machine_word_type)));
First the extension creates a MACHINE_WORD type based on the pointer size of the machine.
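The same decision can be mirrored from plain Ruby as a quick sanity check. This sketch is not part of llvmruby. It assumes a Ruby recent enough to support the 'J' pack directive (a pointer-width unsigned integer), and the constant names are made up for the example.

```ruby
# Size in bytes of a pointer-width integer, via the 'J' pack directive.
WORD_BYTES = [0].pack('J').bytesize

# Mirror the extension's choice: 4-byte pointers get Int32Ty, anything else Int64Ty.
MACHINE_WORD_NAME = WORD_BYTES == 4 ? :Int32Ty : :Int64Ty

puts "pointer size: #{WORD_BYTES} bytes, would pick Type::#{MACHINE_WORD_NAME}"
```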
    # describe structures used by the ruby 1.8/1.9 interpreters
    module RubyInternals
      FIXNUM_FLAG = 0x1.llvm
      CHAR = Type::Int8Ty
      P_CHAR = Type::pointer(CHAR)
      LONG = MACHINE_WORD
      VALUE = MACHINE_WORD
      P_VALUE = Type::pointer(VALUE)
      ID = MACHINE_WORD
      RBASIC = Type::struct([VALUE, VALUE])
      RARRAY = Type::struct([RBASIC, LONG, LONG, P_VALUE])
      P_RARRAY = Type::pointer(RARRAY)
      RSTRING = Type::struct([RBASIC, LONG, P_CHAR, VALUE])
      P_RSTRING = Type::pointer(RSTRING)
    end
All the Ruby data types are defined in the RubyInternals module.
If you are familiar with Ruby internals from having worked on C extensions, you will know that everything is a VALUE, which is one machine word that is either nil, true, false, a Fixnum, or a pointer to a struct somewhere on the heap. This makes our definitions pretty simple; all that changes between machines is the size of this pointer.
You’ll also notice that the native Ruby data types themselves are defined in Ruby code here. For example, an Array in Ruby is represented internally using this struct:
    struct RArray {
        struct RBasic basic;
        long len;
        union {
            long capa;
            VALUE shared;
        } aux;
        VALUE *ptr;
    };
And in our Ruby mappings, which allow us to work on this structure, it’s defined basically the same way using LLVM data types:
RARRAY = Type::struct([RBASIC, LONG, LONG, P_VALUE])
Easy!
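One reason the mapping is this direct is that every field is a whole number of machine words, so the union collapses to a single LONG slot and there is no padding to worry about. A small sketch in plain Ruby computes the resulting byte offsets for both word sizes. The helper word_offsets and the field table are hypothetical, written only for this illustration.

```ruby
# Compute byte offsets for a struct whose fields are all whole machine words.
# fields is a list of [name, size_in_words] pairs.
def word_offsets(fields, word_bytes)
  offset = 0
  fields.each_with_object({}) do |(name, words), table|
    table[name] = offset
    offset += words * word_bytes
  end
end

# RBasic is two VALUEs (flags and klass); the aux union occupies one word.
RARRAY_FIELDS = [[:basic, 2], [:len, 1], [:aux, 1], [:ptr, 1]]

p word_offsets(RARRAY_FIELDS, 8) # 64-bit: basic 0, len 16, aux 24, ptr 32
p word_offsets(RARRAY_FIELDS, 4) # 32-bit: basic 0, len 8, aux 12, ptr 16
```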
what does llvmruby look like?
LLVMRuby is a pretty straightforward wrapper around the C++ API for generating LLVM bitcode and JIT compilation. The LLVM API is both very well designed and very object oriented and so maps pretty straightforwardly into Ruby. Here is a simple example of using LLVMRuby to construct and JIT compile a function which manipulates native Ruby objects.
    require 'llvm'

    include LLVM
    include RubyInternals

    class Builder
      include RubyHelpers
    end

    m = LLVM::Module.new('ruby_bindings_examples')
    ExecutionEngine.get(m)

    def ftype(ret_type, arg_types)
      Type.function(ret_type, arg_types)
    end

    rb_ary_new = m.external_function('rb_ary_new', ftype(VALUE, []))
    rb_to_id = m.external_function('rb_to_id', ftype(VALUE, [VALUE]))
    rb_ivar_get = m.external_function('rb_ivar_get', ftype(VALUE, [VALUE, ID]))
    rb_ivar_set = m.external_function('rb_ivar_set', ftype(VALUE, [VALUE, ID, VALUE]))

    class TestClass
      def initialize
        @shaka = 'khan'
      end
    end
    test_instance = TestClass.new

    # take an object and an instance variable symbol, return value of instance variable
    type = Type.function(VALUE, [VALUE, VALUE])
    f = m.get_or_insert_function('shakula', type)
    obj, ivar_sym = f.arguments
    b = f.create_block.builder
    new_ary = b.call(rb_ary_new)
    ivar_id = b.call(rb_to_id, ivar_sym)
    ret_val = b.call(rb_ivar_get, obj, ivar_id)
    b.return(ret_val)

    ret = ExecutionEngine.run_function(f, test_instance, :@shaka)
    puts "get instance variable @shaka: #{ret.inspect}"
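For comparison, the JIT-compiled shakula function does roughly what the following plain Ruby does, with rb_to_id and rb_ivar_get standing behind instance_variable_get. This is only a reference sketch that runs without llvmruby, reusing the names from the example above.

```ruby
class TestClass
  def initialize
    @shaka = 'khan'
  end
end

# Plain-Ruby counterpart of the compiled function: take an object and an
# instance variable symbol, return the value of that instance variable.
def shakula(obj, ivar_sym)
  obj.instance_variable_get(ivar_sym)
end

ret = shakula(TestClass.new, :@shaka)
puts "get instance variable @shaka: #{ret.inspect}"
```

This prints get instance variable @shaka: "khan".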
llvmruby is fun
I recently wrote an extension for Ruby which allows you to use the LLVM compiler infrastructure from inside the Ruby interpreter. While still in development, it already supports enough of the LLVM API to write interesting programs. I have gotten the library to the point that it is able to interact in interesting ways with the native internals of the interpreter, meaning that it is possible to create the equivalent of C Ruby extensions from within Ruby itself and just-in-time (JIT) compile them.
I think you will agree that generating abstract assembler with Ruby is much more fun than generating it from C++.
The project now lives on GitHub: http://github.com/tombagby/llvmruby
I have developed/used it only on my home computer, which is a 64-bit Linux machine. I imagine that it errors hilariously on 32-bit machines. It does, however, have a nice extconf and should build in a nice/standard way. I am very interested to hear of your build problems with it on different platforms so that I can fix them!
Share and enjoy.