Hydra VM

Developer notes.


From here on, I use the term "objectified" to mean that we are dealing with the encapsulated state of some object. HydraVM encapsulates the full interpreter state in a single object. This allows it to run as many interpreter instances as needed, each on its own native thread. Most VM functions (and all primitives) now require an additional argument: the interpreter instance.
Objectified plugins are plugins that are ready to work with the new VM: fully compatible and thread-safe (bugs aside ;). Legacy plugins use function wrappers to call VM functions. Note that HydraVM is designed to support old plugins, especially external ones. You can use old plugins with the new VM, but only in the main (first) interpreter instance, which runs in the main thread of the OS process.

Functionality and compatibility

HydraVM is fully compatible with any Squeak image (meaning you can load and run any Squeak image with it, I hope). If your image uses only the basic set of primitives (those provided by the VM and by the core plugins), you can run it in a parallel thread without risk of crashing the VM.
Plugins that are not prepared for the new VM are limited to the main interpreter instance, which runs in the main process thread.

Here is the list of plugins that currently support the new VM and are aware that they can run on many native threads with different interpreter instances:


All primitives implemented by the VM itself are thread-safe by default.


Currently, HydraVM runs only on the Win32 platform. To make it run on other platforms, the platform-specific code has to be changed to make it compatible with the new interpreter.

There are no publicly available binaries for download yet, so you need to download the sources and build the VM yourself.

For this you need to download the VMMaker package located here:

location: 'http://squeaksource.com/HydraVM'
user: '' password: ''

The platform-specific sources can be downloaded from the SVN repository at http://squeakvm.org/svn/squeak/branches/HydraVM

To unlock the new VM functionality (loading and running a new interpreter, and communicating between images), you have to load the HydraVM package, located in the same repository, into your development image.

Interpreter changes

Interpreter class>>objectifyClass

should return the class that the code generator queries for the various options used to generate objectified sources.

This includes the following methods:

#structTypeName - returns a string naming the objectified Interpreter struct.
#structObjectName - returns the parameter name used in functions for the pointer to the Interpreter instance.
#publicMethods - returns a collection of selectors which should be kept public (used outside of the generated sources) and thus should not be removed from the source. The public methods list is important for generating the list of function prototypes (interp_prototypes.h), which can then be included in plugins or platform code to get access to public VM functions. If you want a new function/method to be available to plugins, you should include its selector in publicMethods.

#objectifiedExternalCalls - a list of function names which require an Interpreter instance as the first argument. For slang code like 'self foo: argument', where foo is an undefined method, the CodeGenerator chooses the appropriate call format, foo(argument) or foo(intr, argument), depending on whether the selector appears in #objectifiedExternalCalls.

#isPrimitive: - returns true if the given selector is a primitive (used only for Interpreter source generation).
#isGlobalVariableName: - by returning true, the given instance variable will be declared globally; returning false means the following:
- for the interpreter, the given variable will be included in the objectified struct
- for a plugin, the given variable will be included in the attached plugin state struct

#addExtraDefinitionsTo: cCodeGenerator
- should return a string containing extra definitions (typedefs/defines), which will be included in the source body just after the basic #include directives.
Available for both Interpreter and plugin code generation.

The code generator automatically determines which methods require an Interpreter instance as an argument and which do not, by checking whether a given method accesses objectified state variables or calls methods that require an Interpreter instance as an argument.
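
The two call formats can be pictured with a small C sketch. Everything named here (the struct field, both helper functions and generatedMethod) is hypothetical and only illustrates the shape of the output; the real generated sources will differ.

```c
typedef long sqInt;                 /* assumption: stand-in for the VM's sqInt */

struct Interpreter {                /* hypothetical: the real struct is generated */
    sqInt successFlag;
};

/* Touches no interpreter state, so it keeps the short format: foo(argument). */
static sqInt plainHelper(sqInt argument)
{
    return argument * 2;
}

/* Reads objectified state, so it needs the instance: foo(intr, argument). */
static sqInt objectifiedHelper(struct Interpreter *intr, sqInt argument)
{
    return intr->successFlag ? argument : 0;
}

/* Calls a function that needs the instance, so it needs the instance too. */
static sqInt generatedMethod(struct Interpreter *intr, sqInt argument)
{
    return plainHelper(argument) + objectifiedHelper(intr, argument);
}
```

Note how the requirement propagates: generatedMethod touches no state itself, but it takes the instance because one of its callees does.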

Because we now need additional information about which methods use an Interpreter instance, and can't rely on static function declarations, the code generator generates additional header files to make sure that functions are called with the proper arguments in manually written sources (such as platform-specific code).
The file 'interp_prototypes.h' contains the list of all publicly available functions which can be used by platform code.
Note that because it is generated automatically, all you need to do is include this file. Don't forget to remove any forward function declarations from your sources; otherwise you risk calling a function with the wrong number of arguments (the compiler sometimes fails to detect this if you do not rebuild ALL sources after a change).

Objectified plugin generation notes:

If a plugin uses the objectified VM, it should return true from the #supportsObjectifiedVM class method.
Objectified plugins don't use the interpreterProxy variable; instead, they use VM functions directly.
Now, 'interpreterProxy foo' in slang source means that you are calling a VM function, and the code generator automatically determines whether this function requires an Interpreter instance as an argument by referring to the previously generated interpreter source. So make sure that you generate fresh Interpreter sources before generating plugin sources.

For each call to 'interpreterProxy foo', the code generator now emits vmFunction(foo) in the source, where vmFunction is a macro.
For internal plugins, this macro expands to just the function name, so internal plugins call VM functions directly.
For external plugins, this macro expands to an indirect call through a struct VMFunctions variable, which is generated automatically and populated with the proper function pointers when the VM first calls the setInterpreter() function at plugin initialization.

This means that plugins are now free to use any public VM function. Its availability is determined automatically at the plugin initialization stage, so if some function is not available, the plugin simply fails to initialize.
You are no longer constrained to a matching InterpreterProxy struct and VM version, as you were with the old VM.
In fact, the new InterpreterProxy struct contains only 3 functions, and plugins use it only in a single function, setInterpreter(), to check the VM version and to grab VM function pointers identified by name. See sqVirtualMachine.h, where it is declared.
Make sure to regenerate the plugin source when you move a plugin from internal to external and vice versa.
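
A minimal sketch of how such a macro could work, assuming a by-name lookup. The names exampleVmCall, lookupVmFunctionByName, populateVmFunctions and SKETCH_EXTERNAL_PLUGIN are hypothetical; the real macro, struct layout and lookup live in the generated sources and sqVirtualMachine.h and may look different.

```c
#include <stddef.h>
#include <string.h>

typedef long sqInt;                        /* assumption: stand-in for sqInt */
struct Interpreter { sqInt lastResult; };  /* hypothetical interpreter state */

/* A hypothetical public VM function. */
static sqInt exampleVmCall(struct Interpreter *intr, sqInt value)
{
    intr->lastResult = value + 1;
    return intr->lastResult;
}

typedef void (*genericFn)(void);
typedef sqInt (*exampleVmCallFn)(struct Interpreter *, sqInt);

/* Hypothetical by-name lookup, standing in for what the VM offers to plugins. */
static genericFn lookupVmFunctionByName(const char *name)
{
    if (strcmp(name, "exampleVmCall") == 0)
        return (genericFn)exampleVmCall;
    return NULL;
}

/* One pointer per VM function the plugin uses (external plugins only). */
struct VMFunctions { exampleVmCallFn exampleVmCall; };
static struct VMFunctions vmFunctions;

/* Returns 1 if every needed function was found, 0 if the plugin must fail. */
static int populateVmFunctions(void)
{
    vmFunctions.exampleVmCall =
        (exampleVmCallFn)lookupVmFunctionByName("exampleVmCall");
    return vmFunctions.exampleVmCall != NULL;
}

#ifdef SKETCH_EXTERNAL_PLUGIN
#define vmFunction(name) (vmFunctions.name)  /* indirect call through the table */
#else
#define vmFunction(name) name                /* direct call */
#endif

/* A primitive written once; the macro decides how the call is made. */
static sqInt examplePrimitive(struct Interpreter *intr)
{
    return vmFunction(exampleVmCall)(intr, 41);
}
```

The primitive body is identical in both builds, which is why the plugin source has to be regenerated (or at least recompiled with the other macro definition) when a plugin changes between internal and external.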

Plugins and attached states.

Some plugins need to keep additional state per Interpreter instance to work properly. This means that primitives running in different interpreter instances should access different state. For this purpose, HydraVM provides an interface for managing attached states.
All instance variables declared in a plugin that are not global (#isGlobalVariableName: returns false for the given variable name) are now kept in a separate struct called PluginState.

When initializing, the plugin calls #attachStateBuffer:initializeFn:finalizeFn:, which returns an attached state id for use by the plugin.
The code generator automatically generates the code for using attached state when the plugin has any instance variables declared as non-global. If the plugin has #initializePluginState and #finalizePluginState methods, they are used to initialize and finalize the per-interpreter plugin states.

You are free to attach as many state buffers as you need. Each interpreter instance has a separate attached state buffer for the same id.
To add an attached state, use #attachStateBuffer:initializeFn:finalizeFn:. To access an attached state, use the getAttachedStateBuffer(..) VM function.
You can look at the win32 socket plugin and platform code to see how attached states are used.
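
The idea of "one id, a separate buffer per interpreter" can be modelled in a few lines of C. This is only a sketch under assumptions: the sketch* functions, the fixed-size table and the lazy allocation are all hypothetical and are not the real HydraVM interface, whose signatures are not given in these notes.

```c
#include <stdlib.h>

#define MAX_ATTACHED_STATES 8

typedef long sqInt;                 /* assumption: stand-in for sqInt */
struct Interpreter;
typedef void (*stateFn)(struct Interpreter *intr, void *state);

/* Hypothetical: each interpreter holds one buffer slot per state id. */
struct Interpreter { void *attachedStates[MAX_ATTACHED_STATES]; };

struct stateDescriptor { size_t size; stateFn initializeFn; stateFn finalizeFn; };
static struct stateDescriptor descriptors[MAX_ATTACHED_STATES];
static int descriptorCount = 0;

/* Registers a state description once; returns its id, or -1 when full. */
static int sketchAttachStateBuffer(size_t size, stateFn init, stateFn fin)
{
    if (descriptorCount >= MAX_ATTACHED_STATES)
        return -1;
    descriptors[descriptorCount] = (struct stateDescriptor){ size, init, fin };
    return descriptorCount++;
}

/* Returns this interpreter's buffer for the id, creating it on first use. */
static void *sketchGetAttachedStateBuffer(struct Interpreter *intr, int id)
{
    if (id < 0 || id >= descriptorCount)
        return NULL;
    if (intr->attachedStates[id] == NULL) {
        void *buffer = calloc(1, descriptors[id].size);
        if (buffer == NULL)
            return NULL;
        if (descriptors[id].initializeFn != NULL)
            descriptors[id].initializeFn(intr, buffer);
        intr->attachedStates[id] = buffer;
    }
    return intr->attachedStates[id];
}

/* Finalizes and frees every buffer owned by one interpreter. */
static void sketchReleaseAttachedStates(struct Interpreter *intr)
{
    for (int id = 0; id < descriptorCount; id++) {
        void *buffer = intr->attachedStates[id];
        if (buffer == NULL)
            continue;
        if (descriptors[id].finalizeFn != NULL)
            descriptors[id].finalizeFn(intr, buffer);
        free(buffer);
        intr->attachedStates[id] = NULL;
    }
}

/* Example plugin state with a hypothetical initializer. */
struct PluginState { sqInt counter; };

static void examplePluginInitialize(struct Interpreter *intr, void *state)
{
    (void)intr;
    ((struct PluginState *)state)->counter = 100;
}
```

A plugin would register its state once, keep the returned id, and then ask for the buffer inside each primitive using the interpreter instance it was called with.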

Some notes about legacy primitives and how they work

Legacy (or, one might say, original) primitives are primitives which have not been converted for use with HydraVM and are not aware that they are working in a multi-threaded environment.
The main difference between old and new primitives is that new primitives should use the following function format:
sqInt (*) (struct Interpreter *);

while old primitives use this one:

sqInt (*) (void);

One nuance that I found during the initial stages of developing the new VM is the cdecl calling convention, which let me make far fewer changes to the primitive invocation mechanism.
The cdecl calling convention is really useful in that you can call a function with more arguments than it expects. So it actually works this way:
all primitives are called with the interpreter instance argument.
New primitive functions are aware of that and put this argument to use, while old primitives simply don't see it, but still work without stack corruption.
This is because, under the cdecl calling convention, stack cleanup is the responsibility of the caller, not the callee.
So, where cdecl is in use, it allows me to call all primitives in a uniform way without risking stack corruption!

Risks that we take when working with legacy primitives:
As you already know, from the VM side there is no difference between calling a legacy primitive and calling a new one.
The check for whether a primitive is able to run in a non-main interpreter is placed inside the InterpreterProxy functions, which simply set the success flag to false, so the primitive fails if it is called from a non-main interpreter.
The risk here is that, before calling any VM function (such as fetching an argument from the stack), some primitives can do something that may lead to unpredictable behavior.
There is also a risk that a primitive uses the result returned from a function before checking the success flag. But this most likely indicates that the plugin code contains bugs, which should be fixed :)
Ideally, the VM should know whether it is calling a legacy primitive, check whether it is about to call it for a non-main interpreter instance, and if so, not call it at all and report failure instead. This would eliminate any chance that a legacy primitive containing code incompatible with the new VM crashes it.
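
The ideal behaviour described in the last point could look roughly like the sketch below. It is not how HydraVM currently works: the isLegacy flag, the primitiveEntry table entry, the isMain field and callPrimitive are all hypothetical, and the sketch keeps the two function types apart instead of relying on the cdecl trick.

```c
typedef long sqInt;                 /* assumption: stand-in for sqInt */

struct Interpreter {                /* hypothetical fields */
    int isMain;                     /* nonzero for the main (first) instance */
    sqInt successFlag;
};

typedef sqInt (*objectifiedPrimitiveFn)(struct Interpreter *);
typedef sqInt (*legacyPrimitiveFn)(void);

/* Hypothetical table entry that records which kind of primitive it holds. */
struct primitiveEntry {
    int isLegacy;
    union {
        objectifiedPrimitiveFn objectified;
        legacyPrimitiveFn legacy;
    } fn;
};

/* Refuses to enter a legacy primitive on behalf of a non-main interpreter. */
static sqInt callPrimitive(struct Interpreter *intr,
                           const struct primitiveEntry *entry)
{
    if (entry->isLegacy) {
        if (!intr->isMain) {
            intr->successFlag = 0;  /* report failure, never run the code */
            return 0;
        }
        return entry->fn.legacy();
    }
    return entry->fn.objectified(intr);
}

/* Two example primitives for trying the dispatcher out. */
static int legacyCallCount = 0;

static sqInt exampleLegacyPrimitive(void)
{
    legacyCallCount++;
    return 7;
}

static sqInt exampleObjectifiedPrimitive(struct Interpreter *intr)
{
    (void)intr;
    return 9;
}

static const struct primitiveEntry legacyEntry =
    { 1, { .legacy = exampleLegacyPrimitive } };
static const struct primitiveEntry objectifiedEntry =
    { 0, { .objectified = exampleObjectifiedPrimitive } };
```

With a check like this, a legacy primitive never gets the chance to do anything before the failure is reported, which removes the first risk listed above.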

Events in HydraVM

An event in HydraVM can be represented by any structure; only two
fields are mandatory:

typedef struct vmEvent {
       struct vmEvent * volatile next;
       eventFnPtr fn;
} vmEvent;

The first field, next, is used to link events into a queue;
the second one is a function pointer of the form:

typedef sqInt (*eventFnPtr)(struct Interpreter*, struct vmEvent*);

Any additional event payload is implementation specific. For
instance, for channels I generate events which consist of the
destination channel information plus a data buffer. So, an event
holds everything in itself.

Each Interpreter instance has its own event queue. The event queue
implementation consists of 3 platform-specific functions:

void ioInitEventQueue(struct vmEventQueue * queue);
void ioEnqueueEventInto(struct vmEvent * event , struct vmEventQueue * queue);
struct vmEvent * ioDequeueEventFrom(struct vmEventQueue * queue);

The requirement for the enqueue/dequeue implementations is
simple: they should be atomic.
On Windows I'm using InterlockedCompareExchange(); on other
x86-based platforms, the equivalent is the 'lock cmpxchg' asm
instruction.
To support other architectures, which may not have an atomic CAS
(compare-and-swap) instruction, the vmEventQueue struct may need to
be changed to hold additional information, such as a mutex handle,
to ensure that enqueue/dequeue operations are thread-safe.
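
As an illustration of the atomicity requirement, here is a minimal portable sketch of the three functions using C11 atomics instead of InterlockedCompareExchange(). The layout of struct vmEventQueue is an assumption (it is not shown in these notes), and the sketch assumes many producer threads but a single consumer, the interpreter that owns the queue. It is not the actual Win32 implementation.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef long sqInt;                 /* assumption: stand-in for sqInt */
struct Interpreter;
struct vmEvent;
typedef sqInt (*eventFnPtr)(struct Interpreter *, struct vmEvent *);

typedef struct vmEvent {
    struct vmEvent * volatile next;
    eventFnPtr fn;
} vmEvent;

/* Assumed layout: a shared push list plus a consumer-private FIFO list. */
struct vmEventQueue {
    _Atomic(struct vmEvent *) incoming; /* producers push here, newest first */
    struct vmEvent *pending;            /* consumer only, oldest first */
};

void ioInitEventQueue(struct vmEventQueue *queue)
{
    atomic_init(&queue->incoming, (struct vmEvent *)NULL);
    queue->pending = NULL;
}

/* Safe to call from any thread: retries the CAS until the push succeeds. */
void ioEnqueueEventInto(struct vmEvent *event, struct vmEventQueue *queue)
{
    struct vmEvent *head =
        atomic_load_explicit(&queue->incoming, memory_order_relaxed);
    do {
        event->next = head;         /* head is refreshed by a failed CAS */
    } while (!atomic_compare_exchange_weak_explicit(
                 &queue->incoming, &head, event,
                 memory_order_release, memory_order_relaxed));
}

/* Consumer side only: returns the oldest event, or NULL if there is none. */
struct vmEvent *ioDequeueEventFrom(struct vmEventQueue *queue)
{
    if (queue->pending == NULL) {
        /* Take the whole push list at once, then reverse it into FIFO order. */
        struct vmEvent *taken = atomic_exchange_explicit(
            &queue->incoming, (struct vmEvent *)NULL, memory_order_acquire);
        while (taken != NULL) {
            struct vmEvent *next = taken->next;
            taken->next = queue->pending;
            queue->pending = taken;
            taken = next;
        }
    }
    struct vmEvent *event = queue->pending;
    if (event != NULL) {
        queue->pending = event->next;
        event->next = NULL;
    }
    return event;
}
```

Taking the whole list with one atomic exchange keeps the consumer away from the usual pitfalls of popping single nodes with CAS. On an architecture without CAS, the same three functions could instead lock a mutex stored in the queue struct.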

If you still haven't caught how events work, here is some additional
explanation: since events are thread-safe, you can generate an event
from any native thread, and don't need to take any additional steps
to synchronize with the VM/Interpreter instance. This is used, in
particular, in SocketPlugin to signal semaphores when a socket (which
is served by a separate native thread) changes its state.

About the event handling function:
this is the function which is called when the interpreter
interrupts to handle events, so in this function you have
synchronized access to object memory, interpreter state, etc., and
don't have to worry about concurrency.
Also, the function together with the event payload is very convenient
for determining the context: what the event means and what it will do.
Instead of defining dozens of event types, enumerating them, and then
writing case statements, the VM does a simple dispatch,
event->fn(interpreter, event); so the system is flexible and can
handle events of any kind, doing anything you want.

As an example, suppose you want to write a plugin which needs to post
events to the interpreter, but with your own custom handling code and
your own event payload.
So, declare an event of the form:

typedef struct myEvent {
 struct vmEvent header;  /* first, so a myEvent * is also a vmEvent * */
 int myField1;
 int myField2;
} myEvent;

Now, to post an event, we can simply do:

myEvent * event = malloc(sizeof(myEvent));
event->header.fn = myHandler;
event->myField1 = ...

Now, a handler function:

sqInt myHandler(struct Interpreter * intr, struct vmEvent * event)
{
  myEvent * evt = (myEvent *) event;  /* get our payload back */
  ... do something nasty here, knowing that you can't be trapped by
  concurrency issues :) ...

  free(evt); // release the memory allocated for the event
}
Known bugs

- (fixed) minimising the Squeak window puts the VM into an infinite loop, so it stops responding to user input and redrawing the screen.
- when running a secondary image, the VM can crash during window resizing. This error can be reproduced by doing 'DisplayScreen startUp' repeatedly.

Wish list and further development


If you have any comments or questions, you can contact me (Igor Stasenko, via email: siguctua at gmail.com) or Andreas Raab. Also, feel free to post your comments/questions on the squeak-dev mailing list <squeak-dev@lists.squeakfoundation.org> if you wish to discuss them with a broader audience.