Hydra VM
Developer notes.
Objectified/objectification
From here on, I use the term "objectified" to mean that we are dealing with the
encapsulated state of some object. HydraVM encapsulates the full interpreter state in a
single object. This allows it to run as many interpreter instances as needed, on
different native threads. Most VM functions (and all primitives) now require an
additional argument: the interpreter instance.
Objectified plugins are plugins that are ready to work with the new VM: fully compatible
and thread-safe (bugs aside ;). Legacy plugins use function wrappers to call VM
functions. Note that HydraVM is designed to support old plugins, especially external ones.
You can use old plugins with the new VM, but only from the main (first) interpreter
instance, which runs in the main thread of the OS process.
Functionality and compatibility
HydraVM is fully compatible with any Squeak image: you can load and run any
Squeak image with it (I hope). If your image uses only the basic set of primitives
(those available from the VM and from the core plugins), you can run it in a parallel
thread without risk of crashing the VM.
Plugins that are not prepared for the new VM can be used only from the main interpreter
instance, which runs on the main process thread.
Here is the list of plugins that currently support the new VM and are aware that they may
run on many native threads with different interpreter instances:
BalloonEngineSimulation
FilePluginSimulator
SocketPlugin
FloatMathPlugin
LargeIntegersPlugin
FilePlugin
BalloonEnginePlugin
FloatArrayPlugin
FFIPlugin
UUIDPlugin
All primitives implemented by the VM itself are thread-safe by default.
Availability
Currently, HydraVM runs only on the Win32 platform. To make it run on other platforms,
the platform-specific code has to be changed to be compatible with the new interpreter.
No binaries are publicly available for download yet, so you need to download the sources
and build the VM yourself.
For this you need the VMMaker package, located here:
MCHttpRepository
location: 'http://squeaksource.com/HydraVM'
user: '' password: ''
The platform-specific sources can be downloaded from the SVN repository at
http://squeakvm.org/svn/squeak/branches/HydraVM
To unlock the new VM functionality (loading and running a new interpreter, and
communicating between images) you have to load the HydraVM package, located in the same
repository, into your development image.
Interpreter changes
Interpreter class>>objectifyClass
should return the class that the code generator queries for the options it needs to
generate objectified sources.
This includes the following methods:
#structTypeName - returns the string used to name the objectified Interpreter struct.
#structObjectName - returns the parameter name used in functions for the pointer to the
Interpreter instance.
#publicMethods - returns a collection of selectors that must be kept public (used
outside of the generated sources) and thus must not be removed from the source. The
public methods list is also used to generate the list of function prototypes
(interp_prototypes.h), which can then be included in plugins or platform code to get
access to public VM functions. If you want a new function/method to be available to
plugins, include its selector in publicMethods.
#objectifiedExternalCalls - returns a list of function names that require an Interpreter
instance as the first argument. For slang code like self foo: argument, where foo is an
undefined method, CodeGenerator chooses the appropriate call format, foo(argument) or
foo(intr, argument), depending on whether the selector appears in
#objectifiedExternalCalls.
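To make the two call formats concrete, here is a minimal sketch of what the generated C could look like. The struct contents and the function names addOne and pushValue are hypothetical, chosen only for illustration; they are not taken from the HydraVM sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the generated objectified struct. */
struct Interpreter {
    long stackPointer;
};

/* Assumed NOT to be listed in #objectifiedExternalCalls:
   it touches no interpreter state, so it takes no instance argument. */
long addOne(long value) {
    return value + 1;
}

/* Assumed to be listed in #objectifiedExternalCalls:
   the interpreter instance is passed as the first argument. */
long pushValue(struct Interpreter *intr, long value) {
    assert(intr != NULL);
    intr->stackPointer += 1;
    return value;
}

/* Slang "self addOne: x"    would be emitted as addOne(x).
   Slang "self pushValue: y" would be emitted as pushValue(intr, y). */
long generatedCaller(struct Interpreter *intr, long x) {
    long y = addOne(x);
    return pushValue(intr, y);
}
```

The caller itself receives intr because it calls a function that needs it, which is the same propagation rule the code generator applies to interpreter methods.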
#isPrimitive: - returns true if the given selector is a primitive (used only for
Interpreter source generation).
#isGlobalVariableName: - returning true means the given instance variable is declared
globally; returning false means the following:
- for the interpreter, the variable is included in the objectified struct
- for a plugin, the variable is included in the attached plugin state struct
#addExtraDefinitionsTo: cCodeGenerator
- should return a string containing extra definitions (typedefs/defines), which are
included in the source body just after the basic #include directives.
Available for both Interpreter and plugin code generation.
The code generator automatically determines which methods require an Interpreter instance
as an argument and which do not, by checking whether a method accesses objectified
state variables or calls methods that themselves require an Interpreter instance
as an argument.
Because of this, we can no longer rely on static function declarations to know which
functions take an Interpreter instance. The code generator therefore generates
additional header files, so that functions are called with the proper arguments in
manually written sources (such as platform-specific code).
The file 'interp_prototypes.h' contains the list of all publicly available functions
that can be used by platform code.
Because it is generated automatically, all you need to do is include this file.
And don't forget to remove any forward function declarations from your source; otherwise
you risk calling a function with the wrong number of arguments (the compiler sometimes
fails to detect this if you do not rebuild ALL sources after changes).
Objectified plugin generation notes:
If a plugin uses the objectified VM, it should return true from its
#supportsObjectifiedVM class method.
Objectified plugins don't use the interpreterProxy variable; instead they call VM methods
directly.
Now, 'interpreterProxy foo' in slang source means that you are calling a VM function,
and the code generator automatically determines whether this function requires an
Interpreter instance as an argument by referring to the previously generated interpreter
source. So make sure that you generate fresh Interpreter sources before generating
plugin sources.
For each call to interpreterProxy foo, the code generator now emits
vmFunction(foo) in the source, which is a macro.
For internal plugins, this macro expands to just the function name, so internal plugins
call VM functions directly.
For external plugins, this macro expands to an indirect call through a struct VMFunctions
variable, which is generated automatically and populated with the proper function
pointers when the VM first calls the setInterpreter() function at plugin initialization.
This means that plugins are now free to use any public VM function. Its availability
is determined automatically at the plugin initialization stage, so if some function is
not available, the plugin simply fails to initialize.
You are no longer required to use the proper InterpreterProxy struct
and the proper VM version, as was the case with the old VM.
In fact, the new InterpreterProxy struct contains only 3 functions, and plugins use it
only in a single function, setInterpreter(), to check the VM version
and to grab VM function pointers identified by name. See sqVirtualMachine.h,
where it is declared.
Make sure to regenerate the plugin source when you move a plugin from internal to
external and vice versa.
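The two expansions of the vmFunction macro can be sketched as follows. This is only an illustration under stated assumptions: the switch PLUGIN_IS_INTERNAL, the lookup helper lookupVMFunction, the initializer setInterpreterSketch, and the use of a VM function called failed are all hypothetical stand-ins, not the names or layout used by the generated HydraVM sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct Interpreter { long successFlag; };

/* A VM function used for illustration only. */
long failed(struct Interpreter *intr) {
    assert(intr != NULL);
    return !intr->successFlag;
}

/* Table of VM function pointers, one field per function the plugin uses. */
struct VMFunctions {
    long (*failed)(struct Interpreter *);
};
struct VMFunctions vmFunctions;

#ifdef PLUGIN_IS_INTERNAL            /* hypothetical build switch */
#define vmFunction(name) name               /* direct call */
#else
#define vmFunction(name) vmFunctions.name   /* indirect call through the table */
#endif

/* Hypothetical by-name lookup, standing in for what the VM offers a plugin. */
typedef void (*genericFn)(void);
genericFn lookupVMFunction(const char *name) {
    if (strcmp(name, "failed") == 0) return (genericFn) failed;
    return NULL;
}

/* Sketch of plugin initialization: returns 1 on success and 0 when a
   required VM function is missing, in which case the plugin fails to load. */
int setInterpreterSketch(void) {
    vmFunctions.failed =
        (long (*)(struct Interpreter *)) lookupVMFunction("failed");
    return vmFunctions.failed != NULL;
}

/* What a generated primitive body would contain for "interpreterProxy failed". */
long primitiveExample(struct Interpreter *intr) {
    return vmFunction(failed)(intr);
}
```

Because the macro hides the difference, the same generated primitive body compiles for both cases; only the macro definition changes, which is why the plugin source has to be regenerated or rebuilt when it moves between internal and external.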
Plugins and attached states.
Some plugins need to keep additional state per Interpreter instance to work properly.
This means that primitives running for different interpreter instances should access
different state. For this purpose, HydraVM provides an interface for managing
attached states.
All instance variables declared in a plugin that are not global (#isGlobalVariableName:
returns false for the variable name) are now kept in a separate struct called PluginState.
When initializing, the plugin calls #attachStateBuffer:initializeFn:finalizeFn:,
which returns an attached state id for use by the plugin.
The code generator automatically generates the code for using the attached state when
you have any ivars declared as non-global in the plugin. If the plugin has
#initializePluginState and #finalizePluginState methods, they are used to initialize and
finalize the per-interpreter plugin states.
You are free to attach as many state buffers as you need. Each interpreter instance
has a separate attached state buffer for the same id.
To add an attached state, use #attachStateBuffer:initializeFn:finalizeFn:. To access
an attached state, use the getAttachedStateBuffer(..) VM function.
You can look at the win32 socket plugin and platform code to see how to use attached
states.
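The idea of attached states can be modelled in a few lines of C. The sketch below is a toy model, not the HydraVM implementation: the function names carry a Sketch suffix on purpose, the argument meanings (buffer size, initialize function, finalize function) are guessed from the selector, and the lazy allocation is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_ATTACHED_STATES 8   /* arbitrary limit for this sketch */

struct Interpreter {
    void *attachedStates[MAX_ATTACHED_STATES];  /* one buffer per state id */
};

typedef void (*stateFn)(struct Interpreter *, void *);

/* Global registry: what kind of buffer belongs to each state id. */
struct stateSlot { size_t size; stateFn initializeFn; stateFn finalizeFn; };
struct stateSlot slots[MAX_ATTACHED_STATES];
int slotCount = 0;

/* Registers a new kind of attached state; returns its id, or -1 if full. */
int attachStateBufferSketch(size_t size, stateFn initializeFn, stateFn finalizeFn) {
    if (slotCount >= MAX_ATTACHED_STATES) return -1;
    slots[slotCount].size = size;
    slots[slotCount].initializeFn = initializeFn;
    slots[slotCount].finalizeFn = finalizeFn;
    return slotCount++;
}

/* Returns this interpreter's own buffer for the id, creating it on first use. */
void *getAttachedStateBufferSketch(struct Interpreter *intr, int id) {
    assert(intr != NULL);
    if (id < 0 || id >= slotCount) return NULL;
    if (intr->attachedStates[id] == NULL) {
        void *buffer = calloc(1, slots[id].size);
        if (buffer == NULL) return NULL;
        if (slots[id].initializeFn != NULL) slots[id].initializeFn(intr, buffer);
        intr->attachedStates[id] = buffer;
    }
    return intr->attachedStates[id];
}

/* Finalizes and frees all buffers of one interpreter instance. */
void releaseAttachedStatesSketch(struct Interpreter *intr) {
    for (int id = 0; id < slotCount; id++) {
        void *buffer = intr->attachedStates[id];
        if (buffer == NULL) continue;
        if (slots[id].finalizeFn != NULL) slots[id].finalizeFn(intr, buffer);
        free(buffer);
        intr->attachedStates[id] = NULL;
    }
}

/* Plugin side: the non-global ivars collected into one struct. */
struct PluginState { int openSockets; };

void initializePluginState(struct Interpreter *intr, void *buffer) {
    (void) intr;
    ((struct PluginState *) buffer)->openSockets = 0;
}
```

A primitive would fetch its state with the id obtained at plugin initialization, so two interpreter instances running the same primitive never share the PluginState contents.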
Some notes about legacy primitives and how they work
Legacy (or, one might say, original) primitives are primitives that have not been
converted for use with HydraVM and are not aware that they are running in a
multi-threaded environment.
The main difference between old primitives and new ones is that new primitives
should use the following function format:
sqInt (*) (struct Interpreter *);
while old primitives use this one:
sqInt (*) (void);
One nuance that I found during the initial stages of developing the new VM is the cdecl
calling convention, which let me make far fewer changes to the primitive
invocation mechanism.
The cdecl calling convention is really useful here, because you can call a function with
more arguments than it expects. So it works this way:
all primitives are called with the interpreter instance argument.
New primitive functions are aware of that and use this argument, while
old primitives simply don't see it, yet still work without stack corruption.
This is because under the cdecl calling convention, stack cleanup is the responsibility
of the caller, not the callee.
So, as long as primitives use cdecl, I can call all of them in a uniform way without any
risk of stack corruption!
Risks that we take when working with legacy primitives:
As you already know, from the VM side there is no difference between calling a legacy
primitive and calling a new one.
The check of whether a primitive is able to run for a non-main interpreter is placed
inside the InterpreterProxy functions, which simply set the success flag to false, so
the primitive fails if it is called from a non-main interpreter.
The risk here is that, before calling any VM function (such as fetching an argument from
the stack), some primitives may do something that leads to unpredictable behavior.
There is also a risk that a primitive uses the return value of a function before
checking the success flag. But this would most likely indicate that the plugin code
contains bugs, which should be fixed :)
Ideally, the VM should know whether it is calling a legacy primitive, check whether it
is about to call it for a non-main interpreter instance, and if so, not call it at
all and report failure instead. This would rule out any chance that a legacy primitive
containing code incompatible with the new VM crashes it.
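The ideal check described above could look roughly like this. It is a sketch of a possible design, not existing HydraVM code: the isMain and isLegacy fields, the primitiveEntry struct and callPrimitive are hypothetical. To stay within standard C, the legacy primitive here is given the new signature and simply ignores its argument.

```c
#include <assert.h>
#include <stddef.h>

struct Interpreter {
    int isMain;        /* hypothetical: 1 for the first interpreter instance */
    long successFlag;  /* 0 means the primitive failed */
};

typedef long (*primitiveFn)(struct Interpreter *);

/* Hypothetical table entry: the VM remembers which primitives are legacy. */
struct primitiveEntry {
    primitiveFn fn;
    int isLegacy;
};

/* Refuse to enter a legacy primitive on a non-main interpreter at all,
   instead of relying on a check inside the InterpreterProxy functions. */
long callPrimitive(struct Interpreter *intr, const struct primitiveEntry *entry) {
    assert(intr != NULL && entry != NULL && entry->fn != NULL);
    if (entry->isLegacy && !intr->isMain) {
        intr->successFlag = 0;   /* report failure without running any plugin code */
        return 0;
    }
    return entry->fn(intr);
}

/* Example legacy primitive; it counts how often it was actually entered. */
int legacyCalls = 0;
long legacyPrimitive(struct Interpreter *intr) {
    (void) intr;                 /* a legacy primitive does not use the instance */
    legacyCalls++;
    return 1;
}
```

With this arrangement no legacy code runs on a secondary interpreter, so it no longer matters what the primitive does before its first call into the VM.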
Events in HydraVM
An event in HydraVM can be represented by any structure;
only two fields are mandatory:
typedef struct vmEvent {
struct vmEvent * volatile next;
eventFnPtr fn;
} vmEvent;
The first field, next, is used to link events into a queue;
the second one is a function pointer of the form:
typedef sqInt (*eventFnPtr)(struct Interpreter*, struct vmEvent*);
Any additional event payload is implementation specific. For
instance, for channels I generate events that consist of
the destination channel information plus a data buffer. So, an event holds
everything in itself.
Each Interpreter instance has its own event queue. The event queue
is implemented by 3 platform-specific functions:
void ioInitEventQueue(struct vmEventQueue * queue);
void ioEnqueueEventInto(struct vmEvent * event , struct vmEventQueue * queue);
struct vmEvent * ioDequeueEventFrom(struct vmEventQueue * queue);
The requirement for the enqueue/dequeue function implementations is
simple: they should be atomic.
On Windows I'm using InterlockedCompareExchange(); on other
platforms based on the x86 architecture, the equivalent is the 'lock cmpxchg'
asm instruction.
To support other architectures, which may not have an atomic CAS
(compare-and-swap) instruction, the vmEventQueue struct may need to be
changed to keep additional information, such as a mutex
handle, to ensure that enqueue/dequeue operations are thread-safe.
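As an illustration of a CAS-based queue, here is a portable sketch using C11 atomics instead of InterlockedCompareExchange(). It is not the win32 implementation: all names carry a Sketch suffix, the queue layout is invented for the example, and it assumes that many threads may enqueue while only the owning interpreter thread dequeues.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct Interpreter;
struct vmEventSketch;
typedef long (*eventFnSketch)(struct Interpreter *, struct vmEventSketch *);

struct vmEventSketch {
    struct vmEventSketch *next;
    eventFnSketch fn;
};

struct vmEventQueueSketch {
    _Atomic(struct vmEventSketch *) pending;  /* newest first, shared by producers */
    struct vmEventSketch *ready;              /* oldest first, consumer thread only */
};

void initEventQueueSketch(struct vmEventQueueSketch *queue) {
    assert(queue != NULL);
    atomic_init(&queue->pending, NULL);
    queue->ready = NULL;
}

/* Safe to call from any thread: push with a compare-and-swap retry loop. */
void enqueueEventSketch(struct vmEventSketch *event, struct vmEventQueueSketch *queue) {
    assert(event != NULL && queue != NULL);
    struct vmEventSketch *head = atomic_load(&queue->pending);
    do {
        event->next = head;   /* on CAS failure, head is reloaded for the retry */
    } while (!atomic_compare_exchange_weak(&queue->pending, &head, event));
}

/* Consumer side: grab all pending events at once, then reverse them so that
   events come out in the order they were enqueued. Returns NULL when empty. */
struct vmEventSketch *dequeueEventSketch(struct vmEventQueueSketch *queue) {
    assert(queue != NULL);
    if (queue->ready == NULL) {
        struct vmEventSketch *list = atomic_exchange(&queue->pending, NULL);
        while (list != NULL) {
            struct vmEventSketch *next = list->next;
            list->next = queue->ready;
            queue->ready = list;
            list = next;
        }
    }
    struct vmEventSketch *event = queue->ready;
    if (event != NULL) queue->ready = event->next;
    return event;
}
```

On an architecture without a usable CAS, the same two functions could instead take a mutex stored in the queue struct around a plain linked-list update, which is the change to vmEventQueue mentioned above.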
If you still didn't catch how events work, here is some additional
information:
- since event posting is thread-safe, you can generate an event from any
native thread and don't need to take any additional steps to
synchronize with the VM/Interpreter instance. This is used, in particular,
in SocketPlugin to signal semaphores when a socket (which is served by a
separate native thread) changes its state.
About the event handling function:
this is the function that is called when the interpreter
interrupts to handle events, so in this function you have
synchronized access to object memory, interpreter state, etc., and
don't have to worry about concurrency.
Also, the function together with the event payload is a very convenient way of
determining the context: what the event means and what it will do.
Instead of defining dozens of event types, enumerating them, and then
writing case statements, the interpreter does a simple dispatch,
event->fn(interpreter, event); so the system is flexible and can handle
events of any kind, doing anything you want.
As an example, suppose you want to write a plugin that needs to post
events to the interpreter, but with your own custom handling code and
your own event payload.
So, declare an event of the form:
typedef struct myEvent
{
    struct vmEvent header; /* must be the first field */
    int myField1;
    int myField2;
    ..
} myEvent;
Now, to post an event, we can simply do:
myEvent * event = malloc(sizeof(myEvent));
event->header.fn = myHandler;
event->myField1 = ...
....
enqueueEvent(intr, &event->header);
Now, the handler function:
sqInt myHandler(struct Interpreter * intr, struct vmEvent * evt)
{
    myEvent * myEvt = (myEvent *) evt; /* valid because header is the first field */
    ... do something nasty here, knowing that you can't be trapped by
    concurrency issues :)..
    free(myEvt); // release the memory allocated for the event
}
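For readers who want to compile the pattern, here is a self-contained variant of the example above. The vmEvent and eventFnPtr declarations follow the ones given earlier; everything else is an assumption made for the sketch: sqInt is typedef'd to long, struct Interpreter is reduced to a single hypothetical field, the handler's return value of 1 is arbitrary, and makeMyEvent and dispatchEvent are illustrative helpers rather than VM functions.

```c
#include <assert.h>
#include <stdlib.h>

typedef long sqInt;                       /* assumption: stand-in for the VM's sqInt */
struct Interpreter { sqInt lastSum; };    /* hypothetical, reduced to one field */

struct vmEvent;
typedef sqInt (*eventFnPtr)(struct Interpreter *, struct vmEvent *);

typedef struct vmEvent {
    struct vmEvent * volatile next;
    eventFnPtr fn;
} vmEvent;

/* Custom event: the mandatory header comes first, the payload follows. */
typedef struct myEvent {
    vmEvent header;
    int myField1;
    int myField2;
} myEvent;

/* Handler: runs when the interpreter handles the event, so no locking is needed. */
sqInt myHandler(struct Interpreter *intr, vmEvent *evt) {
    myEvent *mine = (myEvent *) evt;      /* valid because header is the first field */
    intr->lastSum = mine->myField1 + mine->myField2;
    free(mine);                           /* release the memory allocated for the event */
    return 1;
}

/* Builds an event ready to be enqueued; returns NULL if allocation fails. */
vmEvent *makeMyEvent(int a, int b) {
    myEvent *event = malloc(sizeof *event);
    if (event == NULL) return NULL;
    event->header.next = NULL;
    event->header.fn = myHandler;
    event->myField1 = a;
    event->myField2 = b;
    return &event->header;
}

/* What the interpreter does for each dequeued event: one indirect call. */
sqInt dispatchEvent(struct Interpreter *intr, vmEvent *evt) {
    assert(intr != NULL && evt != NULL && evt->fn != NULL);
    return evt->fn(intr, evt);
}
```

Since the handler frees the event, the poster must not touch it after enqueueing; ownership passes to the interpreter side.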
Known bugs
- (fixed) minimising the Squeak window puts the VM into an infinite loop, so it stops
responding to user input and redrawing the screen.
- when running a secondary image, the VM can crash during window resizing. This error
can be reproduced by doing 'DisplayScreen startUp' repeatedly.
Wish list and further development
- make a better, cleaner VM startup:
There are many initialization routines in the platform-specific code. They are
a total mess now, scattered over different files, and they require a specific invocation
order. I had trouble in some places getting the interpreter properly initialized, because
some stages can be performed only after a new interpreter instance is received, since
some state should be kept on a per-image basis.
- support per-image startup options (a command-line-like argument string should
be available for all images, not only for the one loaded from the command line)
- redesign security on a per-image basis (that is, I'd like to redesign
it from scratch). Currently SecurityPlugin is responsible for this, but
it is not objectified. The problem is that many plugins get pointers to its
functions and call them directly, assuming a specific number of arguments (they don't
pass an Interpreter instance as the first argument).
SecurityPlugin is left unchanged for compatibility reasons: legacy
plugins can't use objectified functions and would call them with the wrong arguments,
crashing the VM.
With the new design, we need parts representing global security options (for everything
in the VM) as well as per-interpreter-instance options.
- what is missing and should be done before HydraVM can be considered a fully
functional VM:
- code that can unload an image (stopping the interpreter, doing save & quit or
simply quit)
This one alone could raise many difficulties and problems:
HydraVM can be considered 'resource-safe' only when it shows no memory leaks
after loading and closing a huge number of images (1000000+) in a single run.
I haven't worked on the code in this direction yet, and oh.. it can be tricky: some
memory leaks can be very hard to nail down, especially in a moderately complex system
such as the Squeak VM.
- a higher-level API for channels: implement a HydraStream class, which can be used
as a regular stream
- provide a generic framework to connect any kind of external event source with the
language side using Hydra events & channels. Refactor the current platform code to use
the new framework for handling user input.
If you have any comments or questions, you can contact me (Igor Stasenko, via email:
siguctua at gmail.com) or Andreas Raab. Also, feel free to post your comments/questions
on the squeak-dev mailing list <squeak-dev@lists.squeakfoundation.org> if you
wish to discuss them with a broader audience.