64-bit Unix Squeak Frequently Asked Questions

2007-10-29

(Many thanks are due to Andrew Gaylard and David Lewis for putting this together.)

Questions

What is a 32-bit image?
What is a 64-bit image?
Can I run a 32-bit image on my 64-bit computer?
Can I run a 64-bit image on my 32-bit computer?
Can a single VM run both 32-bit and 64-bit images?
What is a 64-bit VM?
Can I run a 32- or 64-bit VM on my computer?
For which hardware/OS combinations is there a 64-bit VM?
Does my 64-bit VM run both 32-bit and 64-bit images?
I have a 64-bit computer; should I use a 64-bit VM to run my 32-bit Squeak images?
What are the advantages of using 64-bits?
What are the disadvantages of using 64-bits?
My OS allows a single process to grow to 3.75GB; if I need a lot of objects can I use a 32-bit image / VM, or must I use a 64-bit one?
Where can I find a 64-bit image?
Why isn't there an officially-released 64-bit image?
How can I make a 64-bit image from a 32-bit image?
How does a 32-bit VM manage to run a 64-bit image if pointers are only 32-bits wide?
What sizes and alignment does the new 64-bit image format use for pointers and integers?
How do I tell if a given image file is 32-bit or 64-bit?

Questions, with answers

What is a 32-bit image?
A 32-bit image is an image in which the object memory uses a 32-bit word size for object pointers, limiting its total size to at most 4GB of memory. The formats of object memory and object pointers are defined in class ObjectMemory (see the class comment for a basic explanation). As of this writing, all Squeak images of practical interest are 32-bit images.

What is a 64-bit image?
A 64-bit image is an image in which the object memory uses a 64-bit word size for object pointers, allowing the size of the image to grow beyond 4GB of memory. Squeak now supports a 64-bit image format that is sufficient to produce a working system, but which is intentionally simple. It is expected to be modified and extended to take advantage of additional 64-bit capabilities in the future.

Can I run a 32-bit image on my 64-bit computer?
Yes. A 32-bit image can be run on either a 32-bit VM or a 64-bit VM. Some computer platforms (e.g. 64-bit Linux) can run both the 32-bit VM and 64-bit VM on the same system.

Can I run a 64-bit image on my 32-bit computer?
Yes. If you build a VM with the "64-bit VM?" check box selected, you will create a VM that runs 64-bit images. This will work on 32-bit host systems as well as on 64-bit host systems.

Can a single VM run both 32-bit and 64-bit images?
No. For any given computer platform, two different VMs are required to run 32-bit and 64-bit images. The type of VM that you build is governed by the "64-bit VM?" check box in VMMaker, and is independent of the word size of your computer. While it would be possible to create a VM that is "smart" enough to run both 32-bit and 64-bit images, this is currently of little practical value due to Squeak's reliance on plugins that are linked to 32-bit or 64-bit external library code.

Any combination of 32/64 bit VM and 32/64-bit image is possible, but note that all currently available Squeak images are still in 32-bit format, and most (perhaps all) pre-built VMs are 32-bit applications.


What is a 64-bit VM?
A 64-bit VM is one which is compiled with the LP64 or ILP64 data model. This means, in C terms, that pointers and longs are 64-bits wide.
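
As a rough illustration of what a "data model" means, the following Python sketch classifies a C ABI by the widths of int, long and pointers. The function names are hypothetical and are not part of Squeak or VMMaker; the table also lists ILP32 and LLP64 for comparison, which the text above does not discuss.

```python
import ctypes

# Widths in bits of (int, long, pointer) for some common C data models.
DATA_MODELS = {
    (32, 32, 32): "ILP32",
    (32, 64, 64): "LP64",
    (64, 64, 64): "ILP64",
    (32, 32, 64): "LLP64",
}


def data_model(int_bits, long_bits, pointer_bits):
    """Name the data model for the given widths, or 'unknown'."""
    return DATA_MODELS.get((int_bits, long_bits, pointer_bits), "unknown")


def host_data_model():
    """Data model of the C ABI that this Python interpreter was built for."""
    return data_model(
        8 * ctypes.sizeof(ctypes.c_int),
        8 * ctypes.sizeof(ctypes.c_long),
        8 * ctypes.sizeof(ctypes.c_void_p),
    )
```

Calling host_data_model() reports the model of the running Python build, which is usually, but not necessarily, the model a VM compiled with the default compiler settings on the same machine would use.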

Can I run a 32- or 64-bit VM on my computer?
It depends. Some current architectures, such as the x86-64 and the UltraSPARC, can run 32-bit as well as 64-bit applications; these are known as "bi-arch" systems. However, some systems, such as the Alpha, can only run 64-bit applications. For bi-arch systems, you can choose whether to run a 32-bit or 64-bit VM. For 64-bit-only systems, you don't have that choice; you can only run a 64-bit VM, since there's no way of compiling a 32-bit application.

For which hardware/OS combinations is there a 64-bit VM?
  • Linux on 64-bit architectures: x86-64, SPARC64, Alpha, Power64, etc.
  • Solaris on x86-64 and SPARC64
  • MacOS on Power64
  • Windows on x86-64

Does my 64-bit VM run both 32-bit and 64-bit images?
No. Any given VM will run either 32-bit or 64-bit images, but not both. You can select one or the other when you generate sources with VMMaker, and you can install both flavors of VM on your system (one each for 32-bit images and 64-bit images).

If you try to run a 64-bit image with a VM built for 32-bit images, you will get an error message such as this:

        This interpreter (vers. 6502) cannot read image file (vers. 68000).

If you try to run a 32-bit image using a VM built for 64-bit images, you will get an error message such as this:

        This interpreter (vers. 68000) cannot read image file (vers. 6502).
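
The check behind these messages can be pictured roughly as follows. This is a minimal Python sketch, not the VM's actual code: the function name is hypothetical, and only the two format numbers and the message wording are taken from the text above.

```python
# Format numbers quoted above: 6502 for 32-bit images, 68000 for 64-bit images.
FORMAT_32_BIT = 6502
FORMAT_64_BIT = 68000


def check_image_format(vm_format, image_format):
    """Return None if the VM can read the image, else an error message."""
    if vm_format == image_format:
        return None
    return ("This interpreter (vers. %d) cannot read image file (vers. %d)."
            % (vm_format, image_format))
```

For example, check_image_format(FORMAT_32_BIT, FORMAT_64_BIT) yields the first message shown above.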

I have a 64-bit computer; should I use a 64-bit VM to run my 32-bit Squeak images?
It depends. Either one will work, but if your image depends on plugins that are only available for 32-bit systems, use the 32-bit VM. Otherwise, if you are building your own VM, go ahead and use the 64-bit version.

What are the advantages of using 64-bits?
The first advantage is that your image size can be enormous. If you need the size of your VM code plus in-memory image to exceed 4 GB, then a 64-bit image running on a 64-bit VM is for you. Note that it will take ages to write out an image that's this big to disk. The sort of applications that need this are those which load a small(ish) image, and run code that creates millions of objects, but don't save them back to disk in the image. Keep in mind that the garbage collector is probably not up to the task of collecting multiple gigabytes.

Another advantage is that certain architectures (e.g. the Alpha) don't offer a 32-bit mode; they are 64-bit only. For such machines, a 64-bit VM is required; the image may be 32- or 64-bit.

Another advantage is that when the 64-bit VM is built, the C compiler knows the ABI is different from the 32-bit ABI. The x86-64 case is an interesting example: the old i386 ABI offered few registers, used i387 floating-point, and passed parameters on the stack (remember, memory writes are slower than register moves). The x86-64 ABI and architecture, on the other hand, have many more registers, use SSE, SSE2, etc. for FP, and pass parameters in registers where possible. The architecture also has additional instructions (MMX et al). All of these CPU and ABI features may make for a VM that runs faster, but only if the compiler (a) is able to make use of them and (b) is told to do so at compile-time. However, the gains are unlikely to be large, and will also be offset by the cost of large pointers (see below). If you're looking for performance, it's important to measure a 32-bit VM with a 32-bit image versus a 64-bit VM with a 64-bit image before assuming anything.


What are the disadvantages of using 64-bits?
A disadvantage of 64-bit code is that pointers are 8 bytes instead of 4; they are also aligned on 8-byte boundaries, meaning that some space around them, known as "padding", is wasted. This means that pointers (a) take more space in RAM, (b) take more memory bandwidth when the CPU loads and stores them, (c) take up valuable space in on-chip caches, and (d) waste more space on padding than 32-bit pointers aligned on 4-byte boundaries. For most users, the upper 32 bits will be zero, so it makes little sense to load, process and store pointers that are double the size but only half-used. So for these users, a 32-bit VM and image is a good choice.
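
To make the padding cost concrete, here is a small Python sketch that computes the size of a C-style record under the assumption that every field is aligned to its own size (natural alignment) and the total is rounded up to the largest alignment. The helper names and the example record are hypothetical; real compilers and ABIs may lay out structures differently.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def record_size(fields, pointer_size):
    """Size in bytes of a record whose fields are naturally aligned.

    fields is a list of scalar sizes in bytes; the string "p" marks a pointer.
    """
    offset = 0
    max_align = 1
    for field in fields:
        size = pointer_size if field == "p" else field
        offset = align_up(offset, size)  # insert padding before the field
        offset += size
        max_align = max(max_align, size)
    return align_up(offset, max_align)   # tail padding


# A record holding a char, a pointer, a 4-byte int and another pointer.
EXAMPLE = [1, "p", 4, "p"]
```

Under these assumptions, record_size(EXAMPLE, 4) is 16 bytes (13 bytes of data, 3 of padding), while record_size(EXAMPLE, 8) is 32 bytes (21 bytes of data, 11 of padding).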

Another disadvantage is that most users use the 32-bit VM and a 32-bit image. This combination is therefore the most tested, and therefore most stable, combination.

A third disadvantage is that code for many of the plugins is not yet ported to a 64-bit VM.


My OS allows a single process to grow to 3.75GB; if I need a lot of objects can I use a 32-bit image / VM, or must I use a 64-bit one?
There have in the past been problems related to the so-called "2-GB limit". These were due to conversions between signed 32-bit integers and 32-bit pointers in the VM code. The problems appeared when the operating system loaded the image into memory at addresses above the 2GB mark, and could occur with normal-sized images, not only images larger than 2GB. These issues should be a thing of the past. Use the most recently-released VM for your platform, and report any problems that you see. You should only *need* a 64-bit VM and 64-bit image if your image size will exceed 4GB.
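
The kind of bug behind the "2-GB limit" can be illustrated with a short Python sketch. It is a model of the arithmetic only, with hypothetical function names, and is not taken from the VM sources: an address above the 2GB mark becomes negative when reinterpreted as a signed 32-bit integer, so ordering comparisons between addresses give the wrong answer.

```python
def as_signed_32(address):
    """Reinterpret the low 32 bits of an address as a signed 32-bit integer."""
    address &= 0xFFFFFFFF
    if address & 0x80000000:
        return address - 0x100000000
    return address


def is_below(a, b, signed):
    """Compare two 32-bit addresses, optionally through a signed conversion."""
    if signed:
        a, b = as_signed_32(a), as_signed_32(b)
    return a < b
```

With a = 0x7FFFFFF0 (just below 2GB) and b = 0x80000010 (just above), is_below(a, b, signed=False) is true as expected, but is_below(a, b, signed=True) is false because b has wrapped to a negative number.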

Where can I find a 64-bit image?
These are scarce. The original 64-bit port project (from Ian and Dan) includes a 64-bit image that worked with the VM distributed at that time. A current VM cannot execute the original 64-bit image due to changes in the interpreter since that time. It is possible to update that original image using a modified VM, and the resulting image is executable using a current unmodified VM. However, there are no official or supported releases of 64-bit images at this time.

Why isn't there an officially-released 64-bit image?
Lack of interest: most people don't need a 64-bit image.

There may yet be some changes to the 64-bit image format to take advantage of features of 64-bit CPUs. For instance, 63-bit tagged integers might be possible.


How can I make a 64-bit image from a 32-bit image?
Use the SystemTracer (SystemTracerV2 on SqueakMap). The original 64-bit Squeak image was created using this tool, and a sufficiently motivated person should be able to reproduce the job. However, the SystemTracer does not currently work on little-endian computers (including Intel), so some work should be expected in order to enhance SystemTracer before a successful conversion will be possible.

How does a 32-bit VM manage to run a 64-bit image if pointers are only 32-bits wide?
The short answer: It relies on the image size being smaller than 4GB.

The long answer: Object pointers within the object memory are not pointers in the C sense of the word. The VM needs to be able to convert the object pointers into C pointers, and this can be done on either a 32-bit host or a 64-bit host. The only caveat would be that if a 64-bit image grew to a size large enough to use object pointers larger than the 32-bit limit (i.e. an image approaching 4GB in size), then a 64-bit VM would be required.
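
The caveat can be pictured with a deliberately simplified Python model. It assumes that an object pointer converts to a host address without any offset, which may not match what sqMemoryAccess.h does on a given host; the function name is hypothetical.

```python
def c_pointer_for_oop(oop, host_pointer_bits):
    """Convert a 64-bit object pointer to a host address.

    Simplified model: the conversion is the identity, and it fails when the
    value does not fit in a host pointer.
    """
    if not 0 <= oop < 2 ** 64:
        raise ValueError("object pointer is not a 64-bit value")
    if oop >= 2 ** host_pointer_bits:
        raise OverflowError("object pointer does not fit in a host pointer")
    return oop
```

In this model an object pointer such as 0x1000 converts on both a 32-bit and a 64-bit host, while 2**32 converts only on the 64-bit host.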


What sizes and alignment does the new 64-bit image format use for pointers and integers?
Object pointers are 64-bits wide, allowing for memory up to 2^64 bytes to be directly addressable. They are aligned on 8-byte boundaries. Integers are still implemented as tagged 31-bit values, but are sign-extended to use the full 64-bit object word size and therefore are aligned on 8-byte boundaries. Future enhancements to 64-bit Squeak will probably make use of the larger word size to increase the range of SmallInteger values, which will require further changes to both the VM and the image in order to be effective. These alignments were chosen as most 64-bit CPUs require them.
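
As an illustration, the following Python sketch encodes a 31-bit tagged integer into a 64-bit object word. It assumes the usual Squeak convention that a SmallInteger is marked by a low-order tag bit of 1 with the value in the bits above it; the function names are hypothetical and the sketch is not taken from the VM sources.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF
MIN_SMALL_INT = -(2 ** 30)      # smallest 31-bit signed value
MAX_SMALL_INT = 2 ** 30 - 1     # largest 31-bit signed value


def encode_small_integer(value):
    """Tag a 31-bit integer and sign-extend it to a 64-bit object word."""
    if not MIN_SMALL_INT <= value <= MAX_SMALL_INT:
        raise OverflowError("value does not fit in 31 bits")
    # Python integers are unbounded, so masking to 64 bits produces the
    # same bit pattern that sign extension would.
    return ((value << 1) | 1) & MASK_64


def decode_small_integer(word):
    """Recover the integer value from a tagged 64-bit object word."""
    if word & 1 == 0:
        raise ValueError("not a tagged integer")
    if word & (1 << 63):
        word -= 1 << 64          # reinterpret as a signed 64-bit value
    return word >> 1             # arithmetic shift drops the tag bit
```

Under this assumption, 1 is stored as the word 3, and -1 is stored as a word with all 64 bits set.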

The object header and pointer formats for both 32-bit and 64-bit images are documented in the class comment of ObjectMemory (in the VMMaker package). The conversions to and from host data types are done in platforms/Cross/vm/sqMemoryAccess.h using either macros or inline functions. The actual conversions vary from host to host, and are controlled by macros such as SQ_HOST64 and SQ_IMAGE32 which must be set for that host. In the case of a Unix VM, the configure utility is used to specify the characteristics of the host platform.

The word size to be used in the object memory is specified by the SQ_VI_BYTES_PER_WORD macro in src/vm/interp.h. This file is created in the VMMaker code generation process, and the value of SQ_VI_BYTES_PER_WORD is determined by the "64-bit VM?" check box in the VMMaker tool.

In summary, the object word format is described in class ObjectMemory, the host data type conversions are specified in sqMemoryAccess.h, and the image word size is specified in interp.h.


How do I tell if a given image file is 32-bit or 64-bit?
The image file begins with a four-byte "magic" value that indicates the image word size. The value is specified in Interpreter>>imageFormatVersion. For most images, it occupies the first four bytes of the image file, although in some cases the image data may be offset by 512 bytes in order to permit an image file to be treated as an executable program on Unix platforms (see http://en.wikipedia.org/wiki/Shebang_(Unix)).

For instance, if I load VMMaker-3.8b6 into a stock Squeak3.9a-7024.image file, I see this:

        imageFormatVersion
                "Return a magic constant that changes when the image format
                changes. Since the image reading code uses this to detect byte
                ordering, one must avoid version numbers that are invariant
                under byte reversal."

                BytesPerWord == 4
                        ifTrue: [^6502]
                        ifFalse: [^68000]
  
Examining the file itself gives this:
        apg@breakfast: ~/squeak xxd Squeak3.9a-7024.image | head -1
        0000000: 0000 1966 0000 0040 011c 7ee0 0427 b000  ...f...@..~..'..
  
Looking at the first four bytes gives this:
        apg@breakfast: ~/squeak perl -e 'print 0x1966'
        6502
  
(Or evaluate "16r1966" with "print it" in a workspace; it also returns 6502.)

So Squeak3.9a-7024.image is a 32-bit image file (since BytesPerWord == 4).
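
The manual steps above can be automated. The following Python sketch is one possible approach, not an official tool: it tries the magic value at offset 0 and at offset 512, in both byte orders, and recognizes only the two format numbers mentioned in this FAQ (6502 and 68000). The function names are hypothetical.

```python
import struct

# Format numbers from Interpreter>>imageFormatVersion as quoted above.
MAGIC_TO_WORD_SIZE = {6502: 32, 68000: 64}


def image_word_size(header):
    """Return 32 or 64 for the given leading bytes of an image, else None."""
    for offset in (0, 512):                 # 512 covers the offset case
        chunk = header[offset:offset + 4]
        if len(chunk) < 4:
            continue
        for byte_order in (">", "<"):       # big-endian, then little-endian
            (magic,) = struct.unpack(byte_order + "I", chunk)
            if magic in MAGIC_TO_WORD_SIZE:
                return MAGIC_TO_WORD_SIZE[magic]
    return None


def image_file_word_size(path):
    """Read enough of the file at path to classify it."""
    with open(path, "rb") as f:
        return image_word_size(f.read(516))
```

For the file examined above, the first four bytes 00 00 19 66 decode to 6502 in big-endian order, so image_word_size reports 32.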