Phix Bytecode Generation

So we have input in Three Address Code IL form, and we want to spit out bytecode that is ready to do something with.

Before doing anything with it, let’s consider what can be done with it:

  • Output to file in PHBC for later execution either under an interpreter or a JIT compiler
  • Compile into native code and output a file

VM Interaction

The VM provides a lot of services to executing (and compiling) code, for example metadata and runtime symbol registration services.

Symbol Management

All symbols exist on a per execution context basis, which means that if there were two phix interpreters running in the same memory space, the function ‘moo’ could point to two different places.

There are 5 types of symbols that need managing:

  • Constants
  • Variables
  • Functions
  • Classes
  • Namespaces

Before tackling each one, let’s consider the problem we (the compiler) are trying to solve.

Let’s say we have two files, lib/A.php and index.php:

<?php
  // lib/A.php
  echo "Hello, $name\n";

<?php
  // index.php
  $name = 'Theo';
  include 'lib/A.php';
  echo 'How are you?';
  $name .= ' Zourzouvillys';
  mail('theo@crazygreek.co.uk', 'Who are you?', "I'm $name");

The problem

The big problem in index.php is the include statement. While include is really useful, it massively limits what the compiler can optimize. If that include statement was not there and the code was instead inline, we could optimize $name out completely:

  echo "Hello, Theo\n";
  echo 'How are you?';
  mail('theo@crazygreek.co.uk', 'Who are you?', "I'm Theo Zourzouvillys");

However, we can’t, and include is here to stay, so we need to provide an efficient method for sharing variable scope across multiple files, while at the same time allowing maximum speed in accessing variables locally.

Variables

Global Context

While compiling, the SSA phase builds up a list of global variables that need to be both imported and exported:

<?php
  // start of file
  $var1 = 'Theo';
  include 'lib/A.php';
  $var2 = 1;
  $var3 = null;
  // end of file

In the above example, var1 would be in the export table, and var1, var2 and var3 would be in the import table.

What this means is that at runtime, when a file is imported, every symbol offset in the import section of the metainfo is looked up in the Context Symbol Resolver Database (CSR DB) for the given AppDomain. If an existing entry is found, its position in memory is provided to the importer, and the usecnt is incremented.

If it is not found, then we allocate a memory slot for it, set the position to null, and add the slot into the CSR DB.
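The import step described above could be sketched roughly as follows. This is only a PHP model of the idea, not the VM’s code: the class CsrDbModel and its methods are made up for illustration, slots are array indexes rather than memory positions, and counting the first import as a use is an assumption.

```php
<?php
// Toy model of the Context Symbol Resolver database for one AppDomain.
// Every name in this sketch is hypothetical.
final class CsrDbModel
{
    /** @var array<string, int> symbol name => slot index */
    private array $slotOf = [];
    /** @var array<int, mixed> slot index => current value */
    public array $slots = [];
    /** @var array<int, int> slot index => number of importers (usecnt) */
    public array $useCount = [];

    // Resolve one imported symbol, allocating a null slot the first time.
    public function import(string $name): int
    {
        if (!isset($this->slotOf[$name])) {
            $slot = count($this->slots);
            $this->slotOf[$name] = $slot;
            $this->slots[$slot] = null;
            $this->useCount[$slot] = 0;
        }
        $slot = $this->slotOf[$name];
        $this->useCount[$slot]++;   // assumption: the first import counts too
        return $slot;
    }

    // Resolve a whole import section into a per-file lookup table.
    public function importAll(array $names): array
    {
        return array_map(fn(string $n): int => $this->import($n), $names);
    }
}

$db = new CsrDbModel();
$indexTable = $db->importAll(['var1', 'var2', 'var3']); // index.php
$libTable   = $db->importAll(['var1']);                 // lib/A.php
```

Both files end up pointing at the same slot for var1, which is what lets the included file see the value set by the including one.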

Of course, variable variables that cannot be deduced at compile time still need to query the CSR DB, but normal variables are referenced via an offset in the lookup table for the given file, which means that it’s fast.

[CompileUnit] => Array
(
  [CompileUnitID] => '91b5f467-440e-43fc-ad89-d5f617d518a2',
  [Symbols] => Array
  (
    [Variables] => Array
    (
      [0] => var1
      [1] => var2
      [2] => var3
    )
  )
)

In code, $var2 is accessed at memory position:

 Context.BaseOffset + ContextSymbolResolver[AppDomain.CompileUnit['91b5f467-440e-43fc-ad89-d5f617d518a2'].FileOffset].SymbolOffset[1]

While this may seem a bit complicated, it’s massively simplified when we consider that the base stack frame always contains our FileOffset in the current AppDomain:

  Context.BaseOffset + ContextSymbolResolver[base_stack_frame + X].SymbolOffset[1]
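To make the difference between the two access paths concrete, here is a small PHP model. The arrays and their contents are invented for illustration; the real thing works on memory offsets, not PHP arrays.

```php
<?php
// Hypothetical flat model: one array of global slots for the context,
// a name index standing in for the CSR DB, and one per-file lookup table.
$globalSlots  = [null, null, null, null, null];
$nameIndex    = ['var1' => 4, 'var2' => 0, 'var3' => 2];
$symbolOffset = [4, 0, 2];   // per-file table: local index => global slot

// `$var2 = 1;` is a store through the per-file table at local index 1.
// The index is known at compile time, so no name lookup happens.
$globalSlots[$symbolOffset[1]] = 1;

// A variable variable with an unknown name has to go through the name
// lookup instead, which is the slow path.
$n = 'var2';
$viaName = $globalSlots[$nameIndex[$n]];
```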

Note however that Context.BaseOffset can change at any point where a rw variable might be created (function calls, rw variable access, going out of scope, etc), so we can’t keep its position around in a register in these cases. Function/method calls with the CallHasNoSideEffect attribute are guaranteed not to create new global symbol scope (although that doesn’t stop them from reading/writing already existing symbols, unless CallHasNoGlobalAccess is also set).

A symbol’s offset in the global symbol table can be found using phix_internal_variable_offset_ro(VariableName). An invalid symbol will have a position of -1UL.

Symbols can also be created by way of phix_internal_variable_create(VariableName), or phix_internal_variable_offset_rw(VariableName).

// what the programmer writes:
function xx()
{
  global $moo;
}

// what it is effectively compiled to:
function xx()
{
  $moo =& Context.BaseOffset + ContextSymbolResolver[base_stack_frame + X].SymbolOffset[phix_internal_variable_offset_rw("moo")];
}
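The read-only and read-write lookups behave differently for a symbol that does not exist yet, which the following PHP model tries to show. The class and method names are hypothetical stand-ins for the internal functions named above, and the INVALID constant merely plays the role of -1UL.

```php
<?php
// Hypothetical model of the global symbol table lookups.
final class GlobalSymbolTableModel
{
    public const INVALID = -1;   // stands in for the -1UL mentioned above

    /** @var array<string, int> */
    private array $offsets = [];
    /** @var array<int, mixed> */
    private array $values = [];

    // Read-only lookup: never creates anything.
    public function offsetRo(string $name): int
    {
        return $this->offsets[$name] ?? self::INVALID;
    }

    // Read-write lookup: creates the symbol when it is missing.
    public function offsetRw(string $name): int
    {
        if (!isset($this->offsets[$name])) {
            $this->offsets[$name] = count($this->values);
            $this->values[] = null;
        }
        return $this->offsets[$name];
    }

    // Hand out a reference to the slot itself.
    public function &slot(int $offset)
    {
        return $this->values[$offset];
    }
}

function xx(GlobalSymbolTableModel $table): void
{
    // Model of `global $moo;` followed by an assignment.
    $moo = &$table->slot($table->offsetRw('moo'));
    $moo = 'cow';
}

$table  = new GlobalSymbolTableModel();
$before = $table->offsetRo('moo');   // not created yet
xx($table);
$after  = $table->offsetRo('moo');   // created by the rw lookup inside xx()
```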

$GLOBALS

$GLOBALS is just a reference to a special object that does lookups within the global symbol table. The reason for doing it this way rather than the other way around (i.e. just re-writing $moo in global scope to be $GLOBALS['moo']) is performance: the optimizer is able to handle SSA in variables a lot better than in arrays.

We should probably add a -fphix-no-global-array option or something to stop this object being generated in a compiled unit. We can’t always tell whether it is ever referenced, but removing it will enhance performance.

function moo($x, $y)
{
  return $$x[$y]; // $x might be 'GLOBALS' for all the compiler knows.
}

Any self-respecting library would never dream of touching $GLOBALS anyway.
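The special lookup object behind $GLOBALS could be modelled in PHP with ArrayAccess, as in the sketch below. SymbolTableModel and GlobalsViewModel are invented names, the code needs PHP 8 for the mixed type, and it only illustrates array syntax being forwarded to the symbol table rather than the variables living in an array.

```php
<?php
// Hypothetical stand-in for the global symbol table.
final class SymbolTableModel
{
    /** @var array<string, int> */
    public array $offsets = [];
    /** @var array<int, mixed> */
    public array $slots = [];
}

// Array-style view that forwards every access to the symbol table.
final class GlobalsViewModel implements ArrayAccess
{
    public function __construct(private SymbolTableModel $table) {}

    public function offsetExists(mixed $name): bool
    {
        return isset($this->table->offsets[$name]);
    }

    public function offsetGet(mixed $name): mixed
    {
        $offset = $this->table->offsets[$name] ?? null;
        return $offset === null ? null : $this->table->slots[$offset];
    }

    public function offsetSet(mixed $name, mixed $value): void
    {
        if (!isset($this->table->offsets[$name])) {
            $this->table->offsets[$name] = count($this->table->slots);
        }
        $this->table->slots[$this->table->offsets[$name]] = $value;
    }

    public function offsetUnset(mixed $name): void
    {
        // This sketch only forgets the name; the slot itself is not reclaimed.
        unset($this->table->offsets[$name]);
    }
}

$table = new SymbolTableModel();
$table->offsets = ['moo' => 0];
$table->slots   = ['cow'];

$globals = new GlobalsViewModel($table);
$read = $globals['moo'];      // lookup by name
$globals['moo'] = 'bull';     // write lands in the slot that $moo uses
```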

Function/Method Context

Functions can be optimized even more heavily under a lot of circumstances. When a function doesn’t perform any symbol table clobbering, such as:

  • extract
  • $$xxx

then we can actually just store all variables on the stack without any symbol table at all.
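A check for the two clobbering constructs listed above could look something like this. It is a sketch built on PHP’s own tokenizer (token_get_all), not how Phix detects them; the function name is made up, and it is deliberately conservative (any call named extract counts, even a method).

```php
<?php
// Hypothetical helper: true when the source uses a construct that can
// touch local variables by name. Only covers extract() and $$xxx.
function clobbers_symbol_table(string $source): bool
{
    $tokens = token_get_all($source);
    $count  = count($tokens);

    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];

        // A bare '$' token shows up for variable variables: $$x or ${expr}.
        if ($token === '$') {
            return true;
        }

        // An identifier spelled "extract" followed by '(' is treated as a call.
        if (is_array($token) && $token[0] === T_STRING
            && strtolower($token[1]) === 'extract') {
            $j = $i + 1;
            while ($j < $count && is_array($tokens[$j])
                && $tokens[$j][0] === T_WHITESPACE) {
                $j++;
            }
            if ($j < $count && $tokens[$j] === '(') {
                return true;
            }
        }
    }
    return false;
}

$plain   = clobbers_symbol_table('<?php function f($a) { return $a + 1; }');
$varvar  = clobbers_symbol_table('<?php function f($n) { return $$n; }');
$extract = clobbers_symbol_table('<?php function f($a) { extract($a); }');
```

A function for which this returns false is a candidate for keeping all of its locals on the stack.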

In the case where symbol table clobbering takes place, we secretly create an array of references to all the local variables.

 
  function xxx()
  {
    $__SUPER_SECRET_INTERNAL_FUNCTION_REFERENCES = array();

    $moo = 1;
    $__SUPER_SECRET_INTERNAL_FUNCTION_REFERENCES['moo'] = &$moo;

    echo $moo, "\n";

    $cows = null;
    $__SUPER_SECRET_INTERNAL_FUNCTION_REFERENCES['cows'] = &$cows;

    $cows = new stdClass;

    $arr = array();
    $__SUPER_SECRET_INTERNAL_FUNCTION_REFERENCES['arr'] = &$arr;

    $arr[] = 4;

    // We can now mess with symbol table
  }
 

One very important thing to remember is:

Using symbol table clobbering functions will break a large number of optimizations due to aliasing.
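The aliasing problem is easy to see in a runnable version of the references array. In this sketch the short name $refs plays the role of the hidden array, and the lowering of `$$name = $value;` to an array write is an assumption made for illustration.

```php
<?php
function xxx(string $name, $value)
{
    $refs = [];              // plays the role of the hidden references array

    $moo = 1;
    $refs['moo'] = &$moo;

    // What `$$name = $value;` would become in this model. If $name is 'moo'
    // the write goes through the reference and changes $moo.
    $refs[$name] = $value;

    // The compiler can no longer assume that $moo is still 1 here.
    return $moo;
}

$clobbered = xxx('moo', 42);
$untouched = xxx('other', 42);
```

Because the write may or may not hit $moo depending on a runtime string, constant propagation of $moo past that point has to be given up.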

Constants

Constants are handled in almost the same way as variables, except they’re (obviously!) constant, and as such, can’t be accessed through the AppDomain’s Context _rw methods.

Unlike PHP, there is absolutely no way we allow constants to be redefined at runtime. The reason for this is that constants are constant!

Functions

Functions are an odd case, as the compiler *may* be able to know the exact function being called, as long as it’s defined in global scope. When it does, we can perform a number of massive optimizations: tail call optimization, running certain functions without side effects in parallel, jumping straight to the function’s position without doing a symbol lookup, and inlining simple functions.

For this reason, whenever you can, use compile time includes, as without them, we have to leave include to runtime, which may of course

Compile time includes are defined by way of the static_include() function, which works just like require_once, except it is done at compile time. In the future this will be replaced with import stuff, however it’s useful for backward compatibility with PHP.

 
// for backward compat with PHP:
if (!defined("PHIX_UBERFUNKY")) { function static_include($file) { require_once($file); } }

static_include('inc/config.php');

 ... or ...

if (defined("PHIX_UBERFUNKY")) {
  static_include("test.php");
}
else {
  require_once('test.php');
}

Note that static_include is actually resolved at compile time, and as such can only take constant parameters. Basic string manipulation, along with any functions that are marked with the <CompileTime> flag, will work at compile time when used in a compile time context (such as static_include):

  static_include(dirname(__FILE__) . "/inc/config.php");
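Evaluating such an argument at compile time amounts to constant folding over a small expression tree. The sketch below assumes a made-up tree format and a made-up list of functions treated as carrying the <CompileTime> flag; neither comes from Phix itself.

```php
<?php
// Assumption: these are the functions this sketch treats as <CompileTime>.
const COMPILE_TIME_FUNCTIONS = ['dirname', 'basename'];

// Fold a hypothetical expression tree down to a string, or fail if any
// part of it is not known at compile time.
function fold_constant(array $node, string $currentFile): string
{
    switch ($node[0]) {
        case 'string':
            return $node[1];
        case 'file':                      // __FILE__
            return $currentFile;
        case 'concat':
            return fold_constant($node[1], $currentFile)
                 . fold_constant($node[2], $currentFile);
        case 'call':
            if (!in_array($node[1], COMPILE_TIME_FUNCTIONS, true)) {
                throw new InvalidArgumentException("{$node[1]}() is not compile-time");
            }
            $args = array_map(
                fn(array $arg): string => fold_constant($arg, $currentFile),
                $node[2]
            );
            return (string) $node[1](...$args);
        default:
            throw new InvalidArgumentException("{$node[0]} is not a compile-time constant");
    }
}

// dirname(__FILE__) . "/inc/config.php", as seen from /srv/app/index.php
$expr = ['concat',
    ['call', 'dirname', [['file']]],
    ['string', '/inc/config.php'],
];
$path = fold_constant($expr, '/srv/app/index.php');

// A runtime variable cannot be folded, so static_include would reject it.
$rejected = false;
try {
    fold_constant(['variable', 'dir'], '/srv/app/index.php');
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```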

Internal Function

PHP has a huge number of internal functions - to speed up lookup, we use the same symbol table lookup trick, which is bound at runtime in much the same way a linker does. Only when a function is called in a completely dynamic scope will we resort to looking it up in the global function table followed by the local context.

The only difference is that internal functions only need to be exported, as we do direct jumps for ones that are in the same compile unit (which may be over multiple files) at global scope.
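The linker-like binding can be pictured as resolving a list of imported function names once, at load time, and calling through the resulting slots afterwards. The following is a minimal PHP model of that idea; the import list and the use of closures as "addresses" are illustrative assumptions.

```php
<?php
// Hypothetical import section of a compile unit: the internal functions it calls.
$imports = ['strlen', 'strtoupper'];

// Load time: resolve each name once, the way a linker fills in addresses.
$bound = array_map(function (string $name): Closure {
    if (!function_exists($name)) {
        throw new RuntimeException("unresolved function: $name");
    }
    return Closure::fromCallable($name);
}, $imports);

// Call sites then use the slot index instead of looking the name up again.
$length = $bound[0]('phix');
$upper  = $bound[1]('moo');
```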

Functions that are not defined at global scope require a virtual lookup, so that’s where the symbol resolution comes in:

 
if ($argv[0] == 'a')
{
  function a()
  {
    return 1;
  }
}
else
{
  function a()
  {
    return 2;
  }
}

$x = a();

In the above case, we can’t possibly know which function will be defined as a, so we create a symbol table entry for a, and then dereference that entry to find the jump target for the CALL.
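The same situation can be modelled with an explicit function table, which makes the extra dereference visible. The table and the define_a() helper are hypothetical; closures stand in for the two possible function bodies.

```php
<?php
// Whichever branch runs fills in the single symbol table entry for 'a'.
function define_a(array &$functionTable, string $arg): void
{
    if ($arg == 'a') {
        $functionTable['a'] = function () { return 1; };
    } else {
        $functionTable['a'] = function () { return 2; };
    }
}

$table = [];
define_a($table, 'a');
$x = $table['a']();      // dereference the entry, then call

$other = [];
define_a($other, 'b');
$y = $other['a']();
```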

Classes

Classes are essentially the same issue as functions, except that we also have parents to worry about (don’t we always?).

Optimization can only be performed if a class is defined before it is used, in global namespace, and all of its parents are also defined.
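That condition could be checked by walking the parent chain, roughly as below. The function name and the two input arrays are assumptions for the sake of the sketch: one lists the classes already defined at global scope before the use site, the other maps each class to its parent.

```php
<?php
// Hypothetical check: can a use of $class be bound at compile time?
function can_bind_statically(string $class, array $definedBeforeUse, array $parentOf): bool
{
    $seen = [];
    for ($c = $class; $c !== null; $c = $parentOf[$c] ?? null) {
        // Every class in the chain must already be defined, and a cycle
        // in the chain is rejected as well.
        if (isset($seen[$c]) || !in_array($c, $definedBeforeUse, true)) {
            return false;
        }
        $seen[$c] = true;
    }
    return true;
}

$parentOf = ['Cow' => 'Animal'];

$ok            = can_bind_statically('Cow', ['Animal', 'Cow'], $parentOf);
$missingParent = can_bind_statically('Cow', ['Cow'], $parentOf);
$unknown       = can_bind_statically('Ghost', ['Animal', 'Cow'], $parentOf);
```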

Namespaces

FIXME: Namespace lookups are fairly simple at runtime, document.

 
internals/bytecode_generation.txt · Last modified: 2007/03/26 12:55 by 80.249.108.13
 