Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

The PPC port adds some minor extensions to the C2 compiler to support the PPC architecture.  We also implemented some new optimizations.

Extend adl/adlc by effect TEMP_DEF.

  *** currently under work ***

Adl is the language used to express code generation patterns in the C2 compiler of hotspot. Effects are used to specify additional properties of IR nodes that can not be derived from the match rules.  E.g., the USE_DEF effect allows to specify that a register used by a node and a register defined by the node must be identical, as required by two-address assembler instructions.

The new TEMP_DEF effect is similar to USE_DEF, except that a TEMP node will be generated that represents the USE.

With this effect one can express that the def'ed register must be different from the used ones corresponding to ins. Currently this is already possible by specifying effect TEMP for the operand with effect DEF from the match rule.

Introducing this new identifyer makes the code more readable and allows to specify the effect for nodes without match rules.

An example is an optimized encode node, if the base of the compressed heap is 35G aligned, i.e., the shifted narrow oop can be merged with the base by an or instruction.

On PPC we can shift and or with a single instruction, so we can implement Decode with these instructions:

mov Rdst = Rbase
rldimi Rdst = Rdst || (Rsrc << 3)

As the move is off the critical path, this is superior to do a shift and an add. Unfortunately we must guarantee that Rdst != Rsrc. which we do with a TEMP_DEF effect:

instruct decodeN(iRegPdst dst, iRegNsrc src) %{
    match(Set dst (DecodeN src));
    effect(TEMP_DEF dst);

ImplicitNullChecks on operating systems where the zero page is not read protected.

The C2 compiler uses load or store operations with sufficiently small offsets to do null checks. It risks that the protected page at address zero is hit, and returns via the signal handler to handle the Java NullPointerException. The code assumes that the zero page is read and write protected.  Alternatively, the optimization can be switched off altogether. ImplicitNullChecks are a very profitable optimizations On Aix, the zero page is only write protected, thus ImplicitNullChecks as described above can not be used. We extended the optimization to work on Aix.

We introduce a property zero_page_read_protected in the os layer. This is set to false on Aix, to true on others. The ImplicitNullCheck optimization only considers Stores as operations to check for NULL if this property is set.

With compressed oops, a special protected page is placed before the heap so that a decoded NULL points into this page and an access traps. The protection of this page is independent of the protection of the zero page. Our fix considers this so that for heap-based compressed oops also Loads are considered as null check operations.
http://hg.openjdk.java.net/ppc-aix-port/jdk8/hotspot/rev/2c8a195051eb

Trap based null and range checks

PPC offers an instruction that does a compare-and-trap.  I.e., it throws a SIGTRAP if the comparison failed. This instruction is very cheap if the comparison is true.

We use this instruction to do NullChecks where ImplicitNullChecks are not applicable. Especially on Aix, where the zero page is not read protected, these are many. Further we do range checks with this instruction.

We introduce these instructions during matching if the probability of the branch to be taken is very low. Unfortunately this requires changing shared code, as the block layout algorithm and construction of the ... are not aware of these nodes.

We introduce two new flags TrapBasedNullChecks and TrapBasedRangeChecks and a function Node::is_TrapBasedCheckNode().  We adapt FillExceptionTables() and PhaseCFG::fixup_flow() to handle these nodes. In fixup_flow() we take care that the case that traps is the taken branch.
http://hg.openjdk.java.net/ppc-aix-port/jdk8/hotspot/rev/f9ce6652d9c2

This change also requires that the linux os implementation supports SIGTRAP which is introduced here:
http://hg.openjdk.java.net/ppc-aix-port/jdk8/hotspot/rev/5792f69f1881

Adlc: Fields in MachNodes.

We extend adl by a specification of fields and adlc to generate these.

If an instruct specification contains a line ins_<name>(<val>), adlc generates a function ins_<name>() returning the constant <val>.

We extended adlc to generate a field in the node if the declaration is ins_field_<name>(<type>). The field declaration is <type> _<name>;
http://hg.openjdk.java.net/ppc-aix-port/jdk8/hotspot/rev/d1dfaa602f1a

Adlc: Specify which nodes to rematerialize.

Adlc has a row of internal heuristics that determines which nodes should be
rematerialized during register allocation. This is not very well suited for
our port. Therefore we introduce to adl annotations ins_cannot_rematerialize() and
ins_should_rematerialize() that complement this heuristic. 
http://hg.openjdk.java.net/ppc-aix-port/jdk8/hotspot/rev/eabde81a086d

Handling ordered loads and stores

...

Below you find an example how to use late expand for the sparc.ad file. Further down you see the code generated by adlc. Perhaps you can find better use cases for this feature.
               
--- a/src/cpu/sparc/vm/sparc.ad   2012-11-21 12:27:04.591486000 +0100      
+++ b/src/cpu/sparc/vm/sparc.ad    2012-11-19 14:45:15.059452000 +0100                         
@@ -1933,7 +1937,7 @@                                                                            
 }                                                                                               

 // Does the CPU require late expand (see block.cpp for description of late expand)?
-const bool Matcher::require_late_expand = false;
+const bool Matcher::require_late_expand = true;
 // Should the Matcher clone shifts on addressing modes, expecting them to
 // be subsumed into complex addressing expressions or compute them into
@@ -7497,6 +7501,7 @@
 // Register Division
 instruct divI_reg_reg(iRegI dst, iRegIsafe src1, iRegIsafe src2) %{
   match(Set dst (DivI src1 src2));
+  predicate(!UseNewCode);
   ins_cost((2+71)*DEFAULT_COST);
   format %{ "SRA     $src2,0,$src2\n\t"
@@ -7506,6 +7511,68 @@
   ins_pipe(sdiv_reg_reg);
 %}
+//------------------------------------------------------------------------------------
+
+encode %{
+
+  enc_class lateExpandIdiv_reg_reg(iRegI dst, iRegIsafe src1, iRegIsafe src2) %{
+    MachNode *m1 = new (C) divI_reg_reg_SRANode();
+    MachNode *m2 = new (C) divI_reg_reg_SRANode();
+    MachNode *m3 = new (C) divI_reg_reg_SDIVXNode();
+
+    m1->add_req(n_region, n_src1);
+    m2->add_req(n_region, n_src2);
+    m3->add_req(n_region, m1, m2);
+
+    m1->_opnds[0] = _opnds[1]->clone(C);
+    m1->_opnds[1] = _opnds[1]->clone(C);
+
+    m2->_opnds[0] = _opnds[2]->clone(C);
+    m2->_opnds[1] = _opnds[2]->clone(C);
+
+    m3->_opnds[0] = _opnds[0]->clone(C);
+    m3->_opnds[1] = _opnds[1]->clone(C);
+    m3->_opnds[2] = _opnds[2]->clone(C);
+
+    ra_->set1(m1->_idx, ra_->get_reg_first(n_src1));
+    ra_->set1(m2->_idx, ra_->get_reg_first(n_src2));
+    ra_->set1(m3->_idx, ra_->get_reg_first(this));
+
+    nodes->push(m1);
+    nodes->push(m2);
+    nodes->push(m3);
+  %}
+%}
+
+instruct divI_reg_reg_SRA(iRegIsafe dst) %{
+  effect(USE_DEF dst);
+  size(4);
+  format %{ "SRA     $dst,0,$dst\n\t" %}
+  ins_encode %{ __ sra($dst$$Register, 0, $dst$$Register); %}
+  ins_pipe(ialu_reg_reg);
+%}
+
+instruct divI_reg_reg_SDIVX(iRegI dst, iRegIsafe src1, iRegIsafe src2) %{
+  effect(DEF dst, USE src1, USE src2);
+  size(4);
+  format %{ "SDIVX   $src1,$src2,$dst\n\t" %}
+  ins_encode %{ __ sdivx($dst$$Register, 0, $dst$$Register); %}
+  ins_pipe(sdiv_reg_reg);
+%}
+
+instruct divI_reg_reg_Ex(iRegI dst, iRegIsafe src1, iRegIsafe src2) %{
+  match(Set dst (DivI src1 src2));
+  predicate(UseNewCode);
+  ins_cost((2+71)*DEFAULT_COST);
+
+  format %{ "SRA     $src2,0,$src2\n\t"
+            "SRA     $src1,0,$src1\n\t"
+            "SDIVX   $src1,$src2,$dst" %}
+  lateExpand( lateExpandIdiv_reg_reg(src1, src2, dst) );
+%}
+
+//------------------------------------------------------------------------------------
+
 // Immediate Division
 instruct divI_reg_imm13(iRegI dst, iRegIsafe src1, immI13 src2) %{
   match(Set dst (DivI src1 src2));


Code generated by adlc:

class divI_reg_reg_ExNode : public MachNode { 
  // ...
  virtual bool           requires_late_expand() const { return true; }
  virtual void           lateExpand(GrowableArray <Node *> *nodes, PhaseRegAlloc *ra_);
  // ...
};

...