Performance Ideas

This page contains a series of ideas, each of which should be tied to a specific JIRA where the discussion / resolution will occur.

Reducing State Switches

This idea is based on the fact that with Region caching enabled, almost everything we do is rendering images and text. Right now, the first time a checkbox is rendered (for example), we first render it to an image, store the image in a cache, and thereafter whenever we have to render the checkbox we do so by rendering the cached image (simplified, but you get the point). When we render text, we are also rendering images, but with a different shader. At the moment that means that to render a checkbox, we first set up the shader for rendering from an image, render, and then switch to the text shader and render the text. If you have a page with 20 checkboxes, we end up doing 40 state switches.

Further, every UI control is essentially the exact same thing – either it is only images, or it is images + text.

The first idea, then, is to have a single shader which can handle both images and text. If this is possible, we would not have to perform state switches for most of a normal business UI, because most of such a UI is made up of Regions (controls), Text, and Images, all of which could be handled by a single shader.
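To make the arithmetic concrete, here is a minimal sketch that counts shader binds for a queue of draw calls. It is a toy model only: the StateSwitchSketch class, the Shader enum, and both methods are hypothetical names invented for this illustration and do not correspond to anything in the rendering pipeline.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a render queue; none of these names come from the JavaFX code base.
class StateSwitchSketch {
    enum Shader { IMAGE, TEXT, UNIFIED }

    // Draw calls for n checkboxes: one cached image plus one text label each.
    static List<Shader> checkBoxDrawCalls(int n, boolean unifiedShader) {
        List<Shader> calls = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            calls.add(unifiedShader ? Shader.UNIFIED : Shader.IMAGE); // cached box image
            calls.add(unifiedShader ? Shader.UNIFIED : Shader.TEXT);  // label glyphs
        }
        return calls;
    }

    // Counts how often the bound shader has to change, including the first bind.
    static int countSwitches(List<Shader> calls) {
        int switches = 0;
        Shader bound = null;
        for (Shader s : calls) {
            if (s != bound) {
                switches++;
                bound = s;
            }
        }
        return switches;
    }

    public static void main(String[] args) {
        System.out.println(countSwitches(checkBoxDrawCalls(20, false))); // prints 40
        System.out.println(countSwitches(checkBoxDrawCalls(20, true)));  // prints 1
    }
}
```

In this model the 20-checkbox page needs 40 binds with separate image and text shaders and a single bind with a unified one. It says nothing about how expensive a bind actually is, which is what the measurements would have to establish.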

The second idea is that we could have a pre-baked image cache for Modena that we simply upload at startup. In this way we avoid the initial rasterization pass entirely (that is, we don't have to first draw to an image and then draw from the image to the back buffer).

The third idea is to add 9-slice support to Region for those cases where it can be supported, such that we don't have to redraw things just because they are taller but can still use the cached images.
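As a rough illustration of what 9-slice means here, the sketch below splits an area into a 3x3 grid using fixed border insets. Applying the same insets to the cached image and to a taller target gives matching source and destination rectangles: the corners keep their size, the edges stretch along one axis, and the center stretches along both. NineSliceSketch, Rect, and slices are hypothetical names for this example, not an existing Region API.

```java
// Hypothetical helper; the names here are illustrative and not part of JavaFX.
class NineSliceSketch {
    static final class Rect {
        final double x, y, w, h;
        Rect(double x, double y, double w, double h) {
            this.x = x; this.y = y; this.w = w; this.h = h;
        }
        @Override public String toString() {
            return "(" + x + ", " + y + ", " + w + ", " + h + ")";
        }
    }

    // Splits a width x height area into a 3x3 grid, row-major, using fixed
    // border insets. Index 0 is the top-left corner, 4 the center, 8 the
    // bottom-right corner.
    static Rect[] slices(double width, double height,
                         double top, double right, double bottom, double left) {
        if (width < left + right || height < top + bottom) {
            throw new IllegalArgumentException("area is smaller than its insets");
        }
        double[] xs = {0, left, width - right};
        double[] ws = {left, width - left - right, right};
        double[] ys = {0, top, height - bottom};
        double[] hs = {top, height - top - bottom, bottom};
        Rect[] out = new Rect[9];
        for (int row = 0; row < 3; row++) {
            for (int col = 0; col < 3; col++) {
                out[row * 3 + col] = new Rect(xs[col], ys[row], ws[col], hs[row]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Rect[] src = slices(20, 20, 4, 4, 4, 4); // cached image
        Rect[] dst = slices(20, 50, 4, 4, 4, 4); // taller region, same insets
        for (int i = 0; i < 9; i++) {
            System.out.println("draw src " + src[i] + " into dst " + dst[i]);
        }
    }
}
```

This only works when the stretched parts of the image survive stretching (flat fills or gradients along one axis, for example), which is presumably what "those cases where it can be supported" refers to.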

Preliminary testing with CheckBox seems to indicate a potential 6x improvement in performance for this case (where you have a hundred or so check boxes on the scene). The numbers for TableView were only marginally better – perhaps due to the overhead in CSS / Layout related to the table, although this analysis is speculative.

Reducing Redundant Relayout

I believe we presently do much more work per scene than is required when a single component nested deep in the scene graph has changed its preferred size and requires layout to run. The way this is supposed to work at present is that, when a Node calls requestLayout, it is assumed that the node's preferred, min, or max size may have changed in a way that affects how the node is laid out in its container.

If the container is a Group, then during the layout pass when the child is resized to its new preferred size, this change in size might also impact the size of the Group. If the Group is a child of a layout container, the change in the group size will also impact the layout of items within that parent layout container, and require another layout pass.

In the normal course of affairs, when a node's requestLayout is called, it walks up the tree marking each parent in the tree as also needing to have layout applied. This is because the change in the pref width of a button may in fact impact the pref width of its parent container which may affect the pref width of the parent container's parent container, and so on. If one of those parent containers is a layout root (such as the content pane of a ScrollPane) then we don't walk any further up in the hierarchy since we know a change to the nodes within the layout root will have no impact on the pref width / height of the layout root.
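The walk described above can be modelled in a few lines. This is a simplified sketch, not the actual scene graph code: LayoutNode and its fields are hypothetical stand-ins, and the real implementation has additional state that is ignored here.

```java
// Hypothetical stand-in for a scene graph node; not the real JavaFX classes.
class LayoutNode {
    final String name;
    final LayoutNode parent;
    final boolean layoutRoot;
    boolean needsLayout;

    LayoutNode(String name, LayoutNode parent, boolean layoutRoot) {
        this.name = name;
        this.parent = parent;
        this.layoutRoot = layoutRoot;
    }

    // Marks this node and its ancestors as needing layout, stopping at the
    // nearest layout root: a size change below a layout root cannot change
    // the root's own pref width / height, so nothing above it is marked.
    void requestLayout() {
        LayoutNode n = this;
        while (n != null) {
            n.needsLayout = true;
            if (n.layoutRoot) {
                break;
            }
            n = n.parent;
        }
    }

    public static void main(String[] args) {
        LayoutNode scene = new LayoutNode("scene root", null, false);
        LayoutNode content = new LayoutNode("scroll content", scene, true);
        LayoutNode box = new LayoutNode("box", content, false);
        LayoutNode button = new LayoutNode("button", box, false);

        button.requestLayout();
        for (LayoutNode n : new LayoutNode[] {scene, content, box, button}) {
            System.out.println(n.name + " needsLayout=" + n.needsLayout);
        }
    }
}
```

In this model the button, the box, and the scroll content are marked dirty while the scene root is left alone, which is why correctly identifying layout roots limits how much of the tree a single change touches.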

With this basic understanding, a few ideas come to mind:

Verify that all of the Nodes (such as Controls) which can be content roots are properly identified as such. For example, the TableView, ListView, and TreeView should be content roots.
When running a layout pass, have the ability to determine whether a dirty layout node's pref / min / max size has actually changed. If not, then we have no need to run layout on this node, and can proceed to asking its dirty children to lay themselves out.
During a typical layout pass, the parent asks each child for its prefWidth, prefHeight, minWidth, minHeight, maxWidth, and maxHeight (or maybe 4 of those 6). It then proceeds to perform the layout algorithm 3 or more times. Suppose I have root R and layout container L with children C1-C3. When R attempts to lay itself out, it first asks L for its prefWidth (say). To figure out its pref width, L must get the pref width / height of C1 - C3 and perform the layout algorithm, so that it knows what its preferred width is. R then asks L for its minWidth (say), and L must then ask for the min width of C1 - C3 and run the layout algorithm to figure out what its min width is. And so forth. Multiple passes over complex layout algorithms are likely hurting us substantially. In retrospect I might have said that min/max should never be computed, only ever specified manually, which would probably have had a big positive impact on performance. Nevertheless, the fact that we have to run this multiple times even when not strictly necessary is probably a cause of poor performance.
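One way to picture the second and third ideas together is a container that remembers its computed pref width and reports whether a child's change actually altered it. The sketch below assumes a trivial "sum of child widths" layout algorithm; CachingContainer and its members are hypothetical names for this illustration and make no claim about what the existing layout code caches.

```java
// Hypothetical container with a toy layout algorithm (sum of child widths).
class CachingContainer {
    private final double[] childPrefWidths;
    private double cachedPrefWidth = -1; // -1 means "not computed yet"
    int computeCount = 0;                // how often the layout algorithm ran

    CachingContainer(double... childPrefWidths) {
        this.childPrefWidths = childPrefWidths.clone();
    }

    // Returns the cached value when available instead of re-running the algorithm.
    double prefWidth() {
        if (cachedPrefWidth < 0) {
            cachedPrefWidth = computePrefWidth();
        }
        return cachedPrefWidth;
    }

    private double computePrefWidth() {
        computeCount++;
        double sum = 0;
        for (double w : childPrefWidths) {
            sum += w;
        }
        return sum;
    }

    // Records a child's new pref width and reports whether this container's
    // own pref width changed. Only a true result needs to dirty the parent.
    boolean childChanged(int index, double newPrefWidth) {
        double before = prefWidth();
        childPrefWidths[index] = newPrefWidth;
        cachedPrefWidth = -1;
        return prefWidth() != before;
    }

    public static void main(String[] args) {
        CachingContainer l = new CachingContainer(10, 20, 30);
        l.prefWidth();
        l.prefWidth();
        l.prefWidth();
        System.out.println(l.computeCount);        // prints 1
        System.out.println(l.childChanged(0, 10)); // prints false
        System.out.println(l.childChanged(1, 25)); // prints true
    }
}
```

Repeated queries from the parent cost one computation rather than three, and a requestLayout that leaves the container's size unchanged does not need to propagate further. A real layout would need separate cached values for min, pref, and max, and for width-for-height queries, which this sketch leaves out.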

Investigate Native GUI Timer, Pulses and Event Flow

Especially when the system is stressed, FPS can be sensitive to the timing of events and the pulse timer. Right now, the relative ordering of native GUI events, FX pulse events, and runLater() actions is undefined, except that flooding the system with runLater() calls will not starve native GUI events.

There is some anecdotal evidence that suggests using the native GUI timer on OS X improves performance.

Preinitialize Controls to Well-Known CSS Default Values

Rather than running CSS at startup, precompute the defaults and initialize FX to have these values. This should improve startup time.

Support Instancing in JavaFX / Better Texture Caching

Right now, it is possible to ask FX to cache a node. This causes the node to be rendered to a texture on the graphics card, and the texture is retained for future draws. This can make drawing of the node much faster provided that the node is not changing and the node is drawn a lot. Turning caching on is not always a win and needs to be done carefully by the application programmer (if at all).

There is evidence that application caching is more performant than system caching. Application code that renders static content to an image and then uses the same image in many different nodes is effectively caching. The image is represented as a single texture and that texture is on the graphics card.

Instancing would allow the application programmer to declare that identical nodes are shared in the render tree. This would allow the system to cache and optimize drawing.
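The sharing idea can be sketched as a cache keyed by a description of what a node looks like, so that identical nodes resolve to one texture. This is only a model of the bookkeeping: TextureCacheSketch, Texture, and the string keys are assumptions made for this example, and no such API exists in FX today.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical texture cache; a real one would hold GPU resources.
class TextureCacheSketch {
    static final class Texture {
        final int id;
        Texture(int id) { this.id = id; }
    }

    private final Map<String, Texture> byKey = new HashMap<>();
    int rasterizations = 0; // how many times content was actually rendered

    // Returns the shared texture for a visual key, rasterizing only on first use.
    Texture textureFor(String visualKey) {
        return byKey.computeIfAbsent(visualKey, k -> {
            rasterizations++;
            return new Texture(rasterizations);
        });
    }

    public static void main(String[] args) {
        TextureCacheSketch cache = new TextureCacheSketch();
        for (int i = 0; i < 100; i++) {
            cache.textureFor("checkbox:unselected");
        }
        cache.textureFor("checkbox:selected");
        System.out.println(cache.rasterizations); // prints 2
    }
}
```

The hard part, which this sketch sidesteps, is deciding when two nodes are truly identical. Letting the application declare that, rather than having the system infer it, is the point of the instancing proposal.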