This page contains a series of ideas, each of which should be tied to a specific JIRA where the discussion / resolution will occur.

Graphics

Command Buffer

Right now, JFX has two threads: the UI thread and the render thread. The render thread currently works in direct mode: it traverses the render graph hierarchy and issues graphics commands to the card. This incurs latency on some calls, when the render thread has to wait for the graphics card to finish the current operation. When a node is rendered, the output is normally a request to run a certain shader plus a set of vertices for that shader. Rather than talking directly to the card, a command buffer saves a list of commands and their arguments and then puts the final result together to give to the graphics card.

How does this improve performance? Since graphics commands are saved, the Java code that computes vertices does not need to run again when the saved commands can be reused. More importantly, threads can compute command buffers concurrently without incurring latency from the graphics card.
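As a rough illustration of the idea (not Prism code), a command buffer can be sketched as a list of recorded shader/vertex pairs that is replayed later. The names CommandBufferSketch and Device are hypothetical and exist only for this example; Device stands in for whatever actually talks to the graphics card.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: records draw commands instead of sending them straight to the card. */
final class CommandBufferSketch {
    /** Stand-in for the graphics card (an assumption made for this example). */
    interface Device {
        void draw(String shader, float[] vertices);
    }

    /** One recorded command: which shader to run and the vertices to feed it. */
    private static final class Command {
        final String shader;
        final float[] vertices;

        Command(String shader, float[] vertices) {
            this.shader = shader;
            this.vertices = vertices;
        }
    }

    private final List<Command> commands = new ArrayList<>();

    /** Recording only appends to a list, so it never waits on the device. */
    void record(String shader, float... vertices) {
        commands.add(new Command(shader, vertices.clone()));
    }

    int size() {
        return commands.size();
    }

    /** Replays the saved commands in recording order; vertices are not recomputed. */
    void execute(Device device) {
        for (Command c : commands) {
            device.draw(c.shader, c.vertices);
        }
    }
}
```

Because execute only replays what was recorded, the same buffer can be executed on a later pulse without running the vertex computation again, as long as the node has not changed.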

https://javafx-jira.kenai.com/browse/RT-23462

Multi-Threading

Modern CPUs have many cores that could be taken advantage of, so multi-threaded rendering is a necessity. Using a command buffer, multiple threads can process branches of the render graph hierarchy. A single render thread is responsible for executing the command buffers. While command buffer threads are executing, they can request resources from the render thread so that, by the time the buffer is executed, textures and other resources have already been created.
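One possible shape for this, sketched with plain java.util.concurrent and no Prism types: worker threads each record one branch into its own buffer, and a single thread then executes the buffers in graph order. ParallelRecordingSketch, recordBranch and render are hypothetical names, a buffer is reduced to a list of strings, and resource requests to the render thread are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: branches are recorded in parallel and executed by one thread. */
final class ParallelRecordingSketch {
    /** Records one branch of the render graph into its own buffer. */
    static List<String> recordBranch(List<String> branchNodes) {
        List<String> buffer = new ArrayList<>();
        for (String node : branchNodes) {
            buffer.add("draw " + node);
        }
        return buffer;
    }

    /** Workers record concurrently; the calling thread "executes" buffers in branch order. */
    static List<String> render(List<List<String>> branches, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (List<String> branch : branches) {
                pending.add(pool.submit(() -> recordBranch(branch)));
            }
            List<String> executed = new ArrayList<>();
            for (Future<List<String>> buffer : pending) {
                // Waiting in submission order keeps the draw order deterministic.
                executed.addAll(buffer.get());
            }
            return executed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while recording", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("recording failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the futures in submission order means the final command order does not depend on which worker finishes first.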

Reducing State Switches ("super shader")

This idea is based on the fact that with Region caching enabled, almost everything we do is rendering images and text. Right now, the first time a checkbox is rendered (for example), we first render it to an image, store the image in a cache, and thereafter whenever we have to render the checkbox we do so by rendering the cached image (simplified, but you get the point). When we render text, we are also rendering images, but with a different shader. At the moment that means that to render a checkbox, we first set up the shader for rendering from an image, render, and then switch to the text shader and render the text. If you have a page with 20 check boxes, we end up doing 40 state switches.
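The arithmetic above can be checked with a small counting sketch. StateSwitchSketch and its methods are hypothetical, shaders are represented by plain strings, and the "super" case simply assumes a single shader that can draw both the cached image and the text; it is not a description of how such a shader would be built.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: counts shader state switches for a sequence of draws. */
final class StateSwitchSketch {
    /** Counts how often the bound shader changes, including the first bind. */
    static int countSwitches(List<String> shaderPerDraw) {
        int switches = 0;
        String bound = null; // nothing is bound before the first draw
        for (String shader : shaderPerDraw) {
            if (!shader.equals(bound)) {
                switches++;
                bound = shader;
            }
        }
        return switches;
    }

    /** Each check box draws its cached image first and its label text second. */
    static List<String> checkBoxDraws(int count, String imageShader, String textShader) {
        List<String> draws = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            draws.add(imageShader);
            draws.add(textShader);
        }
        return draws;
    }
}
```

With 20 check boxes, countSwitches(checkBoxDraws(20, "image", "text")) returns 40, while passing the same shader name for both arguments returns 1.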

...

Preliminary testing with CheckBox seems to indicate a potential 6x improvement in performance for this case (where you have a hundred or so check boxes on the scene). The numbers for TableView were only marginally better – perhaps due to the overhead in CSS / Layout related to the table, although this analysis is speculative.

https://javafx-jira.kenai.com/browse/RT-30741

https://javafx-jira.kenai.com/browse/RT-30922

Preserve the Back Buffer

If we are not updating the whole screen on each pulse then we could benefit from preserving the framebuffer. We would need to explicitly clear dirty regions before rendering to them.
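A minimal sketch of the clearing step, assuming the preserved buffer is modelled as an int array of pixels and a dirty region as a rectangle. PreservedBackBufferSketch is a hypothetical name; a real implementation would operate on the graphics card's framebuffer rather than on a Java array.

```java
import java.util.Arrays;

/** Hypothetical sketch: a back buffer that survives between pulses. */
final class PreservedBackBufferSketch {
    final int width;
    final int height;
    final int[] pixels; // contents are kept from one pulse to the next

    PreservedBackBufferSketch(int width, int height) {
        this.width = width;
        this.height = height;
        this.pixels = new int[width * height];
    }

    /** Fills the whole buffer, as a full-screen redraw would. */
    void fill(int color) {
        Arrays.fill(pixels, color);
    }

    /** Clears only the dirty rectangle (clamped to the buffer); returns the pixels touched. */
    int clearDirty(int x, int y, int w, int h, int clearColor) {
        int x0 = Math.max(0, x);
        int y0 = Math.max(0, y);
        int x1 = Math.min(width, x + w);
        int y1 = Math.min(height, y + h);
        int touched = 0;
        for (int row = y0; row < y1; row++) {
            for (int col = x0; col < x1; col++) {
                pixels[row * width + col] = clearColor;
                touched++;
            }
        }
        return touched;
    }
}
```

Everything outside the dirty rectangle keeps the pixels from the previous pulse, which is where the saving would come from.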

https://javafx-jira.kenai.com/browse/RT-30721

https://javafx-jira.kenai.com/browse/RT-20356

https://javafx-jira.kenai.com/browse/RT-30723

https://javafx-jira.kenai.com/browse/RT-30361

Optimize String Measuring

It is no secret that the cost of string measuring can have a huge effect on performance. String measuring operations are called often in FX to determine the preferred size of controls, and layout happens often in FX as application code changes the contents of controls.

It's easy to see the same strings being measured over and over again. We could either fix the callers to cache results or call less often, or add a cache deep inside Prism.
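Either option amounts to remembering the result for a given font and string. A minimal sketch using a bounded least-recently-used cache built on java.util.LinkedHashMap follows; StringMeasureCacheSketch is a hypothetical name, the font is identified by a plain string, and the measurer function stands in for the real (expensive) measuring call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch: bounded LRU cache in front of an expensive string measurer. */
final class StringMeasureCacheSketch {
    private final ToDoubleBiFunction<String, String> measurer; // (font, text) -> width
    private final Map<String, Double> cache;
    private int misses;

    StringMeasureCacheSketch(ToDoubleBiFunction<String, String> measurer, final int capacity) {
        this.measurer = measurer;
        // accessOrder = true makes iteration order least recently used first.
        this.cache = new LinkedHashMap<String, Double>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Double> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the cached width, measuring only on a cache miss. */
    double width(String font, String text) {
        String key = font + '\u0000' + text; // separator that is unlikely to occur in either part
        Double cached = cache.get(key);
        if (cached == null) {
            misses++;
            cached = measurer.applyAsDouble(font, text);
            cache.put(key, cached);
        }
        return cached;
    }

    /** Number of times the underlying measurer actually ran. */
    int misses() {
        return misses;
    }
}
```

The capacity bound matters because application code can generate an unbounded number of distinct strings; without it the cache would grow for the lifetime of the application.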

https://javafx-jira.kenai.com/browse/RT-30158

Implement Hardware Layers

We could be taking advantage of hardware layers to speed up composition.

https://javafx-jira.kenai.com/browse/RT-30719

Smooth Animation

This is not a performance optimization, but silky smooth animation makes a program seem faster and look more polished.

Controls

Reducing Redundant Relayout

...

Verify that all of the Nodes (such as Controls) which can be content roots are properly identified as such. For example, the TableView, ListView, and TreeView should be content roots.
When running a layout pass, have the ability to determine whether a dirty layout node's pref / min / max size has changed. If not, then we have no need to run layout on this node, but can proceed to asking the dirty layout children to lay themselves out.
During a typical layout pass, the parent asks each child for its prefWidth, prefHeight, minWidth, minHeight, maxWidth, and maxHeight (or maybe 4 of those 6). It then proceeds to perform the layout algorithm 3 or more times. Suppose I have root R and layout container L with children C1 - C3. When R attempts to lay itself out, it first asks L for its prefWidth (say). To figure out its pref width, L must get the pref width / height of C1 - C3 and perform the layout algorithm, so that it knows what its preferred width is. R then asks L for its minWidth (say), and L must then ask for the min width of C1 - C3 and run the layout algorithm again to figure out what its min width is. And so forth. Multiple passes over complex layout algorithms are likely hurting us substantially. In retrospect I might have said that min / max should never be computed, only ever specified manually, which would probably have had a big positive impact on performance. Nevertheless, the fact that we have to run this multiple times even when not strictly necessary is probably a cause of poor performance.
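The second and third points can be illustrated with a toy node that remembers its computed pref width and reports whether it really changed after being marked dirty. This is only a sketch under simplifying assumptions: LayoutCacheSketch is a hypothetical class, a container's pref width is just its own width plus the sum of its children's, and min / max are ignored. It does not describe how JavaFX's Parent behaves.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a layout node that caches its computed pref width. */
final class LayoutCacheSketch {
    final List<LayoutCacheSketch> children = new ArrayList<>();
    double ownWidth;
    int computeCount; // how many times the "layout algorithm" actually ran
    private double cachedPrefWidth = -1; // -1 means "not computed since last invalidation"

    LayoutCacheSketch(double ownWidth) {
        this.ownWidth = ownWidth;
    }

    /** Runs the (toy) layout computation only when no cached value exists. */
    double prefWidth() {
        if (cachedPrefWidth < 0) {
            computeCount++;
            double sum = ownWidth;
            for (LayoutCacheSketch child : children) {
                sum += child.prefWidth();
            }
            cachedPrefWidth = sum;
        }
        return cachedPrefWidth;
    }

    /** Marks this node dirty, recomputes, and reports whether the pref width changed. */
    boolean invalidateAndCheck() {
        double before = cachedPrefWidth;
        cachedPrefWidth = -1;
        return prefWidth() != before;
    }
}
```

If invalidateAndCheck returns false for a dirty node, its parent's size request would return the same answer as before, so the parent does not need to be laid out again on that node's account.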

Preinitialize Controls to Well-Known CSS Default Values

Rather than running CSS at startup, precompute the defaults and initialize FX with these values. This should improve startup time.

Other Ideas

Investigate Native GUI Timer, Pulses and Event Flow

...

There is some anecdotal evidence that suggests using the native GUI timer on OS X improves performance.

Support Instancing in JavaFX / Better Texture Caching

Right now, it is possible to ask FX to cache a node. This causes the node to be rendered to a texture on the graphics card, and the texture is retained for future draws. This can make drawing the node much faster, provided that the node is not changing and is drawn a lot. Turning caching on is not always a win and needs to be done carefully by the application programmer (if at all).

There is evidence that application caching is more performant than system caching. Application code that renders static content to an image and then uses the same image in many different nodes is effectively caching. The image is represented as a single texture, and that texture is on the graphics card.

Instancing would allow the application programmer to declare that identical nodes are shared in the render tree. This would allow the system to cache and optimize drawing.
