(Need to supply some code samples that show the problem and the solution)
Performance Areas:
- General
- Turning data into class files
- Binary CSS
- Use fewer nodes
- Describe the process that occurs for every node:
- Picking
- Sync
- Bounds computation
- Computing dirty regions
- Visits when laying out
- Visits when applying CSS
- Embedded maybe 1000 nodes max, Desktop 10x that
- Probably 200 on embedded, 20,000 on Desktop should be very fast
- Less memory overhead
- Each Node currently costs 5-7k
- In part because of state copy for multi-threading
- Execute less code
- Reduce method calls
- Sometimes this is "free", and on desktop it doesn't matter as much, but embedded (and iOS) makes this more important
- Use FXCollections API
- Avoid structural changes (compute bound)
- Lots of work to discover if a SG change is "legal", such as avoiding loops etc.
- Visibility changes
- Presently we do a downward tree-walk to set "treeVisible" state on each node
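One way to read the FXCollections and structural-change bullets above is to stage changes in a plain list and apply them in a single call, so that listeners are notified once rather than once per element. The sketch below only counts change notifications on an ObservableList; the class and method names are illustrative and not part of JavaFX, and the actual cost saved in a scene graph still needs measuring.

```java
import java.util.ArrayList;
import java.util.List;
import javafx.collections.FXCollections;
import javafx.collections.ListChangeListener;
import javafx.collections.ObservableList;

// Hypothetical helper: compares per-element adds with one batched addAll.
final class BatchedListUpdates {
    // Returns how many change notifications a listener receives.
    static int countNotifications(boolean batched, int items) {
        ObservableList<Integer> list = FXCollections.observableArrayList();
        int[] notifications = {0};
        list.addListener((ListChangeListener<Integer>) change -> notifications[0]++);
        if (batched) {
            // Stage everything first, then publish with a single structural change.
            List<Integer> staged = new ArrayList<>();
            for (int i = 0; i < items; i++) {
                staged.add(i);
            }
            list.addAll(staged);
        } else {
            // Each add is its own structural change and its own notification.
            for (int i = 0; i < items; i++) {
                list.add(i);
            }
        }
        return notifications[0];
    }
}
```

The same staging idea applies to a Parent's children list, for example by calling getChildren().addAll(...) or getChildren().setAll(...) once instead of adding nodes in a loop.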
- Threading
- Do minimal work on the main FX thread
- Can create (most) things on a background thread
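A minimal sketch of the threading bullets above, assuming the subtree being built is not yet attached to a live scene: the nodes are constructed on a pool thread and only the final attach runs on the FX thread via Platform.runLater. The class, method names and the use of CompletableFuture are illustrative choices, not something the notes prescribe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import javafx.application.Platform;
import javafx.scene.Group;
import javafx.scene.layout.Pane;
import javafx.scene.shape.Circle;

// Hypothetical helper: build off the FX thread, attach on it.
final class BackgroundBuild {
    // Builds a detached subtree; it is not part of any scene yet.
    static Group buildDots(int count) {
        List<Circle> dots = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            dots.add(new Circle(i * 10, 0, 4));
        }
        Group group = new Group();
        group.getChildren().addAll(dots);
        return group;
    }

    // Constructs the subtree on a pool thread, then attaches it on the FX thread.
    static CompletableFuture<Void> buildAndAttach(Pane target, int count) {
        return CompletableFuture.supplyAsync(() -> buildDots(count))
                .thenAccept(dots -> Platform.runLater(() -> target.getChildren().add(dots)));
    }
}
```

Once a node is part of a scene shown in a window, changes to it belong on the FX thread, which is why the attach step is the only part handed to Platform.runLater here.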
- Graphics
- Need document to specify how Prism works
- What OpenGL gets generated
- What is the path used for rendering
- What do we do when we need to render images
- Text, text measurement is always slow.
- Images are faster than shapes
- Shapes require rasterization (in CPU!) and passing a mask to the GPU
- Bounds of the shape to be rasterized has a big impact on performance
- Images that are not changing are about the fastest thing in the universe
- To cache, or not to cache? That is the question!
- Every time the thing being cached changes, it is very expensive to redraw it.
- But reusing the baked image a zillion times is faster than redrawing!
- Load smaller images when you don't need a full sized image
- Loading a single 12-megapixel image on a Raspberry Pi can run out of memory
- Memory management flags for texture management should be API
- PI Specific: Configure platform to split between GPU / Main memory
- Limit the use of things that require rendering into intermediate textures (such as non-axis aligned rects or random shapes or Node types for clip, opacity on things that aren't leaf nodes, effects, blend modes)
- Look at fill rate
- Reduce overdraw
- Be aware of how dirty regions work (maybe opportunity to turn it off?)
- Explain how region image caching works
- When do we do it, and when do we not
- Animation
- Specify the use of cache hints during animations
- DEFAULT
- QUALITY
- ROTATE
- SCALE
- SCALE_AND_ROTATE
- SPEED
- Under certain circumstances such as animating nodes that are very expensive to render, it is desirable to be able to perform transformations on the node without it having to regenerate the cached bitmap. An option in such cases is to perform the transforms on the cached image itself, using one of the CacheHints.
- Specify the Interpolator for a KeyValue in a Timeline
- Using Interpolator.EASE_BOTH can provide a much nicer looking animation that looks smoother to the eye than the default LINEAR.
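The two animation points above could be combined roughly as follows. This is a sketch under assumptions: the helper class and its method names are hypothetical, and CacheHint.SPEED is just one of the hints listed; which hint suits a given animation needs to be tried out.

```java
import javafx.animation.Interpolator;
import javafx.animation.KeyFrame;
import javafx.animation.KeyValue;
import javafx.scene.CacheHint;
import javafx.scene.Node;
import javafx.util.Duration;

// Hypothetical helper for preparing a node before animating it.
final class AnimationSetup {
    // Lets transforms be applied to the cached bitmap instead of re-rendering the node.
    static void cacheForAnimation(Node node) {
        node.setCache(true);
        node.setCacheHint(CacheHint.SPEED);
    }

    // Switches back to quality rendering once the animation has finished.
    static void restoreQuality(Node node) {
        node.setCacheHint(CacheHint.QUALITY);
    }

    // A key frame that eases translateX toward targetX rather than moving linearly.
    static KeyFrame easedSlide(Node node, double targetX, Duration duration) {
        KeyValue value = new KeyValue(node.translateXProperty(), targetX, Interpolator.EASE_BOTH);
        return new KeyFrame(duration, value);
    }
}
```

A typical use would pass the key frame to a Timeline, call cacheForAnimation before play(), and call restoreQuality from the timeline's onFinished handler.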
- UI Controls
Smart customization: no need to extend from the Control class:
- Need more functionality than provided by an existing control
- Not necessary to provide as a library control
- Extend Layout container and CSS styles
- CSS
- Avoid selectors that have to match against the entire set of parents
- Use stylesheets, not setStyle() calls
- Use pseudo-class state, not multiple style classes, for state-based styles (FX 8)
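The pseudo-class bullet above might look like this in code (FX 8 API). The pseudo-class name "invalid" and the helper class are made up for illustration; a stylesheet would target the state with a rule such as .field:invalid.

```java
import javafx.css.PseudoClass;
import javafx.scene.Node;

// Hypothetical helper: toggles a state-based style without swapping style classes.
final class ValidationState {
    // Illustrative pseudo-class; matched in CSS as ":invalid".
    static final PseudoClass INVALID = PseudoClass.getPseudoClass("invalid");

    // Turns the pseudo-class on or off for the given node.
    static void markInvalid(Node field, boolean invalid) {
        field.pseudoClassStateChanged(INVALID, invalid);
    }
}
```

The node keeps a single style class throughout, and only the pseudo-class state flips when the state changes.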
- FXML
- Proper use of FXMLLoader#load
- The load() method of FXMLLoader is tricky: make sure you are not calling a static overload. The following code is wrong. It calls the static method load(URL), which has nothing in common with your fxmlLoader object, so getController() returns null:
// WRONG: load(URL) is static, so the fxmlLoader instance never loads anything
FXMLLoader fxmlLoader = new FXMLLoader();
Parent obj = fxmlLoader.load(getClass().getResource("MyApp.fxml"));
Object invalidController = fxmlLoader.getController(); // null
Correct way:
FXMLLoader fxmlLoader = new FXMLLoader(getClass().getResource("MyApp.fxml"));
Object obj = fxmlLoader.load();
Object myController = fxmlLoader.getController();
How to find performance problems ...
Random (need sorting):
- Manual state sorting in the scene graph
- Making a fast Cell?
- Animation: Control bitmap caching
- Animation: use new API for smooth animations (text rendering hints etc)
Pulse logger should keep track of which cached images are invalidated when.
From John Smith:
The biggest thing when doing performance work is identifying the benchmarks. Once we know what we're measuring, it *will* get faster.
This is one of the most difficult things I found about trying to code performance sensitive stuff for JavaFX. It's the not knowing part of it.
JavaFX offers high-level features such as effects, CSS and animation, and it's hard to know where performance bottlenecks will be without trial and error.
For instance, you can draw hundreds of thousands of lines really quickly, but if you try to draw a path with more than 10,000 elements, rendering slows down. You can speed up rendering by using lines rather than paths, but unless you know that or try it, you might get stuck.
Another example: selecting the wrong pixel format for a WritableImage can kill performance when animating video by twiddling the image's pixels, because the frame rate drops by an order of magnitude without the right pixel format.
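As an illustration of the lines-versus-path observation above, a long polyline can be built as one Line node per segment instead of a single Path with thousands of elements. This is only a sketch with hypothetical names; whether it is faster for a given scene has to be measured.

```java
import java.util.ArrayList;
import java.util.List;
import javafx.scene.Group;
import javafx.scene.shape.Line;

// Hypothetical helper: one Line node per segment instead of one large Path.
final class PolylineAsLines {
    // Connects the points (xs[i], ys[i]) in order; n points give n - 1 segments.
    static Group build(double[] xs, double[] ys) {
        List<Line> segments = new ArrayList<>();
        for (int i = 1; i < xs.length; i++) {
            segments.add(new Line(xs[i - 1], ys[i - 1], xs[i], ys[i]));
        }
        Group group = new Group();
        // Attach all segments with a single structural change.
        group.getChildren().addAll(segments);
        return group;
    }
}
```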
I also find it hard to know the impact of something like effects or CSS on the GPU or battery life, because it is pretty difficult for me to objectively measure those kind of things.
Should I set up lots of parallel animations, or am I better off having a central pulse style system which does everything on a tick? Without knowing how animations are implemented, e.g. if they use their own thread or if they incur a bunch of other overheads, it's hard to make an objective decision about that.
If your application uses WebView with intensive JavaScript, then you are better off using a 32-bit JVM on Windows rather than a 64-bit one, because one will use a JIT JavaScript compiler and the other won't.
The type of effect used makes a large performance difference. For example, BoxBlur is a whole lot quicker than GaussianBlur, as I guess it's supposed to be from reading the wiki pages on what the algorithms involve.
Buffering of canvas commands, and the subsequent delay while a canvas with lots of commands is first rendered, can introduce pauses to the application that most API users aren't going to know about until they start occurring.
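A small sketch of the blur trade-off described above: swapping a GaussianBlur for a BoxBlur with several iterations, since repeated box blurs approximate a Gaussian. The helper name and the choice of three iterations are assumptions for illustration; the visual result and the speed-up should both be checked on the target hardware.

```java
import javafx.scene.Node;
import javafx.scene.effect.BoxBlur;

// Hypothetical helper: a cheaper stand-in for GaussianBlur.
final class CheapBlur {
    // Applies a box blur of the given width and height, iterated three times.
    static void apply(Node node, double size) {
        node.setEffect(new BoxBlur(size, size, 3));
    }
}
```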
I am sure there are many more similar performance impacting tradeoffs which could have been listed here that I don't really know about or understand (such as whether I should rely on dirty region heuristics or node cache hints to optimize rendering performance or should I just snapshot the nodes myself and use the snapshot rather than relying on a platform optimization).
I think a lot of the above is just the nature of JavaFX and goes with the territory: it's a relatively high-level library which abstracts you from some of the low-level implementation details, so the true cost of some of your API usage choices is hidden from you.
Did we try turning cache to true and cache hint to SPEED?
A simple game I wrote used some basic animation of about 50 nodes with effects applied to them (translucent, blur, sectioned viewports into a large Image) and without caching ran painfully slowly. Setting caching to true and using cache hints (just on the animated nodes) made a massive performance difference on a MacBook Air. It was the difference between a game which was playable and a game which was not (i.e. the frame rate did not drop to single digits and the Air's fan didn't spin up).
Turning on caching for select nodes was the easiest and single biggest performance improvement I got for the game.
We had an embedded hack-fest a couple of weeks ago in which performance on desktop went from 320 to 800+ fps on table view scrolling, which in large measure came down to reducing the number of state switches on the graphics card (and the resulting decrease in the number of OpenGL calls).
I realize the above statement is to do with internal optimizations, but should I, as a user of the JavaFX API ever have to worry that the way in which I write my user code may result in something like an increased number of state switches on the graphics card?
Or should I just be able to ignore that kind of stuff as an implementation detail, kind of like when I drive my car, I press the accelerator and it goes and I don't really need to worry much about how that happened?
The difficulty for me here is that I don't know what a state switch on a graphics card is and have no way of knowing whether a particular code path is triggering a lot of switches.
It seems like an aim for JavaFX is to not require the developer be a low-level mechanic to make things work.
We should never require you to follow a 15-point performance plan just to get acceptable performance, or to avoid choppiness.
Nevertheless, an official performance guide would be useful (like Android's: http://developer.android.com/training/best-performance.html).
I put together a short (and necessarily incomplete) guide as part of an answer to a stackoverflow question on obtaining good performance with JavaFX:
http://stackoverflow.com/questions/14467719/what-could-be-the-best-way-to-view-millions-of-images-with-java
Thoughts and questions from Zonski:
- Reduce use of effects. If the effect is static, use images instead of the effect.
- In JavaFX 1.3 it cost more to use a stroke than a fill. For instance, for a rectangle with a stroke it would be more efficient to draw 2 rectangles with a fill, with one of them used to produce the stroke. Don't know if this still applies.