(Need to supply some code samples that show the problem and the solution)

 

Performance Areas:

Incorrect way:

 FXMLLoader fxmlLoader = new FXMLLoader();
 // load(URL) is a static method, so calling it through the instance bypasses
 // the loader created above - its location and controller are never set.
 Parent obj = fxmlLoader.load(getClass().getResource("MyApp.fxml"));
 Object invalidController = fxmlLoader.getController(); // null!

Correct way:

 FXMLLoader fxmlLoader = new FXMLLoader(getClass().getResource("MyApp.fxml"));
 Parent obj = fxmlLoader.load();                   // instance load() uses the location set above
 Object myController = fxmlLoader.getController(); // now returns the controller

How to find performance problems ...

Random (need sorting):

Pulse logger should keep track of which cached images are invalidated when.

From John Smith:

The biggest thing when doing performance work is identifying the benchmarks. Once we know what we're measuring, it *will* get faster.


This is one of the most difficult things I found about trying to write performance-sensitive code for JavaFX: the not knowing.

JavaFX offers high-level features such as effects, CSS and animation, and it's hard to know where the performance bottlenecks will be without trial and error.

For instance, you can draw hundreds of thousands of lines really quickly, but if you try to draw a path with more than 10,000 elements, rendering slows down - so you can speed up rendering by using lines rather than paths, but unless you know that or try it, you might get stuck.
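To make the lines-versus-paths point concrete, here is a minimal sketch (the class and method names are mine, not from the original note) that builds the same polyline both ways, so the two approaches can be swapped and compared:

```java
import javafx.scene.Group;
import javafx.scene.shape.Line;
import javafx.scene.shape.LineTo;
import javafx.scene.shape.MoveTo;
import javafx.scene.shape.Path;

// Sketch: the same polyline built two ways. A single Path with tens of
// thousands of elements can render slowly; many Line nodes often stay fast.
public class LinesVsPath {

    // Variant 1: one Path accumulating thousands of path elements.
    static Path asPath(double[][] pts) {
        Path path = new Path();
        path.getElements().add(new MoveTo(pts[0][0], pts[0][1]));
        for (int i = 1; i < pts.length; i++) {
            path.getElements().add(new LineTo(pts[i][0], pts[i][1]));
        }
        return path;
    }

    // Variant 2 (faster, per the observation above): one Line node per segment.
    static Group asLines(double[][] pts) {
        Group g = new Group();
        for (int i = 1; i < pts.length; i++) {
            g.getChildren().add(new Line(pts[i - 1][0], pts[i - 1][1],
                                         pts[i][0],     pts[i][1]));
        }
        return g;
    }
}
```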

Another instance: selecting the wrong pixel format for a WritableImage can kill the performance of animating a video by twiddling the image's pixels, because the frame rate drops by an order of magnitude without the right pixel format.
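As a sketch of the pixel-format point, the snippet below writes frames using the pre-multiplied int-ARGB format, which often matches the internal texture format and so avoids a per-frame conversion; the class and method names are illustrative, not from the original note:

```java
import javafx.scene.image.PixelFormat;
import javafx.scene.image.PixelWriter;
import javafx.scene.image.WritableImage;

// Sketch: pushing video frames into a WritableImage with a format chosen
// to match the internal texture layout.
public class VideoFrameWriter {

    static void writeFrame(WritableImage img, int[] argbPrePixels, int w, int h) {
        PixelWriter pw = img.getPixelWriter();
        // IntArgbPre tends to be the cheap path; a mismatched format can
        // force a conversion on every single frame.
        pw.setPixels(0, 0, w, h,
                     PixelFormat.getIntArgbPreInstance(),
                     argbPrePixels, 0, w);
    }
}
```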

I also find it hard to know the impact of something like effects or CSS on the GPU or battery life, because it is pretty difficult for me to objectively measure those kinds of things.

Should I set up lots of parallel animations, or am I better off having a central pulse style system which does everything on a tick?  Without knowing how animations are implemented, e.g. if they use their own thread or if they incur a bunch of other overheads, it's hard to make an objective decision about that.  
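One way to build the "central pulse style system" the question describes is a single AnimationTimer that drives everything on one tick per frame. This is only a sketch of that pattern (the Tickable interface is invented for illustration), not a claim about how the platform's own animations are implemented:

```java
import java.util.ArrayList;
import java.util.List;
import javafx.animation.AnimationTimer;

// Sketch of the "central pulse" style: one AnimationTimer ticks once per
// frame and drives every animated object, instead of many parallel timers.
public class CentralTicker {

    interface Tickable { void tick(long nowNanos); }

    private final List<Tickable> items = new ArrayList<>();

    private final AnimationTimer timer = new AnimationTimer() {
        @Override public void handle(long now) {
            for (Tickable t : items) t.tick(now);  // one pass per pulse
        }
    };

    void add(Tickable t) { items.add(t); }
    void start() { timer.start(); }
    void stop()  { timer.stop(); }  // stopping here pauses everything at once
}
```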

If your application uses WebView with intensive JavaScript, then you are better off using a 32-bit JVM on Windows rather than a 64-bit one, because the 32-bit build uses a JIT JavaScript compiler and the 64-bit build doesn't.

The type of effect used makes a large performance difference. For example, BoxBlur is a whole lot quicker than GaussianBlur - as it should be, I guess, judging from the wiki pages on what the algorithms involve.
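As an illustration of that swap, the sketch below replaces a GaussianBlur with a BoxBlur; running a few box-blur iterations is a common cheap approximation of a Gaussian (the class and method names here are mine):

```java
import javafx.scene.Node;
import javafx.scene.effect.BoxBlur;

// Sketch: substituting the cheaper BoxBlur for a GaussianBlur.
public class BlurSwap {

    static void applyCheapBlur(Node node) {
        // Expensive alternative: node.setEffect(new GaussianBlur(10));
        // Three iterations of a box blur approximate a Gaussian fairly closely.
        node.setEffect(new BoxBlur(10, 10, 3)); // width, height, iterations
    }
}
```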

Buffering of canvas commands, and the subsequent pause while initially rendering a canvas with a large backlog of commands, can introduce pauses that most API users won't know about until they start occurring.

I am sure there are many more similar performance impacting tradeoffs which could have been listed here that I don't really know about or understand (such as whether I should rely on dirty region heuristics or node cache hints to optimize rendering performance or should I just snapshot the nodes myself and use the snapshot rather than relying on a platform optimization).

I think a lot of the above is just the nature of JavaFX and goes with the territory - it's a relatively high-level library that abstracts away some of the low-level implementation details, so the true cost of some of your API usage choices is hidden from you.

Did we try turning cache to true and cache hint to SPEED?


A simple game I wrote used basic animation of about 50 nodes with effects applied to them (translucency, blur, sectioned viewports into a large Image) and, without caching, ran painfully slowly. Setting cache to true and using cache hints (just on the animated nodes) made a massive performance difference (on a MacBook Air) - it was the difference between a game that was playable and one that was not (i.e. the frame rate did not drop to single digits and the Air's fan didn't spin up).

Turning on caching for select nodes was the easiest and single biggest performance improvement I got for the game.
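The change described above amounts to two calls per animated node; a minimal sketch:

```java
import javafx.scene.CacheHint;
import javafx.scene.Node;

// Sketch: enable bitmap caching on only the nodes being animated.
public class CacheTuning {

    static void enableFastCaching(Node animatedNode) {
        animatedNode.setCache(true);                // render the node to a bitmap once
        animatedNode.setCacheHint(CacheHint.SPEED); // let transforms reuse that bitmap
    }
}
```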

We had an embedded hack-fest a couple of weeks ago in which desktop performance went from 320 to 800+ fps on table view scrolling, which in large measure came down to reducing the number of state switches on the graphics card (and the resulting decrease in the number of OpenGL calls).


I realize the above statement is about internal optimizations, but should I, as a user of the JavaFX API, ever have to worry that the way I write my user code may result in something like an increased number of state switches on the graphics card?

Or should I just be able to ignore that kind of stuff as an implementation detail, kind of like when I drive my car, I press the accelerator and it goes and I don't really need to worry much about how that happened? 

The difficulty for me here is that I don't know what a state switch on a graphics card is and have no way of knowing whether a particular code path is triggering a lot of switches.

It seems like an aim of JavaFX is not to require the developer to be a low-level mechanic to make things work.

We should never require you to follow a 15-point performance plan just to get acceptable performance or to avoid choppiness.


Nevertheless, an official performance guide would be useful (like Android's: http://developer.android.com/training/best-performance.html).

I put together a short (and necessarily incomplete) guide as part of an answer to a stackoverflow question on obtaining good performance with JavaFX: 
http://stackoverflow.com/questions/14467719/what-could-be-the-best-way-to-view-millions-of-images-with-java

 

Thoughts and questions from Zonski:

1) Turning cache to true and cache hint to SPEED - I haven't seen the drawbacks of doing this explained. I assume there is some trade-off for turning this on, otherwise it would be on by default?
2) AnimationTimer - how is this related to transitions (like TransitionTimer)? They seem to be two very different things - not just at the usage level but in the underlying implementation and performance - but exactly what the difference is isn't clear to me. I wanted to work with both AnimationTimer-based code and Transitions in a consistent way (e.g. pause one thing to pause my whole app). When trying to use the API, my instinct was that the transitions used an underlying AnimationTimer, so I was looking for a way to set an AnimationTimer on the transitions so they all shared the same instance. I'm obviously seeing it wrong, but it's not clear to me why these things are so different.
3) If I have an app with multiple views (say 20 or 30) and I want to animate between them (slide in, out, etc). Should I have all the views already added to the scene and then just move them off-center (and/or make them invisible), or should I add and remove them at the start and end of each animation. I'd assumed that adding and removing was the way to go for performance, is this correct? The question really is: If a node is in the scene graph but not visible does it add much overhead to rendering, picking, etc? Doing something like Mac's Mission Control would be much easier if I didn't have to add/remove and could just zoom, but is this a bad idea from performance perspective?
4) Is there much of a hit in building a view full of nodes, throwing it away, and rebuilding it on each showing of that view? In my apps (browser-style) I would build all the views on startup and then reuse them by loading different data into them (e.g. the same instance was used for showing Person1's details as Person2's, just with different values loaded). I thought this would be best for performance, but a comment from Richard a while back suggested it was perfectly ok/performant to throw away and rebuild the page on each showing? That would be much easier from an animation point of view, as animating the transition 'back' or 'forward' between two pages of the same view is problematic when you are reusing the same view for both!
5) Is there much overhead to a view that it is in memory but not actually added to the scene. i.e. if I do end up building a new instance of the view for every "page load" (as per question 4), then it would be extra handy to just keep those views in memory so when the back button is hit I don't have to rebuild them. I assume the memory would build up pretty quickly if there were large tables, images, videos, etc? How does a web browser manage to cache all these pages in history and would the same approach be performant for JFX?

From Pedro:

- Use layouts whenever possible instead of binds for laying out nodes.
- Reduce the use of effects. If an effect is static, use a pre-rendered image instead of the effect.
- In JavaFX 1.3 a stroke cost more than a fill. For instance, for a rectangle with a stroke it was more efficient to draw two filled rectangles, with the second one used to produce the stroke. Don't know if this still applies.
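Pedro's "static effect, use images instead" suggestion can be sketched with Node.snapshot: render the effected node to an image once, then display the image instead of re-running the effect every frame. The names below are illustrative, and snapshot must be called on the JavaFX Application Thread:

```java
import javafx.scene.Node;
import javafx.scene.SnapshotParameters;
import javafx.scene.image.ImageView;
import javafx.scene.image.WritableImage;
import javafx.scene.paint.Color;

// Sketch: "bake" a static effect into an image once, instead of paying
// for the effect on every frame.
public class BakeEffect {

    // Must be called on the JavaFX Application Thread.
    static ImageView bake(Node nodeWithEffect) {
        SnapshotParameters params = new SnapshotParameters();
        params.setFill(Color.TRANSPARENT);   // preserve transparency around the effect
        WritableImage baked = nodeWithEffect.snapshot(params, null);
        return new ImageView(baked);         // swap this in for the live node
    }
}
```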