Closed Bug 857817 Opened 11 years ago Closed 11 years ago

[NBV] Consumes all client resources when load is slightly increased

Categories

(Core :: Layout, defect)

defect
Not set
major

VERIFIED FIXED
mozilla23
Tracking Status
firefox21 --- verified
firefox22 --- verified
firefox23 --- verified

People

(Reporter: retornam, Assigned: mattwoodrow)

Attachments

(2 files)

STR:
Load http://66.135.48.43/index_load_test.html?event_interval=2000 in Firefox, Chrome or Opera
After a few minutes it either takes down the browser or leads to the browser's process taking up 100% of CPU when you check the Activity monitor on your Mac
Summary: [NBV] Favicon.ico is missing/404 → [NBV] Consumes all resources when load is increased
Cmore and I tested this together in Chrome again using a 200 millisecond interval and 10 tabs open to simulate ~10 users on his machine. My Chrome instance averaged 80% CPU usage and 14 threads.

For Firefox he couldn't open more than 3 tabs at 200 millisecond intervals. My Firefox instance averaged 102% CPU usage and 27 threads.
Aubrey: Are there specific webkit optimizations that you are using or is Chrome just rendering the animations more efficiently?
Assignee: nobody → misteranderson
If 3-10 people visiting this page makes a visitor's CPU peg at 100% and/or crash the browser, this is not ready to even share with 700+ employees at a MoCo meeting.
Summary: [NBV] Consumes all resources when load is increased → [NBV] Consumes all client resources when load is slightly increased
@cmore, we don't have any specific optimizations happening for webkit; that renderer is just taking better advantage of hardware acceleration when CSS asks for "translate3d()" style transforms. i have pointed out FF's handling of this on a couple other bug threads, but i'm not sure if it is on anyone's radar per se. FF looks to me like it's falling into some kind of GPU acceleration emulation, but i'm not basing that on anything other than observation.

IMPORTANTLY, there was an embarrassing bug present yesterday in the client-side rate-limiting (it was, ahem, commented out because i was testing something else and spaced re-enabling it). the run-away resource use should now be much curtailed. we still give the CPU a workout on FF, but not to the crash-tacular extent you saw.

please clear cache and give it another go.
Thanks for the update. We will re-run some tests and see the results.
From what I have read, Firefox is hardware rendering everything by default. I notice a Google Chrome Render process per tab, each taking up a percentage of the CPU on top of the main Chrome process.

http://stackoverflow.com/questions/9068132/why-arent-browsers-smart-enough-to-hardware-accelerate-without-tricks

I cleared cache and re-ran a load test by myself, and it doesn't seem much better in Firefox. I can try it again tomorrow with Raymond with the same test that we ran today.
I cleared my cache and tested again with Chris. Firefox was averaging 93% CPU and 33 threads. This reduced to 70% and 32 threads after Chris closed all his tabs.
Chrome seems to be decent and Firefox uses almost all of the CPU with only 2-3 visitors clicking at 200ms intervals. If we are going to launch this at a MoCo all hands, nearly everyone is going to be on Firefox.

What can we do to address the Firefox speed issues?
FWIW, client-side performance will get significantly better when the visual design is locked and assets / layout can be optimized. the version we are load testing on is not optimized in any way, as we had intended to just get a jump on load testing server-side while the visuals were still coming in.

the main stall as i understand it is that Sean Martell has been out with pneumonia and Barry is also out this week (perfect storm?).  i think we'll get unblocked on that front early next week.

i will optimize all i can in the interim though and focus on FF rendering optimization, specifically.
Can you just over-compress the images for now, even if they are not ideal and not the final ones? I want to get to the root of the Firefox issues since that is the target audience. If Firefox can play Unreal (https://blog.mozilla.org/blog/2013/03/27/mozilla-is-unlocking-the-power-of-the-web-as-a-platform-for-gaming/) with all of the animations in JS, you would think it could handle NBV.

Also, can you post a link to the github repo that was supposed to be shared by this past Wednesday?
Ahoy,

FWIW, visual design is now locked and so the first round of Gecko rendering optimizations are in the latest build at http://66.135.48.43

Clear cache and hit it, you'll notice that there is a pretty vast speed improvement when we remove the 3D positioning for the globe (I'm seeing about a 60% CPU improvement using translate() versus translate3d() for a complex layout)

The Unreal experiment is a great case in point, actually!  In that example, they are rendering into a WebGL canvas with JavaScript calling for redraws and it just screams.  This makes me think that there is not a bottleneck in talking to the GPU, and it's somewhere up the stack in the way the CSS instruction transform-style: preserve-3d is handled.

What we are trying to do in NBV is render html elements (DIVs in this case) with 3D positioning.  In Gecko when we position elements this way, using a combination of translate3d, rotateX, rotateY, and the all-important transform-style: preserve-3d, what we see is a massive munch on the CPU to render these elements initially, and again to animate them or perform complex masking in the renderer (as in the case of a DIV with overflow:hidden and border-radius: 100%, we're asking to just render the visible area in the now-circular DIV).

In WebKit, what we see when we position things this way is that the rendering for the elements is directly hardware accelerated and passed on to the GPU with very little additional CPU overhead generated.  It is essentially the same as drawing into a 3D canvas context with WebGL.

The render on Gecko is doing *something* in there which is causing our additional overhead.

I'll be happy to pluck out a couple simplified test cases if that would help!

The source is also up on GitHub, let me know if you didn't see that link.
I can't reproduce the actual problem, but there is a way you can help, and there is also a very obvious problem here.

First, can you please use Tools -> Profiler and attach a profile here so we can see what's happening (I am unable to reproduce; your steps are not very clear, to be honest).

Second, running the site I see the scroll region fluctuate wildly (look at the scroll bars, they keep moving around). As long as that's the case, the CPU has to constantly reflow the page and we can't rely on the GPU for acceleration. Please change the site such that the scroll region doesn't change (nothing moves off screen). That alone might fix the problem for you.
roc, dbaron, I really think we should reconsider our decision to force a reflow to update the scrollable region when an active layer moves off screen. I think we should simply clip and ignore the change to the scrollable region the way WebKit does it. I know it's not standards faithful, but the current behavior seems not really useful (bouncing scrollbars are useless), and people keep running into this performance issue. This is not the first time we have seen this. What do you think?
I have attached a simple, isolated test for you guys to check out. I have left you some comments in the CSS with things you can additionally test. Just comment the lines I refer to in index.html starting around line 71 and refresh your browser to see the various buggy layout behaviors. PrefixFree is included in this example for ease in browser switching.

Performance-wise, in FF, you'll see that just spinning the globe with nothing else going on takes up about 90% of your CPU and the frame rate is 8-12 FPS.  If you open this same code in Chrome, you'll see it uses very little CPU (about 15% for me) and the frame rate is very smooth, indicating we're probably getting the right flavor of hardware acceleration there.

This also demonstrates another odd bug with transform-origin that apple webkit has (e.g. Safari) where it does not plot the transform origin to the same place as FF and Chrome do, given the same instructions.

Hope this helps and let me know if you have questions!
Attachment #736139 - Attachment mime type: application/octet-stream → application/java-archive
(In reply to Andreas Gal :gal from comment #13)
> roc, dbaron, I really think we should reconsider our decision to force a
> reflow to update the scrollable region when an active layer moves off
> screen. I think we should simply clip and ignore the change to the
> scrollable region the way WebKit does it. I know it's not standards faithful,
> but the current behavior seems not really useful (bouncing scrollbars are
> useless), and people keep running into this performance issue. This is not
> the first time we see this. What do you think?

Agreed that bouncing scrollbars aren't useful, but not showing scrollbars when we should be can be bad in other situations. I think a good option would be to throttle the rate at which we process reflows triggered by UpdateOverflow so scrollbars will appear reasonably soon if not right away.
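One way to picture the throttling idea above is the sketch below. It is only an illustration: `ReflowThrottle`, `Request`, and `ShouldReflow` are hypothetical names, not Gecko classes or methods, and the interval is an arbitrary assumption. Overflow changes mark a request as pending, and a per-tick check lets at most one reflow through per interval, so scrollbars would still appear reasonably soon.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper, not part of Gecko: coalesces overflow-driven reflow
// requests so that at most one is processed per minimum interval.
class ReflowThrottle {
 public:
  using Clock = std::chrono::steady_clock;
  using TimePoint = Clock::time_point;

  explicit ReflowThrottle(std::chrono::milliseconds minInterval)
      : mMinInterval(minInterval) {
    assert(minInterval.count() >= 0);
  }

  // Record that the overflow area changed and a reflow is wanted.
  void Request() { mPending = true; }

  // Intended to be called once per refresh tick. Returns true when the
  // caller should run the reflow now; otherwise any request stays pending
  // and is picked up on a later tick.
  bool ShouldReflow(TimePoint now) {
    if (!mPending) {
      return false;
    }
    if (mHasLast && (now - mLast) < mMinInterval) {
      return false;  // too soon since the previous reflow
    }
    mPending = false;
    mHasLast = true;
    mLast = now;
    return true;
  }

 private:
  std::chrono::milliseconds mMinInterval;
  TimePoint mLast{};
  bool mHasLast = false;
  bool mPending = false;
};
```

Taking the current time as a parameter rather than reading the clock inside the class keeps the sketch easy to test; a real implementation would presumably hang off the refresh driver instead.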

In this case that's not the problem; at least, the testcase in comment #14 uses overflow:hidden so shouldn't be hitting any of that.

Matt, can you look at this? 3D transforms and preserve-3d goodness :-)
Assignee: misteranderson → matt.woodrow
Also I would suggest that WebGL is probably a better solution for spinny globes than CSS 3D transforms.

jar:https://bug857817.bugzilla.mozilla.org/attachment.cgi?id=736139!/gpu-test/index.html
Using 24 DIVs for polygons and masking to a circle is a clue that this is a bit of a hack :-)
roc, I'm super glad you bring this up!  There were a couple concerns when we were looking at WebGL rendering for this: 

The first is that a big part of this application's underlying message is, "You can do it, the Web is for everyone."  The technical barrier for Canvas + WebGL is high, even with a helper library supporting you.  The barrier to "do it declaratively" is much lower, even if you have to take some non-traditional steps to solve a given problem.  We wanted to have the "view-source" experience for this not to lead to a canvas tag and 2500 lines of JavaScript, but instead to show you markup and CSS properties you could tweak in real time in Firebug and think, "Wow, I didn't know markup could do this."  One person's hack might be another person's revelation, depending on what they are trying to accomplish with a given level of technical know-how!

The second is portability.  What are the common denominators for web runtimes on iOS, Android, and FFOS today?  If we went the WebGL route, iOS devices couldn't play or we'd have to build it a second time this way anyway, which again sends a bit of the wrong message, IMO.

From my relatively outside perspective, I look at a lightning-fast rendering pipeline from markup and CSS to (mobile) GPUs as a particularly killer feature for FFOS and FF alike!
Those are good points, but we should be careful about chaining ourselves to what iOS Safari can do. Apple is semi-intentionally crippling the Web on iOS Safari by not supporting WebGL and other new Web platform features.
preserve-3d means that multiple frames' transforms are combined into a single ContainerLayer. The existing AreLayersMarkedActive implementation only checked one of these frames, and missed the case where the bit was set on one of the others.

This was causing our scale clamping code to think the transform was static, so it never rounded the scale factors.

This resulted in the scale factors changing every frame, and triggered us to redraw all the images every frame.

I get steady-ish 60fps with a debug build with this patch.
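The difference between checking one frame and checking every frame that shares the layer can be sketched with a toy model. Everything here is an assumption made for illustration: `Frame`, `activeHints`, `preserves3D`, and both functions are simplified stand-ins, not the real Gecko types or the actual patch.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins; the names and values are illustrative only.
enum ChangeHint : std::uint32_t {
  kUpdateTransformLayer = 1u << 0,
  kUpdateOpacityLayer = 1u << 1,
};

struct Frame {
  const Frame* parent = nullptr;
  std::uint32_t activeHints = 0;  // hints recently marked active on this frame
  bool preserves3D = false;       // transform is combined into the parent's layer
};

// "Before" behaviour in this model: only the frame itself is consulted.
bool IsActiveSingleFrame(const Frame& frame, std::uint32_t hint) {
  return (frame.activeHints & hint) != 0;
}

// "After" behaviour in this model: for transform changes, keep walking up
// through the frames whose transforms are combined into the same layer.
bool IsActivePreserve3D(const Frame& frame, std::uint32_t hint) {
  assert(hint != 0);
  for (const Frame* cur = &frame; cur; cur = cur->parent) {
    if ((cur->activeHints & hint) != 0) {
      return true;
    }
    // Only transform changes on preserve-3d frames involve the parent.
    if (!(hint & kUpdateTransformLayer) || !cur->preserves3D) {
      break;
    }
  }
  return false;
}
```

In this model, a face of the globe whose parent is the one being animated reports inactive under the single-frame check and active under the chain-aware check, which is the distinction the scale clamping code depends on.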
Attachment #737327 - Flags: review?(roc)
Comment on attachment 737327 [details] [diff] [review]
Make AreLayersMarkedActive aware of preserve-3d

Review of attachment 737327 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/nsFrame.cpp
@@ +4585,5 @@
>      static_cast<LayerActivity*>(Properties().Get(LayerActivityProperty()));
> +  if (layerActivity && (layerActivity->mChangeHint & aChangeHint)) {
> +    return true;
> +  }
> +  if (aChangeHint & nsChangeHint_UpdateTransformLayer &&

parens around &
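For context on the review nit: in C++ the bitwise `&` binds more tightly than `&&`, so adding parentheses here is a readability change rather than a behaviour change. The sketch below checks that at compile time. The constant's value and both function names are made up for the example; the rest of the real condition is not shown in the excerpt above and is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative flag value only; not the real nsChangeHint constant.
constexpr std::uint32_t kUpdateTransformLayer = 1u << 3;

// Written without parentheses, as in the submitted patch line.
constexpr bool Unparenthesized(std::uint32_t hint, bool preserves3D) {
  return hint & kUpdateTransformLayer && preserves3D;
}

// Written with the parentheses the reviewer asked for.
constexpr bool Parenthesized(std::uint32_t hint, bool preserves3D) {
  return (hint & kUpdateTransformLayer) && preserves3D;
}

// Both spellings agree for every combination of inputs that matters here.
static_assert(Unparenthesized(kUpdateTransformLayer, true) ==
                  Parenthesized(kUpdateTransformLayer, true),
              "flag set, preserve-3d");
static_assert(Unparenthesized(kUpdateTransformLayer, false) ==
                  Parenthesized(kUpdateTransformLayer, false),
              "flag set, no preserve-3d");
static_assert(Unparenthesized(0, true) == Parenthesized(0, true),
              "flag clear, preserve-3d");
```

The parentheses still earn their keep: `&` binds less tightly than `==`, so a mask test mixed with a comparison does change meaning without them, and spelling the grouping out avoids having to remember which case applies.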
Attachment #737327 - Flags: review?(roc) → review+
https://hg.mozilla.org/mozilla-central/rev/8aeb4de5f470
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
@mattwoodrow kick ass!  i am so psyched about this fix!

will this be available in the nightlies this week?

a
Yes, it should be in tomorrow's nightly.

We can request approval to get it uplifted to aurora and beta as well if required.
(In reply to Matt Woodrow (:mattwoodrow) from comment #25)
> Yes, it should be in tomorrow's nightly.
> 
> We can request approval to get it uplifted to aurora and beta as well if
> required.

Sweet! I love when projects like this expose an edge-case that feeds back into improving the product. Nice job, everyone!
Component: Other → Layout
Product: Websites → Core
Comment on attachment 737327 [details] [diff] [review]
Make AreLayersMarkedActive aware of preserve-3d

[Approval Request Comment]
Bug caused by (feature/regressing bug #): none
User impact if declined: much-too-slow performance with some cases of CSS 3D transforms
Testing completed (on m-c, etc.): just landed
Risk to taking this patch (and alternatives if risky): very contained patch, low risk
String or IDL/UUID changes made by this patch: none
Attachment #737327 - Flags: approval-mozilla-beta?
Attachment #737327 - Flags: approval-mozilla-aurora?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #27)
> Risk to taking this patch (and alternatives if risky): very contained patch,
> low risk

If approved by triage tomorrow afternoon PT, this would go into our fourth beta of six. Is there anything QA should be regression testing (outside of verifying the test case above)?

How might a regression pop up its head? New stability issue, web regression when doing X, etc.
Attachment #737327 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Alex Keybl [:akeybl] from comment #28)
> If approved by triage tomorrow afternoon PT, this would go into our fourth
> beta of six. Is there anything QA should be regression testing (outside of
> verifying the test case above)?

Anything with 3D transforms basically.

> How might a regression pop up its head? New stability issue, web regression
> when doing X, etc.

Hard to predict. Performance issue or crash on some particular Web page(s) I guess ... hard to say since it seems so safe.
Comment on attachment 737327 [details] [diff] [review]
Make AreLayersMarkedActive aware of preserve-3d

The patch is low risk and looks to be self-contained, so it's OK to uplift.
We still have a couple of betas left and may do an immediate backout if we get any unusual reports or see any stability impact from this.

I will also request QA to help us test a few websites which involve 3D transforms, along with the URL in the description.
Attachment #737327 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Keywords: verifyme
Using the testcase attached to this bug:
* Firefox Nightly 23.0a1 2013-04-11 uses ~97% of my CPU
* Firefox Nightly 23.0a1 2013-04-18 uses ~53% of my CPU
Status: RESOLVED → VERIFIED
Target Milestone: --- → mozilla23
(In reply to bhavana bajaj [:bajaj] from comment #30)
> I will also request QA to help us test a few websites which involve 3D
> transforms, along with the URL in the description.

Please advise on other websites we can use to test once this lands in a Beta.
Here are several tests and examples which really push 3D CSS rendering. To get a performance baseline, compare and contrast rendering performance with some flavor of webkit.

http://daneden.me/animate 

http://www.keithclark.co.uk/labs/3dcss/demo/ and http://www.keithclark.co.uk/labs/css3-fps/

http://www.movikantirevo.com/#ladder

http://famo.us (you'll need to spoof the User Agent because it currently doesn't try to render in Gecko)

http://www.paulrhayes.com/experiments/cube-3d/touch.html

http://www.paulrhayes.com/experiments/sphere/
(In reply to misteranderson from comment #34)
> Here are several tests and examples which really push 3D CSS rendering. To
> get a performance baseline, compare and contrast rendering performance with
> some flavor of webkit.

Which flavor? Not 2.3 :-P. Not so much ICS. Isn't the gold standard iOS-latest? Or is Android 4.3 about at par finally?

/be
For devices, i still count iOS-latest as the gold standard for hardware accelerated markup and CSS rendering (exclusively in mobile safari, of course).  We see Chrome on Android rendering the same code about 1/2 the speed of mobile safari, plus the lazy touch interface which combined still make that a challenging runtime on which to make beautiful things.

Most of the links I reference above will give you a baseline on Desktop webkit: Chrome-latest being the best standard overall, with Safari on OSX still rendering the same markup and CSS about 15-30% faster because of their proprietary core animation hooks.
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0
Mozilla/5.0 (X11; Linux i686; rv:21.0) Gecko/20100101 Firefox/21.0

Using the testcase attached to this bug, I've encountered the following:
On Mac OSX 10.8.3, Firefox 21 beta 3 uses 101% of my CPU and it hangs. With Firefox beta 4 it uses 43%. On Chrome, Safari and Opera it uses maximum 11% of my CPU.
On Ubuntu 12.10 (32 bit), FF 21 beta 3 uses 176% and FF 21 beta 4 - 56%.
On Windows 7 (64 bit), FF beta 3 uses 39% and FF beta 4 - 0.5% of my CPU; Chrome, Safari and Opera use maximum 0.2%. 

Using the urls from comment 34, the CPU usage is much lower than the attached testcase on all platforms; I've encountered no hangs.

Based on these results, marking FF 21 as verified.
Does this mean this change will land on May 14th when Firefox 21 moves from beta to release channel? If so, Barry/Pete/Aubrey: you would technically be unblocked with proceeding with this project now that Firefox is happy.
(In reply to Chris More [:cmore] from comment #38)
> Does this mean this change will land on May 14th when Firefox 21 moves from
> beta to release channel? If so, Barry/Pete/Aubrey: you would technically be
> unblocked with proceeding with this project now that Firefox is happy.

The change is already landed in Firefox 21 so yes, it will be fixed in release when Firefox 21 is released.
That's fantastic news!  I'll put in a check to show the 3d globe on FF 21+
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0

Verified as fixed on FF 22 beta 2 (Build ID: 20130514181517). The performance results are similar to the ones from comment 37. No hangs; no crashes.
Also verified on:
Mozilla/5.0 (X11; Linux i686; rv:22.0) Gecko/20100101 Firefox/22.0
Mac OS X 10.8: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:22.0) Gecko/20100101 Firefox/22.0
Verified as fixed on FF 23 beta 2 using the links from comment 34 and the attached testcase. (Build ID: 20130701144430).

User agents:
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0
Mozilla/5.0 (X11; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0