Mew was updated to version 20110412 last week. These were the major changes:
- Dark Intent’s periodic damage increase is now 1/2/3%, per the 4.1 PTR patch notes.
- Interrupts are now reliable, per the 4.1 patch notes.
- Simulation is now the default model for all new profiles.
- The Simulator now allows casting Rebirth/Tranquility during the encounter.
- Furor and Predator’s Swiftness are now modeled by the Simulator.
- Spell Vulnerability/Haste/Crit Taken are now specifiable buffs/debuffs (mostly used for Tranquility/Rebirth; they also affect certain trinkets).
- New Simulator script calls getBerserkBaseDuration(), getEncounterDuration(), getElapsedTime(), and isAutoAttackEnabled()
- New Simulator script actions AUTOATTACK_START, AUTOATTACK_STOP.
- The Simulator now expects the script to manage the swing timer.
- Formulation no longer uses Ferocious Bite at all over 25% mob HP.
- New option that will cause Mew to model Glyph of Shred as +1/1/2 ticks.
- BUG: Changed the potion cooldown from 60 sec to 120 sec (no DPS impact for encounter durations worth modeling).
- BUG: Apply the ToTT multiplier to trinket sourced damage procs. (DMC:H is still sub-par compared to other options.)
- BUG: Stampede buff duration is now always 8 sec regardless of the number of points in the talent.
- Numerous other internal changes, please see the SVN commit logs if you are curious.
A couple more 4.1 changes have not made it into this release yet but are fixed in SVN: the Dark Intent bonus is reduced to 3%, and Berserk is now off the GCD. At the moment, Yawning is looking at updating the Pivot UI to allow the use of the Mew Bear Simulations, while I’m working on improving the backend and writing a Chardev importer.
A few weeks ago, I reported that Glyph of Berserk is now better than Glyph of Tiger’s Fury. We’ve since discovered that we could push a little more DPS by expending energy instead of pooling energy while Tiger’s Fury is up. This increases the value of Glyph of Tiger’s Fury to the point where it gives slightly more DPS than Glyph of Berserk, but the result depends heavily on encounter duration. Some encounter durations favor Glyph of Berserk, but overall, Glyph of Tiger’s Fury is a slight win. However, I would continue to recommend Glyph of Berserk over Glyph of Tiger’s Fury because of the extra utility it gives in encounters that feature burn phases, as the majority of encounters this tier do.
Recently, I’ve been seeing a lot of questions about the Relative Stat Values (RSVs) produced by Mew. People have been getting some odd RSV results, and I’ve found that in just about every case the cause is running Mew Simulation with too few iterations. The default 10k iterations is sufficient to test whether one strategy is better than another, or whether changing a piece of gear improves performance. However, for generating RSVs, I would recommend using 1 million iterations.
When Mew Simulation reports DPS and RSV results, it also gives the corresponding 95% confidence limits. For example, if Mew reports “DPS: 22000.00 +/- 10.0”, it means we are 95% confident that the true mean lies between 21990 and 22010. Running more iterations reduces the error and gives a more accurate mean. With 10k iterations, we are looking at RSVs with values like “Hit Rating: 0.9500 +/- 0.15”, and the error is so large that comparing against something like “Crit Rating: 1.05 +/- 0.15” is useless. For Crit Rating to be better than Hit Rating with statistical significance (95% confidence), the RSV measured for Crit Rating needs to be 0.15 * SQRT(2) = 0.212 more than that of Hit Rating. Increasing the number of iterations 100 times, to 1 million, reduces the error 10 times, to ~0.015. That requires a difference in RSV of only 0.0212, which is far more acceptable.
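The arithmetic above can be sketched in a few lines of Python. This is a minimal sketch that assumes the reported 95% half-width shrinks with the square root of the iteration count, and that two stats’ RSVs are independent estimates with equal error. The function names are illustrative only and are not part of Mew.

```python
import math


def scaled_half_width(half_width, old_iterations, new_iterations):
    """Rescale a 95% confidence half-width to a different iteration count.

    The standard error of a mean falls as 1/sqrt(n), so 100x the
    iterations gives 10x less error.
    """
    return half_width * math.sqrt(old_iterations / new_iterations)


def required_difference(half_width):
    """Smallest gap between two independent estimates (each with the same
    95% half-width) that is significant at the 95% level."""
    return half_width * math.sqrt(2)


if __name__ == "__main__":
    err_10k = 0.15
    err_1m = scaled_half_width(err_10k, 10_000, 1_000_000)
    print(f"10k iterations: +/- {err_10k:.3f}, need a gap of {required_difference(err_10k):.4f}")
    print(f"1M iterations:  +/- {err_1m:.3f}, need a gap of {required_difference(err_1m):.4f}")
```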
We know that Mastery is the best secondary stat, so it will always be the priority. Haste and Critical Strike Rating are roughly joint second-best, so with the help of Kurenin I’ve prepared three different E359 BiS profiles: Haste Oriented, Crit Oriented, and Balanced. These are the results:
| | Haste Oriented | Crit Oriented | Balanced | Error |
|---|---|---|---|---|
| DPS | 23491.0 | 23562.6 | 23528.8 | +/- 1.2 |
| Agility | 3.023 | 3.021 | 3.021 | +/- 0.016 |
| Strength | 2.266 | 2.280 | 2.274 | +/- 0.016 |
| Mastery Rating | 1.140 | 1.178 | 1.162 | +/- 0.016 |
| Crit Rating | 1.049 | 1.020 | 1.035 | +/- 0.016 |
| Haste Rating | 0.944 | 1.094 | 1.160 | +/- 0.016 |
| Hit Rating | 0.979 | 0.988 | 0.981 | +/- 0.016 |
| Expertise Rating | 0.971 | 0.978 | 0.973 | +/- 0.016 |
The error of +/- 0.016 means we require a difference of 0.016 * SQRT(2) ≈ 0.023 to confirm (with 95% confidence) that one stat is statistically better than another.
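As an illustration, the 0.023 threshold can be applied mechanically to the secondary ratings in the table. This sketch again assumes independent RSV estimates with equal error. It only restates the table’s numbers and is not how Mew itself reports significance.

```python
import itertools
import math

HALF_WIDTH = 0.016                      # 95% half-width reported for every RSV in the table
THRESHOLD = HALF_WIDTH * math.sqrt(2)   # ~0.023, assuming independent estimates

# Secondary-rating RSVs copied from the table (Agility/Strength omitted).
RSV = {
    "Haste Oriented": {"Mastery": 1.140, "Crit": 1.049, "Haste": 0.944, "Hit": 0.979, "Expertise": 0.971},
    "Crit Oriented": {"Mastery": 1.178, "Crit": 1.020, "Haste": 1.094, "Hit": 0.988, "Expertise": 0.978},
    "Balanced": {"Mastery": 1.162, "Crit": 1.035, "Haste": 1.160, "Hit": 0.981, "Expertise": 0.973},
}


def compare(profile, a, b):
    """Return '>' or '<' if stat a is significantly better or worse than b, else '~'."""
    diff = RSV[profile][a] - RSV[profile][b]
    if diff > THRESHOLD:
        return ">"
    if diff < -THRESHOLD:
        return "<"
    return "~"


if __name__ == "__main__":
    for profile, stats in RSV.items():
        for a, b in itertools.combinations(stats, 2):
            print(f"{profile:15} {a:9} {compare(profile, a, b)} {b}")
```

For example, Haste vs Mastery in the Balanced profile comes out as `~` (a 0.002 gap), while Haste vs Crit in the Crit-Oriented profile comes out as `>`.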
The results of the Haste-Oriented Profile are not unexpected: over-pushing Haste Rating causes its value to fall even below those of Hit Rating and Expertise Rating. In the Crit-Oriented Profile, Haste Rating’s value increases and is now better than Crit Rating, while still remaining below Mastery Rating.
It is the Balanced Profile that gives surprising results. I would have expected the Balanced Profile to provide the highest DPS, with Haste Rating and Crit Rating having similar values. Instead, Haste Rating’s value has been pushed up to the point of being statistically indistinguishable from Mastery Rating, while the overall DPS has dropped below that of the Crit-Oriented Profile.
This is all rather strange, and I have not yet worked out the reason for it. It would probably take a few more profiles with varying levels of Crit/Haste balance to get a clearer picture, but that will take a lot of work. Furthermore, I’ve just realized that the profiles used Glyph of Tiger’s Fury instead of Glyph of Berserk, which may affect the results. I’m also using a newer (unreleased) version of Mew that includes the 4.1 Berserk changes. I’m looking into testing E372 profiles as well.
So much work, so little time….


Seems you and Leafkiller are on the same page. :)
As I commented on his theorycraft post as well, my (purely anecdotal) observation is that GoB is better than GoTF in real-world situations, both for the burn-phase utility you mention above and because it is difficult to fully exploit the reduced cooldown of TF. I’ve looked through a few logs on WoL: one would expect cats with GoB to have higher Berserk uptimes (they do), and one would expect cats without GoB to have higher TF uptimes (they don’t).
I am now officially glad that I took statistics and understand what you are talking about with the confidence intervals.
I’m not sure I understand your problem with crit/haste. Even if the interaction of haste/crit is non-trivial due to the split between white/yellow attacks (which don’t work in the same way), you should not forget that 0 haste rating => 0% haste, while 0 crit rating != 0% crit. What may seem balanced to you does not mean it’s balanced in the numbers.
Haste and crit stack multiplicatively: in a world of white attacks, optimizing DPS means finding the rectangle with the largest area for a given perimeter (which is the square, i.e. crit = haste). Add in the meta gem (which favors crit => more “area” per unit) and the combo point generation (faster/better DoTs and uptimes), and this will shift the calculation in favor of crit, which is what your result is showing, since the break point is closer to 9%/29%. I also suppose that you run simulations with full raid buffs, which means +5% crit, +10% haste, and +30% partial haste (heroism), so calculating the “area” just from the numbers will not be easy and is best left to Mew :)
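The “equal rating is not equal value” point can be illustrated with a deliberately crude toy model. This is not Mew’s model. It assumes DPS scales as (1 + haste)(1 + crit) and that one rating point buys the same percentage of either stat. It also ignores the white/yellow split, combo points, the meta gem, and raid buffs, which are the factors the comment says favor crit. Even so, once there is any base crit, an even split of rating is no longer optimal.

```python
def toy_dps(haste_pct, crit_pct, base_crit_pct=0.0):
    """Relative DPS under a purely multiplicative toy model (hypothetical)."""
    return (1 + haste_pct / 100) * (1 + (base_crit_pct + crit_pct) / 100)


def best_split(budget_pct, base_crit_pct=0.0):
    """Brute-force the whole-percent split of a fixed stat budget that maximizes toy_dps."""
    haste = max(range(budget_pct + 1),
                key=lambda h: toy_dps(h, budget_pct - h, base_crit_pct))
    return haste, budget_pct - haste


if __name__ == "__main__":
    print(best_split(40))                    # -> (20, 20): even split with no base crit
    print(best_split(40, base_crit_pct=10))  # -> (25, 15): the two multipliers end up equal
```

In this toy model the optimum is always where the two multipliers are equal (the “square”), not where the ratings are equal.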
BTW you should be careful with your 1 million iterations: while it will increase the stability of the average, the HWHM of the DPS distribution will not improve, which means that you could be optimizing beyond the noise threshold… not exactly useful….
If you’ve read my post, you’ll see I already mentioned that 10k iterations is generally sufficient for DPS comparisons. 1M is pretty much the minimum for RSV comparisons. Not sure what you are getting at.
What I mean by my final paragraph is that when you need to resort to the average over 1M iterations to see a difference, the game you’re playing is optimizing the average DPS in Mew, not the DPS in WoW. In-game you don’t replay the combat 1M times and take the average, so when Mew reports a 600 standard deviation in the DPS distribution, I have some trouble believing that the 40 (= sigma/15) DPS difference you see between “balanced” and “crit” is of any practical use….
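The distinction in this exchange is between the error of the *mean* and the spread of *individual fights*, and a short sketch makes it concrete. It takes the σ = 600 quoted in the comment and assumes fights are independent and normally distributed with equal spread for both profiles. The helper names are illustrative only.

```python
import math
import random
import statistics

SIGMA = 600.0  # per-fight DPS standard deviation quoted above (assumed normal)


def ci_half_width(sigma, n, z=1.96):
    """95% half-width of the mean DPS after n simulated fights; shrinks as 1/sqrt(n)."""
    return z * sigma / math.sqrt(n)


def single_fight_win_rate(mean_gap, sigma):
    """Chance the better profile out-DPSes the worse one in a single fight,
    assuming independent normal fights with equal spread."""
    return statistics.NormalDist().cdf(mean_gap / (sigma * math.sqrt(2)))


if __name__ == "__main__":
    for n in (10_000, 1_000_000):
        print(f"n={n:>9,}: mean known to +/- {ci_half_width(SIGMA, n):.2f} DPS")
    rng = random.Random(1)
    fights = [rng.gauss(23500, SIGMA) for _ in range(10_000)]
    # The spread of individual fights does not shrink with more iterations.
    print(f"spread of individual fights: {statistics.stdev(fights):.0f} DPS")
    print(f"a 40 DPS edge wins {single_fight_win_rate(40, SIGMA):.1%} of single fights")
```

Under these assumptions, the 1M-iteration half-width is about 1.2 DPS, the same order as the DPS error in the table. A 40 DPS edge, however, wins only slightly more than half of individual fights. Both observations can be true at once. The tight mean is what makes RSV comparisons meaningful, while the per-fight spread is what the commenter is pointing at.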
There’s a reason why the article is titled “Mew and RSVs”, not “Mew and DPS averages”.
The “balanced” spec is a bit misleading. It’s clearly not balanced in value between haste and crit, so it’s not really surprising that the DPS is lower.
If you have the time to run many 1M-iteration sims to find your definition of the “balance” point, please do so and post the results. Otherwise, the balance point defined here is simply somewhere approximately halfway between the haste-oriented and crit-oriented profiles.
This is a very interesting post, Tangedyn. Thank you for sharing.
We are clearly seeing abnormal behaviour in Mew these days. I have experienced something similar. I simulated someone in near-BiS gear who had more than twice as much crit rating as haste rating, yet crit appeared to have a higher RSV than haste. This was with the version from 15 February, though.
‘We’ve since discovered that we could push a little more DPS by expending energy instead of pooling energy while Tiger’s Fury is up.’
I thought this was always what we were supposed to do? I have always depleted my energy bar during TFs. Hmm…
‘In order for Crit Rating is better than Hit Rating with statistical significance (95% confidence), the RSV measured for Crit Rating needs to be 0.15 * SQRT(2) = 0.212 more than that of Hit Rating.’
Why exactly do you multiply the received error with the square root of 2?
Is the balanced one balanced in terms of rating? I think it is, but I expected it to be balanced in terms of RSV. As you already know, balanced in terms of rating is not the same as balanced in terms of RSV. If crit and haste were balanced in terms of RSV it should yield a higher DPS than the crit oriented one. It’s still very interesting how frighteningly close haste comes to mastery – definitely not what one would expect to ever happen.
Yeah, we know we should do that. It was an oversight in the older default strategy script.
http://en.wikipedia.org/wiki/Student%27s_t-test
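The linked article covers the general case. For the specific √2 in the post, here is a short derivation. It assumes the two RSVs are independent estimates with the same standard error $\sigma_{\bar{x}}$, and that with this many iterations the t distribution is effectively normal, so $z \approx 1.96$. If the two estimates were correlated (for example, computed against a shared baseline run), the factor would differ.

$$
\begin{aligned}
\operatorname{SE}\left(\hat{R}_{\text{crit}} - \hat{R}_{\text{hit}}\right) &= \sqrt{\sigma_{\bar{x}}^{2} + \sigma_{\bar{x}}^{2}} = \sqrt{2}\,\sigma_{\bar{x}} \\
E &= 1.96\,\sigma_{\bar{x}} \quad \text{(the reported +/- error)} \\
\text{required gap} &= 1.96\sqrt{2}\,\sigma_{\bar{x}} = \sqrt{2}\,E = \sqrt{2}\times 0.15 \approx 0.212
\end{aligned}
$$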
It’s balanced in terms of rating, simply because at the moment it’s too much time and work for me to try to find the RSV balance point on my own.
From the results gathered, we know that it’s not a simple relationship where more crit leads to lower value of crit and higher value of haste.
Thank you for the reply.
If you want, I would be glad to help out. I could run 1M-iteration sims until the RSVs for crit and haste are balanced, and hand the results to you.
If I were to do that, however, I would need to know stats, talents, buffs, model parameters etc. and a place to hand you the results, like a PM through MMO-C, for instance.
Thanks for the offer. I’m working with Yawning on getting an official feedback channel for Mew up, probably a google group. That would probably be the most conducive place for collaboration with other testers.
I’ll get back to you as soon as possible. I’ll try to get another release of Mew out soon so we have something common to work on. If you like, though, you could use the profiles I linked above, but with Glyph of Berserk instead of TF. For 1 million iterations, I generally turn off “High Frequency simulation”.
An official feedback channel for Mew sounds great. It’s a very good idea.
If you are going to release Mew very soon I think it’d be smarter to wait for it. But then again, it depends on how long ‘soon’ is.
Probably a day before 4.1 is released
Is the difference between formulation and simulation so far off that you can’t use the formulation to find the balance point, then run the simulation once?
Thank you once again for putting in the extra effort to keep us all at the top of our game!
The values from Formulation and Simulation are fairly close, but not close enough for the purpose of finding the balance point. There are a lot of small things that the Formulation doesn’t take into account, like energy capping, energy pooling, DoT clipping, and more, all of which are very hard to Formulate accurately.