Posts posted by Crimean Archivist

  1. I actually thought that Conquest 15 was quite interesting. It didn't really make things all that more difficult, but it did make me think things through a little more.

    The shared HP pool for each character made me think a little more about each turn.

    This is actually my favorite gimmick so far. I'm not a fan of Conquest 24. The saving grace of that chapter is that Setsuna doesn't have a proc.

    I agree that many map gimmicks just extend the turn count without really changing how you go about the map. I didn't struggle much with the foxes; I just wanted them to get out of the way. One thing that helped was noticing that each unit only remains an illusion for one turn at a time (at least while they're within range), so that sped things up a bit.

    Most gimmicks only got me once, or I played with them for a bit to figure them out and then reset so I could exploit my new knowledge. It took me a while to learn the wind and it's still throwing me off a bit in Revelation 9, but I'm getting there. The overwhelming majority of the gimmicks don't even matter much if you remember the fundamentals.

  2. Nearly 13000 data points. 83 has fallen outside of the confidence interval for the (3A+B)/4 model (measured value 96.2%, sample size 186), although just barely. However, it looks like it may be another fluke, as the points on either side of it, 82 and 84, which are both higher-confidence values, have lower averages and fit more cleanly.

    It might be dynamic, but from my understanding of a dynamic model, it doesn't change the expected value, only the standard deviation. I'm not sure why there's so much flux between certain points, and I'm not certain how to take this F(A, B, z) = P and make it into something useful. This has only become more complicated as it's continued and it's maddening.

    I'm pretty certain that with 13000 points we're far enough along to say something significant about the system, if only we knew where to look. I'm sharing my current data set with everyone here in hopes that someone will notice something useful.

    Edit: I have something. All weighted models have a linear region where the slope is (A+B)/A from point to point. Using Excel's SLOPE function on a large enough number of points with high confidence in the linear region should produce a line of best fit for that region. I need a couple more high-confidence values between 50 and 70 (we'll ignore the low region for now) and we should be good to go.
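
    Quick sketch of where that slope comes from, assuming the check is "hit if (A*RN1 + B*RN2)/(A+B) < displayed Hit" with both RNs uniform and z = Hit/100. In the middle region the cutoff line exits through the top of the unit square, and the hit probability is just the area underneath it:

    ```latex
    P(z) = \int_{0}^{\frac{(A+B)z - B}{A}} 1 \, dx
         + \int_{\frac{(A+B)z - B}{A}}^{\frac{(A+B)z}{A}} \frac{(A+B)z - Ax}{B} \, dx
         = \frac{A+B}{A}\, z - \frac{B}{2A},
    \qquad \frac{B}{A+B} \le z \le \frac{A}{A+B} \quad (A \ge B)
    ```

    So the slope in that window is (A+B)/A, and the endpoints B/(A+B) and A/(A+B) are where the linear behavior starts and stops -- 25 to 75 for (3A+B)/4, for example.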

    Edit 2: Here are the linear regions for different models:

    (3A+B)/4 -- 25 to 75

    (2A+B)/3 -- 33 to 66

    (4A+B)/5 -- 20 to 80

    I think (3A+2B)/5 is effectively out at this point and (A+B)/2 doesn't have a linear region.

    On the values from 55 to 75 (varying confidences, many points with sample size <100), SLOPE returns a line of P = 1.2899*Z, very close to (3A+B)/4's slope of 1.3333. That's promising. Will keep testing, but I've got a good feeling about that.
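
    If anyone wants to redo that fit outside of Excel, here's a minimal NumPy equivalent of SLOPE. The (hit, measured %) pairs below are placeholders, not our aggregates -- substitute the real high-confidence values from the spreadsheet.

    ```python
    import numpy as np

    # Placeholder (displayed hit, measured hit %) pairs from the 55-75 linear region.
    # Replace these with the actual high-confidence aggregates before reading anything into it.
    points = [(55, 58.0), (60, 65.0), (63, 70.0), (66, 72.0), (70, 76.0), (75, 82.5)]

    hit = np.array([p[0] for p in points], dtype=float)
    measured = np.array([p[1] for p in points], dtype=float)

    # Degree-1 least-squares fit; the first coefficient is what Excel's SLOPE returns.
    slope, intercept = np.polyfit(hit, measured, 1)
    print(f"fitted slope = {slope:.4f}")  # compare against (A+B)/A, e.g. 4/3 = 1.3333 for (3A+B)/4
    ```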

    Hit Value Hits Misses
    1 0 5
    2 0 7
    3 0 5
    4 1 2
    5 1 4
    6 1 6
    7 0 4
    8 2 8
    9 16 188
    10 2 6
    11 0 4
    12 1 1
    13 0 6
    14 4 8
    15 2 11
    16 5 15
    17 18 89
    18 0 8
    19 1 7
    20 4 10
    21 5 12
    22 25 88
    23 40 143
    24 3 9
    25 5 9
    26 2 13
    27 152 361
    28 2 19
    29 8 11
    30 6 14
    31 28 87
    32 10 20
    33 11 15
    34 2 12
    35 7 18
    36 110 212
    37 8 20
    38 17 15
    39 11 12
    40 11 18
    41 93 120
    42 14 23
    43 20 25
    44 17 19
    45 33 33
    46 53 75
    47 23 29
    48 30 43
    49 136 127
    50 37 31
    51 26 41
    52 47 29
    53 39 35
    54 31 28
    55 106 77
    56 178 128
    57 59 38
    58 44 23
    59 74 38
    60 58 19
    61 81 46
    62 69 27
    63 208 88
    64 67 23
    65 91 39
    66 507 195
    67 257 61
    68 68 14
    69 82 28
    70 133 42
    71 104 25
    72 102 17
    73 101 20
    74 218 40
    75 350 74
    76 199 36
    77 144 28
    78 208 40
    79 175 15
    80 136 21
    81 174 11
    82 413 42
    83 179 7
    84 213 14
    85 237 11
    86 181 6
    87 226 13
    88 179 6
    89 237 9
    90 163 9
    91 195 9
    92 163 1
    93 211 4
    94 464 4
    95 244 4
    96 175 5
    97 221 4
    98 225 1
    99 217 1

  3. How would we even go about testing for a dynamic system?

    We'd take the data we already have and start checking pairs. We'd expect, say, 67 * 1.1 percent of the trials immediately following a miss at 67 to be hits. Apply that to many hit chances and look for some kind of pattern.
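
    Here's a rough sketch of that check. I'm simplifying "the trial following a miss" to "the next trial at the same displayed value", and the data layout (an ordered list of (displayed hit, outcome) pairs from one session) is assumed, not what anyone's spreadsheet actually looks like.

    ```python
    from collections import defaultdict

    def conditional_hit_rates(trials):
        """trials: chronologically ordered (displayed_hit, hit_bool) pairs from a
        single uninterrupted session (no resets).
        Returns {displayed_hit: (rate_after_miss, rate_after_hit, overall_rate)}."""
        after_miss = defaultdict(lambda: [0, 0])   # [hits, total]
        after_hit = defaultdict(lambda: [0, 0])
        overall = defaultdict(lambda: [0, 0])
        last_outcome = {}                          # last result seen at each displayed value

        for disp, hit in trials:
            overall[disp][0] += hit
            overall[disp][1] += 1
            if disp in last_outcome:
                bucket = after_hit if last_outcome[disp] else after_miss
                bucket[disp][0] += hit
                bucket[disp][1] += 1
            last_outcome[disp] = hit

        def rate(bucket, d):
            h, n = bucket[d]
            return h / n if n else None

        return {d: (rate(after_miss, d), rate(after_hit, d), rate(overall, d))
                for d in overall}
    ```

    Under a dynamic system the after-miss rate should sit noticeably above the after-hit rate at the same displayed value; under a memoryless system both should match the overall rate within noise.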

    The damning factor would be whether resets affect the dynamic adjustment cache. If resetting the game resets the cache, then a lot of our currently collected data is worthless for testing dynamic hit rates. We'd have to go into My Castle battles and just continually farm until we gathered a substantial amount of data, then check those using the method above. It'd be a task unto itself.

    Progress on my end is going to be slow over the next few days; I have a lot of work due. I'll make an update if any significant advances occur.

  4. I tried to make a generalized integral in the vein of what Dark Holy Elf did for the function Ax + By = (A+B)z (where x is RN1, y is RN2, A and B are their coefficients, and z is the nominal probability), but all of my attempts have resulted in unsolvable equations. Basically, I expected setting the result of the integral equal to P (the measured probability) to produce some kind of concrete relationship between A and B when applied to different hit values and measured results. I'm pretty certain it's possible, but there's probably something wrong in my methodology here.

    I tried:

    From x = 0 to (A+B)*z / A; y = 0 to [(A+B)*z - A*x] / B,

    int( A*x + B*y dy dx) = P

    The result I got (checked with WolframAlpha) is z^3(A+B)^3 / (3*A*B) = P. Unfortunately this isn't even remotely close to a viable solution, as (A+B)^3 / (3*A*B) would be a constant, since A and B are constants, making P a linear function of z^3. It's very possible my bounds are wrong; it's been a while since I've done this kind of integral.

    If anyone comes up with an integral that works, let me know. It would allow me to make a single model that fits to our best data and not worry about checking against multiple different models. Then I would be able to extract A and B dynamically as we continue forward, which should hopefully expedite things.
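
    One possible fix: if x and y are uniform on [0, 1], then P should just be the area of the region where Ax + By < (A+B)z -- that is, the integrand should be 1 (the joint density), not Ax + By itself. Keeping the same bounds and swapping the integrand gives, for the small-z case where the region is a triangle:

    ```latex
    P(z) = \int_{0}^{\frac{(A+B)z}{A}} \int_{0}^{\frac{(A+B)z - Ax}{B}} 1 \, dy \, dx
         = \frac{(A+B)^2}{2AB}\, z^2 ,
    \qquad 0 \le z \le \frac{\min(A,B)}{A+B}
    ```

    With A = B this collapses to the familiar 2z^2 of a raw 2-RN system below 50, and the coefficient (A+B)^2 / (2AB) differs between models (2 for raw, 8/3 for (3A+B)/4), so fitting the low-value data against this form should discriminate between candidate A:B ratios.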

  5. edit: I don't actually know how many of those >50% in post #59 you tested in My Castle, BlenD. I read that post as saying you tested some points <50% and that the other data was from before. That might already be enough data to kill this dynamic idea, depending on how it was collected.

    The other night I tested values at 71, 66, 56, 82, 41, and 49. As far as I know, all of the low values we have large samples for were collected in My Castle. Everything else is fairly mixed and includes both My Castle and chapter data.

    Thanks to Verile's latest data, 66 is now the point with the largest sample size, at 680 points. The aggregate hit rate is 72.36%. The raw 2-RN expected value (77.22%) is ~0.5% outside the 99% confidence interval, while (3A+B)/4 is nearly in agreement, with an expected value of 71.83%.
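
    For anyone who wants to sanity-check those interval claims, here's a minimal normal-approximation version (z = 2.57 for 99%, the same constant behind the 6.6049 that shows up elsewhere in the thread). The 492/680 split is only an approximation of the 72.36% aggregate, not the exact spreadsheet count.

    ```python
    import math

    def ci_99(hits, total, z=2.57):
        """Normal-approximation 99% confidence interval for a hit proportion."""
        p = hits / total
        half_width = z * math.sqrt(p * (1 - p) / total)
        return p - half_width, p + half_width

    lo, hi = ci_99(492, 680)  # ~72.35% measured at displayed 66
    print(f"99% CI: {lo:.4f} to {hi:.4f}")  # raw 2-RN's 0.7722 lands just above the upper bound
    ```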

  6. Verile's results at 27 places both (3A+B)/4 and raw 2-RN outside the confidence interval for 27, providing more evidence for a 1-RN system at low values.

    Looking at the math, I don't think a dynamic system explains our anomalies. If we had a dynamic system, then what we would effectively have is a chance P to move a fixed point of reference up or down to either P+V in the case of a miss or P-V in the case of a hit. The probability would then oscillate up and down around that point. At high values, the expectation would be to undershoot the given value, as every time the probability returned to its "normal" value, it would be more likely than not to hit, adjusting down to P-V and reducing the likelihood of success the next trial. So it would be more likely to oscillate down than up. Apply that to all values, and a dynamic system would be middle-shifted, if that makes sense.

    A dynamic system could explain why our measured values undercut raw 2-RN values, but modelling it would be a pain, so hopefully we won't have to.

  7. How did we find dynamic growths in Shadow Dragon anyway? I've looked for the credited author on the main SF page (Nitrodon) and that person has been inactive since 2011. If there's something that needs to be tracked in order to be sure, I want to be keeping tabs on it.

    Edit: Thinking it over, I don't think a dynamic system is likely. Unlike growth rates, which can be stored in an array and modified easily, the function that calls the RN never "knows" what it's getting. It would have to modify RNs as they came in, because a 99-value array would be a big waste of space. That would mean that a result at 41 could affect the result at 99 after enough trials, and that would spell all kinds of disaster.

  8. I used a meshgrid and added the above bolded models and their errors to the spreadsheet. Haven't recorded any new values since last night. Only thing to report really is that if the raw 2-RN is out for the upper range, so is (3A+2B)/5.

  9. I was going to nix a lot of the tables I have set up in Excel and replace them with the values for each model so I can compare them side-by-side in real-time, so I'll take care of it.

    I am not prepared to deal with dynamic hit rates so let's pray that's not the case.

  10. That's what it looks like. It's about time to start playing with different combinations of (M*A + N*B)/(M+N) just to make sure we're looking at the right number combination, although I doubt either M or N is greater than 5 (really, I doubt anything greater than 3, but quality assurance is good). I'm going to calculate out a mesh with (2A+B)/3, (4A+B)/5, (5A+B)/6, (3A+2B)/5, (4A+2B)/6, (5A+2B)/7, (4A+3B)/7, (5A+4B)/9, starting with the bolded values and (3A+B)/4.
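
    For building that mesh, a brute-force count over RN pairs is hard to get wrong. This assumes the check is "weighted average of two 0-99 RNs strictly less than displayed Hit", which matches the (3A+B)/4 figures quoted elsewhere in the thread (71.83 at 66, 91.72 at 82).

    ```python
    def true_hit(displayed, m, n):
        """Exact hit chance (in %) for a (m*RN1 + n*RN2)/(m+n) < displayed check,
        with RN1 and RN2 independent and uniform on 0..99."""
        threshold = (m + n) * displayed
        passes = sum(1 for r1 in range(100) for r2 in range(100)
                     if m * r1 + n * r2 < threshold)
        return passes / 100.0  # out of 10,000 pairs -> percent

    # The candidate weightings above, plus raw 2-RN and (3A+B)/4 for reference.
    models = [(1, 1), (3, 1), (2, 1), (4, 1), (5, 1), (3, 2), (4, 2), (5, 2), (4, 3), (5, 4)]
    for m, n in models:
        row = [true_hit(h, m, n) for h in (25, 50, 66, 75, 82)]
        print(f"({m}A+{n}B)/{m+n}:", row)
    ```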

  11. ~10000 data point update:

    I tested several different hit rates in Check Defenses. We now have quite a few hit values below 50 with ~200 data points or more each.

    9 -- 200 points, 7.5% hits

    23 -- 179 points, 21.23% hits

    36 -- 312 points, 33.65% hits

    41 -- 189 points, 42.86% hits

    49 -- 249 points, 51.41% hits

    And above 50:

    56 -- 282 points, 57.09% hits

    63 -- 270 points, 71.48% hits

    66 -- 171 points, 74.27% hits

    67 -- 284 points, 83.45% hits (anomaly?)

    74 -- 240 points, 84.58% hits

    75 -- 395 points, 82.28% hits

    76 -- 217 points, 84.79% hits

    82 -- 420 points, 90.71% hits

    84 -- 190 points, 92.63% hits

    85 -- 213 points, 96.23% hits

    87 -- 200 points, 94% hits

    89 -- 202 points, 96.53% hits

    94 -- 430 points, 99.07% hits

    95 -- 183 points, 97.81% hits

    This is where some educated guessing comes in.

    Raw 2-RN almost universally overestimates measured hit rates. Ideally, if the system were a raw 2-RN one, we would expect a near-equal number of values above the line as below it (taking into account that misses disproportionately affect the upper edge case). Taking that a step further, the average error on high-confidence values is -1.01872 for the (3A+B)/4 model as opposed to 1.679617 for the Split model. When only the magnitude of the errors is considered, the (3A+B)/4 model rests at 2.38098 for high-confidence values while the Split model is at 2.69778. Here's the graph, for reference:

    [Attached graph: post-21742-0-23629500-1457593332_thumb.png]

    I don't think our current data is conclusive for (3A+B)/4, but I do think it is conclusive to say that the system has changed. Also, the values for (3A+B)/4 and raw 2-RN both fall outside the 99% confidence interval at Hit = 9, so that's pretty substantial additional evidence for 1-RN below 50.

  12. Honestly it looks like the best way is to copy-paste the values since you can't upload spreadsheets here. To ensure that the formatting stays okay, I'd recommend exporting the data to a .csv or a .txt file first. In Excel: File > Export > Change File Type.

  13. I have a Conquest save where I can test out 19% hits consistently. Of the five or so times I did it, all of them missed.

    If you want more data points, let me know how many.

    19 would be a good value to farm at, since there's a 10% difference between 1-RN and 2-RN there, and it will help ascertain one way or another whether we have a split system.

    For testing raw vs weighted 2-RN, 66 is the absolute best spot, as there's a 5% difference between the two models. It'll take about 300 points at some value between 63 and 70 to make any kind of conclusion. I'd suggest 67 because it's the one with the most points so far, but we're acknowledging that our value for 67 is anomalously high at present and might confound results. Testing another nearby value may be enough to ascertain whether or not to discard that outcome.

  14. I can brute a couple hundred more. Is 75 the only anomaly, or is there a range I can look for (such as 75-79)? Finding castles with exact hit rates is a chore.

    I've got 75 covered because of a Normal save at Ch 2 abusing Kaze. You can brute force any value less than 90 with a statistically significant sample size and it will make a difference overall -- above 90 it would take too many tries to get any meaning on its own and is only useful when fitting a broader model. Anywhere between 60 and 80 is ideal, or 20 to 40 on the other side if you want to look for more evidence that values < 50 may use just one RN. We currently have the best sample size between 74 and 85, so those are a little less pressing than other values.

  15. To generalize, at any measured success rate P, the number of trials N needed to get a one-way margin of error less than a value K is

    N = 6.6049 * P * (1-P) / K^2

    The current proportion for 82 is 294 successes out of 320 trials, or 91.875%. If we assume the success rate doesn't change, the positive margin of error has to be less than 0.01825 in order to invalidate an unweighted 2-RN system. That's 1480 trials. Ew.
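
    For what it's worth, 6.6049 is just 2.57 squared (the usual ~99% z-score), and the 1480 figure falls straight out of the formula above:

    ```python
    def trials_needed(p, k, z=2.57):
        """Trials needed so the one-way margin of error z*sqrt(p*(1-p)/N) drops below k."""
        return (z * z) * p * (1.0 - p) / (k * k)

    print(round(trials_needed(0.91875, 0.01825)))  # -> 1480
    ```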

    Unfortunately the number of trials we have to perform until we hit the effective limit of a 2-RN system for any given value is...10000 and counting (you know, exactitude). If we assume one model or another is correct, we can take a 400- or 500-trial count for most values as definitively exclusive of at least one of the possibilities. The sample size for 75 is close to 400 already and I still have my Ch 2 save, so I'll go ahead and take that to 500 and see if anything changes. The Castle battling method has certainly proven effective, so I'll also take one of my files and play with skill/weapon combinations until I get something close to 25 and start hammering away there.

    I can't decide if this is as absurd as trying to figure out FE9's Forge calculations or not.

  16. First off, thanks to you guys for brute-forcing all of those data values. That brought our total point count up to 8600 and greatly improved confidence for lower values. I revised the graph to show both the (3A+B)/4 model and the split model and killed the error bars as they were cluttering the graph too much with both models on there.

    [Attached graph: post-21742-0-78643700-1457475113_thumb.png]

    The low end definitely favors a 1-RN model over a raw or weighted 2-RN model. The upper half of the graph is too close to call at present. The high-confidence values established by testing (63, 67, 75, 82, 85, and 94) are all very close to both models' predictions, as the models don't differ by much there. However, the reason we pursued something other than raw 2-RN to begin with was the oddity of extensive testing at a value of 75. Of the high-confidence values, 75 is the only point whose 99% confidence interval excludes its raw 2-RN probability.

    Aggregate stats at 82 are 294 hits to 26 misses, or 91.875%. This is extremely close to the (3A+B)/4 model, which predicts 91.72%. The rest are all toss-ups, save 75, which points to something other than a raw 2-RN system. (Of note: a lower-confidence but still fairly confident value at 74 is almost exactly equal to its raw 2-RN expected value.)

    In the limit, if things are too close to call, I would lean towards keeping raw 2-RN over a weighted model, because it seems more likely that the system wasn't changed at all than that it was changed in a way that barely affects the outcome for most values. For now, let's keep testing.

  17. 6000 data points, and I've enacted relative upper and lower bounds on the data, so that a value's upper bound can't be higher than the upper bound of a value to the right of it and its lower bound can't be lower than the lower bound of a value to the left of it. There are glitches below values of 40 because of the dearth of data points there, but above that the data is sound. I've highlighted that region.

    [Attached graph: post-21742-0-92933500-1457386539_thumb.png]

    This relative bounding produces a few rectangles where data is scarce, but we can rest assured that the rightmost point is accurate above the curve and the leftmost point is accurate below the curve. We can confidently wrap the error bounds with a function if we can find one that fits all of the upper bound maxima and one that fits all the lower bound minima. That would be under a logistic model L / (1 + e^-k(x-x0)) with L = 100 (or 99.99) and x0 = 50. Starting that calculation now because there's only one unknown (k) for any point.
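
    The per-point solve is just algebra; here's a throwaway version with placeholder points (substitute the real upper-bound maxima and lower-bound minima from the spreadsheet):

    ```python
    import math

    def logistic_k(x, p):
        """Solve p = 100 / (1 + exp(-k * (x - 50))) for k from one (displayed hit, bound %) point.
        Breaks down at x = 50 or at p of exactly 0 or 100, so skip those points."""
        return -math.log(100.0 / p - 1.0) / (x - 50.0)

    # Placeholder bound points, not real spreadsheet values.
    bound_points = [(30, 22.0), (60, 84.0), (70, 92.0), (82, 96.5)]
    ks = {x: logistic_k(x, p) for x, p in bound_points}
    print(ks)  # pick the k that keeps the curve on the correct side of every bound point
    ```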

    Edit: The steep approximation (high k) is almost useless, but the shallow approximation (low k) enforces hard caps on certain values, which will allow us to extrapolate behavior at low values from high values. It narrows the range of likely values at the mid-extremes (20-40 and 60-80) to +/- 10%. Better still, both approximations will improve in accuracy no matter what values the new records come from.

  18. At 5100 points now, but the data is still heavily concentrated at the top. All hit values with more than 100 data points are within +/- 5 of their expected values under the (3A+B)/4 model. Some values are spot on, and the value for 75 has slowly approached our expected value, currently sitting at 82.5%. It still looks like we're undershooting most middling values (from about 30 to 60), but in general the relation holds pretty well. I would really like to find somewhere I can farm hit values of 40 and below.

  19. I just so happen to have bought the new Fire Emblem Nintendo 3DS. From what the nice VincentASM says, any New 3DS will do? I've never used a hacked ROM, so I don't know what to do or if I need to buy anything, but I don't mind helping. I don't know what Vincent is even talking about, though. I am still on my first Conquest playthrough; as soon as I've beaten it, I intend to start a Conquest Lunatic run and a Hoshido Hard run (to vent my Conquest frustrations :) ). If I can help, then please let me know.

    That's a question for VincentASM or someone who frequents/contributes to the hacking threads like shadowofchaos725 (he's the only one I know by name). Message them, find a tutorial, something along those lines.

    As for helping, you could use whatever hacked version you find to fix the stats of a specific player/enemy unit pair so that they have a certain fixed chance to hit each other and deal no damage. That way, you have a risk-free way of testing the same hit chance over and over again. Repeat for 50 turns and you have at least 200 data points. Do that at 20, 30, 40, 50, and 60, and this project is basically finished.

    Even without hacking, if you simply record values and outcomes through your playthroughs you can get a lot of data without trying very hard or getting bored. I picked up almost 3000 points in one run of Birthright, even without grinding.

  20. With more than 4,000 points, we now have several values with over 100 trials, so I figured those might be worth sharing.

    @ 84: 106 hits, 6 misses -- measured success rate: 94.64%

    @ 85: 119 hits, 4 misses -- success rate: 96.74%

    @ 87: 117 hits, 9 misses -- success rate: 92.86%

    @ 93: 108 hits, 4 misses -- success rate: 96.43%

    And now the (3A+B)/4 model is almost universally underestimating values, although in some cases it is still being compared against sample sizes ranging from n = 4 to n = 50.

    [Attached graph: post-21742-0-40408600-1456879136_thumb.png]

  21. Anecdotally, I was noticing that my Conquest playthrough felt similar to my recent time with XCOM 2 (which uses a single number), while in my Birthright playthrough it felt like my middling hit rates were connecting and the enemy's lower hit rates were missing much more often. Which got me thinking: what if the two story paths use different calculation methods? It's conceivable that Birthright uses 2 RN while Conquest goes back to old-school 1 RN.

    Possible. All of my numbers so far have been from Hoshido, and I've collected the bulk of the data, so I'm not really worried about anything confounding the values. That does raise the question, though: which system is used in the initial six chapters?

    The thing about the data we've collected so far is that it pushes both 1-RN values and 2-RN values outside of the confidence range, so we're pursuing all possibilities. At any rate, I'll keep my personal data from Conquest and Birthright separate and make a third sheet for the aggregation of the two and we'll see what changes, if anything.

    This is quickly becoming quite the project.
