Thursday, July 31, 2014

#threadproblems

Ran into another interesting bug yesterday.  In retrospect, I am actually really proud of the fact that I was able to find the issue and fix it so quickly.  Thankfully, this time the bug was not due to my own stupidity.

I'm not sure if I have mentioned it yet, but I am currently working on a game.  I'll try to keep the details sparse, because they don't really matter.  In this game, you have a certain number of clicks to use, and after you run out of clicks, the game ends.  Sometimes, this would behave as expected.  But the bug was that most of the time when you used your final click, the game would immediately end instead of handling that click.

The game runs in a separate thread from the one that handles touch events, so after briefly reviewing my logic it seemed pretty clear that it was some kind of threading problem.  I will admit that multi-threading is probably my weakest area of expertise, mainly due to a simple lack of experience.  (That, and the fact that I have only had one professor briefly teach threading, and he is one of the worst teachers in our department, so I did not pay much attention.)  So maybe this was actually my fault.  Maybe I broke some sort of fundamental law of threading, I'm not sure.

But anyway, in my game thread, I have a while loop that keeps looping as long as the player has clicks left, and another while loop inside it that runs the actions caused by the player's click.  Both of those loop variables were being changed from the UI thread instead of the game thread.  So, it basically looked like this:
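
A rough sketch of that structure, assuming Java.  Only _clicksLeft and _isRunningTurn come from the description above; the class name, runTurnActions(), the turnsHandled counter, the volatile modifiers, and the idea that the game thread clears the flag when a turn finishes are all assumptions made for illustration.

```java
// Hypothetical reconstruction of the loop described above.
class BuggyGameLoop implements Runnable {
    // volatile (an assumption) so writes made on the UI thread are visible here
    volatile int _clicksLeft;
    volatile boolean _isRunningTurn;
    int turnsHandled; // touched only by the game thread

    BuggyGameLoop(int clicks) {
        _clicksLeft = clicks;
    }

    @Override
    public void run() {
        while (_clicksLeft > 0) {        // outer loop, presumably "line 1"
            while (_isRunningTurn) {     // inner loop, presumably "line 2"
                runTurnActions();
            }
        }
    }

    // Assumption: the game thread clears the flag once the turn is done.
    void runTurnActions() {
        turnsHandled++;
        _isRunningTurn = false;
    }

    public static void main(String[] args) {
        BuggyGameLoop game = new BuggyGameLoop(1);
        // Replay the state right after the final touch, as the game thread
        // sees it when it comes back around to the outer condition.
        game._clicksLeft = 0;
        game._isRunningTurn = true;
        game.run();
        // Prints 0: the loop exits before the last click is handled.
        System.out.println("turns handled: " + game.turnsHandled);
    }
}
```

The main method replays the failing state on a single thread, so the dropped click shows up every time instead of only when the timing is unlucky.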


And the thread that was modifying those values looked something like this:
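
Again only a sketch, assuming Java.  The onTouch() name, the guard against extra touches, and especially the order of the two writes are guesses, not the original code.

```java
// Hypothetical UI-side code that modifies the two loop variables.
class TouchHandlerSketch {
    volatile int _clicksLeft;
    volatile boolean _isRunningTurn;

    TouchHandlerSketch(int clicks) {
        _clicksLeft = clicks;
    }

    // Assumed to be called on the UI thread for every touch event.
    void onTouch() {
        if (_clicksLeft <= 0) {
            return; // assumption: touches are ignored once the clicks run out
        }
        // The order of these two writes is a guess.
        _isRunningTurn = true;
        // A decrement on a volatile is not atomic, which is tolerable only
        // if the UI thread is the sole writer of this field.
        _clicksLeft--;
    }
}
```

After the final touch, this leaves _clicksLeft at 0 and _isRunningTurn set to true, which is exactly the state the game loop has to cope with.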


Now this wasn't too hard to figure out: if the touch event happened while the game thread was at the top of the outer loop (between lines 1 and 2), it would work fine.  Otherwise, the last click was ignored, because _clicksLeft was already 0 by the time the outer condition was checked again, so the thread never got back into the outer loop.  So I quickly added another condition to the loop:
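
A sketch of what that change might look like, assuming Java.  Putting _isRunningTurn first in the condition is inferred from the rest of the story; the helper names and the flag being cleared by the game thread are assumptions.

```java
// Hypothetical version of the loop with the extra condition added.
class FirstFixGameLoop implements Runnable {
    volatile int _clicksLeft;
    volatile boolean _isRunningTurn;
    int turnsHandled; // touched only by the game thread

    FirstFixGameLoop(int clicks) {
        _clicksLeft = clicks;
    }

    @Override
    public void run() {
        // Keep looping while a turn is pending OR clicks remain.
        while (_isRunningTurn || _clicksLeft > 0) {
            while (_isRunningTurn) {
                runTurnActions();
            }
        }
    }

    // Assumption: the game thread clears the flag once the turn is done.
    void runTurnActions() {
        turnsHandled++;
        _isRunningTurn = false;
    }

    public static void main(String[] args) {
        FirstFixGameLoop game = new FirstFixGameLoop(1);
        // State right after the final touch.
        game._clicksLeft = 0;
        game._isRunningTurn = true;
        game.run();
        // Prints 1: the pending turn is handled before the loop exits.
        System.out.println("turns handled: " + game.turnsHandled);
    }
}
```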


Fixed!  ...Or so I thought.  I tested it a few times, and it looked like the bug was gone, so I pushed my code and went to work on something else.  Then, while testing a couple of days later, I ran into the bug again.  D:

Clearly my fix had done something, because I didn't see the bug again for a few days.  So I tested it a few more times to make sure my eyes weren't playing tricks on me, and it happened again.  *sigh*  Time to go back to the code...  The first thing that came to mind was to try to synchronize the blocks that were dealing with those variables.  Unfortunately, that froze my entire UI, so that didn't seem to be an option.  I stared at it for about an hour, and then it hit me:

There is a rare edge case where both of the above loop conditions will be false, even if the game is not supposed to be over.  _isRunningTurn must be false, _clicksLeft must be 1, and the touch event must happen while the game thread is in the middle of checking the loop condition!  If I spread out the timing, it becomes a bit more obvious:
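
One way to spell out that timing is to replay it on a single thread, so the unlucky interleaving happens every time.  This is only an illustration, assuming Java: the class, the touch() helper, and the orWithInterruption() method are hypothetical stand-ins for the real game and UI threads.

```java
import java.util.function.BooleanSupplier;

// Deterministic replay of the edge case: the touch lands between the two
// reads of the loop condition (_isRunningTurn || _clicksLeft > 0).
class InterleavingDemo {
    int clicksLeft = 1;            // one click left
    boolean isRunningTurn = false; // no turn in progress

    // Stand-in for the UI thread handling the final touch.
    void touch() {
        isRunningTurn = true;
        clicksLeft--;
    }

    // Evaluates "first || second", but runs `between` after the first
    // operand has been read, mimicking another thread sneaking in.
    static boolean orWithInterruption(BooleanSupplier first,
                                      Runnable between,
                                      BooleanSupplier second) {
        boolean a = first.getAsBoolean(); // game thread reads operand 1
        between.run();                    // UI thread handles the touch
        return a || second.getAsBoolean();// game thread reads operand 2
    }

    static boolean loopContinuesWithFlagFirst() {
        InterleavingDemo s = new InterleavingDemo();
        return orWithInterruption(() -> s.isRunningTurn,
                                  s::touch,
                                  () -> s.clicksLeft > 0);
    }

    public static void main(String[] args) {
        // Prints false: the flag was read as false before the touch, and
        // the click count is read as 0 after it, so the loop exits.
        System.out.println(loopContinuesWithFlagFirst());
    }
}
```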


Of course, it probably wouldn't happen with exactly this timing, but it's possible.  The solution is simple, though.  Just reverse the order of the loop conditions:
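
A small single-threaded check of the reversed condition, assuming Java.  Everything here is hypothetical scaffolding, and it bakes in two assumptions worth stating loudly: the touch handler sets _isRunningTurn before it decrements _clicksLeft, and both fields are visible across threads (for example, declared volatile).  If the handler wrote in the opposite order, a similar gap between the two reads would seem to remain.

```java
// Checks (_clicksLeft > 0 || _isRunningTurn) against every point at which
// the handler's two writes could land relative to the loop's two reads.
class ReversedConditionDemo {
    int clicksLeft = 1;
    boolean isRunningTurn = false;
    int writesApplied = 0;

    // Handler writes, in the assumed order: flag first, then the decrement.
    void applyHandlerWritesUpTo(int n) {
        while (writesApplied < n) {
            if (writesApplied == 0) {
                isRunningTurn = true;
            } else {
                clicksLeft--;
            }
            writesApplied++;
        }
    }

    // w1 / w2: how many handler writes completed before the first / second read.
    static boolean loopContinues(int w1, int w2) {
        ReversedConditionDemo s = new ReversedConditionDemo();
        s.applyHandlerWritesUpTo(w1);
        boolean hasClicks = s.clicksLeft > 0; // first read
        s.applyHandlerWritesUpTo(w2);
        return hasClicks || s.isRunningTurn;  // second read
    }

    static boolean holdsForAllInterleavings() {
        for (int w1 = 0; w1 <= 2; w1++) {
            for (int w2 = w1; w2 <= 2; w2++) {
                if (!loopContinues(w1, w2)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Prints true: the final click is never missed under these assumptions.
        System.out.println(holdsForAllInterleavings());
    }
}
```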


Regardless of the execution timing, there is now no case where both conditions are false unless the game is really over.  Pretty crazy, I never would have thought the order of loop conditions could be such a problem!  Whoever first said "You learn something new every day," I think they might have been on to something.
