Recently I was working on a problem for someone that involved a graphics glitch. It showed up on only one chipset, on mobile OpenGL ES. The engineers had been looking at it for almost two weeks and still did not have a solution. They were certain it was a driver bug. In my experience, about half the time that turns out not to be the case.
One of my techniques for efficient debugging is to actually use a debugger. When I arrived and started working with the team to try to solve the issue, they had been doing what I call "poking" at the code: placing printfs in the code, stripping out pieces of the code, writing alternative ways to render, etc.
In fact, they had one other idea that they wanted to try, which was to write some code to ping-pong between textures, because they suspected the problem was glClear not being called.
I asked if I could first identify the problem exactly, which usually takes me a few hours at most (if it's a hard bug). There were a couple of bugs that did take me a week to find, but those usually took 24 or more hours to reproduce. I explained that even though they were seeing it in only one game and on only one hardware chipset, it could still rear its ugly head somewhere down the road. They didn't care; they just wanted a hack for now. I've seen it time and time again: if you don't understand the bug and just hack around it, you create a whole can of other beetles for yourself. I would have said worms, but beetles are bugs.
It was clear that I wasn't a good fit for this group. They didn't need me. What they needed was a coder who would be a robot and code up hacks until they found a solution. Not my cup of tea!
Upon my exiting they asked me for feedback on what they were doing. I said it would be nice to have a debugger and they said, "Why? Do you need to look at the stack or something?" OK, I thought, I guess I need to write a blog entry about this.
The glitch appeared only when drawing quads on a certain level, and only after about 10-12 quads: the background graphics would become glitched, as if something had corrupted a texture, or the UV coordinates, or something else. Identifying that "something" exactly should be your goal, so that you can come up with a general solution. If you "One-Off" it, you get a temporary fix that will almost certainly cost you more time and more pain in the QA process.
How I would have solved this problem in less than a day using a debugger is as follows:
- Use a debugger to set a breakpoint with a hit count so that it breaks one quad before the particular quad that seems to be causing the issue
- Write down or look at all the parameters, data, etc. for the working quad
- Run again with one more breakpoint that is hit when the problem quad is rendered
- Now look at that quad's parameters, data, etc. and see if it is overwriting something
- Check vertex buffers, UV buffers, texture pixel data, etc.
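The steps above can be sketched as a small simulation. This is not the team's code and it does not touch OpenGL ES. Everything in it is an assumption made for illustration: the buffer layout, the quad sizes, the planted bug, and the names `HitCountBreakpoint`, `write_quad`, `capture_at_hit` and `stray_writes`. It only shows the idea of stopping on the Nth draw, capturing the data before and after, and checking whether that draw wrote outside its own slot.

```python
import struct

FLOATS_PER_QUAD = 8                  # hypothetical layout: 4 vertices x (x, y)
QUAD_BYTES = FLOATS_PER_QUAD * 4     # 4 bytes per 32-bit float
NUM_QUADS = 12


class HitCountBreakpoint:
    """Fires on the Nth hit only, like a debugger breakpoint with a hit count."""

    def __init__(self, target_hit):
        self.target_hit = target_hit
        self.hits = 0

    def hit(self):
        self.hits += 1
        return self.hits == self.target_hit


def make_scene():
    """Build a shared buffer: 12 quad slots followed by background UV data."""
    background_uvs = struct.pack("<4f", 1.0, 1.0, 1.0, 1.0)
    buffer = bytearray(NUM_QUADS * QUAD_BYTES) + background_uvs
    quads = [[float(i)] * FLOATS_PER_QUAD for i in range(1, NUM_QUADS + 1)]
    quads[-1] = quads[-1] + [99.0, 99.0]   # planted bug: two floats too many
    return buffer, quads


def write_quad(buffer, index, coords):
    """Stand-in for a draw call: packs the quad's floats into its slot."""
    offset = index * QUAD_BYTES
    data = struct.pack(f"<{len(coords)}f", *coords)
    buffer[offset:offset + len(data)] = data


def capture_at_hit(target_hit):
    """Rerun the scene and snapshot the buffer around the Nth quad write."""
    buffer, quads = make_scene()
    breakpoint_ = HitCountBreakpoint(target_hit)
    for index, coords in enumerate(quads):
        if breakpoint_.hit():
            before = bytes(buffer)
            write_quad(buffer, index, coords)
            return index, before, bytes(buffer)
        write_quad(buffer, index, coords)
    raise ValueError("breakpoint never hit")


def stray_writes(index, before, after):
    """Byte offsets that changed outside the slot owned by quad `index`."""
    start = index * QUAD_BYTES
    end = start + QUAD_BYTES
    return [i for i, (a, b) in enumerate(zip(before, after))
            if a != b and not (start <= i < end)]


if __name__ == "__main__":
    for hit in (NUM_QUADS - 1, NUM_QUADS):   # working quad, then problem quad
        index, before, after = capture_at_hit(hit)
        print("quad", index, "stray byte offsets:", stray_writes(index, before, after))
```

In a real session the debugger does the counting and the memory view does the diffing. The point is the same: compare the last good quad with the first bad one, and look for writes that land where they should not.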
Also, I never assume that something is a driver bug just because it shows up on only one chipset. I have identified and fixed so many bugs in both drivers and game software that I always want to get to the root of the problem and locate its source exactly.
In this case, I could imagine that ATI had packed the vertex data and UV data closely together, and that NVIDIA, Qualcomm, etc. packed them differently, so that on those chipsets you might not see the glitch.
I personally use GUI debuggers because I can move around much faster in them, and I don't have to fill my unused synapses with a myriad of commands, as I already have most of the programming languages floating up there!
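To make that guess concrete, here is a toy model of how the same out-of-bounds write could be visible with one memory layout and harmless with another. It is purely illustrative: it does not model any real driver's layout, and `overrun_hits_uvs` and its `padding_bytes` parameter are made-up names for this sketch.

```python
import struct


def overrun_hits_uvs(padding_bytes):
    """Return True if a 2-float overrun past the vertex block changes the UVs.

    Hypothetical layout: an 8-float vertex block, then `padding_bytes` of
    alignment gap, then a 4-float UV block.
    """
    vertex_block = bytearray(32)                       # 8 floats of vertex data
    padding = bytearray(padding_bytes)                 # assumed alignment gap
    uv_block = struct.pack("<4f", 0.0, 0.0, 1.0, 1.0)
    memory = vertex_block + padding + uv_block

    uv_start = len(vertex_block) + len(padding)
    original_uvs = bytes(memory[uv_start:])

    overrun = struct.pack("<10f", *([7.0] * 10))       # 10 floats into an 8-float slot
    memory[0:len(overrun)] = overrun

    return bytes(memory[uv_start:]) != original_uvs


if __name__ == "__main__":
    print("tightly packed:", overrun_hits_uvs(0))
    print("8 bytes of padding:", overrun_hits_uvs(8))
```

In this model the bug is identical in both cases. Only the layout decides whether it lands on data that is actually drawn, which is why a glitch on a single chipset does not prove the driver is at fault.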
Until next time!