Finding and fixing crashes you can't reproduce - A method I should have started using long ago — Gideros Forum

Hey, Gideros friends. I've just implemented a new method of getting the details of problems in the field so I can fix them quickly. It seems so obvious now, but it would have saved me some serious headaches if I'd done this years ago. I thought I should post the details here in case this method will be useful to someone else.

A common frustration for any developer is when a small percentage of users are having crashes in code that works fine in-house. Depending on the OS and the nature of the problem it can be a major pain to pin down some of those bugs.
For dealing with crashes that happen inconsistently or to only some users, there are a few options. In the past I've often exchanged emails with the users reporting the problems, getting as much detail as possible, and sometimes that's enough to pin down the cause. Sometimes the app store console provides enough detail to track it down. Google Play sometimes, but not always, shows me a module name and line number, but with Apple I usually only see that it's a Lua error.

Without precise details of the problem, sometimes all that's left is a detailed review of lots of code, looking for anything unsafe. Is division ever done where there's a chance the denominator could be zero? Are tables indexed without checking the validity of the index, or of the table itself? Is there a path to this code where this variable may be nil before this operation is done? Even if you try to follow best practices and write safe code, there will be errors.

In Gideros we have Lua's pcall(), a try-catch equivalent that captures anything that would otherwise cause a crash. I've used it in places where I knew there was a significant chance something might fail, like creating a texture from a downloaded file. For example:

local success, result = pcall(Texture.new, texture_path)
if success then
    -- result is the new texture; on failure it would hold the error message
    bitmap = Bitmap.new(result)
end
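
To make the two outcomes of pcall concrete, here is a minimal sketch in plain Lua, independent of Gideros. The read_score function and its data are made up for illustration; the exact wording of the error message varies between Lua versions.

```lua
-- Hypothetical function that fails the way field crashes often do:
-- it indexes a value that turns out to be nil.
local function read_score(player)
    return player.stats.score
end

-- Success: pcall returns true followed by the function's return values.
local ok, value = pcall(read_score, { stats = { score = 42 } })
print(ok, value)

-- Failure: pcall returns false followed by the error message, which for
-- runtime errors like this one includes the source name and line number.
local failed_ok, message = pcall(read_score, {})
print(failed_ok, message)
```

Note that the function and its arguments are passed to pcall separately; writing pcall(read_score({})) would run read_score before pcall had a chance to protect it.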

What I only recently learned is that when an error is caught, pcall not only returns false as its first result but also returns the details as its second: the same message you'd see in the console if you were running in a Gideros Player. That is, what went wrong, on what line of what source file. Once I understood that, I made cover functions over the functions that do the heavy lifting in a game, like the function that builds the main game screens, and the onEnterFrame functions. This way one pcall can catch errors anywhere in the main loop of a game. For example:

function on_enter_frame()
    -- The main game loop consisting of hundreds of lines of code calling many
    -- other functions in other modules containing thousands of lines of code
end

function on_enter_frame_cover_function()
    -- Pass the function itself to pcall; do not call it with ()
    local success, result = pcall(on_enter_frame)
    if success then
        return result
    else
        -- result holds the error message, including source file and line
        report_error(result)
    end
end
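
If several heavy-lifting functions need the same treatment, the cover function can be generated instead of written by hand. The following is only a sketch of that idea, not the code from my game: protect and safe_divide are hypothetical names, and the error handler here just collects messages in a table so the example runs in plain Lua.

```lua
-- Hypothetical helper: returns a cover function for fn. Errors raised inside
-- fn are passed to on_error instead of crashing the caller.
local function protect(fn, on_error)
    -- Receives pcall's results: the success flag, then either the return
    -- values of fn or the error message.
    local function finish(ok, ...)
        if ok then
            return ...
        end
        on_error(...)
    end
    return function(...)
        return finish(pcall(fn, ...))
    end
end

-- Example: collect error messages in a table instead of sending them.
local errors = {}
local safe_divide = protect(function(a, b)
    if b == 0 then
        error("division by zero")
    end
    return a / b
end, function(message)
    errors[#errors + 1] = message
end)

local quotient = safe_divide(10, 2)  -- normal path: returns the result
local nothing = safe_divide(1, 0)    -- error path: handler runs, returns nil

-- In Gideros the same helper could wrap the frame handler, for example:
--   stage:addEventListener(Event.ENTER_FRAME, protect(on_enter_frame, report_error))
```

Because the handler runs every frame in that last case, the reporting function should guard against sending the same report repeatedly.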

I have a report_error function that, if it hasn't already been called this run, packages up some data in a table including the app version, platform, and the text from the result of the error caught by pcall. I also include some other details about what the player has been doing, like what screen they're on, what content they're using, and a string of text indicating the recent major steps that have happened in the app. I JSON encode that table and use a UrlLoader to send it to a simple PHP script on my server. The script logs that event in a database, and depending on the type of event (in this case with a code flagging it as a crash prevented by pcall) it emails me the details.
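
A rough sketch of how such a reporter could be structured follows. This is not my actual code: every name in it (new_error_reporter, note_step, the payload fields, the event code, the URL) is a placeholder, and the JSON encoder and the network call are passed in as functions so the logic can be tried in plain Lua without Gideros.

```lua
-- Hypothetical report-once error reporter.
--   app_info: table with version and platform fields
--   encode:   function turning a table into a string (e.g. a JSON encoder)
--   send:     function delivering that string to the server
local function new_error_reporter(app_info, encode, send)
    local already_reported = false
    local recent_steps = {}
    local reporter = {}

    -- Record a major step (screen change, content loaded, ...) for context.
    function reporter.note_step(step)
        recent_steps[#recent_steps + 1] = step
        -- Keep only the 20 most recent steps so the payload stays small.
        if #recent_steps > 20 then
            table.remove(recent_steps, 1)
        end
    end

    -- Send at most one report per run; later calls are ignored.
    function reporter.report_error(message)
        if already_reported then
            return false
        end
        already_reported = true
        send(encode({
            event = "crash_prevented_by_pcall",
            version = app_info.version,
            platform = app_info.platform,
            error = tostring(message),
            steps = table.concat(recent_steps, " > "),
        }))
        return true
    end

    return reporter
end

-- In Gideros the wiring might look roughly like this (an untested assumption
-- on my part, with a placeholder URL):
--   local json = require "json"
--   local reporter = new_error_reporter(
--       { version = "1.0", platform = application:getDeviceInfo() },
--       json.encode,
--       function(body)
--           UrlLoader.new("https://example.com/report.php", UrlLoader.POST,
--               { ["Content-Type"] = "application/json" }, body)
--       end)
```

The run-once guard matters most when the reporter is called from a frame handler, where the same error would otherwise be sent on every frame.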

I've just released the first update of Fly Fishing Simulator HD using this code, and I quickly discovered two points in the code that were causing at least a few crashes for some users. I've had no user reports of these, or reviews complaining about the crashes, but now I get instant notification of problems like this and can go right to the line in the code and fix it immediately.

If I'd figured this out years ago it would have saved me and some users some long and tiresome bug hunts.

Paul
