I've been trying to solve a mysterious crash on a Windows game, and I'm running out of ideas...
An app I've had on the Microsoft Store for years suddenly stopped working for at least a couple of users, crashing after a few seconds of showing the splash screen. The Microsoft Store developer dashboard has no details about the crash. One affected user has been a customer of my games for more than 10 years, and he's been working with me to try to resolve it.
Since the app hadn't changed, my first suspicion was that the installation of the app had been corrupted somehow. I published an update as a beta to the one customer, allowing him to update the app without losing his data (records of tens of thousands of fish caught over several years, placing him high on the global leaderboard). That didn't help, so I suspected the app's saved data was corrupt, though in most cases the code is tolerant of corrupt data. I built him another beta that sent his catch history to a script that stored it in a database, along with a one-off system to download and restore that data. That allowed him to uninstall the app completely and reinstall it, knowing we could restore his progress (I plan to do something like this for all customers in the future). But even after uninstalling and reinstalling, it still crashes on startup for him.
In the app I already had code in place to handle failed startups to help me with troubleshooting. At many points it appends an entry to a log file showing the last step completed. Some seconds after the app is up and running, it deletes that log. So on startup, if the file exists, it shows that the previous run didn't start successfully, and the app adds that log to a database on my server, showing how far it got. I've used that to solve some other issues in the past. The crash should lie somewhere in the code that runs between the last log entry written and the one that should have been written next. Normally that's very helpful when I can't get a stack trace.
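For anyone wanting to try the same technique, here is a minimal sketch of that kind of startup log in plain Lua. It is not the code from the app: the file name and function names are made up for illustration, and in Gideros the path would normally point into the documents folder (for example "|D|startup.log").

```lua
-- Hypothetical log location; adjust for the platform.
local LOG_PATH = "startup.log"

-- Append one step and close the file right away, so the entry is
-- written out before the next startup step runs.
local function log_step(step)
    local f = io.open(LOG_PATH, "a")
    if f then
        f:write(step, "\n")
        f:close()
    end
end

-- Call once the app is up and running: no log file means a clean start.
local function mark_startup_ok()
    os.remove(LOG_PATH)
end

-- Call first thing on launch. Returns the steps left behind by a failed
-- previous run (the last entry is the last step completed), or nil.
local function read_previous_failure()
    local f = io.open(LOG_PATH, "r")
    if not f then return nil end
    local steps = {}
    for line in f:lines() do
        steps[#steps + 1] = line
    end
    f:close()
    os.remove(LOG_PATH)
    return steps
end
```

On launch, the result of read_previous_failure() would be sent to the server if it is not nil, log_step() would be called after each startup stage, and mark_startup_ok() a few seconds after the game is running.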
In the betas, I've also peppered the startup code with many more logged steps like that, to pinpoint exactly what the app is trying to do when it crashes, with each beta adding more to narrow it down until I'm logging steps for individual lines of code in the area where it's been crashing. That should pinpoint the exact point of the crash, but the crashes are not consistent from one run to the next. On one run it might crash seeding the random number generator. On another it might be when it tries to open a local file. In one case it failed between one log entry and the next, with no other code attempted between them. I've built a version that sends a status update to a server at each logged step, rather than only when the previous run failed to start, and the data shows the same - failing very early in the startup, but not consistently at any particular step.
That suggests the problem is happening on some thread other than the one running the Lua code, or is triggered by some other software that's running. The user has checked for updated drivers, etc. I still can't reproduce the problem - both the general release and all the betas run fine on two Windows 10 computers. The user's computer's specs exceed those of the ones I'm using - ample RAM and drive space, etc. He runs many other games on it, and the only one that crashes is mine.
Unfortunately this is the short version of the story - I've been through many one-off builds trying to solve this over the past few weeks. I originally built with Visual Studio 2015, then reinstalled that and the SDKs from scratch. I originally built with the WinRT export and an older Gideros, and have now updated to UWP with the latest Gideros and Visual Studio 2017. For complicated reasons, I also switched to building on a different PC. Again, everything works as it should for me, logging all the startup steps of every run to my database, whether I'm running a clean install of the game or one I've run several times.
Is there any .ini file or registry entry left behind after an uninstall of a Windows Store application made from Gideros, something that could be corrupted and still around to cause trouble after uninstalling or reinstalling the app? I'm grasping at straws here. I know that in general, if a game used to work on Windows 10 and suddenly stops, it's a good bet some Windows update changed the environment and some hardware may not play nice until drivers get updated. But if his drivers are up to date, and other games and applications that use the graphics card, etc., work fine, and the application runs fine on other computers with fully updated Windows 10, where does one go from here?
I'd love to hear any and all ideas...
Thanks,
Paul
Comments
Maybe they could run system restore to around that date and test again.
You could just publish a test case that uses an empty project and see whether that also fails on his machine.
I've also considered publishing a trivial Gideros app as a beta, without network access, plugins, or anything complicated, just to see if whatever the issue is applies to everything built this way. If it runs, I could try adding a module or plugin at a time and see if one particular one seems to be causing the crash. I don't know if or how that might lead to a fix, but it could conceivably shed more light on the cause.
I might try either of those next. Still scratching my head over this...
Likes: antix
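-- Method 1: generic for loop
for file in lfs.dir(path) do
    -- (stuff)
end

-- Method 2: explicit iteration using the directory object
-- lfs.dir() returns an iterator function and a directory object
local iter, lfs_obj = lfs.dir(path)
local file = lfs_obj:next()
while file do
    -- (stuff)
    file = lfs_obj:next()
end
lfs_obj:close()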
The first result is ".", and retrieving the second result crashes for the affected user.
When I built a stripped-down app with most of the content removed, and two buttons that would each test one of those two methods, for him the first method doesn't crash but doesn't return any results at all. The second method works.
Still trying to get this worked out...
Paul
Likes: MoKaLux, antix
If I don't find any other solution, I may have to rework quite a bit of code to avoid ever needing to use lfs.dir() or anything equivalent. In one case my server can deliver a list of possible entries, names of folders where each of about 172 different content packages would be installed. Then I can have the app just check for the presence of each. In another case I'm getting a list of folders containing user created content (fishing flies), but I could maintain a list in a file, adding another entry each time the user creates a new item.
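The first idea, checking a known list of names instead of asking the system for a directory listing, could look roughly like the sketch below. Everything in it is an assumption for illustration: the function names, the `exists` predicate supplied by the caller, and the "info.txt" marker file used by the sample predicate.

```lua
-- Given candidate folder names (e.g. delivered by the server), return
-- the ones that appear to be installed under base_path.
-- `exists` is a hypothetical predicate: exists(folder_path) -> boolean.
local function installed_packages(candidates, base_path, exists)
    local found = {}
    for _, name in ipairs(candidates) do
        if exists(base_path .. "/" .. name) then
            found[#found + 1] = name
        end
    end
    return found
end

-- One possible predicate: try to open a known file inside the folder.
-- "info.txt" is a made-up marker file name.
local function has_marker_file(folder)
    local f = io.open(folder .. "/info.txt", "r")
    if f then
        f:close()
        return true
    end
    return false
end
```

With about 172 candidate names this is a small, fixed number of file-open attempts per launch, and it never touches lfs.dir().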
Paul
Likes: MoKaLux
Fragmenter - animated loop machine and IKONOMIKON - the memory game
At this point I can confirm that on the few Windows 10 computers (that I know of) where the problem exists, the application crashes the second time lfs.dir() iterates, even if the return value of the first iteration works as it should, and all values look appropriate until the second iteration (the attempt to access the second result). So it seems that any approach that uses lfs.dir() will crash in these cases.
I don't know if third party software, like specific anti-malware programs trying to protect files, or the presence of mounted network drives, or some other aspect of these particular systems might be involved.
Moments ago I confirmed that if the app avoids calling lfs.dir(), everything else works just fine. My latest beta just maintains a list of files written to folders, and consults those lists rather than asking the system for a list of files present by calling lfs.dir().
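A stored file list of this kind can be as simple as a text file with one name per line. The sketch below is only an illustration of the idea, not the code in the beta; the function names and the manifest path are hypothetical.

```lua
-- Read a manifest (one file name per line). Returns the names in order,
-- plus a lookup table for quick membership checks.
local function read_manifest(manifest_path)
    local names, seen = {}, {}
    local f = io.open(manifest_path, "r")
    if f then
        for line in f:lines() do
            if line ~= "" and not seen[line] then
                seen[line] = true
                names[#names + 1] = line
            end
        end
        f:close()
    end
    return names, seen
end

-- Record a newly written file. Returns false if it was already listed.
local function add_to_manifest(manifest_path, name)
    local _, seen = read_manifest(manifest_path)
    if seen[name] then return false end
    local f = assert(io.open(manifest_path, "a"))
    f:write(name, "\n")
    f:close()
    return true
end
```

The app would call add_to_manifest() each time it writes a file into a folder, and read_manifest() wherever it previously looped over lfs.dir().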
So that's my solution, at least for now. lfs.dir() crashes on some small subset of Windows 10 computers, so I won't call it.
For backward compatibility, my next general release will make one attempt to use lfs.dir() to build the lists of files already present. Thereafter it will just add to the list when it adds files to folders, so most users will never experience any impact. For those few computers that crash on lfs.dir(), it will crash on that first run when it makes that attempt, but on subsequent runs it will only use the stored file lists.
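The "one attempt only" behaviour depends on remembering the attempt before it is made, since a crash leaves no chance to record anything afterwards. A rough sketch, assuming a marker file and two caller-supplied callbacks (`scan`, which would wrap the lfs.dir() loop, and `save`, which would store the resulting list), all of which are hypothetical names:

```lua
-- Run scan() at most once per install. The marker file is written BEFORE
-- the scan, so even a hard crash inside scan() leaves the marker on disk
-- and later runs skip the attempt.
local function migrate_once(marker_path, scan, save)
    local marker = io.open(marker_path, "r")
    if marker then
        marker:close()
        return false -- already attempted on an earlier run
    end
    local f = assert(io.open(marker_path, "w"))
    f:write("attempted\n")
    f:close()
    save(scan()) -- may crash on affected machines; marker is already saved
    return true
end
```

Wrapping the scan in pcall() would not help here, because a crash in native code takes down the whole process rather than raising a Lua error.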
Paul
Likes: MoKaLux, keszegh, talis
https://github.com/luapower/lfs/issues/1
By the way, when I searched Google for "lfs.dir crash on windows", I found many pages about this crash.
All of the crashes are in Lua-based applications and happen on the second iteration. @PaulH
Likes: MoKaLux
Likes: talis
While I am at it, I also spotted another possible issue: they use the ASCII versions of the API, not the Unicode (wide-char) ones, which means that it will have trouble with file names that are not pure ASCII.
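First fix is very easy: just change line 94 of /All Plugins/lfs/source/lfs.c from 'long hFile;' into 'intptr_t hFile;' (on 64-bit Windows a long is only 32 bits, so it can truncate the intptr_t handle returned by _findfirst).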
Second fix requires a bit more work.
Likes: talis, keszegh, MoKaLux, plicatibu
Likes: MoKaLux
I don't know if this is relevant, but at least a few times I've seen an iteration of lfs.dir() return a value of type "function". I didn't dig too far into that - just used code to only deal with returns of type "string", but that still puzzles me.
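One possible explanation, offered as a guess: lfs.dir() returns an iterator function as its first value (followed by the directory object), so code that treats that first return value as an entry will see a value of type "function". Either way, the type check described above can be sketched as below. The function name is made up, and `next_entry` is an assumed callback that returns the next value or nil when done.

```lua
-- Collect only real entry names: skip anything that is not a string,
-- and skip the "." and ".." pseudo-entries.
local function collect_names(next_entry)
    local names = {}
    local entry = next_entry()
    while entry ~= nil do
        if type(entry) == "string" and entry ~= "." and entry ~= ".." then
            names[#names + 1] = entry
        end
        entry = next_entry()
    end
    return names
end
```

With LuaFileSystem, next_entry could be a small function that returns dir_obj:next(), where dir_obj is the second value returned by lfs.dir(path), followed by dir_obj:close() once the list is built.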
Likes: talis
Likes: MoKaLux