- Kernel Upgrade to a more stable Armbian base image
- Bug Fix: NOPASSWD property is now cleared from the ronindojo user after first time boot installation procedure is complete (previously only affected 2.1.0 new installs)
- Enhanced Release Candidate process
While developing the features and improvements we had scheduled for 2.2.0 (which are significant), we got reports of a nasty bug that we needed to squash. We put 2.2.0 on hold and immediately implemented the fix for a bug that affected 2.1.0 new installs. This bug allowed incorrect passwords to sign into the UI. While there are only very limited operations the UI can do, and SSH was still secure and required the appropriate password, fixing it was our top priority.
Things quickly turned as we had a growing issue with the Linux Kernel bug written about here. We didn't want to push the fix until we had a good grasp of the kernel issue and a usable new image. This post will cover why and how the UI and Kernel bugs came to our attention and what our immediate actions were.
OS Migration Refactoring Woes
The codebase started off being built for a Manjaro based image. Though this lasted for quite some time, we were increasingly forced to deal with breaking changes in Manjaro and decided to switch to an OS more prevalent (ubiquitous even) for ARM based SBCs: Armbian. In the migration to an Armbian based image, we focussed on two things: our RoninOS image build scripts producing an Armbian based image, and RoninDojo calling on the software present on Armbian where previously it depended on Manjaro. We decided to keep all other things equal, introducing as few changes as we could and making sure we wouldn't introduce anything that would unexpectedly work differently or even break. On this point we have been successful and can stand tall looking back on it.
Once that release was made public, we could start working on cleaning things up. Some portion of the code was rendered outdated, so we started reducing complexity where it was most obviously needed: the image build script and first time boot initialization procedure (we call it "first time init"). In refactoring this code, we worked with the assumption that it was only the Manjaro image build procedure that needed the ronindojo user to be set to a "passwordless sudo access" mode. This mode allows for using sudo calls as a user without having to enter a password, which is normal for installation procedures that are meant to run without user interaction, such as RoninOS setting itself up for use the first time you boot it. That assumption turned out to be wrong, and the refactoring released in 2.1.0 left this mode active after the first time init. Well, to be honest, it was actually two errors compounding each other.
In the sequence of events that happens in the code, from image building all the way to completion of the first time init, there were two moments where the aforementioned mode would be set and then later unset. Our first error was not removing all of that code. The very first event, setting it during image building, had remained despite our effort to remove all of it. Had we removed it, we would've found out that the Armbian based build procedure still requires it, just like the Manjaro one.
Our second error was the assumption that the first time init didn't need it, since the system unit file (which kicks off our first time init) instructs the installation to be executed as "Admin". We were misled because we saw that "it still worked" after we thought we had removed the setting/unsetting completely. As a result, we were none the wiser that this "Admin" setting merely implies running the installation as a user that's present in the sudoers file, not that it "magically" satisfies the demands made by sudo usage.
As soon as we found out about it, we scrambled to find the cause and quickly figured out the problem was not present in the release preceding the one we had just published. With that piece of knowledge, it didn't take long before we saw in the 2.1.0 change-set specifically how the bug arose in the new sequence of events, which relied on the mode having been set from the start. So we investigated the need for it being there in the first place, and saw that the first time init depended on it for the many sudo calls it makes. We also noticed it would take a far bigger refactoring task to make the first time init no longer require it (assuming that's even possible). The fix for now is a simple one: keep the mode active until it is no longer needed, which as before means unsetting it at the end of first time init.
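The set-then-unset sequence described above can be sketched roughly as follows. This is only an illustration under assumptions, not the actual RoninOS code: the drop-in file name `99-first-time-init`, the function names, and the idea of using a sudoers drop-in file at all are hypothetical. The `NOPASSWD` rule itself is standard sudoers syntax.

```python
from pathlib import Path

USER = "ronindojo"
DROP_IN_NAME = "99-first-time-init"  # hypothetical file name, not from RoninOS


def nopasswd_rule(user: str) -> str:
    # Standard sudoers rule: the user may run any command without a password.
    return f"{user} ALL=(ALL) NOPASSWD: ALL\n"


def enable_passwordless_sudo(sudoers_dir: Path, user: str = USER) -> Path:
    """Write the drop-in rule, e.g. during the image build."""
    drop_in = sudoers_dir / DROP_IN_NAME
    drop_in.unlink(missing_ok=True)  # start from a clean state
    drop_in.write_text(nopasswd_rule(user))
    drop_in.chmod(0o440)  # sudoers files are conventionally read-only
    return drop_in


def disable_passwordless_sudo(sudoers_dir: Path) -> None:
    """Remove the drop-in rule so sudo asks for a password again."""
    (sudoers_dir / DROP_IN_NAME).unlink(missing_ok=True)


def run_first_time_init(sudoers_dir: Path, steps) -> None:
    """Keep the mode active only for the duration of the init steps."""
    enable_passwordless_sudo(sudoers_dir)
    try:
        for step in steps:
            step()  # each step stands in for work that makes sudo calls
    finally:
        # Assumption: the mode is also cleared if a step fails.
        disable_passwordless_sudo(sudoers_dir)
```

On a real system `sudoers_dir` would be something like `/etc/sudoers.d`, and it would be wise to validate the written file (for example with `visudo -c`) before relying on it. Clearing the rule in a `finally` block is a design choice of this sketch so that a failed init does not leave the mode active.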
In light of recent events we've taken a look at how we guarantee the stability of the software we release. Specifically RoninOS came into view given we had a grievance with our own release procedure.
Our internal QA method for acceptance testing is predicated on building a RoninOS image for testing that pulls in a specific RoninDojo release branch. When booting RoninOS for the first time, an embedded shell script file is called to set up the system with a RoninDojo installation. This file has a hardcoded reference to the branch of RoninDojo to pull in. For testing a release, we make a release branch off of the development branch, and append commits such as CHANGELOG updates and fixes for any bugs we find during testing. The testing image pulls in this specific branch, so any changes we push to the release branch can be tested immediately by our QA without having to wait for a new image to be built by the devs.
This implies that this image cannot be the thing that is released to the public, as otherwise the public would install a codebase that isn't from the master branch. To build an image for the public release, we'd first have to wait for the acceptance tests to be completed so we're sure we don't need more fixes first in RoninOS, and then we build the image that will pull RoninDojo based off the master branch.
This had the unintended (but well understood) consequence that there's time between the builds of the test image and the release image. Given this method, we've had to deal with the fact that packages may have had newer versions released in between these two moments in time, meaning our release image isn't fully the same as what we've tested. We've dealt with this by trying to keep the testing rounds short so the impact should not be that significant. On top of that we manually assess the differences in versions of the installed packages listed for the test and release images, in case anything pops out as an obvious risk.
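The manual comparison of package versions could be sketched along these lines. It assumes each image's package list has been exported with `dpkg-query -W`, whose default output is one `name<TAB>version` line per package. The function names and the sample versions in the usage note are made up for illustration and are not our actual tooling.

```python
from typing import Dict, List, Optional, Tuple


def parse_manifest(text: str) -> Dict[str, str]:
    """Parse 'name<TAB>version' lines into a {name: version} mapping."""
    packages: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, version = line.partition("\t")
        packages[name] = version
    return packages


def diff_manifests(
    test: Dict[str, str], release: Dict[str, str]
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (name, test_version, release_version) for every difference.

    A version of None means the package is absent from that image.
    """
    changes = []
    for name in sorted(test.keys() | release.keys()):
        before = test.get(name)
        after = release.get(name)
        if before != after:
            changes.append((name, before, after))
    return changes
```

For example, with two small illustrative manifests:

```python
test_pkgs = parse_manifest("curl\t7.88.1-10\nopenssl\t3.0.9-1\n")
release_pkgs = parse_manifest("curl\t7.88.1-10\nopenssl\t3.0.11-1\n")
for name, before, after in diff_manifests(test_pkgs, release_pkgs):
    print(f"{name}: {before} -> {after}")
```

A human still has to judge whether a reported difference is an actual risk; the sketch only narrows down what needs looking at.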
One obvious suggestion for reducing this risk is to build the image twice when the acceptance test round starts, once for testing and once for the release, so that it would be at most a statistical anomaly for a package update to sneak in there. This, however, is cumbersome when it comes to fixing bugs found in RoninOS, as every new test image would then also require building a new release image.
<sidenote> We already build two test images every time for testing, with the second one being the same as the "normal test image", aside from it being tweaked for our testers to have easy access (using default passwords for the root and ronindojo accounts). This is done in case a diagnosis is needed for any bugs that prevent the initial user-flow of setting up the accounts from completing (looking at you PM2 and NodeJS updates).
While the codebase for this tweaked image is quite in line with that of the regular test image, the code for an actual release image requires more manually applied changes by a developer before building the image. Doing this for every build during the testing rounds would be much more prone to human error. With our release procedure being kept relatively simple and tenable, we can make sure no procedural mistakes slip in there without us realizing, so we're not inclined to mess with that procedure unless we really have to. </sidenote>
With acceptance testing for RoninDojo v2.1.1 happening at the same time as the research on the kernel bug, we've learned a few things about the build procedure for Armbian. But alas, nothing that would serve us in improving this situation for building release images that conform to what's been tested. At least, not without incurring challenges regarding getting the build procedure stable (manually configuring the build procedure is just that much trouble).
After some deliberation we considered treating the testing image as a Release Candidate (RC) that only needs a tweak to have it ready as a public release. It didn't take long before we found a method to edit images after they've been built. We performed some extra tests to make sure we're not introducing any unintended side effects, and they have all turned out great.
This solution seems to check all the boxes: we no longer need to inspect the test and release images for differences in package versions between the two, we don't need to build the release image separately, and we can now call the test image an actual RC. We've reduced our workload and improved the guarantee of stability: win-win.
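To give an idea of what such a tweak could look like, here is a minimal sketch. It assumes the tweak consists of pointing the hardcoded branch reference in the embedded first-boot script at master. The variable name `RONINDOJO_BRANCH` and the function are hypothetical, and getting at the script inside a built image (mounting it and writing the file back) is left out entirely.

```python
def retarget_branch(
    script: str, new_branch: str, variable: str = "RONINDOJO_BRANCH"
) -> str:
    """Return the script text with the branch assignment pointing at new_branch.

    Raises ValueError unless exactly one assignment is found, so an
    unexpected script layout fails loudly instead of being silently skipped.
    """
    prefix = f"{variable}="
    lines = script.splitlines(keepends=True)
    hits = [i for i, line in enumerate(lines) if line.startswith(prefix)]
    if len(hits) != 1:
        raise ValueError(
            f"expected exactly one {variable} assignment, found {len(hits)}"
        )
    index = hits[0]
    newline = "\n" if lines[index].endswith("\n") else ""
    lines[index] = f'{prefix}"{new_branch}"{newline}'
    return "".join(lines)
```

Turning an RC's script into the release version would then be a single call such as `retarget_branch(script_text, "master")`, with every other line of the script left untouched.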
- Fix: NOPASSWD property is now cleared from the ronindojo user after first time boot installation procedure is complete
- Credits: dammkewl, BTCxZelko, 200keks, BrotherRabbit, RockyRococo, s2l1, kyc3, numbers