MAC5856 - Road To The Linux Kernel
This post is a collection of reports on the tutorials provided for the MAC5856 class for Open Source Development. As such, this post will be continuously edited to account for my latest experiences with the tutorials.
More specifically, the tutorial pages provide a guide on how to set up a development environment for building and developing the Linux kernel.
After years of being an Arch Linux user, I opted to go with a Fedora installation for a change. This required some small changes with respect to the tutorials - all of them are documented below.
Tutorial 1 - QEMU and Libvirt Setup
This first tutorial focuses on laying the groundwork for the development setup, such as installing all the packages necessary for virtualisation.
- Although the tutorial uses the `libvirt-qemu` user for the directory ownership, I had to go with the `qemu` user in Fedora.
- To avoid having to use `sudo` every time I use `virsh`, I opted for creating a tiny wrapper `virshh` in the `activate.sh` script that solves this:

  ```sh
  function virshh() {
      virsh --connect "qemu:///system" "$@"
  }
  export -f virshh
  ```

  Now instead of using `virsh` I use `virshh` and it's working flawlessly so far.
- As many other students reported, I also had to install the `openssh-server` package in the VM:

  ```sh
  apt update && apt install openssh-server
  systemctl enable ssh && systemctl start ssh && systemctl status ssh
  ```
Tutorial 2 - Building and Booting The Kernel
- I wasn't able to use `rsync` to fetch the `vm_mod_list` we created in the last tutorial. For that I had to install `rsync` in the VM first.
- I had a very annoying problem when creating the new VM: the system would never completely boot. Instead, `virt-install` would halt very early and never finish the creation of the VM itself. After many attempts, this version worked:

  ```sh
  export img_file="${VM_DIR}/arm64_img.qcow2"
  export root_vda="/dev/vda2"
  export kernel_file="${IIO_TREE}/arch/arm64/boot/Image"
  export initrd_file="${BOOT_DIR}/initrd.img-6.1.0-44-arm64"
  export vm_name="arm64"

  function create-vm-virsh() {
      sudo virt-install \
          --name ${vm_name} \
          --memory 2048 \
          --arch aarch64 \
          --machine virt \
          --osinfo detect=on,require=off \
          --import \
          --features acpi=off \
          --disk path=${img_file},bus=virtio,format=qcow2 \
          --boot kernel=${kernel_file},initrd=${initrd_file},kernel_args="loglevel=8 root=${root_vda} rootwait" \
          --network bridge:virbr0 \
          --graphics none
  }
  ```

  The problem seems to be that the device mapping was not going as expected, but adding `bus=virtio,format=qcow2` did the job.
Tutorial 3 - Modules and Build Config
A pretty much uneventful tutorial, very easy to follow. For some reason I had to restart the VM's network with

```sh
systemctl restart systemd-networkd
```

because `virsh net-dhcp-leases default` was failing to find any open connections even though the VM's networking was up.
Tutorial 4 - Character Devices
My only trouble in this tutorial was cross-compiling the host programs. Fedora doesn't ship a complete aarch64 cross-toolchain the way Debian does, so the easiest way around it (short of manually installing an arm64 sysroot) was to use the Zig cross-compiler - which is honestly a shame for the C/C++ compilers, as it should've been just as easy with them... Here is the simple command that made it work:

```sh
zig cc -target aarch64-linux-gnu read_prog.c -o read_prog
```
Tutorial 5 - Sending Patches By Email With Git
I only needed to follow this setup because of my fresh Linux install; I'd already had experience setting up git email support before, so it went quite smoothly using the Thunderbird client setup.
Tutorial 6 - Sending Patches With Git and a USP Email
In this tutorial I first tried following option A (using `git-credential-email`), but when I tried to authenticate I got an error saying that "DSL 2026 OAuth Proxy has not completed the Google verification process". Since I don't have access to the "DSL 2026 OAuth Proxy" project, I decided to go with option B - which involves using `email-oauth2-proxy`.
I didn't have Docker installed, so I needed to set it up first:

```sh
# Install dependencies.
sudo dnf install docker docker-compose

# Start and enable the daemon.
sudo systemctl start docker
sudo systemctl enable docker

# Add yourself to the auto-created docker group and log into the group.
sudo usermod -aG docker $USER
newgrp docker
```
Tutorial 7 - The iio_simple_dummy Anatomy
In this tutorial we set out to start understanding the inner workings of the `iio_dummy` module. The code itself is fairly standard C99, and the tutorial focuses on explaining the usage of `iio_chan_spec`, which is used to describe channels that simulate sensors. One interesting thing I inspected further was the memory allocation function `kzalloc`, which seems to be the recommended way to allocate in the kernel in the general case. `kzalloc` simply zeroes the memory region before returning it - a thing I've started doing in my own codebases, where all of my high-level allocation functions already do this. It was also interesting to see the go-to patterns used for error handling, which are quite down-to-earth and easy to follow.
Tutorial 8 - IIO Dummy module Experiment One: Play with iio_dummy
This tutorial was a bit frustrating in the part about installing the newly built modules. I managed to correctly install the necessary modules by doing the following:

```sh
sudo mkdir ${VM_DIR}/mount
sudo guestmount --rw --add "${VM_DIR}/arm64_img.qcow2" --mount /dev/sda2 "${VM_DIR}/mount"
sudo --preserve-env make -C "${IIO_TREE}" INSTALL_MOD_PATH="${VM_DIR}/mount" modules_install
sudo guestunmount "${VM_DIR}/mount"
```
I added analogous code as a utility function in the `activate.sh` script so that I don't have to remember how to do it every time it shows up.
In order to have `/mnt/iio_experiments/iio/devices` present, you have to have `iio_dummy` loaded, which contradicts the last step of the tutorial, where we actually unload `iio_dummy` before mounting configfs at `/mnt/iio_experiments/`.

In the `__iio_dummy_read_raw` procedure I figured that the correct approach was to substitute the `ret = IIO_VAL_INT` line with a simple `return IIO_VAL_INT`. Analogously, in `iio_dummy_read_raw` I had to substitute `ret = IIO_VAL_INT_PLUS_MICRO` with `return IIO_VAL_INT_PLUS_MICRO`.

The remainder of the tutorial was pretty much uneventful.
Moving To guard(mutex)(&lock) and General Locking Considerations
I chose a very simple but interesting patch to tackle: updating `iio/drivers/iio/temperature/mlx90614.c` to conform with the new guidance for handling locks in the Linux kernel codebase. The move consists of replacing instances of `mutex_lock(&lock)` and `mutex_unlock(&lock)` with the macro `guard(mutex)(&lock)`.
Let's understand what `guard` really means:

```c
#define CLASS(_name, var)                                             \
    class_##_name##_t var __cleanup(class_##_name##_destructor) =     \
        class_##_name##_constructor

#define guard(_name) \
    CLASS(_name, __UNIQUE_ID(guard))
```
This is a very interesting pattern: `mutex` is just part of the identifier for `class_mutex_t` and the corresponding procedures `class_mutex_constructor` and `class_mutex_destructor`. The `guard` macro generates a unique name for the auxiliary variable in order to avoid name collisions. The `__cleanup` macro resolves to the compiler-specific attribute `__cleanup__`, which is available in Clang and GCC as `__attribute__((__cleanup__(cleanup_proc)))`, where `cleanup_proc` is the name of the cleanup procedure you want the compiler to insert at the end of the scope. One problem with this construct is that it isn't supported by all compilers - MSVC, for one, doesn't have this attribute.
As an example, suppose we have a shared memory arena that will temporarily allocate memory and free that memory at the end of the scope. Since the arena is shared, you don't want anyone touching the arena while you're doing this temporary allocation, otherwise you may cause inconsistencies in the arena offset. To do that, you would write something along these lines:
```c
// Code prior to scope ...
{
    guard(work_arena)(&arena);
    guard(mutex)(&arena.lock);
    // Code within the scope...
}
// Code after scope ...
```
In GCC, this would be roughly equivalent to the following code:
```c
// Code prior to scope ...
{
    class_work_arena_t guard_0 __attribute__((__cleanup__(class_work_arena_destructor))) =
        class_work_arena_constructor(&arena);
    class_mutex_t guard_1 __attribute__((__cleanup__(class_mutex_destructor))) =
        class_mutex_constructor(&arena.lock);
    // Code within the scope...

    // Inserted by the compiler at the end of the scope,
    // in reverse declaration order:
    class_mutex_destructor(&guard_1);
    class_work_arena_destructor(&guard_0);
}
// Code after scope ...
```
After optimisations, the code will indeed be equivalent. However, in non-optimised builds, the compiler will emit a pointer lookup for the destructor procedures, creating one level of indirection in each cleanup call. The cost of this construct is relatively low compared to the cost of the inevitably forgotten cleanup calls.
In the case of `mlx90614.c`, all mutex lock/unlock pairs follow the exact same pattern:
```c
{
    // Code before... (A)
    mutex_lock(&data->lock);
    ret = /* Call some function */;
    mutex_unlock(&data->lock);
    // Code after... (B)
}
```
Notice that since we don't have an enclosing scope for this operation, simply using `guard` would yield the following non-equivalent code, where the cleanup occurs only after the execution of code B:
```c
{
    // Code before... (A)
    guard(mutex)(&data->lock);
    ret = /* Call some function */;
    // Code after... (B)
    // -> Only here would data->lock be released (cleanup)
}
```
If code B requires the use of `data->lock`, then this substitution shouldn't be done at all, as the rewritten version fails to release the lock prior to executing code B. Another case where this substitution isn't desired: if code B is expensive, `data->lock` will stay locked up until the end of B, possibly causing unnecessary stalls in other threads. One very simple way of generating code equivalent to the original while still using `guard` is:
```c
{
    // Code before... (A)
    {
        guard(mutex)(&data->lock);
        ret = /* Call some function */;
        // -> Here data->lock is released (cleanup)
    }
    // Code after... (B)
}
```
This, in my opinion, should be the way to go in all cases for the patterns found in `mlx90614.c`: it's equivalent, readable, and only requires a very simple and localised change. On the other hand, I'm well aware that many programmers won't agree with the creation of this additional scope within their code, so unfortunately we'll have to review every mutex lock/unlock pair as a separate case.
For reference, the commit SHA I'm currently on is d2a4ec19d2a2e54c23b5180e939994d3da4a6b91 in branch togreg.
Code Analysis
Fortunately, in the case of `mlx90614.c`, the only instances where the change to `guard` could yield deadlocks or performance problems are those that immediately call `mlx90614_power_put`, plus the procedure `mlx90614_sleep`, which uses locks. All other cases have very simple and small logic after the end of the scope, allowing the change to `guard` without much hassle.
Investigating `mlx90614_power_put` further, we find the following:
```c
static inline void pm_runtime_mark_last_busy(struct device *dev)
{
    WRITE_ONCE(dev->power.last_busy, ktime_get_mono_fast_ns());
}

static inline int __pm_runtime_put_autosuspend(struct device *dev)
{
    return __pm_runtime_suspend(dev, RPM_GET_PUT | RPM_ASYNC | RPM_AUTO);
}

static inline int pm_runtime_put_autosuspend(struct device *dev)
{
    pm_runtime_mark_last_busy(dev);
    return __pm_runtime_put_autosuspend(dev);
}

static void mlx90614_power_put(struct mlx90614_data *data)
{
    if (!data->wakeup_gpio)
        return;

    pm_runtime_put_autosuspend(&data->client->dev);
}
```
Within `pm_runtime_put_autosuspend` we call `pm_runtime_mark_last_busy`, which sets `dev->power.last_busy` to the current time. The only possibly expensive call here is `__pm_runtime_suspend`:
```c
int __pm_runtime_suspend(struct device *dev, int rpmflags)
{
    unsigned long flags;
    int retval;

    if (rpmflags & RPM_GET_PUT) {
        retval = rpm_drop_usage_count(dev);
        if (retval < 0) {
            return retval;
        } else if (retval > 0) {
            trace_rpm_usage(dev, rpmflags);
            return 0;
        }
    }

    might_sleep_if(!(rpmflags & RPM_ASYNC) && !dev->power.irq_safe);

    spin_lock_irqsave(&dev->power.lock, flags);
    retval = rpm_suspend(dev, rpmflags);
    spin_unlock_irqrestore(&dev->power.lock, flags);

    return retval;
}
```
We can see some sleep calls, which should put us on alert, so let's analyse all the function calls made within:

- `rpm_drop_usage_count` manipulates `dev->power.usage_count` atomically and is very cheap overall.
- `trace_rpm_usage` is compile-time generated, and produces a no-op when tracing is disabled:

  ```c
  DEFINE_EVENT(rpm_internal,                          /* Template */
      rpm_usage,                                      /* Name */
      TP_PROTO(struct device *dev, int flags),        /* Procedure prototype */
      TP_ARGS(dev, flags)
  );
  ```

  Out of curiosity, this `DEFINE_EVENT` macro eventually resolves into calling `__DECLARE_TRACE`, which is defined as follows:

  ```c
  #define __DECLARE_TRACE(name, proto, args, cond, data_proto)          \
      __DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), PARAMS(data_proto)) \
      static inline void __do_trace_##name(proto)                       \
      {                                                                 \
          TRACEPOINT_CHECK(name)                                        \
          if (cond) {                                                   \
              guard(srcu_fast_notrace)(&tracepoint_srcu);               \
              __DO_TRACE_CALL(name, TP_ARGS(args));                     \
          }                                                             \
      }                                                                 \
      static inline void trace_##name(proto)                            \
      {                                                                 \
          if (static_branch_unlikely(&__tracepoint_##name.key))         \
              __do_trace_##name(args);                                  \
          if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {                   \
              WARN_ONCE(!rcu_is_watching(),                             \
                        "RCU not watching for tracepoint");             \
          }                                                             \
      }
  ```

  Upon further inspection, the produced functions are relatively cheap and won't cause us much trouble.
- `might_sleep_if` will never result in a sleep, since `__pm_runtime_suspend` is called with `RPM_ASYNC` enabled in `rpmflags`.
- `rpm_suspend` is a pretty big function; however, since our static `rpmflags` are `RPM_ASYNC | RPM_AUTO`, we can deduce that the procedure will simply schedule a suspend event on the separate worker queue `pm_wq`.
Summarizing
In order to summarize the discussion above and concisely explain why the changes do not pose any problems to the codebase, I wrote this git commit message:
```
iio: temperature: mlx90614: use guard(mutex) for EEPROM access locking

Replace mutex_lock()/mutex_unlock() pairs with guard(mutex) from
cleanup.h for cleaner and safer mutex handling. The lock protects
EEPROM access across five call sites in mlx90614_read_raw(),
mlx90614_write_raw(), and mlx90614_sleep().

In all cases, the code between mutex_unlock() and the end of scope
is either trivial computation, a return statement, or a call to
mlx90614_power_put() (which only schedules a deferred autosuspend
via pm_runtime_put_autosuspend()). The slightly extended lock scope
is negligible compared to the existing hold times, which include
EEPROM write cycles with 40ms sleeps.
```
The corresponding patch looks like this:
```diff
diff --git a/drivers/iio/temperature/mlx90614.c b/drivers/iio/temperature/mlx90614.c
index 1ad21b73e1b4..27464e4cc286 100644
--- a/drivers/iio/temperature/mlx90614.c
+++ b/drivers/iio/temperature/mlx90614.c
@@ -23,6 +23,7 @@
  */
 #include <linux/bitfield.h>
+#include <linux/cleanup.h>
 #include <linux/delay.h>
 #include <linux/err.h>
 #include <linux/gpio/consumer.h>
@@ -296,10 +297,9 @@ static int mlx90614_read_raw(struct iio_dev *indio_dev,
 		if (ret < 0)
 			return ret;
 
-		mutex_lock(&data->lock);
+		guard(mutex)(&data->lock);
 		ret = i2c_smbus_read_word_data(data->client,
 					       chip_info->op_eeprom_emissivity);
-		mutex_unlock(&data->lock);
 		mlx90614_power_put(data);
 
 		if (ret < 0)
@@ -319,10 +319,9 @@ static int mlx90614_read_raw(struct iio_dev *indio_dev,
 		if (ret < 0)
 			return ret;
 
-		mutex_lock(&data->lock);
+		guard(mutex)(&data->lock);
 		ret = i2c_smbus_read_word_data(data->client,
 					       chip_info->op_eeprom_config1);
-		mutex_unlock(&data->lock);
 		mlx90614_power_put(data);
 
 		if (ret < 0)
@@ -358,10 +357,9 @@ static int mlx90614_write_raw(struct iio_dev *indio_dev,
 		if (ret < 0)
 			return ret;
 
-		mutex_lock(&data->lock);
+		guard(mutex)(&data->lock);
 		ret = mlx90614_write_word(data->client,
 					  chip_info->op_eeprom_emissivity, val);
-		mutex_unlock(&data->lock);
 		mlx90614_power_put(data);
 
 		return ret;
@@ -373,10 +371,9 @@ static int mlx90614_write_raw(struct iio_dev *indio_dev,
 		if (ret < 0)
 			return ret;
 
-		mutex_lock(&data->lock);
+		guard(mutex)(&data->lock);
 		ret = mlx90614_iir_search(data->client,
 					  val * 100 + val2 / 10000);
-		mutex_unlock(&data->lock);
 		mlx90614_power_put(data);
 
 		return ret;
@@ -476,12 +473,11 @@ static int mlx90614_sleep(struct mlx90614_data *data)
 
 	dev_dbg(&data->client->dev, "Requesting sleep");
 
-	mutex_lock(&data->lock);
+	guard(mutex)(&data->lock);
 	ret = i2c_smbus_xfer(data->client->adapter, data->client->addr,
 			     data->client->flags | I2C_CLIENT_PEC,
 			     I2C_SMBUS_WRITE, chip_info->op_sleep,
 			     I2C_SMBUS_BYTE, NULL);
-	mutex_unlock(&data->lock);
 
 	return ret;
 }
```
And the saga continues... stay tuned for the next update :)