MAC5856 - Road To The Linux Kernel
This post is a collection of reports on the tutorials provided for the MAC5856 class for Open Source Development. As such, this post will be continuously edited to account for my latest experiences with the tutorials.
More specifically, the tutorial pages provide a guide on how to set up a development environment for building and developing the Linux kernel.
After years of being an Arch Linux user, I opted to go with a Fedora installation for a change. This required some small changes with respect to the tutorials - all of them are documented below.
Tutorial 1 - QEMU and Libvirt Setup
This first tutorial focuses on laying the groundwork for the development setup, such as installing all the packages necessary for virtualisation.
- Although the tutorial uses the `libvirt-qemu` user for the directory ownership, I had to go with the `qemu` user in Fedora.
- To avoid having to use `sudo` every time I use `virsh`, I opted for creating a tiny wrapper `virshh` in the `activate.sh` script that solves this:

  ```sh
  function virshh() {
      virsh --connect "qemu:///system" "$@"
  }
  export -f virshh
  ```

  Now instead of using `virsh` I use `virshh` and it's working flawlessly so far.
- As many other students reported, I also had to install the `openssh-server` package in the VM:

  ```sh
  apt update && apt install openssh-server
  systemctl enable ssh && systemctl start ssh && systemctl status ssh
  ```
Tutorial 2 - Building and Booting The Kernel
- I wasn't able to use `rsync` to fetch the `vm_mod_list` we created in the last tutorial. For that I had to install `rsync` in the VM first.
- I had a very annoying problem when creating the new VM: the system would never completely boot. Instead, `virt-install` would halt very early and never finish the creation of the VM itself. After many attempts, this version worked:

  ```sh
  export img_file="${VM_DIR}/arm64_img.qcow2"
  export root_vda="/dev/vda2"
  export kernel_file="${IIO_TREE}/arch/arm64/boot/Image"
  export initrd_file="${BOOT_DIR}/initrd.img-6.1.0-44-arm64"
  export vm_name="arm64"

  function create-vm-virsh() {
      sudo virt-install \
          --name ${vm_name} \
          --memory 2048 \
          --arch aarch64 \
          --machine virt \
          --osinfo detect=on,require=off \
          --import \
          --features acpi=off \
          --disk path=${img_file},bus=virtio,format=qcow2 \
          --boot kernel=${kernel_file},initrd=${initrd_file},kernel_args="loglevel=8 root=${root_vda} rootwait" \
          --network bridge:virbr0 \
          --graphics none
  }
  ```

  The problem seems to be that the device mapping was not going as expected, but adding `bus=virtio,format=qcow2` did the job.
Tutorial 3 - Modules and Build Config
A pretty much uneventful tutorial, very easy to follow. For some reason I had to restart the VM's network with

```sh
systemctl restart systemd-networkd
```

because `virsh net-dhcp-leases default` was failing to find any open connections even though the VM's networking was up.
Tutorial 4 - Character Devices
My only trouble in this tutorial was cross-compiling the host programs. Fedora doesn't ship a complete aarch64 cross-toolchain the way Debian does, so the easiest way around it (short of manually installing an arm64 sysroot) was to use the Zig cross-compiler - which is honestly a shame for the C/C++ compilers, as it should've been just as easy with them... Here is the simple command that made it work:

```sh
zig cc -target aarch64-linux-gnu read_prog.c -o read_prog
```
Tutorial 5 - Sending Patches By Email With Git
I only needed to follow this setup because of my fresh Linux install; I'd already had experience setting up git email support before, so it went quite smoothly using the Thunderbird client setup.
Tutorial 6 - Sending Patches With Git and a USP Email
In this tutorial I first tried following option A (using `git-credential-email`), but when I tried to authenticate I got an error saying that "DSL 2026 OAuth Proxy has not completed the Google verification process". Since I don't have access to the "DSL 2026 OAuth Proxy" project, I decided to go with option B - which involves using `email-oauth2-proxy`.
I didn't have Docker installed, so I needed to set it up first:

```sh
# Install dependencies.
sudo dnf install docker docker-compose

# Start and enable the daemon.
sudo systemctl start docker
sudo systemctl enable docker

# Add yourself to the auto-created docker group and log into the group.
sudo usermod -aG docker $USER
newgrp docker
```
Tutorial 7 - The iio_simple_dummy Anatomy
In this tutorial we set out to start understanding the inner workings of the `iio_dummy` module. The code itself is fairly standard C99, and the tutorial focuses on explaining the usage of `iio_chan_spec`, which is used to describe channels that simulate sensors. One interesting thing I inspected further was the memory allocation function `kzalloc`, which seems to be the recommended way to allocate in the kernel in the general case. `kzalloc` simply zeroes the memory region before returning it - a thing I've started doing in my own codebases, where all of my high-level allocation functions already do this. It was also interesting to see the go-to patterns used for error handling, which are quite down-to-earth and easy to follow.
Tutorial 8 - IIO Dummy module Experiment One: Play with iio_dummy
This tutorial was a bit frustrating in the part about installing the newly built modules. I managed to correctly install the necessary modules by doing the following:

```sh
sudo mkdir ${VM_DIR}/mount
sudo guestmount --rw --add "${VM_DIR}/arm64_img.qcow2" --mount /dev/sda2 "${VM_DIR}/mount"
sudo --preserve-env make -C "${IIO_TREE}" INSTALL_MOD_PATH="${VM_DIR}/mount" modules_install
sudo guestunmount "${VM_DIR}/mount"
```
I added analogous code as a utility function in the `activate.sh` script so that I don't have to remember how to do it every time it shows up.
In order to have `/mnt/iio_experiments/iio/devices` present, you have to have `iio_dummy` loaded, which contradicts the last step of the tutorial, where we actually unload `iio_dummy` before mounting configfs at `/mnt/iio_experiments/`.

In the `__iio_dummy_read_raw` procedure I figured that the correct approach was to substitute the `ret = IIO_VAL_INT` line with a simple `return IIO_VAL_INT`. Analogously, in `iio_dummy_read_raw` I had to substitute `ret = IIO_VAL_INT_PLUS_MICRO` with `return IIO_VAL_INT_PLUS_MICRO`.

The remainder of the tutorial was pretty much uneventful.
Moving To guard(mutex)(&lock) and General Locking Considerations
I chose a very simple but interesting patch to tackle: updating `iio/drivers/iio/temperature/mlx90614.c` to conform with the new guidance for handling locks in the Linux kernel codebase. The move consists of replacing instances of `mutex_lock(&lock)` and `mutex_unlock(&lock)` with the macro `guard(mutex)(&lock)`.
Let's understand what `guard` really means:

```c
#define CLASS(_name, var)                                             \
    class_##_name##_t var __cleanup(class_##_name##_destructor) =     \
        class_##_name##_constructor

#define guard(_name) \
    CLASS(_name, __UNIQUE_ID(guard))
```
This is a very interesting pattern: `mutex` is just part of the identifier for `class_mutex_t` and the corresponding procedures `class_mutex_constructor` and `class_mutex_destructor`. The `guard` macro generates a unique name for the auxiliary variable in order to avoid name collisions. The `__cleanup` macro resolves to the compiler-specific attribute `__cleanup__`, which is available in Clang and GCC as `__attribute__((__cleanup__(cleanup_proc)))`, where `cleanup_proc` is the name of the cleanup procedure you want the compiler to insert at the end of the scope. One problem with this construct is that it isn't supported by all compilers - MSVC, for one, doesn't have this attribute.
As an example, suppose we have a shared memory arena that will temporarily allocate memory and free that memory at the end of the scope. Since the arena is shared, you don't want anyone touching the arena while you're doing this temporary allocation, otherwise you may cause inconsistencies in the arena offset. To do that, you would write something along these lines:
```c
// Code prior to scope ...
{
    guard(work_arena)(&arena);
    guard(mutex)(&arena.lock);
    // Code within the scope...
}
// Code after scope ...
```
In GCC, this would be roughly equivalent to the following code:
```c
// Code prior to scope ...
{
    class_work_arena_t guard_0 __attribute__((__cleanup__(class_work_arena_destructor))) =
        class_work_arena_constructor(&arena);
    class_mutex_t guard_1 __attribute__((__cleanup__(class_mutex_destructor))) =
        class_mutex_constructor(&arena.lock);
    // Code within the scope...

    // Inserted by the compiler at the end of the scope,
    // in reverse declaration order:
    class_mutex_destructor(&guard_1);
    class_work_arena_destructor(&guard_0);
}
// Code after scope ...
```
After optimisations, the code will indeed be equivalent. However, in non-optimised builds, the compiler will emit a pointer lookup for the destructor procedures, creating one level of indirection in each cleanup call. The cost of this construct is relatively low compared to the cost of the inevitably forgotten cleanup calls.
In the case of `mlx90614.c`, all mutex lock/unlock pairs follow the exact same pattern:
```c
{
    // Code before... (A)
    mutex_lock(&data->lock);
    ret = /* Call some function */;
    mutex_unlock(&data->lock);
    // Code after... (B)
}
```
Notice that since we don't have an enclosing scope for this operation, simply using `guard` would yield the following non-equivalent code, where the cleanup occurs only after the execution of code B:
```c
{
    // Code before... (A)
    guard(mutex)(&data->lock);
    ret = /* Call some function */;
    // Code after... (B)
    // -> Only here would data->lock be released (cleanup)
}
```
If code B requires the use of `data->lock`, then this substitution shouldn't be done at all, as the rewritten version fails to release the lock prior to executing code B. Another case where this substitution isn't desired: if code B is expensive, `data->lock` will stay locked up until the end of B, possibly causing unnecessary stalls in other threads. One very simple way of generating code equivalent to the original while still using `guard` is:
```c
{
    // Code before... (A)
    {
        guard(mutex)(&data->lock);
        ret = /* Call some function */;
        // -> Here data->lock is released (cleanup)
    }
    // Code after... (B)
}
```
This, in my opinion, should be the way to go in all cases for the patterns found in `mlx90614.c`: it's equivalent, readable, and only requires a very simple and localised change. On the other hand, I'm well aware that many programmers won't agree with the creation of this additional scope within their code, so unfortunately we'll have to review every mutex lock/unlock pair as a separate case.
For reference, the commit SHA I'm currently on is d2a4ec19d2a2e54c23b5180e939994d3da4a6b91 in branch togreg.
Code Analysis
Fortunately, in the case of `mlx90614.c`, the only instances where the change to `guard` could yield deadlocks or performance problems are those that immediately call `mlx90614_power_put`, plus the procedure `mlx90614_sleep`, which uses locks. All other cases have very simple and small logic after the end of the scope, allowing the change to `guard` without much hassle.
Investigating `mlx90614_power_put` further, we find the following:
```c
static inline void pm_runtime_mark_last_busy(struct device *dev)
{
    WRITE_ONCE(dev->power.last_busy, ktime_get_mono_fast_ns());
}

static inline int __pm_runtime_put_autosuspend(struct device *dev)
{
    return __pm_runtime_suspend(dev, RPM_GET_PUT | RPM_ASYNC | RPM_AUTO);
}

static inline int pm_runtime_put_autosuspend(struct device *dev)
{
    pm_runtime_mark_last_busy(dev);
    return __pm_runtime_put_autosuspend(dev);
}

static void mlx90614_power_put(struct mlx90614_data *data)
{
    if (!data->wakeup_gpio)
        return;

    pm_runtime_put_autosuspend(&data->client->dev);
}
```
Within `pm_runtime_put_autosuspend` we call `pm_runtime_mark_last_busy`, which sets `dev->power.last_busy` to the current time. The only possibly expensive call here is `__pm_runtime_suspend`:
```c
int __pm_runtime_suspend(struct device *dev, int rpmflags)
{
    unsigned long flags;
    int retval;

    if (rpmflags & RPM_GET_PUT) {
        retval = rpm_drop_usage_count(dev);
        if (retval < 0) {
            return retval;
        } else if (retval > 0) {
            trace_rpm_usage(dev, rpmflags);
            return 0;
        }
    }

    might_sleep_if(!(rpmflags & RPM_ASYNC) && !dev->power.irq_safe);

    spin_lock_irqsave(&dev->power.lock, flags);
    retval = rpm_suspend(dev, rpmflags);
    spin_unlock_irqrestore(&dev->power.lock, flags);

    return retval;
}
```
We can see some sleep calls, which should put us on alert, so let's analyse all the function calls made within:

- `rpm_drop_usage_count` manipulates `dev->power.usage_count` atomically and is very cheap overall.
- `trace_rpm_usage` is compile-time generated, and produces a no-op when tracing is disabled:

  ```c
  DEFINE_EVENT(rpm_internal,                          /* Template */
      rpm_usage,                                      /* Name */
      TP_PROTO(struct device *dev, int flags),        /* Procedure prototype */
      TP_ARGS(dev, flags)
  );
  ```

  Out of curiosity, this `DEFINE_EVENT` macro eventually resolves into calling `__DECLARE_TRACE`, which is defined as follows:

  ```c
  #define __DECLARE_TRACE(name, proto, args, cond, data_proto)          \
      __DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), PARAMS(data_proto)) \
      static inline void __do_trace_##name(proto)                       \
      {                                                                 \
          TRACEPOINT_CHECK(name)                                        \
          if (cond) {                                                   \
              guard(srcu_fast_notrace)(&tracepoint_srcu);               \
              __DO_TRACE_CALL(name, TP_ARGS(args));                     \
          }                                                             \
      }                                                                 \
      static inline void trace_##name(proto)                            \
      {                                                                 \
          if (static_branch_unlikely(&__tracepoint_##name.key))         \
              __do_trace_##name(args);                                  \
          if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {                   \
              WARN_ONCE(!rcu_is_watching(),                             \
                        "RCU not watching for tracepoint");             \
          }                                                             \
      }
  ```

  Upon further inspection, the produced functions are relatively cheap and won't cause us much trouble.
- `might_sleep_if` will never result in a sleep, since `__pm_runtime_suspend` is called with `RPM_ASYNC` enabled in `rpmflags`.
- `rpm_suspend` is a pretty big function; however, since our static `rpmflags` are `RPM_ASYNC | RPM_AUTO`, we can deduce that the procedure will simply schedule a suspend event on the separate worker queue `pm_wq`.
Summarizing
In order to summarize the discussion above and concisely explain why the changes do not pose any problems to the codebase, I wrote this git commit message:
```
iio: temperature: mlx90614: use guard(mutex) for EEPROM access locking

Replace mutex_lock()/mutex_unlock() pairs with guard(mutex) from
cleanup.h for cleaner and safer mutex handling. The lock protects
EEPROM access across five call sites in mlx90614_read_raw(),
mlx90614_write_raw(), and mlx90614_sleep().

In all cases, the code between mutex_unlock() and the end of scope
is either trivial computation, a return statement, or a call to
mlx90614_power_put() (which only schedules a deferred autosuspend
via pm_runtime_put_autosuspend()). The slightly extended lock scope
is negligible compared to the existing hold times, which include
EEPROM write cycles with 40ms sleeps.
```
The corresponding patch looks like this:
```diff
diff --git a/drivers/iio/temperature/mlx90614.c b/drivers/iio/temperature/mlx90614.c
index 1ad21b73e1b4..27464e4cc286 100644
--- a/drivers/iio/temperature/mlx90614.c
+++ b/drivers/iio/temperature/mlx90614.c
@@ -23,6 +23,7 @@
  */
 #include <linux/bitfield.h>
+#include <linux/cleanup.h>
 #include <linux/delay.h>
 #include <linux/err.h>
 #include <linux/gpio/consumer.h>
@@ -296,10 +297,9 @@ static int mlx90614_read_raw(struct iio_dev *indio_dev,
 		if (ret < 0)
 			return ret;
 
-		mutex_lock(&data->lock);
+		guard(mutex)(&data->lock);
 		ret = i2c_smbus_read_word_data(data->client,
 					       chip_info->op_eeprom_emissivity);
-		mutex_unlock(&data->lock);
 		mlx90614_power_put(data);
 
 		if (ret < 0)
@@ -319,10 +319,9 @@ static int mlx90614_read_raw(struct iio_dev *indio_dev,
 		if (ret < 0)
 			return ret;
 
-		mutex_lock(&data->lock);
+		guard(mutex)(&data->lock);
 		ret = i2c_smbus_read_word_data(data->client,
 					       chip_info->op_eeprom_config1);
-		mutex_unlock(&data->lock);
 		mlx90614_power_put(data);
 
 		if (ret < 0)
@@ -358,10 +357,9 @@ static int mlx90614_write_raw(struct iio_dev *indio_dev,
 		if (ret < 0)
 			return ret;
 
-		mutex_lock(&data->lock);
+		guard(mutex)(&data->lock);
 		ret = mlx90614_write_word(data->client,
 					  chip_info->op_eeprom_emissivity, val);
-		mutex_unlock(&data->lock);
 		mlx90614_power_put(data);
 
 		return ret;
@@ -373,10 +371,9 @@ static int mlx90614_write_raw(struct iio_dev *indio_dev,
 		if (ret < 0)
 			return ret;
 
-		mutex_lock(&data->lock);
+		guard(mutex)(&data->lock);
 		ret = mlx90614_iir_search(data->client,
 					  val * 100 + val2 / 10000);
-		mutex_unlock(&data->lock);
 		mlx90614_power_put(data);
 
 		return ret;
@@ -476,12 +473,11 @@ static int mlx90614_sleep(struct mlx90614_data *data)
 
 	dev_dbg(&data->client->dev, "Requesting sleep");
 
-	mutex_lock(&data->lock);
+	guard(mutex)(&data->lock);
 	ret = i2c_smbus_xfer(data->client->adapter, data->client->addr,
 			     data->client->flags | I2C_CLIENT_PEC,
 			     I2C_SMBUS_WRITE, chip_info->op_sleep,
 			     I2C_SMBUS_BYTE, NULL);
-	mutex_unlock(&data->lock);
 
 	return ret;
 }
```
And the saga continues... stay tuned for the next update :)