GSoC Final Report - 洛的藏书阁

Looking back, I remember writing my proposal, and I can’t help feeling a bit wistful. Endings always bring an unexpected touch of sadness. After three months, my GSoC project has come to an end.

The Motivation of The Project

The git-fsck(1) command is mainly used to check the consistency of the object database; it has no dedicated checks for the ref database. Although git-fsck(1) implicitly checks some properties of the ref database when checking connectivity, these checks aren’t sufficient to ensure that all refs are consistent, as the linked report shows.

The goal of this small GSoC project is to establish the infrastructure for the ref consistency check, enabling developers to easily add checks for both the files backend and the reftable backend. For more detailed information, please refer to the proposal.

The Patches

I began my work by implementing the infrastructure. However, during the review process, the design underwent significant changes. My main focus was on the following tasks:

Refactoring the fsck error messages to make them generic, allowing both the object database and the ref database to use the same fsck error message interfaces to avoid repetition.
Utilizing the existing polymorphism provided by the ref_storage_be structure. Each backend provides its own function pointer for the check, which brings a lot of flexibility.
Designing extensible interfaces for checking ref consistency in the files backend.
Adding a ref name consistency check for the files backend.
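To make the polymorphism point concrete, here is a minimal sketch of how a per-backend function pointer could dispatch the check. It is a simplified stand-in, not Git's actual code: the struct layouts, the bad_refs field, and the function bodies are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for Git's ref store types. */
struct ref_store;

typedef int fsck_fn(struct ref_store *store);

struct ref_storage_be {
	const char *name;
	fsck_fn *fsck;          /* each backend supplies its own checker */
};

struct ref_store {
	const struct ref_storage_be *be;
	int bad_refs;           /* illustrative: problems a real scan would find */
};

/* Files backend: a real check would walk the ref files on disk. */
int files_fsck(struct ref_store *store)
{
	return store->bad_refs ? -1 : 0;
}

/* Reftable backend: placeholder, no checks implemented in this sketch. */
int reftable_fsck(struct ref_store *store)
{
	(void)store;
	return 0;
}

const struct ref_storage_be files_be = { "files", files_fsck };
const struct ref_storage_be reftable_be = { "reftable", reftable_fsck };

/* Generic entry point: callers never need to know which backend is in use. */
int refs_fsck(struct ref_store *store)
{
	assert(store && store->be && store->be->fsck);
	return store->be->fsck(store);
}
```

With this shape, adding a check for a new backend means filling in one more function pointer; the caller of refs_fsck() stays unchanged.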

The links to the final merged patches are given below:

This series has finally been merged into the master branch.

After establishing the infrastructure, I continued implementing additional checks for the files backend, as shown in the following patch links:

However, this series is still under review, and I plan to continue following up after GSoC.

Challenges

Design Challenge

Implementing something new is much more complex than refactoring. When I design something myself, I need to consider many factors, and during the implementation process I made many mistakes, mainly in two areas:

Over-engineering: I was always concerned about the extensibility of the implementation, worried that poor design would require future refactoring. However, I didn’t realize that this could introduce noise into the code. We should focus solely on the features we need right now.
Inability to balance between building something that can be easily extended in the future and something that is specific to the task at hand.

Here are the challenges I encountered during the implementation process:

Whether We Need To Refactor “fsck_options”

There are many fields in fsck_options that are specific to the object database. My first mistake was trying to make fsck_options contain the general options and sub-structures, which is what I attempted in the previous patch.

git-fsck(1) focuses on consistency checks for the object database. It relies on “fsck_options” to interact with the fsck error levels. However, “fsck_options” is aimed at checking the object database and contains many fields that are only relevant to it.

In order to add ref operations, create a new struct named “fsck_refs_options” and a new struct named “fsck_objs_options”. Move the object-related fields from “fsck_options” to “fsck_objs_options”. Restructure “fsck_options” into three parts:

The “fsck_refs_options”.

The “fsck_objs_options”.

The common settings for both refs and objects. Because we leave the common settings in “fsck_options”, the setup process can be fully reused without any code changes.

While this approach may seem natural, it introduced a lot of complexity because a significant amount of existing code accepts struct fsck_options * as a parameter. As a result, we had to touch a lot of code that was not directly related to our main goal. In the final implementation, I simply added a verbose field to fsck_options.
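The trade-off can be sketched roughly as follows. This is only an illustration: apart from the struct names quoted above and the verbose field, every field and function name here is an assumption and does not reflect the real fsck_options layout.

```c
#include <assert.h>

/* Rejected layout: nested sub-structures (field names are illustrative). */
struct fsck_refs_options { int check_names; };
struct fsck_objs_options { int skip_count; };

struct split_fsck_options {
	int strict;                     /* common setting */
	struct fsck_refs_options refs;
	struct fsck_objs_options objs;
};

/* Final layout: existing fields stay where they are, one field is added. */
struct flat_fsck_options {
	int strict;
	int skip_count;                 /* object-specific, but left in place */
	int verbose;                    /* the only addition */
};

/* An existing helper written against the flat layout needs no change. */
int objects_to_skip(const struct flat_fsck_options *o)
{
	assert(o);
	return o->skip_count;
}

/* With the split layout, every such helper must reach into ->objs instead. */
int objects_to_skip_split(const struct split_fsck_options *o)
{
	assert(o);
	return o->objs.skip_count;
}
```

Multiply the second helper by every function that takes the options struct, and the amount of unrelated churn in the split design becomes clear.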

Junio gave me a wonderful suggestion:

Just like premature optimization is bad, premature factoring and over-modularization is bad.

How to Reuse “report” Function

To adapt to the existing fsck error levels, my initial approach was to create a new fsck_report_ref function that was separate from the object report function report. However, Patrick, Karthik, and Junio disagreed with this design, suggesting that I should reuse the report function.

The report function is closely tied to object reporting. At the time, I considered adding parameters to the report function and its corresponding callback function error_func to make their prototypes consistent. However, I was not satisfied with this solution because we couldn’t predict whether additional parameters would be needed for other ref checks in the future, which would result in poor extensibility.

But Patrick gave me a wonderful suggestion which eventually solved the problem:

A better design would likely be to make error_func() receive a void* pointer such that error_func() and then have the respective subsystems provide a function that knows to format the message while receiving either a struct fsck_object_report * or a struct fsck_ref_report *.
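A minimal sketch of that idea, under stated assumptions: the two report struct names come from the quote, but their fields, the callback signature, the function names, and the choice to format into a buffer (rather than print) are all simplifications made for illustration, not Git's actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum fsck_msg_type { FSCK_WARN, FSCK_ERROR };

/* Payloads: fields are illustrative, not Git's real definitions. */
struct fsck_object_report { const char *oid_hex; const char *object_type; };
struct fsck_ref_report { const char *path; };

/* One callback type for both subsystems; the payload is opaque. */
typedef int (*fsck_error)(void *fsck_report, enum fsck_msg_type type,
			  const char *message, char *out, size_t outlen);

static const char *type_name(enum fsck_msg_type type)
{
	return type == FSCK_ERROR ? "error" : "warning";
}

/* Object subsystem: knows the payload is a struct fsck_object_report. */
int format_object_report(void *fsck_report, enum fsck_msg_type type,
			 const char *message, char *out, size_t outlen)
{
	const struct fsck_object_report *report = fsck_report;
	return snprintf(out, outlen, "%s in %s %s: %s", type_name(type),
			report->object_type, report->oid_hex, message);
}

/* Ref subsystem: knows the payload is a struct fsck_ref_report. */
int format_ref_report(void *fsck_report, enum fsck_msg_type type,
		      const char *message, char *out, size_t outlen)
{
	const struct fsck_ref_report *report = fsck_report;
	return snprintf(out, outlen, "%s: %s: %s", type_name(type),
			report->path, message);
}

/* Shared reporting path: it only forwards the opaque payload. */
int report_problem(fsck_error error_func, void *fsck_report,
		   enum fsck_msg_type type, const char *message,
		   char *out, size_t outlen)
{
	assert(error_func && fsck_report);
	return error_func(fsck_report, type, message, out, outlen);
}
```

Because the shared path never inspects the payload, a future ref check that needs more context only has to extend struct fsck_ref_report and its formatter; the common prototype stays the same.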

Long Review Duration

Building the ref consistency check infrastructure posed challenges not only in terms of design and coding but also due to the lengthy review process. This created a significant mental burden for me during the middle of GSoC. This is what I have recorded in my GSoC Week 7 blog:

The recent challenges I’ve encountered mainly stem from two aspects. Over the past two weeks, I’ve felt mentally exhausted because I haven’t received much positive feedback. Since May 30th, my first patch is still under review. Sometimes, I can’t help but feel the pressure from my peers. Seeing other GSoC participants successfully merge several patches does indeed make me feel pressured. Therefore, I realize that I must learn how to adjust my mindset during prolonged review periods.

After GSoC

After GSoC, I decided to remain involved with the Git community to continue the work I need to complete. My tentative roadmap is as follows:

Implement the pack-refs consistency check for the files backend.
Implement the reflogs consistency check for the files backend.
Implement the consistency check for the reftable backend.
Separate the object database check logic from git-fsck(1).
Enhance the git-fsck(1) command to allow users to easily disable subprocess checks, providing greater flexibility.

There is a lot to accomplish, which is exciting!

Closing Remarks

During my GSoC journey, besides improving my coding skills, I believe the greatest gain was strengthening connections between people. I established a good relationship with my two mentors, Patrick and Karthik. Initially, I rarely conversed in English with others, and during my first video meeting with Patrick, I was quite nervous, worried about my limited English speaking skills.

Additionally, I feel that I have built a good relationship with the Git community. Through an open-source project, I experienced the charm of open source, as it connects people who might not have otherwise crossed paths in life.

I am especially grateful to my two mentors from GitLab, Patrick and Karthik. They provided me with a lot of guidance, embodying the role of a teacher who imparts knowledge and resolves doubts. I also want to thank Junio, Eric, and Justin for their meticulous and detailed work during the code review process, which made the final implementation exceptionally elegant.

I also feel quite proud of myself for being able to implement a new feature for foundational software used by so many people, something I once thought could only happen in my dreams. I am very grateful for this experience and hope that I can become a mentor in the future, igniting the flame in others’ hearts.
