Documenting: a task perceived as “expensive” but profitable in the long term
One of the main reasons documenting is perceived as “expensive” in time and effort lies in how we conceive of producing it. This is linked to bad experiences we may have suffered when creating or consuming it.
When we think about documentation, we picture the need to receive or create a document: a “deliverable” in docx, PDF or a similar format.
If that document is written with care, that is, without the obvious rush of having left it for the final phase of the project, we will try to keep it up to date over time. To that end, a table is usually included at its beginning to record the changes made to it.
Once the project has been delivered and, at least in our minds, completed, all that remains is hope disguised as expectation: that this document will be updated every time our product changes. But this rarely happens, and we know it.
We have several problems with this way of working:
- The documentation is in a final format whose copies can be altered and edited. This creates insecurity, because we can never be sure we are looking at the most up-to-date version of the document.
- The format sometimes forces us to consume critical information with a specific software tool, and to duplicate the information when we must deliver the documentation to different recipients in different formats, making version management and control more complex.
- The documentation is tied to a deliverable. If the person managing the project lacks experience, or the company's technological culture does not include documentation requirements in the definition of done of a task, then at best this work ends up in the final phase of the Gantt chart (or of a false “agile sprint”). That “documentation and tests” phase usually coincides with delays in functionality promised for fixed dates, making it the worst possible time to write anything with a minimum of love.
- Another possibility is that our documentation is the result of an analysis carried out before the project started. Such a document can give us some context about the origin, but given the nature of software development it can never be relied on once development is finished. We will never be safe from changes, redefinitions and continuous discoveries.
- This documentation lives far from our project's code, so over time the only real source of truth will be the code we have in production. We will lose the context of why things were implemented and of their real impact on the business and on the other components of the architecture or software products our development communicates or integrates with.
It is very easy for our documentation to go stale
Let’s take a typical example: we receive a specification in docx format to build a service that will be exposed through a REST API.
From that document we create an OpenAPI specification, and from it we begin to develop the service.
Our front-end colleagues use that specification to generate mocks automatically, so they do not have to wait for us to finish the service before starting to build their progressive web application.
Then we realize that part of our API's behavior was not needed after all.
We apply the change in the code and in the specification so that our front-end colleagues can regenerate their mocks and work with the change in mind.
The moment we do this, the initial document automatically becomes obsolete.
From then on, our only source of information is the OpenAPI specification, not the document. However, OpenAPI captures some information, but not all of it: we lose the business behavior and requirements.
The situation becomes much worse if we do not even have the OpenAPI specification.
When years pass and another development team, lacking all the information that has been lost along the way, takes over the development, they will have no choice but to become software archaeologists.
Software archaeology or software archeology is the study of poorly documented or undocumented legacy software implementations, as part of software maintenance.
Software archaeology, named by analogy with archaeology, includes the reverse engineering of software modules, and the application of a variety of tools and processes for extracting and understanding program structure and recovering design information.
If we have been developing software for a while, we have surely inherited more than one project without documentation or, worse, with unreliable documentation. In other situations, we must migrate an existing development in order to update it, improve its maintainability, or add some new functionality without impacting the rest of the functionality or the system it is part of.
The only way to know for sure what this software does and what its global impact is, is to study its entrails and start covering it with black-box tests, to gain control over it and a shared understanding within the team.
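A minimal sketch of this black-box approach in Python, using what is often called a characterization (or "golden master") test. The `legacy_price` function here is a hypothetical stand-in for the inherited, undocumented code:

```python
# Characterization tests assert what the legacy code currently does,
# not what we think it should do.
def legacy_price(quantity, unit_price):
    # Hypothetical stand-in for the inherited, undocumented implementation.
    total = quantity * unit_price
    if quantity >= 10:  # behavior discovered by probing, not by reading docs
        total *= 0.9
    return round(total, 2)

# Pin down the observed behavior so any refactor that changes it fails fast.
assert legacy_price(1, 5.0) == 5.0
assert legacy_price(10, 5.0) == 45.0  # a hidden bulk discount, now recorded
```

Once enough of the observed behavior is pinned down this way, the team can refactor or rewrite with the tests acting as a safety net.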
Needless to say, this is an expensive process. Even when the goal is to define a refactoring strategy or, in the most extreme case, a total rewrite of the project, that analysis can cost more time or resources than the original project itself. Despite everything, this investment will be necessary to avoid running into the same situation in the future.
In these situations, the value of having well-maintained documentation and eliminating every “island of knowledge” is clear. But this is not easy, and it implies changes in the team's working culture.
How to start
Throughout this article I have been referring to a culture change. Changing a culture is not easy.
A culture change involves thinking in the long term, being aligned as a team and having room to act. Not all environments are favorable.
Even in genuinely agile environments, it is difficult to introduce a documentation culture outright and without preparation. That is why I consider it key to introduce active documentation into our Continuous Integration processes.
Once this is achieved, we can iterate and introduce more mechanisms, roles and protocols to improve our documentation and our team culture.
Introducing active documentation pipelines
Active documentation allows us to achieve several things:
- Generate documentation automatically from the specifications or contracts that we already have to define for our software to work or integrate with other systems.
- Have a single source of truth that is constantly updated.
- Avoid depending on final deliverable formats.
- Make our documentation part of our continuous integration and delivery cycle.
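As a sketch of the first point, the pipeline step that turns a contract into documentation can be very small. The spec below is a hypothetical, trimmed-down OpenAPI document represented as a Python dict; a real pipeline would load the actual YAML or JSON file:

```python
# Active documentation sketch: render a Markdown page from the OpenAPI
# specification we already maintain for the service to work at all.
spec = {
    "info": {"title": "Orders API", "version": "1.2.0"},  # hypothetical service
    "paths": {
        "/orders": {
            "get": {"summary": "List orders"},
            "post": {"summary": "Create an order"},
        },
    },
}

def render_markdown(spec):
    lines = [f"# {spec['info']['title']} v{spec['info']['version']}", ""]
    for path, operations in sorted(spec["paths"].items()):
        for method, op in operations.items():
            lines.append(f"- `{method.upper()} {path}`: {op['summary']}")
    return "\n".join(lines)

print(render_markdown(spec))
```

Because the Markdown is regenerated on every merge, it can never drift from the contract the way a hand-written docx would.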
Markdown as central format
Markdown offers many advantages as a documentation format:
- It is format-independent: a simple markup language based on plain text.
- There are many utilities to generate deliverables in PDF, docx or HTML.
- There are mature CMS options that allow us to generate documentation portals from Markdown files.
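To illustrate why plain text is such a convenient base, here is a deliberately tiny converter for a small subset of Markdown (headings and bullets). It is illustrative only; a real pipeline would use a mature tool such as Pandoc or a documentation CMS:

```python
# Illustrative only: convert a tiny subset of Markdown (h1 headings, bullets,
# paragraphs) to HTML, showing that plain text is trivially machine-processable.
def tiny_markdown_to_html(text):
    html = []
    for line in text.splitlines():
        if line.startswith("# "):
            html.append(f"<h1>{line[2:]}</h1>")
        elif line.startswith("- "):
            html.append(f"<li>{line[2:]}</li>")
        elif line.strip():
            html.append(f"<p>{line}</p>")
    return "\n".join(html)

doc = "# Orders API\n- List orders\n- Create an order"
print(tiny_markdown_to_html(doc))
```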
Both written documentation and active documentation go through the same pipeline within our Continuous Integration cycle.
In this way we can automatically regenerate both the documentation portal and the deliverable formats on every update of code and documentation merged into the master branch (or the development branch, depending on our strategy).
We can make the pipeline as sophisticated as we want, for example by indexing all the documentation in a centralized search service so that any piece of information is easier to find.
Each pipeline will depend on the specific needs of each team.
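The indexing step mentioned above can be sketched as a simple inverted index that maps each word to the documents mentioning it. The file names and contents are hypothetical; a real pipeline would push this data to a search service such as Elasticsearch rather than keep it in memory:

```python
from collections import defaultdict

# Sketch of the indexing step: map every word to the set of docs containing it.
def build_index(docs):
    index = defaultdict(set)
    for name, text in docs.items():
        for word in text.lower().split():
            index[word.strip(".,`#-")].add(name)
    return index

docs = {
    "orders.md": "# Orders API endpoints for orders",
    "users.md": "# Users API endpoints",
}
index = build_index(docs)
print(sorted(index["endpoints"]))  # → ['orders.md', 'users.md']
```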
Other non-active documentation
README files in repositories are an opportunity to give relevant information at a glance for each code repository. Taking care of them is essential, and not only in open source projects.
Depending on our needs and the project, it may be interesting to include them in our pipeline.
Glossaries arise from the need to document terms, not only to share an understanding with our own team and with teams from other areas of the company, but also to model and name domain aspects of our software.
Managing glossaries is not a simple task: besides explaining what each term means, it is necessary to detail in which contexts it is used and which terms it is equivalent to in other contexts.
If we start from scratch, our glossary will be easier to compose if we have a discipline or procedure for documenting and agreeing on terms as doubts arise.
In contexts where we start from software already in production, without any prior glossary work, we will have to compile the terms our software is already coupled to, document them, and map them to the terms the team finally agrees on.
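One way to keep a glossary alive is to make it machine-readable, so it can be rendered into the documentation portal like any other active documentation. The entry below (term, contexts, equivalents) is entirely hypothetical:

```python
# A machine-readable glossary entry: each term records the contexts where it is
# used and its equivalents elsewhere, so the pipeline can render it as Markdown.
glossary = {
    "order": {
        "definition": "A customer's request to purchase one or more products.",
        "contexts": ["checkout", "billing"],
        "equivalents": {"warehouse": "shipment request"},  # hypothetical mapping
    },
}

def render_entry(term):
    entry = glossary[term]
    contexts = ", ".join(entry["contexts"])
    return f"**{term}** ({contexts}): {entry['definition']}"

print(render_entry("order"))
```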
Conventions and nomenclature
One of the best practices I have seen is to establish conventions for naming the different aspects of our software, to document those decisions, and to communicate them to other teams so that they follow them too.
For example, if all Kafka topics or Mongo collections follow the same naming structure, including useful information (depending on our interests) such as the version and the environment name, the impact on day-to-day work in the medium and long term is very significant, both for our team and for the teams we collaborate with.
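Such a convention can even be enforced in CI, which keeps it documented and checked in one place. The `<env>.<domain>.<entity>.v<N>` structure below is a hypothetical convention, not a Kafka requirement:

```python
import re

# Hypothetical topic naming convention: <env>.<domain>.<entity>.v<N>
# e.g. "prod.billing.invoices.v2". Validating it in CI makes the convention
# executable documentation rather than a forgotten wiki page.
TOPIC_PATTERN = re.compile(r"^(dev|staging|prod)\.[a-z]+\.[a-z-]+\.v\d+$")

def is_valid_topic(name):
    return bool(TOPIC_PATTERN.match(name))

assert is_valid_topic("prod.billing.invoices.v2")
assert not is_valid_topic("invoices_v2")  # no environment, wrong separators
```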
We have talked about many things in this article, and none of them is free of challenges.
The main challenge of using contracts and specifications is managing their versioning.
Versioning involves a lot of work and sometimes extra complexity, since as a general rule we must maintain compatibility with clients or other components that use previous versions and may not always update at the pace we do.
However, it also gives us much more control and reduces scaling problems in the long term.
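If the contracts follow semantic versioning, part of that compatibility management can be automated. A sketch of such a gate, under the usual semver assumption that only a major bump may break consumers (the version strings are hypothetical):

```python
# Compatibility gate based on semantic versioning: a new contract version is
# treated as backward-compatible while the major number is unchanged.
def parse(version):
    major, minor, patch = (int(p) for p in version.split("."))
    return major, minor, patch

def is_backward_compatible(old, new):
    return parse(new)[0] == parse(old)[0] and parse(new) >= parse(old)

assert is_backward_compatible("1.2.0", "1.3.1")      # minor bump: compatible
assert not is_backward_compatible("1.2.0", "2.0.0")  # major bump: breaking
```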
Implications when sharing the documentation repository with the code repository
We have to keep in mind that our documentation may change while our code does not.
In some contexts, we have to manage this so that a documentation-only change merged into our repository does not trigger the generation of an artifact with a new version.
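The gating logic can be as simple as checking whether every changed file lives under the documentation tree. The paths below are hypothetical; in a real pipeline the changed-file list would typically come from something like `git diff --name-only`:

```python
# Skip the artifact build when a merge touches only documentation files.
DOC_PREFIXES = ("docs/", "README.md")  # hypothetical layout for this sketch

def docs_only_change(changed_files):
    return all(f.startswith(DOC_PREFIXES) for f in changed_files)

assert docs_only_change(["docs/api.md", "README.md"])
assert not docs_only_change(["docs/api.md", "src/service.py"])
```

Many CI services also support this natively through path filters on the pipeline trigger.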
There are approaches such as keeping the documentation in a separate branch, or using the wiki systems included in services such as GitHub or GitLab.
Ensure maintenance of non-active documentation
Sooner or later we have to ensure that the non-active documentation is kept up to date.
This is one of the great challenges of documenting, so the best strategy is to make the most of every resource that can serve as active documentation.
Sooner or later we will also have to assess whether we need roles in our team responsible for this work, possibly even fully dedicated to maintaining the documentation.
This will depend a lot on our company, projects and needs.