Continuous Integration - Version Control, Code Reviews, LGTM

Continuous integration solves the first primary goal - allowing large groups to add code to the codebase while keeping the different versions organized and maintaining the code structure and documentation.

A Small Anecdote:

Like we said before, this might seem trivial, but your particular setup can make or break your company within a year.

Consider a startup with a project of 1000 lines of code. At this scale, the "latest" version of the code might just be whatever the founder has on his or her laptop. Changes are easy and instant, and the only standard and decision is the founder's standard and decision.

But with a good idea, those 1000 lines soon become 100,000 lines of code running on a server, with 8 other employees changing the code every day. So, who has the "latest" version? How does each developer reconcile their code with the main copy? Whose job is it to keep the master working and up to date? Even at this scale, the problem can be daunting, but for the sake of argument, let's say... maybe you tough it out and find a "system".

Now you're a "startup" of 1000+ employees where every engineer uses your "system" to keep their code in sync. Keeping this process going takes 6 hours of everyone's day, and it wastes a lot of time and money as new employees onboard to this "system" and old employees wrangle 200 versions of the code at varying levels of functionality. Maybe you could find a better way to do this? But how do you convince 1000+ engineers to move all their work to the new system? How could you reconcile all the underlying discrepancies of your old system?

Version Control

NOTE: As we discuss Continuous Integration, we'll be making general references to the basics of Git and GitHub. If you're a bit rusty on the concepts, I'd recommend you follow our other guide on the subject here.

Version control programs are designed to reconcile code changes to some master copy of the project. You might be familiar with pulls, commits, and pushes, but the true power of version control comes from the branch functionality. You can think of branches as alternative versions of your code.

Consider the git command git branch experiment. This command creates a separate branch (copy) of the current code called "experiment". We can open this branch with git checkout experiment. At first, the code will look the same since, as a new branch, it has no differences from the originating branch. However, all git changes will now register on this new copy alone. If you commit changes and switch back to the original branch with git checkout <original branch name>, you'll see that your changes no longer appear. You can still return to those changes by checking out the experiment branch.
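To see this isolation end to end, here is a minimal sketch that runs in a throwaway repository. The file names, the commit messages, and the demo identity are placeholders chosen for illustration.

```shell
#!/bin/sh
set -e

# Work in a throwaway repository so no real project is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main" regardless of git defaults
git config user.name "Demo User"         # placeholder identity, stored in this repo only
git config user.email "demo@example.com"

echo "hello" > readme.txt
git add readme.txt
git commit -q -m "Initial commit"

# Create the branch, then switch to it.
git branch experiment
git checkout -q experiment

# This commit is recorded on "experiment" only.
echo "trial code" > experiment.txt
git add experiment.txt
git commit -q -m "Add experimental file"

# Back on main, the experimental file is gone from the working tree.
git checkout -q main
if [ -e experiment.txt ]; then on_main=present; else on_main=absent; fi

# Checking out experiment again brings it back.
git checkout -q experiment
if [ -e experiment.txt ]; then on_experiment=present; else on_experiment=absent; fi

echo "experiment.txt on main: $on_main"
echo "experiment.txt on experiment: $on_experiment"
```

Nothing is lost when switching branches: the commit still exists, it is just only reachable from the experiment branch.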

In a typical git configuration, one branch is assumed to be the ground truth for the project - usually called "main" or, in older repositories, "master". The main pattern for most repositories is that most other branches are either:

Previous versions of the code that are kept for posterity
By making a separate branch, it becomes very convenient to bring out older versions for comparisons, compatibility testing, or letting users pick out older interfaces for their own code
Experimental versions of the code that expand on the main branch features
This can be a whole new version of the project - a "dev" branch that will eventually be made into main - or individual features (a single bug fix or similar)

That being said, this is not set in stone - the purpose of a particular branch is ultimately defined by how you use it, whether that be a small experimental implementation or a full code restructure.

Follow Along - Branching

Let's try an example.

Clone the following repository and open a new branch with the given command:
```
git clone <add sample repo here>
cd <cloned repo directory>
git checkout -b test
```

Note that the -b flag both creates the branch and checks it out.

Make some changes to the code and commit them.
Switch back to the original branch with the following:
```
git checkout main
```

You should see that your branch changes have disappeared. We can check out the branch to revisit those changes, but now we need to figure out how to get those branch changes back into main.

Pull Requests (PRs)

These branches are all well and good, but now we need to consider how we can incorporate those changes into main.

While it is bad practice, we can always just merge the branches locally by running git merge other-branch from the main branch and then pushing the changes. While this may be enough for a small team, the problem is that if anyone can merge at any time, tracking the ground truth code can become a mess (remember, it's not just one team of engineers that may be contending for the main branch, it could be the entire company's worth of engineers).
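As a rough sketch of that local merge, again in a throwaway repository: the branch name other-branch comes from the command above, while the file names and commit messages are placeholders. The final push is left as a comment because the demo repository has no remote.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main" regardless of git defaults
git config user.name "Demo User"         # placeholder identity, stored in this repo only
git config user.email "demo@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Do some work on a separate branch.
git checkout -q -b other-branch
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Merge it while on main. Nobody reviews this step.
git checkout -q main
git merge -q other-branch

# main now contains the commit made on other-branch.
merged_subject=$(git log -1 --format=%s)
echo "Latest commit on main: $merged_subject"

# With a remote configured, the merged result would then be pushed:
# git push origin main
```

Notice that nothing in this flow asks a second person to look at the change before it lands on main.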

The main fix for this is to make merging a global repository process. This is where pull requests (PRs) step in.

In general, PRs are formal requests on the global repository to merge one branch into another (usually "main" or a major version of the system). Exact practices and expectations differ, but a good PR should include a clear summary of what changes are made in the branch, what those changes solve, and why it matters for them to be made. After this, other developers on the repository can review the branch and consider the merits of your code before it is merged.
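One way to make the summary habit concrete is to draft the PR description as a file before opening the request. The sketch below is only an illustration: the section headings, the file name pr_body.md, and the example fix are invented, and the final command assumes the GitHub CLI (gh) is installed and authenticated, so it is left commented out.

```shell
#!/bin/sh
set -e

workdir=$(mktemp -d)
body_file="$workdir/pr_body.md"

# Hypothetical template covering what changed, what it solves, and why it matters.
cat > "$body_file" <<'EOF'
## What changed
- Validate the login form before submitting it.

## What this solves
- Empty passwords were being sent to the server.

## Why it matters
- Removes a common source of confusing error pages for users.
EOF

cat "$body_file"

# Open the PR from branch "test" into "main" (requires the GitHub CLI):
# gh pr create --base main --head test --title "Validate login form" --body-file "$body_file"
```

Other hosts have their own tools and web forms for this step; the point is only that the three questions get answered before a reviewer is asked to spend time on the branch.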

And that's all there is to the fundamentals. It may seem simple, but when well organized, this can be extremely powerful. Consider our original company example from the beginning. A company with 1000 engineers working on a single codebase would normally be a mess. But if those engineers are split into teams that are each assigned their own sub-branch of the main code, focused on developing a single new feature, these teams don't have to collide until their feature is ready. At that point, extra work must be put into the merging process, but in this way the process remains organized.

We can bring this model to a lower level - given a team of engineers, the team manager can assign each member small tasks that contribute to a major feature. Each engineer can make their own branch and then merge into the team's version of the feature code. You can see that this method lends itself very well to the hierarchical structure that most companies have and can scale through every level of the company.

That being said, modern pipelines have found that it is better to keep changes small than to have large, diverging branches, unless those branches will never be merged again (effectively different versions of the project as a whole). If individual PR changes are kept small and relatively self-contained, we minimize the overhead it takes to avoid merge conflicts and integration errors.
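A quick way to gauge whether a branch is still small is to count the lines it changes relative to main. The sketch below does this in a throwaway repository; the branch name small-fix and the 200-line limit are arbitrary choices for illustration, not a recommended threshold.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main" regardless of git defaults
git config user.name "Demo User"         # placeholder identity, stored in this repo only
git config user.email "demo@example.com"

printf 'line one\n' > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b small-fix
printf 'line two\nline three\n' >> app.txt
git commit -q -am "Add two lines"

# "main...small-fix" compares the branch against the point where it left main.
# --numstat prints "<added> <removed> <file>" for each changed file.
changed_lines=$(git diff --numstat main...small-fix |
    awk '{ total += $1 + $2 } END { print total + 0 }')

max_lines=200   # arbitrary limit for this example
if [ "$changed_lines" -le "$max_lines" ]; then
    echo "small-fix changes $changed_lines lines: small enough to review comfortably"
else
    echo "small-fix changes $changed_lines lines: consider splitting it up"
fi
```

A raw line count is a crude proxy, but it is cheap to compute and tends to flag the branches that have drifted furthest from main.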

Note: Branch Security

It's worth mentioning that all these systems are worthless unless you can enforce PRs as the main method of merging. Luckily, this is simple to do on most git hosts. Normally, the host will have settings that can be applied to particular branches. These make it so that developers can't push directly to the main branch and must make changes through PRs. Additionally, rules can be applied so that a PR must be approved by another developer (usually a manager or admin) before the merge can happen.
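The host-side rules are configured in the host's own settings, so they are not shown here. As a local complement, not a replacement, a developer can install git's pre-push hook to catch accidental direct pushes before they leave the machine. The sketch below assumes the protected branch is main; the function name check_push and the error message are made up for this example.

```shell
#!/bin/sh

protected_ref="refs/heads/main"

# git feeds a pre-push hook one line per ref being pushed, in the form:
#   <local ref> <local sha> <remote ref> <remote sha>
check_push() {
    while read -r local_ref local_sha remote_ref remote_sha; do
        if [ "$remote_ref" = "$protected_ref" ]; then
            echo "Direct pushes to main are blocked here; open a PR instead." >&2
            return 1
        fi
    done
    return 0
}

# To use it as a real hook, save this file as .git/hooks/pre-push, make it
# executable, and end the file with:
#   check_push || exit 1
# Here the function is exercised with two hand-written input lines instead.
if printf 'refs/heads/test 1111 refs/heads/test 2222\n' | check_push; then
    echo "push to test: allowed"
fi
if ! printf 'refs/heads/main 1111 refs/heads/main 2222\n' | check_push 2>/dev/null; then
    echo "push to main: rejected"
fi
```

Because a local hook can be skipped or deleted by whoever owns the clone, it only guards against mistakes; real enforcement still has to live on the host.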

PR Automation

All this being said, you might notice that there are a lot of engineer-hours involved in this process, with all the merging and PR reviews. Good organization can keep this minimal by giving a structure for delegation (managers and/or department heads responsible for certain branches or aspects of the code), but how can we improve this?

Unsurprisingly, engineers answered with automation. However, this was the best kind - the kind that alleviates the tedious, repetitive tasks and leaves the important judgment calls to the user.

An easy example is "linting". Linting is the process of checking and cleaning up code styling - tabs versus spaces, indentation, variable naming policies, etc. Various automatic linting programs exist, and these can be integrated into git systems to automatically request style changes as a PR requirement - ensuring code meets a bare minimum standard. However, it is worth noting that this does not gauge the semantics of the code structure and variables - there is still a need for human eyes on the code.
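To give a feel for what such a check does, here is a deliberately tiny stand-in for a linter, not a real linting tool. It enforces two assumed style rules (no tab indentation, no trailing whitespace); the function name lint_file and the sample files are invented for the example.

```shell
#!/bin/sh

lint_file() {
    file=$1
    problems=0
    tab=$(printf '\t')

    # Assumed rule 1: indent with spaces, never tabs.
    if grep -q "^$tab" "$file"; then
        echo "$file: tab indentation found"
        problems=1
    fi

    # Assumed rule 2: no whitespace at the end of a line.
    if grep -q '[[:space:]]$' "$file"; then
        echo "$file: trailing whitespace found"
        problems=1
    fi

    return "$problems"
}

# Two sample files: one clean, one indented with a tab.
samples=$(mktemp -d)
printf 'settings:\n    depth: 1\n' > "$samples/good.txt"
printf 'settings:\n\tdepth: 1\n' > "$samples/bad.txt"

for f in "$samples/good.txt" "$samples/bad.txt"; do
    if lint_file "$f"; then
        echo "$f: ok"
    else
        echo "$f: changes requested"
    fi
done
```

The exit status is the important part: an automated PR check only needs to know whether the command succeeded in order to pass or block the request.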

This kind of process can be applied to various parts of the workflow, from testing frameworks where the code must pass test cases before the PR can be approved, to automatic delegation of PR reviews, to automatic cross-compiled builds that ensure compatibility with various hardware. PR automation allows developers to offload the more baseline integration checks, so that testing, styling, and building decisions become more about the high-level design and workflows.
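Put together, a PR gate amounts to running a list of checks and refusing the merge if any of them fails. The sketch below shows only that control flow: the check names mirror the examples above, and true and false stand in for real lint, test, and build commands, which depend entirely on the project.

```shell
#!/bin/sh

failures=0

# Run one named check; any non-zero exit status counts as a failure.
run_check() {
    name=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS: $name"
    else
        echo "FAIL: $name"
        failures=$((failures + 1))
    fi
}

run_check "lint" true
run_check "unit tests" true
run_check "cross-compile build" false   # simulated failure

if [ "$failures" -eq 0 ]; then
    echo "All checks passed: the PR can go to human review."
else
    echo "$failures check(s) failed: the PR is blocked until they are fixed."
fi
```

Running every check instead of stopping at the first failure gives the author the full list of problems in a single pass.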