This document describes the considerations that you should keep in mind before starting to use branching in the Virtual DataPort integration with Version Control Systems.
Virtual DataPort allows integration with a Version Control System. You can review this functionality in the Version Control Systems Integration section of the Virtual DataPort Administration Guide. Denodo supports integration with Subversion (SVN), Git and Microsoft TFS VCS servers.
When developing in Denodo with VCS integration, some organizations have several teams working on the same project but in different areas of it. Each team wants to work collaboratively (pushing changes to the repository so that other team members can use them) without affecting the other teams.
To set this up, the first idea that comes to mind for users with experience in code development and VCS is to define branches as follows:
- Define one branch for each team in the Git repository.
- Have the developers on each team synchronize only with their own branch.
- Merge the branch into the master branch once the feature is developed.
This approach, traditionally followed in code development, can be applied to Denodo development, but some considerations need to be taken into account regarding the use of branches when working with VCS in Denodo.
The important part here is to keep in mind that the merge between branches needs to be done by an external tool. Virtual DataPort integration with Version Control Systems does not provide that functionality.
We therefore recommend reviewing this document to decide whether using branches fits your scenario, given the limitations described below.
The most common problem that can happen when working with multiple branches is that you can get to a situation where the VQL generated after doing the merge between several branches is incorrect and the resolution of those conflicts is complex.
The reason is that VQL is not code: you cannot compile it to validate it. The problems come from changes to elements that have implications for other elements that depend on them. The same problem occurs with any conventional database, not just with VQL and Denodo: the DDL statements need to be correct in order to load them into the RDBMS.
When importing a VQL in a Virtual DataPort server, the server easily controls the change propagation and detects the problems, but doing the merge with an external tool and only looking at the VQL is not that easy.
Let’s explain this with a simple example. Imagine that we have two views (department and employee) and that two branches are created for two different features:
- project1_branch_feature1: renames one of the employee fields and creates a new view on top of the department and employee views.
- project1_branch_feature2: creates a new view on top of the department and employee views.
In the next step, an integrator will merge both branches with the integration branch.
This merge can complete without any conflict, but the absence of merge conflicts does not mean that everything is fine. In this scenario, the view created in project1_branch_feature2 on top of the employee view will fail because it uses a field that no longer exists (it was renamed by project1_branch_feature1). As a result, the pull of the integration branch will fail because some elements cannot be loaded (missing field), and the VQL will need to be fixed in an external tool to solve the problem.
This is the simplest situation that can happen, but the more complicated your merge gets, the more difficult it will be to review the impact of the changes in an external tool. This becomes even more difficult if external jars or global elements like i18n maps are used.
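The scenario above can be reproduced with a small model. The following is a minimal sketch in Python, not Denodo functionality: views are represented as plain dictionaries instead of parsed VQL, the field names (`name`, `full_name`, `dept_name`) and derived view names (`emp_report`, `emp_by_dept`) are hypothetical, and `find_broken_references` is an illustrative helper. It shows how a merge that is textually conflict-free can still produce a set of views that cannot be loaded.

```python
# Illustrative model only: real VQL is not parsed here.
# Each view lists the fields it exposes and the (view, field) pairs it uses.

def find_broken_references(views):
    """Return (view, referenced_view, field) triples whose target is missing."""
    broken = []
    for name, view in sorted(views.items()):
        for ref_view, ref_field in view["uses"]:
            target = views.get(ref_view)
            if target is None or ref_field not in target["fields"]:
                broken.append((name, ref_view, ref_field))
    return broken

# Common starting point for both branches (hypothetical field names).
base = {
    "department": {"fields": {"dept_id", "dept_name"}, "uses": []},
    "employee": {"fields": {"emp_id", "name", "dept_id"}, "uses": []},
}

# project1_branch_feature1: renames employee.name and adds a derived view.
feature1 = {
    "employee": {"fields": {"emp_id", "full_name", "dept_id"}, "uses": []},
    "emp_report": {
        "fields": {"full_name", "dept_name"},
        "uses": [("employee", "full_name"), ("department", "dept_name")],
    },
}

# project1_branch_feature2: adds a derived view that still uses employee.name.
feature2 = {
    "emp_by_dept": {
        "fields": {"name", "dept_name"},
        "uses": [("employee", "name"), ("department", "dept_name")],
    },
}

# The branches change different elements, so a text merge sees no conflict.
assert not (set(feature1) & set(feature2))
merged = {**base, **feature1, **feature2}

broken = find_broken_references(merged)
for view, ref_view, field in broken:
    print(f"{view} uses {ref_view}.{field}, which no longer exists")
```

A check of this kind only becomes practical with a real dependency model of the VQL, which is what the Virtual DataPort server applies when the VQL is imported.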
Branching Best Practices
The recommendations to minimize these problems are:
- Branches should only add new elements, and there should be no intersection between the elements added by different branches. This can be achieved by following a good set of naming conventions.
- If each project depends on common projects and there is a need to have common projects in branches, then each developer should work on their own server.
- Merge frequently in order to minimize the conflicts that you may find.
- When conflicts are found, the integrator should coordinate with the person responsible for each branch in order to resolve them.
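The first recommendation can be checked before a merge. As a minimal sketch in Python, assuming you can obtain the list of element names added by each branch (for example, from the VCS diff), the hypothetical helpers below report names added by more than one branch and names that ignore a per-branch prefix. The `f1_` and `f2_` prefixes and the element names are an assumed naming convention for illustration, not one prescribed by Denodo.

```python
from itertools import combinations

def overlapping_elements(added_by_branch):
    """Map each pair of branches to the element names both of them add."""
    overlaps = {}
    for (b1, e1), (b2, e2) in combinations(sorted(added_by_branch.items()), 2):
        common = set(e1) & set(e2)
        if common:
            overlaps[(b1, b2)] = sorted(common)
    return overlaps

def violates_prefix(added_by_branch, prefix_by_branch):
    """List (branch, element) pairs that ignore the branch's assumed prefix."""
    return [
        (branch, element)
        for branch, elements in sorted(added_by_branch.items())
        for element in sorted(elements)
        if not element.startswith(prefix_by_branch[branch])
    ]

# Hypothetical element names added by each branch.
added = {
    "project1_branch_feature1": ["f1_emp_report", "emp_summary"],
    "project1_branch_feature2": ["f2_emp_by_dept", "emp_summary"],
}
prefixes = {
    "project1_branch_feature1": "f1_",
    "project1_branch_feature2": "f2_",
}

print(overlapping_elements(added))
print(violates_prefix(added, prefixes))
```

Here both branches add `emp_summary`, so it is reported as an overlap and as a prefix violation in each branch. Names that follow the prefix convention cannot collide across branches.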
Alternatives to the usage of branches
One reason for using branches is to let developers work on some functionality in a collaborative environment without affecting other users. For scenarios where the limitations described above prevent the use of branches, there is an alternative that achieves the same objective without them.
Use the same branch for the whole project (it can be the master branch) so the VCS repository will keep track of the changes for the different project sprints or phases.
When a team wants to start developing a new feature without affecting the main project, you can create a new database for that development using the Import Database feature of the VDP Administration Tool. All the developers working on that functionality can then work together in the same database, directly on the server.
Once the development is done, they can perform a push to the VCS repository so the administrator can start testing that new functionality before promoting it.
The process in the VDP server of the Development environment will be similar to the following:
- db_project_1 is a database synchronized with the remote database in the master branch.
- There are several features being developed and each one has its own database: db_project_1_featureN (using db_project_1 as the remote database in VCS)
- Multiple developers connect to the db_project_1_featureN database to work on the new feature, as described in the scenario "Centralized workflow with shared databases" in the Scenarios and Recommended Uses section.
- During the development of featureN:
  - Changes on db_project_1_featureN database are not pushed to the remote repository.
  - Periodically, db_project_1_featureN database is updated to integrate the new changes in db_project_1.
- When the development of featureN is completed:
  - db_project_1_featureN database needs to be updated.
  - Conflicts/problems need to be resolved.
  - After that, commit/push the changes to the remote database in VCS.
- The integrator user should prevent other changes from being pushed to the remote repository.
- The integrator user should update the db_project_1 database in the Development environment and run the tests for the new feature.
- When testing is completed, other teams can start to commit/push changes to the db_project_1 database.
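The ordering constraints in the steps above can be illustrated with a toy model. This is a minimal sketch in Python under stated assumptions: `RemoteDatabase`, its methods, and the integer revision counter are hypothetical and do not correspond to any Denodo or VCS API. It only shows that a feature database must be updated before it pushes, and that other pushes wait while the integrator tests.

```python
class RemoteDatabase:
    """Toy stand-in for the remote database in the master branch."""

    def __init__(self):
        self.revision = 0
        self.testing = None  # feature under test while pushes are on hold

    def push(self, feature, synced_revision):
        # Other teams wait while the integrator tests a pushed feature.
        if self.testing is not None:
            raise RuntimeError(f"{feature}: pushes on hold, testing {self.testing}")
        # The feature database must include the latest changes first.
        if synced_revision != self.revision:
            raise RuntimeError(f"{feature}: update and resolve conflicts first")
        self.revision += 1
        return self.revision

    def start_testing(self, feature):
        self.testing = feature

    def testing_completed(self):
        self.testing = None


remote = RemoteDatabase()
log = []

# feature1 is up to date, so its push succeeds; the integrator then tests it.
feature1_synced = remote.revision
remote.push("db_project_1_feature1", feature1_synced)
remote.start_testing("db_project_1_feature1")

# feature2 was last updated at revision 0 and tries to push three times.
feature2_synced = 0
for step in ("during testing", "after testing", "after update"):
    try:
        remote.push("db_project_1_feature2", feature2_synced)
        log.append((step, "pushed"))
    except RuntimeError as error:
        log.append((step, str(error)))
    if step == "during testing":
        remote.testing_completed()
    elif step == "after testing":
        feature2_synced = remote.revision  # update from db_project_1

for entry in log:
    print(entry)
```

In this model the second feature is rejected twice, first because testing is in progress and then because its database is out of date, and it succeeds only after it has been updated.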
This approach allows developers to access the changes of other developers since they are developing directly on the database. However, you need to keep in mind that, in order to avoid conflicts, they must coordinate among themselves to prevent modifying the same elements simultaneously.
Virtual DataPort Administration Guide: Scenarios and Recommended Uses
Virtual DataPort Administration Guide: Version Control Systems Integration
Virtual DataPort Administration Guide: Import Database