From Event to Action: Accelerate Your Decision Making with Real-Time Automation
Deploy Application Files with Git
1. diG : Deploying large applications with Git
Alec Clews alec.clews@voga.com.au
Principal Consultant at Voga Consulting Services
Abstract
When deploying files into commercial environments it is vital to document and audit the
installation of files across multiple application instances.
This paper presents a working example that addresses some of the major issues relating to
deployment such as automation, access control, logging and change auditing.
Traditionally the Git version control tool is used to manage development source code and as a
delivery point for other tools such as such Capistrano or Puppet. However current deployment and
system management frameworks often use a specialist domain language or pidgin and need
significant configuration which can make them more complex to use. Some large tools also require
programming implementation classes in a language such as Java to perform specific actions. In a
system management environment this effort is worthwhile, but for application deployment (often
with little time or budget available) it is often too much effort. Using Git with a number of simple
wrapper scripts can greatly simplify the deployment implementation.
A basic understanding of Git and ssh is assumed.
Motivation
Deployment of software in large scale, commercial, environments presents a set of requirements
and constraints that are only usually considered as an after thought. These deployment processes
should be:
1. Repeatable: Can be run repeatedly as required
2. Reproducible: Can be re-used and installed across different systems as needed
3. Documented
4. Audited: Records should be maintained of all activity
5. Secure: Access and functions should be restricted and based on some form of access control
6. Automated: As little human interaction should be required as possible
7. Easy to use: There should be as few opportunities for errors as possible
This paper introduces some simple recipes (refered to collectively to as diG – deploy interactivity
with Git) that can be quickly implemented to address these needs. Whilst there are a number of
commercial and open source alternatives available, with much greater functionality, they all require
significant set-up and infrastructure. diG is a quick win solution with small cost and effort but
limited functionality.
Background
This approach was originally developed for teams with the following needs:
1. Frequent releases
2. Regular internal and external audits
3. Over forty application instances across production, test and development
4. An application that occupies 2.5Gb of file storage in over 10,000 files
2. 5. Security polices that make installing additional software, daemons and setuid programs a
problem (ssh was already in use and installation of Git was easy)
Scope
The features provided by diG are outlined below in the user stories. However it is also important to
articulate the things that diG does not do
Excluded features:
1. Application configuration
2. Application management
3. Operating system operations (beyond the manipulation of application file system objects)
User Stories
The following user stories describe the features that diG provides:
• diG will accept release contents from a configuration management or version control tool. If
available a release identifier must be tracked in some fashion
• diG will provide a report that lists the contents of each accepted release
• diG will have a function to physically deploy release contents to a specific environment NB:
• Pre and post processing of a releases contents are subject to separate deliveries
outside the scope of diG (c.f. Environment configuration scripts)
• A release can be re-deployed to any environment at will
• diG will provide a report that lists the contents of the release deployed to each environment
• diG will provide details of any differences between the release deployed to an environment
and any changes detected to the objects in the same runtime environment (a change audit
report).
• A (semi automatic) 'roll back' feature to a documented state would be highly desirable
• diG will allow detected changes to be marked as benign and ignored for future change audit
reports
Solution
Overview
Git, a distributed version control system, was selected for five reasons:
• Tracks changes to files
• Zero capital cost
• Supports ssh as a transport mechanism
• Does not require a server process
• Excellent support for branches
Subsequently Git has been the correct choice because of
• Flexibility
• Ease of testing
3. Component Overview
Work flow
The diG staging repository is a Git version control database
1. Software is built and released from an existing set of processes and tools
2. The release is identified with an external identifier
3. A new empty branch is created on the diG staging repository (a release branch)
4. The release contents are committed to the diG staging repository on the new release branch
5. The Git commit is a record in staging of the release contents. This may be tagged and
signed if required
4. There is a branch in diG staging for each application environment and on each corresponding
application environment there is another Git repository (one repository per application installation)
6. To release to an environment a release branch is merged to an application branch in the
staging repository
7. The updated application branch is pushed to the remote application repository
8. On the application environment files are checked out (deployed) from Git
9. The remote application is configured as needed
10. Any configuration changes can be added back to the application repository
11. Configuration changes are pulled back to the staging repository.
Assumptions
1. Each application environment has a unique and consistent identifier
2. Git software is installed on each system
3. ssh is installed on each system
Set-up
Involves two steps
1. Creating a staging repository
2. Creating a runtime environment definition (environment branch in staging) and
corresponding repository. Repeated for each application environment as needed.
Custom Merge Drivers
When Git attempts to bring two branches together it performs a merge. In diG a merge occurs when
a release is applied to an environment – the release branch is merged onto the contents of the
current application branch. However the Git default behaviour takes similar files and integrates non
overlapping changes. For a release this is not what we want (code merge has already been done
during the development and build cycle) and files in the new release should overwrite existing
application files. This is accomplished by defining a custom merge driver that implements this
behaviour.
1. Define a merge driver stanza in the Git configuration file
2. Define the file pattern for which this driver will be used
Refer to the Git documentation for more details1
The set-up scripts below perform these steps when the repositories are created by 'git init'
This is done on all Git repositories that diG uses (for completeness, although the current use cases
do not require merging on the application environment repositories).
Naming Conventions and Git Tagging
Using diG affectively requires the use of a consistent naming convention for all branches, tags etc.
For examples in the following examples all new delivery branch names are prefixed with
“DELIVERY_”. Git tags, basically immutable names that can be associated with specific commits,
are useful for both auditing and reporting purposes, and a consistent naming convention should also
be adopted. In the reporting section it is assumed that all deployments are tagged with a name that is
prefixed with “DEPLOYED”, although the specific tagging is not show. The following minimum
set of tag points is suggested
1 http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html#_defining_a_custom_merge_driver
5. 1. Release check in
2. Deployment merge
3. After the configuration of a new release on an environment (NB Requires a pull back to
staging first)
Create a staging repository
On the staging repository (this may require substantial free desk space) create a Git repository. e.g.
1. Creates a new empty Git staging database (a repository or repo)
2. mkdir stagingRepo
3. cd stagingRepo
4. git init # TODO add –shared options
5. Creates a custom merge driver. Explained above
6. echo '[merge "overwrite"]' >> .git/config
7. echo ' name = overwrite using cp' >> .git/config
8. echo ' driver = cp %B %A' >> .git/config
9. echo ' #recursive = defaults to driver' >> .git/config
10. echo '* merge=overwrite' >> .git/info/attributes
11. An initial empty commit and branch to which all subsequent deliveries can be added
12. git commit --allow-empty -m "Creating empty staging repo"
13. git branch -m master Empty # rename master branch
Adding a new environment
Before deployments occur to a remote application system create a repository for each application
runtime environment to be maintained. This is created on the operating system where the
application is installed
On the application runtime system:
Ensure Git is installed
Ensure there is adequate free hard disk space. This will depend on the size of the application and
the size/number of releases to be managed.
Login as the account to own and maintain the deployment database. A separate account is
suggested
1. mkdir <envID>Repo
2. cd <envID>Repo
3. git init # TODO add –shared options
4. echo '[merge "overwrite"]' >> .git/config
5. echo ' name = overwrite using cp' >> .git/config
6. echo ' driver = cp %B %A' >> .git/config
7. echo ' #recursive = defaults to driver' >> .git/config
8. echo '* merge=overwrite' >> .git/info/attributes #no spaces around '='
9. GIT_DIR=$location/<envID>Repo/.git ; export GIT_DIR
10. cd <AppEnvLocation>
11. Create .gitignore file. See below
At this point the application should be stopped
12. find . -type f -print | git update-index --add –stdin
The application may be restarted
13. git commit –m "Initial <envID> load"
14. git branch –m master <envID>
The command 'git update-index' is used rather than the more usual 'git add' because:
1. It overcomes limitations of specifying lists of file names on the command line
2. Improves performance by not invoking Git add for every file
3. Provides an opportunity to apply selection criteria on the files to be included if .gitignore is
not sufficient.
6. On the staging repository
git remote add –m <envID> -f <envID> ssh://$username@$host:$port$location
where $username, $host, $port and $location represent the location and credentials for the remote
application repository
Creating a .gitignore file
In the application root directory a .gitignore2 must be created. It lists the file patterns that are NOT
to be tracked by the system (e.g. application log files).
Delivering a new release to staging
The delivery mechanism is currently very simple. The release process, from the configuration
management tool (e.g. Subversion3) will place a new release in an empty directory (e.g. a
Subversion tag path). The delivery process attempts to identify the release ID, in the following
example this value has been placed in $DEL_NAME.
1. git checkout -b $DEL_NAME EMPTY # We create a new empty branch for delivery
2. cp -a $DEL_LOC/. . # assumes files are in $DEL_LOC
3. git add .
4. git commit -m "Accepting delivery $DEL_NAME"
5. git tag DELIVERY_${DEL_NAME} # Optional but recommended
Deploying a new release to an environment
This is a multi-stage process
Deployment Overview
On staging
1. Merge the appropriate release branch into the application branch. NB This will use the
custom merge driver and overwrite application files
2. Push the release files to the remote application environment repository
On the runtime environment
3. Deliver files to the runtime environment from the local repository
4. Configure application (not part of diG)
5. Add the configuration changes to the local repository
On Staging
6. Pull any changes back from the environment repository
Push release files to remote application environment repository
1. Merge a release onto a environment
git merge --stat --commit -m "Adding delivery $DEL_NAME to environment
$ENV_NAME" DELIVERY_$DEL_NAME
2. Move files to remote environment repo. NB This does not touch the current runtime
environment
git push --verbose --force $envID
Deliver files to the runtime environment from the local repository
Push application files from the new delivery onto the application environment
git reset --hard HEAD
At this point configuration and installation processes would be executed
2 http://www.kernel.org/pub/software/scm/git/docs/gitignore.html
3 http://subversion.tigris.org/
7. Add the configuration changes to the local repository
Add any configuration changes back into the application repository
1. git add .
2. git commit –m “Message”
Pull any changes back from the environment repository
Bring configuration changes back to staging so that the staging repository is the single source of
truth
git fetch <envID> HEAD
Change Auditing
Application environments may be monitored using a combination of 'git status' and 'git diff'. When
changes are detected they can removed with 'git reset --hard HEAD' and 'git clean -fd'
Reports
Reports will be specific to the requirements of each organisation, however Git does maintain
complete records of all files contents and their movement. By identifying artefact's unique sha1
name it is possible to track a complete deployment lifecycle. Reports can created by running scripts
with Git commands and there are many examples across the Internet. Reports are made easier by
using a consistent naming conventions in branches and tags.
Example report
#! /bin/bash
# Offer a list of previous deployments and display their contents
select rel_id in `git tag -l DEPLOYED*` quit; do
case $rel_id in
quit) break;;
*) git log --summary $rel_id;;
esac
done
Conclusion
diG has limitations but is easy to set up and use. In an environment where no provision has been
made to manage the deployment of file-system objects this solution it can be a convenient way to
set up a deployment and audit solution.
This work has been tested on Linux, Solaris 8 and Windows. So far there have been no portability
issues.
Further work
• Development and refinement of the workflow for various deployment requirements and
application types
• Scalability testing in order to discover the limits at which diG become impractical to use
• Integration of change audit on none file-system objects (e.g. database schema objects)
Possibly via checksum storage
• Further automation and integration of pre and post-deployment scripts and Git hook scripts
(c.f. http://toroid.org/ams/git-website-howto)
• A mechanism for a release to delete objects (files, database tables etc) from an environment
• A documented security model
• Investigation of improvements possible by placing the various Git repositories on shared
8. network storage. This could result in major savings on storage (Git needs only store each
unique object version once)
References and Further Reading
Git
1. Git explained on Wikipedia http://en.wikipedia.org/wiki/Git_(software)
2. Git Home page by Petr Baudis http://git.or.cz/index.html
3. Git SCM page by Scott Chacon http://git-scm.com/
4. Git Wiki http://git.or.cz/gitwiki/FrontPage
5. Git FAQ http://git.or.cz/gitwiki/GitFaq
6. Packaging Software Using Git http://www.golden-
gryphon.com/software/misc/packaging.html
Alternative Open Source Products
NB This list is illustrative only. There are doubtless many other tools, apologies to any that have
been omitted.
1. Puppet http://reductivelabs.com/trac/puppet/wiki/AboutPuppet
2. Capistrano http://www.capify.org/
3. heroku http://heroku.com/
4. Fabric http://docs.fabfile.org/
5. Vlad the Deployer http://rubyhitsquad.com/Vlad_the_Deployer.html
6. ControlTier http://open.controltier.com/
7. Sun N1 Service Provisioning System (N.B. free of cost, but restricted license)
http://www.sun.com/software/products/service_provisioning/
Acknowledgements
This paper was written with the help of Marilyn Clews and Dr Nithya Rae
License and Availability
This work is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported License.
To view a copy of this licence, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter
to Creative Commons, 171 Second Street, Suite 300, San Francisco, California 94105, USA
This paper and the any examples are freely available from
http://github.com/alecthegeek/diG/tree/master. Contributions and comments are welcome
About the Author
Alec is principal at Voga Consulting Services (http://voga.com.au), a boutique consultancy that
specialises in Application Life Cycle Management (ALM). He drinks too much espresso and
probably spends too much time on various social websites as alecthegeek or Alec Clews. LinkedIn
profile http://www.linkedin.com/in/alecclews