2. Problem Statement
• The Santa team uses a microservice architecture, comprising 12-13
microservices, to provide offers to our customers. These services are
continuously upgraded and modified as requirements change.
• We needed a system that requires minimal developer effort to
build and deploy services to the testing and production
environments.
• The system should ensure that no bad deployment reaches
production.
3. Analysis and Alternatives
• Currently, all teams at Flipkart use the fk-ops buildserver to build
and deploy.
Problems with Build-Server
• Is single-threaded.
• Does not provide resumability.
• Does not provide any aggregated view of the deployment process.
• Requires tech-ops involvement to resolve issues such as running
out of disk space.
4. Analysis and Alternatives
What we achieved
• We built a generic system that, with only a few per-service
parameters, has been replicated for all of Santa's microservices; the
same can be done for services across Flipkart.
• The user provides a few parameters, and the system does the
rest.
• The system can be modified very easily to accommodate future
needs.
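The slides do not list the parameters a service owner actually supplies. As a minimal sketch, assuming a hypothetical set of fields (service_name, git_repo, package_name, hosts, canary_count are illustrative names, not taken from the real system), the per-service input could be a small validated record like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDeployConfig:
    """Hypothetical per-service parameters for the generic pipeline.

    Field names are illustrative assumptions, not the actual parameters.
    """
    service_name: str        # name of the microservice being deployed
    git_repo: str            # source repository Jenkins would build from
    package_name: str        # Debian package produced by the build
    hosts: tuple             # machines the deployment step targets
    canary_count: int = 1    # machines used for the post-deploy sanity test

    def __post_init__(self) -> None:
        # Reject configs the pipeline could not act on.
        if not self.hosts:
            raise ValueError("at least one host is required")
        if not 1 <= self.canary_count <= len(self.hosts):
            raise ValueError("canary_count must be between 1 and len(hosts)")
```

Keeping every service-specific detail in one record like this is what lets the same pipeline be reused unchanged across services.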
5. Implementation
• The system uses Jenkins for workflow management and Ansible
for deployment.
• Jenkins builds the project and uploads the resulting Debian
package to the repo service.
• It then triggers Ansible to perform the deployment.
• Ansible ensures that the package is installed on every specified
machine and that logs are generated for each step it performs.
• If package installation fails on a machine, Ansible also takes that
machine out of rotation.
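The flow above (build, upload, install per machine, take failed machines out of rotation, log every step) can be sketched in Python. This is a minimal sketch, not the actual Jenkins job or Ansible playbook: build, upload, install and take_out_of_rotation are hypothetical callables standing in for the real Jenkins and Ansible actions, and StepFailed is an illustrative name.

```python
import logging
from typing import Callable, List, Sequence

log = logging.getLogger("deploy")


class StepFailed(Exception):
    """Raised when a named pipeline step fails; records which step it was."""

    def __init__(self, step: str, cause: Exception):
        super().__init__(f"step '{step}' failed: {cause}")
        self.step = step


def run_step(name: str, action: Callable[[], object]) -> object:
    """Run one step, logging its start/end and tagging any failure with its name."""
    log.info("starting %s", name)
    try:
        result = action()
    except Exception as exc:
        log.error("%s failed: %s", name, exc)
        raise StepFailed(name, exc) from exc
    log.info("finished %s", name)
    return result


def deploy_pipeline(
    build: Callable[[], str],                    # hypothetical: returns package name
    upload: Callable[[str], None],               # hypothetical: push to repo service
    install: Callable[[str, str], None],         # hypothetical: install on one host
    take_out_of_rotation: Callable[[str], None], # hypothetical: remove host from LB
    hosts: Sequence[str],
) -> List[str]:
    """Build, upload, then install on each host; return hosts taken out of rotation."""
    package = run_step("build", build)
    run_step("upload", lambda: upload(package))
    failed: List[str] = []
    for host in hosts:
        try:
            run_step(f"install on {host}", lambda h=host: install(h, package))
        except StepFailed:
            # Installation failed: keep this machine from serving traffic.
            run_step(f"rotation-out {host}", lambda h=host: take_out_of_rotation(h))
            failed.append(host)
    return failed
```

Because every action goes through run_step, a failure always carries the name of the step that raised it, which is the kind of step-level pinpointing the safety nets rely on.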
6. Safety Nets
• Verifies that the built package has been uploaded to the repo
service.
• Checks the version of the installed package against the latest
version available on the repo service.
• Picks a few machines to run sanity tests on the new deployment,
such as checking the number of failures and the change in latency.
• Ensures that the new package version does not degrade the
service's performance.
• If an error occurs, it pinpoints the exact step where it occurred,
making debugging easier.
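Two of these checks, the version comparison and the canary sanity test, can be sketched as simple predicates. This is a minimal sketch under stated assumptions: the Metrics fields and both thresholds (a 1 percentage point rise in failure rate, a 10% rise in p99 latency) are hypothetical values, not ones used by the real system, and the version check is plain string equality; a fuller check could order Debian versions with `dpkg --compare-versions`.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical per-machine metrics sampled before/after a deployment."""
    requests: int
    failures: int
    p99_latency_ms: float


def version_matches(installed: str, latest_in_repo: str) -> bool:
    """Installed version must be exactly the latest one on the repo service."""
    return installed == latest_in_repo


def failure_rate(m: Metrics) -> float:
    if m.requests <= 0:
        # No traffic means the canary cannot be judged either way.
        raise ValueError("no requests observed; cannot evaluate canary")
    return m.failures / m.requests


def canary_passes(
    before: Metrics,
    after: Metrics,
    max_failure_rate_increase: float = 0.01,  # hypothetical threshold
    max_latency_increase: float = 0.10,       # hypothetical: +10% p99 latency
) -> bool:
    """Return True if the new version did not noticeably degrade the canary."""
    if failure_rate(after) - failure_rate(before) > max_failure_rate_increase:
        return False
    if before.p99_latency_ms > 0:
        relative = (after.p99_latency_ms - before.p99_latency_ms) / before.p99_latency_ms
        if relative > max_latency_increase:
            return False
    return True
```

In a pipeline like the one described, a failed canary would stop the rollout before the remaining machines receive the new package.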
11. Conclusions
• One-click build + deployment has been successfully implemented
and tested.
• This system is much more flexible and extensible than the one
currently in use.
• A basic infrastructure with build, deployment and safety-net
features is in place.
Future Scope:
Since the system is generic, it can easily be adapted for other
departments too.
As we gain experience, more safety nets can be incorporated into the
workflow.