The Infra+ Team forms the backbone of engineering. We build reliable and scalable infrastructure for other engineering teams to build services on top of.
The Infra+ team sits at the centre of all other engineering teams and provides abstractions that let them build and release reliable services which impact millions of users. To this end, the team lives by these mantras:
- Automate all moving parts of the infrastructure
- Instrument and measure the systems we use daily
- Treat optimization as a feature
- Incorporate scale as a first-class citizen from day one
We believe these mantras let us scale our impact to be larger than ourselves. Some concrete examples of projects the team has undertaken in the past:
- Migrated our entire AWS cluster (50+ hosts in early 2016) from US East to the Mumbai region in one day, after the Mumbai region was launched
- Optimized and scaled the infrastructure to support peak sale periods throughout the year. This required optimizations at every level of the stack. Read about our optimizations in depth here
- Built the conveyor belt for developers to iterate faster and maximize productivity. It's currently a mix of Ansible and custom integrations with Jenkins
- Built an infrastructure pipeline using Terraform, Jenkins, Packer and Ansible that takes care of every step from provisioning to deployment of services. This has allowed our infrastructure to scale to 100+ hosts on AWS
- Implemented Auto Scaling for all our customer-facing services
We are currently rebuilding our data infrastructure. It will power many real-time, data-driven features across all our products and give all our teams a unified abstraction for analyzing our internal datasets.
Fynd is growing quickly across a number of dimensions, and this growth leads to many interesting challenges as we strive to become India's largest O2O fashion retail destination. The solutions that worked for us a year ago no longer work, or work less effectively than they once did. The best practices and patterns we thought everyone knew are now growing and diverging depending on the use case. This means we have to tackle incredibly complex challenges while “replacing the engine on the plane, while in flight”. All of our services must stay up and running, yet we have to keep making the underlying systems more available, robust, extensible, secure and usable. We're just getting started.