What Does the Human Scalability of DevOps Mean?

NovelVista


Last updated 22/07/2021



Human Scalability of DevOps? What is that exactly?

While DevOps can work amazingly well for small engineering organizations, the practice can lead to considerable human/organizational scaling issues without careful thought and management. Let us first start with the definition of DevOps.

What is DevOps?

The term DevOps means different things to different people. Before we dive into our thinking on the subject, we believe it's essential to be clear about what DevOps means to most people.

Wikipedia defines DevOps as:

“DevOps (a clipped compound of “development” and “operations”) is a software engineering culture and practice that aims at unifying software development (Dev) and software operation (Ops). The main characteristic of the DevOps movement is to strongly advocate automation and monitoring at all steps of software construction, from integration, testing, releasing to deployment and infrastructure management. DevOps aims at shorter development cycles, increased deployment frequency, and more dependable releases, in close alignment with business objectives.”

We will be using a more specific version:

DevOps is the practice of developers being responsible for operating their services in production, 24/7. This includes development using shared infrastructure primitives, testing, on-call, reliability engineering, disaster recovery, defining SLOs, monitoring setup and alarming, debugging and performance analysis, incident root cause analysis, provisioning, and deployment, etc.
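To make one item in this definition concrete, defining SLOs, here is a minimal sketch of how a team might compute an availability SLO's remaining error budget. This is an illustration only: the 99.9% target and the request counts are hypothetical numbers, not prescriptions.

```python
# Minimal sketch: tracking an availability SLO and its error budget.
# The 99.9% target and the request counts below are hypothetical.

SLO_TARGET = 0.999  # fraction of requests that must succeed in the window

def error_budget_remaining(total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative means SLO is blown)."""
    allowed_failures = total_requests * (1 - SLO_TARGET)
    if allowed_failures == 0:
        return 0.0
    return 1.0 - (failed_requests / allowed_failures)

# Example: 10,000,000 requests this month with 4,000 failures.
# The budget allows 10,000 failures, so 60% of the budget remains.
print(f"{error_budget_remaining(10_000_000, 4_000):.0%}")
```

When the remaining budget approaches zero, a team practicing this definition of DevOps would typically slow feature work in favor of reliability work.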

The distinction between the Wikipedia definition and our definition (a development philosophy versus an operational process) is significant, and it is shaped by the individual business experience of various DevOps practitioners. Part of the DevOps "movement" is to introduce slow-moving "legacy" enterprises to the benefits of modern, highly automated infrastructure and development practices. These include things like: loosely coupled services, APIs, and teams; continuous integration; small iterative deployments from master; agile communication and planning; cloud-native elastic infrastructure; and so on.

Matt Klein, creator of Envoy Proxy, mentioned in one of his articles:

“For the last 10 years of my career, I have worked at hyper-growth Internet companies including AWS EC2, pre-IPO Twitter, and Lyft. Additionally, primarily due to creating and talking about Envoy, I’ve spent the last two years meeting with and learning about the technical architectures and organizational structures of myriad primarily hyper-growth Internet companies. For all of these companies, embracing automation, agile development/planning, and other DevOps “best practices” is a given as the productivity improvements are well understood. Instead, the challenge for these engineering organizations is how to balance system reliability against the extreme pressures of business growth, personnel growth, and competition.”

A brief history of operating Internet applications

Over the past roughly thirty years of what can be called the modern Internet era, Internet application development and operation have gone through three distinct phases.

  1. During the first phase, Internet applications were built and deployed similarly to how "shrink-wrapped" software was shipped. Three distinct job roles (development, quality assurance, and operations) would collaborate to move applications from development to production over typically extremely long engineering cycles. During this phase, each application was deployed in a dedicated data center or colocation facility, further requiring operations personnel familiar with the site-specific networking, hardware, and systems administration.
  2. During the second phase, led primarily by Amazon and Google in the late 90s and early 00s, Internet applications at fast-moving hyper-growth companies began to adopt practices like those of the modern DevOps movement (loosely coupled services, agile development and deployment, automation, etc.). These companies still ran their own (very large) data centers, but because of the scales involved could also begin developing centralized infrastructure teams to handle common concerns required by all services (networking, monitoring, deployment, provisioning, data storage, caching, physical infrastructure, etc.). Amazon and Google, however, never fully unified development job roles (Amazon via the Systems Engineer and Google via the Site Reliability Engineer), recognizing the differing skills and interests involved in each.
  3. During the third, or cloud-native, phase, Internet applications are now built from the ground up to depend on hosted elastic infrastructure, typically provided by one of the "big three" public clouds. Getting the product to market as fast as possible has always been the primary goal given the high likelihood of failure, but in the cloud-native era the base technology available "out of the box" allows a rate of iteration that dwarfs what came before. The other defining feature of companies started in this era has been eschewing the practice of hiring non-software-engineer roles. The available infrastructure base is so relatively robust that, they reason (we would argue correctly), startup headcount dollars are better spent on software engineers who can do both development and operations (DevOps).

The trend of not hiring dedicated operations staff at phase three companies is fundamentally significant. Although such a company clearly does not need full-time systems administrators to manage machines in a colocation facility, the type of person who would previously have filled that job typically also brought other "20%" skills, such as system debugging, performance profiling, operational intuition, and so on. New companies are being built with a workforce that lacks critical, not easily replaceable, skill sets.

Why does DevOps work well for modern Internet startups?

DevOps works extremely well for modern Internet startups, for a few reasons:

  1. Successful early-stage startup engineers are a rare breed of developer. They are risk-tolerant, extremely quick learners, comfortable getting things done as fast as possible regardless of the tech debt incurred, can often work across multiple systems and languages, and typically have prior experience with operations and systems administration, or are willing to learn as they go. In short, the typical startup engineer is uniquely suited to being a DevOps engineer, whether or not they want to call themselves one.
  2. Modern public clouds provide an incredible infrastructure base to build on. Most basic operational tasks of the past have been automated away, leaving a substrate that is sufficient to ship a minimum viable product and see whether there is product-market fit (see the provisioning sketch after this list).
  3. When engineers are forced to be on-call and accountable for the code they write, the quality of the system improves. No one likes to get paged. This feedback loop builds a better product, and as described in (1), the typical engineer attracted to an early-stage startup product is fully capable of learning and doing the operational work. This is especially true given that there is typically little repercussion for an early startup product having poor reliability. Reliability only needs to be good enough for the product to find market fit and enter the hyper-growth phase.
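To make point (2) above concrete, here is a minimal sketch of what "automated away" looks like: provisioning a server is a single API call against a public cloud rather than a hardware order and a trip to a colocation facility. This assumes the boto3 AWS SDK with credentials already configured; the AMI ID is a placeholder.

```python
# Minimal sketch: provisioning a server is now one API call.
# Assumes AWS credentials are configured; the AMI ID is a placeholder.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder image ID
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])  # the newly provisioned instance
```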

What happens when a modern Internet startup undergoes hyper-growth?

Most startups fail. That is the reality. As such, any early startup that is spending a lot of time building infrastructure in the image of Google is simply wasting time. We generally advise people to stick with their monolithic architecture and not worry about anything else until human scalability issues (communication, planning, tight coupling, etc.) require a move towards a more decoupled architecture.

So what happens when a modern (phase three) Internet startup finds success and enters hyper-growth? A few interesting things start happening at the same time:

  1. The rate of personnel growth rapidly increases, causing severe strains on communication and engineering efficiency. We strongly recommend reading The Mythical Man-Month (still largely relevant almost 50 years after its initial publication) for more on this topic.
  2. The above almost always results in a move from a monolithic to a microservice architecture as a way to decouple development teams and gain greater communication and engineering efficiency.
  3. The move from a monolithic to a microservice architecture increases system infrastructure complexity by several orders of magnitude. Networking, observability, deployment, library management, security, and hundreds of other concerns that were not difficult before are now serious problems that must be solved (see the retry sketch after this list).
  4. At the same time, hyper-growth means traffic growth and the resultant technical scaling issues, as well as greater repercussions for both complete failure and minor customer experience issues.
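As a small illustration of point (3), a concern that is trivial inside a monolith, calling another component, becomes a networking problem between microservices, complete with timeouts, retries, and backoff. The sketch below shows the kind of boilerplate every service suddenly needs; the internal service URL is a hypothetical placeholder.

```python
# Minimal sketch: an in-process function call in a monolith becomes a
# network call between microservices, which needs timeouts and retries.
# The URL below is a hypothetical placeholder.
import time
import requests

def get_user(user_id: int, retries: int = 3) -> dict:
    for attempt in range(retries):
        try:
            resp = requests.get(
                f"https://users.internal.example.com/v1/users/{user_id}",
                timeout=2.0,  # never hang forever on a slow downstream
            )
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(2 ** attempt)  # exponential backoff between attempts
```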

Central infrastructure teams

Almost universally following the early startup phase, modern Internet hyper-growth companies end up structuring their engineering organizations similarly. This common structure consists of a central infrastructure team supporting a set of vertical product teams practicing DevOps (whether they call it that or not).

The reason the central infrastructure team is so common is that, as discussed above, hyper-growth brings with it an associated set of changes, both in people and in underlying technology, and in reality state-of-the-art cloud-native technology is still too difficult to use if every product engineering team has to independently solve common problems around networking, observability, deployment, provisioning, caching, data storage, and so on. As an industry, we are many years away from "serverless" technologies being robust enough to fully support highly reliable, large-scale, realtime Internet applications in which the entire engineering organization can largely focus on business logic.

Thus, the central infrastructure team is born to solve problems for the larger engineering organization above and beyond what the base cloud-native infrastructure primitives provide. Obviously, Google's infrastructure team is orders of magnitude larger than that of a company like Lyft, because Google is solving foundational problems starting at the data center level, while Lyft relies on a substantial number of publicly available primitives. The underlying reasons for creating a central infrastructure organization are the same in both cases, however: abstract as much infrastructure as possible so that application/product developers can focus on business logic.
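As an illustration of that abstraction, a central infrastructure team often exposes a thin, opinionated wrapper over raw cloud primitives so product teams do not each re-solve deployment. The sketch below is entirely hypothetical: `deploy_service` and everything inside it are invented names standing in for a real CI system, registry, orchestrator, and monitoring stack.

```python
# Hypothetical sketch of a primitive a central infrastructure team might
# expose. All names here are invented for illustration; real internals
# would call CI, a container registry, an orchestrator, and monitoring.

def build_image(name: str, version: str) -> str:
    return f"registry.internal/{name}:{version}"  # stub: CI builds this

def roll_out(name: str, image: str, replicas: int) -> None:
    print(f"rolling out {replicas}x {image} for {name} with health checks")

def register_alerts(name: str, slo_target: float) -> None:
    print(f"default alerting for {name} at {slo_target:.1%} availability")

def deploy_service(name: str, version: str, replicas: int = 3) -> None:
    """The one call a product team makes; the infra team owns the internals."""
    image = build_image(name, version)
    roll_out(name, image, replicas=replicas)
    register_alerts(name, slo_target=0.999)

deploy_service("checkout", "1.42.0")  # a product team's entire release surface
```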

The fallacy of fungibility

Finally, we arrive at the idea of "fungibility," which is at the heart of the failure of the pure DevOps model when organizations scale beyond a certain number of engineers. Fungibility is the idea that all engineers are created equal and can do all things. Whether stated as an explicit hiring goal (as at least Amazon does, and perhaps others) or made evident by "Bootcamp"-like hiring practices in which engineers are hired without a team or role in mind, fungibility has become a popular part of modern engineering philosophy at many companies over the last 10–15 years. Why is that?

  1. As we already described, modern cloud-native technology allows extremely feature-rich applications to be built on increasingly sophisticated infrastructure abstractions. Naturally, some specialist skills, such as data center design and operations, are no longer required at most companies.
  2. Over the last 15 years, the industry has embraced the idea that software engineering is the root of all disciplines. For example, Microsoft phased out the traditional QA engineer and replaced it with the Software Test Engineer, the idea being that manual QA is not efficient and all testing should be automated. Similarly, traditional operations roles have been replaced with site reliability engineering (or similar), the idea being that manual operations are not efficient and the only way to scale is through software automation. To be clear, we agree with these trends. Automation is a more efficient way to scale.

However, this idea taken to its extreme, as many newer Internet startups have done, has resulted in only generalist software engineers being hired, with the expectation that these (DevOps) engineers can handle development, QA, and operations.

Fungibility and generalist hiring typically work fine for early startups. Beyond a certain size, however, the idea that all engineers are interchangeable becomes almost absurd, for the following reasons:

  1. Generalists versus specialists. More complex applications and architectures require more specialist skills to succeed, whether that be frontend, infrastructure, client, operations, testing, data science, etc. This does not imply that generalists are no longer useful or that generalists cannot learn to become specialists, it just means that a larger engineering organization requires different types of engineers to succeed.
  2. All engineers do not like doing all things. Some engineers like being generalists. Some like specializing. Some like writing code. Some like debugging. Some like UI. Some like systems. A growing engineering organization that supports specialists has to grapple with the fact that employee happiness sometimes involves working on certain types of problems and not others.
  3. All engineers are not good at doing all things. Throughout our careers, we have met many amazing engineers. Some of them can start with empty files in an IDE and create an incredible system from scratch. At the same time, these same people often have little intuition for how to run reliable systems, how to debug them, or how to monitor them. Conversely, we have been on many infuriating interview loops in which a truly incredible operations engineer, who could add tremendous value to the overall organization purely via expertise in debugging and innate intuition on how to run reliable systems, was rejected for not demonstrating "sufficient coding skills."

Ironically and hypocritically, organizations such as Amazon and Facebook prioritize fungibility in software engineering roles, yet clearly value the split (but still overlapping) skill sets of development and operations by continuing to offer distinct career paths for each.

The breakdown

How, and at what organization size, does pure DevOps break down? What goes wrong?

  1. Move to microservices. By the time an engineering organization reaches ~75 people, there is almost certainly a central infrastructure team in place starting to build common substrate features required by product teams building microservices.
  2. Pure DevOps. At the same time, product teams are being told to do DevOps.
  3. Reliability consultants. At this organization size, the engineers who have gravitated towards working on infrastructure are very likely the same engineers who either have previous operational experience or good intuition in that area. Inevitably, these engineers become de facto SRE/production engineers and help the rest of the organization as consultants while continuing to build the infrastructure required to keep the business running.
  4. Lack of education. As an industry, we now expect to hire people who can step in and both develop and operate Internet services. However, we almost universally do a terrible job of the new-hire and continuing education required to perform this task. How can we expect engineers to have operational intuition when we never teach those skills?
  5. Support breakdown. As the engineering organization's hiring rate continues to ramp, there comes a point at which the central infrastructure team can no longer both build and operate the infrastructure critical to business success while also carrying the support burden of helping product teams with operational tasks. The central infrastructure engineers are pulling double duty as organization-wide SRE consultants on top of their existing workload. Everyone understands that education and documentation are critical, but scheduling time to work on those two things is rarely prioritized.
  6. Burnout. Worse, the situation previously described takes a human toll and reduces morale across the entire organization. Product engineers feel they are being asked to do things they either don't want to do or have not been taught to do. Infrastructure engineers begin to burn out under the weight of providing support, knowing that education and documentation are needed but unable to prioritize creating them, all the while keeping existing systems across the company running with high reliability.

At a certain engineering organization size, the wheels start coming off the bus, and the organization begins to have human scaling issues with a pure DevOps model supported by a central infrastructure team. We would argue that this size depends on the current maturity of public cloud-native technology, and as of this writing it is somewhere in the low hundreds of total engineers.

Is there a middle ground between the “old way” and the DevOps way?

For companies older than about ten years, the site reliability or production engineering model has become common. Although implementations vary across companies, the idea is to employ engineers who can focus entirely on reliability engineering while not being beholden to product managers. Some of the implementation details are very important, however, and these include:

  1. Are SREs on-call by themselves or do software engineers share the on-call burden?
  2. Are SREs doing actual engineering and automation or are they being required to perform only manual and repetitive tasks such as deployments, recurring page resolution, etc.?
  3. Are SREs part of a central consulting organization or are they embedded within product teams?

The success of the program, and its effect on the overall engineering organization, often hinges on the answers to the above questions. However, we firmly believe that at a certain size, the SRE model is the only effective way to scale an engineering organization beyond the point where a pure DevOps model breaks down. In fact, we would argue that successfully bootstrapping an SRE program well in advance of the human scaling limits outlined here is a fundamental responsibility of the engineering leadership of a modern hyper-growth Internet company.

What is the right SRE model?

Given the plethora of models currently implemented in the industry, there is no single right answer to this question, and all models have their holes and resultant issues. We will outline what we think the sweet spot is, based on observations over the last ten years:

  1. Recognize that operations and reliability engineering is a discrete and hugely valuable skill set. Our rush to automate everything and the idea that software engineers are fungible are marginalizing a subset of the engineering workforce that is equally (if not more!) valuable. An operations engineer doesn't have to be comfortable starting from empty source files, just as a software engineer doesn't have to be comfortable debugging and firefighting during a stressful outage. Operations engineers and software engineers are partners, not interchangeable cogs.
  2. SREs are not on-call, dashboard, and deploy monkeys. They are software engineers who focus on reliability tasks, not product tasks. An ideal structure requires all engineers to perform basic operational tasks including on-call, deployments, monitoring, etc. We think this is critically important, as it helps avoid class/job stratification between reliability and software engineers and makes software engineers more directly accountable for product quality.
  3. SREs should be embedded into product teams, while not reporting to the product team engineering manager. This allows the SREs to scrum with their team, gain mutual trust, and still have appropriate checks and balances in place such that a real conversation can take place when attempting to weigh reliability versus features.
  4. The goal of embedded SREs is to increase the reliability of their products by implementing reliability-oriented features and automation (see the sketch after this list), mentoring and educating the rest of the team on operational best practices, and acting as a liaison between product teams and infrastructure teams (feedback on documentation, pain points, needed features, etc.).
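As an example of the "reliability-oriented features and automation" in point (4), an embedded SRE might turn a manual runbook step ("if errors spike after a deploy, roll back") into code. The sketch below is hypothetical: fetch_error_rate and rollback are invented stand-ins for a real metrics API and a real deploy system.

```python
# Hypothetical sketch of embedded-SRE automation: watch a fresh deploy
# and roll it back automatically if the error rate crosses a threshold.
# fetch_error_rate and rollback are invented stand-ins for real systems.
import time

ERROR_RATE_THRESHOLD = 0.05   # roll back if more than 5% of requests fail
CHECK_INTERVAL_SECONDS = 30
CHECKS = 10                   # watch the new version for five minutes

def fetch_error_rate(service: str) -> float:
    return 0.01  # stub: query the monitoring system here

def rollback(service: str) -> None:
    print(f"rolling back {service} to the previous version")  # stub deploy call

def watch_deploy(service: str) -> bool:
    """Return True if the deploy stayed healthy; roll back and return False otherwise."""
    for _ in range(CHECKS):
        if fetch_error_rate(service) > ERROR_RATE_THRESHOLD:
            rollback(service)
            return False
        time.sleep(CHECK_INTERVAL_SECONDS)
    return True

watch_deploy("checkout")
```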


Conclusion

Very few companies reach the hyper-growth stage at which point this post is directly applicable. For many companies, a pure DevOps model built on modern cloud-native primitives may be entirely sufficient given the number of engineers involved, the system reliability required, and the product iteration rate the business requires.

For the relatively few companies for which this post does apply, the key takeaways are:

  1. DevOps-style agile development and automation are required for any new technology company that hopes to compete.
  2. Publicly available cloud-native primitives along with a small central infrastructure team can allow an engineering organization to scale to hundreds of engineers before the operational toll due to lack of education and role specificity starts to emerge.
  3. Getting ahead of the operational human scaling issues requires a real investment in new hire and continuing education, documentation, and the development of an embedded SRE team that can form a bridge between product teams and infrastructure teams.

Modern hyper-growth Internet companies have (in our opinion) an egregiously large amount of burnout, primarily due to grueling product demands coupled with a lack of investment in operational infrastructure. We believe it is possible for engineering leadership to buck the trend by getting ahead of operations before it becomes a major impediment to organizational stability.

While newer companies might be under the illusion that advancements in cloud-native automation are making the traditional operations engineer obsolete, this could not be further from the truth. For the foreseeable future, even while making use of the latest available technology, engineers who specialize in operations and reliability should be recognized and valued for offering critical skillsets that cannot be obtained any other way, and their vital roles should be formally structured into the engineering organization during the early growth phase.
