DevOps
A cultural and tooling movement where developers and operations share responsibility for software throughout its lifecycle. Tight feedback loops between “it compiles on my machine” and “it’s running in prod at 3am.”
Why?
A system you can’t deploy, monitor, or roll back isn’t really operable. “10× faster in a benchmark” is undone by a manual release process that takes a week. Also, if ops is “someone else’s problem,” operational tickets never reach the top of the backlog.
Core practices:
- Continuous Integration: merge changes to main frequently; run automated tests on every PR
- Continuous Delivery / Deployment: every green main is a release candidate; deployment to prod is a push-button (or automatic) operation
- Configuration as Code / Infrastructure as Code: all infra and settings checked into Git (Terraform, Ansible, Kubernetes YAML, Pulumi)
- Monitoring / Observability: metrics, logs, traces, alerts
- Incident response: runbooks, on-call, blameless postmortems
- Automation over toil: any task done twice should be a script
Adjacent terms: SRE (Site Reliability Engineering: Google’s engineering-driven take, adds error budgets and explicit SLOs), Platform Engineering (internal platform teams providing DevOps as a product to product teams). SRE is not a synonym for DevOps; it is usually described as one concrete way of implementing it.
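The error-budget idea reduces to simple arithmetic: the budget is whatever downtime the SLO still permits. A minimal sketch, assuming an availability SLO expressed as a fraction and a 30-day window; the function names and the window length are illustrative assumptions, not taken from any particular SRE tool.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Downtime (in minutes) an availability SLO allows over the window."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Budget left after the downtime already spent; negative means the SLO is blown."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


if __name__ == "__main__":
    # Hypothetical numbers: a 99.9% SLO with 30 minutes of downtime so far.
    print(round(error_budget_minutes(0.999), 1))
    print(round(budget_remaining(0.999, 30.0), 1))
```

A 99.9% SLO over 30 days leaves about 43.2 minutes of allowed downtime. One common use of the number: while budget remains, ship features; once it is spent, prioritize reliability work.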
From ECE459 L34: DevOps for P4P
Everything up to now was one-shot computation. DevOps covers keeping systems running: services generally available, responding whenever requests arrive.
The trend is away from strict dev vs. ops team separation. At a startup you can’t afford two teams anyway. Letting developers feel operational pain motivates better tooling.
As the company grows, a dedicated ops team may help, but dumping all operational work on them makes them a bottleneck. Happier outcome: dev teams solve their own ops problems instead of opening tickets.
Continuous Integration
Now table stakes. Nightly builds date from a time when building was slow and expensive. With version control, good tests, and scripted deployments, every commit goes through:
- pull from version control
- build
- run tests
- report results
Social convention: don’t break the build. Results are pushed to email, Slack, or Teams.
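The four steps above can be sketched as a tiny pipeline runner. This is a rough illustration rather than how any real CI server works: the `git`/`make` commands in `EXAMPLE_STEPS` are hypothetical placeholders for a project's own commands, and `report` only builds the message text, leaving delivery to email or chat out of scope.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class StepResult:
    name: str
    ok: bool
    output: str


# Hypothetical commands; substitute whatever the project actually uses.
EXAMPLE_STEPS = [
    ("pull", ["git", "pull", "--ff-only"]),
    ("build", ["make"]),
    ("test", ["make", "test"]),
]


def run_pipeline(steps):
    """Run each (name, argv) step in order, stopping at the first failure."""
    results = []
    for name, cmd in steps:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        ok = proc.returncode == 0
        results.append(StepResult(name, ok, proc.stdout + proc.stderr))
        if not ok:
            break  # later steps are meaningless once the build is broken
    return results


def report(results):
    """Summarize the run as text suitable for posting to a team channel."""
    status = "green" if all(r.ok for r in results) else "red"
    lines = [f"{'PASS' if r.ok else 'FAIL'}  {r.name}" for r in results]
    return f"build is {status}\n" + "\n".join(lines)


if __name__ == "__main__":
    print(report(run_pipeline(EXAMPLE_STEPS)))
```

Stopping at the first failing step keeps the report focused on the commit that broke the build, which is what the "don't break the build" convention relies on.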