Optimizing Cloud Infrastructure: How Former03 Achieved Operational Excellence with AWS

Key Challenges

Former03 faced infrastructure inefficiencies as application traffic grew, including false auto-scaling triggers, unpredictable cloud costs, ineffective IP-based rate limiting in VPN environments, and application instability caused by under-sized instances. These issues increased operational overhead, consumed engineering time, and threatened strict uptime SLAs for enterprise clients.

Key Results

By optimizing AWS auto-scaling, observability, rate limiting, and instance sizing, Former03 reduced false scaling events by 80%, achieved predictable monthly cloud costs, and eliminated application crashes. Engineering time spent on troubleshooting dropped by 70%, annual infrastructure savings reached €42,000, and €500,000 in at-risk annual recurring revenue was secured through improved reliability and SLA compliance.

Overview

Former03 GmbH is a Munich-based digital agency specialising in sophisticated web development, UX/UI design, and multimedia solutions for enterprise clients across Germany. Founded in 2003, this established SMB has built a reputation serving high-profile clients including DATEV (financial software), Volkswagen's Elli (EV charging solutions), Dallmayr (premium retail), and PONS (publishing). Former03 delivers mission-critical web applications and APIs that demand enterprise-grade reliability, scalability, and performance.

As their client portfolio expanded to include more complex, high-traffic applications, their AWS infrastructure began showing critical inefficiencies in auto-scaling behaviour and API rate limiting that threatened service reliability and client satisfaction.

Challenges

Former03 faced critical infrastructure reliability and cost optimisation challenges that threatened their enterprise client relationships. Their auto-scaling configuration triggered unnecessary scaling events because its CloudWatch alarm threshold was set at 6 MB of network output, so legitimate 8 MB JSON payloads from client applications raised false alarms. The result was infrastructure costs that fluctuated by €3,500 per month and inefficient resource allocation across the staging (1-3 instances) and production (1-5 instances) environments.

The company's rate limiting was based solely on IP addresses, which made it ineffective when multiple users connected through shared VPN connections: all of their traffic appeared to originate from a single NAT IP. Without per-user identification, throttling was unfair, because one heavy user could exhaust the limit for everyone behind the same address. This degraded service for legitimate client requests and put €500,000 in annual recurring revenue from major accounts at risk.
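
The failure mode is easy to reproduce. The minimal Python sketch below uses hypothetical users, client IDs, and a documentation-range IP address, and counts one minute of traffic in a simple fixed window, first keyed by source IP and then by a per-client identifier. Only the counting key changes between the two runs; the 100-requests-per-minute limit is the figure quoted later in this case study.

```python
from collections import Counter

LIMIT_PER_WINDOW = 100  # requests per minute, the per-client limit quoted in the case study


def throttled_users(requests, key):
    """Return the users who had at least one request rejected when a
    fixed-window limiter counts requests per `key` ("ip" or "client_id")."""
    counts = Counter()
    rejected = set()
    for req in requests:
        counts[req[key]] += 1
        if counts[req[key]] > LIMIT_PER_WINDOW:
            rejected.add(req["user"])
    return rejected


# Hypothetical traffic within one minute: three users share one VPN exit (NAT) IP.
NAT_IP = "203.0.113.10"  # documentation address range (RFC 5737)
traffic = (
    [{"user": "heavy", "client_id": "client-heavy", "ip": NAT_IP}] * 150
    + [{"user": "alice", "client_id": "client-alice", "ip": NAT_IP}] * 10
    + [{"user": "bob", "client_id": "client-bob", "ip": NAT_IP}] * 10
)

print(sorted(throttled_users(traffic, "ip")))         # ['alice', 'bob', 'heavy']
print(sorted(throttled_users(traffic, "client_id")))  # ['heavy']
```

Keyed by IP, the two light users are rejected simply because they arrived after the heavy user had used up the shared quota; keyed by client ID, only the client that actually exceeded its limit is throttled.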

Their lean technical team was spending 120 hours a month troubleshooting false alarms and making manual infrastructure interventions, diverting critical resources from billable client work. In addition, application crashes on instances with only 2 GB of RAM were affecting service availability for enterprise clients whose SLAs demanded 99.9% uptime.

Solution

Ankercloud partnered with Former03 to implement a comprehensive infrastructure optimisation strategy addressing operational excellence, reliability, performance efficiency, and cost optimization, four of the six pillars of the AWS Well-Architected Framework.

We moved from network-based to CPU- and memory-based auto-scaling policies and deployed the CloudWatch agent to collect system-level metrics at 1-minute granularity; the agent is needed because EC2 does not report memory utilisation to CloudWatch on its own. The network-output alarm threshold was recalibrated from 6 MB to above 8 MB, eliminating false triggers from legitimate payloads while keeping the group responsive to genuine traffic surges. We also integrated Grafana and Prometheus for enhanced observability and proactive monitoring.
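
As a rough illustration of this kind of setup, the Python sketch below builds two things: a CloudWatch agent configuration that publishes `mem_used_percent` every 60 seconds, tagged with the Auto Scaling group name, and two target-tracking policies in the parameter shape of boto3's `autoscaling.put_scaling_policy`. The policy names and the 60% CPU / 70% memory targets are assumptions for illustration only; the case study does not state the values Former03 actually uses.

```python
import json

# Hypothetical targets; the real thresholds are not published in the case study.
CPU_TARGET_PERCENT = 60.0
MEMORY_TARGET_PERCENT = 70.0


def cloudwatch_agent_config() -> dict:
    """CloudWatch agent config: publish memory usage every 60 s, tagged by ASG."""
    return {
        "agent": {"metrics_collection_interval": 60},
        "metrics": {
            "namespace": "CWAgent",
            "append_dimensions": {
                "InstanceId": "${aws:InstanceId}",
                "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
            },
            # Also publish a per-group aggregate that a scaling policy can track.
            "aggregation_dimensions": [["AutoScalingGroupName"]],
            "metrics_collected": {"mem": {"measurement": ["mem_used_percent"]}},
        },
    }


def target_tracking_policies(asg_name: str) -> list:
    """Keyword arguments for two boto3 autoscaling.put_scaling_policy calls."""
    cpu = {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "cpu-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": CPU_TARGET_PERCENT,
        },
    }
    memory = {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "memory-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            # Metric published by the CloudWatch agent configured above.
            "CustomizedMetricSpecification": {
                "MetricName": "mem_used_percent",
                "Namespace": "CWAgent",
                "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
                "Statistic": "Average",
            },
            "TargetValue": MEMORY_TARGET_PERCENT,
        },
    }
    return [cpu, memory]


def apply_policies(asg_name: str) -> None:
    """Create or update both policies (requires boto3 and AWS credentials)."""
    import boto3

    client = boto3.client("autoscaling")
    for params in target_tracking_policies(asg_name):
        client.put_scaling_policy(**params)


if __name__ == "__main__":
    print(json.dumps(cloudwatch_agent_config(), indent=2))
```

With two target-tracking policies on the same group, EC2 Auto Scaling scales out when either metric is above its target and scales in only when both policies allow it, so adding the memory policy guards against scaling in while memory is still tight. The agent configuration would be saved as a JSON file on each instance and loaded with `amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:<path> -s`.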

For rate limiting, we implemented a header-based architecture using Lambda authorizers that inspect a custom X-Client-ID header, with DynamoDB storing per-client request counters. Because requests are counted per client ID rather than per source IP, this solution enforces granular limits (100 requests per minute per client) regardless of VPN configuration, ensuring fair API access for all users.
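
A minimal Python sketch of such an authorizer follows. It assumes an API Gateway REST API with a REQUEST-type Lambda authorizer (so the event carries `headers` and `methodArn`), a hypothetical DynamoDB table `api-rate-limits` with partition key `client_id` and sort key `window_start`, and a simple fixed one-minute window. The case study does not describe the actual table design or windowing, so all of these names are illustrative. The counter function is injectable so the logic can be exercised without AWS.

```python
import os
import time
from functools import lru_cache
from typing import Callable, Optional

# Limit from the case study; table, key, and attribute names are hypothetical.
RATE_LIMIT_PER_MINUTE = 100
WINDOW_SECONDS = 60
TABLE_NAME = os.environ.get("RATE_LIMIT_TABLE", "api-rate-limits")


def get_header(headers: Optional[dict], name: str) -> Optional[str]:
    """Case-insensitive header lookup (clients do not agree on header casing)."""
    for key, value in (headers or {}).items():
        if key.lower() == name.lower():
            return value
    return None


@lru_cache(maxsize=1)
def _dynamodb():
    import boto3  # imported lazily so the logic below can be tested without AWS

    return boto3.client("dynamodb")


def dynamodb_increment(client_id: str, window_start: int) -> int:
    """Atomically add 1 to this client's counter for the current window."""
    response = _dynamodb().update_item(
        TableName=TABLE_NAME,
        Key={"client_id": {"S": client_id}, "window_start": {"N": str(window_start)}},
        # expires_at can serve as the table's TTL attribute to purge old windows.
        UpdateExpression="SET expires_at = :exp ADD request_count :one",
        ExpressionAttributeValues={
            ":one": {"N": "1"},
            ":exp": {"N": str(window_start + 2 * WINDOW_SECONDS)},
        },
        ReturnValues="UPDATED_NEW",
    )
    return int(response["Attributes"]["request_count"]["N"])


def policy(principal_id: str, effect: str, resource: str) -> dict:
    """IAM policy in the shape API Gateway expects back from an authorizer."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": effect, "Resource": resource}
            ],
        },
    }


def make_handler(
    increment: Callable[[str, int], int] = dynamodb_increment,
    clock: Callable[[], float] = time.time,
):
    def handler(event: dict, context=None) -> dict:
        resource = event["methodArn"]
        client_id = get_header(event.get("headers"), "X-Client-ID")
        if not client_id:
            return policy("anonymous", "Deny", resource)
        # Fixed one-minute window keyed on the client ID, not the source IP.
        window_start = int(clock()) // WINDOW_SECONDS * WINDOW_SECONDS
        count = increment(client_id, window_start)
        effect = "Allow" if count <= RATE_LIMIT_PER_MINUTE else "Deny"
        return policy(client_id, effect, resource)

    return handler


lambda_handler = make_handler()
```

Two points would need checking against a real deployment. Authorizer result caching has to be disabled (TTL of 0), otherwise API Gateway reuses a cached Allow or Deny instead of invoking the counter on every request. A Deny from an authorizer also reaches the client as 403 rather than the conventional 429. Because the header is supplied by the caller, it is only a fair-use key if it is tied to an authenticated identity.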

Infrastructure right-sizing upgraded production instances from 2 GB to 8 GB of RAM (c7g.xlarge compute-optimized instances), eliminating application crashes. We also added lifecycle hooks so that instances being terminated during scale-in first drain their in-flight jobs, which prevents jobs from getting stuck in the scheduler.
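
The drain step can be sketched as follows. The sketch assumes a termination lifecycle hook on the Auto Scaling group and a job scheduler that exposes hypothetical `stop_accepting` and `active_jobs` callbacks; the case study does not name the scheduler. How the worker learns that its instance is terminating (for example from an EventBridge event or the instance metadata target lifecycle state) is left out. The hook name, timeouts, and callback names are all illustrative.

```python
import time
from typing import Callable, Tuple

# Hypothetical hook settings; the real values are not given in the case study.
HOOK_NAME = "drain-jobs-before-terminate"
HEARTBEAT_TIMEOUT_SECONDS = 900


def lifecycle_hook_params(asg_name: str) -> dict:
    """Keyword arguments for boto3 autoscaling.put_lifecycle_hook."""
    return {
        "LifecycleHookName": HOOK_NAME,
        "AutoScalingGroupName": asg_name,
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
        "HeartbeatTimeout": HEARTBEAT_TIMEOUT_SECONDS,
        # If the worker never reports back, terminate anyway after the timeout.
        "DefaultResult": "CONTINUE",
    }


def drain_then_complete(
    stop_accepting: Callable[[], None],
    active_jobs: Callable[[], int],
    heartbeat: Callable[[], None],
    complete: Callable[[], None],
    poll_seconds: float = 10.0,
    heartbeat_every: float = 300.0,  # keep well below HEARTBEAT_TIMEOUT_SECONDS
    max_wait: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Stop taking jobs, wait for in-flight ones, then release the instance.

    Returns True if the instance drained, False if max_wait ran out first.
    """
    stop_accepting()
    start = last_beat = clock()
    drained = False
    while True:
        if active_jobs() == 0:
            drained = True
            break
        now = clock()
        if now - start >= max_wait:
            break
        if now - last_beat >= heartbeat_every:
            heartbeat()  # extends the hook's timeout while jobs are still running
            last_beat = now
        sleep(poll_seconds)
    complete()  # lets the Auto Scaling group finish terminating the instance
    return drained


def aws_callbacks(asg_name: str, instance_id: str) -> Tuple[Callable[[], None], Callable[[], None]]:
    """heartbeat/complete callbacks backed by the EC2 Auto Scaling API."""
    import boto3

    client = boto3.client("autoscaling")
    ids = {"LifecycleHookName": HOOK_NAME, "AutoScalingGroupName": asg_name, "InstanceId": instance_id}

    def heartbeat() -> None:
        client.record_lifecycle_action_heartbeat(**ids)

    def complete() -> None:
        client.complete_lifecycle_action(LifecycleActionResult="CONTINUE", **ids)

    return heartbeat, complete
```

`DefaultResult="CONTINUE"` means a crashed worker cannot block scale-in indefinitely. The trade-off is that jobs still running when `max_wait` expires are interrupted, which is why the sketch reports whether draining actually finished.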

Business Outcome

The transformation delivered measurable results across operational efficiency and infrastructure reliability:

Operational Excellence

False auto-scaling events reduced by 80%, achieving 95% cost predictability
Engineering troubleshooting time decreased by 70% (from 120 to 36 hours monthly)
Platform stability improved with zero application crashes post-upgrade
Comprehensive monitoring enabled proactive issue detection

Financial Impact

€42,000 in annual savings from eliminating unnecessary scaling events
84 engineering hours monthly redirected to billable client work (€8,400 monthly value at €100/hour)
Infrastructure costs stabilised with predictable monthly spending
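
The engineering-time figures reported above are consistent with one another:

```latex
\begin{aligned}
\text{hours saved per month} &= 120 - 36 = 84 \\
\text{reduction} &= 84 / 120 = 0.70 = 70\% \\
\text{monthly value} &= 84~\text{h} \times 100~\text{EUR/h} = 8{,}400~\text{EUR}
\end{aligned}
```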

Strategic Advancement

Per-client rate limiting protected the APIs against abuse and excessive request volumes from individual clients
Enhanced monitoring capabilities improved client SLA compliance
Maintained €500,000 in at-risk annual recurring revenue from DATEV, Elli, and Dallmayr
Positioned for 30% revenue growth through improved service reliability
