Author: Andrew Holmes in: News
The AWS outage this morning took down Snapchat, Fortnite, Reddit, Ring, major banks, and hundreds of other services. Millions of people couldn't access tools they depend on. The problem started at 3:11 a.m. ET with Amazon's DynamoDB database service in its northern Virginia data center (Major AWS outage takes down web services like Snapchat and Ring), and the ripple effects were global.
If you’re running a cloud-based ERP for manufacturing operations, this outage should get your attention. Not because AWS is unreliable (it usually isn’t), but because it exposes something critical about modern manufacturing systems: your ability to run production depends on infrastructure you don’t control.
When a social media app goes down, people are annoyed. When your manufacturing ERP goes down, production stops.
Your shop floor can’t access work orders. Your receiving dock can’t log incoming materials. Your quality team can’t record inspection results. Your planners can’t adjust schedules. Your shipping department can’t generate bills of lading. If you’re running just-in-time inventory, you might not even know what materials you have on hand.
This morning’s outage lasted a few hours for most services. Imagine what happens to your production schedule when your ERP is unavailable during first shift. You’re not just losing those hours. You’re dealing with the cascading effects: delayed shipments, missed customer commitments, overtime to catch up, and the scramble to figure out where everything was when the system went down.
Cloud infrastructure promises scalability, cost efficiency, and reliability. Most of the time, it delivers. You get enterprise-grade infrastructure without building it yourself. You don’t need a server room with climate control. You don’t need IT staff managing hardware refreshes. Your team focuses on manufacturing, not maintaining servers.
For manufacturing ERP systems that need to handle multiple plants, support remote users, and integrate with supplier systems, cloud architecture makes sense. The economics usually work better than on-premise solutions. Implementation is faster. Updates happen automatically. Disaster recovery is built in.
But outages like this one reveal something uncomfortable: we’ve traded one set of risks for another.
The old way had different problems. You owned your servers. They sat in your building or your data center. You paid for capacity whether you used it or not. When something broke, it was on you to fix it, but you could physically walk to the server room and troubleshoot. Your on-premise ERP might have been slower to update and harder to scale, but when it went down, you knew exactly who to call and what levers you could pull.
Cloud services changed that equation entirely. Your manufacturing ERP now runs in someone else's data center, probably hundreds or thousands of miles away. When AWS goes down, you go down. Your incident response plan can't bring the service back. Your IT team can't do anything about the root cause. You wait.
The internet was originally designed to be decentralized and resilient, yet today so much of our online ecosystem is concentrated in a small number of cloud regions (Live updates: AWS global outage, Amazon, Snapchat, Roblox and Fortnite down | CNN Business). When one of those regions experiences a fault, the impact spreads immediately and widely.
This morning's outage centered on AWS's US-East-1 region in northern Virginia. This is AWS's oldest and largest data center hub, and crucially, the control planes for many global AWS services are housed here (The AWS outage is back — and half the internet's broken again). That means even if your manufacturing data lives in a different region, you might still be affected because your authentication or access management runs through US-East-1.
For manufacturing operations, this concentration creates a specific vulnerability. You might have plants across multiple states or countries. You might have suppliers and customers distributed globally. But if your ERP relies on a single cloud region for critical services, all of those distributed operations can stop simultaneously.
Hours of cloud downtime translate to millions in lost productivity and revenue for major businesses (Massive Amazon outage takes down Venmo, Snapchat, Alexa, Reddit and much of the internet – all the latest AWS updates live | TechRadar). In manufacturing, the math is straightforward. Every hour of production you lose has a direct cost: idle labor, unused equipment capacity, delayed shipments. And while your SLA with your ERP vendor might promise certain uptime guarantees, those guarantees often don't account for infrastructure failures outside their control.
Most cloud providers advertise 99.9% or 99.95% uptime. Those numbers sound good. They are good, actually. But they don’t tell you when the downtime will happen or what it will affect.
99.9% uptime means roughly 8.76 hours of downtime per year. That could be 8.76 hours spread across multiple small incidents, or it could be one catastrophic day. For manufacturing, the timing matters enormously. Downtime at 2 a.m. on a Sunday is different from downtime at 10 a.m. on a Tuesday when you're trying to ship $2 million worth of product.
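Converting an uptime percentage into a downtime budget is simple arithmetic. The short Python sketch below assumes a 365-day year (8,760 hours) and splits the yearly budget evenly across 12 months only for comparison. Real outages are never spread evenly, which is exactly the problem described above. At 99.9%, the calculation comes out to about 8.76 hours a year. At 99.95%, it is about 4.38.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def annual_downtime_hours(uptime_pct: float) -> float:
    """Maximum yearly downtime permitted by an uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_pct / 100)


for sla in (99.9, 99.95, 99.99):
    hours = annual_downtime_hours(sla)
    # The monthly figure is an even split for comparison only, not a prediction.
    print(f"{sla}% uptime: {hours:.2f} h/year, about {hours * 60 / 12:.0f} min/month")
```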
Many insurance policies don't trigger unless an outage lasts eight hours or more (Massive Amazon outage takes down Venmo, Snapchat, Alexa, Reddit and much of the internet – all the latest AWS updates live | TechRadar). This morning's AWS outage was largely resolved within a few hours, but that doesn't mean the business impact was minimal for manufacturers. If the outage hits during your peak production hours, or if it prevents you from shipping to meet a customer deadline, a three-hour outage can cost more than a longer outage at a different time.
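One way to make the timing argument concrete is to price each lost hour by what the plant is doing during that hour. The cost profile below is entirely hypothetical. The shift boundaries and dollar figures are placeholders, not industry data, so substitute your own. The sketch also simplifies by assuming that an outage running past midnight stays on the same kind of day.

```python
def hourly_cost(hour: int, weekday: bool) -> float:
    """Hypothetical cost of losing one hour, by hour of day (0-23)."""
    if not weekday:
        return 500.0       # placeholder: skeleton weekend crew
    if 6 <= hour < 14:
        return 12_000.0    # placeholder: first shift, shipping deadlines
    if 14 <= hour < 22:
        return 8_000.0     # placeholder: second shift
    return 1_000.0         # placeholder: overnight


def outage_cost(start_hour: int, duration_hours: int, weekday: bool = True) -> float:
    """Sum the cost of each lost hour, wrapping past midnight on the same day type."""
    return sum(hourly_cost((start_hour + h) % 24, weekday)
               for h in range(duration_hours))
```

With these placeholder figures, a three-hour outage starting at 9 a.m. on a weekday costs $36,000, while an eight-hour outage starting at 10 p.m. costs $8,000. An eight-hour outage might trigger an insurance claim, but it still does far less damage.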
The other thing the numbers don’t tell you: the difference between planned and unplanned downtime. Scheduled maintenance windows let you prepare. You can shift production schedules. You can communicate with customers. You can have your team ready. Unplanned outages at 3 a.m. on a Monday give you no such luxury. By the time first shift arrives, you’re already behind.
If you’re evaluating cloud-based ERP solutions or managing one, here’s what matters:
Multi-region deployments help, but they’re expensive and complex. Most mid-market manufacturers can’t justify the cost. You’re making a bet that the cloud provider’s uptime will be good enough. That’s often a reasonable bet. AWS, Azure, and Google Cloud all have strong track records. But understand you’re making it, and understand what happens when that bet fails.
Know your dependencies beyond your primary ERP vendor. If your manufacturing ERP runs on AWS, you're vulnerable to AWS outages. If your ERP exchanges supplier transactions through EDI providers, or integrates with quality systems, warehouse management systems, or shop floor data collection tools that run on AWS, you're vulnerable even if the core ERP infrastructure is hosted elsewhere. Map these dependencies. Most manufacturers don't know their full exposure until an outage happens.
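A dependency map doesn't need special tooling to get started. The minimal Python sketch below assumes you record each system's direct dependencies by hand, and every system name and provider/region label in it is hypothetical. The `aws:iam` node stands in for a global control-plane service assumed to be hosted in US-East-1. Reversing the edges and walking them answers the question that matters during an outage: which business processes does this failure reach?

```python
from collections import deque

# Hypothetical map: each node lists what it directly depends on.
DEPENDS_ON: dict[str, list[str]] = {
    "receiving": ["erp", "rf_scanners"],
    "shipping": ["erp", "edi_provider"],
    "quality_inspection": ["qms"],
    "rf_scanners": ["wms"],
    "erp": ["aws:us-west-2", "aws:iam"],
    "wms": ["azure:eastus"],
    "edi_provider": ["aws:us-east-1"],
    "qms": ["aws:us-east-1"],
    "aws:iam": ["aws:us-east-1"],  # assumption: control plane lives in us-east-1
}


def exposed_to(failed: str, graph: dict[str, list[str]]) -> set[str]:
    """Return every node that transitively depends on `failed`."""
    # Reverse the edges so we can walk from a dependency to its dependents.
    dependents: dict[str, list[str]] = {}
    for node, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(node)
    seen: set[str] = set()
    queue = deque([failed])
    while queue:  # breadth-first walk over dependents
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

In this made-up map, `exposed_to("aws:us-east-1", DEPENDS_ON)` includes `erp` even though the ERP itself sits in another region, because it depends on the identity service. Through the ERP, the failure also reaches receiving and shipping.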
Your customers don’t care whose fault it is. They care that you ship on time. Have a communication plan ready before you need it. Know how you’ll update customers during an outage you can’t control. Know which production processes can continue manually and which can’t. Document the manual procedures before the crisis, not during it.
Some manufacturing processes can tolerate cloud outages better than others. If you’re running discrete manufacturing with batch processing, you might be able to weather a few hours of downtime by running from printed work orders and backfilling data later. If you’re running continuous process manufacturing with tight tolerances and automated quality checks, losing your ERP for even an hour creates serious problems.
Automated material handling systems need real-time data. MES (Manufacturing Execution System) integration stops working when the ERP is down. Automated inventory tracking fails. If your warehouse uses RF scanners connected to your cloud ERP, your receiving and shipping operations halt completely.
Just-in-time manufacturers have the least tolerance for outages. When you're running lean inventory and tight schedules, you need real-time visibility into what you have, what's coming, and what's going out. A three-hour ERP outage doesn't just cost you three hours of production. It costs you the time to figure out your actual inventory position, the time to communicate revised schedules to suppliers and customers, and the overtime to catch up.
Cloud infrastructure isn’t good or bad. It’s a set of tradeoffs between cost, control, complexity, and risk. For most manufacturers, cloud-based ERP makes more sense than on-premise solutions. The economics work better. You get faster implementations. You get automatic updates. You get better disaster recovery for normal scenarios like hardware failures or local power outages.
But cloud ERP reliability depends on infrastructure you don't control. AWS has experienced outages in 2023 and 2021 that knocked websites offline for hours (AWS outage: Company working to restore service as users report a resurgence in issues). Microsoft had a major productivity software outage earlier this month. Google's cloud services went down for an extended period in June. Every major provider has outages. The question isn't whether they'll happen. It's whether you're prepared when they do.
Make conscious tradeoffs. Understand what you’re accepting and what you’re not. If your manufacturing operation can tolerate a few hours of downtime a few times a year, cloud infrastructure is probably fine. If it can’t, you need redundancy, and redundancy is expensive. Really expensive. Expensive enough that most manufacturers decide to accept the risk rather than pay for full redundancy.
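Deciding whether redundancy is worth the cost can start with a rough expected-value comparison. The sketch below uses hypothetical inputs and makes a big simplification. It treats outage frequency, duration, and hourly cost as averages. It also ignores the tail risk of a single very long outage, which is often what actually drives the decision.

```python
def expected_annual_loss(outages_per_year: float, avg_hours: float,
                         cost_per_hour: float) -> float:
    """Average yearly outage cost, treating every lost hour as equally expensive."""
    return outages_per_year * avg_hours * cost_per_hour


def redundancy_pays_off(annual_redundancy_cost: float, outages_per_year: float,
                        avg_hours: float, cost_per_hour: float,
                        fraction_avoided: float = 1.0) -> bool:
    """True if the loss redundancy would avoid exceeds its yearly cost.

    fraction_avoided < 1.0 models redundancy that only covers some failures.
    """
    avoided = fraction_avoided * expected_annual_loss(
        outages_per_year, avg_hours, cost_per_hour)
    return avoided > annual_redundancy_cost


# Placeholder scenario: two 3-hour outages a year, $250,000/year for redundancy.
print(redundancy_pays_off(250_000, 2, 3, 10_000))  # $60,000 avoided vs $250,000 spent
print(redundancy_pays_off(250_000, 2, 3, 50_000))  # $300,000 avoided vs $250,000 spent
```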
The manufacturers that handle outages well aren’t the ones with perfect systems. They’re the ones who planned for imperfect systems and knew what to do when things failed.
Build visibility into your dependencies. Use monitoring tools that alert you to upstream provider issues, not just your own application problems. When AWS has an issue, you want to know immediately, before your production supervisors start calling because they can’t access work orders.
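A basic upstream check can be as simple as polling the health or status endpoints your providers expose, then alerting when one stops responding. The Python sketch below uses only the standard library. The URLs are placeholders, so confirm with each vendor whether it publishes such an endpoint. The probe function is passed in as a parameter, so the alerting logic can be tested without a network.

```python
import urllib.request
from typing import Callable

# Placeholder endpoints: replace with the health/status URLs your vendors publish.
UPSTREAMS: dict[str, str] = {
    "erp_api": "https://erp.example.com/health",
    "edi_gateway": "https://edi.example.com/status",
}


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """True if the URL answers with an HTTP 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        # urlopen raises on DNS failures, timeouts, TLS errors, and 4xx/5xx responses.
        return False


def check_upstreams(upstreams: dict[str, str],
                    probe: Callable[[str], bool] = http_ok) -> list[str]:
    """Names of upstream services whose probe failed."""
    return [name for name, url in upstreams.items() if not probe(url)]


# Example scheduling (e.g. every minute from cron or a task scheduler):
#   failing = check_upstreams(UPSTREAMS)
#   if failing: notify_on_call(failing)   # notify_on_call is hypothetical
```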
Create runbooks for scenarios you can’t fix. Your team should know what to do during a cloud provider outage even though they can’t fix the root cause. Who communicates with customers? Who communicates with the shop floor? Who monitors for recovery? Who decides whether to continue production manually or shut down and wait?
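Writing the runbook down as structured data, even in a spreadsheet, makes gaps visible. This Python sketch shows one hypothetical way to encode the four questions above. It checks that every duty has an owner and a different backup. The role titles are placeholders.

```python
from dataclasses import dataclass


@dataclass
class RunbookRole:
    duty: str
    owner: str   # primary person responsible
    backup: str  # who covers when the owner is off shift


# Placeholder roles; the duties mirror the runbook questions.
CLOUD_OUTAGE_RUNBOOK = [
    RunbookRole("Update customers on affected orders", "Customer service lead", "Sales manager"),
    RunbookRole("Brief the shop floor and switch to paper", "Production supervisor", "Plant manager"),
    RunbookRole("Monitor provider status and confirm recovery", "IT on-call", "ERP administrator"),
    RunbookRole("Decide: continue manually or stop and wait", "Plant manager", "Operations director"),
]


def gaps(runbook: list[RunbookRole]) -> list[str]:
    """Duties missing an owner or backup, or where owner and backup are the same."""
    return [r.duty for r in runbook
            if not r.owner or not r.backup or r.owner == r.backup]
```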
Design your most critical processes with offline capability where possible. Not everything can work offline, but some things can. Print work orders at the start of each shift instead of relying on real-time access. Have your quality team record inspection results on paper and backfill them later. Have your shipping team work from a queue that was generated before the outage. These workarounds aren’t elegant, but they beat complete stoppage.
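Some records may be captured electronically during an outage, for example on a laptop or tablet that can't reach the ERP. For those, backfilling works best when the local journal is append-only and replayed in order. The Python sketch below assumes a hypothetical `post` function that submits one record to your ERP and returns True on success. It stops at the first failure so records are never applied out of order. A crash between posting and rewriting the file could resend a record, so each record should carry an ID (here a hypothetical `record_id`) that the ERP side can use to ignore duplicates.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


class OfflineQueue:
    """Append-only local journal (JSON Lines) of records captured while the ERP is down."""

    def __init__(self, path: Path):
        self.path = path

    def capture(self, record: dict) -> None:
        # Stamp capture time so the ERP receives when the work actually happened.
        entry = {**record, "captured_at": datetime.now(timezone.utc).isoformat()}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def backfill(self, post: Callable[[dict], bool]) -> int:
        """Replay records oldest-first; stop at the first failure. Returns the count posted."""
        if not self.path.exists():
            return 0
        lines = self.path.read_text(encoding="utf-8").splitlines()
        posted = 0
        for line in lines:
            if not post(json.loads(line)):
                break
            posted += 1
        # Keep only the records that were not posted, preserving their order.
        remaining = lines[posted:]
        self.path.write_text("".join(ln + "\n" for ln in remaining), encoding="utf-8")
        return posted
```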
Keep local copies of critical data. You need to know what inventory you have, what’s in process, and what’s scheduled even when you can’t access your ERP. Some manufacturers maintain shadow systems or periodic exports specifically for this scenario. It’s not real-time, but it’s better than nothing.
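A periodic export can be as simple as a timestamped CSV written on a schedule. In the Python sketch below, the rows are assumed to come from whatever report or export your ERP provides, which is not shown here. The four-hour staleness threshold is an arbitrary placeholder. Knowing how old the snapshot is matters as much as having it.

```python
import csv
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

STAMP_FORMAT = "%Y%m%dT%H%M%SZ"  # sorts chronologically as plain text


def write_snapshot(rows: list[dict], directory: Path) -> Path:
    """Write one inventory export to a timestamped CSV and return its path."""
    if not rows:
        raise ValueError("refusing to write an empty snapshot")
    directory.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime(STAMP_FORMAT)
    path = directory / f"inventory_{stamp}.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return path


def latest_snapshot(directory: Path,
                    max_age: timedelta = timedelta(hours=4)) -> Optional[tuple[Path, bool]]:
    """Return (newest snapshot path, is_stale), or None if no snapshot exists."""
    files = sorted(directory.glob("inventory_*.csv"))
    if not files:
        return None
    newest = files[-1]
    taken = datetime.strptime(newest.stem.split("_", 1)[1], STAMP_FORMAT)
    taken = taken.replace(tzinfo=timezone.utc)
    return newest, datetime.now(timezone.utc) - taken > max_age
```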
Review your vendor’s architecture and ask specific questions. Where are their primary data centers? What dependencies do they have on other cloud services? What’s their RTO (recovery time objective) for manufacturing customers? How do they handle regional failures? Do they have customers running multiple shifts who need 24/7 uptime? Good vendors will answer these questions directly and show you their architecture diagrams. Evasive answers tell you something too.
Test your manual procedures before you need them. Most manufacturers have theoretical plans for what to do during an ERP outage. Few have actually tested those plans. Run a drill. Pick a slow day and pretend your ERP is down for four hours. See what breaks. See what works. Fix the problems you find before they matter.
The AWS outage this morning will be resolved. Services will come back online. Most companies will get back to normal within hours. But the underlying reality remains: modern manufacturing increasingly depends on cloud infrastructure, and that infrastructure is more centralized than most people realize. That centralization creates systemic risk alongside its benefits.
Understanding that risk is the first step to managing it. The second step is having a plan for the day when understanding isn’t enough and you need to keep production running anyway.
For more information about how OnRamp ERP software can add value to your business, fill in the contact form below. A member of our support team will contact you within one business day to discuss any questions you have.