
"It was running the Oracle database that underpinned their sales platform, and was considered sufficiently critical that there was a five-figure leased-line connection to the support vendor who 'constantly monitored' the server for issues and was paid to send an engineer within an hour," Callum explained. A nearby cache of spare parts meant that replacement hardware would usually arrive before the support tech!
The arrangement also had two flaws. One was that the designated contracted on-call support engineer lived so far away that the slightest bit of bad weather (and there's plenty of that in the north) made road conditions too dangerous for him to arrive within the required hour. The second was that the monitoring system wasn't very good at noticing when the server went down, but was excellent at detecting startups.
Callum told us those quirks meant incidents usually unfolded as follows: a CPU card would experience a fault; the server's OS would respond to losing a quarter of its CPUs by rebooting; the server would fail to restart, because one of its CPU cards was broken; the contact center would complain to IT; Callum, or whichever other IT worker was on call, would drive in to remove the faulty card and reboot the server; the server would resume operations; the support contractor would call to re…
The company ran the Oracle database behind its sales platform on an 8-CPU Sun server whose processors were mounted in pairs on removable cards. A five-figure leased line connected the server to a support vendor that monitored it and guaranteed an on-site engineer within an hour, while a nearby cache of spare parts often meant replacement hardware arrived before the engineer did. As the hardware aged, it frequently suffered CPU-card faults. The contracted engineer lived far away and could be delayed by bad weather, and the monitoring system reliably detected startups but missed outages. A typical incident saw a CPU fault trigger a reboot that failed because of the broken card, leaving on-call IT staff to come in, physically remove the faulty card, and restart the server.
Read at The Register