#incident-management

from InfoQ
12 hours ago

PagerDuty's Kafka Outage Silences Alerts for Thousands of Companies

PagerDuty, the incident management platform used by thousands of organisations to alert them to problems on their systems, suffered a major outage itself on 28th August 2025. The incident disrupted or delayed the processing of incoming events to customers in PagerDuty's US service region. Significant service degradations affected PagerDuty for more than nine hours. At its peak, approximately 95% of events were rejected over a 38-minute period, and 18% of create requests generated errors for 130 minutes.
Tech industry
#wildfire-response
from Truthout
1 week ago
Environment

Firefighters Outraged After Immigration Raids Target Active Wildfire Response

Software development
from ClickUp
3 weeks ago

How to Measure and Reduce Bug Resolution Time

Eliminating fragmented workflows and unclear ownership reduces bug resolution time, which accelerates learning, protects revenue, and improves delivery reliability.
Software development
from ClickUp
1 month ago

Top 16 Incident Management Software Tools for IT Teams

Effective incident management software enhances recovery through proper communication and streamlined processes.
#ai
from InfoQ
5 months ago
Artificial intelligence

Datadog Employs LLMs for Assisting with Writing Accident Postmortems

from InfoQ
2 months ago

Security or Convenience - Why Not Both?

You start up the company-issued laptop. You really dislike this machine. It's slow and clunky. You don't like the operating system. You can't even install an ad blocker for your browser.
Software development
Artificial intelligence
from InfoQ
2 months ago

Logz.io and Dynatrace Innovations Shift Observability Into the AI Age

AI integration into observability platforms is automating operational tasks to enhance efficiency.
Logz.io's AI Agents and Dynatrace's Davis AI significantly reduce incident resolution times.
Business intelligence
from Silicon Canals
3 months ago

AI tool of the week: Netdata Insights, a tool that helps engineers find and fix system issues faster

Netdata Insights automates incident reporting, turning complex telemetry data into actionable findings.
Remote teams
from New Relic
3 months ago

Team collaboration speeds incident response

New Relic Teams enhances incident troubleshooting by centralizing ownership information, improving team coordination and reducing response times.
from DevOps.com
4 months ago

Causely Extends Reach of Observability to Grafana Dashboards

Causely's integration with Grafana dashboards enhances root cause analysis for DevOps teams, providing better visibility and actionable intelligence in IT workflows.
Artificial intelligence
from Securitymagazine
4 months ago

Automate or Fall Behind - Crisis Response at the Speed of Risk

Most businesses still treat crisis response like it's 2015. A ransomware alert goes out. Emails fly. Group chats explode. Someone digs out the playbook.
DevOps
#cybersecurity
Privacy professionals
from Securitymagazine
4 months ago

The Oracle breach and the case for transparent cyber response

The Oracle Cloud breach highlights the importance of responsiveness in cybersecurity, showing that an initial denial can exacerbate the damage.
Timely communication post-breach is critical to maintain trust and facilitate organizational responses.
Artificial intelligence
from DevOps.com
4 months ago

Next-Generation Observability: Combining OpenTelemetry and AI for Proactive Incident Management

Modern systems necessitate advanced monitoring solutions like OpenTelemetry due to the inadequacies of traditional tools.
E-Commerce
from Irish Independent
4 months ago

Marks and Spencer pauses online orders and contactless payments in stores as ongoing cyber security incident persists

Marks and Spencer has suspended online orders and contactless payments due to a cyber-security incident.