Resolved
Error rates and latency across all API endpoints have been stable for the past 36 hours.
Monitoring
Error rates and latencies have returned to normal levels. We are continuing to closely monitor the system to ensure stability.
Monitoring
The mitigations have improved the error rates, and we are continuing to monitor. Elevated latencies are decreasing as well.
Monitoring
We are experiencing an increase in 5XX errors and latency stemming from our database provider, related to the incident we experienced on 09/08. We have identified the root cause and are working towards a fix.
Monitoring
We are recovering from our widespread issues: ingestion and trace / metrics fetching are operational again.
Latencies are 10% higher than expected, but are trending down.
We are digging into a full RCA and will have more to share in the next 1-2 days.
Thank you for your patience and apologies for any inconvenience caused by this incident.
Investigating
We are beginning to see very high latencies across the board and continued delays in run ingestion.
We are still working with our database provider to find a fix as fast as possible.
Investigating
Run ingestion continues to be delayed.
Additionally, we are noticing that queries which surface metrics in the application are taking longer than expected to complete.
We are working to address both as quickly as possible.
Investigating
The main issue we are currently seeing is a sharp increase in ingestion delay (the time it takes for runs to be durably ingested to LangSmith and show up in the UI / API).
We are seeing p99 times on the order of 15-20 minutes. We are investigating the issue with our database provider and pushing for a fix.
Identified
Our team is working with our service providers to address the increase in 5xx errors affecting our stats capabilities. You might face delays in seeing new runs on LangSmith, though no data loss is expected.
Investigating
We are currently investigating an increase in 500 errors impacting some stats queries. Our team is working to identify the root cause and will provide updates as soon as possible.
Looking for the EU status page? Find it here: https://eu.status.smith.langchain.com