Investigating Elevated 500 Errors
Resolved·Partial outage

Error rates and latency across all API endpoints have been stable for the past 36 hours.

Thu, Sep 11, 2025, 02:10 AM
Affected components
LangSmith API
LangSmith Run Ingestion
Updates

Resolved

Error rates and latency across all API endpoints have been stable for the past 36 hours.

Thu, Sep 11, 2025, 02:10 AM

Monitoring

Error rates and latencies have returned to normal levels. We are continuing to closely monitor the system to ensure stability.

Tue, Sep 9, 2025, 04:27 PM (1 day earlier)

Monitoring

The mitigations have reduced error rates, and we are continuing to monitor. Elevated latencies are also coming down.

Tue, Sep 9, 2025, 03:48 PM (39 minutes earlier)

Monitoring

We are experiencing an increase in 5XX errors and latency stemming from our database provider, related to the incident we experienced on 09/08. We have identified the root cause and are working towards a fix.

Tue, Sep 9, 2025, 06:58 AM (8 hours earlier)

Monitoring

We are recovering from the widespread issues: ingestion and trace/metrics fetching are operational again.

Latencies are 10% higher than expected, but are trending down.

We are digging into a full RCA and will have more to share in the next 1-2 days.

Thank you for your patience, and we apologize for any inconvenience caused by this incident.

Mon, Sep 8, 2025, 10:18 PM (8 hours earlier)

Investigating

We are beginning to see very high latencies across the board and continued delays in run ingestion.

We are still working with our database provider to find a fix as quickly as possible.

Mon, Sep 8, 2025, 09:27 PM (51 minutes earlier)

Investigating

Run ingestion continues to be delayed.

Additionally, we are noticing that queries which surface metrics in the application are taking longer than expected to complete.

We are working to address both as quickly as possible.

Mon, Sep 8, 2025, 08:49 PM (37 minutes earlier)

Investigating

The main issue we are currently seeing is a sharp increase in ingestion delay (the time it takes for runs to be durably ingested into LangSmith and show up in the UI / API).

We are seeing p99 ingestion times on the order of 15-20 minutes. We are investigating the issue with our database provider and pushing for a fix.
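
For readers unfamiliar with the metric, here is a minimal sketch of how a p99 ingestion delay could be computed from per-run timestamps. The function names, the nearest-rank percentile method, and the sample data are all assumptions made for illustration; this is not a description of how LangSmith measures ingestion delay internally.

```python
import math
from datetime import datetime, timedelta


def ingestion_delays(runs):
    """Seconds between submission and durable ingestion for each run.

    `runs` is a list of (submitted_at, ingested_at) datetime pairs
    (a hypothetical shape chosen for this example).
    """
    return [(ingested - submitted).total_seconds() for submitted, ingested in runs]


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical sample: 98 runs ingested after 5 seconds, 2 after 18 minutes.
start = datetime(2025, 9, 8, 19, 0, 0)
runs = [(start, start + timedelta(seconds=5))] * 98
runs += [(start, start + timedelta(minutes=18))] * 2

delays = ingestion_delays(runs)
p50_seconds = percentile(delays, 50)
p99_seconds = percentile(delays, 99)
print(f"p50: {p50_seconds:.0f} s, p99: {p99_seconds / 60:.0f} min")
```

In this made-up sample the median stays at a few seconds while the p99 is 18 minutes, which is why a tail percentile is a useful signal for ingestion delays that affect only a fraction of runs.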

Mon, Sep 8, 2025, 07:13 PM (1 hour earlier)

Identified

Our team is working with our service providers to address the increase in 5xx errors affecting our stats capabilities. You may experience delays in seeing new runs on LangSmith, though no data loss is expected.

Mon, Sep 8, 2025, 05:00 PM (2 hours earlier)

Investigating

We are currently investigating an increase in 500 errors impacting some stats queries. Our team is working to identify the root cause and will provide updates as soon as possible.

Mon, Sep 8, 2025, 03:00 PM (1 hour earlier)

Looking for the EU status page? Find it here: https://eu.status.smith.langchain.com