Run ingestion delays in LangSmith
Incident Report for LangSmith
Resolved
This incident has been resolved.
Posted Sep 19, 2024 - 21:39 UTC
Monitoring
From 18:10 UTC to 18:53 UTC, a series of queries to the LangSmith API caused resource starvation in our production database. This led to a large backlog of pending runs, with peak delays of 25 minutes between a run being ingested and that run becoming available in LangSmith. During this time, timeouts in the LangSmith frontend also increased.

Once the offending queries stopped, run ingestion delays persisted from 18:53 UTC until 19:35 UTC while we worked through the backlog of runs.

We have identified an inefficient query pathway and are taking corrective action to prevent a recurrence. We are actively monitoring to ensure that the issue does not recur.
Posted Sep 18, 2024 - 21:02 UTC
Update
We are continuing to investigate a significant performance regression that is impacting run ingestion and the LangSmith UI. We are working to identify the root cause and deploy a fix.
Posted Sep 18, 2024 - 18:49 UTC
Update
We are continuing to investigate this issue.
Posted Sep 18, 2024 - 18:29 UTC
Investigating
We are currently experiencing significant latency on run ingestion, with a median delay of more than 5 minutes. We are actively investigating.
Posted Sep 18, 2024 - 18:29 UTC
This incident affected: LangSmith Run Ingestion.