A spike in incoming complex trace payloads between May 1 at 23:50 UTC and May 2 at 00:30 UTC temporarily starved some of our ingest workers of CPU, which resulted in periods of high latency for some LangSmith tenants. We have adjusted our scaling parameters to shorten the duration of such latency in the future, and we will implement optimizations to relieve the CPU starvation in an upcoming release.
Posted May 02, 2024 - 02:56 UTC
Identified
A spike in incoming complex trace payloads temporarily starved some of our ingest workers of CPU, which resulted in periods of high latency. The effect was temporary, and we are adjusting our scaling parameters to prevent a recurrence.
Posted May 02, 2024 - 01:37 UTC
Investigating
We are investigating latency in run ingestion: a subset of runs is taking more than 2 minutes to appear in the LangSmith UI.