From approximately 17:00 UTC to 19:00 UTC, an unexpected database configuration change applied by our service provider caused a performance regression in run ingestion, with latency peaking at up to 20 minutes for traces received during that window.
Our service provider reverted the change, and once our queue service had processed the backlog, performance returned to historic norms by 19:10 UTC, with 99% of runs appearing in LangSmith in 5 seconds or less.
During this incident, no data was lost for any received runs and all delayed trace data should now appear in LangSmith. If you believe you are missing data in LangSmith, please reach out via support@langchain.dev with your trace and run IDs and we will investigate further.
Posted Feb 13, 2024 - 19:41 UTC
Monitoring
We have worked with our database provider to revert the recent change and are continuing to monitor. Latency for newly ingested runs has been reduced to under 5 seconds.
Posted Feb 13, 2024 - 19:19 UTC
Identified
A recent database upgrade has triggered a performance regression. We are working with our database provider on remediation and will post an update once we have additional information.
Posted Feb 13, 2024 - 18:21 UTC
Update
We have identified a performance regression in the database service that hosts runs. We are actively working on next steps to address the issue.
Posted Feb 13, 2024 - 18:17 UTC
Investigating
We are currently seeing run ingestion latency of over 5 minutes in some cases and are actively investigating the root cause. At this time, runs are being ingested and queued successfully but are not being reflected in the LangSmith UI.