Run Ingestion Delay

Resolved · Degraded performance

From approximately 17:00 UTC to 19:00 UTC, an unexpected change to our database configuration, applied by our service provider, caused a performance regression in run ingestion. Traces received during that window were delayed by up to 20 minutes at peak.

Our service provider reverted the change, and once our queue service had processed the backlog, performance returned to historic norms by 19:10 UTC, with 99% of runs appearing in LangSmith within 5 seconds.

No data was lost for any runs received during this incident, and all delayed trace data should now appear in LangSmith. If you believe you are missing data in LangSmith, please contact support@langchain.dev with your trace and run IDs and we will investigate further.

Tue, Feb 13, 2024, 07:41 PM

Affected components

LangSmith Run Ingestion

Feb 13, 2024, 06:13 PM to 07:19 PM

Updates

Resolved

Tue, Feb 13, 2024, 07:41 PM

Monitoring

We have worked with our database provider to revert the recent change and are continuing to monitor. Latency for newly ingested runs has been reduced to under 5 seconds.

Tue, Feb 13, 2024, 07:19 PM

Identified

A recent database upgrade has triggered a performance regression. We are working with our database provider on remediation and will post an update once we have additional information.

Tue, Feb 13, 2024, 06:21 PM

Investigating

We have identified a performance regression in the database service that hosts runs. We are actively working on next steps to address the issue.

Tue, Feb 13, 2024, 06:17 PM

Investigating

We are currently seeing run ingestion latency of over 5 minutes in some cases and are actively investigating the root cause. At this time, runs are being received and queued successfully but are not yet appearing in the LangSmith UI.

Tue, Feb 13, 2024, 05:15 PM