Delivery Delays and Lost Deliveries

IMPORTANT: Upcoming change that could cause some webhooks to fail because of timeouts.

Hi everyone,

Some users have recently reported unexplained delays in the triggering of their webhook events, or even lost events.

While digging into these issues, we found that the maximum time allowed for each trigger (currently 6 seconds) was not being enforced. This allowed webhook triggers to take much longer than expected to complete. Under load, this can slow down webhook deliveries on other sites.

Next Monday we will be releasing a fix to put back the time limit check. This could cause some existing webhooks to fail on some events (if the webhook endpoint takes more than 6 seconds to return an answer to Shotgun).
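For anyone worried about the 6-second limit, one common way to stay under it is to acknowledge the delivery right away and do the slow work afterwards. The following is only a minimal sketch of that idea, not an official recommendation: `receive_delivery`, `process`, `start_worker`, and the in-memory queue are hypothetical names chosen for illustration, and the HTTP framework that would call `receive_delivery` is left out.

```python
import queue
import threading

# Hypothetical in-memory queue; a real deployment would likely use a durable one.
work_queue = queue.Queue()
processed = []  # stands in for the result of the slow work


def process(payload):
    """Placeholder for slow work (database writes, renders, API calls...)."""
    processed.append(payload)


def receive_delivery(payload):
    """Called by the HTTP handler: enqueue and return a status code at once."""
    work_queue.put(payload)
    return 200


def worker():
    """Drain the queue in the background; a None payload stops the worker."""
    while True:
        payload = work_queue.get()
        try:
            if payload is None:
                break
            process(payload)
        finally:
            work_queue.task_done()


def start_worker():
    """Start the background worker thread and return it."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

With this shape, the time taken to answer the webhook no longer depends on how long `process` takes. The tradeoff is that work still waiting in an in-memory queue is lost if the endpoint process restarts.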

We are also rolling out a fix for lost deliveries. So, starting today, no delivery should ever be lost.
Please do let us know if you see missing deliveries after this date.
(Note: We built the system so no delivery would ever be lost. The tradeoff for this is that it is possible, in extreme cases, that the same delivery is sent twice to a webhook.)
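Since the same delivery can occasionally arrive twice, it may help to make the endpoint idempotent. Here is a rough sketch under stated assumptions: it assumes each delivery carries some unique identifier that your endpoint can read (called `delivery_id` here), and `DeliveryDeduplicator` and `handle` are hypothetical names, not part of any Shotgun API. The in-memory set is for illustration only; a persistent store would be needed to survive restarts.

```python
import threading


class DeliveryDeduplicator:
    """Remembers which delivery identifiers have already been seen."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def first_time(self, delivery_id):
        """Return True only the first time a given identifier is presented."""
        with self._lock:
            if delivery_id in self._seen:
                return False
            self._seen.add(delivery_id)
            return True


def handle(dedup, delivery_id, payload, action):
    """Run `action` once per delivery identifier and skip repeats."""
    if not dedup.first_time(delivery_id):
        return "duplicate"
    action(payload)
    return "processed"
```

Note that this sketch marks a delivery as seen before `action` runs, so a delivery whose action raises an exception would not be retried by a repeat. Whether that is acceptable depends on your webhook.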

Regards,

Stéphane

PS: We are still investigating an issue that can also cause events to be delayed. We are working on getting this fixed as soon as possible and will provide an update here when the fix is ready.


Hi everyone, just a quick update here to let you know what’s going on…
First, we delayed the plan to enforce the maximum of 6 seconds allowed to process a webhook trigger. This is because we’re folding in a fix for deliveries tagged as “failed” even though they had not reached the webhook endpoint (because of network hiccups), and a fix for random delays that we noticed.
We’ve been testing this and things look pretty good, so the plan is now to roll out the new stack next Monday (July 6th).

Note that, although this should not occur very often, as part of this change you may see a few deliveries that get repeated. We had to make the tradeoff between deliveries possibly not reaching the webhook endpoint and repeated ones so we chose the latter.

Regards,

-Stéphane


Hi everyone!
This was rolled out a few minutes ago… The 6-second limit is now enforced on webhook triggers, and there should no longer be any missing deliveries.

Please do let us know if you see any issues!

Thank you!
