[13:03:47] jhathaway: swfrench-wmf there was an alert earlier about https://phabricator.wikimedia.org/T376988, it was depooled, downtimed and handled to the dbas. Nothing else to do for the oncall people, and hardly any user impact.
[13:04:04] *handed
[13:35:39] thanks jynus
[14:00:18] thanks, jynus!
[14:41:45] jhathaway: might need to look at 2453b4a9fcf79c551e9aab9cabeeab2af6ca2a78
[14:42:07] ok
[14:42:15] Detail: undefined method `reject' for nil:NilClass
[14:42:32] I pushed an updated patch to fix that
[14:42:55] or are you seeing it break somewhere sukhe?
[14:43:44] jhathaway: weird, https://puppetboard.wikimedia.org/report/cp1100.eqiad.wmnet/5a5ea7c1e69e9d5ca61ba172666fac1ce85d27e9
[14:43:48] this was intermittent
[14:43:52] but you are right, now it applied cleanly
[14:43:58] strange
[14:44:33] trying a new host
[14:45:23] jhathaway: no idea why it failed on cp1100 for example but applies cleanly now
[14:45:28] sorry about the noise
[14:45:41] not at all, thank you for bringing it up
[14:45:48] I haven't seen that failure mode before
[14:45:51] still not sure of the cause
[14:46:07] yeah I just got scared because this change theoretically affects everything except the DNS hosts :D
[14:46:13] will keep an eye out if we see more
[14:46:28] thanks
[16:57:09] Hello everyone, sorry if this is wrong channel - PTiwary pinged our channel with info that API responds 502 quite often
[16:58:09] from what I see in logstash - there are plenty of DB errors. Mostly "Cannot execute Wikimedia\Rdbms\Database::commit critical section while session state is out of sync." and "A connection error occurred during a query." - can someone look into it?
[16:58:59] or whom should I ping? sorry, it's Friday 7PM and my memory doesn't want to cooperate
[17:19:57] logstash is observability
[17:20:20] pmiazga: IMO this should be a task and the relevant team added
[17:20:32] so yeah, i'd make a task and add the observability team
[17:20:46] if it's unbreak now maybe start pinging folks off https://office.wikimedia.org/wiki/Contact_list in working timezones
[17:21:03] robh: Traffic has some context of this and it has been happening for some time in WME
[17:21:07] so IMHO, not unbreak now
[17:21:24] yeah, then task with observability for logstash issue imo
[17:21:25] in fact, we asked WME to reach out to the API team since we couldn't see anything wrong on the CDN
[17:21:37] and maybe reference the other tasks with same issue ;D
[17:38:52] yeah, and I got pinged by WME :D. So I successfully closed the loop :D