[10:40:31] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ) - https://phabricator.wikimedia.org/T283582 (cmooney) a:cmooney Thanks @hashar. I would agree with @ayounsi's analysis, if considering contint2001.mgmt **in...
[11:27:35] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ) - https://phabricator.wikimedia.org/T283582 (hashar) That is quite an epic diagnostic @cmooney ! It is definitely not trivial to end up root causing some specific...
[11:28:06] Traffic, SRE: Let's Encrypt issuance chains update - https://phabricator.wikimedia.org/T283164 (aborrero)
[11:49:32] Traffic: Package and deploy Varnish 6.0.8 - https://phabricator.wikimedia.org/T292290 (ema)
[11:49:52] Traffic: Package and deploy Varnish 6.0.8 - https://phabricator.wikimedia.org/T292290 (ema) p:Triage→Medium
[13:32:03] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ) - https://phabricator.wikimedia.org/T283582 (hashar) I went to fetch the IRC log from https://wm-bot.wmflabs.org/libera_logs/%23wikimedia-operations/ which are fr...
[14:32:57] Traffic, CirrusSearch, Discovery-Search, Wikimedia-JobQueue, and 3 others: Half a million of CirrusSearch jobqueue execution errors per hour since 2021-09-30 16:02 - https://phabricator.wikimedia.org/T292291 (jcrespo) Errors seem to have receded a lot since 14:05: {F34664037}
[14:33:38] Traffic, CirrusSearch, Discovery-Search, Infrastructure-Foundations, and 4 others: Half a million of CirrusSearch jobqueue execution errors per hour since 2021-09-30 16:02 - https://phabricator.wikimedia.org/T292291 (jcrespo)
[15:10:04] Traffic, CirrusSearch, Discovery-Search, Infrastructure-Foundations, and 5 others: Half a million of CirrusSearch jobqueue execution errors per hour since 2021-09-30 16:02 - https://phabricator.wikimedia.org/T292291 (BBlack) Recapping from an IRC conversation: this was a fallout of the great Let'...
[15:27:05] Traffic, CirrusSearch, Discovery-Search, Infrastructure-Foundations, and 6 others: Half a million of CirrusSearch jobqueue execution errors per hour since 2021-09-30 16:02 - https://phabricator.wikimedia.org/T292291 (jcrespo) For more longer term, I also would like to wonder if there something we...
[17:34:12] Traffic, Platform Engineering, SRE, Wikimedia-production-error: Wikimedia\Assert\PostconditionException: Postcondition failed: makeTitleSafe() should always return a Title for the text returned by getRootText(). - https://phabricator.wikimedia.org/T290194 (Umherirrender) The assertion was added f...
[18:41:55] Hi. On ruwiki forums reports about accessibility of text-lb.esams.wikimedia.org starts appearing (Cloudflare glitches?). Do admins know about it?
[18:46:10] Vort: can you point to such a report please? thanks
[18:46:28] (russian) https://ru.wikipedia.org/wiki/%D0%92%D0%B8%D0%BA%D0%B8%D0%BF%D0%B5%D0%B4%D0%B8%D1%8F:%D0%A4%D0%BE%D1%80%D1%83%D0%BC/%D0%A2%D0%B5%D1%85%D0%BD%D0%B8%D1%87%D0%B5%D1%81%D0%BA%D0%B8%D0%B9#%D0%9E%D1%88%D0%B8%D0%B1%D0%BA%D0%B8_%D0%BF%D1%80%D0%B8_%D0%BE%D1%82%D0%BA%D1%80%D1%8B%D1%82%D0%B8%D0%B8_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D1%8B
[18:52:44] thanks. the english translation works surprisingly well. the expired certificate issue is almost certainly the LE issue linked below
[18:53:10] the ping that fails is something else. is it just one user or multiple reports about that?
[18:53:32] at least two users with failing pings
[18:53:40] it is the main problem
[18:53:54] sometime pages opens, sometimes not
[18:54:37] (my access is fine by the way. I'm from different country, but the same esams server)
[18:56:46] thanks for sharing and alerting us. I will keep an eye on this and share it with the SRE team
[18:57:03] thanks