[00:34:44] <icinga-wm>	 PROBLEM - SSH on logstash2021.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:35:24] <icinga-wm>	 RECOVERY - SSH on logstash2021.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:06:51] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/1.37.0-wmf.13 [core] (wmf/1.37.0-wmf.13) - https://gerrit.wikimedia.org/r/703262
[02:06:55] <wikibugs>	 (CR) TrainBranchBot: [C: +2] Branch commit for wmf/1.37.0-wmf.13 [core] (wmf/1.37.0-wmf.13) - https://gerrit.wikimedia.org/r/703262 (owner: TrainBranchBot)
[02:28:51] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/1.37.0-wmf.13 [core] (wmf/1.37.0-wmf.13) - https://gerrit.wikimedia.org/r/703262 (owner: TrainBranchBot)
[03:27:56] <icinga-wm>	 PROBLEM - SSH on mw1284.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:31:16] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 68 probes of 620 (alerts on 65) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[03:37:10] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 52 probes of 620 (alerts on 65) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
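The RIPE Atlas alerts above seem to flip on a single count: CRITICAL at 68 failed probes, OK at 52, with "alerts on 65" as the threshold. A minimal sketch of that decision, assuming the check compares failed probes strictly against the "alerts on" value; the log never shows a count of exactly 65, so whether it is `>` or `>=` is unconfirmed, and `atlas_status` is a hypothetical name, not the real check's code.

```python
# Hypothetical sketch of the RIPE Atlas IPv6 ping alert decision.
# Assumption: CRITICAL when failed probes exceed the "alerts on" threshold
# (strict >); the log does not show a value exactly at the threshold.

def atlas_status(failed: int, total: int, alert_on: int = 65) -> str:
    state = "CRITICAL" if failed > alert_on else "OK"
    # Mimic the summary format used in the icinga-wm messages.
    return f"{state} - failed {failed} probes of {total} (alerts on {alert_on})"

print(atlas_status(68, 620))  # CRITICAL - failed 68 probes of 620 (alerts on 65)
print(atlas_status(52, 620))  # OK - failed 52 probes of 620 (alerts on 65)
```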
[04:28:42] <icinga-wm>	 RECOVERY - SSH on mw1284.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:42:24] <wikibugs>	 (PS1) Marostegui: mariadb: Move db1124 to m2. [puppet] - https://gerrit.wikimedia.org/r/703269 (https://phabricator.wikimedia.org/T286042)
[05:21:18] <icinga-wm>	 PROBLEM - Backup freshness on backup1001 is CRITICAL: Stale: 1 (gerrit1001), Fresh: 103 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[05:25:53] <wikibugs>	 (PS2) ArielGlenn: dumps: Drop absented cron [puppet] - https://gerrit.wikimedia.org/r/702933 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[05:29:16] <wikibugs>	 (CR) ArielGlenn: [C: +2] dumps: Drop absented cron [puppet] - https://gerrit.wikimedia.org/r/702933 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[06:23:37] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Patch-For-Review: Adjust egress buffer allocations on ToR switches - https://phabricator.wikimedia.org/T284592 (ArielGlenn) What is the expected length of service interruption for any of these days? I'm looking on the impact on the dumpsdata/snapshot hosts...
[06:31:50] <marostegui>	 !log Upgrade db1160 kernel
[06:31:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:35:25] <marostegui>	 !log Upgrade db1138 kernel
[06:35:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:37:21] <legoktm>	 !issync
[06:37:21] <ircservserv-wm>	 Syncing #wikimedia-operations (requested by legoktm)
[06:37:23] <ircservserv-wm>	 Set /cs flags #wikimedia-operations litharge +o
[06:45:03] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Add separate role for Ganeti test cluster [puppet] - https://gerrit.wikimedia.org/r/703213 (https://phabricator.wikimedia.org/T286206) (owner: Muehlenhoff)
[06:50:36] <marostegui>	 !log Upgrade db1122 kernel
[06:50:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:54:19] <moritzm>	 !log installing PHP 7.3 security updates on buster
[06:54:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:57:06] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 184 probes of 627 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[07:00:04] <jouncebot>	 Deploy window No deploys all week! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210706T0700)
[07:06:53] <marostegui>	 !log Upgrade db1104 kernel
[07:07:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:18:00] <icinga-wm>	 PROBLEM - SSH on cp5005.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:21:54] <wikibugs>	 ops-eqiad, DBA: Upgrade db1104 firmware - https://phabricator.wikimedia.org/T286226 (Marostegui)
[07:22:07] <wikibugs>	 ops-eqiad, DBA: Upgrade db1104 firmware - https://phabricator.wikimedia.org/T286226 (Marostegui) p:Triage→High a:Kormat
[07:25:21] <wikibugs>	 (PS1) Muehlenhoff: Add library hint for libuv1 [puppet] - https://gerrit.wikimedia.org/r/703348
[07:27:33] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Add library hint for libuv1 [puppet] - https://gerrit.wikimedia.org/r/703348 (owner: Muehlenhoff)
[07:30:07] <wikibugs>	 SRE, MW-on-K8s, serviceops: Evaluate contour as an ingress - https://phabricator.wikimedia.org/T286196 (Joe) **Usage**  Basic usage can be obtained by just defining the Ingress properties in the charts:  ` apiVersion: networking.k8s.io/v1 kind: Ingress metadata:   name: lambdoid spec:   rules:   - ht...
[07:32:00] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 69 probes of 628 (alerts on 65) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[07:32:02] <wikibugs>	 SRE, MW-on-K8s, serviceops: Evaluate contour as an ingress - https://phabricator.wikimedia.org/T286196 (Joe) Collection of metrics via prometheus is supported natively, both from envoy and contour.  Using configuration options is also possible to make contour / envoy log in json format, see https://p...
[07:33:22] <wikibugs>	 SRE, MW-on-K8s, serviceops: Evaluate contour as an ingress - https://phabricator.wikimedia.org/T286196 (Joe) Overall, contour seems built the right way and looks like a promising ingress. I am just wary of using CRDs so much, and of the fact it needs yet another operator to work properly.  I think th...
[07:43:48] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 51 probes of 628 (alerts on 65) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[07:44:12] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 63 probes of 627 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[07:53:25] <icinga-wm>	 RECOVERY - WDQS high update lag on wdqs1007 is OK: (C)4.32e+04 ge (W)2.16e+04 ge 2.129e+04 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[07:53:54] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 93 probes of 627 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[07:59:48] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 52 probes of 627 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[08:43:43] <moritzm>	 !log installing libuv1 security updates on buster
[08:43:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:01:53] <_joe_>	 !log repooling wdqs1007 now that lag has caught up
[09:01:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:03:23] <wikibugs>	 SRE, MW-on-K8s, serviceops: Evaluate contour as an ingress - https://phabricator.wikimedia.org/T286196 (Joe) Open→Resolved p:Triage→High
[09:03:25] <wikibugs>	 SRE, MW-on-K8s, serviceops: Create a gateway in kubernetes for the execution of our "lambdas" - https://phabricator.wikimedia.org/T261277 (Joe)
[09:16:55] <wikibugs>	 Puppet, Infrastructure-Foundations, User-jbond: Fix Puppet CA expired certs - https://phabricator.wikimedia.org/T286229 (jbond)
[09:17:05] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Move db1124 to m2. [puppet] - https://gerrit.wikimedia.org/r/703269 (https://phabricator.wikimedia.org/T286042) (owner: Marostegui)
[09:18:10] <wikibugs>	 (PS1) Jbond: P:puppetmaster: update NRPE script [puppet] - https://gerrit.wikimedia.org/r/703359 (https://phabricator.wikimedia.org/T286229)
[09:19:31] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/30112/console" [puppet] - https://gerrit.wikimedia.org/r/703359 (https://phabricator.wikimedia.org/T286229) (owner: Jbond)
[09:22:39] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1017 is CRITICAL: CRITICAL check_failover servers up 2 down 1 https://wikitech.wikimedia.org/wiki/HAProxy
[09:23:19] <_joe_>	 marostegui: put that database back!
[09:24:41] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1017 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[09:27:53] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1017 is CRITICAL: CRITICAL check_failover servers up 2 down 1 https://wikitech.wikimedia.org/wiki/HAProxy
[09:28:02] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1021 is CRITICAL: CRITICAL check_failover servers up 1 down 1 https://wikitech.wikimedia.org/wiki/HAProxy
[09:28:02] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1012 is CRITICAL: CRITICAL check_failover servers up 1 down 1 https://wikitech.wikimedia.org/wiki/HAProxy
[09:29:22] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1015 is CRITICAL: CRITICAL check_failover servers up 1 down 1 https://wikitech.wikimedia.org/wiki/HAProxy
[09:32:56] <marostegui>	 ^ me
[09:35:09] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1017 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[09:35:17] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1021 is OK: OK check_failover servers up 1 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[09:35:17] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1012 is OK: OK check_failover servers up 1 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[09:38:11] <wikibugs>	 SRE, Wikimedia-Mailing-lists: Redirect https://lists.wikimedia.org/pipermail/foobar/ to https://lists.wikimedia.org/hyperkitty/list/foobar@lists.wikimedia.org/ - https://phabricator.wikimedia.org/T285949 (Aklapper) Thanks a lot! <3
[09:38:17] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1013 is CRITICAL: CRITICAL check_failover servers up 1 down 1 https://wikitech.wikimedia.org/wiki/HAProxy
[09:52:57] <wikibugs>	 (PS1) Jbond: P:configmaster: update disc state to match post dc switch over state [puppet] - https://gerrit.wikimedia.org/r/703390 (https://phabricator.wikimedia.org/T286231)
[09:53:37] <wikibugs>	 (PS1) Giuseppe Lavagetto: mwdebug: add more network egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/703391
[09:53:39] <wikibugs>	 (PS2) Jbond: P:configmaster: update disc state to match post dc switch over state [puppet] - https://gerrit.wikimedia.org/r/703390 (https://phabricator.wikimedia.org/T286231)
[09:54:21] <wikibugs>	 (PS3) Jbond: P:configmaster: update disc state to match post dc switch over state [puppet] - https://gerrit.wikimedia.org/r/703390 (https://phabricator.wikimedia.org/T286231)
[09:55:52] <wikibugs>	 (CR) jerkins-bot: [V: -1] P:configmaster: update disc state to match post dc switch over state [puppet] - https://gerrit.wikimedia.org/r/703390 (https://phabricator.wikimedia.org/T286231) (owner: Jbond)
[09:58:46] <wikibugs>	 (PS4) Jbond: P:configmaster: update disc state to match post dc switch over state [puppet] - https://gerrit.wikimedia.org/r/703390 (https://phabricator.wikimedia.org/T286231)
[09:59:25] <wikibugs>	 (CR) Jbond: P:configmaster: update disc state to match post dc switch over state (2 comments) [puppet] - https://gerrit.wikimedia.org/r/703390 (https://phabricator.wikimedia.org/T286231) (owner: Jbond)
[10:00:04] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/30113/console" [puppet] - https://gerrit.wikimedia.org/r/703390 (https://phabricator.wikimedia.org/T286231) (owner: Jbond)
[10:00:07] <wikibugs>	 (CR) jerkins-bot: [V: -1] P:configmaster: update disc state to match post dc switch over state [puppet] - https://gerrit.wikimedia.org/r/703390 (https://phabricator.wikimedia.org/T286231) (owner: Jbond)
[10:02:13] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +1] "LGTM minus the flake8 violations." [puppet] - https://gerrit.wikimedia.org/r/703390 (https://phabricator.wikimedia.org/T286231) (owner: Jbond)
[10:02:33] <wikibugs>	 SRE, Infrastructure-Foundations: Create Ganeti test cluster - https://phabricator.wikimedia.org/T286206 (MoritzMuehlenhoff) a:MoritzMuehlenhoff
[10:04:55] <wikibugs>	 (PS5) Jbond: P:configmaster: update disc state to match post dc switch over state [puppet] - https://gerrit.wikimedia.org/r/703390 (https://phabricator.wikimedia.org/T286231)
[10:13:06] <wikibugs>	 (CR) Jbond: [C: +2] P:configmaster: update disc state to match post dc switch over state [puppet] - https://gerrit.wikimedia.org/r/703390 (https://phabricator.wikimedia.org/T286231) (owner: Jbond)
[10:18:15] <wikibugs>	 (PS1) Jbond: P:configmaster: update expected status for eventgate-external [puppet] - https://gerrit.wikimedia.org/r/703396 (https://phabricator.wikimedia.org/T286231)
[10:18:50] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/703359 (https://phabricator.wikimedia.org/T286229) (owner: Jbond)
[10:19:09] <moritzm>	 !log installing jackson-databind security updates on buster
[10:19:15] <wikibugs>	 Puppet, Infrastructure-Foundations, Patch-For-Review, User-jbond: Fix Puppet CA expired certs - https://phabricator.wikimedia.org/T286229 (jbond) The certificate was unused so it has been removed
[10:19:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:19:59] <wikibugs>	 (CR) Jbond: [C: +2] P:configmaster: update expected status for eventgate-external [puppet] - https://gerrit.wikimedia.org/r/703396 (https://phabricator.wikimedia.org/T286231) (owner: Jbond)
[10:27:52] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Patch-For-Review: Adjust egress buffer allocations on ToR switches - https://phabricator.wikimedia.org/T284592 (cmooney) I’ve no need to think anything other than what’s in the child tasks at this point.  Having put out feelers externally I’ve some anecdot...
[10:37:26] <wikibugs>	 (CR) Jbond: [V: +1 C: +2] P:puppetmaster: update NRPE script [puppet] - https://gerrit.wikimedia.org/r/703359 (https://phabricator.wikimedia.org/T286229) (owner: Jbond)
[10:41:59] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1013 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[10:42:07] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1015 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[11:02:35] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 134 probes of 627 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[11:16:36] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2071', diff saved to https://phabricator.wikimedia.org/P16767 and previous config saved to /var/cache/conftool/dbconfig/20210706-111635-marostegui.json
[11:16:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:17:32] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db2071 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16768 and previous config saved to /var/cache/conftool/dbconfig/20210706-111731-root.json
[11:17:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:20:44] <wikibugs>	 SRE: Integrate Buster 10.10 point update - https://phabricator.wikimedia.org/T285206 (MoritzMuehlenhoff)
[11:32:36] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db2071 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16769 and previous config saved to /var/cache/conftool/dbconfig/20210706-113235-root.json
[11:32:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:33:17] <icinga-wm>	 PROBLEM - Puppet CA expired certs on puppetmaster1001 is CRITICAL: CRITICAL: 1869 puppet certs need to be renewed: https://wikitech.wikimedia.org/wiki/Puppet%23Renew_agent_certificate
[11:35:12] <_joe_>	 uhm
[11:35:19] <_joe_>	 1869?
[11:35:24] <_joe_>	 that seems a bit much
[11:47:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db2071 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16770 and previous config saved to /var/cache/conftool/dbconfig/20210706-114739-root.json
[11:47:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:50:49] <jbond>	 _joe_: looking i made a change to the check so probably broke it
[11:53:51] <wikibugs>	 (PS1) Jbond: P:puppetmaster: fix expiry check [puppet] - https://gerrit.wikimedia.org/r/703404
[11:55:54] <wikibugs>	 (CR) Jbond: [C: +2] P:puppetmaster: fix expiry check [puppet] - https://gerrit.wikimedia.org/r/703404 (owner: Jbond)
[11:56:33] <apergos>	 omg
[11:56:43] <apergos>	 sometimes the emoji substitution is just over the top 
[11:56:59] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute for parsoid on alert1001 is CRITICAL: 143 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[11:57:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P16771 and previous config saved to /var/cache/conftool/dbconfig/20210706-115732-marostegui.json
[11:57:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:57:43] <icinga-wm>	 RECOVERY - Puppet CA expired certs on puppetmaster1001 is OK: OK: all puppet agent certs fine https://wikitech.wikimedia.org/wiki/Puppet%23Renew_agent_certificate
[11:57:52] <apergos>	 https://share.riseup.net/#aRHZo6TEnddrZm02XnZezA
[11:57:58] * jbond lunch 
[11:58:22] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2072', diff saved to https://phabricator.wikimedia.org/P16772 and previous config saved to /var/cache/conftool/dbconfig/20210706-115820-marostegui.json
[11:58:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:00:43] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute for parsoid on alert1001 is OK: (C)100 gt (W)50 gt 8 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
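The threshold alerts in this log use a compact summary: on CRITICAL the value is compared with the critical threshold ("143 gt 100"), and on OK both thresholds are shown followed by the value ("(C)100 gt (W)50 gt 8", or with `ge` for the WDQS lag check). A minimal sketch that reproduces those strings, assuming warning/critical semantics and four-significant-digit formatting inferred from the text alone; `check_threshold` is a hypothetical helper, not the actual check's code, and 21290 stands in for the rounded "2.129e+04" lag value.

```python
import operator

# Comparison names as they appear in the alert text.
OPS = {"gt": operator.gt, "ge": operator.ge}

def check_threshold(value, warning, critical, op="gt"):
    """Classify a metric against warning/critical thresholds (hypothetical).

    Returns (state, summary); the summary mimics the icinga-wm wording,
    formatting numbers to 4 significant digits (e.g. 43200 -> "4.32e+04").
    """
    cmp = OPS[op]
    if cmp(value, critical):
        return "CRITICAL", f"{value:.4g} {op} {critical:.4g}"
    if cmp(value, warning):
        return "WARNING", f"{value:.4g} {op} {warning:.4g}"
    return "OK", f"(C){critical:.4g} {op} (W){warning:.4g} {op} {value:.4g}"

print(check_threshold(143, 50, 100))            # ('CRITICAL', '143 gt 100')
print(check_threshold(8, 50, 100))              # ('OK', '(C)100 gt (W)50 gt 8')
print(check_threshold(21290, 21600, 43200, op="ge"))
# ('OK', '(C)4.32e+04 ge (W)2.16e+04 ge 2.129e+04')
```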
[12:02:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db2071 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16773 and previous config saved to /var/cache/conftool/dbconfig/20210706-120242-root.json
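The db2071 repool above goes 25% → 50% → 75% → 100% in roughly 15-minute steps (11:17, 11:32, 11:47, 12:02), and db2072 later follows the same pattern (13:15 to 14:00). A minimal sketch of such a staged schedule, with the steps and interval read off the log; this is not how dbctl or its tooling is implemented, and `repool_schedule` is a hypothetical name.

```python
from datetime import datetime, timedelta

def repool_schedule(start, steps=(25, 50, 75, 100), interval=timedelta(minutes=15)):
    """Return (time, percent) pairs for a staged repool (illustrative only)."""
    return [(start + i * interval, pct) for i, pct in enumerate(steps)]

for when, pct in repool_schedule(datetime(2021, 7, 6, 11, 17)):
    print(f"{when:%H:%M} db2071 (re)pooling @ {pct}%")
# 11:17 db2071 (re)pooling @ 25%
# 11:32 db2071 (re)pooling @ 50%
# 11:47 db2071 (re)pooling @ 75%
# 12:02 db2071 (re)pooling @ 100%
```

Ramping traffic back in stages gives a freshly altered replica time to warm its caches before taking its full share of load.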
[12:02:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:04:15] <icinga-wm>	 PROBLEM - dump of s4 in codfw on alert1001 is CRITICAL: dump for s4 at codfw taken more than 8 days ago: Most recent backup 2021-06-28 12:00:02 https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[12:07:29] <icinga-wm>	 PROBLEM - dump of s4 in eqiad on alert1001 is CRITICAL: dump for s4 at eqiad taken more than 8 days ago: Most recent backup 2021-06-28 12:00:02 https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[12:08:51] <icinga-wm>	 PROBLEM - dump of es5 in eqiad on alert1001 is CRITICAL: dump for es5 at eqiad taken more than 8 days ago: Most recent backup 2021-06-28 12:00:01 https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[12:09:49] <wikibugs>	 (PS1) Aklapper: phabricator weekly changes email: Fix query for cookie-licked tasks [puppet] - https://gerrit.wikimedia.org/r/703427 (https://phabricator.wikimedia.org/T286181)
[12:10:27] <icinga-wm>	 PROBLEM - dump of es4 in codfw on alert1001 is CRITICAL: dump for es4 at codfw taken more than 8 days ago: Most recent backup 2021-06-28 12:00:01 https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[12:10:35] <icinga-wm>	 PROBLEM - dump of es5 in codfw on alert1001 is CRITICAL: dump for es5 at codfw taken more than 8 days ago: Most recent backup 2021-06-28 12:00:01 https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[12:14:11] <wikibugs>	 (CR) Aklapper: "Great... Thanks to running "git checkout -b T286181 origin/master" as usual, but having this repo using "origin/production" instead to con" [puppet] - https://gerrit.wikimedia.org/r/703427 (https://phabricator.wikimedia.org/T286181) (owner: Aklapper)
[12:14:26] <wikibugs>	 (Abandoned) Aklapper: phabricator weekly changes email: Fix query for cookie-licked tasks [puppet] - https://gerrit.wikimedia.org/r/703427 (https://phabricator.wikimedia.org/T286181) (owner: Aklapper)
[12:14:50] <jbond>	 marostegui: fyi ^^
[12:17:00] <wikibugs>	 (PS1) Aklapper: phabricator weekly changes email: Fix query for cookie-licked tasks [puppet] - https://gerrit.wikimedia.org/r/703428 (https://phabricator.wikimedia.org/T286181)
[12:17:15] <icinga-wm>	 PROBLEM - dump of es4 in eqiad on alert1001 is CRITICAL: dump for es4 at eqiad taken more than 8 days ago: Most recent backup 2021-06-28 12:00:01 https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[12:17:57] <wikibugs>	 (CR) Aklapper: "https://gerrit.wikimedia.org/r/c/operations/puppet/+/703428" [puppet] - https://gerrit.wikimedia.org/r/703427 (https://phabricator.wikimedia.org/T286181) (owner: Aklapper)
[12:22:36] <marostegui>	 jbond: the dumps issue?
[12:22:54] <jbond>	 marostegui: yes
[12:23:20] <marostegui>	 jbond: thanks, they might recover soon automatically, sometimes it takes a bit longer than expected, I will monitor the issue
[12:23:30] <jbond>	 ack
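The dump alerts fire once the most recent backup is "more than 8 days" old: the s4 dump from 2021-06-28 12:00:02 crosses that line at 12:00:02 on 2021-07-06, and the alert appears a few minutes later. A minimal sketch of that age test, assuming a plain elapsed-time comparison; the real check's schedule and logic are not shown in the log, and `dump_is_stale` is a hypothetical name.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=8)  # "taken more than 8 days ago" in the alert text

def dump_is_stale(last_dump: datetime, now: datetime, max_age: timedelta = MAX_AGE) -> bool:
    """True when the most recent dump is older than max_age (hypothetical helper)."""
    return now - last_dump > max_age

# s4 in codfw: last dump 2021-06-28 12:00:02, alert seen at 12:04:15 on 2021-07-06.
print(dump_is_stale(datetime(2021, 6, 28, 12, 0, 2), datetime(2021, 7, 6, 12, 4, 15)))  # True
# es5 in codfw at its later recovery: dump taken 2021-07-06 00:00:01, checked at 13:43:17.
print(dump_is_stale(datetime(2021, 7, 6, 0, 0, 1), datetime(2021, 7, 6, 13, 43, 17)))   # False
```

This matches marostegui's expectation: the alerts clear on their own as soon as the new dumps finish and register.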
[12:26:59] <wikibugs>	 (PS1) Jbond: initial commit [debs/cfssl] - https://gerrit.wikimedia.org/r/703431
[12:27:37] <wikibugs>	 (CR) Jbond: [V: +2 C: +2] initial commit [debs/cfssl] - https://gerrit.wikimedia.org/r/703431 (owner: Jbond)
[12:34:12] <wikibugs>	 SRE, Continuous-Integration-Infrastructure, Patch-For-Review: Have linters/tests results show up as comments in files on gerrit - https://phabricator.wikimedia.org/T209149 (kostajh) I've posted a few (mostly untested) patches today, here's the summary (cc @Tgr, @hashar, @Jdforrester-WMF, @awight):  *...
[12:36:34] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Patch-For-Review: Adjust egress buffer allocations on ToR switches - https://phabricator.wikimedia.org/T284592 (ArielGlenn) >>! In T284592#7198669, @cmooney wrote: > I’ve no reason to think anything other than what’s in the child tasks at this point.  Havi...
[12:52:39] <wikibugs>	 SRE, ops-eqiad, User-fgiunchedi: Disk failed on thanos-be1003 - https://phabricator.wikimedia.org/T285664 (jbond) I have disabled notifications for this host in icinga to stop it appearing in the results for "Ensure hosts are not performing a change on every puppet run"
[12:56:52] <icinga-wm>	 PROBLEM - SSH on logstash1009 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[13:07:02] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] mwdebug: add more network egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/703391 (owner: Giuseppe Lavagetto)
[13:10:03] <wikibugs>	 (Merged) jenkins-bot: mwdebug: add more network egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/703391 (owner: Giuseppe Lavagetto)
[13:14:57] <icinga-wm>	 PROBLEM - Check systemd state on deploy1002 is CRITICAL: CRITICAL - degraded: The following units failed: git_pull_charts.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:15:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db2072 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16774 and previous config saved to /var/cache/conftool/dbconfig/20210706-131537-root.json
[13:15:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:23:23] <icinga-wm>	 RECOVERY - SSH on logstash1009 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[13:27:02] <_joe_>	 uhm what's up with git pull charts
[13:28:13] <icinga-wm>	 RECOVERY - Check systemd state on deploy1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:28:15] <wikibugs>	 (CR) Jbond: "updated thanks" (2 comments) [cookbooks] - https://gerrit.wikimedia.org/r/701498 (owner: Jbond)
[13:28:34] <wikibugs>	 (PS3) Jbond: sre.idm.logout: create cookbook to logout users [cookbooks] - https://gerrit.wikimedia.org/r/701498
[13:29:15] <icinga-wm>	 PROBLEM - SSH on logstash1009 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[13:29:41] <wikibugs>	 (PS6) Ottomata: Add gobblin_job define and declare first gobblin job in hadoop test cluster [puppet] - https://gerrit.wikimedia.org/r/702430 (https://phabricator.wikimedia.org/T271232)
[13:30:42] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db2072 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16775 and previous config saved to /var/cache/conftool/dbconfig/20210706-133041-root.json
[13:30:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:31:46] <wikibugs>	 (CR) jerkins-bot: [V: -1] sre.idm.logout: create cookbook to logout users [cookbooks] - https://gerrit.wikimedia.org/r/701498 (owner: Jbond)
[13:31:53] <wikibugs>	 (CR) Ottomata: [C: +2] Add gobblin_job define and declare first gobblin job in hadoop test cluster [puppet] - https://gerrit.wikimedia.org/r/702430 (https://phabricator.wikimedia.org/T271232) (owner: Ottomata)
[13:32:03] <wikibugs>	 (PS1) Giuseppe Lavagetto: mwdebug: bump opcache size and n. of files [deployment-charts] - https://gerrit.wikimedia.org/r/703435 (https://phabricator.wikimedia.org/T280497)
[13:34:12] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] mwdebug: bump opcache size and n. of files [deployment-charts] - https://gerrit.wikimedia.org/r/703435 (https://phabricator.wikimedia.org/T280497) (owner: Giuseppe Lavagetto)
[13:36:29] <wikibugs>	 (PS1) Joal: Bump AQS druid snapshot to 2021-06 [puppet] - https://gerrit.wikimedia.org/r/703436
[13:36:41] <joal>	 ottomata: for when you have a minute --^
[13:37:11] <wikibugs>	 (CR) Ottomata: [C: +2] Bump AQS druid snapshot to 2021-06 [puppet] - https://gerrit.wikimedia.org/r/703436 (owner: Joal)
[13:40:16] <wikibugs>	 (PS1) Ottomata: gobblin_job - fix typo in path to default jobconfig file [puppet] - https://gerrit.wikimedia.org/r/703437 (https://phabricator.wikimedia.org/T271232)
[13:41:53] <wikibugs>	 (CR) Ottomata: [C: +2] gobblin_job - fix typo in path to default jobconfig file [puppet] - https://gerrit.wikimedia.org/r/703437 (https://phabricator.wikimedia.org/T271232) (owner: Ottomata)
[13:43:17] <icinga-wm>	 RECOVERY - dump of es5 in codfw on alert1001 is OK: Last dump for es5 at codfw (es2025.codfw.wmnet) taken on 2021-07-06 00:00:01 (1916 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[13:45:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db2072 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16776 and previous config saved to /var/cache/conftool/dbconfig/20210706-134545-root.json
[13:45:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:11] <wikibugs>	 (PS1) Ottomata: gobblin_job - set PYTHONPATH in environment [puppet] - https://gerrit.wikimedia.org/r/703439 (https://phabricator.wikimedia.org/T271232)
[13:49:05] <logmsgbot>	 !log otto@cumin1001 START - Cookbook sre.aqs.roll-restart
[13:49:05] <logmsgbot>	 !log otto@cumin1001 END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
[13:49:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:49:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:49:18] <logmsgbot>	 !log otto@cumin1001 START - Cookbook sre.aqs.roll-restart
[13:49:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:49:45] <wikibugs>	 (CR) Ottomata: [C: +2] gobblin_job - set PYTHONPATH in environment [puppet] - https://gerrit.wikimedia.org/r/703439 (https://phabricator.wikimedia.org/T271232) (owner: Ottomata)
[13:49:51] <wikibugs>	 (PS2) Ottomata: gobblin_job - set PYTHONPATH in environment [puppet] - https://gerrit.wikimedia.org/r/703439 (https://phabricator.wikimedia.org/T271232)
[13:49:57] <icinga-wm>	 RECOVERY - dump of es4 in eqiad on alert1001 is OK: Last dump for es4 at eqiad (es1022.eqiad.wmnet) taken on 2021-07-06 00:00:01 (1938 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[13:50:33] <wikibugs>	 (CR) Ottomata: [V: +2 C: +2] gobblin_job - set PYTHONPATH in environment [puppet] - https://gerrit.wikimedia.org/r/703439 (https://phabricator.wikimedia.org/T271232) (owner: Ottomata)
[13:53:41] <logmsgbot>	 !log otto@cumin1001 END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
[13:53:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:54:26] <icinga-wm>	 RECOVERY - SSH on logstash1009 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[13:57:45] <icinga-wm>	 PROBLEM - SSH on logstash1009 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[14:00:50] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db2072 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16777 and previous config saved to /var/cache/conftool/dbconfig/20210706-140049-root.json
[14:00:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:02:39] <wikibugs>	 (PS1) Jbond: O:apereo_cas: move WMF specific code to profile::idp [puppet] - https://gerrit.wikimedia.org/r/703442
[14:04:55] <icinga-wm>	 RECOVERY - SSH on logstash1009 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[14:09:02] <icinga-wm>	 PROBLEM - SSH on logstash1009 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[14:10:19] <icinga-wm>	 RECOVERY - SSH on logstash1009 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[14:14:08] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/703442 (owner: Jbond)
[14:14:57] <icinga-wm>	 PROBLEM - SSH on logstash1009 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[14:30:25] <icinga-wm>	 RECOVERY - SSH on logstash1009 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[14:30:59] <icinga-wm>	 PROBLEM - Check systemd state on logstash1009 is CRITICAL: CRITICAL - degraded: The following units failed: elasticsearch_5@production-logstash-eqiad.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:31:41] <wikibugs>	 (PS1) Jbond: initial commit [puppet-apereo_cas] - https://gerrit.wikimedia.org/r/703446
[14:33:02] <wikibugs>	 (PS2) Jbond: initial commit [puppet-apereo_cas] - https://gerrit.wikimedia.org/r/703446
[14:33:16] <wikibugs>	 (CR) Jbond: [V: +2 C: +2] initial commit [puppet-apereo_cas] - https://gerrit.wikimedia.org/r/703446 (owner: Jbond)
[14:34:31] <wikibugs>	 (CR) Jbond: [C: +2] O:apereo_cas: move WMF specific code to profile::idp [puppet] - https://gerrit.wikimedia.org/r/703442 (owner: Jbond)
[14:37:15] <wikibugs>	 (PS1) Jbond: P:idp: move tomecat specific config to profile [puppet] - https://gerrit.wikimedia.org/r/703447
[14:42:47] <icinga-wm>	 RECOVERY - dump of es5 in eqiad on alert1001 is OK: Last dump for es5 at eqiad (es1025.eqiad.wmnet) taken on 2021-07-06 00:00:01 (1916 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[14:44:33] <icinga-wm>	 RECOVERY - dump of es4 in codfw on alert1001 is OK: Last dump for es4 at codfw (es2022.codfw.wmnet) taken on 2021-07-06 00:00:01 (1938 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[14:55:43] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/30115/console" [puppet] - https://gerrit.wikimedia.org/r/703447 (owner: Jbond)
[14:58:46] <wikibugs>	 (PS1) Giuseppe Lavagetto: wmdebug: fix networkpolicy ports definitions [deployment-charts] - https://gerrit.wikimedia.org/r/703449
[15:00:50] <wikibugs>	 (PS2) Giuseppe Lavagetto: mwdebug: fix networkpolicy ports definitions [deployment-charts] - https://gerrit.wikimedia.org/r/703449
[15:00:57] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 66 probes of 628 (alerts on 65) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[15:02:15] <wikibugs>	 (CR) Jbond: [V: +1 C: +2] P:idp: move tomecat specific config to profile [puppet] - https://gerrit.wikimedia.org/r/703447 (owner: Jbond)
[15:12:45] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 58 probes of 628 (alerts on 65) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[15:22:32] <wikibugs>	 (PS1) Jbond: O:rsync::server:  convert to concat [puppet] - https://gerrit.wikimedia.org/r/703452 (https://phabricator.wikimedia.org/T205618)
[15:23:33] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 41 probes of 627 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[15:23:50] <wikibugs>	 (CR) jerkins-bot: [V: -1] O:rsync::server:  convert to concat [puppet] - https://gerrit.wikimedia.org/r/703452 (https://phabricator.wikimedia.org/T205618) (owner: Jbond)
[15:39:58] <icinga-wm>	 RECOVERY - dump of s4 in codfw on alert1001 is OK: Last dump for s4 at codfw (db2099.codfw.wmnet:3314) taken on 2021-07-06 00:00:01 (382 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[15:43:11] <icinga-wm>	 RECOVERY - dump of s4 in eqiad on alert1001 is OK: Last dump for s4 at eqiad (db1145.eqiad.wmnet:3314) taken on 2021-07-06 00:00:02 (382 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[15:44:27] <wikibugs>	 (PS1) Ottomata: Remove camus webrequest job in analytics test cluster [puppet] - https://gerrit.wikimedia.org/r/703457 (https://phabricator.wikimedia.org/T271232)
[15:48:37] <logmsgbot>	 !log otto@deploy1002 Started deploy [analytics/refinery@a8e79f3] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin job migration
[15:48:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:54:02] <logmsgbot>	 !log otto@deploy1002 Finished deploy [analytics/refinery@a8e79f3] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin job migration (duration: 05m 24s)
[15:54:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:03:01] <wikibugs>	 (CR) Ottomata: [C: +2] Remove camus webrequest job in analytics test cluster [puppet] - https://gerrit.wikimedia.org/r/703457 (https://phabricator.wikimedia.org/T271232) (owner: Ottomata)
[16:04:59] <wikibugs>	 (PS2) Jbond: C:rsync::server:  convert to concat [puppet] - https://gerrit.wikimedia.org/r/703452 (https://phabricator.wikimedia.org/T205618)
[16:06:25] <wikibugs>	 (CR) jerkins-bot: [V: -1] C:rsync::server:  convert to concat [puppet] - https://gerrit.wikimedia.org/r/703452 (https://phabricator.wikimedia.org/T205618) (owner: Jbond)
[16:14:35] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 128 probes of 627 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[16:22:34] <wikibugs>	 (CR) Ladsgroup: [C: +1] "SGTM" [puppet] - https://gerrit.wikimedia.org/r/701658 (https://phabricator.wikimedia.org/T285361) (owner: Ladsgroup)
[16:23:52] <wikibugs>	 (PS1) Ottomata: eventgate now uses prometheus directly instead of statsd bridge [deployment-charts] - https://gerrit.wikimedia.org/r/703463 (https://phabricator.wikimedia.org/T272714)
[16:23:58] <wikibugs>	 (CR) Ladsgroup: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/703252 (https://phabricator.wikimedia.org/T286218) (owner: Legoktm)
[16:24:47] <icinga-wm>	 RECOVERY - SSH on cp5005.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:33:55] <legoktm>	 Amir1: woot, rolling both out now
[16:34:08] <wikibugs>	 (CR) Legoktm: [C: +2] mailman: Enable verp probes [puppet] - https://gerrit.wikimedia.org/r/701658 (https://phabricator.wikimedia.org/T285361) (owner: Ladsgroup)
[16:34:21] <Amir1>	 legoktm: thanks for the work <3
[16:34:24] <wikibugs>	 (PS3) Legoktm: mailman3: Discard all mails with a X-Spam-Score >= 6 [puppet] - https://gerrit.wikimedia.org/r/703252 (https://phabricator.wikimedia.org/T286218)
[16:35:40] <wikibugs>	 (CR) Legoktm: [C: +2] mailman3: Discard all mails with a X-Spam-Score >= 6 [puppet] - https://gerrit.wikimedia.org/r/703252 (https://phabricator.wikimedia.org/T286218) (owner: Legoktm)
[16:39:53] <wikibugs>	 (PS3) Jbond: C:rsync::server:  convert to concat [puppet] - https://gerrit.wikimedia.org/r/703452 (https://phabricator.wikimedia.org/T205618)
[16:41:16] <wikibugs>	 (CR) jerkins-bot: [V: -1] C:rsync::server:  convert to concat [puppet] - https://gerrit.wikimedia.org/r/703452 (https://phabricator.wikimedia.org/T205618) (owner: Jbond)
[16:42:26] <logmsgbot>	 !log joal@deploy1002 Started deploy [analytics/refinery@419d1f0]: Analytics deploy for Gobblin replacing Camus [analytics/refinery@419d1f0]
[16:42:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:43:13] <wikibugs>	 (PS4) Jbond: C:rsync::server:  convert to concat [puppet] - https://gerrit.wikimedia.org/r/703452 (https://phabricator.wikimedia.org/T205618)
[16:44:31] <wikibugs>	 (CR) jerkins-bot: [V: -1] C:rsync::server:  convert to concat [puppet] - https://gerrit.wikimedia.org/r/703452 (https://phabricator.wikimedia.org/T205618) (owner: Jbond)
[16:46:41] <wikibugs>	 (PS5) Jbond: C:rsync::server:  convert to concat [puppet] - https://gerrit.wikimedia.org/r/703452 (https://phabricator.wikimedia.org/T205618)
[16:48:00] <wikibugs>	 (CR) jerkins-bot: [V: -1] C:rsync::server:  convert to concat [puppet] - https://gerrit.wikimedia.org/r/703452 (https://phabricator.wikimedia.org/T205618) (owner: Jbond)
[16:51:15] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 16): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/30122/console" [puppet] - https://gerrit.wikimedia.org/r/703452 (https://phabricator.wikimedia.org/T205618) (owner: Jbond)
[16:52:27] <wikibugs>	 SRE, Continuous-Integration-Infrastructure, Patch-For-Review: Have linters/tests results show up as comments in files on gerrit - https://phabricator.wikimedia.org/T209149 (kostajh) Output from the above patches (where possible anyway) can be seen here: https://gerrit.wikimedia.org/r/c/mediawiki/core...
[17:03:09] <icinga-wm>	 PROBLEM - etcd request latencies on kubestagemaster1001 is CRITICAL: instance=10.64.16.203 operation={get,list,listWithCount,update} https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[17:03:21] <icinga-wm>	 PROBLEM - k8s API server requests latencies on kubemaster1001 is CRITICAL: instance=10.64.0.117 verb=DELETE https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[17:03:22] <icinga-wm>	 PROBLEM - etcd request latencies on kubemaster1001 is CRITICAL: instance=10.64.0.117 operation={delete,get,list,listWithCount,update} https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[17:04:08] <legoktm>	 erm, ^ is that an issue _joe_?
[17:04:19] <icinga-wm>	 PROBLEM - k8s API server requests latencies on kubestagemaster1001 is CRITICAL: instance=10.64.16.203 verb={LIST,PATCH,PUT} https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[17:04:47] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 54 probes of 627 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[17:05:15] <icinga-wm>	 RECOVERY - k8s API server requests latencies on kubemaster1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[17:07:11] <icinga-wm>	 RECOVERY - etcd request latencies on kubemaster1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[17:08:09] <icinga-wm>	 RECOVERY - k8s API server requests latencies on kubestagemaster1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[17:08:53] <icinga-wm>	 RECOVERY - etcd request latencies on kubestagemaster1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[17:12:39] <_joe_>	 legoktm: it just means that creating new objects in k8s is slow, which in a normal situation shouldn't be an issue
[17:12:56] <legoktm>	 ok
[17:13:09] <legoktm>	 what would have caused it?
[17:14:29] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 124 probes of 627 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[17:19:25] <logmsgbot>	 !log joal@deploy1002 Finished deploy [analytics/refinery@419d1f0]: Analytics deploy for Gobblin replacing Camus [analytics/refinery@419d1f0] (duration: 36m 59s)
[17:19:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:19:47] <logmsgbot>	 !log joal@deploy1002 Started deploy [analytics/refinery@419d1f0] (thin): Analytics deploy for Gobblin replacing Camus - THIN [analytics/refinery@419d1f0]
[17:19:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:19:54] <logmsgbot>	 !log joal@deploy1002 Finished deploy [analytics/refinery@419d1f0] (thin): Analytics deploy for Gobblin replacing Camus - THIN [analytics/refinery@419d1f0] (duration: 00m 07s)
[17:20:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:20:24] <logmsgbot>	 !log joal@deploy1002 Started deploy [analytics/refinery@419d1f0] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@419d1f0]
[17:20:25] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 43 probes of 627 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[17:20:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:25:55] <logmsgbot>	 !log joal@deploy1002 Finished deploy [analytics/refinery@419d1f0] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@419d1f0] (duration: 05m 31s)
[17:26:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:30:53] <wikibugs>	 SRE, Wikimedia-Mailing-lists, User-Ladsgroup: Enable verp probes in mailman3 - https://phabricator.wikimedia.org/T285361 (Legoktm) Open→Resolved a:Ladsgroup
[17:39:50] <wikibugs>	 (PS1) Ladsgroup: dumps: Migrate kiwix update cron to systemd timer [puppet] - https://gerrit.wikimedia.org/r/703470 (https://phabricator.wikimedia.org/T273673)
[17:45:45] <wikibugs>	 (PS1) Ottomata: Declare analytics webrequest and netflow gobblin jobs [puppet] - https://gerrit.wikimedia.org/r/703471 (https://phabricator.wikimedia.org/T271232)
[17:47:13] <wikibugs>	 (CR) Ottomata: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/30123/console" [puppet] - https://gerrit.wikimedia.org/r/703471 (https://phabricator.wikimedia.org/T271232) (owner: Ottomata)
[17:47:53] <icinga-wm>	 PROBLEM - Ensure traffic_exporter for the tls instance binds on port 9322 and responds to HTTP requests on cp3058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[17:48:01] <wikibugs>	 (CR) Ottomata: [V: +1 C: +2] Declare analytics webrequest and netflow gobblin jobs [puppet] - https://gerrit.wikimedia.org/r/703471 (https://phabricator.wikimedia.org/T271232) (owner: Ottomata)
[17:48:28] <wikibugs>	 (PS1) Ottomata: Fix comment in job/gobblin.pp [puppet] - https://gerrit.wikimedia.org/r/703472
[17:48:36] <wikibugs>	 (CR) Ottomata: [V: +2 C: +2] Fix comment in job/gobblin.pp [puppet] - https://gerrit.wikimedia.org/r/703472 (owner: Ottomata)
[17:54:55] <wikibugs>	 (PS1) Ottomata: Ensure absent webrequest and netflow camus jobs [puppet] - https://gerrit.wikimedia.org/r/703474 (https://phabricator.wikimedia.org/T271232)
[17:56:52] <wikibugs>	 (CR) Ottomata: [C: +2] "Merging.  Will only deploy to staging clusters for now." [deployment-charts] - https://gerrit.wikimedia.org/r/703463 (https://phabricator.wikimedia.org/T272714) (owner: Ottomata)
[17:57:21] <icinga-wm>	 RECOVERY - Ensure traffic_exporter for the tls instance binds on port 9322 and responds to HTTP requests on cp3058 is OK: HTTP OK: HTTP/1.0 200 OK - 23654 bytes in 0.326 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[18:00:18] <wikibugs>	 (PS1) Ladsgroup: Set $wgIncludejQueryMigrate to false in group0 [mediawiki-config] - https://gerrit.wikimedia.org/r/703476 (https://phabricator.wikimedia.org/T280944)
[18:03:44] <logmsgbot>	 !log otto@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
[18:03:44] <logmsgbot>	 !log otto@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
[18:03:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:03:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:18:24] <wikibugs>	 (PS1) Ottomata: eventgate-analytics/values-staging.yaml - set num_workers: 0 [deployment-charts] - https://gerrit.wikimedia.org/r/703477 (https://phabricator.wikimedia.org/T272714)
[18:19:22] <wikibugs>	 (CR) Ottomata: [V: +2 C: +2] eventgate-analytics/values-staging.yaml - set num_workers: 0 [deployment-charts] - https://gerrit.wikimedia.org/r/703477 (https://phabricator.wikimedia.org/T272714) (owner: Ottomata)
[18:32:07] <wikibugs>	 (PS1) Ottomata: eventgate - Allow for setting num_workers: 0 [deployment-charts] - https://gerrit.wikimedia.org/r/703478
[18:32:46] <wikibugs>	 (CR) Ottomata: [V: +2 C: +2] eventgate - Allow for setting num_workers: 0 [deployment-charts] - https://gerrit.wikimedia.org/r/703478 (owner: Ottomata)
[18:34:38] <logmsgbot>	 !log otto@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
[18:34:38] <logmsgbot>	 !log otto@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
[18:34:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:34:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:51:48] <Amir1>	 I'm messing with mwdebug2001 to test T285919
[18:51:49] <stashbot>	 T285919: Allow links to dag.wikipedia.org from Wikidata - https://phabricator.wikimedia.org/T285919
[19:50:38] <wikibugs>	 SRE, MediaWiki-Cache: APCu caches are set to expire in 2073 instead of an hour if exptime is a unix timestamp - https://phabricator.wikimedia.org/T286260 (Ladsgroup)
[19:50:47] <Amir1>	 _joe_: ^ have fun
[19:54:24] <wikibugs>	 SRE, MediaWiki-Cache: APCu caches are set to expire in 2073 instead of an hour if exptime is a unix timestamp - https://phabricator.wikimedia.org/T286260 (Joe) Clearing acpu is as easy as doing a rolling restart of the cluster. But I think we should first fix the BagOfStuff implementation and/or all the...
[19:54:31] <_joe_>	 Amir1: awesome
[19:54:43] <_joe_>	 Amir1: what were we saying today about unit testing?
[19:55:05] <Amir1>	 or lack thereof :D
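Below is a rough worked sketch of the arithmetic behind T286260. It assumes, as the task title implies, that an absolute unix timestamp passed as exptime is treated by the cache as a relative TTL and added to the current time. The helper effective_expiry is hypothetical and is not MediaWiki's ApcuBag code. The start time is simply the moment the task was filed in this log.

```python
from datetime import datetime, timezone


def effective_expiry(now: int, exptime: int) -> int:
    """Absolute expiry (unix seconds) when exptime is misread as a relative TTL.

    Hypothetical model of the bug: the cache adds exptime to "now" even
    though the caller meant exptime as an absolute unix timestamp.
    """
    return now + exptime


# 2021-07-06 19:50:38 UTC, when T286260 was filed above.
now = int(datetime(2021, 7, 6, 19, 50, 38, tzinfo=timezone.utc).timestamp())
intended = now + 3600  # the caller wants the entry to live for one hour

wrong = effective_expiry(now, intended)
print(datetime.fromtimestamp(intended, tz=timezone.utc).year)  # 2021
print(datetime.fromtimestamp(wrong, tz=timezone.utc).year)  # 2073
```

The timestamp is counted twice, so the entry outlives its intended hour by roughly 51 years. That lines up with the 2073 in the task title.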
[20:04:28] <wikibugs>	 (PS1) Tks4Fish: zhwiktionary: Add namespaces: *118 - Reconstruction *119 - Reconstruction Talk [mediawiki-config] - https://gerrit.wikimedia.org/r/703480 (https://phabricator.wikimedia.org/T286101)
[20:04:30] <wikibugs>	 (PS1) Tks4Fish: zhwiktionary: Add aliases for namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/703481 (https://phabricator.wikimedia.org/T286101)
[20:04:33] <wikibugs>	 (PS1) Tks4Fish: zhwiktionary: Add templateeditor right [mediawiki-config] - https://gerrit.wikimedia.org/r/703482 (https://phabricator.wikimedia.org/T286101)
[20:09:52] <wikibugs>	 SRE, MediaWiki-Cache, Patch-For-Review: APCu caches are set to expire in 2073 instead of an hour if exptime is a unix timestamp - https://phabricator.wikimedia.org/T286260 (Krinkle) I'd say the exptime parameter as timestamp is obscure and something I've not seen even once being used in the past ten...
[20:10:19] <wikibugs>	 SRE, MediaWiki-Cache, Platform Engineering, Patch-For-Review: APCu caches are set to expire in 2073 instead of an hour if exptime is a unix timestamp - https://phabricator.wikimedia.org/T286260 (Krinkle)
[20:45:20] <wikibugs>	 SRE, MediaWiki-Cache, Platform Engineering, Patch-For-Review: APCu caches are set to expire in 2073 instead of an hour if exptime is a unix timestamp - https://phabricator.wikimedia.org/T286260 (Ladsgroup) While this is clearly the way to do but I'm slightly worried by deploying this we will caus...
[21:06:50] <icinga-wm>	 PROBLEM - BGP status on cr2-esams is CRITICAL: BGP CRITICAL - AS6939/IPv6: Active - HE, AS6939/IPv4: Connect - HE https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[21:14:56] <icinga-wm>	 PROBLEM - Router interfaces on cr3-eqsin is CRITICAL: CRITICAL: host 103.102.166.131, interfaces up: 69, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[21:15:54] <icinga-wm>	 PROBLEM - BGP status on cr3-eqsin is CRITICAL: BGP CRITICAL - AS1299/IPv6: Idle - Telia, AS1299/IPv4: Idle - Telia https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[21:21:43] <wikibugs>	 SRE, MediaWiki-Cache, Platform Engineering, Patch-For-Review: APCu caches are set to expire in 2073 instead of an hour if exptime is a unix timestamp - https://phabricator.wikimedia.org/T286260 (Krinkle) Right, but changing ApcuBag first will do the same thing as well.  Perhaps a safer first step...
[21:22:36] <icinga-wm>	 RECOVERY - Router interfaces on cr3-eqsin is OK: OK: host 103.102.166.131, interfaces up: 70, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[21:23:30] <icinga-wm>	 RECOVERY - BGP status on cr3-eqsin is OK: BGP OK - up: 319, down: 1, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[21:25:57] <icinga-wm>	 RECOVERY - BGP status on cr2-esams is OK: BGP OK - up: 423, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[21:29:09] <wikibugs>	 (PS1) Ottomata: Bump eventgate image version to get normalized prometheus metric labels [deployment-charts] - https://gerrit.wikimedia.org/r/703487 (https://phabricator.wikimedia.org/T272714)
[21:55:28] <icinga-wm>	 PROBLEM - BGP status on cr2-eqsin is CRITICAL: BGP CRITICAL - AS6939/IPv4: Connect - HE, AS6939/IPv6: Connect - HE https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[22:10:52] <icinga-wm>	 RECOVERY - BGP status on cr2-eqsin is OK: BGP OK - up: 66, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[22:43:57] <wikibugs>	 (PS1) Legoktm: Re-enable Score using Shellbox on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/703489 (https://phabricator.wikimedia.org/T281423)
[22:45:04] <wikibugs>	 (CR) jerkins-bot: [V: -1] Re-enable Score using Shellbox on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/703489 (https://phabricator.wikimedia.org/T281423) (owner: Legoktm)
[22:46:57] <wikibugs>	 (PS2) Legoktm: Re-enable Score using Shellbox on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/703489 (https://phabricator.wikimedia.org/T281423)
[22:47:28] <wikibugs>	 (PS1) H.krishna123: [WIP] web_app: Creating skeleton code for frontend, and static files [software/bernard] - https://gerrit.wikimedia.org/r/703490 (https://phabricator.wikimedia.org/T285438)
[22:50:51] <wikibugs>	 (CR) Ladsgroup: [C: -1] Re-enable Score using Shellbox on testwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/703489 (https://phabricator.wikimedia.org/T281423) (owner: Legoktm)
[22:51:54] <wikibugs>	 (PS1) Legoktm: services_proxy: Add envoyproxy for shellbox [puppet] - https://gerrit.wikimedia.org/r/703491 (https://phabricator.wikimedia.org/T281423)
[22:53:26] <wikibugs>	 (CR) Legoktm: Re-enable Score using Shellbox on testwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/703489 (https://phabricator.wikimedia.org/T281423) (owner: Legoktm)
[22:53:41] <wikibugs>	 (PS3) Legoktm: Re-enable Score using Shellbox on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/703489 (https://phabricator.wikimedia.org/T281423)
[22:56:43] <wikibugs>	 (PS2) Legoktm: services_proxy: Add envoyproxy for shellbox [puppet] - https://gerrit.wikimedia.org/r/703491 (https://phabricator.wikimedia.org/T281423)
[22:57:34] <wikibugs>	 (CR) Legoktm: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/30125/console" [puppet] - https://gerrit.wikimedia.org/r/703491 (https://phabricator.wikimedia.org/T281423) (owner: Legoktm)
[22:59:26] <wikibugs>	 (PS3) Legoktm: services_proxy: Add envoyproxy for shellbox [puppet] - https://gerrit.wikimedia.org/r/703491 (https://phabricator.wikimedia.org/T281423)
[23:12:17] <legoktm>	 I'm hacking on mwdebug2001
[23:17:13] <legoktm>	 https://test.wikipedia.org/wiki/Score has real scores again
[23:17:26] <wikibugs>	 (PS4) Legoktm: Re-enable Score using Shellbox on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/703489 (https://phabricator.wikimedia.org/T281423)
[23:21:52] <wikibugs>	 (CR) Legoktm: Re-enable Score using Shellbox on testwiki (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/703489 (https://phabricator.wikimedia.org/T281423) (owner: Legoktm)
[23:22:29] <wikibugs>	 (CR) DannyS712: Re-enable Score using Shellbox on testwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/703489 (https://phabricator.wikimedia.org/T281423) (owner: Legoktm)
[23:34:32] <icinga-wm>	 RECOVERY - Backup freshness on backup1001 is OK: Fresh: 104 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[23:34:48] <wikibugs>	 (PS5) Legoktm: Re-enable Score using Shellbox on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/703489 (https://phabricator.wikimedia.org/T281423)
[23:34:50] <wikibugs>	 (PS1) Legoktm: Document $wgShellboxSecretKey in private/readme.php [mediawiki-config] - https://gerrit.wikimedia.org/r/703495
[23:34:52] <wikibugs>	 (PS1) Legoktm: Add Shellbox to {Production,Labs}Services.php [mediawiki-config] - https://gerrit.wikimedia.org/r/703496 (https://phabricator.wikimedia.org/T281423)
[23:35:27] <wikibugs>	 (CR) Legoktm: Re-enable Score using Shellbox on testwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/703489 (https://phabricator.wikimedia.org/T281423) (owner: Legoktm)
[23:45:53] <wikibugs>	 SRE, Commons, Tools, Wikimedia-Mailing-lists: daily-image-l stopped sending on 2020-10-11 - https://phabricator.wikimedia.org/T265568 (Platonides) Open→Resolved I have actually removed those two print() statements (some debugging, it seems), so it doesn't produce any output.  List is now...
[23:48:20] <wikibugs>	 (PS1) Legoktm: lists: Redirect /mailman/options/<list> too [puppet] - https://gerrit.wikimedia.org/r/703497 (https://phabricator.wikimedia.org/T286267)
[23:50:05] <wikibugs>	 (CR) Legoktm: [C: +2] lists: Redirect /mailman/options/<list> too [puppet] - https://gerrit.wikimedia.org/r/703497 (https://phabricator.wikimedia.org/T286267) (owner: Legoktm)
[23:52:04] <wikibugs>	 SRE, Wikimedia-Mailing-lists, Patch-For-Review: Redirect old /mailman/options/<list> urls - https://phabricator.wikimedia.org/T286267 (Legoktm) Open→Resolved a:Legoktm ` km@cashew ~> curl -I "https://lists.wikimedia.org/mailman/options/daily-image-l" HTTP/1.1 301 Moved Permanently Date: T...