Wikimedia IRC logs browser - #wikimedia-operations

Displaying 976 items:

2023-01-23 00:00:20 <icinga-wm> RECOVERY - Check systemd state on maps2009 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2023-01-23 00:05:10 <icinga-wm> PROBLEM - Check systemd state on maps2009 is CRITICAL: CRITICAL - degraded: The following units failed: planet_sync_tile_generation-gis.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2023-01-23 00:31:36 <icinga-wm> RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2023-01-23 00:36:26 <icinga-wm> PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_eventlogging_legacy.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2023-01-23 02:07:47 <jinxer-wm> (JobUnavailable) firing: (4) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2023-01-23 02:12:47 <jinxer-wm> (JobUnavailable) firing: (10) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2023-01-23 02:27:47 <jinxer-wm> (JobUnavailable) firing: (12) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2023-01-23 02:37:47 <jinxer-wm> (JobUnavailable) firing: (12) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2023-01-23 02:47:47 <jinxer-wm> (JobUnavailable) firing: (12) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2023-01-23 02:52:47 <jinxer-wm> (JobUnavailable) firing: (12) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2023-01-23 03:10:11 <jinxer-wm> (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
2023-01-23 03:35:13 <wikibugs> ('PS1) ''Gerrit maintenance bot: mariadb: Promote db2104 to s2 master [puppet] - ''https://gerrit.wikimedia.org/r/882253 (https://phabricator.wikimedia.org/T327609)'
2023-01-23 03:51:22 <wikibugs> ('CR) ''Ladsgroup: [C: ''+2] mariadb: Promote db2104 to s2 master [puppet] - ''https://gerrit.wikimedia.org/r/882253 (https://phabricator.wikimedia.org/T327609) (owner: ''Gerrit maintenance bot)'
2023-01-23 03:52:30 <Amir1> !log Starting s2 codfw failover from db2107 to db2104 - T327609
2023-01-23 03:52:33 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 03:52:34 <stashbot> T327609: Switchover s2 master (db2107 -> db2104) - https://phabricator.wikimedia.org/T327609
2023-01-23 03:54:59 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depool db2107 T327609', diff saved to https://phabricator.wikimedia.org/P43207 and previous config saved to /var/cache/conftool/dbconfig/20230123-035458-ladsgroup.json
2023-01-23 03:56:47 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
2023-01-23 03:56:50 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
2023-01-23 04:02:28 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
2023-01-23 04:02:30 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
2023-01-23 04:12:28 <wikibugs> ('CR) ''Ladsgroup: [C: ''+1] "Can you add the ticket?" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/868127 (owner: ''Daniel Kinzler)'
2023-01-23 04:28:01 <wikibugs> ('PS1) ''Gerrit maintenance bot: mariadb: Promote db2123 to s5 master [puppet] - ''https://gerrit.wikimedia.org/r/882254 (https://phabricator.wikimedia.org/T327611)'
2023-01-23 04:30:53 <wikibugs> ('Abandoned) ''Ladsgroup: mariadb: Promote db2104 to s2 master [puppet] - ''https://gerrit.wikimedia.org/r/881375 (https://phabricator.wikimedia.org/T327370) (owner: ''Gerrit maintenance bot)'
2023-01-23 04:32:53 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T327611
2023-01-23 04:32:57 <stashbot> T327611: Switchover s5 master (db2113 -> db2123) - https://phabricator.wikimedia.org/T327611
2023-01-23 04:33:10 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T327611
2023-01-23 04:33:25 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Set db2123 with weight 0 T327611', diff saved to https://phabricator.wikimedia.org/P43208 and previous config saved to /var/cache/conftool/dbconfig/20230123-043324-ladsgroup.json
2023-01-23 04:51:45 <wikibugs> ('CR) ''Ladsgroup: [C: ''+2] mariadb: Promote db2123 to s5 master [puppet] - ''https://gerrit.wikimedia.org/r/882254 (https://phabricator.wikimedia.org/T327611) (owner: ''Gerrit maintenance bot)'
2023-01-23 04:53:32 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
2023-01-23 04:53:34 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
2023-01-23 04:57:08 <Amir1> !log Starting s5 codfw failover from db2113 to db2123 - T327611
2023-01-23 04:57:11 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 04:57:12 <stashbot> T327611: Switchover s5 master (db2113 -> db2123) - https://phabricator.wikimedia.org/T327611
2023-01-23 04:57:41 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Promote db2123 to s5 primary T327611', diff saved to https://phabricator.wikimedia.org/P43209 and previous config saved to /var/cache/conftool/dbconfig/20230123-045740-ladsgroup.json
2023-01-23 04:59:40 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depool db2113 T327611', diff saved to https://phabricator.wikimedia.org/P43210 and previous config saved to /var/cache/conftool/dbconfig/20230123-045939-ladsgroup.json
2023-01-23 05:01:57 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
2023-01-23 05:02:00 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
2023-01-23 05:07:37 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
2023-01-23 05:07:39 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
2023-01-23 05:13:37 <wikibugs> ('PS1) ''KartikMistry: Content Translation: Add campaign for Wiki Loves Living Heritage [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882266 (https://phabricator.wikimedia.org/T327587)'
2023-01-23 05:33:43 <icinga-wm> PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v2/suggest/sections/{title}/{from}/{to} (Suggest source sections to translate) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
2023-01-23 05:34:37 <icinga-wm> RECOVERY - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
2023-01-23 05:50:49 <wikibugs> ('PS3) ''KartikMistry: Update cxserver to 2023-01-20-051603-production [deployment-charts] - ''https://gerrit.wikimedia.org/r/881051 (https://phabricator.wikimedia.org/T323840)'
2023-01-23 05:56:33 <kart_> Updating cxserver in a few minutes..
2023-01-23 05:57:12 <wikibugs> ('CR) ''KartikMistry: [C: ''+2] Update cxserver to 2023-01-20-051603-production [deployment-charts] - ''https://gerrit.wikimedia.org/r/881051 (https://phabricator.wikimedia.org/T323840) (owner: ''KartikMistry)'
2023-01-23 06:02:07 <wikibugs> ('Merged) ''jenkins-bot: Update cxserver to 2023-01-20-051603-production [deployment-charts] - ''https://gerrit.wikimedia.org/r/881051 (https://phabricator.wikimedia.org/T323840) (owner: ''KartikMistry)'
2023-01-23 06:12:06 <logmsgbot> !log kartik@deploy1002 helmfile [staging] START helmfile.d/services/cxserver: apply
2023-01-23 06:12:33 <logmsgbot> !log kartik@deploy1002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
2023-01-23 06:16:21 <logmsgbot> !log kartik@deploy1002 helmfile [codfw] START helmfile.d/services/cxserver: apply
2023-01-23 06:17:01 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
2023-01-23 06:17:04 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
2023-01-23 06:17:05 <logmsgbot> !log kartik@deploy1002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
2023-01-23 06:18:15 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
2023-01-23 06:18:17 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
2023-01-23 06:18:37 <logmsgbot> !log kartik@deploy1002 helmfile [eqiad] START helmfile.d/services/cxserver: apply
2023-01-23 06:19:31 <logmsgbot> !log kartik@deploy1002 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
2023-01-23 06:23:29 <kart_> !log Updated cxserver to 2023-01-20-051603-production (T323840, T326236)
2023-01-23 06:23:34 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 06:23:35 <stashbot> T326236: Post-creation work for gucwiki - https://phabricator.wikimedia.org/T326236
2023-01-23 06:23:35 <stashbot> T323840: Make the Google translate the default Machine Translation in Central Kurdish Wikipedia - https://phabricator.wikimedia.org/T323840
2023-01-23 06:52:47 <jinxer-wm> (JobUnavailable) firing: (2) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2023-01-23 06:56:36 <wikibugs> ('PS1) ''Stang: bnwikiquote: Update logo [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882422 (https://phabricator.wikimedia.org/T323131)'
2023-01-23 06:58:40 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
2023-01-23 06:58:42 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
2023-01-23 07:02:54 <wikibugs> ('PS1) ''Stang: shnwikibooks: Add project logo [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882425 (https://phabricator.wikimedia.org/T327380)'
2023-01-23 07:05:26 <wikibugs> ('PS2) ''Stang: bnwikiquote: Update logo [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882422 (https://phabricator.wikimedia.org/T323131)'
2023-01-23 07:05:46 <wikibugs> ('PS3) ''Stang: bnwikiquote: Update logo [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882422 (https://phabricator.wikimedia.org/T323131)'
2023-01-23 07:08:50 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
2023-01-23 07:08:52 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
2023-01-23 07:09:43 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
2023-01-23 07:09:45 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
2023-01-23 07:10:11 <jinxer-wm> (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
2023-01-23 07:13:23 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1106 db1206 T326669', diff saved to https://phabricator.wikimedia.org/P43211 and previous config saved to /var/cache/conftool/dbconfig/20230123-071323-marostegui.json
2023-01-23 07:13:27 <stashbot> T326669: Productionize db1206-db1225 - https://phabricator.wikimedia.org/T326669
2023-01-23 07:22:08 <wikibugs> ('CR) ''Ayounsi: Add PTR resolution to firewall logs (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/880889 (https://phabricator.wikimedia.org/T327095) (owner: ''Ayounsi)'
2023-01-23 07:23:10 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2107 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43212 and previous config saved to /var/cache/conftool/dbconfig/20230123-072309-ladsgroup.json
2023-01-23 07:24:00 <wikibugs> ('PS1) ''Marostegui: mariadb: Switch s1 sanitarium master [puppet] - ''https://gerrit.wikimedia.org/r/882515 (https://phabricator.wikimedia.org/T326669)'
2023-01-23 07:24:44 <wikibugs> ('CR) ''Marostegui: [C: ''+2] mariadb: Switch s1 sanitarium master [puppet] - ''https://gerrit.wikimedia.org/r/882515 (https://phabricator.wikimedia.org/T326669) (owner: ''Marostegui)'
2023-01-23 07:25:21 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1106 (re)pooling @ 5%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43213 and previous config saved to /var/cache/conftool/dbconfig/20230123-072520-root.json
2023-01-23 07:25:31 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43214 and previous config saved to /var/cache/conftool/dbconfig/20230123-072530-root.json
2023-01-23 07:37:24 <wikibugs> ('CR) ''Ayounsi: WIP: add rt_flow grokking (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/880500 (https://phabricator.wikimedia.org/T325806) (owner: ''Filippo Giunchedi)'
2023-01-23 07:38:15 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2107 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43215 and previous config saved to /var/cache/conftool/dbconfig/20230123-073814-ladsgroup.json
2023-01-23 07:40:26 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1106 (re)pooling @ 10%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43216 and previous config saved to /var/cache/conftool/dbconfig/20230123-074025-root.json
2023-01-23 07:40:36 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43217 and previous config saved to /var/cache/conftool/dbconfig/20230123-074035-root.json
2023-01-23 07:41:52 <wikibugs> ('PS3) ''Stang: zhwiki: Install PageAssessments [mediawiki-config] - ''https://gerrit.wikimedia.org/r/876196 (https://phabricator.wikimedia.org/T326387)'
2023-01-23 07:42:33 <icinga-wm> PROBLEM - puppet last run on idm-test1001 is CRITICAL: CRITICAL: Puppet has been disabled for 604942 seconds, message: test OIDC - slyngshede, last run 7 days ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
2023-01-23 07:43:52 <wikibugs> ('PS6) ''Elukey: changeprop: add liftwing revscoring streams [deployment-charts] - ''https://gerrit.wikimedia.org/r/881594 (https://phabricator.wikimedia.org/T327302)'
2023-01-23 07:43:54 <wikibugs> ('PS7) ''Elukey: helmfile.d: add a new test workflow for Lifting to changeprop's staging [deployment-charts] - ''https://gerrit.wikimedia.org/r/881664 (https://phabricator.wikimedia.org/T327302)'
2023-01-23 07:44:20 <wikibugs> ('CR) ''Elukey: "I added one last little change, namely the possibility to set the kafka topic :)" [deployment-charts] - ''https://gerrit.wikimedia.org/r/881594 (https://phabricator.wikimedia.org/T327302) (owner: ''Elukey)'
2023-01-23 07:44:47 <wikibugs> ('CR) ''Elukey: "Added the kafka topic parameter to the staging settings (now the chart allows to specify it)." [deployment-charts] - ''https://gerrit.wikimedia.org/r/881664 (https://phabricator.wikimedia.org/T327302) (owner: ''Elukey)'
2023-01-23 07:53:20 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2107 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43218 and previous config saved to /var/cache/conftool/dbconfig/20230123-075319-ladsgroup.json
2023-01-23 07:55:31 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43219 and previous config saved to /var/cache/conftool/dbconfig/20230123-075530-root.json
2023-01-23 07:55:41 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43220 and previous config saved to /var/cache/conftool/dbconfig/20230123-075540-root.json
2023-01-23 08:00:05 <jouncebot> Amir1 and Urbanecm: (Dis)respected human, time to deploy UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T0800). Please do the needful.
2023-01-23 08:00:05 <jouncebot> MatmaRex: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
2023-01-23 08:00:41 <MatmaRex> hi
2023-01-23 08:00:48 <MatmaRex> is anyone really working at this hour? :D
2023-01-23 08:02:53 <Amir1> MatmaRex: let me check
2023-01-23 08:03:58 <Amir1> is the sync order okay?
2023-01-23 08:04:09 <_joe_> sirenbot: wake up
2023-01-23 08:05:34 <MatmaRex> Amir1: order shouldn't matter for this backport
2023-01-23 08:05:39 <_joe_> sigh, didn't we give -O to it?
2023-01-23 08:06:32 <Amir1> it could change the hashes of the modules and such but meh
2023-01-23 08:06:43 <wikibugs> ('CR) ''TrainBranchBot: [C: ''+2] "Approved by ladsgroup@deploy1002 using scap backport" [extensions/DiscussionTools] (wmf/1.40.0-wmf.19) - ''https://gerrit.wikimedia.org/r/882174 (https://phabricator.wikimedia.org/T327328) (owner: ''Bartosz Dziewoński)'
2023-01-23 08:08:25 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2107 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43221 and previous config saved to /var/cache/conftool/dbconfig/20230123-080824-ladsgroup.json
2023-01-23 08:10:20 <wikibugs> ('PS1) ''Func: SpecialUserrights: Allow updating the expiry of user groups [core] (wmf/1.40.0-wmf.19) - ''https://gerrit.wikimedia.org/r/882179 (https://phabricator.wikimedia.org/T327605)'
2023-01-23 08:10:36 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43222 and previous config saved to /var/cache/conftool/dbconfig/20230123-081035-root.json
2023-01-23 08:10:46 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43223 and previous config saved to /var/cache/conftool/dbconfig/20230123-081045-root.json
2023-01-23 08:12:07 <icinga-wm> PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
2023-01-23 08:12:27 <icinga-wm> PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
2023-01-23 08:12:29 <wikibugs> ('Merged) ''jenkins-bot: Tweaks for new heading HTML structure [extensions/DiscussionTools] (wmf/1.40.0-wmf.19) - ''https://gerrit.wikimedia.org/r/882174 (https://phabricator.wikimedia.org/T327328) (owner: ''Bartosz Dziewoński)'
2023-01-23 08:12:47 <logmsgbot> !log ladsgroup@deploy1002 Started scap: Backport for [[gerrit:882174|Tweaks for new heading HTML structure (T327328 T327469)]]
2023-01-23 08:12:52 <stashbot> T327469: Subscribe buttons/links are displayed out of place due to new heading HTML structure - https://phabricator.wikimedia.org/T327469
2023-01-23 08:12:52 <stashbot> T327328: Highlight skips the topic container for new topics, which looks odd - https://phabricator.wikimedia.org/T327328
2023-01-23 08:13:57 <wikibugs> ('CR) ''Muehlenhoff: "You also need to remove profile::idp::client:httpd from profile::racktables, then it will work." [puppet] - ''https://gerrit.wikimedia.org/r/881697 (https://phabricator.wikimedia.org/T327405) (owner: ''Dzahn)'
2023-01-23 08:14:15 <icinga-wm> PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
2023-01-23 08:14:23 <taavi> _joe_: sirenbot's +O was removed by ircservserv-wm_ with the last sync as that wasn't granted via its configuration
2023-01-23 08:14:41 <_joe_> taavi: yeah I just saw, I thought it was
2023-01-23 08:14:51 <_joe_> I remember someone writing a patch, I assumed it was merged
2023-01-23 08:15:03 <_joe_> I'll fix it once I'm done writing docs
2023-01-23 08:15:49 <taavi> yeah, it has +o not +O
2023-01-23 08:16:49 <icinga-wm> RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 49419 bytes in 0.061 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
2023-01-23 08:17:09 <icinga-wm> RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.943 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
2023-01-23 08:17:19 <icinga-wm> RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Fri 21 Apr 2023 05:11:22 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
2023-01-23 08:17:23 <wikibugs> ('CR) ''Muehlenhoff: [C: ''+1] "LGTM" [puppet] - ''https://gerrit.wikimedia.org/r/881902 (https://phabricator.wikimedia.org/T228730) (owner: ''BCornwall)'
2023-01-23 08:19:03 <wikibugs> ('CR) ''Majavah: [V: ''+1] ldap: move ssh-key-ldap-lookup directly to ssh module (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/877964 (owner: ''Majavah)'
2023-01-23 08:22:33 <logmsgbot> !log ladsgroup@deploy1002 ladsgroup and matmarex: Backport for [[gerrit:882174|Tweaks for new heading HTML structure (T327328 T327469)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
2023-01-23 08:22:38 <stashbot> T327469: Subscribe buttons/links are displayed out of place due to new heading HTML structure - https://phabricator.wikimedia.org/T327469
2023-01-23 08:22:38 <stashbot> T327328: Highlight skips the topic container for new topics, which looks odd - https://phabricator.wikimedia.org/T327328
2023-01-23 08:22:42 <Amir1> MatmaRex: it's in mwbdeug now
2023-01-23 08:23:21 <MatmaRex> Amir1: works as expected
2023-01-23 08:23:48 <Amir1> deploying
2023-01-23 08:25:03 <wikibugs> ('PS1) ''Muehlenhoff: Remove openldap_corp role from ldap-corp* [puppet] - ''https://gerrit.wikimedia.org/r/882573 (https://phabricator.wikimedia.org/T323820)'
2023-01-23 08:25:41 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43224 and previous config saved to /var/cache/conftool/dbconfig/20230123-082540-root.json
2023-01-23 08:25:51 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43225 and previous config saved to /var/cache/conftool/dbconfig/20230123-082550-root.json
2023-01-23 08:30:00 <logmsgbot> !log ladsgroup@deploy1002 Finished scap: Backport for [[gerrit:882174|Tweaks for new heading HTML structure (T327328 T327469)]] (duration: 17m 12s)
2023-01-23 08:30:05 <stashbot> T327469: Subscribe buttons/links are displayed out of place due to new heading HTML structure - https://phabricator.wikimedia.org/T327469
2023-01-23 08:30:05 <stashbot> T327328: Highlight skips the topic container for new topics, which looks odd - https://phabricator.wikimedia.org/T327328
2023-01-23 08:30:08 <Amir1> MatmaRex: done
2023-01-23 08:30:36 <MatmaRex> thanks Amir1!
2023-01-23 08:33:51 <wikibugs> ('CR) ''Muehlenhoff: [C: ''+2] Remove openldap_corp role from ldap-corp* [puppet] - ''https://gerrit.wikimedia.org/r/882573 (https://phabricator.wikimedia.org/T323820) (owner: ''Muehlenhoff)'
2023-01-23 08:34:33 <wikibugs> ('CR) ''Zabe: [C: ''+2] Remove oversight group from privileged groups [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882217 (https://phabricator.wikimedia.org/T112147) (owner: ''Zabe)'
2023-01-23 08:35:25 <wikibugs> ('Merged) ''jenkins-bot: Remove oversight group from privileged groups [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882217 (https://phabricator.wikimedia.org/T112147) (owner: ''Zabe)'
2023-01-23 08:36:19 <logmsgbot> !log ayounsi@deploy1002 Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
2023-01-23 08:36:37 <wikibugs> ('PS1) ''Zabe: Start reading from cuc_comment_id on wikidatawiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882577 (https://phabricator.wikimedia.org/T233004)'
2023-01-23 08:36:53 <wikibugs> ('CR) ''Zabe: [C: ''+2] Start reading from cuc_comment_id on wikidatawiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882577 (https://phabricator.wikimedia.org/T233004) (owner: ''Zabe)'
2023-01-23 08:37:28 <logmsgbot> !log ayounsi@deploy1002 Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 01m 08s)
2023-01-23 08:37:37 <wikibugs> ('Merged) ''jenkins-bot: Start reading from cuc_comment_id on wikidatawiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882577 (https://phabricator.wikimedia.org/T233004) (owner: ''Zabe)'
2023-01-23 08:37:56 <logmsgbot> !log zabe@deploy1002 Started scap: Backport for [[gerrit:882217|Remove oversight group from privileged groups (T112147)]], [[gerrit:882577|Start reading from cuc_comment_id on wikidatawiki (T233004)]]
2023-01-23 08:38:01 <stashbot> T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
2023-01-23 08:38:01 <stashbot> T112147: Rename the oversight group on WMF projects to the MediaWiki standard (whatever that is) - https://phabricator.wikimedia.org/T112147
2023-01-23 08:39:37 <logmsgbot> !log zabe@deploy1002 zabe: Backport for [[gerrit:882217|Remove oversight group from privileged groups (T112147)]], [[gerrit:882577|Start reading from cuc_comment_id on wikidatawiki (T233004)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
2023-01-23 08:40:46 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43226 and previous config saved to /var/cache/conftool/dbconfig/20230123-084045-root.json
2023-01-23 08:40:56 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43227 and previous config saved to /var/cache/conftool/dbconfig/20230123-084055-root.json
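
Editorial note: the db1106 entries above step through 5%, 10%, 25%, 50%, 75% and 100% of pooled weight, logged at 07:25:21, 07:40:26, 07:55:31, 08:10:36, 08:25:41 and 08:40:46. The sketch below only reproduces that timeline as arithmetic. The function name `repool_schedule` is hypothetical, and the 905-second step is inferred from the timestamps in this log rather than taken from the tooling that produced them, so treat it as an illustration and not as a description of how the repooling script works.

```python
from datetime import datetime, timedelta

def repool_schedule(start, percentages, step):
    """Pair each pooling percentage with the time it would be applied.

    Hypothetical helper: `start`, `percentages` and `step` are supplied by
    the caller; nothing here talks to dbctl or any real service.
    """
    return [(start + i * step, pct) for i, pct in enumerate(percentages)]

if __name__ == "__main__":
    # Values read off the db1106 log lines; the step is an inferred assumption.
    first_step = datetime(2023, 1, 23, 7, 25, 21)
    for when, pct in repool_schedule(
        first_step, [5, 10, 25, 50, 75, 100], timedelta(seconds=905)
    ):
        print(f"{when:%H:%M:%S} pool at {pct}%")
```
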
2023-01-23 08:42:40 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'Add db1206 to vslow and dump group T326669', diff saved to https://phabricator.wikimedia.org/P43228 and previous config saved to /var/cache/conftool/dbconfig/20230123-084239-marostegui.json
2023-01-23 08:42:43 <stashbot> T326669: Productionize db1206-db1225 - https://phabricator.wikimedia.org/T326669
2023-01-23 08:43:27 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'Add db1206 to vslow and dump group T326669', diff saved to https://phabricator.wikimedia.org/P43229 and previous config saved to /var/cache/conftool/dbconfig/20230123-084326-marostegui.json
2023-01-23 08:45:44 <logmsgbot> !log zabe@deploy1002 Finished scap: Backport for [[gerrit:882217|Remove oversight group from privileged groups (T112147)]], [[gerrit:882577|Start reading from cuc_comment_id on wikidatawiki (T233004)]] (duration: 07m 48s)
2023-01-23 08:45:49 <stashbot> T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
2023-01-23 08:45:49 <stashbot> T112147: Rename the oversight group on WMF projects to the MediaWiki standard (whatever that is) - https://phabricator.wikimedia.org/T112147
2023-01-23 08:46:22 <logmsgbot> !log volans@cumin1001 START - Cookbook sre.dns.netbox
2023-01-23 08:47:35 <wikibugs> ('PS1) ''Marostegui: db1106: Disable notifications [puppet] - ''https://gerrit.wikimedia.org/r/882578 (https://phabricator.wikimedia.org/T327616)'
2023-01-23 08:48:28 <wikibugs> ('CR) ''Marostegui: [C: ''+2] db1106: Disable notifications [puppet] - ''https://gerrit.wikimedia.org/r/882578 (https://phabricator.wikimedia.org/T327616) (owner: ''Marostegui)'
2023-01-23 08:48:37 <logmsgbot> !log volans@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
2023-01-23 08:49:37 <logmsgbot> !log volans@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
2023-01-23 08:49:37 <logmsgbot> !log volans@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2023-01-23 08:52:01 <_joe_> taavi: do you know how I give +O to a user via ircservserv? The meta page has nothing, so checking before I read the sources
2023-01-23 08:52:41 <taavi> _joe_: not sure, but I wouldn't be surprised if there is not an option for that atm
2023-01-23 08:53:53 <_joe_> yeah https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia/irc/ircservserv/+/refs/heads/master/src/channel.rs
2023-01-23 08:54:37 <_joe_> so yeah I guess I'll just make sirenbot ask chanserv for permissions where needed instead
2023-01-23 09:07:52 <taavi> _joe_: one option would be to grant it +t via the op rule, and then use `PRIVMSG ChanServ :TOPIC foo` instead of `TOPIC :foo` directly
2023-01-23 09:15:36 <wikibugs> ('CR) ''Filippo Giunchedi: [C: ''+1] Clarify ecs.version field format in docs [software/ecs] - ''https://gerrit.wikimedia.org/r/881809 (https://phabricator.wikimedia.org/T292585) (owner: ''Cwhite)'
2023-01-23 09:16:12 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
2023-01-23 09:16:14 <wikibugs> ('CR) ''Filippo Giunchedi: [C: ''+1] logstash: enable filters for ecs 1.11.0 [puppet] - ''https://gerrit.wikimedia.org/r/881812 (https://phabricator.wikimedia.org/T326794) (owner: ''Cwhite)'
2023-01-23 09:16:14 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
2023-01-23 09:16:35 <wikibugs> ('CR) ''Filippo Giunchedi: [C: ''+1] conftool-data: add logstash[12]032 to kibana7 backend [puppet] - ''https://gerrit.wikimedia.org/r/881813 (owner: ''Cwhite)'
2023-01-23 09:17:11 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
2023-01-23 09:17:13 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
2023-01-23 09:19:59 <wikibugs> ('CR) ''Filippo Giunchedi: "The patch itself looks good, not +1'ing yet (I've left a comment in the task)" [puppet] - ''https://gerrit.wikimedia.org/r/881939 (https://phabricator.wikimedia.org/T318778) (owner: ''Andrea Denisse)'
2023-01-23 09:21:51 <wikibugs> ('PS5) ''Clément Goubert: mediawiki: Update ecs logging to 1.11.0 [deployment-charts] - ''https://gerrit.wikimedia.org/r/881877'
2023-01-23 09:21:59 <wikibugs> ('CR) ''Clément Goubert: mediawiki: Update ecs logging to 1.11.0 (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/881877 (owner: ''Clément Goubert)'
2023-01-23 09:26:15 <wikibugs> ('CR) ''Giuseppe Lavagetto: [C: ''-1] "I think there's a couple small mistakes but LGTM otherwise." [deployment-charts] - ''https://gerrit.wikimedia.org/r/881877 (owner: ''Clément Goubert)'
2023-01-23 09:29:13 <wikibugs> ('CR) ''Clément Goubert: mediawiki: Update ecs logging to 1.11.0 (''2 comments) [deployment-charts] - ''https://gerrit.wikimedia.org/r/881877 (owner: ''Clément Goubert)'
2023-01-23 09:32:00 <wikibugs> ('CR) ''Hashar: [C: ''+2] wm-checks-api: fix TypeScript noImplicitAny [software/gerrit] (deploy/wmf/stable-3.5) - ''https://gerrit.wikimedia.org/r/876212 (owner: ''Hashar)'
2023-01-23 09:32:17 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (''BTullis) a:''BTullis I will pick up this ticket, since I work with Jennifer on the Data Engineering team.'
2023-01-23 09:32:34 <wikibugs> ('PS6) ''Clément Goubert: mediawiki: Update ecs logging to 1.11.0 [deployment-charts] - ''https://gerrit.wikimedia.org/r/881877'
2023-01-23 09:32:58 <wikibugs> ('Merged) ''jenkins-bot: wm-checks-api: fix TypeScript noImplicitAny [software/gerrit] (deploy/wmf/stable-3.5) - ''https://gerrit.wikimedia.org/r/876212 (owner: ''Hashar)'
2023-01-23 09:33:27 <logmsgbot> !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
2023-01-23 09:35:28 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Data-Engineering: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (''BTullis)'
2023-01-23 09:40:00 <logmsgbot> !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
2023-01-23 09:41:13 <claime> btullis: <3
2023-01-23 09:41:55 <btullis> claime: Thanks :-)
2023-01-23 09:45:57 <wikibugs> 'SRE, ''LDAP-Access-Requests: Grant Access to wmf and ops for Jennifer Ebe - https://phabricator.wikimedia.org/T327255 (''BTullis)'
2023-01-23 09:46:33 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Data-Engineering: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (''BTullis)'
2023-01-23 09:47:19 <_joe_> jouncebot: nowandnext
2023-01-23 09:47:19 <jouncebot> No deployments scheduled for the next 1 hour(s) and 12 minute(s)
2023-01-23 09:47:19 <jouncebot> In 1 hour(s) and 12 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T1100)
2023-01-23 09:47:27 <wikibugs> 'SRE, ''LDAP-Access-Requests: Grant Access to wmf and ops for Jennifer Ebe - https://phabricator.wikimedia.org/T327255 (''BTullis) Apologies for the confusion. This is a duplicate of {T327406} where we have collected the necessary approval.'
2023-01-23 09:54:39 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
2023-01-23 09:54:41 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
2023-01-23 09:55:41 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
2023-01-23 09:55:43 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
2023-01-23 09:58:42 <Amir1> jouncebot: nowandnext
2023-01-23 09:58:42 <jouncebot> No deployments scheduled for the next 1 hour(s) and 1 minute(s)
2023-01-23 09:58:42 <jouncebot> In 1 hour(s) and 1 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T1100)
2023-01-23 09:58:50 <wikibugs> ('PS2) ''Ladsgroup: Remove Flow as default in techconductwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/877244'
2023-01-23 09:58:58 <wikibugs> ('CR) ''Ladsgroup: [C: ''+2] Remove Flow as default in techconductwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/877244 (owner: ''Ladsgroup)'
2023-01-23 09:59:14 <wikibugs> ('CR) ''TrainBranchBot: [C: ''+2] "Approved by ladsgroup@deploy1002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/877244 (owner: ''Ladsgroup)'
2023-01-23 09:59:41 <wikibugs> ('Merged) ''jenkins-bot: Remove Flow as default in techconductwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/877244 (owner: ''Ladsgroup)'
2023-01-23 09:59:56 <logmsgbot> !log ladsgroup@deploy1002 Started scap: Backport for [[gerrit:877244|Remove Flow as default in techconductwiki]]
2023-01-23 10:01:36 <logmsgbot> !log ladsgroup@deploy1002 ladsgroup: Backport for [[gerrit:877244|Remove Flow as default in techconductwiki]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
2023-01-23 10:03:01 <logmsgbot> !log btullis@cumin1001 START - Cookbook sre.hosts.reimage for host an-tool1010.eqiad.wmnet with OS bullseye
2023-01-23 10:07:48 <logmsgbot> !log ladsgroup@deploy1002 Finished scap: Backport for [[gerrit:877244|Remove Flow as default in techconductwiki]] (duration: 07m 51s)
2023-01-23 10:12:51 <wikibugs> ('PS9) ''Giuseppe Lavagetto: Start using the ClusterConfig class [mediawiki-config] - ''https://gerrit.wikimedia.org/r/756016'
2023-01-23 10:12:55 <wikibugs> 'SRE, ''Traffic, ''Traffic-Icebox, ''WMF-General-or-Unknown, and 2 others: Pages whose title ends with semicolon (;) are intermittently inaccessible (likely due to ATS) - https://phabricator.wikimedia.org/T238285 (''Vgutierrez) since this bug was reported back in 2019, our CDN stack has changed a little b...'
2023-01-23 10:13:02 <wikibugs> ('CR) ''Giuseppe Lavagetto: Start using the ClusterConfig class (''1 comment) [mediawiki-config] - ''https://gerrit.wikimedia.org/r/756016 (owner: ''Giuseppe Lavagetto)'
2023-01-23 10:16:33 <logmsgbot> !log btullis@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on an-tool1010.eqiad.wmnet with reason: host reimage
2023-01-23 10:17:21 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Data-Engineering: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (''JEbe-WMF)'
2023-01-23 10:18:58 <logmsgbot> !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-tool1010.eqiad.wmnet with reason: host reimage
2023-01-23 10:21:02 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to Analytics Data for Ollie_Shotton - https://phabricator.wikimedia.org/T327187 (''Clement_Goubert)'
2023-01-23 10:21:54 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to Analytics Data for Ollie_Shotton - https://phabricator.wikimedia.org/T327187 (''Clement_Goubert) ''Open''In progress a:''Clement_Goubert'
2023-01-23 10:23:14 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Data-Engineering: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (''JEbe-WMF)'
2023-01-23 10:28:24 <icinga-wm> PROBLEM - Check systemd state on ms-be1069 is CRITICAL: CRITICAL - degraded: The following units failed: swift_rclone_sync.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2023-01-23 10:30:54 <wikibugs> ('CR) ''Hnowlan: [C: ''+1] changeprop: add liftwing revscoring streams (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/881594 (https://phabricator.wikimedia.org/T327302) (owner: ''Elukey)'
2023-01-23 10:31:54 <elukey> hnowlan: <3
2023-01-23 10:35:04 <wikibugs> ('PS1) ''Btullis: Grant production shell access to Jennifer Ebe [puppet] - ''https://gerrit.wikimedia.org/r/882596 (https://phabricator.wikimedia.org/T327406)'
2023-01-23 10:37:08 <logmsgbot> !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-tool1010.eqiad.wmnet with OS bullseye
2023-01-23 10:37:35 <wikibugs> 'SRE, ''Traffic-Icebox, ''WMF-General-or-Unknown, ''Performance-Team (Radar): Disable caching on the main page for anonymous users - https://phabricator.wikimedia.org/T119366 (''Theklan) @Legoktm could you help me with this at euwiki? Thanks!'
2023-01-23 10:39:42 <logmsgbot> !log btullis@deploy1002 Installing scap version "4.33.1" for 1 hosts
2023-01-23 10:39:52 <logmsgbot> !log btullis@deploy1002 Installation of scap version "4.33.1" completed for 1 hosts
2023-01-23 10:40:05 <logmsgbot> !log btullis@deploy1002 Started deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided)
2023-01-23 10:40:24 <logmsgbot> !log btullis@deploy1002 Finished deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided) (duration: 00m 20s)
2023-01-23 10:40:39 <logmsgbot> !log btullis@deploy1002 Started deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided)
2023-01-23 10:40:44 <logmsgbot> !log btullis@deploy1002 Finished deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided) (duration: 00m 06s)
2023-01-23 10:46:32 <wikibugs> ('CR) ''Elukey: [C: ''+2] changeprop: add liftwing revscoring streams [deployment-charts] - ''https://gerrit.wikimedia.org/r/881594 (https://phabricator.wikimedia.org/T327302) (owner: ''Elukey)'
2023-01-23 10:48:02 <vgutierrez> !log rolling upgrade to HAProxy 2.4.20 on ulsfo
2023-01-23 10:48:03 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 10:48:32 <wikibugs> ('PS8) ''Elukey: helmfile.d: add a new test workflow for Lifting to changeprop's staging [deployment-charts] - ''https://gerrit.wikimedia.org/r/881664 (https://phabricator.wikimedia.org/T327302)'
2023-01-23 10:48:57 <wikibugs> ('CR) ''Elukey: helmfile.d: add a new test workflow for Lifting to changeprop's staging (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/881664 (https://phabricator.wikimedia.org/T327302) (owner: ''Elukey)'
2023-01-23 10:49:28 <wikibugs> ('CR) ''Jbond: [C: ''+1] "lgtm" [puppet] - ''https://gerrit.wikimedia.org/r/870901 (https://phabricator.wikimedia.org/T325397) (owner: ''JHathaway)'
2023-01-23 10:49:37 <wikibugs> 'SRE-tools, ''Infrastructure-Foundations, ''netops: Add network devices fingerprints to known_hosts - https://phabricator.wikimedia.org/T327643 (''ayounsi) p:''Triage''Low'
2023-01-23 10:49:52 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2113 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43230 and previous config saved to /var/cache/conftool/dbconfig/20230123-104951-ladsgroup.json
2023-01-23 10:50:40 <wikibugs> ('PS1) ''Gerrit maintenance bot: mariadb: Promote db2129 to s6 master [puppet] - ''https://gerrit.wikimedia.org/r/882260 (https://phabricator.wikimedia.org/T327644)'
2023-01-23 10:52:05 <wikibugs> ('PS1) ''Btullis: Enable the two new cache types in superset production [puppet] - ''https://gerrit.wikimedia.org/r/882599 (https://phabricator.wikimedia.org/T323458)'
2023-01-23 10:52:47 <jinxer-wm> (JobUnavailable) firing: (2) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2023-01-23 10:54:11 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T327644
2023-01-23 10:54:15 <stashbot> T327644: Switchover s6 master (db2114 -> db2129) - https://phabricator.wikimedia.org/T327644
2023-01-23 10:54:16 <wikibugs> ('PS2) ''Btullis: Grant production shell access to Jennifer Ebe [puppet] - ''https://gerrit.wikimedia.org/r/882596 (https://phabricator.wikimedia.org/T327406)'
2023-01-23 10:54:40 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T327644
2023-01-23 10:54:41 <wikibugs> ('CR) ''CI reject: [V: ''-1] Grant production shell access to Jennifer Ebe [puppet] - ''https://gerrit.wikimedia.org/r/882596 (https://phabricator.wikimedia.org/T327406) (owner: ''Btullis)'
2023-01-23 10:55:21 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Set db2129 with weight 0 T327644', diff saved to https://phabricator.wikimedia.org/P43231 and previous config saved to /var/cache/conftool/dbconfig/20230123-105520-ladsgroup.json
2023-01-23 10:55:37 <XioNoX> !log update management routers ACLs to add new bast hosts
2023-01-23 10:55:38 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 10:56:03 <wikibugs> ('PS2) ''Jbond: prometheus: decode utf-8 in puppet agent script [puppet] - ''https://gerrit.wikimedia.org/r/879957 (owner: ''Majavah)'
2023-01-23 10:56:20 <wikibugs> ('CR) ''Cathal Mooney: [C: ''+1] "LGTM!" [homer/public] - ''https://gerrit.wikimedia.org/r/881869 (https://phabricator.wikimedia.org/T273509) (owner: ''Muehlenhoff)'
2023-01-23 10:56:27 <wikibugs> ('PS3) ''Btullis: Grant production shell access to Jennifer Ebe [puppet] - ''https://gerrit.wikimedia.org/r/882596 (https://phabricator.wikimedia.org/T327406)'
2023-01-23 10:56:39 <wikibugs> ('CR) ''Btullis: [C: ''+2] Enable the two new cache types in superset production [puppet] - ''https://gerrit.wikimedia.org/r/882599 (https://phabricator.wikimedia.org/T323458) (owner: ''Btullis)'
2023-01-23 10:56:44 <wikibugs> ('CR) ''Cathal Mooney: [C: ''+1] "LGTM!" [homer/public] - ''https://gerrit.wikimedia.org/r/881837 (https://phabricator.wikimedia.org/T273509) (owner: ''Muehlenhoff)'
2023-01-23 10:57:55 <wikibugs> ('CR) ''Jbond: [C: ''+2] "lgtm will merge thanks" [puppet] - ''https://gerrit.wikimedia.org/r/879957 (owner: ''Majavah)'
2023-01-23 10:57:58 <wikibugs> 'SRE-tools, ''Infrastructure-Foundations, ''netops: Add network devices fingerprints to known_hosts - https://phabricator.wikimedia.org/T327643 (''Volans) This is a draft of a possible one-off script that can be run within homer's venv to gather the FQDNs to test, attempt a connection and grab the fingerpri...'
2023-01-23 11:00:05 <jouncebot> Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T1100)
2023-01-23 11:01:33 <wikibugs> ('CR) ''Btullis: [C: ''+2] Grant production shell access to Jennifer Ebe [puppet] - ''https://gerrit.wikimedia.org/r/882596 (https://phabricator.wikimedia.org/T327406) (owner: ''Btullis)'
2023-01-23 11:01:42 <wikibugs> 'SRE, ''Infrastructure-Foundations, ''serviceops-collab, ''CAS-SSO, ''GitLab (Auth & Access): migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (''jbond) fyi we now have OIDC support in production, currently been tested by @SLyngshede-WMF'
2023-01-23 11:01:54 <wikibugs> ('CR) ''Muehlenhoff: [C: ''+2] Move ping offload from ping2002 to ping2003 in codfw [homer/public] - ''https://gerrit.wikimedia.org/r/881837 (https://phabricator.wikimedia.org/T273509) (owner: ''Muehlenhoff)'
2023-01-23 11:04:57 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2113 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43232 and previous config saved to /var/cache/conftool/dbconfig/20230123-110456-ladsgroup.json
2023-01-23 11:07:45 <wikibugs> ('CR) ''Jbond: [C: ''+1] "lgtm ping me on irc (after lunch as catching up on things) and i can deploy" [puppet] - ''https://gerrit.wikimedia.org/r/875315 (https://phabricator.wikimedia.org/T326125) (owner: ''Hashar)'
2023-01-23 11:07:52 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to Analytics Data for Ollie_Shotton - https://phabricator.wikimedia.org/T327187 (''Clement_Goubert) [] Merge access grant [] Create kerberos principal'
2023-01-23 11:08:50 <wikibugs> ('CR) ''Clément Goubert: "This change is ready for review." [puppet] - ''https://gerrit.wikimedia.org/r/882600 (https://phabricator.wikimedia.org/T327187) (owner: ''Clément Goubert)'
2023-01-23 11:10:11 <jinxer-wm> (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
2023-01-23 11:11:36 <wikibugs> ('CR) ''Jbond: hieradata: add wmcs-roots to clouddumps servers (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/879274 (owner: ''Majavah)'
2023-01-23 11:11:39 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2107.codfw.wmnet with reason: Maintenance
2023-01-23 11:11:41 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2107.codfw.wmnet with reason: Maintenance
2023-01-23 11:11:48 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43233 and previous config saved to /var/cache/conftool/dbconfig/20230123-111147-ladsgroup.json
2023-01-23 11:11:51 <stashbot> T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
2023-01-23 11:12:20 <icinga-wm> PROBLEM - cassandra-a SSL 10.192.32.101:7001 on sessionstore2002 is CRITICAL: SSL CRITICAL - Certificate sessionstore2002-a valid until 2023-02-22 11:12:16 +0000 (expires in 29 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
2023-01-23 11:12:24 <icinga-wm> PROBLEM - cassandra-a SSL 10.192.16.95:7001 on sessionstore2001 is CRITICAL: SSL CRITICAL - Certificate sessionstore2001-a valid until 2023-02-22 11:12:13 +0000 (expires in 29 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
2023-01-23 11:12:40 <icinga-wm> PROBLEM - cassandra-a SSL 10.64.32.85:7001 on sessionstore1002 is CRITICAL: SSL CRITICAL - Certificate sessionstore1002-a valid until 2023-02-22 11:12:08 +0000 (expires in 29 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
2023-01-23 11:12:50 <icinga-wm> PROBLEM - cassandra-a SSL 10.64.48.178:7001 on sessionstore1003 is CRITICAL: SSL CRITICAL - Certificate sessionstore1003-a valid until 2023-02-22 11:12:10 +0000 (expires in 29 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
2023-01-23 11:13:36 <icinga-wm> PROBLEM - cassandra-a SSL 10.64.0.144:7001 on sessionstore1001 is CRITICAL: SSL CRITICAL - Certificate sessionstore1001-a valid until 2023-02-22 11:12:05 +0000 (expires in 29 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
2023-01-23 11:13:48 <icinga-wm> PROBLEM - cassandra-a SSL 10.192.48.132:7001 on sessionstore2003 is CRITICAL: SSL CRITICAL - Certificate sessionstore2003-a valid until 2023-02-22 11:12:18 +0000 (expires in 29 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
2023-01-23 11:15:12 <wikibugs> ('PS1) ''Vgutierrez: acme-chief: Restrict challenge type to valid ones [puppet] - ''https://gerrit.wikimedia.org/r/882602 (https://phabricator.wikimedia.org/T326942)'
2023-01-23 11:16:21 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to WMF Production for Kavitha Appakayala - https://phabricator.wikimedia.org/T327450 (''Clement_Goubert) ''Open''In progress a:''Clement_Goubert Hi @Kappakayala, Please read and sign the [[ https://phabricator.wikimedia.org/L3 | Acknowledgement of Wikimed...'
2023-01-23 11:16:38 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to WMF Production for Kavitha Appakayala - https://phabricator.wikimedia.org/T327450 (''Clement_Goubert)'
2023-01-23 11:16:41 <wikibugs> ('CR) ''Vgutierrez: [V: ''+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39202/console" [puppet] - ''https://gerrit.wikimedia.org/r/882602 (https://phabricator.wikimedia.org/T326942) (owner: ''Vgutierrez)'
2023-01-23 11:17:47 <Amir1> !log Starting s6 codfw failover from db2114 to db2129 - T327644
2023-01-23 11:17:50 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 11:17:51 <stashbot> T327644: Switchover s6 master (db2114 -> db2129) - https://phabricator.wikimedia.org/T327644
2023-01-23 11:18:13 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Promote db2129 to s6 primary T327644', diff saved to https://phabricator.wikimedia.org/P43234 and previous config saved to /var/cache/conftool/dbconfig/20230123-111813-ladsgroup.json
2023-01-23 11:18:30 <wikibugs> ('PS2) ''Ladsgroup: mariadb: Promote db2129 to s6 master [puppet] - ''https://gerrit.wikimedia.org/r/882260 (https://phabricator.wikimedia.org/T327644) (owner: ''Gerrit maintenance bot)'
2023-01-23 11:18:42 <wikibugs> ('CR) ''Ladsgroup: [V: ''+2 C: ''+2] mariadb: Promote db2129 to s6 master [puppet] - ''https://gerrit.wikimedia.org/r/882260 (https://phabricator.wikimedia.org/T327644) (owner: ''Gerrit maintenance bot)'
2023-01-23 11:19:14 <wikibugs> ('PS1) ''Cathal Mooney: Remove atlas-ulsfo from cr-border-in.pol as it's not live [homer/public] - ''https://gerrit.wikimedia.org/r/882605'
2023-01-23 11:19:31 <wikibugs> ('CR) ''Jbond: [C: ''+1] "lgtm" [puppet] - ''https://gerrit.wikimedia.org/r/882602 (https://phabricator.wikimedia.org/T326942) (owner: ''Vgutierrez)'
2023-01-23 11:20:02 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2113 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43235 and previous config saved to /var/cache/conftool/dbconfig/20230123-112001-ladsgroup.json
2023-01-23 11:21:35 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depool db2114 T327644', diff saved to https://phabricator.wikimedia.org/P43236 and previous config saved to /var/cache/conftool/dbconfig/20230123-112134-ladsgroup.json
2023-01-23 11:22:31 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
2023-01-23 11:22:33 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
2023-01-23 11:22:38 <wikibugs> ('PS2) ''Cathal Mooney: Remove atlas-ulsfo from cr-border-in.pol as it's not live [homer/public] - ''https://gerrit.wikimedia.org/r/882605'
2023-01-23 11:23:48 <wikibugs> ('PS3) ''Cathal Mooney: Remove atlas-ulsfo from cr-border-in.pol as it's not live [homer/public] - ''https://gerrit.wikimedia.org/r/882605'
2023-01-23 11:24:48 <wikibugs> ('CR) ''Muehlenhoff: [C: ''+1] "Looks good!" [puppet] - ''https://gerrit.wikimedia.org/r/882602 (https://phabricator.wikimedia.org/T326942) (owner: ''Vgutierrez)'
2023-01-23 11:24:59 <wikibugs> ('CR) ''Vgutierrez: [V: ''+1 C: ''+2] acme-chief: Restrict challenge type to valid ones [puppet] - ''https://gerrit.wikimedia.org/r/882602 (https://phabricator.wikimedia.org/T326942) (owner: ''Vgutierrez)'
2023-01-23 11:27:35 <wikibugs> ('PS1) ''Giuseppe Lavagetto: flink-app: use proper json [deployment-charts] - ''https://gerrit.wikimedia.org/r/882612'
2023-01-23 11:28:14 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Data-Engineering, ''Patch-For-Review: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (''BTullis) I have merged the changes to `data.yaml` so Jennifer should now have production shell access and access to the...'
2023-01-23 11:28:18 <wikibugs> ('PS1) ''Clément Goubert: admin: Grant Muhammad Jaziraly access to analytics data [puppet] - ''https://gerrit.wikimedia.org/r/882613 (https://phabricator.wikimedia.org/T327172)'
2023-01-23 11:28:55 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to Analytics Data for Muhammad Jaziraly - https://phabricator.wikimedia.org/T327172 (''Clement_Goubert) ''Open''In progress a:''Clement_Goubert'
2023-01-23 11:29:33 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to Analytics Data for Muhammad Jaziraly - https://phabricator.wikimedia.org/T327172 (''Clement_Goubert) [] OOB SSH key validation [] Merge access grant [] Create kerberos principal'
2023-01-23 11:31:37 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
2023-01-23 11:31:39 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
2023-01-23 11:32:50 <wikibugs> ('CR) ''Michael Große: [C: ''-1] "This change should only be deployed after it was greenlit by the wmde-internal stage-gate meeting (scheduled on Tuesday)" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882615 (https://phabricator.wikimedia.org/T324999) (owner: ''Michael Große)'
2023-01-23 11:33:34 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to WMF Production for Kavitha Appakayala - https://phabricator.wikimedia.org/T327450 (''Clement_Goubert) ''In progress''Resolved'
2023-01-23 11:33:59 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Data-Engineering: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (''BTullis) The work is completed. I'll work with @JEbe-WMF to verify access.'
2023-01-23 11:34:21 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Data-Engineering: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (''BTullis) ''Open''Resolved p:''Triage''Medium'
2023-01-23 11:35:07 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2113 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43239 and previous config saved to /var/cache/conftool/dbconfig/20230123-113506-ladsgroup.json
2023-01-23 11:35:36 <icinga-wm> ACKNOWLEDGEMENT - Check systemd state on ms-be1069 is CRITICAL: CRITICAL - degraded: The following units failed: swift_rclone_sync.service MVernon https://phabricator.wikimedia.org/T327253 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2023-01-23 11:35:39 <wikibugs> ('CR) ''Jbond: [C: ''+1] "lgtm" [puppet] - ''https://gerrit.wikimedia.org/r/881598 (owner: ''Muehlenhoff)'
2023-01-23 11:37:14 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to analytics-privatedata-users for Taavi - https://phabricator.wikimedia.org/T327013 (''Clement_Goubert) ''Open''In progress a:''Clement_Goubert'
2023-01-23 11:37:16 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to analytics-privatedata-users for Taavi - https://phabricator.wikimedia.org/T327013 (''Clement_Goubert) @odimitrijevic @Ottomata Can I get your approval on this please?'
2023-01-23 11:40:32 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to analytics-privatedata-users & analytics-product-users for Hxi-ctr - https://phabricator.wikimedia.org/T325004 (''Clement_Goubert) @HXi-WMF, could you please confirm that we can proceed with the account renaming?'
2023-01-23 11:45:49 <wikibugs> ('CR) ''Ayounsi: [C: ''+1] Remove atlas-ulsfo from cr-border-in.pol as it's not live [homer/public] - ''https://gerrit.wikimedia.org/r/882605 (owner: ''Cathal Mooney)'
2023-01-23 11:47:00 <wikibugs> ('CR) ''Jbond: [C: ''+1] "lgtm" [software/pywmflib] - ''https://gerrit.wikimedia.org/r/881649 (https://phabricator.wikimedia.org/T327408) (owner: ''Volans)'
2023-01-23 11:47:10 <wikibugs> ('CR) ''Cathal Mooney: [C: ''+2] Remove atlas-ulsfo from cr-border-in.pol as it's not live [homer/public] - ''https://gerrit.wikimedia.org/r/882605 (owner: ''Cathal Mooney)'
2023-01-23 11:47:48 <wikibugs> ('CR) ''Jbond: [C: ''+1] "lgtm" [software/pywmflib] - ''https://gerrit.wikimedia.org/r/881650 (owner: ''Volans)'
2023-01-23 11:48:18 <wikibugs> (Merged) jenkins-bot: Remove atlas-ulsfo from cr-border-in.pol as it's not live [homer/public] - https://gerrit.wikimedia.org/r/882605 (owner: Cathal Mooney)
2023-01-23 11:48:57 <wikibugs> (CR) Clément Goubert: admin/canary_appserver: add group of users allowed to disable puppet (1 comment) [puppet] - https://gerrit.wikimedia.org/r/879147 (https://phabricator.wikimedia.org/T305979) (owner: Dzahn)
2023-01-23 11:50:50 <wikibugs> (CR) ArielGlenn: [C: +1] "LGTM but per irc conversation WMCS folks should really give the thumbs up" [puppet] - https://gerrit.wikimedia.org/r/881386 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
2023-01-23 11:51:06 <wikibugs> (CR) ArielGlenn: [C: +1] "LGTM but per irc conversation WMCS folks should really give the thumbs up" [puppet] - https://gerrit.wikimedia.org/r/881393 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
2023-01-23 11:51:21 <wikibugs> (CR) ArielGlenn: [C: +1] "LGTM but per irc conversation WMCS folks should really give the thumbs up" [puppet] - https://gerrit.wikimedia.org/r/881399 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
2023-01-23 11:51:40 <wikibugs> (CR) ArielGlenn: [C: +1] "LGTM but per irc conversation WMCS folks should really give the thumbs up" [puppet] - https://gerrit.wikimedia.org/r/881393 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
2023-01-23 11:51:57 <wikibugs> (CR) ArielGlenn: [C: +1] "LGTM but per irc conversation WMCS folks should really give the thumbs up" [puppet] - https://gerrit.wikimedia.org/r/881413 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
2023-01-23 11:52:29 <wikibugs> (CR) Jbond: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/868703 (https://phabricator.wikimedia.org/T308013) (owner: Muehlenhoff)
2023-01-23 11:55:35 <wikibugs> SRE, LDAP-Access-Requests: Grant Access to Wmf group for MShilova - https://phabricator.wikimedia.org/T327546 (Clement_Goubert) Open→In progress a:Clement_Goubert
2023-01-23 11:56:10 <wikibugs> SRE-swift-storage: >=27k objects listed in swift containers but not extant - https://phabricator.wikimedia.org/T327253 (MatthewVernon) The timer job ran this morning, with our less picky settings, and ended thus: ` [...] Jan 23 10:24:35 ms-be1069 swift-rclone-sync[1539164]: ERROR : wikipedia-de-local-public....
2023-01-23 11:57:06 <marostegui> !log Reboot db2132 (m1 codfw master)
2023-01-23 11:57:08 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 11:57:29 <wikibugs> (CR) Jbond: Fix xihua's account (2 comments) [puppet] - https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: Alexandros Kosiaris)
2023-01-23 11:57:49 <marostegui> !log dbmaint Reboot db2132 (m1 codfw master)
2023-01-23 11:57:51 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 11:58:22 <marostegui> !log dbmaint Reboot db2133 (m2 codfw master)
2023-01-23 11:58:23 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 11:59:33 <wikibugs> (CR) Majavah: hieradata: add wmcs-roots to clouddumps servers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/879274 (owner: Majavah)
2023-01-23 12:00:13 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43241 and previous config saved to /var/cache/conftool/dbconfig/20230123-120012-ladsgroup.json
2023-01-23 12:00:16 <stashbot> T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
2023-01-23 12:03:15 <icinga-wm> PROBLEM - haproxy failover on dbproxy2001 is CRITICAL: CRITICAL check_failover servers up 2 down 1: https://wikitech.wikimedia.org/wiki/HAProxy
2023-01-23 12:03:24 <wikibugs> (CR) Jbond: openstack: encapi: create parent directories for files (2 comments) [puppet] - https://gerrit.wikimedia.org/r/881711 (owner: Majavah)
2023-01-23 12:04:15 <icinga-wm> PROBLEM - haproxy failover on dbproxy2002 is CRITICAL: CRITICAL check_failover servers up 2 down 1: https://wikitech.wikimedia.org/wiki/HAProxy
2023-01-23 12:05:08 <Emperor> !log removing /usr/local/bin/prometheus-puppet-agent-stats from prometheus crontab on snapshot1014
2023-01-23 12:05:09 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 12:06:21 <wikibugs> (CR) Clément Goubert: Fix xihua's account (2 comments) [puppet] - https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: Alexandros Kosiaris)
2023-01-23 12:06:39 <marostegui> !log dbmaint Reboot db2134 (m3 codfw master)
2023-01-23 12:06:40 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 12:06:42 <marostegui> !log dbmaint Reboot db2135 (m5 codfw master)
2023-01-23 12:06:43 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 12:07:27 <icinga-wm> RECOVERY - haproxy failover on dbproxy2002 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
2023-01-23 12:08:05 <icinga-wm> RECOVERY - haproxy failover on dbproxy2001 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
2023-01-23 12:08:43 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
2023-01-23 12:08:45 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
2023-01-23 12:10:08 <wikibugs> (CR) Clément Goubert: Fix xihua's account (1 comment) [puppet] - https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: Alexandros Kosiaris)
2023-01-23 12:10:09 <icinga-wm> PROBLEM - haproxy failover on dbproxy2004 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy
2023-01-23 12:11:18 <wikibugs> (PS2) Clément Goubert: Fix xihua's account [puppet] - https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: Alexandros Kosiaris)
2023-01-23 12:11:44 <wikibugs> (CR) Jbond: P:gitlab: manage gitlab with gitlab module (1 comment) [puppet] - https://gerrit.wikimedia.org/r/684487 (owner: Jbond)
2023-01-23 12:11:45 <icinga-wm> RECOVERY - haproxy failover on dbproxy2004 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
2023-01-23 12:11:52 <wikibugs> (Abandoned) Jbond: P:gitlab: manage gitlab with gitlab module [puppet] - https://gerrit.wikimedia.org/r/684487 (owner: Jbond)
2023-01-23 12:12:02 <wikibugs> (CR) CI reject: [V: -1] Fix xihua's account [puppet] - https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: Alexandros Kosiaris)
2023-01-23 12:15:19 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43242 and previous config saved to /var/cache/conftool/dbconfig/20230123-121519-ladsgroup.json
2023-01-23 12:22:25 <icinga-wm> PROBLEM - haproxy failover on dbproxy2003 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy
2023-01-23 12:22:31 <icinga-wm> PROBLEM - haproxy failover on dbproxy2001 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy
2023-01-23 12:22:37 <wikibugs> (PS5) Daniel Kinzler: Increase PC writes from parsoid API to 10% [mediawiki-config] - https://gerrit.wikimedia.org/r/868127 (https://phabricator.wikimedia.org/T320534)
2023-01-23 12:22:59 <icinga-wm> PROBLEM - haproxy failover on dbproxy2004 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy
2023-01-23 12:22:59 <wikibugs> SRE-swift-storage: >=27k objects listed in swift containers but not extant - https://phabricator.wikimedia.org/T327253 (MatthewVernon) Picking one of those to go log-diving (via the hacky `sudo cumin O:swift::proxy 'grep Symbol_Limes.png /var/log/swift/proxy-access.log || true'`) gets 3 hits, one of which is...
2023-01-23 12:23:33 <icinga-wm> PROBLEM - haproxy failover on dbproxy2002 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy
2023-01-23 12:24:01 <icinga-wm> RECOVERY - haproxy failover on dbproxy2003 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
2023-01-23 12:24:07 <icinga-wm> RECOVERY - haproxy failover on dbproxy2001 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
2023-01-23 12:24:07 <wikibugs> (PS1) Clément Goubert: admin: Add Mariya Shilova to ldap_only_users [puppet] - https://gerrit.wikimedia.org/r/882644 (https://phabricator.wikimedia.org/T327546)
2023-01-23 12:24:35 <icinga-wm> RECOVERY - haproxy failover on dbproxy2004 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
2023-01-23 12:25:09 <icinga-wm> RECOVERY - haproxy failover on dbproxy2002 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
2023-01-23 12:30:26 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43245 and previous config saved to /var/cache/conftool/dbconfig/20230123-123025-ladsgroup.json
2023-01-23 12:31:09 <wikibugs> SRE, Acme-chief, Traffic: Ci check for acme-chief changes - https://phabricator.wikimedia.org/T326942 (Vgutierrez) p:Triage→Low this has been mitigated by https://gerrit.wikimedia.org/r/882602, invalid challenge types will now trigger a puppet compilation failure
2023-01-23 12:36:07 <wikibugs> (CR) Jbond: [C: +1] ci: move lists of contint and zuul hosts to hieradata/common.yaml (1 comment) [puppet] - https://gerrit.wikimedia.org/r/850593 (owner: Dzahn)
2023-01-23 12:38:38 <wikibugs> SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to Analytics Data for Ollie_Shotton - https://phabricator.wikimedia.org/T327187 (Clement_Goubert) p:Triage→Medium
2023-01-23 12:38:41 <wikibugs> SRE, SRE-Access-Requests: Requesting access to WMF Production for Kavitha Appakayala - https://phabricator.wikimedia.org/T327450 (Clement_Goubert) p:Triage→Medium
2023-01-23 12:38:52 <wikibugs> SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to Analytics Data for Muhammad Jaziraly - https://phabricator.wikimedia.org/T327172 (Clement_Goubert) p:Triage→Medium
2023-01-23 12:39:35 <wikibugs> SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Taavi - https://phabricator.wikimedia.org/T327013 (Clement_Goubert) p:Triage→Medium
2023-01-23 12:41:23 <wikibugs> SRE, LDAP-Access-Requests, Patch-For-Review: Grant Access to Wmf group for MShilova - https://phabricator.wikimedia.org/T327546 (Clement_Goubert) p:Triage→Medium [] Merge CR [] Grant LDAP group access
2023-01-23 12:43:10 <wikibugs> SRE, SRE-Access-Requests: Requesting access to WMF Production for Kavitha Appakayala - https://phabricator.wikimedia.org/T327450 (Clement_Goubert) Resolved→In progress
2023-01-23 12:43:38 <wikibugs> (CR) Jbond: "thanks" [puppet] - https://gerrit.wikimedia.org/r/877964 (owner: Majavah)
2023-01-23 12:45:32 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43246 and previous config saved to /var/cache/conftool/dbconfig/20230123-124532-ladsgroup.json
2023-01-23 12:45:36 <stashbot> T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
2023-01-23 12:45:47 <wikibugs> (CR) Hnowlan: [C: +1] "lgtm, one query" [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/881909 (https://phabricator.wikimedia.org/T325811) (owner: Vlad.shapik)
2023-01-23 12:50:42 <wikibugs> (CR) Jelto: [C: +2] gitlab: stop using "latest" backup name [puppet] - https://gerrit.wikimedia.org/r/875309 (https://phabricator.wikimedia.org/T274463) (owner: Jelto)
2023-01-23 12:51:11 <wikibugs> (CR) Jbond: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/882600 (https://phabricator.wikimedia.org/T327187) (owner: Clément Goubert)
2023-01-23 12:51:53 <wikibugs> (PS2) Majavah: ldap: move ssh-key-ldap-lookup directly to ssh module [puppet] - https://gerrit.wikimedia.org/r/877964
2023-01-23 12:52:38 <wikibugs> (CR) Jbond: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/882613 (https://phabricator.wikimedia.org/T327172) (owner: Clément Goubert)
2023-01-23 12:53:00 <wikibugs> (CR) Majavah: ldap: move ssh-key-ldap-lookup directly to ssh module (1 comment) [puppet] - https://gerrit.wikimedia.org/r/877964 (owner: Majavah)
2023-01-23 12:53:17 <wikibugs> (CR) CI reject: [V: -1] ldap: move ssh-key-ldap-lookup directly to ssh module [puppet] - https://gerrit.wikimedia.org/r/877964 (owner: Majavah)
2023-01-23 12:53:27 <wikibugs> SRE, observability, Performance-Team (Radar): Set up a statsv-like endpoint for Prometheus - https://phabricator.wikimedia.org/T180105 (Clement_Goubert)
2023-01-23 12:53:29 <wikibugs> SRE: Update Media dashboard in Grafana to use Prometheus metrics - https://phabricator.wikimedia.org/T193445 (Clement_Goubert) Open→Invalid The link in the task description 404s. Being bold and closing as Invalid, feel free to reopen with up to date information if needed.
2023-01-23 12:55:46 <wikibugs> (CR) Jbond: [C: -1] "i have added this as approval in the next IF meeting (today) will update after the meeting" [puppet] - https://gerrit.wikimedia.org/r/879147 (https://phabricator.wikimedia.org/T305979) (owner: Dzahn)
2023-01-23 12:55:48 <wikibugs> (CR) Clément Goubert: [C: +2] admin: Grant ollieshotton access to analytics data [puppet] - https://gerrit.wikimedia.org/r/882600 (https://phabricator.wikimedia.org/T327187) (owner: Clément Goubert)
2023-01-23 12:56:38 <wikibugs> (PS3) Majavah: ldap: move ssh-key-ldap-lookup directly to ssh module [puppet] - https://gerrit.wikimedia.org/r/877964
2023-01-23 12:58:17 <wikibugs> (CR) Majavah: [V: +1] "PCC SUCCESS (DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39204/console" [puppet] - https://gerrit.wikimedia.org/r/877964 (owner: Majavah)
2023-01-23 12:59:47 <wikibugs> SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to Analytics Data for Ollie_Shotton - https://phabricator.wikimedia.org/T327187 (Clement_Goubert) In progress→Resolved @Ollie.Shotton_WMDE your access to the relevant groups has been granted. Please wait 30m (as of this comment) be...
2023-01-23 13:02:03 <wikibugs> (PS2) Clément Goubert: admin: Grant Muhammad Jaziraly access to analytics data [puppet] - https://gerrit.wikimedia.org/r/882613 (https://phabricator.wikimedia.org/T327172)
2023-01-23 13:03:39 <wikibugs> (CR) Jbond: Fix xihua's account (2 comments) [puppet] - https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: Alexandros Kosiaris)
2023-01-23 13:04:52 <wikibugs> (CR) Jbond: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/882644 (https://phabricator.wikimedia.org/T327546) (owner: Clément Goubert)
2023-01-23 13:04:56 <wikibugs> (PS2) Clément Goubert: admin: Add Mariya Shilova to ldap_only_users [puppet] - https://gerrit.wikimedia.org/r/882644 (https://phabricator.wikimedia.org/T327546)
2023-01-23 13:06:34 <wikibugs> (CR) Brian Wolff: Force users with passwords shorter than 8 characters to change it (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/882232 (https://phabricator.wikimedia.org/T285151) (owner: Zabe)
2023-01-23 13:06:56 <wikibugs> (CR) Clément Goubert: [C: +2] admin: Add Mariya Shilova to ldap_only_users [puppet] - https://gerrit.wikimedia.org/r/882644 (https://phabricator.wikimedia.org/T327546) (owner: Clément Goubert)
2023-01-23 13:08:58 <wikibugs> (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39207/console" [puppet] - https://gerrit.wikimedia.org/r/877964 (owner: Majavah)
2023-01-23 13:09:43 <wikibugs> (CR) Jbond: [C: +2] "LGTM will merge thanks <3" [puppet] - https://gerrit.wikimedia.org/r/877964 (owner: Majavah)
2023-01-23 13:14:55 <wikibugs> (CR) Jaime Nuche: "PCC: https://puppet-compiler.wmflabs.org/output/860837/39206/" [puppet] - https://gerrit.wikimedia.org/r/860837 (https://phabricator.wikimedia.org/T323909) (owner: Jaime Nuche)
2023-01-23 13:15:50 <wikibugs> (PS3) Clément Goubert: admin: Grant Muhammad Jaziraly access to analytics data [puppet] - https://gerrit.wikimedia.org/r/882613 (https://phabricator.wikimedia.org/T327172)
2023-01-23 13:16:17 <wikibugs> SRE, LDAP-Access-Requests, Patch-For-Review: Grant Access to Wmf group for MShilova - https://phabricator.wikimedia.org/T327546 (Clement_Goubert) In progress→Resolved @MShilova_WMF your access to the wmf group has been granted. Please wait 30m (as of this comment) before trying it out as the...
2023-01-23 13:16:24 <wikibugs> (PS4) Jbond: profile::performance: add a new profile for tweaking sysctl parameters [puppet] - https://gerrit.wikimedia.org/r/662932 (https://phabricator.wikimedia.org/T274230)
2023-01-23 13:16:43 <wikibugs> (CR) CI reject: [V: -1] profile::performance: add a new profile for tweaking sysctl parameters [puppet] - https://gerrit.wikimedia.org/r/662932 (https://phabricator.wikimedia.org/T274230) (owner: Jbond)
2023-01-23 13:17:25 <wikibugs> (CR) Jbond: profile::performance: add a new profile for tweaking sysctl parameters (2 comments) [puppet] - https://gerrit.wikimedia.org/r/662932 (https://phabricator.wikimedia.org/T274230) (owner: Jbond)
2023-01-23 13:18:36 <wikibugs> SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to Analytics Data for Ollie_Shotton - https://phabricator.wikimedia.org/T327187 (Clement_Goubert) You should have received an email regarding Kerberos, you can follow the instructions on there to set your credentials. If you didn't, please...
2023-01-23 13:19:30 <wikibugs> SRE, Traffic-Icebox, Patch-For-Review, User-MoritzMuehlenhoff: Create a generic network performance profile - https://phabricator.wikimedia.org/T274230 (jbond) @BCornwall thanks for reviving this. i think that this ultimately stalled as there was a questions of wether it would be usefull. from...
2023-01-23 13:20:17 <wikibugs> (CR) Clément Goubert: [C: +2] admin: Grant Muhammad Jaziraly access to analytics data [puppet] - https://gerrit.wikimedia.org/r/882613 (https://phabricator.wikimedia.org/T327172) (owner: Clément Goubert)
2023-01-23 13:22:29 <wikibugs> SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to Analytics Data for Muhammad Jaziraly - https://phabricator.wikimedia.org/T327172 (Clement_Goubert) In progress→Resolved @Muhammad_Yasser_Jazirahly_WMDE your access to the relevant groups has been granted. Please wait 30m (as of...
2023-01-23 13:23:08 <wikibugs> SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to Analytics Data for Muhammad Jaziraly - https://phabricator.wikimedia.org/T327172 (Muhammad_Yasser_Jazirahly_WMDE) Many thanks @Clement_Goubert
2023-01-23 13:28:23 <icinga-wm> RECOVERY - Check systemd state on grafana1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2023-01-23 13:32:46 <wikibugs> SRE, Acme-chief, Traffic: Ci check for acme-chief changes - https://phabricator.wikimedia.org/T326942 (Vgutierrez) Open→Resolved a:Vgutierrez
2023-01-23 13:40:07 <wikibugs> (CR) Ottomata: [C: +2] flink-app: use proper json [deployment-charts] - https://gerrit.wikimedia.org/r/882612 (owner: Giuseppe Lavagetto)
2023-01-23 13:45:13 <wikibugs> (Merged) jenkins-bot: flink-app: use proper json [deployment-charts] - https://gerrit.wikimedia.org/r/882612 (owner: Giuseppe Lavagetto)
2023-01-23 13:57:47 <jinxer-wm> (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2023-01-23 14:00:05 <jouncebot> RoanKattouw, Lucas_WMDE, Urbanecm, awight, and TheresNoTime: May I have your attention please! UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T1400)
2023-01-23 14:00:05 <jouncebot> sbailey, cirno, and Func: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
2023-01-23 14:00:10 <cirno> o/
2023-01-23 14:00:18 <sbailey> I am here
2023-01-23 14:00:42 <Func> here
2023-01-23 14:01:24 <Winston_Sung[m]> Hello, I would like to ask where to see the deployment status for the CX Server https://cxserver.wikimedia.org ( https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/cxserver ) for the bug fix https://gerrit.wikimedia.org/r/c/882173 ( https://phabricator.wikimedia.org/T129470 ). Thanks.
2023-01-23 14:01:46 <wikibugs> (PS1) Jbond: admin: data_tests improve error messages and correct typos [puppet] - https://gerrit.wikimedia.org/r/882648
2023-01-23 14:02:11 <wikibugs> (PS1) Ottomata: Add to admin_ng/README.md on how to deploy limiting the release [deployment-charts] - https://gerrit.wikimedia.org/r/882649
2023-01-23 14:03:30 <wikibugs> (CR) Jbond: [C: +2] admin: data_tests improve error messages and correct typos [puppet] - https://gerrit.wikimedia.org/r/882648 (owner: Jbond)
2023-01-23 14:05:12 <wikibugs> (PS3) Jbond: Fix xihua's account [puppet] - https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: Alexandros Kosiaris)
2023-01-23 14:05:50 <wikibugs> (CR) CI reject: [V: -1] Fix xihua's account [puppet] - https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: Alexandros Kosiaris)
2023-01-23 14:07:47 <jinxer-wm> (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2023-01-23 14:09:39 <wikibugs> (PS4) Jbond: Fix xihua's account [puppet] - https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: Alexandros Kosiaris)
2023-01-23 14:10:11 <wikibugs> (CR) CI reject: [V: -1] Fix xihua's account [puppet] - https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: Alexandros Kosiaris)
2023-01-23 14:10:13 <taavi> I can deploy in a few minutes
2023-01-23 14:10:30 <sbailey> 10-4
2023-01-23 14:10:35 <sbailey> ;-)
2023-01-23 14:12:27 <taavi> Winston_Sung[m]: operations/deployment-charts.git
2023-01-23 14:12:58 <wikibugs> (CR) Majavah: [C: +2] SpecialUserrights: Allow updating the expiry of user groups [core] (wmf/1.40.0-wmf.19) - https://gerrit.wikimedia.org/r/882179 (https://phabricator.wikimedia.org/T327605) (owner: Func)
2023-01-23 14:14:25 <Winston_Sung[m]> taavi: Thanks.
2023-01-23 14:14:29 <wikibugs> (CR) TrainBranchBot: [C: +2] "Approved by taavi@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/882422 (https://phabricator.wikimedia.org/T323131) (owner: Stang)
2023-01-23 14:14:33 <wikibugs> (CR) TrainBranchBot: [C: +2] "Approved by taavi@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/882425 (https://phabricator.wikimedia.org/T327380) (owner: Stang)
2023-01-23 14:14:40 <wikibugs> (CR) Elukey: [C: +2] helmfile.d: add a new test workflow for Lifting to changeprop's staging [deployment-charts] - https://gerrit.wikimedia.org/r/881664 (https://phabricator.wikimedia.org/T327302) (owner: Elukey)
2023-01-23 14:14:49 <wikibugs> (PS2) Majavah: shnwikibooks: Add project logo [mediawiki-config] - https://gerrit.wikimedia.org/r/882425 (https://phabricator.wikimedia.org/T327380) (owner: Stang)
2023-01-23 14:14:53 <wikibugs> (CR) Majavah: [C: +2] shnwikibooks: Add project logo [mediawiki-config] - https://gerrit.wikimedia.org/r/882425 (https://phabricator.wikimedia.org/T327380) (owner: Stang)
2023-01-23 14:15:18 <wikibugs> (Merged) jenkins-bot: bnwikiquote: Update logo [mediawiki-config] - https://gerrit.wikimedia.org/r/882422 (https://phabricator.wikimedia.org/T323131) (owner: Stang)
2023-01-23 14:15:39 <wikibugs> (Merged) jenkins-bot: shnwikibooks: Add project logo [mediawiki-config] - https://gerrit.wikimedia.org/r/882425 (https://phabricator.wikimedia.org/T327380) (owner: Stang)
2023-01-23 14:15:57 <taavi> sbailey: was 880989 tested in beta or smaller wikis before being rolled out globally?
2023-01-23 14:16:04 <sbailey> yes
2023-01-23 14:16:17 <logmsgbot> !log taavi@deploy1002 Started scap: Backport for [[gerrit:882422|bnwikiquote: Update logo (T323131)]], [[gerrit:882425|shnwikibooks: Add project logo (T327380)]]
2023-01-23 14:16:24 <stashbot> T323131: New localized logo for bn.wikquote - https://phabricator.wikimedia.org/T323131
2023-01-23 14:16:24 <stashbot> T327380: Change Logo on shn.wikibooks.org - https://phabricator.wikimedia.org/T327380
2023-01-23 14:17:01 <wikibugs> (CR) Alexandros Kosiaris: Fix xihua's account (2 comments) [puppet] - https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: Alexandros Kosiaris)
2023-01-23 14:17:56 <logmsgbot> !log taavi@deploy1002 taavi and stang: Backport for [[gerrit:882422|bnwikiquote: Update logo (T323131)]], [[gerrit:882425|shnwikibooks: Add project logo (T327380)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
2023-01-23 14:18:06 <taavi> cirno: please test the logo patches
2023-01-23 14:18:10 <cirno> looking
2023-01-23 14:18:55 <taavi> !log mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=zhwiki pageassessments # T326387
2023-01-23 14:18:57 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 14:18:58 <stashbot> T326387: Deploy PageAssessments to Chinese Wikipedia - https://phabricator.wikimedia.org/T326387
2023-01-23 14:19:35 <cirno> taavi, both two looks good to me
2023-01-23 14:19:42 <taavi> thanks, syncing
2023-01-23 14:19:46 <cirno> *look
2023-01-23 14:20:05 <logmsgbot> !log otto@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
2023-01-23 14:20:49 <logmsgbot> !log otto@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
2023-01-23 14:22:10 <Winston_Sung[m]> Is there a scheduled time to update cxserver, or does it depend on request?
2023-01-23 14:23:51 <wikibugs> (PS5) Alexandros Kosiaris: Fix xihua's account [puppet] - https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004)
2023-01-23 14:24:28 <wikibugs> (PS4) Majavah: zhwiki: Install PageAssessments [mediawiki-config] - https://gerrit.wikimedia.org/r/876196 (https://phabricator.wikimedia.org/T326387) (owner: Stang)
2023-01-23 14:24:30 <wikibugs> (PS6) Jbond: Fix xihua's account [puppet] - https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: Alexandros Kosiaris)
2023-01-23 14:24:32 <wikibugs> (PS1) Jbond: admin: Add check for duplicate uid's [puppet] - https://gerrit.wikimedia.org/r/882652
2023-01-23 14:24:34 <wikibugs> (CR) Majavah: [C: +2] zhwiki: Install PageAssessments [mediawiki-config] - https://gerrit.wikimedia.org/r/876196 (https://phabricator.wikimedia.org/T326387) (owner: Stang)
2023-01-23 14:24:53 <wikibugs> (CR) Muehlenhoff: [C: +2] Move ping offload from ping1002 to ping1003 in eqiad [homer/public] - https://gerrit.wikimedia.org/r/881869 (https://phabricator.wikimedia.org/T273509) (owner: Muehlenhoff)
2023-01-23 14:25:10 <wikibugs> (CR) CI reject: [V: -1] Fix xihua's account [puppet] - https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: Alexandros Kosiaris)
2023-01-23 14:25:12 <wikibugs> (CR) CI reject: [V: -1] admin: Add check for duplicate uid's [puppet] - https://gerrit.wikimedia.org/r/882652 (owner: Jbond)
2023-01-23 14:25:16 <wikibugs> (Merged) jenkins-bot: zhwiki: Install PageAssessments [mediawiki-config] - https://gerrit.wikimedia.org/r/876196 (https://phabricator.wikimedia.org/T326387) (owner: Stang)
2023-01-23 14:25:23 <logmsgbot> !log elukey@deploy1002 helmfile [staging] START helmfile.d/services/changeprop: sync
2023-01-23 14:25:34 <logmsgbot> !log elukey@deploy1002 helmfile [staging] DONE helmfile.d/services/changeprop: sync
2023-01-23 14:25:39 <logmsgbot> !log taavi@deploy1002 Finished scap: Backport for [[gerrit:882422|bnwikiquote: Update logo (T323131)]], [[gerrit:882425|shnwikibooks: Add project logo (T327380)]] (duration: 09m 22s)
2023-01-23 14:25:44 <stashbot> T323131: New localized logo for bn.wikquote - https://phabricator.wikimedia.org/T323131
2023-01-23 14:25:44 <stashbot> T327380: Change Logo on shn.wikibooks.org - https://phabricator.wikimedia.org/T327380
2023-01-23 14:25:58 <logmsgbot> !log taavi@deploy1002 Started scap: Backport for [[gerrit:876196|zhwiki: Install PageAssessments (T326387)]]
2023-01-23 14:26:01 <logmsgbot> !log otto@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
2023-01-23 14:26:01 <stashbot> T326387: Deploy PageAssessments to Chinese Wikipedia - https://phabricator.wikimedia.org/T326387
2023-01-23 14:26:09 <logmsgbot> !log otto@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
2023-01-23 14:27:29 <wikibugs> (CR) Alexandros Kosiaris: [C: -1] "Interestingly, I can not re-use the same uid (which actually makes sense) but also the fact that hpham and phamhi (both absented) have uid" [puppet] - https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: Alexandros Kosiaris)
2023-01-23 14:27:38 <logmsgbot> !log taavi@deploy1002 stang and taavi: Backport for [[gerrit:876196|zhwiki: Install PageAssessments (T326387)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
2023-01-23 14:27:50 <taavi> cirno: please test the pageassessments one
2023-01-23 14:27:58 <cirno> looking
2023-01-23 14:30:16 <wikibugs> (Merged) jenkins-bot: SpecialUserrights: Allow updating the expiry of user groups [core] (wmf/1.40.0-wmf.19) - https://gerrit.wikimedia.org/r/882179 (https://phabricator.wikimedia.org/T327605) (owner: Func)
2023-01-23 14:31:12 <cirno> taavi, the magic word "{{#assessment}}" starts working, and the special page Special:PageAssessments exists, so LGTM
2023-01-23 14:31:32 <taavi> thanks, syncing
2023-01-23 14:32:27 <cirno> taavi, could you please flush the caches of two logos? thanks
2023-01-23 14:32:49 <taavi> oh right, good point. give me a second
2023-01-23 14:33:00 <wikibugs> (CR) Hashar: [C: -1] "Looks good, there is two minor issues though:" [puppet] - https://gerrit.wikimedia.org/r/860837 (https://phabricator.wikimedia.org/T323909) (owner: Jaime Nuche)
2023-01-23 14:33:15 <cirno> oops, only bnwikiquote is needed
2023-01-23 14:34:01 <taavi> {{done}}
2023-01-23 14:36:03 <sbailey> Is 880989 getting deployed? it has been in beta for over a month
2023-01-23 14:36:43 <taavi> sbailey: yes, I'm dealing with other patches at the moment, yours is still in the queue
2023-01-23 14:36:57 <wikibugs> (PS1) Elukey: changeprop: fix uri in liftwing's template [deployment-charts] - https://gerrit.wikimedia.org/r/882654 (https://phabricator.wikimedia.org/T327302)
2023-01-23 14:37:08 <sbailey> thx, new to backport process
2023-01-23 14:37:22 <logmsgbot> !log taavi@deploy1002 Finished scap: Backport for [[gerrit:876196|zhwiki: Install PageAssessments (T326387)]] (duration: 11m 24s)
2023-01-23 14:37:26 <stashbot> T326387: Deploy PageAssessments to Chinese Wikipedia - https://phabricator.wikimedia.org/T326387
2023-01-23 14:37:36 <taavi> Func: yours is up next
2023-01-23 14:37:45 <sbailey> :)
2023-01-23 14:37:49 <Func> ok
2023-01-23 14:37:52 <logmsgbot> !log taavi@deploy1002 Started scap: Backport for [[gerrit:882179|SpecialUserrights: Allow updating the expiry of user groups (T327605)]]
2023-01-23 14:37:55 <stashbot> T327605: Special:UserRights: changing an already set permission's expiry to any new value fails - https://phabricator.wikimedia.org/T327605
2023-01-23 14:39:30 <logmsgbot> !log taavi@deploy1002 taavi and func: Backport for [[gerrit:882179|SpecialUserrights: Allow updating the expiry of user groups (T327605)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
2023-01-23 14:39:39 <taavi> sbailey: ah, sorry. we deployers generally change the order to be as quick as possible. I had to ask a question about 882179 so I couldn't start with it, and I had already +2'd F.unc's patch to save time on the core CI and it had merged in the mean time so I need to do that before I can get to yours
2023-01-23 14:39:51 <taavi> Func: can you test yours on a mwdebug server please?
2023-01-23 14:39:58 <Func> I don't have sufficient rights to test on prod, but this simple patch should just work.
2023-01-23 14:40:25 <taavi> sbailey: in the meantime: do you have the x-wikimedia-debug extension installed?
2023-01-23 14:40:41 <wikibugs> ('CR) ''Cathal Mooney: [C: ''+1] "LGTM! And TIL :)" [homer/public] - ''https://gerrit.wikimedia.org/r/877202 (https://phabricator.wikimedia.org/T325806) (owner: ''Ayounsi)'
2023-01-23 14:41:04 <sbailey> thanks for the explanation, very appreciative of your comments. Happy to watch. Have more patches to backport in the coming weeks that are trickier, such as two data migration patches
2023-01-23 14:41:21 <taavi> Func: ack. I gave it a quick test on testwiki just in case to not break stuff, works fine so deploying.
2023-01-23 14:42:16 <sbailey> the x-wikimedia-debug extension will not help me with this patch being verified. I need to use Quarry and actually look at the error log and create pages with lint errors and see them show up in reports.
2023-01-23 14:42:20 <sukhe> !log rolling out pybal 1.15.10: T321191
2023-01-23 14:42:23 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 14:42:24 <stashbot> T321191: Cleanup pybal Prometheus metrics on monitor stop() - https://phabricator.wikimedia.org/T321191
2023-01-23 14:43:00 <taavi> sbailey: hmm. how/when are the rows inserted into the databases?
2023-01-23 14:43:35 <sbailey> taavi, pretty fast, but as part of a job that is invoked by VE and standard editor
2023-01-23 14:43:54 <sbailey> Linter recordLintJob
2023-01-23 14:43:56 <taavi> ah, it's a job? yeah, it can't be tested with x-wm-d then :/
2023-01-23 14:44:05 <sbailey> I know, annoying
2023-01-23 14:44:21 <sbailey> oh well
2023-01-23 14:45:18 <sbailey> part of the reparsing code path of parsoid
2023-01-23 14:46:39 <sbailey> parsoid queues up a bunch of linter error records when it reparses a page, then through a hook the job runs usually pretty quickly
2023-01-23 14:46:41 <logmsgbot> !log taavi@deploy1002 Finished scap: Backport for [[gerrit:882179|SpecialUserrights: Allow updating the expiry of user groups (T327605)]] (duration: 08m 48s)
2023-01-23 14:46:45 <stashbot> T327605: Special:UserRights: changing an already set permission's expiry to any new value fails - https://phabricator.wikimedia.org/T327605
2023-01-23 14:47:05 <taavi> in that case in the future please split the changes to multiple patches (for example group0 first, then group1 and finally all wikis) since that creates a much smaller blast radius if something goes wrong. I can do it this way this time, but for the future that's much easier to deploy
2023-01-23 14:47:16 <wikibugs> ('PS2) ''Jbond: admin: Add check for duplicate uid's [puppet] - ''https://gerrit.wikimedia.org/r/882652'
2023-01-23 14:47:19 <wikibugs> ('PS5) ''Majavah: Enable Linter write namespace tag and template using core config [mediawiki-config] - ''https://gerrit.wikimedia.org/r/880989 (https://phabricator.wikimedia.org/T299612) (owner: ''Sbailey)'
2023-01-23 14:47:22 <wikibugs> ('CR) ''TrainBranchBot: [C: ''+2] "Approved by taavi@deploy1002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/880989 (https://phabricator.wikimedia.org/T299612) (owner: ''Sbailey)'
2023-01-23 14:47:40 <wikibugs> ('PS7) ''Jbond: Fix xihua's account [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
2023-01-23 14:47:49 <sbailey> this is a very safe change, if it were more dangerous I would have done more stages
2023-01-23 14:48:20 <wikibugs> ('CR) ''CI reject: [V: ''-1] Fix xihua's account [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
2023-01-23 14:48:56 <wikibugs> ('CR) ''Jbond: "ready for review" [puppet] - ''https://gerrit.wikimedia.org/r/882652 (owner: ''Jbond)'
2023-01-23 14:49:55 <wikibugs> ('CR) ''Vgutierrez: [C: ''+1] Release 9.1.4-1wm1 [debs/trafficserver] - ''https://gerrit.wikimedia.org/r/869282 (https://phabricator.wikimedia.org/T325563) (owner: ''Ssingh)'
2023-01-23 14:50:59 <wikibugs> ('CR) ''Majavah: [C: ''+2] Enable Linter write namespace tag and template using core config [mediawiki-config] - ''https://gerrit.wikimedia.org/r/880989 (https://phabricator.wikimedia.org/T299612) (owner: ''Sbailey)'
2023-01-23 14:51:09 <wikibugs> ('CR) ''Ssingh: [C: ''+2] Release 9.1.4-1wm1 [debs/trafficserver] - ''https://gerrit.wikimedia.org/r/869282 (https://phabricator.wikimedia.org/T325563) (owner: ''Ssingh)'
2023-01-23 14:51:27 <icinga-wm> PROBLEM - Check systemd state on stat1005 is CRITICAL: CRITICAL - degraded: The following units failed: jupyter-mlitn-singleuser-conda-analytics.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2023-01-23 14:51:56 <wikibugs> ('Merged) ''jenkins-bot: Enable Linter write namespace tag and template using core config [mediawiki-config] - ''https://gerrit.wikimedia.org/r/880989 (https://phabricator.wikimedia.org/T299612) (owner: ''Sbailey)'
2023-01-23 14:52:09 <sbailey> :-)
2023-01-23 14:52:11 <logmsgbot> !log taavi@deploy1002 Started scap: Backport for [[gerrit:880989|Enable Linter write namespace tag and template using core config (T299612)]]
2023-01-23 14:52:15 <stashbot> T299612: Add namespace column and index to table - https://phabricator.wikimedia.org/T299612
2023-01-23 14:52:32 <sbailey> Testing
2023-01-23 14:53:04 <taavi> testing what exactly? the patch is still not deployed anywhere
2023-01-23 14:53:19 <sbailey> ?
2023-01-23 14:53:32 <sbailey> Ah sync
2023-01-23 14:53:44 <taavi> yeah, it takes a while these days
2023-01-23 14:53:44 <wikibugs> ('CR) ''Elukey: [C: ''+2] changeprop: fix uri in liftwing's template [deployment-charts] - ''https://gerrit.wikimedia.org/r/882654 (https://phabricator.wikimedia.org/T327302) (owner: ''Elukey)'
2023-01-23 14:53:47 <logmsgbot> !log taavi@deploy1002 taavi and sbailey: Backport for [[gerrit:880989|Enable Linter write namespace tag and template using core config (T299612)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
2023-01-23 14:55:07 <wikibugs> ('PS1) ''Hashar: puppet_compiler: serve pson.gz as application/json [puppet] - ''https://gerrit.wikimedia.org/r/882656'
2023-01-23 14:56:09 <wikibugs> ('CR) ''Hashar: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/882656 (owner: ''Hashar)'
2023-01-23 14:56:35 <wikibugs> ('CR) ''Vgutierrez: [C: ''+1] "looking good, please fix the mentioned typo on the changelog" [debs/varnish4] (debian-wmf) - ''https://gerrit.wikimedia.org/r/878049 (https://phabricator.wikimedia.org/T326634) (owner: ''Ssingh)'
2023-01-23 14:56:59 <wikibugs> 'SRE, ''ops-esams, ''DC-Ops, ''Infrastructure-Foundations, ''decommission-hardware: decommission atlas-esams - https://phabricator.wikimedia.org/T307026 (''Volans) I've set the device back to active to reflect its current status and prevent some warnings to show up in the `sre.dns.netbox` cookbook runs.'
2023-01-23 14:57:32 <Winston_Sung[m]> So, is there any scheduled time to update the CX Server, or is it required to file a request somewhere?
2023-01-23 14:57:55 <wikibugs> ('PS2) ''Ssingh: Release 6.0.11-1wm1 [debs/varnish4] (debian-wmf) - ''https://gerrit.wikimedia.org/r/878049 (https://phabricator.wikimedia.org/T326634)'
2023-01-23 14:58:07 <wikibugs> ('CR) ''Ssingh: Release 6.0.11-1wm1 (''1 comment) [debs/varnish4] (debian-wmf) - ''https://gerrit.wikimedia.org/r/878049 (https://phabricator.wikimedia.org/T326634) (owner: ''Ssingh)'
2023-01-23 14:58:15 <taavi> Winston_Sung[m]: if there was a scheduled time it would be listed on https://wikitech.wikimedia.org/wiki/Deployments, and if there is not you need to ask the cxserver maintainers somewhere else
2023-01-23 14:58:37 <wikibugs> ('PS8) ''Jbond: Fix xihua's account [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
2023-01-23 14:58:49 <wikibugs> ('CR) ''Ladsgroup: "I'm too late for this now but for future cases, please enable it on a set of test wikis and then make sure it doesn't break anything and t" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/880989 (https://phabricator.wikimedia.org/T299612) (owner: ''Sbailey)'
2023-01-23 14:58:57 <Winston_Sung[m]> Ok, thanks for the response.
2023-01-23 14:59:27 <logmsgbot> !log elukey@deploy1002 helmfile [staging] START helmfile.d/services/changeprop: sync
2023-01-23 14:59:34 <icinga-wm> PROBLEM - MariaDB Replica SQL: s2 #page on db1105 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1054, Errmsg: Error Unknown column linter_template in field list on query. Default database: nlwiki. [Query snipped] https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
2023-01-23 14:59:38 <logmsgbot> !log elukey@deploy1002 helmfile [staging] DONE helmfile.d/services/changeprop: sync
2023-01-23 14:59:40 <marostegui> checking
2023-01-23 14:59:43 <marostegui> Amir1: ^
2023-01-23 14:59:50 <Amir1> not me
2023-01-23 14:59:54 <marostegui> let me depool
2023-01-23 14:59:59 <taavi> sigh, that looks very related to the current deployment
2023-01-23 15:00:02 <taavi> should I revert?
2023-01-23 15:00:08 <logmsgbot> !log taavi@deploy1002 Finished scap: Backport for [[gerrit:880989|Enable Linter write namespace tag and template using core config (T299612)]] (duration: 07m 56s)
2023-01-23 15:00:10 <Amir1> taavi: very likely
2023-01-23 15:00:11 <stashbot> T299612: Add namespace column and index to table - https://phabricator.wikimedia.org/T299612
2023-01-23 15:00:13 <Amir1> please revert
2023-01-23 15:00:14 <icinga-wm> PROBLEM - MariaDB Replica SQL: s7 #page on db1170 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1054, Errmsg: Error Unknown column linter_template in field list on query. Default database: metawiki. [Query snipped] https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
2023-01-23 15:00:16 <taavi> sure, doing
2023-01-23 15:00:19 <taavi> sorry :/
2023-01-23 15:00:19 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P43247 and previous config saved to /var/cache/conftool/dbconfig/20230123-150018-marostegui.json
2023-01-23 15:00:20 <sbailey> Amir1, the write code was running on Beta since mid december 880989
2023-01-23 15:00:21 <marostegui> taavi: revert
2023-01-23 15:00:21 <_joe_> taavi: revert, yes
2023-01-23 15:00:27 <wikibugs> ('PS1) ''TrainBranchBot: Revert "Enable Linter write namespace tag and template using core config" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882661'
2023-01-23 15:00:29 <wikibugs> ('CR) ''TrainBranchBot: "taavi@deploy1002 created a revert of this change as I76ef30bfd05fe069b2715e1933e8b81723149187" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/880989 (https://phabricator.wikimedia.org/T299612) (owner: ''Sbailey)'
2023-01-23 15:00:29 <taavi> doing
2023-01-23 15:00:33 <Amir1> sbailey: beta and production dbs are different
2023-01-23 15:00:35 <marostegui> maybe those hosts didn't get the column?
2023-01-23 15:00:39 <Amir1> beta works with update.php
2023-01-23 15:00:43 <wikibugs> ('CR) ''TrainBranchBot: [C: ''+2] "Approved by taavi@deploy1002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882661 (owner: ''TrainBranchBot)'
2023-01-23 15:00:44 <bblack> hey
2023-01-23 15:00:46 <Amir1> marostegui: yeah, that's my guess
2023-01-23 15:00:52 <_joe_> let's wait to talk about what went wrong until things are stable
2023-01-23 15:00:54 <marostegui> I can add it quickly
2023-01-23 15:00:55 <wikibugs> ('PS2) ''Ottomata: Add to admin_ng/README.md on how to deploy limiting the release [deployment-charts] - ''https://gerrit.wikimedia.org/r/882649'
2023-01-23 15:00:57 <wikibugs> ('PS1) ''Ottomata: flink-app - explicitly set Flink ports and configure ingress netpol for them [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576)'
2023-01-23 15:01:11 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1170:3317', diff saved to https://phabricator.wikimedia.org/P43248 and previous config saved to /var/cache/conftool/dbconfig/20230123-150110-marostegui.json
2023-01-23 15:01:18 <bblack> sounds like it's handled!
2023-01-23 15:01:18 <marostegui> both hosts are now depooled
2023-01-23 15:01:27 <_joe_> bblack: it's ongoing
2023-01-23 15:01:31 <taavi> sorry about this
2023-01-23 15:01:50 <_joe_> taavi: are you taking care of the rollback?
2023-01-23 15:01:55 <wikibugs> ('Merged) ''jenkins-bot: Revert "Enable Linter write namespace tag and template using core config" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882661 (owner: ''TrainBranchBot)'
2023-01-23 15:01:57 <taavi> yes, I am rolling the mediawiki changes back
2023-01-23 15:02:10 <logmsgbot> !log taavi@deploy1002 Started scap: Backport for [[gerrit:882661|Revert "Enable Linter write namespace tag and template using core config"]]
2023-01-23 15:02:15 <Amir1> yup, it's the linter error
2023-01-23 15:02:15 <brett> thanks for confirming
2023-01-23 15:02:16 <_joe_> ok, thanks
2023-01-23 15:02:23 <Amir1> Last_Error: Error 'Unknown column 'linter_template' in 'field list'' on query. Default database: 'metawiki'. Query: 'INSERT /* MediaWiki\Linter\Database::setForPage */ IGNORE INTO `>
2023-01-23 15:02:29 <marostegui> yeah the column isn't present
2023-01-23 15:02:33 <bblack> new column didn't exist in prod dbs yet?
2023-01-23 15:02:33 <marostegui> I am going to add them on db1105 and db1170
2023-01-23 15:02:35 <Amir1> gradual rollout people, please
2023-01-23 15:02:36 <bblack> ok
2023-01-23 15:02:45 <marostegui> the hosts are not serving traffic now
2023-01-23 15:02:48 <marostegui> so we should be good
2023-01-23 15:02:53 <Amir1> yeah
2023-01-23 15:02:57 <wikibugs> ('CR) ''Muehlenhoff: [C: ''+1] "Looks good, one comment inline." [puppet] - ''https://gerrit.wikimedia.org/r/882652 (owner: ''Jbond)'
2023-01-23 15:03:02 <marostegui> I will add it and let you know taavi
2023-01-23 15:03:02 <wikibugs> ('CR) ''Jbond: admin/canary_appserver: add group of users allowed to disable puppet (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/879147 (https://phabricator.wikimedia.org/T305979) (owner: ''Dzahn)'
2023-01-23 15:03:29 <taavi> marostegui: I'll revert it anyways, it can be re-enabled at some later window
2023-01-23 15:03:34 <marostegui> taavi: sounds good
2023-01-23 15:03:48 <logmsgbot> !log taavi@deploy1002 taavi and trainbranchbot: Backport for [[gerrit:882661|Revert "Enable Linter write namespace tag and template using core config"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
2023-01-23 15:04:04 <_joe_> taavi: I'd release everywhere tbh
2023-01-23 15:04:08 <wikibugs> ('PS2) ''Hashar: puppet_compiler: serve pson.gz as application/json [puppet] - ''https://gerrit.wikimedia.org/r/882656'
2023-01-23 15:04:22 <wikibugs> ('CR) ''Hashar: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/882656 (owner: ''Hashar)'
2023-01-23 15:06:14 <Amir1> taavi: now that it's reverted, please do gradual roll out, first testwikis, then one section, etc.
2023-01-23 15:06:37 <taavi> sbailey: ^
2023-01-23 15:07:08 <_joe_> can we claim the incident is over?
2023-01-23 15:07:16 <sbailey> Ok, how do I verify all databases have had the 3 columns added?
2023-01-23 15:07:51 <sbailey> Yes will figure out how to do more gradual roll out
2023-01-23 15:07:51 <Amir1> it's not possible manually, you can do a drift report
2023-01-23 15:08:09 <Amir1> https://drift-tracker.toolforge.org/report/core/
2023-01-23 15:08:16 <marostegui> db1105:3312 is now fixed
2023-01-23 15:08:19 <Amir1> https://drift-tracker.toolforge.org/report/flaggedrevs/
2023-01-23 15:08:20 <marostegui> I am fixing db1170:3317
2023-01-23 15:09:12 <icinga-wm> RECOVERY - MariaDB Replica SQL: s2 #page on db1105 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
2023-01-23 15:09:39 <logmsgbot> !log taavi@deploy1002 Finished scap: Backport for [[gerrit:882661|Revert "Enable Linter write namespace tag and template using core config"]] (duration: 07m 28s)
2023-01-23 15:09:44 <taavi> revert was finally synced
2023-01-23 15:10:11 <jinxer-wm> (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
2023-01-23 15:11:18 <wikibugs> ('CR) ''Hashar: "The compile has been triggered for `pcc-worker1001.puppet-diffs.eqiad1.wikimedia.cloud` which is a noop https://puppet-compiler.wmflabs.or" [puppet] - ''https://gerrit.wikimedia.org/r/882656 (owner: ''Hashar)'
2023-01-23 15:11:40 <wikibugs> ('PS3) ''Hashar: puppet_compiler: serve pson.gz as application/json [puppet] - ''https://gerrit.wikimedia.org/r/882656'
2023-01-23 15:11:53 <wikibugs> ('CR) ''Hashar: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/882656 (owner: ''Hashar)'
2023-01-23 15:13:02 <icinga-wm> PROBLEM - MariaDB Replica Lag: s7 #page on db1170 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 892.41 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
2023-01-23 15:13:31 <wikibugs> ('CR) ''Hashar: "PCC https://puppet-compiler.wmflabs.org/output/882656/1584/ and the diff is https://puppet-compiler.wmflabs.org/output/882656/1584/pcc-db1" [puppet] - ''https://gerrit.wikimedia.org/r/882656 (owner: ''Hashar)'
2023-01-23 15:14:02 <brett> _joe_: Is there an incident doc? Is it necessary to create one for this?
2023-01-23 15:14:39 <marostegui> brett: probably no need to
2023-01-23 15:14:48 <sbailey> Was it just two machines that didn't have the columns?
2023-01-23 15:14:55 <marostegui> looks so for now yes
2023-01-23 15:15:02 <Amir1> I'm running linter drift report to see generally what could be wrong
2023-01-23 15:15:11 <marostegui> db1170 should be fixed now
2023-01-23 15:15:29 <sbailey> Can we deploy this if it was just 2 machines?
2023-01-23 15:15:35 <wikibugs> ('PS1) ''BBlack: Possibly mitigate ATS bug with semicolon in Path [puppet] - ''https://gerrit.wikimedia.org/r/882663 (https://phabricator.wikimedia.org/T238285)'
2023-01-23 15:15:41 <marostegui> sbailey: no, let's make sure it was just those two
2023-01-23 15:15:41 <Amir1> it'll take a bit of time
2023-01-23 15:15:47 <sbailey> ok
2023-01-23 15:16:12 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43250 and previous config saved to /var/cache/conftool/dbconfig/20230123-151611-root.json
2023-01-23 15:16:16 <icinga-wm> RECOVERY - MariaDB Replica Lag: s7 #page on db1170 is OK: OK slave_sql_lag Replication lag: 0.21 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
2023-01-23 15:16:22 <icinga-wm> RECOVERY - MariaDB Replica SQL: s7 #page on db1170 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
2023-01-23 15:16:40 <wikibugs> ('PS2) ''BBlack: Possibly mitigate ATS bug with semicolon in Path [puppet] - ''https://gerrit.wikimedia.org/r/882663 (https://phabricator.wikimedia.org/T238285)'
2023-01-23 15:16:42 <marostegui> I am now repooling both hosts
2023-01-23 15:16:43 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43251 and previous config saved to /var/cache/conftool/dbconfig/20230123-151642-root.json
2023-01-23 15:17:20 <sukhe> !log reprepro -C main include bullseye-wikimedia trafficserver_9.1.4-1wm1_amd64.changes: T325563
2023-01-23 15:17:23 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 15:17:24 <stashbot> T325563: Package and deploy ATS 9.1.4 - https://phabricator.wikimedia.org/T325563
2023-01-23 15:17:42 <wikibugs> ('PS1) ''Bking: wdqs: mount NFS to new hosts [puppet] - ''https://gerrit.wikimedia.org/r/882664 (https://phabricator.wikimedia.org/T323096)'
2023-01-23 15:19:39 <Amir1> sbailey: FWIW, I'm seeing drift on linter_params in every wiki:
2023-01-23 15:19:44 <Amir1> https://www.irccloud.com/pastebin/IsQT61W4/
2023-01-23 15:20:17 <Amir1> this means the field is nullable in code but not in production, or the other way around
2023-01-23 15:21:11 <sbailey> Ah, hmm. How can beta be ok but others not?
2023-01-23 15:21:16 <wikibugs> ('CR) ''CI reject: [V: ''-1] Release 6.0.11-1wm1 [debs/varnish4] (debian-wmf) - ''https://gerrit.wikimedia.org/r/878049 (https://phabricator.wikimedia.org/T326634) (owner: ''Ssingh)'
2023-01-23 15:21:20 <wikibugs> ('CR) ''JMeybohm: "Could you be more explicit and allow access to those ports by only the spawned job pods?" [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576) (owner: ''Ottomata)'
2023-01-23 15:22:02 <sbailey> Amir1, can we chat on slack offline so I can fix/understand how this might happen?
2023-01-23 15:22:34 <Amir1> sure
2023-01-23 15:23:16 <wikibugs> ('CR) ''Muehlenhoff: [C: ''+1] "LGTM" [puppet] - ''https://gerrit.wikimedia.org/r/879147 (https://phabricator.wikimedia.org/T305979) (owner: ''Dzahn)'
2023-01-23 15:26:20 <wikibugs> ('CR) ''Ssingh: "Updated typo, ignoring the build failure as expected." [debs/varnish4] (debian-wmf) - ''https://gerrit.wikimedia.org/r/878049 (https://phabricator.wikimedia.org/T326634) (owner: ''Ssingh)'
2023-01-23 15:28:31 <wikibugs> ('CR) ''DCausse: "seems like wdqs1010 is missing from ferm" [puppet] - ''https://gerrit.wikimedia.org/r/882664 (https://phabricator.wikimedia.org/T323096) (owner: ''Bking)'
2023-01-23 15:31:00 <wikibugs> ('PS2) ''Bking: wdqs: mount NFS to new hosts [puppet] - ''https://gerrit.wikimedia.org/r/882664 (https://phabricator.wikimedia.org/T323096)'
2023-01-23 15:31:17 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43252 and previous config saved to /var/cache/conftool/dbconfig/20230123-153116-root.json
2023-01-23 15:31:48 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43253 and previous config saved to /var/cache/conftool/dbconfig/20230123-153147-root.json
2023-01-23 15:32:00 <wikibugs> ('CR) ''Bking: wdqs: mount NFS to new hosts (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/882664 (https://phabricator.wikimedia.org/T323096) (owner: ''Bking)'
2023-01-23 15:32:53 <wikibugs> ('CR) ''Ssingh: [V: ''+2 C: ''+2] Release 6.0.11-1wm1 [debs/varnish4] (debian-wmf) - ''https://gerrit.wikimedia.org/r/878049 (https://phabricator.wikimedia.org/T326634) (owner: ''Ssingh)'
2023-01-23 15:34:42 <wikibugs> ('CR) ''DCausse: [C: ''+1] wdqs: mount NFS to new hosts [puppet] - ''https://gerrit.wikimedia.org/r/882664 (https://phabricator.wikimedia.org/T323096) (owner: ''Bking)'
2023-01-23 15:35:54 <wikibugs> ('CR) ''Bking: wdqs: mount NFS to new hosts (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/882664 (https://phabricator.wikimedia.org/T323096) (owner: ''Bking)'
2023-01-23 15:37:30 <wikibugs> ('PS1) ''Vgutierrez: Stop parsing semi-colon as a URL path delimiter [debs/trafficserver] - ''https://gerrit.wikimedia.org/r/882667'
2023-01-23 15:37:34 <wikibugs> ('PS1) ''Elukey: changeprop: fix liftwing's body settings [deployment-charts] - ''https://gerrit.wikimedia.org/r/882668 (https://phabricator.wikimedia.org/T327302)'
2023-01-23 15:40:29 <urbanecm> marostegui: is it ok if i ship a sec patch now? or should i wait a bit for the DB fixes to be finished?
2023-01-23 15:40:57 <wikibugs> ('CR) ''Hnowlan: [C: ''+1] "LGTM based on the example configs used by changeprop!" [deployment-charts] - ''https://gerrit.wikimedia.org/r/882668 (https://phabricator.wikimedia.org/T327302) (owner: ''Elukey)'
2023-01-23 15:41:16 <marostegui> urbanecm: it should be fine
2023-01-23 15:41:24 <wikibugs> ('CR) ''Bking: [C: ''+2] wdqs: mount NFS to new hosts [puppet] - ''https://gerrit.wikimedia.org/r/882664 (https://phabricator.wikimedia.org/T323096) (owner: ''Bking)'
2023-01-23 15:41:26 <urbanecm> thank you, proceeding.
2023-01-23 15:44:25 <wikibugs> ('CR) ''Ottomata: [C: ''+2] Add to admin_ng/README.md on how to deploy limiting the release [deployment-charts] - ''https://gerrit.wikimedia.org/r/882649 (owner: ''Ottomata)'
2023-01-23 15:44:36 <papaul> !log ongoing maintenance on fasw-codfw
2023-01-23 15:44:37 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 15:46:01 <wikibugs> ('CR) ''Elukey: [C: ''+2] changeprop: fix liftwing's body settings [deployment-charts] - ''https://gerrit.wikimedia.org/r/882668 (https://phabricator.wikimedia.org/T327302) (owner: ''Elukey)'
2023-01-23 15:46:22 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43254 and previous config saved to /var/cache/conftool/dbconfig/20230123-154621-root.json
2023-01-23 15:46:53 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43255 and previous config saved to /var/cache/conftool/dbconfig/20230123-154652-root.json
2023-01-23 15:48:38 <logmsgbot> !log elukey@deploy1002 helmfile [staging] START helmfile.d/services/changeprop: sync
2023-01-23 15:48:49 <logmsgbot> !log elukey@deploy1002 helmfile [staging] DONE helmfile.d/services/changeprop: sync
2023-01-23 15:49:23 <wikibugs> ('Merged) ''jenkins-bot: Add to admin_ng/README.md on how to deploy limiting the release [deployment-charts] - ''https://gerrit.wikimedia.org/r/882649 (owner: ''Ottomata)'
2023-01-23 15:50:30 <wikibugs> ('PS2) ''Ottomata: flink-app - explicitly set Flink ports and configure ingress netpol for them [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576)'
2023-01-23 15:50:32 <urbanecm> !log Deploy security patch for T327613
2023-01-23 15:50:34 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 15:51:09 <icinga-wm> PROBLEM - BGP status on pfw3-codfw is CRITICAL: BGP CRITICAL - AS64600/IPv4: Idle - PyBal, AS64600/IPv4: Idle - PyBal https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
2023-01-23 15:51:25 <wikibugs> ('CR) ''CI reject: [V: ''-1] flink-app - explicitly set Flink ports and configure ingress netpol for them [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576) (owner: ''Ottomata)'
2023-01-23 15:51:27 <wikibugs> ('PS3) ''Ottomata: flink-app - explicitly set Flink ports and configure ingress netpol for them [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576)'
2023-01-23 15:51:33 <icinga-wm> PROBLEM - Router interfaces on pfw3-codfw is CRITICAL: CRITICAL: host 208.80.153.197, interfaces up: 34, down: 10, dormant: 0, excluded: 3, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
2023-01-23 15:52:11 <wikibugs> ('CR) ''CI reject: [V: ''-1] flink-app - explicitly set Flink ports and configure ingress netpol for them [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576) (owner: ''Ottomata)'
2023-01-23 15:52:30 <wikibugs> ('PS4) ''Ottomata: flink-app - explicitly set Flink ports and configure ingress netpol for them [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576)'
2023-01-23 15:53:51 <sukhe> !log reprepro -C main include bullseye-wikimedia varnish_6.0.11-1wm1_amd64.changes: T326634
2023-01-23 15:53:53 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 15:53:54 <stashbot> T326634: Package and deploy varnish 6.0.11 - https://phabricator.wikimedia.org/T326634
2023-01-23 15:54:11 <wikibugs> 'SRE, ''Traffic, ''Patch-For-Review: Package and deploy varnish 6.0.11 - https://phabricator.wikimedia.org/T326634 (''ssingh)'
2023-01-23 15:59:15 <icinga-wm> RECOVERY - BGP status on pfw3-codfw is OK: BGP OK - up: 5, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
2023-01-23 15:59:34 <urbanecm> the secpatch's deployment is done
2023-01-23 15:59:38 <wikibugs> ('CR) ''Ottomata: flink-app - explicitly set Flink ports and configure ingress netpol for them (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576) (owner: ''Ottomata)'
2023-01-23 15:59:43 <icinga-wm> RECOVERY - Router interfaces on pfw3-codfw is OK: OK: host 208.80.153.197, interfaces up: 58, down: 0, dormant: 0, excluded: 3, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
2023-01-23 16:01:28 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43256 and previous config saved to /var/cache/conftool/dbconfig/20230123-160126-root.json
2023-01-23 16:01:58 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43257 and previous config saved to /var/cache/conftool/dbconfig/20230123-160157-root.json
2023-01-23 16:04:20 <wikibugs> ('CR) ''Ottomata: flink-app - explicitly set Flink ports and configure ingress netpol for them (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576) (owner: ''Ottomata)'
2023-01-23 16:08:00 <wikibugs> 'SRE, ''Traffic, ''Traffic-Icebox, ''WMF-General-or-Unknown, and 3 others: Pages whose title ends with semicolon (;) are intermittently inaccessible (likely due to ATS) - https://phabricator.wikimedia.org/T238285 (''Pigsonthewing) T261624 was merged here; in that ticket I asked: > On testing, I can see t...'
2023-01-23 16:11:45 <wikibugs> ('CR) ''Jbond: "@alex, I think I'll take over this CR unless there are objections" [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
2023-01-23 16:12:16 <wikibugs> ('CR) ''Jbond: Fix xihua's account (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
2023-01-23 16:15:28 <wikibugs> ('PS5) ''Ottomata: flink-app - explicitly set Flink ports and configure ingress netpol for them [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576)'
2023-01-23 16:16:00 <wikibugs> 'SRE, ''ops-esams, ''DC-Ops: ripe-atlas-esams down - https://phabricator.wikimedia.org/T303242 (''RobH)'
2023-01-23 16:16:02 <wikibugs> 'SRE, ''ops-esams, ''DC-Ops, ''Infrastructure-Foundations, ''decommission-hardware: decommission atlas-esams - https://phabricator.wikimedia.org/T307026 (''RobH) ''Open→''Declined device resurrected itself, decom task declined as it's now reporting into ripe portal'
2023-01-23 16:16:33 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43258 and previous config saved to /var/cache/conftool/dbconfig/20230123-161633-root.json
2023-01-23 16:16:39 <wikibugs> ('CR) ''Ottomata: flink-app - explicitly set Flink ports and configure ingress netpol for them (''2 comments) [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576) (owner: ''Ottomata)'
2023-01-23 16:17:03 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43259 and previous config saved to /var/cache/conftool/dbconfig/20230123-161702-root.json
2023-01-23 16:21:39 <wikibugs> (PS1) Ottomata: flink - avoid adding an extra 'k8s_api_enabled' label by using component label instead [deployment-charts] - https://gerrit.wikimedia.org/r/882680 (https://phabricator.wikimedia.org/T324576)
2023-01-23 16:24:52 <wikibugs> (PS1) Stang: newiki: Add new permissions to group reviewer [mediawiki-config] - https://gerrit.wikimedia.org/r/882681 (https://phabricator.wikimedia.org/T327114)
2023-01-23 16:25:22 <wikibugs> SRE, Infrastructure-Foundations, fundraising-tech-ops, netops: Upgrade fasw to Junos 21 - https://phabricator.wikimedia.org/T316542 (Papaul)
2023-01-23 16:25:30 <wikibugs> (CR) BCornwall: [V: +1 C: +2] tlsproxy: Remove nginx_tune_for_media [puppet] - https://gerrit.wikimedia.org/r/881902 (https://phabricator.wikimedia.org/T228730) (owner: BCornwall)
2023-01-23 16:25:36 <wikibugs> (PS2) BCornwall: tlsproxy: Remove nginx_tune_for_media [puppet] - https://gerrit.wikimedia.org/r/881902 (https://phabricator.wikimedia.org/T228730)
2023-01-23 16:26:19 <wikibugs> (PS1) Jdrewniak: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/882682 (https://phabricator.wikimedia.org/T128546)
2023-01-23 16:27:04 <wikibugs> (CR) BCornwall: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39208/console" [puppet] - https://gerrit.wikimedia.org/r/881902 (https://phabricator.wikimedia.org/T228730) (owner: BCornwall)
2023-01-23 16:29:36 <wikibugs> (CR) Ottomata: [C: +2] flink-app - explicitly set Flink ports and configure ingress netpol for them (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576) (owner: Ottomata)
2023-01-23 16:29:44 <wikibugs> (CR) Ottomata: [C: +2] flink - avoid adding an extra 'k8s_api_enabled' label by using component label instead [deployment-charts] - https://gerrit.wikimedia.org/r/882680 (https://phabricator.wikimedia.org/T324576) (owner: Ottomata)
2023-01-23 16:30:05 <jouncebot> jan_drewniak: OwO what's this, a deployment window?? Wikimedia Portals Update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T1630). nyaa~
2023-01-23 16:30:39 <wikibugs> (CR) Jdrewniak: [C: +2] Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/882682 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
2023-01-23 16:31:23 <wikibugs> (Merged) jenkins-bot: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/882682 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
2023-01-23 16:31:38 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43260 and previous config saved to /var/cache/conftool/dbconfig/20230123-163138-root.json
2023-01-23 16:32:08 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43261 and previous config saved to /var/cache/conftool/dbconfig/20230123-163207-root.json
2023-01-23 16:34:28 <wikibugs> (Merged) jenkins-bot: flink-app - explicitly set Flink ports and configure ingress netpol for them [deployment-charts] - https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576) (owner: Ottomata)
2023-01-23 16:34:31 <wikibugs> (Merged) jenkins-bot: flink - avoid adding an extra 'k8s_api_enabled' label by using component label instead [deployment-charts] - https://gerrit.wikimedia.org/r/882680 (https://phabricator.wikimedia.org/T324576) (owner: Ottomata)
2023-01-23 16:35:06 <wikibugs> (PS4) Jbond: puppet_compiler: serve pson.gz as application/json [puppet] - https://gerrit.wikimedia.org/r/882656 (owner: Hashar)
2023-01-23 16:35:07 <logmsgbot> !log otto@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
2023-01-23 16:35:09 <logmsgbot> !log otto@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
2023-01-23 16:35:55 <wikibugs> (CR) Jbond: [C: +2] "updated slightly, thanks" [puppet] - https://gerrit.wikimedia.org/r/882656 (owner: Hashar)
2023-01-23 16:36:12 <wikibugs> (CR) Jbond: [V: +2 C: +2] puppet_compiler: serve pson.gz as application/json [puppet] - https://gerrit.wikimedia.org/r/882656 (owner: Hashar)
2023-01-23 16:39:28 <wikibugs> SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Taavi - https://phabricator.wikimedia.org/T327013 (Ottomata) Approved by me. I think we need someone at WMF to approve/sponsor @taavi's membership in this group though. @taavi, could someone maybe in Cloud VPS do this for you?
2023-01-23 16:40:00 <logmsgbot> !log otto@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
2023-01-23 16:40:04 <logmsgbot> !log otto@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
2023-01-23 16:41:12 <wikibugs> (CR) Hashar: "I can confirm it makes Firefox pretty print the pson.gz ;) Thank you!" [puppet] - https://gerrit.wikimedia.org/r/882656 (owner: Hashar)
2023-01-23 16:41:23 <wikibugs> (CR) Jbond: [C: +2] admin: Add check for duplicate uid's (1 comment) [puppet] - https://gerrit.wikimedia.org/r/882652 (owner: Jbond)
2023-01-23 16:41:39 <logmsgbot> !log otto@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
2023-01-23 16:41:43 <logmsgbot> !log otto@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
2023-01-23 16:41:55 <wikibugs> (CR) Jbond: [C: +2] Fix xihua's account [puppet] - https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: Alexandros Kosiaris)
2023-01-23 16:42:02 <logmsgbot> !log jdrewniak@deploy1002 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:882682| Bumping portals to master (T128546)]] (duration: 06m 48s)
2023-01-23 16:42:05 <stashbot> T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
2023-01-23 16:48:51 <logmsgbot> !log jdrewniak@deploy1002 Synchronized portals: Wikimedia Portals Update: [[gerrit:882682| Bumping portals to master (T128546)]] (duration: 06m 48s)
2023-01-23 16:48:55 <stashbot> T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
2023-01-23 16:50:36 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
2023-01-23 16:50:38 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
2023-01-23 16:53:54 <wikibugs> SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users & analytics-product-users for Hxi-ctr - https://phabricator.wikimedia.org/T325004 (jbond) Open→Resolved I have gone ahead and merged the changes to rename this account, please reopen if you have...
2023-01-23 16:56:46 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
2023-01-23 16:56:48 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
2023-01-23 16:58:02 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
2023-01-23 16:58:05 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
2023-01-23 16:59:41 <wikibugs> SRE, Traffic, Data Pipelines (Sprint 07): Document Impact of Jan 8&9 Traffic Data Loss - https://phabricator.wikimedia.org/T326658 (Snwachukwu) #traffic Can you please confirm that there were cases of pages served in ##eqsin## but not reported in ##webrequest logs##.
2023-01-23 17:02:26 <wikibugs> (PS1) Ottomata: flink-app - netpol must use app: <chart>-<release> podSelector [deployment-charts] - https://gerrit.wikimedia.org/r/882692
2023-01-23 17:05:06 <wikibugs> (PS2) Ottomata: flink-app - netpol must use app: <chart>-<release> podSelector [deployment-charts] - https://gerrit.wikimedia.org/r/882692
2023-01-23 17:05:51 <logmsgbot> !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
2023-01-23 17:05:54 <logmsgbot> !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
2023-01-23 17:07:27 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2114 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43262 and previous config saved to /var/cache/conftool/dbconfig/20230123-170726-ladsgroup.json
2023-01-23 17:22:32 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2114 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43263 and previous config saved to /var/cache/conftool/dbconfig/20230123-172231-ladsgroup.json
2023-01-23 17:37:37 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2114 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43264 and previous config saved to /var/cache/conftool/dbconfig/20230123-173736-ladsgroup.json
2023-01-23 17:39:26 <wikibugs> (CR) Herron: [C: +1] conftool-data: add logstash[12]032 to kibana7 backend [puppet] - https://gerrit.wikimedia.org/r/881813 (owner: Cwhite)
2023-01-23 17:41:48 <wikibugs> SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Taavi - https://phabricator.wikimedia.org/T327013 (jhathaway) Happy to sponsor @taavi for this request
2023-01-23 17:44:02 <wikibugs> (CR) Dzahn: [C: +2] idp: remove racktables related settings (1 comment) [puppet] - https://gerrit.wikimedia.org/r/881697 (https://phabricator.wikimedia.org/T327405) (owner: Dzahn)
2023-01-23 17:44:10 <wikibugs> SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Taavi - https://phabricator.wikimedia.org/T327013 (Clement_Goubert)
2023-01-23 17:49:29 <wikibugs> (PS3) Dzahn: idp: remove config for racktables [puppet] - https://gerrit.wikimedia.org/r/881938 (https://phabricator.wikimedia.org/T327405)
2023-01-23 17:49:31 <wikibugs> (PS1) Clément Goubert: admin: Grant taavi access to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/882696 (https://phabricator.wikimedia.org/T327013)
2023-01-23 17:50:22 <wikibugs> (CR) Dzahn: [C: +1] "lgtm, has approval from ottomata and another SRE as sponsor" [puppet] - https://gerrit.wikimedia.org/r/882696 (https://phabricator.wikimedia.org/T327013) (owner: Clément Goubert)
2023-01-23 17:50:43 <wikibugs> SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Taavi - https://phabricator.wikimedia.org/T327013 (Clement_Goubert) @taavi Patch ready, assuming you don't need kerberos access. Here are the [[ https://wikitech.wikimedia.org/wiki/Analytics/Data_access#U...
2023-01-23 17:50:46 <icinga-wm> ACKNOWLEDGEMENT - cassandra-a SSL 10.64.0.144:7001 on sessionstore1001 is CRITICAL: SSL CRITICAL - Certificate sessionstore1001-a valid until 2023-02-22 11:12:05 +0000 (expires in 29 days) eevans See: https://phabricator.wikimedia.org/T327675 - The acknowledgement expires at: 2023-01-30 17:50:07. https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
2023-01-23 17:50:46 <icinga-wm> ACKNOWLEDGEMENT - cassandra-a SSL 10.64.32.85:7001 on sessionstore1002 is CRITICAL: SSL CRITICAL - Certificate sessionstore1002-a valid until 2023-02-22 11:12:08 +0000 (expires in 29 days) eevans See: https://phabricator.wikimedia.org/T327675 - The acknowledgement expires at: 2023-01-30 17:50:07. https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
2023-01-23 17:50:46 <icinga-wm> ACKNOWLEDGEMENT - cassandra-a SSL 10.64.48.178:7001 on sessionstore1003 is CRITICAL: SSL CRITICAL - Certificate sessionstore1003-a valid until 2023-02-22 11:12:10 +0000 (expires in 29 days) eevans See: https://phabricator.wikimedia.org/T327675 - The acknowledgement expires at: 2023-01-30 17:50:07. https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
2023-01-23 17:50:46 <icinga-wm> ACKNOWLEDGEMENT - cassandra-a SSL 10.192.16.95:7001 on sessionstore2001 is CRITICAL: SSL CRITICAL - Certificate sessionstore2001-a valid until 2023-02-22 11:12:13 +0000 (expires in 29 days) eevans See: https://phabricator.wikimedia.org/T327675 - The acknowledgement expires at: 2023-01-30 17:50:07. https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
2023-01-23 17:50:46 <icinga-wm> ACKNOWLEDGEMENT - cassandra-a SSL 10.192.32.101:7001 on sessionstore2002 is CRITICAL: SSL CRITICAL - Certificate sessionstore2002-a valid until 2023-02-22 11:12:16 +0000 (expires in 29 days) eevans See: https://phabricator.wikimedia.org/T327675 - The acknowledgement expires at: 2023-01-30 17:50:07. https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
2023-01-23 17:50:46 <icinga-wm> ACKNOWLEDGEMENT - cassandra-a SSL 10.192.48.132:7001 on sessionstore2003 is CRITICAL: SSL CRITICAL - Certificate sessionstore2003-a valid until 2023-02-22 11:12:18 +0000 (expires in 29 days) eevans See: https://phabricator.wikimedia.org/T327675 - The acknowledgement expires at: 2023-01-30 17:50:07. https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
2023-01-23 17:51:15 <wikibugs> (CR) Clément Goubert: [C: +2] admin: Grant taavi access to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/882696 (https://phabricator.wikimedia.org/T327013) (owner: Clément Goubert)
2023-01-23 17:52:44 <logmsgbot> !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2114 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43265 and previous config saved to /var/cache/conftool/dbconfig/20230123-175241-ladsgroup.json
2023-01-23 17:53:57 <wikibugs> SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Taavi - https://phabricator.wikimedia.org/T327013 (Clement_Goubert) In progress→Resolved @taavi Access request merged, you should have your access around 30 minutes from now when puppet has run. R...
2023-01-23 17:56:15 <wikibugs> (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/output/881938/39209/" [puppet] - https://gerrit.wikimedia.org/r/881938 (https://phabricator.wikimedia.org/T327405) (owner: Dzahn)
2023-01-23 17:56:42 <wikibugs> (CR) Dzahn: [C: +2] "https://gerrit.wikimedia.org/r/c/operations/puppet/+/881938" [puppet] - https://gerrit.wikimedia.org/r/881697 (https://phabricator.wikimedia.org/T327405) (owner: Dzahn)
2023-01-23 17:56:54 <wikibugs> (PS3) Hnowlan: thumbor: add and use haproxy healthz lvs check [puppet] - https://gerrit.wikimedia.org/r/880898 (https://phabricator.wikimedia.org/T233196)
2023-01-23 17:57:00 <wikibugs> (PS2) Hnowlan: thumbor: add failure condition to health check [deployment-charts] - https://gerrit.wikimedia.org/r/881635 (https://phabricator.wikimedia.org/T233196)
2023-01-23 17:57:37 <wikibugs> (CR) Dzahn: [C: +2] "Notice: /Stage[main]/Apereo_cas/File[/etc/cas/services/racktables-18.json]/ensure: removed" [puppet] - https://gerrit.wikimedia.org/r/881938 (https://phabricator.wikimedia.org/T327405) (owner: Dzahn)
2023-01-23 17:58:43 <wikibugs> (CR) Hnowlan: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39210/console" [puppet] - https://gerrit.wikimedia.org/r/880898 (https://phabricator.wikimedia.org/T233196) (owner: Hnowlan)
2023-01-23 17:59:58 <wikibugs> (CR) Dzahn: [C: +2] "IDP config was removed on both idp servers and Apache config was removed on miscweb, no problem when refreshing apache" [puppet] - https://gerrit.wikimedia.org/r/881938 (https://phabricator.wikimedia.org/T327405) (owner: Dzahn)
2023-01-23 18:00:05 <jouncebot> Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T1800)
2023-01-23 18:00:05 <jouncebot> ryankemper: I, the Bot under the Fountain, call upon thee, The Deployer, to do Wikidata Query Service weekly deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T1800).
2023-01-23 18:00:36 <wikibugs> (PS5) Bking: flink-operator: bump version to 1.3.1 [deployment-charts] - https://gerrit.wikimedia.org/r/881458 (https://phabricator.wikimedia.org/T324576)
2023-01-23 18:00:39 <wikibugs> (CR) Dzahn: [C: +2] "spoke too soon :) apache2.service (apache2-apache2-after-network-online-target)]: Skipping because of failed dependencies" [puppet] - https://gerrit.wikimedia.org/r/881938 (https://phabricator.wikimedia.org/T327405) (owner: Dzahn)
2023-01-23 18:00:43 <wikibugs> (CR) CI reject: [V: -1] flink-operator: bump version to 1.3.1 [deployment-charts] - https://gerrit.wikimedia.org/r/881458 (https://phabricator.wikimedia.org/T324576) (owner: Bking)
2023-01-23 18:02:23 <icinga-wm> PROBLEM - Check systemd state on miscweb2002 is CRITICAL: CRITICAL - degraded: The following units failed: apache2.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2023-01-23 18:02:35 <wikibugs> (CR) Hnowlan: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39212/console" [puppet] - https://gerrit.wikimedia.org/r/880898 (https://phabricator.wikimedia.org/T233196) (owner: Hnowlan)
2023-01-23 18:02:44 <logmsgbot> !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
2023-01-23 18:02:57 <logmsgbot> !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
2023-01-23 18:03:13 <wikibugs> (CR) JHathaway: rspamd: vendor github.com/oxc/puppet-rspamd (1 comment) [puppet] - https://gerrit.wikimedia.org/r/870901 (https://phabricator.wikimedia.org/T325397) (owner: JHathaway)
2023-01-23 18:03:17 <wikibugs> (CR) JHathaway: [C: +2] rspamd: vendor github.com/oxc/puppet-rspamd [puppet] - https://gerrit.wikimedia.org/r/870901 (https://phabricator.wikimedia.org/T325397) (owner: JHathaway)
2023-01-23 18:04:06 <icinga-wm> ACKNOWLEDGEMENT - Check systemd state on miscweb2002 is CRITICAL: CRITICAL - degraded: The following units failed: apache2.service daniel_zahn inactive server, debugging in progress https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2023-01-23 18:04:06 <icinga-wm> ACKNOWLEDGEMENT - Static CodeReview archive HTTP on miscweb2002 is CRITICAL: connect to address 10.192.16.211 and port 80: Connection refused daniel_zahn inactive server, debugging in progress https://wikitech.wikimedia.org/wiki/Static-codereview.wikimedia.org
2023-01-23 18:04:06 <icinga-wm> ACKNOWLEDGEMENT - racktables.wikimedia.org requires authentication on miscweb2002 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/1.1 503 Service Unavailable daniel_zahn inactive server, debugging in progress https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
2023-01-23 18:05:05 <mutante> !log miscweb1002 - disabling puppet because latest merge would break apache if it runs, debugging in progress on inactive miscweb2002
2023-01-23 18:05:07 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 18:07:47 <jinxer-wm> (JobUnavailable) firing: (2) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2023-01-23 18:08:16 <mutante> !log miscweb2002 - unlink /etc/apache2/mods-enabled/auth_cas.conf - unlink /etc/apache2/mods-enabled/auth_cas.load
2023-01-23 18:08:18 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 18:08:49 <icinga-wm> RECOVERY - Check systemd state on miscweb2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2023-01-23 18:13:33 <wikibugs> (PS6) Bking: flink-operator: bump version to 1.3.1 [deployment-charts] - https://gerrit.wikimedia.org/r/881458 (https://phabricator.wikimedia.org/T324576)
2023-01-23 18:13:46 <wikibugs> (CR) CI reject: [V: -1] flink-operator: bump version to 1.3.1 [deployment-charts] - https://gerrit.wikimedia.org/r/881458 (https://phabricator.wikimedia.org/T324576) (owner: Bking)
2023-01-23 18:14:29 <wikibugs> (CR) Dzahn: [C: +2] "it still broke because this way puppet did not unload the CAS apache module. so technically should be "ensure absent" instead of just remo" [puppet] - https://gerrit.wikimedia.org/r/881938 (https://phabricator.wikimedia.org/T327405) (owner: Dzahn)
2023-01-23 18:18:17 <wikibugs> (CR) Dzahn: [C: +2] "profile::idp::client::httpd would need to first get an "$ensure" class parameter that absents the mod_conf and the libapache2-mod-auth-cas" [puppet] - https://gerrit.wikimedia.org/r/881938 (https://phabricator.wikimedia.org/T327405) (owner: Dzahn)
2023-01-23 18:19:38 <mutante> !log miscweb2002 - unlink /etc/apache2/mods-enabled/auth_cas.conf - unlink /etc/apache2/mods-enabled/auth_cas.load - apt-get remove libapache2-mod-auth-cas - T327405
2023-01-23 18:19:41 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 18:19:42 <stashbot> T327405: Decommission Racktables - https://phabricator.wikimedia.org/T327405
2023-01-23 18:22:37 <wikibugs> (PS7) Bking: flink-operator: bump version to 1.3.1 [deployment-charts] - https://gerrit.wikimedia.org/r/881458 (https://phabricator.wikimedia.org/T324576)
2023-01-23 18:23:03 <wikibugs> (CR) CI reject: [V: -1] flink-operator: bump version to 1.3.1 [deployment-charts] - https://gerrit.wikimedia.org/r/881458 (https://phabricator.wikimedia.org/T324576) (owner: Bking)
2023-01-23 18:28:26 <wikibugs> (CR) Ssingh: "Sorry, I skipped reviewing this for quite a while. Are we still planning on merging these or are we doing a top-level declaration instead?" [puppet] - https://gerrit.wikimedia.org/r/863294 (https://phabricator.wikimedia.org/T308013) (owner: Muehlenhoff)
2023-01-23 18:30:20 <wikibugs> (PS8) Bking: flink-operator: bump version to 1.3.1 [deployment-charts] - https://gerrit.wikimedia.org/r/881458 (https://phabricator.wikimedia.org/T324576)
2023-01-23 18:30:59 <wikibugs> SRE: profile::idp::client::httpd should be absent-able - https://phabricator.wikimedia.org/T327678 (Dzahn)
2023-01-23 18:31:05 <wikibugs> (CR) CI reject: [V: -1] flink-operator: bump version to 1.3.1 [deployment-charts] - https://gerrit.wikimedia.org/r/881458 (https://phabricator.wikimedia.org/T324576) (owner: Bking)
2023-01-23 18:31:12 <wikibugs> (PS1) Jelto: gitlab: exclude shell scripts and other backups from rsync jobs [puppet] - https://gerrit.wikimedia.org/r/882704 (https://phabricator.wikimedia.org/T274463)
2023-01-23 18:32:13 <wikibugs> SRE, Infrastructure-Foundations: profile::idp::client::httpd should be absent-able - https://phabricator.wikimedia.org/T327678 (Dzahn)
2023-01-23 18:33:31 <wikibugs> (PS9) Bking: flink-operator: bump version to 1.3.1 [deployment-charts] - https://gerrit.wikimedia.org/r/881458 (https://phabricator.wikimedia.org/T324576)
2023-01-23 18:38:32 <wikibugs> (PS1) Jforrester: Replace wgBetaFeaturesWhitelist with wgBetaFeaturesAllowList, Part I [mediawiki-config] - https://gerrit.wikimedia.org/r/882705
2023-01-23 18:38:34 <wikibugs> (PS1) Jforrester: Replace wgBetaFeaturesWhitelist with wgBetaFeaturesAllowList, Part II [mediawiki-config] - https://gerrit.wikimedia.org/r/882706
2023-01-23 18:42:39 <wikibugs> (CR) Dzahn: [C: +1] "looks good to me" [puppet] - https://gerrit.wikimedia.org/r/882704 (https://phabricator.wikimedia.org/T274463) (owner: Jelto)
2023-01-23 18:48:27 <mutante> !log miscweb1002 - unload CAS apache module and config; apt-get remove libapache2-mod-auth-cas
2023-01-23 18:48:29 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 18:48:58 <jinxer-wm> (KubernetesAPILatency) firing: High Kubernetes API latency (UPDATE certificaterequests) on k8s-staging@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
2023-01-23 18:50:15 <wikibugs> (CR) Dzahn: [C: +2] "unloaded the module and removed the package manually on miscweb*, which are fine now. also did a follow-up ticket but not sure how importa" [puppet] - https://gerrit.wikimedia.org/r/881938 (https://phabricator.wikimedia.org/T327405) (owner: Dzahn)
2023-01-23 18:50:54 <wikibugs> (PS2) Dzahn: miscweb: remove racktables profile from miscweb role [puppet] - https://gerrit.wikimedia.org/r/881694 (https://phabricator.wikimedia.org/T327405)
2023-01-23 18:51:09 <wikibugs> SRE-swift-storage, Wikimedia-production-error: FileBackendError: Iterator page I/O error. - https://phabricator.wikimedia.org/T327681 (TheresNoTime)
2023-01-23 18:51:25 <wikibugs> (CR) Ottomata: [C: +2] flink-app - netpol must use app: <chart>-<release> podSelector [deployment-charts] - https://gerrit.wikimedia.org/r/882692 (owner: Ottomata)
2023-01-23 18:53:58 <jinxer-wm> (KubernetesAPILatency) resolved: High Kubernetes API latency (UPDATE certificaterequests) on k8s-staging@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
2023-01-23 18:55:24 <TheresNoTime> Just surfacing that T327681 is intermittently causing user facing exceptions — fairly low rate, not consistently repeatable
2023-01-23 18:55:24 <stashbot> T327681: FileBackendError: Iterator page I/O error. - https://phabricator.wikimedia.org/T327681
2023-01-23 18:57:33 <wikibugs> (Merged) jenkins-bot: flink-app - netpol must use app: <chart>-<release> podSelector [deployment-charts] - https://gerrit.wikimedia.org/r/882692 (owner: Ottomata)
2023-01-23 19:10:11 <jinxer-wm> (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
2023-01-23 19:11:05 <wikibugs> SRE, Infrastructure-Foundations: profile::idp::client::httpd should be absent-able - https://phabricator.wikimedia.org/T327678 (Dzahn) p:Triage→Low
2023-01-23 19:12:14 <wikibugs> (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/output/881694/39213/" [puppet] - https://gerrit.wikimedia.org/r/881694 (https://phabricator.wikimedia.org/T327405) (owner: Dzahn)
2023-01-23 19:16:18 <logmsgbot> !log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
2023-01-23 19:16:31 <logmsgbot> !log eevans@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1016.eqiad.wmnet
2023-01-23 19:17:53 <logmsgbot> !log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
2023-01-23 19:17:55 <logmsgbot> !log eevans@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1016.eqiad.wmnet
2023-01-23 19:18:17 <logmsgbot> !log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
2023-01-23 19:19:00 <wikibugs> (CR) Dzahn: "we should not forget there is also this include: modules/profile/manifests/mariadb/grants/production.pp: include passwords::racktables" [puppet] - https://gerrit.wikimedia.org/r/881701 (https://phabricator.wikimedia.org/T327405) (owner: Dzahn)
2023-01-23 19:24:44 <logmsgbot> !log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1016.eqiad.wmnet
2023-01-23 19:30:27 <logmsgbot> !log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1019.eqiad.wmnet
2023-01-23 19:36:24 <wikibugs> (PS1) Jdrewniak: Enable Page Tools for logged-in users on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/882715 (https://phabricator.wikimedia.org/T327686)
2023-01-23 19:37:45 <logmsgbot> !log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1019.eqiad.wmnet
2023-01-23 19:41:49 <logmsgbot> !log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1020.eqiad.wmnet
2023-01-23 19:48:57 <logmsgbot> !log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1020.eqiad.wmnet
2023-01-23 19:49:30 <wikibugs> 'SRE, ''Domains, ''Traffic-Icebox: Redirecting incoming queries to non-existent subpages (due to Godaddy behavior on some external WikiJournal sites) - https://phabricator.wikimedia.org/T212914 (''BCornwall) ''Open''Resolved a:''BCornwall It looks like they've managed to escape the talons of godaddy...'
2023-01-23 19:58:51 <wikibugs> ('PS1) ''BCornwall: varnish: Reword misc-frontend vcl_switch comment [puppet] - ''https://gerrit.wikimedia.org/r/882716 (https://phabricator.wikimedia.org/T205988)'
2023-01-23 19:59:13 <wikibugs> 'SRE, ''Traffic-Icebox, ''Patch-For-Review: Simplify comment misc-frontend.inc.vcl.erb - https://phabricator.wikimedia.org/T205988 (''BCornwall) ''Open''In progress a:''BCornwall'
2023-01-23 19:59:25 <wikibugs> 'SRE: Expired puppet certificates - https://phabricator.wikimedia.org/T260110 (''Aklapper)'
2023-01-23 19:59:37 <wikibugs> 'SRE, ''Traffic-Icebox, ''Patch-For-Review: Simplify comment misc-frontend.inc.vcl.erb - https://phabricator.wikimedia.org/T205988 (''BCornwall) Since this ticket is relevant to the comment itself, let's just fix that and follow-up with another, more detailed description of what needs refactoring.'
2023-01-23 20:01:13 <wikibugs> 'SRE, ''CheckUser, ''Traffic: Log source port for anonymous users and expose it for sysops/checkusers - https://phabricator.wikimedia.org/T181368 (''Urbanecm)'
2023-01-23 20:05:58 <wikibugs> ('PS2) ''Krinkle: Use core's PoolCounterClient [mediawiki-config] - ''https://gerrit.wikimedia.org/r/881466 (https://phabricator.wikimedia.org/T327336) (owner: ''Zabe)'
2023-01-23 20:07:15 <wikibugs> ('CR) ''Krinkle: [C: ''+1] "LGTM. This needs careful testing on mwdebug with PC hits and misses, e.g. browse old and current revisions on various articles and confirm" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/881466 (https://phabricator.wikimedia.org/T327336) (owner: ''Zabe)'
2023-01-23 20:10:28 <wikibugs> 'SRE, ''Traffic-Icebox: Consider adding expect-CT: header to enforce certificate transparency - https://phabricator.wikimedia.org/T193521 (''BCornwall) ''Open''Invalid It's sad that no action was taken in the years since the report has been opened, but it appears that @tgr is correct and it's ready to be...'
2023-01-23 20:19:24 <wikibugs> ('CR) ''Dzahn: admin/canary_appserver: add group of users allowed to disable puppet (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/879147 (https://phabricator.wikimedia.org/T305979) (owner: ''Dzahn)'
2023-01-23 20:23:19 <wikibugs> 'SRE, ''Traffic-Icebox: Unwanted service startups and their triggers - https://phabricator.wikimedia.org/T191017 (''BCornwall) ''Open''Resolved a:''BCornwall `systemctl mask` achieves what is desired here and has been successfully implemented with varnishncsa.service and varnishlog.service (see `Change...'
2023-01-23 20:26:14 <wikibugs> ('PS1) ''Andrea Denisse: centrallog2002: Apply partman standard software raid recipe [puppet] - ''https://gerrit.wikimedia.org/r/882718 (https://phabricator.wikimedia.org/T313858)'
2023-01-23 20:31:46 <wikibugs> ('CR) ''Andrea Denisse: [V: ''+1] "PCC SUCCESS (NOOP 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39214/console" [puppet] - ''https://gerrit.wikimedia.org/r/882718 (https://phabricator.wikimedia.org/T313858) (owner: ''Andrea Denisse)'
2023-01-23 20:45:16 <taavi> !log restart T315510 on group1 after mwmaint restart, currently running on wikidatawiki
2023-01-23 20:45:18 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 20:45:20 <stashbot> T315510: Start maintenance script to backfill talk page comment database - https://phabricator.wikimedia.org/T315510
2023-01-23 20:45:40 <wikibugs> ('CR) ''Thcipriani: [C: ''+1] admin/canary_appserver: add group of users allowed to disable puppet [puppet] - ''https://gerrit.wikimedia.org/r/879147 (https://phabricator.wikimedia.org/T305979) (owner: ''Dzahn)'
2023-01-23 20:45:54 <wikibugs> ('CR) ''Andrea Denisse: "PCC results: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39214/console" [puppet] - ''https://gerrit.wikimedia.org/r/882718 (https://phabricator.wikimedia.org/T313858) (owner: ''Andrea Denisse)'
2023-01-23 20:56:10 <logmsgbot> !log otto@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
2023-01-23 20:56:14 <logmsgbot> !log otto@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
2023-01-23 20:58:58 <wikibugs> 'SRE, ''Traffic-Icebox, ''Patch-For-Review: Remove unused plain HTTP services from LVS - https://phabricator.wikimedia.org/T236065 (''BCornwall) I also am not sure of how to find out consumers of the HTTP-only services, but I've created a WIP patch that at least lists the candidates.'
2023-01-23 21:00:05 <jouncebot> RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T2100).
2023-01-23 21:00:05 <jouncebot> jan_drewniak: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
2023-01-23 21:00:19 <kindrobot> I can deploy.
2023-01-23 21:00:56 <jan_drewniak> kindrobot: ok thanks
2023-01-23 21:01:54 <kindrobot> !log start UTC late backport window
2023-01-23 21:01:56 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 21:02:20 <wikibugs> ('CR) ''TrainBranchBot: [C: ''+2] "Approved by kindrobot@deploy1002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882715 (https://phabricator.wikimedia.org/T327686) (owner: ''Jdrewniak)'
2023-01-23 21:02:57 <wikibugs> ('Merged) ''jenkins-bot: Enable Page Tools for logged-in users on enwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882715 (https://phabricator.wikimedia.org/T327686) (owner: ''Jdrewniak)'
2023-01-23 21:03:10 <TheresNoTime> (thanks kindrobot, I'm finally back in the "right timezone" so should be able to pick up more again!)
2023-01-23 21:03:11 <logmsgbot> !log kindrobot@deploy1002 Started scap: Backport for [[gerrit:882715|Enable Page Tools for logged-in users on enwiki (T327686)]]
2023-01-23 21:03:15 <stashbot> T327686: Deploy page tools for logged-in users on English Wikipedia - https://phabricator.wikimedia.org/T327686
2023-01-23 21:04:35 <kindrobot> My pleasure TheresNoTime! The only days I am free for this window are Monday and Wednesday, so I try to pick up one of those a week if I can.
2023-01-23 21:04:54 <logmsgbot> !log kindrobot@deploy1002 jdrewniak and kindrobot: Backport for [[gerrit:882715|Enable Page Tools for logged-in users on enwiki (T327686)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
2023-01-23 21:05:04 <kindrobot> jan_drewniak: can you confirm?
2023-01-23 21:05:58 <jan_drewniak> kindrobot: yup looks good
2023-01-23 21:06:08 <kindrobot> Great, syncing.
2023-01-23 21:09:29 <wikibugs> ('PS1) ''Andrea Denisse: centrallog1002: Add to eqiad anycast_neighbors [homer/public] - ''https://gerrit.wikimedia.org/r/882724 (https://phabricator.wikimedia.org/T318778)'
2023-01-23 21:10:09 <wikibugs> ('PS2) ''Bking: flink-kubernetes-operator: bump version to 1.3.1 [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/881907 (https://phabricator.wikimedia.org/T324576)'
2023-01-23 21:11:45 <wikibugs> ('CR) ''Ottomata: flink-kubernetes-operator: bump version to 1.3.1 (''1 comment) [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/881907 (https://phabricator.wikimedia.org/T324576) (owner: ''Bking)'
2023-01-23 21:12:12 <logmsgbot> !log kindrobot@deploy1002 Finished scap: Backport for [[gerrit:882715|Enable Page Tools for logged-in users on enwiki (T327686)]] (duration: 09m 00s)
2023-01-23 21:12:16 <stashbot> T327686: Deploy page tools for logged-in users on English Wikipedia - https://phabricator.wikimedia.org/T327686
2023-01-23 21:12:43 <kindrobot> !log close UTC late backport window
2023-01-23 21:12:44 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 21:22:11 <logmsgbot> !log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1021.eqiad.wmnet
2023-01-23 21:23:36 <wikibugs> ('PS1) ''Zabe: throttle: Remove expired rule [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882746'
2023-01-23 21:26:13 <wikibugs> 'SRE, ''DNS, ''Traffic-Icebox, ''Wikimedia-Apache-configuration, ''Patch-For-Review: Remove aliases `minnan` and `zh-cfr` for the Min Nan Wikipedia - https://phabricator.wikimedia.org/T230382 (''BCornwall) I've brought the issue up with langcom on their [[ https://meta.wikimedia.org/wiki/Talk:Language_c...'
2023-01-23 21:26:23 <wikibugs> 'SRE, ''DNS, ''Traffic-Icebox, ''Wikimedia-Apache-configuration, ''Patch-For-Review: Remove aliases `minnan` and `zh-cfr` for the Min Nan Wikipedia - https://phabricator.wikimedia.org/T230382 (''BCornwall) ''Open''In progress'
2023-01-23 21:29:20 <logmsgbot> !log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1021.eqiad.wmnet
2023-01-23 21:31:25 <wikibugs> ('CR) ''Zabe: [C: ''+2] throttle: Remove expired rule [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882746 (owner: ''Zabe)'
2023-01-23 21:31:47 <wikibugs> ('PS1) ''Andrea Denisse: centrallog: Add centrallog1002 as Kafka broker [puppet] - ''https://gerrit.wikimedia.org/r/882747 (https://phabricator.wikimedia.org/T318778)'
2023-01-23 21:32:09 <wikibugs> ('Merged) ''jenkins-bot: throttle: Remove expired rule [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882746 (owner: ''Zabe)'
2023-01-23 21:32:53 <logmsgbot> !log zabe@deploy1002 Started scap: Backport for [[gerrit:882746|throttle: Remove expired rule]]
2023-01-23 21:34:35 <logmsgbot> !log zabe@deploy1002 zabe: Backport for [[gerrit:882746|throttle: Remove expired rule]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
2023-01-23 21:35:26 <logmsgbot> !log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1028.eqiad.wmnet
2023-01-23 21:36:11 <wikibugs> ('PS1) ''Nray: Work around sticky-positioned layers disabling subpixel rendering [skins/Vector] (wmf/1.40.0-wmf.19) - ''https://gerrit.wikimedia.org/r/882727 (https://phabricator.wikimedia.org/T327460)'
2023-01-23 21:40:30 <wikibugs> 'SRE, ''Traffic-Icebox, ''Patch-For-Review: Remove unused plain HTTP services from LVS - https://phabricator.wikimedia.org/T236065 (''BCornwall) ''Open''In progress bblack has some ideas: ` 13:27 <bblack> we don't have a 100% reliable spot-check to know for sure 13:27 <bblack> but yeah, we can guestim...'
2023-01-23 21:40:39 <logmsgbot> !log zabe@deploy1002 Finished scap: Backport for [[gerrit:882746|throttle: Remove expired rule]] (duration: 07m 45s)
2023-01-23 21:42:51 <logmsgbot> !log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1028.eqiad.wmnet
2023-01-23 21:49:25 <wikibugs> ('CR) ''Cwhite: [C: ''+2] logstash: enable filters for ecs 1.11.0 [puppet] - ''https://gerrit.wikimedia.org/r/881812 (https://phabricator.wikimedia.org/T326794) (owner: ''Cwhite)'
2023-01-23 22:00:04 <jouncebot> Reedy, sbassett, Maryum, and manfredi: It is that lovely time of the day again! You are hereby commanded to deploy Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T2200).
2023-01-23 22:07:47 <jinxer-wm> (JobUnavailable) firing: (2) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2023-01-23 22:08:35 <sbassett> Hey all - had a couple of security patches we were going to try to deploy today: T285159, T296593
2023-01-23 22:14:18 <jinxer-wm> (ProbeDown) firing: (2) Service thanos-query:443 has failed probes (http_thanos-query_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2023-01-23 22:14:49 <icinga-wm> PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - thanos-query_443: Servers thanos-fe1003.eqiad.wmnet are marked down but pooled: thanos-web_443: Servers thanos-fe1001.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
2023-01-23 22:15:15 <icinga-wm> PROBLEM - Check systemd state on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The following units failed: swift_ring_manager.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2023-01-23 22:16:27 <icinga-wm> RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
2023-01-23 22:19:18 <jinxer-wm> (ProbeDown) resolved: (2) Service thanos-query:443 has failed probes (http_thanos-query_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2023-01-23 22:22:01 <wikibugs> ('PS1) ''Bking: dse-k8s: add rdf-streaming-update-ng namespace [puppet] - ''https://gerrit.wikimedia.org/r/882748 (https://phabricator.wikimedia.org/T289836)'
2023-01-23 22:24:40 <wikibugs> ('CR) ''Cwhite: [C: ''-1] mediawiki: Update ecs logging to 1.11.0 (''2 comments) [deployment-charts] - ''https://gerrit.wikimedia.org/r/881877 (owner: ''Clément Goubert)'
2023-01-23 22:25:54 <wikibugs> ('CR) ''Cwhite: [C: ''+2] Clarify ecs.version field format in docs [software/ecs] - ''https://gerrit.wikimedia.org/r/881809 (https://phabricator.wikimedia.org/T292585) (owner: ''Cwhite)'
2023-01-23 22:26:54 <wikibugs> ('CR) ''Cwhite: [C: ''+2] add error.stack.previous_trace field [software/ecs] - ''https://gerrit.wikimedia.org/r/831943 (https://phabricator.wikimedia.org/T314098) (owner: ''Cwhite)'
2023-01-23 22:27:25 <wikibugs> ('Merged) ''jenkins-bot: add error.stack.previous_trace field [software/ecs] - ''https://gerrit.wikimedia.org/r/831943 (https://phabricator.wikimedia.org/T314098) (owner: ''Cwhite)'
2023-01-23 22:27:47 <wikibugs> ('CR) ''Cwhite: [C: ''+2] Clarify ecs.version field format in docs [software/ecs] - ''https://gerrit.wikimedia.org/r/881809 (https://phabricator.wikimedia.org/T292585) (owner: ''Cwhite)'
2023-01-23 22:28:15 <wikibugs> ('Merged) ''jenkins-bot: Clarify ecs.version field format in docs [software/ecs] - ''https://gerrit.wikimedia.org/r/881809 (https://phabricator.wikimedia.org/T292585) (owner: ''Cwhite)'
2023-01-23 22:28:46 <wikibugs> ('CR) ''Cwhite: [C: ''+2] role: remove kibana7_ecs role [puppet] - ''https://gerrit.wikimedia.org/r/879888 (owner: ''Cwhite)'
2023-01-23 22:31:58 <maryum> !log Deployed patch for T285159
2023-01-23 22:32:00 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 22:33:47 <wikibugs> ('CR) ''Cwhite: [C: ''+1] "Tested upgrade and initial install on beta. Works great!" [puppet] - ''https://gerrit.wikimedia.org/r/849631 (https://phabricator.wikimedia.org/T304440) (owner: ''Hashar)'
2023-01-23 22:37:13 <logmsgbot> !log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1017.eqiad.wmnet
2023-01-23 22:45:09 <logmsgbot> !log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1017.eqiad.wmnet
2023-01-23 22:46:24 <ryankemper> !log [WDQS Deploy] Gearing up for deploy of wdqs `0.3.119`. Pre-deploy tests passing on canary `wdqs1003`
2023-01-23 22:46:25 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 22:48:59 <logmsgbot> !log ryankemper@deploy1002 Started deploy [wdqs/wdqs@544f5f3]: 0.3.119
2023-01-23 22:49:52 <ryankemper> !log [WDQS Deploy] Tests passing following deploy of `0.3.119` on canary `wdqs1003`; proceeding to rest of fleet
2023-01-23 22:49:53 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 22:52:42 <logmsgbot> !log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1022.eqiad.wmnet
2023-01-23 22:56:29 <logmsgbot> !log ryankemper@deploy1002 Finished deploy [wdqs/wdqs@544f5f3]: 0.3.119 (duration: 07m 30s)
2023-01-23 22:57:43 <ryankemper> !log [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
2023-01-23 22:57:44 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 22:57:48 <ryankemper> !log [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
2023-01-23 22:57:50 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 22:57:55 <ryankemper> !log [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
2023-01-23 22:57:56 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2023-01-23 22:59:50 <logmsgbot> !log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1022.eqiad.wmnet
2023-01-23 23:07:41 <logmsgbot> !log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1023.eqiad.wmnet
2023-01-23 23:10:11 <jinxer-wm> (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
2023-01-23 23:11:33 <icinga-wm> RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2023-01-23 23:16:51 <logmsgbot> !log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1023.eqiad.wmnet
2023-01-23 23:17:12 <logmsgbot> !log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1024.eqiad.wmnet
2023-01-23 23:24:20 <logmsgbot> !log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1024.eqiad.wmnet
2023-01-23 23:24:32 <logmsgbot> !log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1029.eqiad.wmnet
2023-01-23 23:31:35 <wikibugs> ('CR) ''Cwhite: [C: ''-2] "Blocking until we can work out a path forward." [puppet] - ''https://gerrit.wikimedia.org/r/880500 (https://phabricator.wikimedia.org/T325806) (owner: ''Filippo Giunchedi)'
2023-01-23 23:31:43 <logmsgbot> !log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1029.eqiad.wmnet
2023-01-23 23:51:55 <wikibugs> ('PS2) ''Cwhite: logstash: Add PTR resolution to firewall logs [puppet] - ''https://gerrit.wikimedia.org/r/880889 (https://phabricator.wikimedia.org/T327095) (owner: ''Ayounsi)'
2023-01-23 23:57:59 <wikibugs> ('CR) ''Cwhite: logstash: Add PTR resolution to firewall logs (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/880889 (https://phabricator.wikimedia.org/T327095) (owner: ''Ayounsi)'

This page is generated from SQL logs. Static txt files are also available for download.