2023-01-23 00:00:20
|
<icinga-wm>
|
RECOVERY - Check systemd state on maps2009 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
|
2023-01-23 00:05:10
|
<icinga-wm>
|
PROBLEM - Check systemd state on maps2009 is CRITICAL: CRITICAL - degraded: The following units failed: planet_sync_tile_generation-gis.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
|
2023-01-23 00:31:36
|
<icinga-wm>
|
RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
|
2023-01-23 00:36:26
|
<icinga-wm>
|
PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_eventlogging_legacy.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
|
2023-01-23 02:07:47
|
<jinxer-wm>
|
(JobUnavailable) firing: (4) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
|
2023-01-23 02:12:47
|
<jinxer-wm>
|
(JobUnavailable) firing: (10) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
|
2023-01-23 02:27:47
|
<jinxer-wm>
|
(JobUnavailable) firing: (12) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
|
2023-01-23 02:37:47
|
<jinxer-wm>
|
(JobUnavailable) firing: (12) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
|
2023-01-23 02:47:47
|
<jinxer-wm>
|
(JobUnavailable) firing: (12) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
|
2023-01-23 02:52:47
|
<jinxer-wm>
|
(JobUnavailable) firing: (12) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
|
2023-01-23 03:10:11
|
<jinxer-wm>
|
(Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
|
2023-01-23 03:35:13
|
<wikibugs>
|
(PS1) Gerrit maintenance bot: mariadb: Promote db2104 to s2 master [puppet] - https://gerrit.wikimedia.org/r/882253 (https://phabricator.wikimedia.org/T327609)
|
2023-01-23 03:51:22
|
<wikibugs>
|
(CR) Ladsgroup: [C: +2] mariadb: Promote db2104 to s2 master [puppet] - https://gerrit.wikimedia.org/r/882253 (https://phabricator.wikimedia.org/T327609) (owner: Gerrit maintenance bot)
|
2023-01-23 03:52:30
|
<Amir1>
|
!log Starting s2 codfw failover from db2107 to db2104 - T327609
|
2023-01-23 03:52:33
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 03:52:34
|
<stashbot>
|
T327609: Switchover s2 master (db2107 -> db2104) - https://phabricator.wikimedia.org/T327609
|
2023-01-23 03:54:59
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depool db2107 T327609', diff saved to https://phabricator.wikimedia.org/P43207 and previous config saved to /var/cache/conftool/dbconfig/20230123-035458-ladsgroup.json
|
2023-01-23 03:56:47
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
|
2023-01-23 03:56:50
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
|
2023-01-23 04:02:28
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
|
2023-01-23 04:02:30
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
|
2023-01-23 04:12:28
|
<wikibugs>
|
(CR) Ladsgroup: [C: +1] "Can you add the ticket?" [mediawiki-config] - https://gerrit.wikimedia.org/r/868127 (owner: Daniel Kinzler)
|
2023-01-23 04:28:01
|
<wikibugs>
|
(PS1) Gerrit maintenance bot: mariadb: Promote db2123 to s5 master [puppet] - https://gerrit.wikimedia.org/r/882254 (https://phabricator.wikimedia.org/T327611)
|
2023-01-23 04:30:53
|
<wikibugs>
|
(Abandoned) Ladsgroup: mariadb: Promote db2104 to s2 master [puppet] - https://gerrit.wikimedia.org/r/881375 (https://phabricator.wikimedia.org/T327370) (owner: Gerrit maintenance bot)
|
2023-01-23 04:32:53
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T327611
|
2023-01-23 04:32:57
|
<stashbot>
|
T327611: Switchover s5 master (db2113 -> db2123) - https://phabricator.wikimedia.org/T327611
|
2023-01-23 04:33:10
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T327611
|
2023-01-23 04:33:25
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'Set db2123 with weight 0 T327611', diff saved to https://phabricator.wikimedia.org/P43208 and previous config saved to /var/cache/conftool/dbconfig/20230123-043324-ladsgroup.json
|
2023-01-23 04:51:45
|
<wikibugs>
|
(CR) Ladsgroup: [C: +2] mariadb: Promote db2123 to s5 master [puppet] - https://gerrit.wikimedia.org/r/882254 (https://phabricator.wikimedia.org/T327611) (owner: Gerrit maintenance bot)
|
2023-01-23 04:53:32
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
|
2023-01-23 04:53:34
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
|
2023-01-23 04:57:08
|
<Amir1>
|
!log Starting s5 codfw failover from db2113 to db2123 - T327611
|
2023-01-23 04:57:11
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 04:57:12
|
<stashbot>
|
T327611: Switchover s5 master (db2113 -> db2123) - https://phabricator.wikimedia.org/T327611
|
2023-01-23 04:57:41
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'Promote db2123 to s5 primary T327611', diff saved to https://phabricator.wikimedia.org/P43209 and previous config saved to /var/cache/conftool/dbconfig/20230123-045740-ladsgroup.json
|
2023-01-23 04:59:40
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depool db2113 T327611', diff saved to https://phabricator.wikimedia.org/P43210 and previous config saved to /var/cache/conftool/dbconfig/20230123-045939-ladsgroup.json
|
2023-01-23 05:01:57
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
|
2023-01-23 05:02:00
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
|
2023-01-23 05:07:37
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
|
2023-01-23 05:07:39
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
|
2023-01-23 05:13:37
|
<wikibugs>
|
(PS1) KartikMistry: Content Translation: Add campaign for Wiki Loves Living Heritage [mediawiki-config] - https://gerrit.wikimedia.org/r/882266 (https://phabricator.wikimedia.org/T327587)
|
2023-01-23 05:33:43
|
<icinga-wm>
|
PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v2/suggest/sections/{title}/{from}/{to} (Suggest source sections to translate) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
|
2023-01-23 05:34:37
|
<icinga-wm>
|
RECOVERY - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
|
2023-01-23 05:50:49
|
<wikibugs>
|
(PS3) KartikMistry: Update cxserver to 2023-01-20-051603-production [deployment-charts] - https://gerrit.wikimedia.org/r/881051 (https://phabricator.wikimedia.org/T323840)
|
2023-01-23 05:56:33
|
<kart_>
|
Updating cxserver in a few minutes..
|
2023-01-23 05:57:12
|
<wikibugs>
|
(CR) KartikMistry: [C: +2] Update cxserver to 2023-01-20-051603-production [deployment-charts] - https://gerrit.wikimedia.org/r/881051 (https://phabricator.wikimedia.org/T323840) (owner: KartikMistry)
|
2023-01-23 06:02:07
|
<wikibugs>
|
(Merged) jenkins-bot: Update cxserver to 2023-01-20-051603-production [deployment-charts] - https://gerrit.wikimedia.org/r/881051 (https://phabricator.wikimedia.org/T323840) (owner: KartikMistry)
|
2023-01-23 06:12:06
|
<logmsgbot>
|
!log kartik@deploy1002 helmfile [staging] START helmfile.d/services/cxserver: apply
|
2023-01-23 06:12:33
|
<logmsgbot>
|
!log kartik@deploy1002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
|
2023-01-23 06:16:21
|
<logmsgbot>
|
!log kartik@deploy1002 helmfile [codfw] START helmfile.d/services/cxserver: apply
|
2023-01-23 06:17:01
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
|
2023-01-23 06:17:04
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
|
2023-01-23 06:17:05
|
<logmsgbot>
|
!log kartik@deploy1002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
|
2023-01-23 06:18:15
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
|
2023-01-23 06:18:17
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
|
2023-01-23 06:18:37
|
<logmsgbot>
|
!log kartik@deploy1002 helmfile [eqiad] START helmfile.d/services/cxserver: apply
|
2023-01-23 06:19:31
|
<logmsgbot>
|
!log kartik@deploy1002 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
|
2023-01-23 06:23:29
|
<kart_>
|
!log Updated cxserver to 2023-01-20-051603-production (T323840, T326236)
|
2023-01-23 06:23:34
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 06:23:35
|
<stashbot>
|
T326236: Post-creation work for gucwiki - https://phabricator.wikimedia.org/T326236
|
2023-01-23 06:23:35
|
<stashbot>
|
T323840: Make the Google translate the default Machine Translation in Central Kurdish Wikipedia - https://phabricator.wikimedia.org/T323840
|
2023-01-23 06:52:47
|
<jinxer-wm>
|
(JobUnavailable) firing: (2) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
|
2023-01-23 06:56:36
|
<wikibugs>
|
(PS1) Stang: bnwikiquote: Update logo [mediawiki-config] - https://gerrit.wikimedia.org/r/882422 (https://phabricator.wikimedia.org/T323131)
|
2023-01-23 06:58:40
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
|
2023-01-23 06:58:42
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
|
2023-01-23 07:02:54
|
<wikibugs>
|
(PS1) Stang: shnwikibooks: Add project logo [mediawiki-config] - https://gerrit.wikimedia.org/r/882425 (https://phabricator.wikimedia.org/T327380)
|
2023-01-23 07:05:26
|
<wikibugs>
|
(PS2) Stang: bnwikiquote: Update logo [mediawiki-config] - https://gerrit.wikimedia.org/r/882422 (https://phabricator.wikimedia.org/T323131)
|
2023-01-23 07:05:46
|
<wikibugs>
|
(PS3) Stang: bnwikiquote: Update logo [mediawiki-config] - https://gerrit.wikimedia.org/r/882422 (https://phabricator.wikimedia.org/T323131)
|
2023-01-23 07:08:50
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
|
2023-01-23 07:08:52
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
|
2023-01-23 07:09:43
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
|
2023-01-23 07:09:45
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
|
2023-01-23 07:10:11
|
<jinxer-wm>
|
(Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
|
2023-01-23 07:13:23
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1106 db1206 T326669', diff saved to https://phabricator.wikimedia.org/P43211 and previous config saved to /var/cache/conftool/dbconfig/20230123-071323-marostegui.json
|
2023-01-23 07:13:27
|
<stashbot>
|
T326669: Productionize db1206-db1225 - https://phabricator.wikimedia.org/T326669
|
2023-01-23 07:22:08
|
<wikibugs>
|
(CR) Ayounsi: Add PTR resolution to firewall logs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/880889 (https://phabricator.wikimedia.org/T327095) (owner: Ayounsi)
|
2023-01-23 07:23:10
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2107 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43212 and previous config saved to /var/cache/conftool/dbconfig/20230123-072309-ladsgroup.json
|
2023-01-23 07:24:00
|
<wikibugs>
|
(PS1) Marostegui: mariadb: Switch s1 sanitarium master [puppet] - https://gerrit.wikimedia.org/r/882515 (https://phabricator.wikimedia.org/T326669)
|
2023-01-23 07:24:44
|
<wikibugs>
|
(CR) Marostegui: [C: +2] mariadb: Switch s1 sanitarium master [puppet] - https://gerrit.wikimedia.org/r/882515 (https://phabricator.wikimedia.org/T326669) (owner: Marostegui)
|
2023-01-23 07:25:21
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1106 (re)pooling @ 5%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43213 and previous config saved to /var/cache/conftool/dbconfig/20230123-072520-root.json
|
2023-01-23 07:25:31
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43214 and previous config saved to /var/cache/conftool/dbconfig/20230123-072530-root.json
|
2023-01-23 07:37:24
|
<wikibugs>
|
(CR) Ayounsi: WIP: add rt_flow grokking (1 comment) [puppet] - https://gerrit.wikimedia.org/r/880500 (https://phabricator.wikimedia.org/T325806) (owner: Filippo Giunchedi)
|
2023-01-23 07:38:15
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2107 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43215 and previous config saved to /var/cache/conftool/dbconfig/20230123-073814-ladsgroup.json
|
2023-01-23 07:40:26
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1106 (re)pooling @ 10%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43216 and previous config saved to /var/cache/conftool/dbconfig/20230123-074025-root.json
|
2023-01-23 07:40:36
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43217 and previous config saved to /var/cache/conftool/dbconfig/20230123-074035-root.json
|
2023-01-23 07:41:52
|
<wikibugs>
|
(PS3) Stang: zhwiki: Install PageAssessments [mediawiki-config] - https://gerrit.wikimedia.org/r/876196 (https://phabricator.wikimedia.org/T326387)
|
2023-01-23 07:42:33
|
<icinga-wm>
|
PROBLEM - puppet last run on idm-test1001 is CRITICAL: CRITICAL: Puppet has been disabled for 604942 seconds, message: test OIDC - slyngshede, last run 7 days ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
|
2023-01-23 07:43:52
|
<wikibugs>
|
(PS6) Elukey: changeprop: add liftwing revscoring streams [deployment-charts] - https://gerrit.wikimedia.org/r/881594 (https://phabricator.wikimedia.org/T327302)
|
2023-01-23 07:43:54
|
<wikibugs>
|
(PS7) Elukey: helmfile.d: add a new test workflow for Lifting to changeprop's staging [deployment-charts] - https://gerrit.wikimedia.org/r/881664 (https://phabricator.wikimedia.org/T327302)
|
2023-01-23 07:44:20
|
<wikibugs>
|
(CR) Elukey: "I added one last little change, namely the possibility to set the kafka topic :)" [deployment-charts] - https://gerrit.wikimedia.org/r/881594 (https://phabricator.wikimedia.org/T327302) (owner: Elukey)
|
2023-01-23 07:44:47
|
<wikibugs>
|
(CR) Elukey: "Added the kafka topic parameter to the staging settings (now the chart allows to specify it)." [deployment-charts] - https://gerrit.wikimedia.org/r/881664 (https://phabricator.wikimedia.org/T327302) (owner: Elukey)
|
2023-01-23 07:53:20
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2107 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43218 and previous config saved to /var/cache/conftool/dbconfig/20230123-075319-ladsgroup.json
|
2023-01-23 07:55:31
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43219 and previous config saved to /var/cache/conftool/dbconfig/20230123-075530-root.json
|
2023-01-23 07:55:41
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43220 and previous config saved to /var/cache/conftool/dbconfig/20230123-075540-root.json
|
2023-01-23 08:00:05
|
<jouncebot>
|
Amir1 and Urbanecm: (Dis)respected human, time to deploy UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T0800). Please do the needful.
|
2023-01-23 08:00:05
|
<jouncebot>
|
MatmaRex: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
|
2023-01-23 08:00:41
|
<MatmaRex>
|
hi
|
2023-01-23 08:00:48
|
<MatmaRex>
|
is anyone really working at this hour? :D
|
2023-01-23 08:02:53
|
<Amir1>
|
MatmaRex: let me check
|
2023-01-23 08:03:58
|
<Amir1>
|
is the sync order okay?
|
2023-01-23 08:04:09
|
<_joe_>
|
sirenbot: wake up
|
2023-01-23 08:05:34
|
<MatmaRex>
|
Amir1: order shouldn't matter for this backport
|
2023-01-23 08:05:39
|
<_joe_>
|
sigh, didn't we give -O to it?
|
2023-01-23 08:06:32
|
<Amir1>
|
it could change the hashes of the modules and such but meh
|
2023-01-23 08:06:43
|
<wikibugs>
|
(CR) TrainBranchBot: [C: +2] "Approved by ladsgroup@deploy1002 using scap backport" [extensions/DiscussionTools] (wmf/1.40.0-wmf.19) - https://gerrit.wikimedia.org/r/882174 (https://phabricator.wikimedia.org/T327328) (owner: Bartosz Dziewoński)
|
2023-01-23 08:08:25
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2107 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43221 and previous config saved to /var/cache/conftool/dbconfig/20230123-080824-ladsgroup.json
|
2023-01-23 08:10:20
|
<wikibugs>
|
(PS1) Func: SpecialUserrights: Allow updating the expiry of user groups [core] (wmf/1.40.0-wmf.19) - https://gerrit.wikimedia.org/r/882179 (https://phabricator.wikimedia.org/T327605)
|
2023-01-23 08:10:36
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43222 and previous config saved to /var/cache/conftool/dbconfig/20230123-081035-root.json
|
2023-01-23 08:10:46
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43223 and previous config saved to /var/cache/conftool/dbconfig/20230123-081045-root.json
|
2023-01-23 08:12:07
|
<icinga-wm>
|
PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
|
2023-01-23 08:12:27
|
<icinga-wm>
|
PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
|
2023-01-23 08:12:29
|
<wikibugs>
|
(Merged) jenkins-bot: Tweaks for new heading HTML structure [extensions/DiscussionTools] (wmf/1.40.0-wmf.19) - https://gerrit.wikimedia.org/r/882174 (https://phabricator.wikimedia.org/T327328) (owner: Bartosz Dziewoński)
|
2023-01-23 08:12:47
|
<logmsgbot>
|
!log ladsgroup@deploy1002 Started scap: Backport for [[gerrit:882174|Tweaks for new heading HTML structure (T327328 T327469)]]
|
2023-01-23 08:12:52
|
<stashbot>
|
T327469: Subscribe buttons/links are displayed out of place due to new heading HTML structure - https://phabricator.wikimedia.org/T327469
|
2023-01-23 08:12:52
|
<stashbot>
|
T327328: Highlight skips the topic container for new topics, which looks odd - https://phabricator.wikimedia.org/T327328
|
2023-01-23 08:13:57
|
<wikibugs>
|
(CR) Muehlenhoff: "You also need to remove profile::idp::client:httpd from profile::racktables, then it will work." [puppet] - https://gerrit.wikimedia.org/r/881697 (https://phabricator.wikimedia.org/T327405) (owner: Dzahn)
|
2023-01-23 08:14:15
|
<icinga-wm>
|
PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
|
2023-01-23 08:14:23
|
<taavi>
|
_joe_: sirenbot's +O was removed by ircservserv-wm_ with the last sync as that wasn't granted via its configuration
|
2023-01-23 08:14:41
|
<_joe_>
|
taavi: yeah I just saw, I thought it was
|
2023-01-23 08:14:51
|
<_joe_>
|
I remember someone writing a patch, I assumed it was merged
|
2023-01-23 08:15:03
|
<_joe_>
|
I'll fix it once I'm done writing docs
|
2023-01-23 08:15:49
|
<taavi>
|
yeah, it has +o not +O
|
2023-01-23 08:16:49
|
<icinga-wm>
|
RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 49419 bytes in 0.061 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
|
2023-01-23 08:17:09
|
<icinga-wm>
|
RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.943 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
|
2023-01-23 08:17:19
|
<icinga-wm>
|
RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Fri 21 Apr 2023 05:11:22 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
|
2023-01-23 08:17:23
|
<wikibugs>
|
(CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/881902 (https://phabricator.wikimedia.org/T228730) (owner: BCornwall)
|
2023-01-23 08:19:03
|
<wikibugs>
|
(CR) Majavah: [V: +1] ldap: move ssh-key-ldap-lookup directly to ssh module (1 comment) [puppet] - https://gerrit.wikimedia.org/r/877964 (owner: Majavah)
|
2023-01-23 08:22:33
|
<logmsgbot>
|
!log ladsgroup@deploy1002 ladsgroup and matmarex: Backport for [[gerrit:882174|Tweaks for new heading HTML structure (T327328 T327469)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
|
2023-01-23 08:22:38
|
<stashbot>
|
T327469: Subscribe buttons/links are displayed out of place due to new heading HTML structure - https://phabricator.wikimedia.org/T327469
|
2023-01-23 08:22:38
|
<stashbot>
|
T327328: Highlight skips the topic container for new topics, which looks odd - https://phabricator.wikimedia.org/T327328
|
2023-01-23 08:22:42
|
<Amir1>
|
MatmaRex: it's in mwdebug now
|
2023-01-23 08:23:21
|
<MatmaRex>
|
Amir1: works as expected
|
2023-01-23 08:23:48
|
<Amir1>
|
deploying
|
2023-01-23 08:25:03
|
<wikibugs>
|
(PS1) Muehlenhoff: Remove openldap_corp role from ldap-corp* [puppet] - https://gerrit.wikimedia.org/r/882573 (https://phabricator.wikimedia.org/T323820)
|
2023-01-23 08:25:41
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43224 and previous config saved to /var/cache/conftool/dbconfig/20230123-082540-root.json
|
2023-01-23 08:25:51
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43225 and previous config saved to /var/cache/conftool/dbconfig/20230123-082550-root.json
|
2023-01-23 08:30:00
|
<logmsgbot>
|
!log ladsgroup@deploy1002 Finished scap: Backport for [[gerrit:882174|Tweaks for new heading HTML structure (T327328 T327469)]] (duration: 17m 12s)
|
2023-01-23 08:30:05
|
<stashbot>
|
T327469: Subscribe buttons/links are displayed out of place due to new heading HTML structure - https://phabricator.wikimedia.org/T327469
|
2023-01-23 08:30:05
|
<stashbot>
|
T327328: Highlight skips the topic container for new topics, which looks odd - https://phabricator.wikimedia.org/T327328
|
2023-01-23 08:30:08
|
<Amir1>
|
MatmaRex: done
|
2023-01-23 08:30:36
|
<MatmaRex>
|
thanks Amir1!
|
2023-01-23 08:33:51
|
<wikibugs>
|
(CR) Muehlenhoff: [C: +2] Remove openldap_corp role from ldap-corp* [puppet] - https://gerrit.wikimedia.org/r/882573 (https://phabricator.wikimedia.org/T323820) (owner: Muehlenhoff)
|
2023-01-23 08:34:33
|
<wikibugs>
|
(CR) Zabe: [C: +2] Remove oversight group from privileged groups [mediawiki-config] - https://gerrit.wikimedia.org/r/882217 (https://phabricator.wikimedia.org/T112147) (owner: Zabe)
|
2023-01-23 08:35:25
|
<wikibugs>
|
(Merged) jenkins-bot: Remove oversight group from privileged groups [mediawiki-config] - https://gerrit.wikimedia.org/r/882217 (https://phabricator.wikimedia.org/T112147) (owner: Zabe)
|
2023-01-23 08:36:19
|
<logmsgbot>
|
!log ayounsi@deploy1002 Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
|
2023-01-23 08:36:37
|
<wikibugs>
|
(PS1) Zabe: Start reading from cuc_comment_id on wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/882577 (https://phabricator.wikimedia.org/T233004)
|
2023-01-23 08:36:53
|
<wikibugs>
|
(CR) Zabe: [C: +2] Start reading from cuc_comment_id on wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/882577 (https://phabricator.wikimedia.org/T233004) (owner: Zabe)
|
2023-01-23 08:37:28
|
<logmsgbot>
|
!log ayounsi@deploy1002 Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 01m 08s)
|
2023-01-23 08:37:37
|
<wikibugs>
|
(Merged) jenkins-bot: Start reading from cuc_comment_id on wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/882577 (https://phabricator.wikimedia.org/T233004) (owner: Zabe)
|
2023-01-23 08:37:56
|
<logmsgbot>
|
!log zabe@deploy1002 Started scap: Backport for [[gerrit:882217|Remove oversight group from privileged groups (T112147)]], [[gerrit:882577|Start reading from cuc_comment_id on wikidatawiki (T233004)]]
|
2023-01-23 08:38:01
|
<stashbot>
|
T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
|
2023-01-23 08:38:01
|
<stashbot>
|
T112147: Rename the oversight group on WMF projects to the MediaWiki standard (whatever that is) - https://phabricator.wikimedia.org/T112147
|
2023-01-23 08:39:37
|
<logmsgbot>
|
!log zabe@deploy1002 zabe: Backport for [[gerrit:882217|Remove oversight group from privileged groups (T112147)]], [[gerrit:882577|Start reading from cuc_comment_id on wikidatawiki (T233004)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
|
2023-01-23 08:40:46
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43226 and previous config saved to /var/cache/conftool/dbconfig/20230123-084045-root.json
|
2023-01-23 08:40:56
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43227 and previous config saved to /var/cache/conftool/dbconfig/20230123-084055-root.json
|
2023-01-23 08:42:40
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'Add db1206 to vslow and dump group T326669', diff saved to https://phabricator.wikimedia.org/P43228 and previous config saved to /var/cache/conftool/dbconfig/20230123-084239-marostegui.json
|
2023-01-23 08:42:43
|
<stashbot>
|
T326669: Productionize db1206-db1225 - https://phabricator.wikimedia.org/T326669
|
2023-01-23 08:43:27
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'Add db1206 to vslow and dump group T326669', diff saved to https://phabricator.wikimedia.org/P43229 and previous config saved to /var/cache/conftool/dbconfig/20230123-084326-marostegui.json
|
2023-01-23 08:45:44
|
<logmsgbot>
|
!log zabe@deploy1002 Finished scap: Backport for [[gerrit:882217|Remove oversight group from privileged groups (T112147)]], [[gerrit:882577|Start reading from cuc_comment_id on wikidatawiki (T233004)]] (duration: 07m 48s)
|
2023-01-23 08:45:49
|
<stashbot>
|
T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
|
2023-01-23 08:45:49
|
<stashbot>
|
T112147: Rename the oversight group on WMF projects to the MediaWiki standard (whatever that is) - https://phabricator.wikimedia.org/T112147
|
2023-01-23 08:46:22
|
<logmsgbot>
|
!log volans@cumin1001 START - Cookbook sre.dns.netbox
|
2023-01-23 08:47:35
|
<wikibugs>
|
(PS1) Marostegui: db1106: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/882578 (https://phabricator.wikimedia.org/T327616)
|
2023-01-23 08:48:28
|
<wikibugs>
|
(CR) Marostegui: [C: +2] db1106: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/882578 (https://phabricator.wikimedia.org/T327616) (owner: Marostegui)
|
2023-01-23 08:48:37
|
<logmsgbot>
|
!log volans@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
|
2023-01-23 08:49:37
|
<logmsgbot>
|
!log volans@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
|
2023-01-23 08:49:37
|
<logmsgbot>
|
!log volans@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
|
2023-01-23 08:52:01
|
<_joe_>
|
taavi: do you know how I give +O to a user via ircservserv? The meta page has nothing, so checking before I read the sources
|
2023-01-23 08:52:41
|
<taavi>
|
_joe_: not sure, but I wouldn't be surprised if there is not an option for that atm
|
2023-01-23 08:53:53
|
<_joe_>
|
yeah https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia/irc/ircservserv/+/refs/heads/master/src/channel.rs
|
2023-01-23 08:54:37
|
<_joe_>
|
so yeah I guess I'll just make sirenbot ask chanserv for permissions where needed instead
|
2023-01-23 09:07:52
|
<taavi>
|
_joe_: one option would be to grant it +t via the op rule, and then use `PRIVMSG ChanServ :TOPIC foo` instead of `TOPIC :foo` directly
|
2023-01-23 09:15:36
|
<wikibugs>
|
('CR) ''Filippo Giunchedi: [C: ''+1] Clarify ecs.version field format in docs [software/ecs] - ''https://gerrit.wikimedia.org/r/881809 (https://phabricator.wikimedia.org/T292585) (owner: ''Cwhite)'
|
2023-01-23 09:16:12
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
|
2023-01-23 09:16:14
|
<wikibugs>
|
('CR) ''Filippo Giunchedi: [C: ''+1] logstash: enable filters for ecs 1.11.0 [puppet] - ''https://gerrit.wikimedia.org/r/881812 (https://phabricator.wikimedia.org/T326794) (owner: ''Cwhite)'
|
2023-01-23 09:16:14
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
|
2023-01-23 09:16:35
|
<wikibugs>
|
('CR) ''Filippo Giunchedi: [C: ''+1] conftool-data: add logstash[12]032 to kibana7 backend [puppet] - ''https://gerrit.wikimedia.org/r/881813 (owner: ''Cwhite)'
|
2023-01-23 09:17:11
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
|
2023-01-23 09:17:13
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
|
2023-01-23 09:19:59
|
<wikibugs>
|
('CR) ''Filippo Giunchedi: "The patch itself looks good, not +1'ing yet (I've left a comment in the task)" [puppet] - ''https://gerrit.wikimedia.org/r/881939 (https://phabricator.wikimedia.org/T318778) (owner: ''Andrea Denisse)'
|
2023-01-23 09:21:51
|
<wikibugs>
|
('PS5) ''Clément Goubert: mediawiki: Update ecs logging to 1.11.0 [deployment-charts] - ''https://gerrit.wikimedia.org/r/881877'
|
2023-01-23 09:21:59
|
<wikibugs>
|
('CR) ''Clément Goubert: mediawiki: Update ecs logging to 1.11.0 (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/881877 (owner: ''Clément Goubert)'
|
2023-01-23 09:26:15
|
<wikibugs>
|
('CR) ''Giuseppe Lavagetto: [C: ''-1] "I think there's a couple small mistakes but LGTM otherwise." [deployment-charts] - ''https://gerrit.wikimedia.org/r/881877 (owner: ''Clément Goubert)'
|
2023-01-23 09:29:13
|
<wikibugs>
|
('CR) ''Clément Goubert: mediawiki: Update ecs logging to 1.11.0 (''2 comments) [deployment-charts] - ''https://gerrit.wikimedia.org/r/881877 (owner: ''Clément Goubert)'
|
2023-01-23 09:32:00
|
<wikibugs>
|
('CR) ''Hashar: [C: ''+2] wm-checks-api: fix TypeScript noImplicitAny [software/gerrit] (deploy/wmf/stable-3.5) - ''https://gerrit.wikimedia.org/r/876212 (owner: ''Hashar)'
|
2023-01-23 09:32:17
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (''BTullis) a:''BTullis I will pick up this ticket, since I work with Jennifer on the Data Engineering team.'
|
2023-01-23 09:32:34
|
<wikibugs>
|
('PS6) ''Clément Goubert: mediawiki: Update ecs logging to 1.11.0 [deployment-charts] - ''https://gerrit.wikimedia.org/r/881877'
|
2023-01-23 09:32:58
|
<wikibugs>
|
('Merged) ''jenkins-bot: wm-checks-api: fix TypeScript noImplicitAny [software/gerrit] (deploy/wmf/stable-3.5) - ''https://gerrit.wikimedia.org/r/876212 (owner: ''Hashar)'
|
2023-01-23 09:33:27
|
<logmsgbot>
|
!log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
|
2023-01-23 09:35:28
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Data-Engineering: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (''BTullis)'
|
2023-01-23 09:40:00
|
<logmsgbot>
|
!log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
|
2023-01-23 09:41:13
|
<claime>
|
btullis: <3
|
2023-01-23 09:41:55
|
<btullis>
|
claime: Thanks :-)
|
2023-01-23 09:45:57
|
<wikibugs>
|
'SRE, ''LDAP-Access-Requests: Grant Access to wmf and ops for Jennifer Ebe - https://phabricator.wikimedia.org/T327255 (''BTullis)'
|
2023-01-23 09:46:33
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Data-Engineering: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (''BTullis)'
|
2023-01-23 09:47:19
|
<_joe_>
|
jouncebot: nowandnext
|
2023-01-23 09:47:19
|
<jouncebot>
|
No deployments scheduled for the next 1 hour(s) and 12 minute(s)
|
2023-01-23 09:47:19
|
<jouncebot>
|
In 1 hour(s) and 12 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T1100)
|
2023-01-23 09:47:27
|
<wikibugs>
|
'SRE, ''LDAP-Access-Requests: Grant Access to wmf and ops for Jennifer Ebe - https://phabricator.wikimedia.org/T327255 (''BTullis) Apologies for the confusion. This is a duplicate of {T327406} where we have collected the necessary approval.'
|
2023-01-23 09:54:39
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
|
2023-01-23 09:54:41
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
|
2023-01-23 09:55:41
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
|
2023-01-23 09:55:43
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
|
2023-01-23 09:58:42
|
<Amir1>
|
jouncebot: nowandnext
|
2023-01-23 09:58:42
|
<jouncebot>
|
No deployments scheduled for the next 1 hour(s) and 1 minute(s)
|
2023-01-23 09:58:42
|
<jouncebot>
|
In 1 hour(s) and 1 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T1100)
|
2023-01-23 09:58:50
|
<wikibugs>
|
('PS2) ''Ladsgroup: Remove Flow as default in techconductwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/877244'
|
2023-01-23 09:58:58
|
<wikibugs>
|
('CR) ''Ladsgroup: [C: ''+2] Remove Flow as default in techconductwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/877244 (owner: ''Ladsgroup)'
|
2023-01-23 09:59:14
|
<wikibugs>
|
('CR) ''TrainBranchBot: [C: ''+2] "Approved by ladsgroup@deploy1002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/877244 (owner: ''Ladsgroup)'
|
2023-01-23 09:59:41
|
<wikibugs>
|
('Merged) ''jenkins-bot: Remove Flow as default in techconductwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/877244 (owner: ''Ladsgroup)'
|
2023-01-23 09:59:56
|
<logmsgbot>
|
!log ladsgroup@deploy1002 Started scap: Backport for [[gerrit:877244|Remove Flow as default in techconductwiki]]
|
2023-01-23 10:01:36
|
<logmsgbot>
|
!log ladsgroup@deploy1002 ladsgroup: Backport for [[gerrit:877244|Remove Flow as default in techconductwiki]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
|
2023-01-23 10:03:01
|
<logmsgbot>
|
!log btullis@cumin1001 START - Cookbook sre.hosts.reimage for host an-tool1010.eqiad.wmnet with OS bullseye
|
2023-01-23 10:07:48
|
<logmsgbot>
|
!log ladsgroup@deploy1002 Finished scap: Backport for [[gerrit:877244|Remove Flow as default in techconductwiki]] (duration: 07m 51s)
|
2023-01-23 10:12:51
|
<wikibugs>
|
('PS9) ''Giuseppe Lavagetto: Start using the ClusterConfig class [mediawiki-config] - ''https://gerrit.wikimedia.org/r/756016'
|
2023-01-23 10:12:55
|
<wikibugs>
|
'SRE, ''Traffic, ''Traffic-Icebox, ''WMF-General-or-Unknown, and 2 others: Pages whose title ends with semicolon (;) are intermittently inaccessible (likely due to ATS) - https://phabricator.wikimedia.org/T238285 (''Vgutierrez) since this bug was reported back in 2019, our CDN stack has changed a little b...'
|
2023-01-23 10:13:02
|
<wikibugs>
|
('CR) ''Giuseppe Lavagetto: Start using the ClusterConfig class (''1 comment) [mediawiki-config] - ''https://gerrit.wikimedia.org/r/756016 (owner: ''Giuseppe Lavagetto)'
|
2023-01-23 10:16:33
|
<logmsgbot>
|
!log btullis@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on an-tool1010.eqiad.wmnet with reason: host reimage
|
2023-01-23 10:17:21
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Data-Engineering: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (''JEbe-WMF)'
|
2023-01-23 10:18:58
|
<logmsgbot>
|
!log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-tool1010.eqiad.wmnet with reason: host reimage
|
2023-01-23 10:21:02
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to Analytics Data for Ollie_Shotton - https://phabricator.wikimedia.org/T327187 (''Clement_Goubert)'
|
2023-01-23 10:21:54
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to Analytics Data for Ollie_Shotton - https://phabricator.wikimedia.org/T327187 (''Clement_Goubert) ''Open→''In progress a:''Clement_Goubert'
|
2023-01-23 10:23:14
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Data-Engineering: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (''JEbe-WMF)'
|
2023-01-23 10:28:24
|
<icinga-wm>
|
PROBLEM - Check systemd state on ms-be1069 is CRITICAL: CRITICAL - degraded: The following units failed: swift_rclone_sync.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
|
2023-01-23 10:30:54
|
<wikibugs>
|
('CR) ''Hnowlan: [C: ''+1] changeprop: add liftwing revscoring streams (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/881594 (https://phabricator.wikimedia.org/T327302) (owner: ''Elukey)'
|
2023-01-23 10:31:54
|
<elukey>
|
hnowlan: <3
|
2023-01-23 10:35:04
|
<wikibugs>
|
('PS1) ''Btullis: Grant production shell access to Jennifer Ebe [puppet] - ''https://gerrit.wikimedia.org/r/882596 (https://phabricator.wikimedia.org/T327406)'
|
2023-01-23 10:37:08
|
<logmsgbot>
|
!log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-tool1010.eqiad.wmnet with OS bullseye
|
2023-01-23 10:37:35
|
<wikibugs>
|
'SRE, ''Traffic-Icebox, ''WMF-General-or-Unknown, ''Performance-Team (Radar): Disable caching on the main page for anonymous users - https://phabricator.wikimedia.org/T119366 (''Theklan) @Legoktm could you help me with this at euwiki? Thanks!'
|
2023-01-23 10:39:42
|
<logmsgbot>
|
!log btullis@deploy1002 Installing scap version "4.33.1" for 1 hosts
|
2023-01-23 10:39:52
|
<logmsgbot>
|
!log btullis@deploy1002 Installation of scap version "4.33.1" completed for 1 hosts
|
2023-01-23 10:40:05
|
<logmsgbot>
|
!log btullis@deploy1002 Started deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided)
|
2023-01-23 10:40:24
|
<logmsgbot>
|
!log btullis@deploy1002 Finished deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided) (duration: 00m 20s)
|
2023-01-23 10:40:39
|
<logmsgbot>
|
!log btullis@deploy1002 Started deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided)
|
2023-01-23 10:40:44
|
<logmsgbot>
|
!log btullis@deploy1002 Finished deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided) (duration: 00m 06s)
|
2023-01-23 10:46:32
|
<wikibugs>
|
('CR) ''Elukey: [C: ''+2] changeprop: add liftwing revscoring streams [deployment-charts] - ''https://gerrit.wikimedia.org/r/881594 (https://phabricator.wikimedia.org/T327302) (owner: ''Elukey)'
|
2023-01-23 10:48:02
|
<vgutierrez>
|
!log rolling upgrade to HAProxy 2.4.20 on ulsfo
|
2023-01-23 10:48:03
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 10:48:32
|
<wikibugs>
|
('PS8) ''Elukey: helmfile.d: add a new test workflow for Lifting to changeprop's staging [deployment-charts] - ''https://gerrit.wikimedia.org/r/881664 (https://phabricator.wikimedia.org/T327302)'
|
2023-01-23 10:48:57
|
<wikibugs>
|
('CR) ''Elukey: helmfile.d: add a new test workflow for Lifting to changeprop's staging (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/881664 (https://phabricator.wikimedia.org/T327302) (owner: ''Elukey)'
|
2023-01-23 10:49:28
|
<wikibugs>
|
('CR) ''Jbond: [C: ''+1] "lgtm" [puppet] - ''https://gerrit.wikimedia.org/r/870901 (https://phabricator.wikimedia.org/T325397) (owner: ''JHathaway)'
|
2023-01-23 10:49:37
|
<wikibugs>
|
'SRE-tools, ''Infrastructure-Foundations, ''netops: Add network devices fingerprints to known_hosts - https://phabricator.wikimedia.org/T327643 (''ayounsi) p:''Triage→''Low'
|
2023-01-23 10:49:52
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2113 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43230 and previous config saved to /var/cache/conftool/dbconfig/20230123-104951-ladsgroup.json
|
2023-01-23 10:50:40
|
<wikibugs>
|
('PS1) ''Gerrit maintenance bot: mariadb: Promote db2129 to s6 master [puppet] - ''https://gerrit.wikimedia.org/r/882260 (https://phabricator.wikimedia.org/T327644)'
|
2023-01-23 10:52:05
|
<wikibugs>
|
('PS1) ''Btullis: Enable the two new cache types in superset production [puppet] - ''https://gerrit.wikimedia.org/r/882599 (https://phabricator.wikimedia.org/T323458)'
|
2023-01-23 10:52:47
|
<jinxer-wm>
|
(JobUnavailable) firing: (2) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
|
2023-01-23 10:54:11
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T327644
|
2023-01-23 10:54:15
|
<stashbot>
|
T327644: Switchover s6 master (db2114 -> db2129) - https://phabricator.wikimedia.org/T327644
|
2023-01-23 10:54:16
|
<wikibugs>
|
('PS2) ''Btullis: Grant production shell access to Jennifer Ebe [puppet] - ''https://gerrit.wikimedia.org/r/882596 (https://phabricator.wikimedia.org/T327406)'
|
2023-01-23 10:54:40
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T327644
|
2023-01-23 10:54:41
|
<wikibugs>
|
('CR) ''CI reject: [V: ''-1] Grant production shell access to Jennifer Ebe [puppet] - ''https://gerrit.wikimedia.org/r/882596 (https://phabricator.wikimedia.org/T327406) (owner: ''Btullis)'
|
2023-01-23 10:55:21
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'Set db2129 with weight 0 T327644', diff saved to https://phabricator.wikimedia.org/P43231 and previous config saved to /var/cache/conftool/dbconfig/20230123-105520-ladsgroup.json
|
2023-01-23 10:55:37
|
<XioNoX>
|
!log update management routers ACLs to add new bast hosts
|
2023-01-23 10:55:38
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 10:56:03
|
<wikibugs>
|
('PS2) ''Jbond: prometheus: decode utf-8 in puppet agent script [puppet] - ''https://gerrit.wikimedia.org/r/879957 (owner: ''Majavah)'
|
2023-01-23 10:56:20
|
<wikibugs>
|
('CR) ''Cathal Mooney: [C: ''+1] "LGTM!" [homer/public] - ''https://gerrit.wikimedia.org/r/881869 (https://phabricator.wikimedia.org/T273509) (owner: ''Muehlenhoff)'
|
2023-01-23 10:56:27
|
<wikibugs>
|
('PS3) ''Btullis: Grant production shell access to Jennifer Ebe [puppet] - ''https://gerrit.wikimedia.org/r/882596 (https://phabricator.wikimedia.org/T327406)'
|
2023-01-23 10:56:39
|
<wikibugs>
|
('CR) ''Btullis: [C: ''+2] Enable the two new cache types in superset production [puppet] - ''https://gerrit.wikimedia.org/r/882599 (https://phabricator.wikimedia.org/T323458) (owner: ''Btullis)'
|
2023-01-23 10:56:44
|
<wikibugs>
|
('CR) ''Cathal Mooney: [C: ''+1] "LGTM!" [homer/public] - ''https://gerrit.wikimedia.org/r/881837 (https://phabricator.wikimedia.org/T273509) (owner: ''Muehlenhoff)'
|
2023-01-23 10:57:55
|
<wikibugs>
|
('CR) ''Jbond: [C: ''+2] "lgtm will merge thanks" [puppet] - ''https://gerrit.wikimedia.org/r/879957 (owner: ''Majavah)'
|
2023-01-23 10:57:58
|
<wikibugs>
|
'SRE-tools, ''Infrastructure-Foundations, ''netops: Add network devices fingerprints to known_hosts - https://phabricator.wikimedia.org/T327643 (''Volans) This is a draft of a possible one-off script that can be run within homer's venv to gather the FQDNs to test, attempt a connection and grab the fingerpri...'
|
2023-01-23 11:00:05
|
<jouncebot>
|
Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T1100)
|
2023-01-23 11:01:33
|
<wikibugs>
|
('CR) ''Btullis: [C: ''+2] Grant production shell access to Jennifer Ebe [puppet] - ''https://gerrit.wikimedia.org/r/882596 (https://phabricator.wikimedia.org/T327406) (owner: ''Btullis)'
|
2023-01-23 11:01:42
|
<wikibugs>
|
'SRE, ''Infrastructure-Foundations, ''serviceops-collab, ''CAS-SSO, ''GitLab (Auth & Access): migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (''jbond) fyi we now have OIDC support in production, currently been tested by @SLyngshede-WMF'
|
2023-01-23 11:01:54
|
<wikibugs>
|
('CR) ''Muehlenhoff: [C: ''+2] Move ping offload from ping2002 to ping2003 in codfw [homer/public] - ''https://gerrit.wikimedia.org/r/881837 (https://phabricator.wikimedia.org/T273509) (owner: ''Muehlenhoff)'
|
2023-01-23 11:04:57
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2113 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43232 and previous config saved to /var/cache/conftool/dbconfig/20230123-110456-ladsgroup.json
|
2023-01-23 11:07:45
|
<wikibugs>
|
('CR) ''Jbond: [C: ''+1] "lgtm ping me on irc (after lunch as catching up on things) and i can deploy" [puppet] - ''https://gerrit.wikimedia.org/r/875315 (https://phabricator.wikimedia.org/T326125) (owner: ''Hashar)'
|
2023-01-23 11:07:52
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to Analytics Data for Ollie_Shotton - https://phabricator.wikimedia.org/T327187 (''Clement_Goubert) [] Merge access grant [] Create kerberos principal'
|
2023-01-23 11:08:50
|
<wikibugs>
|
('CR) ''Clément Goubert: "This change is ready for review." [puppet] - ''https://gerrit.wikimedia.org/r/882600 (https://phabricator.wikimedia.org/T327187) (owner: ''Clément Goubert)'
|
2023-01-23 11:10:11
|
<jinxer-wm>
|
(Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
|
2023-01-23 11:11:36
|
<wikibugs>
|
('CR) ''Jbond: hieradata: add wmcs-roots to clouddumps servers (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/879274 (owner: ''Majavah)'
|
2023-01-23 11:11:39
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2107.codfw.wmnet with reason: Maintenance
|
2023-01-23 11:11:41
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2107.codfw.wmnet with reason: Maintenance
|
2023-01-23 11:11:48
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43233 and previous config saved to /var/cache/conftool/dbconfig/20230123-111147-ladsgroup.json
|
2023-01-23 11:11:51
|
<stashbot>
|
T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
|
2023-01-23 11:12:20
|
<icinga-wm>
|
PROBLEM - cassandra-a SSL 10.192.32.101:7001 on sessionstore2002 is CRITICAL: SSL CRITICAL - Certificate sessionstore2002-a valid until 2023-02-22 11:12:16 +0000 (expires in 29 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
|
2023-01-23 11:12:24
|
<icinga-wm>
|
PROBLEM - cassandra-a SSL 10.192.16.95:7001 on sessionstore2001 is CRITICAL: SSL CRITICAL - Certificate sessionstore2001-a valid until 2023-02-22 11:12:13 +0000 (expires in 29 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
|
2023-01-23 11:12:40
|
<icinga-wm>
|
PROBLEM - cassandra-a SSL 10.64.32.85:7001 on sessionstore1002 is CRITICAL: SSL CRITICAL - Certificate sessionstore1002-a valid until 2023-02-22 11:12:08 +0000 (expires in 29 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
|
2023-01-23 11:12:50
|
<icinga-wm>
|
PROBLEM - cassandra-a SSL 10.64.48.178:7001 on sessionstore1003 is CRITICAL: SSL CRITICAL - Certificate sessionstore1003-a valid until 2023-02-22 11:12:10 +0000 (expires in 29 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
|
2023-01-23 11:13:36
|
<icinga-wm>
|
PROBLEM - cassandra-a SSL 10.64.0.144:7001 on sessionstore1001 is CRITICAL: SSL CRITICAL - Certificate sessionstore1001-a valid until 2023-02-22 11:12:05 +0000 (expires in 29 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
|
2023-01-23 11:13:48
|
<icinga-wm>
|
PROBLEM - cassandra-a SSL 10.192.48.132:7001 on sessionstore2003 is CRITICAL: SSL CRITICAL - Certificate sessionstore2003-a valid until 2023-02-22 11:12:18 +0000 (expires in 29 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
|
2023-01-23 11:15:12
|
<wikibugs>
|
('PS1) ''Vgutierrez: acme-chief: Restrict challenge type to valid ones [puppet] - ''https://gerrit.wikimedia.org/r/882602 (https://phabricator.wikimedia.org/T326942)'
|
2023-01-23 11:16:21
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to WMF Production for Kavitha Appakayala - https://phabricator.wikimedia.org/T327450 (''Clement_Goubert) ''Open→''In progress a:''Clement_Goubert Hi @Kappakayala, Please read and sign the [[ https://phabricator.wikimedia.org/L3 | Acknowledgement of Wikimed...'
|
2023-01-23 11:16:38
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to WMF Production for Kavitha Appakayala - https://phabricator.wikimedia.org/T327450 (''Clement_Goubert)'
|
2023-01-23 11:16:41
|
<wikibugs>
|
('CR) ''Vgutierrez: [V: ''+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39202/console" [puppet] - ''https://gerrit.wikimedia.org/r/882602 (https://phabricator.wikimedia.org/T326942) (owner: ''Vgutierrez)'
|
2023-01-23 11:17:47
|
<Amir1>
|
!log Starting s6 codfw failover from db2114 to db2129 - T327644
|
2023-01-23 11:17:50
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 11:17:51
|
<stashbot>
|
T327644: Switchover s6 master (db2114 -> db2129) - https://phabricator.wikimedia.org/T327644
|
2023-01-23 11:18:13
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'Promote db2129 to s6 primary T327644', diff saved to https://phabricator.wikimedia.org/P43234 and previous config saved to /var/cache/conftool/dbconfig/20230123-111813-ladsgroup.json
|
2023-01-23 11:18:30
|
<wikibugs>
|
('PS2) ''Ladsgroup: mariadb: Promote db2129 to s6 master [puppet] - ''https://gerrit.wikimedia.org/r/882260 (https://phabricator.wikimedia.org/T327644) (owner: ''Gerrit maintenance bot)'
|
2023-01-23 11:18:42
|
<wikibugs>
|
('CR) ''Ladsgroup: [V: ''+2 C: ''+2] mariadb: Promote db2129 to s6 master [puppet] - ''https://gerrit.wikimedia.org/r/882260 (https://phabricator.wikimedia.org/T327644) (owner: ''Gerrit maintenance bot)'
|
2023-01-23 11:19:14
|
<wikibugs>
|
('PS1) ''Cathal Mooney: Remove atlas-ulsfo from cr-border-in.pol as it's not live [homer/public] - ''https://gerrit.wikimedia.org/r/882605'
|
2023-01-23 11:19:31
|
<wikibugs>
|
('CR) ''Jbond: [C: ''+1] "lgtm" [puppet] - ''https://gerrit.wikimedia.org/r/882602 (https://phabricator.wikimedia.org/T326942) (owner: ''Vgutierrez)'
|
2023-01-23 11:20:02
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2113 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43235 and previous config saved to /var/cache/conftool/dbconfig/20230123-112001-ladsgroup.json
|
2023-01-23 11:21:35
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depool db2114 T327644', diff saved to https://phabricator.wikimedia.org/P43236 and previous config saved to /var/cache/conftool/dbconfig/20230123-112134-ladsgroup.json
|
2023-01-23 11:22:31
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
|
2023-01-23 11:22:33
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
|
2023-01-23 11:22:38
|
<wikibugs>
|
('PS2) ''Cathal Mooney: Remove atlas-ulsfo from cr-border-in.pol as it's not live [homer/public] - ''https://gerrit.wikimedia.org/r/882605'
|
2023-01-23 11:23:48
|
<wikibugs>
|
('PS3) ''Cathal Mooney: Remove atlas-ulsfo from cr-border-in.pol as it's not live [homer/public] - ''https://gerrit.wikimedia.org/r/882605'
|
2023-01-23 11:24:48
|
<wikibugs>
|
('CR) ''Muehlenhoff: [C: ''+1] "Looks good!" [puppet] - ''https://gerrit.wikimedia.org/r/882602 (https://phabricator.wikimedia.org/T326942) (owner: ''Vgutierrez)'
|
2023-01-23 11:24:59
|
<wikibugs>
|
('CR) ''Vgutierrez: [V: ''+1 C: ''+2] acme-chief: Restrict challenge type to valid ones [puppet] - ''https://gerrit.wikimedia.org/r/882602 (https://phabricator.wikimedia.org/T326942) (owner: ''Vgutierrez)'
|
2023-01-23 11:27:35
|
<wikibugs>
|
('PS1) ''Giuseppe Lavagetto: flink-app: use proper json [deployment-charts] - ''https://gerrit.wikimedia.org/r/882612'
|
2023-01-23 11:28:14
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Data-Engineering, ''Patch-For-Review: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (''BTullis) I have merged the changes to `data.yaml` so Jennifer should now have production shell access and access to the...'
|
2023-01-23 11:28:18
|
<wikibugs>
|
('PS1) ''Clément Goubert: admin: Grant Muhammad Jaziraly access to analytics data [puppet] - ''https://gerrit.wikimedia.org/r/882613 (https://phabricator.wikimedia.org/T327172)'
|
2023-01-23 11:28:55
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to Analytics Data for Muhammad Jaziraly - https://phabricator.wikimedia.org/T327172 (''Clement_Goubert) ''Open→''In progress a:''Clement_Goubert'
|
2023-01-23 11:29:33
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to Analytics Data for Muhammad Jaziraly - https://phabricator.wikimedia.org/T327172 (''Clement_Goubert) [] OOB SSH key validation [] Merge access grant [] Create kerberos principal'
|
2023-01-23 11:31:37
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
|
2023-01-23 11:31:39
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
|
2023-01-23 11:32:50
|
<wikibugs>
|
('CR) ''Michael Große: [C: ''-1] "This change should only be deployed after it was greenlit by the wmde-internal stage-gate meeting (scheduled on Tuesday)" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882615 (https://phabricator.wikimedia.org/T324999) (owner: ''Michael Große)'
|
2023-01-23 11:33:34
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to WMF Production for Kavitha Appakayala - https://phabricator.wikimedia.org/T327450 (''Clement_Goubert) ''In progress→''Resolved'
|
2023-01-23 11:33:59
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Data-Engineering: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (''BTullis) The work is completed. I'll work with @JEbe-WMF to verify access.'
|
2023-01-23 11:34:21
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Data-Engineering: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (''BTullis) ''Open→''Resolved p:''Triage→''Medium'
|
2023-01-23 11:35:07
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2113 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43239 and previous config saved to /var/cache/conftool/dbconfig/20230123-113506-ladsgroup.json
|
2023-01-23 11:35:36
|
<icinga-wm>
|
ACKNOWLEDGEMENT - Check systemd state on ms-be1069 is CRITICAL: CRITICAL - degraded: The following units failed: swift_rclone_sync.service MVernon https://phabricator.wikimedia.org/T327253 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
|
2023-01-23 11:35:39
|
<wikibugs>
|
('CR) ''Jbond: [C: ''+1] "lgtm" [puppet] - ''https://gerrit.wikimedia.org/r/881598 (owner: ''Muehlenhoff)'
|
2023-01-23 11:37:14
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to analytics-privatedata-users for Taavi - https://phabricator.wikimedia.org/T327013 (''Clement_Goubert) ''Open→''In progress a:''Clement_Goubert'
|
2023-01-23 11:37:16
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to analytics-privatedata-users for Taavi - https://phabricator.wikimedia.org/T327013 (''Clement_Goubert) @odimitrijevic @Ottomata Can I get your approval on this please?'
|
2023-01-23 11:40:32
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to analytics-privatedata-users & analytics-product-users for Hxi-ctr - https://phabricator.wikimedia.org/T325004 (''Clement_Goubert) @HXi-WMF, could you please confirm that we can proceed with the account renaming?'
|
2023-01-23 11:45:49
|
<wikibugs>
|
('CR) ''Ayounsi: [C: ''+1] Remove atlas-ulsfo from cr-border-in.pol as it's not live [homer/public] - ''https://gerrit.wikimedia.org/r/882605 (owner: ''Cathal Mooney)'
|
2023-01-23 11:47:00
|
<wikibugs>
|
('CR) ''Jbond: [C: ''+1] "lgtm" [software/pywmflib] - ''https://gerrit.wikimedia.org/r/881649 (https://phabricator.wikimedia.org/T327408) (owner: ''Volans)'
|
2023-01-23 11:47:10
|
<wikibugs>
|
('CR) ''Cathal Mooney: [C: ''+2] Remove atlas-ulsfo from cr-border-in.pol as it's not live [homer/public] - ''https://gerrit.wikimedia.org/r/882605 (owner: ''Cathal Mooney)'
|
2023-01-23 11:47:48
|
<wikibugs>
|
('CR) ''Jbond: [C: ''+1] "lgtm" [software/pywmflib] - ''https://gerrit.wikimedia.org/r/881650 (owner: ''Volans)'
|
2023-01-23 11:48:18
|
<wikibugs>
|
('Merged) ''jenkins-bot: Remove atlas-ulsfo from cr-border-in.pol as it's not live [homer/public] - ''https://gerrit.wikimedia.org/r/882605 (owner: ''Cathal Mooney)'
|
2023-01-23 11:48:57
|
<wikibugs>
|
('CR) ''Clément Goubert: admin/canary_appserver: add group of users allowed to disable puppet (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/879147 (https://phabricator.wikimedia.org/T305979) (owner: ''Dzahn)'
|
2023-01-23 11:50:50
|
<wikibugs>
|
('CR) ''ArielGlenn: [C: ''+1] "LGTM but per irc conversation WMCS folks should really give the thumbs up" [puppet] - ''https://gerrit.wikimedia.org/r/881386 (https://phabricator.wikimedia.org/T135991) (owner: ''Muehlenhoff)'
|
2023-01-23 11:51:06
|
<wikibugs>
|
('CR) ''ArielGlenn: [C: ''+1] "LGTM but per irc conversation WMCS folks should really give the thumbs up" [puppet] - ''https://gerrit.wikimedia.org/r/881393 (https://phabricator.wikimedia.org/T135991) (owner: ''Muehlenhoff)'
|
2023-01-23 11:51:21
|
<wikibugs>
|
('CR) ''ArielGlenn: [C: ''+1] "LGTM but per irc conversation WMCS folks should really give the thumbs up" [puppet] - ''https://gerrit.wikimedia.org/r/881399 (https://phabricator.wikimedia.org/T135991) (owner: ''Muehlenhoff)'
|
2023-01-23 11:51:40
|
<wikibugs>
|
('CR) ''ArielGlenn: [C: ''+1] "LGTM but per irc conversation WMCS folks should really give the thumbs up" [puppet] - ''https://gerrit.wikimedia.org/r/881393 (https://phabricator.wikimedia.org/T135991) (owner: ''Muehlenhoff)'
|
2023-01-23 11:51:57
|
<wikibugs>
|
('CR) ''ArielGlenn: [C: ''+1] "LGTM but per irc conversation WMCS folks should really give the thumbs up" [puppet] - ''https://gerrit.wikimedia.org/r/881413 (https://phabricator.wikimedia.org/T135991) (owner: ''Muehlenhoff)'
|
2023-01-23 11:52:29
|
<wikibugs>
|
('CR) ''Jbond: [C: ''+1] "lgtm" [puppet] - ''https://gerrit.wikimedia.org/r/868703 (https://phabricator.wikimedia.org/T308013) (owner: ''Muehlenhoff)'
|
2023-01-23 11:55:35
|
<wikibugs>
|
'SRE, ''LDAP-Access-Requests: Grant Access to Wmf group for MShilova - https://phabricator.wikimedia.org/T327546 (''Clement_Goubert) ''Open→''In progress a:''Clement_Goubert'
|
2023-01-23 11:56:10
|
<wikibugs>
|
'SRE-swift-storage: >=27k objects listed in swift containers but not extant - https://phabricator.wikimedia.org/T327253 (''MatthewVernon) The timer job ran this morning, with our less picky settings, and ended thus: ` [...] Jan 23 10:24:35 ms-be1069 swift-rclone-sync[1539164]: ERROR : wikipedia-de-local-public....'
|
2023-01-23 11:57:06
|
<marostegui>
|
!log Reboot db2132 (m1 codfw master)
|
2023-01-23 11:57:08
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 11:57:29
|
<wikibugs>
|
('CR) ''Jbond: Fix xihua's account (''2 comments) [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 11:57:49
|
<marostegui>
|
!log dbmaint Reboot db2132 (m1 codfw master)
|
2023-01-23 11:57:51
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 11:58:22
|
<marostegui>
|
!log dbmaint Reboot db2133 (m2 codfw master)
|
2023-01-23 11:58:23
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 11:59:33
|
<wikibugs>
|
('CR) ''Majavah: hieradata: add wmcs-roots to clouddumps servers (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/879274 (owner: ''Majavah)'
|
2023-01-23 12:00:13
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43241 and previous config saved to /var/cache/conftool/dbconfig/20230123-120012-ladsgroup.json
|
2023-01-23 12:00:16
|
<stashbot>
|
T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
|
2023-01-23 12:03:15
|
<icinga-wm>
|
PROBLEM - haproxy failover on dbproxy2001 is CRITICAL: CRITICAL check_failover servers up 2 down 1: https://wikitech.wikimedia.org/wiki/HAProxy
|
2023-01-23 12:03:24
|
<wikibugs>
|
('CR) ''Jbond: openstack: encapi: create parent directories for files (''2 comments) [puppet] - ''https://gerrit.wikimedia.org/r/881711 (owner: ''Majavah)'
|
2023-01-23 12:04:15
|
<icinga-wm>
|
PROBLEM - haproxy failover on dbproxy2002 is CRITICAL: CRITICAL check_failover servers up 2 down 1: https://wikitech.wikimedia.org/wiki/HAProxy
|
2023-01-23 12:05:08
|
<Emperor>
|
!log removing /usr/local/bin/prometheus-puppet-agent-stats from prometheus crontab on snapshot1014
|
2023-01-23 12:05:09
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 12:06:21
|
<wikibugs>
|
('CR) ''Clément Goubert: Fix xihua's account (''2 comments) [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 12:06:39
|
<marostegui>
|
!log dbmaint Reboot db2134 (m3 codfw master)
|
2023-01-23 12:06:40
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 12:06:42
|
<marostegui>
|
!log dbmaint Reboot db2135 (m5 codfw master)
|
2023-01-23 12:06:43
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 12:07:27
|
<icinga-wm>
|
RECOVERY - haproxy failover on dbproxy2002 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
|
2023-01-23 12:08:05
|
<icinga-wm>
|
RECOVERY - haproxy failover on dbproxy2001 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
|
2023-01-23 12:08:43
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
|
2023-01-23 12:08:45
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
|
2023-01-23 12:10:08
|
<wikibugs>
|
('CR) ''Clément Goubert: Fix xihua's account (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 12:10:09
|
<icinga-wm>
|
PROBLEM - haproxy failover on dbproxy2004 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy
|
2023-01-23 12:11:18
|
<wikibugs>
|
('PS2) ''Clément Goubert: Fix xihua's account [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 12:11:44
|
<wikibugs>
|
('CR) ''Jbond: P:gitlab: manage gitlab with gitlab module (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/684487 (owner: ''Jbond)'
|
2023-01-23 12:11:45
|
<icinga-wm>
|
RECOVERY - haproxy failover on dbproxy2004 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
|
2023-01-23 12:11:52
|
<wikibugs>
|
('Abandoned) ''Jbond: P:gitlab: manage gitlab with gitlab module [puppet] - ''https://gerrit.wikimedia.org/r/684487 (owner: ''Jbond)'
|
2023-01-23 12:12:02
|
<wikibugs>
|
('CR) ''CI reject: [V: ''-1] Fix xihua's account [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 12:15:19
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43242 and previous config saved to /var/cache/conftool/dbconfig/20230123-121519-ladsgroup.json
|
2023-01-23 12:22:25
|
<icinga-wm>
|
PROBLEM - haproxy failover on dbproxy2003 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy
|
2023-01-23 12:22:31
|
<icinga-wm>
|
PROBLEM - haproxy failover on dbproxy2001 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy
|
2023-01-23 12:22:37
|
<wikibugs>
|
('PS5) ''Daniel Kinzler: Increase PC writes from parsoid API to 10% [mediawiki-config] - ''https://gerrit.wikimedia.org/r/868127 (https://phabricator.wikimedia.org/T320534)'
|
2023-01-23 12:22:59
|
<icinga-wm>
|
PROBLEM - haproxy failover on dbproxy2004 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy
|
2023-01-23 12:22:59
|
<wikibugs>
|
'SRE-swift-storage: >=27k objects listed in swift containers but not extant - https://phabricator.wikimedia.org/T327253 (''MatthewVernon) Picking one of those to go log-diving (via the hacky `sudo cumin O:swift::proxy 'grep Symbol_Limes.png /var/log/swift/proxy-access.log || true'`) gets 3 hits, one of which is...'
|
2023-01-23 12:23:33
|
<icinga-wm>
|
PROBLEM - haproxy failover on dbproxy2002 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy
|
2023-01-23 12:24:01
|
<icinga-wm>
|
RECOVERY - haproxy failover on dbproxy2003 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
|
2023-01-23 12:24:07
|
<icinga-wm>
|
RECOVERY - haproxy failover on dbproxy2001 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
|
2023-01-23 12:24:07
|
<wikibugs>
|
('PS1) ''Clément Goubert: admin: Add Mariya Shilova to ldap_only_users [puppet] - ''https://gerrit.wikimedia.org/r/882644 (https://phabricator.wikimedia.org/T327546)'
|
2023-01-23 12:24:35
|
<icinga-wm>
|
RECOVERY - haproxy failover on dbproxy2004 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
|
2023-01-23 12:25:09
|
<icinga-wm>
|
RECOVERY - haproxy failover on dbproxy2002 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
|
2023-01-23 12:30:26
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43245 and previous config saved to /var/cache/conftool/dbconfig/20230123-123025-ladsgroup.json
|
2023-01-23 12:31:09
|
<wikibugs>
|
'SRE, ''Acme-chief, ''Traffic: Ci check for acme-chief changes - https://phabricator.wikimedia.org/T326942 (''Vgutierrez) p:''Triage→''Low this has been mitigated by https://gerrit.wikimedia.org/r/882602, invalid challenge types will now trigger a puppet compilation failure'
|
2023-01-23 12:36:07
|
<wikibugs>
|
('CR) ''Jbond: [C: ''+1] ci: move lists of contint and zuul hosts to hieradata/common.yaml (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/850593 (owner: ''Dzahn)'
|
2023-01-23 12:38:38
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to Analytics Data for Ollie_Shotton - https://phabricator.wikimedia.org/T327187 (''Clement_Goubert) p:''Triage→''Medium'
|
2023-01-23 12:38:41
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to WMF Production for Kavitha Appakayala - https://phabricator.wikimedia.org/T327450 (''Clement_Goubert) p:''Triage→''Medium'
|
2023-01-23 12:38:52
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to Analytics Data for Muhammad Jaziraly - https://phabricator.wikimedia.org/T327172 (''Clement_Goubert) p:''Triage→''Medium'
|
2023-01-23 12:39:35
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to analytics-privatedata-users for Taavi - https://phabricator.wikimedia.org/T327013 (''Clement_Goubert) p:''Triage→''Medium'
|
2023-01-23 12:41:23
|
<wikibugs>
|
'SRE, ''LDAP-Access-Requests, ''Patch-For-Review: Grant Access to Wmf group for MShilova - https://phabricator.wikimedia.org/T327546 (''Clement_Goubert) p:''Triage→''Medium [] Merge CR [] Grant LDAP group access'
|
2023-01-23 12:43:10
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to WMF Production for Kavitha Appakayala - https://phabricator.wikimedia.org/T327450 (''Clement_Goubert) ''Resolved→''In progress'
|
2023-01-23 12:43:38
|
<wikibugs>
|
('CR) ''Jbond: "thanks" [puppet] - ''https://gerrit.wikimedia.org/r/877964 (owner: ''Majavah)'
|
2023-01-23 12:45:32
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43246 and previous config saved to /var/cache/conftool/dbconfig/20230123-124532-ladsgroup.json
|
2023-01-23 12:45:36
|
<stashbot>
|
T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
|
2023-01-23 12:45:47
|
<wikibugs>
|
('CR) ''Hnowlan: [C: ''+1] "lgtm, one query" [software/thumbor-plugins] - ''https://gerrit.wikimedia.org/r/881909 (https://phabricator.wikimedia.org/T325811) (owner: ''Vlad.shapik)'
|
2023-01-23 12:50:42
|
<wikibugs>
|
('CR) ''Jelto: [C: ''+2] gitlab: stop using "latest" backup name [puppet] - ''https://gerrit.wikimedia.org/r/875309 (https://phabricator.wikimedia.org/T274463) (owner: ''Jelto)'
|
2023-01-23 12:51:11
|
<wikibugs>
|
('CR) ''Jbond: [C: ''+1] "lgtm" [puppet] - ''https://gerrit.wikimedia.org/r/882600 (https://phabricator.wikimedia.org/T327187) (owner: ''Clément Goubert)'
|
2023-01-23 12:51:53
|
<wikibugs>
|
('PS2) ''Majavah: ldap: move ssh-key-ldap-lookup directly to ssh module [puppet] - ''https://gerrit.wikimedia.org/r/877964'
|
2023-01-23 12:52:38
|
<wikibugs>
|
('CR) ''Jbond: [C: ''+1] "lgtm" [puppet] - ''https://gerrit.wikimedia.org/r/882613 (https://phabricator.wikimedia.org/T327172) (owner: ''Clément Goubert)'
|
2023-01-23 12:53:00
|
<wikibugs>
|
('CR) ''Majavah: ldap: move ssh-key-ldap-lookup directly to ssh module (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/877964 (owner: ''Majavah)'
|
2023-01-23 12:53:17
|
<wikibugs>
|
('CR) ''CI reject: [V: ''-1] ldap: move ssh-key-ldap-lookup directly to ssh module [puppet] - ''https://gerrit.wikimedia.org/r/877964 (owner: ''Majavah)'
|
2023-01-23 12:53:27
|
<wikibugs>
|
'SRE, ''observability, ''Performance-Team (Radar): Set up a statsv-like endpoint for Prometheus - https://phabricator.wikimedia.org/T180105 (''Clement_Goubert)'
|
2023-01-23 12:53:29
|
<wikibugs>
|
'SRE: Update Media dashboard in Grafana to use Prometheus metrics - https://phabricator.wikimedia.org/T193445 (''Clement_Goubert) ''Open→''Invalid The link in the task description 404s. Being bold and closing as Invalid, feel free to reopen with up to date information if needed.'
|
2023-01-23 12:55:46
|
<wikibugs>
|
('CR) ''Jbond: [C: ''-1] "i have added this as approval in the next IF meeting (today) will update after the meeting" [puppet] - ''https://gerrit.wikimedia.org/r/879147 (https://phabricator.wikimedia.org/T305979) (owner: ''Dzahn)'
|
2023-01-23 12:55:48
|
<wikibugs>
|
('CR) ''Clément Goubert: [C: ''+2] admin: Grant ollieshotton access to analytics data [puppet] - ''https://gerrit.wikimedia.org/r/882600 (https://phabricator.wikimedia.org/T327187) (owner: ''Clément Goubert)'
|
2023-01-23 12:56:38
|
<wikibugs>
|
('PS3) ''Majavah: ldap: move ssh-key-ldap-lookup directly to ssh module [puppet] - ''https://gerrit.wikimedia.org/r/877964'
|
2023-01-23 12:58:17
|
<wikibugs>
|
('CR) ''Majavah: [V: ''+1] "PCC SUCCESS (DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39204/console" [puppet] - ''https://gerrit.wikimedia.org/r/877964 (owner: ''Majavah)'
|
2023-01-23 12:59:47
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to Analytics Data for Ollie_Shotton - https://phabricator.wikimedia.org/T327187 (''Clement_Goubert) ''In progress→''Resolved @Ollie.Shotton_WMDE your access to the relevant groups has been granted. Please wait 30m (as of this comment) be...'
|
2023-01-23 13:02:03
|
<wikibugs>
|
('PS2) ''Clément Goubert: admin: Grant Muhammad Jaziraly access to analytics data [puppet] - ''https://gerrit.wikimedia.org/r/882613 (https://phabricator.wikimedia.org/T327172)'
|
2023-01-23 13:03:39
|
<wikibugs>
|
('CR) ''Jbond: Fix xihua's account (''2 comments) [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 13:04:52
|
<wikibugs>
|
('CR) ''Jbond: [C: ''+1] "lgtm" [puppet] - ''https://gerrit.wikimedia.org/r/882644 (https://phabricator.wikimedia.org/T327546) (owner: ''Clément Goubert)'
|
2023-01-23 13:04:56
|
<wikibugs>
|
('PS2) ''Clément Goubert: admin: Add Mariya Shilova to ldap_only_users [puppet] - ''https://gerrit.wikimedia.org/r/882644 (https://phabricator.wikimedia.org/T327546)'
|
2023-01-23 13:06:34
|
<wikibugs>
|
('CR) ''Brian Wolff: Force users with passwords shorter than 8 characters to change it (''1 comment) [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882232 (https://phabricator.wikimedia.org/T285151) (owner: ''Zabe)'
|
2023-01-23 13:06:56
|
<wikibugs>
|
('CR) ''Clément Goubert: [C: ''+2] admin: Add Mariya Shilova to ldap_only_users [puppet] - ''https://gerrit.wikimedia.org/r/882644 (https://phabricator.wikimedia.org/T327546) (owner: ''Clément Goubert)'
|
2023-01-23 13:08:58
|
<wikibugs>
|
('CR) ''Jbond: [V: ''+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39207/console" [puppet] - ''https://gerrit.wikimedia.org/r/877964 (owner: ''Majavah)'
|
2023-01-23 13:09:43
|
<wikibugs>
|
('CR) ''Jbond: [C: ''+2] "LGTM will merge thanks <3" [puppet] - ''https://gerrit.wikimedia.org/r/877964 (owner: ''Majavah)'
|
2023-01-23 13:14:55
|
<wikibugs>
|
('CR) ''Jaime Nuche: "PCC: https://puppet-compiler.wmflabs.org/output/860837/39206/" [puppet] - ''https://gerrit.wikimedia.org/r/860837 (https://phabricator.wikimedia.org/T323909) (owner: ''Jaime Nuche)'
|
2023-01-23 13:15:50
|
<wikibugs>
|
('PS3) ''Clément Goubert: admin: Grant Muhammad Jaziraly access to analytics data [puppet] - ''https://gerrit.wikimedia.org/r/882613 (https://phabricator.wikimedia.org/T327172)'
|
2023-01-23 13:16:17
|
<wikibugs>
|
'SRE, ''LDAP-Access-Requests, ''Patch-For-Review: Grant Access to Wmf group for MShilova - https://phabricator.wikimedia.org/T327546 (''Clement_Goubert) ''In progress→''Resolved @MShilova_WMF your access to the wmf group has been granted. Please wait 30m (as of this comment) before trying it out as the...'
|
2023-01-23 13:16:24
|
<wikibugs>
|
('PS4) ''Jbond: profile::performance: add a new profile for tweaking sysctl parameters [puppet] - ''https://gerrit.wikimedia.org/r/662932 (https://phabricator.wikimedia.org/T274230)'
|
2023-01-23 13:16:43
|
<wikibugs>
|
('CR) ''CI reject: [V: ''-1] profile::performance: add a new profile for tweaking sysctl parameters [puppet] - ''https://gerrit.wikimedia.org/r/662932 (https://phabricator.wikimedia.org/T274230) (owner: ''Jbond)'
|
2023-01-23 13:17:25
|
<wikibugs>
|
('CR) ''Jbond: profile::performance: add a new profile for tweaking sysctl parameters (''2 comments) [puppet] - ''https://gerrit.wikimedia.org/r/662932 (https://phabricator.wikimedia.org/T274230) (owner: ''Jbond)'
|
2023-01-23 13:18:36
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to Analytics Data for Ollie_Shotton - https://phabricator.wikimedia.org/T327187 (''Clement_Goubert) You should have received an email regarding Kerberos, you can follow the instructions on there to set your credentials. If you didn't, please...'
|
2023-01-23 13:19:30
|
<wikibugs>
|
'SRE, ''Traffic-Icebox, ''Patch-For-Review, ''User-MoritzMuehlenhoff: Create a generic network performance profile - https://phabricator.wikimedia.org/T274230 (''jbond) @BCornwall thanks for reviving this. i think that this ultimately stalled as there was a questions of wether it would be usefull. from...'
|
2023-01-23 13:20:17
|
<wikibugs>
|
('CR) ''Clément Goubert: [C: ''+2] admin: Grant Muhammad Jaziraly access to analytics data [puppet] - ''https://gerrit.wikimedia.org/r/882613 (https://phabricator.wikimedia.org/T327172) (owner: ''Clément Goubert)'
|
2023-01-23 13:22:29
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to Analytics Data for Muhammad Jaziraly - https://phabricator.wikimedia.org/T327172 (''Clement_Goubert) ''In progress→''Resolved @Muhammad_Yasser_Jazirahly_WMDE your access to the relevant groups has been granted. Please wait 30m (as of...'
|
2023-01-23 13:23:08
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to Analytics Data for Muhammad Jaziraly - https://phabricator.wikimedia.org/T327172 (''Muhammad_Yasser_Jazirahly_WMDE) Many thanks @Clement_Goubert'
|
2023-01-23 13:28:23
|
<icinga-wm>
|
RECOVERY - Check systemd state on grafana1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
|
2023-01-23 13:32:46
|
<wikibugs>
|
'SRE, ''Acme-chief, ''Traffic: Ci check for acme-chief changes - https://phabricator.wikimedia.org/T326942 (''Vgutierrez) ''Open→''Resolved a:''Vgutierrez'
|
2023-01-23 13:40:07
|
<wikibugs>
|
('CR) ''Ottomata: [C: ''+2] flink-app: use proper json [deployment-charts] - ''https://gerrit.wikimedia.org/r/882612 (owner: ''Giuseppe Lavagetto)'
|
2023-01-23 13:45:13
|
<wikibugs>
|
('Merged) ''jenkins-bot: flink-app: use proper json [deployment-charts] - ''https://gerrit.wikimedia.org/r/882612 (owner: ''Giuseppe Lavagetto)'
|
2023-01-23 13:57:47
|
<jinxer-wm>
|
(JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
|
2023-01-23 14:00:05
|
<jouncebot>
|
RoanKattouw, Lucas_WMDE, Urbanecm, awight, and TheresNoTime: May I have your attention please! UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T1400)
|
2023-01-23 14:00:05
|
<jouncebot>
|
sbailey, cirno, and Func: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
|
2023-01-23 14:00:10
|
<cirno>
|
o/
|
2023-01-23 14:00:18
|
<sbailey>
|
I am here
|
2023-01-23 14:00:42
|
<Func>
|
here
|
2023-01-23 14:01:24
|
<Winston_Sung[m]>
|
Hello, I would like to ask where to see the deployment status for the CX Server https://cxserver.wikimedia.org ( https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/cxserver ) for the bug fix https://gerrit.wikimedia.org/r/c/882173 ( https://phabricator.wikimedia.org/T129470 ). Thanks.
|
2023-01-23 14:01:46
|
<wikibugs>
|
('PS1) ''Jbond: admin: data_tests improve error messages and correct typos [puppet] - ''https://gerrit.wikimedia.org/r/882648'
|
2023-01-23 14:02:11
|
<wikibugs>
|
('PS1) ''Ottomata: Add to admin_ng/README.md on how to deploy limiting the release [deployment-charts] - ''https://gerrit.wikimedia.org/r/882649'
|
2023-01-23 14:03:30
|
<wikibugs>
|
('CR) ''Jbond: [C: ''+2] admin: data_tests improve error messages and correct typos [puppet] - ''https://gerrit.wikimedia.org/r/882648 (owner: ''Jbond)'
|
2023-01-23 14:05:12
|
<wikibugs>
|
('PS3) ''Jbond: Fix xihua's account [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 14:05:50
|
<wikibugs>
|
('CR) ''CI reject: [V: ''-1] Fix xihua's account [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 14:07:47
|
<jinxer-wm>
|
(JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
|
2023-01-23 14:09:39
|
<wikibugs>
|
('PS4) ''Jbond: Fix xihua's account [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 14:10:11
|
<wikibugs>
|
('CR) ''CI reject: [V: ''-1] Fix xihua's account [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 14:10:13
|
<taavi>
|
I can deploy in a few minutes
|
2023-01-23 14:10:30
|
<sbailey>
|
10-4
|
2023-01-23 14:10:35
|
<sbailey>
|
;-)
|
2023-01-23 14:12:27
|
<taavi>
|
Winston_Sung[m]: operations/deployment-charts.git
|
2023-01-23 14:12:58
|
<wikibugs>
|
('CR) ''Majavah: [C: ''+2] SpecialUserrights: Allow updating the expiry of user groups [core] (wmf/1.40.0-wmf.19) - ''https://gerrit.wikimedia.org/r/882179 (https://phabricator.wikimedia.org/T327605) (owner: ''Func)'
|
2023-01-23 14:14:25
|
<Winston_Sung[m]>
|
taavi: Thanks.
|
2023-01-23 14:14:29
|
<wikibugs>
|
('CR) ''TrainBranchBot: [C: ''+2] "Approved by taavi@deploy1002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882422 (https://phabricator.wikimedia.org/T323131) (owner: ''Stang)'
|
2023-01-23 14:14:33
|
<wikibugs>
|
('CR) ''TrainBranchBot: [C: ''+2] "Approved by taavi@deploy1002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882425 (https://phabricator.wikimedia.org/T327380) (owner: ''Stang)'
|
2023-01-23 14:14:40
|
<wikibugs>
|
('CR) ''Elukey: [C: ''+2] helmfile.d: add a new test workflow for Lifting to changeprop's staging [deployment-charts] - ''https://gerrit.wikimedia.org/r/881664 (https://phabricator.wikimedia.org/T327302) (owner: ''Elukey)'
|
2023-01-23 14:14:49
|
<wikibugs>
|
('PS2) ''Majavah: shnwikibooks: Add project logo [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882425 (https://phabricator.wikimedia.org/T327380) (owner: ''Stang)'
|
2023-01-23 14:14:53
|
<wikibugs>
|
('CR) ''Majavah: [C: ''+2] shnwikibooks: Add project logo [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882425 (https://phabricator.wikimedia.org/T327380) (owner: ''Stang)'
|
2023-01-23 14:15:18
|
<wikibugs>
|
('Merged) ''jenkins-bot: bnwikiquote: Update logo [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882422 (https://phabricator.wikimedia.org/T323131) (owner: ''Stang)'
|
2023-01-23 14:15:39
|
<wikibugs>
|
('Merged) ''jenkins-bot: shnwikibooks: Add project logo [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882425 (https://phabricator.wikimedia.org/T327380) (owner: ''Stang)'
|
2023-01-23 14:15:57
|
<taavi>
|
sbailey: was 880989 tested in beta or smaller wikis before being rolled out globally?
|
2023-01-23 14:16:04
|
<sbailey>
|
yes
|
2023-01-23 14:16:17
|
<logmsgbot>
|
!log taavi@deploy1002 Started scap: Backport for [[gerrit:882422|bnwikiquote: Update logo (T323131)]], [[gerrit:882425|shnwikibooks: Add project logo (T327380)]]
|
2023-01-23 14:16:24
|
<stashbot>
|
T323131: New localized logo for bn.wikquote - https://phabricator.wikimedia.org/T323131
|
2023-01-23 14:16:24
|
<stashbot>
|
T327380: Change Logo on shn.wikibooks.org - https://phabricator.wikimedia.org/T327380
|
2023-01-23 14:17:01
|
<wikibugs>
|
('CR) ''Alexandros Kosiaris: Fix xihua's account (''2 comments) [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 14:17:56
|
<logmsgbot>
|
!log taavi@deploy1002 taavi and stang: Backport for [[gerrit:882422|bnwikiquote: Update logo (T323131)]], [[gerrit:882425|shnwikibooks: Add project logo (T327380)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
|
2023-01-23 14:18:06
|
<taavi>
|
cirno: please test the logo patches
|
2023-01-23 14:18:10
|
<cirno>
|
looking
|
2023-01-23 14:18:55
|
<taavi>
|
!log mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=zhwiki pageassessments # T326387
|
2023-01-23 14:18:57
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 14:18:58
|
<stashbot>
|
T326387: Deploy PageAssessments to Chinese Wikipedia - https://phabricator.wikimedia.org/T326387
|
2023-01-23 14:19:35
|
<cirno>
|
taavi, both two looks good to me
|
2023-01-23 14:19:42
|
<taavi>
|
thanks, syncing
|
2023-01-23 14:19:46
|
<cirno>
|
*look
|
2023-01-23 14:20:05
|
<logmsgbot>
|
!log otto@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
|
2023-01-23 14:20:49
|
<logmsgbot>
|
!log otto@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
|
2023-01-23 14:22:10
|
<Winston_Sung[m]>
|
Is there any scheduled time to update cxserver or it depends on request?
|
2023-01-23 14:23:51
|
<wikibugs>
|
('PS5) ''Alexandros Kosiaris: Fix xihua's account [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004)'
|
2023-01-23 14:24:28
|
<wikibugs>
|
('PS4) ''Majavah: zhwiki: Install PageAssessments [mediawiki-config] - ''https://gerrit.wikimedia.org/r/876196 (https://phabricator.wikimedia.org/T326387) (owner: ''Stang)'
|
2023-01-23 14:24:30
|
<wikibugs>
|
('PS6) ''Jbond: Fix xihua's account [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 14:24:32
|
<wikibugs>
|
('PS1) ''Jbond: admin: Add check for duplicate uid's [puppet] - ''https://gerrit.wikimedia.org/r/882652'
|
2023-01-23 14:24:34
|
<wikibugs>
|
('CR) ''Majavah: [C: ''+2] zhwiki: Install PageAssessments [mediawiki-config] - ''https://gerrit.wikimedia.org/r/876196 (https://phabricator.wikimedia.org/T326387) (owner: ''Stang)'
|
2023-01-23 14:24:53
|
<wikibugs>
|
('CR) ''Muehlenhoff: [C: ''+2] Move ping offload from ping1002 to ping1003 in eqiad [homer/public] - ''https://gerrit.wikimedia.org/r/881869 (https://phabricator.wikimedia.org/T273509) (owner: ''Muehlenhoff)'
|
2023-01-23 14:25:10
|
<wikibugs>
|
('CR) ''CI reject: [V: ''-1] Fix xihua's account [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 14:25:12
|
<wikibugs>
|
('CR) ''CI reject: [V: ''-1] admin: Add check for duplicate uid's [puppet] - ''https://gerrit.wikimedia.org/r/882652 (owner: ''Jbond)'
|
2023-01-23 14:25:16
|
<wikibugs>
|
('Merged) ''jenkins-bot: zhwiki: Install PageAssessments [mediawiki-config] - ''https://gerrit.wikimedia.org/r/876196 (https://phabricator.wikimedia.org/T326387) (owner: ''Stang)'
|
2023-01-23 14:25:23
|
<logmsgbot>
|
!log elukey@deploy1002 helmfile [staging] START helmfile.d/services/changeprop: sync
|
2023-01-23 14:25:34
|
<logmsgbot>
|
!log elukey@deploy1002 helmfile [staging] DONE helmfile.d/services/changeprop: sync
|
2023-01-23 14:25:39
|
<logmsgbot>
|
!log taavi@deploy1002 Finished scap: Backport for [[gerrit:882422|bnwikiquote: Update logo (T323131)]], [[gerrit:882425|shnwikibooks: Add project logo (T327380)]] (duration: 09m 22s)
|
2023-01-23 14:25:44
|
<stashbot>
|
T323131: New localized logo for bn.wikquote - https://phabricator.wikimedia.org/T323131
|
2023-01-23 14:25:44
|
<stashbot>
|
T327380: Change Logo on shn.wikibooks.org - https://phabricator.wikimedia.org/T327380
|
2023-01-23 14:25:58
|
<logmsgbot>
|
!log taavi@deploy1002 Started scap: Backport for [[gerrit:876196|zhwiki: Install PageAssessments (T326387)]]
|
2023-01-23 14:26:01
|
<logmsgbot>
|
!log otto@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
|
2023-01-23 14:26:01
|
<stashbot>
|
T326387: Deploy PageAssessments to Chinese Wikipedia - https://phabricator.wikimedia.org/T326387
|
2023-01-23 14:26:09
|
<logmsgbot>
|
!log otto@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
|
2023-01-23 14:27:29
|
<wikibugs>
|
('CR) ''Alexandros Kosiaris: [C: ''-1] "Interestingly, I can not re-use the same uid (which actually makes sense) but also the fact that hpham and phamhi (both absented) have uid" [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 14:27:38
|
<logmsgbot>
|
!log taavi@deploy1002 stang and taavi: Backport for [[gerrit:876196|zhwiki: Install PageAssessments (T326387)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
|
2023-01-23 14:27:50
|
<taavi>
|
cirno: please test the pageassessments one
|
2023-01-23 14:27:58
|
<cirno>
|
looking
|
2023-01-23 14:30:16
|
<wikibugs>
|
('Merged) ''jenkins-bot: SpecialUserrights: Allow updating the expiry of user groups [core] (wmf/1.40.0-wmf.19) - ''https://gerrit.wikimedia.org/r/882179 (https://phabricator.wikimedia.org/T327605) (owner: ''Func)'
|
2023-01-23 14:31:12
|
<cirno>
|
taavi, the magic word "{{#assessment}}" starts working, and special page Special:PageAssessments exist, so LGTM
|
2023-01-23 14:31:32
|
<taavi>
|
thanks, syncing
|
2023-01-23 14:32:27
|
<cirno>
|
taavi, could you please flush the caches of two logos? thanks
|
2023-01-23 14:32:49
|
<taavi>
|
oh right, good point. give me a second
|
2023-01-23 14:33:00
|
<wikibugs>
|
('CR) ''Hashar: [C: ''-1] "Looks good, there is two minor issues though:" [puppet] - ''https://gerrit.wikimedia.org/r/860837 (https://phabricator.wikimedia.org/T323909) (owner: ''Jaime Nuche)'
|
2023-01-23 14:33:15
|
<cirno>
|
oops, only bnwikiquote is needed
|
2023-01-23 14:34:01
|
<taavi>
|
{{done}}
|
2023-01-23 14:36:03
|
<sbailey>
|
Is 880989 getting deployed? It has been in beta for over a month.
|
2023-01-23 14:36:43
|
<taavi>
|
sbailey: yes, I'm dealing with other patches at the moment, yours is still in the queue
|
2023-01-23 14:36:57
|
<wikibugs>
|
('PS1) ''Elukey: changeprop: fix uri in liftwing's template [deployment-charts] - ''https://gerrit.wikimedia.org/r/882654 (https://phabricator.wikimedia.org/T327302)'
|
2023-01-23 14:37:08
|
<sbailey>
|
thx, new to backport process
|
2023-01-23 14:37:22
|
<logmsgbot>
|
!log taavi@deploy1002 Finished scap: Backport for [[gerrit:876196|zhwiki: Install PageAssessments (T326387)]] (duration: 11m 24s)
|
2023-01-23 14:37:26
|
<stashbot>
|
T326387: Deploy PageAssessments to Chinese Wikipedia - https://phabricator.wikimedia.org/T326387
|
2023-01-23 14:37:36
|
<taavi>
|
Func: yours is up next
|
2023-01-23 14:37:45
|
<sbailey>
|
:)
|
2023-01-23 14:37:49
|
<Func>
|
ok
|
2023-01-23 14:37:52
|
<logmsgbot>
|
!log taavi@deploy1002 Started scap: Backport for [[gerrit:882179|SpecialUserrights: Allow updating the expiry of user groups (T327605)]]
|
2023-01-23 14:37:55
|
<stashbot>
|
T327605: Special:UserRights: changing an already set permission's expiry to any new value fails - https://phabricator.wikimedia.org/T327605
|
2023-01-23 14:39:30
|
<logmsgbot>
|
!log taavi@deploy1002 taavi and func: Backport for [[gerrit:882179|SpecialUserrights: Allow updating the expiry of user groups (T327605)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
|
2023-01-23 14:39:39
|
<taavi>
|
sbailey: ah, sorry. we deployers generally change the order to be as quick as possible. I had to ask a question about 882179 so I couldn't start with it, and I had already +2'd F.unc's patch to save time on the core CI and it had merged in the mean time so I need to do that before I can get to yours
|
2023-01-23 14:39:51
|
<taavi>
|
Func: can you test yours on a mwdebug server please?
|
2023-01-23 14:39:58
|
<Func>
|
I don't have sufficient rights to test on prod, but this simple patch should just work.
|
2023-01-23 14:40:25
|
<taavi>
|
sbailey: in the meantime: do you have the x-wikimedia-debug extension installed?
|
2023-01-23 14:40:41
|
<wikibugs>
|
('CR) ''Cathal Mooney: [C: ''+1] "LGTM! And TIL :)" [homer/public] - ''https://gerrit.wikimedia.org/r/877202 (https://phabricator.wikimedia.org/T325806) (owner: ''Ayounsi)'
|
2023-01-23 14:41:04
|
<sbailey>
|
thanks for the explanation, very appreciative of your comments. Happy to watch. Have more patches to backport in the coming weeks that are trickier, such as two data migration patches
|
2023-01-23 14:41:21
|
<taavi>
|
Func: ack. I gave it a quick test on testwiki just in case to not break stuff, works fine so deploying.
|
2023-01-23 14:42:16
|
<sbailey>
|
the x-wikimedia-debug extension will not help me verify this patch. I need to use Quarry and actually look at the error log and create pages with lint errors and see them show up in reports.
|
2023-01-23 14:42:20
|
<sukhe>
|
!log rolling out pybal 1.15.10: T321191
|
2023-01-23 14:42:23
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 14:42:24
|
<stashbot>
|
T321191: Cleanup pybal Prometheus metrics on monitor stop() - https://phabricator.wikimedia.org/T321191
|
2023-01-23 14:43:00
|
<taavi>
|
sbailey: hmm. how/when are the rows inserted into the databases?
|
2023-01-23 14:43:35
|
<sbailey>
|
taavi, pretty fast, but as part of a job that is invoked by VE and standard editor
|
2023-01-23 14:43:54
|
<sbailey>
|
Linter recordLintJob
|
2023-01-23 14:43:56
|
<taavi>
|
ah, it's a job? yeah, it can't be tested with x-wm-d then :/
|
2023-01-23 14:44:05
|
<sbailey>
|
I know, annoying
|
2023-01-23 14:44:21
|
<sbailey>
|
oh well
|
2023-01-23 14:45:18
|
<sbailey>
|
part of the reparsing code path of parsoid
|
2023-01-23 14:46:39
|
<sbailey>
|
parsoid queues up a bunch of linter error records when it reparses a page, then through a hook the job runs usually pretty quickly
|
2023-01-23 14:46:41
|
<logmsgbot>
|
!log taavi@deploy1002 Finished scap: Backport for [[gerrit:882179|SpecialUserrights: Allow updating the expiry of user groups (T327605)]] (duration: 08m 48s)
|
2023-01-23 14:46:45
|
<stashbot>
|
T327605: Special:UserRights: changing an already set permission's expiry to any new value fails - https://phabricator.wikimedia.org/T327605
|
2023-01-23 14:47:05
|
<taavi>
|
in that case in the future please split the changes to multiple patches (for example group0 first, then group1 and finally all wikis) since that creates a much smaller blast radius if something goes wrong. I can do it this way this time, but for the future that's much easier to deploy
|
2023-01-23 14:47:16
|
<wikibugs>
|
('PS2) ''Jbond: admin: Add check for duplicate uid's [puppet] - ''https://gerrit.wikimedia.org/r/882652'
|
2023-01-23 14:47:19
|
<wikibugs>
|
('PS5) ''Majavah: Enable Linter write namespace tag and template using core config [mediawiki-config] - ''https://gerrit.wikimedia.org/r/880989 (https://phabricator.wikimedia.org/T299612) (owner: ''Sbailey)'
|
2023-01-23 14:47:22
|
<wikibugs>
|
('CR) ''TrainBranchBot: [C: ''+2] "Approved by taavi@deploy1002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/880989 (https://phabricator.wikimedia.org/T299612) (owner: ''Sbailey)'
|
2023-01-23 14:47:40
|
<wikibugs>
|
('PS7) ''Jbond: Fix xihua's account [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 14:47:49
|
<sbailey>
|
this is a very safe change, if it were more dangerous I would have done more stages
|
2023-01-23 14:48:20
|
<wikibugs>
|
('CR) ''CI reject: [V: ''-1] Fix xihua's account [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 14:48:56
|
<wikibugs>
|
('CR) ''Jbond: "ready for review" [puppet] - ''https://gerrit.wikimedia.org/r/882652 (owner: ''Jbond)'
|
2023-01-23 14:49:55
|
<wikibugs>
|
('CR) ''Vgutierrez: [C: ''+1] Release 9.1.4-1wm1 [debs/trafficserver] - ''https://gerrit.wikimedia.org/r/869282 (https://phabricator.wikimedia.org/T325563) (owner: ''Ssingh)'
|
2023-01-23 14:50:59
|
<wikibugs>
|
('CR) ''Majavah: [C: ''+2] Enable Linter write namespace tag and template using core config [mediawiki-config] - ''https://gerrit.wikimedia.org/r/880989 (https://phabricator.wikimedia.org/T299612) (owner: ''Sbailey)'
|
2023-01-23 14:51:09
|
<wikibugs>
|
('CR) ''Ssingh: [C: ''+2] Release 9.1.4-1wm1 [debs/trafficserver] - ''https://gerrit.wikimedia.org/r/869282 (https://phabricator.wikimedia.org/T325563) (owner: ''Ssingh)'
|
2023-01-23 14:51:27
|
<icinga-wm>
|
PROBLEM - Check systemd state on stat1005 is CRITICAL: CRITICAL - degraded: The following units failed: jupyter-mlitn-singleuser-conda-analytics.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
|
2023-01-23 14:51:56
|
<wikibugs>
|
('Merged) ''jenkins-bot: Enable Linter write namespace tag and template using core config [mediawiki-config] - ''https://gerrit.wikimedia.org/r/880989 (https://phabricator.wikimedia.org/T299612) (owner: ''Sbailey)'
|
2023-01-23 14:52:09
|
<sbailey>
|
:-)
|
2023-01-23 14:52:11
|
<logmsgbot>
|
!log taavi@deploy1002 Started scap: Backport for [[gerrit:880989|Enable Linter write namespace tag and template using core config (T299612)]]
|
2023-01-23 14:52:15
|
<stashbot>
|
T299612: Add namespace column and index to table - https://phabricator.wikimedia.org/T299612
|
2023-01-23 14:52:32
|
<sbailey>
|
Testing
|
2023-01-23 14:53:04
|
<taavi>
|
testing what exactly? the patch is still not deployed anywhere
|
2023-01-23 14:53:19
|
<sbailey>
|
?
|
2023-01-23 14:53:32
|
<sbailey>
|
Ah sync
|
2023-01-23 14:53:44
|
<taavi>
|
yeah, it takes a while these days
|
2023-01-23 14:53:44
|
<wikibugs>
|
('CR) ''Elukey: [C: ''+2] changeprop: fix uri in liftwing's template [deployment-charts] - ''https://gerrit.wikimedia.org/r/882654 (https://phabricator.wikimedia.org/T327302) (owner: ''Elukey)'
|
2023-01-23 14:53:47
|
<logmsgbot>
|
!log taavi@deploy1002 taavi and sbailey: Backport for [[gerrit:880989|Enable Linter write namespace tag and template using core config (T299612)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
|
2023-01-23 14:55:07
|
<wikibugs>
|
('PS1) ''Hashar: puppet_compiler: serve pson.gz as application/json [puppet] - ''https://gerrit.wikimedia.org/r/882656'
|
2023-01-23 14:56:09
|
<wikibugs>
|
('CR) ''Hashar: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/882656 (owner: ''Hashar)'
|
2023-01-23 14:56:35
|
<wikibugs>
|
('CR) ''Vgutierrez: [C: ''+1] "looking good, please fix the mentioned typo on the changelog" [debs/varnish4] (debian-wmf) - ''https://gerrit.wikimedia.org/r/878049 (https://phabricator.wikimedia.org/T326634) (owner: ''Ssingh)'
|
2023-01-23 14:56:59
|
<wikibugs>
|
'SRE, ''ops-esams, ''DC-Ops, ''Infrastructure-Foundations, ''decommission-hardware: decommission atlas-esams - https://phabricator.wikimedia.org/T307026 (''Volans) I've set the device back to active to reflect its current status and prevent some warnings to show up in the `sre.dns.netbox` cookbook runs.'
|
2023-01-23 14:57:32
|
<Winston_Sung[m]>
|
So, is there any scheduled time to update the CX Server or it is required to fill a request somewhere?
|
2023-01-23 14:57:55
|
<wikibugs>
|
('PS2) ''Ssingh: Release 6.0.11-1wm1 [debs/varnish4] (debian-wmf) - ''https://gerrit.wikimedia.org/r/878049 (https://phabricator.wikimedia.org/T326634)'
|
2023-01-23 14:58:07
|
<wikibugs>
|
('CR) ''Ssingh: Release 6.0.11-1wm1 (''1 comment) [debs/varnish4] (debian-wmf) - ''https://gerrit.wikimedia.org/r/878049 (https://phabricator.wikimedia.org/T326634) (owner: ''Ssingh)'
|
2023-01-23 14:58:15
|
<taavi>
|
Winston_Sung[m]: if there was a scheduled time it would be listed on https://wikitech.wikimedia.org/wiki/Deployments, and if there is not you need to ask the cxserver maintainers somewhere else
|
2023-01-23 14:58:37
|
<wikibugs>
|
('PS8) ''Jbond: Fix xihua's account [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 14:58:49
|
<wikibugs>
|
('CR) ''Ladsgroup: "I'm too late for this now but for future cases, please enable it on a set of test wikis and then make sure it doesn't break anything and t" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/880989 (https://phabricator.wikimedia.org/T299612) (owner: ''Sbailey)'
|
2023-01-23 14:58:57
|
<Winston_Sung[m]>
|
Ok, thanks for the response.
|
2023-01-23 14:59:27
|
<logmsgbot>
|
!log elukey@deploy1002 helmfile [staging] START helmfile.d/services/changeprop: sync
|
2023-01-23 14:59:34
|
<icinga-wm>
|
PROBLEM - MariaDB Replica SQL: s2 #page on db1105 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1054, Errmsg: Error Unknown column linter_template in field list on query. Default database: nlwiki. [Query snipped] https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
|
2023-01-23 14:59:38
|
<logmsgbot>
|
!log elukey@deploy1002 helmfile [staging] DONE helmfile.d/services/changeprop: sync
|
2023-01-23 14:59:40
|
<marostegui>
|
checking
|
2023-01-23 14:59:43
|
<marostegui>
|
Amir1: ^
|
2023-01-23 14:59:50
|
<Amir1>
|
not me
|
2023-01-23 14:59:54
|
<marostegui>
|
let me depool
|
2023-01-23 14:59:59
|
<taavi>
|
sigh, that looks very related to the current deployment
|
2023-01-23 15:00:02
|
<taavi>
|
should I revert?
|
2023-01-23 15:00:08
|
<logmsgbot>
|
!log taavi@deploy1002 Finished scap: Backport for [[gerrit:880989|Enable Linter write namespace tag and template using core config (T299612)]] (duration: 07m 56s)
|
2023-01-23 15:00:10
|
<Amir1>
|
taavi: very likely
|
2023-01-23 15:00:11
|
<stashbot>
|
T299612: Add namespace column and index to table - https://phabricator.wikimedia.org/T299612
|
2023-01-23 15:00:13
|
<Amir1>
|
please revert
|
2023-01-23 15:00:14
|
<icinga-wm>
|
PROBLEM - MariaDB Replica SQL: s7 #page on db1170 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1054, Errmsg: Error Unknown column linter_template in field list on query. Default database: metawiki. [Query snipped] https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
|
2023-01-23 15:00:16
|
<taavi>
|
sure, doing
|
2023-01-23 15:00:19
|
<taavi>
|
sorry :/
|
2023-01-23 15:00:19
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P43247 and previous config saved to /var/cache/conftool/dbconfig/20230123-150018-marostegui.json
|
2023-01-23 15:00:20
|
<sbailey>
|
Amir1, the write code was running on Beta since mid december 880989
|
2023-01-23 15:00:21
|
<marostegui>
|
taavi: revert
|
2023-01-23 15:00:21
|
<_joe_>
|
taavi: revert, yes
|
2023-01-23 15:00:27
|
<wikibugs>
|
('PS1) ''TrainBranchBot: Revert "Enable Linter write namespace tag and template using core config" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882661'
|
2023-01-23 15:00:29
|
<wikibugs>
|
('CR) ''TrainBranchBot: "taavi@deploy1002 created a revert of this change as I76ef30bfd05fe069b2715e1933e8b81723149187" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/880989 (https://phabricator.wikimedia.org/T299612) (owner: ''Sbailey)'
|
2023-01-23 15:00:29
|
<taavi>
|
doing
|
2023-01-23 15:00:33
|
<Amir1>
|
sbailey: beta and production dbs are different
|
2023-01-23 15:00:35
|
<marostegui>
|
maybe those hosts didn't get the column?
|
2023-01-23 15:00:39
|
<Amir1>
|
beta works with update.php
|
2023-01-23 15:00:43
|
<wikibugs>
|
('CR) ''TrainBranchBot: [C: ''+2] "Approved by taavi@deploy1002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882661 (owner: ''TrainBranchBot)'
|
2023-01-23 15:00:44
|
<bblack>
|
hey
|
2023-01-23 15:00:46
|
<Amir1>
|
marostegui: yeah, that's my guess
|
2023-01-23 15:00:52
|
<_joe_>
|
let's wait to talk about what went wrong until things are stable
|
2023-01-23 15:00:54
|
<marostegui>
|
I can add it quickly
|
2023-01-23 15:00:55
|
<wikibugs>
|
('PS2) ''Ottomata: Add to admin_ng/README.md on how to deploy limiting the release [deployment-charts] - ''https://gerrit.wikimedia.org/r/882649'
|
2023-01-23 15:00:57
|
<wikibugs>
|
('PS1) ''Ottomata: flink-app - explicitly set Flink ports and configure ingress netpol for them [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576)'
|
2023-01-23 15:01:11
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1170:3317', diff saved to https://phabricator.wikimedia.org/P43248 and previous config saved to /var/cache/conftool/dbconfig/20230123-150110-marostegui.json
|
2023-01-23 15:01:18
|
<bblack>
|
sounds like it's handled!
|
2023-01-23 15:01:18
|
<marostegui>
|
both hosts are now depooled
|
2023-01-23 15:01:27
|
<_joe_>
|
bblack: it's ongoing
|
2023-01-23 15:01:31
|
<taavi>
|
sorry about this
|
2023-01-23 15:01:50
|
<_joe_>
|
taavi: are you taking care of the rollback?
|
2023-01-23 15:01:55
|
<wikibugs>
|
('Merged) ''jenkins-bot: Revert "Enable Linter write namespace tag and template using core config" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882661 (owner: ''TrainBranchBot)'
|
2023-01-23 15:01:57
|
<taavi>
|
yes, I am rolling the mediawiki changes back
|
2023-01-23 15:02:10
|
<logmsgbot>
|
!log taavi@deploy1002 Started scap: Backport for [[gerrit:882661|Revert "Enable Linter write namespace tag and template using core config"]]
|
2023-01-23 15:02:15
|
<Amir1>
|
yup, it's the linter error
|
2023-01-23 15:02:15
|
<brett>
|
thanks for confirming
|
2023-01-23 15:02:16
|
<_joe_>
|
ok, thanks
|
2023-01-23 15:02:23
|
<Amir1>
|
Last_Error: Error 'Unknown column 'linter_template' in 'field list'' on query. Default database: 'metawiki'. Query: 'INSERT /* MediaWiki\Linter\Database::setForPage */ IGNORE INTO `>
|
2023-01-23 15:02:29
|
<marostegui>
|
yeah the column isn't present
|
2023-01-23 15:02:33
|
<bblack>
|
new column didn't exist in prod dbs yet?
|
2023-01-23 15:02:33
|
<marostegui>
|
I am going to add them on db1105 and db1170
|
2023-01-23 15:02:35
|
<Amir1>
|
gradual rollout people, please
|
2023-01-23 15:02:36
|
<bblack>
|
ok
|
2023-01-23 15:02:45
|
<marostegui>
|
the hosts are not serving traffic now
|
2023-01-23 15:02:48
|
<marostegui>
|
so we should be good
|
2023-01-23 15:02:53
|
<Amir1>
|
yeah
|
2023-01-23 15:02:57
|
<wikibugs>
|
('CR) ''Muehlenhoff: [C: ''+1] "Looks good, one comment inline." [puppet] - ''https://gerrit.wikimedia.org/r/882652 (owner: ''Jbond)'
|
2023-01-23 15:03:02
|
<marostegui>
|
I will add it and let you know taavi
|
2023-01-23 15:03:02
|
<wikibugs>
|
('CR) ''Jbond: admin/canary_appserver: add group of users allowed to disable puppet (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/879147 (https://phabricator.wikimedia.org/T305979) (owner: ''Dzahn)'
|
2023-01-23 15:03:29
|
<taavi>
|
marostegui: I'll revert it anyways, it can be re-enabled at some later window
|
2023-01-23 15:03:34
|
<marostegui>
|
taavi: sounds good
|
2023-01-23 15:03:48
|
<logmsgbot>
|
!log taavi@deploy1002 taavi and trainbranchbot: Backport for [[gerrit:882661|Revert "Enable Linter write namespace tag and template using core config"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
|
2023-01-23 15:04:04
|
<_joe_>
|
taavi: I'd release everywhere tbh
|
2023-01-23 15:04:08
|
<wikibugs>
|
('PS2) ''Hashar: puppet_compiler: serve pson.gz as application/json [puppet] - ''https://gerrit.wikimedia.org/r/882656'
|
2023-01-23 15:04:22
|
<wikibugs>
|
('CR) ''Hashar: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/882656 (owner: ''Hashar)'
|
2023-01-23 15:06:14
|
<Amir1>
|
taavi: now that it's reverted, please do gradual roll out, first testwikis, then one section, etc.
|
2023-01-23 15:06:37
|
<taavi>
|
sbailey: ^
|
2023-01-23 15:07:08
|
<_joe_>
|
can we claim the incident is over?
|
2023-01-23 15:07:16
|
<sbailey>
|
Ok, how do I verify all databases have had the 3 columns added?
|
2023-01-23 15:07:51
|
<sbailey>
|
Yes will figure out how to do more gradual roll out
|
2023-01-23 15:07:51
|
<Amir1>
|
it's not possible manually, you can do a drift report
|
2023-01-23 15:08:09
|
<Amir1>
|
https://drift-tracker.toolforge.org/report/core/
|
2023-01-23 15:08:16
|
<marostegui>
|
db1105:3312 is now fixed
|
2023-01-23 15:08:19
|
<Amir1>
|
https://drift-tracker.toolforge.org/report/flaggedrevs/
|
2023-01-23 15:08:20
|
<marostegui>
|
I am fixing db1170:3317
|
2023-01-23 15:09:12
|
<icinga-wm>
|
RECOVERY - MariaDB Replica SQL: s2 #page on db1105 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
|
2023-01-23 15:09:39
|
<logmsgbot>
|
!log taavi@deploy1002 Finished scap: Backport for [[gerrit:882661|Revert "Enable Linter write namespace tag and template using core config"]] (duration: 07m 28s)
|
2023-01-23 15:09:44
|
<taavi>
|
revert was finally synced
|
2023-01-23 15:10:11
|
<jinxer-wm>
|
(Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
|
2023-01-23 15:11:18
|
<wikibugs>
|
('CR) ''Hashar: "The compile has been triggered for `pcc-worker1001.puppet-diffs.eqiad1.wikimedia.cloud` which is a noop https://puppet-compiler.wmflabs.or" [puppet] - ''https://gerrit.wikimedia.org/r/882656 (owner: ''Hashar)'
|
2023-01-23 15:11:40
|
<wikibugs>
|
('PS3) ''Hashar: puppet_compiler: serve pson.gz as application/json [puppet] - ''https://gerrit.wikimedia.org/r/882656'
|
2023-01-23 15:11:53
|
<wikibugs>
|
('CR) ''Hashar: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/882656 (owner: ''Hashar)'
|
2023-01-23 15:13:02
|
<icinga-wm>
|
PROBLEM - MariaDB Replica Lag: s7 #page on db1170 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 892.41 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
|
2023-01-23 15:13:31
|
<wikibugs>
|
('CR) ''Hashar: "PCC https://puppet-compiler.wmflabs.org/output/882656/1584/ and the diff is https://puppet-compiler.wmflabs.org/output/882656/1584/pcc-db1" [puppet] - ''https://gerrit.wikimedia.org/r/882656 (owner: ''Hashar)'
|
2023-01-23 15:14:02
|
<brett>
|
_joe_: Is there an incident doc? Is it necessary to create one for this?
|
2023-01-23 15:14:39
|
<marostegui>
|
brett: probably no need to
|
2023-01-23 15:14:48
|
<sbailey>
|
Was it just two machines that didn't have the columns?
|
2023-01-23 15:14:55
|
<marostegui>
|
looks so for now yes
|
2023-01-23 15:15:02
|
<Amir1>
|
I'm running a linter drift report to see generally what could be wrong
|
2023-01-23 15:15:11
|
<marostegui>
|
db1170 should be fixed now
|
2023-01-23 15:15:29
|
<sbailey>
|
Can we deploy this if it was just 2 machines?
|
2023-01-23 15:15:35
|
<wikibugs>
|
('PS1) ''BBlack: Possibly mitigate ATS bug with semicolon in Path [puppet] - ''https://gerrit.wikimedia.org/r/882663 (https://phabricator.wikimedia.org/T238285)'
|
2023-01-23 15:15:41
|
<marostegui>
|
sbailey: no, let's make sure it was just those two
|
2023-01-23 15:15:41
|
<Amir1>
|
it'll take a bit of time
|
2023-01-23 15:15:47
|
<sbailey>
|
ok
|
2023-01-23 15:16:12
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43250 and previous config saved to /var/cache/conftool/dbconfig/20230123-151611-root.json
|
2023-01-23 15:16:16
|
<icinga-wm>
|
RECOVERY - MariaDB Replica Lag: s7 #page on db1170 is OK: OK slave_sql_lag Replication lag: 0.21 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
|
2023-01-23 15:16:22
|
<icinga-wm>
|
RECOVERY - MariaDB Replica SQL: s7 #page on db1170 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
|
2023-01-23 15:16:40
|
<wikibugs>
|
('PS2) ''BBlack: Possibly mitigate ATS bug with semicolon in Path [puppet] - ''https://gerrit.wikimedia.org/r/882663 (https://phabricator.wikimedia.org/T238285)'
|
2023-01-23 15:16:42
|
<marostegui>
|
I am now repooling both hosts
|
2023-01-23 15:16:43
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43251 and previous config saved to /var/cache/conftool/dbconfig/20230123-151642-root.json
|
2023-01-23 15:17:20
|
<sukhe>
|
!log reprepro -C main include bullseye-wikimedia trafficserver_9.1.4-1wm1_amd64.changes: T325563
|
2023-01-23 15:17:23
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 15:17:24
|
<stashbot>
|
T325563: Package and deploy ATS 9.1.4 - https://phabricator.wikimedia.org/T325563
|
2023-01-23 15:17:42
|
<wikibugs>
|
('PS1) ''Bking: wdqs: mount NFS to new hosts [puppet] - ''https://gerrit.wikimedia.org/r/882664 (https://phabricator.wikimedia.org/T323096)'
|
2023-01-23 15:19:39
|
<Amir1>
|
sbailey: FWIW, I'm seeing drift on linter_params in every wiki:
|
2023-01-23 15:19:44
|
<Amir1>
|
https://www.irccloud.com/pastebin/IsQT61W4/
|
2023-01-23 15:20:17
|
<Amir1>
|
this means the field is nullable in code but not in production, or the other way around
|
2023-01-23 15:21:11
|
<sbailey>
|
Ah, hmm. How can beta be ok but others not?
|
2023-01-23 15:21:16
|
<wikibugs>
|
('CR) ''CI reject: [V: ''-1] Release 6.0.11-1wm1 [debs/varnish4] (debian-wmf) - ''https://gerrit.wikimedia.org/r/878049 (https://phabricator.wikimedia.org/T326634) (owner: ''Ssingh)'
|
2023-01-23 15:21:20
|
<wikibugs>
|
('CR) ''JMeybohm: "Could you be more explicit and allow access to those ports by only the spawned job pods?" [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576) (owner: ''Ottomata)'
|
2023-01-23 15:22:02
|
<sbailey>
|
Amir1, can we chat on slack offline so I can fix/understand how this might happen?
|
2023-01-23 15:22:34
|
<Amir1>
|
sure
|
2023-01-23 15:23:16
|
<wikibugs>
|
('CR) ''Muehlenhoff: [C: ''+1] "LGTM" [puppet] - ''https://gerrit.wikimedia.org/r/879147 (https://phabricator.wikimedia.org/T305979) (owner: ''Dzahn)'
|
2023-01-23 15:26:20
|
<wikibugs>
|
('CR) ''Ssingh: "Updated typo, ignoring the build failure as expected." [debs/varnish4] (debian-wmf) - ''https://gerrit.wikimedia.org/r/878049 (https://phabricator.wikimedia.org/T326634) (owner: ''Ssingh)'
|
2023-01-23 15:28:31
|
<wikibugs>
|
('CR) ''DCausse: "seems like wdqs1010 is missing from ferm" [puppet] - ''https://gerrit.wikimedia.org/r/882664 (https://phabricator.wikimedia.org/T323096) (owner: ''Bking)'
|
2023-01-23 15:31:00
|
<wikibugs>
|
('PS2) ''Bking: wdqs: mount NFS to new hosts [puppet] - ''https://gerrit.wikimedia.org/r/882664 (https://phabricator.wikimedia.org/T323096)'
|
2023-01-23 15:31:17
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43252 and previous config saved to /var/cache/conftool/dbconfig/20230123-153116-root.json
|
2023-01-23 15:31:48
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43253 and previous config saved to /var/cache/conftool/dbconfig/20230123-153147-root.json
|
2023-01-23 15:32:00
|
<wikibugs>
|
('CR) ''Bking: wdqs: mount NFS to new hosts (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/882664 (https://phabricator.wikimedia.org/T323096) (owner: ''Bking)'
|
2023-01-23 15:32:53
|
<wikibugs>
|
('CR) ''Ssingh: [V: ''+2 C: ''+2] Release 6.0.11-1wm1 [debs/varnish4] (debian-wmf) - ''https://gerrit.wikimedia.org/r/878049 (https://phabricator.wikimedia.org/T326634) (owner: ''Ssingh)'
|
2023-01-23 15:34:42
|
<wikibugs>
|
('CR) ''DCausse: [C: ''+1] wdqs: mount NFS to new hosts [puppet] - ''https://gerrit.wikimedia.org/r/882664 (https://phabricator.wikimedia.org/T323096) (owner: ''Bking)'
|
2023-01-23 15:35:54
|
<wikibugs>
|
('CR) ''Bking: wdqs: mount NFS to new hosts (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/882664 (https://phabricator.wikimedia.org/T323096) (owner: ''Bking)'
|
2023-01-23 15:37:30
|
<wikibugs>
|
('PS1) ''Vgutierrez: Stop parsing semi-colon as a URL path delimiter [debs/trafficserver] - ''https://gerrit.wikimedia.org/r/882667'
|
2023-01-23 15:37:34
|
<wikibugs>
|
('PS1) ''Elukey: changeprop: fix liftwing's body settings [deployment-charts] - ''https://gerrit.wikimedia.org/r/882668 (https://phabricator.wikimedia.org/T327302)'
|
2023-01-23 15:40:29
|
<urbanecm>
|
marostegui: is it ok if i ship a sec patch now? or should i wait a bit for the DB fixes to be finished?
|
2023-01-23 15:40:57
|
<wikibugs>
|
('CR) ''Hnowlan: [C: ''+1] "LGTM based on the example configs used by changeprop!" [deployment-charts] - ''https://gerrit.wikimedia.org/r/882668 (https://phabricator.wikimedia.org/T327302) (owner: ''Elukey)'
|
2023-01-23 15:41:16
|
<marostegui>
|
urbanecm: it should be fine
|
2023-01-23 15:41:24
|
<wikibugs>
|
('CR) ''Bking: [C: ''+2] wdqs: mount NFS to new hosts [puppet] - ''https://gerrit.wikimedia.org/r/882664 (https://phabricator.wikimedia.org/T323096) (owner: ''Bking)'
|
2023-01-23 15:41:26
|
<urbanecm>
|
thank you, proceeding.
|
2023-01-23 15:44:25
|
<wikibugs>
|
('CR) ''Ottomata: [C: ''+2] Add to admin_ng/README.md on how to deploy limiting the release [deployment-charts] - ''https://gerrit.wikimedia.org/r/882649 (owner: ''Ottomata)'
|
2023-01-23 15:44:36
|
<papaul>
|
!log on going maintenance on fasw-codfw
|
2023-01-23 15:44:37
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 15:46:01
|
<wikibugs>
|
('CR) ''Elukey: [C: ''+2] changeprop: fix liftwing's body settings [deployment-charts] - ''https://gerrit.wikimedia.org/r/882668 (https://phabricator.wikimedia.org/T327302) (owner: ''Elukey)'
|
2023-01-23 15:46:22
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43254 and previous config saved to /var/cache/conftool/dbconfig/20230123-154621-root.json
|
2023-01-23 15:46:53
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43255 and previous config saved to /var/cache/conftool/dbconfig/20230123-154652-root.json
|
2023-01-23 15:48:38
|
<logmsgbot>
|
!log elukey@deploy1002 helmfile [staging] START helmfile.d/services/changeprop: sync
|
2023-01-23 15:48:49
|
<logmsgbot>
|
!log elukey@deploy1002 helmfile [staging] DONE helmfile.d/services/changeprop: sync
|
2023-01-23 15:49:23
|
<wikibugs>
|
('Merged) ''jenkins-bot: Add to admin_ng/README.md on how to deploy limiting the release [deployment-charts] - ''https://gerrit.wikimedia.org/r/882649 (owner: ''Ottomata)'
|
2023-01-23 15:50:30
|
<wikibugs>
|
('PS2) ''Ottomata: flink-app - explicitly set Flink ports and configure ingress netpol for them [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576)'
|
2023-01-23 15:50:32
|
<urbanecm>
|
!log Deploy security patch for T327613
|
2023-01-23 15:50:34
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 15:51:09
|
<icinga-wm>
|
PROBLEM - BGP status on pfw3-codfw is CRITICAL: BGP CRITICAL - AS64600/IPv4: Idle - PyBal, AS64600/IPv4: Idle - PyBal https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
|
2023-01-23 15:51:25
|
<wikibugs>
|
('CR) ''CI reject: [V: ''-1] flink-app - explicitly set Flink ports and configure ingress netpol for them [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576) (owner: ''Ottomata)'
|
2023-01-23 15:51:27
|
<wikibugs>
|
('PS3) ''Ottomata: flink-app - explicitly set Flink ports and configure ingress netpol for them [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576)'
|
2023-01-23 15:51:33
|
<icinga-wm>
|
PROBLEM - Router interfaces on pfw3-codfw is CRITICAL: CRITICAL: host 208.80.153.197, interfaces up: 34, down: 10, dormant: 0, excluded: 3, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
|
2023-01-23 15:52:11
|
<wikibugs>
|
('CR) ''CI reject: [V: ''-1] flink-app - explicitly set Flink ports and configure ingress netpol for them [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576) (owner: ''Ottomata)'
|
2023-01-23 15:52:30
|
<wikibugs>
|
('PS4) ''Ottomata: flink-app - explicitly set Flink ports and configure ingress netpol for them [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576)'
|
2023-01-23 15:53:51
|
<sukhe>
|
!log reprepro -C main include bullseye-wikimedia varnish_6.0.11-1wm1_amd64.changes: T326634
|
2023-01-23 15:53:53
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 15:53:54
|
<stashbot>
|
T326634: Package and deploy varnish 6.0.11 - https://phabricator.wikimedia.org/T326634
|
2023-01-23 15:54:11
|
<wikibugs>
|
'SRE, ''Traffic, ''Patch-For-Review: Package and deploy varnish 6.0.11 - https://phabricator.wikimedia.org/T326634 (''ssingh)'
|
2023-01-23 15:59:15
|
<icinga-wm>
|
RECOVERY - BGP status on pfw3-codfw is OK: BGP OK - up: 5, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
|
2023-01-23 15:59:34
|
<urbanecm>
|
the secpatch's deployment is done
|
2023-01-23 15:59:38
|
<wikibugs>
|
('CR) ''Ottomata: flink-app - explicitly set Flink ports and configure ingress netpol for them (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576) (owner: ''Ottomata)'
|
2023-01-23 15:59:43
|
<icinga-wm>
|
RECOVERY - Router interfaces on pfw3-codfw is OK: OK: host 208.80.153.197, interfaces up: 58, down: 0, dormant: 0, excluded: 3, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
|
2023-01-23 16:01:28
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43256 and previous config saved to /var/cache/conftool/dbconfig/20230123-160126-root.json
|
2023-01-23 16:01:58
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43257 and previous config saved to /var/cache/conftool/dbconfig/20230123-160157-root.json
|
2023-01-23 16:04:20
|
<wikibugs>
|
('CR) ''Ottomata: flink-app - explicitly set Flink ports and configure ingress netpol for them (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576) (owner: ''Ottomata)'
|
2023-01-23 16:08:00
|
<wikibugs>
|
'SRE, ''Traffic, ''Traffic-Icebox, ''WMF-General-or-Unknown, and 3 others: Pages whose title ends with semicolon (;) are intermittently inaccessible (likely due to ATS) - https://phabricator.wikimedia.org/T238285 (''Pigsonthewing) T261624 was merged here; in that ticket I asked: > On testing, I can see t...'
|
2023-01-23 16:11:45
|
<wikibugs>
|
('CR) ''Jbond: "@alex, I think I'll take over this CR unless there are objections" [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 16:12:16
|
<wikibugs>
|
('CR) ''Jbond: Fix xihua's account (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 16:15:28
|
<wikibugs>
|
('PS5) ''Ottomata: flink-app - explicitly set Flink ports and configure ingress netpol for them [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576)'
|
2023-01-23 16:16:00
|
<wikibugs>
|
'SRE, ''ops-esams, ''DC-Ops: ripe-atlas-esams down - https://phabricator.wikimedia.org/T303242 (''RobH)'
|
2023-01-23 16:16:02
|
<wikibugs>
|
'SRE, ''ops-esams, ''DC-Ops, ''Infrastructure-Foundations, ''decommission-hardware: decommission atlas-esams - https://phabricator.wikimedia.org/T307026 (''RobH) ''Open→''Declined device resurrected itself, decom task declined as it's now reporting into ripe portal'
|
2023-01-23 16:16:33
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43258 and previous config saved to /var/cache/conftool/dbconfig/20230123-161633-root.json
|
2023-01-23 16:16:39
|
<wikibugs>
|
('CR) ''Ottomata: flink-app - explicitly set Flink ports and configure ingress netpol for them (''2 comments) [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576) (owner: ''Ottomata)'
|
2023-01-23 16:17:03
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43259 and previous config saved to /var/cache/conftool/dbconfig/20230123-161702-root.json
|
2023-01-23 16:21:39
|
<wikibugs>
|
('PS1) ''Ottomata: flink - avoid adding an extra 'k8s_api_enabled' label by using component label instead [deployment-charts] - ''https://gerrit.wikimedia.org/r/882680 (https://phabricator.wikimedia.org/T324576)'
|
2023-01-23 16:24:52
|
<wikibugs>
|
('PS1) ''Stang: newiki: Add new permissions to group reviewer [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882681 (https://phabricator.wikimedia.org/T327114)'
|
2023-01-23 16:25:22
|
<wikibugs>
|
'SRE, ''Infrastructure-Foundations, ''fundraising-tech-ops, ''netops: Upgrade fasw to Junos 21 - https://phabricator.wikimedia.org/T316542 (''Papaul)'
|
2023-01-23 16:25:30
|
<wikibugs>
|
('CR) ''BCornwall: [V: ''+1 C: ''+2] tlsproxy: Remove nginx_tune_for_media [puppet] - ''https://gerrit.wikimedia.org/r/881902 (https://phabricator.wikimedia.org/T228730) (owner: ''BCornwall)'
|
2023-01-23 16:25:36
|
<wikibugs>
|
('PS2) ''BCornwall: tlsproxy: Remove nginx_tune_for_media [puppet] - ''https://gerrit.wikimedia.org/r/881902 (https://phabricator.wikimedia.org/T228730)'
|
2023-01-23 16:26:19
|
<wikibugs>
|
('PS1) ''Jdrewniak: Bumping portals to master [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882682 (https://phabricator.wikimedia.org/T128546)'
|
2023-01-23 16:27:04
|
<wikibugs>
|
('CR) ''BCornwall: [V: ''+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39208/console" [puppet] - ''https://gerrit.wikimedia.org/r/881902 (https://phabricator.wikimedia.org/T228730) (owner: ''BCornwall)'
|
2023-01-23 16:29:36
|
<wikibugs>
|
('CR) ''Ottomata: [C: ''+2] flink-app - explicitly set Flink ports and configure ingress netpol for them (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576) (owner: ''Ottomata)'
|
2023-01-23 16:29:44
|
<wikibugs>
|
('CR) ''Ottomata: [C: ''+2] flink - avoid adding an extra 'k8s_api_enabled' label by using component label instead [deployment-charts] - ''https://gerrit.wikimedia.org/r/882680 (https://phabricator.wikimedia.org/T324576) (owner: ''Ottomata)'
|
2023-01-23 16:30:05
|
<jouncebot>
|
jan_drewniak: OwO what's this, a deployment window?? Wikimedia Portals Update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T1630). nyaa~
|
2023-01-23 16:30:39
|
<wikibugs>
|
('CR) ''Jdrewniak: [C: ''+2] Bumping portals to master [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882682 (https://phabricator.wikimedia.org/T128546) (owner: ''Jdrewniak)'
|
2023-01-23 16:31:23
|
<wikibugs>
|
('Merged) ''jenkins-bot: Bumping portals to master [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882682 (https://phabricator.wikimedia.org/T128546) (owner: ''Jdrewniak)'
|
2023-01-23 16:31:38
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43260 and previous config saved to /var/cache/conftool/dbconfig/20230123-163138-root.json
|
2023-01-23 16:32:08
|
<logmsgbot>
|
!log marostegui@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43261 and previous config saved to /var/cache/conftool/dbconfig/20230123-163207-root.json
|
2023-01-23 16:34:28
|
<wikibugs>
|
('Merged) ''jenkins-bot: flink-app - explicitly set Flink ports and configure ingress netpol for them [deployment-charts] - ''https://gerrit.wikimedia.org/r/882662 (https://phabricator.wikimedia.org/T324576) (owner: ''Ottomata)'
|
2023-01-23 16:34:31
|
<wikibugs>
|
('Merged) ''jenkins-bot: flink - avoid adding an extra 'k8s_api_enabled' label by using component label instead [deployment-charts] - ''https://gerrit.wikimedia.org/r/882680 (https://phabricator.wikimedia.org/T324576) (owner: ''Ottomata)'
|
2023-01-23 16:35:06
|
<wikibugs>
|
('PS4) ''Jbond: puppet_compiler: serve pson.gz as application/json [puppet] - ''https://gerrit.wikimedia.org/r/882656 (owner: ''Hashar)'
|
2023-01-23 16:35:07
|
<logmsgbot>
|
!log otto@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
|
2023-01-23 16:35:09
|
<logmsgbot>
|
!log otto@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
|
2023-01-23 16:35:55
|
<wikibugs>
|
('CR) ''Jbond: [C: ''+2] "updated slightly, thanks" [puppet] - ''https://gerrit.wikimedia.org/r/882656 (owner: ''Hashar)'
|
2023-01-23 16:36:12
|
<wikibugs>
|
('CR) ''Jbond: [V: ''+2 C: ''+2] puppet_compiler: serve pson.gz as application/json [puppet] - ''https://gerrit.wikimedia.org/r/882656 (owner: ''Hashar)'
|
2023-01-23 16:39:28
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to analytics-privatedata-users for Taavi - https://phabricator.wikimedia.org/T327013 (''Ottomata) Approved by me. I think we need someone at WMF to approve/sponsor @taavi's membership in this group though. @taavi, could someone maybe in Cloud VPS do this for you?'
|
2023-01-23 16:40:00
|
<logmsgbot>
|
!log otto@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
|
2023-01-23 16:40:04
|
<logmsgbot>
|
!log otto@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
|
2023-01-23 16:41:12
|
<wikibugs>
|
('CR) ''Hashar: "I can confirm it makes Firefox pretty print the pson.gz ;) Thank you!" [puppet] - ''https://gerrit.wikimedia.org/r/882656 (owner: ''Hashar)'
|
2023-01-23 16:41:23
|
<wikibugs>
|
('CR) ''Jbond: [C: ''+2] admin: Add check for duplicate uid's (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/882652 (owner: ''Jbond)'
|
2023-01-23 16:41:39
|
<logmsgbot>
|
!log otto@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
|
2023-01-23 16:41:43
|
<logmsgbot>
|
!log otto@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
|
2023-01-23 16:41:55
|
<wikibugs>
|
('CR) ''Jbond: [C: ''+2] Fix xihua's account [puppet] - ''https://gerrit.wikimedia.org/r/881872 (https://phabricator.wikimedia.org/T325004) (owner: ''Alexandros Kosiaris)'
|
2023-01-23 16:42:02
|
<logmsgbot>
|
!log jdrewniak@deploy1002 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:882682| Bumping portals to master (T128546)]] (duration: 06m 48s)
|
2023-01-23 16:42:05
|
<stashbot>
|
T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
|
2023-01-23 16:48:51
|
<logmsgbot>
|
!log jdrewniak@deploy1002 Synchronized portals: Wikimedia Portals Update: [[gerrit:882682| Bumping portals to master (T128546)]] (duration: 06m 48s)
|
2023-01-23 16:48:55
|
<stashbot>
|
T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
|
2023-01-23 16:50:36
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
|
2023-01-23 16:50:38
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
|
2023-01-23 16:53:54
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to analytics-privatedata-users & analytics-product-users for Hxi-ctr - https://phabricator.wikimedia.org/T325004 (''jbond) ''Open→''Resolved I have gone ahead and merged the changes to rename this account, please reopen if you have...'
|
2023-01-23 16:56:46
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
|
2023-01-23 16:56:48
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
|
2023-01-23 16:58:02
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
|
2023-01-23 16:58:05
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
|
2023-01-23 16:59:41
|
<wikibugs>
|
'SRE, ''Traffic, ''Data Pipelines (Sprint 07): Document Impact of Jan 8&9 Traffic Data Loss - https://phabricator.wikimedia.org/T326658 (''Snwachukwu) #traffic Can you please confirm that there were cases of pages served in ##eqsin## but not reported in ##webrequest logs##.'
|
2023-01-23 17:02:26
|
<wikibugs>
|
('PS1) ''Ottomata: flink-app - netpol must use app: <chart>-<release> podSelector [deployment-charts] - ''https://gerrit.wikimedia.org/r/882692'
|
2023-01-23 17:05:06
|
<wikibugs>
|
('PS2) ''Ottomata: flink-app - netpol must use app: <chart>-<release> podSelector [deployment-charts] - ''https://gerrit.wikimedia.org/r/882692'
|
2023-01-23 17:05:51
|
<logmsgbot>
|
!log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
|
2023-01-23 17:05:54
|
<logmsgbot>
|
!log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
|
2023-01-23 17:07:27
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2114 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43262 and previous config saved to /var/cache/conftool/dbconfig/20230123-170726-ladsgroup.json
|
2023-01-23 17:22:32
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2114 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43263 and previous config saved to /var/cache/conftool/dbconfig/20230123-172231-ladsgroup.json
|
2023-01-23 17:37:37
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2114 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43264 and previous config saved to /var/cache/conftool/dbconfig/20230123-173736-ladsgroup.json
|
2023-01-23 17:39:26
|
<wikibugs>
|
('CR) ''Herron: [C: ''+1] conftool-data: add logstash[12]032 to kibana7 backend [puppet] - ''https://gerrit.wikimedia.org/r/881813 (owner: ''Cwhite)'
|
2023-01-23 17:41:48
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to analytics-privatedata-users for Taavi - https://phabricator.wikimedia.org/T327013 (''jhathaway) Happy to sponsor @taavi for this request'
|
2023-01-23 17:44:02
|
<wikibugs>
|
('CR) ''Dzahn: [C: ''+2] idp: remove racktables related settings (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/881697 (https://phabricator.wikimedia.org/T327405) (owner: ''Dzahn)'
|
2023-01-23 17:44:10
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests: Requesting access to analytics-privatedata-users for Taavi - https://phabricator.wikimedia.org/T327013 (''Clement_Goubert)'
|
2023-01-23 17:49:29
|
<wikibugs>
|
('PS3) ''Dzahn: idp: remove config for racktables [puppet] - ''https://gerrit.wikimedia.org/r/881938 (https://phabricator.wikimedia.org/T327405)'
|
2023-01-23 17:49:31
|
<wikibugs>
|
('PS1) ''Clément Goubert: admin: Grant taavi access to analytics-privatedata-users [puppet] - ''https://gerrit.wikimedia.org/r/882696 (https://phabricator.wikimedia.org/T327013)'
|
2023-01-23 17:50:22
|
<wikibugs>
|
('CR) ''Dzahn: [C: ''+1] "lgtm, has approval from ottomata and another SRE as sponsor" [puppet] - ''https://gerrit.wikimedia.org/r/882696 (https://phabricator.wikimedia.org/T327013) (owner: ''Clément Goubert)'
|
2023-01-23 17:50:43
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to analytics-privatedata-users for Taavi - https://phabricator.wikimedia.org/T327013 (''Clement_Goubert) @taavi Patch ready, assuming you don't need kerberos access. Here are the [[ https://wikitech.wikimedia.org/wiki/Analytics/Data_access#U...'
|
2023-01-23 17:50:46
|
<icinga-wm>
|
ACKNOWLEDGEMENT - cassandra-a SSL 10.64.0.144:7001 on sessionstore1001 is CRITICAL: SSL CRITICAL - Certificate sessionstore1001-a valid until 2023-02-22 11:12:05 +0000 (expires in 29 days) eevans See: https://phabricator.wikimedia.org/T327675 - The acknowledgement expires at: 2023-01-30 17:50:07. https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
|
2023-01-23 17:50:46
|
<icinga-wm>
|
ACKNOWLEDGEMENT - cassandra-a SSL 10.64.32.85:7001 on sessionstore1002 is CRITICAL: SSL CRITICAL - Certificate sessionstore1002-a valid until 2023-02-22 11:12:08 +0000 (expires in 29 days) eevans See: https://phabricator.wikimedia.org/T327675 - The acknowledgement expires at: 2023-01-30 17:50:07. https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
|
2023-01-23 17:50:46
|
<icinga-wm>
|
ACKNOWLEDGEMENT - cassandra-a SSL 10.64.48.178:7001 on sessionstore1003 is CRITICAL: SSL CRITICAL - Certificate sessionstore1003-a valid until 2023-02-22 11:12:10 +0000 (expires in 29 days) eevans See: https://phabricator.wikimedia.org/T327675 - The acknowledgement expires at: 2023-01-30 17:50:07. https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
|
2023-01-23 17:50:46
|
<icinga-wm>
|
ACKNOWLEDGEMENT - cassandra-a SSL 10.192.16.95:7001 on sessionstore2001 is CRITICAL: SSL CRITICAL - Certificate sessionstore2001-a valid until 2023-02-22 11:12:13 +0000 (expires in 29 days) eevans See: https://phabricator.wikimedia.org/T327675 - The acknowledgement expires at: 2023-01-30 17:50:07. https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
|
2023-01-23 17:50:46
|
<icinga-wm>
|
ACKNOWLEDGEMENT - cassandra-a SSL 10.192.32.101:7001 on sessionstore2002 is CRITICAL: SSL CRITICAL - Certificate sessionstore2002-a valid until 2023-02-22 11:12:16 +0000 (expires in 29 days) eevans See: https://phabricator.wikimedia.org/T327675 - The acknowledgement expires at: 2023-01-30 17:50:07. https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
|
2023-01-23 17:50:46
|
<icinga-wm>
|
ACKNOWLEDGEMENT - cassandra-a SSL 10.192.48.132:7001 on sessionstore2003 is CRITICAL: SSL CRITICAL - Certificate sessionstore2003-a valid until 2023-02-22 11:12:18 +0000 (expires in 29 days) eevans See: https://phabricator.wikimedia.org/T327675 - The acknowledgement expires at: 2023-01-30 17:50:07. https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
|
2023-01-23 17:51:15
|
<wikibugs>
|
('CR) ''Clément Goubert: [C: ''+2] admin: Grant taavi access to analytics-privatedata-users [puppet] - ''https://gerrit.wikimedia.org/r/882696 (https://phabricator.wikimedia.org/T327013) (owner: ''Clément Goubert)'
|
2023-01-23 17:52:44
|
<logmsgbot>
|
!log ladsgroup@cumin1001 dbctl commit (dc=all): 'db2114 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43265 and previous config saved to /var/cache/conftool/dbconfig/20230123-175241-ladsgroup.json
|
2023-01-23 17:53:57
|
<wikibugs>
|
'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to analytics-privatedata-users for Taavi - https://phabricator.wikimedia.org/T327013 (''Clement_Goubert) ''In progress→''Resolved @taavi Access request merged, you should have your access around 30 minutes from now when puppet has run. R...'
|
2023-01-23 17:56:15
|
<wikibugs>
|
('CR) ''Dzahn: [C: ''+2] "https://puppet-compiler.wmflabs.org/output/881938/39209/" [puppet] - ''https://gerrit.wikimedia.org/r/881938 (https://phabricator.wikimedia.org/T327405) (owner: ''Dzahn)'
|
2023-01-23 17:56:42
|
<wikibugs>
|
('CR) ''Dzahn: [C: ''+2] "https://gerrit.wikimedia.org/r/c/operations/puppet/+/881938" [puppet] - ''https://gerrit.wikimedia.org/r/881697 (https://phabricator.wikimedia.org/T327405) (owner: ''Dzahn)'
|
2023-01-23 17:56:54
|
<wikibugs>
|
('PS3) ''Hnowlan: thumbor: add and use haproxy healthz lvs check [puppet] - ''https://gerrit.wikimedia.org/r/880898 (https://phabricator.wikimedia.org/T233196)'
|
2023-01-23 17:57:00
|
<wikibugs>
|
('PS2) ''Hnowlan: thumbor: add failure condition to health check [deployment-charts] - ''https://gerrit.wikimedia.org/r/881635 (https://phabricator.wikimedia.org/T233196)'
|
2023-01-23 17:57:37
|
<wikibugs>
|
('CR) ''Dzahn: [C: ''+2] "Notice: /Stage[main]/Apereo_cas/File[/etc/cas/services/racktables-18.json]/ensure: removed" [puppet] - ''https://gerrit.wikimedia.org/r/881938 (https://phabricator.wikimedia.org/T327405) (owner: ''Dzahn)'
|
2023-01-23 17:58:43
|
<wikibugs>
|
('CR) ''Hnowlan: [V: ''+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39210/console" [puppet] - ''https://gerrit.wikimedia.org/r/880898 (https://phabricator.wikimedia.org/T233196) (owner: ''Hnowlan)'
|
2023-01-23 17:59:58
|
<wikibugs>
|
('CR) ''Dzahn: [C: ''+2] "IDP config was removed on both idp servers and Apache config was removed on miscweb, no problem when refreshing apache" [puppet] - ''https://gerrit.wikimedia.org/r/881938 (https://phabricator.wikimedia.org/T327405) (owner: ''Dzahn)'
|
2023-01-23 18:00:05
|
<jouncebot>
|
Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T1800)
|
2023-01-23 18:00:05
|
<jouncebot>
|
ryankemper: I, the Bot under the Fountain, call upon thee, The Deployer, to do Wikidata Query Service weekly deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T1800).
|
2023-01-23 18:00:36
|
<wikibugs>
|
('PS5) ''Bking: flink-operator: bump version to 1.3.1 [deployment-charts] - ''https://gerrit.wikimedia.org/r/881458 (https://phabricator.wikimedia.org/T324576)'
|
2023-01-23 18:00:39
|
<wikibugs>
|
('CR) ''Dzahn: [C: ''+2] "spoke too soon :) apache2.service (apache2-apache2-after-network-online-target)]: Skipping because of failed dependencies" [puppet] - ''https://gerrit.wikimedia.org/r/881938 (https://phabricator.wikimedia.org/T327405) (owner: ''Dzahn)'
|
2023-01-23 18:00:43
|
<wikibugs>
|
('CR) ''CI reject: [V: ''-1] flink-operator: bump version to 1.3.1 [deployment-charts] - ''https://gerrit.wikimedia.org/r/881458 (https://phabricator.wikimedia.org/T324576) (owner: ''Bking)'
|
2023-01-23 18:02:23
|
<icinga-wm>
|
PROBLEM - Check systemd state on miscweb2002 is CRITICAL: CRITICAL - degraded: The following units failed: apache2.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
|
2023-01-23 18:02:35
|
<wikibugs>
|
('CR) ''Hnowlan: [V: ''+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39212/console" [puppet] - ''https://gerrit.wikimedia.org/r/880898 (https://phabricator.wikimedia.org/T233196) (owner: ''Hnowlan)'
|
2023-01-23 18:02:44
|
<logmsgbot>
|
!log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
|
2023-01-23 18:02:57
|
<logmsgbot>
|
!log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
|
2023-01-23 18:03:13
|
<wikibugs>
|
('CR) ''JHathaway: rspamd: vendor github.com/oxc/puppet-rspamd (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/870901 (https://phabricator.wikimedia.org/T325397) (owner: ''JHathaway)'
|
2023-01-23 18:03:17
|
<wikibugs>
|
('CR) ''JHathaway: [C: ''+2] rspamd: vendor github.com/oxc/puppet-rspamd [puppet] - ''https://gerrit.wikimedia.org/r/870901 (https://phabricator.wikimedia.org/T325397) (owner: ''JHathaway)'
|
2023-01-23 18:04:06
|
<icinga-wm>
|
ACKNOWLEDGEMENT - Check systemd state on miscweb2002 is CRITICAL: CRITICAL - degraded: The following units failed: apache2.service daniel_zahn inactive server, debugging in progress https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
|
2023-01-23 18:04:06
|
<icinga-wm>
|
ACKNOWLEDGEMENT - Static CodeReview archive HTTP on miscweb2002 is CRITICAL: connect to address 10.192.16.211 and port 80: Connection refused daniel_zahn inactive server, debugging in progress https://wikitech.wikimedia.org/wiki/Static-codereview.wikimedia.org
|
2023-01-23 18:04:06
|
<icinga-wm>
|
ACKNOWLEDGEMENT - racktables.wikimedia.org requires authentication on miscweb2002 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/1.1 503 Service Unavailable daniel_zahn inactive server, debugging in progress https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
|
2023-01-23 18:05:05
|
<mutante>
|
!log miscweb1002 - disabling puppet because latest merge would break apache if it runs, debugging in progress on inactive miscweb2002
|
2023-01-23 18:05:07
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 18:07:47
|
<jinxer-wm>
|
(JobUnavailable) firing: (2) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
|
2023-01-23 18:08:16
|
<mutante>
|
!log miscweb2002 - unlink /etc/apache2/mods-enabled/auth_cas.conf - unlink /etc/apache2/mods-enabled/auth_cas.load
|
2023-01-23 18:08:18
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 18:08:49
|
<icinga-wm>
|
RECOVERY - Check systemd state on miscweb2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
|
2023-01-23 18:13:33
|
<wikibugs>
|
('PS6) ''Bking: flink-operator: bump version to 1.3.1 [deployment-charts] - ''https://gerrit.wikimedia.org/r/881458 (https://phabricator.wikimedia.org/T324576)'
|
2023-01-23 18:13:46
|
<wikibugs>
|
('CR) ''CI reject: [V: ''-1] flink-operator: bump version to 1.3.1 [deployment-charts] - ''https://gerrit.wikimedia.org/r/881458 (https://phabricator.wikimedia.org/T324576) (owner: ''Bking)'
|
2023-01-23 18:14:29
|
<wikibugs>
|
('CR) ''Dzahn: [C: ''+2] "it still broke because this way puppet did not unload the CAS apache module. so technically should be "ensure absent" instead of just remo" [puppet] - ''https://gerrit.wikimedia.org/r/881938 (https://phabricator.wikimedia.org/T327405) (owner: ''Dzahn)'
|
2023-01-23 18:18:17
|
<wikibugs>
|
('CR) ''Dzahn: [C: ''+2] "profile::idp::client::httpd would need to first get an "$ensure" class parameter that absents the mod_conf and the libapache2-mod-auth-cas" [puppet] - ''https://gerrit.wikimedia.org/r/881938 (https://phabricator.wikimedia.org/T327405) (owner: ''Dzahn)'
|
2023-01-23 18:19:38
|
<mutante>
|
!log miscweb2002 - unlink /etc/apache2/mods-enabled/auth_cas.conf - unlink /etc/apache2/mods-enabled/auth_cas.load - apt-get remove libapache2-mod-auth-cas - T327405
|
2023-01-23 18:19:41
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 18:19:42
|
<stashbot>
|
T327405: Decommission Racktables - https://phabricator.wikimedia.org/T327405
|
2023-01-23 18:22:37
|
<wikibugs>
|
('PS7) ''Bking: flink-operator: bump version to 1.3.1 [deployment-charts] - ''https://gerrit.wikimedia.org/r/881458 (https://phabricator.wikimedia.org/T324576)'
|
2023-01-23 18:23:03
|
<wikibugs>
|
('CR) ''CI reject: [V: ''-1] flink-operator: bump version to 1.3.1 [deployment-charts] - ''https://gerrit.wikimedia.org/r/881458 (https://phabricator.wikimedia.org/T324576) (owner: ''Bking)'
|
2023-01-23 18:28:26
|
<wikibugs>
|
('CR) ''Ssingh: "Sorry, I skipped reviewing this for quite a while. Are we still planning on merging these or are we doing a top-level declaration instead?" [puppet] - ''https://gerrit.wikimedia.org/r/863294 (https://phabricator.wikimedia.org/T308013) (owner: ''Muehlenhoff)'
|
2023-01-23 18:30:20
|
<wikibugs>
|
('PS8) ''Bking: flink-operator: bump version to 1.3.1 [deployment-charts] - ''https://gerrit.wikimedia.org/r/881458 (https://phabricator.wikimedia.org/T324576)'
|
2023-01-23 18:30:59
|
<wikibugs>
|
'SRE: profile::idp::client::httpd should be absent-able - https://phabricator.wikimedia.org/T327678 (''Dzahn)'
|
2023-01-23 18:31:05
|
<wikibugs>
|
('CR) ''CI reject: [V: ''-1] flink-operator: bump version to 1.3.1 [deployment-charts] - ''https://gerrit.wikimedia.org/r/881458 (https://phabricator.wikimedia.org/T324576) (owner: ''Bking)'
|
2023-01-23 18:31:12
|
<wikibugs>
|
('PS1) ''Jelto: gitlab: exclude shell scripts and other backups from rsync jobs [puppet] - ''https://gerrit.wikimedia.org/r/882704 (https://phabricator.wikimedia.org/T274463)'
|
2023-01-23 18:32:13
|
<wikibugs>
|
'SRE, ''Infrastructure-Foundations: profile::idp::client::httpd should be absent-able - https://phabricator.wikimedia.org/T327678 (''Dzahn)'
|
2023-01-23 18:33:31
|
<wikibugs>
|
('PS9) ''Bking: flink-operator: bump version to 1.3.1 [deployment-charts] - ''https://gerrit.wikimedia.org/r/881458 (https://phabricator.wikimedia.org/T324576)'
|
2023-01-23 18:38:32
|
<wikibugs>
|
('PS1) ''Jforrester: Replace wgBetaFeaturesWhitelist with wgBetaFeaturesAllowList, Part I [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882705'
|
2023-01-23 18:38:34
|
<wikibugs>
|
('PS1) ''Jforrester: Replace wgBetaFeaturesWhitelist with wgBetaFeaturesAllowList, Part II [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882706'
|
2023-01-23 18:42:39
|
<wikibugs>
|
('CR) ''Dzahn: [C: ''+1] "looks good to me" [puppet] - ''https://gerrit.wikimedia.org/r/882704 (https://phabricator.wikimedia.org/T274463) (owner: ''Jelto)'
|
2023-01-23 18:48:27
|
<mutante>
|
!log miscweb1002 - unload CAS apache module and config; apt-get remove libapache2-mod-auth-cas
|
2023-01-23 18:48:29
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 18:48:58
|
<jinxer-wm>
|
(KubernetesAPILatency) firing: High Kubernetes API latency (UPDATE certificaterequests) on k8s-staging@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
|
2023-01-23 18:50:15
|
<wikibugs>
|
('CR) ''Dzahn: [C: ''+2] "unloaded the module and removed the package manually on miscweb*, which are fine now. also did a follow-up ticket but not sure how importa" [puppet] - ''https://gerrit.wikimedia.org/r/881938 (https://phabricator.wikimedia.org/T327405) (owner: ''Dzahn)'
|
2023-01-23 18:50:54
|
<wikibugs>
|
('PS2) ''Dzahn: miscweb: remove racktables profile from miscweb role [puppet] - ''https://gerrit.wikimedia.org/r/881694 (https://phabricator.wikimedia.org/T327405)'
|
2023-01-23 18:51:09
|
<wikibugs>
|
'SRE-swift-storage, ''Wikimedia-production-error: FileBackendError: Iterator page I/O error. - https://phabricator.wikimedia.org/T327681 (''TheresNoTime)'
|
2023-01-23 18:51:25
|
<wikibugs>
|
('CR) ''Ottomata: [C: ''+2] flink-app - netpol must use app: <chart>-<release> podSelector [deployment-charts] - ''https://gerrit.wikimedia.org/r/882692 (owner: ''Ottomata)'
|
2023-01-23 18:53:58
|
<jinxer-wm>
|
(KubernetesAPILatency) resolved: High Kubernetes API latency (UPDATE certificaterequests) on k8s-staging@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
|
2023-01-23 18:55:24
|
<TheresNoTime>
|
Just surfacing that T327681 is intermittently causing user facing exceptions — fairly low rate, not consistently repeatable
|
2023-01-23 18:55:24
|
<stashbot>
|
T327681: FileBackendError: Iterator page I/O error. - https://phabricator.wikimedia.org/T327681
|
2023-01-23 18:57:33
|
<wikibugs>
|
('Merged) ''jenkins-bot: flink-app - netpol must use app: <chart>-<release> podSelector [deployment-charts] - ''https://gerrit.wikimedia.org/r/882692 (owner: ''Ottomata)'
|
2023-01-23 19:10:11
|
<jinxer-wm>
|
(Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
|
2023-01-23 19:11:05
|
<wikibugs>
|
'SRE, ''Infrastructure-Foundations: profile::idp::client::httpd should be absent-able - https://phabricator.wikimedia.org/T327678 (''Dzahn) p:''Triage→''Low'
|
2023-01-23 19:12:14
|
<wikibugs>
|
('CR) ''Dzahn: [C: ''+2] "https://puppet-compiler.wmflabs.org/output/881694/39213/" [puppet] - ''https://gerrit.wikimedia.org/r/881694 (https://phabricator.wikimedia.org/T327405) (owner: ''Dzahn)'
|
2023-01-23 19:16:18
|
<logmsgbot>
|
!log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
|
2023-01-23 19:16:31
|
<logmsgbot>
|
!log eevans@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1016.eqiad.wmnet
|
2023-01-23 19:17:53
|
<logmsgbot>
|
!log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
|
2023-01-23 19:17:55
|
<logmsgbot>
|
!log eevans@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1016.eqiad.wmnet
|
2023-01-23 19:18:17
|
<logmsgbot>
|
!log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
|
2023-01-23 19:19:00
|
<wikibugs>
|
('CR) ''Dzahn: "we should not forget there is also this include: modules/profile/manifests/mariadb/grants/production.pp: include passwords::racktables " [puppet] - ''https://gerrit.wikimedia.org/r/881701 (https://phabricator.wikimedia.org/T327405) (owner: ''Dzahn)'
|
2023-01-23 19:24:44
|
<logmsgbot>
|
!log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1016.eqiad.wmnet
|
2023-01-23 19:30:27
|
<logmsgbot>
|
!log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1019.eqiad.wmnet
|
2023-01-23 19:36:24
|
<wikibugs>
|
('PS1) ''Jdrewniak: Enable Page Tools for logged-in users on enwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882715 (https://phabricator.wikimedia.org/T327686)'
|
2023-01-23 19:37:45
|
<logmsgbot>
|
!log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1019.eqiad.wmnet
|
2023-01-23 19:41:49
|
<logmsgbot>
|
!log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1020.eqiad.wmnet
|
2023-01-23 19:48:57
|
<logmsgbot>
|
!log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1020.eqiad.wmnet
|
2023-01-23 19:49:30
|
<wikibugs>
|
'SRE, ''Domains, ''Traffic-Icebox: Redirecting incoming queries to non-existent subpages (due to Godaddy behavior on some external WikiJournal sites) - https://phabricator.wikimedia.org/T212914 (''BCornwall) ''Open→''Resolved a:''BCornwall It looks like they've managed to escape the talons of godaddy...'
|
2023-01-23 19:58:51
|
<wikibugs>
|
('PS1) ''BCornwall: varnish: Reword misc-frontend vcl_switch comment [puppet] - ''https://gerrit.wikimedia.org/r/882716 (https://phabricator.wikimedia.org/T205988)'
|
2023-01-23 19:59:13
|
<wikibugs>
|
'SRE, ''Traffic-Icebox, ''Patch-For-Review: Simplify comment misc-frontend.inc.vcl.erb - https://phabricator.wikimedia.org/T205988 (''BCornwall) ''Open→''In progress a:''BCornwall'
|
2023-01-23 19:59:25
|
<wikibugs>
|
'SRE: Expired puppet certificates - https://phabricator.wikimedia.org/T260110 (''Aklapper)'
|
2023-01-23 19:59:37
|
<wikibugs>
|
'SRE, ''Traffic-Icebox, ''Patch-For-Review: Simplify comment misc-frontend.inc.vcl.erb - https://phabricator.wikimedia.org/T205988 (''BCornwall) Since this ticket is relevant to the comment itself, let's just fix that and follow-up with another, more detailed description of what needs refactoring.'
|
2023-01-23 20:01:13
|
<wikibugs>
|
'SRE, ''CheckUser, ''Traffic: Log source port for anonymous users and expose it for sysops/checkusers - https://phabricator.wikimedia.org/T181368 (''Urbanecm)'
|
2023-01-23 20:05:58
|
<wikibugs>
|
('PS2) ''Krinkle: Use core's PoolCounterClient [mediawiki-config] - ''https://gerrit.wikimedia.org/r/881466 (https://phabricator.wikimedia.org/T327336) (owner: ''Zabe)'
|
2023-01-23 20:07:15
|
<wikibugs>
|
('CR) ''Krinkle: [C: ''+1] "LGTM. This needs careful testing on mwdebug with PC hits and misses, e.g. browse old and current revisions on various articles and confirm" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/881466 (https://phabricator.wikimedia.org/T327336) (owner: ''Zabe)'
|
2023-01-23 20:10:28
|
<wikibugs>
|
'SRE, ''Traffic-Icebox: Consider adding expect-CT: header to enforce certificate transparency - https://phabricator.wikimedia.org/T193521 (''BCornwall) ''Open→''Invalid It's sad that no action was taken in the years since the report has been opened, but it appears that @tgr is correct and it's ready to be...'
|
2023-01-23 20:19:24
|
<wikibugs>
|
('CR) ''Dzahn: admin/canary_appserver: add group of users allowed to disable puppet (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/879147 (https://phabricator.wikimedia.org/T305979) (owner: ''Dzahn)'
|
2023-01-23 20:23:19
|
<wikibugs>
|
'SRE, ''Traffic-Icebox: Unwanted service startups and their triggers - https://phabricator.wikimedia.org/T191017 (''BCornwall) ''Open→''Resolved a:''BCornwall `systemctl mask` achieves what is desired here and has been successfully implemented with varnishncsa.service and varnishlog.service (see `Change...'
|
2023-01-23 20:26:14
|
<wikibugs>
|
('PS1) ''Andrea Denisse: centrallog2002: Apply partman standard software raid recipe [puppet] - ''https://gerrit.wikimedia.org/r/882718 (https://phabricator.wikimedia.org/T313858)'
|
2023-01-23 20:31:46
|
<wikibugs>
|
('CR) ''Andrea Denisse: [V: ''+1] "PCC SUCCESS (NOOP 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39214/console" [puppet] - ''https://gerrit.wikimedia.org/r/882718 (https://phabricator.wikimedia.org/T313858) (owner: ''Andrea Denisse)'
|
2023-01-23 20:45:16
|
<taavi>
|
!log restart T315510 on group1 after mwmaint restart, currently running on wikidatawiki
|
2023-01-23 20:45:18
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 20:45:20
|
<stashbot>
|
T315510: Start maintenance script to backfill talk page comment database - https://phabricator.wikimedia.org/T315510
|
2023-01-23 20:45:40
|
<wikibugs>
|
('CR) ''Thcipriani: [C: ''+1] admin/canary_appserver: add group of users allowed to disable puppet [puppet] - ''https://gerrit.wikimedia.org/r/879147 (https://phabricator.wikimedia.org/T305979) (owner: ''Dzahn)'
|
2023-01-23 20:45:54
|
<wikibugs>
|
('CR) ''Andrea Denisse: "PCC results: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39214/console" [puppet] - ''https://gerrit.wikimedia.org/r/882718 (https://phabricator.wikimedia.org/T313858) (owner: ''Andrea Denisse)'
|
2023-01-23 20:56:10
|
<logmsgbot>
|
!log otto@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
|
2023-01-23 20:56:14
|
<logmsgbot>
|
!log otto@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
|
2023-01-23 20:58:58
|
<wikibugs>
|
'SRE, ''Traffic-Icebox, ''Patch-For-Review: Remove unused plain HTTP services from LVS - https://phabricator.wikimedia.org/T236065 (''BCornwall) I also am not sure of how to find out consumers of the HTTP-only services, but I've created a WIP patch that at least lists the candidates.'
|
2023-01-23 21:00:05
|
<jouncebot>
|
RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T2100).
|
2023-01-23 21:00:05
|
<jouncebot>
|
jan_drewniak: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
|
2023-01-23 21:00:19
|
<kindrobot>
|
I can deploy.
|
2023-01-23 21:00:56
|
<jan_drewniak>
|
kindrobot: ok thanks
|
2023-01-23 21:01:54
|
<kindrobot>
|
!log start UTC late backport window
|
2023-01-23 21:01:56
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 21:02:20
|
<wikibugs>
|
('CR) ''TrainBranchBot: [C: ''+2] "Approved by kindrobot@deploy1002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882715 (https://phabricator.wikimedia.org/T327686) (owner: ''Jdrewniak)'
|
2023-01-23 21:02:57
|
<wikibugs>
|
('Merged) ''jenkins-bot: Enable Page Tools for logged-in users on enwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882715 (https://phabricator.wikimedia.org/T327686) (owner: ''Jdrewniak)'
|
2023-01-23 21:03:10
|
<TheresNoTime>
|
(thanks kindrobot, I'm finally back in the "right timezone" so should be able to pick up more again!)
|
2023-01-23 21:03:11
|
<logmsgbot>
|
!log kindrobot@deploy1002 Started scap: Backport for [[gerrit:882715|Enable Page Tools for logged-in users on enwiki (T327686)]]
|
2023-01-23 21:03:15
|
<stashbot>
|
T327686: Deploy page tools for logged-in users on English Wikipedia - https://phabricator.wikimedia.org/T327686
|
2023-01-23 21:04:35
|
<kindrobot>
|
My pleasure TheresNoTime! The only days I am free for this window are Monday and Wednesday, so I try to pick up one of those a week if I can.
|
2023-01-23 21:04:54
|
<logmsgbot>
|
!log kindrobot@deploy1002 jdrewniak and kindrobot: Backport for [[gerrit:882715|Enable Page Tools for logged-in users on enwiki (T327686)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
|
2023-01-23 21:05:04
|
<kindrobot>
|
jan_drewniak: can you confirm?
|
2023-01-23 21:05:58
|
<jan_drewniak>
|
kindrobot: yup looks good
|
2023-01-23 21:06:08
|
<kindrobot>
|
Great, syncing.
|
2023-01-23 21:09:29
|
<wikibugs>
|
('PS1) ''Andrea Denisse: centrallog1002: Add to eqiad anycast_neighbors [homer/public] - ''https://gerrit.wikimedia.org/r/882724 (https://phabricator.wikimedia.org/T318778)'
|
2023-01-23 21:10:09
|
<wikibugs>
|
('PS2) ''Bking: flink-kubernetes-operator: bump version to 1.3.1 [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/881907 (https://phabricator.wikimedia.org/T324576)'
|
2023-01-23 21:11:45
|
<wikibugs>
|
('CR) ''Ottomata: flink-kubernetes-operator: bump version to 1.3.1 (''1 comment) [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/881907 (https://phabricator.wikimedia.org/T324576) (owner: ''Bking)'
|
2023-01-23 21:12:12
|
<logmsgbot>
|
!log kindrobot@deploy1002 Finished scap: Backport for [[gerrit:882715|Enable Page Tools for logged-in users on enwiki (T327686)]] (duration: 09m 00s)
|
2023-01-23 21:12:16
|
<stashbot>
|
T327686: Deploy page tools for logged-in users on English Wikipedia - https://phabricator.wikimedia.org/T327686
|
2023-01-23 21:12:43
|
<kindrobot>
|
!log close UTC late backport window
|
2023-01-23 21:12:44
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 21:22:11
|
<logmsgbot>
|
!log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1021.eqiad.wmnet
|
2023-01-23 21:23:36
|
<wikibugs>
|
('PS1) ''Zabe: throttle: Remove expired rule [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882746'
|
2023-01-23 21:26:13
|
<wikibugs>
|
'SRE, ''DNS, ''Traffic-Icebox, ''Wikimedia-Apache-configuration, ''Patch-For-Review: Remove aliases `minnan` and `zh-cfr` for the Min Nan Wikipedia - https://phabricator.wikimedia.org/T230382 (''BCornwall) I've brought the issue up with langcom on their [[ https://meta.wikimedia.org/wiki/Talk:Language_c...'
|
2023-01-23 21:26:23
|
<wikibugs>
|
'SRE, ''DNS, ''Traffic-Icebox, ''Wikimedia-Apache-configuration, ''Patch-For-Review: Remove aliases `minnan` and `zh-cfr` for the Min Nan Wikipedia - https://phabricator.wikimedia.org/T230382 (''BCornwall) ''Open→''In progress'
|
2023-01-23 21:29:20
|
<logmsgbot>
|
!log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1021.eqiad.wmnet
|
2023-01-23 21:31:25
|
<wikibugs>
|
('CR) ''Zabe: [C: ''+2] throttle: Remove expired rule [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882746 (owner: ''Zabe)'
|
2023-01-23 21:31:47
|
<wikibugs>
|
('PS1) ''Andrea Denisse: centrallog: Add centrallog1002 as Kafka broker [puppet] - ''https://gerrit.wikimedia.org/r/882747 (https://phabricator.wikimedia.org/T318778)'
|
2023-01-23 21:32:09
|
<wikibugs>
|
('Merged) ''jenkins-bot: throttle: Remove expired rule [mediawiki-config] - ''https://gerrit.wikimedia.org/r/882746 (owner: ''Zabe)'
|
2023-01-23 21:32:53
|
<logmsgbot>
|
!log zabe@deploy1002 Started scap: Backport for [[gerrit:882746|throttle: Remove expired rule]]
|
2023-01-23 21:34:35
|
<logmsgbot>
|
!log zabe@deploy1002 zabe: Backport for [[gerrit:882746|throttle: Remove expired rule]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
|
2023-01-23 21:35:26
|
<logmsgbot>
|
!log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1028.eqiad.wmnet
|
2023-01-23 21:36:11
|
<wikibugs>
|
('PS1) ''Nray: Work around sticky-positioned layers disabling subpixel rendering [skins/Vector] (wmf/1.40.0-wmf.19) - ''https://gerrit.wikimedia.org/r/882727 (https://phabricator.wikimedia.org/T327460)'
|
2023-01-23 21:40:30
|
<wikibugs>
|
'SRE, ''Traffic-Icebox, ''Patch-For-Review: Remove unused plain HTTP services from LVS - https://phabricator.wikimedia.org/T236065 (''BCornwall) ''Open→''In progress bblack has some ideas: ` 13:27 <bblack> we don't have a 100% reliable spot-check to know for sure 13:27 <bblack> but yeah, we can guestim...'
|
2023-01-23 21:40:39
|
<logmsgbot>
|
!log zabe@deploy1002 Finished scap: Backport for [[gerrit:882746|throttle: Remove expired rule]] (duration: 07m 45s)
|
2023-01-23 21:42:51
|
<logmsgbot>
|
!log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1028.eqiad.wmnet
|
2023-01-23 21:49:25
|
<wikibugs>
|
('CR) ''Cwhite: [C: ''+2] logstash: enable filters for ecs 1.11.0 [puppet] - ''https://gerrit.wikimedia.org/r/881812 (https://phabricator.wikimedia.org/T326794) (owner: ''Cwhite)'
|
2023-01-23 22:00:04
|
<jouncebot>
|
Reedy, sbassett, Maryum, and manfredi: It is that lovely time of the day again! You are hereby commanded to deploy Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230123T2200).
|
2023-01-23 22:07:47
|
<jinxer-wm>
|
(JobUnavailable) firing: (2) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
|
2023-01-23 22:08:35
|
<sbassett>
|
Hey all - had a couple of security patches we were going to try to deploy today: T285159, T296593
|
2023-01-23 22:14:18
|
<jinxer-wm>
|
(ProbeDown) firing: (2) Service thanos-query:443 has failed probes (http_thanos-query_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
|
2023-01-23 22:14:49
|
<icinga-wm>
|
PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - thanos-query_443: Servers thanos-fe1003.eqiad.wmnet are marked down but pooled: thanos-web_443: Servers thanos-fe1001.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
|
2023-01-23 22:15:15
|
<icinga-wm>
|
PROBLEM - Check systemd state on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The following units failed: swift_ring_manager.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
|
2023-01-23 22:16:27
|
<icinga-wm>
|
RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
|
2023-01-23 22:19:18
|
<jinxer-wm>
|
(ProbeDown) resolved: (2) Service thanos-query:443 has failed probes (http_thanos-query_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
|
2023-01-23 22:22:01
|
<wikibugs>
|
('PS1) ''Bking: dse-k8s: add rdf-streaming-update-ng namespace [puppet] - ''https://gerrit.wikimedia.org/r/882748 (https://phabricator.wikimedia.org/T289836)'
|
2023-01-23 22:24:40
|
<wikibugs>
|
('CR) ''Cwhite: [C: ''-1] mediawiki: Update ecs logging to 1.11.0 (''2 comments) [deployment-charts] - ''https://gerrit.wikimedia.org/r/881877 (owner: ''Clément Goubert)'
|
2023-01-23 22:25:54
|
<wikibugs>
|
('CR) ''Cwhite: [C: ''+2] Clarify ecs.version field format in docs [software/ecs] - ''https://gerrit.wikimedia.org/r/881809 (https://phabricator.wikimedia.org/T292585) (owner: ''Cwhite)'
|
2023-01-23 22:26:54
|
<wikibugs>
|
('CR) ''Cwhite: [C: ''+2] add error.stack.previous_trace field [software/ecs] - ''https://gerrit.wikimedia.org/r/831943 (https://phabricator.wikimedia.org/T314098) (owner: ''Cwhite)'
|
2023-01-23 22:27:25
|
<wikibugs>
|
('Merged) ''jenkins-bot: add error.stack.previous_trace field [software/ecs] - ''https://gerrit.wikimedia.org/r/831943 (https://phabricator.wikimedia.org/T314098) (owner: ''Cwhite)'
|
2023-01-23 22:27:47
|
<wikibugs>
|
('CR) ''Cwhite: [C: ''+2] Clarify ecs.version field format in docs [software/ecs] - ''https://gerrit.wikimedia.org/r/881809 (https://phabricator.wikimedia.org/T292585) (owner: ''Cwhite)'
|
2023-01-23 22:28:15
|
<wikibugs>
|
('Merged) ''jenkins-bot: Clarify ecs.version field format in docs [software/ecs] - ''https://gerrit.wikimedia.org/r/881809 (https://phabricator.wikimedia.org/T292585) (owner: ''Cwhite)'
|
2023-01-23 22:28:46
|
<wikibugs>
|
('CR) ''Cwhite: [C: ''+2] role: remove kibana7_ecs role [puppet] - ''https://gerrit.wikimedia.org/r/879888 (owner: ''Cwhite)'
|
2023-01-23 22:31:58
|
<maryum>
|
!log Deployed patch for T285159
|
2023-01-23 22:32:00
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 22:33:47
|
<wikibugs>
|
('CR) ''Cwhite: [C: ''+1] "Tested upgrade and initial install on beta. Works great!" [puppet] - ''https://gerrit.wikimedia.org/r/849631 (https://phabricator.wikimedia.org/T304440) (owner: ''Hashar)'
|
2023-01-23 22:37:13
|
<logmsgbot>
|
!log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1017.eqiad.wmnet
|
2023-01-23 22:45:09
|
<logmsgbot>
|
!log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1017.eqiad.wmnet
|
2023-01-23 22:46:24
|
<ryankemper>
|
!log [WDQS Deploy] Gearing up for deploy of wdqs `0.3.119`. Pre-deploy tests passing on canary `wdqs1003`
|
2023-01-23 22:46:25
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 22:48:59
|
<logmsgbot>
|
!log ryankemper@deploy1002 Started deploy [wdqs/wdqs@544f5f3]: 0.3.119
|
2023-01-23 22:49:52
|
<ryankemper>
|
!log [WDQS Deploy] Tests passing following deploy of `0.3.119` on canary `wdqs1003`; proceeding to rest of fleet
|
2023-01-23 22:49:53
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 22:52:42
|
<logmsgbot>
|
!log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1022.eqiad.wmnet
|
2023-01-23 22:56:29
|
<logmsgbot>
|
!log ryankemper@deploy1002 Finished deploy [wdqs/wdqs@544f5f3]: 0.3.119 (duration: 07m 30s)
|
2023-01-23 22:57:43
|
<ryankemper>
|
!log [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
|
2023-01-23 22:57:44
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 22:57:48
|
<ryankemper>
|
!log [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
|
2023-01-23 22:57:50
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 22:57:55
|
<ryankemper>
|
!log [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
|
2023-01-23 22:57:56
|
<stashbot>
|
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
|
2023-01-23 22:59:50
|
<logmsgbot>
|
!log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1022.eqiad.wmnet
|
2023-01-23 23:07:41
|
<logmsgbot>
|
!log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1023.eqiad.wmnet
|
2023-01-23 23:10:11
|
<jinxer-wm>
|
(Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
|
2023-01-23 23:11:33
|
<icinga-wm>
|
RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
|
2023-01-23 23:16:51
|
<logmsgbot>
|
!log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1023.eqiad.wmnet
|
2023-01-23 23:17:12
|
<logmsgbot>
|
!log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1024.eqiad.wmnet
|
2023-01-23 23:24:20
|
<logmsgbot>
|
!log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1024.eqiad.wmnet
|
2023-01-23 23:24:32
|
<logmsgbot>
|
!log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host restbase1029.eqiad.wmnet
|
2023-01-23 23:31:35
|
<wikibugs>
|
('CR) ''Cwhite: [C: ''-2] "Blocking until we can work out a path forward." [puppet] - ''https://gerrit.wikimedia.org/r/880500 (https://phabricator.wikimedia.org/T325806) (owner: ''Filippo Giunchedi)'
|
2023-01-23 23:31:43
|
<logmsgbot>
|
!log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1029.eqiad.wmnet
|
2023-01-23 23:51:55
|
<wikibugs>
|
('PS2) ''Cwhite: logstash: Add PTR resolution to firewall logs [puppet] - ''https://gerrit.wikimedia.org/r/880889 (https://phabricator.wikimedia.org/T327095) (owner: ''Ayounsi)'
|
2023-01-23 23:57:59
|
<wikibugs>
|
('CR) ''Cwhite: logstash: Add PTR resolution to firewall logs (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/880889 (https://phabricator.wikimedia.org/T327095) (owner: ''Ayounsi)'
|