[00:04:20] <icinga-wm>	 PROBLEM - Swift https backend on ms-fe1010 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 309 bytes in 7.067 second response time https://wikitech.wikimedia.org/wiki/Swift
[00:06:15] <wikibugs>	 10SRE-swift-storage: Spike in Swift errors - https://phabricator.wikimedia.org/T313102 (10tstarling) p:05Triage→03Unbreak! Logstash search for SwiftFileBackend {F35317523}
[00:06:42] <icinga-wm>	 RECOVERY - Swift https backend on ms-fe1010 is OK: HTTP OK: HTTP/1.1 200 OK - 451 bytes in 0.036 second response time https://wikitech.wikimedia.org/wiki/Swift
[00:11:00] <icinga-wm>	 RECOVERY - Swift https frontend on ms-fe1010 is OK: HTTP OK: HTTP/1.1 200 OK - 245 bytes in 0.007 second response time https://wikitech.wikimedia.org/wiki/Swift
[00:13:50] <icinga-wm>	 PROBLEM - Check systemd state on doc1002 is CRITICAL: CRITICAL - degraded: The following units failed: rsync-doc-doc2001.codfw.wmnet.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:19:08] <icinga-wm>	 PROBLEM - Swift https backend on ms-fe1010 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 309 bytes in 7.094 second response time https://wikitech.wikimedia.org/wiki/Swift
[00:26:24] <icinga-wm>	 RECOVERY - Swift https backend on ms-fe1010 is OK: HTTP OK: HTTP/1.1 200 OK - 451 bytes in 0.022 second response time https://wikitech.wikimedia.org/wiki/Swift
[00:27:18] <icinga-wm>	 PROBLEM - Check systemd state on webperf1004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_compress_logs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:27:33] <wikibugs>	 10SRE-swift-storage: Spike in Swift errors - https://phabricator.wikimedia.org/T313102 (10tstarling) You can see it in the nginx log sizes on ms-fe1010:  ` -rw-r----- 1 www-data www-data   6904870 Jul 15 00:24 unified.error.log -rw-r----- 1 www-data www-data 667552422 Jul 15 00:00 unified.error.log.1 -rw-r-----...
[00:30:49] <TimStarling>	 !log on ms-fe1010 restarting swift-proxy
[00:30:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:39:06] <wikibugs>	 10SRE-swift-storage: Spike in Swift errors - https://phabricator.wikimedia.org/T313102 (10tstarling) I restarted swift-proxy on ms-fe1010, which I think has fixed it. Here's how I realised it was a problem specific to ms-fe1010:  {F35317550}
[00:43:24] <wikibugs>	 10SRE-swift-storage: Spike in Swift errors - https://phabricator.wikimedia.org/T313102 (10tstarling) p:05Unbreak!→03Medium Logstash, CPU usage and nginx logs all show recovery. I will leave it open at reduced priority until the relevant SRE folks see it, for post mortem analysis and followup.
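The Swift backend alerts above flapped twice before the swift-proxy restart. A minimal sketch, assuming the four icinga-wm lines are reduced to timestamp, state and check name (the `LOG` sample, `PATTERN` and `outage_windows` are illustrative names, not part of any Wikimedia tooling), pairs each PROBLEM with the next RECOVERY and reports the gap in seconds. The gaps measure time between IRC notifications, not the true outage length.

```python
import re
from datetime import datetime

# Shortened copies of the icinga-wm lines for the Swift https backend check.
LOG = """\
[00:04:20] PROBLEM - Swift https backend on ms-fe1010 is CRITICAL
[00:06:42] RECOVERY - Swift https backend on ms-fe1010 is OK
[00:19:08] PROBLEM - Swift https backend on ms-fe1010 is CRITICAL
[00:26:24] RECOVERY - Swift https backend on ms-fe1010 is OK
"""

# Captures: timestamp, PROBLEM/RECOVERY, and the check name up to " is <state>".
PATTERN = re.compile(
    r"\[(\d{2}:\d{2}:\d{2})\]\s+(PROBLEM|RECOVERY) - (.+?) is (?:CRITICAL|OK)"
)


def outage_windows(text):
    """Return (check name, seconds) for each PROBLEM -> RECOVERY pair."""
    open_since = {}  # check name -> time of the first unresolved PROBLEM
    windows = []
    for line in text.splitlines():
        match = PATTERN.search(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        state, check = match.group(2), match.group(3)
        if state == "PROBLEM":
            # Keep the earliest PROBLEM if the alert repeats before recovering.
            open_since.setdefault(check, stamp)
        elif check in open_since:
            start = open_since.pop(check)
            windows.append((check, int((stamp - start).total_seconds())))
    return windows


for check, seconds in outage_windows(LOG):
    print(f"{check}: {seconds}s between PROBLEM and RECOVERY")
```

The sketch assumes all timestamps fall on the same day, which holds for the lines shown here.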
[01:07:48] <icinga-wm>	 RECOVERY - Check systemd state on doc1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:18:44] <icinga-wm>	 RECOVERY - Check systemd state on webperf1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:37:45] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job workhorse in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:42:45] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job nginx in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:47:45] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job nginx in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:52:45] <jinxer-wm>	 (JobUnavailable) resolved: (4) Reduced availability for job nginx in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:17:43] <wikibugs>	 (03CR) 10RLazarus: [C: 03+1] fix flask/jinja2 semver snafu [software/klaxon] - 10https://gerrit.wikimedia.org/r/813938 (owner: 10CDanis)
[03:18:08] <wikibugs>	 (03CR) 10RLazarus: [C: 03+1] restore styling accidentally removed in 16f1d6c [software/klaxon] - 10https://gerrit.wikimedia.org/r/813939 (owner: 10CDanis)
[03:22:14] <wikibugs>	 (03CR) 10RLazarus: Don't hardcode v1 of the api in the base path (031 comment) [software/klaxon] - 10https://gerrit.wikimedia.org/r/813940 (owner: 10CDanis)
[03:23:38] <icinga-wm>	 PROBLEM - WDQS SPARQL on wdqs1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:25:52] <icinga-wm>	 RECOVERY - WDQS SPARQL on wdqs1012 is OK: HTTP OK: HTTP/1.1 200 OK - 689 bytes in 1.067 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:26:46] <wikibugs>	 (03CR) 10RLazarus: [C: 03+1] Don't hardcode v1 of the api in the base path (031 comment) [software/klaxon] - 10https://gerrit.wikimedia.org/r/813940 (owner: 10CDanis)
[03:43:19] <wikibugs>	 (03CR) 10RLazarus: [C: 03+1] Add support for fetching current oncallers (033 comments) [software/klaxon] - 10https://gerrit.wikimedia.org/r/813941 (owner: 10CDanis)
[03:59:54] <wikibugs>	 (03CR) 10RLazarus: [C: 03+1] display current oncallers in Klaxon UI (032 comments) [software/klaxon] - 10https://gerrit.wikimedia.org/r/813942 (owner: 10CDanis)
[04:29:46] <icinga-wm>	 PROBLEM - Check systemd state on contint2001 is CRITICAL: CRITICAL - degraded: The following units failed: docker-system-prune-dangling.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:02:29] <wikibugs>	 (03PS1) 10Marostegui: Revert "db1135,dbproxy1021: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/813960
[05:04:00] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1135 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31114 and previous config saved to /var/cache/conftool/dbconfig/20220715-050400-root.json
[05:04:01] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] Revert "db1135,dbproxy1021: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/813960 (owner: 10Marostegui)
[05:09:26] <wikibugs>	 (03PS2) 10Krinkle: xenon: Switch to systemd::sysuser [puppet] - 10https://gerrit.wikimedia.org/r/804546 (owner: 10Muehlenhoff)
[05:10:11] <wikibugs>	 (03CR) 10Krinkle: [C: 03+1] "Perhaps we should rename this at some point, to match the current service and directory naming." [puppet] - 10https://gerrit.wikimedia.org/r/804546 (owner: 10Muehlenhoff)
[05:19:04] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1135 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31115 and previous config saved to /var/cache/conftool/dbconfig/20220715-051904-root.json
[05:20:55] <wikibugs>	 (03PS1) 10Krinkle: WIP: Testing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814011
[05:20:57] <wikibugs>	 (03PS1) 10Krinkle: [DNM] Verify buildConfigCache.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814012
[05:21:47] <wikibugs>	 (03PS2) 10Krinkle: tests: Move buildConfigCache.php to tests/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814011 (https://phabricator.wikimedia.org/T169821)
[05:21:49] <wikibugs>	 (03PS2) 10Krinkle: [DNM] Verify buildConfigCache.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814012
[05:22:48] <wikibugs>	 (03PS3) 10Krinkle: tests: Move buildConfigCache.php to tests/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814011 (https://phabricator.wikimedia.org/T169821)
[05:22:50] <wikibugs>	 (03PS3) 10Krinkle: [DNM] Verify buildConfigCache.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814012
[05:25:11] <wikibugs>	 (03PS4) 10Krinkle: tests: Move buildConfigCache.php to tests/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814011 (https://phabricator.wikimedia.org/T169821)
[05:25:13] <wikibugs>	 (03PS4) 10Krinkle: [DNM] Verify buildConfigCache.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814012
[05:32:45] <wikibugs>	 (03PS5) 10Krinkle: tests: Move buildConfigCache.php to tests/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814011 (https://phabricator.wikimedia.org/T169821)
[05:33:20] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] tests: Move buildConfigCache.php to tests/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814011 (https://phabricator.wikimedia.org/T169821) (owner: 10Krinkle)
[05:34:08] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1135 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31116 and previous config saved to /var/cache/conftool/dbconfig/20220715-053408-root.json
[05:35:14] <wikibugs>	 (03PS6) 10Krinkle: tests: Move buildConfigCache.php to tests/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814011 (https://phabricator.wikimedia.org/T169821)
[05:35:42] <wikibugs>	 (03PS7) 10Krinkle: tests: Move buildConfigCache.php to tests/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814011 (https://phabricator.wikimedia.org/T169821)
[05:49:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1135 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31117 and previous config saved to /var/cache/conftool/dbconfig/20220715-054912-root.json
[06:04:16] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31118 and previous config saved to /var/cache/conftool/dbconfig/20220715-060416-root.json
[06:08:37] <ryankemper>	 !log T311939 Updated list of masters for psi-codfw search to `elastic2027.codfw.wmnet:9700,elastic2029.codfw.wmnet:9700,elastic2054.codfw.wmnet:9700`
[06:08:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:08:40] <stashbot>	 T311939: Degraded RAID on elastic2049 - https://phabricator.wikimedia.org/T311939
[06:11:24] <wikibugs>	 10SRE, 10ops-codfw, 10Discovery-Search, 10Elasticsearch, 10Patch-For-Review: Degraded RAID on elastic2049 - https://phabricator.wikimedia.org/T311939 (10RKemper) Following method in https://phabricator.wikimedia.org/T294805#7701855, set the new codfw psi seeds:  With: ` ryankemper@mwmaint1002:~/elastic$...
[06:15:00] <RhinosF1>	 Thanks ryankemper
[06:17:08] <wikibugs>	 10SRE-swift-storage: Spike in Swift errors - https://phabricator.wikimedia.org/T313102 (10RhinosF1) ms-fe1010 was flapping saying its frontend was critical yesterday.
[06:19:20] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31119 and previous config saved to /var/cache/conftool/dbconfig/20220715-061920-root.json
[06:30:51] <wikibugs>	 (03PS1) 10KartikMistry: Enable Content and Section translation on WPs with NLLB-200 MT support [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814015 (https://phabricator.wikimedia.org/T309384)
[06:31:27] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Enable Content and Section translation on WPs with NLLB-200 MT support [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814015 (https://phabricator.wikimedia.org/T309384) (owner: 10KartikMistry)
[06:34:24] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31120 and previous config saved to /var/cache/conftool/dbconfig/20220715-063424-root.json
[06:34:53] <wikibugs>	 (03PS2) 10KartikMistry: Enable Content and Section translation on WPs with NLLB-200 MT support [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814015 (https://phabricator.wikimedia.org/T309384)
[06:35:52] <wikibugs>	 (03PS3) 10Marostegui: core.pp: Make sync_binlog and trx_commit configurable [puppet] - 10https://gerrit.wikimedia.org/r/813917
[06:48:33] <icinga-wm>	 PROBLEM - Check systemd state on logstash2026 is CRITICAL: CRITICAL - degraded: The following units failed: curator_actions_cluster_wide.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:49:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31121 and previous config saved to /var/cache/conftool/dbconfig/20220715-064928-root.json
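The db1135 repool above steps through 1, 2, 5, 10, 25, 50, 75 and 100 percent. A minimal sketch, assuming the logmsgbot lines are trimmed to the timestamp and the quoted commit reason (`LOG`, `PATTERN`, `ramp_schedule` and `step_intervals` are hypothetical names, and this is a log parser, not how dbctl itself works), recovers that ramp and the spacing between steps.

```python
import re
from datetime import datetime

# Trimmed copies of the dbctl commit lines for db1135.
LOG = """\
[05:04:00] dbctl commit (dc=all): 'db1135 (re)pooling @ 1%: After maintenance'
[05:19:04] dbctl commit (dc=all): 'db1135 (re)pooling @ 2%: After maintenance'
[05:34:08] dbctl commit (dc=all): 'db1135 (re)pooling @ 5%: After maintenance'
[05:49:12] dbctl commit (dc=all): 'db1135 (re)pooling @ 10%: After maintenance'
[06:04:16] dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: After maintenance'
[06:19:20] dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: After maintenance'
[06:34:24] dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: After maintenance'
[06:49:28] dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: After maintenance'
"""

# Captures: timestamp, host name, pooled percentage.
PATTERN = re.compile(r"\[(\d{2}:\d{2}:\d{2})\].*?'(\w+) \(re\)pooling @ (\d+)%")


def ramp_schedule(text):
    """Return (time, host, percent) tuples in log order."""
    steps = []
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S")
            steps.append((stamp, match.group(2), int(match.group(3))))
    return steps


def step_intervals(steps):
    """Seconds elapsed between consecutive ramp steps."""
    return [int((b[0] - a[0]).total_seconds()) for a, b in zip(steps, steps[1:])]


steps = ramp_schedule(LOG)
print([percent for _, _, percent in steps])
print(step_intervals(steps))
```

On these lines every interval comes out at 904 seconds, roughly fifteen minutes per step.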
[06:53:54] <wikibugs>	 (03PS1) 10Marostegui: db2084: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/814094 (https://phabricator.wikimedia.org/T311493)
[06:54:59] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db2084: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/814094 (https://phabricator.wikimedia.org/T311493) (owner: 10Marostegui)
[06:57:07] <wikibugs>	 (03PS1) 10Marostegui: mariadb: Productionize db2166 [puppet] - 10https://gerrit.wikimedia.org/r/814095 (https://phabricator.wikimedia.org/T311493)
[06:58:19] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] mariadb: Productionize db2166 [puppet] - 10https://gerrit.wikimedia.org/r/814095 (https://phabricator.wikimedia.org/T311493) (owner: 10Marostegui)
[07:00:05] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220715T0700)
[07:10:15] <wikibugs>	 (03PS1) 10Marostegui: site.pp: Remove db2166 from insetup [puppet] - 10https://gerrit.wikimedia.org/r/814096 (https://phabricator.wikimedia.org/T311493)
[07:11:28] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] site.pp: Remove db2166 from insetup [puppet] - 10https://gerrit.wikimedia.org/r/814096 (https://phabricator.wikimedia.org/T311493) (owner: 10Marostegui)
[07:16:11] <wikibugs>	 (03PS1) 10Marostegui: change_change_time_T313070.py: New schema change [software/schema-changes] - 10https://gerrit.wikimedia.org/r/814097 (https://phabricator.wikimedia.org/T313070)
[07:26:03] <moritzm>	 !log update thirdparty/node14 to Node 14.20.0
[07:26:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:26:22] <moritzm>	 !log update thirdparty/node16 to Node 16.16.0
[07:26:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:40:25] <wikibugs>	 10SRE, 10LDAP-Access-Requests: Grant Access to LDAP wmf group for Aline Bruenger WMDE - https://phabricator.wikimedia.org/T312220 (10karapayneWMDE) >>! In T312220#8066406, @jhathaway wrote: > @karapayneWMDE do you happen to know?   Apologies for the delay, Aline is indeed a new WMDE employee. They're not in my...
[07:42:31] <wikibugs>	 (03CR) 10David Caro: [C: 03+1] "LGTM, just fix the errors in jenkins (see message below), feel free to ignore the nits." [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/812916 (owner: 10Nskaggs)
[07:54:22] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/page/mobile-html-offline-resources/{title} (Get offline resource links to accompany page content HTML for test page) is CRITICAL: Test Get offline resource links to accompany page content HTML for test page returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[07:56:32] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[08:00:26] <icinga-wm>	 PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:02:06] <icinga-wm>	 PROBLEM - SSH on db1109.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:04:10] <icinga-wm>	 PROBLEM - SSH on wtp1044.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:12:50] <wikibugs>	 10SRE, 10ops-eqiad, 10DBA: eqiad: move non WMCS servers out of rack D5 - https://phabricator.wikimedia.org/T308331 (10ayounsi) 05Resolved→03Open https://netbox.wikimedia.org/dcim/devices/2612/ and https://netbox.wikimedia.org/dcim/devices/2252/ still show up as being in rack `D5` but cabled to a differen...
[08:20:11] <wikibugs>	 10SRE, 10ops-eqiad, 10DBA: eqiad: move non WMCS servers out of rack D5 - https://phabricator.wikimedia.org/T308331 (10ayounsi)
[09:03:30] <icinga-wm>	 RECOVERY - SSH on db1109.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:07:39] <wikibugs>	 (03PS1) 10Cparle: Update config for commons custommatch search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814108
[09:13:05] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[09:13:18] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[09:15:25] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+1] change_change_time_T313070.py: New schema change [software/schema-changes] - 10https://gerrit.wikimedia.org/r/814097 (https://phabricator.wikimedia.org/T313070) (owner: 10Marostegui)
[09:15:40] <icinga-wm>	 PROBLEM - Check systemd state on logstash1026 is CRITICAL: CRITICAL - degraded: The following units failed: curator_actions_cluster_wide.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:22:29] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+1] "It works https://integration.wikimedia.org/ci/job/operations-mw-config-php72-composer-diffConfig-docker/12389/console" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814011 (https://phabricator.wikimedia.org/T169821) (owner: 10Krinkle)
[09:24:38] <icinga-wm>	 RECOVERY - ElasticSearch setting check - 9400 on elastic2047 is OK: OK - All good! https://wikitech.wikimedia.org/wiki/Search%23Administration
[09:28:08] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] change_change_time_T313070.py: New schema change [software/schema-changes] - 10https://gerrit.wikimedia.org/r/814097 (https://phabricator.wikimedia.org/T313070) (owner: 10Marostegui)
[09:28:35] <wikibugs>	 (03Merged) 10jenkins-bot: change_change_time_T313070.py: New schema change [software/schema-changes] - 10https://gerrit.wikimedia.org/r/814097 (https://phabricator.wikimedia.org/T313070) (owner: 10Marostegui)
[09:30:41] <wikibugs>	 (03PS1) 10Cparle: Make weighted_tags search default for commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814111
[09:34:31] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1123.eqiad.wmnet with reason: Maintenance
[09:34:45] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1123.eqiad.wmnet with reason: Maintenance
[09:34:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1123 (T312984)', diff saved to https://phabricator.wikimedia.org/P31123 and previous config saved to /var/cache/conftool/dbconfig/20220715-093449-ladsgroup.json
[09:34:53] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[09:37:30] <wikibugs>	 (03CR) 10Matthias Mullie: [C: 03+1] Make weighted_tags search default for commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814111 (owner: 10Cparle)
[09:37:33] <wikibugs>	 (03CR) 10Matthias Mullie: [C: 03+1] Update config for commons custommatch search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814108 (owner: 10Cparle)
[09:38:12] <Amir1>	 !log killed refreshLinkRecommendations.php in testwiki (T299021)
[09:38:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:38:15] <stashbot>	 T299021: Shorten running time of refreshLinkRecommendations.php - https://phabricator.wikimedia.org/T299021
[09:49:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1123 (T312984)', diff saved to https://phabricator.wikimedia.org/P31124 and previous config saved to /var/cache/conftool/dbconfig/20220715-094958-ladsgroup.json
[09:50:03] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[10:03:20] <icinga-wm>	 RECOVERY - SSH on wtp1040.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[10:05:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31125 and previous config saved to /var/cache/conftool/dbconfig/20220715-100503-ladsgroup.json
[10:06:24] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] fix flask/jinja2 semver snafu (031 comment) [software/klaxon] - 10https://gerrit.wikimedia.org/r/813938 (owner: 10CDanis)
[10:08:26] <wikibugs>	 (03Merged) 10jenkins-bot: fix flask/jinja2 semver snafu [software/klaxon] - 10https://gerrit.wikimedia.org/r/813938 (owner: 10CDanis)
[10:08:28] <wikibugs>	 (03Merged) 10jenkins-bot: Use ProxyFix middleware to correctly recognize HTTPS usage [software/klaxon] - 10https://gerrit.wikimedia.org/r/794759 (https://phabricator.wikimedia.org/T308941) (owner: 10Legoktm)
[10:09:52] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] restore styling accidentally removed in 16f1d6c [software/klaxon] - 10https://gerrit.wikimedia.org/r/813939 (owner: 10CDanis)
[10:12:21] <wikibugs>	 (03Merged) 10jenkins-bot: restore styling accidentally removed in 16f1d6c [software/klaxon] - 10https://gerrit.wikimedia.org/r/813939 (owner: 10CDanis)
[10:15:26] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for bgwiki / Bethany Gerdemann - https://phabricator.wikimedia.org/T312827 (10Joe) p:05Triage→03Medium
[10:20:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31126 and previous config saved to /var/cache/conftool/dbconfig/20220715-102008-ladsgroup.json
[10:35:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1123 (T312984)', diff saved to https://phabricator.wikimedia.org/P31127 and previous config saved to /var/cache/conftool/dbconfig/20220715-103513-ladsgroup.json
[10:35:15] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[10:35:19] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[10:35:28] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[10:41:53] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: admin: add bgwiki to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/814121 (https://phabricator.wikimedia.org/T312827)
[10:43:11] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: admin: add bgwiki to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/814121 (https://phabricator.wikimedia.org/T312827)
[10:43:54] <wikibugs>	 (03PS3) 10Giuseppe Lavagetto: admin: add bgwiki to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/814121 (https://phabricator.wikimedia.org/T312827)
[10:45:36] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] admin: add bgwiki to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/814121 (https://phabricator.wikimedia.org/T312827) (owner: 10Giuseppe Lavagetto)
[10:46:56] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for bgwiki / Bethany Gerdemann - https://phabricator.wikimedia.org/T312827 (10Joe)
[10:56:16] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for bgwiki / Bethany Gerdemann - https://phabricator.wikimedia.org/T312827 (10Joe) Hi @Bethany in about 30 minutes you should be able to access all systems and to ssh to the hadoop nodes, and change your kerber...
[10:56:16] <logmsgbot>	 !log hashar@deploy1002 Started deploy [integration/docroot@e563641]: Add banan-i18n library
[10:56:25] <logmsgbot>	 !log hashar@deploy1002 Finished deploy [integration/docroot@e563641]: Add banan-i18n library (duration: 00m 08s)
[10:56:40] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: admin: add kerberos to bgwiki [puppet] - 10https://gerrit.wikimedia.org/r/814122 (https://phabricator.wikimedia.org/T312827)
[10:57:19] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[10:57:44] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[10:57:49] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1166 (T312984)', diff saved to https://phabricator.wikimedia.org/P31128 and previous config saved to /var/cache/conftool/dbconfig/20220715-105748-ladsgroup.json
[10:57:52] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[11:15:26] <icinga-wm>	 PROBLEM - Ensure hosts are not performing a change on every puppet run on cumin1001 is CRITICAL: CRITICAL: the following (25) node(s) change every puppet run: aqs2001, aqs2002, aqs2003, aqs2004, aqs2005, aqs2006, aqs2007, aqs2008, aqs2009, aqs2010, aqs2011, aqs2012, cloudservices1003, cloudservices1004, ms-fe1010, ms-fe1011, ms-fe1012, ms-fe2010, ms-fe2011, ms-fe2012, thanos-fe1002, thanos-fe1003, thanos-fe2001, thanos-fe2002, thanos-fe20
[11:15:26] <icinga-wm>	 ://wikitech.wikimedia.org/wiki/Puppet%23check_puppet_run_changes
[11:21:40] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] admin: add kerberos to bgwiki [puppet] - 10https://gerrit.wikimedia.org/r/814122 (https://phabricator.wikimedia.org/T312827) (owner: 10Giuseppe Lavagetto)
[11:21:57] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T312984)', diff saved to https://phabricator.wikimedia.org/P31129 and previous config saved to /var/cache/conftool/dbconfig/20220715-112157-ladsgroup.json
[11:22:03] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[11:37:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31130 and previous config saved to /var/cache/conftool/dbconfig/20220715-113702-ladsgroup.json
[11:52:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31131 and previous config saved to /var/cache/conftool/dbconfig/20220715-115207-ladsgroup.json
[12:07:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T312984)', diff saved to https://phabricator.wikimedia.org/P31132 and previous config saved to /var/cache/conftool/dbconfig/20220715-120713-ladsgroup.json
[12:07:15] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1112.eqiad.wmnet with reason: Maintenance
[12:07:18] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[12:07:29] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1112.eqiad.wmnet with reason: Maintenance
[12:07:30] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[12:07:46] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[12:07:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1112 (T312984)', diff saved to https://phabricator.wikimedia.org/P31133 and previous config saved to /var/cache/conftool/dbconfig/20220715-120750-ladsgroup.json
[12:09:17] <wikibugs>	 10SRE, 10serviceops, 10Continuous-Integration-Config, 10Release-Engineering-Team (CI & Testing services), 10Test-Coverage: Add pcov PHP extension to wikimedia apt so it can be used in Wikimedia CI - https://phabricator.wikimedia.org/T243847 (10hashar)
[12:10:04] <icinga-wm>	 RECOVERY - SSH on wtp1044.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:13:53] <wikibugs>	 10SRE, 10serviceops, 10Continuous-Integration-Config, 10Release-Engineering-Team (CI & Testing services), 10Test-Coverage: Add pcov PHP extension to wikimedia apt so it can be used in Wikimedia CI - https://phabricator.wikimedia.org/T243847 (10hashar) `pcov` got built and uploaded to `component/php74`....
[12:14:06] <wikibugs>	 10SRE, 10serviceops, 10Continuous-Integration-Config, 10Release-Engineering-Team (CI & Testing services), 10Test-Coverage: Add pcov PHP extension to wikimedia apt so it can be used in Wikimedia CI - https://phabricator.wikimedia.org/T243847 (10hashar) a:05Legoktm→03None
[12:21:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112 (T312984)', diff saved to https://phabricator.wikimedia.org/P31134 and previous config saved to /var/cache/conftool/dbconfig/20220715-122119-ladsgroup.json
[12:21:23] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[12:36:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31135 and previous config saved to /var/cache/conftool/dbconfig/20220715-123624-ladsgroup.json
[12:44:01] <wikibugs>	 10SRE, 10API Platform, 10Traffic, 10VisualEditor, and 2 others: Find out if Varnish is messing with ETags, and what to do about it. - https://phabricator.wikimedia.org/T310904 (10daniel) 05Open→03Resolved
[12:46:31] <wikibugs>	 (03CR) 10Ladsgroup: "I'm taking over mwhahahaha" [puppet] - 10https://gerrit.wikimedia.org/r/813917 (owner: 10Marostegui)
[12:47:18] <wikibugs>	 (03CR) 10Marostegui: core.pp: Make sync_binlog and trx_commit configurable (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/813917 (owner: 10Marostegui)
[12:50:48] <wikibugs>	 (03PS1) 10Hashar: ci: enable docker on machine start [puppet] - 10https://gerrit.wikimedia.org/r/814157 (https://phabricator.wikimedia.org/T313119)
[12:51:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31136 and previous config saved to /var/cache/conftool/dbconfig/20220715-125129-ladsgroup.json
[12:53:54] <wikibugs>	 (03CR) 10Hashar: "See T313119#8080674 for the details." [puppet] - 10https://gerrit.wikimedia.org/r/814157 (https://phabricator.wikimedia.org/T313119) (owner: 10Hashar)
[12:59:39] <wikibugs>	 (03PS1) 10David Caro: novafullstack: fix timing issue [puppet] - 10https://gerrit.wikimedia.org/r/814162
[13:01:38] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] novafullstack: fix timing issue [puppet] - 10https://gerrit.wikimedia.org/r/814162 (owner: 10David Caro)
[13:02:05] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q4:(Need By: TBD) rack/setup/install cloudweb100[34] - https://phabricator.wikimedia.org/T305414 (10ayounsi) From diffscan, those two hosts have their SSH port exposed to the world: ` New Open Service List --------------------- STATUS HOST POR...
[13:05:21] <wikibugs>	 (03CR) 10Slyngshede: [C: 03+1] "Looks good, minor detail in comment." [puppet] - 10https://gerrit.wikimedia.org/r/814157 (https://phabricator.wikimedia.org/T313119) (owner: 10Hashar)
[13:05:21] <logmsgbot>	 !log bking@cumin1001 START - Cookbook sre.elasticsearch.force-shard-allocation
[13:05:24] <logmsgbot>	 !log bking@cumin1001 END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
[13:06:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112 (T312984)', diff saved to https://phabricator.wikimedia.org/P31137 and previous config saved to /var/cache/conftool/dbconfig/20220715-130634-ladsgroup.json
[13:06:36] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[13:06:39] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[13:07:01] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[13:07:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1175 (T312984)', diff saved to https://phabricator.wikimedia.org/P31138 and previous config saved to /var/cache/conftool/dbconfig/20220715-130706-ladsgroup.json
[13:07:12] <wikibugs>	 (03CR) 10Majavah: "is there any reason not to go with just" [puppet] - 10https://gerrit.wikimedia.org/r/814157 (https://phabricator.wikimedia.org/T313119) (owner: 10Hashar)
[13:14:52] <icinga-wm>	 RECOVERY - Check systemd state on mw2392 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:17:52] <wikibugs>	 (03CR) 10Hashar: ci: enable docker on machine start (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/814157 (https://phabricator.wikimedia.org/T313119) (owner: 10Hashar)
[13:19:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T312984)', diff saved to https://phabricator.wikimedia.org/P31139 and previous config saved to /var/cache/conftool/dbconfig/20220715-131916-ladsgroup.json
[13:19:20] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[13:32:14] <wikibugs>	 (03PS2) 10Hashar: ci: enable docker on machine start [puppet] - 10https://gerrit.wikimedia.org/r/814157 (https://phabricator.wikimedia.org/T313119)
[13:33:35] <wikibugs>	 (03CR) 10Hashar: ci: enable docker on machine start (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/814157 (https://phabricator.wikimedia.org/T313119) (owner: 10Hashar)
[13:34:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31140 and previous config saved to /var/cache/conftool/dbconfig/20220715-133421-ladsgroup.json
[13:42:51] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] admin: add kerberos to bgwiki [puppet] - 10https://gerrit.wikimedia.org/r/814122 (https://phabricator.wikimedia.org/T312827) (owner: 10Giuseppe Lavagetto)
[13:43:45] <wikibugs>	 (03PS1) 10PipelineBot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/814170
[13:45:35] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for bgwiki / Bethany Gerdemann - https://phabricator.wikimedia.org/T312827 (10Joe) 05Open→03Resolved a:03Joe Tentatively resolving. Please let us know if you have issues by re-opening the task.
[13:49:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31141 and previous config saved to /var/cache/conftool/dbconfig/20220715-134926-ladsgroup.json
[13:51:18] <wikibugs>	 10SRE, 10SRE-Access-Requests: Add Zabe to #mediawiki_security - https://phabricator.wikimedia.org/T313026 (10Joe) 05Open→03Resolved p:05Triage→03Medium a:03Joe
[13:53:16] <wikibugs>	 (03PS4) 10David Caro: Change formatting of a few openstack calls [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/810107 (owner: 10Andrew Bogott)
[13:53:18] <wikibugs>	 (03PS11) 10David Caro: wmcs: vps: create_instance_with_prefix: unbreak [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/802170 (owner: 10Majavah)
[14:04:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T312984)', diff saved to https://phabricator.wikimedia.org/P31143 and previous config saved to /var/cache/conftool/dbconfig/20220715-140431-ladsgroup.json
[14:04:33] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1179.eqiad.wmnet with reason: Maintenance
[14:04:36] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[14:04:46] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1179.eqiad.wmnet with reason: Maintenance
[14:04:52] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1179 (T312984)', diff saved to https://phabricator.wikimedia.org/P31144 and previous config saved to /var/cache/conftool/dbconfig/20220715-140451-ladsgroup.json
[14:15:06] <icinga-wm>	 PROBLEM - SSH on db1110.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:15:15] <wikibugs>	 (03PS1) 10Daniel Kinzler: Make $wgAccountCreationThrottle must be an array. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814176
[14:21:51] <wikibugs>	 (03PS2) 10Daniel Kinzler: Make $wgAccountCreationThrottle an array. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814176
[14:26:36] <wikibugs>	 (03CR) 10RhinosF1: [C: 03+1] Make $wgAccountCreationThrottle an array. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814176 (owner: 10Daniel Kinzler)
[14:32:34] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations: Decommissioning two hosts end up with: Failed to wipe swraid - https://phabricator.wikimedia.org/T311593 (10MoritzMuehlenhoff) @Marostegui Did this happen again for any reimage after I merged my patch above?
[14:33:45] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations: Decommissioning two hosts end up with: Failed to wipe swraid - https://phabricator.wikimedia.org/T311593 (10Marostegui) Nope, it all went fine! Good to close. I need to decom a lot more in the upcoming days, will reopen if needed. Thanks for fixing it!
[14:40:18] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations: Decommissioning two hosts end up with: Failed to wipe swraid - https://phabricator.wikimedia.org/T311593 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlenhoff Ack, closing then :-)
[14:47:29] <wikibugs>	 (03CR) 10RhinosF1: "This change is ready for review." [puppet] - 10https://gerrit.wikimedia.org/r/814134 (https://phabricator.wikimedia.org/T313128) (owner: 10RhinosF1)
[14:47:54] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] beta: [fix ci - until next week] add --skip-config-validation to update.php [puppet] - 10https://gerrit.wikimedia.org/r/814134 (https://phabricator.wikimedia.org/T313128) (owner: 10RhinosF1)
[14:49:04] <wikibugs>	 (03PS3) 10RhinosF1: beta: [fix ci - until next week] add --skip-config-validation to update.php [puppet] - 10https://gerrit.wikimedia.org/r/814134 (https://phabricator.wikimedia.org/T313128)
[14:49:42] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] beta: [fix ci - until next week] add --skip-config-validation to update.php [puppet] - 10https://gerrit.wikimedia.org/r/814134 (https://phabricator.wikimedia.org/T313128) (owner: 10RhinosF1)
[14:52:00] <wikibugs>	 (03PS4) 10RhinosF1: beta: [fix ci - until next week] add --skip-config-validation to update.php [puppet] - 10https://gerrit.wikimedia.org/r/814134 (https://phabricator.wikimedia.org/T313128)
[14:55:56] <wikibugs>	 (03CR) 10Samtar: [C: 03+1] "Echoing my (limited understanding) comment on IRC that `--skip-config-validation` feels a bit 😐 that being said, it'd only affect beta if " [puppet] - 10https://gerrit.wikimedia.org/r/814134 (https://phabricator.wikimedia.org/T313128) (owner: 10RhinosF1)
[14:57:08] <RhinosF1>	 anyone SRE wise wish to merge ^
[15:04:12] <wikibugs>	 (03CR) 10Bking: [V: 03+2] beta: [fix ci - until next week] add --skip-config-validation to update.php [puppet] - 10https://gerrit.wikimedia.org/r/814134 (https://phabricator.wikimedia.org/T313128) (owner: 10RhinosF1)
[15:05:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179 (T312984)', diff saved to https://phabricator.wikimedia.org/P31146 and previous config saved to /var/cache/conftool/dbconfig/20220715-150505-ladsgroup.json
[15:05:11] <wikibugs>	 (03CR) 10Bking: [V: 03+2 C: 03+2] beta: [fix ci - until next week] add --skip-config-validation to update.php [puppet] - 10https://gerrit.wikimedia.org/r/814134 (https://phabricator.wikimedia.org/T313128) (owner: 10RhinosF1)
[15:05:12] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[15:13:59] <wikibugs>	 (03PS1) 10Cmjohnson: updating site.pp for cloudweb servers, setup incorrectly for private vlan [puppet] - 10https://gerrit.wikimedia.org/r/814185 (https://phabricator.wikimedia.org/T305414)
[15:14:34] <icinga-wm>	 PROBLEM - SSH on wtp1044.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:16:28] <icinga-wm>	 RECOVERY - SSH on db1110.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:16:41] <wikibugs>	 (03CR) 10Cmjohnson: [C: 03+2] updating site.pp for cloudweb servers, setup incorrectly for private vlan [puppet] - 10https://gerrit.wikimedia.org/r/814185 (https://phabricator.wikimedia.org/T305414) (owner: 10Cmjohnson)
[15:19:56] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+1] beta: fix multiline string being treated as 2 commands. [puppet] - 10https://gerrit.wikimedia.org/r/814135 (https://phabricator.wikimedia.org/T313128) (owner: 10RhinosF1)
[15:20:04] <wikibugs>	 (03PS4) 10Ladsgroup: core.pp: Make sync_binlog and trx_commit configurable [puppet] - 10https://gerrit.wikimedia.org/r/813917 (owner: 10Marostegui)
[15:20:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31147 and previous config saved to /var/cache/conftool/dbconfig/20220715-152010-ladsgroup.json
[15:23:04] <wikibugs>	 (03CR) 10Bking: [C: 03+2] beta: fix multiline string being treated as 2 commands. [puppet] - 10https://gerrit.wikimedia.org/r/814135 (https://phabricator.wikimedia.org/T313128) (owner: 10RhinosF1)
[15:28:42] <wikibugs>	 (03CR) 10Ladsgroup: "Seems to be working fine: https://puppet-compiler.wmflabs.org/pcc-worker1002/36278/db2144.codfw.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/813917 (owner: 10Marostegui)
[15:31:10] <wikibugs>	 (03CR) 10Joal: "Small nits, and I think you have been missing file: puppet/modules/profile/manifests/analytics/refinery/job/test/data_purge.pp" [puppet] - 10https://gerrit.wikimedia.org/r/813921 (https://phabricator.wikimedia.org/T270433) (owner: 10Mforns)
[15:35:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31148 and previous config saved to /var/cache/conftool/dbconfig/20220715-153515-ladsgroup.json
[15:48:32] <icinga-wm>	 RECOVERY - Check systemd state on contint2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:50:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179 (T312984)', diff saved to https://phabricator.wikimedia.org/P31149 and previous config saved to /var/cache/conftool/dbconfig/20220715-155021-ladsgroup.json
[15:50:24] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2105.codfw.wmnet with reason: Maintenance
[15:50:25] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[15:50:38] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2105.codfw.wmnet with reason: Maintenance
[15:50:39] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 20:00:00 on 6 hosts with reason: Maintenance
[15:50:56] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 6 hosts with reason: Maintenance
[15:51:02] <mforns>	 joal: thanks for the review :] would you be available for a quick chat about that? if not, that's ok! since silent-friday. We can do async
[16:16:02] <icinga-wm>	 RECOVERY - SSH on wtp1044.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:16:20] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[16:16:33] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[16:20:12] <wikibugs>	 (03CR) 10Marostegui: [C: 04-1] "parsercache hosts should not have that enabled, both parameters should be set to 0 there. They are currently 0 and should remain like that" [puppet] - 10https://gerrit.wikimedia.org/r/813917 (owner: 10Marostegui)
[16:30:55] <wikibugs>	 10SRE, 10DSE-Kubernetes-Cluster, 10Infrastructure-Foundations, 10vm-requests: Site: eqiad : 3 VMs requested for Etcd cluster in support of the new DSE Kubernetes cluster - https://phabricator.wikimedia.org/T311131 (10BTullis)
[16:33:04] <wikibugs>	 10SRE, 10DSE-Kubernetes-Cluster, 10Infrastructure-Foundations, 10vm-requests: Site: eqiad : 3 VMs requested for Etcd cluster in support of the new DSE Kubernetes cluster - https://phabricator.wikimedia.org/T311131 (10BTullis)
[16:37:08] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] loki-beta: increase grpc message size [puppet] - 10https://gerrit.wikimedia.org/r/813985 (https://phabricator.wikimedia.org/T222826) (owner: 10Cwhite)
[16:40:10] <wikibugs>	 (03CR) 10Mforns: analytics:refinery:job:data_purge: Add --allowed-interval to deletion jobs (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/813921 (https://phabricator.wikimedia.org/T270433) (owner: 10Mforns)
[16:40:20] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1093 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[16:42:52] <icinga-wm>	 ACKNOWLEDGEMENT - MegaRAID on an-worker1093 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough Btullis Investigating - T313130 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[16:57:11] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2104.codfw.wmnet with reason: Maintenance
[16:57:24] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2104.codfw.wmnet with reason: Maintenance
[16:57:26] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 20:00:00 on 8 hosts with reason: Maintenance
[16:57:44] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 8 hosts with reason: Maintenance
[16:59:39] <wikibugs>	 (03PS5) 10Ladsgroup: core.pp: Make sync_binlog and trx_commit configurable [puppet] - 10https://gerrit.wikimedia.org/r/813917 (owner: 10Marostegui)
[16:59:49] <wikibugs>	 (03CR) 10Ladsgroup: core.pp: Make sync_binlog and trx_commit configurable (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/813917 (owner: 10Marostegui)
[17:00:02] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[17:00:05] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[17:02:00] <wikibugs>	 (03CR) 10Mforns: analytics:refinery:job:data_purge: Add --allowed-interval to deletion jobs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/813921 (https://phabricator.wikimedia.org/T270433) (owner: 10Mforns)
[17:03:35] <wikibugs>	 10SRE-swift-storage: Spike in Swift errors - https://phabricator.wikimedia.org/T313102 (10MatthewVernon) @tstarling thanks for fixing.  What is surprising to me at least is that the grafana swift dashboards don't reflect this - you can see a brief spike in read errors around the failure on 12th July, but then ba...
[17:05:10] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[17:05:23] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[17:05:25] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[17:05:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[17:05:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1156 (T312984)', diff saved to https://phabricator.wikimedia.org/P31150 and previous config saved to /var/cache/conftool/dbconfig/20220715-170545-ladsgroup.json
[17:05:50] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[17:05:56] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 92 probes of 678 (alerts on 90) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[17:06:14] <wikibugs>	 (03CR) 10Ori: [C: 03+2] New service: function-evaluator [deployment-charts] - 10https://gerrit.wikimedia.org/r/793862 (https://phabricator.wikimedia.org/T295698) (owner: 10Ori)
[17:10:48] <wikibugs>	 (03Merged) 10jenkins-bot: New service: function-evaluator [deployment-charts] - 10https://gerrit.wikimedia.org/r/793862 (https://phabricator.wikimedia.org/T295698) (owner: 10Ori)
[17:12:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T312984)', diff saved to https://phabricator.wikimedia.org/P31151 and previous config saved to /var/cache/conftool/dbconfig/20220715-171246-ladsgroup.json
[17:12:50] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[17:14:49] <jinxer-wm>	 (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: (2) Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh
[17:18:58] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 57 probes of 677 (alerts on 90) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[17:19:49] <jinxer-wm>	 (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: (2) Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh
[17:20:08] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host cloudweb1003.wikimedia.org with OS bullseye
[17:20:14] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q4:(Need By: TBD) rack/setup/install cloudweb100[34] - https://phabricator.wikimedia.org/T305414 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudweb1003.wikimedia.org with OS bullseye
[17:27:06] <wikibugs>	 (03PS1) 10Majavah: dynamicproxy: urlproxy: add a simple rate limit [puppet] - 10https://gerrit.wikimedia.org/r/814193 (https://phabricator.wikimedia.org/T313131)
[17:27:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31152 and previous config saved to /var/cache/conftool/dbconfig/20220715-172751-ladsgroup.json
[17:27:54] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] dynamicproxy: urlproxy: add a simple rate limit [puppet] - 10https://gerrit.wikimedia.org/r/814193 (https://phabricator.wikimedia.org/T313131) (owner: 10Majavah)
[17:28:24] <wikibugs>	 (03PS2) 10Majavah: dynamicproxy: urlproxy: add a simple rate limit [puppet] - 10https://gerrit.wikimedia.org/r/814193 (https://phabricator.wikimedia.org/T313131)
[17:31:38] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb1003.wikimedia.org with reason: host reimage
[17:31:38] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host cloudweb1004.wikimedia.org with OS bullseye
[17:31:44] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q4:(Need By: TBD) rack/setup/install cloudweb100[34] - https://phabricator.wikimedia.org/T305414 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudweb1004.wikimedia.org with OS bullseye
[17:35:15] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb1003.wikimedia.org with reason: host reimage
[17:35:57] <wikibugs>	 (03PS3) 10Mforns: analytics:refinery:job:data_purge: Add --allowed-interval to deletion jobs [puppet] - 10https://gerrit.wikimedia.org/r/813921 (https://phabricator.wikimedia.org/T270433)
[17:36:18] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1093 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[17:42:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31154 and previous config saved to /var/cache/conftool/dbconfig/20220715-174256-ladsgroup.json
[17:43:15] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb1004.wikimedia.org with reason: host reimage
[17:46:53] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb1004.wikimedia.org with reason: host reimage
[17:48:04] <wikibugs>	 (03CR) 10Joal: analytics:refinery:job:data_purge: Add --allowed-interval to deletion jobs (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/813921 (https://phabricator.wikimedia.org/T270433) (owner: 10Mforns)
[17:48:34] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudweb1003.wikimedia.org with OS bullseye
[17:48:40] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q4:(Need By: TBD) rack/setup/install cloudweb100[34] - https://phabricator.wikimedia.org/T305414 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudweb1003.wikimedia.org with OS bullseye co...
[17:55:29] <wikibugs>	 (03CR) 10BryanDavis: dynamicproxy: urlproxy: add a simple rate limit (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/814193 (https://phabricator.wikimedia.org/T313131) (owner: 10Majavah)
[17:58:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T312984)', diff saved to https://phabricator.wikimedia.org/P31155 and previous config saved to /var/cache/conftool/dbconfig/20220715-175801-ladsgroup.json
[17:58:03] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[17:58:07] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[17:58:17] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[17:58:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1146:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31156 and previous config saved to /var/cache/conftool/dbconfig/20220715-175822-ladsgroup.json
[18:01:09] <logmsgbot>	 !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudweb1004.wikimedia.org with OS bullseye
[18:01:15] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q4:(Need By: TBD) rack/setup/install cloudweb100[34] - https://phabricator.wikimedia.org/T305414 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudweb1004.wikimedia.org with OS bullseye co...
[18:05:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31157 and previous config saved to /var/cache/conftool/dbconfig/20220715-180532-ladsgroup.json
[18:05:37] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[18:20:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31158 and previous config saved to /var/cache/conftool/dbconfig/20220715-182037-ladsgroup.json
[18:30:17] <wikibugs>	 (03CR) 10Marostegui: "thanks, on Monday I'll pick a host from every section (master and slave) and run PPC to make sure we are not changing it where we shouldn'" [puppet] - 10https://gerrit.wikimedia.org/r/813917 (owner: 10Marostegui)
[18:30:32] <ryankemper>	 !log T300943 Re-imaging `elastic20[61-72]` from buster -> bullseye, one host at a time. These hosts are not in service currently so re-imaging is safe.
[18:30:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:30:37] <stashbot>	 T300943: Service implementation for elastic20[61-86].codfw.wmnet - https://phabricator.wikimedia.org/T300943
[18:31:04] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.hosts.reimage for host elastic2061.codfw.wmnet with OS bullseye
[18:35:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31159 and previous config saved to /var/cache/conftool/dbconfig/20220715-183542-ladsgroup.json
[18:44:53] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2061.codfw.wmnet with reason: host reimage
[18:47:27] <logmsgbot>	 !log ryankemper@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2061.codfw.wmnet with reason: host reimage
[18:50:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31160 and previous config saved to /var/cache/conftool/dbconfig/20220715-185047-ladsgroup.json
[18:50:49] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[18:50:51] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[18:51:03] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[18:51:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1105:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31161 and previous config saved to /var/cache/conftool/dbconfig/20220715-185107-ladsgroup.json
[18:56:56] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q4:(Need By: TBD) rack/setup/install cloudweb100[34] - https://phabricator.wikimedia.org/T305414 (10Cmjohnson)
[18:57:18] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q4:(Need By: TBD) rack/setup/install cloudweb100[34] - https://phabricator.wikimedia.org/T305414 (10Cmjohnson) 05Open→03Resolved
[18:58:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31162 and previous config saved to /var/cache/conftool/dbconfig/20220715-185842-ladsgroup.json
[18:58:48] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[18:59:14] <wikibugs>	 (03PS4) 10Mforns: analytics:refinery:job:data_purge: Add --allowed-interval to deletion jobs [puppet] - 10https://gerrit.wikimedia.org/r/813921 (https://phabricator.wikimedia.org/T270433)
[19:01:19] <logmsgbot>	 !log ryankemper@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2061.codfw.wmnet with OS bullseye
[19:01:38] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.hosts.reimage for host elastic2062.codfw.wmnet with OS bullseye
[19:07:48] <wikibugs>	 (CR) Mforns: [V: -1] "I think this is ready." [puppet] - https://gerrit.wikimedia.org/r/813921 (https://phabricator.wikimedia.org/T270433) (owner: Mforns)
[19:13:22] <wikibugs>	 (CR) CDanis: Don't hardcode v1 of the api in the base path (1 comment) [software/klaxon] - https://gerrit.wikimedia.org/r/813940 (owner: CDanis)
[19:13:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31163 and previous config saved to /var/cache/conftool/dbconfig/20220715-191347-ladsgroup.json
[19:15:28] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2062.codfw.wmnet with reason: host reimage
[19:18:02] <icinga-wm>	 PROBLEM - SSH on db1109.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:18:27] <logmsgbot>	 !log ryankemper@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2062.codfw.wmnet with reason: host reimage
[19:26:15] <wikibugs>	 SRE, ops-codfw, Discovery-Search, Elasticsearch, Patch-For-Review: Degraded RAID on elastic2049 - https://phabricator.wikimedia.org/T311939 (Papaul) I looked into this yesterday and today,  it looks like we are having some HW issues on this server and unfortunately the server is out of warran...
[19:26:41] <wikibugs>	 SRE, MediaWiki-General, Traffic-Icebox, Patch-For-Review: Investigate query parameter normalization for MW/services - https://phabricator.wikimedia.org/T138093 (ori) OK, current status:  * libvmod-querysort is [[ https://gerrit.wikimedia.org/g/operations/software/varnish/libvmod-querysort | in Ge...
[19:28:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31164 and previous config saved to /var/cache/conftool/dbconfig/20220715-192852-ladsgroup.json
[19:31:10] <wikibugs>	 (PS2) CDanis: Don't hardcode v1 of the api in the base path [software/klaxon] - https://gerrit.wikimedia.org/r/813940
[19:31:12] <wikibugs>	 (PS2) CDanis: Add support for fetching current oncallers [software/klaxon] - https://gerrit.wikimedia.org/r/813941
[19:31:15] <wikibugs>	 (PS2) CDanis: display current oncallers in Klaxon UI [software/klaxon] - https://gerrit.wikimedia.org/r/813942
[19:31:51] <wikibugs>	 (CR) CDanis: Add support for fetching current oncallers (3 comments) [software/klaxon] - https://gerrit.wikimedia.org/r/813941 (owner: CDanis)
[19:32:00] <wikibugs>	 (CR) CDanis: display current oncallers in Klaxon UI (2 comments) [software/klaxon] - https://gerrit.wikimedia.org/r/813942 (owner: CDanis)
[19:32:03] <logmsgbot>	 !log ryankemper@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2062.codfw.wmnet with OS bullseye
[19:33:42] <wikibugs>	 (CR) CDanis: [C: +2] Don't hardcode v1 of the api in the base path [software/klaxon] - https://gerrit.wikimedia.org/r/813940 (owner: CDanis)
[19:33:54] <wikibugs>	 (CR) CDanis: [C: +2] Add support for fetching current oncallers [software/klaxon] - https://gerrit.wikimedia.org/r/813941 (owner: CDanis)
[19:37:39] <wikibugs>	 (Merged) jenkins-bot: Don't hardcode v1 of the api in the base path [software/klaxon] - https://gerrit.wikimedia.org/r/813940 (owner: CDanis)
[19:37:41] <wikibugs>	 (Merged) jenkins-bot: Add support for fetching current oncallers [software/klaxon] - https://gerrit.wikimedia.org/r/813941 (owner: CDanis)
[19:42:55] <wikibugs>	 (PS1) CDanis: Revert "Use ProxyFix middleware to correctly recognize HTTPS usage" [software/klaxon] - https://gerrit.wikimedia.org/r/814139
[19:43:28] <wikibugs>	 (CR) CDanis: [C: +2] Revert "Use ProxyFix middleware to correctly recognize HTTPS usage" [software/klaxon] - https://gerrit.wikimedia.org/r/814139 (owner: CDanis)
[19:43:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31165 and previous config saved to /var/cache/conftool/dbconfig/20220715-194358-ladsgroup.json
[19:43:59] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1122.eqiad.wmnet with reason: Maintenance
[19:44:03] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[19:44:13] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1122.eqiad.wmnet with reason: Maintenance
[19:44:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1122 (T312984)', diff saved to https://phabricator.wikimedia.org/P31166 and previous config saved to /var/cache/conftool/dbconfig/20220715-194418-ladsgroup.json
[19:45:17] <wikibugs>	 (Merged) jenkins-bot: Revert "Use ProxyFix middleware to correctly recognize HTTPS usage" [software/klaxon] - https://gerrit.wikimedia.org/r/814139 (owner: CDanis)
[19:47:23] <legoktm>	 cdanis: bahhhh
[19:47:26] <wikibugs>	 SRE, Sustainability (Incident Followup): Klaxon redirects to http://klaxon.wikimedia.org (not https) - https://phabricator.wikimedia.org/T308941 (CDanis) Unfortunately I had to revert the above patch because the necessary middleware library isn't included in Debian's `python3-werkzeug` until Bullseye.
[19:47:32] <cdanis>	 legoktm: i KNOW
[19:47:41] <cdanis>	 :(
[19:50:04] <legoktm>	 cdanis: https://sources.debian.org/src/python-werkzeug/0.14.1%2Bdfsg1-4%2Bdeb10u1/werkzeug/contrib/fixers.py/#L97
[19:50:20] <cdanis>	 ahahaha
[19:50:22] <cdanis>	 okay, thanks
[19:50:26] <cdanis>	 I was just looking at file names
[19:50:27] <legoktm>	 so try / except ImportError with the legacy name
[19:50:29] <cdanis>	 I'll write a new one with -- yeah, that
[19:50:36] <cdanis>	 <3
[19:51:19] <wikibugs>	 (PS3) CDanis: display current oncallers in Klaxon UI [software/klaxon] - https://gerrit.wikimedia.org/r/813942
[19:51:48] <legoktm>	 I remembered ProxyFix being a pretty old thing, so I codesearched in Debian and found https://sources.debian.org/src/flask-login/0.5.0-2/test_login.py/?hl=24#L24
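The fallback legoktm suggests above can be sketched roughly as follows. This is only an illustration, not the actual Klaxon patch: `import_first`, `PROXY_FIX_LOCATIONS` and `load_proxy_fix` are hypothetical names. The legacy module path comes from the Debian source link above (`werkzeug/contrib/fixers.py`); the newer path, `werkzeug.middleware.proxy_fix`, is where Werkzeug 0.15 and later ship `ProxyFix`.

```python
import importlib


def import_first(candidates):
    """Return the first importable attribute from (module, attribute) pairs."""
    last_error = None
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            # Module or attribute is missing in this version; try the next one.
            last_error = exc
    raise ImportError("no candidate could be imported: %r" % (candidates,)) from last_error


# Hypothetical lookup order: current location first, then the legacy name.
PROXY_FIX_LOCATIONS = [
    ("werkzeug.middleware.proxy_fix", "ProxyFix"),
    ("werkzeug.contrib.fixers", "ProxyFix"),
]


def load_proxy_fix():
    """Return the ProxyFix class, or None when Werkzeug is not installed."""
    try:
        return import_first(PROXY_FIX_LOCATIONS)
    except ImportError:
        return None
```

In an application this would typically be followed by something like `app.wsgi_app = ProxyFix(app.wsgi_app)`, guarded by a check that `load_proxy_fix()` did not return `None`. A plain nested `try` / `except ImportError` around the two `from ... import ProxyFix` statements does the same job with less code; the helper form just keeps the lookup order in one list.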
[19:53:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1122 (T312984)', diff saved to https://phabricator.wikimedia.org/P31167 and previous config saved to /var/cache/conftool/dbconfig/20220715-195334-ladsgroup.json
[19:53:39] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[20:08:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31168 and previous config saved to /var/cache/conftool/dbconfig/20220715-200839-ladsgroup.json
[20:10:24] <wikibugs>	 (CR) CDanis: [C: +2] display current oncallers in Klaxon UI (1 comment) [software/klaxon] - https://gerrit.wikimedia.org/r/813942 (owner: CDanis)
[20:12:41] <wikibugs>	 (Merged) jenkins-bot: display current oncallers in Klaxon UI [software/klaxon] - https://gerrit.wikimedia.org/r/813942 (owner: CDanis)
[20:14:52] <wikibugs>	 (PS1) CDanis: Use ProxyFix middleware to recognize HTTPS usage, attempt #2 [software/klaxon] - https://gerrit.wikimedia.org/r/814251 (https://phabricator.wikimedia.org/T308941)
[20:16:52] <wikibugs>	 (CR) Legoktm: [C: +1] "LGTM!" [software/klaxon] - https://gerrit.wikimedia.org/r/814251 (https://phabricator.wikimedia.org/T308941) (owner: CDanis)
[20:17:51] <wikibugs>	 (CR) CDanis: [C: +2] Use ProxyFix middleware to recognize HTTPS usage, attempt #2 [software/klaxon] - https://gerrit.wikimedia.org/r/814251 (https://phabricator.wikimedia.org/T308941) (owner: CDanis)
[20:19:09] <legoktm>	 :shipit:
[20:19:26] <icinga-wm>	 RECOVERY - SSH on db1109.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:20:27] <wikibugs>	 (Merged) jenkins-bot: Use ProxyFix middleware to recognize HTTPS usage, attempt #2 [software/klaxon] - https://gerrit.wikimedia.org/r/814251 (https://phabricator.wikimedia.org/T308941) (owner: CDanis)
[20:23:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31169 and previous config saved to /var/cache/conftool/dbconfig/20220715-202344-ladsgroup.json
[20:28:30] <wikibugs>	 SRE, Patch-For-Review, Sustainability (Incident Followup): Klaxon redirects to http://klaxon.wikimedia.org (not https) - https://phabricator.wikimedia.org/T308941 (CDanis) Open→Resolved Thanks to @legoktm for catching that this was packaged pre-Bullseye under a different name.
[20:29:06] <legoktm>	 :D
[20:38:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1122 (T312984)', diff saved to https://phabricator.wikimedia.org/P31170 and previous config saved to /var/cache/conftool/dbconfig/20220715-203849-ladsgroup.json
[20:38:51] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[20:38:54] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.hosts.reimage for host elastic2063.codfw.wmnet with OS bullseye
[20:38:54] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[20:39:05] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[20:39:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1170:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31171 and previous config saved to /var/cache/conftool/dbconfig/20220715-203909-ladsgroup.json
[20:46:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31172 and previous config saved to /var/cache/conftool/dbconfig/20220715-204617-ladsgroup.json
[20:46:22] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[20:52:43] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2063.codfw.wmnet with reason: host reimage
[20:55:10] <logmsgbot>	 !log ryankemper@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2063.codfw.wmnet with reason: host reimage
[20:57:42] <wikibugs>	 (PS1) CDanis: Show oncallers in both locations of The Button [software/klaxon] - https://gerrit.wikimedia.org/r/814255
[21:00:15] <wikibugs>	 (CR) CDanis: [C: +2] Show oncallers in both locations of The Button [software/klaxon] - https://gerrit.wikimedia.org/r/814255 (owner: CDanis)
[21:01:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31173 and previous config saved to /var/cache/conftool/dbconfig/20220715-210122-ladsgroup.json
[21:01:48] <wikibugs>	 (Merged) jenkins-bot: Show oncallers in both locations of The Button [software/klaxon] - https://gerrit.wikimedia.org/r/814255 (owner: CDanis)
[21:04:35] <wikibugs>	 (CR) Krinkle: core.pp: Make sync_binlog and trx_commit configurable (1 comment) [puppet] - https://gerrit.wikimedia.org/r/813917 (owner: Marostegui)
[21:08:39] <logmsgbot>	 !log ryankemper@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2063.codfw.wmnet with OS bullseye
[21:16:28] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31174 and previous config saved to /var/cache/conftool/dbconfig/20220715-211628-ladsgroup.json
[21:31:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31175 and previous config saved to /var/cache/conftool/dbconfig/20220715-213133-ladsgroup.json
[21:31:35] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[21:31:37] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[21:31:48] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[21:31:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1182 (T312984)', diff saved to https://phabricator.wikimedia.org/P31176 and previous config saved to /var/cache/conftool/dbconfig/20220715-213153-ladsgroup.json
[21:33:12] <wikibugs>	 (CR) Krinkle: [C: -1] Move CirrusSearch settings from IS.php to ext-CirrusSearch.php (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/799272 (https://phabricator.wikimedia.org/T308932) (owner: Ladsgroup)
[21:38:52] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T312984)', diff saved to https://phabricator.wikimedia.org/P31177 and previous config saved to /var/cache/conftool/dbconfig/20220715-213852-ladsgroup.json
[21:38:56] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[21:41:40] <wikibugs>	 (CR) Dzahn: "Could we set the severity to critical-but-not-paging first and then upgrade it to that after confirming everything?" [puppet] - https://gerrit.wikimedia.org/r/812846 (https://phabricator.wikimedia.org/T305847) (owner: Filippo Giunchedi)
[21:50:50] <icinga-wm>	 PROBLEM - SSH on mw1321.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:53:57] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31178 and previous config saved to /var/cache/conftool/dbconfig/20220715-215357-ladsgroup.json
[22:09:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31179 and previous config saved to /var/cache/conftool/dbconfig/20220715-220902-ladsgroup.json
[22:24:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T312984)', diff saved to https://phabricator.wikimedia.org/P31180 and previous config saved to /var/cache/conftool/dbconfig/20220715-222407-ladsgroup.json
[22:24:09] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1129.eqiad.wmnet with reason: Maintenance
[22:24:11] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[22:24:23] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1129.eqiad.wmnet with reason: Maintenance
[22:24:28] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1129 (T312984)', diff saved to https://phabricator.wikimedia.org/P31181 and previous config saved to /var/cache/conftool/dbconfig/20220715-222427-ladsgroup.json
[22:28:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129 (T312984)', diff saved to https://phabricator.wikimedia.org/P31182 and previous config saved to /var/cache/conftool/dbconfig/20220715-222845-ladsgroup.json
[22:42:54] <icinga-wm>	 PROBLEM - SSH on wtp1044.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:43:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31183 and previous config saved to /var/cache/conftool/dbconfig/20220715-224350-ladsgroup.json
[22:52:12] <icinga-wm>	 RECOVERY - SSH on mw1321.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:58:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31184 and previous config saved to /var/cache/conftool/dbconfig/20220715-225855-ladsgroup.json
[23:14:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129 (T312984)', diff saved to https://phabricator.wikimedia.org/P31185 and previous config saved to /var/cache/conftool/dbconfig/20220715-231400-ladsgroup.json
[23:14:02] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[23:14:06] <stashbot>	 T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984
[23:14:15] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[23:20:11] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[23:20:36] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[23:24:52] <icinga-wm>	 PROBLEM - Check systemd state on doc1002 is CRITICAL: CRITICAL - degraded: The following units failed: rsync-doc-doc2001.codfw.wmnet.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:26:16] <icinga-wm>	 PROBLEM - SSH on db1110.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[23:57:36] <wikibugs>	 (PS3) Cwhite: hiera: deploy and enable loki on grafana hosts [puppet] - https://gerrit.wikimedia.org/r/813724 (https://phabricator.wikimedia.org/T222826)
[23:59:33] <wikibugs>	 (CR) Cwhite: "This change is ready for review." [puppet] - https://gerrit.wikimedia.org/r/799001 (https://phabricator.wikimedia.org/T305175) (owner: Cwhite)