[02:54:42] <wikibugs>	 Traffic, [Archived]Wikidata Dev Team, Prod-Kubernetes, SRE, and 5 others: Frequent 500 Errors and Timeouts When Adding Statements to New Item or Lexeme-typed Properties - https://phabricator.wikimedia.org/T374230#10775088 (Kirilloparma)    >>! In T374230#10771849, @Silvan_WMDE wrote: > @Kirillopa...
[03:19:18] <wikibugs>	 Traffic, [Archived]Wikidata Dev Team, Prod-Kubernetes, SRE, and 5 others: Frequent 500 Errors and Timeouts When Adding Statements to New Item or Lexeme-typed Properties - https://phabricator.wikimedia.org/T374230#10775097 (Jakob_WMDE) >>! In T374230#10775088, @Kirilloparma wrote: >  > @Silvan_WMD...
[08:28:28] <wikibugs>	 Traffic: Move host normalization to haproxy - https://phabricator.wikimedia.org/T392880 (Fabfur) NEW
[08:43:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs7002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:44:11] <vgutierrez>	 need to check why this alert doesn't get silenced 
[08:52:55] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs7002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:12:46] <wikibugs>	 Traffic: Move method check from varnish to HAProxy - https://phabricator.wikimedia.org/T392073#10775640 (Fabfur) Leaving this open as memo to remove Varnish configuration at a later moment
[09:12:59] <wikibugs>	 Traffic: Move method check from varnish to HAProxy - https://phabricator.wikimedia.org/T392073#10775644 (Fabfur) Open→In progress p:Medium→Low
[09:17:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs7001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:17:39] <wikibugs>	 Traffic: Move method check from varnish to HAProxy - https://phabricator.wikimedia.org/T392073#10775654 (Vgutierrez) In progress→Stalled
[09:22:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs7001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:26:05] <nemo-yiannis>	 👋 We are having an issue when purging edge cache for PCS endpoint URLs that contain ":". I've added steps to reproduce the issue here: https://phabricator.wikimedia.org/T392849
[09:26:27] <nemo-yiannis>	 Any idea what is wrong with the way we send purge events?
[09:32:20] <wikibugs>	 Traffic, Content-Transform-Team, serviceops: Purging edge caches doesn't work for articles with ":" in their title - https://phabricator.wikimedia.org/T392849#10775677 (Jgiannelos) FYI this is not reproduced on endpoints not migrated to rest-gateway yet. eg:  * Given this page  * https://en.wikipedia...
[09:36:59] <vgutierrez>	 nemo-yiannis: last I've heard from _joe_ is that  mwscript-k8s can't be used to purge requests yet
[09:38:18] <vgutierrez>	 effie: ^^ do you know if that bug has been fixed?
[09:40:43] <nemo-yiannis>	 I don't think that our main issue is mwscript-but rather the purge events we send to kafka from PCS via eventgate
[09:41:37] <nemo-yiannis>	 *mwscript-k8s but
[09:43:59] <vgutierrez>	 nemo-yiannis: have you tried using %3A instead of :?
[09:45:12] <effie>	 I am afraid I do not know
[09:45:24] <vgutierrez>	 effie: who could be aware of that?
[09:45:29] <nemo-yiannis>	 Yeah, we escape the title part
[09:45:42] <nemo-yiannis>	 There are some requests, responses, and events from kafka with details on the ticket
[09:46:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs5005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:46:34] <effie>	 vgutierrez: rz.l 
[09:47:01] <effie>	 let me take a look see if I can help 
[09:47:30] <wikibugs>	 Traffic, Infrastructure-Foundations, Spicerack, SRE-tools: Spicerack's Icinga module should provide a way to skip specific services in sub-optimal but desired state - https://phabricator.wikimedia.org/T392848#10775722 (elukey) We discussed the options on IRC, to summarize:  1) The DNS cookbook co...
[09:48:07] <vgutierrez>	 hmmm `FYI this is not reproduced on endpoints not migrated to rest-gateway yet. eg:`
[09:49:53] <hnowlan>	 the first thing ^ that makes me think of is an issue with normalisation 
[09:50:15] <hnowlan>	 but the events that hit kafka have the same correct URLs for migrated and unmigrated wikis
[09:50:39] <hnowlan>	 and purges work for other URLs for migrated wikis
[09:51:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs5005:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:51:42] <wikibugs>	 Traffic, Infrastructure-Foundations, Spicerack, SRE-tools: Spicerack's Icinga module should provide a way to skip specific services in sub-optimal but desired state - https://phabricator.wikimedia.org/T392848#10775738 (elukey) From https://icinga.com/docs/icinga-2/latest/doc/24-appendix/ it seems...
[09:51:52] <vgutierrez>	 migration happens via gateway-check.lua right?
[09:52:25] <hnowlan>	 yeah
[09:52:52] <vgutierrez>	 so same kind of normalization happens for migrated and not migrated wikis
[09:52:56] <vgutierrez>	 at least at the CDN
[09:53:21] <vgutierrez>	 https://www.irccloud.com/pastebin/JGuJ96AH/
[09:53:32] <vgutierrez>	 that's the relevant part of backend.yaml
[09:54:37] <vgutierrez>	 nemo-yiannis: from normalize-path.lua comments...
[09:54:47] <vgutierrez>	 -- path = "/wiki/User:Ema%2fProfiling_Python%28Now you know[dude]"
[09:54:47] <vgutierrez>	 -- return "/wiki/User:Ema/Profiling_Python(Now you know%5Bdude%5D"
[09:55:07] <hnowlan>	 jgiannelos
[09:55:15] <vgutierrez>	 so it looks like `:` shouldn't be encoded in your PURGE requests
[09:57:02] <vgutierrez>	 so `:` gets decoded but not encoded
[09:57:07] <vgutierrez>	 (at the CDN)
[09:57:17] <vgutierrez>	 rest-gateway is aware of that?
[09:57:36] <wikibugs>	 Traffic, Content-Transform-Team, serviceops: Purging edge caches doesn't work for articles with ":" in their title - https://phabricator.wikimedia.org/T392849#10775748 (hnowlan) From kafka - a successful enwiki purge and a failing testwiki purge:  ` {   "$schema": "/resource_change/1.0.0",   "meta":...
[09:58:40] <hnowlan>	 by default the gateway doesn't touch normalisation either way
[09:59:01] <hnowlan>	 it appears that normalisation isn't affecting the purges themselves, see https://phabricator.wikimedia.org/T392849#10775748 
[09:59:11] <hnowlan>	 but it clearly is if it's only happening for the gateway 
[09:59:31] <vgutierrez>	 ack... let's see if I can debug this :D
[10:00:41] <hnowlan>	 purges are working for other migrated pages fwiw
[10:01:00] <vgutierrez>	 other as in not containing `:^ in the URL?
[10:01:03] <hnowlan>	 yeah 
[10:01:03] <vgutierrez>	 `:`
[10:01:05] <vgutierrez>	 ack
[10:05:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs5004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:10:23] <nemo-yiannis>	 FWIW I haven't tested other special characters but this came up while testing `User:` namespace pages
[10:17:36] <vgutierrez>	 so.. a quick check filtering by ReqUrl = /api/rest_v1/page/mobile-html/User%3AJGiannelos_%28WMF%29%2Ftest-pcs-rollout
[10:17:50] <vgutierrez>	 cp6015 didn't receive a PURGE request after I edited that page
[10:18:37] <vgutierrez>	 let's widen the net... and filter URLs that contain JGiannelos
[10:19:55] <vgutierrez>	 ok... now I see the PURGEs
[10:20:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs5004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:22:33] <wikibugs>	 Traffic, Content-Transform-Team, serviceops: Purging edge caches doesn't work for articles with ":" in their title - https://phabricator.wikimedia.org/T392849#10775807 (Vgutierrez) from varnish point of view, after editing https://test.wikipedia.org/wiki/User:JGiannelos_(WMF)/test-pcs-rollout the fol...
[10:25:10] <vgutierrez>	 so http://test.wikipedia.org/api/rest_v1/page/mobile-html/User%3AJGiannelos_(WMF)%2Ftest-pcs-rollout gets a PURGE request
[10:25:26] <vgutierrez>	 note that `(` and `)` aren't encoded
[10:27:47] <hnowlan>	 huh 
[10:27:51] <wikibugs>	 Traffic, Content-Transform-Team, serviceops: Purging edge caches doesn't work for articles with ":" in their title - https://phabricator.wikimedia.org/T392849#10775829 (Vgutierrez) a quick check shows that the URL receiving the PURGE is purged as expected: ` vgutierrez@carrot:~$ curl -4 'https://test...
[10:28:31] <vgutierrez>	 see https://phabricator.wikimedia.org/T392849#10775829
[10:30:38] <nemo-yiannis>	 Let me try again without encoding the parentheses
[10:36:41] <nemo-yiannis>	 vgutierrez: I do get `x-cache-status: miss` on your steps on the ticket but the content is not updated
[10:37:26] <vgutierrez>	 that's definitely another issue...
[10:37:35] <vgutierrez>	 let me track the whole request flow via ATS
[10:37:43] <nemo-yiannis>	 but at the same time the response from the service is up-to-date
[10:38:04] <nemo-yiannis>	 `curl -k "https://mobileapps.svc.eqiad.wmnet:4102/test.wikipedia.org/v1/page/mobile-html/User%3AJGiannelos_(WMF)%2Ftest-pcs-rollout"`
[10:38:14] <nemo-yiannis>	 not via rest-gateway
[10:38:34] <vgutierrez>	 nemo-yiannis: could the URI hostname have an impact there?
[10:38:47] <nemo-yiannis>	 I don't think so
[10:38:48] <vgutierrez>	 you can use --connect-to
[10:39:12] <nemo-yiannis>	 hnowlan: Is there any way I can send the same request but at the rest-gateway level?
[10:40:00] <hnowlan>	 nemo-yiannis: yep, curl 'https://rest-gateway.discovery.wmnet:4113/test.wikipedia.org/v1/page/mobile-html/User%3AJGiannelos_(WMF)%2Ftest-pcs-rollout'
[10:40:31] <nemo-yiannis>	 it also renders the latest
[10:41:14] <vgutierrez>	 Date:2025-04-29 Time:10:40:51 ConnAttempts:0 ConnReuse:0 TTFetchHeaders:271 ClientTTFB:271 CacheReadTime:0 CacheWriteTime:0 TotalSMTime:271 TotalPluginTime:0 ActivePluginTime:0 TotalTime:271 OriginServer:rest-gateway.discovery.wmnet OriginServerTime:271 CacheResultCode:TCP_MISS CacheWriteResult:FIN ReqMethod:GET RespStatus:200 OriginStatus:200 
[10:41:14] <vgutierrez>	 ReqURL:http://test.wikipedia.org/api/rest_v1/page/mobile-html/User:JGiannelos_(WMF)%2Ftest-pcs-rollout ReqHeader:User-Agent:curl/7.88.1 ReqHeader:Host:test.wikipedia.org ReqHeader:X-Client-IP:81.39.0.137 ReqHeader:Cookie: BerespHeader:Set-Cookie:- BerespHeader:Cache-Control:s-maxage=1209600, max-age=0 BerespHeader:Connection:- RespHeader:X-Cache-Int:cp6010 miss RespHeader:Backend-Timing:-
[10:41:23] <vgutierrez>	 that's a cache miss logged by ATS as well after the PURGE
[10:42:11] <vgutierrez>	 so ATS is actually re-fetching the content
[10:44:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs4009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:44:50] <nemo-yiannis>	 ha
[10:45:27] <nemo-yiannis>	 This returns the latest revision: `curl "https://rest-gateway.discovery.wmnet:4113/test.wikipedia.org/v1/page/mobile-html/User%3AJGiannelos_(WMF)%2Ftest-pcs-rollout"`
[10:45:44] <vgutierrez>	 but User:JGiannelos offers a stale version?
[10:45:51] <nemo-yiannis>	 curl "https://rest-gateway.discovery.wmnet:4113/test.wikipedia.org/v1/page/mobile-html/User:JGiannelos_(WMF)%2Ftest-pcs-rollout"
[10:45:54] <nemo-yiannis>	 stale
[10:46:06] <vgutierrez>	 .)
[10:46:07] <vgutierrez>	 :)
[10:47:58] <nemo-yiannis>	 i am confused :)
[10:49:01] <wikibugs>	 Traffic, Content-Transform-Team, serviceops: Purging edge caches doesn't work for articles with ":" in their title - https://phabricator.wikimedia.org/T392849#10775884 (Vgutierrez) ATS also shows how it's performing the request to the origin server after a PURGE: ` Date:2025-04-29 Time:10:40:51 ConnA...
[10:49:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs4009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:49:47] <vgutierrez>	 nemo-yiannis: origin server is definitely out of my scope :)
[10:50:02] <nemo-yiannis>	 hnowlan: any ideas ?
[10:54:30] <nemo-yiannis>	 I need to check if RB did some extra normalization
[10:56:50] <hnowlan>	 nemo-yiannis: mobileapps also returns stale content directly when using ":"
[10:57:38] <hnowlan>	 possibly some kind of cache key issue? 
[10:58:50] <hnowlan>	 looking at rb 
[10:59:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs4008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:04:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs4009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:31:30] <jinxer-wm>	 FIRING: [4x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on durum2001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running  - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[12:31:47] <sukhe>	 ^ yes, restarts in progress
[12:36:30] <jinxer-wm>	 RESOLVED: [4x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on durum1001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running  - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[12:36:45] <jinxer-wm>	 FIRING: [4x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on durum1001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running  - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[12:41:30] <jinxer-wm>	 RESOLVED: [4x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on durum1001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running  - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[12:46:30] <jinxer-wm>	 FIRING: [4x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on durum1001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running  - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[12:51:30] <jinxer-wm>	 RESOLVED: [3x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on durum1001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running  - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[12:58:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: anycast-healthchecker.service on durum3003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:58:36] <sukhe>	 expected ^
[13:03:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: anycast-healthchecker.service on durum3003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:14:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs6002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:21:18] <vgutierrez>	 I'm wondering why this keeps firing if in theory spicerack takes care of silencing both icinga & alertmanager
[13:22:03] <sukhe>	 should also be true, for example, for the anycast-hc alerts, but I have always presumed that there is a race condition somewhere between when the alert is detected and when it fires, because otherwise it should happen in all cases and it doesn't?
[13:24:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs6002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:28:19] <vgutierrez>	 2025-04-29 13:08:11,222 vgutierrez 235109 [INFO] Scheduling downtime on Icinga server alert1002.wikimedia.org for hosts: lvs6002
[13:28:30] <vgutierrez>	 alert got triggered 3 minutes later
[13:28:39] <vgutierrez>	 sorry... 6 minutes later
[13:36:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs6001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:42:55] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs6001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:59:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs3009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:00:38] <sukhe>	 yeah we should look into this, given how frequently it is firing
[14:03:58] <wikibugs>	 Traffic, Data-Engineering, DPE HAProxy Migration: Add HAproxy termination field to webrequest - https://phabricator.wikimedia.org/T387454#10776594 (Fabfur) @JAllemandou the change has been deployed in production, now all haproxykafka instances on cache hosts are sending the `termination_state` field...
[14:04:15] <wikibugs>	 Traffic, Data-Engineering, DPE HAProxy Migration: Add HAproxy termination field to webrequest - https://phabricator.wikimedia.org/T387454#10776600 (Fabfur)
[14:05:55] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs3009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:15:56] <vgutierrez>	 sukhe: it's firing for every single depool
[14:17:55] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs3008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:24:55] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: prometheus_liberica_cp_checks.service on lvs3008:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:13:15] <wikibugs>	 Traffic: wmfuniq-keygen: Install to /usr/bin, not /usr/sbin - https://phabricator.wikimedia.org/T392937 (BCornwall) NEW
[17:13:36] <wikibugs>	 Traffic: wmfuniq-keygen: Install to /usr/bin, not /usr/sbin - https://phabricator.wikimedia.org/T392937#10777672 (BCornwall) Open→In progress p:Triage→Low
[17:16:08] <wikibugs>	 Traffic: wmfuniq-keygen: Install to /usr/bin, not /usr/sbin - https://phabricator.wikimedia.org/T392937#10777680 (Dzahn) Or should it be /usr/local/bin/ because it's our own software that we install?  https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch04s09.html
[17:18:08] <wikibugs>	 Traffic: wmfuniq-keygen: Install to /usr/bin, not /usr/sbin - https://phabricator.wikimedia.org/T392937#10777683 (BCornwall) Considering it's a proper debian package, I think /usr/bin is more appropriate IMO.
[17:20:48] <wikibugs>	 Traffic, Patch-For-Review: wmfuniq-keygen: Install to /usr/bin, not /usr/sbin - https://phabricator.wikimedia.org/T392937#10777690 (Dzahn) Ah, I see!. Yea, whatever the definition of "locally installed" is then. Not a strong opinion either way!
[17:54:17] <wikibugs>	 Traffic, Data-Engineering-Radar, Data-Platform-SRE, MediaWiki-Platform-Team (Radar), Patch-For-Review: Replicate current low-message alerting from VarnishKafka - https://phabricator.wikimedia.org/T391810#10777852 (Ahoelzl)
[20:51:21] <ryankemper>	 Would someone be available tomorrow to work on tearing down wdqs-internal lvs? (cc brett)
[21:00:20] <brett>	 ryankemper: Sure, we can do that. When's a good time?
[21:02:48] <ryankemper>	 brett: starting either at 11am or 2pm pst works for me, what's your preference?
[21:03:23] <brett>	 Sorry to be pedantic but I assume you meant pdt? 11 am works for me
[21:13:14] <ryankemper>	 yes :) I can never remember which one we're in
[21:13:20] <ryankemper>	 cool let's plan on 11am then. i'll make a calendar event
[21:26:33] <ryankemper>	 brett: okay, calendar event up. I added links to the 2 patch chains (dns repo and puppet repo) in the description
[21:27:23] <brett>	 Thank you!
[21:58:42] <wikibugs>	 Traffic, DC-Ops, ops-codfw, Patch-For-Review: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#10778576 (BCornwall)