[05:01:10] Data-Engineering, Wikibase change dispatching scripts to jobs, serviceops-radar: Better observability/visualization for MediaWiki jobs - https://phabricator.wikimedia.org/T291620 (Tgr)
[05:01:24] Data-Engineering, Wikibase change dispatching scripts to jobs, serviceops-radar: Better observability/visualization for MediaWiki jobs - https://phabricator.wikimedia.org/T291620 (Tgr) >>! In T291620#7376583, @Ladsgroup wrote: > - I had an action item for `X-Wikimedia-Debug` case (to implement it)....
[10:50:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[11:15:28] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[12:33:13] (DiskSpace) firing: Disk space dbstore1003:9100:/srv 5.41% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=dbstore1003 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[13:53:14] Data-Platform-SRE (23/24 Q3 Milestone 1): Bring dbstore1008 into service to replace dbstore1003 - https://phabricator.wikimedia.org/T351921 (Marostegui) p:High→Unbreak! And as expected...this got filled up. ` /dev/mapper/tank-data xfs 4.4T 4.4T 20K 100% /srv `
[15:47:13] Data-Platform-SRE (23/24 Q3 Milestone 1): Bring dbstore1008 into service to replace dbstore1003 - https://phabricator.wikimedia.org/T351921 (BTullis) >>! In T351921#9426370, @Marostegui wrote: > And as expected...this got filled up. > ` > /dev/mapper/tank-data xfs 4.4T 4.4T 20K 100% /srv > ` Oh de...
[15:53:13] (DiskSpace) resolved: Disk space dbstore1003:9100:/srv 4.312e-07% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=dbstore1003 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[16:00:35] Data-Platform-SRE (23/24 Q3 Milestone 1): Bring dbstore1008 into service to replace dbstore1003 - https://phabricator.wikimedia.org/T351921 (BTullis) I stopped the mariadb@s1 service which automatically removed the 400 GB of temporary files in `/srv/tmp.s1`. I have now restarted the service and started the...
[16:07:07] Data-Platform-SRE (23/24 Q3 Milestone 1): Bring dbstore1008 into service to replace dbstore1003 - https://phabricator.wikimedia.org/T351921 (BTullis) The replica lag is dropping and the free space is now 8%. ` btullis@dbstore1003:/srv$ df -h /srv Filesystem Size Used Avail Use% Mounted on /dev/m...
[16:29:01] Data-Platform-SRE (23/24 Q3 Milestone 1): Bring dbstore1008 into service to replace dbstore1003 - https://phabricator.wikimedia.org/T351921 (BTullis) Unfortunately, it looks like some kind of query caused it to spill 400 GB or so to disk, so I don't know whether this will happen again in the next few days. {...
[16:30:24] Data-Platform-SRE (23/24 Q3 Milestone 1): Bring dbstore1008 into service to replace dbstore1003 - https://phabricator.wikimedia.org/T351921 (BTullis) p:Unbreak!→High
[18:47:29] Data-Engineering, MediaWiki-General, Event-Platform, Patch-For-Review: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate - https://phabricator.wikimedia.org/T353817 (Ottomata) > suggested implementation to be use medaiwiki-config/docroot/mediawiki.org I t...
[19:00:28] Data-Engineering, MediaWiki-General, Event-Platform, Patch-For-Review: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate - https://phabricator.wikimedia.org/T353817 (Ottomata) I might be able to set the `ip` field to the client IP, usually parsed and provi...
[23:23:15] (EventgateValidationErrors) firing: ...
[23:23:16] eventgate-analytics-external stream eventlogging_UniversalLanguageSelector validation errors detected in past 15 min - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos - https://alerts.wikimedia.org/?q=alertname%3DEventgateValidationErrors
[23:28:15] (EventgateValidationErrors) firing: ...
[23:28:15] (2) eventgate-analytics-external stream eventlogging_SearchSatisfaction validation errors detected in past 15 min - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos - https://alerts.wikimedia.org/?q=alertname%3DEventgateValidationErrors
[23:38:15] (EventgateValidationErrors) resolved: ...
[23:38:16] (2) eventgate-analytics-external stream eventlogging_SearchSatisfaction validation errors detected in past 15 min - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos - https://alerts.wikimedia.org/?q=alertname%3DEventgateValidationErrors