[00:14:37] <wikibugs>	 Data-Engineering: Improve pageview automated traffic detection heuristics - https://phabricator.wikimedia.org/T280565 (Mayakp.wiki) Another issue discovered recently T355608 which could benefit from improving automated bot detection.
[00:31:13] <wikibugs>	 Data-Engineering (Sprint 7), Patch-For-Review: [Iceberg Migration] Migrate browser_general tables to Iceberg - https://phabricator.wikimedia.org/T352670 (CodeReviewBot) ebysans merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/576  Update browser_general dag to gene...
[01:50:38] <jinxer-wm>	 (SystemdUnitFailed) firing: (12) refinery-sqoop-mediawiki-production-daily.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:40:15] <jinxer-wm>	 (HdfsCapacityRemainingPercent) firing: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[03:00:16] <jinxer-wm>	 (HdfsCapacityRemainingPercent) resolved: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[03:10:15] <jinxer-wm>	 (HdfsCapacityRemainingPercent) firing: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[03:30:15] <jinxer-wm>	 (HdfsCapacityRemainingPercent) resolved: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[03:55:50] <jinxer-wm>	 (DiskSpace) firing: Disk space stat1005:9100:/ 2.067% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=stat1005 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[04:04:15] <jinxer-wm>	 (HdfsCapacityRemainingPercent) firing: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[04:09:15] <jinxer-wm>	 (HdfsCapacityRemainingPercent) resolved: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[05:04:20] <jinxer-wm>	 (SystemdUnitFailed) firing: (14) refinery-sqoop-mediawiki-production-daily.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:05:41] <icinga-wm>	 PROBLEM - Check systemd state on clouddb1015 is CRITICAL: CRITICAL - degraded: The following units failed: check-private-data.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:06:23] <icinga-wm>	 PROBLEM - Check systemd state on clouddb1019 is CRITICAL: CRITICAL - degraded: The following units failed: check-private-data.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:13:33] <icinga-wm>	 PROBLEM - Check systemd state on clouddb1021 is CRITICAL: CRITICAL - degraded: The following units failed: check-private-data.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:30:15] <jinxer-wm>	 (HdfsCapacityRemainingPercent) firing: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[07:55:51] <jinxer-wm>	 (DiskSpace) firing: Disk space stat1005:9100:/ 2.05% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=stat1005 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[08:41:27] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11), Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Create 3 microsites for wdqs full graph, main graph, & scholarly articles - https://phabricator.wikimedia.org/T354658 (Gehel) Open→Resolved
[08:41:33] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11), Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Expose 3 new dedicated WDQS endpoints - https://phabricator.wikimedia.org/T351650 (Gehel)
[08:41:43] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11), Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Expose 3 new dedicated WDQS endpoints - https://phabricator.wikimedia.org/T351650 (Gehel) Open→Resolved
[08:50:15] <jinxer-wm>	 (HdfsCapacityRemainingPercent) resolved: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[09:02:15] <jinxer-wm>	 (HdfsCapacityRemainingPercent) firing: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[09:05:38] <jinxer-wm>	 (SystemdUnitFailed) firing: (14) refinery-sqoop-mediawiki-production-daily.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:30:27] <wikibugs>	 (CR) Phuedx: [C: +1] Remove trvwikisource from scoop list [analytics/refinery] - https://gerrit.wikimedia.org/r/992944 (owner: Aqu)
[09:35:53] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11), decommission-hardware: decommission druid1006.eqiad.wmnet - https://phabricator.wikimedia.org/T354743 (Stevemunene)
[09:36:17] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11), decommission-hardware: decommission druid1006.eqiad.wmnet - https://phabricator.wikimedia.org/T354743 (Stevemunene) a:Stevemunene→None
[09:36:59] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11), decommission-hardware: decommission druid1005.eqiad.wmnet - https://phabricator.wikimedia.org/T354742 (Stevemunene)
[09:37:38] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11), decommission-hardware: decommission druid1004.eqiad.wmnet - https://phabricator.wikimedia.org/T354741 (Stevemunene)
[09:37:56] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11), decommission-hardware: decommission druid1004.eqiad.wmnet - https://phabricator.wikimedia.org/T354741 (Stevemunene) a:Stevemunene→None
[09:38:40] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11), decommission-hardware: decommission druid1005.eqiad.wmnet - https://phabricator.wikimedia.org/T354742 (Stevemunene) a:Stevemunene→None
[09:39:08] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11): Decommission druid100[4-6] - https://phabricator.wikimedia.org/T336043 (Stevemunene)
[09:40:33] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11): Decommission druid100[4-6] - https://phabricator.wikimedia.org/T336043 (Stevemunene) All SRE steps have been completed and the hosts have been decommissioned and handed over to dc ops for the final step.
[09:52:15] <jinxer-wm>	 (HdfsCapacityRemainingPercent) resolved: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[09:55:22] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11): Check log rotation settings on airflow instances - https://phabricator.wikimedia.org/T339015 (Stevemunene) Current airflow logs are managed by a systemd timer job that runs everyday at 0300HRS UTC and deletes any logs older than 90 days. However, this does not del...
[10:04:15] <jinxer-wm>	 (HdfsCapacityRemainingPercent) firing: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[10:14:15] <jinxer-wm>	 (HdfsCapacityRemainingPercent) resolved: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[10:21:15] <jinxer-wm>	 (HdfsCapacityRemainingPercent) firing: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[10:28:17] <gehel>	 stevemunene: about T336043, could you add a link to the decommission tickets for DC-Ops?
[10:28:17] <stashbot>	 T336043: Decommission druid100[4-6] - https://phabricator.wikimedia.org/T336043
[10:29:01] <gehel>	 Oh sorry, it's already on the sub tasks
[10:30:08] <gehel>	 stevemunene: and to make sure, have you updated https://docs.google.com/spreadsheets/d/1Obj5ozGQYl7Zei0MBLELVD8eDGqqsF_t9T3ZbrOsmZg/edit#gid=0 as well?
[10:30:39] <btullis>	 I think that those subtasks need reassigning to j.clark-ctr and the ops-eqiad tag adding, otherwise dc-ops won't see them.
[10:31:34] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11): Decommission druid100[4-6] - https://phabricator.wikimedia.org/T336043 (Gehel) Open→Resolved
[10:36:57] <wikibugs>	 Data-Engineering, Wikidata, Wikidata-Termbox, serviceops, and 2 others: Migrate Termbox SSR from Node 16 to 18 - https://phabricator.wikimedia.org/T355685 (akosiaris) >>! In T355685#9484621, @Lucas_Werkmeister_WMDE wrote: >>>! In T355685#9484091, @akosiaris wrote: >> My high level suggestion woul...
[10:44:20] <jinxer-wm>	 (SystemdUnitFailed) firing: (15) refinery-sqoop-mediawiki-production-daily.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:45:38] <jinxer-wm>	 (SystemdUnitFailed) firing: (15) refinery-sqoop-mediawiki-production-daily.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:50:35] <jinxer-wm>	 (DiskSpace) resolved: Disk space stat1005:9100:/ 1.999% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=stat1005 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[10:53:23] <wikibugs>	 Data-Engineering, Wikidata, Wikidata-Termbox, serviceops, and 2 others: Migrate Termbox SSR from Node 16 to 18 - https://phabricator.wikimedia.org/T355685 (Lucas_Werkmeister_WMDE) >>! In T355685#9490204, @akosiaris wrote: >> Would it be possible to have just one helm release, but have Test Wikida...
[11:46:22] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11), decommission-hardware, ops-eqiad: decommission druid1006.eqiad.wmnet - https://phabricator.wikimedia.org/T354743 (BTullis)
[11:46:24] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11), decommission-hardware, ops-eqiad: decommission druid1005.eqiad.wmnet - https://phabricator.wikimedia.org/T354742 (BTullis)
[11:46:28] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11), decommission-hardware, ops-eqiad: decommission druid1004.eqiad.wmnet - https://phabricator.wikimedia.org/T354741 (BTullis)
[11:48:30] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11): Decommission druid100[4-6] - https://phabricator.wikimedia.org/T336043 (BTullis) Thanks for the decommissions @Stevemunene - I just added the #ops-eqiad tag to the subtasks to help make sure that they are seen by the right team.
[11:49:35] <jinxer-wm>	 (DiskSpace) firing: Disk space stat1005:9100:/ 2.394% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=stat1005 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[12:56:15] <jinxer-wm>	 (HdfsCapacityRemainingPercent) resolved: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[13:24:15] <jinxer-wm>	 (HdfsCapacityRemainingPercent) firing: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[13:45:45] <wikibugs>	 Data-Engineering, Wikidata, Wikidata-Termbox, serviceops, and 2 others: Migrate Termbox SSR from Node 16 to 18 - https://phabricator.wikimedia.org/T355685 (akosiaris) >>! In T355685#9490230, @Lucas_Werkmeister_WMDE wrote: >>>! In T355685#9490204, @akosiaris wrote: >>> Would it be possible to have...
[14:20:48] <wikibugs>	 (CR) Joal: "One last nit, then good to go" [analytics/refinery] - https://gerrit.wikimedia.org/r/986839 (https://phabricator.wikimedia.org/T352671) (owner: TChin)
[14:49:20] <jinxer-wm>	 (SystemdUnitFailed) firing: (14) refinery-sqoop-mediawiki-production-daily.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:02:32] <wikibugs>	 Data-Engineering, Wikidata, Wikidata-Termbox, serviceops, and 2 others: Migrate Termbox SSR from Node 16 to 18 - https://phabricator.wikimedia.org/T355685 (Lucas_Werkmeister_WMDE) Thanks a lot – I’ve added some of that information at https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service#D...
[15:25:35] <wikibugs>	 Data-Engineering, Wikidata, Wikidata-Termbox, serviceops, and 2 others: Migrate Termbox SSR from Node 16 to 18 - https://phabricator.wikimedia.org/T355685 (akosiaris) >>! In T355685#9490871, @Lucas_Werkmeister_WMDE wrote: > Thanks a lot – I’ve added some of that information at https://wikitech.wi...
[15:37:00] <icinga-wm>	 RECOVERY - Check systemd state on clouddb1015 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:38:01] <btullis>	 I have reached out to mnz who is making heavy use of /tmp on stat1005, to see if he can move these files beneath /srv.
[15:39:20] <jinxer-wm>	 (SystemdUnitFailed) firing: (14) refinery-sqoop-mediawiki-production-daily.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:45:16] <wikibugs>	 Data-Engineering, Wikidata, Wikidata-Termbox, serviceops, and 2 others: Migrate Termbox SSR from Node 16 to 18 - https://phabricator.wikimedia.org/T355685 (Lucas_Werkmeister_WMDE) >>! In T355685#9490969, @akosiaris wrote: > Definitely different task. I am also not at all sure right now that the t...
[15:49:17] <wikibugs>	 Data-Engineering, Wikidata, Wikidata-Termbox, serviceops, and 2 others: Migrate Termbox SSR from Node 16 to 18 - https://phabricator.wikimedia.org/T355685 (akosiaris) >>! In T355685#9491033, @Lucas_Werkmeister_WMDE wrote: >>>! In T355685#9490969, @akosiaris wrote: >> Definitely different task. I...
[15:49:49] <jinxer-wm>	 (DiskSpace) firing: Disk space stat1005:9100:/ 2.368% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=stat1005 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[15:51:04] <icinga-wm>	 RECOVERY - Check systemd state on clouddb1019 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:51:24] <icinga-wm>	 RECOVERY - Check systemd state on clouddb1021 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:54:20] <jinxer-wm>	 (SystemdUnitFailed) firing: (14) refinery-sqoop-mediawiki-production-daily.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:55:46] <wikibugs>	 Analytics, AQS2.0, Tech-Docs-Team, Data Products (Epics Timeline), and 3 others: AQS 2.0 documentation - https://phabricator.wikimedia.org/T288664 (apaskulin)
[16:01:33] <wikibugs>	 Data-Engineering, Product-Analytics, Wmfdata-Python: Support querying a range of hourly data partitions - https://phabricator.wikimedia.org/T294654 (mpopov) @nettrom_WMF Thank you for sharing that code! I recently used it in T353666 and it was very helpful! Just wanted to show my appreciation.
[16:33:44] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11), Patch-For-Review: Migrate cloudelastic from public to private IPs - https://phabricator.wikimedia.org/T355617 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by bking@cumin2002 for hosts: `cloudelastic1010.wikimedia.org` - cloudelastic1010.wikim...
[16:35:09] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11), Patch-For-Review: Wmfdata should connect to Presto using the analytics-presto CNAME - https://phabricator.wikimedia.org/T345482 (BTullis) I've been testing out various approaches on this task and I have received great help from @brouberol for which I am very g...
[16:37:47] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11), Patch-For-Review: Bring an-coord100[3-4] into service - https://phabricator.wikimedia.org/T336045 (BTullis) This is now waiting on: https://github.com/wikimedia/wmfdata-python/pull/50 and {T345482}. Once that is merged, we will create a new conda-analytics pac...
[16:43:49] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11): Upgrade Airflow instances to Bullseye - https://phabricator.wikimedia.org/T335261 (BTullis) a:BTullis
[17:14:30] <wikibugs>	 Data-Engineering (Sprint 7), Data Products, Structured-Data-Backlog: [Maintenance] Set up deletion jobs for Structured Data's data pipelines - https://phabricator.wikimedia.org/T347561 (mfossati) @lbowmaker @JAllemandou , I was thinking that perhaps we could implement these deletion jobs as tasks in...
[17:17:58] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11), Patch-For-Review: Migrate cloudelastic from public to private IPs - https://phabricator.wikimedia.org/T355617 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host cloudelastic1010.eqiad.wmnet with OS bullseye
[17:22:19] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11), Patch-For-Review: Wmfdata should connect to Presto using the analytics-presto CNAME - https://phabricator.wikimedia.org/T345482 (brouberol) @BTullis My pleasure!
[17:24:30] <jinxer-wm>	 (HdfsCapacityRemainingPercent) firing: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[17:37:57] <wikibugs>	 Data-Engineering, Patch-For-Review: Data Quality Issue: Wikitext History Job fail / rerun in Airflow - https://phabricator.wikimedia.org/T342911 (CodeReviewBot) xcollazo opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/595  Don't retry convert_history_xml_to_parquet.
[17:54:59] <wikibugs>	 Data-Engineering, CX-cxserver, Citoid, Content-Transform-Team-WIP, and 10 others: Migrate node-based services in production to node18 - https://phabricator.wikimedia.org/T349118 (Jdforrester-WMF)
[17:57:45] <wikibugs>	 Data-Platform-SRE (2024.01.22 - 2024.02.11), Patch-For-Review: Migrate cloudelastic from public to private IPs - https://phabricator.wikimedia.org/T355617 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host cloudelastic1010.eqiad.wmnet with OS bullseye completed:...
[19:49:50] <jinxer-wm>	 (DiskSpace) firing: Disk space stat1005:9100:/ 2.341% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=stat1005 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[19:55:38] <jinxer-wm>	 (SystemdUnitFailed) firing: (12) refinery-sqoop-mediawiki-production-daily.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:24:31] <jinxer-wm>	 (HdfsCapacityRemainingPercent) firing: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[23:49:50] <jinxer-wm>	 (DiskSpace) firing: Disk space stat1005:9100:/ 2.318% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=stat1005 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[23:55:38] <jinxer-wm>	 (SystemdUnitFailed) firing: (12) refinery-sqoop-mediawiki-production-daily.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed