[00:13:23] <jinxer-wm>	 (DiskSpace) firing: Disk space an-web1001:9100:/srv 5.29% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-web1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[00:23:28] <icinga-wm>	 RECOVERY - Check systemd state on an-presto1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:24:44] <jinxer-wm>	 (SystemdUnitFailed) firing: (20) export_smart_data_dump.service Failed on an-conf1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:30:28] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1081 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:30:30] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1135 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:33:12] <icinga-wm>	 RECOVERY - Check systemd state on clouddb1015 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:33:48] <icinga-wm>	 RECOVERY - Check systemd state on analytics1074 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:34:43] <jinxer-wm>	 (SystemdUnitFailed) firing: (20) export_smart_data_dump.service Failed on an-conf1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:36:10] <icinga-wm>	 RECOVERY - Check systemd state on kafka-jumbo1012 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:37:04] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1142 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:38:02] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1116 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:38:44] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1109 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:39:02] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1134 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:39:14] <icinga-wm>	 RECOVERY - Check systemd state on an-presto1012 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:39:43] <jinxer-wm>	 (SystemdUnitFailed) firing: (20) export_smart_data_dump.service Failed on an-conf1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:41:24] <icinga-wm>	 RECOVERY - Check systemd state on kafka-jumbo1011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:41:28] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1106 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:41:56] <icinga-wm>	 RECOVERY - Check systemd state on an-presto1015 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:43:52] <icinga-wm>	 RECOVERY - Check systemd state on analytics1075 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:44:44] <jinxer-wm>	 (SystemdUnitFailed) firing: (19) export_smart_data_dump.service Failed on an-conf1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:49:42] <icinga-wm>	 RECOVERY - Check systemd state on an-presto1011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:49:43] <jinxer-wm>	 (SystemdUnitFailed) firing: (19) export_smart_data_dump.service Failed on an-conf1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:54:44] <jinxer-wm>	 (SystemdUnitFailed) firing: (14) export_smart_data_dump.service Failed on an-conf1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:59:42] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1105 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:15:06] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:19:43] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) export_smart_data_dump.service Failed on an-conf1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:32:56] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_event.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:34:43] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) export_smart_data_dump.service Failed on an-conf1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:48:16] <icinga-wm>	 RECOVERY - Check systemd state on an-conf1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:49:43] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) export_smart_data_dump.service Failed on an-conf1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:13:23] <jinxer-wm>	 (DiskSpace) firing: Disk space an-web1001:9100:/srv 5.288% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-web1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[05:49:43] <jinxer-wm>	 (SystemdUnitFailed) firing: monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:13:23] <jinxer-wm>	 (DiskSpace) firing: Disk space an-web1001:9100:/srv 5.288% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-web1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[08:27:41] <wikibugs>	 (CR) Elukey: [V: +2 C: +2] Fix mismatched allocation error from fdopen/pclose to fdopen/fclose. This is to resolve a "mismatched-dealloc" error that blocked packaging  [analytics/kafkatee] - https://gerrit.wikimedia.org/r/961174 (owner: Jgreen)
[08:50:55] <wikibugs>	 Data-Platform-SRE: Monitor kafka topics with a replication factor of 1 - https://phabricator.wikimedia.org/T346887 (brouberol) The `_schemas` topic is actually legit and should not be removed, as it is where `Karapace` stores its data: https://github.com/Aiven-Open/karapace#backing-up-your-karapace. Although...
[08:54:28] <wikibugs>	 Data-Platform-SRE: Monitor kafka topics with a replication factor of 1 - https://phabricator.wikimedia.org/T346887 (brouberol) >>! In T346887#9220603, @Ottomata wrote: > You can also probably delete ANY topic that has ksql in it.  We've never used KSQL in prod.   ` brouberol@kafka-jumbo1010:~$ for topic in $...
[09:01:53] <wikibugs>	 Data-Platform-SRE: Monitor kafka topics with a replication factor of 1 - https://phabricator.wikimedia.org/T346887 (brouberol) The 3 remaining topics with RF=1 are empty:  ` brouberol@kafka-jumbo1010:~$ kafka topics --describe | grep 'ReplicationFactor:1' Topic:faust-app-__assignor-__leader PartitionCount:1...
[09:02:44] <wikibugs>	 Data-Platform-SRE: Monitor kafka topics with a replication factor of 1 - https://phabricator.wikimedia.org/T346887 (brouberol) ` brouberol@kafka-jumbo1010:~$ kafka topics --describe | grep 'ReplicationFactor:1' brouberol@kafka-jumbo1010:~$  `  We no longer have a topic with RF=1. We can now work on adding mo...
[09:10:51] <wikibugs>	 Data-Platform-SRE: Upgrade the druid-public cluster to bullseye - https://phabricator.wikimedia.org/T332589 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by stevemunene@cumin1001 for host druid1008.eqiad.wmnet with OS bullseye
[09:38:07] <wikibugs>	 Data-Platform-SRE: Upgrade the druid-public cluster to bullseye - https://phabricator.wikimedia.org/T332589 (Stevemunene) Hit a bit of a block with the reimage at the partitioning step, exploring the options to find the best way forward  for `druid1008`  {F41524833}  {F41524831}
[09:38:15] <wikibugs>	 Data-Platform-SRE: Upgrade the druid-public cluster to bullseye - https://phabricator.wikimedia.org/T332589 (Stevemunene)
[09:45:03] <wikibugs>	 Data-Engineering, CX-cxserver, Citoid, Content-Transform-Team-WIP, and 9 others: Migrate node-based services in production to node18 - https://phabricator.wikimedia.org/T349118 (MSantos)
[09:46:32] <wikibugs>	 Analytics, Data-Engineering, Data-Platform-SRE, SRE, Event-Platform: Discovery for Kafka cluster brokers - https://phabricator.wikimedia.org/T213561 (Gehel) Declined→Open Re-opening after discussion with @brouberol, having better auto discovery is still interesting.
[09:46:50] <wikibugs>	 Analytics, Analytics-Kanban, Data-Engineering, MediaWiki-extensions-EventLogging, and 2 others: Modern Event Platform: Stream Intake Service: Implementation: Deployment Pipeline - https://phabricator.wikimedia.org/T211247 (Gehel)
[09:47:08] <wikibugs>	 Analytics, Data-Engineering, Data-Platform-SRE, SRE, Event-Platform: Discovery for Kafka cluster brokers - https://phabricator.wikimedia.org/T213561 (Gehel) a:Ottomata→brouberol
[09:49:44] <jinxer-wm>	 (SystemdUnitFailed) firing: monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:59:57] <wikibugs>	 Data-Platform-SRE: Upgrade the druid-public cluster to bullseye - https://phabricator.wikimedia.org/T332589 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by stevemunene@cumin1001 for host druid1008.eqiad.wmnet with OS bullseye executed with errors: - druid1008 (**FAIL**)   - Downtimed on...
[10:22:22] <wikibugs>	 Data-Platform-SRE: Upgrade the druid-public cluster to bullseye - https://phabricator.wikimedia.org/T332589 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by stevemunene@cumin1001 for host druid1008.eqiad.wmnet with OS bullseye
[10:34:23] <wikibugs>	 Data-Platform-SRE: Upgrade the druid-public cluster to bullseye - https://phabricator.wikimedia.org/T332589 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by stevemunene@cumin1001 for host druid1008.eqiad.wmnet with OS bullseye executed with errors: - druid1008 (**FAIL**)   - Removed from...
[11:05:52] <gehel>	 !log testing SAL and logging
[11:05:54] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:25:49] <wikibugs>	 Data-Platform-SRE: Upgrade the druid-public cluster to bullseye - https://phabricator.wikimedia.org/T332589 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by stevemunene@cumin1001 for host druid1008.eqiad.wmnet with OS bullseye
[12:11:03] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on an-tool1005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[12:11:25] <wikibugs>	 Data-Platform-SRE: Upgrade the druid-public cluster to bullseye - https://phabricator.wikimedia.org/T332589 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by stevemunene@cumin1001 for host druid1008.eqiad.wmnet with OS bullseye executed with errors: - druid1008 (**FAIL**)   - Removed from...
[12:13:23] <jinxer-wm>	 (DiskSpace) firing: Disk space an-web1001:9100:/srv 5.289% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-web1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[12:19:52] <wikibugs>	 Data-Platform-SRE: Upgrade the druid-public cluster to bullseye - https://phabricator.wikimedia.org/T332589 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by stevemunene@cumin1001 for host druid1008.eqiad.wmnet with OS bullseye
[12:20:47] <wikibugs>	 Data-Engineering, Data-Platform-SRE, Privacy Engineering, SecTeam-Processed: Enable the TagManager plugin for Matomo - https://phabricator.wikimedia.org/T349910 (BTullis) There has been an improvement, but it's still not working correctly. Here's a screenshot from the page with the preview contai...
[12:44:07] <wikibugs>	 Data-Engineering, Data-Platform-SRE, Privacy Engineering, Patch-For-Review, SecTeam-Processed: Enable the TagManager plugin for Matomo - https://phabricator.wikimedia.org/T349910 (BTullis) Hi @SCampos-WMF   I've tested the settings in https://gerrit.wikimedia.org/r/977057 manually, and they s...
[12:44:47] <btullis>	 !log removing oozie configuration from core hadoop files with https://gerrit.wikimedia.org/r/c/operations/puppet/+/974647 for T341893
[12:44:49] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:44:50] <stashbot>	 T341893: [Data Platform] Stop and remove oozie services - https://phabricator.wikimedia.org/T341893
[13:39:09] <wikibugs>	 Data-Platform-SRE: Upgrade the druid-public cluster to bullseye - https://phabricator.wikimedia.org/T332589 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by stevemunene@cumin1001 for host druid1008.eqiad.wmnet with OS bullseye executed with errors: - druid1008 (**FAIL**)   - Removed from...
[13:46:04] <wikibugs>	 Analytics, Data-Engineering (Sprint 5), Event-Platform, Patch-For-Review, User-notice: [Event Platform] Enable canary events for all MediaWiki streams - https://phabricator.wikimedia.org/T266798 (REsquito-WMF) @Ottomata We have deployed our changes to prod.  Is there any place or anyhow we ca...
[13:49:58] <jinxer-wm>	 (SystemdUnitFailed) firing: monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:11:56] <btullis>	 !log roll-restarting hadoop masters on test cluster for T341893
[14:12:03] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:12:03] <stashbot>	 T341893: [Data Platform] Stop and remove oozie services - https://phabricator.wikimedia.org/T341893
[14:27:00] <wikibugs>	 Data-Platform-SRE: Upgrade the druid-public cluster to bullseye - https://phabricator.wikimedia.org/T332589 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by stevemunene@cumin1001 for host druid1008.eqiad.wmnet with OS bullseye
[14:27:13] <wikibugs>	 Data-Platform-SRE: Upgrade the druid-public cluster to bullseye - https://phabricator.wikimedia.org/T332589 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by stevemunene@cumin1001 for host druid1008.eqiad.wmnet with OS bullseye executed with errors: - druid1008 (**FAIL**)   - Removed from...
[14:33:58] <wikibugs>	 Data-Engineering, Data-Platform-SRE, Privacy Engineering, Patch-For-Review, SecTeam-Processed: Enable the TagManager plugin for Matomo - https://phabricator.wikimedia.org/T349910 (SCampos-WMF) Hey @Btullis, thanks for addressing this issue! I'll generate a ticket and share it with our technic...
[14:38:21] <wikibugs>	 Data-Platform-SRE: Upgrade the druid-public cluster to bullseye - https://phabricator.wikimedia.org/T332589 (Stevemunene) We fixed a partman recipe issue that was causing some errors, then proceeded as expected with the expected options below then {F41525119} selected Yes from the image below {F41525854} The...
[14:38:45] <wikibugs>	 Data-Platform-SRE: Upgrade the druid-public cluster to bullseye - https://phabricator.wikimedia.org/T332589 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by stevemunene@cumin1001 for host druid1008.eqiad.wmnet with OS bullseye
[14:44:23] <brouberol>	 btullis: with the monitor for kafka topics with RF=1 merged, and the stale topics deleted, all that's left to do is remove these old topics from the datahub data. You mentioned I needed to find a conda env in a stat box with datahub installed, is that right?
[14:46:28] <brouberol>	 or should I maybe create one with just datahub, in my home dir, and cleanup after myself?
[14:47:32] <brouberol>	 (I found /home/aqu/afdeb/usr/lib/airflow/envs/airflow_2.3.1_1/bin/datahub on stat1004)
[14:47:36] <btullis>	 brouberol: Yes, you can do it on a stat box. Either a conda-analytics environment or a basic python venv will work.
[14:48:01] <btullis>	 Hang on, I'll look out an example. 
[14:49:46] <btullis>	 Oh, this was the comment that I was thinking of, but it wasn't as useful as I had thought. It doesn't have the creation of the environment, just using it to do a manual ingestion. https://phabricator.wikimedia.org/T327884#8574080
[14:50:15] <btullis>	 https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/DataHub#Manual_Ingestion_Example
[14:58:33] <btullis>	 !log merging 974649: Remove all remaining references to oozie and clean up | https://gerrit.wikimedia.org/r/c/operations/puppet/+/974649 for T341893
[14:58:35] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:58:36] <stashbot>	 T341893: [Data Platform] Stop and remove oozie services - https://phabricator.wikimedia.org/T341893
[15:12:01] <wikibugs>	 Data-Platform-SRE: Upgrade the druid-public cluster to bullseye - https://phabricator.wikimedia.org/T332589 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by stevemunene@cumin1001 for host druid1008.eqiad.wmnet with OS bullseye completed: - druid1008 (**PASS**)   - Removed from Puppet and...
[15:19:30] <brouberol>	 Afk for a bit 
[16:11:03] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on an-tool1005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[16:12:49] * brouberol back
[16:13:23] <jinxer-wm>	 (DiskSpace) firing: Disk space an-web1001:9100:/srv 5.289% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-web1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[16:13:41] <brouberol>	 btullis: I'm struggling to get the datahub CLI to talk to the server. I'm getting various SSL related errors. I'm curious: did you ever get it to work?
[16:13:53] <wikibugs>	 Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:rack/setup/install an-worker11[57-75] - https://phabricator.wikimedia.org/T349936 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host an-worker1160.eqiad.wmnet with OS bullseye
[16:14:03] <btullis>	 Oh yes, hang on. There is an environment variable that helps. Let me dig it out.
[16:15:03] <brouberol>	 🙏
[16:15:48] <brouberol>	 I dug a bit on the box and found a .datahubenv file with
[16:15:48] <brouberol>	 gms:
[16:15:48] <brouberol>	   server: https://datahub-gms.discovery.wmnet:30443
[16:15:48] <brouberol>	 so I copied that
[16:16:04] <btullis>	 Try this  `REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt` and then your command.
[16:16:07] <brouberol>	 aaah
[16:16:14] <brouberol>	 perfect, thank you
[16:16:15] <btullis>	 https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/analytics/dags/datahub/ingestion/ingest_daily_dag.py#L60
[16:17:03] <btullis>	 It's something to do with the conda-analytics environment not using the right system CA file by default.
[16:17:31] <brouberol>	 it worked!
[16:17:47] <brouberol>	 do we have that documented somewhere? If not, I'll make sure to do so
[16:18:34] <btullis>	 I remember that we had this ticket back when it was `anaconda-wmf`, before it was `conda-analytics` https://phabricator.wikimedia.org/T306197
[16:18:50] <btullis>	 I thought it was going to go away with conda-analytics, but it hasn't.
[16:20:40] <btullis>	 I seem to have removed said documentation, thinking that it was fixed: https://wikitech.wikimedia.org/w/index.php?title=Data_Engineering/Systems/DataHub&diff=prev&oldid=2091885
[16:22:14] <brouberol>	 ack, thank you. I'll write a little something in our doc then
[16:22:37] <brouberol>	 I'm about to soft delete the topics from datahub, and if everything goes right, I'll hard delete them as well
[16:27:16] <brouberol>	 `Took 40.972 seconds to soft delete -1 versioned rows and 0 timeseries aspect rows for 1 entities.` jeez take your time datahub, no-one's in a rush or anything
[16:28:19] <volans>	 so deleting -1 rows means that it inserted one row?!?!?! :D
[16:29:09] <brouberol>	 hahaha
[16:29:28] <brouberol>	 up is down, down is up, what is true anymore?
[16:29:34] <volans>	 False
[16:30:40] * brouberol slow claps
[16:31:17] <wikibugs>	 Data-Platform-SRE: Monitor kafka topics with a replication factor of 1 - https://phabricator.wikimedia.org/T346887 (brouberol) We can now delete these topics from datahub, from stat1004:  ` (2023-05-05T16.44.55_milimetric) milimetric@stat1004:~$ export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt  (...
[16:55:06] <wikibugs>	 Data-Platform-SRE: Monitor kafka topics with a replication factor of 1 - https://phabricator.wikimedia.org/T346887 (brouberol) Open→Resolved
[17:06:22] <wikibugs>	 Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:rack/setup/install an-worker11[57-75] - https://phabricator.wikimedia.org/T349936 (cmooney) @robh @Jclark-ctr I kicked off the reimage of an-worker1160 again.  I think the problem here wasn't actually an error on the DHCP config, but a problem we have...
[17:49:58] <jinxer-wm>	 (SystemdUnitFailed) firing: monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:04:44] <wikibugs>	 Data-Engineering, Data Products: Duplicate keys in x_analytics header corrupt some wmf_raw.webrequest rows and break refinement of wmf.webrequest - https://phabricator.wikimedia.org/T351909 (mforns)
[19:44:55] <wikibugs>	 Data-Engineering, Data Products: Duplicate keys in x_analytics header corrupt some wmf_raw.webrequest rows and break refinement of wmf.webrequest - https://phabricator.wikimedia.org/T351909 (gmodena) Just had a chat with @JAllemandou , this could be a good use case for {T349763}.  >  Compromise: If we ch...
[20:11:17] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on an-tool1005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[20:13:23] <jinxer-wm>	 (DiskSpace) firing: Disk space an-web1001:9100:/srv 5.289% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-web1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[21:49:59] <jinxer-wm>	 (SystemdUnitFailed) firing: monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed