[00:20:12] (VarnishkafkaNoMessages) firing: varnishkafka for instance cp3062:9132 is not logging cache_text requests from statsv - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=esams%20prometheus/ops&var-source=statsv&var-cp_cluster=cache_text&var-instance=cp3062:9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[00:25:12] (VarnishkafkaNoMessages) resolved: varnishkafka for instance cp3062:9132 is not logging cache_text requests from statsv - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=esams%20prometheus/ops&var-source=statsv&var-cp_cluster=cache_text&var-instance=cp3062:9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[04:05:55] PROBLEM - Check unit status of eventlogging_to_druid_network_flows_internal_daily on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_network_flows_internal_daily https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[04:58:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[05:13:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[06:03:27] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:25:57] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:45:12] (VarnishkafkaNoMessages) firing: varnishkafka for instance cp2031:9132 is not logging cache_text requests from statsv - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-source=statsv&var-cp_cluster=cache_text&var-instance=cp2031:9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[08:46:19] Data-Engineering, Data-Engineering-Kanban: Add CU-UA high entropy hints to Hive webrequest tables - https://phabricator.wikimedia.org/T304850 (JAllemandou) Open→Resolved
[08:50:12] (VarnishkafkaNoMessages) resolved: varnishkafka for instance cp2031:9132 is not logging cache_text requests from statsv - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-source=statsv&var-cp_cluster=cache_text&var-instance=cp2031:9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[09:22:23] --^ This is interesting. I merged the change that I thought was going to exclude statsv from these alerts.
[09:22:38] Will take another look.
[09:26:12] (VarnishkafkaNoMessages) firing: varnishkafka for instance cp2037:9132 is not logging cache_text requests from statsv - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-source=statsv&var-cp_cluster=cache_text&var-instance=cp2037:9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[09:31:12] (VarnishkafkaNoMessages) resolved: varnishkafka for instance cp2037:9132 is not logging cache_text requests from statsv - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-source=statsv&var-cp_cluster=cache_text&var-instance=cp2037:9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[09:33:50] Ah, no, I had forgotten to merge it. Done now. These statsv alerts should stop arriving.
[09:53:12] (VarnishkafkaNoMessages) firing: varnishkafka for instance cp2035:9132 is not logging cache_text requests from statsv - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-source=statsv&var-cp_cluster=cache_text&var-instance=cp2035:9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[09:54:40] Oh what! --^ They're really supposed to have stopped now.
[09:58:12] (VarnishkafkaNoMessages) resolved: varnishkafka for instance cp2035:9132 is not logging cache_text requests from statsv - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-source=statsv&var-cp_cluster=cache_text&var-instance=cp2035:9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[10:25:44] (PS6) Aqu: Fix: Prevent empty normalized host [analytics/refinery/source] - https://gerrit.wikimedia.org/r/772027 (https://phabricator.wikimedia.org/T300029)
[10:27:56] Data-Engineering-Kanban, Airflow: Fix use of Java LinkedHashMap caching in Spark multi-threaded environment - https://phabricator.wikimedia.org/T305386 (Antoine_Quhen) https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/772027/
[10:30:32] (PS7) Aqu: Fix: Prevent empty normalized host [analytics/refinery/source] - https://gerrit.wikimedia.org/r/772027 (https://phabricator.wikimedia.org/T300029)
[10:43:40] (PS6) Aqu: Add archiving job for Airflow [analytics/refinery/source] - https://gerrit.wikimedia.org/r/774383 (https://phabricator.wikimedia.org/T300039)
[11:57:55] hi btullis - I'm interested in trying to understand why we have those alerts - do you know/have ideas|
[11:57:58] ?
[12:25:15] Analytics: PySpark is unable to find hive tables - https://phabricator.wikimedia.org/T305457 (bmansurov)
[12:26:49] Analytics: PySpark is unable to find Hive tables - https://phabricator.wikimedia.org/T305457 (bmansurov)
[12:37:08] joal: Why we're having alerts from the statsv source?
[12:37:49] yes :)
[12:40:01] I think that the reason is just that the statsv throughput is so much lower than webrequest or eventlogging that, during periods of low activity, it sometimes trips the alert.
[12:40:15] This is the metric that I am using: `sum by (source,instance,cluster) (rate(rdkafka_producer_topic_partition_msgs{ partition != "-1" }[5m])) * 60`
[12:40:23] https://grafana-rw.wikimedia.org/explore?orgId=1&left=%5B%22now-1h%22,%22now%22,%22thanos%22,%7B%22refId%22:%22A%22,%22instant%22:true,%22range%22:true,%22exemplar%22:false,%22expr%22:%22sum%20by%20(source,instance,cluster)%20(rate(rdkafka_producer_topic_partition_msgs%7B%20partition%20!%3D%20%5C%22-1%5C%22%20%7D%5B5m%5D))%20*%2060%22%7D%5D
[12:41:56] It's intended to be "rate of messages being sent is 0 per second for 1 minute" - and the comparison is 'equals zero'
[12:43:09] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/alerts/+/refs/heads/master/team-data-engineering/varnishkafka.yaml#5
[12:43:55] makes sense btullis
[12:45:21] I don't yet know much about the statsv source, although I do understand statsd and graphite etc. I think that statsv is generated by this extension: https://github.com/wikimedia/mediawiki-extensions-WikimediaEvents/blob/3cd62f95f49d23a475770a98e48e16c65c8fa2b2/modules/all/ext.wikimediaEvents.statsd.js
[12:46:23] ...but it may be only lightly used. The number of events generated using this stream, divided by the number of edge cache servers, means that it's a very low throughput.
[12:46:51] Data-Engineering, Infrastructure-Foundations, Product-Analytics, Research, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (ayounsi) >>! In T300977#7821702, @Isaac wrote: > Chiming in as a heavy user of the stat boxes. It's difficult for me...
[12:46:57] I therefore thought it best to disable this (https://gerrit.wikimedia.org/r/c/operations/alerts/+/776912) while I investigate more. Maybe set up a different alert for statsv if necessary.
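A rough illustration of why a low-volume source such as statsv trips an 'equals zero' check: the expression quoted above takes the per-second rate of the `rdkafka_producer_topic_partition_msgs` counter over a 5-minute window and multiplies it by 60, giving messages per minute for each (source, instance, cluster). The Python below is a minimal sketch that mimics that calculation on made-up scrape samples. It rests on stated assumptions: it does not reproduce the extrapolation Prometheus performs in `rate()`, it does not model the alert's `for:` duration, and the `excluded` parameter merely imitates a hypothetical `source != "statsv"` label matcher. The real rule in `team-data-engineering/varnishkafka.yaml` and the change that excluded statsv are not reproduced here.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# (unix timestamp in seconds, cumulative message counter value)
Sample = Tuple[float, float]


def per_minute_rate(samples: List[Sample], now: float,
                    window: float = 300.0) -> Optional[float]:
    """Simplified stand-in for PromQL `rate(counter[5m]) * 60`.

    Assumption: unlike Prometheus, no extrapolation to the window edges is
    done; the observed increase is divided by the time between the first
    and last sample inside the window.
    """
    in_window = [(t, v) for t, v in samples if now - window < t <= now]
    if len(in_window) < 2:
        return None  # Prometheus would return no series at all here
    increase = 0.0
    for (_, prev), (_, cur) in zip(in_window, in_window[1:]):
        # A drop means the counter was reset (e.g. a producer restart),
        # so the new value is the increase since the reset.
        increase += cur - prev if cur >= prev else cur
    elapsed = in_window[-1][0] - in_window[0][0]
    return increase / elapsed * 60.0


def no_messages(samples: List[Sample], now: float) -> bool:
    """True when the 'equals zero' comparison would match."""
    rate = per_minute_rate(samples, now)
    return rate is not None and rate == 0.0


def firing_sources(series: Dict[str, List[Sample]], now: float,
                   excluded: FrozenSet[str] = frozenset({"statsv"})) -> List[str]:
    """Sources that would alert; `excluded` mimics a hypothetical
    `source != "statsv"` label matcher in the rule expression."""
    return sorted(src for src, samples in series.items()
                  if src not in excluded and no_messages(samples, now))


# Hypothetical data: one scrape every 30 s over the last 5 minutes.
scrapes = range(0, 301, 30)
busy = [(float(t), 1000.0 + 50.0 * t) for t in scrapes]  # webrequest-like
quiet = [(float(t), 42.0) for t in scrapes]               # idle statsv producer

print(per_minute_rate(busy, now=300.0))  # 3000.0 messages per minute
print(firing_sources({"webrequest": busy, "statsv": quiet}, now=300.0))  # []
print(firing_sources({"webrequest": busy, "statsv": quiet}, now=300.0,
                     excluded=frozenset()))  # ['statsv']
```

When a source's messages are spread thinly across many cache hosts, an individual host can see a whole 5-minute window with no increase, which yields the `0.0` result that matches the comparison, even though nothing is broken.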
[12:48:05] ack - thanks for the explanation btullis :)
[13:13:25] (PS8) Aqu: Fix: Prevent empty normalized host [analytics/refinery/source] - https://gerrit.wikimedia.org/r/772027 (https://phabricator.wikimedia.org/T300029)
[13:20:03] Data-Engineering, Infrastructure-Foundations, Product-Analytics, Research, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (ayounsi) @BTullis > I realize that this suggestion increases the scope if the task considerably yup :) We unfortunate...
[14:18:18] Data-Engineering: Deploy the GDI Equity Landscape Dashboard - https://phabricator.wikimedia.org/T305468 (EChetty)
[14:26:52] ottomata: https://wikitech.wikimedia.org/wiki/File:Possible_Shared_Data_Platform.jpg "WikiData" is like writing "WikiPedia". Every time you do it, a kitten dies
[14:29:49] hahaha
[14:35:50] FYI, I'm upgrading libapache-mod-auth-cas on the hosts running Yarn, Hue, Superset and Piwik; there will be brief glitches during the Apache restarts involved
[14:36:04] Data-Engineering: Deploy the GDI Equity Landscape Dashboard - https://phabricator.wikimedia.org/T305468 (EChetty)
[14:36:29] moritzm: ack - Thanks for the heads-up.
[14:42:26] these are all done now
[14:43:58] Data-Engineering: Milestone 1: Input Data Models Complete. - https://phabricator.wikimedia.org/T305473 (EChetty)
[14:46:39] Data-Engineering: Milestone: Input Data Models Complete. - https://phabricator.wikimedia.org/T305473 (EChetty)
[14:48:44] Data-Engineering: Milestone: Transformation Definitions Complete: - https://phabricator.wikimedia.org/T305474 (EChetty)
[14:50:53] (CR) Tchanders: [C: +2] Add new event action [schemas/event/secondary] - https://gerrit.wikimedia.org/r/774980 (https://phabricator.wikimedia.org/T296428) (owner: AGueyte)
[14:51:23] Data-Engineering: Milestone: Ingest and Transform Input Data - https://phabricator.wikimedia.org/T305475 (EChetty)
[14:52:17] (Merged) jenkins-bot: Add new event action [schemas/event/secondary] - https://gerrit.wikimedia.org/r/774980 (https://phabricator.wikimedia.org/T296428) (owner: AGueyte)
[14:54:40] Data-Engineering: Milestone: Dashboard Mockup Complete - https://phabricator.wikimedia.org/T305476 (EChetty)
[14:56:14] Data-Engineering: Milestone: Dashboard Interaction Map Complete - https://phabricator.wikimedia.org/T305477 (EChetty)
[14:57:05] (CR) Tchanders: Add new event action (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/774980 (https://phabricator.wikimedia.org/T296428) (owner: AGueyte)
[14:57:57] Data-Engineering: Milestone: Data Visualization Table Views define - https://phabricator.wikimedia.org/T305478 (EChetty)
[14:58:07] Hi btullis, I'm about to start upgrading the dbstore hosts, want to hang out?
[15:01:55] !log set dbstore1003.eqiad.wmnet to downtime for upgrade
[15:01:58] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:02:03] Yep, be there in a sec.
[15:02:05] !log set dbstore1003.eqiad.wmnet to downtime for upgrade T299481
[15:02:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:02:10] T299481: Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481
[15:03:27] razzi: bc ?
[15:05:11] Data-Engineering: Milestone: Dashboard Template Complete - https://phabricator.wikimedia.org/T305479 (EChetty)
[15:08:11] Data-Engineering: Milestone: Create and Publish Data Visualisation Views: - https://phabricator.wikimedia.org/T305480 (EChetty)
[15:08:26] Data-Engineering: Milestone: Create and Publish Data Visualisation Views: - https://phabricator.wikimedia.org/T305480 (EChetty)
[15:08:40] joal: I have to run a query on a month of webrequest. It's only looking at 5 columns. I was thinking just a hive query would be the most gentle on resources, right?
[15:10:11] with SET mapred.job.queue.name=nice;
[15:10:11] !log razzi@cumin1001:~$ sudo cookbook sre.hosts.reimage --os bullseye -t T299481 dbstore1003
[15:10:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:10:14] T299481: Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481
[15:10:38] Analytics, Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation): Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by razzi@cumin1001 for host dbstore1003.eqiad.wmnet with O...
[15:11:45] Data-Engineering: Milestone: Publish the Dashboard! - https://phabricator.wikimedia.org/T305481 (EChetty)
[15:12:12] (VarnishkafkaNoMessages) firing: (2) varnishkafka for instance cp1087:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[15:13:31] Data-Engineering: Milestone: Data Visualization Table Views defined - https://phabricator.wikimedia.org/T305478 (EChetty)
[15:14:45] Data-Engineering: Milestone: Input Data Models Complete. - https://phabricator.wikimedia.org/T305473 (EChetty)
[15:17:12] (VarnishkafkaNoMessages) firing: (2) varnishkafka for instance cp1087:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[15:32:12] (VarnishkafkaNoMessages) resolved: (2) varnishkafka for instance cp1087:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[15:39:16] Analytics, Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation): Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by razzi@cumin1001 for host dbstore1003.eqiad.wmnet with OS bu...
[15:39:35] Data-Engineering, Data-Engineering-Kanban, Airflow, Epic: [Airflow] User manual and documentation - https://phabricator.wikimedia.org/T295199 (mforns)
[15:40:44] hi a-team, we need to store some permanent data (section alignments) in HDFS. Right now we have it in Muniza's (research contractor) HDFS folder, but we would like to put it in some permanent folder. Where would be the best place to copy it?
[15:42:26] Analytics, Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation): Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481 (razzi) Looks like reimage went fine; the warning about icinga status is that the replication has not caught up, but I see the r...
[15:43:13] dsaez: Is this one-off data, or is it updated recurrently?
[15:45:10] mforns: They should probably be recomputed every year, but we don't have the scheduling system ready yet
[15:46:48] dsaez: and do they contain identifying information or data indicative of sensible user patterns?
[15:47:04] sorry, I don't know about section alignments
[15:47:30] *sensitive
[15:47:36] :D no, no sensitive data
[15:47:39] (not sensible :P)
[15:48:27] dsaez: and do you want to have a Hive table on top of them?
[15:50:31] if it's not problematic, it would be good to have the hive table
[15:52:35] Analytics, Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation): Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by razzi@cumin1001 for host dbstore1005.eqiad.wmnet with O...
[15:54:06] !log razzi@cumin1001:~$ sudo cookbook sre.hosts.reimage --os bullseye -t T299481 dbstore1005
[15:54:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:54:09] T299481: Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481
[15:59:51] (PS4) Bearloga: movement_metrics: Migration and cleanup [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/766196 (https://phabricator.wikimedia.org/T295332) (owner: Mayakpwiki)
[16:04:56] joal: ottomata standup !!
[16:05:02] milimetric: too
[16:13:10] razzi: skipping standup today to focus on airflow sprint stuff, will be there tomorrow
[16:17:47] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (Cmjohnson) Moved all 3 servers to xe-0/0/28 on their respective switches, and committed the change on homer.
[16:19:35] Analytics, Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation): Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by razzi@cumin1001 for host dbstore1005.eqiad.wmnet with OS bu...
[16:28:54] Analytics: PySpark is unable to find Hive tables - https://phabricator.wikimedia.org/T305457 (JAllemandou) Hi @bmansurov - The new tag for the team is "Data Engineering" - I updated the task accordingly. I think your problem comes from you using a non-standard python env for the cluster that wouldn't contain...
[16:30:22] Data-Engineering: PySpark is unable to find Hive tables - https://phabricator.wikimedia.org/T305457 (JAllemandou)
[16:32:26] Data-Engineering, Data-Engineering-Kanban, Product-Analytics (Kanban): Change ownership of wmf_product.new_editors to analytics-product - https://phabricator.wikimedia.org/T305109 (BTullis) a:BTullis
[16:33:09] Analytics, Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation): Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=27c2b587-9114-435a-8894-b5c96a8ee85b) set by razzi@cumin1001 f...
[16:33:11] Analytics, Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation): Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481 (razzi)
[16:33:39] (PS5) Bearloga: movement_metrics: Migration and cleanup [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/766196 (https://phabricator.wikimedia.org/T295332) (owner: Mayakpwiki)
[16:34:03] Data-Engineering, MediaWiki-extensions-EventLogging: Deprecate/delete the mw.eventLog.Schema class - https://phabricator.wikimedia.org/T305491 (phuedx)
[16:34:13] Data-Engineering, MediaWiki-extensions-EventLogging: Deprecate/delete the mw.eventLog.Schema class - https://phabricator.wikimedia.org/T305491 (phuedx)
[16:37:02] Analytics, Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation): Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by razzi@cumin1001 for host dbstore1007.eqiad.wmnet with O...
[16:37:55] (PS6) Bearloga: movement_metrics: Migration and cleanup [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/766196 (https://phabricator.wikimedia.org/T295332) (owner: Mayakpwiki)
[16:38:01] (CR) Snwachukwu: "Hi Joseph," [analytics/refinery/source] - https://gerrit.wikimedia.org/r/775376 (https://phabricator.wikimedia.org/T304623) (owner: Snwachukwu)
[16:39:00] (PS7) Bearloga: movement_metrics: Migration and cleanup [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/766196 (https://phabricator.wikimedia.org/T295332) (owner: Mayakpwiki)
[16:40:03] (CR) Bearloga: [V: +2 C: +2] "Verification notes in T295332#7832646" [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/766196 (https://phabricator.wikimedia.org/T295332) (owner: Mayakpwiki)
[16:42:15] Data-Engineering, Data-Engineering-Kanban, Product-Analytics (Kanban): Change ownership of wmf_product.new_editors to analytics-product - https://phabricator.wikimedia.org/T305109 (Mayakp.wiki)
[16:54:14] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host an-worker1146.eqiad.wmnet with OS buster
[16:56:48] aqu: I managed to solve the test issue I showed you, and now the dag tests are passing!
[16:56:54] were they failing for you?
[16:57:45] Yes, they were. At least aqs/hourly, I remember.
[16:58:24] aqu: with the latest code?
[16:59:45] I think so: 82bb2ef
[17:00:54] Still failing for me : FAILED tests/analytics/aqs/hourly_dag_test.py::test_aqs_hourly_loaded - assert {'/Users/aqu/Documents/airflow-dags/analytics_test/dags/aqs/hourly_dag.py': 'Traceback '\n ...
[17:05:42] Analytics, Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation): Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by razzi@cumin1001 for host dbstore1007.eqiad.wmnet with OS bu...
[17:06:40] Analytics, Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation): Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481 (razzi)
[17:12:32] (CR) Joal: "Thank you for your answers Sandra :) I added 2 comments to lines #14 and #46 :)" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/775376 (https://phabricator.wikimedia.org/T304623) (owner: Snwachukwu)
[17:20:27] Data-Engineering, Data-Engineering-Kanban, Product-Analytics: Change ownership of wmf_product.new_editors to analytics-product - https://phabricator.wikimedia.org/T305109 (mpopov)
[17:20:49] Data-Engineering, Data-Engineering-Kanban, Product-Analytics: Change ownership of wmf_product.new_editors to analytics-product - https://phabricator.wikimedia.org/T305109 (mpopov) p:Triage→High
[17:23:39] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host an-worker1146.eqiad.wmnet with OS buster execut...
[17:25:34] Data-Engineering, Data-Engineering-Kanban, Product-Analytics: Change ownership of wmf_product.new_editors to analytics-product - https://phabricator.wikimedia.org/T305109 (BTullis) I have now changed the ownership as requested. ====Before ==== ` btullis@an-launcher1002:~$ sudo -u hdfs kerberos-run-co...
[17:42:23] Analytics, Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation): Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481 (razzi) All the reimages are done. Thanks for your input @Marostegui and @Ladsgroup . You were right @Marostegui that the start...
[17:58:56] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (cmooney) ^^ Above reimage seemed to fail due to some disk problem, I suspect maybe the raid config needs to be done in the BIOS (I was running...
[18:25:23] Data-Engineering, Data-Engineering-Kanban, Product-Analytics: Change ownership of wmf_product.new_editors to analytics-product - https://phabricator.wikimedia.org/T305109 (Mayakp.wiki) thanks @BTullis ! I will let you know if something fails.
[19:05:50] Analytics, Analytics-Wikistats, Data-Engineering: Add UI in mobile view to switch to table view - https://phabricator.wikimedia.org/T191019 (Umherirrender)
[19:46:55] Data-Engineering, Platform Engineering Roadmap: Audit/review pageviews test cases - https://phabricator.wikimedia.org/T305502 (Eevans)
[19:47:05] Data-Engineering, Platform Engineering Roadmap: Audit/review pageviews test cases - https://phabricator.wikimedia.org/T305502 (Eevans) p:Triage→Medium
[20:05:17] Analytics-Kanban, Data-Engineering, Product-Analytics, SRE, wmfdata-python: wmfdata.mariadb relies on analytics-mysql being available - https://phabricator.wikimedia.org/T292479 (JArguello-WMF)
[20:06:46] Data-Engineering, Platform Engineering Roadmap, User-Eevans: Retroactively fix logging to use a RequestScopedLogger where applicable - https://phabricator.wikimedia.org/T305504 (Eevans)
[20:07:49] Data-Engineering, Platform Engineering Roadmap, User-Eevans: Retroactively fix logging to use a RequestScopedLogger where applicable - https://phabricator.wikimedia.org/T305504 (Eevans) p:Triage→Medium a:FGoodwin
[20:07:57] Data-Engineering, Platform Engineering Roadmap: Retroactively fix logging to use a RequestScopedLogger where applicable - https://phabricator.wikimedia.org/T305504 (Eevans)
[20:18:54] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (Cmjohnson) @cmooney Can you confirm the raid setup please. analytics-flex is first 2 ssds are raid 1 and the rest jbod?
[20:41:03] !log deploying refinery for https://gerrit.wikimedia.org/r/c/analytics/refinery/+/776269/
[20:41:05] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:52:34] Data-Engineering, Platform Engineering Roadmap: Retroactively fix uses of fmt.Sprintf where no formatting occurs - https://phabricator.wikimedia.org/T305510 (Eevans)
[20:52:44] Data-Engineering, Platform Engineering Roadmap: Retroactively fix uses of fmt.Sprintf where no formatting occurs - https://phabricator.wikimedia.org/T305510 (Eevans) p:Triage→Medium
[20:55:17] Data-Engineering, Code-Health-Objective, Epic, Platform Engineering Roadmap, and 2 others: Problem details for HTTP APIs (rfc7807) - https://phabricator.wikimedia.org/T302536 (Eevans) p:Triage→Medium
[20:58:56] Data-Engineering, Code-Health-Objective, Epic, Platform Engineering Roadmap, and 2 others: Problem details for HTTP APIs (rfc7807) - https://phabricator.wikimedia.org/T302536 (Eevans) >>! In T302536#7818374, @Milimetric wrote: > [ ... ] > I thought a bit about how this might affect users that are...
[21:36:51] Data-Engineering, Code-Health-Objective, Epic, Platform Engineering Roadmap, and 2 others: Problem details for HTTP APIs (rfc7807) - https://phabricator.wikimedia.org/T302536 (Eevans) I'm going to boldly attempt to summarize the discussion, and see if we've reached a point where we can move forwa...
[21:45:09] Data-Engineering, SRE, Traffic, Trust-and-Safety, serviceops: Disable GeoIP Legacy Download - https://phabricator.wikimedia.org/T303464 (Dzahn)
[21:55:01] Data-Engineering, SRE, Traffic, Trust-and-Safety, serviceops: Disable GeoIP Legacy Download - https://phabricator.wikimedia.org/T303464 (Dzahn) > Modify the puppet code to no longer download the databases from MaxMind and then propagate to other servers/destinations. This is done. puppet c...
[21:57:33] Data-Engineering, SRE, Traffic, Trust-and-Safety, serviceops: Disable GeoIP Legacy Download / Identify all users of legacy (v1) GeoIP datasets and inform them of the need to switch to GeoIP2 dataset - https://phabricator.wikimedia.org/T303464 (Dzahn) a:Dzahn→None
[22:02:54] Analytics, SRE, Traffic, Patch-For-Review: Remove ganglia leftovers from ops/puppet - https://phabricator.wikimedia.org/T253555 (Dzahn)
[22:06:39] Data-Engineering, Data-Engineering-Kanban, Superset, Patch-For-Review: Upgrade Superset to 1.4.2 - https://phabricator.wikimedia.org/T304972 (razzi) I thought I'd update the staging database to be the same as production before sharing superset staging widely, and I'm glad I did, because it looks...
[22:20:52] Data-Engineering, Data-Engineering-Kanban, Superset, Patch-For-Review: Upgrade Superset to 1.4.2 - https://phabricator.wikimedia.org/T304972 (razzi) Ok, looks like the following will resolve it: ` razzi@an-coord1001:~$ sudo mysql superset_staging MariaDB [superset_staging]> update dbs set encryp...
[22:27:38] Data-Engineering, Data-Engineering-Kanban, Product-Analytics, Superset, Patch-For-Review: Upgrade Superset to 1.4.2 - https://phabricator.wikimedia.org/T304972 (razzi) Hi Product Analytics, superset 1.4.2 is ready to be tested on staging. Once we confirm there are no showstopping bugs we'll r...
[23:37:51] (PS3) Snwachukwu: [WIP] Create a Hive to Graphite job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/775376 (https://phabricator.wikimedia.org/T304623)
[23:43:36] (CR) jerkins-bot: [V: -1] [WIP] Create a Hive to Graphite job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/775376 (https://phabricator.wikimedia.org/T304623) (owner: Snwachukwu)