[00:05:14] Data-Engineering, CheckUser, MW-1.38-notes (1.38.0-wmf.26; 2022-03-14), MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), and 3 others: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004 (Dreamy_Jazz)
[00:20:18] Data-Engineering, Metrics-Platform-Planning, Platform Engineering, User-Urbanecm: Access to aggregate User Agent statistics - https://phabricator.wikimedia.org/T298912 (Dreamy_Jazz) Direct access to the checkuser cu_changes table is unlikely to work. Currently the table that stores the user agent...
[09:38:52] Quarry, Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Unplanned, User-dcaro, cloud-services-team (Kanban): [quarry] worker-04 down - https://phabricator.wikimedia.org/T324402 (dcaro) I'll leave it to you to decide what solution to implement :)
[09:38:58] Quarry, Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Unplanned, User-dcaro, cloud-services-team (Kanban): [quarry] worker-04 down - https://phabricator.wikimedia.org/T324402 (dcaro) In progress→Resolved
[09:39:27] Data-Engineering, serviceops, Event-Platform Value Stream (Sprint 05), Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (JMeybohm) >>! In T324576#8451544, @Ottomata wrote: > Is this possible to do with helm, or will that require manual e.g. `kubectl ed...
[09:40:30] Data-Engineering-Planning, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Update varnishkafka client certificate for authenticating to kafka-jumbo - https://phabricator.wikimedia.org/T323771 (Stevemunene) disabling puppet temporarily on cp hosts stevemunene@cumin1001:~$ sudo cumin A:cp "disabl...
[09:51:26] Data-Engineering, Event-Platform Value Stream: Flink wrappers and helper libraries should be moved into a dedicated git repo with packaging and CI. - https://phabricator.wikimedia.org/T324746 (gmodena)
[09:52:05] Data-Engineering, Event-Platform Value Stream: [EPIC] Streaming and event driven Python services - https://phabricator.wikimedia.org/T324689 (gmodena)
[09:52:23] Data-Engineering, Event-Platform Value Stream (Sprint 06): Flink wrappers and helper libraries should be moved into a dedicated git repo with packaging and CI. - https://phabricator.wikimedia.org/T324746 (gmodena)
[09:52:35] Data-Engineering, Event-Platform Value Stream (Sprint 05): Flink wrappers and helper libraries should be moved into a dedicated git repo with packaging and CI. - https://phabricator.wikimedia.org/T324746 (gmodena)
[09:54:06] Data-Engineering-Planning, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Update varnishkafka client certificate for authenticating to kafka-jumbo - https://phabricator.wikimedia.org/T323771 (Stevemunene) Generate the certificates ` root@puppetmaster1001:~# cergen --generate --force -c 'varnish...
[10:28:54] Data-Engineering-Planning, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Update varnishkafka client certificate for authenticating to kafka-jumbo - https://phabricator.wikimedia.org/T323771 (Stevemunene) Successfully restarted services ` varnishkafka-eventlogging.service varnishkafka-statsv.s...
[11:07:16] Data-Engineering-Planning, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Update varnishkafka client certificate for authenticating to kafka-jumbo - https://phabricator.wikimedia.org/T323771 (Stevemunene) batch restarting varnishkafka-eventlogging.service to pick new certs. ` stevemunene@cumin1...
[11:14:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2037 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2037%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[11:19:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp2037 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2037%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[11:32:43] Data-Engineering-Planning, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Update varnishkafka client certificate for authenticating to kafka-jumbo - https://phabricator.wikimedia.org/T323771 (Stevemunene) batch restarting varnishkafka-webrequest.service in batches of 3 30 seconds in between `...
[11:48:09] Data-Engineering, Event-Platform Value Stream (Sprint 05): Flink wrappers and helper libraries should be moved into a dedicated git repo with packaging and CI. - https://phabricator.wikimedia.org/T324746 (gmodena) I'm setting up two repos: - https://gitlab.wikimedia.org/repos/data-engineering/eventutilit...
[12:19:48] Checking the webrequest error now
[12:23:41] !log rerun webrequest failed jobs for hour 2022-12-08-T11:00Z
[12:23:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:23:44] T11: enable DarkConsole in phabricator - https://phabricator.wikimedia.org/T11
[12:31:33] joal: many thanks indeed
[12:33:33] joal: anything I can do to help?
[12:38:12] (VarnishkafkaNoMessages) firing: (2) varnishkafka on cp6010 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[12:43:12] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp6010 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[12:46:20] btullis: it seems that oozie fails to send emails with attachments since we changed the emailing system
[12:46:25] btullis: could that be true?
[12:49:40] btullis: In any case, we need to rerun the failed hour with dedicated coordinators as explained here: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Dealing_with_data_loss_alarms ("Rerunning a failed webrequest job" section)
[12:57:02] joal: I don't think that the mailman list is supposed to strip any attachments, but I can look into it.
[12:58:06] joal: But I'm happy to rerun the failed hour.
[13:00:42] btullis: would you be the right person to help pfischer get a kerberos principal?
[13:00:54] for context: T323822 and maybe https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos#Create_a_principal_for_a_real_user
[13:00:54] T323822: Kerberos Principal for pfischer - https://phabricator.wikimedia.org/T323822
[13:01:43] Analytics: Kerberos Principal for pfischer - https://phabricator.wikimedia.org/T323822 (Gehel) In case this is needed, as @pfischer's manager, I approve!
[13:02:01] gehel, yes. I'm on Ops Week with steven_munene this week, so this is something that I should have picked up already.
[13:03:36] btullis: thanks! Let me know if you need any other info to get this moving
[13:03:53] Ah right, it only had the 'Analytics' tag, which we no longer use, so it didn't get spotted. If there are instructions saying to add the Analytics tag, we should update them to say 'Data Engineering'
[13:04:43] Data-Engineering: Kerberos Principal for pfischer - https://phabricator.wikimedia.org/T323822 (BTullis) a:BTullis Adding to #data-engineering and claiming.
[13:05:40] I'm not sure where Peter got the instructions. I'll ask him to jump in here and check with you
[13:07:07] Data-Engineering: Kerberos Principal for pfischer - https://phabricator.wikimedia.org/T323822 (BTullis) Principal checked and created. ` btullis@krb1001:~$ sudo manage_principals.py get pfischer get_principal: Principal does not exist while retrieving "pfischer@WIKIMEDIA". btullis@krb1001:~$ sudo manage_prin...
[13:08:28] Data-Engineering: Kerberos Principal for pfischer - https://phabricator.wikimedia.org/T323822 (BTullis) @pfischer - Could you check your email and follow the instructions there please? Apologies for the delay in responding to this request.
[13:09:45] gehel: pfischer: I've created that kerberos principal now. Please do let us know if it doesn't work as expected.
[13:09:57] btullis: thanks a lot!
[13:10:10] A pleasure.
[13:10:32] Hi btullis - wanna chat a minute about those webrequest errors?
[13:11:04] Yes please, I'm a bit lost to be honest.
[13:12:16] let's batcave btullis
[13:28:53] Data-Engineering: Kerberos Principal for pfischer - https://phabricator.wikimedia.org/T323822 (pfischer) @BTullis, thank you! I followed the instructions and now I'm able to run kinit with my self-defined password. You may close this request.
[13:47:50] Data-Engineering: Kerberos Principal for pfischer - https://phabricator.wikimedia.org/T323822 (BTullis) Open→Resolved Great, thanks for confirming @pfischer.
[13:55:33] !log rerun webrequest failed jobs for hour 2022-12-08-T11:00Z with updated workflow (no dataloss checks)
[13:55:35] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:55:36] T11: enable DarkConsole in phabricator - https://phabricator.wikimedia.org/T11
[13:57:18] Data-Engineering, serviceops, Event-Platform Value Stream (Sprint 05), Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (Ottomata) > But editing via a values change deployed by helmfile.d would be fine. Actually, this works just fine, we don't have to...
[14:20:42] Data-Engineering, Data Pipelines: Fix oozie webrequest-load error-check corner case - https://phabricator.wikimedia.org/T324757 (JAllemandou)
[15:34:03] Data-Engineering, serviceops, Event-Platform Value Stream (Sprint 05), Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (Ottomata) > I will test and see what happens to a running Flink app when I take the operator offline... # Installed flink-kubernet...
[15:47:43] Data-Engineering-Planning, API Platform (Sprint 02), AQS2.0, Platform Engineering Roadmap, User-Eevans: Obtain security review of uniqueDevices - https://phabricator.wikimedia.org/T320976 (JArguello-WMF) T324710 has been created to request the security review
[16:51:57] Data-Engineering, Equity-Landscape: Add country_meta_data - https://phabricator.wikimedia.org/T324681 (JAnstee_WMF) @ntsako "country_meta_data to HIVE" sheet is now updated for upload of the reduced column version as agreed: https://docs.google.com/spreadsheets/d/1kGL-s7EACBjD_z0YjlBm24Q0macxtJCoXmnbnAEP...
[17:01:24] Data-Engineering, Data Pipelines: Update Automated Traffic Detection Documentation - https://phabricator.wikimedia.org/T324777 (odimitrijevic)
[17:08:44] Data-Engineering, Data Pipelines (Sprint 05-06): Update Automated Traffic Detection Documentation - https://phabricator.wikimedia.org/T324777 (JArguello-WMF)
[17:12:22] Data-Engineering, Data Pipelines: Update Automated Traffic Detection Documentation - https://phabricator.wikimedia.org/T324777 (JArguello-WMF)
[17:23:25] Data-Engineering, Data Pipelines: Update Automated Traffic Detection Documentation - https://phabricator.wikimedia.org/T324777 (JArguello-WMF) @odimitrijevic What is the due date of this task?
[18:09:41] Data-Engineering, Data Pipelines: When moving oozie webrequest-load to airflow/spark avoid the error-check corner case - https://phabricator.wikimedia.org/T324757 (JAllemandou)
[18:24:53] Data-Engineering, API Platform (Sprint 02), AQS2.0, Platform Engineering Roadmap, User-Eevans: AQS 2.0: Pageviews: Implement Unit Tests - https://phabricator.wikimedia.org/T299735 (BPirkle)
[18:59:32] Data-Engineering, serviceops, Event-Platform Value Stream (Sprint 05), Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (Ottomata) Oh, re webhook again: https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/try-flink-kubernetes...
[21:40:13] Data-Engineering: Document how to show your work in phabricator and/or elsewhere - https://phabricator.wikimedia.org/T324796 (Ottomata)
[21:40:27] Data-Engineering: Document how to show your work in phabricator and/or elsewhere - https://phabricator.wikimedia.org/T324796 (Ottomata) p:Triage→High
[21:43:58] (CR) Ottomata: [C: +2] Add ios talk page interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/857759 (https://phabricator.wikimedia.org/T321841) (owner: Mazevedo)
[21:44:36] (Merged) jenkins-bot: Add ios talk page interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/857759 (https://phabricator.wikimedia.org/T321841) (owner: Mazevedo)
[22:10:16] Data-Engineering, serviceops, Event-Platform Value Stream (Sprint 05), Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (Ottomata) Got a WIP first draft of a flink-app helm chart [[ https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/866510...