[01:55:16] (EventgateLoggingExternalLatency) firing: Elevated latency for POST events on eventgate-logging-external in eqiad. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLoggingExternalLatency
[02:00:16] (EventgateLoggingExternalLatency) resolved: Elevated latency for POST events on eventgate-logging-external in eqiad. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLoggingExternalLatency
[05:02:51] Analytics, Data-Engineering, Data-Persistence (Consultation): Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481 (Marostegui) >>! In T299481#7816378, @razzi wrote: > - start mysql service > > ` > systemctl start 'mariadb@s*' > ` I don't think this will work, you'll...
[06:39:03] PROBLEM - Check unit status of mediawiki-history-drop-snapshot on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit mediawiki-history-drop-snapshot https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:45:30] (CR) STran: [C: +1] "this lgtm. I don't think we have to increment the version or anything, do we?
Since afaik we haven't collected anything yet and additional" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/774980 (https://phabricator.wikimedia.org/T296428) (owner: AGueyte)
[07:29:05] Data-Engineering-Kanban, Airflow: Medium Risk Oozie Migration: mediarequest - https://phabricator.wikimedia.org/T302876 (Antoine_Quhen) https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/41
[07:47:54] (PS1) Aqu: Migrate mediarequest hourly from Oozie to Airflow [analytics/refinery] - https://gerrit.wikimedia.org/r/775255 (https://phabricator.wikimedia.org/T302876)
[09:27:24] Data-Engineering, Code-Health-Objective, Epic, Platform Engineering Roadmap, and 2 others: [DISCUSS]: Problem details for HTTP APIs (rfc7807) - https://phabricator.wikimedia.org/T302536 (BTullis) This is very interesting, thanks for reaching out. I can see the discrepancy between what AQS returns...
[10:08:22] Analytics, Data-Engineering-Radar, Event-Platform, Metrics-Platform, Browser-Support-Microsoft-Edge: Problem with delay caused by intake-analytics.wikimedia.org - https://phabricator.wikimedia.org/T295427 (phuedx) As @AlexisJazz has said, analytics events are sent using [[ https://developer.m...
[10:12:12] (VarnishkafkaNoMessages) firing: varnishkafka for instance cp2037:9132 is not logging cache_text requests from statsv - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-source=statsv&var-cp_cluster=cache_text&var-instance=cp2037:9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[10:12:56] looking --^
[10:17:12] (VarnishkafkaNoMessages) resolved: varnishkafka for instance cp2037:9132 is not logging cache_text requests from statsv - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-source=statsv&var-cp_cluster=cache_text&var-instance=cp2037:9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[11:51:27] OK, so this alert was genuine, but it came from the statsv source, which has very low throughput compared with the webrequest or eventlogging source. Maybe we should be excluding this, or setting a different threshold.
[11:52:58] a-team: I'd be grateful if someone could double-check the IP addresses for me in these two files please: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/764375/46/helmfile.d/services/datahub/values.yaml and https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/764375/46/helmfile.d/services/datahub/values-staging.yaml
[11:54:40] I've added some grep and dig commands that I've used to extract them and reverse-DNS them as a comment on the thread, but it would really help if somebody could sanity check them too please.
[11:55:02] https://usercontent.irccloud-cdn.com/file/boTBKKYU/image.png
[11:55:34] It's all about getting DataHub deployed to Kubernetes today, with a bit of luck.
[11:58:11] Broadly speaking these egress rules should match the diagram here: https://phabricator.wikimedia.org/T303049
[12:08:33] Analytics, Analytics-Wikistats, Data-Engineering, Data-Engineering-Kanban: Confusing filtering on "Active editors by country" topic - https://phabricator.wikimedia.org/T300365 (Piotrus) @Milimetric Thanks for the replies. I am not sure what privacy concerns exist here (I did read the linked page...
[12:16:50] btullis: o/
[12:18:07] you want me to do the same ip verification script you did just to double check? or verify in some other way?
[12:22:12] You could run the scripts again, but it's more of a sanity check from a second pair of eyes that's needed. Janis has helped me to get the helm charts to a good state, but hasn't checked that the network policies actually make sense. I've been staring at them for too long to make sense of them 🙂
[12:25:54] oh okay, so not really checking IPs, but checking that logically you've got what you intended?
[12:34:58] Yes please.
[13:08:16] Data-Engineering, Project-Admins: Archive Analytics tag - https://phabricator.wikimedia.org/T298671 (Aklapper) a:odimitrijevic
[13:34:06] ottomata: Many thanks for checking and for those comments. Looking fairly hopeful for a +1 from serviceops today.
[13:48:27] Data-Engineering, Code-Health-Objective, Epic, Platform Engineering Roadmap, and 2 others: [DISCUSS]: Problem details for HTTP APIs (rfc7807) - https://phabricator.wikimedia.org/T302536 (Milimetric) I agree with making the spec match the implementation. I also agree with making the spec complian...
[13:56:19] Data-Engineering, Data-Services, User-Ladsgroup: Make linktarget table visible on cloud wiki replicas - https://phabricator.wikimedia.org/T305064 (Majavah)
[14:01:40] Data-Engineering, Product-Analytics, wmfdata-python: Update anaconda-wmf's wmfdata-python to 1.3.3 - https://phabricator.wikimedia.org/T305067 (mpopov)
[14:32:37] Data-Engineering, Data-Services, User-Ladsgroup: Make linktarget table visible on cloud wiki replicas - https://phabricator.wikimedia.org/T305064 (Lucas_Werkmeister_WMDE) As far as I can tell from the code so far, `linktarget` rows don’t get deleted even when they’re no longer used – but the replicas...
[14:35:36] Data-Engineering, Data-Services, User-Ladsgroup: Make linktarget table visible on cloud wiki replicas - https://phabricator.wikimedia.org/T305064 (Majavah) >>! In T305064#7818583, @Lucas_Werkmeister_WMDE wrote: > As far as I can tell from the code so far, `linktarget` rows don’t get deleted even when...
[14:50:34] Data-Engineering, Data-Services, User-Ladsgroup: Make linktarget table visible on cloud wiki replicas - https://phabricator.wikimedia.org/T305064 (Ladsgroup) Yes. It should have a view similar to actor.
[15:00:16] Analytics, Analytics-Wikistats, Data-Engineering, Data-Engineering-Kanban: Confusing filtering on "Active editors by country" topic - https://phabricator.wikimedia.org/T300365 (Milimetric) >>! In T300365#7817962, @Piotrus wrote: > @Milimetric Thanks for the replies. I am not sure what privacy con...
[15:27:15] Analytics, Analytics-Wikistats, Data-Engineering, Data-Engineering-Kanban: Confusing filtering on "Active editors by country" topic - https://phabricator.wikimedia.org/T300365 (Piotrus) @Milimetric To put things in context, such data is useful for studies of Wikimedia community, which I believe...
[15:37:01] Data-Engineering, Code-Health-Objective, Epic, Platform Engineering Roadmap, and 2 others: [DISCUSS]: Problem details for HTTP APIs (rfc7807) - https://phabricator.wikimedia.org/T302536 (Eevans) >>! In T302536#7817585, @BTullis wrote: > [ ... ] > > 3) You mentioned removing the `type` field in t...
[15:52:38] Data-Engineering, Data-Engineering-Kanban, Beta-Cluster-Infrastructure, Event-Platform: Upgrade event platform related VMs in deployment-prep to Debian bullsye (or buster) - https://phabricator.wikimedia.org/T304433 (Ottomata)
[15:55:25] btullis: awesooome!
[16:01:46] ping standup folks :)
[16:05:13] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban: Un-fork analytics/gobblin - https://phabricator.wikimedia.org/T292396 (JAllemandou) Open→Resolved
[16:05:30] Data-Engineering, Data-Engineering-Kanban, Product-Analytics, Structured-Data-Backlog, and 3 others: Create a Commons equivalent of the wikidata_entity table in the Data Lake - https://phabricator.wikimedia.org/T258834 (JAllemandou) Open→Resolved
[16:05:38] Data-Engineering, Data-Engineering-Kanban, ContentTranslation, Language-analytics, Product-Analytics: Abuse filter analytics dashboard is broken - https://phabricator.wikimedia.org/T302970 (JAllemandou) Open→Resolved
[16:06:00] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Epic, Patch-For-Review: Replace Camus by Gobblin - https://phabricator.wikimedia.org/T271232 (JAllemandou) Open→Resolved
[16:08:18] Analytics, Analytics-Wikistats, Data-Engineering: Feature requests for Active Editors by Country - https://phabricator.wikimedia.org/T304720 (Milimetric)
[16:10:27] Analytics, Analytics-Wikistats, Data-Engineering, Data-Engineering-Kanban: Confusing filtering on "Active editors by country" topic - https://phabricator.wikimedia.org/T300365 (Milimetric) >>!
In T300365#7818847, @Piotrus wrote: > to give you an idea of what those studies look like: https://journ...
[16:14:46] Data-Engineering, Data-Engineering-Kanban, Airflow: Low Risk Oozie Migration: interlanguage - https://phabricator.wikimedia.org/T300025 (mforns)
[16:15:15] Analytics, Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation): Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481 (razzi)
[16:20:00] Data-Engineering, Data-Engineering-Kanban, Beta-Cluster-Infrastructure, Event-Platform: Upgrade event platform related VMs in deployment-prep to Debian bullsye (or buster) - https://phabricator.wikimedia.org/T304433 (Ottomata)
[16:20:54] Data-Engineering, Data-Engineering-Kanban, Beta-Cluster-Infrastructure, Event-Platform: Upgrade event platform related VMs in deployment-prep to Debian bullsye (or buster) - https://phabricator.wikimedia.org/T304433 (Ottomata)
[16:23:57] The Data Infra brainstorm meeting is a bit on the late side for me today, given that the clocks have now changed. Any chance we could make it 30 minutes instead of an hour?
[16:28:33] Sorry, I cannot make SRE sync. Ben roped into a singing rehearsal I wasn't expecting.
[16:29:31] (PS1) Milimetric: Release 2.9.4 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/775348
[16:29:49] (CR) Milimetric: [C: +2] Release 2.9.4 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/775348 (owner: Milimetric)
[16:31:41] (Merged) jenkins-bot: Release 2.9.4 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/775348 (owner: Milimetric)
[17:07:56] Analytics, Analytics-Wikistats, Data-Engineering, Data-Engineering-Kanban: Confusing filtering on "Active editors by country" topic - https://phabricator.wikimedia.org/T300365 (Milimetric) The other small fixes (not on the subtask) have been deployed and are available on the main site now. Pleas...
[17:28:30] (CR) Tchanders: [C: -1] Add new event action (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/774980 (https://phabricator.wikimedia.org/T296428) (owner: AGueyte)
[17:33:11] (PS1) Milimetric: Fix broken footer link [analytics/wikistats2] - https://gerrit.wikimedia.org/r/775359
[17:42:39] (PS2) Milimetric: Fix broken footer link [analytics/wikistats2] - https://gerrit.wikimedia.org/r/775359
[17:47:58] Analytics, Data-Engineering, Event-Platform, Patch-For-Review, Readers-Web-Backlog (Kanbanana-FY-2021-22): WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Jdrewniak) hi @Ottomata, I've clicked around Hue and verified that events are being logged successf...
[18:15:28] Analytics, Data-Engineering, Data-Engineering-Kanban, Event-Platform, and 2 others: WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Ottomata) Awesome! no if you see them that's great. There are some backend finalization steps that I can take from here. I'l...
[18:23:01] (CR) Milimetric: [C: +2] Fix broken footer link [analytics/wikistats2] - https://gerrit.wikimedia.org/r/775359 (owner: Milimetric)
[18:26:11] Analytics, Data-Engineering, Event-Platform, Product-Analytics, MW-1.36-notes (1.36.0-wmf.18; 2020-11-17): Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[18:37:33] Analytics, Data-Engineering, Data-Engineering-Kanban, Event-Platform, and 2 others: WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Ottomata)
[18:37:53] Analytics, Data-Engineering, Data-Engineering-Kanban, Event-Platform, and 2 others: WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Ottomata)
[18:42:58] Analytics, Data-Engineering, Data-Engineering-Kanban, Event-Platform, and 2 others: WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Ottomata)
[18:43:17] Analytics, Data-Engineering, Data-Engineering-Kanban, Event-Platform, and 2 others: WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Ottomata) Just waiting to get my (expired) edit-protect permissions on metawiki back, then I can close this!
[19:08:33] (PS1) Snwachukwu: Create a Hive to Graphite job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/775376 (https://phabricator.wikimedia.org/T304623)
[19:16:37] (CR) Milimetric: "(ignore, not for real, just showing how Gerrit works)" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/775376 (https://phabricator.wikimedia.org/T304623) (owner: Snwachukwu)
[19:18:51] (PS2) Snwachukwu: [WIP] Create a Hive to Graphite job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/775376 (https://phabricator.wikimedia.org/T304623)
[19:25:03] (CR) Snwachukwu: "Thanks Dan."
[analytics/refinery/source] - https://gerrit.wikimedia.org/r/775376 (https://phabricator.wikimedia.org/T304623) (owner: Snwachukwu)
[20:21:28] Data-Engineering, Data-Engineering-Kanban, Beta-Cluster-Infrastructure, Event-Platform: Upgrade event platform related VMs in deployment-prep to Debian bullsye (or buster) - https://phabricator.wikimedia.org/T304433 (Ottomata)
[20:23:56] Data-Engineering, Data-Engineering-Kanban, Beta-Cluster-Infrastructure, Event-Platform: Upgrade event platform related VMs in deployment-prep to Debian bullsye (or buster) - https://phabricator.wikimedia.org/T304433 (Ottomata) The hiera for the kafka jumbo and main clusters in deployment-prep see...
[20:27:06] (CR) Ottomata: ":)" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/775376 (https://phabricator.wikimedia.org/T304623) (owner: Snwachukwu)
[20:27:31] Data-Engineering, Data-Engineering-Kanban, Beta-Cluster-Infrastructure, Event-Platform: Upgrade event platform related VMs in deployment-prep to Debian bullsye (or buster) - https://phabricator.wikimedia.org/T304433 (Ottomata)
[21:37:11] Data-Engineering, Product-Analytics (Kanban): Change ownership of wmf_product.new_editors to analytics-product - https://phabricator.wikimedia.org/T305109 (Mayakp.wiki)