[00:19:59] Data-Engineering, Data-Engineering-Kanban, Data-Services, Patch-For-Review: Move wikireplicas dbproxy haproxy config to etcd - https://phabricator.wikimedia.org/T304478 (razzi) I have opened a patch for this that writes to a config file other than the actual one, so it can be inspected for correc...
[00:54:48] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Set up karapace instance for datahub - https://phabricator.wikimedia.org/T301562 (razzi) Merged my latest patch to remove Type=notify (https://gerrit.wikimedia.org/r/c/operations/puppet/+/773387, I used the other karapace ticket so it didn't po...
[01:29:07] PROBLEM - Check unit status of monitor_refine_event on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_event https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:54:26] Data-Engineering, Data-Engineering-Kanban, Data-Services, Patch-For-Review: Move wikireplicas dbproxy haproxy config to etcd - https://phabricator.wikimedia.org/T304478 (Joe) Hey sorry I didn't take the time to reply earlier, I was a bit busy. We already have a structure in etcd to describe pool...
[09:19:00] Data-Engineering, Data-Engineering-Kanban: Add alert for varnishkafka low/zero messages per second to alertmanager - https://phabricator.wikimedia.org/T300246 (elukey) Declined→Open Hey @Milimetric, can you add more info about what are the concerns for this task? It is a simple alert, we have sim...
[09:19:02] Data-Engineering, Data-Engineering-Kanban: Some varnishkafka instances dropped traffic for a long time due to the wrong version of the package installed - https://phabricator.wikimedia.org/T300164 (elukey)
[09:58:14] I will restart stat1005 and stat1008 shortly, as per my email to analytics-announce.
[10:28:38] Data-Engineering, Data-Engineering-Kanban, SRE: Create conda .deb and docker image - https://phabricator.wikimedia.org/T304450 (MoritzMuehlenhoff) >>! In T304450#7797902, @Ottomata wrote: > @MoritzMuehlenhoff advice? Can I import [[ https://docs.conda.io/projects/conda/en/latest/user-guide/install/r...
[11:14:03] Analytics, Data-Engineering-Radar, MediaWiki-extensions-EventLogging, QuickSurveys, and 2 others: QuickSurveys should show an error when response is blocked - https://phabricator.wikimedia.org/T256463 (awight) I agree with Milimetric's summary. In my opinion, the instrumentation can stay, but wi...
[11:22:21] I'm going to begin a rolling restart of the hadoop datanode and nodemanager processes to pick up a new JVM version. T300626
[11:35:47] Data-Engineering-Radar, SRE, Traffic: Lock-in Varnish and VarnishKafka versions - https://phabricator.wikimedia.org/T304617 (jbond) p:Triage→Medium
[11:41:18] Data-Engineering-Radar, ExternalGuidance, MediaWiki-extensions-WikimediaEvents, Patch-For-Review: Decommission the ExternalGuidance instrument - https://phabricator.wikimedia.org/T303508 (phuedx)
[11:51:06] Data-Engineering-Radar, ExternalGuidance, MediaWiki-extensions-WikimediaEvents, Patch-For-Review: Decommission the ExternalGuidance instrument - https://phabricator.wikimedia.org/T303508 (phuedx) > Request the deletion of any previously allowlisted data if it's necessary @nshahquinn-wmf: Are you...
[13:30:42] (CR) Ottomata: "Hm, yes this will cause no problems! At least for now, the enum values are used for validation only, not for any type integration (e.g. H" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/771700 (https://phabricator.wikimedia.org/T303297) (owner: Clare Ming)
[15:06:57] (CR) Jdlrobson: [C: +2] Updating desktopwebuiactionstracking with viewport buckets [schemas/event/secondary] - https://gerrit.wikimedia.org/r/767439 (https://phabricator.wikimedia.org/T301391) (owner: Jdrewniak)
[15:07:42] (Merged) jenkins-bot: Updating desktopwebuiactionstracking with viewport buckets [schemas/event/secondary] - https://gerrit.wikimedia.org/r/767439 (https://phabricator.wikimedia.org/T301391) (owner: Jdrewniak)
[15:33:51] Data-Engineering, Privacy Engineering: Investigate releasing historical top-pageview-per-country data - https://phabricator.wikimedia.org/T299627 (Htriedman) Hi @JAllemandou, does this pageview data exist in a private table somewhere stripped of the `actor_signature` field? Or is it preaggregated someho...
[15:53:37] Data-Engineering-Radar, ExternalGuidance, MediaWiki-extensions-WikimediaEvents, MW-1.39-notes (1.39.0-wmf.5; 2022-03-28): Decommission the ExternalGuidance instrument - https://phabricator.wikimedia.org/T303508 (nshahquinn-wmf) @phuedx no, you can go ahead and delete that data. Thanks for taking...
[15:55:30] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Add alert for varnishkafka low/zero messages per second to alertmanager - https://phabricator.wikimedia.org/T300246 (BTullis) I'm in agreement with @elukey I think. It seems sensible for us to implement this prometheus based check anyway, g...
[16:26:19] Data-Engineering: Drop sanitized externalguidance data - https://phabricator.wikimedia.org/T304714 (phuedx)
[16:26:27] Data-Engineering: Drop sanitized ExternalGuidance data - https://phabricator.wikimedia.org/T304714 (phuedx)
[16:27:11] Data-Engineering-Radar, ExternalGuidance, MediaWiki-extensions-WikimediaEvents, MW-1.39-notes (1.39.0-wmf.5; 2022-03-28): Decommission the ExternalGuidance instrument - https://phabricator.wikimedia.org/T303508 (phuedx)
[16:27:17] Data-Engineering-Radar, ExternalGuidance, MediaWiki-extensions-WikimediaEvents, MW-1.39-notes (1.39.0-wmf.5; 2022-03-28): Decommission the ExternalGuidance instrument - https://phabricator.wikimedia.org/T303508 (phuedx)
[16:27:19] Data-Engineering: Drop sanitized ExternalGuidance data - https://phabricator.wikimedia.org/T304714 (phuedx)
[16:27:35] Data-Engineering: Drop sanitized ExternalGuidance data - https://phabricator.wikimedia.org/T304714 (phuedx)
[16:27:38] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Event-Platform, and 4 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (phuedx)
[16:27:42] Data-Engineering-Radar, ExternalGuidance, MediaWiki-extensions-WikimediaEvents, MW-1.39-notes (1.39.0-wmf.5; 2022-03-28): Decommission the ExternalGuidance instrument - https://phabricator.wikimedia.org/T303508 (phuedx) Open→Resolved a:phuedx Being **bold**.
[16:34:36] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Add alert for varnishkafka low/zero messages per second to alertmanager - https://phabricator.wikimedia.org/T300246 (BTullis) I asked a question [[https://wm-bot.wmflabs.org/libera_logs/%23wikimedia-traffic/20220325.txt|in #wikimedia-traffi...
[16:44:20] RECOVERY - Check unit status of monitor_refine_event on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_event https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[16:44:25] yay!
[16:45:40] (PS3) Bearloga: movement_metrics: Migration and cleanup [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/766196 (https://phabricator.wikimedia.org/T295332) (owner: Mayakpwiki)
[16:52:55] (CR) Neil P. Quinn-WMF: "New patchset coming shortly!" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/773382 (https://phabricator.wikimedia.org/T287639) (owner: Neil P. Quinn-WMF)
[16:54:29] Analytics, Analytics-Wikistats, Data-Engineering, Data-Engineering-Kanban: Confusing filtering on "Active editors by country" topic - https://phabricator.wikimedia.org/T300365 (Milimetric) >>! In T300365#7661344, @Piotrus wrote: > This request is related to the Wikipedia discussion https://en.wik...
[16:54:36] (PS1) Milimetric: Fix usability bugs on active editors by country [analytics/wikistats2] - https://gerrit.wikimedia.org/r/773816 (https://phabricator.wikimedia.org/T300365)
[16:55:00] Data-Engineering, Data-Engineering-Kanban: Investigate trend of gradual hive server heap exhaustion - https://phabricator.wikimedia.org/T303168 (BTullis) I have copied the heap dumps to my workstation for analysis and removed them from an-coord1001.
[17:04:02] (PS2) Neil P. Quinn-WMF: Create schemas for Wikistories instrumentation [schemas/event/secondary] - https://gerrit.wikimedia.org/r/773382 (https://phabricator.wikimedia.org/T287639)
[17:06:07] (PS1) Milimetric: Fix missing tooltip and another misleading one [analytics/wikistats2] - https://gerrit.wikimedia.org/r/773818 (https://phabricator.wikimedia.org/T303990)
[17:09:50] Analytics, Analytics-Wikistats, Data-Engineering, Data-Engineering-Kanban: Feature requests for Active Editors by Country - https://phabricator.wikimedia.org/T304720 (Milimetric)
[17:10:09] Analytics, Analytics-Wikistats, Data-Engineering: Feature requests for Active Editors by Country - https://phabricator.wikimedia.org/T304720 (Milimetric)
[17:10:32] (CR) Neil P. Quinn-WMF: Create schemas for Wikistories instrumentation (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/773382 (https://phabricator.wikimedia.org/T287639) (owner: Neil P. Quinn-WMF)
[17:21:39] (PS5) Sharvaniharan: New schema for edit history screen interactions [schemas/event/secondary] - https://gerrit.wikimedia.org/r/772934
[17:23:47] Data-Engineering, Data-Engineering-Kanban: Some varnishkafka instances dropped traffic for a long time due to the wrong version of the package installed - https://phabricator.wikimedia.org/T300164 (Isaac) Thanks @Mayakp.wiki for sharing these presentations! Is there a suggested public reference for this...
[17:25:47] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Add alert for varnishkafka low/zero messages per second to alertmanager - https://phabricator.wikimedia.org/T300246 (Milimetric) Ok, sorry for the noise, and thanks for explaining. Makes sense. I would say that if the false positives get...
[17:27:52] (PS6) Sharvaniharan: New schema for measuring article screen interactions [schemas/event/secondary] - https://gerrit.wikimedia.org/r/772910 (https://phabricator.wikimedia.org/T304335)
[18:14:20] Data-Engineering, Data-Engineering-Kanban: Some varnishkafka instances dropped traffic for a long time due to the wrong version of the package installed - https://phabricator.wikimedia.org/T300164 (Mayakp.wiki) @Isaac We provided updates on the data loss as a part of our monthly key metrics for February...
[18:33:40] (PS7) Sharvaniharan: New schema for measuring article screen interactions [schemas/event/secondary] - https://gerrit.wikimedia.org/r/772910
[18:36:29] (CR) Sharvaniharan: "@" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/772934 (owner: Sharvaniharan)
[18:54:19] Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation), Data-Services, cloud-services-team (Kanban): View 'centralauth_p.localuser' references invalid table/column/rights to use them - https://phabricator.wikimedia.org/T304733 (Zabe)
[18:54:27] Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation), Data-Services, cloud-services-team (Kanban): View 'centralauth_p.localuser' references invalid table/column/rights to use them - https://phabricator.wikimedia.org/T304733 (Zabe) p:Triage→High
[19:01:01] Analytics, API Platform: AQS 2.0 local tests fail when mwcli is running - https://phabricator.wikimedia.org/T304735 (BPirkle)
[19:01:14] Analytics, API Platform: AQS 2.0 local tests fail when mwcli is running - https://phabricator.wikimedia.org/T304735 (BPirkle) p:Triage→Low
[19:23:52] Analytics-Radar, Wikimedia-production-error: eventgate_validation_error: '.web_session_id' should NOT be shorter than 20 characters - https://phabricator.wikimedia.org/T297521 (cjming) related to T283881 ?
[19:36:59] Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation), Data-Services, cloud-services-team (Kanban): View 'centralauth_p.localuser' references invalid table/column/rights to use them - https://phabricator.wikimedia.org/T304733 (bd808) I think this is just a missed rebuild...
[19:38:16] (EventgateLoggingExternalLatency) firing: Elevated latency for POST events on eventgate-logging-external in eqiad. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLoggingExternalLatency
[19:43:16] (EventgateLoggingExternalLatency) resolved: Elevated latency for POST events on eventgate-logging-external in eqiad. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLoggingExternalLatency
[19:58:43] Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation), Data-Services, cloud-services-team (Kanban): View 'centralauth_p.localuser' references invalid table/column/rights to use them - https://phabricator.wikimedia.org/T304733 (Zabe) Yeah, before https://gerrit.wikimedia.o...
[20:02:02] (PS1) Milimetric: Change all output columns to lowercase [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/773842 (https://phabricator.wikimedia.org/T298928)
[20:02:50] (CR) Milimetric: [V: +2 C: +2] "Just changing case on output column names" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/773842 (https://phabricator.wikimedia.org/T298928) (owner: Milimetric)
[20:05:19] Data-Engineering, Data-Engineering-Kanban, DBA, Data-Services, cloud-services-team (Kanban): Recreate views for globaluser table - https://phabricator.wikimedia.org/T301674 (Zabe) Since the gu_hidden change also affected the localuser table, that one also needs to be rebuilt, see T304733.
[20:07:40] Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation), Data-Services, cloud-services-team (Kanban): View 'centralauth_p.localuser' references invalid table/column/rights to use them - https://phabricator.wikimedia.org/T304733 (Marostegui) Yes, the views need to be recreat...
[20:19:36] (PS3) Clare Ming: Add new enum value to webuiscroll schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/771700 (https://phabricator.wikimedia.org/T303297)
[20:22:29] (PS4) Clare Ming: Add new enum value to webuiscroll schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/771700 (https://phabricator.wikimedia.org/T303297)
[20:24:10] (CR) Clare Ming: Add new enum value to webuiscroll schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/771700 (https://phabricator.wikimedia.org/T303297) (owner: Clare Ming)
[21:06:22] Data-Engineering, SRE, Traffic, Trust-and-Safety, and 2 others: Disable GeoIP Legacy Download - https://phabricator.wikimedia.org/T303464 (Dzahn) a:Dzahn
[21:24:16] (PS5) Jdlrobson: Add new enum value to webuiscroll schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/771700 (https://phabricator.wikimedia.org/T303297) (owner: Clare Ming)
[21:49:04] (CR) Jdlrobson: [C: +2] Add new enum value to webuiscroll schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/771700 (https://phabricator.wikimedia.org/T303297) (owner: Clare Ming)
[21:51:12] (CR) Jdlrobson: [C: +2] "S" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/771700 (https://phabricator.wikimedia.org/T303297) (owner: Clare Ming)
[21:53:06] (Merged) jenkins-bot: Add new enum value to webuiscroll schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/771700 (https://phabricator.wikimedia.org/T303297) (owner: Clare Ming)
[21:53:52] Analytics-Radar, Wikimedia-production-error: eventgate_validation_error: '.web_session_id' should NOT be shorter than 20 characters - https://phabricator.wikimedia.org/T297521 (Jdlrobson)
[22:07:11] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[22:18:25] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[22:32:50] Data-Engineering, Data-Engineering-Radar, wmfdata-python, Product-Analytics (Kanban): Update Wmfdata-Python documention to describe code stewardship - https://phabricator.wikimedia.org/T304545 (nshahquinn-wmf) Open→Resolved Done in [aa39297](https://github.com/wikimedia/wmfdata-python/com...
[23:35:35] Data-Engineering, Product-Analytics, wmfdata-python: Support importing a Parquet file into HDFS using wmfdata-python - https://phabricator.wikimedia.org/T273196 (nshahquinn-wmf)
[23:37:56] Data-Engineering, Product-Analytics, wmfdata-python: Support importing a Parquet file into HDFS using wmfdata-python - https://phabricator.wikimedia.org/T273196 (nshahquinn-wmf) I plan to finish the [pull request](https://github.com/wikimedia/wmfdata-python/pull/25) in July, after I return from sabba...
[23:53:38] (PS1) Milimetric: Fix gulp build properly this time [analytics/dashiki] - https://gerrit.wikimedia.org/r/773891
[23:53:40] (PS1) Milimetric: Improve usability of the text table [analytics/dashiki] - https://gerrit.wikimedia.org/r/773892 (https://phabricator.wikimedia.org/T298929)