[00:39:01] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: refinery-drop-webrequest-sequence-stats-partitions.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:43:33] PROBLEM - Check unit status of refinery-drop-webrequest-sequence-stats-partitions on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit refinery-drop-webrequest-sequence-stats-partitions https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[01:37:25] RECOVERY - SSH on druid1006.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:11:37] (CR) Gergő Tisza: "This change is ready for review." [schemas/event/primary] - https://gerrit.wikimedia.org/r/822151 (https://phabricator.wikimedia.org/T314672) (owner: Gergő Tisza)
[04:12:09] (CR) CI reject: [V: -1] Update docs for mediawiki/recentchange type field [schemas/event/primary] - https://gerrit.wikimedia.org/r/822151 (https://phabricator.wikimedia.org/T314672) (owner: Gergő Tisza)
[04:17:25] (PS4) Gergő Tisza: Update docs for mediawiki/recentchange type field [schemas/event/primary] - https://gerrit.wikimedia.org/r/822151 (https://phabricator.wikimedia.org/T314672)
[04:20:11] PROBLEM - Check unit status of monitor_refine_event_sanitized_main_immediate on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_event_sanitized_main_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:06:40] Analytics-Radar, Data-Engineering, MediaWiki-extensions-EventLogging: SearchSatisfaction has validation errors for event.query - https://phabricator.wikimedia.org/T257331 (Gehel)
[08:08:06] Data-Engineering: Late events in wdqs-external.sparql-query? - https://phabricator.wikimedia.org/T310790 (Gehel)
[08:12:25] Analytics-Clusters: Expand the is_search UDF to detect non-API search requests - https://phabricator.wikimedia.org/T111073 (Gehel) Open→Declined Analytics has changed too much since 2015, this isn't relevant anymore.
[08:25:30] Data-Engineering, Foundational Technology Requests, SRE: Add a webrequest sampled topic and ingest into druid/turnilo - https://phabricator.wikimedia.org/T314981 (fgiunchedi)
[12:31:07] Data-Engineering, Event-Platform Value Stream, Metrics-Platform, Browser-Support-Microsoft-Edge, Performance-Team (Radar): Problem with delay caused by intake-analytics.wikimedia.org - https://phabricator.wikimedia.org/T295427 (EChetty) p:Triage→Low
[13:00:23] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:03:29] RECOVERY - Check unit status of refinery-drop-webrequest-sequence-stats-partitions on an-launcher1002 is OK: OK: Status of the systemd unit refinery-drop-webrequest-sequence-stats-partitions https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[13:05:39] RECOVERY - Check unit status of monitor_refine_event_sanitized_main_immediate on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_event_sanitized_main_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[13:06:48] Data-Engineering, Foundational Technology Requests, SRE: Add a webrequest sampled topic and ingest into druid/turnilo - https://phabricator.wikimedia.org/T314981 (BTullis) Since our meeting, I have been reading the docs around benthos and I've got to say, I find it really compelling! This looks to m...
[13:27:17] Data-Engineering, Event-Platform Value Stream, Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (Ottomata)
[13:38:17] elukey:
[13:38:19] https://dominionilibri.it/en/products/la-giornata-dellumarell/
[13:59:18] ottomata: ahahahahahha <3
[14:09:48] (PS5) Ottomata: WIP - Add new mediawiki entity fragments, and use them in new mediawiki page change schema [schemas/event/primary] - https://gerrit.wikimedia.org/r/807565 (https://phabricator.wikimedia.org/T308017)
[14:11:49] (PS6) Ottomata: WIP - Add new mediawiki entity fragments, and use them in new mediawiki page change schema [schemas/event/primary] - https://gerrit.wikimedia.org/r/807565 (https://phabricator.wikimedia.org/T308017)
[14:14:35] elukey: i translated the page
[14:14:42] > From this moment, by rolling the dice, the umarells "leave" the house to reach the goal of the game :
[14:14:43] > Get to the center of the board to admire the Great Construction Site. A real sight for the eyes. . .
[14:15:38] exactly yes!
[14:16:03] haha
[14:23:35] Data-Engineering, API Platform, Platform Engineering Roadmap, User-Eevans: Pageviews integration testing - https://phabricator.wikimedia.org/T299735 (BPirkle)
[14:23:42] Data-Engineering, Foundational Technology Requests, SRE: Add a webrequest sampled topic and ingest into druid/turnilo - https://phabricator.wikimedia.org/T314981 (Ottomata) I agree benthos looks really fun! I think there is a real need for easy to use stream processors. We evaluated Knative Event...
[14:31:49] Data-Engineering-Kanban, Event-Platform Value Stream, Metrics-Platform, Wikidata, and 6 others: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy - https://phabricator.wikimedia.org/T290303 (phuedx) I tested https://gerrit.wikimedia.org/r/818137 against the wrong...
[14:40:06] Well, today I learnt: https://en.wikipedia.org/wiki/Umarell :-)
[14:45:28] you haven't met my umarell?!
[14:45:31] i will introduce you
[14:46:06] I look forward to it :-)
[14:48:19] hehehe
[14:50:24] Data-Engineering-Kanban, Event-Platform Value Stream, Metrics-Platform, Wikidata, and 6 others: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy - https://phabricator.wikimedia.org/T290303 (mforns) Thanks a lot @phuedx!
[16:08:23] Data-Engineering-Kanban, Event-Platform Value Stream, Metrics-Platform, Wikidata, and 6 others: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy - https://phabricator.wikimedia.org/T290303 (EChetty) a:mforns→phuedx
[16:09:11] Data-Engineering, Product-Analytics: PySpark warning messages - https://phabricator.wikimedia.org/T315024 (Mayakp.wiki)
[16:38:57] Data-Engineering, Data-Engineering-Kanban: Pageview Data loss due to wrong version of package installed on some varnishkafka instances - https://phabricator.wikimedia.org/T300164 (BTullis)
[16:57:46] Data-Engineering, Patch-For-Review: Cleanup analytics/refinery/source pom.files - https://phabricator.wikimedia.org/T306193 (Ottomata) Oh ho just noticed this one! Any reason not to merge this now? @JAllemandou
[17:07:02] Data-Engineering-Kanban, Data Engineering Planning (Sprint 02): Create conda-base-env with last pyspark - https://phabricator.wikimedia.org/T309227 (Antoine_Quhen) As of today, the changes have been ported between the Dockerfile and the Gitlab-ci. So I am now working on the tar error and eventually the...
[17:33:49] Analytics-Radar, Metrics-Platform, Product-Analytics: Draft of full process for instrumentation using new client libraries - https://phabricator.wikimedia.org/T275694 (Ottomata)
[17:34:15] Analytics, MediaWiki-Core-JobQueue, WMF-JobQueue, Developer Productivity, Platform Team Workboards (Clinic Duty Team): showJobs.php maintenance script useless and misleading in production - https://phabricator.wikimedia.org/T221224 (Ottomata)
[17:34:50] Analytics-Radar, ChangeProp, Community-Tech, WMF-JobQueue, and 2 others: RFC: Provide the ability to have time-delayed or time-offset jobs in the job queue - https://phabricator.wikimedia.org/T218812 (Ottomata)
[17:35:54] Analytics-Radar, ChangeProp, Platform Engineering (Icebox): RESTBase content rerenders sometimes don't pick up the newest changes - https://phabricator.wikimedia.org/T176412 (Ottomata)
[17:36:39] Analytics-Radar, ChangeProp, MediaWiki-Core-JobQueue: Allow easy tuning of the jobqueue concurrency. - https://phabricator.wikimedia.org/T175800 (Ottomata) > we don't partition Kafka topics by wiki, so events in each topic are shuffled. We could key message in Kafka by wiki, which would ensure that t...
[17:38:13] Analytics-Radar, Data-Engineering, Event-Platform Value Stream: Ensure that EventBus extension gracefully handles service failures - https://phabricator.wikimedia.org/T125394 (Ottomata) Open→Declined Declining. Some of the underlying bits have changed since 2016, so I'm not sure this task is...
[17:39:18] Analytics-Radar, Data-Engineering, Event-Platform Value Stream, Multi-Content-Revisions, Platform Team Initiatives (MCR): Redesign revision-related event schemas for MCR - https://phabricator.wikimedia.org/T186371 (Ottomata) Open→Declined Revision slots were added to revision-create....
[17:42:35] (CR) Urbanecm: [C: +1] "LGTM" [schemas/event/primary] - https://gerrit.wikimedia.org/r/822151 (https://phabricator.wikimedia.org/T314672) (owner: Gergő Tisza)
[17:45:04] Data-Engineering, Event-Platform Value Stream, Technical-Debt: Migrate usage of Database::select to SelectQueryBuilder in EventBus - https://phabricator.wikimedia.org/T312354 (Ottomata) Open→Declined I'm looking now, and I'm not aware of any `IDatabase::select` usage in the EventBus extension...
[17:51:44] Analytics-Radar, Data-Engineering, Event-Platform Value Stream, Platform Engineering (Icebox): revision-create events are sometimes emitted in a secondary DC - https://phabricator.wikimedia.org/T207994 (Ottomata) Open→Declined Declining due to lack of activity or impact. Please reopen if...
[17:54:03] Data-Engineering, Event-Platform Value Stream, Wikidata: Realtime editing UI and API - https://phabricator.wikimedia.org/T298305 (Ottomata) I'm not sure where this task belongs. What UI and API are you referring to? See also: https://stream.wikimedia.org/?doc https://wikitech.wikimedia.org/wiki/Ev...
[17:54:37] Data-Engineering-Radar, MediaWiki-extensions-EventLogging, Metrics-Platform: Non-deterministic unit test "streamInSample() - session sampling resets" - https://phabricator.wikimedia.org/T304379 (Ottomata)
[17:57:45] Analytics-Radar, Data-Engineering, Event-Platform Value Stream: Eventgate validation error: '.event.connectEnd' should be >= 0, '.event.connectStart' should be >= 0, '.event.fetchStart' should be >= 0, '.event.requestStart' should be >= 0, '.event.responseEnd' ... - https://phabricator.wikimedia.org/T299670
[17:57:54] Data-Engineering, Event-Platform Value Stream, SRE, Traffic, and 2 others: Incident: 2022-03-4 Banner sampling leading to a relatively wide site outage (mostly esams) - https://phabricator.wikimedia.org/T303036 (Ottomata) Are there actionables on this task? I'm considering removing the Event Pla...
[17:58:02] Analytics-Radar, NavigationTiming, Performance-Team (Radar): Invalid EventLogging messages for NavigationTiming topic - https://phabricator.wikimedia.org/T261665 (Ottomata)
[18:01:21] Analytics, Data-Engineering, Event-Platform Value Stream, Platform Engineering: EventStreams sending same data over and over (page links change) - https://phabricator.wikimedia.org/T290211 (Ottomata) Possibly relevant: https://github.com/tlarock/pywikibot/issues/1
[18:03:09] Analytics, WMDE-New-Editors-Banner-Campaigns: '.event.finalSlide' should be integer - https://phabricator.wikimedia.org/T289866 (Ottomata)
[18:04:00] Analytics-Radar, Data-Engineering, Discovery-Search: '.event.pageViewId' should be string, '.event.subTest' should be string, '.event.searchSessionId' should be string - https://phabricator.wikimedia.org/T286814 (Ottomata)
[18:04:36] Analytics-Radar, Data-Engineering, Metrics-Platform, CSS: Schema code samples popup appears under the JSON table - https://phabricator.wikimedia.org/T272857 (Ottomata)
[18:05:47] Data-Engineering, Event-Platform Value Stream, Internet-Archive, The-Wikipedia-Library, and 2 others: page-links-change stream is assigning template propagation events to the wrong edits - https://phabricator.wikimedia.org/T216504 (Ottomata) Possibly relevant: https://github.com/tlarock/pywikibot...
[18:06:38] Analytics-Radar, Data-Engineering, Event-Platform Value Stream: eventlogging Dockerfile doesn't work - https://phabricator.wikimedia.org/T208679 (Ottomata) Open→Resolved a:Ottomata I think this has been fixed: https://www.mediawiki.org/wiki/MediaWiki-Docker/Configuration_recipes/EventLogg...
[18:07:00] Analytics-Radar, Data-Engineering-Radar, TimedMediaHandler, Wikimedia-Video: Record and report metrics for audio and video playback - https://phabricator.wikimedia.org/T108522 (Ottomata)
[18:07:04] Analytics-Radar, Data-Engineering, Event-Platform Value Stream, Internet-Archive, The-Wikipedia-Library: Store page-links-change data in a database table and make available through a Special page - https://phabricator.wikimedia.org/T221397 (Ottomata)
[18:08:46] Analytics, Data-Engineering, Event-Platform Value Stream, Patch-For-Review: Enable canary events for all streams - https://phabricator.wikimedia.org/T266798 (Ottomata)
[18:08:51] Analytics, Data-Engineering, Event-Platform Value Stream: Refine event pipeline at this time refines data in hourly partitions without knowing if the partition is complete - https://phabricator.wikimedia.org/T252585 (Ottomata)
[18:10:27] Analytics, Data-Engineering: Mediaviewer views should be reworked to be an eventlogging event - https://phabricator.wikimedia.org/T239630 (Ottomata)
[18:19:19] Data-Engineering, Event-Platform Value Stream, SRE, Traffic, and 2 others: Incident: 2022-03-4 Banner sampling leading to a relatively wide site outage (mostly esams) - https://phabricator.wikimedia.org/T303036 (jcrespo) @Ottomata: The actionables of the task pending is to understand what the act...
[18:50:38] Data-Engineering-Kanban, Data Engineering Planning (Sprint 02): Create conda-base-env with last pyspark - https://phabricator.wikimedia.org/T309227 (Ottomata) > As of today, the changes have been ported between the Dockerfile and the Gitlab-ci. Wow nice. Honestly the automated Gitlab-CI stuff is just S...
[18:52:07] (CR) Ottomata: [C: +2] Update docs for mediawiki/recentchange type field [schemas/event/primary] - https://gerrit.wikimedia.org/r/822151 (https://phabricator.wikimedia.org/T314672) (owner: Gergő Tisza)
[19:40:25] Analytics, Data-Engineering, Event-Platform Value Stream, Platform Engineering: EventStreams sending same data over and over (page links change) - https://phabricator.wikimedia.org/T290211 (Green_Cardamom) My workaround is compare the date of the diff (via MW API) with the date in the JSON and if...
[19:42:06] Data-Engineering, Product-Analytics, wmfdata-python: Update anaconda-wmf's wmfdata-python to 1.3.3 - https://phabricator.wikimedia.org/T305067 (nshahquinn-wmf) Just verified that anaconda-wmf still has Wmfdata-Python 1.3.2.
[20:21:17] Data-Engineering, Infrastructure-Foundations, Product-Analytics, Research, and 3 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (herron)
[20:22:08] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:25:52] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[20:31:28] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:37:10] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[20:56:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2031 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2031%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:56:13] (VarnishkafkaNoMessages) firing: varnishkafka on cp5015 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp5015%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[21:01:12] (VarnishkafkaNoMessages) resolved: (3) varnishkafka on cp2031 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[21:01:13] (VarnishkafkaNoMessages) resolved: varnishkafka on cp5015 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp5015%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[21:36:09] ^^ I'm interested in this because I updated the alert earlier today. But I am afk at the moment.
[21:58:23] Data-Engineering, Event-Platform Value Stream, Wikidata: Realtime editing UI and API - https://phabricator.wikimedia.org/T298305 (Lectrician1) The Wikibase UI and API.