[02:14:34] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Patch-For-Review, 10Readers-Web-Backlog (Kanbanana-FY-2021-22): WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (10Jdrewniak) @Ottomata I'm planning to deploy the change to the portals repo Thursday March 24, mornin... [02:16:49] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Patch-For-Review, 10Readers-Web-Backlog (Kanbanana-FY-2021-22): WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (10Jdrewniak) [03:25:30] (03PS1) 10Neil P. Quinn-WMF: Create schemas for Wikistories instrumentation [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/773382 (https://phabricator.wikimedia.org/T287639) [03:42:16] (EventgateLoggingExternalLatency) firing: Elevated latency for POST events on eventgate-logging-external in eqiad. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLoggingExternalLatency [03:47:16] (EventgateLoggingExternalLatency) resolved: Elevated latency for POST events on eventgate-logging-external in eqiad. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLoggingExternalLatency [03:58:59] 10Analytics, 10Data-Engineering: Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481 (10razzi) I can get started on this one. Here's my plan; if it looks good we can announce downtime; I vote to do the upgrade next Tuesday the 29th of March; I think all the reimages could be done... [05:18:16] (EventgateLoggingExternalLatency) firing: Elevated latency for POST events on eventgate-logging-external in eqiad. 
- https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLoggingExternalLatency [05:23:16] (EventgateLoggingExternalLatency) resolved: Elevated latency for POST events on eventgate-logging-external in eqiad. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLoggingExternalLatency [06:40:27] 10Data-Engineering, 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for shnwikivoyage - https://phabricator.wikimedia.org/T302798 (10Marostegui) Data sanitized. User and `_p` database created. I have run a check_private_data.py just to be fully sure there's no priv... [06:40:33] 10Data-Engineering, 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for guwwiki - https://phabricator.wikimedia.org/T303761 (10Marostegui) Data sanitized. User and `_p` database created. I have run a check_private_data.py just to be fully sure there's no private da... [06:47:49] 10Analytics, 10Data-Engineering, 10Data-Persistence (Consultation): Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481 (10Marostegui) >>! In T299481#7802420, @razzi wrote: > I can get started on this one. Here's my plan; if it looks good we can announce downtime; I vote to... [06:56:16] (EventgateLoggingExternalLatency) firing: Elevated latency for POST events on eventgate-logging-external in eqiad. 
- https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLoggingExternalLatency [07:01:16] (EventgateLoggingExternalLatency) resolved: Elevated latency for POST events on eventgate-logging-external in eqiad. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLoggingExternalLatency [09:46:10] I am going to begin restarting an-worker1096 -> an-worker1101 this morning, to pick up the new kernel. [10:31:07] (03PS5) 10Phuedx: analytics/legacy/quicksurveyinitiation: Add editCountBucket property [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/768014 (https://phabricator.wikimedia.org/T303740) [10:35:18] (03PS1) 10Urbanecm: Update pageview allow list [analytics/refinery] - 10https://gerrit.wikimedia.org/r/773472 (https://phabricator.wikimedia.org/T302799) [11:01:22] 10Data-Engineering, 10Data-Engineering-Kanban: Archiva's disk partiton space is getting filled up - https://phabricator.wikimedia.org/T304224 (10BTullis) > For other wdqs related artifacts (they're mostly for internal use) https://archiva.wikimedia.org/#artifact/org.wikidata.query.rdf/blazegraph, https://archi... [11:09:01] 10Data-Engineering, 10Data-Services: View wb_changes_dispatch in commonswiki_p shows an error - https://phabricator.wikimedia.org/T304591 (10Majavah) [11:11:52] I have finished rebooting the affected an-worker nodes to get their new kernel. [11:13:11] I am about to roll-restart the brokers on kafka-jumbo to pick up their new JVM. 
[11:15:11] !log roll-restarting kafka-jumbo brokers T300626 [11:15:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [11:44:48] 10Data-Engineering, 10Data-Engineering-Kanban: Archiva's disk partiton space is getting filled up - https://phabricator.wikimedia.org/T304224 (10dcausse) >>! In T304224#7802862, @BTullis wrote: > I notice that you didn't include https://archiva.wikimedia.org/#artifact~releases/org.wikidata.query.rdf/blazegraph... [11:58:00] 10Data-Engineering, 10Data-Engineering-Kanban: Archiva's disk partiton space is getting filled up - https://phabricator.wikimedia.org/T304224 (10BTullis) > I did not include it on purpose because these are the artifacts that are currently exposed and probably linked from third parties so deleting them will bre... [12:07:07] (03PS1) 10Btullis: Tag using date-stage as well as latest [analytics/datahub] (wmf) - 10https://gerrit.wikimedia.org/r/773479 (https://phabricator.wikimedia.org/T301453) [12:08:19] (03CR) 10jerkins-bot: [V: 04-1] Tag using date-stage as well as latest [analytics/datahub] (wmf) - 10https://gerrit.wikimedia.org/r/773479 (https://phabricator.wikimedia.org/T301453) (owner: 10Btullis) [12:12:56] (03PS2) 10Btullis: Tag using date-stage as well as latest [analytics/datahub] (wmf) - 10https://gerrit.wikimedia.org/r/773479 (https://phabricator.wikimedia.org/T301453) [12:31:17] (03CR) 10Btullis: [C: 03+2] Tag using date-stage as well as latest [analytics/datahub] (wmf) - 10https://gerrit.wikimedia.org/r/773479 (https://phabricator.wikimedia.org/T301453) (owner: 10Btullis) [12:37:52] 10Data-Engineering, 10SRE: Adding snwachukwu@wikimedia.org to the analytics-alerts mailing list - https://phabricator.wikimedia.org/T304541 (10jbond) 05Open→03Resolved a:03jbond This has been completed [12:50:22] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Patch-For-Review, 10Readers-Web-Backlog (Kanbanana-FY-2021-22): WikipediaPortal Event Platform Migration - 
https://phabricator.wikimedia.org/T282012 (10Ottomata) Ack, and awesome! :) [12:57:02] mornin! [12:57:12] Hi :) [12:57:25] Hello [13:06:23] heyaa [13:21:44] (03PS1) 10Btullis: Use git commit SHA for image label [analytics/datahub] (wmf) - 10https://gerrit.wikimedia.org/r/773500 (https://phabricator.wikimedia.org/T301453) [13:25:22] 10Data-Engineering, 10Data-Engineering-Kanban: Check home/HDFS leftovers of clarakosi - https://phabricator.wikimedia.org/T304065 (10Snwachukwu) a:03Snwachukwu [13:26:03] 10Data-Engineering, 10Data-Engineering-Kanban: Check home/HDFS leftovers of clarakosi - https://phabricator.wikimedia.org/T304065 (10JAllemandou) I think the manager to ping here is Leila Zia :) [13:29:13] 10Analytics, 10Data-Engineering, 10Data-Engineering-Kanban: Check home/HDFS leftovers of bumeh-ctr - https://phabricator.wikimedia.org/T300607 (10JAllemandou) Gentle ping @JAnstee_WMF - Have your new folks been able to take a look at this data? If not, would you be able to provide us with an expected time f... [13:56:19] (03CR) 10Btullis: [C: 03+2] Use git commit SHA for image label [analytics/datahub] (wmf) - 10https://gerrit.wikimedia.org/r/773500 (https://phabricator.wikimedia.org/T301453) (owner: 10Btullis) [14:03:57] 10Data-Engineering-Radar, 10Growth-Team, 10MediaWiki-extensions-GuidedTour, 10Patch-For-Review: Finish decommissioning the legacy GuidedTour schemas - https://phabricator.wikimedia.org/T303712 (10kostajh) >>! In T303712#7783123, @phuedx wrote: >>>! In T303712#7781001, @kostajh wrote: >> Thanks for filing t... [14:21:54] 10Data-Engineering, 10Data-Engineering-Kanban: Check home/HDFS leftovers of clarakosi - https://phabricator.wikimedia.org/T304065 (10Snwachukwu) Hi @leila, Kindly confirm that we can go ahead to delete the following data belonging to clarakosi: ` sandraebele@Sandras-MacBook-Pro Opsweek % sh check_user_files.sh... [14:27:50] Can anyone tell me, what was that handy trick with kafkacat to filter out all but the messages that I'm after? 
Is it just grep on the output, or pipe it to jq, or is there some way that's built into kafkacat itself? [14:28:23] it's just grep [14:28:42] but both kafkacat and grep have some buffering thing going on, that i find disabling helps [14:28:58] OK, thanks. [14:28:59] i like [14:29:10] kafkacat -C -u -b kafka-jumbo1001.eqiad.wmnet:9092 -t <topic> | grep --line-buffered <pattern> | jq . [14:29:31] 10Data-Engineering, 10Data-Engineering-Kanban, 10Airflow: [Airflow] Refactor jobs to not use DAG factories - https://phabricator.wikimedia.org/T302391 (10mforns) [14:29:39] Nice, thanks. [14:30:58] 10Data-Engineering, 10Data-Engineering-Kanban, 10Airflow: [Airflow] Refactor jobs to not use DAG factories - https://phabricator.wikimedia.org/T302391 (10mforns) The related merge request: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/38 All DAGS have been refactored and... [14:34:35] 10Data-Engineering, 10Data-Engineering-Radar, 10wmfdata-python, 10Product-Analytics (Kanban): Update Wmfdata-Python documention to describe code stewardship - https://phabricator.wikimedia.org/T304545 (10EChetty) [14:35:38] 10Data-Engineering, 10Data-Engineering-Radar, 10wmfdata-python, 10Product-Analytics (Kanban): Update Wmfdata-Python documention to describe code stewardship - https://phabricator.wikimedia.org/T304545 (10EChetty) [14:35:47] 10Data-Engineering, 10Data-Engineering-Kanban, 10Patch-For-Review: The network_internal druid load job fails if data is not present - https://phabricator.wikimedia.org/T302263 (10BTullis) Adding @ayounsi to help check whether all expected data is present. I think that we're expecting to see some data from eq... [14:39:31] hey a-team, question: I found some cases - around 50K in enwiki - where the timestamp on the revision_parent_id is newer than the child; usually those timestamps point to dates in 2001 ... I've manually checked a few cases, and I still don't get why this is happening ...any thoughts about this? 
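For the kafkacat filtering question above (14:27 to 14:29), the `grep --line-buffered | jq .` stage of that pipeline can also be sketched in Python. This is only an illustration, not something anyone in the channel proposed: `filter_events` and the sample lines are hypothetical, and it assumes the consumer prints one JSON event per line.

```python
import json
from typing import Iterable, Iterator


def filter_events(lines: Iterable[str], needle: str) -> Iterator[dict]:
    """Yield parsed JSON events from the lines that contain `needle`.

    Plain substring matching plays the role of grep here; lines that
    match but are not valid JSON are skipped instead of raising.
    """
    for line in lines:
        if needle not in line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


# Hypothetical sample input standing in for consumer output.
sample_lines = [
    '{"meta": {"stream": "a"}, "x": 1}\n',
    "not json\n",
    '{"meta": {"stream": "b"}}\n',
]

for event in filter_events(sample_lines, '"stream": "a"'):
    # flush=True avoids the output buffering mentioned in the chat.
    print(json.dumps(event, indent=2), flush=True)
```

To use it on a live stream, pass `sys.stdin` as `lines` and pipe the consumer output into the script.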
[14:40:03] interesting, that's on event data dsaez ? [14:40:38] ottomata I found it on mediawiki_history, and then confirmed that the error is the same on mariadb [14:40:44] ohhh [14:40:55] i have no insight there, maybe milimetric would? [14:41:12] since it's from 2001, I think that's a bug caused by the old wiki software we used to use [14:41:37] eg. rev_id = 334465468 [14:41:40] dsaez: yeah, there are a few exceptions that we know about, not sure if these are them, I'll explain one I know [14:43:17] so what can happen is revisions from one page can be partially restored to another page, and I think when that happens the first one's rev_parent_id is set to the creation of the page they're restored into or possibly some more complicated chain we don't understand. [14:43:47] But basically all the confusing rev_parent_id chains we found were explained by partial restores [14:44:11] got it [14:44:21] There may be other stuff going on like bulk bot actions that who knows how they work [14:45:00] yeah, what I see now is that there are several rev_ids with same parent_id, and that parent_id comes from the future [14:45:01] We have some todos in mw history to get to a clear understanding, but so far nobody's needed it [14:45:27] Yep, the many to one relationship makes sense in the context of restore [14:45:40] (Partial multi-page restore) [14:46:36] 10Data-Engineering, 10Data-Engineering-Kanban, 10Patch-For-Review: The network_internal druid load job fails if data is not present - https://phabricator.wikimedia.org/T302263 (10BTullis) The region field says 'Unknown' for all of the data at the moment. That's supposed to be populated by the DC code isn't it? [14:46:40] milimetric, I discovered this because I was trying to get the lifespan of every revision, so for that I need the next_revision. To do that, I'm joining "rev_id as next_id" with "parent_id as rev_id" [14:47:17] and then computing the time difference between those two... 
there I discovered negative values [14:47:43] so, maybe by just discarding the negative values I'll be safe from this problem? [14:48:26] dsaez: no, because the negative values are just the degenerate case [14:48:44] In theory there can be connections that are not negative but still not "real" [14:48:55] I see [14:49:00] There should be very few so you have two options [14:49:03] One ignore [14:49:58] Two, find all the many-to-one relationships between rev_parent_id and rev_id and see if you can filter out all the ones that look like restores, maybe there's something in the rev_comment [14:50:29] (By ignore, I mean your approach of just not counting the negative values) [14:51:57] 10Data-Engineering, 10Data-Engineering-Kanban, 10Airflow: Hosting of GDI use case specific source-code - https://phabricator.wikimedia.org/T304539 (10EChetty) p:05Triage→03High [14:53:02] so, I think I should always consider 1-to-1 ... so if I have 1-to-n what I need is to choose the "correct" value to keep... [14:53:26] somehow MediaWiki is doing that, because when you click on "next revision" it shows only 1 next revision [14:54:02] maybe considering the largest time difference (always positive) would make sense? [14:54:09] Yeah! MW has all the magic [14:54:14] hehe [14:54:25] cool, this makes sense...thanks! 
[14:54:26] 10Data-Engineering, 10Data-Services, 10Patch-For-Review: Move wikireplicas dbproxy haproxy config to etcd - https://phabricator.wikimedia.org/T304478 (10EChetty) 05Open→03In progress a:03razzi [14:54:37] I was thinking that, I'm not sure we can prove it's always the case, you can easily restore a revision from later on [14:54:40] 10Data-Engineering, 10Data-Engineering-Kanban, 10Data-Services, 10Patch-For-Review: Move wikireplicas dbproxy haproxy config to etcd - https://phabricator.wikimedia.org/T304478 (10EChetty) [14:57:00] dsaez: you may want to ask the platform team actually, they might know how the magic works, or I can look through the code there [14:57:58] good, I'll check with them. But yes, probably just sorting by timestamp would be enough for my use case [15:00:18] 10Data-Engineering: ------------------- NEWLY ADDED ABOVE ------------------- - https://phabricator.wikimedia.org/T304608 (10EChetty) [15:00:31] 10Data-Engineering: -------NEWLY ADDED ABOVE ------ - https://phabricator.wikimedia.org/T304608 (10EChetty) [15:00:58] 10Data-Engineering: --NEWLY ADDED ABOVE -- - https://phabricator.wikimedia.org/T304608 (10EChetty) [15:03:56] 10Data-Engineering: --NEWLY ADDED ABOVE -- - https://phabricator.wikimedia.org/T304609 (10EChetty) [15:08:20] 10Data-Engineering: --NEWLY ADDED ABOVE -- - https://phabricator.wikimedia.org/T304610 (10EChetty) [15:08:44] 10Data-Engineering: --NEWLY ADDED ABOVE -- - https://phabricator.wikimedia.org/T304611 (10EChetty) [15:11:02] 10Data-Engineering, 10Data-Engineering-Kanban, 10Patch-For-Review: The network_internal druid load job fails if data is not present - https://phabricator.wikimedia.org/T302263 (10ayounsi) >>! In T302263#7803367, @BTullis wrote: > Adding @ayounsi to help check whether all expected data is present. I think tha...
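The revision lifespan approach discussed above (14:39 to 14:57) could be sketched roughly as follows. This is a minimal illustration under stated assumptions: the `Revision` tuple, `child_candidates`, and `lifespans` are hypothetical names, the data is made up, and picking the largest positive gap is only the heuristic floated in the chat. As noted there, a non-negative gap can still belong to a link that is not "real", so the sketch also flags parents with more than one child.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Revision(NamedTuple):
    rev_id: int
    parent_id: Optional[int]
    timestamp: datetime


def child_candidates(revisions: Iterable[Revision]) -> Dict[int, List[Tuple[float, int]]]:
    """Map each parent rev_id to sorted (seconds_after_parent, child rev_id) pairs.

    Children older than their parent (negative gap) are dropped.
    """
    revs = list(revisions)
    by_id = {r.rev_id: r for r in revs}
    children: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
    for rev in revs:
        parent = by_id.get(rev.parent_id)
        if parent is None:
            continue  # no parent, or parent not in this dataset
        delta = (rev.timestamp - parent.timestamp).total_seconds()
        if delta < 0:
            continue  # the degenerate case: child predates its parent
        children[parent.rev_id].append((delta, rev.rev_id))
    return {pid: sorted(pairs) for pid, pairs in children.items()}


def lifespans(revisions: Iterable[Revision]) -> Dict[int, Tuple[float, bool]]:
    """Return {rev_id: (lifespan_seconds, ambiguous)}.

    Uses the largest positive gap (an unverified heuristic); `ambiguous`
    is True when the revision has several candidate children.
    """
    return {
        pid: (pairs[-1][0], len(pairs) > 1)
        for pid, pairs in child_candidates(revisions).items()
    }


revs = [
    Revision(1, None, datetime(2022, 1, 1, 0, 0)),
    Revision(2, 1, datetime(2022, 1, 1, 1, 0)),   # child of 1
    Revision(3, 1, datetime(2022, 1, 2, 0, 0)),   # second child of 1
    Revision(4, 5, datetime(2021, 12, 31, 0, 0)), # older than its parent 5
    Revision(5, 3, datetime(2022, 1, 3, 0, 0)),   # child of 3
]
print(lifespans(revs))
```

The ambiguous flag is one way to find the many-to-one cases that might then be checked for restores, for example via the revision comment.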
[15:12:58] 10Data-Engineering: --NEWLY ADDED ABOVE -- - https://phabricator.wikimedia.org/T304612 (10EChetty) [15:14:58] 10Data-Engineering, 10Airflow: Refactor refinery-drop-mediawiki-snapshots so that it no longer uses a _SUCCESS file - https://phabricator.wikimedia.org/T303988 (10EChetty) [15:16:23] 10Data-Engineering, 10Data-Services, 10cloud-services-team (Kanban): Reimage WMCS db proxies to Bullseye - https://phabricator.wikimedia.org/T298940 (10EChetty) p:05Triage→03Medium [15:19:20] 10Data-Engineering: Download the Maxmind Geoip2 Databases - https://phabricator.wikimedia.org/T303461 (10odimitrijevic) 05Open→03Declined [15:19:22] 10Data-Engineering: Migrate to MaxMind GeoIP2 - https://phabricator.wikimedia.org/T302989 (10odimitrijevic) [15:20:11] 10Data-Engineering, 10MediaWiki-extensions-EventLogging: Generate $wgEventLoggingSchemas from $wgEventStreams - https://phabricator.wikimedia.org/T303602 (10EChetty) p:05Triage→03High [15:20:27] 10Data-Engineering: Purge GeoIP2 datasets as per the licensing agreement - https://phabricator.wikimedia.org/T303465 (10odimitrijevic) 05Open→03Declined [15:20:29] 10Data-Engineering: Migrate to MaxMind GeoIP2 - https://phabricator.wikimedia.org/T302989 (10odimitrijevic) [15:20:31] 10Data-Engineering, 10MediaWiki-extensions-EventLogging: Generate $wgEventLoggingSchemas from $wgEventStreams - https://phabricator.wikimedia.org/T303602 (10EChetty) p:05High→03Medium [15:26:02] 10Analytics, 10Data-Engineering, 10Data-Engineering-Kanban, 10Product-Analytics, 10Superset: Help with data that's not appearing on charts - https://phabricator.wikimedia.org/T301895 (10EChetty) 05Open→03In progress [15:26:15] 10Analytics, 10Data-Engineering, 10Data-Engineering-Kanban, 10Product-Analytics, 10Superset: Help with data that's not appearing on charts - https://phabricator.wikimedia.org/T301895 (10EChetty) p:05Triage→03Medium [15:32:17] 10Analytics, 10Data-Engineering, 10Platform Engineering, 10Product-Analytics: AQS `edited-pages/new` metric 
does not make clear that the value is net of deletions - https://phabricator.wikimedia.org/T240860 (10EChetty) [15:33:30] 10Data-Engineering, 10Airflow: Airflow scheduler and webserver logs should be readable by airflow instance admins - https://phabricator.wikimedia.org/T304615 (10Ottomata) [15:37:24] 10Data-Engineering, 10Metrics-Platform, 10User-Urbanecm: Access to aggregate User Agent statistics - https://phabricator.wikimedia.org/T298912 (10EChetty) [15:39:45] 10Data-Engineering, 10Traffic: Lock-in Varnish and VarnishKafka versions - https://phabricator.wikimedia.org/T304617 (10odimitrijevic) [15:42:52] 10Data-Engineering: --NEWLY ADDED ABOVE -- - https://phabricator.wikimedia.org/T304618 (10EChetty) [15:52:51] 10Data-Engineering, 10SRE, 10Traffic: Lock-in Varnish and VarnishKafka versions - https://phabricator.wikimedia.org/T304617 (10elukey) Adding some context for the Traffic team. There were two varnishkafka versions, one in the `main` component and one in `component/varnish6` of `buster-wikimedia` at the time... [15:53:46] 10Data-Engineering: --NEWLY ADDED ABOVE -- - https://phabricator.wikimedia.org/T304620 (10EChetty) [15:59:30] 10Data-Engineering, 10SRE, 10Traffic: Lock-in Varnish and VarnishKafka versions - https://phabricator.wikimedia.org/T304617 (10BBlack) Thanks for making this ticket and adding those insights! I agree, there have been multiple times in the past that we've had problems in this area, and we should probably pup... 
[16:10:20] 10Data-Engineering-Kanban, 10Airflow: Create Generic Hive-To-Graphite scala - https://phabricator.wikimedia.org/T304623 (10Snwachukwu) [16:11:37] 10Data-Engineering, 10Data-Engineering-Kanban: Archiva's disk partiton space is getting filled up - https://phabricator.wikimedia.org/T304224 (10BTullis) 05Open→03Resolved [16:12:23] (03PS1) 10Vivian Rook: Expose history of query revisions [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/773578 (https://phabricator.wikimedia.org/T100982) [16:15:23] (03CR) 10jerkins-bot: [V: 04-1] Expose history of query revisions [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/773578 (https://phabricator.wikimedia.org/T100982) (owner: 10Vivian Rook) [16:19:56] 10Analytics, 10Data-Engineering, 10Data-Engineering-Kanban: Check home/HDFS leftovers of bumeh-ctr - https://phabricator.wikimedia.org/T300607 (10KCVelaga_WMF) Hi @JAllemandou, I will look into these files early next week. Most of the files seem to have a backup wherever necessary, but I will do a final chec... [16:20:26] (03CR) 10Eigyan: [C: 03+2] analytics/legacy/quicksurveyinitiation: Add editCountBucket property [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/768014 (https://phabricator.wikimedia.org/T303740) (owner: 10Phuedx) [16:20:46] (03CR) 10Eigyan: [C: 03+1] analytics/legacy/quicksurveyinitiation: Add editCountBucket property [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/768014 (https://phabricator.wikimedia.org/T303740) (owner: 10Phuedx) [16:46:36] (03CR) 10Eigyan: [C: 03+1] "Greetings Ottomata -> what are the next steps in getting this patch merged? I would like to test the dependency in beta. Any feedback prov" [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/768014 (https://phabricator.wikimedia.org/T303740) (owner: 10Phuedx) [16:47:45] (03CR) 10Ottomata: [C: 03+1] "Hi! you can merge it!" 
[schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/768014 (https://phabricator.wikimedia.org/T303740) (owner: 10Phuedx) [16:56:17] 10Data-Engineering, 10Data-Engineering-Kanban: Add the commons-entity dataset to the refinery-drop-mediawiki-snapshots script - https://phabricator.wikimedia.org/T303993 (10EChetty) [16:59:27] 10Analytics-Wikistats, 10Data-Engineering, 10Data-Engineering-Kanban: Broken "tooltip-breakdown-automated" tooltip on Wikistats 2 - https://phabricator.wikimedia.org/T303990 (10EChetty) a:03Milimetric [17:00:54] 10Analytics, 10Analytics-Wikistats, 10Data-Engineering, 10Data-Engineering-Kanban: Confusing filtering on "Active editors by country" topic - https://phabricator.wikimedia.org/T300365 (10EChetty) a:03Milimetric [17:02:27] 10Analytics, 10Data-Engineering, 10Data-Engineering-Kanban, 10Product-Analytics, 10Superset: Help with data that's not appearing on charts - https://phabricator.wikimedia.org/T301895 (10EChetty) a:05Milimetric→03BTullis [17:04:06] 10Data-Engineering, 10DC-Ops, 10SRE, 10ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (10Cmjohnson) [17:04:12] 10Data-Engineering: --NEWLY ADDED ABOVE -- - https://phabricator.wikimedia.org/T304612 (10Milimetric) 05Open→03Stalled p:05Triage→03Lowest [17:04:18] 10Data-Engineering, 10Airflow: Airflow scheduler and webserver logs should be readable by airflow instance admins - https://phabricator.wikimedia.org/T304615 (10EChetty) a:03BTullis [17:04:20] 10Data-Engineering: --NEWLY ADDED ABOVE -- - https://phabricator.wikimedia.org/T304610 (10Milimetric) 05Open→03Stalled p:05Triage→03Lowest [17:04:48] 10Data-Engineering: --NEWLY ADDED ABOVE -- - https://phabricator.wikimedia.org/T304620 (10Milimetric) 05Open→03Stalled p:05Triage→03Lowest [17:05:03] 10Data-Engineering: --NEWLY ADDED ABOVE -- - https://phabricator.wikimedia.org/T304611 (10Milimetric) 05Open→03Stalled p:05Triage→03Lowest [17:05:34] 10Data-Engineering: 
--NEWLY ADDED ABOVE -- - https://phabricator.wikimedia.org/T304609 (10Milimetric) 05Open→03Stalled p:05Triage→03Lowest [17:05:52] 10Data-Engineering: --NEWLY ADDED ABOVE -- - https://phabricator.wikimedia.org/T304608 (10Milimetric) 05Open→03Stalled p:05Triage→03Lowest [17:06:02] 10Data-Engineering: --NEWLY ADDED ABOVE -- - https://phabricator.wikimedia.org/T304618 (10Milimetric) 05Open→03Stalled p:05Triage→03Lowest [17:06:30] 10Analytics, 10Data-Engineering, 10Data-Engineering-Kanban, 10Event-Platform: Remove StreamConfig::INTERNAL_SETTINGS logic from EventStreamConfig and do it in EventLogging client instead - https://phabricator.wikimedia.org/T286344 (10EChetty) [17:06:40] 10Data-Engineering, 10Data-Engineering-Kanban, 10MediaWiki-extensions-EventLogging: Generate $wgEventLoggingSchemas from $wgEventStreams - https://phabricator.wikimedia.org/T303602 (10EChetty) [17:06:51] btw, ^ separator "tasks" should always be Stalled / Lowest so they don't annoy Andre and other Phab bug wranglers [17:08:17] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for next deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/773472 (https://phabricator.wikimedia.org/T302799) (owner: 10Urbanecm) [17:09:02] 10Data-Engineering, 10Librarization, 10MediaWiki-extensions-EventLogging, 10MediaWiki-extensions-JsonData: Librarise Libs/JsonSchemaValidation or replace - https://phabricator.wikimedia.org/T303131 (10EChetty) a:03Ottomata [17:10:49] 10Data-Engineering, 10Librarization, 10MediaWiki-extensions-EventLogging, 10MediaWiki-extensions-JsonData: Librarise Libs/JsonSchemaValidation or replace - https://phabricator.wikimedia.org/T303131 (10Ottomata) WMF is trying to stop using on wiki schemas for EventLogging. From our perspective, we can and... 
[17:11:58] 10Analytics, 10Data-Engineering, 10Event-Platform, 10SRE, and 2 others: DRY kafka broker declaration in helmfiles - https://phabricator.wikimedia.org/T253058 (10BTullis) [17:13:01] 10Data-Engineering, 10Data-Engineering-Kanban, 10Beta-Cluster-Infrastructure, 10Event-Platform: Upgrade event platform related VMs in deployment-prep to Debian bullsye (or buster) - https://phabricator.wikimedia.org/T304433 (10EChetty) p:05Triage→03High a:03Ottomata [17:15:39] 10Data-Engineering, 10DC-Ops, 10SRE, 10ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (10Cmjohnson) 1143, 1147 and 1148 did not respond to the provision script [17:17:38] 10Data-Engineering-Radar, 10MW-on-K8s, 10serviceops: IPInfo MediaWiki extension depends on presence of maxmind db in the container/host - https://phabricator.wikimedia.org/T288375 (10EChetty) [17:18:29] 10Data-Engineering, 10Data-Engineering-Kanban, 10Phabricator, 10Product-Analytics, 10wmfdata-python: Herald rule to add Product Analytics and Data Engineering tags to Wmfdata-Python tasks - https://phabricator.wikimedia.org/T304572 (10EChetty) [17:20:38] 10Data-Engineering-Radar, 10MW-on-K8s, 10serviceops: IPInfo MediaWiki extension depends on presence of maxmind db in the container/host - https://phabricator.wikimedia.org/T288375 (10BTullis) Could we deploy the GeoIP databases to the kube-workers and then mount it to the mw pods as a readonly hostpath volum... 
[17:21:04] 10Data-Engineering, 10Data-Engineering-Kanban: Add alert for varnishkafka low/zero messages per second to alertmanager - https://phabricator.wikimedia.org/T300246 (10EChetty) a:03Milimetric [17:23:26] 10Data-Engineering, 10Data-Engineering-Kanban, 10SRE: Create conda .deb and docker image - https://phabricator.wikimedia.org/T304450 (10EChetty) [17:25:29] 10Analytics, 10Data-Engineering, 10Data-Persistence (Consultation): Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481 (10Ladsgroup) FWIW I wrote this script (P23031) that did more than 100 bullseye upgrade in production. It works basically on any db except codfw masters or hos... [17:26:47] 10Data-Engineering, 10Data-Engineering-Kanban, 10Patch-For-Review: Pageview definition relies on X-Analytics to determine special pages - https://phabricator.wikimedia.org/T304362 (10EChetty) a:03Milimetric [17:30:53] 10Data-Engineering, 10Data-Engineering-Kanban, 10Data-Services: View wb_changes_dispatch in commonswiki_p shows an error - https://phabricator.wikimedia.org/T304591 (10EChetty) p:05Triage→03Medium a:03razzi [17:34:10] 10Data-Engineering, 10MediaWiki-General: Update pingback "PHP Version" dashboards - https://phabricator.wikimedia.org/T298922 (10EChetty) a:03mforns [17:35:39] 10Data-Engineering: Add projects to sqoop list when synced in clouddb - https://phabricator.wikimedia.org/T304632 (10JAllemandou) [17:35:41] 10Analytics, 10Data-Engineering, 10Data-Engineering-Kanban, 10Platform Engineering, 10Product-Analytics: AQS `edited-pages/new` metric does not make clear that the value is net of deletions - https://phabricator.wikimedia.org/T240860 (10EChetty) a:03mforns [17:37:36] 10Data-Engineering-Radar, 10Data-Services, 10cloud-services-team (Kanban): Reimage WMCS db proxies to Bullseye - https://phabricator.wikimedia.org/T298940 (10EChetty) [17:38:00] 10Data-Engineering, 10Data-Engineering-Kanban: Check home/HDFS leftovers of clarakosi - 
https://phabricator.wikimedia.org/T304065 (JAllemandou) Actually I messed up - it's not Leila we should have pinged but @WDoranWMF. Please excuse me for the communication noise :S
[17:38:04] Data-Engineering-Radar, Data-Services, cloud-services-team (Kanban): Upgrade clouddb* hosts to Bullseye - https://phabricator.wikimedia.org/T299480 (EChetty)
[17:40:47] Data-Engineering, Privacy Engineering: Investigate releasing historical top-pageview-per-country data - https://phabricator.wikimedia.org/T299627 (EChetty)
[17:44:33] Data-Engineering, Generated Data Platform, Metrics-Platform, Platform Engineering, User-Urbanecm: Access to aggregate User Agent statistics - https://phabricator.wikimedia.org/T298912 (EChetty) p:Triage→Medium
[17:45:46] Data-Engineering-Radar, SRE, Traffic: Lock-in Varnish and VarnishKafka versions - https://phabricator.wikimedia.org/T304617 (EChetty)
[17:48:07] Analytics, Data-Engineering, Data-Engineering-Kanban: Check home/HDFS leftovers of bumeh-ctr - https://phabricator.wikimedia.org/T300607 (JAllemandou) Thank you @KCVelaga_WMF :)
[17:53:20] Data-Engineering, Data-Engineering-Kanban: Refactor refinery-drop-mediawiki-snapshots so that it no longer uses a _SUCCESS file - https://phabricator.wikimedia.org/T303988 (EChetty) a:JAllemandou
[17:57:15] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1142.eqiad.wmnet with OS buster
[17:58:11] Analytics, Data-Engineering-Radar, MediaWiki-extensions-EventLogging, QuickSurveys, and 2 others: QuickSurveys should show an error when response is blocked - https://phabricator.wikimedia.org/T256463 (EChetty)
[17:58:48] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1143.eqiad.wmnet with OS buster
[17:59:14] Analytics, Data-Engineering-Radar, Event-Platform, Product-Analytics: [MEP] Determine how stream configuration is authored and deployed - https://phabricator.wikimedia.org/T269774 (EChetty)
[17:59:18] Analytics, Data-Engineering-Radar, MediaWiki-extensions-EventLogging, QuickSurveys, and 2 others: QuickSurveys should show an error when response is blocked - https://phabricator.wikimedia.org/T256463 (Milimetric) The fundamental problem here is that we're using an instrumentation pipeline for a...
[17:59:27] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1144.eqiad.wmnet with OS buster
[18:00:05] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1145.eqiad.wmnet with OS buster
[18:01:33] Analytics, Data-Engineering-Radar, Event-Platform, Product-Analytics: [MEP] Determine how stream configuration is authored and deployed - https://phabricator.wikimedia.org/T269774 (Ottomata) Relevant: https://docs.google.com/document/d/1OiOJ80yZT28sW2FcEacG4xIHDXCypaAJk-XavUXxHDA/edit?pli=1
[18:05:04] Analytics, Data-Engineering-Radar, Event-Platform, Metrics-Platform, Browser-Support-Microsoft-Edge: Problem with delay caused by intake-analytics.wikimedia.org - https://phabricator.wikimedia.org/T295427 (EChetty) a:jlinehan→phuedx
[18:08:09] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1146.eqiad.wmnet with OS buster
[18:08:16] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1147.eqiad.wmnet with OS buster
[18:10:12] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1148.eqiad.wmnet with OS buster
[18:15:17] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (Cmjohnson)
[18:19:15] (CR) Sharvaniharan: New schema for edit history screen interactions (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/772934 (https://phabricator.wikimedia.org/T304336) (owner: Sharvaniharan)
[18:26:41] Data-Engineering, DC-Ops, SRE, ops-eqiad, Patch-For-Review: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1142.eqiad.wmn...
[18:26:58] Data-Engineering, DC-Ops, SRE, ops-eqiad, Patch-For-Review: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1143.eqiad.wmn...
[18:28:20] Data-Engineering, DC-Ops, SRE, ops-eqiad, Patch-For-Review: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1145.eqiad.wmn...
[18:28:54] Data-Engineering, DC-Ops, SRE, ops-eqiad, Patch-For-Review: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1144.eqiad.wmn...
[18:29:07] (CR) Jsn.sherman: [C: +2] "Looks good, let's merge!" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/768014 (https://phabricator.wikimedia.org/T303740) (owner: Phuedx)
[18:29:51] (Merged) jenkins-bot: analytics/legacy/quicksurveyinitiation: Add editCountBucket property [schemas/event/secondary] - https://gerrit.wikimedia.org/r/768014 (https://phabricator.wikimedia.org/T303740) (owner: Phuedx)
[18:35:11] Data-Engineering, DC-Ops, SRE, ops-eqiad, Patch-For-Review: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1146.eqiad.wmn...
[18:39:26] (PS3) Sharvaniharan: New schema for edit history screen interactions [schemas/event/secondary] - https://gerrit.wikimedia.org/r/772934
[18:39:57] (CR) jerkins-bot: [V: -1] New schema for edit history screen interactions [schemas/event/secondary] - https://gerrit.wikimedia.org/r/772934 (owner: Sharvaniharan)
[18:39:59] (CR) Sharvaniharan: New schema for edit history screen interactions (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/772934 (owner: Sharvaniharan)
[18:41:45] (PS4) Sharvaniharan: New schema for edit history screen interactions [schemas/event/secondary] - https://gerrit.wikimedia.org/r/772934
[18:44:40] Data-Engineering, DC-Ops, SRE, ops-eqiad, Patch-For-Review: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1142.eqiad...
[18:54:48] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[18:58:30] (PS5) Sharvaniharan: New schema for measuring article screen interactions [schemas/event/secondary] - https://gerrit.wikimedia.org/r/772910
[19:02:17] Data-Engineering, DC-Ops, SRE, ops-eqiad, Patch-For-Review: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1142.eqiad.wmn...
[19:06:02] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[19:17:12] (CR) Sbisson: Create schemas for Wikistories instrumentation (3 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/773382 (https://phabricator.wikimedia.org/T287639) (owner: Neil P. Quinn-WMF)
[19:20:41] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1147.eqiad.wmnet with OS buster exec...
[19:22:03] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1148.eqiad.wmnet with OS buster exec...
[19:25:36] Data-Engineering, Data-Engineering-Kanban: Add alert for varnishkafka low/zero messages per second to alertmanager - https://phabricator.wikimedia.org/T300246 (Milimetric) Open→Declined I'm going to decline this for now in favor of a sister task to look into possibly using etcd config to make rob...
[19:25:38] Data-Engineering, Data-Engineering-Kanban: Some varnishkafka instances dropped traffic for a long time due to the wrong version of the package installed - https://phabricator.wikimedia.org/T300164 (Milimetric)
[19:33:19] (PS2) Vivian Rook: Expose history of query revisions [analytics/quarry/web] - https://gerrit.wikimedia.org/r/773578 (https://phabricator.wikimedia.org/T100982)
[19:35:57] Data-Engineering, Data-Engineering-Kanban: Spike: Investigate importing etcd config to help write robust data loss alerts - https://phabricator.wikimedia.org/T304651 (Milimetric)
[19:36:13] Data-Engineering, Data-Engineering-Kanban: Spike: Investigate importing etcd config to help write robust data loss alerts - https://phabricator.wikimedia.org/T304651 (Milimetric) p:Triage→High a:JAllemandou→Milimetric
[20:04:32] (PS2) Clare Ming: Add new enum value to webuiscroll schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/771700 (https://phabricator.wikimedia.org/T303297)
[20:07:28] Data-Engineering-Radar, SRE, Traffic: Lock-in Varnish and VarnishKafka versions - https://phabricator.wikimedia.org/T304617 (odimitrijevic)
[20:07:34] Data-Engineering, Data-Engineering-Kanban: Some varnishkafka instances dropped traffic for a long time due to the wrong version of the package installed - https://phabricator.wikimedia.org/T300164 (odimitrijevic)
[20:10:53] Data-Engineering, Data-Engineering-Kanban: Spike: Investigate creating robust alerts to notify that caching nodes are not sending traffic data - https://phabricator.wikimedia.org/T304651 (odimitrijevic)
[20:12:19] (CR) Jdlrobson: [C: +1] "This LGTM. Just want to check with Analytics what I believe to be true, that since we're just adding one enum value this is backwards comp" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/771700 (https://phabricator.wikimedia.org/T303297) (owner: Clare Ming)
[20:14:10] Data-Engineering, Data-Engineering-Kanban, Traffic: Spike: Investigate creating robust alerts to notify that caching nodes are not sending traffic data - https://phabricator.wikimedia.org/T304651 (odimitrijevic)
[20:38:53] (CR) Dzahn: "yep, that's unfortunate." [analytics/wikistats] - https://gerrit.wikimedia.org/r/316289 (https://phabricator.wikimedia.org/T64570) (owner: Paladox)
[22:32:57] (CR) Bearloga: New schema for edit history screen interactions (2 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/772934 (owner: Sharvaniharan)