[03:30:02] 10Data-Engineering, 10Data-Engineering-Kanban: reset kerberos password - https://phabricator.wikimedia.org/T303146 (10Effeietsanders) Thanks! [06:09:31] 10Data-Engineering, 10Data-Services, 10cloud-services-team (Kanban): Reimage WMCS db proxies to Bullseye - https://phabricator.wikimedia.org/T298940 (10Marostegui) [06:09:40] 10Data-Engineering, 10Data-Services, 10cloud-services-team (Kanban): Upgrade clouddb* hosts to Bullseye - https://phabricator.wikimedia.org/T299480 (10Marostegui) [10:07:30] 10Data-Engineering: Archiva's disk partiton space is getting filled up - https://phabricator.wikimedia.org/T304224 (10BTullis) Yes, I see. Thanks @elukey > We should probably either add another disk to the VM (or expand the current one) or clean up unused jars/artifacts. My first instinct would be to grow the... [10:08:30] 10Data-Engineering, 10Data-Engineering-Kanban: reset kerberos password - https://phabricator.wikimedia.org/T303146 (10BTullis) 05Open→03Resolved [10:10:19] 10Data-Engineering: Archiva's disk partiton space is getting filled up - https://phabricator.wikimedia.org/T304224 (10elukey) @BTullis in the past I've done it via the Archiva UI, if you are an admin (and you should be given your LDAP credentials) you have also the option of dropping artifacts. It is a bit tedio... [10:11:52] 10Data-Engineering, 10Data-Engineering-Kanban: Archiva's disk partiton space is getting filled up - https://phabricator.wikimedia.org/T304224 (10BTullis) p:05Triage→03Medium a:03BTullis [10:45:38] 10Data-Engineering, 10Data-Engineering-Kanban: Archiva's disk partiton space is getting filled up - https://phabricator.wikimedia.org/T304224 (10BTullis) I've manually removed all but the last 10 releases from most of the `org.wikimedia.analytics.refinery` projects, so this has dropped the usage markedly. ` bt... 
[10:49:49] (03PS3) 10Kosta Harlan: Homepage module: add events for topic toggle match mode button [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/772031 (https://phabricator.wikimedia.org/T301825) (owner: 10Sergio Gimeno) [10:52:11] (03CR) 10Kosta Harlan: Homepage module: add events for topic toggle match mode button (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/772031 (https://phabricator.wikimedia.org/T301825) (owner: 10Sergio Gimeno) [11:20:31] 10Data-Engineering, 10Data-Catalog, 10SRE, 10serviceops, and 2 others: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (10BTullis) I'm sorry to be a pain, but I'm under some pressure to implement this new service as soon as it's practicable, for which I really need help from #servi... [11:22:49] 10Data-Engineering, 10Data-Engineering-Kanban: Archiva's disk partiton space is getting filled up - https://phabricator.wikimedia.org/T304224 (10elukey) Nice! Something worth to follow up is the Archiva retention rules, in theory we should have some auto-clean up of old artifacts, maybe there is some misconfi... 
[11:31:11] 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban, 10Event-Platform, and 4 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (10phuedx) [11:35:07] 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban, 10Event-Platform, and 4 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (10phuedx) [11:35:37] 10Data-Engineering, 10Event-Platform, 10SRE, 10Traffic, and 2 others: Banner sampling leading to a relatively wide site outage (mostly esams) - https://phabricator.wikimedia.org/T303036 (10jbond) p:05Triage→03Medium [11:42:55] 10Data-Engineering, 10SRE, 10Traffic, 10Trust-and-Safety, 10serviceops: Disable GeoIP Legacy Download - https://phabricator.wikimedia.org/T303464 (10jbond) p:05Triage→03Medium [12:03:58] 10Data-Engineering, 10Data-Engineering-Kanban: Archiva's disk partiton space is getting filled up - https://phabricator.wikimedia.org/T304224 (10BTullis) > Something worth to follow up is the Archiva retention rules, in theory we should have some auto-clean up of old artifacts Yes this is interesting. I've sta... [12:26:55] btullis: oh, I don't know if that refinery release prune was safe! [12:27:12] because each oozie job references a specific version, some jobs haven't been touched in years [12:27:29] milimetric: Oh no! [12:27:30] the jars will still be on hdfs [12:27:34] so it's ok for now [12:27:46] but if we deploy and sync, if that deletes the old jars, then we lose them for good [12:28:26] Can we check which oozie jobs are referring to jars older than 10 releases ago? 
[12:28:37] we should probably freeze deployment somehow, and then take inventory of which jars we actually need, and either copy them from hdfs to archiva [12:29:04] yeah, it's kind of manual though, I'm not 100% sure if every job refers to it as "refinery_jar_version", it was just a loose convention [12:29:32] Argh, sorry for causing extra hassle. [12:29:37] np at all [12:29:50] it's not something you would've known, and we needed to clear the disk space [12:30:54] hm, `ag -G .properties refinery_jar_version` only finds 7 hits [12:31:42] 0.0.136, 0.0.141, 0.0.144, 0.1.2 are referenced that way, ... hm, gonna look a little closer [12:34:44] indeed, a bunch are referenced directly in the properties files, trying to find a regex is tricky [12:35:12] should all be refinery-{component}-{version}.jar [12:40:54] This might be crazy, but what if we just set them to the latest release? Is there anything in the recent releases that we think would be likely to break those jobs? [12:41:40] 10Data-Engineering, 10Data-Engineering-Kanban, 10ContentTranslation, 10Language-analytics, 10Product-Analytics: Abuse filter analytics dashboard is broken - https://phabricator.wikimedia.org/T302970 (10Ottomata) I'm worried that whatever is creating this data is not doing it as the correct user, and newl... [12:42:03] https://www.irccloud.com/pastebin/9YMbPdVQ/ [12:42:39] btullis: that's the whole list I could find, but some just use ${refinery_jar_version} [12:43:30] we could set them all to the latest... I'd say I have 80% confidence that would work. And it would be pretty painful to fix if it doesn't work, because it could mean corrupted data not just breaking jobs. [12:43:50] breaking jobs are easier in this case. So we have two options: [12:44:03] 1. generate unique list from the search results and reinstate those versions [12:44:32] 2. 
wait until jobs break and reinstate versions one by one (lazy approach and a little more efficient as we migrate jobs to Airflow) [12:44:53] k, gotta run the kids to school [12:45:05] 10Data-Engineering, 10Data-Catalog, 10SRE, 10serviceops, and 2 others: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (10Ottomata) > I haven't created TLS certificates for datahub.wikimedia.org I don't believe you will need a cert for this, IIUC it should use the wikimedia.org wil... [12:51:02] 10Data-Engineering, 10Data-Engineering-Kanban: Archiva's disk partiton space is getting filled up - https://phabricator.wikimedia.org/T304224 (10Ottomata) > I've manually removed all but the last 10 releases from most of the org.wikimedia.analytics.refinery We should check to see which ones are still referenc... [12:54:48] (03CR) 10Ottomata: [C: 03+1] "Nice! I didn't deeply review the code but sounds great to me. Good catch!" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/772027 (owner: 10Aqu) [13:03:48] btullis: we'd have to restart the jobs [13:03:52] to use the new versions [13:04:09] that's not so hard with refine and airflow [13:04:15] with oozie it's quite annoying and manual [13:04:48] (sorry don't have lots of backscroll, just got on IRC) [13:05:02] Should I manually take a copy of the files in hdfs:///wmf/refinery/current/artifacts/org/wikimedia/analytics/refinery in case we lose them? [13:07:36] Is there an easy way to re-create the artifacts in Archiva? I only see the jars present on hdfs and normally we have to upload the .pom with its jar, don't we? [13:07:58] Apologies for this. I should have waited for confirmation before deleting anything. [13:13:11] hmmm i think maybe there is a backup of archiva... 
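Editor's note on the inventory discussed between 12:28 and 12:44 (option 1, a unique list of referenced versions): the following is a minimal sketch only, not the search that was actually run. It assumes the two conventions mentioned in the chat, direct `refinery-{component}-{version}.jar` references and a `refinery_jar_version` property, and it assumes versions are plain `x.y.z` strings written with an `=` separator. The function names and the scanned directory are hypothetical.

```python
import re
from pathlib import Path

# Assumed patterns, taken only from the conventions mentioned in the chat:
#   refinery-{component}-{version}.jar   (jar referenced directly)
#   refinery_jar_version = {version}     (the loose property convention)
JAR_RE = re.compile(r"refinery-([a-z]+)-(\d+\.\d+\.\d+)\.jar")
PROP_RE = re.compile(
    r"^\s*refinery_jar_version\s*=\s*(\d+\.\d+\.\d+)\s*$", re.MULTILINE
)


def referenced_versions(text):
    """Return the set of refinery versions referenced in one file's text."""
    versions = {m.group(2) for m in JAR_RE.finditer(text)}
    versions.update(m.group(1) for m in PROP_RE.finditer(text))
    return versions


def inventory(root):
    """Scan every *.properties file under root; return unique versions, sorted numerically."""
    found = set()
    for path in Path(root).rglob("*.properties"):
        found |= referenced_versions(path.read_text(errors="replace"))
    return sorted(found, key=lambda v: tuple(int(p) for p in v.split(".")))
```

Calling `inventory("path/to/job/definitions")` (a placeholder path) would give the candidate list of versions to reinstate. Jar references written as `refinery-job-${refinery_jar_version}.jar` are deliberately not matched by the jar pattern; they are covered by the property pattern instead. Anything that deviates from these assumed conventions would still need a manual look.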
[13:13:29] ya no worries [13:14:29] yes in bacula [13:15:17] https://wikitech.wikimedia.org/wiki/Bacula#Restore_(aka_Panic_mode) [13:16:55] 10Data-Engineering, 10Data-Engineering-Kanban, 10ContentTranslation, 10Language-analytics, 10Product-Analytics: Abuse filter analytics dashboard is broken - https://phabricator.wikimedia.org/T302970 (10JAllemandou) >>! In T302970#7792361, @Ottomata wrote: > I'm worried that whatever is creating this data... [13:17:16] 10Data-Engineering, 10Data-Engineering-Kanban, 10ContentTranslation, 10Language-analytics, 10Product-Analytics: Abuse filter analytics dashboard is broken - https://phabricator.wikimedia.org/T302970 (10Ottomata) Okay, great! [13:17:52] (03CR) 10Ottomata: [C: 03+2] Add gobblin-wmf-core-1.0.1-jar-with-dependencies.jar [analytics/refinery] - 10https://gerrit.wikimedia.org/r/771693 (https://phabricator.wikimedia.org/T297939) (owner: 10Ottomata) [13:17:54] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Add gobblin-wmf-core-1.0.1-jar-with-dependencies.jar [analytics/refinery] - 10https://gerrit.wikimedia.org/r/771693 (https://phabricator.wikimedia.org/T297939) (owner: 10Ottomata) [13:18:00] ottomata: OK. Investigating the backups now. [13:18:16] thanks to luca for backups of archiva! [13:18:21] elukey: ^ :) [13:20:29] (03CR) 10Ottomata: [C: 03+2] Enable gobblin metric reporting to Prometheus via Prometheus PushGateway in test jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/771609 (https://phabricator.wikimedia.org/T294420) (owner: 10Ottomata) [13:20:31] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Enable gobblin metric reporting to Prometheus via Prometheus PushGateway in test jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/771609 (https://phabricator.wikimedia.org/T294420) (owner: 10Ottomata) [13:21:58] ohhh ya btullis i just tried to do a refinery deploy to hadoop-test, failures because of missing files in archiva [13:23:56] btullis: maybe easiest thing to do is full restore from latest backup? 
[13:24:19] ottomata: On it. [13:24:28] okay ty [13:31:46] 10Data-Engineering, 10Data-Engineering-Kanban: Archiva's disk partiton space is getting filled up - https://phabricator.wikimedia.org/T304224 (10BTullis) I am restoring the deleted files on archiva1002 in order to fix issues that affect deployed oozie jobs. ` 3,086 files selected to be restored. Run Restore... [13:32:23] +1 btullis i guess we might want to take archiva down while that runs? [13:32:27] it will restore the archiva db too i think [13:33:10] I've only selected the files that I deleted and it's going to restore them to /var/tmp/bacula-restores [13:33:37] oh okay [13:33:38] hm [13:33:50] 10Data-Engineering, 10Data-Engineering-Kanban: Archiva's disk partiton space is getting filled up - https://phabricator.wikimedia.org/T304224 (10dcausse) We use archiva to offer WDQS to third parties, for instance the `service` artifacts are being referenced from https://github.com/wmde/wikibase-docker/tree/ma... [13:34:17] I was hoping to copy them back to their location manually and run a directory scan. [13:34:36] Do you think I should stop this restore and select a full restore instead? [13:34:52] so iirc archiva knows what it has in its database in /var/lib/archiva/data [13:35:01] but i'm not certain [13:35:09] hm [13:35:17] but git-fat is separate from archiva itself [13:35:22] if you copy them back into their locations [13:35:29] and run the git-fat sync script [13:35:34] i believe the deploys will be fixed. [13:35:42] the artifacts will not show up in archiva UI [13:35:50] but i don't think that we will ever really care [13:36:01] so, ya i suppose that will work [13:37:26] 10Data-Engineering, 10Data-Engineering-Kanban: Archiva's disk partiton space is getting filled up - https://phabricator.wikimedia.org/T304224 (10Ottomata) > Should we consider this usecase for archiva or consider another place to distribute this software? Hm, up for discussion, but I'd say: use something else... [13:40:16] OK. 
I'll try with a small release to one of the groups first. I thought that putting the files in place beneath: /var/lib/archiva/repositories/releases/org/wikimedia/analytics/refinery/ and then running this might work, followed by the git-fat-rsync script. [13:40:46] https://usercontent.irccloud-cdn.com/file/Lj02TiDw/image.png [13:40:58] Oh, I've never seen that [13:41:02] didn't know that was a thing [13:41:04] okay give that a go [13:46:56] (03CR) 10Ottomata: [C: 03+2] build: Add brief "Getting started" guide [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/714874 (owner: 10Krinkle) [13:47:16] (03CR) 10Ottomata: [V: 03+2 C: 03+2] build: Add brief "Getting started" guide [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/714874 (owner: 10Krinkle) [13:47:36] (03Merged) 10jenkins-bot: build: Add brief "Getting started" guide [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/714874 (owner: 10Krinkle) [13:49:32] (03CR) 10Ottomata: [C: 03+2] build: Document simpler alternative contribution flow [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/714875 (https://phabricator.wikimedia.org/T290074) (owner: 10Krinkle) [13:51:05] 10Data-Engineering, 10Data-Catalog, 10SRE, 10serviceops, and 2 others: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (10JMeybohm) I'm trying to get back to this today/tomorrow. You don't need to create any TLS certificates and we can use Ingress for both, frontend and gms. [13:51:24] 10Data-Engineering-Radar, 10Privacy Engineering, 10Privacy: Privacy review for dataset publishing (Wikidata topic -> pageview data) - https://phabricator.wikimedia.org/T303304 (10Addshore) >>! In T303304#7779605, @Htriedman wrote: > My initial take is that this data release doesn't strike me as particularly... [13:54:13] ottomata: I've copied all of the files back in. They just show up automatically in the UI. I haven't yet run the git-fat-sync script. 
Do you want to check if it works without running the sync script? [13:54:55] wow [13:55:22] oh if you know, if you didn't delete symlinks from /var/lib/archiva/git-fat [13:55:28] i guess they'll still be there! [13:55:29] okay will try [13:56:11] 10Data-Engineering, 10Data-Engineering-Kanban: Archiva's disk partiton space is getting filled up - https://phabricator.wikimedia.org/T304224 (10BTullis) I have restored all of the deleted files from the refinery group. They appear in the UI again and we are back up to 92% of the disk's capacity. :-) Testing n... [14:11:16] (03CR) 10Joal: "Great catch! Minor comments - thanks @Aqu" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/772027 (owner: 10Aqu) [14:18:35] (03CR) 10Vivian Rook: [C: 03+1] compose: Add order to the startup [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/771844 (owner: 10David Caro) [14:20:01] (03PS1) 10Ottomata: Use --no-git-add in npm run build-modified script [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/772402 (https://phabricator.wikimedia.org/T290074) [14:20:19] (03CR) 10Ottomata: [C: 03+2] Use --no-git-add in npm run build-modified script [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/772402 (https://phabricator.wikimedia.org/T290074) (owner: 10Ottomata) [14:20:55] (03Merged) 10jenkins-bot: Use --no-git-add in npm run build-modified script [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/772402 (https://phabricator.wikimedia.org/T290074) (owner: 10Ottomata) [14:24:26] 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Un-fork analytics/gobblin - https://phabricator.wikimedia.org/T292396 (10JAllemandou) [14:24:30] 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban, 10Patch-For-Review: Add Jenkins job for gobblin-wmf jar release to archiva - https://phabricator.wikimedia.org/T297938 (10JAllemandou) 05Open→03Resolved [14:25:13] (03PS2) 10Vivian Rook: Update home to direct to profile [analytics/quarry/web] 
- 10https://gerrit.wikimedia.org/r/771352 (https://phabricator.wikimedia.org/T85175) [14:25:48] (03CR) 10Vivian Rook: Update home to direct to profile (031 comment) [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/771352 (https://phabricator.wikimedia.org/T85175) (owner: 10Vivian Rook) [14:52:45] 10Data-Engineering, 10Data-Engineering-Kanban: Archiva's disk partiton space is getting filled up - https://phabricator.wikimedia.org/T304224 (10BTullis) > For https://archiva.wikimedia.org/#artifact/org.wikidata.query.rdf/service: > > * all snapshots can be removed > * everything < 0.3 That's done now. >... [14:53:46] (03PS1) 10Ottomata: Dummy commit to do a redeployment [analytics/refinery] - 10https://gerrit.wikimedia.org/r/772412 [14:53:56] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Dummy commit to do a redeployment [analytics/refinery] - 10https://gerrit.wikimedia.org/r/772412 (owner: 10Ottomata) [14:56:50] (03CR) 10David Caro: [C: 03+1] "LGTM" [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/771352 (https://phabricator.wikimedia.org/T85175) (owner: 10Vivian Rook) [14:57:02] (03CR) 10David Caro: [C: 03+2] compose: Add order to the startup [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/771844 (owner: 10David Caro) [14:59:20] btullis: btw, i'm pretty sure it's fixed [14:59:22] thank you! [14:59:42] (03Merged) 10jenkins-bot: compose: Add order to the startup [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/771844 (owner: 10David Caro) [15:01:03] 10Data-Engineering: LVS in Analytics VLANs - https://phabricator.wikimedia.org/T288750 (10cmooney) Personally I think option 1 and 3 are the best here. Option 1 is relatively straightforward, after adding a bunch of new sub-interfaces to existing LVS in Eqiad for the new rows recently it doesn't seem to be such... [15:01:26] Great. Plus I learnt how to use our bacula setup, which is a win! 
[15:06:08] 10Data-Engineering, 10Data-Engineering-Kanban: Resume Webrequest Data Purge Job - https://phabricator.wikimedia.org/T303977 (10Milimetric) Merged and deployed with https://gerrit.wikimedia.org/r/c/operations/puppet/+/771389 [15:07:00] (03PS1) 10Ottomata: Change materialization method and update readme to match schemas/event/secondary [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/772418 (https://phabricator.wikimedia.org/T290074) [15:07:31] (03PS2) 10Ottomata: Change materialization method and update readme to match schemas/event/secondary [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/772418 (https://phabricator.wikimedia.org/T290074) [15:09:33] (03CR) 10Ottomata: [C: 03+2] Change materialization method and update readme to match schemas/event/secondary [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/772418 (https://phabricator.wikimedia.org/T290074) (owner: 10Ottomata) [15:10:08] (03Merged) 10jenkins-bot: Change materialization method and update readme to match schemas/event/secondary [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/772418 (https://phabricator.wikimedia.org/T290074) (owner: 10Ottomata) [15:11:18] 10Analytics, 10Data-Engineering, 10Data-Engineering-Kanban, 10Event-Platform, 10Patch-For-Review: Users should run explicit commands to materialize schema versions, rather than using magic git hooks - https://phabricator.wikimedia.org/T290074 (10Ottomata) Alright, done! I've updated wikitech documentati... [15:30:08] (03CR) 10Aqu: "Thanks both for the review." 
[analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/772027 (owner: 10Aqu) [15:32:35] 10Analytics, 10Analytics-Wikistats, 10Data-Engineering, 10Bengali-Sites: Ireland in Tagalog, Bengali and Urdu Wikipedia traffic breakdown - https://phabricator.wikimedia.org/T143254 (10Bodhisattwa) [15:44:22] 10Data-Engineering, 10Data-Engineering-Kanban, 10Data-Persistence (Consultation), 10Data-Services, and 2 others: Toolforge db: View 'fiwiki_p.flaggedrevs' references invalid table/column/rights to use them - https://phabricator.wikimedia.org/T302233 (10ops-monitoring-bot) Cookbook cookbooks.sre.wikireplica... [15:47:14] (03PS1) 10Ottomata: gobblin - don't use http:// in prometheus push gateway url [analytics/refinery] - 10https://gerrit.wikimedia.org/r/772426 [15:47:37] (03CR) 10Ottomata: [V: 03+2 C: 03+2] gobblin - don't use http:// in prometheus push gateway url [analytics/refinery] - 10https://gerrit.wikimedia.org/r/772426 (owner: 10Ottomata) [15:56:32] 10Data-Engineering, 10Data-Engineering-Kanban, 10Airflow: Investigate unifying SparkSQLRunner DAG templates - https://phabricator.wikimedia.org/T302391 (10mforns) [16:00:30] joal: I HAVE GOBBLIN METRICS IN GRAFANA IN TEST CLUSTER!!!!!!!!! [16:00:38] \o/ [16:00:44] gonna make a dashboard now [16:00:51] ottomata: this is great :) [16:02:29] actually no i'm going to make lunch now, THEN will make dashboards! [16:05:38] \o/ [16:08:16] (EventgateLoggingExternalLatency) firing: Elevated latency for GET events on eventgate-logging-external in codfw. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLoggingExternalLatency [16:08:50] hm - should I worry about that? ottomata, btullis --^ ? 
[16:08:58] no it is happening forever [16:13:15] (EventgateLoggingExternalLatency) resolved: Elevated latency for GET events on eventgate-logging-external in codfw. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLoggingExternalLatency [16:19:42] thank you ottomata for the refine reruns - I was going to do them now and saw our messages [16:20:06] np! :) [16:47:02] 10Data-Engineering, 10Data-Engineering-Kanban, 10Data-Persistence (Consultation), 10Data-Services, and 2 others: Toolforge db: View 'fiwiki_p.flaggedrevs' references invalid table/column/rights to use them - https://phabricator.wikimedia.org/T302233 (10ops-monitoring-bot) Cookbook cookbooks.sre.wikireplica... [16:57:15] howdy folks! o/ sorry to trouble y'all! I think I borked my jupyterhub server on stat1008. I was resetting my conda envs and deleted everything in ~/.conda/envs and forgot to do a server shutdown in jupyter control panel, so now when I go to http://localhost:8880 it takes me to http://localhost:8880/user/bearloga/lab? and it's just a blank page [16:58:03] (I have since then run `conda-create-stacked` and can run `source conda-activate-stacked` without problems.) [16:58:37] 10Data-Engineering, 10Data-Engineering-Kanban, 10Airflow: [Airflow] Research, discuss and decide on DAG/task dependencies VS. success/failure files (Oozie style) - https://phabricator.wikimedia.org/T301568 (10mforns) OK, I'm going to start this conversation :] I argue in favor of success files. * We need a... [17:15:59] 10Data-Engineering, 10Data-Engineering-Kanban, 10Data-Persistence (Consultation), 10Data-Services, and 3 others: Toolforge db: View 'fiwiki_p.flaggedrevs' references invalid table/column/rights to use them - https://phabricator.wikimedia.org/T302233 (10razzi) I kicked off the cookbook but unfortunately ran... 
[17:21:11] (03CR) 10Vivian Rook: [C: 03+2] Update home to direct to profile [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/771352 (https://phabricator.wikimedia.org/T85175) (owner: 10Vivian Rook) [17:25:04] (03Merged) 10jenkins-bot: Update home to direct to profile [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/771352 (https://phabricator.wikimedia.org/T85175) (owner: 10Vivian Rook) [17:28:59] 10Data-Engineering, 10Data-Engineering-Kanban, 10Data-Persistence (Consultation), 10Data-Services, and 3 others: Toolforge db: View 'fiwiki_p.flaggedrevs' references invalid table/column/rights to use them - https://phabricator.wikimedia.org/T302233 (10Marostegui) That table might not be that popular so if... [17:49:31] 10Data-Engineering, 10Data-Engineering-Kanban, 10Data-Persistence (Consultation), 10Data-Services, and 3 others: Toolforge db: View 'fiwiki_p.flaggedrevs' references invalid table/column/rights to use them - https://phabricator.wikimedia.org/T302233 (10razzi) @Marostegui It indeed works without depooling s... [17:51:30] 10Quarry, 10Patch-For-Review: Make "Home" navlink go to profile for logged-in users. - https://phabricator.wikimedia.org/T85175 (10rook) 05Open→03Resolved [17:52:45] joal: https://grafana.wikimedia.org/goto/5VvgbwPnz?orgId=1 :D [17:54:44] (03PS1) 10Ottomata: gobblin - Enable prometheus reporting for all jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/772449 (https://phabricator.wikimedia.org/T294420) [17:54:51] in hindsight it might have been nice to add a destination cluster label [17:54:54] of some kind [17:55:23] (03CR) 10Ottomata: [V: 03+2 C: 03+2] gobblin - Enable prometheus reporting for all jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/772449 (https://phabricator.wikimedia.org/T294420) (owner: 10Ottomata) [18:02:09] ottomata: would you be able to restart my jupyter server for me on stat1008, please? 
(see message from hour ago) [18:06:05] 10Data-Engineering, 10Data-Engineering-Kanban, 10Data-Persistence (Consultation), 10Data-Services, and 3 others: Toolforge db: View 'fiwiki_p.flaggedrevs' references invalid table/column/rights to use them - https://phabricator.wikimedia.org/T302233 (10razzi) Everything looks good, the only strange thing I... [18:10:56] !log sudo systemctl restart jupyter-bearloga-singleuser on stat1008 [18:10:58] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:11:09] bearloga: I went ahead and restarted it, lmk if that fixes it! [18:13:40] oh oops [18:13:42] i just did too [18:13:44] i just stopped it [18:13:53] bearloga: you should be able to start it from the UI [18:38:50] (03PS4) 10Sergio Gimeno: Homepage module: add events for topic toggle match mode button [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/772031 (https://phabricator.wikimedia.org/T301825) [18:41:43] (03CR) 10Sergio Gimeno: Homepage module: add events for topic toggle match mode button (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/772031 (https://phabricator.wikimedia.org/T301825) (owner: 10Sergio Gimeno) [18:49:55] ottomata razzi: thank you both! [18:50:09] anyone got time to help me with my java setup? I hit my 2 hour free trial limit [18:51:07] every *single* time I try to work on refinery-source, building works on either command line or JetBrains, but not both [18:53:45] ottomata ^ [18:54:58] 2 hour free trial limit??? [18:55:02] my archiva's set up as per https://wikitech.wikimedia.org/wiki/Archiva#Development, `mvn test` works, testing in JetBrains doesn't [18:55:18] i can try to help but after workout so 3:30? [18:55:25] def! [18:55:26] milimetric: does the absolute path to your clone of refinery-source contain any spaces? 
I ask because I couldn't build refinery-source for like a year (I think specifically the cassandra and/or spark stuff) and it was driving me nuts until finally I renamed the directory from "Analytics Refinery Source" to "analytics-refinery-source" – go figure [18:55:27] no rush [18:56:03] :) no, it doesn't, but many hugs for that pain bearloga, I've been there [18:58:35] 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban, 10Patch-For-Review: Send some existing Gobblin metrics to prometheus - https://phabricator.wikimedia.org/T294420 (10Ottomata) WIP dashboard!!! https://grafana.wikimedia.org/goto/rBSUYQPnk?orgId=1 [19:26:46] milimetric: heading to bc [19:48:07] (03CR) 10Kosta Harlan: [C: 03+1] Homepage module: add events for topic toggle match mode button (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/772031 (https://phabricator.wikimedia.org/T301825) (owner: 10Sergio Gimeno) [19:55:00] 10Data-Engineering: Pageview definition relies on X-Analytics to determine special pages - https://phabricator.wikimedia.org/T304362 (10Milimetric) [19:56:16] 10Data-Engineering: Pageview definition relies on X-Analytics to determine special pages - https://phabricator.wikimedia.org/T304362 (10Milimetric) [19:57:11] (03PS1) 10Milimetric: [WIP] Failing test shows bug with Special:Pages [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/772496 (https://phabricator.wikimedia.org/T304362) [20:01:33] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Failing test shows bug with Special:Pages [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/772496 (https://phabricator.wikimedia.org/T304362) (owner: 10Milimetric) [20:11:25] (03CR) 10Jdrewniak: [C: 03+2] Add WikipediaPortal to analytics/legacy [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/685513 (https://phabricator.wikimedia.org/T282012) (owner: 10Ottomata) [20:11:55] (03CR) 10Ottomata: ":)" [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/685513 
(https://phabricator.wikimedia.org/T282012) (owner: 10Ottomata) [20:13:04] (03Merged) 10jenkins-bot: Add WikipediaPortal to analytics/legacy [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/685513 (https://phabricator.wikimedia.org/T282012) (owner: 10Ottomata) [20:38:15] (03PS5) 10Sergio Gimeno: Homepage module: add events for topic toggle match mode button [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/772031 (https://phabricator.wikimedia.org/T301825) [20:39:42] (03CR) 10Sergio Gimeno: Homepage module: add events for topic toggle match mode button (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/772031 (https://phabricator.wikimedia.org/T301825) (owner: 10Sergio Gimeno) [21:24:36] 10Analytics, 10Data-Engineering, 10SRE: Also intake Network Error Logging events into the Analytics Data Lake - https://phabricator.wikimedia.org/T304373 (10CDanis) [22:19:16] (EventgateLoggingExternalLatency) firing: Elevated latency for POST events on eventgate-logging-external in eqiad. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLoggingExternalLatency [22:22:05] (03CR) 10MewOphaswongse: [C: 03+2] Homepage module: add events for topic toggle match mode button [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/772031 (https://phabricator.wikimedia.org/T301825) (owner: 10Sergio Gimeno) [22:22:41] (03Merged) 10jenkins-bot: Homepage module: add events for topic toggle match mode button [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/772031 (https://phabricator.wikimedia.org/T301825) (owner: 10Sergio Gimeno) [22:24:15] (EventgateLoggingExternalLatency) resolved: Elevated latency for POST events on eventgate-logging-external in eqiad. 
- https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLoggingExternalLatency [22:34:22] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Patch-For-Review, 10Readers-Web-Backlog (Kanbanana-FY-2021-22): WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (10Jdrewniak) hi @Ottomata , I've merged the schema to analytics/legacy and have a patch up for the por... [22:46:59] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Patch-For-Review, 10Readers-Web-Backlog (Kanbanana-FY-2021-22): WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (10Ottomata) Great! The schema gets deployed automatically, so it is out. https://schema.wikimedia.or... [22:53:07] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Patch-For-Review, 10Readers-Web-Backlog (Kanbanana-FY-2021-22): WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (10Ottomata) Oh, I found it: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/772507 :) [23:07:16] (EventgateLoggingExternalLatency) firing: Elevated latency for POST events on eventgate-logging-external in eqiad. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLoggingExternalLatency [23:08:48] 10Data-Engineering, 10MediaWiki-extensions-EventLogging: Non-deterministic unit test "streamInSample() - session sampling resets" - https://phabricator.wikimedia.org/T304379 (10matmarex) [23:12:15] (EventgateLoggingExternalLatency) resolved: Elevated latency for POST events on eventgate-logging-external in eqiad. 
- https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLoggingExternalLatency [23:26:54] (03CR) 10Vivian Rook: [C: 03+1] "I don't see where /metrics works as an endpoint in the code. Though it surely does in prod, not in dev. How is /metrics reached? Regardles" [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/756161 (owner: 10Majavah) [23:56:15] (EventgateLoggingExternalLatency) firing: Elevated latency for POST events on eventgate-logging-external in eqiad. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLoggingExternalLatency