[03:38:18] (DruidSegmentsUnavailable) firing: More than 30 segments have been unavailable for webrequest_sampled_128 on the druid_analytics Druid cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid/Alerts#Druid_Segments_Unavailable - https://grafana.wikimedia.org/dashboard/db/druid?refresh=1m&var-cluster=druid_analytics&panelId=49&fullscreen&orgId=1&var-cluster=druid_analytics - https://alerts.wikimedia.org
[03:38:18] (DruidSegmentsUnavailable) firing: More than 20 segments have been unavailable for webrequest_sampled_128 on the druid_analytics Druid cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid/Alerts#Druid_Segments_Unavailable - https://grafana.wikimedia.org/dashboard/db/druid?refresh=1m&var-cluster=druid_analytics&panelId=49&fullscreen&orgId=1&var-cluster=druid_analytics - https://alerts.wikimedia.org
[03:48:18] (DruidSegmentsUnavailable) resolved: More than 30 segments have been unavailable for webrequest_sampled_128 on the druid_analytics Druid cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid/Alerts#Druid_Segments_Unavailable - https://grafana.wikimedia.org/dashboard/db/druid?refresh=1m&var-cluster=druid_analytics&panelId=49&fullscreen&orgId=1&var-cluster=druid_analytics - https://alerts.wikimedia.org
[03:48:18] (DruidSegmentsUnavailable) resolved: More than 20 segments have been unavailable for webrequest_sampled_128 on the druid_analytics Druid cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid/Alerts#Druid_Segments_Unavailable - https://grafana.wikimedia.org/dashboard/db/druid?refresh=1m&var-cluster=druid_analytics&panelId=49&fullscreen&orgId=1&var-cluster=druid_analytics - https://alerts.wikimedia.org
[04:15:50] Analytics, Data-Engineering, Product-Analytics, Structured-Data-Backlog, and 3 others: Create a Commons equivalent of the wikidata_entity table in the Data Lake - https://phabricator.wikimedia.org/T258834 (AKhatun_WMF) a: AKhatun_WMF
[04:29:47] joal: Need some help on where to get started for T258834. Which repo?
[04:29:48] T258834: Create a Commons equivalent of the wikidata_entity table in the Data Lake - https://phabricator.wikimedia.org/T258834
[04:30:57] ok, analytics-refinery?
[04:32:30] yep
[07:51:56] https://github.com/RadeonOpenCompute/ROCm/issues/761#issuecomment-968613956 \o/ \o/
[08:07:28] Good morning :)
[08:10:19] tanny411: from your comment it looks like you found the code - anything else I can help with?
[08:15:10] joal: yeah, wondering how to test with commons? Looked at the code; it looks like simply running a spark job should work with the commons file path? Is there a small 2/3-entity json dump I can directly look into, instead of the whole one?
[08:16:08] tanny411: the code is indeed a spark job with a parameterized input path
[08:16:38] tanny411: I don't think there is a mini-dump for commons (nor for wikidata AFAIK) :S
[08:17:19] tanny411: a possibility is to mimic a small dump by extracting the top of the file (and verifying that the file closes with a ']')
[08:20:25] okay, thanks!
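A minimal sketch of the mini-dump trick joal describes, assuming the usual dump layout (a top-level JSON array with one comma-terminated entity per line); the source filename below is a hypothetical placeholder:

```python
import itertools

# Build a tiny test dump from the head of the real one, as suggested
# above. Assumes the dump is a JSON array with one entity per line;
# the source filename is a placeholder, not the real dump name.
SRC = "/mnt/data/xmldatadumps/public/commonswiki/entities/latest-mediainfo.json"
DST = "mini-dump.json"
N_ENTITIES = 3

with open(SRC) as src, open(DST, "w") as dst:
    next(src)                     # skip the opening "[" of the full dump
    dst.write("[\n")
    entities = list(itertools.islice(src, N_ENTITIES))
    # Each entity line ends with ","; the last one we keep must not.
    entities[-1] = entities[-1].rstrip().rstrip(",") + "\n"
    dst.writelines(entities)
    dst.write("]\n")              # close the array, as joal notes
```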
[08:45:35] Analytics-Radar, WMDE-GeoInfo-FocusArea, WMDE-TechWish-Sprint-2021-11-10: Review existing dashboards and metrics for maps - https://phabricator.wikimedia.org/T295315 (lilients_WMDE) a: lilients_WMDE
[09:11:01] joal: I can't seem to find the commons json dump. I only see ttl in /wmf/data/raw/commons
[09:27:35] Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (BTullis) All compactions from the 9th snapshot loading operation have completed. Starting the r...
[10:02:46] elukey: Re the druid alerts, the logic for alerting is slightly different from what we had before. We used to have a single check for segments unavailable. Now we have one alert per datasource.
[10:03:43] It checks over a period of 15 minutes. It used to sum all of the data sources and warn at 180, critical at 200. https://gerrit.wikimedia.org/r/c/operations/puppet/+/736280
[10:05:49] Under alertmanager, each datasource alerts independently and the thresholds are set at 20 for a warning, 30 for a critical: https://gerrit.wikimedia.org/r/c/operations/alerts/+/736279
[10:07:59] I don't yet understand *why* the datasources are reporting unavailable at all, so I'd like to get my head around that a bit before tweaking the thresholds.
[10:13:28] sounds good yes!
[10:47:45] tanny411: the json dumps import onto HDFS is not productionized yet
[10:48:07] tanny411: data is accessible on stat machines, in /mnt/data/xmldatadumps/public/commonswiki/entities
[10:48:51] oohh. oops! okay
[10:49:54] tanny411: will create a patch for the productionization of the hdfs import
[10:50:34] joal: I saw it on the dumps website, thought it was pumping out data everywhere like wikidata
[10:51:01] dumps are actually not generated nor served from hadoop as of now (shame)
[10:54:06] btullis: hello - let me know if you wish to brainstorm on druid alerts :)
[10:54:30] btullis: I didn't take any actions this weekend as the grafana graphs showed no real problem
[10:56:50] joal: Thanks. Yes, I don't think that there is a problem either, really. Nothing has changed about the ingestion; I think that the new alerts are probably just a bit too trigger-happy.
[10:57:03] right :)
[10:58:32] My reckoning is that the indexer jobs are sending their batch import tasks to the overlord, the overlord defines the segments, and it instructs the historical process to load them.
[11:00:09] The pattern that we see is probably just the short period of time when the historical process has not yet finished loading the segments. Most of the time the blips are in the region of 3-4 segments. The alerts over the weekend were brief periods of 30 segments, but they always resolved themselves very soon afterwards.
[11:01:42] my understanding is that the historical services' segment load is not related to indexing (or not directly, shall I say) - when an indexing job has finished, segments are made visible to the coordinator service, and the coordinator asks the historical services to load them - the late-segments metric is, as I see it, the difference between the time at which historicals have finished loading and the time at which segments are made visible
[11:03:19] Got it. I ought to try to work out what a prometheus query would look like for a dataset that was *stuck* - failing to load segments. Previously we just had the sum across *all* datasets and this evened out the effect of *late* segments.
[11:03:39] right
[11:04:24] btullis: I'm assuming the alert will fire at the beginning of the month when full reloads of some datasets are made (edit_hourly for instance)
[11:08:59] Analytics-Radar, WMDE-GeoInfo-FocusArea, WMDE-TechWish-Sprint-2021-11-10: Review existing dashboards and metrics for maps - https://phabricator.wikimedia.org/T295315 (lilients_WMDE)
[11:09:27] joal: Hmm, maybe. The only spikes I can see around the start of the month are on wmf_netflow though: https://grafana.wikimedia.org/d/000000538/druid?orgId=1&var-cluster=druid_analytics&var-cluster=druid_analytics&viewPanel=49&from=1635379201500&to=1636070397500 (unless I'm missing something)
[11:18:24] joal: I'll try to work on something that will use a longer time period to smooth out the spikes.
[11:19:33] works for me btullis - Interesting that edit_hourly doesn't show up here :)
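A hedged sketch of how such a smoothed query could be prototyped against the Prometheus HTTP API, using PromQL's real avg_over_time() window function so brief loading blips average out. The metric name, label names, and host below are hypothetical placeholders, not the real Druid exporter's names:

```python
import requests

# Hypothetical metric/label names and host -- the real Druid exporter
# metric and the WMF Prometheus endpoint may differ.
PROMQL = (
    'avg_over_time(druid_segments_unavailable'
    '{cluster="druid_analytics"}[1h]) > 30'
)

resp = requests.get(
    "http://prometheus.example.org/api/v1/query",  # placeholder host
    params={"query": PROMQL},
)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    # Only datasources whose hourly average stays above the threshold
    # (i.e. genuinely stuck, not a transient loading blip) show up here.
    print(series["metric"], series["value"])
```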
[11:22:52] tanny411: I sent that - https://gerrit.wikimedia.org/r/738874
[11:24:51] joal: that's great, thanks!
[12:45:07] heya team
[12:45:25] sorry ottomata, on Friday I didn't see your last message
[12:45:41] Hi mforns. :-)
[12:45:46] hey!
[12:50:41] joal: There are slight differences in commons json, such as: it doesn't have sitelinks or aliases, claims are called statements, and it additionally has lastrevid. Should I create a separate set of classes for commons, or simply make the present code more generic (everything is prefixed with Wikidata at the moment)?
[12:51:17] hm - good question tanny411
[12:52:07] tanny411: My understanding is that wikidata is a special case, while commons would be a generic case of structured data on wikis - maybe creating a new set of classes makes sense?
[12:53:23] joal: hmm... how about the classes common to both?
[12:53:33] also, what should I prefix things with?
[12:53:35] commons?
[12:54:23] I don't think commons is a good prefix - StructuredData?
[12:55:25] Perfect!
[12:56:08] Ah crap, I used StructuredData in the name of the file in the existing Wikidata code - we could rename that file to WikidataTableClasses for instance
[12:56:56] About classes that can be reused, let's extract them into a separate file
[12:57:27] We would have WikidataTableClasses, StructuredDataTableClasses, CommonDataTableClasses
[12:57:35] sounds ok tanny411 --^ ?
[12:59:13] Analytics, MediaWiki-REST-API, Story: System administrator reviews API usage by client - https://phabricator.wikimedia.org/T251812 (WDoranWMF)
[13:00:11] joal: It's called WikidataStructuredDataClasses
[13:00:25] is that what you meant?
[13:03:37] tanny411: the current file is named WikidataStructuredDataClasses, which conflicts with the StructuredData name - so I was saying we could rename that file WikidataTableClasses
[13:04:26] and that we could use that pattern (...TableClasses files) to organise
[13:09:31] Okay, using this format.
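For reference, a rough sketch of the shape difference being discussed and of what "more generic" handling could look like; the key names are assumptions pieced together from this conversation, not the authoritative dump schemas:

```python
# Non-authoritative sketch of the differences tanny411 lists above:
# commons entities lack sitelinks/aliases and say "statements" where
# wikidata says "claims".
def looks_like_commons(entity: dict) -> bool:
    return "statements" in entity and "sitelinks" not in entity

def normalized_claims(entity: dict) -> dict:
    """Fetch the claims map regardless of which dump flavor this is."""
    return entity.get("claims") or entity.get("statements") or {}
```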
[14:20:48] mforns: o/
[14:21:00] heya!
[14:21:11] so so so so! how else can I help atm?
[14:21:12] :)
[14:21:20] joal: same Q for you!
[14:21:21] Was about to push the script for review... can you have a look at that please?
[14:21:25] yes!
[14:21:28] ok, one sec
[14:21:48] ottomata: heya :)
[14:33:15] ottomata: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/commit/ddf4a326aa5308f42d781c827433808b52e049e7
[14:43:22] Analytics, Data-Engineering, Event-Platform, Platform Engineering, tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (Ottomata) > I suggest removing the "source of truth" wording from the proposal entirely > I ha...
[14:54:32] mforns: why is anaconda-wmf used?
[14:54:42] OHHHH
[14:54:48] sorry I see
[14:54:48] CONDA_BASE_ENV_PREFIX=$env_path
[14:54:49] cool
[14:54:54] interesting!
[14:55:22] hmmm.. yeah it could use the script name relative to PATH
[14:55:39] mforns: you want to use a stacked env so folks can install their own stuff into the env airflow is running from?
[14:55:43] I could remove /usr/lib/anaconda-wmf/ from the script calls
[14:56:25] ottomata: that would be cool
[14:56:48] mforns: no I mean
[14:56:50] why do you need a stacked env at all?
[14:56:55] but using the prod env is good enough for testing I imagine
[14:57:00] yeah that is my q
[14:57:23] can't you just run /usr/lib/airflow/bin/airflow to launch your dev scheduler etc.?
[14:57:42] I just need a regular environment, but the only conda installed on stat machines or airflow machines is those 3 scripts: conda-create-stacked, etc.
[14:58:06] we can install the airflow debian package there no problem
[14:58:20] I think you don't need to create a new conda environment
[14:58:34] but if people are testing a DAG that uses python packages
[14:58:35] ?
[14:59:01] the DAG or the job?
[14:59:08] the dag
[14:59:14] if the DAG needs more stuff, it's going to have to be installed in the airflow debian package env anyway
[14:59:21] but yeah, for testing, a conda env will let them do it
[14:59:31] in that case, yeah, you probably still shouldn't use a stacked env and anaconda
[14:59:35] you'll probably just want to do
[14:59:41] I see, but what if they are developing? Then the airflow debian package won't have that yet, no?
[14:59:45] right
[15:00:19] And I need a brand new Airflow db for this particular instance
[15:00:24] and config
[15:00:27] /usr/lib/airflow/condabin/conda create --clone /usr/lib/airflow -p ./airflow-dev0
[15:00:45] something like that...
[15:01:16] what is -p ./airflow-dev0 ?
[15:01:20] that just failed for me on airflow1002 because of some permissions
[15:01:24] path to the new conda env
[15:01:43] you don't need to use the weirdo create-stacked stuff for this; we only do that to avoid copying the whole HUGE anaconda env over and over again
[15:01:57] I see
[15:01:58] and also to allow us to have some power in upgrades without having to seek out all user dev conda envs
[15:02:01] in jupyter
[15:02:18] the airflow conda env just has the airflow-needed stuff, not the multi-GB anaconda stuff
[15:02:27] understand
[15:02:27] so you can just fully clone it using the regular conda CLI
[15:02:32] so just clone the airflow env
[15:02:37] lemme see how to do that properly...
[15:02:39] with the command you just used
[15:02:48] something like that, but it just failed for me... trying
[15:12:06] ok mforns it does work, but
[15:12:15] it has to download the packages from the internet
[15:12:30] but just the first time right? I think that is fine
[15:12:31] because the airflow conda env the debian package installs does not include any conda .pkg files
[15:12:36] each time an env is created, yes
[15:12:51] but the env can stay there, and be reused
[15:13:04] y
[15:13:08] k
[15:13:12] :]
[15:13:22] will change
[15:13:34] ok
[15:13:36] CONDA_PKGS_DIRS=$HOME/.conda/pkgs /usr/lib/airflow/condabin/conda create --name airflow-dev2 --clone /usr/lib/airflow
[15:13:55] and
[15:14:10] actually, CONDA_PKGS_DIRS is already set for anyone on a stat box that has ever created a stacked anaconda env
[15:14:25] but it can't hurt to set it manually when creating the airflow env
[15:14:33] mforns: you'll also need http_proxy set
[15:14:37] since it will get packages from the internet
[15:14:51] will add other comments on gitlab...
[15:15:23] right, was doing it with pip, will re-add
[15:15:39] thanks!
[15:20:16] mforns: commented on gitlab... I think?
[15:20:47] hehe, checking
[15:23:19] Analytics, Data-Engineering, Event-Platform, Platform Engineering, tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (awight) Just following up because I may have been wrong to introduce the idea of "a" source of...
[15:23:34] ottomata: I can not see the comments :( I think it's my fault, I should have created a merge request; just did: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/3
[15:26:53] mforns: can you see https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/3#note_1886
[15:26:53] ?
[15:29:40] mforns: ^
[15:29:45] Analytics, Data-Engineering, Event-Platform, Platform Engineering, tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (Ottomata) > which becomes problematic if a revision is deleted If it is deleted without a corr...
[15:30:44] mforns: can I help with more deployment/dependencies stuff? perhaps I can start working on some of that tooling?
[15:31:14] maybe on the Artifact/Dependency python class and the parts of that that sync to HDFS?
[15:38:06] Analytics-Radar, WMDE-GeoInfo-FocusArea, WMDE-TechWish-Sprint-2021-11-10: Review existing dashboards and metrics for maps - https://phabricator.wikimedia.org/T295315 (lilients_WMDE)
[15:52:01] ottomata: yes! once I finish the script I can pair with you if you want, or we can split work, but yeah, the next step is the Dependency class and the sync script to be called by scap
[15:52:33] yeah let's talk after meetings today and see how we can work together.
[15:52:44] actually, I could talk now before standup if you like!
[15:55:13] ottomata: ok! omw
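A rough sketch of what that Artifact/Dependency class might look like — the class name, fields, and hdfs CLI usage below are assumptions, not whatever eventually lands in airflow-dags:

```python
import subprocess
from dataclasses import dataclass

# Sketch only: names, fields, and behavior are assumptions, not the
# eventual airflow-dags implementation.
@dataclass
class Artifact:
    name: str        # e.g. "refinery-job-0.1.23.jar" (hypothetical)
    local_path: str  # where the deploy checkout puts it
    hdfs_dir: str    # target directory on HDFS

    @property
    def hdfs_path(self) -> str:
        return f"{self.hdfs_dir}/{self.name}"

    def sync_to_hdfs(self) -> None:
        """Upload the artifact unless it is already present on HDFS."""
        present = subprocess.run(
            ["hdfs", "dfs", "-test", "-e", self.hdfs_path]
        ).returncode == 0
        if not present:
            subprocess.run(
                ["hdfs", "dfs", "-put", self.local_path, self.hdfs_path],
                check=True,
            )
```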
[16:35:03] Analytics-Clusters, DC-Ops, SRE, ops-eqiad: (Need By: TBD) rack/setup/install an-test-coord1002 - https://phabricator.wikimedia.org/T293938 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-test-coord1002.eqiad.wmnet with OS bullseye
[16:35:38] Analytics-Clusters, DC-Ops, SRE, ops-eqiad: (Need By: TBD) rack/setup/install an-test-coord1002 - https://phabricator.wikimedia.org/T293938 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-test-coord1002.eqiad.wmnet with OS bullseye executed wit...
[16:37:04] !log Rerun failed mediawiki-wikitext-history-wf-2021-10
[16:37:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:49:28] Analytics-Clusters, DC-Ops, SRE, ops-eqiad: (Need By: TBD) rack/setup/install an-test-coord1002 - https://phabricator.wikimedia.org/T293938 (Cmjohnson) updated firmware, updated dns in netbox. Running into errors with the install script.
[17:00:57] (CR) MewOphaswongse: [C: +2] LinkSuggestionInteraction: Add qualitygate_dialog [schemas/event/secondary] - https://gerrit.wikimedia.org/r/737489 (https://phabricator.wikimedia.org/T274325) (owner: Kosta Harlan)
[17:01:56] ottomata: whenever you have time later, just ping me :]
[17:02:02] (Merged) jenkins-bot: LinkSuggestionInteraction: Add qualitygate_dialog [schemas/event/secondary] - https://gerrit.wikimedia.org/r/737489 (https://phabricator.wikimedia.org/T274325) (owner: Kosta Harlan)
[17:02:15] k mforns in about 1 hour, maybe a lil more
[17:02:36] k
[18:25:50] Analytics, Data-Engineering, Data-Engineering-Kanban, Desktop Improvements, and 2 others: Add agent_type and access_method to event data - https://phabricator.wikimedia.org/T294246 (odimitrijevic)
[18:45:03] joal: the wikidata json dump converter is missing `qualifier-order`, is this intended?
[18:45:36] this is not intended, tanny411 - it's a bug
[18:45:52] I'll add it then
[18:45:57] <3 thank you :)
[18:46:01] :D
[18:47:40] (PS1) Razzi: [wip] Start of new presto query logger schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/738987
[18:48:06] (PS2) Razzi: [wip] Start of new presto query logger schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/738987 (https://phabricator.wikimedia.org/T269832)
[18:49:18] (PS3) Razzi: [wip] Start of new presto query logger schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/738987 (https://phabricator.wikimedia.org/T269832)
[18:49:34] (CR) Razzi: "Hi Andrew, Joseph and I have copied the test schema to a new presto_query schema, are we on the right track?" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/738987 (https://phabricator.wikimedia.org/T269832) (owner: Razzi)
[18:51:24] (CR) jerkins-bot: [V: -1] [wip] Start of new presto query logger schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/738987 (https://phabricator.wikimedia.org/T269832) (owner: Razzi)
[18:58:13] Analytics, Data-Engineering, Data-Engineering-Kanban, Desktop Improvements, and 2 others: Add agent_type and access_method to event data - https://phabricator.wikimedia.org/T294246 (odimitrijevic) Prioritizing the work after conversation with @Ottomata. Let's aim to complete by 11/29 when the web...
[19:12:34] joal: also lastrevid
[19:13:56] tanny411: ah, right - similarly to qualifier-order, this field must have been added after the job got released, and never got added back
[19:14:37] okay, will add it. I would love to check what other fields are there
[19:15:01] could not find a way to run through all entities and check for the fields
[19:15:46] so far I am grepping for the keys I know
[19:16:10] joal: ^
[19:18:21] tanny411: hm, I can't think of any easy way in spark :(
[19:19:35] Yes
[19:24:16] tanny411: I assume you could extract the full field hierarchy with the same pattern as the one used in the conversion job (read as string, remove end of line, parse json)
[19:24:27] and then get the field names from the json object
[19:26:16] joal: yes, I did that with spark. It starts listing all the properties Pxx as well, since those are keys in the json. It is possible, but will take some work to filter etc. I think I will do that again after I've run a test with commons data
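A sketch of that idea in PySpark (the converter discussed here lives in refinery as a Scala Spark job; this is only an illustration, and the input path is a placeholder). It walks each parsed entity recursively and collapses Pxx/Qxx/Mxx-style map keys into one bucket so they don't drown out the real field names:

```python
import json
import re

from pyspark.sql import SparkSession

ID_KEY = re.compile(r"^[PQM]\d+$")  # property/item/mediainfo ids used as keys

def walk_keys(obj, prefix=""):
    """Recursively yield dotted key paths, bucketing Pxx-style keys."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            name = "<id>" if ID_KEY.match(key) else key
            path = f"{prefix}.{name}" if prefix else name
            yield path
            yield from walk_keys(value, path)
    elif isinstance(obj, list):
        for item in obj:
            yield from walk_keys(item, prefix)

def entity_keys(line):
    line = line.strip().rstrip(",")
    if line in ("[", "]", ""):
        return []                  # skip the array framing lines
    return list(walk_keys(json.loads(line)))

spark = SparkSession.builder.getOrCreate()
fields = (spark.sparkContext
          .textFile("/tmp/mini-dump.json")  # placeholder input path
          .flatMap(entity_keys)
          .distinct()
          .collect())
print("\n".join(sorted(fields)))
```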
[19:27:25] ack tanny411, makes sense
[19:37:01] (CR) Nray: Restore ReadingDepth schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/737527 (https://phabricator.wikimedia.org/T294777) (owner: Jdlrobson)
[20:01:48] (CR) Clare Ming: Restore ReadingDepth schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/737527 (https://phabricator.wikimedia.org/T294777) (owner: Jdlrobson)
[20:06:41] Analytics, Code-Health-Objective, Epic, Platform Engineering Roadmap, and 2 others: AQS 2.0 - https://phabricator.wikimedia.org/T263489 (Eevans)
[21:36:02] Analytics, Data-Engineering, Data-Engineering-Kanban, Desktop Improvements, and 2 others: Add agent_type and access_method to event data - https://phabricator.wikimedia.org/T294246 (Ottomata) Hey y'all, after a conversation with @Milimetric and @mforns, we realized that adding these fields are no...
[21:40:18] (CR) Clare Ming: Restore ReadingDepth schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/737527 (https://phabricator.wikimedia.org/T294777) (owner: Jdlrobson)
[22:37:52] (PS1) Clare Ming: Update web_ui_reading_depth schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/739016 (https://phabricator.wikimedia.org/T294777)
[23:08:13] Analytics, Data-Engineering, Data-Engineering-Kanban, Desktop Improvements, and 2 others: Add agent_type and access_method to event data - https://phabricator.wikimedia.org/T294246 (cjming) thanks @Ottomata for the updates (cc @jwang) @ovasileva I can work on getting `access_method` added to the...
[23:18:46] (CR) Nray: Update web_ui_reading_depth schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/739016 (https://phabricator.wikimedia.org/T294777) (owner: Clare Ming)
[23:22:07] (CR) Clare Ming: Update web_ui_reading_depth schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/739016 (https://phabricator.wikimedia.org/T294777) (owner: Clare Ming)
[23:25:08] (CR) Nray: Update web_ui_reading_depth schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/739016 (https://phabricator.wikimedia.org/T294777) (owner: Clare Ming)
[23:37:27] (PS2) Clare Ming: Update web_ui_reading_depth schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/739016 (https://phabricator.wikimedia.org/T294777)
[23:38:24] (CR) Clare Ming: Update web_ui_reading_depth schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/739016 (https://phabricator.wikimedia.org/T294777) (owner: Clare Ming)