[02:34:04] 10Analytics, 10Analytics-Wikistats: Translations? - https://phabricator.wikimedia.org/T287661 (10Sabeloga) Nice, thanks, much appreciated! :D
[06:24:05] razzi: re: > "restbase" is the aqs cassandra cluster, right?
[06:25:19] razzi: Nope :) In the cassandra cluster's dashboard you need to select the analytics datasource (we have a separate prometheus instance) and then you'll see "aqs" as cluster (that should also be present among the options of the cassandra cookbook)
[06:25:47] razzi: the restbase cluster is the one managed by SRE
[06:26:31] for sre.aqs.roll-restart aqs
[06:26:49] we basically use the canary to safely test the new druid mw history snapshot
[06:27:14] (so the cookbook depools one aqs node, restarts nodejs and asks the operator to test locally)
[06:27:38] if you have doubts/etc.. ping me anytime!
[07:58:14] Hi, is the watchlist dump available? There is information about this table (https://www.mediawiki.org/wiki/Manual:Watchlist_table) but I cannot find the dump
[08:12:59] wences91: watchlist contents are considered private, so contents of it are not available in any public dumps
[08:14:04] ok, thanks! majavah
[09:53:29] (03CR) 10Svantje Lilienthal: "This change is ready for review." [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/709007 (https://phabricator.wikimedia.org/T287578) (owner: 10Svantje Lilienthal)
[10:07:07] 10Analytics, 10Analytics-Kanban, 10Platform Engineering, 10Research, 10Patch-For-Review: Create airflow instances for Platform Engineering and Research - https://phabricator.wikimedia.org/T284225 (10BTullis) @Ottomata - FYI I spotted this on an-test-coord1001 this morning. ` Warning: /Stage[main]/Profil...
[10:12:28] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Add analytics-presto.eqiad.wmnet CNAME for Presto coordinator failover - https://phabricator.wikimedia.org/T273642 (10BTullis) Here are some fragments the first puppet run on an-test-coord1001.eqiad.wmnet after the patch was merged. I'm concerned...
[10:17:30] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Add analytics-presto.eqiad.wmnet CNAME for Presto coordinator failover - https://phabricator.wikimedia.org/T273642 (10BTullis) The service was restarted automatically, but a Kerberos related error was generated in the logs. ` Jul 30 09:06:34 an-te...
[10:28:04] btullis: o/
[10:28:20] Hi elukey.
[10:28:33] I think that http.server.authentication.krb5.principal-hostname may not be needed, Presto IIRC tends to be really upset about unused configs
[10:28:43] does it work if you remove it manually and restart?
[10:28:51] (curious now :P)
[10:29:23] I will try now. I think that parameter was added in a version of presto later than ours.
[10:29:26] > http.server.authentication.krb5.principal-hostname was added in Presto 302
[10:29:33] ahhh there you go
[10:29:47] just to add confusion, there are two Prestos out there
[10:29:59] 1) Prestodb (the one from facebook that we use)
[10:30:05] 2) PrestoSql, now called "Trino"
[10:30:22] and they have completely different configs and docs
[10:31:45] It appears that there is still an issue with the setting removed.
[10:31:48] > Jul 30 10:30:53 an-test-coord1001 presto-server[45006]: 2021-07-30T10:30:53.277Z ERROR Announcer-2 com.facebook.airlift.discovery.client.Announcer Service announcement failed after 37.13ms. Next request will happen within 1000.00ms
[10:32:49] > uncer-0 com.facebook.airlift.discovery.client.Announcer Cannot connect to discovery server for announce: Announcement failed for https://analytics-test-presto.eqiad.wmnet:8281
[10:33:22] same error on an-test-presto1001
[10:33:48] I bet that there is a TLS error
[10:33:54] Ah right. I was looking at Trino. Will look again at the facebook one.
[10:34:34] the /etc/presto/log.properties has INFO for logging, maybe DEBUG could give us more info, but it will spam a lot :)
[10:34:52] Yes, if it's TLS it might be related to the permissions of the certificate files. Do you think I need to revert while I investigate, or is it safe for me to work on the test cluster like this?
[10:35:00] it is fine
[10:35:29] going afk for lunch, have fun :)
[10:35:38] Will do. Thanks.
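A minimal sketch of the DEBUG suggestion above: PrestoDB's /etc/presto/log.properties uses a name=level format, but the exact logger name for the airlift discovery client is a guess based on the class shown in the error, so treat this as illustrative rather than the change that was actually made.

    # on an-test-coord1001, raise logging for the discovery/announcer code path
    echo 'com.facebook.airlift.discovery=DEBUG' | sudo tee -a /etc/presto/log.properties
    sudo systemctl restart presto-server
    # then watch for the underlying TLS error behind the announcement failures
    sudo journalctl -u presto-server -f | grep -iE 'ssl|tls|announce'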
[10:52:49] (03CR) 10Awight: added template wizard sessions (038 comments) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/709007 (https://phabricator.wikimedia.org/T287578) (owner: 10Svantje Lilienthal)
[10:57:29] Looks like the `sslcert::x509_to_pkcs12` didn't fire properly here: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/manifests/presto/server.pp#110
[10:57:56] ...because /etc/presto/ssl/server.p12 didn't get recreated as it was supposed to and still contains only the puppet certificate.
[11:55:16] The certificate wasn't generated because it failed the 'unless' test here: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/sslcert/manifests/x509_to_pkcs12.pp#31 - i.e. it won't overwrite an existing valid keystore.
[12:05:31] 10Analytics-Clusters, 10Analytics-Kanban, 10User-MoritzMuehlenhoff: Reduce manual kinit frequency on stat100x hosts - https://phabricator.wikimedia.org/T268985 (10MoritzMuehlenhoff) Some random thoughts here, some of those are wild guesses/wishful thinking since I haven't looked at krenew in detail yet :-)...
[12:47:24] btullis: ahh nice! Does it work now??
[12:49:50] Yes, I think so. The presto coordinator starts when I remove `http.server.authentication.krb5.principal-hostname` and when I manually execute the `openssl pkcs12` command that I had intended puppet to run.
[12:51:14] However, I forgot to set `profile::presto::server::generate_certificate: true` on an-test-presto1001, so that is still trying to use the puppet certificate.
[12:51:19] btullis: if there is an issue for x509_to_pkcs12 can you raise a bug, however I'm not sure I'll be able to get to it today (which is my last day before vacation)
[12:55:29] Cool, thanks jbond: I can make an iterative patch to get presto working in the test cluster anyway, with a manual delete/move of the .p12 file. Would you prefer that I just raise a bug for x509_to_pkcs12 or have a go at a patch too?
[12:56:17] I don't think that there would be any hurry to merge it, so you could look at it when you're back anyway.
[12:58:05] sure if you want to have a go at the patch please do :)
[13:00:42] Cool, thanks. I'll do the presto fixes first and check that everything else is OK with this method. When I create a bug report, do I tag it with Infrastructure Foundations?
[13:01:49] if you tag it puppet it should automatically add the Infrastructure Foundations one (you can of course also add it manually) also please add me
[13:02:59] 👍
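For reference, the manual step described above (moving the stale keystore aside and regenerating /etc/presto/ssl/server.p12 so the puppet `unless` check no longer blocks it) would look roughly like this. The certificate and key paths are placeholders, not the actual paths puppet uses, and the export password handling is simplified.

    # move the stale keystore aside so it can be regenerated
    sudo mv /etc/presto/ssl/server.p12 /etc/presto/ssl/server.p12.old
    # bundle the new certificate and key into a PKCS#12 keystore
    # (/path/to/... are placeholders for the real cert/key locations)
    sudo openssl pkcs12 -export \
        -in /path/to/analytics-test-presto.crt \
        -inkey /path/to/analytics-test-presto.key \
        -name presto -out /etc/presto/ssl/server.p12 \
        -passout pass:changeit
    sudo systemctl restart presto-server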
[13:03:14] If I was using kafka-main1001.eqiad.wmnet to look at jobs when we were in eqiad, any idea what host i should be using now we are in codfw?
[13:07:54] addshore: I would guess at `kafka-main2001.codfw.wmnet` (from here: https://github.com/wikimedia/puppet/blob/HEAD/hieradata/common.yaml#L644)
[13:08:31] (03PS3) 10Svantje Lilienthal: added template wizard sessions [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/709007 (https://phabricator.wikimedia.org/T287578)
[13:08:57] btullis: thanks, that indeed looks right
[13:09:18] a pleasure
[13:12:15] (03CR) 10Svantje Lilienthal: "Thanks! I hope I got everything." (038 comments) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/709007 (https://phabricator.wikimedia.org/T287578) (owner: 10Svantje Lilienthal)
[13:46:09] 10Analytics, 10Analytics-Kanban, 10Platform Engineering, 10Research, 10Patch-For-Review: Create airflow instances for Platform Engineering and Research - https://phabricator.wikimedia.org/T284225 (10Ottomata) Oh thanks, cool, that's from when we moved this instance over to an-test-client for kerberos rea...
[13:58:03] 10Analytics, 10Analytics-Wikistats: wikistats: montly pageview dumps are not bz2 files - https://phabricator.wikimedia.org/T287684 (10Radim.kubacki) BTW: Parquet compression would be significantly more effective if the line was splitted into its parts, i.e. with fields for wiki code, article, pageId, type, cou...
[13:58:31] OK, presto is working again on the test cluster, using the new CNAME alias and matching Kerberos principal.
[13:59:33] \o/
[13:59:44] does it work also with a simple query from an-test-client?
[14:00:29] https://www.irccloud.com/pastebin/KVaS5rIm/
[14:00:47] Think so.
[14:01:08] yep just tested, all working!
[14:06:17] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Add analytics-presto.eqiad.wmnet CNAME for Presto coordinator failover - https://phabricator.wikimedia.org/T273642 (10BTullis) A few patches later, presto is working again in the test cluster. We discovered that there is a peculiarity with the `s...
[14:06:56] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Add analytics-presto.eqiad.wmnet CNAME for Presto coordinator failover - https://phabricator.wikimedia.org/T273642 (10BTullis) I have verified that a simple query works from an-test-client1001. ` btullis@an-test-client1001:~$ presto --catalog ana...
[14:10:12] btullis: btw i was thinking...if you wanted to actually test failover in the test cluster, we could create a new ganeti vm to be an-test-coord1002
[14:11:14] addshore: you can use either main codfw or eqiad
[14:11:19] all the topic data exists in both
[14:11:32] unless you are looking at consumer offset lag metrics or something
[14:11:46] you could also even use jumbo! the topic data is there too
[14:12:11] https://wikitech.wikimedia.org/wiki/Kafka#Kafka_Clusters
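A quick way to sanity-check the host question above is to point kafkacat at the codfw main broker and confirm the job topics are visible there. This assumes kafkacat is installed and the broker answers metadata requests on its default plaintext port, as the beta-cluster examples later in this log do; the topic name is only illustrative.

    # list topics on the codfw main cluster and look for job topics
    kafkacat -L -b kafka-main2001.codfw.wmnet | grep 'mediawiki.job' | head
    # tail the last few messages of one topic (topic name is a made-up example)
    kafkacat -C -b kafka-main2001.codfw.wmnet -t codfw.mediawiki.job.refreshLinks -o -5 -e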
[14:14:20] 10Analytics, 10Analytics-Wikistats: wikistats: montly pageview dumps are not bz2 files - https://phabricator.wikimedia.org/T287684 (10fdans) p:05Triage→03High My apologies for this. The intended format is bz2, not parquet. Clearly a miss of mine when configuring the job, looking into options to regenerate/...
[14:15:55] ottomata: interesting. Yes I hadn't thought of that option. It would give us a means of testing some of the other cluster services as well.
[14:17:00] I've been a big user of corosync/pacemaker in the past for HA services, but we don't use that at all here, do we?
[14:18:54] one thing that I am wondering about presto is how a failover affects the workers
[14:19:29] they do periodically advertise their presence to the query manager
[14:19:56] so in case of a failover, the new query manager is probably unaware of workers
[14:20:12] and needs to get some time to get up to speed
[14:20:28] that is completely fine in my opinion, maybe we could figure out what this timeframe is
[14:20:34] elukey: iirc (and i might not), all prestos could be query managers?
[14:21:19] ottomata: no idea, but the workers need to advertise themselves anyway even if all are query managers
[14:21:38] I don't recall any fixed list of presto workers
[14:22:22] (this is why I was wondering about the failover)
[14:24:12] hm i think you are right
[14:24:15] "When a Presto worker process starts up, it advertises itself to the discovery server in the coordinator, which makes it available to the Presto coordinator for task execution."
[14:24:19] https://prestodb.io/docs/current/overview/concepts.html
[14:25:15] although the 'discovery server' can be run separately from the coordinator if needed
[14:25:16] https://prestodb.io/docs/current/installation/deployment.html
[14:27:28] (03CR) 10Awight: "Looks right—please smoke test on a stat* server at your convenience." (031 comment) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/709007 (https://phabricator.wikimedia.org/T287578) (owner: 10Svantje Lilienthal)
[14:28:15] yeah interesting elukey indeed what happens.... i guess the workers will just start registering themselves with the cname addy
[14:28:41] hmmm maybe what we need is a forwarding or multiplexing discovery addy!
[14:28:54] so that both discovery servers on both coord nodes get all worker registrations
[14:30:28] The only open source option I've found for HA with current versions of prestodb uses a proxy to forward to an active coordinator, with a standby: https://coding-stream-of-consciousness.com/2018/12/29/presto-coordinator-high-availability-ha/
[14:30:49] But they still say "Any active queries at the time a coordinator fails will fail though – we can't do anything about that unless Presto starts supporting HA internally."
[14:31:57] https://stackoverflow.com/questions/63701904/presto-coordinator-does-not-have-support-for-high-availabiltiy
[14:32:02] yeah active queries failing is fine
[14:32:07] we can't avoid that and it won't be a big deal
[14:35:45] interesting btullis, that setup sounds much cleaner but more complicated than our dns cname thing
[14:35:54] our cname may be ok for our purposes
[14:36:01] 10Analytics, 10EventStreams: stream-beta.wmflabs.org seems broken (can't see my mediawiki-create events) or anything else - https://phabricator.wikimedia.org/T287760 (10Addshore)
[14:36:03] but it might need some testing of what luca is wondering
[14:36:04] corosync/pacemaker is mentioned as a workable automatic failover mechanism here (in addition to the haproxy option): https://github.com/prestodb/presto/issues/3918#issuecomment-441196092
[14:36:18] what will the workers do when the discovery server changes?
they should just send traffic to the new coord
[14:36:28] and then it will see the workers
[14:36:31] but it might take several minutes
[14:36:39] which...with a dns change is true anyway
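Two checks could answer the "what will the workers do" question empirically. This is only a sketch, assuming the stock PrestoDB config.properties layout on the workers and omitting the TLS/Kerberos client options a real presto CLI invocation from an-test-client would need.

    # confirm the workers point at the CNAME rather than a specific coordinator host
    ssh an-test-presto1001.eqiad.wmnet 'grep discovery.uri /etc/presto/config.properties'
    # expected, per the announcement URL in the error quoted earlier:
    #   discovery.uri=https://analytics-test-presto.eqiad.wmnet:8281
    # after a failover, watch how long it takes the workers to reappear
    presto --execute 'SELECT node_id, http_uri, coordinator, state FROM system.runtime.nodes;'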
[14:36:51] btullis: cool i'm not familiar with those, but they sound cool
[14:36:55] nginx would probably work too
[14:37:11] i think we have a few haproxy uses, def have nginx
[14:37:14] in wmf prod
[14:37:57] might be worth trying in test cluster with a new an-test-coord1002
[14:38:28] ...but if we had a virtual IP that is associated with the CNAME, then during a failover corosync would migrate the VIP and the active presto server as a group. Wouldn't require a DNS change.
[14:38:54] btullis: https://wikitech.wikimedia.org/wiki/Ganeti#Create_a_VM
[14:39:19] btullis: ya that would be faster than dns for sure. what would be the method for failing over?
[14:39:24] manually?
[14:40:28] fwiw i don't think we need fast failover
[14:40:47] we just need something that allows dashboards and clients to work without changing a host address config
[14:40:59] its ok if running queries fail
[14:42:59] With pacemaker one can do manual or automatic failover of resources. For manual, we can do for example: `sudo crm_migrate -r presto_group an-test-coord1002` to migrate a group.
[14:43:45] Or you could just take a cluster node offline, which migrates all resources away. `sudo crm_node standby an-test-coord1001`
[14:43:54] That sort of thing.
[14:47:36] OK, I'll make a ticket to create an-test-coord1002 and we can assign it and talk about next steps during grooming on Monday, if you think that's a good idea.
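Purely as a sketch of the VIP idea floated above (nothing like this is deployed), a corosync/pacemaker resource group tying a virtual IP to the presto-server unit could look like the following in crmsh. The IP address is a made-up placeholder.

    # a virtual IP plus the presto-server systemd unit, managed as one group
    sudo crm configure primitive presto_vip ocf:heartbeat:IPaddr2 \
        params ip=10.64.0.250 cidr_netmask=32 op monitor interval=10s
    sudo crm configure primitive presto_coord systemd:presto-server op monitor interval=30s
    sudo crm configure group presto_group presto_vip presto_coord
    # manual failover: move the whole group to the other coordinator
    sudo crm resource move presto_group an-test-coord1002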
[14:50:54] ottomata: any idea what might be up with https://stream-beta.wmflabs.org/v2/ui/#/?
[14:54:21] also struggling to see the events im triggering in beta kafka
[14:54:25] I see output in mw logs of `wikidatawiki 1.37.0-alpha EventBus DEBUG: Using destination_event_service eventgate-analytics-external for stream wd_propertysuggester.server_side_property_request.`
[14:54:28] but nothing in kafka
[15:09:52] ohai michaelcochez
[15:13:08] FYI this all relates to https://phabricator.wikimedia.org/T285098#7248774
[15:24:47] (03CR) 10Mholloway: "> Patch Set 3:" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/705021 (https://phabricator.wikimedia.org/T251320) (owner: 10Mholloway)
[15:31:45] 10Quarry: quarry-web-01 leaks files in /tmp - https://phabricator.wikimedia.org/T238375 (10Andrew) 05Open→03Resolved As of today the oldest files in /tmp are from the 26th, so I think tmpreaper is doing its job.
[16:14:54] 10Analytics, 10EventStreams: stream-beta.wmflabs.org seems broken (can't see my mediawiki-create events) or anything else - https://phabricator.wikimedia.org/T287760 (10Michaelcochez)
[18:14:12] 10Analytics, 10Analytics-Wikistats: Translations? - https://phabricator.wikimedia.org/T287661 (10Sabeloga) 05Resolved→03Open Hi again, there seems to have been some errors when the translations were carried over. Some text that is translated on Translatewiki doesn't appear translated on site (like [[ https...
[18:29:00] (03CR) 10Sharvaniharan: "@Ottomata @Michael Holloway would it be possible to merge this if you both are done with the review? I am trying to get it in, in this rel" [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/708597 (https://phabricator.wikimedia.org/T287652) (owner: 10Sharvaniharan)
[18:35:37] (03CR) 10Ottomata: "Uhhhhh i dunno what I was thinking...when I read this code the first time I thought you were extracting a string field value from the retu" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/705021 (https://phabricator.wikimedia.org/T251320) (owner: 10Mholloway)
[18:35:40] (03CR) 10Ottomata: [C: 03+1] Add Refine transform function to add normalized host [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/705021 (https://phabricator.wikimedia.org/T251320) (owner: 10Mholloway)
[18:41:56] 10Analytics, 10EventStreams: stream-beta.wmflabs.org seems broken (can't see my mediawiki-create events) or anything else - https://phabricator.wikimedia.org/T287760 (10Ottomata) Ah, this was because I recommended to @Michaelcochez to use `+wikidatawiki` to add the stream config entries. This is fine for MW c...
[18:44:03] (03CR) 10Ottomata: [C: 03+1] "+1 from me I'll let Michael merge." [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/708597 (https://phabricator.wikimedia.org/T287652) (owner: 10Sharvaniharan)
[18:44:23] (03CR) 10Ottomata: [C: 03+2] "Oh, Michael already +1ed, merging." [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/708597 (https://phabricator.wikimedia.org/T287652) (owner: 10Sharvaniharan)
[18:45:07] (03Merged) 10jenkins-bot: Migrate MobileWikiAppNotificationInteraction from legacy to MEP Bug: T287652 [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/708597 (https://phabricator.wikimedia.org/T287652) (owner: 10Sharvaniharan)
[18:52:56] addshore: michaelcochez yt?
[18:53:07] 10Analytics, 10EventStreams, 10Patch-For-Review: stream-beta.wmflabs.org seems broken (can't see my mediawiki-create events) or anything else - https://phabricator.wikimedia.org/T287760 (10Ottomata) Nope. I think mediawiki-config currently does not allow us to override default settings for beta. Hm.
[18:56:30] 10Analytics, 10EventStreams, 10Patch-For-Review: stream-beta.wmflabs.org seems broken (can't see my mediawiki-create events) or anything else - https://phabricator.wikimedia.org/T287760 (10Ottomata) I think we need to either: Declare the streams in both wikidatawiki and metawiki in InitialiseSettings-labs.p...
[18:56:48] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Metrics-Platform: wgEventStreams (EventStreamConfig) should support per wiki overrides - https://phabricator.wikimedia.org/T277193 (10Ottomata)
[18:58:05] 10Analytics, 10EventStreams, 10Patch-For-Review: stream-beta.wmflabs.org seems broken (can't see my mediawiki-create events) or anything else - https://phabricator.wikimedia.org/T287760 (10Ottomata) @Michaelcochez @Addshore I'd go ahead and just move these configs to InitialiseSettings.php myself (nothing wi...
[19:17:15] ottomata: here now, checking the tasks.
[19:20:18] My understanding is that if the configuration is done in InitialiseSettings.php, then it is also applied in beta, as the InitialiseSettings-labs.php is only the additional parts, correct?
[19:20:35] In beta we are/should be ready to produce events.
[19:21:17] After we know the events are fine there, we plan to move to test.
[19:24:51] yes
[19:25:14] michaelcochez: is the producer code only in beta, or is it already in prod?
[19:25:24] Only in beta for now.
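One way to confirm that beta picked up the stream declarations, if I'm remembering the EventStreamConfig extension's API module and parameter names correctly (treat both as assumptions), is to ask the beta wiki for its stream configs directly:

    # query the beta wikidata wiki for the newly declared streams
    curl -s 'https://wikidata.beta.wmflabs.org/w/api.php?action=streamconfigs&format=json' \
        | grep -o 'wd_propertysuggester[^"]*'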
[19:25:41] ok, then lets do it in InitialiseSettings.php so you can test
[19:25:42] doing now
[19:26:18] As we need the events for the A/B testing, we want to make sure they work before moving to test/prod.
[19:33:48] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/709098
[19:35:08] lgtm. except for consistency of indentation :-)
[19:35:28] (spaces vs. tabs war ahead)
[19:36:15] side question: what is 'canary_events_enabled' => true, ? I just mimicked the examples without knowing what that is.
[19:39:55] Jenkins has chosen sides already it seems. He is keeping tabs on the formatting.
[19:40:57] 20887 still has spaces.
[19:41:23] ^fixed now.
[19:47:13] One more question: does the kafka stream get created when defined in this config, or only once there are actual events generated?
[19:52:09] One more thing. The root volume of deployment-kafka-jumbo-2.deployment-prep.eqiad1.wikimedia.cloud is nearly 100% full. Not sure that matters.
[19:57:13] waiting for post merge to get it deployed in beta to check
[19:57:43] michaelcochez: re canary
[19:57:48] i should make some docs to link you but
[19:57:48] https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#examples
[19:57:55] they are for monitoring
[19:58:06] we should switch that to being default true, but there are some streams we'd have to disable it for
[19:58:14] i'll make a task for that
[19:58:45] https://phabricator.wikimedia.org/T251609
[19:59:36] 10Analytics, 10Event-Platform: Enable canary events for streams by default - https://phabricator.wikimedia.org/T287789 (10Ottomata)
[19:59:37] https://phabricator.wikimedia.org/T287789
[19:59:47] 10Analytics-Radar, 10Growth-Scaling, 10Product-Analytics, 10Growth-Team (Current Sprint): Growth: shorten welcome survey retention to 90 days - https://phabricator.wikimedia.org/T275171 (10Etonkovidova) 05Open→03Resolved
[19:59:49] 10Analytics-Radar, 10Growth-Scaling, 10Growth-Team (Current Sprint), 10Patch-For-Review, 10Product-Analytics (Kanban): Growth: update welcome survey aggregation schedule - https://phabricator.wikimedia.org/T275172 (10Etonkovidova)
[19:59:51] Clear. So in this stream we will also find these canary events once in a while. (good to know when we do the A/B analysis)
[19:59:53] kafka topics don't get created until there are events produced to them
[20:00:01] they will be in the stream
[20:00:03] but not in the hive table
[20:00:07] they are filtered out there
[20:00:23] Clear
[20:01:06] There was an older task for this it seems: https://phabricator.wikimedia.org/T266798
[20:01:27] 6 linked already.
[20:01:54] yea thats to enable for all and is complicated
[20:02:02] because there are many consumers already that won't be expecting the canary events
[20:02:10] the one i made is just to make it enabled by default
[20:02:17] but still disable it for the complicated streams
[20:07:39] michaelcochez: yeah / was full on that node
[20:07:45] dunno why, but i cleared some old logs out
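For completeness, finding and freeing space on the beta kafka host could be as simple as the following; this is a generic illustration, not necessarily what was actually run.

    ssh deployment-kafka-jumbo-2.deployment-prep.eqiad1.wikimedia.cloud
    df -h /                                                      # confirm the root volume is nearly full
    sudo du -xsh /var/log/* 2>/dev/null | sort -h | tail -n 5    # find the biggest log directories
    sudo journalctl --vacuum-time=7d                             # reclaim space from old journal logs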
[20:21:33] "Post-merge build succeeded." So, I assume this might be there now?
[20:25:42] I'm on a boat now, but excited to see this moving :)
[20:26:56] michaelcochez: yeah its there... but i think that disk full did cause some kafka issues
[20:26:58] resolving...?
[20:27:08] was about to say "here you go it works!!!! but now something else..."
The volume for kafka itself seemed separate from /
[20:30:20] yeah
[20:30:32] your new topic was created, but did not look healthy
[20:30:36] it was not assigned a kafka broker leader
[20:30:39] i do not know why
[20:30:48] i deleted it and am trying to recreate it, but things seem a little stuck
[20:30:56] also this https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/709103
[20:31:06] the removal of that caused the stream-beta thing to not work right
[20:31:15] it tried to subscribe to topics that didn't exit
[20:31:17] exist
[20:36:50] grr beta things always seem to get stale and stop working after long times
[20:37:06] i don't have a lot of time to totally figure out whats wrong with kafka atm
[20:37:15] i could wipe it in beta and start clean, no one would mind there...
[20:37:15] ghm
[20:37:23] ya going to do that
[20:45:51] kafkacat -L -b deployment-kafka-jumbo-2.deployment-prep.eqiad1.wikimedia.cloud | grep propertysuggester
[20:45:51] topic "eqiad.wd_propertysuggester.client_side_property_request" with 1 partitions:
[20:45:51] topic "eqiad.wd_propertysuggester.server_side_property_request" with 1 partitions:
[20:45:51] 8-)
[20:49:21] ya better after wiping
[20:49:26] but still your events are not coming through!
[20:49:28] gRrRR
[20:49:34] i can produce an event manually via curl through eventgate
[20:49:42] and i can force browser to send
[20:50:50] I see events.
[20:51:29] oh wait
[20:51:30] they are!
[20:51:33] i was looking at server side doh
[20:51:37] oh great!!!!
[20:51:44] Server side should be there as well...
[20:51:59] do you see them in eventstreams ui?
[20:52:04] at stream-beta?
[20:52:12] kafkacat -C -b deployment-kafka-jumbo-2.deployment-prep.eqiad1.wikimedia.cloud -t eqiad.wd_prertysuggester.server_side_property_request
[20:52:15] aye
[20:52:53] ok i don't know why i can't see them in stream-beta
[20:52:59] but michaelcochez i have to run!
[20:53:09] ok if we wait til monday to fix that bit?
[20:53:12] i think you can test now ya
[20:53:27] btw, if you have issues producing, you can POST events to
[20:53:27] http://deployment-eventgate-3.deployment-prep.eqiad.wmflabs:8492/v1/events
[20:53:27] We can look at them with kafkacat. Works for now.
[20:53:31] Thanks a million
[20:53:35] without the ?hasty=true bit
[20:53:38] and it will return the error
[20:53:40] e.g.
[20:53:50] cat e.json
[20:53:50] {"$schema":"/analytics/mediawiki/wd_propertysuggester/client_side_property_request/1.0.0","dt":"2021-07-30T20:42:54.748Z","entity_id":"Q393194","event_id":"162765789408917eb23f8","meta":{"stream":"wd_propertysuggester.client_side_property_request","domain":"wikidata.beta.wmflabs.org"},"num_characters":2,"session_id":"YQQT2X7pLd-EyzjN7IWIAAAAAFI","user_id":"17eb23f8"}
[20:53:56] curl -H 'Content-Type: application/json' -X POST -v -d@e.json 'http://deployment-eventgate-3.deployment-prep.eqiad.wmflabs:8492/v1/events'
[20:53:59] ok laterz!
[22:21:48] Doing the roll restart on the druid test cluster (1 node)
[22:21:50] sudo cookbook sre.druid.roll-restart-workers test
[22:22:17] !log razzi@cumin1001:~$ sudo cookbook sre.druid.roll-restart-workers test
[22:22:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[22:22:35] Will do the actual nodes on Monday :)
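Regarding the topic above that "was not assigned a kafka broker leader": kafkacat's metadata listing can be scoped to a single topic to check that. The broker ids shown are illustrative, not the beta cluster's actual ids.

    kafkacat -L -b deployment-kafka-jumbo-2.deployment-prep.eqiad1.wikimedia.cloud \
        -t eqiad.wd_propertysuggester.server_side_property_request
    # healthy:  partition 0, leader 1001, replicas: 1001, isrs: 1001
    # broken:   partition 0, leader -1 (no leader elected), which matches the symptom above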