[07:40:53] Good morning! I'm seeking a host that I could test-reimage that is both in conftool and against which I could run httpbb to test the experimental reimage cookbook. By any chance do you have any? :)
[08:40:03] any appserver in eqiad
[08:44:20] that's great, should I pick a canary?
[08:58:51] pick any, really
[09:00:00] * volans stealing mw1414 for ~1h
[09:29:33] serviceops, Observability-Logging, Prod-Kubernetes, Kubernetes, Patch-For-Review: Kubernetes logs (container stderr,strout) do not show up in Elasticsearch/Kibana - https://phabricator.wikimedia.org/T289766 (JMeybohm) Open→Resolved
[10:05:12] serviceops, SRE: Clean up old Docker images on deneb - https://phabricator.wikimedia.org/T287222 (JMeybohm) Failed to build an image today: ` 2021-09-13 09:51:42,980 [docker-pkg-build] ERROR - Build failed: devmapper: Thin Pool has 149221 free data blocks which is less than minimum required 163840 free d...
[10:41:38] so couple of follow up questions
[10:42:05] 1) should I run anything after the reimage before repooling? Like a scap pull or something, if not done automatically by puppet
[10:42:49] 2) assuming you have a host with 3 services, one pooled=yes, one pooled=no, one pooled=inactive, what would you expect the reimage to do? put all 3 in inactive and at the end tell you what was their original state?
[10:53:07] volans: regarding 1): When I pooled new mw machines I use the following steps: set/pooled=inactive, set/weight=30, set/pooled=no, scap pull, check icinga alerts for mw1414 (all should be green except DSH), set/pooled=yes
[10:53:57] volans: scap pool is done by puppet on boot IIRC
[10:54:03] scap pull
[10:54:48] regarding 2, I'd expect a reimaged server which passes our httpbb tests to be set to active, but others might have different opinions
[10:55:56] so far the old script was just printing the repooling message to the user, but has never repooled automatically, also it doesn't know what kind of host it is with conftool support, might be a cache host, etc...
[10:56:05] but we can adapt it at will ofc
[10:56:28] if we're confident httpbb + icinga green are enough
[10:56:44] we could repool automatically only if --httpbb is set, it passes httpbb and icinga is green
[10:56:54] or something like that
[10:57:26] I think printing is also ok
[10:57:41] does the printed out line state people also need to set a weight?
[10:59:29] don't think so, I'll add it mentioning the current weight
[11:01:46] well that's only important for new installations actually
[11:01:52] and there the weight should be changed
[11:03:23] might be useful to recall to check the current weight anyway, it might have been lowered before the reimage for some reason :)
[11:03:31] I'll put something up, thanks for the feedback
[11:03:49] now the hard question
[11:04:10] you had a couple followups
[11:04:14] a service is pooled=inactive and you run the reimage with --conftool-value 'no', should it put it in 'no'?
[11:04:52] I think so yes, if someone decides from the cli they explicitly want that value
[11:05:01] btw, most of this stuff is going away in a few months
[11:05:04] ack
[11:05:08] yay for mw yes
[11:11:33] * volans keeping mw1414 for a little while for follow up testing
[12:04:41] serviceops, Continuous-Integration-Infrastructure, DC-Ops, Infrastructure-Foundations, netops: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ) - https://phabricator.wikimedia.org/T283582 (hashar) That is still happening from time to time. Any person or team I can raise th...
[12:13:04] serviceops, MW-on-K8s, SRE, Patch-For-Review, Performance-Team (Radar): Benchmark performance of MediaWiki on k8s - https://phabricator.wikimedia.org/T280497 (jijiki)
[12:45:54] serviceops, Wikidata, Wikidata-Query-Service, wdwb-tech, and 2 others: Deploy Flink (rdf-streaming-updater) to kubernetes (k8s) - https://phabricator.wikimedia.org/T264006 (Gehel)
[13:58:43] serviceops, SRE, Patch-For-Review, Platform Team Initiatives (Containerise Services): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (akosiaris) >>! In T210704#7345278, @Jdforrester-WMF wrote: > Has Thumbor been upgraded, or is this waiting on {T21...
[14:00:21] serviceops, SRE, Patch-For-Review, Platform Team Initiatives (Containerise Services): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (akosiaris) Ah, `3d2png` is also shipped with thumbor and that one is indeed nodejs. I now get the question, sorry...
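Editorial note on the repool discussion (10:42 to 11:04): the rules floated there can be sketched in Python as below. This is only an illustration under stated assumptions, not the code of the reimage cookbook. All function names and the service names in the usage note are hypothetical, and the automatic repool condition was only a proposal in the chat ("or something like that"), with printing the hint also considered acceptable.

```python
from typing import Dict, Optional


def target_states(original: Dict[str, str],
                  conftool_value: Optional[str] = None) -> Dict[str, str]:
    """Pooled state each service should end up in after the reimage.

    An explicit value from the CLI (the --conftool-value idea from the chat)
    wins for every service, even one that was 'inactive' before.
    Without it, each service goes back to its original state.
    """
    if conftool_value is not None:
        return {service: conftool_value for service in original}
    return dict(original)


def may_auto_repool(httpbb_requested: bool, httpbb_passed: bool,
                    icinga_green: bool) -> bool:
    """Proposal from the chat: repool automatically only if httpbb was
    requested, it passed, and Icinga is green."""
    return httpbb_requested and httpbb_passed and icinga_green


def repool_hint(host: str, original: Dict[str, str],
                weights: Dict[str, int]) -> str:
    """Message to print for the operator, including the current weight,
    since it might have been lowered before the reimage."""
    parts = [
        f"{service}: pooled={state}, weight={weights.get(service, 'unknown')}"
        for service, state in sorted(original.items())
    ]
    return f"{host} state before reimage -> " + "; ".join(parts)
```

For example, with hypothetical services `svc_a`, `svc_b` and `svc_c` originally at `yes`, `no` and `inactive`, calling `target_states(original, "no")` maps all three to `no`, while `target_states(original)` returns them unchanged.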
[14:09:03] serviceops, SRE, Wikifeeds, Patch-For-Review: wikifeeds in codfw seems failing health checks intermittently - https://phabricator.wikimedia.org/T290445 (elukey) I have started https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-09-06_Wikifeeds
[14:12:58] serviceops, SRE, Patch-For-Review, Platform Team Initiatives (Containerise Services): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (MoritzMuehlenhoff) Not sure why restbase is ticked off, though? The restbase hosts in production currently run nod...
[15:07:39] joe / jelto: in case you have few minutes, I sent https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/720770/ as a follow up of our previous chat
[15:07:48] meeting, sorry
[16:16:37] * volans lends back mw1414 to serviceops, reimaged, scap pulled, icinga green, pooled
[16:16:52] thanks for the loan :)
[16:38:46] serviceops, DC-Ops, ops-eqiad: Q1: (Need By: TBD) rack/setup/install kubestage100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T290894 (RobH)
[16:39:04] serviceops, DC-Ops, ops-eqiad: Q1: (Need By: TBD) rack/setup/install kubestage100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T290894 (RobH)
[16:41:01] serviceops, DC-Ops, ops-eqiad: Q2: (Need By: TBD) rack/setup/install kubestage100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T290894 (RobH) p:Medium→High
[17:53:24] serviceops, MW-on-K8s, SRE, Patch-For-Review, Performance-Team (Radar): Benchmark performance of MediaWiki on k8s - https://phabricator.wikimedia.org/T280497 (jijiki) After **round 1** fixes, we run another set of 10k requests with and without xhprof. Results can be found here: https://peop...
[17:55:47] serviceops, MW-on-K8s, SRE, Patch-For-Review, Performance-Team (Radar): Benchmark performance of MediaWiki on k8s - https://phabricator.wikimedia.org/T280497 (jijiki)
[18:08:41] serviceops, MW-on-K8s, SRE: Make all httpbb tests pass on the mwdebug deployment. - https://phabricator.wikimedia.org/T285298 (dancy)
[18:14:38] serviceops, SRE, Patch-For-Review, Platform Team Initiatives (Containerise Services): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (Jdforrester-WMF)
[18:16:41] serviceops, SRE, Patch-For-Review, Platform Team Initiatives (Containerise Services): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (Jdforrester-WMF) >>! In T210704#7348478, @MoritzMuehlenhoff wrote: > Not sure why restbase is ticked off, though?...
[18:52:49] serviceops, MW-on-K8s, SRE, Release-Engineering-Team (Radar): The restricted/mediawiki-webserver image should include skins and resources - https://phabricator.wikimedia.org/T285232 (dancy) Still not working: https://foundation.wikimedia.org/static/current/skins/Timeless/resources/print.css
[19:17:14] serviceops, Anti-Harassment, IP Info, SRE: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (Niharika) >>! In T288844#7321649, @mepps wrote: > @Niharika Based on my read, it also looks like the 10 day delay would only be when th...
[19:47:12] serviceops, SRE, wikidiff2, Community-Tech (CommTech-Sprint-9), Platform Team Workboards (Platform Engineering Reliability): Deploy wikidiff2 1.12.0 - https://phabricator.wikimedia.org/T285857 (ldelench_wmf)
[20:16:07] serviceops, Prod-Kubernetes, Toolhub, Kubernetes: Maintenance environment needed for running one-off commands - https://phabricator.wikimedia.org/T290357 (bd808) >>! In T290357#7344966, @akosiaris wrote: > That being said, we don't have yet implemented this functionality and we 'd like to see how...
[20:32:27] serviceops, Anti-Harassment, IP Info, SRE: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (sbassett) >>! In T288844#7349692, @Niharika wrote: >>>! In T288844#7321649, @mepps wrote: >> @Niharika Based on my read, it also looks...
[20:42:58] serviceops, Datacenter-Switchover: switchdc services cookbook should allow pooling services in both DCs (active/active) - https://phabricator.wikimedia.org/T290919 (Legoktm)
[20:52:11] serviceops, Datacenter-Switchover: switchdc services cookbook should allow pooling services in both DCs (active/active) - https://phabricator.wikimedia.org/T290919 (RLazarus) Something like `--restore` is also a possibility, as sort of a middle ground. When writing this we'll also have to decide whether...
[22:11:18] serviceops, MW-on-K8s, SRE, Patch-For-Review, Release-Engineering-Team (Radar): The restricted/mediawiki-webserver image should include skins and resources - https://phabricator.wikimedia.org/T285232 (dancy) Can someone give me an example of a curl command that exercises the /w/static.php cod...
[22:13:25] serviceops, MediaWiki-Cache, MediaWiki-General, Performance-Team, User-jijiki: Use monotonic clock instead of microtime() for perf measures in MW PHP - https://phabricator.wikimedia.org/T245464 (Krinkle) Looks like we already bundle and deploy `symfony/polyfill-php73` in production for `symfo...