[08:19:47] serviceops, ops-eqiad: Broken CPU on parse1002 - https://phabricator.wikimedia.org/T326119 (MoritzMuehlenhoff)
[09:54:24] serviceops, MW-on-K8s, SRE, observability, Patch-For-Review: Logging options for apache httpd in k8s - https://phabricator.wikimedia.org/T265876 (Clement_Goubert) Kafka and logstash ingestion points configured.
[09:54:35] serviceops, MW-on-K8s, SRE, observability, Patch-For-Review: Logging options for apache httpd in k8s - https://phabricator.wikimedia.org/T265876 (Clement_Goubert)
[09:55:29] serviceops, MW-on-K8s, SRE, observability, Patch-For-Review: New mediawiki.httpd.accesslog topic on kafka-logging + logstash and dashboard - https://phabricator.wikimedia.org/T324439 (Clement_Goubert) Open→Resolved a:Clement_Goubert
[10:37:07] serviceops, SRE, ops-eqiad: Broken CPU on parse1002 - https://phabricator.wikimedia.org/T326119 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=8a35d570-5625-4e3d-a6ff-eb737a303711) set by cgoubert@cumin1001 for 7 days, 0:00:00 on 1 host(s) and their services with reason: CPU1 machi...
[10:38:56] serviceops, DC-Ops, ops-eqiad: hw troubleshooting: CPU1 machine check error on parse1002.eqiad.wmnet - https://phabricator.wikimedia.org/T326119 (Clement_Goubert) p:Triage→High a:Cmjohnson
[10:39:40] serviceops, DC-Ops, ops-eqiad: hw troubleshooting: CPU1 machine check error on parse1002.eqiad.wmnet - https://phabricator.wikimedia.org/T326119 (Clement_Goubert) p:High→Medium
[10:45:44] Objections to starting rolling reboots of parse servers in codfw?
[10:47:21] serviceops, DC-Ops, ops-eqiad: hw troubleshooting: CPU1 machine check error on parse1002.eqiad.wmnet - https://phabricator.wikimedia.org/T326119 (Clement_Goubert) a:Cmjohnson→Jclark-ctr
[10:51:26] serviceops, Content-Transform-Team-WIP, Maps, Patch-For-Review: OSM import fails on both eqiad/codfw because of wrong data input - https://phabricator.wikimedia.org/T325293 (Jgiannelos) I think at this point these are the ways forward I can think of: * Full planet import * Codfw full planet im...
[11:01:43] serviceops, Data-Engineering-Radar, MW-on-K8s, Patch-For-Review: IPInfo MediaWiki extension depends on presence of maxmind db in the container/host - https://phabricator.wikimedia.org/T288375 (Clement_Goubert) GeoIP data copied to all mw-on-k8s kubernetes hosts.
[11:16:19] <_joe_> claime: none but check the deployment windows
[11:18:14] ack, I have until 1400UTC to proceed (we are in the kubernetes deployment window until 1200UTC and then I'm clear until the UTC afternoon backport window)
[11:19:10] <_joe_> the k8s window isn't an issue for parse servers
[11:19:17] I know :)
[11:19:48] What I mean is there should not be deployments right now because we're in our sre dedicated window
[11:19:50] <_joe_> ah sorry I misunderstood
[11:20:26] I'll start with codfw and we'll see afterwards
[11:22:52] _joe_: 5%/30s sound good?
[11:23:15] (30s grace sleep)
[11:23:46] <_joe_> yep
[12:25:03] serviceops, Content-Transform-Team-WIP, Maps, Patch-For-Review: OSM import fails on both eqiad/codfw because of wrong data input - https://phabricator.wikimedia.org/T325293 (TheDJ) Was this problem ever relayed to OSM btw ??? I mean this cannot have just been us that ran into this problem right ?
[12:34:47] serviceops, Content-Transform-Team-WIP, Maps: OSM import fails on both eqiad/codfw because of wrong data input - https://phabricator.wikimedia.org/T325293 (Jgiannelos) What would be the right channel to communicate this issue ?
[13:26:36] serviceops, Content-Transform-Team-WIP, Maps: OSM import fails on both eqiad/codfw because of wrong data input - https://phabricator.wikimedia.org/T325293 (Sabas88) Imposm should be already tested against the invalid layer case https://github.com/omniscale/imposm3/blob/master/test/completedb_test.go#...
[13:29:26] serviceops, Content-Transform-Team-WIP, Maps: OSM import fails on both eqiad/codfw because of wrong data input - https://phabricator.wikimedia.org/T325293 (TheDJ) > Imposm should be already tested against the invalid layer case Maybe we are on an older imposm version ? Related code for that test case...
[14:00:38] serviceops, SRE, Wikimedia-Site-requests, Performance-Team (Radar): Raise limit of $wgMaxArticleSize for Hebrew Wikisource - https://phabricator.wikimedia.org/T275319 (LSobanski)
[14:00:52] serviceops, SRE, Wikimedia-Site-requests, Performance-Team (Radar), Russian-Sites: Increase $wgMaxArticleSize to 4MB for ruwikisource - https://phabricator.wikimedia.org/T308893 (LSobanski)
[14:03:52] serviceops, SRE, Maps (Maps-data): Tune thread for osm2pgsql / postgres max connections for Maps - https://phabricator.wikimedia.org/T137229 (LSobanski)
[14:33:41] serviceops, Content-Transform-Team-WIP, Maps: OSM import fails on both eqiad/codfw because of wrong data input - https://phabricator.wikimedia.org/T325293 (Jgiannelos) This is our apt package for imposm: ` imposm3: Installed: 0.11.0+git20201104.4758cf4-1 Candidate: 0.11.0+git20201104.4758cf4-1...
[14:36:52] serviceops, Content-Transform-Team-WIP, Maps: OSM import fails on both eqiad/codfw because of wrong data input - https://phabricator.wikimedia.org/T325293 (Jgiannelos) I will defer to serviceops now that we have a better understanding of the problem and some potential solutions
[16:18:03] serviceops, Content-Transform-Team-WIP, Maps: OSM import fails on both eqiad/codfw because of wrong data input - https://phabricator.wikimedia.org/T325293 (jijiki) I will explore our option to upgrade to 0.11.1 and get back to you
[19:54:15] serviceops, DC-Ops, SRE, ops-eqiad: Q3:rack/setup/install rdb101[34] - https://phabricator.wikimedia.org/T326170 (RobH)
[19:54:22] serviceops, DC-Ops, SRE, ops-eqiad: Q3:rack/setup/install rdb101[34] - https://phabricator.wikimedia.org/T326170 (RobH)
[19:56:20] serviceops, SRE: rdb101[34] serviceops implementation tracking - https://phabricator.wikimedia.org/T326171 (RobH)
[19:56:34] serviceops, SRE: rdb101[34] serviceops implementation tracking - https://phabricator.wikimedia.org/T326171 (RobH)
[19:59:25] serviceops, DC-Ops, SRE, ops-eqiad: Q4:rack/setup/install rdb101[34] - https://phabricator.wikimedia.org/T326170 (RobH)
[20:00:30] serviceops, DC-Ops, SRE, ops-eqiad: Q4:rack/setup/install rdb101[34] - https://phabricator.wikimedia.org/T326170 (RobH) Open→Stalled Please note this is a Q4 order being placed in Q3 for discounting, but won't land in the datacenter until April 15th or later. I'm not entirely certain how...