[00:02:02] serviceops, API Platform, Growth-Structured-Tasks, Image-Suggestions, and 7 others: GrowthExperiments\NewcomerTasks\AddImage\ServiceImageRecommendationProvider::get Unable to decode JSON response for page {title} upstream connect error or disconnect/reset b... - https://phabricator.wikimedia.org/T313973
[00:07:49] serviceops, API Platform, Growth-Structured-Tasks, Image-Suggestions, and 7 others: GrowthExperiments\NewcomerTasks\AddImage\ServiceImageRecommendationProvider::get Unable to decode JSON response for page {title} upstream connect error or disconnect/reset b... - https://phabricator.wikimedia.org/T313973
[01:28:56] serviceops, serviceops-collab, vrts: Setup VRTS on WMCS - https://phabricator.wikimedia.org/T317059 (Dzahn)
[02:20:30] serviceops, API Platform, Growth-Structured-Tasks, Image-Suggestions, and 7 others: GrowthExperiments\NewcomerTasks\AddImage\ServiceImageRecommendationProvider::get Unable to decode JSON response for page {title} upstream connect error or disconnect/reset b... - https://phabricator.wikimedia.org/T313973
[08:59:07] serviceops, decommission-hardware, Patch-For-Review: Decommission wtp10[25-48].eqiad.wmnet - https://phabricator.wikimedia.org/T317025 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by cgoubert@cumin1001 for hosts: `wtp[1029-1033].eqiad.wmnet` - wtp1029.eqiad.wmnet (**FAIL**) - Down...
[09:03:04] serviceops, API Platform, Growth-Structured-Tasks, Image-Suggestions, and 7 others: GrowthExperiments\NewcomerTasks\AddImage\ServiceImageRecommendationProvider::get Unable to decode JSON response for page {title} upstream connect error or disconnect/reset b... - https://phabricator.wikimedia.org/T313973
[09:03:58] serviceops, API Platform, Growth-Structured-Tasks, Image-Suggestions, and 7 others: GrowthExperiments\NewcomerTasks\AddImage\ServiceImageRecommendationProvider::get Unable to decode JSON response for page {title} upstream connect error or disconnect/reset b... - https://phabricator.wikimedia.org/T313973
[09:56:47] serviceops: Put parse parse10[01-24] in production - https://phabricator.wikimedia.org/T307219 (Clement_Goubert)
[09:57:25] serviceops, decommission-hardware, Patch-For-Review: Decommission wtp10[25-48].eqiad.wmnet - https://phabricator.wikimedia.org/T317025 (Clement_Goubert) Open→In progress
[09:59:54] serviceops, decommission-hardware, ops-eqiad, Patch-For-Review: Decommission wtp10[25-48].eqiad.wmnet - https://phabricator.wikimedia.org/T317025 (Clement_Goubert)
[09:59:56] serviceops: Put parse parse10[01-24] in production - https://phabricator.wikimedia.org/T307219 (Clement_Goubert) In progress→Resolved
[10:00:04] serviceops, Parsoid, Patch-For-Review, Performance-Team (Radar): Parsoid migration to php 7.4 - https://phabricator.wikimedia.org/T312638 (Clement_Goubert)
[10:03:12] serviceops, Parsoid, Patch-For-Review, Performance-Team (Radar): Parsoid migration to php 7.4 - https://phabricator.wikimedia.org/T312638 (Clement_Goubert) Old wtp servers now in the hands of DCops for decom.
[10:07:02] serviceops, decommission-hardware, ops-eqiad, Patch-For-Review: Decommission wtp10[25-48].eqiad.wmnet - https://phabricator.wikimedia.org/T317025 (Clement_Goubert) `wtp[1029-1033].eqiad.wmnet` didn't power off correctly.
[10:39:11] serviceops, SRE-OnFire, Sustainability (Incident Followup): Page on etcdmirror critical status - https://phabricator.wikimedia.org/T317402 (Clement_Goubert)
[10:41:49] serviceops, SRE-OnFire, Sustainability (Incident Followup): Add etcdmirror status check to scap - https://phabricator.wikimedia.org/T317403 (Clement_Goubert)
[10:45:05] serviceops, SRE-OnFire, Sustainability (Incident Followup): Add failure rate triggered rollback to scap - https://phabricator.wikimedia.org/T317405 (Clement_Goubert)
[10:45:51] serviceops: Incident: 2022-09-08 codfw api-https api appserver appserver parsoid degradation - https://phabricator.wikimedia.org/T317340 (Clement_Goubert) https://wikitech.wikimedia.org/wiki/Incidents/2022-09-08_codfw_api-https_api_appserver_appserver_parsoid_degradation
[10:58:19] serviceops, Wikimedia-Incident: Incident: 2022-09-08 codfw api-https api appserver appserver parsoid degradation - https://phabricator.wikimedia.org/T317340 (Aklapper) +#Wikimedia-Incident
[12:27:13] serviceops, Data-Persistence-Backup, serviceops-collab, GitLab (Infrastructure), and 2 others: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Jelto) Backups on production `gitlab1004` fail with ` Errno::EACCES: Permission denied @ dir_s_mkdir - /srv/gitlab-backup/db ` since Sep...
[12:30:49] serviceops, Data-Persistence-Backup, serviceops-collab, GitLab (Infrastructure), and 2 others: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (jbond) @jelto i think this is fall out from https://gerrit.wikimedia.org/r/c/operations/puppet/+/809095 which changed the permission of t...
[12:38:38] serviceops, Data-Persistence-Backup, serviceops-collab, GitLab (Infrastructure), and 2 others: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Jelto) @jbond thanks for the context! I also found the following line in the puppet log: ` Sep 7 11:42:31 gitlab1004 puppet-agent[93484...
[12:39:59] serviceops, Data-Persistence-Backup, serviceops-collab, GitLab (Infrastructure), and 2 others: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (jbond) > I guess this permission was automatically setup when installing/bootstrapping GitLab apt package for the first time. yes i imagi...
[14:35:57] serviceops, Observability-Logging, Patch-For-Review: Increase of ~50 million access logs per day from mobileapps-production-tls-proxy - https://phabricator.wikimedia.org/T313099 (akosiaris) Hi, I 've been meaning to ask regarding this, can we sample really heavily those logs? We want to use them ju...
[14:45:13] serviceops, Wikimedia-Incident: Incident: 2022-09-08 codfw appservers degradation - https://phabricator.wikimedia.org/T317340 (Clement_Goubert)
[15:05:34] serviceops, Observability-Logging, Patch-For-Review: Increase of ~50 million access logs per day from mobileapps-production-tls-proxy - https://phabricator.wikimedia.org/T313099 (colewhite) >>! In T313099#8224795, @akosiaris wrote: > I 've been meaning to ask regarding this, can we sample really heav...
[15:09:36] serviceops, Observability-Logging, Patch-For-Review: Increase of ~50 million access logs per day from mobileapps-production-tls-proxy - https://phabricator.wikimedia.org/T313099 (colewhite)
[15:10:26] serviceops, Observability-Logging, Patch-For-Review: Increase of ~50 million access logs per day from mobileapps-production-tls-proxy - https://phabricator.wikimedia.org/T313099 (colewhite)
[15:17:42] serviceops, Observability-Logging, Patch-For-Review: Increase of ~50 million access logs per day from mobileapps-production-tls-proxy - https://phabricator.wikimedia.org/T313099 (akosiaris) >>! In T313099#8224931, @colewhite wrote: >>>! In T313099#8224795, @akosiaris wrote: >> I 've been meaning to a...
[15:44:27] serviceops, SRE-OnFire, Sustainability (Incident Followup): Add failure rate triggered rollback to scap - https://phabricator.wikimedia.org/T317405 (Clement_Goubert)
[16:11:54] serviceops, Observability-Logging, Patch-For-Review: Increase of ~50 million access logs per day from mobileapps-production-tls-proxy - https://phabricator.wikimedia.org/T313099 (colewhite) >>! In T313099#8224967, @akosiaris wrote: > I don't think I have much to add, side from the fact, that I wouldn...
[16:21:28] serviceops, Data-Persistence-Backup, serviceops-collab, GitLab (Infrastructure), and 2 others: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Jelto) Backups on production `gitlab1004` are fixed again with https://gerrit.wikimedia.org/r/831083. The puppet run reported ` Notice:...
[16:41:04] serviceops, Parsoid, Patch-For-Review, Performance-Team (Radar): Parsoid migration to php 7.4 - https://phabricator.wikimedia.org/T312638 (ssastry) https://grafana.wikimedia.org/d/000000048/parsoid-timing-wt2html?orgId=1&refresh=30s&from=now-90d&to=now&viewPanel=43 shows a dip in the time per out...
[17:00:38] serviceops, Observability-Metrics, observability: Scrape more envoy metrics in ops prometheus - https://phabricator.wikimedia.org/T317430 (JMeybohm)
[18:40:19] serviceops, Parsoid, Patch-For-Review, Performance-Team (Radar), Performance-Team-publish: Parsoid migration to php 7.4 - https://phabricator.wikimedia.org/T312638 (Krinkle)
[18:40:56] serviceops, Parsoid, Patch-For-Review, Performance-Team (Radar), Performance-Team-publish: Parsoid migration to php 7.4 - https://phabricator.wikimedia.org/T312638 (Krinkle) Saved for future reference: {F35515191 height=300} {F35515194 height=300}
[18:47:15] serviceops, Parsoid, Patch-For-Review, Performance-Team (Radar), Performance-Team-publish: Parsoid migration to php 7.4 - https://phabricator.wikimedia.org/T312638 (ssastry) [[ https://grafana.wikimedia.org/d/000000046/parsoid-timing-html2wt?orgId=1&refresh=30s&from=now-30d&to=now&viewPanel=4...
[18:47:44] serviceops, API Platform, Growth-Structured-Tasks, Image-Suggestions, and 7 others: GrowthExperiments\NewcomerTasks\AddImage\ServiceImageRecommendationProvider::get Unable to decode JSON response for page {title} upstream connect error or disconnect/reset b... - https://phabricator.wikimedia.org/T313973
[19:10:49] serviceops, serviceops-collab: link static-bugzilla to dump files - https://phabricator.wikimedia.org/T317436 (Dzahn)
[19:10:59] serviceops, serviceops-collab: link static-bugzilla to dump files - https://phabricator.wikimedia.org/T317436 (Dzahn) p:Triage→Low
[19:11:41] serviceops, serviceops-collab: link static-bugzilla to dump files - https://phabricator.wikimedia.org/T317436 (Dzahn)