[00:02:47] <icinga-wm>	 PROBLEM - cassandra-a CQL 10.192.48.121:9042 on restbase2017 is CRITICAL: connect to address 10.192.48.121 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[00:02:52] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1248 (T355609)', diff saved to https://phabricator.wikimedia.org/P56163 and previous config saved to /var/cache/conftool/dbconfig/20240203-000252-marostegui.json
[00:02:54] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1249.eqiad.wmnet with reason: Maintenance
[00:03:01] <icinga-wm>	 PROBLEM - cassandra-a SSL 10.192.48.121:7000 on restbase2017 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
[00:03:01] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[00:03:08] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1249.eqiad.wmnet with reason: Maintenance
[00:03:14] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1249 (T355609)', diff saved to https://phabricator.wikimedia.org/P56164 and previous config saved to /var/cache/conftool/dbconfig/20240203-000314-marostegui.json
[00:28:18] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1249 (T355609)', diff saved to https://phabricator.wikimedia.org/P56165 and previous config saved to /var/cache/conftool/dbconfig/20240203-002817-marostegui.json
[00:28:34] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[00:35:38] <wikibugs>	 (03PS4) 10Zabe: foreachwikiindblist: Return early when no arg is passed [puppet] - 10https://gerrit.wikimedia.org/r/992263
[00:35:45] <wikibugs>	 (03CR) 10Zabe: foreachwikiindblist: Return early when no arg is passed (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/992263 (owner: 10Zabe)
[00:39:04] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/995347
[00:39:10] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/995347 (owner: 10TrainBranchBot)
[00:40:20] <wikibugs>	 (03CR) 10Ahmon Dancy: [C: 03+1] "This change looks reasonable/useful to me." [puppet] - 10https://gerrit.wikimedia.org/r/992263 (owner: 10Zabe)
[00:43:24] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P56166 and previous config saved to /var/cache/conftool/dbconfig/20240203-004324-marostegui.json
[00:58:31] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P56167 and previous config saved to /var/cache/conftool/dbconfig/20240203-005830-marostegui.json
[01:01:41] <wikibugs>	 (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/995347 (owner: 10TrainBranchBot)
[01:13:37] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1249 (T355609)', diff saved to https://phabricator.wikimedia.org/P56168 and previous config saved to /var/cache/conftool/dbconfig/20240203-011337-marostegui.json
[01:13:39] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[01:13:51] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[01:13:52] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[01:34:03] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_event.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:49:29] <wikibugs>	 (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/995369 (https://phabricator.wikimedia.org/T350995) (owner: 10Andrew Bogott)
[01:55:56] <wikibugs>	 (03PS4) 10Andrew Bogott: OpenStack Designate: move from cloudservices to cloudcontrols in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/995369 (https://phabricator.wikimedia.org/T350995)
[01:58:35] <wikibugs>	 (03PS5) 10Andrew Bogott: OpenStack Designate: move from cloudservices to cloudcontrols in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/995369 (https://phabricator.wikimedia.org/T350995)
[01:58:55] <wikibugs>	 (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/995369 (https://phabricator.wikimedia.org/T350995) (owner: 10Andrew Bogott)
[02:10:54] <wikibugs>	 (03PS6) 10Andrew Bogott: OpenStack Designate: move from cloudservices to cloudcontrols in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/995369 (https://phabricator.wikimedia.org/T350995)
[02:11:09] <wikibugs>	 (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/995369 (https://phabricator.wikimedia.org/T350995) (owner: 10Andrew Bogott)
[02:34:16] <wikibugs>	 (03PS7) 10Andrew Bogott: OpenStack Designate: move from cloudservices to cloudcontrols in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/995369 (https://phabricator.wikimedia.org/T350995)
[02:35:01] <wikibugs>	 (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/995369 (https://phabricator.wikimedia.org/T350995) (owner: 10Andrew Bogott)
[02:39:00] <wikibugs>	 (03PS8) 10Andrew Bogott: OpenStack Designate: move from cloudservices to cloudcontrols in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/995369 (https://phabricator.wikimedia.org/T350995)
[02:39:31] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:40:41] <wikibugs>	 (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/995369 (https://phabricator.wikimedia.org/T350995) (owner: 10Andrew Bogott)
[02:41:45] <jinxer-wm>	 (SwiftTooManyMediaUploads) firing: Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[02:44:30] <wikibugs>	 (03PS9) 10Andrew Bogott: OpenStack Designate: move from cloudservices to cloudcontrols in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/995369 (https://phabricator.wikimedia.org/T350995)
[02:44:41] <wikibugs>	 (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/995369 (https://phabricator.wikimedia.org/T350995) (owner: 10Andrew Bogott)
[02:46:35] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 6/7 UP : OSPFv3: 6/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:47:37] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqdfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:51:22] <wikibugs>	 (03PS10) 10Andrew Bogott: OpenStack Designate: move from cloudservices to cloudcontrols in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/995369 (https://phabricator.wikimedia.org/T350995)
[02:51:28] <wikibugs>	 (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/995369 (https://phabricator.wikimedia.org/T350995) (owner: 10Andrew Bogott)
[02:51:45] <jinxer-wm>	 (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads  - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[03:07:33] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqdfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:08:03] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:09:31] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:21:45] <jinxer-wm>	 (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads  - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[03:31:45] <jinxer-wm>	 (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads  - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[04:11:57] <icinga-wm>	 PROBLEM - ElasticSearch unassigned shard check - 9400 on cloudelastic1010 is CRITICAL: CRITICAL - dgawiki_content_first[0](2024-01-30T20:52:14.401Z), bmwikiquote_general_1692325479[0](2024-01-30T20:52:14.402Z), ilowiki_content_1682003872[0](2024-01-30T20:52:14.403Z), nlwikinews_content_1682132908[0](2024-01-30T20:52:14.398Z), bjnwikiquote_content_first[0](2024-01-30T20:52:14.400Z), napwikisource_content_1682124288[0](2024-01-30T20:52:14.4
[04:11:57] <icinga-wm>	 wiki_content_1682324242[0](2024-01-30T20:52:14.402Z), bpywiki_general_1692375521[0](2024-01-30T20:52:14.399Z), angwikiquote_content_1692144583[0](2024-01-30T20:52:14.402Z), fiwikivoyage_general_1693144275[0](2024-01-30T20:52:14.399Z), chowiki_content_1692531553[0](2024-01-30T20:52:14.401Z), btmwiktionary_general_1692403885[0](2024-01-30T20:52:14.400Z), hewikinews_general_1681947456[0](2024-01-30T20:52:14.403Z), kowikiquote_general_1682063
[04:11:57] <icinga-wm>	 024-01-30T20:52:14.401Z), ltwikiquote_general_1682083320[0](2024-01-30T20:52:14.400Z), cywikiquote_general_1692581318[0](2024-01-30T20:52:14.402Z), hiwikimedia_content_1681961237[0](2024-01-30T20:52:14.402Z), e https://wikitech.wikimedia.org/wiki/Search%23Administration
[04:19:19] <icinga-wm>	 PROBLEM - ElasticSearch unassigned shard check - 9400 on cloudelastic1009 is CRITICAL: CRITICAL - quwikibooks_content_1682184136[0](2024-01-30T20:52:14.403Z), napwikisource_content_1682124288[0](2024-01-30T20:52:14.401Z), newiktionary_content_1682126793[0](2024-01-30T20:52:14.403Z), knwikiquote_general_1682050558[0](2024-01-30T20:52:14.401Z), angwikiquote_content_1692144583[0](2024-01-30T20:52:14.402Z), bewikisource_general_1692300925[0](
[04:19:19] <icinga-wm>	 30T20:52:14.401Z), dgawiki_general_first[0](2024-01-30T20:52:14.400Z), bewikibooks_general_1692297309[0](2024-01-30T20:52:14.401Z), ttwiktionary_content_1682372674[0](2024-01-30T20:52:14.401Z), kgwiki_general_1682046219[0](2024-01-30T20:52:14.402Z), fywiktionary_content_1693308635[0](2024-01-30T20:52:14.400Z), labtestwiki_content_1682073064[0](2024-01-30T20:52:14.401Z), cswikiversity_content_1692566020[0](2024-01-30T20:52:14.399Z), angwik
[04:19:19] <icinga-wm>	 general_1692144613[0](2024-01-30T20:52:14.399Z), arwikinews_general_1692183507[0](2024-01-30T20:52:14.402Z), zuwiktionary_content_1682468427[0](2024-01-30T20:52:14.402Z), hiwikimedia_general_1681961248[0](2024- https://wikitech.wikimedia.org/wiki/Search%23Administration
[04:32:11] <icinga-wm>	 PROBLEM - ElasticSearch unassigned shard check - 9400 on cloudelastic1007 is CRITICAL: CRITICAL - blkwiki_general_1692325160[0](2024-01-30T20:52:14.400Z), swwiktionary_content_1682336016[0](2024-01-30T20:52:14.399Z), fiwikiquote_general_1693141971[0](2024-01-30T20:52:14.403Z), tswiki_general_1682369644[0](2024-01-30T20:52:14.398Z), bxrwiki_content_1692405982[0](2024-01-30T20:52:14.403Z), sahwikisource_content_1682223239[0](2024-01-30T20:5
[04:32:11] <icinga-wm>	 Z), zh_min_nanwikiquote_content_1682432826[0](2024-01-30T20:52:14.402Z), gawikibooks_content_1693313037[0](2024-01-30T20:52:14.399Z), map_bmswiki_content_1682087026[0](2024-01-30T20:52:14.399Z), ttwikibooks_content_1682372612[0](2024-01-30T20:52:14.399Z), fiwikibooks_general_1693141213[0](2024-01-30T20:52:14.399Z), ruewiki_content_1682184981[0](2024-01-30T20:52:14.403Z), rmwiki_content_1682184205[0](2024-01-30T20:52:14.402Z), arwikibooks_
[04:32:11] <icinga-wm>	 1692181666[0](2024-01-30T20:52:14.402Z), tawikibooks_content_1682339007[0](2024-01-30T20:52:14.403Z), lijwikisource_general_1682077655[0](2024-01-30T20:52:14.403Z), bawikibooks_general_1692285898[0](2024-01-30T https://wikitech.wikimedia.org/wiki/Search%23Administration
[04:46:31] <icinga-wm>	 PROBLEM - ElasticSearch unassigned shard check - 9400 on cloudelastic1008 is CRITICAL: CRITICAL - fywiktionary_content_1693308635[0](2024-01-30T20:52:14.400Z), sswiki_content_1682324242[0](2024-01-30T20:52:14.402Z), kawiktionary_general_1682045987[0](2024-01-30T20:52:14.402Z), tawikinews_content_1682339054[0](2024-01-30T20:52:14.403Z), hakwiki_content_1681938141[0](2024-01-30T20:52:14.402Z), ilowiki_content_1682003872[0](2024-01-30T20:52:
[04:46:31] <icinga-wm>	 , abwiktionary_content_1692132157[0](2024-01-30T20:52:14.401Z), cywikiquote_content_1692581299[0](2024-01-30T20:52:14.401Z), fiwikivoyage_general_1693144275[0](2024-01-30T20:52:14.399Z), angwikisource_general_1692144613[0](2024-01-30T20:52:14.399Z), iawikibooks_general_1681994016[0](2024-01-30T20:52:14.403Z), mniwiki_general_1682113849[0](2024-01-30T20:52:14.399Z), kowikiquote_general_1682063572[0](2024-01-30T20:52:14.401Z), bxrwiki_conte
[04:46:31] <icinga-wm>	 05982[0](2024-01-30T20:52:14.403Z), btmwiktionary_general_1692403885[0](2024-01-30T20:52:14.400Z), pswiktionary_content_1682172419[0](2024-01-30T20:52:14.402Z), hiwikimedia_general_1681961248[0](2024-01-30T20:5 https://wikitech.wikimedia.org/wiki/Search%23Administration
[07:12:21] <icinga-wm>	 PROBLEM - ElasticSearch unassigned shard check - 9400 on cloudelastic1002 is CRITICAL: CRITICAL - mhrwiki_general_1682106101[0](2024-01-30T20:52:14.403Z), zhwikiversity_content_1682458435[0](2024-01-30T20:52:14.400Z), ttwiktionary_content_1682372674[0](2024-01-30T20:52:14.401Z), ruewiki_content_1682184981[0](2024-01-30T20:52:14.403Z), map_bmswiki_content_1682087026[0](2024-01-30T20:52:14.399Z), ltwikiquote_content_1682083282[0](2024-01-30
[07:12:21] <icinga-wm>	 4.403Z), kaawiki_content_1682043722[0](2024-01-30T20:52:14.403Z), cywikiquote_general_1692581318[0](2024-01-30T20:52:14.402Z), kuwiki_general_1682069766[0](2024-01-30T20:52:14.400Z), hiwikimedia_general_1681961248[0](2024-01-30T20:52:14.399Z), hiwikimedia_content_1681961237[0](2024-01-30T20:52:14.402Z), sswiki_content_1682324242[0](2024-01-30T20:52:14.402Z), tawikinews_content_1682339054[0](2024-01-30T20:52:14.403Z), mywikibooks_general_1
[07:12:21] <icinga-wm>	 9[0](2024-01-30T20:52:14.402Z), olowiki_general_1682147477[0](2024-01-30T20:52:14.399Z), kswikiquote_content_1682069213[0](2024-01-30T20:52:14.402Z), nnwiktionary_content_1682139542[0](2024-01-30T20:52:14.399Z) https://wikitech.wikimedia.org/wiki/Search%23Administration
[07:18:39] <icinga-wm>	 PROBLEM - ElasticSearch unassigned shard check - 9400 on cloudelastic1001 is CRITICAL: CRITICAL - aswikisource_content_1692252094[0](2024-01-30T20:52:14.400Z), bpywiki_general_1692375521[0](2024-01-30T20:52:14.399Z), kabwiki_content_1682043807[0](2024-01-30T20:52:14.400Z), kowikiquote_general_1682063572[0](2024-01-30T20:52:14.401Z), iewikibooks_content_1682003352[0](2024-01-30T20:52:14.401Z), pawikibooks_general_1682150992[0](2024-01-30T2
[07:18:39] <icinga-wm>	 400Z), guwwikinews_content_first[0](2024-01-30T20:52:14.400Z), tlywiki_general_first[0](2024-01-30T20:52:14.402Z), swwiktionary_content_1682336016[0](2024-01-30T20:52:14.399Z), iawikibooks_general_1681994016[0](2024-01-30T20:52:14.403Z), angwikiquote_content_1692144583[0](2024-01-30T20:52:14.402Z), mhwiki_general_1682106175[0](2024-01-30T20:52:14.403Z), fiwikivoyage_general_1693144275[0](2024-01-30T20:52:14.399Z), rmwiktionary_content_168
[07:18:39] <icinga-wm>	 0](2024-01-30T20:52:14.402Z), bawikibooks_general_1692285898[0](2024-01-30T20:52:14.403Z), fawiktionary_content_1693123829[0](2024-01-30T20:52:14.402Z), madwiki_content_1682086579[0](2024-01-30T20:52:14.400Z), https://wikitech.wikimedia.org/wiki/Search%23Administration
[07:21:55] <icinga-wm>	 PROBLEM - ElasticSearch unassigned shard check - 9400 on cloudelastic1003 is CRITICAL: CRITICAL - extwiki_content_1693076991[0](2024-01-30T20:52:14.400Z), iawikibooks_general_1681994016[0](2024-01-30T20:52:14.403Z), eewiki_content_1692733368[0](2024-01-30T20:52:14.403Z), fiwikibooks_general_1693141213[0](2024-01-30T20:52:14.399Z), dkwikimedia_general_1692731509[0](2024-01-30T20:52:14.398Z), chowiki_content_1692531553[0](2024-01-30T20:52:1
[07:21:55] <icinga-wm>	  bpywiki_general_1692375521[0](2024-01-30T20:52:14.399Z), cowiktionary_general_1692536869[0](2024-01-30T20:52:14.400Z), arwikibooks_general_1692181666[0](2024-01-30T20:52:14.402Z), kswikiquote_content_1682069213[0](2024-01-30T20:52:14.402Z), aswiki_general_1692250469[0](2024-01-30T20:52:14.403Z), kgwiki_general_1682046219[0](2024-01-30T20:52:14.402Z), tewikiquote_content_1682345381[0](2024-01-30T20:52:14.400Z), hewikinews_general_16819474
[07:21:55] <icinga-wm>	 24-01-30T20:52:14.403Z), pswiktionary_content_1682172419[0](2024-01-30T20:52:14.402Z), extwiki_general_1693077226[0](2024-01-30T20:52:14.403Z), krwikiquote_general_1682068959[0](2024-01-30T20:52:14.399Z), zuwik https://wikitech.wikimedia.org/wiki/Search%23Administration
[07:25:35] <icinga-wm>	 PROBLEM - ElasticSearch unassigned shard check - 9400 on cloudelastic1006 is CRITICAL: CRITICAL - tswiki_general_1682369644[0](2024-01-30T20:52:14.398Z), bawikibooks_general_1692285898[0](2024-01-30T20:52:14.403Z), mhrwiki_general_1682106101[0](2024-01-30T20:52:14.403Z), hewikinews_general_1681947456[0](2024-01-30T20:52:14.403Z), arwikiquote_content_1692184167[0](2024-01-30T20:52:14.399Z), fowiktionary_content_1693152223[0](2024-01-30T20:
[07:25:35] <icinga-wm>	 0Z), chowiki_content_1692531553[0](2024-01-30T20:52:14.401Z), angwikisource_general_1692144613[0](2024-01-30T20:52:14.399Z), nowiktionary_general_1682145425[0](2024-01-30T20:52:14.402Z), cywiktionary_content_1692583389[0](2024-01-30T20:52:14.401Z), afwiktionary_general_1692140122[0](2024-01-30T20:52:14.400Z), hiwikimedia_content_1681961237[0](2024-01-30T20:52:14.402Z), azwikibooks_general_1692273436[0](2024-01-30T20:52:14.403Z), quwiktion
[07:25:35] <icinga-wm>	 ent_1682184179[0](2024-01-30T20:52:14.401Z), dgawiki_general_first[0](2024-01-30T20:52:14.400Z), gnwiki_general_1681935910[0](2024-01-30T20:52:14.402Z), thwikiquote_content_1682353715[0](2024-01-30T20:52:14.401 https://wikitech.wikimedia.org/wiki/Search%23Administration
[07:48:31] <icinga-wm>	 PROBLEM - ElasticSearch unassigned shard check - 9400 on cloudelastic1004 is CRITICAL: CRITICAL - thwikiquote_content_1682353715[0](2024-01-30T20:52:14.401Z), mniwiki_content_1682113784[0](2024-01-30T20:52:14.399Z), sahwikisource_content_1682223239[0](2024-01-30T20:52:14.402Z), elwikiversity_content_1692753938[0](2024-01-30T20:52:14.399Z), trwikivoyage_content_1684520752[0](2024-01-30T20:52:14.401Z), ocwiktionary_content_1682147094[0](202
[07:48:31] <icinga-wm>	 20:52:14.401Z), mniwiki_general_1682113849[0](2024-01-30T20:52:14.399Z), iowiki_content_1682009405[0](2024-01-30T20:52:14.401Z), extwiki_content_1693076991[0](2024-01-30T20:52:14.400Z), kbdwiki_general_1682046061[0](2024-01-30T20:52:14.403Z), napwikisource_content_1682124288[0](2024-01-30T20:52:14.401Z), ttwikiquote_content_1682372652[0](2024-01-30T20:52:14.402Z), tawikinews_content_1682339054[0](2024-01-30T20:52:14.403Z), rmwiktionary_co
[07:48:31] <icinga-wm>	 82184443[0](2024-01-30T20:52:14.402Z), suwiki_general_1682325319[0](2024-01-30T20:52:14.399Z), testcommonswiki_general_1686951842[0](2024-01-30T20:52:14.400Z), lijwikisource_general_1682077655[0](2024-01-30T20: https://wikitech.wikimedia.org/wiki/Search%23Administration
[07:52:53] <icinga-wm>	 PROBLEM - ElasticSearch unassigned shard check - 9400 on cloudelastic1005 is CRITICAL: CRITICAL - niawiki_general_1682127438[0](2024-01-30T20:52:14.403Z), amwiktionary_content_1692144240[0](2024-01-30T20:52:14.399Z), azwikibooks_general_1692273436[0](2024-01-30T20:52:14.403Z), fawiktionary_content_1693123829[0](2024-01-30T20:52:14.402Z), bawikibooks_general_1692285898[0](2024-01-30T20:52:14.403Z), kowikiquote_general_1682063572[0](2024-01
[07:52:53] <icinga-wm>	 2:14.401Z), cywikiquote_general_1692581318[0](2024-01-30T20:52:14.402Z), rmwiki_content_1682184205[0](2024-01-30T20:52:14.402Z), fiwikivoyage_general_1693144275[0](2024-01-30T20:52:14.399Z), angwikibooks_content_1692144545[0](2024-01-30T20:52:14.400Z), dkwikimedia_general_1692731509[0](2024-01-30T20:52:14.398Z), mnwiktionary_content_1682114598[0](2024-01-30T20:52:14.403Z), fywiktionary_content_1693308635[0](2024-01-30T20:52:14.400Z), guwi
[07:52:53] <icinga-wm>	 _content_1681937816[0](2024-01-30T20:52:14.402Z), eewiki_content_1692733368[0](2024-01-30T20:52:14.403Z), shnwikivoyage_general_1682229250[0](2024-01-30T20:52:14.402Z), avwiki_general_1692258005[0](2024-01-30T2 https://wikitech.wikimedia.org/wiki/Search%23Administration
[08:00:39] <ryankemper>	 ^ taking a look
[08:09:24] <ryankemper>	 !log [cloudelastic] current state: `{"cluster_name":"cloudelastic-omega-eqiad","status":"yellow","number_of_nodes":10,"number_of_data_nodes":10,"active_primary_shards":798,"active_shards":1438,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":160,"delayed_unassigned_shards":0,"active_shards_percent_as_number":89.98748435544431}`
[08:09:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:03] <ryankemper>	 !log [cloudelastic] Seeing `replica allocations are forbidden due to cluster setting [cluster.routing.allocation.enable=primaries]`; that likely explains the many unassigned shards of cloudelastic.wikimedia.org:9400 ... feels like a previous cookbook run didn't back out successfully, leaving replica allocation disabled
[08:10:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:15:58] <ryankemper>	 !log [cloduelastic] Re-enabled replica allocation on `cloudelastic-omega-eqiad` => `curl -H 'Content-Type: application/json' -XPUT https://cloudelastic.wikimedia.org:9443/_cluster/settings -d '{"transient":{"cluster.routing.allocation":{"enable": "all"}}}'`
[08:16:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:16:13] <ryankemper>	 meh, wrote `cloduelastic` instead :P
[08:19:22] <ryankemper>	 !log [cloudelastic] Replica shards have re-initialized; cluster is back to green. Will probably see a wall of `ElasticSearch unassigned shard check - 9400` resolve messages soon, fingers crossed
[08:19:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:20:45] <jinxer-wm>	 (SwiftTooManyMediaUploads) firing: Too many codfw mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[08:23:19] <icinga-wm>	 RECOVERY - ElasticSearch unassigned shard check - 9400 on cloudelastic1001 is OK: OK - All good https://wikitech.wikimedia.org/wiki/Search%23Administration
[08:23:19] <icinga-wm>	 RECOVERY - ElasticSearch unassigned shard check - 9400 on cloudelastic1005 is OK: OK - All good https://wikitech.wikimedia.org/wiki/Search%23Administration
[08:23:19] <icinga-wm>	 RECOVERY - ElasticSearch unassigned shard check - 9400 on cloudelastic1004 is OK: OK - All good https://wikitech.wikimedia.org/wiki/Search%23Administration
[08:23:19] <icinga-wm>	 RECOVERY - ElasticSearch unassigned shard check - 9400 on cloudelastic1002 is OK: OK - All good https://wikitech.wikimedia.org/wiki/Search%23Administration
[08:23:19] <icinga-wm>	 RECOVERY - ElasticSearch unassigned shard check - 9400 on cloudelastic1003 is OK: OK - All good https://wikitech.wikimedia.org/wiki/Search%23Administration
[08:23:19] <icinga-wm>	 RECOVERY - ElasticSearch unassigned shard check - 9400 on cloudelastic1007 is OK: OK - All good https://wikitech.wikimedia.org/wiki/Search%23Administration
[08:23:20] <icinga-wm>	 RECOVERY - ElasticSearch unassigned shard check - 9400 on cloudelastic1009 is OK: OK - All good https://wikitech.wikimedia.org/wiki/Search%23Administration
[08:23:20] <icinga-wm>	 RECOVERY - ElasticSearch unassigned shard check - 9400 on cloudelastic1006 is OK: OK - All good https://wikitech.wikimedia.org/wiki/Search%23Administration
[08:23:21] <icinga-wm>	 RECOVERY - ElasticSearch unassigned shard check - 9400 on cloudelastic1008 is OK: OK - All good https://wikitech.wikimedia.org/wiki/Search%23Administration
[08:23:21] <icinga-wm>	 RECOVERY - ElasticSearch unassigned shard check - 9400 on cloudelastic1010 is OK: OK - All good https://wikitech.wikimedia.org/wiki/Search%23Administration
[08:25:45] <jinxer-wm>	 (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads  - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[08:50:45] <jinxer-wm>	 (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads  - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[09:05:57] <icinga-wm>	 PROBLEM - Docker registry HTTPS interface on registry1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Docker
[09:07:21] <icinga-wm>	 RECOVERY - Docker registry HTTPS interface on registry1003 is OK: HTTP OK: HTTP/1.1 200 OK - 3746 bytes in 1.582 second response time https://wikitech.wikimedia.org/wiki/Docker
[10:21:07] <jinxer-wm>	 (MediaWikiEditFailures) firing: (2) Elevated MediaWiki edit failures (session_loss) for cluster appserver - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures
[10:26:06] <jinxer-wm>	 (MediaWikiEditFailures) resolved: (2) Elevated MediaWiki edit failures (session_loss) for cluster appserver - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures
[10:44:53] <wikibugs>	 10SRE, 10SRE-swift-storage, 10Data-Persistence, 10Thumbor, and 2 others: Changing default image thumbnail size on English Wikipedia - https://phabricator.wikimedia.org/T355914 (10TheDJ) >>! In T355914#9510509, @Redrose64 wrote: >>>! In T355914#9501705, @Joe wrote: >> Given the chosen size is both non-stand...
[10:55:57] <icinga-wm>	 PROBLEM - cassandra-b CQL 10.192.48.122:9042 on restbase2017 is CRITICAL: connect to address 10.192.48.122 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[10:56:41] <icinga-wm>	 PROBLEM - cassandra-b SSL 10.192.48.122:7000 on restbase2017 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
[12:16:15] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 45, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:16:19] <icinga-wm>	 PROBLEM - Router interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 69, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:31:31] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 46, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:31:37] <icinga-wm>	 RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 70, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[13:30:33] <logmsgbot>	 !log eevans@cumin1002 START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2017.codfw.wmnet with reason: Decommissioning — T352469
[13:30:47] <logmsgbot>	 !log eevans@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2017.codfw.wmnet with reason: Decommissioning — T352469
[13:30:48] <stashbot>	 T352469: Decommission restbase20[13-20]) - https://phabricator.wikimedia.org/T352469
[13:40:55] <icinga-wm>	 PROBLEM - Swift https backend on ms-fe1013 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Swift
[13:42:17] <icinga-wm>	 RECOVERY - Swift https backend on ms-fe1013 is OK: HTTP OK: HTTP/1.1 200 OK - 501 bytes in 0.059 second response time https://wikitech.wikimedia.org/wiki/Swift
[14:39:31] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:59:31] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:05:52] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q3:rack/setup/install cloudcephosd10(3[5-9]|40) - https://phabricator.wikimedia.org/T324998 (10Volans) There are pending DNS changes in Netbox not committed to the auto-generated DNS repository related to those hosts since yesterday:  ` Fri 22...
[16:55:23] <icinga-wm>	 PROBLEM - Swift https backend on ms-fe1009 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Swift
[16:56:47] <icinga-wm>	 RECOVERY - Swift https backend on ms-fe1009 is OK: HTTP OK: HTTP/1.1 200 OK - 501 bytes in 0.138 second response time https://wikitech.wikimedia.org/wiki/Swift
[17:03:16] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[17:18:17] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[17:28:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[17:33:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[19:01:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[19:06:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[19:31:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: codfw parsoid GET/200: 3.4321265794755864s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[19:40:07] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - No response from remote host 208.80.153.193 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[21:42:55] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[21:44:17] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.253 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[22:06:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: codfw parsoid GET/200: 2.4851127085001004s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:22:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: codfw parsoid GET/200: 4.503494906478542s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:27:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: codfw parsoid GET/200: 4.053708267554382s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:27:45] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: codfw parsoid GET/200: 3.485202046976389s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:32:45] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: codfw parsoid GET/200: 4.053708267554382s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:33:45] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: codfw parsoid GET/200: 3.7558366376170373s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:53:45] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: codfw parsoid GET/200: 4.160105819276109s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:54:00] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: codfw parsoid GET/200: 3.51669949281251s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded