Blue Series 2-1 Signal / Routing / Performance Issue Troubleshooting Thread

I had a chance to review this some more. The NWK_TABLE_FULL error is on the FAQ page for Z2M; Koenkk says that it happens when the wireless spectrum is congested. Not saying this is the case here, but it’s worth evaluating if you haven’t yet.

FWIW you can see from my network map here that the blue series generally have good LQI to the coordinator and very strong routes to each other (Z2M, Sonoff-P latest firmware)

The bulbs, however, will not consistently route via the switches. All of those bulbs have direct line of sight to at least one switch, but they instead route directly to the coordinator with very low LQI. A few bulbs seem to have stuck with the switches, and they have very high LQI as a result. Per my other thread, these are all Sengled Zigbee 3.0 bulbs.

One other oddity I noticed that isn’t shown on this network map – I have one location where a blue series refuses to even pair. It’s ~10ft from the coordinator, one corner wall in the way. Tried 2 different switches which worked in other locations, box is not metal, etc. Got it to pair once but the LQI was 1 and it quickly dropped. Very strange.

Got some plugs in yesterday to use as stand-in routers. Every switch that was having trouble pairing connected immediately once the plugs were strategically placed and joined to the network.

I think this screenshot sums up the issue well. I put all Inovelli switches in the top row, plugs below, and the coordinator at the bottom.

As you can see, not a single Inovelli switch has a direct route to another Inovelli switch. Switches are at an inherent RF disadvantage because they sit in a box inside a wall, but that does not explain the routing behavior: several of these switches are 10-15 feet from one another with direct LOS, so at least some of them should be able to establish a minimal connection with at least one other. The plugs, on the other hand, have an almost perfect star mesh, with only one plug too far from another to make any sort of connection.

All that said, the issue seems to be a difference between how the switch handles inbound and outbound connection requests. The switches seem perfectly fine when sending outbound requests, but inbound requests fall on deaf ears.

Given the hardware seems to be functioning properly in all other respects, my assumption is that this is on the software side, which could be a number of things: the switch is failing to parse inbound requests, the LQI threshold required to establish a connection is set absurdly high, etc. I’ll have to dig through my logs and the source code of zigpy to see if there is a common thread (matter puns? no? ok…). Given the issue also persists with Zigbee2MQTT, which doesn’t seem to rely on zigpy per se, it may be more of an overall protocol interpretation issue, but it could also be an artifact of efficient development processes.

Anywho hope this is helpful and I’ll keep digging.

Everyone having routing issues: can you provide the first 12 characters of your device IEEE addresses?

Something I am observing is that there are several IEEE prefixes among all switches and issues seem to mostly affect this prefix: 94:34:69:ff:fe:08

There is another that starts with 38: that I don’t think I have enough information on yet.
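For anyone collecting addresses, here is a minimal Python sketch of how the comparison could be done: it normalizes an address written either as `0x…` or as colon-separated bytes and groups devices by their first 12 hex characters. The function names are made up for illustration, and it assumes every address is a full 64-bit IEEE address (16 hex digits).

```python
from collections import defaultdict


def normalize_ieee(addr: str) -> str:
    """Return the address as 16 lowercase hex digits with no separators."""
    s = addr.strip().lower()
    if s.startswith("0x"):
        s = s[2:]
    s = s.replace(":", "")
    if len(s) != 16 or any(c not in "0123456789abcdef" for c in s):
        raise ValueError(f"not a 64-bit IEEE address: {addr!r}")
    return s


def prefix12(addr: str) -> str:
    """First 12 hex digits, formatted like 94:34:69:ff:fe:08."""
    h = normalize_ieee(addr)[:12]
    return ":".join(h[i:i + 2] for i in range(0, 12, 2))


def group_by_prefix(addrs):
    """Map each 12-character prefix to the addresses that share it."""
    groups = defaultdict(list)
    for a in addrs:
        groups[prefix12(a)].append(a)
    return dict(groups)
```

Calling `group_by_prefix()` on a list of your own device addresses gives one bucket per prefix, which makes it easy to see whether the problem units cluster together.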

Appreciate you looking into it further. I’ve tried a bunch of removals and re-interviews, and I don’t think I’m seeing the NWK_TABLE_FULL error anymore. Now it simply says “pairing failed”, with no further red box pop-ups like the first time a couple of days ago.

Here are the Z2M logs of a re-interview attempt:

Info 2022-10-30 09:58:18 Starting interview of '0x943469fffe089d93'
Info 2022-10-30 09:58:18 MQTT publish: topic 'zigbee2mqtt/bridge/event', payload '{"data":{"friendly_name":"0x943469fffe089d93","ieee_address":"0x943469fffe089d93","status":"started"},"type":"device_interview"}'
Info 2022-10-30 09:58:18 MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"message":"interview_started","meta":{"friendly_name":"0x943469fffe089d93"},"type":"pairing"}'
Error 2022-10-30 09:58:38 Failed to interview '0x943469fffe089d93', device has not successfully been paired
Info 2022-10-30 09:58:38 MQTT publish: topic 'zigbee2mqtt/bridge/event', payload '{"data":{"friendly_name":"0x943469fffe089d93","ieee_address":"0x943469fffe089d93","status":"failed"},"type":"device_interview"}'
Info 2022-10-30 09:58:38 MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"message":"interview_failed","meta":{"friendly_name":"0x943469fffe089d93"},"type":"pairing"}'
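
For anyone comparing several attempts, a small Python sketch like the one below can pull the timestamps out of log lines shaped like the ones above and report how long each interview ran before it failed. It assumes the `Level YYYY-MM-DD HH:MM:SS message` layout seen in this paste (it tolerates a missing space after the timestamp), and the function name is made up for illustration.

```python
import re
from datetime import datetime

# Level, timestamp, then the message (the space after the timestamp is optional).
LINE_RE = re.compile(
    r"^(?P<level>\w+)\s+(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s*(?P<msg>.*)$"
)
START_RE = re.compile(r"Starting interview of '(?P<ieee>0x[0-9a-f]+)'")
FAIL_RE = re.compile(r"Failed to interview '(?P<ieee>0x[0-9a-f]+)'")


def interview_durations(lines):
    """Map IEEE address -> seconds between interview start and failure."""
    started = {}
    durations = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a log line in the expected layout
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        if (s := START_RE.search(m["msg"])):
            started[s["ieee"]] = ts
        elif (f := FAIL_RE.search(m["msg"])) and f["ieee"] in started:
            durations[f["ieee"]] = (ts - started.pop(f["ieee"])).total_seconds()
    return durations


log = [
    "Info 2022-10-30 09:58:18 Starting interview of '0x943469fffe089d93'",
    "Error 2022-10-30 09:58:38 Failed to interview '0x943469fffe089d93', device has not successfully been paired",
]
print(interview_durations(log))  # {'0x943469fffe089d93': 20.0}
```

For the attempt above, that is 20 seconds between the interview starting and the failure being logged.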

Would wireless congestion potentially explain a failed interview/pairing, but the item still showing up as online (at least initially, for a while)?

Edit: @dmulcahey interesting theory, but my two Blues are as follows:

  • 0x943469fffe089a61 (Interview/pairing passed first try, and has been rock solid ‘online’ ever since)
  • 0x943469fffe089d93 (I can get it to connect/show online, but it says pairing/interview failed, and eventually goes offline hours later)

If the network is dense enough they will work fine… the signal quality is just very low.

These two are 3 feet apart: the one in the 30s has a prefix in the range I mentioned, the other doesn’t. Everything works because I have 166 devices on the network, and these (with the 9x prefix) are the only devices with neighbor LQI like this. The density of my network makes the lower-quality signal non-terminal, but for folks just starting out it will be problematic, especially when trying to pair across multiple rooms or in messy RF environments.

There isn’t a lot of information to go off in those logs. Can you follow the debug process here: https://www.zigbee2mqtt.io/guide/usage/debug.html#enabling-logging

That will give more details about what’s going on.

I tried pairing two switches from my 10-pk yesterday with my freshly set up Home Assistant ZHA instance. No dice. The closest switch is about 15 feet from the coordinator, with plastic switch housings in the walls. Thinking it was an issue with ZHA/ConBee II, I ordered a Hubitat delivered overnight, and today I wasted a lot of time trying to pair the switches to the Hubitat (after disconnecting my Home Assistant instance from power). Same issue: the switches will not pair to either coordinator I have.

EDIT: Submitted the form from OP twice to account for each of the two coordinators I tried using.

We have a working theory that we’re testing behind the scenes that I hope we can get resolution to tonight or tomorrow.

I also have some further probing questions based on this new theory that I’ll be posting tonight. We’d be super grateful if the same ppl with issues could fill it out.

We appreciate everyone’s patience and again, really sorry for the frustration. I truly am.

Sorry to be a pain, but is there any update on this? I stopped installing the switches until further notice because it’s unclear whether I’m going to need to send them back. With the 3 of 20 I’ve installed, I’m having the same issue as everyone else: only one would pair, the LQI is low, and it doesn’t seem to be extending my mesh strength.

Yes, the engineers are looking into our theory. I should have more info tonight when they are back in the office (they are in China).

If possible, can you provide us the IEEE addresses of the devices in question?

My only installed switch has address 0x385b44fffee8e9d5. Hoping the fix pans out! If so, I’ll build a little harness so I can power up my remaining 19 switches right next to the coordinator for better odds of OTA success.

Edit: I realize you never actually said the problems would be addressed with a firmware fix, but it seems to me this is the most likely scenario

Awesome thanks - I’ll elaborate our working theory in 30 min or so. I’m hoping it’s a firmware fix (seems like some weren’t tuned).

Dropping my girls off now and will be in the office shortly!

That’s great to hear. I’m hoping this is all a quick firmware fix. Just in case this helps, I’ve seen all these devices go offline and stop responding entirely. I’d get them to pair initially, but then at some point they just stopped communicating. It’s really odd because I have around 20 mains-powered Zigbee 3.0 devices acting as routers.

0x385b44fffeee1770
0x943469fffe05d42f
0x943469fffe05c98a
0x943469fffe05cea5
0x943469fffe05d2f8
0x943469fffe089db5
0x943469fffe088f05
0x943469fffe05ce8c
0x943469fffe05cea8

These are the three that are acting up for me:

04:0d:84:ff:fe:02:b5:1f
04:0d:84:ff:fe:02:b4:97
04:0d:84:ff:fe:05:f6:fb

Out of curiosity, I wonder how a firmware fix could be applied for those of us who cannot get the switches paired to our coordinator/network.

Well this is awkward, I thought that message was a PM from @epow as we’ve been chatting offline. I wasn’t ready to let everyone know yet as it’s still a theory, but I guess there’s no harm in letting everyone know what we’re testing! If anything, there may be some other extremely intelligent people out there with Zigbee background that could help.

So, the working theory is that some of the modules were not tuned or were improperly calibrated. This is something similar to what appears to have happened to ITead in this thread: [BUG] Missed to tune and set HFXO Capacitor Bank calibration value (CTune) in firmware fo ITead Zigbee 3.0 USB Dongle Model 9888010100045 with hardware version 1.3 · Issue #4 · xsp1989/zigbeeFirmware · GitHub

This would explain why all the beta units are exhibiting zero issues and why we’re all so perplexed as to why this is happening.

We’re still gathering data on how many batches were sourced for this production run, but so far we are seeing a pattern in the IEEE addresses: units starting with 38:5b:44:ff:fe:ee and 94:34:69:ff:fe:08 are all showing really low LQI numbers.
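
If you want to check your own switches against those two prefixes, a rough Python sketch follows. The helper names are hypothetical, and the sample list simply reuses a few addresses already posted in this thread.

```python
# The two prefixes named above, written as bare hex digits.
SUSPECT_PREFIXES = ("385b44fffeee", "943469fffe08")


def to_hex(addr: str) -> str:
    """Strip '0x' and ':' so both notations used in this thread compare equal."""
    s = addr.strip().lower().replace(":", "")
    return s[2:] if s.startswith("0x") else s


def is_suspect(addr: str) -> bool:
    """True if the address starts with one of the suspect prefixes."""
    return to_hex(addr).startswith(SUSPECT_PREFIXES)


reported = [
    "0x943469fffe089d93",
    "0x385b44fffeee1770",
    "0x385b44fffee8e9d5",
    "0x943469fffe05d42f",
    "04:0d:84:ff:fe:02:b5:1f",
]
for addr in reported:
    print(addr, "suspect" if is_suspect(addr) else "other prefix")
```

Note that several addresses reported above (for example the 94:34:69:ff:fe:05 and 04:0d:84 units) do not match either prefix, so treat this as a first-pass filter only.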

What we’re currently working on is adjusting the ctune value to see if that helps at all.

Really hoping this is the issue. We should know more tonight or tomorrow, as the engineers are 12 hours ahead. We do have some local volunteers working on it as well (very thankful for them!).

Great question – I don’t have the answer to this right now, but off the top of my head, we could either flash it ourselves (we have a rig that can be put within inches of the coordinator), we could ask the manufacturer for new switches, or for people who have a bench testing station they could move the switch within inches themselves.

Either way, this is an issue with the manufacturer, so if we can’t solve it, they will have to replace the defective units. There are units about to head to production for completion in November (delivery in December) that would likely be used as replacements.

Here are the IEEE for my four switches:

94:34:69:ff:fe:05:d2:58
38:5b:44:ff:fe:e8:ee:23
38:5b:44:ff:fe:ee:1b:d7
38:5b:44:ff:fe:ee:1b:f9

Of the four, the three leading with 38 have had the most issues but they are also furthest away.

Can you post LQI data and neighbor tables for all 3, please?