Red switches ignoring command to turn on

I’ve got over 20 Inovelli Z-Wave switches around the house that are primarily activated via motion sensor automations in Home Assistant. Very sporadically, a couple of them will ignore the command to turn on.
Looking at the logs, they get sent a targetValue of 99 but then report a currentValue of 0 a few seconds later, with no corresponding log entry telling them to turn off. The vast majority of the time they turn on and off as expected.

The only correlation I can think of is that these two switches carry some of the highest loads in the house: 150W for the most problematic one and 60W for the other. The rest of the fixtures are very low wattage LED lights. All the switches are running firmware 1.61.

I have checked and there are no protection settings or groupings that would be sending other commands to these devices. Is there any other reason a switch would ignore a turn-on command or be unable to turn on its load as needed?

My initial best guess explanation is that the switches aren’t receiving the commands, not that they’re receiving and ignoring them. The next step is to try to confirm/refute that, and figure out why.

Are your switches included with S2 security or unsecured? What versions of Home Assistant and Z-Wave JS are you running? And which “flavor” of Z-Wave JS?

I’d start your troubleshooting by looking at the Z-Wave JS logs. Set the log level to debug, wait for the issue to happen, then look through the logs around that time to see exactly which messages were sent. Pay particular attention to whether the Supervision command class is being used, and if so, with what results.

It could be a z-wave mesh issue. Are the two switches where this happens farther away from your other mains-powered z-wave devices than usual, or perhaps they only have one path back to the hub?

Some background on z-wave that might be helpful:

How z-wave supervision works

Z-Wave is a mesh network. That means it works best when there are multiple, diverse routes to each node (a route is a series of hops from the source to the destination, which may pass through other mains-powered nodes in the network). This adds resilience in the face of interference, but also introduces the possibility of a message being received more than once (if it was transmitted along multiple routes), or being dropped. The supervision command class is designed to solve this. The way it works is something like this:

The controller sends a command to turn on, and wraps the command in a supervision command, which says “this is command number 12. Let me know when you receive it”. The number (12) is typically incremented for each command being sent, and then resets to zero periodically.

When the switch gets the supervision-encapsulated command, it sends back a response to the sender which says “I received command 12”. This helps the sender know the message was actually received. It then checks if it’s received command number 12 recently. If so, it assumes the new command is a duplicate and ignores it. This helps with the issue of commands being processed more than once.

The sender (in this case, the hub) waits a short while for that “I received command 12” response, and if it doesn’t get it, it knows that either the original command was not received or the response was lost. I believe it’s supposed to retransmit the message, still calling it number 12. This gives the receiver another chance to process the message (in case the problem was with the original message) while avoiding double-processing in case the problem was with the response (because it’s reusing number 12).
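To make the exchange above concrete, here is a toy Python simulation. It is only a sketch of the idea as described in this post, not the Z-Wave JS implementation or the actual frame format: the `Switch` class, `send_supervised`, and the `drop_plan` argument are all made up for illustration, and the "seen recently" check is simplified to a set that never expires.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    """Toy receiver: acknowledges every supervised command, applies each session ID once."""
    level: int = 0
    seen_sessions: set = field(default_factory=set)
    applied: int = 0  # how many commands were actually executed

    def receive(self, session_id: int, target: int) -> int:
        if session_id not in self.seen_sessions:
            self.seen_sessions.add(session_id)
            self.level = target
            self.applied += 1
        # Acknowledge duplicates too, so the sender can stop retrying.
        return session_id


def send_supervised(switch, session_id, target, drop_plan, max_attempts=3):
    """Send one command, retrying with the SAME session ID until an ack arrives.

    drop_plan has one entry per attempt: "command" (lost on the way out),
    "ack" (delivered, but the acknowledgement is lost), or None (no loss).
    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(max_attempts):
        lost = drop_plan[attempt] if attempt < len(drop_plan) else None
        if lost == "command":
            continue  # the switch never saw it, so retry
        ack = switch.receive(session_id, target)
        if lost == "ack":
            continue  # the switch acted, but the sender cannot know that, so retry
        if ack == session_id:
            return attempt + 1
    return None


switch = Switch()
attempts = send_supervised(switch, session_id=12, target=99,
                           drop_plan=["command", "ack"])
print(attempts, switch.level, switch.applied)  # 3 99 1
```

The first attempt is lost outright and the second loses only the acknowledgement, yet the switch ends up at 99 and has applied the command exactly once, because the third attempt reuses number 12 and is treated as a duplicate.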

Supervision works pretty well, most of the time, as long as everyone sticks to the z-wave spec and doesn’t have any bugs. My guess is that something, somewhere in this process isn’t working right in your case, some of the time.

What hub or Z-Wave stick are you using to control your switches? How many devices do you have, and what kinds of devices make up your network? Have you made any adjustments to the power reporting configs on the switches? I improved reliability on my network by basically turning off all the unnecessary reporting; I wasn’t using it and it was adding traffic to the network. I’ve had my switches on Hubitat, a Nortek 500-series USB stick, a Zooz 700-series USB stick, and just upgraded to a Zooz 800-series USB stick, and I don’t think I’ve ever had 100% reliability.

Switches are currently paired as unsecured. Home Assistant is 2022.12.8, using Z-Wave JS UI (formerly zwavejs2mqtt) as an add-on.

The most problematic switch is less than 5 feet away from the controller and was the first device to be included on the network. Only a small number of my devices require a hop to reach the controller; most are directly connected.

I’ve turned on debug logging to file, but here is a snippet of the event log from the time it failed to turn on. The last entry is another automation setting the notification bar on the device.

2023-01-05T15:32:39.158Z - VALUE UPDATED 
Arg 0:
└─commandClassName: Multilevel Switch
└─commandClass: 38
└─property: currentValue
└─endpoint: 0
└─newValue: 99
└─prevValue: 0
└─propertyName: currentValue
2023-01-05T15:32:39.160Z - VALUE UPDATED 
Arg 0:
└─commandClassName: Multilevel Switch
└─commandClass: 38
└─endpoint: 0
└─property: targetValue
└─newValue: 99
└─prevValue: 0
└─propertyName: targetValue
2023-01-05T15:32:44.207Z - VALUE UPDATED 
Arg 0:
└─commandClassName: Multilevel Switch
└─commandClass: 38
└─property: currentValue
└─endpoint: 0
└─newValue: 0
└─prevValue: 99
└─propertyName: currentValue
2023-01-05T15:32:59.146Z - VALUE UPDATED 
Arg 0:
└─commandClassName: Configuration
└─commandClass: 112
└─endpoint: 0
└─property: 16
└─newValue: 83823233
└─prevValue: 65792
└─propertyName: param016
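For anyone who wants to scan a longer event log for the same pattern, here is a rough Python sketch. It assumes the log is for a single node and follows exactly the text layout pasted above (a timestamp line ending in "VALUE UPDATED", then `└─key: value` lines); the function names are my own. It flags every time currentValue drops to 0 while the most recent targetValue was still non-zero.

```python
import re

ENTRY_RE = re.compile(r"^(\S+Z) - VALUE UPDATED\s*$")
FIELD_RE = re.compile(r"^└─(\w+): (.*)$")


def parse_events(text):
    """Turn the pasted event-log text into a list of dicts, one per entry."""
    events, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        header = ENTRY_RE.match(line)
        if header:
            current = {"time": header.group(1)}
            events.append(current)
            continue
        fld = FIELD_RE.match(line)
        if fld and current is not None:
            current[fld.group(1)] = fld.group(2).strip()
    return events


def to_int(value):
    """Convert a logged value to int, or None if it is missing or not a number."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def find_unexplained_offs(events):
    """Return timestamps where the switch reported off without being told to."""
    last_target = None
    suspicious = []
    for ev in events:
        if ev.get("commandClassName") != "Multilevel Switch":
            continue
        new, prev = to_int(ev.get("newValue")), to_int(ev.get("prevValue"))
        if ev.get("property") == "targetValue":
            last_target = new
        elif ev.get("property") == "currentValue":
            if new == 0 and prev and last_target:
                suspicious.append(ev["time"])
    return suspicious


sample = """\
2023-01-05T15:32:39.160Z - VALUE UPDATED
Arg 0:
└─commandClassName: Multilevel Switch
└─commandClass: 38
└─endpoint: 0
└─property: targetValue
└─newValue: 99
└─prevValue: 0
└─propertyName: targetValue
2023-01-05T15:32:44.207Z - VALUE UPDATED
Arg 0:
└─commandClassName: Multilevel Switch
└─commandClass: 38
└─property: currentValue
└─endpoint: 0
└─newValue: 0
└─prevValue: 99
└─propertyName: currentValue
"""

print(find_unexplained_offs(parse_events(sample)))
# ['2023-01-05T15:32:44.207Z']
```

This only tells you when the symptom happened, not why. The debug-level driver log around those timestamps is still what shows whether the command actually went out and was acknowledged.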

I’m using a Nortek HUSBZB-1, an older 500-series stick that has otherwise been rock solid for me. I haven’t changed any of the power reporting but that might be worthwhile for reducing the overall traffic on the network.

Device-wise I’ve got mostly Inovelli switches with a smattering of other brands for switches/motion sensors.

OK, I’m coming off a similar setup with a similar switch count. I did have some issues, specifically during my bedtime routine: I would send a multicast request to update the LED bar (red chase for alarm armed), and a separate automation fired at the same time would turn off almost all the lights via another multicast request. It seemed like these two requests would occasionally collide and clobber each other on random switches, so either a light would be left on or the LED would be wrong. I eventually added a one-second pause between them, which thankfully seemed to work. Any chance you’re in a similar scenario?
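In Home Assistant itself the pause is just a delay step in the automation, but the general idea (run the two bulk commands one after the other with a gap, instead of firing both at once) looks something like this in Python. The two command functions are hypothetical stand-ins, not real Home Assistant or Z-Wave JS calls.

```python
import asyncio


async def run_spaced(commands, gap_seconds=1.0):
    """Await each zero-argument coroutine function in order, pausing between them."""
    results = []
    for index, command in enumerate(commands):
        if index > 0:
            await asyncio.sleep(gap_seconds)  # let the previous burst of traffic settle
        results.append(await command())
    return results


async def set_led_bars():
    # Hypothetical stand-in for the multicast that updates the LED bars.
    return "led bars set"


async def turn_lights_off():
    # Hypothetical stand-in for the multicast that turns the lights off.
    return "lights off"


print(asyncio.run(run_spaced([set_led_bars, turn_lights_off], gap_seconds=1.0)))
# ['led bars set', 'lights off']
```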

In this case the one automation fired to turn on the lights and the other one fired 15 seconds later when the garage door was opened.

Hmm, yeah, we can definitely rule out that scenario. Is the switch you saw this on directly connected to the stick, or does it route through other devices?

Closest one to controller with no hops. My network looks like a giant pancake on the map. Maybe it’s relaying the most messages?

Yeah, that’s an idea. I love the feature set of Inovelli devices, but by default they are kind of “chatty”, reporting power level and such. There are 3 parameters you can set to cut back, or at least these are the 3 I always set