Hello Forum,
I'm currently writing a rather simple program for the Pico W which connects to Wi-Fi and then uses MQTT to do some publish/subscribe. The program generally works, but after several hours of runtime it halts. It is built with pico_cyw43_arch_lwip_threadsafe_background.
With SWD/openocd I was able to obtain the following stack trace:
#0 0xfffffffe in ?? ()
#1 <signal handler called>
#2 0x1000d104 in tcp_output (pcb=pcb@entry=0x2000cbb8 <memp_memory_TCP_PCB_base+656>) at C:/Program Files/Raspberry Pi/Pico SDK v1.5.0/pico-sdk/lib/lwip/src/core/tcp_out.c:1393
#3 0x100152f6 in mqtt_output_send (rb=rb@entry=0x2000d674 <ram_heap+244>, tpcb=0x2000cbb8 <memp_memory_TCP_PCB_base+656>)
at C:/Program Files/Raspberry Pi/Pico SDK v1.5.0/pico-sdk/lib/lwip/src/apps/mqtt/mqtt.c:266
#4 0x10015a20 in mqtt_publish (client=0x2000d588 <ram_heap+8>, topic=topic@entry=0x1001a9f8 "homie/thermo_relay/$state", payload=payload@entry=0x1001ab44,
payload_length=<optimized out>, qos=qos@entry=1 '\001', retain=retain@entry=1 '\001', cb=cb@entry=0x1000042d <mqtt_request_cb>, arg=arg@entry=0x2000d01c <mqtt>)
at C:/Program Files/Raspberry Pi/Pico SDK v1.5.0/pico-sdk/lib/lwip/src/apps/mqtt/mqtt.c:1163
#5 0x10000592 in mqtt_publish2 (topic=topic@entry=0x1001a9f8 "homie/thermo_relay/$state", payload=0x1001ab44 "ready", payload@entry=0x1001ab50 "ing relay %s to %s.\n",
qos=qos@entry=1 '\001', retain=retain@entry=1 '\001') at C:/ck_projects/99_personal/thermo_relay/main.c:240
#6 0x100009c8 in main () at C:/ck_projects/99_personal/thermo_relay/main.c:472
This particular crash happens because useg is 0.
From this I suspect a concurrency issue between my main function, which publishes some MQTT data at intervals, and the MQTT callbacks, which react to received topic data. Those callbacks also sometimes call mqtt_publish in response.
The suspicion is backed by statements in https://savannah.nongnu.org/bugs/?59831.
Based on reading:
https://www.nongnu.org/lwip/2_1_x/pitfalls.html
I started experimenting with cyw43_arch_lwip_begin/cyw43_arch_lwip_end, but this led to either Wi-Fi and/or MQTT not connecting, or to PANICs (which I have not yet debugged further) [1].
The question is: how is locking conceptually supposed to work here?
Let's say the main thread is in the middle of sending some TCP data and an interrupt fires low_priority_irq_handler, then cyw43_do_poll, and so on. Eventually this leads to the callback also sending TCP data, which is certainly not supported by lwIP and which I assume can lead to useg=0 as above.
Now, if I add locking here, what is supposed to happen? If the main function holds the lwIP lock, would the lock attempt in the callback fail or deadlock? Or would the interrupt stay out of lwIP code for as long as the main function holds the lock, and retry later?
I know that as an alternative I could avoid sending MQTT data in the callback, or switch to lwIP in polling mode or FreeRTOS mode, or even reimplement everything using MicroPython or the Arduino SDK.
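To make the first alternative concrete, here is a minimal sketch of what I have in mind: the callback only queues the message, and the main loop sends it later. All names here (pending_queue_t, pending_push, pending_drain, send_fn_t, count_send) are made up for illustration and are not part of the Pico SDK or lwIP. The queue itself is not interrupt-safe, so I assume both the push and the drain would have to run with the lwIP lock held. On the Pico W the drain would presumably be called from main between cyw43_arch_lwip_begin() and cyw43_arch_lwip_end(), with a send function that wraps mqtt_publish; that wiring is an assumption and not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes, chosen only for this sketch. */
#define PENDING_SLOTS 8
#define TOPIC_MAX     64
#define PAYLOAD_MAX   64

typedef struct {
    char topic[TOPIC_MAX];
    char payload[PAYLOAD_MAX];
} pending_msg_t;

/* Fixed-size ring buffer of messages waiting to be published. */
typedef struct {
    pending_msg_t slots[PENDING_SLOTS];
    size_t head;   /* index of the oldest queued message */
    size_t count;  /* number of queued messages */
} pending_queue_t;

/* Function that really sends one message; returns false to retry later.
 * On the device this would be a wrapper around mqtt_publish (assumption). */
typedef bool (*send_fn_t)(const char *topic, const char *payload, void *ctx);

static void pending_init(pending_queue_t *q) {
    assert(q != NULL);
    memset(q, 0, sizeof *q);
}

/* Called from the MQTT callback instead of publishing directly.
 * Returns false if the queue is full or a string does not fit. */
static bool pending_push(pending_queue_t *q, const char *topic, const char *payload) {
    assert(q != NULL && topic != NULL && payload != NULL);
    if (q->count == PENDING_SLOTS) return false;
    if (strlen(topic) >= TOPIC_MAX || strlen(payload) >= PAYLOAD_MAX) return false;
    pending_msg_t *m = &q->slots[(q->head + q->count) % PENDING_SLOTS];
    strcpy(m->topic, topic);
    strcpy(m->payload, payload);
    q->count++;
    return true;
}

/* Called from the main loop. Sends queued messages in order and stops at
 * the first failure so the message stays queued. Returns the number sent. */
static size_t pending_drain(pending_queue_t *q, send_fn_t send, void *ctx) {
    assert(q != NULL && send != NULL);
    size_t sent = 0;
    while (q->count > 0) {
        pending_msg_t *m = &q->slots[q->head];
        if (!send(m->topic, m->payload, ctx)) break;
        q->head = (q->head + 1) % PENDING_SLOTS;
        q->count--;
        sent++;
    }
    return sent;
}

/* Host-side stand-in for the real sender: only counts calls via ctx. */
static bool count_send(const char *topic, const char *payload, void *ctx) {
    (void)topic;
    (void)payload;
    (*(size_t *)ctx)++;
    return true;
}
```

With this, the callback would call pending_push(&queue, topic, payload) and return, and main would call pending_drain once per loop iteration. Whether this alone removes the crash depends on main's own mqtt_publish calls also being protected, which is exactly the locking question above.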
[1]
Side question: why does panic_compact not output its message in debug mode? In general I rarely get useful PANIC output on the device, which makes debugging quite hard.
Statistics: Posted by christiank2510 — Sun Jan 14, 2024 10:58 pm — Replies 0 — Views 1102