Geed Lab

Neuroplasticity and Motor Function Recovery after Stroke

Toward Sensor-to-Text Generation: Leveraging LLM-Based Video Annotations for Stroke Therapy Monitoring


Journal article


Mohammad Akidul Hoque, Shamim Ehsan, Anuradha Choudhury, Peter S. Lum, Monika Akbar, Shashwati Geed, M. S. Hossain
Bioengineering, 2025

APA
Hoque, M. A., Ehsan, S., Choudhury, A., Lum, P. S., Akbar, M., Geed, S., & Hossain, M. S. (2025). Toward Sensor-to-Text Generation: Leveraging LLM-Based Video Annotations for Stroke Therapy Monitoring. Bioengineering.


Chicago/Turabian
Hoque, Mohammad Akidul, Shamim Ehsan, Anuradha Choudhury, Peter S. Lum, Monika Akbar, Shashwati Geed, and M. S. Hossain. “Toward Sensor-to-Text Generation: Leveraging LLM-Based Video Annotations for Stroke Therapy Monitoring.” Bioengineering (2025).


MLA
Hoque, Mohammad Akidul, et al. “Toward Sensor-to-Text Generation: Leveraging LLM-Based Video Annotations for Stroke Therapy Monitoring.” Bioengineering, 2025.


BibTeX

@article{mohammad2025a,
  title = {Toward Sensor-to-Text Generation: Leveraging LLM-Based Video Annotations for Stroke Therapy Monitoring},
  year = {2025},
  journal = {Bioengineering},
  author = {Hoque, Mohammad Akidul and Ehsan, Shamim and Choudhury, Anuradha and Lum, Peter S. and Akbar, Monika and Geed, Shashwati and Hossain, M. S.}
}

Abstract

Stroke-related impairment remains a leading cause of long-term disability, limiting individuals’ ability to perform daily activities. While wearable sensors offer scalable monitoring solutions during rehabilitation, they struggle to distinguish functional from non-functional movements, and manual annotation of sensor data is labor-intensive and prone to inconsistency. In this paper, we propose a novel framework that uses large language models (LLMs) to generate activity descriptions from video frames of therapy sessions. These descriptions are aligned with concurrently recorded accelerometer signals to create labeled training data. Through exploratory analysis, we demonstrate that accelerometer signals exhibit distinct temporal and statistical patterns corresponding to specific activities, supporting the feasibility of generating natural language narratives directly from sensor data. Our findings lay the foundation for future development of sensor-to-text models that can enable automated, non-intrusive, and scalable stroke rehabilitation monitoring without the need for manual or video-based annotation.
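
To make the alignment step concrete, the sketch below (Python, with hypothetical data, column names, and window length; not the paper's actual implementation) illustrates how timestamped LLM-generated frame descriptions might be paired with concurrently recorded accelerometer windows and summarized with simple statistical features to form labeled (sensor, text) training examples.

# Minimal sketch, assuming timestamped LLM frame descriptions and a
# fixed-rate tri-axial accelerometer stream; all names and values below
# are illustrative placeholders, not the authors' pipeline.
import numpy as np
import pandas as pd

# Hypothetical LLM-generated descriptions, one per annotated video frame.
annotations = pd.DataFrame({
    "t_sec": [0.0, 2.0, 4.0],
    "description": [
        "reaches toward cup with paretic arm",
        "grasps cup and lifts it",
        "rests hand on table",
    ],
})

# Hypothetical accelerometer recording (assumed 50 Hz sampling rate).
fs = 50
t = np.arange(0, 6, 1 / fs)
accel = pd.DataFrame({
    "t_sec": t,
    "ax": np.random.randn(t.size) * 0.1,
    "ay": np.random.randn(t.size) * 0.1,
    "az": 1.0 + np.random.randn(t.size) * 0.1,
})

window_sec = 2.0  # assumed window length per annotated frame
rows = []
for _, ann in annotations.iterrows():
    # Take the accelerometer window starting at the annotated frame time.
    mask = (accel["t_sec"] >= ann["t_sec"]) & (accel["t_sec"] < ann["t_sec"] + window_sec)
    win = accel.loc[mask, ["ax", "ay", "az"]]
    mag = np.linalg.norm(win.values, axis=1)  # acceleration magnitude
    rows.append({
        "description": ann["description"],   # text label from the LLM
        "mean_mag": mag.mean(),              # simple statistical features
        "std_mag": mag.std(),
        "range_mag": mag.max() - mag.min(),
    })

labeled = pd.DataFrame(rows)  # paired sensor features and text labels
print(labeled)

In this toy setup, each row of the resulting table pairs a window-level feature summary with its text description, which is the kind of labeled data a sensor-to-text model could be trained on.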

