publications | Shakhrul Iman Siam

2025

NeurIPS 2025
Reading Recognition in the Wild

Charig Yang, Samiul Alam, Shakhrul Iman Siam, Michael J Proulx, Lambert Mathias, Kiran Somasundaram, Luis Pesqueira, James Fort, Sheroze Sheriffdeen, Omkar Parkhi, and others

arXiv preprint arXiv:2505.24848, 2025

To enable egocentric contextual AI in always-on smart glasses, it is crucial to be able to keep a record of the user’s interactions with the world, including during reading. In this paper, we introduce a new task of reading recognition to determine when the user is reading. We first introduce the first-of-its-kind large-scale multimodal Reading in the Wild dataset, containing 100 hours of reading and non-reading videos in diverse and realistic scenarios. We then identify three modalities (egocentric RGB, eye gaze, head pose) that can be used to solve the task, and present a flexible transformer model that performs the task using these modalities, either individually or combined. We show that these modalities are relevant and complementary to the task, and investigate how to efficiently and effectively encode each modality. Additionally, we show the usefulness of this dataset towards classifying types of reading, extending current reading understanding studies conducted in constrained settings to larger scale, diversity and realism. Code, model, and data will be public.
@article{yang2025reading,
  title = {Reading Recognition in the Wild},
  author = {Yang, Charig and Alam, Samiul and Siam, Shakhrul Iman and Proulx, Michael J and Mathias, Lambert and Somasundaram, Kiran and Pesqueira, Luis and Fort, James and Sheriffdeen, Sheroze and Parkhi, Omkar and others},
  journal = {arXiv preprint arXiv:2505.24848},
  year = {2025},
}
Artificial intelligence of things: A survey

Shakhrul Iman Siam, Hyunho Ahn, Li Liu, Samiul Alam, Hui Shen, Zhichao Cao, Ness Shroff, Bhaskar Krishnamachari, Mani Srivastava, and Mi Zhang

ACM Transactions on Sensor Networks, 2025

The integration of the Internet of Things (IoT) and modern Artificial Intelligence (AI) has given rise to a new paradigm known as the Artificial Intelligence of Things (AIoT). In this survey, we provide a systematic and comprehensive review of AIoT research. We examine AIoT literature related to sensing, computing, and networking & communication, which form the three key components of AIoT. In addition to advancements in these areas, we review domain-specific AIoT systems that are designed for various important application domains. We have also created an accompanying GitHub repository, where we compile the papers included in this survey: https://github.com/AIoT-MLSys-Lab/AIoT-Survey. This repository will be actively maintained and updated with new research as it becomes available. As both IoT and AI become increasingly critical to our society, we believe AIoT is emerging as an essential research field at the intersection of IoT and modern AI. We hope this survey will serve as a valuable resource for those engaged in AIoT research and act as a catalyst for future explorations to bridge gaps and drive advancements in this exciting field.
@article{siam2025artificial,
  title = {Artificial intelligence of things: A survey},
  author = {Siam, Shakhrul Iman and Ahn, Hyunho and Liu, Li and Alam, Samiul and Shen, Hui and Cao, Zhichao and Shroff, Ness and Krishnamachari, Bhaskar and Srivastava, Mani and Zhang, Mi},
  journal = {ACM Transactions on Sensor Networks},
  volume = {21},
  number = {1},
  pages = {1--75},
  year = {2025},
  publisher = {ACM New York, NY},
  doi = {10.1145/3690639},
}

2024

MobiSys 2024
ChirpTransformer: Versatile LoRa encoding for low-power wide-area IoT

Chenning Li, Yidong Ren, Shuai Tong, Shakhrul Iman Siam, Mi Zhang, Jiliang Wang, Yunhao Liu, and Zhichao Cao

In Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, 2024

This paper introduces ChirpTransformer, a versatile LoRa encoding framework that harnesses broad chirp features to dynamically modulate data, enhancing network coverage, throughput, and energy efficiency. Unlike the standard LoRa encoder, which offers only a single configurable chirp feature, our framework introduces four distinct chirp features, expanding the spectrum of methods available for data modulation. To implement these features on commercial off-the-shelf (COTS) LoRa nodes, we utilize a combination of a software design and a hardware interrupt. ChirpTransformer serves as the foundation for optimizing encoding and decoding in three specific case studies: weak signal decoding for extended network coverage, concurrent transmission for heightened network throughput, and data rate adaptation for improved network energy efficiency. Each case study involves the development of an end-to-end system to comprehensively evaluate its performance. The evaluation results demonstrate remarkable enhancements compared to standard LoRa. Specifically, ChirpTransformer achieves a 2.38× increase in network coverage, a 3.14× boost in network throughput, and a 3.93× increase in battery lifetime.
@inproceedings{li2024chirptransformer,
  title = {ChirpTransformer: Versatile LoRa encoding for low-power wide-area IoT},
  author = {Li, Chenning and Ren, Yidong and Tong, Shuai and Siam, Shakhrul Iman and Zhang, Mi and Wang, Jiliang and Liu, Yunhao and Cao, Zhichao},
  booktitle = {Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services},
  pages = {479--491},
  year = {2024},
  doi = {10.1145/3643832.3661861},
}
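For readers unfamiliar with the baseline that the ChirpTransformer abstract compares against, the following is a minimal textbook sketch of standard LoRa chirp modulation, in which the starting frequency of an up-chirp encodes the symbol. It is not the ChirpTransformer framework and does not implement its four chirp features. The function names `lora_symbol` and `demodulate` are hypothetical, and the sketch assumes ideal, noise-free baseband samples taken at one sample per chip.

```python
import cmath
import math


def lora_symbol(symbol: int, sf: int) -> list:
    """Return baseband samples of one up-chirp whose start frequency encodes `symbol`.

    Assumption: one sample per chip, so a symbol has 2**sf samples.
    """
    n_chips = 2 ** sf
    return [
        cmath.exp(2j * math.pi * (n * n / (2 * n_chips) + (symbol / n_chips - 0.5) * n))
        for n in range(n_chips)
    ]


def demodulate(samples: list, sf: int) -> int:
    """Recover the symbol by dechirping and locating the strongest DFT bin."""
    n_chips = 2 ** sf
    base = lora_symbol(0, sf)
    # Multiplying by the conjugate base chirp leaves a pure tone at bin `symbol`.
    dechirped = [s * b.conjugate() for s, b in zip(samples, base)]

    def bin_magnitude(k: int) -> float:
        # Naive DFT of a single bin; fine for the small symbol sizes used here.
        return abs(sum(d * cmath.exp(-2j * math.pi * k * n / n_chips)
                       for n, d in enumerate(dechirped)))

    return max(range(n_chips), key=bin_magnitude)
```

Calling `demodulate(lora_symbol(s, 5), 5)` should return `s` for any symbol in `range(32)`; a practical receiver would use an FFT and handle noise, timing, and frequency offsets, which this sketch ignores.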

2022

A Deep Learning Based Person Detection and Heatmap Generation Technique with a Multi-Camera System

Md Shakhrul Iman Siam and Subrata Biswas

In 2022 12th International Conference on Electrical and Computer Engineering (ICECE), 2022

This paper outlines a video analysis method that identifies persons in footage from several CCTV cameras and renders that information as a heatmap over a given floor layout. The automatic creation of people density maps can aid the analysis of customer and employee behavior in retail and office settings, as well as motion tracking and advertising effectiveness research. The density maps were created from video recordings made by common video surveillance cameras; we made use of CCTV cameras distributed across a retail establishment. We chose the YOLOv5 object detection algorithm for human detection because it produces results quickly, and its short inference time makes it appropriate for real-time applications.
@inproceedings{siam2022deep,
  title = {A Deep Learning Based Person Detection and Heatmap Generation Technique with a Multi-Camera System},
  author = {Siam, Md Shakhrul Iman and Biswas, Subrata},
  booktitle = {2022 12th International Conference on Electrical and Computer Engineering (ICECE)},
  pages = {260--263},
  year = {2022},
  organization = {IEEE},
  doi = {10.1109/ICECE57408.2022.10089044},
}
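The people density map described in the abstract above can be illustrated with a small sketch that bins person positions into floor-plan cells. This is an assumption-laden illustration, not the paper's pipeline: it assumes detections have already been projected from camera pixels to floor coordinates in metres (a step the abstract does not detail), and the function name `density_grid` and its parameters are hypothetical.

```python
import math


def density_grid(points, width_m, height_m, cell_m):
    """Count floor-plane positions per grid cell.

    points: iterable of (x, y) floor coordinates in metres (assumed to be
    already mapped from camera detections to the floor layout).
    Returns a list of rows, each a list of per-cell counts.
    """
    cols = math.ceil(width_m / cell_m)
    rows = math.ceil(height_m / cell_m)
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        # Ignore positions that fall outside the floor layout.
        if 0 <= x < width_m and 0 <= y < height_m:
            grid[int(y // cell_m)][int(x // cell_m)] += 1
    return grid
```

Accumulating the grid over many frames and colour-mapping the counts would give a heatmap; merging duplicate detections of the same person seen by several cameras is left out of this sketch.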
Bioradiolocation-Based Multi-Class Sleep Stage Classification Using Time and Frequency Features with Random Forest Classifier

Md Shakhrul Iman Siam, Md Saiful Bari Siddiqui, Mushfiqul Abedin, and Mohammed Imamul Hassan Bhuiyan

In 2022 12th International Conference on Electrical and Computer Engineering (ICECE), 2022

Sleep disorders are a common problem that disrupts our regular sleeping patterns. Long-term monitoring of sleep could be useful for diagnosing sleep disorders. In this paper, an automated sleep staging scheme is presented based on bioradiolocation (BRL) signals, using time- and frequency-domain feature extraction and a Random Forest classifier. The experiment is validated using data from 32 subjects without sleep-related breathing disorders. A Random Forest based algorithm is used for two-, three-, four-, and five-stage classification. We achieved the best performance so far (89.35% accuracy and 0.65 Cohen's kappa) on 2-stage, 75.3% accuracy on 3-stage, 56.18% on 4-stage, and 54.2% accuracy on 5-stage classification with BRL signals. These results show high potential for real-life sleep stage monitoring systems.
@inproceedings{siam2022bioradiolocation,
  title = {Bioradiolocation-Based Multi-Class Sleep Stage Classification Using Time and Frequency Features with Random Forest Classifier},
  author = {Siam, Md Shakhrul Iman and Siddiqui, Md Saiful Bari and Abedin, Mushfiqul and Bhuiyan, Mohammed Imamul Hassan},
  booktitle = {2022 12th International Conference on Electrical and Computer Engineering (ICECE)},
  pages = {208--211},
  year = {2022},
  organization = {IEEE},
  doi = {10.1109/ICECE57408.2022.10089093},
}
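The abstract above reports both accuracy and Cohen's kappa. As a reminder of how the two metrics relate, here is a minimal sketch that computes them from a confusion matrix using the standard definition kappa = (p_o - p_e) / (1 - p_e). The function name `accuracy_and_kappa` is hypothetical, and any matrix passed to it here is made up for illustration; none of the numbers come from the paper.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples with true class i predicted as j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of row and column marginals, summed per class.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa
```

Because kappa discounts agreement expected by chance, it can be noticeably lower than accuracy when the classes are imbalanced, which is why the two are often reported together for sleep staging.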

2019

A Dual-Purpose Refreshable Braille Display Based on Real Time Object Detection and Optical Character Recognition

KM Naimul Hassan, Subrata Kumar Biswas, Md Shakil Anwar, Md Shakhrul Iman Siam, and Celia Shahnaz

In 2019 IEEE International Conference on Signal Processing, Information, Communication & Systems (SPICSCON), 2019

This paper proposes a dual-purpose braille system for visually impaired people. The system has two main features: object detection and optical character recognition. Real-time object detection helps a visually impaired person learn about the things around them, and optical character recognition helps them read characters in both an international language (English) and a local community language (Bengali). In this paper, the detailed methodology of our proposed method is described. A pre-trained convolutional neural network (AlexNet) is used for classifying the objects, and an OCR engine (Tesseract) along with basic image processing is used for optical character recognition. A refreshable braille display is also designed to show the braille characters.
@inproceedings{hassan2019dual,
  title = {A Dual-Purpose Refreshable Braille Display Based on Real Time Object Detection and Optical Character Recognition},
  author = {Hassan, KM Naimul and Biswas, Subrata Kumar and Anwar, Md Shakil and Siam, Md Shakhrul Iman and Shahnaz, Celia},
  booktitle = {2019 IEEE International Conference on Signal Processing, Information, Communication \& Systems (SPICSCON)},
  pages = {78--81},
  year = {2019},
  organization = {IEEE},
  doi = {10.1109/SPICSCON48833.2019.9065110},
}
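The last step described in the abstract above, showing recognized text on a refreshable braille display, can be sketched as a lookup from characters to six-dot cells. This is only an illustration under stated assumptions: it covers lowercase English letters in uncontracted braille, omits Bengali, and the names `DOTS`, `pin_states`, and `to_unicode` are hypothetical rather than taken from the paper's design.

```python
# Raised dot numbers (1-6) for each lowercase English letter, uncontracted braille.
DOTS = {
    "a": "1", "b": "12", "c": "14", "d": "145", "e": "15",
    "f": "124", "g": "1245", "h": "125", "i": "24", "j": "245",
    "k": "13", "l": "123", "m": "134", "n": "1345", "o": "135",
    "p": "1234", "q": "12345", "r": "1235", "s": "234", "t": "2345",
    "u": "136", "v": "1236", "w": "2456", "x": "1346", "y": "13456",
    "z": "1356",
}


def pin_states(letter):
    """Return six booleans (dots 1-6): which pins a display cell should raise."""
    raised = DOTS[letter]
    return tuple(str(dot) in raised for dot in range(1, 7))


def to_unicode(text):
    """Render text with Unicode braille patterns; unknown characters become blank cells."""
    cells = []
    for ch in text.lower():
        # In the Unicode braille block, dot k sets bit (k - 1) above U+2800.
        offset = sum(1 << (int(dot) - 1) for dot in DOTS.get(ch, ""))
        cells.append(chr(0x2800 + offset))
    return "".join(cells)
```

On hardware, each tuple from `pin_states` would drive the six actuators of one cell; `to_unicode` is only a convenient way to preview the output on screen.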