AI Powered Tablet Scanner with Multilingual Voice Assistance

Vijai Anand P. R, Surya P., Sudhiksha M, Neghapriya U, Alaguvigneshwaran M

doi:10.5281/zenodo.19421893

Research Paper | Open Access
Volume 04 | Issue 04 | Article Id IJPS/260403464

United College of Pharmacy, Periyanaickenpalayam, Coimbatore, Affiliated to The Tamil Nadu Dr. M.G.R Medical University, Chennai.

Abstract

Background: Medication errors that occur due to incorrect identification of tablets continue to be a serious issue in healthcare. This problem is especially common among elderly individuals, visually impaired patients, and people who take multiple medicines every day. Traditional pill identification methods usually depend on reading labels or recognizing the color and shape of tablets, which may not always be reliable or accessible. With the rapid development of Artificial Intelligence (AI), computer vision, and voice technologies, it is now possible to design automated systems that can recognize medicines accurately and present medication information in a more user-friendly way.

Objective: This study aims to design and develop an AI-based tablet scanner equipped with multilingual voice assistance. The system identifies pharmaceutical tablets using image recognition and provides essential medicine information through audio instructions in regional languages, thereby improving medication safety and accessibility.

Methods: The proposed system combines deep learning techniques with Optical Character Recognition (OCR) and Text-to-Speech (TTS) technologies. A ResNet-50 convolutional neural network model pretrained on ImageNet was fine-tuned to classify tablet images captured by users. Uploaded images are processed through a FastAPI backend where image classification is performed. The system then retrieves relevant medicine details from a structured dataset and presents the results through a responsive interface along with multilingual voice guidance.

Results: The trained model showed consistent improvement during training with a gradual decrease in loss values and was able to accurately recognize tablets under suitable imaging conditions. The application successfully detected uploaded tablet images, retrieved the corresponding medicine information from the dataset, and displayed the results in a mobile-friendly interface. The addition of voice assistance significantly improved usability for elderly and visually impaired users by delivering medication details through audio output.

Conclusion: The AI-powered tablet identification system developed in this study demonstrates how deep learning combined with voice technologies can improve medication safety and accessibility. Although the system currently functions as an academic prototype, future improvements such as larger datasets, OCR-based imprint recognition, and scalable databases will be essential for real-world clinical implementation.

Keywords

Artificial Intelligence, Tablet Identification, Deep Learning, ResNet-50, Optical Character Recognition, Text-to-Speech, Medication Safety, Multilingual Voice Assistance

Introduction

Medication Safety and the Need for Accurate Pill Identification

Medication safety and adherence remain major challenges in healthcare, with medication errors contributing significantly to preventable harm. A common cause of these errors is improper pill identification, particularly when medications are removed from their original packaging or when unlabeled or visually similar pills are encountered. These challenges emphasize the need for reliable pill identification tools and improved training in safe prescribing and medication management [1].

Polypharmacy and Medication Errors in Chronic Diseases

Polypharmacy is strongly associated with adverse drug events, nonadherence, drug–drug interactions, hospitalizations, and increased mortality. In nephrology, polypharmacy has been linked to medication error rates as high as 68%, highlighting the importance of medication reconciliation as a primary strategy for improving patient safety [2]. Similar challenges are observed in rheumatic diseases and the general population, where long-term combination therapy and aging-related comorbidities contribute to widespread polypharmacy [9].

Challenges Faced by Older Adults and Visually Impaired Patients

Older adults and visually impaired individuals face disproportionate risks related to medication misidentification. Studies indicate that 75–96% of elderly patients make medication-related errors, while existing solutions largely focus on healthcare providers rather than patient-centered interventions [3].

Medication management is particularly challenging for individuals with visual impairments, as traditional identification methods rely on visual cues such as pill color, shape, and imprints. These methods are often inaccessible, error-prone, and time-consuming, increasing the risk of adverse outcomes [4].

Medication Identification in Ophthalmology and Chronic Care

In chronic ophthalmic conditions such as glaucoma, medication identification remains a major challenge due to vision loss, similar packaging, and label degradation. Patients with poor visual acuity or cognitive decline often struggle to distinguish eye drop medications, leading to noncompliance and disease progression. Studies show a strong association between medication nonadherence and worsening glaucomatous visual field loss [5].

Artificial Intelligence in Modern Allopathic Medicine

AI has become integral to modern allopathic medicine, supporting diagnostics, drug discovery, patient monitoring, and clinical decision-making. From early rule-based systems such as MYCIN to contemporary deep learning models, AI has demonstrated its ability to process complex medical data and deliver personalized, evidence-based care [10].

Challenges and Advances in Pill Image Recognition

Despite progress, pill image recognition remains challenging due to variations in lighting, background, orientation, and image quality in mobile environments. Deep learning models also face constraints related to computational cost, power consumption, privacy, and reliance on cloud connectivity [12].

Lightweight, on-device solutions such as MobileDeepPill address these challenges by employing efficient multi-CNN architectures and triplet loss functions, achieving state-of-the-art performance without cloud offloading [12]. Other CNN-based approaches, including ResNet101, have demonstrated classification accuracies exceeding 98%, further supporting the role of deep learning in improving medication safety [13].

Need for Assistive and Automated Solutions

Accurate pill detection and inspection are essential for ensuring medication safety, regulatory compliance, and effective clinical care, particularly with the increasing use of automated dispensing systems [15]. Traditional manual identification methods remain time-consuming and error-prone, especially for patients with visual impairments or complex medication regimens. Consequently, AI-driven assistive technologies are increasingly recognized as critical tools for reducing medication errors and improving patient outcomes [14].

2. MATERIALS AND METHODS

System Overview

The Tablet Identification and Medicine Information System is an AI-based application developed to recognize pharmaceutical tablets using back-side image analysis and to retrieve detailed medicine information from a structured dataset. The system integrates computer vision, deep learning algorithms, and web technologies to create a complete workflow suitable for educational, research, and demonstration purposes.

The workflow consists of four main stages:

Image acquisition and upload through a web interface
Deep learning-based image classification
Retrieval of structured drug information from a medicine database (Excel)
Presentation of results through a responsive and user-friendly interface

Materials (Software and Tools)

2.1 Programming Language

Python 3.10: The primary programming language used for model development, backend API implementation, and data processing because of its extensive libraries for machine learning and web development.

2.2 Deep Learning Frameworks

PyTorch: Utilized for training and running the deep learning model, offering dynamic computation graphs and efficient CPU/GPU execution.

Torchvision: Used to access pretrained models such as ResNet-50 and to perform image preprocessing operations.

2.3 Model Architecture

ResNet-50 (Residual Neural Network):

A deep convolutional neural network consisting of 50 layers.
Employs residual connections that help prevent the vanishing gradient problem.
Pretrained on the ImageNet dataset and later fine-tuned for tablet image classification.
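
The effect of a residual connection on gradient flow can be seen from a single block. The notation below (input $x$, learned mapping $F$, loss $\mathcal{L}$) follows the standard ResNet formulation and is an illustrative assumption, not notation taken from this work.

```latex
y = F(x) + x
\qquad\Longrightarrow\qquad
\frac{\partial \mathcal{L}}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial y}
    \left( \frac{\partial F}{\partial x} + I \right)
```

The identity term $I$ gives the gradient a direct path to earlier layers even when $\partial F / \partial x$ is small, which is why deep residual networks remain trainable.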

2.4 Data Handling

Pandas: Used to manage and process structured medicine information stored in Excel files.

OpenPyXL: Serves as the backend engine for reading .xlsx datasets.

2.5 Backend Framework

FastAPI:

A high-performance Python framework used to build the backend API.
Supports asynchronous request handling.
Provides automatic JSON serialization and REST API functionality.

2.6 Frontend Technologies

HTML5: Defines the structure of the web interface.

CSS3: Responsible for layout design, styling, and responsive presentation.

JavaScript (Vanilla): Handles image uploads, API communication, and dynamic display of prediction results.

2.7 Development Environment

Ubuntu Linux: Operating system used during development.

Virtual Environment (venv): Used to isolate project dependencies.

Uvicorn: ASGI server used to run the FastAPI application.

3. DATASET PREPARATION

3.1 Image Dataset

The dataset contains back-side images of pharmaceutical tablets.
Each class represents a specific medicine (for example Paracetamol_P650 or Flavoxate_Urivel).
Images are arranged using a directory-based structure compatible with ImageFolder datasets.

3.2 Data Split

Training Set: Approximately 80% of the total images per class.

Validation Set: Approximately 20% of the total images per class.
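
A per-class split of this kind can be sketched as follows. The function name `split_per_class`, the fixed random seed, and the rounding rule are illustrative assumptions; the text above states only the approximate proportions.

```python
import random


def split_per_class(images_by_class, train_fraction=0.8, seed=42):
    """Split each class's image list into training and validation subsets.

    images_by_class maps a class name (e.g. "Paracetamol_P650") to a list
    of image paths. The seed and rounding rule are illustrative assumptions.
    """
    rng = random.Random(seed)
    train, val = {}, {}
    for class_name, paths in images_by_class.items():
        shuffled = list(paths)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        # Keep at least one image on each side when a class has 2+ images.
        if len(shuffled) >= 2:
            cut = min(max(cut, 1), len(shuffled) - 1)
        train[class_name] = shuffled[:cut]
        val[class_name] = shuffled[cut:]
    return train, val
```

Splitting within each class, rather than across the pooled dataset, keeps every medicine represented in both subsets even when class sizes differ.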

3.3 Image Preprocessing

Images are resized to 224 × 224 pixels.
Normalization is applied using ImageNet mean and standard deviation values.
Data augmentation such as random horizontal flipping is used during training to improve model generalization.

4. METHODS

4.1 Model Training Methodology

A pretrained ResNet-50 model with ImageNet weights is loaded.
Convolutional layers are frozen to preserve previously learned visual features.
The final fully connected layer is modified to match the number of tablet classes.
Model training uses the following parameters:

Loss Function: Cross-Entropy Loss

Optimizer: Adam

Learning Rate: 0.001

Training is conducted for 8 epochs while monitoring the reduction in loss values.

4.2 Prediction Method

Uploaded images undergo the same preprocessing steps used during training.
The trained model predicts the tablet class with the highest probability.
This predicted class is used as a key for retrieving medicine information from the database.
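
Selecting the most probable class from the model's raw outputs can be illustrated without any framework. The sketch below is written in plain Python for clarity; the function name `predict_class` is hypothetical, and in the PyTorch pipeline the same step would normally be done with `torch.softmax` followed by `argmax`.

```python
import math


def predict_class(logits, class_names):
    """Return (class_name, probability) for the highest-scoring class."""
    # Softmax, subtracting the maximum logit for numerical stability.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]
```

`class_names` must follow the same index order used during training; `ImageFolder` assigns indices to class folders in sorted order. The returned probability can also serve as a simple confidence value alongside the predicted key.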

4.3 Medicine Information Retrieval

Medicine details are stored in an Excel dataset.
Each row corresponds to a particular medicine.
The column medicine_key functions as the primary identifier.
Pandas is used to filter the dataset and retrieve the record corresponding to the predicted class.
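
A lookup of this kind might look like the sketch below. Only the `medicine_key` column is named in the text; the function name `lookup_medicine`, the file name in the comment, and the `generic_name` column are hypothetical and used purely for illustration.

```python
import pandas as pd


def lookup_medicine(df, predicted_class):
    """Return the row whose medicine_key matches the prediction, or None."""
    matches = df[df["medicine_key"] == predicted_class]
    if matches.empty:
        return None
    return matches.iloc[0].to_dict()


# In the application the table would be loaded once at startup, e.g.
#   df = pd.read_excel("medicines.xlsx", engine="openpyxl")  # hypothetical file
# A small in-memory stand-in with an illustrative column is used here instead.
df = pd.DataFrame({
    "medicine_key": ["Paracetamol_P650", "Flavoxate_Urivel"],
    "generic_name": ["Paracetamol", "Flavoxate"],  # hypothetical column
})
```

Returning `None` for an unknown key lets the caller distinguish "tablet classified but no record found" from a successful lookup.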

4.4 Web Application Workflow

The user uploads a tablet image using the browser interface.
The image is sent to the FastAPI backend through an HTTP POST request.
The backend performs model inference and database retrieval.
A JSON response is returned to the frontend.

5. RESULTS

5.1 Model Performance

Training loss gradually decreased from around 0.65 to 0.07, indicating successful model learning.
The trained model accurately classified tablets used during testing.
Predictions were most reliable when images were captured clearly and under conditions similar to the training data.
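
The reported loss values can be given a rough interpretation. Cross-entropy for a single sample is the negative natural logarithm of the probability assigned to the correct class, so the exponential of the negated mean loss is the geometric mean of that probability. The sketch below assumes the reported figures are mean training losses computed with the natural logarithm (as in PyTorch's `CrossEntropyLoss`); the function name is hypothetical.

```python
import math


def mean_true_class_probability(mean_cross_entropy):
    """Geometric-mean probability assigned to the correct class."""
    return math.exp(-mean_cross_entropy)


start = mean_true_class_probability(0.65)  # roughly 0.52
end = mean_true_class_probability(0.07)    # roughly 0.93
```

Under these assumptions, the model's typical confidence in the correct class on the training data rose from about 52% to about 93%. This describes fit to the training set only and is not a substitute for validation accuracy.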

5.2 Functional Results

The system successfully:

Identified tablets from uploaded images.
Retrieved the appropriate medicine information from the Excel dataset.
Displayed detailed drug information within a clean and structured user interface.

5.3 User Interface Results

The interface developed for the application is:
Mobile-responsive.
Easy to read and visually organized.
Appropriate for academic demonstrations and healthcare-related presentations.

6. DISCUSSION

6.1 Strengths of the System

Provides a complete automated workflow from image input to structured output.
Transfer learning reduces both training time and the need for very large datasets.
Clear separation between prediction logic and data storage.
Lightweight frontend implementation without heavy external frameworks.

6.2 Limitations

System accuracy is strongly dependent on dataset quality and image diversity.
Excel-based storage may not be suitable for large-scale deployment.
The current system relies only on visual appearance and does not yet analyze tablet imprint text.

6.3 Practical Applications

Educational demonstrations for pharmacy and artificial intelligence courses.
Preliminary support for tablet identification.
Prototype framework for hospital or pharmacy automation systems.

CONCLUSION

The proposed system demonstrates how deep learning and web technologies can be effectively combined to identify pharmaceutical tablets. By integrating image classification with structured data retrieval, the application can generate accurate predictions while also providing detailed medicine information. Although the system is currently suitable mainly for research and demonstration purposes, further development will be required before large-scale clinical deployment becomes possible.

FUTURE ENHANCEMENT

Although the current system demonstrates the potential of deep learning in pill identification, several improvements can be implemented to transform it into a more advanced clinical tool:

  - Advanced OCR Integration: Future versions will incorporate Optical Character Recognition to detect and read tablet imprints directly from the pill surface, providing an additional verification mechanism along with visual recognition.
  - Scalable Database Migration: The current Excel-based dataset can be replaced with relational databases such as SQLite or PostgreSQL to allow faster data retrieval and improved data reliability.
  - Real-Time Mobile Application: Developing a dedicated mobile application will allow users to scan tablets in real time using the device camera without manually uploading images.
  - Confidence Scores and Uncertainty Estimation: Displaying prediction confidence scores will help users understand the reliability of the system and identify cases where human verification may be required.
  - Drug-Drug Interaction Alerts: Future updates may include AI-based warnings for possible drug-drug or drug-food interactions, which would be particularly beneficial for patients managing multiple medications.
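
As an illustration of the database migration item above, the keyed lookup could be expressed with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the table name `medicines`, the `generic_name` column, and both function names are hypothetical; only `medicine_key` comes from the current dataset design.

```python
import sqlite3


def create_store(rows):
    """Build an in-memory SQLite table keyed by medicine_key."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # allows rows to be converted to dicts
    conn.execute(
        "CREATE TABLE medicines ("
        "medicine_key TEXT PRIMARY KEY, generic_name TEXT)"
    )
    conn.executemany("INSERT INTO medicines VALUES (?, ?)", rows)
    conn.commit()
    return conn


def lookup(conn, medicine_key):
    """Return the matching row as a dict, or None if the key is absent."""
    row = conn.execute(
        "SELECT * FROM medicines WHERE medicine_key = ?", (medicine_key,)
    ).fetchone()
    return dict(row) if row else None
```

Declaring `medicine_key` as the primary key enforces uniqueness and gives indexed lookups, two properties a spreadsheet does not guarantee.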

REFERENCES

Jha, Phul Babu, et al. "EasyPill: AI-Powered Pill Identification for Enhancing Medication Safety and Reducing Errors by Using Mobilenet CNNs." Journal of Productive Discourse 3.1 (2025), pp. 23-38.
Yassin, Yusef, et al. "Evaluating a generative artificial intelligence accuracy in providing medication instructions from smartphone images." Journal of the American Pharmacists Association 65.1 (2025): 102284, pp. 1-5.
Sheikh, Mohammad S., et al. "Identification of kidney-related medications using AI from self-captured pill images." Renal Failure 46.2 (2024): 2402075. doi:10.1080/0886022X.2024.2402075
Alahmadi, Taif, et al. "A Computer Vision-Based Pill Recognition Application: Bridging Gaps in Medication Understanding for the Elderly." International Journal of Advanced Computer Science & Applications 15.7 (2024), pp. 715-725.
Dang, Bo, et al. "Real-time pill identification for the visually impaired using deep learning." 2024 6th International Conference on Communications, Information System and Computer Engineering (CISCE). IEEE, (2024), pp. 1-5.
Yang, Christopher D., et al. "Clinical validation of a handheld deep learning tool for identification of glaucoma medications." Journal of Ophthalmic & Vision Research 19.2 (2024), pp. 172-181.
Joshi, Nidhi, et al. "Automatic Pill Identifier An Overview On Identifying, Retrieving And Authenticating Drug-Pill." (2024), pp. 61-72.
Biswas, Risab, et al. "Drug discovery and drug identification using AI." 2020 Indo-Taiwan 2nd International Conference on Computing, Analytics and Networks (Indo-Taiwan ICAN). IEEE, (2020), pp. 49-51.
Larios Delgado, Natalia, et al. "Fast and accurate medication identification." NPJ Digital Medicine 2.1 (2019), pp. 1-9.
Cho, Soo-Kyung, et al. "Usability evaluation of an image-based pill identification application." Journal of Rheumatic Diseases 26.2 (2019), pp. 111-117.
Kesh, Snigdha, and U. Ananthanagu. "Text Recognition and Medicine Identification by Visually Impaired People." International Journal of Engineering Research & Technology (IJERT) ICPCN-2017 5 (2017).
Zeng, Xiao, Kai Cao, and Mi Zhang. "MobileDeepPill: A small-footprint mobile deep learning system for recognizing unconstrained pill images." Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services. (2017).
Kim, Seongheon, et al. "Advanced Pharmaceutical Recognition System Based on Deep Learning for Mobile Medication Identification." Applied Sciences 15.10 (2025): 5644.
Varghese, Rose Mary, et al. "MediLenz: Medicine Identification and Voice Assistance System." 2025 5th International Conference on Intelligent Technologies (CONIT). IEEE, (2025).
Kavitha, N., and P. Madhumathy. "Real-time pill identification and classification using deep learning framework for medicine inspection systems." Discover Electronics 2.1 (2025), pp. 1-24.