Integrating Advanced OCR and NLP Techniques for Enhanced Text Extraction and Image Plagiarism Detection

Dr. Palvadi Srinivas Kumar; Dr. Krishna Prasad

doi:10.47992/IJAEML.2581.7000.0230

Published: Dec 7, 2024

Keywords:

Plagiarism detection, Natural Language Processing, Transformer models, Attention Generative Adversarial Networks, content originality, digital ethics

Dr. Palvadi Srinivas Kumar

Post Doctoral Research Fellow, Institute of Computer and Information Sciences, Srinivas University, Mangalore, Karnataka, INDIA

Dr. Krishna Prasad

Professor, Department of Cyber Security and Cyber Forensics, Institute of Computer and Information Sciences, Srinivas University, Mangalore, Karnataka, INDIA

Abstract

This study addresses the misuse and impersonation of digital content, covering both text and images. It presents a new way to detect image misuse: Optical Character Recognition (OCR) first extracts any text present in the image, and the extracted text is then analyzed for originality using advanced Natural Language Processing (NLP) techniques, including, more recently, Transformer-based models such as BERT. Comparing the extracted text against large-scale databases improves the detection of potential misuse. In addition, the study investigates how the Attentional Generative Adversarial Network (AttnGAN) renders textual descriptions as images, expanding our understanding of text-to-image generation. The results indicate that combining OCR with NLP improves accuracy in identifying image misuse, while BERT provides deeper insight into content originality. Furthermore, AttnGAN generated high-quality images from text input efficiently, which advances the understanding of digital content creation and originality. This work introduces a novel approach to content detection based on OCR, NLP, and image generation (for the detected content), and supports conscientious sharing practices in academia, law, and authorship.
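
The extract-then-compare flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the OCR step is passed in as a caller-supplied function, and a bag-of-words cosine similarity stands in for the BERT-based comparison so that the example runs with the standard library alone. The names `check_image`, `rank_matches`, `reference_db`, and the 0.8 threshold are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    """Stand-in embedding: raw word counts (assumption; the study uses BERT)."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_matches(
    extracted_text: str,
    reference_db: Dict[str, str],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, float]]:
    """Return (document id, score) pairs at or above the threshold, best first."""
    query = embed(extracted_text)
    scored = [(doc_id, cosine(query, embed(text)))
              for doc_id, text in reference_db.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


def check_image(
    image_path: str,
    ocr: Callable[[str], str],  # any function mapping an image path to text
    reference_db: Dict[str, str],
    threshold: float = 0.8,
) -> List[Tuple[str, float]]:
    """Extract text from the image with OCR, then compare it to the database."""
    return rank_matches(ocr(image_path), reference_db, threshold)
```

In practice, `ocr` would wrap a real OCR engine and `embed` would be replaced by sentence embeddings from a Transformer model; the ranking and thresholding logic would stay the same.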

How to Cite

Dr. Palvadi Srinivas Kumar, & Dr. Krishna Prasad. (2024). Integrating Advanced OCR and NLP Techniques for Enhanced Text Extraction and Image Plagiarism Detection. International Journal of Applied Engineering and Management Letters (IJAEML), 8(2), 198–207. https://doi.org/10.47992/IJAEML.2581.7000.0230

Issue

Vol. 8 No. 2 (2024): Volume 8 Issue 2

Section

Articles
