Integrating Advanced OCR and NLP Techniques for Enhanced Text Extraction and Image Plagiarism Detection

Dr. Palvadi Srinivas Kumar
Dr. Krishna Prasad

Abstract

This study addresses the misuse and impersonation of digital content, in both text and images. It presents an approach to detecting image misuse that first applies optical character recognition (OCR) to extract any text present in an image. The extracted text is then analyzed for originality using advanced Natural Language Processing (NLP) techniques, in particular Transformer-based models such as BERT. Comparing the extracted text against large-scale databases improves the detection of potential misuse. The study also investigates how the Attentional Generative Adversarial Network (AttnGAN) renders textual descriptions as images, extending our understanding of text-to-image generation. The results indicate that combining OCR with NLP improves accuracy in identifying image misuse, while BERT provides deeper insight into content originality. AttnGAN also generated high-quality images from text input efficiently, which furthers the understanding of digital content creation and originality. Overall, this work introduces an approach to content detection based on OCR, NLP, and image generation, and supports conscientious sharing practices in academia, law, and authorship.
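
The comparison step described in the abstract (matching text extracted from an image against a reference collection) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the OCR step has already produced a plain string, and it substitutes a simple bag-of-words cosine similarity for the BERT-based comparison so that the example runs with the Python standard library alone. The function names `tokenize`, `cosine_similarity`, and `find_matches`, and the 0.8 threshold, are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts.

    Assumption: this bag-of-words measure is a stand-in for comparing
    BERT embeddings, which would capture meaning rather than word overlap.
    """
    vec_a, vec_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text has no tokens
    return dot / (norm_a * norm_b)


def find_matches(extracted_text, reference_texts, threshold=0.8):
    """Return (score, reference) pairs at or above the threshold, best first.

    The threshold is a hypothetical value, not one reported in the study.
    """
    scored = [(cosine_similarity(extracted_text, ref), ref)
              for ref in reference_texts]
    return sorted((pair for pair in scored if pair[0] >= threshold),
                  reverse=True)


# Hypothetical OCR output and reference collection.
ocr_text = "The quick brown fox"
references = ["the quick brown fox.", "completely unrelated sentence"]
matches = find_matches(ocr_text, references)
```

In a fuller pipeline, `ocr_text` would come from an OCR engine run on the image, and the similarity function would be replaced by a comparison of sentence embeddings from a Transformer model, so that paraphrased text is flagged as well as verbatim copies.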

Article Details

How to Cite
Dr. Palvadi Srinivas Kumar, & Dr. Krishna Prasad. (2024). Integrating Advanced OCR and NLP Techniques for Enhanced Text Extraction and Image Plagiarism Detection. International Journal of Applied Engineering and Management Letters (IJAEML), 8(2), 198–207. https://doi.org/10.47992/IJAEML.2581.7000.0230
Section
Articles