In the world of digital documents, the variety of file formats creates a language barrier. Most software understands only its own formats and needs translators, such as import modules or display plugins, to read anything else. A search engine that works across formats is therefore valuable, and few tools offer one. Such an engine must understand the various file formats and how each one stores its text. First, the user defines the subject of the search with keywords, and WordNet is used to find synonyms for them. Next, the application distinguishes the file types and extracts the text content of each document for searching. Documents that contain the subject or its synonyms are then clustered and displayed, with the matching terms highlighted. The application offers an appealing GUI and a graphical representation of the search.
This module accepts a set of keywords, including the subject, as input. The application determines the part of speech of each keyword and then finds the synonyms of each word for each of its parts of speech.
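The keyword step can be sketched roughly as follows. This is only an illustration under stated assumptions: TOY_LEXICON is a hypothetical in-memory stand-in for the WordNet lookup, and the function name expand_keywords is invented here; the real module would query WordNet instead of a hard-coded table.

```python
from collections import defaultdict

# Hypothetical stand-in for a WordNet lookup (an assumption, not real WordNet data):
# (word, part of speech) -> list of synonyms.
TOY_LEXICON = {
    ("search", "noun"): ["hunt", "lookup"],
    ("search", "verb"): ["seek", "look"],
    ("document", "noun"): ["paper", "file"],
}


def expand_keywords(keywords, lexicon=TOY_LEXICON):
    """Return {keyword: {part_of_speech: [synonyms]}} for every keyword."""
    expanded = {}
    for keyword in keywords:
        word = keyword.strip().lower()
        by_pos = defaultdict(list)
        # Collect the synonyms of this word separately for each part of speech.
        for (entry, pos), synonyms in lexicon.items():
            if entry == word:
                by_pos[pos].extend(synonyms)
        # Words missing from the lexicon simply get an empty mapping.
        expanded[word] = dict(by_pos)
    return expanded


if __name__ == "__main__":
    for word, senses in expand_keywords(["Search", "document"]).items():
        print(word, senses)
```

Grouping synonyms by part of speech keeps, for example, the noun and verb senses of a keyword apart, so a later search can decide which senses to include.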
This module takes a folder, drive, or removable disk as input. It identifies the type of each document it finds and uses the appropriate parser to extract the text content. Once the text is extracted, it is tokenized and stored.
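A minimal sketch of this extraction step is given below. It assumes that the file type can be recognized from the file extension and it covers only plain-text and HTML files using the Python standard library; the names PARSERS, tokenize, and index_folder are hypothetical, and other formats would need their own parsers registered in the same table.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class _TextOnly(HTMLParser):
    """Collects the character data of an HTML file and drops the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_plain(path):
    return path.read_text(encoding="utf-8", errors="ignore")


def parse_html(path):
    parser = _TextOnly()
    parser.feed(path.read_text(encoding="utf-8", errors="ignore"))
    parser.close()
    return " ".join(parser.chunks)


# Assumption: the extension identifies the document type.
PARSERS = {".txt": parse_plain, ".html": parse_html, ".htm": parse_html}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_folder(root):
    """Walk root recursively and return {path: tokens} for supported files."""
    index = {}
    for path in Path(root).rglob("*"):
        parser = PARSERS.get(path.suffix.lower())
        if path.is_file() and parser is not None:
            index[path] = tokenize(parser(path))
    return index
```

A dispatch table keyed by extension keeps the walker independent of any single format: supporting a new document type means adding one parser function and one table entry.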
The application can search for documents that contain the keywords or their synonyms. If a match is found, the document is displayed in the output window and can be opened in its corresponding file format. The matching synonyms in the document are highlighted, and the documents are listed so that the most recently modified document comes first.
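The matching, ordering, and highlighting described above might look like the following sketch. It assumes an in-memory index of the form {name: (modification timestamp, tokens)}, which is an illustrative shape rather than the application's actual storage, and it marks matches with square brackets where the real GUI would presumably use visual highlighting. The function names are hypothetical.

```python
import re


def find_matches(index, terms):
    """Return names of documents containing any term, newest first.

    index: {name: (modified_timestamp, tokens)}  (assumed shape)
    terms: the keywords together with their synonyms
    """
    wanted = {term.lower() for term in terms}
    hits = [
        (mtime, name)
        for name, (mtime, tokens) in index.items()
        if wanted & set(tokens)
    ]
    # The most recently modified document is listed first.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [name for _, name in hits]


def highlight(text, terms, left="[", right="]"):
    """Wrap every whole-word occurrence of a term in the given markers."""
    if not terms:
        return text
    # Longer terms go first so that they win over shorter alternatives.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(term) for term in ordered) + r")\b"
    return re.sub(
        pattern,
        lambda match: left + match.group(0) + right,
        text,
        flags=re.IGNORECASE,
    )
```

In a real run the timestamps could come from the file system, for example from `Path.stat().st_mtime`, and the term list would combine the user's keywords with the synonyms found for them.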
This module deals with the graphical user interface.