Comparison of VSM, GVSM, and LSI in information retrieval for Indonesian text
Pardede, Jasman¹; Husada, Milda Gustiana².
The vector space model (VSM) is an information retrieval (IR) model that represents
queries and documents as n-dimensional vectors. The generalized vector space model
(GVSM) is an extension of VSM that represents documents based on the similarity between
the query and the minterm vector space of the document collection. The minterm vectors
are defined by the terms in the query.
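The VSM baseline can be illustrated with a small sketch that ranks documents by the cosine similarity between tf-idf vectors. Several details are assumptions made for illustration. The tf-idf weighting (raw term frequency times log(N/df)) is one choice among many, because the abstract does not state the weighting scheme. The function names `tfidf_vectors`, `cosine`, and `rank_vsm` are hypothetical. The toy tokens are assumed to be already stemmed, for example by the Nazief and Adriani algorithm. The GVSM mapping into minterm space is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) for tokenized, pre-stemmed documents.

    Assumed weighting: raw term frequency times log(N / df).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_vsm(query, docs):
    """Return (score, doc_index) pairs sorted by descending cosine score."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    scores = [(cosine(q, d), i) for i, d in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Hypothetical stemmed Indonesian tokens, for illustration only.
docs = [
    ["sistem", "temu", "kembali", "informasi"],
    ["model", "ruang", "vektor"],
    ["temu", "kembali", "dokumen", "teks"],
]
query = ["temu", "kembali", "informasi"]
ranking = rank_vsm(query, docs)
print(ranking)
```

Here the first document shares all three query terms and ranks highest. The third shares two lower-weighted terms, and the second shares none, so it scores 0.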
Therefore, documents are retrieved based on the meaning of the individual words in
the query. However, documents may convey the same information semantically even when
their wording differs. Latent semantic indexing (LSI) is an IR method that retrieves
documents based on the overall meaning of the user's query rather than on the meaning
of each individual word. LSI uses a matrix algebra technique, singular value
decomposition (SVD). This study compares the performance of VSM, GVSM, and LSI
implemented in an IR system that retrieves Indonesian-language documents stored as
.pdf, .doc, and .docx files, using the Nazief and Adriani stemming algorithm. Each
method was implemented in both a multithreaded and a single-threaded version. Threads
are used in preprocessing, both for reading each document in the collection and for
stemming the query and the documents. Retrieval performance was evaluated in terms of
response time, recall, precision, and F-measure. The results show that, for every
method, .docx files were processed fastest, followed by .doc and then .pdf files. On
the same document collection, LSI had the shortest response time, followed by GVSM and
then VSM. The average recall of VSM, GVSM, and LSI is 82.86%, 89.68%, and 84.93%,
respectively; the average precision is 64.08%, 67.51%, and 62.08%; and the average
F-measure is 71.95%, 76.63%, and 71.02%.
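The following sketch shows how the recall, precision, and F-measure figures are conventionally defined. Recall is relevant hits over all relevant documents, precision is relevant hits over all retrieved documents, and F-measure is their harmonic mean. The function names and the two-query relevance judgments are hypothetical. The abstract does not say whether its averages are taken per query (macro-averaged, as assumed here) or computed in some other way.

```python
def evaluate(retrieved, relevant):
    """Return (recall, precision, f_measure) for one query, as fractions in [0, 1]."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return recall, precision, f_measure


def macro_average(results):
    """Average a list of per-query (recall, precision, f_measure) tuples."""
    n = len(results)
    return tuple(sum(r[k] for r in results) / n for k in range(3))


# Hypothetical relevance judgments for two queries; document IDs are illustrative.
per_query = [
    evaluate(retrieved=["d1", "d2", "d3", "d4"], relevant=["d1", "d2", "d5"]),
    evaluate(retrieved=["d2", "d6"], relevant=["d2", "d6"]),
]
avg_recall, avg_precision, avg_f = macro_average(per_query)
print(f"recall={avg_recall:.2%} precision={avg_precision:.2%} F={avg_f:.2%}")
```

The averaged F-measure (11/14, about 78.57%) differs slightly from the harmonic mean of the averaged precision and recall (about 78.95%). Averaging per-query F-measures is not the same as combining averaged precision and recall. The reported figures show a similar gap: for VSM, 2 × 64.08 × 82.86 / (64.08 + 82.86) ≈ 72.3%, while the reported average F-measure is 71.95%.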
Using multithreading in preprocessing improved the average response time of VSM, GVSM,
and LSI by about 30.422%, 26.282%, and 31.821%, respectively.
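The following sketch shows threaded preprocessing and a percentage-reduction calculation. A thread pool tokenizes and stems each document, and the elapsed time is compared against a sequential run. Several details are assumptions. The `stem` function is a hypothetical placeholder, not the Nazief and Adriani algorithm. The documents are in-memory strings rather than parsed .pdf/.doc/.docx files. The improvement formula, (t_single - t_multi) / t_single × 100, is one plausible reading of the reported percentages.

```python
import re
import time
from concurrent.futures import ThreadPoolExecutor


def stem(word):
    # Hypothetical placeholder: a real system would apply the Nazief and
    # Adriani stemmer here. This sketch returns the word unchanged.
    return word


def preprocess(text):
    """Case-fold, split into alphabetic tokens, and stem one document."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens]


def preprocess_all(texts, workers=None):
    """Preprocess sequentially (workers=None) or with a pool of worker threads."""
    if workers is None:
        return [preprocess(t) for t in texts]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results align with texts.
        return list(pool.map(preprocess, texts))


def percent_reduction(t_single, t_multi):
    """Relative drop in response time, in percent (negative if slower)."""
    return (t_single - t_multi) / t_single * 100


# Illustrative in-memory "documents" standing in for a document collection.
texts = ["Sistem Temu Kembali Informasi", "Model ruang vektor umum"] * 50

start = time.perf_counter()
sequential = preprocess_all(texts)
t_single = time.perf_counter() - start

start = time.perf_counter()
threaded = preprocess_all(texts, workers=4)
t_multi = time.perf_counter() - start

print(f"reduction: {percent_reduction(t_single, t_multi):.1f}%")
```

In CPython, threads mainly help with I/O-bound steps such as reading files from disk. CPU-bound stemming is limited by the global interpreter lock. On a tiny in-memory workload like this one, the threaded run can even be slower, so the printed reduction may be negative.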
Affiliation:
- Institut Teknologi Nasional, Indonesia
- Institut Teknologi Nasional, Indonesia
Indexation
Indexed by MyJurnal (2021): H-Index 6; Immediacy Index 0.000; Rank 0.
Indexed by Scopus 2020: Impact Factor (CiteScore) 1.4; Rank Q3 (Engineering (all)); Additional Information: SJR 0.191.