Tran Duy Thanh*, Jun-Ho Huh**
* Lecturer, Faculty of Information Systems, University of Economics and Law, Viet Nam National University HCMC
*First Author Email: thanhtd@uel.edu.vn
**Assistant Professor (Tenure Track), Department of Data Informatics, (National) Korea Maritime and Ocean University, Busan 49112, Republic of Korea.
**Corresponding Author Email: 72networks@pukyong.ac.kr or 72networks@kmou.ac.kr
Abstract. In recent years, the information technology industry has grown strongly around the world. This growth brings a new challenge: an explosion in the amount of information. Although a huge amount of data is available, the information we actually obtain from it is very limited, and the implications behind the data have not yet been fully exploited. Scientists have therefore researched new ways to exploit the information contained in databases. The concept of knowledge discovery in databases, first mentioned in the late 1980s, is the process of detecting latent, previously unknown, and useful knowledge in large databases [1] [2]. It overcomes a limitation of traditional database models, whose query tools alone cannot find new or hidden information. In the early 1980s, Z. Pawlak proposed rough set theory [3], which has a very solid mathematical basis. Many research groups working in information technology in general, and in knowledge discovery in databases in particular, have studied and applied it. Rough set theory is increasingly widely applied in knowledge discovery; it is very useful for data classification and association rule discovery, and especially for problems involving ambiguous and uncertain data. In rough set theory, data are represented through information systems or tables. In practice, large data tables often contain imperfect, redundant, continuous, or symbolic data, and rough set theory allows hidden knowledge to be detected from such “raw” blocks of data. The knowledge found is expressed in the form of rules and patterns.
After finding the most general rules for data representation, one can calculate the strength of, and the dependence between, attributes in the information system. In this paper, the authors study recommendation systems [12], rough set theory and approximation theory, and fuzzy rough set theory [13], and on that basis build a partial model. The resulting software enables users to mine association rules from their databases, helping them make appropriate purchase or import decisions. The system supports user-configurable options for database features, can load data from SQL Server, and can export statistics to a website and to reports.
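As a brief illustration of the rough set notions mentioned above, the following minimal Python sketch computes the lower and upper approximations of a target set and the degree of dependency between attributes. It is an assumption-laden sketch rather than the authors' implementation: the function names (`partition`, `approximations`, `dependency`) and the toy purchase table are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into indiscernibility classes over the given attributes."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def approximations(rows, attrs, target):
    """Return the (lower, upper) approximations of the index set `target`."""
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        if block <= target:   # block lies entirely inside the target: certain members
            lower |= block
        if block & target:    # block overlaps the target: possible members
            upper |= block
    return lower, upper


def dependency(rows, cond, dec):
    """Degree to which decision attributes `dec` depend on condition attributes `cond`."""
    positive = set()
    for dec_block in partition(rows, dec):
        lower, _ = approximations(rows, cond, dec_block)
        positive |= lower
    return len(positive) / len(rows)


# Hypothetical toy purchase table (illustrative only, not data from the paper).
rows = [
    {"age": "young", "income": "high", "buys": "yes"},
    {"age": "young", "income": "high", "buys": "no"},
    {"age": "old", "income": "low", "buys": "no"},
    {"age": "old", "income": "high", "buys": "yes"},
]
buyers = {i for i, r in enumerate(rows) if r["buys"] == "yes"}
lower, upper = approximations(rows, ["age", "income"], buyers)
gamma = dependency(rows, ["age", "income"], ["buys"])
```

In this toy table the first two customers are indiscernible on `age` and `income` but differ on `buys`, so they fall in the boundary region between the lower and upper approximations, and only half of the rows can be classified with certainty from the condition attributes.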
Keywords: big data; data mining; knowledge discovery; rough set; machine learning; fuzzy rough set; recommendation systems; association rules.
The paper is published in The Journal of Supercomputing (Springer): https://link.springer.com/article/10.1007/s11227-021-04275-5