Book Recommender System Using Matrix Factorization with Alternating Least Square Method

In this digital age, readers face countless choices of books, and finding books that match their interests becomes a complex challenge. A book recommender system helps users make better decisions. This research develops a book recommender system using Collaborative Filtering (CF) with Matrix Factorization, and compares the Alternating Least Squares (ALS) method with the Singular Value Decomposition (SVD) method to determine which yields the more accurate recommender system. The research uses two Goodreads datasets, book data and rating data, and several evaluation metrics: RMSE and MAE as regression metrics, and F1-Score and Precision as classification metrics. In the experiments, SVD achieves better accuracy, with an RMSE of around 0.86822, an MAE of around 0.6903, an F1-Score of around 0.827923, and a Precision of around 0.568347. The ALS algorithm obtains an RMSE of around 1.09320, an MAE of around 0.86479, an F1-Score of around 0.000304, and a Precision of around 0.000596.


INTRODUCTION
In today's digital information age, the number of books available is staggering. This makes it difficult for users to choose books that match their interests and preferences. Recommender systems are commonly used to recommend the most suitable books to users [1]. A recommender system filters information by predicting the ratings or preferences a user would give to books [2], and as a decision-making strategy [1] it helps provide the best decision-making experience a user can have.
A recommender system is a dynamic analysis process carried out over a product and the customers related to it. It helps overcome information overload by providing specific recommendations that are expected to meet the wants and needs of users [3]. Recommender systems have been developed in various domains, including the book domain. A frequently used method is Collaborative Filtering (CF) [4], which recommends items based on the similarity of users in how they select or rate items [5]. CF also outperforms content-based methods in accuracy, although it faces challenges such as scalability, sparsity, and cold start that make accurate recommendations difficult. In previous research, JS is based on an index calculated from pairs of books that represents the ratio of shared users. The higher the number of shared users, the higher the JS index and the better the recommendation. Books with a high JS index are recommended, and the calculated RMSE is 1.504.
Based on previous research, CF-based book recommender systems have been studied with various methods. However, the results were less than optimal, so further development is needed. This research designs, implements, tests, and evaluates a book recommender system using a dataset from Goodreads. The system implements collaborative filtering with a matrix factorization model and compares two algorithms, Alternating Least Squares and Singular Value Decomposition. To evaluate accuracy, we use two kinds of evaluation metrics: RMSE and MAE as regression metrics, and F1-Score and Precision as classification metrics. The goal is a book recommender system that provides accurate and relevant recommendations, improves the user experience of choosing books, and makes providing recommendations more efficient and effective.

RESEARCH METHODOLOGY
The modeling flow of the recommender system begins by dividing the data into 80% training data and 20% test data. The data is then converted into a matrix, and the ALS and SVD algorithms each learn from the training data to produce a recommendation model. Both models then predict the ratings in the 20% test data. In the final stage, the predictions of the two models are evaluated using RMSE and MAE as regression metrics and F1-Score and Precision as classification metrics. Figure 1 gives a brief overview of this flow.
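The 80/20 split at the start of this flow can be illustrated with a minimal sketch. The function name, the fixed seed, and the toy rating rows are assumptions made for illustration; the source does not state how the split was implemented.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rating rows and split them into train and test parts."""
    rng = random.Random(seed)          # fixed seed (assumed) keeps the split reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical (user ID, book name, numeric rating) rows.
ratings = [(u, f"book_{b}", (u + b) % 5 + 1) for u in range(10) for b in range(5)]
train, test = train_test_split(ratings)
```

Every row ends up in exactly one of the two parts, so the test ratings are never seen during training.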

Dataset
In this study, two datasets from Goodreads were used. The first contains information about 58,292 books. The second contains 51,945 ratings, with attributes such as ID, Name, and Rating. For the book dataset, only a few columns were used. Tables 1 and 2 show examples of both datasets.

Collaborative Filtering
CF is one of the most frequently used techniques in recommender systems and is widely applied to make recommendations in various fields. It focuses on collecting and analyzing user preferences together with the interaction patterns between users and items. The underlying logic is that if two users have the same or similar preferences on some items, they will most likely have similar preferences on other items as well. CF is an information filtering technique that attempts to predict the rating a user gives to a particular item based on a similarity matrix [12], [13]. CF formed the basis of the first recommender systems, which helped people make decisions based on the opinions of others. CF uses a user-item matrix to find users with similar interests. It also has the advantage of being data aware: it can determine similarities between users, user habits, and items, which are then entered into the user-item matrix.
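As a rough illustration of user similarity, the sketch below compares users by cosine similarity over the books both have rated. Cosine similarity is an assumption here, since the text does not say which similarity measure is used, and the users and ratings are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two users over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical users, each mapping book -> rating.
alice = {"A": 5, "B": 3, "C": 4}
bob = {"A": 4, "B": 3, "D": 5}
carol = {"A": 1, "B": 5}

sim_alice_bob = cosine_similarity(alice, bob)
sim_alice_carol = cosine_similarity(alice, carol)
```

In this toy data Alice and Bob rate their shared books alike, so their similarity is higher than that of Alice and Carol, and Bob's other ratings would carry more weight when recommending to Alice.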

Matrix Factorization
Matrix factorization (MF) is a very important model in Collaborative Filtering algorithms, and many studies have sought to improve traditional MF methods [14], [15]. MF has recently gained greater exposure in variable decomposition, latent variable modeling, and dimension reduction. Most MF models are based on latent factor models.
In a latent factor model, the rating matrix is modeled as the product of a user matrix and an item matrix. The MF approach has been found to be the most accurate at reducing the sparsity problem. A diagram of MF can be seen in Figure 2.

Figure 2. Matrix factorization diagram
Matrix factorization is used in recommender systems to decompose a user-item rating matrix into two or more smaller matrices. This approach aims to identify hidden structures in the rating data, such as user preferences for item attributes or similarities between users. ALS and SVD are two matrix factorization techniques often used in recommender systems.
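To make the idea of two smaller matrices concrete, here is a small sketch using NumPy. The factor values and the choice of two latent factors are made up for illustration only.

```python
import numpy as np

# Hypothetical latent factors: 3 users and 4 books, each described by k = 2 factors.
user_factors = np.array([[1.0, 0.5],
                         [0.2, 1.5],
                         [0.9, 0.9]])
book_factors = np.array([[4.0, 0.5],
                         [0.5, 3.0],
                         [2.0, 2.0],
                         [3.0, 1.0]])

# The full predicted rating matrix is the product of the two smaller matrices.
predicted = user_factors @ book_factors.T   # shape (3, 4)

# A single prediction is the dot product of one user row and one book row.
r_0_2 = user_factors[0] @ book_factors[2]
```

Storing the two factor matrices takes 3 x 2 + 4 x 2 numbers instead of 3 x 4, a saving that grows quickly as the numbers of users and books increase.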

Alternating Least Squares
ALS is a matrix factorization algorithm that lends itself to parallelization and is designed for large-scale CF problems. It addresses the scalability and sparsity of rating data and scales well even on very large datasets [16], [17]. In this study, we use low-rank matrix factorization with ALS because the dataset is fairly large. In the matrix factorization problem, the goal is to find a relatively small dimension k such that each user can be approximated by a k-dimensional user vector and each book by a k-dimensional book vector. These vectors are referred to as factors. Using these factors, we predict a user's rating for a book as the product of the user vector (xu) and the transpose of the book vector (yi). In this way, the matrix factorization approach generates accurate and efficient rating predictions in recommender systems.
Note that the objective function to be optimized is not convex and is difficult to optimize directly. Gradient descent can be used as an approximate approach, but it tends to be slow and requires many iterations. However, if we fix one set of variables (e.g. X) and treat them as constants, the objective becomes a convex function of the other variables (Y). The ALS algorithm therefore fixes Y and optimizes X, then fixes X and optimizes Y, and repeats this process until convergence. This approach overcomes the difficulty of optimizing the complex objective function.
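The alternating procedure can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the implementation used in this research: the regularization weight `reg`, the random initialization, and the toy rating matrix are all assumptions.

```python
import numpy as np

def als(R, mask, k=2, reg=0.1, n_iters=10, seed=0):
    """Factor R (users x books) into X @ Y.T using only entries where mask is True."""
    rng = np.random.default_rng(seed)
    n_users, n_books = R.shape
    X = rng.normal(scale=0.1, size=(n_users, k))
    Y = rng.normal(scale=0.1, size=(n_books, k))
    eye = reg * np.eye(k)
    for _ in range(n_iters):
        # Fix Y: each user vector is the solution of a small ridge regression.
        for u in range(n_users):
            Yu = Y[mask[u]]
            X[u] = np.linalg.solve(Yu.T @ Yu + eye, Yu.T @ R[u, mask[u]])
        # Fix X: each book vector is the solution of a small ridge regression.
        for i in range(n_books):
            Xi = X[mask[:, i]]
            Y[i] = np.linalg.solve(Xi.T @ Xi + eye, Xi.T @ R[mask[:, i], i])
    return X, Y

# Hypothetical ratings for 5 users x 4 books; 0 marks a missing rating.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0
X, Y = als(R, mask, k=2, reg=0.1, n_iters=20)
train_rmse = np.sqrt(np.mean((R - X @ Y.T)[mask] ** 2))
```

Each inner step is an ordinary least-squares problem with a closed-form solution, which is why fixing one factor matrix makes the other easy to optimize, and why the per-user and per-book solves can run in parallel.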

Singular Value Decomposition
SVD also has important applications in recommender systems, where it is used to model user preferences for items. SVD has the advantage of coping with sparse data and giving better prediction performance on unstructured data. In SVD, the user-item rating matrix is decomposed into three matrices: the singular value matrix (sigma), the left singular vector matrix (U) that describes user preferences, and the right singular vector matrix (V) [18] that describes the attributes or features of the recommended items. See Figure 3 for the SVD matrix factorization for the recommender system.
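The three-matrix decomposition can be illustrated with NumPy's `numpy.linalg.svd` on a small dense matrix. The rating values are hypothetical, and a real rating matrix has missing entries that a plain SVD cannot handle directly, so this only shows the structure of the factorization.

```python
import numpy as np

# Hypothetical dense rating matrix (4 users x 3 books).
R = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 5.0],
              [2.0, 1.0, 4.0]])

# U holds the left singular vectors, sigma the singular values, Vt the transposed
# right singular vectors.
U, sigma, Vt = np.linalg.svd(R, full_matrices=False)

# Keeping only the k largest singular values gives a low-rank approximation of R.
k = 2
R_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]
```

Multiplying the three full matrices back together reproduces R, while the truncated product R_k keeps only the two strongest latent patterns.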

Evaluation Metrics
The evaluation stage uses two kinds of metrics: RMSE and MAE as regression metrics, and F1-Score and Precision as classification metrics.

Root Mean Square Error
RMSE, like MAE, is a metric that measures the difference or error between predicted and observed values. It reflects the magnitude of the prediction errors: the lower the RMSE, the more accurate the predictions [19].

Mean Absolute Error
MAE is a convenient measure commonly used to score models. It averages the absolute differences between predicted and actual ratings, weighting every difference equally. MAE is used in recommender systems to assess how closely predictions match actual user ratings [20]. A lower MAE indicates a better model [21].
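Both regression metrics follow directly from their standard definitions, as the sketch below shows. The rating values are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root of the mean squared difference between actual and predicted ratings."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean of the absolute differences between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical ratings: the errors are 1, 0, -1 and -2.
actual = [5, 3, 4, 1]
predicted = [4, 3, 5, 3]

rmse_value = rmse(actual, predicted)   # sqrt((1 + 0 + 1 + 4) / 4) = sqrt(1.5)
mae_value = mae(actual, predicted)     # (1 + 0 + 1 + 2) / 4 = 1.0
```

Because RMSE squares each error before averaging, the single error of 2 pulls it above the MAE, which is why RMSE is the more sensitive of the two to large mistakes.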

F1-Score
F1-Score is an evaluation metric that combines precision and recall to measure the performance of classification models. It balances the two and is especially useful when the positive and negative classes are imbalanced. A value close to 1 indicates that the model predicts positive outcomes well, and a value close to 0 indicates the opposite [22]. It serves as a comprehensive evaluation metric for classification models.
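To apply classification metrics to rating predictions, ratings first have to be turned into positive and negative classes. The sketch below assumes a relevance threshold of 4.0, which is a hypothetical choice since the text does not state the threshold used, and computes precision, recall, and F1-Score from the resulting counts.

```python
def precision_recall_f1(actual, predicted, threshold=4.0):
    """Treat ratings >= threshold as positive and compute precision, recall and F1."""
    tp = fp = fn = 0
    for a, p in zip(actual, predicted):
        liked = a >= threshold          # the user actually liked the book
        recommended = p >= threshold    # the model predicts the user will like it
        if recommended and liked:
            tp += 1
        elif recommended and not liked:
            fp += 1
        elif liked:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical actual and predicted ratings: 2 true positives, 1 false positive, 1 false negative.
actual = [5, 4, 2, 5, 3, 1]
predicted = [4.5, 3.2, 4.1, 4.8, 2.0, 1.5]
precision, recall, f1 = precision_recall_f1(actual, predicted)
```

If a model rarely predicts ratings above the threshold, the true positive count stays near zero and both Precision and F1-Score collapse toward zero even when RMSE and MAE look reasonable.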

Precision
Precision is an evaluation metric that measures the extent to which the positive outcomes predicted by the model are correct. More specifically, precision is the number of correct positive predictions divided by the total number of positive predictions made by the model. A value close to 1 indicates that the model's positive predictions are mostly correct, and a value close to 0 indicates the opposite [22], [23].

RESULTS AND DISCUSSION

Testing Results
This research uses two algorithms, ALS and SVD, to assess the accuracy of the book recommender system. Each test is measured with regression metrics (RMSE and MAE) and classification metrics (F1-Score and Precision).

Dataset Preprocessing
Before building the recommender system, the dataset goes through a preprocessing stage. As Table 2 shows, the Rating column in the rating dataset is categorical, so an encoder converts it to numeric form and stores the result in the rating_numeric column. The categorical ratings are converted to a numeric scale of 0-5 (where 0 = no rating). Table 3 shows the encoder results in the rating_numeric column.

The final stage measures performance with regression metrics (RMSE and MAE) and classification metrics (F1-Score and Precision). Table 5 shows examples of the metric evaluation results. From Table 4, the comparison of the two algorithms can be summarized as follows. For the ALS algorithm, the RMSE is around 1.09320 and the MAE is around 0.86479, while the F1-Score is around 0.000304 and the Precision is around 0.000596. For the SVD algorithm, the RMSE is around 0.86822 and the MAE is around 0.69032, while the F1-Score is around 0.827923 and the Precision is around 0.568347. Across every evaluation metric, Singular Value Decomposition is more accurate than the Alternating Least Squares algorithm.
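The categorical-to-numeric encoding described at the start of this subsection might look like the sketch below. The category labels are assumed Goodreads-style labels, since the source does not list the actual categories, and the example rows are hypothetical.

```python
# Assumed Goodreads-style rating labels; the source does not list the actual categories.
RATING_TO_NUMERIC = {
    "This user doesn't have any rating": 0,
    "did not like it": 1,
    "it was ok": 2,
    "liked it": 3,
    "really liked it": 4,
    "it was amazing": 5,
}

def encode_ratings(rows):
    """Return copies of the rows with a rating_numeric field added."""
    return [{**row, "rating_numeric": RATING_TO_NUMERIC[row["Rating"]]} for row in rows]

# Hypothetical rows with the ID, Name and Rating attributes.
rows = [
    {"ID": 1, "Name": "Book A", "Rating": "it was amazing"},
    {"ID": 1, "Name": "Book B", "Rating": "This user doesn't have any rating"},
    {"ID": 2, "Name": "Book A", "Rating": "liked it"},
]
encoded = encode_ratings(rows)
```

An explicit mapping keeps the order of the categories under control, whereas a generic label encoder would typically number them alphabetically and lose the rating order.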

ALS Recommendation Prediction Result
The ALS predictions were computed with the following parameters: rank = 20, maxIter = 10, and seed = 0. Table 6 shows an example of ALS results, displaying the top 5 recommendations. Looking at the recommendations for several IDs (users who received recommendations) in Table 6 as a whole, the average predicted rating tends to be lower than the ratings given by users. This indicates that the system tends to give somewhat conservative, less optimistic recommendations. However, the difference between user ratings and system predictions is still relatively small, so the recommendations remain broadly consistent with user preferences.
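Selecting the top 5 recommendations for one user amounts to ranking the predicted ratings. The sketch below uses the standard library; the book names and predicted values are hypothetical and are not taken from Table 6.

```python
import heapq

def top_n(predictions, n=5):
    """Return the n (book, predicted rating) pairs with the highest predictions."""
    return heapq.nlargest(n, predictions.items(), key=lambda item: item[1])

# Hypothetical predicted ratings for one user.
predicted = {
    "Book A": 3.9, "Book B": 4.4, "Book C": 2.1, "Book D": 4.1,
    "Book E": 3.2, "Book F": 4.0, "Book G": 1.8,
}
top5 = top_n(predicted)
```

In practice, books the user has already rated would usually be removed from the candidates before ranking.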

SVD Recommendation Prediction Result
The evaluation uses the parameters considered to give the strongest results. The SVD predictions were computed with the following parameters: number of factors (n_factors = 50), number of iterations (n_epochs = 20), learning rate (lr_all = 0.025), and regularization (reg_all = 0.16). Table 7 shows an example of SVD results, displaying the top 5 recommendations. Looking at the recommendations for several IDs (users who received recommendations) in Table 7 as a whole, the average predicted rating tends to be lower than the ratings given by users. As with the previous method, the system tends to give less optimistic recommendations, and the accuracy of the recommender system is determined by calculating the evaluation metrics.
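The parameter names above match those of the SVD class in the Surprise library, which fits user and item biases and latent factors by stochastic gradient descent, although the source does not name its implementation. Under that assumption, the sketch below shows what the learning rate and regularization parameters control. It is a simplified illustration on hypothetical ratings, not the code used in this research.

```python
import numpy as np

def sgd_mf(ratings, n_users, n_items, n_factors=50, n_epochs=20,
           lr_all=0.025, reg_all=0.16, seed=0):
    """Fit biases and latent factors to (user, item, rating) triples with SGD."""
    rng = np.random.default_rng(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)   # global mean rating
    bu, bi = np.zeros(n_users), np.zeros(n_items)
    P = rng.normal(scale=0.1, size=(n_users, n_factors))
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            # lr_all scales each step; reg_all pulls parameters back toward zero.
            bu[u] += lr_all * (err - reg_all * bu[u])
            bi[i] += lr_all * (err - reg_all * bi[i])
            # Both factor updates are computed from the values before this step.
            P[u], Q[i] = (P[u] + lr_all * (err * Q[i] - reg_all * P[u]),
                          Q[i] + lr_all * (err * P[u] - reg_all * Q[i]))
    return mu, bu, bi, P, Q

# Hypothetical (user, book, rating) triples: book 0 is liked, book 1 is not.
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 1, 2.0),
           (2, 0, 5.0), (2, 1, 2.0), (3, 0, 4.0), (3, 1, 1.0)]
mu, bu, bi, P, Q = sgd_mf(ratings, n_users=4, n_items=2)

def predict(u, i):
    return mu + bu[u] + bi[i] + P[u] @ Q[i]

train_rmse = np.sqrt(np.mean([(r - predict(u, i)) ** 2 for u, i, r in ratings]))
baseline_rmse = np.sqrt(np.mean([(r - mu) ** 2 for _, _, r in ratings]))
```

Comparing train_rmse with baseline_rmse, the error of always predicting the global mean, shows how much the learned biases and factors add on this toy data.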

Discussion
The recommender system in this research uses CF with the ALS and SVD algorithms to recommend books. Both algorithms have advantages such as handling cold start problems and data sparsity, so they are compared to see which is more accurate and better suited to users. The work proceeds in several stages. In the dataset preprocessing stage, the rating dataset is encoded from categorical to numerical so that it can be used in calculations to predict user preferences and in modeling techniques such as regression that build the predictive models used in recommender systems. Once the data is ready, the ALS and SVD algorithms learn from it to produce recommendations. The models then enter the metric evaluation stage, which assesses their accuracy using RMSE, MAE, F1-Score, and Precision. For the ALS algorithm, the RMSE is around 1.09320 and the MAE is around 0.86479, while the F1-Score is around 0.000304 and the Precision is around 0.000596. For the SVD algorithm, the RMSE is around 0.86822 and the MAE is around 0.69032, while the F1-Score is around 0.827923 and the Precision is around 0.568347. On average, the results obtained are better than previous research for both ALS and SVD. However, they are still less than optimal, so future research is expected to improve the performance of the recommender system by using a larger dataset. In addition, other methods such as SVD++, stochastic matrix factorization, and non-negative matrix factorization, or other algorithms, could be compared.

CONCLUSIONS
This research compares the ALS and SVD algorithms for a book recommender system. Both algorithms have advantages such as handling cold start problems and data sparsity, so they are compared to see which is more accurate and better suited to users. The data comes from Goodreads and consists of book records and rating records: one dataset with 58,292 books and one with 51,945 ratings. After the dataset is retrieved, it is preprocessed with an encoder that converts the categorical rating column to a numeric column on a scale of 0 to 5 (0 = no rating), stored in the Rating_numeric column. The ratings are converted to numeric form because they are used to calculate evaluation metrics such as RMSE and MAE, and numeric values can be processed more easily and consistently in statistical analysis and calculation. The data is then used to compare the ALS and SVD algorithms. Although the average predictions of the two algorithms did not differ much, both being not very optimistic, the evaluation metrics distinguish them. For ALS, the RMSE is approximately 1.09320 and the MAE is approximately 0.86479, while the F1-Score is approximately 0.000304 and the Precision is approximately 0.000596. For SVD, the RMSE is approximately 0.86822 and the MAE is approximately 0.69032, while the F1-Score is approximately 0.827923 and the Precision is approximately 0.568347. Accuracy and performance can be read from the average error values of RMSE and MAE: the closer the value is to 0, the better the accuracy or performance. An F1-Score close to 1 indicates that the model predicts positive outcomes well, and vice versa. We therefore conclude that the SVD algorithm gives better results than ALS. This means that SVD better predicts user preferences, and its matrix factorization extracts hidden features from the training data more effectively, which allows more accurate predictions. Future research is expected to improve the performance of recommender systems by using larger datasets. In addition, other methods such as SVD++, stochastic matrix factorization, and non-negative matrix factorization, or other algorithms, could be compared.

Table 1. Book Dataset

Table 3. Encode Rating Dataset

Additionally, the two datasets go through a split stage, in which they are divided into 80% training data and 20% test data. The training data contains only ID, Name, and Rating_numeric.

Table 5. Calculation Results of Evaluation Metrics

Table 6. ALS Book Recommendation Results

Table 7. SVD Book Recommendation Results