Information and Communication Technology Competencies Clustering for Vocational High School Students Using the K-Means Clustering Algorithm

The K-Means Clustering algorithm partitions data into groups so that records within a group are similar to one another and differ from records in other groups. Clustering Information and Communication Technology (ICT) competency data in educational units is considered necessary to tailor instruction to differences in student ability, to determine groups for advanced ICT guidance, and to serve as a reference in determining placements for Industrial Work Practices (Prakerin). This study examines how the K-Means Clustering algorithm can be applied to cluster the ICT competencies of students at State Vocational High School (SMKN) 3 Lhokseumawe. The benefit of this study is a visualization of the data clustering that can help teachers and school management formulate ICT policies at SMKN 3 Lhokseumawe. The data used in this study are the ICT competency test scores for the 2021/2022 academic year. The data were obtained through a competency test that refers to Minister of Education and Culture Regulation Number 45 of 2015 concerning the Role of ICT/KKPI Teachers in the Implementation of the 2013 Curriculum, under which ICT competence covers the skills to search, store, process, present and disseminate data and information. Data processing in this study uses the K-Means Clustering method and the RapidMiner application. Processing in RapidMiner starts with data preparation, followed by determining the number of clusters and configuring the method. This study uses a configuration of 3 (three) clusters, namely Very Competent, Competent, and Less Competent. Processing the data in RapidMiner resulted in 80 (eighty) students in cluster_0 with a Very Competent rating, 64 (sixty-four) students in cluster_1 with a Competent rating, and 10 (ten) students in cluster_2 with a Less Competent rating.


Introduction
Each educational unit needs to map its students' profiles with respect to Information and Communication Technology (ICT) competencies. This need arises from the development of information technology, which has affected all spheres of life, including education. It cannot be denied that learning processes and curriculum implementation are now closely tied to the ICT approach. This approach is essential for increasing learners' cognitive, affective, and psychomotor competence in solving problems based on information technology [1] [2] [3] [4]. Educational units that apply an ICT-based learning approach can therefore offer independent learning innovations that conventional approaches without ICT cannot. Learning through the ICT approach provides a new method of teaching and learning because it can minimize differences in teaching methods and materials, thus providing a more consistent standard of learning quality [5] [6] [7]. Previous studies showed that interactive learning with the Blended Learning approach could improve the quality of learning for students at SMKN 3 Lhokseumawe [8]. SMKN 3 Lhokseumawe is thus an educational unit that implements blended learning, in which the learning process is combined with ICT, and this makes the ICT competence of its students an essential aspect to know and map. The results of the ICT competency mapping can be used by the education unit as a reference for formulating lesson plans, formulating ICT policies, and determining placements for Industrial Work Practices (Prakerin). It is therefore deemed necessary to cluster the ICT competency data at SMKN 3 Lhokseumawe so that a visualization of the students' ICT competency mapping can be generated automatically.
The mapped ICT competencies refer to Permendikbud Number 45 of 2015, under which ICT competencies are mapped in terms of the ability to search, store, process, present and disseminate data and information in various ways to support smooth learning. Fully online or blended learning was regarded as the best solution during the COVID-19 pandemic, and it changed the learning process. These changes are expected to be implemented effectively and efficiently [9]. Data clustering is one of the methods of Data Mining, a process that applies one or more machine learning techniques to analyze data and extract knowledge automatically. The main functions of Data Mining are classification, assessment, prediction, relevance grouping or association rules, grouping, description and visualization [10] [11]. Data classification is seen as very important in the current era of information technology because it lets the education unit process data more easily and helps safeguard data security [12] [13]. One algorithm that can be used to cluster data is K-Means Clustering. The K-Means Clustering algorithm is a Data Mining technique that partitions data into several clusters so that data with the same characteristics are grouped into one cluster while data with different characteristics are grouped into other clusters. K-Means performs two processes, namely determining the center of each cluster (centroid) and finding the members of each cluster, and it requires an input parameter k, the number of clusters [14] [11] [15].
Previous research using the K-Means Clustering algorithm includes Miftahul (2021), who examined the implementation of the K-Means algorithm for clustering participants in the National Science Olympiad (OSN) at the high school level. The result was 12 students in C1 who were highly competent to take part in OSN, 14 students in C2 who were competent but not yet able to take part in OSN, and 2 students in C3 who were less competent to take part in OSN. Ari Sulistiyawati (2021) examined the implementation of the K-Means Clustering algorithm in determining superior class students and found that building the algorithm into a clustering information system produced effective data grouping and that, with clusters formed through iterative centroid distance calculation using student data as the reference object, clustering superior classes took less time [16] [17] [15]. Surohman (2019) used the K-Means algorithm to measure the correlation between student profiles and grades, using the 2019 National Examination scores from a Vocational High School in Jakarta, and found that the algorithm can be used to group students based on social aspects. In that study, 184 of the 512 students (36%) had matching profile clusters and academic scores; 43 students had low grades and limited family economic means, and 83 students had the highest academic scores and affluent family economic conditions [18]. Suhefi (2020) clustered student interest in school selection with the K-Means algorithm, using data sourced from the Basic Education Data (Dapodik). The result was that the K-Means Clustering algorithm can be used to cluster students' interest in school selection. Of the 10 junior high schools in the data, 6 schools fell into the less desirable category (C0), 3 schools into the moderately attractive category (C1), and 1 school into the very attractive category (C2) [19].
This study aims to group the Information and Communication Technology (ICT) competency scores of SMKN 3 Lhokseumawe students by calculating the shortest distance between each data record and the center point (centroid) of a cluster. The intended grouping produces three clusters, namely Very Competent, Competent and Less Competent in the field of ICT [11] [20].

Methods
Research requires a framework consisting of the stages carried out to complete it. The stages of work in this research can be seen in Figure 1 and are described as follows:

Problem Identification
The problem addressed in this research is how to determine the Information and Communication Technology (ICT) competency clusters of students, which in turn can inform the choice of location for Industrial Work Practices (Prakerin). Based on the recapitulation of the ICT competency test scores, three clusters were formed, namely Very Competent, Competent and Less Competent.

Research Purposes
The purpose of this research is to determine the Information and Communication Technology (ICT) competency clusters of students, which can then be used to determine the location of Industrial Work Practices (Prakerin). Based on the recapitulation of the ICT competency test scores, three clusters were formed, namely Very Competent, Competent and Less Competent.

Data Collection
The data used in this study are the Information and Communication Technology (ICT) competency test scores. The scores have been organized into five variables: searching, storing, processing, presenting and disseminating data or information.

Data Processing
Data processing in this study uses the K-Means Clustering algorithm, with the following steps:
1. Determine the number of clusters k, where the number of clusters depends on system requirements.
2. Determine the initial centroid values, one for each of the k clusters.
3. Allocate each data record to the nearest cluster center, using the Euclidean distance: $d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$, where $d(x, y)$ is the distance from the data record to the center of the cluster, $x$ is the data record, $y$ is the centroid, and $m$ is the number of variables in a record.
4. Recalculate the new centroid: $C_1 = \frac{x_1 + x_2 + \dots + x_n}{n}$, where $C_1$ is the new centroid, $x_1$ is the value of the first data record, $x_n$ is the value of the nth data record, and $n$ is the number of data records in the cluster.
5. Repeat from step 3 if the cluster centers are still changing. If there are no more changes, the allocation of data to the cluster centers is stopped.
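The steps above can be sketched in plain Python as follows. This is only an illustrative sketch, not the RapidMiner process used in the study: it assumes the Euclidean distance, takes the initial centroids either from a hypothetical `init` argument or from a random sample of the records, and runs on made-up scores for the five competency variables rather than the actual test data.

```python
import math
import random


def euclidean(x, y):
    """Distance between a data record x and a centroid y (step 3)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def mean_point(points):
    """Component-wise mean of a non-empty list of records (step 4)."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def k_means(records, k, init=None, max_iter=100, seed=0):
    """Return (centroids, assignments) for the given records."""
    rng = random.Random(seed)
    # Step 2: initial centroids, either supplied or sampled from the data.
    centroids = list(init) if init is not None else rng.sample(records, k)
    assignments = []
    for _ in range(max_iter):
        # Step 3: allocate each record to its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: euclidean(record, centroids[j]))
            for record in records
        ]
        # Step 4: recompute each centroid as the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for j in range(k):
            members = [r for r, a in zip(records, assignments) if a == j]
            new_centroids.append(mean_point(members) if members else centroids[j])
        # Step 5: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignments


# Hypothetical scores: (search, store, process, present, disseminate).
records = [
    (90, 88, 92, 85, 91), (88, 90, 86, 89, 87), (93, 91, 90, 92, 94),
    (72, 70, 75, 68, 74), (70, 73, 71, 69, 72), (75, 74, 70, 73, 71),
    (45, 50, 48, 52, 47), (50, 47, 51, 46, 49), (48, 52, 46, 50, 45),
]
centroids, assignments = k_means(
    records, k=3, init=[records[0], records[3], records[6]]
)
print(assignments)
```

Because K-Means can settle on different groupings for different starting centroids, the example fixes `init` so that the run is reproducible.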

Results Testing
The results are tested to confirm that the system built produces accurate output and runs in line with the research objectives. Testing is carried out by supplying the test data, determining the resulting ICT competency clusters, and using the RapidMiner application to build the model and visualize the data.

Drawing Conclusions
The conclusion is formulated based on the results of the K-Means Clustering calculation carried out in the RapidMiner application. The results of clustering the students' ICT competencies will determine where Industrial Work Practice activities are carried out.

Results and Discussion
The clustering of Information and Communication Technology (ICT) competencies for SMKN 3 Lhokseumawe students can be explained in the following stages:

Recapitulation of Information and Communication Technology (ICT) Competency Test Scores
The data processed in this study comprised 154 records, matching the number of level XI students at SMKN 3 Lhokseumawe. The ICT competency test scores of these students can be seen in the following recapitulation.

Calculating the Data Distance to the Centroid Using the RapidMiner Application
RapidMiner is a reliable application for data processing. Based on data mining principles and algorithms, RapidMiner extracts patterns from large data sets and combines statistical methods, artificial intelligence and databases [21]. RapidMiner can serve as a solution for data mining and predictive analysis [22]. Processing of the ICT competency data of SMKN 3 Lhokseumawe students uses the data and operator configuration shown in the following figure.
The ICT competency test score file is imported into the RapidMiner process, after which the K-Means operator and the Cluster Distance Performance operator are added. In the K-Means operator, the number of clusters is set to 3 (three), and in the performance operator the main criterion selected is Davies-Bouldin. Once the operator and performance configuration are confirmed, the process can be started. The results of the data processing are displayed as a Cluster Model, a Clustering Example Set and a Performance Vector. The Cluster Model section shows the cluster names of the participants, who have been divided into 3 clusters, together with the cluster data distribution plot, as shown below:

Fig 3 : Folder View of the Cluster Model in RapidMiner
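The Davies-Bouldin criterion selected in the performance configuration compares how spread out each cluster is with how far apart the cluster centers are; in its standard definition, lower values indicate more compact and better separated clusters. The sketch below computes that standard definition in plain Python. It is an illustration under the assumption of Euclidean distances, the function names are hypothetical, and the value RapidMiner reports may be scaled or signed differently.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of non-empty clusters (k >= 2).

    For each cluster i, take the worst ratio (S_i + S_j) / d(c_i, c_j)
    over all other clusters j, where S is the mean distance of a
    cluster's members to its centroid; the index is the mean of these
    worst ratios.
    """
    centers = [centroid(c) for c in clusters]
    scatter = [
        sum(euclidean(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Two made-up clusterings of 2-D points: the second is more compact.
loose = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
tight = [[(0, 0.5), (0, 1.5)], [(10, 0.5), (10, 1.5)]]
print(davies_bouldin(loose), davies_bouldin(tight))
```

In the made-up example, both clusterings have centers 10 units apart, so the index drops only because the members of the second clustering sit closer to their centers.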
Processing the data with 3 (three) clusters places 80 students in cluster_0, 64 students in cluster_1 and 10 students in cluster_2. The results of the process in RapidMiner can be seen in the following image:

Fig 4 : Cluster Model Process Results
The resulting cluster model shows only the number of members in each cluster. To find out which cluster contains the highest-scoring group, the plot feature of the cluster model can be used. In the data plot, cluster_0 is drawn in blue and sits at the highest level, meaning that cluster_0 contains the students with a high level of competence (very competent). Cluster_1, drawn in green, is the intermediate (competent) cluster, while cluster_2, drawn in red, is the cluster of students with a low level of ICT competence (less competent).
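The study reads the competence level of each cluster from the plot. As a complementary sketch, the same ranking could be derived numerically by ordering the clusters by the mean of their centroid values and attaching the three ratings in that order. The function name and the centroid values below are hypothetical and are not taken from the study's results.

```python
def label_clusters(centroids, labels=("Very Competent", "Competent", "Less Competent")):
    """Map each cluster name to a rating, highest mean centroid score first.

    centroids: dict of cluster name -> tuple of centroid values.
    """
    if len(centroids) != len(labels):
        raise ValueError("need exactly one label per cluster")
    # Average the centroid over all competency variables.
    means = {name: sum(values) / len(values) for name, values in centroids.items()}
    # Rank cluster names from highest to lowest mean score.
    ranked = sorted(means, key=means.get, reverse=True)
    return dict(zip(ranked, labels))


# Made-up centroids over (search, store, process, present, disseminate).
example_centroids = {
    "cluster_0": (90.3, 89.7, 89.3, 88.7, 90.7),
    "cluster_1": (72.3, 72.3, 72.0, 70.0, 72.3),
    "cluster_2": (47.7, 49.7, 48.3, 49.3, 47.0),
}
ratings = label_clusters(example_centroids)
print(ratings)
```

Ranking by the overall mean treats the five variables as equally important; a school that weights some skills more heavily would need a different ordering rule.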

Conclusion
Processing the Information and Communication Technology (ICT) competency test scores of 154 students of SMKN 3 Lhokseumawe in the RapidMiner application with a configuration of 3 (three) clusters resulted in 80 (eighty) students in cluster_0 with a rating of Very Competent, 64 (sixty-four) students in cluster_1 with a Competent rating and 10 (ten) students in cluster_2 with a Less Competent rating. Based on the results of this mapping, the education unit can formulate a policy on Industrial Work Practices (Prakerin) for the students in each cluster.