Unification of Numerical and Ordinal Survey Data for Clustering-based Inferencing
Abstract
Surveys now cover almost every issue governing our lives, with varied parameters and a variety of data, so a researcher must unify these data before extracting inferences from a survey. Data from quantitative surveys are clustered to reveal respondents' divergent and dominant tendencies, with the aim of investigating the general trends among the respondents' categories. Because of the unique characteristics of survey data, popular clustering techniques based on value similarity are inadequate.
In this paper, we attempt to unify the numerical data with the ordinal data of a survey. We model the numerical data with a Gaussian distribution and first convert them to ordinal data following that distribution; the converted attributes may be the governing attributes for deciding the clusters. We then apply $K$-means clustering with varying numbers of clusters. We implement the proposed methodology on real survey data and compare the clustering efficiency before and after unification for each number of clusters. More crucially, the approach uses both the order information of the ordinal attributes and the statistical information of the numerical attributes for clustering. Extensive testing demonstrates that the suggested unification works better on real data sets than its contemporaries.
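The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the Gaussian model determines the ordinal levels, so the sketch assumes equal-probability cut points of a Gaussian fitted to each numerical column. The function names `gaussian_to_ordinal` and `kmeans`, the five-level scale, and the toy data are all hypothetical.

```python
import random
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def gaussian_to_ordinal(values, levels=5):
    """Map numerical values to ordinal levels 1..levels.

    Assumption (not from the paper): cut points are the equal-probability
    quantiles of a Gaussian fitted to the column.
    """
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant column: assign the middle level
        return [(levels + 1) // 2] * len(values)
    fitted = NormalDist(mu, sigma)
    cuts = [fitted.inv_cdf(i / levels) for i in range(1, levels)]
    return [bisect_right(cuts, v) + 1 for v in values]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means; returns (labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sqdist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [mean(col) for col in zip(*members)]
    wcss = sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wcss


# Hypothetical toy survey: age (numerical) plus two 1-5 Likert answers.
ages = [19, 21, 22, 24, 35, 38, 41, 44, 58, 61, 63, 67]
likert = [(1, 2), (2, 1), (1, 1), (2, 2), (3, 3), (3, 4),
          (4, 3), (3, 3), (5, 4), (4, 5), (5, 5), (5, 4)]

# Unify: the numerical column becomes ordinal, then joins the Likert columns.
age_levels = gaussian_to_ordinal(ages, levels=5)
unified = [(a, q1, q2) for a, (q1, q2) in zip(age_levels, likert)]

# Cluster the unified data for several values of K.
for k in (2, 3, 4):
    labels, wcss = kmeans(unified, k)
    print(k, labels, round(wcss, 2))
```

Once the numerical column is on the same 1 to 5 scale as the ordinal answers, every attribute contributes comparably to the Euclidean distance used by $K$-means. The within-cluster sum of squares printed for each $K$ is only one possible measure of clustering efficiency; the abstract does not say which measure the paper uses.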
Article Details
Upon receipt of accepted manuscripts, authors will be invited to complete a copyright license to publish the paper. At least the corresponding author must send the signed copyright form for publication. It is a condition of publication that authors grant an exclusive license to the INFOCOMP Journal of Computer Science. This ensures that requests from third parties to reproduce articles are handled efficiently and consistently, and it also allows the article to be disseminated as widely as possible. In assigning the copyright license, authors may use their own material in other publications, provided that the INFOCOMP Journal of Computer Science is acknowledged as the original place of publication.