Article | Open Access

Contextual Relevance-Driven Question Answering Generation: Experimental Insights Using Transformer-Based Models

Tri Lathif Mardi Suryanto, Aji Prasetya Wibawa, Hariyono Hariyono, Hechmi Shili

Abstract


This study investigates how contextual relevance and hyperparameter tuning affect the performance of Transformer-based models in Question-Answer Generation (QAG). Using the Flan-T5 model, experiments were conducted on a domain-specific dataset to assess how variations in learning rate and the number of training epochs affect model accuracy and generalisation. Six QAG models (QAG-A to QAG-F) were developed, each evaluated with ROUGE metrics to measure the quality of the generated question-answer pairs. Results show that QAG-F and QAG-D achieved the highest performance, with QAG-F reaching a ROUGE-LSum of 0.4985. The findings indicate that careful tuning of the learning rate and training duration significantly improves model performance, enabling more accurate and contextually appropriate question generation. Furthermore, the ability to generate both a question and its answer from a single input enhances the interactivity and utility of NLP systems, particularly in knowledge-intensive domains. The study underscores the importance of contextual modelling and hyperparameter optimisation in generative NLP tasks, offering practical insights for chatbot development, educational tools, and digital heritage applications.
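To make the described workflow concrete, the sketch below shows one way to fine-tune a Flan-T5 checkpoint for question-answer generation with Hugging Face Transformers and to score a generated pair with ROUGE-LSum. It is a minimal illustration, not the authors' code: the checkpoint name, dataset example, prompt format, separator token, and hyperparameter values (learning rate, epochs) are assumptions standing in for the study's actual settings.

```python
# Minimal sketch (not the paper's implementation): fine-tune Flan-T5 for
# question-answer generation and evaluate with ROUGE-LSum.
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)
from datasets import Dataset
from rouge_score import rouge_scorer

model_name = "google/flan-t5-base"   # assumed checkpoint size
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical domain-specific example: context in, "question <sep> answer" out.
train = Dataset.from_dict({
    "context": ["Borobudur is a 9th-century Buddhist temple in Central Java."],
    "target": ["Where is Borobudur located? <sep> In Central Java."],
})

def preprocess(batch):
    # Prefix the context with a task instruction and tokenise inputs and labels.
    inputs = tokenizer(["generate question and answer: " + c for c in batch["context"]],
                       truncation=True, max_length=512)
    labels = tokenizer(text_target=batch["target"], truncation=True, max_length=128)
    inputs["labels"] = labels["input_ids"]
    return inputs

train_tok = train.map(preprocess, batched=True, remove_columns=train.column_names)

# Placeholder hyperparameters; the study tunes learning rate and epoch count.
args = Seq2SeqTrainingArguments(
    output_dir="qag-flan-t5",
    learning_rate=3e-4,
    num_train_epochs=5,
    per_device_train_batch_size=8,
    predict_with_generate=True,
)
trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=train_tok,
                         data_collator=DataCollatorForSeq2Seq(tokenizer, model=model))
trainer.train()

# Generate a question-answer pair from a single context and score it.
ids = tokenizer("generate question and answer: " + train["context"][0],
                return_tensors="pt").input_ids.to(model.device)
generated = tokenizer.decode(model.generate(ids, max_new_tokens=64)[0],
                             skip_special_tokens=True)

scorer = rouge_scorer.RougeScorer(["rougeLsum"], use_stemmer=True)
print(scorer.score(train["target"][0], generated)["rougeLsum"].fmeasure)
```

In this sketch a single model call produces both the question and the answer, joined by an assumed `<sep>` token; in practice the learning-rate and epoch grid would be swept across several runs (QAG-A to QAG-F) and the ROUGE scores compared across them.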


Keywords


Question Generation, Transformer Models, Hyperparameter Optimisation, Contextual Relevance, Cultural Heritage

DOI: https://doi.org/10.52088/ijesty.v5i4.989



Copyright (c) 2025 Tri Lathif Mardi Suryanto, Aji Prasetya Wibawa, Hariyono Hariyono, Hechmi Shili

International Journal of Engineering, Science, and Information Technology (IJESTY) eISSN 2775-2674