Using Webpage Comparison Method for Automated Web Application Testing with Reinforcement Learning
DOI:
https://doi.org/10.46604/ijeti.2024.14104
Keywords:
automated testing, web crawler, reinforcement learning, webpage comparison
Abstract
Web application testing often uses crawlers to explore the application under test (AUT) and identify potential vulnerabilities. For dynamically generated pages, crawlers must provide test inputs for web forms. A previous tool combines a web crawler with a reinforcement learning agent, which uses code coverage to guide the crawler in filling web forms. This paper aims to improve the applicability of web application testing by using webpage comparison techniques instead of code coverage and source code access, thereby enhancing the handling of multiple web forms on a single page. Experimental results show that this approach explores more pages, reaches greater crawling depths, and achieves better code coverage than the original method. It also interacts more efficiently with multiple web forms and outperforms a random-action Monkey on new, untrained web applications. Therefore, this approach is promising for automated web application testing.
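As a rough illustration of how webpage comparison could stand in for code coverage as a novelty signal, the sketch below compares two pages by their sequence of HTML tags using Python's standard difflib. It is not the authors' implementation: the function names, the choice of tag sequences as the page representation, and the 0.9 threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects the names of opening tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    """Return the list of opening tag names found in an HTML string."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def page_similarity(html_a, html_b):
    """Return a ratio in [0, 1]; 1.0 means the tag sequences are identical."""
    matcher = SequenceMatcher(None, tag_sequence(html_a), tag_sequence(html_b))
    return matcher.ratio()


def is_new_page(html, seen_pages, threshold=0.9):
    """Hypothetical novelty check: a page counts as new if it is less
    similar than `threshold` (an assumed value) to every page seen so far."""
    return all(page_similarity(html, seen) < threshold for seen in seen_pages)


if __name__ == "__main__":
    form_page = "<html><body><form><input></form></body></html>"
    text_page = "<html><body><p>hi</p></body></html>"
    print(page_similarity(form_page, text_page))
    print(is_new_page(text_page, [form_page]))
```

A crawler agent could, under these assumptions, be rewarded whenever a form submission leads to a page for which `is_new_page` returns true, which requires no access to the application's source code.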

License
Copyright (c) 2025 Ci-Feng Lai, Chien-Hung Liu, Shingchern D. You

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright Notice
Submission of a manuscript implies that the work described has not been published before and that it is not under consideration for publication elsewhere. Authors can retain copyright in their articles with no restrictions. Authors may also post the final, peer-reviewed manuscript version (postprint) to any repository or website.
Since Jan. 01, 2019, IJETI has published new articles under the Creative Commons Attribution Non-Commercial 4.0 International (CC BY-NC 4.0) License.
The Creative Commons Attribution Non-Commercial (CC-BY-NC) License permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.