Using Webpage Comparison Method for Automated Web Application Testing with Reinforcement Learning

Ci-Feng Lai; Chien-Hung Liu; Shingchern D. You

doi:10.46604/ijeti.2024.14104

Authors

Ci-Feng Lai, Department of Computer Science and Information Engineering, National Taipei University of Technology, Taiwan, ROC
Chien-Hung Liu, Department of Computer Science and Information Engineering, National Taipei University of Technology, Taiwan, ROC
Shingchern D. You, Department of Computer Science and Information Engineering, National Taipei University of Technology, Taiwan, ROC

Keywords:

automated testing, web crawler, reinforcement learning, webpage comparison

Abstract

Web application testing often uses crawlers to explore the application under test (AUT) and identify potential vulnerabilities. For dynamically generated pages, crawlers must provide test inputs for web forms. A previous tool combines a web crawler with a reinforcement learning agent, which uses code coverage to guide the crawler in filling web forms. This paper aims to improve the applicability of web application testing by using webpage comparison techniques instead of code coverage and source code access, thereby enhancing the handling of multiple web forms on a single page. Experimental results show that this approach explores more pages, reaches greater crawling depths, and achieves better code coverage than the original method. It also interacts more efficiently with multiple web forms and outperforms a random-action Monkey on new, untrained web applications. Therefore, this approach is promising for automated web application testing.
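
To make the idea of replacing code coverage with webpage comparison concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a page's structure can be approximated by its sequence of opening HTML tags, that two pages count as "the same" when their tag sequences are at least 90% similar, and that a reinforcement learning agent is rewarded only when a form submission leads to a page unlike any seen so far. The names `tag_sequence`, `similarity`, `novelty_reward`, and the `threshold` value are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collects opening tag names as a rough structural fingerprint of a page."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    """Return the list of opening tag names found in an HTML string."""
    parser = TagSequence()
    parser.feed(html)
    return parser.tags


def similarity(html_a, html_b):
    """Structural similarity in [0, 1], ignoring text content and attributes."""
    return SequenceMatcher(None, tag_sequence(html_a), tag_sequence(html_b)).ratio()


def novelty_reward(new_page, seen_pages, threshold=0.9):
    """Hypothetical reward: 1.0 for a structurally new page, 0.0 otherwise.

    The threshold is an assumed value, not one taken from the paper.
    """
    if any(similarity(new_page, page) >= threshold for page in seen_pages):
        return 0.0
    return 1.0


# Two renderings of the same form (only the text differs) and one different page.
form_a = "<html><body><form><input><input><button>Log in</button></form></body></html>"
form_b = "<html><body><form><input><input><button>Sign in</button></form></body></html>"
table = "<html><body><table><tr><td>Report</td></tr></table></body></html>"

seen = [form_a]
reward_same = novelty_reward(form_b, seen)   # same structure as a seen page
reward_new = novelty_reward(table, seen)     # structurally different page
```

Because a reward like this depends only on the pages the crawler observes, it needs neither instrumentation of the application under test nor access to its source code, which is the applicability gain the abstract describes.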
