arxiv:2308.01191

Towards Understanding the Capability of Large Language Models on Code Clone Detection: A Survey

Published on Aug 2, 2023

Authors:

Abstract

Code cloning, the duplication of code fragments, is common in software development. While some reuse aids productivity, excessive cloning hurts maintainability and introduces bugs. Hence, automatic code clone detection is vital. Meanwhile, large language models (LLMs) possess diverse code-related knowledge, making them versatile for various software engineering challenges. However, LLMs' performance in code clone detection is unclear and needs more study for accurate assessment. In this paper, we provide the first comprehensive evaluation of LLMs for clone detection, covering different clone types, languages, and prompts. We find advanced LLMs excel in detecting complex semantic clones, surpassing existing methods. Adding intermediate reasoning steps via chain-of-thought prompts noticeably enhances performance. Additionally, representing code as vector embeddings, especially with text encoders, effectively aids clone detection.Lastly, the ability of LLMs to detect code clones differs among various programming languages. Our study suggests that LLMs have potential for clone detection due to their language capabilities, offering insights for developing robust LLM-based methods to enhance software engineering.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2308.01191 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2308.01191 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2308.01191 in a Space README.md to link it from this page.