Leveraging Machine Learning for Software Redocumentation

Verena Geist, Michael Moser, Josef Pichler, Stefanie Beyer, Martin Pinzger

Research output: Chapter in Book/Report/Conference proceedings › Conference contribution

11 Citations (Scopus)

Abstract

Source code comments contain key information about the underlying software system. Many redocumentation approaches, however, cannot exploit this valuable source of information. This is mainly because comments differ in their goals and target audience and can therefore be used only selectively for redocumentation. Performing the required classification manually, e.g. in the form of heuristic rules, is usually time-consuming and error-prone, and it depends strongly on the programming languages and guidelines of concrete software systems. By leveraging machine learning, it should be possible to classify comments and thus transfer valuable information from the source code into documentation with less effort but the same quality. We applied different machine learning techniques to a COBOL legacy system and compared the results with an industry-strength heuristic classification. We found that machine learning outperforms the heuristics, yielding fewer errors with less effort.

Original language: English
Title of host publication: SANER 2020 - Proceedings of the 2020 IEEE 27th International Conference on Software Analysis, Evolution, and Reengineering
Editors: Kostas Kontogiannis, Foutse Khomh, Alexander Chatzigeorgiou, Marios-Eleftherios Fokaefs, Minghui Zhou
Publisher: IEEE
Pages: 622-626
Number of pages: 5
ISBN (Electronic): 9781728151434
Publication status: Published - Feb 2020
Event: 27th International Conference on Software Analysis, Evolution and Reengineering - London, Ontario, Canada
Duration: 19 Feb 2020 to 21 Feb 2020
http://saner2020.csd.uwo.ca/

Publication series

Name: SANER 2020 - Proceedings of the 2020 IEEE 27th International Conference on Software Analysis, Evolution, and Reengineering

Conference

Conference: 27th International Conference on Software Analysis, Evolution and Reengineering
Country/Territory: Canada
City: London, Ontario
Period: 19.02.2020 to 21.02.2020

Keywords

  • CNNs
  • NLP
  • comment classification pipeline
  • heuristic rules
  • legacy system
  • machine learning
  • software redocumentation
