TY - GEN
T1 - Watching a Language Model Learning Chess
AU - Stöckl, Andreas
N1 - Publisher Copyright:
© 2021 Incoma Ltd. All rights reserved.
PY - 2021/9/1
Y1 - 2021/9/1
N2 - We analyse how a transformer-based language model learns the rules of chess from text data of recorded games. We show how chess-specific metrics can be used to investigate how model capacity and the amount of available training data influence a language model's learning success. With these metrics, we show that, within the studied range, using more games for training yields significantly better results for the same training time, whereas model size shows no such clear influence. It is also notable that the usual evaluation metrics for language models, predictive accuracy and perplexity, give no indication of this. Further examination of the trained models reveals how they store information about the board state in the activations of neuron groups, and how the overall sequence of previous moves influences the newly generated moves.
AB - We analyse how a transformer-based language model learns the rules of chess from text data of recorded games. We show how chess-specific metrics can be used to investigate how model capacity and the amount of available training data influence a language model's learning success. With these metrics, we show that, within the studied range, using more games for training yields significantly better results for the same training time, whereas model size shows no such clear influence. It is also notable that the usual evaluation metrics for language models, predictive accuracy and perplexity, give no indication of this. Further examination of the trained models reveals how they store information about the board state in the activations of neuron groups, and how the overall sequence of previous moves influences the newly generated moves.
UR - http://www.scopus.com/inward/record.url?scp=85123619379&partnerID=8YFLogxK
U2 - 10.26615/978-954-452-072-4_153
DO - 10.26615/978-954-452-072-4_153
M3 - Conference contribution
T3 - International Conference Recent Advances in Natural Language Processing, RANLP
SP - 1369
EP - 1379
BT - International Conference Recent Advances in Natural Language Processing, RANLP 2021
A2 - Angelova, Galia
A2 - Kunilovskaya, Maria
A2 - Mitkov, Ruslan
A2 - Nikolova-Koleva, Ivelina
PB - INCOMA Ltd.
CY - Held Online
ER -