Category:
Seminar
Where:
In person
Location:
DI Seminar Room and Google Meet
Description:
Predicting the diversity of words and multiwords (n-grams) in a text corpus, together with their frequency distributions, is important in NLP and language modelling. It is becoming critical to the design of modern applications such as Large Language Models, e.g. for guiding tokenization and corpus analysis for pre-training. This requires the ability to model the behaviour of very large-scale corpora, to handle multiwords as subwords or phrases, and to capture the distribution of n-grams across different frequency ranges, in particular low-occurrence n-grams.
Link:
https://meet.google.com/fup-ddqu-iox