The importance of a quality dataset for automated tasks

Automating banking and non-banking processes means, among other things, automating the sub-tasks that make up most of these complex processes. Some of these sub-tasks can be solved using AI-enabled cognitive technologies.

This article focuses on automating tasks with trained NLP models, where the quality of the input dataset used to create them plays a key role. The most common tasks here are categorization and extraction tasks, which differ in their function. For example, some sort emails into specified categories, while others assign a sentiment to each email.

Others determine the topics of individual messages within ongoing conversations. Further interesting examples are tasks that find the terms in a selected message that are significant for its content (called entities), or that identify word types and their meaning in a sentence.

The importance of a well-made dataset

Each task has its own learning process. As a rule, it requires a training dataset with "correct answers" as input. This dataset is crucial to the quality of the resulting model: it must be correctly labelled, balanced in its representation of the different categories, and contain a sufficient amount of labelled data. Obtaining, labelling, or otherwise creating this dataset for a particular task is one of the most challenging parts of the machine learning process.
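The requirements above (enough labelled data per category and a balanced representation of categories) can be checked before any training starts. The following is a minimal sketch in Python, not a description of Trask's tooling: the function name `label_balance_report`, the thresholds `min_per_class` and `max_ratio`, and the example labels are all illustrative assumptions.

```python
from collections import Counter


def label_balance_report(labels, min_per_class=50, max_ratio=5.0):
    """Summarise how labels are distributed in a training dataset.

    min_per_class and max_ratio are illustrative thresholds (assumptions),
    not values recommended by the article.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("dataset has no labelled examples")

    largest = max(counts.values())
    smallest = min(counts.values())
    ratio = largest / smallest  # 1.0 means perfectly balanced

    return {
        "counts": dict(counts),
        "imbalance_ratio": ratio,
        # Categories with too few labelled examples to learn from
        "too_small": sorted(c for c, n in counts.items() if n < min_per_class),
        "balanced": ratio <= max_ratio,
    }


# Hypothetical labels for an email categorization task
labels = ["complaint"] * 120 + ["request"] * 100 + ["other"] * 8
report = label_balance_report(labels)
print(report)
```

In this made-up example the "other" category has only 8 examples, so the report flags it as too small and the dataset as unbalanced. A check like this does not verify that the labels are correct, which still needs human review.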

For this reason, it pays to follow current trends in natural language processing (NLP). These are currently based on large language models trained for a specific language, in our case Czech. A good language model carries strong knowledge of the language and forms a robust basis for building specific automated tasks. It also significantly reduces the required size of the training dataset, and tasks built on top of it achieve higher overall success rates.

Active learning, the process of training models with minimal requirements on the size of the input dataset, should not be overlooked either. It can achieve comparable model success rates while significantly reducing the demands on human resources.
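One common way to realise this idea is uncertainty sampling: the current model scores the unlabelled examples, and only those it is least sure about are sent to a human for labelling. The article does not say which strategy is used, so the sketch below is an assumption for illustration only. The function names, the example IDs and the probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labelling(predictions, k):
    """Pick the k unlabelled examples the model is least certain about.

    predictions maps an example id to the model's class probabilities.
    """
    ranked = sorted(
        predictions,
        key=lambda example_id: entropy(predictions[example_id]),
        reverse=True,  # most uncertain first
    )
    return ranked[:k]


# Hypothetical class probabilities from a partially trained email classifier
predictions = {
    "email_1": [0.98, 0.01, 0.01],  # confident, little to gain from labelling
    "email_2": [0.40, 0.35, 0.25],
    "email_3": [0.70, 0.20, 0.10],
    "email_4": [0.34, 0.33, 0.33],  # almost uniform, most uncertain
}
to_label = select_for_labelling(predictions, k=2)
print(to_label)
```

Labelling only the selected examples and retraining in rounds is what keeps the manually labelled dataset small.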

In practice, we then create a separate model for each task. The input dataset essentially determines the behaviour of the resulting model, so it cannot simply be taken from general sources or from other organisations. Datasets always differ in content, topics, language used and other factors. Therefore, models for individual tasks are usually published at Trask as services with a documented interface.
