A Comparative Analysis of the Use of ‘Thereof’ in an English Non-translated Text and the English Machine- and Human-translated Versions of the Hungarian Criminal Code

Owing to the recent rise of neural machine translation, a paradigm shift has been witnessed regarding the role of translators and reviewers. As neural machine translation becomes increasingly capable of modelling how natural languages work, the traditional tasks of translators are gradually being replaced by new challenges. More emphasis is placed on pre- and post-editing (revision) skills and competences, which presumably enable the production of higher-quality, near-human translations. In this paper, I attempt to demonstrate the relevant challenges and dynamic contrasts arising in the translation process through a qualitative and quantitative comparison of machine-translated and human-translated legal texts (acts). By analysing the original Hungarian (source language) Criminal Code and its English (target language) machine and human translations, I aim to highlight the specific challenges that emerge in the translation process and to show which patterns can be observed in translations produced by human and non-human translators.


Introduction
Due to its specific nature, legal language and legal translation represent a distinct branch within the field of translation studies. The explosion in machine translation over the last few years is posing new challenges both to translation studies and to translators in practice. If a steadily increasing number of texts and their translations are becoming easily accessible in electronic form, what new challenges and expectations does this pose for translators? In parallel, is the concept of a quality standard for translators also evolving? How can a quality standard be defined for both machine and human translation?

Legal Translation
One of the most important requirements of legal translation is to preserve the source language content as accurately as possible while rendering its meaning as clearly as possible in the target language. Therefore, in the conflict that often arises during translation between faithfulness to the source language content and the linguistic and cultural appropriateness of the target language text, the former is as important as, if not more important than, the latter. In this process, translators have to take into account a number of linguistic, cultural, pragmatic and legal specificities affecting the translatability and interpretability of source and target language texts. The genre specificities of English-Hungarian and Hungarian-English legal translation (Balogh, 2020) and the strategies adopted in legal translation (Kovács, 2020b; Kovács, 2020c) have been discussed by several authors. In this paper, however, the author focuses on one particular aspect of legal translation, namely the use of complex pronoun structures in English source language (non-translated) texts and in the machine and human translations of Hungarian source language texts. Since the referential function of linguistic elements in the legal register is crucial, it is worth examining how pronoun structures are used and translated in legal texts, as they have a significant impact on the interpretability and actual meaning of translated texts.
The present study attempts to compare and analyse the frequency and use of complex pronoun structures in an English source language (non-translated) normative (prescriptive) text and in the raw machine translation and the human translation of a Hungarian normative text. The study seeks to answer the question of which of the analysed translations corresponds more closely to the language use trends observed in the original English text, which serves as the reference text. In other words, is it the human- or the machine-translated text that "gravitates" (Halverson, 2003, 2010a, 2017) more towards the original English-language text?

Machine translation, post-editing, corpus linguistics
Machine translation is evolving dynamically thanks to the proliferation of applications based on neural network models (neural machine translation). Since neural machine translation takes far less time to translate a given source language text into the target language, and the quality of its output is improving steadily, it is becoming increasingly widespread among translation service providers. The unprecedented pace at which neural machine translation is improving and reshaping the translation setting has also led to the emergence of a new translation process, namely post-editing. The ISO 17100:2015 standard defines post-editing as "the editing and correction of a machine translation". However, the quality standards and reference texts to be considered in the process of post-editing are still rather imprecisely defined. Although there are quantifiable indicators (metrics), such as BLEU (Papineni et al., 2002) and (H)TER (Snover et al., 2006), their mere application is not sufficient to accurately assess the quality of a given target language text. In fact, no generally accepted or standard principles exist for post-editing (DePalma, 2013; TAUS, 2016). Translation practitioners and theorists have proposed several quality levels for defining post-editing, which distinguish between partial and full post-editing (see Hu & Cadwell, 2016). However, this two-fold division of the depth of post-editing with respect to linguistic, textual and cultural aspects does not seem sufficient for a precise qualitative definition of post-editing.
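To make the (H)TER metric mentioned above more concrete, the following Python sketch computes a simplified, word-level edit rate: the number of insertions, deletions and substitutions needed to turn a machine translation into a reference (for HTER, its post-edited version), divided by the number of words in the reference. It is an illustration under stated assumptions, not a reimplementation of the tool of Snover et al. (2006): real TER also counts block shifts as single edits, which this sketch omits, and the function names and example sentences are hypothetical.

```python
def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Word-level Levenshtein distance: the minimum number of insertions,
    deletions and substitutions that turn hyp into ref."""
    prev = list(range(len(ref) + 1))  # distances from the empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)   # i deletions reach the empty reference
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete hypothesis word h
                curr[j - 1] + 1,         # insert reference word r
                prev[j - 1] + (h != r),  # keep (cost 0) or substitute (cost 1)
            )
        prev = curr
    return prev[-1]


def simplified_ter(hypothesis: str, reference: str) -> float:
    """Edits per reference word. Unlike real TER, block shifts are not
    counted as single edits, so reordered output tends to score worse."""
    ref = reference.split()
    return word_edit_distance(hypothesis.split(), ref) / len(ref)


if __name__ == "__main__":
    # Invented example: raw MT output vs. a hypothetical post-edited version
    mt = "the remaining part thereof shall be converted into imprisonment"
    post_edited = "the remaining part of it shall be converted to imprisonment"
    print(simplified_ter(mt, post_edited))  # 3 edits / 10 reference words = 0.3
```

A score of this kind says how much editing was needed, but nothing about whether the edited text follows the conventions of the legal register, which is the gap discussed in the paragraph above.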
Thanks to the unprecedented progress in neural machine translation, the quality of machine-translated texts is increasing. Nevertheless, translation quality hinges on a myriad of factors, such as the type and register of the text, the source and target languages involved, and the machine translation software used (Google Translate, DeepL, etc.). According to some researchers, neural machine-translated texts can even reach near-human translation quality (Lample et al., 2018). The question is whether they can ever exceed it.
Parallel to the rapid expansion and development of machine translation, translation competence is increasingly being transformed into post-editing competence (Pym, 2013). However, a precise and standardised definition of what is expected of post-editing is still lacking. One possible way of carrying out post-editing, whether 'partial' or 'full', could be to rely on reference texts that can be regarded as expected standards in translation. The question arises, however, whether a text originally written in English (i.e., a non-translated text) in the relevant register can function as an actual reference or standard text for a translated text. In an attempt to offer some insight into this question, this paper analyses the use of certain linguistic elements in original (non-translated) and translated English texts.
Corpus linguistics has been playing an increasingly important role in translation research since the 1990s, thanks to the proliferation of computer applications in translation methodology. As corpus linguistic tools allow quantitative research to be carried out on a vast range of electronically accessible texts, they are justifiably popular among researchers. At the same time, several theorists point to the dangers inherent in the use of corpus linguistic tools: purely quantitative, item-based research can make the interpretation of results one-sided (Heltai, 2014). For this reason, the corpus linguistic research (carried out with Sketch Engine) used to collect and analyse the data presented in this study is complemented by a qualitative analysis and interpretation of the results.
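The qualitative side of such an analysis typically starts from concordance (keyword-in-context, KWIC) lines, which show each occurrence of a search word together with its immediate context. A minimal Python sketch of a KWIC display is given below; the tokenisation rule, the window size and the sample sentence are assumptions for illustration only, and the function does not reproduce Sketch Engine's own concordancer.

```python
import re


def kwic(text: str, node: str, window: int = 5) -> list[str]:
    """Return one keyword-in-context line per occurrence of node,
    with up to `window` words of context on each side."""
    # Assumed tokenisation: runs of Unicode letters, allowing internal ' or -
    tokens = re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == node.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            # Right-align the left context so the node words line up
            lines.append(f"{left:>40} [{token}] {right}")
    return lines


if __name__ == "__main__":
    # Invented sentence for illustration only
    sample = ("If the convict escapes, the remaining part thereof "
              "shall be converted into imprisonment.")
    for line in kwic(sample, "thereof", window=3):
        print(line)
```

Reading such lines side by side for the human and machine translations is what allows the counts to be interpreted rather than merely reported.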

Aim and methodology of the research
The aim of the present study is therefore to investigate quantitatively and analyse qualitatively, using corpus linguistic tools, particular linguistic elements collected from a source language (non-translated) English text and from the machine- and human-translated English versions of a Hungarian source language text. In particular, the analysis seeks possible answers to the question of what characteristics can be observed in the use of a specific linguistic element in the source (non-translated) language and in the (human- or machine-) translated target language. In several previous studies, the author of this paper has investigated the interaction of source and target language texts in English-Hungarian and Hungarian-English legal translation in both human (Kovács, 2020b) and machine (Kovács, 2020a, 2020c) translations. As one specific compound pronoun, 'thereof', often poses challenges for both translators and post-editors in English-Hungarian legal translation, its use is subjected to quantitative and qualitative analysis in the present paper.
A total of four monolingual ad hoc corpora have been used for the analysis. The English source language (non-translated) corpus is the Penal Code of California (hereinafter: PCC_EN). The Hungarian source language (non-translated) corpus is the 2012. évi C. törvény a Büntető Törvénykönyvről (Act C of 2012 on the Criminal Code) (hereinafter: CC_HU). The human English translation of this Hungarian text is available online in the National Legislation Library (hereinafter: CC_HU_EN_HT), and the DeepL Pro (neural network based) machine translation of the text is also available online (hereinafter: CC_HU_EN_MT). The criminal codes selected for analysis are normative, prescriptive texts according to the classification of Šarčević (1997) and other experts (Kjaer, 2007; Ződi, 2017).
Four corpora have been created from the above four texts. Their general characteristics are summarised in Table 1. As can be inferred from the data in Table 1, the Penal Code of California (391,257 words) is much more extensive than the Hungarian Criminal Code (53,130 words). As the present analysis does not cover the similarities and differences in the structure and content of the two criminal codes, this difference is not addressed in the paper, although it could be the subject of future investigation. The present study focuses exclusively on the use of compound pronouns in original (non-translated) English and in the English (human- and machine-made) translations of the Hungarian source text.
Comparing the Hungarian source language text with its two (human- and machine-made) translations, it can be seen that both translated texts (human translation: 66,326 words; machine translation: 73,872 words) are longer than the original Hungarian text (53,130 words). Of the three texts, the machine-made English translation is the most extensive. As for what is referred to here as lexical density, i.e., the average number of words per sentence (mean sentence length), the Hungarian Criminal Code has the lowest value (34.7) and, interestingly, both translations exceed the source language text in this respect. At the same time, the two (human- and machine-translated) English target language texts are closer in this measure to the Penal Code of California.
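The size comparison and the sentence-length measure described above reduce to simple arithmetic. The Python sketch below computes the mean number of words per sentence for a text, and the expansion ratio of each translation relative to the Hungarian source using the word counts reported in Table 1 (about 1.25 for the human translation and 1.39 for the machine translation). It assumes a naive regular-expression tokeniser and sentence splitter, so the counts it produces for a full text will not match the Sketch Engine figures in Table 1 exactly; all names are hypothetical.

```python
import re

# Word counts as reported in Table 1
SOURCE_WORDS = 53_130  # CC_HU
TRANSLATION_WORDS = {"CC_HU_EN_HT": 66_326, "CC_HU_EN_MT": 73_872}


def mean_sentence_length(text: str) -> float:
    """Average number of word tokens per sentence.

    Naive assumptions: a sentence ends in '.', '!' or '?' followed by
    whitespace (so abbreviations and '2012.' would split wrongly), and a
    word is any run of Unicode word characters (so Hungarian accented
    letters are handled)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"\w+", text)
    return len(words) / len(sentences)


def expansion_ratio(target_words: int, source_words: int) -> float:
    """How many times longer the translation is than its source."""
    return target_words / source_words


if __name__ == "__main__":
    for name, words in TRANSLATION_WORDS.items():
        print(f"{name}: {expansion_ratio(words, SOURCE_WORDS):.2f}")
    print(mean_sentence_length("One two three. Four five."))  # 2.5
```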

The use of compound pronoun structures in the corpora studied
In this section, the results of the analysis of the use of compound pronouns are discussed. According to Pavlíčková (2008), the elements beginning with there- and here- are legal deictic devices that refer to a given speech situation or to a situation, element or relation outside of it. In effect, they replace the demonstrative pronouns 'that' and 'this' combined with the prepositions required in legal language use. The prepositions most frequently attached to there and here are -after, -by, -upon, -in, -on and -to. In these compound pronoun structures, which are most commonly used in legal language, the first element is a locative, 'there' or 'here', to which a preposition is attached as a post-positioned element (ending). The main difference between the two types is that the 'there-' forms, which replace 'that', refer to something beyond the context of the text, while the 'here-' forms, which replace 'this', refer to something within it.
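Because these structures follow a fixed pattern (the locative 'there' or 'here' plus a prepositional ending), they can be retrieved from a plain-text corpus with a regular expression. The Python sketch below assumes an ending list made up of the six prepositions named above plus the endings of the other structures discussed in this paper ('of', 'from', 'with', 'inafter'). The list is an assumption and is not exhaustive (it would miss, for example, 'thereunder'); word boundaries keep it from matching words such as 'therefore' or 'whereof'. Counting the matches gives both the number of occurrences (tokens) and the number of different structures (types), the two measures compared in Table 2.

```python
import re
from collections import Counter

# Assumed, non-exhaustive list of prepositional endings
ENDINGS = ["inafter", "after", "upon", "from", "with", "by", "in", "on", "to", "of"]
COMPOUND = re.compile(
    r"\b(?:there|here)(?:" + "|".join(ENDINGS) + r")\b", re.IGNORECASE
)


def compound_pronouns(text: str) -> Counter:
    """Count each compound pronoun form (lower-cased) found in text."""
    return Counter(m.group(0).lower() for m in COMPOUND.finditer(text))


if __name__ == "__main__":
    # Invented sample text
    sample = ("The Annex hereto and the provisions thereof shall apply; "
              "therefore the court shall decide thereon, as set out hereinafter.")
    counts = compound_pronouns(sample)
    print(counts)
    print("tokens:", sum(counts.values()), "types:", len(counts))
```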
According to another interpretation, the prefix 'there' refers to another document or an element beyond the given text, while the elements starting with 'here' refer to the given document itself (Bázlik et al., 2010). According to Pavlíčková (2008), these linguistic devices are adverbs that can function both as conjunctions and as adjectives. They denote the relation between the participants in a legal communication situation, as well as the relation to the text itself and to the texts referred to within it. They may refer to one or more parties, times, places or even parts of them, and also to things such as statements, sentences or ideas. Since their function is to substitute for and refer to specific linguistic elements, the author of this paper prefers to call them compound pronoun structures (Balogh, 2020).
The structures formed by attaching prepositions to the words 'here' and 'there' are characteristic of legal language use. Therefore, examining their use in the non-translated English source language legal text and in the English (human and machine) translations of the Hungarian source language legal text could highlight relevant tendencies. Table 2 shows the most frequently occurring compound pronoun structures and their frequencies in the three English language corpora: PCC_EN and the human (CC_HU_EN_HT) and machine (CC_HU_EN_MT) translations of the Hungarian text. The Hungarian source language text is excluded from the comparison. In two of the texts, 'thereof' is the most frequently used structure. It occurs most often in the English (non-translated) source text (296 instances), followed by the human translation (37 instances) and the machine translation (20 instances) of the Hungarian source language text. Interestingly, 'hereto' occurs only once in the English source language reference text, compared to seven times in the human translation and not at all in the machine-translated text. As for the overall number of different compound pronoun structures in each text, the English (non-translated) source language text shows the greatest variety, with 19 different elements in total. In terms of variety, however, the two translated texts are identical, although their use of the elements differs. Both texts use 'thereof', 'therein', 'thereby', 'therefrom', 'therewith' and 'hereinafter', but 'hereof' appears only in the human translation and 'thereto' only in the machine translation.
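Since the three corpora differ considerably in size (Table 1), the raw counts of 'thereof' in Table 2 are easier to compare once normalised, for instance to occurrences per 10,000 words. The Python sketch below does this using only the figures reported in Tables 1 and 2; the 10,000-word base is an arbitrary choice. It yields roughly 7.57 occurrences per 10,000 words in PCC_EN, 5.58 in CC_HU_EN_HT and 2.71 in CC_HU_EN_MT, so the ordering of the raw counts is preserved after normalisation.

```python
# Figures reported in Table 2 ('thereof' counts) and Table 1 (corpus size in words)
THEREOF_COUNTS = {"PCC_EN": 296, "CC_HU_EN_HT": 37, "CC_HU_EN_MT": 20}
CORPUS_WORDS = {"PCC_EN": 391_257, "CC_HU_EN_HT": 66_326, "CC_HU_EN_MT": 73_872}


def per_10k(count: int, corpus_words: int, base: int = 10_000) -> float:
    """Normalised frequency: occurrences per `base` words of running text."""
    return count / corpus_words * base


if __name__ == "__main__":
    for corpus, count in THEREOF_COUNTS.items():
        rate = per_10k(count, CORPUS_WORDS[corpus])
        print(f"{corpus}: {rate:.2f} per 10,000 words")
```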
Next, the position of 'thereof' within sentences was analysed using the N-grams function of Sketch Engine. The results of this analysis are summarised in Table 3, which lists the ten most common two- to six-word combinations containing 'thereof'. The table shows that 'thereof' appears most often together with nouns such as 'part' and 'conviction' and their modifiers, as well as with the auxiliary 'shall'.
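The N-gram counts in Tables 3 to 8 come from Sketch Engine. As an illustration of the underlying idea, and not of Sketch Engine's own implementation or settings, the Python sketch below counts every sequence of two to six lower-cased word tokens that contains a given node word. The tokenisation is a simplifying assumption, and unlike a sentence-aware tool this sketch lets N-grams run across sentence boundaries; the function name and sample text are hypothetical.

```python
import re
from collections import Counter


def ngrams_containing(text: str, node: str, n_min: int = 2, n_max: int = 6) -> Counter:
    """Count all n-grams of n_min..n_max word tokens that contain node."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # assumed: runs of letters
    node = node.lower()
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if node in gram:
                counts[" ".join(gram)] += 1
    return counts


if __name__ == "__main__":
    # Invented sample text
    sample = ("The remaining part thereof shall be converted. "
              "The remaining part thereof shall be paid.")
    for gram, freq in ngrams_containing(sample, "thereof").most_common(5):
        print(freq, gram)
```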
Table 4 summarises the word combinations with the highest number of elements, i.e., six, together with their frequencies. It can be seen that the six-element N-grams including 'thereof' contain 'conviction' and 'shall' and the words accompanying them.
Table 5 lists the N-grams occurring in the human translation of the Hungarian Criminal Code (CC_HU_EN_HT). It shows that this corpus displays less variation in the frequency of word combinations: the two most frequent combinations ('facilitating thereof' and 'or facilitating thereof') occur six times, while even the tenth most frequent combination ('knowledge thereof') occurs four times. Interestingly, 'thereof' appears in combination with 'facilitating' (with or without a preceding 'or') and with 'gaining knowledge', together with their extensions. Table 6 contains the six-element word combinations in the human translation of the Hungarian Criminal Code. It shows that the most frequently used six-element combinations include 'impacts', 'becoming aware' and 'increasing'.
Next, the N-grams in the machine translation of the Hungarian Criminal Code were analysed; the results are summarised in Table 7. Based on the data in Table 7, 'part' and 'shall' are the most frequent elements in word combinations with 'thereof'. The syntactic embedding of these lexical items in the machine-translated text therefore shows more similarity to the original, non-translated English text (the Penal Code of California). In addition, the word 'respect' also appears. The six-element structures are summarised in Table 8 below.

Table 8: Six-element N-grams in the machine translation of the Hungarian Criminal Code (CC_HU_EN_MT)
N-gram                                         Frequency
the remaining part thereof shall be            4
part thereof shall be converted into           4
or the remaining part thereof shall            4
thereof shall be converted into imprisonment   4
remaining part thereof shall be converted      3
service or the remaining part thereof          3
a third person in respect thereof              3
unlawful advantage or a promise thereof        3

Table 8 shows that the linguistic units 'part' and 'shall' appear most frequently in word combinations with 'thereof', together with 'respect' and 'promise'.
In summary, the tables show that, of the two translations, the human English translation uses 'thereof' with a frequency that converges more closely with that of the original (non-translated) English Penal Code than the machine translation does. On the other hand, in terms of the syntactic embedding of 'thereof' and the frequency of the words in its most frequent word combinations, the machine translation clearly converges more with the original (non-translated) English text.
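One way of putting the comparison of syntactic embedding on a numerical footing, which is not used in this study and is offered here only as an illustration, is to compare the sets of words that co-occur with 'thereof' in each corpus, for example with the Jaccard index (the number of shared words divided by the number of all distinct words in the two sets). In the hypothetical Python sketch below, the sets contain only the collocates singled out in the discussion of Tables 3 to 8 above, not the full tables, so the resulting values (0.0 for the human translation and 0.4 for the machine translation, each measured against PCC_EN) illustrate the method rather than measure the corpora.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of distinct words common to both sets (0 = none, 1 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Collocates of 'thereof' as named in the prose on Tables 3 to 8 (partial, illustrative)
COLLOCATES = {
    "PCC_EN": {"part", "conviction", "shall"},
    "CC_HU_EN_HT": {"facilitating", "gaining", "knowledge",
                    "impacts", "becoming", "aware", "increasing"},
    "CC_HU_EN_MT": {"part", "shall", "respect", "promise"},
}

if __name__ == "__main__":
    for name in ("CC_HU_EN_HT", "CC_HU_EN_MT"):
        print(name, jaccard(COLLOCATES["PCC_EN"], COLLOCATES[name]))
```

With full N-gram tables, the same comparison could be run on all collocates, possibly weighted by frequency.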

Qualitative analysis of complex pronoun structures used in translations
The quantitative analysis was followed by a qualitative analysis of the human and machine translations of the Hungarian Criminal Code, with the aim of comparing the use of 'thereof' in extracts from the two translations. As the analysis above showed, 'thereof' appears 37 times in the human translation and 20 times in the machine translation. What the two translations have in common is that 'thereof' occurs in word combinations with 'part': in both, 'the remaining part thereof' renders the Hungarian annak hátralévő részét ('the remaining part thereof'). However, there were only four cases in which the use of 'thereof' overlapped in the human- and machine-translated texts. Examining the fragments of the Hungarian source text that correspond to 'thereof', it can be seen that in most cases 'thereof' renders inflected forms of the demonstrative pronouns 'ez' ('this') or 'az' ('that') ('ezek', 'azok', 'ennek', 'annak', 'ezt', 'azt', etc.).

[...] the public executive did not take the measures deemed necessary and justified within his power to prevent the criminal offense, or did not report the criminal offense promptly after gaining knowledge thereof.

[...] failed to take all necessary and reasonable measures within his or her power to prevent the commission of the offence or to report the offence promptly after becoming aware that it had been committed.
It can be observed that in the human translation 'thereof' mostly appears as the English counterpart of inflected forms of the Hungarian demonstrative pronouns, e.g., 'ez' and 'az' (Examples 2 and 3), or as a reference to a whole clause (Example 4). However, another tendency can also be

Conclusion
In terms of the frequency of 'thereof', the human translation corresponds more closely to the frequency patterns observed in the original English source language text. Nevertheless, the syntactic positioning of 'thereof' in word combinations and sentences in the machine-translated English text shows a more striking similarity to the original (non-translated) English text. This may confirm the gravitational effect observed in translated texts by Halverson (2003, 2010a, 2017), whereby lexical or grammatical items specific to a given register are more likely to be chosen by translators, so that their use may be over-represented or even redundant in translations. In the human translation, the translator tends to use 'thereof' as a characteristic element of legal language, although not in the same way as in the original English text. Further data and analysis are needed to confirm this claim. However, when defining a quality standard for post-editing legal texts, it is crucial to take into account that some elements characteristic of the legal register may be over-represented or even redundant in human translations. Therefore, machine-translated versions of such source language texts may correspond better to actual (non-translated) English usage.

Table 7 :
N-grams and their frequency in the English machine translation of the Hungarian Criminal Code (CC_HU_EN_MT)

Example 1 :
The use of 'thereof' in the human and machine translation extracts of the Hungarian

Table 1 :
General characteristics of the corpora

Table 2 :
The use of compound pronoun structures in the three English language corpora

Table 3 :
N-grams in the Penal Code of California (PCC_EN) and their frequency

Table 4 :
Six-element N-grams in the Penal Code of California (PCC_EN)

Table 5 :
N-grams and their frequency in the human English translation of the Hungarian Criminal Code (CC_HU_EN_HT)

Table 6 :
Six-element N-grams in the human translation of the Hungarian Criminal Code (CC_HU_EN_HT)

(20 elements in total). This is illustrated in [...]

'Thereof' also appears in the human English translation as the rendering of the possessive form of the noun 'melléklet' ('Annex') in the Hungarian source text (in six cases). This is illustrated in Example 3:

Example 3: The use of 'thereof' in the human and machine translations of Hungarian possessive forms of nouns ('melléklet')

Interestingly, 'thereof' also appears in the human translation as the counterpart of a Hungarian source language subordinate clause (four instances in total); see Example 4:

Example 4: The use of 'thereof' in the human and machine translations of Hungarian clauses