El primo libro de la Iliade de Omero, book I. As can be seen, the majority of our translations can be located in a very important period for the Italian language, as well as for Italy itself: the transitional phase that saw Italy become a united country, and Italian a standard, national language to be spoken throughout its territory. The texts of Bricchi, Ciani and Ferrari have been acquired by OCR exclusively for the purposes of this study.
In some cases, basic OCR error correction has been applied to the texts. After a brief history of Italian translations of Homer, where I give a chronological account of the principal Italian translations of the Homeric poems between the XV and the XXI centuries, I develop the two main parts of my work. In Part I, I explain the working principles of the textual aligner, after a summary of the state of the art in textual alignment in section 1.
In section 2. Some examples of the behavior of the aligner on different kinds of translations are given in section 2. Section 2. Part II is devoted to the analysis of Italian translations of Homer. Sections 3. To analyze Italian translations, I chose a set of Ancient Greek terms and a set of their Italian translations, and I studied the similarity of those terms in both the Ancient Greek and the Italian texts.
Section 4. To find their most diffused Italian counterparts, I used a method of automatic extraction, to which section 4. Chapter 5 shows the results of this analysis: section 5. Finally, sections 5. The textual aligner I built to handle those texts is being used in a research project conducted by Dr. Marianne Reboul of La Sorbonne, Paris, to create a Java interface that would allow users to scroll through hundreds of French translations of the Odyssey (a diachronic corpus that represents nearly every period of French Homeric translation), aligned both with each other and with the Greek source, to visualize them in parallel, and to view word-by-word equivalences among those translations.
The interface should be released next year. I wrote my original programs in Python. Although I am using an internal approach to my corpus, seeking to note internal differences between one text and another, the history and the context in which our translations were inserted are essential to any conclusion. I will only take into consideration translations, without including the re-writings, re-formulations and reconstructions of the Homeric epos. The earliest translations of Homer of which we have certain notice are Latin translations. The first known translations of the Odyssey seem to predate the first translations of the Iliad, if it is true that a Latin Odyssey was written by Livius Andronicus in the III century BC, while the earliest Latin versions of the Iliad are generally dated to the first half of the I century BC.
In any case, it is in the time of the first Latin translations of the Iliad, in the I century BC, that Roman poetry rapidly grew from a somewhat archaic and local stylistic dimension to the level of international poetry, an acknowledged successor of the Greek tradition. Polybius' transpositions were probably quite independent of the original, at least in terms of formal reproduction of its stylistic features, and had great success in imperial Rome.
Labeo Accio, instead, although keeping the constraint of the hexameter, tried to realize a very literal translation of the Homeric poem, which apparently went down in history as absolutely unreadable, as can happen to blindly transpositional translations. The author of the Ilias Latina, traditionally believed to be Silius Italicus, is actually unknown. It seems also that elements of later Roman mythology were inserted into the text. It is not certain whether the Ilias Latina was written expressly for the school, but it is clear that it was used as a school version of the poem since late antiquity. That the Ilias Latina passed from Antiquity to the Middle Ages with growing success, while the Homeric texts, as well as their complete Latin translations, were apparently lost to the whole of western Europe, is confirmed by many sources. Homer was known only by fame, while the success of minor writers like Ditti and Daretes grew larger and larger. It is with the Renaissance that the European tradition of Homeric translations begins again.
A young Angelo Poliziano also tried a four-book translation in Latin hexameters, which enjoyed a relative success in Italian academic studies as well. In general, on the knowledge Dante could have had of Greek literature, see Ziolkowski; Gussano, Il primo libro de la Iliade, vv. Unlike Spain, Italy seems to appreciate the Iliad from the beginning, and its translations are, in the late XVI century, relatively numerous: among others, we still have the Iliadic versions of La Badessa, Groto, Nevizano and Leo da Piperno. They change and interpolate lines or even stanzas, and they modify at will not only the style, but even the secondary facts of the story.
Similar adaptations are not only a matter of a different translation theory or philosophy: some characteristics of the Homeric style are widely blamed by the critics. The Odyssey, although less represented in this period than the Iliad, is translated by Ludovico Dolce and, apparently, by Francesco Aretino, although no copy of his work seems to have survived. In Gerolamo Bacelli edited the first complete translation of the Odyssey into Italian, and later also attempted a translation of the Iliad, interrupted at book VII.
In the last part of the XVI century, Chapman starts working on the first English translation of Homer and chooses a fourteen-syllable verse to represent the Greek hexameter. In he edits the first books of his translation of Homer in English. His version has immediate success in Great Britain. Although it is the first, it is also one of the best known and most appreciated English Odysseys.
In Thomas Hobbes produces a new Iliad in English. The XVIII century saw new translations across Europe, while translators seem to reflect more than their predecessors on the issue of faithfulness to the original. In Dryden tries to translate some parts of the Iliad, but in general his results are poorly judged. Appreciated German versions also begin to appear: Bodmer and Burger are some of the most notable authors. While the so-called quarrel of the Ancients and the Moderns is alive and active among scholars, criticisms of the Homeric style are harsh.
But we must also acknowledge that, while the idea of adapting Homer to the stylistic features and requirements of the time - and, if necessary, of radically changing his text - is visible in many preceding translations, here, at least, the operation is programmatically declared and explained. In the following years, Italian men of letters such as Maffei, Barbi, Egizio, Lami and Rezzonico produce partial versions of Homer in hendecasyllables.
Ancient scholars themselves discussed the problem for a long time. The last author produced two versions of the poem, one in standard Italian and one in the Venetian language. 'The deadly wrath of Achilles: To Greece the source of many woes!' In other words, the first line of the Iliad is here divided into three separate lines. But it is in that the most successful German translator of Homer, Heinrich Voss, publishes the Odyssey in German, twelve years before his Iliad. Discussions over the aesthetic and translational qualities of this work began immediately after its publication.
In Europe, generally, the epoch of prose translations had begun. The fortune of the Odyssey in this period is not only Italian or French. In England, Butler edits a translation of the Odyssey. Between and the Odyssey is particularly appreciated - the s register the highest number of Italian translations until now (De Caprio). The majority of Italian Homeric translations of this period (between and circa) is characterized by a declared faithfulness, a tendency to privilege the economy of the text and, sometimes, a slight imitation of the spoken language also for the narrator.
Translators, and not only Italian translators, are highly concerned with the issue of reproducing the style of the original text. Elements generally appreciated by older commentators as well, such as the Homeric similes, are greatly exalted and considered fascinating insights into daily life in the Dark Ages, as well as poetic masterpieces. This could be partly due to the diffusion of the oral theory about the origins of the poems, which induced many scholars to apply different standards when analysing Homer - not a writer anymore, but the name of the voice of an entire people. But the vitality of Homeric reductions and reproductions, and the very number of literal translations produced in the XX century, seem to suggest that the aesthetic appreciation is sincere.
In the last seventy years, Homer has been the classical author with the highest number of new integral translations in Italy - 22 between the Iliad and the Odyssey (De Caprio). Naturally, an essential text on this theme is Graziosi, Homer in the XX century; see also Mahler. Most importantly, the majority of translations in verse have abandoned the idea of reproducing the meter, rhythm and musicality of Homeric verse, becoming in fact another kind of translation in prose.
Naturally, this too is a precise aesthetic choice and not necessarily a sign of increased faithfulness to the features of the original text: while those translations are more literal in their contents, they lose the metrical and musical side that is so relevant in the original texts. New enthusiasm toward the classical epics is perceivable also in editorial statistics: between and the number of publishers that have promoted new Italian translations of the classical epos (Iliad, Odyssey and Aeneid) is the highest in the last fifty years (De Caprio). In the last 15 years, new translations of the Homeric poems have continued to appear (Ventre, Marinari and Mirto are some of the translators), and the versions of the second half of the XX century have often been re-printed and sometimes turned into ebooks.
As someone said, translations get old much faster than originals. Articles and theses keep being published on the subject, together with new suggestions for the problem of word alignment, comparative studies between different aligners, and overviews of the field. Since segment alignment is widely considered a necessary step toward any kind of word alignment attempt, a good textual alignment is regarded as a very important pre-requisite for many studies on automatic word translation.
In my case, the main problem was the necessity to align long, non-segmented texts with translations that are often noisy, literary and unfaithful. Very popular alignment tools, such as the ParaConc aligner (Barlow), worked poorly with such texts, and usually lost track of the process within the first 6 segments. Although the specific object of this work was the Homeric poems, I tried to keep the alignment algorithm as text-independent as possible, so as to enable it, under certain constraints, to align texts by very different authors with no need for external databases, manually made lists, tagged translations or other time-honoured devices.
Although sophisticated techniques exist to create lexica and thesauri from corpora, and it would be interesting to measure the performance of similar systems over our translations, I preferred to limit myself to simpler approaches. The task of aligning bilingual textual blocks has been considered a fundamental stepping stone for machine translation improvements for at least three decades.
Naturally, a form of cross-lingual alignment has always been necessary for any attempt to build machine translation systems, but the perspectives with which these alignments have been made have changed over time. Works on machine translation from more than thirty years ago often talk about the building of abstract representations (see Bisson and Fluhr; Och and Ney; Koehn et al.).
The related field of sentence translation is also intensely studied (Stanojevic; Kundu et al.). This approach is now criticised by many linguists. Consequently, a real interest in text alignment rose in the early nineties, in parallel with the new success of heavily statistical and corpus-based machine translation. In any case, it was a relatively slow process, and still in , approaches to Machine Translation were more rule- or principle-based than reliant on statistics - see for example Berwick et al. The first important works on sentence alignment are generally considered to be - after the pioneering effort by Brown in - Brown and Gale and Church. Brown in his work suggested aligning sentences composed of a similar number of words.
This idea inspired the latter article, Gale and Church, which based its alignment heuristics on the principle that original and translated sentences will both be of similar length, but considering the number of characters rather than words. The length heuristic is still widely used for a number of alignment purposes. Naturally, were I to use this approach, the problematic point for the task I set myself would be the need for a very literal translation, a condition hard to satisfy in the Homeric tradition.
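To make the length heuristic concrete, here is a minimal Python sketch of a character-length penalty in the spirit of Gale and Church; the constants C and S2 are illustrative placeholders for this sketch, not their published estimates.

```python
import math

# Illustrative parameters: C is the expected number of target characters
# per source character, S2 the variance term. Both values are
# assumptions for this sketch, not the published estimates.
C, S2 = 1.0, 6.8

def length_penalty(len_src, len_tgt):
    """Penalty for pairing two sentences: small when the target length
    is close to C times the source length, large otherwise."""
    mean = (len_src + len_tgt / C) / 2.0
    if mean == 0:
        return 0.0
    return abs(len_tgt - C * len_src) / math.sqrt(S2 * mean)

# A proportionate pair should receive a lower penalty than a wild one.
close_pair = length_penalty(100, 104)
wild_pair = length_penalty(100, 300)
```

With a free or interpolating translation, of course, the "wild" case becomes the norm, which is exactly why this heuristic fails on our corpus.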
The necessity for either clean or literal corpora, or both, is common to many alignment techniques, as Singh and Husain point out. Furthermore, Gale and Church's first algorithm is designed to work on pre-aligned paragraphs. However, Church himself, in , shows how the methods based on sentence length can only work well for clean texts, and not for noisy documents or OCR outputs.
So he presents another approach, based on finding cognates in the text. He argues that this method can be useful for any language written in the Roman alphabet, since texts usually contain a high number of names and dates - elements that maintain a degree of similarity across languages - and he goes further, arguing that it could even be used for texts written in non-Roman alphabets, provided such texts had a reasonable distribution of numbers and names in Latin characters.
At first, the only two aligned sentences are the first and the last.
The idea on which it is based is to assign each term a vector whose components are given by the number of words between each occurrence of the term in question. So if a term appears as the first word of a text and is then repeated again after 90 words, its vector will be (0, 90). Fung and McKeown argue that in parallel corpora - noisy, linguistically unrelated parallel corpora too - many words will have vectors similar to those of their translations.
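The construction of such a vector can be sketched in a few lines of Python (a simplified version of the technique, assuming tokenized input):

```python
def recency_vector(tokens, term):
    """Vector whose components are the distances (in tokens) between
    successive occurrences of `term`; the first component is the
    absolute position of the first occurrence."""
    vector, last = [], 0
    for position, token in enumerate(tokens):
        if token == term:
            vector.append(position - last)
            last = position
    return vector

# The example from the text: a term appearing as the first word of a
# text and again 90 words later yields the vector [0, 90].
toy = ["ira"] + ["x"] * 89 + ["ira"]
```

Comparing such vectors across languages is then a matter of choosing a distance measure; the point for us is that a translator who replaces repetitions with synonyms distorts these vectors badly.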
This approach was inspired by a technique used in signal processing, and represents one of those cases where speech processing methods are used to fulfill text-based computational linguistics tasks. This language-independent approach is very interesting for my purposes, but it is based on the assumption that the translations used are relatively faithful.
While it does find paired words in fairly noisy texts too, on the other hand, if the translator varies the recurrence of a single term too much, using synonyms or ellipses - if he wants to create a translation with a different style - this kind of distribution vector soon becomes unreliable. In an almost contemporary article, Fung and Church propose a method fairly similar to a technique we will use later in this work to find single-word translations. This method is based on counting how many times a word recurs in a corpus split into N parts, and then finding, in the translation, which must also be split into N parts, the words with the most similar recurrence.
Naturally, this system is based on the hypothesis that the two corpora (original and translation), once split into N parts, will give roughly similar blocks with roughly similar contents, which is not always the case with our translations. In the cited article (Coleman), the authors first use a good Part of Speech tagger to filter potential anchor words and then big, reliable online dictionaries to create the actual pairs.
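The split-and-count scheme described above (splitting both corpora into N parts and comparing recurrence) can be sketched as follows; the function names and the crude overlap measure are my own illustrations, not Fung and Church's exact formulation.

```python
def part_counts(tokens, term, n_parts):
    """Split the token list into n_parts roughly equal slices and
    count the occurrences of `term` in each slice."""
    size = max(1, -(-len(tokens) // n_parts))  # ceiling division
    counts = [0] * n_parts
    for position, token in enumerate(tokens):
        if token == term:
            counts[min(position // size, n_parts - 1)] += 1
    return counts

def vector_overlap(a, b):
    """Crude similarity: in how many slices do both words occur?"""
    return sum(1 for x, y in zip(a, b) if x > 0 and y > 0)

source = ["menis", "a", "b", "menis", "c", "d"]
profile = part_counts(source, "menis", 3)
```

Two words that are translations of each other should then show high overlap between their count vectors, provided the two corpora really split into comparable blocks.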
Chen bases his deductions on a statistical pre-built translation model, while Melamed uses a sophisticated token-matching system through pattern recognition and segment boundary information. Bilingual text alignment can be the basis of a vast number of multilingual applications, such as bilingual lexicography (Langlais et al.).
Using anchor words is an efficient and popular heuristic. While it is sometimes considered a conservative approach, it is still frequently used for a variety of tasks: for example text segmentation, or sentence alignment, even in already paragraph-aligned corpora, as in Xu et al. A relatively recent paper that shares some similarities with my work - first of all in the use of anchor words for aligning corpora - is Feng and Manmatha, although its main purpose, the identification of OCR errors, makes its main heuristics and practices of little use to my task.
Yet using anchor words also brings some limitations, such as the necessity to provide anchor words from external databases or manually constructed lists, or the fact that different translators use different words to translate the same original. As I will show in further detail, the choice of the right names as anchor words is one of the best ways to overcome the first problem. To overcome the second problem, which proved harder to solve, I tried to automatically build an anchor-word dictionary.
Moreover, they would have risked becoming computationally heavy when analyzing large texts, and we neither wanted nor could use external thesauri and wordnets in the alignment phase, because we hoped to keep the procedure independent from external tools and because we did not have, at that stage, a complete Ancient Greek WordNet or similar resources at our disposal. Naturally, many procedures established in machine learning and machine translation assume either the use of at least one rich language as pivot - a rich language being one with a great number of ready-made NLP tools, a category into which neither Homeric Greek nor XVIII century Italian falls - or the presence of vast amounts of data and large corpora.
The problem of aligning multiple translations to their common original has also been widely studied in the field of automatic paraphrase generation. Barzilay and McKeown extracted paraphrases from sentence-aligned translations using the Gale and Church heuristic, while Pang et al. Sometimes the issue has also been confronted in the field of Word Sense Disambiguation, where parallel data are used to disambiguate polysemous words in a language with the aid of its translations.
Gale et al. aligned the Canadian Hansards, an English-French parallel corpus, sentence by sentence, and tagged the sentences where a given English polysemous word was translated with different French words: for example, they tagged differently the sentences where duty was translated with devoir and the sentences where duty was translated with droit.
After that, they trained a Naive Bayes classifier to automatically disambiguate English word senses depending on their context. Naturally, this approach is not very robust when faced with data sparsity, and can work only for words frequent enough to provide reasonable context distinctions for their different senses (Charniak). In any case, there are also several cases of the use of HMMs and the Viterbi algorithm for textual alignment. To segment those texts and their different translations, I chose to use proper names as anchor words. There is a variety of elements that can be used as anchors in a text: depending on its nature, the best anchors could be high-frequency words, when the original and its translation are very similar; or low-frequency words, such as technical terms, if we are sure they will always be translated in the same way or within a very reduced number of variants; or even numbers, which are always translated in the same way independently of the kind of translation.
But in the case of Homeric translations, these heuristics are not reliable. We can find very different and free translations of Homer, with every sort of periphrasis, interpolation and stylistic compromise. First of all, we must keep in mind that the original has a verse meter, and that many translations (even the majority of surviving Italian translations of Homer) also use a verse meter, and sometimes even a rhyme scheme.
Such constraints are: (1) very rare in machine translation, textual alignment experiments, paraphrase generation studies and parallel corpora; and (2) inclined to heavily distort the original text, producing translations that are, for a machine, very unreliable.
But Homeric translations can vary for several other reasons. For example, many translators found repetition a negative stylistic feature and consistently used synonyms or periphrases where the original text simply had the same word repeated twice: so high-frequency nouns or verbs could be unreliable. Other translators could apply the inverse tendency, making low-frequency words unreliable.
Different translations in different epochs do have very different styles. For the same reasons, the choice of technical terms or similar anchors is infeasible. Even though Homeric texts present low-frequency groups of names that occur in almost every book - for example, nearly every book presents two or three names of birds, which would be ideal as members of an anchor group - translators can use a very specific, very general or very 'cultural' synonym for a common word, or resort to every kind of periphrasis.
Numbers themselves are sometimes changed. Furthermore, they are not frequent enough to enable a good segmentation. Proper names, instead, are a relatively stable feature. They do not change too much across different translations, and they do not get obliterated or redoubled too often. For similar reasons, shared proper names are also used for detecting semantic similarity over short passages. Moreover, they are extremely frequent and well-distributed elements in Homeric texts, as can be seen in Figure 1. Distributionally, Homeric names are never so rare as to form excessively large chunks.
Figure 1: Distribution of names in the first tokens of the Iliad; the x-axis shows all the tokens, while the y-axis shows the count of proper names. As can be seen, portions of text without proper names are extremely brief. Right within the first tokens, the first book begins to show its relevant abundance of proper names, and a similar pattern holds for all the books, to the very last: in fact, the last line of each poem includes a proper name.
On average, the Iliad presents a proper name roughly every 14 words. The importance epics assign to onomastics, which allows one to extract fully populated social networks from epic texts, gives us a reliable set of anchors to break up the corpus; the observance of this aspect in almost every translation I studied allows us to find those anchors in the target texts.
In fact, even the most free or facetious translations I found tend to maintain the proper names given by the original text, however rare and irrelevant they may be. That is why this choice too needs to be handled in a number of steps, as will be shown in the next chapter in general, and in the following chapters in detail. First, two texts are uploaded. They could be the Ancient Greek original text and an Italian translation, two different Italian translations or, as we will see, translations into some other languages.
If one of the texts is in Ancient Greek, some phonetic transformations are carried out over its potential anchor words so as to make them more similar to the translated ones. If both texts are translations, there is naturally no need to use a set of transformation functions on them. The two texts, which I will call the 'original text' and the 'translation', are divided into tokens, sentences, and tokenized sentences, considering punctuation marks as independent tokens.
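This tokenization step can be sketched with a single regular expression (a simplification of the actual preprocessing):

```python
import re

def tokenize(text):
    """Split a text into tokens, treating punctuation marks as
    independent tokens, as described above."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("Cantami, o diva, del Pelide Achille l'ira.")
```

Note that the apostrophe of elided Italian articles ("l'ira") is also emitted as its own token under this sketch; the real preprocessing may treat such cases differently.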
During preprocessing, a frequency list of both texts is built. This frequency list will be used to assign better similarity scores between blocks.
Two lists are then created, one for each text, containing all the words that could be proper names. A potential proper name is any word that starts with an upper case letter and never appears in the whole text with its first letter in lower case. For example, the Italian uppercase article 'Il' is not a potential proper name, since its lower case form 'il' also appears in the text. It is important to underline that our approach to proper names here is purely functional. This filters out many false names, even if some noise will always remain, as we will see further on.
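This heuristic translates almost directly into Python; a minimal sketch:

```python
def proper_name_candidates(tokens):
    """A word is a potential proper name if it starts with an upper
    case letter and never appears in the text with a lower case
    initial (so 'Il' is filtered out when 'il' also occurs)."""
    lowercase_forms = {t for t in tokens if t[:1].islower()}
    return sorted({t for t in tokens
                   if t[:1].isupper() and t.lower() not in lowercase_forms})

sample = "Il cantore canta l ira di Achille e il cantore ascolta".split()
candidates = proper_name_candidates(sample)
```

As the text notes, the approach is purely functional: sentence-initial words that never recur in lower case will slip through as noise, to be filtered out later.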
The proper name candidates taken from the original text are then transformed. A second function cleans the names of betacode 'noise' due to diacritical signs such as accents and breathing marks. This is not real noise, naturally - we could wonder whether anything in writing systems can ever be considered mere noise (Sproat) - but it constitutes information we do not need, and it would create trouble for the matching function.
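A minimal sketch of this cleaning step, assuming the standard betacode punctuation for diacritics (the exact character set handled by the real function may differ):

```python
import re

# Betacode renders diacritics with ASCII punctuation: ) and ( for
# breathings, / \ = for accents, | for iota subscript, + for
# diaeresis, * for capitalization. Stripping them leaves bare letters.
def strip_betacode_diacritics(name):
    return re.sub(r"[()\\/=|+*]", "", name).lower()
```

For example, a betacode form like `*)AXILLEU/S` reduces to the bare string `axilleus`, which a similarity function can handle far more easily.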
It transforms Greek names following a set of strong phonological rules that are at work in the transition from Greek to Italian names, and also in several common nouns. This function will be explained in detail in Chapter 3. This dictionary is automatically generated by matching similar words from the two lists of potential proper names.
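A toy version of such a converter, with a handful of illustrative correspondences (these few rules are assumptions for the sketch; the actual rule set of Chapter 3 is larger and more carefully ordered):

```python
# A few illustrative Greek-to-Italian correspondences, applied in
# order: aspirates simplify (th -> t, ph -> f, ch -> c), kappa is
# romanized as c, upsilon as i.
RULES = [("th", "t"), ("ph", "f"), ("ch", "c"),
         ("k", "c"), ("y", "i")]

def phonological_convert(name):
    for greek, italian in RULES:
        name = name.replace(greek, italian)
    if name.endswith("os"):          # -os endings typically become -o
        name = name[:-2] + "o"
    return name
```

The output is not always a perfect Italian form, but, as noted below, it only needs to be close enough for a similarity algorithm to pair it with the translated name.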
Greek names, transformed and converted, will be linked much more easily to their correct equivalents. In any case, individual pairs can be entered manually to study their efficiency in aligning sentences. We can finally apply the main alignment algorithm, an implementation of the Needleman-Wunsch algorithm. Segment frequency is also taken into account. While it is rare that a full segment appears more than once in a text, it is surely a useful characteristic to consider: the similarity score of a segment and its potential translations is augmented if they have similar frequencies - in other words, Greek segments with frequency 2 are more likely to match Italian segments with frequency 2.
This is especially useful in the case of very brief segments, which could appear more than once in a text.
Additional features could naturally be added in a later phase of the work. When this is done, I have a list of aligned chunks, with gaps wherever the Needleman-Wunsch algorithm found no possible alignment. In other words, every time an original or translated block is found without an aligned companion, it is merged with the preceding block. This creates larger blocks - always of a contained size anyway - but makes us more confident about the translation equivalence. We are not chunking our texts only with the words actually matching in the first dictionary.
If we did, we could not run this post-processing check with a lower-threshold matching function. Textual blocks can eventually be refined by recovering the beginning and end of their opening and closing sentences, so as to have blocks that start and end with complete sentences, although this procedure requires some precaution.
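The merging of companion-less blocks described above can be sketched as follows (block contents are token lists; None marks a gap left by the aligner; the data layout is an assumption of the sketch):

```python
def merge_gapped(aligned_pairs):
    """Each pair is (source_block, target_block); a None on either
    side marks a gap left by Needleman-Wunsch. Gapped blocks are
    merged into the preceding pair, producing larger, safer blocks."""
    merged = []
    for src, tgt in aligned_pairs:
        if (src is None or tgt is None) and merged:
            prev_src, prev_tgt = merged[-1]
            merged[-1] = (prev_src + (src or []), prev_tgt + (tgt or []))
        else:
            merged.append((src or [], tgt or []))
    return merged

pairs = [(["a"], ["x"]), (["b"], None), (None, ["y"]), (["c"], ["z"])]
```

Here the two gapped blocks are absorbed into the first pair, so the output contains two larger pairs instead of four uncertain ones.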
In the following chapters I give a more detailed analysis of the phases of this process. There are four main steps in preprocessing: 1. Segmentation of the texts and extraction of possible names. 2. Conversion of Greek names into the Latin alphabet (betacode). 3. Transformation of the Greek names, already in the Latin alphabet, into elements more similar to their translated equivalents, through a so-called 'phonological converter'. 4. Creation of a dictionary of anchor words. Segmentation of the texts is a plain task: the Greek and Italian corpora are each divided into textual blocks.
The choice of the segmentation heuristic, though, created some problems. I tried two ways: the division of the text into sentences, and the division of the text into chunks that start with a proper name and end before the following proper name, so that the number of textual blocks will be the number of names the text contains, plus 1. The main problem with the sentence model is naturally the constant cross-lingual shift of sentence boundaries: no translator keeps the original punctuation unchanged, and matching sentences that correspond perfectly to their original - sentences that carry the same information as the original, no more, no less - are hard to find.
This problem is present in every parallel corpus, but it is especially important in literary translations and could cause a high number of errors. The second method overcomes this difficulty. I considered as potential proper names in the Italian texts any word starting with a capital letter that never appears in lower case throughout the text. As I will show, the remaining false proper names will be filtered out while building the anchor-word dictionary.
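The name-based chunking can be sketched as follows (a simplified version assuming tokenized input and a precomputed name set):

```python
def chunk_by_names(tokens, names):
    """Cut the token stream before every proper-name occurrence, so
    that each chunk (except possibly the first) starts with a name.
    The number of chunks is the number of name occurrences plus one."""
    names = set(names)
    chunks = [[]]
    for token in tokens:
        if token in names:
            chunks.append([])
        chunks[-1].append(token)
    return chunks

verse = "canta o dea l ira di Achille figlio di Peleo".split()
blocks = chunk_by_names(verse, {"Achille", "Peleo"})
```

Each resulting block is then anchored by the name that opens it, which is what the aligner will try to match across the two texts.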
This allowed us to handle small blocks of text (generally between 30 and 80 words, rarely more than words, and sometimes made up of only one word) whose content kept a higher similarity probability with the original chunk than we could ever expect from sentences. It is important to underline that, although we judged the proper-name heuristic the best choice for Homeric alignments, it is not the only possible application of the alignment functions.
Depending on the translation, blocks from the same original book will differ in size and number: translations can take different approaches to proper names too, depending on stylistic or metric constraints - for example, translators who chose a verse meter could have a stronger tendency to skip or interpolate proper names, which can be hard to handle in some prosodic patterns.
Furthermore, some translators tend to avoid name repetition through pronouns and various ellipses, while older translations especially could contain more occurrences of some names than the original. Recent translations in general seem to be more reliable in this perspective, since over roughly the last century the idea seems to have prevailed that translations should privilege the economy of the Greek text over the freedom of a stylized epic translation. The problem at this stage is that most Greek names in betacode are not easily recognized by a similarity matching function or, if recognized, are not easily separated from 'false matches' - common words or other names.
These rules are the main phonological laws found in action from Ancient Greek to Italian - that is, the regular transformations that led to the Italian equivalents of Ancient Greek names. In the last two centuries phonology has made impressive advances and has come to define with good approximation, among many other things, a number of rules that work in the evolution of words from one language to another. My function is comparable to many functions built in machine translation to handle unknown and misspelled words, but it adopts historical phonetic rules.
The phonological converter does not always transform an Ancient Greek name into a perfect Italian translation - there is a good deal of stochastic change in the life of words - but it makes them much easier to match for a similarity algorithm. Every Italian translator has to make a decision about these points before starting his translation. The romanization of proper names can be a precise choice of the translator: it can depend on a number of reasons, and can be read in different ways. The frequency of names through the books (the only aligned units at this stage) could be used to infer their identities.
But at the current state of the art I was not able to provide a really efficient way to operate in this direction: this system works only for a group of names that are neither too rare nor too frequent, since Homeric texts are full of names that occur only once in one book, and, for high-frequency names, the mere count of occurrences per book is not enough: translators can change those parameters too easily. In this way a dictionary is created that, for every name in the original name set, returns all of the translated names that were recognized as similar.
As I wrote in the preceding chapter, the similarity threshold in this phase is normally not too low (normally 0. ). During my alignments, I kept using the phonological converter as described here. If needed, naturally, it is possible to plug in an external database or noun network of the Iliad and Odyssey to improve the performance.
This is an algorithm created in 1970, among the many alignment algorithms designed in those decades, and was originally employed in bioinformatics to align nucleotide chains, protein chains and similar sequences. It works by building a grid from any two strings. For each element in the first string (for example, for each letter) it assigns a matching-probability value to every element of the second string, based on a given similarity score and on the matches already made. To explain and visualize this process, it may be useful to expand it in a small example. The Needleman-Wunsch algorithm applies the following technique to reach the right conclusion and produce the best result.
Every value in the matrix is initialized as 1 if it represents a match between two letters (for example A and A in the first position, at the top left) and 0 otherwise. This is a very simple way to proceed that forms the basis of many alignment algorithms. In simple cases, it is sufficient to find the best alignment. In our example, too, the best alignment can be found in this way. One of the biggest problems of such an approach is that it quickly becomes computationally unaffordable.
In fact, if we have to generate all possible alignments between two sequences, assign each alignment a similarity score and then select the best pair, the number of required alignments between two sequences of length N grows combinatorially, on the order of C(2N, N) ≈ 2^(2N)/√(πN); in other words, even two sequences of modest length would require an astronomical number of comparisons. This is where the Needleman-Wunsch algorithm becomes useful.
This algorithm avoids calculating every possible match between the two sequences. It starts by assigning values to each position in the matrix, beginning from the top row and moving from left to right, and it gives each position the highest value among the values of the immediately adjacent positions: the northwest diagonal, the one above, and the one to the left. Gaps can be desirable or undesirable depending on the kind of alignment we want; if we set a gap penalty of 0, gaps are not penalized.
Thus, the choice will be between -1, -1 and 1, and the diagonal value of 1 will be assigned to the cell. Note here that the positive value in C(1,1) is neutralized by the gap penalty, which discourages non-diagonal paths. So C(1,2) will have a lower value than C(1,1), namely 0. If we opted for a gap penalty of 0, C(1,1) and C(1,2) would have the same value. Applying this heuristic to the whole matrix, we obtain the following values:

      A    B    C    D    E
 A    1    0   -1   -2   -3
 A    1    1    0   -1   -2
 B   -1    1    1    0   -1
 D   -2    0    2    1    0
 F   -3   -1    1    2    1
 E   -4   -2    0    1    3

Now, the alignment is traced as follows: we start from the last cell at the bottom right and move toward the top-left cell according to the value we find in each cell, following the cells with the highest value.
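The procedure above can be sketched with a standard Needleman-Wunsch implementation (match +1, mismatch -1, gap -1). Note that the worked example uses a slightly different pedagogical initialization, so individual cell values may differ, but the fill-and-traceback logic is the same:

```python
# Compact Needleman-Wunsch with the standard scoring scheme
# (match +1, mismatch -1, gap -1), returning score and aligned strings.
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    n, m = len(a), len(b)
    # DP matrix with an extra border row/column for leading gaps
    M = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        M[i][0] = i * gap
    for j in range(1, m + 1):
        M[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1] + s,  # diagonal: (mis)match
                          M[i - 1][j] + gap,    # up: gap in b
                          M[i][j - 1] + gap)    # left: gap in a
    # traceback from the bottom-right corner
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        s = match if i and j and a[i - 1] == b[j - 1] else mismatch
        if i and j and M[i][j] == M[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i and M[i][j] == M[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return M[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, aligning "AAB" with "AB" yields score 1 and the pair "AAB" / "-AB": the extra A is absorbed by a gap instead of forcing a mismatch.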
A similar consideration also allows us to include the cell (1,2) in the diagonal instead of the cell (1,1). The gap penalty has also produced the F - C alignment. Global alignment is used when we want to align sequences of roughly the same length and are interested in reducing the number of gaps. If instead we wanted to align a single part of a translation to a full original text, we would use a local alignment heuristic.
The program takes as input an Ancient Greek text and an Italian text divided into tokenized blocks and assigns to every possible Greek-Italian pair of blocks a similarity score. For this task, it uses a similarity score function, which applies a similarity metric to each compared pair. This is in some ways the most sensitive part of the system, since it is the function that decides whether two elements have a good probability of matching.
The function that attributes a similarity score determines the success of the rest of the operation. In our case, since I was using non-annotated corpora, I kept the parameters very simple: the similarity is calculated through the automatically generated dictionary and some other heuristics. Specifically, the similarity between two blocks is based upon: 1. The presence in the two blocks of a Greek-Italian pair present in the dictionary (the number of matching anchor words): the rarer the recognized words, the higher the probability of a good match and the higher the similarity score.
2. The length similarity of the two blocks: the difference in length between the blocks is inversely proportional to their similarity score. 3. The frequency of the two blocks. Fattah suggests additionally taking punctuation marks into consideration, but I found that this technique seemed to worsen my results globally, probably due to the particular unfaithfulness of literary translations. Anchor word frequency is also considered in Chen. This similarity metric satisfies three obvious but fundamental conditions: 1. The similarity between i and j, s(i,j), is always higher than or equal to 0; 2.
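A minimal sketch of a block-scoring function in this spirit, combining anchor matches weighted by rarity with a length-ratio term; the weighting formula and parameter names here are hypothetical, not the author's actual choices:

```python
# Illustrative block-similarity score under simplified assumptions:
# dictionary anchor matches, weighted by rarity, times a length term.
import math

def block_similarity(greek_block, italian_block, dictionary, name_freq):
    """greek_block / italian_block: lists of tokens;
    dictionary: Greek name -> set of Italian equivalents;
    name_freq: Greek name -> corpus frequency (for rarity weighting)."""
    score = 0.0
    italian = set(italian_block)
    for name in greek_block:
        if name in dictionary and dictionary[name] & italian:
            # rarer anchors weigh more (hypothetical weighting)
            score += 1.0 / math.log(2 + name_freq.get(name, 1))
    # length term: identical lengths give 1, diverging lengths decay
    ratio = (min(len(greek_block), len(italian_block))
             / max(len(greek_block), len(italian_block)))
    return score * ratio
```

Since both terms are non-negative, the score is always greater than or equal to 0, in accordance with the first condition of the metric.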
Personally, I think they would create a good deal of noise, since these parameters are stylistically important and are probably modified by different translators. The scoring function seems to work quite well. Low-frequency proper names are very rarely doubled by translators, nor are they often suppressed, so they constitute a small set of reliable anchors. Unaligned strings are merged with the previous ones. This is done again with the dictionary, but the matching threshold is far lower.
Blocks that do not pass this matching test are merged with the previous one. This is not a necessary step, naturally. Removing gaps and merging blocks could seem a dangerous operation, in the sense that overly large segments are generated. This naturally depends on the size we need, but normally the number of blocks created with the proper names heuristic is sufficiently high, for epic texts, to leave an acceptable number of parts.
The general pseudocode of the second-step process. In any case, very large blocks can sometimes be formed. For example, in some passages the Homeric epics have long descriptions, which generally result in ample stretches without a proper name. In those cases, the last part of the pre-processing can apply some different heuristics. Yet, if the two added parts are too different in length, something could be wrong and the smaller part has to keep searching for the next punctuation mark.
Blocks can also be expanded, for example by merging blocks judged too small with the previous ones. In the third part, blocks can also be treated to reduce their dimension, if it is judged too large. If their lengths are not too dissimilar and the pair is too large - for example, if both blocks of the pair exceed a certain number of characters - they get split. The blocks can be divided in half, and each half completed as described for the normal blocks.
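These merge and split heuristics can be sketched as follows; the length thresholds and the punctuation set are hypothetical placeholders, since the original values are not specified here:

```python
# Hypothetical sketch of the post-processing merge/split heuristics:
# blocks below a minimum size are merged into the previous block, and
# oversized blocks are split at the punctuation mark nearest the middle.
def merge_small_blocks(blocks, min_len=200):
    merged = []
    for block in blocks:
        if merged and len(block) < min_len:
            merged[-1] += " " + block  # absorb into the previous block
        else:
            merged.append(block)
    return merged

def split_large_block(block, max_len=1000, marks=".;:"):
    if len(block) <= max_len:
        return [block]
    mid = len(block) // 2
    cuts = [i for i, ch in enumerate(block) if ch in marks]
    if not cuts:
        return [block]  # no safe cut point: leave the block intact
    # cut at the punctuation mark closest to the midpoint
    cut = min(cuts, key=lambda i: abs(i - mid)) + 1
    return [block[:cut].strip(), block[cut:].strip()]
```

Splitting at punctuation near the midpoint keeps the two halves of roughly equal length while avoiding cuts in the middle of a sentence.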
This simple heuristic shows reasonable efficacy. All of this can be done, with some precaution, without misaligning the texts (Chen, Fundamentals of Noise Reduction). There are many kinds of translation to work with. First, we have to deal with texts that come from OCR processing and can contain a very high number of errors, misspellings, and other flaws. But even without considering this issue, it is well known that translations can vary surprisingly.
Older translations, furthermore, can even have added or removed parts of the text: entire phrases created by the translator, or removed from the original. Aligning literary translations is thus a very challenging task. Epics of the dimension and importance of the Homeric works naturally generate the most diverse interpretations. The range of themes and characters touched upon and their role as cultural pillars favored the most varied and even bizarre inflections.
Nonetheless, the permanence of a single original text allows us to study those variations in relation to a fixed point. As we will see in the analysis section, distributional semantics could, among other things, show us how words change their relations and meanings (technically, their cosine similarity) across different translations. The problem would be the interpretation of those variations. As we wrote in the introductory part, translators of Homer - and not only of Homer - have always dealt with the problem of faithfulness to the original and adaptation to contemporary standards.
All of them had to concede something to the changing aesthetics of their time; some of them strongly preferred to adapt to it rather than to respect the original. Or, we can try to see what translators change the most of their protohistoric original, and what they tend to keep the same. The text being short, 'Piaceva' and 'Essi' appear only in upper case and are thus treated as proper names. In a longer text, they would probably have been filtered out.
Those words are used to create textual blocks. With our dictionary, we can proceed to alignment. Here is the result, with the similarity score given by the algorithm to each alignment: 0. Once this is done, gaps are deleted, allowing the correct alignment: 'Posidone, e neanche alla vergine dagli occhi lucenti.' The more literal the texts are, the briefer the aligned blocks tend to be. The passage of book: 'Divisate al figliuol distintamente queste avvertenze, si raccolse il veglio nell'erboso suo seggio. Ultimo intanto con bella coppia di corsier superbi Merion nella lizza era venuto.'
'Montati i carri, si gittar le sorti.' 'Era questa la cavalla che lui, Menelao, guidava sotto il giogo, tutta fremente di smania per la corsa. Merione era il quinto a bardare i cavalli dalle belle criniere. Salirono quindi sui carri e gettarono le sorti dentro un elmo.' While the number of Tonna's blocks is very close to the number of Greek proper names, indicating a good level of matching, Monti offers ample examples of 'failed matches'. The most immediate cause of mismatch is naturally the romanization of Greek names: 'Troia', 'Giove' and 'Nettunno' are all necessarily skipped by the dictionary.
Why is this? Because 'Antiloco' is a rarer word than 'Nestore': consequently, it forms a stronger textual anchor. So the aligner first matched the 'missing' Antiloco with its first recognized appearance that did not contradict the monotonicity hypothesis, and then matched Nestore with the next most convenient candidate.
The result was a greatly overbalanced block, with disproportionate original-translation lengths. For this reason, as shown above, it was merged with the preceding block in the post-processing phase. This is why we have the impression that 'Sicion' was not recognized in Monti: in fact, the alignment it produced was only considered suspect and was aborted in the verification phase, leading to a longer block but solving the mismatch irregularity. In Tonna, however, nothing of this happened, since the word order was respected. This led to a far more fine-grained block division.
In general, a smaller number of proper names leads to bigger and fewer blocks. This is sometimes contradicted by very noisy texts, where it is harder to match anchors. Even with just aligned texts, some considerations can be carried out without the sophisticated techniques of distributional semantics.
The possibility of visualizing many different translations in fine-grained alignment could already be a good position from which to advance some considerations. 'Ma chi degli dei li spinse a contrastare con violenza?' 'Fu il Achille.' 'E qual de' numi inimicolli?' This kind of fine-grained alignment is possible by using every word of the text, and not only proper names, as potential anchor words. Since string similarities work with high efficiency on words from two translations in the same language (the phonological converter is naturally de-activated in these cases), the Needleman-Wunsch algorithm can succeed in aligning very small blocks of text with impressive precision.
It is known that in modern translations the Roman names of the gods are used less than in XIX century translations, as the word-by-word pairing of Tonna and Foscolo shows: 'chi chi degli degli dei li spinse a contrastare con violenza? Il figlio di figlio di Latona Latona e e di Zeus.' It is also known that Foscolo had a particularly synthetic style of translation in some cases, as can be seen from the different length and expressivity of 'dei li spinse a contrastare con violenza?'
My version of the XIX century Iliad translated by Fiocchi was so noisy at first, due to the OCR process, that the beginning of the third book could not be found - the third book title had been skipped.
A set of aligned textual blocks can be of great help in many studies. The first and most immediate application we can imagine is obviously the comparison of different translations of the same passage, or of different occurrences in the same translation. For example, we could wonder how and when a translator, or many translators, differentiated the translation of a polysemous Greek word. A word-to-word translation could in this case even be less useful to the objective, since literary translations can very often resort to sophisticated periphrases to express important concepts; furthermore, we could be interested not only in the direct translation, but also in its immediate context.
I have already mentioned in the introduction the interesting systems of Gale et al. This means that the whole algorithm can be applied, or at least tested, on translations in other languages as well, expanding our field of action enormously. The aligner can also work for translations in different, albeit not too distant, languages. It was tested on Spanish, French, German and English, showing adequate results. It also seems to work with Russian, albeit with an increased number of errors.
There is another kind of tradition we could study: dialect translations. Italy has produced, and keeps producing, a vast number of Homeric translations in different dialects such as Venetian, Sardinian, or Piedmontese. It is well known that standard modern Italian has a different history from standard French or standard English. The alignment of translations in different languages, both with the Greek original and with each other, is of great interest.
As we have seen from the differences among Italian translations, dealing with epic formal characteristics is a challenging point: how did translators of other countries and languages handle them? Although this work is focused on Italian translations, we could try to reproduce a number of the following experiments on a multilingual scale. I chose to direct my research toward the study of semantic relations between words, in both the original and translation languages.
In particular, I tried to study the different lexical choices made by translators to render a selected group of Greek terms and the way those terms interact in translation. The task in itself is not easy. Ingersoll et al. These ontologies were often inspired by relevant works in cognitive linguistics and psycholinguistics and were attempts to find successful models for human judgements and reaction times about, for example, similarity and dissimilarity between words, such as the well-known mental lexicon model developed by Collins and Quillian. Programs working with ontologies transposed into digital formats were even tried against human-produced databases of similarity judgements built by psychologists, like the Rubenstein and Goodenough datasets: "they invited 51 judges who assigned to every pair a score between 4.
From the psychological field these approaches have now shifted to the computational side, where they are considered useful devices for a wide range of operations. Corpus-based approaches try to extract semantic relations from the text itself. Corpus-based - and context-based - approaches to semantics were not unknown in earlier decades, and in some sense they have always been part of the disciplines of language, but for a long stretch they were treated as somewhat marginal approaches.
Knowledge-based approaches are at the base of important recent resources such as SensoComune (Chiari et al.; Aitken). Descriptive lexicographers in fact depict their job as a cataloguing of observed behaviours (Atkins and Rundell). On the other hand, context-dependent models also had some fortune in psycholinguistics, thanks to theories like the Feature Comparison Model. Formalist approaches to language and grammar slowly shifted again towards statistics.
Many kinds of hybrid approaches were created, and are still present, where "the kernel of the grammar statements are rule-like constraints but optional probabilistic features are available if the grammar statements fail" (Karlsson). Consequently, all these approaches are relatively new. Several measures are popular in this field. Pointwise Mutual Information (Church and Hanks) bases its search for semantic similarity on the probability of two words co-occurring: given an actual co-occurrence of A and B, the probability of a similar coincidence is calculated from the joint distribution of A and B and their individual distributions. Second Order PMI (Islam and Inkpen) uses PMI to retrieve the lists of important neighbours of two terms and then computes the similarity between those lists (the number of words they have in common) to infer the similarity between the two original terms.
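A minimal sketch of sentence-level PMI in the spirit of Church and Hanks, computed here over an invented toy corpus; real implementations typically use window-based co-occurrence counts over large corpora:

```python
# Pointwise Mutual Information over sentence-level co-occurrence:
# PMI(a, b) = log2( p(a, b) / (p(a) * p(b)) ), with probabilities
# estimated from (co-)occurrence counts across sentences.
import math
from collections import Counter

def pmi(corpus_sentences, a, b):
    n = len(corpus_sentences)
    occurrences = Counter()
    co_occurrences = 0
    for sentence in corpus_sentences:
        words = set(sentence.split())
        occurrences.update(w for w in words if w in (a, b))
        if a in words and b in words:
            co_occurrences += 1
    if not co_occurrences:
        return float("-inf")  # never co-occur: PMI undefined / minimal
    return math.log2((co_occurrences / n)
                     / ((occurrences[a] / n) * (occurrences[b] / n)))
```

A positive PMI means the two words co-occur more often than their individual frequencies would predict, which is read as a hint of semantic association.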
Latent Semantic Analysis (Landauer et al.) and Hyperspace Analogue to Language (Burgess et al.) are further examples. Sentence verification is viewed as a two-stage process (Moyne) in which a similarity index between the elements of the sentence is calculated first on every feature, then just on defining features. The importance of context is naturally recognized also in pragmatics (Forabosco), phonetics (Couper-Kuhlen) and in several NLP fields like PoS tagging (Aminian et al.). These approaches have in common the search for patterns and structures inside the texts that can give hints about the semantic relations between the elements of the texts. Lenci argues that while these techniques are different from the classical heuristics employed in formal semantics, they have some similarities with connectionist models, which represent notions as points in a multidimensional space determined by a neural network, and which, indeed, are sometimes mingled with distributional semantics - see Li and Li et al.
Several recent applications of distributional semantics also come from fields closer to psycholinguistics. I chose the distributional semantics approach for two reasons: first, I find it very powerful and elegant; second, the novelty of the study of multilingual or cross-lingual distributional representations makes it a somewhat unexplored field. Vectorial representations are widely used in linguistics to model the distance between words, concepts (Alfonseca and Manandhar), and expressions (Baroni et al.).
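The basic measure behind such vectorial representations is cosine similarity, sketched here over raw co-occurrence count vectors rather than trained embeddings:

```python
# Cosine similarity between two word vectors: the angle-based measure
# used throughout distributional semantics. Toy count vectors, not
# trained embeddings.
import math

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = (math.sqrt(sum(x * x for x in u))
            * math.sqrt(sum(y * y for y in v)))
    return dot / norm if norm else 0.0
```

Two words with identical co-occurrence profiles score 1.0, orthogonal profiles score 0.0; tracking how this value changes for the same word pair across translations is the kind of comparison discussed above.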
Anyhow, semantic distance is normally computed between two words of the same language, and only recently have some studies been made on vectors in parallel corpora. Corpus-based approaches to parallel corpora have been exploited mainly in the field of Machine Translation. Cohn and Lapata try to improve the translation of resource-poor languages through a triangulation method, using a resource-rich language as a pivot between two texts.