Contents & References of An efficient model for creating a parallel text corpus from a comparable text corpus
Contents:
1. Introduction
1-1. Introduction
1-1-1. Dictionary-based machine translation
1-1-2. Rule-based machine translation
1-1-3. Knowledge-based machine translation
1-1-4. Corpus-based machine translation
Statistical machine translation
Example-based machine translation
Text-based machine translation
1-2. The necessity of building a parallel corpus
1-3. Research problem: constructing parallel corpora
1-4. Research objective: building a parallel corpus from a comparable corpus
1-5. Thesis outline
1-5-1. Chapter 2: Theoretical foundations
1-5-2. Chapter 3: An overview of previous research
1-5-3. Chapter 4: Proposed model
1-5-4. Chapter 5: Evaluation and conclusion
2. Theoretical foundations
2-1. Corpus
2-1-1. Parallel corpus
2-1-2. Comparable corpus
2-2. Alignment
2-2-1. Alignment at the document level
2-2-2. Alignment at the sentence level
2-2-3. Alignment at the word level (lexical alignment)
Lexical alignment using IBM models
2-3. Evaluation of machine translation
2-3-1. BLEU
2-3-2. NIST metric
2-3-3. Word error rate (WER)
2-3-4. Translation error rate (TER)
3. An overview of previous research
3-1. Introduction
3-2. Building a parallel corpus from co-translated texts
3-3. Extracting parallel sentences from the web
3-4. Extracting parallel sentences from comparable corpora
3-5. Identifying parallel sentences using a maximum entropy classifier
3-6. Construction of an English-Persian parallel corpus
4. The proposed model
4-1. Introduction
4-2. Selecting candidate pairs of parallel sentences
4-2-1. Common-word filter
Converting character encoding
Determining sentence and word boundaries
Finding word roots
Removing frequently used words
Eliminating ambiguity
Looking up meanings in the dictionary
Grouping the repeated words of a sentence with their number of occurrences in the sentence
Algorithm for computing the rate of common words (from the source side)
4-3. Selecting pairs of parallel sentences from candidate sentence pairs
4-3-1. Maximum entropy classifier
4-3-2. General features
Features based on the lengths of the two sentences
Rate of common words
4-3-3. Features based on word-level alignment of a sentence pair
Unaligned words
Fertility
Contiguous spans
Alignment score
4-4. Increasing the accuracy of the extracted parallel sentence pairs
4-5. Model evaluation method
5. Evaluation and conclusion
5-1. Evaluation of the maximum entropy classifier
5-1-1. Evaluation of features
5-1-2. Domain sensitivity
5-2. Configuration and experiments for building a parallel corpus from a comparable corpus
5-2-1. Comparable corpora used
Tehran University Persian-English comparable corpus (UTPECC)
Comparable corpus built from Wikipedia articles
5-2-2. Parameter settings and tools used
Selecting candidate sentence pairs
Selecting parallel sentence pairs
Increasing the accuracy of the extracted sentence pairs
5-2-3. Evaluating the extracted parallel sentences with machine translation
5-3. Conclusion
5-4. Future work
References:
[1] S. Tripathi and J. K. Sarkhel, "Approaches to machine translation", Annals of Library and Information Studies, vol. 57, pp. 388-393, December 2010.
[2] A. Lopez, "Statistical machine translation", ACM Computing Surveys, vol. 40, no. 3, pp. 1-49, 2008.
[3] P. F. Brown, J. Cocke, S. A. Della-Pietra, V. J. Della-Pietra, F. Jelinek, J. D. Lafferty, R. L. Mercer and P. S. Roossin, "A statistical approach to machine translation", Computational Linguistics, vol. 16, no. 2, pp. 79-85, 1990.
[4] F. J. Och and H. Ney, "Discriminative training and maximum entropy models for statistical machine translation", in 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, pp. 295-302, 2002.
[5] P. Koehn, "Europarl: a parallel corpus for statistical machine translation", in MT Summit X: the tenth machine translation summit, Phuket, Thailand, pp. 79-86, 2005.
[6] M. Mohaghegh, A. Sarrafzadeh and T. Moir, "Improved Language Modeling for English-Persian Statistical Machine Translation", in Proceedings of SSST-4, Fourth Workshop on Syntax and Structure in Statistical Translation (COLING 2010), Beijing, pp. 75-82, August 2010.
[7] Supreme Council of Information and Communication Technology, Mizan English-Persian Parallel Corpus, Tehran, I.R. Iran, 2013. Retrieved from http://dadegan.ir/catalog/mizan.
[8] A. Mansouri and H. Faili, "State-of-the-art English to Persian Statistical Machine Translation System", in 16th CSI International Symposium on Artificial Intelligence and Signal Processing, Fars, pp. 174-179, IEEE, 2012.
[9] T. Ishisaka, K. Yamamoto, M. Utiyama and E. Sumita, "Development of a Japanese-English software manual parallel corpus", in MT Summit XII: proceedings of the twelfth machine translation summit, Ottawa, ON, Canada, pp. 254-259, 2009.
[10] M. T. Pilevar, A. H. Pilevar and H. Faili, "TEP: Tehran English-Persian Parallel Corpus", in A. Gelbukh (ed.), Computational Linguistics and Intelligent Text Processing, LNCS, vol. 6609, pp. 68-79, Springer, Heidelberg, 2011.
[11] F. Jabbari, S. Bakhshaei, S. M. Mohammadzadeh Ziabary and S. Khadivi, "Developing an Open-domain English-Farsi Translation System Using AFEC: Amirkabir Bilingual Farsi-English Corpus", in Fourth Workshop on Computational Approaches to Arabic-Script-based Languages (AMTA 2012), San Diego, CA, USA, November 2012.
[12] J. Nie, M. Simard, P. Isabelle and R. Dur, "Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the web", in Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval (SIGIR '99), Berkeley, CA, pp. 74-81, 1999.
[13] P. Resnik and N. A. Smith, "The web as a parallel corpus", Computational Linguistics, vol. 29, no. 3, pp. 349-380, 2003.
[14] Y. Zhang, K. Wu, J. Gao and P. Vines, "Automatic acquisition of Chinese-English parallel corpus from the Web", in Proceedings of the 28th European Conference on Information Retrieval, LNCS, vol. 3936, pp. 420-431, Springer, January 2006.
[15] D. W. Oard, "Alternative approaches for cross-language text retrieval", in AAAI Symposium on Cross-Language Text and Speech Retrieval, Stanford, CA, USA, pp. 154-162, 1997.
[16] J. Tiedemann, "Parallel Data, Tools and Interfaces in OPUS", in Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC'2012), 2012.
[17] R. Zajac, S. Helmreich and K. Megerdoomian, "Black-Box/Glass-Box Evaluation in Shiraz", in Workshop on Machine Translation Evaluation at LREC-2000, Athens, Greece, 2000.
[18] R. S. Belvin, W. May, S. Narayanan, P. Georgiou and S. Ganjavi, "Creation of a Doctor-Patient Dialogue Corpus Using Standardized Patients", in International Conference on Language Resources and Evaluation (LREC), 2004.
[19] B. Qasemizadeh and S. Rahimi, "The First Parallel Multilingual Corpus of Persian: Toward a Persian BLARK", in Second Workshop on Computational Approaches to Arabic Script-based Languages (CAASL-2), California, USA, 2007.
[20] M. Mohaghegh and A. Sarrafzadeh, "Performance evaluation of various training data in English-Persian Statistical Machine translation", in 10th International Conference on the Statistical Analysis of Textual Data (JADT 2010), Rome, Italy, 2010.
[21] M. A. Farajian, "Pen: Parallel English-Persian News Corpus", in Proceedings of the 2011 World Congress in Computer Science, Computer Engineering and Applied Computing, 2011.
[22] F. Jabbari, S. Bakhshaei, S. M.