cicada

cicada is a statistical machine translation toolkit based on a semiring parsing framework [1]. The same generic framework underlies its translation, alignment, and dependency parsing components (see Descriptions below).
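To give a feel for the semiring idea from [1], here is a toy sketch (illustrative only; it does not reflect cicada's internals, and all names in it are invented): a single inside algorithm over a hypergraph computes different quantities depending on which semiring is plugged in.

```python
# Minimal sketch of semiring parsing (Goodman, 1999): the inside
# algorithm over a hypergraph, parameterized by a semiring.
# Illustrative only; this is not cicada's actual code.

# A hyperedge is (head, tail_nodes, weight); edges are listed bottom-up,
# so every tail node is built before it is used as an antecedent.
edges = [
    ("A", (), 0.5), ("A", (), 0.5),        # two ways to build A
    ("B", (), 1.0),
    ("S", ("A", "B"), 0.8), ("S", ("A",), 0.2),
]

def inside(edges, plus, times, zero):
    """Generic inside algorithm: 'plus' accumulates over alternative
    edges, 'times' combines an edge weight with its antecedents."""
    score = {}
    for head, tail, w in edges:
        part = w
        for t in tail:
            part = times(part, score[t])
        score[head] = plus(score.get(head, zero), part)
    return score

# Probability semiring (+, *): sums over all derivations.
total = inside(edges, lambda a, b: a + b, lambda a, b: a * b, 0.0)
# Viterbi semiring (max, *): score of the single best derivation.
best = inside(edges, max, lambda a, b: a * b, 0.0)
```

Swapping only `plus` changes what is computed: `total["S"]` is the total probability mass over all derivations of `S`, while `best["S"]` is the score of the single best derivation.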

The cicada toolkit is developed mainly by Taro Watanabe at the Multilingual Translation Laboratory, Universal Communication Institute, National Institute of Information and Communications Technology (NICT). If you have any questions about cicada, send them to taro.watanabe at nict dot go dot jp.

Remark: cicada is 蝉 (CJK UNIFIED IDEOGRAPH-8749), or セミ, in Japanese, pronounced semi.

Quick Start

The stable version is 0.3.5. The latest code is also available from github.com.

Compile

For details, see BUILD.rst.

./autogen.sh (required when you get the code by git clone)
./configure
make
make install (optional)

Run

You can find a sample grammar file in the samples directory together with an n-gram language model. Here is an example run (note that \ indicates the shell's line continuation).

./progs/cicada \
   --input samples/scfg/input.txt \
   --grammar samples/scfg/grammar.bin \
   --grammar "glue:straight=true,inverted=false,non-terminal=[x],goal=[s]" \
   --grammar "insertion:non-terminal=[x]" \
   --feature-function "ngram:file=samples/scfg/ngram.bin" \
   --feature-function word-penalty \
   --feature-function rule-penalty \
   --operation compose-cky \
   --operation apply:prune=true,size=100,weights=samples/scfg/weights \
   --operation output:file=-,kbest=10,weights=samples/scfg/weights

This sample means:

  • Input is samples/scfg/input.txt
  • Three grammars:
    • The SCFG is samples/scfg/grammar.bin, a binary version of samples/scfg/grammar.bz2.
    • An additional glue grammar consisting of two rules: [s] -> <[x,1], [x,1]> and [s] -> <[s,1] [x,2], [s,1] [x,2]>.
    • An additional insertion grammar which simply copies input words to the output: [x] -> <word-x, word-x>.
  • Three feature functions:
    • A 5-gram language model from samples/scfg/ngram.bin, a binary version of samples/scfg/ngram.bz2.
    • A word penalty feature which penalizes a derivation by the number of words in its target side.
    • A rule penalty feature which penalizes a derivation by the number of rules it uses.
    • In addition, features already defined for each hierarchical phrase pair are used; for examples, see samples/scfg/grammar.bz2.
  • Actual operation(s):
    1. The input is composed with the CKY algorithm (compose-cky), which results in a hypergraph.
    2. Cube pruning (apply) approximately computes features, using 100 as the histogram pruning threshold and the weights at samples/scfg/weights.
    3. The 10-best derivations are computed and output to - (stdout), using samples/scfg/weights as the weight vector to score each derivation.
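The core trick behind cube pruning [4] in step 2 can be sketched in a few lines (a simplified illustration, not cicada's implementation; the function and argument names here are invented). Instead of scoring every combination of antecedent items, candidates are popped best-first from a heap over the grid ("cube") of index pairs:

```python
import heapq

def cube_top_k(left, right, combine, k):
    """Enumerate the k best combinations of two candidate lists, each
    sorted best-first, without scoring all |left| x |right| pairs.
    Exact when `combine` is monotone in both arguments; with
    non-monotone scores such as language-model costs the search
    becomes approximate, which is why it is called pruning."""
    heap = [(-combine(left[0], right[0]), 0, 0)]  # max-heap via negation
    seen = {(0, 0)}
    out = []
    while heap and len(out) < k:
        neg, i, j = heapq.heappop(heap)
        out.append(-neg)
        # Push the two grid neighbors of the popped cell.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-combine(left[ni], right[nj]), ni, nj))
    return out

# Model scores of two antecedent item lists, sorted best-first:
top4 = cube_top_k([9, 4, 1], [7, 6, 2], lambda a, b: a + b, 4)
# top4 == [16, 15, 11, 11]
```

In the real decoder each combination also carries state (e.g. language-model boundary words), and a histogram limit like size=100 caps how many candidates survive per node.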

In depth

The above example can be found at samples/scfg/sample.sh. The details of grammars, features, and operations are described in doc/grammar.rst, doc/features.rst, and doc/operation.rst, respectively. If you want to try tree-to-string or string-to-tree models, see samples/t2s/sample.sh and samples/s2t/sample.sh. In order to train a model, see doc/training.rst and doc/training-stsg.rst, which describe how to create your own {tree,string}-to-{tree,string} models, tune parameters, and run the decoder.

Systems

cicada has been successfully compiled on x86_64 Linux, OS X, and Cygwin, and is regularly tested on Linux and OS X.

Descriptions

Basically, we have four distinct structures:

Translation/parsing can be carried out by:

Alignment can be carried out by:

Dependency parsing can be carried out by:

After the hypergraph generation, you can:

Or, you can combine outputs from multiple systems by [29]:

Or, a conventional system combination strategy of [14]:

Monolingual grammar learning is implemented:

Phrase/synchronous-rule/tree-to-string/string-to-tree extraction/scoring are implemented (see doc/extraction.rst and doc/indexing.rst for details):

Various learning components are implemented (see doc/learning.rst for details):

Feature functions (see doc/features.rst for details):

References

[1] Joshua Goodman. Semiring parsing. Computational Linguistics, 25:573-605, December 1999.
[2] Christopher Dyer, Smaranda Muresan, and Philip Resnik. Generalizing word lattice translation. In Proceedings of ACL-08: HLT, pages 1012-1020, Columbus, Ohio, June 2008. Association for Computational Linguistics.
[3] Chris Dyer and Philip Resnik. Context-free reordering, finite-state translation. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 858-866, Los Angeles, California, June 2010. Association for Computational Linguistics.
[4] Liang Huang and David Chiang. Forest rescoring: Faster decoding with integrated language models. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 144-151, Prague, Czech Republic, June 2007. Association for Computational Linguistics.
[5] Liang Huang and David Chiang. Better k-best parsing. In Proceedings of the Ninth International Workshop on Parsing Technology, pages 53-64, Vancouver, British Columbia, October 2005. Association for Computational Linguistics.
[6] David Chiang, Kevin Knight, and Wei Wang. 11,001 new features for statistical machine translation. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 218-226, Boulder, Colorado, June 2009. Association for Computational Linguistics.
[7] Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki. Online large-margin training for statistical machine translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 764-773, Prague, Czech Republic, June 2007. Association for Computational Linguistics.
[8] Shankar Kumar, Wolfgang Macherey, Chris Dyer, and Franz Och. Efficient minimum error rate training and minimum Bayes-risk decoding for translation hypergraphs and lattices. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 163-171, Suntec, Singapore, August 2009. Association for Computational Linguistics.
[9] Kevin Gimpel and Noah A. Smith. Softmax-margin CRFs: Training log-linear models with cost functions. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 733-736, Los Angeles, California, June 2010. Association for Computational Linguistics.
[10] Zhifei Li, Jason Eisner, and Sanjeev Khudanpur. Variational decoding for statistical machine translation. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 593-601, Suntec, Singapore, August 2009. Association for Computational Linguistics.
[11] David Vilar and Hermann Ney. On LM heuristics for the cube growing algorithm. In Annual Conference of the European Association for Machine Translation, pages 242-249, Barcelona, Spain, May 2009.
[12] John DeNero, David Chiang, and Kevin Knight. Fast consensus decoding over translation forests. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 567-575, Suntec, Singapore, August 2009. Association for Computational Linguistics.
[13] John DeNero, Shankar Kumar, Ciprian Chelba, and Franz Och. Model combination for machine translation. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 975-983, Los Angeles, California, June 2010. Association for Computational Linguistics.
[14] Antti-Veikko Rosti, Bing Zhang, Spyros Matsoukas, and Richard Schwartz. Incremental hypothesis alignment with flexible matching for building confusion networks: BBN system description for WMT09 system combination task. In Proceedings of the Fourth Workshop on Statistical Machine Translation, pages 61-65, Athens, Greece, March 2009. Association for Computational Linguistics.
[15] Andreas Zollmann and Stephan Vogel. New parameterizations and features for PSCFG-based machine translation. In Proceedings of the 4th Workshop on Syntax and Structure in Statistical Translation, pages 110-117, Beijing, China, August 2010. Coling 2010 Organizing Committee.
[16] Haitao Mi and Liang Huang. Forest-based translation rule extraction. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 206-214, Honolulu, Hawaii, October 2008. Association for Computational Linguistics.
[17] Yang Liu, Yajuan Lü, and Qun Liu. Improving tree-to-tree translation with packed forests. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 558-566, Suntec, Singapore, August 2009. Association for Computational Linguistics.
[18] Liang Huang and Haitao Mi. Efficient incremental decoding for tree-to-string translation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 273-283, Cambridge, MA, October 2010. Association for Computational Linguistics.
[19] Chris Dyer, Aaron Cordova, Alex Mont, and Jimmy Lin. Fast, easy, and cheap: Construction of statistical machine translation models with MapReduce. In Proceedings of the Third Workshop on Statistical Machine Translation, pages 199-207, Columbus, Ohio, June 2008. Association for Computational Linguistics.
[20] Jason Riesa and Daniel Marcu. Hierarchical search for word alignment. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 157-166, Uppsala, Sweden, July 2010. Association for Computational Linguistics.
[21] Jakob Uszkoreit and Thorsten Brants. Distributed word clustering for large scale class-based language modeling in machine translation. In Proceedings of ACL-08: HLT, pages 755-762, Columbus, Ohio, June 2008. Association for Computational Linguistics.
[22] Percy Liang, Ben Taskar, and Dan Klein. Alignment by agreement. In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference, pages 104-111, New York City, USA, June 2006. Association for Computational Linguistics.
[23] Kuzman Ganchev, João V. Graça, and Ben Taskar. Better alignments = better translations? In Proceedings of ACL-08: HLT, pages 986-993, Columbus, Ohio, June 2008. Association for Computational Linguistics.
[24] Dan Klein and Christopher D. Manning. Parsing and hypergraphs. In Proceedings of IWPT, pages 123-134, 2001.
[25] Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. Learning accurate, compact, and interpretable tree annotation. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 433-440, Sydney, Australia, July 2006. Association for Computational Linguistics.
[26] Slav Petrov and Dan Klein. Improved inference for unlexicalized parsing. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pages 404-411, Rochester, New York, April 2007. Association for Computational Linguistics.
[27] Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. What's in a translation rule? In Susan Dumais, Daniel Marcu, and Salim Roukos, editors, HLT-NAACL 2004: Main Proceedings, pages 273-280, Boston, Massachusetts, USA, May 2 - May 7 2004. Association for Computational Linguistics.
[28] Arne Mauser, Saša Hasan, and Hermann Ney. Extending statistical machine translation with discriminative and trigger-based lexicon models. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 210-218, Singapore, August 2009. Association for Computational Linguistics.
[29] Taro Watanabe and Eiichiro Sumita. Machine translation system combination by confusion forest. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1249-1257, Portland, Oregon, USA, June 2011. Association for Computational Linguistics.
[30] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871-1874, 2008.
[31] Andrea Gesmundo and James Henderson. Faster cube pruning. In Marcello Federico, Ian Lane, Michael Paul, and François Yvon, editors, Proceedings of the Seventh International Workshop on Spoken Language Translation (IWSLT), pages 267-274, 2010.
[32] Wenbin Jiang and Qun Liu. Dependency parsing and projection based on word-pair classification. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 12-20, Uppsala, Sweden, July 2010. Association for Computational Linguistics.
[33] Mark Hopkins and Jonathan May. Tuning as ranking. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1352-1362, Edinburgh, Scotland, UK, July 2011. Association for Computational Linguistics.
[34] Mark Hopkins and Greg Langmead. SCFG decoding without binarization. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 646-655, Cambridge, MA, October 2010. Association for Computational Linguistics.
[35] Antti-Veikko Rosti, Bing Zhang, Spyros Matsoukas, and Richard Schwartz. Expected BLEU training for graphs: BBN system description for WMT11 system combination task. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 159-165, Edinburgh, Scotland, July 2011. Association for Computational Linguistics.
[36] Patrick Simianer, Stefan Riezler, and Chris Dyer. Joint feature selection in distributed stochastic learning for large-scale discriminative training in SMT. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11-21, Jeju Island, Korea, July 2012. Association for Computational Linguistics.
[37] Taro Watanabe. Optimized online rank learning for machine translation. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 253-262, Montréal, Canada, June 2012. Association for Computational Linguistics.
[38] Howard Johnson, Joel Martin, George Foster, and Roland Kuhn. Improving translation quality by discarding most of the phrasetable. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 967-975, Prague, Czech Republic, June 2007. Association for Computational Linguistics.
[39] Taro Watanabe, Hajime Tsukada, and Hideki Isozaki. A succinct n-gram language model. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 341-344, Suntec, Singapore, August 2009. Association for Computational Linguistics.
[40] Kenneth Heafield. KenLM: Faster and smaller language model queries. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 187-197, Edinburgh, Scotland, July 2011. Association for Computational Linguistics.
[41] Ashish Vaswani, Yinggong Zhao, Victoria Fossum, and David Chiang. Decoding with large-scale neural language models improves translation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1387-1392, Seattle, Washington, USA, October 2013. Association for Computational Linguistics.