Haug, Dag; Jøhndal, Marius L. 2016-11-29T08:36:36Z 2016-11-29T08:36:36Z 2016-11-29
dc.description The _PROIEL Treebank_ is a dependency treebank with morphosyntactic and information-structure annotation. It includes texts in several ancient Indo-European languages and is freely available under a Creative Commons CC BY-NC-SA 4.0 license.

Please cite as: Dag T. T. Haug and Marius L. Jøhndal. 2008. 'Creating a Parallel Treebank of the Old Indo-European Bible Translations'. In Caroline Sporleder and Kiril Ribarov (eds.). Proceedings of the Second Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2008), pp. 27-34.

Releases of the PROIEL Treebank are hosted on [Github](

The following texts are included in this release of the treebank:

| Text | Language | Filename | Size |
| --- | --- | --- | --- |
| The Greek New Testament (ed. Tischendorf 1869) | Ancient Greek | greek-nt | 140,676 tokens |
| The Armenian New Testament (ed. Künzle 1984) | Classical Armenian | armenian-nt | 23,513 tokens |
| The Gothic Bible (ed. Streitberg 1919) | Gothic | gothic-nt | 57,211 tokens |
| Codex Marianus (ed. Jagić 1883) | Old Church Slavonic | marianus | 58,269 tokens |
| Jerome's Vulgate | Latin | latin-nt | 81,441 tokens |
| Caesar, Commentarii belli Gallici (ed. Holmes 1914) | Latin | caes-gal | 28,608 tokens |
| Cicero, Epistulae ad Atticum (ed. Purser 1901) | Latin | cic-att | 41,901 tokens |
| Peregrinatio Aetheriae (ed. Heraeus 1908) | Latin | per-aeth | 18,356 tokens |
| Herodotus, Histories (ed. Godley 1920) | Ancient Greek | hdt | 81,495 tokens |
| Sphrantzes, Chronicles (post-1453) (ed. Grecu 1966) | Ancient Greek | chron | 24,612 tokens |

(The 'Size' column in the table above shows the number of annotated tokens in a text. This number is slightly larger than the number of words in the original printed edition, because some words have been split into multiple tokens and some tokens have been inserted during annotation.)

Please see the XML files for detailed metadata and a full list of contributors.

Data formats: The texts are available in two formats:

1. PROIEL XML: These files are the authoritative source files and the only ones that contain all available annotation. They contain the complete morphological, syntactic and information-structure annotation, as well as the complete text, including punctuation, section headers etc. The schema is defined in [`proiel.xsd`](
2. [CoNLL-X format](
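
As a rough illustration of working with the second format, the sketch below reads CoNLL-X style data (ten tab-separated columns per token, sentences separated by blank lines) and counts tokens per sentence. The function name `read_conllx`, the field names in `CONLLX_FIELDS`, and the toy sample are assumptions made for this example and are not taken from the treebank files; the tag and relation labels in the actual release may differ, and the PROIEL XML files remain the authoritative source.

```python
from typing import Dict, Iterable, Iterator, List

# Column layout of the CoNLL-X shared-task format (10 tab-separated fields).
# The lowercase key names are chosen for this sketch.
CONLLX_FIELDS = ["id", "form", "lemma", "cpostag", "postag",
                 "feats", "head", "deprel", "phead", "pdeprel"]


def read_conllx(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield sentences as lists of token dicts; blank lines end a sentence."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        values = line.split("\t")
        if len(values) != len(CONLLX_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(values)}: {line!r}")
        sentence.append(dict(zip(CONLLX_FIELDS, values)))
    if sentence:  # last sentence may lack a trailing blank line
        yield sentence


# Toy sample invented for illustration; the annotation values are
# placeholders, not real treebank data.
SAMPLE = (
    "1\tpuella\tpuella\tN\tN\t_\t2\tsub\t_\t_\n"
    "2\tcantat\tcanto\tV\tV\t_\t0\tpred\t_\t_\n"
    "\n"
    "1\tvenit\tvenio\tV\tV\t_\t0\tpred\t_\t_\n"
)

sentences = list(read_conllx(SAMPLE.splitlines()))
token_count = sum(len(s) for s in sentences)
roots = [tok["form"] for s in sentences for tok in s if tok["head"] == "0"]
print(len(sentences), token_count, roots)
```

To process a real file, the same function could be passed an open file handle instead of `SAMPLE.splitlines()`. Token counts obtained this way need not match the 'Size' column exactly if the export omits tokens that exist only in the XML.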
dc.language.iso got
dc.language.iso grc
dc.language.iso chu
dc.language.iso lat
dc.language.iso xcl
dc.publisher The PROIEL Treebank
dc.rights Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.label CC
dc.subject Treebank
dc.subject Morphosyntactic Annotation
dc.title PROIEL collection
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hidden false
hasMetadata false
has.files yes
branding Clarino
contact.person Dag Haug University of Oslo
sponsor Norwegian Research Council 192606 PROIEL – Pragmatic Resources in Old Indo-European Languages nationalFunds 46406 sentences 530666 words
files.size 16372731
files.count 1

 Files in this item

PROIEL Collection (15.61 MB)