dc.contributor.author Giellatekno - Saami Language Technology, UiT The Arctic University of Norway
dc.contributor.author The Divvun group at UiT The Arctic University of Norway
dc.date.accessioned 2015-10-27T08:08:28Z
dc.date.available 2015-10-27T08:08:28Z
dc.date.issued 2015-10-17
dc.identifier.uri http://hdl.handle.net/11509/111
dc.description The Lule Saami N-gram data set is the work of the Giellatekno and Divvun research groups, Department of Linguistics, UiT The Arctic University of Norway, and of members of the language community. In particular, Ciprian-Virgil Gerstenberger compiled the data set from the entire SIKOR Lule Saami corpus, version 2015-10-10. The N-grams range in length from unigrams (single words) to trigrams (112877 unigrams, 552639 bigrams and 164286 trigrams). Only N-grams within sentences have been counted. The data follow the ARPA back-off N-gram model format and were generated with SRILM, the SRI Language Modeling Toolkit (http://www.speech.sri.com/projects/srilm/). Since the N-grams were derived automatically, they may contain incorrect values. If you find any errors, the creators would appreciate feedback sent to giellatekno@uit.no and feedback@divvun.no. Please note that the Giellatekno resources are dynamic in nature. To make sure you have the most up-to-date version, please contact Giellatekno (see Contact Info in metadata).
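The description states that the data use the ARPA back-off format. As a rough illustration of what that means for someone consuming the file, the Python sketch below parses an ARPA model and scores a word with the usual back-off rule: if the full N-gram is listed, use its log10 probability; otherwise add the log10 back-off weight of the context and retry with a shorter context. This is not SRILM code (SRILM's own `ngram` tool performs this scoring itself). The function names, the tiny toy model and the fallback to a `<unk>` entry (or minus infinity when the model has none) are assumptions for illustration; the toy numbers are invented and do not come from the Lule Saami data.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple

Ngram = Tuple[str, ...]
Table = Dict[int, Dict[Ngram, float]]


def parse_arpa(lines: Iterable[str]) -> Tuple[Dict[int, int], Table, Table]:
    """Return (declared counts, log10 probabilities, log10 back-off weights) by order."""
    counts: Dict[int, int] = {}
    probs: Table = {}
    backoffs: Table = {}
    order = 0  # 0 while still in the \data\ header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "\\end\\":
            break
        if line.startswith("ngram "):
            # Header line such as "ngram 2=552639"
            n, c = line[len("ngram "):].split("=")
            counts[int(n)] = int(c)
        elif line.startswith("\\") and line.endswith("-grams:"):
            # Section header such as "\2-grams:"
            order = int(line[1:line.index("-")])
            probs[order], backoffs[order] = {}, {}
        elif order > 0:
            # Entry: log10 prob, `order` words, optional log10 back-off weight
            parts = line.split()
            words = tuple(parts[1:1 + order])
            probs[order][words] = float(parts[0])
            if len(parts) > 1 + order:
                backoffs[order][words] = float(parts[1 + order])
    return counts, probs, backoffs


def log10_prob(word: str, context: Sequence[str], probs: Table, backoffs: Table) -> float:
    """Back-off estimate of log10 P(word | context)."""
    max_order = max(probs)
    ctx = tuple(context)
    if len(ctx) >= max_order:
        ctx = ctx[len(ctx) - max_order + 1:]  # keep only the last (max_order - 1) words
    ngram = ctx + (word,)
    table = probs.get(len(ngram), {})
    if ngram in table:
        return table[ngram]
    if not ctx:
        # Assumption: unseen words fall back to <unk> if the model lists it
        return probs[1].get(("<unk>",), -math.inf)
    # Missing back-off weight means log10(1) = 0
    bow = backoffs.get(len(ctx), {}).get(ctx, 0.0)
    return bow + log10_prob(word, ctx[1:], probs, backoffs)


# Hypothetical toy model, not taken from the data set
TOY_ARPA = """\\data\\
ngram 1=3
ngram 2=1

\\1-grams:
-0.5\ta\t-0.3
-0.7\tb\t-0.2
-1.0\t<unk>

\\2-grams:
-0.2\ta b

\\end\\
"""

if __name__ == "__main__":
    counts, probs, backoffs = parse_arpa(TOY_ARPA.splitlines())
    print(counts)                                   # {1: 3, 2: 1}
    print(log10_prob("b", ["a"], probs, backoffs))  # -0.2 (bigram is listed)
    print(log10_prob("a", ["b"], probs, backoffs))  # backed off: bow(b) + P(a), about -0.7
```

Keeping everything in plain dictionaries is simple but memory-hungry; for the full model (several hundred thousand entries) a dedicated toolkit is likely the better choice.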
dc.language.iso smj
dc.publisher Giellatekno - Saami Language Technology
dc.rights Creative Commons - Attribution 3.0 Unported (CC BY 3.0)
dc.rights.uri http://creativecommons.org/licenses/by/3.0/
dc.rights.label CC
dc.source.uri http://giellatekno.uit.no/index.eng.html
dc.subject Lule Saami
dc.subject 1-gram
dc.subject 2-gram
dc.subject 3-gram
dc.subject Language Model
dc.title Lule Saami N-grams
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding Clarino
demo.uri http://gtweb.uit.no/korp
contact.person Trond Trosterud trond.trosterud@uit.no Giellatekno - Saami Language Technology
size.info 112877 unigrams
size.info 552639 bigrams
size.info 164286 trigrams
files.size 6336636
files.count 1


Files in this item

This item is distributed under Creative Commons and licensed under Creative Commons - Attribution 3.0 Unported (CC BY 3.0). Attribution required.
Name: SIKOR_smj_20151010.lm.zip
Size: 6.04 MB (6336636 bytes)
Format: application/zip
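The item consists of a single zip archive. To compare the declared N-gram counts with the size.info values without loading the whole model, a minimal Python sketch using only the standard library can read just the `\data\` header of the archived file. The function name is hypothetical, and the sketch assumes that the archive contains one ARPA text file (its member name is not given in this record) encoded as UTF-8.

```python
import io
import zipfile
from typing import Dict


def declared_counts(zip_path: str) -> Dict[int, int]:
    """Read the 'ngram N=COUNT' lines from the header of the first file in the archive."""
    counts: Dict[int, int] = {}
    with zipfile.ZipFile(zip_path) as zf:
        # Assumption: the archive holds a single ARPA text file
        member = next(n for n in zf.namelist() if not n.endswith("/"))
        # Assumption: the model is UTF-8 encoded text
        with io.TextIOWrapper(zf.open(member), encoding="utf-8") as fh:
            for raw in fh:
                line = raw.strip()
                if line.startswith("ngram "):
                    order, count = line[len("ngram "):].split("=")
                    counts[int(order)] = int(count)
                elif line.endswith("-grams:"):
                    break  # header is over; skip the (large) body
    return counts
```

Called as `declared_counts("SIKOR_smj_20151010.lm.zip")` on the downloaded file, it should report orders 1 to 3; whether the counts match the size.info values exactly is worth checking rather than assuming.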
