dc.contributor.author | Giellatekno - Saami Language Technology, UiT The Arctic University of Norway |
dc.contributor.author | The Divvun group at UiT The Arctic University of Norway |
dc.date.accessioned | 2015-10-27T08:08:19Z |
dc.date.available | 2015-10-27T08:08:19Z |
dc.date.issued | 2015-10-17 |
dc.identifier.uri | http://hdl.handle.net/11509/109 |
dc.description | The South Saami N-gram data set was produced by the Giellatekno and Divvun research groups at the Department of Linguistics, UiT The Arctic University of Norway, together with members of the language community. Ciprian-Virgil Gerstenberger compiled the data set from the complete SIKOR South Saami corpus, version 2015-10-10. The N-grams range in length from unigrams (single words) to trigrams: 101693 unigrams, 499288 bigrams and 112879 trigrams. Only N-grams that fall within a single sentence were counted. The data are stored in the ARPA back-off N-gram model format and were generated with SRILM, the SRI Language Modeling Toolkit (http://www.speech.sri.com/projects/srilm/). Because the N-grams were derived automatically, some values may be wrong. If you find any errors, the creators would appreciate your feedback at giellatekno@uit.no and feedback@divvun.no. Note that Giellatekno resources change over time; to make sure you have the most recent version, please contact Giellatekno (see Contact Info in metadata). |
dc.language.iso | sma |
dc.publisher | Giellatekno - Saami Language Technology |
dc.rights | Creative Commons - Attribution 3.0 Unported (CC BY 3.0) |
dc.rights.uri | http://creativecommons.org/licenses/by/3.0/ |
dc.rights.label | CC |
dc.source.uri | http://giellatekno.uit.no/index.eng.html |
dc.subject | South Saami |
dc.subject | Ngram |
dc.subject | 1-gram |
dc.subject | 2-gram |
dc.subject | 3-gram |
dc.subject | Language Model |
dc.title | South Saami N-grams |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | Clarino |
demo.uri | http://gtweb.uit.no/korp |
contact.person | Trond Trosterud, trond.trosterud@uit.no, Giellatekno - Saami Language Technology |
size.info | 101693 unigrams |
size.info | 499288 bigrams |
size.info | 112879 trigrams |
files.size | 5359748 bytes (about 5.4 MB) |
files.count | 1 |
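The dc.description states that the data are stored in the ARPA back-off format written by SRILM. As a hedged illustration of what that format implies for a user, the sketch below reads an ARPA file into memory and scores a word with the usual back-off rule: use the n-gram's log10 probability if it is listed, otherwise add the history's log10 back-off weight and retry with a shorter history. It is not SRILM code and is not shipped with this item. The class `ArpaModel`, its methods and the toy model `TOY_ARPA` are hypothetical, and the toy numbers are invented rather than taken from the South Saami data. Mapping unknown words to `<unk>` is also an assumption, because the record does not say whether the model contains that token. Keeping every n-gram in Python dictionaries is simple but memory-hungry for large models.

```python
import io
import math
from typing import Dict, List, Optional, TextIO, Tuple

NGram = Tuple[str, ...]

# Hypothetical toy model (invented numbers, NOT South Saami data) in ARPA layout.
TOY_ARPA = r"""
\data\
ngram 1=4
ngram 2=3

\1-grams:
-99.0 <s> -0.5
-0.7 a -0.3
-0.9 b -0.2
-0.6 </s>

\2-grams:
-0.2 <s> a
-0.4 a b
-0.1 b </s>

\end\
"""


class ArpaModel:
    """Hypothetical minimal reader/scorer for ARPA back-off files (not SRILM code)."""

    def __init__(self) -> None:
        self.counts: Dict[int, int] = {}       # order -> count declared in \data\
        self.logprob: Dict[NGram, float] = {}  # n-gram -> log10 probability
        self.backoff: Dict[NGram, float] = {}  # n-gram -> log10 back-off weight
        self.order = 0                         # highest n-gram order seen

    @classmethod
    def load(cls, stream: TextIO) -> "ArpaModel":
        model = cls()
        section: Optional[int] = None  # 0 = \data\ header, n = \n-grams: body
        for raw in stream:
            line = raw.strip()
            if not line:
                continue
            if line == "\\data\\":
                section = 0
            elif line == "\\end\\":
                break
            elif line.startswith("\\") and line.endswith("-grams:"):
                section = int(line[1:line.index("-")])
                model.order = max(model.order, section)
            elif section == 0 and line.startswith("ngram "):
                n, count = line[len("ngram "):].split("=")
                model.counts[int(n)] = int(count)
            elif section:
                # Fields: log10 prob, `section` words, optional back-off weight.
                fields = line.split()
                words = tuple(fields[1:1 + section])
                model.logprob[words] = float(fields[0])
                if len(fields) > 1 + section:
                    model.backoff[words] = float(fields[1 + section])
        return model

    def score(self, context: List[str], word: str) -> float:
        """log10 P(word | context) with standard ARPA back-off."""
        if (word,) not in self.logprob and ("<unk>",) in self.logprob:
            word = "<unk>"  # assumption: the model may or may not contain <unk>
        keep = max(0, len(context) - (self.order - 1))
        history: NGram = tuple(context[keep:])
        total = 0.0
        while True:
            if history + (word,) in self.logprob:
                return total + self.logprob[history + (word,)]
            if not history:
                return -math.inf  # word missing even from the unigram list
            total += self.backoff.get(history, 0.0)  # absent weight = log10(1)
            history = history[1:]

    def sentence_logprob(self, words: List[str]) -> float:
        """Sum of log10 probabilities for <s> words </s>."""
        tokens = ["<s>"] + list(words) + ["</s>"]
        return sum(self.score(tokens[:i], tokens[i]) for i in range(1, len(tokens)))


if __name__ == "__main__":
    model = ArpaModel.load(io.StringIO(TOY_ARPA))
    print(model.counts)                        # {1: 4, 2: 3}
    print(model.sentence_logprob(["a", "b"]))  # about -0.7
```

With the real download, one would pass an open text stream such as `open(path, encoding="utf-8")` instead of the toy string; `path` is a placeholder, and UTF-8 encoding is an assumption, since the record gives neither the file name nor the encoding.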