Kven N-grams

Giellatekno - Saami Language Technology, UiT The Arctic University of Norway; The Divvun group at UiT The Arctic University of Norway

dc.contributor.author	Giellatekno - Saami Language Technology, UiT The Arctic University of Norway
dc.contributor.author	The Divvun group at UiT The Arctic University of Norway
dc.date.accessioned	2015-10-27T08:08:13Z
dc.date.available	2015-10-27T08:08:13Z
dc.date.issued	2015-10-17
dc.identifier.uri	http://hdl.handle.net/11509/108
dc.description	The Kven N-gram data set was produced by the Giellatekno and Divvun research groups at the Department of Linguistics, UiT The Arctic University of Norway, together with members of the language community. Ciprian-Virgil Gerstenberger compiled the data set from the entire SIKOR Kven corpus, version 2015-08-30. It contains N-grams of length one to three: 25961 unigrams (single words), 78497 bigrams, and 27690 trigrams. Only N-grams within sentences have been counted. The data follows the ARPA backoff N-gram model format and was generated with SRILM, the SRI Language Modeling Toolkit (http://www.speech.sri.com/projects/srilm/). Because the N-grams were derived automatically, they may contain errors. If you find any, the creators would appreciate feedback sent to giellatekno@uit.no and feedback@divvun.no. Please note that the Giellatekno resources are dynamic in nature. To make sure you have the most recent version, please contact Giellatekno (see Contact Info in metadata).
dc.language.iso	fkv
dc.publisher	Giellatekno - Saami Language Technology
dc.rights	Creative Commons - Attribution 3.0 Unported (CC BY 3.0)
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/
dc.rights.label	CC
dc.source.uri	http://giellatekno.uit.no/index.eng.html
dc.subject	Kven
dc.subject	Ngram
dc.subject	1-gram
dc.subject	2-gram
dc.subject	3-gram
dc.subject	Language Model
dc.title	Kven N-grams
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	text
has.files	yes
branding	Clarino
demo.uri	http://gtweb.uit.no/f_korp
contact.person	Trond Trosterud trond.trosterud@uit.no Giellatekno - Saami Language Technology
size.info	25961 unigrams
size.info	78497 bigrams
size.info	27690 trigrams
files.size	1068154
files.count	1

This item is distributed under Creative Commons and licensed under: Creative Commons - Attribution 3.0 Unported (CC BY 3.0)

Files in this item

Name: SIKOR_fkv_20150830.lm.zip
Size: 1.02 MB
Format: application/zip
