
 
dc.contributor.author Velldal, Erik
dc.contributor.author Øvrelid, Lilja
dc.contributor.author Bergem, Eivind Alexander
dc.contributor.author Stadsnes, Cathrine
dc.contributor.author Touileb, Samia
dc.contributor.author Jørgensen, Fredrik
dc.date.accessioned 2017-10-25T08:43:26Z
dc.date.available 2017-10-25T08:43:26Z
dc.date.issued 2017-10-23
dc.identifier.uri http://hdl.handle.net/11509/124
dc.description While the NoReC dataset was primarily created for training and evaluating models for document-level sentiment analysis, many other use cases are of course possible. The corpus comprises more than 35,000 full-text reviews extracted from eight different major Norwegian news sources: Dagbladet, VG, Aftenposten, Bergens Tidende, Fædrelandsvennen, Stavanger Aftenblad, DinSide.no and P3.no. The reviews cover a range of different domains, including literature, movies, video games, restaurants, music and theater, in addition to product reviews across a range of categories. Each review is labeled with a manually assigned score of 1–6, as provided by the rating of the original author. The texts have been pre-processed using UDPipe and are distributed in the CoNLL-U format. However, we also provide HTML files with the raw texts. Documentation and an accompanying Python package are provided through the following git repository: https://github.com/ltgoslo/norec
dc.language.iso nno
dc.language.iso nob
dc.language.iso nor
dc.publisher Department of Informatics, University of Oslo
dc.rights Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0)
dc.rights.uri http://creativecommons.org/licenses/by-nc/3.0/
dc.rights.label CC
dc.source.uri https://github.com/ltgoslo/norec
dc.subject sentiment analysis
dc.subject opinion mining
dc.subject reviews
dc.subject news
dc.subject norwegian
dc.title NoReC: The Norwegian Review Corpus
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding Clarino
contact.person Erik Velldal erikve@ifi.uio.no Department of Informatics, University of Oslo
sponsor Research Council of Norway 270908 SANT: Sentiment Analysis for Norwegian Text nationalFunds
size.info 35194 articles
size.info 837914 sentences
size.info 14819248 tokens
size.info 35194 texts
files.size 230563840
files.count 1


 Files in this item

This item is distributed under Creative Commons and licensed under Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0): attribution required, noncommercial.
Name norec-1.0.0.tar.gz
Size 219.88 MB
Format application/gzip
Description NoReC: Norwegian Review Corpus (version 1.0.0)
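
Since the texts are distributed in the CoNLL-U format, the following is a minimal sketch of how such a file could be read with the Python standard library alone. It is not the accompanying Python package from the git repository, and it makes no assumption about the directory layout inside the archive. The function name `read_conllu` and the sample sentence are hypothetical and made up for illustration; only the ten tab-separated columns come from the CoNLL-U format itself.

```python
from typing import Dict, Iterable, Iterator, List

# The ten tab-separated columns defined by the CoNLL-U format.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]

# Made-up sample for illustration only; not taken from the corpus.
SAMPLE = (
    "# sent_id = 1\n"
    "# text = En god film .\n"
    "1\tEn\ten\tDET\t_\t_\t3\tdet\t_\t_\n"
    "2\tgod\tgod\tADJ\t_\t_\t3\tamod\t_\t_\n"
    "3\tfilm\tfilm\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
)


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of token dicts (hypothetical helper)."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line marks the end of a sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Comment lines carry sentence-level metadata; skipped here.
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            # Skip lines that do not have exactly ten columns.
            continue
        sentence.append(dict(zip(CONLLU_FIELDS, cols)))
    if sentence:
        # Emit a final sentence that is not followed by a blank line.
        yield sentence


if __name__ == "__main__":
    for sent in read_conllu(SAMPLE.splitlines()):
        print(" ".join(token["form"] for token in sent))
```

To read an unpacked file instead of the sample, an open file handle can be passed directly, for example `with open(path, encoding="utf-8") as f: sentences = list(read_conllu(f))`, where `path` is a placeholder for a CoNLL-U file from the archive. How review scores and other metadata are stored is described in the repository documentation, not here.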
