CLARINO Bergen Centre
http://hdl.handle.net/11509/2
2024-03-16T17:31:45Z
WAB XML transcriptions of Wittgenstein's Nachlass > 2nd subset of 15000 pages with restricted license
http://hdl.handle.net/11509/149
Wittgenstein, Ludwig; The Wittgenstein Archives at the University of Bergen (WAB)
During his lifetime, the Austrian-British philosopher Ludwig Wittgenstein (1889–1951) published only one philosophical book, the Logisch-philosophische Abhandlung / Tractatus logico-philosophicus (1921/22), and the Dictionary for Elementary Schools (1926). However, on his death in 1951, he left behind a substantial corpus of some 20,000 pages of unpublished philosophical notebooks, manuscripts, typescripts and dictations. This corpus is called "Wittgenstein's Nachlass".
The Wittgenstein Archives at the University of Bergen (WAB, http://wab.uib.no/) was established in 1990 and has produced a machine-readable version of Wittgenstein's Nachlass in the form of facsimiles and transcriptions. At present the transcriptions are maintained in XML TEI format.
In terms of licensing, WAB's transcriptions of the Wittgenstein Nachlass are organized in two sub-parts under two different licenses. Please note that the sub-part made available here (the part that was not already made available in the 1st subset) is licensed under the restricted license Clarin ACA-NC-NORED.
Two sets of files are made available: one with the character entity encodings already converted, the other with the encodings retained. For example, in set 2a the entity "&p.es;" (period at the end of a sentence) has already been converted to ".".
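The difference between the two sets can be illustrated with a small conversion step. The Python sketch below is hypothetical and is not WAB's own tooling: apart from "&p.es;" to ".", which is documented above, the entity table is an assumption, and any further mappings would have to be taken from WAB's own entity declarations. It works on the raw text rather than through an XML parser, since a parser will typically reject an entity reference that has not been declared.

```python
import re
from typing import Mapping

# Entity name -> replacement text. Only "p.es" (period at the end of a sentence)
# is documented in the dataset description; any other entries are assumptions.
KNOWN_ENTITIES: Mapping[str, str] = {"p.es": "."}

# Named entity references such as "&p.es;" (WAB-style names may contain dots).
ENTITY_RE = re.compile(r"&([A-Za-z_][\w.\-]*);")


def convert_entities(text: str, table: Mapping[str, str] = KNOWN_ENTITIES) -> str:
    """Replace entity references listed in `table`; leave every other reference,
    including standard XML ones such as &amp;, exactly as it is."""

    def substitute(match: re.Match) -> str:
        # Fall back to the original reference text when the name is unknown.
        return table.get(match.group(1), match.group(0))

    return ENTITY_RE.sub(substitute, text)


if __name__ == "__main__":
    print(convert_entities("Satz&p.es; a &amp; b"))  # Satz. a &amp; b
```

Leaving unknown references untouched keeps the output safe to re-parse as XML, at the cost of silently skipping entities missing from the table.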
For HTML transformations of WAB's XML transcriptions, visit the Wittgenstein Source Bergen Nachlass Edition (BNE) http://www.wittgensteinsource.org/ (static outputs) or http://wittgensteinonline.no/ (interactive dynamic presentation).
Copyright holders: The Master and Fellows of Trinity College, Cambridge; University of Bergen, Bergen
2022-02-15T00:00:00Z
Wittgenstein Archives at the University of Bergen (WAB): WiTTLex - The WiTTFind Lexicon of Wittgenstein’s Nachlass, with Frequency Lists and Indication of the Words’ Sources in the Nachlass
http://hdl.handle.net/11509/148
Wittgenstein, Ludwig; The Wittgenstein Archives at the University of Bergen (WAB); The Center for Information and Language Processing at the Ludwig-Maximilians-University in Munich (CIS); Guenthner, Franz; Röhrer, Ines; Pichler, Alois; Hadersbeck, Max
WiTTLex - The WiTTFind Lexicon of Wittgenstein’s Philosophical Nachlass, with Frequency Lists and Indication of the Words’ Sources in the Nachlass
WiTTLex is an electronic dictionary of Wittgenstein's Nachlass. It is the fruit of a long-lasting cooperation between WAB and CIS on applying the CISLEX dictionary system developed by Franz Guenthner to WAB’s transcriptions of the Wittgenstein Nachlass and on creating the Wittgenstein search engine WiTTFind. WiTTLex was made possible by contributions from, and cooperation between:
- Franz Guenthner (CISLEX)
- Staff at CIS (coordinated by Max Hadersbeck)
- Staff at WAB (coordinated by Alois Pichler)
- Staff at the University of Bergen Library (Øyvind Liland Gjesdal)
- Students at CIS (coordinated by Max Hadersbeck)
WiTTLex was prepared for publication by Ines Röhrer (CIS) and Alois Pichler (WAB).
Rights holders: Trinity College Cambridge, University of Bergen, LMU Munich, Franz Guenthner (CISLEX)
2021-06-30T00:00:00Z
Norwegian Sign Language Corpus – Pilot Corpus (Conversations)
http://hdl.handle.net/11509/147
Ferrara, Lindsay; Bø, Vibeke
The Norwegian Sign Language Corpus is a collection of four datasets, collected at different times and for different projects:
-- The first dataset was collected as part of a doctoral research project in 2007 (Halvorsen, 2012).
-- The second dataset was collected in 2015 as part of a pilot Norwegian Sign Language Corpus project.
-- The third dataset was collected in 2017-2018 for a project investigating visual perspective in spatial language.
-- The fourth dataset is currently being collected (2019-2024) with the aim of establishing a larger and more representative corpus for Norwegian Sign Language, in order to investigate the semiotic diversity of signed interactions.
Currently, the first three datasets are archived in CLARINO as four items according to project and license:
-- Norwegian Sign Language Corpus – Halvorsen (2012) https://repo.clarino.uib.no/xmlui/handle/11509/141
-- Norwegian Sign Language Corpus – Depicting Perspective https://repo.clarino.uib.no/xmlui/handle/11509/144
-- Norwegian Sign Language Corpus – Pilot Corpus (Narratives, Monologues)
-- Norwegian Sign Language Corpus – Pilot Corpus (Conversations)
Each deposit contains data in the form of video-recordings and metadata files. These video-recordings are being annotated in ELAN according to the Norwegian Corpus Annotation Guidelines. Annotated ELAN files are archived elsewhere (see project publications for details). See the Corpus project’s website for additional information and details. For other questions or to receive a current version of the annotation guidelines, please contact the Corpus manager (currently Lindsay Ferrara, NTNU).
-------------------------------------------------------------------------------------------------------
Specific summary for this dataset: Norwegian Sign Language Corpus – Pilot Corpus (Conversations)
License: CLARIN RES (CLARIN RES+PLAN+BY+NC+INF+PRIV+NORED+ND)
*This dataset comes with a CLARIN restricted license that contains a number of conditions. See the file ‘PilotConvos_LicenseRestrictions.rtf’ for full details. This license requires that a research plan be submitted to the Corpus manager explaining how the data will be used. By accepting this license, you are stating that you have received access permission from the Corpus manager. Please also note that this content is available for non-commercial purposes only.
In 2014, the Norwegian Ministry of Culture awarded funding to Vibeke Bø at OsloMet for a pilot corpus project. The aim of this pilot project was to film a small number of elderly signers, both to introduce the community and researchers to signed language corpus work and research and to begin documenting the language practices of this cohort of the deaf community. Colleagues from OsloMet and Lindsay Ferrara from NTNU came together for the project, and seven elderly deaf signers from Oslo, Bergen, and Trondheim were filmed together in various constellations doing a variety of language-based activities.
The signers were contacted and invited through personal networks. Data were collected over two days on the OsloMet campus in Oslo, so travel was arranged for participants living outside Oslo. After receiving information about the project, giving consent, and filling out a background questionnaire, the participants engaged in a variety of activities with other participants and/or one or two deaf members of the project team (Lise Marie Nyberg and Odd-Inge Schröder). The activities archived in this deposit include:
• Conversations between a participant and a deaf interlocutor from the project team about issues relevant to the deaf community
• Group conversations with participants from each city, plus one or two deaf members of the project team
• Conversation with all participants and two deaf members of the project team (total 9 signers)
Supervisuell, a deaf-run film company, was hired to manage the filming during data collection. Between one and four cameras were used to record the participants as they engaged in the various activities. Ingvild Wilson Skjong also provided technical and administrative support during data collection.
2022-04-28T00:00:00Z
ELMCIP Electronic Literature Knowledge Base: Critical Writing
http://hdl.handle.net/11509/146
ELMCIP project - Electronic Literature as a Model of Creativity and Innovation in Practice
The ELMCIP Critical Writing database includes monographs, book chapters, journal articles, reviews, etc. written about electronic literature or referenced in electronic literature criticism, as well as non-traditional forms of scholarly discourse, such as video interviews, documentaries and webtexts about electronic literature. The title is the name of the specific work described in the record (e.g. the title of an article if the record is for an article in a journal or a book). Column titles in the data correspond to the data fields of the ELMCIP database; this dataset is an extract of selected fields from that database.
See for example this object https://elmcip.net/node/409.
An enclosed PDF document shows the complete set of ELMCIP data fields and their interrelations.
See also: ELMCIP Electronic Literature Knowledge Base: Creative Works: http://hdl.handle.net/11509/145
Some nodes in Creative Works may correspond to nodes in the dataset Critical Writing.
Note that individual entries may have further specifications of licence.
The data are deposited in the CLARINO Bergen Centre Repository as part of the infrastructure upgrade project Clarino+.
Developing a Network-Based Creative Community: Electronic Literature as a Model of Creativity and Innovation in Practice (ELMCIP) was a three-year (June 2010-June 2013) collaborative research project funded by HERA, the Humanities in the European Research Area framework, sponsored by EU FP7 and the national research councils of the countries participating in the framework. The project involved researchers from seven institutions in six European countries, who together produced seven events, including seminars, workshops, and the Remediating the Social conference and exhibition. The ELMCIP Electronic Literature Knowledge Base has continued to be developed since the end of the project and is still being updated with new data.
See: https://elmcip.net/sites/default/files/media/critical_writing/attachments/rettberg_bootstrapping.pdf
2022-07-08T00:00:00Z
ELMCIP Electronic Literature Knowledge Base: Creative Works
http://hdl.handle.net/11509/145
ELMCIP project - Electronic Literature as a Model of Creativity and Innovation in Practice
The ELMCIP Creative Works database contains works of electronic literature, digital literary art, and print antecedents. Column titles in the data correspond to the data fields of the ELMCIP database; this dataset is an extract of selected fields from that database.
See for example this object https://elmcip.net/node/409.
An enclosed PDF document shows the complete set of ELMCIP data fields and their interrelations.
Some nodes in Creative Works may correspond to nodes in the dataset Critical Writing.
Note that individual entries may have further specifications of licence.
The data are deposited in the CLARINO Bergen Centre Repository as part of the infrastructure upgrade project Clarino+. See also the ELMCIP Critical Writing dataset: http://hdl.handle.net/11509/146
Developing a Network-Based Creative Community: Electronic Literature as a Model of Creativity and Innovation in Practice (ELMCIP) was a three-year (June 2010-June 2013) collaborative research project funded by HERA, the Humanities in the European Research Area framework, sponsored by EU FP7 and the national research councils of the countries participating in the framework. The project involved researchers from seven institutions in six European countries, who together produced seven events, including seminars, workshops, and the Remediating the Social conference and exhibition. The ELMCIP Electronic Literature Knowledge Base has continued to be developed since the end of the project and is still being updated with new data.
See: https://elmcip.net/sites/default/files/media/critical_writing/attachments/rettberg_bootstrapping.pdf
2022-07-08T00:00:00Z
Norwegian Sign Language corpus – Depicting Perspective
http://hdl.handle.net/11509/144
Ferrara, Lindsay; Ringsø, Torill
The Norwegian Sign Language Corpus is a collection of four datasets, collected at different times and for different projects:
-- The first dataset was collected as part of a doctoral research project in 2007 (Halvorsen, 2012).
-- The second dataset was collected in 2015 as part of a pilot Norwegian Sign Language Corpus project.
-- The third dataset was collected in 2017-2018 for a project investigating visual perspective in spatial language.
-- The fourth dataset is currently being collected (2019-2024) with the aim of establishing a larger and more representative corpus for Norwegian Sign Language, in order to investigate the semiotic diversity of signed interactions.
Currently, the first three datasets are archived in CLARINO as four items according to project and license:
-- Norwegian Sign Language Corpus – Halvorsen (2012) https://repo.clarino.uib.no/xmlui/handle/11509/141
-- Norwegian Sign Language Corpus – Depicting Perspective
-- Norwegian Sign Language Corpus – Pilot Corpus (Narratives, Monologues)
-- Norwegian Sign Language Corpus – Pilot Corpus (Conversations)
Each deposit contains data in the form of video-recordings and metadata files. These video-recordings are being annotated in ELAN according to the Norwegian Corpus Annotation Guidelines. Annotated ELAN files are archived elsewhere (see project publications for details). See the Corpus project’s website for additional information and details. For other questions or to receive a current version of the annotation guidelines, please contact the Corpus manager (currently Lindsay Ferrara, NTNU).
-------------------------------------------------------------------------------------------------------
Specific summary for this dataset: Norwegian Sign Language Corpus – Depicting Perspective
License: CLARIN RES (CLARIN RES+PLAN+BY+NC+INF+PRIV+NORED+ND)
*This dataset comes with a CLARIN restricted license that contains a number of conditions. See the file ‘DPNTS_LicenseRestrictions.rtf’ for full details. This license requires that a research plan be submitted to the Corpus manager explaining how the data will be used. By accepting this license, you are stating that you have received access permission from the Corpus manager. Please also note that this content is available for non-commercial purposes only.
**Some data in this dataset come with additional restrictions and are thus not included in the archived data here. Please contact the corpus manager to request access to these files.
This dataset was collected as a follow-up to a project on L2 signing (Ferrara & Nilsson, 2017; Ferrara, 2019). The focus of this new project was on how signers establish and maintain different visual perspectives (see Ferrara & Ringsø, 2019, for initial findings from the project). Data collection was carried out in three Norwegian cities (Oslo, Bergen, and Trondheim) and involved 21 young and middle-aged deaf signers. The deaf participants were recruited through personal and professional networks and were recorded in conversations with one or two other interlocutors. Each conversation involved one interlocutor from the project team. In some cases, this interlocutor was a hearing native signer; in others, a deaf native signer. Although the conversations were meant to be as naturalistic as possible, the project-team interlocutor was to take any opportunity that arose to raise topics with spatial relevance. Each session began with information about the project, after which the participants filled in a background questionnaire and signed a consent form. Each video-recorded conversation then lasted approximately 30-40 minutes. Depending on the number of participants and the space available, one or two cameras were used. In total, 13 conversations, amounting to 7.5 hours, were video-recorded.
Ferrara, L. (2019). Coordinating signs and eyegaze in the depiction of directions and spatial scenes by fluent and L2 signers of Norwegian Sign Language. Spatial Cognition and Computation: An Interdisciplinary Journal, 19(3), 220-251. https://doi.org/10.1080/13875868.2019.1572151
Ferrara, L., & Nilsson, A.-L. (2017). Describing spatial layouts as an M2 signed language learner. Sign Language and Linguistics, 20(1), 1-26. https://doi.org/10.1075/sll.20.1.01fer
Ferrara, L., & Ringsø, T. (2019). Spatial vantage points in Norwegian Sign Language. Open Linguistics, 5, 583-600. https://doi.org/10.1515/opli-2019-0032
2021-12-22T00:00:00Z
WAB XML transcriptions of Wittgenstein's Nachlass > 1st subset of 5000 pages with license CC BY-NC 3.0
http://hdl.handle.net/11509/143
Wittgenstein, Ludwig; The Wittgenstein Archives at the University of Bergen (WAB)
During his lifetime, the Austrian-British philosopher Ludwig Wittgenstein (1889–1951) published only one philosophical book, the Logisch-philosophische Abhandlung / Tractatus logico-philosophicus (1921/22), and the Dictionary for Elementary Schools (1926). However, on his death in 1951, he left behind a substantial corpus of some 20,000 pages of unpublished philosophical notebooks, manuscripts, typescripts and dictations. This corpus is called "Wittgenstein's Nachlass".
The Wittgenstein Archives at the University of Bergen (WAB, http://wab.uib.no/) was established in 1990 and has produced a machine-readable version of Wittgenstein's Nachlass in the form of facsimiles and transcriptions. At present the transcriptions are maintained in XML TEI format.
In terms of licensing, WAB's transcriptions of the Wittgenstein Nachlass are organized in two sub-parts under two different licenses. The sub-part made available here is licensed under CCPL BY-NC 3.0. It contains Wittgenstein Nachlass items Ts-201a1, Ts-201a2, Ms-139a, Ts-207, Ms-114, Ms-115, Ms-153a, Ms-153b, Ms-154, Ms-155, Ms-156a, Ms-148, Ms-149, Ms-150, Ts-212, Ts-213, Ms-141, Ms-152 and Ts-310, amounting in total to ca. 5,000 pages of the Nachlass. This part was made available under a CCPL BY-NC license within the framework of the European projects Digital Semantic Corpora for Virtual Research in Philosophy (Discovery, 2006-09) and Open Scholarly Communities on the Web (COST A32, 2006-10).
Two sets of files are made available: one with the character entity encodings already converted, the other with the encodings retained. For example, in set 1a the entity "&p.es;" (period at the end of a sentence) has already been converted to ".".
For HTML transformations of WAB's XML transcriptions, visit the Wittgenstein Source Bergen Nachlass Edition (BNE) http://www.wittgensteinsource.org/ (static outputs) or http://wittgensteinonline.no/ (interactive dynamic presentation).
Copyright holders: The Master and Fellows of Trinity College, Cambridge; University of Bergen, Bergen
2022-02-14T00:00:00Z
[MCSQ]: The Multilingual Corpus of Survey Questionnaires
http://hdl.handle.net/11509/142
Zavala Rojas, Diana; Sorato, Danielly; Hareide, Lidun; Hofland, Knut
The Multilingual Corpus of Survey Questionnaires (MCSQ) is the first publicly available multilingual database of international survey texts. Its latest version (Rosalind Franklin) comprises 306 distinct questionnaires with approximately 766,000 sentences and adds new annotations and datasets to the corpus.
The MCSQ is compiled from the following surveys and their versions:
- Questionnaires from the ESS, EVS, SHARE and WageIndicator surveys of different years:
European Social Survey (ESS): Round 1, Round 2, Round 3, Round 4, Round 5, Round 6, Round 7, Round 8, Round 9
Survey of Health, Ageing and Retirement in Europe (SHARE): Round 7, Round 8 and COVID-19 questionnaire
European Values Study (EVS): Wave 2, Wave 3, Wave 4, Wave 5
WageIndicator (WIS): Round 1 and COVID-19 questionnaire
- Texts of the questionnaires in 9 languages: source language (English) and their translations into Catalan, Czech, French, German, Norwegian, Portuguese, Spanish, and Russian
- Part-of-Speech and named entity recognition (NER) annotated texts (annotation conducted automatically)
- Texts of the questionnaire translations (Catalan, Czech, French, German, Norwegian, Portuguese, Spanish, and Russian) sentence-aligned with respect to the source
Metadata were added to the corpus by assigning segment-level variables (e.g., survey item ID, item name). A survey item can be decomposed into introduction, instruction, request, and response segments, referred to as 'item types' in the MCSQ.
Additionally, the modules and item names, the year, the country and language, and the wave/round of the surveys are available as metadata.
Each survey is uniquely identified by a survey identifier (ID) that follows this nomenclature: SSS_RRR_YYYY_LLL_CC, where SSS is a 3-character code for the name of the survey project (ESS, EVS, WIS, or SHARE), RRR is a 3-character code for the round/wave of the survey, YYYY is a 4-digit code for the year, and LLL and CC are the ISO 639-2/B three-character and ISO 3166 Alpha-2 two-character codes for the language and the country of the survey. For instance, the English-language questionnaire for Great Britain from ESS Round 3 (2006) has the survey ID ESS_R03_2006_ENG_GB. The same base nomenclature is used to uniquely identify each sentence of a given survey, i.e. the survey item identifier (ID): a sequential number i identifying each text segment in a questionnaire is appended to the survey ID (SSS_RRR_YYYY_LLL_CC_i). For instance, the survey item ID for the eleventh sentence of ESS_R03_2006_ENG_GB is ESS_R03_2006_ENG_GB_10, since the IDs start from 0.
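The survey ID and survey item ID scheme can be illustrated with a small parser. This is a hypothetical Python sketch, not part of the MCSQ distribution, and the class and function names are illustrative only. It splits on underscores and checks the four-digit year, but does not validate project, round, language or country codes against the MCSQ lists or the ISO standards.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class McsqId:
    survey: str                 # survey project, e.g. "ESS"
    round_code: str             # round/wave code, e.g. "R03"
    year: int                   # year, e.g. 2006
    language: str               # ISO 639-2/B code, e.g. "ENG"
    country: str                # ISO 3166 Alpha-2 code, e.g. "GB"
    item: Optional[int] = None  # 0-based segment number; None for a survey ID

    def survey_id(self) -> str:
        """Survey ID (SSS_RRR_YYYY_LLL_CC) without any item suffix."""
        return f"{self.survey}_{self.round_code}_{self.year:04d}_{self.language}_{self.country}"

    def __str__(self) -> str:
        base = self.survey_id()
        return base if self.item is None else f"{base}_{self.item}"


def parse_mcsq_id(text: str) -> McsqId:
    """Parse a survey ID (5 fields) or a survey item ID (6 fields)."""
    parts = text.split("_")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 underscore-separated fields: {text!r}")
    survey, round_code, year, language, country = parts[:5]
    if not (len(year) == 4 and year.isdigit()):
        raise ValueError(f"year must be four digits: {year!r}")
    item = int(parts[5]) if len(parts) == 6 else None
    return McsqId(survey, round_code, int(year), language, country, item)


def nth_sentence_id(survey_id: str, n: int) -> str:
    """Item ID of the n-th sentence (n counted from 1), since item numbers start at 0."""
    parsed = parse_mcsq_id(survey_id)
    if parsed.item is not None:
        raise ValueError("expected a survey ID, not a survey item ID")
    if n < 1:
        raise ValueError("n must be at least 1")
    return str(replace(parsed, item=n - 1))


if __name__ == "__main__":
    item = parse_mcsq_id("ESS_R03_2006_ENG_GB_10")
    print(item.survey_id(), item.item)                # ESS_R03_2006_ENG_GB 10
    print(nth_sentence_id("ESS_R03_2006_ENG_GB", 11))  # ESS_R03_2006_ENG_GB_10
```

The off-by-one between "eleventh sentence" and suffix 10 is kept in one place (`nth_sentence_id`) so that callers counting from 1 do not have to remember the 0-based numbering.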
2021-09-01T00:00:00Z
Norwegian Sign Language Corpus - Halvorsen (2012)
http://hdl.handle.net/11509/141
Ferrara, Lindsay; Halvorsen, Rolf Piene
The Norwegian Sign Language Corpus is a collection of four datasets, collected at different times and for different projects:
– The first dataset was collected as part of a doctoral research project in 2007 (Halvorsen, 2012).
– The second dataset was collected in 2015 as part of a pilot Norwegian Sign Language Corpus project.
– The third dataset was collected in 2017-2018 for a project investigating visual perspective in spatial language.
– The fourth dataset is currently being collected (2019-2024) with the aim of establishing a larger and more representative corpus for Norwegian Sign Language, in order to investigate the semiotic diversity of signed interactions.
Currently, the first three datasets are archived in CLARINO as four items according to project and license:
– Norwegian Sign Language Corpus – Halvorsen (2012)
– Norwegian Sign Language Corpus – Depicting Perspective https://repo.clarino.uib.no/xmlui/handle/11509/144
– Norwegian Sign Language Corpus – Pilot Corpus (Narratives, Monologues)
– Norwegian Sign Language Corpus – Pilot Corpus (Conversations)
Each deposit contains data in the form of video-recordings and metadata files. These video-recordings are being annotated in ELAN according to the Norwegian Corpus Annotation Guidelines. Annotated ELAN files are archived elsewhere (see project publications for details). See the Corpus project’s website for additional information and details. For other questions or to receive a current version of the annotation guidelines, please contact the Corpus manager (currently Lindsay Ferrara, NTNU).
--------------------------------------------------------------------------------------------------------------
Specific summary for this dataset: Norwegian Sign Language Corpus – Halvorsen (2012)
License: CC BY-NC-SA 4.0, https://creativecommons.org/licenses/by-nc-sa/4.0/
The data were collected in 2007 for a doctoral research project on boundary markers in Norwegian Sign Language (Halvorsen, 2012). Four signers were filmed: two men and two women, including both younger and older signers. All are deaf and have deaf parents, siblings, or other deaf family members. They live in central Eastern Norway, and all attended the deaf school in that area.
The signers were asked to retell the children’s picture book “Frog, Where Are You?” (Mayer, 1969) and to answer the question “What happened on 9/11 and what did you do?” The recordings were made in a studio, and the sessions were led by an adult deaf man who is an L1 signer of Norwegian Sign Language. No one else was present during the recordings. In total, there are eight video clips totaling about 18 minutes.
---------------------
The data were collected as material for a PhD study of boundary markers in Norwegian Sign Language (Halvorsen, 2012). There are four participants, two men and two women, both younger and older. All are deaf, with deaf parents, siblings and other deaf family members. The participants are from central Eastern Norway (Østlandet), and all attended a deaf school in that same area.
The participants retold the picture story «Frog, where are you?» (Mayer, 1969) and answered the question «What happened on 11 September and what did you do?». The recordings were made in a studio and were led by an adult deaf man with Norwegian Sign Language as his first language. No one else was present during the recordings. There are eight narratives totaling just under 20 minutes.
2021-10-07T00:00:00Z
Randomized extraction of the New Norwegian corpus
http://hdl.handle.net/11509/140
Gammeltoft, Peder
Randomized extraction of the New Norwegian Corpus (Nynorskkorpuset).
Contains sentences in New Norwegian (Nynorsk) from the year 2000 onwards. The data are tab-separated, with one word per line, lemmatized and morphologically tagged, and include year and domain information. Annotation was done with the Oslo-Bergen Tagger. Sentences in the Bokmål standard have been removed.
This corpus is intended for use in the development of language technology.
Size: 3.3 million sentences, 57.5 million words.
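A reader for this tab-separated, one-word-per-line format might look like the Python sketch below. It is hypothetical: the description confirms lemma, morphological tag, year and domain information, but not the column order or how sentence boundaries are marked, so both the `DEFAULT_COLUMNS` tuple and the blank-line sentence separator are assumptions to check against the actual files.

```python
import io
from typing import Dict, Iterator, List, Sequence, TextIO

# ASSUMED column order; verify against the corpus files before relying on it.
DEFAULT_COLUMNS = ("word", "lemma", "tag", "year", "domain")

Token = Dict[str, str]


def read_sentences(
    stream: TextIO, columns: Sequence[str] = DEFAULT_COLUMNS
) -> Iterator[List[Token]]:
    """Yield sentences as lists of token dicts from one-word-per-line TSV.

    ASSUMPTION: a blank line separates sentences.
    """
    sentence: List[Token] = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        # zip() stops at the shorter sequence, so extra or missing fields are
        # silently dropped; add a length check if the layout is uncertain.
        sentence.append(dict(zip(columns, line.split("\t"))))
    if sentence:  # the last sentence may not be followed by a blank line
        yield sentence


if __name__ == "__main__":
    # Made-up sample lines in the assumed layout, not taken from the corpus.
    sample = "Eg\teg\tpron\t2001\tavis\ngjekk\tgå\tverb\t2001\tavis\n\nHei\thei\tinterj\t2003\tblogg\n"
    for sent in read_sentences(io.StringIO(sample)):
        print(" ".join(tok["word"] for tok in sent))
```

Reading line by line as a generator keeps memory use flat, which matters for a file of tens of millions of tokens.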
2021-01-29T00:00:00Z