LiLa at UiB

Bilingualism and the status of the Russian language in Ukraine.

Friday March 21, 2014 at 14:15 – 16:00 (HF: 217)

Marta Szytmaniuk (Gdansk, Poland)

The aim of the lecture is to present the problem of bilingualism and the status of the Russian language in Ukraine at the turn of the 20th and 21st centuries. First, I will focus on the specific historical conditions that shaped Ukrainian bilingualism. Second, I will introduce the main points of the language policy and analyze the language of the media in Ukraine. I will also share my research results. The data for this study were collected through a survey that I created and distributed to Ukrainians via various social media in 2011/2012. Both the theoretical and the practical parts show that Ukraine is not moving in the right direction in terms of language policy and is not building the prestige of the national language and identity.

Language research infrastructures: a status report

Friday March 7, 2014 at 14:14 – 15:45

Koenraad De Smedt, research group LaMoRe (Language Models and Resources), UiB

Since CLARIN was placed on the ESFRI roadmap in 2006, infrastructure for language research has been a current objective in Europe, including in Norway. CLARIN (Common Language Resources and Technology Infrastructure) has since become an ERIC (European Research Infrastructure Consortium), in which Norway has become an observer with CLARINO as its national project. UiB is the coordinator of CLARINO. The talk will outline the status of CLARINO. At the same time, CLARIN has links to the so-called ‘cluster’ project DASISH and to the even broader project EUDAT. A varied landscape has gradually emerged in which different actors contribute strategies and methods for better management and accessibility of linguistic research resources such as text corpora, lexical and terminological databases, annotated literary texts, historical archives, etc. Several issues concerning the use and reuse of language data remain open, including legal and ethical aspects (the lack of ‘fair use’ in Europe), metadata formats, and responsibility for long-term storage and operation.

German nominal compounds in Statistical Machine Translation tasks into Spanish

Friday February 28, 2014 at 14:15 – 16:00 (HF: 217)

Compounds in Germanic languages such as German or Norwegian pose a challenge for many Natural Language Processing (NLP) applications: because they can be coined on the fly, they need to be detected, disambiguated and processed successfully along with the other words in the text.

The common state-of-the-art strategy for dealing with new, non-lexicalized compounds is to split them into their constituents in order to avoid data sparsity problems. This approach has also proven successful in Statistical Machine Translation (SMT), as reported by Koehn and Knight (2003), Popović et al. (2006), Stymne (2008), Fritzinger and Fraser (2010) and Stymne et al. (2013). However, all of these experiments involved language pairs consisting of a Germanic language (mainly German, but also Swedish, Danish and Norwegian) and English. I have focused on the statistical machine translation of German nominal compounds into Spanish. Because Spanish is a morphologically rich language, the state-of-the-art strategy of simply splitting the compounds does not work as well as it does for English, and alternative solutions are needed.
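As an illustration of the splitting strategy, the following is a minimal sketch of one frequency-based variant, in the spirit of the approach reported by Koehn and Knight (2003): a compound is split into known words when the geometric mean of the parts' corpus frequencies exceeds the frequency of the unsplit word. The frequency table, the list of linking elements and the names `FREQ`, `LINKERS`, `candidate_splits` and `best_split` are assumptions made for this example; it is not the setup used in the experiments presented in the talk.

```python
from math import prod

# Hypothetical corpus frequencies, invented for illustration only.
FREQ = {"arbeit": 900, "plan": 700, "markt": 800, "arbeitsmarkt": 2000}

# A few German linking elements; "" means the parts are simply concatenated.
LINKERS = ("", "s", "es")
MIN_PART_LEN = 3  # ignore very short parts


def candidate_splits(word):
    """Yield the unsplit word and every split whose non-final parts are known."""
    yield [word]
    for i in range(MIN_PART_LEN, len(word) - MIN_PART_LEN + 1):
        head = word[:i]
        if head not in FREQ:
            continue
        for linker in LINKERS:
            # The linking element, if any, must directly follow the head.
            if not word.startswith(linker, i):
                continue
            rest = word[i + len(linker):]
            if len(rest) < MIN_PART_LEN:
                continue
            for tail in candidate_splits(rest):
                yield [head] + tail


def score(parts):
    """Geometric mean of the parts' frequencies (unknown parts count as 0)."""
    return prod(FREQ.get(p, 0) for p in parts) ** (1 / len(parts))


def best_split(word):
    """Return the highest-scoring segmentation of a lowercased word."""
    return max(candidate_splits(word), key=score)


print(best_split("arbeitsplan"))   # unseen compound
print(best_split("arbeitsmarkt"))  # frequent, lexicalized compound
```

With these toy counts, the unseen compound is split into its two known parts, while the frequent compound is left intact because its own frequency is higher than the geometric mean of its parts.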

In this presentation, I will show the results of the experiments I carried out during my secondment at RWTH Aachen University in Germany, using both the state-of-the-art strategy and an alternative approach. I will also briefly present the work I am currently doing to incorporate my findings and achieve a better outcome.

By Carla Parra Escartín, PhD Candidate, Research Group: Language Models and Resources, LLE

LFG parse disambiguation for Wolof

Friday February 14, 2014 at 14:15 – 16:00 (HF: 217)

In this presentation, I will discuss ambiguity issues that arose during the design and implementation of a large-scale computational LFG grammar for Wolof (a Niger-Congo language mainly spoken in Senegal, ca. 10 million speakers). The LFG grammar has been developed as part of my PhD project, which deals with building language resources and tools for Wolof. The grammar is implemented with XLE and tested on natural language data.

In the course of grammar development, many ambiguity issues arose from alternative definitions of morphological and lexical entries and from syntactic ambiguities. The discussion will therefore focus on three kinds of ambiguity: 1) morphological, 2) lexical and 3) syntactic. I will show how Wolof nouns constitute a typical source of ambiguity due to polysemy and homonymy. I will argue that Wolof nouns exhibit noun class underspecification, which should be considered a case of feature indeterminacy (Dalrymple, 2009). Following on from this, I will provide evidence that the noun class structures show similarities with the CASE underspecification phenomena that have been observed for German (Dalrymple, 2009).

Concerning issues related to lexical and syntactic ambiguities, I will discuss disambiguation methods based on Constraint Grammar and the c-structure pruning algorithm of XLE as possible models to increase parsing efficiency and performance. I will describe experiments conducted on Wolof and discuss the benefits of applying such methods to control/reduce ambiguity in the LFG grammar.

By Cheikh Bamba Dione, PhD Candidate, Research Group: Language Models and Resources, LLE

Non-canonical case marking and modality: The case of the Indo-European “gerundive + dative” construction

Friday January 24, 2014 at 14:15 – 16:00 (HF: 217)

In this presentation, I analyze gerundives in combination with the so-called “dative of agent” in six different Indo-European languages, namely Sanskrit, Avestan, Ancient Greek, Latin, Tocharian, and Lithuanian. Consider the following examples from Ancient Greek and Latin:

hēmîn … pánta poiētéa (Ancient Greek)

us:DAT all:ADJ.NEUT.PL.NOM to-be-done:NEUT.PL.

‘Everything must be done by us’ (Xen. An. 3, 1, 35)

desperanda tibi … concordia (Latin)

to-be-despaired:NOM you:DAT harmony:NOM

‘You must not despair of harmony’ (Iuv. 6.231)

The gerundives poiētéa and desperanda qualify an entity that should undergo the event expressed by the verbal root from which these gerundives derive. The entity that stands in a patient relation to the gerundive is expressed in the nominative, while the entity carrying out the event is expressed in the dative.

Developing an idea originally suggested by Hettrich (1990: 64ff), I argue that this particular combination mirrors a construction of Indo-European inheritance. The proposal will be advanced with the aid of the theoretical framework of Construction Grammar, in which the basic unit of language is the Construction, i.e. a form–function correspondence, and no principled distinction between lexical items and complex syntactic structures is assumed. I will provide evidence that the structures investigated show similarities at a morpho-syntactic level (DAT – VB.ADJ (‘be’) – NOM), at a semantic level (modal meaning and low degree of transitivity), and also, to a certain extent, at an etymological level. In sum, they constitute form–meaning pairings, available as comparanda, as required by the Comparative Method, and can thus successfully be reconstructed for a common proto-stage.

With regard to the semantics of the construction, I will show that the notion of ‘participant-external modality’ (van der Auwera & Plungian 1998, Narrog 2010) and the modal meaning entailed by the gerundive are crucial for determining the argument structure of the construction. I will therefore argue that an analysis involving a dative subject, a nominative object and a modal reading, instead of the ‘agentive/passive’ reading, is better equipped to account for the “gerundive + dative” construction than the standard analysis.

Serena Danesi, Postdoc (NonCanCase Project), LLE, University of Bergen.

The Story of ‘Woe’

Friday January 17, 2014 at 14:15 – 16:00 (HF: 217)

In contrast to the received consensus in the historical-comparative linguistic community, we argue that syntactic reconstruction is both a plausible and a feasible enterprise. We illustrate this with an investigation of the syntactic behavior of *wai ‘woe’ across five subbranches of Indo-European, i.e. Indo-Iranian, Italic, Baltic, Slavic and Germanic. The adverbial interjection *wai ‘woe’ is found instantiating three different constructions, which we label: 1) the Bare Exclamative Construction, 2) the Dative Exclamative Construction, and 3) the Predicative Construction. We suggest that the Predicative Construction is archaic in the Indo-European languages, and that the Dative Exclamative Construction has developed from a focalized variant of the Predicative Construction, used in exclamatory contexts, since ‘woe’ is the quintessential candidate for being focused in situations of adversity. On the basis of the comparative evidence, all three constructions must be reconstructed for Proto-Indo-European, as well as a subject–verb construction, which determines the default word order properties between the subject and the verb, and finally a focus construction where focalized material occurs in first position. We couch our analysis within the formalism of Sign-Based Construction Grammar, establishing beyond doubt that syntactic reconstruction is a viable endeavor within historical-comparative linguistics.

Jóhanna Barðdal, Valgerður Bjarnadóttir, Serena Danesi, Tonya Kim Dewey, Thórhallur Eythórsson, Chiara Fedriani & Thomas Smitherman

Friday seminar: Silje Ragnhildstveit

Friday January 10, 2014 at 14:15 – 16:00 (HF: 217)

This talk presents results from the pilot study of my ongoing PhD project on gender agreement and mother-tongue influence (transfer) in Norwegian learner language. To uncover transfer, I compare the language use of learners whose first languages (L1) differ with respect to grammatical gender and agreement. The main hypothesis is that learners whose L1 lacks agreement (Vietnamese) will have greater problems with gender agreement than learners whose L1 has agreement (German, Spanish and English). This hypothesis stems from an earlier study of gender assignment and transfer, in which I found that Vietnamese learners had significantly more correct gender assignment than German, Dutch, Spanish and English learners.