Proposal Body:
The Interactive Linguistics Databases Project for Lower-division Instruction and Student Research

Contents:

Introduction
How we can improve the educational materials to better meet our educational goals
Overall architecture
Examples by course
Linguistics 101--Introduction to Language
Linguistics 211--Languages of the World
Linguistics 290--Introduction to Linguistic Analysis
Materials applicable to all courses
Upper division applications
Graduate credit
Summary

Introduction

Historically, the Department of Linguistics has tended to concentrate its teaching efforts on undergraduate majors and graduate students. In the past few years, in keeping with College- and University-wide initiatives, Linguistics has been renewing and extending its commitment to the lower division/general education components of our program. This involves rethinking the Introduction to Linguistics course (LING 290), which in the past has been thought of as simultaneously a general education course and a foundational introduction for linguistics majors. It has also resulted in the expansion of lower-division courses which are not required components of the major, such as LING 211 Languages of the World and LING 150 Structure of English Words (which exists as a web-based course as well).

A major change in this direction is the separation of the general education and linguistics major functions of the Intro course. Linguistics is traditionally conceived of as constituting a core of analysis and theory concerning the universal principles and instantiations in individual languages of phonology (sound structure), morphology (word structure), syntax (sentence structure) and semantics, and a range of potentially interdisciplinary and applied topics dealing with the actual language behavior by human beings: sociolinguistics (social constraints on and implications of language use), psycholinguistics (language processing, child language acquisition), neurolinguistics (representation of language in the brain), writing systems, etc. These latter topics tend to be of more interest to non-specialists than the analytical techniques and formal principles of core linguistics. This is entirely appropriate. The well-educated citizen doesn't necessarily need to know the difference between ergative and active-stative case-marking patterns, but will be well-served by learning something about how and why people react positively or negatively to particular dialect features, or knowing the typical stages which a child goes through in learning its native language. Many people are interested in questions of what languages are related to one another and where they come from, without necessarily needing to know in detail the scientific principles by which these questions are answered.

In the past we have tried to include in our Intro course both an introduction to analysis and theory and a survey of other topics. Two years ago we introduced a new course, LING 101 Introduction to Language, intended primarily for non-majors; this is intended to introduce students to many of the areas of real-world experience to which linguistics is relevant, with less emphasis on phonological, morphological, and syntactic analysis. As of AY 1999-2000 the 290 course, renamed "Introduction to Linguistic Analysis", will be aimed more at potential linguistics majors and other students of language studies. With most of the interdisciplinary and applied topics covered in 101, 290 can then concentrate on the basic concepts of linguistic analysis. This proposal outlines an initiative which will allow significant improvements in both of these aspects of our undergraduate curriculum.

The courses with which this proposal is particularly concerned are LING 101, Introduction to Language, LING 211, Languages of the World, and LING 290, Introduction to Linguistic Analysis. These include the foundations of both our general education and specialized major programs. The Web facilities which we propose to develop will eventually be used throughout the UG curriculum.

While various faculty have made efforts to include more innovative types of assignments in all of these courses, a basic part of the syllabus remains the classic problem set. A typical linguistics problem involves a small set of data from one or a small number of languages, carefully chosen to explicate a particular point or provide practice in a particular analytical method. While the utility of such exercises cannot be denied, their fundamental artificiality imposes some limits on their pedagogical utility. Compared to what linguists actually do, the data problems which we typically give to beginning students are almost like crossword puzzles, with preselected and formatted data and a set of clues designed to lead the student to a particular solution. Actual research by linguistic scholars does not start with preselected data, carefully extracted from any larger linguistic context.

This proposal envisions the development of a Linguistics Website (The Interactive Linguistics Database) which will include a number of features, some of them using contemporary technology to present standard material in an improved format. But the most important feature will be a coded and searchable database of material from a broad range of languages, organized so as to allow instructors to design assignments with more resemblance to actual linguistic research.

Some examples will be discussed in subsequent sections. Most of our faculty conduct or have conducted research on relatively little-known languages, and thus have extensive primary source materials for a diverse range of languages. We intend to draw heavily on this material in creating the databases for this project. This will not only provide materials for instructional purposes, but will bring students directly into contact with both the materials and the results of past and current faculty research. We anticipate that this will further provide opportunities for students to involve themselves more directly and individually in faculty-directed research.

How we can improve the educational materials to better meet our educational goals

Given the limits of the traditional format (i.e. printed paper) there is a drastic limit on how much data can be physically provided to the student, and more crucially, on how much data the student can actually consider and manipulate for an assignment. Obviously, simply taking old problems and posting them on a website, for all the convenience which it may provide, makes no significant difference in the pedagogical limitations of the traditional format.

Consider a typical syntactic typology problem for this level. This problem is intended, among other things, to illustrate to students the phenomenon of word-order correlations. Specifically, languages like Tibetan or Japanese which place the verb at the end of the sentence also typically place prepositions (technically postpositions) after the noun rather than before. Conversely, languages like Irish or English which place the object after the verb typically have prepositions which precede the noun. However, no researcher would reach such a conclusion on the basis of the analysis of a few sentences from only two languages, which is what a problem like this invites the student to do. A more realistic illustration of how we investigate word order correlations would require the student to correlate facts about verb-object order, noun-preposition order, verb-auxiliary order, etc., from a more substantial sample of languages. Such assignments can be developed if the students have access to a database including extensive data from 15-20 languages, with search capacities. In short, this would permit the assignment of more realistic problems. At a lower-division level, students could generate a list of languages with OV order, another of languages with noun-preposition order, and a third of languages with auxiliaries preceding the verb, and determine to what extent these features do or do not correlate.

This would give students a more realistic sense of the statistical nature of many typological principles. In a traditional problem they analyze one or two languages which either do or don't conform to the standard generalizations. Students working with a larger sample can determine for themselves that the correlation between features is strong but not perfect, and that some correlations are stronger than others. While this sort of material is, of course, covered in lecture, there is clearly significant pedagogical value in giving students convenient access to the data and letting them watch the conclusions emerge from it.
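The kind of query described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual design: the feature names, the data layout, and the functions `languages_with` and `cross_tabulate` are hypothetical, and the codings are simplified textbook characterizations. Japanese, Tibetan, English, and Irish follow the discussion above; Turkish, Hindi, Persian, and Thai are added here only to enlarge the toy sample.

```python
# Toy sample: simplified, illustrative codings of two word-order features.
# A real database would hold many more languages and features.
SAMPLE = {
    "Japanese": {"object_before_verb": True,  "postpositions": True},
    "Tibetan":  {"object_before_verb": True,  "postpositions": True},
    "Turkish":  {"object_before_verb": True,  "postpositions": True},
    "Hindi":    {"object_before_verb": True,  "postpositions": True},
    "Persian":  {"object_before_verb": True,  "postpositions": False},
    "English":  {"object_before_verb": False, "postpositions": False},
    "Irish":    {"object_before_verb": False, "postpositions": False},
    "Thai":     {"object_before_verb": False, "postpositions": False},
}


def languages_with(sample, feature, value=True):
    """Return the sorted names of languages whose coding for `feature` equals `value`."""
    return sorted(name for name, feats in sample.items() if feats[feature] == value)


def cross_tabulate(sample, feat_a, feat_b):
    """Group language names by their values for two binary features."""
    table = {(a, b): [] for a in (True, False) for b in (True, False)}
    for name in sorted(sample):
        feats = sample[name]
        table[(feats[feat_a], feats[feat_b])].append(name)
    return table


ov_languages = languages_with(SAMPLE, "object_before_verb")
table = cross_tabulate(SAMPLE, "object_before_verb", "postpositions")

# Languages that pattern as the generalization predicts:
# OV with postpositions, or VO with prepositions.
consistent = len(table[(True, True)]) + len(table[(False, False)])
exceptions = table[(True, False)] + table[(False, True)]

print(f"OV languages: {ov_languages}")
print(f"{consistent} of {len(SAMPLE)} languages fit the OV/postposition correlation")
print(f"Exceptions: {exceptions}")
```

Even in a sample this small, one language (Persian, verb-final but prepositional) departs from the pattern, which is the sort of "strong but not perfect" result a student would be asked to notice and discuss.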

Additionally, traditional problem sets are decontextualized and impoverished representations of language behavior. As such, the process of linguistic analysis often becomes decoupled from the basic activity of language behavior. We would like to at least partially address this last concern by providing more linguistic material in audio and even video format so that students can have more of a feel for 1) how each language is actually produced and what it sounds like (hearing actual language makes it more "real"); and 2) how one analyses a language when starting from the original recorded behavior.

Overall architecture

The Interactive Linguistics Databases (ILD) project will create an interlinked set of databases which can be used in search queries. (Examples of these are provided in the course sections elsewhere in this document.)

Most of these databases will be used for class assignments and reasonably simple lower-division research projects. The same data can be used for a number of different assignments (even across diverse courses) and will be largely drawn from the data of faculty research. Some databases will be set up such that the students themselves can input data (e.g. the sociolinguistics assignment for LING 101) and comment on other students' contributions.

Additionally, we expect to build in a number of linguistic study aids, such as maps of languages and linguistic features and audio files for the International Phonetic Alphabet.

Most department members have extensive primary source materials for diverse and often little-known languages. We expect to draw on this material heavily in creating the materials for this project. Since all of these databases have the advantage of being indefinitely augmentable, we expect the value of this project to grow in proportion to the data entered. We further anticipate that faculty members will continue to provide data from their research for this project long after the funding of this grant expires. (The department has committed to long-term support of the project.)

The use of primary language data from the faculty not only provides materials for instructional purposes, but brings students directly into contact with the results of faculty research. This project should ultimately make the preparation of course materials more efficient and ensure greater comparability of topics and assignments across different professors. In addition to having greater contact with research materials, students attending any single course will have exposure to a richer source of data, including audio-video materials, than would be possible with materials assembled for any single class.

Examples by course

Linguistics 101--Introduction to Language

In 1998, LING 101 was taught with a minor Web component: review notes and assignments were given as web pages, and the syllabus could be updated efficiently. However, in this course there are a number of topics which could be better demonstrated through an elaborated set of web pages than is possible in the traditional lecture format.

For example, non-verbal communication is a topic of great interest to many students. Students are particularly and appropriately interested in American Sign Language and the nature of manual gestures accompanying speech. These can only be demonstrated in a stilted manner in lecture, which scarcely constitutes actual exposure to non-verbal communication. Further, recent advances in this field of study have only been possible because of the ability to view details of videotaped sign language and gesture repeatedly. It is simply impossible to see what is gestured in adequate detail without replaying the videotaped segment many times. Digitized video segments from real sign language and gestural behavior can be loaded on a high-speed web server. Assuming that the appropriate software has been loaded on the browser, these can be viewed as many times as is necessary for each student. A database of such video segments (complete with text information about the data source, etc.) can be used for hands-on research assignments even by completely introductory students. The more ambitious can attempt to replicate the findings of recent research (e.g. Pederson's head nodding research) on the same or similar videotaped material.

The integration of web databases into the course curriculum will also prove valuable for course assignments. For example, the phonetics/phonology assignment last year gave students physical copies of audio-taped material for transcription. Many students had difficulty with the tapes or with using their home tape players as transcription devices. With a web database of digitized speech samples, students could select from a wider array of speech types and would have greater pause/replay capacity than is possible with a mechanical tape player. Again, this assumes that the browser has appropriate software and hardware for audio playback. As extra credit, students could find their own speech samples to contribute to the course database.

Another assignment from LING 101 was a sociolinguistic/lexicon assignment. Students were to interview speakers from a community to which they did not belong and collect slang or jargon terms specific to that community. This was a popular and useful assignment which gave students a hands-on feel for real data collection. Unfortunately the net result was only a set of papers to grade. Interest in this process would increase dramatically if we created an open-ended database of current slang and jargon terms to which students would contribute (and in which they could add to the contributions of others). The database would be cumulative in that the contributions of previous classes would remain in the database. With the passing of years, the database would become a substantial work of lexicography in its own right, created by the introductory students. This would require some database and server-side programming which would need to be outsourced.
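As a rough indication of what the database side of this assignment might involve, the following sketch uses Python's built-in sqlite3 module. It is an assumption-laden illustration rather than a specification: the table layout, the column names, and the functions `add_entry`, `add_comment`, and `entries_for` are hypothetical, and the terms, communities, and contributors are placeholders rather than collected data.

```python
import sqlite3

# In-memory database for illustration; a real deployment would be server-side.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (
        id          INTEGER PRIMARY KEY,
        term        TEXT NOT NULL,
        definition  TEXT NOT NULL,
        community   TEXT NOT NULL,
        contributor TEXT NOT NULL,
        class_term  TEXT NOT NULL
    );
    CREATE TABLE comments (
        id       INTEGER PRIMARY KEY,
        entry_id INTEGER NOT NULL REFERENCES entries(id),
        author   TEXT NOT NULL,
        body     TEXT NOT NULL
    );
""")


def add_entry(term, definition, community, contributor, class_term):
    """Store one collected term and return its id."""
    cur = conn.execute(
        "INSERT INTO entries (term, definition, community, contributor, class_term) "
        "VALUES (?, ?, ?, ?, ?)",
        (term, definition, community, contributor, class_term),
    )
    return cur.lastrowid


def add_comment(entry_id, author, body):
    """Attach another student's note to an existing entry."""
    conn.execute(
        "INSERT INTO comments (entry_id, author, body) VALUES (?, ?, ?)",
        (entry_id, author, body),
    )


def entries_for(community):
    """List (term, class_term, number of comments) for one community, oldest first."""
    return conn.execute(
        "SELECT e.term, e.class_term, COUNT(c.id) "
        "FROM entries e LEFT JOIN comments c ON c.entry_id = e.id "
        "WHERE e.community = ? GROUP BY e.id ORDER BY e.id",
        (community,),
    ).fetchall()


# Placeholder contributions from two different classes.
first = add_entry("term-1", "placeholder definition", "community-X", "student-1", "Fall 1999")
add_entry("term-2", "placeholder definition", "community-X", "student-2", "Winter 2000")
add_comment(first, "student-2", "placeholder note on usage")

rows = entries_for("community-X")
print(rows)
```

Recording the class term with each entry is one simple way to keep the database cumulative: earlier classes' contributions stay in place and remain attributable.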

Linguistics 211--Languages of the World

In 1998 LING 211 was taught with a fairly significant Web component. Students were provided with a page of links which they used for research for written assignments. They were also provided with excerpts from lecture notes including data discussed in lecture and detailed information about language families not easily available in print format. Students were quite enthusiastic about this use of the Web, but the briefest examination of the materials posted makes it clear that, except for the links page (which we plan to expand and augment with a search engine), this use of the Web is an advance over traditional paper-and-print formats in convenience and accessibility, but not in any substantive sense beyond that. We can see a range of ways in which the Web could be used much more imaginatively.

In the first place, there are many ways in which current technology permits presentation of materials which otherwise are cumbersome or impossible to present to a large audience. For example, students learning about the linguistic diversity of the world should--and usually want to--hear actual samples of some different languages. Traditionally, this involves either sending students to the language lab, or taking time in lecture to play taped samples (with all the problems associated with using AV materials in large classrooms). Digitized samples available on the Web provide a much more efficient and convenient channel for presentation of this kind of information.

But we can go well beyond this sort of improvement in delivery, and develop innovative ways of bringing students into the world of linguistic research, toward an understanding of how we come to know what they are being taught. A major thread of this course is linguistic geography--how languages are distributed through the world, and how they came to be where they are. In the past we have addressed this sort of material in lecture using the tried-and-true devices of wall maps and pointers. This can be improved, of course, using technology to develop more appropriate linguistic maps, highlighting the details emphasized in lecture, which can then be both projected in lecture and posted on the Web. Beyond this, however, we can envision interactive problem sets which lead students through the process of reconstructing prehistoric population movements from linguistic geographic data.

For example, the native languages of western North America present a set of characteristic features of greater or lesser geographical distribution. Languages from southeastern Alaska down to the Yucatan Peninsula have an unusual type of sound called ejective stops. A smaller set of languages, from southern British Columbia through northern California, have glottalized nasal and liquid consonants. Still more limited in geographical extent are elaborated sets of liquid (roughly, l-like) sounds, which occur from the Columbia River north to Canada.

Shared features of this sort are evidence for contact among the languages involved; very arbitrary features of morphology and syntax are evidence for close or prolonged contact. (See DeLancey 1996 for a relevant example of this kind of argument). Thus when we find extended feature complexes shared by languages which are not now in close geographical contact--e.g. particular patterns of morphological structure in Tsimshian (SE Alaska) and Wintu (Sacramento Valley)--we have evidence which can be used to reconstruct prehistoric locations and movements of languages and their speakers.

If we place on the Web a language map of western North America, with an associated database listing particular linguistic features of interest for each language, it should then be possible to program an interactive interface which would allow the user to generate versions of the map with all languages showing a particular feature (or set of features) highlighted. This, in turn, permits the student to generate hypotheses about the relative chronology of the spread of different features, and to work backwards toward earlier reconstructed language maps.
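The selection step behind such a map can be sketched as follows. This is a minimal illustration only: the record layout and the functions `highlight` and `feature_extent` are hypothetical, and the language names, coordinates, and feature codings are invented placeholders, not real data about any language.

```python
# Placeholder records: names, coordinates, and codings are invented for illustration.
LANGUAGES = [
    {"name": "Language A", "lat": 57.0, "lon": -135.0,
     "features": {"ejectives"}},
    {"name": "Language B", "lat": 50.0, "lon": -125.0,
     "features": {"ejectives", "glottalized_sonorants", "lateral_series"}},
    {"name": "Language C", "lat": 46.0, "lon": -122.0,
     "features": {"ejectives", "glottalized_sonorants", "lateral_series"}},
    {"name": "Language D", "lat": 40.0, "lon": -122.0,
     "features": {"ejectives", "glottalized_sonorants"}},
    {"name": "Language E", "lat": 20.0, "lon": -89.0,
     "features": {"ejectives"}},
]


def highlight(languages, required):
    """Return the names of languages that show every feature in `required`."""
    required = set(required)
    return [lang["name"] for lang in languages if required <= lang["features"]]


def feature_extent(languages, feature):
    """Return the (southernmost, northernmost) latitude at which `feature` occurs."""
    lats = [lang["lat"] for lang in languages if feature in lang["features"]]
    return (min(lats), max(lats)) if lats else None


all_ejective = highlight(LANGUAGES, ["ejectives"])
both = highlight(LANGUAGES, ["ejectives", "glottalized_sonorants"])
lateral_range = feature_extent(LANGUAGES, "lateral_series")

print(all_ejective)
print(both)
print(lateral_range)
```

A mapping front end would draw the names returned by `highlight` in a contrasting color. Comparing the extents of different features gives students a concrete starting point for hypotheses about which features spread earlier or more widely.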

Linguistics 290--Introduction to Linguistic Analysis

Beginning in Fall 1999, LING 290 is being renamed and revamped. It was formerly a survey course of the major subfields within linguistics (cf. the syllabus from Winter 1999); in future, the course will focus more tightly on the technical details of linguistic analysis and problem solving. Traditionally, such courses have taught linguistic problem solving by giving students idealized and reduced single-page data sets applicable only to a specific question. As per the example discussed under LING 211, these problems do exemplify specific points, but they present rarefied, decontextualized, and often misleading data which fail to give the students the feel of linguistic research. Since the faculty (and many graduate students) in our department have a wealth of material on various languages, it would be invaluable to post much of this material in a searchable database format for student assignments. Rather than being spoon-fed puzzle-like language data, students will systematically search through the database to find the relevant examples. Once a problem is solved, they can be asked to find further examples, or even possible counter-examples. Comparison across languages becomes a tool available for the students, rather than an ability limited to senior researchers.

Even when problems are constrained to reasonably tight data sets, these data can be linked to primary language material which is also available on the web site. Such cross-referencing allows for more open-ended assignments which can serve as small practice learning and research assignments. Additionally, this relates the course work to the research materials of the members of the department which should make the entire process of linguistic analysis seem far less abstract (a common complaint in the field). We expect the majority of the faculty and many graduate students to contribute materials and research questions for this purpose. While the grant period will allow this database to be established, we anticipate that the database will grow continuously over the subsequent years. This will allow greater consistency across different instructors as well as enriching the amount of available material for the students.

Materials applicable to all courses

There are a number of databases and web pages which can be of use to all of the above courses (as well as to some upper division courses). Here we mention the two we envision most immediately creating.

Typological feature database

Since each language can be described in terms of a number of structural features (inventory of sounds, word order, etc.), a database of these features can be created and linked to the previously discussed materials. This has simply not been available for pedagogical purposes in the past, but we see it as quite valuable in fostering research-centered instruction. The advantage of a web-based database is that it can be interactively structured. Students can customize search queries; they can use the same data to investigate a number of different problems; further, the web format can provide immediate links to primary data sources (transcribed or digitized audio speech).
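One way to picture the link between a feature coding and its primary sources is the sketch below. It is an illustrative assumption, not a description of the planned system: the classes `Source` and `Coding`, the functions `search` and `source_links`, the feature names, and the file locations are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Source:
    kind: str      # "audio" or "transcript"
    location: str  # hypothetical path on the project server


@dataclass
class Coding:
    language: str
    feature: str
    value: str
    sources: List[Source] = field(default_factory=list)


# Placeholder codings; the languages and files are invented for illustration.
CODINGS = [
    Coding("Language A", "basic_order", "OV", [
        Source("audio", "audio/lang_a/text01.wav"),
        Source("transcript", "texts/lang_a/text01.txt"),
    ]),
    Coding("Language B", "basic_order", "VO", [
        Source("transcript", "texts/lang_b/text07.txt"),
    ]),
    Coding("Language A", "adposition", "postposition"),
]


def search(codings, **criteria):
    """Return codings whose attributes match every keyword criterion."""
    return [c for c in codings
            if all(getattr(c, key) == wanted for key, wanted in criteria.items())]


def source_links(codings, kind: Optional[str] = None):
    """Collect the locations of primary sources attached to the given codings."""
    return [s.location for c in codings for s in c.sources
            if kind is None or s.kind == kind]


ov_codings = search(CODINGS, feature="basic_order", value="OV")
ov_audio = source_links(ov_codings, kind="audio")
lang_a_sources = source_links(search(CODINGS, language="Language A"))

print([c.language for c in ov_codings])
print(ov_audio)
print(lang_a_sources)
```

Because the same codings answer both a feature query and a language query, one body of data can serve several different assignments, and each result leads back to the transcribed or recorded material that supports it.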

The International Phonetic Alphabet (IPA)

The IPA notation system for writing spoken language is an indispensable part of any linguistics course. Depending on the level and topic of the course, varying knowledge of the IPA is required. Currently, students are given a photocopy of the master chart, told how to read the chart, and they perhaps hear an instructor or a GTF attempt to produce some of the sounds. Many students have asked for sound recordings to be made available so that they can better learn the IPA system and (importantly) so that they have some experiential basis for discriminating, e.g., between an implosive and an ejective consonant. A CD is now available which provides digitized audio for all of the sounds represented by the IPA. We propose acquiring a license for this CD so that we could mount it on our server with a link to each sound from a master IPA chart. With this, a student could simply click on any phonetic symbol and immediately hear the sound which that symbol represents.

Upper division applications

Once this project has established materials for lower-division courses and research, it can be readily expanded with materials allowing similar improvements in the upper division curriculum. We see LING 460 (Historical and Comparative Linguistics) and LING 451/452 (Syntax and Semantics I and II) as courses which would particularly benefit from an expanded and interactive linguistic database.

Graduate credit

Since many of our graduate students are talented researchers also working with primary data on a wealth of languages, we envision that they could elect to contribute to the project. This participation in the undergraduate curriculum is of obvious value to students who foresee a future in college/university teaching (the majority of our students). They could receive graduate unit credit for this work.

Summary

This project is ambitious, not so much because its scale is particularly daunting, but because a linguistic database for pedagogical purposes is a fairly novel idea. With the ready availability of high-speed internet connections, fast CPUs, digitized audio-video standards, as well as user-friendly search engines and database interfaces, we feel that this is an opportunity which calls to be developed.

The webserver "Logos" for the linguistics department currently has a limited number of pages which introduce a few language concerns to the general public. These pages receive "hits" from approximately 200 individuals per day. While some sections of the proposed databases may need to have restricted access, we feel that having the database publicly accessible will help propagate awareness of linguistic methods far beyond those who register for linguistics courses at the University of Oregon.
