ORCID Profile
0000-0001-7832-6156
Current Organisation
The University of Auckland
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Linguistics not elsewhere classified | Indonesian languages | Phylogeny and Comparative Analysis | Pacific Languages | Historical comparative and typological linguistics | Linguistic structures (incl. phonology, morphology and syntax) | Evolutionary Biology | Linguistics
Expanding Knowledge through Studies of Human Society | Expanding Knowledge in History and Archaeology | Expanding Knowledge in Language, Communication and Culture
Publisher: The Royal Society
Date: 18-03-2009
Abstract: Phylogenetic methods have recently been applied to studies of cultural evolution. However, it has been claimed that the large amount of horizontal transmission that sometimes occurs between cultural groups invalidates the use of these methods. Here, we use a natural model of linguistic evolution to simulate borrowing between languages. The results show that tree topologies constructed with Bayesian phylogenetic methods are robust to realistic levels of borrowing. Inferences about divergence dates are slightly less robust and show a tendency to underestimate dates. Our results demonstrate that realistic levels of reticulation between cultures do not invalidate a phylogenetic approach to cultural and linguistic evolution.
Publisher: Routledge
Publisher: Springer Science and Business Media LLC
Date: 16-12-2022
DOI: 10.1038/S41559-021-01604-Y
Abstract: Language diversity is under threat. While each language is subject to specific social, demographic and political pressures, there may also be common threatening processes. We use an analysis of 6,511 spoken languages with 51 predictor variables spanning aspects of population, documentation, legal recognition, education policy, socioeconomic indicators and environmental features to show that, counter to common perception, contact with other languages per se is not a driver of language loss. However, greater road density, which may encourage population movement, is associated with increased endangerment. Higher average years of schooling is also associated with greater endangerment, evidence that formal education can contribute to loss of language diversity. Without intervention, language loss could triple within 40 years, with at least one language lost per month. To avoid the loss of over 1,500 languages by the end of the century, urgent investment is needed in language documentation, bilingual education programmes and other community-based programmes.
Publisher: American Association for the Advancement of Science (AAAS)
Date: 28-07-2023
Abstract: The origins of the Indo-European language family are hotly disputed. Bayesian phylogenetic analyses of core vocabulary have produced conflicting results, with some supporting a farming expansion out of Anatolia ~9000 years before present (yr B.P.), while others support a spread with horse-based pastoralism out of the Pontic-Caspian Steppe ~6000 yr B.P. Here we present an extensive database of Indo-European core vocabulary that eliminates past inconsistencies in cognate coding. Ancestry-enabled phylogenetic analysis of this dataset indicates that few ancient languages are direct ancestors of modern clades and produces a root age of ~8120 yr B.P. for the family. Although this date is not consistent with the Steppe hypothesis, it does not rule out an initial homeland south of the Caucasus, with a subsequent branch northward onto the steppe and then across Europe. We reconcile this hybrid hypothesis with recently published ancient DNA evidence from the steppe and the northern Fertile Crescent.
Publisher: Proceedings of the National Academy of Sciences
Date: 21-11-2022
Abstract: Human history is written in both our genes and our languages. The extent to which our biological and linguistic histories are congruent has been the subject of considerable debate, with clear examples of both matches and mismatches. To disentangle the patterns of demographic and cultural transmission, we need a global systematic assessment of matches and mismatches. Here, we assemble a genomic database (GeLaTo, or Genes and Languages Together) specifically curated to investigate genetic and linguistic diversity worldwide. We find that most populations in GeLaTo that speak languages of the same language family (i.e., that descend from the same ancestor language) are also genetically highly similar. However, we also identify nearly 20% mismatches in populations genetically close to linguistically unrelated groups. These mismatches, which occur within the time depth of known linguistic relatedness up to about 10,000 y, are scattered around the world, suggesting that they are a regular outcome in human history. Most mismatches result from populations shifting to the language of a neighboring population that is genetically different because of independent demographic histories. In line with the regularity of such shifts, we find that only half of the language families in GeLaTo are genetically more cohesive than expected under spatial autocorrelations. Moreover, the genetic and linguistic divergence times of population pairs match only rarely, with Indo-European standing out as the family with most matches in our sample. Together, our database and findings pave the way for systematically disentangling demographic and cultural history and for quantifying processes of shifts in language and social identities on a global scale.
Publisher: The Royal Society
Date: 07-04-2010
Abstract: There are approximately 7000 languages spoken in the world today. This diversity reflects the legacy of thousands of years of cultural evolution. How far back we can trace this history depends largely on the rate at which the different components of language evolve. Rates of lexical evolution are widely thought to impose an upper limit of 6000–10 000 years on reliably identifying language relationships. In contrast, it has been argued that certain structural elements of language are much more stable. Just as biologists use highly conserved genes to uncover the deepest branches in the tree of life, highly stable linguistic features hold the promise of identifying deep relationships between the world's languages. Here, we present the first global network of languages based on this typological information. We evaluate the relative evolutionary rates of both typological and lexical features in the Austronesian and Indo-European language families. The first indications are that typological features evolve at similar rates to basic vocabulary but their evolution is substantially less tree-like. Our results suggest that, while rates of vocabulary change are correlated between the two language families, the rates of evolution of typological features and structural subtypes show no consistent relationship across families.
Publisher: MIT Press - Journals
Date: 12-2011
DOI: 10.1162/COLI_A_00073
Abstract: The Levenshtein distance is a simple distance metric derived from the number of edit operations needed to transform one string into another. This metric has received recent attention as a means of automatically classifying languages into genealogical subgroups. In this article I test the performance of the Levenshtein distance for classifying languages by subsampling three language subsets from a large database of Austronesian languages. Comparing the classification proposed by the Levenshtein distance to that of the comparative method shows that the Levenshtein classification is correct only 40% of the time. Standardizing the orthography increases the performance, but only to a maximum of 65% accuracy within language subgroups. The accuracy of the Levenshtein classification decreases rapidly with phylogenetic distance, failing to discriminate homology and chance similarity across distantly related languages. This poor performance suggests the need for more linguistically nuanced methods for automated language classification tasks.
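For readers unfamiliar with the metric named in the abstract above, the following is a minimal sketch of the standard dynamic-programming computation of Levenshtein distance. It is not the article's own code: the function names `levenshtein` and `normalized_levenshtein` are illustrative, the test strings are arbitrary English words rather than Austronesian data, and dividing by the longer string's length is one common normalization assumed here, not necessarily the one used in the study.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn string a into string b."""
    # Keep the shorter string as b so each row of the table stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b; the first row is the empty prefix.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Scale the distance to [0, 1] by the longer string's length
    (an assumed normalization, chosen here for illustration)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(normalized_levenshtein("kitten", "sitting"))
```

The computation compares character strings only, so two spellings of the same sound count as different, which is consistent with the abstract's observation that standardizing the orthography improves performance.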
Publisher: Public Library of Science (PLoS)
Date: 23-09-2015
Publisher: Walter de Gruyter GmbH
Date: 2011
Publisher: Center for Open Science
Date: 02-08-2022
Abstract: The Uto-Aztecan language family is one of the largest language families in the Americas. However, there has been considerable debate about its origin and how it spread. Here we use Bayesian phylogenetic methods to analyze lexical data from 34 Uto-Aztecan varieties and 2 Kiowa-Tanoan languages. We infer the age of Proto-Uto-Aztecan to be around 4,100 years ago (3,258 - 5,025 years), and identify the most likely homeland to be near what is now southern California. We reconstruct the most probable subsistence strategy in the ancestral Uto-Aztecan society and infer no casual or intensive cultivation, an absence of cereal crops, and a primary subsistence mode of gathering (rather than agriculture). Our results therefore support the timing, geography, and cultural practices of a northern origin, and are inconsistent with alternative scenarios.
Publisher: Springer Science and Business Media LLC
Date: 03-05-2019
DOI: 10.1038/S41467-019-09842-2
Abstract: Language diversity is distributed unevenly over the globe. Intriguingly, patterns of language diversity resemble biodiversity patterns, leading to suggestions that similar mechanisms may underlie both linguistic and biological diversification. Here we present the first global analysis of language diversity that compares the relative importance of two key ecological mechanisms – isolation and ecological risk – after correcting for spatial autocorrelation and phylogenetic non-independence. We find significant effects of climate on language diversity, consistent with the ecological risk hypothesis that areas of high year-round productivity lead to more languages by supporting human cultural groups with smaller distributions. Climate has a much stronger effect on language diversity than landscape features, such as altitudinal range and river density, which might contribute to isolation of cultural groups. The association between biodiversity and language diversity appears to be an incidental effect of their covariation with climate, rather than a causal link between the two.
Publisher: John Benjamins Publishing Company
Date: 05-08-2014
Publisher: Cambridge University Press (CUP)
Date: 07-06-2021
DOI: 10.1017/S002510032000033X
Abstract: Language documentation faces a persistent and pervasive problem: How much material is enough to represent a language fully? How much text would we need to sample the full phoneme inventory of a language? In the phonetic/phonemic domain, what proportion of the phoneme inventory can we expect to sample in a text of a given length? Answering these questions in a quantifiable way is tricky, but asking them is necessary. The cumulative collection of Illustrative Texts published in the Illustration series in this journal over more than four decades (mostly renditions of the ‘North Wind and the Sun’) gives us an ideal dataset for pursuing these questions. Here we investigate a tractable subset of the above questions, namely: What proportion of a language’s phoneme inventory do these texts enable us to recover, in the minimal sense of having at least one allophone of each phoneme? We find that, even with this low bar, only three languages (Modern Greek, Shipibo and the Treger dialect of Breton) attest all phonemes in these texts. Unsurprisingly, these languages sit at the low end of phoneme inventory sizes (respectively 23, 24 and 36 phonemes). We then estimate the rate at which phonemes are sampled in the Illustrative Texts and extrapolate to see how much text it might take to display a language’s full inventory. Finally, we discuss the implications of these findings for linguistics in its quest to represent the world’s phonetic diversity, and for JIPA in its design requirements for Illustrations and in particular whether supplementary panphonic texts should be included.
Publisher: Springer Science and Business Media LLC
Date: 07-06-2021
DOI: 10.1007/S13752-021-00379-6
Abstract: Across the world people in different societies structure their family relationships in many different ways. These relationships become encoded in their languages as kinship terminology, a word set that maps variably onto a vast genealogical grid of kinship categories, each of which could in principle vary independently. But the observed diversity of kinship terminology is considerably smaller than the enormous theoretical design space. For the past century anthropologists have captured this variation in typological schemes with only a small number of model system types. Whether those types exhibit the internal co-selection of parts implicit in their use is an outstanding question, as is the sufficiency of typologies in capturing variation as a whole. We interrogate the coherence of classic kinship typologies using modern statistical approaches and systematic data from a new database, Kinbank. We first survey the canonical types and their assumed patterns of internal and external co-selection, then present two data-driven approaches to assess internal coherence. Our first analysis reveals that across parents’ and ego’s (one’s own) generation, typology has limited predictive value: knowing the system in one generation does not reliably predict the other. Though we detect limited co-selection between generations, “disharmonic” systems are equally common. Second, we represent structural diversity with a novel multidimensional approach we term kinship space. This approach reveals, for ego’s generation, some broad patterning consistent with the canonical typology, but diversity (and mixed systems) is considerably higher than classical typologies suggest. Our results strongly challenge the descriptive adequacy of the set of canonical kinship types.
Publisher: The Royal Society
Date: 22-06-2015
Publisher: The Royal Society
Date: 07-04-2013
Abstract: Despite a burgeoning science of cultural evolution, relatively little work has focused on the population structure of human cultural variation. By contrast, studies in human population genetics use a suite of tools to quantify and analyse spatial and temporal patterns of genetic variation within and between populations. Human genetic diversity can be explained largely as a result of migration and drift giving rise to gradual genetic clines, together with some discontinuities arising from geographical and cultural barriers to gene flow. Here, we adapt theory and methods from population genetics to quantify the influence of geography and ethnolinguistic boundaries on the distribution of 700 variants of a folktale in 31 European ethnolinguistic populations. We find that geographical distance and ethnolinguistic affiliation exert significant independent effects on folktale diversity and that variation between populations supports a clustering concordant with European geography. This pattern of geographical clines and clusters parallels the pattern of human genetic diversity in Europe, although the effects of geographical distance and ethnolinguistic boundaries are stronger for folktales than genes. Our findings highlight the importance of geography and population boundaries in models of human cultural variation and point to key similarities and differences between evolutionary processes operating on human genes and culture.
Publisher: Springer Science and Business Media LLC
Date: 03-02-2022
Publisher: The Royal Society
Date: 09-12-2022
Abstract: Although language-family specific traits which do not find direct counterparts outside a given language family are usually ignored in quantitative phylogenetic studies, scholars have made ample use of them in qualitative investigations, revealing their potential for identifying language relationships. An example of such a family specific trait are body-part expressions in Pano languages, which are often lexicalized forms, composed of bound roots (also called body-part prefixes in the literature) and non-productive derivative morphemes (called here body-part formatives). We use various statistical methods to demonstrate that whereas body-part roots are generally conservative, body-part formatives exhibit diverse chronologies and are often the result of recent and parallel innovations. In line with this, the phylogenetic structure of body-part roots projects the major branches of the family, while formatives are highly non-tree-like. Beyond its contribution to the phylogenetic analysis of Pano languages, this study provides significative insights into the role of grammatical innovations for language classification, the origin of morphological complexity in the Amazon and the phylogenetic signal of specific grammatical traits in language families.
Publisher: The Royal Society
Date: 17-05-2021
Abstract: Modern phylogenetic methods are increasingly being used to address questions about macro-level patterns in cultural evolution. These methods can illuminate the unobservable histories of cultural traits and identify the evolutionary drivers of trait change over time, but their application is not without pitfalls. Here, we outline the current scope of research in cultural tree thinking, highlighting a toolkit of best practices to navigate and avoid the pitfalls and ‘abuses’ associated with their application. We emphasize two principles that support the appropriate application of phylogenetic methodologies in cross-cultural research: researchers should (1) draw on multiple lines of evidence when deciding if and which types of phylogenetic methods and models are suitable for their cross-cultural data, and (2) carefully consider how different cultural traits might have different evolutionary histories across space and time. When used appropriately phylogenetic methods can provide powerful insights into the processes of evolutionary change that have shaped the broad patterns of human history. This article is part of the theme issue ‘Foundations of cultural evolution’.
Publisher: The Open Journal
Date: 08-11-2018
DOI: 10.21105/JOSS.01040
Publisher: John Benjamins Publishing Company
Date: 14-12-2012
Abstract: Donohue et al.’s critique of our work on the origins and spread of the Austronesian language family is marred by misunderstandings. We respond to these by noting that our Bayesian phylogenetic approach: (1) distinguishes between retentions and innovations probabilistically, (2) focuses on basic vocabulary not ‘the lexicon’, (3) eliminates known loanwords, (4) produces results that are congruent with the results of the comparative method and conflict with the scenarios requiring unprecedented amounts of language shift postulated by Donohue et al.
Publisher: Cambridge University Press (CUP)
Date: 2021
DOI: 10.1017/EHS.2021.32
Publisher: Walter de Gruyter GmbH
Date: 28-08-2018
Abstract: The Database of Cross-Linguistic Colexifications (CLICS) has established a computer-assisted framework for the interactive representation of cross-linguistic colexification patterns. In its current form, it has proven to be a useful tool for various kinds of investigation into cross-linguistic semantic associations, ranging from studies on semantic change, patterns of conceptualization, and linguistic paleontology. But CLICS has also been criticized for obvious shortcomings, ranging from the underlying dataset, which still contains many errors, up to the limits of cross-linguistic colexification studies in general. Building on recent standardization efforts reflected in the Cross-Linguistic Data Formats initiative (CLDF) and novel approaches for fast, efficient, and reliable data aggregation, we have created a new database for cross-linguistic colexifications, which not only supersedes the original CLICS database in terms of coverage but also offers a much more principled procedure for the creation, curation and aggregation of datasets. The paper presents the new database and discusses its major features.
Publisher: SAGE Publications
Date: 2008
DOI: 10.4137/EBO.S893
Abstract: Phylogenetic methods have revolutionised evolutionary biology and have recently been applied to studies of linguistic and cultural evolution. However, the basic comparative data on the languages of the world required for these analyses is often widely dispersed in hard to obtain sources. Here we outline how our Austronesian Basic Vocabulary Database (ABVD) helps remedy this situation by collating wordlists from over 500 languages into one web-accessible database. We describe the technology underlying the ABVD and discuss the benefits that an evolutionary bioinformatic approach can provide. These include facilitating computational comparative linguistic research, answering questions about human prehistory, enabling syntheses with genetic data, and safe-guarding fragile linguistic information.
Publisher: Proceedings of the National Academy of Sciences
Date: 08-2022
Abstract: The Bantu expansion transformed the linguistic, economic, and cultural composition of sub-Saharan Africa. However, the exact dates and routes taken by the ancestors of the speakers of the more than 500 current Bantu languages remain uncertain. Here, we use the recently developed “break-away” geographical diffusion model, specially designed for modeling migrations, with “augmented” geographic information, to reconstruct the Bantu language family expansion. This Bayesian phylogeographic approach with augmented geographical data provides a powerful way of linking linguistic, archaeological, and genetic data to test hypotheses about large language family expansions. We compare four hypotheses: an early major split north of the rainforest; a migration through the Sangha River Interval corridor around 2,500 BP; a coastal migration around 4,000 BP; and a migration through the rainforest before the corridor opening, at 4,000 BP. Our results produce a topology and timeline for the Bantu language family, which supports the hypothesis of an expansion through Central African tropical forests at 4,420 BP (4,040 to 5,000 95% highest posterior density interval), well before the Sangha River Interval was open.
Publisher: American Association for the Advancement of Science (AAAS)
Date: 21-04-2023
Abstract: While global patterns of human genetic diversity are increasingly well characterized, the diversity of human languages remains less systematically described. Here, we outline the Grambank database. With over 400,000 data points and 2400 languages, Grambank is the largest comparative grammatical database available. The comprehensiveness of Grambank allows us to quantify the relative effects of genealogical inheritance and geographic proximity on the structural diversity of the world’s languages, evaluate constraints on linguistic diversity, and identify the world’s most unusual languages. An analysis of the consequences of language loss reveals that the reduction in diversity will be strikingly uneven across the major linguistic regions of the world. Without sustained efforts to document and revitalize endangered languages, our linguistic window into human history, cognition, and culture will be seriously fragmented.
Publisher: Springer Science and Business Media LLC
Date: 13-04-2011
DOI: 10.1038/NATURE09923
Abstract: Languages vary widely but not without limit. The central goal of linguistics is to describe the diversity of human languages and explain the constraints on that diversity. Generative linguists following Chomsky have claimed that linguistic diversity must be constrained by innate parameters that are set as a child learns a language. In contrast, other linguists following Greenberg have claimed that there are statistical tendencies for co-occurrence of traits reflecting universal systems biases, rather than absolute constraints or parametric variation. Here we use computational phylogenetic methods to address the nature of constraints on linguistic diversity in an evolutionary framework. First, contrary to the generative account of parameter setting, we show that the evolution of only a few word-order features of languages are strongly correlated. Second, contrary to the Greenbergian generalizations, we show that most observed functional dependencies between traits are lineage-specific rather than universal tendencies. These findings support the view that, at least with respect to word order, cultural evolution is the primary factor that determines linguistic structure, with the current state of a linguistic system shaping and constraining future states.
Publisher: Center for Open Science
Date: 16-03-2021
Abstract: Modern phylogenetic methods are increasingly being used to address questions about macro-level patterns in cultural evolution. These methods can illuminate the unobservable histories of cultural traits and identify the evolutionary drivers of trait-change over time, but their application is not without pitfalls. Here we outline the current scope of research in cultural tree thinking, highlighting a toolkit of best practices to navigate and avoid the pitfalls and ‘abuses’ associated with their application. We emphasise two principles that support the appropriate application of phylogenetic methodologies in cross-cultural research: researchers should (1) draw on multiple lines of evidence when deciding if and which types of phylogenetic methods and models are suitable for their cross-cultural data, and (2) carefully consider how different cultural traits might have different evolutionary histories across space and time. When used appropriately phylogenetic methods can provide powerful insights into the processes of evolutionary change that have shaped the broad patterns of human history.
Publisher: American Association for the Advancement of Science (AAAS)
Date: 02-2008
Abstract: Linguists speculate that human languages often evolve in rapid or punctuational bursts, sometimes associated with their emergence from other languages, but this phenomenon has never been demonstrated. We used vocabulary data from three of the world's major language groups—Bantu, Indo-European, and Austronesian—to show that 10 to 33% of the overall vocabulary differences among these languages arose from rapid bursts of change associated with language-splitting events. Our findings identify a general tendency for increased rates of linguistic evolution in fledgling languages, perhaps arising from a linguistic founder effect or a desire to establish a distinct social identity.
Publisher: American Association for the Advancement of Science (AAAS)
Date: 25-04-2008
Publisher: Center for Open Science
Date: 23-02-2023
Abstract: Many recent proposals claim that languages adapt to their environments. The Linguistic Niche hypothesis claims that languages with numerous native speakers and substantial proportions of non-native speakers (societies of strangers) will tend to lose grammatical distinctions. In contrast, languages in small, isolated communities should maintain or expand their range of grammatical markers. Here, we test such claims using a new global dataset of grammatical structures - Grambank. We model the impact of the number of native speakers, the proportion of non-native speakers, the number of linguistic neighbors, and the status of a language on grammatical complexity while controlling for spatial and phylogenetic autocorrelation. We deconstruct "grammatical complexity" into two separate dimensions: (i) how much morphology a language has ("fusion"), and (ii) the amount of information obligatorily encoded in the grammar ("informativity"). We find several instances of weak positive associations but no inverse correlations between grammatical complexity and sociodemographic factors. Our findings cast doubt on the widespread assumption that grammatical complexity is shaped by the sociolinguistic environment.
Publisher: The Open Journal
Date: 20-06-2016
DOI: 10.21105/JOSS.00028
Publisher: Oxford University Press (OUP)
Date: 07-2022
DOI: 10.1093/JOLE/LZAD002
Abstract: The so-called ‘Altaic’ languages have been the subject of debate for over 200 years. An array of different data sets have been used to investigate the genealogical relationships between them, but the controversy persists. The new data with a high potential for such cases in historical linguistics are structural features, which are sometimes declared to be prone to borrowing and discarded from the very beginning and at other times considered to have an especially precise historical signal reaching further back in time than other types of linguistic data. We investigate the performance of typological features across different domains of language by using an admixture model from genetics. As implemented in the software STRUCTURE, this model allows us to account for both a genealogical and an areal signal in the data. Our analysis shows that morphological features have the strongest genealogical signal and syntactic features diffuse most easily. When using only morphological structural data, the model is able to correctly identify three language families: Turkic, Mongolic, and Tungusic, whereas Japonic and Koreanic languages are assigned the same ancestry.
Publisher: Proceedings of the National Academy of Sciences
Date: 06-05-2019
Abstract: Given its size and geographical extension, Sino-Tibetan is of the highest importance for understanding the prehistory of East Asia, and of neighboring language families. Based on a dataset of 50 Sino-Tibetan languages, we infer phylogenies that date the origin of the language family to around 7200 B.P., linking the origin of the language family with the late Cishan and the early Yangshao cultures.
Publisher: The Royal Society
Date: 12-02-2011
Publisher: The Royal Society
Date: 17-05-2021
Abstract: In this paper, past plant knowledge serves as a case study to highlight the promise and challenges of interdisciplinary data collection and interpretation in cultural evolution. Plants are central to human life and yet, apart from the role of major crops, people–plant relations have been marginal to the study of culture. Archaeological, linguistic, and historical evidence are often limited when it comes to studying the past role of plants. This is the case in the Nordic countries, where extensive collections of various plant use records are absent until the 1700s. Here, we test if relatively recent ethnobotanical data can be used to trace back ancient plant knowledge in the Nordic countries. Phylogenetic inferences of ancestral states are evaluated against historical, linguistic, and archaeobotanical evidence. The exercise allows us to discuss the opportunities and shortcomings of using phylogenetic comparative methods to study past botanical knowledge. We propose a ‘triangulation method’ that not only combines multiple lines of evidence, but also quantitative and qualitative approaches. This article is part of the theme issue ‘Foundations of cultural evolution’.
Publisher: University of Chicago Press
Date: 07-2016
DOI: 10.1086/687383
Publisher: Oxford University Press
Date: 22-12-2011
Publisher: The Royal Society
Date: 07-04-2015
Abstract: Supernatural belief presents an explanatory challenge to evolutionary theorists: it is both costly and prevalent. One influential functional explanation claims that the imagined threat of supernatural punishment can suppress selfishness and enhance cooperation. Specifically, morally concerned supreme deities or ‘moralizing high gods’ (MHGs) have been argued to reduce free-riding in large social groups, enabling believers to build the kind of complex societies that define modern humanity. Previous cross-cultural studies claiming to support the MHG hypothesis rely on correlational analyses only and do not correct for the statistical non-independence of sampled cultures. Here we use a Bayesian phylogenetic approach with a sample of 96 Austronesian cultures to test the MHG hypothesis as well as an alternative supernatural punishment hypothesis that allows punishment by a broad range of moralizing agents. We find evidence that broad supernatural punishment drives political complexity, whereas MHGs follow political complexity. We suggest that the concept of MHGs diffused as part of a suite of traits arising from cultural exchange between complex societies. Our results show the power of phylogenetic methods to address long-standing debates about the origins and functions of religion in human society.
Publisher: Springer Science and Business Media LLC
Date: 16-06-2022
DOI: 10.1038/S41597-022-01432-0
Abstract: The past decades have seen substantial growth in digital data on the world’s languages. At the same time, the demand for cross-linguistic datasets has been increasing, as witnessed by numerous studies devoted to diverse questions on human prehistory, cultural evolution, and human cognition. Unfortunately, most published datasets lack standardization, which makes their comparison difficult. Here, we present a new approach to increase the comparability of cross-linguistic lexical data. We have designed workflows for the computer-assisted lifting of datasets to Cross-Linguistic Data Formats, a collection of standards that make these datasets more Findable, Accessible, Interoperable, and Reusable (FAIR). We test the Lexibank workflow on 100 lexical datasets from which we derive an aggregated database of wordlists in unified phonetic transcriptions covering more than 2000 language varieties. We illustrate the benefits of our approach by showing how phonological and lexical features can be automatically inferred, complementing and expanding existing cross-linguistic datasets.
Publisher: Project MUSE
Date: 03-2023
Publisher: Springer Science and Business Media LLC
Date: 13-01-2020
DOI: 10.1038/S41597-019-0341-X
Abstract: Advances in computer-assisted linguistic research have been greatly influential in reshaping linguistic research. With the increasing availability of interconnected datasets created and curated by researchers, more and more interwoven questions can now be investigated. Such advances, however, are bringing high requirements in terms of rigorousness for preparing and curating datasets. Here we present CLICS, a Database of Cross-Linguistic Colexifications (CLICS). CLICS tackles interconnected interdisciplinary research questions about the colexification of words across semantic categories in the world’s languages, and show-cases best practices for preparing data for cross-linguistic research. This is done by addressing shortcomings of an earlier version of the database, CLICS2, and by supplying an updated version with CLICS3, which massively increases the size and scope of the project. We provide tools and guidelines for this purpose and discuss insights resulting from organizing student tasks for database updates.
Publisher: The Royal Society
Date: 02-05-2022
Publisher: The Royal Society
Date: 27-03-2019
Abstract: Although many hypotheses have been proposed to explain why humans speak so many languages and why languages are unevenly distributed across the globe, the factors that shape geographical patterns of cultural and linguistic diversity remain poorly understood. Prior research has tended to focus on identifying universal predictors of language diversity, without accounting for how local factors and multiple predictors interact. Here, we use a unique combination of path analysis, mechanistic simulation modelling, and geographically weighted regression to investigate the broadly described, but poorly understood, spatial pattern of language diversity in North America. We show that the ecological drivers of language diversity are not universal or entirely direct. The strongest associations imply a role for previously developed hypothesized drivers such as population density, resource diversity, and carrying capacity with group size limits. The predictive power of this web of factors varies over space from regions where our model predicts approximately 86% of the variation in diversity, to areas where less than 40% is explained.
Publisher: The Royal Society
Date: 08-2018
DOI: 10.1098/RSOS.181100
Abstract: A growing number of studies seek to identify predictors of broad-scale patterns in human cultural diversity, but three sources of non-independence in human cultural variables can bias the results of cross-cultural studies. First, related cultures tend to have many traits in common, regardless of whether those traits are functionally linked. Second, societies in geographical proximity will share many aspects of culture, environment and demography. Third, many cultural traits covary, leading to spurious relationships between traits. Here, we demonstrate tractable methods for dealing with all three sources of bias. We use cross-cultural analyses of proposed associations between human cultural traits and parasite load to illustrate the potential problems of failing to correct for these three forms of statistical non-independence. Associations between parasite stress and sociosexuality, authoritarianism, democracy and language diversity are weak or absent once relatedness and proximity are taken into account, and parasite load has no more power to explain variation in traditionalism, religiosity and collectivism than other measures of biodiversity, climate or population size do. Without correction for statistical non-independence and covariation in cross-cultural analyses, we risk misinterpreting associations between culture and environment.
Publisher: The Royal Society
Date: 04-03-2009
Abstract: The nature of social life in human prehistory is elusive, yet knowing how kinship systems evolve is critical for understanding population history and cultural diversity. Post-marital residence rules specify sex-specific dispersal and kin association, influencing the pattern of genetic markers across populations. Cultural phylogenetics allows us to practise ‘virtual archaeology’ on these aspects of social life that leave no trace in the archaeological record. Here we show that early Austronesian societies practised matrilocal post-marital residence. Using a Markov-chain Monte Carlo comparative method implemented in a Bayesian phylogenetic framework, we estimated the type of residence at each ancestral node in a sample of Austronesian language trees spanning 135 Pacific societies. Matrilocal residence has been hypothesized for proto-Oceanic society (ca 3500 BP), but we find strong evidence that matrilocality was predominant in earlier Austronesian societies ca 5000–4500 BP, at the root of the language family and its early branches. Our results illuminate the divergent patterns of mtDNA and Y-chromosome markers seen in the Pacific. The analysis of present-day cross-cultural data in this way allows us to directly address cultural evolutionary and life-history processes in prehistory.
Publisher: The Royal Society
Date: 12-04-2011
Abstract: Historical inference is at its most powerful when independent lines of evidence can be integrated into a coherent account. Dating linguistic and cultural lineages can potentially play a vital role in the integration of evidence from linguistics, anthropology, archaeology and genetics. Unfortunately, although the comparative method in historical linguistics can provide a relative chronology, it cannot provide absolute date estimates and an alternative approach, called glottochronology, is fundamentally flawed. In this paper we outline how computational phylogenetic methods can reliably estimate language divergence dates and thus help resolve long-standing debates about human prehistory ranging from the origin of the Indo-European language family to the peopling of the Pacific.
Publisher: American Association for the Advancement of Science (AAAS)
Date: 24-08-2012
Abstract: English is part of the large Indo-European language family, which includes Celtic, Germanic, Italic, Balto-Slavic, and Indo-Iranian languages. The origin of this family is hotly debated: one hypothesis places the origin north of the Caspian Sea in the Pontic steppes, from where it was disseminated by Kurgan semi-nomadic pastoralists; a second suggests that Anatolia, in modern-day Turkey, is the source, and the language radiated with the spread of agriculture. Bouckaert et al. (p. 957) used phylogenetic methods and modeling to assess the geographical spread of the Indo-European language group. The findings support the suggestion that the origin of the language family was indeed Anatolia 7 to 10 thousand years ago, contemporaneous with the spread of agriculture.
Publisher: Oxford University Press (OUP)
Date: 2016
DOI: 10.1093/JOLE/LZV007
Publisher: Center for Open Science
Date: 24-06-2023
Abstract: Recent years have seen Bayesian phylogenetic methods from evolutionary biology applied to questions about language evolution in two major contexts. First, language phylogenies are now routinely used to make inferences and test hypotheses about human prehistory. Second, language phylogenies provide a solid backbone to test hypotheses about how aspects of language and culture have evolved in three key ways: by revealing the evolutionary dynamics, by modelling the trait history, and by testing coevolutionary hypotheses. In this chapter I will survey this literature, present some case studies that highlight how these tools have been and continue to be useful, and discuss some shortcomings and open problems.
Publisher: The Royal Society
Date: 12-12-2010
Abstract: Phylogenetic comparative methods (PCMs) provide a potentially powerful toolkit for testing hypotheses about cultural evolution. Here, we build on previous simulation work to assess the effect horizontal transmission between cultures has on the ability of both phylogenetic and non-phylogenetic methods to make inferences about trait evolution. We found that the mode of horizontal transmission of traits has important consequences for both methods. Where traits were horizontally transmitted separately, PCMs accurately reported when trait evolution was not correlated even at the highest levels of horizontal transmission. By contrast, linear regression analyses often incorrectly concluded that traits were correlated. Where simulated trait evolution was not correlated and traits were horizontally transmitted as a pair, both methods inferred increased levels of positive correlation with increasing horizontal transmission. Where simulated trait evolution was correlated, increasing rates of separate horizontal transmission led to decreasing levels of inferred correlation for both methods, but increasing rates of paired horizontal transmission did not. Furthermore, the PCM was also able to make accurate inferences about the ancestral state of traits. These results suggest that under certain conditions, PCMs can be robust to the effects of horizontal transmission. We discuss ways that future work can investigate the mode and tempo of horizontal transmission of cultural traits.
Publisher: Frontiers Media SA
Date: 27-04-2018
Publisher: Springer Science and Business Media LLC
Date: 16-10-2018
Abstract: The amount of available digital data for the languages of the world is constantly increasing. Unfortunately, most of the digital data are provided in a large variety of formats and therefore not amenable for comparison and re-use. The Cross-Linguistic Data Formats initiative proposes new standards for two basic types of data in historical and typological language comparison (word lists, structural datasets) and a framework to incorporate more data types (e.g. parallel texts, and dictionaries). The new specification for cross-linguistic data formats comes along with a software package for validation and manipulation, a basic ontology which links to more general frameworks, and usage examples of best practices.
Publisher: Public Library of Science (PLoS)
Date: 27-01-2017
Publisher: Springer Science and Business Media LLC
Date: 12-2007
Publisher: Public Library of Science (PLoS)
Date: 10-03-2010
Publisher: Oxford University Press (OUP)
Date: 04-06-2018
DOI: 10.1093/JOLE/LZY005
Publisher: The MIT Press
Date: 22-11-2013
Publisher: The Royal Society
Date: 03-2018
DOI: 10.1098/RSOS.171504
Abstract: The Dravidian language family consists of about 80 varieties (Hammarström H. 2016 Glottolog 2.7) spoken by 220 million people across southern and central India and surrounding countries (Steever SB. 1998 In The Dravidian languages (ed. SB Steever), pp. 1–39: 1). Neither the geographical origin of the Dravidian language homeland nor its exact dispersal through time are known. The history of these languages is crucial for understanding prehistory in Eurasia, because despite their current restricted range, these languages played a significant role in influencing other language groups including Indo-Aryan (Indo-European) and Munda (Austroasiatic) speakers. Here, we report the results of a Bayesian phylogenetic analysis of cognate-coded lexical data, elicited first hand from native speakers, to investigate the subgrouping of the Dravidian language family, and provide dates for the major points of diversification. Our results indicate that the Dravidian language family is approximately 4500 years old, a finding that corresponds well with earlier linguistic and archaeological studies. The main branches of the Dravidian language family (North, Central, South I, South II) are recovered, although the placement of languages within these main branches diverges from previous classifications. We find considerable uncertainty with regard to the relationships between the main branches.
Publisher: Public Library of Science (PLoS)
Date: 27-10-2015
Publisher: Center for Open Science
Date: 31-03-2023
Abstract: The Philippines are central to understanding the expansion of the Austronesian language family from its homeland in Taiwan. It remains unknown to what extent the distribution of Malayo-Polynesian languages has been shaped by back migrations and language leveling events following the initial Out-of-Taiwan expansion. Other aspects of language history, including the effect of language switching from non-Austronesian languages, also remain poorly understood. Here we apply Bayesian phylogenetic methods to a core-vocabulary dataset of Philippine languages. Our analysis strongly supports a sister group relationship between the Sangiric and Minahasan groups of Northern Sulawesi on one hand, and the rest of the Philippine languages on the other, which is incompatible with a simple North-to-South dispersal from Taiwan. We find a pervasive geographical signal in our results, suggesting a dominant role for cultural diffusion in the evolution of Philippine languages. However, we do find some support for a later migration of Gorontalo-Mongondow languages to Northern Sulawesi from the Philippines. Subsequent diffusion processes between languages in Sulawesi appear to have led to conflicting data and a highly unstable phylogenetic position for Gorontalo-Mongondow. In the Philippines, language switching to Austronesian in ‘Negrito’ groups appears to have occurred at different time-points throughout the Philippines, and based on our analysis, there is no discernible effect of language switching on the basic vocabulary.
Publisher: Public Library of Science (PLoS)
Date: 08-07-2016
Publisher: Springer Science and Business Media LLC
Date: 12-05-2021
DOI: 10.1057/S41599-021-00785-Y
Abstract: Humans in most cultures around the world play rule-based games, yet research on the content and structure of these games is limited. Previous studies investigating rule-based games across cultures have either focused on a small handful of cultures, thus limiting the generalizability of findings, or used cross-cultural databases from which the raw data are not accessible, thus limiting the transparency, applicability, and replicability of research findings. Furthermore, games have long been defined as competitive interactions, thereby blinding researchers to the cross-cultural variation in the cooperativeness of rule-based games. The current dataset provides ethnographic, historic information on games played in cultural groups in the Austronesian language family. These game descriptions (N games = 907) are available and codeable for researchers interested in games. We also develop a unique typology of the cooperativeness of the goal structure of games and apply this typology to the dataset. Researchers are encouraged to use this dataset to examine cross-cultural variation in the cooperativeness of games and further our understanding of human cultural behaviour on a larger scale.
Publisher: Elsevier BV
Date: 11-2018
Publisher: Oxford University Press (OUP)
Date: 07-2021
DOI: 10.1093/JOLE/LZAB005
Abstract: Bayesian phylogenetic methods provide a set of tools to efficiently evaluate large linguistic datasets by reconstructing phylogenies (family trees) that represent the history of language families. These methods provide a powerful way to test hypotheses about prehistory, regarding the subgrouping, origins, expansion, and timing of the languages and their speakers. Through phylogenetics, we gain insights into the process of language evolution in general and into how fast individual features change in particular. This article introduces Bayesian phylogenetics as applied to languages. We describe substitution models for cognate evolution, molecular clock models for the evolutionary rate along the branches of a tree, and tree generating processes suitable for linguistic data. We explain how to find the best-suited model using path sampling or nested sampling. The theoretical background of these models is supplemented by a practical tutorial describing how to set up a Bayesian phylogenetic analysis using the software tool BEAST2.
Publisher: Elsevier
Date: 2015
Publisher: Public Library of Science (PLoS)
Date: 27-05-2016
Publisher: Research Square Platform LLC
Date: 02-09-2021
DOI: 10.21203/RS.3.RS-870835/V1
Abstract: The past decades have seen substantial growth in digital data on the world's languages. At the same time, the demand for cross-linguistic datasets has been increasing, as witnessed by numerous studies devoted to diverse questions on human prehistory, cultural evolution, and human cognition. Unfortunately, the majority of published datasets lack standardization, which makes their comparison difficult. Here, we present the first step to increase the comparability of cross-linguistic lexical data. We have designed workflows for the computer-assisted lifting of datasets to Cross-Linguistic Data Formats, a collection of standards that increase the FAIRness of linguistic data. We test the Lexibank workflow on a collection of 100 lexical datasets from which we derive an aggregated database of wordlists in unified phonetic transcriptions covering more than 2000 language varieties. We illustrate the benefits of our approach by showing how phonological and lexical features can be automatically inferred, complementing and expanding existing cross-linguistic datasets.
Publisher: Proceedings of the National Academy of Sciences
Date: 04-10-2017
Abstract: Do different aspects of language evolve in different ways? Here, we infer the rates of change in lexical and grammatical data from 81 languages of the Pacific. We show that, in general, grammatical features tend to change faster and have higher amounts of conflicting signal than basic vocabulary. We suggest that subsystems of language show differing patterns of dynamics and propose that modeling this rate variation may allow us to extract more signal, and thus trace language history deeper than has been previously possible.
Publisher: Proceedings of the National Academy of Sciences
Date: 02-02-2015
Abstract: Evolutionary methods are increasingly being applied to investigating linguistic change. But does language change conform to the predictions of evolutionary theory? Here, we use data from closely related pairs of languages to show that a key prediction of evolutionary theory is met: rates of gain of new words are higher in larger populations whereas rates of word loss are greater in small populations. Our analysis provides, to our knowledge, the first statistically robust evidence of an influence of population size on rate of language change. These results demonstrate the potential for demographic factors to influence language evolution.
Publisher: Springer Science and Business Media LLC
Date: 10-2010
DOI: 10.1038/NATURE09461
Abstract: There is disagreement about whether human political evolution has proceeded through a sequence of incremental increases in complexity, or whether larger, non-sequential increases have occurred. The extent to which societies have decreased in complexity is also unclear. These debates have continued largely in the absence of rigorous, quantitative tests. We evaluated six competing models of political evolution in Austronesian-speaking societies using phylogenetic methods. Here we show that in the best-fitting model political complexity rises and falls in a sequence of small steps. This is closely followed by another model in which increases are sequential but decreases can be either sequential or in bigger drops. The results indicate that large, non-sequential jumps in political complexity have not occurred during the evolutionary history of these societies. This suggests that, despite the numerous contingent pathways of human history, there are regularities in cultural evolution that can be detected using computational phylogenetic methods.
Publisher: The Royal Society
Date: 07-04-2014
Abstract: Traditional knowledge is influenced by ancestry, inter-cultural diffusion and interaction with the natural environment. It is problematic to assess the contributions of these influences independently because closely related ethnic groups may also be geographically close, exposed to similar environments and able to exchange knowledge readily. Medicinal plant use is one of the most important components of traditional knowledge, since plants provide healthcare for up to 80% of the world's population. Here, we assess the significance of ancestry, geographical proximity of cultures and the environment in determining medicinal plant use for 12 ethnic groups in Nepal. Incorporating phylogenetic information to account for plant evolutionary relatedness, we calculate pairwise distances that describe differences in the ethnic groups' medicinal floras and floristic environments. We also determine linguistic relatedness and geographical separation for all pairs of ethnic groups. We show that medicinal uses are most similar when cultures are found in similar floristic environments. The correlation between medicinal flora and floristic environment was positive and strongly significant, in contrast to the effects of shared ancestry and geographical proximity. These findings demonstrate the importance of adaptation to local environments, even at small spatial scale, in shaping traditional knowledge during human cultural evolution.
Publisher: The Royal Society
Date: 12-12-2010
Abstract: In this paper we outline two debates about the nature of human cultural history. The first focuses on the extent to which human history is tree-like (its shape), and the second on the unity of that history (its fabric). Proponents of cultural phylogenetics are often accused of assuming that human history has been both highly tree-like and consisting of tightly linked lineages. Critics have pointed out obvious exceptions to these assumptions. Instead of a priori dichotomous disputes about the validity of cultural phylogenetics, we suggest that the debate is better conceptualized as involving positions along continuous dimensions. The challenge for empirical research is, therefore, to determine where particular aspects of culture lie on these dimensions. We discuss the ability of current computational methods derived from evolutionary biology to address these questions. These methods are then used to compare the extent to which lexical evolution is tree-like in different parts of the world and to evaluate the coherence of cultural and linguistic lineages.
Publisher: Center for Open Science
Date: 29-08-2021
Abstract: Humans currently collectively use thousands of languages [1,2]. The number of languages in a given region (i.e. language “richness”) varies widely [3–7]. Understanding the processes of diversification and homogenization that produce these patterns has been a fundamental aim of linguistics and anthropology. Empirical research to date has identified various social, environmental, geographic, and demographic factors associated with language richness [3]. However, our understanding of causal mechanisms and variation in their effects over space has been limited by prior analyses focusing on correlation and assuming stationarity [3,8]. Here we use process-based, spatially-explicit stochastic models to simulate the emergence, expansion, contraction, fragmentation, and extinction of language ranges. We varied combinations of parameter settings in these computer-simulated experiments to evaluate the extent to which different processes reproduce observed patterns of pre-colonial language richness in North America. We find that the majority of spatial variation in language richness can be explained by models in which environmental and social constraints determine population density, random shocks alter population sizes more frequently at higher population densities, and population shocks are more frequently negative than positive. Language diversification occurs when populations split after reaching size limits, and when ranges fragment due to population contractions following negative shocks or due to contact with other groups that are expanding following positive shocks. These findings support diverse theoretical perspectives arguing that language richness is shaped by environmental and social conditions, constraints on group sizes, outcomes of contact among groups, and shifting demographics driven by positive innovations, such as new subsistence strategies, or negative events, such as war or disease.
Publisher: Project MUSE
Date: 2011
DOI: 10.1353/OL.2011.0014
Publisher: American Association for the Advancement of Science (AAAS)
Date: 20-12-2019
Abstract: It is unclear whether emotion terms have the same meaning across cultures. Jackson et al. examined nearly 2500 languages to determine the degree of similarity in linguistic networks of 24 emotion terms across cultures (see the Perspective by Majid). There were low levels of similarity, and thus high variability, in the meaning of emotion terms across cultures. Similarity of emotion terms could be predicted on the basis of the geographic proximity of the languages they originate from, their hedonic valence, and the physiological arousal they evoke. Science, this issue p. 1517; see also p. 1444
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 04-07-2023
Publisher: Public Library of Science (PLoS)
Date: 24-11-2021
DOI: 10.1371/JOURNAL.PONE.0259746
Abstract: While most animals play, only humans play games. As animal play serves to teach offspring important life-skills in a safe scenario, human games might, in similar ways, teach important culturally relevant skills. Humans in all cultures play games; however, it is not clear whether variation in the characteristics of games across cultural groups is related to group-level attributes. Here we investigate specifically whether the cooperativeness of games covaries with socio-ecological differences across cultural groups. We hypothesize that cultural groups that engage in frequent inter-group conflict, cooperative sustenance acquisition, or that have less stratified social structures, might more frequently play cooperative games as compared to groups that do not share these characteristics. To test these hypotheses, we gathered data from the ethnographic record on 25 ethnolinguistic groups in the Austronesian language family. We show that cultural groups with higher levels of inter-group conflict and cooperative land-based hunting play cooperative games more frequently than other groups. Additionally, cultural groups with higher levels of intra-group conflict play competitive games more frequently than other groups. These findings indicate that games are not randomly distributed among cultures, but rather relate to the socio-ecological settings of the cultural groups that practice them. We argue that games serve as training grounds for group-specific norms and values and thereby have an important function in enculturation during childhood. Moreover, games might serve an important role in the maintenance of cultural diversity.
Start Date: 2020
End Date: 2023
Funder: Marsden Fund
Start Date: 2014
End Date: 2020
Funder: Australian Research Council
Start Date: 2023
End Date: 12-2026
Amount: $503,363.00
Funder: Australian Research Council
Start Date: 06-2012
End Date: 07-2016
Amount: $375,000.00
Funder: Australian Research Council