Yesterday, I submitted a post asking for your help in creating a global auxiliary conlang. With not much interest being shown in the first day of posting, I decided to go ahead and embark on this project alone.
Phonology & Orthography
My "north star" when designing the phonemic inventory for the language was striking a balance between usability and accessibility. In my humble opinion, minimalist languages such as Toki Pona excel in the latter but utterly fail in the former. More complicated languages such as Esperanto and Ido tend to do the opposite (they can be quite eurocentric). I believe I have found a happy medium between the two.
Consonants
The language's consonants (and their graphs) are as follows:
- Stops: /p^h^/ (p), /b/, /t^h^/ (t), /d/, /k^h^/ (k), /g/
- Fricatives: /f/, /s/, /x~h/ (h)
- Nasals: /m/, /n/
- Approximants: /w/, /l/, /j/
- Rhotic: /r~ʀ~ɾ~ɺ~ɹ~ɻ~ʁ~.../ (r)
I have chosen to aspirate the unvoiced stops to allow speakers from more languages to distinguish them. Those whose native tongue distinguishes stops by aspiration can use [p^h^] and [p] for /p^h^/ and /b/, respectively, and those whose native tongue does so by voicing can use [p] and [b].
Rhotics vary wildly cross-linguistically, which makes including one in a lingua franca very difficult. However, with how I plan to derive the language's vocabulary, doing so seemed necessary. As such, I have coined the "whatever the heck rhotic." So long as the sound produced is a rhotic, it is the "correct" phoneme for (r). However, speakers who are able to produce multiple rhotics should use some discretion when deciding which one they use, as some can be harder to distinguish than others.
My decision to include the remaining consonants simply came down to their presence cross-linguistically. I initially planned to create a heat map of the IPA with each phoneme's "temperature" being a weighted sum of its occurrences in the languages on Ethnologue's list of the most spoken languages, but the work required for that seemed to outweigh the benefit. As such, I did what every self-respecting linguist would do and eyeballed it!
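For anyone curious what the abandoned heat map would have computed, here is a minimal sketch of the weighted sum. The function name, the `lang_a`/`lang_b` labels, and the weights are all placeholders invented for illustration; no real inventories or speaker counts are included.

```python
from collections import defaultdict


def phoneme_temperatures(inventories, weights):
    """Sum each language's weight over every phoneme in its inventory.

    inventories: dict mapping a language label to an iterable of phonemes.
    weights: dict mapping the same labels to a numeric weight
             (e.g. a speaker count; how to weight is left open here).
    """
    temps = defaultdict(float)
    for lang, phonemes in inventories.items():
        for phoneme in set(phonemes):  # count each phoneme once per language
            temps[phoneme] += weights[lang]
    return dict(temps)


# Placeholder data only: these are NOT real languages or real weights.
example_inventories = {
    "lang_a": ["p", "t", "m"],
    "lang_b": ["t", "m", "r"],
}
example_weights = {"lang_a": 2.0, "lang_b": 1.0}
example_temps = phoneme_temperatures(example_inventories, example_weights)
```

With real data, sorting the result by temperature would give a ranked list of candidate phonemes to compare against the eyeballed inventory.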
Vowels
The language's vowels are as follows:
- High: /i/, /u/
- Mid: /e/, /o/
- Low: /a/
Ah, the ol' five-vowel system, tried and true. I mean, there's not much to say here. Moving on!
Phonotactics
The language's phonotactic rules are largely inspired by those of Toki Pona. They are as follows:
- All syllables follow a (C)V(S) structure, where "S" denotes a sonorant that is not /w/ or /j/.
- Null onsets are word-initial only.
- No adjacent sonorants; codas always assimilate to following onsets.
- Words may be no longer than three syllables.
- To reduce sliding, /j/ is only permitted before /a, o, u/, and /w/ only before /a, e, i/.
- The penultimate vowel is always stressed.
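The rules above can be sketched as a rough word checker. It assumes one letter per phoneme, with each phoneme written as its IPA letter (so /j/ is `j`), which the post does not spell out for every consonant. It also reads "no adjacent sonorants" as "a coda may not be followed by a sonorant onset", and it does not model coda assimilation or stress. The name `syllabify` is made up for this example.

```python
VOWELS = set("aeiou")
CONSONANTS = set("pbtdkgfshmnwljr")
SONORANTS = set("mnwljr")
CODAS = SONORANTS - {"w", "j"}  # "S" in (C)V(S)
MAX_SYLLABLES = 3
# Vowels each glide may precede (the "reduce sliding" rule).
GLIDE_VOWELS = {"j": set("aou"), "w": set("aei")}


def syllabify(word):
    """Return the word's syllables, or None if it breaks a rule."""
    syllables = []
    prev_coda = ""
    i = 0
    while i < len(word):
        start = i
        onset = ""
        if word[i] in CONSONANTS:
            onset = word[i]
            i += 1
        elif syllables:
            return None  # null onset (or unknown letter) after the first syllable
        if prev_coda and onset in SONORANTS:
            return None  # coda followed by a sonorant onset
        if i >= len(word) or word[i] not in VOWELS:
            return None  # every syllable needs a vowel nucleus
        vowel = word[i]
        i += 1
        if onset in GLIDE_VOWELS and vowel not in GLIDE_VOWELS[onset]:
            return None  # e.g. "ji" or "wu"
        coda = ""
        # A sonorant is a coda only if it is not the onset of a following vowel.
        if i < len(word) and word[i] in CODAS:
            if i + 1 == len(word) or word[i + 1] not in VOWELS:
                coda = word[i]
                i += 1
        syllables.append(word[start:i])
        prev_coda = coda
    if not syllables or len(syllables) > MAX_SYLLABLES:
        return None
    return syllables


print(syllabify("kanta"))  # ['kan', 'ta']
print(syllabify("anma"))   # None
```

Returning the syllable list instead of a bare true/false makes it easy to find the penultimate syllable for stress later on.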
Final Thoughts
This may not be much, but I am trying to refrain from hyperfixating and making everything too quickly as I want your thoughts on every step I take. What do you like so far? What could be improved?
Collaborators are always welcome!
Edit: Typos (of course)
looks swag :D how you gonna derive the vocabulary?
I have my list of the most spoken languages in the world, so I was going to pick the top 5–10 (maybe skipping a few here and there to represent a few more language families). Then, for each word, I would try to derive a form that is at least somewhat similar to its counterpart in each of those 5–10 languages. It seems ambitious but doable, especially if the vocabulary is kept small.
nice! seems like a cool approach:D
This could be done programmatically if you don't want to do it by hand (e.g. with Levenshtein Distance) -- though I would also caution that your data source is unlikely to capture important intricacies about syllable structure (assuming word lists are not standardized to a real orthography).
Agreed. I've thought about this as well as I am a programmer and could very well make something crude to get the job done. My caution, however, isn't due to the process of deriving words, but to that of gathering accurate derivatives themselves.
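As a crude starting point for the Levenshtein idea, something like the sketch below could rank candidate words by their total edit distance to a set of source forms. `best_candidate` is a hypothetical helper, and the strings in the example are placeholders rather than real vocabulary data. It compares plain character strings, so the caveat about syllable structure and unstandardized word lists still applies.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


def best_candidate(candidates, source_forms):
    """Pick the candidate with the smallest summed distance to all sources."""
    return min(
        candidates,
        key=lambda cand: sum(levenshtein(cand, src) for src in source_forms),
    )


# Placeholder strings only, not real word-list data.
print(best_candidate(["mata", "kolu"], ["mato", "nata", "mata"]))  # mata
```

A weighted sum (for example, by speaker count) or a phoneme-aware distance would be natural refinements, but both depend on having trustworthy source forms first.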