Linguistics Data Browser

A Streamlit app to explore and compare grammatical and vocabulary data across multiple languages.

This app was built as an autonomous programming project in the autumn semester of 2025.

It resulted in a Streamlit web app for exploring structured linguistic data across multiple languages. You can browse grammar rules and vocabulary for a single language or compare them across languages through a clean, filterable interface. Grammar topics and vocabulary entries can be selected and expanded to show detailed descriptions, and comparison views highlight similarities and differences between languages, including Romance languages (French, Spanish, …) and Bantu languages (Zulu, Chichewa, …).
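As a rough illustration of what a comparison view has to compute, here is a minimal sketch in plain Python. The record fields (`language`, `topic`, `description`), the sample entries, and the function name `compare_topic` are all assumptions made for this example; they are not taken from the app's code, and the Streamlit rendering is left out.

```python
from typing import Optional

# Hypothetical sample records; the app's real data and field names may differ.
GRAMMAR_ENTRIES = [
    {"language": "French", "topic": "comparatives",
     "description": "plus + adjective + que"},
    {"language": "Spanish", "topic": "comparatives",
     "description": "más + adjective + que"},
    {"language": "Zulu", "topic": "noun classes",
     "description": "Nouns are grouped into classes marked by prefixes."},
]


def compare_topic(entries: list[dict], topic: str,
                  languages: list[str]) -> dict[str, Optional[str]]:
    """Map each requested language to its description of `topic`.

    Languages with no entry for the topic map to None, so a comparison
    view can show the gap explicitly instead of silently dropping it.
    """
    by_language = {
        e["language"]: e["description"]
        for e in entries
        if e["topic"] == topic
    }
    return {lang: by_language.get(lang) for lang in languages}


comparison = compare_topic(GRAMMAR_ENTRIES, "comparatives",
                           ["French", "Spanish", "Zulu"])
for language, description in comparison.items():
    print(f"{language}: {description or '(no entry)'}")
```

Returning `None` for a missing topic keeps every selected language in the result, which makes it straightforward to render one column per language side by side.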

The underlying data is generated by using LLM-based processing to synthesize and normalize linguistic information from sources such as Wikipedia, Wiktionary, and textbooks into a uniform schema. The result is a promising, though currently imperfect, tool for language learners that runs in the browser. For more details about the application, consult the repository.
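To make the idea of a uniform schema concrete, the sketch below shows one way a vocabulary record could be normalized. It is only an illustration: the `VocabularyEntry` fields, the key aliases, and `normalize_record` are hypothetical, and a simple rule-based mapping stands in for the LLM-based processing the project actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabularyEntry:
    """Hypothetical uniform schema for one vocabulary item."""
    language: str
    lemma: str
    part_of_speech: str
    gloss: str
    source: str


# Assumed variations in how different sources might label the same field.
_KEY_ALIASES = {
    "lemma": ("lemma", "word", "headword"),
    "part_of_speech": ("part_of_speech", "pos"),
    "gloss": ("gloss", "translation", "meaning"),
}


def normalize_record(raw: dict, language: str, source: str) -> VocabularyEntry:
    """Map a raw record with source-specific keys onto the uniform schema."""
    fields = {}
    for target, aliases in _KEY_ALIASES.items():
        # Take the first alias present in the raw record.
        value = next((raw[a] for a in aliases if a in raw), None)
        if value is None:
            raise ValueError(f"missing field {target!r} in record from {source}")
        fields[target] = str(value).strip()
    fields["part_of_speech"] = fields["part_of_speech"].lower()
    return VocabularyEntry(language=language, source=source, **fields)


entry = normalize_record(
    {"word": " kitabu ", "pos": "Noun", "translation": "book"},
    language="Swahili",
    source="Wiktionary",
)
print(entry)
```

Once every source is mapped onto the same record type, browsing and comparison code only needs to handle one shape of data, regardless of where an entry came from.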

Above: a view of the grammar comparison between German and Romanian comparatives.
Below: a view of the vocabulary comparison of Swahili loan words in Chichewa and Shona.