Figure: AI illustration
AI-Ready Data: Indonesian Researchers Map the Hidden Power Dynamics of Thesis Supervision
FORMOSA NEWS - Samarinda - The high-stakes environment of university thesis supervision is getting a digital upgrade that could soon help artificial intelligence understand the delicate balance of academic power and student mentorship. A multidisciplinary research team from Universitas Muhammadiyah Ponorogo and Universitas Widya Gama Mahakam Samarinda has built the first domain-sensitive annotated corpus of advice-giving in Indonesian thesis supervision, paving the way for advanced educational text mining. Published in the Formosa Journal of Computer and Information Science (FJCIS) in March 2026, the study was led by Elok Putri Nimasari, Adi Fajaryanto Cobantoro, Mohammad Bhanu Setyawan, Ismail Abdurrozaq, Ariyanti, and Navila Uliya Sahidah. By analyzing authentic, naturally occurring interactions between professors and undergraduate students across four Indonesian universities, the researchers created a structured dataset designed to teach machine learning models how academic authority, collaboration, and emotional support are linguistically expressed.
The Critical Need for Context-Sensitive Educational Text Mining
While modern educational text mining and Natural Language Processing (NLP) technologies are frequently used to analyze massive volumes of student feedback and course reviews, they rarely capture the interactional nuances of personalized supervision. The relationship between a student and their thesis advisor is a cornerstone of academic success, yet poor communication and unresolved power imbalances frequently lead to student isolation, stress, and delayed graduation.
In low-resource language settings like Indonesian, the lack of high-quality, manually annotated datasets has historically blocked the development of specialized AI tools capable of evaluating complex educational relationships. Because supervisory feedback blends direct orders with polite suggestions and emotional reassurance, generic sentiment analysis tools fail to grasp the actual intent behind a professor's words. Building a dedicated, linguistically accurate corpus is an essential foundational step toward deploying state-of-the-art architectures, like Bidirectional Encoder Representations from Transformers (BERT), to map these intricate communication patterns.
Methodology: Mining Authentic Supervisory Conversations
To address this resource gap, the research team used a qualitatively informed corpus development design to collect and clean 155 naturally occurring utterances from authentic undergraduate thesis supervision transcripts. The sampling strategy deliberately spanned four institutions, covering public, private, Islamic, and non-Islamic universities, to capture a cross-section of Indonesian higher education culture. The researchers developed an iterative annotation framework grounded in Zhang and Hyland's theories of linguistic power and roles. Every analyzed utterance was stripped of personal names, student IDs, and identifiable thesis titles to protect participant confidentiality. Two independent experts then coded the data by predominant interactional function, mapping the communication into three distinct power modes:
- Power-Over (Directive Advice): Authoritative corrections, explicit tasks, and strict instructions where the advisor exerts structural control.
- Power-Gaining (Collaborative Advice): Step-by-step scaffolding, guided questioning, and contextual suggestions that foster student independence.
- Power-Maintaining (Supportive Advice): Affirming statements and decision validation aimed at reducing student anxiety and keeping interpersonal balance.
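As a rough illustration of how the three modes above could be encoded for machine learning, the sketch below defines a label set and a record type. The class names, field names, and placeholder texts are hypothetical and are not taken from the published corpus; only the three mode names come from the study.

```python
from dataclasses import dataclass
from enum import Enum


class PowerMode(Enum):
    """The three power modes named in the study."""
    POWER_OVER = "directive"          # authoritative corrections, explicit tasks
    POWER_GAINING = "collaborative"   # scaffolding, guided questioning
    POWER_MAINTAINING = "supportive"  # affirmation, decision validation


@dataclass(frozen=True)
class AnnotatedUtterance:
    """One anonymized supervisor utterance with its label (hypothetical schema)."""
    utterance_id: int
    text: str        # anonymized Indonesian utterance
    mode: PowerMode


def to_label_ids(records):
    """Map each record's mode to an integer class id, in enum definition order."""
    index = {mode: i for i, mode in enumerate(PowerMode)}
    return [index[r.mode] for r in records]


# Placeholder records: the texts are stand-ins, not real corpus data.
sample = [
    AnnotatedUtterance(1, "<utterance 1>", PowerMode.POWER_OVER),
    AnnotatedUtterance(2, "<utterance 2>", PowerMode.POWER_GAINING),
    AnnotatedUtterance(3, "<utterance 3>", PowerMode.POWER_MAINTAINING),
]
print(to_label_ids(sample))  # prints [0, 1, 2]
```

Keeping the label as an enum rather than a free-text string makes it harder for spelling variants to creep into the annotations, which matters when the smallest class holds only a handful of examples.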
To verify the reliability of the dataset, the researchers calculated inter-annotator agreement using Cohen’s Kappa. The process yielded a score of 1.00, meaning the two annotators assigned the same label to every utterance. This suggests that the boundaries between the three modes are distinct enough to provide consistent labels for training machine learning algorithms.
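For readers unfamiliar with the statistic, Cohen's Kappa compares the observed agreement between two annotators with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that standard formula, not the authors' own code; the function name and the toy labels are illustrative assumptions.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item set.")
    n = len(labels_a)

    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if p_e == 1.0:
        raise ValueError("Kappa is undefined when both annotators use one label only.")
    return (p_o - p_e) / (1 - p_e)


# Toy labels for illustration only (not corpus data).
annotator_1 = ["over", "over", "gain", "gain"]
annotator_2 = ["over", "gain", "gain", "gain"]

print(cohens_kappa(annotator_1, annotator_1))  # identical labels -> 1.0
print(cohens_kappa(annotator_1, annotator_2))  # one disagreement -> 0.5
```

In the second call the annotators agree on three of four items (p_o = 0.75) while chance agreement is 0.5, which gives a kappa of 0.5. A value of 1.00, as reported in the study, only arises when there are no disagreements at all.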
Key Findings: The Heavy Weight of Directive Advice
The resulting dataset exposed a stark imbalance in how thesis guidance is delivered in Indonesian higher education. The corpus characteristics revealed the following distribution:
- Dominance of Power-Over Modes: Directive advice accounted for 63.9% of the dataset, with explicit instructions (arahan eksplisit) alone making up 43.9% of all logged utterances.
- Moderate Collaborative Scaffolding: Power-gaining interactions made up 34.8% of the data, showing that step-by-step guidance (bimbingan bertahap) is a secondary but vital technique used by advisors.
- Scarcity of Emotional Validation: Explicitly supportive advice (dukungan keputusan, or decision support) was rare, accounting for just 1.3% of the corpus.
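The distribution above can be checked with simple arithmetic, and it also shows how skewed any classifier's training data would be. The sketch below uses raw counts of 99, 54, and 2, which are an assumption back-calculated from the reported percentages and the 155-utterance total; the article itself does not list raw counts. The inverse-frequency weighting is a common heuristic for imbalanced classes, not a method attributed to the authors.

```python
# Assumed counts: back-calculated from the reported percentages and the
# 155-utterance total. The article does not state raw counts.
ASSUMED_COUNTS = {
    "power_over": 99,
    "power_gaining": 54,
    "power_maintaining": 2,
}


def percentages(counts):
    """Share of each class in percent, rounded to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def inverse_frequency_weights(counts):
    """Weight each class by total / (number_of_classes * class_count)."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


print(percentages(ASSUMED_COUNTS))
for label, weight in inverse_frequency_weights(ASSUMED_COUNTS).items():
    print(f"{label}: {weight:.2f}")
```

Under these assumed counts, the supportive class would receive a weight of roughly 26, about fifty times that of the directive class. With only two examples, reweighting alone is unlikely to be enough, and more supportive utterances would probably need to be collected before a model could learn that category.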
Implications and Real-World Impact
This dataset serves as an important bridge between qualitative language analysis and AI development. By establishing a reliably labeled corpus, the researchers have given educational technology developers the raw materials needed to fine-tune AI models capable of evaluating the health of academic mentorship. Future software built on this framework could automatically audit supervisory interactions, alerting universities to toxic or overly restrictive advising environments, while highlighting strategies that boost student well-being and graduation rates. Furthermore, systematically annotating data in low-resource languages like Indonesian helps future pedagogical AI applications remain culturally sensitive and responsive to local academic hierarchies.
Author Profiles
Elok Putri Nimasari, M.Pd. is a researcher at Universitas Muhammadiyah Ponorogo, specializing in educational linguistics, discourse analysis, and the development of language datasets for text mining.
Adi Fajaryanto Cobantoro, M.Kom. is a computer science faculty member at Universitas Muhammadiyah Ponorogo, focusing on software engineering and educational natural language processing.
Mohammad Bhanu Setyawan, M.Kom. holds a degree in information technology from Universitas Muhammadiyah Ponorogo, with expertise in data mining and computational modeling.
Ismail Abdurrozaq, S.Kom. is a computational linguistics research assistant at Universitas Muhammadiyah Ponorogo.
Ariyanti, Ph.D. is a faculty member at Universitas Widya Gama Mahakam Samarinda, specializing in language education policy and sociocultural discourse analysis.
Navila Uliya Sahidah, S.Pd. is an educational researcher affiliated with Universitas Muhammadiyah Ponorogo.
Source
Elok Putri Nimasari, Adi Fajaryanto Cobantoro, Mohammad Bhanu Setyawan, Ismail Abdurrozaq, Navila Uliya Sahidah, Ariyanti (2026): Building an Annotated Corpus of Advice-Giving in Indonesian Thesis Supervision for Educational Text Mining. Formosa Journal of Computer and Information Science (FJCIS), Vol. 5, No. 1, 2026, pp. 137-156.
DOI: https://doi.org/10.55927/fjcis.v5i1.16529
URL: https://journal.formosapublisher.org/index.php/fjcis