arXiv | Journal Paper | 2025

Ethic-BERT: Ethical and Non-Ethical Content Classification

Abstract

Developing AI systems capable of nuanced ethical reasoning is critical as these systems increasingly influence human decisions, yet existing models often rely on superficial correlations rather than principled moral understanding. This paper introduces Ethic-BERT, a BERT-based model for ethical content classification across four domains: Commonsense, Justice, Virtue, and Deontology. Using the ETHICS dataset, the approach combines robust preprocessing, which addresses vocabulary sparsity and contextual ambiguities, with advanced fine-tuning strategies such as full model unfreezing, gradient accumulation, and adaptive learning rate scheduling. An adversarially filtered 'Hard Test' split isolates complex ethical dilemmas for robustness evaluation.
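To make two of the fine-tuning strategies concrete, the following is a minimal sketch of gradient accumulation combined with a learning rate schedule. It is not the paper's training code: it uses a hypothetical one-parameter model (y = w * x) in plain Python in place of BERT, and it assumes a linear warmup followed by linear decay, since the summary does not say which adaptive schedule was used. The names lr_at, grad_mse, and train are illustrative only.

```python
def lr_at(step, total_steps, warmup_steps, peak_lr):
    """Assumed schedule: linear warmup to peak_lr, then linear decay toward zero."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / max(total_steps - warmup_steps, 1)


def grad_mse(w, batch):
    """Gradient of the mean squared error for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def train(data, micro_batch_size, accum_steps, epochs, peak_lr, warmup_steps):
    """Update the weight once every accum_steps micro-batches."""
    micro_batches = [
        data[i:i + micro_batch_size]
        for i in range(0, len(data), micro_batch_size)
    ]
    updates_per_epoch = len(micro_batches) // accum_steps
    total_steps = updates_per_epoch * epochs
    w, step = 0.0, 0
    for _ in range(epochs):
        accumulated = 0.0
        for i, micro_batch in enumerate(micro_batches, start=1):
            # Scale each micro-batch gradient so the sum matches the
            # gradient of one large batch of equal-sized micro-batches.
            accumulated += grad_mse(w, micro_batch) / accum_steps
            if i % accum_steps == 0:
                w -= lr_at(step, total_steps, warmup_steps, peak_lr) * accumulated
                accumulated = 0.0
                step += 1
    return w


# Toy data generated from y = 3x; a stand-in for real training examples.
data = [(k / 8, 3 * k / 8) for k in range(1, 9)]
w = train(data, micro_batch_size=2, accum_steps=4, epochs=50,
          peak_lr=1.0, warmup_steps=5)
```

Here four micro-batches of two examples stand in for one batch of eight, which is the usual motivation for accumulation: a larger effective batch size without the memory cost of processing it at once. In a real fine-tuning run the same pattern would be applied to all unfrozen BERT parameters through a deep learning framework's optimizer and scheduler.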

Key Achievements

82.32% average accuracy on standard test set
15.28% average accuracy improvement on Hard Test
Covers 4 ethical domains: Commonsense, Justice, Virtue, Deontology
Adversarially filtered 'Hard Test' for robustness evaluation
Advanced fine-tuning: gradient accumulation + adaptive LR scheduling
Bias-aware preprocessing pipeline

Topics

BERT, Ethics, NLP, Content Classification, Adversarial Testing