arXiv | Journal Paper | 2025
Ethic-BERT: Ethical and Non-Ethical Content Classification
Abstract
Developing AI systems capable of nuanced ethical reasoning is critical as they increasingly influence human decisions, yet existing models often rely on superficial correlations rather than principled moral understanding. This paper introduces Ethic-BERT, a BERT-based model for ethical content classification across four domains: Commonsense, Justice, Virtue, and Deontology. Leveraging the ETHICS dataset, the approach integrates robust preprocessing to address vocabulary sparsity and contextual ambiguities, alongside advanced fine-tuning strategies like full model unfreezing, gradient accumulation, and adaptive learning rate scheduling. An adversarially filtered 'Hard Test' split isolates complex ethical dilemmas for robustness evaluation.
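Gradient accumulation, one of the fine-tuning strategies named in the abstract, can be illustrated with a minimal sketch. This is not the paper's training code: the one-parameter linear model, the toy data, and the names `grad_mse` and `accumulated_grad` are assumptions made purely for illustration. The sketch shows that averaging gradients over equally sized micro-batches reproduces the gradient of the full batch, which is why accumulation can emulate a larger batch size under limited memory.

```python
def grad_mse(w, batch):
    """Gradient w.r.t. w of the mean squared error for y ~ w * x on one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size):
    """Accumulate gradients over equally sized micro-batches (hypothetical helper)."""
    chunks = [
        data[i:i + micro_batch_size]
        for i in range(0, len(data), micro_batch_size)
    ]
    total = 0.0
    for chunk in chunks:
        # Divide by the number of micro-batches so the running sum
        # equals the mean gradient over the full batch.
        total += grad_mse(w, chunk) / len(chunks)
    return total


# Toy data (illustrative only, not taken from the ETHICS dataset).
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w = 0.5

full_batch = grad_mse(w, data)
accumulated = accumulated_grad(w, data, micro_batch_size=2)
print(abs(full_batch - accumulated) < 1e-9)
```

In a real fine-tuning loop the optimizer step would be taken only after the last micro-batch of each accumulation window; the equivalence above holds exactly only when the micro-batches have equal size.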
Key Achievements
82.32% average accuracy on standard test set
15.28% average accuracy improvement on Hard Test
Covers 4 ethical domains: Commonsense, Justice, Virtue, Deontology
Adversarially filtered 'Hard Test' for robustness evaluation
Advanced fine-tuning: gradient accumulation + adaptive LR scheduling
Bias-aware preprocessing pipeline
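The listing does not say which adaptive learning-rate schedule is used. As a stand-in, the sketch below assumes linear warmup followed by linear decay, a common choice when fine-tuning BERT-style models. The function name `lr_at_step` and every hyperparameter value are hypothetical, not taken from the paper.

```python
def lr_at_step(step, total_steps, warmup_steps, peak_lr):
    """Linear warmup to peak_lr, then linear decay to zero (assumed schedule)."""
    if step < warmup_steps:
        # Warmup phase: ramp the rate from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: ramp the rate from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Hypothetical settings, for illustration only.
TOTAL_STEPS = 100
WARMUP_STEPS = 10
PEAK_LR = 2e-5

schedule = [
    lr_at_step(s, TOTAL_STEPS, WARMUP_STEPS, PEAK_LR)
    for s in range(TOTAL_STEPS + 1)
]
```

The schedule starts at zero, reaches `PEAK_LR` at step `WARMUP_STEPS`, and returns to zero at `TOTAL_STEPS`. A small initial rate limits how far early updates can move the pretrained weights when the whole model is unfrozen.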
Topics
BERT, Ethics, NLP, Content Classification, Adversarial Testing