arXiv, Journal Paper, 2025

Ethic-BERT: Ethical and Non-Ethical Content Classification

Abstract

Developing AI systems capable of nuanced ethical reasoning is critical as they increasingly influence human decisions, yet existing models often rely on superficial correlations rather than principled moral understanding. This paper introduces Ethic-BERT, a BERT-based model for ethical content classification across four domains: Commonsense, Justice, Virtue, and Deontology. Using the ETHICS dataset, the approach combines robust preprocessing, which addresses vocabulary sparsity and contextual ambiguity, with fine-tuning strategies such as full model unfreezing, gradient accumulation, and adaptive learning rate scheduling. An adversarially filtered 'Hard Test' split isolates complex ethical dilemmas for robustness evaluation.
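The two fine-tuning techniques named in the abstract, gradient accumulation and learning rate scheduling, can be illustrated with a minimal sketch. This is not the Ethic-BERT training code: the toy one-parameter linear model, the data, the function names, and the linear warmup-then-decay schedule are all assumptions chosen for illustration, since the abstract does not say which schedule or accumulation settings were used.

```python
# Hypothetical toy setup: 1-D linear model y = w * x with mean squared error.
# All names and values here are illustrative assumptions, not from the paper.

def grad_mse(w, batch):
    """Gradient of the mean squared error over `batch` with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w, batch, micro_batch_size):
    """Accumulate gradients over micro-batches before a single update.

    Each micro-batch gradient is weighted by its share of the full batch,
    so the accumulated value equals the full-batch gradient while only one
    micro-batch needs to be processed at a time.
    """
    total = 0.0
    for start in range(0, len(batch), micro_batch_size):
        micro = batch[start:start + micro_batch_size]
        total += grad_mse(w, micro) * len(micro) / len(batch)
    return total

def linear_warmup_decay(step, total_steps, warmup_steps, peak_lr):
    """Assumed schedule: linear warmup to peak_lr, then linear decay."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    remaining = total_steps - step
    return peak_lr * max(remaining, 0) / (total_steps - warmup_steps)

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w = 0.0
total_steps = 20
for step in range(total_steps):
    lr = linear_warmup_decay(step, total_steps, warmup_steps=4, peak_lr=0.05)
    # One optimizer step per accumulated (effective) batch.
    w -= lr * accumulated_grad(w, data, micro_batch_size=2)
```

In a real fine-tuning run the same idea lets a large effective batch fit in limited GPU memory: the loss for each micro-batch is scaled by its share of the effective batch, and the optimizer and scheduler step once per accumulated batch rather than once per micro-batch.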

Key Achievements

82.32% average accuracy on standard test set

15.28% average accuracy improvement on Hard Test

Covers 4 ethical domains: Commonsense, Justice, Virtue, Deontology

Adversarially filtered 'Hard Test' for robustness evaluation

Advanced fine-tuning: gradient accumulation + adaptive LR scheduling

Bias-aware preprocessing pipeline

Topics

BERT, Ethics, NLP, Content Classification, Adversarial Testing
