Detecting Obscene Words in Documents Using Python

Fahiz
3 min read · Sep 20, 2024


In today’s digital age, filtering out inappropriate content is essential, especially for websites, apps, and platforms that allow user-generated content. This tutorial will explore how to detect obscene words in a document using Python, focusing on clean and readable code. We will also make use of a library to ease the detection process and provide flexibility for customization.

Table of Contents:

  1. Overview
  2. Setting Up the Environment
  3. Working with a List of Obscene Words
  4. Using the obscenity Python Library (or Alternative)
  5. Step-by-Step Implementation
  6. Full Code
  7. Conclusion

1. Overview

Our goal is to read a text document, analyze it, and flag any obscene words present. This will be a basic solution that can be extended for more sophisticated use cases. We will use a pre-defined list of obscene words and a Python library for detecting them.

2. Setting Up the Environment

We will use Python for this task, along with an external profanity-detection library such as obscenity or better_profanity. Such a library lets us quickly identify and filter inappropriate content without having to manually create and maintain an exhaustive list of obscene words.

3. Working with a List of Obscene Words

You can manually create a list of obscene words if you don’t want to rely solely on an external library. However, maintaining such a list can become cumbersome. Here’s an example:

obscene_words = ['badword1', 'badword2', 'obsceneword']

But to save time and cover a broader range of words, it is preferable to use an external library whose word list is maintained and regularly updated. If you do want to stay with a manual list, a minimal check could look like the sketch below.
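This sketch only illustrates the manual approach; the word list and the contains_obscenity helper are example names, not part of any library.

obscene_words = ['badword1', 'badword2', 'obsceneword']

def contains_obscenity(text, word_list):
    """Return True if any word from word_list appears in the text."""
    words = text.lower().split()
    # Strip common punctuation so "badword1!" still matches
    return any(word.strip('.,!?;:') in word_list for word in words)

print(contains_obscenity("This sentence has badword1 in it.", obscene_words))  # True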

4. Using the obscenity Python Library

The obscenity library (or similar ones like better_profanity) is designed to detect profane language and helps in filtering out inappropriate words from user-generated content.

To install the obscenity library, you can run the following command:

pip install obscenity

If you’re using better_profanity, install it with:

pip install better_profanity

Both libraries expose a similar interface for checking whether a text string contains obscene words. The examples in this tutorial use better_profanity, since its profanity.contains_profanity method is what the detection function below relies on.
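As a quick sanity check before wiring the library into the document workflow, here is a minimal better_profanity example; the sample strings are placeholders.

from better_profanity import profanity

# better_profanity ships with a default word list
print(profanity.contains_profanity("hello world"))       # False: no flagged words
print(profanity.contains_profanity("this is bullshit"))  # True: contains a word from the default list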

5. Step-by-Step Implementation

Let’s walk through the implementation in a clean and simple manner.

Step 1: Import the Required Library

Start by importing the necessary libraries and setting up the basic structure.

from better_profanity import profanity  # the library whose API is used below
# The obscenity package mentioned above can be swapped in if you prefer it.

# Load the document
def load_document(file_path):
    """Function to load a document from the given file path"""
    try:
        with open(file_path, 'r') as file:
            content = file.read()
        return content
    except FileNotFoundError:
        print(f"File at {file_path} not found.")
        return ""

Step 2: Define a Function to Detect Obscene Words

Now we’ll define a function to scan the document for obscene words.

def detect_obscenity(content):
    """Function to detect obscene words in the given content"""
    obscene_found = False
    if profanity.contains_profanity(content):
        print("Obscene words detected!")
        obscene_found = True
    else:
        print("No obscene words detected.")

    return obscene_found
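If you also want to clean the text rather than just flag it, better_profanity provides a censor method. The helper below is an optional sketch, not part of the main script.

def censor_document(content):
    """Return the content with obscene words masked by the library."""
    # Flagged words are replaced with '****' by default
    return profanity.censor(content)

print(censor_document("this is bullshit"))  # "this is ****"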

Step 3: Main Function to Run the Detection

We will now set up the main function that will combine loading the document and detecting obscene words.

def main(file_path):
    """Main function to load the document and check for obscene words"""
    document_content = load_document(file_path)

    if document_content:
        detect_obscenity(document_content)

Step 4: Running the Code

To run the above code, save it as detect_obscenity.py and either run it from the terminal or set the file path directly in the script.

if __name__ == "__main__":
    # Provide the path to your document here
    file_path = "sample.txt"
    main(file_path)
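Assuming sample.txt sits in the same directory as the script (the file name is just an example), you would run:

python detect_obscenity.py

Depending on the contents of sample.txt, the script prints either "Obscene words detected!" or "No obscene words detected."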

6. Full Code

Here is the complete code:

from better_profanity import profanity  # or a similar library such as obscenity

def load_document(file_path):
    """Function to load a document from the given file path"""
    try:
        with open(file_path, 'r') as file:
            content = file.read()
        return content
    except FileNotFoundError:
        print(f"File at {file_path} not found.")
        return ""

def detect_obscenity(content):
    """Function to detect obscene words in the given content"""
    obscene_found = False
    if profanity.contains_profanity(content):
        print("Obscene words detected!")
        obscene_found = True
    else:
        print("No obscene words detected.")

    return obscene_found

def main(file_path):
    """Main function to load the document and check for obscene words"""
    document_content = load_document(file_path)

    if document_content:
        detect_obscenity(document_content)

if __name__ == "__main__":
    # Provide the path to your document here
    file_path = "sample.txt"
    main(file_path)

7. Conclusion

Detecting obscene or inappropriate language in documents or user-generated content can be essential for ensuring a safe and respectful environment on your platform. By using Python and libraries such as obscenity or better_profanity, you can build a profanity filter in just a few steps.

This basic approach can be expanded by incorporating more advanced techniques like machine learning models or customizable word lists based on context.
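For example, better_profanity can extend its default word list with custom terms; the sketch below assumes its add_censor_words helper, and the custom words are placeholders for whatever terms matter in your context.

from better_profanity import profanity

# Add domain-specific terms on top of the default word list
custom_words = ["badword1", "badword2"]
profanity.add_censor_words(custom_words)

print(profanity.contains_profanity("a sentence with badword1"))  # True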
