Detecting Obscene Words in Documents Using Python

Fahiz
3 min read · Sep 20, 2024


In today’s digital age, filtering out inappropriate content is essential, especially for websites, apps, and platforms that allow user-generated content. This tutorial will explore how to detect obscene words in a document using Python, focusing on clean and readable code. We will also make use of a library to ease the detection process and provide flexibility for customization.

Table of Contents:

  1. Overview
  2. Setting Up the Environment
  3. Working with a List of Obscene Words
  4. Using the obscenity Python Library (or Alternative)
  5. Step-by-Step Implementation
  6. Full Code
  7. Conclusion

1. Overview

Our goal is to read a text document, analyze it, and flag any obscene words present. This will be a basic solution that can be extended for more sophisticated use cases. We will use a pre-defined list of obscene words and a Python library for detecting them.

2. Setting Up the Environment

We will use Python for this task, along with an external profanity-detection library such as obscenity or better_profanity. Such a library lets us quickly identify and filter inappropriate content without having to manually create and maintain an exhaustive list of obscene words.

3. Working with a List of Obscene Words

You can manually create a list of obscene words if you don’t want to rely solely on an external library. However, maintaining such a list can become cumbersome. Here’s an example:

obscene_words = ['badword1', 'badword2', 'obsceneword']

But to save time and cover a broader range of words, it is preferable to use an external library whose word list is maintained and regularly updated. If you do want to stay with a manual list, a minimal check could look like the sketch below.
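This sketch only illustrates the manual approach; the word list and the contains_obscenity helper are example names, not part of any library.

obscene_words = ['badword1', 'badword2', 'obsceneword']

def contains_obscenity(text, word_list):
    """Return True if any word from word_list appears in the text."""
    words = text.lower().split()
    # Strip common punctuation so "badword1!" still matches
    return any(word.strip('.,!?;:') in word_list for word in words)

print(contains_obscenity("This sentence has badword1 in it.", obscene_words))  # True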

4. Using the obscenity Python Library

The obscenity library (or similar ones like better_profanity) is designed to detect profane language and helps in filtering out inappropriate words from user-generated content.

To install the obscenity library, you can run the following command:

pip install obscenity

If you’re using better_profanity, install it with:

pip install better_profanity

Both libraries expose a similar interface for checking whether a text string contains obscene words. The examples in this tutorial use better_profanity, since its profanity.contains_profanity method is what the detection function below relies on.
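As a quick sanity check before wiring the library into the document workflow, here is a minimal better_profanity example; the sample strings are placeholders.

from better_profanity import profanity

# better_profanity ships with a default word list
print(profanity.contains_profanity("hello world"))       # False: no flagged words
print(profanity.contains_profanity("this is bullshit"))  # True: contains a word from the default list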

5. Step-by-Step Implementation

Let’s walk through the implementation in a clean and simple manner.

Step 1: Import the Required Library

Start by importing the necessary libraries and setting up the basic structure.

from better_profanity import profanity  # the library whose API is used below
# The obscenity package mentioned above can be swapped in if you prefer it.

# Load the document
def load_document(file_path):
    """Function to load a document from the given file path"""
    try:
        with open(file_path, 'r') as file:
            content = file.read()
        return content
    except FileNotFoundError:
        print(f"File at {file_path} not found.")
        return ""

Step 2: Define a Function to Detect Obscene Words

Now we’ll define a function to scan the document for obscene words.

def detect_obscenity(content):
    """Function to detect obscene words in the given content"""
    obscene_found = False
    if profanity.contains_profanity(content):
        print("Obscene words detected!")
        obscene_found = True
    else:
        print("No obscene words detected.")

    return obscene_found
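If you also want to clean the text rather than just flag it, better_profanity provides a censor method. The helper below is an optional sketch, not part of the main script.

def censor_document(content):
    """Return the content with obscene words masked by the library."""
    # Flagged words are replaced with '****' by default
    return profanity.censor(content)

print(censor_document("this is bullshit"))  # "this is ****"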

Step 3: Main Function to Run the Detection

We will now set up the main function that will combine loading the document and detecting obscene words.

def main(file_path):
    """Main function to load the document and check for obscene words"""
    document_content = load_document(file_path)

    if document_content:
        detect_obscenity(document_content)

Step 4: Running the Code

To run the above code, save it as detect_obscenity.py and either run it from the terminal or set the file path directly in the script.

if __name__ == "__main__":
    # Provide the path to your document here
    file_path = "sample.txt"
    main(file_path)
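Assuming sample.txt sits in the same directory as the script (the file name is just an example), you would run:

python detect_obscenity.py

Depending on the contents of sample.txt, the script prints either "Obscene words detected!" or "No obscene words detected."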

6. Full Code

Here is the complete code:

from better_profanity import profanity  # or a similar library such as obscenity

def load_document(file_path):
    """Function to load a document from the given file path"""
    try:
        with open(file_path, 'r') as file:
            content = file.read()
        return content
    except FileNotFoundError:
        print(f"File at {file_path} not found.")
        return ""

def detect_obscenity(content):
    """Function to detect obscene words in the given content"""
    obscene_found = False
    if profanity.contains_profanity(content):
        print("Obscene words detected!")
        obscene_found = True
    else:
        print("No obscene words detected.")

    return obscene_found

def main(file_path):
    """Main function to load the document and check for obscene words"""
    document_content = load_document(file_path)

    if document_content:
        detect_obscenity(document_content)

if __name__ == "__main__":
    # Provide the path to your document here
    file_path = "sample.txt"
    main(file_path)

7. Conclusion

Detecting obscene or inappropriate language in documents or user-generated content can be essential for ensuring a safe and respectful environment on your platform. By using Python and libraries such as obscenity or better_profanity, you can build a profanity filter in just a few steps.

This basic approach can be expanded by incorporating more advanced techniques like machine learning models or customizable word lists based on context.
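For example, better_profanity can extend its default word list with custom terms; the sketch below assumes its add_censor_words helper, and the custom words are placeholders for whatever terms matter in your context.

from better_profanity import profanity

# Add domain-specific terms on top of the default word list
custom_words = ["badword1", "badword2"]
profanity.add_censor_words(custom_words)

print(profanity.contains_profanity("a sentence with badword1"))  # True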
