In today’s digital age, filtering out inappropriate content is essential, especially for websites, apps, and platforms that allow user-generated content. This tutorial will explore how to detect obscene words in a document using Python, focusing on clean and readable code. We will also make use of a library to ease the detection process and provide flexibility for customization.
Table of Contents:
- Overview
- Setting Up the Environment
- Working with a List of Obscene Words
- Using the obscenity Python Library (or Alternative)
- Step-by-Step Implementation
- Full Code
- Conclusion
1. Overview
Our goal is to read a text document, analyze it, and flag any obscene words present. This will be a basic solution that can be extended for more sophisticated use cases. We will use a pre-defined list of obscene words and a Python library for detecting them.
2. Setting Up the Environment
We will use Python for this task, along with an external library called obscenity (or a similar one). This library allows us to quickly identify and filter inappropriate content without having to manually create an exhaustive list of obscene words.
3. Working with a List of Obscene Words
You can manually create a list of obscene words if you don’t want to rely solely on an external library. However, maintaining such a list can become cumbersome. Here’s an example:
obscene_words = ['badword1', 'badword2', 'obsceneword']
But to save time and ensure broader coverage, it is preferable to use an external library whose word list is regularly updated.
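As an illustration, a manual check against such a list could look like the following sketch. The word list and the simple whitespace tokenization are placeholders, not a production-grade tokenizer:

```python
# A minimal sketch of checking text against a manually maintained word list.
obscene_words = {'badword1', 'badword2', 'obsceneword'}

def contains_obscene_word(text):
    """Return True if any whole word in the text appears in the word list."""
    # Lowercase, split on whitespace, and strip common trailing punctuation.
    words = (word.strip('.,!?') for word in text.lower().split())
    return any(word in obscene_words for word in words)

print(contains_obscene_word("This line has BADWORD1 in it."))  # True
print(contains_obscene_word("A perfectly clean sentence."))    # False
```

Note that this naive approach misses variants such as deliberate misspellings or words embedded in other words, which is part of why a maintained library is preferable.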
4. Using the obscenity Python Library
The obscenity library (or a similar one, such as better_profanity) is designed to detect profane language and helps filter inappropriate words out of user-generated content.
To install the obscenity library, you can run the following command:
pip install obscenity
If you’re using better_profanity, install it with:
pip install better_profanity
Both libraries function similarly by allowing you to check for obscene words in a text string.
5. Step-by-Step Implementation
Let’s walk through the implementation in a clean and simple manner.
Step 1: Import the Required Library
Start by importing the necessary libraries and setting up the basic structure.
from better_profanity import profanity  # The detection code below uses this library's API
# Load the document
def load_document(file_path):
    """Load a document from the given file path."""
    try:
        with open(file_path, 'r') as file:
            content = file.read()
        return content
    except FileNotFoundError:
        print(f"File at {file_path} not found.")
        return ""
Step 2: Define a Function to Detect Obscene Words
Now we’ll define a function to scan the document for obscene words.
def detect_obscenity(content):
    """Detect obscene words in the given content."""
    obscene_found = False
    if profanity.contains_profanity(content):
        print("Obscene words detected!")
        obscene_found = True
    else:
        print("No obscene words detected.")
    return obscene_found
Step 3: Main Function to Run the Detection
We will now set up the main function that will combine loading the document and detecting obscene words.
def main(file_path):
    """Load the document and check for obscene words."""
    document_content = load_document(file_path)
    if document_content:
        detect_obscenity(document_content)
Step 4: Running the Code
To run the above code, save it as detect_obscenity.py and call it from the terminal, or include a file path directly.
if __name__ == "__main__":
    # Provide the path to your document here
    file_path = "sample.txt"
    main(file_path)
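Hardcoding the path is fine for a demo, but you can also read it from the command line via sys.argv. A small sketch (the helper name pick_path is ours, not part of any library):

```python
import sys

def pick_path(argv, default="sample.txt"):
    """Return the first command-line argument as the file path, or a default."""
    return argv[1] if len(argv) > 1 else default

print(pick_path(["detect_obscenity.py"]))               # sample.txt
print(pick_path(["detect_obscenity.py", "notes.txt"]))  # notes.txt
```

In the real script you would call `main(pick_path(sys.argv))`, so `python detect_obscenity.py notes.txt` checks notes.txt.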
6. Full Code
Here is the complete code:
from better_profanity import profanity

def load_document(file_path):
    """Load a document from the given file path."""
    try:
        with open(file_path, 'r') as file:
            content = file.read()
        return content
    except FileNotFoundError:
        print(f"File at {file_path} not found.")
        return ""

def detect_obscenity(content):
    """Detect obscene words in the given content."""
    obscene_found = False
    if profanity.contains_profanity(content):
        print("Obscene words detected!")
        obscene_found = True
    else:
        print("No obscene words detected.")
    return obscene_found

def main(file_path):
    """Load the document and check for obscene words."""
    document_content = load_document(file_path)
    if document_content:
        detect_obscenity(document_content)

if __name__ == "__main__":
    # Provide the path to your document here
    file_path = "sample.txt"
    main(file_path)
7. Conclusion
Detecting obscene or inappropriate language in documents or user-generated content can be essential for ensuring a safe and respectful environment on your platform. By using Python and libraries such as obscenity or better_profanity, you can build a profanity filter in just a few steps.
This basic approach can be expanded by incorporating more advanced techniques like machine learning models or customizable word lists based on context.
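For instance, a custom word list can be compiled into a single case-insensitive pattern with word boundaries, so that harmless words containing a flagged substring are not matched. A sketch using Python's standard re module (the word list is illustrative):

```python
import re

def build_filter(words):
    """Compile one case-insensitive pattern matching any listed word as a whole word."""
    # \b anchors ensure we match whole words only, not substrings of clean words.
    pattern = r'\b(?:' + '|'.join(re.escape(word) for word in words) + r')\b'
    return re.compile(pattern, re.IGNORECASE)

flagger = build_filter(['badword1', 'obsceneword'])
print(bool(flagger.search("BADWORD1 appeared here")))     # True
print(bool(flagger.search("notbadword1ish stays clean"))) # False
```

Because the pattern is built at runtime, swapping in a context-specific word list is just a matter of passing a different list to build_filter.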