Now that it's been 7 years since quarantine began, I've been expanding my assortment of random Discord bots. My latest experiment has been a bot that can determine if a message has a "bad word" in it, and if so, deletes the message and then shames the user. It can also determine if the user is trying to get around the filter by using "leetspeak". This is a short write-up of how that bot works.

Disclaimer: This bot (and the following write-up) revolves around swear words. I will be talking about swear words in this write-up. If that bothers you, I would recommend not reading this. Personally, I have much worse things to worry about in 2020.

If you'd like to install the bot on your Discord server, simply click this link. FYI: You must be an administrator on the Discord server to install it. It requires the Manage Messages permission, as it deletes messages with bad words.

If you'd like to take a look at the bot, all of the code is available here on GitHub. If you have any requests or bugs to report, you can report an issue here on GitHub or reach out to me on Twitter.

In building this bot, there were 4 areas that I focused on in development, and I'll cover each briefly below.

The first problem is establishing a list of "bad words". Depending on what you're comfortable with, this can vary quite a bit. You could choose a classic: George Carlin's 7 words you can't say. Another article I found had 26 swear words listed. But I wanted to be as thorough as possible, so I chose this list of 1300 words from CMU. That being said, there were a lot of words that could be seen as "controversial" that I didn't feel the need to censor.

Next, we need a way to see if any words in a message happen to be in the designated list of "bad words". The naive approach would be a list search: every time you come across a word, you check to see if it exists in your list of "bad words". The problem with this approach is time complexity. Searching a Python list (x in list) is O(n), where n is the number of items in your list. If your list has only 7 words in it, this might be acceptable! But with over a thousand words, this isn't going to work. Thankfully, this is almost a textbook use case for a Trie.

A Trie, or prefix tree, is an ordered tree structure used to store dynamic data that can be searched. It's a unique type of tree because, unlike a binary search tree, no node in the tree stores the key that you're searching for. Instead, the key itself is distributed along the path from the root, one character per level. As a result, looking up a word takes time proportional to the length of that word, not to the number of words stored. The purpose of this post isn't to teach Tries, so if you'd like to read more, this GeeksforGeeks post is a great resource.

Tries are made up of nodes, and each node contains two properties: children, representing the nodes beneath it (usually an array), and a way to designate the END of a word. The implementation is stored here, but the code is very simple:

The _MAX_SIZE is 40 because that is the size of my …

The Trie itself has two base functions: insert and search. To "build" the Trie, you insert all of your words into the structure, and then you can search to see if a word is in your Trie. Here is an example image from TheoryOfProgramming:

For my use case, I wanted not only to add all the words from my bad words list to my Trie, but also to add leetspeak variations to the Trie to catch as many foul words as possible. To do this, I use some recursion to add extra TrieNodes when I find a letter that has a leetspeak equivalent, passing the remainder of the letters as the word to insert so that the Trie is filled in with all possibilities.

Let's look at how this would work in practice, with the word shit. Looking at each individual letter, s h i t:

This ultimately gives us 18 different options to search for:

Searching through 18 different words for one word would be slow.
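The post links to the real implementation rather than reproducing it, so what follows is only a minimal Python sketch of the idea, not the bot's actual code. It makes a few assumptions. LEET_MAP is a hypothetical substitution table. Each node's children live in a dict rather than the fixed-size array sized by _MAX_SIZE. The helpers count_words and contains_bad_word are illustrative names. insert recurses over the remaining letters and branches on every leetspeak equivalent. This means each bad word is expanded into all of its spellings once, when the Trie is built, and every later lookup just walks the Trie one character at a time.

```python
from typing import Dict, List

# Hypothetical leetspeak substitutions (an assumption, not the bot's real table).
LEET_MAP: Dict[str, List[str]] = {
    "a": ["4", "@"],
    "e": ["3"],
    "i": ["1", "!"],
    "o": ["0"],
    "s": ["5", "$"],
    "t": ["7"],
}


class TrieNode:
    def __init__(self) -> None:
        # Child nodes keyed by character (a dict instead of a fixed-size array).
        self.children: Dict[str, "TrieNode"] = {}
        # Marks that a stored word ends at this node.
        self.is_end: bool = False


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        """Insert a word together with all of its leetspeak spellings."""
        self._insert_from(self.root, word.lower())

    def _insert_from(self, node: TrieNode, remainder: str) -> None:
        if not remainder:
            node.is_end = True
            return
        letter, rest = remainder[0], remainder[1:]
        # Branch on the plain letter and each leetspeak equivalent,
        # recursing with the remaining letters down every branch.
        for variant in [letter] + LEET_MAP.get(letter, []):
            child = node.children.setdefault(variant, TrieNode())
            self._insert_from(child, rest)

    def search(self, word: str) -> bool:
        """Return True if this exact spelling was stored (case-insensitive)."""
        node = self.root
        for ch in word.lower():
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_end

    def count_words(self) -> int:
        """Count stored spellings by counting end-of-word markers."""
        total, stack = 0, [self.root]
        while stack:
            node = stack.pop()
            total += node.is_end
            stack.extend(node.children.values())
        return total


def contains_bad_word(trie: Trie, message: str) -> bool:
    """Hypothetical helper: check each whitespace-separated word."""
    return any(trie.search(word) for word in message.split())


if __name__ == "__main__":
    trie = Trie()
    trie.insert("shit")
    print(trie.count_words())  # 18
    print(contains_bad_word(trie, "well 5h!7 happens"))  # True
    print(contains_bad_word(trie, "hello there"))  # False
```

With this hypothetical map, "shit" expands to 3 × 1 × 3 × 2 = 18 stored spellings. contains_bad_word splits only on whitespace, so a word with trailing punctuation, such as "shit.", will not match. A real filter would need more careful tokenization, especially because characters like "!" and "$" double as leetspeak letters.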