INTERVIEW QUESTION - Count the frequency of words appearing in a string Using Python
Introduction
In this tutorial, we will learn how to count the frequency of words appearing in a string using Python. This is a common interview question that tests your understanding of string manipulation and data structures. By the end of this guide, you will be able to implement a function that efficiently counts word occurrences in any given string.
Step 1: Set Up Your Python Environment
Before diving into the code, ensure that you have Python installed on your computer. You can download it from the official Python website.
- Open your code editor or IDE (like PyCharm, VSCode, or Jupyter Notebook).
- Create a new Python file named word_frequency.py.
Step 2: Define the Function
Let's start by defining a function that will take a string as input and return a dictionary containing the frequency of each word.
import string

def word_frequency(input_string):
    # Convert the string to lowercase to make the count case insensitive
    input_string = input_string.lower()
    # Replace every ASCII punctuation character (including '!' and '?') with a space.
    # Note: apostrophes are replaced too, so "don't" is counted as "don" and "t".
    for char in string.punctuation:
        input_string = input_string.replace(char, ' ')
    # Split the string into words; split() treats any whitespace, including newlines, as a separator
    words = input_string.split()
    # Create a dictionary to hold word counts
    frequency = {}
    # Count the frequency of each word
    for word in words:
        if word in frequency:
            frequency[word] += 1
        else:
            frequency[word] = 1
    return frequency
Practical Tips
- Using lower() ensures that the counting is case insensitive, meaning "Word" and "word" will be counted as the same.
- Removing punctuation is essential to avoid counting words like "word." and "word," as separate entities.
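The manual counting loop from Step 2 can also be written with collections.Counter, a dict subclass in the standard library built for tallying. The sketch below is one possible variant, not the tutorial's own method: the name word_frequency_counter is made up for illustration, and treating every character in string.punctuation as a separator is an assumption about what should count as punctuation.

```python
import string
from collections import Counter

# Translation table mapping every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def word_frequency_counter(input_string):
    """Hypothetical Counter-based variant of the word counting function."""
    cleaned = input_string.lower().translate(_PUNCT_TO_SPACE)
    # Counter tallies each item of the iterable; split() ignores extra whitespace.
    return Counter(cleaned.split())


if __name__ == "__main__":
    counts = word_frequency_counter("Hello world! Hello everyone.")
    print(counts["hello"])   # 2
    print(counts["python"])  # 0: a missing key counts as zero instead of raising KeyError
```

A Counter behaves like a regular dictionary in most respects, so code that expects a dict of counts should generally keep working with it.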
Step 3: Test the Function
Now that we have defined our function, let's test it with an example string.
if __name__ == "__main__":
    test_string = "Hello world! Hello everyone. Welcome to the world of Python."
    result = word_frequency(test_string)
    print(result)
Common Pitfalls
- Make sure to test your function with various strings, including those with different casing and punctuation, to ensure accuracy.
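- Watch out for special characters that might affect word splitting. Calling split() with no arguments already ignores leading, trailing, and repeated whitespace, but any punctuation the function does not strip stays attached to the word and is counted as a separate entry.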
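The splitting behavior behind these pitfalls can be checked directly in a few lines. This is a small illustrative sketch using only the built-in str.split(); the variable names are arbitrary.

```python
# split() with no arguments splits on any run of whitespace and
# ignores leading and trailing whitespace, so extra spaces are harmless.
default_split = "  hello   world \n".split()
print(default_split)    # ['hello', 'world']

# split(" ") splits on every single space and keeps the empty strings.
explicit_split = "hello  world ".split(" ")
print(explicit_split)   # ['hello', '', 'world', '']

# Punctuation that was not stripped beforehand stays attached to the word.
with_punctuation = "hello world! hello".split()
print(with_punctuation)  # ['hello', 'world!', 'hello']
```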
Step 4: Analyze the Output
When you run the script, you should see the frequency count of each word printed as a dictionary. For example, the output might look like this:
{'hello': 2, 'world': 2, 'everyone': 1, 'welcome': 1, 'to': 1, 'the': 1, 'of': 1, 'python': 1}
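The keys appear in the order in which each word was first seen, because dictionaries preserve insertion order in Python 3.7 and later. To read the result as a ranking instead, the items can be sorted by count. The snippet below is a minimal sketch that hard-codes the dictionary shown above; breaking ties alphabetically is an arbitrary choice made for the example.

```python
frequency = {'hello': 2, 'world': 2, 'everyone': 1, 'welcome': 1,
             'to': 1, 'the': 1, 'of': 1, 'python': 1}

# Sort by count in descending order, then alphabetically to break ties.
ranked = sorted(frequency.items(), key=lambda item: (-item[1], item[0]))

# Print the three most frequent words.
for word, count in ranked[:3]:
    print(f"{word}: {count}")
# hello: 2
# world: 2
# everyone: 1
```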
Real-World Application
This function can be useful in various scenarios, such as:
- Analyzing text data in natural language processing.
- Generating word clouds to visualize text data.
- Conducting data analysis for text-based applications.
Conclusion
In this tutorial, we learned how to count the frequency of words in a string using Python. We defined a function that handles punctuation and case sensitivity, tested it with an example, and analyzed the output. You can enhance this function further by adding features such as ignoring common stop words or sorting the results by frequency. Happy coding!
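As a starting point for the stop-word enhancement mentioned above, a filter can be applied to the finished dictionary. This is only a sketch: the STOP_WORDS set and the remove_stop_words name are invented for illustration, and a real application would likely use a much longer word list.

```python
# Illustrative stop-word list; chosen for this example only.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def remove_stop_words(frequency, stop_words=STOP_WORDS):
    """Return a new dict without the entries whose key is a stop word."""
    return {word: count for word, count in frequency.items()
            if word not in stop_words}


if __name__ == "__main__":
    counts = {'hello': 2, 'world': 2, 'to': 1, 'the': 1, 'of': 1, 'python': 1}
    print(remove_stop_words(counts))
    # {'hello': 2, 'world': 2, 'python': 1}
```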