Web Scraping Instagram Reels and Pictures with Python

Introduction

In this tutorial, you'll learn how to scrape Instagram Reels and pictures using Python. We'll utilize libraries like Selenium and BeautifulSoup to automate the process of logging in, searching for content, and downloading media files. This guide is ideal for anyone looking to enhance their Python skills while exploring web scraping techniques.

Step 1: Import Python Libraries

Start by importing the necessary libraries. Ensure you have the following installed:

  • Selenium
  • BeautifulSoup
  • Requests

Use the following code to import them:

from selenium import webdriver
from bs4 import BeautifulSoup
import requests

Practical Tips

  • You may need to install these libraries if you haven't already. Use pip install selenium beautifulsoup4 requests in your terminal.

Step 2: Setup Chromedriver

Chromedriver is essential for using Selenium with Chrome. Download the appropriate version for your Chrome browser from the Chromedriver website.

Steps to Set Up

  1. Place the Chromedriver executable in a directory accessible to your script.
  2. Use the following code to initiate the driver. Selenium 4 removed the executable_path argument, so wrap the path in a Service object instead:
from selenium.webdriver.chrome.service import Service

driver = webdriver.Chrome(service=Service('path_to_chromedriver'))
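
If you are running Selenium 4.6 or newer, the bundled Selenium Manager can resolve a matching Chromedriver for you, so an explicit path is often unnecessary. A minimal sketch under that assumption; the headless flag and window size are optional choices, not requirements:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless=new')           # optional: run Chrome without a visible window
options.add_argument('--window-size=1280,1000')  # optional: give the page room to render the grid
driver = webdriver.Chrome(options=options)       # Selenium Manager locates Chromedriver automatically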

Step 3: Automate Login and Password

To scrape Instagram, you need to log in. Use Selenium to automate this process.

Steps to Automate Login

  1. Navigate to Instagram's login page:
driver.get('https://www.instagram.com/accounts/login/')
  2. Locate the username and password fields and input your credentials. The find_element_by_name helpers were removed in Selenium 4, so use find_element with a By locator:
from selenium.webdriver.common.by import By

username_input = driver.find_element(By.NAME, 'username')
password_input = driver.find_element(By.NAME, 'password')

username_input.send_keys('your_username')
password_input.send_keys('your_password')
  3. Submit the login form:
password_input.submit()
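
The form is submitted via JavaScript, so give Instagram a moment to redirect before moving on. A small sketch using Selenium's explicit waits; the 15-second timeout is an arbitrary choice:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Block until the browser navigates away from the login page (or raise after 15 seconds).
WebDriverWait(driver, 15).until(
    EC.url_changes('https://www.instagram.com/accounts/login/')
)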

Common Pitfalls

  • If two-factor authentication is enabled on the account, the automated login will stall at the verification prompt; use an account without it, or be prepared to complete the prompt manually.

Step 4: Automate Search

After logging in, you can automate searches for specific content.

Steps to Perform a Search

  1. Rather than driving the search bar, navigate directly to the page for a hashtag, user, or location (a sketch for waiting until the results load follows below):
search_url = 'https://www.instagram.com/explore/tags/your_hashtag/'
driver.get(search_url)
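
The results grid is rendered by JavaScript after navigation, so wait for at least one post link before parsing. A brief sketch; the assumption that post links sit inside an article element reflects Instagram's current markup and may change:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 15 seconds for the first post link to appear in the results grid.
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'article a'))
)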

Step 5: Gather Post URLs with BeautifulSoup

Once on the desired page, use BeautifulSoup to extract post URLs.

Steps to Extract URLs

  1. Parse the page content:
soup = BeautifulSoup(driver.page_source, 'html.parser')
  2. Find all the relevant post links. Post permalinks start with /p/ and Reels with /reel/, so filter on the href prefix rather than matching any link that merely contains the letter "p":
posts = soup.find_all('a', href=True)
post_urls = ['https://www.instagram.com' + post['href']
             for post in posts
             if post['href'].startswith(('/p/', '/reel/'))]
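
A hashtag page only loads the first few rows of posts and fetches more as you scroll. A sketch that scrolls and re-parses a few times, collecting unique URLs in a set; the five passes and the two-second pause are arbitrary assumptions rather than tuned values:

import time

post_urls = set()
for _ in range(5):
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    for link in soup.find_all('a', href=True):
        if link['href'].startswith(('/p/', '/reel/')):
            post_urls.add('https://www.instagram.com' + link['href'])
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')  # trigger lazy loading
    time.sleep(2)  # give newly loaded posts time to render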

Step 6: Access JSON to Collect URLs

Instagram can serve JSON data for an individual post, which contains the direct media URLs. Note that this endpoint has been changed and restricted over time, so it may require an authenticated session and should be treated as best effort.

Steps to Access JSON

  1. Use the URL of a specific post to access its JSON data:
json_url = 'https://www.instagram.com/p/your_post_id/?__a=1'
response = requests.get(json_url)
data = response.json()
  2. Extract media URLs from the JSON response:
media_url = data['graphql']['shortcode_media']['display_url']
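
Reels and other video posts expose a video URL rather than just a display image. A sketch of how that branch might look, assuming the graphql/shortcode_media structure shown above is still returned; the is_video and video_url keys belong to that historical structure and should be checked against a live response:

node = data['graphql']['shortcode_media']
if node.get('is_video'):
    media_url = node['video_url']    # Reels and other video posts
else:
    media_url = node['display_url']  # pictures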

Step 7: Download Files

Now that you have the media URLs, download the images or videos.

Steps to Download Media

  1. Create a function to handle file downloads:
def download_file(url, filename):
    response = requests.get(url)
    with open(filename, 'wb') as file:
        file.write(response.content)
  2. Call this function for each media URL, where media_urls is the list of URLs you collected in the previous steps:
for index, url in enumerate(media_urls):
    download_file(url, f'image_{index}.jpg')
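
If downloads come back as HTML or fail on larger videos, two tweaks usually help: reuse the cookies from the logged-in Selenium session so requests is authenticated, and stream the response to disk instead of loading it into memory. A sketch of an alternative helper along those lines; the chunk size is an arbitrary choice:

# Copy the browser's cookies into a requests session so media requests are authenticated.
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

def download_file(url, filename):
    response = session.get(url, stream=True)
    response.raise_for_status()  # fail loudly on 403/404 instead of saving an error page
    with open(filename, 'wb') as file:
        for chunk in response.iter_content(chunk_size=8192):
            file.write(chunk)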

Conclusion

In this tutorial, you’ve learned how to set up a web scraper for Instagram using Python, Selenium, and BeautifulSoup. You can now automate the login process, search for content, gather URLs, and download media files. As a next step, consider exploring additional features such as filtering by date or user engagement. Happy scraping!