Getting the Forum ID for episode discussions

제니 블로그

jennystar 2023. 2. 19. 15:58

When an episode airs, MyAnimeList (MAL) opens a discussion post for it, so every episode has its own forum post. For example, the anime Bleach has 366 episode discussion posts. Each of them has a unique forum ID, which we can collect by web crawling.

We plan to get the ID of each forum post and then use the API to fetch all the comments in which users share their opinions and feelings about that particular episode.

This is an experiment within the project, and I think it is heading in a positive direction at the moment.

Making a basic web crawler is simple. Jupyter Notebook has a clean, readable UI that lets users develop and test code interactively. In data science it is useful because code is organized into cells that can be run independently, and it works with many data analysis and visualization tools.

import requests  # send HTTP requests (GET/POST) to web servers 
from bs4 import BeautifulSoup  # sets up HTML/XML parser

These are the libraries that we will be using throughout the project.

In the page's HTML, the data we want is the value of the data-topic-id attribute on each topic row (data-topic-id='#', where # is the ID number).

We want those ID numbers, so the parser will target those elements.
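To make the target concrete, here is a minimal sketch that reads `data-topic-id` from a hand-written HTML fragment. The fragment, its IDs, and its titles are assumptions made up for illustration; it only mimics the two attributes described above, and the real MAL markup has more columns and may change.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the IDs and titles are invented for this example.
SAMPLE_HTML = """
<table>
  <tr id="topicRow1" data-topic-id="1001"><td>Episode 2 Discussion</td></tr>
  <tr id="topicRow2" data-topic-id="1000"><td>Episode 1 Discussion</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Attributes are read from a tag like dictionary keys; the values are strings.
first_row = soup.find("tr")
print(first_row["data-topic-id"])

# Collect the attribute from every row in the fragment.
sample_ids = [row["data-topic-id"] for row in soup.find_all("tr")]
print(sample_ids)
```

Since the values come back as strings, convert them with `int()` if numeric IDs are needed later.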

url = f"https://myanimelist.net/forum/?animeid={anime_id}&topic=episode&show={shows}"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
rows = soup.find_all('tr', {'id': lambda x: x and x.startswith('topicRow')})

This snippet sends a GET request to the URL and selects all HTML `tr` tags whose `id` attribute starts with the string "topicRow".

The lambda function checks whether the value of the id attribute starts with "topicRow"; x is the id value of each tr tag being searched. The `x and` part skips tr tags that have no id at all, for which x is None.
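The filter can be tried on its own without touching the network. The sketch below uses a made-up fragment (an assumption, not the real MAL page) containing a row without an id and a row with an unrelated id, and compares the function filter with a CSS attribute selector, which is an alternative way to express "id starts with topicRow".

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: a header row with no id, two topic rows, one unrelated row.
SAMPLE_HTML = """
<table>
  <tr><th>Topic</th></tr>
  <tr id="topicRow1" data-topic-id="1001"><td>Episode 2 Discussion</td></tr>
  <tr id="topicRow2" data-topic-id="1000"><td>Episode 1 Discussion</td></tr>
  <tr id="footerRow"><td>Pages</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def is_topic_row_id(value):
    # value is None when the tag has no id, so check that before startswith.
    return value is not None and value.startswith("topicRow")


# Same idea as the lambda, written as a named function.
rows_by_function = soup.find_all("tr", {"id": is_topic_row_id})

# CSS selector form: tr elements whose id begins with "topicRow".
rows_by_selector = soup.select('tr[id^="topicRow"]')

print([row["id"] for row in rows_by_function])
print([row["id"] for row in rows_by_selector])
```

Both approaches pick the same two rows here; which one to use is mostly a matter of taste.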

The full code ends up being:

def get_forum_ids(anime_id):
    """Collect the forum IDs of all episode discussion posts for one anime."""
    forum_ids = []  # IDs gathered across all pages
    shows = 0  # offset of the first forum post on the current page
    page_num = 1
    while True:
        # construct the URL for the current page offset
        url = f"https://myanimelist.net/forum/?animeid={anime_id}&topic=episode&show={shows}"
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')
        rows = soup.find_all('tr', {'id': lambda x: x and x.startswith('topicRow')})
        if not rows:
            break  # stop once a page has no topic rows

        # IDs found on this page
        t_id = [row['data-topic-id'] for row in rows]
        print(f"Forum IDs for page {page_num}: {t_id}")
        forum_ids.extend(t_id)
        shows += 50  # 50 forum posts for each page
        page_num += 1
    return forum_ids

forum_ids = get_forum_ids(269)

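The episode forum board starts at &show=0, which shows the latest episode discussions. The crawler starts from page 1 of the board and increases show by 50 per page until it has passed the page containing the first episode's discussion and a page comes back with no topic rows.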
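As a rough illustration of the paging arithmetic, the sketch below lists the `show` offsets for a board with a known number of posts. The helper names `page_offsets` and `episode_board_url` are hypothetical, and knowing the total in advance is an assumption made only for this example; the crawler itself does not know the total and simply stops at the first empty page.

```python
def episode_board_url(anime_id, show):
    # Same URL pattern as in the crawler above.
    return f"https://myanimelist.net/forum/?animeid={anime_id}&topic=episode&show={show}"


def page_offsets(total_topics, page_size=50):
    # Offsets 0, 50, 100, ... that cover total_topics posts.
    return list(range(0, total_topics, page_size))


# 366 discussion posts (the Bleach figure mentioned earlier) at 50 posts per page.
offsets = page_offsets(366)
print(len(offsets), offsets)

# URL of the second page for the anime ID used in the call above.
print(episode_board_url(269, offsets[1]))
```

With 366 posts this gives 8 pages, the last one starting at show=350 and holding the oldest 16 discussions.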

For each page, this builds a list of the forum post IDs.

The values from these lists will be used to fetch the information we need for the analysis later on!
