Task #181: Generic Web Scraping Engine for Vidyarti - VIDYARTI - sita softwares - project management and bug tracking


Added by Dana Basheer about 2 months ago. Updated about 2 months ago.

Status: In Progress
Priority: High
Assignee: Sreemayi C M
Start date: 03/17/2026
Due date:
% Done:
Estimated time:

Description

Develop a centralized web scraping module that can fetch data from multiple external sources and map it into different modules of Vidyarti such as:

Syllabus
Current Affairs
Mock Test Questions
Study Materials

The system should be configurable (sources and their parsing rules are stored as data, not hard-coded), reusable across modules, and scalable to new sources.

Tables

vid_scraping_source_master

Column | Type | Description
id | INT (PK) | Source ID
source_name | VARCHAR(150) | Website name
base_url | VARCHAR(255) | Website URL
module_type | ENUM('current_affairs','syllabus','mock_test','study_material') | Target module
parsing_rules | TEXT | Scraping rules, stored as JSON
status | BOOLEAN | Active/Inactive
created_at | DATETIME | Creation date
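
The task does not define the format of parsing_rules. As a minimal sketch, assume a hypothetical format in which "item" is a regex that splits a page into entries and "title" and "content" are regexes whose first capture group holds the value. The function name and rule keys are illustrative only; a production engine would more likely use an HTML parser with CSS or XPath selectors, since regexes are fragile on real HTML.

```python
import json
import re

# Hypothetical rule format (not specified in the task): each value is a regex
# whose first capture group is the extracted text.
EXAMPLE_RULES = json.dumps({
    "item": r"<article>(.*?)</article>",
    "title": r"<h2>(.*?)</h2>",
    "content": r"<p>(.*?)</p>",
})


def apply_parsing_rules(html: str, parsing_rules: str) -> list:
    """Split a page into items, then extract title and content from each item."""
    rules = json.loads(parsing_rules)  # parsing_rules is stored as JSON text
    items = re.findall(rules["item"], html, flags=re.DOTALL)
    records = []
    for item in items:
        record = {}
        for field in ("title", "content"):
            pattern = rules.get(field)
            match = re.search(pattern, item, flags=re.DOTALL) if pattern else None
            # Missing fields become None so the row can still be staged for review.
            record[field] = match.group(1).strip() if match else None
        records.append(record)
    return records
```

Each returned dict would map onto raw_title and raw_content in the staging table.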

vid_scraped_data_staging

Column | Type | Description
id | INT (PK) | ID
source_id | INT (FK) | Reference source
module_type | VARCHAR(50) | Target module
raw_title | TEXT | Extracted title
raw_content | TEXT | Extracted content
raw_data | JSON | Full raw scraped data
source_url | VARCHAR(255) | Original link
status | ENUM('pending','approved','rejected') | Workflow status
created_at | DATETIME | Scraped time
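
The status column implies a review workflow: scraped rows wait as pending until someone approves or rejects them. A minimal sketch of that workflow follows, assuming that approved and rejected are final states (the task does not say whether a decision can be reversed). The class name StagedRecord and the transition method are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Assumption: only pending rows can be reviewed; approved/rejected are final.
ALLOWED_TRANSITIONS = {
    "pending": {"approved", "rejected"},
    "approved": set(),
    "rejected": set(),
}


@dataclass
class StagedRecord:
    source_id: int
    module_type: str
    raw_title: str
    raw_content: str
    source_url: str
    raw_data: dict = field(default_factory=dict)
    status: str = "pending"  # every newly scraped row enters the review queue
    created_at: datetime = field(default_factory=datetime.now)

    def transition(self, new_status: str) -> None:
        """Move the record to a new workflow status, rejecting illegal moves."""
        if new_status not in ALLOWED_TRANSITIONS[self.status]:
            raise ValueError(f"cannot move from {self.status!r} to {new_status!r}")
        self.status = new_status
```

Keeping the allowed moves in one table makes it easy to relax the rule later (for example, letting a rejected row return to pending) without touching the method.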

vid_scraping_logs

Column | Type | Description
id | INT (PK) | Log ID
source_id | INT | Source reference
status | VARCHAR(50) | Success/Failed
message | TEXT | Error or success message
run_time | DATETIME | Execution time
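
One way to fill vid_scraping_logs is to wrap every scrape run so that both success and failure produce a log row. The sketch below uses an in-memory SQLite database only to stay self-contained; the helper names create_log_table and run_with_log, and the convention that the scrape callable returns the number of staged rows, are assumptions.

```python
import sqlite3
from datetime import datetime
from typing import Callable


def create_log_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS vid_scraping_logs (
               id INTEGER PRIMARY KEY,
               source_id INTEGER,
               status VARCHAR(50),
               message TEXT,
               run_time DATETIME)"""
    )


def run_with_log(conn: sqlite3.Connection, source_id: int,
                 scrape: Callable[[], int]) -> bool:
    """Run one scrape for a source and record the outcome in vid_scraping_logs."""
    run_time = datetime.now().isoformat(sep=" ", timespec="seconds")
    try:
        count = scrape()  # assumed to return the number of rows staged
        status, message, ok = "Success", f"{count} record(s) staged", True
    except Exception as exc:  # log the failure instead of stopping the scheduler
        status, message, ok = "Failed", f"{type(exc).__name__}: {exc}", False
    with conn:  # the connection context manager commits the insert
        conn.execute(
            "INSERT INTO vid_scraping_logs (source_id, status, message, run_time) "
            "VALUES (?, ?, ?, ?)",
            (source_id, status, message, run_time),
        )
    return ok
```

Catching broadly here is deliberate: one broken source should produce a Failed row rather than abort the runs for every other source.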

Validations

Backend

source_name → required
base_url → valid URL
module_type → must be a valid enum value
source_url → unique (avoid duplicates)
Prevent duplicate data: reject a scraped record whose source_url or title matches an existing record
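
The backend rules above can be sketched as two small checks. This is a minimal illustration, assuming that a "valid URL" means an absolute http or https URL and that titles are compared after collapsing whitespace and lower-casing (the task says only "same title"). In the real system the duplicate check would be a database query or a unique index rather than a scan over an in-memory list; the function names are hypothetical.

```python
from urllib.parse import urlparse

MODULE_TYPES = {"current_affairs", "syllabus", "mock_test", "study_material"}


def validate_source(source_name: str, base_url: str, module_type: str) -> list:
    """Return a list of validation errors for a new scraping source."""
    errors = []
    if not source_name or not source_name.strip():
        errors.append("source_name is required")
    parsed = urlparse(base_url or "")
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        errors.append("base_url must be a valid http(s) URL")
    if module_type not in MODULE_TYPES:
        errors.append("module_type must be one of " + ", ".join(sorted(MODULE_TYPES)))
    return errors


def normalize_title(title: str) -> str:
    # Assumption: titles differing only in spacing or case count as the same.
    return " ".join(title.split()).lower()


def is_duplicate(source_url: str, raw_title: str, existing: list) -> bool:
    """existing holds (source_url, raw_title) pairs already in staging."""
    title = normalize_title(raw_title)
    return any(url == source_url or normalize_title(t) == title
               for url, t in existing)
```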

Frontend

Required fields:

Source Name
URL
Module Type

Other checks:

Parsing rules → must be valid JSON
Preview of a test scrape (optional)
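
The form would run its JSON check in the browser, but repeating the same check on the server keeps malformed rules out of vid_scraping_source_master. The sketch is written in Python to match the other examples; requiring a non-empty JSON object is an assumption, since the task does not define the rule format, and the function name is hypothetical.

```python
import json
from typing import Optional


def check_parsing_rules(text: str) -> Optional[str]:
    """Return an error message for the form, or None if the rules are usable."""
    try:
        rules = json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno/colno point the user at the exact position of the syntax error.
        return (f"Parsing rules are not valid JSON "
                f"(line {exc.lineno}, column {exc.colno}): {exc.msg}")
    if not isinstance(rules, dict) or not rules:
        return "Parsing rules must be a non-empty JSON object"
    return None
```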


Updated by Sreemayi C M about 2 months ago

Status changed from New to In Progress
