What is Custom Extraction?
You're asking about Custom Extraction, so I'll explain it in the SEO / web scraping context. If you meant a different context, let me know.
What is Custom Extraction?
Custom Extraction means pulling specific, manually defined data out of a website or other source, as in the Screaming Frog SEO tool. It is more targeted and specific than automatic scraping.
Use case / Importance:
- You can extract specific elements from any website, such as:
  - Product prices
  - Meta descriptions
  - H1, H2 tags
  - Image URLs
  - Structured data
- Used in SEO audits when the default crawl information is not enough.
- Useful for large websites where only particular data is needed.
How it works (Screaming Frog example):
- Crawl website: Crawl the website in Screaming Frog.
- Custom Extraction tab: Define an XPath or regex here so the specific data is picked up.
- Extracted Data: After the crawl, the tool shows that data in a report, and you can also export it to CSV.
Example:
If you want to extract a product price:
- XPath: //span[@class='price']
- Or Regex: ₹(\d+,?\d*)

This extracts only the matching price and ignores the rest of the content.
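The regex approach above can be sketched in plain Python with the stdlib `re` module; the HTML snippet here is a hypothetical example page, not output from any real site:

```python
import re

# Hypothetical page fragment containing two rupee prices
html = "<span class='price'>₹1,499</span> <span class='old-price'>₹1,999</span>"

# Same pattern as above: capture digits with an optional comma group
pattern = re.compile(r"₹(\d+,?\d*)")

# findall returns only the captured group for every match
prices = pattern.findall(html)
print(prices)  # ['1,499', '1,999']
```

Note that a bare price regex matches every price on the page, including struck-through old prices, which is why a structural selector (XPath/CSS) is often the more precise choice.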
1️⃣ Custom Extraction Meaning (Detail)
Custom Extraction means defining and extracting specific data that a normal crawl does not give you.
- Normal crawl: Title, Meta, H1, H2, URL are all collected automatically.
- Custom Extraction: Used when you need special info (such as product price, SKU, ratings, reviews).
2️⃣ How it works (Technical)
You use XPaths, CSS Selectors, or Regex to define exactly what should be extracted.
- XPath: Locates elements in XML/HTML.
  Example: //div[@class='product-price']/span → picks the product price.
- CSS Selector: Selects a web element by its class/id.
  Example: div.product-price > span
- Regex (Regular Expression): Extracts text based on a pattern.
  Example: ₹(\d+,?\d*) → extracts the price digits
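To see the XPath idea in code, here is a minimal sketch using Python's stdlib `xml.etree.ElementTree`, which supports a limited XPath subset (real crawlers typically use lxml or BeautifulSoup on messy HTML; the markup below is a hypothetical, well-formed fragment):

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product markup so the stdlib parser accepts it
html = """
<html><body>
  <div class='product-price'><span>₹2,499</span></div>
  <div class='shipping'><span>Free</span></div>
</body></html>
"""

root = ET.fromstring(html)

# Same shape as the XPath example above: class predicate, then child span
price = root.find(".//div[@class='product-price']/span").text
print(price)  # ₹2,499
```

The structural selector ignores the "Free" shipping span entirely, whereas a loose text regex might not.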
3️⃣ Use Cases
- E-commerce websites: extract product name, price, SKU, ratings
- Blogs / News websites: extract author name, publish date, categories
- SEO Audits: custom checks on H1, H2, Meta tags
- Competitor Analysis: scrape specific product info from competitors
4️⃣ Steps in Screaming Frog
- Open Screaming Frog → crawl the website
- In the top menu: Configuration → Custom → Extraction
- Add a new extraction rule:
  - Type: XPath / Regex / CSS Selector
  - Name: the field's name (e.g. Product Price)
  - Pattern: define the XPath or Regex
- After the crawl completes, the extracted data appears in the Custom Extraction tab
- Export to CSV and analyse
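Once the CSV is exported, a quick sanity check can be done in Python with the stdlib `csv` module. The column names below are assumptions about what the export might contain, not the exact Screaming Frog schema:

```python
import csv
import io

# Hypothetical custom-extraction export (column names are assumptions)
export = io.StringIO(
    'Address,Product Price 1\n'
    'https://example.com/p/1,"₹1,499"\n'
    'https://example.com/p/2,\n'  # empty field: the rule matched nothing here
)

rows = list(csv.DictReader(export))

# Flag pages where the extraction rule came back empty
missing = [r["Address"] for r in rows if not r["Product Price 1"]]
print(missing)
```

Listing the pages with empty fields is a fast way to spot where an XPath or regex rule needs refining before re-crawling.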
5️⃣ Tips / Best Practices
- The XPath or Regex must be accurate, otherwise data can be missed
- Use filters on large websites, otherwise unnecessary data comes in too
- Use the Preview option in Screaming Frog so the extraction is tested first
6️⃣ Advanced Custom Extraction Concepts
A. XPath vs CSS Selector vs Regex
| Method | Use Case | Example |
|---|---|---|
| XPath | HTML structure pe based, accurate | //h1[@class='product-title'] |
| CSS Selector | Simple element selection by class/id | div.product-title > h1 |
| Regex | Text pattern match, flexible | ₹(\d+,?\d+) → extracts price digits |
Tip: If an element is loaded dynamically (JavaScript), XPath/CSS on the raw HTML can fail; in that case you need to enable the Render JavaScript option in Screaming Frog.
B. Screaming Frog Custom Extraction Types
- XPath Extraction → to select an exact HTML element
- Regex Extraction → to match data by text pattern
- Multiple Extraction Rules → to extract more than one data field in a single crawl
- Data Preview → lets you test and verify live during the crawl
C. Example: E-commerce Product Crawl
Suppose you want to extract the following from an e-commerce site:
| Field | XPath / Regex Example |
|---|---|
| Product Name | //h1[@class='product-title'] |
| Price | //span[@class='price'] or Regex ₹(\d+,?\d+) |
| SKU | //span[@id='sku'] |
| Rating | //div[@class='rating']/@data-rating |
| Availability | //p[@class='availability'] |
This way you can extract all the important fields in a single crawl.
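The multi-field idea in the table can be sketched as a dictionary of rules applied in one pass. This is a minimal Python illustration with the stdlib parser on a hypothetical, well-formed product page, not Screaming Frog's internal mechanism:

```python
import xml.etree.ElementTree as ET

# Hypothetical product page (well-formed so stdlib ElementTree can parse it)
page = """
<html><body>
  <h1 class='product-title'>Blue Kurta</h1>
  <span class='price'>₹1,299</span>
  <span id='sku'>SKU-42</span>
  <div class='rating' data-rating='4.5'>stars</div>
  <p class='availability'>In Stock</p>
</body></html>
"""

# One rule per field, mirroring the table above
rules = {
    "Product Name": ".//h1[@class='product-title']",
    "Price":        ".//span[@class='price']",
    "SKU":          ".//span[@id='sku']",
    "Availability": ".//p[@class='availability']",
}

root = ET.fromstring(page)
record = {field: root.find(xpath).text for field, xpath in rules.items()}

# Attribute values (like @data-rating) need .get() instead of .text
record["Rating"] = root.find(".//div[@class='rating']").get("data-rating")
print(record)
```

Keeping the rules in one mapping makes it easy to add or drop fields without touching the extraction loop, which is the same convenience multiple extraction rules give you in a crawler.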
D. Tips for Large Scale Extraction
- Filter by URL pattern → crawl only the relevant pages
- Avoid Overloading Server → adjust the crawl speed
- Test Extraction Rules → test on the first 5–10 pages
- Export Regularly → save the data to CSV/Excel
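The "filter by URL pattern" tip can be expressed as a small pre-filter before crawling. The URL scheme below is an assumption about a hypothetical site, purely for illustration:

```python
import re

# Assumed URL scheme: product pages live under /product/ on this hypothetical site
PRODUCT_URL = re.compile(r"^https://example\.com/product/")

def filter_urls(urls):
    """Keep only product-page URLs so the crawl stays focused and cheap."""
    return [u for u in urls if PRODUCT_URL.match(u)]

urls = [
    "https://example.com/product/123",
    "https://example.com/blog/news",
    "https://example.com/product/456",
]
print(filter_urls(urls))  # only the two product pages survive
```

Narrowing the URL set before extraction is the code equivalent of Screaming Frog's include/exclude filters: less crawling, less server load, and less noise in the export.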
E. Common Problems & Fixes
- Problem: Empty extraction field
  Fix: The XPath or Regex is incorrect, or the element is loaded dynamically
- Problem: Multiple matches for one field
  Fix: Use position() in XPath or refine the Regex pattern
- Problem: JavaScript-generated content missing
  Fix: Enable “Render JavaScript” in Screaming Frog
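The "multiple matches" problem is easy to reproduce. A sketch with the stdlib parser on a hypothetical page where two elements share a class; taking the first match plays the same role as position()=1 in full XPath:

```python
import xml.etree.ElementTree as ET

# Hypothetical page where two spans share the same class
html = """
<html><body>
  <span class='price'>₹999</span>
  <span class='price'>₹1,999</span>
</body></html>
"""
root = ET.fromstring(html)

# findall returns every match; an ambiguous rule silently grabs extras
matches = root.findall(".//span[@class='price']")
print(len(matches))     # 2
print(matches[0].text)  # ₹999 — take the first to disambiguate
```

In a full XPath engine you would write (//span[@class='price'])[1] or add a tighter predicate; either way, test the rule on a few pages before trusting the whole crawl.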