What is Custom Extraction?

You are asking about Custom Extraction, so I will explain it in the SEO / web scraping context. If you meant a different context, let me know.

What is Custom Extraction?

Custom Extraction means pulling specific data out of a website or source by manually defining what to extract, as you can do in the Screaming Frog SEO tool. It is more targeted and specific than generic automatic scraping.

Use cases / Importance:

  1. You can extract specific elements from any website (see the sketch after this list), such as:

    • Product prices

    • Meta descriptions

    • H1, H2 tags

    • Image URLs

    • Structured data

  2. It is used in SEO audits when the default crawl data is not enough.

  3. It is useful for large websites where you only need particular data.
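For illustration, here is a minimal Python sketch (not Screaming Frog itself) that pulls a few of these elements with XPath. The URL is a hypothetical placeholder, and the requests and lxml libraries are assumptions for the example:

```python
import requests
from lxml import html

url = "https://example.com/sample-page"  # hypothetical page
tree = html.fromstring(requests.get(url, timeout=10).text)

meta_description = tree.xpath("//meta[@name='description']/@content")
h1_tags = tree.xpath("//h1/text()")
image_urls = tree.xpath("//img/@src")

print(meta_description, h1_tags, image_urls)
```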

How it works (Screaming Frog example):

  1. Crawl the website: Crawl the site in Screaming Frog.

  2. Custom Extraction tab: Define an XPath or regex here so that the specific data gets picked up.

  3. Extracted data: After the crawl, the tool shows that data in the report, and you can also export it to CSV (a minimal Python analogue of this workflow follows below).
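To make the crawl → extract → export workflow concrete outside the tool, here is a rough Python sketch of the same idea: fetch a few pages, apply one extraction rule, and write the results to a CSV. The URLs, the XPath, and the libraries used (requests, lxml) are illustrative assumptions, not part of Screaming Frog:

```python
import csv
import requests
from lxml import html

urls = [
    "https://example.com/page-1",   # hypothetical URLs
    "https://example.com/page-2",
]
xpath_rule = "//span[@class='price']/text()"  # the extraction rule

with open("extraction.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["url", "extracted"])
    for url in urls:
        tree = html.fromstring(requests.get(url, timeout=10).text)
        matches = tree.xpath(xpath_rule)
        writer.writerow([url, "; ".join(m.strip() for m in matches)])
```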

Example:

If you want to extract a product price:

  • XPath: //span[@class='price']

  • Or Regex: ₹(\d+,?\d*)

This will extract only the matching price and ignore the rest of the content.
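As a rough sketch of what these two rules do, the snippet below applies the same XPath and regex to a small inline HTML fragment; the fragment and the lxml/re usage are assumptions for the example:

```python
import re
from lxml import html

fragment = "<div><span class='price'>₹1,299</span><span>In stock</span></div>"
tree = html.fromstring(fragment)

# XPath rule: only the <span class='price'> text is returned
print(tree.xpath("//span[@class='price']/text()"))   # ['₹1,299']

# Regex rule: only the number after the ₹ sign is captured
print(re.findall(r"₹(\d+,?\d*)", fragment))          # ['1,299']
```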

Below, the same topic is broken down in more detail, along with a step-by-step Custom Extraction setup in Screaming Frog.

1️⃣ Custom Extraction Meaning (Detail)

Custom Extraction means defining and extracting specific data that a normal crawl does not give you.

  • Normal crawl: Title, Meta, H1, H2, and URL are all collected automatically.

  • Custom Extraction: Used when you need special information (such as product price, SKU, ratings, reviews).


2️⃣ How it works (Technical)

You use XPath, CSS Selectors, or Regex to define exactly what should be extracted (see the sketch after this list).

  • XPath: For locating elements in XML/HTML.
    Example: //div[@class='product-price']/span → picks the product price.

  • CSS Selector: Selects a web element by class or id, the way stylesheets target it.
    Example: div.product-price > span

  • Regex (Regular Expression): Extracts text based on a pattern.
    Example: ₹(\d+,?\d*) → extracts the price digits
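A minimal sketch of the first two methods picking the same element, using this section's example selectors on a made-up fragment; it assumes the lxml and BeautifulSoup libraries are installed:

```python
from lxml import html
from bs4 import BeautifulSoup

fragment = "<div class='product-price'><span>₹2,499</span></div>"

# XPath walks the HTML structure explicitly
print(html.fromstring(fragment).xpath("//div[@class='product-price']/span/text()"))
# -> ['₹2,499']

# The CSS selector targets the same element the way a stylesheet would
soup = BeautifulSoup(fragment, "html.parser")
print([el.get_text() for el in soup.select("div.product-price > span")])
# -> ['₹2,499']
```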


3️⃣ Use Cases

  1. E-commerce websites: Extracting product name, price, SKU, ratings

  2. Blogs / News websites: Extracting author name, publish date, categories (see the sketch below)

  3. SEO Audits: Custom checks on H1, H2, and Meta tags

  4. Competitor Analysis: Scraping specific product info from competitors
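As an example of the second use case, here is a minimal sketch that pulls an author name and publish date out of a blog article; the HTML structure, class names, and library are assumptions made up for illustration:

```python
from lxml import html

article = """
<article>
  <h1>Sample Post</h1>
  <span class="author">Jane Doe</span>
  <time datetime="2024-05-01">1 May 2024</time>
</article>
"""
tree = html.fromstring(article)

author = tree.xpath("//span[@class='author']/text()")
publish_date = tree.xpath("//time/@datetime")
print(author, publish_date)   # ['Jane Doe'] ['2024-05-01']
```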


4️⃣ Steps in Screaming Frog

  1. Open Screaming Frog → crawl the website

  2. In the top menu, go to Configuration → Custom → Extraction

  3. Add a new extraction rule:

    • Type: XPath / Regex / CSS Selector

    • Name: The field name (e.g. Product Price)

    • Pattern: Define the XPath or Regex

  4. After the crawl completes, the extracted data appears in the Custom Extraction tab

  5. Export to CSV and analyse the data (a small sketch of such an analysis follows below)
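A quick way to sanity-check the export is to load the CSV and count empty extraction fields. This is only a sketch: it assumes pandas is available, and the file name and column names ("Product Price 1", "Address") are placeholders, since the real export columns depend on how the rule was named:

```python
import pandas as pd

df = pd.read_csv("custom_extraction_export.csv")   # hypothetical export file
column = "Product Price 1"                          # placeholder column name

missing = df[column].isna().sum()
print(f"{missing} of {len(df)} pages have no extracted value in '{column}'")
print(df[df[column].isna()]["Address"].head())      # URLs where the rule matched nothing
```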


5️⃣ Tips / Best Practices

  • The XPath or Regex must be accurate, otherwise data can be missed

  • Use filters on large websites, otherwise unnecessary data will come in as well

  • Use the Preview option in Screaming Frog so the extraction can be tested first



6️⃣ Advanced Custom Extraction Concepts

A. XPath vs CSS Selector vs Regex

  • XPath: based on the HTML structure, accurate. Example: //h1[@class='product-title']

  • CSS Selector: simple element selection by class/id. Example: div.product-title > h1

  • Regex: flexible text-pattern matching. Example: ₹(\d+,?\d+) → extracts the price numbers

Tip: If an element is loaded dynamically (JavaScript), XPath/CSS can fail on the raw HTML; in that case you need to enable the Render JavaScript option in Screaming Frog (a headless-browser sketch of the same idea follows below).
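Outside Screaming Frog, the equivalent idea is to render the page in a headless browser before extracting. A minimal sketch, assuming Playwright is installed and its browsers downloaded; the URL and price class are placeholders:

```python
from lxml import html
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com/js-rendered-product")  # hypothetical URL
    rendered = page.content()   # HTML after JavaScript has executed
    browser.close()

# The XPath now also sees elements that were injected by JavaScript
tree = html.fromstring(rendered)
print(tree.xpath("//span[@class='price']/text()"))
```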


B. Screaming Frog Custom Extraction Types

  1. XPath Extraction → for selecting an exact HTML element

  2. Regex Extraction → for matching data by text pattern

  3. Multiple Extraction Rules → for extracting more than one data field in a single crawl

  4. Data Preview → lets you test and verify rules live during the crawl


C. Example: E-commerce Product Crawl

Suppose you want to extract the following from an e-commerce site:

  • Product Name: //h1[@class='product-title']

  • Price: //span[@class='price'] or Regex ₹(\d+,?\d+)

  • SKU: //span[@id='sku']

  • Rating: //div[@class='rating']/@data-rating

  • Availability: //p[@class='availability']

This way you can extract all the important fields in a single crawl.
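The same multi-field idea, sketched in Python: one dictionary of field names to XPath rules, applied to each crawled page. The URL and class names are placeholders taken from the list above:

```python
import requests
from lxml import html

rules = {
    "Product Name": "//h1[@class='product-title']/text()",
    "Price": "//span[@class='price']/text()",
    "SKU": "//span[@id='sku']/text()",
    "Rating": "//div[@class='rating']/@data-rating",
    "Availability": "//p[@class='availability']/text()",
}

url = "https://example.com/product/123"   # hypothetical product page
tree = html.fromstring(requests.get(url, timeout=10).text)

record = {field: tree.xpath(xp) for field, xp in rules.items()}
print(record)   # one row of extracted fields for this page
```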


D. Tips for Large Scale Extraction

  1. Filter by URL pattern → crawl only the relevant pages

  2. Avoid overloading the server → adjust the crawl speed

  3. Test extraction rules → test on 5–10 pages first (see the sketch after this list)

  4. Export regularly → save the data to CSV/Excel
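A rough sketch of the first three tips outside the tool: keep only URLs that match a pattern, limit the test run to a handful of pages, and pause between requests so the server is not overloaded. The URLs, pattern, rule, and delay are all illustrative assumptions:

```python
import re
import time
import requests
from lxml import html

all_urls = [
    "https://example.com/product/1",
    "https://example.com/product/2",
    "https://example.com/blog/why-we-exist",
]

# 1) Filter by URL pattern: keep only product pages
product_urls = [u for u in all_urls if re.search(r"/product/\d+", u)]

# 3) Test the rule on a handful of pages first
for url in product_urls[:5]:
    tree = html.fromstring(requests.get(url, timeout=10).text)
    print(url, tree.xpath("//span[@class='price']/text()"))
    time.sleep(1)   # 2) throttle so the server is not overloaded
```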


E. Common Problems & Fixes

  • Problem: Empty extraction field
    Fix: The XPath or Regex is incorrect, or the element is loaded dynamically

  • Problem: Multiple matches for one field
    Fix: Use position() in the XPath or refine the Regex pattern (see the sketch after this list)

  • Problem: JavaScript-generated content missing
    Fix: Enable "Render JavaScript" in Screaming Frog
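For the multiple-matches case, restricting the XPath by position keeps only the occurrence you want. A minimal sketch with a made-up fragment containing two price spans:

```python
from lxml import html

fragment = """
<div>
  <span class='price'>₹999</span>
  <span class='price'>₹1,499</span>
</div>
"""
tree = html.fromstring(fragment)

# Unrestricted rule: both prices match
print(tree.xpath("//span[@class='price']/text()"))            # ['₹999', '₹1,499']

# Restricted with position(): only the first match is kept
print(tree.xpath("//span[@class='price'][position()=1]/text()"))  # ['₹999']
```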
