Usage Guidelines
Detailed information about our platform's usage, AI models, credit system, and technical limits.
DeepSeek V3 AI Model
scrapen.space uses the DeepSeek V3 AI model for advanced data extraction and processing. This state-of-the-art large language model enables our platform to intelligently parse and structure data from websites with high accuracy.
Model Capabilities
- Advanced pattern recognition for complex data structures
- Semantic understanding of web content
- Ability to extract structured data from unstructured text
- Support for multiple languages and data formats
- Contextual understanding of website layouts and content relationships
Model Limits
While powerful, the DeepSeek V3 model has certain limitations that users should be aware of when planning their workflows:
| Parameter | Limit | Notes |
|---|---|---|
| Maximum Context Length | 128,000 tokens | Content exceeding this limit is automatically split into batches |
| Rate Limits | 10 requests per minute | Higher limits available for enterprise plans |
| Maximum Output Tokens | 4,096 tokens | Per request |
| Supported Languages | 50+ languages | Best performance with English content |
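If you orchestrate requests yourself, a minimal client-side throttle can keep you under the documented 10-requests-per-minute limit. This sliding-window sketch is illustrative and not part of the platform's SDK:

```javascript
// Sliding-window throttle for the documented 10-requests-per-minute limit.
// An illustrative sketch, not platform code.
function createThrottle(maxPerMinute) {
  const timestamps = []; // send times within the last 60 seconds
  return function canSend(now = Date.now()) {
    // Drop timestamps that have aged out of the 60-second window.
    while (timestamps.length && now - timestamps[0] >= 60000) timestamps.shift();
    if (timestamps.length < maxPerMinute) {
      timestamps.push(now);
      return true;
    }
    return false; // caller should wait before retrying
  };
}
```

Calling `canSend()` before each request tells you whether the next request fits in the current window.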
Credit Calculation
Our platform uses a credit-based system to provide fair and transparent pricing based on actual usage. Here's how credits are calculated for AI extraction tasks:
Extract Data with AI Task
- Base Cost: 10 credits per execution
- Token Usage: First 30,000 tokens included in base cost
- Additional Usage: 1 extra credit for every 10,000 tokens beyond the first 30,000
Total Credits = 10 + Math.ceil(Math.max(0, (totalTokens - 30000) / 10000))
Example Calculations
| Token Usage | Credit Cost | Calculation |
|---|---|---|
| 20,000 tokens | 10 credits | Base cost only (under 30,000 tokens) |
| 35,000 tokens | 11 credits | 10 + Math.ceil((35,000 - 30,000) / 10,000) = 10 + 1 = 11 |
| 75,000 tokens | 15 credits | 10 + Math.ceil((75,000 - 30,000) / 10,000) = 10 + 5 = 15 |
| 150,000 tokens | 22 credits | 10 + Math.ceil((150,000 - 30,000) / 10,000) = 10 + 12 = 22 |
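The formula and the example rows above can be reproduced with a small helper. The function name is illustrative, not part of the platform API:

```javascript
// Mirrors the documented credit formula:
// Total Credits = 10 + ceil(max(0, (totalTokens - 30000) / 10000))
function calculateCredits(totalTokens) {
  const BASE_COST = 10;                 // credits per execution
  const INCLUDED_TOKENS = 30000;        // tokens covered by the base cost
  const TOKENS_PER_EXTRA_CREDIT = 10000; // each additional 10k tokens costs 1 credit
  const extra = Math.ceil(
    Math.max(0, (totalTokens - INCLUDED_TOKENS) / TOKENS_PER_EXTRA_CREDIT)
  );
  return BASE_COST + extra;
}
```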
Credit usage is logged and displayed in real-time during workflow execution, allowing you to monitor costs as your workflows run.
Proxy Rotation System
To ensure reliable and uninterrupted scraping, scrapen.space implements an advanced proxy rotation system:
- Automatic IP Rotation: Our system automatically rotates through a pool of high-quality proxies to prevent IP blocking and rate limiting.
- Geolocation Options: Access to proxies from multiple geographic locations to bypass region-specific restrictions.
- Failure Handling: Automatic retry with different proxies if a request fails due to proxy-related issues.
- Session Management: Maintain consistent sessions across multiple requests when needed.
Our proxy system is included in all plans at no additional cost, ensuring your workflows remain reliable even when scraping sites with strict anti-bot measures.
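The rotation-plus-retry behavior described above can be sketched as follows. `fetchViaProxy` is a hypothetical request function, and the round-robin selection is an assumption for illustration:

```javascript
// Illustrative round-robin proxy rotation with retry on proxy failure.
// `fetchViaProxy(url, proxy)` is a hypothetical request function.
async function requestWithRotation(url, proxies, fetchViaProxy, maxAttempts = 3) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const proxy = proxies[attempt % proxies.length]; // rotate through the pool
    try {
      return await fetchViaProxy(url, proxy);
    } catch (err) {
      lastError = err; // proxy-related failure: fall through to the next proxy
    }
  }
  throw lastError; // all attempts exhausted
}
```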
Timeout Settings
Timeouts are an important aspect of web scraping, allowing the platform to handle unresponsive websites and long-running processes:
| Operation | Default Timeout | Customizable |
|---|---|---|
| Page Navigation | 30 seconds | Yes (5-120 seconds) |
| Element Wait | 15 seconds | Yes (1-60 seconds) |
| AI Processing | 120 seconds | No (fixed) |
| Workflow Execution | 30 minutes | Yes (1-120 minutes) |
| API Requests | 60 seconds | No (fixed) |
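A timeout like those in the table can be expressed as a `Promise.race` wrapper. This is a generic sketch of the pattern, not the platform's internal implementation:

```javascript
// Generic timeout wrapper: rejects if `promise` does not settle within `ms`.
// The defaults in the table above (e.g. 30 s for page navigation) would be
// passed in as `ms`.
function withTimeout(promise, ms, label = 'operation') {
  const timeout = new Promise((_, reject) =>
    setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms)
  );
  return Promise.race([promise, timeout]);
}
```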
Timeout Handling
When a timeout occurs, the system handles it gracefully with the following strategies:
- Automatic Retries: The system will automatically retry failed operations up to 3 times with increasing delays.
- Detailed Logging: Timeout events are logged with detailed information to help diagnose issues.
- Partial Results: When possible, the system will return partial results rather than failing completely.
- Workflow Recovery: Long-running workflows can be resumed from the point of failure in many cases.
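The automatic-retry strategy above (up to 3 retries with increasing delays) can be sketched like this; the linear delay schedule is an assumption for illustration:

```javascript
// Retry with increasing delays, matching the "up to 3 retries" behavior
// described above. The linear delay schedule (1s, 2s, 3s) is illustrative.
async function retryWithBackoff(operation, maxRetries = 3, baseDelayMs = 1000) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        // Wait longer before each subsequent retry.
        await new Promise(r => setTimeout(r, baseDelayMs * (attempt + 1)));
      }
    }
  }
  throw lastError; // all retries exhausted
}
```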
Batch Processing
For large content that exceeds the DeepSeek V3 model's context length limit, our system automatically implements batch processing:
- Content Splitting: Large HTML or text content is intelligently split into manageable chunks.
- Parallel Processing: Chunks are processed in parallel when possible to improve performance.
- Result Merging: Results from individual chunks are intelligently merged to provide a cohesive final output.
- Deduplication: Duplicate data from overlapping chunks is automatically removed.
Batch processing is handled automatically by the system and is transparent to the user, ensuring that even very large pages can be processed effectively.
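The content-splitting step can be sketched as a simple chunker. The 4-characters-per-token approximation and the overlap size are assumptions for illustration, not the platform's actual parameters:

```javascript
// Illustrative chunker for the batching step: splits text into pieces that
// stay under a token budget, approximating tokens as ~4 characters each.
// Consecutive chunks overlap slightly so boundary data survives for the
// deduplication step.
function splitIntoChunks(text, maxTokens = 128000, charsPerToken = 4, overlapChars = 200) {
  const maxChars = maxTokens * charsPerToken;
  const chunks = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + maxChars, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break;
    start = end - overlapChars; // back up so chunks overlap
  }
  return chunks;
}
```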
Best Practices
To optimize your experience and credit usage on scrapen.space, we recommend the following best practices:
- Target Specific Elements: Instead of processing entire pages, target specific elements containing the data you need.
- Use Preprocessing: Filter and clean HTML before sending it to the AI model to reduce token usage.
- Implement Pagination: For sites with pagination, process one page at a time rather than attempting to load all pages at once.
- Test on Small Samples: Test your workflows on small data samples before running them on large datasets.
- Monitor Credit Usage: Regularly check your credit usage to identify workflows that might be consuming more credits than expected.
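For the preprocessing recommendation above, a rough sketch is to strip tags that rarely carry extractable data before sending HTML to the model. A regex approach is shown for brevity; a production pipeline would likely use a real HTML parser:

```javascript
// Rough preprocessing sketch: remove scripts, styles, and comments, then
// collapse whitespace, to cut token usage before AI extraction.
function stripNoise(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, '') // inline and external scripts
    .replace(/<style[\s\S]*?<\/style>/gi, '')   // embedded stylesheets
    .replace(/<!--[\s\S]*?-->/g, '')            // HTML comments
    .replace(/\s+/g, ' ')                       // collapse whitespace runs
    .trim();
}
```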