Usage Guidelines
Detailed information about our platform's usage, AI models, credit system, and technical limits.
DeepSeek V3 AI Model
scrapen.space uses the DeepSeek V3 AI model for advanced data extraction and processing. This state-of-the-art large language model enables our platform to intelligently parse and structure data from websites with high accuracy.
Model Capabilities
- Advanced pattern recognition for complex data structures
- Semantic understanding of web content
- Ability to extract structured data from unstructured text
- Support for multiple languages and data formats
- Contextual understanding of website layouts and content relationships
Model Limits
While powerful, the DeepSeek V3 model has certain limitations that users should be aware of when planning their workflows:
| Parameter | Limit | Notes |
|---|---|---|
| Maximum Context Length | 128,000 tokens | Content exceeding this limit is automatically split into batches |
| Rate Limits | 10 requests per minute | Higher limits available for enterprise plans |
| Maximum Output Tokens | 4,096 tokens | Per request |
| Supported Languages | 50+ languages | Best performance with English content |
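If you orchestrate requests yourself, a minimal client-side throttle can keep you under the documented 10-requests-per-minute limit. This sliding-window sketch is illustrative and not part of the platform's SDK:

```javascript
// Sliding-window throttle for the documented 10-requests-per-minute limit.
// An illustrative sketch, not platform code.
function createThrottle(maxPerMinute) {
  const timestamps = []; // send times within the last 60 seconds
  return function canSend(now = Date.now()) {
    // Drop timestamps that have aged out of the 60-second window.
    while (timestamps.length && now - timestamps[0] >= 60000) timestamps.shift();
    if (timestamps.length < maxPerMinute) {
      timestamps.push(now);
      return true;
    }
    return false; // caller should wait before retrying
  };
}
```

Calling `canSend()` before each request tells you whether the next request fits in the current window.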
Credit Calculation
Our platform uses a credit-based system to provide fair and transparent pricing based on actual usage. Here's how credits are calculated for AI extraction tasks:
Extract Data with AI Task
- Base Cost: 10 credits per execution
- Token Usage: First 30,000 tokens included in base cost
- Additional Usage: 1 extra credit for every 10,000 tokens beyond the first 30,000
Total Credits = 10 + Math.ceil(Math.max(0, (totalTokens - 30000) / 10000))
Example Calculations
| Token Usage | Credit Cost | Calculation |
|---|---|---|
| 20,000 tokens | 10 credits | Base cost only (under 30,000 tokens) |
| 35,000 tokens | 11 credits | 10 + Math.ceil((35,000 - 30,000) / 10,000) = 10 + 1 = 11 |
| 75,000 tokens | 15 credits | 10 + Math.ceil((75,000 - 30,000) / 10,000) = 10 + 5 = 15 |
| 150,000 tokens | 22 credits | 10 + Math.ceil((150,000 - 30,000) / 10,000) = 10 + 12 = 22 |
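The formula and the example rows above can be reproduced with a small helper. The function name is illustrative, not part of the platform API:

```javascript
// Mirrors the documented credit formula:
// Total Credits = 10 + ceil(max(0, (totalTokens - 30000) / 10000))
function calculateCredits(totalTokens) {
  const BASE_COST = 10;                 // credits per execution
  const INCLUDED_TOKENS = 30000;        // tokens covered by the base cost
  const TOKENS_PER_EXTRA_CREDIT = 10000; // each additional 10k tokens costs 1 credit
  const extra = Math.ceil(
    Math.max(0, (totalTokens - INCLUDED_TOKENS) / TOKENS_PER_EXTRA_CREDIT)
  );
  return BASE_COST + extra;
}
```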
Credit usage is logged and displayed in real-time during workflow execution, allowing you to monitor costs as your workflows run.
Proxy Rotation System
To ensure reliable and uninterrupted scraping, scrapen.space implements an advanced proxy rotation system:
- Automatic IP Rotation: Our system automatically rotates through a pool of high-quality proxies to prevent IP blocking and rate limiting.
- Geolocation Options: Access to proxies from multiple geographic locations to bypass region-specific restrictions.
- Failure Handling: Automatic retry with different proxies if a request fails due to proxy-related issues.
- Session Management: Maintain consistent sessions across multiple requests when needed.
Our proxy system is included in all plans at no additional cost, ensuring your workflows remain reliable even when scraping sites with strict anti-bot measures.
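The rotation-plus-retry behavior described above can be sketched as follows. `fetchViaProxy` is a hypothetical request function, and the round-robin selection is an assumption for illustration:

```javascript
// Illustrative round-robin proxy rotation with retry on proxy failure.
// `fetchViaProxy(url, proxy)` is a hypothetical request function.
async function requestWithRotation(url, proxies, fetchViaProxy, maxAttempts = 3) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const proxy = proxies[attempt % proxies.length]; // rotate through the pool
    try {
      return await fetchViaProxy(url, proxy);
    } catch (err) {
      lastError = err; // proxy-related failure: fall through to the next proxy
    }
  }
  throw lastError; // all attempts exhausted
}
```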
Timeout Settings
Timeouts are an important aspect of web scraping, allowing the platform to handle unresponsive websites and long-running processes:
| Operation | Default Timeout | Customizable |
|---|---|---|
| Page Navigation | 30 seconds | Yes (5-120 seconds) |
| Element Wait | 15 seconds | Yes (1-60 seconds) |
| AI Processing | 120 seconds | No (fixed) |
| Workflow Execution | 30 minutes | Yes (1-120 minutes) |
| API Requests | 60 seconds | No (fixed) |
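A timeout like those in the table can be expressed as a `Promise.race` wrapper. This is a generic sketch of the pattern, not the platform's internal implementation:

```javascript
// Generic timeout wrapper: rejects if `promise` does not settle within `ms`.
// The defaults in the table above (e.g. 30 s for page navigation) would be
// passed in as `ms`.
function withTimeout(promise, ms, label = 'operation') {
  const timeout = new Promise((_, reject) =>
    setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms)
  );
  return Promise.race([promise, timeout]);
}
```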
Timeout Handling
When a timeout occurs, the system handles it gracefully with the following strategies:
- Automatic Retries: The system will automatically retry failed operations up to 3 times with increasing delays.
- Detailed Logging: Timeout events are logged with detailed information to help diagnose issues.
- Partial Results: When possible, the system will return partial results rather than failing completely.
- Workflow Recovery: Long-running workflows can be resumed from the point of failure in many cases.
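The automatic-retry strategy above (up to 3 retries with increasing delays) can be sketched like this; the linear delay schedule is an assumption for illustration:

```javascript
// Retry with increasing delays, matching the "up to 3 retries" behavior
// described above. The linear delay schedule (1s, 2s, 3s) is illustrative.
async function retryWithBackoff(operation, maxRetries = 3, baseDelayMs = 1000) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        // Wait longer before each subsequent retry.
        await new Promise(r => setTimeout(r, baseDelayMs * (attempt + 1)));
      }
    }
  }
  throw lastError; // all retries exhausted
}
```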
Batch Processing
For large content that exceeds the DeepSeek V3 model's context length limit, our system automatically implements batch processing:
- Content Splitting: Large HTML or text content is intelligently split into manageable chunks.
- Parallel Processing: Chunks are processed in parallel when possible to improve performance.
- Result Merging: Results from individual chunks are intelligently merged to provide a cohesive final output.
- Deduplication: Duplicate data from overlapping chunks is automatically removed.
Batch processing is handled automatically by the system and is transparent to the user, ensuring that even very large pages can be processed effectively.
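The content-splitting step can be sketched as a simple chunker. The 4-characters-per-token approximation and the overlap size are assumptions for illustration, not the platform's actual parameters:

```javascript
// Illustrative chunker for the batching step: splits text into pieces that
// stay under a token budget, approximating tokens as ~4 characters each.
// Consecutive chunks overlap slightly so boundary data survives for the
// deduplication step.
function splitIntoChunks(text, maxTokens = 128000, charsPerToken = 4, overlapChars = 200) {
  const maxChars = maxTokens * charsPerToken;
  const chunks = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + maxChars, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break;
    start = end - overlapChars; // back up so chunks overlap
  }
  return chunks;
}
```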
Best Practices
To optimize your experience and credit usage on scrapen.space, we recommend the following best practices:
- Target Specific Elements: Instead of processing entire pages, target specific elements containing the data you need.
- Use Preprocessing: Filter and clean HTML before sending it to the AI model to reduce token usage.
- Implement Pagination: For sites with pagination, process one page at a time rather than attempting to load all pages at once.
- Test on Small Samples: Test your workflows on small data samples before running them on large datasets.
- Monitor Credit Usage: Regularly check your credit usage to identify workflows that might be consuming more credits than expected.
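For the preprocessing recommendation above, a rough sketch is to strip tags that rarely carry extractable data before sending HTML to the model. A regex approach is shown for brevity; a production pipeline would likely use a real HTML parser:

```javascript
// Rough preprocessing sketch: remove scripts, styles, and comments, then
// collapse whitespace, to cut token usage before AI extraction.
function stripNoise(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, '') // inline and external scripts
    .replace(/<style[\s\S]*?<\/style>/gi, '')   // embedded stylesheets
    .replace(/<!--[\s\S]*?-->/g, '')            // HTML comments
    .replace(/\s+/g, ' ')                       // collapse whitespace runs
    .trim();
}
```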