#web-scraping

from Raymondcamden
1 week ago

Using AgentQL and Pipedream to Fix Missing RSS Feeds

AgentQL efficiently transforms web page data into structured formats, facilitating tasks like creating RSS feeds from blogs without existing feeds.
#ai
from Hackernoon
1 year ago
Tech industry

The TechBeat: Downside Liquidity: A Hypothesis on Short Pools for EVM (7/22/2025) | HackerNoon

from Hackernoon
1 year ago

The TechBeat: Welcome to the Museum of AI Hallucinations (7/20/2025) | HackerNoon

Targeted campaigns, effective publisher selection, and real-time optimization can drive scalable growth for crypto brands, showcasing the importance of strategic marketing in this sector.
Tech industry
from Hackernoon
2 years ago

The HackerNoon Newsletter: Outsmarting Akamai's Bot Detection with JA3Proxy (7/19/2025) | HackerNoon

The Machine Economy represents not just process optimization but a profound shift in the underlying forces that drive economics, as machines take more control over economic functions.
Tech industry
from Hackernoon
2 years ago

Teaching Your AI to Read: A Guide to Scraping, RAG, and Smart Data Insights | HackerNoon

Large Language Models are reshaping data analysis by allowing natural language queries instead of traditional Business Intelligence tools.
from Hackernoon
2 years ago

Scrape Smarter, Not Harder: Let MCP and AI Write Your Next Scraper for You | HackerNoon

The Model Context Protocol (MCP) is an open standard that enables large language models to interact with external tools and data through a standardized interface.
Web development
from Hackernoon
2 weeks ago

Kasada Anti-Bot Bypass Techniques: Save Money with These Open-Source Solutions | HackerNoon

The easiest way to detect whether a website is using Kasada is to check it with Wappalyzer, which has a browser extension that identifies a site's tech stack while you visit it.
E-Commerce
#ai-bots
from ZDNET
3 weeks ago
Privacy technologies

This open-source bot blocker shields your site from pesky AI scrapers - here's how

#cloudflare
from Business Matters
1 month ago

Antidetect Browser + Automation: A Safe Setup for Web Scraping and Botting

The integration of antidetect browsers with automation frameworks is essential to counteract advanced web scraping barriers implemented by websites.
from ZDNET
1 month ago

This proxy provider I tested is the best for web scraping - and it's not IPRoyal or MarsProxies

Oxylabs provides a robust and ethical web scraping service powered by a vast network of residential proxies.
#nodejs
Privacy technologies
from ZDNET
1 month ago

Paid proxy servers vs free proxies: Is paying for a proxy service worth it?

Proxy servers serve as gateways, providing anonymity and various functionalities for both individuals and businesses.
Artificial intelligence
from ZDNET
1 month ago

Reddit sues Anthropic for scraping its users' content without consent

Reddit sues Anthropic for breaching user privacy by scraping content without consent, amid increasing legal challenges to AI content usage.
from Nature
1 month ago

Web-scraping AI bots cause disruption for scientific databases and journals

It's the wild west at the moment, the biggest issue is the sheer volume of requests [to access a website], which is causing strain on their systems. It costs money and causes disruption to genuine users.
Artificial intelligence
from Hackernoon
3 years ago

Behind the Scenes of Using Web Scraping and AI in Investigative Journalism | HackerNoon

Web scraping is essential for journalists to extract public information and hold authorities accountable.
from Hackernoon
4 months ago

AI and Proxies: Are They Connected? | HackerNoon

Proxies are crucial for overcoming data collection barriers in machine learning.
E-Commerce
from Entrepreneur
3 months ago

How Web Data Helps You Stay Ahead of the Competition | Entrepreneur

Ecommerce businesses need to leverage public web data for better decision-making across industries.
#ai-technology
Privacy technologies
from Ars Technica
3 months ago

AI bots strain Wikimedia as bandwidth surges 50%

AI crawlers are circumventing established rules, creating challenges for content platforms.
Wikimedia is focusing on a systemic initiative to address scraping issues and protect its infrastructure.
Artificial intelligence
from Techzine Global
3 months ago

Wikimedia is dealing with a 50 percent increase in bandwidth due to AI crawlers

Wikipedia's bandwidth usage surged 50%, mostly due to AI crawlers, not human users.
AI scraping creates challenges for Wikimedia, slowing access and increasing costs.
#cryptocurrency
from Hackernoon
1 year ago
Cryptocurrency

The TechBeat: Bybit's $1.5 Billion Hack Proves Crypto's Biggest Flaw Isn't the Blockchain (4/7/2025) | HackerNoon

EU data protection
from Hackernoon
3 months ago

A Guide on How to Legally Web Scrape EU Data | HackerNoon

The Markup emphasizes the importance of web scraping for data journalism while navigating legal risks, especially in the EU.
from Hackernoon
3 months ago

The TechBeat: Your Next Tech Job? Vibe Coding (4/3/2025) | HackerNoon

Web scraping in 2025 faces AI defenses, legal hurdles, and new rules. Learn smart, compliant strategies to keep your data flows running smoothly.
Cryptocurrency
Artificial intelligence
from Theregister
3 months ago

Wikimedia Foundation bemoans AI bot bandwidth burden

Web-scraping bots are straining Wikimedia's resources, increasing bandwidth usage by 50% since January 2024, heavily impacting project sustainability.
Marketing tech
from Forbes
4 months ago

New Data Shows Just How Badly OpenAI And Perplexity Are Screwing Over Publishers

AI-powered search engines are sending significantly less referral traffic to news sites compared to traditional search engines.
Bootstrapping
from Hackernoon
2 years ago

How to Build a No-Limits Stock Market Scraper with Python | HackerNoon

Building a custom web scraping solution allows for unrestricted access to financial data without the limitations of traditional APIs.
from Medium
5 months ago

Scala Web Scraping: Step-by-Step Tutorial 2025

Scala's concise syntax makes code easy to read and maintain, providing a unique balance between performance and flexibility for web scraping.
Scala
Miscellaneous
from Hackernoon
2 years ago

The HackerNoon Newsletter: Managing Stress May Be A Lot Simpler Than You Think (12/17/2024) | HackerNoon

Effective stress management is crucial in tech.
Bluesky API enhances content curation and management.
BadGPT-4o showcases a shift in AI experimentation.
Web scraping and AI facilitate efficient data extraction.
JavaScript
from Hackernoon
2 years ago

Bypassing JavaScript Challenges for Effective Web Scraping | HackerNoon

JavaScript challenges block web scraping by requiring execution of scripts that verify human presence.