How to Crawl Images with Java
Extracting Image Links with Jsoup
Jsoup is a Java library for parsing HTML, and it makes it easy to extract image links from a web page. Here is a simple example:
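```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;

public class ImageCrawler {
    public static void main(String[] args) {
        String url = "https://example.com";
        try {
            // Fetch and parse the page; a browser-like User-Agent and a timeout make the request more robust
            Document doc = Jsoup.connect(url)
                    .userAgent("Mozilla/5.0")
                    .timeout(10_000)
                    .get();
            // Select every <img> element that has a src attribute
            Elements images = doc.select("img[src]");
            for (Element image : images) {
                // absUrl resolves relative paths against the page URL and returns "" if it cannot
                String imageUrl = image.absUrl("src");
                if (!imageUrl.isEmpty()) {
                    System.out.println("Found image: " + imageUrl);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```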
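`image.absUrl("src")` turns a relative `src` into an absolute URL using the page's address. When links come from somewhere other than a Jsoup `Element` (for example Selenium, or an attribute read by hand), the JDK's `java.net.URI.resolve` can do the same job. The following is a minimal sketch under that assumption; `UrlResolver` and `resolve` are hypothetical names, and returning an empty string on failure simply mirrors what `absUrl` does.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: resolves an image reference against the page URL,
// similar in spirit to Jsoup's Element.absUrl("src").
final class UrlResolver {
    private UrlResolver() {}

    /** Returns the absolute URL, or "" if either input is not a valid URI. */
    static String resolve(String pageUrl, String ref) {
        try {
            URI base = new URI(pageUrl);
            // Some JDK versions resolve against an empty base path incorrectly
            // ("https://example.com" + "a.png" -> "https://example.coma.png"),
            // so make sure the base path is at least "/".
            if (base.getRawPath() == null || base.getRawPath().isEmpty()) {
                base = base.resolve("/");
            }
            return base.resolve(new URI(ref.trim())).toString();
        } catch (URISyntaxException e) {
            // e.g. an unescaped space in the reference
            return "";
        }
    }
}
```

For example, `UrlResolver.resolve("https://example.com/a/b.html", "../img/x.png")` gives `https://example.com/img/x.png`.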
Downloading Images with HttpClient
Once you have the image links, you can download each image with Apache HttpClient (the 4.x API):
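```java
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ImageDownloader {
    // Creating a client per call keeps the example short; when downloading many
    // images, create one CloseableHttpClient and reuse it for every request
    public static void downloadImage(String imageUrl, String savePath) {
        try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
            HttpGet request = new HttpGet(imageUrl);
            // Some sites reject requests that do not look like they come from a browser
            request.setHeader("User-Agent", "Mozilla/5.0");
            try (CloseableHttpResponse response = httpClient.execute(request)) {
                int status = response.getStatusLine().getStatusCode();
                HttpEntity entity = response.getEntity();
                // Only save successful responses; otherwise an error page would be written to disk
                if (status != 200 || entity == null) {
                    System.err.println("HTTP " + status + " for " + imageUrl);
                    return;
                }
                // Stream the body to the file in 8 KB chunks
                try (InputStream in = entity.getContent();
                     FileOutputStream out = new FileOutputStream(savePath)) {
                    byte[] buffer = new byte[8192];
                    int len;
                    while ((len = in.read(buffer)) != -1) {
                        out.write(buffer, 0, len);
                    }
                }
            }
        } catch (IOException | IllegalArgumentException e) {
            // IllegalArgumentException: HttpGet rejects malformed URLs
            e.printStackTrace();
        }
    }
}
```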
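If you would rather not add the Apache HttpClient dependency, Java 11+ ships `java.net.http.HttpClient`, which can do the same job. The sketch below assumes Java 11 or newer; `JdkImageDownloader` and `download` are hypothetical names. It reuses one client for all calls and writes the file only after an HTTP 200 response.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

// Hypothetical dependency-free variant of ImageDownloader (requires Java 11+).
final class JdkImageDownloader {
    // One shared client reuses connections across downloads
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    private JdkImageDownloader() {}

    /** Downloads imageUrl to target; returns true only on HTTP 200. */
    static boolean download(String imageUrl, Path target) {
        try {
            HttpRequest request = HttpRequest.newBuilder(URI.create(imageUrl))
                    .header("User-Agent", "Mozilla/5.0")
                    .timeout(Duration.ofSeconds(30))
                    .GET()
                    .build();
            // Buffer the body first so a failed response never leaves a file behind
            HttpResponse<byte[]> response =
                    CLIENT.send(request, HttpResponse.BodyHandlers.ofByteArray());
            if (response.statusCode() != 200) {
                return false;
            }
            Path parent = target.toAbsolutePath().getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            Files.write(target, response.body());
            return true;
        } catch (IOException | IllegalArgumentException e) {
            // Network/file errors, or a malformed URL rejected by URI.create
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Buffering with `BodyHandlers.ofByteArray()` keeps error pages off the disk at the cost of holding each image in memory; `BodyHandlers.ofFile(...)` streams straight to disk but writes whatever body the server returns, regardless of status.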
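Handling Dynamically Loaded Images
For images that are loaded dynamically (for example, inserted by JavaScript), a plain HTTP fetch does not see them. You can use Selenium WebDriver to drive a real browser instead: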

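```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
import java.time.Duration;
import java.util.List;

public class DynamicImageCrawler {
    public static void main(String[] args) {
        // Replace with the real chromedriver path; Selenium 4.6+ can locate a driver itself,
        // in which case this line can be removed
        System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://example.com");
            // Wait up to 10 s for at least one <img>, since the page builds images with JavaScript
            new WebDriverWait(driver, Duration.ofSeconds(10))
                    .until(ExpectedConditions.presenceOfElementLocated(By.tagName("img")));
            List<WebElement> images = driver.findElements(By.tagName("img"));
            for (WebElement image : images) {
                String imageUrl = image.getAttribute("src");
                System.out.println("Found image: " + imageUrl);
            }
        } finally {
            // Always close the browser, even if an exception is thrown
            driver.quit();
        }
    }
}
```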
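On pages that lazy-load images, the `src` read above is often only a placeholder (an empty value or a tiny inline `data:` GIF), and the real address sits in a custom attribute until the image scrolls into view. Attribute names vary from site to site; `data-src` and `data-original` are common conventions assumed here for illustration, not a standard. A minimal sketch, with the hypothetical class `LazyImageUrl`, for choosing the first usable value:

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical helper: picks the first real image URL from attribute values
// (read with WebElement.getAttribute) given in priority order.
final class LazyImageUrl {
    private LazyImageUrl() {}

    /** True if the value looks like a real image address rather than a placeholder. */
    static boolean isUsable(String value) {
        if (value == null) {
            return false;
        }
        String v = value.trim();
        // Empty values, inline data: URIs and about:blank are typical lazy-load placeholders
        return !v.isEmpty() && !v.startsWith("data:") && !v.equalsIgnoreCase("about:blank");
    }

    /** candidates: e.g. data-src, data-original, then src (site-specific assumption). */
    static Optional<String> pick(String... candidates) {
        return Arrays.stream(candidates)
                .filter(LazyImageUrl::isUsable)
                .map(String::trim)
                .findFirst();
    }
}
```

Inside the Selenium loop this could be called as `LazyImageUrl.pick(image.getAttribute("data-src"), image.getAttribute("data-original"), image.getAttribute("src"))`. Selenium's `getAttribute("src")` typically returns the resolved absolute URL, whereas a custom `data-*` value comes back as written and may still be relative.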
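Notes
- Follow the target site's robots.txt rules
- Leave a reasonable interval between requests to avoid being blocked
- Handle the errors that can occur, such as timeouts, HTTP error codes, and malformed URLs
- Consider using proxy IPs to cope with anti-crawling measures
- Respect image copyright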
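Checking robots.txt can be automated before a path is fetched. The sketch below is deliberately simplified and is not a full implementation of the Robots Exclusion Protocol (RFC 9309): the hypothetical class `SimpleRobots` reads only the `User-agent: *` group, treats each `Disallow` value as a plain path prefix, and ignores `Allow`, wildcards and crawl delays, so it errs on the side of not crawling. Fetching the file itself (for example `https://example.com/robots.txt`) is left to the downloader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified robots.txt reader: only the "User-agent: *" group,
// Disallow values matched as path prefixes; Allow rules and wildcards are ignored.
final class SimpleRobots {
    private final List<String> disallowed = new ArrayList<>();

    SimpleRobots(String robotsTxt) {
        boolean inStarGroup = false;
        boolean lastWasAgent = false;
        for (String rawLine : robotsTxt.split("\\R")) {
            // Strip comments and surrounding whitespace
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (key.equals("user-agent")) {
                // Consecutive User-agent lines share one group; a new run starts a new group
                if (!lastWasAgent) {
                    inStarGroup = false;
                }
                if (value.equals("*")) {
                    inStarGroup = true;
                }
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                // An empty Disallow value means "nothing is disallowed", so it is skipped
                if (inStarGroup && key.equals("disallow") && !value.isEmpty()) {
                    disallowed.add(value);
                }
            }
        }
    }

    /** path: the URL path, e.g. "/images/cat.jpg". */
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

Each download could then be gated with something like `robots.isAllowed(URI.create(imageUrl).getPath())`, keeping one `SimpleRobots` instance per host, since robots.txt applies to a single host.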
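A Complete Workflow
- Parse the page with Jsoup to obtain the image links
- Filter out invalid or duplicate links
- Download with multiple threads to improve throughput
- Save the images to a designated local directory
- Keep a download log for later processing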
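The steps above can be sketched with the JDK alone (Java 11+), taking the link list produced by the Jsoup code as input. This is an illustration under stated assumptions, not a definitive implementation: `ImagePipeline` and its methods are hypothetical names, "invalid" is taken to mean "not http/https", files are named by a SHA-256 hash of the URL, and `java.util.logging` stands in for whatever logging the project uses.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Duration;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

// Hypothetical JDK-only sketch of steps 2-5; step 1 is the Jsoup example.
final class ImagePipeline {
    private static final Logger LOG = Logger.getLogger(ImagePipeline.class.getName());
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL).build();
    private static final Set<String> EXTENSIONS =
            Set.of("jpg", "jpeg", "png", "gif", "webp", "svg", "bmp");

    /** Step 2: keep only http(s) links, drop #fragments, remove duplicates in order. */
    static List<String> filterLinks(List<String> links) {
        Set<String> unique = new LinkedHashSet<>();
        for (String link : links) {
            if (link == null) continue;
            String url = link.trim().split("#", 2)[0];
            String lower = url.toLowerCase(Locale.ROOT);
            if (lower.startsWith("http://") || lower.startsWith("https://")) unique.add(url);
        }
        return new ArrayList<>(unique);
    }

    /** Step 4: stable name = first 16 hex chars of SHA-256(url) + a known image extension. */
    static String fileNameFor(String url) {
        String noQuery = url.split("[?#]", 2)[0];
        String last = noQuery.substring(noQuery.lastIndexOf('/') + 1);
        int dot = last.lastIndexOf('.');
        String ext = dot >= 0 ? last.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
        return sha256Hex(url).substring(0, 16) + (EXTENSIONS.contains(ext) ? "." + ext : "");
    }

    private static String sha256Hex(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
            return String.format("%064x", new BigInteger(1, d));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** Steps 3-5: download with a fixed-size thread pool into dir, logging every outcome. */
    static void downloadAll(List<String> urls, Path dir, int threads)
            throws IOException, InterruptedException {
        Files.createDirectories(dir);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String url : urls) {
            pool.execute(() -> {
                try {
                    HttpRequest req = HttpRequest.newBuilder(URI.create(url))
                            .timeout(Duration.ofSeconds(30)).GET().build();
                    HttpResponse<byte[]> resp =
                            CLIENT.send(req, HttpResponse.BodyHandlers.ofByteArray());
                    if (resp.statusCode() == 200) {
                        Path target = dir.resolve(fileNameFor(url));
                        Files.write(target, resp.body());
                        LOG.info("Saved " + url + " -> " + target);
                    } else {
                        LOG.warning("HTTP " + resp.statusCode() + " for " + url);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (Exception e) {
                    LOG.warning("Failed " + url + ": " + e);
                }
            });
        }
        pool.shutdown();  // accept no new tasks; queued downloads still run
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }
}
```

A run could look like `ImagePipeline.downloadAll(ImagePipeline.filterLinks(links), Path.of("images"), 4);`. Hash-based names avoid collisions between images that share a name such as `logo.png` in different directories, and the fixed pool caps concurrent requests; the sketch adds no delay between requests, so the request-interval advice in the notes still applies.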