
Implementing OCR in PHP

2026-01-29 21:47:33 | PHP

Ways to Implement OCR in PHP

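PHP can provide OCR (optical character recognition) either by calling a third-party OCR service or by using a local engine or library. The common approaches are: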

Using third-party OCR services and tools

Google Cloud Vision API
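The Google Cloud Vision API makes OCR straightforward. Install Google's PHP client library with Composer and download a service-account JSON key from the Google Cloud console; the client authenticates with that key file.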


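require 'vendor/autoload.php';
use Google\Cloud\Vision\VisionClient;

// Authenticate with the service-account key file downloaded from the Cloud console
$vision = new VisionClient(['keyFile' => json_decode(file_get_contents('path/to/key.json'), true)]);
$image = $vision->image(file_get_contents('path/to/image.jpg'), ['TEXT_DETECTION']);
$result = $vision->annotate($image);

// text() is null when nothing is detected; the first entry holds the full text,
// the remaining entries are individual words
foreach ($result->text() ?? [] as $text) {
    echo $text->description() . PHP_EOL;
}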
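If you would rather not depend on the client library, the same TEXT_DETECTION feature can be requested from the Vision REST endpoint `https://vision.googleapis.com/v1/images:annotate` using an API key. The sketch below is one way to do it, assuming the curl extension is available; the function names `buildVisionRequest`, `extractVisionText` and `callVision` are hypothetical. It base64-encodes the image, asks for `TEXT_DETECTION`, and reads `fullTextAnnotation.text` from the first response.

```php
<?php
// Build the JSON body for images:annotate with a single TEXT_DETECTION request.
function buildVisionRequest(string $imageBytes): string
{
    return json_encode([
        'requests' => [[
            'image'    => ['content' => base64_encode($imageBytes)],
            'features' => [['type' => 'TEXT_DETECTION']],
        ]],
    ]);
}

// Pull the full detected text out of a decoded response (null if none was found).
function extractVisionText(array $response): ?string
{
    return $response['responses'][0]['fullTextAnnotation']['text'] ?? null;
}

// POST the request with an API key; returns the decoded JSON, or null on transport failure.
function callVision(string $imagePath, string $apiKey): ?array
{
    $ch = curl_init('https://vision.googleapis.com/v1/images:annotate?key=' . urlencode($apiKey));
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_POSTFIELDS     => buildVisionRequest(file_get_contents($imagePath)),
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body === false ? null : json_decode($body, true);
}
```

Usage would look like `echo extractVisionText(callVision('path/to/image.jpg', 'your-api-key') ?? []);`. Keeping request building and response parsing in separate pure functions makes them easy to test without network access.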

Tesseract OCR
Tesseract is an open-source OCR engine that runs locally. PHP can invoke its command-line tool through the exec function.

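$imagePath = 'path/to/image.jpg';
// Tesseract appends ".txt" itself, so pass the output path without an extension
$outputBase = 'path/to/output';
// escapeshellarg() guards against spaces and shell metacharacters in paths
$cmd = 'tesseract ' . escapeshellarg($imagePath) . ' ' . escapeshellarg($outputBase);
exec($cmd, $output, $returnCode);

if ($returnCode === 0) {
    echo file_get_contents($outputBase . '.txt');
} else {
    echo "OCR failed." . PHP_EOL;
}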
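For repeated use, the call can be wrapped in a pair of functions. This is a minimal sketch assuming a Unix-like system with `tesseract` on the PATH; the names `buildTesseractCommand` and `tesseractOcr` are hypothetical. Passing `stdout` as the output base makes tesseract print the text instead of writing a file, and `-l` selects the language pack (for example `chi_sim` for Simplified Chinese, which must be installed separately).

```php
<?php
// Build the tesseract command line. "stdout" as the output base prints the
// recognized text instead of writing a .txt file; -l picks the language pack.
function buildTesseractCommand(string $imagePath, string $lang = 'eng'): string
{
    return sprintf(
        'tesseract %s stdout -l %s',
        escapeshellarg($imagePath),
        escapeshellarg($lang)
    );
}

// Run OCR and return the recognized text, or null if tesseract exits with an error
// (including when the binary is missing, which the shell reports as exit code 127).
function tesseractOcr(string $imagePath, string $lang = 'eng'): ?string
{
    $lines = [];
    exec(buildTesseractCommand($imagePath, $lang), $lines, $code);
    return $code === 0 ? implode(PHP_EOL, $lines) : null;
}
```

Reading stdout avoids temporary files, so concurrent requests cannot overwrite each other's output.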

Microsoft Azure Computer Vision API
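Like Google Cloud Vision, Azure offers OCR through its Computer Vision REST API. Replace your-region and your-api-key with the values of your Computer Vision resource.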


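$endpoint = 'https://your-region.api.cognitive.microsoft.com/vision/v3.1/ocr';
$apiKey = 'your-api-key';
$imageData = file_get_contents('path/to/image.jpg');

// Send the raw image bytes; the key goes in the subscription header
$headers = [
    'Content-Type: application/octet-stream',
    'Ocp-Apim-Subscription-Key: ' . $apiKey
];

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $endpoint);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_POSTFIELDS, $imageData);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

// Stop on transport errors or non-200 responses (invalid key, unsupported image, etc.)
if ($response === false || $status !== 200) {
    exit("OCR request failed (HTTP $status)." . PHP_EOL);
}

$result = json_decode($response, true);
print_r($result);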
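`print_r` only dumps the raw structure. The v3.x OCR response nests recognized text as regions, then lines, then words, with each word carrying a `text` field. A small helper can flatten that into plain text; the name `azureOcrToText` is hypothetical and the sketch assumes the response layout just described.

```php
<?php
// Flatten a decoded Azure Computer Vision v3.x OCR response into plain text,
// one output line per recognized line.
function azureOcrToText(array $result): string
{
    $lines = [];
    foreach ($result['regions'] ?? [] as $region) {
        foreach ($region['lines'] ?? [] as $line) {
            // Each word object has its own "text"; join the words with spaces.
            $lines[] = implode(' ', array_column($line['words'] ?? [], 'text'));
        }
    }
    return implode(PHP_EOL, $lines);
}
```

With the request code above, `echo azureOcrToText($result);` would replace the `print_r` call.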

Using local PHP OCR libraries

phpOCR
phpOCR is a simple PHP OCR library, suitable for recognizing text in simple images.

// Load the phpOCR class file (adjust the path to where you placed the library)
include 'phpOCR.php';
$ocr = new phpOCR();
$text = $ocr->recognize('path/to/image.jpg');
echo $text;

smalot/pdfparser + Tesseract
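Combining PDF parsing with OCR covers text extraction from PDF files. pdfparser reads only the embedded text layer; a scanned PDF has none, so its pages must be converted to images and passed to Tesseract.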

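require 'vendor/autoload.php';
use Smalot\PdfParser\Parser;

$parser = new Parser();
$pdf = $parser->parseFile('path/to/document.pdf');
$text = $pdf->getText();

// A scanned PDF has no text layer, so getText() returns little or nothing.
// In that case the page must first be rendered to an image (scanned_page.jpg)
// and passed to Tesseract, which writes its result to output.txt.
if (trim($text) === '') {
    exec('tesseract scanned_page.jpg output', $output, $returnCode);
    $text = $returnCode === 0 ? file_get_contents('output.txt') : '';
}
echo $text;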
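The code above assumes `scanned_page.jpg` already exists. One way to produce page images is poppler's `pdftoppm` command-line tool, which is an assumption here (Imagick would also work). The sketch below uses hypothetical names `looksScanned` and `ocrPdfPages`, plus a hypothetical heuristic that treats a PDF as scanned when its text layer has fewer than 20 non-whitespace characters. It renders every page at 300 DPI and OCRs each one with tesseract.

```php
<?php
// Hypothetical heuristic (not part of smalot/pdfparser): treat the PDF as a scan
// when its text layer has fewer than $minChars non-whitespace characters.
function looksScanned(string $text, int $minChars = 20): bool
{
    return preg_match_all('/\S/u', $text) < $minChars;
}

// Render each page to PNG with pdftoppm, then OCR every page with tesseract.
// Returns the concatenated text, or '' if rendering fails.
function ocrPdfPages(string $pdfPath, string $workDir, string $lang = 'eng'): string
{
    $prefix = rtrim($workDir, '/') . '/page';
    $render = sprintf('pdftoppm -r 300 -png %s %s', escapeshellarg($pdfPath), escapeshellarg($prefix));
    exec($render, $ignored, $code);
    if ($code !== 0) {
        return '';
    }

    // pdftoppm names files page-1.png, page-2.png, ... (zero-padded for longer
    // documents); natsort keeps them in page order either way.
    $pages = glob($prefix . '-*.png') ?: [];
    natsort($pages);

    $texts = [];
    foreach ($pages as $png) {
        $lines = []; // exec() appends to an existing array, so reset it per page
        $cmd = sprintf('tesseract %s stdout -l %s', escapeshellarg($png), escapeshellarg($lang));
        exec($cmd, $lines, $rc);
        if ($rc === 0) {
            $texts[] = implode(PHP_EOL, $lines);
        }
    }
    return implode(PHP_EOL . PHP_EOL, $texts);
}
```

Used after pdfparser: `if (looksScanned($text)) { $text = ocrPdfPages('path/to/document.pdf', $dir); }`. `$dir` should be a dedicated empty directory, because the glob would otherwise pick up page images left by earlier runs.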