Implementing OCR in PHP

2026-01-30 04:49:16 PHP

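Ways to implement OCR in PHP

PHP can perform text recognition by integrating a third-party OCR library or API. Common approaches are described below:

Using Tesseract OCR

Tesseract is an open-source OCR engine that supports many languages. PHP can invoke the Tesseract command-line tool directly or go through a wrapper library.

Install Tesseract (Debian/Ubuntu):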

sudo apt install tesseract-ocr
sudo apt install libtesseract-dev

Calling Tesseract from PHP:

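$imagePath = 'path/to/image.png';
// Tesseract appends ".txt" to the output base name itself
$outputBase = 'path/to/output';

// Run the Tesseract command; escape the paths so spaces and shell metacharacters are safe
exec('tesseract ' . escapeshellarg($imagePath) . ' ' . escapeshellarg($outputBase), $output, $exitCode);
if ($exitCode !== 0) {
    exit("tesseract failed with exit code $exitCode");
}

// Read the recognized text
$text = file_get_contents($outputBase . '.txt');
echo $text;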
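
Since Tesseract supports multiple languages, the command usually needs a language option as well. The following is a minimal sketch, not a definitive implementation: tesseractCommand() and runTesseract() are hypothetical helper names, the defaults ('eng', page segmentation mode 3) are assumptions, and the `2>/dev/null` redirect assumes a Unix-like shell. It uses the CLI's `-l` and `--psm` options and passes `stdout` as the output base so the text is printed instead of written to a file.

```php
<?php
// Hypothetical helper: build a Tesseract CLI command string.
function tesseractCommand(string $imagePath, string $lang = 'eng', int $psm = 3): string
{
    // "stdout" as the output base makes Tesseract print the text
    // instead of creating a .txt file.
    return sprintf(
        'tesseract %s stdout -l %s --psm %d',
        escapeshellarg($imagePath),
        escapeshellarg($lang),
        $psm
    );
}

// Hypothetical helper: run the command and return the recognized text.
function runTesseract(string $imagePath, string $lang = 'eng', int $psm = 3): string
{
    // Discard stderr (Unix-like shells only) so diagnostics do not mix with the text.
    exec(tesseractCommand($imagePath, $lang, $psm) . ' 2>/dev/null', $lines, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException("tesseract exited with code $exitCode");
    }
    return implode(PHP_EOL, $lines);
}
```

A call such as runTesseract('path/to/image.png', 'chi_sim+eng') would combine Simplified Chinese and English, provided the matching language data is installed (on Debian/Ubuntu, the tesseract-ocr-chi-sim package).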

Using the thiagoalessio/tesseract_ocr wrapper library (install it with composer require thiagoalessio/tesseract_ocr):

require 'vendor/autoload.php';
use thiagoalessio\TesseractOCR\TesseractOCR;

echo (new TesseractOCR('path/to/image.png'))
    ->lang('eng')
    ->run();

Calling a cloud OCR API

Many cloud services offer OCR, including Google Cloud Vision, Azure Computer Vision, and Baidu OCR.

Google Cloud Vision example:

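require 'vendor/autoload.php';
// VisionClient is the legacy client from older google/cloud-vision releases;
// newer releases expose ImageAnnotatorClient instead.
use Google\Cloud\Vision\VisionClient;

$vision = new VisionClient([
    'keyFilePath' => 'path/to/service-account.json'
]);

// Request only text detection for this image
$image = $vision->image(file_get_contents('path/to/image.png'), ['TEXT_DETECTION']);
$result = $vision->annotate($image);

// text() returns null when no text was found, so fall back to an empty array
foreach ($result->text() ?? [] as $text) {
    echo $text->description() . PHP_EOL;
}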

Baidu OCR example:

$image = file_get_contents('path/to/image.png');
$base64 = base64_encode($image);

$url = "https://aip.baidubce.com/rest/2.0/ocr/v1/general_basic?access_token=YOUR_ACCESS_TOKEN";
$data = ['image' => $base64];

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($data));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
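$response = curl_exec($ch);
if ($response === false) {
    // Network or TLS failure: report the cURL error instead of decoding false
    exit('Request failed: ' . curl_error($ch));
}
curl_close($ch);

// On failure the API returns error_code/error_msg instead of words_result
$result = json_decode($response, true);
print_r($result['words_result'] ?? $result);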
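
The decoded response can be reduced to plain text lines. This is a sketch under stated assumptions: baiduWords() is a hypothetical helper name, and the response shape (a words_result list whose entries carry a words field, or error_code and error_msg on failure) is assumed from the general_basic interface used above.

```php
<?php
// Hypothetical helper: extract recognized lines from a decoded
// general_basic response, or throw if the API reported an error.
function baiduWords(array $result): array
{
    if (isset($result['error_code'])) {
        throw new RuntimeException(sprintf(
            'Baidu OCR error %s: %s',
            $result['error_code'],
            $result['error_msg'] ?? 'unknown'
        ));
    }
    // Each entry is assumed to look like ['words' => 'recognized line'].
    return array_column($result['words_result'] ?? [], 'words');
}
```

With this helper, echo implode(PHP_EOL, baiduWords($result)); prints one recognized line per row.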

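Using a local PHP library (text-based PDFs)

smalot/pdfparser is a pure-PHP PDF parser suited to simple needs. It reads the text already embedded in a PDF and does not perform OCR itself, so scanned pages and images still need Tesseract or a cloud API.

Install: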

composer require smalot/pdfparser

Example code:

require 'vendor/autoload.php';
use Smalot\PdfParser\Parser;

$parser = new Parser();
$pdf = $parser->parseFile('path/to/document.pdf');
$text = $pdf->getText();
echo $text;
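
A PDF parser only returns text that is already embedded in the file, so a scanned PDF typically yields an empty or near-empty string. The sketch below shows one way to decide when to fall back to an OCR engine. needsOcr() is a hypothetical helper name, and the 20-byte threshold is an arbitrary assumption that should be tuned for real documents.

```php
<?php
// Hypothetical helper: report whether extracted text is too short to be
// useful, which suggests the document is scanned and needs OCR.
function needsOcr(string $extractedText, int $minBytes = 20): bool
{
    // Ignore whitespace so blank pages do not count as content.
    $visible = preg_replace('/\s+/', '', $extractedText) ?? '';
    return strlen($visible) < $minBytes;
}
```

For example, if needsOcr($text) returns true for the parsed document, its pages can be rendered to images and passed to Tesseract or a cloud API instead.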