Implementing OCR in PHP

2026-02-15 02:41:58 PHP

Text recognition in PHP with Tesseract OCR

Tesseract is an open-source OCR engine that supports many languages. PHP can use it either by calling the Tesseract command-line tool directly or through a wrapper library.

Installing Tesseract OCR
On Ubuntu/Debian, install Tesseract and the Simplified Chinese language pack:

sudo apt install tesseract-ocr
sudo apt install tesseract-ocr-chi-sim
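
Before calling Tesseract from PHP, it can help to confirm that the binary is on the PATH and that the language data was picked up. The sketch below is one way to do that. It assumes `tesseract --list-langs` prints a header line followed by one language code per line; the helper name `parseTesseractLangs` is made up for this example.

```php
<?php
/**
 * Extract language codes from the lines printed by `tesseract --list-langs`.
 * Hypothetical helper: assumes a "List of available languages ..." header
 * followed by one language code per line.
 */
function parseTesseractLangs(array $lines): array
{
    $langs = [];
    foreach ($lines as $line) {
        $line = trim($line);
        // Skip blank lines and the header line
        if ($line === '' || stripos($line, 'List of available languages') === 0) {
            continue;
        }
        $langs[] = $line;
    }
    return $langs;
}

// 2>&1 because some Tesseract versions print the list on stderr
exec('tesseract --list-langs 2>&1', $lines, $returnCode);

if ($returnCode !== 0) {
    echo "tesseract was not found or failed to run\n";
} elseif (in_array('chi_sim', parseTesseractLangs($lines), true)) {
    echo "chi_sim is installed\n";
} else {
    echo "chi_sim is missing; install tesseract-ocr-chi-sim\n";
}
```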

Calling Tesseract through exec
PHP can invoke the Tesseract command-line tool with the exec() function:

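$imagePath = 'path/to/image.png';
$outputPath = 'path/to/output'; // Tesseract appends ".txt" to this base name
$language = 'chi_sim'; // Simplified Chinese

// Escape every argument so spaces or shell metacharacters in a path cannot alter the command
$command = sprintf(
    'tesseract %s %s -l %s 2>&1',
    escapeshellarg($imagePath),
    escapeshellarg($outputPath),
    escapeshellarg($language)
);
exec($command, $output, $returnCode);

if ($returnCode === 0) {
    $text = file_get_contents($outputPath . '.txt');
    echo $text;
} else {
    // $output holds whatever Tesseract printed, including its error messages
    echo 'OCR failed: ' . implode("\n", $output);
}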
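
If the intermediate `.txt` file is not needed, Tesseract can print the recognized text to standard output when the output base is given as `stdout`. A minimal sketch of that variant follows; `buildTesseractCommand` is a hypothetical helper name, the paths are placeholders, and the `2>/dev/null` redirect assumes a Unix-like shell.

```php
<?php
/**
 * Build a shell-safe Tesseract command that prints the text to stdout.
 * Hypothetical helper, not part of any library.
 */
function buildTesseractCommand(string $imagePath, string $language): string
{
    // "stdout" as the output base makes Tesseract print the text
    // instead of writing a .txt file
    return sprintf(
        'tesseract %s stdout -l %s',
        escapeshellarg($imagePath),
        escapeshellarg($language)
    );
}

$command = buildTesseractCommand('path/to/image.png', 'chi_sim');

// Discard diagnostics on stderr so $lines only contains recognized text
exec($command . ' 2>/dev/null', $lines, $returnCode);

if ($returnCode === 0) {
    echo implode("\n", $lines), "\n";
} else {
    echo "OCR failed with exit code $returnCode\n";
}
```

Keeping the command construction in its own function makes the escaping easy to check without running Tesseract.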

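Using the thiagoalessio/tesseract-ocr-for-php library

This is a popular PHP wrapper around Tesseract that offers a friendlier interface than calling the command line by hand.

Install the library: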

composer require thiagoalessio/tesseract-ocr-for-php

Usage example:

require 'vendor/autoload.php';

use thiagoalessio\TesseractOCR\TesseractOCR;

$text = (new TesseractOCR('path/to/image.png'))
    ->lang('chi_sim') // recognize Simplified Chinese
    ->run();

echo $text;

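Using the Baidu OCR API

The Baidu AI Open Platform provides a hosted OCR interface, which suits cases that need higher accuracy than a local engine delivers.

After obtaining an API key and an access token, call the service like this: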

$image = file_get_contents('path/to/image.jpg');
$base64 = base64_encode($image);
$url = 'https://aip.baidubce.com/rest/2.0/ocr/v1/general_basic?access_token=YOUR_ACCESS_TOKEN';

$data = [
    'image' => $base64,
    'language_type' => 'CHN_ENG'
];

$options = [
    'http' => [
        'header' => "Content-type: application/x-www-form-urlencoded\r\n",
        'method' => 'POST',
        'content' => http_build_query($data)
    ]
];

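$context = stream_context_create($options);
$response = file_get_contents($url, false, $context);
if ($response === false) {
    exit('Request to the Baidu OCR API failed');
}

// Decode the JSON body into an associative array
$result = json_decode($response, true);

print_r($result['words_result']);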
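
Each entry of `words_result` is an array holding one recognized line under the `words` key, and a failed request carries `error_code` and `error_msg` instead. The sketch below turns a decoded response into plain text under those assumptions; `extractBaiduText` is a hypothetical helper, and the sample response is invented for illustration.

```php
<?php
/**
 * Join the recognized lines of a decoded general_basic response.
 * Hypothetical helper; throws if the response reports an error.
 */
function extractBaiduText(array $result): string
{
    if (isset($result['error_code'])) {
        throw new RuntimeException(sprintf(
            'Baidu OCR error %s: %s',
            $result['error_code'],
            $result['error_msg'] ?? 'unknown'
        ));
    }

    // Collect the "words" value of every entry, one recognized line each
    $lines = array_column($result['words_result'] ?? [], 'words');

    return implode("\n", $lines);
}

// Invented sample data shaped like a successful response
$sample = [
    'words_result_num' => 2,
    'words_result' => [
        ['words' => '第一行'],
        ['words' => 'second line'],
    ],
];

echo extractBaiduText($sample), "\n";
```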

Using the Google Cloud Vision API

Google's OCR service supports many languages and complex page layouts.

Install the client library:

composer require google/cloud-vision

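Usage example (it relies on the textDetection() helper of V1\ImageAnnotatorClient, so check that the installed version of the library still provides it):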

require 'vendor/autoload.php';

use Google\Cloud\Vision\V1\ImageAnnotatorClient;

$client = new ImageAnnotatorClient([
    'credentials' => 'path/to/service-account.json'
]);

$image = file_get_contents('path/to/image.jpg');
$response = $client->textDetection($image);
$texts = $response->getTextAnnotations();

// The first annotation holds the full recognized text; the rest are individual words
foreach ($texts as $text) {
    echo $text->getDescription() . "\n";
}

$client->close();