Image Recognition via a Telegram Bot: A Go Project Using TensorFlow

In this article we walk through an image recognition project written in Go. We will also create a Telegram bot through which we can send images for recognition.

The first thing we need is an already trained model. We will not create and train our own model. Instead we take the ready-made Docker image ctava/tfcgo and download the pretrained inception5h model into it.

To run the project we need four terminals open at the same time:

In the first, we run the image recognition server.
In the second, we run the bot.
In the third, we create a tunnel from a public address to our localhost.
In the fourth, we run the command that registers our bot.

Running the image recognition server

To run the recognition server, create a file named Dockerfile:

FROM ctava/tfcgo

RUN mkdir -p /model && \
    curl -o /model/inception5h.zip -s "http://download.tensorflow.org/models/inception5h.zip" && \
    unzip /model/inception5h.zip -d /model

WORKDIR /go/src/imgrecognize
COPY src/ .
RUN go build
ENTRYPOINT [ "/go/src/imgrecognize/imgrecognize" ]
EXPOSE 8080

This image runs the recognition server. Our server code from src/ is copied into /go/src/imgrecognize and built there, and the model is unpacked into the /model directory.

Let's start building the server. The first thing to do is set an environment variable:

os.Setenv("TF_CPP_MIN_LOG_LEVEL", "2")

This is needed to keep messages like the following out of the output:

I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
unable to make a tensor from image: Expected image (JPEG, PNG, or GIF), got empty file

We will not optimize our server. We simply start it with ListenAndServe on port 8080. Before starting the server we need a graph, which is the basis for TensorFlow. Roughly speaking, a graph can be thought of as a container for operations and variables. We load it from a file in protobuf format: /model/tensorflow_inception_graph.pb. We will feed it data later through sessions.

func loadModel() (*tensorflow.Graph, []string, error) {
	// Load the Inception model graph from the protobuf file.
	model, err := ioutil.ReadFile(graphFile)
	if err != nil {
		return nil, nil, err
	}
	graph := tensorflow.NewGraph()
	if err := graph.Import(model, ""); err != nil {
		return nil, nil, err
	}

	// Load the labels, one per line.
	f, err := os.Open(labelsFile)
	if err != nil {
		return nil, nil, err
	}
	defer f.Close()
	scanner := bufio.NewScanner(f)
	var labels []string
	for scanner.Scan() {
		labels = append(labels, scanner.Text())
	}

	return graph, labels, scanner.Err()
}

In modelGraph we store the structure of our model and the key tools for working with it. labels holds the dictionary of class names for the model. An important part of working with the model is normalization. We normalize the image inside the HTTP request handler. In a real project you should definitely separate the recognition and normalization module from the HTTP handler, but for learning purposes we keep them together.

To normalize the input data, we first convert the image from a Go value into a tensor:

tensor, err := tensorflow.NewTensor(buf.String())

After that we obtain three values (plus an error):

graph, input, output, err := getNormalizedGraph()

We need graph to decode, resize and normalize the image. input, together with the tensor, is the entry point that connects our application to TensorFlow. output is used as the channel through which we get the data back.

We also open a session on graph to start the normalization:

session, err := tensorflow.NewSession(graph, nil)

The normalization code:

func normalizeImage(imgBody io.ReadCloser) (*tensorflow.Tensor, error) {
	// Read the whole request body into memory.
	var buf bytes.Buffer
	_, err := io.Copy(&buf, imgBody)
	if err != nil {
		return nil, err
	}

	// Wrap the raw image bytes in a string tensor.
	tensor, err := tensorflow.NewTensor(buf.String())
	if err != nil {
		return nil, err
	}

	graph, input, output, err := getNormalizedGraph()
	if err != nil {
		return nil, err
	}

	session, err := tensorflow.NewSession(graph, nil)
	if err != nil {
		return nil, err
	}
	defer session.Close()

	// Run the decode, resize and normalize graph on the image tensor.
	normalized, err := session.Run(
		map[tensorflow.Output]*tensorflow.Tensor{
			input: tensor,
		},
		[]tensorflow.Output{
			output,
		},
		nil)
	if err != nil {
		return nil, err
	}

	return normalized[0], nil
}

After normalizing the image we create a session for working with our model graph:

session, err := tensorflow.NewSession(modelGraph, nil)

With this session we start the recognition itself. We feed in our normalized image:

modelGraph.Operation("input").Output(0): normalizedImg,

The result of the computation (the recognition) is stored in the variable outputRecognize. From it we take the 3 most probable results (ResultCount = 3). Note that, despite its name, getTopFiveLabels returns ResultCount labels:

res := getTopFiveLabels(labels, outputRecognize[0].Value().([][]float32)[0])

func getTopFiveLabels(labels []string, probabilities []float32) []Label {
	// Pair every probability with its label.
	var resultLabels []Label
	for i, p := range probabilities {
		if i >= len(labels) {
			break
		}
		resultLabels = append(resultLabels, Label{Label: labels[i], Probability: p})
	}
	// Labels sorts by descending probability.
	sort.Sort(Labels(resultLabels))

	return resultLabels[:ResultCount]
}
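The same selection can be written a little more defensively. The sketch below is a standalone variant, not the project's code: topLabels is a hypothetical name, it takes the result count as a parameter, and it does not panic when fewer labels are available than requested.

```go
package main

import (
	"fmt"
	"sort"
)

type Label struct {
	Label       string
	Probability float32
}

// topLabels pairs labels with probabilities and returns the k most
// probable ones, or fewer if there are not enough entries.
func topLabels(labels []string, probabilities []float32, k int) []Label {
	n := len(probabilities)
	if len(labels) < n {
		n = len(labels)
	}
	result := make([]Label, 0, n)
	for i := 0; i < n; i++ {
		result = append(result, Label{Label: labels[i], Probability: probabilities[i]})
	}
	// Sort by descending probability.
	sort.Slice(result, func(i, j int) bool {
		return result[i].Probability > result[j].Probability
	})
	if k < 0 {
		k = 0
	}
	if k > len(result) {
		k = len(result)
	}
	return result[:k]
}

func main() {
	labels := []string{"cat", "dog", "fox", "owl"}
	probs := []float32{0.10, 0.60, 0.05, 0.25}
	for _, l := range topLabels(labels, probs, 3) {
		fmt.Printf("%s: %.2f%%\n", l.Label, l.Probability*100)
	}
}
```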

In the HTTP response we return only the single most probable result:

msg := fmt.Sprintf("This is: %s (%.2f%%)", res[0].Label, res[0].Probability*100)
_, err = w.Write([]byte(msg))

The complete code of our recognition server:

package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"os"
	"sort"

	tensorflow "github.com/tensorflow/tensorflow/tensorflow/go"
	"github.com/tensorflow/tensorflow/tensorflow/go/op"
)

const (
	ResultCount = 3
)

var (
	graphFile  = "/model/tensorflow_inception_graph.pb"
	labelsFile = "/model/imagenet_comp_graph_label_strings.txt"
)

type Label struct {
	Label       string
	Probability float32
}

// Labels implements sort.Interface, ordering by descending probability.
type Labels []Label

func (l Labels) Len() int {
	return len(l)
}
func (l Labels) Swap(i, j int) {
	l[i], l[j] = l[j], l[i]
}
func (l Labels) Less(i, j int) bool {
	return l[i].Probability > l[j].Probability
}

var (
	modelGraph *tensorflow.Graph
	labels     []string
)

func main() {
	// I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
	// unable to make a tensor from image: Expected image (JPEG, PNG, or GIF), got empty file
	err := os.Setenv("TF_CPP_MIN_LOG_LEVEL", "2")
	if err != nil {
		log.Fatalln(err)
	}

	modelGraph, labels, err = loadModel()
	if err != nil {
		log.Fatalf("unable to load model: %v", err)
	}

	log.Println("Run RECOGNITION server ....")
	http.HandleFunc("/", mainHandler)
	err = http.ListenAndServe(":8080", nil)
	if err != nil {
		log.Fatalln(err)
	}
}

func mainHandler(w http.ResponseWriter, r *http.Request) {
	// Errors are reported to the client instead of terminating the server.
	normalizedImg, err := normalizeImage(r.Body)
	if err != nil {
		log.Printf("unable to make a normalizedImg from image: %v", err)
		http.Error(w, "unable to process image", http.StatusBadRequest)
		return
	}

	// Create a session for inference over modelGraph
	session, err := tensorflow.NewSession(modelGraph, nil)
	if err != nil {
		log.Printf("could not init session: %v", err)
		http.Error(w, "recognition unavailable", http.StatusInternalServerError)
		return
	}
	defer session.Close()

	outputRecognize, err := session.Run(
		map[tensorflow.Output]*tensorflow.Tensor{
			modelGraph.Operation("input").Output(0): normalizedImg,
		},
		[]tensorflow.Output{
			modelGraph.Operation("output").Output(0),
		},
		nil,
	)
	if err != nil {
		log.Printf("could not run inference: %v", err)
		http.Error(w, "recognition failed", http.StatusInternalServerError)
		return
	}

	res := getTopFiveLabels(labels, outputRecognize[0].Value().([][]float32)[0])
	log.Println("--- recognition result:")
	for _, l := range res {
		log.Printf("label: %s, probability: %.2f%%", l.Label, l.Probability*100)
	}
	log.Println("---")

	msg := fmt.Sprintf("This is: %s (%.2f%%)", res[0].Label, res[0].Probability*100)
	_, err = w.Write([]byte(msg))
	if err != nil {
		log.Printf("could not write server response: %v", err)
	}
}

func loadModel() (*tensorflow.Graph, []string, error) {
	// Load inception model
	model, err := ioutil.ReadFile(graphFile)
	if err != nil {
		return nil, nil, err
	}
	graph := tensorflow.NewGraph()
	if err := graph.Import(model, ""); err != nil {
		return nil, nil, err
	}

	// Load labels
	f, err := os.Open(labelsFile)
	if err != nil {
		return nil, nil, err
	}
	defer f.Close()
	scanner := bufio.NewScanner(f)
	var labels []string
	for scanner.Scan() {
		labels = append(labels, scanner.Text())
	}

	return graph, labels, scanner.Err()
}

// getTopFiveLabels returns the ResultCount most probable labels.
func getTopFiveLabels(labels []string, probabilities []float32) []Label {
	var resultLabels []Label
	for i, p := range probabilities {
		if i >= len(labels) {
			break
		}
		resultLabels = append(resultLabels, Label{Label: labels[i], Probability: p})
	}
	sort.Sort(Labels(resultLabels))

	return resultLabels[:ResultCount]
}

func normalizeImage(imgBody io.ReadCloser) (*tensorflow.Tensor, error) {
	var buf bytes.Buffer
	_, err := io.Copy(&buf, imgBody)
	if err != nil {
		return nil, err
	}

	tensor, err := tensorflow.NewTensor(buf.String())
	if err != nil {
		return nil, err
	}

	graph, input, output, err := getNormalizedGraph()
	if err != nil {
		return nil, err
	}

	session, err := tensorflow.NewSession(graph, nil)
	if err != nil {
		return nil, err
	}
	defer session.Close()

	normalized, err := session.Run(
		map[tensorflow.Output]*tensorflow.Tensor{
			input: tensor,
		},
		[]tensorflow.Output{
			output,
		},
		nil)
	if err != nil {
		return nil, err
	}

	return normalized[0], nil
}

// Creates a graph to decode, resize and normalize an image
func getNormalizedGraph() (graph *tensorflow.Graph, input, output tensorflow.Output, err error) {
	s := op.NewScope()
	input = op.Placeholder(s, tensorflow.String)
	decode := op.DecodeJpeg(s, input, op.DecodeJpegChannels(3)) // 3 RGB

	output = op.Sub(s,
		op.ResizeBilinear(s,
			op.ExpandDims(s,
				op.Cast(s, decode, tensorflow.Float),
				op.Const(s.SubScope("make_batch"), int32(0))),
			op.Const(s.SubScope("size"), []int32{224, 224})),
		op.Const(s.SubScope("mean"), float32(117)))
	graph, err = s.Finalize()

	return graph, input, output, err
}

Now we need to build this image. We could of course build and run it from the console with the corresponding commands, but it is more convenient to keep those commands in a Makefile. So let's create that file:

recognition_build:
	docker build -t imgrecognition .

recognition_run:
	docker run -it -p 8080:8080 imgrecognition

After that, open a terminal and run:

make recognition_build && make recognition_run

Now the first terminal hosts a local HTTP server that accepts images. In response it sends a text message describing what was recognized in the image.

This is, so to speak, the core of our project.

Creating the Telegram bot

Now we need to build the bot. For that we write a second HTTP server. The first server recognizes our images and uses port 8080. The second is the bot server and uses port 3000.

First, create the bot from your Telegram account through BotFather. After registration you will receive the bot's name and its token. Do not share this token with anyone.

Put the token into the BotToken constant. You should end up with something like:

const BotToken = "1695571234:AAEbodyrfOjto2xNE5yjpQpW2Gyq0Ob5X24D5"

The handler of our bot decodes the JSON body of the incoming webhook request:

json.NewDecoder(r.Body).Decode(webhookBody)

We are interested in the photo in the received message, webhookBody.Message.Photo. Using the unique image identifier photoSize.FileID we build the URL of the getFile request, fmt.Sprintf(GetFileUrl, BotToken, photoSize.FileID), whose reply contains the file_path of the image. From that file_path we build the download link.

Then we download the image:

downloadResponse, err = http.Get(downloadFileUrl)
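The two-step lookup (first getFile, then the download link) can be sketched without touching the network. The URL templates below are the ones used in the bot's code. The sample getFile reply, the placeholder token and the helper downloadURLFromGetFile are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"log"
)

const (
	getFileURL      = "https://api.telegram.org/bot%s/getFile?file_id=%s"
	downloadFileURL = "https://api.telegram.org/file/bot%s/%s"
)

// fileInfo keeps only the fields of the getFile reply needed here.
type fileInfo struct {
	Ok     bool `json:"ok"`
	Result struct {
		FilePath string `json:"file_path"`
	} `json:"result"`
}

// downloadURLFromGetFile parses a getFile reply and builds the link from
// which the file itself can be downloaded.
func downloadURLFromGetFile(token string, reply []byte) (string, error) {
	var info fileInfo
	if err := json.Unmarshal(reply, &info); err != nil {
		return "", err
	}
	if !info.Ok || info.Result.FilePath == "" {
		return "", errors.New("getFile reply has no file_path")
	}
	return fmt.Sprintf(downloadFileURL, token, info.Result.FilePath), nil
}

func main() {
	token := "TOKEN" // placeholder, not a real token

	// Step 1: the URL that would be requested to look up the file.
	fmt.Printf(getFileURL+"\n", token, "some-file-id")

	// Step 2: a made-up reply in the shape of a getFile response.
	reply := []byte(`{"ok":true,"result":{"file_id":"some-file-id","file_path":"photos/file_0.jpg"}}`)
	u, err := downloadURLFromGetFile(token, reply)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(u)
}
```

Checking the ok flag and the file_path before building the link avoids requesting a malformed URL when the lookup fails.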

We send the image to the handler of our first server:

msg := recognitionClient.Recognize(downloadResponse)

In response we receive a message, a plain text string. We then simply forward this string as is to the user in the Telegram bot.

The complete code of the bot:

package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"

	"github.com/romanitalian/recognition/src/bot/recognition"
)

// Register Bot: curl -F "url=https://9068b6869da7.ngrok.io" https://api.telegram.org/bot1695571234:AAEbodyrfOjto2xNE5yjpQpW2Gyq0Ob5X24D5/setWebhook
const (
	BotToken = "1695571234:AAEbodyrfOjto2xNE5yjpQpW2Gyq0Ob5X24D5"

	GetFileUrl       = "https://api.telegram.org/bot%s/getFile?file_id=%s"
	DownloadFileUrl  = "https://api.telegram.org/file/bot%s/%s"
	SendMsgToUserUrl = "https://api.telegram.org/bot%s/sendMessage"
)

type webhookReqBody struct {
	Message Msg
}

type Msg struct {
	MessageId int    `json:"message_id"`
	Text      string `json:"text"`
	From      struct {
		ID        int64  `json:"id"`
		FirstName string `json:"first_name"`
		Username  string `json:"username"`
	} `json:"from"`
	Photo *[]PhotoSize `json:"photo"`
	Chat  struct {
		ID        int64  `json:"id"`
		FirstName string `json:"first_name"`
		Username  string `json:"username"`
	} `json:"chat"`
	Date  int `json:"date"`
	Voice struct {
		Duration int64  `json:"duration"`
		MimeType string `json:"mime_type"`
		FileId   string `json:"file_id"`
		FileSize int64  `json:"file_size"`
	} `json:"voice"`
}

type PhotoSize struct {
	FileID   string `json:"file_id"`
	Width    int    `json:"width"`
	Height   int64  `json:"height"`
	FileSize int64  `json:"file_size"`
}

type ImgFileInfo struct {
	Ok     bool `json:"ok"`
	Result struct {
		FileId       string `json:"file_id"`
		FileUniqueId string `json:"file_unique_id"`
		FileSize     int    `json:"file_size"`
		FilePath     string `json:"file_path"`
	} `json:"result"`
}

func main() {
	log.Println("Run BOT server ....")
	err := http.ListenAndServe(":3000", http.HandlerFunc(Handler))
	if err != nil {
		log.Fatalln(err)
	}
}

// This handler is called every time Telegram sends us a webhook event
func Handler(w http.ResponseWriter, r *http.Request) {
	// First, decode the JSON request body
	webhookBody := &webhookReqBody{}
	err := json.NewDecoder(r.Body).Decode(webhookBody)
	if err != nil {
		log.Println("could not decode request body", err)
		return
	}

	// ------------------------- Download last img

	var downloadResponse *http.Response

	if webhookBody.Message.Photo == nil {
		log.Println("no photo in webhook body. webhookBody: ", webhookBody)
		return
	}
	// Every available size is fetched; the response for the last one is kept.
	for _, photoSize := range *webhookBody.Message.Photo {
		// GET JSON ABOUT OUR IMG (IN ORDER TO GET FILE_PATH)
		imgFileInfoUrl := fmt.Sprintf(GetFileUrl, BotToken, photoSize.FileID)
		rr, err := http.Get(imgFileInfoUrl)
		if err != nil {
			log.Println("unable retrieve img by FileID", err)
			return
		}
		defer rr.Body.Close()
		// READ JSON
		fileInfoJson, err := ioutil.ReadAll(rr.Body)
		if err != nil {
			log.Println("unable read img by FileID", err)
			return
		}
		// UNMARSHAL JSON
		imgInfo := &ImgFileInfo{}
		err = json.Unmarshal(fileInfoJson, imgInfo)
		if err != nil {
			log.Println("unable unmarshal file description from api.telegram by url: "+imgFileInfoUrl, err)
			return
		}
		// GET FILE_PATH

		downloadFileUrl := fmt.Sprintf(DownloadFileUrl, BotToken, imgInfo.Result.FilePath)
		downloadResponse, err = http.Get(downloadFileUrl)
		if err != nil {
			log.Println("unable download file by file_path: "+downloadFileUrl, err)
			return
		}
		defer downloadResponse.Body.Close()
	}
	if downloadResponse == nil {
		log.Println("photo list in webhook body is empty")
		return
	}

	// --------------------------- Send img to server recognition.
	recognitionClient := recognition.New()
	msg := recognitionClient.Recognize(downloadResponse)

	if err := sendResponseToUser(webhookBody.Message.Chat.ID, msg); err != nil {
		log.Println("error in sending reply: ", err)
		return
	}
}

// The below code deals with the process of sending a response message
// to the user

// Create a struct to conform to the JSON body
// of the send message request
// https://core.telegram.org/bots/api#sendmessage
type sendMessageReqBody struct {
	ChatID int64  `json:"chat_id"`
	Text   string `json:"text"`
}

// sendResponseToUser notifies the user about what was found in the image.
func sendResponseToUser(chatID int64, msg string) error {
	// Create the request body struct
	msgBody := &sendMessageReqBody{
		ChatID: chatID,
		Text:   msg,
	}

	// Create the JSON body from the struct
	msgBytes, err := json.Marshal(msgBody)
	if err != nil {
		return err
	}

	// Send a post request with your token
	res, err := http.Post(fmt.Sprintf(SendMsgToUserUrl, BotToken), "application/json", bytes.NewBuffer(msgBytes))
	if err != nil {
		return err
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusOK {
		// Include the response body in the error to ease debugging.
		buf := new(bytes.Buffer)
		_, err := buf.ReadFrom(res.Body)
		if err != nil {
			return err
		}
		return errors.New("unexpected status: " + res.Status + ": " + buf.String())
	}

	return nil
}

The client that sends the image from the bot to the recognition server:

package recognition

import (
	"io/ioutil"
	"log"
	"net/http"
)

const imgRecognitionAddress = "http://localhost:8080/"

type Client struct {
	httpClient *http.Client
}

func New() *Client {
	return &Client{
		httpClient: &http.Client{},
	}
}

// Recognize forwards the downloaded image to the recognition server and
// returns its text reply, or an empty string on failure.
func (c *Client) Recognize(downloadResponse *http.Response) string {
	var msg string
	method := "POST"

	req, err := http.NewRequest(method, imgRecognitionAddress, downloadResponse.Body)
	if err != nil {
		log.Println("error from server recognition", err)
		return msg
	}
	// The server decodes the body as JPEG (op.DecodeJpeg).
	req.Header.Add("Content-Type", "image/jpeg")

	// do request to server recognition.
	recognitionResponse, err := c.httpClient.Do(req)
	if err != nil {
		log.Println(err)
		return msg
	}
	defer func() {
		er := recognitionResponse.Body.Close()
		if er != nil {
			log.Println(er)
		}
	}()

	recognitionResponseBody, err := ioutil.ReadAll(recognitionResponse.Body)
	if err != nil {
		log.Println("error on read response from server recognition", err)
		return msg
	}
	msg = string(recognitionResponseBody)

	return msg
}

Now we need a public HTTPS address for our bot, which currently runs on localhost. ngrok helps us with that:

ngrok http 3000

Right after running this command you will see a list of public addresses. The last one is the HTTPS address. For example, it might be https://9068b6869da7.ngrok.io.

Now let's register our bot, that is, tell the Telegram API the address to which it should send webhooks:

curl -F "url=https://9068b6869da7.ngrok.io" https://api.telegram.org/bot1695571234:AAEbodyrfOjto2xNE5yjpQpW2Gyq0Ob5X24D5/setWebhook

Congratulations: we can now send a photo to our bot and receive in reply a description of what it shows.
