Implementation of faster R-CNN applied to the datasets COCO and PASCAL VOC

Pinto, Pedro de Carvalho Cayres

Please use this identifier to cite or link to this item: http://hdl.handle.net/11422/21714

Type:	Dissertation
Title:	Implementation of faster R-CNN applied to the datasets COCO and PASCAL VOC
Author(s)/Inventor(s):	Pinto, Pedro de Carvalho Cayres
Advisor:	Rodríguez Carneiro Gomes, José Gabriel
Abstract:	This dissertation presents implementations of two object detection systems based on convolutional neural networks: Faster R-CNN and Faster R-CNN with FPN. A brief introduction to machine learning is followed by an explanation of the image classification task, where the VGG-16 and ResNet-101 architectures are presented, and by detailed explanations of the object detection task and the methods that led to the development of Faster R-CNN. Next, the implementations are discussed thoroughly, specifying the parameters and the framework used to build the networks and noting the differences from the original implementations. Three experiments are then performed, using the COCO and PASCAL VOC datasets for training and testing, and the results are compared with those of the original works using the mean average precision (mAP) metric. The results are analyzed, and some considerations are made about the inference time of the implementations. Finally, detection examples from the most accurate network are presented. In the experiment on COCO, the FPN detector achieved 38.1% mAP@[.5, .95] and 61.1% mAP@0.5 on the COCO test-dev set (a more recent model, RetinaNet with ResNeXt-101-FPN, achieves 40.8% mAP@[.5, .95] and 61.1% mAP@0.5 on the same set). The code is available at: https://gitlab.com/pedrocayres/faster_rcnn_pytorch.
Keywords:	Object detection; Convolutional neural network; Computer vision; Deep learning
Subject CNPq:	CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA
Program:	Programa de Pós-Graduação em Engenharia Elétrica
Production unit:	Instituto Alberto Luiz Coimbra de Pós-Graduação e Pesquisa de Engenharia
Publisher:	Universidade Federal do Rio de Janeiro
Issue Date:	2-Mar-2019
Publisher country:	Brazil
Language:	eng
Access rights:	Open Access
Appears in Collections:	Engenharia Elétrica

Files in This Item:

File	Description	Size	Format
924815.pdf		7.76 MB	Adobe PDF

Pantheon Institutional repository