A simple tutorial about Caffe-TensorFlow model conversion
Introduction
Caffe is a good deep learning framework with many pre-trained models available, so it is useful to know how to convert Caffe models into TensorFlow models. The conversion process is tricky enough that I decided to write it down, hoping it will help others.
Note:
- The source code and other related files of this tutorial can be found at: https://github.com/imWildCat/a-simple-tutorial-about-caffe-tensorflow-model-conversion
- The original pre-trained Caffe model in this tutorial is located at: https://github.com/choosehappy/public/tree/master/DL%20tutorial%20Code/4-lymphocyte/models
Pre-requisites
- Operating System: macOS or Linux
- Install the Protobuf library
- CMake 2.8 or newer
- Python 2.7 (required by caffe-tensorflow; preferably inside a virtualenv)
- TensorFlow 1.x installed (tested with TensorFlow 1.7)
Note: this tutorial does not require Caffe to be installed unless you want to convert the mean files.
Major steps
Step 1: Upgrade Caffe .prototxt (optional)
Many .prototxt files use an outdated format and must be upgraded before this kind of model conversion; if yours is already up to date, you can skip this step. If you have Caffe installed, you can just use upgrade_net_proto_text. However, Caffe is not easy to install on macOS, where caffe-net-upgrade is a good alternative.
Follow its build instructions to build the upgrade_caffe_layers executable. In this tutorial, the path to this executable is written as [path_to]/upgrade_caffe_layers. Here is an example usage:
➜ [path_to]/upgrade_caffe_layers deploy_train32.prototxt
Loading prototxt file ...
INFO: Reading the prototxt file from : /Users/wildcat/Downloads/201804/temp/caffe-tensorflow-sample-case-lymphoma/caffe-models/deploy_train32.prototxt
INFO: prototxt read successful
INFO: Network loaded is : CIFAR10_quick
INFO: Upgrading V1LayerParameter => LayerParameter
STATUS: upgrade successful.
INFO: upgraded net is written into net.prototxt
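To decide whether this optional step is needed, one rough heuristic is to look for legacy `layers { ... }` blocks, which the V1 format uses where the upgraded format uses `layer { ... }`. The sketch below assumes that heuristic is good enough for a first check; the helper name `needs_upgrade` is hypothetical, and it does not replace running the upgrade tool.

```python
import re

# Heuristic (assumption): V1-style prototxt files declare each layer in a
# `layers { ... }` block, while upgraded files use `layer { ... }`.
LEGACY_BLOCK = re.compile(r"^\s*layers\s*\{", re.MULTILINE)


def needs_upgrade(prototxt_text):
    """Return True if the text appears to use legacy `layers {` blocks."""
    return bool(LEGACY_BLOCK.search(prototxt_text))


old_style = 'name: "CIFAR10_quick"\nlayers {\n  name: "conv1"\n  type: CONVOLUTION\n}\n'
new_style = 'name: "CIFAR10_quick"\nlayer {\n  name: "conv1"\n  type: "Convolution"\n}\n'

print(needs_upgrade(old_style))  # True
print(needs_upgrade(new_style))  # False
```

For a real file, read its contents with `open(path).read()` and pass the text to the helper.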
Step 2: Convert the model and the mean file
Convert the model
Here we use caffe-tensorflow for model conversion. One complication is that the original caffe-tensorflow repository is no longer maintained, so we use a fork: https://github.com/dhaase-de/caffe-tensorflow-python3 . (Although it claims to work with Python 3, I could only get it to work with Python 2.)
After cloning the source code, run python ./convert.py to convert the model. For more details, please read: https://github.com/dhaase-de/caffe-tensorflow-python3#3---convert-your-model
➜ python ./convert.py /path/to/net.prototxt --caffemodel /path/to/5_caffenet_train_w32_iter_600000.caffemodel --data-output-path case_tf.npy --code-output-path case_tf.py
------------------------------------------------------------
WARNING: PyCaffe not found!
Falling back to a pure protocol buffer implementation.
* Conversions will be drastically slower.
* This backend is UNTESTED!
------------------------------------------------------------
Type          Name   Param           Output
-----------------------------------------------------
Data          data   --              (10, 3, 32, 32)
Convolution   conv1  (32, 3, 5, 5)   (10, 32, 32, 32)
Pooling       pool1  --              (10, 32, 16, 16)
ReLU          relu1  --              (10, 32, 16, 16)
Convolution   conv2  (32, 32, 5, 5)  (10, 32, 16, 16)
Pooling       pool2  --              (10, 32, 8, 8)
Convolution   conv3  (64, 32, 5, 5)  (10, 64, 8, 8)
Pooling       pool3  --              (10, 64, 4, 4)
InnerProduct  ip1    (64, 1024)      (10, 64, 1, 1)
InnerProduct  ip2    (2, 64)         (10, 2, 1, 1)
Softmax       prob   --              (10, 2, 1, 1)
Converting data...
Saving data...
Saving source...
Done.
Note:
- Remember to replace /path/to with your real path to the related files
- net.prototxt and 5_caffenet_train_w32_iter_600000.caffemodel are the model files used in my case; feel free to change them
- case_tf.npy stores the weights (parameters) and case_tf.py stores the neural network architecture.
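It can help to inspect the converted weights before wiring them into a graph. The sketch below assumes the .npy file holds a pickled dictionary mapping layer names to dictionaries of arrays (here `weights` and `biases`), and that convolution kernels are stored in TensorFlow's (height, width, in, out) order rather than Caffe's (out, in, height, width). Both are assumptions about the converter's output; the file written here is a synthetic stand-in, not the real case_tf.npy.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for case_tf.npy (assumption: a pickled dict of
# {layer_name: {param_name: array}}); shapes follow conv1 in the log above.
caffe_conv1 = np.zeros((32, 3, 5, 5), dtype=np.float32)  # (out, in, h, w)
fake_weights = {
    "conv1": {
        # TensorFlow conv kernels are laid out as (h, w, in, out).
        "weights": np.transpose(caffe_conv1, (2, 3, 1, 0)),
        "biases": np.zeros(32, dtype=np.float32),
    }
}

path = os.path.join(tempfile.mkdtemp(), "fake_case_tf.npy")
np.save(path, fake_weights)

# allow_pickle=True is needed because the file stores a Python dict.
loaded = np.load(path, allow_pickle=True).item()
for layer_name, params in sorted(loaded.items()):
    for param_name, array in sorted(params.items()):
        print(layer_name, param_name, array.shape)
```

To inspect the real file, point `np.load` at case_tf.npy instead. If the file was written under Python 2 and is read under Python 3, passing `encoding='latin1'` to `np.load` may also be necessary.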
(Optional) Convert the mean file (Caffe needed)
Many Caffe models use a mean file for normalization, so we must also convert the mean file to .npy and load it in TensorFlow. Otherwise, the predictions will not be correct.
# Ref: https://github.com/BVLC/caffe/issues/290#issuecomment-62846228
# Modified by WildCat
# Example: python convert_protomean.py ./original-caffe-models/DB_train_w32_5.binaryproto mean.npy
import sys

import caffe
import numpy as np

if len(sys.argv) != 3:
    print("Usage: python convert_protomean.py proto.mean out.npy")
    sys.exit(1)

# Read the binary mean file into a BlobProto message
blob = caffe.proto.caffe_pb2.BlobProto()
with open(sys.argv[1], 'rb') as f:
    blob.ParseFromString(f.read())

# The converted array has a leading batch axis; keep only the first entry
arr = np.array(caffe.io.blobproto_to_array(blob))
out = arr[0]
np.save(sys.argv[2], out)
Again, feel free to change the paths and names of the .binaryproto and mean.npy files.
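The saved mean keeps Caffe's channel-first layout, (channels, height, width), while TensorFlow images are usually (height, width, channels), so the axes have to be reordered before the mean can be subtracted from an image. A minimal sketch of that reordering in NumPy follows, using a synthetic array in place of the real mean.npy; the variable names are illustrative only.

```python
import numpy as np

# Synthetic stand-in for mean.npy: a Caffe-style mean image, (channels, height, width).
mean_chw = np.arange(3 * 32 * 32, dtype=np.float32).reshape(3, 32, 32)

# Move the channel axis last to match TensorFlow's default image layout.
mean_hwc = np.transpose(mean_chw, (1, 2, 0))

print(mean_chw.shape)  # (3, 32, 32)
print(mean_hwc.shape)  # (32, 32, 3)

# The same value is addressed as [c, y, x] before and [y, x, c] after.
assert mean_chw[2, 5, 7] == mean_hwc[5, 7, 2]
```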
Step 3: Finish the conversion by making predictions
import numpy as np
import tensorflow as tf

from case_tf import CIFAR10_quick


def check_correct(prob, path):
    '''Return True if the prediction matches the label encoded in the file name.'''
    neg_prob, pos_prob = prob
    is_pos = path.find('_p_') != -1  # find '_p_' in the file name
    if not is_pos and is_pos == (pos_prob > neg_prob):
        print(prob, path, 'True negative')
    return is_pos == (pos_prob > neg_prob)


# load the converted mean file and transpose it from (C, H, W) to (H, W, C)
means = np.load('mean.npy')
mean_tensor = tf.transpose(tf.convert_to_tensor(means, dtype=tf.float32), [1, 2, 0])


def classify():
    '''Classify the given images using the converted CIFAR10_quick network.'''
    model_data_path = './case_tf.npy'
    image_file_name_pattern = './subs/*.png'
    NUM_OF_IMAGES = 100

    # according to the .prototxt
    IMAGE_SIZE = 32
    IMAGE_CHANNELS = 3

    # Create a placeholder for the input image
    input_node = tf.placeholder(tf.float32, shape=(None, IMAGE_SIZE, IMAGE_SIZE, IMAGE_CHANNELS))

    # Construct the network
    net = CIFAR10_quick({'data': input_node})

    # custom: read images through a filename queue
    filename_queue = tf.train.string_input_producer(tf.train.match_filenames_once(image_file_name_pattern))
    reader = tf.WholeFileReader()
    key, value = reader.read(filename_queue)
    my_img = tf.image.decode_png(value)

    with tf.Session() as sess:
        sess.run(tf.local_variables_initializer())
        sess.run(tf.global_variables_initializer())

        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)

        print('Load weights...')
        net.load(data_path=model_data_path, session=sess)

        image_list = []
        image_path_list = []

        print('Making predictions...')
        for _ in range(0, NUM_OF_IMAGES):
            # Fetch the image and its path in a single run() call; separate
            # calls would each read a new file, so the two would not match
            single_image, image_path = sess.run([my_img, key])
            # Note (3 April) convert image channel sequence from RGB to BGR
            reversed_image = tf.reverse(single_image, [-1])
            reversed_image = tf.cast(reversed_image, tf.float32)
            final_image = tf.subtract(reversed_image, mean_tensor)
            image_list.append(final_image)
            image_path_list.append(image_path)

        input_images = sess.run(tf.stack(image_list))
        probs = sess.run(net.get_output(), feed_dict={input_node: input_images})

        acc_list = []
        # materialize the pairs so they can be iterated and sliced
        predictions = list(zip(probs, image_path_list))
        for prob, path in predictions:
            acc_list.append(check_correct(prob, path))
        print('accuracy: {}'.format(acc_list.count(True) / float(len(acc_list))))

        for prob, path in predictions[:20]:
            print('Image: {}, prob: {}'.format(path, prob))

        coord.request_stop()
        coord.join(threads, stop_grace_period_secs=2)


if __name__ == '__main__':
    classify()
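The per-image preprocessing in the loop (reverse the channel order, cast to float, subtract the mean) can also be expressed in plain NumPy, which makes it easy to check on a single array. The following is only a sketch of that alternative, not the code used in this tutorial; the function name `preprocess` and the all-zero mean are hypothetical and chosen so that the output shows just the channel swap.

```python
import numpy as np


def preprocess(image_rgb, mean_hwc):
    """Reverse RGB to BGR, cast to float32, and subtract the mean image."""
    image_bgr = image_rgb[..., ::-1].astype(np.float32)
    return image_bgr - mean_hwc


# A 32x32 image whose pixels are all pure red in RGB order.
image_rgb = np.zeros((32, 32, 3), dtype=np.uint8)
image_rgb[..., 0] = 255

# Hypothetical all-zero mean, already in (height, width, channels) layout.
mean_hwc = np.zeros((32, 32, 3), dtype=np.float32)

out = preprocess(image_rgb, mean_hwc)
print(out[0, 0].tolist())  # [0.0, 0.0, 255.0]
```

After the swap, the red value sits in the last channel, which is where a BGR-trained model expects it.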
Note:
- Part of the converted code (case_tf.py) might not be correct. For example, change the layer name pattern from .conv(5, 5, 32, 1, 1, relu=False, name=conv1) to .conv(5, 5, 32, 1, 1, relu=False, name='conv1')
- We have to convert the image channel order from RGB to BGR because the original Caffe model was trained using the BGR convention due to OpenCV: reversed_image = tf.reverse(single_image, [-1])
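If many layers in case_tf.py have unquoted names, fixing them by hand gets tedious. The sketch below shows one possible way to quote them with a regular expression. It assumes the only problem is a bare identifier after `name=`, as in the example above; the helper name `quote_layer_names` is hypothetical, and the result should still be reviewed before use.

```python
import re

# Matches name=<bare identifier>, e.g. name=conv1, but not name='conv1'.
UNQUOTED_NAME = re.compile(r"\bname=([A-Za-z_]\w*)")


def quote_layer_names(source):
    """Wrap bare layer names in single quotes."""
    return UNQUOTED_NAME.sub(r"name='\1'", source)


broken = ".conv(5, 5, 32, 1, 1, relu=False, name=conv1)"
fixed = quote_layer_names(broken)
print(fixed)
# .conv(5, 5, 32, 1, 1, relu=False, name='conv1')
```

Names that are already quoted are left alone, so running the helper twice on the same text does no harm.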
After this step, you should be able to run the model successfully.
Conclusion
Although this article is not long, converting a Caffe model to TensorFlow is a time-consuming task. I hope this article helps you deal with this kind of problem.