A simple tutorial about Caffe-TensorFlow model conversion
Introduction
Caffe is a popular deep learning framework with many pre-trained models available, so it is useful to know how to convert Caffe models into TensorFlow models. The conversion process is tricky enough that I decided to write it down, hoping it will help others.
Note:
- The source code and other related files of this tutorial can be found at: https://github.com/imWildCat/a-simple-tutorial-about-caffe-tensorflow-model-conversion
- The original pre-trained Caffe model in this tutorial is located at: https://github.com/choosehappy/public/tree/master/DL%20tutorial%20Code/4-lymphocyte/models
Pre-requisites
- Operating System: macOS or Linux
- Install the Protobuf library
- CMake 2.8 or newer
- Python 2.7 (required by caffe-tensorflow; preferably inside a virtualenv)
- TensorFlow 1.x installed (tested with TensorFlow 1.7)
Note: this tutorial does not require Caffe to be installed unless you want to convert the mean files.
Major steps
Step 1: Upgrade Caffe .prototxt (optional)
Since many .prototxt files are outdated, they must be upgraded before conversion (skip this step if your file already uses the current format). If you have Caffe installed, you can use upgrade_net_proto_text (reference). However, Caffe is not easy to install on macOS, where caffe-net-upgrade is a good alternative.
Follow its Build Instructions to build upgrade_caffe_layers. In this tutorial, the path to this executable is written as [path_to]/upgrade_caffe_layers. Here is an example usage:
➜ [path_to]/upgrade_caffe_layers deploy_train32.prototxt
Loading prototxt file ...
INFO: Reading the prototxt file from : /Users/wildcat/Downloads/201804/temp/caffe-tensorflow-sample-case-lymphoma/caffe-models/deploy_train32.prototxt
INFO: prototxt read successful
INFO: Network loaded is : CIFAR10_quick
INFO: Upgrading V1LayerParameter => LayerParameter
STATUS: upgrade successful.
INFO: upgraded net is written into net.prototxt
Step 2: Convert the model and the mean file
Convert the model
Here we use caffe-tensorflow for the model conversion. One catch is that the original caffe-tensorflow repository is no longer maintained, so we use a fork: https://github.com/dhaase-de/caffe-tensorflow-python3 . (Although the fork claims to work with Python 3, I could only get it to work with Python 2.)
After cloning the source code, you can use python ./convert.py
to convert the model. For more details, please read: https://github.com/dhaase-de/caffe-tensorflow-python3#3—convert-your-model
➜ python ./convert.py /path/to/net.prototxt --caffemodel /path/to/5_caffenet_train_w32_iter_600000.caffemodel --data-output-path case_tf.npy --code-output-path case_tf.py
------------------------------------------------------------
WARNING: PyCaffe not found!
Falling back to a pure protocol buffer implementation.
* Conversions will be drastically slower.
* This backend is UNTESTED!
------------------------------------------------------------
Type Name Param Output
----------------------------------------------------------------------------------------------
Data data -- (10, 3, 32, 32)
Convolution conv1 (32, 3, 5, 5) (10, 32, 32, 32)
Pooling pool1 -- (10, 32, 16, 16)
ReLU relu1 -- (10, 32, 16, 16)
Convolution conv2 (32, 32, 5, 5) (10, 32, 16, 16)
Pooling pool2 -- (10, 32, 8, 8)
Convolution conv3 (64, 32, 5, 5) (10, 64, 8, 8)
Pooling pool3 -- (10, 64, 4, 4)
InnerProduct ip1 (64, 1024) (10, 64, 1, 1)
InnerProduct ip2 (2, 64) (10, 2, 1, 1)
Softmax prob -- (10, 2, 1, 1)
Converting data...
Saving data...
Saving source...
Done.
Note:
- Remember to replace /path/to with the real path to the related files.
- net.prototxt and 5_caffenet_train_w32_iter_600000.caffemodel are the model files used in my case; feel free to change them.
- case_tf.npy stores the weights (parameters) and case_tf.py stores the neural network architecture.
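Before moving on, it can help to check what the converter actually wrote into the weights file. The sketch below assumes the file is a pickled dictionary of the form {layer name: {parameter name: array}}, which is how caffe-tensorflow's loader appears to read it; treat that layout as an assumption and verify it against your own case_tf.npy. The helper name summarize_weights and the synthetic data are hypothetical and only illustrate shapes.

```python
import os
import tempfile

import numpy as np


def summarize_weights(path):
    """Return {layer_name: {param_name: shape}} for a converted weights file."""
    # Assumption: the file holds a pickled dict, so allow_pickle=True is needed
    # on recent NumPy versions; .item() unwraps the 0-d object array.
    data = np.load(path, allow_pickle=True).item()
    return {
        layer: {name: np.asarray(value).shape for name, value in params.items()}
        for layer, params in data.items()
    }


if __name__ == '__main__':
    # Synthetic stand-in for case_tf.npy (hypothetical values, shapes only).
    fake = {
        'conv1': {'weights': np.zeros((5, 5, 3, 32), np.float32),
                  'biases': np.zeros((32,), np.float32)},
        'ip2': {'weights': np.zeros((64, 2), np.float32),
                'biases': np.zeros((2,), np.float32)},
    }
    path = os.path.join(tempfile.mkdtemp(), 'fake_tf.npy')
    np.save(path, fake)
    for layer, shapes in sorted(summarize_weights(path).items()):
        print(layer, shapes)
```

With a real file, call summarize_weights('case_tf.npy') and compare the layer names with the table printed by convert.py. A file written by Python 2 and read by Python 3 may additionally need encoding='latin1' in np.load.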
(Optional) Convert the mean file (Caffe needed)
Since many Caffe models use mean files for normalization, we must also convert the mean file to .npy so that it can be loaded in TensorFlow. Otherwise, the predictions will not be correct.
# Ref: https://github.com/BVLC/caffe/issues/290#issuecomment-62846228
# Modified by WildCat
# Example: python convert_protomean.py ./original-caffe-models/DB_train_w32_5.binaryproto mean.npy
import sys

import caffe
import numpy as np

if len(sys.argv) != 3:
    print("Usage: python convert_protomean.py proto.mean out.npy")
    sys.exit(1)

# Parse the binary mean file into a BlobProto message
blob = caffe.proto.caffe_pb2.BlobProto()
with open(sys.argv[1], 'rb') as f:
    blob.ParseFromString(f.read())

# The array has a leading batch axis of size 1; keep the single mean image
arr = np.array(caffe.io.blobproto_to_array(blob))
out = arr[0]
np.save(sys.argv[2], out)
Again, feel free to change the paths and names of the .binaryproto and mean.npy files.
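The saved mean keeps Caffe's channels-first layout, while TensorFlow images are usually channels-last, so the mean has to be transposed before it can be subtracted from an image. Here is a minimal NumPy sketch of that step, assuming the mean has shape (channels, height, width); the helper name mean_to_hwc and the constant test values are hypothetical.

```python
import numpy as np


def mean_to_hwc(mean_chw):
    """Reorder a Caffe mean image from (C, H, W) to (H, W, C)."""
    mean_chw = np.asarray(mean_chw, dtype=np.float32)
    if mean_chw.ndim != 3:
        raise ValueError('expected a 3-D (C, H, W) array, got shape %s'
                         % (mean_chw.shape,))
    return mean_chw.transpose(1, 2, 0)


if __name__ == '__main__':
    # Hypothetical mean: one constant value per channel.
    fake_mean = np.stack([np.full((32, 32), v, np.float32)
                          for v in (10.0, 20.0, 30.0)])
    mean_hwc = mean_to_hwc(fake_mean)
    print(mean_hwc.shape)  # (32, 32, 3)

    # After the transpose the mean lines up with a channels-last image.
    image = np.full((32, 32, 3), 100.0, np.float32)
    print((image - mean_hwc)[0, 0])
```

With the real file, mean_to_hwc(np.load('mean.npy')) should give an array whose shape matches the network input, here (32, 32, 3).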
Step 3: Finish the conversion by making predictions
import numpy as np
import tensorflow as tf

from case_tf import CIFAR10_quick


def check_correct(prob, path):
    '''Return True if the prediction matches the label encoded in the file name.'''
    neg_prob, pos_prob = prob
    is_pos = path.find('_p_') != -1  # find '_p_' in the file name
    if not is_pos and is_pos == (pos_prob > neg_prob):
        print(prob, path, 'True negative')
    return is_pos == (pos_prob > neg_prob)


# load the converted mean file
means = np.load('mean.npy')
mean_tensor = tf.transpose(tf.convert_to_tensor(means, dtype=tf.float32), [1, 2, 0])


def classify():
    '''Classify the given images using the converted CIFAR10_quick network.'''
    model_data_path = './case_tf.npy'
    image_file_name_pattern = './subs/*.png'
    NUM_OF_IMAGES = 100

    # according to the .prototxt
    IMAGE_SIZE = 32
    IMAGE_CHANNELS = 3

    # Create a placeholder for the input image
    input_node = tf.placeholder(tf.float32, shape=(None, IMAGE_SIZE, IMAGE_SIZE, IMAGE_CHANNELS))

    # Construct the network
    net = CIFAR10_quick({'data': input_node})

    # Create an image producer (loads and processes images in parallel)
    # image_producer = dataset.ImageProducer(image_paths=image_paths)
    # custom: read images
    filename_queue = tf.train.string_input_producer(tf.train.match_filenames_once(image_file_name_pattern))
    reader = tf.WholeFileReader()
    key, value = reader.read(filename_queue)
    my_img = tf.image.decode_png(value)

    with tf.Session() as sess:
        sess.run(tf.local_variables_initializer())
        sess.run(tf.global_variables_initializer())
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)

        print('Load weights...')
        net.load(data_path=model_data_path, session=sess)

        image_list = []
        image_path_list = []
        print('Making predictions...')
        for _ in range(0, NUM_OF_IMAGES):
            # Fetch the image and its path in a single run so that both come
            # from the same file; separate runs would each read a new file.
            single_image, image_path = sess.run([my_img, key])
            # Note (3 April) convert image channel sequence from RGB to BGR
            reversed_image = tf.reverse(single_image, [-1])
            reversed_image = tf.cast(reversed_image, tf.float32)
            final_image = tf.subtract(reversed_image, mean_tensor)
            image_list.append(final_image)
            image_path_list.append(image_path)

        input_images = sess.run(tf.stack(image_list))
        probs = sess.run(net.get_output(), feed_dict={input_node: input_images})

        acc_list = []
        # list() keeps the pairs reusable and sliceable
        predictions = list(zip(probs, image_path_list))
        for prob, path in predictions:
            acc_list.append(check_correct(prob, path))
        print('accuracy: {}'.format(acc_list.count(True) / float(len(acc_list))))
        for prob, path in predictions[:20]:
            print('Image: {}, prob: {}'.format(path, prob))

        coord.request_stop()
        coord.join(threads, stop_grace_period_secs=2)


if __name__ == '__main__':
    classify()
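The script above preprocesses each image with TensorFlow ops inside the loop, which adds new nodes to the graph on every iteration. As a possible alternative, the same preprocessing can be done in plain NumPy before feeding the batch. The sketch below is not part of the original code: the function names are hypothetical, and it assumes RGB images of shape (height, width, 3) and a mean that is already channels-last and stored in BGR order.

```python
import numpy as np


def preprocess(image_rgb, mean_hwc):
    """Turn an RGB image (H, W, 3) into a mean-subtracted BGR float32 image."""
    # Reversing the last axis swaps RGB to BGR, like tf.reverse(image, [-1]).
    bgr = np.asarray(image_rgb)[..., ::-1].astype(np.float32)
    return bgr - mean_hwc


def preprocess_batch(images_rgb, mean_hwc):
    """Stack preprocessed images into one (N, H, W, 3) batch."""
    return np.stack([preprocess(img, mean_hwc) for img in images_rgb])


if __name__ == '__main__':
    # Hypothetical data: two pure-red 32x32 images and an all-zero mean.
    red = np.zeros((32, 32, 3), np.uint8)
    red[..., 0] = 255
    batch = preprocess_batch([red, red], np.zeros((32, 32, 3), np.float32))
    print(batch.shape)  # (2, 32, 32, 3)
```

The resulting batch could then be passed directly as feed_dict={input_node: batch}.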
Note:
- Part of the converted code (case_tf.py) might not be correct. For example, you may need to change the layer name pattern from .conv(5, 5, 32, 1, 1, relu=False, name=conv1) to .conv(5, 5, 32, 1, 1, relu=False, name='conv1').
- We have to convert the image channel order from RGB to BGR, because the original Caffe model was trained using the BGR convention due to OpenCV: reversed_image = tf.reverse(single_image, [-1])
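If the generated file contains many unquoted layer names, fixing them by hand is tedious. The following is a rough sketch of how the quoting fix could be automated with a regular expression. It is not part of caffe-tensorflow; the helper name quote_layer_names is hypothetical, and it assumes layer names are plain identifiers (letters, digits, underscores, slashes) followed by a comma or closing parenthesis.

```python
import re

# Matches name=<bare identifier> directly before ',' or ')'.
# Names that are already quoted do not match, so the fix can be re-applied safely.
_UNQUOTED_NAME = re.compile(r"\bname=([A-Za-z_][A-Za-z0-9_/]*)(?=\s*[,)])")


def quote_layer_names(source):
    """Wrap bare layer names in single quotes: name=conv1 -> name='conv1'."""
    return _UNQUOTED_NAME.sub(r"name='\1'", source)


if __name__ == '__main__':
    broken = ".conv(5, 5, 32, 1, 1, relu=False, name=conv1)"
    print(quote_layer_names(broken))
```

Review the result before using it: the pattern would also quote a legitimate variable passed as name=some_variable, so it is only meant for the generated network definition.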
After this step, you should be able to run the model successfully.
Conclusion
Converting a Caffe model to TensorFlow is a time-consuming task, even though this article is not long. I hope this article helps you deal with this kind of problem.