LPIRC CVPR 2018 Track 1 Description

Latest Update: 06/06/2018

 

Description

Track one of LPIRC CVPR 2018 is a public competition for fast and accurate on-device image classifiers. It is driven by the need for accurate image classification models that run in real time on mobile devices. Although model mobilization has received surging interest from the academic and industrial communities, the research effort has been rather scattered due to a lack of uniformity and resources. More often than not, researchers characterize model complexity using FLOPs, MACs, or unoptimized on-device latency measured under uncontrolled settings. Not only do these metrics fail to capture the actual runtime, but their heterogeneity also makes apples-to-apples comparisons difficult. In addition, designing efficient on-device models requires optimized implementations tailored to the underlying hardware, as well as special pipelines for measuring model latency, both of which are outside the comfort zone of many computer vision researchers.

 

In Track one, Google and LPIRC propose to alleviate the above-mentioned difficulties by making Google's state-of-the-art mobile technology available to computer vision researchers. This technology includes the TensorFlow Lite compiler for optimizing a TensorFlow model for inference, and Google's benchmarker for reliable measurements of latency and power in realistic mobile use cases. We believe that this platform will encourage researchers without hardware expertise to contribute to the field of mobile visual intelligence model design.

Submission

The participants should submit an image classifier in TensorFlow Lite format that outputs probabilities for the 1001 ImageNet classes. Training data are available on the ILSVRC 2012 website.

 

The models must expect input tensors with dimensions [batch size x input height x input width x channel count], where the batch size must be 1 and the channel count must be 3.
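For concreteness, below is a minimal sketch (in Python, TensorFlow 1.x API) of a graph input and output that satisfy these requirements. The 224 x 224 resolution and the build_classifier function are placeholders for the participant's own choices, not part of the competition rules:

import tensorflow as tf

# Hypothetical input resolution; any integer height/width between 1 and 1000 is allowed.
input_height, input_width = 224, 224

# Batch size must be 1 and channel count must be 3 (RGB), i.e. shape [1, H, W, 3].
images = tf.placeholder(tf.float32, shape=[1, input_height, input_width, 3], name="images")

logits = build_classifier(images)  # build_classifier is a placeholder for the participant's model
# The output should be a [1 x 1001] tensor of class probabilities (see the notes below).
probabilities = tf.nn.softmax(logits, name="probabilities")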

The participants can convert their TensorFlow model into a TensorFlow Lite model using the following command:

third_party/tensorflow/contrib/lite/toco/toco \
  --input_file="${local_frozen}" \
  --output_file="${toco_file}" \
  --input_format=TENSORFLOW_GRAPHDEF \
  --output_format=TFLITE \
  --inference_type="${inference_type}" \
  --inference_input_type=QUANTIZED_UINT8 \
  --input_shape="1,${input_height},${input_width},3" \
  --input_array="${input_array}" \
  --output_array="${output_array}" \
  --mean_value="${mean_value}" \
  --std_value="${std_value}"

where local_frozen is the frozen graph definition;

inference_type is either FLOAT or QUANTIZED_UINT8;

input_height and input_width are the integer height and width expected by the model; each must be between 1 and 1000. The images will be resized to these dimensions, so it is up to the designer to pick dimensions that are neither so small that they hurt accuracy nor so large that they hurt model run-time;

input_array and output_array are the names of the input and output tensors in the TensorFlow graph; and mean_value and std_value are the mean and standard deviation used to normalize the input image.
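For reference, here is a minimal sketch of how the frozen graph definition (the local_frozen file above) might be produced from a trained TensorFlow model in Python. The paths, the build_classifier call, and the node names are placeholders and should be replaced with the participant's own:

import tensorflow as tf

checkpoint_path = "/tmp/model.ckpt"   # placeholder: trained checkpoint
frozen_path = "/tmp/frozen_graph.pb"  # placeholder: becomes ${local_frozen}

with tf.Session() as sess:
    # Rebuild the inference graph exactly as it will be served (see the sketch above),
    # then restore the trained weights.
    images = tf.placeholder(tf.float32, shape=[1, 224, 224, 3], name="images")
    probabilities = tf.nn.softmax(build_classifier(images), name="probabilities")
    tf.train.Saver().restore(sess, checkpoint_path)

    # Convert variables to constants so the graph is self-contained ("frozen").
    # "images" then serves as ${input_array} and "probabilities" as ${output_array}.
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ["probabilities"])
    with tf.gfile.GFile(frozen_path, "wb") as f:
        f.write(frozen_graph_def.SerializeToString())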

Note that:

  1. The input type is always QUANTIZED_UINT8, and specifically, RGB images with pixel values between 0 and 255. This requirement implies that for floating point models, a Dequantize op will be automatically inserted at the beginning of the graph to convert UINT8 inputs to floating-point by subtracting mean_value and dividing by std_value.
  2. The output is a [1 x 1001] tensor encoding the probabilities of the classes, with the first value corresponding to the “background” class. The full list of labels is linked here.
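Before submitting, participants may want to sanity-check the converted model locally. Below is a small debugging sketch using the TensorFlow Lite Python interpreter (exposed as tf.lite.Interpreter in recent TensorFlow releases and under tf.contrib.lite in older ones); the model path is a placeholder, and the official evaluation uses the Java API described in the next section:

import numpy as np
import tensorflow as tf

# Placeholder path to the converted model.
interpreter = tf.lite.Interpreter(model_path="/tmp/model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]
print(input_details["shape"], input_details["dtype"])  # expect [1, H, W, 3] and uint8
print(output_details["shape"])                         # expect [1, 1001]

# Run one dummy RGB image with pixel values in [0, 255].
dummy_image = np.random.randint(0, 256, size=tuple(input_details["shape"]), dtype=np.uint8)
interpreter.set_tensor(input_details["index"], dummy_image)
interpreter.invoke()
probabilities = interpreter.get_tensor(output_details["index"])  # [1 x 1001]
print(int(probabilities[0].argmax()))  # predicted class index; 0 is "background"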

Evaluation

Submissions are evaluated based on classification accuracy relative to running time, focusing on the real-time regime (defined below), as measured on Google's Pixel 2 phone.

  1. Each submission is evaluated using a single thread with a batch-size of 1 on a single big core of the Pixel 2 phone (Snapdragon835).
  2. Each submission will be evaluated on both the ImageNet validation set and a held-out test set freshly collected for the competition. The submission will run on each set for 30 x T ms, where T is the number of images in the set. Accuracy is defined as the number of images correctly classified by the submission, divided by T. The submission with the highest accuracy wins.
    1. This is equivalent to the metric acc / max(30 x T, runtime), where acc is the conventionally defined accuracy and runtime is the total time (in ms) to process all T images. A small worked example of this scoring rule is given at the end of this section.
    2. Ties will be broken by submission time.
  3. There is a public leaderboard (link TBA) that shows the accuracy on the ImageNet validation set. The winners will be determined by their accuracy on the held-out test set. Note that discrepancies are expected between these two accuracy scores.
  4. The evaluation code uses TensorFlow Lite's interpreter API in Java.
  5. The following source code is provided to aid the participants' development. Check out the description here.
    1. A Java API to run the competition’s image classifier.
    2. Sample tflite models (download instructions)
    3. An Android benchmarker app to time a submission on any Android phone.

Items 1 and 2 allow the participants to debug runtime errors. Submissions that are incompatible with items 1 and 2 will not be scored.

Item 3 allows the participants to measure the latency of their submissions on their own phones. Note that the latency obtained via item 3 may differ from the latency reported by the competition's server due to differences in language bindings, device specs, evaluation settings, etc.
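To make the scoring rule in item 2 above concrete, here is a small illustrative sketch in Python; the function name and the numbers are made up for illustration and are not part of the official evaluation code:

def lpirc_track1_score(num_correct, total_images, runtime_ms):
    # Effective metric from item 2 above: acc / max(30 * T, runtime).
    acc = num_correct / float(total_images)
    return acc / max(30.0 * total_images, runtime_ms)

# Hypothetical numbers: 50,000 images gives a wall-clock budget of 30 * 50,000 ms = 25 minutes.
# A model that finishes within the budget is ranked purely by its accuracy:
print(lpirc_track1_score(num_correct=35000, total_images=50000, runtime_ms=1200000))
# A model with the same accuracy but a runtime beyond the budget is penalized by its runtime:
print(lpirc_track1_score(num_correct=35000, total_images=50000, runtime_ms=3000000))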

Prizes

The top three submissions will be awarded $2,000, $1,000, and $500, respectively.

Timeline

Registration and submission open: March 22, 2018

Evaluation server online: April 15, 2018

Submission open: May 25, 2018

Submission closed: June 14, 2018

Winner announced: June 18, 2018