Diff

This page shows the differences between two versions of the page.

--- opencv_dnn:環境構築:dnn_with_cuda [2020/02/12 18:04] – baba
+++ opencv_dnn:環境構築:dnn_with_cuda [2020/02/23 14:51] – [OpenCV DNN with CUDA] baba
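@@ Line 2: / Line 2: @@
 This page summarizes how to set up an environment for running inference with OpenCV's dnn module on CUDA. The original motivation was:
   * I wanted OpenCV's dnn inference to run faster
-That is all there is to it. Using Intel's inference engine ( https://github.com/opencv/opencv/wiki/Intel%27s-Deep-Learning-Inference-Engine-backend ) is of course an option, but even after installing it you can only expect a speedup of about 1.5x, which is a little demotivating. Then I learned that CUDA support for the dnn module finally arrived in 4.2 (with restrictions on the supported network architectures), so I decided to try running YOLO, SSD, and similar networks on CUDA right away. When actually run on CUDA, inference is roughly 15 times faster than CPU inference. CUDA delivers.
+That is all there is to it. Using Intel's inference engine ( https://github.com/opencv/opencv/wiki/Intel%27s-Deep-Learning-Inference-Engine-backend ) is of course an option, but even after installing it you can only expect a speedup of about 1.5x, which is a little demotivating. Then I learned that CUDA support for the dnn module finally arrived in 4.2 (with restrictions on the supported network architectures), so I decided to try running YOLO, SSD, and similar networks on CUDA right away. When actually run on CUDA, inference is roughly 15 times faster than CPU inference (on a GeForce 1080Ti). CUDA delivers.
 The articles I referred to are listed below. I could not find anyone who had written this up in Japanese for an Ubuntu environment, so I decided to document it here.
@@ Line 17: / Line 17: @@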
   * cuDNN 7.6.4
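-When configuring with cmake, if multiple versions are installed, some manual work is needed, such as switching to the appropriate version. For example, doing this with cuda10.2
+When configuring with cmake, if multiple versions are installed, some manual work is needed, such as switching to the appropriate version. If possible, I recommend using cmake-gui to check carefully that the paths and versions are correct. The make step itself is not particularly difficult, but if errors occur, go through everything related to CUDA. opencv_contrib is also required, so do not forget it. Using nvidia-docker where appropriate is probably a good idea.
 ==== Modifying object_detection.cpp ====
@@ Line 367: / Line 367: @@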
     std::vector<float> confidences;
     std::vector<Rect> boxes;
-    if (outLayerType == "DetectionOutput")
+    boxes.clear();
+    if (net.getLayer(0)->outputNameToIndex("im_info") != -1)  // Faster-RCNN or R-FCN
+    {
+        /*
+        // Network produces output blob with a shape 1x1xNx7 where N is a number of
+        // detections and an every detection is a vector of values
+        // [batchId, classId, confidence, left, top, right, bottom]
+        CV_Assert(outs.size() == 1);
+        float* data = (float*)outs[0].data;
+        for (size_t i = 0; i < outs[0].total(); i += 7)
+        {
+            float confidence = data[i + 2];
+            if (confidence > confidenceThreshold)
+            {
+                int left = (int)data[i + 3];
+                int top = (int)data[i + 4];
+                int right = (int)data[i + 5];
+                int bottom = (int)data[i + 6];
+                int width = right - left + 1;
+                int height = bottom - top + 1;
+                classIds.push_back((int)(data[i + 1]) - 1);  // Skip 0th background class id.
+//                boxes.push_back(Rect(left, top, width, height));
+                confidences.push_back(confidence);
+            }
+        }
+        */
+    }
+    else if (outLayerType == "DetectionOutput")
     {
         // Network produces output blob with a shape 1x1xNx7 where N is a number of
@@ Line 435: / Line 462: @@
     else
         CV_Error(Error::StsNotImplemented, "Unknown output layer type: " + outLayerType);
     std::vector<int> indices;
@@ Line 445: / Line 473: @@
                  box.x + box.width, box.y + box.height, frame);
     }
 }
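
As a note on the code above: the comment in the diff says the detection output is a 1x1xNx7 blob in which each row is [batchId, classId, confidence, left, top, right, bottom]. The following is only a minimal sketch of how such a blob can be walked, written without OpenCV so it builds on its own. `Box`, `Detections`, and `parseDetections` are hypothetical names introduced for illustration (`Box` stands in for `cv::Rect`), and the sketch assumes the coordinates are already in pixels, as in the Faster-RCNN / R-FCN branch shown in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cv::Rect so the sketch compiles without OpenCV.
struct Box {
    int x;
    int y;
    int width;
    int height;
};

// Hypothetical container mirroring the three vectors used in the sample.
struct Detections {
    std::vector<int> classIds;
    std::vector<float> confidences;
    std::vector<Box> boxes;
};

// Walks a flattened 1x1xNx7 blob whose rows are
// [batchId, classId, confidence, left, top, right, bottom]
// and keeps only rows whose confidence exceeds the threshold.
// Assumption: coordinates are already expressed in pixels.
Detections parseDetections(const std::vector<float>& data, float confidenceThreshold)
{
    // Each detection occupies exactly 7 floats.
    assert(data.size() % 7 == 0);

    Detections result;
    for (std::size_t i = 0; i + 7 <= data.size(); i += 7)
    {
        const float confidence = data[i + 2];
        if (confidence <= confidenceThreshold)
            continue;

        const int left = static_cast<int>(data[i + 3]);
        const int top = static_cast<int>(data[i + 4]);
        const int right = static_cast<int>(data[i + 5]);
        const int bottom = static_cast<int>(data[i + 6]);

        // Inclusive pixel coordinates, hence the +1.
        const int width = right - left + 1;
        const int height = bottom - top + 1;

        // Class id 0 is the background class, so shift ids down by one.
        result.classIds.push_back(static_cast<int>(data[i + 1]) - 1);
        result.boxes.push_back(Box{left, top, width, height});
        result.confidences.push_back(confidence);
    }
    return result;
}
```

In the actual sample the three vectors are then passed on to non-maximum suppression; this sketch stops at the filtering step and does not reproduce that part.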