Prepare Training Set, Validation Set, and Test Set

To train a High Detail tool in VisionPro Deep Learning, you first need three types of image datasets: the training set, the validation set, and the test set.

Prepare Training Set

The VisionPro Deep Learning tools are based on deep learning, which trains a neural network by feeding it data and letting it infer from that data what it should accomplish. The data fed into the network is therefore vitally important: the single largest factor you can control during the training phase is the composition of the training image set. The well-known machine learning maxim "Garbage in, garbage out" applies fully to training VisionPro Deep Learning tools. If meaningful, properly labeled images are fed into your neural network, training can produce good results; if not, it cannot. Therefore, it is paramount to ask yourself, "What am I teaching the network?"

What you are teaching the network is determined by your training set, which is the data you feed the network. For the VisionPro Deep Learning tools, images are the backbone of that dataset. The system deduces what you are trying to accomplish from a large set of such images, so the images themselves are critically important.

The training image set is the data that will be passed into the network. When constructing it, consider the following:

Is my training image set representative of runtime images? Check for the following:
- Lighting – If there will be lighting variations, your Training Set should include the lighting variations you expect at runtime.
- Color – If parts may change color at runtime, include all of the color variations.
- Rotation/Scale – If the part may appear rotated or scaled when presented to the camera, include those variations.
- Part-to-part variation – If the parts being inspected may vary subtly, include those variations in your Training Set.
- Background – Sample images are often captured on a desk, but at runtime the parts may run on a conveyor belt, which presents a much more complex background to the Deep Learning tools. In that case, collect a Training Set of the part on the conveyor belt. If the background is not relevant to the inspection, you can mask it out; this generally improves the tool's performance because the network is not taught extraneous information.
Is my training image set providing enough data? Many problems encountered while programming the Deep Learning tools can be solved by providing more images and retraining the tools.

Tip: If you annotate your images by including descriptive information in their file names, such as "good" or "bad", you can speed up the process of labeling them.
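If your file names already carry such keywords, a small pre-processing script can group the images by suggested label before you import them. The following is a minimal Python sketch, not a VisionPro Deep Learning feature: the keyword table and the `label_hint` and `group_by_hint` helpers are hypothetical, and the sketch assumes keywords appear as separate tokens in the file name (for example `part_017_bad.png`).

```python
import re
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical keyword-to-label table; adapt it to your own naming convention.
KEYWORDS = {"good": "good", "bad": "bad"}


def label_hint(filename: str) -> Optional[str]:
    """Suggest a label when exactly one keyword appears as a token in the file name."""
    stem = Path(filename).stem.lower()
    # Split on anything that is not a letter or digit, so "badge" does not match "bad".
    tokens = re.split(r"[^a-z0-9]+", stem)
    found = {KEYWORDS[t] for t in tokens if t in KEYWORDS}
    # Ambiguous names (no keyword, or conflicting keywords) get no suggestion.
    return found.pop() if len(found) == 1 else None


def group_by_hint(filenames: List[str]) -> Dict[Optional[str], List[str]]:
    """Group file names by suggested label; None collects names that need manual labeling."""
    groups: Dict[Optional[str], List[str]] = {}
    for name in filenames:
        groups.setdefault(label_hint(name), []).append(name)
    return groups
```

Files grouped under `None` still need to be labeled by hand. The hint only speeds up the obvious cases, and every suggested label should still be reviewed in the tool.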

Construct Image Sample Sets

Once you have collected your images, you can organize them into image sample sets. Sample sets provide a convenient way to manage and organize your image database: they let you divide your images into categories for training, testing, experimentation, and validation. Running your tools against specific types of images that you define in this way helps establish confidence in your results.

For example, you could divide your database of images into categories:

- Train – Line 1 (04082019)
- Train – Line 1 (04222019)
- Train – Line 2 (03292019)
- Test – Line 1 (04092019)
- Test – Line 2 (03302019)
- Test – Bad Parts

Note: For more information on setting up image sample sets, see Create and Edit Sets.

Tip: Once you have assembled image sample sets, you can use the Display and Database Overview filters (Display Filters) to easily filter and sort your images.

Construct Training Set

Once your images are collected and categorized, you can create image sample sets and use them in your Training Set. By default, all tools train on a randomly selected 50% of the labeled images in the image set. The testing/training split refers to the portions of labeled images used for training and for testing; the default 50% split in the Select Training Set dialog can be changed to suit your application.

The Select Training Set dialog determines the composition of the samples used to train your VisionPro Deep Learning tool. In this dialog, you define a tool's training set from either all of the views in your image database or one or more image sets, together with a percentage of those images/views. The tool uses the designated Selection percentage to randomly select a portion of the labeled images for training. The remaining images are used to compute result statistics for the tool, which means they automatically become the test set.
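The Selection behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not the tool's internal implementation: views are represented as hypothetical dictionaries with `id` and `label` keys, unlabeled views are skipped, and rounding the training count with Python's `round` is an assumption.

```python
import random
from typing import Dict, List, Optional, Tuple

View = Dict[str, object]  # hypothetical view record: {"id": ..., "label": ...}


def split_train_test(
    views: List[View], selection: float = 0.5, seed: Optional[int] = None
) -> Tuple[List[View], List[View]]:
    """Randomly pick `selection` of the labeled views for training; the rest form the test set."""
    labeled = [v for v in views if v["label"] is not None]
    rng = random.Random(seed)
    n_train = round(len(labeled) * selection)  # rounding rule is an assumption
    train = rng.sample(labeled, n_train)
    train_ids = {v["id"] for v in train}
    # Every labeled view not drawn for training automatically becomes a test view.
    test = [v for v in labeled if v["id"] not in train_ids]
    return train, test
```

With the default Selection of 0.5 and ten labeled views, five are drawn at random for training and the other five form the test set.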

A view's membership in the training set is shown by the Train indicator at the top right corner of the Image Display View.

Note:

- Before a tool can be trained, all views must be labeled.
- Ideally, you should set up Image Sample Sets before training your tool.
- The training set is defined at the stream level, since each stream can have its own image database.
- You can also use the default Display Filter "trained" to view all views that have the train flag set.

Here are the steps for initializing the training set:

1. After you first add a tool, and before you have defined a training set, press the Train icon. This launches the Select Training Set dialog.
   Note: Before pressing the Train icon, first label one or more views.
   Note: You can also launch the Select Training Set dialog by pressing the Edit link on the Tool Parameters panel.
2. Choose the source of the training set:
   - If you have not previously defined Image Sample Sets, use the Select training set from all views option. Then define the Selection, which is the percentage of labeled images/views that will be randomly selected to train the tool.
   - If you have previously defined Image Sample Sets, use the Select training set from Image Sets option. Select the previously defined Image Sample Sets from the list, then define the Selection, which is the percentage of labeled images/views from those sets that will be randomly selected to train the tool.
   Note: When using the Select training set from Image Sets option, the selection percentage is based on the combined total number of images/views from the selected image sets, not on a percentage of each individual set. The selection is made so as to best ensure that all trained classes are covered. For example, if you have three training image sets, where A contains 20 samples, B contains 10, and C contains 40 (70 total), and your Selection is the default 50%, the tool may randomly select 5 images from A, 8 from B, and 22 from C (35 views).
3. Press the OK button to accept the training set configuration. Images included in the training set are marked at the top right corner of the Image Display View.
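The combined-percentage rule can be illustrated with the example in the note above (sets A, B, and C with 20, 10, and 40 views). The Python sketch below pools the selected sets and computes one Selection count over the combined total. As an assumption about what "best ensure coverage of all classes" might mean, it reserves one random view per class before filling the remaining slots at random. The function name and view-record format are hypothetical, and the tool's actual per-set counts will vary from run to run.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional

View = Dict[str, str]  # hypothetical view record: {"id": ..., "label": ...}


def select_from_sets(
    image_sets: Dict[str, List[View]], selection: float = 0.5, seed: Optional[int] = None
) -> List[View]:
    """Apply one Selection percentage to the combined total of all chosen image sets."""
    rng = random.Random(seed)
    pooled = [v for views in image_sets.values() for v in views]
    n_train = round(len(pooled) * selection)  # based on the combined total, not per set

    # Assumption: guarantee class coverage by reserving one random view per class first.
    by_class: Dict[str, List[View]] = defaultdict(list)
    for v in pooled:
        by_class[v["label"]].append(v)
    chosen = [rng.choice(members) for members in by_class.values()][:n_train]

    # Fill the remaining slots at random from the views not yet chosen.
    chosen_ids = {v["id"] for v in chosen}
    remaining = [v for v in pooled if v["id"] not in chosen_ids]
    chosen += rng.sample(remaining, n_train - len(chosen))
    return chosen
```

For sets of 20, 10, and 40 views and a Selection of 0.5, the function always returns 35 views in total. Only their distribution across A, B, and C depends on the random draw.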

Modify Training Set

After a training set has been created, images/views that you add to the database or to image sets are not included in it. In other words, anything added after you press the OK button in the Select Training Set dialog is excluded from the training set. There are a few ways to modify a previously configured training set. To include newly added images/views, you can re-open the Select Training Set dialog and press the OK button again; this discards the current training set and creates a new, randomly selected one. Alternatively, you can add the images/views manually, using the methods described below.

Note: Whatever manual adjustments you make to the training set will be overwritten the next time that you open the Select Training Set dialog and press the OK button.

You can explicitly add and/or remove images/views from a training set through one of the following:

- In the image display area, right-click an image/view to add it to the training set if it is not already part of it, or to remove it if it is.
- In the View Browser, when you apply a Display Filter, use the Actions for N Views menu options Add the selected views to training set or Remove views from training set to add or remove multiple views at once.
- From the Database menu, open the Select Training Set dialog.
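The rules in this section can be captured in a small state model: new views are not added automatically, manual adds and removes are allowed, and pressing OK in Select Training Set re-draws the set and discards manual changes. This Python sketch is hypothetical and only mirrors the documented behavior; `TrainingSetState` and its methods are illustrative names, not part of any VisionPro Deep Learning API.

```python
import random
from typing import Iterable, List, Optional, Set


class TrainingSetState:
    """Hypothetical model of per-view train flags, mirroring the documented behavior."""

    def __init__(self, view_ids: Iterable[int], selection: float = 0.5, seed: Optional[int] = None):
        self.view_ids: List[int] = list(view_ids)
        self.selection = selection
        self.rng = random.Random(seed)
        self.train: Set[int] = set()
        self.reselect()

    def add_views(self, new_ids: Iterable[int]) -> None:
        # New views join the database but are NOT added to the training set.
        self.view_ids.extend(new_ids)

    def mark(self, view_id: int, in_train: bool) -> None:
        # Manual add/remove, like the right-click menu or "Actions for N Views".
        if in_train:
            self.train.add(view_id)
        else:
            self.train.discard(view_id)

    def reselect(self) -> None:
        # Like pressing OK in Select Training Set: discard manual edits and draw a new random set.
        n = round(len(self.view_ids) * self.selection)
        self.train = set(self.rng.sample(self.view_ids, n))
```

Calling `reselect` after manual `mark` calls shows why manual adjustments are lost: the whole set is rebuilt from the current database at the configured Selection.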

Prepare Validation Set

The validation set is used only by the High Detail modes of Green Classify and Red Analyze. It is used to evaluate the model being trained on the training set at regular iteration intervals, much like a student taking a mock exam before the real one. Based on these evaluations, the High Detail architecture computes the loss on the validation set and selects the model with the lowest loss as the final model.

Note: The validation set is not used in Green Classify High Detail Quick.

Validation Set Ratio

The Validation Set Ratio applies to the High Detail modes of Green Classify and Red Analyze. It is the proportion of the training set that is set aside as the validation set. The validation set is randomly selected from the training set each time you start training.
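Conceptually, the Validation Set Ratio carves a random subset out of the training set each time training starts. The following Python sketch assumes the ratio is applied to the number of training views and rounded with `round`; `pick_validation` is a hypothetical helper, not the tool's API.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple


def pick_validation(
    train_ids: Sequence[int], ratio: float, seed: Optional[int] = None
) -> Tuple[List[int], Set[int]]:
    """Split the training set into views used for fitting and a random validation subset."""
    rng = random.Random(seed)
    n_val = round(len(train_ids) * ratio)  # rounding rule is an assumption
    val = set(rng.sample(list(train_ids), n_val))
    # Validation views remain part of the training set but are held out from fitting.
    fit = [v for v in train_ids if v not in val]
    return fit, val
```

Calling it without a fixed seed at the start of every training run mimics the documented behavior that a new validation set is drawn each time you train.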

The validation set differs from the test set in that the validation set participates in training, while the test set does not participate at all. The test set is used only after training has finished, to evaluate how well the model performs on data in general. For example, if you set a high epoch count and train without a validation set, the model may overfit the training set, and there is no way to tell whether overfitting has occurred until training ends. In this case, the overfitted model becomes the final model.

With a validation set, however, overfitting can be detected when, after a certain amount of training, performance on the validation set falls well below performance on the training set. When training finishes, the model generated before overfitting began is selected as the final model. In this way, the validation set is not used directly to train the model, but it does participate in calibrating it by measuring performance on unseen data during training.
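The model-selection idea can be shown with a recorded training history. In this Python sketch the history is a hypothetical list of `(iteration, training_loss, validation_loss)` tuples. The final model is the checkpoint with the lowest validation loss, and overfitting is flagged when the validation loss exceeds the training loss by an arbitrary `gap`. Both the record format and the threshold are assumptions for illustration, not the High Detail architecture's actual criteria.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, float, float]  # (iteration, training_loss, validation_loss)


def select_final_model(history: List[Record]) -> int:
    """Return the iteration whose checkpoint had the lowest validation loss."""
    best_iteration, _, _ = min(history, key=lambda rec: rec[2])
    return best_iteration


def overfitting_onset(history: List[Record], gap: float = 0.2) -> Optional[int]:
    """Return the first iteration where validation loss exceeds training loss by more than `gap`."""
    for iteration, train_loss, val_loss in history:
        if val_loss - train_loss > gap:
            return iteration
    return None
```

For a history in which the validation loss bottoms out at iteration 300 and then climbs while the training loss keeps falling, `select_final_model` returns 300, the checkpoint saved before overfitting set in, even if training continued to iteration 500.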

Once you finish editing your training set in the Select Training Set dialog, the views picked for the validation set display a special indicator showing that they belong to the validation set within the training set.

Enter 'validated' in the Display Filter to show only the views included in the validation set, or 'not validated' to exclude them, in the View Browser and Image Display Area.

Enter 'validated' in the Database Overview Filter to show only the results of the views included in the validation set, or 'not validated' to exclude them, in the Database Overview.

Note: The Database Overview Filter is available only when Expert Mode is enabled.

Prepare Test Set

As with the training set, your test set should contain representative variations of what you expect to encounter at runtime. The key difference is that the test set is not used during training. Instead, it is used to evaluate the results of training, which is why it is also called the evaluation set. All images excluded from the training set automatically become the test set.

You should also label the test set in the same way that you label the training set. This lets you perform a statistical analysis of the training results to determine how well the tool generalizes.