In recent years, computer vision has become one of the most popular applications of Artificial Intelligence (AI). The main reason for this success is the wide variety of markets and needs: medical imaging, industry, transport, etc. In particular, computer vision enables image recognition for control processes, object detection and face detection.
Many companies use object detection to automate control processes, for example. They are then faced with two choices:
1. Using pre-trained models (models already trained to classify a specific kind of “object”) from the major computer vision AI providers (Google Cloud, Amazon Web Services, IBM Watson, Microsoft Azure, Clarifai, etc.). In this case, the object they want to identify is automatically recognized by these models.
2. If no pre-trained model is satisfactory, the user has to train a specific model for their own needs. In this case, there are two possibilities:
The last possibility is based on a technology called Automated Machine Learning (AutoML), applied to the Deep Learning algorithms (neural networks) used in vision. This service was created to democratize AI and to allow developers without real knowledge of Machine Learning to train models easily. It makes it possible to obtain fast and accurate results at a lower cost, without the expertise of a data scientist, on a subject like computer vision, which relies on very complex algorithms.
However, generating the custom model automatically also means giving up total control over the technical solution and the algorithms behind the model. Although many of the techniques used by AutoML are known, such as Transfer Learning (reusing existing pre-trained models to detect similar objects) and Neural Architecture Search (an algorithm that builds neural networks by assembling, removing and adding blocks), the exact algorithm is never known to the user.
During our study of AutoML Vision (a name associated with Google, but used in this article as a generic term), we put ourselves in the position of a company that wants to develop an image classification project for a very specific object. This company has no AI expert and wants to achieve a high level of performance at a lower cost, and therefore without hiring a service provider. So the first question that comes to mind is: “Which provider do I choose?” A first observation emerges from a survey of the market: only a few providers offer an AutoML Vision service. We decided to test four of them: Google, Amazon, Microsoft and Clarifai.
As an early-stage company in the field of AI, we took a look at the Forrester Computer Vision New Wave. We looked at the platforms of the four highest-ranked providers and checked whether they provide an AutoML Vision service. We could have chosen other providers, such as IBM Watson Visual Recognition or Vize.ai by Ximilar.
In order to get a clear view of the market and the different providers, we benchmarked these four solutions on two image classification projects that differ substantially in database size, labels, domain and database quality. We carried out both projects and analysed the results of the four providers on each.
The first project consists in recognizing a specific box used to set up a new-generation internet connection. The objective is to create a model that indicates whether or not this box is present in any photo. We have a dataset of 2586 images labelled as “box” and 1013 negative images (without the box).
The second project aims at the recognition of melanoma. The aim is to create a model that classifies lesions as benign or malignant. For this project, we have a dataset composed of 460 images with the label “benign” and 462 images with the label “malignant”.
Here, we tested the solutions in two different fields, with two different problems. This can show us whether the test really needs to be carried out for every new project.
After getting to grips with the four solutions on two distinct use cases, many differences emerged between them in approach and in use. The first remark concerns the very first step: creating an account (and getting an API token if using the API). This step is laborious and already takes some time. It is more time-consuming on Google and Amazon, assuming you have no experience with these platforms.
Next, the process leads us to the AutoML Vision platform. It is easy to find on Google and Microsoft, and even more intuitive on Clarifai, which only offers vision services. For Amazon, on the other hand, the task is much more complicated: how to reach the new Amazon Rekognition Custom Labels service from the console remains a mystery, and we used a direct link to access it. It is easy to get lost and much harder to quickly reach the interface for building a model.
Then comes the import of the database (images). First of all, it is important to note that Amazon and Google require the user to store the database in their cloud storage service in order to use it for the model. As for file formats, the classic image formats (png, jpg) are accepted by all providers. More specific formats may be accepted (some only through the corresponding API). The main difficulty is the labeling step. Assigning each image its own label(s) can be tedious:
However, Microsoft forces the user to duplicate images across several imports when processing multi-label images.
Note the availability of the Amazon SageMaker Ground Truth service, which lets you have your data labeled “automatically” by AWS.
Please note that these remarks only concern each provider’s console, a method that requires no technical skills. With the APIs, it is of course possible to assign a label to a whole group of images in a few lines of code, and to simplify multi-labeling for all solutions.
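To illustrate what "a label per group of images in a few lines of code" can look like, here is a minimal sketch that builds an import file from images grouped by label. The two-column layout (image URI, label), the bucket name and the file names are assumptions made for the example; each provider defines its own import format, so check its documentation before reusing this.

```python
import csv
import io

def build_label_rows(images_by_label, storage_prefix):
    """Return one (uri, label) row per image from a {label: [filenames]} mapping."""
    rows = []
    for label, filenames in sorted(images_by_label.items()):
        for name in sorted(filenames):
            rows.append((f"{storage_prefix}/{label}/{name}", label))
    return rows

def rows_to_csv(rows):
    """Serialize the rows as CSV text (assumed layout: uri,label)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

# Hypothetical file names and bucket, for illustration only.
images = {"box": ["img_001.jpg", "img_002.jpg"], "no_box": ["img_101.jpg"]}
rows = build_label_rows(images, "gs://my-hypothetical-bucket/dataset")
print(rows_to_csv(rows))
```

Grouping images by folder or by label before the import means a whole group is labeled in one pass instead of image by image in the console.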
Concerning the pricing, the offers are as follows:
As you can see, these price indications make it very difficult to estimate the final cost you will be charged. Nevertheless, this table gives an overview of the most cost-effective solutions according to your needs.
If you just want to try training several custom vision models, Amazon and Clarifai will have a reasonable cost. Conversely, if you are a company that wants to use its model to predict on a large set of images, the training cost is negligible and you should focus on model usage costs.
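The trade-off between training cost and usage cost can be made concrete with a small calculation. The sketch below is only an illustration: the function name, the pricing structure (hourly training price, price per 1,000 predictions) and the numbers are placeholders, not the actual prices of any provider.

```python
def estimate_cost(training_hours, price_per_training_hour,
                  n_predictions, price_per_1000_predictions):
    """Return (training_cost, usage_cost, total) under a simple linear pricing model."""
    training_cost = training_hours * price_per_training_hour
    usage_cost = (n_predictions / 1000) * price_per_1000_predictions
    return training_cost, usage_cost, training_cost + usage_cost

# Placeholder prices, NOT real provider prices.
training, usage, total = estimate_cost(
    training_hours=2, price_per_training_hour=10.0,
    n_predictions=1_000_000, price_per_1000_predictions=1.5,
)
print(f"training={training:.2f} usage={usage:.2f} total={total:.2f}")
```

With a large prediction volume, the usage term dominates the total, which is why the model usage price matters more than the training price in that scenario.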
We can therefore already define a financial strategy on the choice of supplier, without even knowing the exact cost of operations.
However, beware of additional charges, especially for data storage, which can add up without you even noticing!
The next step concerns launching the automatic model training and the parameters over which the user has control.
Once the dataset has been imported and tagged, it’s time to start model training. The two parameters that the user can control are the training time and the train set / test set distribution.
Google lets the user set the number of nodes for training (8 nodes ~ 1 hour of computation) and specify in the .csv file, for each image, whether it belongs to the train set or the test set (if the user does not specify the split, Google applies its default: 80% train, 10% validation, 10% test).
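When the split is set by hand, it helps to assign it reproducibly. Here is a minimal sketch of an 80/10/10 assignment like the default described above; the split names ("TRAIN", "VALIDATION", "TEST"), the function name and the file names are assumptions for illustration, not a provider's required values.

```python
import random

def assign_splits(items, train_pct=80, val_pct=10, seed=0):
    """Shuffle items deterministically and map each one to a split name."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    splits = {}
    for index, item in enumerate(shuffled):
        if index < n_train:
            splits[item] = "TRAIN"
        elif index < n_train + n_val:
            splits[item] = "VALIDATION"
        else:
            splits[item] = "TEST"  # whatever remains goes to the test set
    return splits

# Hypothetical file names.
filenames = [f"img_{i:03d}.jpg" for i in range(100)]
splits = assign_splits(filenames)
print(sum(1 for s in splits.values() if s == "TRAIN"), "training images")
```

Fixing the seed keeps the same images in the same split from one run to the next, which makes results easier to compare.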
Clarifai does not allow any user intervention on these parameters: the automatic distribution is 80% train, 20% test.
Microsoft lets the user either set the training time or leave it to Microsoft (Quick train / Advanced Train). However, it is not possible to change the train/test split.
Finally, Amazon does not let you set a custom training time, but it offers fairly advanced customization of the train/test split:
One of the most important steps is the evaluation of the model. It allows the user to determine, according to performance criteria tied to their expectations, whether the model is reliable. Several indicators make this possible. All the services tested report precision and recall as metrics, and offer the possibility to manually browse the test dataset to see where the model was wrong.
Precision answers the question: what proportion of positive identifications was actually correct?
Recall answers the question: what proportion of actual positives was correctly identified?
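These two definitions translate directly into code. The sketch below computes both metrics from the counts of true positives (TP), false positives (FP) and false negatives (FN); the counts in the example are made up for illustration and are not results from this benchmark.

```python
def precision(tp, fp):
    """Share of positive predictions that were correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Share of actual positives that were found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Illustrative counts only.
tp, fp, fn = 90, 10, 30
print(f"precision={precision(tp, fp):.2f} recall={recall(tp, fn):.2f}")
```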
On Google and Clarifai, we can also use the confusion matrix to characterize the types of errors the model makes and their proportions.
Overall, Clarifai and Google offer a more thorough evaluation of the model, with interesting metrics and statistics. Amazon and Microsoft, on the other hand, stick to the bare minimum by highlighting only the basic metrics.
All the providers give a general metric supposed to represent the overall accuracy of the model, but this metric is not the same for all providers. Moreover, they do not really disclose how it is calculated. It therefore does not seem to be a good reference for comparing the models.
Here, we can see that for the internet box use case, Google is the choice for the best precision and Amazon for the best recall.
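For readers unfamiliar with it, a confusion matrix simply counts each (actual label, predicted label) pair. Here is a minimal sketch of how such a matrix can be built from a test set; the labels and predictions are invented for the example.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Count occurrences of each (actual, predicted) label pair."""
    return Counter(zip(y_true, y_pred))

# Invented labels and predictions, for illustration only.
y_true = ["malignant", "malignant", "benign", "benign", "malignant"]
y_pred = ["malignant", "benign", "benign", "malignant", "malignant"]

matrix = confusion_matrix(y_true, y_pred)
for (actual, predicted), count in sorted(matrix.items()):
    print(f"actual={actual:<10} predicted={predicted:<10} count={count}")
```

The off-diagonal pairs (actual and predicted labels differ) show which kind of mistake the model makes, for example malignant cases predicted as benign.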
For the melanoma use case, we would choose Microsoft for the best precision and Amazon for the best recall. In the melanoma project, we must choose the provider with the best recall, because we want a model that misses as few malignant melanomas as possible. In the internet box project, we should look at precision, because we do not want the model to predict a box when there is none.
As we can see, providers do not perform equally well across databases and projects. Testing several providers is the only reliable way to choose which one to use. Performance varies from one project to another, the target may be the best precision or the best recall, and no single provider is the best for every project and every database.
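The selection logic described here, picking the provider that maximizes the metric that matters for the project, can be sketched in a few lines. The provider names and scores below are placeholders and are not the results of this benchmark.

```python
def best_provider(results, metric):
    """Return the provider name with the highest value for the chosen metric."""
    return max(results, key=lambda name: results[name][metric])

# Placeholder scores, NOT the benchmark's measurements.
results = {
    "provider_a": {"precision": 0.95, "recall": 0.80},
    "provider_b": {"precision": 0.88, "recall": 0.93},
}

print(best_provider(results, "recall"))     # metric to favour when misses are costly
print(best_provider(results, "precision"))  # metric to favour when false alarms are costly
```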
Once the model has been trained, it can finally be used. Each provider offers somewhat different services. Microsoft and Google allow you to test the model online on the console by importing images individually.
With Clarifai, we can create a workflow in the Explorer (console) and we can use our models to predict. One request is limited to 32 inputs.
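A per-request limit such as the 32 inputs mentioned above is usually handled by splitting the images into batches before calling the API. The sketch below shows only the batching step; the file names are hypothetical, and the actual prediction call is left out because it depends on the provider's client.

```python
def batches(items, size=32):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Hypothetical list of 70 image paths.
image_paths = [f"img_{i:03d}.jpg" for i in range(70)]
batch_sizes = [len(batch) for batch in batches(image_paths)]
print(batch_sizes)
```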
All providers allow the user to use online prediction via a REST API.
During the test of all these solutions, we obviously encountered problems specific to our use as a normal user.
Google’s platform took us a long time to master, but once we got the hang of it, it proved quite ergonomic. The main problem is error handling: an error occurred during training and there was no way to find out the cause. In addition, some data were not labelled as indicated in the .csv file. The only solution to these errors, whose source is unknown, is to contact technical support, which is not free of charge!
The same goes for Amazon: we had problems importing the dataset for model training, as well as problems that prevented us from viewing the model evaluation, with no indication of their cause. There are two options: investigate these errors yourself, at the risk of losing a lot of time, or contact support, which is again a paid service. The lack of user control over the threshold also severely handicapped us when we had to evaluate the model and compare it to the others.
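To see why control over the threshold matters when comparing models, here is a minimal sketch that recomputes precision and recall at different confidence thresholds. The scores and labels are invented for the example; a real comparison would use each provider's confidence scores on the same test set.

```python
def precision_recall_at(scores, labels, threshold):
    """Compute (precision, recall) when predicting positive for score >= threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Invented confidence scores and ground-truth labels (True = positive).
scores = [0.95, 0.80, 0.60, 0.40, 0.20]
labels = [True, True, False, True, False]

for threshold in (0.3, 0.5, 0.9):
    p, r = precision_recall_at(scores, labels, threshold)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f}")
```

Lowering the threshold tends to raise recall at the expense of precision, so two models evaluated at different fixed thresholds are not directly comparable.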
The user experience with Clarifai was quite laborious. Some problems with fluidity and clarity, particularly when importing images, disrupted the process. We also noticed some graphical bugs, especially in the evaluation of results. From our point of view, it would be fairer to judge Clarifai on its API, which is much more advanced than its console.
Finally, Microsoft offers a very intuitive interface. It is the interface on which we spent the least time, and we encountered no notable problems. The simplified, accessible approach is deliberate, perhaps a little too much so for our liking when it comes to evaluating the model.
For each project and each use case, an analysis is necessary to evaluate costs, usage and performance. This study showed that each case is specific and that we cannot be certain of the choice of solution until we have tested several of the solutions available on the market. Some solutions can give very poor results and others excellent ones, and this ranking can change completely for another use case. Also, depending on the project, priority will be given to cost, results, computation time and number of queries per second, or ease of use. These are all criteria that can influence the decision and allow the user to choose the most relevant solution for the project.
It is on this basis that our Eden AI offer comes into play. Thanks to our in-depth expertise in the use of these different Artificial Intelligence solutions, we are able to provide the recommendation most suited to your problem, and save you a lot of time and money.
If you are a solution provider and wish to integrate with Eden AI, contact us at: contact@edenai.co
This article is brought to you by the Eden AI team. We allow you to test and use in production a large number of AI engines from different providers directly through our API and platform.