App Annie News

Delivering Trusted Quality Estimates Without Sacrificing Privacy

Melania Calinescu, Head of Data Science

Discover how App Annie delivers trusted quality estimates

In my last blog, I talked about our AI operating principles, which focused on privacy, security and transparency. Today, I want to go into more detail on how we are able to deliver trusted quality estimates without sacrificing data privacy.

App Annie uses supervised learning, an AI technique that analyzes known inputs from a benchmark dataset (called ground truth) to learn patterns, then applies those patterns to unseen data.

What does this look like in simple terms? Let's say you want to use AI to determine if a photo contains a dog or a cat. You expose photos of dogs and cats (ground truth) to the algorithm, and AI extracts and learns visual patterns that distinguish a dog from a cat in those photos. When new, unseen photos are presented, AI determines, based on what it learned from ground truth, whether each photo contains a cat or a dog.

The ground truth photos serve two purposes: they train the model, and they measure how accurate the predictions are when the dog-vs-cat algorithm is applied to new photos.
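The idea above can be sketched in a few lines of code. This is a minimal illustration, not App Annie's actual method: a 1-nearest-neighbor classifier trained on labeled ground-truth examples and then applied to unseen inputs. The two numeric features (hypothetical weight in kg and ear length in cm) stand in for the visual patterns a real image model would extract.

```python
import math

# Labeled "ground truth" examples: ((weight_kg, ear_length_cm), label).
# These values are invented for illustration only.
ground_truth = [
    ((30.0, 10.0), "dog"),
    ((25.0, 9.0),  "dog"),
    ((4.0,  6.0),  "cat"),
    ((5.0,  7.0),  "cat"),
]

def predict(features):
    """Label an unseen example with the class of its nearest ground-truth point."""
    nearest = min(ground_truth, key=lambda ex: math.dist(features, ex[0]))
    return nearest[1]

print(predict((28.0, 9.5)))  # heavy, long-eared animal -> "dog"
print(predict((4.5, 6.5)))   # light, short-eared animal -> "cat"
```

The "learning" here is trivial (memorizing the ground truth), but the workflow is the same as in any supervised setup: known inputs and outputs go in, and the model labels data it has never seen.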

How are we sure?

To objectively measure the performance of supervised learning algorithms, it is paramount to measure the quality of predictions on a benchmark dataset separate from the one the algorithm was trained on. This is known as the train-test split procedure.

The train-test split procedure divides a dataset into two subsets. The first subset, referred to as the training dataset, is used to fit the model. The second subset, the test dataset, is not used to train the model; instead, its known outputs are compared against the trained algorithm's predictions to validate them.

The reason why we conduct a train-test split procedure is to estimate the performance of the machine learning algorithm on new, unseen data.

This is how the AI/ML model is expected to work: train it on available data with known inputs and outputs, then make predictions on new inputs where the output is not yet known. A common train-test split ratio is 70 percent training and 30 percent testing.
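A 70/30 split like the one described above can be sketched in a few lines. This is a generic illustration with toy data, not App Annie's production pipeline:

```python
import random

def train_test_split(dataset, test_fraction=0.3, seed=42):
    """Shuffle a dataset and split it into (training, test) subsets."""
    examples = list(dataset)
    random.Random(seed).shuffle(examples)  # shuffle before splitting to avoid ordering bias
    n_test = int(len(examples) * test_fraction)
    return examples[n_test:], examples[:n_test]  # 70% train, 30% test

data = [(i, i % 2) for i in range(10)]  # toy (input, output) examples
train, test = train_test_split(data)
print(len(train), len(test))  # 7 3
```

Shuffling before the split matters: if the data were sorted (say, by date or category), a naive slice would train and test on systematically different examples.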

Delivering Results - Right

To ensure that a one-time measurement is not just luck, we apply the train-test split repeatedly on different train-test partitions of the benchmark dataset. This is known as cross-validation, and it provides mathematical confidence in the AI/ML performance.
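One common way to do this is k-fold cross-validation: the dataset is partitioned into k folds, and each fold takes one turn as the test set while the remaining folds train the model. A minimal sketch, using toy data:

```python
def k_fold_splits(dataset, k=5):
    """Yield (train, test) partitions of the dataset, one per fold."""
    folds = [dataset[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test

data = list(range(10))  # toy examples
for train, test in k_fold_splits(data, k=5):
    print(len(train), len(test))  # each fold: 8 train, 2 test
```

Averaging the quality metric across all k folds gives a far more reliable performance estimate than any single split, because every example is used for testing exactly once.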

Cross-validation helps prevent overfitting, which can occur when AI/ML algorithm parameters are optimized to match the training dataset but generalize poorly to the test set.
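Overfitting is easy to demonstrate with an extreme case: a "model" that simply memorizes its training examples. On hypothetical data where the labels are pure noise, it scores perfectly on the training set yet no better than chance on unseen data, which is exactly the gap a held-out test set exposes:

```python
import random

rng = random.Random(0)
train = [(i, rng.choice([0, 1])) for i in range(100)]        # noisy toy labels
test  = [(i, rng.choice([0, 1])) for i in range(100, 200)]   # unseen inputs

memory = dict(train)  # the "model": a lookup table of training examples

def predict(x):
    return memory.get(x, 0)  # memorized answer, or a blind guess for unseen x

train_acc = sum(predict(x) == y for x, y in train) / len(train)
test_acc  = sum(predict(x) == y for x, y in test) / len(test)
print(train_acc)  # 1.0: perfect recall of the training set
print(test_acc)   # roughly 0.5: chance-level on unseen inputs
```

Measuring only training accuracy would rate this model perfect; the test set reveals it has learned nothing that generalizes.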

While many companies can run complex algorithms, App Annie delivers high-quality predictive and prescriptive insights without sacrificing privacy and security. Our powerful combination of mobile market data and first-party analytics through end-to-end data science will allow us to continue to lead the industry with best-in-class privacy, security, transparency and AI/ML best practices.

For more information, please email datascience@appannie.com or visit our Trust and Assurance page.

November 9, 2021
