Gemini 3.5 Flash Disappoints in Android Tests: Google's New Model Costs More but Is Outperformed by Older Versions

The first results from tests dedicated to Android development have exposed some unexpected difficulties for Gemini 3.5 Flash, Google's new artificial intelligence model. Despite its premium positioning and the company's high expectations, benchmarks show that the system fails to outperform older models on specific Android programming tasks.

The Android Bench ranking, used to assess how well AI models complete mobile development tasks, included Gemini 3.5 Flash for the first time. However, Google's new model did not manage to crack the top five. At the top of the leaderboard was OpenAI's GPT 5.5 with a score of 74, followed by GPT 5.4 and Gemini 3.1 Pro Preview, both scoring 72.4 points. Some recent Claude variants also scored higher than the Flash model.

Gemini 3.5 Flash scored 63.7, placing it sixth. The most surprising aspect, however, is the relationship between performance and cost. According to the benchmark data, the model used an average of about 355.9 total tokens per execution, for an estimated average cost of $147.1 per test. That makes it the most expensive model in the ranking, even though it scores lower than some alternatives.
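To put the score and cost figures side by side, the short Python sketch below computes the gap to the leaderboard leader and a rough cost per benchmark point. It uses only the numbers quoted above. The cost-per-point ratio is not a metric reported by Android Bench; it is an illustrative assumption, as are the function names.

```python
# Scores quoted in the article (Android Bench points).
SCORES = {
    "GPT 5.5": 74.0,
    "GPT 5.4": 72.4,
    "Gemini 3.1 Pro Preview": 72.4,
    "Gemini 3.5 Flash": 63.7,
}

# Estimated average cost per test quoted for Gemini 3.5 Flash, in US dollars.
FLASH_AVG_COST_USD = 147.1


def gap_to_leader(scores: dict[str, float], model: str) -> float:
    """Points separating `model` from the highest score in `scores`."""
    return round(max(scores.values()) - scores[model], 1)


def cost_per_point(avg_cost_usd: float, score: float) -> float:
    """Hypothetical efficiency ratio: dollars spent per benchmark point."""
    return avg_cost_usd / score


if __name__ == "__main__":
    flash = "Gemini 3.5 Flash"
    print(f"Gap to leader: {gap_to_leader(SCORES, flash)} points")
    ratio = cost_per_point(FLASH_AVG_COST_USD, SCORES[flash])
    print(f"Cost per point: ${ratio:.2f}")
```

On these figures, Gemini 3.5 Flash trails the leader by 10.3 points and costs roughly $2.31 per point scored. The same ratio cannot be computed for the other models here, since their per-test costs are not quoted.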

Gemini 3.5 Flash: Further Details on the Benchmark Difficulties

The result is surprising given the traditional philosophy of Google's Flash family, which is generally associated with speed and lower cost compared to more advanced models. At Google I/O 2026, the company presented Gemini 3.5 Flash as one of the most powerful versions in the series, highlighting improvements in programming, AI agents, and the management of complex processes.

Google also stated that the model outperforms Gemini 3.1 Pro in several internal tests and delivers much faster response times than competing systems. However, the results specific to Android development tell a different story.

The earlier Gemini 3.1 Pro Preview, in fact, performed better in this area at a much lower cost. This has raised doubts about how effective the new model really is for Android developers, and about Google's ability to deliver on the promises made during the presentation. It remains to be seen whether future updates will improve Gemini 3.5 Flash, or whether the upcoming Gemini 3.5 Pro will be the true generational leap.