Most non-technical founders evaluate agency portfolios the wrong way. They look at how polished the case study page looks, how many brand logos appear in the client list, and whether the design screenshots seem nice. None of that tells you anything about how the agency actually builds software.
The good news: the signals that separate strong agencies from mediocre ones are visible to anyone. You do not need to read a single line of code. You need to know what questions to ask and what answers to trust.
What non-technical signals separate strong portfolio work from weak?
A strong portfolio case study tells you what the product does, who uses it, what business problem it solved, and (this is the part most agencies skip) what it does not do and why.
Weak case studies read like marketing pages. They describe features. They include a quote from a happy client. They show screenshots. They say things like "a seamless experience" and "delivered on time." These phrases mean nothing, because every agency writes them about every project.
Strong case studies name their constraints. "We had eight weeks and a $12,000 budget, so we cut the recommendation engine from the first version and shipped a manual curation workflow instead." That sentence tells you this agency made a real decision under real pressure, which is exactly what building software requires. Constraints are uncomfortable to discuss and most agencies avoid them. An agency that discusses them openly is telling you they know what they are doing.
Look for concrete metrics. Not "increased user engagement": that means nothing. "Reduced checkout drop-off from 64% to 38% over the three months after launch" means the agency measured what happened after they shipped. Post-launch outcomes are the hardest metric to fake, and very few agencies include them.
Forrester research found that 73% of failed software projects trace back to scope and requirement problems in the first four weeks, not technical execution. An agency whose case studies describe how they handled scope is telling you they have solved the actual hard problem. An agency whose case studies describe only what they built has told you nothing useful.
How does asking about project constraints reveal real skill?
Pick one project from the portfolio. Read it. Then ask a single question: "What did you cut from this project, and why?"
An agency that cannot answer this has probably never had a real conversation about tradeoffs with a client. Software development is almost entirely about tradeoffs: between speed and thoroughness, between scope and budget, between what a user wants and what is practical to build in the time available. If an agency has never articulated those tradeoffs, they are either building the wrong things or not paying attention.
A second question worth asking: "What would have cost twice as much if you had built it differently?" This reveals whether the team thinks about how decisions compound. A $500 architecture choice in week one can become a $10,000 rebuild in month six. Agencies that have encountered this know it immediately. They will tell you about a specific moment. Agencies that have not encountered it will give you a vague answer about "best practices."
You can also ask about the live product directly: "Is this still live? Can I use it?" A portfolio full of products that no longer exist is a signal worth noting. Products shut down for many reasons; some of them have nothing to do with the agency. But if an agency consistently builds things that disappear, that pattern is worth asking about.
Timespade ships every project with infrastructure designed for real traffic from day one. Your app stays online at 99.99% uptime (less than one hour of downtime per year) because backup systems activate automatically if anything fails. That is the kind of structural decision that determines whether a product is still running two years later. Ask your prospective agency how they handle it.
Can AI-assisted code review tools help me vet agency quality?
Yes, and this is one of the most underused options available to non-technical founders in 2025.
If an agency is willing to share a code sample, a GitHub repository, or access to a staging environment, several tools will give you a quality read without requiring you to understand what the code actually says.
Upload a code sample to Claude or ChatGPT with a prompt like: "You are a senior software engineer. Review this code and tell me, in plain English, whether this looks like production-quality work. What concerns do you have? What would you praise?"
You will get a specific, readable assessment. You may not understand every technical detail it references, but you will understand phrases like "this has no error handling, so if the payment service goes down, the app will crash silently" or "this code duplicates the same logic in twelve different places, which means fixing a bug requires twelve separate changes." Those are business consequences you can act on, not technical jargon.
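If you would rather script this check than paste code into a chat window (or you have a technical friend willing to run it for you), a minimal sketch using the Anthropic Python SDK might look like the following. The file name and model alias are placeholders, not part of any agency's deliverable; pasting the same prompt into the chat interface works just as well.

```python
# A minimal sketch, assuming the Anthropic Python SDK (`pip install anthropic`)
# and an ANTHROPIC_API_KEY environment variable are set up.
# "sample_from_agency.py" and the model alias are placeholders.
from pathlib import Path

from anthropic import Anthropic

code_sample = Path("sample_from_agency.py").read_text()  # hypothetical file from the agency

prompt = (
    "You are a senior software engineer. Review this code and tell me, "
    "in plain English, whether this looks like production-quality work. "
    "What concerns do you have? What would you praise?\n\n"
    f"{code_sample}"
)

client = Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder: use whichever current model you have access to
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
print(response.content[0].text)  # the plain-English review
```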
PageSpeed Insights is free and requires no technical knowledge. Paste the URL of any live product in the agency's portfolio and read the score. A score below 70 usually means users are waiting more than two seconds for the page to load, and Google's own research shows 53% of mobile users leave a page that takes longer than three seconds. A score above 90 means the app loads before users notice a delay. This single test tells you more about how the agency builds than any testimonial on their website.
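The same check can be scripted if you want to run it across every live product in a portfolio. This is a minimal sketch, assuming Google's public PageSpeed Insights v5 API; the product URL is a placeholder.

```python
# A minimal sketch, assuming Google's public PageSpeed Insights v5 API
# (an API key is optional for occasional checks). The URL is a placeholder.
import json
import urllib.parse
import urllib.request

page_url = "https://example-portfolio-product.com"  # placeholder: a live product from the portfolio
api = (
    "https://www.googleapis.com/pagespeedonline/v5/runPagespeed?"
    + urllib.parse.urlencode({"url": page_url, "strategy": "mobile"})
)

with urllib.request.urlopen(api) as resp:
    data = json.load(resp)

# Lighthouse reports performance as 0-1; multiply by 100 for the familiar score.
score = data["lighthouseResult"]["categories"]["performance"]["score"] * 100
print(f"Mobile performance score: {score:.0f}")
```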
Timespade products consistently hit 90+ on PageSpeed. That is not an aesthetic choice; it is a structural one that determines whether your users stay or leave and whether Google ranks your product above competitors.
WebPageTest.org gives you a waterfall view of how a page loads. You do not need to understand the waterfall to notice whether the loading time is 1.2 seconds or 7.4 seconds. That number is meaningful on its own.
What should I look for in a live demo versus a case study?
Case studies are written after the fact. They are edited, curated, and designed to present the project in the best possible light. Live demos, products you can actually use, are unedited.
When you test a live product from an agency's portfolio, you are testing under real conditions. Open it on your phone. Try the signup flow. Click something that should not work and see what happens. This is the fastest way to find out whether the agency handles edge cases or ignores them.
Specifically, notice what happens when you do something wrong. Enter a fake email address in a form. Leave a required field blank and submit. Try to skip a step in a multi-step flow. Bad error handling is the most common sign of an agency that rushed. A production-quality app responds to your mistake with a clear, helpful message. A sloppy app shows nothing, crashes, or, worst of all, silently accepts invalid data.
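You can do all of this by hand in a browser. If you want to repeat the same checks across several portfolio products, or hand them to a technical friend, a browser automation tool such as Playwright can script them. This is a rough sketch only; the URL and form selectors are placeholders and will differ for every real signup form.

```python
# A minimal sketch, assuming Playwright for Python (`pip install playwright`,
# then `playwright install chromium`). The URL and selectors are placeholders.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example-portfolio-product.com/signup")  # placeholder URL

    # Deliberately submit an invalid email and leave every other field blank.
    page.fill("input[type='email']", "not-an-email")
    page.click("button[type='submit']")

    # A production-quality app should now show a clear validation message.
    page.screenshot(path="signup_error_state.png")  # capture what the user would see
    browser.close()
```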
A Nielsen Norman Group study found that poor error messages are the single most common usability failure in enterprise software, appearing in 78% of tested applications. This is entirely fixable. Agencies that fix it are paying attention. Agencies that do not are cutting corners.
Also notice how quickly the first screen loads after you click a link inside the app, not just the homepage. Homepages are often hand-optimized to perform well. Interior pages reveal the agency's actual standard. An app that loads fast on the homepage but slowly on the product listing page is showing you where the care stopped.
For a case study, the question to ask after reading is: "What do I still not know?" If you finish reading and you do not know the budget, the timeline, what went wrong, what changed from the original plan, or what the product does now, those are gaps. A case study that answers all of those is rare. An agency that writes them is treating you like someone who asks real questions. That matters.
| Signal | Strong | Weak |
|---|---|---|
| Case study metrics | Post-launch outcomes with specific numbers | "Delivered on time" or vague engagement increases |
| Constraint discussion | Names what was cut and why | Describes only what was built |
| Live product loading | Under 2 seconds on mobile | Over 3 seconds, or product no longer exists |
| Error handling | Clear, helpful messages on bad inputs | Silent failures or crashes |
| Agency response to constraint questions | Specific story with a named tradeoff | "We always deliver the full scope" |
What do portfolio red flags actually look like in practice?
Three patterns appear consistently in portfolios from agencies that will cost you more than they should.
One common red flag is a visual-only portfolio. Every image looks beautiful. No metrics, no timelines, no discussion of what was hard. Beautiful design is easy to produce and easy to mistake for a well-built product. An app can look perfect and crash when a second user signs up simultaneously. Visual polish is table stakes; it tells you nothing about engineering quality.
Another pattern: a long client list with short case studies. Twenty logos and three sentences per project is a surface-area play, not a depth signal. Compare it to a portfolio with eight detailed case studies that include budgets, timelines, constraints, and post-launch data. The second portfolio contains more useful information about one project than the first contains about all twenty.
The third pattern is testimonials without specifics. "They were a pleasure to work with and delivered a great product" appears on the portfolio of nearly every agency that has ever existed, including ones that have burned clients badly. A testimonial that mentions a specific outcome ("our app loads four times faster than the version our previous agency built") is a different kind of signal. That client knows what they got.
| What You Are Evaluating | What To Look For | What To Avoid |
|---|---|---|
| Case study depth | Budget, timeline, what changed, post-launch metrics | Screenshots + quote + features list |
| Live products | Fast load time, clean error handling, app still running | Slow, broken flows, or dead links |
| Constraint response | Specific tradeoff story when asked | "We always deliver full scope" |
| Testimonials | Named outcome with a number | Generic praise with no specifics |
| Pricing transparency | Clear ranges tied to scope | "It depends" with no anchor number |
AI-native agencies like Timespade charge $8,000–$12,000 for a production-ready MVP. Western agencies charge $35,000–$50,000 for the same scope. That 4x gap is not a quality difference; it is the legacy tax from agencies that have not updated their workflows since 2023. The signals above tell you whether you are paying for engineering quality or for office rent and management overhead.
If you want to apply this evaluation to a real project, book a free discovery call. You will get wireframes within 24 hours and a clear scope document before any commitment.
