Resume Vibe Check
I've put an app online that checks your resume with a GPT-wrapper simulator.
Motivation
Modern job application systems are a black box. Your resume goes in, and nothing comes out. Rumor has it they're using AI to check your resume. But how?
Theory
Training and deployment of custom models means capital expense. Nobody's going to buy an H200 for their B2B SaaS when OpenAI's offering discount inference, all subsidized by venture capital and Microsoft.
Or maybe they've simply copy+pasted your resume into ChatGPT.
Either way, it's going to an OpenAI model, so its opinion matters.
Artificial Intelligence?
For less technical readers, suffice it to say "it just says stuff." It picks words according to its conditioning. There's a lottery element: when you watch it type out words, what it's doing at every small step is preparing a little menu of its top choices, then rolling dice.
This can be very useful, but it's important not to respect it.
We don't want to know what it thinks. We want its gut feel. Its impulsive answer. Hence the name: Resume Vibe Check.
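That "menu, then dice" loop can be sketched in a few lines. This is a toy illustration of weighted sampling, not any real model's internals; the menu and its probabilities are made up.

```python
import random

def pick_next_word(menu: dict[str, float]) -> str:
    """menu: {word: probability} -- the model's top choices at this step."""
    words = list(menu)
    weights = [menu[w] for w in words]
    # random.choices rolls the dice according to the weights
    return random.choices(words, weights=weights, k=1)[0]

# At one step the menu might look like this (invented numbers):
menu = {"excellent": 0.55, "good": 0.30, "passable": 0.15}
word = pick_next_word(menu)  # usually "excellent", but not always
```

Run it twice and you may get two different words. That's the lottery element.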
Concept
- Put in your resume.
- It grabs similar resumes from its library.
- Ask a basic and common AI to compare your resume to the others.
- For every "miss", write a little report.
- (Upsell to my newsletter at the end.)
Building
Starter Library
I need some pool of resumes to start, so I found an off-the-shelf dataset of about two thousand anonymized PDF resumes.
Digitization
The input has to be PDF, because that's what people ask for. But the LLM won't digest PDF; it expects the text scraped out. There are many sophisticated OCR solutions, AI solutions and so forth, and some ATS might even have them. But with enough inbound, some solutions will get away with ignoring difficult inputs.
Some tools will be lazy. So we're going for that lowest common denominator. I used pymupdf -- hold the OCR.
If you're building a real ingest pipeline, don't stop there. Don't give people a magic black box that, internally, doesn't work. Have some pride.
Personal Information
Each document goes through redaction. Unwanted information is replaced with little unicode square blocks. (Later, we can warn the LLM about the blocks.)
You can't just hand raw inbound scrapes off to an API:
- The LLM cannot be relied upon to mind details.
- You've already sent PII, which you're responsible for, to strangers.
- All inputs translate directly to your bill.
Any line with enough digits to accommodate a phone number gets redacted.
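A sketch of that line-level rule. The digit threshold and the eight-block replacement width are my assumptions, not the app's actual settings:

```python
import re

BLOCK = "\u25a0"  # the little unicode square used for redactions

def redact_phone_lines(text: str, min_digits: int = 7) -> str:
    """Blank out any line carrying enough digits to hide a phone number.
    min_digits=7 is an assumed threshold."""
    out = []
    for line in text.splitlines():
        if len(re.findall(r"\d", line)) >= min_digits:
            out.append(BLOCK * 8)  # replace the whole line, keep its slot
        else:
            out.append(line)
    return "\n".join(out)
```

Redacting the whole line is deliberately blunt: better to lose a bullet point than leak a number.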
Names are redacted. My first implementation does this in a potentially costly way: I break the text into tokens and run them past a LUT/cache that was initially seeded with non-name vocabulary lists. Unfamiliar tokens are first checked with simple heuristics. If we don't have strong evidence that a token isn't a name, it's passed to a small model, GPT-4.1 nano. Its answer goes in the cache. (This isn't free, but we'll come back to cost.)
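The token-by-token name filter, sketched. The seed vocabulary, the specific heuristics, and `ask_small_model()` are all hypothetical stand-ins; the real small-model call is stubbed out here:

```python
# Seed vocab: tokens we already know aren't names (assumed example set).
KNOWN_NOT_NAMES = {"python", "sql", "manager", "university"}
cache: dict[str, bool] = {}

def ask_small_model(token: str) -> bool:
    """Stub for the GPT-4.1 nano call; assume capitalized unknowns are names."""
    return True

def looks_like_name(token: str) -> bool:
    t = token.lower()
    if t in KNOWN_NOT_NAMES:
        return False
    if t in cache:
        return cache[t]
    # Cheap heuristics first: lowercase or non-alphabetic tokens
    # are strong evidence against a name.
    if not token[:1].isupper() or not token.isalpha():
        cache[t] = False
        return False
    # No strong evidence either way: pay for one small-model call,
    # then never pay for this token again.
    verdict = ask_small_model(token)
    cache[t] = verdict
    return verdict
```

The cache is what keeps the cost sane: each unfamiliar token costs one nano-model call ever, across all resumes.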
E-mails and domains are easy, but incoming resumes had surprising forms: domain names without a protocol, e.g. linkedin.com/derp, or shortlinks without a .com. That's easy to read, and easy to fix, but if you're doing this yourself, remember the dumbest possible URL detector isn't enough for human-readable documents.
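A detector that also catches those protocol-less forms might look like this. The pattern is my assumption, not the app's actual regex, and it deliberately over-matches rather than under-matches:

```python
import re

# Catches "https://example.org", bare "linkedin.com/derp",
# and non-.com shortlinks like "bit.ly/x".
URLISH = re.compile(
    r"\b(?:https?://)?"             # optional protocol
    r"[a-z0-9-]+(?:\.[a-z0-9-]+)+"  # dotted host
    r"(?:/\S*)?",                   # optional path
    re.IGNORECASE,
)

def find_urlish(text: str) -> list[str]:
    return URLISH.findall(text)
```

For redaction, over-matching is the right failure mode: a mangled ordinary word is annoying, a leaked profile link is a privacy problem.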
Having redacted text, the PDF is discarded.
Global Trial
In general, it is best to assume that the network is filled with malevolent entities that will send in packets designed to have the worst possible effect.
Not just the meat of it: the PII filter itself carries a non-zero inference cost. What if some rascal drops in a million PDFs?
The surest way to prevent abuse is to charge. You can't run unlimited inference on a public site. But I don't want to charge up front. It's a toy. I want to share it.
Enter the "global trial."

It's a fool's errand to tell who's who on the open internet. So, I'm not going to restrict it on a per-user basis. But I can measure how much work the computer has done in total, and put an easy stop there.
If it hits the limit, I can check what's going on. If it never hits the limit, then I never have to touch it. Easy!
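The whole mechanism is a shared spend meter. A minimal sketch, with the limit, the unit, and the class name all invented:

```python
import threading

class GlobalTrial:
    """One budget for everybody: count total inference spend, not users."""

    def __init__(self, limit_usd: float):
        self.limit = limit_usd
        self.spent = 0.0
        self._lock = threading.Lock()

    def try_charge(self, cost_usd: float) -> bool:
        """Record a request's estimated cost; refuse once the pool is dry."""
        with self._lock:
            if self.spent + cost_usd > self.limit:
                return False
            self.spent += cost_usd
            return True

trial = GlobalTrial(limit_usd=25.0)  # assumed limit
```

Every inference-bearing request calls `try_charge()` first; a `False` means the trial is paused until a human looks at the logs.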
Comparison
I'm sending resumes two-by-two, redacted, identified in the prompt with a color. I've randomized the order to ward off order bias. I don't know that LLMs have an order bias, but maybe they do, and it's easy to shuffle.
I let the LLM one-shot the result. No CoT. Furthermore, I've restricted output with logit_bias, and read back the logprobs. This way, I don't need to worry about "structured output" and can even see its actual (not hallucinated) weighted preference. That weighted preference is counted into your total score. So, if it favored yours 70:30, your "grade" is 70%.
Is this a logical way to evaluate a resume? No. But it's the sort of figure that might go into an LLM-based sorting algorithm.
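Here's how the logprob readback turns into a grade. The color labels, the bias values, and renormalizing over just the two labels are my assumptions about the setup; the API call itself is elided:

```python
import math

# Upstream, the call would look something like (elided, assumed shape):
#   client.chat.completions.create(model=..., messages=...,
#       max_tokens=1, logprobs=True, top_logprobs=2,
#       logit_bias={RED_TOKEN_ID: 100, BLUE_TOKEN_ID: 100})
# leaving only the two color labels as plausible answers.

def preference(top_logprobs: dict[str, float], yours: str, theirs: str) -> float:
    """Renormalize the two labels' probabilities and return P(yours)."""
    p_yours = math.exp(top_logprobs[yours])
    p_theirs = math.exp(top_logprobs[theirs])
    return p_yours / (p_yours + p_theirs)

# e.g. the model put logprob -0.357 on "RED" (yours) and -1.204 on "BLUE":
grade = preference({"RED": -0.357, "BLUE": -1.204}, "RED", "BLUE")
# roughly a 70:30 split in RED's favor
```

Because the answer is a single biased token, there's no parsing step and no chance of the model narrating a preference it didn't actually have.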
Compared to What?
If I run this against every resume in the system, cost will go to the Moon. In theory, I'd like the top-K most similar resumes. It's reasonable to imagine people applying to a given job will resemble each other more than to those applying to other jobs.
So, I embed the whole lot with paraphrase-MiniLM-L12-v2, and run approximate nearest-neighbor lookup with hnswlib.
Now, as I found benchmarking deduplication, these methods aren't that good and frankly anything short of GPT-4-Base is throwing darts. But it's "too cheap to meter" and you gotta pick something!
What the AI sees
What, exactly, made it to the model? Again, not your PDF.

Here, you can see exactly what made it through the pipeline. Not what you expected? Rebuild your PDF until it's machine readable.
Feedback
This may be the most worthless part. You can't ask a GPT to explain itself. It just says stuff. But you can ask it to try, and maybe it'll even write something for you.

To focus efforts, the feedback is written with access to the competitor, on a contest-by-contest basis, only for contests which were "lost".
Real vs. Fake
I put in a checkbox asking permission to use the anonymized inputs to improve the app. After dropping a link on Hacker News, fifty people sent theirs in! I fed these back into the index and found my own CV's score dropped from 99% to just 47%.
Next Steps
I think it would be nice to fork out the "scrape" to a separate page, giving an easy way to see what happens to your PDF, without retaining information or involving an LLM.
Summary
I've dropped the app here, and for now you can use it :)