Technical SEO for small projects: what to fix before writing 100 articles

Technical SEO checklist for blogs and small projects: multilingual, canonical, sitemap, schema, and common mistakes with Astro.

Roger Bosch May 18, 2026

Updated: May 18, 2026

When I launched oshy.tech, I did what every developer who wants a technical blog does: I picked a modern stack (Astro), wrote a few articles, set up the deploy, and assumed Google would do the rest. Spoiler: Google didn’t do the rest. Months later I discovered I had indexed pages that shouldn’t exist, empty categories diluting the site’s authority, and a multilingual system that was generating duplicate content without me even noticing.

Technical SEO for small projects is different from SEO for large sites. You don’t need an SEO team or paid tools. But you do need to understand what signals Google is seeing when it crawls your site, and fix the structural problems before worrying about keywords and backlinks.

The technical checklist that actually matters

There are hundreds of SEO checklists on the internet. Most of them mix trivial things (“add a title tag”) with things irrelevant to small projects (“optimize crawl budget”). This is the checklist I’ve built after getting things wrong with oshy.tech, ordered by real impact:

1. Every URL should have a purpose

It seems obvious, but this is where small technical blogs fail the most. Every URL that Google indexes is a page that Google evaluates. If you have a category page with a single article, Google doesn’t see a thematic section; it sees a nearly empty page with one link.

Before creating categories or tags, ask yourself: does this page, on its own, offer value to someone who lands on it directly? If the answer is no, that page shouldn’t be indexed.

2. Canonical tags on every page

The canonical tag tells Google which is the “main” version of a page. It’s especially important if your blog has:

Versions with and without trailing slashes (/blog/post and /blog/post/)
URL parameters that generate variants (?page=1, ?sort=date)
Multilingual content where versions are translations

<link rel="canonical" href="https://oshy.tech/es/blog/spring-boot-kotlin-experiencia-real/" />

In Astro, you can generate the canonical dynamically in your base layout:

---
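// Absolute canonical URL: the configured `site` plus the current path (the query string is dropped)
const canonicalURL = new URL(Astro.url.pathname, Astro.site);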
---
<link rel="canonical" href={canonicalURL.href} />

The most common mistake is not having one. The second most common is a canonical that doesn’t match the URL actually being served, for example /blog/post when the page lives at /blog/post/. That confuses Google about which version is the real one.
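Both mistakes are easier to avoid if every canonical comes from one normalizing function. Below is a minimal TypeScript sketch. It assumes the site always uses trailing slashes and that query parameters and fragments never change the content. `canonicalFor` is a hypothetical helper, not an Astro API.

```typescript
// Hypothetical helper: builds a canonical URL from a path (or URL) and the site origin.
// Assumes every page is served with a trailing slash and that query parameters
// (?page=1, ?sort=date) and fragments never change the content.
function canonicalFor(pathOrUrl: string, site: string): string {
  const url = new URL(pathOrUrl, site);
  url.search = ""; // drop ?page=1, ?sort=date, etc.
  url.hash = ""; // fragments never identify a different page
  const lastSegment = url.pathname.split("/").pop() ?? "";
  // Add the trailing slash, except for files such as /rss.xml
  if (!url.pathname.endsWith("/") && !lastSegment.includes(".")) {
    url.pathname += "/";
  }
  return url.href;
}

console.log(canonicalFor("/blog/post?sort=date", "https://oshy.tech"));
// https://oshy.tech/blog/post/
```

The function is idempotent. Calling it on an already-canonical URL returns the same string, so it is safe to use for the canonical tag, the sitemap and hreflang alike.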

3. Correct and up-to-date sitemap

The sitemap should only include the pages you want Google to index. This sounds trivial, but Astro’s automatic sitemap generator (@astrojs/sitemap) includes all generated pages by default. If you have empty tag pages, categories with one article, or pagination pages, all of that ends up in the sitemap.

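// astro.config.mjs
import { defineConfig } from "astro/config";
import sitemap from "@astrojs/sitemap";

export default defineConfig({
  site: "https://oshy.tech",
  integrations: [
    sitemap({
      // `page` is the full URL of each generated page; return false to leave it out.
      // isEmptyTag and hasFewArticles are your own helpers (not part of Astro),
      // and since the filter is called synchronously, they must be synchronous too.
      filter: (page) => {
        // Exclude pages that add no value
        if (page.includes("/tags/") && isEmptyTag(page)) return false;
        if (page.includes("/category/") && hasFewArticles(page)) return false;
        return true;
      },
    }),
  ],
});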

In practice, the cleanest approach is to not generate those pages in the first place, but if they already exist, filtering them from the sitemap is the first step.
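If you’d rather not hand-write those helpers, one option is to compute article counts before the build and generate the filter from them. This sketch makes three assumptions: listing URLs contain `/tags/` or `/category/`, a `Map` from listing URL to article count already exists, and the threshold is 3. `makeSitemapFilter` and the count map are hypothetical; they are not part of `@astrojs/sitemap`.

```typescript
// Hypothetical factory for the @astrojs/sitemap `filter` option.
// `articleCounts` maps a tag/category listing URL to the number of articles it lists
// (assumed to be computed before the build, e.g. by a script over your content).
function makeSitemapFilter(
  articleCounts: Map<string, number>,
  minArticles = 3,
): (page: string) => boolean {
  return (page: string): boolean => {
    const isListing = page.includes("/tags/") || page.includes("/category/");
    if (!isListing) return true; // articles and other pages stay in the sitemap
    const count = articleCounts.get(page) ?? 0; // unknown listings count as empty
    return count >= minArticles;
  };
}

const sitemapFilter = makeSitemapFilter(
  new Map([
    ["https://oshy.tech/es/tags/kotlin/", 5],
    ["https://oshy.tech/es/tags/coroutines/", 1],
  ]),
);
console.log(sitemapFilter("https://oshy.tech/es/tags/coroutines/")); // false
console.log(sitemapFilter("https://oshy.tech/es/blog/mi-articulo/")); // true
```

The tag URLs above are illustrative. The same count map could also drive the noindex decision, which keeps the sitemap and the robots meta tags in agreement.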

4. Coherent robots.txt

The robots.txt and the sitemap should tell the same story. If you block a section in robots.txt but include it in the sitemap, you’re sending contradictory signals.

User-agent: *
Allow: /

Sitemap: https://oshy.tech/sitemap-index.xml

If there are sections you don’t want indexed, use noindex in the meta tag instead of blocking them in robots.txt. The difference is important: robots.txt prevents crawling, not indexing. If someone links to a page blocked by robots.txt, Google can still index its URL without crawling it and show a result with no description. For the same reason, a noindex page must stay crawlable: if robots.txt blocks it, Google never sees the tag.

<meta name="robots" content="noindex, follow" />
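To check that robots.txt and the sitemap tell the same story, you can compare the Disallow rules against the sitemap URLs. This is a deliberately simplified sketch. It only handles plain path-prefix Disallow lines in groups that apply to `User-agent: *`. It does not interpret wildcards, Allow precedence or agent-specific groups the way real crawlers do. `disallowedPrefixes`, `findBlockedSitemapUrls` and the `/drafts/` path are hypothetical.

```typescript
// Collects Disallow prefixes from groups that apply to `User-agent: *`.
// Wildcards (*, $) are not interpreted and Allow overrides are ignored.
function disallowedPrefixes(robotsTxt: string): string[] {
  const prefixes: string[] = [];
  let inWildcardGroup = false;
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split("\n")) {
    const line = rawLine.split("#")[0].trim(); // strip comments and \r
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group of rules
      inWildcardGroup = (lastWasAgent && inWildcardGroup) || value === "*";
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    if (field === "disallow" && inWildcardGroup && value !== "") {
      prefixes.push(value);
    }
  }
  return prefixes;
}

// Returns the sitemap URLs whose path is blocked by robots.txt: contradictory signals.
function findBlockedSitemapUrls(robotsTxt: string, sitemapUrls: string[]): string[] {
  const prefixes = disallowedPrefixes(robotsTxt);
  return sitemapUrls.filter((url) => {
    const path = new URL(url).pathname;
    return prefixes.some((prefix) => path.startsWith(prefix));
  });
}

const exampleRobots = `User-agent: *
Disallow: /drafts/

Sitemap: https://oshy.tech/sitemap-index.xml`;
console.log(
  findBlockedSitemapUrls(exampleRobots, [
    "https://oshy.tech/drafts/wip/",
    "https://oshy.tech/es/blog/mi-articulo/",
  ]),
); // only the /drafts/ URL is reported
```

With the permissive robots.txt shown above (`Allow: /` and no Disallow), the check returns an empty list.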

5. Basic performance

Google uses Core Web Vitals as a ranking factor. For a static site with Astro, this should be nearly perfect by default, but there are common mistakes:

Images without explicit dimensions (causes layout shift)
Web fonts that block rendering
Unnecessary JavaScript on pages that don’t need it

With Astro’s <Image> component, images are optimized automatically, and local images in src/ referenced from Markdown are processed too. But anything you put in public/ is served as-is, so images stored there lose that optimization.
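For the first point, a quick way to find offenders is to scan the built HTML for `<img>` tags without explicit dimensions. This is a rough, regex-based sketch: `imagesWithoutDimensions` is hypothetical, and it is a quick audit for static output, not a real HTML parser. It would miss tags whose attribute values contain `>`.

```typescript
// Rough audit sketch: lists <img> tags that lack a width or a height attribute.
// Regex-based, so it is only meant for quick checks of static build output.
function imagesWithoutDimensions(html: string): string[] {
  const imgTags = html.match(/<img\b[^>]*>/gi) ?? [];
  return imgTags.filter(
    // `\s` before the name avoids matching attributes like data-width
    (tag) => !/\swidth\s*=/i.test(tag) || !/\sheight\s*=/i.test(tag),
  );
}

console.log(imagesWithoutDimensions('<img src="/cover.png" alt="Cover">'));
// one entry: the <img> tag for /cover.png
```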

Multilingual: where things get really complicated

Multilingual support is probably the most impactful technical decision for a small blog’s SEO. And it’s where I’ve made the most mistakes with oshy.tech, which has content in Spanish, English, and Catalan.

Hreflang: the correct implementation

Hreflang tags tell Google that several pages are the same piece of content in different languages. A correct implementation needs:

Each version of a page to point to all versions, including itself
Absolute URLs
An x-default pointing to the fallback version for users whose language matches none of the others (recommended rather than strictly required)

<!-- On the Spanish page -->
<link rel="alternate" hreflang="es" href="https://oshy.tech/es/blog/mi-articulo/" />
<link rel="alternate" hreflang="en" href="https://oshy.tech/en/blog/my-article/" />
<link rel="alternate" hreflang="ca" href="https://oshy.tech/ca/blog/el-meu-article/" />
<link rel="alternate" hreflang="x-default" href="https://oshy.tech/es/blog/mi-articulo/" />

In Astro, generating this requires a system that connects the versions of each article across languages. In oshy.tech I use a mappingKey in each article’s frontmatter: all articles that are the same piece of content share the same mappingKey, and the layout looks up the alternate versions to generate the hreflang tags.
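That lookup can be sketched roughly as follows. The sketch assumes each article exposes a `lang`, a `mappingKey` and a path, and it uses Spanish for x-default, as in the example above. The `Article` shape and the `hreflangLinks` name are hypothetical; this is not the actual oshy.tech code.

```typescript
// Assumed article shape; in Astro this would come from the content collection.
interface Article {
  lang: string; // "es" | "en" | "ca"
  mappingKey: string; // shared by every version of the same piece
  path: string; // e.g. "/en/blog/my-article/"
}

interface HreflangLink {
  hreflang: string;
  href: string;
}

// Builds the full set of hreflang links for one article:
// every version (including the current one) plus x-default.
function hreflangLinks(
  current: Article,
  all: Article[],
  site: string,
  defaultLang = "es",
): HreflangLink[] {
  const versions = all.filter((a) => a.mappingKey === current.mappingKey);
  const links: HreflangLink[] = versions.map((a) => ({
    hreflang: a.lang,
    href: new URL(a.path, site).href, // hreflang URLs must be absolute
  }));
  const fallback = versions.find((a) => a.lang === defaultLang);
  if (fallback) {
    links.push({ hreflang: "x-default", href: new URL(fallback.path, site).href });
  }
  return links;
}

const articles: Article[] = [
  { lang: "es", mappingKey: "my-article", path: "/es/blog/mi-articulo/" },
  { lang: "en", mappingKey: "my-article", path: "/en/blog/my-article/" },
  { lang: "ca", mappingKey: "my-article", path: "/ca/blog/el-meu-article/" },
];
for (const link of hreflangLinks(articles[1], articles, "https://oshy.tech")) {
  console.log(`<link rel="alternate" hreflang="${link.hreflang}" href="${link.href}" />`);
}
// prints four <link> tags: es, en, ca and x-default
```

Each version runs the same function over the same group, so every page emits an identical set of tags. That makes them reciprocal by construction.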

Common hreflang mistakes

Non-reciprocal hreflang. If the Spanish version points to the English one, but the English one doesn’t point back to the Spanish, Google ignores both hreflang tags. They must be bidirectional.

Incorrect URLs. If an hreflang tag points to a URL that redirects, Google may ignore it. Use the final URL, with no redirects.

Mixing canonical and hreflang. Each version’s canonical must point to itself. If the English version’s canonical points to the Spanish version, you’re telling Google that the English version is a duplicate of the Spanish one.

Hreflang is a suggestion, not a directive. Google can ignore it if it finds contradictory signals. The consistency between canonical, hreflang, and actual content is what makes it work.
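A small audit can catch these mistakes before Google does. It assumes you have already extracted each crawled page’s canonical and hreflang map, for example from the built HTML. `PageSignals` and `auditHreflang` are hypothetical. An hreflang target missing from the crawled set is flagged, which is also how a redirecting target shows up if you only collect final URLs.

```typescript
// Hypothetical audit data: what each page declares in its <head>.
interface PageSignals {
  canonical: string;
  hreflang: Record<string, string>; // language code -> absolute URL
}

// Returns human-readable problems for non-reciprocal hreflang,
// unknown targets and canonicals that don't point to the page itself.
function auditHreflang(pages: Map<string, PageSignals>): string[] {
  const problems: string[] = [];
  pages.forEach((signals, url) => {
    if (signals.canonical !== url) {
      problems.push(`${url}: canonical points to ${signals.canonical}`);
    }
    if (!Object.values(signals.hreflang).includes(url)) {
      problems.push(`${url}: hreflang set does not include the page itself`);
    }
    for (const [lang, target] of Object.entries(signals.hreflang)) {
      if (lang === "x-default" || target === url) continue;
      const other = pages.get(target);
      if (!other) {
        problems.push(`${url}: hreflang="${lang}" points to unknown URL ${target}`);
      } else if (!Object.values(other.hreflang).includes(url)) {
        problems.push(`${url}: ${target} does not link back`);
      }
    }
  });
  return problems;
}
```

An empty result means the canonical and hreflang signals are at least consistent with each other. It says nothing about whether Google will choose to use them.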

The translated content problem

A direct translation of an article doesn’t add new information. If your Spanish article is exactly the same as the English one, just translated, Google may treat them as closely related versions of the same content. This isn’t necessarily bad if hreflang is properly implemented, but on small sites with little authority, it can dilute signals.

My current approach: I write content in Spanish (my primary language) and the translations are adaptations, not literal translations. I change examples, adjust cultural references, and sometimes reorganize sections. The English article isn’t a copy; it’s the same idea adapted for a different audience.

Categories with few articles: a real problem

This is the mistake that had the most impact on oshy.tech. I created categories thinking about the future: “I’m going to write a lot about Kotlin,” “I’ll surely have several DevOps articles.” The problem is that Google doesn’t evaluate your editorial plan. It evaluates what exists right now.

A category with a single article generates a listing page that has:

A title (the category)
A link to one article
Possibly some descriptive text

For Google, that’s thin content: a page with very little standalone value. If you have ten such categories, you have ten low-value pages. Those pages get indexed and can lower the overall perceived quality of the site.

The solution has two parts:

Short-term: mark categories that have fewer than a certain number of articles as noindex.

---
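// `category` is assumed to come from the page props; getArticlesForCategory is your own content helper
const { category } = Astro.props;
const articles = await getArticlesForCategory(category);
// Index the category page only if it lists at least 3 articles
const shouldIndex = articles.length >= 3;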
---
{!shouldIndex && <meta name="robots" content="noindex, follow" />}

Long-term: don’t create categories until you have at least 3-4 articles for them. It’s better to have few categories with content than many empty categories. Consolidate related topics into a broader category.

Strategic noindex: which pages shouldn’t be indexed

Not every page on a blog should be in Google’s index. For a small site, these pages are usually candidates for noindex:

Tag pages with few articles. If the tag “coroutines” has a single article, that page adds nothing that the article itself doesn’t already provide.
Pagination pages. Page 2, 3, etc. of a listing rarely add search value.
Categories with fewer than 3 articles. As I explained above.
Date archive pages. If you have them, they probably add nothing.
Standard legal pages. Privacy policy, legal notice. Necessary, but they don’t need to rank.

The noindex tag doesn’t remove the page from your site. Users can still navigate to it. It only tells Google not to include it in search results.
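The list above can be encoded as one function that returns the value for the robots meta tag. This sketch rests on assumptions. Pagination lives at `/page/N/`, date archives at `/YYYY/` or `/YYYY/MM/`, and legal pages at `/privacy/`, `/legal/` or `/cookies/`. Tag and category pages share a single threshold of 3 articles. These patterns and the `robotsFor` name are hypothetical; adapt them to your own routes.

```typescript
type RobotsValue = "index, follow" | "noindex, follow";

// Hypothetical URL patterns; adjust them to your routing.
const PAGINATION = /\/page\/\d+\/?$/; // e.g. /blog/page/2/
const DATE_ARCHIVE = /\/\d{4}(\/\d{2})?\/?$/; // e.g. /2026/ or /2026/05/
const LEGAL = /\/(privacy|legal|cookies)\/?$/; // e.g. /es/privacy/

function robotsFor(
  path: string,
  listingCounts: Map<string, number>, // tag/category path -> number of articles
  minArticles = 3,
): RobotsValue {
  // Pages that rarely add search value: keep them out, but let links be followed
  if (PAGINATION.test(path) || DATE_ARCHIVE.test(path) || LEGAL.test(path)) {
    return "noindex, follow";
  }
  // Tag and category listings are indexed only once they have enough articles
  if (/\/(tags|category)\//.test(path)) {
    const count = listingCounts.get(path) ?? 0;
    return count >= minArticles ? "index, follow" : "noindex, follow";
  }
  return "index, follow";
}
```

Returning `noindex, follow` rather than `noindex, nofollow` keeps the links on those pages crawlable, in line with the meta tag used earlier in this article.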

Schema Article: structured data that works

Structured data (Schema.org) isn’t a direct ranking factor, but it helps Google understand your content and can generate rich snippets that improve CTR.

For a technical blog, the Article schema is the most relevant:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "Spring Boot con Kotlin: lo bueno, lo incómodo y lo que nadie te cuenta",
  "description": "Experiencia real con Spring Boot y Kotlin...",
  "author": {
    "@type": "Person",
    "name": "Roger Bosch",
    "url": "https://oshy.tech/about"
  },
  "publisher": {
    "@type": "Organization",
    "name": "oshy.tech",
    "url": "https://oshy.tech"
  },
  "datePublished": "2026-05-18",
  "dateModified": "2026-05-18",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://oshy.tech/es/blog/spring-boot-kotlin-experiencia-real/"
  },
  "inLanguage": "es",
  "keywords": ["Spring Boot", "Kotlin", "JPA", "backend"]
}
</script>

In Astro, this can be generated dynamically from each article’s frontmatter:

---
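// `lang` is assumed to be passed in from the article (e.g., its frontmatter or route)
const { title, description, pubDate, updatedDate, tags, lang } = Astro.props;
const schema = {
  "@context": "https://schema.org",
  "@type": "TechArticle",
  headline: title,
  description: description,
  // If these are Date objects, JSON.stringify emits them as ISO 8601 strings
  datePublished: pubDate,
  dateModified: updatedDate || pubDate,
  author: {
    "@type": "Person",
    name: "Roger Bosch",
    url: "https://oshy.tech/about",
  },
  inLanguage: lang, // per-article language, so each translation declares its own
  keywords: tags,
};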
---
<script type="application/ld+json" set:html={JSON.stringify(schema)} />

Some important points:

Use TechArticle instead of the generic Article if the content is technical. It’s a schema.org subtype of Article, so it is still valid article markup.
dateModified should be updated when you change the actual content, not every time you deploy.
The author must be an identifiable person or entity, not generic.
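The dynamic Astro snippet above omits publisher and mainEntityOfPage, which the static example includes. Here is a minimal sketch of a helper that builds the full object. It assumes the frontmatter provides `pubDate` as a Date, an optional `updatedDate`, a per-article `lang` and the page’s canonical URL. `ArticleMeta`, `isoDate` and `techArticleSchema` are hypothetical names.

```typescript
// Assumed frontmatter-like input; field names follow the Astro example.
interface ArticleMeta {
  title: string;
  description: string;
  pubDate: Date;
  updatedDate?: Date; // set only when the content actually changed
  tags: string[];
  lang: string; // e.g. "es", "en", "ca"
  url: string; // absolute canonical URL of this language version
}

// Formats a Date as YYYY-MM-DD (UTC), an ISO 8601 date.
function isoDate(d: Date): string {
  return d.toISOString().slice(0, 10);
}

function techArticleSchema(meta: ArticleMeta): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@type": "TechArticle",
    headline: meta.title,
    description: meta.description,
    datePublished: isoDate(meta.pubDate),
    // Falls back to the publication date when the content was never updated
    dateModified: isoDate(meta.updatedDate ?? meta.pubDate),
    author: { "@type": "Person", name: "Roger Bosch", url: "https://oshy.tech/about" },
    publisher: { "@type": "Organization", name: "oshy.tech", url: "https://oshy.tech" },
    mainEntityOfPage: { "@type": "WebPage", "@id": meta.url },
    inLanguage: meta.lang,
    keywords: meta.tags,
  };
}
```

In the layout, the result would go through `JSON.stringify` into the `application/ld+json` script, exactly as in the Astro example.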

Real experience with Astro and these issues

Astro is an excellent framework for static blogs. But SEO doesn’t come solved “out of the box.” These are the things I had to solve manually on oshy.tech:

Trailing slashes. Astro can generate /blog/post or /blog/post/. If you don’t pick one and enforce it, you’ll end up with both versions and Google may index them both. In astro.config.mjs:

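import { defineConfig } from "astro/config";

export default defineConfig({
  // Every URL is served and linked with a trailing slash: /blog/post/
  trailingSlash: "always",
});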

Sitemap with filtering. Astro’s sitemap plugin includes everything by default. I needed to filter out pages that shouldn’t be indexed.

Dynamic meta tags. Every page needs a correct title, description, canonical, and Open Graph tags. In Astro it’s easy with layouts, but you have to do it yourself. It doesn’t come for free.

Dynamic hreflang. With a multilingual blog, generating hreflang correctly requires an article mapping system. Astro doesn’t have this out of the box; I built it with the mappingKey I mentioned earlier.

RSS feed per language. A single RSS feed for all languages is confusing. I generated separate feeds per language, each with the correct URLs.
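Splitting the feed by language mostly means grouping articles before they reach the feed generator. Below is a minimal sketch, assuming each article carries a `lang`, a title, a path and a publication date. Each resulting list could then feed, for example, one `@astrojs/rss` endpoint per language. That wiring and the `feedsByLanguage` name are assumptions, not the oshy.tech implementation.

```typescript
interface FeedArticle {
  title: string;
  lang: string;
  path: string;
  pubDate: Date;
}

interface FeedItem {
  title: string;
  link: string; // absolute URL of this language version
  pubDate: Date;
}

// Groups articles into one feed per language, newest first, with absolute links.
function feedsByLanguage(articles: FeedArticle[], site: string): Map<string, FeedItem[]> {
  const feeds = new Map<string, FeedItem[]>();
  for (const article of articles) {
    const items = feeds.get(article.lang) ?? [];
    items.push({
      title: article.title,
      link: new URL(article.path, site).href,
      pubDate: article.pubDate,
    });
    feeds.set(article.lang, items);
  }
  feeds.forEach((items) => items.sort((a, b) => b.pubDate.getTime() - a.pubDate.getTime()));
  return feeds;
}
```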

What I would have done differently

If I were starting oshy.tech today, with what I know now:

I would start with 2-3 categories at most, not eight. I’d only add categories once I had 3-4 articles for each one.
I would implement dynamic noindex from day one for any listing page with fewer than 3 items.
I would configure multilingual correctly before publishing content. Migrating hreflang after the fact is painful.
I wouldn’t generate tag pages until I had enough volume. Tags are useful for internal navigation, but tag pages on small sites are thin content.
I would validate with Google Search Console every week for the first few months. The console tells you what Google is indexing and what problems it finds. That information is worth more than any checklist.

The best time to fix technical SEO is before publishing content. The second best time is now. Every article you publish on a broken technical foundation is effort that yields less than it should.

Tags: #seo #technical #astro #multilingual #canonical #sitemap #schema #indexing #hreflang
