Technical Documentation

The MultiLipi Architecture

A Technical Deep Dive Into Our Dual-Layer Infrastructure

Overview

Traditional translation plugins operate on the "Visual Layer"—swapping text strings in the browser. While this satisfies human readers, it creates a chaotic environment for search crawlers and AI agents.

MultiLipi is the first platform to engineer a Dual-Layer Architecture:

The HTML Layer

Fully rendered, localized pages for humans and Googlebot.

The Data Layer

A parallel, structured infrastructure (Markdown + JSON-LD) optimized specifically for Large Language Models (LLMs).

Below is the step-by-step documentation of how our 10-Step Pipeline transforms a static URL into a global, AI-ready entity.

PHASE 1

Infrastructure Setup

The Foundation

We establish the physical routing and security layer before touching a single word of content.

Step 1: Intelligent Provisioning & SSL

Upon connection, our system fingerprints your origin CMS (Shopify, Webflow, WordPress, or Custom Stack) to determine the optimal injection method.

  • Action: We instantly provision dedicated SSL certificates for your localized endpoints.
  • Mechanism: A TLS 1.3 handshake ensures that data passing between your origin server and our edge nodes is encrypted, with negligible latency added to the Time to First Byte (TTFB).

Step 2: URL Architecture Mapping

We support three distinct routing architectures to match your SEO strategy.

Option A: Subdomains (Fastest Deployment)

Structure: es.example.com, fr.example.com

Best for: Large enterprise sites needing separation of concerns.

Option B: Subdirectories (Maximum SEO Authority)

Structure: example.com/es/, example.com/fr/

Mechanism: We provide a lightweight Reverse Proxy script (Cloudflare Worker or Nginx config) that routes /es/ traffic to our edge servers while keeping the domain authority unified.
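
For illustration, a minimal Nginx rule for the subdirectory setup might look like the following sketch; the edge hostname is a placeholder, not an actual MultiLipi endpoint, and the real script is supplied during onboarding.

location /es/ {
    # Forward localized traffic to the localization edge (hostname is illustrative)
    proxy_pass https://edge.multilipi.example;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}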

Option C: ccTLDs (Local Dominance)

Structure: example.es, example.fr

Mechanism: Advanced CNAME mapping allows you to point custom country domains to our localization engine.

PHASE 2

Content Processing

The Neural Layer

We separate "Content" from "Code" to ensure perfect localization without breaking UI functionality.

Step 3: Deep Ingestion & Slug Localization

Our crawler ingests your original HTML and builds a dynamic content map.

  • The Code Separation: We parse the DOM to identify translatable text nodes while "locking" HTML attributes, scripts, and classes. This ensures your site layout never breaks, even in Right-to-Left (RTL) languages (see the snippet after this list).
  • Slug Translation: Unlike standard proxies, we translate the URL path itself.

    Original: example.com/products/red-running-shoes
    Localized: es.example.com/productos/zapatillas-rojas
    Impact: Increases Click-Through Rate (CTR) in local search results by matching user intent.
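
As an illustration of the code separation described above, only the text node of an element is rewritten; attributes, classes, and event handlers stay locked (the markup below is a hypothetical example):

<!-- Original -->
<button class="btn-primary" onclick="addToCart()">Add to cart</button>

<!-- Localized (Spanish): only the text child changes -->
<button class="btn-primary" onclick="addToCart()">Añadir al carrito</button>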

Step 4: The "Spiderweb" Internal Linking

To prevent "orphan pages" (a common issue where localized pages exist but aren't linked), we inject a dynamic routing block.

  • Action: A <footer> or hidden navigation block is appended to the DOM (see the sketch after this list).
  • Content:

    1. Language Switchers: Direct links to the EN, FR, DE versions of the current page.
    2. Cross-Links: Links to other relevant pages within the same language bucket.
  • Why: This ensures bots can crawl continuously through your entire localized network without hitting dead ends.
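
A simplified sketch of such an injected block (the class name and URLs are placeholders):

<footer class="language-nav">
  <a href="https://example.com/page" hreflang="en">English</a>
  <a href="https://fr.example.com/page" hreflang="fr">Français</a>
  <a href="https://de.example.com/seite" hreflang="de">Deutsch</a>
</footer>
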
PHASE 3

The SEO Layer

Google Compliance

Strict adherence to Google's engineering guidelines to prevent duplicate content penalties.

Step 5: Tag Injection (Hreflang & Canonical)

We automatically modify the <head> of every served page to strictly define its relationship to the network.

Hreflang Maps: We inject a full x-default and localized map for every page variation.

<link rel="alternate" hreflang="x-default" href="https://example.com/page" />
<link rel="alternate" hreflang="en" href="https://example.com/page" />
<link rel="alternate" hreflang="es" href="https://es.example.com/pagina" />

Self-Referencing Canonicals: The Spanish page points to itself as the canonical source, ensuring Google indexes it as a unique asset, not a duplicate of the English page.
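
For the Spanish variation in the example above, the injected canonical therefore reads:

<link rel="canonical" href="https://es.example.com/pagina" />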

PHASE 4

The GEO Layer

AI Infrastructure

The "Secret Sauce": this phase builds the parallel web for AI agents (ChatGPT, Claude, Gemini).

Step 6: Identity & Schema Injection

We transform your site from "Strings" (text) to "Things" (Entities).

  • Action: The system generates a JSON-LD script based on your Identity Graph (see the sketch after this list).
  • Contextual Logic:

    • Global: Injects Organization Schema (Logo, Socials, Founder) on all pages.
    • Page-Level: Auto-detects content types (e.g., /blog/ triggers Article schema; /product/ triggers Product schema with price and stock).
  • Result: Robots recognize your brand as a well-defined entity, which is crucial for appearing in Knowledge Panels.
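
A simplified sketch of the global Organization block (names, URLs, and fields are placeholders; the actual output is generated from your Identity Graph):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co.",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png",
  "founder": { "@type": "Person", "name": "Jane Doe" },
  "sameAs": [
    "https://twitter.com/example",
    "https://www.linkedin.com/company/example"
  ]
}
</script>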

Step 7: The "AI Twin" Creation (Markdown Generation)

For every HTML page served to humans, we generate a hidden, parallel Markdown file (.md) optimized for LLM token limits.

The Optimization Process:

  1. Context Header: We inject a "Cheat Sheet" summary at the very top of the file, giving AI models the key facts (Who, What, Price) in the first 500 tokens.
  2. Table Extraction: HTML tables are converted into clean Markdown pipe tables (| Column | Column |) for reliable data extraction.
  3. Noise Removal: All CSS, JavaScript, and decorative divs are stripped away, leaving only pure semantic signal.
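
For illustration, the generated .md file for a product page might open like this (all values are placeholders):

# Red Running Shoes | Example Co.

> Who: Example Co. | What: Lightweight running shoe | Price: $89 | Availability: In stock, ships worldwide.

| Size | Price | Availability |
| ---- | ----- | ------------ |
| 42   | $89   | In stock     |
| 43   | $89   | Low stock    |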

Step 8: The Robot Map (llms.txt)

We auto-generate and host a root-level llms.txt file (e.g., es.example.com/llms.txt).

  • Purpose: This is the standard "Sitemap for Robots." It tells AI agents exactly where to find the clean Markdown files (.md) instead of forcing them to scrape messy HTML.
  • Content:

    • Global Site Description (System Prompt).
    • List of Priority URLs pointing to their "AI Twin" versions.
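
A simplified sketch of such a file (URLs and descriptions are placeholders; one file is generated per localized domain):

# Example Co.

> Example Co. sells lightweight running shoes and ships worldwide. Content is available in English, Spanish, and French.

## Priority Pages

- [Zapatillas Rojas](https://es.example.com/productos/zapatillas-rojas.md): Product details, pricing, and availability
- [Empresa](https://es.example.com/empresa.md): Company background and contact information
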
PHASE 5

Safety & Maintenance

The Loop

Ensuring stability, security, and accurate analytics.

Step 9: Conflict Prevention Protocol

We strictly separate the "Human Web" from the "AI Web" to prevent SEO conflicts.

  • The Rule: All generated Markdown (.md) files are served with an X-Robots-Tag: noindex HTTP header.
  • The Why: This instructs Googlebot to ignore the Markdown files (preventing duplicate content penalties) while allowing AI agents (like GPTBot) to consume them freely via the llms.txt map.
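
Expressed in Nginx terms purely for illustration (in practice the header is attached at our edge, not on your origin), the rule above is equivalent to:

location ~ \.md$ {
    # Tell search engines not to index the AI Twin files
    add_header X-Robots-Tag "noindex" always;
}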

Step 10: Sentient Analytics

We track the performance of your global infrastructure in real-time.

  • Bot Watch: Our edge logs identify specific requests from AI crawlers (GPTBot, ClaudeBot, Perplexity), giving you visibility into your "Share of Model."
  • Geo-Verification: We track "Match Rate"—the percentage of users from a specific region (e.g., Spain) who successfully landed on the correct localized version (e.g., es.example.com).

Ready to Experience the Architecture?

See how our 10-step pipeline can transform your website into a global, AI-ready platform.