Tags: apache, .htaccess, mod-rewrite, url-rewriting

Reading variable from a file into .htaccess causes an internal server error


What I'm Trying to Accomplish

  1. I have more than 1,000 individual files with URLs in the format https://example.com/archives/<NNNNNN>.php, where each N is a digit.

  2. I need to keep the existing structure as-is, though I can add to it. Specifically, each page has a unique data-url-title="<url-friendly-title>" attribute.

  3. I would like my .htaccess configuration to determine which file was requested, use a regex to extract the url-friendly-title from it, and rewrite the URL to replace <NNNNNN>, producing the format https://example.com/archives/url-friendly-title.php. (I would then strip .php from it, but I didn't get that far.)
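To make the intended mapping in item 3 concrete, the extraction step can be sketched in plain PHP, outside Apache. The helper names extractUrlTitle() and friendlyUrlFor() and the sample markup are hypothetical; this is a sketch under those assumptions, not something mod_rewrite itself provides.

```php
<?php
// Sketch only: extractUrlTitle() and friendlyUrlFor() are hypothetical helpers,
// not part of Apache or of the original setup.

// Return the value of data-url-title="..." from a page's markup, or null if absent.
function extractUrlTitle(string $html): ?string
{
    if (preg_match('/data-url-title="([a-z0-9-]+)"/', $html, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

// Map an old-style path such as /archives/000123.php to its friendly form,
// using the markup of the file that path points to.
function friendlyUrlFor(string $path, string $html): ?string
{
    if (preg_match('#^/archives/([0-9]+)\.php$#', $path) !== 1) {
        return null; // not an old-style archive URL
    }
    $title = extractUrlTitle($html);
    return $title === null ? null : '/archives/' . $title . '.php';
}

$page = '<article data-url-title="my-first-post">...</article>';
echo friendlyUrlFor('/archives/000123.php', $page), PHP_EOL;
```

With the sample markup above, the echo prints /archives/my-first-post.php. The hard part, as the rest of this question shows, is doing the file read and the regex capture inside .htaccess rather than in a script.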

What I Tried

# Enable rewriting
RewriteEngine On

# Match the old archive URL
RewriteCond %{REQUEST_URI} ^/archives/([0-9]+)\.php$

# Check that the file exists
RewriteCond expr "-f '%{DOCUMENT_ROOT}/archives/%1.php'"

# Extract data-url-title from the file content
RewriteCond expr "file('%{DOCUMENT_ROOT}/archives/%1.php') =~ /data-url-title=\"([^\"]+)\"/"

# Rewrite the URL to the new friendly URL
RewriteRule ^archives/([0-9]+)\.php$ /archives/%1.php [R=301,L]

What Happens

Any attempt to use this configuration causes my entire website to throw an internal server error with the log message RewriteCond: bad flag delimiters.

I have looked at numerous answered questions, and they seem to indicate that in Apache 2.4.x it's possible to read a variable from an external file by using RewriteCond expr "<expression>". However, I can't get this to work.


Solution

  • Because rewriting the page URL based on a value inside the page seems unworkable using only .htaccess, here's what I did:

    /.htaccess

    First things first, I check for the numeric pattern and pass the ID to the redirect.php handler script. Here's the trick, though: I also route the friendly URL to the same handler, so that both old-style and friendly URLs end up at a valid page.

    # Enable rewriting
    RewriteEngine On
    
    # Route numeric IDs to the handler internally; the handler issues
    # the single 301 redirect to the friendly URL
    RewriteCond %{REQUEST_FILENAME} -f
    RewriteRule ^archives/(\d+)\.php$ /archives/redirect.php?id=$1 [L,QSA]
    
    # Handle friendly URLs
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^archives/([a-z0-9-]+)/?$ /archives/redirect.php?title=$1 [L,QSA]
    

    /archives/redirect.php

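    Here's where the fun begins. The comments in the code explain how the script works, but the gist is that a numeric ID is permanently redirected to its friendly URL, while a friendly URL is served by including the matching <NNNNNN>.php file.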

    <?php
    // Find the archive file for a lookup value. When $isId is true, $fileFormat is
    // the numeric ID; otherwise it is a url-friendly title that must appear in the
    // file's data-url-title attribute. Returns array(path, title) or false.
    function findMatchingFile($fileFormat, $isId = false) {
      if ($isId) {
        // IDs map directly to <NNNNNN>.php, so there is no need to scan every file
        $file = __DIR__ . '/' . $fileFormat . '.php';
        if (is_file($file) && preg_match('/data-url-title="([a-z0-9-]+)"/', file_get_contents($file), $matches)) {
          return array($file, $matches[1]);
        }
        return false;
      }
    
      // Titles live inside the files, so every <NNNNNN>.php file has to be read
      foreach (glob(__DIR__ . '/*.php') as $file) {
        if (basename($file) === 'redirect.php') continue;
        $content = file_get_contents($file);
        if (preg_match('/data-url-title="' . preg_quote($fileFormat, '/') . '"/', $content)) {
          return array($file, $fileFormat);
        }
      }
      return false;
    }
    
    // Decide the lookup mode once: an ID takes precedence over a title
    $isId = isset($_GET['id']);
    $input = $isId ? $_GET['id'] : (isset($_GET['title']) ? $_GET['title'] : '');
    
    // Sanitize input: keep only lowercase letters, digits, and hyphens
    $input = trim(preg_replace('/[^a-z0-9-]/', '', strtolower($input)), '-');
    
    // If searching by ID and a matching file is found, redirect to the friendly URL;
    // if searching by title, include the matching file. IDs must be all digits.
    // ($input !== '' is used instead of empty(), which would reject the ID "0".)
    if ($input !== '' && (!$isId || ctype_digit($input))) {
      $result = findMatchingFile($input, $isId);
      if ($result) {
        if ($isId) {
          header('Location: /archives/' . urlencode($result[1]), true, 301);
        } else {
          include $result[0];
        }
        exit();
      }
    }
    
    http_response_code(404);
    echo '404 Not Found';
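To see which requests reach redirect.php and with which query parameter, the pattern matching of the two RewriteRules in /.htaccess can be mirrored in plain PHP. routeArchiveRequest() is a hypothetical helper: its $isFile and $isDir arguments stand in for the -f and -d checks, and it ignores flags such as QSA and R. It is an illustration of the patterns, not a model of how Apache evaluates rules.

```php
<?php
// Hypothetical helper that mirrors the two RewriteRule patterns from /.htaccess.
// $isFile and $isDir stand in for the %{REQUEST_FILENAME} -f / -d conditions.
// Flags (L, QSA, R) are not modelled; only the resulting handler URL is returned.
function routeArchiveRequest(string $path, bool $isFile, bool $isDir): ?string
{
    // In a root .htaccess, RewriteRule patterns see the path without its leading slash
    $path = ltrim($path, '/');

    // Rule 1: an existing numeric archive file goes to the handler with ?id=
    if ($isFile && preg_match('#^archives/(\d+)\.php$#', $path, $m) === 1) {
        return '/archives/redirect.php?id=' . $m[1];
    }

    // Rule 2: a path that is neither a file nor a directory goes to the handler with ?title=
    if (!$isFile && !$isDir && preg_match('#^archives/([a-z0-9-]+)/?$#', $path, $m) === 1) {
        return '/archives/redirect.php?title=' . $m[1];
    }

    return null; // no rule applies; the request is served unchanged
}

var_dump(routeArchiveRequest('/archives/000123.php', true, false));
var_dump(routeArchiveRequest('/archives/my-first-post', false, false));
var_dump(routeArchiveRequest('/archives/redirect.php', true, false));
```

In this sketch, /archives/redirect.php matches neither pattern (rule 1 needs digits, rule 2 forbids the dot), which is why requests to the handler itself are not rewritten again.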