In this article, we will look at how to block malicious robots and requests with the .htaccess file.
Introduction
Website traffic does not just come from human visitors. A large proportion of requests are generated by robots: some are legitimate (like search engines), but others are malicious. The latter can:
- unnecessarily consume server resources;
- attempt to exploit security vulnerabilities; or even
- participate in DDoS attacks.
Some requests may also come from suspicious IP addresses or user agents known to scan, hack or disrupt Web services. In some cases, it is useful to set up simple rules to block these unwanted accesses before they even reach your scripts or application.
The .htaccess file, used with a server such as Apache, enables you to filter this type of traffic upstream, with rules that are easy to adapt to your needs. This guide will show you how to implement these rules safely and effectively.
Prerequisites
Enter the following address in your web browser: https://mg.n0c.com.
How to Filter Upstream Traffic
Step 1 — Identify Malicious Requests
Locate malicious requests in the access log (see How to Use Access Logs).
- Start by analyzing the site’s access log.
- In the example below, we see an IP address located in France sending requests to wp-cron.php, with an HTTP response code of 200 (success).
- Since the application firewall has not blocked this request, and it is not legitimate in the context of this website, it may be appropriate to intercept it directly via .htaccess.
- In the example below, we see a request from a user agent identified as BigBadRobot, accessing the file wp-cron.php with HTTP response code 200 (success).
- Since this request has not been blocked by the application firewall and this agent is known to behave in an abusive or illegitimate manner, it may be appropriate to block it directly via .htaccess.
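For illustration, access-log entries of this kind (in Apache's combined log format; the dates, sizes and referrer fields below are hypothetical) could look like the following:
47.79.1.100 - - [12/May/2025:10:15:32 +0000] "GET /wp-cron.php HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible)"
47.79.1.100 - - [12/May/2025:10:15:40 +0000] "GET /wp-cron.php HTTP/1.1" 200 512 "-" "BigBadRobot/1.0"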
Step 2 — Locate the .htaccess File
Identify the document root for the domain or sub-domain on which you wish to block requests in the domain manager (please refer to the How to Manage Domains article).
- In the Domain Manager, locate the document root corresponding to the domain (or sub-domain) on which you wish to apply the blocks.
- Open the file manager and navigate to the document root identified in the previous step, then locate the .htaccess file (see How to Use the File Manager).
- As .htaccess is a hidden file, it may be necessary to activate the display of hidden files in the Profile Settings.
Step 3 — Modify the .htaccess File
- Open the .htaccess file in edit mode. Here is an example of the contents of a default .htaccess file for a WordPress site:
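# Typical default rules generated by WordPress; your file may differ depending on your installation
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress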
- Add the appropriate blocking directives, as shown in the examples below.
Step 4 — Clear the LSCache
For the rules to take effect, it may be necessary in N0C to clear the cache for the domain in question (see the article How to Use LSCache).
Important: Avoid Conflicts Between Blocking Methods
When setting up blocking rules in the .htaccess file, it is important not to combine several different methods to block the same target (IP, User-Agent, etc.) in the same context. For example, mixing <Limit> directives, mod_rewrite rules and SetEnvIfNoCase rules to block the same IP addresses or agents can result in:
- conflicts of priority between rules;
- unexpected behaviour; or even
- incomplete blocking or, conversely, blocking that is too broad.
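For example, the following sketch mixes two mechanisms against the same IP address, which is exactly the kind of duplication to avoid; consolidate into a single method instead:
# Anti-pattern: the same IP is blocked twice, by <Limit> and by mod_rewrite
<Limit GET POST>
order allow,deny
allow from all
deny from 47.79.1.100
</Limit>
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^47\.79\.1\.100$
RewriteRule ^.* - [F,L]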
Tips to avoid these problems:
- Choose a blocking method that is appropriate for your environment and needs (often mod_rewrite or SetEnvIfNoCase is sufficient).
- Group your rules together using the same logic, rather than duplicating blocks.
- Always test after each change.
- Document your rules clearly to facilitate maintenance.
In summary, keep things simple and consistent to ensure effective filtering with no surprises.
How to Block IP Addresses
There are several ways to block IP addresses in an .htaccess file. Here are some common approaches depending on the type of blocking you want.
Blocking Individual IP Addresses
Method 1: Via mod_rewrite (Recommended)
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^47\.79\.1\.100$
RewriteRule ^.* - [F,L]
This method works at all levels of request processing, including less common HTTP methods. It is more reliable for strict blocking.
Method 2: Via the <Limit> Directives
<Limit GET POST>
order allow,deny
allow from all
deny from 47.79.1.100
</Limit>
This method is simple, but it only applies to GET and POST requests, and may be ignored by some clients. It is also less flexible than mod_rewrite.
Blocking Multiple IP Addresses
RewriteCond %{REMOTE_ADDR} ^(47\.79\.1\.100|47\.82\.1\.100)$
RewriteRule ^.* - [F,L]
You can combine multiple IPs in the same regular expression, separated by | (logical OR).
Blocking IP Ranges (CIDR or Subnets)
Blocking entire ranges requires adapting the method used.
Method 1: With mod_rewrite (Precise and Flexible)
You can extend the pattern to cover an entire IP range based on the starting bytes (for example: ^47\.79\. for a larger block).
RewriteEngine On
# Block any IP starting with 47.79.1.
RewriteCond %{REMOTE_ADDR} ^47\.79\.1\.
RewriteRule ^.* - [F,L]
Method 2: With <Limit> (Simple, But Less Accurate)
<Limit GET POST>
order allow,deny
allow from all
deny from 47.79.0.0/16
</Limit>
CIDR notation (/16) is not always supported in .htaccess. In this case, use prefix-based blocking instead.
<Limit GET POST>
order allow,deny
allow from all
deny from 47.82.0.
</Limit>
Blocking by User-Agent (Unwanted Robots)
Some abusive robots (or crawlers) identify themselves via their HTTP User-Agent. It is possible to block these agents using .htaccess. Here are two common methods.
Method 1 (Recommended): SetEnvIfNoCase (Simple and Effective)
SetEnvIfNoCase User-Agent "GrandMechantRobot|BigBadRobot" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
This method is simple, easy to read and generally compatible with modern servers. It allows you to easily block one or more unwanted User-Agents using an environment variable (bad_bot).
- SetEnvIfNoCase: allows you to ignore case in agent names.
- The field "GrandMechantRobot|BigBadRobot" contains the names of the agents to be blocked, separated by | (logical OR).
- Deny from env=bad_bot: blocks requests marked with this variable.
Method 2: Via mod_rewrite
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} GrandMechantRobot|BigBadRobot [NC]
RewriteRule ^.* - [F,L]
This method also works very well and offers finer control in complex contexts, or when combining multiple rules in the .htaccess file.
Blocking by IP and User-Agent (Unwanted Robots)
In some cases, you may want to block a request only if it comes from both a specific IP address and a malicious User-Agent.
Here is how to do it with mod_rewrite:
RewriteEngine On
# If the IP address matches...
RewriteCond %{REMOTE_ADDR} ^47\.79\.1\.100$
# ...and that the User-Agent is one of the following...
RewriteCond %{HTTP_USER_AGENT} (GrandMechantRobot|BigBadRobot) [NC]
# ...then we block the request (403 Forbidden)
RewriteRule ^.* - [F,L]
How it Works
- The conditions are combined logically (AND): all RewriteCond lines must be true for the rule to apply.
- [NC] allows case to be ignored in agent names (for example, grandmechantrobot will also be blocked).
- [F,L] means: return a 403 Forbidden code (F) and do not continue processing other rules (L = Last).
Tips and Recommendations
- This method is ideal for blocking a targeted bot that is attempting to bypass general protections.
- If you want to block the IP or the User-Agent (a logical OR instead of AND), you must use two separate rules, as shown below.
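Here is a minimal sketch, reusing the example IP address and agent names from above; each RewriteRule is preceded by its own condition, so either match triggers a block:
RewriteEngine On
# Block the IP address on its own
RewriteCond %{REMOTE_ADDR} ^47\.79\.1\.100$
RewriteRule ^.* - [F,L]
# Block the User-Agents on their own
RewriteCond %{HTTP_USER_AGENT} (GrandMechantRobot|BigBadRobot) [NC]
RewriteRule ^.* - [F,L]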
Additional Tips
- Redirect certain IP addresses to a static page. Some administrators prefer to redirect certain unwanted robots or IP addresses to a customised static page (e.g. a humorous 403 page) instead of returning a simple error code (see the sketch after this list).
- Use a CDN (Content Delivery Network) to automatically filter and mitigate some of the malicious traffic before it reaches your server.
- Refine your robots.txt file to guide legitimate bots and restrict access to sensitive parts of the site.
- Use mod_rewrite if you need flexible rules and effective blocking.
- Always test your rules after making changes to avoid accidentally blocking yourself or preventing legitimate visitors from accessing your site.
- Do not overuse IP blocking. This type of filtering should be used in conjunction with an application firewall or other upstream security systems.
- Only block User-Agents identified as harmful, so as not to penalise legitimate traffic or block robots that are important for your web visibility.
- Avoid overly generic patterns that could block beneficial visitors or robots.
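As a sketch of the first tip above, the following redirects one IP address to a hypothetical /blocked.html page; the second condition excludes the page itself to prevent a redirect loop:
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^47\.79\.1\.100$
# /blocked.html is a hypothetical static page that must exist at the document root
RewriteCond %{REQUEST_URI} !^/blocked\.html$
RewriteRule ^.* /blocked.html [R=302,L]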
Test Your Rules Via a Terminal
To validate your User-Agent blocks, use a tool such as curl on a terminal with a custom agent, for example:
curl -A "GrandMechantRobot" https://your-site.com
This allows you to verify that your .htaccess rules are working as expected.
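To check only the HTTP status code, you can also combine curl's standard options; a 403 response indicates that the block is active:
curl -A "GrandMechantRobot" -s -o /dev/null -w "%{http_code}\n" https://your-site.com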
Conclusion
Protecting your website from malicious bots, DDoS attacks and suspicious requests is essential for preserving server resources and ensuring security.
The .htaccess file offers a simple and effective solution for filtering unwanted IP addresses and User-Agents upstream, before they reach your applications.
By implementing these rules, you improve the reliability and security of your Apache site against malicious traffic.