OpenStack Troubleshooting

August 30, 2017

This post records OpenStack troubleshooting notes for future reference.

1. Unable to ping or SSH into an instance

This happens because the security group blocks the traffic. Run the following commands to add security group rules that allow ICMP and SSH:

$ nova secgroup-add-rule default icmp -1 -1 0.0.0.0/0
$ nova secgroup-add-rule default tcp 22 22 0.0.0.0/0
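
On releases where the nova secgroup-* commands are no longer available, the same two rules can be expressed with the unified openstack client. The following is a minimal sketch, assuming the openstack CLI is installed and credentials are already sourced. The run helper and the DRY_RUN variable are illustrative names, not part of any OpenStack tool; by default the commands are only printed.

```shell
#!/usr/bin/env bash
# DRY_RUN and run() are illustrative helpers (assumptions), not OpenStack features.
# Each command is printed; it is executed only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

allow_ping_and_ssh() {
    local group="${1:-default}"
    # Allow all ICMP (ping) from anywhere.
    run openstack security group rule create --protocol icmp --remote-ip 0.0.0.0/0 "$group"
    # Allow SSH (TCP port 22) from anywhere.
    run openstack security group rule create --protocol tcp --dst-port 22 --remote-ip 0.0.0.0/0 "$group"
}

allow_ping_and_ssh default
```

Set DRY_RUN=0 to actually create the rules once the printed commands look right.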

2. How to create openrc for a tenant

$ openstack project create --description "Project for mars" wenchma
$ openstack user create --password 40bb211707b92bf96e3 mars
$ openstack role add --project wenchma --user mars user

# Update the tenant's resource quotas; -1 means unlimited
$ neutron quota-update --tenant-id project_wenchma_id --network -1 --subnet -1 --port -1 --router -1 --floatingip -1 --security-group -1 --security-group-rule -1
$ nova quota-update project_wenchma_id --instances -1 --cores -1 --ram -1 --fixed-ips -1
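
The openrc file itself is just a set of exported OS_* variables for the new project and user. The following is a hedged sketch: write_openrc is a hypothetical helper, and the auth URL (http://controller:5000/v3) and the Default domain names are placeholders that have to be replaced with the values of the actual deployment.

```shell
# write_openrc is a hypothetical helper; it prints an openrc file to stdout.
# The default auth URL and the "Default" domains are assumptions (placeholders).
write_openrc() {
    local project="$1" user="$2" password="$3"
    local auth_url="${4:-http://controller:5000/v3}"
    cat <<EOF
export OS_AUTH_URL=${auth_url}
export OS_IDENTITY_API_VERSION=3
export OS_PROJECT_NAME=${project}
export OS_PROJECT_DOMAIN_NAME=Default
export OS_USERNAME=${user}
export OS_USER_DOMAIN_NAME=Default
export OS_PASSWORD=${password}
EOF
}

# Project, user and password as created above.
write_openrc wenchma mars 40bb211707b92bf96e3
```

Redirect the output to a file such as wenchma-openrc.sh and load it with source wenchma-openrc.sh before running openstack commands as that user.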

3. Maximum open files limit

  • Find the open files limit per process: ulimit -n
  • Count the files opened by all processes: lsof | wc -l or cat /proc/sys/fs/file-nr
  • Get the system-wide maximum number of open files: cat /proc/sys/fs/file-max

Use ulimit -n to set the maximum open files limit for the current shell. To make the value persistent, edit /etc/security/limits.conf and restart the system:

*               soft    nofile          10240
*               hard    nofile          10240
root            soft    nofile          10240
root            hard    nofile          10240

Check the maximum open files limit for the memcached service:

# ps -ef |grep memcached
root     11684  9824  0 08:31 pts/0    00:00:00 grep --color=auto memcached
memcache 22789     1  0 Sep01 ?        00:23:11 /usr/bin/memcached -m 64 -p 11211 -u memcache -l 0.0.0.0
# cat /proc/22789/limits 
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             64115                64115                processes 
Max open files            1024                 1024                 files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       64115                64115                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us

Check the files currently opened by the memcached service:

# lsof -p 22789 | wc -l
291

# ls /proc/22789/fd/ | wc -l
274
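
The two counts differ because lsof -p also lists entries that are not file descriptors (working directory, program text, memory-mapped libraries) plus a header line, while /proc/22789/fd contains only the open descriptors. The two checks above can be wrapped in small helpers; this is a sketch that assumes a Linux /proc filesystem, and the function names are illustrative.

```shell
# Soft "Max open files" limit of a process, read from /proc/<pid>/limits.
soft_nofile_limit() {
    awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

# Number of file descriptors the process currently holds.
# Reading another user's /proc/<pid>/fd usually requires root.
open_fd_count() {
    ls "/proc/$1/fd" | wc -l
}

# Example: inspect the current shell ($$); pass the memcached PID instead.
pid="$$"
echo "pid=$pid limit=$(soft_nofile_limit "$pid") open=$(open_fd_count "$pid")"
```

When the open count approaches the limit, the service starts failing with "Too many open files" errors, which is the situation the limits.conf change is meant to avoid.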

Memcached conf file:

# Run memcached as a daemon.
-d

# memory
-m 64

# Default connection port is 11211
-p 11211

# Run the daemon as root. The start-memcached will default to running as root if no
# -u command is present in this config file
-u memcache

# Specify which IP address to listen on. The default is to listen on all IP addresses
# This parameter is one of the only security measures that memcached has, so make sure
# it's listening on a firewalled interface.
-l 127.0.0.1

# Limit the number of simultaneous incoming connections. The daemon default is 1024
# -c 1024

# Lock down all paged memory. Consult with the README and homepage before you do this
# -k

# Return error when memory is exhausted (rather than removing items)
-M

# Maximize core file limit
# -r

4. Neutron MTU

Neutron uses the MTU of the underlying physical network to calculate the MTU of virtual networks, including instance network interfaces. An underlying physical network with a 1500-byte MTU yields a 1450-byte MTU for instances on a VXLAN network with IPv4 endpoints, because VXLAN encapsulation adds 50 bytes of overhead. Using IPv6 endpoints for overlay networks adds a further 20 bytes of overhead for any protocol.

For details, refer to Configure MTU.
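
The arithmetic can be sketched as a small shell function. The overhead figures are the ones quoted in this section (50 bytes for VXLAN over IPv4, 20 more for IPv6 endpoints); the function name is illustrative.

```shell
# Instance MTU on a VXLAN network for a given physical MTU.
# Overheads follow the text above: 50 bytes with IPv4 endpoints,
# plus 20 more bytes with IPv6 endpoints.
vxlan_instance_mtu() {
    local phys_mtu="$1" ip_version="${2:-4}"
    local overhead=50
    if [ "$ip_version" = "6" ]; then
        overhead=$((overhead + 20))
    fi
    echo $((phys_mtu - overhead))
}

vxlan_instance_mtu 1500 4   # prints 1450 (IPv4 endpoints)
vxlan_instance_mtu 1500 6   # prints 1430 (IPv6 endpoints)
vxlan_instance_mtu 9000 4   # prints 8950 (jumbo frames on the physical network)
```

If instances use a larger MTU than the computed value, large packets are fragmented or dropped, which typically shows up as SSH sessions that hang after login.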

5. Neutron L3 agent scheduler issue

The L3 agent log shows the following traceback:

2017-06-22 04:47:56.925 1783 ERROR neutron.agent.l3.agent     pm = self._get_state_change_monitor_process_manager()
2017-06-22 04:47:56.925 1783 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 298, in _get_state_change_monitor_process_manager
2017-06-22 04:47:56.925 1783 ERROR neutron.agent.l3.agent     default_cmd_callback=self._get_state_change_monitor_callback())
2017-06-22 04:47:56.925 1783 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 301, in _get_state_change_monitor_callback
2017-06-22 04:47:56.925 1783 ERROR neutron.agent.l3.agent     ha_device = self.get_ha_device_name()
2017-06-22 04:47:56.925 1783 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 137, in get_ha_device_name
2017-06-22 04:47:56.925 1783 ERROR neutron.agent.l3.agent     return (HA_DEV_PREFIX + self.ha_port['id'])[:self.driver.DEV_NAME_LEN]
2017-06-22 04:47:56.925 1783 ERROR neutron.agent.l3.agent TypeError: 'NoneType' object has no attribute '__getitem__'
2017-06-22 04:47:56.925 1783 ERROR neutron.agent.l3.agent 
2017-06-22 04:47:57.731 1783 WARNING neutron.agent.l3.router_info [-] Can't gracefully delete the router c43a1743-e057-40c1-8ff0-dc9150b20357: no router namespace found.

# vi /usr/lib/python2.7/dist-packages/neutron/scheduler/l3_agent_scheduler.py

The scheduler returns None instead of an empty list when no target routers are found. Apply the following patch:

# cat l3_agent_scheduler.patch
--- /usr/lib/python2.7/dist-packages/neutron/scheduler/l3_agent_scheduler.py    2017-03-02 00:32:34.973501604 -0600
+++ l3_agent_scheduler.py       2017-03-02 00:32:07.923591569 -0600
@@ -411,7 +411,7 @@ class AZLeastRoutersScheduler(LeastRoute
                 target_routers.append(r)

         if not target_routers:
-            return
+            return []

         return super(AZLeastRoutersScheduler, self)._get_routers_can_schedule(
             context, plugin, target_routers, l3_agent)

6. RabbitMQ Max Open File

By default RabbitMQ can open at most 924 files (the ulimit of 1024 minus 100), which is too low for an OpenStack environment.

  • Increase the limit without restarting RabbitMQ:
    # rabbitmqctl eval 'file_handle_cache:set_limit(65435).'
    
  • Increase the RabbitMQ file descriptor limit permanently by modifying the rabbitmq.config file:
    [
        {rabbit, [
            {file_descriptors, [{total_limit, 65435}]},
            {vm_memory_high_watermark, 0.6}
        ]}
    ].

    Note: on distributions that use systemd, the OS limits are controlled via a configuration file at
    /etc/systemd/system/multi-user.target.wants/rabbitmq-server.service:
    [Service]
    LimitNOFILE=65435
    Run systemctl daemon-reload and restart rabbitmq-server for the new limit to take effect.
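
The "ulimit minus 100" rule explains both numbers used in this section. A small sketch of the arithmetic follows; the function name is illustrative.

```shell
# File handles available to RabbitMQ for a given OS limit, using the
# "ulimit minus 100" rule quoted above.
rabbitmq_fd_limit() {
    echo $(( $1 - 100 ))
}

rabbitmq_fd_limit 1024    # prints 924, the default mentioned above
rabbitmq_fd_limit 65535   # prints 65435, the value set in this section

# On a running broker, "rabbitmqctl status" reports the effective limit
# in its file descriptors section.
```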

7. OpenStack cannot query instances, images, or network resources

This is due to the service tenant having been deleted by mistake.
Nova, Glance, Neutron, and other components use the service tenant for authorization, so recreate it and grant the service users the admin role again:

$ openstack project create --domain default  --description "Service Project" service
+-------------+----------------------------------+
| Field       | Value                            |
+-------------+----------------------------------+
| description | Service Project                  |
| domain_id   | e0353a670a9e496da891347c589539e9 |
| enabled     | True                             |
| id          | 894cdfa366d34e9d835d3de01e752262 |
| is_domain   | False                            |
| name        | service                          |
| parent_id   | None                             |
+-------------+----------------------------------+

$ openstack role add --project service --user glance admin

$ openstack role add --project service --user nova admin

$ openstack role add --project service --user neutron admin
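
When more service users are involved, the role assignments can be done in a loop. This is a minimal sketch assuming the service project has already been recreated as shown above; run, DRY_RUN and restore_service_roles are illustrative names, and by default the commands are only printed.

```shell
# DRY_RUN and run() are illustrative helpers (assumptions), not OpenStack features.
# Each command is printed; it is executed only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Grant the admin role on the service project to every user given as argument.
restore_service_roles() {
    local user
    for user in "$@"; do
        run openstack role add --project service --user "$user" admin
    done
}

restore_service_roles glance nova neutron
```

Add any other service users of the deployment (for example cinder or heat, if they exist) to the argument list.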

8. OpenVPN subnet routing

After setting up a VPN server on OpenStack, clients connect to the VPN server successfully but cannot ping the other VMs in this VPN subnet. Add a NAT rule on the VPN server:

# iptables -t nat -A POSTROUTING -s 10.8.0.0/24 -o eth0 -j MASQUERADE
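
The MASQUERADE rule only helps if the VPN server is allowed to forward packets between interfaces. The following sketch checks that, assuming a Linux host; ip_forward_enabled is an illustrative helper, and its optional path argument exists only so the function can be exercised against a test file.

```shell
# Returns success when IPv4 forwarding is enabled.
# The optional argument defaults to the kernel's ip_forward file.
ip_forward_enabled() {
    local file="${1:-/proc/sys/net/ipv4/ip_forward}"
    [ "$(cat "$file" 2>/dev/null)" = "1" ]
}

if ip_forward_enabled; then
    echo "IPv4 forwarding is enabled"
else
    echo "IPv4 forwarding is disabled; enable it with: sysctl -w net.ipv4.ip_forward=1"
fi
```

Neither the iptables rule nor the sysctl setting survives a reboot unless it is saved with the distribution's own mechanism.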