Regarding System - Redundancy Improvement
As part of our continuous improvement plan, our upstream network team intends to carry out configuration changes to increase reliability and enable support for IPv6 redundancy on the core network.
Each IP range will experience 1-2 minutes of connection downtime during the maintenance.
Maintenance window:
Start: 23 October 2022 20:00 NZST
End: 24 October 2022 00:00 NZST
Regarding Other - NZ Datacentre (Auckland)
Summary
The datacenter lost mains power at 10:19 am due to an area-wide power outage (https://help.vector.co.nz/address/1001381424). Mains power was restored at 11:37 am, 1 hour 18 minutes later. Details have been requested from the power supplier.
The site has several UPS units intended to provide failover protection against mains faults. One was under maintenance due to a recently failed battery. The remaining units suffered battery failures during this event, despite having performed well during a mains event just a few days prior. The datacenter generator came online within 10 seconds, but due to the UPS issues the entire datacenter was without power for approximately 45 seconds.
After the power event some equipment did not restart normally; we restarted it manually.
Several of our distribution switches were also found to be unresponsive after the power event; a failed memory module appears to be the reason they did not come back. We replaced those switches, re-patched the network cables, and deployed essential configuration to the new switches to restore connectivity.
The replacement switches are a newer model and are not prone to the bug that affected the previous switches.
The switch issue delayed restoring access to servers in two cabinets. We worked with remote hands to recable those servers; although the servers themselves were online, they were unreachable until the recabling was complete.
After networking was restored, some services still had problems, mostly stemming from the loss of internet connectivity during the outage; we provided support to resolve those.
We saw extended delays in getting remote hands to assist with fixes. We are discussing how to improve outcomes and set clearer expectations for remote-hands work.
The datacenter is running normally but, as of 30 June, still has limited UPS redundancy. Further repairs are being performed as quickly as parts and resources permit.
We will be performing further post-event analysis and will update this ticket in the coming days with more details.
Detail
Wed, 6 Jul 2022 20:13 UTC: The datacenter has restored redundancy to the second of three UPS units, so all cabinets now have filtered and protected power feeds. The remaining UPS is still providing filtered power to secondary circuits. We are continuing to work with the datacenter to improve resilience for all our services.
We anticipate further scheduled work will be needed over the next couple of months, and we will continue to monitor things closely. However, for now this issue is considered resolved.
Tue, 5 Jul 2022 01:02 UTC: The datacenter has scheduled repairs on one UPS today; this is expected to restore UPS redundancy to the site for dual-powered gear, and normal protection for single-PSU gear.
We are working with customers to migrate services to redundantly powered hardware where possible; those affected have been contacted directly, and we will co-ordinate that work on a case-by-case basis.
Thu, 29 Jun 2022 08:00 UTC: All services are restored.
Tue, 28 Jun 2022 04:09 UTC: Servers are slowly coming back online. Remote hands are currently cabling further servers to the new switches, and we are testing connectivity as they come up.
Tue, 28 Jun 2022 02:19 UTC: Some servers are back online; we are working to restore the remainder.
Tue, 28 Jun 2022 01:20 UTC: We are seeing issues with a distribution layer of switches not responding. We are working on moving affected server connections to alternate switching so that servers are reachable.
Tue, 28 Jun 2022 00:56 UTC: We have a couple of switches that are not responding. We are waiting on remote hands in the datacenter to provide further details.
At 00:52 UTC, https://status.voyager.nz/ reported:
Mains power has been restored to the building. We are continuing to work alongside some customers to restore services. Jun 28, 12:37 NZST
Power to the Piermark Drive Datacentre is running off generators as a result of a power outage in the local surrounding area. Some racks are seeing issues with network connectivity or other power faults. Our team is on site with vendors working to resolve these issues. Jun 28, 11:37 NZST
Tue, 28 Jun 2022 00:14 UTC: We are still waiting on the datacenter team to respond with findings; they are still checking affected cabinets and services.
Mon, 27 Jun 2022 23:07 UTC: The datacenter has reported an area power outage (https://help.vector.co.nz/address/1001381424). Generators are online, but failover to that backup service has not worked correctly in some cases. We are continuing to work on restoring service to affected gear.
Mon, 27 Jun 2022 22:46 UTC: Some servers are back online and reachable. The remaining servers are urgently being investigated by remote hands.
Mon, 27 Jun 2022 22:36 UTC: There has been a further power failure at the datacenter. Support staff are working with remote hands to restore service and understand the full scope of the issue.
Servers at the Auckland datacenter are unreachable. There appears to be a network issue, and possibly a power issue as well. We are investigating.
Regarding Other - Cloudflare
An issue occurred with Cloudflare and has now been rectified. Cloudflare's issue outline is below:
Cloudflare Service Issues
Monitoring - A fix has been implemented and we are monitoring the results.
Jun 21, 07:20 UTC
Identified - The issue has been identified and a fix is being implemented.
Jun 21, 06:57 UTC
Investigating - A critical P0 incident was declared at approximately 06:34AM UTC. Connectivity in Cloudflare’s network has been disrupted in broad regions.
Eyeballs attempting to reach Cloudflare sites in impacted regions will observe 500 errors. The incident impacts all data plane services in our network.
We will continue updating you when we have more information.
Regarding Other - NZ Datacentre
Update 03:08 UTC: All services appear to be restored. Some devices were restarted; these were all connected via a single PSU. Some devices that were expected to be redundantly connected turned out not to be. We are working with the datacenter team to restore power redundancy for those and to make sure the power setup is generally sound.
Update 00:07 UTC / 12:07 NZT: It appears some BGP routes were lost after the power issue. We have put static routes in place, and most servers are reachable again. If you are still seeing an issue, please submit a support ticket. We are continuing to troubleshoot.
Update 23:53 UTC / 11:53 NZT: Hosts are pinging from within our network but not from outside the datacenter, so the hosts themselves are up and we are now troubleshooting a network issue.
Update 23:29 UTC / 11:29 NZT: Some hosts are restarting, and servers will be coming back online shortly.
Update 23:29 UTC / 11:20 NZT: The datacenter reports a UPS failure.
We are seeing that some devices lost power across different racks. Working with the datacenter to understand the scope and cause.
Regarding System - NZ Datacentre (HOST3/4/5)
Thu, 12 May 2022 03:35 UTC: A fix has been applied; pending further checks, things appear to be stable. We will provide an update once the details are confirmed.
Thu, 12 May 2022 02:35 UTC: There appears to be a specific route impacting some NZ-only customers, and only for some traffic. We are continuing to work with our upstream provider to diagnose and resolve the cause.
There are a couple of reports from Wellington customers of issues reaching some services at our Auckland site. Details at this stage are limited but it could be related to a local ISP issue in Wellington. Investigating.
Regarding System - NZ Datacentre
As part of our ongoing effort to utilise the latest server technology, we will be migrating some servers to cloud-based and hybrid infrastructure.
Benefits of this will include more frequent backups (such as hourly), advanced email security (anti-spam and anti-virus) and extremely fast performance.
Downtime is expected to be minimal, with most services switching over within a few minutes. For customers with external registrars / nameservers, we will contact you directly in order to arrange for the necessary DNS changes to be made prior to migration of your services.
We have scheduled a number of windows for this work. The schedule will be released as we work through each server.
1) plskwp.createhosting.co.nz - 2 hour window has been scheduled for this work, between 2pm - 4pm NZ on Thursday 14th April 2022. [COMPLETED]
2) wpplus.createhosting.co.nz - 2 hour window has been scheduled for this work, between 10am - 12pm NZ on Sunday 24th April 2022. [COMPLETED]
3) plskmage.createhosting.co.nz - 2 hour window has been scheduled for this work, between 8pm - 10pm NZ on Monday 25th April 2022. [COMPLETED]
4) vm51.createhosting.co.nz - 2 hour window has been scheduled for this work, between 8pm - 10pm NZ on Tuesday 26th April 2022. [COMPLETED]
5) vm51-2.createhosting.co.nz - 2 hour window has been scheduled for this work, between 9pm - 11pm NZ on Tuesday 26th April 2022. [COMPLETED]
6) vm49.createhosting.co.nz - 2 hour window has been scheduled for this work, between 12pm - 12pm NZ on Thursday 28th April 2022. [COMPLETED]
7) vm47.createhosting.co.nz - 2 hour window has been scheduled for this work, between 8pm - 10pm NZ on Thursday 28th April 2022. [COMPLETED]
8) vm40.createhosting.co.nz - 2 hour window has been scheduled for this work, between 8pm - 10pm NZ on Friday 29th April 2022. [COMPLETED]
9) vm55.createhosting.co.nz - 2 hour window has been scheduled for this work, between 8pm - 10pm NZ on Saturday 30th April 2022. [COMPLETED]
10) vm55-2.createhosting.co.nz - 2 hour window has been scheduled for this work, between 10pm - 11pm NZ on Saturday 30th April 2022. [COMPLETED]
11) vm58.createhosting.co.nz - 2 hour window has been scheduled for this work, between 3pm - 5pm NZ on Sunday 1st May 2022. [COMPLETED]
12) vm65.createhosting.co.nz - 2 hour window has been scheduled for this work, between 7pm - 9pm NZ on Sunday 1st May 2022. [COMPLETED]
13) vm67.createhosting.co.nz - 2 hour window has been scheduled for this work, between 10am - 12pm NZ on Tuesday 3rd May 2022. [COMPLETED]
14) vm61.createhosting.co.nz - 2 hour window has been scheduled for this work, between 10am - 12pm NZ on Thursday 5th May 2022. [COMPLETED]
15) vm41.createhosting.co.nz - 2 hour window has been scheduled for this work, between 10pm - 2pm NZ on Friday 6th May 2022. [COMPLETED]
16) vm48.createhosting.co.nz - 2 hour window has been scheduled for this work, between 2pm - 4pm NZ on Friday 6th May 2022. [COMPLETED]
17) vm59.createhosting.co.nz - 2 hour window has been scheduled for this work, between 4pm - 6pm NZ on Friday 6th May 2022. [COMPLETED]
18) vm43.createhosting.co.nz - 2 hour window has been scheduled for this work, between 8pm - 10pm NZ on Friday 6th May 2022. [COMPLETED]
19) vm69.createhosting.co.nz - 2 hour window has been scheduled for this work, between 8pm - 10pm NZ on Saturday 7th May 2022. [COMPLETED]
20) vm70.createhosting.co.nz - 2 hour window has been scheduled for this work, between 10pm - 11pm NZ on Wednesday 11th May 2022. [COMPLETED]
21) vm66.createhosting.co.nz - 2 hour window has been scheduled for this work, between 8pm - 10pm NZ on Wednesday 11th May 2022. [COMPLETED]
22) vm62.createhosting.co.nz - 2 hour window has been scheduled for this work, between 8pm - 10pm NZ on Fri 13th May 2022. [COMPLETED]
23) vm66-2.createhosting.co.nz - 2 hour window has been scheduled for this work, between 2pm - 4pm NZ on Thursday 12th May 2022. [COMPLETED]
24) vm68.createhosting.co.nz - 4 hour window has been scheduled for this work, between 9am - 10am NZ on Sat 14th May 2022. [COMPLETED]
25) vm59.createhosting.co.nz - 4 hour window has been scheduled for this work, between 9am - 10am NZ on Sun 15th May 2022. [COMPLETED]
Regarding System - NZ Datacentre (HOST5)
Update 23:25: host5.createhosting.co.nz has been remotely restarted. We are not seeing any kernel or disk I/O issues, and it appears to be operating normally. All VMs are operating normally.
Update 22:52: host5.createhosting.co.nz (ID: 1219) is not responding normally. We are continuing to investigate and have initiated a remote-hands reboot at the datacenter.
Issue 22:15: We have been notified of an outage with one of our XEN host servers (host5) affecting multiple VMs / VPSs (abnormally high disk I/O). We are investigating with urgency.
Affected VMs:
vm40.createhosting.co.nz
vm41.createhosting.co.nz
vm43.createhosting.co.nz
vm47.createhosting.co.nz
vm48.createhosting.co.nz
vm49.createhosting.co.nz
vm51.createhosting.co.nz
vm55.createhosting.co.nz
vm58.createhosting.co.nz
vm59.createhosting.co.nz
vm67.createhosting.co.nz
vm68.createhosting.co.nz
vm70.createhosting.co.nz
We will update this notice with any further details.
Regarding System - NZ Datacentre
Update: Tue 8 Mar 2022 20:24 UTC: This change has been completed. Our upstream provider is continuing urgent investigations with their vendors to source additional equipment so as to restore full redundancy to the network. A new notice will be opened once they have provided a progress plan for that.
Update: Mon 7 March 05:00 UTC: The network team intends to perform a network change to enhance the redundancy with our upstream provider. Some connections may experience packet loss for a few seconds as traffic is forwarded via a new path.
Maintenance window:
Start: 8 March 2022 19:00 (UTC)
End: 8 March 2022 23:00 (UTC)
Update: Sat 19 Feb 03:15 UTC: We had some cases of packet loss reported. The network core device that was causing packet loss has been isolated once more. This workaround has mitigated the issue, and the network is stable again. The various network engineers and teams will work on an updated plan during normal hours.
Update: Fri 18 Feb 20:30 UTC: One of the network cores has been upgraded successfully, and partial redundancy has been restored. We noticed an issue with one of the upstream links and applied a mitigation.
Update: Wed 16 Feb 03:00 UTC: The network team intends to perform a firmware update on a network core device to fix bugs. Some connections may experience packet loss for a few seconds as traffic is forwarded via a new path. After the change, full redundancy to the core will be restored.
Maintenance window:
Start: 18 Feb 2022 18:00 (UTC)
End: 18 Feb 2022 23:00 (UTC)
Update: Fri 04:19 UTC: We have disconnected one of the core routers, and the packet loss has reduced or stopped. If anyone is still seeing issues, feel free to send us a traceroute from your server: mtr --report -n -c50 174.136.11.74
Update: Fri 04:10 UTC: We are investigating a few problem reports.
Update: Fri, 01:15 UTC: An uplink interface has now been disabled on one of the routers. This appears to have mitigated the issue, and we have seen the network improve. Please let us know if you are still seeing any problems. We will investigate the root cause further.
Update: Fri, 11 Feb 2022 00:19 UTC: We are continuing to investigate, with further escalation to our upstream provider.
Update: Thu, 10 Feb 2022 23:45 UTC: The affected device has been reloaded; however, we are now aware of packet loss from some locations. Investigating further.
We noticed an issue with one of the core switches in the Auckland DC.
There should be no traffic impact or outage at this time, given our redundant network design.
The network team is investigating the issue.
Regarding System - MageVPS Servers
Update: All servers have been patched / secured against this issue.
There is a privilege escalation security issue with Polkit. This is being called "PwnKit" (a.k.a. CVE-2021-4034).
PwnKit is a local privilege escalation, meaning an attacker needs local access first: you are exposed if people log in through SSH, or via a compromised website or other software. In conjunction with other vulnerabilities it could be used to fully compromise your server. The escalation of privileges is a problem if, for example, you have an exploitable website, or if you permit someone you would not trust with root access to put anything on your server.
NIST has given the bug a CVSS rating of 7.8, which indicates a serious issue.
All servers are in the process of being patched. Exact schedules will not be released. No downtime is expected.
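For reference, a quick local spot-check can indicate whether a host is exposed via the usual vector. This is only a sketch, assuming the common /usr/bin/pkexec path, and is no substitute for installing your distribution's patched polkit package:

```shell
# Minimal exposure check (assumes the common /usr/bin/pkexec path).
# PwnKit requires pkexec to be present and setuid root, so:
if [ -u /usr/bin/pkexec ]; then
    echo "pkexec is setuid root - make sure polkit is patched for CVE-2021-4034"
else
    echo "pkexec is absent or not setuid - not exposed via this vector"
fi
```

Either way, the definitive fix is the vendor polkit update described above.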
Regarding Server - PLESK MAGENTO
We will be performing scheduled maintenance on our Plesk Lite and Plesk Mage servers. Some services will be unavailable during this window (10-90 min).
Start: 3/12/2021 23:00 NZT
Regarding System - NZ Datacentre
We have detected one of the devices in our redundant core network in Auckland has stopped responding normally.
Network traffic has failed over to the other core device, and is still passing normally.
We are continuing to investigate.
Regarding System - NZ Datacentre
Summary:
Datacenter staff intend to carry out fibre work in the Auckland Datacenter. The nature of this work is to reroute the fibre cables between network devices.
Customer impact:
Some customers may experience packet loss or slightly increased latency as traffic is routed to alternate paths. There should be minimal/no impact otherwise.
Maintenance schedule:
Start: 23 Sep 2021 09:45 (NZT)
End: 23 Sep 2021 11:00 (NZT)
Regarding System - NZ Datacentre
We are receiving down alerts for VPSs on host5.createhosting.co.nz. This appears to be due to high load, possibly caused by a DDoS.
This is affecting multiple Magento and WordPress sites, and does include mail services for those sites.
We are currently investigating and working on this issue.
UPDATE: We will be restarting individual VPSs to isolate the DoS attack. These will be coming online (staggered) shortly.
UPDATE: VPSs and services are all back online.
Regarding System - NZ Datacentre
Situation: We have just received an alert that there are security issues in the version of Xen we are using. In the past we have performed live patches to mitigate such issues without impacting customers; however, that is not an option in this case. The severity of the issues warrants an immediate update. The security issues are embargoed until 2020-12-15 12:00 UTC, so we cannot release details before then, but we must fix the issue before the embargo date.
We will be updating our host servers (requires a VM restart) to apply Xen security hardening, and bug fixes.
Customer action required: None at this stage. Now would be a good time to ensure that your VM works as expected after a restart (that services are automatically started on boot up, etc).
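As a sketch of that pre-restart check, assuming a systemd-based VM (the service names below are placeholders; substitute your own), you can confirm which services are enabled to start at boot:

```shell
# Pre-restart sanity check: are the important services enabled at boot?
# The service names here are placeholders - substitute your own.
for svc in nginx mariadb postfix; do
    if command -v systemctl >/dev/null 2>&1; then
        state=$(systemctl is-enabled "$svc" 2>/dev/null)
        [ -n "$state" ] || state="unknown"
    else
        state="no systemd detected; check your init system manually"
    fi
    echo "$svc: $state"
done
```

Anything reporting "disabled" here would not come back automatically after the restart.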
Who is affected: Customers running VMs on our Xen hosting stack including VMs on shared hosts and VMs on dedicated hosts.
Detail: Engineers will be updating our host server software stacks to the latest version of our Xen virtualization software, the most recent Intel microcode patches, and a more recent Linux kernel for the host. To do this we will be stopping the VMs on host servers, updating the software and restarting the servers. VMs will then start up as normal.
Note that we will not be making any changes to the VM file systems.
The software update process typically takes 20-60 minutes per server if all goes well; 45 minutes is typical. Hosts with many VMs take towards the longer end of that range. We reserve a 2-hour window for the upgrades.
Timing: host1.createhosting.co.nz - 2020-12-14 19:00:00 - Completed
host2.createhosting.co.nz - 2020-12-14 19:00:00 - Completed
host3.createhosting.co.nz - 2020-12-15 19:00:00 - Completed
host4.createhosting.co.nz - 2020-12-14 19:00:00 - Completed
host5.createhosting.co.nz - 2020-12-14 19:00:00 - Completed
The updates will start within 0-2 hours of the stated start time.
Regarding Other - NZ Datacentre
External monitoring detected a brief network issue in Auckland; connectivity then recovered. We are investigating and will post updates as we get them.
Regarding System - NZ Datacentre
Mon, 7 Sep 2020 00:59 UTC: We are seeing packet loss to some VMs at the Auckland datacenter, affecting international traffic (outside of NZ). We are investigating.
Update Mon, 7 Sep 2020 01:17 UTC: Our upstream reports there was a DDoS. We are no longer seeing any issues to VMs in Auckland. We will continue to monitor for further issues.
Regarding Other - All Servers
We have recently discovered that some of our customers may have experienced outages in their Magento or WordPress services caused by a Sectigo root certificate that expired on 30 May 2020. As a consequence of the expiration, various services could not initiate secure connections with other services, or could not be accessed by other services due to errors. If your website or other online service uses applications or integrations such as APIs, cURL, or OpenSSL, you may have experienced problems or outages. If you have had any service disruptions or errors, or your visitors use browsers older than 2015 and report issues, you will need to take action to update the service.
We are working through all affected servers and websites to reissue certificates where those were issued by us prior to 1 June 2020, and to apply fixes to our internal systems. This will take some time due to the number of sites and services on our network.
An overview of the issue is outlined here: https://www.namecheap.com/blog/sectigo-ssl-certificate-root-expiration-issue/
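As a general aid (not our official tooling), certificate expiry can be spot-checked with openssl. The sketch below generates a throwaway self-signed certificate so that it is self-contained; against a live site you would instead fetch the chain with `openssl s_client -connect yoursite.example:443 -showcerts`:

```shell
# Create a throwaway self-signed certificate valid for 1 day
# (placeholder subject; for illustration only).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=expiry-test" \
    -keyout /tmp/expiry-test.key -out /tmp/expiry-test.crt 2>/dev/null

# -checkend N exits 0 if the certificate is still valid N seconds from now.
if openssl x509 -checkend 3600 -noout -in /tmp/expiry-test.crt >/dev/null; then
    echo "certificate valid for at least the next hour"
else
    echo "certificate expires within the hour - renew it"
fi
```

Note that an expired *root* in the chain, as in this Sectigo incident, will not show up in the leaf certificate's dates; checking the full chain from `s_client -showcerts` is the reliable approach.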
Regarding System - NZ Datacentre
Tue 31/3/2020 19.00-23.59 NZT
We will be rolling out a series of updates to all servers. Servers will be rebooted as part of the update process, and services will be unavailable for a brief period (5-10 min) during those times. Some disruption to services such as mail and web is to be expected.
Regarding Server - PLESK MAGENTO
Tue 31/03/2020 16.00-16.10
We will be rebooting plskmage.createhosting.co.nz for some changes to take effect. Expected downtime is 5 minutes. Some disruption to services such as mail and web is to be expected during this brief window.
Update: Completed 16:14
Regarding Server - PLESK WP
Tue 31/03/2020 13.00-14.30
We will be performing emergency maintenance on plskwp.createhosting.co.nz. We will be moving all sites and services on this server to a new host server on the same network. The server will be rebooted a number of times as part of the process and services will be unavailable for a period of time (0.5 hours - 2 hours). Due to the type of work, some disruption to services such as mail and web is to be expected.
Updates:
31/03/2020 13:24 - Server move started and in progress
31/03/2020 13:45 - Completed
Regarding Server - PLESK MAGENTO
Mon 30/03/2020 12.30PM-2.30PM
We will be performing emergency maintenance on plskmage.createhosting.co.nz. We will be moving all CloudLinux based sites and services on this server to a new host server on the same network. The server will be rebooted a number of times as part of the process and services will be unavailable for a period of time (0.5 hours - 2 hours). Due to the type of work, some disruption to services such as mail and web is to be expected.
Updates:
30/03/2020 12:30 - Server move started and in progress
30/03/2020 13:35 - Server move completed
Regarding System - HOST4
We are currently seeing high load on some VMs on HOST4, due to related maintenance on the same network. This is causing slow page load times and, in some cases, unresponsive VPSs. Our engineers are aware of this and working on a fix. ETA: 10 min.
Update 30/3/2020 13:11 - Issue resolved.
Regarding Server - PLESK MAGENTO
Thu 24/10/2019 11.00AM-11.10AM
We will be upgrading plskmage.createhosting.co.nz as part of a series of updates that we are rolling out to all servers, adding PHP 7.0 - PHP 7.3 support. The server will be rebooted as part of the upgrade process, and services will be unavailable for a brief period (5-10 min). Due to the number of changes, some disruption to services such as mail and web is to be expected.
Regarding Server - PLESK MAGENTO
Sat 31/08/2019 20.30-21.30
We will be upgrading plskmage.createhosting.co.nz to Plesk Onyx as part of a series of updates that we are rolling out to all servers. The server will be rebooted a number of times as part of the upgrade process, and services will be unavailable for brief periods (5-10 min) during those times. Due to the number of changes, some disruption to services such as mail and web is to be expected.
Regarding System - Client Billing & Support System
We will be changing our accounting system from Saasu to Xero during the 31 Mar 2019 EOY changeover period. This will involve multiple connected internal systems, as well as our externally facing Client Billing and Support System. Access to our Client Billing and Support System may be restricted or inaccessible during the changeover, which is expected to take a few days.
Please reach out to us directly during this time if you require any service or accounting support.
Regarding Server - PLESK MAGENTO
Customers have reported high load across two servers in our Auckland datacenter. This is affecting mail and web services for the following servers:
This has been escalated to the datacenter staff and is being investigated. It is possible that an SSD drive is failing.
Update 11/03/2019 09:30am: One of the SSDs is showing higher latency than the other drive in the same RAID pair; the drive itself, however, reports as healthy. Dropping it from the array has improved performance. The drive will be tested and reseated before being re-added to the array. Performance may be degraded while this maintenance is under way.
Update 11/03/2019 16:22: The SSD has been tested, reseated, and added back into the RAID array. Performance has been restored.
Regarding System - NZ Data Center
Our monitors have picked up high packet loss at our Auckland datacenter.
This has been escalated to the datacenter staff.
Update Fri, 26 Oct 2018 16:25 UTC: The datacenter network team have identified the issue and are working with their upstream provider to resolve it.
Update Fri, 26 Oct 2018 18:05 UTC: We are still working on this issue.
Update Fri, 26 Oct 2018 18:49 UTC: The datacenter team are working to resolve a second issue. They have multiple engineers on-site. There is no ETA at this time.
Update Fri, 26 Oct 2018 19:21 UTC: The packet loss issue has been resolved. We are continuing to monitor the situation closely.
Update Fri, 26 Oct 2018 21:44 UTC: There are some concerns about the stability of our upstream's network, and we may have packet loss issues again unless we take urgent action. We are bringing forward a planned migration to our upstream's new core network. This will happen at 11:30AM NZST. A 1-5 minute outage is expected as this happens. Once complete, we do not expect any further issues.
Update Fri, 26 Oct 2018 22:46 UTC: The migration to the new core was completed as planned. A few subnets had routing issues, which have now been resolved. We will be closely monitoring the network for further issues.
Update Mon, 29 Oct 2018 00:13 UTC: The network has remained stable since the migration. Please open a ticket if you are seeing any issues at all.
Regarding System - NZ Data Center
At 5:58AM NZST we saw machines at the Auckland datacenter become unreachable.
At around 6:25AM NZST we saw these devices come back online.
The datacenter has informed us that this was caused by a power issue, causing some of their network devices to go offline.
An alert from the fire suppression system caused the three separate supplies servicing those network devices to go offline. Datacenter technicians believe this was due to either a PDU fault or a power spike. The PDUs were replaced and service was restored.
Regarding Other - NZ Data Center
We are currently experiencing high packet loss in Auckland. We are investigating.
Update @ 0300 NZDT: Datacenter staff are still investigating this issue.
Update @ 0320 NZDT: The issue has been resolved.
Update @ 1030 NZDT: The outage was due to a fault on an upstream router. We are seeking further information from the data center.
Update @ 1400 NZDT: We have been advised the failure lay with a core device, on which a line card, route processor and backplane all apparently failed at the same time. The affected device is being repaired under emergency maintenance to restore redundancy to the core network before the weekend. No customer impact from the work is expected, aside from some increased latency while traffic flows reconverge. The full NOC team as well as contract engineers are on site to deal with any issues as they arise.
Regarding System - NZ Data Center
Summary: We will be updating our host servers (requires a restart) to apply Xen security hardening, performance improvements and bug fixes.
Who is affected: Customers running VMs on our Xen hosting stack including VMs on shared hosts (WordPress and Magento Lite customers) and VMs on dedicated hosts.
Detail: Our host server software stacks will be updated to the latest version of our Xen virtualization software, the most recent Intel microcode patches to address Intel CPU security issues, and a more recent Linux kernel for the host. To do this we will be stopping the VMs on host servers, updating the software and restarting the servers. VMs will then start up as normal.
Note that we will not be making any changes to the VM file systems.
The software update process typically takes 20-60 minutes per server if all goes well; 45 minutes is typical. Host servers with many VMs take towards the longer end of that range. We reserve a 2-hour window for the upgrades.
Timing: 6:30AM - 8:30AM NZDT 10 October 2018
The updates will start within 0-2 hours of the stated start time.
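The stop-update-restart cycle described above can be sketched as follows. This is illustrative only: the xl shutdown/create commands are the standard Xen toolstack names, but the package-update step, config paths and VM names are assumptions, not our actual procedure.

```python
# Sketch of the maintenance cycle: stop guests, update the host,
# reboot, start guests again. run() is injected so the ordering can
# be exercised without touching a real Xen host.
def update_host(vms, run):
    for vm in vms:
        run(["xl", "shutdown", "-w", vm])          # stop each guest, wait for clean shutdown
    run(["apt-get", "upgrade", "xen-hypervisor"])  # hypothetical software-update step
    run(["reboot"])                                # host boots the new hypervisor/kernel/microcode
    for vm in vms:
        run(["xl", "create", "/etc/xen/%s.cfg" % vm])  # guests then start up as normal

# Record the command sequence instead of executing it:
calls = []
update_host(["vm13", "vm17"], calls.append)
```

Injecting run() is just a way to show the ordering; on a real host it would wrap subprocess calls.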
Regarding System - NZ Datacenter
Wed, 29 Aug 2018 00:08 UTC: All access should be restored at this point. A report is pending.
Tue, 28 Aug 2018 22:52 UTC: We understand there are a small number of peering routes that may still be affected; this is being investigated.
Tue, 28 Aug 2018 22:19 UTC: Traffic appears to be back to normal. Our upstream confirmed there was an issue, which they believe has been resolved. They are still analysing the event.
We are seeing some connectivity issues to our Auckland location. Investigating.
Regarding Server - PLESK MAGENTO
13/02/2018 14:00 NZDT.
Engineers will be making a critical config change to the XEN stack for security, performance and stability. A series of reboots will be required as part of this work. We apologise for any disruption.
Expected downtime 5 - 30 min.
Regarding Server - PLESK MAGENTO
A server reboot is scheduled for 12/02/2018 NZST at 19:30. Expected downtime 5 min.
Update: 19:40 - Server did not come back up. Investigating.
Update: 20:05 - Server online. Maintenance will be scheduled for tomorrow (13/02/2018) to investigate cause and mitigate.
Regarding Server - PLESK MAGENTO
03/02/2018
We will be performing an update to plskmage.createhosting.co.nz, starting at 0300UTC today (16:00 NZST). The server will be rebooted a number of times as part of the updates. We expect the updates to take 10-15 minutes to complete and test.
Update 16:25 NZST: the server is currently not booting into the new kernel and we are working to resolve the issue as soon as possible
Update 16:40 NZST: the server is now operating normally
Regarding System - NZ XEN HOST SERVERS
Issue
There is an issue affecting all Intel CPUs, referred to in the media as “Spectre” & “Meltdown” or “Kernel Side-Channel Attacks”. These articles provide a good overview:
For customers on bare metal dedicated servers, we will be installing a kernel with the page table isolation patches.
For customers on virtual machines we are awaiting patches for the virtualization software we use per https://xenbits.xen.org/xsa/advisory-254.html. For Managed VPS customers, we have implemented the 4.14 kernel with the page table isolation patches.
The situation is changing day to day. This is a critical issue, and various patches and security updates are being developed to address it. We are closely monitoring the situation and will be doing all we can to ensure the security of our customers.
Update Schedule
10/01/2018 - 11/01/2018
We will be performing updates to all servers on 10/01/2018 - 11/01/2018. These will be scheduled mostly for the morning, starting at 6:45AM NZST.
The software will be updated on the host server. We will stop the VPS's running on that host server, restart the server so that it boots the new kernel, then start up your VPS. This typically takes 20-60 minutes.
12/01/2018
Starting at 0030UTC today (13:30 NZST) we will shut down plskmage.createhosting.co.nz for a short time to make some changes. It will then boot up with a newer kernel. This server runs CloudLinux, which is partly incompatible with the newer Xen stack (introduced due to the Meltdown/Spectre vulnerabilities) without some changes. After the change, the server will be rebooted using a shim that will allow it to boot the latest CloudLinux kernel but using hardware virtualization.
13/01/2018
We will be performing a series of incremental updates to plskmage.createhosting.co.nz, starting at 0030UTC today (13:30 NZST). The server will be rebooted a number of times as part of the updates. We expect the updates to take 2-5 hours to complete and test.
14/01/2018
We will be performing an update to plskmage.createhosting.co.nz, starting at 2300UTC today (12:00 NZST). The server will be rebooted a number of times as part of the updates. We expect the updates to take 10-15 minutes to complete and test.
03/02/2018
We will be performing an update to plskmage.createhosting.co.nz, starting at 0300UTC today (16:00 NZST). The server will be rebooted a number of times as part of the updates. We expect the updates to take 10-15 minutes to complete and test.
Please contact us if you require more information (support@createhosting.co.nz).
Regarding System - All Servers
27/01/2018 NZST: We will be performing a number of critical security updates to all servers, which will require server reboots. Expected downtime 2-5min.
Regarding System - NZ XEN HOST SERVERS
A new Xen Security Advisory (XSA) has been posted that requires us to perform updates to some of our Xen host servers. To do this we need to stop the VMs running on affected hosts, update the host servers and restart the host servers and their VMs.
These security advisories impact only some customers running VMs on some of our Xen-based host servers. They do not impact any customers with 'bare metal' dedicated servers. Your VM will be unavailable during the update. We expect the updates to take 45 minutes (typically) to 2 hours (less commonly) per host.
The XSAs will be publicly released by the Xen project team on Tue May 2 UTC. We must complete this maintenance before then.
host3.createhosting.co.nz - 2017-05-02 06:30:00 [Completed]
host4.createhosting.co.nz - 2017-05-02 06:30:00 [Completed]
host5.createhosting.co.nz - 2017-05-02 06:30:00 [Completed]
host6.createhosting.co.nz - 2017-05-01 06:30:00 [Completed]
host7.createhosting.co.nz - 2017-05-01 06:30:00 [Completed]
host8.createhosting.co.nz - 2017-05-01 06:30:00 [Completed]
host9.createhosting.co.nz - 2017-05-01 06:30:00 [Completed]
Regarding Server - PLESK WP
Sat 09/09/2017 11.00AM-2.00PM
We will be upgrading plskwp.createhosting.co.nz to Plesk Onyx as part of a series of updates that we are rolling out to all servers. The server will be rebooted a number of times as part of the upgrade process and services will be unavailable for a brief period (5-10min) during those times. Due to the number of changes, some disruption to services such as mail and web is to be expected.
Regarding System - NZ Data Center
Mon, 21 Aug 2017 21:55 UTC: We are working with our Auckland provider to add additional redundancy for Australian/Sydney traffic.
Mon, 21 Aug 2017 08:20 AM UTC: Network is stable.
Mon, 21 Aug 2017 05:55 AM UTC: There was a further network blip for Auckland when the affected router was restarted.
Mon, 21 Aug 2017 05:25 AM UTC: A line card was replaced upstream.
Mon, 21 Aug 2017 05:03 AM UTC: Sydney traffic should be back to normal; the offending source has been isolated and traffic is now on alternate routes.
Regarding System - NZ XEN HOST SERVERS
A new Xen Security Advisory (XSA) has been posted that requires us to perform updates to some of our Xen host servers. To do this we need to stop the VMs running on affected hosts, update the host servers and restart the host servers and their VMs.
These security advisories impact only some customers running VMs on some of our Xen-based host servers. They do not impact any customers with 'bare metal' dedicated servers. Your VM will be unavailable during the update. We expect the updates to take 45 minutes (typically) to 2 hours (less commonly) per host.
host3.createhosting.co.nz - 2017-03-28 06:30:00 [Completed]
host4.createhosting.co.nz - 2017-03-29 06:30:00 [Completed]
host5.createhosting.co.nz - 2017-03-30 06:30:00 [Completed]
host6.createhosting.co.nz - 2017-03-31 06:30:00 [Completed]
host7.createhosting.co.nz - 2017-04-01 06:30:00 [Completed]
host8.createhosting.co.nz - 2017-04-02 06:30:00 [Completed]
host9.createhosting.co.nz - 2017-04-03 06:30:00 [Completed]
Regarding System - Auckland, NZ DC
This notice is a heads-up about low-impact work planned as part of regular power maintenance at the Auckland datacenter.
Event 1: UPS Maintenance & Generator Test
We will be performing UPS maintenance and a battery replacement on UPS B, followed by a generator test. No impact is expected during this time.
Maintenance Window Start : Wed 18th Jan 2017 10:00AM NZDT
Maintenance Window End : Wed 18th Jan 2017 05:00PM NZDT
Duration: 7 Hours
Event 2: Vector Power Maintenance
Vector (power supplier) will be performing routine scheduled testing of their equipment. During this time HDDC will remove itself from the grid and run on generator power. No impact is expected during this time.
Maintenance Window Start : Sat 21 Jan 2017 08:45AM NZDT
Maintenance Window End : Sat 21 Jan 2017 1:00PM NZDT
Duration: 4h25m
Regarding Server - PLESK WP
We will be rebooting plskwp.createhosting.co.nz just after 17:00 on Jan 17th 2017. Expected downtime 5-10min.
Regarding System - UK Data Centre
Upgrades to core switching will be performed at the London datacenter. There will be two 1 hour windows starting at the times below.
During these windows there will be 2-5 small outages of 15-60 seconds as traffic is moved around between devices. No changes will be required on customer servers.
Window 1 starts Friday, 9 December 2016 12:00 Midnight UTC:
http://www.timeanddate.com/worldclock/fixedtime.html?iso=20161209T0000
Window 2 starts Monday, 12 December 2016 12:00 Midnight UTC:
http://www.timeanddate.com/worldclock/fixedtime.html?iso=20161212T0000
Regarding System - NZ Data Centre
We are seeing packet loss to servers in Auckland. Investigating and will post updates here.
Update: the data center suspects a faulty line card. Swapping over to secondary connections.
Update: the issue seems resolved for now. Additional work may need to be scheduled for Friday night NZT or Sunday before closing this; some investigation is required first.
Regarding System - NZ Data Centre
We have had several reports of DNS issues relating to Auckland servers. We are investigating with urgency. The issue has also been escalated to our upstream provider.
Update: Additional testing indicated one of the upstream resolving name servers was having issues. After that was fixed, things gradually cleared. The issue has been resolved and we'll be taking steps to ensure it is mitigated against future occurrences.
Regarding Other - NZ Data Center - HOST3
A host server is experiencing unexpected high load due to disk activity. This will be affecting sites on the following servers:
plskmage.createhosting.co.nz
plskwp.createhosting.co.nz
westmere.createhosting.co.nz
vm13.createhosting.co.nz
vm17.createhosting.co.nz
vm19.createhosting.co.nz
vm20.createhosting.co.nz
This should be resolved within 10 min, as the process completes. We apologise for the inconvenience.
Update: resolved
Regarding Other - UK Data Center
We are seeing servers in London not pinging.
Update at UTC 0922: We have contacted the data center. They are aware of the issue and are working on the problem. No details as of yet.
Update at UTC 0934: The data center are reporting a power issue. We are working to get further details.
Update at UTC 0943: The data center reports there is one part of the data center with a power issue (not sure of details yet). This is setting off the fire alarms. At this time there appears to be no fire. The data center are working to provide further details.
Update at UTC 1059: Most servers have come back online, some have not. We are chasing up the data center to get those back up and running ASAP. MySQL replication resync where applicable will be scheduled to minimise downtime.
Update UTC 0345: The failed access switch has been replaced. All servers are responding.
Please note: due to the recent earthquakes in NZ, calls to the Create Hosting emergency number may not get through. Please leave a message.
Regarding Other - All Servers
A bug exists in Linux kernels that allows users to escalate their privileges: for example, a regular user account can become root, as can an exploitable web app (such as an older version of WordPress or Magento).
For more information about the issue see: https://dirtycow.ninja/
To resolve the issue, we will need to install an updated, patched kernel. We will systematically upgrade all servers which will require a restart of your server with the fixed 4.4.26 (or similar) kernel. Please expect a short downtime of a few minutes while this is performed.
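After a host reboots, a quick way to confirm it picked up the fixed kernel is to compare the `uname -r` release string against the target version. A minimal sketch, with the 4.4.26 threshold taken from the notice above; note that distributions often backport the Dirty COW fix into older-numbered kernels, so this is a first approximation, not a vulnerability test.

```python
import platform
import re

def kernel_at_least(required, release=None):
    """Compare the numeric prefix of a kernel release string
    (e.g. "4.4.26-1-generic") against a (major, minor, patch) tuple.
    Rough check only: backported fixes won't show up in the version."""
    release = release or platform.release()
    m = re.match(r"(\d+)\.(\d+)\.?(\d*)", release)
    parts = tuple(int(x or 0) for x in m.groups())
    return parts >= required

# e.g. against the patched kernel named in the notice:
kernel_at_least((4, 4, 26), "4.4.26-generic")     # True
kernel_at_least((4, 4, 26), "3.13.0-24-generic")  # False: needs the update
```

Called with no release argument, it checks the kernel of the machine it runs on.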
Regarding System - NZ / UK Servers
We have received a critical Xen Security Advisory (XSA) that requires updates to be performed to our Xen host servers. To do this we need to stop the VMs running on these hosts, update the host servers and restart the host servers and their VMs.
These security advisories impact all customers running shared services or VPS's on our Xen-based host servers. They do not impact any customers with 'bare metal' dedicated servers.
Your email and web services will be unavailable during the update. We expect the updates to take 45 minutes (typically) to 2 hours (less commonly) per host.
Scheduling
NZ Datacentre 1: 2016-09-03 07:00:00 - 08:59 NZT
NZ Datacentre 2: 2016-09-04 07:00:00 - 08:59 NZT
UK Datacentre 1&2: 2016-09-04 10:00:00 - 11:59 NZT
Where possible we will attempt to schedule updates at times that are off peak at the host server's location. Due to the logistical demands of the update, the schedules may not be ideal for all customers; unfortunately, we have little choice.
The XSAs will be publicly released by the Xen project team on Sep 8. We must complete this maintenance before then (before the vulnerabilities become public).
We apologise for the inconvenience, and are committed to ensuring the utmost security.
Regarding System - NZ / UK Servers
We have received a critical Xen Security Advisory (XSA) that requires updates to be performed to our Xen host servers. To do this we need to stop the VMs running on these hosts, update the host servers and restart the host servers and their VMs.
These security advisories impact all customers running shared services or VPS's on our Xen-based host servers. They do not impact any customers with 'bare metal' dedicated servers.
Your email and web services will be unavailable during the update. We expect the updates to take 45 minutes (typically) to 2 hours (less commonly) per host.
Scheduling
NZ Datacentre: 2016-07-23 02:00:00 - 09:00 NZT (Completed)
UK Datacentre: 2016-07-23 10:00:00 - 11:59 NZT
Where possible we will attempt to schedule updates at times that are off peak at the host server's location. Due to the logistical demands of the update, the schedules may not be ideal for all customers; unfortunately, we have little choice.
The XSAs will be publicly released by the Xen project team on July 26th. We must complete this maintenance before then (before the vulnerabilities become public).
We apologise for the inconvenience, and are committed to ensuring the utmost security.
Regarding System - Multiple Servers
We will be implementing updates for PayPal TLS 1.2 requirements, which will require server restarts. Downtime expected to be 15-30min. This will be performed between 17:30 - 23:59 on 17/07/2016 NZST.
Regarding System - NZ Datacenter
Several VPS's failed to respond for a few minutes. The DC staff provided the following details:
International network performance was degraded for a period of several minutes as a host on our network was subjected to an international DDoS attack.
Unfortunately our automated systems did not automatically block this attack; this will be investigated and resolved.
The connection has been fully restored now.
Regarding System - London DC
We are experiencing light packet loss at our London datacenter. Slow connections or even timeouts might be expected.
We will update this ticket as soon as we receive details from our datacenter staff.
Update: This seems to have resolved itself and we will continue to monitor closely.
Regarding System - NZ & UK Servers
Issue: There are some security issues affecting Xen hosts (and thus the VPS's that run on those hosts) per http://xenbits.xen.org/xsa/
The details are embargoed, meaning that the details of the issue cannot be made public. Our upstream datacenter is on a 'predisclosure' list, so we are privy to the details of those issues (and the fixes). However we are not permitted to disclose any details to our customers (while the issues remain embargoed).
Impact: This affects VPS's hosted with us (including those on dedicated host servers). It excludes regular dedicated servers not running the Xen hypervisor.
Course of action: We will be applying those fixes. To do that we will stop all VPS's on a Xen host, install the updated packages and restart the host server. VPS's will come up after the reboot. Typically it will take 20-60 minutes of downtime to complete this process.
Timing: Because of the nature of the problem we will be doing restarts around the clock on affected hosts. We will attempt to do those, where possible, outside of business hours at the host server data center location. Updates are scheduled at the following times:
NZ Servers: 15/12/2015 9PM NZST
UK Servers: 17/12/2015 9PM GMT
Regarding System - NZ & UK Servers
Summary: Over the period of October 22nd - 24th, at 6PM nightly NZT (+0 to 2hrs), we will be updating our host servers (requires a VM restart) to address a Xen security issue before it becomes public on October 29.
Customer action required: None
Who is affected: All customers, all services - including Magento Optimised VPS's on our Xen hosting stack (NZ and UK).
Detail: The Xen project report an XSA-148 security issue. Details of the issue and fix are available only to the Xen security pre-disclosure list. Details of the advisory and any fix are embargoed until October 29 to anyone (including our customers) not on that list. We are unable to provide further details about the nature or severity of the issue until after that date (when the issue becomes public).
Our host server software stacks will be updated to address this issue. To do this we will be stopping the VMs on host servers, updating the software and restarting the servers. VMs will then start up as normal.
Note that we will not be making any changes to the VM file systems (the issue is with the hypervisor software, not the VM).
The software update process typically takes 20-60 minutes per server if all goes well; 45 minutes is typical. Hosts with many VMs tend toward the longer end of that range. We reserve a 2 hour window for the upgrades.
Regarding System - HOST3
12 Oct 2015 22:15: We will be performing an LVM snapshot across all VM's on this HOST server. While no downtime is expected, there is a risk of increased IO which may briefly impact performance. Expected duration 20min.
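For context, an LVM snapshot freezes a point-in-time copy of a logical volume without stopping the VM; the increased I/O comes from copy-on-write, since every write to the origin volume first copies the old block into the snapshot. A generic command sketch (the volume-group and LV names here are invented, not the actual host3 layout):

```shell
# Create a snapshot of one VM's logical volume (names are illustrative).
lvcreate --snapshot --size 10G --name vm13-snap /dev/vg_host3/vm13-disk

# Show origin and snapshot usage; snapshot fill-rate indicates write load.
lvs vg_host3

# Remove the snapshot once the backup/copy has been taken.
lvremove -f /dev/vg_host3/vm13-snap
```

These commands need root and are a sketch of the technique, not our exact procedure.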
Regarding Server - PLESK MAGENTO
29/09/2015 NZT: plskmage.createhosting.co.nz is scheduled for a reboot to complete an LVM disk resize. Estimated downtime 5 min.
Regarding System - London Data Centre
We are seeing multiple VPS's down at the London datacenter. We are investigating.
Update Wed, 23 Sep 2015 09:44 AM UTC: The datacenter network team reports they had a partial outage to a small part of their network. Servers are coming back online now.
Update Thu, 24 Sep 2015 02:00 AM UTC: The datacenter team reports that the root cause was a DDoS against that part of the network. During the DDoS an affected router did not fail over properly, causing a small outage. The problem with the router fail over has been corrected and should not reoccur.
Regarding System - NZ DATA CENTRE
05/08/2015 14:45 NZST: We will be briefly restarting the following servers to detach, resize, and then reattach backup storage devices.
plskmage.createhosting.co.nz
plskwp.createhosting.co.nz
Normally this does not require a restart; however, in this case we hit an issue with hotplug which requires us to perform a manual reboot.
Downtime is expected to be around 2-3 min per restart.
Regarding Server - PLESK WP
01/08/2015 (20:00 - 22:00 NZT) - We will be moving all sites on plskwp.createhosting.co.nz to our new platform based on SSD. Expected downtime is 1h15min.
Regarding Other - NZ Data Centre
Monday, 3 August 2015 10:00 p.m - Tuesday 4am NZT
Our network provider is making some configuration changes to network feeds that may be visible via a few short periods of packet loss to some services. Customer-facing impact should be limited to one or two traffic disruptions for RSTP reconvergence, lasting only a few seconds each.
Tuesday, 11 August 2015 1:00 a.m. - 3am NZT:
Work is being carried out physically near our core network feed to move some large network gear. No service impact is expected as the work does not impact any device actually carrying our traffic. However it should still be considered as an at-risk window for the duration of the work.
------------------------------------
Monday, 27 July 2015 10:00 p.m - Tuesday 4am NZT
As part of ongoing efforts to improve network reliability, our network provider is making some hardware changes to its network. This stage will involve the repositioning of selected physical links on the core network. Due to our redundant network design, we do not expect that any of our services will be affected, however the network should be considered at risk during this window.
Update: This maintenance was completed without major issues.
Regarding Server - WESTMERE
Mon, 13 Jul 2015 20:39 UTC: The issue has been resolved. This affected all servers at the Auckland datacenter and would have caused loss of connectivity via international and domestic routes. No changes to individual servers were required to restore connectivity.
We have the following report from the datacenter, there was some planned maintenance that should not have caused any issues:
We have been preparing the final stages of upgrades to our core network, work to complete the second part of the migration to the new Cisco platform was completed successfully last Monday night. At 19:27 as we were preparing for the final stage of the migration due to kick off at 22:00, our network engineers accidentally pushed a configuration change that adversely affected two of the old core devices that were scheduled to be retired tonight.
This inadvertently resulted in some datacentre customers' IP transit services being unavailable.
Remedial work and post Incident Action Items:
Further fault diagnosis is being undertaken with the engineers responsible.
We are confident that the upgrades we are performing to our network going forward will enhance overall resiliency and redundancy.
We are disappointed that this has happened, as our core network uptime over the past 12 months has been excellent; this event has fallen short of our high standards of service delivery to you, our customer.
Update Mon, 13 Jul 2015 08:36 AM UTC: Servers are responding again, we are awaiting details about the issue
Update Mon, 13 Jul 2015 08:13 AM UTC: Problem has been detected at the edge network core systems, engineers are working on the issue, there is no ETA as yet.
Update: Datacenter engineers picked up on the issue and are working on it as we speak.
We noticed some servers were unreachable in NZ data center. We're investigating.
Regarding System - Auckland datacentre
Update: The datacenter reported a problem with one of the upstream providers; it was solved quickly and the network returned to normal.
Thu, 16 Jul 2015 02:02 AM UTC: We are experiencing some connectivity issues in Auckland. We are investigating.
Regarding System - London Data Centre
Sat, 20 Jun 2015 13:44 UTC: We have picked up network connectivity issues to some of our data centres. We are currently investigating and will update as soon as we have more info.
Sat, 20 Jun 2015 14:29 UTC: We have detected a DoS attack affecting some servers within our networks. Engineers in various data centres are working to mitigate it.
Sat, 20 Jun 2015 14:40 UTC: Affected IP/s have been null routed to resolve the issues. Our London network also appears to be recovering.
Sat, 20 Jun 2015 22:37 UTC: The null route was lifted; however, the attack resumed, causing a brief outage. We are investigating how to mitigate the issue. The network should be up again; the null route has been reinstated.
Sun, 21 Jun 2015 00:15 UTC: The attack seems to be directed at ns1.zonomi.com, which has been null routed. ns3.zonomi.com has been null routed too for the time being by the data center, but we are working to see if it is possible to remove that null route. We are investigating alternatives to try to debug and mitigate the issue.
Sun, 21 Jun 2015 05:04 UTC: ns1 and ns3 continue to be null routed; we continue to investigate the issue.
Sun, 21 Jun 2015 05:50 UTC: ns1 is back in service. The attack has abated.
Status: Zonomi is the target of a denial of service attack. We are working to mitigate the issue. Only some of its name servers are affected.
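Null routing, as used above, tells the router (or host) to silently drop all traffic to the attacked address so the rest of the network stays reachable. A generic Linux sketch using iproute2; the address is from the TEST-NET-3 documentation range, not the real target, and the commands need root:

```shell
# Drop everything destined for the attacked address.
ip route add blackhole 203.0.113.10/32

# Confirm the blackhole route is installed.
ip route show 203.0.113.10/32

# Lift the null route once the attack abates (as in the updates above).
ip route del blackhole 203.0.113.10/32
```

Upstream providers often do the equivalent with BGP blackhole communities rather than on a single box.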
Regarding System - Auckland Data Centre
Mon, 8 Jun 2015 10:21 AM NZST: We have been advised that there is an ongoing DDoS attack affecting international transit levels. Additional capacity has been enabled to restore service while engineers work on mitigating the attack. Connectivity appears to be normal again at this stage.
We are currently investigating an issue where packet loss is occurring on some international routes to Auckland servers. National traffic is not affected as far as we know.
Regarding System - Auckland Data Centre
In order to mitigate an issue identified in edge switching equipment, the Auckland datacenter network team will be migrating ports and Layer 3 interfaces associated with some services during the maintenance window below. This should cause little to no impact to end customers.
Maintenance Window: 2015-06-03 02:00-05:00 NZST
Regarding System - NZ Data Centre
We have been advised of work by our network upstream to update security capabilities to their network core:
"As part of our continuous improvement programme, we have a requirement to deploy a service pack upgrade to our aggregation network devices as instructed by our vendor. This requires a reload and as such there will be a small outage to your services* while the nodes are reloaded."
The following schedule has been advised for each task.
Change 1: 3 May 2015 23:00 NZST - 3 May 2015 05:00 NZST
Change 2: 3 May 2015 23:00 NZST - 4 May 2015 05:00 NZST
Regarding System - All NZ & UK VPS's
Issue: There are some security issues affecting Xen hosts (and thus the VMs that run on those hosts) per http://xenbits.xen.org/xsa/.
The details are embargoed, meaning that the details of the issue cannot be made public. We are on a 'predisclosure' list, so we are privy to the details of those issues (and the fixes). However we are not permitted to disclose any details to our customers (while the issues remain embargoed).
Impact: This affects VPS's hosted with us. It excludes regular dedicated servers not running the Xen hypervisor (bare metal).
Course of action: We will be applying those fixes. To do that we will stop all VPS's on a Xen host, install the updated packages and restart the host server. VPS's will come up after the reboot. Typically it will take 20-60 minutes of downtime to complete this process.
Timing: Because of the nature of the problem we will be doing restarts around the clock on affected hosts. We will attempt to do those, where possible, outside of business hours at the host server data center location.
We will be updating this notice with more information as our planning and the rollout of the fixes progresses.
Update @ UTC Mar 4 1320: We are scheduling host servers for the upgrade. We will email you when your VPS is added to the schedule. Please bear with us as we attempt to get these hosts patched as quickly as possible.
Regarding Server - PLESK WP
23/2/2015 14:35: The server will need to be rebooted in order to mount and dismount a backup image. We anticipate a small downtime window of 5 minutes during this time.
Regarding System - All NZ and UK servers
We will be systematically updating and rebooting all NZ and UK VPS and dedicated servers over the next few hours to implement critical security updates. This relates to the CVE-2015-0235 glibc gethostbyname 'GHOST' vulnerability. Anticipated downtime is 5-10min, depending on the time to reboot. Please notify us at support@createhosting.co.nz if you would like to schedule your VPS reboot for after hours.
Update 04:00 30/01/2015: Maintenance completed.
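For background, GHOST was a buffer overflow in glibc's gethostbyname routines: upstream glibc 2.2 through 2.17 were affected and 2.18 shipped the fix, but distros such as CentOS backported the fix to older version numbers. A hedged sketch of that version test, as a rough indicator only:

```python
def ghost_vulnerable(glibc_version):
    """Rough indicator only: GHOST (CVE-2015-0235) affected upstream
    glibc 2.2 through 2.17, and 2.18 shipped the fix. Distros backport
    fixes without changing the version number, so True here means
    'check your distro's security advisory', not 'definitely exploitable'."""
    major, minor = (int(x) for x in glibc_version.split(".")[:2])
    return (2, 2) <= (major, minor) <= (2, 17)

# e.g. stock upstream glibc versions:
ghost_vulnerable("2.12")  # True  (CentOS 6 era, before the backported patch)
ghost_vulnerable("2.19")  # False (released after the fix)
```

In practice the reboot onto a patched build, as performed above, is the real fix; the check only flags hosts worth inspecting.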
Regarding System - London DC
We're noticing some packet loss in our London data center. Currently affecting HOST2 and HOST3. Investigating the issue.
Update: 20:43 NZT / 07:43 UTC - There was a DoS (denial of service) issue with the network that has now been resolved.
Regarding System - Auckland Data Centre
We're experiencing some high packet loss on multiple servers in our NZ data center. We're investigating now and will update this notice with further information as we receive it.
Update: 0855UTC - Network is back to normal. Awaiting root cause analysis from DC.
Regarding System - vm9,vm11,vm13,vm13-w1,vm13-w2,vm13-d1,vm16,vm17,vm18
Reported: VPS's on the host3.createhosting.co.nz host server are reporting high I/O. Investigating.
Update: 10:00AM NZST - A failing SSD has been hot-swapped out and the new drive is now syncing with the RAID array.
Update: 11:30AM NZST - We are still getting reports of fluctuating I/O on VPS's for this host.
Update: 13:00 NZST - We are looking at options for replacing all of the existing enterprise Samsung SSD's, which appear to be causing the issues.
Update: 13:30 NZST - We will be replacing the existing enterprise Samsung SSD's with enterprise Intel SSD's. A maintenance window and timeframe will be advised as soon as we have further information.
Update: 15:00 NZST - The first new Intel SSD is installed and the RAID array is syncing. I/O issues should now be resolved, but performance will be slightly degraded until the sync has completed. We expect that to take 24-48 hours.
Update: 24/12/2014 NZST - The last SSD is now installed and fully synced.
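While an array rebuilds after a drive swap like this, progress can be watched from the host. An illustrative sketch assuming a Linux software (md) RAID array named /dev/md0; actual device names and tooling depend on the host's RAID setup:

```shell
# Show rebuild/resync progress and the estimated time to completion.
cat /proc/mdstat

# Per-device detail for one array (state, sync percentage, failed disks).
mdadm --detail /dev/md0
```

The resync throughput (and hence the degraded-performance window) can also be tuned via the sysctls dev.raid.speed_limit_min/max.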
Regarding System - Evolo Outgoing Mail Server
Reported: Issues logging in to evolomail.com - redirect loop. Mail sending OK.
Update: Issue is now resolved.
Regarding Server - WESTMERE
Update Sat, 27 Sep 2014 18:05 UTC: Issue now resolved.
Update Sat, 27 Sep 2014 17:55 UTC: One IP range (103.6.213.0/24) still seems to be having issues. We have contacted our datacenter to get that resolved.
Update Sat, 27 Sep 2014 17:46 UTC: All servers are back up. If you are having any issues, please let us know.
Update Sat, 27 Sep 2014 17:25 UTC: We are seeing a large number of hosts down after this network maintenance; we are investigating.
The Auckland data center is making some changes to their core network setup at the Albany data center this Saturday the 27th September from 10PM NZST till 5AM NZST 28th September.
They have been planning and implementing these changes for the past few weeks. The changes will require no more than a few 2 to 15 minute outages between 10PM and 5AM.
host-uk.createhosting.com has a failing SSD drive, which will need to be replaced. This particular server does not feature hot swap, so we will require some downtime for this maintenance.
This affects the following VPS's
V1-V6
This has been scheduled for Sat, 27 Sep 2014 01:00 AM UTC. Staff in London data center will take the server offline for an estimated 20-40 minutes. We will update this notice with any news we have.
Our London datacenter will be performing scheduled service on the UPS units that back the A power feed to the datacenter. This is a standard service that is carried out to verify that the UPS is in good operating shape.
"Works are not expected to affect service; however, whilst one of the units is under maintenance there will be reduced power redundancy and an At Risk declaration. Customers will remain on UPS-protected supplies with generator backup throughout the maintenance periods, protecting against interruption to the mains supply.
Power to the A feed is not expected to be interrupted at any time as the building infrastructure allows us to transparently switch the input source for the A feed to the B side, throughout the A side maintenance."
This has been scheduled for Tuesday 23rd September, commencing at 2200Hrs and ending by 0500Hrs on Wednesday 24th September 2014. (GMT)
Regarding Other - NZ Data center
We are experiencing networking issue at the NZ Data center. We are investigating and will update this notice when we have more information.
Update: Mon, 26 May 2014 12:23 PM UTC: We are currently seeing severe packet loss for international connections to NZ. We are working with our datacenter to diagnose the source of that loss.
Update: Mon, 26 May 2014 12:42 PM UTC: We identified a server that was flooding our international connectivity. That server has been removed from the network pending further investigation. Normal network service should be restored now.
Regarding Other - AUCKLAND NZ DC
We are currently seeing some network congestion on our upstream connection. This is causing some users to experience slower than normal connections, and in some cases a bit of packet loss.
We are working with our upstream provider to resolve this asap, and restore normal service.
Update: Tue, 13 May 2014 05:17 AM UTC: Our upstream has confirmed there is some congestion, particularly for national (NZ) users. They have added extra capacity to reduce the impact of that traffic, which appears to have resolved the issue with our current traffic levels.
Regarding Other - Managed VPS (Magento Optimised) + Managed VPS (w/ Plesk Option)
The Heartbleed Bug is a serious vulnerability in the popular OpenSSL cryptographic software library. CVE-2014-0160 (Common Vulnerabilities and Exposures) is the official reference to this bug. It can result in private keys (e.g. SSL keys) being exposed.
Further reading
http://heartbleed.com/
http://arstechnica.com/security/2014/04/critical-crypto-bug-in-openssl-opens-two-thirds-of-the-web-to-eavesdropping/
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-0160
Distro specific information
Debian: https://security-tracker.debian.org/tracker/CVE-2014-0160 (versions prior to Debian 7 Wheezy are unaffected)
Centos: http://lists.centos.org/pipermail/centos-announce/2014-April/020248.html (all versions prior to 6.5 are unaffected)
Ubuntu: http://askubuntu.com/questions/444702/how-to-patch-cve-2014-0160-in-openssl (all versions prior to 12.04 are unaffected)
What versions of OpenSSL are affected?
Status of different versions:
• OpenSSL 1.0.1 through 1.0.1f (inclusive) are vulnerable
• OpenSSL 1.0.1g is NOT vulnerable (this is the bug fix released 7th of April 2014)
• OpenSSL 1.0.0 branch is NOT vulnerable
• OpenSSL 0.9.8 branch is NOT vulnerable
The bug was introduced to OpenSSL in December 2011 and has been out in the wild since the OpenSSL 1.0.1 release on 14th of March 2012. OpenSSL 1.0.1g, released on 7th of April 2014, fixes the bug. This means the bug has been in the wild for over 2 years but is only now becoming widely known, so all VPS servers need to be tested.
Version checking
Online checking site here: http://filippo.io/Heartbleed
To see which openssl version you are using, run the command:
openssl version
Check whether the version is an old (unaffected) one, or has an April 2014 build date (even if the version string does not match a fully fixed release, vendors sometimes backport just the specific patch).
For a completely accurate test use the command line tool here
https://github.com/FiloSottile/Heartbleed
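As a rough illustration of the version ranges above, the affected range can be matched with a simple shell check. This is a sketch only: vendors often backport the fix without changing the version string, so confirm with `openssl version -b` (build date) or the test tools linked above.

```shell
# Sketch: classify an OpenSSL version string against the Heartbleed-affected
# range (1.0.1 through 1.0.1f inclusive). Illustrative only - a patched
# distro build may still report a "vulnerable" version string.
is_heartbleed_affected() {
  case "$1" in
    1.0.1|1.0.1[a-f]) echo affected ;;
    *)                echo "not affected" ;;
  esac
}
is_heartbleed_affected 1.0.1e   # affected
is_heartbleed_affected 1.0.1g   # not affected
is_heartbleed_affected 0.9.8y   # not affected
```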
Scheduled Maintenance
OpenSSL will be patched on all affected servers. All services linked against the vulnerable version of OpenSSL (ironically, the newer one) will need to be restarted to apply the newly patched OpenSSL version. These services include Apache, PHP-FPM and mail services (imap4-ssl, pop3-ssl, smtp, smtp2). There will be a very brief service interruption as these services are restarted. We will perform a clean shutdown and VPS restart to ensure that all services are restarted.
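The restart pass can be sketched as below. The service names are assumptions for a typical Debian-style Managed VPS, not our exact configuration, and with DRY_RUN=1 the function only prints the plan rather than restarting anything.

```shell
# Hedged sketch of restarting every service linked against the old OpenSSL.
# Service names are illustrative; set DRY_RUN=0 (as root) to actually restart.
DRY_RUN=1
restart_ssl_services() {
  for svc in apache2 php5-fpm courier-imap-ssl courier-pop-ssl postfix; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "would restart: $svc"
    else
      service "$svc" restart
    fi
  done
}
restart_ssl_services
```

A clean VPS reboot achieves the same end state, which is why we opt for that: it guarantees no long-running process is still holding the old library in memory.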
Update: 09/04/2014 13:00 (NZST)
All Managed VPS servers have been checked and patched where affected. For all SSL certificates purchased through Create Hosting Ltd, we will automatically re-key those SSL's and reinstall them on the VPS. All you will need to do is approve the SSL issuance request (once you receive the email from GeoTrust). Due to the large number of SSL's to re-issue, please be advised that it may be a few days before your certificate is re-keyed and installed.
Regarding System - Varnish / Cluster
We are currently experiencing an issue with our NZ based HOST3 server. This is affecting all Varnish based servers and clusters:
vm11.createhosting.co.nz
vm13.createhosting.co.nz
vm13-d1.createhosting.co.nz
vm13-w1.createhosting.co.nz
vm13-w2.createhosting.co.nz
Technicians are working on this now and hope to have it resolved ASAP.
Update: 18:15NZT We have identified a bug in XEN (http://xen.crc.id.au/bugs/view.php?id=25). We will perform a software upgrade on the host server to address this and restart each VPS. We anticipate this to take 30 minutes.
Update: 18:52NZT All services back online
Root cause analysis:
It looks like the host server got hit by a bug that we have recently discovered in the host OS. This affects secondary storage on VPS's, and is described in detail on http://xen.crc.id.au/bugs/view.php?id=25
Adding a new VPS to the host triggered this rare event, as the host server was under extra load, which is consistent with the known issue. Engineers had already tested a fix to our hosting stack that resolves this. Installing the fix meant stopping all applicable VPS's and rebooting the host.
For stability purposes we made the call to go ahead and do that whilst the filesystem was already locked up and VPS's offline. The VPS's were cleanly shutdown and technicians started the upgrade process. There was a fair amount of work involved for that, so it did take a while to complete. The host is of course back online and all VPS's are operating as normal with the fix in place.
Regarding Server - WESTMERE
Technicians are currently investigating an outgoing mail issue that is preventing queued mail from being sent. Once we have cleared the issue, the mail will be automatically sent. We hope to have the issue sorted as soon as possible.
Update: This issue has now been resolved and queued mail is now being delivered. This took a while to fix, as all mailboxes had to be disabled in order to rebuild all configuration. We will investigate further to mitigate future occurrences.
Regarding System - Auckland Data Center (NZ)
Update: Thu, 9 Jan 2014 04:11AM UTC: Resolved
Update: Thu, 9 Jan 2014 04:01 AM UTC: Seeing this packet loss recur. Investigating.
Update: Thu, 9 Jan 2014 01:12 AM UTC: We have been advised that the initial issue was caused by a BGP routing problem with one of the datacenter's upstreams. The fix was to drop that upstream until the issue was resolved. The more recent packet loss was caused by trying to reinstate that upstream, however the problem continued and that peer was dropped again.
Update: Thu, 9 Jan 2014 00:29 AM UTC: Seeing further packet loss, particularly affecting international traffic, investigating.
Update: Thu, 8 Jan 2014 22:00 PM UTC: Waiting on an incident report from the datacenter.
The Auckland datacenter is experiencing some network issues (e.g. high packet loss, intermittent connections, etc). We will investigate and ask the datacenter staff to provide a detailed report.
Regarding System - Auckland Data Center (NZ)
On 19 Nov 2013 at 8-10pm NZDT we are planning an update to our core network which will remove an intermediate routing device and improve our main feed redundancy. That will involve a staggered move of our redundant core uplinks to different routers at the datacenter.
The impact of the change should be minimal, you may see a couple of brief (<1min) periods of packet loss as routes fail across our redundant feeds and stabilize. The longer window is to allow for time to test after each uplink is moved.
Benefits will include transparent upgrade capabilities and adding the final planned layer of physical network redundancy within the datacenter. We also anticipate seeing slightly less connection latency after the move.
If you have any questions please get in touch with our support team. Note that due to the requirements of this work we are not able to reschedule.
UPDATE: Tue, 19 Nov 2013 21:03 PM UTC: This work was successfully completed.
Regarding System - London Data Center
At 1335GMT, London experienced a routing issue that caused an outage in our provider's own aggregation core router that is hosted with the datacenter. Not all customers were affected by this outage, only the ones that are routed via the core router.
Datacenter technicians suspected that the issue was with the core router, so as a preventative measure it was rebooted. Nothing appears to be wrong with that router however, and it is working perfectly fine.
It was later discovered by datacenter engineers that this issue was a routing glitch caused by the internal routing at the datacenter and upstream providers, and was not related to the core router.
Update 1420GMT: The issue has now been solved. We apologise for the inconvenience caused and we are working with the datacenter to mitigate future issues like this.
Regarding System - London Data Center(s)
Network techs will be conducting emergency maintenance on UK transit networks that will affect most if not all London based servers.
Impact is expected to be minimal, but you may notice a few brief periods of stalling (packet loss) during this window.
The work will occur between 05:00 - 07:30 GMT on 13/11/2013.
Update @1345GMT: the connection to London seems to be fluctuating. We are investigating the issue now.
Update: the connection issue is unrelated to the scheduled emergency maintenance - moved to a separate issue
Regarding System - VPS Host Server (host2.createhosting.co.nz)
As part of a network upgrade and expansion, the VPS Host Server for vm1.createhosting.co.nz - vm13.createhosting.co.nz will be reconfigured, shut down and restarted. All Magento Optimised VPS's on this server will be systematically shut down and restarted.
We have allocated a 1 hour window for this work to cover the worst case scenario; however, we anticipate it will only take around 10 minutes.
Update: this has now been completed.
Regarding System - Auckland Data Center
We are noticing issues at the Auckland datacenter which is resulting in severe packet loss. We are looking into this now.
Update: Fri, 25 Oct 2013 00:07 AM UTC: This appears to be resolved now, we are still researching the cause.
Update: Fri, 25 Oct 2013 00:08 AM UTC: A DDoS targeted at a server on our network was found, and the affected server null routed to restore normal service.
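Null-routing works by dropping all traffic to the attacked address at the router, sacrificing reachability of one server to keep the rest of the network responsive. A minimal sketch of the Linux form of that mitigation follows; the address is a documentation placeholder, and applying the route needs root, so this helper only constructs the command.

```shell
# Sketch: build the Linux null-route (blackhole) command for a target IP.
# 203.0.113.45 is a placeholder address; running the output requires root.
blackhole_cmd() {
  echo "ip route add blackhole ${1}/32"
}
blackhole_cmd 203.0.113.45
```

Removing the route again (`ip route del blackhole …`) restores service once the attack subsides.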
Regarding System - Auckland Data Center
As part of our ongoing work to make our network more reliable, several network ranges need to be migrated from an upstream router to a redundant pair of routers on our provider's own core.
Benefits of this will include a more reliable response to device failure, and better network usage monitoring for those ranges.
Visible impact of the configuration change is expected to be minimal, with only a few seconds of outage while arp caches update.
We have scheduled a 1 hour window for this work, between 8pm-9pm NZ on Tuesday 15th October (7-8am UTC).
Update @16 Oct 0151 UTC: This work cannot be performed this evening. We will work to do this Tuesday evening next week at the same time.
Update Tue, 22 Oct 2013 06:39 AM UTC: Just confirming we will be starting on this in about 30 minutes.
Update @0724: Changes have been implemented. Servers are all pinging. Please let us know if you are seeing any 'odd' networking issues in the Auckland DC.
We are seeing a power issue at the London data centre. This has caused some servers to go offline. The core network is currently coming back up and we'll update this notice with further info as the servers come online.
UPDATE @14:33 PM UTC: Servers are now back online.
Regarding System - London Data Center
We are seeing some network connectivity problems in the London datacenter, we are investigating.
Update: Fri, 11 Oct 2013 20:07 PM UTC: The datacenter has confirmed an issue with an upstream network. They are working on resolving that ASAP.
Update: Fri, 11 Oct 2013 20:57 PM UTC: It appears a switch has failed upstream of us, affecting the entire network. Engineers are working to replace that now.
Update: Fri, 11 Oct 2013 21:08 UTC: A switch is being replaced. Expecting progress in 20 minutes...
Update: Fri, 11 Oct 2013 21:29 UTC: Connectivity has been restored for now on our backup feed.
Update: Fri, 11 Oct 2013 21:38 PM UTC: While connectivity has been restored to most of our servers, there are several still affected by this issue. The failed switch is still being replaced and once that is back we should have those last three servers back online.
Update: Fri, 11 Oct 2013 22:54 PM UTC: The replacement switch has been racked, however the network is still not working. Network technicians are trying to isolate the cause of that now. In the meantime we are requesting an interim solution to restore connectivity for those last few servers.
Update: Fri, 11 Oct 2013 23:09 PM UTC: The last three servers are now responding.
Update: Sat, 12 Oct 2013 00:47 AM UTC: We are seeing some short periods of connectivity loss to those three servers. This is likely due to work restoring the network to its previous state. Normal service should be fully restored soon.
Update: Sat, 12 Oct 2013 03:30 AM UTC: The network appears to be stable. We have received the following initial root cause analysis report from the datacenter. We are expecting a full report from the datacenter on Monday.
"The underlying root cause of tonight's issues has been identified as a fibre break. This caused excessive flapping on the E1 transit ring which resulted in several of the ring switches rebooting. Unfortunately one of the switches did not recover - this then led to severe network instability which proved difficult to pinpoint."
Regarding System - London Data Center(s)
We are currently experiencing connectivity issues in London. We are investigating and will update this notice with any news.
Update from data center @0852UTC: We are aware of a network related issue and the Network Team are currently investigating this.
Update @0911UTC: Network seems to be back to normal in the Maidenhead DC. Host1 is currently unavailable - affecting v1,v2,v3 VPS's. Still waiting for update from the network team.
Update @0959UTC: Edge servers on this network are now accessible. Awaiting root cause analysis report from datacenter.
Root Cause Analysis Report
On the morning of Monday the 7th of October, we became aware of a routing instability that affected some servers; this issue resulted in the loss of connectivity to some dedicated servers.
The issue had two symptoms: loss of traffic to the primary IP of the server and, in some edge cases, further loss of traffic to any secondary assigned IPs. After extensive engineering and diagnostic work with our router vendor, we have identified this issue as a problem in the firmware that one of our routers was running.
We have deployed the firmware fix and, based on further testing, we can say this issue has been resolved. Once again we apologise for the inconvenience caused by this issue and thank you for your patience with us during the problem.
Regarding Server - WESTMERE
We have scheduled emergency maintenance of westmere.createhosting.co.nz to begin at 23:00 NZST 24/09/2013. We anticipate this will take 30-60 min. This is phase one of a major update. This phase will allow us to clone the existing architecture and perform the update in a safe environment, checking for any issues so that they can be mitigated in the actual/live update. During this window, the VPS will be paused and some services will not be available.
Update: This will be repeated at 23:00 NZST 26/09/2013.
Update: This will be repeated at 23:00 NZST 28/09/2013.
Update: This will be repeated at 23:00 NZST 02/10/2013. Immediately after, our system will undergo maintenance for a scheduled major upgrade. Please allow 60 min - 120 min. The VPS and services will be paused or disabled at certain stages through the upgrade process.
Update: The upgrade process failed and the server had to be rolled back to the backup performed at approx 23:00 02/10/2013. Unfortunately there was no alternative. We will investigate the cause and plan to repeat the procedure, or an alternative, in the near future.
Regarding Server - WESTMERE
A scheduled LVM disk partition resize is taking longer than expected. While this usually completes without disruption to services, in this instance the server has become unresponsive. We believe this is likely due to high I/O. Technicians are investigating.
Update: The resize scripts triggered a full disk backup as a precautionary measure prior to the resize, which caused the high I/O. We will need to let the backup and resize complete. Estimated time remaining: 10 min.
Update: Backup completed, disk resized. Load has dropped to acceptable levels. We will review the resize scripts to mitigate this in future occurrences of this systems administration task.
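For context, the sequence described in this incident (precautionary full-disk backup, then LVM resize, then online filesystem grow) can be sketched as a dry-run plan. Volume group/logical volume names and the size here are placeholders, not our real volumes, and a real run requires root.

```shell
# Illustrative dry-run of the resize sequence from this incident. The
# precautionary dd backup is the step that generated the high I/O.
# VG/LV names and the size are placeholders.
resize_plan() {
  vg=$1; lv=$2; grow_by=$3
  echo "dd if=/dev/$vg/$lv of=/backup/$lv.img bs=4M  # precautionary backup (I/O heavy)"
  echo "lvextend -L+$grow_by /dev/$vg/$lv            # grow the logical volume"
  echo "resize2fs /dev/$vg/$lv                       # grow the ext filesystem online"
}
resize_plan vg0 data 50G
```

Running the backup outside the maintenance window (or throttling it with `ionice`) is the kind of mitigation the review of the scripts would consider.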
We have scheduled an emergency maintenance of host2.westmere.createhosting.co.nz to begin at 07:00 NZST 19/07/2013. We anticipate this will take 1 hour. This affects the following VPS's:
We have a reported outage to the VPS Host server for vm1.createhosting.co.nz - vm7.createhosting.co.nz. This is currently being investigated.
Regarding Server - WESTMERE
Scheduled Wed 29th May 2013 02:00 - 03:00 NZST
A network equipment upgrade is scheduled for the Auckland data center that we use. Extra space is being added for more VPS hosts and more dedicated servers. New switches have been purchased, installed and configured. The new setup improves redundancy and enables us to support more features (e.g. excluding intra-data center traffic from your data transfer allowance) and adds capacity (when you are out of network ports you cannot add much gear!).
The new gear is already in place, but we will need to physically move the server cables over to the new switches. This requires an unplug at the old switch and a replug at the new switch. No changes are required to the servers themselves.
A one hour maintenance window has been set aside, between 2am - 3am NZ time on Wed 29th May. We anticipate the actual impact will only be about 2-5 minutes of network disconnection while cables are moved and network routes update.
Regarding Server - WESTMERE
One of our servers is experiencing high load and consequently slower response times. We will investigate the cause and update this notice.
Update: A number of high I/O cronjobs (scheduled tasks) were set to run at the same time as the weekly backups (LVM snapshot). Combined with a DDoS Wordpress brute force attack, it potentially had a compounding effect on load. We hope to have mitigated that on all levels and don't expect that to recur. All services were restored at approx 17:35.
Regarding System - Auckland Data Center
Our monitors picked up a 5 minute outage on connectivity to our NZ servers. All services are working normally again now and we are investigating; awaiting a report from DC staff.
Update: This appears to be a routing issue and is scheduled to be addressed on 29/05/2013
Regarding System - All Servers (UK and NZ)
We will be applying an URGENT security update to all VPS and Dedicated servers that will require the servers to be rebooted. We have only just recently been made aware of the security advisory and have decided to implement immediately.
Downtime will be around 5 minutes and the work will start around 11AM NZST 17/05/2013.
Regarding System - Host Servers (HOST2 and HOST3)
We are seeing some connectivity problems on London servers right now. We are investigating...
Update: Wed, 15 May 2013 05:09 AM UTC: Service has been restored. We are still trying to establish what occurred.
Update: Wed, 15 May 2013 05:19 AM UTC: A network technician accidentally null-routed some key IPs, that caused parts of our network to become unresponsive. We have asked them to make some records so this does not recur.
XEN Hypervisor Upgrade
Scheduled 5AM CET (3PM NZT) 26/04/2013
Fri, 26 Apr 2013 09:29 AM UTC: The server is back up with an updated hosting stack (similar to, but slightly different from, the original) as well as general package updates. All VPSs should be responding again now, with the original network configuration.
Fri, 26 Apr 2013 09:16 AM UTC: We have managed to track down an old copy of the original host packages, and will test those now.
Fri, 26 Apr 2013 07:46 AM UTC: Testing some different kernels, unfortunately we have not been able to locate the exact kernel used before, so are having to find a workaround.
Fri, 26 Apr 2013 06:00 AM UTC: Experiencing problems connecting to the console, and slow turnaround on remote hands. Also seeing this server take a long time to start.
host.westmere-uk.createhosting.com was scheduled for an upgrade, however after the upgrade there were a number of issues encountered, including the new kernel not starting normally. We have opened this notice to track and document the issue.
Regarding Server - WESTMERE
Host server "host2.westmere-nz.createhosting.co.nz" is not responding normally. Technicians are investigating and may have it restarted. This will affect all vm*.createhosting.co.nz VPS servers.
Regarding Other - Auckland Datacenter / NZ/AU
Regarding Server - WESTMERE
Sun 24th Feb 16:36 NZST
Regarding Server - WESTMERE
Regarding System - Auckland Datacenter / NZ/AU
Regarding System - Auckland Datacenter / NZ/AU
There is currently a routing issue at the Auckland datacenter. Packets stop at the Vocus upstream provider. Unfortunately Vocus holds the fiber coming into NZ, so failover to another provider isn't possible in this instance.
Regarding System - Auckland Datacenter
It looks like upstream connectivity failed a short time ago. Normal services appear to be restored. We are still investigating the root cause.
Update: Network has been failed over to a secondary peer to restore network service. Apparently a device at Sky tower (Auckland, NZ) stopped forwarding traffic.
Regarding System - Auckland Datacenter
Our provider in Auckland will be performing some maintenance which will cause some brief network outages. The nature of the work is VLAN maintenance required on core routers.
Regarding System - Auckland Datacenter
We have been informed by our datacenter that there is an emergency maintenance change affecting the core network for the servers. The scheduled window is 27th November 2012 10:00pm to 28th November 2012 2:00am NZT.
The expected downtime is up to 15 minutes to perform the change.
Regarding Other - Auckland Datacenter
There are connectivity problems at the Auckland datacenter. We are contacting the datacenter staff and working on resolving the issue as soon as possible.
Regarding System - UK Data Center (Maidenhead)
Regarding Other - Auckland Datacenter
Regarding System - Data Center
Regarding Server - WESTMERE
Regarding Other - Auckland Datacenter
Period: 01/07/2012 05:00 - 01/07/2012 06:00 NZST (30/06/2012 17:00 - 30/06/2012 18:00 UTC).
Update @ 9:53:29 AM NZST: Connectivity has stabilized now, and routes from the NZ networks are also checking out ok now. We will continue monitoring, please open a ticket if you are having any trouble.
Update @ 8:57:39 AM NZST: Possibly seeing some routing issues from some NZ networks to certain IPs, we are checking this now.
Update @ 8:28:09 AM NZST: Datacentre reports the issue should be resolved and we should not see any more packet loss.
Update @ 8:12:30 AM NZST: Datacentre reports still seeing some routing issues, working on it.
Update @ July 1, 2012 7:52:03 AM NZST: We are seeing some packet loss in Auckland after this maintenance, we are investigating now.
===
The datacenter is making fundamental changes to its core network. They have been planning and implementing these changes for the past 3 weeks, and expect no more than a few 10 minute outages.
The maintenance is designed to increase fault tolerance and ensure that failover to different upstream providers works reliably. New international capacity is also being configured. The changes are expected to improve the latency and reliability of the network.
Regarding Server - WESTMERE
Our monitoring has alerted us to several machines not responding at the Auckland datacenter. We are investigating.
Update Tuesday, June 12, 2012 12:15:32 AM UTC: Servers are responding again. There appears to have been a networking issue. We will post additional details once we receive an outage report from the datacenter.
Regarding Other - London Datacenter
Update: Pulsant have now issued the following RFO:
"Pulsant would like to sincerely apologize for the power disruption that affected our customers at the Maidenhead site on Tuesday 29th May. We would also like to apologize and make a correction to the timeline stated in the report. The actual power disruption occurred at 22:08 and not 22:38 as previously stated. A further apology must also be made for the confusion that resulted from the initial incident being reported as a Network related issue which was factually incorrect and was in fact a Power Supply related issue as a result of the UPS failure. There are a significant number of lessons learnt in terms of improved communications.
The UPS units at the Maidenhead data center facilities 2 & 3 are approaching 5 years old and manufacturer recommended maintenance including schedule components replacement alongside routine offline health checks. Two separate maintenance windows were scheduled according to company policy, to be executed under Pulsant data center management responsibility and supported by certified and approved external UPS engineers from UPSL Limited. All replacement parts were genuine manufacturer certified components.
During the first of these maintenance windows which started at 21:00 on May 29th 2012, a number of UPS components were replaced to meet manufacturer guidelines. A UPS unit was removed from service by placing the UPS unit into bypass. The parts were successfully replaced and tested offline. The UPS unit was returned to the parallel UPS solution at Maidenhead. However, on the return to service, there was a failure of one of the replaced components which disabled the control function and disabled the load management capability of the UPS unit. This resulted in all racks connected to this UPS unit losing power immediately. All other units on the parallel string remained active on UPS protected support. An immediate decision was taken to place the electrical load into bypass to enable the affected clients to be recovered but all Maidenhead 2 & 3 data centres would be operating on unprotected mains power supply. This was very poorly communicated.
The engineers immediately identified the failed component and replaced this. However, on validation testing the engineers confirmed there had been potential damage to the UPS control board which is critical to management of the unit. This was tested and found to be faulty. The replacement of this component has to be to an exact specification given it must communicate with all the other UPS units in the parallel string. If there are any incompatibilities, this could result in UPS control issues. As the control board required replacement at the exact specification, investigations proceeded offsite but confirmed that one was not immediately available. It became apparent a suitable board would have to be shipped from the manufacturer in Switzerland. After risk assessment, a decision was taken to upgrade all the control boards. However, due to the impact of having to replace all the control boards simultaneously, a decision was taken to operate in an unprotected state of mains bypass during 30th May whilst all necessary parts were fitted and tested prior to UPS reinstatement. This involved load bank testing of the UPS units for 3 hours. All UPS units were successfully reinstated at 22:00 on 30th May 2012. The investigation and analysis works are continuing with Newave and UPSL.
The Maidenhead UPS service had been operating without issue since the last service issue on 17/03/2011 but required scheduled maintenance to maintain manufacturer support. It has been regularly and successfully maintained on 3 separate occasions since 01/03/2011. The failure of the UPS was a direct result of the failure in one of the serviceable components which was replaced as recommended by the UPS manufacturer.
The failure of the component resulted in both the UPS load being unsuccessfully discharged to neighboring UPS devices, and damage to the control board. The failed component and the control board were immediately couriered to the manufacturer testing facilities in Switzerland where they are still undergoing analysis. To ensure complete continuity in UPS controls, it was essential we align the firmware revisions on the control boards so it was necessary to review all UPS devices and their associated contactors to minimise reinstatement risk and any possible disruption during future maintenance events."
---
Update: We have received a partial update from Pulsant (who own and operate the datacenter). We are still waiting on an official RFO:
"Summary
The 6 Eaton Powerwave units supporting the Pulsant 2 & 3 facility are approaching 5 years old. Under Pulsant management, we have already completed and brought up-to-date the maintenance on these devices with UPSL, whom are the only authorised Powerwave support partner in the UK at present. At the last maintenance window, we successfully replaced a large number of the batteries as a precautionary measure following testing on battery performance. The maintenance schedule as published by Powerwave manufacturing in Switzerland recommends the replacement of specific Low Voltage fuses at 5 years of age. We therefore scheduled maintenance windows over 2 separate nights. The UPS devices are installed in a parallel configuration, supporting N+2 redundancy overall or N+1 on each parallel string and the load is distributed throughout the datacentre across all 3 phases and the six UPS units.
UPS Maintenance
The maintenance sequence removes one UPS device at a time still providing N+1 redundancy to one string but only N during maintenance to the other string. The UPSL engineer attended site with the approved manufacturer parts and we commenced maintenance on the planned first 3 UPS units on time. We put UPS unit 1 into bypass, completed maintenance successfully, tested this and returned this unit to service. We placed UPS unit 2 into bypass, completed maintenance successfully, tested this and returned this to service. The UPS took the load at 22:38 and a Low Voltage fuse which had just been replaced as part of the scheduled maintenance subsequently failed. UPS 2 shed the IT critical load at 22:40 but failed to disengage from the parallel configuration resulting in the partial loss of 2 phases of power. UPS units 4,5 and 6 were unaffected.
UPSL investigated and replaced all Low Voltage fuses and the inverters as a precaution. This was escalated to PowerWave support in Switzerland at 23:32. On their recommendation, a number of components were replaced. This took 4.5 hours to complete but there was insufficient time to load bank test the configuration to our satisfaction to approve a UPS reinstatement. The risk assessment was completed at 5AM and a decision was taken based on discussions with the SSE which confirmed no planned engineering works on the grid network supplying Maidenhead 2&3.
Current Position
UPS 2 has subsequently been placed into electrical bypass. UPSL and PowerWave Switzerland are continuing to validate the test results from the unit throughout today to provide sufficient confidence in service reinstatement from 22:00 today. We will only reinstatement the device if this can be achieved or the alternative is UPSL are arranging a unit exchange with a new unit. We are monitoring the power position in detail and actively working with UPSL and PowerWave on the reinstatement plan and approach. We will provide a further customer broadcast later today to confirm the exact arrangements for reinstatement."
---
Original notice:
We are experiencing connectivity issues at the London datacenter. We are currently investigating and contacting the data center staff. We will update this maintenance notice with any feedback we receive.
UPDATE 21:25:38 UTC: London data center staff are looking at the issue
Update 21:47:04 UTC: The data center report they "are currently experiencing power issues affecting some dedicated servers - located". We are seeing some servers coming back.
Update 22:22:40 UTC: The core network has been restored; most of the servers are coming back up.
Update 01:16:13 UTC: The data center's latest report: "Engineers on-site have restored power to the affected servers and have completed the re-powering of the servers". We are now dealing with a couple of customer dedicated servers that failed to boot.
Update 0500 UTC: Normal service has been restored. We are still waiting for an official account of the incident, this page will be updated when we receive that.
Regarding Other - London Datacenter
Thursday, May 31, 2012 3:07:04 PM UTC
We're seeing significant packet loss in our London data center. We are investigating and we'll post updates about the issue here.
UPDATE: The network issue is due to an external problem with a major UK internet peering point, which is affecting ISPs across the UK. This is unfortunately outside of our provider's control, but the situation is being monitored. We'll provide further updates later.
Update: Network staff report that things are now stable again, but will be closely monitored for further issues.
Regarding Other - Auckland Datacenter
Period: 09/05/2012 22:00 - 10/05/2012 02:00 NZST (09/05/2012 10:00 - 09/05/2012 14:00 UTC).
(http://tinyurl.com/6s3nopp)
A number of improvements have been scheduled on the network core affecting our Auckland servers. Expected benefits include improved routing and latency, and better peering. Also included will be the addition of a third peer to improve network redundancy.
We expect no more than 7 brief outages (5 minutes each maximum) during this event window; however, there is a possibility of a longer outage (5-20 minutes maximum).
Regarding Server - WESTMERE
Update @ 5:22:57 AM UTC: Issue is marked resolved, upstream provider gave us the following report:
On 29th April 2012 at 15:31 (NZST), an unplanned outage occurred on the HDDC core network stack impacting customer services that were traversing this path. The problem was identified by network alarms and our engineers resolved the issue within 20 minutes.
Root Cause Analysis:
HD is in the process of investigating the root cause of this issue. Preliminary findings indicate that the outage was the result of an issue with one of the switches in the core network stack; the stack was rebooted and the network resumed normal operation.
Corrective Action:
We will keep a close eye on this stack over the next 48 hours and implement any changes or hardware replacements needed to keep our network running smoothly for our customers.
===
Servers are not responding in our Auckland facility - technicians are investigating now.
Update: Upstream investigating, should have more information in the next 15 minutes.
Update @ April 29, 2012 3:55:49 AM UTC: Servers are responding again now, still awaiting further information on what happened.
Regarding Server - WESTMERE
Maintenance Window: Sat 7/04/2012 23:30 - Sun 8/04/2012 00:15 NZT (Scheduled)
Affecting System: Auckland Data Centre
One of our upstream providers will be moving all Auckland datacenter machines onto new Power Distribution Units (PDUs). As there have been concerns over the reliability of the current PDUs, they will be replaced with a more reliable unit.
This involves stopping all running dedicated servers and VPSs, powering down the physical hosts, and then plugging them into the new PDUs.
During this time all servers will be unavailable.
The actual change over to the new PDUs is estimated to take 5-15 minutes. It may take 20-45 minutes for the physical hosts and VPSs to start up after that.
Regarding Server - WESTMERE
Web servers will be restarted to pick up the New Zealand Daylight Saving time change. No interruption anticipated. Downtime 5-10 seconds.
Regarding Server - WESTMERE
The mail server will be restarted to pick up the New Zealand Daylight Saving time change. No interruption anticipated. Downtime 1-5 seconds.
Regarding Server - WESTMERE
Maintenance Window: Tue 27/03/2012 00:01 - 05:00 NZT (Scheduled)
Affecting System: Primary Uplink
One of our upstream providers will be replacing equipment on our primary uplink.
Impact: We have redundancy in place to cope with this, but there may be degraded performance and intermittent connectivity issues for up to 30 minutes during the event window.
Regarding Other - Data Center
There appears to be a network issue at the Auckland datacenter. We are investigating.
Update: All equipment is responding normally again. We are waiting on an outage report from the datacenter.
Update: The datacenter was applying a route change, after which problems occurred. They noticed packet loss on their core stack and rebooted their core Juniper stack. They report that the packet loss issue is resolved and the network should be responding normally.
Update: The data center will be re-applying the route changes between 1800 and 2100 NZT tonight (NZT Wed). They will be calling in Juniper engineers to oversee the change. The routes happen to be for some new IP ranges we are bringing online.
Update: The routing change has been applied and things seemed to be working properly, but we are now seeing further issues. We will be requesting more information from upstream.
Update: A firmware bug meant the datacenter's core network devices had to be restarted with an upgraded OS, which caused a further outage of about 10 minutes. We have been advised that the network is stable now.
Note that there have been good 'discussions' with the data center about keeping future network changes out of core business hours.
Regarding Server - WESTMERE
1424 UTC: We are seeing significant packet loss to most of our Auckland servers at the moment, investigating.
Update 1437 UTC: All IPs are back after the network was failed over to a separate upstream. Still checking to see what happened.
Update 1445 UTC: We have been advised that an upstream exchange (Vector Albany exchange, North Shore) went offline. For some reason failover to the working upstream (Telecom) took a while; we are investigating how that can be remedied.
Regarding Server - WESTMERE
A server reboot will be performed at 23:30 in order to upgrade the Kernel. Estimated downtime 5 minutes. [completed]
Seeing a routing loop affecting a large number of our servers in London at the moment, investigating.
Update 0245 UTC: We have confirmed all our services are up behind our core switches; there is a routing loop further up the chain causing this. We have escalated this with the datacenter.
Update 0250 UTC: The latest word from the datacenter is that their network team is working on this right now.
Update 0310 UTC: Coming back online.
Update 0315 UTC: It appears this was caused by a DDoS flooding an upstream device. The target has been nullrouted to mitigate that. Confirmed normal connectivity has been restored.
We have seen some short networking outages at our London facility; we are investigating now.
We are still not sure what the problem is; the network team will be in soon and will investigate further. All servers are responding. We saw a short outage at about 21:20 UTC, and then a longer outage lasting several minutes at around 03:50 UTC.
UPDATE: We noticed a short network outage (1 minute or so) at about 10:48 GMT today. The datacenter staff posted a general network report:
We will be carrying out some work on our network to improve stability and avoid issues such as those seen on the 2/11/2011. The routers that host the default gateway for these servers are being replaced. It is not expected that there will be a complete outage longer than 10-15 seconds, however this is a legacy network so unforeseen interactions between our routers and your server may occur, causing localized or partial outages.
We will be carrying out the work at 07:30 on Wednesday the 23rd of November 2011.
Regarding Server - WESTMERE
We are currently seeing significant packet loss to our Auckland servers. We have been advised there is a DDOS underway there, technicians are working on mitigating that already.
Update: awaiting confirmation; looks to be resolved.
We are seeing intermittent network issues at the London datacenter. We are investigating.
Update: We located an issue with the way our internal routes were being updated when partial connectivity to our provider was lost. We have put a fix in place that has restored connectivity; it should not cause any further issues.
Regarding Server - WESTMERE
We experienced a temporary memory overallocation issue due to an incorrect kernel build following the recent hardware upgrade. The kernel didn't recognise the full 64GB allocated to the server and was only able to utilise 16 cores instead of 24.
A new kernel build was applied and the server rebooted for it to take effect. We do not anticipate a recurrence.
[Completed 21/10 17:29. Downtime 10 minutes.]
Regarding Server - WESTMERE
Server hardware will be upgraded between the hours of 00:00 - 08:00 on Thu 20/10/2011. This is round five of a major performance upgrade.
Because we are upgrading to 15K SAS drives, we are unable to hot swap the existing drives. During this time, access to services for your domain may be interrupted; a 503 maintenance page will be displayed where applicable. We apologise for the short notice.
Time: Thu 20 Oct at 00:00 NZDT (approx)
[Completed 05:10]
Regarding Server - WESTMERE
Saturday, October 1, 2011 9:38 AM NZDT
Our monitoring has picked up servers not responding in the Auckland data center, we are investigating now.
UPDATE: All servers are responding. The datacentre in Auckland was performing some routine upgrades and scheduled maintenance. Due to a miscommunication we did not receive advance notice of the maintenance and so had not posted it. We are working on improving our communication channels with our upstream providers.
Regarding Server - WESTMERE
The server will need to be rebooted at 19:45 on 29/09/2011 to mount a drive. We apologise for the short notice.
Time: Thursday 29th September at 1945NZDT
Duration: we expect the reboot to take 3-5 minutes
Regarding Server - WESTMERE
A few servers at the Auckland data center are unreachable at the moment. We are investigating the issue. Update: Notification from the datacenter says technicians are working on a connectivity issue affecting a significant portion of the facility.
Regarding Server - WESTMERE
We will be moving the servers we use at the Auckland Data Center to a new rack.
Time: Monday 26th September at 0330NZDT
Duration: we expect the move to take 60-90 minutes
Our staff will ensure all servers are powered down gracefully before the maintenance starts. Then we will physically move them to the new location. Once the move is complete we will check that all machines have network connectivity.
During the move the servers will be unavailable.
We are moving the servers from cabinets in the Orcon data center to a cabinet in the Piermark facility (a couple of kms up the road).
The new location gives us better control over networking in the cabinet and will allow us to expand our network to scale with demand. This works in with new systems due to arrive at the end of the month. Remote hands access will also be improved.
There are no IP changes as a result of the server moves.
Regarding Server - WESTMERE
Issues with POP mail and SSH services were reported at 19:24 and are being worked on by technicians. The server will need to be restarted at 20:00; expected downtime 5 min.
[Update] The server failed to come back up. All services will need to be paused to prevent data corruption while we restore essential library files from the latest backup image. No data will be lost.
[Completed] 21:40 All services for westmere.createhosting.co.nz are back online. Application services will be separated out to prevent a recurrence.
Regarding Server - WESTMERE
An Apache and MySQL server restart needs to be performed between 11:30 NZST and 12:30 NZST in order to apply performance improvements. MySQL will be optimised to utilise a further 16GB of RAM. Website functionality is not expected to be affected, as Apache will be shut down prior to the MySQL restart. Expected downtime 30 seconds.
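Tuning of this kind ("a further 16GB RAM") is normally applied through MySQL server variables. The fragment below is a minimal, hypothetical my.cnf sketch; the variable names are standard MySQL settings, but the values (and the assumption that InnoDB is the main storage engine) are ours, as the actual configuration was not published:

```ini
# Hypothetical my.cnf fragment -- illustrative values only,
# not the actual WESTMERE configuration.
[mysqld]
# Give the InnoDB buffer pool the additional 16GB of RAM
innodb_buffer_pool_size = 16G
# Raise caches to suit the larger memory budget (assumed values)
table_open_cache        = 2048
max_connections         = 300
```

A change to innodb_buffer_pool_size requires a mysqld restart to take effect, which is consistent with the brief downtime described above.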
Regarding Server - WESTMERE
Server will be restarted between 22:30 NZST and 23:00 NZST to increase RAID10 disk size and apply OS and other essential security updates. [Completed]
Regarding Server - WESTMERE
The MySQL server was reported to be unresponsive and was down for approx 2 min. Investigating. MySQL optimisation will be performed tonight: all database tables will be checked for errors. This may take up to 60 minutes, during which server performance will be reduced.
Regarding Server - WESTMERE
High packet loss on International Traffic has been reported from 1) NJ, USA 2) Montreal, Canada and 3) Cape Town, South Africa. This seems to have been an issue throughout the day. We will check this issue with NZ datacenter staff.
21:30 UTC: issue escalated at the datacenter. Please note that NZ national traffic does not appear to be affected.
21:50 UTC: issue appears to have subsided somewhat. We are still monitoring this and waiting for an update
22:00 UTC: Confirmed that the international link is receiving a large DDOS attack. Technicians are actively working on this issue.
22:20 UTC: Attack has been isolated and normal service seems to have been restored at this stage. If you are still seeing issues please submit a ticket with details.
Regarding Server - WESTMERE
MySQL service was restarted due to an unknown error. Service unavailable for approx 2 min. We are investigating the cause.
Regarding Other - .NZ Registrar
As most of you are aware, Distribute IT's systems are currently offline due to a deliberate, premeditated and targeted attack on their Network (completely isolated from Create Hosting).
This impacts domain renewals and UDAI issuance for .nz domains. Provisions are in place with the .nz registry to allow extensions on renewal dates while Distribute IT resolve the issue. We apologise in advance for the inconvenience and will work with Distribute IT's emergency support staff to process domain renewal requests outside of the automated system.
Domain UDAIs cannot be issued at this time until further notice (this is out of our control). For more information please see http://distributeit.com.au/.
Regarding Server -
Date: Tuesday, 28 June 2011
Start Time: 0700UTC / 0800BST
End Time: 0730UTC / 0830BST
Services affected: Public Network Connectivity
Location: London (Maidenhead)
Duration: 30 minutes
During this maintenance window our network team will be working with our London provider to add capacity to our network.
This will involve swapping over to new uplinks and some slight configuration changes on our provider's end. It will cause loss of connectivity and/or packet loss for up to 10 minutes.
We have selected this time as it is the earliest time in the day that our provider's engineers are available. We will only be doing what is absolutely necessary, to keep the impact to a minimum.
Regarding Server - WESTMERE
A server upgrade is planned for midnight tonight, 16/06/11 00:00 NZDT. This will require a shutdown, a kernel upgrade and a reboot. Downtime is expected to be between 15-30 minutes. [completed]
Regarding Server - WESTMERE
We are seeing some packet loss to our NZ Data Centre. Technicians are investigating further.
@UTC 0406 (NZT 1606) Data centre have reported that they have identified the problem and are working on a fix. ETA 5 minutes.
@UTC 0420 (NZT 1620): Issue appears to be resolved. Awaiting report from Data Centre.
Issue was reported as being caused by a misconfigured switch at the Data Center.
Regarding Server - WESTMERE
We have received several reports of possible connectivity issues in our NZ datacentre. We are investigating further now.
Update 0115 UTC: We have been advised from the datacentre that there is a DDOS attack currently underway which is degrading service. Datacentre technicians are working to resolve in the next 5 minutes.
Update 0135 UTC: Ongoing - datacenter technicians are continuing to work on resolving the DDOS issue.
Update 0230 UTC: Ongoing - the DDOS appears to be mainly affecting internationally routed connections. Investigations are continuing.
Update 0245 UTC: As of 0235 UTC, network stability appears to have been restored. Monitoring is ongoing and some IP ranges may still be affected. If you are still seeing an issue please let us know. A full incident report is pending.
Update: Per the incident report we received the following details:
The purpose of this email[sic] is to provide a time-line and identify the root cause of the network attack that occurred on April 8, 2011, which affected all clients at our Piermark Datacentre. The attack centred on the core network equipment, which was bombarded by a DDOS attack originating from China. Clients experienced a wide range of issues during the network event, ranging from intermittent inaccessibility to complete outages.
Event Time: 12:30pm - 2:30pm
...
What actions are being taken to prevent this from re-occurring?
In the process of working with our upstream provider, we have decided to diversify our upstream providers. At this time we do not have a scheduled date for this change, but we will provide notice as soon as it has taken place.
Regarding Other - Network
We are currently seeing a network issue in the London data centre, we are investigating now.
We are still seeing intermittent packet loss on one switch in our London data centre. We are investigating the cause.
Update Saturday, March 26, 2011 10:43:17 PM UTC: The problem is isolated to a single cabinet. We are seeing errors on one of our uplinks that runs between that cabinet and our core. We have manually failed over to our backup uplink to restore connectivity. We are having technicians check on the primary uplink.
Regarding Server - WESTMERE
Services (HTTP, SMTP, POP, MySQL) are currently unavailable. Investigating. [resolved]
Update: Services need to be disabled while a backup image is mounted to resolve binary linking issue.
Update: Services restored.
Regarding Server - WESTMERE
Some customers are reporting intermittent access to services at the Orcon Datacentre. The issue only affects NZ customers on some ISPs. Investigating.
Update: It appears to be a routing issue. We are waiting for details from our upstream provider.
Update: the data center is reporting an issue with networking gear. They are working on replacing the core equipment involved (and doing an upstream bandwidth upgrade to boot). We are waiting to hear back from them about this and about how, or if, the upgrade will affect our customers.
Update: the data center has given us a heads up that they may need to do emergency network maintenance tonight at 21:00 (Friday 11/03 21:00 NZT). As soon as this is confirmed we will update this page.
Update: data center staff have reported they will be upgrading the network this evening. From 2100 (Friday, NZT) to 0600. While they are doing this there may be some congestion/slowness on international routes. There also may be 'brief' (30 seconds to a few minutes) network outages as routes change.
Regarding Server - WESTMERE
External monitoring is reporting intermittent service access (SFTP, MySQL, Apache). This has been reported as a network issue with the upstream bandwidth provider at the Orcon Datacentre; 2 min "downtime". [resolved]
Regarding System - DPS Payline
Some customers have reported that transactions processed via DPS Payment Express are not showing in DPS Payline and/or changes are not reflected. DPS have advised us that there is currently a 5 hour lag in transactions showing in DPS Payline.
Customers need not do anything. All payments are processing as normal and will show in DPS Payline in due course. [Resolved]
Regarding System - Network
During a recent power outage, datacenter staff had to plug some of our servers into an adjacent rack to get them back online.
They now need to be plugged back into their original rack. To do this each machine will need to be powered down for a short time (1-5 minutes).
We will do this at Tuesday December 14 at 8PM NZDT. [completed]
Regarding Server - WESTMERE
External monitoring is reporting intermittent service access (SFTP, MySQL, Apache). This has been reported as a network issue with the upstream bandwidth provider at the Orcon Datacentre; technicians are working on the issue and should have it resolved shortly. [resolved]
Regarding Server - WESTMERE
Multiple servers in the Orcon NZ datacenter are not responding normally. Investigating the issue now. NZ customers not affected. [resolved, awaiting feedback from techies at Orcon as to what caused the issue]
Regarding Server - WESTMERE
MySQL will be restarted to enable an important upgrade to take effect. Downtime will be minimal.
Regarding Server - WESTMERE
Some customers have reported problems sending mail via SMTP. This is due to a recent Postfix configuration change made for PCI compliance. We are currently working on the issue and should have it resolved shortly. [resolved]
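For context, PCI-driven Postfix changes typically tighten TLS and authentication requirements on the SMTP listener, which can break clients still submitting mail over plaintext. The main.cf fragment below is a hypothetical illustration of such a change (the parameter names are real Postfix settings, but the actual change made on the server was not published):

```
# Hypothetical PCI-hardening fragment for Postfix main.cf -- not the actual change
smtpd_tls_security_level = encrypt   # require STARTTLS before mail is accepted
smtpd_tls_auth_only      = yes       # refuse AUTH over unencrypted connections
smtpd_sasl_auth_enable   = yes       # authenticated submission only
```

A mail client still configured for unauthenticated or unencrypted submission would start failing immediately after a change like this, consistent with the reports above.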
Regarding Server - WESTMERE
Orcon data center technicians unintentionally tripped a power circuit while installing remote power monitoring devices. Power is back online and staff are working on the problem at the data center.
More information about this event: http://www.techday.co.nz/netguide/news/vector-blamed-for-orcon-outage/18161/1/
Regarding Server - WESTMERE
Incorrect server time was reported. This was due to critical security patches and upgrades performed today. Investigating.
Regarding Server - WESTMERE
We are seeing some servers not responding at the Orcon datacenter. External monitoring reported down at 09:39 AM UTC. An emergency notice has been opened to determine the cause and resolve the issue.
Update 1 @ 0945 UTC: The data center reports a switch 'froze' while a VLAN was being changed. They are restarting the switch (after 500+ days of uptime, apparently even the Ciscos need a quick nap). From this report it appears unrelated to the PDU/power issue earlier today.
Update 10:01:23 AM UTC: Issue resolved, network restored.
Regarding Server - WESTMERE
We are seeing some servers not responding at the Orcon datacenter. External monitoring reported down at 10:06:00 PM UTC. We are not aware of any scheduled maintenance at the datacenter and an emergency notice has been opened to determine the cause and resolve the issue.
Update: A technician at the datacenter is investigating this now.
Update 11:03:19 PM UTC: A power supply issue has been identified. Servers are coming back up now.
Update 11:05:45 PM UTC: Issue resolved.
Regarding Server - WESTMERE
External monitoring reported a firewall configuration problem at 10:58 NZST. This required a server reboot. [Completed]
Regarding Server - WESTMERE
Performance tuning (Apache and MySQL) will be undertaken during business hours today. This has to be performed during "normal" operation to get the best results. No interruptions expected. [Completed]
Regarding Server - WESTMERE
Some customers have reported that certain NZ websites and network connections were being blocked between 18:30 - 19:30 NZST tonight. This could be due to a firewall appliance. We are following up with the Orcon datacenter to see if there are any inappropriate firewall rules in place that could be causing the issue. We endeavor to have it corrected.
The datacenter reported back. Issue was with upstream security provider. This has now been resolved.
Regarding Server - WESTMERE
External monitoring reported a possible network failure at 20:02. We monitored the issue until 20:10. Server rebooted at 20:12 to bring services back online. Technicians are currently investigating the issue to determine the cause.
Regarding Server - WESTMERE
Scheduled server reboot for WESTMERE at 22:30 NZT (10:30 UTC). Expected downtime 2 min - 5 min. [Completed]
Regarding Server - WESTMERE
Some customers have reported missing /img and /css folders. This has been investigated and is due to a Parallels Plesk Panel related issue. We will revert all missing files from the latest backup as soon as possible. [Completed]
Explanation:
These are default files as part of the Parallels Plesk Panel standard setup that were installed to all virtual host root directories during a recent migration to this server. While attempting to remove any unnecessary files from virtual hosts, some legitimate files were accidentally deleted. This was due to a bug in the script. We have reverted all /img and /css folders to the httpdocs directories. If you do not need these files, you can safely delete them.
For customers that have manually restored these folders via SFTP from their local copy, you will notice a folder named css_restored or img_restored so as not to overwrite your uploaded files.
Regarding Server - WESTMERE
The MySQL server experienced a temporary shutdown at 03:29 due to a disk space issue. The disk space allocation was increased, which required a server reboot. MySQL was unavailable for 15 minutes. Disk allocation monitoring has been added to provide advance warning and prevent future issues.
Regarding Server - WESTMERE
Scheduled reboot for WESTMERE at 15:30 NZT (03:30 UTC) for a kernel upgrade. Expected downtime 2 min - 15 min (if a file system check is required).
Regarding Server - WESTMERE
PLESK-WESTMERE is scheduled for a reboot between 01:25 NZST and 02:00 NZST to enable an important Kernel update. Downtime expected to be 1-5 minutes.
Regarding Server - WESTMERE
Performance changes will be made to MySQL server. This will require a brief restart of MySQL and Apache. No expected downtime.
Regarding Server - WESTMERE
For improved redundancy, our primary and secondary nameserver IPs will be changed.
Old:
ns1.createhosting.net.nz 65.99.197.195
ns2.createhosting.net.nz 119.47.117.129
New:
ns1.createhosting.net.nz 66.119.228.130
ns2.createhosting.net.nz 60.234.72.71
In most cases you won't need to do anything and the changeover should be transparent. If you have manually set the "glue" records (IP Addresses) for our nameservers in your domain's DNS, you may need to update these at your registrar. If your domain is under our management, this won't be necessary as we would have already made the change for you.
Sites on PLESK-N will be progressively migrated to a new server for improved performance and redundancy over the next few days.
You will be notified by mail on the day that your site is scheduled for migration. An email will be sent to the account contact person recorded in our billing system.
DNS SOA records (specifically the TTL) are all set to 300 seconds in preparation for the migration. This means there will be a 5 minute switchover window during site transfer where your site may be unavailable.
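Lowering the TTL ahead of a migration is done in the zone's TTL/SOA settings. The sketch below uses standard BIND zone-file syntax; the serial, timer values and hostmaster address are illustrative assumptions, not the actual zone data:

```
$TTL 300        ; default record TTL lowered to 5 minutes for the migration
@  IN  SOA  ns1.createhosting.net.nz. hostmaster.createhosting.net.nz. (
        2010052801  ; serial (illustrative) -- bump on every change
        3600        ; refresh
        600         ; retry
        604800      ; expire
        300 )       ; negative-caching TTL
```

With a 300-second TTL, resolvers cache records for at most 5 minutes, so traffic follows the new server within roughly that window after cutover.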
Migrations will be done as close as possible to off peak times to avoid disruption. Please contact us at support@createhosting.co.nz if you have any questions or concerns.
Update 31/05/2010:
All sites migrated.
PLESK-N server memory will be upgraded. Downtime expected to be no longer than 5 minutes. [completed 22:35 - downtime 2 min]
Reported SFTP connection failure. This has now been resolved and was due to an upgrade of Parallels Plesk Panel today.
We are currently experiencing a possible hard drive failure on the PLESK-N server. The drives are being tested and replaced, and the RAID resynced. This will cause reduced performance, and some sites may be unavailable during this time. We are working to fix the issue as quickly as possible and apologise for the inconvenience.
UPDATE 15:43
This server is operational. Hard drives are in the process of being replaced (HOT SWAP). Performance will be degraded during RAID resync. [resync completed]
UPDATE 20/05/2010 06:00
Server will be rebooted. [completed]
UPDATE 20/05/2010 11:32
We are experiencing mail issues. Incoming and in some cases outgoing mail is affected. We are currently restoring all mail configurations to repair the issue. [completed]
UPDATE 20/05/2010 13:06
As a precautionary measure, we are upgrading Plesk to the latest version. No downtime is expected, or it will be minimal. [completed]
UPDATE 20/05/2010 14:11
All mail issues resolved. Some users have reported that their password is no longer being accepted. If this applies to you, please contact us at support@createhosting.co.nz and we will reset it for you.
Regarding System - Server Upgrades
A number of domains and services will be individually migrated from PLESK-D to PLESK-N. No downtime is expected. The SOA TTL (Time To Live) for all domains on PLESK-D has been set to 60 seconds to reduce the likelihood of any email problems and keep the migration near instantaneous. Major changes: PHP 4 to PHP 5, MySQL 4 to MySQL 5, CentOS 4 to CentOS 5.
Regarding Server -
PLESK-D will be incrementally upgraded from 8.6.0 to 9.2.2. No downtime is expected.
There was a DoS attack this afternoon. The attack on the network caused an outage between 3:30 PM and 3:40 PM. The attack was blocked and service resumed as normal. We will continue to monitor the situation.
Regarding System - Border Routers
IMPACT OF WORK: Suboptimal Routing, BGP Session Reset
DATE/TIME: 05:00 - 06:00 CST / 11:00 - 12:00 UTC January 28 - http://tinyurl.com/yagodhg
Dallas Datacenter will be performing maintenance on their border routers. The estimated period of impact is no more than five minutes per border router (of which there are two).
SFTP connection issues have been reported for PLESK-N. If you experience difficulty connecting via SFTP, please wait a few seconds and try again. This is an intermittent problem and we will advise as soon as it has been resolved.
Regarding System - DPS Fail-proof notification
There is currently an issue with DPS Fail-proof result notifications coming from a different IP address than normal. This has been reported to cause repeat order confirmation emails and, in some cases, empty orders being processed through some ecommerce storefronts.
We are currently working with DPS to get the issue resolved as soon as possible.
PLESK-N will be rebooted as part of a kernel upgrade. PHP will be upgraded. Expected downtime 5 min.