Octave Klaba
@olesovhcom
Wed Mar 10 02:42:28 +0000 2021

We have a major incident on SBG2. A fire broke out in the building. Firefighters were immediately on the scene but could not control the fire in SBG2. The whole site has been isolated, which impacts all services in SBG1-4. We recommend activating your Disaster Recovery Plan.

Update 5:20am. Everybody is safe.
Fire destroyed SBG2. A part of SBG1 is destroyed. Firefighters are protecting SBG3. No impact on SBG4.

Update 7:20am
The fire is over. Firefighters continue to cool the buildings with water.
We don’t have access to the site. That is why SBG1, SBG3, SBG4 won’t be restarted today.

Update 10am. (I’m there).
We finished shutting down the UPS units in SBG3; they are now off. We are looking to enter SBG3 and check the servers. The goal is to create a plan to restart at least SBG3/SBG4, maybe SBG1. To do so, we need to check the network rooms too.

Update 11:20am
All servers in SBG3 are okay. They are off, but not impacted. We are creating a plan to restart them and connect them to the network. No ETA.

Now, we will verify SBG1.

Update 11:40am
The network room in SBG1 is okay. 4 rooms are destroyed. 8 rooms are okay.

Update 1pm
Plan for the next 1-2 weeks:
1) rebuilding 20KV for SBG3
2) rebuilding 240V in SBG1/SBG4
3) verifying DWDM/routers/switches in network room A (SBG1). checking the fibers Paris/Frankfurt
4) rebuilding the network room B (in SBG5). checking fibers Paris/Frankfurt

Update 4pm
We plan to restart SBG1+SBG4+the network by Monday, March 15 and SBG3 by Friday, March 19.

In RBX+GRA we have a stock of new servers, pCC, pCI ready to be delivered to all the impacted customers. Free of charge, of course. We will add 10K servers in the next 3-4 weeks.

Update 9:30pm
The teams are working to fix the issues on APIv6, Manager, Support, etc., and to restore the email services hosted in SBG2.

Many thanks for all the messages of empathy you sent us today! I want to thank the teams who have been working all night and day.

It’s been the worst day of the last 22 years, and there are no words strong enough to say how sorry I feel today.

We keep working hard to restart SBG1/3/4 asap!

Update 3pm

1/3
Since yesterday, we have had meetings with the police, DREAL, experts and insurers. We have also started to clean up the site. The goal is to have full and secure access to SBG1, SBG3 and SBG4.

2/3
The optical network from Network Room A (SBG1) to Paris + Frankfurt works.

Working on an option to power SBG1 from the generators.

The SBG3 generators work. A good option to use if the network is UP before we finish rebuilding SBG3’s 20KV.

3/3
In 1 hour I will post a short (8 min) video with more information and details.

Update March 11, 4:40pm
It is too slow to communicate all these details to you through Twitter's 280 characters. Here is an 8-minute summary (in French) of the situation to date.
(English version is coming)
https://t.co/d7BeD7nNpM https://t.co/ugvmLk3EEJ

Update March 11, 4:40pm
It's too slow to give you all the information in just 280 characters. Here is my video with 8 min of the information we have today.
https://t.co/qjm3Vs0Ho2 https://t.co/xTb09wmXJ0

Update March 12, 8am https://t.co/BM9g8HINLn

Update March 12, 11am
1/4
This afternoon, we will send an email to each customer with their specific situation and options. In any case, we recommend restarting the service in our other DCs (RBX/GRA), where we are adding resources. Free months will be applied asap.

Update March 12, 11am
2/4
We are working with the insurance experts to repower SBG1/3/4 today from the generators. The goal is to verify, room by room, equipment by equipment, that the infra works.
We don’t expect to restart the servers before next week.

Update March 12, 11am
3/4
The new network room (B) + 20KV for SBG3 are assembled and will be on the road tonight, on site tomorrow. Then we start plugging them in. They will take over from the SBG3 generators by Fri.
We are verifying 20KV + 240V in SBG1/SBG4 to take over from the generators by Tue.

Update March 12, 11am
4/4
We will start repowering SBG1/SBG4 by the end of next week, Fri 19. It will take 2 days to have all servers UP.
We will start repowering SBG3 by the middle of next week, Wed 17. It will take 6-8 days to have all servers UP.

State of Backup Service (Free or Paid) for the SBG customers:
1/5

- FTP Backup in SBG (Free/Paid) for VPS and Baremetal: the data is in RBX. You have full access.

State of Backup Service (Free or Paid) for the SBG customers:
2/5

- pCS in SBG: We hope to restore the pCS cluster in SBG next week. The servers are in SBG1 + SBG3. We may have some bad news, which is why the final status has to be confirmed in the coming days.

State of Backup Service (Free or Paid) for the SBG customers:
3/5

- Paid Backup VPS & PCI: 80% of the data is on pCS in SBG; please read the previous tweet about the state of pCS in SBG. 20% of the data was on pCA, which was in SBG2.

State of Backup Service (Free or Paid) for the SBG customers:
4/5

- Free/Paid Backup pCC in SBG1 was hosted in a separate room of SBG1. Both rooms are destroyed.

- Free/Paid Backup pCC in SBG3: all data seems to be safe.

State of Backup Service (Free or Paid) for the SBG customers:
5/5

We recommend restarting the service in one of our other DCs (RBX or GRA). Free months will be applied asap.

Update March 13, 3:30am
The last emails are being sent to all customers with the current information about the state of the primary data and the backup (if the service was subscribed) for Baremetal, Public Cloud (pCI, pCS, pCA, K8S...), Hosted Private Cloud (pCC), NAS...

Update March 13, 3pm

We are creating a page for
- the services hosted in SBG
- the service backups (FTP Backup, VPS Automated Backup, VPS Snapshot SSD/Cloud, Instance Backup/Snapshot, Volume Snapshot/Backup)
- the internal backup
- the state of the backups
- next steps / ETA

Update March 14, 1pm
1/3
Still working on restarting the A optical network in SBG1. We hope to have network A UP today. It will allow us to redeploy the internal tools locally.
Reconstruction of the B optical network will take 2 days. A lot of fibers have to be connected.

Update March 14, 1pm
2/3
240V in SBG1 is ready to be tested. Tomorrow 20KV for SBG1 will be UP.
The 20KV in SBG3 is in progress: 2 days.
Watercooling SBG1/3/4 checked and protected.
Cleaning and drying are in progress.

Update March 14, 1pm
3/3
60 techs are working on the site. Still a lot to do, but it’s going faster than expected. We are keeping the same ETA, and maybe we will be faster.

A webpage with the final state of:
- primary data
- internal backup (not contractual)
- service backup
is in progress.

Update March 15, 0:30am

Here is the URL with
- the state of each service in each DC
- the state of our Internal Backup (not contractual)
- the state of the Service Backup (if the customer took the service)

EN:
https://t.co/w3bh6gRIHB

FR:
https://t.co/c4MMepE6tw

Update March 15, 1pm
1/5
Power:
- 20KV in SBG1 is UP. 240V is UP.
- We are restarting UPS3 for the network room A (SBG1).
- We are working on UPS2+UPS4 for SBG4

Update March 15, 1pm
2/5
Power:
- still working on 20KV in SBG3. Will be UP tomorrow.
- check of the UPS in SBG3 has started

Update March 15, 1pm
3/5
Network:
- fibers Room A<>B ongoing
- fibers Room A<>SBG1/SBG4 will be started asap
- fibers Room A<>SBG3 Floor 1/2/3/4/5 ongoing
- fibers Room B<>SBG3 Floor 1/2/3/4/5 ongoing

Update March 15, 1pm
4/5
Network:
- Optical network FRA<>SBG is UP
- Optical network PAR<>SBG is ongoing
- Routers in Network Room A are UP
- Routers in Network Room B are in place, waiting for power.

Update March 15, 1pm
5/5
Restarting the servers:
- the servers in some rooms have to be cleaned up because of the smoke: at least SBG3/Floor4+5, SBG1/61E+62E
- all other servers have to be inspected for pollution risks
- we check the watercooling and then we start booting

https://t.co/1wU0eGdrUD

https://t.co/zKoejNHKnB

Update March 17, 1pm

1/4
Power SBG1/3/4:
20KV: OK
Transformer: OK
TGBT (main low-voltage switchboard): OK
UPS: OK
Fuses: OK
Go ! :)

Network:
We restarted the network in SBG3. We need to rebuild the part of the internal network that was destroyed in SBG2.

We restarted the first rack in SBG1. It works fine. We will continue.

Update March 17, 1pm

2/4
Cleanup of the servers:
We tested the servers in SBG3: 1/3 of them have to be cleaned up; 2/3 can be restarted asap.
The cleanup process takes 12-16 hours per server. We will run it on all 5 floors in parallel. The tests start now.

Update March 17, 1pm

3/4
We need to work on the network in SBG4.

We are looking for a special process to clean up the servers in SBG1/61E-62E. Some servers in SBG3/Floor 5 have the same issue.

Update March 17, 1pm

4/4
To restart pCC/HPC, we need the pCC-Masters that manage vSphere. 1/3 are there, 2/3 have to be rebuilt. It’s a long process that we are looking to accelerate.

To restart OpenStack, we need to rebuild OpenStack’s control plane: <24H.

Update March 17, 4:30pm

The first rack with customers’ servers is UP!

Update March 17, 11:30pm

A few racks are UP. https://t.co/rvBRpNCgVj

Wed Mar 17 22:33:44 +0000 2021