Purpose:
To ensure the proper CIT employees are aware of and engaged in the effort to meet SLAs for critical incidents.
Scope:
Applies to Service Desk, VITM, VCIO, CSOC, Customer Success.
Responsibility:
On Call Analyst: Triages the call, determines the priority level, creates the swarm space, may contact SMEs as necessary, stays in touch with the user(s) who reported the issue, and may need to call the POC.
Escalation Tech: Decides whether it is necessary to contact the VITM.
Information Security Lead: Takes the lead on security incidents.
VITM: May be contacted after hours to assist with outlined approvals, may need to assist with drafting client communications, may need to assist with continued communication with POC(s) during business hours, posts any mass communication info in swarm space.
VCIO: May be contacted after hours to assist if VITM unavailable, may need to assist with drafting client communications, may need to assist with continued communication with POC(s) during business hours, posts any mass communication info in swarm space.
VITM Manager: May be contacted after hours to assist if VITM and VCIO unavailable.
Director of Client Success: Assists with sending out mass communications regarding outages.
Everyone: Opens a problem ticket if they find missing information.
Service Desk: Attaches any tickets that come in related to the first reported issue as a child ticket.
Definitions:
A Critical Incident is defined as follows:
Critical Business Impact: The entire company, a large group, a department, or VIP users are affected, causing a financial impact. A stoppage in major business processes, applications, or critical operations.
Urgency: Needs immediate attention to restore service and prevent or reduce a financial impact to the business. No possible workarounds.
Examples: network down, possible security incident, mission-critical systems inoperable, issues affecting payroll deadlines.
Procedure:
- A call is received after hours. It is triaged by the Service Desk on call analyst, who is responsible for determining impact and urgency, which together determine the priority level.
a. A Critical Incident is defined in the Definitions section above.
- Once an after-hours call has been determined to be a critical incident, the analyst will immediately create a SWARM space. The space will include the analyst, the escalation tech, VITM, VCIO, and SD manager. The title of the space will be the client name – ticket number – issue summary.
a. Example – GPW – TK13256 – PBA inaccessible
b. Reference KB00050560 (How to Create a SWARM space)
- A brief summary must be posted in the space immediately, stating what the issue is, what time it was reported, and any troubleshooting that has already occurred.
- If the critical incident is a security incident, the Information Security Lead must be added to the SWARM space immediately and contacted by phone. Keep calling the Information Security Lead until they are reached.
- Acknowledgement by the VITM/VCIO is not required during after-hours critical incidents unless the point of contact (POC) requests that the VITM be contacted or approvals are required. Approvals are only required for a reboot of a server or firewall that is not already offline or hard down. The escalation tech determines whether the VITM is needed; reasons include, but are not limited to, rebooting servers, approving a change, or being unable to reach the POC. If VITM action is required, the Service Desk analyst must call the VITM after they are added to the space. If the VITM is unreachable and has not returned the call within 15 minutes, escalation to the VCIO by phone is required. Tagging the VITM or VCIO in the space is not an appropriate form of escalation contact after hours. If contact has not been made with the VITM or VCIO after 30 minutes, the VITM manager is to be contacted by phone.
*The purpose of spinning up the SWARM space after hours is to document what is happening in real time for continued monitoring and future reference, and to ensure all client representation is aware of any ongoing problems. Being added to the space does not mean a person is required to engage after hours. If a person is actively needed after hours, they will be contacted by phone to assist.
- Troubleshooting will proceed as normal and may require the VITM to step in as the client SME to assist in resolving outages (e.g., licensing issues/expirations, applications lacking documentation). If an additional SME is needed, they can be added to the space and, because it is after hours, will need to be contacted by phone.
a. Reference KB00004142 SOP - On Call - Afterhours.
- The on call analyst is expected to remain in contact with the user(s) who reported the issue until resolution. The on call analyst may also need to contact the POC in the event of an unresolved outage.
- If the after-hours outage is going to continue into client business hours and mass communication to the client is necessary, the VITM/VCIO is responsible for drafting that communication and sending it out as soon as possible, with assistance from the Director of Client Success. If regular updates to the client POC(s) are required during business hours, those are also the responsibility of the VITM/VCIO. All updates sent to the client must be posted in the SWARM space so that analysts or SMEs working the issue know what has been sent to the client.
**If missing information is identified, it is the responsibility of the person who discovered the missing information to create a problem ticket and assign it to the VITM for updated documentation. This includes incorrect WI, missing WI, missing CI, missing vendor information, etc.
**It is the responsibility of the entire service team to attach any tickets that come in related to the first reported issue as a child ticket.
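As an illustration only, the SWARM space title convention and the after-hours phone escalation timeline described in the procedure above could be sketched as follows. The function names are hypothetical and are not part of any CIT tooling. The 15 and 30 minute thresholds come from the procedure text; measuring both from the first call to the VITM, and treating exactly 15 or 30 minutes as the point of escalation, are assumptions of this sketch.

```python
# Thresholds from the procedure text. Measuring both from the first call to
# the VITM is an assumption of this sketch, not something the SOP states.
VCIO_AFTER_MIN = 15
VITM_MANAGER_AFTER_MIN = 30


def swarm_space_title(client: str, ticket: str, summary: str) -> str:
    """Build a title as: client name – ticket number – issue summary."""
    parts = [client.strip(), ticket.strip(), summary.strip()]
    if not all(parts):
        raise ValueError("client, ticket, and summary are all required")
    return " – ".join(parts)


def next_phone_contact(minutes_since_first_vitm_call: float) -> str:
    """Return who should be phoned next while nobody has been reached yet."""
    if minutes_since_first_vitm_call < 0:
        raise ValueError("elapsed minutes cannot be negative")
    if minutes_since_first_vitm_call < VCIO_AFTER_MIN:
        return "VITM"
    if minutes_since_first_vitm_call < VITM_MANAGER_AFTER_MIN:
        return "VCIO"
    return "VITM Manager"


if __name__ == "__main__":
    print(swarm_space_title("GPW", "TK13256", "PBA inaccessible"))
    for elapsed in (0, 15, 30):
        print(elapsed, next_phone_contact(elapsed))
```

The sketch only covers who to phone and when. Whether the VITM is needed at all remains the escalation tech's decision, and tagging someone in the space never substitutes for a phone call after hours.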
Effectiveness Criteria:
The critical incident is responded to and resolved within the agreement-defined prioritization levels, and the client is clearly communicated with throughout the entire procedure.
References:
Priority and Response Matrix: KB00002890
Create a Problem from an Incident: KB00001007
How to Create a SWARM space: KB00050560
SOP - On Call - Afterhours: KB00004142
Process References:
- Link to process map.
Note: Please add KB relationships to core process, process. SOPs or other WIs on the right.