Soft Skills
Ownership mindset
Passion for challenges
Engagement
Teamwork
Verbal and written communication
People
Collaboration
Positive attitude toward challenges
Trust
Empathy
Proactivity
Hard Skills
AWS
ECS
Task Definition
Daemon Scheduling
NLB
EKS
Fargate
Node Group
ECR
AMI - Golden Images
Target Groups
CLI K8S
Databases
DynamoDB
Redshift
ElastiCache
RDS
Storage
Glacier
EBS
Storage Gateway
S3
Network and Content Delivery
Route 53
CloudFront
Compute
EC2
ALB
SG
Launch Templates
ASG
ELB
Lambda
Management
CloudTrail
CloudWatch
Security and Identity
KMS
VPC
Route Tables
Shared Endpoints
VPC Endpoints
Subnets
WAF
IAM
Application Services
API GW
VPC Link
Messaging
SNS
SQS
Microservice Journey
CI/CD
GitLab
CodeBuild
CodePipeline
Argo CD
API Swagger
Argo Rollouts
Logs | Event
Fluentd
KEDA
Observability
Prometheus
AppDynamics
Kiali
Service Level Indicator
CloudWatch
Splunk
Kibana
Dynatrace
Elasticsearch / OpenSearch
Grafana
IaC
Ansible
Helm
YAML | JSON
CloudFormation
Terraform
Docker
Dockerfile
Compose
ServiceMesh
Istio - Objects
Istiod
Gateway
Virtual Services - Request Routing
DestinationRules - Traffic Shifting
Circuit Breaking
Authentication Policy ?
Retry Policy
Egress
Envoy Sidecar
Istio Operator
Chart/Istio
Kubernetes
Objects
Deployments
Service
ConfigMap
RBAC
DaemonSet
StatefulSet
Probe
Readiness
Liveness
Namespace
Operator
HPA
Pod
Secret
Ingress