Migrating to Self-Hosted 3scale API Management on ROSA: A Kubernetes Journey

I was tasked to migrate from a Red Hat-hosted 3scale portal to a self-hosted version in ROSA (Red Hat OpenShift Service on AWS). This presented quite a challenge as my knowledge in Kubernetes was mostly theoretical, based on studying for the Kubernetes and Cloud Native Associate (KCNA) certification exam.

The goal was to recreate a self-hosted version of 3scale using an operator in ROSA, but what I thought would be a straightforward deployment turned into a valuable learning experience.

What is Red Hat-Managed/hosted 3scale?


When using Red Hat-hosted 3scale (also known as "SaaS" or managed 3scale), all infrastructure complexities are abstracted away. Red Hat handles the deployment, maintenance, updates, and scaling of the platform.

As a user, you simply access a provided portal URL and focus on managing your APIs rather than worrying about the underlying infrastructure. Your daily tasks revolve around the actual API management activities like adding backends, configuring products, creating applications, setting up authentication, and managing rate limits.

It's a convenient option that requires minimal operational overhead, allowing your team to focus on API strategy rather than platform management.


What is self-hosted 3scale?


In contrast, self-hosted 3scale brings both flexibility and responsibility. You gain complete control over your deployment configuration, integration with internal systems, customization options, and data locality.

Since the infrastructure runs on Kubernetes (in my case, ROSA - Red Hat OpenShift Service on AWS), you have access to all the native Kubernetes capabilities for scaling, monitoring, and management.

However, this freedom comes with the need to manage the entire application lifecycle within the Kubernetes ecosystem: installation via operators or templates, configuration through custom resources, scaling via horizontal pod autoscalers, implementing backup strategies, and handling upgrades.

You're responsible for ensuring high availability with proper pod distribution, performance tuning through resource allocation, and troubleshooting any issues that arise in both the 3scale application components and the underlying Kubernetes resources.
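
For example, even something as routine as scaling becomes your job. A minimal sketch, assuming the default 3scale DeploymentConfig names (the thresholds here are illustrative and should be tuned for your workload):

# Hypothetical example: autoscale the production gateway between 1 and 3 replicas at 75% CPU
oc autoscale dc/apicast-production --min 1 --max 3 --cpu-percent=75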

The Migration


Migrating from managed to self-hosted represented a significant shift in responsibilities, and I was about to discover just how much Red Hat had been handling behind the scenes.

This blog post documents a real-world troubleshooting journey in which we encountered and overcame six significant challenges:


  1. Missing Routes for Admin Access


  2. DNS resolution issues preventing access to Red Hat's container registry


  3. Architecture mismatch between my ARM-based MacBook and the x86_64 container images required for deployment


  4. PVC Access Mode Issues


  5. Resource Constraints


  6. Missing Service for App Components

By sharing this experience, I hope to help others who might encounter similar issues during their deployment process, especially those who are transitioning from theoretical Kubernetes knowledge to practical application.

The Initial Deployment Attempt


We started by creating a dedicated namespace for our 3scale deployment:


oc create namespace 3scale-backup

After switching to this namespace (oc project 3scale-backup), we downloaded the 3scale API Management Platform template:


curl -o amp.yml [3SCALE-AMP-TEMPLATE-URL]


Then we tried to deploy 3scale using this template:


oc new-app --file=amp.yml \
--param WILDCARD_DOMAIN=apps.[domain of your openshift].openshiftapps.com \
--param ADMIN_PASSWORD=password123
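
A small optional sanity check we could have run first, assuming the downloaded amp.yml: list the template's parameters so you know exactly which values can be overridden. This prints the parameter table and creates nothing on the cluster:

oc process -f amp.yml --parameters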

The template processing appeared successful, creating numerous resources:

  • Imagestreams
  • Deployment configs
  • Services
  • Routes
  • Persistent volume claims
  • Secrets

oc get all -n [your namespace]



However, when checking the status of the pods, we noticed that many deployments were either not starting at all, erroring out in CrashLoopBackOff, or stuck in their initialization phases:


oc get pods



While some components like Redis and database pods were running fine, critical components like backend-listener, backend-worker, and backend-cron were not deploying at all.



The system components were also failing during initialization.

Challenge 1: Missing Routes for Admin Access


Our first challenge was that the admin portal URL, https://3scale-admin.apps.[YOUR-ROSA-DOMAIN].openshiftapps.com, was showing "Application is not available".

The reason was simple - the template had not created the necessary routes for our self-hosted 3scale services. We manually created them:


oc create route edge system-admin --service=system-provider --hostname=3scale-admin.apps.[YOUR-ROSA-DOMAIN].openshiftapps.com
oc create route edge system-developer --service=system-developer --hostname=3scale.apps.[YOUR-ROSA-DOMAIN].openshiftapps.com
oc create route edge system-master --service=system-master --hostname=master.apps.[YOUR-ROSA-DOMAIN].openshiftapps.com

However, after creating the routes, the admin portal still wasn't accessible. Digging into the logs with oc logs system-app-1-hook-pre, we discovered a more fundamental issue.

Challenge 2: DNS Resolution Issues


The pre-deployment hook was failing with a specific error:


ThreeScale::Core::APIClient::ConnectionError: connection refused: backend-listener:80

Further investigation revealed that the backend components weren't deployed at all. When checking the deployment configs:


oc get dc/backend-listener
NAME               REVISION   DESIRED   CURRENT   TRIGGERED BY
backend-listener   0          1         0         config,image(amp-backend:2.12)

We saw that backend-listener, backend-worker, and backend-cron had REVISION 0 and CURRENT 0, indicating they hadn't been deployed.



The root cause was found in the imagestream:


oc describe imagestream amp-backend

This showed an error:


error: Import failed (InternalError): Internal error occurred: registry.redhat.com/3scale-amp2/backend-rhel8:3scale2.12: Get "[registry URL]": dial tcp: lookup registry.redhat.com on 100.10.0.11:23: no such host

Our OpenShift cluster couldn't resolve the hostname registry.redhat.com due to DNS issues. This was confirmed by attempting to run:


nslookup registry.redhat.com

Which returned "No answer" from the DNS server.
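
A rough in-cluster check, if you want one: run a throwaway pod and try to reach the same hostname from there. This is not exactly the path the image import controller uses, but it can still surface cluster-wide DNS problems. It assumes the ubi8 image itself can still be pulled:

# Launch a temporary pod and attempt to resolve/connect to the registry host
oc run dns-check --rm -it --restart=Never \
  --image=registry.access.redhat.com/ubi8/ubi -- \
  curl -sv https://registry.redhat.com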

In practical terms, the cluster simply could not pull the required images from the Red Hat registry for these pods.

So our workaround was to pull the images manually with Docker on a local machine, then push them into the cluster's internal OpenShift registry under our namespace.

Challenge 3: Architecture Mismatch


While working to address the DNS issues, we discovered another challenge: we were trying to pull Red Hat's container images on an ARM64-based machine (an Apple Silicon Mac), but the images are only published for the x86_64 architecture.

When attempting to pull the images directly:


docker login [RedHat Credentials]
docker pull registry.redhat.io/3scale-amp2/backend-rhel8:3scale2.12

We received:


no matching manifest for linux/arm64/v8 in the manifest list entries
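
One way to confirm which architectures a tag actually provides, assuming a reasonably recent Docker CLI, is to inspect its manifest list without pulling anything:

# The "platform" entries list the published architectures; no arm64 variant appears
docker manifest inspect registry.redhat.io/3scale-amp2/backend-rhel8:3scale2.12
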
The Solution Process


We implemented a multi-step solution to overcome these challenges:

Step 1: Authentication with Red Hat Registry

First, we logged in to the Red Hat Container Registry:


docker login registry.redhat.io

Step 2: Architecture-aware Image Pulling

Because I was running Docker Desktop on an Apple Silicon Mac, a plain docker pull would not match the x86_64 architecture that the cluster's nodes (and these images) require.

To overcome the architecture mismatch, we explicitly specified the platform when pulling:


docker pull --platform linux/amd64 registry.redhat.io/3scale-amp2/backend-rhel8:3scale2.12

This successfully pulled the x86_64 image. (Docker Desktop on macOS can still run such images locally through emulation such as Rosetta 2, but for our purposes we only needed to retag and push it.)
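
To be safe, you can verify that the local copy really is the x86_64 variant before retagging and pushing it:

# Should print linux/amd64
docker image inspect --format '{{.Os}}/{{.Architecture}}' \
  registry.redhat.io/3scale-amp2/backend-rhel8:3scale2.12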



Step 3: Exposing the OpenShift Registry

To make our OpenShift registry accessible:


oc patch configs.imageregistry.operator.openshift.io/cluster --patch '{"spec":{"defaultRoute":true}}' --type=merge

Step 4: Pushing Images to Internal Registry

We pushed the pulled images to our OpenShift internal registry:


# Get credentials
TOKEN=$(oc whoami -t)
REGISTRY=$(oc get route default-route -n openshift-image-registry --template='{{ .spec.host }}')

# Login to registry
docker login -u kubeadmin -p $TOKEN $REGISTRY

# Tag and push
docker tag registry.redhat.io/3scale-amp2/backend-rhel8:3scale2.12 $REGISTRY/[namespace]/amp-backend:2.12
docker push $REGISTRY/[namespace]/amp-backend:2.12

Step 5: Updating ImageStreams

We updated the imagestream to point to our locally pushed image:


oc tag $REGISTRY/[namespace]/amp-backend:2.12 amp-backend:2.12 --source=docker

This automatically triggered the deployment due to the ImageChange trigger on the deployment config.
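
To confirm the tag resolved and the triggered rollout is progressing, a couple of standard checks (nothing 3scale-specific here):

# The imagestream tag should now point at the image in the internal registry
oc describe imagestream amp-backend

# Follow the rollout that the ImageChange trigger kicked off
oc rollout status dc/backend-listener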

Results


After implementing these steps for the backend-listener component, the deployment began successfully (at least for this resource!).

Challenge 4: PVC Access Mode Issues


We then went back to the self-hosted 3scale Admin Portal and found that it was still not working.

Checking the pods showed that several of them were having issues.



The deployment logs showed that several deployments were failing because their pods took too long to become available (timeout errors):

  • apicast-production-1-deploy: "pods took longer than 1800 seconds to become available"
  • system-sidekiq-1-deploy: "pods took longer than 1200 seconds to become available"
  • system-sphinx-1-deploy: "pods took longer than 1200 seconds to become available"

This typically happens when pods are stuck in a pending or initializing state for too long.

So we checked the logs of the problematic pods, as well as the PVCs.


# Check logs for apicast-production deployment
oc logs apicast-production-1-deploy

# Check logs for system-sidekiq deployment
oc logs system-sidekiq-1-deploy

# Check logs for system-sphinx deployment
oc logs system-sphinx-1-deploy

# Check events for the pending pod
oc describe pod system-app-1-hook-pre

We discovered a storage issue where the system-storage PVC was failing to provision:


oc get pvc
NAME                    STATUS    VOLUME                                      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
backend-redis-storage   Bound     pvc-ss987s5b-026a-4srg-au97-549d8958933a    1Gi        RWO            gp3            53m
mysql-storage           Bound     pvc-72s43210-s033-4c8w-ar53-043bf3kk1496    1Gi        RWO            gp3            53m
system-redis-storage    Bound     pvc-s3r1111d2-4d57-41dd-9066-c488eda666d4   1Gi        RWO            gp3            53m
system-storage          Pending

The error was related to access modes:


failed to provision volume with StorageClass "gp3": rpc error: code = InvalidArgument desc = Volume capabilities MULTI_NODE_MULTI_WRITER not supported. Only AccessModes[ReadWriteOnce] supported.
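
In other words, the template requested ReadWriteMany (RWX) for system-storage, while the EBS-backed gp3 storage class on ROSA only supports ReadWriteOnce. A quick way to see what the stuck claim asked for (if you genuinely need shared access, an RWX-capable class such as EFS would be the alternative):

# Prints the access modes the claim requested, e.g. ["ReadWriteMany"]
oc get pvc system-storage -o jsonpath='{.spec.accessModes}{"\n"}'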

We fixed it by creating a new PVC with the correct access mode:


# First, delete pods using this PVC
oc delete pod system-app-1-hook-pre

# Back up the current PVC definition
oc get pvc system-storage -o yaml > system-storage-pvc.yaml

# Delete the stuck PVC
oc delete pvc system-storage

# Create a new PVC with the correct settings
oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: system-storage
  namespace: [Namespace]
  labels:
    app: 3scale-api-management
    threescale_component: system
    threescale_component_element: app
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: gp3
EOF

After fixing the PVC issue, restart the deployments:


oc rollout retry dc/system-app
oc rollout retry dc/apicast-production
oc rollout retry dc/backend-listener
oc rollout retry dc/system-sidekiq
oc rollout retry dc/system-sphinx

This fixed the PVC issue, and the system-storage PVC was now correctly bound to a volume:


oc get pvc

NAME                    STATUS   VOLUME                                      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
backend-redis-storage   Bound    pvc-ss987s5b-026a-4srg-au97-549d8958933a    1Gi        RWO            gp3            53m
mysql-storage           Bound    pvc-72s43210-s033-4c8w-ar53-043bf3kk1496    1Gi        RWO            gp3            53m
system-redis-storage    Bound    pvc-s3r1111d2-4d57-41dd-9066-c488eda666d4   1Gi        RWO            gp3            53m
system-storage          Bound    pvc-2286196a-8885-490s-11c1-654320bd8a5a6   1Gi        RWO            gp3            116s

Challenge 5: Resource Constraints


Even after resolving the PVC issue, pods were still stuck in Pending state due to insufficient resources:


oc describe pod system-app-2-mr25z

...
Warning FailedScheduling 2m34s default-scheduler 0/9 nodes are available: 2 Insufficient cpu, 3 node(s) had untolerated taint {node-role.kubernetes.io/infra: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 6 node(s) had volume node affinity conflict.
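
Before shrinking the requests, it is worth checking what the worker nodes actually have free; a couple of standard commands (oc adm top needs cluster metrics to be available):

# Current usage per node (requires cluster metrics)
oc adm top nodes

# CPU and memory requests already committed on the workers
oc describe nodes -l node-role.kubernetes.io/worker= | grep -A 8 "Allocated resources"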

We reduced the resource requirements to make the pods fit on the available nodes:


oc patch dc/system-app -p '{"spec":{"template":{"spec":{"containers":[{"name":"system-master","resources":{"requests":{"cpu":"25m","memory":"400Mi"}}},{"name":"system-provider","resources":{"requests":{"cpu":"25m","memory":"400Mi"}}},{"name":"system-developer","resources":{"requests":{"cpu":"25m","memory":"400Mi"}}}]}}}}'
oc patch dc/apicast-production -p '{"spec":{"template":{"spec":{"containers":[{"name":"apicast-production","resources":{"requests":{"cpu":"25m","memory":"128Mi"}}}]}}}}'
oc patch dc/system-sidekiq -p '{"spec":{"template":{"spec":{"containers":[{"name":"system-sidekiq","resources":{"requests":{"cpu":"25m","memory":"250Mi"}}}]}}}}'
oc patch dc/system-sphinx -p '{"spec":{"template":{"spec":{"containers":[{"name":"system-sphinx","resources":{"requests":{"cpu":"25m","memory":"250Mi"}}}]}}}}'
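
A quick way to confirm the reduced requests actually landed on a DeploymentConfig before retrying the rollouts:

oc get dc system-app -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{": "}{.resources.requests}{"\n"}{end}'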

After applying these patches, we restarted the failed components:


# Retry system-sidekiq and system-sphinx deployments
oc rollout retry dc/system-sidekiq
oc rollout retry dc/system-sphinx

That got fixed too!!



➜ oc get services

NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
apicast-production   ClusterIP   xxx.xx.xxx.xxx   <none>        8080/TCP,8090/TCP   83m
apicast-staging      ClusterIP   xxx.xx.xxx.xxx   <none>        8080/TCP,8090/TCP   83m
backend-listener     ClusterIP   xxx.xx.xxx.xxx   <none>        3000/TCP            83m
backend-redis        ClusterIP   xxx.xx.xxx.xxx   <none>        6379/TCP            83m
system-developer     ClusterIP   xxx.xx.xxx.xxx   <none>        3000/TCP            83m
system-master        ClusterIP   xxx.xx.xxx.xxx   <none>        3000/TCP            83m
system-memcache      ClusterIP   xxx.xx.xxx.xxx   <none>        11211/TCP           83m
system-mysql         ClusterIP   xxx.xx.xx.xxx    <none>        3306/TCP            83m
system-provider      ClusterIP   xxx.xx.x.xxx     <none>        3000/TCP            83m
system-redis         ClusterIP   xxx.xx.xxx.xxx   <none>        6379/TCP            83m
system-sphinx        ClusterIP   xxx.xx.xxx.xxx   <none>        9306/TCP            83m
zync                 ClusterIP   xxx.xx.xxx.xx    <none>        8080/TCP            83m
zync-database        ClusterIP   xxx.xx.xx.xxx    <none>        5432/TCP            83m

➜ oc get routes

NAME                         HOST/PORT                                                             PATH         SERVICES             PORT      TERMINATION     WILDCARD
backend                      backend-3scale.apps.[YOUR-DOMAIN].openshiftapps.com                                backend-listener     http      edge/Allow      None
system-admin                 3scale-admin.apps.[YOUR-DOMAIN].openshiftapps.com                                  system-app           3000      edge/Allow      None
system-developer             3scale.apps.[YOUR-DOMAIN].openshiftapps.com                           /developer   system-app           3001      edge/Allow      None
system-master                master.apps.[YOUR-DOMAIN].openshiftapps.com                                        system-app           3002      edge/Allow      None
system-provider              3scale.apps.[YOUR-DOMAIN].openshiftapps.com                                        system-app           3000      edge/Allow      None
zync-3scale-api-hhhjs        api-3scale-apicast-production.apps.[YOUR-DOMAIN].openshiftapps.com                 apicast-production   gateway   edge/Redirect   None
zync-3scale-api-phh9n        api-3scale-apicast-staging.apps.[YOUR-DOMAIN].p1.openshiftapps.com                 apicast-staging      gateway   edge/Redirect   None
zync-3scale-master-nhhht     HostAlreadyClaimed                                                                  system-master        http      edge/Redirect   None
zync-3scale-provider-q9hh9   HostAlreadyClaimed                                                                  system-developer     http      edge/Redirect   None
zync-3scale-provider-shh6z   HostAlreadyClaimed                                                                  system-provider      http      edge/Redirect   None

Since the containers are starting up, it should be a matter of minutes before we can access the admin portal.
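
An easy way to follow this is to simply watch the pods until system-app reports 3/3 ready:

oc get pods -w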

We were just watching the pod status, and when system-app showed 3/3 ready, we tried accessing the admin portal at:


https://3scale-admin.apps.[YOUR-ROSA-DOMAIN].openshiftapps.com

But it was still UNAVAILABLE.

Challenge 6: Missing Service for App Components


Even after all pods were running, the admin portal was not accessible. The issue was that we created routes pointing to a service named "system-app" which didn't exist:


oc get routes

NAME               HOST/PORT                                           PATH         SERVICES     PORT   TERMINATION   WILDCARD
system-admin       3scale-admin.apps.[YOUR-DOMAIN].openshiftapps.com                system-app   3000   edge/Allow    None
system-developer   3scale.apps.[YOUR-DOMAIN].openshiftapps.com         /developer   system-app   3001   edge/Allow    None
system-master      master.apps.[YOUR-DOMAIN].openshiftapps.com                      system-app   3002   edge/Allow    None
system-provider    3scale.apps.[YOUR-DOMAIN].openshiftapps.com                      system-app   3000   edge/Allow    None

oc describe service system-app

Error from server (NotFound): services "system-app" not found

We fixed this by creating the missing service:


oc create -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: system-app
  namespace: [NAMESPACE]
  labels:
    app: 3scale-api-management
spec:
  ports:
    - name: provider
      port: 3000
      protocol: TCP
      targetPort: 3000
    - name: developer
      port: 3001
      protocol: TCP
      targetPort: 3001
    - name: master
      port: 3002
      protocol: TCP
      targetPort: 3002
  selector:
    deploymentConfig: system-app
  type: ClusterIP
EOF
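
A quick check that the new service actually selects the running system-app pod (the endpoints list should not be empty):

oc get endpoints system-app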

Final Result

After working through all these challenges, we finally had a fully operational 3scale deployment:


oc get pods

NAME                         READY   STATUS    RESTARTS      AGE
apicast-production-4-7hh00   1/1     Running   0             2m12s
apicast-staging-1-6hh00      1/1     Running   0             83m
backend-cron-2-6955a         1/1     Running   0             23m
backend-listener-1-5hh00     1/1     Running   0             26m
backend-redis-1-mhh005       1/1     Running   0             57m
backend-worker-2-lr8gb       1/1     Running   0             23m
system-app-3-7ln8g           3/3     Running   0             85s
system-memcache-1-xddig      1/1     Running   0             80m
system-mysql-1-ee4wt         1/1     Running   0             80m
system-redis-1-45hh0         1/1     Running   0             80m
zync-1-l7ghy                 1/1     Running   0             80m
zync-database-1-dt3l9        1/1     Running   0             80m
zync-que-1-wwri9             1/1     Running   2 (80m ago)   80m

With all components running, finally, we were able to access the 3scale admin portal and begin configuring our APIs.



Key Lessons Learned


  1. DNS Resolution is Critical: Ensure your OpenShift cluster can resolve external registry hostnames before attempting deployments that rely on them.


  2. Architecture Awareness: When working with enterprise container images on ARM-based development machines, be explicit about architecture requirements using the --platform flag.


  3. Manual Image Mirroring: In restricted environments, manually pulling and pushing images to an internal registry is a viable workaround.


  4. ImageStream Mechanics: Understanding how OpenShift's ImageStreams work is essential for troubleshooting deployment issues.


  5. Network Policies: In enterprise environments, network policies may restrict access to external registries, requiring coordination with network administrators.

Conclusion


Deploying complex solutions like 3scale API Management in restricted network environments or across architecture boundaries presents unique challenges. By understanding the underlying issues and implementing a systematic approach to manually mirror images, we were able to overcome these obstacles.

While this process requires more manual effort than a standard deployment, it demonstrates the flexibility of OpenShift's container management capabilities and provides a path forward for deployments in environments with similar restrictions.

For organizations facing similar challenges, we recommend:

  • Validating registry access before beginning deployment (see the sketch after this list)
  • Having a strategy for cross-architecture container management
  • Understanding the image mirroring process for restricted environments
  • Working closely with network administrators to address connectivity issues
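
For the first point, a hedged pre-flight sketch: try importing a small image through an imagestream before processing the 3scale template. The imagestream name here is arbitrary, and the test import is deleted afterwards:

# Confirm the cluster can import from the external registry (names are illustrative)
oc import-image preflight-check --from=registry.redhat.io/ubi8/ubi-minimal:latest --confirm
oc describe imagestream preflight-check | grep -i import
oc delete imagestream preflight-check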

With these considerations in mind, even the most challenging deployments can be successfully completed.

