Prometheus integration
The Prometheus integration enables you to query and visualize Coder's platform metrics.
Requirements
- A Coder deployment on Kubernetes
- Prometheus Operator installed on your cluster
Configuration
Coder exposes Prometheus-formatted metrics on port 2112 of the coderd container. Use the following PodMonitor resource to connect the Prometheus Operator to this endpoint:
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: master-monitor
  namespace: coder
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: coderd
  podMetricsEndpoints:
    - port: prom-coderd
Workspace Metrics
Each Coder workspace has an agent that connects to a single coderd instance.
Each coderd instance includes all metrics from the workspaces it manages.
Workspace metrics all take this form:
coderd_workspace_<workspace_metric_name>{user_id="<user_id>",workspace_id="<workspace_id>"}
Because workspace IDs are unique to each workspace, these labels produce high-cardinality metrics, which can be problematic for some configurations. If specific workspace metrics are not of interest, or are causing issues, you can configure your metric scraping service to drop them.
Note that if a workspace connects to a new coderd instance (after a rebuild, network issue, Coder update, etc.), the metrics for that workspace move to the new coderd metrics endpoint. The labels on the new metrics will likely carry the new coderd pod name. When tracking a single workspace, track it only by workspace_id for the lifetime of the workspace, until it is deleted.
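One way to follow this advice is a recording rule that aggregates a workspace metric by workspace_id alone, so the result stays continuous when the workspace moves between coderd pods. The sketch below is illustrative only: the rule name `coder-workspace-rules` and the metric `coderd_workspace_example` are hypothetical placeholders (substitute a real `coderd_workspace_<workspace_metric_name>`). It also assumes your Prometheus Operator is configured to select PrometheusRule objects in the coder namespace.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: coder-workspace-rules        # hypothetical name
  namespace: coder
spec:
  groups:
    - name: coder-workspace
      rules:
        # coderd_workspace_example is a placeholder metric name.
        # Aggregating by workspace_id discards the pod and instance labels,
        # which change when the workspace reconnects to another coderd.
        - record: workspace_id:coderd_workspace_example:max
          expr: max by (workspace_id) (coderd_workspace_example)
```

`max` is used here as an assumption suited to gauge-style metrics: if the old and new coderd both report the workspace for a short overlap, the value is not double counted. A different aggregation may fit other metric types better.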
Drop workspace metrics config
See the Prometheus documentation on relabeling metrics. In this case, we drop all metrics that contain the workspace_id label.
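metric_relabel_configs:
  - source_labels: ["workspace_id"]
    # Match only series that carry a non-empty workspace_id. Without this,
    # the default regex (.*) also matches series that lack the label.
    regex: ".+"
    action: drop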
With the Prometheus Operator, we can add this configuration to the coderd PodMonitor spec:
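apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: master-monitor
  namespace: coder
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: coderd
  podMetricsEndpoints:
    - port: prom-coderd
      # metricRelabelings applies to scraped samples; relabelings would
      # instead act on the scrape target before any metrics are collected.
      metricRelabelings:
        - action: drop
          sourceLabels:
            - workspace_id
          # Only match series with a non-empty workspace_id label.
          regex: ".+"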
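As an alternative to matching on the workspace_id label, the workspace metrics can be dropped by name, since they all share the coderd_workspace_ prefix described above. The following is a minimal sketch under that assumption. It reuses the PodMonitor from this page and relies on the `metricRelabelings` field, which the Prometheus Operator applies to samples after they are scraped.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: master-monitor
  namespace: coder
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: coderd
  podMetricsEndpoints:
    - port: prom-coderd
      metricRelabelings:
        # __name__ holds the metric name; the regex is fully anchored,
        # so only names starting with coderd_workspace_ are dropped.
        - action: drop
          sourceLabels:
            - __name__
          regex: coderd_workspace_.+
```

Matching on the name keeps any non-workspace metric that might also carry a workspace_id label, whereas matching on the label drops every series that has it. Which behavior is preferable depends on your deployment.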
Coderd Metrics
Below is a list of the various metrics emitted by Coder's Prometheus endpoint:
Metric | Type | Description |
---|---|---|
coderd_agent_aggregator_agent_push_backlog | gauge | Total number of agent metric bundles waiting to be processed. |
coderd_agent_aggregator_collect_backlog | gauge | Total amount of gathers waiting to collect metrics. |
coderd_agent_aggregator_collect_nanoseconds | summary | Time taken to collect all metrics. |
coderd_agent_aggregator_count_total | gauge | Total number of agent metrics being reported by this coderd. |
coderd_agent_aggregator_delete_backlog | gauge | Total number of agents waiting to be deleted in aggregator. |
coderd_agent_aggregator_workspace_count_total | gauge | Total number of workspace agents pushing metrics to this coderd. |
coderd_api_concurrent_requests | gauge | The total number of concurrent API requests |
coderd_api_concurrent_websockets | gauge | The total number of concurrent API websockets |
coderd_api_request_latencies_ms | histogram | Latency distribution of requests in milliseconds |
coderd_api_requests_processed_total | counter | The total number of processed API requests |
coderd_api_websocket_durations_ms | histogram | Websocket duration distribution of requests in milliseconds |
coderd_background_workspace_build_duration_s | histogram | Duration distribution of workspace builds in seconds |
coderd_backgroundjob_completed_total | counter | Total number of jobs completed since startup. |
coderd_backgroundjob_current_enqueued_jobs | gauge | Current number of enqueued and not started background jobs. |
coderd_backgroundjob_enqueue_time_seconds | histogram | Histogram of total time taken by job type to transition from Enqueue to Running. |
coderd_backgroundjob_enqueued_total | counter | Total number of jobs enqueued. |
coderd_backgroundjob_execution_time_seconds | histogram | Histogram of total time taken by job type to transition from Running to Completed. |
coderd_backgroundjob_started_total | counter | Total number of jobs started. |
coderd_db_sql_queries_executed_total | counter | The total number of executed SQL queries |
coderd_db_sql_query_latencies_ms | histogram | Latency distribution of SQL queries in milliseconds |
coderd_license_expires_at_unix | gauge | Unix timestamp of the license expiry date. |
coderd_license_issued_at_unix | gauge | Unix timestamp of the license issue date. |
coderd_license_time_until_expires_days | gauge | Number of days until the license expires. |
coderd_license_user_count | gauge | Number of active (non-dormant) users. |
coderd_license_user_limit | gauge | Number of users allowed by the license. |
coderd_rtc_agent_listeners_concurrent | gauge | The total number of concurrent RTC agent listener websockets. |
coderd_rtc_client_connections_total | counter | The total number of RTC client connections. |
coderd_rtc_turn_connections_concurrent | gauge | The number of concurrent TURN connections. |
coderd_rtc_turn_connections_total | counter | The total number of TURN connections opened. |
coderd_rtc_workspace_connections_current | gauge | The number of concurrent wsnet workspace connections. |
coderd_rtc_workspace_connections_total | counter | The total number of wsnet workspace connections opened. |
go_gc_cycles_automatic_gc_cycles_total | counter | Count of completed GC cycles generated by the Go runtime. |
go_gc_cycles_forced_gc_cycles_total | counter | Count of completed GC cycles forced by the application. |
go_gc_cycles_total_gc_cycles_total | counter | Count of all completed GC cycles. |
go_gc_duration_seconds | summary | A summary of the pause duration of garbage collection cycles. |
go_gc_heap_allocs_by_size_bytes | histogram | Distribution of heap allocations by approximate size. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks. |
go_gc_heap_allocs_bytes_total | counter | Cumulative sum of memory allocated to the heap by the application. |
go_gc_heap_allocs_objects_total | counter | Cumulative count of heap allocations triggered by the application. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks. |
go_gc_heap_frees_by_size_bytes | histogram | Distribution of freed heap allocations by approximate size. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks. |
go_gc_heap_frees_bytes_total | counter | Cumulative sum of heap memory freed by the garbage collector. |
go_gc_heap_frees_objects_total | counter | Cumulative count of heap allocations whose storage was freed by the garbage collector. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks. |
go_gc_heap_goal_bytes | gauge | Heap size target for the end of the GC cycle. |
go_gc_heap_objects_objects | gauge | Number of objects, live or unswept, occupying heap memory. |
go_gc_heap_tiny_allocs_objects_total | counter | Count of small allocations that are packed together into blocks. These allocations are counted separately from other allocations because each individual allocation is not tracked by the runtime, only their block. Each block is already accounted for in allocs-by-size and frees-by-size. |
go_gc_pauses_seconds | histogram | Distribution of individual GC-related stop-the-world pause latencies. |
go_goroutines | gauge | Number of goroutines that currently exist. |
go_info | gauge | Information about the Go environment. |
go_memory_classes_heap_free_bytes | gauge | Memory that is completely free and eligible to be returned to the underlying system, but has not been. This metric is the runtime's estimate of free address space that is backed by physical memory. |
go_memory_classes_heap_objects_bytes | gauge | Memory occupied by live objects and dead objects that have not yet been marked free by the garbage collector. |
go_memory_classes_heap_released_bytes | gauge | Memory that is completely free and has been returned to the underlying system. This metric is the runtime's estimate of free address space that is still mapped into the process, but is not backed by physical memory. |
go_memory_classes_heap_stacks_bytes | gauge | Memory allocated from the heap that is reserved for stack space, whether or not it is currently in-use. |
go_memory_classes_heap_unused_bytes | gauge | Memory that is reserved for heap objects but is not currently used to hold heap objects. |
go_memory_classes_metadata_mcache_free_bytes | gauge | Memory that is reserved for runtime mcache structures, but not in-use. |
go_memory_classes_metadata_mcache_inuse_bytes | gauge | Memory that is occupied by runtime mcache structures that are currently being used. |
go_memory_classes_metadata_mspan_free_bytes | gauge | Memory that is reserved for runtime mspan structures, but not in-use. |
go_memory_classes_metadata_mspan_inuse_bytes | gauge | Memory that is occupied by runtime mspan structures that are currently being used. |
go_memory_classes_metadata_other_bytes | gauge | Memory that is reserved for or used to hold runtime metadata. |
go_memory_classes_os_stacks_bytes | gauge | Stack memory allocated by the underlying operating system. |
go_memory_classes_other_bytes | gauge | Memory used by execution trace buffers, structures for debugging the runtime, finalizer and profiler specials, and more. |
go_memory_classes_profiling_buckets_bytes | gauge | Memory that is used by the stack trace hash map used for profiling. |
go_memory_classes_total_bytes | gauge | All memory mapped by the Go runtime into the current process as read-write. Note that this does not include memory mapped by code called via cgo or via the syscall package. Sum of all metrics in /memory/classes. |
go_memstats_alloc_bytes | gauge | Number of bytes allocated and still in use. |
go_memstats_alloc_bytes_total | counter | Total number of bytes allocated, even if freed. |
go_memstats_buck_hash_sys_bytes | gauge | Number of bytes used by the profiling bucket hash table. |
go_memstats_frees_total | counter | Total number of frees. |
go_memstats_gc_sys_bytes | gauge | Number of bytes used for garbage collection system metadata. |
go_memstats_heap_alloc_bytes | gauge | Number of heap bytes allocated and still in use. |
go_memstats_heap_idle_bytes | gauge | Number of heap bytes waiting to be used. |
go_memstats_heap_inuse_bytes | gauge | Number of heap bytes that are in use. |
go_memstats_heap_objects | gauge | Number of allocated objects. |
go_memstats_heap_released_bytes | gauge | Number of heap bytes released to OS. |
go_memstats_heap_sys_bytes | gauge | Number of heap bytes obtained from system. |
go_memstats_last_gc_time_seconds | gauge | Number of seconds since 1970 of last garbage collection. |
go_memstats_lookups_total | counter | Total number of pointer lookups. |
go_memstats_mallocs_total | counter | Total number of mallocs. |
go_memstats_mcache_inuse_bytes | gauge | Number of bytes in use by mcache structures. |
go_memstats_mcache_sys_bytes | gauge | Number of bytes used for mcache structures obtained from system. |
go_memstats_mspan_inuse_bytes | gauge | Number of bytes in use by mspan structures. |
go_memstats_mspan_sys_bytes | gauge | Number of bytes used for mspan structures obtained from system. |
go_memstats_next_gc_bytes | gauge | Number of heap bytes when next garbage collection will take place. |
go_memstats_other_sys_bytes | gauge | Number of bytes used for other system allocations. |
go_memstats_stack_inuse_bytes | gauge | Number of bytes in use by the stack allocator. |
go_memstats_stack_sys_bytes | gauge | Number of bytes obtained from system for stack allocator. |
go_memstats_sys_bytes | gauge | Number of bytes obtained from system. |
go_sched_goroutines_goroutines | gauge | Count of live goroutines. |
go_sched_latencies_seconds | histogram | Distribution of the time goroutines have spent in the scheduler in a runnable state before actually running. |
go_sql_idle_connections | gauge | The number of idle connections. |
go_sql_in_use_connections | gauge | The number of connections currently in use. |
go_sql_max_idle_closed_total | counter | The total number of connections closed due to SetMaxIdleConns. |
go_sql_max_idle_time_closed_total | counter | The total number of connections closed due to SetConnMaxIdleTime. |
go_sql_max_lifetime_closed_total | counter | The total number of connections closed due to SetConnMaxLifetime. |
go_sql_max_open_connections | gauge | Maximum number of open connections to the database. |
go_sql_open_connections | gauge | The number of established connections both in use and idle. |
go_sql_wait_count_total | counter | The total number of connections waited for. |
go_sql_wait_duration_seconds_total | counter | The total time blocked waiting for a new connection. |
go_threads | gauge | Number of OS threads created. |
process_cpu_seconds_total | counter | Total user and system CPU time spent in seconds. |
process_max_fds | gauge | Maximum number of open file descriptors. |
process_open_fds | gauge | Number of open file descriptors. |
process_resident_memory_bytes | gauge | Resident memory size in bytes. |
process_start_time_seconds | gauge | Start time of the process since unix epoch in seconds. |
process_virtual_memory_bytes | gauge | Virtual memory size in bytes. |
process_virtual_memory_max_bytes | gauge | Maximum amount of virtual memory available in bytes. |
promhttp_metric_handler_requests_in_flight | gauge | Current number of scrapes being served. |
promhttp_metric_handler_requests_total | counter | Total number of scrapes by HTTP status code. |
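
The metrics in this table can also drive alerts. The sketch below shows a PrometheusRule that uses the license gauges listed above. The rule name, alert names, thresholds (30 days, 90% of seats), and `for` durations are illustrative assumptions, not recommendations from Coder, and the rule only takes effect if your Prometheus Operator selects PrometheusRule objects in this namespace.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: coder-license-alerts         # hypothetical name
  namespace: coder
spec:
  groups:
    - name: coder-license
      rules:
        # Fires when the license has fewer than 30 days remaining.
        - alert: CoderLicenseExpiringSoon
          expr: min(coderd_license_time_until_expires_days) < 30
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: Coder license expires in under 30 days.
        # Fires when active users exceed 90% of the licensed limit.
        # max() collapses per-pod series so the division needs no label matching.
        - alert: CoderLicenseSeatsNearlyExhausted
          expr: max(coderd_license_user_count) / max(coderd_license_user_limit) > 0.9
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: More than 90% of licensed Coder seats are in use.
```

Aggregating with `min` and `max` assumes every coderd replica reports the same license values. If that does not hold in your deployment, adjust the expressions accordingly.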