Methodology & Data Standards

How we collect, validate, and publish salary data.

Transparency is not just a talking point. This page documents exactly where our data comes from, how it is processed, and what its limitations are, so you can make informed decisions about how to use it.

Last updated: April 2026 · Version 2.1 · Reviewed annually
01

Data Sources

Where our salary figures come from and how much weight each source carries.

Salaryvia draws from five distinct source categories. Each is weighted differently based on reliability, verification level, and sample representativeness. No single source dominates; diversity of inputs is a feature, not a flaw.

Community Submissions
Self-reported salaries from verified users. Each submission includes title, city, experience, work type, and total compensation breakdown.
Primary source
Government Wage Records
Bureau of Labor Statistics Occupational Employment Statistics, H-1B visa disclosure data, and state wage boards.
Primary source
Partner HR Surveys
Anonymized compensation surveys from HR departments and professional associations in exchange for aggregate data access.
Secondary source
Job Posting Analysis
Salary ranges from job listings with disclosed pay. Used for trend signals only, not for absolute benchmarks, due to known posting-vs-actual discrepancies.
Supplemental
Public Salary Disclosures
Pay transparency data from companies operating in states with pay disclosure requirements (CA, CO, NY, WA, IL, and others).
Primary source

Job posting salary ranges are used as trend indicators, never as the basis for a published median. Studies show posted salary ranges can understate actual median pay by 8–22% depending on the industry.

02

Salary Calculation

How raw data points become the percentiles and medians you see on each page.

Published salary figures are not averages of all available data. They are produced through a multi-step pipeline designed to reduce noise, weight sources appropriately, and flag low-confidence outputs. A condensed code sketch of the pipeline follows the step list below.

1
Raw collection & deduplication
Submissions from the same user within 12 months are collapsed to the most recent. Cross-source deduplication then removes records that, with high probability, appear in more than one input set.
2
Outlier detection
Records more than 3 standard deviations from the trimmed mean for their job/city/level cohort are flagged for review. Confirmed outliers are excluded; ambiguous cases are weighted down, not removed.
3
Source weighting
Community submissions and public disclosures receive full weight. Government records receive 0.85× weight (methodological lag). Job posting data receives 0.4× weight (known posting-vs-actual discrepancies).
4
Percentile calculation
We publish P10, P25, P50 (median), P75, and P90 using weighted percentile interpolation across the validated sample. Mean is available but not the primary figure; medians are more robust to outliers in compensation data.
5
Minimum sample gate
Percentiles are only published if the validated cohort has at least 15 records for P50, 25 records for P25/P75, and 40 records for P10/P90. Below these thresholds, figures are either suppressed or shown with a low-confidence warning.
6
Confidence scoring
Each output receives a confidence score (0–100) factoring in sample size, source diversity, geographic precision, recency, and outlier rate. Scores below 40 are displayed with a visible warning label.
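
To make steps 2 through 5 concrete, here is a minimal Python sketch (deduplication and confidence scoring are omitted for brevity). It is illustrative, not our production code: the function names and data shapes are simplified, and the partner-survey weight is a placeholder because it is not specified in this document. The remaining constants (the 3 SD rule, the source weights, and the sample gates) come directly from the steps above.

```python
from dataclasses import dataclass
from statistics import mean, stdev

# Step 3: source weights. The partner-survey weight is NOT stated in
# this document; 0.7 is a hypothetical placeholder.
SOURCE_WEIGHTS = {
    "community": 1.0,
    "public_disclosure": 1.0,
    "government": 0.85,     # methodological lag
    "partner_survey": 0.7,  # ASSUMPTION: placeholder value
    "job_posting": 0.4,     # trend signal only
}

# Step 5: minimum validated-cohort sizes per percentile.
MIN_SAMPLE = {"P10": 40, "P25": 25, "P50": 15, "P75": 25, "P90": 40}

@dataclass
class Record:
    comp: float  # validated total compensation, USD
    source: str  # key into SOURCE_WEIGHTS

def flag_outliers(records, trim=0.05):
    """Step 2: separate records > 3 SD from the cohort's trimmed mean."""
    vals = sorted(r.comp for r in records)
    k = int(len(vals) * trim)
    core = vals[k:len(vals) - k] if k else vals
    m, sd = mean(core), stdev(core)
    kept = [r for r in records if abs(r.comp - m) <= 3 * sd]
    flagged = [r for r in records if abs(r.comp - m) > 3 * sd]
    return kept, flagged  # flagged records go to manual review

def weighted_percentile(records, q):
    """Step 4: weighted percentile with linear interpolation."""
    pts = sorted((r.comp, SOURCE_WEIGHTS[r.source]) for r in records)
    total = sum(w for _, w in pts)
    # Place each point at the midpoint of its weight span on a 0-100 axis.
    cum, pos = 0.0, []
    for _, w in pts:
        pos.append((cum + w / 2) / total * 100)
        cum += w
    if q <= pos[0]:
        return pts[0][0]
    if q >= pos[-1]:
        return pts[-1][0]
    i = next(j for j, p in enumerate(pos) if p >= q)
    frac = (q - pos[i - 1]) / (pos[i] - pos[i - 1])
    return pts[i - 1][0] + frac * (pts[i][0] - pts[i - 1][0])

def publish(records):
    """Step 5: apply the minimum-sample gate before publishing."""
    figures = {}
    for name, q in [("P10", 10), ("P25", 25), ("P50", 50),
                    ("P75", 75), ("P90", 90)]:
        if len(records) >= MIN_SAMPLE[name]:
            figures[name] = weighted_percentile(records, q)
        # Below threshold: suppress, or show with a low-confidence flag.
    return figures
```

The midpoint-position interpolation shown is one common way to compute weighted percentiles; the exact interpolation scheme used in production may differ.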
03

Cost of Living Context

How we adjust salary figures to make cross-city comparisons meaningful.

A $120,000 salary in San Francisco and a $120,000 salary in Dallas are not the same offer. Salaryvia surfaces cost-of-living context on every page so you can make accurate comparisons without doing the math yourself.

Our cost-of-living indices combine the Council for Community and Economic Research (C2ER) ACCRA index with MIT Living Wage data and regional housing cost surveys. The composite index is rebased so that 100 equals the US national average.

CoL-adjusted figures shown on salary pages are informational: they normalize purchasing power, not taxes or lifestyle costs. A full breakdown of what the CoL index does and does not include is available in the note below.

What CoL includes: housing (35%), transportation (15%), groceries (13%), utilities (10%), healthcare (10%), misc goods & services (17%). Not included: state income tax rates, school quality, commute time, or quality of life factors.
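
To illustrate how the composite and the adjustment work, here is a minimal sketch using the component weights listed above. The city, its component index values, and the function names are hypothetical.

```python
# Component weights from this methodology (they sum to 1.0).
COL_WEIGHTS = {
    "housing": 0.35, "transportation": 0.15, "groceries": 0.13,
    "utilities": 0.10, "healthcare": 0.10, "misc": 0.17,
}

def composite_col(components):
    """Weighted composite index; 100 = US national average."""
    return sum(COL_WEIGHTS[k] * v for k, v in components.items())

def col_adjusted(salary, city_index):
    """Express a salary in national-average purchasing power."""
    return salary * 100 / city_index

# Hypothetical high-cost city; each component is indexed so that
# 100 = national average for that category.
city = {"housing": 200, "transportation": 130, "groceries": 120,
        "utilities": 110, "healthcare": 115, "misc": 110}
idx = composite_col(city)                 # ~146
print(round(col_adjusted(120_000, idx)))  # ~82,000 national-equivalent dollars
```

Dividing by the composite index is what makes the San Francisco vs. Dallas comparison above meaningful: both offers end up expressed in the same national-average dollars.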

CoL indices are updated semi-annually. There is typically a 3–6 month lag between real-world cost changes and index updates. For rapidly changing markets (e.g., post-pandemic housing spikes), current CoL context may understate recent changes.

04

Update Frequency

How often different parts of our data are refreshed and what triggers a refresh.

Data freshness is a core quality metric. Every published salary page includes a visible last-updated date. Pages with data older than 12 months are automatically flagged pending review.
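
A minimal sketch of that 12-month staleness rule, with hypothetical field names:

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=365)  # 12 months

def needs_review(last_updated, today=None):
    """Flag a salary page whose data is older than 12 months."""
    today = today or date.today()
    return today - last_updated > STALE_AFTER

# A page last refreshed 14 months ago is flagged for review.
print(needs_review(date(2025, 2, 1), today=date(2026, 4, 1)))  # True
```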

Continuous
Community submissions
New submissions are queued for validation within 24 hours of receipt. Validated submissions enter the pipeline on a rolling basis.
Monthly
Percentile recalculation
All published percentiles are recalculated at the start of each month with the latest validated dataset. Significant changes trigger automatic review before publishing.
Quarterly
Government wage records
BLS OES data, H-1B disclosures, and state wage board records are ingested quarterly as they are published by their respective sources.
Semi-annually
Cost-of-living indices
C2ER ACCRA and composite CoL indices are updated twice per year, aligned to publication schedules of the underlying research organizations.
Annually
Methodology review
This methodology document and all underlying calculation weights are reviewed annually and updated when material changes are made.
05

Known Limitations

An honest account of where our data falls short and what that means for users.

No salary database is complete or perfectly representative. The following limitations are inherent to the nature of community-reported and publicly sourced compensation data. We document them so users can calibrate their expectations appropriately.

Self-selection bias. People who voluntarily submit salary data may not be representative of all workers in a role. Those who are highly compensated (to validate their position) or underpaid (to seek leverage) may be overrepresented relative to median earners.
Tech industry overrepresentation. Our community skews toward software engineering, product, and data roles. Compensation data for trades, healthcare, education, and government positions may have smaller sample sizes and lower confidence.
Geographic concentration. Major metro areas (San Francisco, New York, Seattle, Austin) have significantly more data than mid-size or rural markets. Salary estimates for smaller cities carry wider uncertainty bands even when confidence scores are moderate.
Recency lag in government data. BLS and similar datasets are published 6–18 months after the survey period. We apply a recency weight discount but cannot eliminate the lag entirely. Pages that rely heavily on government data may understate recent market changes.
Total compensation complexity. Equity, RSUs, bonuses, and benefits vary enormously and are difficult to standardize across submissions. Our "total compensation" figures include base + reported bonus + annualized equity but may not reflect the true value of equity grants at illiquid companies.
Job title inconsistency. Job titles are not standardized across companies or industries. A "Senior Engineer" at one company may be equivalent to a "Staff Engineer" at another. Our classification system attempts to normalize by responsibility level, but edge cases exist.

We recommend using Salaryvia figures as a starting point for research, not as the single basis for a negotiation. Combine our data with your own network, LinkedIn Salary, and direct conversations with peers for the most complete picture.

06

Editorial Principles

The rules that govern how data is published and what we will not do.

Salaryvia editorial decisions are made independently of commercial interests. The following principles are binding for all data publication decisions, regardless of business pressure.

No paid placement
No company may pay to have its compensation figures altered, highlighted, or ranked differently than the data supports.
Suppress, don't fabricate
When sample size is too low to publish a reliable estimate, we suppress the figure. We do not fill gaps with interpolation or AI estimates presented as real data.
Label uncertainty clearly
Confidence scores and uncertainty flags are displayed prominently, not buried in footnotes. Users must be able to see data quality without looking for it.
Update or remove
Data that cannot be refreshed within 18 months is removed from public display rather than left to mislead users with stale figures.
Privacy before precision
When k-anonymity cannot be guaranteed for a geographic or demographic cohort, the cohort is merged with a broader group rather than published at risk of re-identification; a minimal sketch of this gate follows this list.
Documented methodology
This document is the canonical description of our methodology. If our practices diverge from what is documented here, this document wins, and we update it.
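
To illustrate the privacy-before-precision gate referenced above, here is a minimal sketch. The threshold k and the single-level fallback are assumptions; this document does not state the actual values or cohort hierarchy we use.

```python
K_MIN = 10  # ASSUMPTION: the real k-anonymity threshold is not published here

def publishable_cohort(cohort, parent=None):
    """Return the narrowest cohort that satisfies k-anonymity.

    If the requested cohort (e.g. city + level + demographic slice)
    has fewer than K_MIN records, fall back to the broader parent
    cohort (e.g. metro + level). If that also fails, suppress rather
    than risk re-identification.
    """
    if len(cohort) >= K_MIN:
        return cohort
    if parent is not None and len(parent) >= K_MIN:
        return parent
    return None  # suppress entirely
```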