Methodology & Data Standards

How we collect, validate, and publish salary data.

Transparency is not just a talking point. This page documents exactly where our data comes from, how it is processed, and what its limitations are, so you can make informed decisions about how to use it.

Last updated: April 2026 · Version 2.1 · Reviewed annually
01

Data Sources

Where our salary figures come from and how much weight each source carries.

Salaryvia draws from five distinct source categories. Each is weighted differently based on reliability, verification level, and sample representativeness. No single source dominates; diversity of inputs is a feature, not a flaw.

Community Submissions
Self-reported salaries from verified users. Each submission includes title, city, experience, work type, and total compensation breakdown.
Primary source
Government Wage Records
Bureau of Labor Statistics Occupational Employment Statistics, H-1B visa disclosure data, and state wage boards.
Primary source
Partner HR Surveys
Anonymized compensation surveys from HR departments and professional associations in exchange for aggregate data access.
Secondary source
Job Posting Analysis
Salary ranges from job listings with disclosed pay. Used for trend signals only, not for absolute benchmarks, due to known posting-vs-actual discrepancies.
Supplemental
Public Salary Disclosures
Pay transparency data from companies operating in states with pay disclosure requirements (CA, CO, NY, WA, IL, and others).
Primary source

Job posting salary ranges are used as trend indicators, never as the basis for a published median. Studies show posted salary ranges can understate actual median pay by 8–22% depending on the industry.

02

Salary Calculation

How raw data points become the percentiles and medians you see on each page.

Published salary figures are not averages of all available data. They are produced through a multi-step pipeline designed to reduce noise, weight sources appropriately, and flag low-confidence outputs. A condensed code sketch of the pipeline follows the step list below.

1
Raw collection & deduplication
Submissions from the same user within 12 months are collapsed to the most recent. Cross-source deduplication then removes records that, with high probability, appear in more than one input set.
2
Outlier detection
Records more than 3 standard deviations from the trimmed mean for their job/city/level cohort are flagged for review. Confirmed outliers are excluded; ambiguous cases are weighted down, not removed.
3
Source weighting
Community submissions and public disclosures receive full weight. Government records receive 0.85× weight (methodological lag). Job posting data receives 0.4× weight (known posting-vs-actual discrepancies).
4
Percentile calculation
We publish P10, P25, P50 (median), P75, and P90 using weighted percentile interpolation across the validated sample. Mean is available but not the primary figure; medians are more robust to outliers in compensation data.
5
Minimum sample gate
Percentiles are only published if the validated cohort has at least 15 records for P50, 25 records for P25/P75, and 40 records for P10/P90. Below these thresholds, figures are either suppressed or shown with a low-confidence warning.
6
Confidence scoring
Each output receives a confidence score (0–100) factoring in sample size, source diversity, geographic precision, recency, and outlier rate. Scores below 40 are displayed with a visible warning label.
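
To make steps 2 through 5 concrete, here is a minimal Python sketch (deduplication and confidence scoring are omitted for brevity). It is illustrative, not our production code: the function names and data shapes are simplified, and the partner-survey weight is a placeholder because it is not specified in this document. The remaining constants (the 3 SD rule, the source weights, and the sample gates) come directly from the steps above.

```python
from dataclasses import dataclass
from statistics import mean, stdev

# Step 3: source weights. The partner-survey weight is NOT stated in
# this document; 0.7 is a hypothetical placeholder.
SOURCE_WEIGHTS = {
    "community": 1.0,
    "public_disclosure": 1.0,
    "government": 0.85,     # methodological lag
    "partner_survey": 0.7,  # ASSUMPTION: placeholder value
    "job_posting": 0.4,     # trend signal only
}

# Step 5: minimum validated-cohort sizes per percentile.
MIN_SAMPLE = {"P10": 40, "P25": 25, "P50": 15, "P75": 25, "P90": 40}

@dataclass
class Record:
    comp: float  # validated total compensation, USD
    source: str  # key into SOURCE_WEIGHTS

def flag_outliers(records, trim=0.05):
    """Step 2: separate records > 3 SD from the cohort's trimmed mean."""
    vals = sorted(r.comp for r in records)
    k = int(len(vals) * trim)
    core = vals[k:len(vals) - k] if k else vals
    m, sd = mean(core), stdev(core)
    kept = [r for r in records if abs(r.comp - m) <= 3 * sd]
    flagged = [r for r in records if abs(r.comp - m) > 3 * sd]
    return kept, flagged  # flagged records go to manual review

def weighted_percentile(records, q):
    """Step 4: weighted percentile with linear interpolation."""
    pts = sorted((r.comp, SOURCE_WEIGHTS[r.source]) for r in records)
    total = sum(w for _, w in pts)
    # Place each point at the midpoint of its weight span on a 0-100 axis.
    cum, pos = 0.0, []
    for _, w in pts:
        pos.append((cum + w / 2) / total * 100)
        cum += w
    if q <= pos[0]:
        return pts[0][0]
    if q >= pos[-1]:
        return pts[-1][0]
    i = next(j for j, p in enumerate(pos) if p >= q)
    frac = (q - pos[i - 1]) / (pos[i] - pos[i - 1])
    return pts[i - 1][0] + frac * (pts[i][0] - pts[i - 1][0])

def publish(records):
    """Step 5: apply the minimum-sample gate before publishing."""
    figures = {}
    for name, q in [("P10", 10), ("P25", 25), ("P50", 50),
                    ("P75", 75), ("P90", 90)]:
        if len(records) >= MIN_SAMPLE[name]:
            figures[name] = weighted_percentile(records, q)
        # Below threshold: suppress, or show with a low-confidence flag.
    return figures
```

The midpoint-position interpolation shown is one common way to compute weighted percentiles; the exact interpolation scheme used in production may differ.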
03

Cost of Living Context

How we adjust salary figures to make cross-city comparisons meaningful.

A $120,000 salary in San Francisco and a $120,000 salary in Dallas are not the same offer. Salaryvia surfaces cost-of-living context on every page so you can make accurate comparisons without doing the math yourself.

Our cost-of-living indices combine the Council for Community and Economic Research (C2ER) ACCRA index with MIT Living Wage data and regional housing cost surveys. The composite index is rebased so that 100 equals the US national average.

CoL-adjusted figures shown on salary pages are informational: they normalize purchasing power, not taxes or lifestyle costs. A full breakdown of what the CoL index does and does not include is available in the note below.

What CoL includes: housing (35%), transportation (15%), groceries (13%), utilities (10%), healthcare (10%), misc goods & services (17%). Not included: state income tax rates, school quality, commute time, or quality of life factors.
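
To illustrate how the composite and the adjustment work, here is a minimal sketch using the component weights listed above. The city, its component index values, and the function names are hypothetical.

```python
# Component weights from this methodology (they sum to 1.0).
COL_WEIGHTS = {
    "housing": 0.35, "transportation": 0.15, "groceries": 0.13,
    "utilities": 0.10, "healthcare": 0.10, "misc": 0.17,
}

def composite_col(components):
    """Weighted composite index; 100 = US national average."""
    return sum(COL_WEIGHTS[k] * v for k, v in components.items())

def col_adjusted(salary, city_index):
    """Express a salary in national-average purchasing power."""
    return salary * 100 / city_index

# Hypothetical high-cost city; each component is indexed so that
# 100 = national average for that category.
city = {"housing": 200, "transportation": 130, "groceries": 120,
        "utilities": 110, "healthcare": 115, "misc": 110}
idx = composite_col(city)                 # ~146
print(round(col_adjusted(120_000, idx)))  # ~82,000 national-equivalent dollars
```

Dividing by the composite index is what makes the San Francisco vs. Dallas comparison above meaningful: both offers end up expressed in the same national-average dollars.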

CoL indices are updated semi-annually. There is typically a 3–6 month lag between real-world cost changes and index updates. For rapidly changing markets (e.g., post-pandemic housing spikes), current CoL context may understate recent changes.

04

Update Frequency

How often different parts of our data are refreshed and what triggers a refresh.

Data freshness is a core quality metric. Every published salary page includes a visible last-updated date. Pages with data older than 12 months are automatically flagged pending review.
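
A minimal sketch of that 12-month staleness rule, with hypothetical field names:

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=365)  # 12 months

def needs_review(last_updated, today=None):
    """Flag a salary page whose data is older than 12 months."""
    today = today or date.today()
    return today - last_updated > STALE_AFTER

# A page last refreshed 14 months ago is flagged for review.
print(needs_review(date(2025, 2, 1), today=date(2026, 4, 1)))  # True
```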

Continuous
Community submissions
New submissions are queued for validation within 24 hours of receipt. Validated submissions enter the pipeline on a rolling basis.
Monthly
Percentile recalculation
All published percentiles are recalculated at the start of each month with the latest validated dataset. Significant changes trigger automatic review before publishing.
Quarterly
Government wage records
BLS OES data, H-1B disclosures, and state wage board records are ingested quarterly as they are published by their respective sources.
Semi-annually
Cost-of-living indices
C2ER ACCRA and composite CoL indices are updated twice per year, aligned to publication schedules of the underlying research organizations.
Annually
Methodology review
This methodology document and all underlying calculation weights are reviewed annually and updated when material changes are made.
05

Known Limitations

An honest account of where our data falls short and what that means for users.

No salary database is complete or perfectly representative. The following limitations are inherent to the nature of community-reported and publicly sourced compensation data. We document them so users can calibrate their expectations appropriately.

Self-selection bias. People who voluntarily submit salary data may not be representative of all workers in a role. Those who are highly compensated (to validate their position) or underpaid (to seek leverage) may be overrepresented relative to median earners.
Tech industry overrepresentation. Our community skews toward software engineering, product, and data roles. Compensation data for trades, healthcare, education, and government positions may have smaller sample sizes and lower confidence.
Geographic concentration. Major metro areas (San Francisco, New York, Seattle, Austin) have significantly more data than mid-size or rural markets. Salary estimates for smaller cities carry wider uncertainty bands even when confidence scores are moderate.
Recency lag in government data. BLS and similar datasets are published 6–18 months after the survey period. We apply a recency weight discount but cannot eliminate the lag entirely. Pages that rely heavily on government data may understate recent market changes.
Total compensation complexity. Equity, RSUs, bonuses, and benefits vary enormously and are difficult to standardize across submissions. Our "total compensation" figures include base + reported bonus + annualized equity but may not reflect the true value of equity grants at illiquid companies.
Job title inconsistency. Job titles are not standardized across companies or industries. A "Senior Engineer" at one company may be equivalent to a "Staff Engineer" at another. Our classification system attempts to normalize by responsibility level, but edge cases exist.

We recommend using Salaryvia figures as a starting point for research, not as the single basis for a negotiation. Combine our data with your own network, LinkedIn Salary, and direct conversations with peers for the most complete picture.

06

Editorial Principles

The rules that govern how data is published and what we will not do.

Salaryvia editorial decisions are made independently of commercial interests. The following principles are binding for all data publication decisions, regardless of business pressure.

No paid placement
No company may pay to have its compensation figures altered, highlighted, or ranked differently than the data supports.
Suppress, don't fabricate
When sample size is too low to publish a reliable estimate, we suppress the figure. We do not fill gaps with interpolation or AI estimates presented as real data.
Label uncertainty clearly
Confidence scores and uncertainty flags are displayed prominently, not buried in footnotes. Users must be able to see data quality without looking for it.
Update or remove
Data that cannot be refreshed within 18 months is removed from public display rather than left to mislead users with stale figures.
Privacy before precision
When k-anonymity cannot be guaranteed for a geographic or demographic cohort, the cohort is merged with a broader group rather than published at risk of re-identification; a minimal sketch of this gate follows this list.
Documented methodology
This document is the canonical description of our methodology. If our practices diverge from what is documented here, this document wins, and we update it.
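
To illustrate the privacy-before-precision gate referenced above, here is a minimal sketch. The threshold k and the single-level fallback are assumptions; this document does not state the actual values or cohort hierarchy we use.

```python
K_MIN = 10  # ASSUMPTION: the real k-anonymity threshold is not published here

def publishable_cohort(cohort, parent=None):
    """Return the narrowest cohort that satisfies k-anonymity.

    If the requested cohort (e.g. city + level + demographic slice)
    has fewer than K_MIN records, fall back to the broader parent
    cohort (e.g. metro + level). If that also fails, suppress rather
    than risk re-identification.
    """
    if len(cohort) >= K_MIN:
        return cohort
    if parent is not None and len(parent) >= K_MIN:
        return parent
    return None  # suppress entirely
```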