Developing good scholarly (alt)metrics.
To summarize, altmetrics should build on existing statistical and social science methods for developing reliable measures. The draft white paper from the NISO altmetrics project suggests many interesting potential action items, but does not yet incorporate, suggest or reference a framework for systematic definition or evaluation of metrics.
NISO recently offered an opportunity to comment on the draft recommendation from its ‘Altmetrics Standards Project’. MIT is a non-voting NISO member, and I am the current ‘representative’ to NISO. The following is my commentary on the draft recommendation. You may also be interested in reading the other commentaries on this draft.
Response to request for public comments on ‘NISO Altmetrics Standards Project White Paper’
Scholarly metrics should be broadly understood as measurement constructs applied to the domain of scholarship and research (broadly, any form of rigorous enquiry): its outputs, actors, and impacts (i.e., broader consequences), and the relationships among them. Most traditional formal scholarly metrics, such as the H-Index, Journal Impact Factor, and citation count, are relatively simple summary statistics applied to the attributes of a corpus of bibliographic citations extracted from a selection of peer-reviewed journals. The Altmetrics movement aims to develop more sophisticated measures, based on a broader set of attributes, and covering a deeper corpus of outputs.
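To illustrate how simple these traditional summary statistics are, the H-Index can be computed in a few lines. The following is a minimal sketch using the standard definition (the largest h such that h papers each have at least h citations); the function name and the sample citation counts are illustrative assumptions, not part of the Draft.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(ranked, start=1):
        if citations >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one author's papers.
print(h_index([10, 8, 5, 4, 3]))
```

Note that the only input is a list of citation counts: nothing about who cited the work, why, or in what context enters the calculation.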
As the Draft aptly notes, our current scholarly metrics, and the decision systems around them, are in general far from rigorous: “Unfortunately, the scientific rigor applied to using these numbers for evaluation is often far below the rigor scholars use in their own scholarship.”
The Draft takes a step towards a more rigorous understanding of altmetrics. Its primary contribution is to suggest a set of potential action items to increase clarity and understanding.
However, the Draft does not yet identify the key elements of a rigorous (or systematic) foundation for defining scholarly metrics, their properties, and their quality. Nor does the Draft identify key research in evaluation and measurement that could provide such a foundation. The aim of these comments is to start to fill this structural gap.
Informally speaking, good scholarly metrics are fit for use in a scholarly incentive system. More formally, most scholarly metrics are parts of larger evaluation and incentive systems, where the metric is used to support descriptive and predictive/causal inference, in support of some decision.
Defining metrics formally in this way also helps to clarify what characteristics of metrics are important for determining their quality and usefulness.
– Characteristics supporting any inference. Classical test theory is well developed in this area. A useful metric supports some form of inference, and reliable inference requires reliability. Informally, good metrics should yield similar results across repeated measurements of the same purported phenomenon.
– Characteristics supporting descriptive inference. Since an objective of most incentive systems is descriptive, good measures must have appropriate measurement validity.  In informal terms, all measures should be internally consistent; and the metric should be related to the concept being measured.
– Characteristics supporting prediction or intervention. Since the objective of most incentive systems is both descriptive and predictive/causal inference, good measures must aid accurate and unbiased inference. In informal terms, the metric should demonstrably be able to increase the accuracy of predicting something relevant to scholarly evaluation.
– Characteristics supporting decisions. Decision theory is well developed in this area: the usefulness of a metric depends on the cost of computing the metric and the value of the information that the metric produces. The value of the information depends on the expected value of the optimal decisions that would be produced with and without that information. In informal terms, good metrics provide information that helps one avoid costly mistakes, and good metrics cost less than the expected cost of the mistakes one avoids by using them.
– Characteristics supporting evaluation systems. This is a more complex area, but the fields of game theory and mechanism design are most relevant. Measures that are used in a strategic context must be resistant to manipulation, either by (a) requiring extensive resources to manipulate, (b) requiring extensive coordination across independent actors to manipulate, or (c) incentivizing truthful revelation. Trust engineering is another relevant area: characteristics such as transparency, monitoring, and punishment of bad behavior, among other systems factors, may have substantial effects.
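The decision-theoretic characteristic above (the value of a metric is the difference between the expected value of the best decision with and without it) can be made concrete with a small worked example. The following is a minimal sketch under stated assumptions, not a method taken from the Draft: it assumes a hypothetical funding decision with two actions and two states, and a metric that emits a binary high/low signal with a given sensitivity and specificity. All names and payoff numbers are illustrative.

```python
def value_of_metric(prior_good, payoff, sensitivity, specificity):
    """Expected gain from observing a binary metric before deciding.

    payoff[action][state] is the payoff of taking `action` when the true
    state is `state` ('good' or 'bad'). The metric reports 'high' with
    probability `sensitivity` when the state is good, and 'low' with
    probability `specificity` when the state is bad.
    """
    prior = {"good": prior_good, "bad": 1.0 - prior_good}

    def best_expected_payoff(belief):
        # Expected payoff of the optimal action under the given belief.
        return max(
            sum(belief[s] * payoff[action][s] for s in belief)
            for action in payoff
        )

    without_metric = best_expected_payoff(prior)

    p_high_given = {"good": sensitivity, "bad": 1.0 - specificity}
    with_metric = 0.0
    for signal in ("high", "low"):
        likelihood = {
            s: p_high_given[s] if signal == "high" else 1.0 - p_high_given[s]
            for s in prior
        }
        p_signal = sum(prior[s] * likelihood[s] for s in prior)
        if p_signal == 0.0:
            continue  # this signal never occurs, so it contributes nothing
        # Bayes' rule: update the belief after seeing the signal.
        posterior = {s: prior[s] * likelihood[s] / p_signal for s in prior}
        with_metric += p_signal * best_expected_payoff(posterior)

    return with_metric - without_metric


# Hypothetical payoffs: funding good work pays off, funding bad work is costly.
payoff = {
    "fund": {"good": 100.0, "bad": -100.0},
    "decline": {"good": 0.0, "bad": 0.0},
}
print(value_of_metric(0.5, payoff, sensitivity=0.8, specificity=0.8))
```

Under these assumptions, a metric is worth adopting only when the returned value exceeds the cost of computing it. A metric that carries no information (sensitivity and specificity both 0.5) has a value of zero however cheap it is.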
The above characteristics comprise a large part of the scientific basis for assessing the quality and usefulness of scholarly metrics. They are necessarily abstract, but closely related to the categories of action items already in the report, in particular Definitions; Research Evaluation; Data Quality; and Grouping. Specifically, we recommend adding the following action items to the respective categories:
– [Definitions] Develop specific definitions of altmetrics that are consistent with best practice in the social-science field on the development of measures.
– [Research evaluation] Promote evaluation of the construct and predictive validity of individual scholarly metrics, compared to the best available evaluations of scholarly impact.
– [Data Quality and Gaming] Promote the evaluation and documentation of the reliability of measures, their predictive validity, cost of computing, potential value of information, and susceptibility to manipulation based on the resources available, incentives, or collaboration among parties.
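As one illustration of what evaluating reliability could involve, a test-retest check compares two independent collections of the same metric over the same set of items. The following is a minimal sketch that uses the Pearson correlation as the reliability coefficient, which is only one of several possible choices. The function name and the sample data are hypothetical.

```python
import math


def pearson_r(first, second):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_a = sum(first) / len(first)
    mean_b = sum(second) / len(second)
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_a * var_b)


# Hypothetical data: the same metric collected twice for five articles.
collection_1 = [12, 40, 7, 22, 3]
collection_2 = [15, 38, 9, 20, 4]
print(pearson_r(collection_1, collection_2))
```

A coefficient near 1 would suggest that repeated measurement yields similar results. A low coefficient would suggest that the metric is too noisy to support inference.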
NISO Altmetrics Standards Project White Paper, Draft 4, June 6, 2014; page 8.
See chapters 5-7 in Raykov, Tenko, and George A. Marcoulides. Introduction to Psychometric Theory. Taylor & Francis, 2010.
See chapter 6 in Raykov, Tenko, and George A. Marcoulides. Introduction to Psychometric Theory. Taylor & Francis, 2010.
See chapter 7 in Raykov, Tenko, and George A. Marcoulides. Introduction to Psychometric Theory. Taylor & Francis, 2010.
See Morgan, Stephen L., and Christopher Winship. Counterfactuals and Causal Inference: Methods and Principles for Social Research. Cambridge University Press, 2007.
See Pratt, John Winsor, Howard Raiffa, and Robert Schlaifer. Introduction to Statistical Decision Theory. MIT Press, 1995.
See chapter 7 in Fudenberg, Drew, and Jean Tirole. Game Theory. MIT Press, 1991.
See Schneier, Bruce. Liars and Outliers: Enabling the Trust That Society Needs to Thrive. John Wiley & Sons, 2012.