Transcription – Special Issue

Tatsuya Kawahara
Professor of Kyoto University, Japan
Consultant to House of Representatives, Japan
presented with Masaya Morikawa
House of Representatives, Japan

Transcription is a process that converts speech into accurate and readable written text. Parliamentary reports follow standards and guidelines that vary by language, country, and other factors.
Revising a transcript involves several factors: removal of redundancy, correction of errors and of colloquial and sometimes dialectal expressions, and structural and semantic corrections.
A corpus-based analysis was conducted using both the official meeting transcripts and the faithful transcripts of the speech from the European Parliament and the Japanese Diet.
The analysis was also carried out by type of meeting and by the style adopted. In some cases the number of corrections has decreased, which is attributed to Internet broadcasting and automatic speech recognition.
This system was introduced in the Japanese Diet in 2011 in place of stenography and is continuously refined. The text it produces is corrected by reporters who listen to the audio and is then submitted for revision.
Attention is currently also directed to the training of reporters and editors.

1. Background
Transcription is the process of converting speech into text, and it has two goals: accuracy, or how faithful the text is to the speech, and readability, or how easy the text is to read. The two are often in a trade-off relationship. For this reason, standards and guidelines for Parliamentary reports have been designed and enforced more strictly than in the private sector. They nevertheless differ across languages and countries, and they change over time. They may also be affected by other factors such as TV broadcasting, social networking services (SNS), and the use of automatic speech recognition (ASR) technology.

2. Edits in transcription process
Many factors require edits in the transcription process. First of all, disfluencies must be removed, along with other kinds of redundancy. Then, grammatical errors must be corrected, and some colloquial expressions should be rephrased as formal expressions. Last but not least, speech, unlike text, has no explicit punctuation, so periods and commas need to be inserted in appropriate places. In addition to these edits, structural modifications are sometimes made to improve readability. Moreover, semantic corrections could be made for apparent mistakes, but this is a contentious issue. These edits are explained one by one below.

2.1. Removal of redundancy
Fillers, such as “um” and “ahh” in English, must definitely be removed. Human stenographers do not transcribe them in the first place, and ASR systems eliminate them automatically. Repeats and repairs must also be removed, but their automatic removal is difficult.
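The removal of fillers and of immediate repeats can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the procedure used by stenographers or by any deployed ASR system: the filler list `FILLERS` and both function names are hypothetical, and the repeat rule only catches a word said two or more times in a row. Repairs, where the speaker restarts with different words, are not handled at all, which is one reason their automatic removal is difficult.

```python
import re

# Hypothetical filler lexicon; a real system would use a curated,
# language-specific list.
FILLERS = {"um", "uh", "ahh", "er"}


def remove_fillers(text):
    """Drop whitespace-separated tokens that are fillers.

    Trailing commas or periods attached to a filler are dropped with it.
    """
    kept = [tok for tok in text.split()
            if tok.strip(",.").lower() not in FILLERS]
    return " ".join(kept)


def collapse_repeats(text):
    """Collapse a word repeated immediately ("the the" becomes "the")."""
    return re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)


if __name__ == "__main__":
    raw = "Um, we we need, uh, the the budget"
    print(collapse_repeats(remove_fillers(raw)))
```

Note that `collapse_repeats` would also collapse deliberate repetition used for emphasis, so its output still needs human review.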
Discourse markers, such as “OK” and “yes” in English, can be kept, but too many of them reduce readability. Other extraneous expressions, such as “Thank you”, can also be kept, but removing them would improve readability.

2.2. Correction of errors and colloquial expressions
Some kinds of grammatical errors must be corrected, for example, missing or incorrect articles such as “a” and “an”, and improper use of function words such as “in” and “on”. Some colloquial expressions should also be corrected, for example, “was like” changed to “said” and “but” changed to “however”. Note, however, that language use changes over time. Handling of dialect is also an issue: while some dialects cannot be understood by many readers, dialect is often used to express the identity of the speaker.

2.3. Structural and semantic corrections
Some structural reordering is performed; for example, “Finnish incoming presidency” is changed to “incoming Finnish presidency”. A long sentence often needs to be split into a sequence of plain sentences. In these cases, careful proofreading is needed.
On the other hand, semantic correction requires care. While apparent errors, such as confusing “billion” and “million”, should be corrected via a proper process, it is questionable whether errors in proper names or factual errors should be corrected, because MPs should be responsible for their statements. In particular, errors that affected the subsequent interaction in the meeting should not be corrected.

3. Corpus analysis in European Parliament and Japanese Diet
3.1. Used corpora
Corpus-based analysis was conducted using transcripts of the European Parliament and the Japanese Diet (the House of Representatives). From the European Parliament proceedings, the English-speaking parts of some plenary sessions from 2007 were used. For the Japanese Diet, a number of committee meeting sessions held during 2005-2007 were selected. In addition to the official proceeding transcripts, faithful transcripts of the spoken words, including fillers and disfluencies, were prepared for the analysis. These faithful transcripts were originally prepared for the development of ASR systems. General statistics of the two corpora are shown in Table 1.

3.2. Analysis of edit categories
Table 2 lists the statistics of the edit categories described in the previous section. The majority of edits in the Japanese Diet are removals of fillers and discourse markers, while English requires many grammatical corrections and much syntactic reordering. The tendency thus differs by language.

Table 1 General statistics of corpora

Statistic	European Parliament	Japanese Diet
#words (faithful)	30.9K	418K
#words (official)	27.1K	379K
% of edited words	20.5%	12.9%

Table 2 Statistics of Edit Categories

Operation	Category	European Parliament	Japanese Diet
Remove	Fillers	11.6%	46.7%
Remove	Repeats/repairs	11.0%	9.4%
Remove	Discourse markers	1.8%	18.4%
Remove	Extraneous expressions	16.8%	3.0%
Correct	Grammatical errors	20.1%	7.5%
Correct	Colloquial expressions	18.0%	8.4%
Reorder	Reordering	19.6%	5.9%

Here are typical edit patterns observed for English in the European Parliament. The most frequently removed words other than fillers are “thank you”, “I think”, and “also”, while the most frequently inserted are articles and function words such as “the”, “that”, “a”, “also” and “and”. The most frequently corrected patterns are “but -> however”, “thank you -> Mr.”, “would -> should”, “our -> the”, and “this -> that”.
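Edit patterns of this kind can be collected by aligning a faithful transcript with its official counterpart word by word. The sketch below uses `difflib.SequenceMatcher` from the Python standard library for the alignment. It is an illustration only: the article does not state which alignment tool was used or exactly how “% of edited words” was defined, so the function names and the ratio computed here (faithful words that were deleted or replaced, divided by all faithful words) are assumptions.

```python
from collections import Counter
from difflib import SequenceMatcher


def align(faithful, official):
    """Align two transcripts on whitespace-separated, lowercased words."""
    a = faithful.lower().split()
    b = official.lower().split()
    opcodes = SequenceMatcher(None, a, b, autojunk=False).get_opcodes()
    return a, b, opcodes


def edit_patterns(faithful, official):
    """Count (operation, faithful words, official words) edit patterns."""
    a, b, opcodes = align(faithful, official)
    patterns = Counter()
    for tag, i1, i2, j1, j2 in opcodes:
        if tag != "equal":  # tag is "replace", "delete" or "insert"
            patterns[(tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))] += 1
    return patterns


def edited_word_ratio(faithful, official):
    """Assumed definition: share of faithful words deleted or replaced."""
    a, _, opcodes = align(faithful, official)
    edited = sum(i2 - i1 for tag, i1, i2, _, _ in opcodes
                 if tag in ("delete", "replace"))
    return edited / len(a)


if __name__ == "__main__":
    faithful = "um but we also need a budget"
    official = "however we need a budget"
    for pattern, count in edit_patterns(faithful, official).items():
        print(pattern, count)
    print(round(edited_word_ratio(faithful, official), 3))
```

Punctuation is left attached to the words here; a fuller analysis would tokenize properly and map each pattern to one of the categories in Table 2, which this sketch does not attempt.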

3.3. Analysis per meeting category and changes over time
The occurrence ratio of edits per committee in the Japanese Diet is shown in Figure 1. In 2007, fewer edits tended to be made in the Commission on the Constitution, the Committee on Budget, and the Question Time. While one-on-one interaction is the norm in other committees, the Commission on the Constitution adopts a style of free discussion among all members, and this style affects the transcription process. The Committee on Budget and the Question Time are usually broadcast on the national TV channel, which may affect the editing process.
In Figure 1, we can also see a significant change from 2007 to 2016: the ratio of edits has been reduced by 40% over the ten years.

4. Discussions
There are several causes of the reduction in edits. Most significantly, phrase reordering is no longer done. Some discourse markers are now kept, and some repeats, such as those expressing emphasis, are allowed. Moreover, many colloquial expressions are becoming accepted. These changes suggest that the transcripts have become more verbatim than before.
There are some possible reasons for this trend. The first is the deployment of the ASR system. The new system has been used for all meetings since 2011, and reporters now edit a faithful transcript, containing recognition errors, that the system generates. In the old stenography-based system, they typed in the text while editing it in their minds. The second factor is Internet broadcasting. All meetings are broadcast via the Internet and archived, so they can be accessed at any time and referred to in social media. With these factors, the guidelines followed by editors may have changed, although there is no written guideline in the Japanese Diet.

Figure 1 Ratio of edits per committee in 2007 and 2016

5. ASR system performance
In 2011, the House of Representatives of Japan introduced an ASR system that directly transcribes MPs’ speech recorded by the microphones in the meeting room; the system was presented at the IPRS meeting at Intersteno 2011 in Paris. This is the first, and still the only, such system running in a national-level Parliament. The language model is updated every year to incorporate new words, and the acoustic model is updated using the meeting audio data after a general election.
ASR performance in terms of Japanese character accuracy is monitored for most of the meetings. Initially, in 2011, the accuracy was 89.7%; it then improved and has saturated at around 91% since 2012. Most recently, the introduction of deep learning technology has improved the accuracy by 3-4% absolute.
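Character accuracy is commonly computed from the edit distance between a reference transcript and the ASR output. The sketch below follows that common definition, accuracy = 1 - (substitutions + deletions + insertions) / reference length, counted over characters. The article does not give the exact formula used in the House of Representatives, so this definition and the function names are assumptions.

```python
def levenshtein(ref, hyp):
    """Minimum number of character substitutions, deletions and insertions
    needed to turn ref into hyp (standard dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # delete r
                           cur[j - 1] + 1,           # insert h
                           prev[j - 1] + (r != h)))  # substitute or match
        prev = cur
    return prev[-1]


def char_accuracy(ref, hyp):
    """Assumed definition: 1 - edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return 1 - levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference = "国会の会議録"   # 6 characters
    hypothesis = "国会の会議"    # last character missing
    print(round(char_accuracy(reference, hypothesis), 3))
```

With this definition, an improvement of 3-4% absolute means the accuracy value itself rises by 3 to 4 percentage points. The value can become negative when the ASR output contains many insertions.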

6. Usability and Effect of ASR system in the House of Representatives
Five years after deployment of the ASR-based system, we conducted a survey among reporters in the Record Department to find out how they feel about the new system. The reporter’s job is to edit a five-minute-long text produced by the ASR system. Reporters edit the text while listening to the audio and then submit the draft to the editor. A team of reporters is made up of both stenographers and non-stenographer reporters who have not been trained in stenography.
In this survey, we found that a majority of the reporters felt that it took less time and labor to finish a draft with the ASR system, and more than 80% said they were satisfied with the performance of the ASR system. Some also expressed the positive opinion that, with proper training, the system would make it possible for those who have not been trained in stenography to produce an edited draft. We concluded that the ASR system has been positively received.
Currently, reporters are trained as follows. Those who have joined the Record Department as regular civil servants without any training in stenography go through six months of basic training, followed by one and a half years of practical training under the supervision and guidance of an experienced stenographer. To ensure the accuracy and speed required for the production of parliamentary proceedings, it is essential that reporters can produce a high-quality transcript at the stage of the initial editing. To do so, reporters need to acquire the knowledge and skills to listen to and understand the speech correctly.
From the viewpoint of an editor who checks the drafts submitted by reporters, drafts from non-stenographer reporters are becoming as good as those from trained stenographers. This suggests that the training system has so far been functioning well. This year we started an experiment with an expedited training program for reporters.
Equally important is the training of editors. A report at the IPRS meeting at Intersteno 2015 in Budapest noted that emphasis has recently been placed more on fidelity to actual speech than on readability of the text. Likewise, in the House of Representatives of the Japanese Diet, the proceedings have been produced in a way more faithful to actual speech. This probably has to do with the increasing availability of SNS and real-time video streaming, as mentioned before. We expect that the discrepancy between speech and text will be further minimized in the future.
As society changes, so do parliamentary proceedings. For us, the challenge is to develop an effective and efficient program to train future officials to produce high-quality parliamentary proceedings.


What makes a quality transcript in Parliamentary reporting — Quantitative analysis on post-editing in Japanese and European Parliaments and changes over past ten years