[Remote] Data Labeling Analyst - Speech & Voice AI
Note: This is a remote position open to candidates in the USA. Welocalize, a company specializing in language services, is seeking a Data Labeling Analyst for Speech & Voice AI. The role involves updating machine learning models, managing data annotation, and ensuring quality assurance for datasets.
Responsibilities
- Update training and test model databases with new or amended synthetic textual and image data
- Modify and refine machine learning data creation, annotation, and rating guidelines
- Initiate model training processes using internal tools and command-line interfaces
- Evaluate the performance of trained models to gauge their efficacy and readiness for deployment
- Design and develop test and training datasets according to the criteria provided by the project manager and other full-time employees
- Handle data efficiently, ensuring its integrity throughout the workflow
- Engage in data relevance tasks, ensuring datasets are aligned with project goals
- Annotate data accurately, ensuring it adheres to set guidelines
- Conduct manual quality analysis of model results
- Recognize error patterns and report anomalies for further investigation
- Deliver detailed reports on findings, covering utterance quality, LLM evaluation, ASR bug tracking, and customer pain points, for review by the User Experience Research team
- Implement basic quality control measures and ensure the reliability of processed data
- Utilize intermediate data analysis techniques to extract insights and inform decision-making
- Arbitrate discrepancies effectively, ensuring consistent data quality
- Apply basic knowledge of natural language processing and linguistics to data processing tasks
- Ensure linguistic accuracy in all processed and annotated data
Skills
- Foundational understanding of machine learning, data annotation, quality assurance, and natural language processing
- Ability to work in a fast-paced, collaborative environment
- Excellent communication skills
- Familiarity with command-line tools and interfaces
- Strong analytical skills with the ability to identify patterns and anomalies
- Bachelor's degree in Computer Science, Data Science, Linguistics, Computational Linguistics, or a related field
- Familiarity with translation or multilingual datasets is a plus for future projects
Company Overview
Company H1B Sponsorship